MetaNeural files:
Input: ROOT, ROOT.PAT, ROOT.TES, (ROOT.WGT), (ROOT.FWT), (ROOT.DBD)
Output: ROOT.XXX, ROOT.TTT, ROOT.TRN, (ROOT.DBD), ROOT.WGT, ROOT.FWT
• Use analyze root -34 for the easy way (the file meta lets you override the defaults)
• Use meta root for full mode (e.g. meta root); use MetaUI to prepare the input file
• ANALYZE = MetaNeural alternative code
• Either run meta root, or run analyze in one of these modes:
  analyze root.pat -34 (single training and testing)
  analyze root.pat -3434 (leave-one-out, LOO)
  analyze root.txt 34 (bootstrap mode)
• Results for analyze are in resultss.xxx and resultss.ttt
• Results from MetaNeural are in root.xxx and root.ttt
• The MetaNeural input file is generated automatically by analyze
• The file named meta overrides the default input file for analyze
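The validation modes listed above can be pictured as index splits of the pattern set. The sketch below uses generic textbook definitions of leave-one-out and out-of-bag bootstrap resampling; the function names are hypothetical, and this is not a description of how analyze implements its -3434 or bootstrap modes internally.

```python
import random

# Hypothetical helpers: generic resampling schemes, not analyze's internals.

def leave_one_out(n):
    """Leave-one-out: every pattern is held out once as the only test pattern."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

def bootstrap(n, n_rounds, seed=0):
    """Bootstrap: draw n training patterns with replacement; the patterns
    that were never drawn (out-of-bag) serve as the test set."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_rounds):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        test = [i for i in range(n) if i not in drawn]
        splits.append((train, test))
    return splits

# Example: 19 patterns give 19 leave-one-out models.
loo = leave_one_out(19)
print(len(loo), loo[0][1])   # 19 [0]
```

Leave-one-out costs one model per pattern, which is affordable for small sets such as the 19 amino acids; bootstrap rounds trade that cost for a random, repeatable estimate.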
4 => 4 layers
2 => 2 inputs
16 => # hidden neurons in layer #1
4 => # hidden neurons in layer #2
1 => # outputs
300 => epoch length (hint: always use 1, for the entire batch)
0.01 => learning parameters by weight layer (hint: 1/# patterns or 1/# epochs)
0.01
0.01
0.5 => momentum parameters by weight layer (hint: use 0.5)
0.5
0.5
10000000 => some very large number of training epochs
200 => error display refresh rate
1 => sigmoid transfer function
1 => temperature of sigmoid
check.pat => name of file with training patterns (test patterns in root.tes)
0 => not used (legacy entry)
100 => not used (legacy entry)
0.02000 => exit training if error < 0.02
0 => initial weights from a flat random distribution
0.2 => initial random weights all fall between -2 and +2
MetaNeural Input File for the ROOT
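A minimal sketch that produces such a file from a few parameters, one value per line in the order annotated above. The function name metaneural_input and its keyword defaults are hypothetical; the layout is inferred from this slide only, and a single learning rate and a single momentum value are repeated for every weight layer.

```python
# Hypothetical helper; the line order is assumed from the annotated slide.
def metaneural_input(n_inputs, hidden, n_outputs, pattern_file,
                     epoch_length=1, learning_rate=0.01, momentum=0.5,
                     max_epochs=10000000, display_rate=200, transfer=1,
                     temperature=1, exit_error=0.02, weight_dist=0,
                     weight_range=0.2):
    """Return MetaNeural input-file text, one value per line."""
    n_layers = 2 + len(hidden)            # input layer + hidden layers + output layer
    n_weight_layers = n_layers - 1        # one learning/momentum value per weight layer
    values = [n_layers, n_inputs, *hidden, n_outputs, epoch_length]
    values += [learning_rate] * n_weight_layers
    values += [momentum] * n_weight_layers
    values += [max_epochs, display_rate, transfer, temperature, pattern_file,
               0, 100,                    # legacy entries, not used
               exit_error, weight_dist, weight_range]
    return "\n".join(str(v) for v in values) + "\n"

# Rebuild the example file shown above; saving it as "meta" would let it
# override analyze's default input file.
print(metaneural_input(2, [16, 4], 1, "check.pat", epoch_length=300))
```

The call above reproduces the listing line for line, except that the exit error is written as 0.02 rather than 0.02000.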
EXAMPLE DATA SETS
• IRIS data
• Checkerboard data
• Svante Wold’s QSAR data
• Cherkassky’s nonlinear function
• Albumin QSAR data
CHECK_DATA.BAT:
REM GET TRAINING AND TEST DATA FOR CHESS
copy mje_data\check\check.pat
copy mje_data\check\check.tes
copy mje_data\check\check meta
REM METAPLS SCALE THE DATA
REM JOIN DATA FILES
copy check.pat+check.tes aa.txt
REM META-PLS SCALE
analyze aa.txt 8
REM RESPLIT DATA
REM ANSWER (2000 and 0 respectively)
analyze aa.txt.txt 20
copy cmatrix.txt check.pat
copy dmatrix.txt check.tes
REM ERASE CLUTTER
erase cmatrix.txt
erase dmatrix.txt
erase *.$$$
erase aa.*
erase *.txt.txt
REM PREPARE SCALING FILE HATS.TXT FOR CHECKTEST.M
analyze stats.txt -100
copy stats.txt.txt hats.txt
erase stats.*

CHECK_NET.BAT:
analyze check.pat -34
pause
exit

CHECK_TEST.BAT:
REM DESCALE DATA
analyze resultss.ttt -4
REM run MATLAB file checktest in MATLAB
pause
exit
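The batch files above scale the data with analyze before training (option 8) and descale the predictions afterwards (option -4). The slides do not give the scaling formula; the sketch below assumes plain column-wise standardization (subtract the mean, divide by the standard deviation) and keeps the statistics so that predictions can be mapped back to original units, which appears to be what hats.txt is prepared for. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; the standardization formula is an assumption.

def fit_scaler(X):
    """Column means and sample standard deviations of a pattern matrix."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0          # avoid dividing constant columns by zero
    return mean, std

def scale(X, mean, std):
    """Standardize each column (assumed form of the META-PLS scaling)."""
    return (X - mean) / std

def descale(Z, mean, std):
    """Invert scale(): map scaled values back to the original units."""
    return Z * std + mean

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
mean, std = fit_scaler(X)
Z = scale(X, mean, std)
print(np.allclose(descale(Z, mean, std), X))   # True
```

Fitting the statistics once on the joined file (as CHECK_DATA.BAT does with check.pat+check.tes) puts training and test patterns on the same scale.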
-9.59E-01 -8.33E-02 1 1965
-9.97E-01 -1.09E-01 1 8474
-1.69E-01 1.27E+00 0 5849
-1.47E+00 -4.45E-01 1 6383
3.43E-01 -1.45E+00 0 8713
2.42E-01 -1.10E+00 0 3076
8.33E-01 -4.12E-01 1 1032
1.14E+00 1.15E+00 0 5297
-5.78E-02 3.65E-01 1 7773
CHECK.PAT
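Each CHECK.PAT row holds the two inputs, the class response, and what looks like a pattern ID in the last column. A minimal reader under that assumed layout (the name parse_pat and the n_outputs parameter are hypothetical):

```python
# Hypothetical reader; row layout assumed from the CHECK.PAT excerpt.
def parse_pat(text, n_outputs=1):
    """Split whitespace-separated pattern rows into inputs, responses and IDs.

    Assumed row layout: input values, then n_outputs responses, then an ID.
    """
    X, Y, ids = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        values = [float(f) for f in fields[:-1]]
        X.append(values[:-n_outputs])
        Y.append(values[-n_outputs:])
        ids.append(fields[-1])
    return X, Y, ids

sample = """-9.59E-01 -8.33E-02 1 1965
-1.69E-01 1.27E+00 0 5849"""
X, Y, ids = parse_pat(sample)
print(X[1], Y[1], ids[1])   # [-0.169, 1.27] [0.0] 5849
```

Keeping the ID as a string preserves it exactly, since it is a label rather than a quantity.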
FILES RELATED TO CHECKERBOARD EXAMPLE
4
2
164110
0.01
0.01
0.5
0.5
0.5
10000000
-100
1
1
check.pat
0
100
0.1
0
0.6
MetaNeural INPUT FILE FOR CHECKERBOARD DATA
NAME  PIE    PIF    DGR    SAC    MR     Lam    Vol    DDGTS  ID
Ala   0.23   0.31   -0.55  254.2  2.126  -0.02  82.2   8.5    1
Asn   -0.48  -0.6   0.51   303.6  2.994  -1.24  112.3  8.2    2
Asp   -0.61  -0.77  1.2    287.9  2.994  -1.08  103.7  8.5    3
Cys   0.45   1.54   -1.4   282.9  2.933  -0.11  9.1    11     4
Gln   -0.11  -0.22  0.29   335    3.458  -1.19  127.5  6.3    5
Glu   -0.51  -0.64  0.76   311.6  3.243  -1.43  120.5  8.8    6
Gly   0      0      0      224.9  1.662  0.03   65     7.1    7
His   0.15   0.13   -0.25  337.2  3.856  -1.06  140.6  10.1   8
Ile   1.2    1.8    -2.1   322.6  3.35   0.04   131.7  16.8   9
Leu   1.28   1.7    -2     324    3.518  0.12   131.5  15     10
Lys   -0.77  -0.99  0.78   336.6  2.933  -2.26  144.3  7.9    11
Met   0.9    1.23   -1.6   336.3  3.86   -0.33  132.3  13.3   12
Phe   1.56   1.79   -2.6   366.1  4.638  -0.05  155.8  11.2   13
Pro   0.38   0.49   -1.5   288.5  2.876  -0.31  106.7  8.2    14
Ser   0      -0.04  0.09   266.7  2.279  -0.4   88.5   7.4    15
Thr   0.17   0.26   -0.58  283.9  2.743  -0.53  105.3  8.8    16
Trp   1.85   2.25   -2.7   401.8  5.755  -0.31  185.9  9.9    17
Tyr   0.89   0.96   -1.7   377.8  4.791  -0.84  162.7  8.8    18
Val   0.71   1.22   -1.6   295.1  3.054  -0.13  115.6  12     19
QSAR DATA SET EXAMPLE: 19 Amino Acids
From Svante Wold, Michael Sjöström, Lennart Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, Vol. 58, pp. 109-130 (2001)
RENSSELAER
PIE: lipophilicity constant of the AA side chain
PIF: likewise a lipophilicity constant of the AA side chain
DGR: free energy of transfer of the AA side chain from protein to H2O
SAC: water-accessible surface of the AA
MR: molecular refractivity
Lam: polarity parameter
Vol: molecular volume
DDGTS: free energy of unfolding of a protein
Scatterplot (results.ttt): observed vs. predicted response. q2 = 0.261, Q2 = 0.265, RMSE = 1.680
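Each scatterplot is summarized by q2, Q2 and RMSE. The slides do not define these; the sketch assumes one convention in which q2 = 1 - r^2 (r is the Pearson correlation between observed and predicted responses) and Q2 = PRESS divided by the sum of squares of the observed responses about their mean, so that 0 means a perfect fit for both. The function name is hypothetical.

```python
import math

# Hypothetical helper; the q2/Q2 definitions are an assumed convention.
def q2_Q2_rmse(y_obs, y_pred):
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n
    sxy = sum((a - mean_obs) * (b - mean_pred) for a, b in zip(y_obs, y_pred))
    sxx = sum((a - mean_obs) ** 2 for a in y_obs)
    syy = sum((b - mean_pred) ** 2 for b in y_pred)
    r2 = sxy * sxy / (sxx * syy)           # squared Pearson correlation
    press = sum((a - b) ** 2 for a, b in zip(y_obs, y_pred))
    q2 = 1.0 - r2                          # 0 when predictions are perfectly correlated
    Q2 = press / sxx                       # 0 when predictions are exact
    rmse = math.sqrt(press / n)
    return q2, Q2, rmse

print(q2_Q2_rmse([0, 1, 2], [0, 1, 3]))
```

Under this convention q2 and Q2 stay close when the predictions are unbiased, as in the reported pairs; a large gap would point to a systematic offset or scale error.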
4197311
0.05
0.05
0.05
0.5
0.5
0.5
1000000
200
1
1
QSAR.pat
0
100
0.1
0
0.6
REM COPY DATA FILE AND LABELS FILE
copy MJE_data\qsar\svante.txt qsar.pat
REM META-PLS SCALE THE DATA
analyze qsar.pat 8
copy qsar.pat.txt qsar.pat
copy qsar.pat qsar.tes
REM COPY LABELS
copy MJE_data\qsar\svante_label.txt sel_lbls.txt
REM COPY METANEURAL FILE
copy MJE_data\qsar\qsar meta
pause
exit

analyze qsar.pat -3434
pause
exit

REM DESCALE THE DATA
analyze resultss.ttt -4
REM USE MATLAB FILE DOS_MBOTW.M
REM DOS_MBOTW RESULTS.TTT
Scatterplot (results.ttt), PLS, 1 latent variable: observed vs. predicted response. q2 = 0.684, Q2 = 0.699, RMSE = 2.228
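These PLS models use a single latent variable. The slides do not show how analyze computes PLS; as an illustration, here is the standard NIPALS PLS1 algorithm (one response) in NumPy. The function names are hypothetical, and the descriptors are assumed to be scaled beforehand.

```python
import numpy as np

# Hypothetical helpers: textbook NIPALS PLS1, not analyze's own code.

def pls1_fit(X, y, n_components=1):
    """NIPALS PLS1 on mean-centred data; returns coefficients and the means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean
    f = y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                        # scores (latent variable)
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        q = f @ t / tt                   # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - q * t                    # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # coefficients in the original X space
    return B, x_mean, y_mean

def pls1_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean

rng = np.random.default_rng(0)
Xt = rng.standard_normal((20, 3))
yt = Xt @ np.array([1.0, 2.0, -1.0]) + 5.0
B, xm, ym = pls1_fit(Xt, yt, n_components=1)
```

With one component the model regresses y on a single score vector; with as many components as independent descriptors it reduces to ordinary least squares.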
Scatterplot (results.ttt), PLS, 1 latent variable, no aromatic AAs: observed vs. predicted response. q2 = 0.356, Q2 = 0.358, RMSE = 1.686
Scatterplot (results.ttt), Gaussian kernel PLS (sigma = 1.3), 1 latent variable, with aromatic AAs: observed vs. predicted response. q2 = 0.317, Q2 = 0.325, RMSE = 1.520
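In Gaussian kernel PLS the descriptor matrix is replaced by a kernel matrix that compares every molecule with every other one. A minimal sketch, assuming the common width convention exp(-||x - x'||^2 / (2 sigma^2)) (the slide gives sigma = 1.3 but not the exact formula) and the usual centring of the training kernel; the function names and the random stand-in data are hypothetical.

```python
import numpy as np

# Hypothetical helpers; the kernel width convention is an assumption.

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)             # clip tiny negatives from round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

def center_kernel(K):
    """Centre a square training kernel in feature space: J K J, J = I - 1/n."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

# Random stand-in shaped like 19 amino acids x 7 scaled descriptors.
X = np.random.default_rng(1).standard_normal((19, 7))
Kc = center_kernel(gaussian_kernel(X, X, sigma=1.3))
```

One simple way to use the result is to pass the centred kernel to an ordinary PLS routine in place of the descriptor matrix; a small sigma makes the model more local, a large one makes it closer to linear PLS.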
Molecule #BR #C #CL #F #H #I #N #O #P #S #SI BALA IDC IDCBAR IDW IDWBAR K0 K1 K2 K3 KA1 KA2 KA3 NXC3 NXC4 NXCH10 NXCH3 NXCH4 NXCH5 NXCH6 NXCH7 NXCH8 NXCH9 NXP10 NXP2 NXP3 NXP4 NXP5 NXP6 NXP7 NXP8 NXP9 NXPC4 SI TOPOL90 TOPOL91 TOPOL92 TOPOL93 TOPOL94 TOPOL95 TOPOL96 TOPOL97 TOPOL98 TOPOL99 WW X0 X1 X2 XC3 XC4 XCH10 XCH3 XCH4 XCH5 XCH6 XCH7 XCH8 XCH9 XP10 XP3 XP4 XP5 XP6 XP7 XP8 XP9 XPC4 XV0 XV1 XV2 XVC3 XVC4 XVCH10 XVCH3 XVCH4 XVCH5 XVCH6 XVCH7 XVCH8 XVCH9 XVP10 XVP3 XVP4 XVP5 XVP6 XVP7 XVP8 XVP9 XVPC4 S001 S002 S003 S004 S005 S006 S007 S008 S009 S010 S011 S012 S013 S014 S015 S016 S017 S018 S019 S020 S021 S022 S023 S024 S025 S026 S027 S028 S029 S030 S031 S032 S033 S034 S035 S036 S037 S038 S039 S040 S041 S042 S043 S044 S045 S046 S047 S048 S049 S050 S051 S052 S053 S054 S055 S056 S057 S058 S059 S060 S061 S062 S063 S064 S065 S066 S067 S068 S069 S070 S071 S072 S073 S074 S075 S076 S077 S078 S079 S080 S081 S082 S083 S084 S085 S086 S087 S088 S089 S090 S091 S092 S093 S094 S095 S096 S097 S098 S099 S100 S101 S102 S103 S104 S105 S106 S107 S108 S109 S110 S111 S112 S113 S114 S115 S116 S117 S118 S119 S120 S121 S122 S123 S124 S125 S126 S127 S128 S129 S130 S131 S132 S133 S134 S135 S136 S137 S138 S139 S140 S141 S142 S143 S144 S145 S146 S147 S148 S149 S150 S151 S152 S153 S154 S155 S156 S157 S158 S159 S160 S161 S162 S163 S164 S165 S166 S167 S168 S169 S170 S171 S172 S173 S174 S175 S176 S177 S178 S179 S180 S181 S182 S183 S184 S185 S186 S187 S188 S189 S190 S191 S192 S193 S194 S195 S196 S197 S198 S199 S200 S201 S202 S203 S204 S205 S206 S207 S208 AbsBNP1 AbsBNP10 AbsBNP2 AbsBNP3 AbsBNP4 AbsBNP5 AbsBNP6 AbsBNP7 AbsBNP8 AbsBNP9 AbsBNPMax AbsBNPMin AbsDGN1 AbsDGN10 AbsDGN2 AbsDGN3 AbsDGN4 AbsDGN5 AbsDGN6 AbsDGN7 AbsDGN8 AbsDGN9 AbsDGNMax AbsDGNMin AbsDKN1 AbsDKN10 AbsDKN2 AbsDKN3 AbsDKN4 AbsDKN5 AbsDKN6 AbsDKN7 AbsDKN8 AbsDKN9 AbsDKNMax AbsDKNMin AbsDRN1 AbsDRN10 AbsDRN2 AbsDRN3 AbsDRN4 AbsDRN5 AbsDRN6 AbsDRN7 AbsDRN8 AbsDRN9 AbsDRNMax AbsDRNMin AbsEP1 AbsEP10 AbsEP2 AbsEP3 AbsEP4 AbsEP5 
AbsEP6 AbsEP7 AbsEP8 AbsEP9 AbsEPMax AbsEPMin AbsFuk1 AbsFuk10 AbsFuk2 AbsFuk3 AbsFuk4 AbsFuk5 AbsFuk6 AbsFuk7 AbsFuk8 AbsFuk9 AbsFukMax AbsFukMin AbsG1 AbsG10 AbsG2 AbsG3 AbsG4 AbsG5 AbsG6 AbsG7 AbsG8 AbsG9 AbsGMax AbsGMin AbsK1 AbsK10 AbsK2 AbsK3 AbsK4 AbsK5 AbsK6 AbsK7 AbsK8 AbsK9 AbsKMax AbsKMin AbsL1 AbsL10 AbsL2 AbsL3 AbsL4 AbsL5 AbsL6 AbsL7 AbsL8 AbsL9 AbsLMax AbsLMin BNP BNP1 BNP10 BNP2 BNP3 BNP4 BNP5 BNP6 BNP7 BNP8 BNP9 BNPAvg BNPMax BNPMin Del(G)NA1 Del(G)NA10 Del(G)NA2 Del(G)NA3 Del(G)NA4 Del(G)NA5 Del(G)NA6 Del(G)NA7 Del(G)NA8 Del(G)NA9 Del(G)NIA Del(G)NMax Del(G)NMin Del(K)IA Del(K)Max Del(K)Min Del(K)NA1 Del(K)NA10 Del(K)NA2 Del(K)NA3 Del(K)NA4 Del(K)NA5 Del(K)NA6 Del(K)NA7 Del(K)NA8 Del(K)NA9 Del(Rho)NA1 Del(Rho)NA10 Del(Rho)NA2 Del(Rho)NA3 Del(Rho)NA4 Del(Rho)NA5 Del(Rho)NA6 Del(Rho)NA7 Del(Rho)NA8 Del(Rho)NA9 Del(Rho)NIA Del(Rho)NMax Del(Rho)NMin EP1 EP10 EP2 EP3 EP4 EP5 EP6 EP7 EP8 EP9 Fuk Fuk1 Fuk10 Fuk2 Fuk3 Fuk4 Fuk5 Fuk6 Fuk7 Fuk8 Fuk9 FukAvg FukMax FukMin Lapl Lapl1 Lapl10 Lapl2 Lapl3 Lapl4 Lapl5 Lapl6 Lapl7 Lapl8 Lapl9 LaplAvg LaplMax LaplMin PIP1 PIP10 PIP11 PIP12 PIP13 PIP14 PIP15 PIP16 PIP17 PIP18 PIP19 PIP2 PIP20 PIP3 PIP4 PIP5 PIP6 PIP7 PIP8 PIP9 PIPAvg PIPMax PIPMin piV SIDel(G)N SIDel(K)N SIDel(Rho)N SIEP SIEPA1 SIEPA10 SIEPA2 SIEPA3 SIEPA4 SIEPA5 SIEPA6 SIEPA7 SIEPA8 SIEPA9 SIEPIA SIEPMax SIEPMin SIG SIGA1 SIGA10 SIGA2 SIGA3 SIGA4 SIGA5 SIGA6 SIGA7 SIGA8 SIGA9 SIGIA sigmanew sigmaNV sigmaPV SIGMax SIGMin SIK SIKA1 SIKA10 SIKA2 SIKA3 SIKA4 SIKA5 SIKA6 SIKA7 SIKA8 SIKA9 SIKIA SIKMax SIKMin sumsigma SurfArea Volume CAQSOL CHEM_POT CLOGP CMR CSAREA_A CSAREA_B CSAREA_C DELTAHF DIPOLE ETOT EVDW HARDNESS HBAB HBDA HOMO LENGTH_A LENGTH_B LENGTH_C LUMO MASS MUA MUB MUC NHBA NHBD NUMHB PISUBI QMINUS QPLUS RA RB RC SAAB SAAC SABC SAREA SASAREA SASVOL SHAPE VLOOP1 VLOOP2 VLOOP3 VLOOP4 VLOOP5 VOLUME
5-fluorocytosine CEFUROXIMEAXETIL CHLORPROPAMIDE DANSYLGLYCINE DROPERIDOL KETOCONAZOLE PROCAINE PROMAZINE SULFAPHENAZOLE TERAZOSIN TOLAZAMIDE bumetanide bupropion camptothecin cefuroxime cromolyn digitoxin ebselen fusidicacid l-tryptophan levofloxacin norfloxacin novobiocin oxyphenbutazone phenoxymethylpenicillinicACID phenylbutazone propylthiouracil sancycline tetracycline TERBINAFINE acebutolol acetaminophen acetylsalicylic_acid amoxicillin atenolol caffeine captopril carbamazepine cephalexin chlorpromazine cimetidine ciprofloxacin clofibrate clonidine desipramine doxycycline etoposide furosemide glibenclamide hydrochlorothiazide imipramine indomethacin itraconazole ketoprofen labetalol lamotrigine lidocaine methotrexate metoprolol minocycline nadolol naproxen ondansetron phenytoin pindolol prazosin propranolol quinidine quinine ranitidine sotalol sulfasalazine
Chemoinformatic Models to Predict Binding Affinities to Human Serum Albumin: G. Colmenarejo et al., J. Med. Chem. 2001, 44, pp. 4370-4378
$\log K'_{\mathrm{hsa}} = \log\left[(t - t_0)/t_0\right]$, where t and t0 are the retention times of the drug and of NaNO3, respectively
• Binding affinities to human serum albumin (HSA): log K’hsa
• Gonzalo Colmenarejo, GlaxoSmithKline, J. Med. Chem. 2001, 44, 4370-4378
• 95 molecules, 250-1500+ descriptors
• Widely different compounds
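The response is the log of a chromatographic capacity factor. A small helper, assuming a base-10 logarithm and that t0 is the retention time of the unretained NaNO3 marker; the function name is hypothetical.

```python
import math

# Hypothetical helper; base-10 logarithm assumed.
def log_k_hsa(t, t0):
    """log K'hsa = log10((t - t0) / t0) from drug (t) and NaNO3 (t0) retention times."""
    if t <= t0:
        raise ValueError("the drug must be retained longer than the NaNO3 marker (t > t0)")
    return math.log10((t - t0) / t0)

print(log_k_hsa(11.0, 1.0))   # 1.0
```

A drug eluting at twice the marker time gives log K'hsa = 0, so more strongly bound drugs have larger values.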
1) Surface properties are encoded on the 0.002 e/au³ electron density isosurface (Breneman, C.M. and Rhem, M., J. Comp. Chem., 1997, 18(2), pp. 182-197)
2) Histograms or wavelet encodings of the surface properties give TAE property descriptors
Electron Density-Derived TAE-wavelet Descriptors
PIP (Local Ionization Potential)
Histograms
Wavelet Coefficients
• TAE Internal Ray Reflection - low resolution scan
Isosurface (portion removed) with 750 segments
PEST-Shape Descriptors: Surface Property-Encoded Ray Tracing
• Segment length and the property value at the point of incidence form a 2D histogram
• Each bin of the 2D histogram becomes a hybrid descriptor: 36 descriptors per hybrid length-property combination
PIP vs Segment Length
Shape-Aware Molecular Descriptors from Property/Segment-Length Distributions
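The count of 36 hybrid descriptors per length-property combination is consistent with a 6 x 6 binning of the 2D histogram. The sketch below assumes that bin layout, fixed bin ranges, and normalization by the number of binned segments; it uses numpy.histogram2d, and the function name and synthetic segment data are hypothetical.

```python
import numpy as np

# Hypothetical helper; 6 x 6 bins and normalization are assumptions.
def hybrid_descriptors(seg_lengths, prop_values, length_range, prop_range,
                       bins=(6, 6)):
    """Flatten a 2D histogram of (segment length, property value) into descriptors."""
    H, _, _ = np.histogram2d(seg_lengths, prop_values, bins=bins,
                             range=[length_range, prop_range])
    total = H.sum()
    if total > 0:
        H = H / total                    # fraction of segments in each bin
    return H.ravel()                     # 6 x 6 bins -> 36 values

# Synthetic stand-in for 750 ray segments on one molecular surface.
rng = np.random.default_rng(0)
lengths = rng.uniform(0.0, 10.0, 750)
pip = rng.uniform(-1.0, 1.0, 750)
d = hybrid_descriptors(lengths, pip, (0.0, 10.0), (-1.0, 1.0))
```

Fixing the bin ranges across all molecules keeps descriptor k comparable between molecules, which a per-molecule range would not.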
Scatterplot (results.ttt), training: observed vs. predicted response. q2 = 0.301, Q2 = 0.327, RMSE = 0.357
Scatterplot (results.ttt), testing: observed vs. predicted response. q2 = 0.598, Q2 = 0.613, RMSE = 0.506
CHERKASSKY’S NONLINEAR BENCHMARK DATA
• Generate 500 datapoints (400 training; 100 testing) for:
$$y = \exp\left(2 x_1 \sin(\pi x_4)\right) + \sin(x_2 x_3), \qquad -0.25 \le x_i \le 0.25$$
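A sketch that generates the 500 points. Drawing each x_i uniformly from [-0.25, 0.25] is an assumption (the slide does not say how the points were sampled); the first 400 rows are used for training and the last 100 for testing. The function name is hypothetical.

```python
import numpy as np

# Hypothetical generator; uniform sampling of the inputs is an assumption.
def cherkassky_data(n=500, n_train=400, seed=0):
    """Sample y = exp(2 x1 sin(pi x4)) + sin(x2 x3), x_i uniform on [-0.25, 0.25]."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-0.25, 0.25, size=(n, 4))
    y = np.exp(2.0 * X[:, 0] * np.sin(np.pi * X[:, 3])) + np.sin(X[:, 1] * X[:, 2])
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

(X_train, y_train), (X_test, y_test) = cherkassky_data()
```

On this input range the exponential term stays within about 0.70 to 1.42 and sin(x2 x3) within about ±0.06, so y lies roughly between 0.64 and 1.49.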
Cherkas.bat:
copy cherk.ori cherk.txt
analyze cherk.txt 3
copy cherk.txt.txt cherk.txt
analyze cherk.txt 20
copy cmatrix.txt cherk.pat
analyze cherk.pat 111371
copy cherk.pat.txt cherk.pat
copy dmatrix.txt cherk.tes
analyze cherk.tes 111371
copy cherk.tes.txt cherk.tes
analyze cherk.txt 116
erase cherk.pat.txt
erase cherk.tes.txt
erase *.$$$
erase *.txt.txt
copy cherk.pat a.txt
copy cherk.tes b.txt
copy sel_lbls.txt label.txt
Plot: real vs. predicted response (horizontal axis 0-100, vertical axis 0.6-1.4)
Scatterplot (results.ttt): observed vs. predicted response. q2 = 0.029, Q2 = 0.031, RMSE = 0.538
Plots: real vs. predicted response (two panels)
Y=sin|x|/|x|
• Generate 500 datapoints (100 training; 500 testing) for:
$$y = \frac{\sin|x|}{|x|}, \qquad -10 \le x \le 10$$
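A sketch that generates the data with the sample counts stated on the slide (100 training, 500 testing). Random training inputs and an evenly spaced test grid on [-10, 10] are assumptions, as are the function names.

```python
import numpy as np

# Hypothetical generator; input sampling scheme is an assumption.
def target(x):
    """sin|x| / |x|, with the limit value 1 at x = 0."""
    # sin|x|/|x| equals sin(x)/x, and np.sinc(t) = sin(pi t)/(pi t), so use t = x/pi.
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

def sinc_data(n_train=100, n_test=500, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(-10.0, 10.0, n_train)   # random training inputs
    x_test = np.linspace(-10.0, 10.0, n_test)     # evenly spaced test inputs
    return (x_train, target(x_train)), (x_test, target(x_test))

(x_train, y_train), (x_test, y_test) = sinc_data()
```

Using np.sinc avoids a division by zero at x = 0, where the function's limit is 1.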
Plot: real vs. predicted y for x from -10 to 10. Comparison of Kernel-PLS with PLS (4 latent variables, sigma = 0.08); panels: PLS, Kernel-PLS