_PROBLEM
CoEPrA-2006_Regression_001

_GROUP_NAME
Alexander Tropsha

_GROUP_MEMBERS
Mei Bell
Alexander Golbraikh
Chris Grulke
Alexander Tropsha

_ADDRESS
Lab for Molecular Modeling
School of Pharmacy
CB#7360
University of North Carolina at Chapel Hill
Chapel Hill, NC, 27599

_MODELING_PROCEDURE
Descriptors used: as in CoEPrA-2006_Regression_001.zip.

Descriptor selection procedures: pairwise correlation analysis. Descriptors were range-scaled.

Validation procedures: division of the dataset into training, test, and external validation sets. The external validation set was selected randomly and was used to simulate prediction of new compounds. The remaining subset was rationally divided into multiple training and test sets using a sphere-exclusion algorithm (Golbraikh A, Shen M, Xiao Z, Xiao YD, Lee KH, Tropsha A. J Comput Aided Mol Des. 2003 Feb-Apr;17(2-4):241-53).

Modeling procedure: k-nearest neighbors QSAR (Hoffman B, Cho SJ, Zheng W, Wyrick S, Nichols DE, Mailman RB, Tropsha A. J Med Chem. 1999 Aug 26;42(17):3217-26). The numbers of descriptors selected were 8, 10, 12, ..., 50. Ten models were built for each division into training and test sets.

Calibration statistics:
Model generation: q2.
Prediction of test sets (see Golbraikh A, Tropsha A. J Mol Graph Model. 2002 Jan;20(4):269-76): coefficient of determination between predicted and observed activities, R2; coefficients of determination for regressions through the origin for predicted vs. observed and observed vs. predicted activities, R02 and R'02; slopes of the regressions through the origin for predicted vs. observed and observed vs. predicted activities, k and k'; RMSD between predicted and observed activities.
Prediction of the external validation set: consensus prediction by all models satisfying all of the following conditions: (i) q2>0.7; (ii) R2>0.7; (iii) (R2-R02<0.1 and 0.9