_PROBLEM CoEPrA-2006_Regression_001
_GROUP_NAME Artem Cherkasov
_GROUP_MEMBERS Emre Karakoc, Cenk Sahinalp, Artem Cherkasov
_ADDRESS University of British Columbia, Medicine; Simon Fraser University, Computer Science; Vancouver, BC, Canada
_MODELING_PROCEDURE
Descriptors: Based on our previous experience with QSAR modeling of peptides and with general QSAR clustering and classification of bioactivity properties, we adopted the following strategy. First, we optimized the geometry of the studied peptides with the MMFF94 force field. Carboxylic groups were deprotonated, amino groups were protonated, and atomic charges were computed according to [1]. We then computed QSAR descriptors that describe the entire peptide molecule (global parameters), as well as descriptors for each constituent amino acid considered in the context of its peptide environment (we did not use the 'isolated amino acid' approximation). In total, for all peptides in the training and test sets of the Regression_001 problem, we initially calculated more than 200 2D and 3D QSAR parameters. These included 50 global 'inductive' QSAR descriptors as described in [2]; 10 local 'inductive' QSAR descriptors (computed with respect to the CA atom) for each amino acid of a given nonamer, yielding 90 additional 'inductive' descriptors; and ~90 conventional 2D and 3D global QSAR parameters implemented in the MOE modeling package [3]. All 'inductive' QSAR descriptors described above were calculated with our own SVL scripts for MOE, which can be freely downloaded through the SVL exchange.
Descriptor Selection: As a first step, the kNN-based linear optimization method using a distance measure [4] was applied for the initial selection of the most relevant QSAR parameters. As a result, 90 local 'inductive' descriptors were selected for the 'kernel' QSAR models of the Regression_001 problem.
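The distance-based selection method of [4] is not reproduced here. As a hypothetical stand-in only, the Python sketch below scores a descriptor subset by the leave-one-out r^2 of a k-nearest-neighbour regressor (Euclidean distance over the chosen descriptor columns) and grows the subset greedily, one descriptor at a time. The function names, the value of k and the stopping rule are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the distance-based kNN selection of [4]:
# greedy forward selection scored by leave-one-out kNN regression r^2.


def knn_loo_predictions(X: Sequence[Sequence[float]], y: Sequence[float],
                        features: Sequence[int], k: int = 3) -> List[float]:
    """Predict each sample as the mean activity of its k nearest *other*
    samples, using Euclidean distance on the given descriptor columns."""
    preds = []
    for i in range(len(X)):
        dists = []
        for j in range(len(X)):
            if j == i:
                continue  # leave sample i out
            d = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            dists.append((d, y[j]))
        dists.sort(key=lambda pair: pair[0])  # stable sort: ties keep index order
        nearest = dists[:k]
        preds.append(sum(v for _, v in nearest) / len(nearest))
    return preds


def squared_pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """r^2 as the squared Pearson correlation; 0.0 if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov * cov / (va * vb)


def greedy_select(X: Sequence[Sequence[float]], y: Sequence[float],
                  candidates: Sequence[int], max_features: int,
                  k: int = 3) -> Tuple[List[int], float]:
    """Repeatedly add the descriptor that most improves the LOO kNN r^2;
    stop at max_features or when no remaining descriptor improves it."""
    selected: List[int] = []
    best_score = -math.inf
    while len(selected) < max_features:
        best_f = None
        for f in candidates:
            if f in selected:
                continue
            score = squared_pearson(y, knn_loo_predictions(X, y, selected + [f], k))
            if score > best_score:
                best_score, best_f = score, f
        if best_f is None:
            break  # no candidate improves the current score
        selected.append(best_f)
    return selected, best_score
```

On a toy set where activity depends only on descriptor 0 and descriptor 1 is periodic noise, `greedy_select(X, y, [0, 1], max_features=1, k=2)` picks descriptor 0.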
In the second round of model optimization, we expanded the initial set of 90 local 'inductive' parameters by sampling global 'inductive' and conventional QSAR descriptors with a greedy approach. In particular, we used the Partial Least Squares (PLS) method implemented in MOE to identify those global parameters that improve the QSAR fit of the dependent variable for the training-set peptides. The resulting final set of 101 QSAR descriptors is listed in the attached file.
Modeling Procedure: To build a predictive QSAR model on the set of training peptides, we employed Artificial Neural Networks (ANN) as implemented in the SNNS package [5]. The values of all selected descriptors were normalized to the range [0.0, 1.0] using the union of the external and calibration data sets. To identify the optimal ANN settings, we performed Leave-One-Out (LOO) validation while varying the learning rate, the number of training cycles and the weight-jogging setting of the ANN, and judged ANN performance by the r^2 correlation coefficient of the LOO predictions. The best LOO performance was achieved with standard back-propagation training for 400 cycles at a learning rate of 0.7, with input shuffling, weight decay and random initial weights in the range [-1.0, 1.0]. The number of hidden nodes was set to 100. Finally, all 89 ANN models produced during LOO validation (each trained on 88 peptides) were applied to the external set of peptides, and their outputs were averaged. The averaged outputs are given in the _PREDICTION section below. The weights of these 89 LOO models are available upon request; the attachment contains a single ANN trained on all 89 peptides with known activities.
[1] Cherkasov, A. Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges. Journal of Chemical Information and Computer Sciences, 2003, 43, 2039-2047. Cherkasov, A., Shi, Z., Li, Y., Jones, S.M., Fallahi, M., Hammond, G.L. 'Inductive' Charges on Atoms in Proteins: Comparative Docking with the Extended Steroid Benchmark Set and Discovery of a Novel SHBG Ligand. Journal of Chemical Information and Modeling, 2005, 45, 1842-1853.
[2] Cherkasov, A. 'Inductive' Descriptors. 10 Successful Years in QSAR. Current Computer-Aided Drug Design, 2005, 1, 21-42.
[3] Molecular Operating Environment (MOE), 2005. Chemical Computing Group Inc., Montreal, Canada.
[4] Karakoc, E., Cherkasov, A., Sahinalp, S.C. Distance Based Algorithms for Small Biomolecule Classification and Structural Similarity Search. ISMB'06, 14th Annual International Conference on Intelligent Systems for Molecular Biology, Fortaleza, Brazil, 2006.
[5] SNNS: Stuttgart Neural Network Simulator, Version 4.0. University of Stuttgart, 1995.
_PREDICTION
Obj_00001 5.292
Obj_00002 5.881
Obj_00003 4.680
Obj_00004 4.235
Obj_00005 6.811
Obj_00006 6.641
Obj_00007 4.491
Obj_00008 4.802
Obj_00009 4.996
Obj_00010 5.931
Obj_00011 6.072
Obj_00012 5.077
Obj_00013 5.504
Obj_00014 4.172
Obj_00015 6.565
Obj_00016 5.086
Obj_00017 4.209
Obj_00018 5.904
Obj_00019 5.030
Obj_00020 4.847
Obj_00021 5.362
Obj_00022 5.165
Obj_00023 6.519
Obj_00024 6.443
Obj_00025 5.018
Obj_00026 6.577
Obj_00027 5.959
Obj_00028 6.160
Obj_00029 4.942
Obj_00030 4.691
Obj_00031 5.312
Obj_00032 5.858
Obj_00033 4.767
Obj_00034 5.589
Obj_00035 6.779
Obj_00036 6.039
Obj_00037 5.821
Obj_00038 4.714
Obj_00039 5.672
Obj_00040 4.675
Obj_00041 5.915
Obj_00042 4.155
Obj_00043 6.590
Obj_00044 4.908
Obj_00045 4.965
Obj_00046 6.182
Obj_00047 4.274
Obj_00048 5.162
Obj_00049 4.531
Obj_00050 7.069
Obj_00051 5.650
Obj_00052 5.304
Obj_00053 4.103
Obj_00054 6.670
Obj_00055 5.375
Obj_00056 5.874
Obj_00057 4.363
Obj_00058 4.388
Obj_00059 6.945
Obj_00060 3.778
Obj_00061 6.280
Obj_00062 6.082
Obj_00063 5.949
Obj_00064 4.533
Obj_00065 5.650
Obj_00066 6.571
Obj_00067 6.310
Obj_00068 6.356
Obj_00069 5.530
Obj_00070 4.898
Obj_00071 5.000
Obj_00072 4.636
Obj_00073 5.454
Obj_00074 6.235
Obj_00075 5.666
Obj_00076 4.361
Obj_00077 5.873
Obj_00078 6.009
Obj_00079 3.477
Obj_00080 5.879
Obj_00081 5.649
Obj_00082 5.739
Obj_00083 5.127
Obj_00084 6.363
Obj_00085 6.337
Obj_00086 5.609
Obj_00087 5.311
Obj_00088 4.867
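The descriptor normalization and leave-one-out ensemble averaging described under _MODELING_PROCEDURE can be sketched in Python as follows. This is not the SNNS implementation used for the submission: `train_mlp` is a hypothetical stand-in for SNNS standard back-propagation (one hidden layer of logistic units, a logistic output with targets rescaled to [0, 1], and weight decay applied as a simple shrinkage term on non-bias weights), and weight jogging is not modeled. All function names and the decay value are assumptions. The defaults mirror the reported settings (100 hidden nodes, learning rate 0.7, 400 cycles, initial weights in [-1.0, 1.0], shuffled inputs).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def min_max_normalize(train, external):
    """Scale every descriptor to [0, 1] using min/max over train + external."""
    union = list(train) + list(external)
    lo = [min(col) for col in zip(*union)]
    hi = [max(col) for col in zip(*union)]

    def scale(row):
        # constant descriptors are mapped to 0.0 to avoid division by zero
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return [scale(r) for r in train], [scale(r) for r in external]


def _sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=100, lr=0.7, cycles=400, decay=1e-4, seed=0) -> Model:
    """Hypothetical stand-in for SNNS: online back-propagation, one hidden layer
    of logistic units, logistic output; targets rescaled to [0, 1] for training."""
    rng = random.Random(seed)
    n_in = len(X[0])
    y_lo = min(y)
    span = (max(y) - y_lo) or 1.0
    t = [(v - y_lo) / span for v in y]
    # initial weights drawn uniformly from [-1, 1]; last entry of each row is a bias
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden + 1)]

    def forward(x):
        h = [_sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
        return h, _sigmoid(sum(w * v for w, v in zip(w2, h)) + w2[-1])

    order = list(range(len(X)))
    for _ in range(cycles):
        rng.shuffle(order)  # present training patterns in random order
        for i in order:
            x = X[i]
            h, o = forward(x)
            d_out = (t[i] - o) * o * (1.0 - o)
            # hidden deltas use the output weights *before* this update
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] += lr * (d_out * h[j] - decay * w2[j])
                row = w1[j]
                for m in range(n_in):
                    row[m] += lr * (d_hid[j] * x[m] - decay * row[m])
                row[-1] += lr * d_hid[j]  # biases are not decayed
            w2[-1] += lr * d_out

    # map the [0, 1] network output back to the activity scale
    return lambda x: forward(x)[1] * span + y_lo


def loo_ensemble(X, y, X_ext, train_fn: Callable[..., Model]) -> Tuple[List[float], List[float]]:
    """Train one model per left-out sample. Return the LOO predictions (for
    judging settings by r^2) and external predictions averaged over all models."""
    n = len(X)
    loo_preds: List[float] = []
    ext_sums = [0.0] * len(X_ext)
    for i in range(n):
        model = train_fn(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        loo_preds.append(model(X[i]))
        for k, x in enumerate(X_ext):
            ext_sums[k] += model(x)
    return loo_preds, [s / n for s in ext_sums]
```

With 89 training peptides, `loo_ensemble(X_train, y_train, X_ext, train_mlp)` would train 89 models on 88 peptides each and average their external outputs. Pure Python is slow at 100 hidden nodes and 101 inputs; the sketch favours readability over speed.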