_PROBLEM CoEPrA-2006_Classification_001
_GROUP_NAME Artem Cherkasov
_GROUP_MEMBERS Emre Karakoc, Cenk Sahinalp, Artem Cherkasov
_ADDRESS University of British Columbia, Medicine
Simon Fraser University, Computer Science
Vancouver, BC, Canada
_MODELING_PROCEDURE
Descriptors: Based on our previous experience with QSAR modeling of peptides, and with QSAR clustering and classification of bioactivity properties in general, we adopted the following strategy. First, we optimized the geometry of the studied peptides using the MMFF94 force field. Carboxylic groups were deprotonated, amino groups were protonated, and partial charges were computed according to [1]. We then computed QSAR descriptors that describe the entire peptide molecule (global parameters), as well as descriptors for the constituent amino acids considered in the context of their peptide environment (we did not use the 'isolated amino acids' approximation). In this way, for all peptides in the training and test sets of the Classification_001 problem, we initially calculated more than 200 different 3D and 2D QSAR parameters, including:
- 50 global 'inductive' QSAR descriptors, as described in [2];
- 10 local 'inductive' QSAR descriptors (computed toward the CA atom) for each amino acid of a given nonamer, which yields 90 additional 'inductive' QSAR descriptors;
- ~90 conventional 3D and 2D global QSAR parameters implemented in the MOE modeling package [3].
All 'inductive' QSAR descriptors described above were calculated with our own SVL scripts for MOE, which can be freely downloaded through the SVL exchange.
Descriptor Selection: As a first step, a kNN-based linear optimization method built on a distance measure [4] was used for the initial selection of the most relevant QSAR parameters. As a result, 90 local 'inductive' descriptors were selected for the 'kernel' QSAR models of the Classification_001 problem.
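The distance-based algorithm of [4] is not reproduced here. As a rough illustration only, the sketch below shows one generic way a kNN criterion can drive descriptor selection: each candidate descriptor subset is scored by leave-one-out k-nearest-neighbour accuracy, and descriptors are added greedily while the score improves. The function names, k = 3, Euclidean distance, and the stopping rule are all assumptions made for this sketch, not details of the method in [4]; labels use the same +1/-1 encoding as the predictions.

```python
import math
from collections import Counter


def loo_knn_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of a k-NN classifier restricted to the given column indices."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Distances from sample i to every other sample, using only the chosen descriptors.
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        # Majority vote among the k nearest neighbours (odd k avoids ties for two classes).
        votes = Counter(label for _, label in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == y[i]
    return correct / n


def greedy_forward_selection(X, y, max_features, k=3):
    """Hypothetical helper: add, one at a time, the descriptor that most improves LOO k-NN accuracy."""
    selected = []
    best_score = 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_knn_accuracy(X, y, selected + [f], k), f) for f in remaining]
        # Highest score wins; among equal scores, prefer the lowest column index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:  # stop once no remaining descriptor improves the score
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data (invented for illustration): column 0 separates the classes, column 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [-1, -1, -1, 1, 1, 1]
print(greedy_forward_selection(X, y, max_features=2))  # ([0], 1.0)
```

On real data the matrix would hold one row per training peptide and one column per computed descriptor; the greedy loop is cheap to reason about but can miss descriptor combinations that only help jointly.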
In the second round of training model optimization, we expanded the initial set of 90 local 'inductive' parameters by sampling global 'inductive' and conventional QSAR descriptors with a greedy approach. In particular, we used the Partial Least Squares (PLS) method implemented in MOE to identify the global parameters that improve the QSAR fit of the dependent variable for the training set peptides. The final set of 101 QSAR descriptors identified in this way can be found in the attached file.
Modeling Procedure: To build a predictive QSAR model on the set of training peptides, we employed Artificial Neural Networks (ANN) as implemented in the SNNS package [5]. The values of all descriptors selected in the previous step were normalized into the range [0.0, 1.0] based on the union of the external and calibration data sets. To identify the optimal ANN settings, we performed Leave-One-Out (LOO) validation, varying the learning rate, the number of training cycles and the ANN jogging setting, and judged ANN performance by the ROC values produced by the LOO predictions. The best LOO performance was achieved with standard back-propagation training for 400 cycles with a learning rate of 0.7, input shuffling, weight decay, and random initialization of weights in the range [-1.0, 1.0]. The number of hidden nodes was set to 100. All 89 ANN models produced during the LOO validation were then applied to the external set of peptides; their outputs were averaged and interpreted with a threshold of 0.2. The interpreted averaged outputs are presented in the _PREDICTION section. Since the predictions rely on 89 ANN models (each trained on 88 peptides), we can provide their weights upon request. The attachment contains a single ANN trained on all 89 peptides with known activities.

[1] Cherkasov, A. Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges. Journal of Chemical Information and Computer Sciences, 2003, 43, 2039-2047.
    Cherkasov, A.; Shi, Z.; Li, Y.; Jones, S. M.; Fallahi, M.; Hammond, G. L. 'Inductive' Charges on Atoms in Proteins: Comparative Docking with the Extended Steroid Benchmark Set and Discovery of a Novel SHBG Ligand. Journal of Chemical Information and Modeling, 2005, 45, 1842-1853.
[2] Cherkasov, A. 'Inductive' Descriptors. 10 Successful Years in QSAR. Current Computer-Aided Drug Design, 2005, 1, 21-42.
[3] Molecular Operating Environment (MOE), 2005, Chemical Computing Group Inc., Montreal, Canada.
[4] Karakoc, E.; Cherkasov, A.; Sahinalp, S. C. Distance Based Algorithms for Small Biomolecule Classification and Structural Similarity Search. ISMB'06, 14th Annual International Conference on Intelligent Systems for Molecular Biology, Fortaleza, Brazil, 2006.
[5] SNNS: Stuttgart Neural Network Simulator, Version 4.0, University of Stuttgart, 1995.
_PREDICTION
Obj_00001 +1
Obj_00002 +1
Obj_00003 -1
Obj_00004 -1
Obj_00005 +1
Obj_00006 +1
Obj_00007 -1
Obj_00008 -1
Obj_00009 +1
Obj_00010 -1
Obj_00011 +1
Obj_00012 -1
Obj_00013 +1
Obj_00014 -1
Obj_00015 +1
Obj_00016 +1
Obj_00017 -1
Obj_00018 +1
Obj_00019 -1
Obj_00020 +1
Obj_00021 +1
Obj_00022 +1
Obj_00023 -1
Obj_00024 -1
Obj_00025 -1
Obj_00026 +1
Obj_00027 +1
Obj_00028 +1
Obj_00029 +1
Obj_00030 -1
Obj_00031 +1
Obj_00032 +1
Obj_00033 -1
Obj_00034 +1
Obj_00035 +1
Obj_00036 +1
Obj_00037 -1
Obj_00038 -1
Obj_00039 -1
Obj_00040 -1
Obj_00041 +1
Obj_00042 -1
Obj_00043 +1
Obj_00044 -1
Obj_00045 +1
Obj_00046 +1
Obj_00047 -1
Obj_00048 +1
Obj_00049 -1
Obj_00050 -1
Obj_00051 +1
Obj_00052 +1
Obj_00053 -1
Obj_00054 +1
Obj_00055 -1
Obj_00056 -1
Obj_00057 -1
Obj_00058 -1
Obj_00059 +1
Obj_00060 -1
Obj_00061 -1
Obj_00062 -1
Obj_00063 +1
Obj_00064 -1
Obj_00065 -1
Obj_00066 +1
Obj_00067 +1
Obj_00068 +1
Obj_00069 +1
Obj_00070 -1
Obj_00071 +1
Obj_00072 -1
Obj_00073 +1
Obj_00074 -1
Obj_00075 +1
Obj_00076 -1
Obj_00077 +1
Obj_00078 +1
Obj_00079 -1
Obj_00080 +1
Obj_00081 +1
Obj_00082 -1
Obj_00083 -1
Obj_00084 +1
Obj_00085 +1
Obj_00086 -1
Obj_00087 -1
Obj_00088 -1
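The +1/-1 labels above come from the post-processing described in the modeling procedure: descriptors are scaled to [0, 1] over the union of the external and calibration sets, and the outputs of the 89 LOO networks are averaged per external peptide and converted to a label with a 0.2 threshold. The sketch below illustrates only that pre- and post-processing; it does not train networks or call SNNS, and each trained network is stood in for by a plain Python callable. The function names, the assumption that each network output lies in [0, 1], and the rule that an average at or above 0.2 maps to +1 are assumptions, since the submission does not state the output encoding or the direction of the comparison.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for one trained LOO network: maps a descriptor vector to an output in [0, 1].
Model = Callable[[Vector], float]


def min_max_normalize(external: Sequence[Vector], calibration: Sequence[Vector]):
    """Scale every descriptor to [0, 1] using the min/max taken over the union of both sets."""
    union = list(external) + list(calibration)
    n_desc = len(union[0])
    lo = [min(row[d] for row in union) for d in range(n_desc)]
    hi = [max(row[d] for row in union) for d in range(n_desc)]

    def scale(row: Vector) -> Vector:
        # Constant descriptors (hi == lo) are mapped to 0.0 to avoid division by zero.
        return [(row[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
                for d in range(n_desc)]

    return [scale(r) for r in external], [scale(r) for r in calibration]


def ensemble_predict(models: Sequence[Model], x: Vector, threshold: float = 0.2) -> int:
    """Average the outputs of all LOO models and turn the average into a +1/-1 label."""
    mean_output = sum(m(x) for m in models) / len(models)
    return +1 if mean_output >= threshold else -1


# Usage with toy stand-ins for trained networks (hypothetical values):
models = [lambda x: x[0], lambda x: 0.1 * x[0], lambda x: 0.0]
ext, cal = min_max_normalize([[2.0, 5.0]], [[0.0, 5.0], [4.0, 5.0]])
print(ext)                                # [[0.5, 0.0]]
print(ensemble_predict(models, ext[0]))   # -1
```

Averaging the LOO networks rather than relying on the single network trained on all 89 peptides is a simple bagging-like choice: it smooths out the variance introduced by random weight initialization and by each network seeing a slightly different training set.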