_PROBLEM CoEPrA-2006_Classification_003
_GROUP_NAME Artem Cherkasov
_GROUP_MEMBERS
Emre Karakoc
Cenk Sahinalp
Artem Cherkasov
_ADDRESS
University of British Columbia, Medicine
Simon Fraser University, Computer Science
Vancouver, BC, Canada
_MODELING_PROCEDURE
Descriptors:
Based on our previous experience with QSAR modeling of peptides and with general QSAR clustering and classification of bioactivity properties, we decided to use the following strategy. First, we optimized the geometry of the studied peptides using the MMFF94 force field; carboxylic groups were deprotonated, amino groups were protonated, and partial charges were computed according to [1]. Then we computed QSAR descriptors that describe the entire peptide molecule (global parameters) as well as descriptors corresponding to the constituent amino acids considered in the context of their peptide environment (we did not use the 'isolated amino acids' approximation). Thus, for all peptides in the testing and training sets of the Classification_003 problem, we initially calculated more than 400 3D and 2D QSAR parameters, which included:
- 50 global 'inductive' QSAR descriptors as described in [2].
- 10 local 'inductive' QSAR descriptors (computed toward the CA atom) for each amino acid of a given 8-mer, giving 80 additional 'inductive' QSAR descriptors.
- 260 global atomic type-specific 'inductive' QSAR descriptors (previously unpublished parameters), computed additively for the specific atomic types present in the studied peptides.
- ~90 conventional 3D and 2D global QSAR parameters implemented within the MOE modeling package [3].
All 'inductive' QSAR descriptors described above were calculated with our own SVL scripts for MOE; most of them can be freely downloaded through the SVL exchange.
Modeling Procedure:
We used our linear optimization method based on a distance measure [4] to calculate our prediction model.
Given the calibration data set, our optimization approach aims to find a weighted Minkowski distance that maximizes the separation between active and inactive peptides. The separation between active and inactive compounds is written as a linear program (LP) in which the number of descriptors used in the prediction model is limited. We limited the number of descriptors to around 50; our final model has 52 descriptors. We trained the prediction model on the whole calibration data set and assessed its quality by the accuracy of its predictions, calculated using k nearest neighbor (kNN) classification. kNN classification assigns to a peptide P the majority activity class among the k nearest neighbors of P, using the distance model determined by the LP optimization. We selected k = 3, which gave the best accuracy. We obtained an accuracy of 0.65 on the training data set. Based on these results, we applied our distance model with kNN classification (k = 3) to the external data set.
[1] Cherkasov, A. Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges. Journal of Chemical Information and Computer Sciences, 2003, 43, 2039-2047. Cherkasov, A., Z. Shi, Y. Li, S.M. Jones, M. Fallahi, G.L. Hammond. 'Inductive' Charges on Atoms in Proteins: Comparative Docking with the Extended Steroid Benchmark Set and Discovery of a Novel SHBG Ligand. Journal of Chemical Information and Modeling, 2005, 45, 1842-1853.
[2] Cherkasov, A. 'Inductive' Descriptors. 10 Successful Years in QSAR. Current Computer-Aided Drug Design, 2005, 1, 21-42.
[3] Molecular Operating Environment, 2005, by Chemical Computing Group Inc., Montreal, Canada.
[4] Karakoc E., Cherkasov A., Sahinalp S. C. Distance Based Algorithms for Small Biomolecule Classification and Structural Similarity Search. ISMB'06, 14th Annual International Conference on Intelligent Systems for Molecular Biology, Fortaleza, Brazil, 2006.
[5] SNNS: Stuttgart Neural Network Simulator; Version 4.0, University of Stuttgart, 1995.
_PREDICTION
Obj_00001 +1
Obj_00002 -1
Obj_00003 +1
Obj_00004 +1
Obj_00005 -1
Obj_00006 -1
Obj_00007 -1
Obj_00008 -1
Obj_00009 -1
Obj_00010 -1
Obj_00011 -1
Obj_00012 +1
Obj_00013 +1
Obj_00014 -1
Obj_00015 +1
Obj_00016 -1
Obj_00017 +1
Obj_00018 +1
Obj_00019 -1
Obj_00020 -1
Obj_00021 +1
Obj_00022 -1
Obj_00023 +1
Obj_00024 -1
Obj_00025 +1
Obj_00026 +1
Obj_00027 -1
Obj_00028 +1
Obj_00029 +1
Obj_00030 +1
Obj_00031 +1
Obj_00032 +1
Obj_00033 +1
Obj_00034 -1
Obj_00035 +1
Obj_00036 -1
Obj_00037 -1
Obj_00038 +1
Obj_00039 +1
Obj_00040 +1
Obj_00041 +1
Obj_00042 -1
Obj_00043 -1
Obj_00044 -1
Obj_00045 -1
Obj_00046 -1
Obj_00047 +1
Obj_00048 -1
Obj_00049 +1
Obj_00050 +1
Obj_00051 +1
Obj_00052 -1
Obj_00053 -1
Obj_00054 +1
Obj_00055 +1
Obj_00056 +1
Obj_00057 +1
Obj_00058 +1
Obj_00059 -1
Obj_00060 -1
Obj_00061 -1
Obj_00062 -1
Obj_00063 -1
Obj_00064 +1
Obj_00065 +1
Obj_00066 -1
Obj_00067 +1
Obj_00068 +1
Obj_00069 -1
Obj_00070 -1
Obj_00071 +1
Obj_00072 -1
Obj_00073 +1
Obj_00074 -1
Obj_00075 -1
Obj_00076 +1
Obj_00077 +1
Obj_00078 -1
Obj_00079 -1
Obj_00080 -1
Obj_00081 +1
Obj_00082 +1
Obj_00083 +1
Obj_00084 +1
Obj_00085 -1
Obj_00086 -1
Obj_00087 +1
Obj_00088 -1
Obj_00089 +1
Obj_00090 -1
Obj_00091 +1
Obj_00092 -1
Obj_00093 -1
Obj_00094 -1
Obj_00095 -1
Obj_00096 +1
Obj_00097 -1
Obj_00098 -1
Obj_00099 +1
Obj_00100 -1
Obj_00101 -1
Obj_00102 +1
Obj_00103 +1
Obj_00104 +1
Obj_00105 +1
Obj_00106 +1
Obj_00107 -1
Obj_00108 +1
Obj_00109 +1
Obj_00110 -1
Obj_00111 +1
Obj_00112 -1
Obj_00113 +1
Obj_00114 -1
Obj_00115 +1
Obj_00116 +1
Obj_00117 +1
Obj_00118 +1
Obj_00119 +1
Obj_00120 -1
Obj_00121 -1
Obj_00122 +1
Obj_00123 +1
Obj_00124 -1
Obj_00125 +1
Obj_00126 -1
Obj_00127 -1
Obj_00128 -1
Obj_00129 +1
Obj_00130 -1
Obj_00131 -1
Obj_00132 +1
Obj_00133 -1