_PROBLEM CoEPrA-2006_Regression_002
_GROUP_NAME Artem Cherkasov
_GROUP_MEMBERS Emre Karakoc, Cenk Sahinalp, Artem Cherkasov
_ADDRESS University of British Columbia, Medicine; Simon Fraser University, Computer Science; Vancouver, BC, Canada
_MODELING_PROCEDURE
Descriptors: Based on our previous experience with QSAR modeling of peptides and with general QSAR clustering and classification of bioactivity properties, we adopted the following strategy. First, we optimized the geometry of the studied peptides using the MMFF94 force field; carboxylic groups were deprotonated, amino groups were protonated, and partial charges were computed according to [1]. We then computed QSAR descriptors that describe the entire peptide molecule (global parameters), as well as descriptors for the constituent amino acids considered in the context of their peptide environment (we did not use the 'isolated amino acids' approximation). In total, for all peptides in the training and test sets of the Regression_002 problem, we initially calculated more than 400 3D and 2D QSAR parameters, including:
- 50 global 'inductive' QSAR descriptors, as described in [2];
- 10 local 'inductive' QSAR descriptors (computed toward the CA atom) for each amino acid of a given 8-mer, giving 80 additional 'inductive' QSAR descriptors;
- 260 global atom-type-specific 'inductive' QSAR descriptors (previously unpublished parameters), computed additively over the specific atom types present in the studied peptides;
- ~90 conventional 3D and 2D global QSAR parameters implemented in the MOE modeling package [3].
All 'inductive' QSAR descriptors listed above were calculated with our own SVL scripts for MOE; most of them can be freely downloaded from the SVL exchange.
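To make the descriptor bookkeeping above concrete (10 local descriptors per residue times 8 residues gives 80, on top of the global blocks), here is a minimal Python sketch that concatenates the blocks for one 8-mer and checks their sizes. It only illustrates the arithmetic: the actual values come from the authors' SVL scripts in MOE, the helper names and block order are hypothetical, and the "~90" MOE block is assumed to be exactly 90.

```python
from typing import List, Sequence

# Block sizes quoted in the descriptor list. The MOE block is given as "~90"
# in the text; 90 is an assumed exact value for illustration only.
N_GLOBAL_INDUCTIVE = 50
N_LOCAL_PER_RESIDUE = 10
PEPTIDE_LENGTH = 8
N_ATOM_TYPE_INDUCTIVE = 260
N_MOE_CONVENTIONAL = 90  # assumption: "~90" taken as exactly 90


def _check_length(name: str, block: Sequence, expected: int) -> None:
    """Raise if a descriptor block does not have the expected size."""
    if len(block) != expected:
        raise ValueError(f"{name}: expected {expected} values, got {len(block)}")


def assemble_descriptor_vector(
    global_inductive: Sequence[float],
    local_per_residue: Sequence[Sequence[float]],
    atom_type_inductive: Sequence[float],
    moe_conventional: Sequence[float],
) -> List[float]:
    """Concatenate the global and per-residue blocks for one 8-mer.

    Hypothetical helper: the block order is illustrative and is not the
    column layout produced by the authors' SVL scripts.
    """
    _check_length("global inductive", global_inductive, N_GLOBAL_INDUCTIVE)
    _check_length("residues", local_per_residue, PEPTIDE_LENGTH)
    for i, residue_block in enumerate(local_per_residue, start=1):
        _check_length(f"residue {i} local inductive", residue_block, N_LOCAL_PER_RESIDUE)
    _check_length("atom-type inductive", atom_type_inductive, N_ATOM_TYPE_INDUCTIVE)
    _check_length("MOE conventional", moe_conventional, N_MOE_CONVENTIONAL)

    vector: List[float] = list(global_inductive)
    for residue_block in local_per_residue:  # residues kept in sequence order
        vector.extend(residue_block)         # 8 x 10 = 80 local descriptors
    vector.extend(atom_type_inductive)
    vector.extend(moe_conventional)
    return vector


if __name__ == "__main__":
    zeros = assemble_descriptor_vector(
        [0.0] * 50, [[0.0] * 10 for _ in range(8)], [0.0] * 260, [0.0] * 90
    )
    print(len(zeros))  # 50 + 80 + 260 + 90 = 480
```

Keeping the per-residue blocks in sequence order means column k of the local section always refers to the same residue position, which is what lets a linear model assign position-specific weights.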
Descriptors Selection: As a first step, we used the distance-based kNN linear optimization method [4] for the initial selection of the most relevant QSAR parameters. As a result, 31 global and local 'inductive' and conventional 2D descriptors were selected for QSAR modeling of the Regression_002 problem (their values can be found in the attached Excel spreadsheet).
Modeling Procedure: To build a predictive QSAR model on the set of training peptides, we used the PLS approach as implemented in MOE [3]. The quality of the PLS-based model was assessed by Leave-One-Out (LOO) cross-validation.
[1] Cherkasov, A. Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges. Journal of Chemical Information and Computer Sciences, 2003, 43, 2039-2047.
    Cherkasov, A.; Shi, Z.; Li, Y.; Jones, S.M.; Fallahi, M.; Hammond, G.L. 'Inductive' Charges on Atoms in Proteins: Comparative Docking with the Extended Steroid Benchmark Set and Discovery of a Novel SHBG Ligand. Journal of Chemical Information and Modeling, 2005, 45, 1842-1853.
[2] Cherkasov, A. 'Inductive' Descriptors. 10 Successful Years in QSAR. Current Computer-Aided Drug Design, 2005, 1, 21-42.
[3] Molecular Operating Environment (MOE), 2005, Chemical Computing Group Inc., Montreal, Canada.
[4] Karakoc, E.; Cherkasov, A.; Sahinalp, S.C. Distance Based Algorithms for Small Biomolecule Classification and Structural Similarity Search. ISMB'06, 14th Annual International Conference on Intelligent Systems for Molecular Biology, Fortaleza, Brazil, 2006.
[5] SNNS: Stuttgart Neural Network Simulator, Version 4.0, University of Stuttgart, 1995.
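The Modeling Procedure fits a PLS model in MOE and checks it with LOO cross-validation. As a rough, non-authoritative sketch of what that involves, the Python/NumPy code below implements single-response PLS (PLS1) with the classic NIPALS algorithm and computes the LOO statistic q² = 1 - PRESS / SS_tot. MOE's own PLS implementation and settings may differ; the function names, the number of latent components, and the synthetic data in the demo are assumptions, not values from this submission.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS with NIPALS.

    Returns (x_mean, y_mean, coef) so that y_hat = y_mean + (X - x_mean) @ coef.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # residual blocks, deflated per component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                # weight vector: covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:             # nothing left to explain in y
            break
        w /= norm
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - q * t              # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P = np.column_stack(W), np.column_stack(P)
    coef = W @ np.linalg.solve(P.T @ W, np.array(Q))  # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Predict responses for the rows of X with a fitted PLS1 model."""
    x_mean, y_mean, coef = model
    return y_mean + (np.asarray(X, dtype=float) - x_mean) @ coef


def loo_q2(X, y, n_components):
    """Leave-one-out q^2: refit without each sample, then predict it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        pred = pls1_predict(model, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    # Synthetic stand-in for a 31-descriptor table (not the CoEPrA data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 31))
    y = X[:, :3] @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=60)
    for a in (1, 2, 3, 5):
        print(a, round(loo_q2(X, y, a), 3))
```

One common way to pick the number of components, assumed here rather than stated in the submission, is to take the smallest count beyond which LOO q² stops improving.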
_PREDICTION
Obj_00001 7.978
Obj_00002 9.235
Obj_00003 6.222
Obj_00004 7.165
Obj_00005 7.523
Obj_00006 8.497
Obj_00007 7.104
Obj_00008 7.669
Obj_00009 8.983
Obj_00010 7.626
Obj_00011 6.495
Obj_00012 7.635
Obj_00013 8.130
Obj_00014 7.408
Obj_00015 14.155
Obj_00016 7.290
Obj_00017 7.530
Obj_00018 7.194
Obj_00019 7.714
Obj_00020 7.126
Obj_00021 8.056
Obj_00022 6.464
Obj_00023 7.511
Obj_00024 7.752
Obj_00025 6.817
Obj_00026 6.908
Obj_00027 7.399
Obj_00028 8.564
Obj_00029 7.506
Obj_00030 8.435
Obj_00031 10.145
Obj_00032 6.713
Obj_00033 7.373
Obj_00034 6.631
Obj_00035 6.556
Obj_00036 9.275
Obj_00037 6.398
Obj_00038 7.899
Obj_00039 7.733
Obj_00040 6.451
Obj_00041 6.803
Obj_00042 8.545
Obj_00043 7.623
Obj_00044 7.548
Obj_00045 7.104
Obj_00046 6.920
Obj_00047 8.273
Obj_00048 9.318
Obj_00049 7.823
Obj_00050 7.337
Obj_00051 7.868
Obj_00052 10.184
Obj_00053 7.661
Obj_00054 7.306
Obj_00055 7.196
Obj_00056 8.389
Obj_00057 7.394
Obj_00058 7.368
Obj_00059 10.002
Obj_00060 8.917
Obj_00061 7.585
Obj_00062 7.329
Obj_00063 8.584
Obj_00064 8.827
Obj_00065 7.819
Obj_00066 7.878
Obj_00067 9.079
Obj_00068 7.735
Obj_00069 7.413
Obj_00070 7.399
Obj_00071 7.628
Obj_00072 7.937
Obj_00073 7.902
Obj_00074 7.383
Obj_00075 7.634
Obj_00076 6.894