_PROBLEM CoEPrA-2006_Regression_003
_GROUP_NAME Levon Budagyan
_GROUP_MEMBERS Levon Budagyan
_ADDRESS levon@molsoft.com; Levon Budagyan 3366 N. Torrey Pines Ct., La Jolla, CA, 92037, USA; Molsoft (www.molsoft.com)
_MODELING_PROCEDURE We used a combination of gapped pair counts and amino acid composition bit strings. The gapped pair count vector of a sequence for a gap length l >= 0 is a vector whose coordinates are indexed by ordered pairs of alphabet symbols, i.e. it has 26 x 26 coordinates, one for each ordered pair of letters. For each pair, the corresponding component holds the number of occurrences of that ordered pair in the sequence with exactly l symbols between the two letters. E.g. for gap length l=2 and the pair (A,A), the component holds the number of A**A subsequences in the sequence, where * stands for any symbol. Descriptor vectors were built by concatenating the pair count vectors for gap lengths from 0 up to a maximum gap length l0 (we used l0=2). In total, each sequence descriptor vector had m = 26^2 (l0+1) components, i.e. 2028 for l0=2. In addition, each amino acid in the peptide was encoded with a simple bit pattern: 'A'->10000.., 'B'->01000.., 'Z'->..0001, and this bit vector was added to the gapped pair count descriptor. PLS regression was then applied to the resulting descriptor set. All data transformations and analysis were performed using the ICM Pro 3.4 program (http://www.molsoft.com/icm_pro.html).
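The descriptor construction described above can be sketched in Python as follows. This is only an illustrative sketch, not the ICM Pro script that was actually used: the function names are hypothetical, the bit strings are assumed to be appended after the pair counts (one 26-bit block per peptide position), and the PLS regression step is not shown.

```python
from string import ascii_uppercase

ALPHABET = ascii_uppercase                      # 26 symbols, 'A'..'Z'
INDEX = {c: i for i, c in enumerate(ALPHABET)}  # symbol -> coordinate index
N = len(ALPHABET)


def gapped_pair_counts(seq, gap):
    """Count ordered symbol pairs separated by exactly `gap` symbols.

    Returns a flat list of N*N counts; the pair (a, b) is stored at
    position INDEX[a] * N + INDEX[b].
    """
    counts = [0] * (N * N)
    step = gap + 1                              # distance between the two letters
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        counts[INDEX[a] * N + INDEX[b]] += 1
    return counts


def position_bits(seq):
    """Encode each residue as a 26-bit pattern ('A' -> 1000.., 'Z' -> ..0001)."""
    bits = []
    for c in seq:
        row = [0] * N
        row[INDEX[c]] = 1
        bits.extend(row)
    return bits


def descriptor(seq, max_gap=2):
    """Concatenate pair counts for gaps 0..max_gap, then the bit strings.

    Assumption: "added" in the procedure text is read as "appended".
    """
    vec = []
    for gap in range(max_gap + 1):
        vec.extend(gapped_pair_counts(seq, gap))
    vec.extend(position_bits(seq))
    return vec
```

For a hypothetical 4-residue input such as "AXYA", the gap-2 count for the pair (A,A) is 1, and the full descriptor has 26^2 * 3 pair count components followed by 26 * 4 bit components.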
_PREDICTION
Obj_00001 7.536
Obj_00002 7.538
Obj_00003 6.926
Obj_00004 6.929
Obj_00005 7.471
Obj_00006 6.907
Obj_00007 6.547
Obj_00008 7.413
Obj_00009 7.363
Obj_00010 6.614
Obj_00011 5.870
Obj_00012 7.650
Obj_00013 7.775
Obj_00014 6.942
Obj_00015 6.593
Obj_00016 7.034
Obj_00017 7.432
Obj_00018 6.940
Obj_00019 7.407
Obj_00020 8.212
Obj_00021 6.389
Obj_00022 7.693
Obj_00023 7.166
Obj_00024 6.808
Obj_00025 7.952
Obj_00026 7.523
Obj_00027 6.500
Obj_00028 6.069
Obj_00029 6.997
Obj_00030 5.495
Obj_00031 8.151
Obj_00032 8.120
Obj_00033 6.749
Obj_00034 6.368
Obj_00035 6.763
Obj_00036 6.652
Obj_00037 6.684
Obj_00038 8.203
Obj_00039 7.348
Obj_00040 6.443
Obj_00041 7.944
Obj_00042 7.383
Obj_00043 7.280
Obj_00044 6.922
Obj_00045 7.830
Obj_00046 6.857
Obj_00047 6.339
Obj_00048 7.314
Obj_00049 7.524
Obj_00050 6.902
Obj_00051 6.467
Obj_00052 6.609
Obj_00053 6.749
Obj_00054 6.036
Obj_00055 5.939
Obj_00056 6.218
Obj_00057 7.030
Obj_00058 6.797
Obj_00059 7.039
Obj_00060 7.327
Obj_00061 7.120
Obj_00062 8.529
Obj_00063 7.083
Obj_00064 6.479
Obj_00065 8.357
Obj_00066 6.350
Obj_00067 6.118
Obj_00068 7.670
Obj_00069 6.888
Obj_00070 7.297
Obj_00071 8.454
Obj_00072 6.317
Obj_00073 7.435
Obj_00074 7.760
Obj_00075 7.537
Obj_00076 7.013
Obj_00077 6.859
Obj_00078 6.754
Obj_00079 7.539
Obj_00080 7.576
Obj_00081 7.253
Obj_00082 7.267
Obj_00083 6.808
Obj_00084 7.991
Obj_00085 6.706
Obj_00086 7.240
Obj_00087 8.021
Obj_00088 6.320
Obj_00089 5.315
Obj_00090 6.222
Obj_00091 7.356
Obj_00092 7.342
Obj_00093 7.379
Obj_00094 6.988
Obj_00095 6.482
Obj_00096 7.201
Obj_00097 7.338
Obj_00098 6.545
Obj_00099 7.611
Obj_00100 6.930
Obj_00101 8.014
Obj_00102 7.306
Obj_00103 6.630
Obj_00104 8.518
Obj_00105 6.173
Obj_00106 6.661
Obj_00107 7.031
Obj_00108 7.381
Obj_00109 7.362
Obj_00110 5.990
Obj_00111 6.956
Obj_00112 6.596
Obj_00113 7.481
Obj_00114 7.447
Obj_00115 7.566
Obj_00116 7.405
Obj_00117 7.000
Obj_00118 7.502
Obj_00119 7.504
Obj_00120 6.222
Obj_00121 6.650
Obj_00122 6.826
Obj_00123 7.120
Obj_00124 7.817
Obj_00125 7.412
Obj_00126 7.295
Obj_00127 7.741
Obj_00128 7.209
Obj_00129 6.244
Obj_00130 6.730
Obj_00131 8.159
Obj_00132 7.084
Obj_00133 6.865