_PROBLEM CoEPrA-2006_Classification_003
_GROUP_NAME Francisco Melo
_GROUP_MEMBERS Francisco Melo, J.T. Eterovic, Evandro Ferrada, Tomas Norambuena, Rodrigo Malig, Ismael Vergara
_ADDRESS fmelo@bio.puc.cl
_MODELING_PROCEDURE
We first applied feature selection using the algorithm described by Koller and Sahami in "Toward Optimal Feature Selection" (1996), separately at each amino acid position of the nonapeptide (i.e., on each of the nine matrices of 133 rows x 643 columns). We chose this algorithm because it can handle large amounts of data and because it accounts for both the redundancy among features and the relevance of each feature to the class. This step reduced the initial set of 5787 features (9 positions x 643 descriptors) to 287 selected features.

To derive the final solution we used a genetic algorithm (GA) that evolves mathematical functions, which act as linear and/or non-linear transformations of the feature space. The GA was developed in our lab and was run with the following settings:
- mutation rate: 0.1
- crossover: 1.0
- number of individuals: 500
- number of iterations: 1000
- elitism size: 2.0
- maximum depth of the generated trees: 3
- linear normalization with parameter 1.0

The GA produced the following best-scoring solution:
Transform = Desc_935 - Desc_4998 - Desc_4431*Desc_309

The accuracy obtained on the calibration data set was 0.78. The optimal threshold for separating labels +1 and -1 was -1.782. This threshold was used to classify the instances in the prediction data set.
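The final classification step described above can be sketched in Python as follows. This is only an illustrative sketch, not the group's GA code: the function names, the dictionary input keyed by descriptor number, and the example values are hypothetical. The procedure does not state which side of the threshold maps to +1, so the direction is exposed as a parameter and the default (scores above the threshold are labeled +1) is an assumption. The second term of the transform is read here as Desc_4998.

```python
from typing import Mapping

# Threshold reported in the modeling procedure.
THRESHOLD = -1.782


def transform(desc: Mapping[int, float]) -> float:
    """Evaluate Transform = Desc_935 - Desc_4998 - Desc_4431*Desc_309.

    `desc` maps a descriptor number to its value for one peptide
    (hypothetical input layout, chosen for illustration).
    """
    return desc[935] - desc[4998] - desc[4431] * desc[309]


def classify(desc: Mapping[int, float], positive_above: bool = True) -> int:
    """Return +1 or -1 by comparing the transform to THRESHOLD.

    ASSUMPTION: the source does not say which side of the threshold is +1.
    With positive_above=True, scores above the threshold are labeled +1;
    pass False to flip the direction.
    """
    above = transform(desc) > THRESHOLD
    return 1 if above == positive_above else -1


# Made-up descriptor values, for illustration only.
example = {935: 1.0, 4998: 0.5, 4431: 2.0, 309: 0.25}
score = transform(example)   # 1.0 - 0.5 - 2.0*0.25
label = classify(example)
```

Keeping the direction as an explicit parameter makes it easy to check both conventions against the calibration labels before applying the rule to the prediction set.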
_PREDICTION
Obj_00001 +1
Obj_00002 -1
Obj_00003 +1
Obj_00004 -1
Obj_00005 -1
Obj_00006 -1
Obj_00007 +1
Obj_00008 -1
Obj_00009 -1
Obj_00010 -1
Obj_00011 -1
Obj_00012 -1
Obj_00013 -1
Obj_00014 -1
Obj_00015 +1
Obj_00016 -1
Obj_00017 -1
Obj_00018 +1
Obj_00019 +1
Obj_00020 -1
Obj_00021 +1
Obj_00022 +1
Obj_00023 +1
Obj_00024 +1
Obj_00025 -1
Obj_00026 -1
Obj_00027 -1
Obj_00028 -1
Obj_00029 +1
Obj_00030 -1
Obj_00031 -1
Obj_00032 +1
Obj_00033 -1
Obj_00034 -1
Obj_00035 +1
Obj_00036 -1
Obj_00037 -1
Obj_00038 +1
Obj_00039 +1
Obj_00040 -1
Obj_00041 -1
Obj_00042 -1
Obj_00043 +1
Obj_00044 -1
Obj_00045 -1
Obj_00046 -1
Obj_00047 -1
Obj_00048 -1
Obj_00049 +1
Obj_00050 -1
Obj_00051 -1
Obj_00052 +1
Obj_00053 -1
Obj_00054 +1
Obj_00055 +1
Obj_00056 -1
Obj_00057 -1
Obj_00058 -1
Obj_00059 +1
Obj_00060 +1
Obj_00061 +1
Obj_00062 +1
Obj_00063 +1
Obj_00064 +1
Obj_00065 +1
Obj_00066 -1
Obj_00067 +1
Obj_00068 +1
Obj_00069 +1
Obj_00070 -1
Obj_00071 +1
Obj_00072 +1
Obj_00073 +1
Obj_00074 +1
Obj_00075 +1
Obj_00076 -1
Obj_00077 +1
Obj_00078 +1
Obj_00079 -1
Obj_00080 -1
Obj_00081 +1
Obj_00082 +1
Obj_00083 +1
Obj_00084 +1
Obj_00085 -1
Obj_00086 -1
Obj_00087 -1
Obj_00088 +1
Obj_00089 -1
Obj_00090 -1
Obj_00091 -1
Obj_00092 -1
Obj_00093 -1
Obj_00094 -1
Obj_00095 +1
Obj_00096 +1
Obj_00097 +1
Obj_00098 +1
Obj_00099 -1
Obj_00100 -1
Obj_00101 -1
Obj_00102 -1
Obj_00103 -1
Obj_00104 +1
Obj_00105 -1
Obj_00106 -1
Obj_00107 -1
Obj_00108 +1
Obj_00109 -1
Obj_00110 -1
Obj_00111 -1
Obj_00112 -1
Obj_00113 -1
Obj_00114 -1
Obj_00115 -1
Obj_00116 -1
Obj_00117 -1
Obj_00118 -1
Obj_00119 +1
Obj_00120 -1
Obj_00121 -1
Obj_00122 -1
Obj_00123 +1
Obj_00124 -1
Obj_00125 -1
Obj_00126 -1
Obj_00127 +1
Obj_00128 +1
Obj_00129 -1
Obj_00130 -1
Obj_00131 -1
Obj_00132 -1
Obj_00133 -1