_PROBLEM CoEPrA-2006_Classification_002
_GROUP_NAME Matt Segall
_GROUP_MEMBERS
Joelle Gola
Olga Obrezanova
Matt Segall
_ADDRESS
Inpharmatica
127 Cambridge Science Park
Milton Road, Cambridge, CB4 0GD, UK
tel. +44(0)1223 706177
e-mail: o.obrezanova@inpharmatica.co.uk
_MODELING_PROCEDURE
1. Additional descriptor.
One additional descriptor was calculated: molecular weight
(for example, see http://www.expasy.org/tools/protparam.html).

2. Descriptor pre-selection.
Descriptors with low standard deviation, low occurrence, or high
correlation with another descriptor were filtered out.
Excluded descriptors:
 - with standard deviation < 5.0E-04
 - with occurrence < 0.5%
 - with correlation coefficient >= 0.9 (only one of each pair was kept).
After filtering, 1289 descriptors remained.

3. Modelling technique.
Decision tree built with the C4.5 algorithm.
The gain ratio criterion is used to select descriptors.
Stopping rules used: the minimum number of cases on a tree leaf is 2;
the upper limit for the relative frequency of the majority class on a node is 0.85.

4. Validation.
A cross-validation procedure (5 groups) was used to obtain an estimate
of the misclassification rate on an unseen set.

5. Final model and statistics.
Calibration set: 76 cases, 1289 descriptors.

Decision tree on calibration set:
|Desc_02349 <= 0.84,
| |Desc_04786 <= -0.38, class=1 (9/0)
| |Desc_04786 > -0.38,
| | |Desc_02671 <= 1.2,
| | | |Desc_03525 <= 1,
| | | | |Desc_02792 <= -0.58, class=1 (2/0)
| | | | |Desc_02792 > -0.58,
| | | | | |Desc_01048 <= 1.11, class=-1 (44/5)
| | | | | |Desc_01048 > 1.11, class=1 (2/0)
| | | |Desc_03525 > 1, class=1 (3/0)
| | |Desc_02671 > 1.2, class=1 (4/0)
|Desc_02349 > 0.84, class=1 (12/0)

Total number of cases = 76.
Number of misclassified cases = 5.
Misclassification error = 6.57895%.

Confusion matrix:
Class -1   Class 1   <-- predicted
--------   -------
   39         0      | Class -1
    5        32      | Class 1

Kappa statistic: 0.867872.
Mutual Information: 0.70378.

Estimated misclassification rate on unseen set is 28.95%
(obtained by cross-validation).
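
The calibration-set statistics reported in step 5 can be checked from the confusion matrix alone. The sketch below is illustrative only and is not the software used for the submission: the function name `summary_stats` and the variable names are hypothetical, the kappa statistic is assumed to be Cohen's kappa, and the mutual information is assumed to be measured in bits (base-2 logarithm), since the submission does not state either.

```python
import math

# Confusion matrix from step 5: rows = actual class, columns = predicted class.
confusion = [
    [39, 0],   # actual class -1
    [5, 32],   # actual class +1
]

def summary_stats(cm):
    """Return misclassification error, Cohen's kappa and mutual information (bits)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]                               # actual-class totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]    # predicted-class totals

    # Observed agreement and agreement expected by chance.
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    # Mutual information between actual and predicted class; empty cells contribute 0.
    mi = 0.0
    for i in range(k):
        for j in range(k):
            if cm[i][j] > 0:
                p = cm[i][j] / n
                mi += p * math.log2(cm[i][j] * n / (row_tot[i] * col_tot[j]))

    return {"error": 1 - observed, "kappa": kappa, "mutual_information": mi}

stats = summary_stats(confusion)
print(f"error = {100 * stats['error']:.5f}%")
print(f"kappa = {stats['kappa']:.6f}")
print(f"mutual information = {stats['mutual_information']:.5f} bits")
```

Under these assumptions the computed values agree with the reported error, kappa and mutual information to the precision given in step 5.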
_PREDICTION
Obj_00001 +1
Obj_00002 +1
Obj_00003 -1
Obj_00004 +1
Obj_00005 +1
Obj_00006 -1
Obj_00007 +1
Obj_00008 -1
Obj_00009 +1
Obj_00010 -1
Obj_00011 -1
Obj_00012 +1
Obj_00013 -1
Obj_00014 +1
Obj_00015 +1
Obj_00016 -1
Obj_00017 +1
Obj_00018 -1
Obj_00019 -1
Obj_00020 -1
Obj_00021 -1
Obj_00022 +1
Obj_00023 +1
Obj_00024 -1
Obj_00025 -1
Obj_00026 -1
Obj_00027 +1
Obj_00028 -1
Obj_00029 -1
Obj_00030 -1
Obj_00031 -1
Obj_00032 +1
Obj_00033 +1
Obj_00034 -1
Obj_00035 -1
Obj_00036 +1
Obj_00037 -1
Obj_00038 -1
Obj_00039 +1
Obj_00040 -1
Obj_00041 -1
Obj_00042 -1
Obj_00043 -1
Obj_00044 -1
Obj_00045 +1
Obj_00046 -1
Obj_00047 -1
Obj_00048 +1
Obj_00049 +1
Obj_00050 -1
Obj_00051 +1
Obj_00052 +1
Obj_00053 +1
Obj_00054 +1
Obj_00055 +1
Obj_00056 +1
Obj_00057 -1
Obj_00058 +1
Obj_00059 +1
Obj_00060 +1
Obj_00061 -1
Obj_00062 +1
Obj_00063 +1
Obj_00064 +1
Obj_00065 -1
Obj_00066 -1
Obj_00067 -1
Obj_00068 +1
Obj_00069 -1
Obj_00070 +1
Obj_00071 +1
Obj_00072 +1
Obj_00073 -1
Obj_00074 +1
Obj_00075 +1
Obj_00076 -1