_PROBLEM CoEPrA-2006_Classification_002
_GROUP_NAME Curt Breneman
_GROUP_MEMBERS Kristin Bennett, Charles Bergeron, Curt Breneman, Theresa Hepburn, Michael Krein, Min Li, Steven Mulick, Sukumar Nagamani, Matthew Sundling
_ADDRESS Rensselaer Exploratory Centre for Cheminformatics Research, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York 12180-3522, United States of America
_MODELING_PROCEDURE
For each octapeptide, 5144 descriptors are provided. Additionally, 147 RECON (TAE/RECON electron-density derived) descriptors were generated. Each feature was centred to zero median and scaled to unit absolute deviation (the sum of the absolute differences between the descriptor values and the median).

Lossless data compression was accomplished by separately considering each amino acid's descriptors and the RECON descriptors. Principal components analysis on each descriptor subset was used to find the components of nonzero variance, which resulted in 179 components.

Data reduction was accomplished by considering the covariance matrix C and an estimate S of how the covariance differs between the two sets. Minimum noise fractions, a generalized principal components analysis, was used: the generalized singular value decomposition of C with respect to S found "noise"-insensitive features. The SNR is given by the singular values minus one; with an SNR cutoff of 10, 34 features remained.

Kernel partial least squares was used for modeling. Model selection retained 12 latent variables and a Gaussian kernel function with sigma = 159.4843. A cutoff of 7.7810 was used to discriminate between classes. The resulting classification accuracy was 76.31% for leave-one-out cross-validation across the calibration set.

Consider the model f = KB + b, where f is the prediction reported below, K is the kernel matrix, B is the vector of regression coefficients, and b is the bias constant.
Then B and b are as follows:
B =
-1.6374 -9.5877 -2.3194 -3.141 -21.665 -0.24613 -9.3587 -0.85192
-0.74675 -0.61558 7.6128 1.8263 -3.5451 -2.8338 -1.6122 -0.4111
-2.6264 -0.41583 -0.37979 6.6286 1.3382 18.111 0.52781 -1.3566
-9.2637 -2.823 -1.2057 0.85469 1.0383 0.24309 0.37763 2.912
-0.1232 1.518 1.6294 0.67124 2.7636 2.3501 -3.49 1.2212
2.2789 0.32194 -0.16696 0.042224 -0.086358 -0.39148 -1.1129 0.62137
0.36215 -3.2419 0.40171 0.13039 1.2837 -3.2407 0.92136 1.4999
-0.57394 1.6376 -1.8886 0.20103 1.4547 6.1535 -3.0457 -0.14666
1.5286 0.86124 0.86867 -3.46 4.2878 3.5212 5.6133 2.4685
0.99671 0.92356 1.0517 0.87537
b = 7.5509
_PREDICTION
Obj_00001 +1
Obj_00002 -1
Obj_00003 +1
Obj_00004 +1
Obj_00005 -1
Obj_00006 +1
Obj_00007 +1
Obj_00008 -1
Obj_00009 +1
Obj_00010 -1
Obj_00011 +1
Obj_00012 -1
Obj_00013 -1
Obj_00014 -1
Obj_00015 -1
Obj_00016 +1
Obj_00017 +1
Obj_00018 -1
Obj_00019 -1
Obj_00020 -1
Obj_00021 -1
Obj_00022 +1
Obj_00023 +1
Obj_00024 -1
Obj_00025 -1
Obj_00026 +1
Obj_00027 +1
Obj_00028 +1
Obj_00029 -1
Obj_00030 +1
Obj_00031 -1
Obj_00032 -1
Obj_00033 +1
Obj_00034 +1
Obj_00035 -1
Obj_00036 +1
Obj_00037 +1
Obj_00038 -1
Obj_00039 +1
Obj_00040 -1
Obj_00041 +1
Obj_00042 +1
Obj_00043 +1
Obj_00044 -1
Obj_00045 -1
Obj_00046 +1
Obj_00047 -1
Obj_00048 +1
Obj_00049 +1
Obj_00050 -1
Obj_00051 -1
Obj_00052 -1
Obj_00053 -1
Obj_00054 +1
Obj_00055 -1
Obj_00056 -1
Obj_00057 +1
Obj_00058 -1
Obj_00059 -1
Obj_00060 +1
Obj_00061 +1
Obj_00062 -1
Obj_00063 +1
Obj_00064 +1
Obj_00065 -1
Obj_00066 +1
Obj_00067 -1
Obj_00068 -1
Obj_00069 +1
Obj_00070 +1
Obj_00071 +1
Obj_00072 -1
Obj_00073 +1
Obj_00074 +1
Obj_00075 -1
Obj_00076 -1
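
As a rough illustration of the modeling procedure above, the following Python sketch shows how a prediction f = KB + b could be evaluated and thresholded. It is not the group's code. Several details are assumptions that the procedure does not state: the Gaussian kernel is taken to be exp(-||x - y||^2 / (2 sigma^2)), values of f above the cutoff are mapped to +1, B is taken to hold one coefficient per calibration object, and the kernel centring that kernel PLS implementations commonly apply is omitted. The function names are hypothetical.

```python
import math
from statistics import median

SIGMA = 159.4843   # Gaussian kernel width reported in the procedure
CUTOFF = 7.7810    # class cutoff reported in the procedure
BIAS = 7.5509      # bias constant b reported with the coefficients


def robust_scale(column):
    """Shift a descriptor column to zero median and unit absolute deviation."""
    med = median(column)
    # Absolute deviation as defined in the text: sum of |x - median|.
    abs_dev = sum(abs(x - med) for x in column)
    if abs_dev == 0.0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - med) / abs_dev for x in column]


def gaussian_kernel(x, y, sigma=SIGMA):
    """Assumed kernel form: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def predict(test_rows, train_rows, coeffs, bias=BIAS, sigma=SIGMA):
    """Evaluate f = K B + b, where K[i][j] = k(test_i, train_j)."""
    return [
        sum(gaussian_kernel(t, r, sigma) * c
            for r, c in zip(train_rows, coeffs)) + bias
        for t in test_rows
    ]


def classify(f_values, cutoff=CUTOFF):
    """Assumed convention: f above the cutoff maps to +1, otherwise -1."""
    return [1 if f > cutoff else -1 for f in f_values]
```

Here `train_rows` would hold the 34 reduced features of each calibration object and `coeffs` the vector B listed above; `classify(predict(test_rows, train_rows, coeffs))` would then yield labels in the +1/-1 form used in the prediction list, under the stated assumptions.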