_PROBLEM CoEPrA-2006_Regression_002
_GROUP_NAME Curt Breneman
_GROUP_MEMBERS Kristin Bennett, Charles Bergeron, Curt Breneman, Theresa Hepburn, Michael Krein, Min Li, Steven Mulick, Sukumar Nagamani, Matthew Sundling
_ADDRESS Rensselaer Exploratory Centre for Cheminformatics Research, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York 12180-3522, United States of America
_MODELING_PROCEDURE
For each octapeptide, 5144 descriptors are provided. Additionally, 147 RECON (TAE/RECON electron-density derived) descriptors were generated. Each feature was centred to zero median and scaled to unit absolute deviation (the sum of absolute differences between the descriptor values and the median).
Lossless data compression was accomplished by separately considering each amino acid's descriptors and the RECON descriptors. Principal components analysis on each descriptor subset was used to find the components of nonzero variance. This resulted in 179 components.
Data reduction was accomplished by considering the covariance matrix C and an estimate S of how the covariance differs between the two sets. Minimum noise fractions, a generalized principal components analysis, was used: the generalized singular value decomposition of C with respect to S found "noise"-insensitive features. The signal-to-noise ratio (SNR) is given by the singular values minus one. With an SNR cutoff of 10, 34 features remained.
Kernel partial least squares was used for modeling. Model selection retained 12 latent variables and a Gaussian kernel function with sigma=159.4843. The resulting q^2 was 0.5799 for leave-one-out cross-validation across the calibration set.
Consider the model f=KB+b, where f is the prediction reported below, K is the kernel matrix, B is the vector of regression coefficients and b is the bias constant.
Then B and b are as follows:
B =
-1.6374 -9.5877 -2.3194 -3.141
-21.665 -0.24613 -9.3587 -0.85192
-0.74675 -0.61558 7.6128 1.8263
-3.5451 -2.8338 -1.6122 -0.4111
-2.6264 -0.41583 -0.37979 6.6286
1.3382 18.111 0.52781 -1.3566
-9.2637 -2.823 -1.2057 0.85469
1.0383 0.24309 0.37763 2.912
-0.1232 1.518 1.6294 0.67124
2.7636 2.3501 -3.49 1.2212
2.2789 0.32194 -0.16696 0.042224
-0.086358 -0.39148 -1.1129 0.62137
0.36215 -3.2419 0.40171 0.13039
1.2837 -3.2407 0.92136 1.4999
-0.57394 1.6376 -1.8886 0.20103
1.4547 6.1535 -3.0457 -0.14666
1.5286 0.86124 0.86867 -3.46
4.2878 3.5212 5.6133 2.4685
0.99671 0.92356 1.0517 0.87537
b = 7.5509
_PREDICTION
Obj_00001 7.879
Obj_00002 7.243
Obj_00003 8.085
Obj_00004 7.815
Obj_00005 7.633
Obj_00006 8.395
Obj_00007 7.926
Obj_00008 7.730
Obj_00009 8.098
Obj_00010 6.441
Obj_00011 8.259
Obj_00012 6.823
Obj_00013 7.680
Obj_00014 7.145
Obj_00015 7.608
Obj_00016 7.923
Obj_00017 8.270
Obj_00018 7.466
Obj_00019 7.503
Obj_00020 7.691
Obj_00021 7.725
Obj_00022 7.883
Obj_00023 7.900
Obj_00024 6.300
Obj_00025 7.612
Obj_00026 8.123
Obj_00027 7.857
Obj_00028 8.109
Obj_00029 7.335
Obj_00030 8.053
Obj_00031 7.211
Obj_00032 7.663
Obj_00033 7.912
Obj_00034 7.806
Obj_00035 6.908
Obj_00036 7.901
Obj_00037 8.020
Obj_00038 7.656
Obj_00039 8.154
Obj_00040 5.643
Obj_00041 8.176
Obj_00042 7.794
Obj_00043 8.235
Obj_00044 7.097
Obj_00045 7.739
Obj_00046 7.919
Obj_00047 7.548
Obj_00048 8.069
Obj_00049 8.036
Obj_00050 5.828
Obj_00051 7.234
Obj_00052 6.815
Obj_00053 5.641
Obj_00054 8.462
Obj_00055 7.705
Obj_00056 5.188
Obj_00057 7.915
Obj_00058 7.577
Obj_00059 7.478
Obj_00060 7.908
Obj_00061 7.922
Obj_00062 7.627
Obj_00063 8.211
Obj_00064 7.795
Obj_00065 7.579
Obj_00066 7.873
Obj_00067 7.592
Obj_00068 7.572
Obj_00069 8.008
Obj_00070 7.954
Obj_00071 8.054
Obj_00072 7.677
Obj_00073 8.233
Obj_00074 8.140
Obj_00075 7.134
Obj_00076 7.649
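
As an illustration of the model f=KB+b from the modeling procedure, the following minimal sketch shows how one prediction could be evaluated from a row of the kernel matrix. Several things here are assumptions rather than part of the submission: the Gaussian kernel is taken to be exp(-||x-y||^2 / (2 sigma^2)) (the exact parameterization is not stated above), the kernel row is used without any centring step, and the function names gaussian_kernel and predict as well as the toy feature vectors and coefficients are hypothetical. Only sigma=159.4843 and b=7.5509 are taken from the text.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel between two feature vectors.

    Assumed form: exp(-||x - y||^2 / (2 * sigma^2)).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def predict(x_new, x_train, coeffs, bias, sigma):
    """Evaluate f = K B + b for a single new sample.

    K is the row of kernel values between x_new and every
    calibration sample; coeffs plays the role of B.
    """
    k_row = [gaussian_kernel(x_new, x_i, sigma) for x_i in x_train]
    return sum(k * c for k, c in zip(k_row, coeffs)) + bias


# Toy calibration data (hypothetical, two features per sample).
x_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
coeffs = [0.5, -1.0, 2.0]   # hypothetical stand-in for B
bias = 7.5509               # b as reported above
sigma = 159.4843            # kernel width as reported above

f_new = predict([0.5, 0.5], x_train, coeffs, bias, sigma)
print(f_new)
```

In the actual model, the inputs would be the 34 retained features for each octapeptide and B would hold one coefficient per calibration sample.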