_PROBLEM CoEPrA-2006_Regression_003_Dataset_2
_GROUP_NAME Curt Breneman
_GROUP_MEMBERS Kristin Bennett, Charles Bergeron, Curt Breneman, Theresa Hepburn, Michael Krein, Min Li, Steven Mulick, Sukumar Nagamani, Matthew Sundling
_ADDRESS Rensselaer Exploratory Centre for Cheminformatics Research, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York 12180-3522, United States of America
_MODELING_PROCEDURE For each nonapeptide, 5787 descriptors are provided. Additionally, 26 SIMIL descriptors were generated; these describe similarities in residues based on the following classes: tiny, small, positive, negative, polar, non-polar, aliphatic, and aromatic. Each feature was centred to zero median and scaled to unit absolute deviation. Lossless data compression was accomplished by separately considering each amino acid's descriptors and the SIMIL descriptors. Principal components analysis on each descriptor subset was used to find the components of nonzero variance. This resulted in 185 components. Data reduction was accomplished by considering the covariance matrix C and an estimate S of how the covariance differs between the calibration and first prediction sets. Minimum noise fractions, a generalized principal components analysis, was used. The generalized singular value decomposition of C with respect to S found "noise"-insensitive features. The signal-to-noise ratio (SNR) is given by the singular values minus one; an SNR cutoff of 10 was chosen, which left 148 features. Kernel partial least squares was used for modeling. Model selection retained 4 latent variables and an exponential kernel function with sigma=1000. The resulting q^2 was 0.3822 for leave-one-out cross-validation across the calibration set. Consider the model f=KB+b, where f is the prediction reported below, K is the kernel matrix, B is the vector of regression coefficients and b is the bias constant.
Then B and b are as follows: B = -15.6813607570176 -9.42790445676816 -8.66189297018583 -9.93154009239204 -6.40969780097585 -7.28191377167004 -3.75552876225701 -8.58276892828209 -5.17496020098558 -1.56948884161684 -6.28822686454907 -2.61507472154591 -5.86425128380494 -8.85129800235377 -4.23593559781983 -6.73486221710064 -4.12845693396226 -4.8194730359042 -5.2190443738612 -7.66522564355446 -1.96850266183281 -4.91025334670188 -6.70842864094941 1.58143300972474 -5.99299472281593 -3.88120885030259 -1.62241538791443 -3.29961738383301 -8.68932734738482 0.407539209229032 -2.76064139022652 1.93847011340118 -2.52487260758584 -4.17288516466459 -7.19374006130904 -4.17147244232503 -1.06938072898093 -3.21137973148393 -3.14378288204505 -4.69359064892285 -5.63090241021722 -0.298031213500826 -3.80193646855283 1.22816239547553 -4.60465702566834 -0.162668899258707 -2.87221837285775 -3.26583091959754 -2.47820312028539 -1.01764339366236 -2.01809385116099 -3.35777845113554 -2.68529382778804 -1.48095915337912 -0.392125602921693 3.52999758995796 -1.27668119748472 2.20356231148548 -2.69012679767703 -3.0826510459201 -3.03777263443175 -3.81522473917964 -2.95489517678838 -1.46600801890124 4.23378012589996 3.39061920542751 4.62093170489756 0.76101485438306 -1.06831773062057 1.31720623631879 2.01639210966206 1.02250476220519 1.21190200095021 4.25164116986223 -1.75698498591607 -3.56101506124742 -0.300836805757353 -5.38660205732338 -0.909981125152722 2.67403652104354 -1.82030710796276 -2.77774427223599 -0.166801220375569 -1.76625365895771 1.43441007509574 2.17295711964932 4.05118479734694 1.30004633492913 4.87054076815175 5.27700583076627 5.16595184889003 3.7505606252977 8.3393448185694 1.19683000217075 4.42678419198064 4.73584606941331 2.12149559966008 0.866464687938236 5.28262540197609 3.80298695780498 5.32102035288612 1.78314597660842 1.46565518066376 4.70673652849943 5.54360539976645 -1.68478494784154 3.48587352143366 2.07288443265245 10.2015808845345 10.6240230165646 6.12277922727913 
0.281166892358236 1.42039254800417 9.32821272773033 5.03589769607138 4.38923693760364 5.63546958374979 4.64030074553935 8.7464468188418 0.750797044768218 3.61777342217929 7.62801404890132 2.92572724661179 5.57165922118489 9.23885681544552 6.83391230252171 10.0729348780033 6.57990087263573 6.03811890294356 5.48880105355749 7.54168709434206 9.47167115652201 7.57439281512461
b = 7.08152631578948
_PREDICTION
Obj_00001 7.183
Obj_00002 7.404
Obj_00003 6.710
Obj_00004 7.068
Obj_00005 7.518
Obj_00006 7.954
Obj_00007 6.706
Obj_00008 6.880
Obj_00009 7.395
Obj_00010 8.034
Obj_00011 6.594
Obj_00012 6.821
Obj_00013 6.546
Obj_00014 6.689
Obj_00015 6.451
Obj_00016 7.030
Obj_00017 6.335
Obj_00018 6.546
Obj_00019 6.572
Obj_00020 6.581
Obj_00021 6.999
Obj_00022 6.611
Obj_00023 6.611
Obj_00024 6.951
Obj_00025 6.436
Obj_00026 6.487
Obj_00027 7.288
Obj_00028 6.556
Obj_00029 6.611
Obj_00030 7.100
Obj_00031 6.738
Obj_00032 7.371
Obj_00033 7.285
Obj_00034 7.334
Obj_00035 6.743
Obj_00036 6.774
Obj_00037 7.096
Obj_00038 6.859
Obj_00039 7.511
Obj_00040 5.505
Obj_00041 5.705
Obj_00042 7.147
Obj_00043 7.321
Obj_00044 6.389
Obj_00045 7.061
Obj_00046 6.799
Obj_00047 7.094
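
The feature scaling and the prediction rule f=KB+b described in the modeling procedure can be illustrated with a minimal Python sketch. It is not the group's actual code. Two points are assumptions, because the procedure does not spell them out: "unit absolute deviation" is read here as the mean absolute deviation about the median, and the "exponential kernel" is read as exp(-||x - y|| / sigma). The function names robust_scale, exponential_kernel and predict are hypothetical.

```python
import math
from statistics import median


def robust_scale(column):
    """Centre one feature to zero median and scale it to unit absolute deviation.

    Assumption: "absolute deviation" means the mean absolute deviation
    about the median.
    """
    med = median(column)
    centred = [x - med for x in column]
    scale = sum(abs(c) for c in centred) / len(centred)
    # A constant feature has zero deviation; return it centred but unscaled.
    return centred if scale == 0 else [c / scale for c in centred]


def exponential_kernel(x, y, sigma):
    """Assumed kernel form: exp(-||x - y|| / sigma), Euclidean norm."""
    return math.exp(-math.dist(x, y) / sigma)


def predict(x_new, x_cal, B, b, sigma):
    """Evaluate f = K B + b for a single new sample.

    x_cal holds the calibration samples (one feature vector each),
    B holds one regression coefficient per calibration sample,
    and b is the bias constant.
    """
    k_row = [exponential_kernel(x_new, xc, sigma) for xc in x_cal]
    return sum(k * coef for k, coef in zip(k_row, B)) + b
```

With this reading, each reported prediction would be obtained by calling predict with the 148 reduced features of the object, the calibration feature vectors, the coefficients B and bias b listed above, and sigma=1000.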