_PROBLEM CoEPrA-2006_Classification_002
_GROUP_NAME Levon Budagyan
_GROUP_MEMBERS Levon Budagyan
_ADDRESS levon@molsoft.com
_MODELING_PROCEDURE
We used gapped pair counts as descriptors and a support vector machine (SVM) classifier as the prediction method. For a gap length l >= 0, the gapped pair count vector of a sequence has one coordinate for each ordered pair of alphabet symbols, i.e. 26 x 26 = 676 coordinates. Each coordinate holds the number of occurrences of that ordered pair in the sequence with exactly l symbols between its two letters. For example, for gap length l = 2 and the pair (A,A), the corresponding component is the number of A**A patterns in the sequence, where * stands for any symbol. Descriptor vectors were built by concatenating the pair count vectors for all gap lengths from 0 up to a maximum gap length l0 (we used l0 = 3). In total, each sequence descriptor vector had m = 26^2 (l0+1) = 2704 components. An SVM classifier with a dot-product (linear) kernel was used.
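The descriptor construction described above can be sketched in Python as follows. This is a minimal illustration, not the original code: it assumes the 26-symbol alphabet is the uppercase letters A..Z, and the names gapped_pair_counts and descriptor are hypothetical. A gap of l means the two letters of a pair sit l + 1 positions apart, so A**A is counted for l = 2.

```python
import string

# Assumed 26-symbol alphabet (A..Z); the original alphabet mapping is not specified.
ALPHABET = string.ascii_uppercase
INDEX = {ch: k for k, ch in enumerate(ALPHABET)}
N = len(ALPHABET)  # 26


def gapped_pair_counts(seq: str, gap: int) -> list[int]:
    """Count ordered pairs (a, b) separated by exactly `gap` symbols.

    Returns a flat vector of length 26 * 26 where component 26*a + b holds
    the number of positions i with seq[i] == a and seq[i + gap + 1] == b.
    Symbols outside the assumed alphabet raise KeyError.
    """
    counts = [0] * (N * N)
    step = gap + 1
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        counts[N * INDEX[a] + INDEX[b]] += 1
    return counts


def descriptor(seq: str, max_gap: int = 3) -> list[int]:
    """Concatenate pair count vectors for gaps 0..max_gap (l0 = 3 by default).

    The result has 26**2 * (max_gap + 1) components (2704 for max_gap = 3).
    """
    vec: list[int] = []
    for gap in range(max_gap + 1):
        vec.extend(gapped_pair_counts(seq, gap))
    return vec
```

For example, in descriptor("AKAA") the component at offset 2 * 676 + 0 (gap 2, pair (A,A)) equals 1, reflecting the single A**A pattern. With a linear kernel, an SVM trained on such vectors predicts the class as the sign of w · x + b, so each learned weight can be read as the contribution of one gapped letter pair.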
_PREDICTION
Obj_00001 +1
Obj_00002 +1
Obj_00003 -1
Obj_00004 +1
Obj_00005 +1
Obj_00006 -1
Obj_00007 -1
Obj_00008 -1
Obj_00009 +1
Obj_00010 -1
Obj_00011 -1
Obj_00012 +1
Obj_00013 +1
Obj_00014 +1
Obj_00015 -1
Obj_00016 -1
Obj_00017 +1
Obj_00018 -1
Obj_00019 +1
Obj_00020 -1
Obj_00021 -1
Obj_00022 +1
Obj_00023 +1
Obj_00024 -1
Obj_00025 -1
Obj_00026 -1
Obj_00027 +1
Obj_00028 +1
Obj_00029 -1
Obj_00030 +1
Obj_00031 -1
Obj_00032 +1
Obj_00033 -1
Obj_00034 -1
Obj_00035 -1
Obj_00036 -1
Obj_00037 -1
Obj_00038 -1
Obj_00039 +1
Obj_00040 +1
Obj_00041 -1
Obj_00042 -1
Obj_00043 -1
Obj_00044 -1
Obj_00045 +1
Obj_00046 -1
Obj_00047 -1
Obj_00048 +1
Obj_00049 +1
Obj_00050 -1
Obj_00051 +1
Obj_00052 +1
Obj_00053 +1
Obj_00054 +1
Obj_00055 +1
Obj_00056 -1
Obj_00057 -1
Obj_00058 +1
Obj_00059 -1
Obj_00060 +1
Obj_00061 -1
Obj_00062 +1
Obj_00063 +1
Obj_00064 +1
Obj_00065 -1
Obj_00066 +1
Obj_00067 -1
Obj_00068 +1
Obj_00069 -1
Obj_00070 +1
Obj_00071 +1
Obj_00072 +1
Obj_00073 +1
Obj_00074 +1
Obj_00075 +1
Obj_00076 -1