News
Functionality of omniClassifier has been temporarily disabled until the prototype can be deployed as a volunteer grid computing project.
 
Submit Analysis Job
The job submission form accepts the following fields:
- Description
- Your Name or Unique Identifier (optional)
- Training Data
- Testing Data
- Description of Negative Samples (optional)
- Description of Positive Samples (optional)
- Data Type (optional)
- Cross Validation Random Seed
- Cross Validation: Internal CV?, Folds (M), Iterations (N)
- Optimizing Cross Validation: Folds (K), Iterations (L)
- Feature Selection Methods
- Classification Methods
- Folds per Work Unit
  
Gene Expression Data Format
Format your file as tab-delimited text:
            sample 1    sample 2    ...    sample N
class       label 1     label 2     ...    label N
feature 1   X1,1        X2,1        ...    XN,1
feature 2   X1,2        X2,2        ...    XN,2
...         ...         ...         ...    ...
feature M   X1,M        X2,M        ...    XN,M
Replace the sample #, feature #, label #, and X#,# fields with the appropriate values.
Labels should be either -1 or +1; currently, only binary classification is supported.
Features in the training data should match those of the testing data.
Make sure there are no extra spaces or tabs between elements and at the beginning/end of lines.
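As a sketch, here is one way to produce a file in this format programmatically. The sample, label, and feature names below are placeholders; whether the header row carries a leading blank corner cell is not specified above, so this sketch puts only the sample names on the first line (keeping lines free of leading tabs, per the note above).

```python
# Minimal sketch: write a toy dataset in the tab-delimited layout above.
import csv
import io

def write_dataset(samples, labels, features, values):
    out = io.StringIO()
    w = csv.writer(out, delimiter="\t", lineterminator="\n")
    w.writerow(samples)                 # header row: sample names
    w.writerow(["class"] + labels)      # class row: labels must be -1 or +1
    for name, row in zip(features, values):
        w.writerow([name] + list(row))  # one feature per line
    return out.getvalue()

text = write_dataset(
    ["sample 1", "sample 2"],
    ["+1", "-1"],
    ["feature 1", "feature 2"],
    [[2.3, 0.1], [5.0, 4.2]],
)
print(text)
```

Using `csv` with a tab delimiter guarantees exactly one tab between elements and none at the ends of lines.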
Nested Cross Validation w/ External Validation
omniClassifier uses nested cross validation with optional external validation. Samples are randomly partitioned into stratified folds for each iteration; the partitioning depends on the Cross Validation Random Seed, so the folds from a previous analysis of a dataset can be reproduced exactly by reusing the same seed.
Each fold (M) and iteration (N) of the "Cross Validation" procedure is optimized using a nested "Optimizing Cross Validation" (see Figure), whose purpose is to choose optimal feature selection and classification parameters. The "CV Accuracy" result is therefore a summary (e.g., the mean) of feature selection and classification performance using those optimal parameters.
Likewise, the "External Validation Accuracy" result is the performance of an optimized prediction model evaluated on independent data. External validation is optional, and the training data may be used in place of the testing data (just be sure to ignore the external validation results in that case).
CV Accuracy should serve as an estimate of External Validation Accuracy.
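The nested procedure can be sketched as follows. This is illustrative only, not omniClassifier's implementation: the nearest-centroid "model" and the single tuned parameter `k` are stand-ins for the real feature selection and classification methods.

```python
# Sketch of nested CV: an outer M-fold x N-iteration loop estimates accuracy,
# and an inner K-fold x L-iteration "optimizing" loop picks parameters using
# only the outer training fold. All model details here are toy stand-ins.
import numpy as np

def stratified_folds(idx, y, folds, rng):
    """Randomly partition idx into stratified folds (class balance kept)."""
    pos = rng.permutation(idx[y[idx] == 1])
    neg = rng.permutation(idx[y[idx] == -1])
    return [np.hstack([p, n]) for p, n in
            zip(np.array_split(pos, folds), np.array_split(neg, folds))]

def nearest_centroid_acc(X, y, train, test, k):
    """Toy feature selection + classifier: keep the k features with the
    largest class-mean difference on the training data, then classify
    test samples by nearest class centroid."""
    Xtr, ytr = X[train], y[train]
    diff = np.abs(Xtr[ytr == 1].mean(0) - Xtr[ytr == -1].mean(0))
    feats = np.argsort(diff)[-k:]
    cpos = Xtr[ytr == 1][:, feats].mean(0)
    cneg = Xtr[ytr == -1][:, feats].mean(0)
    d = X[test][:, feats]
    pred = np.where(((d - cpos) ** 2).sum(1) < ((d - cneg) ** 2).sum(1), 1, -1)
    return (pred == y[test]).mean()

def cv_mean_acc(X, y, idx, k, folds, iters, rng):
    """Plain (single-level) CV restricted to the samples in idx."""
    accs = []
    for _ in range(iters):
        parts = stratified_folds(idx, y, folds, rng)
        for i in range(folds):
            tr = np.hstack([p for j, p in enumerate(parts) if j != i])
            accs.append(nearest_centroid_acc(X, y, tr, parts[i], k))
    return np.mean(accs)

def nested_cv(X, y, ks, M=5, N=10, K=5, L=10, seed=0):
    rng = np.random.default_rng(seed)   # plays the role of the CV seed
    all_idx = np.arange(len(y))
    accs = []
    for _ in range(N):                                    # CV iterations (N)
        folds = stratified_folds(all_idx, y, M, rng)      # CV folds (M)
        for i in range(M):
            train = np.hstack([f for j, f in enumerate(folds) if j != i])
            # Optimizing CV: choose k using the training portion only
            best_k = max(ks, key=lambda k: cv_mean_acc(X, y, train, k, K, L, rng))
            accs.append(nearest_centroid_acc(X, y, train, folds[i], best_k))
    return float(np.mean(accs))
```

The key property is that the test fold of each outer split never influences parameter choice; that is what lets CV Accuracy estimate External Validation Accuracy.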
Feature Selection Methods
Feature selection methods are encoded as follows:
[fs code]_[param 1]_[param 2]_[param 3]_..._[param N]
Each feature selection method has a specific set of parameters that varies from method to method.
Method                                     #   Code    Param 1   Param 2
Fold Change                                0   fc      size¹
T-test                                     1   t       size¹
Min. Redundancy Max. Relevance w/ Diff.    2   mrmrd   max²      size¹
Min. Redundancy Max. Relevance w/ Quot.    3   mrmrq   max²      size¹
Significance Analysis of Microarrays       5   sam     size¹
Rank Sum Test                              6   rs      size¹
Rank Products                              7   rp      size¹

# — the numeric code used in the extracted MATLAB file.
¹ size is the number of features selected.
² max is the maximum number of features selected using the full mRMR procedure; all subsequent features are ranked by mutual information with the class labels.
Multiple values for a parameter can be encoded in short-hand notation as follows:
fc_(1:1:100) expands to fc_1,fc_2,...,fc_100
fc_(1:1:10|20:5:30) expands to fc_1,fc_2,...,fc_10,fc_20,fc_25,fc_30
Classification Methods
Classification methods are encoded as follows:
[cls code]_[param 1]_[param 2]_[param 3]_..._[param N]
Each classifier has a specific set of parameters that varies from method to method.
Method                          #   Code    Param 1        Param 2      Param 3
Linear Support Vector Machine   0   svm     lin            cost
RBF Support Vector Machine      0   svm     rbf            gamma        cost
K-Nearest Neighbors             1   knn     k
Bayesian (Gaussian Model)       6   bayes   pooled cov.¹   cov. type²
Logistic Regression             8   lr

¹ 0 = not pooled, 1 = pooled covariance.
² 0 = spherical, 1 = diagonal, 2 = full covariance.
Multiple values for a parameter can be encoded in short-hand notation as follows:
bayes_(0|1)_(0:1:2) expands to bayes_0_0,bayes_0_1,bayes_0_2,bayes_1_0,bayes_1_1,bayes_1_2
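A small expander for this short-hand can make the notation concrete. The grammar assumed here (inferred from the examples above, not from a published spec) is: "_"-separated fields, where a parenthesized field lists "|"-separated alternatives, each either a literal value or an inclusive start:step:stop range; multi-parameter short-hand expands as the Cartesian product of the fields.

```python
# Hedged sketch: expand method short-hand like "fc_(1:1:100)" or
# "bayes_(0|1)_(0:1:2)" into the full list of method strings.
import itertools
import re

def expand_field(field):
    m = re.fullmatch(r"\((.*)\)", field)
    if not m:
        return [field]                       # plain field passes through
    values = []
    for alt in m.group(1).split("|"):        # "|"-separated alternatives
        parts = alt.split(":")
        if len(parts) == 3:                  # start:step:stop range, inclusive
            start, step, stop = map(int, parts)
            values += [str(v) for v in range(start, stop + 1, step)]
        else:
            values.append(alt)               # single literal value
    return values

def expand(method):
    fields = [expand_field(f) for f in method.split("_")]
    return ["_".join(combo) for combo in itertools.product(*fields)]

print(expand("bayes_(0|1)_(0:1:2)"))
```

`itertools.product` varies the last field fastest, which matches the ordering shown in the examples above.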
MATLAB Output
All classification results can be downloaded as MATLAB *.mat files. These files contain a number of variables. Extraction of N-Tuple feature selection to *.mat files is not yet supported.
Variable                 Description                                                        Datatype
fs_methods               List of feature selection methods                                  Vector
fs_param1                Parameter 1 of feature selection methods                           Vector
fs_param2                Parameter 2 of feature selection methods                           Vector
cls_methods              List of classification methods                                     Vector (currently only one at a time)
cls_param1               Parameter 1 of classification methods                              Vector
cls_param2               Parameter 2 of classification methods                              Vector
cls_param3               Parameter 3 of classification methods                              Vector
cls_param4               Parameter 4 of classification methods                              Vector
thresholds               List of classification thresholds                                  Vector (currently only one, 0)
description              Description of analysis                                            String
metric                   Performance metric used during extraction (auc, acc, bauc, mcc)    String
num_test_data            Number of testing datasets                                         Integer
cv_seed                  Cross validation random seed                                       Integer
do_cv                    Indicates if internal cross validation was performed               Integer
datatype                 Description of datatype                                            String
pos_samples              Description of positive samples                                    String
neg_samples              Description of negative samples                                    String
num_pos_train_samples    Number of positive samples in the training dataset                 Integer
num_neg_train_samples    Number of negative samples in the training dataset                 Integer
num_pos_test_samples     Number of positive samples in each testing dataset                 Vector
num_neg_test_samples     Number of negative samples in each testing dataset                 Vector
features                 List of all features in datasets                                   Cell array of strings
cv_features              Features selected in CV                                            5-D cell array of vectors
ev_features              Features selected in EV                                            3-D cell array of vectors
opt_cv                   Performance values from optimizing CV                              12-D array
cv                       Performance values from CV                                         11-D array
opt_ev                   Performance values from optimizing EV                              10-D array
ev                       Performance values from EV                                         10-D array
cv_features
Dimension Size
1 CV Iterations
2 CV Folds
3 length(fs_methods)
4 length(fs_param1)
5 length(fs_param2)
ev_features
Dimension Size
1 length(fs_methods)
2 length(fs_param1)
3 length(fs_param2)
opt_cv
Dimension Size
1 CV Iterations
2 CV Folds
3 length(fs_methods)
4 length(fs_param1)
5 length(fs_param2)
6 length(cls_methods)
7 length(cls_param1)
8 length(cls_param2)
9 length(cls_param3)
10 length(cls_param4)
11 Opt. CV Iter. x Opt. CV Folds
12 length(thresholds)
cv
Dimension Size
1 CV Iterations
2 CV Folds
3 length(fs_methods)
4 length(fs_param1)
5 length(fs_param2)
6 length(cls_methods)
7 length(cls_param1)
8 length(cls_param2)
9 length(cls_param3)
10 length(cls_param4)
11 length(thresholds)
opt_ev
Dimension Size
1 length(fs_methods)
2 length(fs_param1)
3 length(fs_param2)
4 length(cls_methods)
5 length(cls_param1)
6 length(cls_param2)
7 length(cls_param3)
8 length(cls_param4)
9 Opt. CV Iter. x Opt. CV Folds
10 length(thresholds)
ev
Dimension Size
1 length(num_test_data)
2 length(fs_methods)
3 length(fs_param1)
4 length(fs_param2)
5 length(cls_methods)
6 length(cls_param1)
7 length(cls_param2)
8 length(cls_param3)
9 length(cls_param4)
10 length(thresholds)
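The dimension tables above translate directly into array indexing once a result file is loaded (for example, with scipy.io.loadmat in Python). The following sketch uses a random toy array with the documented 11-D `cv` layout; every shape value is illustrative, not taken from a real result file.

```python
# Illustrative only: how the 11-D `cv` array's documented dimensions map
# onto NumPy indexing. In practice the array would come from the
# downloaded *.mat file (e.g. via scipy.io.loadmat).
import numpy as np

# Dimension order, per the `cv` table above:
# (CV iterations, CV folds, fs_methods, fs_param1, fs_param2,
#  cls_methods, cls_param1, cls_param2, cls_param3, cls_param4, thresholds)
cv = np.random.default_rng(0).random((10, 5, 1, 3, 1, 1, 2, 1, 1, 1, 1))

# Mean CV performance per (fs_param1, cls_param1) pair, averaged over all
# iterations and folds (dimensions 1 and 2), with singleton dims indexed out:
mean_acc = cv.mean(axis=(0, 1))[0, :, 0, 0, :, 0, 0, 0, 0]
print(mean_acc.shape)  # one value per fs_param1 x cls_param1 combination
```

Averaging over the first two dimensions is one way to obtain the kind of summary "CV Accuracy" described in the nested cross validation section.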