SConES (Selecting Connected Explanatory SNPs)
MATLAB code implementing the method described in:
C.-A. Azencott, D. Grimm, M. Sugiyama, Y. Kawahara and K. Borgwardt (2013) Efficient network-guided multi-locus association mapping with graph cuts, Bioinformatics 29 (13), i171-i179 doi:10.1093/bioinformatics/btt238
- EasyGWAS: SConES has been integrated into EasyGWAS, a framework for the analysis and meta-analysis of GWAS data. In particular, this offers a Python interface.
- sfan: For the feature selection part (i.e. after the GWAS data has been processed and the SNPs scored), sfan uses a different (faster) maxflow solver, is written in Python, and also incorporates the multi-task version proposed in M. Sugiyama, C.-A. Azencott, D. Grimm, Y. Kawahara and K. Borgwardt (2014) Multi-task feature selection on multiple networks via maximum flows, SIAM ICDM, 199-207 doi:10.1137/1.9781611973440.23
- MultiSConeS: For the original implementation of this multi-task version, see Multi-SConES.
In the code folder, there is a MATLAB script demo.m.
To run the demo, start MATLAB and type:
demo()
- X = genotype matrix of size n x s, where n is the number of samples and s is the number of SNPs
- Y = phenotype vector of size n x 1, where n is the number of samples
- W = sparse network of size s x s
Demo files are provided in the data folder.
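As a rough illustration of the three inputs described above, the following sketch builds toy versions of X, Y and W with the stated shapes. The dimensions, the 0/1/2 genotype encoding and the chain-shaped network are assumptions made for this example only; they are not taken from the demo files.

```matlab
% Hypothetical toy dimensions (not the demo data).
n = 20;   % number of samples
s = 50;   % number of SNPs

% Genotype matrix, n x s. The 0/1/2 encoding is an assumption for this sketch.
X = randi([0 2], n, s);

% Phenotype vector, n x 1 (random values, for shape illustration only).
Y = randn(n, 1);

% Toy network: a chain linking SNP i to SNP i+1, stored as a
% sparse symmetric s x s matrix with unit edge weights.
idx = (1:s-1)';
W = sparse([idx; idx+1], [idx+1; idx], ones(2*(s-1), 1), s, s);
```

Storing W as a sparse matrix keeps memory proportional to the number of edges rather than to s squared, which matters once s reaches realistic SNP counts.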
[indicators, objectives] = scones(data, option)
To run SConES, two parameters are needed. The first is a data structure with the following fields:
- data.X is the genotype data
- data.Y is the phenotype
- data.W is the sparse network
- data.selected_PCs is the number of principal components that should be used for population structure correction
- data.lambda_values is a vector of size 1 x k with k values for lambda
- data.eta_values is a vector of size 1 x h with h values for eta
The second parameter is an options structure (optional; default values are given below):
- options.automatic: if true, data.lambda_values and data.eta_values are determined automatically (default: true)
- options.number_parameters: the number of eta and lambda values to use when options.automatic is true (default: 10)
- options.stdout: if true, output is printed to the terminal window (default: true)
Outputs:
- indicators = indicator matrix of size n x k x h, where n is the length of vector c, k the length of vector lambda_values and h the length of vector eta_values
- objectives = matrix of all objective values, of size k x h, for the grid of lambda x eta values
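The call described above might look like the following sketch. The field names come from this README; the toy data, the grid values and the choice of data.selected_PCs are placeholders, and it is an assumption that setting options.automatic to false makes scones use the supplied lambda and eta grids. The call is guarded so the snippet still runs when scones is not on the MATLAB path.

```matlab
% Toy inputs; sizes and values are placeholders, not the demo data.
n = 20; s = 50;
idx = (1:s-1)';

data = struct();
data.X = randi([0 2], n, s);               % genotypes, n x s (0/1/2 assumed)
data.Y = randn(n, 1);                      % phenotype, n x 1
data.W = sparse([idx; idx+1], [idx+1; idx], ones(2*(s-1), 1), s, s);  % network, s x s
data.selected_PCs  = 2;                    % placeholder number of PCs
data.lambda_values = [0.1 1 10];           % 1 x k grid (k = 3), placeholder values
data.eta_values    = [0.01 0.1];           % 1 x h grid (h = 2), placeholder values

options = struct();
options.automatic = false;   % assumed: use the grids supplied in data
options.stdout    = false;   % suppress terminal output

if any(exist('scones', 'file') == [2 3])
    [indicators, objectives] = scones(data, options);
    disp(size(indicators));   % documented as n x k x h
    disp(size(objectives));   % documented as k x h
    % Number of selected entries for each (lambda, eta) pair,
    % assuming indicators holds 0/1 values.
    n_selected = squeeze(sum(indicators, 1));
    disp(n_selected);
else
    disp('scones not found on the MATLAB path; add the code folder first.');
end
```

Summing the indicator matrix along its first dimension gives a quick view of how sparsity varies across the grid, which can help when choosing a range for lambda and eta.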
[indicators, objectives] = scones_crossvalidation(data, option)
The first parameter (data) is the same as described above.
The second parameter (options) can additionally take the following values:
- options.nfold: the number of cross-validation folds (default: 10)
- options.seed: the seed used for splitting the data into folds (default: 0)
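A cross-validated run could be set up along these lines. This is a sketch under assumptions: the toy data and the values chosen for nfold, seed and number_parameters are placeholders, and it is assumed that with options.automatic set to true the lambda and eta grids need not be supplied. As before, the call is guarded so the snippet runs without the SConES code on the path.

```matlab
% Toy inputs; sizes and values are placeholders, not the demo data.
n = 20; s = 50;
idx = (1:s-1)';

data = struct();
data.X = randi([0 2], n, s);               % genotypes, n x s (0/1/2 assumed)
data.Y = randn(n, 1);                      % phenotype, n x 1
data.W = sparse([idx; idx+1], [idx+1; idx], ones(2*(s-1), 1), s, s);  % network, s x s
data.selected_PCs = 2;                     % placeholder number of PCs

options = struct();
options.automatic         = true;  % let the code choose lambda and eta grids
options.number_parameters = 5;     % grid points per parameter (placeholder)
options.nfold             = 5;     % number of folds (placeholder)
options.seed              = 0;     % seed for the data split
options.stdout            = false; % suppress terminal output

if any(exist('scones_crossvalidation', 'file') == [2 3])
    [indicators, objectives] = scones_crossvalidation(data, options);
    disp(size(indicators));
    disp(size(objectives));
else
    disp('scones_crossvalidation not found on the MATLAB path.');
end
```

Fixing options.seed makes the fold assignment repeatable, so two runs on the same data can be compared directly.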
Any questions can be directed to Chloe-Agathe Azencott: chloe-agathe.azencott [at] mines-paristech.fr