Home > RAVEN > getINITModel.m

getINITModel

PURPOSE ^

getINITModel

SYNOPSIS ^

function [model metProduction essentialRxnsForTasks addedRxnsForTasks deletedDeadEndRxns deletedRxnsInINIT taskReport]=getINITModel(refModel, tissue, celltype, hpaData, arrayData, metabolomicsData, taskFile, useScoresForTasks, printReport, taskStructure, params, paramsFT)

DESCRIPTION ^

 getINITModel
   Generates a model using the INIT algorithm, based on proteomics and/or
   transcriptomics and/or metabolomics and/or metabolic tasks.

   refModel            a model structure. The model should be in the
                       closed form (no exchange reactions open). Import
                       using import(filename,false). If the model is not
                       loaded using importModel, it might be that there
                       is no "unconstrained" field. In that case,
                       manually add the field like:
                       model.unconstrained=false(numel(model.mets),1);
   tissue              tissue to score for. Should exist in either
                       hpaData.tissues or arrayData.tissues
   celltype            cell type to score for. Should exist in either
                       hpaData.celltypes or arrayData.celltypes for this 
                       tissue (opt, default is to use the best values
                       among all the cell types for the tissue. Use [] if
                       you want to supply more arguments)
   hpaData             HPA data structure from parseHPA (opt if arrayData is 
                       supplied, default [])
   arrayData           gene expression data structure (opt if hpaData is
                       supplied, default [])
       genes           cell array with the unique gene names
       tissues         cell array with the tissue names. The list may not be
                       unique, as there can be multiple cell types per tissue
       celltypes       cell array with the cell type names for each tissue
       levels          GENESxTISSUES array with the expression level for
                       each gene in each tissue/celltype. NaN should be
                       used when no measurement was performed
   metabolomicsData    cell array with metabolite names that the model
                       should produce (opt, default [])
   taskFile            a task list in Excel format. See parseTaskList for
                       details (opt, default [])
   useScoresForTasks   true if the calculated reaction scored should be used as
                       weights in the fitting to tasks (opt, default true)
   printReport         true if a report should be printed to the screen
                       (opt, default true)
   taskStructure       task structure as from parseTaskList. Can be used
                       as an alternative way to define tasks when Excel
                       sheets are not suitable. Overrides taskFile (opt,
                       default [])
   params              parameter structure as used by getMILPParams. This is
                       for the INIT algorithm. For the the MILP problems
                       solved to fit tasks, see paramsFT (opt, default [])
   paramsFT            parameter structure as used by getMILPParams. This is
                       for the fitTasks step. For the INIT algorithm, see
                       params (opt, default [])

   model                   the resulting model structure
   metProduction           array that indicates which of the
                           metabolites in metabolomicsData that could be
                           produced. Note that this is before the
                           gap-filling process to enable defined tasks. To
                           see which metabolites that can be produced in
                           the final model, use canProduce. 
                           -2: metabolite name not found in model
                           -1: metabolite found, but it could not be produced
                           1: metabolite could be produced
   essentialRxnsForTasks   cell array of the reactions which were
                           essential to perform the tasks
   addedRxnsForTasks       cell array of the reactions which were added in
                           order to perform the tasks
   deletedDeadEndRxns      cell array of reactions deleted because they
                           could not carry flux (INIT requires a
                           functional input model)
   deletedRxnsInINIT       cell array of the reactions which were deleted by 
                           the INIT algorithm
   taskReport              structure with the results for each task
       id                  cell array with the id of the task
       description         cell array with the description of the task
       ok                  boolean array with true if the task was successful
       essential           cell array with cell arrays of essential
                           reactions for the task
       gapfill             cell array of cell arrays of reactions included
                           in the gap-filling for the task

   This is the main function for automatic reconstruction of models based
   on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all
   settings are possible using this function, and you may want to call the
   functions scoreModel, runINIT and fitTasks individually instead.

   NOTE: Exchange metabolites should normally not be removed from the model
   when using this approach, since checkTasks/fitTasks rely on putting specific
   constraints for each task. The INIT algorithm will remove exchange metabolites
   if any are present. Use importModel(file,false) to import a model with
   exchange metabolites remaining.

   Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks...
               deletedDeadEndRxns deletedRxnsInINIT taskReport]=...
               getINITModel(refModel, tissue, celltype, hpaData, arrayData,...
               metabolomicsData, taskFile, useScoresForTasks, printReport,...
               taskStructure, params, paramsFT)

   Rasmus Agren, 2013-05-16

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [model metProduction essentialRxnsForTasks addedRxnsForTasks deletedDeadEndRxns deletedRxnsInINIT taskReport]=getINITModel(refModel, tissue, celltype, hpaData, arrayData, metabolomicsData, taskFile, useScoresForTasks, printReport, taskStructure, params, paramsFT)
0002 % getINITModel
0003 %   Generates a model using the INIT algorithm, based on proteomics and/or
0004 %   transcriptomics and/or metabolomics and/or metabolic tasks.
0005 %
0006 %   refModel            a model structure. The model should be in the
0007 %                       closed form (no exchange reactions open). Import
0008 %                       using import(filename,false). If the model is not
0009 %                       loaded using importModel, it might be that there
0010 %                       is no "unconstrained" field. In that case,
0011 %                       manually add the field like:
0012 %                       model.unconstrained=false(numel(model.mets),1);
0013 %   tissue              tissue to score for. Should exist in either
0014 %                       hpaData.tissues or arrayData.tissues
0015 %   celltype            cell type to score for. Should exist in either
0016 %                       hpaData.celltypes or arrayData.celltypes for this
0017 %                       tissue (opt, default is to use the best values
0018 %                       among all the cell types for the tissue. Use [] if
0019 %                       you want to supply more arguments)
0020 %   hpaData             HPA data structure from parseHPA (opt if arrayData is
0021 %                       supplied, default [])
0022 %   arrayData           gene expression data structure (opt if hpaData is
0023 %                       supplied, default [])
0024 %       genes           cell array with the unique gene names
0025 %       tissues         cell array with the tissue names. The list may not be
0026 %                       unique, as there can be multiple cell types per tissue
0027 %       celltypes       cell array with the cell type names for each tissue
0028 %       levels          GENESxTISSUES array with the expression level for
0029 %                       each gene in each tissue/celltype. NaN should be
0030 %                       used when no measurement was performed
0031 %   metabolomicsData    cell array with metabolite names that the model
0032 %                       should produce (opt, default [])
0033 %   taskFile            a task list in Excel format. See parseTaskList for
0034 %                       details (opt, default [])
0035 %   useScoresForTasks   true if the calculated reaction scored should be used as
0036 %                       weights in the fitting to tasks (opt, default true)
0037 %   printReport         true if a report should be printed to the screen
0038 %                       (opt, default true)
0039 %   taskStructure       task structure as from parseTaskList. Can be used
0040 %                       as an alternative way to define tasks when Excel
0041 %                       sheets are not suitable. Overrides taskFile (opt,
0042 %                       default [])
0043 %   params              parameter structure as used by getMILPParams. This is
0044 %                       for the INIT algorithm. For the the MILP problems
0045 %                       solved to fit tasks, see paramsFT (opt, default [])
0046 %   paramsFT            parameter structure as used by getMILPParams. This is
0047 %                       for the fitTasks step. For the INIT algorithm, see
0048 %                       params (opt, default [])
0049 %
0050 %   model                   the resulting model structure
0051 %   metProduction           array that indicates which of the
0052 %                           metabolites in metabolomicsData that could be
0053 %                           produced. Note that this is before the
0054 %                           gap-filling process to enable defined tasks. To
0055 %                           see which metabolites that can be produced in
0056 %                           the final model, use canProduce.
0057 %                           -2: metabolite name not found in model
0058 %                           -1: metabolite found, but it could not be produced
0059 %                           1: metabolite could be produced
0060 %   essentialRxnsForTasks   cell array of the reactions which were
0061 %                           essential to perform the tasks
0062 %   addedRxnsForTasks       cell array of the reactions which were added in
0063 %                           order to perform the tasks
0064 %   deletedDeadEndRxns      cell array of reactions deleted because they
0065 %                           could not carry flux (INIT requires a
0066 %                           functional input model)
0067 %   deletedRxnsInINIT       cell array of the reactions which were deleted by
0068 %                           the INIT algorithm
0069 %   taskReport              structure with the results for each task
0070 %       id                  cell array with the id of the task
0071 %       description         cell array with the description of the task
0072 %       ok                  boolean array with true if the task was successful
0073 %       essential           cell array with cell arrays of essential
0074 %                           reactions for the task
0075 %       gapfill             cell array of cell arrays of reactions included
0076 %                           in the gap-filling for the task
0077 %
0078 %   This is the main function for automatic reconstruction of models based
0079 %   on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all
0080 %   settings are possible using this function, and you may want to call the
0081 %   functions scoreModel, runINIT and fitTasks individually instead.
0082 %
0083 %   NOTE: Exchange metabolites should normally not be removed from the model
0084 %   when using this approach, since checkTasks/fitTasks rely on putting specific
0085 %   constraints for each task. The INIT algorithm will remove exchange metabolites
0086 %   if any are present. Use importModel(file,false) to import a model with
0087 %   exchange metabolites remaining.
0088 %
0089 %   Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks...
0090 %               deletedDeadEndRxns deletedRxnsInINIT taskReport]=...
0091 %               getINITModel(refModel, tissue, celltype, hpaData, arrayData,...
0092 %               metabolomicsData, taskFile, useScoresForTasks, printReport,...
0093 %               taskStructure, params, paramsFT)
0094 %
0095 %   Rasmus Agren, 2013-05-16
0096 %
0097 
0098 if nargin<3
0099     celltype=[];
0100 end
0101 if nargin<4
0102     hpaData=[];
0103 end
0104 if nargin<5
0105     arrayData=[];
0106 end
0107 if nargin<6
0108     metabolomicsData=[];
0109 end
0110 if nargin<7
0111     taskFile=[];
0112 end
0113 if nargin<8
0114     useScoresForTasks=true;
0115 end
0116 if nargin<9
0117     printReport=true;
0118 end
0119 if nargin<10
0120     taskStructure=[];
0121 end
0122 if nargin<11
0123     params=[];
0124 end
0125 if nargin<12
0126     paramsFT=[];
0127 end
0128 
0129 %Check that the model is in the closed form
0130 if ~isfield(refModel,'unconstrained')
0131    throw(MException('','Exchange metabolites should normally not be removed from the model when using getINITModel. Use importModel(file,false) to import a model with exchange metabolites remaining (see the documentation for details)')); 
0132 end
0133 
0134 %Create the task structure if not supplied
0135 if any(taskFile) && isempty(taskStructure)
0136     taskStructure=parseTaskList(taskFile);
0137 end
0138 
0139 if printReport==true
0140     if any(celltype)
0141         fprintf(['***Generating model for: ' tissue ' - ' celltype '\n']);
0142     else
0143         fprintf(['***Generating model for: ' tissue '\n']);
0144     end
0145     if ~isempty(hpaData)
0146         fprintf('-Using HPA data\n');
0147     end
0148     if ~isempty(arrayData)
0149         fprintf('-Using array data\n');
0150     end
0151     if ~isempty(metabolomicsData)
0152         fprintf('-Using metabolomics data\n');
0153     end
0154     if any(taskFile)
0155         fprintf('-Using metabolic tasks\n');
0156     end
0157     fprintf('\n');
0158     
0159     printScores(refModel,'Reference model statistics',hpaData,arrayData,tissue,celltype);
0160 end
0161 
0162 %Remove dead-end reactions to speed up the optimization and to
0163 %differentiate between reactions removed by INIT and those that are
0164 %dead-end
0165 [crap, deletedDeadEndRxns]=simplifyModel(refModel,true,false,true,true,true);
0166 cModel=removeRxns(refModel,deletedDeadEndRxns,false,true);
0167 
0168 %Store the connected model like this to keep track of stuff
0169 if printReport==true
0170     printScores(cModel,'Pruned model statistics',hpaData,arrayData,tissue,celltype);
0171 end
0172 
0173 %If tasks have been defined, then go through them and get essential
0174 %reactions
0175 if ~isempty(taskStructure)
0176     [taskReport essentialRxnMat]=checkTasks(cModel,[],printReport,true,true,taskStructure);
0177     
0178     essentialRxnsForTasks=cModel.rxns(any(essentialRxnMat,2));
0179     
0180     %Remove tasks that cannot be performed
0181     taskStructure(taskReport.ok==false)=[];
0182     if printReport==true
0183         printScores(removeRxns(cModel,setdiff(cModel.rxns,essentialRxnsForTasks),true,true),'Reactions essential for tasks',hpaData,arrayData,tissue,celltype);
0184     end
0185 else
0186     essentialRxnsForTasks={};
0187 end
0188 
0189 %Score the connected model
0190 [rxnScores geneScores]=scoreModel(cModel,hpaData,arrayData,tissue,celltype);
0191 
0192 %Run the INIT algorithm. The exchange reactions that are used in the final
0193 %reactions will be open, which doesn't fit with the last step. Therefore
0194 %delete reactions from the original model instead of taking the output.
0195 %The default implementation does not constrain reversible reactions to only
0196 %carry flux in one direction.
0197 %Runs without the constraints on reversibility and with all output allowed.
0198 %This is to reduce the complexity of the problem.
0199 [crap deletedRxnsInINIT metProduction]=runINIT(simplifyModel(cModel),rxnScores,metabolomicsData,essentialRxnsForTasks,0,true,false,params);
0200 initModel=removeRxns(cModel,deletedRxnsInINIT,true,true);
0201 if printReport==true
0202     printScores(initModel,'INIT model statistics',hpaData,arrayData,tissue,celltype);
0203     printScores(removeRxns(cModel,setdiff(cModel.rxns,deletedRxnsInINIT),true,true),'Reactions deleted by INIT',hpaData,arrayData,tissue,celltype);
0204 end
0205 
0206 %The full model has exchange reactions in it. fitTasks calls on fillGaps,
0207 %which automatically removes exchange metabolites (because it assumes that
0208 %the reactions are constrained when appropriate). In this case the
0209 %uptakes/outputs are retrieved from the task sheet instead. To prevent
0210 %exchange reactions being used to fill gaps, they are delete from the
0211 %reference model here.
0212 initModel.id='INITModel';
0213 
0214 %If gaps in the model should be filled using a task list
0215 if ~isempty(taskStructure)
0216     %Remove exchange reactions and reactions already included in the INIT
0217     %model
0218     refModelNoExc=removeRxns(refModel,union(initModel.rxns,getExchangeRxns(refModel)),true,true);
0219     
0220     %At this stage the model is fully connected and most of the genes with
0221     %good scores should have been included. The final gap-filling should
0222     %take the scores of the genes into account, so that "rather bad"
0223     %reactions are preferred to "very bad" reactions. However, reactions
0224     %with positive scores will be included even if they are not connected
0225     %in the current formulation. Therefore, such reactions will have to be
0226     %assigned a small negative score instead.
0227     if useScoresForTasks==true
0228         refRxnScores=scoreModel(refModelNoExc,hpaData,arrayData,tissue,celltype);
0229         [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,min(refRxnScores,-0.1),taskStructure,paramsFT);
0230     else
0231         [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,[],taskStructure,paramsFT);
0232     end
0233     if printReport==true
0234         printScores(outModel,'Functional model statistics',hpaData,arrayData,tissue,celltype);
0235         printScores(removeRxns(outModel,intersect(outModel.rxns,initModel.rxns),true,true),'Reactions added to perform the tasks',hpaData,arrayData,tissue,celltype);
0236     end
0237     
0238     addedRxnsForTasks=refModelNoExc.rxns(any(addedRxnMat,2));
0239 else
0240     outModel=initModel;
0241     addedRxnMat=[];
0242     addedRxnsForTasks={};
0243 end
0244 
0245 %The model can now perform all the tasks defined in the task list. The
0246 %algorithm cannot deal with gene-complexes at the moment. It is therefore
0247 %ok to remove bad genes from a reaction (as long as at least one gene is
0248 %kept)
0249 model=outModel;
0250 
0251 [crap I]=ismember(model.genes,cModel.genes); %All should be found
0252 %This is a little weird way to make sure that only one bad gene is included
0253 %if there are no good ones (since all -Inf==max(-Inf))
0254 geneScores(isinf(geneScores))=-1000+rand(sum(isinf(geneScores)),1);
0255 
0256 model.grRules(:)={''};
0257 for i=1:numel(model.rxns)
0258     ids=find(model.rxnGeneMat(i,:));
0259     if numel(ids)>1
0260        scores=geneScores(I(ids));
0261        %Only keep the positive ones if possible
0262        model.rxnGeneMat(i,ids(~(scores>0 | scores==max(scores))))=0;
0263     end
0264     %Rewrite the grRules to be only OR
0265     if isfield(model,'grRules')
0266        J=find(model.rxnGeneMat(i,:));
0267        for j=1:numel(J)
0268            model.grRules{i}=[model.grRules{i} '(' model.genes{J(j)} ')'];
0269            if j<numel(J)
0270                model.grRules{i}=[model.grRules{i} ' or '];
0271            end
0272        end
0273     end
0274 end
0275 
0276 %Find all genes that are not used and delete them
0277 I=sum(model.rxnGeneMat)==0;
0278 model.genes(I)=[];
0279 model.rxnGeneMat(:,I)=[];
0280 if isfield(model,'geneShortNames')
0281     model.geneShortNames(I)=[];
0282 end
0283 if isfield(model,'geneMiriams')
0284     model.geneMiriams(I)=[];
0285 end
0286 if isfield(model,'geneFrom')
0287     model.geneFrom(I)=[];
0288 end
0289 
0290 %At this stage the model will contain some exchange reactions but probably not all
0291 %(and maybe zero). This can be inconvenient, so all exchange reactions from the
0292 %reference model are added, except for those which involve metabolites that
0293 %are not in the model.
0294 
0295 %First delete and included exchange reactions in order to prevent the order
0296 %from changing
0297 model=removeRxns(model,getExchangeRxns(model));
0298 
0299 %Create a model with only the exchange reactions in refModel
0300 excModel=removeRxns(refModel,setdiff(refModel.rxns,getExchangeRxns(refModel)),true,true);
0301 
0302 %Find the metabolites there which are not exchange metabolites and which do
0303 %not exist in the output model
0304 I=~ismember(excModel.mets,model.mets) & excModel.unconstrained==0;
0305 
0306 %Then find those reactions and delete them
0307 [crap J]=find(excModel.S(I,:));
0308 excModel=removeRxns(excModel,J,true,true);
0309 
0310 %Merge with the output model
0311 model=mergeModels({model;excModel});
0312 model.id='INITModel';
0313 model.description=['Automatically generated model for ' tissue];
0314 if any(celltype)
0315     model.description=[model.description ' - ' celltype];
0316 end
0317 
0318 if printReport==true
0319     printScores(model,'Final model statistics',hpaData,arrayData,tissue,celltype); 
0320 end
0321 
0322 %Add information about essential reactions and reactions included for
0323 %gap-filling and return a taskReport
0324 if ~isempty(taskStructure)
0325     I=find(taskReport.ok); %Ignore failed tasks
0326     for i=1:numel(I)
0327         taskReport.essential{I(i),1}=cModel.rxns(essentialRxnMat(:,I(i)));
0328         taskReport.gapfill{I(i),1}=refModelNoExc.rxns(addedRxnMat(:,i));
0329     end
0330 else
0331     taskReport=[];
0332 end
0333 end
0334 
0335 %This is for printing a summary of a model
0336 function [rxnS geneS]=printScores(model,name,hpaData,arrayData,tissue,celltype)
0337     [a b]=scoreModel(model,hpaData,arrayData,tissue,celltype);
0338     rxnS=mean(a);
0339     geneS=mean(b(~isinf(b)));
0340     fprintf([name ':\n']);
0341     fprintf(['\t' num2str(numel(model.rxns)) ' reactions, ' num2str(numel(model.genes)) ' genes\n']);
0342     fprintf(['\tMean reaction score: ' num2str(rxnS) '\n']);
0343     fprintf(['\tMean gene score: ' num2str(geneS) '\n']);
0344     fprintf(['\tReactions with positive scores: ' num2str(100*sum(a>0)/numel(a)) '%%\n\n']);
0345 end

Generated on Tue 16-Jul-2013 21:50:02 by m2html © 2005