Home > RAVEN > getINITModel.m

getINITModel

PURPOSE ^

getINITModel

SYNOPSIS ^

function [model metProduction essentialRxnsForTasks addedRxnsForTasks deletedDeadEndRxns deletedRxnsInINIT taskReport]=getINITModel(refModel, tissue, celltype, hpaData, arrayData, metabolomicsData, taskFile, useScoresForTasks, printReport, taskStructure, params, paramsFT)

DESCRIPTION ^

 getINITModel
   Generates a model using the INIT algorithm, based on proteomics and/or
   transcriptomics and/or metabolomics and/or metabolic tasks.

   refModel            a model structure
   tissue              tissue to score for. Should exist in either
                       hpaData.tissues or arrayData.tissues
   celltype            cell type to score for. Should exist in either
                       hpaData.celltypes or arrayData.celltypes for this 
                       tissue (opt, default is to use the best values
                       among all the cell types for the tissue. Use [] if
                       you want to supply more arguments)
   hpaData             HPA data structure from parseHPA (opt if arrayData is 
                       supplied, default [])
   arrayData           gene expression data structure (opt if hpaData is
                       supplied, default [])
       genes           cell array with the unique gene names
       tissues         cell array with the tissue names. The list may not be
                       unique, as there can be multiple cell types per tissue
       celltypes       cell array with the cell type names for each tissue
       levels          GENESxTISSUES array with the expression level for
                       each gene in each tissue/celltype. NaN should be
                       used when no measurement was performed
   metabolomicsData    cell array with metabolite names that the model
                       should produce (opt, default [])
   taskFile            a task list in Excel format. See parseTaskList for
                       details (opt, default [])
   useScoresForTasks   true if the calculated reaction scored should be used as
                       weights in the fitting to tasks (opt, default true)
   printReport         true if a report should be printed to the screen
                       (opt, default true)
   taskStructure       task structure as from parseTaskList. Can be used
                       as an alternative way to define tasks when Excel
                       sheets are not suitable. Overrides taskFile (opt,
                       default [])
   params              parameter structure as used by getMILPParams. This is
                       for the INIT algorithm. For the the MILP problems
                       solved to fit tasks, see paramsFT (opt, default [])
   paramsFT            parameter structure as used by getMILPParams. This is
                       for the fitTasks step. For the INIT algorithm, see
                       params (opt, default [])

   model                   the resulting model structure
   metProduction           array that indicates which of the
                           metabolites in metabolomicsData that could be
                           produced. Note that this is before the
                           gap-filling process to enable defined tasks. To
                           see which metabolites that can be produced in
                           the final model, use canProduce. 
                           -2: metabolite name not found in model
                           -1: metabolite found, but it could not be produced
                           1: metabolite could be produced
   essentialRxnsForTasks   cell array of the reactions which were
                           essential to perform the tasks
   addedRxnsForTasks       cell array of the reactions which were added in
                           order to perform the tasks
   deletedDeadEndRxns      cell array of reactions deleted because they
                           could not carry flux (INIT requires a
                           functional input model)
   deletedRxnsInINIT       cell array of the reactions which were deleted by 
                           the INIT algorithm
   taskReport              structure with the results for each task
       id                  cell array with the id of the task
       description         cell array with the description of the task
       ok                  boolean array with true if the task was successful
       essential           cell array with cell arrays of essential
                           reactions for the task
       gapfill             cell array of cell arrays of reactions included
                           in the gap-filling for the task

   This is the main function for automatic reconstruction of models based
   on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all
   settings are possible using this function, and you may want to call the
   functions scoreModel, runINIT and fitTasks individually instead.

   NOTE: Exchange metabolites should normally not be removed from the model
   when using this approach, since checkTasks/fitTasks rely on putting specific
   constraints for each task. The INIT algorithm will remove exchange metabolites
   if any are present. Use importModel(file,false) to import a model with
   exchange metabolites remaining.

   Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks...
               deletedDeadEndRxns deletedRxnsInINIT taskReport]=...
               getINITModel(refModel, tissue, celltype, hpaData, arrayData,...
               metabolomicsData, taskFile, useScoresForTasks, printReport,...
               taskStructure, params, paramsFT)

   Rasmus Agren, 2013-04-22

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [model metProduction essentialRxnsForTasks addedRxnsForTasks deletedDeadEndRxns deletedRxnsInINIT taskReport]=getINITModel(refModel, tissue, celltype, hpaData, arrayData, metabolomicsData, taskFile, useScoresForTasks, printReport, taskStructure, params, paramsFT)
0002 % getINITModel
0003 %   Generates a model using the INIT algorithm, based on proteomics and/or
0004 %   transcriptomics and/or metabolomics and/or metabolic tasks.
0005 %
0006 %   refModel            a model structure
0007 %   tissue              tissue to score for. Should exist in either
0008 %                       hpaData.tissues or arrayData.tissues
0009 %   celltype            cell type to score for. Should exist in either
0010 %                       hpaData.celltypes or arrayData.celltypes for this
0011 %                       tissue (opt, default is to use the best values
0012 %                       among all the cell types for the tissue. Use [] if
0013 %                       you want to supply more arguments)
0014 %   hpaData             HPA data structure from parseHPA (opt if arrayData is
0015 %                       supplied, default [])
0016 %   arrayData           gene expression data structure (opt if hpaData is
0017 %                       supplied, default [])
0018 %       genes           cell array with the unique gene names
0019 %       tissues         cell array with the tissue names. The list may not be
0020 %                       unique, as there can be multiple cell types per tissue
0021 %       celltypes       cell array with the cell type names for each tissue
0022 %       levels          GENESxTISSUES array with the expression level for
0023 %                       each gene in each tissue/celltype. NaN should be
0024 %                       used when no measurement was performed
0025 %   metabolomicsData    cell array with metabolite names that the model
0026 %                       should produce (opt, default [])
0027 %   taskFile            a task list in Excel format. See parseTaskList for
0028 %                       details (opt, default [])
0029 %   useScoresForTasks   true if the calculated reaction scored should be used as
0030 %                       weights in the fitting to tasks (opt, default true)
0031 %   printReport         true if a report should be printed to the screen
0032 %                       (opt, default true)
0033 %   taskStructure       task structure as from parseTaskList. Can be used
0034 %                       as an alternative way to define tasks when Excel
0035 %                       sheets are not suitable. Overrides taskFile (opt,
0036 %                       default [])
0037 %   params              parameter structure as used by getMILPParams. This is
0038 %                       for the INIT algorithm. For the the MILP problems
0039 %                       solved to fit tasks, see paramsFT (opt, default [])
0040 %   paramsFT            parameter structure as used by getMILPParams. This is
0041 %                       for the fitTasks step. For the INIT algorithm, see
0042 %                       params (opt, default [])
0043 %
0044 %   model                   the resulting model structure
0045 %   metProduction           array that indicates which of the
0046 %                           metabolites in metabolomicsData that could be
0047 %                           produced. Note that this is before the
0048 %                           gap-filling process to enable defined tasks. To
0049 %                           see which metabolites that can be produced in
0050 %                           the final model, use canProduce.
0051 %                           -2: metabolite name not found in model
0052 %                           -1: metabolite found, but it could not be produced
0053 %                           1: metabolite could be produced
0054 %   essentialRxnsForTasks   cell array of the reactions which were
0055 %                           essential to perform the tasks
0056 %   addedRxnsForTasks       cell array of the reactions which were added in
0057 %                           order to perform the tasks
0058 %   deletedDeadEndRxns      cell array of reactions deleted because they
0059 %                           could not carry flux (INIT requires a
0060 %                           functional input model)
0061 %   deletedRxnsInINIT       cell array of the reactions which were deleted by
0062 %                           the INIT algorithm
0063 %   taskReport              structure with the results for each task
0064 %       id                  cell array with the id of the task
0065 %       description         cell array with the description of the task
0066 %       ok                  boolean array with true if the task was successful
0067 %       essential           cell array with cell arrays of essential
0068 %                           reactions for the task
0069 %       gapfill             cell array of cell arrays of reactions included
0070 %                           in the gap-filling for the task
0071 %
0072 %   This is the main function for automatic reconstruction of models based
0073 %   on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all
0074 %   settings are possible using this function, and you may want to call the
0075 %   functions scoreModel, runINIT and fitTasks individually instead.
0076 %
0077 %   NOTE: Exchange metabolites should normally not be removed from the model
0078 %   when using this approach, since checkTasks/fitTasks rely on putting specific
0079 %   constraints for each task. The INIT algorithm will remove exchange metabolites
0080 %   if any are present. Use importModel(file,false) to import a model with
0081 %   exchange metabolites remaining.
0082 %
0083 %   Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks...
0084 %               deletedDeadEndRxns deletedRxnsInINIT taskReport]=...
0085 %               getINITModel(refModel, tissue, celltype, hpaData, arrayData,...
0086 %               metabolomicsData, taskFile, useScoresForTasks, printReport,...
0087 %               taskStructure, params, paramsFT)
0088 %
0089 %   Rasmus Agren, 2013-04-22
0090 %
0091 
0092 if nargin<3
0093     celltype=[];
0094 end
0095 if nargin<4
0096     hpaData=[];
0097 end
0098 if nargin<5
0099     arrayData=[];
0100 end
0101 if nargin<6
0102     metabolomicsData=[];
0103 end
0104 if nargin<7
0105     taskFile=[];
0106 end
0107 if nargin<8
0108     useScoresForTasks=true;
0109 end
0110 if nargin<9
0111     printReport=true;
0112 end
0113 if nargin<10
0114     taskStructure=[];
0115 end
0116 if nargin<11
0117     params=[];
0118 end
0119 if nargin<12
0120     paramsFT=[];
0121 end
0122 
0123 %Create the task structure if not supplied
0124 if any(taskFile) && isempty(taskStructure)
0125     taskStructure=parseTaskList(taskFile);
0126 end
0127 
0128 if printReport==true
0129     if any(celltype)
0130         fprintf(['***Generating model for: ' tissue ' - ' celltype '\n']);
0131     else
0132         fprintf(['***Generating model for: ' tissue '\n']);
0133     end
0134     if ~isempty(hpaData)
0135         fprintf('-Using HPA data\n');
0136     end
0137     if ~isempty(arrayData)
0138         fprintf('-Using array data\n');
0139     end
0140     if ~isempty(metabolomicsData)
0141         fprintf('-Using metabolomics data\n');
0142     end
0143     if any(taskFile)
0144         fprintf('-Using metabolic tasks\n');
0145     end
0146     fprintf('\n');
0147     
0148     printScores(refModel,'Reference model statistics',hpaData,arrayData,tissue,celltype);
0149 end
0150 
0151 %Remove dead-end reactions to speed up the optimization and to
0152 %differentiate between reactions removed by INIT and those that are
0153 %dead-end
0154 [crap, deletedDeadEndRxns]=simplifyModel(refModel,true,false,true,true,true);
0155 cModel=removeRxns(refModel,deletedDeadEndRxns,false,true);
0156 
0157 %Store the connected model like this to keep track of stuff
0158 if printReport==true
0159     printScores(cModel,'Pruned model statistics',hpaData,arrayData,tissue,celltype);
0160 end
0161 
0162 %If tasks have been defined, then go through them and get essential
0163 %reactions
0164 if ~isempty(taskStructure)
0165     [taskReport essentialRxnMat]=checkTasks(cModel,[],printReport,true,true,taskStructure);
0166     
0167     essentialRxnsForTasks=cModel.rxns(any(essentialRxnMat,2));
0168     
0169     %Remove tasks that cannot be performed
0170     taskStructure(taskReport.ok==false)=[];
0171     if printReport==true
0172         printScores(removeRxns(cModel,setdiff(cModel.rxns,essentialRxnsForTasks),true,true),'Reactions essential for tasks',hpaData,arrayData,tissue,celltype);
0173     end
0174 else
0175     essentialRxnsForTasks={};
0176 end
0177 
0178 %Score the connected model
0179 [rxnScores geneScores]=scoreModel(cModel,hpaData,arrayData,tissue,celltype);
0180 
0181 %Run the INIT algorithm. The exchange reactions that are used in the final
0182 %reactions will be open, which doesn't fit with the last step. Therefore
0183 %delete reactions from the original model instead of taking the output.
0184 %The default implementation does not constrain reversible reactions to only
0185 %carry flux in one direction.
0186 %Runs without the constraints on reversibility and with all output allowed.
0187 %This is to reduce the complexity of the problem.
0188 [crap deletedRxnsInINIT metProduction]=runINIT(simplifyModel(cModel),rxnScores,metabolomicsData,essentialRxnsForTasks,0,true,false,params);
0189 initModel=removeRxns(cModel,deletedRxnsInINIT,true,true);
0190 if printReport==true
0191     printScores(initModel,'INIT model statistics',hpaData,arrayData,tissue,celltype);
0192     printScores(removeRxns(cModel,setdiff(cModel.rxns,deletedRxnsInINIT),true,true),'Reactions deleted by INIT',hpaData,arrayData,tissue,celltype);
0193 end
0194 
0195 %The full model has exchange reactions in it. fitTasks calls on fillGaps,
0196 %which automatically removes exchange metabolites (because it assumes that
0197 %the reactions are constrained when appropriate). In this case the
0198 %uptakes/outputs are retrieved from the task sheet instead. To prevent
0199 %exchange reactions being used to fill gaps, they are delete from the
0200 %reference model here.
0201 initModel.id='INITModel';
0202 
0203 %If gaps in the model should be filled using a task list
0204 if ~isempty(taskStructure)
0205     %Remove exchange reactions and reactions already included in the INIT
0206     %model
0207     refModelNoExc=removeRxns(refModel,union(initModel.rxns,getExchangeRxns(refModel)),true,true);
0208     
0209     %At this stage the model is fully connected and most of the genes with
0210     %good scores should have been included. The final gap-filling should
0211     %take the scores of the genes into account, so that "rather bad"
0212     %reactions are preferred to "very bad" reactions. However, reactions
0213     %with positive scores will be included even if they are not connected
0214     %in the current formulation. Therefore, such reactions will have to be
0215     %assigned a small negative score instead.
0216     if useScoresForTasks==true
0217         refRxnScores=scoreModel(refModelNoExc,hpaData,arrayData,tissue,celltype);
0218         [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,min(refRxnScores,-0.1),taskStructure,paramsFT);
0219     else
0220         [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,[],taskStructure,paramsFT);
0221     end
0222     if printReport==true
0223         printScores(outModel,'Functional model statistics',hpaData,arrayData,tissue,celltype);
0224         printScores(removeRxns(outModel,intersect(outModel.rxns,initModel.rxns),true,true),'Reactions added to perform the tasks',hpaData,arrayData,tissue,celltype);
0225     end
0226     
0227     addedRxnsForTasks=refModelNoExc.rxns(any(addedRxnMat,2));
0228 else
0229     outModel=initModel;
0230     addedRxnMat=[];
0231     addedRxnsForTasks={};
0232 end
0233 
0234 %The model can now perform all the tasks defined in the task list. The
0235 %algorithm cannot deal with gene-complexes at the moment. It is therefore
0236 %ok to remove bad genes from a reaction (as long as at least one gene is
0237 %kept)
0238 model=outModel;
0239 
0240 [crap I]=ismember(model.genes,cModel.genes); %All should be found
0241 %This is a little weird way to make sure that only one bad gene is included
0242 %if there are no good ones (since all -Inf==max(-Inf))
0243 geneScores(isinf(geneScores))=-1000+rand(sum(isinf(geneScores)),1);
0244 
0245 model.grRules(:)={''};
0246 for i=1:numel(model.rxns)
0247     ids=find(model.rxnGeneMat(i,:));
0248     if numel(ids)>1
0249        scores=geneScores(I(ids));
0250        %Only keep the positive ones if possible
0251        model.rxnGeneMat(i,ids(~(scores>0 | scores==max(scores))))=0;
0252     end
0253     %Rewrite the grRules to be only OR
0254     if isfield(model,'grRules')
0255        J=find(model.rxnGeneMat(i,:));
0256        for j=1:numel(J)
0257            model.grRules{i}=[model.grRules{i} '(' model.genes{J(j)} ')'];
0258            if j<numel(J)
0259                model.grRules{i}=[model.grRules{i} ' or '];
0260            end
0261        end
0262     end
0263 end
0264 
0265 %Find all genes that are not used and delete them
0266 I=sum(model.rxnGeneMat)==0;
0267 model.genes(I)=[];
0268 model.rxnGeneMat(:,I)=[];
0269 if isfield(model,'geneShortNames')
0270     model.geneShortNames(I)=[];
0271 end
0272 if isfield(model,'geneMiriams')
0273     model.geneMiriams(I)=[];
0274 end
0275 if isfield(model,'geneFrom')
0276     model.geneFrom(I)=[];
0277 end
0278 
0279 %At this stage the model will contain some exchange reactions but probably not all
0280 %(and maybe zero). This can be inconvenient, so all exchange reactions from the
0281 %reference model are added, except for those which involve metabolites that
0282 %are not in the model.
0283 
0284 %First delete and included exchange reactions in order to prevent the order
0285 %from changing
0286 model=removeRxns(model,getExchangeRxns(model));
0287 
0288 %Create a model with only the exchange reactions in refModel
0289 excModel=removeRxns(refModel,setdiff(refModel.rxns,getExchangeRxns(refModel)),true,true);
0290 
0291 %Find the metabolites there which are not exchange metabolites and which do
0292 %not exist in the output model
0293 I=~ismember(excModel.mets,model.mets) & excModel.unconstrained==0;
0294 
0295 %Then find those reactions and delete them
0296 [crap J]=find(excModel.S(I,:));
0297 excModel=removeRxns(excModel,J,true,true);
0298 
0299 %Merge with the output model
0300 model=mergeModels({model;excModel});
0301 model.id='INITModel';
0302 model.description=['Automatically generated model for ' tissue];
0303 if any(celltype)
0304     model.description=[model.description ' - ' celltype];
0305 end
0306 
0307 if printReport==true
0308     printScores(model,'Final model statistics',hpaData,arrayData,tissue,celltype); 
0309 end
0310 
0311 %Add information about essential reactions and reactions included for
0312 %gap-filling and return a taskReport
0313 I=find(taskReport.ok); %Ignore failed tasks
0314 for i=1:numel(I)
0315     taskReport.essential{I(i),1}=cModel.rxns(essentialRxnMat(:,I(i)));
0316     taskReport.gapfill{I(i),1}=refModelNoExc.rxns(addedRxnMat(:,i));
0317 end
0318 end
0319 
0320 %This is for printing a summary of a model
0321 function [rxnS geneS]=printScores(model,name,hpaData,arrayData,tissue,celltype)
0322     [a b]=scoreModel(model,hpaData,arrayData,tissue,celltype);
0323     rxnS=mean(a);
0324     geneS=mean(b(~isinf(b)));
0325     fprintf([name ':\n']);
0326     fprintf(['\t' num2str(numel(model.rxns)) ' reactions, ' num2str(numel(model.genes)) ' genes\n']);
0327     fprintf(['\tMean reaction score: ' num2str(rxnS) '\n']);
0328     fprintf(['\tMean gene score: ' num2str(geneS) '\n']);
0329     fprintf(['\tReactions with positive scores: ' num2str(100*sum(a>0)/numel(a)) '%%\n\n']);
0330 end

Generated on Tue 23-Apr-2013 15:18:37 by m2html © 2005