getINITModel Generates a model using the INIT algorithm, based on proteomics and/or transcriptomics and/or metabolomics and/or metabolic tasks. refModel a model structure tissue tissue to score for. Should exist in either hpaData.tissues or arrayData.tissues celltype cell type to score for. Should exist in either hpaData.celltypes or arrayData.celltypes for this tissue (opt, default is to use the best values among all the cell types for the tissue. Use [] if you want to supply more arguments) hpaData HPA data structure from parseHPA (opt if arrayData is supplied, default []) arrayData gene expression data structure (opt if hpaData is supplied, default []) genes cell array with the unique gene names tissues cell array with the tissue names. The list may not be unique, as there can be multiple cell types per tissue celltypes cell array with the cell type names for each tissue levels GENESxTISSUES array with the expression level for each gene in each tissue/celltype. NaN should be used when no measurement was performed metabolomicsData cell array with metabolite names that the model should produce (opt, default []) taskFile a task list in Excel format. See parseTaskList for details (opt, default []) useScoresForTasks true if the calculated reaction scored should be used as weights in the fitting to tasks (opt, default true) printReport true if a report should be printed to the screen (opt, default true) taskStructure task structure as from parseTaskList. Can be used as an alternative way to define tasks when Excel sheets are not suitable. Overrides taskFile (opt, default []) params parameter structure as used by getMILPParams. This is for the INIT algorithm. For the the MILP problems solved to fit tasks, see paramsFT (opt, default []) paramsFT parameter structure as used by getMILPParams. This is for the fitTasks step. For the INIT algorithm, see params (opt, default []) model the resulting model structure metProduction array that indicates which of the metabolites in metabolomicsData that could be produced. Note that this is before the gap-filling process to enable defined tasks. To see which metabolites that can be produced in the final model, use canProduce. -2: metabolite name not found in model -1: metabolite found, but it could not be produced 1: metabolite could be produced essentialRxnsForTasks cell array of the reactions which were essential to perform the tasks addedRxnsForTasks cell array of the reactions which were added in order to perform the tasks deletedDeadEndRxns cell array of reactions deleted because they could not carry flux (INIT requires a functional input model) deletedRxnsInINIT cell array of the reactions which were deleted by the INIT algorithm taskReport structure with the results for each task id cell array with the id of the task description cell array with the description of the task ok boolean array with true if the task was successful essential cell array with cell arrays of essential reactions for the task gapfill cell array of cell arrays of reactions included in the gap-filling for the task This is the main function for automatic reconstruction of models based on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all settings are possible using this function, and you may want to call the functions scoreModel, runINIT and fitTasks individually instead. NOTE: Exchange metabolites should normally not be removed from the model when using this approach, since checkTasks/fitTasks rely on putting specific constraints for each task. The INIT algorithm will remove exchange metabolites if any are present. Use importModel(file,false) to import a model with exchange metabolites remaining. Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks... deletedDeadEndRxns deletedRxnsInINIT taskReport]=... getINITModel(refModel, tissue, celltype, hpaData, arrayData,... metabolomicsData, taskFile, useScoresForTasks, printReport,... taskStructure, params, paramsFT) Rasmus Agren, 2013-04-22
0001 function [model metProduction essentialRxnsForTasks addedRxnsForTasks deletedDeadEndRxns deletedRxnsInINIT taskReport]=getINITModel(refModel, tissue, celltype, hpaData, arrayData, metabolomicsData, taskFile, useScoresForTasks, printReport, taskStructure, params, paramsFT) 0002 % getINITModel 0003 % Generates a model using the INIT algorithm, based on proteomics and/or 0004 % transcriptomics and/or metabolomics and/or metabolic tasks. 0005 % 0006 % refModel a model structure 0007 % tissue tissue to score for. Should exist in either 0008 % hpaData.tissues or arrayData.tissues 0009 % celltype cell type to score for. Should exist in either 0010 % hpaData.celltypes or arrayData.celltypes for this 0011 % tissue (opt, default is to use the best values 0012 % among all the cell types for the tissue. Use [] if 0013 % you want to supply more arguments) 0014 % hpaData HPA data structure from parseHPA (opt if arrayData is 0015 % supplied, default []) 0016 % arrayData gene expression data structure (opt if hpaData is 0017 % supplied, default []) 0018 % genes cell array with the unique gene names 0019 % tissues cell array with the tissue names. The list may not be 0020 % unique, as there can be multiple cell types per tissue 0021 % celltypes cell array with the cell type names for each tissue 0022 % levels GENESxTISSUES array with the expression level for 0023 % each gene in each tissue/celltype. NaN should be 0024 % used when no measurement was performed 0025 % metabolomicsData cell array with metabolite names that the model 0026 % should produce (opt, default []) 0027 % taskFile a task list in Excel format. See parseTaskList for 0028 % details (opt, default []) 0029 % useScoresForTasks true if the calculated reaction scored should be used as 0030 % weights in the fitting to tasks (opt, default true) 0031 % printReport true if a report should be printed to the screen 0032 % (opt, default true) 0033 % taskStructure task structure as from parseTaskList. Can be used 0034 % as an alternative way to define tasks when Excel 0035 % sheets are not suitable. Overrides taskFile (opt, 0036 % default []) 0037 % params parameter structure as used by getMILPParams. This is 0038 % for the INIT algorithm. For the the MILP problems 0039 % solved to fit tasks, see paramsFT (opt, default []) 0040 % paramsFT parameter structure as used by getMILPParams. This is 0041 % for the fitTasks step. For the INIT algorithm, see 0042 % params (opt, default []) 0043 % 0044 % model the resulting model structure 0045 % metProduction array that indicates which of the 0046 % metabolites in metabolomicsData that could be 0047 % produced. Note that this is before the 0048 % gap-filling process to enable defined tasks. To 0049 % see which metabolites that can be produced in 0050 % the final model, use canProduce. 0051 % -2: metabolite name not found in model 0052 % -1: metabolite found, but it could not be produced 0053 % 1: metabolite could be produced 0054 % essentialRxnsForTasks cell array of the reactions which were 0055 % essential to perform the tasks 0056 % addedRxnsForTasks cell array of the reactions which were added in 0057 % order to perform the tasks 0058 % deletedDeadEndRxns cell array of reactions deleted because they 0059 % could not carry flux (INIT requires a 0060 % functional input model) 0061 % deletedRxnsInINIT cell array of the reactions which were deleted by 0062 % the INIT algorithm 0063 % taskReport structure with the results for each task 0064 % id cell array with the id of the task 0065 % description cell array with the description of the task 0066 % ok boolean array with true if the task was successful 0067 % essential cell array with cell arrays of essential 0068 % reactions for the task 0069 % gapfill cell array of cell arrays of reactions included 0070 % in the gap-filling for the task 0071 % 0072 % This is the main function for automatic reconstruction of models based 0073 % on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all 0074 % settings are possible using this function, and you may want to call the 0075 % functions scoreModel, runINIT and fitTasks individually instead. 0076 % 0077 % NOTE: Exchange metabolites should normally not be removed from the model 0078 % when using this approach, since checkTasks/fitTasks rely on putting specific 0079 % constraints for each task. The INIT algorithm will remove exchange metabolites 0080 % if any are present. Use importModel(file,false) to import a model with 0081 % exchange metabolites remaining. 0082 % 0083 % Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks... 0084 % deletedDeadEndRxns deletedRxnsInINIT taskReport]=... 0085 % getINITModel(refModel, tissue, celltype, hpaData, arrayData,... 0086 % metabolomicsData, taskFile, useScoresForTasks, printReport,... 0087 % taskStructure, params, paramsFT) 0088 % 0089 % Rasmus Agren, 2013-04-22 0090 % 0091 0092 if nargin<3 0093 celltype=[]; 0094 end 0095 if nargin<4 0096 hpaData=[]; 0097 end 0098 if nargin<5 0099 arrayData=[]; 0100 end 0101 if nargin<6 0102 metabolomicsData=[]; 0103 end 0104 if nargin<7 0105 taskFile=[]; 0106 end 0107 if nargin<8 0108 useScoresForTasks=true; 0109 end 0110 if nargin<9 0111 printReport=true; 0112 end 0113 if nargin<10 0114 taskStructure=[]; 0115 end 0116 if nargin<11 0117 params=[]; 0118 end 0119 if nargin<12 0120 paramsFT=[]; 0121 end 0122 0123 %Create the task structure if not supplied 0124 if any(taskFile) && isempty(taskStructure) 0125 taskStructure=parseTaskList(taskFile); 0126 end 0127 0128 if printReport==true 0129 if any(celltype) 0130 fprintf(['***Generating model for: ' tissue ' - ' celltype '\n']); 0131 else 0132 fprintf(['***Generating model for: ' tissue '\n']); 0133 end 0134 if ~isempty(hpaData) 0135 fprintf('-Using HPA data\n'); 0136 end 0137 if ~isempty(arrayData) 0138 fprintf('-Using array data\n'); 0139 end 0140 if ~isempty(metabolomicsData) 0141 fprintf('-Using metabolomics data\n'); 0142 end 0143 if any(taskFile) 0144 fprintf('-Using metabolic tasks\n'); 0145 end 0146 fprintf('\n'); 0147 0148 printScores(refModel,'Reference model statistics',hpaData,arrayData,tissue,celltype); 0149 end 0150 0151 %Remove dead-end reactions to speed up the optimization and to 0152 %differentiate between reactions removed by INIT and those that are 0153 %dead-end 0154 [crap, deletedDeadEndRxns]=simplifyModel(refModel,true,false,true,true,true); 0155 cModel=removeRxns(refModel,deletedDeadEndRxns,false,true); 0156 0157 %Store the connected model like this to keep track of stuff 0158 if printReport==true 0159 printScores(cModel,'Pruned model statistics',hpaData,arrayData,tissue,celltype); 0160 end 0161 0162 %If tasks have been defined, then go through them and get essential 0163 %reactions 0164 if ~isempty(taskStructure) 0165 [taskReport essentialRxnMat]=checkTasks(cModel,[],printReport,true,true,taskStructure); 0166 0167 essentialRxnsForTasks=cModel.rxns(any(essentialRxnMat,2)); 0168 0169 %Remove tasks that cannot be performed 0170 taskStructure(taskReport.ok==false)=[]; 0171 if printReport==true 0172 printScores(removeRxns(cModel,setdiff(cModel.rxns,essentialRxnsForTasks),true,true),'Reactions essential for tasks',hpaData,arrayData,tissue,celltype); 0173 end 0174 else 0175 essentialRxnsForTasks={}; 0176 end 0177 0178 %Score the connected model 0179 [rxnScores geneScores]=scoreModel(cModel,hpaData,arrayData,tissue,celltype); 0180 0181 %Run the INIT algorithm. The exchange reactions that are used in the final 0182 %reactions will be open, which doesn't fit with the last step. Therefore 0183 %delete reactions from the original model instead of taking the output. 0184 %The default implementation does not constrain reversible reactions to only 0185 %carry flux in one direction. 0186 %Runs without the constraints on reversibility and with all output allowed. 0187 %This is to reduce the complexity of the problem. 0188 [crap deletedRxnsInINIT metProduction]=runINIT(simplifyModel(cModel),rxnScores,metabolomicsData,essentialRxnsForTasks,0,true,false,params); 0189 initModel=removeRxns(cModel,deletedRxnsInINIT,true,true); 0190 if printReport==true 0191 printScores(initModel,'INIT model statistics',hpaData,arrayData,tissue,celltype); 0192 printScores(removeRxns(cModel,setdiff(cModel.rxns,deletedRxnsInINIT),true,true),'Reactions deleted by INIT',hpaData,arrayData,tissue,celltype); 0193 end 0194 0195 %The full model has exchange reactions in it. fitTasks calls on fillGaps, 0196 %which automatically removes exchange metabolites (because it assumes that 0197 %the reactions are constrained when appropriate). In this case the 0198 %uptakes/outputs are retrieved from the task sheet instead. To prevent 0199 %exchange reactions being used to fill gaps, they are delete from the 0200 %reference model here. 0201 initModel.id='INITModel'; 0202 0203 %If gaps in the model should be filled using a task list 0204 if ~isempty(taskStructure) 0205 %Remove exchange reactions and reactions already included in the INIT 0206 %model 0207 refModelNoExc=removeRxns(refModel,union(initModel.rxns,getExchangeRxns(refModel)),true,true); 0208 0209 %At this stage the model is fully connected and most of the genes with 0210 %good scores should have been included. The final gap-filling should 0211 %take the scores of the genes into account, so that "rather bad" 0212 %reactions are preferred to "very bad" reactions. However, reactions 0213 %with positive scores will be included even if they are not connected 0214 %in the current formulation. Therefore, such reactions will have to be 0215 %assigned a small negative score instead. 0216 if useScoresForTasks==true 0217 refRxnScores=scoreModel(refModelNoExc,hpaData,arrayData,tissue,celltype); 0218 [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,min(refRxnScores,-0.1),taskStructure,paramsFT); 0219 else 0220 [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,[],taskStructure,paramsFT); 0221 end 0222 if printReport==true 0223 printScores(outModel,'Functional model statistics',hpaData,arrayData,tissue,celltype); 0224 printScores(removeRxns(outModel,intersect(outModel.rxns,initModel.rxns),true,true),'Reactions added to perform the tasks',hpaData,arrayData,tissue,celltype); 0225 end 0226 0227 addedRxnsForTasks=refModelNoExc.rxns(any(addedRxnMat,2)); 0228 else 0229 outModel=initModel; 0230 addedRxnMat=[]; 0231 addedRxnsForTasks={}; 0232 end 0233 0234 %The model can now perform all the tasks defined in the task list. The 0235 %algorithm cannot deal with gene-complexes at the moment. It is therefore 0236 %ok to remove bad genes from a reaction (as long as at least one gene is 0237 %kept) 0238 model=outModel; 0239 0240 [crap I]=ismember(model.genes,cModel.genes); %All should be found 0241 %This is a little weird way to make sure that only one bad gene is included 0242 %if there are no good ones (since all -Inf==max(-Inf)) 0243 geneScores(isinf(geneScores))=-1000+rand(sum(isinf(geneScores)),1); 0244 0245 model.grRules(:)={''}; 0246 for i=1:numel(model.rxns) 0247 ids=find(model.rxnGeneMat(i,:)); 0248 if numel(ids)>1 0249 scores=geneScores(I(ids)); 0250 %Only keep the positive ones if possible 0251 model.rxnGeneMat(i,ids(~(scores>0 | scores==max(scores))))=0; 0252 end 0253 %Rewrite the grRules to be only OR 0254 if isfield(model,'grRules') 0255 J=find(model.rxnGeneMat(i,:)); 0256 for j=1:numel(J) 0257 model.grRules{i}=[model.grRules{i} '(' model.genes{J(j)} ')']; 0258 if j<numel(J) 0259 model.grRules{i}=[model.grRules{i} ' or ']; 0260 end 0261 end 0262 end 0263 end 0264 0265 %Find all genes that are not used and delete them 0266 I=sum(model.rxnGeneMat)==0; 0267 model.genes(I)=[]; 0268 model.rxnGeneMat(:,I)=[]; 0269 if isfield(model,'geneShortNames') 0270 model.geneShortNames(I)=[]; 0271 end 0272 if isfield(model,'geneMiriams') 0273 model.geneMiriams(I)=[]; 0274 end 0275 if isfield(model,'geneFrom') 0276 model.geneFrom(I)=[]; 0277 end 0278 0279 %At this stage the model will contain some exchange reactions but probably not all 0280 %(and maybe zero). This can be inconvenient, so all exchange reactions from the 0281 %reference model are added, except for those which involve metabolites that 0282 %are not in the model. 0283 0284 %First delete and included exchange reactions in order to prevent the order 0285 %from changing 0286 model=removeRxns(model,getExchangeRxns(model)); 0287 0288 %Create a model with only the exchange reactions in refModel 0289 excModel=removeRxns(refModel,setdiff(refModel.rxns,getExchangeRxns(refModel)),true,true); 0290 0291 %Find the metabolites there which are not exchange metabolites and which do 0292 %not exist in the output model 0293 I=~ismember(excModel.mets,model.mets) & excModel.unconstrained==0; 0294 0295 %Then find those reactions and delete them 0296 [crap J]=find(excModel.S(I,:)); 0297 excModel=removeRxns(excModel,J,true,true); 0298 0299 %Merge with the output model 0300 model=mergeModels({model;excModel}); 0301 model.id='INITModel'; 0302 model.description=['Automatically generated model for ' tissue]; 0303 if any(celltype) 0304 model.description=[model.description ' - ' celltype]; 0305 end 0306 0307 if printReport==true 0308 printScores(model,'Final model statistics',hpaData,arrayData,tissue,celltype); 0309 end 0310 0311 %Add information about essential reactions and reactions included for 0312 %gap-filling and return a taskReport 0313 I=find(taskReport.ok); %Ignore failed tasks 0314 for i=1:numel(I) 0315 taskReport.essential{I(i),1}=cModel.rxns(essentialRxnMat(:,I(i))); 0316 taskReport.gapfill{I(i),1}=refModelNoExc.rxns(addedRxnMat(:,i)); 0317 end 0318 end 0319 0320 %This is for printing a summary of a model 0321 function [rxnS geneS]=printScores(model,name,hpaData,arrayData,tissue,celltype) 0322 [a b]=scoreModel(model,hpaData,arrayData,tissue,celltype); 0323 rxnS=mean(a); 0324 geneS=mean(b(~isinf(b))); 0325 fprintf([name ':\n']); 0326 fprintf(['\t' num2str(numel(model.rxns)) ' reactions, ' num2str(numel(model.genes)) ' genes\n']); 0327 fprintf(['\tMean reaction score: ' num2str(rxnS) '\n']); 0328 fprintf(['\tMean gene score: ' num2str(geneS) '\n']); 0329 fprintf(['\tReactions with positive scores: ' num2str(100*sum(a>0)/numel(a)) '%%\n\n']); 0330 end