getINITModel Generates a model using the INIT algorithm, based on proteomics and/or transcriptomics and/or metabolomics and/or metabolic tasks. refModel a model structure. The model should be in the closed form (no exchange reactions open). Import using import(filename,false). If the model is not loaded using importModel, it might be that there is no "unconstrained" field. In that case, manually add the field like: model.unconstrained=false(numel(model.mets),1); tissue tissue to score for. Should exist in either hpaData.tissues or arrayData.tissues celltype cell type to score for. Should exist in either hpaData.celltypes or arrayData.celltypes for this tissue (opt, default is to use the best values among all the cell types for the tissue. Use [] if you want to supply more arguments) hpaData HPA data structure from parseHPA (opt if arrayData is supplied, default []) arrayData gene expression data structure (opt if hpaData is supplied, default []) genes cell array with the unique gene names tissues cell array with the tissue names. The list may not be unique, as there can be multiple cell types per tissue celltypes cell array with the cell type names for each tissue levels GENESxTISSUES array with the expression level for each gene in each tissue/celltype. NaN should be used when no measurement was performed metabolomicsData cell array with metabolite names that the model should produce (opt, default []) taskFile a task list in Excel format. See parseTaskList for details (opt, default []) useScoresForTasks true if the calculated reaction scored should be used as weights in the fitting to tasks (opt, default true) printReport true if a report should be printed to the screen (opt, default true) taskStructure task structure as from parseTaskList. Can be used as an alternative way to define tasks when Excel sheets are not suitable. Overrides taskFile (opt, default []) params parameter structure as used by getMILPParams. This is for the INIT algorithm. For the the MILP problems solved to fit tasks, see paramsFT (opt, default []) paramsFT parameter structure as used by getMILPParams. This is for the fitTasks step. For the INIT algorithm, see params (opt, default []) model the resulting model structure metProduction array that indicates which of the metabolites in metabolomicsData that could be produced. Note that this is before the gap-filling process to enable defined tasks. To see which metabolites that can be produced in the final model, use canProduce. -2: metabolite name not found in model -1: metabolite found, but it could not be produced 1: metabolite could be produced essentialRxnsForTasks cell array of the reactions which were essential to perform the tasks addedRxnsForTasks cell array of the reactions which were added in order to perform the tasks deletedDeadEndRxns cell array of reactions deleted because they could not carry flux (INIT requires a functional input model) deletedRxnsInINIT cell array of the reactions which were deleted by the INIT algorithm taskReport structure with the results for each task id cell array with the id of the task description cell array with the description of the task ok boolean array with true if the task was successful essential cell array with cell arrays of essential reactions for the task gapfill cell array of cell arrays of reactions included in the gap-filling for the task This is the main function for automatic reconstruction of models based on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all settings are possible using this function, and you may want to call the functions scoreModel, runINIT and fitTasks individually instead. NOTE: Exchange metabolites should normally not be removed from the model when using this approach, since checkTasks/fitTasks rely on putting specific constraints for each task. The INIT algorithm will remove exchange metabolites if any are present. Use importModel(file,false) to import a model with exchange metabolites remaining. Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks... deletedDeadEndRxns deletedRxnsInINIT taskReport]=... getINITModel(refModel, tissue, celltype, hpaData, arrayData,... metabolomicsData, taskFile, useScoresForTasks, printReport,... taskStructure, params, paramsFT) Rasmus Agren, 2013-08-01
0001 function [model metProduction essentialRxnsForTasks addedRxnsForTasks deletedDeadEndRxns deletedRxnsInINIT taskReport]=getINITModel(refModel, tissue, celltype, hpaData, arrayData, metabolomicsData, taskFile, useScoresForTasks, printReport, taskStructure, params, paramsFT) 0002 % getINITModel 0003 % Generates a model using the INIT algorithm, based on proteomics and/or 0004 % transcriptomics and/or metabolomics and/or metabolic tasks. 0005 % 0006 % refModel a model structure. The model should be in the 0007 % closed form (no exchange reactions open). Import 0008 % using import(filename,false). If the model is not 0009 % loaded using importModel, it might be that there 0010 % is no "unconstrained" field. In that case, 0011 % manually add the field like: 0012 % model.unconstrained=false(numel(model.mets),1); 0013 % tissue tissue to score for. Should exist in either 0014 % hpaData.tissues or arrayData.tissues 0015 % celltype cell type to score for. Should exist in either 0016 % hpaData.celltypes or arrayData.celltypes for this 0017 % tissue (opt, default is to use the best values 0018 % among all the cell types for the tissue. Use [] if 0019 % you want to supply more arguments) 0020 % hpaData HPA data structure from parseHPA (opt if arrayData is 0021 % supplied, default []) 0022 % arrayData gene expression data structure (opt if hpaData is 0023 % supplied, default []) 0024 % genes cell array with the unique gene names 0025 % tissues cell array with the tissue names. The list may not be 0026 % unique, as there can be multiple cell types per tissue 0027 % celltypes cell array with the cell type names for each tissue 0028 % levels GENESxTISSUES array with the expression level for 0029 % each gene in each tissue/celltype. NaN should be 0030 % used when no measurement was performed 0031 % metabolomicsData cell array with metabolite names that the model 0032 % should produce (opt, default []) 0033 % taskFile a task list in Excel format. See parseTaskList for 0034 % details (opt, default []) 0035 % useScoresForTasks true if the calculated reaction scored should be used as 0036 % weights in the fitting to tasks (opt, default true) 0037 % printReport true if a report should be printed to the screen 0038 % (opt, default true) 0039 % taskStructure task structure as from parseTaskList. Can be used 0040 % as an alternative way to define tasks when Excel 0041 % sheets are not suitable. Overrides taskFile (opt, 0042 % default []) 0043 % params parameter structure as used by getMILPParams. This is 0044 % for the INIT algorithm. For the the MILP problems 0045 % solved to fit tasks, see paramsFT (opt, default []) 0046 % paramsFT parameter structure as used by getMILPParams. This is 0047 % for the fitTasks step. For the INIT algorithm, see 0048 % params (opt, default []) 0049 % 0050 % model the resulting model structure 0051 % metProduction array that indicates which of the 0052 % metabolites in metabolomicsData that could be 0053 % produced. Note that this is before the 0054 % gap-filling process to enable defined tasks. To 0055 % see which metabolites that can be produced in 0056 % the final model, use canProduce. 0057 % -2: metabolite name not found in model 0058 % -1: metabolite found, but it could not be produced 0059 % 1: metabolite could be produced 0060 % essentialRxnsForTasks cell array of the reactions which were 0061 % essential to perform the tasks 0062 % addedRxnsForTasks cell array of the reactions which were added in 0063 % order to perform the tasks 0064 % deletedDeadEndRxns cell array of reactions deleted because they 0065 % could not carry flux (INIT requires a 0066 % functional input model) 0067 % deletedRxnsInINIT cell array of the reactions which were deleted by 0068 % the INIT algorithm 0069 % taskReport structure with the results for each task 0070 % id cell array with the id of the task 0071 % description cell array with the description of the task 0072 % ok boolean array with true if the task was successful 0073 % essential cell array with cell arrays of essential 0074 % reactions for the task 0075 % gapfill cell array of cell arrays of reactions included 0076 % in the gap-filling for the task 0077 % 0078 % This is the main function for automatic reconstruction of models based 0079 % on the INIT algorithm (PLoS Comput Biol. 2012;8(5):e1002518). Not all 0080 % settings are possible using this function, and you may want to call the 0081 % functions scoreModel, runINIT and fitTasks individually instead. 0082 % 0083 % NOTE: Exchange metabolites should normally not be removed from the model 0084 % when using this approach, since checkTasks/fitTasks rely on putting specific 0085 % constraints for each task. The INIT algorithm will remove exchange metabolites 0086 % if any are present. Use importModel(file,false) to import a model with 0087 % exchange metabolites remaining. 0088 % 0089 % Usage: [model metProduction essentialRxnsForTasks addedRxnsForTasks... 0090 % deletedDeadEndRxns deletedRxnsInINIT taskReport]=... 0091 % getINITModel(refModel, tissue, celltype, hpaData, arrayData,... 0092 % metabolomicsData, taskFile, useScoresForTasks, printReport,... 0093 % taskStructure, params, paramsFT) 0094 % 0095 % Rasmus Agren, 2013-08-01 0096 % 0097 0098 if nargin<3 0099 celltype=[]; 0100 end 0101 if nargin<4 0102 hpaData=[]; 0103 end 0104 if nargin<5 0105 arrayData=[]; 0106 end 0107 if nargin<6 0108 metabolomicsData=[]; 0109 end 0110 if nargin<7 0111 taskFile=[]; 0112 end 0113 if nargin<8 0114 useScoresForTasks=true; 0115 end 0116 if nargin<9 0117 printReport=true; 0118 end 0119 if nargin<10 0120 taskStructure=[]; 0121 end 0122 if nargin<11 0123 params=[]; 0124 end 0125 if nargin<12 0126 paramsFT=[]; 0127 end 0128 0129 %Check that the model is in the closed form 0130 if ~isfield(refModel,'unconstrained') 0131 dispEM('Exchange metabolites should normally not be removed from the model when using getINITModel. Use importModel(file,false) to import a model with exchange metabolites remaining (see the documentation for details)'); 0132 end 0133 0134 %Create the task structure if not supplied 0135 if any(taskFile) && isempty(taskStructure) 0136 taskStructure=parseTaskList(taskFile); 0137 end 0138 0139 if printReport==true 0140 if any(celltype) 0141 fprintf(['***Generating model for: ' tissue ' - ' celltype '\n']); 0142 else 0143 fprintf(['***Generating model for: ' tissue '\n']); 0144 end 0145 if ~isempty(hpaData) 0146 fprintf('-Using HPA data\n'); 0147 end 0148 if ~isempty(arrayData) 0149 fprintf('-Using array data\n'); 0150 end 0151 if ~isempty(metabolomicsData) 0152 fprintf('-Using metabolomics data\n'); 0153 end 0154 if any(taskFile) 0155 fprintf('-Using metabolic tasks\n'); 0156 end 0157 fprintf('\n'); 0158 0159 printScores(refModel,'Reference model statistics',hpaData,arrayData,tissue,celltype); 0160 end 0161 0162 %Remove dead-end reactions to speed up the optimization and to 0163 %differentiate between reactions removed by INIT and those that are 0164 %dead-end 0165 [crap, deletedDeadEndRxns]=simplifyModel(refModel,true,false,true,true,true); 0166 cModel=removeRxns(refModel,deletedDeadEndRxns,false,true); 0167 0168 %Store the connected model like this to keep track of stuff 0169 if printReport==true 0170 printScores(cModel,'Pruned model statistics',hpaData,arrayData,tissue,celltype); 0171 end 0172 0173 %If tasks have been defined, then go through them and get essential 0174 %reactions 0175 if ~isempty(taskStructure) 0176 [taskReport essentialRxnMat]=checkTasks(cModel,[],printReport,true,true,taskStructure); 0177 0178 essentialRxnsForTasks=cModel.rxns(any(essentialRxnMat,2)); 0179 0180 %Remove tasks that cannot be performed 0181 taskStructure(taskReport.ok==false)=[]; 0182 if printReport==true 0183 printScores(removeRxns(cModel,setdiff(cModel.rxns,essentialRxnsForTasks),true,true),'Reactions essential for tasks',hpaData,arrayData,tissue,celltype); 0184 end 0185 else 0186 essentialRxnsForTasks={}; 0187 end 0188 0189 %Score the connected model 0190 [rxnScores geneScores]=scoreModel(cModel,hpaData,arrayData,tissue,celltype); 0191 0192 %Run the INIT algorithm. The exchange reactions that are used in the final 0193 %reactions will be open, which doesn't fit with the last step. Therefore 0194 %delete reactions from the original model instead of taking the output. 0195 %The default implementation does not constrain reversible reactions to only 0196 %carry flux in one direction. 0197 %Runs without the constraints on reversibility and with all output allowed. 0198 %This is to reduce the complexity of the problem. 0199 [crap deletedRxnsInINIT metProduction]=runINIT(simplifyModel(cModel),rxnScores,metabolomicsData,essentialRxnsForTasks,0,true,false,params); 0200 initModel=removeRxns(cModel,deletedRxnsInINIT,true,true); 0201 if printReport==true 0202 printScores(initModel,'INIT model statistics',hpaData,arrayData,tissue,celltype); 0203 printScores(removeRxns(cModel,setdiff(cModel.rxns,deletedRxnsInINIT),true,true),'Reactions deleted by INIT',hpaData,arrayData,tissue,celltype); 0204 end 0205 0206 %The full model has exchange reactions in it. fitTasks calls on fillGaps, 0207 %which automatically removes exchange metabolites (because it assumes that 0208 %the reactions are constrained when appropriate). In this case the 0209 %uptakes/outputs are retrieved from the task sheet instead. To prevent 0210 %exchange reactions being used to fill gaps, they are delete from the 0211 %reference model here. 0212 initModel.id='INITModel'; 0213 0214 %If gaps in the model should be filled using a task list 0215 if ~isempty(taskStructure) 0216 %Remove exchange reactions and reactions already included in the INIT 0217 %model 0218 refModelNoExc=removeRxns(refModel,union(initModel.rxns,getExchangeRxns(refModel)),true,true); 0219 0220 %At this stage the model is fully connected and most of the genes with 0221 %good scores should have been included. The final gap-filling should 0222 %take the scores of the genes into account, so that "rather bad" 0223 %reactions are preferred to "very bad" reactions. However, reactions 0224 %with positive scores will be included even if they are not connected 0225 %in the current formulation. Therefore, such reactions will have to be 0226 %assigned a small negative score instead. 0227 if useScoresForTasks==true 0228 refRxnScores=scoreModel(refModelNoExc,hpaData,arrayData,tissue,celltype); 0229 [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,min(refRxnScores,-0.1),taskStructure,paramsFT); 0230 else 0231 [outModel addedRxnMat]=fitTasks(initModel,refModelNoExc,[],true,[],taskStructure,paramsFT); 0232 end 0233 if printReport==true 0234 printScores(outModel,'Functional model statistics',hpaData,arrayData,tissue,celltype); 0235 printScores(removeRxns(outModel,intersect(outModel.rxns,initModel.rxns),true,true),'Reactions added to perform the tasks',hpaData,arrayData,tissue,celltype); 0236 end 0237 0238 addedRxnsForTasks=refModelNoExc.rxns(any(addedRxnMat,2)); 0239 else 0240 outModel=initModel; 0241 addedRxnMat=[]; 0242 addedRxnsForTasks={}; 0243 end 0244 0245 %The model can now perform all the tasks defined in the task list. The 0246 %algorithm cannot deal with gene-complexes at the moment. It is therefore 0247 %ok to remove bad genes from a reaction (as long as at least one gene is 0248 %kept) 0249 model=outModel; 0250 0251 [crap I]=ismember(model.genes,cModel.genes); %All should be found 0252 %This is a little weird way to make sure that only one bad gene is included 0253 %if there are no good ones (since all -Inf==max(-Inf)) 0254 geneScores(isinf(geneScores))=-1000+rand(sum(isinf(geneScores)),1); 0255 0256 model.grRules(:)={''}; 0257 for i=1:numel(model.rxns) 0258 ids=find(model.rxnGeneMat(i,:)); 0259 if numel(ids)>1 0260 scores=geneScores(I(ids)); 0261 %Only keep the positive ones if possible 0262 model.rxnGeneMat(i,ids(~(scores>0 | scores==max(scores))))=0; 0263 end 0264 %Rewrite the grRules to be only OR 0265 if isfield(model,'grRules') 0266 J=find(model.rxnGeneMat(i,:)); 0267 for j=1:numel(J) 0268 model.grRules{i}=[model.grRules{i} '(' model.genes{J(j)} ')']; 0269 if j<numel(J) 0270 model.grRules{i}=[model.grRules{i} ' or ']; 0271 end 0272 end 0273 end 0274 end 0275 0276 %Find all genes that are not used and delete them 0277 I=sum(model.rxnGeneMat)==0; 0278 model.genes(I)=[]; 0279 model.rxnGeneMat(:,I)=[]; 0280 if isfield(model,'geneShortNames') 0281 model.geneShortNames(I)=[]; 0282 end 0283 if isfield(model,'geneMiriams') 0284 model.geneMiriams(I)=[]; 0285 end 0286 if isfield(model,'geneFrom') 0287 model.geneFrom(I)=[]; 0288 end 0289 if isfield(model,'geneComps') 0290 model.geneComps(I)=[]; 0291 end 0292 0293 %At this stage the model will contain some exchange reactions but probably not all 0294 %(and maybe zero). This can be inconvenient, so all exchange reactions from the 0295 %reference model are added, except for those which involve metabolites that 0296 %are not in the model. 0297 0298 %First delete and included exchange reactions in order to prevent the order 0299 %from changing 0300 model=removeRxns(model,getExchangeRxns(model)); 0301 0302 %Create a model with only the exchange reactions in refModel 0303 excModel=removeRxns(refModel,setdiff(refModel.rxns,getExchangeRxns(refModel)),true,true); 0304 0305 %Find the metabolites there which are not exchange metabolites and which do 0306 %not exist in the output model 0307 I=~ismember(excModel.mets,model.mets) & excModel.unconstrained==0; 0308 0309 %Then find those reactions and delete them 0310 [crap J]=find(excModel.S(I,:)); 0311 excModel=removeRxns(excModel,J,true,true); 0312 0313 %Merge with the output model 0314 model=mergeModels({model;excModel}); 0315 model.id='INITModel'; 0316 model.description=['Automatically generated model for ' tissue]; 0317 if any(celltype) 0318 model.description=[model.description ' - ' celltype]; 0319 end 0320 0321 if printReport==true 0322 printScores(model,'Final model statistics',hpaData,arrayData,tissue,celltype); 0323 end 0324 0325 %Add information about essential reactions and reactions included for 0326 %gap-filling and return a taskReport 0327 if ~isempty(taskStructure) 0328 I=find(taskReport.ok); %Ignore failed tasks 0329 for i=1:numel(I) 0330 taskReport.essential{I(i),1}=cModel.rxns(essentialRxnMat(:,I(i))); 0331 taskReport.gapfill{I(i),1}=refModelNoExc.rxns(addedRxnMat(:,i)); 0332 end 0333 else 0334 taskReport=[]; 0335 end 0336 end 0337 0338 %This is for printing a summary of a model 0339 function [rxnS geneS]=printScores(model,name,hpaData,arrayData,tissue,celltype) 0340 [a b]=scoreModel(model,hpaData,arrayData,tissue,celltype); 0341 rxnS=mean(a); 0342 geneS=mean(b(~isinf(b))); 0343 fprintf([name ':\n']); 0344 fprintf(['\t' num2str(numel(model.rxns)) ' reactions, ' num2str(numel(model.genes)) ' genes\n']); 0345 fprintf(['\tMean reaction score: ' num2str(rxnS) '\n']); 0346 fprintf(['\tMean gene score: ' num2str(geneS) '\n']); 0347 fprintf(['\tReactions with positive scores: ' num2str(100*sum(a>0)/numel(a)) '%%\n\n']); 0348 end