Home > RAVEN > importModel.m

importModel

PURPOSE ^

importModel

SYNOPSIS ^

function model=importModel(fileName,removeExcMets,isCOBRA,supressWarnings)

DESCRIPTION ^

 importModel
   Import a constraint-based model from a SBML file

   fileName        a SBML file to import
   removeExcMets   true if exchange metabolites should be removed. This is
                   needed to be able to run simulations, but it could also 
                   be done using simplifyModel at a later stage (opt,
                   default true)
   isCOBRA         true if the SBML-file is a COBRA Toolbox file (opt, default
                   false)
   supressWarnings true if warnings regarding the model structure should
                   be supressed (opt, default false)

   model
       description      description of model contents
       id               model ID
       rxns             reaction ids
       mets             metabolite ids
       S                stoichiometric matrix
       lb               lower bounds
       ub               upper bounds
       rev              reversibility vector
       c                objective coefficients
       b                equality constraints for the metabolite equations
       comps            compartment ids
       compNames        compartment names
       compOutside      the id (as in comps) for the compartment
                        surrounding each of the compartments
       compMiriams      structure with MIRIAM information about the
                        compartments
       rxnNames         reaction description
       rxnComps         compartments for reactions
       grRules          reaction to gene rules in text form
       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
       subSystems       subsystem name for each reaction
       eccodes          EC-codes for the reactions
       rxnMiriams       structure with MIRIAM information about the reactions
       genes            list of all genes
       geneComps        compartments for reactions
       geneMiriams      structure with MIRIAM information about the genes
       metNames         metabolite description
       metComps         compartments for metabolites
       inchis           InChI-codes for metabolites
       metFormulas      metabolite chemical formula
       metMiriams       structure with MIRIAM information about the metabolites
       unconstrained    true if the metabolite is an exchange metabolite

   Loads models in the COBRA Toolbox format and in the format used in
   the yeast consensus reconstruction. The resulting structure is compatible
   with COBRA Toolbox. A number of consistency checks are performed in order
   to ensure that the model is valid. Take these warnings seriously and modify the
   model structure to solve them. The RAVEN Toolbox is made to function
   only on consistent models, and the only checks performed are when the
   model is imported. You can use exportToExcelFormat, modify the model in
   Microsoft Excel and then reimport it using importExcelModel (or remake
   the SBML file using SBMLFromExcel)

   NOTE: This script requires that libSBML is installed.
   
   NOTE: All IDs are assumed to be named C_, M_, E_, R_ for compartments, 
         metabolites, genes, and reactions. This is true for models
         generated by SBMLFromExcel and those that follow the yeast
         consensus network model formulation.

   Usage: model=importModel(fileName,removeExcMets,isCOBRA,supressWarnings)

   Rasmus Agren, 2013-08-06

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function model=importModel(fileName,removeExcMets,isCOBRA,supressWarnings)
0002 % importModel
0003 %   Import a constraint-based model from a SBML file
0004 %
0005 %   fileName        a SBML file to import
0006 %   removeExcMets   true if exchange metabolites should be removed. This is
0007 %                   needed to be able to run simulations, but it could also
0008 %                   be done using simplifyModel at a later stage (opt,
0009 %                   default true)
0010 %   isCOBRA         true if the SBML-file is a COBRA Toolbox file (opt, default
0011 %                   false)
0012 %   supressWarnings true if warnings regarding the model structure should
0013 %                   be supressed (opt, default false)
0014 %
0015 %   model
0016 %       description      description of model contents
0017 %       id               model ID
0018 %       rxns             reaction ids
0019 %       mets             metabolite ids
0020 %       S                stoichiometric matrix
0021 %       lb               lower bounds
0022 %       ub               upper bounds
0023 %       rev              reversibility vector
0024 %       c                objective coefficients
0025 %       b                equality constraints for the metabolite equations
0026 %       comps            compartment ids
0027 %       compNames        compartment names
0028 %       compOutside      the id (as in comps) for the compartment
0029 %                        surrounding each of the compartments
0030 %       compMiriams      structure with MIRIAM information about the
0031 %                        compartments
0032 %       rxnNames         reaction description
0033 %       rxnComps         compartments for reactions
0034 %       grRules          reaction to gene rules in text form
0035 %       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
0036 %       subSystems       subsystem name for each reaction
0037 %       eccodes          EC-codes for the reactions
0038 %       rxnMiriams       structure with MIRIAM information about the reactions
0039 %       genes            list of all genes
0040 %       geneComps        compartments for reactions
0041 %       geneMiriams      structure with MIRIAM information about the genes
0042 %       metNames         metabolite description
0043 %       metComps         compartments for metabolites
0044 %       inchis           InChI-codes for metabolites
0045 %       metFormulas      metabolite chemical formula
0046 %       metMiriams       structure with MIRIAM information about the metabolites
0047 %       unconstrained    true if the metabolite is an exchange metabolite
0048 %
0049 %   Loads models in the COBRA Toolbox format and in the format used in
0050 %   the yeast consensus reconstruction. The resulting structure is compatible
0051 %   with COBRA Toolbox. A number of consistency checks are performed in order
0052 %   to ensure that the model is valid. Take these warnings seriously and modify the
0053 %   model structure to solve them. The RAVEN Toolbox is made to function
0054 %   only on consistent models, and the only checks performed are when the
0055 %   model is imported. You can use exportToExcelFormat, modify the model in
0056 %   Microsoft Excel and then reimport it using importExcelModel (or remake
0057 %   the SBML file using SBMLFromExcel)
0058 %
0059 %   NOTE: This script requires that libSBML is installed.
0060 %
0061 %   NOTE: All IDs are assumed to be named C_, M_, E_, R_ for compartments,
0062 %         metabolites, genes, and reactions. This is true for models
0063 %         generated by SBMLFromExcel and those that follow the yeast
0064 %         consensus network model formulation.
0065 %
0066 %   Usage: model=importModel(fileName,removeExcMets,isCOBRA,supressWarnings)
0067 %
0068 %   Rasmus Agren, 2013-08-06
0069 %
0070 
0071 if nargin<2
0072     removeExcMets=true;
0073 end
0074 
0075 if nargin<3
0076     isCOBRA=false;
0077 end
0078 
0079 if nargin<4
0080     supressWarnings=false;
0081 end
0082 
0083 %This is to match the order of the fields to those you get from importing
0084 %from Excel
0085 model=[];
0086 model.description=[];
0087 model.id=[];
0088 model.annotation=[];
0089 model.rxns={};
0090 model.mets={};
0091 model.S=[];
0092 model.lb=[];
0093 model.ub=[];
0094 model.rev=[];
0095 model.c=[];
0096 model.b=[];
0097 model.comps={};
0098 model.compNames={};
0099 model.compOutside={};
0100 model.compMiriams={};
0101 model.rxnNames={};
0102 model.rxnComps=[];
0103 model.grRules={};
0104 model.rxnGeneMat=[];
0105 model.subSystems={};
0106 model.eccodes={};
0107 model.rxnMiriams={};
0108 model.genes={};
0109 model.geneComps=[];
0110 model.geneMiriams={};
0111 model.metNames={};
0112 model.metComps=[];
0113 model.inchis={};
0114 model.metFormulas={};
0115 model.metMiriams={};
0116 model.unconstrained=[];
0117 
0118 %Load the model using libSBML
0119 modelSBML = TranslateSBML(fileName);
0120 
0121 if isempty(modelSBML)
0122    dispEM('There is a problem with the SBML file. Try using the SBML Validator at http://sbml.org/Facilities/Validator');
0123 end
0124 
0125 %Retrieve compartment names and IDs
0126 compartmentNames=cell(numel(modelSBML.compartment),1);
0127 compartmentIDs=cell(numel(modelSBML.compartment),1);
0128 compartmentOutside=cell(numel(modelSBML.compartment),1);
0129 compartmentMiriams=cell(numel(modelSBML.compartment),1);
0130 
0131 for i=1:numel(modelSBML.compartment)
0132     compartmentNames{i}=modelSBML.compartment(i).name;
0133     if isCOBRA==true
0134         compartmentIDs{i}=modelSBML.compartment(i).id;
0135     else
0136         compartmentIDs{i}=modelSBML.compartment(i).id(3:end);
0137     end
0138     if isfield(modelSBML.compartment(i),'outside')
0139         if ~isempty(modelSBML.compartment(i).outside)
0140             if isCOBRA==true
0141                 compartmentOutside{i}=modelSBML.compartment(i).outside;
0142             else
0143                 compartmentOutside{i}=modelSBML.compartment(i).outside(3:end);
0144             end
0145         else
0146             compartmentOutside{i}='';
0147         end
0148     else
0149         compartmentOutside{i}=[];
0150     end
0151     
0152     if isfield(modelSBML.compartment(i),'annotation')
0153         compartmentMiriams{i}=parseMiriam(modelSBML.compartment(i).annotation);
0154     else
0155         compartmentMiriams{i}=[];
0156     end
0157 end
0158 
0159 %If there are no compartment names then use compartment id as name
0160 if all(cellfun(@isempty,compartmentNames))
0161     compartmentNames=compartmentIDs;
0162 end
0163 
0164 %Retrieve info on metabolites, genes, complexes
0165 metaboliteNames={};
0166 metaboliteIDs={};
0167 metaboliteCompartments={};
0168 metaboliteUnconstrained=[];
0169 metaboliteFormula={};
0170 metaboliteInChI={};
0171 metaboliteMiriams={};
0172 
0173 geneNames={};
0174 geneIDs={};
0175 geneMiriams={};
0176 geneShortNames={};
0177 geneCompartments={};
0178 complexIDs={};
0179 complexNames={};
0180 
0181 %If the file is not a COBRA Toolbox model. According to the format
0182 %specified in the yeast consensus model both metabolites and genes are a
0183 %type of 'species'. The metabolites have names starting with 'M_' and genes
0184 %with 'E_'.
0185 if isCOBRA==false
0186     for i=1:numel(modelSBML.species) 
0187         if strcmpi(modelSBML.species(i).id(1:2),'M_')
0188             metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name;
0189             metaboliteIDs{numel(metaboliteIDs)+1,1}=modelSBML.species(i).id(3:end);
0190             metaboliteCompartments{numel(metaboliteCompartments)+1,1}=modelSBML.species(i).compartment(3:end);
0191             metaboliteUnconstrained(numel(metaboliteUnconstrained)+1,1)=modelSBML.species(i).boundaryCondition;
0192 
0193             %For each metabolite retrieve the formula and the InChI code if
0194             %available
0195             %First add the InChI code and the formula from the InChI. This
0196             %allows for overwriting the formula by setting the actual formula
0197             %field
0198             if ~isempty(modelSBML.species(i).annotation)
0199                 %Get the formula if available
0200                 startString='>InChI=';
0201                 endString='</in:inchi>';
0202                 formStart=strfind(modelSBML.species(i).annotation,startString);
0203                 if ~isempty(formStart)
0204                     formEnd=strfind(modelSBML.species(i).annotation,endString);
0205                     formEndIndex=find(formEnd>formStart, 1 );
0206                     formula=modelSBML.species(i).annotation(formStart+numel(startString):formEnd(formEndIndex)-1);
0207                     metaboliteInChI{numel(metaboliteInChI)+1,1}=formula;
0208 
0209                     %The composition is most often present between the first
0210                     %and second "/" in the model. In some simple molecules,
0211                     %such as salts, there is no second "/". The formula is then
0212                     %assumed to be to the end of the string
0213                     compositionIndexes=strfind(formula,'/');
0214                     if numel(compositionIndexes)>1
0215                         metaboliteFormula{numel(metaboliteFormula)+1,1}=...
0216                             formula(compositionIndexes(1)+1:compositionIndexes(2)-1);
0217                     else
0218                         if numel(compositionIndexes)==1
0219                             %Probably a simple molecule which can have only one
0220                             %conformation
0221                             metaboliteFormula{numel(metaboliteFormula)+1,1}=...
0222                             formula(compositionIndexes(1)+1:numel(formula));
0223                         else
0224                             metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0225                         end
0226                     end
0227                 else
0228                     metaboliteInChI{numel(metaboliteInChI)+1,1}='';
0229                     metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0230                 end
0231                 
0232                 %Get Miriam info
0233                 metMiriam=parseMiriam(modelSBML.species(i).annotation);
0234                 metaboliteMiriams{numel(metaboliteMiriams)+1,1}=metMiriam;
0235             else
0236                 metaboliteInChI{numel(metaboliteInChI)+1,1}='';
0237                 metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0238                 metaboliteMiriams{numel(metaboliteMiriams)+1,1}=[];
0239             end
0240 
0241             if ~isempty(modelSBML.species(i).notes)
0242                 %Get the formula if available
0243                 startString='FORMULA:';
0244                 endString='</';
0245                 formStart=strfind(modelSBML.species(i).notes,startString);
0246                 if ~isempty(formStart)
0247                     formEnd=strfind(modelSBML.species(i).notes,endString);
0248                     formEndIndex=find(formEnd>formStart, 1 );
0249                     formula=strtrim(modelSBML.species(i).notes(formStart+numel(startString):formEnd(formEndIndex)-1));
0250                     metaboliteFormula{numel(metaboliteFormula),1}=formula;
0251                 end
0252             end
0253         end
0254 
0255         if strcmpi(modelSBML.species(i).id(1:2),'E_')
0256             geneNames{numel(geneNames)+1,1}=modelSBML.species(i).name;
0257             
0258             %The "E_" is included in the ID. This is because it's only used
0259             %internally in this file and it makes the matching a little
0260             %smoother
0261             geneIDs{numel(geneIDs)+1,1}=modelSBML.species(i).id;
0262             geneCompartments{numel(geneCompartments)+1,1}=modelSBML.species(i).compartment(3:end);
0263 
0264             %Get Miriam structure
0265             if isfield(modelSBML.species(i),'annotation')
0266                 %Get Miriam info
0267                 geneMiriam=parseMiriam(modelSBML.species(i).annotation);
0268                 geneMiriams{numel(geneMiriams)+1,1}=geneMiriam;
0269             else
0270                 geneMiriams{numel(geneMiriams)+1,1}=[];
0271             end
0272             
0273             %Protein short names (for example ERG10) are saved as SHORT
0274             %NAME: NAME in the notes-section of metabolites for the "new
0275             %format" and as PROTEIN_ASSOCIATION for each reaction in COBRA
0276             %Toolbox format. For now only the SHORT NAME is loaded, and no
0277             %mapping takes place
0278             if ~isempty(modelSBML.species(i).notes)
0279                 %Get the short name if available
0280                 startString='SHORT NAME:';
0281                 endString='</';
0282                 shortStart=strfind(modelSBML.species(i).notes,startString);
0283                 if ~isempty(shortStart)
0284                     shortEnd=strfind(modelSBML.species(i).notes,endString);
0285                     shortEndIndex=find(shortEnd>shortStart, 1 );
0286                     shortName=strtrim(modelSBML.species(i).notes(shortStart+numel(startString):shortEnd(shortEndIndex)-1));
0287                     geneShortNames{numel(geneShortNames)+1,1}=shortName;
0288                 else
0289                     geneShortNames{numel(geneShortNames)+1,1}='';
0290                 end
0291             else
0292                 geneShortNames{numel(geneShortNames)+1,1}='';
0293             end
0294         end
0295         
0296         %If it's a complex keep the ID and name
0297         if strcmpi(modelSBML.species(i).id(1:3),'Cx_')
0298             complexIDs=[complexIDs;modelSBML.species(i).id];
0299             complexNames=[complexNames;modelSBML.species(i).name];
0300         end        
0301     end
0302     
0303 %If it's a COBRA Toolbox file
0304 else
0305     for i=1:numel(modelSBML.species) 
0306         %The metabolite names are assumed to be M_NAME_COMPOSITION or
0307         %_NAME_COMPOSITION or NAME_COMPOSITION or NAME
0308         underscoreIndex=strfind(modelSBML.species(i).name,'_');
0309 
0310         %Skip the first character if it's an underscore
0311         if any(underscoreIndex)
0312             if underscoreIndex(1)==1
0313                 metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(2:max(underscoreIndex)-1);
0314             else
0315                 %If the second character is an underscore than check if the
0316                 %first is "M".
0317                 if underscoreIndex(1)==2
0318                     if modelSBML.species(i).name(1)=='M'
0319                         metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(3:max(underscoreIndex)-1);
0320                     else
0321                         %If not, then use the full name
0322                         metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(1:max(underscoreIndex)-1);
0323                     end
0324                 else
0325                     %Use the full name
0326                     metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(1:max(underscoreIndex)-1);
0327                 end
0328             end
0329         else
0330             %Use the full name
0331             metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name;
0332         end
0333         
0334         metaboliteIDs{numel(metaboliteIDs)+1,1}=modelSBML.species(i).id(3:numel(modelSBML.species(i).id));
0335         metaboliteCompartments{numel(metaboliteCompartments)+1,1}=modelSBML.species(i).compartment;
0336         
0337         %I think that COBRA doesn't set the boundary condition, but rather
0338         %uses name_b. Check for either
0339         metaboliteUnconstrained(numel(metaboliteUnconstrained)+1,1)=modelSBML.species(i).boundaryCondition;
0340         if strcmp(metaboliteIDs{end}(max(end-1,1):end),'_b')
0341             metaboliteUnconstrained(end)=1;
0342         end
0343         
0344         %Get the formula
0345         if max(underscoreIndex)<length(modelSBML.species(i).name)
0346             metaboliteFormula{numel(metaboliteFormula)+1,1}=modelSBML.species(i).name(max(underscoreIndex)+1:length(modelSBML.species(i).name));
0347         else
0348             metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0349         end
0350         
0351         %The newer COBRA version sometimes has composition information in
0352         %the notes instead
0353         if ~isempty(modelSBML.species(i).notes)
0354             %Get the formula if available
0355             startString='FORMULA:';
0356             endString='</';
0357             formStart=strfind(modelSBML.species(i).notes,startString);
0358             if ~isempty(formStart)
0359                 formEnd=strfind(modelSBML.species(i).notes,endString);
0360                 formEndIndex=find(formEnd>formStart, 1 );
0361                 formula=strtrim(modelSBML.species(i).notes(formStart+numel(startString):formEnd(formEndIndex)-1));
0362                 metaboliteFormula{numel(metaboliteFormula),1}=formula;
0363             end
0364         end
0365     end
0366 end
0367 
0368 %Retrieve info on reactions
0369 reactionNames=cell(numel(modelSBML.reaction),1);
0370 reactionIDs=cell(numel(modelSBML.reaction),1);
0371 subsystems=cell(numel(modelSBML.reaction),1);
0372 subsystems(:,:)=cellstr('');
0373 eccodes=cell(numel(modelSBML.reaction),1);
0374 eccodes(:,:)=cellstr('');
0375 grRules=cell(numel(modelSBML.reaction),1);
0376 grRules(:,:)=cellstr('');
0377 rxnComps=zeros(numel(modelSBML.reaction),1);
0378 rxnMiriams=cell(numel(modelSBML.reaction),1);
0379 reactionReversibility=zeros(numel(modelSBML.reaction),1);
0380 reactionUB=zeros(numel(modelSBML.reaction),1);
0381 reactionLB=zeros(numel(modelSBML.reaction),1);
0382 reactionObjective=zeros(numel(modelSBML.reaction),1);
0383 
0384 %Construct the stoichiometric matrix while the reaction info is read
0385 S=zeros(numel(metaboliteIDs),numel(modelSBML.reaction));
0386 
0387 %This is for collecting all genes before getting a unique list (only used
0388 %in the COBRA format). The reason is to avoid too many calls to strmatch
0389 tempGeneList={};
0390 
0391 counter=0;
0392 for i=1:numel(modelSBML.reaction)
0393     
0394     %Check that the reaction doesn't produce a complex and nothing else.
0395     %If so, then jump to the next reaction. This is because I get the
0396     %genes for complexes from the names and not from the reactions that
0397     %create them. This only applies to the non-COBRA format.
0398     if numel(modelSBML.reaction(i).product)==1
0399         if strcmp(modelSBML.reaction(i).product(1).species(1:3),'Cx_')==true
0400             continue;
0401         end
0402     end
0403     
0404     %It didn't look like a gene complex-forming reaction
0405     counter=counter+1;
0406     
0407     if isCOBRA && numel(modelSBML.reaction(i).name)>2
0408         %In COBRA the reaction names starts with "R_" but I check to be
0409         %sure
0410         if strcmp(modelSBML.reaction(i).name(1:2),'R_')
0411             reactionNames{counter}=modelSBML.reaction(i).name(3:end);
0412         else
0413             reactionNames{counter}=modelSBML.reaction(i).name;
0414         end
0415     else
0416         reactionNames{counter}=modelSBML.reaction(i).name;
0417     end
0418     
0419     %Assumes that the ID starts with "R_"
0420     reactionIDs{counter}=modelSBML.reaction(i).id(3:end);
0421     reactionReversibility(counter)=modelSBML.reaction(i).reversible;
0422     
0423     %The order of these parameters should not be hard coded
0424     if isfield(modelSBML.reaction(i).kineticLaw,'parameter')
0425         reactionLB(counter)=modelSBML.reaction(i).kineticLaw.parameter(1).value;
0426         reactionUB(counter)=modelSBML.reaction(i).kineticLaw.parameter(2).value;
0427         reactionObjective(counter)=modelSBML.reaction(i).kineticLaw.parameter(3).value;
0428     else
0429         if reactionReversibility(counter)==true
0430             reactionLB(counter)=-inf;
0431         else
0432             reactionLB(counter)=0;
0433         end
0434         reactionUB(counter)=inf;
0435         reactionObjective(counter)=0;
0436     end
0437     
0438     %Find the associated gene if available
0439     if isCOBRA==false
0440         if isfield(modelSBML.reaction(i),'modifier')
0441             if ~isempty(modelSBML.reaction(i).modifier)
0442                 rules='';
0443                 for j=1:numel(modelSBML.reaction(i).modifier)
0444                     modifier=modelSBML.reaction(i).modifier(j).species;
0445                     if ~isempty(modifier)
0446                         if strfind(modifier,'E_')
0447                             index=find(strcmp(modifier,geneIDs));
0448                             %This should be unique and in the geneIDs list, otherwise
0449                             %something is wrong
0450                             if numel(index)~=1
0451                                dispEM(['Could not get the gene association data from reaction ' reactionIDs{i}]); 
0452                             end
0453                             %Add the association
0454                             %rxnGeneMat(i,index)=1;
0455                             if ~isempty(rules)
0456                                 rules=[rules ' or (' geneNames{index} ')'];
0457                             else
0458                                 rules=['(' geneNames{index} ')'];
0459                             end
0460                         else
0461                            %It seems to be a complex. Add the corresponding
0462                            %genes from the name of the complex (not the
0463                            %reaction that creates it)
0464                            index=find(strcmp(modifier,complexIDs));
0465                            if numel(index)==1
0466                                if ~isempty(rules)
0467                                     rules=[rules ' or (' strrep(complexNames{index},':',' and ') ')'];
0468                                 else
0469                                     rules=['(' strrep(complexNames{index},':',' and ') ')'];
0470                                end
0471                            else
0472                               %Could not find a complex
0473                               dispEM(['Could not get the gene association data from reaction ' reactionIDs{i}]);
0474                            end
0475                         end
0476                     end
0477                 end
0478                 grRules{counter}=rules;
0479             end
0480         end
0481     else
0482         %Get gene association for COBRA Toolbox models. The genes are added
0483         %here as well as the associations. Gene complexes are ok, but only
0484         %if they are on the form (A and B and...)
0485         if ~isempty(modelSBML.reaction(i).notes)
0486             startString='GENE_ASSOCIATION:';
0487             endString='</';
0488             geneStart=strfind(modelSBML.reaction(i).notes,startString);
0489             if isempty(geneStart)
0490                 startString='GENE ASSOCIATION:';
0491                 geneStart=strfind(modelSBML.reaction(i).notes,startString);
0492             end
0493             if ~isempty(geneStart)
0494                 geneEnd=strfind(modelSBML.reaction(i).notes,endString);
0495                 geneEndIndex=find(geneEnd>geneStart, 1 );
0496                 geneAssociation=strtrim(modelSBML.reaction(i).notes(geneStart+numel(startString):geneEnd(geneEndIndex)-1));          
0497                 if ~isempty(geneAssociation)
0498                     %This adds the grRules. The gene list and rxnGeneMat
0499                     %are created later
0500                     grRules{counter}=geneAssociation;
0501                 end
0502             end
0503         end
0504     end
0505     
0506     %Add subsystems and reaction compartment
0507     if ~isempty(modelSBML.reaction(i).notes)
0508         %Get the subsystem if available
0509         startString='SUBSYSTEM:';
0510         endString='</';
0511         subStart=strfind(modelSBML.reaction(i).notes,startString);
0512         if ~isempty(subStart)
0513             subEnd=strfind(modelSBML.reaction(i).notes,endString);
0514             subEndIndex=find(subEnd>subStart, 1 );
0515             subsystem=strtrim(modelSBML.reaction(i).notes(subStart+numel(startString):subEnd(subEndIndex)-1));
0516             subsystems{counter}=subsystem;
0517         end
0518         startString='COMPARTMENT:';
0519         endString='</';
0520         compStart=strfind(modelSBML.reaction(i).notes,startString);
0521         if ~isempty(compStart)
0522             compEnd=strfind(modelSBML.reaction(i).notes,endString);
0523             compEndIndex=find(compEnd>compStart, 1 );
0524             rxnComp=strtrim(modelSBML.reaction(i).notes(compStart+numel(startString):compEnd(compEndIndex)-1));
0525             %Find it in the compartment list
0526             [crap J]=ismember(rxnComp,compartmentIDs);
0527             rxnComps(counter)=J;
0528         end
0529     end
0530     
0531     %Get ec-codes
0532     flagEmpty=false;
0533     if isCOBRA==false
0534         if ~isempty(modelSBML.reaction(i).annotation)
0535             searchString=modelSBML.reaction(i).annotation;
0536             startString='urn:miriam:ec-code:';
0537             endString='"';
0538         else
0539             flagEmpty=true;
0540         end
0541     else
0542         if ~isempty(modelSBML.reaction(i).notes)
0543             searchString=modelSBML.reaction(i).notes;
0544             startString='PROTEIN_CLASS:';
0545             endString='</';
0546         else
0547             flagEmpty=true;
0548         end
0549     end
0550     
0551     if flagEmpty==false
0552         ecStart=strfind(searchString,startString);
0553         ecEnd=strfind(searchString,endString);
0554         %There can be several ec-codes, but they should be merged to one
0555         %string
0556         for j=1:numel(ecStart)
0557             ecEndIndex=find(ecEnd>ecStart(j), 1 );
0558             eccode=strtrim(searchString(ecStart(j)+numel(startString):ecEnd(ecEndIndex)-1));
0559             if j==1
0560                 eccodes{counter}=eccode;
0561             else
0562                 eccodes{counter}=[eccodes{counter} ';' eccode];
0563             end
0564         end
0565     end 
0566  
0567     %Get other Miriam fields. This may include for example database indexes
0568     %to organism-specific databases. EC-codes are supported by the COBRA
0569     %Toolbox format and are therefore loaded separately
0570     if isCOBRA==false
0571         miriamStruct=parseMiriam(modelSBML.reaction(i).annotation);
0572         rxnMiriams{counter}=miriamStruct; 
0573     end
0574     
0575     %Add all reactants
0576     for j=1:numel(modelSBML.reaction(i).reactant)
0577        %Get the index of the metabolite in metaboliteIDs. External
0578        %metabolites will be removed at a later stage
0579        %Assumes that all metabolites start with "M_"
0580        metIndex=find(strcmp(modelSBML.reaction(i).reactant(j).species(3:end),metaboliteIDs),1);
0581        if isempty(metIndex)
0582             dispEM(['Could not find metabolite ' modelSBML.reaction(i).reactant(j).species(3:end) ' in reaction ' reactionIDs{counter}]); 
0583        end
0584        S(metIndex,counter)=S(metIndex,counter)+modelSBML.reaction(i).reactant(j).stoichiometry*-1;
0585     end
0586     
0587     %Add all products
0588     for j=1:numel(modelSBML.reaction(i).product)
0589        %Get the index of the metabolite in metaboliteIDs.
0590        %Assumes that all metabolites start with "M_"
0591        metIndex=find(strcmp(modelSBML.reaction(i).product(j).species(3:end),metaboliteIDs),1);
0592        if isempty(metIndex)
0593             dispEM(['Could not find metabolite ' modelSBML.reaction(i).reactant(j).species(3:end) ' in reaction ' reactionIDs{counter}]); 
0594        end
0595        S(metIndex,counter)=S(metIndex,counter)+modelSBML.reaction(i).product(j).stoichiometry;
0596     end
0597 end
0598 
0599 %Shrink the structures if complex-forming reactions had to be skipped
0600 reactionNames=reactionNames(1:counter);
0601 reactionIDs=reactionIDs(1:counter);
0602 subsystems=subsystems(1:counter);
0603 eccodes=eccodes(1:counter);
0604 grRules=grRules(1:counter);
0605 rxnMiriams=rxnMiriams(1:counter);
0606 reactionReversibility=reactionReversibility(1:counter);
0607 reactionUB=reactionUB(1:counter);
0608 reactionLB=reactionLB(1:counter);
0609 reactionObjective=reactionObjective(1:counter);
0610 S=S(:,1:counter);
0611 
0612 model.description=modelSBML.name;
0613 model.id=modelSBML.id;
0614 model.rxns=reactionIDs;
0615 model.mets=metaboliteIDs;
0616 model.S=sparse(S);
0617 model.lb=reactionLB;
0618 model.ub=reactionUB;
0619 model.rev=reactionReversibility;
0620 model.c=reactionObjective;
0621 model.b=zeros(numel(metaboliteIDs),1);
0622 model.comps=compartmentIDs;
0623 model.compNames=compartmentNames;
0624 
0625 %Load annotation if available
0626 if isfield(modelSBML,'annotation')
0627     endString='</';
0628     I=strfind(modelSBML.annotation,endString);
0629     J=strfind(modelSBML.annotation,'<vCard:Family>');
0630     if any(J)
0631        model.annotation.familyName=modelSBML.annotation(J+14:I(find(I>J,1))-1);
0632     end
0633     J=strfind(modelSBML.annotation,'<vCard:Given>');
0634     if any(J)
0635        model.annotation.givenName=modelSBML.annotation(J+13:I(find(I>J,1))-1);
0636     end
0637     J=strfind(modelSBML.annotation,'<vCard:EMAIL>');
0638     if any(J)
0639        model.annotation.email=modelSBML.annotation(J+13:I(find(I>J,1))-1);
0640     end
0641     J=strfind(modelSBML.annotation,'<vCard:Orgname>');
0642     if any(J)
0643        model.annotation.organization=modelSBML.annotation(J+15:I(find(I>J,1))-1);
0644     end
0645     endString='"/>';
0646     I=strfind(modelSBML.annotation,endString);
0647     J=strfind(modelSBML.annotation,'"urn:miriam:');
0648     if any(J)
0649        model.annotation.taxonomy=modelSBML.annotation(J+12:I(find(I>J,1))-1);
0650     end
0651 end
0652 if isfield(modelSBML,'notes')
0653     startString=strfind(modelSBML.notes,'xhtml">');
0654     endString=strfind(modelSBML.notes,'</body>');
0655     if any(startString) && any(endString)
0656        model.annotation.note=modelSBML.notes(startString+7:endString-1);
0657     end
0658 end
0659 
0660 if any(~cellfun(@isempty,compartmentOutside))
0661     model.compOutside=compartmentOutside;
0662 end
0663 
0664 model.rxnNames=reactionNames;
0665 model.metNames=metaboliteNames;
0666 
0667 %Match the compartments for metabolites
0668 [I J]=ismember(metaboliteCompartments,model.comps);
0669 model.metComps=J;
0670 
0671 %If any genes have been loaded (only for the new format)
0672 if ~isempty(geneNames)
0673     model.genes=geneNames;
0674     model.rxnGeneMat=getGeneMat(grRules,geneNames);
0675     model.grRules=grRules;
0676     %Match the compartments for genes
0677     [I J]=ismember(geneCompartments,model.comps);
0678     model.geneComps=J;
0679 else
0680     if ~isempty(grRules)
0681        grRules=strrep(grRules,' AND ',' and ');
0682        grRules=strrep(grRules,' OR ',' or ');
0683        %In the non-COBRA version genes are surrounded by parenthesis even
0684        %if they are the only gene. Also, only single spaces are used
0685        %between genes. I'm pretty sure this is compatible with COBRA Toolbox so I
0686        %change it to be the same here.
0687        grRules=strrep(grRules,'  ',' ');
0688        grRules=strrep(grRules,'((','(');
0689        grRules=strrep(grRules,'))',')');
0690        grRules=strrep(grRules,'( ','(');
0691        grRules=strrep(grRules,' )',')');
0692        grRules=strrep(grRules,') or (','*%%%%*');
0693        grRules=strrep(grRules,' or ',') or (');
0694        grRules=strrep(grRules,'*%%%%*',') or (');
0695        
0696        %Not very neat, but add parenthesis if missing
0697        for i=1:numel(grRules)
0698           if any(grRules{i})
0699               if ~strcmp(grRules{i}(1),'(')
0700                   grRules{i}=['(' grRules{i} ')'];
0701               end
0702           end
0703        end
0704        [rxnGeneMat, genes]=getGeneMat(grRules);
0705        model.rxnGeneMat=rxnGeneMat;
0706        model.genes=genes;
0707        model.grRules=grRules;
0708     end
0709 end
0710 
0711 %If any InChIs have been loaded
0712 if any(~cellfun(@isempty,metaboliteInChI))
0713     model.inchis=metaboliteInChI;
0714 end
0715 
0716 %If any formulas have been loaded
0717 if any(~cellfun(@isempty,metaboliteFormula))
0718     model.metFormulas=metaboliteFormula;
0719 end
0720 
0721 %If any gene short names have been loaded
0722 if any(~cellfun(@isempty,geneShortNames))
0723     model.geneShortNames=geneShortNames;
0724 end
0725 
0726 %If any Miriam strings for compartments have been loaded
0727 if any(~cellfun(@isempty,compartmentMiriams))
0728     model.compMiriams=compartmentMiriams;
0729 end
0730 
0731 %If any Miriam strings for metabolites have been loaded
0732 if any(~cellfun(@isempty,metaboliteMiriams))
0733     model.metMiriams=metaboliteMiriams;
0734 end
0735 
0736 %If any subsystems have been loaded
0737 if any(~cellfun(@isempty,subsystems))
0738     model.subSystems=subsystems;
0739 end
0740 if any(rxnComps)
0741    if all(rxnComps)
0742        model.rxnComps=rxnComps;
0743    else
0744        if supressWarnings==false
0745             dispEM('The compartments for the following reactions could not be matched. Ignoring reaction compartment information',false,model.rxns(rxnComps==0));
0746        end
0747    end
0748 end
0749 
0750 %If any ec-codes have been loaded
0751 if any(~cellfun(@isempty,eccodes))
0752     model.eccodes=eccodes;
0753 end
0754 
0755 %If any Miriam strings for reactions have been loaded
0756 if any(~cellfun(@isempty,rxnMiriams))
0757     model.rxnMiriams=rxnMiriams;
0758 end
0759 
0760 %If any Miriam strings for genes have been loaded
0761 if any(~cellfun(@isempty,geneMiriams))
0762     model.geneMiriams=geneMiriams;
0763 end
0764 
0765 model.unconstrained=metaboliteUnconstrained;
0766 
0767 %Remove unused fields
0768 if isempty(model.annotation)
0769     model=rmfield(model,'annotation');
0770 end
0771 if isempty(model.compOutside)
0772     model=rmfield(model,'compOutside');
0773 end
0774 if isempty(model.compMiriams)
0775     model=rmfield(model,'compMiriams');
0776 end
0777 if isempty(model.rxnComps)
0778     model=rmfield(model,'rxnComps');
0779 end
0780 if isempty(model.grRules)
0781     model=rmfield(model,'grRules');
0782 end
0783 if isempty(model.rxnGeneMat)
0784     model=rmfield(model,'rxnGeneMat');
0785 end
0786 if isempty(model.subSystems)
0787     model=rmfield(model,'subSystems');
0788 end
0789 if isempty(model.eccodes)
0790     model=rmfield(model,'eccodes');
0791 end
0792 if isempty(model.rxnMiriams)
0793     model=rmfield(model,'rxnMiriams');
0794 end
0795 if isempty(model.genes)
0796     model=rmfield(model,'genes');
0797 end
0798 if isempty(model.geneComps)
0799     model=rmfield(model,'geneComps');
0800 end
0801 if isempty(model.geneMiriams)
0802     model=rmfield(model,'geneMiriams');
0803 end
0804 if isempty(model.inchis)
0805     model=rmfield(model,'inchis');
0806 end
0807 if isempty(model.metFormulas)
0808     model=rmfield(model,'metFormulas');
0809 end
0810 if isempty(model.metMiriams)
0811     model=rmfield(model,'metMiriams');
0812 end
0813 
0814 %This just removes the grRules if no genes have been loaded
0815 if ~isfield(model,'genes') && isfield(model,'grRules')
0816    model=rmfield(model,'grRules'); 
0817 end
0818 
0819 %Print warnings about bad structure
0820 if supressWarnings==false
0821     checkModelStruct(model,false);
0822 end
0823 
0824 if removeExcMets==true
0825     model=simplifyModel(model);
0826 end
0827 end
0828 
0829 function [rxnGeneMat, matchGenes]=getGeneMat(grRules,matchGenes)
0830 %Constructs the rxnGeneMat matrix and the cell array with gene names from
0831 %the grRules. Uses the genes in the order defined by matchGenes if supplied. No
0832 %checks are made here since that should have been made before.
0833 
0834 if nargin<2
0835     matchGenes={};
0836 end
0837 
0838 %Assume that everything that isn't a paranthesis, " AND " or " or " is a
0839 %gene name
0840 genes=strrep(grRules,'(','');
0841 genes=strrep(genes,')','');
0842 genes=strrep(genes,' or ',' ');
0843 genes=strrep(genes,' and ',' ');
0844 [crap crap crap crap crap crap genes]=regexp(genes,' ');
0845 
0846 if isempty(matchGenes)
0847     allNames={};
0848     for i=1:numel(genes)
0849         allNames=[allNames genes{i}];
0850     end
0851     matchGenes=unique(allNames)';
0852     
0853     %Remove the empty element if present
0854     if isempty(matchGenes{1})
0855         matchGenes(1)=[];
0856     end
0857 end
0858 
0859 %Create the matrix
0860 rxnGeneMat=zeros(numel(genes),numel(matchGenes));
0861 
0862 for i=1:numel(genes)
0863     if ~isempty(genes{i})
0864         for j=1:numel(genes{i})
0865             if ~isempty(genes{i}{j})
0866                 index=find(strcmp(genes{i}{j},matchGenes));
0867                 if numel(index)==1
0868                     rxnGeneMat(i,index)=1;
0869                 else
0870                     dispEM(['The gene ' genes{i}{j} ' could not be matched to a gene in the gene list.']); 
0871                 end
0872             end
0873         end
0874     end
0875 end
0876 rxnGeneMat=sparse(rxnGeneMat);
0877 end
0878 
0879 function miriamStruct=parseMiriam(searchString)
0880 %Gets the names and values of Miriam-string. Nothing fancy at all, just to
0881 %prevent using the same code for metabolites, genes, and reactions
0882 
0883 if ~isempty(searchString)    
0884     startString='urn:miriam:';
0885     midString=':';
0886     endString='"';
0887     startIndexes=strfind(searchString,startString);
0888     midIndexes=strfind(searchString,midString);
0889     endIndexes=strfind(searchString,endString);
0890     miriamStruct=[];
0891 
0892     if ~isempty(startIndexes)
0893         counter=0;
0894         for j=1:numel(startIndexes)
0895             endIndex=find(endIndexes>startIndexes(j)+numel(startString), 1 );
0896             
0897             midIndex=find(midIndexes>startIndexes(j)+numel(startString) & midIndexes<endIndexes(endIndex),1);
0898             
0899             %It was like this before and I don't understand why!
0900             %midIndex=find(midIndexes<endIndexes(endIndex),1,'last');
0901             
0902             miriam=searchString(startIndexes(j)+numel(startString):midIndexes(midIndex)-1);
0903 
0904             %Construct the struct
0905             if ~strcmpi(miriam,'ec-code')
0906                 counter=counter+1;
0907                 miriamStruct.name{counter,1}=miriam;
0908                 if any(midIndex)
0909                     miriamStruct.value{counter,1}=searchString(midIndexes(midIndex)+1:endIndexes(endIndex)-1);
0910                 else
0911                     %This is if there is no miriam type defined, but that
0912                     %there still is some value defined
0913                     miriamStruct.value{counter,1}=searchString(startIndexes(j)+numel(startString):endIndexes(endIndex)-1);
0914                 end
0915             end
0916         end
0917     end
0918 else
0919     miriamStruct=[];
0920 end
0921 end

Generated on Mon 06-Jan-2014 14:58:12 by m2html © 2005