Home > RAVEN > importModel.m

importModel

PURPOSE ^

importModel

SYNOPSIS ^

function model=importModel(fileName,removeExcMets,isCOBRA)

DESCRIPTION ^

 importModel
   Import a constraint-based model from a SBML file

   fileName      a SBML file to import
   removeExcMets true if exchange metabolites should be removed. This is
                 needed to be able to run simulations, but it could also 
                 be done using simplifyModel at a later stage (opt,
                 default true)
   isCOBRA       true if the SBML-file is a COBRA Toolbox file (opt, default
                 false)

   model
       description      description of model contents
       id               model ID
       rxns             reaction ids
       mets             metabolite ids
       S                stoichiometric matrix
       lb               lower bounds
       ub               upper bounds
       rev              reversibility vector
       c                objective coefficients
       b                equality constraints for the metabolite equations
       comps            compartment ids
       compNames        compartment names
       compOutside      the id (as in comps) for the compartment
                        surrounding each of the compartments
       rxnNames         reaction description
       metNames         metabolite description
       metComps         compartments for metabolites
       genes            list of all genes
       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
       grRules          reaction to gene rules in text form
       metFormulas      metabolite chemical formula
       geneShortName    abbreviations for proteins encoded by the genes
       compMiriams      structure with MIRIAM information about the
                        compartments
       subSystems       subsystem name for each reaction
       inchis           InChI-codes for metabolites
       eccodes          EC-codes for the reactions
       rxnMiriams       structure with MIRIAM information about the reactions
       metMiriams       structure with MIRIAM information about the metabolites
       geneMiriams      structure with MIRIAM information about the genes
       unconstrained    true if the metabolite is an exchange metabolite

   Loads models in the COBRA Toolbox format and in the format used in
   the yeast consensus reconstruction. The resulting structure is compatible
   with COBRA Toolbox.

   NOTE: This script requires that libSBML is installed.
   
   NOTE: All IDs are assumed to be named C_, M_, E_, R_ for compartments, 
         metabolites, genes, and reactions. This is true for models
         generated by SBMLFromExcel and those that follow the yeast
         consensus network model formulation.

   Usage: model=importModel(fileName,removeExcMets,isCOBRA)

   Rasmus Agren, 2013-04-19

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function model=importModel(fileName,removeExcMets,isCOBRA)
0002 % importModel
0003 %   Import a constraint-based model from a SBML file
0004 %
0005 %   fileName      a SBML file to import
0006 %   removeExcMets true if exchange metabolites should be removed. This is
0007 %                 needed to be able to run simulations, but it could also
0008 %                 be done using simplifyModel at a later stage (opt,
0009 %                 default true)
0010 %   isCOBRA       true if the SBML-file is a COBRA Toolbox file (opt, default
0011 %                 false)
0012 %
0013 %   model
0014 %       description      description of model contents
0015 %       id               model ID
0016 %       rxns             reaction ids
0017 %       mets             metabolite ids
0018 %       S                stoichiometric matrix
0019 %       lb               lower bounds
0020 %       ub               upper bounds
0021 %       rev              reversibility vector
0022 %       c                objective coefficients
0023 %       b                equality constraints for the metabolite equations
0024 %       comps            compartment ids
0025 %       compNames        compartment names
0026 %       compOutside      the id (as in comps) for the compartment
0027 %                        surrounding each of the compartments
0028 %       rxnNames         reaction description
0029 %       metNames         metabolite description
0030 %       metComps         compartments for metabolites
0031 %       genes            list of all genes
0032 %       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
0033 %       grRules          reaction to gene rules in text form
0034 %       metFormulas      metabolite chemical formula
0035 %       geneShortName    abbreviations for proteins encoded by the genes
0036 %       compMiriams      structure with MIRIAM information about the
0037 %                        compartments
0038 %       subSystems       subsystem name for each reaction
0039 %       inchis           InChI-codes for metabolites
0040 %       eccodes          EC-codes for the reactions
0041 %       rxnMiriams       structure with MIRIAM information about the reactions
0042 %       metMiriams       structure with MIRIAM information about the metabolites
0043 %       geneMiriams      structure with MIRIAM information about the genes
0044 %       unconstrained    true if the metabolite is an exchange metabolite
0045 %
0046 %   Loads models in the COBRA Toolbox format and in the format used in
0047 %   the yeast consensus reconstruction. The resulting structure is compatible
0048 %   with COBRA Toolbox.
0049 %
0050 %   NOTE: This script requires that libSBML is installed.
0051 %
0052 %   NOTE: All IDs are assumed to be named C_, M_, E_, R_ for compartments,
0053 %         metabolites, genes, and reactions. This is true for models
0054 %         generated by SBMLFromExcel and those that follow the yeast
0055 %         consensus network model formulation.
0056 %
0057 %   Usage: model=importModel(fileName,removeExcMets,isCOBRA)
0058 %
0059 %   Rasmus Agren, 2013-04-19
0060 %
0061 
0062 if nargin<2
0063     removeExcMets=true;
0064 end
0065 
0066 if nargin<3
0067     isCOBRA=false;
0068 end
0069 
0070 %Load the model using libSBML
0071 
0072 modelSBML = TranslateSBML(fileName);
0073 
0074 if isempty(modelSBML)
0075    throw(MException('','There is a problem with the SBML file. Try using the SBML Validator at http://sbml.org/Facilities/Validator'));
0076 end
0077 
0078 %Retrieve compartment names and IDs
0079 compartmentNames=cell(numel(modelSBML.compartment),1);
0080 compartmentIDs=cell(numel(modelSBML.compartment),1);
0081 compartmentOutside=cell(numel(modelSBML.compartment),1);
0082 compartmentMiriams=cell(numel(modelSBML.compartment),1);
0083 
0084 for i=1:numel(modelSBML.compartment)
0085     compartmentNames{i}=modelSBML.compartment(i).name;
0086     if isCOBRA==true
0087         compartmentIDs{i}=modelSBML.compartment(i).id;
0088     else
0089         compartmentIDs{i}=modelSBML.compartment(i).id(3:end);
0090     end
0091     if isfield(modelSBML.compartment(i),'outside')
0092         if ~isempty(modelSBML.compartment(i).outside)
0093             if isCOBRA==true
0094                 compartmentOutside{i}=modelSBML.compartment(i).outside;
0095             else
0096                 compartmentOutside{i}=modelSBML.compartment(i).outside(3:end);
0097             end
0098         else
0099             compartmentOutside{i}='';
0100         end
0101     else
0102         compartmentOutside{i}=[];
0103     end
0104     
0105     if isfield(modelSBML.compartment(i),'annotation')
0106         compartmentMiriams{i}=parseMiriam(modelSBML.compartment(i).annotation);
0107     else
0108         compartmentMiriams{i}=[];
0109     end
0110 end
0111 
0112 %If there are no compartment names then use compartment id as name
0113 if all(cellfun(@isempty,compartmentNames))
0114     compartmentNames=compartmentIDs;
0115 end
0116 
0117 %Retrieve info on metabolites, genes, complexes
0118 metaboliteNames={};
0119 metaboliteIDs={};
0120 metaboliteCompartments={};
0121 metaboliteUnconstrained=[];
0122 metaboliteFormula={};
0123 metaboliteInChI={};
0124 metaboliteMiriams={};
0125 
0126 geneNames={};
0127 geneIDs={};
0128 geneMiriams={};
0129 geneShortNames={};
0130 complexIDs={};
0131 complexNames={};
0132 
0133 %If the file is not a COBRA Toolbox model. According to the format
0134 %specified in the yeast consensus model both metabolites and genes are a
0135 %type of 'species'. The metabolites have names starting with 'M_' and genes
0136 %with 'E_'.
0137 if isCOBRA==false
0138     for i=1:numel(modelSBML.species) 
0139         if strcmpi(modelSBML.species(i).id(1:2),'M_')
0140             metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name;
0141             metaboliteIDs{numel(metaboliteIDs)+1,1}=modelSBML.species(i).id(3:end);
0142             metaboliteCompartments{numel(metaboliteCompartments)+1,1}=modelSBML.species(i).compartment(3:end);
0143             metaboliteUnconstrained(numel(metaboliteUnconstrained)+1,1)=modelSBML.species(i).boundaryCondition;
0144 
0145             %For each metabolite retrieve the formula and the InChI code if
0146             %available
0147             %First add the InChI code and the formula from the InChI. This
0148             %allows for overwriting the formula by setting the actual formula
0149             %field
0150             if ~isempty(modelSBML.species(i).annotation)
0151                 %Get the formula if available
0152                 startString='>InChI=';
0153                 endString='</in:inchi>';
0154                 formStart=strfind(modelSBML.species(i).annotation,startString);
0155                 if ~isempty(formStart)
0156                     formEnd=strfind(modelSBML.species(i).annotation,endString);
0157                     formEndIndex=find(formEnd>formStart, 1 );
0158                     formula=modelSBML.species(i).annotation(formStart+numel(startString):formEnd(formEndIndex)-1);
0159                     metaboliteInChI{numel(metaboliteInChI)+1,1}=formula;
0160 
0161                     %The composition is most often present between the first
0162                     %and second "/" in the model. In some simple molecules,
0163                     %such as salts, there is no second "/". The formula is then
0164                     %assumed to be to the end of the string
0165                     compositionIndexes=strfind(formula,'/');
0166                     if numel(compositionIndexes)>1
0167                         metaboliteFormula{numel(metaboliteFormula)+1,1}=...
0168                             formula(compositionIndexes(1)+1:compositionIndexes(2)-1);
0169                     else
0170                         if numel(compositionIndexes)==1
0171                             %Probably a simple molecule which can have only one
0172                             %conformation
0173                             metaboliteFormula{numel(metaboliteFormula)+1,1}=...
0174                             formula(compositionIndexes(1)+1:numel(formula));
0175                         else
0176                             metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0177                         end
0178                     end
0179                 else
0180                     metaboliteInChI{numel(metaboliteInChI)+1,1}='';
0181                     metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0182                 end
0183                 
0184                 %Get Miriam info
0185                 metMiriam=parseMiriam(modelSBML.species(i).annotation);
0186                 metaboliteMiriams{numel(metaboliteMiriams)+1,1}=metMiriam;
0187             else
0188                 metaboliteInChI{numel(metaboliteInChI)+1,1}='';
0189                 metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0190                 metaboliteMiriams{numel(metaboliteMiriams)+1,1}=[];
0191             end
0192 
0193             if ~isempty(modelSBML.species(i).notes)
0194                 %Get the formula if available
0195                 startString='FORMULA:';
0196                 endString='</';
0197                 formStart=strfind(modelSBML.species(i).notes,startString);
0198                 if ~isempty(formStart)
0199                     formEnd=strfind(modelSBML.species(i).notes,endString);
0200                     formEndIndex=find(formEnd>formStart, 1 );
0201                     formula=strtrim(modelSBML.species(i).notes(formStart+numel(startString):formEnd(formEndIndex)-1));
0202                     metaboliteFormula{numel(metaboliteFormula),1}=formula;
0203                 end
0204             end
0205         end
0206 
0207         if strcmpi(modelSBML.species(i).id(1:2),'E_')
0208             geneNames{numel(geneNames)+1,1}=modelSBML.species(i).name;
0209             
0210             %The "E_" is included in the ID. This is because it's only used
0211             %internally in this file and it makes the matching a little
0212             %smoother
0213             geneIDs{numel(geneIDs)+1,1}=modelSBML.species(i).id;
0214             
0215             %Get Miriam structure
0216             if isfield(modelSBML.species(i),'annotation')
0217                 %Get Miriam info
0218                 geneMiriam=parseMiriam(modelSBML.species(i).annotation);
0219                 geneMiriams{numel(geneMiriams)+1,1}=geneMiriam;
0220             else
0221                 geneMiriams{numel(geneMiriams)+1,1}=[];
0222             end
0223             
0224             %Protein short names (for example ERG10) are saved as SHORT
0225             %NAME: NAME in the notes-section of metabolites for the "new
0226             %format" and as PROTEIN_ASSOCIATION for each reaction in COBRA
0227             %Toolbox format. For now only the SHORT NAME is loaded, and no
0228             %mapping takes place
0229             if ~isempty(modelSBML.species(i).notes)
0230                 %Get the short name if available
0231                 startString='SHORT NAME:';
0232                 endString='</';
0233                 shortStart=strfind(modelSBML.species(i).notes,startString);
0234                 if ~isempty(shortStart)
0235                     shortEnd=strfind(modelSBML.species(i).notes,endString);
0236                     shortEndIndex=find(shortEnd>shortStart, 1 );
0237                     shortName=strtrim(modelSBML.species(i).notes(shortStart+numel(startString):shortEnd(shortEndIndex)-1));
0238                     geneShortNames{numel(geneShortNames)+1,1}=shortName;
0239                 else
0240                     geneShortNames{numel(geneShortNames)+1,1}='';
0241                 end
0242             else
0243                 geneShortNames{numel(geneShortNames)+1,1}='';
0244             end
0245         end
0246         
0247         %If it's a complex keep the ID and name
0248         if strcmpi(modelSBML.species(i).id(1:3),'Cx_')
0249             complexIDs=[complexIDs;modelSBML.species(i).id];
0250             complexNames=[complexNames;modelSBML.species(i).name];
0251         end        
0252     end
0253     
0254 %If it's a COBRA Toolbox file
0255 else
0256     for i=1:numel(modelSBML.species) 
0257         %The metabolite names are assumed to be M_NAME_COMPOSITION or
0258         %_NAME_COMPOSITION or NAME_COMPOSITION or NAME
0259         underscoreIndex=strfind(modelSBML.species(i).name,'_');
0260 
0261         %Skip the first character if it's an underscore
0262         if any(underscoreIndex)
0263             if underscoreIndex(1)==1
0264                 metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(2:max(underscoreIndex)-1);
0265             else
0266                 %If the second character is an underscore than check if the
0267                 %first is "M".
0268                 if underscoreIndex(1)==2
0269                     if modelSBML.species(i).name(1)=='M'
0270                         metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(3:max(underscoreIndex)-1);
0271                     else
0272                         %If not, then use the full name
0273                         metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(1:max(underscoreIndex)-1);
0274                     end
0275                 else
0276                     %Use the full name
0277                     metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name(1:max(underscoreIndex)-1);
0278                 end
0279             end
0280         else
0281             %Use the full name
0282             metaboliteNames{numel(metaboliteNames)+1,1}=modelSBML.species(i).name;
0283         end
0284         
0285         metaboliteIDs{numel(metaboliteIDs)+1,1}=modelSBML.species(i).id(3:numel(modelSBML.species(i).id));
0286         metaboliteCompartments{numel(metaboliteCompartments)+1,1}=modelSBML.species(i).compartment;
0287         
0288         %I think that COBRA doesn't set the boundary condition, but rather
0289         %uses name_b. Check for either
0290         metaboliteUnconstrained(numel(metaboliteUnconstrained)+1,1)=modelSBML.species(i).boundaryCondition;
0291         if strcmp(metaboliteIDs{end}(max(end-1,1):end),'_b')
0292             metaboliteUnconstrained(end)=1;
0293         end
0294         
0295         %Get the formula
0296         if max(underscoreIndex)<length(modelSBML.species(i).name)
0297             metaboliteFormula{numel(metaboliteFormula)+1,1}=modelSBML.species(i).name(max(underscoreIndex)+1:length(modelSBML.species(i).name));
0298         else
0299             metaboliteFormula{numel(metaboliteFormula)+1,1}='';
0300         end
0301         
0302         %The newer COBRA version sometimes has composition information in
0303         %the notes instead
0304         if ~isempty(modelSBML.species(i).notes)
0305             %Get the formula if available
0306             startString='FORMULA:';
0307             endString='</';
0308             formStart=strfind(modelSBML.species(i).notes,startString);
0309             if ~isempty(formStart)
0310                 formEnd=strfind(modelSBML.species(i).notes,endString);
0311                 formEndIndex=find(formEnd>formStart, 1 );
0312                 formula=strtrim(modelSBML.species(i).notes(formStart+numel(startString):formEnd(formEndIndex)-1));
0313                 metaboliteFormula{numel(metaboliteFormula),1}=formula;
0314             end
0315         end
0316     end
0317 end
0318 
0319 %Retrieve info on reactions
0320 reactionNames=cell(numel(modelSBML.reaction),1);
0321 reactionIDs=cell(numel(modelSBML.reaction),1);
0322 subsystems=cell(numel(modelSBML.reaction),1);
0323 subsystems(:,:)=cellstr('');
0324 eccodes=cell(numel(modelSBML.reaction),1);
0325 eccodes(:,:)=cellstr('');
0326 grRules=cell(numel(modelSBML.reaction),1);
0327 grRules(:,:)=cellstr('');
0328 rxnMiriams=cell(numel(modelSBML.reaction),1);
0329 reactionReversibility=zeros(numel(modelSBML.reaction),1);
0330 reactionUB=zeros(numel(modelSBML.reaction),1);
0331 reactionLB=zeros(numel(modelSBML.reaction),1);
0332 reactionObjective=zeros(numel(modelSBML.reaction),1);
0333 
0334 %Construct the stoichiometric matrix while the reaction info is read
0335 S=sparse(numel(metaboliteIDs),numel(modelSBML.reaction));
0336 
0337 %This is for collecting all genes before getting a unique list (only used
0338 %in the COBRA format). The reason is to avoid too many calls to strmatch
0339 tempGeneList={};
0340 
0341 counter=0;
0342 for i=1:numel(modelSBML.reaction)
0343     
0344     %Check that the reaction doesn't produce a complex and nothing else.
0345     %If so, then jump to the next reaction. This is because I get the
0346     %genes for complexes from the names and not from the reactions that
0347     %create them. This only applies to the non-COBRA format.
0348     if numel(modelSBML.reaction(i).product)==1
0349         if strcmp(modelSBML.reaction(i).product(1).species(1:3),'Cx_')==true
0350             continue;
0351         end
0352     end
0353     
0354     %It didn't look like a gene complex-forming reaction
0355     counter=counter+1;
0356     
0357     if isCOBRA && numel(modelSBML.reaction(i).name)>2
0358         %In COBRA the reaction names starts with "R_" but I check to be
0359         %sure
0360         if strcmp(modelSBML.reaction(i).name(1:2),'R_')
0361             reactionNames{counter}=modelSBML.reaction(i).name(3:end);
0362         else
0363             reactionNames{counter}=modelSBML.reaction(i).name;
0364         end
0365     else
0366         reactionNames{counter}=modelSBML.reaction(i).name;
0367     end
0368     
0369     %Assumes that the ID starts with "R_"
0370     reactionIDs{counter}=modelSBML.reaction(i).id(3:end);
0371     reactionReversibility(counter)=modelSBML.reaction(i).reversible;
0372     
0373     %The order of these parameters should not be hard coded
0374     if isfield(modelSBML.reaction(i).kineticLaw,'parameter')
0375         reactionLB(counter)=modelSBML.reaction(i).kineticLaw.parameter(1).value;
0376         reactionUB(counter)=modelSBML.reaction(i).kineticLaw.parameter(2).value;
0377         reactionObjective(counter)=modelSBML.reaction(i).kineticLaw.parameter(3).value;
0378     else
0379         if reactionReversibility(counter)==true
0380             reactionLB(counter)=-inf;
0381         else
0382             reactionLB(counter)=0;
0383         end
0384         reactionUB(counter)=inf;
0385         reactionObjective(counter)=0;
0386     end
0387     
0388     %Find the associated gene if available
0389     if isCOBRA==false
0390         if isfield(modelSBML.reaction(i),'modifier')
0391             if ~isempty(modelSBML.reaction(i).modifier)
0392                 rules='';
0393                 for j=1:numel(modelSBML.reaction(i).modifier)
0394                     modifier=modelSBML.reaction(i).modifier(j).species;
0395                     if ~isempty(modifier)
0396                         if strfind(modifier,'E_')
0397                             index=find(strcmp(modifier,geneIDs));
0398                             %This should be unique and in the geneIDs list, otherwise
0399                             %something is wrong
0400                             if numel(index)~=1
0401                                throw(MException('',['Could not get the gene association data from reaction ' reactionIDs{i}])); 
0402                             end
0403                             %Add the association
0404                             %rxnGeneMat(i,index)=1;
0405                             if ~isempty(rules)
0406                                 rules=[rules ' or (' geneNames{index} ')'];
0407                             else
0408                                 rules=['(' geneNames{index} ')'];
0409                             end
0410                         else
0411                            %It seems to be a complex. Add the corresponding
0412                            %genes from the name of the complex (not the
0413                            %reaction that creates it)
0414                            index=find(strcmp(modifier,complexIDs));
0415                            if numel(index)==1
0416                                if ~isempty(rules)
0417                                     rules=[rules ' or (' strrep(complexNames{index},':',' and ') ')'];
0418                                 else
0419                                     rules=['(' strrep(complexNames{index},':',' and ') ')'];
0420                                end
0421                            else
0422                               %Could not find a complex
0423                               throw(MException('',['Could not get the gene association data from reaction ' reactionIDs{i}]));
0424                            end
0425                         end
0426                     end
0427                 end
0428                 grRules{counter}=rules;
0429             end
0430         end
0431     else
0432         %Get gene association for COBRA Toolbox models. The genes are added
0433         %here as well as the associations. Gene complexes are ok, but only
0434         %if they are on the form (A and B and...)
0435         if ~isempty(modelSBML.reaction(i).notes)
0436             startString='GENE_ASSOCIATION:';
0437             endString='</';
0438             geneStart=strfind(modelSBML.reaction(i).notes,startString);
0439             if isempty(geneStart)
0440                 startString='GENE ASSOCIATION:';
0441                 geneStart=strfind(modelSBML.reaction(i).notes,startString);
0442             end
0443             if ~isempty(geneStart)
0444                 geneEnd=strfind(modelSBML.reaction(i).notes,endString);
0445                 geneEndIndex=find(geneEnd>geneStart, 1 );
0446                 geneAssociation=strtrim(modelSBML.reaction(i).notes(geneStart+numel(startString):geneEnd(geneEndIndex)-1));          
0447                 if ~isempty(geneAssociation)
0448                     %This adds the grRules. The gene list and rxnGeneMat
0449                     %are created later
0450                     grRules{counter}=geneAssociation;
0451                 end
0452             end
0453         end
0454     end
0455     
0456     %Add subsystems
0457     if ~isempty(modelSBML.reaction(i).notes)
0458         %Get the subsystem if available
0459         startString='SUBSYSTEM:';
0460         endString='</';
0461         subStart=strfind(modelSBML.reaction(i).notes,startString);
0462         if ~isempty(subStart)
0463             subEnd=strfind(modelSBML.reaction(i).notes,endString);
0464             subEndIndex=find(subEnd>subStart, 1 );
0465             subsystem=strtrim(modelSBML.reaction(i).notes(subStart+numel(startString):subEnd(subEndIndex)-1));
0466             subsystems{counter}=subsystem;
0467         end
0468     end
0469     
0470     %Get ec-codes
0471     flagEmpty=false;
0472     if isCOBRA==false
0473         if ~isempty(modelSBML.reaction(i).annotation)
0474             searchString=modelSBML.reaction(i).annotation;
0475             startString='urn:miriam:ec-code:';
0476             endString='"';
0477         else
0478             flagEmpty=true;
0479         end
0480     else
0481         if ~isempty(modelSBML.reaction(i).notes)
0482             searchString=modelSBML.reaction(i).notes;
0483             startString='PROTEIN_CLASS:';
0484             endString='</';
0485         else
0486             flagEmpty=true;
0487         end
0488     end
0489     
0490     if flagEmpty==false
0491         ecStart=strfind(searchString,startString);
0492         if ~isempty(ecStart)
0493             ecEnd=strfind(searchString,endString);
0494             ecEndIndex=find(ecEnd>ecStart, 1 );
0495             eccode=strtrim(searchString(ecStart+numel(startString):ecEnd(ecEndIndex)-1));
0496             eccodes{counter}=eccode;
0497         end
0498     end 
0499  
0500     %Get other Miriam fields. This may include for example database indexes
0501     %to organism-specific databases. EC-codes are supported by the COBRA
0502     %Toolbox format and are therefore loaded separately
0503     if isCOBRA==false
0504         miriamStruct=parseMiriam(modelSBML.reaction(i).annotation);
0505         rxnMiriams{counter}=miriamStruct; 
0506     end
0507     
0508     %Add all reactants
0509     for j=1:numel(modelSBML.reaction(i).reactant)
0510        %Get the index of the metabolite in metaboliteIDs. External
0511        %metabolites will be removed at a later stage
0512        %Assumes that all metabolites start with "M_"
0513        metIndex=find(strcmp(modelSBML.reaction(i).reactant(j).species(3:end),metaboliteIDs),1);
0514        if isempty(metIndex)
0515             throw(MException('',['Could not find metabolite ' modelSBML.reaction(i).reactant(j).species(3:end) ' in reaction ' reactionIDs{counter}])); 
0516        end
0517        S(metIndex,counter)=S(metIndex,counter)+modelSBML.reaction(i).reactant(j).stoichiometry*-1;
0518     end
0519     
0520     %Add all products
0521     for j=1:numel(modelSBML.reaction(i).product)
0522        %Get the index of the metabolite in metaboliteIDs.
0523        %Assumes that all metabolites start with "M_"
0524        metIndex=find(strcmp(modelSBML.reaction(i).product(j).species(3:end),metaboliteIDs),1);
0525        if isempty(metIndex)
0526             throw(MException('',['Could not find metabolite ' modelSBML.reaction(i).reactant(j).species(3:end) ' in reaction ' reactionIDs{counter}])); 
0527        end
0528        S(metIndex,counter)=S(metIndex,counter)+modelSBML.reaction(i).product(j).stoichiometry;
0529     end
0530 end
0531 
0532 %Shrink the structures if complex-forming reactions had to be skipped
0533 reactionNames=reactionNames(1:counter);
0534 reactionIDs=reactionIDs(1:counter);
0535 subsystems=subsystems(1:counter);
0536 eccodes=eccodes(1:counter);
0537 grRules=grRules(1:counter);
0538 rxnMiriams=rxnMiriams(1:counter);
0539 reactionReversibility=reactionReversibility(1:counter);
0540 reactionUB=reactionUB(1:counter);
0541 reactionLB=reactionLB(1:counter);
0542 reactionObjective=reactionObjective(1:counter);
0543 S=S(:,1:counter);
0544 
0545 model.description=modelSBML.name;
0546 model.id=modelSBML.id;
0547 model.rxns=reactionIDs;
0548 model.mets=metaboliteIDs;
0549 model.S=S;
0550 model.lb=reactionLB;
0551 model.ub=reactionUB;
0552 model.rev=reactionReversibility;
0553 model.c=reactionObjective;
0554 model.b=zeros(numel(metaboliteIDs),1);
0555 model.comps=compartmentIDs;
0556 model.compNames=compartmentNames;
0557 
0558 if any(~cellfun(@isempty,compartmentOutside))
0559     model.compOutside=compartmentOutside;
0560 end
0561 
0562 model.rxnNames=reactionNames;
0563 model.metNames=metaboliteNames;
0564 
0565 %Match the compartments for metabolites
0566 [I J]=ismember(metaboliteCompartments,model.comps);
0567 model.metComps=J;
0568 
0569 %If any genes have been loaded (only for the new format)
0570 if ~isempty(geneNames)
0571     model.genes=geneNames;
0572     model.rxnGeneMat=getGeneMat(grRules,geneNames);
0573     model.grRules=grRules;
0574 else
0575     if ~isempty(grRules)
0576        grRules=strrep(grRules,' AND ',' and ');
0577        grRules=strrep(grRules,' OR ',' or ');
0578        %In the non-COBRA version genes are surrounded by parenthesis even
0579        %if they are the only gene. Also, only single spaces are used
0580        %between genes. I'm pretty sure this is compatible with COBRA Toolbox so I
0581        %change it to be the same here.
0582        grRules=strrep(grRules,'  ',' ');
0583        grRules=strrep(grRules,'((','(');
0584        grRules=strrep(grRules,'))',')');
0585        grRules=strrep(grRules,'( ','(');
0586        grRules=strrep(grRules,' )',')');
0587        grRules=strrep(grRules,') or (','*%%%%*');
0588        grRules=strrep(grRules,' or ',') or (');
0589        grRules=strrep(grRules,'*%%%%*',') or (');
0590        
0591        %Not very neat, but add parenthesis if missing
0592        for i=1:numel(grRules)
0593           if any(grRules{i})
0594               if ~strcmp(grRules{i}(1),'(')
0595                   grRules{i}=['(' grRules{i} ')'];
0596               end
0597           end
0598        end
0599        [rxnGeneMat, genes]=getGeneMat(grRules);
0600        model.rxnGeneMat=rxnGeneMat;
0601        model.genes=genes;
0602        model.grRules=grRules;
0603     end
0604 end
0605 
0606 %If any InChIs have been loaded
0607 if any(~cellfun(@isempty,metaboliteInChI))
0608     model.inchis=metaboliteInChI;
0609 end
0610 
0611 %If any formulas have been loaded
0612 if any(~cellfun(@isempty,metaboliteFormula))
0613     model.metFormulas=metaboliteFormula;
0614 end
0615 
0616 %If any gene short names have been loaded
0617 if any(~cellfun(@isempty,geneShortNames))
0618     model.geneShortNames=geneShortNames;
0619 end
0620 
0621 %If any Miriam strings for compartments have been loaded
0622 if any(~cellfun(@isempty,compartmentMiriams))
0623     model.compMiriams=compartmentMiriams;
0624 end
0625 
0626 %If any Miriam strings for metabolites have been loaded
0627 if any(~cellfun(@isempty,metaboliteMiriams))
0628     model.metMiriams=metaboliteMiriams;
0629 end
0630 
0631 %If any subsystems have been loaded
0632 if any(~cellfun(@isempty,subsystems))
0633     model.subSystems=subsystems;
0634 end
0635 
0636 %If any ec-codes have been loaded
0637 if any(~cellfun(@isempty,eccodes))
0638     model.eccodes=eccodes;
0639 end
0640 
0641 %If any Miriam strings for reactions have been loaded
0642 if any(~cellfun(@isempty,rxnMiriams))
0643     model.rxnMiriams=rxnMiriams;
0644 end
0645 
0646 %If any Miriam strings for genes have been loaded
0647 if any(~cellfun(@isempty,geneMiriams))
0648     model.geneMiriams=geneMiriams;
0649 end
0650 
0651 model.unconstrained=metaboliteUnconstrained;
0652 
0653 if removeExcMets==true
0654     model=simplifyModel(model);
0655 end
0656 end
0657 
0658 function [rxnGeneMat, matchGenes]=getGeneMat(grRules,matchGenes)
0659 %Constructs the rxnGeneMat matrix and the cell array with gene names from
0660 %the grRules. Uses the genes in the order defined by matchGenes if supplied. No
0661 %checks are made here since that should have been made before.
0662 
0663 if nargin<2
0664     matchGenes={};
0665 end
0666 
0667 %Assume that everything that isn't a paranthesis, " AND " or " or " is a
0668 %gene name
0669 genes=strrep(grRules,'(','');
0670 genes=strrep(genes,')','');
0671 genes=strrep(genes,' or ',' ');
0672 genes=strrep(genes,' and ',' ');
0673 [crap crap crap crap crap crap genes]=regexp(genes,' ');
0674 
0675 if isempty(matchGenes)
0676     allNames={};
0677     for i=1:numel(genes)
0678         allNames=[allNames genes{i}];
0679     end
0680     matchGenes=unique(allNames)';
0681     
0682     %Remove the empty element if present
0683     if isempty(matchGenes{1})
0684         matchGenes(1)=[];
0685     end
0686 end
0687 
0688 %Create the matrix
0689 rxnGeneMat=sparse(numel(genes),numel(matchGenes));
0690 
0691 for i=1:numel(genes)
0692     if ~isempty(genes{i})
0693         for j=1:numel(genes{i})
0694             if ~isempty(genes{i}{j})
0695                 index=find(strcmp(genes{i}{j},matchGenes));
0696                 if numel(index)==1
0697                     rxnGeneMat(i,index)=1;
0698                 else
0699                     throw(MException('',['The gene ' genes{i}{j} ' could not be matched to a gene in the gene list.'])); 
0700                 end
0701             end
0702         end
0703     end
0704 end
0705 end
0706 
0707 function miriamStruct=parseMiriam(searchString)
0708 %Gets the names and values of Miriam-string. Nothing fancy at all, just to
0709 %prevent using the same code for metabolites, genes, and reactions
0710 
0711 if ~isempty(searchString)    
0712     startString='urn:miriam:';
0713     midString=':';
0714     endString='"';
0715     startIndexes=strfind(searchString,startString);
0716     midIndexes=strfind(searchString,midString);
0717     endIndexes=strfind(searchString,endString);
0718     miriamStruct=[];
0719 
0720     if ~isempty(startIndexes)
0721         counter=0;
0722         for j=1:numel(startIndexes)
0723             endIndex=find(endIndexes>startIndexes(j)+numel(startString), 1 );
0724             
0725             midIndex=find(midIndexes>startIndexes(j)+numel(startString) & midIndexes<endIndexes(endIndex),1);
0726             
0727             %It was like this before and I don't understand why!
0728             %midIndex=find(midIndexes<endIndexes(endIndex),1,'last');
0729             
0730             miriam=searchString(startIndexes(j)+numel(startString):midIndexes(midIndex)-1);
0731 
0732             %Construct the struct
0733             if ~strcmpi(miriam,'ec-code')
0734                 counter=counter+1;
0735                 miriamStruct.name{counter,1}=miriam;
0736                 if any(midIndex)
0737                     miriamStruct.value{counter,1}=searchString(midIndexes(midIndex)+1:endIndexes(endIndex)-1);
0738                 else
0739                     %This is if there is no miriam type defined, but that
0740                     %there still is some value defined
0741                     miriamStruct.value{counter,1}=searchString(startIndexes(j)+numel(startString):endIndexes(endIndex)-1);
0742                 end
0743             end
0744         end
0745     end
0746 else
0747     miriamStruct=[];
0748 end
0749 end

Generated on Tue 16-Jul-2013 21:50:02 by m2html © 2005