Home > RAVEN > getRxnsFromKEGG.m

getRxnsFromKEGG

PURPOSE ^

getRxnsFromKEGG

SYNOPSIS ^

function model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete, keepGeneral)

DESCRIPTION ^

 getRxnsFromKEGG
   Retrieves information on all reactions stored in KEGG database

   keggPath            this function reads data from a local FTP dump of 
                       the KEGG database. keggPath is the pathway to the
                       root of the database
   keepUndefinedStoich include reactions in the form n A <=> n+1 A. These
                       will be dealt with as two separate metabolites 
                       (opt, default true)
   keepIncomplete      include reactions which have been labelled as
                       "incomplete", "erroneous" or "unclear" (opt,
                       default true)
   keepGeneral         include reactions which have been labelled as
                       "general reaction". These are reactions on the form
                       "an aldehyde <=> an alcohol", and are therefore
                       unsuited for modelling purposes. Note that not all
                       reactions have this type of annotation, and the
                       script will therefore not be able to remove all
                       such reactions (opt, default false)

   model     a model structure generated from the database. The following
             fields are filled
             id:             'KEGG'
             description:    'Automatically generated from KEGG database'
             rxns:           KEGG reaction ids
             rxnNames:       Name for each reaction entry
             mets:           KEGG compound ids. If the equations use
                             stoichiometry such as ID(n+1) then the whole 
                             expression is saved as the id
             eccodes:        Corresponding ec-number if available
             rxnMiriams:     Contains reaction specific information such as
                             KO id and pathways that the reaction is 
                             associated to
             S:              Stoichiometric matrix
             lb:             -1000 for all reactions
             ub:             1000 for all reactions
             rev:            1 for reversible and 0 for irreversible. For
                             reactions present in pathway maps the reversibility 
                             is taken from there
             b:              0 for all metabolites

   Reactions on the form A <=> A + B will not be loaded. If the file
   keggRxns.mat is in the RAVEN directory it will be loaded instead of
   parsing of the KEGG files. If it does not exist it will be saved after
   parsing of the KEGG files. In general, you should remove the
   keggRxns.mat file if you want to rebuild the model structure from a
   newer version of KEGG.

   Usage: model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete,keepGeneral)

   Rasmus Agren, 2012-12-16

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete, keepGeneral)
0002 % getRxnsFromKEGG
0003 %   Retrieves information on all reactions stored in KEGG database
0004 %
0005 %   keggPath            this function reads data from a local FTP dump of
0006 %                       the KEGG database. keggPath is the pathway to the
0007 %                       root of the database
0008 %   keepUndefinedStoich include reactions in the form n A <=> n+1 A. These
0009 %                       will be dealt with as two separate metabolites
0010 %                       (opt, default true)
0011 %   keepIncomplete      include reactions which have been labelled as
0012 %                       "incomplete", "erroneous" or "unclear" (opt,
0013 %                       default true)
0014 %   keepGeneral         include reactions which have been labelled as
0015 %                       "general reaction". These are reactions on the form
0016 %                       "an aldehyde <=> an alcohol", and are therefore
0017 %                       unsuited for modelling purposes. Note that not all
0018 %                       reactions have this type of annotation, and the
0019 %                       script will therefore not be able to remove all
0020 %                       such reactions (opt, default false)
0021 %
0022 %   model     a model structure generated from the database. The following
0023 %             fields are filled
0024 %             id:             'KEGG'
0025 %             description:    'Automatically generated from KEGG database'
0026 %             rxns:           KEGG reaction ids
0027 %             rxnNames:       Name for each reaction entry
0028 %             mets:           KEGG compound ids. If the equations use
0029 %                             stoichiometry such as ID(n+1) then the whole
0030 %                             expression is saved as the id
0031 %             eccodes:        Corresponding ec-number if available
0032 %             rxnMiriams:     Contains reaction specific information such as
0033 %                             KO id and pathways that the reaction is
0034 %                             associated to
0035 %             S:              Stoichiometric matrix
0036 %             lb:             -1000 for all reactions
0037 %             ub:             1000 for all reactions
0038 %             rev:            1 for reversible and 0 for irreversible. For
0039 %                             reactions present in pathway maps the reversibility
0040 %                             is taken from there
0041 %             b:              0 for all metabolites
0042 %
0043 %   Reactions on the form A <=> A + B will not be loaded. If the file
0044 %   keggRxns.mat is in the RAVEN directory it will be loaded instead of
0045 %   parsing of the KEGG files. If it does not exist it will be saved after
0046 %   parsing of the KEGG files. In general, you should remove the
0047 %   keggRxns.mat file if you want to rebuild the model structure from a
0048 %   newer version of KEGG.
0049 %
0050 %   Usage: model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete,keepGeneral)
0051 %
0052 %   Rasmus Agren, 2012-12-16
0053 %
0054 
0055 %NOTE: This is how one entry looks in the file
0056 
0057 % ENTRY       R00001                      Reaction
0058 % NAME        Polyphosphate polyphosphohydrolase
0059 % DEFINITION  Polyphosphate + n H2O <=> (n+1) Oligophosphate
0060 % EQUATION    C00890 + n C00001 <=> (n+1) C02174
0061 % ENZYME      3.6.1.10
0062 % ///
0063 
0064 %The file is not tab-delimited. Instead each label is 12 characters
0065 %(except for '///')
0066 
0067 if nargin<2
0068     keepUndefinedStoich=true;
0069 end
0070 if nargin<3
0071     keepIncomplete=true;
0072 end
0073 if nargin<4
0074     keepGeneral=true;
0075 end
0076 
0077 %Check if the reactions have been parsed before and saved. If so, load the
0078 %model.
0079 [ST I]=dbstack('-completenames');
0080 ravenPath=fileparts(ST(I).file);
0081 rxnsFile=fullfile(ravenPath,'kegg','keggRxns.mat');
0082 if exist(rxnsFile, 'file')
0083     fprintf(['NOTE: Importing KEGG reactions from ' strrep(rxnsFile,'\','/') '.\n']);
0084     load(rxnsFile);
0085 else
0086     %Download required files from KEGG if it doesn't exist in the directory
0087     downloadKEGG(keggPath);
0088     
0089     %Add new functionality in the order specified in models
0090     model.id='KEGG';
0091     model.description='Automatically generated from KEGG database';
0092 
0093     %Preallocate memory for 10000 reactions
0094     model.rxns=cell(10000,1);
0095     model.rxnNames=cell(10000,1);
0096     model.eccodes=cell(10000,1);
0097     model.subSystems=cell(10000,1);
0098     model.rxnMiriams=cell(10000,1);
0099     equations=cell(10000,1); %Temporarily stores the equations
0100     isIncomplete=false(10000,1);
0101     isGeneral=false(10000,1);
0102 
0103     %First load information on reaction ID, reaction name, KO, pathway, and ec-number
0104     fid = fopen(fullfile(keggPath,'reaction'), 'r');
0105 
0106     %Keeps track of how many reactions that have been added
0107     rxnCounter=0;
0108 
0109     %Loop through the file
0110     orthology=false;
0111     pathway=false;
0112     while 1
0113       %Get the next line
0114       tline = fgetl(fid);
0115 
0116       %Abort at end of file
0117       if ~ischar(tline)
0118           break;
0119       end
0120 
0121       %Skip '///'
0122       if numel(tline)<12
0123           continue;
0124       end
0125 
0126       %Check if it's a new reaction
0127       if strcmp(tline(1:12),'ENTRY       ')
0128           rxnCounter=rxnCounter+1;
0129 
0130           %Add empty strings where there should be such
0131           model.rxnNames{rxnCounter}='';
0132           model.eccodes{rxnCounter}='';
0133           model.subSystems{rxnCounter}='';
0134           equations{rxnCounter}='';
0135 
0136           %Add reaction ID (always 6 characters)
0137           model.rxns{rxnCounter}=tline(13:18);
0138           orthology=false;
0139           pathway=false;
0140       end
0141 
0142       %Add name
0143       if strcmp(tline(1:12),'NAME        ')
0144           model.rxnNames{rxnCounter}=tline(13:end);
0145       end
0146       
0147       %Add whether the comment includes "incomplete", "erroneous" or "unclear"
0148       if strcmp(tline(1:12),'COMMENT     ')
0149           %Read all text until '///' or 'RPAIR'
0150           commentText=tline(13:end);
0151           while 1
0152             tline = fgetl(fid);
0153             if ~strcmp(tline(1:3),'///') && ~strcmp(tline(1:3),'RPA') && ~strcmp(tline(1:3),'ENZ')
0154                commentText=strcat(commentText,' ',tline);
0155             else
0156                 break;
0157             end
0158           end
0159           upperLine=upper(commentText);
0160           if any(strfind(upperLine,'INCOMPLETE')) || any(strfind(upperLine,'ERRONEOUS')) || any(strfind(upperLine,'UNCLEAR'))
0161             isIncomplete(rxnCounter)=true;
0162           end
0163           if any(strfind(upperLine,'GENERAL REACTION')==1) %It should start this way
0164             isGeneral(rxnCounter)=true;
0165           end
0166               
0167           %Go to next iteration if it is '///'
0168           if numel(tline)<12
0169               continue;
0170           end
0171       end
0172       
0173       %Add ec-number
0174       if strcmp(tline(1:12),'ENZYME      ')
0175           model.eccodes{rxnCounter}=tline(13:end);
0176       end
0177       if numel(tline)>8
0178         if strcmp(tline(1:9),'REFERENCE')
0179             pathway=false;
0180             orthology=false;
0181         end
0182       end
0183 
0184       %Add KO-ids
0185       if numel(tline)>16
0186           if strcmp(tline(1:16),'ORTHOLOGY   KO: ') || strcmp(tline(1:16),'            KO: ') || strcmp(tline(1:12),'ORTHOLOGY   ') || orthology==true
0187               pathway=false;
0188               %Check if KO has been added already (each reaction may belong to
0189               %several)
0190               if isstruct(model.rxnMiriams{rxnCounter})
0191                   addToIndex=numel(model.rxnMiriams{rxnCounter}.name)+1;
0192               else
0193                   addToIndex=1;
0194               end
0195 
0196               tempStruct=model.rxnMiriams{rxnCounter};
0197               tempStruct.name{addToIndex,1}='urn:miriam:kegg.ko'; %WARNING: This is not a real MIRIAM identifier
0198               if strcmp(tline(13:16),'KO:') %This is in the old version
0199                 tempStruct.value{addToIndex,1}=tline(17:22);
0200               else
0201                   if strcmp(tline(1:12),'ORTHOLOGY   ')
0202                     %This means that it found one KO in the new format and that
0203                     %subsequent lines might be other KOs
0204                     orthology=true;
0205                   end
0206                   tempStruct.value{addToIndex,1}=tline(13:18);  
0207               end
0208               model.rxnMiriams{rxnCounter}=tempStruct;
0209           end
0210       end
0211 
0212       %Add pathways
0213       if numel(tline)>18
0214           if strcmp(tline(1:18),'PATHWAY     PATH: ') || strcmp(tline(1:18),'            PATH: ') || strcmp(tline(1:12),'PATHWAY     ') || pathway==true
0215               %Check if annotation has been added already
0216               if isstruct(model.rxnMiriams{rxnCounter})
0217                   addToIndex=numel(model.rxnMiriams{rxnCounter}.name)+1;
0218               else
0219                   addToIndex=1;
0220               end
0221 
0222               tempStruct=model.rxnMiriams{rxnCounter};
0223               tempStruct.name{addToIndex,1}='urn:miriam:kegg.pathway';
0224               %If it's the old version
0225               if strcmp(tline(14:17),'PATH:')
0226                 tempStruct.value{addToIndex,1}=tline(19:25);
0227               else
0228                 %If it's the new version
0229                 tempStruct.value{addToIndex,1}=tline(13:19);
0230                 pathway=true;
0231               end
0232 
0233               %Don't do this if the pathway is rn01100 (Metabolic pathways)
0234               if ~strcmp('rn01100',tempStruct.value{addToIndex,1})
0235                 model.rxnMiriams{rxnCounter}=tempStruct;
0236 
0237                 %Also save the subSystems entry as being the first path found
0238                 if ~any(model.subSystems{rxnCounter})
0239                     if strcmp(tline(14:17),'PATH:')
0240                         model.subSystems{rxnCounter}=tline(28:end);
0241                     else
0242                         model.subSystems{rxnCounter}=tline(22:end);
0243                     end
0244                 end
0245               end
0246           end
0247       end
0248     end
0249 
0250     %Close the file
0251     fclose(fid);
0252 
0253     %This is done here since the the indexes won't match since some reactions
0254     %are removed along the way
0255     isIncomplete=model.rxns(isIncomplete);
0256     isGeneral=model.rxns(isGeneral);
0257 
0258     %If too much space was allocated, shrink the model
0259     model.rxns=model.rxns(1:rxnCounter);
0260     model.rxnNames=model.rxnNames(1:rxnCounter);
0261     model.eccodes=model.eccodes(1:rxnCounter);
0262     equations=equations(1:rxnCounter);
0263     model.rxnMiriams=model.rxnMiriams(1:rxnCounter);
0264     model.subSystems=model.subSystems(1:rxnCounter);
0265 
0266     %Then load the equations from another file. This is because the equations
0267     %are easier to retrieve from there
0268 
0269     %The format is rxnID: equation
0270     %The reactions should have been loaded in the exact same order
0271     fid = fopen(fullfile(keggPath,'reaction.lst'), 'r');
0272 
0273     %Loop through the file
0274     for i=1:rxnCounter
0275       %Get the next line
0276       tline = fgetl(fid);
0277 
0278       equations{i}=tline(9:end);
0279     end
0280 
0281     %Close the file
0282     fclose(fid);
0283 
0284     %Construct the S matrix and list of metabolites
0285     [S mets badRxns]=constructS(equations);
0286     model.S=S;
0287     model.mets=mets;
0288 
0289     %There is some limited evidence for directionality in
0290     %reaction_mapformula.lst. The information there concerns how the reactions
0291     %are drawn in the KEGG maps. If a reaction is irreversible in the same
0292     %direction for all maps, then I consider is irreversible, otherwise
0293     %reversible. Also, not all reactions are present in the maps, so not all
0294     %will have directionality. They will be considered to be reversible.
0295 
0296     %The format is R00005: 00330: C01010 => C00011
0297     %Generate a reversibility structure with the fields:
0298     %rxns: reaction ids
0299     %product: one met id that is a product. This is because the reactions
0300     %might be written in another direction compared to in the reactions.lst
0301     %file
0302     %rev: 1 if reversible, otherwise 0
0303     reversibility.rxns={};
0304     reversibility.product={};
0305     reversibility.rev=[];
0306 
0307     fid = fopen(fullfile(keggPath,'reaction_mapformula.lst'), 'r');
0308     while 1
0309       %Get the next line
0310       tline = fgetl(fid);
0311 
0312       %Abort at end of file
0313       if ~ischar(tline)
0314           break;
0315       end
0316 
0317       rxn=tline(1:6);
0318       prod=tline(end-5:end);
0319       rev=any(strfind(tline,'<=>'));
0320       if isempty(reversibility.rxns)
0321         reversibility.rxns{1}=rxn;
0322         reversibility.product{1}=prod;
0323         reversibility.rev(1)=rev;
0324       else
0325         %Check if the reaction was added before. It's an ordered list, so only
0326         %check the last element
0327         if strcmp(reversibility.rxns(end),rxn)
0328             %If it's reversible in the new reaction or reversible in the old reaction
0329             %then set (keep) to be reversible
0330             if rev==1 || reversibility.rev(end)==1
0331                 reversibility.rev(end)=1;
0332             else
0333                 %This means that the reaction was already loaded, that it was
0334                 %irreversible before and irreversible in the new reaction.
0335                 %However, it could be that they are written in diferent
0336                 %directions. If the product differ, then set to be reversible.
0337                 %This assumes that the reactions are written with the same
0338                 %metabolite as the last one if they are in the same direction.
0339                 if ~strcmp(prod,reversibility.product(end))
0340                     reversibility.rev(end)=1;
0341                 end
0342             end
0343         else
0344             reversibility.rxns=[reversibility.rxns;rxn];
0345             reversibility.product=[reversibility.product;prod];
0346             reversibility.rev=[reversibility.rev;rev];
0347         end
0348       end
0349     end
0350     fclose(fid);
0351 
0352     %Update the reversibility
0353     model.rev=ones(rxnCounter,1);
0354     %Match the reaction ids
0355     irrevIDs=find(reversibility.rev==0);
0356     [crap I]=ismember(reversibility.rxns(irrevIDs),model.rxns);
0357     [crap prodMetIDs]=ismember(reversibility.product(irrevIDs),model.mets);
0358     model.rev(I)=0;
0359 
0360     %See if the reactions are written in the same order in model.S
0361     linearInd=sub2ind(size(model.S), prodMetIDs, I);
0362     changeOrder=I(model.S(linearInd)<0);
0363     model.S(:,changeOrder)=model.S(:,changeOrder).*-1; %Change the order of these reactions
0364 
0365     %Add some stuff to get a correct model structure
0366     model.ub=ones(rxnCounter,1)*1000;
0367     model.lb=model.rev*-1000;
0368     model.c=zeros(rxnCounter,1);
0369     model.b=zeros(numel(model.mets),1);
0370     model=removeRxns(model,badRxns,true,true);
0371     
0372     %Save the model structure
0373     save(rxnsFile,'model','isGeneral','isIncomplete');
0374 end
0375 
0376 %Delete reaction which are labeled as "incomplete", "erroneous", "unclear"
0377 %or "general reaction" (depending on settings.
0378 if keepGeneral==false
0379     model=removeRxns(model,intersect(isGeneral,model.rxns),true,true);
0380 end
0381 if keepIncomplete==false
0382     model=removeRxns(model,intersect(isIncomplete,model.rxns),true,true);
0383 end
0384 
0385 %Delete reactions involving undefined stoichiometry. These metabolites have
0386 %an ID containing the letter "n" or "m"
0387 if keepUndefinedStoich==false
0388     I=cellfun(@any,strfind(model.mets,'n')) | cellfun(@any,strfind(model.mets,'m'));
0389     [crap J]=find(model.S(I,:));
0390     model=removeRxns(model,J,true,true);
0391 end
0392 end

Generated on Mon 06-Jan-2014 14:58:12 by m2html © 2005