getMetsFromKEGG
  Retrieves information on all metabolites stored in the KEGG database

  keggPath    this function reads data from a local FTP dump of the KEGG
              database. keggPath is the path to the root of the database

  model       a model structure generated from the database. The following
              fields are filled
              id:             'KEGG'
              description:    'Automatically generated from KEGG database'
              mets:           KEGG compound ids
              metNames:       Compound name. Only the first name is saved
                              if there are several synonyms
              metMiriams:     The ChEBI id if one is available, otherwise
                              the KEGG compound id
              inchis:         InChI string for the metabolite
              metFormulas:    The chemical composition of the metabolite.
                              This is only kept if there is no InChI string

  If the file keggMets.mat exists in the kegg subfolder of the RAVEN
  directory, it is loaded instead of parsing the KEGG files. If it does not
  exist, it is saved after the KEGG files have been parsed. Remove
  keggMets.mat if you want to rebuild the model structure from a newer
  version of KEGG.

  Usage: model=getMetsFromKEGG(keggPath)

  Rasmus Agren, 2013-08-01
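The caching behaviour described above (load keggMets.mat if present, otherwise parse and save) can be illustrated with a small stand-alone sketch. It does not call RAVEN or read any KEGG files: cacheFile, buildModel and the two compound ids are hypothetical stand-ins chosen only for the example.

```matlab
% Load-or-build cache sketch, analogous to how keggMets.mat is used.
% cacheFile and buildModel are hypothetical names, not part of RAVEN.
cacheFile = fullfile(tempdir, 'exampleMets.mat');
if exist(cacheFile, 'file')
    delete(cacheFile);              % start clean so the example is repeatable
end

% Stand-in for the expensive parsing step
buildModel = @() struct('id', 'KEGG', 'mets', {{'C00001'; 'C00002'}});

for attempt = 1:2
    if exist(cacheFile, 'file')
        loaded = load(cacheFile, 'model');  % load into a struct, not the workspace
        model = loaded.model;
        source = 'cache';
    else
        model = buildModel();
        save(cacheFile, 'model');
        source = 'parsed';
    end
    fprintf('Attempt %d: model obtained from %s\n', attempt, source);
end

delete(cacheFile);                  % removing the cache forces a rebuild next time
```

The first pass builds and saves the structure, the second pass reads it back from disk. Deleting the file at the end mirrors the advice to remove keggMets.mat before rebuilding from a newer KEGG release.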
function model=getMetsFromKEGG(keggPath)
% getMetsFromKEGG
%   Retrieves information on all metabolites stored in the KEGG database
%
%   keggPath    this function reads data from a local FTP dump of the KEGG
%               database. keggPath is the path to the root of the database
%
%   model       a model structure generated from the database. The following
%               fields are filled
%               id:             'KEGG'
%               description:    'Automatically generated from KEGG database'
%               mets:           KEGG compound ids
%               metNames:       Compound name. Only the first name is saved
%                               if there are several synonyms
%               metMiriams:     The ChEBI id if one is available, otherwise
%                               the KEGG compound id
%               inchis:         InChI string for the metabolite
%               metFormulas:    The chemical composition of the metabolite.
%                               This is only kept if there is no InChI
%                               string
%
%   If the file keggMets.mat exists in the kegg subfolder of the RAVEN
%   directory, it is loaded instead of parsing the KEGG files. If it does
%   not exist, it is saved after the KEGG files have been parsed. Remove
%   keggMets.mat if you want to rebuild the model structure from a newer
%   version of KEGG.
%
%   Usage: model=getMetsFromKEGG(keggPath)
%
%   Rasmus Agren, 2013-08-01
%

%NOTE: This is how one entry looks in the file

% ENTRY       C00001                      Compound
% NAME        H2O;
%             Water
% FORMULA     H2O
% MASS        18.0106
% REMARK      Same as: D00001
% REACTION    R00001 R00002 R00004 R00005 R00009 R00010 R00011 R00017
%             R00022 R00024 R00026 R00028 R00036 R00041 R00044 R00045
% ENZYME      1.1.1.160
% DBLINKS     PubChem: 7435
%             ChEBI: 29110

%Then a lot of info about the positions of the atoms and so on. It is not
%certain that each metabolite follows this structure exactly

%The file is not tab-delimited. Instead each label is 12 characters
%(except for '///')

%Check if the metabolites have been parsed before and saved. If so, load
%the model.
[ST, I]=dbstack('-completenames');
ravenPath=fileparts(ST(I).file);
metsFile=fullfile(ravenPath,'kegg','keggMets.mat');
if exist(metsFile, 'file')
    fprintf('NOTE: Importing KEGG metabolites from %s.\n',strrep(metsFile,'\','/'));
    load(metsFile,'model');
else
    %Download the required files from KEGG if they don't exist in the
    %directory
    downloadKEGG(keggPath);

    %Add new functionality in the order specified in models
    model.id='KEGG';
    model.description='Automatically generated from KEGG database';

    %Preallocate memory for 20000 metabolites
    model.mets=cell(20000,1);
    model.metNames=cell(20000,1);
    model.metFormulas=cell(20000,1);
    model.metMiriams=cell(20000,1);

    %First load information on metabolite ID, metabolite name,
    %composition, and CHEBI
    fid = fopen(fullfile(keggPath,'compound'), 'r');

    %Keeps track of how many metabolites have been added
    metCounter=0;

    %Loop through the file
    while 1
        %Get the next line
        tline = fgetl(fid);

        %Abort at end of file
        if ~ischar(tline)
            break;
        end

        %Skip '///'
        if numel(tline)<12
            continue;
        end

        %Check if it's a new metabolite. The label is padded with spaces
        %to 12 characters
        if strcmp(tline(1:12),'ENTRY       ')
            metCounter=metCounter+1;

            %Add empty strings where there should be such
            model.metNames{metCounter}='';
            model.metFormulas{metCounter}='';

            %Add compound ID (always 6 characters)
            model.mets{metCounter}=tline(13:18);
        end

        %Add name (label padded to 12 characters)
        if strcmp(tline(1:12),'NAME        ')
            %If there are synonyms, then the last character is ';'
            if strcmp(tline(end),';')
                model.metNames{metCounter}=tline(13:end-1);
            else
                model.metNames{metCounter}=tline(13:end);
            end
        end

        %Add composition (label padded to 12 characters)
        if strcmp(tline(1:12),'FORMULA     ')
            model.metFormulas{metCounter}=tline(13:end);
        end

        %Add CHEBI id. The prefix is 12 spaces followed by 'ChEBI: ', which
        %is 19 characters in total
        if numel(tline)>19
            if strcmp(tline(1:19),'            ChEBI: ')
                chebiID=tline(20:end); %There is sometimes more than one CHEBI id

                %Only load the first id for now
                s=strfind(chebiID,' ');
                if any(s)
                    chebiID=chebiID(1:s(1)-1);
                end
                miriamStruct.name{1}='obo.chebi:CHEBI';
                miriamStruct.value{1}=chebiID;
                model.metMiriams{metCounter}=miriamStruct;
            end
        end
    end

    %Close the file
    fclose(fid);

    %If too much space was allocated, shrink the model
    model.mets=model.mets(1:metCounter);
    model.metNames=model.metNames(1:metCounter);
    model.metFormulas=model.metFormulas(1:metCounter);
    model.metMiriams=model.metMiriams(1:metCounter);

    %If there was no CHEBI found, add the KEGG id as a metMiriams
    for i=1:numel(model.mets)
        if ~isstruct(model.metMiriams{i})
            miriamStruct.name{1}='kegg.compound';
            miriamStruct.value{1}=model.mets{i};
            model.metMiriams{i}=miriamStruct;
        end
    end

    %Then load the InChI strings from another file. Not all metabolites
    %will be present in the list

    inchIDs=cell(numel(model.mets),1);
    inchis=cell(numel(model.mets),1);

    %The format is metID*tab*string

    fid = fopen(fullfile(keggPath,'compound.inchi'), 'r');

    %Loop through the file
    counter=1;
    while 1
        %Get the next line
        tline = fgetl(fid);

        %Abort at end of file
        if ~ischar(tline)
            break;
        end

        %Get the ID and the InChI
        inchIDs{counter}=tline(1:6);
        inchis{counter}=tline(14:end);
        counter=counter+1;
    end

    %Close the file
    fclose(fid);

    inchIDs=inchIDs(1:counter-1);
    inchis=inchis(1:counter-1);

    %Find the metabolites that had InChI strings and add them to the model
    [a, b]=ismember(inchIDs,model.mets);

    %If there were mets with InChIs that were not in the list
    if ~all(a)
        dispEM('Not all metabolites with InChI strings were found in the original list');
    end

    model.inchis=cell(numel(model.mets),1);
    model.inchis(:)={''};
    model.inchis(b)=inchis;

    %Remove composition if InChI was found
    model.metFormulas(b)={''};

    %Saves the model
    save(metsFile,'model');
end
end
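The fixed-width layout that the parser relies on (a 12-character label followed by the value) can be tried out on an in-memory entry without any KEGG files. The following is a minimal sketch under stated assumptions, not the function's own code: entryLines, label and met are hypothetical names, and the ChEBI id is picked out with regexp rather than with the 19-character prefix comparison used in the listing.

```matlab
% Build one KEGG-style entry in memory. Labels are padded to 12 characters.
label = @(s) sprintf('%-12s', s);   % hypothetical helper for the label field
entryLines = {
    [label('ENTRY') 'C00001                      Compound']
    [label('NAME') 'H2O;']
    [blanks(12) 'Water']
    [label('FORMULA') 'H2O']
    [label('DBLINKS') 'PubChem: 7435']
    [blanks(12) 'ChEBI: 29110']
    '///'
    };

met = struct('id', '', 'name', '', 'formula', '', 'chebi', '');
for k = 1:numel(entryLines)
    tline = entryLines{k};
    if numel(tline) < 12
        continue;                   % skips the '///' terminator
    end
    key = strtrim(tline(1:12));     % label without its padding
    value = tline(13:end);          % everything after the label field
    if strcmp(key, 'ENTRY')
        met.id = value(1:6);        % compound ids are 6 characters
    elseif strcmp(key, 'NAME')
        met.name = regexprep(value, ';$', '');  % drop the synonym separator
    elseif strcmp(key, 'FORMULA')
        met.formula = value;
    end
    % Take the first ChEBI id wherever the database link appears
    tok = regexp(value, '^ChEBI:\s*(\S+)', 'tokens', 'once');
    if ~isempty(tok)
        met.chebi = tok{1};
    end
end

fprintf('%s: %s (%s), ChEBI %s\n', met.id, met.name, met.formula, met.chebi);
```

Trimming the label before comparing avoids depending on the exact number of padding spaces in each string literal, at the cost of an extra strtrim call per line.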