reporterMets The Reporter Metabolites algorithm for identifying metabolites around which transcriptional changes occur model a model structure genes a cell array of gene names (should match with model.genes) genePValues P-values for differential expression of the genes printResults true if the top 20 Reporter Metabolites should be printed to the screen (opt, default false) outputFile the results are printed to this file (opt) geneFoldChanges log-fold changes for the genes. If supplied, then Reporter Metabolites are calculated for only up/down- regulated genes in addition to the full test (opt) repMets an array of structures with the following fields. test a string the describes the genes that were used to calculate the Reporter Metabolites ('all', 'only up', or 'only down'). The two latter structures are only calculated if geneFoldChanges are supplied. mets a cell array of metabolite IDs for the metabolites for which a score could be calculated metZScores Z-scores for differential expression around each metabolite in "mets" metPValues P-values for differential expression around each metabolite in "mets" metNGenes number of neighbouring genes for each metabolite in "mets" meanZ average Z-scores for the genes around each metabolite in "mets" stdZ standard deviations of the Z-scores around each metabolite in "mets" NOTE: For details about the algorithm, see Patil KR, Nielsen J, Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc. Natl Acad. Sci. USA 2005;102:2685-2689. Usage: repMets=reporterMets(model,genes,genePValues,printResults,... outputFile,geneFoldChanges) Rasmus Agren, 2012-11-14
0001 function repMets=reporterMets(model,genes,genePValues,printResults,outputFile,geneFoldChanges) 0002 % reporterMets 0003 % The Reporter Metabolites algorithm for identifying metabolites around 0004 % which transcriptional changes occur 0005 % 0006 % model a model structure 0007 % genes a cell array of gene names (should match with 0008 % model.genes) 0009 % genePValues P-values for differential expression of the genes 0010 % printResults true if the top 20 Reporter Metabolites should be 0011 % printed to the screen (opt, default false) 0012 % outputFile the results are printed to this file (opt) 0013 % geneFoldChanges log-fold changes for the genes. If supplied, then 0014 % Reporter Metabolites are calculated for only up/down- 0015 % regulated genes in addition to the full test (opt) 0016 % 0017 % repMets an array of structures with the following fields. 0018 % test a string the describes the genes that were used to 0019 % calculate the Reporter Metabolites ('all', 'only up', 0020 % or 'only down'). The two latter structures are 0021 % only calculated if geneFoldChanges are supplied. 0022 % mets a cell array of metabolite IDs for the metabolites for 0023 % which a score could be calculated 0024 % metZScores Z-scores for differential expression around each 0025 % metabolite in "mets" 0026 % metPValues P-values for differential expression around each 0027 % metabolite in "mets" 0028 % metNGenes number of neighbouring genes for each metabolite in 0029 % "mets" 0030 % meanZ average Z-scores for the genes around each metabolite 0031 % in "mets" 0032 % stdZ standard deviations of the Z-scores around each 0033 % metabolite in "mets" 0034 % 0035 % NOTE: For details about the algorithm, see Patil KR, Nielsen J, 0036 % Uncovering transcriptional regulation of metabolism by using metabolic 0037 % network topology. Proc. Natl Acad. Sci. USA 2005;102:2685-2689. 0038 % 0039 % Usage: repMets=reporterMets(model,genes,genePValues,printResults,... 0040 % outputFile,geneFoldChanges) 0041 % 0042 % Rasmus Agren, 2012-11-14 0043 % 0044 0045 if nargin<4 0046 printResults=false; 0047 end 0048 if nargin<5 0049 outputFile=[]; 0050 end 0051 if nargin<6 0052 geneFoldChanges=[]; 0053 end 0054 0055 %Check some stuff 0056 if numel(genes)~=numel(genePValues) 0057 throw(MException('','The number of genes and the number of Z-scores must be the same')); 0058 end 0059 if ~isfield(model,'rxnGeneMat') 0060 throw(MException('','The model structure must have a rxnGeneMat field')); 0061 end 0062 0063 %Remove the genes which are not in the model 0064 genePValues(~ismember(genes,model.genes))=[]; 0065 genes(~ismember(genes,model.genes))=[]; 0066 0067 %Convert p-values to Z-scores 0068 geneZScores=norminv(genePValues)*-1; 0069 0070 %This is to prevent errors if the p-values are exactly 0 or 1 0071 geneZScores(geneZScores==-inf)=-15; 0072 geneZScores(geneZScores==inf)=15; 0073 0074 %For each metabolite, calculate the aggregate Z-score and keep track of 0075 %the number of neighbouring genes 0076 metZScores=nan(numel(model.mets),1); 0077 metNGenes=nan(numel(model.mets),1); 0078 meanZ=nan(numel(model.mets),1); 0079 stdZ=nan(numel(model.mets),1); 0080 for i=1:numel(model.mets) 0081 %Get the involved rxns 0082 I=model.S(i,:); 0083 0084 %Get the involved genes 0085 [crap J]=find(model.rxnGeneMat(I~=0,:)); 0086 0087 %Find the genes in the gene list 0088 K=find(ismember(genes,model.genes(J))); 0089 0090 %Calculate the aggregated Z-score for the metabolite 0091 if any(K) 0092 metZScores(i)=sum(geneZScores(K))/sqrt(numel(K)); 0093 meanZ(i)=mean(geneZScores(K)); 0094 stdZ(i)=std(geneZScores(K)); 0095 metNGenes(i)=numel(K); 0096 end 0097 end 0098 0099 %Remove the metabolites which have no Z-scores 0100 mets=model.mets(~isnan(metZScores)); 0101 metNames=model.metNames(~isnan(metZScores)); 0102 metZScores(isnan(metZScores))=[]; 0103 metNGenes(isnan(metNGenes))=[]; 0104 meanZ(isnan(meanZ))=[]; 0105 stdZ(isnan(stdZ))=[]; 0106 0107 %Then correct for background by calculating the mean Z-score for random 0108 %sets of the same size as the ones that were found for the metabolites 0109 sizes=unique(metNGenes); 0110 nGenes=numel(genes); 0111 0112 for i=1:numel(sizes) 0113 %Sample 100000 sets for each size. Sample with replacement, which may or 0114 %may not be the best choice. 0115 I=ceil(rand(100000,sizes(i))*nGenes); 0116 J=geneZScores(I); 0117 K=sum(J,2)/sqrt(sizes(i)); 0118 0119 %Correct all elements of this size 0120 mK=mean(K); 0121 stdK=std(K); 0122 metZScores(metNGenes==sizes(i))=(metZScores(metNGenes==sizes(i))-mK)/stdK; 0123 end 0124 0125 %Calculate the P-values 0126 metPValues=1-normcdf(metZScores); 0127 0128 %Sort the results 0129 [metZScores I]=sort(metZScores,'descend'); 0130 mets=mets(I); 0131 metNames=metNames(I); 0132 metPValues=metPValues(I); 0133 metNGenes=metNGenes(I); 0134 meanZ=meanZ(I); 0135 stdZ=stdZ(I); 0136 0137 %Prepare the output 0138 repMets.mets=mets; 0139 repMets.metNames=metNames; 0140 repMets.metZScores=metZScores; 0141 repMets.metPValues=metPValues; 0142 repMets.metNGenes=metNGenes; 0143 repMets.meanZ=meanZ; 0144 repMets.stdZ=stdZ; 0145 0146 %Call this function recursively if geneFoldChanges has been specified 0147 repMets(1).test='all'; 0148 if any(geneFoldChanges) 0149 repMets=[repMets;reporterMets(model,genes(geneFoldChanges>=0),genePValues(geneFoldChanges>=0))]; 0150 repMets=[repMets;reporterMets(model,genes(geneFoldChanges<0),genePValues(geneFoldChanges<0))]; 0151 repMets(2).test='only up'; 0152 repMets(3).test='only down'; 0153 end 0154 0155 %This is for printing results to the screen. For printing a full list of 0156 %all scores, specify a output file 0157 if printResults==true 0158 for i=1:numel(repMets) 0159 fprintf(['TOP 20 REPORTER METABOLITES\nTEST TYPE: ' repMets(i).test '\nID\tNAME\tP-VALUE\n']); 0160 for j=1:min(20,numel(repMets(i).mets)) 0161 fprintf([repMets(i).mets{j} '\t' repMets(i).metNames{j} '\t' num2str(repMets(i).metPValues(j)) '\n']); 0162 end 0163 fprintf('\n'); 0164 end 0165 end 0166 0167 %Print results to file 0168 if any(outputFile) 0169 fid=fopen(outputFile,'w'); 0170 for i=1:numel(repMets) 0171 fprintf(fid,['REPORTER METABOLITES USING TEST TYPE: ' repMets(i).test '\n']); 0172 fprintf(fid,'ID\tNAME\tZ-SCORE\tP-VALUE\tNUMBER OF NEIGHBOURS\tAVERAGE Z-SCORE\tSTD Z-SCORE\n'); 0173 for j=1:numel(repMets(i).mets) 0174 fprintf(fid,[repMets(i).mets{j} '\t' repMets(i).metNames{j} '\t' num2str(repMets(i).metZScores(j)) '\t' num2str(repMets(i).metPValues(j)) '\t' num2str(repMets(i).metNGenes(j)) '\t' num2str(repMets(i).meanZ(j)) '\t' num2str(repMets(i).stdZ(j)) '\n']); 0175 end 0176 end 0177 fclose(fid); 0178 end 0179 end