parseScores

PURPOSE

parseScores

SYNOPSIS

function geneScoreStructure=parseScores(inputFile,predictor)

DESCRIPTION

 parseScores
   Parses the output from a predictor to generate the geneScoreStructure.

   inputFile    a file with the output from the predictor
   predictor    the predictor that was used. 'tsv' for tab-separated
                values, where the names of the compartments are in the
                first row and each subsequent row corresponds to a gene.
                'wolf' for WoLF PSORT. (opt, default 'tsv')
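
As a rough illustration of the 'wolf' input, the sketch below tokenizes a single line in the same way as the source listing. The sample line is made up; it assumes each line has the form "gene comp score, comp score, ...", which is what the parsing code implies.

```matlab
% Hypothetical WoLF PSORT line (gene name and scores are invented)
wolfLine='geneA nucl 14.5, cyto 8, mito 3';

% Split on spaces, then drop the commas that trail some of the scores
tokens=strrep(regexp(wolfLine,' ','split'),',','');

gene=tokens{1};                        % first token is the gene name
compartments=tokens(2:2:end-1);        % every second token from position 2
scores=str2double(tokens(3:2:end));    % the token after each compartment
```

Here compartments and scores line up element by element, so scores(k) belongs to compartments{k}.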

   The function normalizes the scores so that the best score for each gene
   is 1.0.
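
The normalization can be sketched on a small matrix as follows. The score values are hypothetical; the bsxfun call mirrors the one in the source listing.

```matlab
% Hypothetical raw scores: two genes (rows) by three compartments (columns)
rawScores=[2 4 1; 3 0 1.5];

% Best score for each gene, as a column vector
best=max(rawScores,[],2);

% Scale each row by the reciprocal of its own maximum
normScores=bsxfun(@times,rawScores,1./best);
% normScores is approximately [0.5 1 0.25; 1 0 0.5]
```

A gene whose scores are all zero would produce NaN values with this approach, because its row maximum is zero.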

   geneScoreStructure  a structure to be used in predictLocalization
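
For orientation, the structure has the fields genes, compartments and scores, as seen in the source listing. The sketch below builds one by hand with invented values and looks up a single score; the variable name gss and the gene names are placeholders.

```matlab
% Hand-built example with the same fields as the parsed structure
gss.genes={'geneA';'geneB'};               % one row per gene
gss.compartments={'nucl';'cyto';'mito'};   % one entry per compartment
gss.scores=[1.0 0.55 0.2; 0 1.0 0.4];      % genes x compartments

% Look up the score of geneB in the mito compartment
[~,g]=ismember('geneB',gss.genes);
[~,c]=ismember('mito',gss.compartments);
s=gss.scores(g,c);
```

Row i of scores corresponds to genes{i} and column j to compartments{j}.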

   Usage: geneScoreStructure=parseScores(inputFile,predictor)

   Rasmus Agren, 2012-03-27

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

function geneScoreStructure=parseScores(inputFile,predictor)
% parseScores
%   Parses the output from a predictor to generate the geneScoreStructure.
%
%   inputFile    a file with the output from the predictor
%   predictor    the predictor that was used. 'tsv' for tab-separated
%                values, where the names of the compartments are in the
%                first row and each subsequent row corresponds to a gene.
%                'wolf' for WoLF PSORT. (opt, default 'tsv')
%
%   The function normalizes the scores so that the best score for each gene
%   is 1.0.
%
%   geneScoreStructure  a structure to be used in predictLocalization
%
%   Usage: geneScoreStructure=parseScores(inputFile,predictor)
%
%   Rasmus Agren, 2012-03-27

if nargin<2
    predictor='tsv';
end

fid=fopen(inputFile,'r');

%fopen returns -1 if the file could not be opened
if fid<1
    throw(MException('','Could not open file'));
end

if strcmpi(predictor,'wolf')
    %Read one line per cell, skipping comment lines that start with '#'
    A=textscan(fid,'%s','Delimiter','\n','CommentStyle','#');
    fclose(fid);

    %Each element should be for one gene, but some of them are of the form
    %"Pc20g11350: treating 9 X's as Glycines". Those should be removed.
    I=~cellfun(@any,strfind(A{1},'treating'));

    B=regexp(A{1}(I),' ','split');

    %Initialize the output fields
    geneScoreStructure.compartments={};
    geneScoreStructure.scores=[]; %Don't know number of comps yet
    geneScoreStructure.genes=cell(numel(B),1);

    %Parsing is a bit cumbersome as ', ' is used as a delimiter in some cases
    %and ' ' in others. Use strrep to get rid of ','.
    for i=1:numel(B)
        b=strrep(B{i},',','');
        geneScoreStructure.genes{i}=b{1};

        %Then go through the compartments and add new ones as they are
        %found
        for j=2:2:numel(b)-1
            [~,J]=ismember(b(j),geneScoreStructure.compartments);

            %Add new compartment if it doesn't exist
            if J==0
                geneScoreStructure.compartments=[geneScoreStructure.compartments;b(j)];
                J=numel(geneScoreStructure.compartments);
                geneScoreStructure.scores=[geneScoreStructure.scores zeros(numel(B),1)];
            end

            geneScoreStructure.scores(i,J)=str2double(b(j+1));
        end
    end
else
    %No other predictor is parsed in this listing, so fail explicitly
    %instead of continuing with an undefined geneScoreStructure
    fclose(fid);
    throw(MException('','Only the ''wolf'' predictor is parsed by this version'));
end

%Check if there are duplicate genes
[~,J,K]=unique(geneScoreStructure.genes);

if numel(J)~=numel(K)
    fprintf('WARNING: There are duplicate genes in the input file\n');
    geneScoreStructure.genes=geneScoreStructure.genes(J);
    geneScoreStructure.scores=geneScoreStructure.scores(J,:);
end

%Normalize so that the best score for each gene is 1.0
I=max(geneScoreStructure.scores,[],2);
geneScoreStructure.scores=bsxfun(@times,geneScoreStructure.scores,1./I);
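
The duplicate check in the listing can be tried in isolation with a small sketch like the one below. The gene names and scores are hypothetical; the logic follows the listing, which keeps one row per distinct gene name.

```matlab
% Hypothetical input in which geneB appears twice
genes={'geneB';'geneA';'geneB'};
scores=[1 0; 0.5 1; 0.2 1];

% J indexes one occurrence of each distinct gene, K maps every row to its group
[~,J,K]=unique(genes);

% Fewer distinct genes than rows means that there are duplicates
if numel(J)~=numel(K)
    genes=genes(J);
    scores=scores(J,:);
end
```

Because unique returns its result sorted, the genes end up in alphabetical order whenever duplicates are removed. Which of the duplicate rows is kept depends on the occurrence that unique selects by default, and that default has differed between MATLAB releases.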

Generated on Tue 16-Jul-2013 21:50:02 by m2html © 2005