parseHPA

PURPOSE

parseHPA

SYNOPSIS

function hpaData=parseHPA(fileName)

DESCRIPTION

 parseHPA
   Parses a database dump of the Human Protein Atlas (HPA)

   fileName            comma-separated database dump of HPA. For details
                       regarding the format, see
                       http://www.proteinatlas.org/about/download.
   
   hpaData
       genes               cell array with the unique gene names
       tissues             cell array with the tissue names. The list may not be
                           unique, as there can be multiple cell types per tissue
       celltypes           cell array with the cell type names for each tissue
       levels              cell array with the unique expression levels
       types               cell array with the unique evidence types
       reliabilities       cell array with the unique reliability levels
       
       gene2Level          gene-to-expression level mapping in sparse matrix form.
                           Element i,j is the index in hpaData.levels for
                           gene i in cell type j. Column j corresponds to
                           hpaData.tissues{j} and hpaData.celltypes{j}
       gene2Type           gene-to-evidence type mapping in sparse matrix form.
                           Element i,j is the index in hpaData.types for
                           gene i in cell type j
       gene2Reliability    gene-to-reliability level mapping in sparse matrix form.
                           Element i,j is the index in hpaData.reliabilities
                           for gene i in cell type j

       
   Usage: hpaData=parseHPA(fileName)

   Rasmus Agren, 2013-08-01
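
The field descriptions above imply a simple lookup pattern: find the row for a gene, find the column for a tissue/cell type pair, then translate the stored index through hpaData.levels. The following is a minimal sketch of that pattern. It uses a hand-built stand-in struct with the same field names so that it runs on its own; the gene names and values are hypothetical toy data, not real HPA content, and in practice hpaData would come from parseHPA.

```matlab
% Stand-in for the output of parseHPA (hypothetical toy values, not real HPA data)
hpaData.genes={'GENE_A';'GENE_B'};
hpaData.tissues={'liver';'liver';'kidney'};
hpaData.celltypes={'hepatocytes';'bile duct cells';'cells in tubules'};
hpaData.levels={'High';'Low';'Medium'};
% Rows are genes, columns are tissue/cell type combinations
hpaData.gene2Level=sparse([1 1 2],[1 3 2],[1 2 3],2,3);

% Find the row for a gene and the column for a tissue/cell type pair
geneIdx=find(strcmp(hpaData.genes,'GENE_A'));
cellIdx=find(strcmp(hpaData.tissues,'liver') & ...
    strcmp(hpaData.celltypes,'hepatocytes'));

% The stored value is an index into hpaData.levels. A zero entry means
% that no value was stored for this gene/cell type combination
levelIdx=full(hpaData.gene2Level(geneIdx,cellIdx));
if levelIdx>0
    level=hpaData.levels{levelIdx};
else
    level='';
end
disp(level)
```

The same pattern should apply to gene2Type with hpaData.types and to gene2Reliability with hpaData.reliabilities, since all three matrices share the same row and column layout.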

CROSS-REFERENCE INFORMATION

This function calls:

This function is called by:

SOURCE CODE

function hpaData=parseHPA(fileName)
% parseHPA
%   Parses a database dump of the Human Protein Atlas (HPA)
%
%   fileName            comma-separated database dump of HPA. For details
%                       regarding the format, see
%                       http://www.proteinatlas.org/about/download.
%
%   hpaData
%       genes               cell array with the unique gene names
%       tissues             cell array with the tissue names. The list may not be
%                           unique, as there can be multiple cell types per tissue
%       celltypes           cell array with the cell type names for each tissue
%       levels              cell array with the unique expression levels
%       types               cell array with the unique evidence types
%       reliabilities       cell array with the unique reliability levels
%
%       gene2Level          gene-to-expression level mapping in sparse matrix form.
%                           Element i,j is the index in hpaData.levels for
%                           gene i in cell type j. Column j corresponds to
%                           hpaData.tissues{j} and hpaData.celltypes{j}
%       gene2Type           gene-to-evidence type mapping in sparse matrix form.
%                           Element i,j is the index in hpaData.types for
%                           gene i in cell type j
%       gene2Reliability    gene-to-reliability level mapping in sparse matrix form.
%                           Element i,j is the index in hpaData.reliabilities
%                           for gene i in cell type j
%
%
%   Usage: hpaData=parseHPA(fileName)
%
%   Rasmus Agren, 2013-08-01
%

%Read the six quoted, comma-separated columns (the header line is included)
fid=fopen(fileName,'r');
if fid==-1
    dispEM(['Could not open the file "' fileName '"']);
end
hpa=textscan(fid,'%q %q %q %q %q %q','Delimiter',',');
fclose(fid);

%Go through and see if the headers match what was expected
headers={'Gene' 'Tissue' 'Cell type' 'Level' 'Expression type' 'Reliability'};
for i=1:numel(headers)
    if ~strcmpi(headers(i),hpa{i}(1))
        dispEM(['Could not find the header "' headers{i} '". Make sure that the input file matches the format specified at http://www.proteinatlas.org/about/download']);
    end
    %Remove the header line here
    hpa{i}(1)=[];
end

%Get the unique values of each data type. The third output of unique maps
%each row in the file to its position in the list of unique values
[hpaData.genes,~,I]=unique(hpa{1});
%Join tissue and cell type with a separator so that each combination is
%treated as one unit. J picks one row per unique combination
[~,J,K]=unique(strcat(hpa{2},'¤¤',hpa{3}));
hpaData.tissues=hpa{2}(J);
hpaData.celltypes=hpa{3}(J);
[hpaData.levels,~,L]=unique(hpa{4});
[hpaData.types,~,M]=unique(hpa{5});
[hpaData.reliabilities,~,N]=unique(hpa{6});

%Map the data to sparse matrices instead. Rows are genes and columns are
%tissue/cell type combinations
hpaData.gene2Level=sparse(I,K,L,numel(hpaData.genes),numel(hpaData.tissues));
hpaData.gene2Type=sparse(I,K,M,numel(hpaData.genes),numel(hpaData.tissues));
hpaData.gene2Reliability=sparse(I,K,N,numel(hpaData.genes),numel(hpaData.tissues));
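
The mapping step in the source relies on the third output of unique, which assigns each record an index into the sorted list of unique values, and on sparse to place those indices in a gene-by-cell-type matrix. The sketch below illustrates the idea on three made-up records; the variable names and values are hypothetical and stand in for the columns read from the HPA file.

```matlab
% Toy long-format records (hypothetical values): one row per gene/cell type pair
genes={'G2';'G1';'G2'};
cells={'c1';'c1';'c2'};
levels={'Low';'High';'High'};

% The third output of unique maps each record to its position in the
% sorted list of unique values
[uGenes,~,I]=unique(genes);   % uGenes={'G1';'G2'},    I=[2;1;2]
[uCells,~,K]=unique(cells);   % uCells={'c1';'c2'},    K=[1;1;2]
[uLevels,~,L]=unique(levels); % uLevels={'High';'Low'}, L=[2;1;1]

% Rows index genes, columns index cell types, values index into uLevels
gene2Level=sparse(I,K,L,numel(uGenes),numel(uCells));

% Dense view for inspection: [1 0; 2 1]
denseLevels=full(gene2Level);
disp(denseLevels)
```

Note that sparse adds the values of repeated (i,j) pairs. If an input file contained the same gene and cell type on more than one row, the summed value would no longer be a meaningful index, so it may be worth checking for duplicate rows before relying on the matrices.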

Generated on Mon 06-Jan-2014 14:58:12 by m2html © 2005