parseHPA
  Parses a database dump of the Human Protein Atlas (HPA)

  fileName          comma-separated database dump of HPA. For details
                    regarding the format, see
                    http://www.proteinatlas.org/about/download.

  hpaData
      genes             cell array with the unique gene names
      tissues           cell array with the tissue names. The list may not be
                        unique, as there can be multiple cell types per tissue
      celltypes         cell array with the cell type names for each tissue
      levels            cell array with the unique expression levels
      types             cell array with the unique evidence types
      reliabilities     cell array with the unique reliability levels

      gene2Level        gene-to-expression level mapping in sparse matrix form.
                        The value for element i,j is the index in
                        hpaData.levels of gene i in cell type j
      gene2Type         gene-to-evidence type mapping in sparse matrix form.
                        The value for element i,j is the index in
                        hpaData.types of gene i in cell type j
      gene2Reliability  gene-to-reliability level mapping in sparse matrix form.
                        The value for element i,j is the index in
                        hpaData.reliabilities of gene i in cell type j

  Usage: hpaData=parseHPA(fileName)

  Rasmus Agren, 2012-08-29
function hpaData=parseHPA(fileName)
% parseHPA
%   Parses a database dump of the Human Protein Atlas (HPA)
%
%   fileName          comma-separated database dump of HPA. For details
%                     regarding the format, see
%                     http://www.proteinatlas.org/about/download.
%
%   hpaData
%       genes             cell array with the unique gene names
%       tissues           cell array with the tissue names. The list may not be
%                         unique, as there can be multiple cell types per tissue
%       celltypes         cell array with the cell type names for each tissue
%       levels            cell array with the unique expression levels
%       types             cell array with the unique evidence types
%       reliabilities     cell array with the unique reliability levels
%
%       gene2Level        gene-to-expression level mapping in sparse matrix form.
%                         The value for element i,j is the index in
%                         hpaData.levels of gene i in cell type j
%       gene2Type         gene-to-evidence type mapping in sparse matrix form.
%                         The value for element i,j is the index in
%                         hpaData.types of gene i in cell type j
%       gene2Reliability  gene-to-reliability level mapping in sparse matrix form.
%                         The value for element i,j is the index in
%                         hpaData.reliabilities of gene i in cell type j
%
%   Usage: hpaData=parseHPA(fileName)
%
%   Rasmus Agren, 2012-08-29
%

%Open the file and fail early if it cannot be read
fid=fopen(fileName,'r');
if fid==-1
    error('parseHPA:fileNotFound','Could not open the file "%s".',fileName);
end
%Read six quoted, comma-separated columns; the header line is read as data
hpa=textscan(fid,'%q %q %q %q %q %q','Delimiter',',');
fclose(fid);

%Go through and see if the headers match what was expected
headers={'Gene' 'Tissue' 'Cell type' 'Level' 'Expression type' 'Reliability'};
for i=1:numel(headers)
    if ~strcmpi(headers{i},hpa{i}{1})
        error('parseHPA:headerMismatch',['Could not find the header "%s". ' ...
            'Make sure that the input file matches the format specified at ' ...
            'http://www.proteinatlas.org/about/download'],headers{i});
    end
    %Remove the header line here
    hpa{i}(1)=[];
end

%Get the unique values of each data type. I, K, L, M and N hold, for each
%row in the file, the index into the corresponding list of unique values
[hpaData.genes,~,I]=unique(hpa{1});
%A tissue/cell type pair is identified by the two names joined by a separator
[~,J,K]=unique(strcat(hpa{2},'¤¤',hpa{3}));
hpaData.tissues=hpa{2}(J);
hpaData.celltypes=hpa{3}(J);
[hpaData.levels,~,L]=unique(hpa{4});
[hpaData.types,~,M]=unique(hpa{5});
[hpaData.reliabilities,~,N]=unique(hpa{6});

%Map the data to sparse matrices instead. Note that sparse() sums the values
%of repeated (gene, cell type) pairs, so each pair should occur only once
nGenes=numel(hpaData.genes);
nCellTypes=numel(hpaData.tissues);
hpaData.gene2Level=sparse(I,K,L,nGenes,nCellTypes);
hpaData.gene2Type=sparse(I,K,M,nGenes,nCellTypes);
hpaData.gene2Reliability=sparse(I,K,N,nGenes,nCellTypes);
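
The sparse matrices store indices rather than strings, so reading a value back takes one extra lookup in hpaData.levels (or hpaData.types, hpaData.reliabilities). The following is a minimal sketch of that lookup, not part of parseHPA itself. The file name and the gene identifier are assumptions used only for illustration; replace them with your own.

```matlab
% Assumed file name; replace with the path to your own HPA dump.
hpaData = parseHPA('normal_tissue.csv');

% Hypothetical gene identifier, used only for illustration.
geneName = 'ENSG00000000003';
geneIdx = find(strcmp(hpaData.genes, geneName));

if isempty(geneIdx)
    fprintf('Gene %s was not found in the file.\n', geneName);
else
    % Nonzero entries in the gene's row mark the cell types with data.
    % cols are the column (cell type) indices, levelIdx the stored values,
    % which are indices into hpaData.levels.
    [~, cols, levelIdx] = find(hpaData.gene2Level(geneIdx, :));
    for k = 1:numel(cols)
        fprintf('%s / %s: %s\n', ...
            hpaData.tissues{cols(k)}, ...
            hpaData.celltypes{cols(k)}, ...
            hpaData.levels{levelIdx(k)});
    end
end
```

A zero in any of the three matrices means that the file has no row for that gene and cell type combination, which is why the stored indices start at 1.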