Home > RAVEN > predictLocalization.m

predictLocalization

PURPOSE ^

predictLocalization

SYNOPSIS ^

function [outModel geneLocalization transportStruct scores removedRxns]=predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime,plotResults)

DESCRIPTION ^

 predictLocalization
   Tries to assign reactions to compartments in a manner that is in
   agreement with localization predictors while at the same time
   maintaining connectivity.

   model                 a model structure. If the model contains several
                         compartments they will be merged
   GSS                   gene scoring structure as from parseScores
   defaultCompartment    transport reactions are expressed as diffusion
                         between the defaultCompartment and the others.
                         This is usually the cytosol. The default
                         compartment must have a match in
                         GSS
   transportCost         the cost for including a transport reaction (opt, 
                         default 0.5)
   maxTime               maximum optimization time in minutes (opt,
                         default 15)
   plotResults           true if the result should be ploted during the
                         optimization (opt false)

   outModel              the resulting model structure
   geneLocalization      structure with the genes and their resulting
                         localization
   transportStruct       structure with the transport reactions that had
                         to be inferred and between which compartments
   scores                structure that contains the total score history 
                         together with the score based on gene localization
                         and the score based on included transport reactions
   removedRxns           cell array with the reaction ids that had to be
                         removed in order to have a connected input model

   This function requires that the starting network is connected when it's in
   one compartment. Reactions that are unconnected are removed and saved
   in removedRxns. Try running fillGaps to have a more connected input
   model if there are many such reactions.

   In the final model all metabolites are produced in at least one reaction.
   This doesn't guarantee a fully functional model since there can be internal
   loops. Transport reactions are only included as passive diffusion (A <=> B).

   The score of a model is the sum of scores for all genes in their
   assigned compartment minus the cost of all transport reactions that 
   had to be included. A gene can only be assigned to one compartment.
   This is a simplification to keep the problem size down. The problem is 
   solved using simulated annealing.

   Usage: [outModel geneLocalization transportStruct score removedRxns]=...
       predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime)

   Rasmus Agren, 2013-02-07

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [outModel geneLocalization transportStruct scores removedRxns]=predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime,plotResults)
0002 % predictLocalization
0003 %   Tries to assign reactions to compartments in a manner that is in
0004 %   agreement with localization predictors while at the same time
0005 %   maintaining connectivity.
0006 %
0007 %   model                 a model structure. If the model contains several
0008 %                         compartments they will be merged
0009 %   GSS                   gene scoring structure as from parseScores
0010 %   defaultCompartment    transport reactions are expressed as diffusion
0011 %                         between the defaultCompartment and the others.
0012 %                         This is usually the cytosol. The default
0013 %                         compartment must have a match in
0014 %                         GSS
0015 %   transportCost         the cost for including a transport reaction (opt,
0016 %                         default 0.5)
0017 %   maxTime               maximum optimization time in minutes (opt,
0018 %                         default 15)
0019 %   plotResults           true if the result should be ploted during the
0020 %                         optimization (opt false)
0021 %
0022 %   outModel              the resulting model structure
0023 %   geneLocalization      structure with the genes and their resulting
0024 %                         localization
0025 %   transportStruct       structure with the transport reactions that had
0026 %                         to be inferred and between which compartments
0027 %   scores                structure that contains the total score history
0028 %                         together with the score based on gene localization
0029 %                         and the score based on included transport reactions
0030 %   removedRxns           cell array with the reaction ids that had to be
0031 %                         removed in order to have a connected input model
0032 %
0033 %   This function requires that the starting network is connected when it's in
0034 %   one compartment. Reactions that are unconnected are removed and saved
0035 %   in removedRxns. Try running fillGaps to have a more connected input
0036 %   model if there are many such reactions.
0037 %
0038 %   In the final model all metabolites are produced in at least one reaction.
0039 %   This doesn't guarantee a fully functional model since there can be internal
0040 %   loops. Transport reactions are only included as passive diffusion (A <=> B).
0041 %
0042 %   The score of a model is the sum of scores for all genes in their
0043 %   assigned compartment minus the cost of all transport reactions that
0044 %   had to be included. A gene can only be assigned to one compartment.
0045 %   This is a simplification to keep the problem size down. The problem is
0046 %   solved using simulated annealing.
0047 %
0048 %   Usage: [outModel geneLocalization transportStruct score removedRxns]=...
0049 %       predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime)
0050 %
0051 %   Rasmus Agren, 2013-02-07
0052 %
0053 
0054 if nargin<4
0055     transportCost=0.5;
0056 end
0057 
0058 if nargin<5
0059     maxTime=15;
0060 end
0061 
0062 if nargin<6
0063     plotResults=false;
0064 end
0065 
0066 I=ismember(defaultCompartment,GSS.compartments);
0067 if I==false
0068     throw(MException('','defaultCompartment not found in GSS')); 
0069 end
0070 
0071 if numel(model.comps)>1
0072    fprintf('WARNING: The model has several compartments. All compartments will be merged.\n');
0073    model=mergeCompartments(model,true,true);
0074 end
0075 
0076 %***Begin formating the data structures
0077 
0078 %Expand the model so that iso-enzymes have different reactions
0079 model=expandModel(model);
0080 
0081 %Identify reactions that have to be deleted because the involved mets are
0082 %never produced. This is done in an iterative manner
0083 removedRxns={};
0084 while 1
0085     irrevModel=convertToIrrev(model);
0086 
0087     I=sum(irrevModel.S>0,2); 
0088 
0089     %Pretend that the unconstrained metabolites are made enough
0090     if isfield(irrevModel,'unconstrained')
0091         I(irrevModel.unconstrained~=0)=2;
0092     end
0093     metsToDelete=false(numel(model.mets),1);
0094 
0095     %This is not very neat but I loop through each metabolite and check whether
0096     %it can be produced (without using only one isolated reversible reaction)
0097     for i=1:numel(irrevModel.mets)
0098         %If something can be made in two reactions then everything is fine. If i
0099         %can be made in one reaction it's fine unless it's through an isolated
0100         %reversible reaction (which can act as a mini loop)
0101         if I(i)<2
0102             if I(i)==1
0103                 %Find the reaction where this metabolite is produced
0104                 [crap J]=find(irrevModel.S(i,:)>0);
0105 
0106                 %Check the metabolites that are consumed in this reaction. The
0107                 %problem is if any of them is only produced in the opposite
0108                 %reversible reaction
0109                 K=irrevModel.S(:,J)<0;
0110                 check=find(K & I<=1);
0111 
0112                 for j=1:numel(check)
0113                     %Find the reactions where it participates
0114                     [crap L]=find(irrevModel.S(check(j),:)>0);
0115 
0116                     if ~isempty(L)
0117                         rxn=irrevModel.rxns(J);
0118                         rxnRev=irrevModel.rxns(L);
0119                         if strcmp(strrep(rxn,'_REV',''),strrep(rxnRev,'_REV',''))
0120                            metsToDelete(i)=true;
0121                         end
0122                     else
0123                         %If the metabolite was never produced then do nothing and deal with
0124                         %it when the loop gets there :)
0125                         continue;
0126                     end
0127                 end
0128             else
0129                 %Not made anywhere
0130                 metsToDelete(i)=true;
0131             end
0132         end
0133     end
0134 
0135     if any(metsToDelete)
0136         %Delete any reactions involving any of the metsToDelete
0137         [crap I]=find(model.S(metsToDelete,:));
0138         removedRxns=[removedRxns;model.rxns(I)];
0139         model=removeRxns(model,I,true,true);
0140     else
0141         %All bad reactions deleted
0142         break;
0143     end
0144 end
0145 
0146 %Assign fake genes to reactions without genes. This is just to make things
0147 %easier later on
0148 I=find(sum(model.rxnGeneMat,2)==0);
0149 for i=1:numel(I)
0150     model.genes=[model.genes;['&&FAKE&&' num2str(i)]];
0151     if isfield(model,'geneShortNames')
0152         model.geneShortNames=[model.geneShortNames;{''}];
0153     end
0154     if isfield(model,'geneMiriams')
0155         model.geneMiriams=[model.geneMiriams;{[]}];
0156     end
0157     if isfield(model,'geneFrom')
0158         model.geneFrom=[model.geneFrom;{{'FAKE'}}];
0159     end
0160     model.rxnGeneMat(I(i),numel(model.genes))=1;
0161     model.grRules{I(i)}=''; 
0162 end
0163 
0164 %Update the GSS. All genes, fake or real, for which
0165 %there is no evidence gets a score 0.5 in all compartments. Also just to
0166 %make it easier further on
0167 I=setdiff(model.genes,GSS.genes);
0168 GSS.genes=[GSS.genes;I];
0169 GSS.scores=[GSS.scores;ones(numel(I),numel(GSS.compartments))*0.5];
0170 
0171 %Gene complexes should be moved together in order to be biologically
0172 %relevant. The average score for the genes is used for each compartment.
0173 %This is done by changing the model so that gene complexes are used as a
0174 %single gene name and then a score is calculated for that "gene".
0175 
0176 %Only "and"-relationships exist after expandModel
0177 genes=unique(model.grRules);
0178 nGenes=strrep(genes,'(','');
0179 nGenes=strrep(nGenes,')','');
0180 %nGenes=strrep(nGenes,' and ','_and_');
0181 complexes=setdiff(nGenes,model.genes);
0182 if isempty(complexes{1}) %Empty grRules also come up here
0183     complexes(1)=[];
0184 end
0185 cScores=zeros(numel(complexes),numel(GSS.compartments));
0186 for i=1:numel(complexes)
0187     genesInComplex=regexp(complexes{i},' and ','split');
0188 
0189     %Find these genes in GSS
0190     [I J]=ismember(genesInComplex,GSS.genes);
0191 
0192     if any(I)
0193         %Get the average of the genes that were found.
0194         mScores=mean(GSS.scores(J(I),:));
0195 
0196         %And add 0.5 for the genes that were not found in order to be
0197         %consistent with non-complexes
0198         mScores=(mScores.*sum(I)+(numel(genesInComplex)-sum(I))*0.5)/numel(genesInComplex);
0199     else
0200         fprintf(['WARNING: Could not parse grRule "' complexes{i} '". Assigning score 0.0 in all compartments\n']);
0201         mScores=ones(1,numel(genesInComplex))*0.5;
0202     end
0203     cScores(i,:)=mScores;
0204 
0205     %Add this complex as a new gene
0206     model.genes=[model.genes;complexes{i}];
0207     if isfield(model,'geneMiriams')
0208         model.geneMiriams=[model.geneMiriams;{[]}];
0209     end
0210     if isfield(model,'geneShortNames')
0211         model.geneShortNames=[model.geneShortNames;{''}];
0212     end
0213     if isfield(model,'geneFrom')
0214         model.geneFrom=[model.geneFrom;{'COMPLEX'}];
0215     end
0216     %Find the reactions which had the original complex and change them to
0217     %use the new "gene"
0218     I=ismember(model.grRules,['(' complexes{i} ')']);
0219 
0220     %Should check more carefully if there can be an error here
0221     if ~isempty(I)
0222         model.rxnGeneMat(I,:)=0; %Ok since we have split on "or"
0223         model.rxnGeneMat(I,numel(model.genes))=1;
0224     end
0225 end
0226 
0227 %Add the new "genes"
0228 GSS.genes=[GSS.genes;complexes];
0229 GSS.scores=[GSS.scores;cScores];
0230 
0231 %After merging the complexes it could happen that there are genes that are
0232 %no longer in use. Delete such genes
0233 model=removeRxns(model,{},true,true);
0234 
0235 %Exchange reactions, defined as involving an unconstrained metabolite, are
0236 %special in that they have to stay in the defaultCompartment. This means
0237 %that uptake/excretion of metabolites is always via the default
0238 %compartment. This is a small simplification, but should be valid in most
0239 %cases
0240 [crap I]=getExchangeRxns(model);
0241 
0242 %It will be easier later on if the same place. Put them in the beginning
0243 J=1:numel(model.rxns);
0244 J(I)=[];
0245 model=permuteModel(model,[I;J'],'rxns');
0246 
0247 %Number of exchange reactions
0248 nER=numel(I);
0249 
0250 %Also put the exchange metabolites in the beginning
0251 if isfield(model,'unconstrained')
0252     I=find(model.unconstrained);
0253     J=1:numel(model.mets);
0254     J(I)=[];
0255     model=permuteModel(model,[I;J'],'mets');
0256     %Number of exchange metabolites
0257     nEM=numel(I);
0258 else
0259     nEM=0;
0260 end
0261 
0262 %There is no point of having genes for exchange reactions, so delete them.
0263 %Also to make computations easier.
0264 model.rxnGeneMat(1:nER,:)=0;
0265 model.grRules(1:nER)={''};
0266 
0267 %Remove unused genes
0268 model=removeRxns(model,{},true,true);
0269 
0270 %Remove genes with no match to the model and reorder so that the genes are
0271 %in the same order as model.genes. Since we have already added fake genes
0272 %so that all genes in model exist in GSS it's fine to do
0273 %like this.
0274 [crap J]=ismember(model.genes,GSS.genes);
0275 GSS.genes=model.genes;
0276 GSS.scores=GSS.scores(J,:);
0277 
0278 %Reorder the GSS so that the first index corresponds to the
0279 %default compartment
0280 [crap J]=ismember(defaultCompartment,GSS.compartments);
0281 reorder=1:numel(GSS.compartments);
0282 reorder(J)=[];
0283 reorder=[J reorder];
0284 GSS.scores=GSS.scores(:,reorder);
0285 GSS.compartments=GSS.compartments(reorder);
0286 
0287 %Since we're only looking at whether the metabolites can be synthesized, we
0288 %don't have to care about the stoichiometry. Change to -1/1 to simplify
0289 %later. Keep the S matrix for later though.
0290 oldS=model.S;
0291 model.S(model.S>0)=1;
0292 model.S(model.S<0)=-1;
0293 
0294 %Here I do a bit of a trick. Since I don't want to calculate which
0295 %reactions are reversible all the time, I let reversible reactions have the
0296 %coefficients -10/10 instead of -1/1
0297 model.S(:,model.rev==1)=model.S(:,model.rev==1).*10;
0298 
0299 %***Begin problem formulation
0300 
0301 %Some numbers that are good to have
0302 nRxns=numel(model.rxns)-nER; %Excluding exchange rxns
0303 nMets=numel(model.mets)-nEM; %Excluding exchange mets
0304 nGenes=numel(model.genes);
0305 nComps=numel(GSS.compartments);
0306 
0307 %Create a big stoichiometric matrix that will be the current model. In
0308 %order to have faster simulations the maximal model size is declared and
0309 %reactions are then moved within it.
0310 
0311 %First the original model (with the first nE being exchange rxns), then
0312 %reserve space for number of rxns minus exchange rxns for each non-default
0313 %compartment, then transport reactions for all non-exchange mets between
0314 %the default compartment and all others.
0315 %NOTE: Kept eye()*0 since eye() can be used to include all transport from
0316 %the beginning
0317 s=repmat(eye(nMets)*0,1,nComps-1);
0318 s=[zeros(numel(model.mets)-nMets,size(s,2));s];
0319 S=[model.S sparse(numel(model.mets),nRxns*(nComps-1)) s];
0320 s=[sparse(nMets*(nComps-1),numel(model.rxns)+nRxns*(nComps-1)) eye(nMets*(nComps-1))*0];
0321 S=[S;s];
0322 
0323 %Create a binary matrix that says where the genes are in the current
0324 %solution
0325 g2c=false(nGenes,nComps);
0326 %All genes start in the default compartment
0327 g2c(:,1)=true;
0328 
0329 %Start of main optimization loop
0330 tic;
0331 bestScore=-inf;
0332 bestS=[];
0333 bestg2c=[];
0334 
0335 %Temp for testing
0336 plotScore=[];
0337 nTrans=[];
0338 totScore=[];
0339 minScore=sum(min(GSS.scores,[],2));
0340 maxScore=sum(max(GSS.scores,[],2));
0341 while toc<maxTime*60
0342    %Pick a random gene, weighted by it's current score minus the best score
0343    %for that gene (often 1.0, but can be 0.5 for no genes or average for complexes.
0344    %Genes with bad fits are more likely to be moved. This formulation never
0345    %moves a gene from its best compartment. Therefore a small uniform
0346    %weight is added.
0347    [I J]=find(g2c);
0348    geneToMove=randsample(nGenes,1,true,max(GSS.scores(I,:),[],2)-GSS.scores(sub2ind(size(g2c),I,J))+0.1);
0349 
0350    %Sample among possible compartments to move to. Add a larger weight to
0351    %even out the odds a little. Also a way of getting rid of loops where
0352    %the same set of genes are moved back and forth several times.
0353    toComp=randsample(nComps,1,true,GSS.scores(geneToMove,:)+0.2);
0354    
0355    %Check that it moves to a new compartment
0356    if toComp==find(g2c(geneToMove,:))
0357        continue;
0358    end
0359    
0360    %Moves the gene
0361    [newS newg2c]=moveGene(S,model,g2c,geneToMove,toComp,nRxns,nMets);
0362    
0363    %Tries to connect the network. If this was not possible in 10
0364    %iterations, then abort. If more than 20 modifications were needed then
0365    %it's unlikely that it will be a lower score
0366    wasConnected=false;
0367    for j=1:5
0368        %Find the metabolites that are now unconnected
0369        unconnected=findUnconnected(newS,nEM);
0370        
0371        %Continue if there are still unconnected
0372        if any(unconnected)
0373            %For each gene find out how many of these could be connected if
0374            %the gene was moved and how many would be disconnected by moving
0375            %that gene
0376            [geneIndex moveTo deltaConnected deltaScore]=selectGenes(newS,nEM,nMets,nER,nRxns,model,unconnected,g2c,GSS);
0377            
0378            %Score which gene would be the best to move. The highest
0379            %deltaScore is 1.0. I want it to be possible to move a gene from
0380            %worst to best compartment even if it disconnects, say, 1.5 more
0381            %metabolites.
0382            [score I]=max(1.5*deltaScore+deltaConnected);
0383            
0384            %Checks if it has to add a transport or if there is a gene that
0385            %could be moved order to have a more connected network
0386            hasToAddTransport=true;
0387            if ~isempty(deltaConnected)
0388               if score>0
0389                   hasToAddTransport=false;
0390               end
0391            end
0392            
0393            %If it is possible to move any gene in order to have a more
0394            %connected network, then move the best one
0395            if hasToAddTransport==false;
0396                 [newS newg2c]=moveGene(newS,model,g2c,geneIndex(I),moveTo(I),nRxns,nMets);
0397            else
0398                 %Choose a random unconnected metabolite that should be
0399                 %connected
0400                 transMet=unconnected(randsample(numel(unconnected),1));
0401                 
0402                 %First get where the metabolite is now
0403                 comps=ceil((transMet-nEM)/((size(S,1)-nEM)/nComps));
0404 
0405                 %Find the corresponding metabolite index if it were in the
0406                 %default compartment
0407                 dcIndex=transMet-(comps-1)*nMets;
0408                 
0409                 %Then get the indexes of that metabolite in all
0410                 %compartments
0411                 allIndexes=dcIndex;
0412                 for k=1:nComps-1
0413                    allIndexes=[allIndexes;dcIndex+nMets*k];
0414                 end
0415                 
0416                 %It could be that some of these aren't used in any
0417                 %reaction. Get only the ones which are
0418                 I=sum(newS(allIndexes,:)~=0,2)>0;
0419                 
0420                 %Then get the ones that are used but not in unconnected.
0421                 %These are metabolites that could potentially be
0422                 %transported to connect transMet
0423                 connectedUsed=setdiff(allIndexes(I),unconnected);
0424                 
0425                 %I think this is an error but I leave it for now. It seems
0426                 %to happen if nothing can be connected in one step
0427                 if isempty(connectedUsed)
0428                    break;
0429                 end
0430                 
0431                 %If transMet is in the default compartment then everything
0432                 %is fine, just connect it to a random one
0433                 if transMet==dcIndex
0434                     newS=addTransport(newS,nRxns,nER,nMets,nEM,nComps,transMet,connectedUsed(randsample(numel(connectedUsed),1)));
0435                 else
0436                     %If one of the connectedUsed is in the default
0437                     %compartment then connect to that one
0438                     I=connectedUsed(connectedUsed<(nMets+nEM));
0439                     if any(I)
0440                         newS=addTransport(newS,nRxns,nER,nMets,nEM,nComps,transMet,I(randsample(numel(I),1)));
0441                     else
0442                         %This is if the only way to connect it is by adding
0443                         %two transport reactions, going via the default
0444                         %compartment
0445                         break;
0446                     end
0447                 end
0448            end
0449        else
0450            wasConnected=true;
0451            break;
0452        end
0453    end
0454    
0455    %If the network was connected in a new way, it is possible that some
0456    %transport reactions are no longer needed. They should be removed
0457    if wasConnected==true        
0458         %These are the metabolites that are being transported
0459         activeTransport=find(sum(newS(:,nER+nRxns*nComps+1:end),2));
0460         
0461         %Get the metabolites that are unconnected if transport wasn't used
0462         unconnected=findUnconnected(newS(:,1:nER+nRxns*nComps),nEM);
0463         
0464         %Find the transport reactions that are not needed and delete them
0465         I=setdiff(activeTransport,unconnected);
0466         
0467         %Since both metabolites in a transport rxns must be connected for
0468         %the reaction to be deleted, the sum over the colums should be 4.
0469         newS(:,find(sum(newS(I,nER+nRxns*nComps+1:end))==4)+nER+nRxns*nComps)=0;
0470        
0471         %Score the solution and determine whether to keep it as a new solution
0472         [score geneScore trCost]=scoreModel(newS,newg2c,GSS,transportCost);
0473 
0474         %If it was the best solution so far, keep it
0475         if score>bestScore
0476             bestScore=score;
0477             bestS=newS;
0478             bestg2c=newg2c;
0479         end
0480         
0481         %This should not be steepest descent later
0482         if score>=bestScore% || exp((score-bestScore)*7)>rand()
0483             plotScore=[plotScore;geneScore];
0484             nTrans=[nTrans;trCost];
0485             totScore=[totScore;score];
0486             S=newS;
0487             g2c=newg2c;
0488             
0489             if plotResults==true
0490                 subplot(3,2,1);
0491                 spy(S);
0492                 subplot(3,2,2);
0493                 plot(plotScore,'r');
0494                 xlabel('Gene score');
0495                 subplot(3,2,3);
0496                 plot((plotScore-minScore)/(maxScore-minScore),'r');
0497                 xlabel('Gene score relative to predictions');
0498                 subplot(3,2,4);
0499                 plot(nTrans,'g');
0500                 xlabel('Transport cost');
0501                 subplot(3,2,5);
0502                 plot(totScore,'b');
0503                 xlabel('Total score');
0504                 subplot(3,2,6);
0505                 pause(0.2);
0506             end
0507         end
0508    end
0509 end
0510 scores.totScore=score;
0511 scores.geneScore=geneScore;
0512 scores.transCost=trCost;
0513 
0514 %Find which metabolites are transported and to where
0515 [I J]=find(bestS(nEM+1:nEM+nMets,end-nMets*(nComps-1)+1:end));
0516 J=ceil(J/nMets+1);
0517 transportStruct.mets=model.metNames(I+nEM);
0518 transportStruct.toComp=GSS.compartments(J);
0519 
0520 [I J]=find(bestg2c);
0521 geneLocalization.genes=GSS.genes(I);
0522 geneLocalization.comps=GSS.compartments(J);
0523 
0524 %Resort the gene names
0525 [crap I]=sort(geneLocalization.genes);
0526 geneLocalization.genes=geneLocalization.genes(I);
0527 geneLocalization.comps=geneLocalization.comps(I);
0528 
0529 %Remove the fake genes
0530 I=strmatch('&&FAKE&&',geneLocalization.genes);
0531 geneLocalization.genes(I)=[];
0532 geneLocalization.comps(I)=[];
0533 
0534 %Put together the model. This is done by first duplicating the S matrix into the
0535 %different compartments. Then the transport reactions are added based on
0536 %transportStruct. By now model.S should have the same size as the S matrix
0537 %used in the optimization, but with conserved stoichiometry. In the final
0538 %step all reactions and metabolites that aren't used in the S matrix from the optimization
0539 %are deleted from the model.
0540 outModel=model;
0541 outModel.S=oldS;
0542 
0543 %This is the S matrix without exchange rxns or metabolites
0544 copyPart=outModel.S(nEM+1:end,nER+1:end);
0545 
0546 %Replicate to give the rxnGeneMat for the full system
0547 copyRxnGeneMat=outModel.rxnGeneMat(nER+1:end,:);
0548 outModel.rxnGeneMat=[outModel.rxnGeneMat;repmat(copyRxnGeneMat,nComps-1,1)];
0549 
0550 %First fix the compartments. The model is already ordered with the exchange
0551 %metabolites first. The original model may contain one or two compartments,
0552 %depending on whether any exchange metabolites are defined.
0553 nStartComps=numel(outModel.comps);
0554 if nStartComps==1
0555    outModel.comps={'1'};
0556    outModel.compNames=GSS.compartments(1);
0557 else
0558     if model.metComps(1)==1
0559         outModel.compNames(1)=GSS.compartments(1);
0560     else
0561         outModel.compNames(2)=GSS.compartments(1);
0562     end
0563 end
0564 outModel.compNames=[outModel.compNames;GSS.compartments(2:end)];
0565 
0566 %Ugly little loop
0567 for i=1:numel(GSS.compartments)-1
0568     outModel.comps=[outModel.comps;num2str(numel(outModel.comps)+1)];
0569 end
0570 %This information is not known from the data, so empty fields are added
0571 outModel.compOutside=cell(numel(outModel.comps),1);
0572 outModel.compOutside(:)={''};
0573 
0574 for i=1:nComps-1
0575     outModel.S=[outModel.S sparse(size(outModel.S,1),nRxns)];
0576     outModel.S=[outModel.S;[sparse(nMets,nRxns*i+nER) copyPart]];
0577     outModel.rxns=[outModel.rxns;strcat(outModel.rxns(nER+1:nER+nRxns),'_',GSS.compartments{i+1})];
0578     outModel.rxnNames=[outModel.rxnNames;strcat(outModel.rxnNames(nER+1:nER+nRxns),' (',GSS.compartments{i+1},')')];
0579     outModel.lb=[outModel.lb;outModel.lb(nER+1:nER+nRxns)];
0580     outModel.ub=[outModel.ub;outModel.ub(nER+1:nER+nRxns)];
0581     outModel.rev=[outModel.rev;outModel.rev(nER+1:nER+nRxns)];
0582     outModel.c=[outModel.c;outModel.c(nER+1:nER+nRxns)];
0583     if isfield(outModel,'grRules')
0584         outModel.grRules=[outModel.grRules;outModel.grRules(nER+1:nER+nRxns)];
0585     end
0586     if isfield(outModel,'subSystems')
0587         outModel.subSystems=[outModel.subSystems;outModel.subSystems(nER+1:nER+nRxns)];
0588     end
0589     if isfield(outModel,'eccodes')
0590         outModel.eccodes=[outModel.eccodes;outModel.eccodes(nER+1:nER+nRxns)];
0591     end
0592     if isfield(outModel,'rxnFrom')
0593         outModel.rxnFrom=[outModel.rxnFrom;outModel.rxnFrom(nER+1:nER+nRxns)];
0594     end
0595     outModel.mets=[outModel.mets;strcat(outModel.mets(nEM+1:nEM+nMets),'_',GSS.compartments{i+1})];
0596     outModel.metNames=[outModel.metNames;outModel.metNames(nEM+1:nEM+nMets)];
0597     outModel.b=[outModel.b;outModel.b(nEM+1:nEM+nMets,:)];
0598     I=ones(nMets,1)*nStartComps+i;
0599     outModel.metComps=[outModel.metComps;I];
0600     if isfield(outModel,'inchis')
0601         outModel.inchis=[outModel.inchis;outModel.inchis(nEM+1:nEM+nMets)];
0602     end
0603     if isfield(outModel,'unconstrained')
0604         outModel.unconstrained=[outModel.unconstrained;outModel.unconstrained(nEM+1:nEM+nMets)];
0605     end
0606     if isfield(outModel,'metMiriams')
0607         outModel.metMiriams=[outModel.metMiriams;outModel.metMiriams(nEM+1:nEM+nMets)];
0608     end
0609     if isfield(outModel,'metFormulas')
0610         outModel.metFormulas=[outModel.metFormulas;outModel.metFormulas(nEM+1:nEM+nMets)];
0611     end
0612     if isfield(outModel,'metFrom')
0613         outModel.metFrom=[outModel.metFrom;outModel.metFrom(nEM+1:nEM+nMets)];
0614     end
0615 end
0616 
0617 %Add the transport reactions
0618 transS=bestS(:,numel(outModel.rxns)+1:end);
0619 J=sum(transS)>0; %Active rxns
0620 
0621 %Transport reactions are written in a different way compared to a "real"
0622 %stoichimetric matrix. This is to fix that
0623 transS(transS~=0)=1;
0624 transS(1:nEM+nMets,:)=transS(1:nEM+nMets,:)*-1;
0625 I=find(sum(transS>0,2));
0626 nTransRxns=numel(I);
0627 outModel.S=[outModel.S transS(:,J)];
0628 filler=ones(nTransRxns,1);
0629 outModel.lb=[outModel.lb;filler*-1000];
0630 outModel.ub=[outModel.ub;filler*1000];
0631 outModel.rev=[outModel.rev;filler];
0632 outModel.c=[outModel.c;filler*0];
0633 outModel.rxnGeneMat=[outModel.rxnGeneMat;sparse(nTransRxns,numel(outModel.genes))];
0634 
0635 for i=1:numel(I)
0636     outModel.rxns=[outModel.rxns;strcat('transport',num2str(i))];
0637     outModel.rxnNames=[outModel.rxnNames;['Transport of ',outModel.metNames{I(i)}]];
0638     if isfield(outModel,'grRules')
0639         outModel.grRules=[outModel.grRules;{''}];
0640     end
0641     if isfield(outModel,'subSystems')
0642         outModel.subSystems=[outModel.subSystems;'Inferred transport reactions'];
0643     end
0644     if isfield(outModel,'eccodes')
0645         outModel.eccodes=[outModel.eccodes;{''}];
0646     end
0647     if isfield(outModel,'rxnFrom')
0648         outModel.rxnFrom=[outModel.rxnFrom;{''}];
0649     end
0650 end
0651 
0652 %Then remove all reactions and metabolites that aren't used in the final
0653 %solution from the optimization
0654 [I J]=find(bestS(:,1:nER+nComps*nRxns));
0655 K=true(numel(outModel.rxns),1);
0656 K(J)=false;
0657 K(end-nTransRxns+1:end)=false;
0658 outModel=removeRxns(outModel,K,true);
0659 
0660 %Remove all fake genes
0661 I=strmatch('&&FAKE&&',outModel.genes);
0662 outModel.genes(I)=[];
0663 if isfield(outModel,'geneMiriams')
0664     outModel.geneMiriams(I)=[];
0665 end
0666 if isfield(outModel,'geneShortNames')
0667     outModel.geneShortNames(I)=[];
0668 end
0669 outModel.rxnGeneMat(:,I)=[];
0670 end
0671 
0672 %Moves a gene and all associated reactions from one compartment to another
0673 function [S g2c]=moveGene(S,model,g2c,geneToMove,toComp,nRxns,nMets)
0674     %Find the current compartment and update to the new one
0675     currentComp=find(g2c(geneToMove,:));
0676     g2c(geneToMove,:)=false;
0677     g2c(geneToMove,toComp)=true;
0678     
0679     %Find the reactions in the original model that the gene controls
0680     [I crap]=find(model.rxnGeneMat(:,geneToMove));
0681     
0682     %Calculate their current positions in the S matrix
0683     oldRxns=I+(currentComp-1)*nRxns;
0684 
0685     %And their new positions
0686     newRxns=I+(toComp-1)*nRxns;
0687     
0688     %The metabolite ids also have to be changed in order to match the new
0689     %compartment
0690     metChange=nMets*(toComp-currentComp);
0691     
0692     %Update the reactions
0693     [I J K]=find(S(:,oldRxns));
0694     I=I+metChange;
0695 
0696     %Move the reactions
0697     S(:,oldRxns)=0;
0698     S(sub2ind(size(S),I,newRxns(J)))=K;
0699 end
0700 
0701 %Finds which metabolites are unconnected, in the sense that they are never
0702 %a product or only a product in a reversible reaction where one reactant is
0703 %only a product in the opposite direction of that reaction. This function
0704 %ignores exchange metabolites. Returns a vector of metabolite indexes.
0705 %metsToCheck is an array of metabolite indexes to check for connectivity.
0706 %If not supplied then all metabolites are checked
0707 function unconnected=findUnconnected(S,nEM,metsToCheck)
0708     if nargin>2
0709         %Do this by deleting everything from the network that is not in
0710         %metsToCheck and that is not exchange metabolites
0711         I=false(size(S,1),1);
0712         I(1:nEM)=true;
0713         I(metsToCheck)=true;
0714         S=S(I,:);
0715     end
0716     
0717     em=false(size(S,1),1);
0718     em(1:nEM)=true;
0719     
0720     %Construct a matrix in which the reversible reactions are inverted
0721     I=sum(S>2,1) | sum(S>2,1);
0722     revS=S;
0723     revS(:,I)=revS(:,I)*-1;
0724     
0725     %First calculate the ones that are ok
0726     %Produced in 2 rxns, is exchange, is not used at all, is produced in
0727     %non-reversible, involved in more than 1 reversible reactions
0728     connected=sum(S>0,2)>1 | em | sum(S~=0,2)==0 | sum(S(:,~I)>0,2)>0 | sum(S(:,I)~=0,2)>1;
0729     
0730     %Then get the ones that are unconnected because they are never produced
0731     unconnected=sum(S>0 | revS>0,2)==0 & connected==false;
0732     
0733     %Then get the ones that are potentially unconnected
0734     maybeUnconnected=~connected & ~unconnected;
0735     %maybeUnconnected=find(maybeUnconnectedS);
0736 
0737     %The metabolites in maybeUnconnected are involved in one reversible reaction and
0738     %not produced in any other reaction.
0739     %This means that the reactions which have at least one met in
0740     %maybeUnconnected as reactant and one as product are unconnected. The
0741     %metabolites in maybeUnconnected that are present in those reactions
0742     %are then dead ends
0743     deadRxns=any(S(maybeUnconnected,:)>0) & any(S(maybeUnconnected,:)<0);
0744     
0745     %Get the mets involved in any of those reactions
0746     problematic=any(S(:,deadRxns)~=0,2);
0747 
0748     %If any of these are in the maybeUnconnected list then the
0749     %metabolite is unconnected
0750     unconnected(problematic & maybeUnconnected)=true;
0751     
0752     %Map back to metsToCheck
0753     if nargin>2
0754         unconnected=metsToCheck(unconnected(nEM+1:end));
0755     else
0756         unconnected=find(unconnected);
0757     end
0758 end
0759 
0760 %Given a set of unconnected metabolites, this function tries to move each
0761 %gene that could connect any of them, calculates the number of newly connected
0762 %metabolites minus the number of newly disconnected metabolites. As some metabolites
0763 %are very connected, only 25 genes are checked. Genes that have a low score
0764 %in their current compartment are more likely to be moved.
0765 function [geneIndex moveTo deltaConnected deltaScore]=selectGenes(S,nEM,nMets,nER,nRxns,model,unconnected,g2c,GSS)
0766     %If moveTo is 0 then the gene can't connect any of the metabolites
0767     moveTo=zeros(numel(model.genes),1);
0768     deltaConnected=zeros(numel(model.genes),1);
0769 
0770     %First get where the metabolites are now
0771     nComps=size(g2c,2);
0772     comps=ceil((unconnected-nEM)/((size(S,1)-nEM)/nComps));
0773     
0774     %Find the corresponding metabolite indexes if they all were in the
0775     %default compartment
0776     dcIndexes=unconnected-(comps-1)*nMets;
0777     
0778     %Then find them if they were in any other compartment
0779     allIndexes=dcIndexes;
0780     for i=1:nComps-1
0781        allIndexes=[allIndexes;dcIndexes+nMets*i];
0782     end
0783     
0784     %Also check which reversible reactions that could be used
0785     I=sum(S>2,1) | sum(S>2,1);
0786     revS=S;
0787     revS(:,I)=revS(:,I)*-1;
0788     
0789     %Find all reactions that could make any of the unconnected metabolites
0790     %in some other compartment
0791     newMets=setdiff(allIndexes,unconnected);
0792     potential=find(sum(S(newMets,:)>0) | sum(revS(newMets,:)>0));
0793     potential(potential<=nER | potential>nER+nRxns*nComps)=[]; %No exchange rxns or transport rxns
0794     
0795     %Map J to the real metabolic reactions in model
0796     rxnComps=ceil((potential-nER)/(nRxns));
0797     
0798     %Find the corresponding reaction indexes if they all were in the
0799     %default compartment
0800     dcRxnIndexes=potential-(rxnComps-1)*nRxns;
0801     
0802     %Get the genes for those reactions
0803     genes=find(sum(model.rxnGeneMat(dcRxnIndexes,:)>0,1));
0804     
0805     %For some cases there can be very many reactions to connect something.
0806     %This is in particular true in the beginning of the optimization if,
0807     %say, ATP is unconnected. Therefore limit the number of genes to be
0808     %checked to 25. Weigh so that genes with bad scores in their current
0809     %compartment are more likely to be moved
0810     
0811     %Get scores for these genes
0812     [crap J]=find(g2c(genes,:));
0813 
0814     %Add a small weight so that genes in their best compartment could be
0815     %moved as well
0816     geneScores=GSS.scores(sub2ind(size(g2c),genes(:),J));
0817     modGeneScores=1.1-geneScores;
0818     if numel(genes)>25
0819         rGenes=genes(randsample(numel(genes),min(numel(genes),25),true,modGeneScores));
0820         
0821         %The sampling with weights could give duplicates
0822         rGenes=unique(rGenes);
0823         
0824         %Reorder the geneScores to match
0825         [crap I]=ismember(rGenes,genes);
0826         geneScores=geneScores(I);
0827         genes=rGenes;
0828     end
0829     for i=1:numel(genes)
0830         %Since we are moving one gene at a time, only metabolites involved
0831         %in any of the reactions for that gene can become unconnected. We
0832         %get them so speed up the algorithm.
0833         %First get all involved reactions in the default compartment
0834         rxns=find(model.rxnGeneMat(:,genes(i)));
0835         
0836         %Then get their mets
0837         mets=find(sum(model.S(:,rxns)~=0,2)>0);
0838 
0839         %Then get their indexes in all compartments
0840         allIndexes=mets;
0841         for j=1:nComps-1
0842            allIndexes=[allIndexes;mets+nMets*j];
0843         end
0844         
0845         %Check which of the unconnected metabolites that these
0846         %reactions correspond to. This could have been done earlier,
0847         %but it's fast. I skip the reversibility check because it's
0848         %unlikely to be an issue here. Worst case is that the gene is
0849         %tested once to much
0850         [I crap]=find(model.S(:,rxns));
0851         moveToComps=unique(comps(ismember(dcIndexes,I)));
0852         
0853         %Try to move the gene to each of the compartments
0854         bestMove=-inf;
0855         bestComp=[];
0856         for j=1:numel(moveToComps)
0857             newS=moveGene(S,model,g2c,genes(i),moveToComps(j),nRxns,nMets);
0858 
0859             %Check how many metabolites that are unconnected after moving
0860             %the gene
0861             dConnected=numel(unconnected)-numel(findUnconnected(newS,nEM,[allIndexes;unconnected]));
0862             if dConnected>bestMove
0863                 bestMove=dConnected;
0864                 bestComp=moveToComps(j);
0865             end
0866         end
0867 
0868         %Add the difference in connectivity and where the genes should
0869         %be moved
0870         moveTo(genes(i))=bestComp;
0871         deltaConnected(genes(i))=bestMove;
0872     end
0873     
0874     %Finish up
0875     geneIndex=genes(:);
0876     moveTo=moveTo(geneIndex);
0877     deltaConnected=deltaConnected(geneIndex);
0878     deltaScore=GSS.scores(sub2ind(size(g2c),geneIndex(:),moveTo))-geneScores;
0879 end
0880 
0881 %Small function to add a transport reactions between two metabolites.
0882 %Transport reactions are written as having a coefficient 2.0 for both
0883 %reactant and product. This is not a "real" reaction, but since all normal
0884 %reaction have coefficient -1/1 or -10/10 it's a compact way of writing it.
0885 function S=addTransport(S,nRxns,nER,nMets,nEM,nComps,metA,metB)
0886     mets=[metA;metB];
0887     %Find the current compartments for the metabolites
0888     comps=ceil((mets-nEM)/((size(S,1)-nEM)/nComps));
0889     
0890     if sum(comps==1)~=1
0891         throw(MException('','Tried to create a transport reaction from a non-default compartment')); 
0892     end
0893     
0894     %Calculate the reaction index
0895     rIndex=(nER+nRxns*nComps)+mets(comps~=1)-nEM-nMets;
0896     
0897     S(mets,rIndex)=2;
0898 end
0899 
0900 %Scores a network based on the localization of the genes and the number of
0901 %transporter reactions used.
0902 function [score geneScore transportCost]=scoreModel(S,g2c,GSS,transportCost)
0903     [I J]=find(g2c);
0904     geneScore=sum(GSS.scores(sub2ind(size(g2c),I,J)));
0905     transportCost=sum(sum(S==2,1)>0)*transportCost;
0906     score=geneScore-transportCost;
0907 end

Generated on Tue 23-Apr-2013 15:18:37 by m2html © 2005