Home > RAVEN > predictLocalization.m

predictLocalization

PURPOSE ^

predictLocalization

SYNOPSIS ^

function [outModel geneLocalization transportStruct scores removedRxns]=predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime,plotResults)

DESCRIPTION ^

 predictLocalization
   Tries to assign reactions to compartments in a manner that is in
   agreement with localization predictors while at the same time
   maintaining connectivity.

   model                 a model structure. If the model contains several
                         compartments they will be merged
   GSS                   gene scoring structure as from parseScores
   defaultCompartment    transport reactions are expressed as diffusion
                         between the defaultCompartment and the others.
                         This is usually the cytosol. The default
                         compartment must have a match in
                         GSS
   transportCost         the cost for including a transport reaction. If this
                         a scalar then the same cost is used for all metabolites.
                         It can also be a vector of costs with the same dimension
                         as model.mets. Note that negative costs will result in that
                         transport of the metabolite is encouraged (opt, default 0.5)
   maxTime               maximum optimization time in minutes (opt,
                         default 15)
   plotResults           true if the result should be plotted during the
                         optimization (opt false)

   outModel              the resulting model structure
   geneLocalization      structure with the genes and their resulting
                         localization
   transportStruct       structure with the transport reactions that had
                         to be inferred and between which compartments
   scores                structure that contains the total score history 
                         together with the score based on gene localization
                         and the score based on included transport reactions
   removedRxns           cell array with the reaction ids that had to be
                         removed in order to have a connected input model

   This function requires that the starting network is connected when it's in
   one compartment. Reactions that are unconnected are removed and saved
   in removedRxns. Try running fillGaps to have a more connected input
   model if there are many such reactions.

   In the final model all metabolites are produced in at least one reaction.
   This doesn't guarantee a fully functional model since there can be internal
   loops. Transport reactions are only included as passive diffusion (A <=> B).

   The score of a model is the sum of scores for all genes in their
   assigned compartment minus the cost of all transport reactions that 
   had to be included. A gene can only be assigned to one compartment.
   This is a simplification to keep the problem size down. The problem is 
   solved using simulated annealing.

   Usage: [outModel geneLocalization transportStruct score removedRxns]=...
       predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime)

   Rasmus Agren, 2013-09-12

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [outModel geneLocalization transportStruct scores removedRxns]=predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime,plotResults)
0002 % predictLocalization
0003 %   Tries to assign reactions to compartments in a manner that is in
0004 %   agreement with localization predictors while at the same time
0005 %   maintaining connectivity.
0006 %
0007 %   model                 a model structure. If the model contains several
0008 %                         compartments they will be merged
0009 %   GSS                   gene scoring structure as from parseScores
0010 %   defaultCompartment    transport reactions are expressed as diffusion
0011 %                         between the defaultCompartment and the others.
0012 %                         This is usually the cytosol. The default
0013 %                         compartment must have a match in
0014 %                         GSS
0015 %   transportCost         the cost for including a transport reaction. If this
0016 %                         a scalar then the same cost is used for all metabolites.
0017 %                         It can also be a vector of costs with the same dimension
0018 %                         as model.mets. Note that negative costs will result in that
0019 %                         transport of the metabolite is encouraged (opt, default 0.5)
0020 %   maxTime               maximum optimization time in minutes (opt,
0021 %                         default 15)
0022 %   plotResults           true if the result should be plotted during the
0023 %                         optimization (opt false)
0024 %
0025 %   outModel              the resulting model structure
0026 %   geneLocalization      structure with the genes and their resulting
0027 %                         localization
0028 %   transportStruct       structure with the transport reactions that had
0029 %                         to be inferred and between which compartments
0030 %   scores                structure that contains the total score history
0031 %                         together with the score based on gene localization
0032 %                         and the score based on included transport reactions
0033 %   removedRxns           cell array with the reaction ids that had to be
0034 %                         removed in order to have a connected input model
0035 %
0036 %   This function requires that the starting network is connected when it's in
0037 %   one compartment. Reactions that are unconnected are removed and saved
0038 %   in removedRxns. Try running fillGaps to have a more connected input
0039 %   model if there are many such reactions.
0040 %
0041 %   In the final model all metabolites are produced in at least one reaction.
0042 %   This doesn't guarantee a fully functional model since there can be internal
0043 %   loops. Transport reactions are only included as passive diffusion (A <=> B).
0044 %
0045 %   The score of a model is the sum of scores for all genes in their
0046 %   assigned compartment minus the cost of all transport reactions that
0047 %   had to be included. A gene can only be assigned to one compartment.
0048 %   This is a simplification to keep the problem size down. The problem is
0049 %   solved using simulated annealing.
0050 %
0051 %   Usage: [outModel geneLocalization transportStruct score removedRxns]=...
0052 %       predictLocalization(model,GSS,defaultCompartment,transportCost,maxTime)
0053 %
0054 %   Rasmus Agren, 2013-09-12
0055 %
0056 
0057 if nargin<4
0058     transportCost=ones(numel(model.mets),1)*0.5;
0059 end
0060 if numel(transportCost)==1
0061     transportCost=ones(numel(model.mets),1)*transportCost;
0062 end
0063 transportCost=transportCost(:);
0064 
0065 if numel(transportCost)~=numel(model.mets)
0066     dispEM('The vector of transport costs must have the same dimension as model.mets',true);
0067 end
0068 if nargin<5
0069     maxTime=15;
0070 end
0071 if nargin<6
0072     plotResults=false;
0073 end
0074 
0075 if isfield(model,'rxnComps')
0076    model=rmfield(model,'rxnComps');
0077    dispEM('The model structure contains information about reaction compartmentalization. This is not supported by this function. The rxnComps field has been deleted',false);
0078 end
0079 if isfield(model,'geneComps')
0080    model=rmfield(model,'geneComps');
0081    dispEM('The model structure contains information about gene compartmentalization. This is not supported by this function. The geneComps field has been deleted',false);
0082 end
0083 
0084 I=ismember(defaultCompartment,GSS.compartments);
0085 if I==false
0086     dispEM('defaultCompartment not found in GSS'); 
0087 end
0088 
0089 if numel(model.comps)>1
0090    dispEM('The model has several compartments. All compartments will be merged',false);
0091    model=mergeCompartments(model,true,true);
0092 end
0093 
0094 %***Begin formating the data structures
0095 
0096 %Expand the model so that iso-enzymes have different reactions
0097 model=expandModel(model);
0098 
0099 %Identify reactions that have to be deleted because the involved mets are
0100 %never produced. This is done in an iterative manner
0101 removedRxns={};
0102 %This is to keep track of which metabolites are removed in this step. It is
0103 %needed to adjust the transport costs
0104 originalModelMets=model.mets;
0105 while 1
0106     irrevModel=convertToIrrev(model);
0107 
0108     I=sum(irrevModel.S>0,2); 
0109 
0110     %Pretend that the unconstrained metabolites are made enough
0111     if isfield(irrevModel,'unconstrained')
0112         I(irrevModel.unconstrained~=0)=2;
0113     end
0114     metsToDelete=false(numel(model.mets),1);
0115 
0116     %This is not very neat but I loop through each metabolite and check whether
0117     %it can be produced (without using only one isolated reversible reaction)
0118     for i=1:numel(irrevModel.mets)
0119         %If something can be made in two reactions then everything is fine. If i
0120         %can be made in one reaction it's fine unless it's through an isolated
0121         %reversible reaction (which can act as a mini loop)
0122         if I(i)<2
0123             if I(i)==1
0124                 %Find the reaction where this metabolite is produced
0125                 [crap J]=find(irrevModel.S(i,:)>0);
0126 
0127                 %Check the metabolites that are consumed in this reaction. The
0128                 %problem is if any of them is only produced in the opposite
0129                 %reversible reaction
0130                 K=irrevModel.S(:,J)<0;
0131                 check=find(K & I<=1);
0132 
0133                 for j=1:numel(check)
0134                     %Find the reactions where it participates
0135                     [crap L]=find(irrevModel.S(check(j),:)>0);
0136 
0137                     if ~isempty(L)
0138                         rxn=irrevModel.rxns(J);
0139                         rxnRev=irrevModel.rxns(L);
0140                         if strcmp(strrep(rxn,'_REV',''),strrep(rxnRev,'_REV',''))
0141                            metsToDelete(i)=true;
0142                         end
0143                     else
0144                         %If the metabolite was never produced then do nothing and deal with
0145                         %it when the loop gets there :)
0146                         continue;
0147                     end
0148                 end
0149             else
0150                 %Not made anywhere
0151                 metsToDelete(i)=true;
0152             end
0153         end
0154     end
0155 
0156     if any(metsToDelete)
0157         %Delete any reactions involving any of the metsToDelete
0158         [crap I]=find(model.S(metsToDelete,:));
0159         removedRxns=[removedRxns;model.rxns(I)];
0160         model=removeRxns(model,I,true,true);
0161     else
0162         %All bad reactions deleted
0163         break;
0164     end
0165 end
0166 
0167 %Adjust the transport costs
0168 transportCost=transportCost(ismember(originalModelMets,model.mets));
0169 
0170 %Assign fake genes to reactions without genes. This is just to make things
0171 %easier later on
0172 I=find(sum(model.rxnGeneMat,2)==0);
0173 for i=1:numel(I)
0174     model.genes=[model.genes;['&&FAKE&&' num2str(i)]];
0175     if isfield(model,'geneShortNames')
0176         model.geneShortNames=[model.geneShortNames;{''}];
0177     end
0178     if isfield(model,'geneMiriams')
0179         model.geneMiriams=[model.geneMiriams;{[]}];
0180     end
0181     if isfield(model,'geneFrom')
0182         model.geneFrom=[model.geneFrom;{{'FAKE'}}];
0183     end
0184     model.rxnGeneMat(I(i),numel(model.genes))=1;
0185     model.grRules{I(i)}=''; 
0186 end
0187 
0188 %Update the GSS. All genes, fake or real, for which
0189 %there is no evidence gets a score 0.5 in all compartments. Also just to
0190 %make it easier further on
0191 I=setdiff(model.genes,GSS.genes);
0192 GSS.genes=[GSS.genes;I];
0193 GSS.scores=[GSS.scores;ones(numel(I),numel(GSS.compartments))*0.5];
0194 
0195 %Gene complexes should be moved together in order to be biologically
0196 %relevant. The average score for the genes is used for each compartment.
0197 %This is done by changing the model so that gene complexes are used as a
0198 %single gene name and then a score is calculated for that "gene".
0199 
0200 %Only "and"-relationships exist after expandModel
0201 genes=unique(model.grRules);
0202 nGenes=strrep(genes,'(','');
0203 nGenes=strrep(nGenes,')','');
0204 %nGenes=strrep(nGenes,' and ','_and_');
0205 complexes=setdiff(nGenes,model.genes);
0206 if ~isempty(complexes)
0207     if isempty(complexes{1}) %Empty grRules also come up here
0208         complexes(1)=[];
0209     end
0210 end
0211 cScores=zeros(numel(complexes),numel(GSS.compartments));
0212 for i=1:numel(complexes)
0213     genesInComplex=regexp(complexes{i},' and ','split');
0214 
0215     %Find these genes in GSS
0216     [I J]=ismember(genesInComplex,GSS.genes);
0217 
0218     if any(I)
0219         %Get the average of the genes that were found.
0220         mScores=mean(GSS.scores(J(I),:));
0221 
0222         %And add 0.5 for the genes that were not found in order to be
0223         %consistent with non-complexes
0224         mScores=(mScores.*sum(I)+(numel(genesInComplex)-sum(I))*0.5)/numel(genesInComplex);
0225     else
0226         dispEM(['Could not parse grRule "' complexes{i} '". Assigning score 0.0 in all compartments'],false);
0227         mScores=ones(1,numel(genesInComplex))*0.5;
0228     end
0229     cScores(i,:)=mScores;
0230 
0231     %Add this complex as a new gene
0232     model.genes=[model.genes;complexes{i}];
0233     if isfield(model,'geneMiriams')
0234         model.geneMiriams=[model.geneMiriams;{[]}];
0235     end
0236     if isfield(model,'geneShortNames')
0237         model.geneShortNames=[model.geneShortNames;{''}];
0238     end
0239     if isfield(model,'geneFrom')
0240         model.geneFrom=[model.geneFrom;{'COMPLEX'}];
0241     end
0242     %Find the reactions which had the original complex and change them to
0243     %use the new "gene"
0244     I=ismember(model.grRules,['(' complexes{i} ')']);
0245 
0246     %Should check more carefully if there can be an error here
0247     if ~isempty(I)
0248         model.rxnGeneMat(I,:)=0; %Ok since we have split on "or"
0249         model.rxnGeneMat(I,numel(model.genes))=1;
0250     end
0251 end
0252 
0253 %Add the new "genes"
0254 GSS.genes=[GSS.genes;complexes];
0255 GSS.scores=[GSS.scores;cScores];
0256 
0257 %After merging the complexes it could happen that there are genes that are
0258 %no longer in use. Delete such genes
0259 model=removeRxns(model,{},false,true);
0260 
0261 %Exchange reactions, defined as involving an unconstrained metabolite, are
0262 %special in that they have to stay in the defaultCompartment. This means
0263 %that uptake/excretion of metabolites is always via the default
0264 %compartment. This is a small simplification, but should be valid in most
0265 %cases
0266 [crap I]=getExchangeRxns(model);
0267 
0268 %It will be easier later on if the same place. Put them in the beginning
0269 J=1:numel(model.rxns);
0270 J(I)=[];
0271 model=permuteModel(model,[I;J'],'rxns');
0272 
0273 %Number of exchange reactions
0274 nER=numel(I);
0275 
0276 %Also put the exchange metabolites in the beginning
0277 if isfield(model,'unconstrained')
0278     I=find(model.unconstrained);
0279     J=1:numel(model.mets);
0280     J(I)=[];
0281     model=permuteModel(model,[I;J'],'mets');
0282     %Also reorder the transport costs
0283     transportCost=transportCost([I;J']);
0284     %Number of exchange metabolites
0285     nEM=numel(I);
0286 else
0287     nEM=0;
0288 end
0289 
0290 %There is no point of having genes for exchange reactions, so delete them.
0291 %Also to make computations easier.
0292 model.rxnGeneMat(1:nER,:)=0;
0293 model.grRules(1:nER)={''};
0294 
0295 %Remove unused genes
0296 model=removeRxns(model,{},false,true);
0297 
0298 %Remove genes with no match to the model and reorder so that the genes are
0299 %in the same order as model.genes. Since we have already added fake genes
0300 %so that all genes in model exist in GSS it's fine to do
0301 %like this.
0302 [crap J]=ismember(model.genes,GSS.genes);
0303 GSS.genes=model.genes;
0304 GSS.scores=GSS.scores(J,:);
0305 
0306 %Reorder the GSS so that the first index corresponds to the
0307 %default compartment
0308 [crap J]=ismember(defaultCompartment,GSS.compartments);
0309 reorder=1:numel(GSS.compartments);
0310 reorder(J)=[];
0311 reorder=[J reorder];
0312 GSS.scores=GSS.scores(:,reorder);
0313 GSS.compartments=GSS.compartments(reorder);
0314 
0315 %Since we're only looking at whether the metabolites can be synthesized, we
0316 %don't have to care about the stoichiometry. Change to -1/1 to simplify
0317 %later. Keep the S matrix for later though.
0318 oldS=model.S;
0319 model.S(model.S>0)=1;
0320 model.S(model.S<0)=-1;
0321 
0322 %Here I do a bit of a trick. Since I don't want to calculate which
0323 %reactions are reversible all the time, I let reversible reactions have the
0324 %coefficients -10/10 instead of -1/1
0325 model.S(:,model.rev==1)=model.S(:,model.rev==1).*10;
0326 
0327 %***Begin problem formulation
0328 
0329 %Some numbers that are good to have
0330 nRxns=numel(model.rxns)-nER; %Excluding exchange rxns
0331 nMets=numel(model.mets)-nEM; %Excluding exchange mets
0332 nGenes=numel(model.genes);
0333 nComps=numel(GSS.compartments);
0334 
0335 %Create a big stoichiometric matrix that will be the current model. In
0336 %order to have faster simulations the maximal model size is declared and
0337 %reactions are then moved within it.
0338 
0339 %First the original model (with the first nE being exchange rxns), then
0340 %reserve space for number of rxns minus exchange rxns for each non-default
0341 %compartment, then transport reactions for all non-exchange mets between
0342 %the default compartment and all others.
0343 %NOTE: Kept eye()*0 since eye() can be used to include all transport from
0344 %the beginning
0345 s=repmat(eye(nMets)*0,1,nComps-1);
0346 s=[zeros(numel(model.mets)-nMets,size(s,2));s];
0347 S=[model.S sparse(numel(model.mets),nRxns*(nComps-1)) s];
0348 s=[sparse(nMets*(nComps-1),numel(model.rxns)+nRxns*(nComps-1)) eye(nMets*(nComps-1))*0];
0349 S=[S;s];
0350 
0351 %Also replicate the transport costs
0352 transportCost=[transportCost(1:nEM);repmat(transportCost(nEM+1:end),nComps,1)];
0353 
0354 %Create a binary matrix that says where the genes are in the current
0355 %solution
0356 g2c=false(nGenes,nComps);
0357 %All genes start in the default compartment
0358 g2c(:,1)=true;
0359 
0360 %Start of main optimization loop
0361 tic;
0362 bestScore=-inf;
0363 bestS=[];
0364 bestg2c=[];
0365 
0366 %Temp for testing
0367 plotScore=[];
0368 nTrans=[];
0369 totScore=[];
0370 minScore=sum(min(GSS.scores,[],2));
0371 maxScore=sum(max(GSS.scores,[],2));
0372 
0373 while toc<maxTime*60
0374    %Pick a random gene, weighted by it's current score minus the best score
0375    %for that gene (often 1.0, but can be 0.5 for no genes or average for complexes.
0376    %Genes with bad fits are more likely to be moved. This formulation never
0377    %moves a gene from its best compartment. Therefore a small uniform
0378    %weight is added.
0379    [I J]=find(g2c);
0380    geneToMove=randsample(nGenes,1,true,max(GSS.scores(I,:),[],2)-GSS.scores(sub2ind(size(g2c),I,J))+0.1);
0381 
0382    %Sample among possible compartments to move to. Add a larger weight to
0383    %even out the odds a little. Also a way of getting rid of loops where
0384    %the same set of genes are moved back and forth several times.
0385    toComp=randsample(nComps,1,true,GSS.scores(geneToMove,:)+0.2);
0386    
0387    %Check that it moves to a new compartment
0388    if toComp==find(g2c(geneToMove,:))
0389        continue;
0390    end
0391    
0392    %Moves the gene
0393    [newS newg2c]=moveGene(S,model,g2c,geneToMove,toComp,nRxns,nMets);
0394    
0395    %Tries to connect the network. If this was not possible in 10
0396    %iterations, then abort. If more than 20 modifications were needed then
0397    %it's unlikely that it will be a lower score
0398    wasConnected=false;
0399    for j=1:10
0400        %Find the metabolites that are now unconnected
0401        unconnected=findUnconnected(newS,nEM);
0402        
0403        %Continue if there are still unconnected
0404        if any(unconnected)
0405            %For each gene find out how many of these could be connected if
0406            %the gene was moved and how many would be disconnected by moving
0407            %that gene
0408            [geneIndex moveTo deltaConnected deltaScore]=selectGenes(newS,nEM,nMets,nER,nRxns,model,unconnected,g2c,GSS);
0409 
0410            %Score which gene would be the best to move. The highest
0411            %deltaScore is 1.0. I want it to be possible to move a gene from
0412            %worst to best compartment even if it disconnects, say, 1.5 more
0413            %metabolites.
0414            [score I]=max(1.5*deltaScore+deltaConnected);
0415            
0416            %Checks if it has to add a transport or if there is a gene that
0417            %could be moved order to have a more connected network
0418            hasToAddTransport=true;
0419            if ~isempty(deltaConnected)
0420               if score>0
0421                   hasToAddTransport=false;
0422               end
0423            end
0424            
0425            %If it is possible to move any gene in order to have a more
0426            %connected network, then move the best one
0427            if hasToAddTransport==false;
0428                 [newS newg2c]=moveGene(newS,model,g2c,geneIndex(I),moveTo(I),nRxns,nMets);
0429            else
0430                 %Choose a random unconnected metabolite that should be
0431                 %connected
0432                 transMet=unconnected(randsample(numel(unconnected),1));
0433                 
0434                 %First get where the metabolite is now
0435                 comps=ceil((transMet-nEM)/((size(S,1)-nEM)/nComps));
0436 
0437                 %Find the corresponding metabolite index if it were in the
0438                 %default compartment
0439                 dcIndex=transMet-(comps-1)*nMets;
0440                 
0441                 %Then get the indexes of that metabolite in all
0442                 %compartments
0443                 allIndexes=dcIndex;
0444                 for k=1:nComps-1
0445                    allIndexes=[allIndexes;dcIndex+nMets*k];
0446                 end
0447                 
0448                 %It could be that some of these aren't used in any
0449                 %reaction. Get only the ones which are
0450                 I=sum(newS(allIndexes,:)~=0,2)>0;
0451                 
0452                 %Then get the ones that are used but not in unconnected.
0453                 %These are metabolites that could potentially be
0454                 %transported to connect transMet
0455                 connectedUsed=setdiff(allIndexes(I),unconnected);
0456                 
0457                 %I think this is an error but I leave it for now. It seems
0458                 %to happen if nothing can be connected in one step
0459                 if isempty(connectedUsed)
0460                    break;
0461                 end
0462                 
0463                 %If transMet is in the default compartment then everything
0464                 %is fine, just connect it to a random one
0465                 if transMet==dcIndex
0466                     newS=addTransport(newS,nRxns,nER,nMets,nEM,nComps,transMet,connectedUsed(randsample(numel(connectedUsed),1)));
0467                 else
0468                     %If one of the connectedUsed is in the default
0469                     %compartment then connect to that one
0470                     I=connectedUsed(connectedUsed<(nMets+nEM));
0471                     if any(I)
0472                         newS=addTransport(newS,nRxns,nER,nMets,nEM,nComps,transMet,I(randsample(numel(I),1)));
0473                     else
0474                         %This is if the only way to connect it is by adding
0475                         %two transport reactions, going via the default
0476                         %compartment
0477                         break;
0478                     end
0479                 end
0480            end
0481        else
0482            wasConnected=true;
0483            break;
0484        end
0485    end
0486    
0487    %If the network was connected in a new way, it is possible that some
0488    %transport reactions are no longer needed. They should be removed
0489    if wasConnected==true        
0490         %These are the metabolites that are being transported
0491         activeTransport=find(sum(newS(:,nER+nRxns*nComps+1:end),2));
0492         
0493         %Get the metabolites that are unconnected if transport wasn't used
0494         unconnected=findUnconnected(newS(:,1:nER+nRxns*nComps),nEM);
0495         
0496         %Find the transport reactions that are not needed and delete them
0497         I=setdiff(activeTransport,unconnected);
0498         
0499         %Since both metabolites in a transport rxns must be connected for
0500         %the reaction to be deleted, the sum over the colums should be 4.
0501         newS(:,find(sum(newS(I,nER+nRxns*nComps+1:end))==4)+nER+nRxns*nComps)=0;
0502        
0503         %Score the solution and determine whether to keep it as a new solution
0504         [score geneScore trCost]=scoreModel(newS,newg2c,GSS,transportCost);
0505 
0506         %If it was the best solution so far, keep it
0507         if score>bestScore
0508             bestScore=score;
0509             bestS=newS;
0510             bestg2c=newg2c;
0511         end
0512         
0513         %This should not be steepest descent later
0514         if score>=bestScore% || exp((score-bestScore)*7)>rand()
0515             plotScore=[plotScore;geneScore];
0516             nTrans=[nTrans;trCost];
0517             totScore=[totScore;score];
0518             S=newS;
0519             g2c=newg2c;
0520             
0521             if plotResults==true
0522                 subplot(3,2,1);
0523                 spy(S);
0524                 subplot(3,2,2);
0525                 plot(plotScore,'r');
0526                 xlabel('Gene score');
0527                 subplot(3,2,3);
0528                 plot((plotScore-minScore)/(maxScore-minScore),'r');
0529                 xlabel('Gene score relative to predictions');
0530                 subplot(3,2,4);
0531                 plot(nTrans,'g');
0532                 xlabel('Transport cost');
0533                 subplot(3,2,5);
0534                 plot(totScore,'b');
0535                 xlabel('Total score');
0536                 subplot(3,2,6);
0537                 pause(0.2);
0538             end
0539         end
0540    end
0541 end
0542 scores.totScore=score;
0543 scores.geneScore=geneScore;
0544 scores.transCost=trCost;
0545 
0546 %Find which metabolites are transported and to where
0547 [I J]=find(bestS(nEM+1:nEM+nMets,end-nMets*(nComps-1)+1:end));
0548 J=ceil(J/nMets+1);
0549 transportStruct.mets=model.metNames(I+nEM);
0550 transportStruct.toComp=GSS.compartments(J);
0551 
0552 [I J]=find(bestg2c);
0553 geneLocalization.genes=GSS.genes(I);
0554 geneLocalization.comps=GSS.compartments(J);
0555 
0556 %Resort the gene names
0557 [crap I]=sort(geneLocalization.genes);
0558 geneLocalization.genes=geneLocalization.genes(I);
0559 geneLocalization.comps=geneLocalization.comps(I);
0560 
0561 %Remove the fake genes
0562 I=strmatch('&&FAKE&&',geneLocalization.genes);
0563 geneLocalization.genes(I)=[];
0564 geneLocalization.comps(I)=[];
0565 
0566 %Put together the model. This is done by first duplicating the S matrix into the
0567 %different compartments. Then the transport reactions are added based on
0568 %transportStruct. By now model.S should have the same size as the S matrix
0569 %used in the optimization, but with conserved stoichiometry. In the final
0570 %step all reactions and metabolites that aren't used in the S matrix from the optimization
0571 %are deleted from the model.
0572 outModel=model;
0573 outModel.S=oldS;
0574 
0575 %This is the S matrix without exchange rxns or metabolites
0576 copyPart=outModel.S(nEM+1:end,nER+1:end);
0577 
0578 %Replicate to give the rxnGeneMat for the full system
0579 copyRxnGeneMat=outModel.rxnGeneMat(nER+1:end,:);
0580 outModel.rxnGeneMat=[outModel.rxnGeneMat;repmat(copyRxnGeneMat,nComps-1,1)];
0581 
0582 %First fix the compartments. The model is already ordered with the exchange
0583 %metabolites first. The original model may contain one or two compartments,
0584 %depending on whether any exchange metabolites are defined.
0585 nStartComps=numel(outModel.comps);
0586 if nStartComps==1
0587    outModel.comps={'1'};
0588    outModel.compNames=GSS.compartments(1);
0589 else
0590     if model.metComps(1)==1
0591         outModel.compNames(1)=GSS.compartments(1);
0592     else
0593         outModel.compNames(2)=GSS.compartments(1);
0594     end
0595 end
0596 outModel.compNames=[outModel.compNames;GSS.compartments(2:end)];
0597 
0598 %Ugly little loop
0599 for i=1:numel(GSS.compartments)-1
0600     outModel.comps=[outModel.comps;num2str(numel(outModel.comps)+1)];
0601 end
0602 %This information is not known from the data, so empty fields are added
0603 outModel.compOutside=cell(numel(outModel.comps),1);
0604 outModel.compOutside(:)={''};
0605 
0606 for i=1:nComps-1
0607     outModel.S=[outModel.S sparse(size(outModel.S,1),nRxns)];
0608     outModel.S=[outModel.S;[sparse(nMets,nRxns*i+nER) copyPart]];
0609     outModel.rxns=[outModel.rxns;strcat(outModel.rxns(nER+1:nER+nRxns),'_',GSS.compartments{i+1})];
0610     outModel.rxnNames=[outModel.rxnNames;strcat(outModel.rxnNames(nER+1:nER+nRxns),' (',GSS.compartments{i+1},')')];
0611     outModel.lb=[outModel.lb;outModel.lb(nER+1:nER+nRxns)];
0612     outModel.ub=[outModel.ub;outModel.ub(nER+1:nER+nRxns)];
0613     outModel.rev=[outModel.rev;outModel.rev(nER+1:nER+nRxns)];
0614     outModel.c=[outModel.c;outModel.c(nER+1:nER+nRxns)];
0615     if isfield(outModel,'grRules')
0616         outModel.grRules=[outModel.grRules;outModel.grRules(nER+1:nER+nRxns)];
0617     end
0618     if isfield(outModel,'subSystems')
0619         outModel.subSystems=[outModel.subSystems;outModel.subSystems(nER+1:nER+nRxns)];
0620     end
0621     if isfield(outModel,'eccodes')
0622         outModel.eccodes=[outModel.eccodes;outModel.eccodes(nER+1:nER+nRxns)];
0623     end
0624     if isfield(outModel,'rxnFrom')
0625         outModel.rxnFrom=[outModel.rxnFrom;outModel.rxnFrom(nER+1:nER+nRxns)];
0626     end
0627     if isfield(outModel,'rxnMiriams')
0628         outModel.rxnMiriams=[outModel.rxnMiriams;outModel.rxnMiriams(nER+1:nER+nRxns)];
0629     end
0630     outModel.mets=[outModel.mets;strcat(outModel.mets(nEM+1:nEM+nMets),'_',GSS.compartments{i+1})];
0631     outModel.metNames=[outModel.metNames;outModel.metNames(nEM+1:nEM+nMets)];
0632     outModel.b=[outModel.b;outModel.b(nEM+1:nEM+nMets,:)];
0633     I=ones(nMets,1)*nStartComps+i;
0634     outModel.metComps=[outModel.metComps;I];
0635     if isfield(outModel,'inchis')
0636         outModel.inchis=[outModel.inchis;outModel.inchis(nEM+1:nEM+nMets)];
0637     end
0638     if isfield(outModel,'unconstrained')
0639         outModel.unconstrained=[outModel.unconstrained;outModel.unconstrained(nEM+1:nEM+nMets)];
0640     end
0641     if isfield(outModel,'metMiriams')
0642         outModel.metMiriams=[outModel.metMiriams;outModel.metMiriams(nEM+1:nEM+nMets)];
0643     end
0644     if isfield(outModel,'metFormulas')
0645         outModel.metFormulas=[outModel.metFormulas;outModel.metFormulas(nEM+1:nEM+nMets)];
0646     end
0647     if isfield(outModel,'metFrom')
0648         outModel.metFrom=[outModel.metFrom;outModel.metFrom(nEM+1:nEM+nMets)];
0649     end
0650 end
0651 
0652 %Add the transport reactions
0653 transS=bestS(:,numel(outModel.rxns)+1:end);
0654 J=sum(transS)>0; %Active rxns
0655 
0656 %Transport reactions are written in a different way compared to a "real"
0657 %stoichimetric matrix. This is to fix that
0658 transS(transS~=0)=1;
0659 transS(1:nEM+nMets,:)=transS(1:nEM+nMets,:)*-1;
0660 I=find(sum(transS>0,2));
0661 nTransRxns=numel(I);
0662 outModel.S=[outModel.S transS(:,J)];
0663 filler=ones(nTransRxns,1);
0664 outModel.lb=[outModel.lb;filler*-1000];
0665 outModel.ub=[outModel.ub;filler*1000];
0666 outModel.rev=[outModel.rev;filler];
0667 outModel.c=[outModel.c;filler*0];
0668 outModel.rxnGeneMat=[outModel.rxnGeneMat;sparse(nTransRxns,numel(outModel.genes))];
0669 
0670 for i=1:numel(I)
0671     outModel.rxns=[outModel.rxns;strcat('transport',num2str(i))];
0672     outModel.rxnNames=[outModel.rxnNames;['Transport of ',outModel.metNames{I(i)}]];
0673     if isfield(outModel,'grRules')
0674         outModel.grRules=[outModel.grRules;{''}];
0675     end
0676     if isfield(outModel,'rxnMiriams')
0677         outModel.rxnMiriams=[outModel.rxnMiriams;{[]}];
0678     end
0679     if isfield(outModel,'subSystems')
0680         outModel.subSystems=[outModel.subSystems;'Inferred transport reactions'];
0681     end
0682     if isfield(outModel,'eccodes')
0683         outModel.eccodes=[outModel.eccodes;{''}];
0684     end
0685     if isfield(outModel,'rxnFrom')
0686         outModel.rxnFrom=[outModel.rxnFrom;{''}];
0687     end
0688 end
0689 
0690 %Then remove all reactions and metabolites that aren't used in the final
0691 %solution from the optimization
0692 [I J]=find(bestS(:,1:nER+nComps*nRxns));
0693 K=true(numel(outModel.rxns),1);
0694 K(J)=false;
0695 K(end-nTransRxns+1:end)=false;
0696 outModel=removeRxns(outModel,K,true);
0697 
0698 %Remove all fake genes
0699 I=strmatch('&&FAKE&&',outModel.genes);
0700 outModel.genes(I)=[];
0701 if isfield(outModel,'geneMiriams')
0702     outModel.geneMiriams(I)=[];
0703 end
0704 if isfield(outModel,'geneShortNames')
0705     outModel.geneShortNames(I)=[];
0706 end
0707 outModel.rxnGeneMat(:,I)=[];
0708 end
0709 
0710 %Moves a gene and all associated reactions from one compartment to another
0711 function [S g2c]=moveGene(S,model,g2c,geneToMove,toComp,nRxns,nMets)
0712     %Find the current compartment and update to the new one
0713     currentComp=find(g2c(geneToMove,:));
0714     g2c(geneToMove,:)=false;
0715     g2c(geneToMove,toComp)=true;
0716     
0717     %Find the reactions in the original model that the gene controls
0718     [I crap]=find(model.rxnGeneMat(:,geneToMove));
0719     
0720     %Calculate their current positions in the S matrix
0721     oldRxns=I+(currentComp-1)*nRxns;
0722 
0723     %And their new positions
0724     newRxns=I+(toComp-1)*nRxns;
0725     
0726     %The metabolite ids also have to be changed in order to match the new
0727     %compartment
0728     metChange=nMets*(toComp-currentComp);
0729     
0730     %Update the reactions
0731     [I J K]=find(S(:,oldRxns));
0732     I=I+metChange;
0733 
0734     %Move the reactions
0735     S(:,oldRxns)=0;
0736     S(sub2ind(size(S),I,newRxns(J)))=K;
0737 end
0738 
0739 %Finds which metabolites are unconnected, in the sense that they are never
0740 %a product or only a product in a reversible reaction where one reactant is
0741 %only a product in the opposite direction of that reaction. This function
0742 %ignores exchange metabolites. Returns a vector of metabolite indexes.
0743 %metsToCheck is an array of metabolite indexes to check for connectivity.
0744 %If not supplied then all metabolites are checked
0745 function unconnected=findUnconnected(S,nEM,metsToCheck)
0746     if nargin>2
0747         %Do this by deleting everything from the network that is not in
0748         %metsToCheck and that is not exchange metabolites
0749         I=false(size(S,1),1);
0750         I(1:nEM)=true;
0751         I(metsToCheck)=true;
0752         S=S(I,:);
0753     end
0754     
0755     em=false(size(S,1),1);
0756     em(1:nEM)=true;
0757     
0758     %Construct a matrix in which the reversible reactions are inverted
0759     I=sum(S>2,1) | sum(S>2,1);
0760     revS=S;
0761     revS(:,I)=revS(:,I)*-1;
0762     
0763     %First calculate the ones that are ok
0764     %Produced in 2 rxns, is exchange, is not used at all, is produced in
0765     %non-reversible, involved in more than 1 reversible reactions
0766     connected=sum(S>0,2)>1 | em | sum(S~=0,2)==0 | sum(S(:,~I)>0,2)>0 | sum(S(:,I)~=0,2)>1;
0767     
0768     %Then get the ones that are unconnected because they are never produced
0769     unconnected=sum(S>0 | revS>0,2)==0 & connected==false;
0770     
0771     %Then get the ones that are potentially unconnected
0772     maybeUnconnected=~connected & ~unconnected;
0773     %maybeUnconnected=find(maybeUnconnectedS);
0774 
0775     %The metabolites in maybeUnconnected are involved in one reversible reaction and
0776     %not produced in any other reaction.
0777     %This means that the reactions which have at least one met in
0778     %maybeUnconnected as reactant and one as product are unconnected. The
0779     %metabolites in maybeUnconnected that are present in those reactions
0780     %are then dead ends
0781     deadRxns=any(S(maybeUnconnected,:)>0) & any(S(maybeUnconnected,:)<0);
0782     
0783     %Get the mets involved in any of those reactions
0784     problematic=any(S(:,deadRxns)~=0,2);
0785 
0786     %If any of these are in the maybeUnconnected list then the
0787     %metabolite is unconnected
0788     unconnected(problematic & maybeUnconnected)=true;
0789     
0790     %Map back to metsToCheck
0791     if nargin>2
0792         unconnected=metsToCheck(unconnected(nEM+1:end));
0793     else
0794         unconnected=find(unconnected);
0795     end
0796 end
0797 
0798 %Given a set of unconnected metabolites, this function tries to move each
0799 %gene that could connect any of them, calculates the number of newly connected
0800 %metabolites minus the number of newly disconnected metabolites. As some metabolites
0801 %are very connected, only 25 genes are checked. Genes that have a low score
0802 %in their current compartment are more likely to be moved.
0803 function [geneIndex moveTo deltaConnected deltaScore]=selectGenes(S,nEM,nMets,nER,nRxns,model,unconnected,g2c,GSS)
0804     %If moveTo is 0 then the gene can't connect any of the metabolites
0805     moveTo=zeros(numel(model.genes),1);
0806     deltaConnected=zeros(numel(model.genes),1);
0807 
0808     %First get where the metabolites are now
0809     nComps=size(g2c,2);
0810     comps=ceil((unconnected-nEM)/((size(S,1)-nEM)/nComps));
0811     
0812     %Find the corresponding metabolite indexes if they all were in the
0813     %default compartment
0814     dcIndexes=unique(unconnected-(comps-1)*nMets);
0815     
0816     %Then find them if they were in any other compartment
0817     allIndexes=dcIndexes;
0818     for i=1:nComps-1
0819        allIndexes=[allIndexes;dcIndexes+nMets*i];
0820     end
0821     
0822     %Also check which reversible reactions that could be used
0823     I=sum(S>2,1) | sum(S>2,1);
0824     revS=S;
0825     revS(:,I)=revS(:,I)*-1;
0826     
0827     %Find all reactions that could make any of the unconnected metabolites
0828     %in some other compartment
0829     newMets=setdiff(allIndexes,unconnected);
0830     [crap potential]=find(S(newMets,:)>0 | revS(newMets,:)>0);
0831     %[crap potential]=find(sum(S(newMets,:)>0,1) | sum(revS(newMets,:)>0,1));
0832     potential(potential<=nER | potential>nER+nRxns*nComps)=[]; %No exchange rxns or transport rxns
0833     
0834     %Map J to the real metabolic reactions in model
0835     rxnComps=ceil((potential-nER)/(nRxns));
0836     
0837     %Find the corresponding reaction indexes if they all were in the
0838     %default compartment
0839     dcRxnIndexes=potential-(rxnComps-1)*nRxns;
0840     
0841     %Get the genes for those reactions
0842     genes=find(sum(model.rxnGeneMat(dcRxnIndexes,:)>0,1));
0843     
0844     %For some cases there can be very many reactions to connect something.
0845     %This is in particular true in the beginning of the optimization if,
0846     %say, ATP is unconnected. Therefore limit the number of genes to be
0847     %checked to 25. Weigh so that genes with bad scores in their current
0848     %compartment are more likely to be moved
0849     
0850     %Get scores for these genes
0851     [crap J]=find(g2c(genes,:));
0852 
0853     %Add a small weight so that genes in their best compartment could be
0854     %moved as well
0855     geneScores=GSS.scores(sub2ind(size(g2c),genes(:),J));
0856     modGeneScores=1.1-geneScores;
0857     if numel(genes)>25
0858         rGenes=genes(randsample(numel(genes),min(numel(genes),25),true,modGeneScores));
0859         
0860         %The sampling with weights could give duplicates
0861         rGenes=unique(rGenes);
0862         
0863         %Reorder the geneScores to match
0864         [crap I]=ismember(rGenes,genes);
0865         geneScores=geneScores(I);
0866         genes=rGenes;
0867     end
0868     for i=1:numel(genes)
0869         %Since we are moving one gene at a time, only metabolites involved
0870         %in any of the reactions for that gene can become unconnected. We
0871         %get them so speed up the algorithm.
0872         %First get all involved reactions in the default compartment
0873         rxns=find(model.rxnGeneMat(:,genes(i)));
0874         
0875         %Then get their mets
0876         mets=find(sum(model.S(:,rxns)~=0,2)>0);
0877 
0878         %Then get their indexes in all compartments
0879         allIndexes=mets;
0880         for j=1:nComps-1
0881            allIndexes=[allIndexes;mets+nMets*j];
0882         end
0883         
0884         %Check which of the unconnected metabolites that these
0885         %reactions correspond to. This could have been done earlier,
0886         %but it's fast. I skip the reversibility check because it's
0887         %unlikely to be an issue here. Worst case is that the gene is
0888         %tested once to much
0889         [I crap]=find(model.S(:,rxns));
0890         moveToComps=unique(comps(ismember(dcIndexes,I)));
0891         
0892         %Try to move the gene to each of the compartments
0893         bestMove=-inf;
0894         bestComp=[];
0895         for j=1:numel(moveToComps)
0896             newS=moveGene(S,model,g2c,genes(i),moveToComps(j),nRxns,nMets);
0897 
0898             %Check how many metabolites that are unconnected after moving
0899             %the gene
0900             dConnected=numel(unconnected)-numel(findUnconnected(newS,nEM,[allIndexes;unconnected]));
0901             if dConnected>bestMove
0902                 bestMove=dConnected;
0903                 bestComp=moveToComps(j);
0904             end
0905         end
0906 
0907         %Add the difference in connectivity and where the genes should
0908         %be moved
0909         moveTo(genes(i))=bestComp;
0910         deltaConnected(genes(i))=bestMove;
0911     end
0912     
0913     %Finish up
0914     geneIndex=genes(:);
0915     moveTo=moveTo(geneIndex);
0916     deltaConnected=deltaConnected(geneIndex);
0917     deltaScore=GSS.scores(sub2ind(size(g2c),geneIndex(:),moveTo))-geneScores;
0918 end
0919 
0920 %Small function to add a transport reactions between two metabolites.
0921 %Transport reactions are written as having a coefficient 2.0 for both
0922 %reactant and product. This is not a "real" reaction, but since all normal
0923 %reaction have coefficient -1/1 or -10/10 it's a compact way of writing it.
0924 function S=addTransport(S,nRxns,nER,nMets,nEM,nComps,metA,metB)
0925     mets=[metA;metB];
0926     %Find the current compartments for the metabolites
0927     comps=ceil((mets-nEM)/((size(S,1)-nEM)/nComps));
0928     
0929     if sum(comps==1)~=1
0930         dispEM('Tried to create a transport reaction from a non-default compartment'); 
0931     end
0932     
0933     %Calculate the reaction index
0934     rIndex=(nER+nRxns*nComps)+mets(comps~=1)-nEM-nMets;
0935     
0936     S(mets,rIndex)=2;
0937 end
0938 
0939 %Scores a network based on the localization of the genes and the number of
0940 %transporter reactions used.
0941 function [score geneScore transportCost]=scoreModel(S,g2c,GSS,transportCost)
0942     [I J]=find(g2c);
0943     geneScore=sum(GSS.scores(sub2ind(size(g2c),I,J)));
0944     [I crap]=find(S==2);
0945     I=unique(I);
0946     transportCost=sum(transportCost(I));
0947     score=geneScore-transportCost;
0948 end

Generated on Mon 06-Jan-2014 14:58:12 by m2html © 2005