parseTaskList

PURPOSE

Parses a task list file (Excel) into an array of task structures.

SYNOPSIS

function taskStruct=parseTaskList(inputFile)

DESCRIPTION

 parseTaskList
   Parses a task list file.

   inputFile       a task list in Excel format. The file must contain a
                   sheet named TASKS, which may contain the following
                   column headers. Note that every row whose first cell is
                   non-empty (for example a row starting with "#") is
                   treated as a comment and removed; the first remaining
                   row is used as the header row:
                   ID
                       the only required header. Each task must have a
                       unique ID (string or numeric). A task can span
                       several rows; only the first row of each task should
                       have an ID
                   DESCRIPTION
                       description of the task
                   IN
                       allowed input(s) for the task. Metabolite names
                       should be in the form
                       "model.metName[model.comps]". Several inputs
                       can be delimited by ";", in which case the same
                       bounds are used for all of them. If that is not
                       wanted, use several rows for the task
                   IN LB
                       lower bound for the uptake of the metabolites in
                       the row (opt, default 0 which corresponds to a
                       minimal uptake of 0 units)
                   IN UB
                       upper bound for the uptake of the metabolites in
                       the row (opt, default 1000 which corresponds to a
                       maximal uptake of 1000 units)
                   OUT
                       allowed output(s) for the task (see IN)
                   OUT LB
                       lower bound for the production of the metabolites in
                       the row (opt, default 0 which corresponds to a
                       minimal production of 0 units)
                   OUT UB
                       upper bound for the production of the metabolites in
                       the row (opt, default 1000 which corresponds to a
                       maximal production of 1000 units)
                   EQU
                       equation to add. The equation should be in the form
                       "0.4 A + 2 B <=> (or =>) C" and the metabolites
                       should be in the form
                       "model.metName[model.comps]" (opt)
                   EQU LB
                       lower bound for the equation (opt, default -1000
                       for reversible and 0 for irreversible)
                   EQU UB
                       upper bound for the equation (opt, default 1000)
                   CHANGED RXN
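                       ID(s) of the reaction(s) whose bounds should be
                       changed. Several IDs can be delimited by ";", in
                       which case the same bounds are used for all of them.
                       If that is not wanted, use several rows for the task
                   CHANGED LB
                       lower bound for the reaction(s) (no default; an
                       empty cell gives NaN)
                   CHANGED UB
                       upper bound for the reaction(s) (no default; an
                       empty cell gives NaN)
                   SHOULD FAIL
                       true if the correct behavior of the model is to
                       not have a feasible solution given the constraints
                       (opt, default false). Any non-empty cell, including
                       0 or "false", is interpreted as true. Only read from
                       the first row of each task
                   PRINT FLUX
                       true if the corresponding flux distribution should
                       be printed when the task is evaluated. Can be useful
                       for testing (opt, default false). Interpreted in
                       the same way as SHOULD FAIL
                   COMMENTS
                       free-text comments for the task (opt). Only read
                       from the first row of each task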
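The splitting and default-bound rules for IN and EQU can be illustrated with a short MATLAB sketch that mirrors what the main loop does with a single row. The cell values and variable names (inCell, equCell and so on) are hypothetical and chosen only for illustration; the same ";" splitting applies to OUT and CHANGED RXN.

```matlab
% Hypothetical contents of one row of the TASKS sheet (illustrative values)
inCell  = 'glucose[s];O2[s]';   % IN: two metabolites delimited by ";"
inLB    = 0;                    % IN LB
inUB    = 10;                   % IN UB
equCell = 'ATP[c] + H2O[c] => ADP[c] + Pi[c] + H+[c]'; % EQU, irreversible

% Split the ";"-delimited cell; every metabolite gets the same bounds
inputs = regexp(inCell, ';', 'split');
inputs = inputs(:);                      % column cell array
LBin   = ones(numel(inputs), 1) * inLB;
UBin   = ones(numel(inputs), 1) * inUB;

% Default EQU LB: -1000 if the equation is reversible ("<=>"), otherwise 0
if ~isempty(strfind(equCell, '<=>'))
    LBequ = -1000;
else
    LBequ = 0;
end
UBequ = 1000;                            % default EQU UB
```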

   taskStruct      array of structures with the following fields
       id          the id of the task
       description the description of the task
       shouldFail  true if the task should fail
       printFluxes true if the fluxes should be printed
       comments    string with comments
       inputs      cell array with input metabolites (in the form metName[comps]) 
       LBin        array with lower bounds on inputs (default, 0)
       UBin        array with upper bounds on inputs (default, 1000)
       outputs     cell array with output metabolites (in the form metName[comps])
       LBout       array with lower bounds on outputs (default, 0)
       UBout       array with upper bounds on outputs (default, 1000)
       equations   cell array with equations (with mets in the form metName[comps])
       LBequ       array with lower bounds on equations (default, -1000 for
                   reversible and 0 for irreversible)
       UBequ       array with upper bounds on equations (default, 1000)
       changed     cell array with reactions to change bounds for
       LBrxn       array with lower bounds on changed reactions
       UBrxn       array with upper bounds on changed reactions
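For illustration, consider a hypothetical task whose first row has ID 1, DESCRIPTION "Aerobic growth", IN "glucose[s]" and IN UB 1, and whose second row (without an ID) has OUT "CO2[s];H2O[s]" with default bounds, all other cells being empty. Assuming the rules above, the corresponding element of taskStruct should look like the structure built by hand below. A numeric ID is converted to a string with num2str, and the cell-valued fields hold one entry per metabolite, equation or reaction.

```matlab
% Hand-built element of taskStruct for the hypothetical two-row task above.
% Double braces are needed so that struct() stores a cell array in the field
task = struct('id', '1', 'description', 'Aerobic growth', ...
              'shouldFail', false, 'printFluxes', false, 'comments', '', ...
              'inputs', {{'glucose[s]'}}, 'LBin', 0, 'UBin', 1, ...
              'outputs', {{'CO2[s]'; 'H2O[s]'}}, ...
              'LBout', [0; 0], 'UBout', [1000; 1000], ...
              'equations', {{}}, 'LBequ', [], 'UBequ', [], ...
              'changed', {{}}, 'LBrxn', [], 'UBrxn', []);
```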

   This function is used to define a set of tasks for a model to perform.
   Each task is defined as a set of constraints on the model; if the
   resulting problem is feasible (or infeasible, for tasks marked SHOULD
   FAIL), the task is considered successful. In general, each row can
   contain one constraint on uptakes, one constraint on outputs, one new
   equation, and one change of reaction bounds. If more constraints are
   needed to define a task, several rows can be used for it. To evaluate
   the tasks, use checkTasks or fitTasks.
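The rule that a row with an ID starts a new task, while rows without one add constraints to the current task, can be sketched in vectorized form. This is not how parseTaskList is written (it loops over the rows and looks one row ahead); the sample ID column is hypothetical, and, like parseTaskList, the sketch assumes that the first data row has an ID.

```matlab
% Hypothetical ID column of a cleaned TASKS sheet (header row excluded);
% NaN marks a row that continues the task started above it
ids = {1; NaN; NaN; 'A2'; 3};

% A row with an ID starts a new task; rows without one extend the current task
hasId       = ~cellfun(@(x) isnumeric(x) && isscalar(x) && isnan(x), ids);
taskOf      = cumsum(hasId);          % task index for every row
nTasks      = taskOf(end);            % number of tasks
rowsPerTask = accumarray(taskOf, 1);  % number of rows used by each task
```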

   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]" can be
   used as inputs or outputs in the same way as normal metabolites. This
   is a convenient way to, for example, allow excretion of all metabolites
   in order to check whether it is the synthesis of some metabolite or the
   degradation of some byproduct that is limiting. One important
   difference is that only the upper bounds are used for these general
   metabolites. That is, you can only state that uptake or excretion is
   allowed, not that it is required. This avoids conflicts where the
   constraints for the general metabolites would overwrite those of the
   real ones.
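parseTaskList itself stores the bounds of general metabolites exactly as given; the rule above is presumably applied by the functions that evaluate the tasks. The sketch below shows one way such a rule could be applied to the inputs of a task. Both the sample data and the choice to relax the lower bound to 0 are assumptions, not RAVEN's documented implementation.

```matlab
% Hypothetical inputs of one task, two of which are general metabolites
inputs = {'glucose[s]'; 'ALLMETSIN[s]'; 'ALLMETS'};
LBin   = [1; 5; 5];
UBin   = [10; 1000; 1000];

% Flag "ALLMETS" and any "ALLMETSIN[comps]" entry
isGeneral = strcmp(inputs, 'ALLMETS') | strncmp(inputs, 'ALLMETSIN[', 10);

% Assumed handling: for general metabolites only the upper bound counts, so
% the lower bound is relaxed to 0 (uptake is allowed, never required)
LBin(isGeneral) = 0;
```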

   Usage: taskStruct=parseTaskList(inputFile)

   Rasmus Agren, 2013-02-06

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

SOURCE CODE

function taskStruct=parseTaskList(inputFile)
% parseTaskList
%   Parses a task list file.
%
%   inputFile       a task list in Excel format. The file must contain a
%                   sheet named TASKS, which may contain the following
%                   column headers. Note that every row whose first cell is
%                   non-empty (for example a row starting with "#") is
%                   treated as a comment and removed; the first remaining
%                   row is used as the header row:
%                   ID
%                       the only required header. Each task must have a
%                       unique ID (string or numeric). A task can span
%                       several rows; only the first row of each task should
%                       have an ID
%                   DESCRIPTION
%                       description of the task
%                   IN
%                       allowed input(s) for the task. Metabolite names
%                       should be in the form
%                       "model.metName[model.comps]". Several inputs
%                       can be delimited by ";", in which case the same
%                       bounds are used for all of them. If that is not
%                       wanted, use several rows for the task
%                   IN LB
%                       lower bound for the uptake of the metabolites in
%                       the row (opt, default 0 which corresponds to a
%                       minimal uptake of 0 units)
%                   IN UB
%                       upper bound for the uptake of the metabolites in
%                       the row (opt, default 1000 which corresponds to a
%                       maximal uptake of 1000 units)
%                   OUT
%                       allowed output(s) for the task (see IN)
%                   OUT LB
%                       lower bound for the production of the metabolites in
%                       the row (opt, default 0 which corresponds to a
%                       minimal production of 0 units)
%                   OUT UB
%                       upper bound for the production of the metabolites in
%                       the row (opt, default 1000 which corresponds to a
%                       maximal production of 1000 units)
%                   EQU
%                       equation to add. The equation should be in the form
%                       "0.4 A + 2 B <=> (or =>) C" and the metabolites
%                       should be in the form
%                       "model.metName[model.comps]" (opt)
%                   EQU LB
%                       lower bound for the equation (opt, default -1000
%                       for reversible and 0 for irreversible)
%                   EQU UB
%                       upper bound for the equation (opt, default 1000)
%                   CHANGED RXN
%                       ID(s) of the reaction(s) whose bounds should be
%                       changed. Several IDs can be delimited by ";", in
%                       which case the same bounds are used for all of them.
%                       If that is not wanted, use several rows for the task
%                   CHANGED LB
%                       lower bound for the reaction(s) (no default; an
%                       empty cell gives NaN)
%                   CHANGED UB
%                       upper bound for the reaction(s) (no default; an
%                       empty cell gives NaN)
%                   SHOULD FAIL
%                       true if the correct behavior of the model is to
%                       not have a feasible solution given the constraints
%                       (opt, default false). Any non-empty cell, including
%                       0 or "false", is interpreted as true. Only read from
%                       the first row of each task
%                   PRINT FLUX
%                       true if the corresponding flux distribution should
%                       be printed when the task is evaluated. Can be useful
%                       for testing (opt, default false). Interpreted in
%                       the same way as SHOULD FAIL
%                   COMMENTS
%                       free-text comments for the task (opt). Only read
%                       from the first row of each task
%
%   taskStruct      array of structures with the following fields
%       id          the id of the task
%       description the description of the task
%       shouldFail  true if the task should fail
%       printFluxes true if the fluxes should be printed
%       comments    string with comments
%       inputs      cell array with input metabolites (in the form metName[comps])
%       LBin        array with lower bounds on inputs (default, 0)
%       UBin        array with upper bounds on inputs (default, 1000)
%       outputs     cell array with output metabolites (in the form metName[comps])
%       LBout       array with lower bounds on outputs (default, 0)
%       UBout       array with upper bounds on outputs (default, 1000)
%       equations   cell array with equations (with mets in the form metName[comps])
%       LBequ       array with lower bounds on equations (default, -1000 for
%                   reversible and 0 for irreversible)
%       UBequ       array with upper bounds on equations (default, 1000)
%       changed     cell array with reactions to change bounds for
%       LBrxn       array with lower bounds on changed reactions
%       UBrxn       array with upper bounds on changed reactions
%
%   This function is used to define a set of tasks for a model to perform.
%   Each task is defined as a set of constraints on the model; if the
%   resulting problem is feasible (or infeasible, for tasks marked SHOULD
%   FAIL), the task is considered successful. In general, each row can
%   contain one constraint on uptakes, one constraint on outputs, one new
%   equation, and one change of reaction bounds. If more constraints are
%   needed to define a task, several rows can be used for it. To evaluate
%   the tasks, use checkTasks or fitTasks.
%
%   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]" can be
%   used as inputs or outputs in the same way as normal metabolites. This
%   is a convenient way to, for example, allow excretion of all metabolites
%   in order to check whether it is the synthesis of some metabolite or the
%   degradation of some byproduct that is limiting. One important
%   difference is that only the upper bounds are used for these general
%   metabolites. That is, you can only state that uptake or excretion is
%   allowed, not that it is required. This avoids conflicts where the
%   constraints for the general metabolites would overwrite those of the
%   real ones.
%
%   Usage: taskStruct=parseTaskList(inputFile)
%
%   Rasmus Agren, 2013-02-06
%

%Load the tasks file
[~,~,raw]=xlsread(inputFile,'TASKS');

%Remove all rows whose first cell is non-empty (e.g. rows starting with "#")
raw=cleanImported(raw);

%Column headers, in the order used for indexing colI below
columns={'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB';'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB';'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};

%Match the columns, but ignore the first one (since it can be NaN)
[I,colI]=ismember(columns,raw(1,2:end));
colI=colI+1;

%Check that the ID field is present
if I(1)==0
    throw(MException('','The TASKS sheet must have a column named ID'));
end

%Missing optional columns would otherwise point to the first (comment)
%column. Point each of them to an appended empty column instead, so that
%it is filled with default values below
for i=find(~I)'
    raw(:,end+1)={NaN};
    colI(i)=size(raw,2);
end

%Prepare the input file a little. Put NaN for missing values, and default
%bounds for IN LB/OUT LB (0) and IN UB/OUT UB (1000) where needed
for i=1:numel(colI)
    I=cellfun(@isBad,raw(:,colI(i)));
    if ~ismember(i,[4 5 7 8])
        raw(I,colI(i))={NaN};
    else
        if i==5 || i==8
            raw(I,colI(i))={1000};
        else
            raw(I,colI(i))={0};
        end
    end
end

%Create an empty task structure
eTask.id='';
eTask.description='';
eTask.shouldFail=false;
eTask.printFluxes=false;
eTask.comments='';
eTask.inputs={};
eTask.LBin=[];
eTask.UBin=[];
eTask.outputs={};
eTask.LBout=[];
eTask.UBout=[];
eTask.equations={};
eTask.LBequ=[];
eTask.UBequ=[];
eTask.changed={};
eTask.LBrxn=[];
eTask.UBrxn=[];

%Main loop
taskStruct=[];
task=eTask;
if isnumeric(raw{2,colI(1)})
    task.id=num2str(raw{2,colI(1)});
else
    task.id=raw{2,colI(1)};
end
task.description=raw{2,colI(2)};
if ~isnan(raw{2,colI(15)})
    task.shouldFail=true;
end
if ~isnan(raw{2,colI(16)})
    task.printFluxes=true;
end
if ~isnan(raw{2,colI(17)})
    task.comments=raw{2,colI(17)};
end

for i=2:size(raw,1)
    %Set the inputs
    if ischar(raw{i,colI(3)})
        inputs=regexp(raw{i,colI(3)},';','split');
        task.inputs=[task.inputs;inputs(:)];
        task.LBin=[task.LBin;ones(numel(inputs),1)*raw{i,colI(4)}];
        task.UBin=[task.UBin;ones(numel(inputs),1)*raw{i,colI(5)}];
    end
    %Set the outputs
    if ischar(raw{i,colI(6)})
        outputs=regexp(raw{i,colI(6)},';','split');
        task.outputs=[task.outputs;outputs(:)];
        task.LBout=[task.LBout;ones(numel(outputs),1)*raw{i,colI(7)}];
        task.UBout=[task.UBout;ones(numel(outputs),1)*raw{i,colI(8)}];
    end
    %Add new rxns
    if ischar(raw{i,colI(9)})
        task.equations=[task.equations;raw{i,colI(9)}];
        if ~isnan(raw{i,colI(10)})
            task.LBequ=[task.LBequ;raw{i,colI(10)}];
        else
            if any(strfind(raw{i,colI(9)},'<=>'))
                task.LBequ=[task.LBequ;-1000];
            else
                task.LBequ=[task.LBequ;0];
            end
        end
        if ~isnan(raw{i,colI(11)})
            task.UBequ=[task.UBequ;raw{i,colI(11)}];
        else
            task.UBequ=[task.UBequ;1000];
        end
    end
    %Add changed bounds
    if ischar(raw{i,colI(12)})
        changed=regexp(raw{i,colI(12)},';','split');
        task.changed=[task.changed;changed(:)];
        task.LBrxn=[task.LBrxn;ones(numel(changed),1)*raw{i,colI(13)}];
        task.UBrxn=[task.UBrxn;ones(numel(changed),1)*raw{i,colI(14)}];
    end

    %Check if it should add more constraints
    if i<size(raw,1)
        if isnan(raw{i+1,colI(1)})
            continue;
        end
    end

    taskStruct=[taskStruct;task];
    task=eTask;
    if i<size(raw,1)
        if isnumeric(raw{i+1,colI(1)})
            task.id=num2str(raw{i+1,colI(1)});
        else
            task.id=raw{i+1,colI(1)};
        end
        task.description=raw{i+1,colI(2)};
        if ~isnan(raw{i+1,colI(15)})
            task.shouldFail=true;
        end
        if ~isnan(raw{i+1,colI(16)})
            task.printFluxes=true;
        end
        if ~isnan(raw{i+1,colI(17)})
            task.comments=raw{i+1,colI(17)};
        end
    end
end

%Should add more checks, such as unique IDs and missing headers

end
%Returns true for empty values, NaN, and strings that are empty or contain
%only whitespace
function I=isBad(x)
    I=false;
    if ischar(x)
        if numel(x)==0 || all(isstrprop(x, 'wspace'))
            I=true;
        end
    else
        if isnan(x)
            I=true;
        end
    end
    if isempty(x)
        I=true;
    end
end

%Cleans up the cell array imported using xlsread
function raw=cleanImported(raw)
    %Find the lines that are not commented
    keepers=strcmp('',raw(:,1)) | cellfun(@wrapperNAN,raw(:,1));
    raw=raw(keepers~=0,:);

    %Remove columns that aren't strings. If you cut and paste a lot in the sheet
    %there tend to be columns that are NaN
    I=cellfun(@ischar,raw(1,:));
    I(1)=true; %This is because the "#" column might be empty
    raw=raw(:,I);

    %Check if there are any rows that are all NaN. This could happen if
    %xlsread reads too far. Remove any such rows.
    nans=cellfun(@wrapperNAN,raw);
    I=all(nans,2);
    raw(I,:)=[];

    %Also check if there are any lines that contain only NaNs or white
    %spaces. This could happen if you accidentally inserted a space
    %somewhere
    whites=cellfun(@wrapperWS,raw);
    I=all(whites,2);
    raw(I,:)=[];

    %Checks if something is NaN. Can't use isnan with cellfun as it does it
    %character by character for strings
    function I=wrapperNAN(A)
        I=any(isnan(A));
    end

    %Checks if something is all white spaces or NaN
    function I=wrapperWS(A)
        if isnan(A)
            I=true;
        else
            %isstrprop gives an error if boolean
            if islogical(A)
                I=false;
            else
                I=all(isstrprop(A,'wspace'));
            end
        end
    end
end
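The comment-row filter and the blank-row removal performed by cleanImported can be reproduced on a small example. This sketch assumes a hypothetical raw cell array of the kind xlsread returns, replaces the nested helper functions with an anonymous function (isBlank, a name introduced here only for illustration), and omits the step that drops columns whose header is not a string.

```matlab
% Hypothetical cell array as xlsread might return it for the TASKS sheet;
% the first column is the comment column (values are illustrative only)
raw = {'# My task list', NaN,  NaN;
       NaN,              'ID', 'DESCRIPTION';
       NaN,              1,    'Growth on glucose';
       '#',              2,    'Disabled task';
       NaN,              '  ', NaN};

% Keep only rows whose first cell is empty or NaN (all others are comments)
keepers = strcmp('', raw(:,1)) | cellfun(@(x) any(isnan(x)), raw(:,1));
raw = raw(keepers, :);

% Remove rows in which every cell is NaN or contains only whitespace
isBlank = @(x) (isnumeric(x) && isscalar(x) && isnan(x)) || ...
               (ischar(x) && all(isstrprop(x, 'wspace')));
blank = cellfun(isBlank, raw);
raw(all(blank, 2), :) = [];
```

After these two steps only the header row and the single active task row remain.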

Generated on Tue 23-Apr-2013 15:18:37 by m2html © 2005