parseTaskList
  Parses a task list file.

  inputFile       a task list in Excel format. The file must contain a
                  sheet named TASKS, which in turn may contain the
                  following column headers (note, all rows starting with
                  a non-empty cell are removed. The first row after that
                  is considered the headers):
                  ID
                      the only required header. Each task must have a
                      unique id (string or numeric). Tasks can span multiple
                      rows, only the first row in each task should have
                      an id
                  DESCRIPTION
                      description of the task
                  IN
                      allowed input(s) for the task. Metabolite names
                      should be on the form
                      "model.metName[model.comps]". Several inputs
                      can be delimited by ";". If so, then the same
                      bounds are used for all inputs. If that is not
                      wanted, then use several rows for the task
                  IN LB
                      lower bound for the uptake of the metabolites in
                      the row (opt, default 0 which corresponds to a
                      minimal uptake of 0 units)
                  IN UB
                      upper bound for the uptake of the metabolites in
                      the row (opt, default 1000 which corresponds to a
                      maximal uptake of 1000 units)
                  OUT
                      allowed output(s) for the task (see IN)
                  OUT LB
                      lower bound for the production of the metabolites in
                      the row (opt, default 0 which corresponds to a
                      minimal production of 0 units)
                  OUT UB
                      upper bound for the production of the metabolites in
                      the row (opt, default 1000 which corresponds to a
                      maximal production of 1000 units)
                  EQU
                      equation to add. The equation should be on the form
                      "0.4 A + 2 B <=> (or =>) C" and the metabolites
                      should be on the form
                      "model.metName[model.comps]" (opt)
                  EQU LB
                      lower bound for the equation (opt, default -1000
                      for reversible and 0 for irreversible)
                  EQU UB
                      upper bound for the equation (opt, default 1000)
                  CHANGED RXN
                      reaction ID for which to change the bounds for.
                      Several IDs can be delimited by ";". If so,
                      then the same bounds are used for all reactions. If
                      that is not wanted, then use several rows for the task
                  CHANGED LB
                      lower bound for the reaction
                  CHANGED UB
                      upper bound for the reaction
                  SHOULD FAIL
                      true if the correct behavior of the model is to
                      not have a feasible solution given the constraints
                      (opt, default false)
                  PRINT FLUX
                      true if the function should print the corresponding
                      flux distribution for a task. Can be useful for
                      testing (opt, default false)

  taskStruct      array of structures with the following fields
      id          the id of the task
      description the description of the task
      shouldFail  true if the task should fail
      printFluxes true if the fluxes should be printed
      comments    string with comments
      inputs      cell array with input metabolites (in the form metName[comps])
      LBin        array with lower bounds on inputs (default, 0)
      UBin        array with upper bounds on inputs (default, 1000)
      outputs     cell array with output metabolites (in the form metName[comps])
      LBout       array with lower bounds on outputs (default, 0)
      UBout       array with upper bounds on outputs (default, 1000)
      equations   cell array with equations (with mets in the form metName[comps])
      LBequ       array with lower bounds on equations (default, -1000 for
                  reversible and 0 for irreversible)
      UBequ       array with upper bounds on equations (default, 1000)
      changed     cell array with reactions to change bounds for
      LBrxn       array with lower bounds on changed reactions
      UBrxn       array with upper bounds on changed reactions

  This function is used for defining a set of tasks for a model to
  perform. The tasks are defined by defining constraints on the model,
  and if the problem is feasible, then the task is considered successful.
  In general, each row can contain one constraint on uptakes, one
  constraint on outputs, one new equation, and one change of reaction
  bounds. If more bounds are needed to define the task, then several rows
  can be used for each task. To perform the task use checkTasks or
  fitTasks.
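
As a rough illustration of the row semantics described above, the sketch below expands one ";"-delimited IN cell so that every metabolite gets the row's bounds, and picks default bounds for an equation depending on whether it contains "<=>". The cell contents (metabolite names, bounds, equation) are invented for the example; only the splitting and the default rules follow the description.

```matlab
% Hypothetical contents of one row in the TASKS sheet
inCell='glucose[c];O2[c]';                    % IN
inLB=0;                                       % IN LB (the default)
inUB=10;                                      % IN UB
equCell='ATP[c] + H2O[c] => ADP[c] + Pi[c]';  % EQU, no bounds given

% Split on ";" and repeat the row's bounds for each metabolite
inputs=regexp(inCell,';','split');
inputs=inputs(:);
LBin=ones(numel(inputs),1)*inLB;
UBin=ones(numel(inputs),1)*inUB;

% Default equation bounds: -1000 if reversible ("<=>"), otherwise 0
if ~isempty(strfind(equCell,'<=>'))
    LBequ=-1000;
else
    LBequ=0;
end
UBequ=1000;
```

If the two metabolites needed different bounds, they would go on two separate rows of the same task instead of sharing one cell.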

  NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
  can be used as inputs or outputs in a similar manner to normal
  metabolites. This is a convenient way to, for example, allow excretion of
  all metabolites to check whether it's the synthesis of some metabolite
  that is limiting or whether it's the degradation of some byproduct. One
  important difference is that only the upper bounds are used for these general
  metabolites. That is, you can only say that uptake or excretion is
  allowed, not that it is required. This is to avoid conflicts where the
  constraints for the general metabolites overwrite those of the real
  ones.

  Usage: taskStruct=parseTaskList(inputFile)

  Rasmus Agren, 2013-02-06

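
To show how the returned structure array might be consumed, here is a small sketch that summarises each task. To keep it self-contained it uses a hand-built stand-in with only a subset of the documented fields; the task contents and the file name in the comment are hypothetical.

```matlab
% In practice the array would come from the parser, for example:
%   taskStruct=parseTaskList('myTasks.xlsx'); % hypothetical file name
% Hand-built stand-in with a subset of the documented fields
taskStruct=struct('id',{'T1';'T2'}, ...
    'description',{'Aerobic growth on glucose';'Growth without carbon source'}, ...
    'shouldFail',{false;true}, ...
    'inputs',{{'glucose[c]';'O2[c]'};{'O2[c]'}});

% Build one summary line per task
summary=cell(numel(taskStruct),1);
for i=1:numel(taskStruct)
    summary{i}=sprintf('%s (%s): %d input(s), shouldFail=%d', ...
        taskStruct(i).id,taskStruct(i).description, ...
        numel(taskStruct(i).inputs),taskStruct(i).shouldFail);
end
disp(char(summary));
```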
function taskStruct=parseTaskList(inputFile)
% parseTaskList
%   Parses a task list file.
%
%   inputFile       a task list in Excel format. The file must contain a
%                   sheet named TASKS, which in turn may contain the
%                   following column headers (note, all rows starting with
%                   a non-empty cell are removed. The first row after that
%                   is considered the headers):
%                   ID
%                       the only required header. Each task must have a
%                       unique id (string or numeric). Tasks can span multiple
%                       rows, only the first row in each task should have
%                       an id
%                   DESCRIPTION
%                       description of the task
%                   IN
%                       allowed input(s) for the task. Metabolite names
%                       should be on the form
%                       "model.metName[model.comps]". Several inputs
%                       can be delimited by ";". If so, then the same
%                       bounds are used for all inputs. If that is not
%                       wanted, then use several rows for the task
%                   IN LB
%                       lower bound for the uptake of the metabolites in
%                       the row (opt, default 0 which corresponds to a
%                       minimal uptake of 0 units)
%                   IN UB
%                       upper bound for the uptake of the metabolites in
%                       the row (opt, default 1000 which corresponds to a
%                       maximal uptake of 1000 units)
%                   OUT
%                       allowed output(s) for the task (see IN)
%                   OUT LB
%                       lower bound for the production of the metabolites in
%                       the row (opt, default 0 which corresponds to a
%                       minimal production of 0 units)
%                   OUT UB
%                       upper bound for the production of the metabolites in
%                       the row (opt, default 1000 which corresponds to a
%                       maximal production of 1000 units)
%                   EQU
%                       equation to add. The equation should be on the form
%                       "0.4 A + 2 B <=> (or =>) C" and the metabolites
%                       should be on the form
%                       "model.metName[model.comps]" (opt)
%                   EQU LB
%                       lower bound for the equation (opt, default -1000
%                       for reversible and 0 for irreversible)
%                   EQU UB
%                       upper bound for the equation (opt, default 1000)
%                   CHANGED RXN
%                       reaction ID for which to change the bounds for.
%                       Several IDs can be delimited by ";". If so,
%                       then the same bounds are used for all reactions. If
%                       that is not wanted, then use several rows for the task
%                   CHANGED LB
%                       lower bound for the reaction
%                   CHANGED UB
%                       upper bound for the reaction
%                   SHOULD FAIL
%                       true if the correct behavior of the model is to
%                       not have a feasible solution given the constraints
%                       (opt, default false)
%                   PRINT FLUX
%                       true if the function should print the corresponding
%                       flux distribution for a task. Can be useful for
%                       testing (opt, default false)
%
%   taskStruct      array of structures with the following fields
%       id          the id of the task
%       description the description of the task
%       shouldFail  true if the task should fail
%       printFluxes true if the fluxes should be printed
%       comments    string with comments
%       inputs      cell array with input metabolites (in the form metName[comps])
%       LBin        array with lower bounds on inputs (default, 0)
%       UBin        array with upper bounds on inputs (default, 1000)
%       outputs     cell array with output metabolites (in the form metName[comps])
%       LBout       array with lower bounds on outputs (default, 0)
%       UBout       array with upper bounds on outputs (default, 1000)
%       equations   cell array with equations (with mets in the form metName[comps])
%       LBequ       array with lower bounds on equations (default, -1000 for
%                   reversible and 0 for irreversible)
%       UBequ       array with upper bounds on equations (default, 1000)
%       changed     cell array with reactions to change bounds for
%       LBrxn       array with lower bounds on changed reactions
%       UBrxn       array with upper bounds on changed reactions
%
%   This function is used for defining a set of tasks for a model to
%   perform. The tasks are defined by defining constraints on the model,
%   and if the problem is feasible, then the task is considered successful.
%   In general, each row can contain one constraint on uptakes, one
%   constraint on outputs, one new equation, and one change of reaction
%   bounds. If more bounds are needed to define the task, then several rows
%   can be used for each task. To perform the task use checkTasks or
%   fitTasks.
%
%   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
%   can be used as inputs or outputs in a similar manner to normal
%   metabolites. This is a convenient way to, for example, allow excretion of
%   all metabolites to check whether it's the synthesis of some metabolite
%   that is limiting or whether it's the degradation of some byproduct. One
%   important difference is that only the upper bounds are used for these general
%   metabolites. That is, you can only say that uptake or excretion is
%   allowed, not that it is required. This is to avoid conflicts where the
%   constraints for the general metabolites overwrite those of the real
%   ones.
%
%   Usage: taskStruct=parseTaskList(inputFile)
%
%   Rasmus Agren, 2013-02-06
%

%Load the tasks file. Only the raw cell array output is used
[~,~,raw]=xlsread(inputFile,'TASKS');

%Remove all lines starting with "#" (or actually any character)
raw=cleanImported(raw);

%Captions
columns={'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB';'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB';'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};

%Match the columns, but ignore the first one (since it can be NaN)
[I, colI]=ismember(columns,raw(1,2:end));
colI=colI+1;

%Check that the ID field is present
if I(1)==0
    throw(MException('','The TASKS sheet must have a column named ID'));
end

%Prepare the input file a little. Put NaN for missing strings and default
%bounds where needed. Columns 4, 5, 7 and 8 are IN LB, IN UB, OUT LB and
%OUT UB
for i=1:numel(colI)
    I=cellfun(@isBad,raw(:,colI(i)));
    if ~ismember(i,[4 5 7 8])
        raw(I,colI(i))={NaN};
    else
        if i==5 || i==8
            raw(I,colI(i))={1000};
        else
            raw(I,colI(i))={0};
        end
    end
end

%Create an empty task structure
eTask.id='';
eTask.description='';
eTask.shouldFail=false;
eTask.printFluxes=false;
eTask.comments='';
eTask.inputs={};
eTask.LBin=[];
eTask.UBin=[];
eTask.outputs={};
eTask.LBout=[];
eTask.UBout=[];
eTask.equations={};
eTask.LBequ=[];
eTask.UBequ=[];
eTask.changed={};
eTask.LBrxn=[];
eTask.UBrxn=[];

%Main loop. Start by reading the general information for the first task
taskStruct=[];
task=eTask;
if isnumeric(raw{2,colI(1)})
    task.id=num2str(raw{2,colI(1)});
else
    task.id=raw{2,colI(1)};
end
task.description=raw{2,colI(2)};
if ~isnan(raw{2,colI(15)})
    task.shouldFail=true;
end
if ~isnan(raw{2,colI(16)})
    task.printFluxes=true;
end
if ~isnan(raw{2,colI(17)})
    task.comments=raw{2,colI(17)};
end

for i=2:size(raw,1)
    %Set the inputs
    if ischar(raw{i,colI(3)})
        inputs=regexp(raw{i,colI(3)},';','split');
        task.inputs=[task.inputs;inputs(:)];
        task.LBin=[task.LBin;ones(numel(inputs),1)*raw{i,colI(4)}];
        task.UBin=[task.UBin;ones(numel(inputs),1)*raw{i,colI(5)}];
    end
    %Set the outputs
    if ischar(raw{i,colI(6)})
        outputs=regexp(raw{i,colI(6)},';','split');
        task.outputs=[task.outputs;outputs(:)];
        task.LBout=[task.LBout;ones(numel(outputs),1)*raw{i,colI(7)}];
        task.UBout=[task.UBout;ones(numel(outputs),1)*raw{i,colI(8)}];
    end
    %Add new rxns
    if ischar(raw{i,colI(9)})
        task.equations=[task.equations;raw{i,colI(9)}];
        if ~isnan(raw{i,colI(10)})
            task.LBequ=[task.LBequ;raw{i,colI(10)}];
        else
            %Reversible equations default to -1000, irreversible to 0
            if any(strfind(raw{i,colI(9)},'<=>'))
                task.LBequ=[task.LBequ;-1000];
            else
                task.LBequ=[task.LBequ;0];
            end
        end
        if ~isnan(raw{i,colI(11)})
            task.UBequ=[task.UBequ;raw{i,colI(11)}];
        else
            task.UBequ=[task.UBequ;1000];
        end
    end
    %Add changed bounds
    if ischar(raw{i,colI(12)})
        changed=regexp(raw{i,colI(12)},';','split');
        task.changed=[task.changed;changed(:)];
        task.LBrxn=[task.LBrxn;ones(numel(changed),1)*raw{i,colI(13)}];
        task.UBrxn=[task.UBrxn;ones(numel(changed),1)*raw{i,colI(14)}];
    end

    %Check if it should add more constraints. A row without an ID belongs
    %to the current task
    if i<size(raw,1)
        if isnan(raw{i+1,colI(1)})
            continue;
        end
    end

    %The task is complete. Store it and start on the next one, if any
    taskStruct=[taskStruct;task];
    task=eTask;
    if i<size(raw,1)
        if isnumeric(raw{i+1,colI(1)})
            task.id=num2str(raw{i+1,colI(1)});
        else
            task.id=raw{i+1,colI(1)};
        end
        task.description=raw{i+1,colI(2)};
        if ~isnan(raw{i+1,colI(15)})
            task.shouldFail=true;
        end
        if ~isnan(raw{i+1,colI(16)})
            task.printFluxes=true;
        end
        if ~isnan(raw{i+1,colI(17)})
            task.comments=raw{i+1,colI(17)};
        end
    end
end

%Should add more checks, such as unique IDs and missing headers

end

%Checks if a cell is empty, NaN, or a string with only white spaces
function I=isBad(x)
    I=false;
    if ischar(x)
        if numel(x)==0 || all(isstrprop(x, 'wspace'))
            I=true;
        end
    else
        if isnan(x)
            I=true;
        end
    end
    if isempty(x)
        I=true;
    end
end

%Cleans up the structure that is imported from using xlsread
function raw=cleanImported(raw)
    %Find the lines that are not commented
    keepers=strcmp('',raw(:,1)) | cellfun(@wrapperNAN,raw(:,1));
    raw=raw(keepers~=0,:);

    %Remove columns that aren't strings. If you cut and paste a lot in the sheet
    %there tends to be columns that are NaN
    I=cellfun(@ischar,raw(1,:));
    I(1)=true; %This is because the "#" column might be empty
    raw=raw(:,I);

    %Check if there are any rows that are all NaN. This could happen if
    %xlsread reads too far. Remove any such rows.
    nans=cellfun(@wrapperNAN,raw);
    I=all(nans,2);
    raw(I,:)=[];

    %Also check if there are any lines that contain only NaNs or white
    %spaces. This could happen if you accidentally inserted a space
    %somewhere
    whites=cellfun(@wrapperWS,raw);
    I=all(whites,2);
    raw(I,:)=[];

    %Checks if something is NaN. Can't use isnan with cellfun as it does it
    %character by character for strings
    function I=wrapperNAN(A)
        I=any(isnan(A));
    end

    %Checks if something is all white spaces or NaN
    function I=wrapperWS(A)
        if isnan(A)
            I=true;
        else
            %isstrprop gives an error if boolean
            if islogical(A)
                I=false;
            else
                I=all(isstrprop(A,'wspace'));
            end
        end
    end
end
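
The closing comment in the main function notes that checks such as unique IDs are still missing. One possible way to detect duplicate IDs on an already parsed structure is sketched below. It is not part of the original function; the stand-in data is hypothetical, and it relies on the ids being stored as strings, which the parser ensures via num2str.

```matlab
% Stand-in for the output of parseTaskList; only the id field matters here
taskStruct=struct('id',{'T1';'T2';'T1'});

% Count how many times each id occurs
ids={taskStruct.id};
[uniqueIds,~,idx]=unique(ids);
counts=accumarray(idx(:),1);

% Report the ids that occur more than once
duplicates=uniqueIds(counts>1);
if ~isempty(duplicates)
    warning('Duplicate task IDs: %s',strjoin(duplicates,', '));
end
```

Whether duplicates should raise a warning or an error is a design choice; a warning is used here so that the remaining tasks can still be inspected.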