The experiment engine runs the experiment described in an experiment
XML file. It runs the experiment in a directory which is provided
either via the XML file (the "dir" attribute of the <experiment>
element) or on the command line (the --exp_dir option). The engine
prepares the corpora, builds the models, and performs the experiment
runs in this directory. The experiment XML file is copied into the
experiment directory, unless a file with the same name is already
present.
By default, the engine can continue an experiment which is halted
in the middle. Each corpus, model set and run stores its metadata
in a file called "properties.txt" in its specified directory, and
keeps track of whether it's been completed or not. If the engine
fails in the middle and is restarted, it will not redo work it knows
has been completed. The --force argument overrides this default behavior,
and ought to force a full rerun of the experiment; however, the
interactions among the components are extremely complex, and
--force often fails. If you want to rerun an experiment, the
safest thing to do is use a different experiment directory.
There is one exception to this generalization. If there are
experimental runs present, the engine will always score them, even
if it's scored them before. So an easy way to review the scores
for an experiment is just to run the engine again.
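Following the advice above to rerun in a different experiment directory, here is a minimal shell sketch that picks the first unused run directory under a base directory. The helper name next_run_dir and the run1, run2, ... naming scheme are assumptions made for illustration; they are not part of MAT. The sketch also assumes the experiment XML file does not set the "dir" attribute, since the directory is then taken from the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAT): print the first path of the
# form <base>/run<N> (N = 1, 2, ...) that does not exist yet.
next_run_dir() {
    base=$1
    n=1
    while [ -e "$base/run$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$base/run$n"
}

# Example use (commented out so the sketch has no side effects):
# exp_dir=$(next_run_dir /documents/exp_runs)
# "$MAT_PKG_HOME/bin/MATExperimentEngine" --exp_dir "$exp_dir" \
#     /documents/exp_files/exp.xml
```

The engine creates the directory itself if it doesn't yet exist, so the helper only needs to choose an unused name.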
See here for a set of use cases for the experiment engine; see here for the structure of its output directory.
Unix:
% $MAT_PKG_HOME/bin/MATExperimentEngine
Windows native:
> %MAT_PKG_HOME%\bin\MATExperimentEngine.cmd
Usage: MATExperimentEngine [options] <xml_file>
<xml_file>: An experiment XML file
--language <l>
    A language name or code supported by the experiment task.
    Ignored if the experiment XML file specifies a language.
    Required if no language is specified in the experiment XML
    file, the task supports multiple languages, and some of the
    actions executed in the experiment vary depending on the
    language.
--exp_dir <dir>
    Optionally, the directory the experiment will be run in. This
    directory may also be provided in the experiment XML file (if
    both are provided, the command-line setting is ignored). The
    directory will be created if it doesn't yet exist.
--pattern_dir <dir>
    Optionally, this path is the prefix used for relative directory
    paths in file patterns in the <pattern> element in the corpora
    in the experiment XML file. Otherwise, these patterns must be
    absolute pathnames.
--binding <k>=<v>
    Optionally, add a binding to be used in expanding settings in
    the experiment file. These values override values in the
    experiment file itself.
--csv_formula_output <s>
    A comma-separated list of options for CSV output. The
    possibilities are 'oo' (formulas with OpenOffice separators),
    'excel' (formulas with Excel separators), and 'literal' (no
    formulas). The experiment engine will produce CSV output files
    for each of the values you specify. By default, this value is
    'excel'. Note that the OpenOffice and Excel formula formats are
    incompatible with each other, so you'll only be able to open
    output files with Excel separators in Excel, etc.
--dont_compute_confidence
    By default, the experiment engine computes confidence measures
    when it runs the scorer. This process can be time consuming.
    Disable it with this flag.
--dont_rescore
    By default, the experiment engine rescores complete runs when
    it's restarted. Use this flag to disable this feature. This
    should only be used for debugging purposes, because the scores
    from the completed runs won't be accumulated in this mode.
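Because --csv_formula_output takes a comma-separated list, it can be handy to check the list before starting a long experiment. The following is a minimal sketch, assuming only the three values documented above; the function name valid_csv_formula_output is hypothetical and not part of MAT.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAT): succeed only if every item in
# a comma-separated list is one of the documented values: oo, excel,
# literal. Empty lists and empty items are rejected.
valid_csv_formula_output() {
    list=$1
    [ -n "$list" ] || return 1
    while :; do
        item=${list%%,*}            # text before the first comma
        case $item in
            oo|excel|literal) ;;
            *) return 1 ;;
        esac
        case $list in
            *,*) list=${list#*,} ;; # drop the item just checked
            *) return 0 ;;          # that was the last item
        esac
    done
}

# Example use (commented out so the sketch has no side effects):
# if valid_csv_formula_output "oo,excel"; then
#     "$MAT_PKG_HOME/bin/MATExperimentEngine" \
#         --exp_dir /documents/exp_runs/run1 \
#         --csv_formula_output oo,excel /documents/exp_files/exp.xml
# fi
```

With the value oo,excel, the engine would be asked for both OpenOffice and Excel output files in a single run.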
MATExperimentEngine also makes the common options available.
The following options are more complicated, and not as well
supported. Use them at your own risk.
--force
    If present, forces the experiment file to be reprocessed.
--batch_test_runs
    By default, test runs are performed as soon as the relevant
    model is available. This flag postpones all test runs until
    after all models are constructed.
--mark_done
    This flag is intended for the exceptional situation where
    you've interrupted an experiment before it's completed, and you
    just want to rerun the scoring for what's already done. This
    flag will force the engine to mark all corpora, models and runs
    as completed. The effect is that from this point on, the engine
    will only report scores for this experiment.
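As an illustration of the --mark_done situation described above, here is a minimal sketch of a wrapper that reruns scoring for an interrupted experiment. The function name rescore_interrupted is hypothetical; the sketch assumes MAT_PKG_HOME is set and that the experiment directory is passed on the command line rather than in the XML file.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of MAT): mark everything in an
# interrupted experiment as completed and report scores for it.
# Arguments: <exp_dir> <xml_file>
rescore_interrupted() {
    "$MAT_PKG_HOME/bin/MATExperimentEngine" --mark_done \
        --exp_dir "$1" "$2"
}

# Example use (commented out so the sketch has no side effects):
# rescore_interrupted /documents/exp_runs/run1 /documents/exp_files/exp.xml
```

Since the engine only reports scores for the experiment from this point on, it is worth using this on a directory you don't intend to continue.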
For examples of the experiment XML files themselves, look here.
Let's say your experiment XML file /documents/exp_files/exp.xml
contains a value for the "dir" attribute of the <experiment>
element, and all the paths in the <pattern> elements are
absolute. Then your invocation is simple:
Unix:
% $MAT_PKG_HOME/bin/MATExperimentEngine /documents/exp_files/exp.xml
Windows native:
> %MAT_PKG_HOME%\bin\MATExperimentEngine.cmd c:\documents\exp_files\exp.xml
Let's say that your experiment XML file does not contain a value
for the "dir" attribute, and you want to create an experiment run
in /documents/exp_runs/run1:
Unix:
% $MAT_PKG_HOME/bin/MATExperimentEngine --exp_dir /documents/exp_runs/run1 \
/documents/exp_files/exp.xml
Windows native:
> %MAT_PKG_HOME%\bin\MATExperimentEngine.cmd --exp_dir c:\documents\exp_runs\run1 ^
c:\documents\exp_files\exp.xml
Let's say you have the same situation as in the second example above, but you
don't want spreadsheet formulas in your output, because you're
feeding the data to a statistical package like R instead of to
Excel:
Unix:
% $MAT_PKG_HOME/bin/MATExperimentEngine --exp_dir /documents/exp_runs/run1 \
--csv_formula_output literal /documents/exp_files/exp.xml
Windows native:
> %MAT_PKG_HOME%\bin\MATExperimentEngine.cmd --exp_dir c:\documents\exp_runs\run1 ^
--csv_formula_output literal c:\documents\exp_files\exp.xml
Let's say that you have the same situation as in the second example above, and
you want to view the results in a spreadsheet, but you can't
afford Excel, and you're using OpenOffice instead:
Unix:
% $MAT_PKG_HOME/bin/MATExperimentEngine --exp_dir /documents/exp_runs/run1 \
--csv_formula_output oo /documents/exp_files/exp.xml
Windows native:
> %MAT_PKG_HOME%\bin\MATExperimentEngine.cmd --exp_dir c:\documents\exp_runs\run1 ^
--csv_formula_output oo c:\documents\exp_files\exp.xml
Let's say you're in the same situation as in the second example above,
but you have relative pathnames in <pattern> elements in your XML
file, and all the document paths are relative to
/documents/completed:
Unix:
% $MAT_PKG_HOME/bin/MATExperimentEngine --exp_dir /documents/exp_runs/run1 \
--pattern_dir /documents/completed /documents/exp_files/exp.xml
Windows native:
> %MAT_PKG_HOME%\bin\MATExperimentEngine.cmd --exp_dir c:\documents\exp_runs\run1 ^
--pattern_dir c:\documents\completed c:\documents\exp_files\exp.xml
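The Unix invocations above can be gathered into one small wrapper. This is a sketch only: the function name run_mat_experiment and its argument order are assumptions for illustration, not part of MAT. It checks that MAT_PKG_HOME is set and that the XML file exists, and it adds --pattern_dir only when a pattern directory is given.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of MAT).
# Arguments: <exp_dir> <xml_file> [<pattern_dir>]
run_mat_experiment() {
    exp_dir=$1
    xml_file=$2
    pattern_dir=${3:-}
    if [ -z "${MAT_PKG_HOME:-}" ]; then
        echo "MAT_PKG_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$xml_file" ]; then
        echo "experiment XML file not found: $xml_file" >&2
        return 1
    fi
    # Rebuild the positional parameters as the engine's option list.
    set -- --exp_dir "$exp_dir"
    if [ -n "$pattern_dir" ]; then
        set -- "$@" --pattern_dir "$pattern_dir"
    fi
    "$MAT_PKG_HOME/bin/MATExperimentEngine" "$@" "$xml_file"
}

# Example use (commented out so the sketch has no side effects):
# run_mat_experiment /documents/exp_runs/run1 \
#     /documents/exp_files/exp.xml /documents/completed
```

Omitting the third argument corresponds to the case where all the paths in the <pattern> elements are absolute.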