Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd
Usage: MATWorkspaceEngine [options] <dir> create ...
MATWorkspaceEngine [options] <dir> import ...
MATWorkspaceEngine [options] <dir> list ...
MATWorkspaceEngine [options] <dir> remove ...
...
Provide the directory and operation followed by --help for more detailed help.
Command line option |
Description |
---|---|
--other_app_dir <dir> |
If present, a directory to
look in to find a MAT task specification. This directory
must contain a task.xml file which describes the task. This
is only necessary if 'MATManagePluginDirs install' has not
been called on the task directory. |
--help |
Prints the core help message
and exits |
MATWorkspaceEngine also makes the common options available.
All workspace operations have fairly consistent syntax; first the
core options, then the workspace directory, then the operation
name, then any operation options, and then the operation
arguments. If the operation is a folder operation, its first
operation argument is the folder name.
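The argument order described above can be sketched as a small Python helper. This is only an illustration and not part of MAT: the function name build_workspace_command and its parameters are hypothetical, and the helper assembles an argument list without running anything.

```python
import os


def build_workspace_command(workspace_dir, operation, core_options=(),
                            operation_options=(), operation_args=(),
                            executable=None):
    """Assemble a MATWorkspaceEngine argument list in the documented order.

    Hypothetical helper for illustration; it only builds the list and never
    runs the engine.
    """
    if executable is None:
        # Assumes MAT_PKG_HOME is set, as in the Unix examples in this document.
        executable = os.path.join(os.environ.get("MAT_PKG_HOME", ""),
                                  "bin", "MATWorkspaceEngine")
    return ([executable]
            + list(core_options)         # 1. core options
            + [workspace_dir]            # 2. workspace directory
            + [operation]                # 3. operation name
            + list(operation_options)    # 4. operation options
            + list(operation_args))      # 5. operation arguments (folder first)


cmd = build_workspace_command(
    "/home/user/myworkspace", "import",
    operation_options=["--encoding", "latin1", "--strip_suffix", ".txt"],
    operation_args=["core", "/home/user/myrandomfiles/first.txt"],
    executable="MATWorkspaceEngine")
print(" ".join(cmd))
```

A list built this way could be handed to subprocess.run, which avoids shell quoting issues for task names that contain spaces.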
The available operations are:
There are also internal operations which are not publicly visible
(release_lock, update_ui_log) and are not accessible via
MATWorkspaceEngine.
We'll review each of these operations in turn.
The create operation creates a workspace. It requires a task and an initial user. If the
workspace supports multiple languages, similarity profiles, or
workspace configurations, these must be supplied as well.
Usage: MATWorkspaceEngine [options] <dir> create [create_options]
Command line option |
Description |
---|---|
--task <task> |
The name of the task to be
associated with this workspace. Required if more than one
task is available. The tasks are the same tasks as those
available to MATEngine. |
--workspace_config <w> |
The workspace
configuration or workflow to be associated with this
workspace. Required if the task has more than one
human-mediated workflow. |
--language <l> |
The language to be associated with this
workspace. Required if the task has more than one language
associated with it. |
--similarity_profile <s> |
The similarity profile in the task to use
when reconciling documents. If not specified, the default
similarity profile for the task will be used. If the task
has multiple defined similarity profiles but no default, a
warning will be issued. |
--initial_users
<s(,s..)> |
A comma-separated list of
initial registered users. Required. The names of the users
do not need to correspond to login names - they're used as
distinctive mnemonics to distinguish between annotators.
However, using login names may be the least confusing
option. |
--max_old_models |
Number of previous models to
retain after model building. Default is 0. |
--help |
Print the help message and
exit. |
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace create --task 'Sentence tagging' --initial_users user1
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace create --task "Sentence tagging" --initial_users user1
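If you create workspaces from a script, the create options above can be assembled programmatically. The following Python sketch is an illustration under stated assumptions: build_create_command is a hypothetical name, only a subset of the documented options is covered, and the function merely builds the argument list.

```python
def build_create_command(executable, workspace_dir, initial_users,
                         task=None, language=None):
    """Build a 'create' argument list (hypothetical helper, not part of MAT).

    --initial_users is required and takes a comma-separated list, so the
    users are joined with commas here.
    """
    users = [u for u in initial_users if u]
    if not users:
        raise ValueError("--initial_users is required")
    cmd = [executable, workspace_dir, "create"]
    if task is not None:
        # Required by the engine only if more than one task is available.
        cmd += ["--task", task]
    if language is not None:
        # Required only if the task has more than one language.
        cmd += ["--language", language]
    cmd += ["--initial_users", ",".join(users)]
    return cmd


create_cmd = build_create_command("MATWorkspaceEngine", "/home/user/myworkspace",
                                  ["user1", "user2"], task="Sentence tagging")
print(create_cmd)
```

Keeping the task name as a single list element means no extra quoting is needed for names such as 'Sentence tagging'.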
The import operation ingests documents into the workspace. The documents are all converted to MAT JSON format, and are prepared for annotation. Optionally, the documents are assigned to users.
Historically, the import operation could target multiple
folders, but as of MAT 2.0, only the core folder is eligible for
import.
Usage: MATWorkspaceEngine [options] <dir> import [import_options] <folder> <file> ...
<folder>: The name of the folder to import documents into.
<file>: The file to import into the folder (can be repeated).
Available folders are:
core: for all files during normal annotation
Command line option |
Description |
---|---|
--strip_suffix <suff> |
Remove this suffix from the
file name when determining the basename for the file in the
workspace. By default, the original file basename is used. |
--encoding <encoding> |
For raw documents, input
encoding. Default is utf-8. All imported raw documents will
be converted to utf-8. |
--file_type <type> |
The file type of the
document. One of the readers.
The default file type is mat-json. |
--workflow <w> |
The workflow defined for the workspace is applied to each imported document, up to the first hand annotation step. However, this is sometimes not adequate to bring documents to the appropriate status; e.g., you may want to tokenize and align a document which has been previously annotated. If --workflow is specified, it will be applied to each imported document before the workspace workflow is (possibly vacuously) applied. If --steps is specified, those steps will be applied in this preprocessing phase; otherwise, all the steps in the workflow will be applied. |
--steps <s(,s...)> |
A comma-separated list of steps to apply in the preprocessing phase when --workflow is specified. If --steps is omitted, all the steps in that workflow are applied. |
--status_user <s> |
By default, the current
workflow step for the document and its status are inferred
from the document itself. But when the current step has
annotations, the workspace may not be able to tell what user
the annotations should be attributed to. You can use this
flag to fix the attribution. If the document has been tagged
automatically but not corrected by a human, use MACHINE as
your value; otherwise, use the name of the registered
workspace user. This value will overwrite any attributions
that are already in the document. The import process will
usually signal an error if there are annotated segments
which are not attributed. |
--step_status <gold or reconciled> |
By default, the current
workflow step for the document and its status are inferred
from the document itself. But when the current step has
annotations, the workspace sometimes can't tell whether the
step is partially annotated, gold, or reconciled. The
default is to judge the step to be partially annotated; use
this option to change the value to gold or reconciled. If
the document is already explicitly marked as gold or
reconciled for that step, this setting will be ignored (so
you can't use it, for instance, to make a reconciled step
gold). If this setting must be imposed, all steps previous to
the current workflow step will also be marked in the same
way. |
--suppress_advancement |
By default, the workspace automatically
advances a document through its workflow as far as it can,
stopping at the first non-completed human-reviewable step.
If this flag is present, it suppresses this automatic
advancement in two ways. First, it blocks the application of
any workspace models, so no pre-tagging is applied. Second,
it completely blocks advancement past gold and reconciled
steps, even if the following step is a non-trainable,
completely automated step. If you choose this option, the advance operation will allow you
to apply automatic advancement later. |
--defer_reconciliation |
By default, if the current workflow step is
gold, the engine will automatically check for scheduled
reviews, then advance to the next hand-annotatable step, or
complete the workflow if there are no such steps. If you
choose --step_status gold, or if the current workflow step
is already marked gold, and this step is scheduled for
reconciliation review, this flag will hold the document in a
state where it is waiting for reconciliation partners rather
than assume that there are no other reconciliation partners
available and resort to human review instead. |
--assign_to_users
<s(,s...)> |
If present, assign the
imported document to the specified user or users. MACHINE is
not an eligible target. NOTE: If multiple users are
assigned, and the documents imported are gold or marked gold
in the current step, any scheduled reviews will be skipped. |
--add_to_basename_set
<set> |
Add the basenames to a given
basename set.
You might want to do this if, e.g., you were importing
gold-standard documents which you wanted to set aside for
evaluation. |
--help |
Print the help message and exit. |
The reader named by the --file_type option may introduce
additional options, which are described in the documentation for that reader.
Let's say the directory /home/user/myrandomfiles contains the
files first.txt and second.txt, and these are both raw text files
in latin1 encoding. Then:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import --encoding 'latin1' \
--strip_suffix '.txt' "core" /home/user/myrandomfiles/*
Windows native:
> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd ^
c:\home\user\myworkspace import --encoding "latin1" --strip_suffix ".txt" "core" %f
will import these two files, apply the appropriate steps to
prepare them for annotation, and name them "first" and "second".
(Note that in Windows, you can't do globbing the way you can in
Unix, so you have to use the loop. In a batch file, write %%f instead of %f.)
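Because wildcard expansion differs between the two platforms, a script can expand the pattern itself and pass the resulting file list in a single call, since <file> can be repeated. The Python sketch below is a hedged illustration: build_import_command is a hypothetical helper, the temporary directory stands in for myrandomfiles, and the command is only built, not run.

```python
import glob
import os
import tempfile


def build_import_command(executable, workspace_dir, pattern,
                         encoding="latin1", strip_suffix=".txt"):
    """Expand a wildcard ourselves and build a single import command.

    Hypothetical helper: the options and folder mirror the example above.
    """
    files = sorted(glob.glob(pattern))   # sorted for a predictable order
    if not files:
        raise ValueError("no files match %r" % pattern)
    return [executable, workspace_dir, "import",
            "--encoding", encoding, "--strip_suffix", strip_suffix,
            "core"] + files


# Demonstration with throwaway files standing in for myrandomfiles.
demo_dir = tempfile.mkdtemp()
for name in ("first.txt", "second.txt"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("sample text\n")

import_cmd = build_import_command("MATWorkspaceEngine", "/home/user/myworkspace",
                                  os.path.join(demo_dir, "*"))
print(import_cmd)
```

On Windows the executable would be the MATWorkspaceEngine.cmd script; the rest of the argument list is the same.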
Let's say the directory /home/user/myrandomfiles contains the files first.xml and second.xml, and these are both rich annotated files which user1 has partially hand-annotated and saved in XML inline format. Then:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import \
--strip_suffix '.xml' --file_type xml-inline --status_user user1 core /home/user/myrandomfiles/*
Windows native:
> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd ^
c:\home\user\myworkspace import --strip_suffix ".xml" --file_type xml-inline --status_user user1 core %f
will import these two files, and name them "first" and "second".
Now suppose the files are as in the previous example, except that the documents are completed and you want them to be treated as gold. Then:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import \
--strip_suffix '.xml' --file_type xml-inline --status_user user1 --step_status gold core \
/home/user/myrandomfiles/*
Windows native:
> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd ^
c:\home\user\myworkspace import --strip_suffix ".xml" --file_type xml-inline ^
--status_user user1 --step_status gold core %f
will import these two files, and name them "first" and "second".
The remove operation removes all copies of the basename from the
workspace. Warning: this
operation will remove all traces of the basenames from the
workspace folders and the database. Do not use it unless you
really want them removed.
Usage: MATWorkspaceEngine [options] <dir> remove <basename> ...
<basename>...: the basename(s) to be removed from the workspace.
Available basenames are: ...
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace remove first
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace remove first
The assign operation assigns the specified basenames to the specified users. Each user gets his or her own copy of the document to annotate. If every available document corresponding to the basename has already been altered by a human, the basename cannot be assigned.
Usage: MATWorkspaceEngine [options] <dir> assign [assign_options] <basename>...
<basename>...: the basename(s) to be assigned to the specified users.
Command line option |
Description |
---|---|
--user <users> |
Assign the basename to the
named user or users (comma-separated). Required. |
--help |
Print the help message and exit. |
The open_file operation is available via MATWorkspaceEngine only when
--debug is provided in the core options. This operation is
typically available only to the MAT UI. On the command line, it
locks the document and prints out the transaction ID. When
debugging, you can subsequently edit the document in question by
hand, and then save it.
Usage: MATWorkspaceEngine [options] <dir> open_file [open_options] <folder> <basename>
<folder>: the folder to find the file in
<basename>: the basename to open
Available basenames are: ...
Command line option |
Description |
---|---|
--user <users> |
Open the file with a
particular user. Required unless --read_only is present. |
--read_only |
If present, open the document
for reading and don't lock it. Used by the MAT UI; not
particularly useful here. |
--file_basename <b> |
Open a particular file basename, rather than
choosing the proper one based on the user. |
--help |
Print the help message and exit. |
The markgold operation is available via MATWorkspaceEngine only when
--debug is provided in the core options. This operation is
typically available only to the MAT UI. This operation marks all
of the "non-gold" segments in a document "human
gold" for the current hand annotatable step, and records the
step as done. Then, by default, it checks for scheduled
reviews; if it finds a scheduled review, it submits the
document for review, and if no reviews are found, it advances the
document to the next hand-annotatable step.
Usage: MATWorkspaceEngine [options] <dir> markgold [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folder 'core'.
Command line option |
Description |
---|---|
--user <u> |
Specify a user responsible
for the gold marking. |
--suppress_advancement |
By default, if a document is gold, it will
automatically check for scheduled reviews, then advance to
the next hand-annotatable step, or complete the workflow if
there are no such steps. This flag will suppress automatic
advancement. |
--lock_id <s> |
Lock ID for marking gold
(obligatory if document is locked). |
--release_lock |
Release the lock after marking gold (if
you're done with the document) |
--help |
Print the help message and exit. |
The unmarkgold operation is available via MATWorkspaceEngine only when
--debug is provided in the core options. This operation is
typically available only to the MAT UI. This operation marks all
of the "human gold" or "reconciled" segments in a document "non-gold",
and marks the step undone for that document.
Usage: MATWorkspaceEngine [options] <dir> unmarkgold [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folder 'core'.
Command line option |
Description |
---|---|
--user <u> |
Specify a user for the lock
ID. |
--lock_id <s> |
Lock ID for unmarking gold
(obligatory if document is locked). |
--help |
Print the help message and exit. |
The save operation is available via MATWorkspaceEngine only when
--debug is provided in the core options. This operation is
typically available only to the MAT UI. On the command line, it
can be used to update the workspace metadata associated with a
document and release the document lock. In the reconciliation and
review folders, it will advance to the next hand-annotatable step
by default.
Usage: MATWorkspaceEngine [options] <dir> save [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folders 'core', 'review', 'reconciliation'.
Command line option |
Description |
---|---|
--doc <s> |
Document to save, as a JSON
string. Used by the MAT UI; not particularly useful here. |
--lock_id <s> |
The transaction ID with which
this document was opened. |
--release_lock |
Release the lock after save
(if you're done with the document). |
--log <s> |
Only available in the 'core'
folder. Log fragment (from UI) to trigger fine-grained
progress monitoring. |
--log_format <f> |
Only available in the 'core'
folder. The format of the log fragment. |
--timestamp <s> |
Only available in the 'core'
folder. Millisecond timestamp of log upload from the UI's
point of view. |
--next_op <json> |
Only available in the 'core'
folder. JSON string describing operation to perform after
the save (for UI connectivity, mostly). |
--review_done |
Only available in the 'review' folder. If
present, complete the review. |
--suppress_advancement |
Only available in the 'reconciliation' and
'review' folders. By default, when the lock is released and
reconciliation or review is done, the workspace advances the
document to the next hand-annotatable step, or completes the
workflow if there are no such steps. This flag will suppress
automatic advancement. |
--help |
Print the help message and exit. |
The list operation shows you the contents of the folders in the workspace. The listing shows the status of each document, as well as who it's assigned to.
Usage: MATWorkspaceEngine [options] <dir> list [ <folder> ... ]
<folder>: (optional) the name of the folder to list the contents of. For certain folders,
extended information will be shown. If no folders are named, all folders will be listed.
Available folders are:
core: for all files during normal annotation
review: for human review files
export: for exported files
reconciliation: for reconciliation files
Your task may make other folders available.
Command line option |
Description |
---|---|
--user <u> |
The workspace user who's
listing the folder (used to compute the user's ability to
open the file) |
--read_only |
Whether the workspace has
been opened read-only or not. |
--help |
Print the help message and exit. |
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace list "core"
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace list "core"
The workspace_configuration operation describes a number of properties of the workspace.
The properties reported are:
Usage: MATWorkspaceEngine [options] <dir> workspace_configuration
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
The dump_database operation describes all the tables in the workspace
database. It is a useful debugging tool for the technically
inclined.
Usage: MATWorkspaceEngine [options] <dir> dump_database [<table>...]
<table>: the name of a table to dump. If none are provided, the entire database will be dumped.
See the workspace documentation for a description of the workspace
database.
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
MAT provides a rich and extensive logging infrastructure
specifically for workspaces. When logging is enabled, MAT
workspace operations log every action and data modification, so
that the activities in the workspace can be rerun from the point
that logging was enabled, exactly as they were originally
performed.
Workspace logging is distinct from UI logging. The MAT UI is capable of capturing all the user gestures and saving them to a CSV file at the user's request. If workspace logging is enabled, the UI turns on this capability specifically for the current workspace, and uploads the log fragments to the MAT server with every save operation in the "core" folder. The format of this log is identical to the format of the UI logger. Unlike general UI logging, this logging cannot be configured or controlled from the UI. Finally, this logging does not interfere with general UI logging; if you choose to enable UI logging, you'll still get all the user gestures, including those that are captured for workspace logging.
The enable_logging operation enables logging. The log will be saved in the
_checkpoint subdirectory of the workspace directory.
Usage: MATWorkspaceEngine [options] <dir> enable_logging
Command line option |
Description |
---|---|
--help |
Print the help message and
exit. |
The disable_logging operation disables logging. If a log is being collected, it
is either moved aside or deleted.
Usage: MATWorkspaceEngine [options] <dir> disable_logging
Command line option |
Description |
---|---|
--remove_log |
By default, logs are moved aside to the first
available _checkpoint_<n> location. If this flag is
provided, the log will be removed instead. |
--help |
Print the help message and
exit. |
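The "first available _checkpoint_<n> location" can be pictured with a short Python sketch. This is one plausible reading rather than MAT's actual code: the helper name is hypothetical, and the assumption that numbering starts at 1 is not stated in this document.

```python
import itertools
import os
import tempfile


def first_free_checkpoint_dir(workspace_dir, start=1):
    """Return the first _checkpoint_<n> path that does not exist yet.

    The starting value of n is an assumption; the documentation only says
    "first available".
    """
    for n in itertools.count(start):
        candidate = os.path.join(workspace_dir, "_checkpoint_%d" % n)
        if not os.path.exists(candidate):
            return candidate


# Demonstration in a throwaway directory where _checkpoint_1 is already taken.
demo_ws = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_ws, "_checkpoint_1"))
free_dir = first_free_checkpoint_dir(demo_ws)
print(os.path.basename(free_dir))
```

A helper like this can be handy when you want to predict where a log will land before calling disable_logging without --remove_log.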
The rerun_log operation allows you to rerun the log. It will use the
_checkpoint/_rerun subdirectory of the workspace directory to
store the rerun state. If you've used the --stop_at option to halt
the rerun before it completes, the next call to rerun_log will
continue from that point, unless you provide the --restart option.
Usage: MATWorkspaceEngine [options] <dir> rerun_log [rerun_options]
Command line option |
Description |
---|---|
--stop_at <ts> |
The log timestamp to stop
immediately before. |
--restart |
If present, go back to the
beginning. |
--verbose |
If present, describe the
state of the workspace at each timestamp. |
--help |
Print the help message and
exit. |
Users in these workspaces are simply labels with which human
annotations are associated. When you set up users for your
workspace, it probably makes the most sense to use the user logins
of the people who will be using the workspace. However, there is
no per-user password authentication, or any formal connection or
dependency between workspace users and your computer user
accounts.
The register_users operation allows you to add registered users to your workspace, for instance to track the contributions of multiple annotators, or to assign documents to multiple annotators for multiple annotation. You cannot unregister users once they're registered.
Users can have roles; the available roles are 'annotator' and 'reviewer'.
Usage: MATWorkspaceEngine [options] <dir> register_users <user>...
<user>: the name of a user to register for the workspace.
Command line option |
Description |
---|---|
--roles <roles> |
Available roles are
'annotator', 'reviewer'. If omitted, the role will be
'annotator'. The string 'all' adds both roles. Otherwise, a
comma-separated list of roles. |
--help |
Prints the help message and
exits. |
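The rules for the --roles value can be restated as a short Python sketch. It is illustrative only: parse_roles is a hypothetical helper, and MAT's own parsing may differ in details such as whitespace handling or error messages.

```python
KNOWN_ROLES = ("annotator", "reviewer")


def parse_roles(value=None):
    """Interpret a --roles value as described in the table above.

    Hypothetical helper; MAT's own parsing may differ in details such as
    whitespace handling.
    """
    if value is None:
        return ["annotator"]             # option omitted: default role
    if value == "all":
        return list(KNOWN_ROLES)         # 'all' means both roles
    roles = [r.strip() for r in value.split(",") if r.strip()]
    unknown = [r for r in roles if r not in KNOWN_ROLES]
    if unknown:
        raise ValueError("unknown role(s): " + ", ".join(unknown))
    return roles


print(parse_roles())
print(parse_roles("all"))
print(parse_roles("reviewer"))
```

Checking role names before calling the engine gives an early, local error for a mistyped role.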
The list_users operation lists the users in a workspace. The same listing is also available as part of the workspace_configuration operation.
Usage: MATWorkspaceEngine [options] <dir> list_users
Command line option |
Description |
---|---|
--no_roles |
Don't show the roles. |
--help |
Prints the help message and
exits. |
The add_roles operation adds roles to existing users.
Usage: MATWorkspaceEngine [options] <dir> add_roles [add_roles_options] <user>...
<user>: the name of a user to update the roles for.
Command line option |
Description |
---|---|
--roles <roles> |
Available roles are
'annotator', 'reviewer'. If omitted, the role will be
'annotator'. The string 'all' adds both roles. Otherwise, a
comma-separated list of roles. |
--help |
Prints the add_roles help
message and exits. |
The remove_roles operation removes roles from existing users.
Usage: MATWorkspaceEngine [options] <dir> remove_roles [remove_roles_options] <user>...
<user>: the name of a user to update the roles for.
Command line option |
Description |
---|---|
--roles <roles> |
Available roles are
'annotator', 'reviewer'. If omitted, the role will be
'annotator'. The string 'all' removes both roles. Otherwise,
a comma-separated list of roles. |
--help |
Prints the remove_roles help
message and exits. |
By default, the workspace will attempt to ensure that each file
is positioned at an opportunity for user interaction. When a file
is imported, the workspace advances the file to the first
hand-annotatable step; when the user marks a document gold in a
given step, the workspace attempts to advance to the next
hand-annotatable step (assuming no reviews are scheduled). If a
model exists for a given step, it will be applied to documents in
the appropriate circumstances.
The modelbuild operation builds a model which can be used to automatically
tag other documents. Every document which is gold or reconciled
for the relevant annotation set is used to build this model;
documents which are in the process of being corrected or annotated
are not used. If there are multiple copies of a document because
the document is multiply assigned, all copies will be used (so
that document will be overrepresented in the model, and all
conflicting annotations will be used as well).
You can optionally ask the workspace to autotag eligible
documents after the model is built. Basenames will be autotagged
only if they are either unannotated or uncorrected in the step for
which the model was built.
Usage: MATWorkspaceEngine [options] <dir> modelbuild [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folder 'core'.
If no basenames are specified, the modelbuild operation will use
all the eligible documents.
Command line option |
Description |
---|---|
--do_autotag |
If present, autotag eligible
basenames with the model after the model is constructed. |
--trainable_step <step> |
A step in the task workflow which has a
trainable engine. Required if there are multiple trainable
steps in your workflow. |
--config_name <name> |
If present, use a model
settings configuration other than the default. |
--autotag_basename
<basename> |
If --do_autotag is present, a
single basename to autotag. This option can be repeated. If
--do_autotag is present and neither this option nor
--autotag_basenames is present, all eligible files will
be autotagged. |
--autotag_basenames
<basenames> |
If --do_autotag is present, a
space-separated sequence of basenames to autotag. This
option can be repeated. If --do_autotag is present and
neither this option nor --autotag_basename is present,
all eligible files will be autotagged. |
Let's say you want to build a model using all the eligible
documents, and you want to autotag all the eligible documents:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
modelbuild --do_autotag "core"
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace ^
modelbuild --do_autotag "core"
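The advance operation applies the workspace's automatic advancement to documents, for example documents that were imported with --suppress_advancement.
Usage: MATWorkspaceEngine [options] <dir> advance [operation_options] <folder> [ <basename> ... ]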
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folder 'core'.
Command line option |
Description |
---|---|
--lock_id <s> |
Lock ID (if document is
locked) |
--help |
Print the help message and exit. |
You can use your workspace as a corpus for experiments. You can
access this capability via the <workspace_corpora> element
for MATExperimentEngine,
or you can access it via the workspace engine. You can further
subdivide your workspace into basename sets which can be
referred to in your experiment.
The list_basename_sets operation lists the basename sets and their contents.
Usage: MATWorkspaceEngine [options] <dir> list_basename_sets [ <set_name> ... ]
<set_name>: the name of a basename set
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
The add_to_basename_set operation adds basenames to a given basename set (and implicitly creates the set if necessary).
Usage: MATWorkspaceEngine [options] <dir> add_to_basename_set <set_name> <basename>...
<set_name>: the name of a basename set
<basename>: a known workspace basename
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
The remove_from_basename_set operation removes basenames from a given basename set (and implicitly removes the set if necessary).
Usage: MATWorkspaceEngine [options] <dir> remove_from_basename_set <set_name> <basename>...
<set_name>: the name of a basename set
<basename>: a workspace basename
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
The run_experiment operation allows you to run an experiment based on this
workspace. You can do this either by specifying an experiment file and an
experiment file variable to which the workspace directory should be bound,
or by specifying the properties of the workspace documents to use as your
test corpus. In the latter case, the training corpus will be the remainder
of the at least partially hand-annotated documents in the workspace.
If you do not use an experiment file, the defaults provided will
be different than those in the workspace file. The only documents
in the workspace that will be used are those which are at least
partially gold; the model trainer will train only on gold or
reconciled segments; and the scorer will only compare to gold or
reconciled segments in the test corpus. You can override these
restrictions by providing your own experiment file. You can
duplicate these restrictions in your experiment file by:
The experiments will be saved in the experiments/ subdirectory of
the workspace, in a directory named
<year><month><day>_<hr><min><sec>_<msec>.
Usage: MATWorkspaceEngine [options] <dir> run_experiment [options]
Command line option |
Description |
---|---|
--experiment_file
<file> |
Specify an experiment file to use. If specified, --workspace_binding is also required. Either this or one of the --test_* parameters must be provided. |
--workspace_binding
<var> |
A variable in the workspace
experiment file to which this workspace should be bound.
Required if --experiment_file is present. |
--test_users
<user(,user..)> |
A comma-separated sequence of
users to restrict the test corpus to. Not permitted if
--experiment_file is provided. |
--test_basename_sets
<set(,set...)> |
A comma-separated sequence of
basename set names to restrict the test corpus to. Not
permitted if --experiment_file is provided. |
--test_basename_patterns
<pat(,pat...)> |
A comma-separated sequence of
glob-style basename patterns to restrict the test corpus to.
Not permitted if --experiment_file is provided. |
--test_step_statuses
<status(,status...)> |
A comma-separated sequence of
step statuses in the target trainable step to restrict the
test corpus to. The background corpus will already be
restricted to 'partially gold,gold,reconciled'. Not
permitted if --experiment_file is provided. |
--test_exclude_unassigned |
If present, exclude
unassigned documents from the test corpus. Not permitted if
--experiment_file is provided. |
--test_step <s> |
The name of the trainable
step in the workspace's workflow to target. Not permitted if
--experiment_file is provided. Required if --experiment_file
is absent and the workflow has more than one trainable step. |
--csv_formula_output
<fmt> |
The format for the CSV output
files. See the MATScore
documentation for details. |
--help |
Print the help message and
exit. |
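The constraints among these options can be summarized in a short Python sketch. It is an illustration under stated assumptions: check_run_experiment_options is a hypothetical helper that restates the table above, and it doesn't check conditions that depend on the workflow, such as when --test_step is required.

```python
def check_run_experiment_options(experiment_file=None, workspace_binding=None,
                                 test_options=None):
    """Return a list of problems with a run_experiment option combination.

    test_options is a dict of the --test_* options that were supplied, keyed
    by name without the leading dashes (e.g. {"test_users": "user1"}).
    Hypothetical helper that restates the constraints in the table above.
    """
    test_options = test_options or {}
    problems = []
    if experiment_file is not None:
        if workspace_binding is None:
            problems.append(
                "--workspace_binding is required with --experiment_file")
        for name in sorted(test_options):
            problems.append(
                "--%s is not permitted with --experiment_file" % name)
    elif not test_options:
        problems.append(
            "provide --experiment_file or at least one --test_* option")
    return problems


print(check_run_experiment_options("exp.xml", "WS"))
print(check_run_experiment_options("exp.xml"))
print(check_run_experiment_options(test_options={"test_users": "user1"}))
```

An empty list means the combination satisfies the documented constraints; the engine itself remains the final authority.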
Let's say you have an experiment file exp.xml whose corpora are
defined entirely using the <workspace_corpus> element, and
the workspace_dir attribute of <workspace_corpora> in that
file refers to the "WS" binding variable. Then, you can use that
experiment file as follows:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
run_experiment --experiment_file exp.xml --workspace_binding WS
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace ^
run_experiment --experiment_file exp.xml --workspace_binding WS
Workspaces support the option of reviewing documents after they're annotated. You can schedule a review in advance for any document that completes a particular step or, if there's no existing schedule, request a review after you complete a step. Finally, you can use a requested review to repair errors in previous steps. More details on how reviews work are given in the workspace documentation.
There are three types of review: human review, reconciliation
review, and reconciliation with crossvalidation.
Only the first and third are available for an ad-hoc review
request; all are available for scheduled reviews. For
reconciliation reviews, the document will wait for its other
reconciliation partners before actually submitting the
reconciliation request. Crossvalidation reviews are the same,
except once all the partners are submitted for review, the
documents wait for crossvalidation (which must be manually
triggered on the command-line using the apply_crossvalidation
operation) before entering reconciliation.
This operation allows you to schedule a review. This review
will be initiated when the document is marked gold in the step for
which the review is scheduled. If you need to suspend this behavior
during import (e.g., because you're importing multiple copies of
the same document, and assigning them to different people, and you
don't want the workspace engine to assume that the first import is
the only copy that it will find), you can use the
--defer_reconciliation option to the import
operation.
Usage: MATWorkspaceEngine [options] <dir> schedule_review <step> <review_type>
<step>: the name or pretty name of the step in the workflow for which the review is being scheduled.
All basenames which complete this step will be submitted for this review,
including the case where this step is the final step reached during the import process.
<review_type>: one of human, reconciliation, reconciliation_with_crossvalidation.
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
This operation allows you to remove a scheduled review.
Usage: MATWorkspaceEngine [options] <dir> unschedule_review <step>
<step>: the name or pretty name of the step in the workflow for which the review schedule is being removed.
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
This operation will list the scheduled reviews, by step.
Usage: MATWorkspaceEngine [options] <dir> list_review_schedule
Command line option |
Description |
---|---|
--help |
Print the help message and exit. |
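The three schedule operations above are easy to drive from a script. The following is a minimal sketch, not part of MAT: the helper name review_schedule_argv, the step name "tag", and the assumption that MATWorkspaceEngine is on the PATH are all illustrative. It only builds argument lists that follow the usage lines above; it does not run the engine.

```python
# Hypothetical helper, not part of MAT: builds argument lists for the
# schedule_review, unschedule_review and list_review_schedule operations.

# Review types accepted by schedule_review, per its usage text.
SCHEDULABLE_REVIEW_TYPES = (
    "human",
    "reconciliation",
    "reconciliation_with_crossvalidation",
)


def review_schedule_argv(workspace_dir, operation, step=None, review_type=None,
                         engine="MATWorkspaceEngine"):
    """Return the argv list for one review-schedule operation.

    `engine` assumes the script is on the PATH; otherwise pass the full
    path ($MAT_PKG_HOME/bin/MATWorkspaceEngine).
    """
    argv = [engine, workspace_dir, operation]
    if operation == "schedule_review":
        if step is None:
            raise ValueError("schedule_review needs a step")
        if review_type not in SCHEDULABLE_REVIEW_TYPES:
            raise ValueError("unknown review type: %r" % (review_type,))
        argv += [step, review_type]
    elif operation == "unschedule_review":
        if step is None:
            raise ValueError("unschedule_review needs a step")
        argv.append(step)
    elif operation != "list_review_schedule":
        raise ValueError("not a review-schedule operation: %r" % (operation,))
    return argv


# "tag" is a placeholder step name; use a step from your own workflow.
print(" ".join(review_schedule_argv("/home/user/myworkspace",
                                    "schedule_review", "tag", "reconciliation")))
```

Each returned list can be passed to subprocess.run as is, which avoids shell quoting problems with workspace paths that contain spaces.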
Use this operation to apply crossvalidation to accumulated
documents which are waiting for it. In general, you should allow a
reasonable number of documents to accumulate awaiting
crossvalidation before you trigger it, since otherwise, it'll
essentially do the same thing that autotagging does.
This operation accepts basename arguments, but those arguments
are ignored.
Usage: MATWorkspaceEngine [options] <dir> apply_crossvalidation [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
The following folders support this operation: core
Command line option |
Description |
---|---|
--folds <n> |
Number of cross-validation
folds (i.e., number of ways the corpus is split). Default is
8. |
--crossvalidation_doc_count_threshold <n> |
Number of documents awaiting
crossvalidation required in a given step for crossvalidation
to occur. Default is 10. |
--help | Prints the apply_crossvalidation help message and exits. |
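As an illustration of how the options above combine, here is a minimal sketch of a wrapper that builds an apply_crossvalidation command line. The function name and the assumption that MATWorkspaceEngine is on the PATH are hypothetical; the option names, the 'core' folder, and the option-before-folder ordering come from the usage text above. Options left as None are omitted, so the engine's own defaults (8 folds, threshold of 10) apply.

```python
# Hypothetical helper, not part of MAT: builds the argument list for
# the apply_crossvalidation operation.

def apply_crossvalidation_argv(workspace_dir, folds=None,
                               doc_count_threshold=None,
                               engine="MATWorkspaceEngine"):
    """Return the argv list for apply_crossvalidation on the core folder."""
    argv = [engine, workspace_dir, "apply_crossvalidation"]
    if folds is not None:
        if folds < 2:
            # Splitting a corpus fewer than two ways is not a cross-validation.
            raise ValueError("folds must be at least 2")
        argv += ["--folds", str(folds)]
    if doc_count_threshold is not None:
        argv += ["--crossvalidation_doc_count_threshold",
                 str(doc_count_threshold)]
    # Operation options come before the folder name. Basename arguments
    # are accepted but ignored by this operation, so none are passed.
    argv.append("core")
    return argv


print(" ".join(apply_crossvalidation_argv("/home/user/myworkspace",
                                          folds=5, doc_count_threshold=20)))
```

The check on folds is a sanity check added for this sketch; the documentation above does not say how the engine itself validates the value.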
If, for some reason, a document fails to exit reconciliation
naturally (if some of the users fail to complete their
reconciliation steps, for example), you can use this operation to
remove the document forcibly from reconciliation. This operation
will also free documents from waiting for crossvalidation or for
reconciliation partners. By default, this operation will advance
the document to the next hand-annotatable step.
Usage: MATWorkspaceEngine [options] <dir> remove_from_reconciliation [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
The following folders support this operation: reconciliation
Command line option |
Description |
---|---|
--dont_reintegrate |
By default, reconciliation
updates are integrated back into the core documents. Use
this flag to skip that step. |
--suppress_advancement |
By default, when reconciliation is done, the
workspace advances the document to the next hand-annotatable
step, or completes the workflow if there are no such steps.
This flag will suppress automatic advancement. |
--help | Prints the remove_from_reconciliation help message and exits. |
Keep in mind that the document may already be partially
reconciled. If you want to remove the document and preserve
the decisions already made, you can use the operation as follows:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove_from_reconciliation reconciliation basename1
This will migrate the agreed-upon document segments back into the
documents which were used to create the reconciliation document.
If you do not want to preserve those decisions, and simply want to
stop the document from being reconciled, do this instead:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove_from_reconciliation --dont_reintegrate reconciliation basename1
If the current step isn't scheduled for review or reconciliation,
you can request a review yourself. Of the standard review types, only
human review and reconciliation with crossvalidation can be requested;
you can't request a review for a document assigned to someone else.
The 'repair' review type is special; it's equivalent to
requesting a human review which you'll conduct yourself, on a
document which isn't complete in its current step.
Usage: MATWorkspaceEngine [options] <dir> request_review [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folder 'core'.
Command line option |
Description |
---|---|
--review_type <r> |
The type of ad-hoc review. Available types
are 'human', 'repair' and
'reconciliation_with_crossvalidation' |
--lock_id <id> |
Lock ID. |
--user <u> |
The user requesting the review. |
--help | Prints the request_review help message and exits. |
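The following minimal sketch shows one way to assemble a request_review command line and reject review types that aren't available ad hoc. The helper name, the basename "basename1", the user name "user1", and the assumption that MATWorkspaceEngine is on the PATH are hypothetical; the option names and the list of review types come from the table above. Whether the engine requires --user or --lock_id in a given situation isn't stated here, so the sketch only passes them when they are supplied.

```python
# Hypothetical helper, not part of MAT: builds the argument list for
# the request_review operation.

# Review types available for ad-hoc requests, per the option table.
AD_HOC_REVIEW_TYPES = (
    "human",
    "repair",
    "reconciliation_with_crossvalidation",
)


def request_review_argv(workspace_dir, basenames, review_type,
                        user=None, lock_id=None,
                        engine="MATWorkspaceEngine"):
    """Return the argv list for request_review on the core folder."""
    if review_type not in AD_HOC_REVIEW_TYPES:
        # Plain 'reconciliation' can be scheduled but not requested ad hoc.
        raise ValueError("not an ad-hoc review type: %r" % (review_type,))
    argv = [engine, workspace_dir, "request_review",
            "--review_type", review_type]
    if user is not None:
        argv += ["--user", user]
    if lock_id is not None:
        argv += ["--lock_id", lock_id]
    argv.append("core")          # the only folder supporting this operation
    argv.extend(basenames)       # optional basename restriction
    return argv


print(" ".join(request_review_argv("/home/user/myworkspace", ["basename1"],
                                   "repair", user="user1")))
```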
If a document is in the human review folder, you can indicate that you're satisfied with the document with this operation. If the document isn't being reviewed for repair, this operation will mark the document reconciled for the current step, and then advance the document to the next hand-annotatable step.
Command line option |
Description |
---|---|
--suppress_advancement |
By default, when the review is completed, the
workspace advances the document to the next hand-annotatable
step, or completes the workflow if there are no such steps.
This flag will suppress automatic advancement. |
--lock_id <id> |
Lock ID. |
--help | Prints the complete_human_review help message and exits. |
This operation forces a basename in the named folder to be
unlocked. The --user option is obligatory. Warning: be very certain that
you apply the force_unlock operation only to basenames whose locks have been stranded.
If you unlock a basename which is being annotated, the annotator
will not be able to save her changes.
Note: you can't use this operation to forcibly undo an
operation lock. In this situation you'll get the error "workspace
is currently unavailable (processing another request)". How to
deal with that is covered elsewhere in this documentation.
Usage: MATWorkspaceEngine [options] <dir> force_unlock [operation_options] <folder> [ <basename> ... ]
<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.
Supported by folder 'core'.
Command line option |
Description |
---|---|
--user <user> |
The user who's locked the
basename. |
--help | Prints the force_unlock help message and exits. |
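Because --user is obligatory for force_unlock, a wrapper script can check for it before calling the engine at all. The following is a minimal sketch under stated assumptions: the helper name, the basename "basename1", the user name "user1", and the assumption that MATWorkspaceEngine is on the PATH are hypothetical; the option and folder names come from the usage text above.

```python
# Hypothetical helper, not part of MAT: builds the argument list for
# the force_unlock operation.

def force_unlock_argv(workspace_dir, user, basenames=(),
                      engine="MATWorkspaceEngine"):
    """Return the argv list for force_unlock on the core folder."""
    if not user:
        # The documentation states that --user is obligatory.
        raise ValueError("force_unlock requires the user holding the lock")
    argv = [engine, workspace_dir, "force_unlock", "--user", user, "core"]
    # Naming basenames explicitly keeps the unlock restricted to locks
    # known to be stranded.
    argv.extend(basenames)
    return argv


print(" ".join(force_unlock_argv("/home/user/myworkspace", "user1",
                                 ["basename1"])))
```

Listing the basenames explicitly is a choice made for this sketch, in line with the warning above about unlocking documents that are still being annotated.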