Workspace engine

Description

The workspace engine manages workspaces. Once you create a workspace, you can perform top-level operations on it, such as importing a document into the workspace or listing its contents, or perform an operation on one of the folders in the workspace. There are core options, plus options which are specific to each operation.

Note that you should never use MATEngine or MATModelBuilder to save files or models into workspaces.

Usage

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd

Usage: MATWorkspaceEngine [options] <dir> create ...
MATWorkspaceEngine [options] <dir> import ...
MATWorkspaceEngine [options] <dir> list ...
MATWorkspaceEngine [options] <dir> remove ...
...

Provide the directory and operation followed by --help for more detailed help.

Core options

--other_app_dir <dir>
If present, a directory to look in to find a MAT task specification. This directory must contain a task.xml file which describes the task. This is only necessary if 'MATManagePluginDirs install' has not been called on the task directory.
--help
Prints the core help message and exits

MATWorkspaceEngine also makes the common options available.

All workspace operations have fairly consistent syntax; first the core options, then the workspace directory, then the operation name, then any operation options, and then the operation arguments. If the operation is a folder operation, its first operation argument is the folder name.
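
The ordering above can be sketched as a small POSIX shell wrapper. The function name ws_op and the CORE_OPTS variable are illustrative assumptions and are not part of MAT; only the argument order follows the description above.

```shell
# Hypothetical wrapper (not part of MAT) illustrating the argument order.
# Assumes MAT_PKG_HOME points at the MAT installation.
# Usage: ws_op <dir> <operation> [operation options and arguments ...]
# Core options, if any, are read from the CORE_OPTS variable.
ws_op() {
    ws_dir=$1
    ws_operation=$2
    shift 2
    # Order: core options, workspace directory, operation name,
    # then operation options followed by operation arguments.
    # CORE_OPTS is left unquoted on purpose so that it splits into words.
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" ${CORE_OPTS-} "$ws_dir" "$ws_operation" "$@"
}
```

With this sketch, ws_op /home/user/myworkspace list core runs a folder listing, and setting CORE_OPTS=--debug beforehand places --debug ahead of the workspace directory.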

The available operations are:

Topic                      Operation                                   Folder
creation                   create                                      (global)
file management            import                                      (global)
                           remove                                      (global)
                           assign                                      (global)
                           open_file (only when --debug is provided)   (global)
                           markgold (only when --debug is provided)    core
                           unmarkgold (only when --debug is provided)  core
                           save (only when --debug is provided)        core, review, reconciliation
inspection                 list                                        (global)
                           workspace_configuration                     (global)
                           dump_database                               (global)
logging                    enable_logging                              (global)
                           disable_logging                             (global)
                           rerun_log                                   (global)
users                      register_users                              (global)
                           list_users                                  (global)
                           add_roles                                   (global)
                           remove_roles                                (global)
automated tagging          modelbuild                                  core
                           advance                                     core
experimentation            list_basename_sets                          (global)
                           add_to_basename_set                         (global)
                           remove_from_basename_set                    (global)
                           run_experiment                              (global)
review and reconciliation  schedule_review                             (global)
                           unschedule_review                           (global)
                           list_review_schedule                        (global)
                           apply_crossvalidation                       core
                           remove_from_reconciliation                  reconciliation
                           request_review                              core
                           complete_human_review                       review
administration             force_unlock                                core, review, reconciliation

There are also internal operations which are not publicly visible (release_lock, update_ui_log) and are not accessible via MATWorkspaceEngine.

We'll review each of these operations in turn.

Creation

create

The create operation creates a workspace. It requires a task and an initial user. If the workspace supports multiple languages, similarity profiles, or workspace configurations, these must be supplied as well.

Usage: MATWorkspaceEngine [options] <dir> create [create_options]

Options

Command line option
Description
--task <task>
The name of the task to be associated with this workspace. Required if more than one task is available. The tasks are the same tasks as those available to MATEngine.
--workspace_config <w>
The workspace configuration or workflow to be associated with this workspace. Required if the task has more than one human-mediated workflow.
--language <l>
The language to be associated with this workspace. Required if the task has more than one language associated with it.
--similarity_profile <s>
The similarity profile in the task to use when reconciling documents. If not specified, the default similarity profile for the task will be used. If the task has multiple defined similarity profiles but no default, a warning will be issued.
--initial_users <s(,s..)>
A comma-separated list of initial registered users. Required. The names of the users do not need to correspond to login names - they're used as distinctive mnemonics to distinguish between annotators. However, using login names may be the least confusing option.
--max_old_models
Number of previous models to retain after model building. Default is 0.
--help
Print the help message and exit.

Example

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace create --task 'Sentence tagging' --initial_users user1

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace create --task "Sentence tagging" --initial_users user1

File management

import

The import operation ingests documents into the workspace. The documents are all converted to MAT JSON format, and are prepared for annotation. Optionally, the documents are assigned to users.

Historically, the import operation could target multiple folders, but as of MAT 2.0, only the core folder is eligible for import.

Usage: MATWorkspaceEngine [options] <dir> import [import_options] <folder> <file> ...

<folder>: The name of the folder to import documents into.
<file>: The file to import into the folder (can be repeated).

Available folders are:

core: for all files during normal annotation

Options

Command line option
Description
--strip_suffix <suff>
Remove this suffix from the file name when determining the basename for the file in the workspace. By default, the original file basename is used.
--encoding <encoding>
For raw documents, input encoding. Default is utf-8. All imported raw documents will be converted to utf-8.
--file_type <type>
The file type of the document. One of the readers. The default file type is mat-json.
--workflow <w>
The workflow defined for the workspace is applied to each imported document, up to the first hand annotation step. However, this is sometimes not adequate to bring documents to the appropriate status; e.g., you may want to tokenize and align a document which has been previously annotated. If --workflow is specified, it will be applied to each imported document before the workspace workflow is (possibly vacuously) applied. If --steps is specified, those steps will be applied in this preprocessing phase; otherwise, all the steps in the workflow will be applied.
--steps <s(,s...)>
The workflow defined for the workspace is applied to each imported document, up to the first hand annotation step. However, this is sometimes not adequate to bring documents to the appropriate status; e.g., you may want to tokenize and align a document which has been previously annotated. If --workflow is specified, it will be applied to each imported document before the workspace workflow is (possibly vacuously) applied. If --steps is specified, those steps will be applied in this preprocessing phase; otherwise, all the steps in the workflow will be applied.
--status_user <s>
By default, the current workflow step for the document and its status are inferred from the document itself. But when the current step has annotations, the workspace may not be able to tell what user the annotations should be attributed to. You can use this flag to fix the attribution. If the document has been tagged automatically but not corrected by a human, use MACHINE as your value; otherwise, use the name of the registered workspace user. This value will overwrite any attributions that are already in the document. The import process will usually signal an error if there are annotated segments which are not attributed.
--step_status gold | reconciled
By default, the current workflow step for the document and its status are inferred from the document itself. But when the current step has annotations, the workspace sometimes can't tell whether the step is partially annotated, gold, or reconciled. The default is to judge the step to be partially annotated; use this option to change the value to gold or reconciled. If the document is already explicitly marked as gold or reconciled for that step, this setting will be ignored (so you can't use it, for instance, to make a reconciled step gold). If this setting must be imposed, all steps previous to the current workflow step will also be marked in the same way.
--suppress_advancement
By default, the workspace automatically advances a document through its workflow as far as it can, stopping at the first non-completed human-reviewable step. If this flag is present, it suppresses this automatic advancement in two ways. First, it blocks the application of any workspace models, so no pre-tagging is applied. Second, it completely blocks advancement past gold and reconciled steps, even if the following step is a non-trainable, completely automated step. If you choose this option, the advance operation will allow you to apply automatic advancement later.
--defer_reconciliation
By default, if the current workflow step is gold, the engine will automatically check for scheduled reviews, then advance to the next hand-annotatable step, or complete the workflow if there are no such steps. If you choose --step_status gold, or if the current workflow step is already marked gold, and this step is scheduled for reconciliation review, this flag will hold the document in a state where it is waiting for reconciliation partners rather than assume that there are no other reconciliation partners available and resort to human review instead.
--assign_to_users <s(,s...)>
If present, assign the imported document to the specified user or users. MACHINE is not an eligible target. NOTE: If multiple users are assigned, and the documents imported are gold or marked gold in the current step, any scheduled reviews will be skipped.
--add_to_basename_set <set>
Add the basenames to a given basename set. You might want to do this if, e.g., you were importing gold-standard documents which you wanted to set aside for evaluation.
--help
Print the help message and exit.

The reader referenced in the --file_type option may introduce additional options, which are described here.

Example

Let's say the directory /home/user/myrandomfiles contains the files first.txt and second.txt, and these are both raw text files in latin1 encoding. Then:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import --encoding 'latin1' \
--strip_suffix '.txt' "core" /home/user/myrandomfiles/*

Windows native:

> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd ^
c:\home\user\myworkspace import --encoding "latin1" ^
--strip_suffix ".txt" "core" %f

will import these two files, apply the appropriate steps to prepare them for annotation, and name them "first" and "second". (Note that in Windows, you can't do globbing the way you can in Unix, so you have to use the loop.)

Example

Let's say the directory /home/user/myrandomfiles contains the files first.xml and second.xml, and these are both rich annotated files which user1 has partially hand-annotated and saved in XML inline format. Then:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import \
--strip_suffix '.xml' --file_type xml-inline --status_user user1 core /home/user/myrandomfiles/*

Windows native:

> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd ^
c:\home\user\myworkspace import ^
--strip_suffix ".xml" --file_type xml-inline --status_user user1 core %f

will import these two files, and name them "first" and "second".

Example

Like the previous example, except these documents are completed and you want them to be treated as gold. Then:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import \
--strip_suffix '.xml' --file_type xml-inline --status_user user1 --step_status gold core \
/home/user/myrandomfiles/*

Windows native:

> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd ^
c:\home\user\myworkspace import ^
--strip_suffix ".xml" --file_type xml-inline ^
--status_user user1 --step_status gold core %f

will import these two files, and name them "first" and "second".
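
As a further illustration of the --add_to_basename_set option, the following sketch imports completed documents as gold and sets them aside in a basename set for later evaluation. It assumes the files are in the default mat-json format and were annotated by a registered user; the function name and the set name used in the usage note are hypothetical.

```shell
# Hypothetical helper (not part of MAT). Assumes MAT_PKG_HOME is set.
# Usage: ws_import_gold_set <dir> <user> <set> <file>...
ws_import_gold_set() {
    ws_dir=$1
    ws_user=$2
    ws_set=$3
    shift 3
    # Attribute the annotations to <user>, treat the current step as gold,
    # and record the imported basenames in the basename set <set>.
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" import \
        --status_user "$ws_user" --step_status gold \
        --add_to_basename_set "$ws_set" core "$@"
}
```

For instance, ws_import_gold_set /home/user/myworkspace user1 eval /home/user/goldfiles/* would import the files into the core folder and add their basenames to a set named "eval".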

remove

The remove operation removes all copies of the basename from the workspace. Warning: this operation will remove all traces of the basenames from the workspace folders and the database. Do not use it unless you really want them removed.

Usage: MATWorkspaceEngine [options] <dir> remove <basename> ...

<basename>...: the basename(s) to be removed from the workspace.

Available basenames are: ...

Options

Command line option
Description
--help
Print the help message and exit.

Example

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace remove first

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace remove first

assign

This operation assigns the specified basenames to the specified users. Each user gets his or her own copy of the document to annotate. If there are no available documents corresponding to the basename which haven't already been altered by a human, the basename cannot be assigned.

Usage: MATWorkspaceEngine [options] <dir> assign [assign_options] <basename>... 

<basename>...: the basename(s) to be assigned to the specified users.

Options

Command line option
Description
--user <users>
Assign the basename to the named user or users (comma-separated). Required.
--help
Print the help message and exit.
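
Example

A minimal sketch of the assign operation, assuming a workspace which already contains the basenames being assigned and in which the named users are registered. The function name is hypothetical.

```shell
# Hypothetical helper (not part of MAT). Assumes MAT_PKG_HOME is set.
# Usage: ws_assign <dir> <user(,user...)> <basename>...
ws_assign() {
    ws_dir=$1
    ws_users=$2
    shift 2
    # --user is required and takes a comma-separated list of users.
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" assign --user "$ws_users" "$@"
}
```

For instance, ws_assign /home/user/myworkspace user1,user2 first second asks the workspace to give user1 and user2 each their own copy of "first" and "second".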

open_file

This operation is available via MATWorkspaceEngine only when --debug is provided in the core options. This operation is typically available only to the MAT UI. On the command line, it locks the document and prints out the transaction ID. When debugging, you can subsequently edit the document in question by hand, and then save it.

Usage: MATWorkspaceEngine [options] <dir> open_file [open_options] <folder> <basename>

<folder>: the folder to find the file in
<basename>: the basename to open

Available basenames are: ...

Options

Command line option
Description
--user <users>
Open the file with a particular user. Required unless --read_only is present.
--read_only
If present, open the document for reading and don't lock it. Used by the MAT UI; not particularly useful here.
--file_basename <b>
Open a particular file basename, rather than choosing the proper one based on the user.
--help
Print the help message and exit.

markgold

This operation is available via MATWorkspaceEngine only when --debug is provided in the core options. This operation is typically available only to the MAT UI. This operation marks all of the "non-gold" segments in a document "human gold" for the current hand annotatable step, and records the step as done. Then, by default, it checks for scheduled reviews; if it finds a scheduled review, it submits the document for review, and if no reviews are found, it advances the document to the next hand-annotatable step.

Usage: MATWorkspaceEngine [options] <dir> markgold [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folder 'core'.

Options

Command line option
Description
--user <u>
Specify a user responsible for the gold marking.
--suppress_advancement
By default, if a document is gold, it will automatically check for scheduled reviews, then advance to the next hand-annotatable step, or complete the workflow if there are no such steps. This flag will suppress automatic advancement.
--lock_id <s>
Lock ID for marking gold (obligatory if document is locked).
--release_lock
Release the lock after marking gold (if you're done with the document)
--help
Print the help message and exit.

unmarkgold

This operation is available via MATWorkspaceEngine only when --debug is provided in the core options. This operation is typically available only to the MAT UI. This operation marks all of the "human gold" or "reconciled" segments in a document "non-gold", and marks the step undone for that document.

Usage: MATWorkspaceEngine [options] <dir> unmarkgold [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folder 'core'.

Options

Command line option
Description
--user <u>
Specify a user for the lock ID.
--lock_id <s>
Lock ID for unmarking gold (obligatory if document is locked).
--help
Print the help message and exit.

save

This operation is available via MATWorkspaceEngine only when --debug is provided in the core options. This operation is typically available only to the MAT UI. On the command line, it can be used to update the workspace metadata associated with a document and release the document lock. In the reconciliation and review folders, it will advance to the next hand-annotatable step by default.

Usage: MATWorkspaceEngine [options] <dir> save [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folders 'core', 'review', 'reconciliation'.

Options

Command line option
Description
--doc <s>
Document to save, as a JSON string. Used by the MAT UI; not particularly useful here.
--lock_id <s>
The transaction ID with which this document was opened.
--release_lock
Release the lock after save (if you're done with the document).
--log <s>
Only available in the 'core' folder. Log fragment (from UI) to trigger fine-grained progress monitoring.
--log_format <f>
Only available in the 'core' folder. The format of the log fragment.
--timestamp <s>
Only available in the 'core' folder. Millisecond timestamp of log upload from the UI's point of view.
--next_op <json>
Only available in the 'core' folder. JSON string describing operation to perform after the save (for UI connectivity, mostly).
--review_done
Only available in the 'review' folder. If present, complete the review.
--suppress_advancement
Only available in the 'reconciliation' and 'review' folders. By default, when the lock is released and reconciliation or review is done, the workspace advances the document to the next hand-annotatable step, or completes the workflow if there are no such steps. This flag will suppress automatic advancement.
--help
Print the help message and exit.
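
Example

The debugging sequence described for open_file and save (lock the document, edit it by hand, then save and release the lock) might be wrapped as follows. This is only a sketch: the function names are hypothetical, and because the exact format in which open_file prints the transaction ID is not assumed here, the ID is passed in by hand.

```shell
# Hypothetical helpers for a debugging session (not part of MAT).
# Assumes MAT_PKG_HOME is set. --debug is a core option, so it
# precedes the workspace directory.

# Lock a document for a user; the engine prints the transaction ID.
# Usage: ws_debug_open <dir> <user> <folder> <basename>
ws_debug_open() {
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" --debug "$1" open_file --user "$2" "$3" "$4"
}

# Update the workspace metadata for the document and release the lock.
# <lock_id> is the transaction ID reported by open_file.
# Usage: ws_debug_save <dir> <lock_id> <folder> <basename>
ws_debug_save() {
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" --debug "$1" save --lock_id "$2" --release_lock "$3" "$4"
}
```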

Inspection

list

This operation shows you the contents of the folders in the workspace. The listing shows you the status of the document, as well as who it's assigned to.

Usage: MATWorkspaceEngine [options] <dir> list ( <folder> ...)

<folder>: (optional) the name of the folder to list the contents of. For certain folders,
extended information will be shown. If no folders are named, all folders will be listed.

Available folders are:
core: for all files during normal annotation
review: for human review files
export: for exported files
reconciliation: for reconciliation files

Your task may make other folders available.

Command line option
Description
--user <u>
The workspace user who's listing the folder (used to compute the user's ability to open the file)
--read_only
Whether the workspace has been opened read-only or not.
--help
Print the help message and exit.

Example

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace list "core"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace list "core"

workspace_configuration

This operation describes a number of properties of the workspace. The properties reported are:

Usage: MATWorkspaceEngine [options] <dir> workspace_configuration

Options

Command line option
Description
--help
Print the help message and exit.

dump_database

This operation describes all the tables in the workspace database. It is a useful debugging tool for the technically inclined.

Usage: MATWorkspaceEngine [options] <dir> dump_database [<table>...]

<table>: the name of a table to dump. If none are provided, the entire database will be dumped.

See the workspace documentation for a description of the workspace database.

Options

Command line option
Description
--help
Print the help message and exit.

Logging

MAT provides a rich and extensive logging infrastructure specifically for workspaces. When logging is enabled, MAT workspace operations log every action and data modification, so that the activities in the workspace can be rerun from the point that logging was enabled, exactly as they were originally performed.

Workspace logging is distinct from UI logging. The MAT UI has the capability of capturing all the user gestures and saving these gestures to a CSV file at the user's request. If workspace logging is enabled, the UI turns on this capability specifically for the current workspace, and uploads the log fragments to the MAT server with every save operation in the "core" folder. The format of this log is identical to the format of the UI logger. Unlike general UI logging, this logging cannot be configured or controlled from the UI. Finally, this logging does not interfere with general UI logging; if you choose to enable UI logging, you'll still get all the user gestures, including those that are captured for workspace logging.

enable_logging

This operation enables logging. The log will be saved in the _checkpoint subdirectory of the workspace directory.

Usage: MATWorkspaceEngine [options] <dir> enable_logging

Options

Command line option
Description
--help
Print the help message and exit.

disable_logging

This operation disables logging. If a log is being collected, it is either moved aside or deleted.

Usage: MATWorkspaceEngine [options] <dir> disable_logging

Options

Command line option
Description
--remove_log
By default, logs are moved aside to the first available _checkpoint_<n> location. If this flag is provided, the log will be removed instead.
--help
Print the help message and exit.

rerun_log

This operation allows you to rerun the log. It will use the _checkpoint/_rerun subdirectory of the workspace directory to store the rerun state. If you've used the --stop_at option to halt the rerun before it completes, the next call to rerun_log will continue from that point, unless you provide the --restart option.

Usage: MATWorkspaceEngine [options] <dir> rerun_log [rerun_options]

Options

Command line option
Description
--stop_at <ts>
The log timestamp to stop immediately before.
--restart
If present, go back to the beginning.
--verbose
If present, describe the state of the workspace at each timestamp.
--help
Print the help message and exit.
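
Example

The logging operations can be sketched as a handful of shell helpers. The function names are hypothetical, and the timestamp passed to --stop_at is a placeholder which must be taken from your own log. Operation options are placed after the operation name, following the general syntax for workspace operations.

```shell
# Hypothetical helpers (not part of MAT). Assumes MAT_PKG_HOME is set.

# Start logging for the workspace.
ws_log_on() {
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$1" enable_logging
}

# Stop logging. By default the log is moved aside; pass "remove" as the
# second argument to delete it instead.
ws_log_off() {
    if [ "${2-}" = remove ]; then
        "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$1" disable_logging --remove_log
    else
        "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$1" disable_logging
    fi
}

# Rerun the log, stopping immediately before timestamp <ts>.
# Usage: ws_rerun_until <dir> <ts>
ws_rerun_until() {
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$1" rerun_log --stop_at "$2" --verbose
}

# Continue a halted rerun from where it stopped.
ws_rerun_continue() {
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$1" rerun_log
}

# Discard the rerun state and start again from the beginning.
ws_rerun_restart() {
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$1" rerun_log --restart
}
```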

Users

Users in these workspaces are simply labels with which human annotations are associated. When you set up users for your workspace, it probably makes the most sense to use the user logins of the people who will be using the workspace. However, there is no per-user password authentication, or any formal connection or dependency between workspace users and your computer user accounts.

register_users

This operation allows you to add registered users to your workspace. Perhaps you want to be able to track the contributions of multiple annotators, or you might want to actually assign documents to multiple annotators and do multiple annotation. You cannot unregister users once they're registered.

Users can have roles, as described here.

Usage: MATWorkspaceEngine [options] <dir> register_users <user>...

<user>: the name of a user to register for the workspace.

Options

Command line option
Description
--roles <roles>
Available roles are 'annotator', 'reviewer'. If omitted, the role will be 'annotator'. The string 'all' adds both roles. Otherwise, a comma-separated list of roles.
--help
Prints the help message and exits.

list_users

This operation lists the users in a workspace. It is also available as part of the workspace_configuration operation.

Usage: MATWorkspaceEngine [options] <dir> list_users

Options

Command line option
Description
--no_roles
Don't show the roles.
--help
Prints the help message and exits.

add_roles

The add_roles operation adds roles to existing users.

Usage: MATWorkspaceEngine [options] <dir> add_roles [add_roles_options] <user>...

<user>: the name of a user to update the roles for.

Options

Command line option
Description
--roles <roles>
Available roles are 'annotator', 'reviewer'. If omitted, the role will be 'annotator'. The string 'all' adds both roles. Otherwise, a comma-separated list of roles.
--help
Prints the add_roles help message and exits.

remove_roles

The remove_roles operation removes roles from existing users.

Usage: MATWorkspaceEngine [options] <dir> remove_roles [remove_roles_options] <user>...

<user>: the name of a user to update the roles for.

Options

Command line option
Description
--roles <roles>
Available roles are 'annotator', 'reviewer'. If omitted, the role will be 'annotator'. The string 'all' removes both roles. Otherwise, a comma-separated list of roles.
--help
Prints the remove_roles help message and exits.
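
Example

Since register_users, add_roles and remove_roles share the same --roles option, they can be sketched with a single helper. The function name and its argument convention are hypothetical.

```shell
# Hypothetical helper (not part of MAT). Assumes MAT_PKG_HOME is set.
# Usage: ws_user_roles <dir> <operation> <roles> <user>...
#   <operation>: register_users, add_roles or remove_roles
#   <roles>:     'all' or a comma-separated list, e.g. annotator,reviewer
ws_user_roles() {
    ws_dir=$1
    ws_user_op=$2
    ws_role_list=$3
    shift 3
    case $ws_user_op in
        register_users|add_roles|remove_roles) ;;
        *) echo "unknown user operation: $ws_user_op" >&2; return 2 ;;
    esac
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" "$ws_user_op" --roles "$ws_role_list" "$@"
}
```

For instance, ws_user_roles /home/user/myworkspace register_users all user2 user3 registers two users with both roles, and ws_user_roles /home/user/myworkspace remove_roles reviewer user3 later removes the reviewer role from user3.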

Automated tagging and advancement

By default, the workspace will attempt to ensure that each file is positioned at an opportunity for user interaction. When a file is imported, the workspace advances the file to the first hand-annotatable step; when the user marks a document gold in a given step, the workspace attempts to advance to the next hand-annotatable step (assuming no reviews are scheduled). If a model exists for a given step, it will be applied to documents in the appropriate circumstances.

modelbuild

This operation builds a model which can be used to automatically tag other documents. Every document which is gold or reconciled for the relevant annotation set is used to build this model; documents which are in the process of being corrected or annotated are not used. If there are multiple copies of a document because the document is multiply assigned, all copies will be used (so that document will be overrepresented in the model, and all conflicting annotations will be used as well).

You can optionally ask the workspace to autotag eligible documents after the model is built. Basenames will be autotagged only if they are either unannotated or uncorrected in the step for which the model was built.

Usage: MATWorkspaceEngine [options] <dir> modelbuild [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folder 'core'.

If no basenames are specified, the modelbuild operation will use all the eligible documents.

Options

Command line option
Description
--do_autotag
If present, autotag eligible basenames with the model after the model is constructed.
--trainable_step <step>
A step in the task workflow which has a trainable engine. Required if there are multiple trainable steps in your workflow.
--config_name <name>
If present, use a model settings configuration other than the default.
--autotag_basename <basename>
If --do_autotag is present, a single basename to autotag. This option can be repeated. If --do_autotag is present and neither this option nor --autotag_basenames is present, all eligible files will be autotagged.
--autotag_basenames <basenames>
If --do_autotag is present, a space-separated sequence of basenames to autotag. This option can be repeated. If --do_autotag is present and neither this option nor --autotag_basename is present, all eligible files will be autotagged.

Example

Let's say you want to build a model using all the eligible documents, and you want to autotag all the eligible documents:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
modelbuild --do_autotag "core"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace ^
modelbuild --do_autotag "core"
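
Example

To autotag only selected basenames, the --autotag_basename option can be repeated once per basename. The following sketch builds that option list from its arguments; the function name is hypothetical.

```shell
# Hypothetical helper (not part of MAT). Assumes MAT_PKG_HOME is set.
# Usage: ws_modelbuild_autotag <dir> [<basename> ...]
ws_modelbuild_autotag() {
    ws_dir=$1
    shift
    # Rewrite each remaining argument <b> as "--autotag_basename <b>" by
    # appending the pair to the argument list and dropping the original.
    ws_n=$#
    while [ "$ws_n" -gt 0 ]; do
        set -- "$@" --autotag_basename "$1"
        shift
        ws_n=$((ws_n - 1))
    done
    # With no basenames, only --do_autotag is passed, so all eligible
    # files are autotagged.
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" modelbuild --do_autotag "$@" core
}
```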

advance

By default, documents advance automatically to the next hand-annotatable step. Several operations permit you to suppress advancement. If you do, you can complete the advancement later using this operation. This operation automatically advances the document to the next hand-annotatable point, or to the end of the workflow if there are no more hand-annotatable points. You can specify individual basenames to process, or process all documents.

Note: this operation does not use the jCarafe tagging server, even in the UI. So the startup cost of the tagging engine is incurred each time the automated tagger is executed.

Usage: MATWorkspaceEngine [options] <dir> advance [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folder 'core'.

Options

Command line option
Description
--lock_id <s>
Lock ID (if document is locked)
--help
Print the help message and exit.
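
Example

A sketch of deferred advancement: documents are imported with --suppress_advancement and advanced later with this operation. The function names are hypothetical, and the import call relies on the default file type and encoding.

```shell
# Hypothetical helpers (not part of MAT). Assumes MAT_PKG_HOME is set.

# Import files into the core folder without automatic advancement.
# Usage: ws_import_held <dir> <file>...
ws_import_held() {
    ws_dir=$1
    shift
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" import --suppress_advancement core "$@"
}

# Advance the named basenames in the core folder, or all documents
# if no basenames are given.
# Usage: ws_advance <dir> [<basename> ...]
ws_advance() {
    ws_dir=$1
    shift
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" advance core "$@"
}
```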

Experimentation

You can use your workspace as a corpus for experiments. You can access this capability via the <workspace_corpora> element for MATExperimentEngine, or you can access it via the workspace engine. You can further subdivide your workspace into basename sets which can be referred to in your experiment.

list_basename_sets

This operation lists the basename sets and their contents.

Usage: MATWorkspaceEngine [options] <dir> list_basename_sets ( <set_name>... )

<set_name>: the name of a basename set

Options

Command line option
Description
--help
Print the help message and exit.

add_to_basename_set

This operation adds basenames to a given basename set (and implicitly creates the set if necessary).

Usage: MATWorkspaceEngine [options] <dir> add_to_basename_set <set_name> <basename>...

<set_name>: the name of a basename set
<basename>: a known workspace basename

Options

Command line option
Description
--help
Print the help message and exit.

remove_from_basename_set

This operation removes basenames from a given basename set (and implicitly removes the set if necessary).

Usage: MATWorkspaceEngine [options] <dir> remove_from_basename_set <set_name> <basename>...

<set_name>: the name of a basename set
<basename>: a workspace basename

Options

Command line option
Description
--help
Print the help message and exit.
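
Example

The three basename set operations can be sketched together. The function names are hypothetical, as is any set name you pass to them.

```shell
# Hypothetical helpers (not part of MAT). Assumes MAT_PKG_HOME is set.

# Add basenames to a set, creating the set if necessary.
# Usage: ws_set_add <dir> <set_name> <basename>...
ws_set_add() {
    ws_dir=$1
    shift
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" add_to_basename_set "$@"
}

# Remove basenames from a set.
# Usage: ws_set_remove <dir> <set_name> <basename>...
ws_set_remove() {
    ws_dir=$1
    shift
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" remove_from_basename_set "$@"
}

# List the named sets, or every set if none are named.
# Usage: ws_set_list <dir> [<set_name> ...]
ws_set_list() {
    ws_dir=$1
    shift
    "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$ws_dir" list_basename_sets "$@"
}
```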

run_experiment

This operation allows you to run an experiment based on this workspace. You can do this either by specifying an experiment file and an experiment file variable to which the workspace directory should be bound, or by specifying the properties of the workspace documents to use as your test corpus. In the latter case, the training corpus will be the remaining documents in the workspace which are at least partially hand-annotated.

If you do not use an experiment file, the defaults provided will be different than those in the workspace file. The only documents in the workspace that will be used are those which are at least partially gold; the model trainer will train only on gold or reconciled segments; and the scorer will only compare to gold or reconciled segments in the test corpus. You can override these restrictions by providing your own experiment file. You can duplicate these restrictions in your experiment file by:

The experiments will be saved in the experiments/ subdirectory of the workspace, in a directory named <year><month><day>_<hr><min><sec>_<msec>.

Usage: MATWorkspaceEngine [options] <dir> run_experiment [options]

Options

Command line option
Description
--experiment_file <file>
Specify an experiment file to use. If specified, --workspace_binding is also required. Either this or one of the --test_* parameters must be provided.
--workspace_binding <var>
A variable in the workspace experiment file to which this workspace should be bound. Required if --experiment_file is present.
--test_users <user(,user..)>
A comma-separated sequence of users to restrict the test corpus to. Not permitted if --experiment_file is provided.
--test_basename_sets <set(,set...)>
A comma-separated sequence of basename set names to restrict the test corpus to. Not permitted if --experiment_file is provided.
--test_basename_patterns <pat(,pat...)>
A comma-separated sequence of glob-style basename patterns to restrict the test corpus to. Not permitted if --experiment_file is provided.
--test_step_statuses <status(,status...)>
A comma-separated sequence of step statuses in the target trainable step to restrict the test corpus to. The background corpus will already be restricted to 'partially gold,gold,reconciled'. Not permitted if --experiment_file is provided.
--test_exclude_unassigned
If present, exclude unassigned documents from the test corpus. Not permitted if --experiment_file is provided.
--test_step <s>
The name of the trainable step in the workspace's workflow to target. Not permitted if --experiment_file is provided. Required if --experiment_file is absent and the workflow has more than one trainable step.
--csv_formula_output <fmt>
The format for the CSV output files. See the MATScore documentation for details.
--help
Print the help message and exit.

Example

Let's say you have an experiment file exp.xml whose corpora are defined entirely using the <workspace_corpus> element, and the workspace_dir attribute of <workspace_corpora> in that file refers to the "WS" binding variable. Then, you can use that experiment file as follows:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
run_experiment --experiment_file exp.xml --workspace_binding WS

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace ^
run_experiment --experiment_file exp.xml --workspace_binding WS
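
When no experiment file is used, the test corpus is selected with the --test_* options instead. The following is a minimal sketch of that form, not output from a real workspace: the workspace path, the user name user1, and the step name tag are all assumptions. The mat_ws wrapper is also hypothetical and not part of MAT; it prints the command line unless RUN=1 is set, so the command can be inspected before it is executed.

```shell
# Hypothetical dry-run wrapper (not part of MAT): prints the command
# line, or executes it when RUN=1 is set in the environment.
mat_ws() {
    set -- "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Test on the documents of user1 (assumed user name) for the trainable
# step 'tag' (assumed step name); the remaining at least partially
# hand-annotated documents form the training corpus.
mat_ws /home/user/myworkspace run_experiment \
    --test_users user1 --test_step tag
```

Because --experiment_file is absent here, --workspace_binding must not be given either, and --test_step is only needed if the workflow has more than one trainable step.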

Review and reconciliation

Workspaces support the option of reviewing documents after they're annotated. You can schedule a review in advance for any document that completes a particular step or, if there's no existing schedule, request a review after you complete a step. You can also use a requested review to repair errors in previous steps. More details on how reviews work are provided elsewhere in this documentation.

There are three types of review: human review, reconciliation review, and reconciliation with crossvalidation. Only the first and third are available for an ad-hoc review request; all three are available for scheduled reviews. For reconciliation reviews, the document waits for its other reconciliation partners before the reconciliation request is actually submitted. Crossvalidation reviews are the same, except that once all the partners are submitted for review, the documents wait for crossvalidation (which must be triggered manually on the command line using the apply_crossvalidation operation) before entering reconciliation.

schedule_review

This operation allows you to schedule a review. The review will be initiated when the document is marked gold in the step for which the review is scheduled. If you need to suspend this behavior during import (e.g., because you're importing multiple copies of the same document and assigning them to different people, and you don't want the workspace engine to assume that the first import is the only copy it will find), you can use the --defer_reconciliation option of the import operation.

Usage: MATWorkspaceEngine [options] <dir> schedule_review <step> <review_type>

<step>: the name or pretty name of the step in the workflow for which the review is being scheduled.
All basenames which complete this step will be submitted for this review,
including the case where this step is the final step reached during the import process.
<review_type>: one of human, reconciliation, reconciliation_with_crossvalidation.

Options

Command line option
Description
--help
Print the help message and exit.

unschedule_review

This operation allows you to remove a scheduled review.

Usage: MATWorkspaceEngine [options] <dir> unschedule_review <step>

<step>: the name or pretty name of the step in the workflow for which the review schedule is being removed.

Options

Command line option
Description
--help
Print the help message and exit.

list_review_schedule

This operation will list the scheduled reviews, by step.

Usage: MATWorkspaceEngine [options] <dir> list_review_schedule

Options

Command line option
Description
--help
Print the help message and exit.
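
The three schedule operations can be used together as a simple lifecycle: schedule a review for a step, confirm it is listed, and remove it later. The sketch below assumes a workspace at /home/user/myworkspace and a step named tag; both are illustrative. The mat_ws wrapper is hypothetical and not part of MAT; it prints each command unless RUN=1 is set.

```shell
# Hypothetical dry-run wrapper (not part of MAT): prints the command
# line, or executes it when RUN=1 is set in the environment.
mat_ws() {
    set -- "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

WS=/home/user/myworkspace   # assumed workspace directory

# Schedule a reconciliation review for every basename completing 'tag'.
mat_ws "$WS" schedule_review tag reconciliation

# Show the scheduled reviews, by step.
mat_ws "$WS" list_review_schedule

# Remove the schedule for 'tag' again.
mat_ws "$WS" unschedule_review tag
```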

apply_crossvalidation

Use this operation to apply crossvalidation to accumulated documents which are waiting for it. In general, you should allow a reasonable number of documents to accumulate awaiting crossvalidation before you trigger it, since otherwise, it'll essentially do the same thing that autotagging does.

This operation accepts basename arguments, but those arguments are ignored.

Usage: MATWorkspaceEngine [options] <dir> apply_crossvalidation [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

The following folders support this operation: core

Options

Command line option
Description
--folds <n>
Number of cross-validation folds (i.e., number of ways the corpus is split). Default is 8.
--crossvalidation_doc_count_threshold <n>
Number of documents awaiting crossvalidation required in a given step for crossvalidation to occur. Default is 10.
--help
Prints the apply_crossvalidation help message and exits.
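
As an illustration of the options above, the following sketch triggers crossvalidation with non-default settings. The values 5 and 20 and the workspace path are arbitrary assumptions chosen for the example, and the mat_ws wrapper is hypothetical (not part of MAT); it prints the command unless RUN=1 is set.

```shell
# Hypothetical dry-run wrapper (not part of MAT): prints the command
# line, or executes it when RUN=1 is set in the environment.
mat_ws() {
    set -- "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Use 5 folds instead of the default 8, and require 20 waiting documents
# in a step (instead of the default 10) before crossvalidation occurs.
# 'core' is the only folder that supports this operation.
mat_ws /home/user/myworkspace apply_crossvalidation \
    --folds 5 --crossvalidation_doc_count_threshold 20 core
```

No basenames are passed, since this operation ignores them.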

remove_from_reconciliation

If, for some reason, a document fails to exit reconciliation naturally (if some of the users fail to complete their reconciliation steps, for example), you can use this operation to remove the document forcibly from reconciliation. This operation will also free documents from waiting for crossvalidation or for reconciliation partners. By default, this operation will advance the document to the next hand-annotatable step.

Usage: MATWorkspaceEngine [options] <dir> remove_from_reconciliation [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

The following folders support this operation: reconciliation

Options

Command line option
Description
--dont_reintegrate
By default, reconciliation updates are integrated back into the core documents. Use this flag to skip that step.
--suppress_advancement
By default, when reconciliation is done, the workspace advances the document to the next hand-annotatable step, or completes the workflow if there are no such steps. This flag will suppress automatic advancement.
--help
Prints the remove_from_reconciliation help message and exits.

Keep in mind that the document may already be partially reconciled.  If you want to remove the document and preserve the decisions already made, you can use the operation as follows:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove_from_reconciliation reconciliation basename1

This will migrate the agreed-upon document segments back into the documents which were used to create the reconciliation document. If you do not want to preserve those decisions, and simply want to stop the document from being reconciled, do this instead:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove_from_reconciliation --dont_reintegrate reconciliation basename1

request_review

If the current step isn't scheduled for review or reconciliation, you can request a review yourself, if you want one. Only human review and reconciliation with crossvalidation are available; you can't request a review for a document assigned to someone else.

The 'repair' review type is special; it's equivalent to requesting a human review which you'll conduct yourself, on a document which isn't complete in its current step.

Usage: MATWorkspaceEngine [options] <dir> request_review [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folder 'core'.

Options

Command line option
Description
--review_type <r>
The type of ad-hoc review. Available types are 'human', 'repair' and 'reconciliation_with_crossvalidation'
--lock_id <id>
Lock ID.
--user <u>
The user requesting the review.
--help
Prints the request_review help message and exits.
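
A sketch of two ad-hoc requests follows. It assumes a user named user1, basenames basename1 and basename2, and a workspace at /home/user/myworkspace, and it assumes that no --lock_id is needed in this situation; none of this is confirmed by the text above. The mat_ws wrapper is hypothetical (not part of MAT) and prints the command unless RUN=1 is set.

```shell
# Hypothetical dry-run wrapper (not part of MAT): prints the command
# line, or executes it when RUN=1 is set in the environment.
mat_ws() {
    set -- "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

WS=/home/user/myworkspace   # assumed workspace directory

# user1 requests a human review of basename1 in the core folder.
mat_ws "$WS" request_review --review_type human --user user1 core basename1

# user1 requests a repair review, which user1 will conduct personally,
# on basename2, a document not yet complete in its current step.
mat_ws "$WS" request_review --review_type repair --user user1 core basename2
```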

complete_human_review

If a document is in the human review folder, you can indicate that you're satisfied with the document with this operation. If the document isn't being reviewed for repair, this operation will mark the document reconciled for the current step, and then advance the document to the next hand-annotatable step.

Options

Command line option
Description
--suppress_advancement
By default, when the review is completed, the workspace advances the document to the next hand-annotatable step, or completes the workflow if there are no such steps. This flag will suppress automatic advancement.
--lock_id <id>
Lock ID.
--help
Prints the complete_human_review help message and exits.

Administration

force_unlock

This operation forces a basename in the named folder to be unlocked. The --user option is obligatory. Warning: be very certain that you apply the force_unlock operation only to basenames whose locks have been stranded. If you unlock a basename which is being annotated, the annotator will not be able to save her changes.

Note: you can't use this operation to forcibly undo an operation lock. In that situation, you'll get the error "workspace is currently unavailable (processing another request)". How to deal with that is described elsewhere in this documentation.

Usage: MATWorkspaceEngine [options] <dir> force_unlock [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

Supported by folder 'core'.

Options

Command line option
Description
--user <user>
The user who's locked the basename.
--help
Prints the force_unlock help message and exits.
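
Example

A sketch of releasing a stranded lock, assuming the lock on basename1 in the core folder is held by a user named user1 and the workspace lives at /home/user/myworkspace (all illustrative names). The mat_ws wrapper is hypothetical and not part of MAT; since force_unlock can cost an annotator her unsaved changes, the wrapper prints the command by default and only executes it when RUN=1 is set.

```shell
# Hypothetical dry-run wrapper (not part of MAT): prints the command
# line, or executes it when RUN=1 is set in the environment.
mat_ws() {
    set -- "$MAT_PKG_HOME/bin/MATWorkspaceEngine" "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Release the lock that user1 holds on basename1 in the core folder.
# --user is obligatory for this operation.
mat_ws /home/user/myworkspace force_unlock --user user1 core basename1
```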