MAT workspaces are quite powerful. We described the basics for
individual users here, here, and here.
We've covered importing documents, annotating them, building
models, automated tagging, and correction, and also touched on
workspace logging. In this document, we discuss some more advanced
capabilities that the workspaces provide.
One of the major innovations in MAT 2.0 is that workspaces
support multiple annotators. All annotation within workspaces is
attributed to a particular human annotator, or to the automated
tagging engine. This is not a security feature, and no security
(e.g., a password) is associated with it; it's merely to partition
responsibility. The usernames don't need to match your system
username, but you might find it convenient to set them up that
way. You can declare multiple annotators when you create a
workspace:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace create \
--task 'Named Entity' --initial_users user1,user2
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace create ^
--task "Named Entity" --initial_users "user1,user2"
or later:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace \
register_users user1 user2
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace ^
register_users user1 user2
You can list the registered users with the list_users
operation.
By default, when you import a document, it's unassigned, and any
of the workspace users can annotate it.
You can change this by assigning the document to a user, or
to multiple users for duplicate annotation. You can assign a
document to a user when you import it:
Unix:
% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
--file_type raw --assign_to_users user1,user2 \
"core" sample/ne/resources/data/raw/voa2.txt
Windows native:
> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" ^
--file_type raw --assign_to_users "user1,user2" ^
"core" %CD%\sample\ne\resources\data\raw\voa2.txt
You can also assign a document after it's imported (but only if
it hasn't been modified by a human yet):
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace assign \
--user user1,user2 voa2
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace assign ^
--user "user1,user2" voa2
These assignments will be in addition to any assignments made
when you import.
The workspace list
operation allows you to list the contents of all the folders in
the workspace, and the dump_database
operation shows you the contents of the workspace database. You
may find these operations useful for debugging.
As an annotator, you might want to have your document reviewed.
You have the option of scheduling a review for a particular step;
if you do this, the selected review type will be invoked for any
document which is marked gold in that step. Alternatively, if no
review is scheduled for that step, you can request an ad-hoc
review once the document is marked complete for a step. You can
also use the review mechanism to repair errors from previous
steps, if you previously missed them. You can find additional
details here and here.
You can schedule a review using the schedule_review
operation, as follows:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace schedule_review \
tag human
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace schedule_review ^
tag human
You can see what reviews are scheduled using the list_review_schedule
operation, or unschedule them using the unschedule_review
operation.
You can schedule three types of review.
In human review, the completed document is copied from the core
folder to the review folder. An annotator with review privileges
(other than the one who last annotated the document in the core
folder) reviews the document, and applies the "Save & Done"
operation in the UI when satisfied. Once the review document is
complete, it is copied back to the core folder and marked
reconciled for the step just completed.
The reconciliation review is intended to be used when documents
are multiply assigned. Each
completed document is placed in a "suspended" state until all the
copies of this document have completed the scheduled step. If you
execute the list
operation on the command line, or inspect the core folder in the
MAT UI, you'll see that the state of these suspended documents is
described as "awaiting reconciliation partner".
Once all the versions of the document have been completed in that
step, a reconciliation document is created and inserted into the
reconciliation folder. At this point, the state of the suspended
documents in the core folder will read "in reconciliation". An
annotator with review privileges reconciles
the conflicts in the reconciliation document, and applies
the "Save & Done" operation, which closes the completely
reconciled document. Once the reconciliation document is closed,
it is converted back into a normal document and copied back into
the core folder, replacing the documents which were submitted for
the review. These now-reviewed documents are marked reconciled for
the step just completed. At this point, all the copies of the
document will be identical.
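To make the sequence of states described above concrete, here is a minimal Python sketch of how the copies of a multiply assigned document move through reconciliation review. It is an illustration only, not part of MAT: the function name copy_states, the "in progress" label, and the dictionary representation are assumptions made for this example. Only the two quoted state descriptions and the "reconciled" marking come from the text.

```python
AWAITING_PARTNER = "awaiting reconciliation partner"
IN_RECONCILIATION = "in reconciliation"
RECONCILED = "reconciled"
IN_PROGRESS = "in progress"  # hypothetical label for a copy not yet completed


def copy_states(completed, reconciliation_closed=False):
    """Model the state of each copy of one multiply assigned document.

    completed maps each assigned annotator to True once that annotator's
    copy has completed the scheduled step.
    """
    all_done = all(completed.values())
    if reconciliation_closed:
        if not all_done:
            raise ValueError("reconciliation cannot close before every copy is complete")
        # The reconciled document replaces every submitted copy.
        return {user: RECONCILED for user in completed}
    if all_done:
        # A reconciliation document now exists in the reconciliation folder.
        return {user: IN_RECONCILIATION for user in completed}
    # Completed copies are suspended until the remaining copies catch up.
    return {
        user: AWAITING_PARTNER if done else IN_PROGRESS
        for user, done in completed.items()
    }
```

For example, copy_states({"user1": True, "user2": False}) reports user1's copy as "awaiting reconciliation partner", which is the situation you'd see in the list output while user2 is still annotating.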
Note that if the workspace encounters a request for
reconciliation review for a document which only has one copy
(e.g., it's not assigned to anyone, or assigned to only one
person), it will submit the document for human review instead. If
you're importing multiple copies of a document, and assigning them
to different people via multiple import operations, you
probably don't want the first import to trigger a human review
because it's the only copy of the document the workspace knows
about; in this case, use the --defer_reconciliation option on
all the imports except the last one to block immediate review.
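The fallback rule in this note can be summarized as a small decision function. The following Python sketch is a simplified model under stated assumptions, not MAT's implementation: the function name review_to_start and its parameters are hypothetical, and it only models a scheduled reconciliation review at the moment the workspace considers a document.

```python
def review_to_start(scheduled_review, num_copies, defer_reconciliation=False):
    """Return the review the workspace would start now, or None if none starts.

    scheduled_review: the review type scheduled for the step.
    num_copies: how many copies of the document the workspace knows about.
    defer_reconciliation: models the --defer_reconciliation flag on import.
    """
    if scheduled_review != "reconciliation":
        # This sketch does not model any special handling for other types.
        return scheduled_review
    if defer_reconciliation:
        # More copies are still to be imported; block immediate review.
        return None
    if num_copies <= 1:
        # Nothing to reconcile against, so the workspace falls back.
        return "human"
    return "reconciliation"
```

With a single known copy and no deferral, review_to_start("reconciliation", 1) yields "human", which is exactly the outcome you want to avoid on the first of several imports.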
The reconciliation_with_crossvalidation review is like reconciliation
review, but it doesn't require multiple assignment. The idea is that
in addition to (or instead
of) comparing documents to each other, the document is compared to
the annotations that the step's trainable engine would assign to
it, based on the rest of the corpus. To create these expected
annotations, the workspace uses crossvalidation.
In crossvalidation, a corpus is segmented into a number of folds,
and each fold is annotated by the model constructed from the other
folds. For example, in 5-fold crossvalidation, the corpus is split
five ways, and each slice is annotated by the model constructed
from the other four slices. These annotations are therefore
(approximately) what a model built out of your corpus would
produce, and can give you a sense of the consistency of your
annotations from the trainable engine's point of view.
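The fold splitting described above can be sketched in a few lines of Python. This is a generic illustration of k-fold crossvalidation, not the code MAT uses: the function name crossvalidation_folds and the round-robin assignment of documents to folds are assumptions made for the example, and the model building and tagging steps are left out.

```python
def crossvalidation_folds(doc_ids, num_folds=5):
    """Yield (training_docs, held_out_docs) pairs, one pair per fold.

    Each document is held out exactly once; the model for a fold would be
    built from training_docs and then used to annotate held_out_docs.
    """
    if num_folds < 2:
        raise ValueError("crossvalidation needs at least two folds")
    if len(doc_ids) < num_folds:
        raise ValueError("need at least one document per fold")
    # Round-robin assignment: document i goes to fold i % num_folds.
    folds = [doc_ids[i::num_folds] for i in range(num_folds)]
    for i, held_out in enumerate(folds):
        training = [doc for j, fold in enumerate(folds) if j != i for doc in fold]
        yield training, held_out


if __name__ == "__main__":
    docs = ["doc%d" % i for i in range(10)]
    for training, held_out in crossvalidation_folds(docs, num_folds=5):
        print(len(training), "training documents; held out:", held_out)
```

Because every document appears in exactly one held-out slice, each one ends up with annotations produced by a model that never saw it, which is what makes the comparison with your own annotations informative.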
If you've recently built a model for this step in your workspace,
it doesn't make a whole lot of sense to use crossvalidation with a
single document; after all, the document has just been
automatically annotated for you to correct, and the models that
get built during crossvalidation are pretty much the model that
you just applied and corrected. So it usually makes sense to allow
a number of documents to accumulate for crossvalidation.
As a result, the sequence of states and operations for this
review type is more complex than for reconciliation review. As in
reconciliation review, documents still wait for their reconciliation
partners; but once all the document versions are complete for the
step, they move to a new state called "awaiting crossvalidation".
These documents remain in this state until you execute the apply_crossvalidation
operation, e.g.:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace apply_crossvalidation \
--folds 5 core
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace apply_crossvalidation ^
--folds 5 core
As you see, you can specify the number of folds; you can
also specify the lower limit of the number of documents which will
allow crossvalidation to execute.
Once this operation is complete, reconciliation review proceeds
normally, with the additional documents contributed by the
crossvalidation annotation.
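The effect of the lower limit can be pictured with a short Python sketch. It is a model only: the function name, the min_docs parameter, and the dictionary of document states are hypothetical, and the name of the actual command-line option for the limit is not given here. The sketch writes the waiting state as "awaiting crossvalidation".

```python
AWAITING_CROSSVALIDATION = "awaiting crossvalidation"


def docs_ready_for_crossvalidation(doc_states, min_docs):
    """Return the documents crossvalidation would process, or [] if too few.

    doc_states maps a document name to its current state description;
    min_docs is the (hypothetically named) lower limit on waiting documents.
    """
    waiting = sorted(
        doc for doc, state in doc_states.items()
        if state == AWAITING_CROSSVALIDATION
    )
    # Below the limit, nothing runs and the documents keep waiting.
    return waiting if len(waiting) >= min_docs else []
```

Setting the limit above 1 is one way to let documents accumulate, so that the crossvalidation models differ meaningfully from the model you just applied.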
If no review is scheduled for a step, you can request a review
for your document. There are three types of requested reviews
available: human, reconciliation_with_crossvalidation, and repair.
If you want to request one of these reviews, make sure
that when you mark the document gold in the UI, you select "Mark
gold (don't advance)"; if you permit the document to advance
automatically, it'll pass right by your review opportunity, since
only gold documents can be reviewed using these review types. You
can request a review using the request_review
operation:
Unix:
% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace request_review \
--user user1 voa2
Windows native:
> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace request_review ^
--user user1 voa2
You can only request a review for documents you annotate (as
indicated by the --user option). If the document is multiply
assigned, and you ask for a reconciliation_with_crossvalidation
review for your document, it will not await the other
document versions; it will move directly to "awaiting
crossvalidation". Otherwise, these reviews proceed exactly as they
do when they're scheduled.
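The conditions for requesting a review, as described above, can be collected into a single check. This Python sketch is an illustration of those conditions, not a MAT API: the function name and parameters are hypothetical, and the wording of the returned messages is invented for the example.

```python
REQUESTABLE_REVIEWS = ("human", "reconciliation_with_crossvalidation", "repair")


def review_request_problems(review_type, requesting_user, annotator,
                            is_gold, review_scheduled):
    """Return the reasons a review request would be refused (empty if none)."""
    problems = []
    if review_type not in REQUESTABLE_REVIEWS:
        problems.append("review type cannot be requested")
    if review_scheduled:
        problems.append("a review is already scheduled for this step")
    if not is_gold:
        problems.append("document is not marked gold")
    if requesting_user != annotator:
        problems.append("document was annotated by another user")
    return problems
```

An empty list means the request satisfies every condition named in the text; any other result tells you which condition to fix first, for example re-marking the document with "Mark gold (don't advance)".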
The repair review is special. It's intended for situations where
you've made a mistake in a previous workspace step. This review is
like human review, in that the review is conducted in the review
folder; but the reviewer should be the requesting annotator, and
the reviewer does not require the reviewer role. Finally, the
document is not marked reconciled when the review is completed;
it's returned to its previous state.