Advanced workspace maintenance

MAT workspaces are quite powerful. We described the basics for individual users here, here, and here. We've covered importing documents, annotating them, building models, automated tagging, and correction, and also touched on workspace logging. In this document, we discuss some more advanced capabilities that the workspaces provide.

Multiple annotators

One of the major innovations in MAT 2.0 is that workspaces support multiple annotators. All annotation within workspaces is attributed to a particular human annotator, or to the automated tagging engine. This is not a security feature, and no security (e.g., a password) is associated with it; it's merely to partition responsibility. The usernames don't need to match your system username, but you might find it convenient to set them up that way. You can declare multiple annotators when you create a workspace:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace create \
--task 'Named Entity' --initial_users user1,user2

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace create \
--task "Named Entity" --initial_users "user1,user2"

or later:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace \
register_users user1 user2

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace \
register_users user1 user2

You can list the registered users with the list_users operation.
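If your annotator roster lives in a file, a small wrapper can pass it to register_users in one call. The following Unix-only bash sketch assumes, as in the register_users examples above, that the operation accepts several usernames as separate arguments. The function names and the DRY_RUN variable are hypothetical conveniences of this sketch, not MAT features; by default it only prints the commands it would run, so set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: register every username listed in a file (one per line) with a
# MAT workspace. DRY_RUN and these function names are hypothetical helpers,
# not MAT features.
set -euo pipefail

# Print the MATWorkspaceEngine command (DRY_RUN=1, the default) or run it.
run_engine() {
  local engine="${MAT_PKG_HOME:-.}/bin/MATWorkspaceEngine"
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$engine $*"
  else
    "$engine" "$@"
  fi
}

# Usage: register_users_from_file WORKSPACE USER_FILE
register_users_from_file() {
  local workspace="$1" user_file="$2"
  local -a users=()
  local name=""
  # "|| [[ -n ... ]]" also picks up a final line with no trailing newline.
  while IFS= read -r name || [[ -n "$name" ]]; do
    name="${name%%#*}"              # drop "#" comments
    name="${name//[[:space:]]/}"    # strip whitespace (assumes none inside names)
    if [[ -n "$name" ]]; then
      users+=("$name")
    fi
  done < "$user_file"
  if (( ${#users[@]} == 0 )); then
    echo "no usernames found in $user_file" >&2
    return 1
  fi
  run_engine "$workspace" register_users "${users[@]}"
  run_engine "$workspace" list_users   # confirm the result
}
```

After sourcing the script, `DRY_RUN=0 register_users_from_file /tmp/ne_workspace annotators.txt` (where annotators.txt is a hypothetical roster file) registers the users and then lists them.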

Document assignment

By default, an imported document is unassigned: any of the workspace users can annotate it. You can change this by assigning a document to a user, or even to multiple users for duplicate annotation. You can assign a document to a user when you import it:

Unix:

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
--file_type raw --assign_to_users user1,user2 \
"core" sample/ne/resources/data/raw/voa2.txt


Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
--file_type raw --assign_to_users "user1,user2" \
"core" %CD%\sample\ne\resources\data\raw\voa2.txt

You can also assign a document after it's imported (but only if it hasn't been modified by a human yet):

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace assign \
--user user1,user2 voa2

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace assign \
--user "user1,user2" voa2

These assignments will be in addition to any assignments made when you import.
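When a whole directory of raw files should go to the same annotators, you can loop over it and issue one import per file. This bash sketch reuses only the import options shown above (--strip_suffix, --file_type raw, --assign_to_users, and the core folder), and deliberately imports files one at a time rather than assuming that import accepts several files at once. The function names and the DRY_RUN switch are hypothetical; by default the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: import every .txt file in a directory into the core folder,
# assigning each one to the same comma-separated list of users.
# DRY_RUN and the function names are hypothetical, not MAT features.
set -euo pipefail

run_engine() {
  local engine="${MAT_PKG_HOME:-.}/bin/MATWorkspaceEngine"
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$engine $*"       # dry run: show the command only
  else
    "$engine" "$@"
  fi
}

# Usage: import_dir_assigned WORKSPACE DIR USERS   (USERS like "user1,user2")
import_dir_assigned() {
  local workspace="$1" dir="$2" users="$3"
  local -a files=()
  local f
  for f in "$dir"/*.txt; do
    # Skip the unexpanded pattern when nothing matches.
    if [[ -f "$f" ]]; then
      files+=("$f")
    fi
  done
  if (( ${#files[@]} == 0 )); then
    echo "no .txt files in $dir" >&2
    return 1
  fi
  for f in "${files[@]}"; do
    run_engine "$workspace" import --strip_suffix ".txt" \
      --file_type raw --assign_to_users "$users" core "$f"
  done
}
```

For example, `DRY_RUN=0 import_dir_assigned /tmp/ne_workspace "$MAT_PKG_HOME/sample/ne/resources/data/raw" user1,user2` would import each sample file for both users.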

Inspecting the state of the workspace

The workspace list operation allows you to list the contents of all the folders in the workspace, and the dump_database operation shows you the contents of the workspace database. You may find these operations useful for debugging.
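Assuming both list and dump_database print to standard output and take no further arguments, as the paragraph above suggests, you can capture them before and after an operation and compare the two snapshots to see what that operation changed. The function name, file naming scheme, and DRY_RUN switch in this bash sketch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: save the output of "list" and "dump_database" to files so that
# two snapshots can later be compared with diff.
# DRY_RUN, the function names, and the file names are hypothetical.
set -euo pipefail

run_engine() {
  local engine="${MAT_PKG_HOME:-.}/bin/MATWorkspaceEngine"
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$engine $*"       # dry run: show the command only
  else
    "$engine" "$@"
  fi
}

# Usage: snapshot_workspace WORKSPACE OUTDIR LABEL
# Writes OUTDIR/LABEL.list.txt and OUTDIR/LABEL.db.txt and prints their paths.
snapshot_workspace() {
  local workspace="$1" outdir="$2" label="$3"
  mkdir -p "$outdir"
  run_engine "$workspace" list > "$outdir/$label.list.txt"
  run_engine "$workspace" dump_database > "$outdir/$label.db.txt"
  echo "$outdir/$label.list.txt"
  echo "$outdir/$label.db.txt"
}
```

For example, take a `before` snapshot, run an assign operation, take an `after` snapshot, and then run `diff snapshots/before.db.txt snapshots/after.db.txt`.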

Review and reconciliation

As an annotator, you might want to have your document reviewed. You can schedule a review for a particular step; if you do, the selected review type is invoked for every document that is marked gold in that step. Alternatively, if no review is scheduled for that step, you can request an ad-hoc review once the document is marked complete for that step. You can also use the review mechanism to repair errors you missed in earlier steps. You can find additional details here and here.

Scheduling a review

You can schedule a review using the schedule_review operation, as follows:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace schedule_review \
tag human

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace schedule_review \
tag human

You can see what reviews are scheduled using the list_review_schedule operation, or unschedule them using the unschedule_review operation.

You can schedule three types of review.

human

In human review, the completed document is copied from the core folder to the review folder. An annotator with review privileges (other than the one who last annotated the document in the core folder) reviews the document, and applies the "Save & Done" operation in the UI when satisfied. Once the review document is complete, it is copied back to the core folder and marked reconciled for the step just completed.

reconciliation

The reconciliation review is intended to be used when documents are multiply assigned. Each completed document is placed in a "suspended" state until all the copies of this document have completed the scheduled step. If you execute the list operation on the command line, or inspect the core folder in the MAT UI, you'll see that the state of these suspended documents is described as "awaiting reconciliation partner".

Once all the versions of the document have been completed in that step, a reconciliation document is created and inserted into the reconciliation folder. At this point, the state of the suspended documents in the core folder will read "in reconciliation". An annotator with review privileges reconciles the conflicts in the reconciliation document, and applies the "Save & Done" operation, which closes the completely reconciled document. Once the reconciliation document is closed, it is converted back into a normal document and copied back into the core folder, replacing the documents which were submitted for the review. These now-reviewed documents are marked reconciled for the step just completed. At this point, all the copies of the document will be identical.

Note that if the workspace encounters a request for reconciliation review for a document that has only one copy (e.g., because it's unassigned, or assigned to only one person), it submits the document for human review instead. If you're importing multiple copies of a document and assigning them to different people via multiple import operations, you probably don't want the first import to trigger a human review just because it's the only copy of the document the workspace knows about; in this case, use the --defer_reconciliation option on all the imports except the last one to block immediate review.
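The multiple-import pattern just described can be scripted so that only the final import omits --defer_reconciliation. This bash sketch treats --defer_reconciliation as an import flag that takes no value, and assumes, as the paragraph does, that each import of the same file creates a separate copy assigned to the named user; check both against your MAT version. The function names and the DRY_RUN switch are hypothetical, and commands are only printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: import one raw document once per annotator, deferring
# reconciliation on every import except the last, so the first copy is not
# sent straight to human review. DRY_RUN and function names are hypothetical.
set -euo pipefail

run_engine() {
  local engine="${MAT_PKG_HOME:-.}/bin/MATWorkspaceEngine"
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$engine $*"       # dry run: show the command only
  else
    "$engine" "$@"
  fi
}

# Usage: import_copies_for_reconciliation WORKSPACE FILE USER...
import_copies_for_reconciliation() {
  local workspace="$1" file="$2"
  shift 2
  local -a users=("$@") args
  local i last
  if (( ${#users[@]} == 0 )); then
    echo "give at least one user" >&2
    return 1
  fi
  last=$(( ${#users[@]} - 1 ))
  for (( i = 0; i <= last; i++ )); do
    args=(import --strip_suffix ".txt" --file_type raw)
    if (( i < last )); then
      args+=(--defer_reconciliation)   # hold off until the final copy arrives
    fi
    args+=(--assign_to_users "${users[i]}" core "$file")
    run_engine "$workspace" "${args[@]}"
  done
}
```

With three annotators, the first two imports carry --defer_reconciliation and the third does not, so reconciliation is considered only once all three copies exist.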

reconciliation_with_crossvalidation

This review is like reconciliation review, but it doesn't require multiple assignment. The idea is that in addition to (or instead of) comparing documents to each other, the document is compared to the annotations that the step's trainable engine would assign to it, based on the rest of the corpus. To create these expected annotations, the workspace uses crossvalidation.

In crossvalidation, a corpus is segmented into a number of folds, and each fold is annotated by the model constructed from the other folds. For example, in 5-fold crossvalidation, the corpus is split five ways, and each slice is annotated by the model constructed from the other four slices. These annotations are therefore (approximately) what a model built from your corpus would produce, and they can give you a sense of how consistent your annotations are from the trainable engine's point of view.
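To make the fold bookkeeping concrete, the bash sketch below deals documents round-robin into k folds and prints, for each fold, which documents it holds and which documents train the model that annotates them. This is a generic illustration of k-fold crossvalidation, not MAT's own splitting logic; MAT may partition the corpus differently (for example, shuffled or contiguous slices), and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of k-fold crossvalidation bookkeeping: document i goes to fold
# (i mod k); each fold is annotated by a model trained on all other folds.
# Generic illustration only; not MAT's implementation.
set -euo pipefail

# Usage: crossvalidation_plan K DOC...
crossvalidation_plan() {
  local k="$1"
  shift
  local -a docs=("$@") held trained
  local n=${#docs[@]} fold i
  if (( k < 2 || n < k )); then
    echo "need k >= 2 and at least k documents (k=$k, documents=$n)" >&2
    return 1
  fi
  for (( fold = 0; fold < k; fold++ )); do
    held=()
    trained=()
    for (( i = 0; i < n; i++ )); do
      if (( i % k == fold )); then
        held+=("${docs[i]}")       # annotated in this fold
      else
        trained+=("${docs[i]}")    # used to build this fold's model
      fi
    done
    echo "fold $(( fold + 1 )): annotate [${held[*]}] with a model built from [${trained[*]}]"
  done
}
```

For instance, `crossvalidation_plan 3 a b c d e` prints `fold 1: annotate [a d] with a model built from [b c e]`, and every document appears in exactly one fold's annotated set.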

If you've recently built a model for this step in your workspace, it doesn't make much sense to use crossvalidation with a single document: the document has just been automatically annotated for you to correct, and the models built during crossvalidation are essentially the model that you just applied and corrected. So it usually makes sense to let a number of documents accumulate before crossvalidating.

As a result, the sequence of states and operations for this review type is more complex than for reconciliation review. As with reconciliation review, documents still wait for their reconciliation partners; but once all the document versions are complete for the step, they move to a new state called "awaiting crossvalidation". These documents remain in this state until you execute the apply_crossvalidation operation, e.g.:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace apply_crossvalidation \
--folds 5 core

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace apply_crossvalidation \
--folds 5 core

As you can see, you can specify the number of folds; you can also specify the minimum number of documents required for crossvalidation to run.

Once this operation is complete, reconciliation review proceeds normally, with the additional documents contributed by the crossvalidation annotation.

Requesting a review

If no review is scheduled for a step, you can request a review for your document. There are three types of requested reviews available: human, reconciliation_with_crossvalidation, and repair.

human and reconciliation_with_crossvalidation

If you want to request one of these reviews, make sure that when you mark the document gold in the UI, you select "Mark gold (don't advance)"; if you permit the document to advance automatically, it'll pass right by your review opportunity, since only gold documents can be reviewed using these review types. You can request a review using the request_review operation:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace request_review \
--user user1 voa2

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace request_review \
--user user1 voa2

You can only request a review for documents you annotate (as indicated by the --user option). If the document is multiply assigned, and you ask for a reconciliation_with_crossvalidation review for your document, it will not await the other document versions; it will move directly to "awaiting crossvalidation". Otherwise, these reviews proceed exactly as they do when they're scheduled.

repair

The repair review is special: it's intended for situations where you've made a mistake in a previous workspace step. Like human review, it is conducted in the review folder; but the reviewer should be the requesting annotator, who does not need the reviewer role. Finally, the document is not marked reconciled when the review is completed; it's returned to its previous state.