Using workspaces

Workspaces provide a guided, structured way of managing and processing your documents. Workspace mode offers a number of advantages over working with documents manually in file mode, but it is difficult to switch modes once you've begun, so make sure that workspace mode is what you want before you start. Workspace mode is provided by MATWorkspaceEngine on the command line, and via "File -> Open workspace..." in the Web UI.

To process documents in a workspace, you must import those documents into the workspace. The workspace maintains a physical copy of each imported document, and also keeps track of information about the document (e.g., how far it's been processed) in a separate database inside the workspace. While workspaces are simply directories, it's important that you, as the user, not modify these directories by hand (i.e., no adding, removing, or editing files); you should only interact with the workspace via the MAT UI or the workspace engine. You should also never use MATEngine or MATModelBuilder to save files or models into workspaces.

Each workspace has at least one workspace user. This is not a security feature, and no security (e.g., a password) is associated with it. The users are there merely to partition and track the hand annotation that's performed on the documents. If you set up your own workspace, you can use whatever name you choose; if your task maintainer set up your workspace as a server on a separate machine, she'll tell you what username to use.

Concepts

A workspace encapsulates a particular language and workflow. The goal of the workspace is to try to ensure that each document the workspace is processing is always ready for hand annotation. So when you import documents into a workspace, they'll be advanced to the first hand-annotatable step in the workspace's workflow; when you declare that you're done with a step, the document will be advanced to the next hand-annotatable step (if any). All intervening automated processing, including applying the appropriate trainable models, is done for you.
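The advancement rule described above can be sketched in a few lines. This is only an illustration of the idea, not MAT's implementation: the Step class, the advance function, and the step names in WORKFLOW are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One step in a workflow (hypothetical model, not a MAT class)."""
    name: str
    hand_annotatable: bool  # True if a human works on this step


def advance(steps: List[Step], done_index: int) -> Optional[int]:
    """Return the index of the next hand-annotatable step after done_index.

    Pass done_index=-1 for a freshly imported document. Automated steps
    in between are simply skipped here; a real engine would run them.
    Returns None when no hand-annotatable step remains.
    """
    for i in range(done_index + 1, len(steps)):
        if steps[i].hand_annotatable:
            return i
        # An automated step (e.g., applying a model) would run here.
    return None


# Illustrative workflow; the step names are assumptions.
WORKFLOW = [
    Step("zone", hand_annotatable=False),
    Step("tokenize", hand_annotatable=False),
    Step("tag", hand_annotatable=True),
]

first = advance(WORKFLOW, -1)          # on import: stop at the first hand step
after_tag = advance(WORKFLOW, first)   # after marking that step done
```

In this sketch an imported document lands on the "tag" step, and declaring that step done leaves nothing further to annotate by hand.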

For each document, the workspace keeps track of the document's state in its current step. Documents can be:

unannotated, which means that no annotations for that step have been added
uncorrected, which means that the document has been automatically tagged, but no corrections have been made
partially corrected, which means that a human annotator has modified the annotations for the current step, but hasn't marked them gold
gold, which means that a human annotator has judged the annotations to be complete for that step
reconciled, which means the completed annotations have undergone some sort of review
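As a rough illustration, the five states above could be modeled as an enumeration. The class and function names below are hypothetical and are not part of MAT; the only thing taken from the text is the set of states and what each one means.

```python
from enum import Enum


class DocState(Enum):
    """Per-step document states, as listed above (hypothetical model)."""
    UNANNOTATED = "unannotated"
    UNCORRECTED = "uncorrected"
    PARTIALLY_CORRECTED = "partially corrected"
    GOLD = "gold"
    RECONCILED = "reconciled"


def state_after_hand_edit(marked_gold: bool) -> DocState:
    """State of a document once a human annotator has modified it."""
    return DocState.GOLD if marked_gold else DocState.PARTIALLY_CORRECTED


def is_complete_for_step(state: DocState) -> bool:
    """True once a human has judged the step's annotations complete."""
    return state in (DocState.GOLD, DocState.RECONCILED)
```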

Documents may be editable by any workspace user, or might be assigned to a particular user. If a document is assigned to someone other than you, you'll be able to view it, but not edit it, in the UI.
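The assignment rule amounts to a simple check, sketched below. The function name and its arguments are assumptions made for illustration; MAT's own check may take other factors into account.

```python
from typing import Optional


def can_edit(current_user: str, assigned_to: Optional[str]) -> bool:
    """Return True if current_user may edit the document.

    assigned_to is None for a document that any workspace user may edit.
    Viewing is always allowed, so it is not modeled here.
    """
    return assigned_to is None or assigned_to == current_user
```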

Workspaces support both human review of documents and reconciliation of conflicting document annotations, including the option of cross-validation. Reviews can be scheduled for a particular step, or requested for a particular document by a user.

Using the workspace

Let's see how you can use workspaces. Tutorial 5 presents examples of most of the steps below, and more examples can be found in the documentation for MATWorkspaceEngine.

Step 1: create the workspace

First, you create the workspace. The workspace must have an assigned task, which you specify when you create it. You must also specify a workflow to build the workspace out of, or a workspace configuration which customizes that workflow. Creating the workspace creates the directory, the folder subdirectories, a place to store the models, and some administrative information.

You must also specify an initial user when you create the workspace. Unlike file mode, all annotation in workspaces is attributed to one or more named annotators.

Workspace creation is currently only available on the command line.

Step 2: import documents

Next, you import documents into the workspace. At the moment, the workspace has four predefined folders: "core", "review", "reconciliation" and "export". (The "export" folder is not currently used.) Your task may also define additional folders, which you might import documents into, but typically, you'll import documents into the "core" folder.

When a document is imported, it is assigned a unique basename, which is usually the basename of the path of the imported file (i.e., the final path component). All versions of this file in the various workspace folders have the identical basename. If you assign a document to a particular annotator, its basename will be suffixed with the name of the assigned annotator; if you assign the document to multiple annotators, each annotator will have his or her own copy.
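The naming rule can be sketched as follows. The text says only that the basename is suffixed with the annotator's name, so the "_" separator used here is an assumption, as are the function name and the optional strip_suffix parameter.

```python
import os
from typing import List, Sequence


def workspace_basenames(path: str,
                        annotators: Sequence[str] = (),
                        strip_suffix: str = "",
                        sep: str = "_") -> List[str]:
    """Return the basename(s) a document would get on import.

    With no annotators, there is one unassigned copy. With one or more
    annotators, each gets a copy whose basename carries their name.
    The separator is an assumption, not MAT's documented format.
    """
    base = os.path.basename(path)  # final path component
    if strip_suffix and base.endswith(strip_suffix):
        base = base[:-len(strip_suffix)]
    if not annotators:
        return [base]
    return [f"{base}{sep}{user}" for user in annotators]


unassigned = workspace_basenames("/data/raw/doc1.txt")
assigned = workspace_basenames("/data/raw/doc1.txt", ["alice", "bob"])
```

Here the unassigned import yields a single basename, while assigning the document to two annotators yields two distinct copies, one per annotator.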

You can import documents as many times as you like, and at any point while you work with your workspace. For instance, you can import some documents, hand annotate them, build a model, and then import more raw documents to process with the models you've built.

File import is currently only available on the command line.

Step 3: perform operations on documents

The vast majority of your time in the workspace will be spent interacting with your documents. Each folder has predefined operations which you can perform on documents in the folder. The operations you can perform on folders, as well as the operations you can perform on the workspace itself, are described here.

On the command line, these operations are applied by default to all the files in the folder, and optionally to a specified subset. In the UI, on the other hand, these operations are only available on a file-by-file basis. We haven't yet tackled managing the more time-consuming folder-level operations in the UI.
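The command-line behavior (all files by default, optionally a subset) can be pictured with a small sketch. The folder is modeled as a plain dictionary from basename to document content, and apply_to_folder is a hypothetical helper, not a MAT function.

```python
from typing import Callable, Dict, Iterable, List, Optional


def apply_to_folder(folder: Dict[str, str],
                    operation: Callable[[str], str],
                    basenames: Optional[Iterable[str]] = None) -> List[str]:
    """Apply operation to every document in folder, or only to basenames.

    Returns the basenames that were processed. Basenames that are not
    in the folder are ignored.
    """
    if basenames is None:
        targets = sorted(folder)  # default: the whole folder
    else:
        targets = [b for b in basenames if b in folder]
    for b in targets:
        folder[b] = operation(folder[b])
    return targets


core = {"doc1": "raw", "doc2": "raw", "doc3": "raw"}
touched_all = apply_to_folder(core, str.upper)

core2 = {"doc1": "raw", "doc2": "raw", "doc3": "raw"}
touched_some = apply_to_folder(core2, str.upper, ["doc2", "missing"])
```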

A typical interaction

Because interacting with the workspace means switching between longer-duration batch operations (e.g., model building) and quicker file-level operations (e.g., hand tagging), you will end up moving back and forth between the UI and the terminal. This is currently unavoidable. Here's what a typical interaction might look like.

1. Command line: Create a workspace
2. Command line: Import a batch of documents
3. UI: Hand annotate some documents
4. Command line: Build a model and autotag the documents which are eligible to be tagged by that model

Step 4 of this interaction (building a model and autotagging) can be repeated with newly imported documents, so you can iteratively expand the model and your supply of hand-corrected documents.

Workspace logging

You can enable logging in your workspace. The logger will capture all the workspace operations, and also collect all the interactions that users have with the workspace in the workspace UI. This UI logging is completely separate from the global UI logger. Users working in the UI have no control over whether workspace logging happens; it's controlled entirely from the command line. See here for more details.

Workspace security

Unlike file mode, workspace mode is stateful from the point of view of the UI. It is the server, rather than the client, which loads and saves the files. However, we don't want just anybody to be able to cause the server to perform these stateful operations, so the MAT web server implements some security mechanisms.

Note, however, that the MAT workspace functionality is not an enterprise-secure implementation, and will never be one. It does not use SSL; it does not perform any sort of user authentication beyond the workspace key; it does not provide any security logging or traceability; and it does not currently implement transactions. You should assume that anyone who has access to your network can see your workspace traffic, and overwrite your data.

Note that workspace users play no role in workspace security.

Troubleshooting

Failed import

You may realize, once you've completed an import operation, that you didn't import the basenames the way you'd wanted; perhaps you'd intended to strip a suffix, or you assigned them to the wrong workspace user. If you need to undo your import, see here.

Locked files

Documents in workspaces can only be edited by one user at a time. When you're done editing a document and you close its window, the workspace makes that document available to other annotators if appropriate. Sometimes, the "lock" the workspace applies while you're editing doesn't get freed. If the UI tells you that a document is locked by someone other than you, and you're supposed to be editing it, see here.
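To see how a lock can be left behind, consider this minimal sketch of one-editor-at-a-time locking. The DocumentLocks class is hypothetical and kept entirely in memory; it only illustrates the behavior described above and says nothing about how MAT stores or frees its locks.

```python
from typing import Dict, Optional


class DocumentLocks:
    """Tracks which user, if any, is currently editing each document."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}

    def acquire(self, basename: str, user: str) -> bool:
        """Lock basename for user; fail if another user holds the lock."""
        holder = self._holders.setdefault(basename, user)
        return holder == user

    def release(self, basename: str, user: str) -> bool:
        """Free the lock, but only if user is the one holding it."""
        if self._holders.get(basename) == user:
            del self._holders[basename]
            return True
        return False

    def holder(self, basename: str) -> Optional[str]:
        """Return the user holding the lock, or None if unlocked."""
        return self._holders.get(basename)


locks = DocumentLocks()
alice_opened = locks.acquire("doc1", "alice")
# Suppose the window closes without release() ever being called:
bob_opened = locks.acquire("doc1", "bob")   # refused, the lock is stale
stale_holder = locks.holder("doc1")
```

In the sketch, the second annotator is refused because the first annotator's lock was never released, which is the situation the UI reports as a document locked by someone else.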

Error "workspace is currently unavailable (processing another request)"

If you get this error message, and you're absolutely certain that no one else is working on the workspace, something horrible has happened, and a previous operation has failed in such a way that the entire workspace is locked. More on how to deal with this here.