Workspaces provide a guided, structured way of managing and
processing your documents. Workspace mode offers a number of
advantages over working with documents manually in file mode, but
it is difficult to change modes once you've begun, so make sure that this is what
you want. Workspace mode is provided by MATWorkspaceEngine on the
command line, and via "File -> Open workspace..." in the Web UI.
To process documents in a workspace, you must import those
documents into the workspace. The workspace maintains a physical
copy of that document, and also keeps track of information about
the document (e.g., how far it's been processed) in a separate
database inside the workspace. While workspaces are simply
directories, it's important that you, as the user, not modify
these directories by hand (i.e., no adding, removing, or editing
files) - you should only interact with the workspace via the MAT UI or the workspace engine. You should
also never use MATEngine or MATModelBuilder to save files or
models into workspaces.
Each workspace has at least one workspace user. This is not a
security feature, and no security (e.g., a password) is associated
with it. The users are there merely to partition and track the
hand annotation that's performed on the document. If you set up
your own workspace, you can use whatever name you choose; if your
task maintainer set up your
workspace as a server on a separate machine, she'll tell you what
username to use.
A workspace encapsulates a particular language and workflow. The
goal of the workspace is to try to ensure that each document the
workspace is processing is always ready for hand annotation. So
when you import documents into a workspace, they'll be
advanced to the first hand-annotatable step in the workspace's
workflow; when you declare that you're done with a step, the
document will be advanced to the next hand-annotatable step (if
any). All intervening automated processing, including applying the
appropriate trainable models, is done for you.
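The advancement behavior described above can be pictured with a minimal sketch. This is not MAT's implementation: the Step class, the advance function, and the step names are all hypothetical, chosen only to illustrate how a document skips over automated steps until it reaches one that needs hand annotation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    # Hypothetical model of a workflow step; not a MAT class.
    name: str
    hand_annotatable: bool


def advance(workflow: List[Step], start: int) -> Tuple[Optional[int], List[str]]:
    """Run automated steps from index `start` until a hand-annotatable step.

    Returns the index of the step now awaiting hand annotation (or None if
    the workflow is exhausted), plus the names of the automated steps that
    were run along the way.
    """
    automated_run = []
    for index in range(start, len(workflow)):
        step = workflow[index]
        if step.hand_annotatable:
            return index, automated_run
        automated_run.append(step.name)
    return None, automated_run


# Hypothetical workflow: two automated steps, one hand step, one automated step.
WORKFLOW = [
    Step("zone", False),
    Step("tokenize", False),
    Step("hand tag", True),
    Step("postprocess", False),
]

# On import, the document starts at step 0.
on_import = advance(WORKFLOW, 0)
# When the user declares the hand step (index 2) done, continue from index 3.
after_done = advance(WORKFLOW, 3)
```

In this sketch, importing a document leaves it waiting at "hand tag", and marking that step done runs the remaining automated step and finds no further hand-annotatable step.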
For each document, the workspace keeps track of the document's
state in its current step. Documents can be:
Documents may be editable by any workspace user, or might be
assigned to a particular user. If a document is assigned to
someone other than you, you'll be able to view it, but not edit
it, in the UI.
Workspaces support both human review of documents and
reconciliation of conflicting document annotations, including the
option of cross-validation. Reviews can be scheduled for a
particular step, or requested for a particular document by a user.
Let's see how you can use workspaces. Tutorial
5 presents examples of most of the steps below, and more
examples can be found in the documentation for MATWorkspaceEngine.
First, you create the workspace. The workspace must have an
assigned task, which you specify when you create it. You must also
specify a workflow to build the workspace out of, or a workspace
configuration which customizes that workflow. Creating the
workspace creates the directory, the folder subdirectories, a
place to store the models, and some administrative information.
You must also specify an initial user when you create the
workspace. Unlike file mode, all annotation in workspaces is
attributed to one or more named annotators.
Workspace creation is currently only available on the command line.
Next, you import documents into the workspace. At the moment, the
workspace has four predefined folders: "core", "review",
"reconciliation" and "export". (The "export" folder is not
currently used.) Your task may also define additional folders,
which you might import documents into, but typically, you'll
import documents into the "core" folder.
When a document is imported, it is assigned a unique basename,
which is usually the basename of the path of the imported file
(i.e., the final path component). All versions of this file in the
various workspace folders have the identical basename. If you
assign a document to a particular annotator, its basename will be
suffixed with the name of the assigned annotator; if you assign
the document to multiple annotators, each annotator will have his
or her own copy.
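As a rough illustration of the naming rule above, the following sketch derives workspace basenames from an imported path. The function name is hypothetical, and the underscore separator between the basename and the annotator name is an assumption; the text above does not say which separator the workspace actually uses.

```python
import posixpath
from typing import List, Sequence


def workspace_basenames(path: str, annotators: Sequence[str] = (),
                        sep: str = "_") -> List[str]:
    """Return the basename(s) a document would get on import.

    The basename is the final path component. If the document is assigned
    to annotators, each annotator gets a copy whose basename carries that
    annotator's name as a suffix. `sep` is an assumed separator.
    """
    base = posixpath.basename(path)
    if not annotators:
        return [base]
    return [f"{base}{sep}{user}" for user in annotators]


unassigned = workspace_basenames("/data/raw/doc1.txt")
assigned = workspace_basenames("/data/raw/doc1.txt", ["alice", "bob"])
```

Here an unassigned import yields a single basename, while assigning the document to two annotators yields one suffixed copy per annotator.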
You can import documents as many times as you like, and at any
point while you work with your workspace. For instance, you can
import some documents, hand annotate them, build a model,
and then import more raw documents to process with that model.
File import is currently only available on the command line.
The vast majority of your time in the workspace will be spent
interacting with your documents. Each folder has predefined
operations which you can perform on documents in the folder. The
operations you can perform on folders, as well as the operations
you can perform on the workspace itself, are described here.
On the command line, these operations are applied by default to
all the files in the folder, and optionally to a specified subset.
In the UI, on the other hand, these operations are only available
on a file-by-file basis. We haven't yet tackled managing the more
time-consuming folder-level operations in the UI.
Because interacting with the workspace means switching between
longer-duration batch operations (e.g., model building) and
quicker file-level operations (e.g., hand tagging), the user will
end up moving back and forth between the UI and the terminal. This
is currently unavoidable. Here's what a typical interaction might
Step 4 can be repeated with newly imported documents, so you can
iteratively expand the model and your supply of hand-corrected documents.
You can enable logging in your workspace. The logger will capture
all the workspace operations, and also collect all the
interactions that users have with workspaces in the workspace UI. This UI logging is
completely separate from the global UI logger.
The user has no control over whether workspace logging happens;
it's controlled entirely from the command line. See here for more details.
Unlike file mode, workspace mode is stateful from the point of view of the UI. It is
the server, rather than the client, which loads and saves the
files. However, we don't want just anybody to be able to cause the
server to perform these stateful operations, so the MAT web server implements some security mechanisms.
Note, however, that the MAT workspace functionality is not an enterprise-secure implementation, and will never be one. It does not use SSL; it does not perform any sort of user authentication beyond the workspace key; it does not provide any security logging or traceability; and it does not currently implement transactions. You should assume that anyone who has access to your network can see your workspace traffic, and overwrite your data.
Note that workspace users play no role in workspace security.
You may realize, once you've completed an import operation, that
you didn't import the basenames the way you'd wanted; perhaps
you'd intended to strip a suffix, or you assigned them to the
wrong workspace user. If you need to undo your import, see here.
Documents in workspaces can only be edited by one user at a time. When you're editing a document, and you close the window, the workspace makes that document available to other annotators if appropriate. Sometimes, the "lock" the workspace applies while you're editing doesn't get freed. If the UI tells you that a document is locked by someone other than you, and you're supposed to be editing it, see here.
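The one-editor-at-a-time behavior can be sketched as a simple table of lock owners. This is an illustration only, not how MAT stores its locks: the DocumentLocks class and its method names are hypothetical.

```python
from typing import Dict


class DocumentLocks:
    """Hypothetical in-memory lock table keyed by document basename."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def open_for_edit(self, basename: str, user: str) -> bool:
        # The first user to open the document becomes the lock owner;
        # anyone else is refused until the lock is released.
        owner = self._owners.setdefault(basename, user)
        return owner == user

    def close(self, basename: str, user: str) -> None:
        # Closing the editing window frees the lock, but only for its owner.
        if self._owners.get(basename) == user:
            del self._owners[basename]

    def force_unlock(self, basename: str) -> None:
        # Administrative cleanup for a lock that was never freed.
        self._owners.pop(basename, None)


locks = DocumentLocks()
first = locks.open_for_edit("doc1.txt", "alice")    # alice acquires the lock
second = locks.open_for_edit("doc1.txt", "bob")     # bob is refused
locks.close("doc1.txt", "alice")                    # alice closes the window
third = locks.open_for_edit("doc1.txt", "bob")      # bob can now edit
```

A lock that "doesn't get freed" corresponds to the case where close is never called; in this sketch, force_unlock stands in for whatever cleanup operation the workspace provides.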
If you get this error message, and you're absolutely certain that
no one else is working on the workspace, something horrible has
happened, and a previous operation has failed in such a way that
the entire workspace is locked. More on how to deal with this here.