Tutorial 5: Workspaces

Now that we've covered file mode in the first four tutorials, we're going to address workspace mode. In workspace mode, you don't have nearly as much control over the details that you managed by hand in file mode.

On the other hand, you don't need to worry about any of those things, either.

We're going to use the same simple 'Named Entity' task, and we're going to assume that your task is installed. This tutorial involves both the UI and the command line, so make sure you're familiar with the "Conventions" section in your platform-specific instructions in the "Getting Started" section of the documentation.

Step 1: Create your workspace

The only way to create a workspace is on the command line. We use MATWorkspaceEngine. The first argument of MATWorkspaceEngine is the path of the affected workspace, and the second argument is the operation. Options and arguments for the chosen operation follow.

Creating a workspace requires a task, so we provide the --task directive. Workspaces also track annotation progress by user, so we need at least one user name to create the workspace.

Every workflow that has at least one hand-annotatable step can be made into a workspace. Your task may have a default workflow for your workspace; in the case of the 'Named Entity' task, the default workflow is the same "Demo" workflow we've been working with up to now. If we want to use the default workflow, we don't need to specify it.

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace create \
--task 'Named Entity' --initial_users user1


Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace create \
--task "Named Entity" --initial_users user1

Created workspace for task 'Named Entity' in directory ...

You now have a workspace in the specified directory, built on top of the "Demo" workflow of the "Named Entity" task.
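
The argument order described in this step (workspace path first, then the operation, then that operation's options) can be sketched in Python. This is only an illustration of how the command line is laid out: the helper name workspace_command is hypothetical, and the sketch builds and prints the command rather than running it.

```python
import shlex


def workspace_command(engine, workspace, operation, *op_args):
    """Assemble an argument list in the order the tutorial describes:
    engine, workspace path, operation, then the operation's own
    options and arguments. (Hypothetical helper, for illustration.)"""
    return [engine, workspace, operation, *op_args]


# The same values used in the Unix "create" command above.
create_cmd = workspace_command(
    "MATWorkspaceEngine", "/tmp/ne_workspace", "create",
    "--task", "Named Entity", "--initial_users", "user1",
)

# shlex.join quotes arguments that contain spaces, such as the task name.
print(shlex.join(create_cmd))
```

Keeping the arguments as a list, rather than one long string, avoids quoting mistakes with values like the task name, which contains a space.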

If you're interested in the structure of a workspace, look here.

Step 2: Import files into your workspace

Workspaces organize files by folders, and they track the status of the files as they're processed. The "core" folder supports all the normal annotation functions. We'll begin by importing a single raw file into the core folder.

Unix:

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
--file_type raw "core" sample/ne/resources/data/raw/voa2.txt

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
--file_type raw "core" %CD%\sample\ne\resources\data\raw\voa2.txt

So here we use the "import" operation, which takes two arguments: the folder name ("core") and the file to import. We've also used the --strip_suffix directive to modify the name by which the workspace knows the file. Finally, we've told the workspace engine, via the --file_type option, that the file we're importing is a raw file (rather than a rich MAT JSON file). For more details on importing documents, see here.
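
To make the effect of --strip_suffix concrete, here is a minimal Python sketch of how a workspace name could be derived from an imported path. It assumes, based only on the listings in this tutorial, that the workspace name is the file's base name with the given suffix removed; the function workspace_basename is hypothetical and is not part of the workspace engine.

```python
def workspace_basename(path, strip_suffix=""):
    """Return the name a file would have in the workspace, assuming the
    name is the last path component minus the stripped suffix."""
    # Accept both Unix and Windows separators.
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    if strip_suffix and name.endswith(strip_suffix):
        name = name[: -len(strip_suffix)]
    return name


print(workspace_basename("sample/ne/resources/data/raw/voa2.txt", ".txt"))
print(workspace_basename("sample/ne/resources/data/raw/voa2.txt"))
```

Under this assumption, the first call yields the short name and the second keeps the full file name, which is why the tutorial strips the suffix.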

We can see the contents of the workspace (and of each folder), with the "list" operation:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace list "core"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace list "core"

core:
voa2 (status unannotated, openable True, current step carafe_tag, workflow status awaiting hand annotation)

Note that the listing tells you the status of the document.
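
If you want to inspect a listing from a script, a line like the one above can be split into its fields. The following Python sketch infers the line format purely from the sample output in this tutorial, so treat the field names and the function parse_listing_line as assumptions rather than a documented format.

```python
import re

# Field labels seen in the sample listings in this tutorial.
KEYS = ("workflow status", "current step", "status", "openable", "locked by")


def parse_listing_line(line):
    """Split 'name (key value, key value, ...)' into (name, fields)."""
    match = re.fullmatch(r"(\S+) \((.*)\)", line.strip())
    if match is None:
        raise ValueError("not a document listing line: %r" % line)
    name, body = match.groups()
    fields = {}
    for item in body.split(", "):
        for key in KEYS:
            if item.startswith(key + " "):
                fields[key] = item[len(key) + 1:]
                break
    return name, fields


name, fields = parse_listing_line(
    "voa2 (status unannotated, openable True, current step carafe_tag, "
    "workflow status awaiting hand annotation)"
)
print(name, fields["status"])
```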

You can only import a file name once. If you try to import the file again, you'll get an error:

Unix: 

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
--file_type raw "core" sample/ne/resources/data/raw/voa2.txt

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
--file_type raw "core" sample\ne\resources\data\raw\voa2.txt

Error: Basenames for files sample/ne/resources/data/raw/voa2.txt already exist in workspace; no files imported.

In other words, once you create a particular basename in the workspace using the "import" operation, you can't do it again.
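
The rule can be pictured as an all-or-nothing check on basenames, which is what the "no files imported" wording of the error suggests. The Python sketch below is an illustration under that assumption; import_basenames is a hypothetical function, not the engine's own code.

```python
def import_basenames(existing, new_names):
    """Add new_names to the set of existing basenames, refusing the
    whole batch if any of them is already present."""
    clashes = [name for name in new_names if name in existing]
    if clashes:
        raise ValueError(
            "Basenames already exist in workspace: %s; no files imported."
            % ", ".join(clashes)
        )
    existing.update(new_names)


workspace = set()
import_basenames(workspace, ["voa2"])      # first import succeeds
try:
    import_basenames(workspace, ["voa2"])  # second import is refused
except ValueError as err:
    print("Error:", err)
```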

Step 3: Open the workspace in the UI

In this step, we're going to learn about the UI aspects of the workspace.

First, start up the UI as we described in tutorial 1.

Note: when you start up the Web server in its default mode, workspaces will only be accessible from a browser client running on the same host. There are many options available to the Web server at startup which affect the workspaces, so if you want to use workspaces in the UI, we recommend that you familiarize yourself with the MATWeb documentation.

In the terminal in which you're running the Web server, you'll see this when it starts up:

Web server started on port 7801.

Web server command loop. Commands are:

exit - exit the command loop and stop the Web server
loopexit - exit the command loop, but leave the Web server running
taggerexit - shut down the tagger service, if it's running
restart - restart the Web server
ws_key - show the workspace key
help, ? - this message

Workspace key is XJ9dGBaCNveYHk9CZzw6wTM5WH8x05y1
Command:

Note the workspace key. This key is randomly generated and known only to the user who starts the Web server, and it must be provided to the UI when the user opens the workspace. This simple security feature ensures that, even though the Web server is what modifies the workspace, it does so only if the UI user has proved that they have the appropriate access. For more about workspace security and the UI, see here.
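
As a rough picture of how a key of this kind can work, here is a Python sketch that generates a random 32-character key and checks a supplied value against it. This is not the Web server's implementation; the function names, the alphabet, and the comparison method are all assumptions made for illustration.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_workspace_key(length=32):
    """Generate a random alphanumeric key (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def key_matches(expected, supplied):
    """Compare keys in constant time to avoid leaking partial matches."""
    return hmac.compare_digest(expected.encode("utf-8"),
                               supplied.encode("utf-8"))


key = new_workspace_key()
print(len(key), key_matches(key, key), key_matches(key, "wrong"))
```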

Next:

You should see a window that looks like this:

[core folder]

Step 4: Open a document

A single left click on the file name in the workspace tab should open the file. You'll see that this document has been prepared for annotation (it has been zoned and tokenized, in particular). You'll see in the controls on the right that its status, as shown in the listing above, is "unannotated", which means that no human annotator has touched it yet, and you'll see the current step marked, since your workflow may have multiple steps in which you can perform hand annotation:

[core view]

Note how the controls area here differs from the one in file mode:

If you select the folder tab now, you'll see that the document is now listed as "locked by user1". Workspaces maintain document locks to ensure that no one else overwrites your changes. This lock will be released when you close the document.
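
The locking behavior just described can be modeled in a few lines of Python. This is a conceptual sketch only: the class LockTable and its methods are hypothetical and say nothing about how the workspace actually stores its locks.

```python
class LockTable:
    """Toy model of per-document locks held by users."""

    def __init__(self):
        self._locks = {}  # document name -> user holding the lock

    def open_document(self, doc, user):
        holder = self._locks.get(doc)
        if holder is not None and holder != user:
            raise RuntimeError("%s is locked by %s" % (doc, holder))
        self._locks[doc] = user

    def close_document(self, doc, user):
        # Only the user holding the lock releases it.
        if self._locks.get(doc) == user:
            del self._locks[doc]

    def status(self, doc):
        holder = self._locks.get(doc)
        return "locked by %s" % holder if holder else "unlocked"


locks = LockTable()
locks.open_document("voa2", "user1")
print(locks.status("voa2"))
locks.close_document("voa2", "user1")
print(locks.status("voa2"))
```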

Step 5: Hand annotate

At this point, you can annotate your document as you did in Tutorial 1. If you want to leave the workspace without finishing your annotation, just select the Save operation in the operations menu and press Go; you can always return to the document. Once you're satisfied with your annotations, select "Mark gold" in the operations menu and press Go; your document will be saved and the document status updated.

Finally, close the document. In a minute, we're going to do some automated tagging in the workspace, and currently this is not possible while documents are locked.

Step 6: Import more documents

You'd typically annotate several documents in the first round before building a model, but we want to move directly to that step. Since we only have one hand-annotated document at the moment, we'll import some other documents into the workspace. We're going to import some of the annotated documents that come with the Named Entity task into the core folder; these documents are already marked internally as gold-standard reconciled documents (i.e., in addition to being marked gold, their correctness has been validated by further review). We're also going to import one of them as a raw document.

Unix:

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
--file_type raw "core" sample/ne/resources/data/raw/voa1.txt
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt.json" \
"core" sample/ne/resources/data/json/voa[3-9].txt.json

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
--file_type raw "core" sample\ne\resources\data\raw\voa1.txt
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt.json" \
"core" sample\ne\resources\data\json\voa3.txt.json \
sample\ne\resources\data\json\voa4.txt.json \
sample\ne\resources\data\json\voa5.txt.json \
sample\ne\resources\data\json\voa6.txt.json \
sample\ne\resources\data\json\voa7.txt.json \
sample\ne\resources\data\json\voa8.txt.json \
sample\ne\resources\data\json\voa9.txt.json
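
The Unix command relies on the shell to expand the pattern voa[3-9].txt.json, while the Windows command names each file explicitly. The Python sketch below shows that the two forms describe the same seven files; the candidate file names are constructed here for illustration rather than read from disk.

```python
from fnmatch import fnmatchcase

PATTERN = "voa[3-9].txt.json"

# Candidate names voa1 through voa9, built for illustration.
candidates = ["voa%d.txt.json" % i for i in range(1, 10)]

# Keep only the names that the shell-style pattern would match.
matched = [name for name in candidates if fnmatchcase(name, PATTERN)]

for name in matched:
    print(name)
```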

Now, let's list the workspace to see what we have:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace list

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace list

core:
voa1 (status unannotated, openable True, current step tag, workflow status awaiting hand annotation)
voa2 (status gold, locked by user1, workflow status done, current step tag, openable False)
voa3 (status reconciled, openable True, current step tag, workflow status done)
voa4 (status reconciled, openable True, current step tag, workflow status done)
voa5 (status reconciled, openable True, current step tag, workflow status done)
voa6 (status reconciled, openable True, current step tag, workflow status done)
voa7 (status reconciled, openable True, current step tag, workflow status done)
voa8 (status reconciled, openable True, current step tag, workflow status done)
voa9 (status reconciled, openable True, current step tag, workflow status done)

review:

export:

reconciliation:

You can see that the document you tagged is marked gold, and the documents you just imported are marked reconciled. And finally, you can see that there is one document - the raw document you just imported - which is marked unannotated.

Step 7: Build a model

Now, we build a model. Workspace models are completely distinct from default task models, like the one we built in Tutorial 2. They're built exclusively from the documents in the workspace.

This is a command line operation only. We're going to ask the workspace to autotag afterwards, which should mark "voa1" as uncorrected (since it will now have been automatically annotated). Each time we build a model and autotag, any documents that are either unannotated or uncorrected are autotagged.

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace modelbuild \
--do_autotag "core"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace modelbuild \
--do_autotag "core"

Once this is done, we can look at the contents of the workspace again:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace list

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace list

core:
voa1 (status uncorrected, openable True, current step tag, workflow status awaiting hand annotation)
voa2 (status gold, locked by user1, workflow status done, current step tag, openable False)
voa3 (status reconciled, openable True, current step tag, workflow status done)
voa4 (status reconciled, openable True, current step tag, workflow status done)
voa5 (status reconciled, openable True, current step tag, workflow status done)
voa6 (status reconciled, openable True, current step tag, workflow status done)
voa7 (status reconciled, openable True, current step tag, workflow status done)
voa8 (status reconciled, openable True, current step tag, workflow status done)
voa9 (status reconciled, openable True, current step tag, workflow status done)

review:

export:

reconciliation:

Note that voa1, which was previously unannotated, is now uncorrected - i.e., it's been autotagged but not hand-corrected. The other documents, because they're gold or reconciled, were used to create the model which the workspace applied to voa1.
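
The way document status determines what happens during a model build with autotagging can be summarized in a short Python sketch. It encodes only what this step states: gold and reconciled documents feed the model, and unannotated or uncorrected documents are autotagged and end up uncorrected. The function modelbuild_with_autotag is a hypothetical model of that bookkeeping and does no actual training or tagging.

```python
TRAINABLE = {"gold", "reconciled"}
AUTOTAGGABLE = {"unannotated", "uncorrected"}


def modelbuild_with_autotag(statuses):
    """Return (training documents, statuses after autotagging)."""
    training = sorted(d for d, s in statuses.items() if s in TRAINABLE)
    updated = {
        d: ("uncorrected" if s in AUTOTAGGABLE else s)
        for d, s in statuses.items()
    }
    return training, updated


# Statuses as shown in the listing before the model build.
before = {"voa1": "unannotated", "voa2": "gold"}
before.update({"voa%d" % i: "reconciled" for i in range(3, 10)})

training, after = modelbuild_with_autotag(before)
print(training)
print(after["voa1"])
```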

Step 8: Hand correct

Now, you'll want to hand-correct the autotagged document.

If your UI has been open while you've performed the last two steps on the command line, the UI won't know that the state of the workspace has changed. You can select the workspace tab and press the "Refresh" button in the controls area. Now, the state of the UI and the state of the workspace will be synchronized.

Select the core folder from the folder menu. You should see "voa1", among other documents. Open it. Review the annotations and correct whatever is needed. When the document is correct, choose "Mark gold" and press Go, and the document will be marked gold.

Step 9: Clean up (optional)

In the next tutorial, we'll learn about the experiment engine. If you want to learn how to use the experiment engine with workspaces, don't remove your workspace.

If you're not planning on doing any other tutorials, remove the workspace:

Unix:

% rm -rf /tmp/ne_workspace

Windows native:

> rd /s /q %TMP%\ne_workspace
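
If you'd rather use one cleanup routine on both platforms, the same removal can be sketched in Python. The function remove_workspace is a hypothetical helper, and the example deliberately operates on a scratch directory rather than on your real workspace.

```python
import shutil
import tempfile
from pathlib import Path


def remove_workspace(path):
    """Delete the workspace directory tree if it exists.
    Returns True if something was removed, False otherwise."""
    path = Path(path)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Demonstrate on a throwaway directory, not on a real workspace.
scratch = Path(tempfile.mkdtemp()) / "ne_workspace"
scratch.mkdir()
(scratch / "placeholder.txt").write_text("scratch file")
print(remove_workspace(scratch), scratch.exists())
```

As with the commands above, the removal is permanent, so check the path before calling it.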

If you don't want the "Named Entity" task hanging around, remove it as shown in the final step of Tutorial 1.

This concludes Tutorial 5.