Tutorial 1: Load and hand tag a document

To get our feet wet with MAT, let's load and hand tag a document. In MAT, you need a task in order to do any annotation, and we're going to use a task that comes with MAT, which is called "Named Entity".

The named entity task was originally defined for a public evaluation connected to the sixth Message Understanding Conference (MUC-6). The labels and their (approximate) intended meanings are:

PERSON: the name of a person (e.g., "Hillary Clinton")
LOCATION: the name of a politically or geographically defined location (e.g., "Poland", "Asia")
ORGANIZATION: the name of a corporate, governmental, or other organizational entity (e.g., "Yankees", "Microsoft", "Department of Justice")

The precise definitions of these labels are considerably more detailed, but this is all you need to know for this tutorial.

Make sure you're familiar with the "Conventions" section in your platform-specific instructions in the "Getting Started" section of the documentation (Unix, MacOS X, Windows native). In particular, you should know how to set the value of the MAT_PKG_HOME environment variable, which is mentioned frequently in this tutorial.
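
If you want to confirm that the variable is set before you start, a small check along the following lines may help. It is only a sketch: the function name find_ne_task_dir is hypothetical and not part of MAT, and the only facts it relies on are the MAT_PKG_HOME variable and the sample/ne subdirectory used in this tutorial.

```python
import os
from pathlib import Path


def find_ne_task_dir(environ=None):
    """Return the sample/ne directory under MAT_PKG_HOME, or None.

    Hypothetical helper for illustration; not part of MAT.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("MAT_PKG_HOME")
    if not home:
        return None  # variable is unset or empty
    task_dir = Path(home) / "sample" / "ne"
    return task_dir if task_dir.is_dir() else None


if __name__ == "__main__":
    found = find_ne_task_dir()
    print(found if found else "MAT_PKG_HOME is unset or has no sample/ne directory")
```

A result of None means either that the variable is not visible to the current process or that it points somewhere other than the root of the MAT distribution.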

We're going to do this tutorial in file mode.

Step 1: Install the task

While the named entity task is included in the distribution, it is not installed yet. The named entity task implementation is in the sample/ne subdirectory of MAT_PKG_HOME. Install it as follows:

Unix:

% cd $MAT_PKG_HOME
% bin/MATManagePluginDirs install $PWD/sample/ne

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATManagePluginDirs.cmd install %CD%\sample\ne

(If you received this distribution as a zip file, you won't have to do this with any tasks you find in src/tasks inside the zip file; these will have been installed as part of the overall installation procedure.)
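
If you script your setup, note that the Unix and Windows commands above differ only in the script name. The sketch below builds the argument list for either platform. The function name plugin_dirs_command is hypothetical; the script names, the "install" and "remove" actions, and the sample/ne path are the ones used in this tutorial.

```python
import sys
from pathlib import Path


def plugin_dirs_command(action, mat_home, windows=None):
    """Build the argument list for installing or removing the sample/ne task.

    action is "install" or "remove", the two actions used in this tutorial.
    Hypothetical helper for illustration; not part of MAT.
    """
    if action not in ("install", "remove"):
        raise ValueError(f"unsupported action: {action!r}")
    if windows is None:
        # Default to the platform this script is running on.
        windows = sys.platform.startswith("win")
    script = "MATManagePluginDirs.cmd" if windows else "MATManagePluginDirs"
    home = Path(mat_home)
    return [str(home / "bin" / script), action, str(home / "sample" / "ne")]


print(plugin_dirs_command("install", "/opt/mat", windows=False))
```

The path /opt/mat is a placeholder; in practice you would pass the value of MAT_PKG_HOME.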

Step 2: Start the UI

Open another terminal, and start the Web server (see here for more details):

Unix:

% $MAT_PKG_HOME/bin/MATWeb

Windows native:

> %MAT_PKG_HOME%\bin\MATWeb.cmd

Then open your Firefox browser and:

Ensure that popups are not blocked for localhost (you can check this in the Firefox preferences window under the Content tab).
In the Firefox preferences window in the General tab, ensure that "Always ask me where to save files" is selected.

Then, navigate to http://localhost:7801/MAT/workbench (see here for more details).
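
To check from a script that the server is answering before you open the browser, something like the following could work. It is a sketch only: the helper names are hypothetical, the host and port are taken from the URL above, and it assumes that the server answers a plain HTTP GET at that address.

```python
import urllib.request


def workbench_url(host="localhost", port=7801):
    """Build the workbench URL used in this tutorial."""
    return f"http://{host}:{port}/MAT/workbench"


def workbench_is_up(url=None, timeout=2.0):
    """Return True if an HTTP GET on the URL succeeds, False otherwise.

    Hypothetical helper; it only tests reachability, not that the
    page returned is actually the MAT workbench.
    """
    url = workbench_url() if url is None else url
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


print(workbench_url())
```

Calling workbench_is_up() with no arguments tests the default address. A False result usually means the server has not been started or is listening on a different port.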

Step 3: Load a document

You're now ready to load a document.

In the UI, select "File -> Open file...". You'll see a popup window.
Set the task dropdown menu to "Named Entity", if it isn't already set.
Select "Demo" as your workflow.
Press the "Browse" button next to "Input", and navigate to MAT_PKG_HOME/sample/ne/resources/data/raw. Select any of the files in that directory.
The document type should already be "raw", and encoding should be "ascii" or "utf-8".
Press the "Open" button, which should be active as soon as you select an input file.

You should now see a window with a tab which contains the document:

[voa2 raw]

At the right, you'll see a menu where you can change the workflow. Immediately below it is a status line, which lists each of the steps in the workflow; finished steps are grayed out, and none should be grayed out at the moment. Below the status line are two buttons: a button containing a gear and a right arrow, which will advance the document through the pending automated annotation step, and a reload button. The buttons to the left of the reload button guide you through the workflow; you can find out more about what they do by hovering over them, and read more about them here.

Below this section, you'll see a tag legend. The legend presents the annotations you'll be able to edit by hand (labeled "Content tags" here), and then other annotations which are automatically added by the initial workflow steps (more about this later). The hand-annotatable types have menu controls next to them; we're not going to discuss those in our tutorials, but you'll be able to learn more about them here.

Within the document tab, you'll see a tagging status area at the top which tells you that hand annotation is unavailable.

The document has two icons at the right end of its tab. The "-" will hide the document, and the "x" will close it. The Tabs menu at the top of the UI provides a way of showing it once it's hidden. Try hiding and showing the document. If you press the "x" by mistake, just follow the instructions in this step above to load the document again.

Step 4: Prepare the document for hand tagging

You're now ready to prepare the document for hand tagging. In order for the document to be hand taggable, it should usually be tokenized; i.e., the basic word elements must be identified. In addition, the regions of the document which might contain interesting elements must be identified (this is called "zoning").
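
To make the two terms concrete, here is a small illustrative sketch. It is not MAT's tokenizer or zoner: the regular expression, the function names, and the choice to treat the whole text as a single zone are assumptions made for the example.

```python
import re


def zone_whole_document(text):
    """Return one zone covering the entire text, as (start, end) offsets."""
    return [(0, len(text))]


def tokenize(text, zones):
    """Return (start, end) offsets of the tokens found inside each zone.

    A token here is a run of word characters or a single punctuation mark.
    """
    tokens = []
    for zone_start, zone_end in zones:
        for match in re.finditer(r"\w+|[^\w\s]", text[zone_start:zone_end]):
            # Shift match offsets back into whole-document coordinates.
            tokens.append((zone_start + match.start(), zone_start + match.end()))
    return tokens


text = "Microsoft opened an office in Poland."
zones = zone_whole_document(text)
for start, end in tokenize(text, zones):
    print(start, end, repr(text[start:end]))
```

Each token is reported as a pair of character offsets, which is the kind of information the faint boxes in the UI represent.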

Press the forward button (the one with the gear and the right arrow). As the backend applies the automated zoning step, the step name will blink briefly, and then the zone step should be grayed out. You shouldn't see any change in the document itself, because this particular task treats the entire document as potentially interesting; if there were uninteresting areas, they would be grayed out in the document text:

[voa2 zoned]

Note, however, that there are other small changes in the controls. First, the gear button has been joined by a button with a left arrow on it; this allows you to undo the just-applied automated step. (This option isn't always available, but it's available in this demo workflow.) Second, the document is now marked as modified, and an asterisk appears in the tab label to indicate this.

Press the forward button again. The tokenize step will blink (for a while longer this time; the system is calling a Java-based tokenizer), and then will be grayed out, and all the words in the document should be surrounded by faint boxes. These outlines show you where the system believes the word boundaries are, which will be relevant in a moment:

[voa2 zoned and toked]

In the tagging status area, it should now say "Hand annotation: available (swipe or left-click)".

You're now in the "tag" step. You'll notice that the buttons have changed again. The "tag" step is what we call a mixed step; it provides both automatic annotation and the option of hand annotation, either from scratch or as a way of correcting the automated output. So there are three buttons:

the left arrow, which undoes the previous automated step;
the gear button, which is in parentheses, because it's optional;
and the button with a writing hand and a right arrow on it, which is used to complete the hand annotation.

The gear button is only available at the beginning of the tag step. If you start hand annotation (which is what we're about to do), it will be immediately grayed out, so you can't overwrite your hand annotations. We're not going to press this button; you're going to have to wait until Tutorial 3 to find out how it works.

If you're paying attention, you'll also notice that the legend menus have changed from "Visible" to "Active"; again, we're not going to discuss those in our tutorials, but you'll be able to learn more about them here.

Step 5: Insert some tags

There are two ways to select text to tag. You can swipe using the mouse (click left, hold, and move), or click left on an individual word. The system will expand the selection to the nearest word boundaries, and pop up a tagging menu. You can select the appropriate tag with the mouse, or use the keyboard accelerators (in parentheses in the menu).
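
The expansion to word boundaries can be pictured as follows. This is a sketch of the idea only, not MAT's UI code: the function name, the token offsets, and the rule of covering every token the selection overlaps are assumptions made for the example.

```python
def snap_to_tokens(sel_start, sel_end, tokens):
    """Expand a character selection to cover every token it overlaps.

    tokens is a sorted list of (start, end) offsets. Returns the expanded
    (start, end), or None if the selection touches no token.
    """
    if sel_start == sel_end:
        # A plain click: pick the token that contains the clicked position.
        touched = [(s, e) for s, e in tokens if s <= sel_start < e]
    else:
        touched = [(s, e) for s, e in tokens if s < sel_end and e > sel_start]
    if not touched:
        return None
    return touched[0][0], touched[-1][1]


text = "Hillary Clinton visited Poland."
tokens = [(0, 7), (8, 15), (16, 23), (24, 30), (30, 31)]

# A swipe from the middle of "Hillary" to the middle of "Clinton".
start, end = snap_to_tokens(3, 11, tokens)
print(text[start:end])  # Hillary Clinton
```

A click is treated here as a zero-width selection, so clicking anywhere inside "Poland" selects the whole word, while clicking on a space selects nothing.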

If you need to remove or change a tag, just click on it. You'll get a popup menu that will allow you to do what you want.

You'll notice that as soon as you start annotating, in addition to the gear being grayed out, the "tag" step will be partially grayed out. This shows you that the step is partially completed.

Step 6: Save the document

Select "File -> Save..." in the menu bar, and then select "mat-json". You should be prompted with a file save dialog. Put this file somewhere you can find it again; we'll come back to it in a bit. Give it a name like "annotated_doc.json".

What you're doing is saving your document, along with its annotations. MAT uses standoff annotations, which record an annotation by recording offsets into the document, rather than in-line annotations, where the annotations would be inserted into the document text directly (e.g., XML). MAT's standoff annotation format is our own format, built on top of the JavaScript Object Notation (JSON).
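
The difference between the two styles can be shown with a toy example. The dictionary layout below is a simplified stand-in invented for illustration; it is not the actual mat-json schema. The conversion function assumes that annotations do not overlap and that the text contains no characters needing XML escaping.

```python
import json

# Standoff: the text is untouched; annotations point into it by offset.
document = {
    "text": "Hillary Clinton visited Poland.",
    "annotations": [
        {"label": "PERSON", "start": 0, "end": 15},
        {"label": "LOCATION", "start": 24, "end": 30},
    ],
}


def to_inline(doc):
    """Render a standoff document as in-line, XML-style markup."""
    text = doc["text"]
    pieces = []
    cursor = 0
    for ann in sorted(doc["annotations"], key=lambda a: a["start"]):
        label, start, end = ann["label"], ann["start"], ann["end"]
        pieces.append(text[cursor:start])  # untagged text before the span
        pieces.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    pieces.append(text[cursor:])  # untagged text after the last span
    return "".join(pieces)


print(json.dumps(document, indent=2))  # the standoff form, serialized as JSON
print(to_inline(document))
```

Note that the standoff form leaves the original text byte-for-byte intact, which is why adding or removing an annotation never disturbs the offsets of the others.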

Note: if you're having trouble finding the document you saved, please keep in mind that the browser, not the MAT UI, is responsible for saving your file. In particular, if you haven't configured your browser to prompt you for where to save your file, it will be saved to your browser's download directory. To fix this in Firefox, see the documentation on starting the UI.

Step 7: Close and reload the annotated document, with logging

Finally, you'll close and reload this document, so you can see how loading an annotated document differs from loading a raw document. We'll also see how to start and stop the MAT UI logger.

To close the document, press the "x" in the upper-right corner of the document pane.

Now, let's start the logger.

In the menubar at the top of the UI, press "Logging is off (press to start)". It should change color, and tell you that logging is on.

Now, let's reload the document. All our actions will be logged.

In the UI, select "File -> Open file...". You'll see a popup window.
Set the task dropdown menu to "Named Entity", if it isn't already set.
Select "Demo" as your workflow.
Press the "Browse" button next to "Input", and navigate to the document you just saved.
Select "mat-json" as the document type. The encoding should automatically switch to utf-8, and it should not be editable.
Press the "Open" button, which should be active as soon as you select an input file.

A tab should appear which shows your annotated document, and the automatic steps you've already performed on the document should be visible on the right.

Step 8: View the logging output

In this open window, add and remove some annotations, then press the logging button again. The browser will download a CSV file which contains the contents of the log. Open the file in your favorite spreadsheet application to see how your actions were logged, and see the logger documentation for a description of the logger output.
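
If you would rather inspect the log from a script than from a spreadsheet, Python's standard csv module is enough. The sketch below makes no assumption about MAT's column names: it works with whatever header the file has. The sample data and its column names are invented purely so that the example runs on its own, and the function name summarize_log is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_log(csv_file, column):
    """Count how often each value appears in one column of a CSV log."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"no column named {column!r}; found {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Invented sample; a real log will have different columns and values.
sample = io.StringIO(
    "timestamp,action\n"
    "1,add_annotation\n"
    "2,add_annotation\n"
    "3,remove_annotation\n"
)
print(summarize_log(sample, "action"))
```

For a real log, open the downloaded file with open(path, newline="") and pass a column name taken from its header row.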

Step 9: Shut down the Web server

Shut down your Web server by typing "exit" in the window where you started the Web server. More details here.

Step 10: Clean up (optional)

If you're not planning on doing any other tutorials, and you don't want the "Named Entity" task hanging around, remove it as follows:

Unix:

% cd $MAT_PKG_HOME
% bin/MATManagePluginDirs remove $PWD/sample/ne

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATManagePluginDirs.cmd remove %CD%\sample\ne

This concludes Tutorial 1.