Frequently asked questions

Installation and configuration

How do I upgrade the version of Java or Python that MAT depends on?

If you upgrade your version of Java or Python, and you want MAT to use the new version, the easiest thing to do is rerun the installer. See the installation instructions for your platform (Unix, MacOS or Windows native).

What's the difference between the full application and the Python pip module?

The pip module is a small, almost completely self-contained module (about 350K compressed, relying only on the third-party Python munkres library) which is intended for people working with MAT documents outside the bounds of the full application. The pip module implements the core document model, task, workflow, scoring, workspace and experiment capabilities of MAT. The following are not included:

What MAT can do

How can I incrementally build a corpus and train a jCarafe model?

The various steps involved in this incremental, tag-a-little, learn-a-little loop deserve their own page, and are described here.

How do I use the MAT UI to hand-annotate documents?

You don't need to train a model or automatically tag data to use MAT. You can use the MAT UI as a pure hand annotation tool. Tutorial 1 covers this process pretty well, or, if your task involves more than simple span annotations, check out tutorial 7.

Here's a summary with pointers to more of the details. First, create your task and install it, or use a task you've already defined. Make sure that this task has a hand annotation step, as illustrated in the sample 'Named Entity' task. Then, start the Web server and load the MAT UI. You can either manually load and save individual documents on your local machine, or you can set up a workspace and access it in the UI.

How do I use the MAT scorer and UI to compare the output of multiple annotators or annotation tools?

To score simple span annotations (that is, annotations whose main label is the distinguishing element) against each other, you don't even need to have a task. You can use the scorer with the --content_annotations option.
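As a rough illustration, the sketch below assembles an argument list for such a scorer run. Only the --content_annotations option comes from the text above. The executable name `MATScore`, the comma-separated form of the label list, and the helper `scorer_command` itself are assumptions made for this example; the options that select the files to compare are passed through unchanged, so check the scorer documentation for their actual names.

```python
def scorer_command(labels, file_args, scorer="MATScore"):
    """Assemble an argument list for a task-free scorer run.

    ASSUMPTIONS: the executable name "MATScore", and that
    --content_annotations takes a comma-separated list of labels.
    file_args is passed through untouched.
    """
    if not labels:
        raise ValueError("at least one annotation label is required")
    return [scorer, *file_args, "--content_annotations", ",".join(labels)]


# Fill in the scorer's file-selection options from its documentation.
file_args = []
cmd = scorer_command(["PERSON", "LOCATION", "ORGANIZATION"], file_args)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` once the file-selection options have been filled in.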

If you're comparing multiple annotation tools, you can run them outside of MAT and ensure that their output can be read by MAT by identifying or creating a MAT reader. Alternatively, you can write a wrapper for the annotation tool, and create workflows and model configurations in task.xml which use the tool (this is harder, and not really documented).

If you're comparing multiple annotators, you can have them annotate the files in file mode and then compare them using the scorer, or you can use the assignment capability in the workspaces to have the files multiply-annotated in the context of a workspace. In the future, you'll be able to score and reconcile these multiply-annotated files directly in the workspace.

How can I use MAT without its Web server to visualize or create annotations?

The MAT document visualization and annotation tool is available as a standalone utility which does not require the MATWeb server.

How can I get a handle on a set of annotated documents I haven't seen before?

If you have documents for which you don't have a task, the first step is to ensure that you can read them. If the documents aren't in a known format, you'll need to write a reader for them (or convert them, outside MAT, to a format that MAT knows). Once MAT can read the documents, you can use MATReport to summarize the annotations which are present in the documents, and you can use the --create_task option to create a task, which you can then install. At that point, you can start MATWeb and load the documents into the UI. Alternatively, you can load the documents into the UI while inferring a task.

Where can I find examples of how to set up MAT?

MAT is an intimidatingly configurable tool, in many respects. We've tried to exemplify some of the most common use cases for various aspects of the tool.

Core capabilities

How do I choose between file and workspace modes?

You can find a succinct comparison of file and workspace modes here.

Can the default jCarafe engine use lexicons to enhance its performance?

Yes. See the --lexicon_dir option to the jCarafe training engine. Be aware, however, of the case sensitivity of feature specifications.

Why can't I select "Open workspace..." in the "File" menu?

Because the IP address of the machine your web browser is running on is different from the IP address of the machine that MATWeb is running on. By default, MATWeb allows workspace access only from its own machine. You can override this behavior by providing the --allow_remote_workspace_access option to MATWeb when you start it up.
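The rule described above can be pictured with a small sketch. This is not MATWeb's actual code: the function `workspace_access_allowed` and its parameters are hypothetical, and the `allow_remote` argument simply stands in for the effect of starting MATWeb with --allow_remote_workspace_access.

```python
import ipaddress


def workspace_access_allowed(client_ip, server_ip, allow_remote=False):
    """Return True if workspace access would be permitted.

    Hypothetical helper illustrating the rule in the text; it is not
    MATWeb's implementation.
    """
    if allow_remote:
        # Stands in for the --allow_remote_workspace_access override.
        return True
    # Parse both addresses so the comparison is on values, not raw strings.
    return ipaddress.ip_address(client_ip) == ipaddress.ip_address(server_ip)


print(workspace_access_allowed("192.168.1.20", "192.168.1.20"))        # True
print(workspace_access_allowed("192.168.1.21", "192.168.1.20"))        # False
print(workspace_access_allowed("192.168.1.21", "192.168.1.20", True))  # True
```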

UI and display

Why can't I see my annotations?

If you've defined your annotations in your task.xml file, and you're sure your document contains annotations, but you can't see them in the UI, the most likely cause is that you haven't assigned any display features to them.

Why does my document take so long to render?

If you have lots of annotations (including tokens), the document redisplay can sometimes take a while. We've found that when full display redraw is triggered, documents that have more than 1000 annotations will take a visible amount of time to redisplay. We've attempted to optimize the panel redraw, but full redisplay is still triggered in a number of circumstances, including when documents are loaded or returned from a server-side annotation process, and sometimes when the annotation window width is changed.
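To get a rough idea of whether a document is likely to cross that threshold, you can count its annotations before loading it. The sketch below assumes the document has been parsed from JSON into a dictionary with an `asets` list whose entries each hold an `annots` list; that layout, the sample data and the helper names are assumptions made for this example, so adjust the keys to match your serialization.

```python
import json

# Threshold taken from the text: more than 1000 annotations leads to a
# visible delay on full redisplay.
SLOW_REDRAW_THRESHOLD = 1000


def count_annotations(doc):
    """Count all annotations (tokens included) in a parsed document.

    ASSUMPTION: doc is a dict with an "asets" list, and each entry keeps
    its annotations in an "annots" list.
    """
    return sum(len(aset.get("annots", [])) for aset in doc.get("asets", []))


def may_render_slowly(doc, threshold=SLOW_REDRAW_THRESHOLD):
    """Return True if the annotation count exceeds the threshold."""
    return count_annotations(doc) > threshold


# Tiny hypothetical document: two annotation sets, three annotations in all.
sample = json.loads("""
{"signal": "John lives in Boston.",
 "asets": [
   {"type": "PERSON", "annots": [[0, 4]]},
   {"type": "lex", "annots": [[0, 4], [5, 10]]}
 ]}
""")
print(count_annotations(sample))   # 3
print(may_render_slowly(sample))   # False
```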

What's that URL in the bottom left corner of the UI?

In Firefox, you'll sometimes notice a URL in the bottom left corner of the UI; if you move your mouse over it, it moves to the bottom right corner. This is an artifact of the particular way that the Yahoo! UI toolkit implements the tabbing that the MAT UI uses. If this URL annoys you, you can almost always make it go away by clicking on one of the tab labels in the UI. We've done our best to eliminate this floating URL, but we haven't been able to work around all of the places it occurs.