Core technologies

jCarafe

The central technology underlying MAT is jCarafe, a trainable conditional random field (CRF) text tagger built at MITRE. jCarafe is implemented in Scala, a Java-compatible programming language that compiles to Java class files (JVM bytecode). This engine trains models from annotated documents and uses those models to annotate new documents. The tokenizer that jCarafe relies on as a preprocessing step is also implemented in Scala and is distributed with jCarafe.
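To make "conditional random field tagger" concrete: a linear-chain CRF scores each candidate label sequence as a sum of per-token (emission) and label-to-label (transition) feature weights, and typically picks the best sequence with Viterbi decoding. The Python sketch below shows only that decoding step. The labels, the single capitalization feature, and all weights are hypothetical and hand-set, with no training; this is an illustration of the technique, not jCarafe's code or feature set.

```python
from typing import Dict, List, Tuple

# Hypothetical label set and hand-set weights; jCarafe learns its weights
# from annotated documents instead.
LABELS = ["O", "PERSON"]

def emission_score(token: str, label: str) -> float:
    # Hypothetical feature: capitalized tokens lean toward PERSON.
    if label == "PERSON":
        return 2.0 if token[:1].isupper() else -2.0
    return 0.5

# Hypothetical transition weights between adjacent labels.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("O", "O"): 0.5, ("O", "PERSON"): 0.0,
    ("PERSON", "O"): 0.0, ("PERSON", "PERSON"): 1.0,
}

def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the toy model."""
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[:i+1] that ends in y
    best = [{y: emission_score(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITIONS[(p, y)])
            scores[y] = best[-1][prev] + TRANSITIONS[(prev, y)] + emission_score(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return list(reversed(path))

print(viterbi(["yesterday", "John", "Smith", "arrived"]))
# ['O', 'PERSON', 'PERSON', 'O']
```

The transition weight for PERSON followed by PERSON is what lets the decoder keep "John Smith" together as one multi-token name rather than judging each token in isolation.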

In many annotation systems, the main focus is on tuning the tagger to get the best possible results. This is certainly possible with jCarafe, but it is not MAT's focus; MAT instead aims to build an annotation infrastructure around an existing tagging feature set and tokenizer.

MAT makes jCarafe available both as a command-line tool and as a server; the server avoids repeatedly paying the startup cost of loading the annotation model. Server mode is the default in MAT. To disable it, you must explicitly pass the --<step_name>_local option to MATEngine, where <step_name> is the name of the tagging step the model is intended for.
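The payoff of server mode is that the model is loaded once and then reused for every request. The minimal Python sketch below illustrates that idea under stated assumptions: TaggerServer, slow_load, and "model.crf" are hypothetical names, the sleep stands in for model-loading cost, and none of this reflects the actual interface between MAT and jCarafe.

```python
import time
from typing import Callable, Dict, List

Tagger = Callable[[str], List[str]]

class TaggerServer:
    """Hypothetical long-running tagger process: each model is loaded once
    and cached, so later requests skip the loading cost."""

    def __init__(self, load_model: Callable[[str], Tagger]) -> None:
        self._load_model = load_model
        self._models: Dict[str, Tagger] = {}
        self.loads = 0

    def tag(self, model_path: str, text: str) -> List[str]:
        if model_path not in self._models:
            self._models[model_path] = self._load_model(model_path)
            self.loads += 1
        return self._models[model_path](text)

def slow_load(path: str) -> Tagger:
    # Stand-in for deserializing a trained model; the sleep mimics startup cost.
    time.sleep(0.05)
    return lambda text: ["PERSON" if w[:1].isupper() else "O" for w in text.split()]

server = TaggerServer(slow_load)
for doc in ["John arrived", "then Mary left", "nobody came"]:
    print(server.tag("model.crf", doc))
print("model loads:", server.loads)  # model loads: 1
```

Running each document through a fresh command-line process would instead pay the loading cost three times, which is the overhead the server default avoids.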

JavaScript and AJAX

MAT is distributed with a UI that combines, in a novel way, two capabilities required for managing annotated documents. On the one hand, it is a tool for both hand annotation and annotation display. On the other hand, it lets the user control the automated steps that each document goes through. This UI is written entirely in JavaScript, and MAT runs its own Web server to make the UI available.

JSON

MAT maintains its annotated documents in its own simple standoff annotation format, which is based on JavaScript Object Notation (JSON). JSON is especially convenient for passing documents back and forth via AJAX, since JavaScript in the browser can parse it directly.
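In a standoff format, the annotations are stored separately from the text and point into it by character offsets, so the text itself is never modified. The Python sketch below shows that idea and a JSON round trip. The layout and the key names ("text", "annotations", "type", "start", "end") are hypothetical simplifications, not MAT's actual JSON schema.

```python
import json
from typing import Any, Dict, List

# Hypothetical, simplified standoff document: the text is stored once and
# each annotation refers to it by [start, end) character offsets.
doc: Dict[str, Any] = {
    "text": "John Smith visited Boston.",
    "annotations": [
        {"type": "PERSON", "start": 0, "end": 10},
        {"type": "LOCATION", "start": 19, "end": 25},
    ],
}

def spans(document: Dict[str, Any]) -> List[str]:
    """Recover each annotated string by slicing the text with its offsets."""
    text = document["text"]
    return [f'{a["type"]}: {text[a["start"]:a["end"]]}' for a in document["annotations"]]

wire = json.dumps(doc)       # the serialized form that would travel over AJAX
received = json.loads(wire)  # the receiving side parses it back into objects
print(spans(received))       # ['PERSON: John Smith', 'LOCATION: Boston']
```

Because the offsets index into unchanged text, overlapping annotations or several annotation layers can coexist without any inline markup conflicts.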

Python

The core engine which controls the automated document processing is written in Python. This includes both the command-line capabilities and the Web backend.
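One way to picture the engine's job is as running a document through an ordered series of named steps, where later steps depend on the output of earlier ones (for example, tagging needs tokens). The Python sketch below illustrates only that ordering idea. The step functions tokenize and tag, the run_steps driver, the "done" bookkeeping key, and the capitalization rule standing in for a trained model are all hypothetical and do not reproduce MAT's engine.

```python
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]
Step = Callable[[Doc], Doc]

def tokenize(doc: Doc) -> Doc:
    """Hypothetical tokenizer step: record (start, end) offsets of whitespace-separated tokens."""
    text, tokens, pos = doc["text"], [], 0
    for word in text.split():
        start = text.index(word, pos)
        tokens.append((start, start + len(word)))
        pos = start + len(word)
    return {**doc, "tokens": tokens}

def tag(doc: Doc) -> Doc:
    """Hypothetical tagging step; a real step would apply a trained model.
    It relies on the tokens added by the tokenize step."""
    text = doc["text"]
    annots = [{"type": "PERSON", "start": s, "end": e}
              for s, e in doc["tokens"] if text[s].isupper()]
    return {**doc, "annotations": annots}

def run_steps(doc: Doc, steps: List[Tuple[str, Step]]) -> Doc:
    """Apply each named step in order and record which steps have run."""
    for name, step in steps:
        doc = step(doc)
        doc = {**doc, "done": doc.get("done", []) + [name]}
    return doc

result = run_steps({"text": "Mary met John"}, [("tokenize", tokenize), ("tag", tag)])
print(result["done"])         # ['tokenize', 'tag']
print(result["annotations"])  # PERSON spans at offsets (0, 4) and (9, 13)
```

Each step returns a new document rather than mutating its input, which makes it easy to record progress and to rerun a later step without redoing earlier ones.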

Open-source packages used by MAT

YUI

For its Web frontend, MAT relies on the Yahoo! YUI toolkit, a BSD-licensed library for building rich interactive Web applications.

CherryPy

For its service infrastructure, including its Web server, MAT relies on CherryPy, a BSD-licensed, lightweight, and flexible Web framework written in Python that includes its own multithreaded HTTP server.