Web application server

Description

The Web application server is a threaded server which provides services to the Web UI, as well as supporting the tagger broker. By default, the application server also starts up a command loop which allows the user to start up and shut down various portions of the application. The application server supports CGI access for some services, and XML-RPC for others. The service API is intended to be internal; it will be documented at a later point.

The MAT Web server is NOT SECURE

The native MAT Web server is not an enterprise-secure implementation, and will never be one. It does not use SSL, and it does not provide any security logging or traceability. You should assume that anyone who has access to your network can see your Web server traffic. If you know how to do it, it is possible to proxy the MAT Web server behind Apache to take advantage of Apache's security options.

Note: the Web application server inventories the tagger models it makes available in file mode at startup time. If you build a model after you start the Web server, the Web server won't know about that model until you restart it. (This condition does not apply to workspaces.)

The command loop

When you start up the Web server, and it's running a command loop, you see something like this:

Web server started on port 7801.

Web server command loop. Commands are:

exit - exit the command loop and stop the Web server
loopexit - exit the command loop, but leave the Web server running
taggerexit - shut down the tagger service, if it's running
restart - restart the Web server
ws_key - show the workspace key
help, ? - this message

Workspace key is w65gABlWFKYSdZwXCHaspyhW0DySP7ZW
Command:

These commands should be self-explanatory. The command that you're most likely to use is "ws_key", which prints out the workspace key that the Web server knows. The workspace security documentation explains the importance of this key.

On Windows, the Web server cannot restart itself when it's running in a terminal, so the "restart" command will fail (for example, when you want to restart because you've added a new task). The --supersede_existing_server option will not work, either. Automatic midnight restarts for the service configuration still work.

Usage

Unix:

% $MAT_PKG_HOME/bin/MATWeb

Windows native:

> %MAT_PKG_HOME%\bin\MATWeb.cmd

Usage: MATWeb [options]

Options

In addition to providing explicit command-line options, you can provide a configuration file of default MATWeb settings. The template for this file is found in MAT_PKG_HOME/etc/MATWeb_settings.config.in. You can either pass your customized configuration file to MATWeb via the --web_settings option, or save it as MAT_PKG_HOME/MATWeb_settings.config to serve as the default Web settings file for your installation. If you save this default file, either the --web_settings option or the --suppress_default_web_settings option will cause it to be ignored.
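The choice of settings file described above can be sketched as a small shell function. This is only an illustration of the rules as stated in this section, not MATWeb's actual logic; the function name pick_settings_file and its argument convention are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: mimics the documented rules for choosing a Web
# settings file. pick_settings_file is a hypothetical helper, not part of MAT.
#
#   $1  value passed to --web_settings (empty string if the option is absent)
#   $2  "yes" if --suppress_default_web_settings was given, otherwise "no"
#   $3  path of the default file (MAT_PKG_HOME/MATWeb_settings.config)
pick_settings_file() {
    explicit=$1
    suppress=$2
    default_file=$3
    if [ -n "$explicit" ]; then
        # --web_settings causes the default file to be ignored.
        printf '%s\n' "$explicit"
    elif [ "$suppress" = "yes" ]; then
        # The default file is ignored and no settings file is used.
        printf '\n'
    elif [ -f "$default_file" ]; then
        # No override: fall back to the default file if it exists.
        printf '%s\n' "$default_file"
    else
        printf '\n'
    fi
}
```

Whichever file is chosen, options given explicitly on the command line (e.g., --port) still take precedence over the values in it.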

--port <num>
Port to listen on (default is 7801).
--ui_settings <file>
Optional XML file to modify default UI settings.
--noscreen
Don't report Web errors and access to stdout.
--access_log <file>
Absolute pathname of intended access log file. If absent, the Web access information will be written to standard output, unless --noscreen is specified.
--error_log <file>
Absolute pathname of intended error log file. If absent, the Web error information will be written to standard output, unless --noscreen is specified.
--tagger_log <file>
Absolute pathname of intended tagger log file. If absent, tagger service status output will be routed to standard output.
--clear_logs
Clear the logs before startup.
--no_cmdloop
Don't start up an interactive command loop.
--no_tagger_service
Don't start up the tagger service.
--localhost_only
Accept connections only from localhost or 127.0.0.1.
--log_rotation_count <n>
Rotate logs on restart, and keep only n logs.
--midnight_restart
Restart automatically at midnight (useful for rolling over logs).
--workspace_key <k>
Use the specified key as a workspace key. This flag is used mostly when the server automatically restarts after an error, so the user doesn't need to change the workspace key in the UI. However, it can also be used explicitly, e.g., to associate a previously-agreed-upon "password" for the workspaces.

WARNING: the key will be visible in process listings. If you want something more secure, use --workspace_key_file.
--workspace_key_file <f>
Like --workspace_key, but more secure. The key file should be readable only by the user starting up MATWeb, and should be a UTF-8 file containing the workspace key.
--workspace_key_file_is_temporary
If --workspace_key_file is provided, remove the key file immediately after startup.
--allow_remote_workspace_access
By default, workspaces only permit access from local clients. If this flag is specified, the server allows workspaces to be accessed by clients on remote hosts.
--workspace_container_directory <dir>
By default, when the user opens a workspace from the UI, she must provide a full pathname. If you provide this option, which is repeatable, and the user provides a partial pathname, the server will search each of the candidate container directories for a workspace at the location indicated by the partial pathname. When this option is provided, the user can no longer specify a full pathname; this restriction circumscribes remote access when MATWeb is being run as a persistent service.
--supersede_existing_server
If specified, terminate any MATWeb server running on the requested port on this machine, if the current user has permission to do so.
--output_log <log>
Redirect all stray output from stdout and stderr to this log. This file should be readable only by the user starting up MATWeb, since the workspace key may be written to this log.
--as_service <dir>
Start up MATWeb as a service. This creates the specified log directory and starts up as if you specified
--log_rotation_count 7 --midnight_restart --allow_remote_workspace_access \
--supersede_existing_server --noscreen --no_cmdloop \
--access_log <dir>/access.log --error_log <dir>/error.log \
--tagger_log <dir>/tagger.log --output_log <dir>/output.log
This option is interpreted before all other command line options, so you can override any of these defaults by also specifying options like --log_rotation_count or --access_log. Options like --workspace_container_directory, --localhost_only, --workspace_key, and --no_tagger_service are also respected.
--spawn_tabbed_terminal <cmd>
Spawn the Web server using the specified command and exit. The command must accept four arguments: the command to run the Web server command loop, the command to show the Web access log, the command to show the Web error log, and the command to show the tagger log. See the GUI section.
--web_settings <file>
Optional settings file (see etc/MATWeb_settings.config.in for the structure) to set options for the Web server. Overrides any default Web settings file in the directory that contains the bin/ directory (that is, MAT_PKG_HOME). Explicitly provided command line options (e.g., --port) take precedence over both this file and the default Web settings file, when appropriate.
--suppress_default_web_settings
If provided, causes MATWeb to ignore any default Web settings file in the directory that contains the bin/ directory.
--app_prefix <prefix>
Normally, the MAT server provides the MAT application as http://<yourhost>:<port><prefix>/, where the default prefix is '/MAT'. You can change the prefix with this option.
--service_debug
If provided, print out verbose error handling information for server debugging.

For more details about --output_log, --workspace_key, --workspace_key_file, and --workspace_container_directory, see the section on workspace security.
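One way to prepare a key file for --workspace_key_file is sketched below. It assumes a POSIX shell with mktemp available; the helper name write_key_file and the key value are hypothetical, and the MATWeb invocation appears only as a comment.

```shell
#!/bin/sh
# Sketch: write a workspace key to a file readable only by the current user.
# write_key_file is a hypothetical helper, not part of MAT.
write_key_file() {
    key=$1
    (
        # The umask change is confined to this subshell.
        umask 077
        keyfile=$(mktemp) || exit 1
        # printf '%s' writes the key without a trailing newline; whether
        # MATWeb tolerates a trailing newline is not stated in this section.
        printf '%s' "$key" > "$keyfile"
        # Report the path of the new key file on standard output.
        printf '%s\n' "$keyfile"
    )
}

keyfile=$(write_key_file "my_annotator_key")
echo "Key written to $keyfile"

# The file could then be handed to MATWeb, for example:
#   $MAT_PKG_HOME/bin/MATWeb --workspace_key_file "$keyfile" \
#       --workspace_key_file_is_temporary
```

Unlike --workspace_key, this approach keeps the key out of process listings, and --workspace_key_file_is_temporary removes the file once the server has started.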

MATWeb also makes the common options available.

GUI

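The --spawn_tabbed_terminal option re-invokes the MATWeb server using the tabbed terminal invocation script provided as its argument. The script is invoked with four arguments, each of which is a command: the command to run the Web server command loop, the command to show the Web access log, the command to show the Web error log, and the command to show the tagger log.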

An example of such a script, using the Unix GNOME gnome-terminal application, can be found in MAT_PKG_HOME/web/examples/gnome_tabbed_web_server_terminal.sh. The script used should name the individual tabs, if possible, and make the Web server command loop tab active, if possible.
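A rough sketch of what such a script might look like follows. It is not the shipped gnome_tabbed_web_server_terminal.sh; it only assumes the four arguments listed under --spawn_tabbed_terminal. The function name launch_tabs and the TERMINAL_CMD override are hypothetical, and support for the gnome-terminal options used here (--tab, --active, --title, -e) varies between gnome-terminal versions, so treat them as assumptions to check against your installation.

```shell
#!/bin/sh
# Sketch of a tabbed-terminal launcher. NOT the script distributed with MAT.
# Arguments, as described for --spawn_tabbed_terminal:
#   $1  command that runs the Web server command loop
#   $2  command that shows the Web access log
#   $3  command that shows the Web error log
#   $4  command that shows the tagger log
launch_tabs() {
    if [ "$#" -ne 4 ]; then
        echo "usage: launch_tabs <server_cmd> <access_cmd> <error_cmd> <tagger_cmd>" >&2
        return 2
    fi
    # TERMINAL_CMD is a hypothetical override so the sketch can be dry-run
    # with "echo" on machines that lack gnome-terminal.
    # Each tab is named, and the command loop tab is made the active one.
    ${TERMINAL_CMD:-gnome-terminal} \
        --tab --active --title="Command loop" -e "$1" \
        --tab --title="Access log" -e "$2" \
        --tab --title="Error log" -e "$3" \
        --tab --title="Tagger log" -e "$4"
}

# Dry run: print the terminal arguments instead of opening a window.
TERMINAL_CMD=echo launch_tabs "run_server" "show_access" "show_error" "show_tagger"
```

In a real script, the last line would be replaced by something like launch_tabs "$@", so that the four commands supplied by MATWeb are passed straight through.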

Most, but not all, command-line options are compatible with the tabbed terminal; the ones that aren't are the ones that are relevant to setting up a service.

If the tabbed terminal isn't available, MATWeb will start up in the usual way on the console.

Proxying behind Apache

If you're an Apache guru, you can use Apache to implement security for MATWeb, by proxying MATWeb behind Apache and adding Apache SSL support, password-protected access, etc. The key to this procedure is starting up MATWeb with the --localhost_only option, and securing access to the machine running MATWeb. Providing details on the Apache configuration is beyond the scope of this documentation. Note that the proxying setup has not yet been tested with the new bottle Web server.

Examples

Example 1

To start up the Web server listening on the default port, with a command loop and all status messages printed to standard output:

Unix:

% $MAT_PKG_HOME/bin/MATWeb

Windows native:

> %MAT_PKG_HOME%\bin\MATWeb.cmd

Example 2

To start up the Web server on port 7806, but only allow connections from the local host:

Unix:

% $MAT_PKG_HOME/bin/MATWeb --port 7806 --localhost_only

Windows native:

> %MAT_PKG_HOME%\bin\MATWeb.cmd --port 7806 --localhost_only

Example 3

To start up the Web server on the default port, but write the status messages to different files:

Unix:

% $MAT_PKG_HOME/bin/MATWeb --access_log ./web_access.log --error_log ./web_error.log \
--tagger_log ./web_tagger.log --noscreen

Windows native:

> %MAT_PKG_HOME%\bin\MATWeb.cmd --access_log %CD%\web_access.log --error_log %CD%\web_error.log ^
--tagger_log %CD%\web_tagger.log --noscreen

Without the --noscreen flag, the Web server messages will be printed both to the files and to standard output.

Example 4

On Unix, to start up the Web server as a Unix cron job, saving logs for the previous 7 days in /etc/MAT/logs, and restarting each midnight, enter this command line in your crontab, if MAT_PKG_HOME is /usr/MAT:

/usr/MAT/bin/MATWeb --access_log /etc/MAT/logs/web_access.log \
--error_log /etc/MAT/logs/web_error.log \
--tagger_log /etc/MAT/logs/web_tagger.log --noscreen \
--no_cmdloop --midnight_restart --log_rotation_count 7

Caution: this mode is very, very lightly tested.

Example 5

Let's say you want to distribute access to some MAT workspaces to a range of annotators on a variety of machines, but you don't want to force them to remember a complex workspace key or pathname. You can specify the workspace key and the container directory for the workspaces on the command line when you start up MATWeb. (Be sure to distribute the workspace key to your annotators in a secure manner, since it's our only means of security at the moment.)

Unix:

% $MAT_PKG_HOME/bin/MATWeb --workspace_key my_annotator_key \
--workspace_container_directory /path/to/dir/containing/workspaces

Windows native:

> %MAT_PKG_HOME%\bin\MATWeb.cmd --workspace_key my_annotator_key ^
--workspace_container_directory C:\path\to\dir\containing\workspaces

Example 6

Let's say you want to start up the Web server as a cron job, but you don't want all the verbosity in example 4, and you also want to specify the workspace key and the workspace container directory, and you prefer a log rotation of 5:

/usr/MAT/bin/MATWeb --as_service /etc/MAT/logs \
--workspace_key my_annotator_key \
--workspace_container_directory /path/to/dir/containing/workspaces \
--log_rotation_count 5

This is the same as if you had specified

/usr/MAT/bin/MATWeb --midnight_restart --allow_remote_workspace_access \
--supersede_existing_server --noscreen --no_cmdloop \
--access_log /etc/MAT/logs/access.log --error_log /etc/MAT/logs/error.log \
--tagger_log /etc/MAT/logs/tagger.log --output_log /etc/MAT/logs/output.log \
--workspace_key my_annotator_key \
--workspace_container_directory /path/to/dir/containing/workspaces \
--log_rotation_count 5