The model builder constructs a model according to the
configuration provided in the specified task and on the command
line. This model can be used by MATEngine
to automatically tag documents. Note that if you're using the
jCarafe engine provided with MAT, the resulting model is trained
only to find simple spanned annotations in documents: no
spanless annotations are trained for, and no attributes are
trained for beyond those associated with the effective label.
    
Note that you should never use MATModelBuilder to save models into workspaces; use MATWorkspaceEngine instead.
Note: if you create a model using this tool, and you want to do autotagging in file mode in the MAT UI, you must restart the MAT Web
        server. Otherwise, the UI will not be able to access the
      newest model.
    
If the annotation set corresponding to your training step
      contains labels which aren't intended to be trained for (e.g.,
      they're human annotator notes which are added during hand
      annotation but aren't intended to be processed), be sure those
      labels are designated processable="no"
      in the annotation set descriptor.
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd
Usage: MATModelBuilder [task option] [step option] [config name option] [other options]
| --task <task> | Name of the task to use. Must be the first argument, if present. Obligatory if the system knows of more than one task. The system will provide a list of known tasks as part of its help string. |
| --step <step> | Name of the step which contains the trainable engine. Must be the first argument after --task, if present. Optional. Obligatory if the task has more than one trainable step. |
| --config_name <name> | Name of the model build config to use. Must be the first argument after --step, if present, or --task, if --step is not present and --task is present. Optional. Default model build config will be used if no config is specified. |
        
| --language <l> | Language to use, either by name or code, as specified in the task. Obligatory if multiple languages are present and the engine for the step in question supports multiple languages. |
| --input_dir <dir> | A directory, all of whose files will be used in the model construction. Can be repeated. May be specified with --input_files. |
| --input_files <pat> | A glob-style pattern describing full pathnames to use in the model construction. May be specified with --input_dir. Can be repeated. (If you're not familiar with Unix, glob patterns are file name patterns recognized by Unix shells. Consult your favorite Unix documentation for details.) |
| --file_type <t> | The file type of the input. One of the available readers. The "raw" reader is not permitted. The "mat-json" reader is the default. |
| --encoding <encoding> | The encoding of the input. The default is the appropriate default for the file type. |
| --handle_non_bmp <v> | Instructions on how to handle Unicode characters outside the Basic Multilingual Plane. Overrides the default HANDLE_NON_BMP configuration variable. Value is one of 'warn', 'scrub_or_warn', 'fail', 'ignore'. See the Unicode issues discussion for details. Default is 'warn'. |
| --fresh_task | If this option is present, all task information in each document will be removed and re-inferred before the model is created. Use this option if you're processing documents which were created using a task other than the current one. |
| --model_file <file> | Location to save the created model. The directory must already exist. Obligatory if --save_as_default_model isn't specified. |
          
| --save_as_default_model | If the task.xml file for the task specifies the <default_model> element, save the model in the specified location, possibly overwriting any existing model. The default model path receives a suffix reflecting the appropriate step and language; see here for more details. |
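The ordering rules for --task, --step and --config_name can be easy to get wrong when a command line is assembled by a script. The following is a minimal sketch of a helper that emits the arguments in the required order. The function name ordered_args and its calling convention are assumptions made for illustration only; they are not part of MAT.

```shell
# ordered_args TASK STEP CONFIG [other options...]
# Hypothetical helper (not part of MAT): prints one argument per line,
# with --task first, then --step, then --config_name, then everything else.
# Pass an empty string for TASK, STEP or CONFIG to leave that option out.
ordered_args() {
    task=$1
    step=$2
    config=$3
    shift 3
    if [ -n "$task" ]; then
        printf '%s\n' --task "$task"
    fi
    if [ -n "$step" ]; then
        printf '%s\n' --step "$step"
    fi
    if [ -n "$config" ]; then
        printf '%s\n' --config_name "$config"
    fi
    # The remaining options keep the order in which they were given.
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Example: a task and a config name, but no step.
ordered_args "Named Entity" "" alt_config --input_dir /path/to/my/docs --save_as_default_model
```

Printing one argument per line keeps values that contain spaces, such as "Named Entity", intact when the list is inspected.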
          
MATModelBuilder also makes the common options available.
The reader referenced in the --file_type option may introduce
      additional options, which are described here. These additional
      options must follow the --file_type option.
    
The particular training engine defined for the task in your
      task.xml file will make available other command-line options. The
      command-line options for the jCarafe engine are described here. The examples below assume
      that you're using the jCarafe engine.
    
Let's say that you have several annotated documents in
      /path/to/my/docs, and there are no other files in that directory.
      Further, you have only one task, the task has no default model,
      and you have a default <model_config> in your task.xml file
      which contains appropriate settings for the engine, feature set
      and PSA training. The following command would write your model to
      the file named "task_model" in the current directory:
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder --input_dir /path/to/my/docs --model_file $PWD/task_model
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd --input_dir c:\path\to\my\docs --model_file %CD%\task_model
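Because the directory named in --model_file must already exist, a script that writes models somewhere other than the current directory may want to create that directory first. The following Unix sketch shows one way to do so; the MODEL_DIR location is an arbitrary choice for illustration, not a MAT convention.

```shell
# Assumption: MODEL_DIR is an illustrative location, not something MAT defines.
MODEL_DIR="${TMPDIR:-/tmp}/mat_models"
mkdir -p "$MODEL_DIR"    # creates the directory if missing; no error if it exists
MODEL_FILE="$MODEL_DIR/task_model"

# Run the builder only if it is installed where MAT_PKG_HOME points.
if [ -x "${MAT_PKG_HOME:-}/bin/MATModelBuilder" ]; then
    "$MAT_PKG_HOME/bin/MATModelBuilder" \
        --input_dir /path/to/my/docs --model_file "$MODEL_FILE"
else
    echo "MATModelBuilder not found; check MAT_PKG_HOME" >&2
fi
```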
To make use of this model, you could pass it to MATEngine, e.g.,
      as the value of the --carafe_tag_model flag in our sample task.
    
Let's say you have multiple tasks, and the one you want to use is
      "Named Entity". Your documents are in the same place, but there
      are other documents there too; fortunately, all the documents you
      want to use end with '.json'. In addition, your documents have
      lots of really odd person names in them, but you conveniently have
      a list of the names you're looking for, and you've prepared a
      directory /path/to/my/lexicon which contains a single file named
      NAMES which contains each of the tokens of interest, like so:
    
Urbatz
Yuguwima
Florshin
Batywan
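As a sketch, a lexicon directory of this shape could be prepared on Unix as follows. A temporary directory stands in for /path/to/my/lexicon, and the one-token-per-line layout is taken from the listing above.

```shell
# LEXICON_DIR is a stand-in for /path/to/my/lexicon in this sketch.
LEXICON_DIR=$(mktemp -d)

# Write one token per line into a single file named NAMES.
printf '%s\n' Urbatz Yuguwima Florshin Batywan > "$LEXICON_DIR/NAMES"

# Show what was written.
cat "$LEXICON_DIR/NAMES"
```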
The task you're using has a default model. The following command
      would save your model as the default:
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder --task "Named Entity" \
--input_files '/path/to/my/docs/*.json' \
--lexicon_dir /path/to/my/lexicon/ --save_as_default_model
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd --task "Named Entity" ^
--input_files "c:\path\to\my\docs\*.json" ^
--lexicon_dir c:\path\to\my\lexicon\ --save_as_default_model
Let's say we're in the same situation as example 2, except you
      only want to build a model out of the files 100.json through
      199.json, as well as the files in /path/to/my/other/docs.
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder --task "Named Entity" \
--input_files '/path/to/my/docs/1[0-9][0-9].json' \
--input_dir /path/to/my/other/docs \
--lexicon_dir /path/to/my/lexicon/ --save_as_default_model
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd --task "Named Entity" ^
--input_files "c:\path\to\my\docs\1[0-9][0-9].json" ^
--input_dir c:\path\to\my\other\docs ^
--lexicon_dir c:\path\to\my\lexicon\ --save_as_default_model
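Before training, it can be worth checking which files a pattern such as 1[0-9][0-9].json selects. The sketch below lets the Unix shell expand the pattern over a scratch directory. It assumes that MATModelBuilder interprets this simple pattern the same way the shell does, which the description of --input_files suggests but which is not verified here.

```shell
# Scratch directory with a few empty files standing in for real documents.
DOCS=$(mktemp -d)
for n in 99 100 150 199 200; do
    : > "$DOCS/$n.json"
done
: > "$DOCS/150.xml"

# The pattern is left unquoted here so that the shell expands it.
# (On the MATModelBuilder command line it is quoted, so the tool expands it.)
set -- "$DOCS"/1[0-9][0-9].json
MATCH_COUNT=$#
FIRST_MATCH=$(basename "$1")

echo "matched $MATCH_COUNT files:"
for f in "$@"; do
    basename "$f"
done
```

Here 99.json and 200.json fall outside the range and 150.xml has the wrong suffix, so only 100.json, 150.json and 199.json are listed.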
Let's say we're in the same situation as example 3, except the
      documents are XML inline documents with the ".xml" suffix:
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder --task "Named Entity" \
--input_files '/path/to/my/docs/1[0-9][0-9].xml' \
--file_type xml-inline \
--input_dir /path/to/my/other/docs \
--lexicon_dir /path/to/my/lexicon/ --save_as_default_model
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd --task "Named Entity" ^
--input_files "c:\path\to\my\docs\1[0-9][0-9].xml" ^
--file_type xml-inline ^
--input_dir c:\path\to\my\other\docs ^
--lexicon_dir c:\path\to\my\lexicon\ --save_as_default_model
Let's say we're in the same situation as example 4, but we have a
non-default model configuration that we want to use:
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder --task "Named Entity" \
--config_name 'alt_config' \
--input_files '/path/to/my/docs/1[0-9][0-9].xml' \
--file_type xml-inline \
--input_dir /path/to/my/other/docs \
--lexicon_dir /path/to/my/lexicon/ --save_as_default_model
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd --task "Named Entity" ^
--config_name "alt_config" ^
--input_files "c:\path\to\my\docs\1[0-9][0-9].xml" ^
--file_type xml-inline ^
--input_dir c:\path\to\my\other\docs ^
--lexicon_dir c:\path\to\my\lexicon\ --save_as_default_model
Let's say we're in the same situation as example 3, but you want
to build a model for the "Sample Relations"
task. Since this task has multiple trainable steps, you must
specify the step you're targeting:
    
Unix:
% $MAT_PKG_HOME/bin/MATModelBuilder --task "Sample Relations" \
--step relation_tag \
--input_files '/path/to/my/docs/1[0-9][0-9].json' \
--input_dir /path/to/my/other/docs \
--lexicon_dir /path/to/my/lexicon/ --save_as_default_model
Windows native:
> %MAT_PKG_HOME%\bin\MATModelBuilder.cmd --task "Sample Relations" ^
--step relation_tag ^
--input_files "c:\path\to\my\docs\1[0-9][0-9].json" ^
--input_dir c:\path\to\my\other\docs ^
--lexicon_dir c:\path\to\my\lexicon\ --save_as_default_model