Moses Support Digest: About the hierarchical model of Moses

[Moses-support] About the hierarchical model of Moses

Hi
I have run into a problem while training a hierarchical model with the train-model.perl script. The parallel corpus I am using has more than two million sentences. When Moses extracts rules from the corpus, it produces so many that I do not have enough disk space to store them: the rules take up more than 100 GB, and the extraction process aborts. Is there any way to reduce the space needed during rule extraction? As it stands, I cannot train a hierarchical model. Thanks in advance.

zhu hai


Re: [Moses-support] About the hierarchical model of Moses

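Try filtering out long sentence pairs with clean-corpus-n.perl, for example:

clean-corpus-n.perl corpus-in ru en corpus-out 1 10

The last two arguments are the minimum and maximum sentence length in tokens.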

Or buy a 2-terabyte hard drive ;-)
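To make the effect of length filtering concrete, here is a minimal sketch of the idea in plain shell and awk. It is not clean-corpus-n.perl itself, which does more (for example, length-ratio checks). The function name `filter_by_length` and the `.src`/`.tgt` output suffixes are made up for this illustration, and the sketch assumes both files have the same number of lines and contain no tab characters.

```shell
#!/bin/sh
# Simplified illustration only: keep sentence pairs whose token counts
# on BOTH sides fall within [min, max]. This is a hypothetical helper,
# not part of Moses; the real clean-corpus-n.perl applies extra checks.
filter_by_length() {
    src=$1; tgt=$2; out=$3; min=$4; max=$5

    # Create (or empty) the outputs so they exist even if nothing passes.
    : > "$out.src"
    : > "$out.tgt"

    # Join the two sides with a tab, count whitespace-separated tokens
    # on each side, and write out only the pairs inside the limits.
    paste "$src" "$tgt" | awk -F'\t' \
        -v min="$min" -v max="$max" \
        -v os="$out.src" -v ot="$out.tgt" '
    {
        ns = split($1, a, " ")   # tokens on the source side
        nt = split($2, b, " ")   # tokens on the target side
        if (ns >= min && ns <= max && nt >= min && nt <= max) {
            print $1 > os
            print $2 > ot
        }
    }'
}
```

Calling `filter_by_length corpus-in.ru corpus-in.en corpus-out 1 10` would write the surviving pairs to `corpus-out.src` and `corpus-out.tgt`. Shorter sentences yield fewer extractable rules, so the filtered corpus needs less disk space during extraction, at the cost of discarding training data.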

Re: [Moses-support] About the hierarchical model of Moses

Hi,

one way to reduce the size of the rule table is to enforce a lower span size for the rules, for instance:

train-model.perl [...] -extract-options="--MaxSpan 8"

The default is 12.

-phi
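A rough way to see why a smaller MaxSpan helps is to count the contiguous source spans the extractor may consider in one sentence. The sketch below is only back-of-the-envelope arithmetic: `count_spans` is a hypothetical helper, not a Moses tool, and the real number of extracted rules also depends on the word alignment and on the other extraction limits.

```shell
#!/bin/sh
# Number of contiguous spans of length 1..k in an n-word sentence:
#   sum over len = 1..k of (n - len + 1)  =  k*(n+1) - k*(k+1)/2
# Hypothetical helper for illustration; not part of Moses.
count_spans() {
    n=$1; k=$2
    # A span cannot be longer than the sentence itself.
    if [ "$k" -gt "$n" ]; then
        k=$n
    fi
    echo $(( k * (n + 1) - k * (k + 1) / 2 ))
}

count_spans 40 12   # prints 414
count_spans 40 8    # prints 292
```

For a 40-word sentence, lowering the limit from 12 to 8 removes roughly 30% of the candidate spans. Hierarchical rules are built from these spans and from the sub-spans nested inside them, so the saving in rule-table size may well be larger than that figure suggests, but this is an expectation rather than a measured result.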

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.
