Moses Support Digest: Moses step 1 – data preparation step

[Moses-support] Moses step 1 – data preparation step

Hi,

Does anyone know what step 1 of the Moses training script does other than produce the dictionaries and the numerical sentences that enable GIZA++ to do its job? The reason I ask is that on my machine step 1 takes just over 70 minutes for the en-fr Europarl corpus.
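For readers unfamiliar with what "dictionaries and numerical sentences" means here, the following is a minimal editorial sketch, not Moses or GIZA++ code. It builds a vocabulary file with one `id token count` line per word and rewrites each sentence pair as integer IDs (a count line, then source IDs, then target IDs), loosely modelled on the kind of input GIZA++ consumes. The function names, the frequency-based ID ordering, and starting IDs at 2 are all assumptions for illustration.

```python
from collections import Counter


def build_vocab(sentences):
    """Map each token to an integer ID, most frequent first.

    Assumption for illustration: IDs start at 2 so that low IDs stay free
    for special symbols such as an empty (NULL) word.
    """
    counts = Counter(tok for sent in sentences for tok in sent.split())
    # Descending frequency; ties broken alphabetically so output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = {tok: i for i, (tok, _) in enumerate(ordered, start=2)}
    return vocab, counts


def vocab_lines(vocab, counts):
    """One 'id token count' line per vocabulary entry, in ID order."""
    return [f"{i} {tok} {counts[tok]}" for tok, i in sorted(vocab.items(), key=lambda kv: kv[1])]


def encode(sentence, vocab):
    """Replace each whitespace-separated token with its integer ID."""
    return [vocab[tok] for tok in sentence.split()]


def sentence_pair_lines(src_sents, tgt_sents, src_vocab, tgt_vocab):
    """Three lines per sentence pair: a count, then source IDs, then target IDs."""
    lines = []
    for s, t in zip(src_sents, tgt_sents):
        lines.append("1")
        lines.append(" ".join(map(str, encode(s, src_vocab))))
        lines.append(" ".join(map(str, encode(t, tgt_vocab))))
    return lines


if __name__ == "__main__":
    en = ["the house", "the blue house"]
    fr = ["la maison", "la maison bleue"]
    en_vocab, en_counts = build_vocab(en)
    fr_vocab, fr_counts = build_vocab(fr)
    print("\n".join(vocab_lines(en_vocab, en_counts)))
    print("\n".join(sentence_pair_lines(en, fr, en_vocab, fr_vocab)))
```

This pass is a couple of linear scans over the corpus, which is consistent with the reply below that the bulk of step 1's time goes elsewhere.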

My optimised implementation of data preparation plus IBM Model 1 EM training completes in 121 seconds (just over 2 minutes) for five EM iterations. Before publishing these results, I just wanted to make sure there’s nothing I’ve missed about step 1 of the training process. Does it do anything at all that influences GIZA++ other than preparing the numerically encoded sentences?
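James does not post his code, so the following is only a minimal, unoptimised sketch of what "EM IBM Model 1" refers to, not his implementation and not GIZA++'s. It assumes sentences are already tokenised (strings or integer IDs) and that `0` is a reserved NULL source word that is not a real token; `ibm_model1` and `NULL` are hypothetical names.

```python
from collections import defaultdict

NULL = 0  # assumed reserved ID for the empty (NULL) source word


def ibm_model1(pairs, iterations=5):
    """Estimate IBM Model 1 translation probabilities t(f | e) with EM.

    pairs: list of (source_tokens, target_tokens). Every target token is
    assumed to be generated by exactly one source token or by NULL.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform distribution

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in pairs:
            candidates = [NULL] + list(src)
            for f in tgt:
                # E-step: split one unit of count for f across all candidate
                # source words, in proportion to the current t(f | e).
                z = sum(t[(f, e)] for e in candidates)
                for e in candidates:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: normalise the expected counts separately for each source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    corpus = [
        (["das", "Haus"], ["the", "house"]),
        (["das", "Buch"], ["the", "book"]),
        (["ein", "Buch"], ["a", "book"]),
    ]
    probs = ibm_model1(corpus, iterations=5)
    for (f, e), p in sorted(probs.items(), key=lambda kv: -kv[1])[:5]:
        print(f"t({f} | {e}) = {p:.3f}")
```

Each iteration touches every (target word, source word) pair in every sentence once, so cost grows with corpus size times average sentence length squared; a fast implementation mostly comes down to making this inner loop cheap.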

James


[Moses-support] Re: Moses step 1 – data preparation step

Hi,

Yes, it is correct that step 1 does just the data preparation for GIZA++. The most time-consuming part is running mkcls to create the word classes for the relative distortion models.
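To give a feel for what "creating the classes" involves, here is a deliberately naive exchange-style word clustering. It is a stand-in that illustrates the idea, not mkcls's actual algorithm, options, or output. Words are moved one at a time into whichever class most improves a class-bigram likelihood criterion, sum of N(c, c') log N(c, c') minus the left and right marginal terms, and passes repeat until nothing moves. `exchange_clustering` and `class_bigram_objective` are hypothetical names, and recomputing the full objective for every candidate move is an intentional simplification.

```python
import math
from collections import Counter


def nlogn(x):
    """x * log(x) for positive counts."""
    return x * math.log(x)


def class_bigram_objective(bigrams, assign):
    """Class-bigram log-likelihood criterion (up to class-independent terms)."""
    pair, left, right = Counter(), Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        c1, c2 = assign[w1], assign[w2]
        pair[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
    return (sum(nlogn(n) for n in pair.values())
            - sum(nlogn(n) for n in left.values())
            - sum(nlogn(n) for n in right.values()))


def exchange_clustering(tokens, num_classes, max_rounds=10):
    """Greedy exchange: move each word to the class that most improves the objective."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    freq = Counter(tokens)
    # Initial assignment: deal words into classes round-robin by descending frequency.
    words = sorted(freq, key=lambda w: (-freq[w], w))
    assign = {w: i % num_classes for i, w in enumerate(words)}
    best = class_bigram_objective(bigrams, assign)
    for _ in range(max_rounds):
        moved = False
        for w in words:
            original = assign[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                assign[w] = c
                score = class_bigram_objective(bigrams, assign)
                if score > best + 1e-12:
                    best, best_class = score, c
            assign[w] = best_class  # keep the best class found for this word
            moved = moved or best_class != original
        if not moved:
            break  # converged: no word changed class in a full pass
    return assign


if __name__ == "__main__":
    text = "the cat sat on the mat the dog sat on the rug".split()
    classes = exchange_clustering(text, num_classes=3)
    for c in sorted(set(classes.values())):
        print(c, sorted(w for w, k in classes.items() if k == c))
```

Even this toy version makes the cost visible: every pass evaluates each vocabulary word against each class, which is a plausible reason a real clustering run over a Europarl-sized vocabulary dominates step 1.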

-phi

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.

