[Moses-support] Moses step 1 – data preparation step
Hi,
Does anyone know what step 1 of the Moses training script does, other than producing the dictionaries and the numerical sentences that GIZA++ needs to do its job? I ask because on my machine step 1 takes just over 70 minutes for the en-fr Europarl corpus.
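For readers unfamiliar with these files, here is a minimal sketch of the conversion step 1 performs: build a frequency-ordered vocabulary ("dictionary") per language, then rewrite each sentence pair as integer ids. The layout is an assumption modelled on GIZA++'s .vcb/.snt conventions (id 1 reserved for UNK, real words numbered from 2, each pair written as a count line followed by source-id and target-id lines). The function names are hypothetical and this is not the Moses script itself.

```python
from collections import Counter

def build_vocab(sentences):
    """Map each token to (id, count), most frequent first.

    Assumption: id 1 is reserved for UNK and real words start at 2.
    Ties are broken alphabetically only to make the output deterministic.
    """
    counts = Counter(tok for sent in sentences for tok in sent.split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: (i + 2, count) for i, (word, count) in enumerate(ordered)}

def vcb_lines(vocab):
    """Render the vocabulary as 'id<TAB>word<TAB>count' lines, ordered by id."""
    lines = ["1\tUNK\t0"]
    for word, (idx, count) in sorted(vocab.items(), key=lambda kv: kv[1][0]):
        lines.append(f"{idx}\t{word}\t{count}")
    return lines

def snt_lines(src_sents, tgt_sents, src_vocab, tgt_vocab):
    """Encode each sentence pair as three lines: a count, source ids, target ids."""
    lines = []
    for src, tgt in zip(src_sents, tgt_sents):
        lines.append("1")  # occurrence count of this sentence pair
        lines.append(" ".join(str(src_vocab[w][0]) for w in src.split()))
        lines.append(" ".join(str(tgt_vocab[w][0]) for w in tgt.split()))
    return lines
```

With en = ["the house", "the book"] and fr = ["la maison", "le livre"], "the" gets id 2 in the English vocabulary and the first pair is encoded as the lines "1", "2 4", "2 5".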
My optimised version of data preparation plus IBM Model 1 EM completes in 121 seconds for five iterations of EM, that’s just over 2 minutes. Before publishing these results I want to make sure there’s nothing I’ve missed about step 1 of the training process. Does it do anything that influences GIZA++ other than preparing the numerical sentences?
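As a reference point for what "five iterations of EM for IBM Model 1" involves, here is a minimal, unoptimised sketch of the textbook algorithm estimating t(f | e), with a NULL source word added to every sentence. It is not James's optimised version nor GIZA++'s code; the function name and the NULL token string are hypothetical.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical empty-word token prepended to every source sentence

def ibm1_em(pairs, iterations=5):
    """Estimate t(f | e) for IBM Model 1 with EM.

    pairs: list of (source_tokens, target_tokens). Each target word f is
    assumed to be generated by one source word e or by NULL.
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)], initialised uniformly
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            es = [NULL] + list(es)
            for f in fs:
                # E-step: posterior probability that each e generated this f
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts so that sum_f t(f | e) = 1
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a toy corpus such as ("the house", "la maison"), ("the book", "le livre"), ("a book", "un livre"), t("livre" | "book") rises above t("le" | "book") because "livre" is the only French word that co-occurs with "book" twice.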
James
[Moses-support] Re: Moses step 1 – data preparation step
Hi,
Yes, it is correct that step 1 just does the data preparation for GIZA++. The most time-consuming part is running mkcls to create the word classes for the relative distortion models.
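mkcls clusters words by roughly optimising a class-bigram likelihood, which is why it dominates step 1 on a large corpus. The sketch below illustrates that kind of objective with a plain greedy exchange search: each word is moved to whichever class most improves the score. The exchange search is a swapped-in, deliberately naive optimiser (it recomputes the whole objective for every candidate move), not mkcls's own algorithm, and the function names are hypothetical.

```python
import math
from collections import Counter

def class_bigram_objective(tokens, word_class):
    """Class-bigram log-likelihood, up to terms independent of the clustering.

    Computes sum N(c1,c2) log N(c1,c2) - 2 * sum N(c) log N(c); the factor 2
    approximates separate left/right class counts with one unigram count.
    """
    classes = [word_class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    unigrams = Counter(classes)
    return (sum(n * math.log(n) for n in bigrams.values())
            - 2 * sum(n * math.log(n) for n in unigrams.values()))

def exchange_clustering(tokens, num_classes, max_passes=10):
    """Greedy exchange search: move each word to its best class until nothing moves."""
    freq = Counter(tokens)
    vocab = sorted(freq, key=lambda w: (-freq[w], w))
    # Initialise by dealing words round-robin into classes, most frequent first.
    word_class = {w: i % num_classes for i, w in enumerate(vocab)}
    best = class_bigram_objective(tokens, word_class)
    for _ in range(max_passes):
        moved = False
        for w in vocab:
            start = best_class = word_class[w]
            for c in range(num_classes):
                word_class[w] = c
                score = class_bigram_objective(tokens, word_class)
                if score > best + 1e-9:  # accept only strict improvements
                    best, best_class = score, c
            word_class[w] = best_class
            moved = moved or best_class != start
        if not moved:
            break
    return word_class
```

Because only improving moves are accepted, the final objective is never worse than the round-robin starting point; real tools use much faster incremental updates than this full recomputation.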
-phi
NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.