Moses Support Digest: running giza in parts

[Moses-support] running giza in parts

Dear list,

Can anyone direct me to a description of the exact algorithm for running GIZA++ in parts? I know the co-occurrence file is used for more memory-efficient storage of the translation table and probably defines which word pairs are to be included in the t-table. However, I’m not sure how several co-occurrence files are combined when the training data is processed in several parts (--parts N). I tried reading the training script (the “run_single_giza_on_parts” sub), but the algorithm is still a mystery to me.

Thank you in advance,
Mark Fishel


Re: [Moses-support] running giza in parts

Hi,

The running-in-parts option only affects the creation of the “cooc” file, which exists mostly for memory efficiency, so that GIZA++ does not run out of memory. It only makes sense to use this option if the cooc file creation itself runs out of memory.

-phi
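
To make the reply above more concrete, here is a minimal Python sketch of what building a co-occurrence list in parts could look like. It is not taken from the Moses training script. It assumes that each part yields a sorted list of (source id, target id) pairs and that the per-part lists are combined by a sorted merge that drops duplicates, so the result equals what a single pass over the whole corpus would give. All function names are hypothetical, and the sketch keeps everything in memory, whereas a real run would write each part to disk.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple

# A sentence pair is (source word ids, target word ids).
SentencePair = Tuple[List[int], List[int]]
WordPair = Tuple[int, int]


def cooc_for_part(part: Sequence[SentencePair]) -> List[WordPair]:
    """Collect every (source id, target id) pair that co-occurs in a sentence pair.

    Only one part is held in memory at a time, which is the point of splitting.
    """
    seen = set()
    for src, tgt in part:
        for s in src:
            for t in tgt:
                seen.add((s, t))
    return sorted(seen)


def split_into_parts(pairs: Sequence[SentencePair], n: int) -> List[Sequence[SentencePair]]:
    """Cut the corpus into at most n contiguous chunks of roughly equal size."""
    size = max(1, -(-len(pairs) // n))  # ceiling division, at least 1
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def merge_sorted_unique(lists: Sequence[List[WordPair]]) -> Iterator[WordPair]:
    """Merge already-sorted lists and skip duplicates (assumed combination step)."""
    previous = None
    for item in heapq.merge(*lists):
        if item != previous:
            yield item
            previous = item


def cooc_in_parts(pairs: Sequence[SentencePair], n: int) -> List[WordPair]:
    """Build the co-occurrence list part by part, then merge the partial lists."""
    part_lists = [cooc_for_part(p) for p in split_into_parts(pairs, n)]
    return list(merge_sorted_unique(part_lists))
```

For a toy corpus such as `[([1, 2], [3]), ([2], [4]), ([1], [3, 4])]`, calling `cooc_in_parts(corpus, 2)` and `cooc_in_parts(corpus, 1)` return the same list under these assumptions, which illustrates why the number of parts should change memory use during cooc creation but not the resulting word pairs.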

NOTICE: This is digested from the Moses-support mailing list, which supports the Moses SMT decoder.
