I Love Natural Language Processing


Moses Support Digest: running giza in parts


[Moses-support] running giza in parts

Dear list,

Can anyone point me to a description of the exact algorithm for running GIZA++ in parts? I know the co-occurrence file is used to store the translation table more memory-efficiently, and that it probably defines which word pairs are included in the t-table. However, I'm not sure how several co-occurrence files are combined when the training data is processed in several parts (--parts N). I tried reading the training script (the “run_single_giza_on_parts” sub), but the algorithm is still a mystery to me.

Thank you in advance,
Mark Fishel
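Mark's description of the co-occurrence file can be made concrete with a minimal sketch. It rests on several assumptions. Sentences are already mapped to integer word IDs. ID 0 is a hypothetical stand-in for GIZA++'s empty (NULL) source word. The t-table is a plain dictionary initialized uniformly, which is only an illustrative choice. The names cooc_pairs and init_t_table are made up for this example and are not part of GIZA++. Only pairs that co-occur in at least one sentence pair get a t-table entry, and that is where the memory saving comes from.

```python
from collections import defaultdict

NULL = 0  # assumption: word ID 0 stands for the empty (NULL) source word


def cooc_pairs(corpus):
    """Set of (source_id, target_id) pairs that co-occur in at least one
    sentence pair; every target word is also paired with NULL."""
    pairs = set()
    for src, trg in corpus:
        for e in (NULL, *src):
            for f in trg:
                pairs.add((e, f))
    return pairs


def init_t_table(pairs):
    """Sparse t-table: only co-occurring pairs get an entry, initialized
    uniformly over the target words seen with each source word."""
    targets = defaultdict(set)
    for e, f in pairs:
        targets[e].add(f)
    return {(e, f): 1.0 / len(targets[e]) for e, f in pairs}


# Toy corpus of (source, target) sentence pairs as word IDs; IDs are made up.
corpus = [((1, 2), (10, 11)), ((1, 3), (10, 12))]
t = init_t_table(cooc_pairs(corpus))

src_vocab = {NULL} | {e for src, _ in corpus for e in src}
trg_vocab = {f for _, trg in corpus for f in trg}
print(len(t), "of", len(src_vocab) * len(trg_vocab), "pairs stored")  # 10 of 12 pairs stored
```

On this toy corpus, 10 of the 12 possible source/target pairs get an entry. With real vocabularies the stored fraction is typically far smaller, because most word pairs never appear in the same sentence pair.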


Re: [Moses-support] running giza in parts

Hi,

The running-in-parts option only affects the creation of the “cooc” file. That file exists mostly for memory efficiency, so that GIZA++ does not run out of memory. It only makes sense to use this option if cooc file creation itself runs out of memory.
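Phi's answer implies that running in parts changes how much memory cooc creation needs, not which pairs end up in the final file. A pair co-occurs in the corpus exactly when it co-occurs in some sentence pair, and every sentence pair falls into exactly one part. The union of the per-part pair sets therefore equals the pair set of the whole corpus.

The sketch below assumes, hypothetically, that each part's pairs are sorted and then merged with duplicates removed, similar in spirit to sort -m -u. The exact commands in run_single_giza_on_parts have not been checked here. The names split_into_parts and merge_sorted_parts are illustrative only.

```python
import heapq
from itertools import groupby

NULL = 0  # assumption: word ID 0 stands for the empty (NULL) source word


def cooc_pairs(corpus):
    """Set of (source_id, target_id) pairs co-occurring in some sentence pair."""
    pairs = set()
    for src, trg in corpus:
        for e in (NULL, *src):
            for f in trg:
                pairs.add((e, f))
    return pairs


def split_into_parts(corpus, n_parts):
    """Split the corpus into at most n_parts contiguous chunks."""
    size = max(1, -(-len(corpus) // n_parts))  # ceiling division
    return [corpus[i:i + size] for i in range(0, len(corpus), size)]


def merge_sorted_parts(sorted_parts):
    """Merge sorted per-part pair lists and drop duplicates.
    heapq.merge reads its inputs lazily; the output is collected
    into a list here only for simplicity."""
    return [pair for pair, _ in groupby(heapq.merge(*sorted_parts))]


# Toy corpus of (source, target) sentence pairs as word IDs; IDs are made up.
corpus = [((1, 2), (10, 11)), ((1, 3), (10, 12)), ((2,), (11,)), ((4, 1), (13, 10))]
parts = split_into_parts(corpus, 2)
per_part = [sorted(cooc_pairs(part)) for part in parts]  # one "cooc file" per part
merged = merge_sorted_parts(per_part)
print(merged == sorted(cooc_pairs(corpus)))  # True
```

Pairs that appear in several parts collapse to a single entry during the merge. The merged result is therefore identical to a single-pass cooc file, while each per-part step only has to hold one part's pairs at a time.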

-phi

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.

Related posts:

  1. Moses Support Digest: About giza++ options when running moses
  2. Moses Support Digest: GIZA++ error
  3. Moses Support Digest: How to run giza++ with a dictionary
  4. Moses Support Digest: problem running mosesserver
  5. Moses Support Digest: dictionary problem solved
  6. Moses Support Digest: moses-irstlm memory racing with 5-gram lm
  7. Moses Support Digest: Alignment information from binary phrase table
  8. Moses Support Digest: Moses Error in training phrase
  9. Moses Support Digest: Moses seems to hang
  10. Moses Support Digest: Moses step 1 – data preparation step

Written by 52nlp

December 26th, 2009 at 10:57 pm

Posted in Moses, SMT
