Moses Support Digest: running giza in parts
[Moses-support] running giza in parts
Dear list,
can anyone direct me to a description of the exact algorithm for running GIZA++ in parts? I know the co-occurrence file is used for more memory-efficient storage of the translation table, and it probably defines which word pairs are included in the t-table. However, I’m not sure how several co-occurrence files are combined when the training data is processed in several parts (--parts N). I tried reading the training script (the “run_single_giza_on_parts” sub), and the algorithm is still a mystery to me.
Thank you in advance,
Mark Fishel
Re: [Moses-support] running giza in parts
Hi,
the running-in-parts option only affects the creation of the “cooc” file, which exists mostly for memory efficiency, so that GIZA++ does not run out of memory. It only makes sense to use this option if the cooc file creation itself runs out of memory.
-phi
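To make the reply concrete, here is a minimal Python sketch of how a co-occurrence list could be built in parts and then combined. It is not the code of the Moses training script. It assumes that a cooc file is just a sorted list of unique (source word ID, target word ID) pairs and that the per-part files are combined by a plain sorted union. The function names `cooc_for_part`, `merge_cooc_parts` and `cooc_in_parts` are hypothetical and chosen for illustration only.

```python
import heapq
from itertools import groupby


def cooc_for_part(sentence_pairs):
    """Return the sorted, unique (source_id, target_id) pairs seen in one part.

    Only this one part's pairs are held in memory at a time.
    """
    pairs = set()
    for src_ids, tgt_ids in sentence_pairs:
        for s in src_ids:
            for t in tgt_ids:
                pairs.add((s, t))
    return sorted(pairs)


def merge_cooc_parts(parts):
    """Merge sorted per-part lists into one sorted list without duplicates.

    Assumption: combining parts is a plain union of their word pairs.
    """
    merged = heapq.merge(*parts)  # lazy k-way merge of sorted inputs
    return [pair for pair, _ in groupby(merged)]  # collapse repeated pairs


def cooc_in_parts(sentence_pairs, n_parts):
    """Split the corpus into n_parts chunks, build each cooc list, then merge."""
    size = max(1, -(-len(sentence_pairs) // n_parts))  # ceiling division
    chunks = [sentence_pairs[i:i + size]
              for i in range(0, len(sentence_pairs), size)]
    return merge_cooc_parts([cooc_for_part(chunk) for chunk in chunks])


# Toy corpus of (source word IDs, target word IDs) per sentence pair.
corpus = [([1, 2], [3]), ([2], [3, 4]), ([1], [4])]
print(cooc_in_parts(corpus, 2))
```

Under these assumptions the merged result is the same as the one computed in a single pass over the whole corpus. That matches the point of the reply: splitting changes how much memory the cooc step needs, not which word pairs end up in the table.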
NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.