Moses Support Digest: Dictionary use during training

[Moses-support] Dictionary use during training

Hi all, I’m wondering if you know where I can find an English-to-Spanish parallel, word-to-word dictionary to complement my training corpus.

Also, from what I have found, I understand you can either add the dictionary words at the end of the corpus or use the GIZA++ dictionary option. I would like to try both, but for the GIZA++ -d option I see that the file format uses word IDs. Where do the actual words (from the parallel dictionary) go then? In the corpus as well, or in a separate file?

Any other suggestions for using a dictionary are welcome.

Thank you.

Re: [Moses-support] Dictionary use during training

Re: adding dictionary entries. This is certainly a hack, but the standard trick is to pretend that the dictionary actually consists of tiny parallel sentences. You therefore just append each word entry as a new sentence pair, one line on each side of the corpus. Don’t bother with the -d option.

Miles
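
The trick described in the reply can be sketched in a few lines of Python. This is only an illustration, not part of Moses or GIZA++: the function names are hypothetical, and it assumes the dictionary is a plain text file with one tab-separated "source<TAB>target" entry per line, which may not match the format of the dictionary you find.

```python
def parse_dictionary(lines):
    """Parse 'source<TAB>target' lines into (source, target) pairs.

    The tab-separated layout is an assumption made for this sketch.
    Blank lines and lines without a tab are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        source, target = line.split("\t", 1)
        source, target = source.strip(), target.strip()
        if source and target:
            pairs.append((source, target))
    return pairs


def append_dictionary(source_sentences, target_sentences, pairs):
    """Return new corpus sides with each dictionary entry added
    as its own tiny sentence pair at the end."""
    if len(source_sentences) != len(target_sentences):
        raise ValueError("both sides of the corpus must have the same number of lines")
    new_source = list(source_sentences) + [s for s, _ in pairs]
    new_target = list(target_sentences) + [t for _, t in pairs]
    return new_source, new_target


if __name__ == "__main__":
    english = ["the house is big"]
    spanish = ["la casa es grande"]
    entries = parse_dictionary(["house\tcasa", "dog\tperro"])
    english, spanish = append_dictionary(english, spanish, entries)
    for en, es in zip(english, spanish):
        print(en, "|||", es)
```

The two sides must stay line-aligned, so every entry adds exactly one line to each side. The dictionary entries should also go through the same tokenization and lowercasing as the rest of the corpus before training.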

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.
