I Love Natural Language Processing

Moses Support Digest: Dictionary use during training

[Moses-support] Dictionary use during training

Hi all, I’m wondering whether anyone knows where I can find an English-to-Spanish parallel, word-to-word dictionary to complement my training corpus.

Also, from what I have found, you can either append the dictionary words to the end of the corpus or use the GIZA++ dictionary option. I would like to try both, but the file format for the GIZA++ -d option uses word IDs. Where do the actual words from the parallel dictionary go in that case: in the corpus as well, or in a separate file?

Any other suggestions for using a dictionary are welcome.

Thank you.

Re: [Moses-support] Dictionary use during training

Re: adding dictionary entries: this is certainly a hack, but the standard trick is to pretend that the dictionary consists of tiny parallel sentences. You therefore just append each word entry to the training corpus as a new sentence pair. Don’t bother with the -d option.

Miles
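
The trick described in the reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original thread: the tab-separated dictionary format and the function names read_dictionary and append_dictionary are hypothetical. Each dictionary entry becomes one extra line on both sides of the parallel corpus, so the two sides stay line-aligned.

```python
def read_dictionary(lines):
    """Parse 'english<TAB>spanish' lines into (english, spanish) pairs.

    The tab-separated layout is an assumption made for this sketch;
    adapt the parsing to whatever format your dictionary uses.
    """
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        en, es = parts[0].strip(), parts[1].strip()
        if en and es:  # skip entries with an empty side
            pairs.append((en, es))
    return pairs


def append_dictionary(en_lines, es_lines, pairs):
    """Return new corpus sides with each entry added as a tiny sentence pair."""
    if len(en_lines) != len(es_lines):
        raise ValueError("parallel corpus sides differ in length")
    en_out, es_out = list(en_lines), list(es_lines)
    for en, es in pairs:
        # Append to both sides together so line i still aligns with line i.
        en_out.append(en)
        es_out.append(es)
    return en_out, es_out


if __name__ == "__main__":
    corpus_en = ["the house is red"]
    corpus_es = ["la casa es roja"]
    entries = read_dictionary(["house\tcasa\n", "dog\tperro\n"])
    new_en, new_es = append_dictionary(corpus_en, corpus_es, entries)
    for en, es in zip(new_en, new_es):
        print(en, "|||", es)
```

In practice the two returned lists would be written back to the source and target corpus files before training. The dictionary entries should probably go through the same tokenization and casing steps as the rest of the corpus, so that the appended words match the vocabulary seen elsewhere.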

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.

Written by 52nlp

February 24th, 2010 at 8:37 pm

Posted in Moses,SMT
