[Moses-support] Dictionary use during training
Hi all, I’m wondering if you know where I can find an English-to-Spanish parallel, word-to-word dictionary to complement my training corpus.
Also, from what I have found, you can either append the dictionary words to the end of the corpus or use the GIZA++ dictionary option. I would like to try both, but the file format for the GIZA++ -d option uses word IDs. Where do the actual words from the parallel dictionary go in that case: in the corpus as well, or in a separate file?
Any other suggestions for using a dictionary are welcome.
Thank you.
Re: [Moses-support] Dictionary use during training
Re: adding dictionary entries: this is certainly a hack, but the standard trick is to pretend that the dictionary actually consists of tiny parallel sentences. You therefore just append each word entry as a new sentence pair. Don’t bother with the -d option.
Miles
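The append trick described in the reply can be sketched in a few lines of Python. This is only an illustration, not part of Moses or GIZA++: the function name `append_dictionary`, the file names, and the tab-separated `source<TAB>target` dictionary format are all assumptions. It also assumes the corpus is stored as two line-aligned files, one per language.

```python
from pathlib import Path


def append_dictionary(src_corpus, tgt_corpus, dictionary, out_src, out_tgt):
    """Copy a line-aligned parallel corpus and append every dictionary
    entry as an extra one-line "sentence pair". Returns the new line count."""
    src_lines = Path(src_corpus).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_corpus).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError("corpus files are not line-aligned")

    for entry in Path(dictionary).read_text(encoding="utf-8").splitlines():
        # Assumed dictionary format: source<TAB>target, one entry per line.
        parts = entry.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, tgt = parts[0].strip(), parts[1].strip()
        if src and tgt:
            src_lines.append(src)
            tgt_lines.append(tgt)

    # Write both sides with the same number of lines so they stay aligned.
    Path(out_src).write_text("\n".join(src_lines) + "\n", encoding="utf-8")
    Path(out_tgt).write_text("\n".join(tgt_lines) + "\n", encoding="utf-8")
    return len(src_lines)
```

A call such as `append_dictionary("corpus.en", "corpus.es", "dict.tsv", "corpus+dict.en", "corpus+dict.es")` (hypothetical file names) would produce a new corpus pair to train on. The dictionary entries should probably go through the same tokenization and casing steps as the rest of the corpus, so that they match the words seen in the real sentences.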
NOTICE: This is digested from the Moses-support mailing list, which supports the Moses SMT decoder.