Archive for the ‘Dictionary’ tag
Moses Support Digest:Dictionary use during training
[Moses-support] Dictionary use during training
Hi all, I’m wondering if you know where I can find an English-to-Spanish parallel, word-to-word dictionary to complement my training corpus.
Also, from what I have found, I understand you can either add the dictionary words at the end of the corpus or use the GIZA++ dictionary option. I would like to try both, but for the GIZA++ option -d I see that the file format uses word ids. Where do the real words (from the parallel dictionary) go, then? In the corpus as well, or in a separate file?
Any other suggestions for using a dictionary are welcome.
Thank you.
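The first approach mentioned in this message (adding the dictionary words at the end of the corpus) can be sketched as follows. This is only a minimal illustration, not a Moses or GIZA++ tool: the function name and file layout are hypothetical, and it assumes both corpus files are line-aligned, UTF-8 encoded, and already end with a newline.

```python
def append_dictionary(source_corpus, target_corpus, word_pairs):
    """Append each (source_word, target_word) pair as a one-line sentence pair.

    Hypothetical helper: assumes both corpus files are line-aligned and
    end with a newline, so the appended entries stay aligned too.
    """
    with open(source_corpus, "a", encoding="utf-8") as src, \
         open(target_corpus, "a", encoding="utf-8") as tgt:
        for source_word, target_word in word_pairs:
            # One dictionary entry becomes one line in each file.
            src.write(source_word + "\n")
            tgt.write(target_word + "\n")
```

Each dictionary entry then behaves like a very short sentence pair during training, so the usual corpus preprocessing (tokenization, lowercasing) would presumably need to be applied to the entries as well.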
Moses Support Digest:dictionary problem solved
[Moses-support] dictionary problem solved
Hi all,
This dictionary problem is finally solved. The “-d” option works well. I made a silly mistake that caused the problem: I had converted the dictionary file to UTF-8, while the other files are encoded in 7-bit ASCII. Sorry to have bothered you for such a long time…
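A quick way to spot this kind of mismatch is to check how each input file decodes before running the aligner. The sketch below is an assumption-laden illustration, not part of GIZA++: the function name is hypothetical, and it only distinguishes plain ASCII, UTF-8 with a byte order mark, UTF-8 without one, and everything else.

```python
import codecs

def guess_encoding(path):
    """Return a rough encoding label for the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    # A UTF-8 byte order mark makes an otherwise ASCII-only file non-ASCII.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

Running it over the dictionary, the .vcb files, and the .snt file and comparing the labels would show whether one file stands out from the rest.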
I really appreciate your kind help, especially Mark Fishel and Chris Dyer. You have helped this newcomer a lot.
When I googled this dictionary problem, all I found was my own question. So, for those who want to use a dictionary and don’t know how, here’s my advice:
1. Well… make sure all your texts use the same encoding.
2. Check your GIZA++ source code, find the variable “useDict”, and make sure it’s set to true.
3. Add a “-d” option to your command, followed by your dictionary file. The dictionary should be in this format:
target-word-id source-word-id
It must be sorted by target-word-id.
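The id-based format in step 3 can be produced from a word-to-word dictionary by looking each word up in the vocabulary files. The sketch below is one possible way to do it, not something from the post: the function names are hypothetical, and it assumes each .vcb line has the form "id word count", which is how the GIZA++ README describes vocabulary files. Pairs containing a word missing from either vocabulary are skipped.

```python
def load_vcb(path):
    """Map word -> id, assuming each line looks like 'id word count'."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                vocab[parts[1]] = int(parts[0])
    return vocab

def build_id_dictionary(word_pairs, source_vocab, target_vocab):
    """Turn (source_word, target_word) pairs into sorted (target_id, source_id) pairs."""
    id_pairs = set()
    for source_word, target_word in word_pairs:
        # Words unseen in the training corpus have no id, so skip them.
        if source_word in source_vocab and target_word in target_vocab:
            id_pairs.add((target_vocab[target_word], source_vocab[source_word]))
    # Sorting the tuples orders them by target id first, as the format requires.
    return sorted(id_pairs)

def write_id_dictionary(path, id_pairs):
    """Write one 'target-word-id source-word-id' pair per line in plain ASCII."""
    with open(path, "w", encoding="ascii") as f:
        for target_id, source_id in id_pairs:
            f.write(f"{target_id} {source_id}\n")
```

Writing the output as ASCII keeps the dictionary consistent with step 1, since the file contains only digits, spaces, and newlines.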
Here’s my command line:
(Pay attention to the options that are set to 0 or 1; otherwise a lot of files will be generated.)
./GIZA++ \
  -CoocurrenceFile korean-chinese.cooc \
  -c korean-chinese-int-train.snt \
  -m1 5 -m2 0 -mh 5 -m3 3 -m4 3 \
  -model1dumpfrequency 1 \
  -model2dumpfrequency 1 \
  -model345dumpfrequency 1 \
  -hmmdumpfrequency 1 \
  -model4smoothfactor 0.4 \
  -nbestalignments 1 \
  -onlyaldumps 0 \
  -nodumps 0 \
  -nsmooth 4 \
  -d ck.txt \
  -o korean-chinese \
  -onlyaldumps 1 \
  -p0 0.999 \
  -s chinese.vcb \
  -t korean.vcb
2009-12-23
Best regards,
Lee Xianhua
Moses Support Digest:How to run giza++ with a dictionary
[Moses-support] How to run giza++ with a dictionary?
Hi all,
How do I run GIZA++ with a dictionary?
I’ve looked through both the Moses manual and the GIZA++ README, but there seems to be no answer to this question. All I could find out is the format of the dictionary file. Could somebody please help me with this?
My command line is like this:
./GIZA++-m3 -CoocurrenceFile en-ch.cooc -c en-ch-int-train.snt -m1 5 -m2 0 -mh 5 -m3 5 -m4 0 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 0 -nsmooth 4 -o en-ch -onlyaldumps 1 -p0 0.999 -s ch.vcb -t en.vcb > logec 2> errec &
Thanks in advance.
2009-12-21
Best regards,
Lee Xianhua
A Cool Dictionary for Natural Language Processing
I came across Professor Bill Wilson’s “The Natural Language Processing Dictionary” by accident tonight, and I think it is very cool for NLPers. Besides the NLP Dictionary, you can also find the Prolog, Artificial Intelligence, and Machine Learning dictionaries on the same web page. Below is from this Dictionary:
Read the rest of this entry »