I Love Natural Language Processing

I LOVE NLP

Archive for the ‘Dictionary’ tag

Moses Support Digest: Dictionary use during training

with one comment

[Moses-support] Dictionary use during training

Hi all, I’m wondering if you know where I can find an English-to-Spanish parallel, word-to-word dictionary to complement my training corpus.

Also, from what I have found, I understand you can either add the dictionary words at the end of the corpus or use the GIZA++ dictionary option. I would like to try both, but for the GIZA++ option -d I see that the file format uses word ids. Where do the real words (from the parallel dictionary) go, then? In the corpus as well, or in a separate file?
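The first approach mentioned above, adding the dictionary words at the end of the corpus, can be sketched roughly as follows. This is only an illustration: the tab-separated dictionary layout and the function names `parse_dictionary` and `extend_corpus` are assumptions made for this example and are not part of Moses or GIZA++.

```python
def parse_dictionary(lines):
    """Yield (source, target) pairs from tab-separated dictionary lines.

    The 'source<TAB>target' layout is an assumption for this sketch;
    lines that do not have exactly two non-empty fields are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2 and fields[0].strip() and fields[1].strip():
            yield fields[0].strip(), fields[1].strip()


def extend_corpus(src_sentences, tgt_sentences, pairs):
    """Return copies of both corpus sides with one extra line per entry.

    Each dictionary entry becomes a one-line "sentence pair", so the
    two sides stay aligned line by line.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("parallel corpus sides differ in length")
    pairs = list(pairs)
    return (
        src_sentences + [src for src, _ in pairs],
        tgt_sentences + [tgt for _, tgt in pairs],
    )
```

The extended lists would then be written back to the two corpus files before running the usual training steps, so the dictionary entries are treated like any other sentence pair.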

Any other suggestions for using a dictionary are welcome.

Thank you.

Written by 52nlp

February 24th, 2010 at 8:37 pm

Posted in Moses,SMT


Moses Support Digest: dictionary problem solved

without comments

[Moses-support] dictionary problem solved

Hi all,

This dictionary problem is finally solved: the “-d” option works well. I made a silly mistake that caused the problem. I had converted the dictionary file to UTF-8, while the other files were encoded as 7-bit ASCII. Sorry to have bothered you for such a long time…
I really appreciate your kind help, especially from Mark Fishel and Chris Dyer. You have helped this newcomer a lot ;)
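An encoding mismatch like the one described above can be spotted before training with a quick check. The sketch below is only one way to do it; the helper names `detect_encoding` and `encodings_by_file` are made up for this example, and the check only distinguishes plain ASCII, valid UTF-8, and everything else.

```python
def detect_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'other'.

    ASCII is tried first because every ASCII file is also valid UTF-8.
    """
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "other"


def encodings_by_file(paths):
    """Map each path to the label returned by detect_encoding."""
    result = {}
    for path in paths:
        with open(path, "rb") as handle:
            result[path] = detect_encoding(handle.read())
    return result
```

Calling `encodings_by_file` on the dictionary, the .vcb files, and the .snt file and comparing the labels should show whether one of them was saved differently from the rest.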

When I googled this dictionary problem, all I found was my own question. So, for those who want to use a dictionary and don’t know how, here is my advice:
1. Make sure all your text files use the same encoding.
2. Check your GIZA++ source code, find the variable “useDict”, and make sure it is set to true.
3. Add a “-d” option to your command, followed by your dictionary file. The dictionary should be in this format:
target-word-id source-word-id
It must be sorted by target-word-id.
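The dictionary format in step 3 can be produced from a word-level dictionary roughly as sketched below. This is a sketch under assumptions, not the official procedure: it assumes each .vcb line looks like "id token count", and the function names `load_vocab` and `to_id_dictionary` are hypothetical. Entries whose words do not appear in the vocabulary files are simply dropped.

```python
def load_vocab(lines):
    """Map token -> integer id from .vcb-style lines.

    Assumes each line is 'id token count' separated by whitespace;
    lines with fewer than two fields are ignored.
    """
    vocab = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            vocab[fields[1]] = int(fields[0])
    return vocab


def to_id_dictionary(pairs, src_vocab, tgt_vocab):
    """Turn (source_word, target_word) pairs into '-d' dictionary lines.

    Each output line is 'target-word-id source-word-id', and the lines
    are sorted numerically by target-word-id (then source-word-id).
    """
    rows = set()
    for src_word, tgt_word in pairs:
        if src_word in src_vocab and tgt_word in tgt_vocab:
            rows.add((tgt_vocab[tgt_word], src_vocab[src_word]))
    return [f"{tgt_id} {src_id}" for tgt_id, src_id in sorted(rows)]
```

The returned lines would be written, one per line, to the file passed to “-d”, using the same encoding as the other input files.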

Here’s my command line:
(You may need to know what the options set to 0 or 1 do, or a lot of files will be generated.)

./GIZA++ \
-CoocurrenceFile korean-chinese.cooc \
-c korean-chinese-int-train.snt \
-m1 5 -m2 0 -mh 5 -m3 3 -m4 3 \
-model1dumpfrequency 1 \
-model2dumpfrequency 1 \
-model345dumpfrequency 1 \
-hmmdumpfrequency 1 \
-model4smoothfactor 0.4 \
-nbestalignments 1 \
-onlyaldumps 0 \
-nodumps 0 \
-nsmooth 4 \
-d ck.txt \
-o korean-chinese \
-onlyaldumps 1 \
-p0 0.999 \
-s chinese.vcb \
-t korean.vcb

2009-12-23

Best regards,

Lee Xianhua

Written by 52nlp

December 23rd, 2009 at 11:50 pm

Posted in Moses,SMT


Moses Support Digest: How to run giza++ with a dictionary

without comments

[Moses-support] How to run giza++ with a dictionary?

Hi all,
How do I run GIZA++ with a dictionary?

I’ve looked through both the Moses manual and the GIZA++ README, but there seems to be no answer to this question. All I have learned is the format of the dictionary. Could somebody please help me with this?

My command line is like this:

./GIZA++-m3 -CoocurrenceFile en-ch.cooc -c en-ch-int-train.snt -m1 5 -m2 0 -mh 5 -m3 5 -m4 0 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 0 -nsmooth 4 -o en-ch -onlyaldumps 1 -p0 0.999 -s ch.vcb -t en.vcb > logec 2> errec &

Thanks in advance.

2009-12-21

Best regards,

Lee Xianhua

Written by 52nlp

December 21st, 2009 at 8:27 pm

Posted in Moses,SMT


A Cool Dictionary for Natural Language Processing

without comments

I came across Professor Bill Wilson’s “The Natural Language Processing Dictionary” by accident tonight, and thought it very cool for NLPers. Apart from the NLP Dictionary, you can also find the Prolog, Artificial Intelligence, and Machine Learning dictionaries on the same web page. Below is an excerpt from this dictionary:

Written by 52nlp

November 30th, 2009 at 11:53 pm