Moses Support Digest: GIZA++ error
[Moses-support] GIZA++ error
Sir,
While running GIZA++ with the command
./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt
we get the output below.
What is the co-occurrence file?
How can we fix this problem and run GIZA++?
reading vocabulary files
Source vocabulary list has 35497 unique tokens
Target vocabulary list has 71683 unique tokens
Calculating vocabulary frequencies from corpus corp.en_corp.ta.snt
Reading more sentence pairs into memory …
Corpus fits in memory, corpus has: 14035 sentence pairs.
Train total # sentence pairs (weighted): 14035
Size of source portion of the training corpus: 330148 tokens
Size of the target portion of the training corpus: 262033 tokens
In source portion of the training corpus, only 35496 unique tokens appeared
In target portion of the training corpus, only 71681 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 262033/(344183-14035)== 0.793683
ERROR: NO COOCURRENCE FILE GIVEN
Aborted
Thank you.
Re: [Moses-support] GIZA++ error
Hi Sreeja,
The error below occurs because you are running a GIZA++ build that was compiled to use co-occurrence files, but you did not pass one on the command line. When GIZA++ is compiled as part of the Moses toolkit, it is typically built to use co-occurrence files. This reduces memory usage (and computation time).
So there are two ways to solve this problem:
1. Recompile GIZA++ without the co-occurrence file option.
2. Provide a co-occurrence file to the GIZA++ binary you have already compiled.
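The thread does not spell out what a co-occurrence file contains. As a hedged illustration only, the shell sketch below computes the kind of list we understand such a file to hold: every distinct pair of source and target vocabulary ids that appear together in at least one sentence pair. Knowing these pairs in advance is a plausible reason for the memory savings mentioned above, since only word pairs that actually co-occur need table entries. The function name `cooc_pairs` is hypothetical, and the `.snt` layout and NULL-word handling are assumptions stated in the comments; use `snt2cooc.out` for the real file.

```shell
#!/bin/sh
# Illustrative approximation only; this is NOT the snt2cooc.out source,
# and the real tool's output (e.g. NULL-word handling) may differ.
# Assumed .snt layout, 3 lines per sentence pair:
#   line 1: repetition count
#   line 2: source word ids
#   line 3: target word ids
# Prints each distinct "source_id target_id" pair that occurs together
# in at least one sentence pair; id 0 stands for the NULL source word
# (an assumption), so every target id is also paired with 0.

cooc_pairs() {
  awk '
    NR % 3 == 2 { n_src = split($0, src, " ") }   # source ids
    NR % 3 == 0 {                                 # target ids
      n_tgt = split($0, tgt, " ")
      for (j = 1; j <= n_tgt; j++) {
        print 0, tgt[j]                           # NULL source word
        for (i = 1; i <= n_src; i++) print src[i], tgt[j]
      }
    }
  ' "$1" | sort -n -k1,1 -k2,2 -u
}

# Example: cooc_pairs corp.en_corp.ta.snt | head
```

Sorting numerically and dropping duplicates keeps the list compact: a pair that co-occurs in thousands of sentences is still listed once.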
To produce a co-occurrence file, use the following command:
snt2cooc.out
In your case, this would look like this:
snt2cooc.out corp.en.vcb corp.ta.vcb corp.en_corp.ta.snt > corp.cooc
This generates a co-occurrence file, corp.cooc, which you then need to pass to GIZA++ as follows:
./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt -CoocurrenceFile corp.cooc
The snt2cooc.out binary can be found in the GIZA++ compilation directory.
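The two steps in this reply can be combined into a small helper, so the co-occurrence file is always regenerated before training. This is a hedged sketch, not part of GIZA++ or Moses: the function name `run_giza_with_cooc` and the `GIZA_DIR` variable are hypothetical, and `GIZA_DIR` is assumed to point at the directory where both `GIZA++` and `snt2cooc.out` were compiled. Only the options shown in the thread (`-S`, `-T`, `-C`, `-CoocurrenceFile`) are used.

```shell
#!/bin/sh
# Hedged sketch, not part of GIZA++ or Moses: generate a co-occurrence
# file with snt2cooc.out, then run GIZA++ with -CoocurrenceFile.
# GIZA_DIR (hypothetical) should point at the GIZA++ compilation
# directory that contains both binaries; it defaults to ".".

run_giza_with_cooc() {
  src_vcb=$1   # source vocabulary, e.g. corp.en.vcb
  tgt_vcb=$2   # target vocabulary, e.g. corp.ta.vcb
  snt=$3       # sentence-pair file, e.g. corp.en_corp.ta.snt
  cooc=$4      # co-occurrence file to write, e.g. corp.cooc
  giza_dir=${GIZA_DIR:-.}

  # Stop early if any input file is missing or unreadable.
  for f in "$src_vcb" "$tgt_vcb" "$snt"; do
    if [ ! -r "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done

  # Step 1: write the co-occurrence file.
  "$giza_dir/snt2cooc.out" "$src_vcb" "$tgt_vcb" "$snt" > "$cooc" || return 1

  # Step 2: train with the co-occurrence file this GIZA++ build expects.
  "$giza_dir/GIZA++" -S "$src_vcb" -T "$tgt_vcb" -C "$snt" \
    -CoocurrenceFile "$cooc"
}

# Example call with the file names from this thread:
# GIZA_DIR=/path/to/GIZA++ run_giza_with_cooc \
#   corp.en.vcb corp.ta.vcb corp.en_corp.ta.snt corp.cooc
```

Checking the inputs first gives a clear message for a mistyped file name instead of a less obvious failure from one of the binaries.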
Cheers,
Germán Sanchis-Trilles
NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.