Archive for February, 2010
Moses Support Digest: GIZA++ error
[Moses-support] GIZA++ error
Sir,
While running GIZA++ with the command
./GIZA++ -S corp.en.vcb -T corp.ta.vcb -C corp.en_corp.ta.snt
we get the output shown below.
What is the cooccurrence file?
How can we rectify this problem and run GIZA++?
reading vocabulary files
Source vocabulary list has 35497 unique tokens
Target vocabulary list has 71683 unique tokens
Calculating vocabulary frequencies from corpus corp.en_corp.ta.snt
Reading more sentence pairs into memory …
Corpus fits in memory, corpus has: 14035 sentence pairs.
Train total # sentence pairs (weighted): 14035
Size of source portion of the training corpus: 330148 tokens
Size of the target portion of the training corpus: 262033 tokens
In source portion of the training corpus, only 35496 unique tokens appeared
In target portion of the training corpus, only 71681 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 262033/(344183-14035)== 0.793683
ERROR: NO COOCURRENCE FILE GIVEN
Aborted
Thank you.
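The lambda line in the log above can be checked by hand. A minimal Python sketch, assuming (from the numbers alone, not from the GIZA++ source) that 344183 is the source token count plus the number of sentence pairs:

```python
# Figures copied from the GIZA++ log above.
source_tokens = 330148
target_tokens = 262033
sentence_pairs = 14035

# Assumption: 344183 in the log is source tokens plus sentence pairs.
padded_source = source_tokens + sentence_pairs


def pp_lambda(target, padded, pairs):
    """Ratio printed on the 'lambda for PP calculation' line."""
    return target / (padded - pairs)


lam = pp_lambda(target_tokens, padded_source, sentence_pairs)
print(f"{lam:.6f}")
```

The arithmetic agrees with the log, so the corpus statistics look consistent; the abort appears to come only from the missing cooccurrence file named in the error message.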
Moses Support Digest: Is reordering model a must-be-used component to use?
[Moses-support] Is reordering model a “must-be-used” component to use?
I wonder if I may exclude the reordering model during search. I came up with a morpho-syntactic preprocessor that transforms the source language at both training time and run time, and the reordering model deals with the local reordering of words during translation. So there doesn’t seem to be a need for it if the preprocessor has completely removed local/global distortion, even for the language model? Is my hypothesis justified? Actually, in my subjective evaluation, using the phrase table and LM alone gives better results than using them together with the reordering model. But I am not sure of my theory.
Moses Support Digest: about the moses-chart reordering
[Moses-support] about the moses-chart reordering
Dear all:
Could you tell me where I can find the details of moses-chart reordering? It seems that the reordering table is quite different from that of moses. Are there any papers or documents on it? Thank you very much!
Best regards!
Jie Jiang
CNGL, School of Computing,
Dublin City University,
Glasnevin, Dublin 9.
Tel: +353 (0)1 700 6724
Moses Support Digest: Dictionary use during training
[Moses-support] Dictionary use during training
Hi all, I’m wondering if you know where I can find an English-to-Spanish parallel word-to-word dictionary to complement my training corpus.
Also, from what I have found, I understand you can either add the dictionary words at the end of the corpus or use the GIZA++ dictionary option. I would like to try both, but for the GIZA++ option -d I see that the file format uses word IDs. Where do the real words (from the parallel dictionary) go then: in the corpus as well, or in a separate file?
Any other suggestions for using a dictionary are welcome.
Thank you.
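The first option mentioned above, adding dictionary entries to the end of the corpus, might be sketched as follows in Python. The file names, the sample data, and the `append_dictionary` helper are hypothetical, and the sketch assumes both corpus files are line-aligned and already end with a newline.

```python
import tempfile
from pathlib import Path


def append_dictionary(src_path, tgt_path, entries):
    """Append each (source, target) entry as a one-line sentence pair."""
    with open(src_path, "a", encoding="utf-8") as src, \
         open(tgt_path, "a", encoding="utf-8") as tgt:
        for source_word, target_word in entries:
            # One entry per line on each side keeps the files aligned.
            src.write(source_word.strip() + "\n")
            tgt.write(target_word.strip() + "\n")


# Demo on throwaway files; all names and data here are hypothetical.
workdir = Path(tempfile.mkdtemp())
en_file = workdir / "corpus.en"
es_file = workdir / "corpus.es"
en_file.write_text("the house is big\n", encoding="utf-8")
es_file.write_text("la casa es grande\n", encoding="utf-8")

append_dictionary(en_file, es_file, [("house", "casa"), ("dog", "perro")])

en_lines = en_file.read_text(encoding="utf-8").splitlines()
es_lines = es_file.read_text(encoding="utf-8").splitlines()
print(len(en_lines), len(es_lines))
```

Because every dictionary entry becomes an ordinary sentence pair, the vocabulary and sentence files would need to be regenerated from the extended corpus before alignment.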
Moses Support Digest: 2.718 in the phrase-table
[Moses-support] 2.718 in the phrase-table
I am not sure of the exact purpose of the value 2.718. As far as I know, the value is used to prefer the hypothesis with fewer phrases over one that uses more phrases (words) for the same coverage of the source sentence during the prefix cost comparison inside the priority queue (for hypotheses that cover the same source range), for example (phrase_1 + phrase_2 + phrase_3), preferring hyp1 to hyp2 by multiplying hyp1 by 2.718. This is how I understand the use of the value, since a longer phrase empirically gives a better translation than one made up of word-based translations. Can anyone confirm my belief or correct my conclusion?
P.S. Is 2.718 Euler’s number? If it is, why is the weight set to that value?
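On the p.s.: 2.718 is Euler's number e rounded to three decimals, so its natural logarithm is approximately 1. In a log-linear model, a constant feature of this kind contributes roughly the feature weight times the number of phrases to a hypothesis score, which makes it act as a phrase count. A minimal Python sketch; the function name and the weights are illustrative assumptions, not values taken from Moses:

```python
import math

PHRASE_FEATURE = 2.718  # constant stored with each phrase-table entry


def phrase_count_score(num_phrases, weight):
    """Log-linear contribution of the constant feature over a hypothesis."""
    return weight * num_phrases * math.log(PHRASE_FEATURE)


# Two hypotheses covering the same source span with 2 and 3 phrases.
for weight in (-0.5, 0.5):  # illustrative weights only
    two = phrase_count_score(2, weight)
    three = phrase_count_score(3, weight)
    prefers = "fewer" if two > three else "more"
    print(f"weight={weight:+.1f}: prefers {prefers} phrases")
```

Whether fewer or more phrases are preferred therefore depends on the sign of the tuned weight, not on the value 2.718 itself.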
Moses Support Digest: mert extractor
[Moses-support] mert extractor
Another basic question: When trying to run the mert-moses-new.pl script it seems to require a mert script called extractor, but my training/cmert-0.5/ directory does not contain such a script.
The mert-moses-new.pl script contains these two lines:
my $mert_extract_cmd = "$mertdir/extractor";
my $mert_mert_cmd = "$mertdir/mert";
Where can I find the extractor, or what could be the reason it didn’t get included when executing "make release"?
Thanks!
Marce
Moses Support Digest: Translation from English to Foreign Language
[Moses-support] Translation from English to Foreign Language
Is it possible to translate with Moses from English into any foreign language?
I ask because I tried using Moses, and at the “Sanity Check Trained Model” step the best translation is always UNK for English-to-foreign, whereas translating foreign-to-English works fine.
I modified the Train Phrase Model command to:
nohup nice tools/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl -scripts-root-dir tools/moses-scripts/scripts-YYYYMMDD-HHMM/ -root-dir work -corpus work/corpus/news-commentary.lowercased -e en -f id -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/jschroe1/demo/work/lm/news-commentary.lm >& work/training.out &
The original was:
nohup nice tools/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl -scripts-root-dir tools/moses-scripts/scripts-YYYYMMDD-HHMM/ -root-dir work -corpus work/corpus/news-commentary.lowercased -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/jschroe1/demo/work/lm/news-commentary.lm >& work/training.out &
I’ve built a language model for my foreign language, not an English language model. Is there anything I missed here?
Laurent
Moses Support Digest: Moses licensing terms when used in a commercial product
[Moses-support] Moses licensing terms when used in a commercial product
Hi Philipp,
We are very interested in using Moses for our language translation needs. We would like to know the licence and payment terms and conditions for building a product that uses Moses as a translation engine on a domain-specific corpus. Here are our questions:
1. When our product internally uses the Moses decoder to translate from one language to another, do we need to buy a licence for the Moses decoder?
2. Or, because it is open source under a GNU licence, does it allow the development of commercial products that internally use the Moses decoder?
Thanks & Regards,
Abhinandan