Moses Support Digest: moses decoder results on cygwin and dos
[Moses-support] moses decoder results on cygwin and dos
Dear All
Running the moses decoder on cygwin and dos gives slightly different results, even though I’m using the same executable and the same models.
For example, translating from Welsh to English:
Welsh: bydd y bore ‘n oer .
English: the morning will be cold .
moses at cygwin: morning will be cold .
moses at dos: bydd the morning will be cold .
The main problem is that on dos, moses is always returning the first word of the source language, prepended to the translation itself. Easy to strip off but annoying. The translation itself is often slightly better on dos than on cygwin, as above (which is if anything even stranger).
Can anyone account for this strange behaviour? More important, how can I stop the first word of source language returning?
Thanks and best wishes
Ivan
Re:[Moses-support] moses decoder results on cygwin and dos
hi ivan
i think this might be a problem with line endings again. The non-printing carriage-return character (0x0D, decimal 13) is being appended to the 1st input word, which causes it to be unrecognised, so it is output verbatim. Cygwin probably has internal code which strips out this character. Make sure you convert all text files to unix line endings using dos2unix.
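One way to test the line-ending theory is to count carriage returns in whatever is piped to the decoder. The sketch below is only illustrative: the helper names `count_cr` and `strip_cr` and the sample line are made up for this example, and it assumes a POSIX shell with `tr` and `wc` (as found under Cygwin).

```shell
# count_cr: print the number of carriage-return (0x0D) bytes read from stdin.
count_cr() {
  # tr -cd '\r' deletes every byte except CR; wc -c counts what is left.
  # The arithmetic expansion just trims any padding wc adds to its output.
  echo $(( $(tr -cd '\r' | wc -c) ))
}

# strip_cr: remove carriage returns from stdin, which is roughly what
# dos2unix does to CR LF line endings.
strip_cr() {
  tr -d '\r'
}

# Hypothetical sample line with a DOS (CR LF) ending.
printf 'bydd y bore\r\n' | count_cr             # count before cleaning
printf 'bydd y bore\r\n' | strip_cr | count_cr  # count after cleaning
```

If the count is already zero for the real input, line endings are unlikely to be the culprit.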
Re:[Moses-support] moses decoder results on cygwin and dos
Hieu
Thanks for your comment.
How can this be a line-ending issue? Where are line-endings involved?
What is appending an extra character to the first word and why?
The 1st input word *is* being recognised and translated (as I said, the translations under dos are correct) — “bydd” translates to “will be”.
I’m using identical material under cygwin and dos, the only difference
is under cygwin I’m using a shell script and under dos I’m using a
“.bat” file. If it is a line-ending issue why is it affecting dos and
not cygwin?
Re:[Moses-support] moses decoder results on cygwin and dos
Hi,
I have seen text files under windows that add a starting byte to indicate the encoding of the file. Since the first word is a problem, this may be the cause.
-phi
Re:[Moses-support] moses decoder results on cygwin and dos
I am not sending text to moses from a text file, I am using the command-line:
m.bat contains:
echo %1 | c:\cygwin\path\to\moses.exe -f c:\cygwin\path\to\moses.ini 2> msc_tywyddTeletestun.err
Usage:
> m.bat "bydd y bore 'n oer ."
"bydd the morning will be cold ."
Re:[Moses-support] moses decoder results on cygwin and dos
the echo in dos & unix works slightly differently.
dos: echo "c"
"c"
unix: echo "c"
c
the " char stays attached to the 1st word, so the decoder sees an unknown token. oh the joys of the command line….
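The effect described above can be reproduced without the decoder. This is a minimal sketch, assuming a POSIX shell; the helpers `strip_outer_quotes` and `first_token` are hypothetical names invented for the illustration, and the sample string uses a plain ASCII apostrophe.

```shell
# strip_outer_quotes: drop one leading and one trailing double quote, if present.
strip_outer_quotes() {
  s=$1
  s=${s#\"}   # remove a leading "
  s=${s%\"}   # remove a trailing "
  printf '%s\n' "$s"
}

# first_token: print everything up to the first space.
first_token() {
  printf '%s\n' "${1%% *}"
}

# What the dos echo effectively sends down the pipe: the quotes are
# part of the text rather than being consumed by the shell.
dos_style='"bydd y bore '\''n oer ."'

first_token "$dos_style"                          # token still carries the quote
first_token "$(strip_outer_quotes "$dos_style")"  # token is the bare word
```

With the quote attached, the first token no longer matches the vocabulary entry for the bare word, which is consistent with it being copied through untranslated. Inside a batch file, cmd.exe's `%~1` expands the first argument with its surrounding quotes removed, so `echo %~1` might be an alternative to switching to the cygwin echo; that variant was not tried in this thread.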
Re:[Moses-support] moses decoder results on cygwin and dos
Dear Hieu
This was the vital clue I needed. Thanks! If I make sure I use the cygwin echo instead of the dos echo, I get the expected result without the repeated first word.
Why is the dos echo idiosyncratic and unreliable? I think we all know the answer to that.
Thanks again for persevering with suggestions.
Best
Ivan
NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.