Moses Support Digest: different bleu scores from nist and moses scripts

[Moses-support] different bleu scores from nist and moses scripts

Dear list,

I am getting different BLEU scores from the NIST mteval script (version) and the multi-bleu.perl script from the Moses distribution for the same reference and hypothesis translations; even the individual n-gram precisions differ:

BLEU = 16.80, 53.0/26.2/13.4/6.4 (BP=0.905, ratio=0.909, hyp_len=281, ref_len=309)

and

BLEU score = 0.1681 for system “x”

Individual N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
        ------   ------   ------   ------   ------   ------   ------   ------   ------
 BLEU:  0.5246   0.2591   0.1326   0.0630   0.0328   0.0213   0.0133   0.0046   0.0000  “x”

The files that produced the scores are here: mtj.ut.ee/diffbleu.tgz .

Does everyone else get different scores? Can anyone suggest a reason? It is not the smoothing in the NIST script, and both scripts support UTF-8 I/O, etc., so I'm out of ideas; before comparing the implementations I wanted to ask for opinions.

Thanks in advance,
Mark

Re:[Moses-support] different bleu scores from nist and moses scripts

IIRC, the principal difference is the calculation of the brevity penalty, but there also seem to be some slight differences in tokenization between the scripts.
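For anyone who wants to check the arithmetic, here is a rough sketch that recombines the numbers quoted above. It assumes the textbook BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions) and has not been checked against either script's source; the function names are made up for illustration. The NIST output above does not print a brevity penalty, so the sketch only backs out the value implied by the reported score under that assumption.

```python
import math

def brevity_penalty(hyp_len, ref_len):
    """Textbook BLEU brevity penalty: 1 if the hypothesis is at least as
    long as the reference, otherwise exp(1 - ref_len / hyp_len)."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)

def geometric_mean(precisions):
    """Geometric mean of the n-gram precisions (0 if any precision is 0)."""
    if min(precisions) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# Figures copied from the multi-bleu.perl line in the original post.
multi_bp = brevity_penalty(hyp_len=281, ref_len=309)
multi_bleu = multi_bp * geometric_mean([0.530, 0.262, 0.134, 0.064])

# Figures copied from the NIST output (1- to 4-gram columns only).
nist_gm = geometric_mean([0.5246, 0.2591, 0.1326, 0.0630])
# Assumption: the reported 0.1681 is BP * geometric mean, so divide it out.
nist_implied_bp = 0.1681 / nist_gm

print(f"multi-bleu: BP = {multi_bp:.3f}, BLEU = {100 * multi_bleu:.2f}")
print(f"NIST: geometric mean = {nist_gm:.4f}, implied BP = {nist_implied_bp:.3f}")
```

Under these assumptions the multi-bleu figures reproduce themselves (BP of about 0.905, BLEU of about 16.8 from the rounded precisions), while the NIST figures imply a brevity penalty of about 0.916. That is what you would expect if the two scripts count a different number of hypothesis tokens, though only a comparison of the tokenized files can confirm it.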

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.
