I Love Natural Language Processing


Moses Support Digest: ConfusionNet::GetSubString error when using lattice with UTF8 input


[Moses-support] ConfusionNet::GetSubString error when using lattice with UTF8 input

Hi,

I’m running into a strange problem: moses crashes when it is fed a lattice that contains non-ASCII characters. If the input is:

((('damit',1.0,1),),(('ist',1.0,1),('war',1.0,1),('sei',1.0,1),),(('der',1.0,1),('die',1.0,1),('das',1.0,1),),(('arbeitsplan',1.0,1),),)

then moses completes without any problems. However, if I change the
last edge into Chinese,

((('damit',1.0,1),),(('ist',1.0,1),('war',1.0,1),('sei',1.0,1),),(('der',1.0,1),('die',1.0,1),('das',1.0,1),),(('给',1.0,1),),)

... then moses crashes with the following error:


Translating: word lattice: 4
0 -- (damit , 0.000, 1)
1 -- (ist , 0.000, 1) (war , 0.000, 1) (sei , 0.000, 1)
2 -- (der , 0.000, 1) (die , 0.000, 1) (das , 0.000, 1)
3 -- (给 , 0.000, 1)

path stats for current CN:
CN (full): 8.000 15.000 18.000 9.000
CN (explored): 1 0 0 0
ERROR: call to ConfusionNet::GetSubString

My command is:

./???/moses-cmd/src/moses \
  -inputtype 2 -weight-i 0 \
  -config `pwd`/moses.ini \
  -input-file `pwd`/src.lattice \
  -verbose 3

And below is my moses.ini, although there’s nothing particularly
interesting about it:

# MERT optimized configuration
# decoder /???/moses-cmd/src/moses
# BLEU 0.517812 -> 0.517945 on dev /???/working/dev.zh
# We were before running iteration 7
# finished Tue Dec 22 02:44:42 SGT 2009
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

# translation tables: source-factors, target-factors, number of scores, file
[ttable-file]
0 0 5 /my-working-directory/working/model/phrase-table.gz

# no generation models, no generation-file section

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 5 /???/en.lm

# limit on how many phrase translations e for each phrase f are loaded
# 0 = all elements loaded
[ttable-limit]
20

# distortion (reordering) files
[distortion-file]
0-0 monotonicity-bidirectional-f 4 /???/working/model/reordering-table.gz

# distortion (reordering) weight
[weight-d]
0.008227
-0.085606
0.024868
0.064056
0.074998

# language model weights
[weight-l]
0.218681

# translation model weights
[weight-t]
0.042811
0.084593
0.147177
0.045780
0.073527

# no generation models, no weight-generation section

# word penalty
[weight-w]
-0.129677

[distortion-limit]
6

[drop-unknown]
1

Any help is greatly appreciated.

Regards,
Liu Chang
National University of Singapore
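
For readers who want to check a lattice file before handing it to the decoder, here is a minimal sketch in Python. It assumes the input is in Moses' PLF (Python Lattice Format) with straight single quotes, where each column is a tuple of (word, score, distance) edges. The helper names parse_plf and non_ascii_words are made up for this illustration, and the Chinese word is an assumed reading of the last edge in the message above. It only inspects the lattice structure; it does not reproduce or explain the crash.

```python
import ast


def parse_plf(line):
    """Parse one PLF line into a list of columns, each a list of edges."""
    # A PLF line is a Python tuple literal, so literal_eval can read it
    # without executing arbitrary code.
    lattice = ast.literal_eval(line.strip())
    columns = []
    for column in lattice:
        edges = []
        for word, score, distance in column:
            # distance = how many nodes ahead this edge ends
            edges.append((word, float(score), int(distance)))
        columns.append(edges)
    return columns


def non_ascii_words(columns):
    """Return every edge label that contains a non-ASCII character."""
    return [word for edges in columns for word, _, _ in edges
            if not word.isascii()]


# The lattice from the message above; the Chinese word is an assumed
# reading of the last edge.
plf_line = ("((('damit',1.0,1),),"
            "(('ist',1.0,1),('war',1.0,1),('sei',1.0,1),),"
            "(('der',1.0,1),('die',1.0,1),('das',1.0,1),),"
            "(('给',1.0,1),),)")

columns = parse_plf(plf_line)
edge_counts = [len(edges) for edges in columns]
flagged = non_ascii_words(columns)
print(len(columns), edge_counts, flagged)
```

A check like this confirms that the file parses as a well-formed lattice and shows which edges carry non-ASCII labels, which helps separate input problems from decoder problems.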

Re: [Moses-support] ConfusionNet::GetSubString error when using lattice with UTF8 input

Confusion network input triggers this error when the decoder runs with -verbose 3. You can avoid it by using a lower verbosity level (for example, -verbose 2).

Re: [Moses-support] ConfusionNet::GetSubString error when using lattice with UTF8 input

That solved my problem. Thanks!
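
The workaround from this thread can be sketched as a small Python helper that assembles the same command line with a lower verbosity. This is only an illustration: build_moses_command is a hypothetical name, the flags are the ones quoted in the original message, and /path/to/moses is a placeholder because the real path is elided in the post. Refusing -verbose 3 or higher is a design choice made for this sketch, not something moses itself enforces.

```python
import shlex


def build_moses_command(moses_bin, config, lattice, verbose=2):
    """Return the argument list for a lattice-input moses run."""
    # The thread reports the GetSubString error at -verbose 3, so this
    # sketch refuses that level and anything above it.
    if verbose >= 3:
        raise ValueError("use a verbosity level below 3 for lattice input")
    return [
        moses_bin,
        "-inputtype", "2",      # lattice input, as in the original command
        "-weight-i", "0",
        "-config", config,
        "-input-file", lattice,
        "-verbose", str(verbose),
    ]


# Placeholder paths; substitute your own locations.
command = build_moses_command("/path/to/moses", "moses.ini", "src.lattice")
print(shlex.join(command))
```

The list form can be passed directly to subprocess.run once the placeholder paths are replaced with real ones.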

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.


Written by 52nlp

January 2nd, 2010 at 9:57 am

Posted in Moses, SMT
