Moses Support Digest: POS LM

[Moses-support] POS LM

Hi

Factored model training uses a pos.lm for the target language. What steps are involved in preparing pos.lm, and what input parameters does it need?

-Doren


Re: [Moses-support] POS LM

Hi Doren,

please read the tutorial on factored models:

http://www.statmt.org/moses/?n=Moses.FactoredTutorial

-phi

Re: [Moses-support] POS LM

Hi Doren,

I’ve used SRILM to generate POS LMs. The LM, as you might expect, needs to be trained on a corpus consisting of sequences of POSes instead of sequences of surface forms, e.g. instead of

The cat sat on the mat

the corpus should contain

DET N V P DET N

or whatever.
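If the training data is already in Moses' factored format, the POS-only corpus can be derived from it. The following is a minimal sketch, assuming tokens look like `word|POS` with the POS tag as the second factor; the function name, the factor index, and the sample sentence tags are illustrative assumptions, not something stated in the thread.

```python
def extract_factor(line: str, index: int = 1, sep: str = "|") -> str:
    """Return one factor of every token in a factored line, space-joined.

    Assumes each token has the form factor0|factor1|... and that
    `index` selects the POS factor (an assumption about the corpus layout).
    """
    return " ".join(token.split(sep)[index] for token in line.split())


factored_line = "The|DET cat|N sat|V on|P the|DET mat|N"
pos_line = extract_factor(factored_line)
print(pos_line)
```

Running this over every line of the factored target-side corpus and writing the results to a file gives the kind of POS-sequence corpus described above.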

Furthermore, the set of POSes is probably small as vocabularies go, so smoothing methods that rely on counts-of-counts, such as Kneser-Ney, are inappropriate. The SRILM website’s FAQ recommends Witten-Bell discounting (command line option ‘-wbdiscount’) for such cases. (See question C3, answer (b) in the FAQ.)
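To see why counts-of-counts are a problem with a small tag set, here is a rough sketch on a toy POS corpus (the three sentences are made up for illustration). It tallies how many distinct unigrams occur exactly once, twice, and so on. The standard Kneser-Ney discount estimate D = n1 / (n1 + 2*n2) depends on n1, the number of n-grams seen exactly once, and with only a handful of tags every tag tends to occur many times, so n1 can be zero.

```python
from collections import Counter


def counts_of_counts(sentences, order=1):
    """Map each frequency r to the number of distinct n-grams seen r times."""
    ngrams = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - order + 1):
            ngrams[tuple(tokens[i:i + order])] += 1
    return Counter(ngrams.values())


# Toy POS corpus, invented for this illustration.
corpus = ["DET N V P DET N", "DET N V DET N", "N V P N"]
coc = counts_of_counts(corpus)
n1, n2 = coc[1], coc[2]  # Counter returns 0 for frequencies that never occur
print("n1 =", n1, "n2 =", n2)
```

Here no tag occurs exactly once, so n1 is 0 and the discount estimate degenerates. Witten-Bell discounting does not rely on these statistics, which is consistent with the FAQ's recommendation.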

Also because the vocabulary is small, you can get away with using
higher-order n-grams than you would use for a surface LM.

Other than that, it’s the same as preparing a surface LM.
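Putting the points above together, the SRILM invocation might be assembled as in the sketch below. The file names `corpus.pos` and `pos.lm` and the order of 7 are placeholders chosen for illustration; `-text`, `-order`, `-wbdiscount` and `-lm` are options of SRILM's `ngram-count` tool, but check them against your installed version.

```python
import shlex


def ngram_count_command(text_path, lm_path, order):
    """Build an ngram-count argument list for a Witten-Bell smoothed LM."""
    return [
        "ngram-count",
        "-text", text_path,    # POS-sequence corpus, one sentence per line
        "-order", str(order),  # a higher order is affordable with a small tag set
        "-wbdiscount",         # Witten-Bell discounting instead of Kneser-Ney
        "-lm", lm_path,        # output language model file
    ]


# Placeholder paths and order, not values taken from the thread.
cmd = ngram_count_command("corpus.pos", "pos.lm", order=7)
print(shlex.join(cmd))
```

With SRILM on the PATH, the list can be passed to `subprocess.run(cmd, check=True)` to train the model.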

Regards,
Ben

NOTICE: This is digested from the Moses-support mailing list, which provides support for the Moses SMT decoder.
