This post is reprinted from Dr. Zhang’s Maximum Entropy Modeling Toolkit manual. This section lists some recommended papers for further reading.
1. A Maximum Entropy Approach to Natural Language Processing [Berger et al., 1996]
A must-read paper on applying the maxent technique to Natural Language Processing. It describes maxent in detail, presents an incremental feature selection algorithm for building a maxent model one feature at a time, and gives several examples from statistical Machine Translation.
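The conditional maxent model in Berger et al. has the log-linear form p(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where Z(x) sums the same numerator over all labels y. Below is a minimal Python sketch of that formula, assuming binary feature functions written as plain callables over a (context, label) pair. The function name `maxent_prob` and the weather example are hypothetical illustrations, not part of Dr. Zhang's toolkit.

```python
import math
from typing import Callable, Dict, List, Sequence

# A feature function maps a (context, label) pair to a non-negative value.
Feature = Callable[[str, str], float]

def maxent_prob(x: str, labels: Sequence[str], features: List[Feature],
                weights: List[float]) -> Dict[str, float]:
    """Return p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x) for every label y."""
    scores = {y: sum(w * f(x, y) for w, f in zip(weights, features)) for y in labels}
    # Subtract the largest score before exponentiating for numerical stability;
    # the shared factor cancels in the normalization.
    m = max(scores.values())
    unnorm = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(unnorm.values())  # normalizer Z(x), up to the shared exp(m) factor
    return {y: u / z for y, u in unnorm.items()}

# Hypothetical example: two indicator features over (context, label) pairs.
features = [
    lambda x, y: 1.0 if x == "sunny" and y == "out" else 0.0,
    lambda x, y: 1.0 if x == "rainy" and y == "in" else 0.0,
]
weights = [math.log(2), math.log(2)]
print(maxent_prob("sunny", ["in", "out"], features, weights))
```

With both weights set to ln 2, this gives p(out | sunny) = 2/3, while a context on which no feature fires (for example "cloudy") gets a uniform distribution over the labels.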
2. Inducing Features of Random Fields [Della Pietra et al., 1997]
Another must-read paper on maxent. It deals with a more general framework, Random Fields, and proposes the Improved Iterative Scaling (IIS) algorithm for estimating their parameters. The paper gives the theoretical background of Random Fields (and hence of the maxent model). A greedy field induction method is presented that automatically constructs a detailed random field from a set of atomic features, and an application to English word morphology is developed.
3. Adaptive Statistical Language Modeling: A Maximum Entropy Approach [Rosenfeld, 1996]
This paper applies the ME technique to the statistical language modeling task. More specifically, it builds a conditional Maximum Entropy model that incorporates traditional N-gram, distant N-gram and trigger pair features. A significant perplexity reduction over a baseline trigram model was reported. Later, Rosenfeld and his group proposed a Whole Sentence Exponential Model that overcomes the computational bottleneck of the conditional ME model.
4. Maximum Entropy Models for Natural Language Ambiguity Resolution [Ratnaparkhi, 1998]
This dissertation discusses in detail the application of the maxent model to various natural language disambiguation tasks. Several problems are attacked within the ME framework: sentence boundary detection, part-of-speech tagging, shallow parsing and text categorization. Comparisons with other machine learning techniques (Naive Bayes, Transformation-Based Learning, Decision Trees, etc.) are given.
5. The Improved Iterative Scaling Algorithm: A Gentle Introduction [Berger, 1997]
This paper describes the IIS algorithm in detail. Its description is easier to follow than that of [Della Pietra et al., 1997], which uses heavier mathematical notation.
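The IIS update for a conditional maxent model can be stated compactly: for each feature f_i, find δ_i solving Σ_x p̃(x) Σ_y p(y|x) f_i(x, y) exp(δ_i f#(x, y)) = p̃(f_i), where f#(x, y) = Σ_i f_i(x, y), then set λ_i ← λ_i + δ_i for all features at once. The Python sketch below implements that update for non-negative features, with both sides of the equation multiplied by the number of training events, and solves each one-dimensional equation with Newton's method (a common choice when f# is not constant). The function names, the toy weather data and the choice to leave never-observed features at weight 0 are illustrative assumptions, not code from the paper or the toolkit.

```python
import math
from typing import Callable, List, Sequence, Tuple

Feature = Callable[[str, str], float]  # non-negative feature f(context, label)

def cond_probs(x: str, labels: Sequence[str], features: List[Feature],
               weights: List[float]) -> List[float]:
    """p(y|x) for each label y under the current log-linear model."""
    scores = [sum(w * f(x, y) for w, f in zip(weights, features)) for y in labels]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_iis(data: Sequence[Tuple[str, str]], labels: Sequence[str],
              features: List[Feature], iterations: int = 100,
              newton_steps: int = 50) -> List[float]:
    """Improved Iterative Scaling for a conditional maxent model."""
    weights = [0.0] * len(features)
    # Empirical feature counts: sum over training events of f_i(x_j, y_j).
    emp = [sum(f(x, y) for x, y in data) for f in features]
    for _iteration in range(iterations):
        # Snapshot p(y|x_j) under the current weights for every training context.
        probs = [cond_probs(x, labels, features, weights) for x, _y in data]
        deltas = []
        for i, f in enumerate(features):
            if emp[i] == 0.0:
                deltas.append(0.0)  # simplification: unobserved features stay at 0
                continue
            # Newton's method on g(d) = sum_j sum_y p(y|x_j) f_i exp(d * f#) - emp[i].
            delta = 0.0
            for _step in range(newton_steps):
                g, dg = -emp[i], 0.0
                for (x, _y), p in zip(data, probs):
                    for y, pxy in zip(labels, p):
                        fi = f(x, y)
                        if fi == 0.0:
                            continue
                        fsharp = sum(h(x, y) for h in features)  # f#(x, y) >= fi > 0
                        term = pxy * fi * math.exp(delta * fsharp)
                        g += term
                        dg += term * fsharp
                step = g / dg
                delta -= step
                if abs(step) < 1e-12:
                    break
            deltas.append(delta)
        # IIS applies every delta computed from the same model snapshot.
        weights = [w + d for w, d in zip(weights, deltas)]
    return weights

# Hypothetical toy data: (context, label) events.
data = [("sunny", "out"), ("sunny", "out"), ("sunny", "in"),
        ("rainy", "in"), ("rainy", "in"), ("rainy", "out")]
labels = ["in", "out"]
features = [
    lambda x, y: 1.0 if x == "sunny" and y == "out" else 0.0,
    lambda x, y: 1.0 if x == "rainy" and y == "in" else 0.0,
]
weights = train_iis(data, labels, features)
print(weights, cond_probs("sunny", labels, features, weights))
```

On this toy data each weight converges to ln 2, so p(out | sunny) = p(in | rainy) = 2/3, matching the empirical frequencies. Because f# equals 1 wherever a feature fires here, each Newton solve reaches the closed form δ_i = log(empirical count / model count); with overlapping features f# varies and the numerical solve becomes necessary.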
6. Stochastic Attribute-Value Grammars [Abney, 1997]
Abney applies the Improved Iterative Scaling algorithm to parameter estimation for Attribute-Value grammars, whose parameters cannot be correctly estimated by the ERF (empirical relative frequency) method, even though that method works for PCFGs. Random Fields are the model of choice here, with general Metropolis-Hastings sampling used to calculate feature expectations under the newly constructed model.
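The sampling step matters because a random field's normalizer cannot be summed over all structures, so feature expectations have to be estimated from samples. The sketch below illustrates Metropolis-Hastings estimation of E_p[f] for p(x) ∝ exp(Σ_i λ_i f_i(x)) on a hypothetical toy field over 3-bit vectors with a symmetric bit-flip proposal; it is not Abney's sampler over attribute-value structures. Since the toy space has only 8 states, an exact enumeration is included to check the estimate. All names, weights and sample sizes are illustrative assumptions.

```python
import itertools
import math
import random
from typing import Callable, List, Tuple

State = Tuple[int, ...]
FieldFeature = Callable[[State], float]

def score(x: State, weights: List[float], features: List[FieldFeature]) -> float:
    """Unnormalized log-probability sum_i w_i * f_i(x) of a configuration."""
    return sum(w * f(x) for w, f in zip(weights, features))

def mh_expectation(weights: List[float], features: List[FieldFeature],
                   target: FieldFeature, n_bits: int = 3, n_samples: int = 20000,
                   burn_in: int = 1000, seed: int = 0) -> float:
    """Estimate E_p[target(x)] for p(x) proportional to exp(score(x)) by Metropolis-Hastings."""
    rng = random.Random(seed)
    x: State = tuple([0] * n_bits)
    total = 0.0
    for t in range(burn_in + n_samples):
        # Symmetric proposal: flip one uniformly chosen bit.
        k = rng.randrange(n_bits)
        y = x[:k] + (1 - x[k],) + x[k + 1:]
        # Accept with probability min(1, p(y) / p(x)); the normalizer cancels.
        log_ratio = score(y, weights, features) - score(x, weights, features)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
        if t >= burn_in:
            total += target(x)
    return total / n_samples

def exact_expectation(weights: List[float], features: List[FieldFeature],
                      target: FieldFeature, n_bits: int = 3) -> float:
    """Brute-force E_p[target(x)] by enumerating all 2**n_bits configurations."""
    states = list(itertools.product([0, 1], repeat=n_bits))
    ws = [math.exp(score(s, weights, features)) for s in states]
    return sum(w * target(s) for w, s in zip(ws, states)) / sum(ws)

# Hypothetical field: a "number of ones" feature and a pairwise feature on bits 0 and 1.
features = [lambda x: float(sum(x)), lambda x: float(x[0] * x[1])]
weights = [0.5, 1.0]
print(mh_expectation(weights, features, features[1]),
      exact_expectation(weights, features, features[1]))
```

In a real attribute-value grammar the state space cannot be enumerated, which is exactly why the sampled estimate replaces the exact one inside each IIS iteration.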
7. A comparison of algorithms for maximum entropy parameter estimation [Malouf, 2003]
Four iterative parameter estimation algorithms are compared on several NLP tasks. L-BFGS was found to be the most effective parameter estimation method for the Maximum Entropy model, performing much better than IIS and GIS. [Wallach, 2002] reported similar results for parameter estimation in Conditional Random Fields.
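General-purpose optimizers such as L-BFGS apply here because the conditional log-likelihood of a maxent model is a smooth concave function of the weights, and its gradient with respect to λ_i is the empirical count of f_i minus its expected count under the model. The Python sketch below computes that objective and gradient on hypothetical toy data; it does not implement L-BFGS itself, and the function name and data are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Feature = Callable[[str, str], float]

def loglik_and_grad(weights: List[float], data: Sequence[Tuple[str, str]],
                    labels: List[str], features: List[Feature]) -> Tuple[float, List[float]]:
    """Return the conditional log-likelihood sum_j log p(y_j|x_j) and its gradient.

    d/dw_i = (empirical count of f_i) - (expected count of f_i under the model).
    """
    ll = 0.0
    grad = [0.0] * len(features)
    for x, y_true in data:
        scores = [sum(w * f(x, y) for w, f in zip(weights, features)) for y in labels]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log Z(x)
        ll += scores[labels.index(y_true)] - log_z
        for i, f in enumerate(features):
            grad[i] += f(x, y_true)  # empirical contribution
            for y, s in zip(labels, scores):
                grad[i] -= math.exp(s - log_z) * f(x, y)  # model expectation
    return ll, grad

# Hypothetical toy data: (context, label) events.
data = [("sunny", "out"), ("sunny", "out"), ("sunny", "in"),
        ("rainy", "in"), ("rainy", "in"), ("rainy", "out")]
labels = ["in", "out"]
features = [
    lambda x, y: 1.0 if x == "sunny" and y == "out" else 0.0,
    lambda x, y: 1.0 if x == "rainy" and y == "in" else 0.0,
]
print(loglik_and_grad([0.0, 0.0], data, labels, features))
```

If SciPy is available, the negated value and gradient can be passed to `scipy.optimize.minimize(..., method="L-BFGS-B", jac=True)`, which expects the objective to return a (value, gradient) pair when `jac=True`. IIS and GIS, by contrast, only use the feature expectations, not curvature information accumulated across iterations.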