<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>52nlp&#039;s Learning Notes &#187; Bayesian Modeling</title>
	<atom:link href="http://www.52nlp.com/tag/bayesian-modeling/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.52nlp.com</link>
	<description>Natural Language Processing, Machine Learning, Programming Skill, Mathematics</description>
	<lastBuildDate>Sat, 23 Apr 2011 05:17:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>Bayesian Modeling for Language Tutorial Reading</title>
		<link>http://www.52nlp.com/bayesian-modeling-for-language-tutorial-reading/</link>
		<comments>http://www.52nlp.com/bayesian-modeling-for-language-tutorial-reading/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 15:04:01 +0000</pubDate>
		<dc:creator>52nlp</dc:creator>
				<category><![CDATA[NLP]]></category>
		<category><![CDATA[Bayesian Modeling]]></category>

		<guid isPermaLink="false">http://www.52nlp.com/bayesian-modeling-for-language-tutorial-reading/</guid>
		<description><![CDATA[This is a reprint of Sharon Goldwater&#8217;s &#8220;Reading list on Bayesian modeling for language&#8221;. People often ask me what they can read to learn more about recent Bayesian modeling techniques and their applications to language learning. Here is a list of &#8230; <a href="http://www.52nlp.com/bayesian-modeling-for-language-tutorial-reading/">Continue reading <span class="meta-nav">&#8594;</span></a>


Related posts:<ol><li><a href='http://www.52nlp.com/graphical-models-and-bayesian-networks-tutorial-reading/' rel='bookmark' title='Permanent Link: Graphical Models and Bayesian Networks Tutorial Reading'>Graphical Models and Bayesian Networks Tutorial Reading</a></li>
<li><a href='http://www.52nlp.com/maximum-entropy-model-tutorial-reading/' rel='bookmark' title='Permanent Link: Maximum Entropy Model Tutorial Reading'>Maximum Entropy Model Tutorial Reading</a></li>
<li><a href='http://www.52nlp.com/statistical-machine-translation-tutorial-reading/' rel='bookmark' title='Permanent Link: Statistical Machine Translation Tutorial Reading'>Statistical Machine Translation Tutorial Reading</a></li>
<li><a href='http://www.52nlp.com/a-cool-dictionary-for-natural-language-processing/' rel='bookmark' title='Permanent Link: A Cool Dictionary for Natural Language Processing'>A Cool Dictionary for Natural Language Processing</a></li>
<li><a href='http://www.52nlp.com/moses-support-digesta-code-monkey-available-will-work-for-peanuts/' rel='bookmark' title='Permanent Link: Moses Support Digest:Code monkey available,Will work for peanuts'>Moses Support Digest:Code monkey available,Will work for peanuts</a></li>
<li><a href='http://www.52nlp.com/from-nlpers-getting-started-in-nlp/' rel='bookmark' title='Permanent Link: From nlpers:Getting Started in NLP'>From nlpers:Getting Started in NLP</a></li>
<li><a href='http://www.52nlp.com/moses-support-digest-about-the-moses-chart-reordering/' rel='bookmark' title='Permanent Link: Moses Support Digest:about the moses-chart reordering'>Moses Support Digest:about the moses-chart reordering</a></li>
<li><a href='http://www.52nlp.com/moses-support-digest-any-documentation-about-the-multiple-decoding-path-functionality/' rel='bookmark' title='Permanent Link: Moses Support Digest: Any documentation about the Multiple Decoding Path functionality'>Moses Support Digest: Any documentation about the Multiple Decoding Path functionality</a></li>
<li><a href='http://www.52nlp.com/from-nlpersgetting-started-in-summarization/' rel='bookmark' title='Permanent Link: From nlpers:Getting Started In Summarization'>From nlpers:Getting Started In Summarization</a></li>
<li><a href='http://www.52nlp.com/moses-support-digest-call-for-papers-pbml/' rel='bookmark' title='Permanent Link: Moses Support Digest: CALL FOR PAPERS &#8211; PBML'>Moses Support Digest: CALL FOR PAPERS &#8211; PBML</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>This is a reprint of Sharon Goldwater&#8217;s &#8220;<a href="http://homepages.inf.ed.ac.uk/sgwater/reading_list.html" target="_blank">Reading list on Bayesian modeling for language</a>&#8221;.<span id="more-44"></span></p>
<p>People often ask me what they can read to learn more about recent Bayesian modeling techniques and their applications to language learning.  Here is a list of the papers I have found to be most useful and relevant to my own research.  I try to emphasize the papers aimed at a slightly less technical/more cognitively inclined audience.  This is not intended to be a complete list, only a starting point.</p>
<hr />
<p><strong> General introductory material </strong></p>
<p>Thomas L. Griffiths and Alan Yuille (2006). <a href="http://cocosci.berkeley.edu/tom/papers/tutorial.pdf">A primer on probabilistic inference.</a> Trends in Cognitive Sciences. Supplement to special issue on Probabilistic Models of Cognition (volume 10, issue 7).</p>
<ul>
<li> Reviews many of the basic concepts underlying probabilistic (especially Bayesian) modeling and inference, using simple examples.</li>
</ul>
<p>Sharon Goldwater (2006). <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/thesis_1spc.pdf">Nonparametric Bayesian Models of Lexical Acquisition.</a> Unpublished doctoral dissertation, Brown University, 2006.</p>
<ul>
<li> Aimed primarily at computational linguists, but should (I hope) be accessible to anyone who has a basic familiarity with generative probabilistic models. Chapters 2 and 3 cover many useful topics, including Bayesian integration in finite and infinite models (i.e., Dirichlet distribution, Dirichlet process, Chinese restaurant process) and a brief introduction to sampling techniques (Gibbs sampling and Metropolis-Hastings sampling).</li>
</ul>
<p>Daniel J. Navarro, Thomas L. Griffiths, Mark Steyvers, and Michael D. Lee (2006). <a href="http://cocosci.berkeley.edu/tom/papers/indivdiffs_jmp.pdf">Modeling individual differences using Dirichlet processes.</a> Journal of Mathematical Psychology, 50, 101-122.</p>
<ul>
<li> A very nice introduction to Dirichlet processes aimed at cognitive scientists. Slightly more in-depth, covers the stick-breaking construction for the Dirichlet process (which is not in my thesis) as well as the Chinese restaurant process.</li>
</ul>
<p><strong> Bayesian language models for learning </strong></p>
<p>Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2007). <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/bucld07.pdf"> Distributional Cues to Word Segmentation: Context is Important.</a> Proceedings of the 31st Boston University Conference on Language Development.</p>
<p>Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2006). <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/acl06.pdf"> Contextual Dependencies in Unsupervised Word Segmentation.</a> Proceedings of Coling/ACL.</p>
<ul>
<li> These two papers apply the Dirichlet process and hierarchical Dirichlet process to word segmentation. The BUCLD paper is more conceptual; the ACL paper is more technical. For a more in-depth treatment, see also Chapter 5 of my thesis (above).</li>
</ul>
<p>Sharon Goldwater and Thomas L. Griffiths (2007). <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/acl07-bhmm.pdf"> A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging.</a> Proceedings of the Association for Computational Linguistics.</p>
<ul>
<li>This paper provides a direct comparison between Bayesian methods (averaging over parameters and estimation using Gibbs sampling) and standard methods (estimating parameters directly using EM) using the same underlying model (a standard finite HMM).</li>
</ul>
<p>Mark Johnson (2007). <a href="http://acl.ldc.upenn.edu/D/D07/D07-1031.pdf"> Why Doesn&#8217;t EM Find Good HMM POS-Taggers?</a> Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).</p>
<ul>
<li>Includes Variational Bayes as well as Gibbs sampling and EM as estimation procedures. Results are somewhat contradictory to Goldwater and Griffiths, possibly due to the combination of a simpler model and more training data.</li>
</ul>
<p>Percy Liang, Slav Petrov, Michael I. Jordan, and Dan Klein (2007). <a href="http://www.cs.berkeley.edu/%7Epliang/papers/hdppcfg-emnlp2007.pdf">The infinite PCFG using hierarchical Dirichlet processes.</a> Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).</p>
<p>Jenny Rose Finkel, Trond Grenager and Christopher D. Manning (2007).  <a href="http://www.stanford.edu/%7Ejrfinkel/papers/infinite_tree.pdf">The Infinite Tree.</a> Proceedings of the Association for Computational Linguistics.</p>
<p>Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater (2007).  <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/nips07-adaptor.pdf">Adaptor Grammars: a Framework for Specifying Compositional Nonparametric Bayesian Models.</a> Advances in Neural Information Processing Systems 19.</p>
<ul>
<li>These three papers all deal with nonparametric models of syntax (dependency or context-free grammars). They might be a bit tough for those with less background in nonparametrics, although the exposition in Liang et al. is very nice.</li>
</ul>
<p>Thomas L. Griffiths, Mark Steyvers, and Joshua B. Tenenbaum (2007).  <a href="http://cocosci.berkeley.edu/tom/papers/topicsreview.pdf">Topics in semantic representation.</a> Psychological Review, 114, 211-244.</p>
<p>Thomas L. Griffiths, Mark Steyvers, David M. Blei, and Joshua B. Tenenbaum (2005).  <a href="http://cocosci.berkeley.edu/tom/papers/composite.pdf">Integrating topics and syntax.</a> Advances in Neural Information Processing Systems 17.</p>
<p>David Blei, Andrew Ng, and Michael Jordan (2003). <a href="http://www.cs.princeton.edu/%7Eblei/papers/BleiNgJordan2003.pdf">Latent Dirichlet allocation.</a> Journal of Machine Learning Research, 3:993-1022. (A shorter version appeared in NIPS 2002).</p>
<ul>
<li>These three papers are about Latent Dirichlet Allocation (a.k.a. topic models) for learning semantic structure. The Psych Review paper provides a less technical introduction and considers LDA as a cognitive model. The JMLR paper is the original one, suitable if you want more technical details. The NIPS paper is just cool.</li>
</ul>
<p>Fei Xu and Joshua B. Tenenbaum (2007). <a href="http://www.psych.ubc.ca/%7Efei/XuTenenbaum-PsychRev.pdf"> Word learning as Bayesian inference.</a> Psychological Review, 114, 245-272.</p>
<ul>
<li>Develops a Bayesian model to explain how children learn words at different levels of specificity (basic-level categories versus subordinate or superordinate).</li>
</ul>
<p><strong> Bayesian models of language processing </strong></p>
<p>This isn&#8217;t really my area, but here are a couple of interesting papers I know of:</p>
<p>Dennis Norris (2006).  <a href="http://www.mrc-cbu.cam.ac.uk/%7Edennis/BayesianReader.pdf">The Bayesian reader: explaining word recognition as an optimal Bayesian decision process.</a> Psychological Review, 113(2), 327-357.</p>
<p>Naomi Feldman and Thomas L. Griffiths (2007). <a href="http://cocosci.berkeley.edu/tom/papers/perceptualmagnet.pdf"> A rational account of the perceptual magnet effect.</a> Proceedings of the Twenty-Ninth Annual Conference of the Cognitive Science Society.</p>
<p><strong> Inference </strong></p>
<p>A bunch of the papers mentioned above have descriptions of sampling algorithms and/or variational inference procedures for specific models. For more general information on these topics, consider reading some of the following:</p>
<p>Sharon Goldwater (2006). <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/thesis_1spc.pdf">Nonparametric Bayesian Models of Lexical Acquisition.</a> Unpublished doctoral dissertation, Brown University, 2006.</p>
<ul>
<li> As I mentioned above, there is a brief overview of Markov chain Monte Carlo methods (Gibbs sampling and Metropolis-Hastings) in Chapter 2. Examples of Gibbs sampling algorithms are described in Chapters 4 and 5.</li>
</ul>
<p>Julian Besag (2000). <a href="http://citeseer.ist.psu.edu/cache/papers/cs/16898/http:zSzzSzwww.csss.washington.eduzSzPaperszSzwp9.pdf/besag00markov.pdf">Markov chain Monte Carlo for statistical inference.</a> Working paper no. 9.  University of Washington Center for Statistics and the Social Sciences.</p>
<ul>
<li> A longer and more technical introduction to Markov chain Monte Carlo methods.</li>
</ul>
<p>Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater (2007). <a href="http://homepages.inf.ed.ac.uk/sgwater/papers/naacl07-mcmc-pcfg.pdf"> Bayesian Inference for PCFGs via Markov Chain Monte Carlo.</a> Proceedings of the North American Chapter of the Association for Computational Linguistics.</p>
<ul>
<li> How to do efficient sampling for PCFGs.</li>
</ul>
<p>Matthew Beal (2003). <a href="http://www.cse.buffalo.edu/faculty/mbeal/papers/beal03.pdf">Variational Algorithms for Approximate Bayesian Inference.</a> PhD thesis, Gatsby Computational Neuroscience Unit, University College London. (Or download individual chapters from <a href="http://www.cse.buffalo.edu/faculty/mbeal/thesis/"> here.</a>)</p>
<ul>
<li> I don&#8217;t know much about variational methods myself, but I&#8217;ve been told this is a good place to start.</li>
</ul>
<p><strong> Further Reading </strong></p>
<p>Yee Whye Teh, Michael Jordan, Matthew Beal, and David Blei (2006). <a href="http://www.cs.princeton.edu/%7Eblei/papers/TehJordanBealBlei2006.pdf"> Hierarchical Dirichlet processes.</a> Journal of the American Statistical Association, 101(476):1566-1581.</p>
<ul>
<li>The original HDP paper. Comprehensive, but I would suggest getting familiar with the ideas using some of the resources above before reading this one.</li>
</ul>
<p>Radford Neal (1993). <a href="http://omega.albany.edu:8008/neal.pdf">Probabilistic Inference Using Markov Chain Monte Carlo Methods.</a> Technical report CRG-TR-93-1. University of Toronto Department of Computer Science.</p>
<ul>
<li> Even more information about Markov chain Monte Carlo methods.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.52nlp.com/bayesian-modeling-for-language-tutorial-reading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

