This is not a list of required readings for the course. Instead, the resources are meant to give you a starting point for getting into text analysis and scraping and for finding interesting applications in your field. If you think something is missing, ping me and I will add it.


Overviews of the Field

  • Atteveldt, Wouter van, and Tai-Quan Peng. “When Communication Meets Computation: Opportunities, Challenges, and Pitfalls in Computational Communication Science.” Communication Methods and Measures 12, no. 2–3 (April 3, 2018): 81–92.
  • Fréchet, Nadjim, Justin Savoie, and Yannick Dufresne. “Analysis of Text-Analysis Syllabi: Building a Text-Analysis Syllabus Using Scaling.” PS: Political Science & Politics, 1–6.
  • Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. “Text as Data.” Journal of Economic Literature 57, no. 3 (September 1, 2019): 535–74.
  • Grimmer, J., and B. M. Stewart. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21, no. 3 (July 1, 2013): 267–97.
  • Schoonvelde, Martijn, Gijs Schumacher, and Bert N. Bakker. “Friends With Text as Data Benefits: Assessing and Extending the Use of Automated Text Analysis in Political Science and Political Psychology.” Journal of Social and Political Psychology 7, no. 1 (February 8, 2019): 124–143.

Textbooks and Cheat Sheets


  • Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. “Quanteda: An R Package for the Quantitative Analysis of Textual Data.” Journal of Open Source Software 3, no. 30 (October 6, 2018): 774.
  • Roberts, Margaret E., Brandon M. Stewart, and Dustin Tingley. “Stm: R Package for Structural Topic Models.” Journal of Statistical Software, 2013.

Key Methods

  • Barberá, Pablo, Amber E. Boydstun, Suzanna Linn, Ryan McMahon, and Jonathan Nagler. “Automated Text Classification of News Articles: A Practical Guide.” Political Analysis, 1–24.
  • Cranmer, Skyler J. “Introduction to the Virtual Issue: Machine Learning in Political Science,” n.d., 9.
  • Denny, Matthew James, and Arthur Spirling. “Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, January 25, 2017.
  • Grimmer, Justin, and Gary King. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences 108, no. 7 (February 15, 2011): 2643–50.
  • Monroe, B. L., and P. A. Schrodt. “Introduction to the Special Issue: The Statistical Analysis of Political Text.” Political Analysis 16, no. 4 (October 4, 2008): 351–55.
  • Muddiman, Ashley, Shannon C. McGregor, and Natalie Jomini Stroud. “(Re)Claiming Our Expertise: Parsing Large Text Corpora With Manually Validated and Organic Dictionaries.” Political Communication 0, no. 0 (November 7, 2018): 1–13.
  • Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science 58, no. 4 (October 1, 2014): 1064–82.
  • Rodman, Emma. “A Timely Intervention: Tracking the Changing Meanings of Political Concepts with Word Vectors.” Political Analysis, 1–25.
  • Slapin, Jonathan B., and Sven-Oliver Proksch. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52, no. 3 (July 1, 2008): 705–22.

Interesting Applications

Applications mentioned in class, plus other interesting ones. This list is by no means complete and misses a lot of relevant and great research.

  • Anastasopoulos, L. Jason, and Anthony M. Bertelli. “Understanding Delegation Through Machine Learning: A Method and Application to the European Union.” American Political Science Review, 1–11.
  • Bauer, Paul C., Pablo Barberá, Kathrin Ackermann, and Aaron Venetz. “Is the Left-Right Scale a Valid Measure of Ideology?” Political Behavior 39, no. 3 (2017): 553–83.
  • Beltran, Javier, Aina Gallego, Alba Huidobro, Enrique Romero, and Lluís Padró. “Male and Female Politicians on Twitter: A Machine Learning Approach.” European Journal of Political Research n/a, no. n/a. Accessed March 24, 2020.
  • Benoit, Kenneth, Kevin Munger, and Arthur Spirling. “Measuring and Explaining Political Sophistication through Textual Complexity.” American Journal of Political Science 63, no. 2 (2019): 491–508.
  • Burscher, Bjorn, Rens Vliegenthart, and Claes H. De Vreese. “Using Supervised Machine Learning to Code Policy Issues: Can Classifiers Generalize across Contexts?” The ANNALS of the American Academy of Political and Social Science 659, no. 1 (May 1, 2015): 122–31.
  • DiMaggio, Paul, Manish Nag, and David Blei. “Exploiting Affinities between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of U.S. Government Arts Funding.” Poetics, Topic Models and the Cultural Sciences, 41, no. 6 (December 2013): 570–606.
  • Egami, Naoki, Christian J Fong, Justin Grimmer, Margaret E Roberts, and Brandon M Stewart. “How to Make Causal Inferences Using Texts,” n.d., 68.
  • Gilardi, Fabrizio, Theresa Gessler, Mael Kubli, and Stefan Müller. “Social Media and Political Agenda Setting.” Work in Progress, 2020.
  • Grimmer, J. “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases.” Political Analysis 18, no. 1 (January 1, 2010): 1–35.
  • Hobbs, William R., and Margaret E. Roberts. “How Sudden Censorship Can Increase Access to Information.” American Political Science Review 112, no. 3 (August 2018): 621–36.
  • King, Gary, Jennifer Pan, and Margaret E. Roberts. “How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument.” American Political Science Review 111, no. 3 (August 2017): 484–501.
  • Loughran, Tim, and Bill McDonald. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” The Journal of Finance 66, no. 1 (2011): 35–65.
  • Peterson, Andrew, and Arthur Spirling. “Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems.” Political Analysis 26, no. 1 (January 2018): 120–28.
  • Proksch, Sven-Oliver, Will Lowe, Jens Wäckerle, and Stuart Soroka. “Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Legislative Speeches.” Legislative Studies Quarterly 0, no. 0. Accessed January 20, 2019.
  • Proksch, Sven-Oliver, and Jonathan B. Slapin. “Parliamentary Questions and Oversight in the European Union.” European Journal of Political Research 50, no. 1 (January 1, 2011): 53–79.
  • Rossiter, Erin. “Measuring Agenda Setting in Interactive Political Communications.” Working Paper.
  • Schmidt, Benjamin M. “Words Alone: Dismantling Topic Models in the Humanities.” Journal of Digital Humanities, April 5, 2013.
  • Schoonvelde, Martijn, Anna Brosius, Gijs Schumacher, and Bert N. Bakker. “Liberals Lecture, Conservatives Communicate: Analyzing Complexity and Ideology in 381,609 Political Speeches.” PLOS ONE 14, no. 2 (February 6, 2019).
  • Schwemmer, Carsten, and Oliver Wieczorek. “The Methodological Divide of Sociology: Evidence from Two Decades of Journal Publications.” Sociology 54, no. 1 (2020): 3–21.
  • Shugars, Sarah. “The Structure of Reasoning: Measuring Justification and Preferences in Text.” Working Paper, 26.
  • Shugars, Sarah, and Nicholas Beauchamp. “Why Keep Arguing? Predicting Engagement in Political Conversations Online.” SAGE Open 9, no. 1 (January 1, 2019): 2158244019828850.
  • Spirling, Arthur. “Democratization and Linguistic Complexity: The Effect of Franchise Extension on Parliamentary Discourse, 1832–1915.” The Journal of Politics 78, no. 1 (December 17, 2015): 120–36.
  • Spirling, Arthur. “U.S. Treaty Making with American Indians: Institutional Change and Relative Power, 1784–1911.” American Journal of Political Science 56, no. 1 (January 1, 2012): 84–97.
  • Terman, Rochelle. “Islamophobia and Media Portrayals of Muslim Women: A Computational Text Analysis of US News Coverage.” International Studies Quarterly 61, no. 3 (September 1, 2017): 489–502.
  • Watanabe, Kohei, and Yuan Zhou. “Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches.” Social Science Computer Review, February 21, 2020, 0894439320907027.
  • Wiedemann, Gregor. “Proportional Classification Revisited: Automatic Content Analysis of Political Manifestos Using Active Learning.” Social Science Computer Review, February 25, 2018, 0894439318758389.

Scraping Ethics

Beyond Text

  • Proksch, Sven-Oliver, Christopher Wratil, and Jens Wäckerle. “Testing the Validity of Automatic Speech Recognition for Political Text Analysis.” Political Analysis 27, no. 3 (July 2019): 339–59.
  • Webb Williams, Nora, Andreu Casas, and John D. Wilkerson. Images as Data for Social Science Research: An Introduction to Convolutional Neural Nets for Image Classification. 1st ed. Cambridge University Press, 2020.

Books on other Approaches to Text Analysis

Maybe you found that you do like text analysis but R and/or quanteda are not for you. Here are some recommendations based on different packages or programming languages: