This is not a list of required readings for the course. Instead, these resources are meant to give you a starting point for getting into text analysis and scraping, and for finding interesting applications in your field. If you think something is missing, let me know and I will add it.

Basics

Overviews of the Field

  • Atteveldt, Wouter van, and Tai-Quan Peng. “When Communication Meets Computation: Opportunities, Challenges, and Pitfalls in Computational Communication Science.” Communication Methods and Measures 12, no. 2–3 (April 3, 2018): 81–92. https://doi.org/10.1080/19312458.2018.1458084.
  • Fréchet, Nadjim, Justin Savoie, and Yannick Dufresne. “Analysis of Text-Analysis Syllabi: Building a Text-Analysis Syllabus Using Scaling.” PS: Political Science & Politics, n.d., 1–6. https://doi.org/10.1017/S1049096519001732.
  • Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. “Text as Data.” Journal of Economic Literature 57, no. 3 (September 1, 2019): 535–74. https://doi.org/10.1257/jel.20181020.
  • Grimmer, J., and B. M. Stewart. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21, no. 3 (July 1, 2013): 267–97. https://doi.org/10.1093/pan/mps028.
  • Schoonvelde, Martijn, Gijs Schumacher, and Bert N. Bakker. “Friends With Text as Data Benefits: Assessing and Extending the Use of Automated Text Analysis in Political Science and Political Psychology.” Journal of Social and Political Psychology 7, no. 1 (February 8, 2019): 124–143. https://doi.org/10.5964/jspp.v7i1.964.

Textbooks and Cheat Sheets

Packages

  • Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. “Quanteda: An R Package for the Quantitative Analysis of Textual Data.” Journal of Open Source Software 3, no. 30 (October 6, 2018): 774. https://doi.org/10.21105/joss.00774.
  • Roberts, Margaret E., Brandon M. Stewart, and Dustin Tingley. “Stm: R Package for Structural Topic Models.” Journal of Statistical Software, 2013.

Key Methods

  • Barberá, Pablo, Amber E. Boydstun, Suzanna Linn, Ryan McMahon, and Jonathan Nagler. “Automated Text Classification of News Articles: A Practical Guide.” Political Analysis, n.d., 1–24. https://doi.org/10.1017/pan.2020.8.
  • Cranmer, Skyler J. “Introduction to the Virtual Issue: Machine Learning in Political Science,” n.d., 9.
  • Denny, Matthew James, and Arthur Spirling. “Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, January 25, 2017. https://papers.ssrn.com/abstract=2849145.
  • Grimmer, Justin, and Gary King. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences 108, no. 7 (February 15, 2011): 2643–50. https://doi.org/10.1073/pnas.1018067108.
  • Monroe, B. L., and P. A. Schrodt. “Introduction to the Special Issue: The Statistical Analysis of Political Text.” Political Analysis 16, no. 4 (October 4, 2008): 351–55. https://doi.org/10.1093/pan/mpn017.
  • Muddiman, Ashley, Shannon C. McGregor, and Natalie Jomini Stroud. “(Re)Claiming Our Expertise: Parsing Large Text Corpora With Manually Validated and Organic Dictionaries.” Political Communication, November 7, 2018, 1–13. https://doi.org/10.1080/10584609.2018.1517843.
  • Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science 58, no. 4 (October 1, 2014): 1064–82. https://doi.org/10.1111/ajps.12103.
  • Rodman, Emma. “A Timely Intervention: Tracking the Changing Meanings of Political Concepts with Word Vectors.” Political Analysis, n.d., 1–25. https://doi.org/10.1017/pan.2019.23.
  • Slapin, Jonathan B., and Sven-Oliver Proksch. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52, no. 3 (July 1, 2008): 705–22. https://doi.org/10.1111/j.1540-5907.2008.00338.x.

Interesting Applications

Applications mentioned in class, plus other interesting ones. This list is by no means complete and misses a lot of relevant and great research.

  • Anastasopoulos, L. Jason, and Anthony M. Bertelli. “Understanding Delegation Through Machine Learning: A Method and Application to the European Union.” American Political Science Review, n.d., 1–11. https://doi.org/10.1017/S0003055419000522.
  • Bauer, Paul C., Pablo Barberá, Kathrin Ackermann, and Aaron Venetz. “Is the Left-Right Scale a Valid Measure of Ideology?” Political Behavior 39, no. 3 (2017): 553–83.
  • Beltran, Javier, Aina Gallego, Alba Huidobro, Enrique Romero, and Lluís Padró. “Male and Female Politicians on Twitter: A Machine Learning Approach.” European Journal of Political Research. Accessed March 24, 2020. https://doi.org/10.1111/1475-6765.12392.
  • Benoit, Kenneth, Kevin Munger, and Arthur Spirling. “Measuring and Explaining Political Sophistication through Textual Complexity.” American Journal of Political Science 63, no. 2 (2019): 491–508. https://doi.org/10.1111/ajps.12423.
  • Burscher, Bjorn, Rens Vliegenthart, and Claes H. De Vreese. “Using Supervised Machine Learning to Code Policy Issues: Can Classifiers Generalize across Contexts?” The ANNALS of the American Academy of Political and Social Science 659, no. 1 (May 1, 2015): 122–31. https://doi.org/10.1177/0002716215569441.
  • DiMaggio, Paul, Manish Nag, and David Blei. “Exploiting Affinities between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of U.S. Government Arts Funding.” Poetics, Topic Models and the Cultural Sciences, 41, no. 6 (December 2013): 570–606. https://doi.org/10.1016/j.poetic.2013.08.004.
  • Egami, Naoki, Christian J. Fong, Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart. “How to Make Causal Inferences Using Texts,” n.d., 68.
  • Gilardi, Fabrizio, Theresa Gessler, Mael Kubli, and Stefan Müller. “Social Media and Political Agenda Setting.” Work in Progress, 2020.
  • Grimmer, J. “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases.” Political Analysis 18, no. 1 (January 1, 2010): 1–35. https://doi.org/10.1093/pan/mpp034.
  • Hobbs, William R., and Margaret E. Roberts. “How Sudden Censorship Can Increase Access to Information.” American Political Science Review 112, no. 3 (August 2018): 621–36. https://doi.org/10.1017/S0003055418000084.
  • King, Gary, Jennifer Pan, and Margaret E. Roberts. “How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument.” American Political Science Review 111, no. 3 (August 2017): 484–501. https://doi.org/10.1017/S0003055417000144.
  • Loughran, Tim, and Bill McDonald. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” The Journal of Finance 66, no. 1 (2011): 35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x.
  • Peterson, Andrew, and Arthur Spirling. “Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems.” Political Analysis 26, no. 1 (January 2018): 120–28. https://doi.org/10.1017/pan.2017.39.
  • Proksch, Sven-Oliver, Will Lowe, Jens Wäckerle, and Stuart Soroka. “Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Legislative Speeches.” Legislative Studies Quarterly. Accessed January 20, 2019. https://doi.org/10.1111/lsq.12218.
  • Proksch, Sven-Oliver, and Jonathan B. Slapin. “Parliamentary Questions and Oversight in the European Union.” European Journal of Political Research 50, no. 1 (January 1, 2011): 53–79. https://doi.org/10.1111/j.1475-6765.2010.01919.x.
  • Rossiter, Erin. “Measuring Agenda Setting in Interactive Political Communications.” Working Paper.
  • Schmidt, Benjamin M. “Words Alone: Dismantling Topic Models in the Humanities.” Journal of Digital Humanities, April 5, 2013. http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/.
  • Schoonvelde, Martijn, Anna Brosius, Gijs Schumacher, and Bert N. Bakker. “Liberals Lecture, Conservatives Communicate: Analyzing Complexity and Ideology in 381,609 Political Speeches.” PLOS ONE 14, no. 2 (February 6, 2019). https://doi.org/10.1371/journal.pone.0208450.
  • Schwemmer, Carsten, and Oliver Wieczorek. “The Methodological Divide of Sociology: Evidence from Two Decades of Journal Publications.” Sociology 54, no. 1 (2020): 3–21.
  • Shugars, Sarah. “The Structure of Reasoning: Measuring Justification and Preferences in Text.” Working Paper, 26.
  • Shugars, Sarah, and Nicholas Beauchamp. “Why Keep Arguing? Predicting Engagement in Political Conversations Online.” SAGE Open 9, no. 1 (January 1, 2019): 2158244019828850. https://doi.org/10.1177/2158244019828850.
  • Spirling, Arthur. “Democratization and Linguistic Complexity: The Effect of Franchise Extension on Parliamentary Discourse, 1832–1915.” The Journal of Politics 78, no. 1 (December 17, 2015): 120–36. https://doi.org/10.1086/683612.
  • Spirling, Arthur. “U.S. Treaty Making with American Indians: Institutional Change and Relative Power, 1784–1911.” American Journal of Political Science 56, no. 1 (January 1, 2012): 84–97. https://doi.org/10.1111/j.1540-5907.2011.00558.x.
  • Terman, Rochelle. “Islamophobia and Media Portrayals of Muslim Women: A Computational Text Analysis of US News Coverage.” International Studies Quarterly 61, no. 3 (September 1, 2017): 489–502. https://doi.org/10.1093/isq/sqx051.
  • Watanabe, Kohei, and Yuan Zhou. “Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches.” Social Science Computer Review, February 21, 2020, 0894439320907027. https://doi.org/10.1177/0894439320907027.
  • Wiedemann, Gregor. “Proportional Classification Revisited: Automatic Content Analysis of Political Manifestos Using Active Learning.” Social Science Computer Review, February 25, 2018, 0894439318758389. https://doi.org/10.1177/0894439318758389.

Scraping Ethics

Beyond Text

  • Proksch, Sven-Oliver, Christopher Wratil, and Jens Wäckerle. “Testing the Validity of Automatic Speech Recognition for Political Text Analysis.” Political Analysis 27, no. 3 (July 2019): 339–59. https://doi.org/10.1017/pan.2018.62.
  • Webb Williams, Nora, Andreu Casas, and John D. Wilkerson. Images as Data for Social Science Research: An Introduction to Convolutional Neural Nets for Image Classification. 1st ed. Cambridge University Press, 2020. https://doi.org/10.1017/9781108860741.

Books on Other Approaches to Text Analysis

Maybe you found that you like text analysis, but R and/or quanteda are not for you. Here are some recommendations based on other packages or programming languages: