Tiziano Rotesi

Text as Data for Social Science Research
SOC 2070 - Brown University

Course Outline:

The explosion of accessible digital text is rapidly changing the work of researchers who study culture, decision-making, and human interaction. For example, the narratives, debates, laws, and opinions that form the core of political discourse are predominantly text-based, so understanding politics requires understanding what is said and written. These data complement the more traditional, structured datasets typically used in social science research. The applications of text analysis are broad and impactful. They range from analyzing news articles and social media posts to gauge public sentiment and opinion, to examining online forums and comment sections to gain insight into community dynamics and social issues.

This graduate-level course provides an overview of, and hands-on experience with, the methods that make up the essential toolkit for text analysis. Aimed at equipping students with practical skills, the course covers a wide range of topics, including data collection strategies and the ethical considerations involved in text analysis. From the perspective of social science researchers, the course explores methods for discovering patterns, measuring variables of interest, and assessing causal relationships using textual data. Through theoretical discussions, engagement with recent literature, and practical exercises, students will gain the knowledge and skills needed to analyze text data effectively in their own research.

Textbooks:

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press. (GRS)
This is the main reference for most of the course. It does not cover some of the topics in the second part of the course, for which we will rely on other material.

Jurafsky, D., & Martin, J. H. (2023). Speech and Language Processing (3rd ed.).
Available online. (JM)

Bird, S., Klein, E., & Loper, E. (2019). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit.
Available online. (BKL)
This book is a good reference for basic NLP tasks in Python.

Calendar:

Optional Readings:

Python: Intro to Python.



Required Readings:

Python: Web Scraping, APIs, loading and cleaning data.


Optional Readings:

Python: Tokenization, dictionary methods, sentiment analysis, mutual information.


Required Readings:

Optional Readings:

Python: Multinomial Models and Vector Models.


Required Readings:

Optional Readings:

Python: PCA, Topic Models.


Required Readings:

Optional Readings:

Python: Machine learning methods applied to text.


Required Readings:

Optional Readings:

Python: Word embeddings.


Required Readings:

Optional Readings:

Python: Parsing, Named Entities, Semantic Role Labeling, and an application to gendered language.


Required Readings:

Optional Readings:

Python: Transformers and LLMs.


Optional Readings:


Readings: