Take Action

Home | Faculty & Research Overview | Research

Research Details

Using Python for Text Analysis in Accounting Research, Foundations and Trends in Accounting

Abstract

The prominence of textual data in accounting research has increased dramatically. To assist researchers in understanding and using textual data, this monograph defines and describes common measures of textual data and then demonstrates the collection and processing of textual data using the Python programming language. The monograph is replete with sample code that replicates textual analysis tasks from recent research papers. In the first part of the monograph, we provide guidance on getting started in Python. We first describe Anaconda, a distribution of Python that provides the requisite libraries for textual analysis, and its installation. We then introduce the Jupyter notebook, a programming environment that improves research workflows and promotes replicable research. Next, we teach the basics of Python programming and demonstrate the basics of working with tabular data in the Pandas package. The second part of the monograph focuses on specific textual analysis methods and techniques commonly used in accounting research. We first introduce regular expressions, a sophisticated language for finding patterns in text. We then show how to use regular expressions to extract specific parts from text. Next, we introduce the idea of transforming text data (unstructured data) into numerical measures representing variables of interest (structured data). Specifically, we introduce dictionary-based methods of (1) measuring document sentiment, (2) computing text complexity, (3) identifying forward-looking sentences and risk disclosures, (4) collecting informative numbers in text, and (5) computing the similarity of different pieces of text. For each of these tasks, we cite relevant papers and provide code snippets to implement the relevant metrics from these papers. Finally, the third part of the monograph focuses on automating the collection of textual data. We introduce web scraping and provide code for downloading filings from EDGAR.

Type

Article

Author(s)

Andrew J. Leone, Vic Anand, Khrystyna Bochkay, Roman Chychyla

Date Published

2020

Citations

Leone, Andrew J., Vic Anand, Khrystyna Bochkay, and Roman Chychyla. 2020. Using Python for Text Analysis in Accounting Research. Foundations and Trends in Accounting.(3-4): 128-359.

KELLOGG INSIGHT

Explore leading research and ideas

Find articles, podcast episodes, and videos that spark ideas in lifelong learners, and inspire those looking to advance in their careers.
learn more

COURSE CATALOG

Review Courses & Schedules

Access information about specific courses and their schedules by viewing the interactive course scheduler tool.
LEARN MORE

DEGREE PROGRAMS

Discover the path to your goals

Whether you choose our Full-Time, Part-Time or Executive MBA program, you’ll enjoy the same unparalleled education, exceptional faculty and distinctive culture.
learn more