
Research Details

Measuring Document Similarity with Weighted Averages of Word Embeddings, Explorations in Economic History

Abstract

We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method’s usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resultant document similarity measures. In a final application, we demonstrate that the method also works well relative to alternatives for comparing documents within the same domain by showing that pairwise textual similarity between occupations’ task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. We close with suggestions for other potential uses and guidance on implementing the method.
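
As a rough illustration of the idea named in the title, the sketch below scores two documents by the cosine similarity of weighted averages of their word embeddings. The abstract does not specify the weighting scheme or the embedding source, so everything concrete here is an assumption made for illustration: the TF-IDF-style weights, the toy three-dimensional vectors in `EMBEDDINGS`, and the helper names `idf_weights`, `document_vector`, and `document_similarity`. It is not the authors' implementation.

```python
import math
from collections import Counter

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
# A real application would load pretrained word vectors instead.
EMBEDDINGS = {
    "repair": [0.9, 0.1, 0.0],
    "fix":    [0.8, 0.2, 0.0],
    "engine": [0.1, 0.9, 0.1],
    "motor":  [0.2, 0.8, 0.1],
    "bake":   [0.0, 0.1, 0.9],
    "bread":  [0.1, 0.0, 0.9],
}


def idf_weights(corpus):
    """Smoothed inverse document frequency: rarer words get larger weights.

    Assumed weighting scheme; the abstract only says "weighted averages".
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def document_vector(doc, idf):
    """Weighted average of the embeddings of the known words in a document."""
    counts = Counter(w for w in doc if w in EMBEDDINGS)
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, tf in counts.items():
        weight = tf * idf.get(word, 1.0)  # term frequency times IDF
        weight_sum += weight
        for i, x in enumerate(EMBEDDINGS[word]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(doc_a, doc_b, idf):
    """Similarity between two tokenized documents."""
    return cosine_similarity(document_vector(doc_a, idf),
                             document_vector(doc_b, idf))


# Three tiny tokenized "documents"; the first two share no words.
corpus = [["repair", "engine"], ["fix", "motor"], ["bake", "bread"]]
idf = idf_weights(corpus)
sim_related = document_similarity(corpus[0], corpus[1], idf)
sim_unrelated = document_similarity(corpus[0], corpus[2], idf)
print(sim_related > sim_unrelated)
```

The first two toy documents have no words in common, so an exact word-overlap measure would score them as unrelated. Because their words sit close together in the assumed embedding space, the averaged vectors nearly coincide and the pair scores well above the unrelated pair.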

Type

Article

Author(s)

Bryan Seegmiller, Dimitris Papanikolaou, Lawrence Schmidt

Date Published

2023

Citations

Seegmiller, Bryan, Dimitris Papanikolaou, and Lawrence Schmidt. 2023. Measuring Document Similarity with Weighted Averages of Word Embeddings. Explorations in Economic History.
