Supervised Latent Dirichlet Allocation

Exchange Rate Supervised Topic Extraction

Winner of the Hiram C. Haney Fellowship Award in Economics, University of Pennsylvania, 2022.

This paper shows how to combine supervised and unsupervised learning to turn the text of news articles into an FX news index that can be used to enhance traditional models from the FX literature. To do so, we rely on Supervised Latent Dirichlet Allocation (sLDA) (Blei and McAuliffe (2008)), which combines information about a supervising variable with topic extraction over a corpus of text in a single-stage estimation. Although this estimation can be done in two stages, we document with a Monte Carlo simulation that there are efficiency gains from a single-stage approach. The empirical application we suggest is centered on the Monex Market, the main Costa Rican platform for FX trade; accordingly, news articles are gathered from the main Costa Rican newspapers. The exchange rate of interest is that between the Costa Rican colón (CRC), the local currency, and the United States dollar (USD). Using the CRC/USD exchange rate as the supervising variable, we suggest using sLDA to extract from the news article corpus the topics that are relevant as covariates for the exchange rate over short horizons.
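To make the single-stage structure concrete, the following is a minimal sketch of the sLDA data-generating process in the standard formulation of Blei and McAuliffe: each document draws topic proportions from a Dirichlet prior, each word draws a topic and then a vocabulary term, and the supervising variable is Gaussian with a mean that is linear in the document's empirical topic frequencies. It illustrates the kind of process a Monte Carlo exercise could simulate and is not the simulation design used in this paper. The function names, the toy vocabulary, and all parameter values (`alpha`, `eta`, `sigma`) are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_slda_corpus(n_docs, doc_len, topics, alpha, eta, sigma, seed=0):
    """Simulate documents and responses from the sLDA generative model.

    topics: K word distributions over the vocabulary (hypothetical values)
    alpha:  Dirichlet prior over topic proportions, length K
    eta:    regression coefficients on empirical topic frequencies, length K
    sigma:  standard deviation of the Gaussian response noise
    """
    rng = random.Random(seed)
    topic_ids = list(range(len(topics)))
    vocab_ids = list(range(len(topics[0])))
    corpus = []
    for _ in range(n_docs):
        # Document-level topic proportions.
        theta = sample_dirichlet(alpha, rng)
        # One topic assignment per word, then one vocabulary term per topic.
        z = rng.choices(topic_ids, weights=theta, k=doc_len)
        words = [rng.choices(vocab_ids, weights=topics[k], k=1)[0] for k in z]
        # Empirical topic frequencies: the regressors for the response.
        z_bar = [z.count(k) / doc_len for k in topic_ids]
        mean = sum(e * zb for e, zb in zip(eta, z_bar))
        # Supervising variable (an exchange rate change in this paper's setting).
        y = rng.gauss(mean, sigma)
        corpus.append({"words": words, "z_bar": z_bar, "y": y})
    return corpus


# Toy example: two topics over a four-word vocabulary (assumed values).
toy_topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
toy_corpus = simulate_slda_corpus(
    n_docs=200, doc_len=50, topics=toy_topics,
    alpha=[0.5, 0.5], eta=[-1.0, 1.0], sigma=0.1,
)
```

Because the response depends on the same topic assignments that generate the words, an estimator that uses both jointly can exploit information that a two-stage procedure (topics first, regression second) discards in its first stage.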