Contributor
Samuel Borms

Sentometrics: An integrated framework for text based multivariate time series modeling and forecasting


Mentors
kboudt, Keven Bluteau, ArdiaD
Organization
R project for statistical computing

This project creates the Sentometrics package, which is designed for time series analysis based on textual sentiment. Time series modeling using sentiment from text requires its own package because of an intrinsic challenge: for a given text, sentiment can be computed in hundreds of different ways, and there are many further ways to pool sentiment across texts and over time. This additional layer of manipulation does not exist in standard time series analysis. The package aims to derive the optimal sentiment extraction and aggregation for the forecasting task. Aggregation can be optimized across several dimensions, for example term weighting schemes or time lag structures. The package therefore integrates the qualification of sentiment from text, the aggregation into different sentiment measures, and the optimized forecasting based on these measures. No such integrated textual sentiment forecasting approach exists in any R package. The goal is to provide an automated means of measuring the impact of sentiment in texts on a given variable, yielding readily interpretable and useful outputs as well as an object that users can explore further.
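To make the two pooling layers described above more concrete, here is a minimal sketch in Python (not R, and not the Sentometrics API). It assumes a toy four-word lexicon, a simple mean across the texts of one date, and a fixed set of lag weights across time. The names `LEXICON`, `document_sentiment`, `aggregate_across_documents` and `aggregate_across_time` are all hypothetical and chosen for illustration only; the package itself may organize these steps quite differently.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a realistic one would hold thousands of scored words.
LEXICON = {"good": 1.0, "strong": 1.0, "bad": -1.0, "weak": -1.0}


def document_sentiment(text, normalize=True):
    """Score one text by summing lexicon values of its words.

    With normalize=True the sum is divided by the number of words,
    which is one of many possible within-text weighting choices.
    """
    words = text.lower().split()
    score = sum(LEXICON.get(word, 0.0) for word in words)
    if normalize and words:
        return score / len(words)
    return score


def aggregate_across_documents(docs):
    """Pool sentiment across texts: mean score of all texts sharing a date.

    docs is a list of (date_string, text) pairs; ISO dates sort correctly
    as strings. Returns a dict ordered by date.
    """
    by_date = defaultdict(list)
    for date, text in docs:
        by_date[date].append(document_sentiment(text))
    return {d: sum(v) / len(v) for d, v in sorted(by_date.items())}


def aggregate_across_time(daily, lag_weights):
    """Pool sentiment across time with a weighted sum over recent dates.

    lag_weights[0] applies to the current date, lag_weights[1] to the
    previous one, and so on. Dates without enough history are skipped.
    """
    dates = list(daily)
    values = [daily[d] for d in dates]
    smoothed = {}
    for i, date in enumerate(dates):
        if i + 1 < len(lag_weights):
            continue  # not enough past observations for this lag structure
        smoothed[date] = sum(w * values[i - k] for k, w in enumerate(lag_weights))
    return smoothed


docs = [
    ("2017-01-01", "good strong"),
    ("2017-01-01", "bad day"),
    ("2017-01-02", "weak bad"),
]
daily = aggregate_across_documents(docs)
measure = aggregate_across_time(daily, lag_weights=[0.5, 0.5])
```

Changing the lexicon, the within-text normalization, or `lag_weights` each yields a different sentiment measure from the same texts, which is the multiplicity of choices the paragraph above refers to. Selecting among such measures for a forecasting target is the optimization step, which this sketch does not attempt.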