Regression-Based Alpha Strategies Using Sentiment

Alexandria Technology Research from Kaizhen Tan, Quantitative Analyst & Chris Kantos, Managing Director

  • Using regression-based analysis on news sentiment[1] can provide higher returns while limiting maximum monthly drawdown and risk.
  • Alexandria uses an elastic net regression calibrated on a rolling 10-year window to rank stocks by sentiment; the rolling window helps to capture evolving market conditions and regime changes.
  • Running a long-short strategy using the calibrated sentiment data consistently outperforms the market.

Using sentiment derived from news, Alexandria examines how best to use this data to build strategies that produce alpha and limit downside risk. They achieve these outcomes with statistical modeling that considers volume, sentiment, topics, and sensitivities to topics. Alexandria first constructs a simple sentiment-based trading strategy as a baseline, then tests how their regression-based approach improves upon its results.

Simple Sentiment-based Strategy

Alexandria first builds a simple strategy based on news sentiment data. Their analysis covers the S&P 1500 from January 2000 to February 2022. To construct the signal, they aggregate sentiment, agnostic of topic, over the window from 16:00 on Day-1 to 15:30 on Day-0. They apply a relevance filter of 1 to the sentiment data, which limits it to news items corresponding to a single security. They then calculate a net sentiment score for every security mentioned in the news as follows:

Net Sentiment = Log[(Count Positive +1)/(Count Negative +1)]
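
As an illustration of the aggregation window, relevance filter, and formula above, here is a minimal Python sketch. The story layout (ticker, timestamp, sentiment label, relevance) and the function names are hypothetical, not Alexandria's data schema. The sketch assumes a natural logarithm (the base only rescales the score and does not change the ranking) and treats Day-1 as the previous calendar day, ignoring weekends and holidays.

```python
import math
from datetime import date, datetime, time, timedelta


def in_signal_window(timestamp, trade_date):
    """True if the story arrived between 16:00 on Day-1 and 15:30 on Day-0.

    Assumption: Day-1 is the previous calendar day (no trading calendar).
    """
    start = datetime.combine(trade_date - timedelta(days=1), time(16, 0))
    end = datetime.combine(trade_date, time(15, 30))
    return start <= timestamp < end


def net_sentiment(count_positive, count_negative):
    """Net Sentiment = log((positives + 1) / (negatives + 1)), natural log assumed."""
    return math.log((count_positive + 1) / (count_negative + 1))


def daily_net_sentiment(stories, trade_date):
    """Aggregate topic-agnostic sentiment per ticker for one trade date.

    Each story is a hypothetical tuple: (ticker, timestamp, label, relevance),
    where label > 0 is positive and label < 0 is negative.
    """
    counts = {}
    for ticker, timestamp, label, relevance in stories:
        # Relevance filter of 1: keep only stories tied to a single security.
        if relevance != 1 or not in_signal_window(timestamp, trade_date):
            continue
        positive, negative = counts.get(ticker, (0, 0))
        if label > 0:
            positive += 1
        elif label < 0:
            negative += 1
        counts[ticker] = (positive, negative)
    return {t: net_sentiment(p, n) for t, (p, n) in counts.items()}


# Made-up example stories for a single trade date.
stories = [
    ("AAA", datetime(2022, 2, 1, 9, 0), 1, 1),     # in window, positive
    ("AAA", datetime(2022, 1, 31, 17, 0), 1, 1),   # prior evening, positive
    ("AAA", datetime(2022, 2, 1, 15, 45), -1, 1),  # after 15:30, excluded
    ("BBB", datetime(2022, 2, 1, 10, 0), -1, 1),   # in window, negative
    ("CCC", datetime(2022, 2, 1, 10, 0), 1, 0.5),  # relevance below 1, excluded
]
scores = daily_net_sentiment(stories, date(2022, 2, 1))
print(scores)
```

The +1 in the numerator and denominator keeps the score finite when a security has no positive or no negative stories, and gives a score of zero when the counts are equal.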

With the net sentiment scores, they sort the securities into quintiles and equal-weight the securities within each quintile. They then trade the baskets, long Q1 and short Q5, 30 minutes prior to the close on Day-0, and repeat this over the full sample period, 2000-2022. The summarised results are as follows:

Quantile Summary for all Periods: S&P 1500
Cumulative Log Returns – Simple Sentiment Strategy
S&P 1500 Top Bottom Spread Return Distribution
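
The quintile sort and long-short basket described in this section can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes that Q1 holds the highest net sentiment scores and that ties are broken alphabetically by ticker, neither of which is stated in the study.

```python
def quintile_baskets(scores):
    """Split tickers into five baskets by descending score.

    Assumption: Q1 holds the highest scores and Q5 the lowest.
    """
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken by ticker
    n = len(ranked)
    baskets = {q: [] for q in range(1, 6)}
    for rank, ticker in enumerate(ranked):
        baskets[rank * 5 // n + 1].append(ticker)
    return baskets


def long_short_weights(scores):
    """Equal-weight long Q1 and equal-weight short Q5; other quintiles get no weight."""
    baskets = quintile_baskets(scores)
    weights = {}
    for ticker in baskets[1]:
        weights[ticker] = 1.0 / len(baskets[1])
    for ticker in baskets[5]:
        weights[ticker] = -1.0 / len(baskets[5])
    return weights


# Made-up scores for ten hypothetical tickers T0..T9.
example_scores = {f"T{i}": float(i) for i in range(10)}
baskets = quintile_baskets(example_scores)
weights = long_short_weights(example_scores)
print(baskets[1], baskets[5])
print(weights)
```

With these weights the long and short legs each sum to one unit of exposure, so the portfolio return on a given day is the average Q1 return minus the average Q5 return.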

Topic Regression Sentiment Strategy

Alexandria then extends the approach to account for topics when calculating net sentiment scores. To do this, they use a restricted linear regression to estimate a sensitivity to each topic in their ontology. The regression is an elastic net, which combines the L1 penalty of Lasso with the L2 penalty of Ridge; this gives them feature selection and better predictive power than other techniques.
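
To make the elastic net idea concrete, here is a small, dependency-free sketch of coordinate descent for the elastic net objective (1/(2n))·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||². It is not Alexandria's implementation; the penalty values, the toy data, and the interpretation of columns as topic sentiment are assumptions made for illustration. In practice a library routine such as scikit-learn's `ElasticNet` would normally be used instead.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, alpha=0.2, l1_ratio=0.5, n_iter=100):
    """Fit elastic net coefficients by cyclic coordinate descent.

    Assumes X columns and y are already centered (no intercept is fitted).
    X is a list of rows; each column is one feature (here: one topic).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w starting at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_scale[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data: three hypothetical topic-sentiment columns, four observations.
# Returns depend on the first two topics only: y = 2*x0 - 1*x1.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1, 3, -3, -1]
sensitivities = elastic_net_fit(X, y)
print(sensitivities)
```

In this toy example the third topic has no relationship with returns, so the L1 part of the penalty sets its coefficient exactly to zero (feature selection), while the first two coefficients are shrunk toward zero relative to their true values of 2 and -1.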

To calculate sensitivities, they regress the sentiment against security returns on a rolling ten-year basis. The sensitivities from each window are then used out of sample on the following year[2], so all results from 2010 onwards are out of sample. They apply these sensitivities to incoming news to calculate and aggregate a daily net sentiment score.
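
A rough sketch of the rolling calibration schedule and of applying sensitivities to incoming news is shown below. Several things here are assumptions rather than details from the study: the window is taken to be the ten full calendar years before each out-of-sample year (footnote [2] quotes 2011-2021 for 2022, so Alexandria's endpoint convention may differ), the daily score is taken to be a sensitivity-weighted sum of topic sentiment, and the topic names are invented.

```python
def rolling_schedule(first_year, last_test_year, window_years=10):
    """List (train_start, train_end, test_year) tuples for a rolling calibration.

    Assumption: each window is the `window_years` full calendar years
    immediately before the out-of-sample test year.
    """
    schedule = []
    for test_year in range(first_year + window_years, last_test_year + 1):
        schedule.append((test_year - window_years, test_year - 1, test_year))
    return schedule


def topic_weighted_score(topic_sentiment, sensitivities):
    """Sensitivity-weighted sum of per-topic sentiment for one security and day.

    Topics without a fitted sensitivity (for example, ones the elastic net
    dropped) contribute nothing.
    """
    return sum(
        sensitivities.get(topic, 0.0) * value
        for topic, value in topic_sentiment.items()
    )


schedule = rolling_schedule(2000, 2022)
print(schedule[0], schedule[-1])

# Hypothetical sensitivities from one calibration window, and one day's news.
example_sensitivities = {"earnings": 0.5, "litigation": -0.25}
todays_sentiment = {"earnings": 2.0, "litigation": 1.0, "other": 3.0}
score = topic_weighted_score(todays_sentiment, example_sensitivities)
print(score)
```

Because each year's sensitivities are estimated only on earlier data, the scores for 2010 onwards never use information from the year being traded.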

They split the securities into quintiles daily, going long Quintile 1 and short Quintile 5 at 15:30 each day. The results are as follows:

Quantile Summary for all Periods: S&P 1500
Cumulative Log Returns – Topic Sentiment Regression
S&P 1500 Top Bottom Spread Return Distribution

Compared with Alexandria’s simple sentiment approach, the topic regression not only provides higher cumulative returns over the period but also limits the maximum monthly drawdown. During the tech selloff in the early months of 2022, the simple strategy produces an annualized return of -13.75%, while the regression strategy returns +15.26% annualized. During the first year of the COVID-19 pandemic, the simple strategy has annualized volatility of 10.52%, while the regression approach limits it to 7.89%.
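
For readers who want to compute comparable statistics on their own return series, the following sketch shows one common way to obtain annualized return, annualized volatility, and maximum drawdown from log returns. The conventions are assumptions, not taken from the study: 252 trading days per year, sample standard deviation, and a drawdown measured peak to trough on a series of monthly log returns. The input numbers are made up.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_return(daily_log_returns):
    """Compound the average daily log return over an assumed 252-day year."""
    return math.exp(statistics.mean(daily_log_returns) * TRADING_DAYS) - 1.0


def annualized_volatility(daily_log_returns):
    """Sample standard deviation of daily log returns, scaled by sqrt(252)."""
    return statistics.stdev(daily_log_returns) * math.sqrt(TRADING_DAYS)


def max_drawdown(period_log_returns):
    """Largest peak-to-trough loss, returned as a (negative) simple return."""
    cumulative = 0.0
    peak = 0.0
    worst = 0.0
    for r in period_log_returns:
        cumulative += r
        peak = max(peak, cumulative)
        worst = min(worst, cumulative - peak)
    return math.exp(worst) - 1.0


# Made-up monthly log returns: the trough comes after two losing months.
monthly = [0.10, -0.05, -0.10, 0.20]
drawdown = max_drawdown(monthly)

# Made-up daily log returns alternating between +1% and -1%.
daily = [0.01, -0.01] * 5
vol = annualized_volatility(daily)
flat = annualized_return([0.0] * 10)
print(drawdown, vol, flat)
```

Working in log returns keeps the cumulative series additive, which is why the drawdown can be tracked as a running sum against its running maximum.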

From Alexandria’s analysis, using a regression-based approach with sentiment data can provide clear and substantial benefits from both a risk and a return perspective. This approach is not limited to news; it can also be applied to macro, commodity, and longer-term strategies. For more information on Topic Regression, and to obtain the Python notebook to replicate this study, please contact Alexandria here.

[1] News sentiment derived from Dow Jones Newswires (

[2] For example, the sensitivities for 2022 news are calculated from regressing over 2011-2021.