Advancing predictive content analysis: a natural language processing and machine learning approach to television script data
Article
orcid.org/0000-0002-0055-1414
This study introduces a predictive framework for estimating television episode viewership using machine learning and natural language processing applied to over 25,000 TV scripts. By analyzing linguistic and emotional features embedded in dialogue, the research identifies content patterns linked to viewership. Multiple regression models, including OLS, Lasso, Ridge, Elastic Net, Gradient Boosting, and XGBoost, are trained to forecast next-episode viewership, explaining up to 50% of variance at the genre level and 41% at the series level. These findings suggest that early-stage script analysis can offer actionable insights for media development and marketing teams. Rather than viewing scripts solely as creative artifacts, this research highlights their potential as data assets for content strategy, allowing for more informed decisions in greenlighting, promotion, and brand alignment.
Natural language processing, Content analytics, Seriality and engagement theory, TV scripts, Peak-end theory
All rights reserved (no additional license for public reuse)
English
Palomba, A. (2025). Advancing predictive content analysis: A natural language processing and machine learning approach to television script data. Journal of Marketing Analytics. https://doi.org/10.1057/s41270-025-00435-1
University of Virginia
August 2025