A Predictive Equities Model for the Process Industries
The backend code and data analytics live in a private GitHub repository that I can share if you message me!
Objective
The objective of this project is to develop a predictive model that determines whether a company is a viable investment based on its expected growth in EBITDA. Using historical financial data extracted from 10-K statements across the oil and gas, manufacturing, and pharmaceutical sectors, the model is designed to serve as a practical investment tool. If a company is predicted to experience EBITDA growth in the following fiscal year, it is classified as a favorable investment. The final model architecture is an artificial neural network (ANN), selected for its ability to capture complex, non-linear relationships in financial data. Predictive accuracy is the primary benchmark for model performance. Earlier principal component analysis (PCA) and partial least squares (PLS) models serve as baselines for comparison and inform the design and optimization of the final ANN structure.
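To make the classification target concrete, here is a minimal sketch of how a binary "growth next year" label could be derived from yearly EBITDA figures. It assumes a hypothetical layout in which each company maps to a list of (fiscal_year, ebitda) pairs, and it treats any strict year-over-year increase as growth. The function name label_next_year_growth and the example figures are illustrative only and are not taken from the project's repository.

```python
from typing import Dict, List, Tuple


def label_next_year_growth(
    history: Dict[str, List[Tuple[int, float]]],
) -> Dict[Tuple[str, int], int]:
    """Return {(company, fiscal_year): label}, where label is 1 if EBITDA
    rises in the following fiscal year and 0 otherwise.

    Hypothetical helper: a company's most recent year gets no label because
    its next-year EBITDA is unknown, and a year is labeled only when the
    immediately following fiscal year is present in the data.
    """
    labels: Dict[Tuple[str, int], int] = {}
    for company, rows in history.items():
        # Sort by fiscal year so neighbouring rows are neighbouring years.
        ordered = sorted(rows)
        for (year, ebitda), (next_year, next_ebitda) in zip(ordered, ordered[1:]):
            if next_year != year + 1:
                continue  # skip gaps in the filing history
            labels[(company, year)] = 1 if next_ebitda > ebitda else 0
    return labels


# Illustrative (made-up) EBITDA figures, in millions of dollars.
example = {
    "CompanyA": [(2020, 120.0), (2021, 135.0), (2022, 128.0)],
    "CompanyB": [(2021, 40.0), (2022, 52.5)],
}
print(label_next_year_growth(example))
# {('CompanyA', 2020): 1, ('CompanyA', 2021): 0, ('CompanyB', 2021): 1}
```

Under this assumed definition, flat EBITDA counts as "no growth" (label 0); a project could just as reasonably require growth above some minimum percentage.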
Discussion of Existing Technology and Methods
Traditional methods for predicting financial performance, such as multiple linear or logistic regression, are valued for their simplicity and interpretability but rely on assumptions (e.g., linearity, independence of variables) that often don't hold for financial data [1]. These models struggle with multicollinearity, non-linear interactions, and high-dimensional datasets like 10-K statements, which leads to overfitting or poor generalization. Decision trees and support vector machines capture non-linearities better, but they also suffer from limited interpretability and overfitting [2]. Moreover, many models ignore temporal dependencies: they treat each year's data independently and miss year-over-year trends. Dimensionality reduction methods help reduce noise and improve prediction. PCA captures latent structure in the predictors, and PLS additionally incorporates the response variable when extracting components. However, both are linear, which limits their ability to model complex, non-linear financial relationships [3].
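One simple way to expose year-over-year trends to a model that would otherwise see each fiscal year in isolation is to add lagged percent-change features. The sketch below assumes a hypothetical per-company mapping from fiscal year to a dict of line items. The field names, figures, and the helper add_yoy_features are illustrative and are not the project's actual feature set. The change is left undefined (None) when the prior-year value is missing or zero.

```python
from typing import Dict, Optional

Row = Dict[str, float]


def add_yoy_features(by_year: Dict[int, Row]) -> Dict[int, Dict[str, Optional[float]]]:
    """For each fiscal year, copy the raw line items and add '<item>_yoy',
    the fractional change versus the prior fiscal year (hypothetical helper)."""
    out: Dict[int, Dict[str, Optional[float]]] = {}
    for year in sorted(by_year):
        current = by_year[year]
        prior = by_year.get(year - 1)
        features: Dict[str, Optional[float]] = dict(current)
        for item, value in current.items():
            prev = prior.get(item) if prior else None
            if prev is None or prev == 0:
                # No prior year, or a zero base: percent change is undefined.
                features[f"{item}_yoy"] = None
            else:
                # Divide by |prev| so a move toward larger values is positive
                # even when the base is negative (e.g. an operating loss).
                features[f"{item}_yoy"] = (value - prev) / abs(prev)
        out[year] = features
    return out


# Illustrative (made-up) line items for one company.
company = {
    2021: {"revenue": 200.0, "ebitda": 40.0},
    2022: {"revenue": 250.0, "ebitda": 30.0},
}
features = add_yoy_features(company)
print(features[2022]["revenue_yoy"], features[2022]["ebitda_yoy"])  # 0.25 -0.25
```

The None values would then need the same missing-data handling as any other gap in the 10-K data before the features reach a model.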
Key Assumptions Used in Technology Development
Several assumptions underlie the modeling framework. It is assumed that the financial data reported in 10-K statements is accurate and standardized across companies. To handle missing data, an Iterative Imputer is used, which assumes that values are missing at random, meaning the missingness is either purely random or explainable by observed features. This allows missing values to be estimated reasonably from the rest of the data. The dataset is treated as stationary across fiscal years, which implies that the underlying financial structures and relationships remain sufficiently stable over time. The decision to retain the top eight principal components assumes that these components capture most of the meaningful variation in the data. It is also assumed that the models can generalize to new, unseen companies within the defined sectors. Finally, company success is assessed primarily through EBITDA growth, on the assumption that this metric is a reasonable proxy for overall financial health and investment attractiveness.
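The assumption behind keeping eight principal components can be checked against the cumulative share of variance they explain. Given the eigenvalues of the (standardized) feature covariance matrix, the share explained by the top k components is the sum of the k largest eigenvalues divided by the sum of all eigenvalues. The sketch below assumes the eigenvalues have already been computed, for example by a PCA routine. The function names, the 0.90 threshold, and the example eigenvalues are hypothetical and are not the project's actual values.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_explained_variance(eigenvalues: Sequence[float]) -> List[float]:
    """Cumulative fraction of total variance explained by the top-k
    components, for k = 1..len(eigenvalues)."""
    ordered = sorted(eigenvalues, reverse=True)  # largest components first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [running / total for running in accumulate(ordered)]


def components_needed(eigenvalues: Sequence[float], threshold: float = 0.90) -> int:
    """Smallest k whose top-k components explain at least `threshold`
    of the total variance (hypothetical selection rule)."""
    for k, share in enumerate(cumulative_explained_variance(eigenvalues), start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


# Made-up eigenvalue spectrum for illustration only.
spectrum = [5.0, 3.0, 1.0, 0.5, 0.5]
print(cumulative_explained_variance(spectrum))  # [0.5, 0.8, 0.9, 0.95, 1.0]
print(components_needed(spectrum))              # 3
```

Applied to the real eigenvalues, the same calculation would show how much variance the eight retained components explain and whether that share is high enough to support the assumption.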