European Society of Medicine, to appear
Simmons, S. J., Bastin, K., LaBarr, A., & Healey, C. G. A comparison of the prediction capabilities of large scale time series algorithms. European Society of Medicine, to appear.
Simmons, Susan J., Kornelia Bastin, Aric LaBarr, and Christopher G. Healey. “A Comparison of the Prediction Capabilities of Large Scale Time Series Algorithms.” European Society of Medicine (n.d.): to appear.
Simmons, Susan J., et al. “A Comparison of the Prediction Capabilities of Large Scale Time Series Algorithms.” European Society of Medicine, pp. to appear.
Since December 31, 2019, the world has closely monitored the progress and outcomes of the SARS-CoV-2 coronavirus (COVID). This paper focuses on two goals. First, we compare time series algorithms for predicting fatalities during the COVID pandemic. Second, we examine how domain affects algorithm choice by comparing our COVID results to analyses of historical and current weekly temperature data. Tracking and predicting the effects of COVID is of critical interest. Over the past three years, many researchers have created models and built visualizations to observe the disease’s progression and impact, both regionally and worldwide, and researchers have recently proposed using machine learning to forecast its progression. Given the increased interest in time series methods and the range of algorithms available, this paper explores the accuracy and computational expense of these techniques. We compare time series analysis approaches for forecasting COVID fatalities from March 11, 2020, to December 28, 2021. We include only time series models that can be created automatically, so that they scale to large datasets. Statistical analysis is used to identify significant differences in performance. To investigate generalizability, we apply the same algorithms to temperature data, a standard example dataset because of its seasonal and trend components. The analysis is performed on both historical data (1970s) and current data (2020s). The results allow us to: (1) identify significant differences in algorithm performance across pandemic data with different time series patterns; (2) examine the performance of time series algorithms trained on shorter, constant-length training sets; and (3) determine whether variations in temperature due to climate change affect how temperature data should now be predicted.
We conclude by discussing how domain and data patterns inform the decision of which time series algorithms to consider when predicting future events from historical or existing data. Our results illustrate that no one method is always the best. The data’s domain, the time period in question, and the length of time to analyze must all be carefully considered when choosing an algorithm.
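
As a rough illustration of the kind of comparison described in the abstract, the sketch below scores two forecasters on constant-length training windows that slide across a series. It is not the paper's method: the two baselines (naive and seasonal naive), the mean absolute error metric, the synthetic weekly series, and every function name here are assumptions chosen only to keep the example small and self-contained.

```python
from statistics import mean


def naive_forecast(train, horizon):
    """Baseline: repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train, horizon, period=7):
    """Baseline: repeat the value observed one season (period) earlier."""
    return [train[len(train) - period + (h % period)] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def rolling_window_mae(series, forecaster, window, horizon):
    """Average MAE over every constant-length training window in the series."""
    errors = []
    for start in range(len(series) - window - horizon + 1):
        train = series[start:start + window]
        test = series[start + window:start + window + horizon]
        errors.append(mean_absolute_error(test, forecaster(train, horizon)))
    return mean(errors)


# Synthetic data (an assumption): a mild upward trend plus a weekly pattern.
weekly = [0, 3, 5, 4, 2, -1, -3]
series = [0.25 * t + weekly[t % 7] for t in range(70)]

scores = {
    "naive": rolling_window_mae(series, naive_forecast, window=28, horizon=7),
    "seasonal_naive": rolling_window_mae(
        series, seasonal_naive_forecast, window=28, horizon=7
    ),
}
best = min(scores, key=scores.get)
print(scores, best)
```

On this strongly seasonal toy series the seasonal baseline scores better, but a steeper trend or a weaker weekly pattern can reverse the ranking, which is consistent with the point that no one method is always the best.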