How Will Air Travel Rebound in 2021? (Time Series Forecast [Python, SQL, Docker])

The travel industry was one of the sectors hit hardest by the pandemic: international flights abruptly ground to a halt in March 2020, and many domestic carriers significantly reduced their schedules. I was curious how air travel might rebound this year (2021), so I built a time series forecast from official TSA checkpoint data, which records how many passengers are screened per day at U.S. airports. The goal was to predict how many passengers will be screened this year and, specifically, how that number compares with pre-pandemic levels in 2019. I used the psycopg2 package to retrieve the data from a Postgres database running in a Docker container and performed the analysis with FBProphet in a Python virtual environment inside a connected Docker container.
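As a rough illustration of the retrieval step, the sketch below reads the daily counts through a database connection and shapes them into the two-column frame (ds, y) that Prophet expects. The table name, column names, and connection settings are placeholders I made up for this example, not the ones used in the project.

```python
import pandas as pd

# Hypothetical table and column names; adjust to the real schema.
QUERY = (
    "SELECT screening_date, passengers "
    "FROM tsa_checkpoint ORDER BY screening_date"
)


def load_screenings(conn):
    """Read daily screenings through a DB-API connection (e.g. psycopg2)."""
    cur = conn.cursor()
    try:
        cur.execute(QUERY)
        rows = cur.fetchall()
    finally:
        cur.close()
    # Prophet expects a date column named "ds" and a value column named "y".
    df = pd.DataFrame(rows, columns=["ds", "y"])
    df["ds"] = pd.to_datetime(df["ds"])
    return df


def load_from_postgres():
    """Connect to the Postgres container and load the data."""
    import psycopg2  # imported here so load_screenings works without it

    # Placeholder credentials and host name for a Docker Compose service.
    conn = psycopg2.connect(
        host="db", dbname="tsa", user="postgres", password="postgres"
    )
    try:
        return load_screenings(conn)
    finally:
        conn.close()
```

Keeping the query logic separate from the connection setup means the same function can be exercised against any DB-API connection, which is convenient for checking the frame's shape before touching the real database.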

Full code:

Before running the forecast, I first visualized the TSA data to get an overall picture and sanity-check its accuracy. The 2019 data shows several interesting trends, including a decline in screenings during spring (likely due to seasonal fluctuations in travel), a steady rebound during the summer months, and a noticeable spike around Thanksgiving. The 2020 data clearly shows the precipitous decline in travel, followed by a slow but steady recovery into 2021.

Then, I instantiated the FBProphet model and fit it to the original data. Next, I created a separate DataFrame covering 365 future days and used the model to predict the number of screened passengers one year into the future. Overall, the model predicts a relatively steady rebound in the number of passengers screened over the next year. In the forecast plot, the predicted values are the dark blue line, the 80% uncertainty interval is the light blue band, and trend changepoints are marked by dotted red lines.
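The fit-and-predict steps described here can be sketched roughly as follows. This is a minimal version assuming a history frame with Prophet's ds and y columns; the function name and the optional model argument are my own additions for illustration, and the model uses default settings as in the write-up.

```python
import pandas as pd


def forecast_one_year(history, model=None, periods=365):
    """Fit a Prophet model and predict `periods` days past the history."""
    if model is None:
        # The package was published as `fbprophet` before version 1.0,
        # in which case the import is `from fbprophet import Prophet`.
        from prophet import Prophet

        model = Prophet()  # defaults; the uncertainty interval is 80%
    model.fit(history)
    # The future frame holds the historical dates plus the extra days.
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

Here yhat is the predicted value and yhat_lower / yhat_upper bound the uncertainty interval. For the plot itself, model.plot(forecast) draws the forecast, and add_changepoints_to_plot from the package's plot module overlays the changepoint lines.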

However, the model appears to assign significant weight to the seasonal decline in screenings around March 2019 and the precipitous decline in March 2020, and it predicts a similar decline around March-April 2022. It is uncertain whether this pattern will repeat to such a significant degree next year. Additionally, the model has made conservative predictions in the short term; for example, it predicted that 513,774 passengers would be screened on May 1 when, in reality, 1,335,535 passengers were screened that day, considerably more than even the upper bound (959,733) of the prediction.
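To make the size of the May 1 miss concrete, here is a small check using only the three figures quoted above. The function name and the dictionary keys are just labels chosen for this example.

```python
def compare_to_forecast(actual, predicted, upper):
    """Summarize how far an observed value landed from its forecast."""
    return {
        "shortfall": actual - predicted,           # passengers under-predicted
        "above_upper_by": actual - upper,          # distance past the upper bound
        "predicted_share_of_actual": predicted / actual,
        "above_upper_bound": actual > upper,
    }


may_1 = compare_to_forecast(actual=1_335_535, predicted=513_774, upper=959_733)
for name, value in may_1.items():
    print(name, value)
```

The point estimate was 821,761 passengers too low (about 38% of the actual count), and the actual count exceeded the upper bound by 375,802.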

FBProphet automatically decomposes the data into seasonality components, which show clear weekly and yearly seasonality. Specifically, the fewest passengers are screened on Tuesdays, which makes sense because few people other than business travelers fly midweek. Conversely, the most passengers are screened on Thursdays and Fridays, as people leave for the weekend, and again on Sundays, as they return home before the start of the week. The yearly component is harder to interpret because I believe the model is still assigning too much significance to the March-May decline seen in 2019 and 2020. Travel does rebound during the summer months as more people take vacations, decreases in early fall as school resumes, and rises again during the holiday season, especially around Christmas.
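One way to read the weekly pattern off as numbers rather than from the component plot is sketched below. It assumes the full frame returned by the model's predict call, which includes a weekly column when weekly seasonality is enabled; the helper name is my own.

```python
import pandas as pd


def weekday_effect(forecast):
    """Average weekly seasonal effect per weekday, lowest first."""
    weekday = forecast["ds"].dt.day_name().rename("weekday")
    return forecast.groupby(weekday)["weekly"].mean().sort_values()
```

With the default additive seasonality, each value is the number of passengers the weekly component adds to or subtracts from the trend on that weekday, so the first and last entries correspond to the quietest and busiest days.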

Finally, we can use the results of the time series forecast to answer how many fewer people are predicted to be screened at U.S. airports this year compared to pre-pandemic levels in 2019. According to the model, roughly 420,000,000 fewer people will fly through U.S. airports during the 2021 calendar year than during the 2019 calendar year. As noted earlier, however, the model seems to take a conservative approach, so the actual difference may well be smaller.
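The yearly comparison could be computed along these lines. This is only a sketch: it assumes that days with an observed count use the observed value and that the remaining days use the model's prediction (yhat), which may differ from how the figure above was actually produced.

```python
import pandas as pd


def yearly_total(frame, value_col, year):
    """Sum one column over all rows whose date falls in `year`."""
    in_year = frame["ds"].dt.year == year
    return float(frame.loc[in_year, value_col].sum())


def screening_gap(history, forecast, base_year=2019, target_year=2021):
    """Passengers screened in base_year minus those in target_year."""
    merged = forecast[["ds", "yhat"]].merge(
        history[["ds", "y"]], on="ds", how="left"
    )
    # Assumption: prefer the observed count, fall back to the prediction.
    merged["value"] = merged["y"].fillna(merged["yhat"])
    return yearly_total(merged, "value", base_year) - yearly_total(
        merged, "value", target_year
    )
```

A positive result means fewer passengers in the target year than in the base year.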

Possible Improvements:

I ran the model with the default settings, so it is likely possible to achieve more accurate results by tuning model parameters such as seasonality. The FBProphet package includes cross-validation features that can be used to find the best values for each parameter. I also used only one complete year (2019) of pre-pandemic data, so the forecast would likely be more accurate if the model were trained on multiple years of past data that reflect the broader industry growth over the past decade. Alternatively, the forecast could be more accurate if the model were trained solely on 2019 data, which was less volatile than 2020, or solely on post-March 2020 data, after the initial sharp decline.
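A tuning loop built on the package's cross-validation helpers might look like the sketch below. The grid values, the 30-day horizon, and the choice of RMSE as the score are illustrative assumptions rather than settings I have run on this data.

```python
import itertools

# Illustrative grid: trend flexibility and seasonality strength.
PARAM_GRID = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}


def expand_grid(grid):
    """Turn a dict of value lists into a list of parameter dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]


def tune(history, grid=PARAM_GRID, horizon="30 days"):
    """Return the parameter combination with the lowest cross-validated RMSE."""
    # Named `fbprophet` / `fbprophet.diagnostics` before version 1.0.
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics

    scores = []
    for params in expand_grid(grid):
        model = Prophet(**params).fit(history)
        cv = cross_validation(model, horizon=horizon)
        metrics = performance_metrics(cv, rolling_window=1)
        scores.append((metrics["rmse"].values[0], params))
    return min(scores, key=lambda score: score[0])[1]
```

Each combination refits the model several times, so a 16-point grid is noticeably slower than a single default fit.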

Further Questions:

As noted before, this data only includes passengers screened at U.S. airports so it primarily focuses on domestic travel. It would be interesting to analyze international data as well to see how travel patterns differ between countries as international flights begin to resume.

Domestically, it would also be interesting to compare air travel data with road trip data to gauge consumer hesitancy to fly. If consumers view air travel as riskier to their health in the short term, car travel could be higher than in recent years.