Background: Understanding trends in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is crucial. Previous studies have focused on predicting COVID-19 trends, but few have modeled disease estimation and progression using large real-world datasets.
Methods: We used de-identified data from 60,938 employees of a major financial institution in Italy, with daily COVID-19 status information recorded between 31 March 2020 and 31 August 2021. We considered six statuses: (i) concluded case, (ii) confirmed case, (iii) close contact, (iv) possible-probable contact, (v) possible contact, and (vi) no COVID-19 case or infection. We used logistic regression to estimate the odds ratio (OR) of transition to confirmed COVID-19 case at each time point. We also fitted a general model of disease progression, a multi-state transition probability model, at each time point with lags of 7 and 15 days.
Results: Employment in a branch rather than in a central office was the strongest predictor of case or contact status, while no association was detected with gender or age. The geographic prevalence of possible-probable contacts and close contacts was predictive of the subsequent risk of confirmed cases. The status with the highest probability of transitioning to confirmed case was concluded case (12%) in April 2020, possible-probable contact (16%) in November 2020, and close contact (4%) in August 2021. The model based on transition probabilities predicted the rate of confirmed cases observed 7 or 15 days later well.
Conclusion: Data from industry-based surveillance systems may effectively predict the risk of subsequent infection.
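Illustration: The lagged transition probabilities described in the Methods can be sketched as empirical frequencies computed from daily status sequences. The Python sketch below is a minimal illustration under stated assumptions, not the fitted model used in the study: the function names, the input layout (one list of daily statuses per employee), and the shortened label for the sixth status are all hypothetical.

```python
from collections import Counter, defaultdict

# The six statuses from the Methods; the last label is shortened here (assumption).
STATES = [
    "concluded case",
    "confirmed case",
    "close contact",
    "possible-probable contact",
    "possible contact",
    "no COVID-19",
]


def transition_probabilities(histories, lag):
    """Estimate P(status on day t + lag | status on day t) by counting.

    histories: dict mapping a hypothetical employee id to a list of
    daily statuses, one entry per day, in chronological order.
    Returns a nested dict: probs[from_status][to_status].
    """
    counts = defaultdict(Counter)
    for seq in histories.values():
        # Pair each day's status with the status observed `lag` days later.
        for t in range(len(seq) - lag):
            counts[seq[t]][seq[t + lag]] += 1

    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def predicted_confirmed_rate(current_statuses, probs, target="confirmed case"):
    """Expected share of people in `target` status `lag` days from now."""
    if not current_statuses:
        return 0.0
    expected = sum(probs.get(s, {}).get(target, 0.0) for s in current_statuses)
    return expected / len(current_statuses)


# Toy data (invented for illustration only, not from the study).
toy_histories = {
    "a": ["close contact", "close contact", "confirmed case", "confirmed case"],
    "b": ["close contact", "close contact", "close contact", "close contact"],
}
toy_probs = transition_probabilities(toy_histories, lag=2)
toy_rate = predicted_confirmed_rate(["close contact", "close contact"], toy_probs)
```

In practice, `lag` would be set to 7 or 15 to match the lags reported above, and the predicted rate would be compared with the rate of confirmed cases actually observed at that later date.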