Year: 2020 | Volume: 18 | Issue: 4 | Page: 275-280
Models for forecasting the number of COVID cases in Indian states
Department of Statistics, Sri C. Achutha Menon Government College, Thrissur, Kerala, India
Date of Submission: 25-Jun-2020
Date of Decision: 19-Jul-2020
Date of Acceptance: 03-Aug-2020
Date of Web Publication: 19-Oct-2020
Prof. T Unnikrishnan
Department of Statistics, Sri C. Achutha Menon Government College, Thrissur, Kerala
Source of Support: None, Conflict of Interest: None
Introduction: Coronavirus disease, now a worldwide pandemic, continues to spread day by day. The only option at this stage is to keep the number of cases to a minimum. Time series forecasting models can enable planners and administrators to foresee the situation and take timely action to control it. Methodology: To forecast the daily number of COVID cases, prediction models were developed using autoregressive integrated moving average (ARIMA) modelling techniques for the states of India where large numbers of cases are reported. The main objectives were to assess the trend and growth rates of the number of confirmed COVID-19 cases and to identify the best ARIMA model for their prediction. Results: A parsimonious forecasting equation with excellent fit could be generated for each state in India using this method. These models will be helpful for planning purposes in controlling the cases. ARIMA (0,2,1) was observed to be the best model for predicting the number of COVID cases in most states of India. ARIMA (0,1,0) was identified as the best model for Mizoram and Puducherry. Conclusion: To predict all-India cases, ARIMA (0,2,7) was identified as the best model.
Keywords: Autoregressive-integrated moving average, COVID-19 in India, forecasting, planning, time series
How to cite this article:
Unnikrishnan T. Models for forecasting the number of COVID cases in Indian states. Curr Med Issues 2020;18:275-80
Introduction
According to the Bruhat Samhitha, written by Varahamihira around 700 AD, the world witnesses a severe pandemic when Venus is seen in the west at sunset. This was observed from November 2019 onward, and Venus remains in Taurus for a long period this year, until about August 2020. Hence, a prediction made about 1500 years ago appears to hold true in this era. Many ancient rituals have been revived under new names, yet the pandemic continues its course.
Meanwhile, locals believe that keeping a distance from everyone and chewing betel leaves with areca nut and quicklime are the best remedies for killing all creatures that are invisible to the human eye.
Although the first confirmed positive case in India was reported on January 30 in Kerala, the spread of the novel coronavirus in India quickened after mid-March. The pattern since then suggests that the number of cases will not decrease quickly until a new drug is developed to kill the virus. Here, a study was conducted to determine how quickly the pattern changes its path and to characterise the increasing trend in the number of COVID cases in India. For this purpose, daily data on confirmed cases in each state were collected from the website “https://www.kaggle.com/sudalairajkumar/covid19-in-India”, compared with the official reports, and used for the study.
Materials and Methods
Daily data on confirmed cases in each state of India were collected from the website “https://www.kaggle.com/sudalairajkumar/covid19-in-India” and compared with the official reports. Time series modelling was carried out using autoregressive integrated moving average (ARIMA) models. These models have the advantage of high accuracy for short-term forecasts, because each forecast depends only on the previous few values of the series. Owing to this flexibility, they are better suited to short-term forecasting than other parametric models.
Autoregressive-integrated moving average models
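A time series $X_t$ is an ARIMA ($p$, $d$, $q$) process if there exist polynomials $\Phi$ and $\Theta$ of degrees $p$ and $q$, respectively, and a white noise series $Z_t$ such that the differenced series $\Delta^d X_t$ is stationary and $\Phi(B)\,\Delta^d X_t = \Theta(B)\,Z_t$ almost surely on the underlying probability space, where $B$ denotes the backshift operator, $B X_t = X_{t-1}$, and $\Delta = 1 - B$.
With $\Phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$, $\Theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$, and $W_t = \Delta^d X_t$ (so that $W_t = X_t$ when $d = 0$), the equation $\Phi(B)\,\Delta^d X_t = \Theta(B)\,Z_t$ can be expanded as:
$$W_t = \phi_1 W_{t-1} + \phi_2 W_{t-2} + \dots + \phi_p W_{t-p} + Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \dots + \theta_q Z_{t-q}$$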
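The differencing step in the definition above can be illustrated with a short sketch. It is a minimal pure-Python example, not the procedure used in the study; the function names `difference` and `undifference` and the case counts are hypothetical.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - B) to the series d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference(diffed, heads):
    """Invert `difference`.

    heads[j] is the first value of the series after j differencing passes,
    so heads[0] is the first raw observation.
    """
    out = list(diffed)
    for head in reversed(heads):
        restored = [head]
        for step in out:
            restored.append(restored[-1] + step)
        out = restored
    return out


# Hypothetical cumulative case counts (not data from the study).
cases = [100, 120, 150, 190, 240, 300]
first = difference(cases, 1)    # daily increments
second = difference(cases, 2)   # change in the daily increments
restored = undifference(second, heads=[cases[0], first[0]])
```

In this toy series the second difference is constant, which is the kind of behaviour that makes d = 2 a natural choice; forecasts made on the differenced scale are mapped back to case counts with the inverse step.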
The Box and Jenkins linear time series model was applied for forecasting. The Box-Jenkins methodology for analyzing and modeling time series is characterized by four steps: (1) model identification, (2) model estimation, (3) diagnostic checking, and (4) forecasting. The principal objective of developing an ARIMA model for a variable is to generate post-sample period forecasts for that variable. Its strength lies in the fact that the method is suitable for any time series with any pattern of change and does not require the forecaster to choose the value of any parameter a priori. Its limitations are that it requires a long time series and is suited only to short-term prediction.
For a stationary time series, the autocovariance and autocorrelation at a lag $k \in \mathbb{Z}$ are defined by $\gamma_X(k) = \operatorname{cov}(X_{t+k}, X_t)$ and $\rho_X(k) = \rho(X_{t+k}, X_t) = \gamma_X(k)/\gamma_X(0)$, where $\gamma_X(0) = \operatorname{Var}(X_t)$ and $\rho_X(0) = 1$. The partial autocorrelation at a lag $k$ is defined as the correlation between $X_k - P_{k-1}(X_k)$ and $X_0 - P_{k-1}(X_0)$, where $P_k(y)$ is the projection of $y$ onto the subspace spanned by $X_1, X_2, \dots, X_k$, that is, the linear combination $\hat{y} = \sum_{j} \beta_j X_j$ for which $\lVert y - \hat{y} \rVert$ is minimal. It is the correlation between $X_0$ and $X_k$ with the effect of the intermediate values $X_1, X_2, \dots, X_{k-1}$ removed.
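The definitions above can be turned into sample estimates as in the following sketch. It assumes the usual biased sample autocovariance (dividing by n) and obtains the partial autocorrelations through the Durbin-Levinson recursion; the function names and the example series are hypothetical and are not taken from the study.

```python
def autocovariance(x, k):
    """Biased sample autocovariance of x at lag k (divides by n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t + k] - mean) * (x[t] - mean) for t in range(n - k)) / n


def acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    g0 = autocovariance(x, 0)
    return [autocovariance(x, k) / g0 for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    rho = acf(x, max_lag)
    phi_prev = []          # coefficients of the order k-1 predictor
    out = []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi_new
    return out


# Hypothetical stationary-looking series (not data from the study).
x = [3, 5, 4, 6, 8, 7, 9, 11, 10, 12]
r = acf(x, 3)
p = pacf(x, 3)
```

At lag 1 the partial autocorrelation equals the ordinary autocorrelation, since there are no intermediate values to remove. In model identification, the lag at which the ACF or PACF cuts off is commonly used to suggest the MA and AR orders, respectively.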
Precise estimates of the model parameters were obtained by the method of ordinary least squares, following the iterative procedure advocated by Box and Jenkins, using SPSS (Statistical Package for the Social Sciences).
Different models can be obtained from the various combinations of AR and MA terms, individually and collectively. The best model is selected using the following diagnostics: coefficient of determination ($R^2$), Akaike information criterion, Bayesian information criterion, portmanteau tests (Box-Pierce or Ljung-Box Q-tests), percentage forecast inaccuracy, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The smaller the values of MAE, RMSE, and MAPE, the better the model is considered to be.
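The three error measures named above follow their standard definitions and can be computed as in this minimal sketch. The function names and the numbers are illustrative only; MAPE as written assumes that no actual value is zero.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecasted case counts (not data from the study).
actual = [100, 200, 400]
forecast = [110, 190, 380]

print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

MAE and RMSE are on the scale of the case counts, so they are not comparable between states of very different sizes, whereas MAPE is scale-free.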
Chandran and Pandey studied the seasonal fluctuation of potato prices in Delhi. Ghosh and Prajneshu studied the price fluctuation of onion. Nochai and Nochai predicted oil palm prices efficiently using ARIMA models: ARIMA (2,1,0) for the price of oil palm, ARIMA (1,0,1) for the wholesale price of oil palm, and ARIMA (3,0,0) for the pure oil price of oil palm. Sen et al. found that time series modelling and forecasting of black pepper prices could be done using ARIMA (1,0,0). Unnikrishnan developed ARIMA models for forecasting the area, production, productivity, and price of the major crops of Kerala.
Results and Discussion
Forecasting models for the confirmed cases in all the states of India were developed using data up to July 18, 2020. All the models had $R^2$ above 99%, indicating that more than 99% of the variation in the data is explained by the models. The models were used to forecast values up to July 31. To validate the models, MAPE, RMSE, and MAE were calculated from the forecasted and actual values. These statistics showed that the models are well suited to forecasting.
Forecasts and estimated values were calculated for the all-India data using the estimated parameters, and the actual and estimated values were plotted as shown in [Figure 1]. The plot shows that the actual and estimated values, drawn as blue and red lines, track each other closely. Hence, the model is well suited to forecasting COVID cases from the number of cases in the past few days. The model can be used to estimate the number of COVID cases for the next 10 days by substituting forecasted values for actual values that are not yet available. Once the actual data arrive, each estimate can be replaced by the actual value and the remaining days forecasted again in the same way.
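The substitution of forecasted values for not-yet-observed actual values can be sketched as follows for an ARIMA (0,2,1) model without a constant or transformation. This is an illustration under stated assumptions, not the fitted model of the study: the function name, the MA coefficient `theta`, the last residual, and the case counts are all hypothetical, and the MA term is written with a plus sign.

```python
def forecast_arima_021(history, theta, last_residual, steps):
    """Recursive forecasts for X(t) = 2X(t-1) - X(t-2) + e(t) + theta * e(t-1).

    Future shocks are replaced by their expected value of zero, and each
    forecast is fed back in as if it were an observed value.
    """
    values = list(history[-2:])      # only the last two observations are needed
    residual = last_residual         # last observed one-step forecast error
    forecasts = []
    for _ in range(steps):
        nxt = 2 * values[-1] - values[-2] + theta * residual
        forecasts.append(nxt)
        values.append(nxt)           # substitute the forecast for the actual value
        residual = 0.0               # no residual is available beyond the data
    return forecasts


# Hypothetical inputs (not estimates from the study).
history = [1000, 1100]
next_three = forecast_arima_021(history, theta=-0.5, last_residual=20, steps=3)
print(next_three)   # [1190.0, 1280.0, 1370.0]
```

When a new actual value arrives, it is appended to `history`, the residual is recomputed as actual minus forecast, and the recursion is run again for the remaining days.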
Figure 1: Actual (blue) and estimated (red) confirmed COVID cases in India.
COVID cases recorded high positive growth rates on March 16, 2020, March 21, 2020, and April 2, 2020. After May 21, 2020, the growth rate shows a decreasing pattern [Figure 2]. Although the number of cases is still increasing, the declining growth rate suggests that the outbreak may subside in the coming months.
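The growth rate is not defined explicitly in the text. The sketch below assumes the day-over-day percentage change in cumulative confirmed cases, which is one common definition; the function name and the counts are hypothetical.

```python
def daily_growth_rates(cumulative):
    """Day-over-day percentage change in cumulative confirmed cases."""
    return [
        100.0 * (cur - prev) / prev
        for prev, cur in zip(cumulative, cumulative[1:])
    ]


# Hypothetical cumulative counts (not data from the study).
cases = [1000, 1100, 1199, 1295]
rates = daily_growth_rates(cases)
print([round(r, 2) for r in rates])
```

In this toy series the counts rise every day while the growth rate falls, which is the situation described above. Under this definition, cumulative counts stop rising only when the growth rate reaches zero.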
ARIMA (0, 2, 1) was identified as the best forecasting model for estimating the number of COVID cases in most of the states in India, whereas ARIMA (0, 1, 0) was identified as the best model for Mizoram and Puducherry. To predict all-India cases, ARIMA (0, 2, 7) was identified as the best model. For the other states, the best models found for forecasting were Assam: ARIMA (0, 1, 1), Goa: ARIMA (1, 1, 1), Dadra Nagar Haveli Daman Diu: ARIMA (4, 1, 0), Jharkhand: ARIMA (2, 1, 1), Manipur: ARIMA (0, 1, 4), Arunachal Pradesh: ARIMA (0, 2, 13), and Nagaland: ARIMA (0, 2, 10). The final models for forecasting are given in [Table 1].
Table 1: Autoregressive integrated moving average models for forecasting daily number of COVID cases in each state of India
Here, X(t) denotes the number of cases on the tth day and e(t) = Actual X(t) - Forecasted X(t). Similarly, X(t-k) denotes the number of cases on the (t-k)th day and e(t-k) is the forecasting error on the (t-k)th day. Ln denotes the natural logarithm and Sqrt(X[t]) denotes the square root of X(t) in the model.
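For orientation, the general forms of the three orders reported most prominently can be written out in this notation, with $X_t$ for X(t), $e_t$ for e(t), and $\Delta X_t = X_t - X_{t-1}$. These are generic expansions that assume no constant term and no Ln or Sqrt transformation, with MA terms written using a plus sign; they are not the fitted equations of Table 1, and some software reports MA coefficients with the opposite sign.

```latex
% ARIMA(0,1,0): first difference is white noise (random walk)
\Delta X_t = e_t
\quad\Longrightarrow\quad
X_t = X_{t-1} + e_t

% ARIMA(0,2,1): second difference is an MA(1) process
\Delta^2 X_t = e_t + \theta_1 e_{t-1}
\quad\Longrightarrow\quad
X_t = 2X_{t-1} - X_{t-2} + e_t + \theta_1 e_{t-1}

% ARIMA(0,2,7): second difference is an MA(7) process
\Delta^2 X_t = e_t + \sum_{j=1}^{7} \theta_j e_{t-j}
\quad\Longrightarrow\quad
X_t = 2X_{t-1} - X_{t-2} + e_t + \sum_{j=1}^{7} \theta_j e_{t-j}
```

The term $2X_{t-1} - X_{t-2}$ comes from expanding $\Delta^2 X_t = X_t - 2X_{t-1} + X_{t-2}$. It extrapolates the most recent daily increase, and the MA terms adjust that extrapolation using recent forecasting errors.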
The residual ACF and residual PACF showed that the model is well suited to forecasting the number of COVID cases in India. Forecasts of COVID cases for the coming 10 days obtained with this model are given in [Table 2]. The forecasts also show that there is a chance of a reduction in COVID cases, as the growth rate shows a decreasing pattern in the state.
Table 2: Forecast for the next 10 days generated using autoregressive integrated moving average modeling
Conclusion
ARIMA (0, 2, 1) was identified as the best forecasting model for estimating the number of COVID cases in most of the states in India. The ARIMA models were used to forecast the next 10 days, and the results showed good agreement between actual and predicted values. The estimated and forecasted values show that there is a chance of a reduction in COVID cases, as the growth rate shows a decreasing pattern in the state. This may be due to fewer contact cases despite the increasing number of returnees from other states and countries. The changing pattern in the number of COVID cases in India can also be identified with this model. Forecasts can also be made from earlier forecasted values, as the models explain more than 99% of the variability. Knowing the number of cases on previous days, the Government can plan decisions for the next 10 or more days based on the values estimated by the model, which will help it tide over grim situations with easier planning.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
No studies involving human subjects and/or animals were conducted for this research.
References
Varahamihira (700AD). Bruhat Samhitha, Malayalam Translation Namboothiri P.P. (Ed): Devi Bookstall; 1935.
Box GE, Jenkins GM. Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day; 1976.
Chandran KP, Pandey NK. Potato price forecasting using seasonal ARIMA approach. Potato J 2007;34:137-8.
Ghosh H, Prajneshu. Non-linear time series modeling of volatile onion price data using AR (p)-ARCH (q)-in mean. Calcutta Stat Assoc Bull 2003;54:231-47.
Nochai R, Nochai T. ARIMA Model for Forecasting Oil Palm Price. 2nd IMT GT Regional Conference on Mathematics, Statistics and Applications, Univ. Sains Malaysia; 13-15 June 2006.
Sen LK, Shitan M, Hussain H. Time Series Modeling and Forecasting of Sarawak Black Pepper Price, Malaysia; 2000.
Unnikrishnan T. Changing Scenario of Kerala Agriculture – An over view, (MSc Thesis), Kerala Agricultural University; 2009.