ORIGINAL ARTICLE
Year: 2020, Volume: 18, Issue: 4, Pages: 275-280
Models for forecasting the number of COVID cases in Indian states
T Unnikrishnan
Department of Statistics, Sri C. Achutha Menon Government College, Thrissur, Kerala, India
Introduction: The coronavirus pandemic continues to spread worldwide day by day. At this stage, the only feasible approach is to keep the number of cases to a minimum. Time series forecasting models can help planners and administrators foresee the course of the epidemic and take timely action to control it.
Methodology: To forecast the daily number of COVID cases, prediction models were developed using autoregressive integrated moving average (ARIMA) modelling techniques for the various states of India where many cases were reported. The main objectives were to assess the trend and growth rates of confirmed COVID-19 cases and to identify the best ARIMA model for their prediction.
Results: Parsimonious forecasting equations could be generated for each state in India using this method. These models will be helpful for planning measures to control the number of cases. ARIMA (0,2,1) was the best model for predicting the number of COVID cases in most of the states of India, and ARIMA (0,1,0) was the best model for Mizoram and Puducherry.
Conclusion: To predict all-India cases, ARIMA (0,2,7) was identified as the best model.
Introduction
According to the Bruhat Samhitha, written by Varahamihira in the sixth century AD,[1] when Venus is seen in the west at sunset, the world witnesses a severe pandemic. Venus was seen in this position from November 2019 onward, and it remained in Taurus for a long period this year, until about August 2020. Hence, a prediction made about 1500 years ago seems to have come true in this era. Many ancient rituals have been revived under new names, yet the pandemic continues to spread. Meanwhile, locals believe that keeping a distance from everyone and chewing betel leaves with areca nut and quicklime are the best remedies to kill all creatures invisible to the human eye.
Although the first confirmed case in India was reported in Kerala on January 30, 2020, the spread of the novel coronavirus in India accelerated after mid-March. Since then, the pattern suggests that the number of cases will not decrease quickly until a new drug is developed to kill the virus. The present study was conducted to examine how quickly this pattern changes direction and to describe the increasing trend in the number of COVID cases in India. For this purpose, daily data on confirmed cases in each state were collected, compared with the official reports, and used for the study.
Materials and Methods
Daily data on confirmed cases in each state of India were collected from the website “https://www.kaggle.com/sudalairajkumar/covid19inIndia” and compared with the official reports. Time series modelling was carried out using autoregressive integrated moving average (ARIMA) models. These models tend to give highly accurate short-term forecasts because each forecast depends only on the previous few values of the series. This makes them more suitable for short-term forecasting than other parametric models.
Autoregressive integrated moving average models
A time series $\{X_t\}$ is an ARIMA$(p, d, q)$ process if there exist polynomials $\phi$ and $\theta$ of degrees $p$ and $q$, respectively, and a white noise series $\{Z_t\}$ such that the differenced series $\nabla^d X_t = (1 - B)^d X_t$ is stationary and
$$\phi(B)\,\nabla^d X_t = \theta(B)\,Z_t$$
almost surely on the underlying probability space, where $B$ denotes the backshift operator, $B X_t = X_{t-1}$, and
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p, \qquad \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q.$$
Writing $W_t = \nabla^d X_t$ for the differenced series, the equation $\phi(B)\,\nabla^d X_t = \theta(B)\,Z_t$ can be expanded as
$$W_t = \phi_1 W_{t-1} + \phi_2 W_{t-2} + \cdots + \phi_p W_{t-p} + Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots + \theta_q Z_{t-q}.$$
The linear time series modelling approach of Box and Jenkins[2] was applied for forecasting. The Box-Jenkins methodology for analysing and modelling time series consists of four steps: (1) model identification, (2) model estimation, (3) diagnostic checking, and (4) forecasting. The principal objective of developing an ARIMA model for a variable is to generate post-sample forecasts for that variable. Its strength lies in the fact that the method is suitable for time series with any pattern of change and does not require the forecaster to choose the value of any parameter a priori. Its limitations are that it requires a long time series and is suited mainly to short-term prediction.
Identification
For a stationary time series, the autocovariance and autocorrelation at lag $k \in \mathbb{Z}$ are defined by
$$\gamma_X(k) = \operatorname{Cov}(X_{t+k}, X_t), \qquad \rho_X(k) = \operatorname{Corr}(X_{t+k}, X_t) = \frac{\gamma_X(k)}{\gamma_X(0)},$$
where $\gamma_X(0) = \operatorname{Var}(X_t)$ and $\rho_X(0) = 1$. The partial autocorrelation at lag $k \ge 2$ is defined as
$$\alpha_X(k) = \operatorname{Corr}\big(X_k - P_{k-1}(X_k),\; X_0 - P_{k-1}(X_0)\big),$$
with $\alpha_X(1) = \rho_X(1)$, where $P_{k-1}(Y)$ is the projection of $Y$ onto the subspace spanned by $X_1, X_2, \ldots, X_{k-1}$, that is, the linear combination $\hat{Y} = \sum_{j=1}^{k-1} b_j X_j$ for which $\lVert Y - \hat{Y} \rVert$ is minimal. It is thus the correlation between $X_0$ and $X_k$ after the linear effect of the intermediate values $X_1, X_2, \ldots, X_{k-1}$ has been removed.
Estimation
The parameters of the model were estimated by least squares, using the iterative procedure advocated by Box and Jenkins as implemented in SPSS (Statistical Package for the Social Sciences).
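To make the identification step concrete, here is a minimal sketch in plain Python (not the SPSS procedure used in the study). It computes the sample autocorrelation from the definition of $\rho_X(k)$, using the usual divisor $n$, and then obtains the sample partial autocorrelation with the Durbin-Levinson recursion, which finds the projection coefficients lag by lag instead of solving each least-squares problem separately. The function names `sample_acf` and `sample_pacf` are hypothetical.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag) of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev) / n  # sample variance, gamma_hat(0)
    if gamma0 == 0:
        raise ValueError("series has zero variance")
    acf = []
    for k in range(max_lag + 1):
        # sample autocovariance at lag k, divided by n (not n - k)
        gamma_k = sum(dev[t + k] * dev[t] for t in range(n - k)) / n
        acf.append(gamma_k / gamma0)
    return acf


def sample_pacf(acf, max_lag):
    """Partial autocorrelations alpha(1), ..., alpha(max_lag) via Durbin-Levinson.

    `acf` must start with rho(0) = 1 and contain at least max_lag + 1 values.
    """
    pacf = []
    phi_prev = []  # phi_{k-1,1..k-1}: coefficients of the best linear predictor
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = acf[1]
            phi_curr = [phi_kk]
        else:
            num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf
```

For example, `sample_pacf([1.0, 0.5, 0.25, 0.125], 3)`, applied to the theoretical ACF of an AR(1) process with coefficient 0.5, returns `[0.5, 0.0, 0.0]`: the cut-off after lag 1 is the pattern that points to an AR(1) model during identification.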
Diagnostic checking
Different candidate models can be obtained from various combinations of AR and MA terms, individually and jointly. The best model is selected using diagnostics such as the coefficient of determination (R²), the Akaike information criterion, the Bayesian information criterion, portmanteau tests (the Box-Pierce or Ljung-Box Q tests), the percentage forecast inaccuracy, the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). The smaller the values of MAE, RMSE, and MAPE, the better the model is considered to be.
Chandran and Pandey[3] studied the seasonal fluctuation of potato prices in Delhi, and Ghosh and Prajneshu[4] studied the price fluctuation of onion. Nochai and Nochai[5] predicted the price of oil palm efficiently using an ARIMA (2,1,0) model; the best model for forecasting the wholesale price of oil palm was ARIMA (1,0,1), and that for the pure oil price was ARIMA (3,0,0). Sen et al.[6] found that black pepper prices could be modelled and forecast using ARIMA (1,0,0). Unnikrishnan[7] developed ARIMA models for forecasting the area, production, productivity, and price of major crops of Kerala.
Results and Discussion
Forecasting models for confirmed cases in all the states of India were developed using data up to July 18, 2020. All the models had R² above 99%, indicating that they explain more than 99% of the variation in the data. The models were then used to forecast values up to July 31, 2020. To validate the models, MAPE, RMSE, and MAE were calculated from the forecasted and actual values; these statistics showed that the models are well suited for forecasting. Forecasts and estimated values were computed for the all-India data using the estimated parameters, and the actual and estimated values were plotted [Figure 1]. The plot shows that the actual and estimated values, drawn as blue and red lines, closely track each other.
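The validation statistics used here follow their standard definitions: with forecast errors $e_t = X_t - \hat{X}_t$ over $n$ days, $\text{MAE} = \frac{1}{n}\sum |e_t|$, $\text{RMSE} = \sqrt{\frac{1}{n}\sum e_t^2}$, and $\text{MAPE} = \frac{100}{n}\sum |e_t / X_t|$. A minimal plain-Python sketch follows; the function name `forecast_accuracy` and the numbers in the usage example are illustrative only and are not data from this study.

```python
import math


def forecast_accuracy(actual, forecast):
    """Return (MAE, RMSE, MAPE in percent) for paired actual and forecasted values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]  # e(t) = actual - forecasted
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined if any actual value is zero (division by zero)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return mae, rmse, mape
```

For instance, `forecast_accuracy([100, 200, 400], [110, 190, 400])` gives an MAE of about 6.67, an RMSE of about 8.16, and a MAPE of 5%. RMSE penalises large errors more heavily than MAE, while MAPE is scale-free, which makes it convenient for comparing states with very different case counts.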
Hence, the model is well suited for forecasting COVID cases from the number of cases recorded in the past few days. It can be used to estimate the number of COVID cases for the next 10 days by substituting forecasted values for actual values in the data. Once the actual data arrive, each estimate can be replaced by the actual value and the remaining days forecast again in the same way.
{Figure 1}
COVID cases recorded high positive growth rates on March 16, March 21, and April 2, 2020. After May 21, 2020, the growth rate shows a decreasing pattern [Figure 2]. Even though the number of cases is still increasing, the declining growth rate suggests that the increase may end in the coming months.
{Figure 2}
ARIMA (0,2,1) was identified as the best forecasting model for estimating the number of COVID cases in most of the states in India, whereas ARIMA (0,1,0) was identified as the best model for Mizoram and Puducherry. To predict all-India cases, ARIMA (0,2,7) was identified as the best model. For the other states, the best forecasting models were ARIMA (0,1,1) for Assam, ARIMA (1,1,1) for Goa, ARIMA (4,1,0) for Dadra and Nagar Haveli and Daman and Diu, ARIMA (2,1,1) for Jharkhand, ARIMA (0,1,4) for Manipur, ARIMA (0,2,13) for Arunachal Pradesh, and ARIMA (0,2,10) for Nagaland. The final forecasting models are given in [Table 1].
{Table 1}
In Table 1, X(t) denotes the number of cases on day t and e(t) = actual X(t) - forecasted X(t). Similarly, X(t-k) denotes the number of cases on day t-k and e(t-k) is the forecast error on day t-k. Ln denotes the natural logarithm and Sqrt(X[t]) denotes the square root of X(t). The residual ACF and residual PACF indicated that the models are adequate for forecasting the number of COVID cases in India. Using these models, the forecasts of COVID cases for the coming 10 days are given in [Table 2].
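As an illustration of how a fitted equation such as ARIMA (0,2,1) produces the 10-day forecasts, the model $\nabla^2 X_t = Z_t + \theta Z_{t-1}$, written with the plus-sign convention of the ARIMA expansion in the Methods, becomes $X(t) = 2X(t-1) - X(t-2) + e(t) + \theta\, e(t-1)$. The minimal sketch below assumes $\theta$ has already been estimated (for example in SPSS); it is not the author's code, the function name `arima021_forecast` and the value of `theta` in the example are hypothetical, and it starts the residual recursion by setting the first two errors to zero. Note that some software, following Box and Jenkins, writes the MA polynomial as $1 - \theta B$, which reverses the sign of the reported coefficient.

```python
def arima021_forecast(x, theta, horizon):
    """Residuals and multi-step forecasts for an ARIMA(0,2,1) model with known theta.

    Model: (1 - B)^2 X_t = Z_t + theta * Z_{t-1}, i.e.
    X(t) = 2 X(t-1) - X(t-2) + e(t) + theta * e(t-1).
    """
    if len(x) < 3:
        raise ValueError("need at least three observations")
    # In-sample one-step errors e(t) = actual X(t) - forecasted X(t),
    # conditionally started with e = 0 for the first two days.
    residuals = [0.0, 0.0]
    for t in range(2, len(x)):
        predicted = 2 * x[t - 1] - x[t - 2] + theta * residuals[t - 1]
        residuals.append(x[t] - predicted)

    # Out of sample: forecasted values stand in for actual values,
    # and future errors are replaced by their expected value, zero.
    history = list(x)
    last_error = residuals[-1]
    forecasts = []
    for _ in range(horizon):
        next_value = 2 * history[-1] - history[-2] + theta * last_error
        forecasts.append(next_value)
        history.append(next_value)
        last_error = 0.0
    return forecasts, residuals
```

For example, `arima021_forecast([10, 20, 35], 0.5, 3)` returns the forecasts `[52.5, 70.0, 87.5]` and residuals `[0.0, 0.0, 5.0]`: only the first step uses the last observed error, after which each forecast is fed back in place of an actual value. When a new actual count arrives, rerunning the function on the extended series replaces the corresponding estimate.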
The forecasts also suggest that the number of COVID cases may decline, as the growth rates show a decreasing pattern.
{Table 2}
Conclusion
ARIMA (0,2,1) was identified as the best forecasting model for estimating the number of COVID cases in most of the states in India. The ARIMA models were used to forecast 10 days ahead, and the results showed good agreement between actual and predicted values. The estimated and forecasted values suggest that the number of COVID cases may decline, as the growth rates show a decreasing pattern. This may be due to fewer contact cases despite the increasing number of people returning from other states and countries. Changes in the pattern of COVID cases in India can also be identified with this model. Forecasts can also be extended by using earlier forecasted values, since the models explain more than 99% of the variability. Knowing the number of cases on previous days, the Government can make decisions for the next 10 or more days based on the values estimated by the model, which will help it plan ahead and tide over grim situations.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Ethical statement
No studies involving human subjects and/or animals were conducted for this research.
References


