Forecasting Dengue Cases in Several States in Malaysia Using ARIMA and SARIMA Methods
Anis Natrah Ibrahim, Humaida Banu Samsudin*
Faculty of Science and Technology, Universiti Kebangsaan Malaysia
DOI: https://doi.org/10.51244/IJRSI.2025.1215000157P
Received: 17 September 2025; Accepted: 22 September 2025; Published: 18 October 2025
Dengue is a viral disease carried by the Aedes mosquito that spreads quickly and has a high death rate. In Malaysia, dengue fever is a contagious health threat with an increasing infection trend. The Malaysian Ministry of Health (KKM) reported surges in dengue fever cases in 2010, 2015 and 2019. Dengue epidemics in Malaysia are expected to peak in cycles of four to five years, with the next increase expected in 2024 or 2025. This study compared dengue fever cases in several states in Malaysia from 2010 to 2021. Forecasting models were constructed using the Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) methods. The best model was selected based on the smallest Akaike Information Criterion (AIC) value, and its adequacy was then checked with the Ljung-Box test. Dengue fever cases for 2022 were predicted from the fitted models. Forecasts of this kind can help public health practitioners and the government with risk management, resource allocation and planning of clinical care in the event of severe dengue outbreaks in the future.
Keywords: Dengue fever; Autoregressive Integrated Moving Average (ARIMA); Seasonal Autoregressive Integrated Moving Average (SARIMA); prediction of dengue fever cases; forecasting model
Dengue is one of the most common and rapidly spreading vector-borne viral diseases, with a high mortality rate (Sabir et al. 2021). According to the official portal of the Ministry of Health Malaysia (KKM), dengue fever is a viral infection that spreads through the bite of an infected Aedes aegypti mosquito. The World Health Organization (WHO) classifies dengue into two main categories: dengue (with or without warning signs) and severe dengue. Dengue hemorrhagic fever is a severe form of dengue fever caused by dengue virus infection, and its incidence has increased worldwide over the past decade (Lestari et al. 2021). Based on data released by WHO, 505,430 dengue cases were recorded worldwide in 2000, while 4.2 million cases were recorded in 2019. This is an approximately eight-fold increase in dengue fever cases over the past two decades.
Severe dengue was first recognized in the 1950s during dengue epidemics in the Philippines and Thailand (WHO, 2022). Today, dengue severely affects most Asian and Latin American countries and has become a leading cause of hospitalization and death among children and adults in these regions. Before 1970, only nine countries had experienced severe dengue epidemics. The disease is now endemic in more than 100 countries across the WHO regions of Africa, the Americas, the Eastern Mediterranean, Southeast Asia and the Western Pacific. The Americas, Southeast Asia and the Western Pacific are the worst affected, with Asia representing approximately 70% of the global disease burden.
In Malaysia, dengue fever is a contagious health threat with an increasing infection trend. Malaysia is a tropical country located near the equator. The hot, humid and rainy weather throughout the year contributes to the increase in dengue fever cases. Hot weather makes mosquitoes more active and increases their biting frequency, which in turn spreads the dengue virus more widely (Ali, 2016). For the same reason, other tropical countries such as Indonesia, Brazil and Mexico also record high numbers of dengue fever cases.
In 2020, the Ministry of Health reported 90,304 cases of dengue fever in Malaysia, a significant decrease compared to 2014. However, Selangor still recorded the highest number of cases of any state, with 44,635 cases, or approximately 50% of all dengue fever cases in Malaysia. This is because Selangor is a densely populated state with a rapidly growing population. Withanage et al. (2018) stated that unplanned and uncontrolled large-scale urbanization with a rapid increase in human population leads to higher disease transmission in endemic areas. Therefore, dengue fever cases in several selected states are forecast in this study to help those responsible for risk management, resource allocation and planning of clinical care in the event of severe dengue fever cases.
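The "approximately 50%" share attributed to Selangor can be checked directly from the two counts quoted above. The short sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Case counts for 2020 as quoted in the text.
malaysia_cases_2020 = 90_304
selangor_cases_2020 = 44_635

# Share of national cases recorded in Selangor.
selangor_share = selangor_cases_2020 / malaysia_cases_2020
print(f"Selangor share of 2020 cases: {selangor_share:.1%}")
```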
This study uses secondary data: monthly dengue fever case counts from March 2010 to December 2021, a study period of nearly 12 years. The data comprise dengue fever cases for Malaysia as a whole and cumulative dengue fever cases by state. The ARIMA and SARIMA methods are used for modeling. Before these models are fitted, the Augmented Dickey-Fuller (ADF) unit root test is performed to determine whether each time series is stationary and whether there are significant trends that need to be modeled. A stationary time series has a mean and variance that do not vary over time, while a series is considered non-stationary if it shows a strong trend or seasonality (Wu, 2021).
If the p-value is greater than 0.05, the null hypothesis of a unit root cannot be rejected and the time series is considered non-stationary. In that case the series is differenced to make it stationary. The ARIMA model, also known as the Box-Jenkins model, is a univariate time series model used to predict future values. ARIMA models are usually written as ARIMA(p,d,q), where p is the order of the autoregressive part, d is the order of differencing and q is the order of the moving average part. The model is built in four steps: tentative identification, parameter estimation, diagnostic checking and prediction (Bowerman, O’Connell, & Koehler, 2005).
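For reference, the ARIMA(p,d,q) model can be written compactly with the backshift operator. The notation below is the common textbook form and is an assumption, not taken from the original paper: $y_t$ is the monthly case count, $B$ is the backshift operator ($B y_t = y_{t-1}$), $c$ is a constant and $\varepsilon_t$ is white noise.

```latex
\[
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\]
\[
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}.
\]
```

With $d = 1$, the factor $(1 - B)\,y_t = y_t - y_{t-1}$ is the first difference of the series, which is the differencing step applied to a non-stationary series.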
SARIMA stands for Seasonal Autoregressive Integrated Moving Average. This model is an extension of the ARIMA model that includes a seasonal component. It keeps the parameters of ARIMA and adds four seasonal terms: m for the seasonal period (S), P for the seasonal autoregressive order (SAR), D for the order of seasonal differencing (SI) and Q for the seasonal moving average order (SMA). The SARIMA(p,d,q)(P,D,Q)m model is used when the time series shows seasonality, that is, patterns that repeat at fixed time intervals (Shumway & Stoffer, 2011).
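In backshift notation, the SARIMA(p,d,q)(P,D,Q)m model multiplies the non-seasonal polynomials by seasonal ones. The symbols are the usual textbook ones and are an assumption, not taken from the original paper: $B$ is the backshift operator ($B y_t = y_{t-1}$), $c$ is a constant and $\varepsilon_t$ is white noise.

```latex
\[
\Phi(B^{m})\,\phi(B)\,(1 - B)^{d}\,(1 - B^{m})^{D}\, y_t
  = c + \Theta(B^{m})\,\theta(B)\,\varepsilon_t,
\]
\[
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q},
\]
\[
\Phi(B^{m}) = 1 - \Phi_1 B^{m} - \dots - \Phi_P B^{Pm},
\qquad
\Theta(B^{m}) = 1 + \Theta_1 B^{m} + \dots + \Theta_Q B^{Qm}.
\]
```

For monthly data, $m = 12$. As an illustration, a SARIMA(0,1,0)(1,0,1)[12] model without a constant reduces to $(1 - \Phi_1 B^{12})(1 - B)\,y_t = (1 + \Theta_1 B^{12})\,\varepsilon_t$.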
The ‘auto.arima’ function in R is used to determine the order of differencing, d, for each time series based on the KPSS test. It also reports the best ARIMA(p,d,q) and SARIMA(p,d,q)(P,D,Q)m models according to the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC); lower AIC and BIC values indicate a better model. Therefore, this study uses the ‘auto.arima’ function to determine the best ARIMA and SARIMA models for dengue fever cases in each state.
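To make the two criteria concrete, the sketch below computes AIC and BIC from a model's maximized log-likelihood using their standard definitions. The log-likelihoods, parameter counts and model names are hypothetical and only illustrate the trade-off; the sample size of 142 assumes one observation per month from March 2010 to December 2021. This is not the internal procedure of ‘auto.arima’.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


n_obs = 142  # assumed: monthly observations, March 2010 to December 2021

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "model A": (-1041.4, 4),
    "model B": (-1041.3, 5),
}

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
print(scores, best_by_aic)
```

In this illustration the larger model fits slightly better, but not by enough to offset the penalty for its extra parameter, so the smaller model has the lower AIC.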
Based on Table 1, the ADF unit root test shows that only the Kelantan series is stationary at the level. The p-value for Kelantan is less than the significance level of 0.05, so the null hypothesis that the time series has a unit root is rejected. Therefore, the Kelantan series is integrated of order 0, I(0).
Table 1 ADF Unit Root Test
State | Level: p-Value | Level: Result | First Difference: p-Value | First Difference: Result |
Kelantan | 0.0207 | Stationary | – | – |
Pulau Pinang | 0.5645 | Not Stationary | 0.01 | Stationary |
Sabah | 0.5176 | Not Stationary | 0.01 | Stationary |
Selangor | 0.877 | Not Stationary | 0.01 | Stationary |
Wilayah Persekutuan Kuala Lumpur & Putrajaya | 0.3767 | Not Stationary | 0.01 | Stationary |
Johor | 0.5498 | Not Stationary | 0.01 | Stationary |
The series for the other states are not stationary at the level because their p-values exceed the significance level of 0.05. The null hypothesis of a unit root therefore cannot be rejected, and the series are differenced. After first differencing, all of these series were found to be stationary, with p-values below the 0.05 significance level, so the null hypothesis of a unit root is rejected. Therefore, the series for Penang, Sabah, Selangor, the Federal Territory of Kuala Lumpur & Putrajaya and Johor are integrated of order 1, I(1).
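The decision rule applied to Table 1 can be sketched as follows. The p-values are copied from Table 1; the function and variable names are illustrative assumptions, and the sketch does not run the ADF test itself.

```python
ALPHA = 0.05  # significance level used in the text

# (p-value at level, p-value after first differencing) from Table 1.
# None means the differenced series was not tested.
adf_p_values = {
    "Kelantan": (0.0207, None),
    "Pulau Pinang": (0.5645, 0.01),
    "Sabah": (0.5176, 0.01),
    "Selangor": (0.877, 0.01),
    "Wilayah Persekutuan Kuala Lumpur & Putrajaya": (0.3767, 0.01),
    "Johor": (0.5498, 0.01),
}


def integration_order(p_level, p_diff, alpha=ALPHA):
    """Return 0 for I(0), 1 for I(1), or None if neither test rejects."""
    if p_level < alpha:
        return 0  # unit root rejected at the level
    if p_diff is not None and p_diff < alpha:
        return 1  # unit root rejected after first differencing
    return None  # further differencing would be needed


orders = {state: integration_order(*p) for state, p in adf_p_values.items()}
for state, order in orders.items():
    print(f"{state}: I({order})")
```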
Modeling of dengue fever cases was done to predict dengue fever cases in the states of Selangor, Kuala Lumpur, Johor, Penang, Kelantan and Sabah in 2022. These states were chosen because they had recorded the highest number of dengue cases throughout the period from January 1 to May 11, 2019 (Sinar Harian, 2019). Dengue fever cases are time series data, therefore the models used are ARIMA and SARIMA models.
ARIMA Model Construction
Figure 1 Time Series Plot of Dengue Cases
Based on Figure 1, it can be seen that the trend of cases in the states of Kelantan, Penang and Sabah is relatively flat and does not show a seasonal increase. Therefore, the ARIMA(p,d,q) model was chosen to predict cases in 2022. Based on Figure 2, for Kelantan, the ACF and PACF diagrams show that lag 1 is significant compared to the other lags. So, the value 1 for p and q is chosen. However, for Penang and Sabah, the most significant lag is difficult to choose, so the ‘auto.arima’ function in R Software is used to obtain an ARIMA model based on the time series of dengue cases.
Figure 2 ACF and PACF Diagrams
Tables 2, 3 and 4 compare the AIC values of the candidate models for Kelantan, Penang and Sabah. The best ARIMA model is the one with the smallest AIC value. ARIMA(1,0,1) with a non-zero mean was found to be the best model for dengue fever cases in Kelantan, ARIMA(0,1,2) for Penang and ARIMA(2,1,3) for Sabah.
Table 2 ARIMA model selection test in Kelantan
ARIMA(p,d,q) Model | AIC |
ARIMA (1,0,1) with a non-zero mean | 2090.863*** |
ARIMA (2,0,1) with a non-zero mean | 2092.774 |
ARIMA (1,0,2) with a non-zero mean | 2092.784 |
ARIMA (0,0,2) with a non-zero mean | 2094.061 |
ARIMA (2,0,0) with a non-zero mean | 2095.566 |
ARIMA (2,0,2) with a non-zero mean | 2094.775 |
ARIMA (1,0,1) with a zero mean | 2098.773 |
Note: *** refers to the selected model
Table 3 ARIMA model selection test in Penang
ARIMA(p,d,q) Model | AIC |
ARIMA(0,1,2) | 1787.847*** |
ARIMA(0,1,1) | 1787.870 |
ARIMA(1,1,2) | 1789.823 |
ARIMA(0,1,3) | 1789.778 |
ARIMA(1,1,1) | 1788.139 |
Note: *** refers to the selected model
Table 4 ARIMA model selection test in Sabah
ARIMA(p,d,q) Model | AIC |
ARIMA(2,1,3) | 1660.961*** |
ARIMA(1,1,3) | 1665.675 |
ARIMA(2,1,2) | 1663.530 |
ARIMA(3,1,3) | 1666.308 |
ARIMA(1,1,2) | 1664.948 |
Note: *** refers to the selected model
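The selection rule behind Tables 2, 3 and 4 is simply to pick the candidate with the smallest AIC. The sketch below reproduces that choice from the tabulated values and also reports the gap to the runner-up; the dictionary layout and helper name are illustrative assumptions.

```python
# AIC values copied from Tables 2, 3 and 4.
aic_tables = {
    "Kelantan": {
        "ARIMA(1,0,1) with a non-zero mean": 2090.863,
        "ARIMA(2,0,1) with a non-zero mean": 2092.774,
        "ARIMA(1,0,2) with a non-zero mean": 2092.784,
        "ARIMA(0,0,2) with a non-zero mean": 2094.061,
        "ARIMA(2,0,0) with a non-zero mean": 2095.566,
        "ARIMA(2,0,2) with a non-zero mean": 2094.775,
        "ARIMA(1,0,1) with a zero mean": 2098.773,
    },
    "Pulau Pinang": {
        "ARIMA(0,1,2)": 1787.847,
        "ARIMA(0,1,1)": 1787.870,
        "ARIMA(1,1,2)": 1789.823,
        "ARIMA(0,1,3)": 1789.778,
        "ARIMA(1,1,1)": 1788.139,
    },
    "Sabah": {
        "ARIMA(2,1,3)": 1660.961,
        "ARIMA(1,1,3)": 1665.675,
        "ARIMA(2,1,2)": 1663.530,
        "ARIMA(3,1,3)": 1666.308,
        "ARIMA(1,1,2)": 1664.948,
    },
}


def best_and_gap(models):
    """Return the lowest-AIC model and its AIC gap to the runner-up."""
    ranked = sorted(models, key=models.get)
    return ranked[0], models[ranked[1]] - models[ranked[0]]


selection = {state: best_and_gap(models) for state, models in aic_tables.items()}
for state, (model, gap) in selection.items():
    print(f"{state}: {model} (gap to runner-up: {gap:.3f})")
```

The gap is worth reading alongside the selected model: for Penang, the two best candidates differ by only about 0.02 in AIC, so the criterion barely separates them.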
The estimated model parameters and their standard errors are obtained using the ‘auto.arima’ function in R. Next, the significance of the model parameters is tested. The |t| statistic for each parameter was found to be higher than 1.96, the critical value at the 5% significance level (95% confidence level). Therefore, the null hypothesis that the parameter equals zero is rejected; the parameters are significant and are retained in the forecasting model.
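The significance check described above divides each estimate by its standard error and compares the absolute value with 1.96. A minimal sketch follows; the coefficient values are hypothetical, since the paper's parameter estimates are not reproduced in this section.

```python
def t_test(estimate, std_error, critical=1.96):
    """Return (t statistic, True if |t| exceeds the critical value)."""
    t_stat = estimate / std_error
    return t_stat, abs(t_stat) > critical


# Hypothetical estimates and standard errors, for illustration only.
example_parameters = {
    "ar1": (0.85, 0.06),
    "ma1": (0.05, 0.06),
}

results = {name: t_test(est, se) for name, (est, se) in example_parameters.items()}
for name, (t_stat, significant) in results.items():
    print(f"{name}: t = {t_stat:.2f}, significant = {significant}")
```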
After the significance of the ARIMA model parameters was tested, a diagnostic check was conducted using the Ljung-Box statistic to ensure that the model selected for each state was appropriate. The p-values obtained are shown in Table 6; all exceed the 0.05 significance level, so the null hypothesis that the residuals are not autocorrelated is not rejected.
Table 6 Diagnostic check of the ARIMA models
State | ARIMA(p,d,q) | p-Value |
Kelantan | ARIMA (1,0,1) with a non-zero mean | 0.8812 |
Pulau Pinang | ARIMA(0,1,2) | 0.6248 |
Sabah | ARIMA (2,1,3) | 0.3398 |
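For readers unfamiliar with the Ljung-Box statistic, the sketch below computes Q = n(n+2) Σ r_k² / (n − k) from a residual series and then applies the 0.05 decision rule to the p-values in Table 6. The residual series is a toy example, not data from the study, and the helper names are assumptions. In practice the p-value comes from a chi-square distribution whose degrees of freedom are the number of lags minus the number of estimated ARMA parameters; that step is left to a statistics library.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Toy residuals (hypothetical): a strongly alternating, clearly correlated series.
toy_residuals = [1.0, -1.0] * 10
q_toy = ljung_box_q(toy_residuals, max_lag=1)

# Decision rule applied to the p-values reported in Table 6.
ALPHA = 0.05
table6_p_values = {"Kelantan": 0.8812, "Pulau Pinang": 0.6248, "Sabah": 0.3398}
adequate = {state: p > ALPHA for state, p in table6_p_values.items()}
print(q_toy, adequate)
```

A large Q, as in the toy series, signals leftover autocorrelation; the p-values in Table 6 are all above 0.05, which is why the three ARIMA models pass the check.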
Next, the prediction of dengue fever cases for each state can be made by using the respective ARIMA model that is confirmed to be appropriate. Predicted dengue fever cases and actual cases are recorded in Table 7 and plotted in Figure 3.
Table 7 Prediction of dengue fever cases in 2022 using ARIMA model
Month | Kelantan: Actual Cases | Kelantan: Predicted Cases | Pulau Pinang: Actual Cases | Pulau Pinang: Predicted Cases | Sabah: Actual Cases | Sabah: Predicted Cases |
January | 21 | 125 | 47 | 28 | 191 | 166 |
February | 24 | 228 | 35 | 28 | 129 | 151 |
March | 29 | 275 | 29 | 28 | 175 | 148 |
April | 31 | 297 | 43 | 27 | 252 | 158 |
May | 59 | 307 | 84 | 27 | 373 | 170 |
June | 115 | 311 | 85 | 26 | 602 | 171 |
July | 132 | 313 | 80 | 26 | 807 | 162 |
August | 145 | 314 | 101 | 25 | 824 | 153 |
September | 173 | 314 | 154 | 25 | 835 | 153 |
October | 160 | 315 | 207 | 24 | 869 | 161 |
November | 162 | 315 | 247 | 24 | 877 | 168 |
December | 181 | 315 | 536 | 23 | 1161 | 166 |
Based on Figure 3 (a), it can be seen that the predicted dengue fever cases in Kelantan are close to the actual values. Figure 3 (b) and (c) show that the predicted values for Penang and Sabah are not close to the actual values, because the actual cases increased from April to December 2022. In Sabah, December 2022 recorded the highest monthly number of dengue fever cases in 2022. This may be due to external factors such as weather, community attitudes and climate change. However, all models were confirmed to be suitable for forecasting by the diagnostic check.
Figure 3 Prediction of Non-Seasonal Dengue Fever Cases in 2022
SARIMA model construction
Figure 4 Seasonal Trend Decomposition Plot (Additive Model)
Figure 4 shows the time series of dengue fever cases for Selangor, the Federal Territory of Kuala Lumpur & Putrajaya and Johor. Based on the plot, all three series show a seasonal pattern in dengue fever cases. Therefore, the SARIMA(p,d,q)(P,D,Q)m model is used to predict dengue fever cases in 2022.
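Assuming the decomposition in Figure 4 is the classical additive one, each monthly series is split into a trend, a seasonal component and a remainder. The symbols below are illustrative and are not taken from the original paper:

```latex
\[
y_t = T_t + S_t + R_t, \qquad S_{t+12} = S_t,
\]
```

where $y_t$ is the monthly case count, $T_t$ the trend, $S_t$ the seasonal component that repeats every 12 months and $R_t$ the remainder. A seasonal component that is clearly non-zero is what motivates a seasonal model for these three series.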
Based on Figure 5, the p and q values for each model would be selected from the ACF and PACF diagrams, but the most significant lag is difficult to determine. Therefore, the ‘auto.arima’ function in R is used to obtain the SARIMA models, with the best model chosen directly by its AIC value. SARIMA(2,1,2)(1,0,0)[12] was found to be the best model, with the smallest AIC value, for dengue fever cases in Selangor, SARIMA(2,1,2)(1,0,1)[12] for the Federal Territory of Kuala Lumpur & Putrajaya and SARIMA(0,1,0)(1,0,1)[12] for Johor. Tables 8, 9 and 10 compare the AIC values with those of other candidate models for Selangor, the Federal Territory of Kuala Lumpur & Putrajaya and Johor.
Figure 5 ACF and PACF diagrams
Table 8 SARIMA model selection test in Selangor
SARIMA(p,d,q)(P,D,Q)m Model | AIC |
SARIMA (2,1,2)(1,0,0)[12] | 2275.690 *** |
SARIMA (2,1,2)(0,0,2)[12] | 2276.738 |
SARIMA (2,1,2)(0,0,1)[12] | 2276.773 |
SARIMA (2,1,2)(1,0,2)[12] | 2278.421 |
SARIMA (2,1,2)(1,0,1)[12] | 2277.608 |
SARIMA (1,1,2)(0,0,2)[12] | 2285.303 |
SARIMA (2,1,1)(0,0,2)[12] | 2283.502 |
Note: *** refers to the selected model
Table 9 SARIMA model selection test in Wilayah Persekutuan Kuala Lumpur & Putrajaya
SARIMA(p,d,q)(P,D,Q)m Model | AIC |
SARIMA(2,1,2)(1,0,1)[12] | 1829.762 *** |
SARIMA(2,1,2)(0,0,1)[12] | 1834.286 |
SARIMA(2,1,2)(1,0,0)[12] | 1830.357 |
SARIMA(2,1,2)(2,0,1)[12] | 1831.528 |
SARIMA(2,1,2)(1,0,2)[12] | 1831.568 |
SARIMA(2,1,2)(0,0,1)[12] | 1834.286 |
Note: *** refers to the selected model
Table 10 SARIMA model selection test in Johor
SARIMA(p,d,q)(P,D,Q)m Model | AIC |
SARIMA(0,1,0)(1,0,1)[12] | 1824.590 *** |
SARIMA(0,1,0)(0,0,1)[12] | 1827.450 |
SARIMA(0,1,0)(1,0,0)[12] | 1825.627 |
SARIMA(1,1,0)(1,0,1)[12] | 1826.319 |
SARIMA(0,1,1)(1,0,1)[12] | 1826.276 |
SARIMA(1,1,1)(1,0,1)[12] | 1828.181 |
Note: *** refers to the selected model
The estimated model parameters and their standard errors are obtained using the ‘auto.arima’ function in R. Next, the significance of the model parameters is tested. The |t| statistic for each parameter was found to be higher than the critical value at the 5% significance level (95% confidence level). Therefore, the null hypothesis that the parameter equals zero is rejected; the parameters are significant and are retained in the forecasting model. After the significance of the SARIMA model parameters was tested, a diagnostic check was carried out using the Ljung-Box statistic to ensure that the model selected for each state was suitable for use. The p-values obtained are shown in Table 11; all exceed the significance level of 0.05 (5%). Therefore, the null hypothesis that the residuals are not autocorrelated is not rejected, and the models are considered adequate.
Table 11 Diagnostic review of the SARIMA model
State | SARIMA (p,d,q)(P,D,Q)m | p-Value |
Selangor | SARIMA (2,1,2)(1,0,0)[12] | 0.5596 |
Wilayah Persekutuan Kuala Lumpur & Putrajaya | SARIMA (2,1,2)(1,0,1)[12] | 0.8601 |
Johor | SARIMA (0,1,0)(1,0,1)[12] | 0.4421 |
Forecasting dengue fever cases for the three states can be made using their respective SARIMA models that have been confirmed as appropriate. Predicted dengue fever cases and actual cases are recorded in Table 12 and plotted in Figure 6.
Table 12 Prediction of dengue fever cases in 2022 using the SARIMA model
Month | Selangor: Actual Cases | Selangor: Predicted Cases | Wilayah Persekutuan Kuala Lumpur & Putrajaya: Actual Cases | Wilayah Persekutuan Kuala Lumpur & Putrajaya: Predicted Cases | Johor: Actual Cases | Johor: Predicted Cases |
January | 1934 | 1542 | 240 | 271 | 124 | 211 |
February | 1852 | 1259 | 211 | 245 | 80 | 160 |
March | 2128 | 1051 | 229 | 217 | 103 | 107 |
April | 2876 | 1036 | 393 | 191 | 163 | 72 |
May | 2818 | 1203 | 438 | 267 | 268 | 123 |
June | 3664 | 1520 | 646 | 374 | 360 | 201 |
July | 4107 | 1619 | 729 | 367 | 379 | 264 |
August | 3280 | 1523 | 665 | 290 | 409 | 260 |
September | 3021 | 1398 | 634 | 247 | 459 | 160 |
October | 3423 | 1242 | 764 | 228 | 537 | 127 |
November | 3266 | 1179 | 797 | 178 | 570 | 92 |
December | 4624 | 1337 | 881 | 149 | 733 | 55 |
Figure 6 Prediction of Seasonal Dengue Fever Cases in 2022
Based on Figure 6 (a), it can be seen that the predicted dengue fever cases in Selangor are not close to the actual values from March to December 2022. Figure 6 (b) and (c) show the predicted values for the Federal Territory of Kuala Lumpur & Putrajaya and Johor. For these two, the predicted values are close to the actual values from January to July, but the actual dengue fever cases increased from August to December 2022.
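The comparison above is made visually. One possible way to put a number on how close a forecast is, which the authors do not report, is the mean absolute error (MAE) between actual and predicted monthly cases. The sketch below applies it to the values in Table 12; the choice of metric and the variable names are assumptions made here for illustration.

```python
# Actual and predicted monthly cases for 2022, copied from Table 12.
table12 = {
    "Selangor": (
        [1934, 1852, 2128, 2876, 2818, 3664, 4107, 3280, 3021, 3423, 3266, 4624],
        [1542, 1259, 1051, 1036, 1203, 1520, 1619, 1523, 1398, 1242, 1179, 1337],
    ),
    "Wilayah Persekutuan Kuala Lumpur & Putrajaya": (
        [240, 211, 229, 393, 438, 646, 729, 665, 634, 764, 797, 881],
        [271, 245, 217, 191, 267, 374, 367, 290, 247, 228, 178, 149],
    ),
    "Johor": (
        [124, 80, 103, 163, 268, 360, 379, 409, 459, 537, 570, 733],
        [211, 160, 107, 72, 123, 201, 264, 260, 160, 127, 92, 55],
    ),
}


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


mae_by_state = {
    state: mean_absolute_error(actual, predicted)
    for state, (actual, predicted) in table12.items()
}
for state, mae in mae_by_state.items():
    print(f"{state}: MAE = {mae:.1f} cases per month")
```

Because the states differ greatly in case volume, MAE is best compared with each state's typical monthly count rather than across states.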
The ARIMA model was built and used for the prediction of non-seasonal dengue fever cases, while the SARIMA model was built and used for seasonal dengue fever cases. The forecast values for 2022 were found not to be close to the actual values for all states except Kelantan. This may be due to external factors such as weather, relative humidity and community attitudes that contributed to the increase in dengue fever cases in 2022.
The forecast results for the ARIMA model show an increase in cases from January to August, after which the cases remain constant until December 2022. For the SARIMA model, the forecast shows cases increasing from May but decreasing from September to December 2022. However, the actual dengue fever cases in all states fluctuate from January to May, and an increasing trend can be seen from June to December 2022.
Although the dengue fever case forecasting model does not guarantee 100% prediction accuracy, the forecasts produced are very valuable to public health practitioners and the government. This is because such predictions can help them with risk management, resource allocation and better planning of clinical care in the event of severe dengue fever cases (Riley et al. 2020). According to Wongkoon et al. (2012), the development of mathematical models is very useful in the control and prevention of infectious diseases. Furthermore, the use of the Box-Jenkins method in building ARIMA and SARIMA models for vector-borne diseases has increased and is receiving more attention because it yields promising forecasting results.
We gratefully acknowledge the Ministry of Health Malaysia for providing the data for this study.