Nepal - Covid-19 Prediction models using different ML algorithms

Updated on Dec. 20, 2024, 10:18 p.m., by: sagar

I am using a sigmoidal function to fit the historical data of Covid-19 and predict or forecast. And also use LinearRegression and RandomForestRegressor to predict. I thought of a sigmoidal function first because China's data resembled a sigmoidal shape. Therefore, I try to fit sigmoid functions onto Nepal also.

**Step-1:Load dataset from s3 (download from [https://covid.ourworldindata.org/data/owid-covid-data.csv](https://covid.ourworldindata.org/data/owid-covid-data.csv))**

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')

df1=pd.read\_csv('s3://sagemaker-studio-im065hy7nj/owid-covid-data.csv')

#df1.head(5)

df1.columns

**Step-2: Subsetting only those rows that have "NPL" in the "location" column and plot y-axis total\_cases over x-axis as day-count.**

Nepal\_df = df1\[df1\['location'\]=='Nepal'\].groupby('date')\[\['total\_cases','total\_deaths'\]\].sum()

Nepal\_df\['day\_count'\] = list(range(1,len(Nepal\_df)+1))

ydata = Nepal\_df.total\_cases

xdata = Nepal\_df.day\_count

Nepal\_df\['rate'\] = (Nepal\_df.total\_cases-Nepal\_df.total\_cases.shift(1))/Nepal\_df.total\_cases

Nepal\_df\['increase'\] = (Nepal\_df.total\_cases-Nepal\_df.total\_cases.shift(1))

\# Nepal\_df = Nepal\_df\[Nepal\_df.total\_cases>100\]

plt.plot(xdata, ydata, 'o')

plt.title("Nepal")

plt.ylabel("Population Infected")

plt.xlabel("Days")

plt.show()

**Output:**

[![](https://lh3.googleusercontent.com/-V9b4x75_Fz8/YH0aGXkia4I/AAAAAAAAKgU/D0m8rtflFxcb30Z5SMRvw3oje_pwDsOzwCNcBGAsYHQ/image.png)](https://www.blogger.com/blog/post/edit/2875741694909600015/6667143551291784996#)

**Step-3: Sigmoidal Function:**

from scipy.optimize import curve\_fit

import pylab

def sigmoid(x,c,a,b):

y = c\*1 / (1 + np.exp(-a\*(x-b)))

return y

xdata = np.array(\[1, 2, 3,4, 5, 6, 7\])

ydata = np.array(\[0, 0, 13, 35, 75, 89, 91\])

popt, pcov = curve\_fit(sigmoid, xdata, ydata, method='dogbox',bounds=(\[0.,0., 0.\],\[100,2, 10.\]))

print(popt)

x = np.linspace(\-1, 10, 50)

y = sigmoid(x, \*popt)

pylab.plot(xdata, ydata, 'o', label='data')

pylab.plot(x,y, label='fit')

pylab.ylim(\-0.05, 105)

pylab.legend(loc='best')

pylab.show()

[![](https://lh3.googleusercontent.com/-WLxbj5ze2vA/YH00h8WPXEI/AAAAAAAAKgg/lFqGf2x2XGA0AMQr8BbY4du1eHOXPrMjgCNcBGAsYHQ/image.png)](https://www.blogger.com/blog/post/edit/2875741694909600015/6667143551291784996#)

Sigmoid function, Here is a snap of how I learnt to fit Sigmoid Function - y = c/(1+np.exp(-a\*(x-b))) and 3 coefficients \[c, a, b\]:

*   c - the maximum value (eventual maximum infected people, the sigmoid scales to this value eventually)
*   a - the sigmoidal shape (how the infection progress. The smaller, the softer the sigmoidal shape is)
*   b - the point where the sigmoid start to flatten from steepening (the midpoint of sigmoid, when the rate of increase start to slow down)

**Step-4: Since our dataset doesn't have a recovered cases column so I am gonna import and read the next dataset**

df2 = pd.read\_csv('s3://sagemaker-studio-im065hy7nj/time\_series\_covid19\_recovered\_global.csv')

\# Extract list of values of 'recovered' column and insert that column to our first data-subset with proper arrangement.

\# Since few pairs of row doesn't exist in column of new dataset, so we gonna manually add values referencing other sites.

list(df2.iloc\[172, 4:\])

input\_values = \[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 4, 7, 10, 11, 12, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 22, 22, 31, 31, 31, 33, 33, 35, 35, 36, 36, 36, 36, 37, 45, 49, 70, 70, 87, 112, 155, 183, 187, 206, 219, 220, 221, 266, 278, 290, 333, 365, 467, 488, 584, 674, 861, 877, 913, 974, 1041, 1158, 1167, 1186, 1402, 1578, 1772, 2148, 2224, 2338, 2650, 2698, 2834, 3013, 3134, 3194, 4656, 5320, 6143, 6415, 6547, 6811, 7499, 7752, 7891, 8011, 8442, 8589, 10294, 10328, 11025, 11249, 11534, 11637, 11695, 11868, 12477, 12684, 12840, 12947, 13053, 13128, 13754, 13875, 14021, 14248, 14399, 14492, 14603, 14961, 15026, 15156, 15389, 15814, 16313, 16353, 16493, 16664, 16728, 16837, 17077, 17201, 17335, 17495, 17580, 17700, 17964, 18214, 18350, 18631, 18806, 19119, 19504, 20073, 20242, 20555, 20822, 21410, 22178, 23290, 24207, 25561, 27127, 28941, 30677, 32964, 33882, 35700, 36672, 37524, 38697, 39576, 40638, 41706, 42949, 43820, 45267, 46233, 47238, 48061, 49954, 50411, 51866, 53013, 53898, 54640, 55371, 56428, 57389, 60696, 62740, 64069, 65202, 67542, 68668, 71343, 73023, 74252, 75804, 77277, 78780, 80954\]

len(input\_values)

**step-5: Creating new subset with required columns and rows only:**

in\_df1 = df1\[df1\['location'\]=='Nepal'\].groupby('date')\[\['total\_cases', 'new\_cases', 'total\_deaths', 'new\_deaths', 'total\_tests'\]\].sum().reset\_index(False)

in\_df1\['recovered'\]= np.array(input\_values).copy()

in\_df1\['Active'\]=in\_df1\['total\_cases'\]-in\_df1\['new\_deaths'\]-in\_df1\['recovered'\]

\# in\_df1 = in\_df1\[in\_df1.Active>=20\]

in\_df1.head(10)

in\_df1.isnull().sum()

**Step-6: Sigmoidal fitting**

in\_df1\['day\_count'\] = list(range(1, len(in\_df1)+1))

in\_df1\['increase'\] = (in\_df1.total\_cases-in\_df1.total\_cases.shift(1))

in\_df1\['rate'\] = (in\_df1.total\_cases-in\_df1.total\_cases.shift(1))/in\_df1.total\_cases

def sigmoid(x,c,a,b):

y = c\*1 / (1 + np.exp(-a\*(x-b)))

return y

xdata = np.array(list(in\_df1.day\_count)\[::2\])

ydata = np.array(list(in\_df1.total\_cases)\[::2\])

\# population=29136808

\# popt, pcov = curve\_fit(sigmoid, xdata, ydata, method='dogbox',bounds=(\[0., 0., 0.\], \[population, 6, 180.\]))

\# print(popt)

**Step-7: Prediction Using Manual sigmoidal fitting:**
======================================================

import pylab

est\_a = 145000

est\_b = 0.04

est\_c = 270

x = np.linspace(\-1, Nepal\_df.day\_count.max()+50, 50)

y = sigmoid(x,est\_a,est\_b,est\_c)

pylab.plot(xdata, ydata, 'o', label='data')

pylab.plot(x,y, label='fit',alpha = 0.8)

pylab.ylim(\-0.05, est\_a\*1.05)

pylab.xlim(\-0.05, est\_c\*2.05)

pylab.legend(loc='best')

plt.xlabel('days from day 1')

plt.ylabel('Infection Cases')

plt.title('Nepal')

pylab.show()

print('model start date:',Nepal\_df\[Nepal\_df.day\_count==1\].index\[0\])

print('model start infection:',int(Nepal\_df\[Nepal\_df.day\_count==1\].total\_cases\[0\]))

print('model fitted max infection at:',int(est\_a))

print('model sigmoidal coefficient is:',round(est\_b,2))

print('model curve stop steepening, start flattening by day:',int(est\_c))

print('which is date:', Nepal\_df\[Nepal\_df.day\_count==int(est\_c)\].index\[0\])

print('model curve flattens by day:',int(est\_c)\*2)

display(Nepal\_df.head(3))

display(Nepal\_df.tail(3))

[![](https://lh3.googleusercontent.com/-tb-5R0G_a7M/YH054Az09tI/AAAAAAAAKgo/JGPhxEMYI44z04MDHOVvvk5MNZd-WskmgCNcBGAsYHQ/image.png)](https://www.blogger.com/blog/post/edit/2875741694909600015/6667143551291784996#)

[![](https://lh3.googleusercontent.com/-_saEyuK_jT4/YH058QSuinI/AAAAAAAAKgs/MQZDDXGvLsYn65i3XdTYA5b_N-x2B1opgCNcBGAsYHQ/w320-h299/image.png)](https://www.blogger.com/blog/post/edit/2875741694909600015/6667143551291784996#)

**Nepal,**

*   The b coefficient is 270, which means that the model starts to flatten 270 days, after the 25th of September, and really flatten significantly after 540 days.
    
*   The c coefficient is 145000, which is the predicted amount of infected people.
    
*   The coefficient is 0.04 is smaller than China's 0.22, which means the sigmoid is even softer in China. This means that Nepal will take even longer than China to fight Covid-19.

From this, its seen that in Nepal if the graph goes like that:

max Active case: 145000,

curve stop steepening, start flattening by day: 270,

which is: 2020-09-25,

the curve flattens by day: 540

===========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================

**Linear Regression and Random Forest for prediction:**
=======================================================

**For this part, please go through sourcecode with explanations:**  [https://colab.research.google.com/drive/1uKgB3L-7C5OtuhEdr0W1g2F-U3yV84vA#scrollTo=mYLbmyLGAFE\_](https://colab.research.google.com/drive/1uKgB3L-7C5OtuhEdr0W1g2F-U3yV84vA#scrollTo=mYLbmyLGAFE_)

Nepal - Covid-19 Prediction models using different ML algorithms

Write your comment

0 comments

No comments yet. 😀

Category

Tags

Similar posts

Latest posts

Most commented posts