COVID-19: Estimating spread in Spain solving an inverse ... · In Europe, for example, it is considered to be the most signi cant challenge that the continent has faced since World

COVID-19: Estimating spread in Spain solving

an inverse problem with a probabilistic model

Marcos Matabuena1,4, Carlos Meijide-Garcıa3, PabloRodrıguez-Mier2, and Vıctor Leboran1

1CiTIUS (Centro Singular de Investigacion en TecnoloxıasIntelixentes), Universidade de Santiago of Compostela, Santiago de

Compostela, Spain2Toxalim (Research Centre in Food Toxicology), Universite de

Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France3Universidade de Santiago of Compostela, Santiago de

Compostela, [email protected]

May 5, 2020

Abstract

We introduce a new probabilistic model to estimate the real spread of the novelSARS-CoV-2 virus along regions or countries. Our model simulates the behaviorof each individual in a population according to a probabilistic model through aninverse problem; we estimate the real number of recovered and infected peopleusing mortality records. In addition, the model is dynamic in the sense thatit takes into account the policy measures introduced when we solve the inverseproblem. The results obtained in Spain have particular practical relevance: thenumber of infected individuals can be 17 times higher than the data providedby the Spanish government on April 26 th in the worst case scenario. Assumingthat the number of fatalities reflected in the statistics is correct, 9.8 percentof the population may be contaminated or have already been recovered fromthe virus in Madrid, one of the most affected regions in Spain. However, if weassume that the number of fatalities is twice as high as the official numbers, thenumber of infections could have reached 19.5%. In Galicia, one of the regionswhere the effect has been the least, the number of infections does not reach 2.5%. Based on our findings, we can: i) estimate the risk of a new outbreak beforeAutumn if we lift the quarantine; ii) may know the degree of immunization of thepopulation in each region; and iii) forecast or simulate the effect of the policiesto be introduced in the future based on the number of infected or recoveredindividuals in the population.

1

arX

iv:2

004.

1369

5v2

[q-

bio.

PE]

3 M

ay 2

020

[email protected]

1 Introduction

The spread of the Coronavirus is generating an unprecedented crisis worldwide.In Europe, for example, it is considered to be the most significant challengethat the continent has faced since World War II. In the light of this situationof emergency, the governments must work quickly to avoid the collapse of thehealthcare system, reduce the mortality associated with the virus, and avoidthe possible effects of an economic recession [1, 2].

Given the strong capacity of the virus to spread and the lack of preventivemeasures, many countries have been systematically forced to lockdown the pop-ulation temporarily. Although these policies may be useful in controlling thespread of the virus in the immediate future, they are unsustainable over timefrom an economic point of view. In this regard, forecasting the evolution andconsequences of the pandemic based on the exposure of the population becomesa critical factor in decision-making [3, 4]. However, to rigorously predict theseeffects, it is necessary to assess the current spread of the epidemic, which isoften unknown.

At the beginning of the XX century, the first mathematical models to studythe dynamics of an epidemic were introduced. Probably the most well-knownis the susceptible-infected-recovered model (SIR). SIR models and its variations[5, 6] divide the population into compartments, and using differential (deter-ministic) equations, the number of individuals in each of the compartments overtime is estimated. Since then, many new variations of these models have beenintroduced in the literature (see for review [7, 8, 9, 10] or more contemporaryexamples [11, 12]).

However, those approaches suffer from a critical shortcoming. In essence,they focus on explaining the dynamics of the epidemic at the population leveland exclude the complex interactions that occur at the microscopic level betweenindividuals [13, 14, 15]. Furthermore, these models tend to be adjusted withfixed parameters that are not updated over time, according to how the epidemicevolves and the different policies introduced.

In the current era of precision medicine, where more efficient solutions arebeing sought that optimize the health of the individual [16], it is surprising notto find in the current state of the art models of epidemic spread that follow moreindividual approaches [17, 18], as is rightly pointed out in [19]. Precisely in [19],the authors propose to adopt new models that exploit the enormous individualinformation recorded by biosensors [20] and other devices.

A crucial aspect to consider in the success of mathematical modeling is theneed to adjust the parameters of the models with accurate data. However, this isdifficult to achieve in practice, because governments are often overloaded, andinformation is only recorded on patients with symptoms. The observationalnature of this data [21], together with a delay or errors in the recording ofinformation [22], requires specific techniques to correct the biases related to thedata [23].

1.1 Main contributions and results

Motivated by the need to developed new models that overcome these drawbacksand that estimate the expansion of the SARS-CoV-2 in a realistic way in Spain,

2

we introduce a new probabilistic model. The contributions and most significantresults of this paper are introduced below.

• To the best of our knowledge, we propose one of the few probabilisticmodels that estimate the evolution of an epidemic from each individualof the population. This model combines the finite Markov model, alongwith continuous probability distributions and a dynamic Poisson model.

• The model is designed in such a way that it does not need to use statisticson the number of infections that have a significant measurement error,and uses mortality records instead.

• The model estimates for each day the number of susceptible, contami-nated, recovered, and fatalities. For this, we solve an inverse problem,and we introduce biological expert knowledge related to the SARS-CoV-2disease into the probabilistic model about the time each individual spendsin the different states and the transition of probabilities.

• Using the data from several regions of Spain, we estimate the real numberof infections as of April 26st. The results show a higher proportion ofinfections than that provided by the Spanish government.

• The effect of the Coronavirus has been uneven among regions of Spain.In Madrid, more than 10 percent of the population was infected, or itrecovers from the virus, while in Galicia, this percentage is less than 2.5.

• These results highlight differences between regions; i) the degree of immu-nity of the population can be different; ii) the number of recovered peoplewho can return to normality is also different; iii) based on this analysis,a reasonable strategy to be considered could be to lift the quarantine ona case-by-case basis based on the degree of exposure of each region to thevirus.

• Finally, we repeat the estimations assuming that the number of fatalitiesis twice than reported by the official records. In this case, the number ofinfections can increases between 40 y 110 percent by region. However, wemust interpret these results with caution.

1.2 Outline

The mathematical details of the model are relegated to the end of the article,to facilitate the readability for a general target audience. The structure of thearticle is as follows: First, we describe the evolution of Coronavirus in Spain,together with some demographic and economic characteristics of the Spanishpopulation. Subsequently, we fit our model in different scenarios, and we showthe evolution of susceptible patients, infected and recovered ones over time.Next, we discuss the results and limitations of the model. Finally, we introducethe mathematical content of our model and the computational details of theimplementation and parameter optimization.

3

2 COVID-19 in Spain

Spain was one of the first countries in the world to experience the effects ofCOVID-19 after China and Italy. However, despite the delay in the start ofthe outbreak, with respect to these countries, the consequences are now moredramatic. To give a better context to the evolution of Coronavirus in Spain andto compare it with other countries, we introduce some historical background:

• January 31st. The first positive result was confirmed on Spanish territoryin La Gomera. At that time, there were around 10, 000 confirmed casesworldwide.

• February 12th. The Mobile World Congress, one of the most remarkabletechnological congresses in the world to be held in Barcelona, was canceled.

• March 8th. Multitudinous marches were celebrated in Spain. Also, sportscompetitions and other events were held as usual.

• March 13th. Madrid reported 500 new cases of Cov-19 in one day (64deaths total). Wuhan had gone into lockdown with 400 new cases per day(17 deaths total).

• March 14th. With the increase in the outbreak of infections, the govern-ment declared a quarantine throughout the country.

• March 21st. Due to an overloaded health system, the first patients startedto arrive at new makeshift hospitals.

• April 3rd. Spain accounts for a total of 117, 710 confirmed cases, surpass-ing Italy for the first time.

• April 6th. Spain becomes the country in the world with more deaths permillion inhabitants.

• April 9th. The FMI forecasts that 170 countries are going to be intorecession this year in the worst crisis since the Great Depression.

• April 18th. The Spanish government changes protocols for the daily statis-tics of COVID-19.

• April 21th. The president of the United States, Donald Trump, declared:”It’s incredible what happened to Spain, it’s been shattered.”.

Figure 1, shows accumulated number of cases and deceases respectively inthe previous periods in Spain, Italy, China, United Kingdom and the UnitedStates according to the data supplied by the different governments.

Following the statistics of the Population Reference Bureau, Spain is the20th country with the world’s oldest population [26]. The country demographicstructure, poverty rates, and epidemiological profiles is essential to comparemortality between countries. In the Coronavirus disease, relative and absolutecase-fatality risk (CFR) [27] increases dramatically with age and with comor-bidity, as evidenced by the current literature. Relative risk can increase by morethan 900% in patients over 60 [28].

4

0

250000

500000

750000

feb. mar. abr.

Date

China Italy Spain United Kingdom US

Cumulative number of cases worldwide time−series

0

10000

20000

30000

40000

50000

feb. mar. abr.

Date

China Italy Spain United Kingdom US

Cumulative number of deaths worldwide time−series

Figure 1: Spread and number of deaths of Coronavirus in Spain, Italy, China,and the United States. Number of accumulated infected patients (left) and thenumber of accumulated deaths (right) [24].

Galicia Paıs Vasco Castilla y Leon Cataluna Madrid EspanaPopulation 2, 698, 763 2, 181, 916 2, 553, 301 7, 609, 497 6, 685, 470 47, 100, 396

At-risk-of-poverty rate 18.8 8.6 16.1 13.6 16.1 21.5Population density 91.28 305.19 25.47 239.01 830.02 93.08

Percentage of population by age group0− 9 7.50 8.93 7.51 9.78 9.92 9.289− 18 7.56 8.78 7.79 9.74 9.44 9.3718− 30 10.39 10.66 10.67 12.72 12.80 12.4230− 45 21.18 20.28 19.47 22.24 23.24 22.0745− 60 22.77 23.28 23.60 21.77 22.23 22.6160− 80 22.60 21.41 22.30 18.24 17.33 18.70

from 81 on 8.01 6.67 8.65 5.50 5.03 5.56

Table 1: Demographic and socioeconomics characteristics of the Spanish popu-lation throughout some regions: Galicia, Paıs Vasco, Castilla y Leon, Cataluna,Madrid [25].

Subsequently, we perform a descriptive analysis in the regions of Spain thatwe analyze in this paper: Galicia, Paıs Vasco, Castilla y Leon, Madrid, Cataluna.Table 1 contains the essential demographic and socioeconomic characteristics ofthese regions. We can see that Castilla y Leon is the region with the highestproportion of elderly people. At the same time, Castilla y Leon has the mostdelocalized population centers, and the Paıs Vasco is the region with the low-est poverty rate. Spain is a multicultural country where there are significanteconomic, geographical, social, and demographic differences throughout the re-gions. All these peculiarities make Spain an interesting country to extrapolatethe effects of the spread of the Coronavirus to other regions and countries.

Finally, in Figure 2, we show the evolution of infections and fatalities amongthe regions under consideration. As we can see, Madrid is the most affectedregion, while Galicia is the least affected, despite its older population. However,it is important to note that the outbreak began later, and the containment was

5

0

2000

4000

6000

mar 01 mar 15 abr 01 abr 15

Date

Castilla y León Cataluña Galicia Madrid País Vasco

Cumulative number of deaths by day in Spain time−series

0

20000

40000

60000

mar 01 mar 15 abr 01 abr 15

Date

Castilla y León Cataluña Galicia Madrid País Vasco

Cumulative number of cases by day in Spain time−series

Figure 2: Evolution of accumulated infected (left) and death patients (right) inGalicia, Paıs Vasco, Castilla y Leon, Madrid, Cataluna

carried out earlier than in Madrid.

3 Results

Next, we show the estimation performed with the model until April 26st inGalicia, Paıs Vasco, Castilla y Leon, Madrid, Cataluna, assuming that thesetwo scenarios hold:

1. We assume that the number of real deaths due to Coronavirus is the onereflected by the official records.

2. We assume that many people have died of Coronavirus, but they have notbeen included in the records because a diagnostic test was not performed.In particular, we shall suppose that the number of deaths is twice as highas those indicated in the official records each day.

We graphically represent along time the result of estimating how many in-dividuals belong to each of the following states:

• I1(t): Number of infected individuals who are incubating the virus on dayt.

• I2(t): Number of infected people who have passed the incubation periodand do not show symptoms on day t.

• I3(t): Number of infected people who have passed the incubation periodand do show symptoms on day t.

• R1(t): Number of recovered cases which are still able to infect on day t.

• R2(t). Number of recovered cases which are not able to infect anymore onday t.

6

• M(t): Number of deaths on day t.

In addition, to understand the meaning of the results, we show: i) the numberof people who may be contaminated or who have already transmitted the virusas a percentage of the population size; and ii) the rate of new infections eachday.

Finally, we introduce confidence bands of our estimations 1.

1Our confidence bands shall not be confused with classic frequentist statistics confidencebands

7

8

3.1 Scenario 1

Galicia

Mar2020

Apr

02 09 16 23 30 06 13 200

5000

10000

15000

20000

25000

I1 (Infected, incubation)

Mar2020

Apr

02 09 16 23 30 06 13 200

5000

10000

15000

20000

25000

I2 (Infected, asymptomatic)

Mar2020

Apr

02 09 16 23 30 06 13 200

5000

10000

15000

20000

25000

I3 (Infected, symptomatic)

Mar2020

Apr

02 09 16 23 30 06 13 200

5000

10000

15000

20000

25000

R1 (Recovered, still infectious)

Mar2020

Apr

02 09 16 23 30 06 13 20

0

5000

10000

15000

20000

25000

30000

35000

R2 (Recovered)

Mar2020

Apr

02 09 16 23 30 06 13 20

0

100

200

300

400

Model fit vs Observed fatalities

Observed fatalities

Mar2020

Apr

02 09 16 23 30 06 13 20

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

% of the population infected

Mar2020

Apr

02 09 16 23 30 06 13 20

0

500

1000

1500

2000

2500

3000

Daily rate of infections ( t)

Figure 3: Results in Galicia: we assume that there are as many fatalities as inthe official records.

9

Castilla y Leon

24

Mar2020

Apr

02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


24

Mar2020

Apr

02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


24

Mar2020

Apr

02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


24

Mar2020

Apr

02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

20000

40000

60000

80000

100000

120000

140000

R2 (Recovered)

24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

250

500

750

1000

1250

1500

1750


Observed fatalities

24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

1

2

3

4

5

6


24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

2000

4000

6000

8000

10000


Figure 4: Results in Castilla y Leon: we assume that there are as many fatalitiesas in the official records. 10

Paıs Vasco

Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

20000

40000

60000

80000

100000

R2 (Recovered)

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

200

400

600

800

1000

1200


Observed fatalities

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

1

2

3

4

5


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

1000

2000

3000

4000

5000


Figure 5: Results in Paıs Vasco: we assume that there are as many fatalities asin the official records. 11

Madrid

Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

100000

200000

300000

400000

500000

600000

700000

R2 (Recovered)

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

2000

4000

6000

8000


Observed fatalities

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

2

4

6

8

10


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

10000

20000

30000

40000

50000

60000

70000

80000


Figure 6: Results in Madrid: we assume that there are as many fatalities as inthe official records. 12

Cataluna

Mar2020

Apr

24 02 09 16 23 30 06 13 200

50000

100000

150000

200000

250000I1 (Infected, incubation)

Mar2020

Apr

24 02 09 16 23 30 06 13 200

50000

100000

150000

200000

250000I2 (Infected, asymptomatic)

Mar2020

Apr

24 02 09 16 23 30 06 13 200

50000

100000

150000

200000

250000I3 (Infected, symptomatic)

Mar2020

Apr

24 02 09 16 23 30 06 13 200

50000

100000

150000

200000

250000R1 (Recovered, still infectious)

Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

50000

100000

150000

200000

250000

300000

350000

400000

R2 (Recovered)

Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

1000

2000

3000

4000


Observed fatalities

Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

1

2

3

4

5


Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

5000

10000

15000

20000

25000


Figure 7: Results in Cataluna: we assume that there are as many fatalities asin the official records. 13

14

3.2 Scenario 2

Galicia

Mar2020

Apr

02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000I1 (Infected, incubation)

Mar2020

Apr

02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000I2 (Infected, asymptomatic)

Mar2020

Apr

02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000I3 (Infected, symptomatic)

Mar2020

Apr

02 09 16 23 30 06 13 200

10000

20000

30000

40000

50000R1 (Recovered, still infectious)

Mar2020

Apr

02 09 16 23 30 06 13 20

0

10000

20000

30000

40000

50000

60000

70000

R2 (Recovered)

Mar2020

Apr

02 09 16 23 30 06 13 20

0

200

400

600

800


Observed fatalities

Mar2020

Apr

02 09 16 23 30 06 13 20

0.0

0.5

1.0

1.5

2.0

2.5


Mar2020

Apr

02 09 16 23 30 06 13 20

0

1000

2000

3000

4000

5000


Figure 8: Results in Galicia: we assume there are twice as many deaths as theofficial records.

15

Castilla y Leon

24

Mar2020

Apr

02 09 16 23 30 06 13 200

25000

50000

75000

100000

125000

150000

175000

200000


24

Mar2020

Apr

02 09 16 23 30 06 13 200

25000

50000

75000

100000

125000

150000

175000

200000


24

Mar2020

Apr

02 09 16 23 30 06 13 200

25000

50000

75000

100000

125000

150000

175000

200000


24

Mar2020

Apr

02 09 16 23 30 06 13 200

25000

50000

75000

100000

125000

150000

175000

200000


24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

50000

100000

150000

200000

250000

300000

R2 (Recovered)

24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

500

1000

1500

2000

2500

3000

3500


Observed fatalities

24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

2

4

6

8

10

12


24

Mar2020

Apr

02 09 16 23 30 06 13 20

0

5000

10000

15000

20000

25000Daily rate of infections ( t)

Figure 9: Results in Castilla y Leon: we assume there are twice as many deathsas the official records. 16

Paıs Vasco

Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

20000

40000

60000

80000

100000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

50000

100000

150000

200000

R2 (Recovered)

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

500

1000

1500

2000

2500


Observed fatalities

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

2

4

6

8

10


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

2000

4000

6000

8000

10000Daily rate of infections ( t)

Figure 10: Results in Paıs Vasco: we assume there are twice as many deaths asthe official records. 17

Madrid

Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

200000

400000

600000

800000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

200000

400000

600000

800000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

200000

400000

600000

800000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 200

200000

400000

600000

800000


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

200000

400000

600000

800000

1000000

1200000

1400000

R2 (Recovered)

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

2500

5000

7500

10000

12500

15000

17500Model fit vs Observed fatalities

Observed fatalities

Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

5

10

15

20


Mar2020

Apr

17 24 02 09 16 23 30 06 13 20

0

20000

40000

60000

80000

100000


Figure 11: Results in Madrid: we assume there are twice as many deaths as theofficial statistics. 18

Cataluna

Mar2020

Apr

24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

24 02 09 16 23 30 06 13 200

100000

200000

300000

400000

500000


Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

100000

200000

300000

400000

500000

600000

700000

800000

R2 (Recovered)

Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

2000

4000

6000

8000


Observed fatalities

Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

2

4

6

8

10


Mar2020

Apr

24 02 09 16 23 30 06 13 20

0

10000

20000

30000

40000

50000

60000


Figure 12: Results in Cataluna: we assume there are twice as many deaths asthe official statistics. 19

3.3 Analysis of results

The most relevant results are described below:

• Madrid is the most affected the region by COVID-19. 19.5 % of the pop-ulation could have been infected or recovered from the virus if we assumethat the number of deaths is double that reported by the Government. Atpresent, there may be 120, 000 patients recovered.

• Galicia is the region which suffered the mildest effects. The percentage ofinfected people is less than 2.5 %.

• Castilla y Leon, Pais Vasco, Cataluna could have suffered the effects ofCOVID-19 with a proportional magnitude. The percentage of infectionscould be between 6− 10 % of the population.

• The peak of new infections has probably occurred at the start of quaran-tine, while the peak of people who can contaminate took place betweenMarch 17-24.

• The number of new infections have been dramatically reduced after theintroduction of containment measures. Now, it seems that the situationis back under control.

20

4 Discussion

The Coronavirus pandemic is causing an unprecedented crisis in health, eco-nomic, and social terms throughout the world. To help designing better politi-cies in the future, estimating the number of people who were infected by thevirus is a crucial priority. To address this issue, we have proposed a new proba-bilistic model to estimate the spread of an epidemic by simulating the behaviourof each individual in a population.

COVID-19 in Spain

The predictions obtained over several regions of Spain have practical relevance.The results reveal that the number of infections is much higher than what isreported by the Spanish government. Two factors may explain these discrepan-cies: i) The tests were carried out in a limited way among the general population;and ii) no random sampling was done among the different regions throughoutthe territory shading the real magnitude of the pandemic. It is also worth not-ing that its impact was uneven across regions. Madrid was the most affectedregion, while Galicia was the least affected. However, according to our model,in Cataluna, Castilla y Leon, and the Paıs Vasco, the effects were surprisinglysimilar, even though the outbreak of infections theoretically started at differenttime points. In addition, the geographical dispersion of Castilla y Leon is moresignificant (see Table 1). Perhaps the older population of Castilla y Leon (seeTable 1) has a higher case fatality rate than the other communities, and ourmodel is overestimating the number of infected people in that region. As previ-ously mentioned, taking into account the demographic structure of the regions,together with the epidemiological profiles and other socioeconomic characteris-tics, is fundamental in the analysis of results.

More deaths than those reported in the official records

Even though many patients die due to the virus, this is not the official causeof death. This problem occurs due to a lack of diagnostic tests. Spanish andInternational press echoed this problem in the current Coronavirus crisis, andmore than twice as many deaths than those officially announced have beenreported in certain regions. A recent study reported that the actual number ofdeaths might be even three times higher in Italy [29].

Due to this fact, it is vital to take into account this uncertainty present inthe models to perform an accurate estimate of the number of actual infections.However, this uncertainty is not probabilistic. To alleviate this problem, wehave assumed that there could be up to twice the number of official deaths.

If we consider this correction, the number of infected increases from 40 to110 percent by region.

Practical considerations of this results

These findings allow us have a better sense of the degree of exposure to thevirus of the population and to design a restoration to normality based on thefollowing factors.

• The degree of immunity of each population.

21

• The number of people who might contaminate today.

• The number of recovered patients.

In this sense, it is vital to gradually de-escalate the confinement and alwaysproceed according to the individual characteristics of each region.

For the moment, there is no available vaccine for use with humans, and thereis uncertainty about treatments. Therefore it is advisable to take measures suchas social distancing [30] to prevent the infection rate from skyrocketing and toavoid another outbreak.

The notorious peak

Wondering about when the peak of infection would have been reached was ahot topic in mass media. However, such peak can be i) the maximum peak ofnew infections. ii) the maximum peak of people who can contaminate. In thefirst case, our models prove the existence of this peak between 9 − 16 March,while in the second case, it was estimated to be in March 17− 24.

From a perspective of virus containment, the most important is the secondone from as it indicates the number of potential transmitters.

Methodological aspects of epidemic modeling

From a methodological standpoint, indeed, there are already simulators thatwork at the individual level [31, 32, 33, 34, 35]. However, from our point of view,the models analyzed have some of the following potential limitations: i) somemodels use fixed parameters without prior optimization according to the targetpopulation; ii) the parameters do not usually evolve dynamically according tothe policies introduced; and iii), many of these models were designed withoutadding biological expert knowledge.

More generally we can find two philosophies when addressing epidemic fore-casting in the literature (see [18]): i) Mechanistic models: Using an epidemi-ological approach, these models explain the evolution of an epidemic from thecausal mechanisms of transmissionc [13]; ii) Phenomenological models: Fromentirely data-driven approaches such as regression models or time series, thedevelopment of epidemics is predicted using historical data [18].

Phenomenological models can be an appropriate solution in the case of epi-demics such as influenza [36], for which there exists long-term data collected onhospitalization and mortality rates. Additionally, the etiology, together with theprevious incidence rates, are better characterized. With all this, extrapolationsof the real impact of the epidemic in the whole population can be done fromdata-driven approaches. In addition, current evidence indicates that influenzaincubation times are shorter than those of COVID-19, which reduces an indi-vidual’s potential exposure to the virus. For all these reasons, we believe thatin the case of the new COVID-19, it is necessary to use Mechanistic models asour proposal.

Often, defining a clear boundary between Mechanistic models and Phe-nomenological models may not be simple, and even both methods can be seen ascomplementary approaches. For example, SIR models and their variants froma theoretical point of view try to explain the effects of an epidemic from themechanisms of transmission, however they are often used as time series [37].

22

At this point, we would like to point out that the underpinning assumptionsof these models are often unrealistic [5]. Moreover, from the perspective ofEpistemology and Epidemiology, the epidemic spread is based on the complexinteractions between individuals, which these models cannot capture due totheir population-based nature [18].

The use of individual models should be an aspiration in the development ofnew methods. In the current era of big data, in developed countries, a substan-tial part of the population is geolocalized through mobile devices. Moreover,it is increasingly common to use biosensors that monitor patient health [20].Knowledge about interactions between individuals and patient health can pro-vide a more realistic picture of the spread and evolution of the epidemic [38].However, in this context, it is necessary to systematically apply techniques thatcorrect the biases that appear in the observational nature of these data. At thesame time, it is also important to preserve the privacy of the citizens.

The absence of reliable information is one of the main problems with theCOVID-19 pandemic. To alleviate this problem, when fitting model param-eters, it is important to understand the social, economic, and demographiccharacteristics along with the epidemiological profiles of the region to introduceexpert knowledge into the model and reduce uncertainty in statistical learning.Also, with this strategy, we reduce the computational demands and we increasemodel interpretability.

Collaborative attitude

Information is one of the most valuable weapons in combating a pandemic ofthis magnitude. Understanding the mechanisms of virus transmission [39, 40],the prevalence of asymptomatic patients in the population [41], risk and prog-nostic factors [22] are topics that can reduce the number of deaths. Due to theabsence of prior knowledge of all these factors, a collaborative approach betweencountries and institutions to share their knowledge should be a top priority.

At an international level, we could highlight the initiative promoted by EranSegal from Israel, to which several countries have joined [42]. Segal and his teamdesigned a survey in which anonymous volunteers participated to estimate thereal evolution of the Coronavirus in Israel [43]. This questionnaire was adaptedand applied in other countries. There is now an international consortium inaction to share the results of these surveys, together with other clinical infor-mation from patients. However, it is important to note that the data obtainedfrom the questionnaires do not represent a random sample of the population,which limits the quality of the data collected.

Expert knowledge in parameter fitting

From a general perspective of inverse problems and statistical learning, theestimation of the model parameters is an ill-posed problem [44, 45, 46, 47, 48,49]. In the case of epidemic spread models, these difficulties are even moresignificant (see, for example, [50, 51]). In practice, the evolution of severalcompartments is predicted simultaneously, and usually, there is only informationabout the actual growth of one of the compartments.

In this regard, it is vital to follow Ockham’s Razor rule: the simplest solutionis most likely the right. Reducing the number of variables to be fitted in the

23

models and introducing expert knowledge into these models guided by currentbiological evidence is for ill-posed estimation problems.

In the case of COVID-19, the current epidemiological evidence is inconsis-tent. It is especially noteworthy about the case fatality rate (CFR) and theproportion of asymptomatic patients [52]. In the first case, variations between0.4 and 15 percent have been reported in the literature. Meanwhile, with asymp-tomatic patients, these range from 20 percent to 80 percent. The primary ratio-nale for these discrepancies is the application of non-observational data analysismethods for a sample composed largely of elderly patients. In this sense, aftercarrying out our estimates taking into account the demographic structure ofSpain and using the published articles that we believe reflect the reality in amore precise way, we estimate that the rate of asymptomatic patients ranges be-tween 75 and 85 percent. Likewise, CFR may vary between 0.4 and 0.8 percent.A recent article has estimated that asymptomatic patients to be 78 percent [53].

Combining a probabilistic model with a deterministic in-verse problem

Our probabilistic model is non-stationary and has a complex dependency struc-ture. In addition, the time series of each day’s death records can be seen as arealization of the stochastic process.

To fit the model, we estimate a deterministic inverse problem that makes theobserved deaths is the average of the random process. From there, we define awindow, and we try with a non-probabilistic approach to capture the trajectoriesof fatalities close to the real one. In this way, we can obtain trajectories of therandom process that allow us to explain the real situation of the infected ineach region. In other words: From day one, with the chosen configuration ofparameters, we can have many different scenarios in our probabilistic model;however, at the present moment, we already observed a concrete trajectory.Then, using this trajectory as a reference, we look for a set of close trajectoriesthat measure other alternative scenarios that approximate the real number ofdeaths.

Model limitations

The main limitation of our proposal is that some of the parameters on which themodel depends on are set and may need to be tuned more carefully to the studypopulation. However, the current biological evidence does not allow furtherrefinement. Also, optimizing more parameters with a data-driven approach isdangerous due to the quality of data about infections reported by governments.Therefore, it is better to restrict oneself to current biological expertise informa-tion.

The models are an approximation of reality, and as Cox would say: all modelsare wrong, some are useful. We are predicting the unknown in this epidemicwithout reliable data. However, with statistical simulation models [54, 55], wecan see the effect of a deviation of the input parameters, the extreme cases, andthe model predictions in other scenarios such as the end of containment policies.

Another limitation of our model is that despite simulating data for each indi-vidual in the population, we were not using information about the demographic

24

structure and epidemiological profile along regions. We also not introduce ge-olocation information as in the interaction between each individual in the pop-ulation [38]. However, the inclusion of all these aspects, may not lead to greateraccuracy as no reliable model output data are available, and, furthermore, wewould be entering into privacy issues that are beyond the scope of this study.

Future work and model extensions

Our immediate work is to extend our estimations to other countries. Also, wewant to simulate the effect of an immediate end coronavirus lockdown [56] ineach of these location. We also want to carry out simulations to find out howto minimize the effects of a possible outbreak in Autumn [4], for example, withsocial distance [57] or weeks of partitioned work: combining remote workingwith face-to-face activity. In a complementary way, it would be interesting topredict the effects of these measures on the economy. Some economists claimthat the economic consequences of this crisis may be more damaging to ourhealth in the long term than the effects caused directly by the virus [58].

From a methodological point of view, another exciting extension is to de-velop a mixed model or an empirical Bayes framework (see an example of thismethodology [59]) to make a simultaneous estimation across many regions andto be able to make a statistical inference about the parameters of the models.In this case, it may be interesting to optimize the parameters with Bayesianoptimization [60], to alleviate the computational issues of this approach. An-other variation of the model may be to employ other probability distributionssuch as the Conway-Maxwell-Poisson distribution [61] to handle underdisper-sion situations in the number of newly infected. However, our current electionof Poisson distribution is well motivated given the asymptotic properties of theErdos-Renyi model [62].

Finally, another possible improvement of the model could be to introduce anensemble forecasting as they do in meteorology and manage better the possibleuncertainty in the initial conditions [63].

Our future prediction in the world

Given the current emergency, and the need to provide predictions on the spreadof the epidemic worldwide, we have decided to set up a website (https://github.com/covid19-modeling/forecasts) where we regularly update ourestimates across many regions of the world in a fully accessible manner. Inaddition to this, we will be incorporating our simulations of what the effect ofcurrent end coronavirus lockdown would be in each of these regions.

25

https://github.com/covid19-modeling/forecasts

https://github.com/covid19-modeling/forecasts

5 Mathematical model

5.1 Model elements

We suppose that D = 0, 1, · · · , n is the set of days under study. Consider thefollowing random processes whose domain is defined on D.

• S(t): Number of people susceptible to become infected on day t.

• I1(t): Number of infected individuals who are incubating the virus on dayt.

• I2(t): Number of infected people who have passed the incubation periodand do not show symptoms on day t.

• I3(t): Number of infected people who have passed the incubation periodand do show symptoms on day t.

• R1(t): Number of recovered cases which are still able to infect on day t.

• R2(t). Number of recovered cases which are not able to infect anymore onday t.

• M(t): Number of deaths on day t.

Henceforth, we will denote by I(t) = I1(t) + I2(t) + I3(t) the number ofinfected people at time t ∈ D and R(t) = R1(t)+R2(t) the number of recoveredpeople.

The above random processes describe the evolution of population in differ-ent compartments. However, unlike the classics models [6, 10], we divided theinfected and recovered individuals in a more broader and realistic taxonomyfor the particular case of the COVID-19. There are two main reasons for this:First, the patients tested by healthcare are usually those found in I3. In thiscase, there exists a corpus of prior knowledge about how they evolve over timeand in case of death, which is their survival time. Second, there is evidence thatthere are recovered patients who can still infect others.

5.2 Model definition

The causal mechanism of new infected individuals (see Figure 13) is introducedbelow. For each day t ∈ D, we assume that the new infected population I1(t)−I1(t−1), is generated by the individual interaction of the susceptible people withinfected patients and the recovered cases who can still contaminate. Formally,we assume that if a person can contaminate, it does so according to a randomvariable X ∼ Poisson(Ri(t)), being Ri(t) the average number of new infectionsthat can cause each person in the day t. It is important to remark, that itis natural to assume that the function Ri(t) follows a decreasing trend overtime in our setting, basically for two reasons: i) quarantine policies have beensystematically introduced along with different countries and regions. ii) thenumber of susceptible people decreases over time, while the number of infectedpeople can increase. This all makes it more complicated to interact with anon-infected people.

26

S I1

I3

M R2

I2

R1

α

1− β

β

1− α

Figure 13: Diagram of state changes in our model.

Once a new infected person arrives to the model (see Figure 13), we assumethat the transitions between the different graph states are modeled by a proba-bility law that verifies the following conditions: i) the transition probabilities areindependent of the absolute instant when such transition takes place ii) the prob-abilities depend only on the current state of the patient regardless of the previ-ous path in the graph. In particular, given the States = I1, I2, I3, R1, R2,M,and α, β ∈ [0, 1], we have: P(I2|I1) = α, P(I3|I1) = 1 − α, P(M |I3) = β,P(R1|I3) = 1− β, P(I3|I2) = 1, P(R2|R1) = 1; all other transitions take a valueequal to zero in probability. More schematically, the transition matrix betweenevents is shown in the Equation 1.

P =

I1 I2 I3 R1 M R2I1 0 1− α α 0 0 0I2 0 0 0 1 0 0I3 0 0 0 1− β β 0R1 0 0 0 0 0 1M 0 0 0 0 1 0R2 0 0 0 0 0 1

. (1)

Additionally, the Table 2 shows the random variables that model the timebetween transitions together with the references that have been used. In par-ticular cases, as survival time, we have made our estimates based on data fromsome of these researchers and incorporating some expert knowledge of others.

Transition Random variable Used referencesI1 → I2 Gamma(5.807, 0.948) [64, 65]I1 → I3 Gamma(5.807, 0.948) [64, 65]I2 → R1 Uniform(5, 10)I3 → R1 Uniform(9, 14) [65]I3 →M Gamma(6.67, 2.55) [66, 67, 65, 68]R1 → R2 Uniform(7, 14) [69, 70]

Table 2: Random variables of the time of each transition

Finally, in a similar way, in Table 3, we show the values used to model the

27

transition probabilities.

Coefficient Value Used referencesα 0.8 [53, 71, 72, 73]β 0.06 [74, 52, 75, 76, 77, 78]

Table 3: Probability of each transition

5.3 Model implementation

The proposed model is not known to have an closed-form solution. In a real-world setting, it is necessary to use statistical simulation methods to approx-imate the mean trajectory or quantile functions over time. Also, we must fitsome parameters of the model to characterize the behavior of the study popu-lation. For this purpose, we use a sample of the deceased patients M1, M2,· · · ,Ms of each day in the timeframe O = 1, · · · , s.

Next, we suppose that our model (M) is dependent on a vector of pa-rameters θ = (θ1, θ2) ∈ Rp1 × Rp2 (with p1 + p2 = p), where θ1 is a vectorof dimension p1, defined in beforehand, and θ2 must be estimated from thesample. Furthermore, let us assume that the initial state of the system ischaracterized by S = (S(0), I1(0), I2(0), I3(0), R1(0), R2(0),M(0)) ∈ N7 andT = (T1(0), T2(0), T3(0), T4(0), T5(0), T6(0)) ∈ Nm × · · · × Nm. S has the num-ber of elements for each compartment of the model on day 0. T also containsthe amount of remaining days to complete the transition they are in for eachindividual in the initial state, being m a integer number that represents themaximun numbers of registred days.

To simplify the notation, from now on we denote the average dead trajectoryby the function Mean(θ1, θ2, S, T )(t) for each day t ∈ D

The next step is to estimate θ2. To do this, we solve the following optimiza-tion problem:

θ2 = arg minθ2∈S⊂Rp2

s∑i=1

ωi(Mi −Mean(θ1, θ2, S, T )(i))2, (2)

where ω = (ω1, · · · , ωs) is a weighted vector that can help to improve modelestimation. Examples of this weights may be:

ωi = Mi/∑si=1Mi or ωi = (1/Mi)/(

∑si=1 1/Mi) (i = 1, . . . , s).

At this point, it is relevant to note that the above optimization problem (2),as formulated, includes the possibility of introducing constraints in the space ofparameters. It is important because we have prior knowledge of what the rangeof the parameters is. We may know constraints involving them as well.

In Equation (2), we have used the real mean trajectory. However, in practice,this is unknown, and we must approximate it using simulation. Next, we run Bdifferents simulations and we denote byM(θ1, θ

02, S, T ) = 1

B

∑Bi=1M

i(θ1, θ2, S, T ),the estimated mean trajectory. M i(θ, θ2, S, T ) (i = 1, · · · , B) denote the resultof each simulation.

So the optimization problem to be solved is:

28

θ2 = arg minθ2∈S⊂Rp2

s∑i=1

ωi(Mi −M(θ1, θ2, S, T )(i))2. (3)

Schematically, the overall optimization process is described below.

1. Define an initial θ02 and run B times M(θ1, θ2, S, T ). We denote byM1(θ1, θ

02, S, T ), · · · , MB(θ1, θ

02, S, T ) to each of the obtained results.

2. Estimate the mean trajectory M(θ1, θ02, S, T ) = 1

B

∑Bi=1M

i(θ1, θ02, S, T )

3. Estimate the mean square error ˆRSS0

=∑si=1 ω

i(M(θ1, θ02, S, T )(i) −

Mi)2.

4. To construct a succession of vectors θj2M+1j=1 so that ˆRSS

0> ˆRSS

1>

ˆRSS2> · · · > ˆRSS

M+1. For example with a stochastic optimization

algorithm.

5. Stop after M + 1 iterations and return θM+12 as the optimal parameter of

the problem.

In our particular setting, θ2 contain the parameter of a Ri(t) function defines

in Section 5.2. From now on, we asume thatRi(t) = minC, ae−(bt+ct2+dt3+et4+ft5)where a ∈ [0, 3], b ∈ [−1, 1], c ∈ [0, 1], d ∈ [0, 1], e ∈ [0, 1], f ∈ [0, 1] withθ2 = (a, b, c, d, e, f) ∈ [0, 3] × [−1, 1] × [0, 1] × [0, 1] × [0, 1] × [0, 1] and C is apositive constant fixed 0.005.

5.4 Model optimization

In our setting, we have found optimal parameters taking into account random-ness in the approximation of the mean. To do this, we must resort to stochasticoptimization algorithms.

Many algorithms of this kind can be found in literature, but based onthe good results obtained in a preeliminary analysis we have decided to usea state-of-the-art evolutionary algorithm: the CMA-ES [79]. CMA-ES is anevolutionary-based derivative-free optimization technique that can optimize awide variety of functions, including noisy functions as the one we use in ourmethod. One survey of Black-Box optimizations found it outranked 31 otheroptimization algorithms, performing especially keen on “difficult functions” orlarger dimensional search spaces [80]. From a theoretical point of view, CMA-EScan be seen as a particular case of Expectation-Maximization algorithm (EM)[81].

5.5 Model Inference

Our model has been fitted independently for each region. Therefore, we cannotmake statistical inference in the usual sense since we only observe a trajectoryof a stochastic process with a complex dependence structure. Hence, to identifythe model, we assume that the observed trajectory of accumulated deaths is themean of our random process.

29

Given a day t′ ∈ O, Mt′ accumulated number of observed deaths in t′ andthe multivariate random processX(t) = (I1(t), I2(t), I3(t), R1(t), R2(t),M(t))t∈D, the death trajectories of theprobabilistic model close to the observed path on t′, are for us a set of possibleexplanations of the spread of the coronavirus in a study region. We now formal-ize this idea. Let Iα = [Mt′ − αMt′ ,Mt′ + αMt′ ], where α ∈ (0, 1). Considerer,ω ∈ Ω : M(t′, ω) ∈ Iα, being Ω sample space of process X(t)t∈D. We buildour confidence bands of level α as paths that belong ω ∈ Ω : M(t′, ω) ∈ Iα.

Our confidence bands contain the trajectories of infected and recovered pa-tients that generate many deaths similar to that which occurs in reality. There-fore, in practice, α = 0.1 is usually a fair value, and t′ is selected as one of thelatest datums in the historical records.

5.6 Software details and resources

Our proposal has been implemented in several programming languages: C++,Python, and R, although the results shown in this article have been obtainedwith Python. We optimized the parameters using library pycma [82], and Numpy

has been used for mathematical operations.In the different statistical analyses performed, we have used R. Plots have

been made both in R with ggplot2 library and in Python with Matplotlib.Finally, the training data used to fit the models can be downloaded from at

[83] and [84].In the immediate future we are going to release the code used in this paper for

the benefit of the scientific community at (https://github.com/covid19-modeling).

Acknowledgements

The authors thank Oscar Hernan Madrid Padilla for their helpful suggestions.We are also grateful to Felipe Munoz Lopez for his help in the computer imple-mentations at the early stages of this research.

This work has received financial support from the Consellerıa de Cultura, Ed-ucacion e Ordenacion Universitaria (accreditation 2019-2022 ED431G-2019/04)and the European Regional Development Fund (ERDF), which acknowledgesthe CiTIUS-Research Center in Intelligent Technologies of the University ofSantiago de Compostela as a Research Center of the Galician University Sys-tem.

Competing Interests

The authors declare no competing interests.

References

[1] Ilona Kickbusch and Gabriel Leung. Response to the emerging novel coro-navirus outbreak. BMJ, 368, 2020.

30

https://github.com/covid19-modeling

[2] Scott P. Layne, James M. Hyman, David M. Morens, and Jeffery K.Taubenberger. New coronavirus outbreak: Framing questions for pandemicprevention. Science Translational Medicine, 12(534), 2020.

[3] Martin Enserink and Kai Kupferschmidt. Mathematics of life and death:How disease models shape national shutdowns and other pandemic policies.Science Magazine, 2020.

[4] Stephen M Kissler, Christine Tedijanto, Edward Goldstein, Yonatan HGrad, and Marc Lipsitch. Projecting the transmission dynamics of sars-cov-2 through the postpandemic period. Science, 2020.

[5] A Huppert and G. Katriel. Mathematical modelling and prediction in infec-tious disease epidemiology. Clinical Microbiology and Infection, 19(11):999–1005, Nov 2013.

[6] Matt J Keeling and Pejman Rohani. Modeling infectious diseases in humansand animals. Princeton University Press, 2011.

[7] Sandip Mandal, Ram Rup Sarkar, and Somdatta Sinha. Mathematicalmodels of malaria-a review. Malaria journal, 10(1):202, 2011.

[8] Linda JS Allen. Some discrete-time si, sir, and sis epidemic models. Math-ematical biosciences, 124(1):83–105, 1994.

[9] Junkichi Satsuma, R Willox, A Ramani, B Grammaticos, and AS Carstea.Extending the sir epidemic model. Physica A: Statistical Mechanics andits Applications, 336(3-4):369–375, 2004.

[10] Frank Ball, Catherine Laredo, David Sirl, and Viet Chi Tran. StochasticEpidemic Models with Inference, volume 2255. Springer Nature, 2019.

[11] Boseung Choi and Grzegorz A Rempala. Inference for discretely observedstochastic kinetic networks with applications to epidemic modeling. Bio-statistics, 13(1):153–165, 2012.

[12] Dave Osthus, Kyle S Hickmann, Petruta C Caragea, Dave Higdon, andSara Y Del Valle. Forecasting seasonal influenza with a state-space sirmodel. The annals of applied statistics, 11(1):202, 2017.

[13] J. S. Koopman and J. W. Lynch. Individual causal models and popula-tion system models in epidemiology. American journal of public health,89(8):1170–1174, Aug 1999. 10432901[pmid].

[14] J. Gomez-Gardees, L. Lotero, S. N. Taraskin, and F. J. Perez-Reche. Ex-plosive contagion in networks. Scientific Reports, 6(1):19767, 2016.

[15] Matt J. Keeling and Ken T. D. Eames. Networks and epidemic mod-els. Journal of the Royal Society, Interface, 2(4):295–307, Sep 2005.16849187[pmid].

[16] Michael R Kosorok and Eric B Laber. Precision medicine. Annual reviewof statistics and its application, 6:263–286, 2019.

31

[17] Nicholas G. Reich, Logan C. Brooks, Spencer J. Fox, Sasikiran Kandula,Craig J. McGowan, Evan Moore, Dave Osthus, Evan L. Ray, AbhinavTushar, Teresa K. Yamana, Matthew Biggerstaff, Michael A. Johansson,Roni Rosenfeld, and Jeffrey Shaman. A collaborative multiyear, multi-model assessment of seasonal influenza forecasting in the united states.Proceedings of the National Academy of Sciences, 116(8):3146–3154, 2019.

[18] Logan C. Brooks, David C. Farrow, Sangwon Hyun, Ryan J. Tibshi-rani, and Roni Rosenfeld. Nonmechanistic forecasts of seasonal influenzawith iterative one-week-ahead distributions. PLOS Computational Biology,14(6):1–29, 06 2018.

[19] Cecile Viboud and Alessandro Vespignani. The future of influenza forecasts.Proceedings of the National Academy of Sciences, 116(8):2802–2804, 2019.

[20] Xiao Li, Jessilyn Dunn, Denis Salins, Gao Zhou, Wenyu Zhou,Sophia Miryam Schussler-Fiorenza Rose, Dalia Perelman, Elizabeth Col-bert, Ryan Runge, Shannon Rego, Ria Sonecha, Somalee Datta, TraceyMcLaughlin, and Michael P. Snyder. Digital health: Tracking physiomesand activity using wearable biosensors reveals useful health-related informa-tion. PLoS biology, 15(1):e2001402–e2001402, Jan 2017. 28081144[pmid].

[21] Sander Greenland. Multiple-bias modelling for analysis of observationaldata. Journal of the Royal Statistical Society: Series A (Statistics in Soci-ety), 168(2):267–306, 2005.

[22] Marc Lipsitch, Christl Donnelly, Christophe Fraser, Isobel Blake, AnneCori, Ilaria Dorigatti, Neil Ferguson, Tini Garske, Harriet Mills, StevenRiley, Maria Van Kerkhove, and Miguel Hernan. Potential biases in es-timating absolute and relative case-fatality risks during outbreaks. PLoSneglected tropical diseases, 9:e0003846, 07 2015.

[23] Heejung Bang and James Robins. Doubly robust estimation in missingdata and causal inference models. Biometrics, 61:962–73, 01 2006.

[24] John Hopkins Coronavirus Resource Center. https://coronavirus.jhu.edu/map.html.

[25] Instituto Nacional de Estadıstica (INE). https://www.ine.es/.

[26] Population Reference Bureau. https://www.prb.org/

countries-with-the-oldest-populations/.

[27] AC Ghani, CA Donnelly, DR Cox, JT Griffin, C Fraser, TH Lam, LM Ho,WS Chan, RM Anderson, AJ Hedley, et al. Methods for estimating the casefatality ratio for a novel, emerging infectious disease. American journal ofepidemiology, 162(5):479–486, 2005.

[28] Xianxian Zhao, Bili Zhang, Pan Li, Chaoqun Ma, Jiawei Gu, Pan Hou,Zhifu Guo, Hong Wu, and Yuan Bai. Incidence, clinical characteristicsand prognostic factor of patients with covid-19: a systematic review andmeta-analysis. medRxiv, 2020.

32

https://coronavirus.jhu.edu/map.html

https://coronavirus.jhu.edu/map.html

https://www.ine.es/

https://www.prb.org/countries-with-the-oldest-populations/

https://www.prb.org/countries-with-the-oldest-populations/

[29] Paolo Buonanno, Sergio Galletta, and Marcello Puca. News from the covid-19 epicenter. Available at SSRN 3567093, 2020.

[30] Spencer Woody, Mauricio Garcia Tec, Maytal Dahan, Kelly Gaither,Michael Lachmann, Spencer Fox, Lauren Ancel Meyers, and James GScott. Projections for first-wave covid-19 deaths across the us using social-distancing measures derived from mobile phones. medRxiv, 2020.

[31] Keith R. Bisset, Ashwin M. Aji, Madhav V. Marathe, and Wu-chun Feng.High-performance biocomputing for simulating the spread of contagion overlarge contact networks. BMC genomics, 13 Suppl 2(Suppl 2):S3–S3, Apr2012. 22537298[pmid].

[32] Dennis L. Chao, M. Elizabeth Halloran, Valerie J. Obenchain, and Ira MLongini, Jr. Flute, a publicly available stochastic influenza epidemic simu-lation model. PLOS Computational Biology, 6(1):1–8, 01 2010.

[33] Keith Bisset, Jiangzhuo Chen, Xizhou Feng, Sritesh Kumar, and MadhavMarathe. Epifast: A fast algorithm for large scale realistic epidemic simu-lations on distributed memory systems. pages 430–439, 01 2009.

[34] Y Ma, K. R. Bisset, J. Chen, S. Deodhar, and M. V. Marathe. Formal spec-ification and experimental analysis of an interactive epidemic simulationframework. In 2011 IEEE International Conference on High PerformanceComputing and Communications, pages 790–795, Sep. 2011.

[35] Liang Mao and Ling Bian. Spatial-temporal transmission of influenza andits health risks in an urbanized area. Computers, Environment and UrbanSystems, 34(3):204–215, May 2010. PMC7124201[pmcid].

[36] Centers for Disease Control and Prevention. https://www.cdc.gov/flu/

about/keyfacts.htm.

[37] Ottar N Bjørnstad, Barbel F Finkenstadt, and Bryan T Grenfell. Dynamicsof measles epidemics: estimating scaling of transmission rates using a timeseries sir model. Ecological monographs, 72(2):169–184, 2002.

[38] Lars Lorch, William Trouleau, Stratis Tsirtsis, Aron Szanto, BernhardScholkopf, and Manuel Gomez-Rodriguez. A spatiotemporal epidemicmodel to quantify the effects of contact tracing, testing, and containment.arXiv preprint arXiv:2004.07641, 2020.

[39] Juanjuan Zhang, Maria Litvinova, Wei Wang, Yan Wang, Xiaowei Deng,Xinghui Chen, Mei Li, Wen Zheng, Lan Yi, Xinhua Chen, et al. Evolvingepidemiology and transmission dynamics of coronavirus disease 2019 out-side hubei province, china: a descriptive and modelling study. The LancetInfectious Diseases, 2020.

[40] Lei Luo, Dan Liu, Xin-long Liao, Xian-bo Wu, Qin-long Jing, Jia-zhenZheng, Fang-hua Liu, Shi-gui Yang, Bi Bi, Zhi-hao Li, Jian-ping Liu, Wei-qi Song, Wei Zhu, Zheng-he Wang, Xi-ru Zhang, Pei-liang Chen, Hua-minLiu, Xin Cheng, Miao-chun Cai, Qing-mei Huang, Pei Yang, Xin-fen Yang,Zhi-gang Huang, Jin-ling Tang, Yu Ma, and Chen Mao. Modes of contactand risk of transmission in covid-19 among close contacts. medRxiv, 2020.

33

https://www.cdc.gov/flu/about/keyfacts.htm

https://www.cdc.gov/flu/about/keyfacts.htm

[41] Yan Bai, Lingsheng Yao, Tao Wei, Fei Tian, Dong-Yan Jin, Lijuan Chen,and Meiyun Wang. Presumed Asymptomatic Carrier Transmission ofCOVID-19. JAMA, 02 2020.

[42] Eran Segal, Feng Zhang, Xihong Lin, Gary King, Ophir Shalem, SmadarShilo, William E. Allen, Yonatan H. Grad, Casey S. Greene, Faisal Alquad-doomi, Simon Anders, Ran Balicer, Tal Bauman, Ximena Bonilla, GiselBooman, Andrew T. Chan, Ori Ori Cohen, Silvano Coletti, Natalie David-son, Yuval Dor, David A. Drew, Olivier Elemento, Georgina Evans, PhilEwels, Joshua Gale, Amir Gavrieli, Benjamin Geiger, Iman Hajirasouliha,Roman Jerala, Andre Kahles, Olli Kallioniemi, Ayya Keshet, Gregory Lan-dua, Tomer Meir, Aline Muller, Long H. Nguyen, Matej Oresic, SvetlanaOvchinnikova, Hedi Peterson, Jay Rajagopal, Gunnar Ratsch, Hagai Ross-man, Johan Rung, Andrea Sboner, Alexandros Sigaras, Tim Spector, RonSteinherz, Irene Stevens, Jaak Vilo, Paul Wilmes, and . Building an in-ternational consortium for tracking coronavirus health status. medRxiv,2020.

[43] Hagai Rossman, Ayya Keshet, Smadar Shilo, Amir Gavrieli, Tal Bauman,Ori Cohen, Esti Shelly, Ran Balicer, Benjamin Geiger, Yuval Dor, andEran Segal. A framework for identifying regional outbreak and spread ofcovid-19 from one-minute population-wide surveys. Nature Medicine, 2020.

[44] Finbarr O’Sullivan. A statistical perspective on ill-posed inverse problems.Statistical science, pages 502–518, 1986.

[45] Martin Hanke and Otmar Scherzer. Inverse problems light: Numerical dif-ferentiation. The American Mathematical Monthly, 108(6):512–521, 2001.

[46] Peter J Bickel, Bo Li, Alexandre B Tsybakov, Sara A van de Geer, BinYu, Teofilo Valdes, Carlos Rivero, Jianqing Fan, and Aad van der Vaart.Regularization in statistics. Test, 15(2):271–344, 2006.

[47] Vladimir Vapnik. The nature of statistical learning theory. Springer science& business media, 2013.

[48] Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Gio-vannini, and Francesca Odone. Learning from examples as an inverse prob-lem. Journal of Machine Learning Research, 6(May):883–904, 2005.

[49] Vladimir Vapnik and Rauf Izmailov. V-matrix method of solving statisticalinference problems. Journal of Machine Learning Research, 16(51):1683–1730, 2015.

[50] Huili Xiang and Bin Liu. Solving the inverse problem of an sis epidemicreaction-diffusion model by optimal control methods. Computers & Math-ematics with Applications, 70(5):805–819, 2015.

[51] Alexandra Smirnova and Gerardo Chowell. A primer on stable parameterestimation and forecasting in epidemiology by a problem-oriented regular-ized least squares algorithm. Infectious Disease Modelling, 2(2):268–275,May 2017. 29928741[pmid].

34

[52] Anthony S. Fauci, H. Clifford Lane, and Robert R. Redfield. Covid-19 -navigating the uncharted. New England Journal of Medicine, 382(13):1268–1269, 2020.

[53] Michael Day. Covid-19: four fifths of cases are asymptomatic, china figuresindicate. BMJ, 369, 2020.

[54] Satyajith Amaran, Nikolaos V. Sahinidis, Bikram Sharda, and Scott J.Bury. Simulation optimization: a review of algorithms and applications.4OR, 12(4):301–333, 2014.

[55] Satyajith Amaran, Nikolaos V. Sahinidis, Bikram Sharda, and Scott J.Bury. Simulation optimization: a review of algorithms and applications.Annals of Operations Research, 240(1):351–380, 2016.

[56] Corey M Peak, Rebecca Kahn, Yonatan H Grad, Lauren M Childs, Ruo-ran Li, Marc Lipsitch, and Caroline O Buckee. Modeling the comparativeimpact of individual quarantine vs. active monitoring of contacts for themitigation of covid-19. medRxiv, 2020.

[57] Stephen M Kissler, Christine Tedijanto, Marc Lipsitch, and Yonatan Grad.Social distancing strategies for curbing the covid-19 epidemic. medRxiv,2020.

[58] Martin McKee and David Stuckler. If the world fails to protect the economy,covid-19 will damage health not just now but also in the future. NatureMedicine, 2020.

[59] Ziyue Liu and Wensheng Guo. Government responses matter: Predictingcovid-19 cases in us under an empirical bayesian time series framework.medRxiv, 2020.

[60] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and NandoDe Freitas. Taking the human out of the loop: A review of bayesian opti-mization. Proceedings of the IEEE, 104(1):148–175, 2015.

[61] Galit Shmueli, Thomas P. Minka, Joseph B. Kadane, Sharad Borle, andPeter Boatwright. A useful distribution for fitting discrete data: revival ofthe conway-maxwell-poisson distribution. Journal of the Royal StatisticalSociety: Series C (Applied Statistics), 54(1):127–142, 2005.

[62] Thomas House. Modelling epidemics on networks. Contemporary Physics,53(3):213–225, 2012.

[63] ZHAN Zhang and TN Krishnamurti. Ensemble forecasting of hurricanetracks. Bulletin of the American Meteorological Society, 78(12):2785–2796,1997.

[64] Stephen A Lauer, Kyra H Grantz, Qifang Bi, Forrest K Jones, Qulu Zheng,Hannah R Meredith, Andrew S Azman, Nicholas G Reich, and JustinLessler. The incubation period of coronavirus disease 2019 (covid-19) frompublicly reported confirmed cases: estimation and application. Annals ofinternal medicine, 2020.

35

[65] AG Abdel-Salam and M Mollazehi. Modeling survival time to recoveryfrom covid-19: A case study on singapore. 2020.

[66] Coronavirus Pneumonia Emergency Response Epidemiology Novel et al.The epidemiological characteristics of an outbreak of 2019 novel coronavirusdiseases (covid-19) in china. Zhonghua liu xing bing xue za zhi= Zhonghualiuxingbingxue zazhi, 41(2):145, 2020.

[67] Robert Verity, Lucy C. Okell, Ilaria Dorigatti, Peter Winskill, CharlesWhittaker, Natsuko Imai, Gina Cuomo-Dannenburg, Hayley Thompson,Patrick G. T. Walker, Han Fu, Amy Dighe, Jamie T. Griffin, MarcBaguelin, Sangeeta Bhatia, Adhiratha Boonyasiri, Anne Cori, Zulma Cu-cunuba, Rich FitzJohn, Katy Gaythorpe, Will Green, Arran Hamlet, WesHinsley, Daniel Laydon, Gemma Nedjati-Gilani, Steven Riley, Sabine vanElsland, Erik Volz, Haowei Wang, Yuanrong Wang, Xiaoyue Xi, Christl A.Donnelly, Azra C. Ghani, and Neil M. Ferguson. Estimates of the severityof coronavirus disease 2019: a model-based analysis. The Lancet InfectiousDiseases, 2020/04/19 XXXX.

[68] Henrik Salje, Cecile Tran Kiem, Noemie Lefrancq, Noemie Courtejoie,Paolo Bosetti, Juliette Paireau, Alessio Andronico, Nathanael Hoze, Je-hanne Richet, Claire-Lise Dubost, et al. Estimating the burden of sars-cov-2 in france. 2020.

[69] Qifang Bi, Yongsheng Wu, Shujiang Mei, Chenfei Ye, Xuan Zou, ZhenZhang, Xiaojian Liu, Lan Wei, Shaun A Truelove, Tong Zhang, et al. Epi-demiology and transmission of covid-19 in shenzhen china: Analysis of 391cases and 1,286 of their close contacts. MedRxiv, 2020.

[70] Katrin Zwirglmaier Ehmann, Christian Drosten, Clemens Wendtner,MD Zange, Patrick Vollmar, DVM Rosina Ehmann, Katrin Zwirglmaier,MD Guggemos, Michael Seilmaier, Daniela Niemeyer, et al. Virologicalassessment of hospitalized cases of coronavirus disease 2019.

[71] Hiroshi Nishiura, Tetsuro Kobayashi, Takeshi Miyama, Ayako Suzuki,Sungmok Jung, Katsuma Hayashi, Ryo Kinoshita, Yichi Yang, BaoyinYuan, Andrei R. Akhmetzhanov, and Natalie M Linton. Estimation of theasymptomatic ratio of novel coronavirus infections (covid-19). medRxiv,2020.

[72] Sakiko Tabata, Kazuo Imai, Shuichi Kawano, Mayu Ikeda, Tatsuya Ko-dama, Kazuyasu Miyoshi, Hirofumi Obinata, Satoshi Mimura, TsutomuKodera, Manabu Kitagaki, Michiya Sato, Satoshi Suzuki, Toshimitsu Ito,Yasuhide Uwabe, and Kaku Tamura. Non-severe vs severe symptomaticcovid-19: 104 cases from the outbreak on the cruise ship “diamond princess”in japan. medRxiv, 2020.

[73] Kenji Mizumoto, Katsushi Kagaya, Alexander Zarebski, and GerardoChowell. Estimating the asymptomatic proportion of coronavirus disease2019 (covid-19) cases on board the diamond princess cruise ship, yokohama,japan, 2020. Eurosurveillance, 25(10), 2020.

36

[74] Dimple D Rajgor, Meng Har Lee, Sophia Archuleta, Natasha Bagdasarian,and Swee Chye Quek. The many estimates of the covid-19 case fatalityrate. The Lancet Infectious Diseases, 2020.

[75] Robert Verity, Lucy C. Okell, Ilaria Dorigatti, Peter Winskill, CharlesWhittaker, Natsuko Imai, Gina Cuomo-Dannenburg, Hayley Thompson,Patrick G. T. Walker, Han Fu, Amy Dighe, Jamie T. Griffin, MarcBaguelin, Sangeeta Bhatia, Adhiratha Boonyasiri, Anne Cori, Zulma Cu-cunuba, Rich FitzJohn, Katy Gaythorpe, Will Green, Arran Hamlet, WesHinsley, Daniel Laydon, Gemma Nedjati-Gilani, Steven Riley, Sabine vanElsland, Erik Volz, Haowei Wang, Yuanrong Wang, Xiaoyue Xi, Christl A.Donnelly, Azra C. Ghani, and Neil M. Ferguson. Estimates of the severityof coronavirus disease 2019: a model-based analysis. The Lancet InfectiousDiseases, 2020/04/06 XXXX.

[76] Joseph T. Wu, Kathy Leung, Mary Bushman, Nishant Kishore, ReneNiehus, Pablo M. de Salazar, Benjamin J. Cowling, Marc Lipsitch, andGabriel M. Leung. Estimating clinical severity of covid-19 from the trans-mission dynamics in wuhan, china. Nature Medicine, 2020.

[77] Elisabeth Mahase. Covid-19: death rate is 0.66% and increases with age,study estimates. BMJ, 369:m1327, Apr 2020.

[78] Christian Dudel, Tim Riffe, Enrique Acosta, Alyson A van Raalte, andMikko Myrskyla. Monitoring trends and differences in covid-19 case fatalityrates using decomposition methods: Contributions of age structure andage-specific fatality. medRxiv, 2020.

[79] Nikolaus Hansen. The cma evolution strategy: A tutorial. arXiv preprintarXiv:1604.00772, 2016.

[80] Nikolaus Hansen, Anne Auger, Raymond Ros, Steffen Finck, and PetrPovsik. Comparing results of 31 algorithms from the black-box optimiza-tion benchmarking bbob-2009. In Proceedings of the 12th Annual Confer-ence Companion on Genetic and Evolutionary Computation, GECCO 10,pages 1689–1696, New York, NY, USA, 2010. Association for ComputingMachinery.

[81] David H Brookes, Akosua Busia, Clara Fannjiang, Kevin Murphy, andJennifer Listgarten. A view of estimation of distribution algorithms throughthe lens of expectation-maximization. arXiv preprint arXiv:1905.10474,2019.

[82] niko, Youhei Akimoto, yoshihikoueno, Dimo Brockhoff, Matthew Chan,and ARF1. Cma-es/pycma: r3.0.3, April 2020.

[83] COVID-19 Ruben Fernandez Casal. https://github.com/rubenfcasal/

COVID-19.

[84] Datadista. https://github.com/datadista/datasets/tree/master/

COVID%2019.

37

https://github.com/rubenfcasal/COVID-19

https://github.com/rubenfcasal/COVID-19

https://github.com/datadista/datasets/tree/master/COVID%2019

https://github.com/datadista/datasets/tree/master/COVID%2019

Documents

COVID-19: Estimating spread in Spain solving an inverse ... · In Europe, for example, it is considered to be the most signi cant challenge that the continent has faced since World