Top 35 R resources on Novel COVID-19 Coronavirus

Antoine Soetewey 2020-03-12 16 minute read
Photo by CDC


The Coronavirus is a serious concern around the globe. With its expansion, there are also more and more online resources about it. This article presents a selection of the best R resources on the COVID-19 virus.

This list is by no means exhaustive. I am not aware of all R resources available online about the Coronavirus, so please feel free to let me know in the comments or by contacting me if you believe that another resource (R package, Shiny app, R code, blog posts, datasets, etc.) deserves to be on this list.

R Shiny apps and dashboards

Coronavirus tracker

Developed by John Coene, this Shiny app tracks the spread of the coronavirus, based on three data sources (Johns Hopkins, Weixin and DXY Data). The Shiny app, built with shinyMobile (which makes it responsive on different screen sizes), presents the number of deaths and of confirmed, suspected and recovered cases by time and region in a really nice way.

The code is available on GitHub.

Coronavirus dashboard from the {coronavirus} package

Developed by the author of the {coronavirus} package, this dashboard provides an overview of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The data and dashboard are refreshed on a daily basis.

The code is available on GitHub.

From this dashboard, I created another dashboard specific to Belgium. Feel free to use the code available on GitHub to build one specific to your country. See more details in this article.

COVID-19 Global Cases

Developed by Christoph Schoenenberger, this Shiny app shows recent developments of the COVID-19 pandemic via a map, summary tables, key figures and plots.

The code is available on GitHub.

Find more thoughts on this dashboard from the author in this article.

Visualization of Covid-19 Cases

Developed by Nico Hahn, this Shiny app uses leaflet, plotly and the data from Johns Hopkins University to visualize the outbreak of the novel coronavirus and shows data for the entire world or individual countries.

The code is available on GitHub.

Modeling COVID-19 Spread vs Healthcare Capacity

Developed by Dr. Alison Hill, this Shiny app uses an epidemiological model based on the classic SEIR model to describe the spread and clinical progression of COVID-19. It includes different clinical trajectories of infection, interventions to reduce transmission, and comparisons to healthcare capacity.

The code is available on GitHub.

COVID-19 Data Visualization Platform

Developed by Shubhram Pandey, this Shiny app provides a clear visualization of the impact of COVID-19 all over the world, along with a sentiment analysis of Twitter data using natural language processing.

The code is available on GitHub.

Coronavirus 10-day forecast

Developed by the Spatial Ecology and Evolution Lab, this Shiny app gives a ten-day forecast, by country, on likely numbers of coronavirus cases and gives citizens a sense of how fast this epidemic is progressing.

See a detailed explanation of the app and how to read it in this blog post. The code is available on GitHub.

Coronavirus (COVID-19) across the world

Developed by Anisa Dhana in collaboration with datascience+, this Shiny app monitors the spread of COVID-19 across the world via a map visualization of the confirmed cases and some graphs on the growth of the virus.

The dataset used is from Johns Hopkins CSSE and part of the code is available in this blog post.

COVID-19 outbreak

Developed by Dr. Thibaut Fabacher in collaboration with the department of Public Health of the Strasbourg University Hospital and the Laboratory of Biostatistics and Medical Informatics of the Strasbourg Medicine Faculty, this Shiny app shows an interactive map for global monitoring of the infection. It focuses on the evolution of the number of cases per country and for a given period in terms of incidence and prevalence.

The code is available on GitHub and this blog post discusses it in more detail.

Comparing Corona trajectories

Developed by André Calero Valdez, this Shiny app compares the number of confirmed and deceased cases together with case trajectories by country via two graphs. The app also allows you to compare growth rates and case numbers by country via a table.

The code is available on GitHub.

Flatten the Curve

Developed by Tinu Schneider, this Shiny app illustrates, in an interactive way, the different scenarios behind the #FlattenTheCurve message.

The app has been built upon Michael Höhle’s article and the code is available on GitHub.

Explore the spread of Covid-19

Developed by Joachim Gassen, this Shiny app allows you to visualize confirmed cases, recovered cases and reported deaths for several countries in one summary graph.

The Shiny app is based on data from:

This blog post explains the Shiny app in further detail, and in particular the {tidycovid19} R package behind it.

Governments and COVID-19

Developed by Sebastian Engel-Wolf, this Shiny app presents in an elegant way the following measurements:

  • Maximum time of exponential growth in a row
  • Days to double infections
  • Exponential growth today
  • Confirmed cases
  • Deaths
  • Population
  • Confirmed cases per 100,000 inhabitants
  • Mortality rate

The code is available on GitHub and this article explains it in further detail.
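As an illustration of two of the measurements listed above, here is a minimal sketch in base R of how days to double and confirmed cases per 100,000 inhabitants could be computed. The case counts and the population size are made up, and the formulas are assumptions for illustration (constant daily growth for the doubling time); they are not taken from the app's code.

```r
# Hypothetical cumulative confirmed cases on 8 consecutive days (made-up numbers)
cases <- c(100, 120, 144, 173, 207, 249, 299, 358)
population <- 11500000  # hypothetical population size

# Daily growth factor: today's cumulative count divided by yesterday's
growth_factor <- cases[-1] / cases[-length(cases)]

# Days to double, assuming the most recent growth factor stays constant
doubling_time <- log(2) / log(tail(growth_factor, 1))

# Confirmed cases per 100,000 inhabitants on the last day
cases_per_100k <- tail(cases, 1) / population * 1e5

round(doubling_time, 1)
round(cases_per_100k, 2)
```

A growth factor close to 1 makes the doubling time very large, so in practice it is often averaged over several days rather than read off a single day.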

R packages

{nCov2019}

The {nCov2019} package gives you access to epidemiological data on the coronavirus outbreak.[1] The package gives real-time statistics, includes historical data and a Shiny app. The vignette explains the main functions and possibilities of the package.

Furthermore, the authors of the package also developed a website with interactive plots and time-series forecasts, which could be useful in informing the public and studying how the virus spread in populous countries.

{coronavirus}

Developed by Rami Krispin, the {coronavirus} package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. Pulled from the Johns Hopkins dataset, the R package gives a daily summary of the coronavirus cases by state/province. The dataset contains variables such as confirmed cases, deaths and recovered cases across different states.

More details are available here, a csv format of the package dataset is available here and a summary dashboard is available here.

{tidycovid19}

Developed by Joachim Gassen, the {tidycovid19} package allows you to download, tidy and visualize Covid-19 related data (including data on governmental measures) directly from authoritative sources. It also provides a flexible function and an accompanying Shiny app to visualize the spread of the virus.

The package is available on GitHub and a blog post explains it in more detail.

R code and blog posts

Analyzing COVID-19 outbreak data with R

Written by Tim Churches, these two articles (part 1 and part 2) explore the R tools and packages that might be used to analyze the COVID-19 data. In particular, the author considers when the pandemic will subside in China, and then turns the analysis to Japan, South Korea, Italy and Iran. He also shows improvements on the cumulative incidence plots that are so common. Moreover, he presents R code to analyze how contagious the coronavirus is, using the classic SIR (Susceptible-Infectious-Recovered) compartmental model of communicable disease outbreaks.[2]

The code is available on GitHub (part 1 and part 2).

Part 1 is actually based on another shorter blog post by Prof. Dr. Holger K. von Jouanne-Diedrich from Learning Machines. Read his article here for a more concise analysis on how to model the outbreak of the coronavirus and discover how contagious it is.
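To give an idea of what the SIR model mentioned above does, here is a minimal sketch in base R that integrates the three compartments with a simple fixed-step Euler scheme. It is not the code from these articles (which may rely on dedicated ODE solvers and on fitting to real data). The function name simulate_sir and the values of beta, gamma, N and I0 are hypothetical and chosen only for illustration.

```r
# SIR model:
#   dS/dt = -beta * S * I / N
#   dI/dt =  beta * S * I / N - gamma * I
#   dR/dt =  gamma * I
# Integrated with a fixed-step Euler scheme (a sketch, not a production solver)
simulate_sir <- function(beta, gamma, N, I0, days, dt = 0.1) {
  steps <- round(days / dt)
  S <- numeric(steps + 1)
  I <- numeric(steps + 1)
  R <- numeric(steps + 1)
  S[1] <- N - I0
  I[1] <- I0
  R[1] <- 0
  for (k in seq_len(steps)) {
    new_infections <- beta * S[k] * I[k] / N * dt
    new_recoveries <- gamma * I[k] * dt
    S[k + 1] <- S[k] - new_infections
    I[k + 1] <- I[k] + new_infections - new_recoveries
    R[k + 1] <- R[k] + new_recoveries
  }
  data.frame(time = (0:steps) * dt, S = S, I = I, R = R)
}

# Hypothetical parameters: beta = 0.5 and gamma = 0.2 per day
out <- simulate_sir(beta = 0.5, gamma = 0.2, N = 1e6, I0 = 10, days = 150)

# Basic reproduction number implied by these parameters
basic_reproduction_number <- 0.5 / 0.2

# Day on which the number of infectious people peaks
peak_day <- out$time[which.max(out$I)]
peak_day
```

In a real analysis, beta and gamma are not chosen by hand but estimated by fitting the model to observed incidence data.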

More recently, the author published a series of other interesting articles:

COVID-19 Data Analysis with {tidyverse} and {ggplot2}

Dr. Yanchang Zhao from RDataMining published two data analyses of the coronavirus with the {tidyverse} and {ggplot2} packages, one for China and one worldwide.

Both documents are a mix of data cleaning, data processing and visualizations of the confirmed/cured cases and death rates across countries or regions.

COVID-19 cumulative observed case fatality rate over time

Written by Peter Ellis, this article focuses on how the observed case fatality rate of COVID-19 has evolved over time across 7 countries and comments on why the rates vary (low testing rates, age of the population, overwhelmed hospitals, etc.).

The code is available at the end of the article. The data is from Johns Hopkins and it uses the {coronavirus} package.

More recently, the author published a new article on the impact of a country’s age breakdown on COVID-19 case fatality rate. It looks at estimated fatalities in different countries according to the age distributions in those countries (based on Italy’s data). The data is from the Istituto Superiore di Sanità (Roma) and all the code is shown in the post.
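For readers unfamiliar with the measure, here is a minimal sketch in base R of a cumulative observed case fatality rate, assuming it is computed as cumulative deaths divided by cumulative confirmed cases on each date. The data frame and its numbers are made up for illustration and are not taken from the article or from the {coronavirus} package.

```r
# Hypothetical cumulative counts for one country (made-up numbers)
counts <- data.frame(
  date = as.Date("2020-03-01") + 0:4,
  confirmed = c(1000, 1500, 2200, 3100, 4200),
  deaths = c(20, 33, 52, 80, 115)
)

# Observed case fatality rate: cumulative deaths / cumulative confirmed cases
counts$cfr <- counts$deaths / counts$confirmed

counts
```

Because the denominator only counts confirmed cases, this observed rate depends heavily on how much testing is done, which is one of the reasons the rates vary across countries.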

Covid 19 Tracking

Written by Prof. Kieran Healy, this article discusses how to get an overview of best-available counts of deaths, using the COVID-19 data from the European Centre for Disease Prevention and Control (ECDC).

Code can be found in the article and on GitHub.

More recently, the author published another article discussing how to create a small-multiple plot of cases by country, showing the trajectory of the outbreak for a large number of countries, with the background of each small-multiple panel also showing (in grey) the trajectory of every other country for comparison.

Infectious diseases and nonlinear differential equations

Published by Fabian Dablander, this math-intensive blog post explains what SIR and SIRS models take into account and how they calculate their results.

From a pandemic perspective, the author writes “The SIRS model extends the SIR model, allowing the recovered population to become susceptible again (hence the extra ‘S’). It assumes that the susceptible population increases proportional to the recovered population”.
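To make the extra ‘S’ concrete, one common way of writing the SIRS system is sketched below. The notation is an assumption and may differ from the one used in the post: β is the transmission rate, γ the recovery rate, ξ the rate at which recovered people become susceptible again, and N = S + I + R the (constant) population size.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} + \xi R \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I - \xi R
\end{aligned}
```

The three right-hand sides sum to zero, so the total population stays constant, and setting ξ = 0 gives back the SIR model.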

Epidemic modelling of COVID-19 in the UK using an SIR model

Published by Thomas Wilding, this blog post applies the SIR model to UK data.

As further extensions to the model, the author suggests:

  • Using an SEIR model (adding an Exposed compartment for people who are infected but not yet infectious)
  • Adding a “Q” layer since a lot of people are being Quarantined or isolated
  • Considering the “hidden” population that is infected but cannot get tested due to a shortage of tests
  • Feasibility of a second wave / outbreak of the epidemic later in the year (as seen in previous outbreaks, such as Swine Flu)
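The first extension in the list above can be sketched in a few lines of base R. This is only an illustration of an SEIR model with an Exposed compartment, not the author's code: the function name seir_derivs, the parameter values (beta, sigma, gamma) and the starting state are hypothetical and not fitted to UK data, and the integration uses a basic fixed-step Euler scheme.

```r
# SEIR derivatives; sigma is the rate at which exposed people become infectious
seir_derivs <- function(state, beta, sigma, gamma) {
  N <- sum(state)
  infection <- beta * state[["S"]] * state[["I"]] / N
  c(S = -infection,
    E = infection - sigma * state[["E"]],
    I = sigma * state[["E"]] - gamma * state[["I"]],
    R = gamma * state[["I"]])
}

# Hypothetical starting state and parameters (not fitted to any data)
state <- c(S = 999990, E = 0, I = 10, R = 0)
dt <- 0.1
steps <- 2000  # 200 days with a step of 0.1 day

# One row per time step, one column per compartment
trajectory <- matrix(NA_real_, nrow = steps + 1, ncol = 4,
                     dimnames = list(NULL, names(state)))
trajectory[1, ] <- state

# Fixed-step Euler integration
for (k in seq_len(steps)) {
  state <- state + dt * seir_derivs(state, beta = 0.5, sigma = 0.2, gamma = 0.2)
  trajectory[k + 1, ] <- state
}

# Largest number of simultaneously infectious people
peak_infectious <- max(trajectory[, "I"])
peak_infectious
```

A quarantine layer could be added in the same way, as one more named element of the state vector with its own inflow and outflow terms.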

Data sources:

Modeling Pandemics

Published by Arthur Charpentier, this series of 3 blog posts (part 1, part 2, part 3) walks through the SIR model and its parameters, how to solve it as a system of ordinary differential equations, and how to compute the reproductive rate. It also gives a mathematical explanation of a model for how quickly a pandemic will return, albeit with diminishing intensity. Last, it explains a model that is more sophisticated than SIR, the SEIR model, and illustrates it with Ebola data.

COVID-19: The Case of Germany

Published by Prof. Dr. Holger K. von Jouanne-Diedrich from Learning Machines, this blog post uses the SIR model and German data to estimate the duration and severity of the pandemic.

Download the data from Morgenpost.

Flatten the COVID-19 Curve

Published by Michael Höhle from Theory meets practice, this blog post discusses why the message of flattening the COVID-19 curve is right, but why some of the visualizations used to show the effect are wrong: Reducing the basic reproduction number does not just stretch the outbreak, it also reduces the final size of the outbreak.

From a pandemic point of view, the author writes “Because of limited health capacities, stretching out the outbreak over a longer time period will ensure, that a larger proportion of those in need of hospital treatment will actually get it. Other advantages of this approach are to win time in order to find better treatment forms and, possibly, to eventually develop a vaccine”.

A Shiny app has also been built upon this article to investigate different scenarios.
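The point about the final size of the outbreak can be illustrated with the standard final-size relation of the basic SIR model, z = 1 - exp(-R0 * z), where z is the fraction of the population ultimately infected. The sketch below solves it numerically in base R. It is an illustration under the assumptions of the simple SIR model (an almost entirely susceptible population at the start), not the code from the article, and the function name final_size and the R0 values are hypothetical.

```r
# Final size z of an SIR epidemic solves z = 1 - exp(-R0 * z) for R0 > 1
final_size <- function(R0) {
  f <- function(z) z - 1 + exp(-R0 * z)
  # f is negative just above 0 and positive at 1 when R0 > 1,
  # so the non-trivial root lies in between
  uniroot(f, interval = c(1e-6, 1), tol = 1e-9)$root
}

# Hypothetical values of the basic reproduction number
sizes <- sapply(c(1.5, 2, 2.5), final_size)
round(sizes, 3)
```

Lowering R0 from 2.5 to 1.5 in this simple model shrinks the fraction of the population ultimately infected, in addition to lowering and delaying the peak.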

Flattening vs shrinking: the math of #FlattenTheCurve

Published by Ben Bolker and Jonathan Dushoff, this blog post gives a clear explanation of physical distancing and explains how physical distancing makes several beneficial outcomes possible.

The code is available on GitHub.

explainCovid19 challenge

Published by Przemyslaw Biecek, this blog post gives an overview of a model that uses gradient boosting to predict survival based on age, country, and gender. It also shows how older people are more at risk and it lets you play with the model yourself with a modelStudio interactive dashboard.

Data sources:

An R Package to explore the Novel Coronavirus

Published by Patrick Tung via Towards Data Science, this blog post translates into English an R package originally written in Chinese.

Data is collected from Tencent, at https://news.qq.com/zt2020/page/feiyan.htm, which contains some of the most up-to-date public information on the coronavirus.

Coronavirus model using R – Colombia

Published by Daniel Pena Chavez, this blog post uses the code from Prof. Dr. Holger K. von Jouanne-Diedrich to model the peak of the pandemic in Colombia and the projected deaths. The author also points out that a huge number of other variables need to be considered, such as density, climate and government response.

Data is from Rami Krispin’s GitHub.

COVID-19: The Case of Spain

Written by Jose from Diarium - Statistics and R software, this blog post, using data for Spain, applies the SIR model, and then a cubic polynomial regression model to predict infections, hospitalizations, deaths and peak date.
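As an illustration of the second approach, here is a minimal sketch of a cubic polynomial regression in base R. The numbers are made up (they are not the Spanish data used in the post), and the variable names are hypothetical.

```r
# Hypothetical cumulative infections on 10 consecutive days (made-up numbers)
day <- 1:10
infections <- c(12, 20, 35, 58, 90, 135, 195, 270, 365, 480)

# Cubic polynomial regression:
# infections = b0 + b1 * day + b2 * day^2 + b3 * day^3 + error
fit <- lm(infections ~ poly(day, 3, raw = TRUE))

# Extrapolate three days ahead
forecast <- predict(fit, newdata = data.frame(day = 11:13))

coef(fit)
forecast
```

A polynomial has no epidemiological mechanism behind it, so such a fit is best used for very short-term extrapolation only.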

Tidying the new Johns Hopkins Covid-19 time-series datasets

Written by Joachim Gassen, this blog post provides functions and code to deal with different country names and changes on the Johns Hopkins site.

A few days later, the author published another blog post analyzing five kinds of intervention on the spread of COVID-19.

COVID-19 in Belgium

Based on Tim Churches’ article, I published an analysis of the COVID-19 specifically for Belgium.

The code is available on GitHub, so feel free to use it as a starting point for an analysis of the virus outbreak in your own country.

Facts About Coronavirus Disease 2019 (COVID-19) in 5 Charts created with R and ggplot2

Written by Gregory Kanevsky, this blog post compiles some useful facts about COVID-19 into 5 charts, including gauge charts, and discusses R and {ggplot2} techniques used to create them.

Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting

Written by Martijn Weterings on Learning Machines, this guest post describes the fitting of Covid-19 data with the SIR model and explains tricky parts of the fitting methodology and how we can mitigate some of the problems (e.g., early stopping of the algorithm or an ill-conditioned problem). It provides a very clear explanation of some tweaks to the standard model.

The code is available here.

Coronavirus : spatially smoothed decease in France and decease animation map

Published by Michael from r.iresmi.net, this blog post shows R code on how to use kernel weighted smoothing with arbitrary bounding areas to display a map of deaths from Covid-19 in France.

See also how the author built an animated map of deaths from Covid-19 in France.

Another “flatten the COVID-19 curve” simulation… in R

Written by Javier Fernandez-Lopez, this blog post shows R code to create static plots and then simulations to demonstrate how social distancing could help to “flatten the curve” of COVID-19 infections.

Non-English resources

This section may be of interest to only a limited number of people, but still, there are great resources in languages other than English. See a collection of them below:

Thanks for reading. I hope you will find these R resources on the COVID-19 Coronavirus useful. Feel free to let me know in the comments if I missed one.[4] A special thanks to Rees Morrison for his tremendous work on collecting and organizing several articles, which greatly helped in improving the section about blog posts.

Although I have carefully read all resources, inclusion on the list does not mean that I endorse the findings. Moreover, some of the analyses, code, dashboards, packages or datasets might be out of date, so these should not be viewed, by default, as current findings.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion. If you find a mistake or bug, you can inform me by raising an issue on GitHub. For all other requests, you can contact me here.




  1. The package has also been the subject of a preprint.

  2. See more information about this epidemiological model in this post by Marc Choisy.

  3. The author Su Wei is looking for some help in translating the Shiny app. Do not hesitate to contact me if you’d like to help!

  4. If you are the author of one of these resources, do not hesitate to contact me if you see any inconsistency or if you would like to remove it from this article.