A recent news report suggested that Covid-19 cases in England and Wales are increasing once more. Here we look at the data for 2023 and show it on a UK map.
In part two we downloaded data from the UK Government’s Health Security Agency Covid-19 Dashboard website.
The data is now updated only weekly, so it was saved to a CSV file to avoid downloading it every day.
df <- read_csv("all_areas.csv")
head(df)
# A tibble: 6 × 5
area_code area_name date new_cases_by_specime…¹ country
<chr> <chr> <date> <dbl> <chr>
1 E06000003 Redcar and Clev… 2023-04-12 6 England
2 E06000014 York 2023-04-12 3 England
3 E06000050 Cheshire West a… 2023-04-12 4 England
4 E08000001 Bolton 2023-04-12 10 England
5 E08000016 Barnsley 2023-04-12 4 England
6 E08000031 Wolverhampton 2023-04-12 4 England
# ℹ abbreviated name: ¹new_cases_by_specimen_date
For the plots in this post, we need to find the top five areas, or counties, in England and Wales by total cases, so we can add labels for them to the final map.
top_ten_all_areas_df <- df %>%
# Filter data for just 2023
filter(date >= as.Date("2023-01-01")) %>%
# Group by area_name variable
group_by(area_name) %>%
# Calculate total number of cases by area_name
summarise(cases = sum(new_cases_by_specimen_date)) %>%
# Sort in descending order
arrange(-cases) %>%
# Flag the top five areas so they can be labelled on the map
mutate(lbl = row_number() <= 5)
# A tibble: 203 × 3
area_name cases lbl
<chr> <dbl> <lgl>
1 Hampshire 10105 TRUE
2 Kent 8691 TRUE
3 Essex 8425 TRUE
4 Lancashire 8168 TRUE
5 Surrey 7109 TRUE
6 Hertfordshire 6754 FALSE
7 Staffordshire 6221 FALSE
8 Norfolk 5615 FALSE
9 Nottinghamshire 5563 FALSE
10 Derbyshire 5423 FALSE
# ℹ 193 more rows
We can now create a title, subtitle and caption that explain the chart to the viewer.
# Create title
title_text <-
"How many new cases of Covid-19 have been submitted?"
# Create subtitle
subtitle_text <-
paste(
top_area_names,
"are the top five counties with a total of",
comma(top_area_counts),
"cases, representing",
top_area_pct,
"of all cases in England & Wales since the start of 2023"
)
# Create caption
caption_text <- "Source: UK Health Security Agency at
https://coronavirus.data.gov.uk/"
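The subtitle relies on top_area_names, top_area_counts and top_area_pct, which are not defined in the code shown in this post. Below is a minimal sketch of how they might be derived, assuming the lbl flag from the summary table marks the top five areas. The name area_totals is hypothetical, and the toy data holds only the first six rows of that table, so the percentage it produces is illustrative rather than the real share.

```r
library(dplyr)
library(scales)

# Toy stand-in for the summarised data: first six rows of the table above
area_totals <- tibble(
  area_name = c("Hampshire", "Kent", "Essex", "Lancashire", "Surrey",
                "Hertfordshire"),
  cases = c(10105, 8691, 8425, 8168, 7109, 6754)
) %>%
  mutate(lbl = row_number() <= 5)

# Keep only the rows flagged for labelling
top_five <- filter(area_totals, lbl)

# Build a readable list: "A, B, C, D and E"
top_area_names <- paste(
  paste(head(top_five$area_name, -1), collapse = ", "),
  "and",
  tail(top_five$area_name, 1)
)

# Total cases in the top five (left numeric so comma() can format it later)
top_area_counts <- sum(top_five$cases)

# Share of all cases in the data, formatted as a percentage string
top_area_pct <- percent(top_area_counts / sum(area_totals$cases),
                        accuracy = 0.1)
```

With the full summary table in place of the toy data, these three values can be passed straight into the paste() call that builds the subtitle.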
Create a new palette of colours using the colorspace package to replace the standard ggplot2 colour palette.
# Create colour palette using the RedOr (Red-Orange) palette
pal <-
colorspace::sequential_hcl(
length(unique(df$area_name)),
palette = "redor",
rev = TRUE)
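To check the direction of the ramp before using it, it can help to generate a handful of colours and look at them. This is a small sketch; pal_preview is a hypothetical name. With rev = TRUE the colours should run from light to dark, so areas with more cases are drawn darker.

```r
# Five colours are enough to see which way the ramp runs
pal_preview <- colorspace::sequential_hcl(5, palette = "RedOr", rev = TRUE)

# Print the hex codes; the first entry is used for the lowest values
print(pal_preview)
```

Because scale_fill_gradientn() interpolates between the colours it is given, the palette does not strictly need one colour per area; a shorter vector gives a similar gradient.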
With the data loaded, the labels created and a new colour palette defined, the next step is to join the data to the map data. Using the sf package, we can read the shape file saved in the same directory as the source data.
# Read the shape file into new data frame
counties <-
sf::st_read(
"Counties_and_Unitary_Authorities_(December_2016)_Boundaries.shp"
)
Now we can join the data from the shape file to the data frame containing the data we want to map.
counties <- counties %>%
inner_join(df, c("ctyua16cd" = "area_code"))
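An inner join silently drops any boundary that has no matching area code, and the plot needs the cases and lbl columns to be present after the join. The sketch below shows one way to check both, using toy tables in place of the real objects. The names shapes and area_totals are hypothetical, the case totals are made up, and only the ctyua16cd and area_code column names come from this post.

```r
library(dplyr)

# Toy stand-in for the attribute table of the shape file
shapes <- tibble(
  ctyua16cd = c("E06000003", "E06000014", "E99999999")  # last code is made up
)

# Toy stand-in for per-area totals that keep area_code next to area_name
area_totals <- tibble(
  area_code = c("E06000003", "E06000014"),
  area_name = c("Redcar and Cleveland", "York"),
  cases = c(120, 95),  # made-up totals for illustration
  lbl = c(TRUE, FALSE)
)

# Same join shape as in the post: shape-file code on the left, area code on the right
joined <- shapes %>%
  inner_join(area_totals, by = c("ctyua16cd" = "area_code"))

# Boundaries without case data are dropped by inner_join();
# anti_join() lists them so they can be inspected
unmatched <- shapes %>%
  anti_join(area_totals, by = c("ctyua16cd" = "area_code"))
```

If unmatched has rows, those areas would appear as gaps in the map, so it is worth inspecting before plotting.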
With the data joined to the map, let’s create our plot.
ggplot(counties) +
geom_sf(aes(fill = cases), size = .4) +
labs(title = title_text,
subtitle = subtitle_text,
caption = caption_text) +
scale_fill_gradientn(
colours = pal,
labels = label_number(scale_cut = cut_short_scale()),
guide = guide_colorbar(title = NULL)
)
Let’s remove the axis labels for the coordinates, then resize the legend and move it to the bottom of the plot.
ggplot(counties) +
geom_sf(aes(fill = cases), size = .4) +
labs(title = title_text,
subtitle = subtitle_text,
caption = caption_text) +
scale_fill_gradientn(
colours = pal,
labels = label_number(scale_cut = cut_short_scale()),
guide = guide_colorbar(title = NULL)
) +
theme(
plot.margin = margin(.25, .25, .25, .25, unit = "cm"),
plot.subtitle = element_textbox_simple(margin = margin(t = 1)),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "bottom",
legend.direction = "horizontal",
legend.key.height = unit(.8, units = "lines"),
legend.key.width = unit(3, units = "lines")
)
For someone outside the United Kingdom, the county names in the subtitle will be meaningless if the location of those counties is not known. Let’s add some labels for the top five counties using the ggrepel package.
ggplot(counties) +
geom_sf(aes(fill = cases), size = .4) +
labs(title = title_text,
subtitle = subtitle_text,
caption = caption_text) +
scale_fill_gradientn(
colours = pal,
labels = label_number(scale_cut = cut_short_scale()),
guide = guide_colorbar(title = NULL)
) +
theme(
plot.margin = margin(.25, .25, .25, .25, unit = "cm"),
plot.subtitle = element_textbox_simple(margin = margin(t = 1)),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "bottom",
legend.direction = "horizontal",
legend.key.height = unit(.8, units = "lines"),
legend.key.width = unit(3, units = "lines")
) +
ggrepel::geom_text_repel(
data = counties %>% filter(lbl == TRUE),
aes(x = long, y = lat, label = area_name),
nudge_x = c(1.28, 0, 1.25, -1.5, .4),
nudge_y = c(0,-.6, 0, 0, -.7),
family = "roboto-condensed"
)
To look at just the area of the UK where I live, we can filter the data when it is first loaded. First, create a vector of area names in North East England.
# Create vector of areas to use in the filter
areas <- c(
"Hartlepool","Middlesbrough","Redcar and Cleveland",
"Stockton-on-Tees","Darlington","County Durham",
"Northumberland","Newcastle upon Tyne","North Tyneside",
"South Tyneside","Sunderland","Gateshead"
)
Apply the filter when the data is initially loaded.
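The filter itself is not shown in this post, so here is a minimal sketch of what it might look like, assuming the area_name column from the CSV file. The toy data frame stands in for the result of read_csv("all_areas.csv"), its values are illustrative, and df_ne is a hypothetical name.

```r
library(dplyr)

# Areas in North East England, as listed above
areas <- c(
  "Hartlepool", "Middlesbrough", "Redcar and Cleveland",
  "Stockton-on-Tees", "Darlington", "County Durham",
  "Northumberland", "Newcastle upon Tyne", "North Tyneside",
  "South Tyneside", "Sunderland", "Gateshead"
)

# Toy stand-in for the loaded CSV (illustrative values only)
df <- tibble(
  area_name = c("York", "Sunderland", "Bolton", "Gateshead"),
  new_cases_by_specimen_date = c(3, 5, 10, 2)
)

# Keep only rows whose area_name appears in the vector
df_ne <- df %>%
  filter(area_name %in% areas)
```

With the real file, the same filter() call can be piped directly after read_csv() so that every later step works on the North East subset.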
The steps outlined in the previous sections can then be followed once more to create the plot below.
The values for the nudge_x parameter in the ggrepel::geom_text_repel call need to be amended for the new top five areas. We can remove the nudge_y parameter, as the areas to be labelled are far enough apart that the labels do not need to be moved vertically.
nudge_x = c(0, 0, .6, .3, .5)
In this post we looked at the top five areas in England & Wales as a whole and in North East England, and plotted the values on a map using the sf package.
The values passed to the nudge_x and nudge_y parameters of geom_text_repel were found by trial and error, adjusting them with each iteration of the images for this post. While not my preferred method, it was a good learning exercise.
In the next post, we will look at the actual numbers of cases submitted for Covid-19 as these values have not been shown in any of the previous posts.