A recent news report suggested that Covid-19 cases in England and Wales are increasing once more. Here we look at the actual numbers in the data for 2023.
In previous parts of this series on visualising Covid-19 data, we downloaded data from the UK Government’s Health Security Agency Covid-19 Dashboard website and plotted it in a variety of ways, but none of those plots showed the actual number of cases recorded.
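# Load the packages used in this post (scales provides comma())
library(tidyverse)
library(scales)
# Read the downloaded data and preview the first rows
df <- read_csv("all_areas.csv")
head(df)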
# A tibble: 6 × 5
  area_code area_name        date       new_cases_by_specime…¹ country
  <chr>     <chr>            <date>                      <dbl> <chr>
1 E06000003 Redcar and Clev… 2023-04-19                      1 England
2 E06000014 York             2023-04-19                      4 England
3 E06000050 Cheshire West a… 2023-04-19                      8 England
4 E08000001 Bolton           2023-04-19                      5 England
5 E08000016 Barnsley         2023-04-19                      1 England
6 E08000031 Wolverhampton    2023-04-19                      3 England
# ℹ abbreviated name: ¹new_cases_by_specimen_date
Once again, we will be creating plots for the top ten areas in this post.
df <- df %>%
  # Filter data for just 2023
  filter(date >= as.Date("2023-01-01")) %>%
  # Select the columns to use
  select(area_name, cases = new_cases_by_specimen_date) %>%
  # Group by the area name
  group_by(area_name) %>%
  # Calculate total number of cases by area_name
  summarise(cases = sum(cases)) %>%
  # Sort in descending order
  arrange(-cases) %>%
  # Get the top ten items
  top_n(n = 10, wt = cases) %>%
  # Add a new variable to identify the top three items
  mutate(highlight = ifelse(row_number() <= 3, TRUE, FALSE))
# A tibble: 10 × 3
   area_name       cases highlight
   <chr>           <dbl> <lgl>
 1 Hampshire       10500 TRUE
 2 Kent             9054 TRUE
 3 Essex            8729 TRUE
 4 Lancashire       8441 FALSE
 5 Surrey           7366 FALSE
 6 Hertfordshire    7004 FALSE
 7 Staffordshire    6559 FALSE
 8 Norfolk          5831 FALSE
 9 Nottinghamshire  5765 FALSE
10 Derbyshire       5665 FALSE
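A side note on the top-ten step: `top_n()` still works, but dplyr has superseded it in favour of `slice_max()`, which also returns the rows sorted from highest to lowest. A minimal sketch on made-up figures (the area names and counts below are hypothetical, not taken from the dashboard data):

```r
library(dplyr)

# Hypothetical totals, purely for illustration
totals <- tibble(
  area_name = c("Area A", "Area B", "Area C", "Area D", "Area E"),
  cases     = c(120, 450, 300, 90, 610)
)

top3 <- totals %>%
  # Keep the three areas with the most cases, sorted highest first
  slice_max(cases, n = 3) %>%
  # Flag the single highest area
  mutate(highlight = row_number() <= 1)

top3
```

Like `top_n()`, `slice_max()` keeps tied rows by default, so it may return more than `n` rows when values are equal.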
As in previous posts, we can now create the labels (`title_text`, `subtitle_text` and `caption_text`), which hold descriptive text explaining the chart to the viewer, before building the plot itself.
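The label objects are only used by `labs()` in the final plot. A minimal sketch of how they might be defined; the wording below is placeholder text assumed for illustration, not the text used in the original chart:

```r
# Placeholder wording (an assumption); replace with your own text
title_text    <- "Covid-19 cases by area, 2023"
subtitle_text <- "Ten areas with the highest number of new cases by specimen date"
caption_text  <- "Source: UK Health Security Agency Covid-19 Dashboard"
```

With the labels in place, a first version of the plot is a plain column chart: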
df %>%
  # Create plot adding the highlight variable as the fill colour
  ggplot(aes(x = cases, y = area_name, fill = highlight)) +
  # Add column geometry
  geom_col()
The area names should appear in descending order of cases, with the highest value at the top of the plot.
df %>%
  # Create plot adding the highlight variable as the fill colour
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  # Add column geometry and remove the legend
  geom_col(show.legend = FALSE)
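To see what `fct_reorder` does independently of the plot, here is a small sketch on hypothetical values (the names and counts are invented for illustration). It compares the default alphabetical factor levels with the reordered ones:

```r
library(forcats)

# Hypothetical values, purely for illustration
area_name <- c("Area A", "Area B", "Area C")
cases     <- c(300, 100, 200)

# Default factor levels are alphabetical
levels(factor(area_name))

# fct_reorder sorts the levels by cases, lowest first; ggplot2 draws the
# first level at the bottom of the y-axis, so the highest ends up on top
reordered <- fct_reorder(area_name, cases)
levels(reordered)
```

Because the lowest value becomes the first level, no extra reversal is needed to get the largest bar at the top of a horizontal column chart.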
We can now add the actual number of cases to each bar.
df %>%
  # Create plot adding the highlight variable as the fill colour
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  # Add column geometry and remove the legend
  geom_col(show.legend = FALSE) +
  # Add text geometry to plot
  geom_text(aes(cases, area_name, label = cases), hjust = 0)
Now that we have the bars in the correct order, along with the data labels, we can re-imagine the plot as something a little different.
If we remove the labels from the y-axis and place each area name above its bar, along with the data label, we get a more original plot.
df %>%
  # Create plot adding the highlight variable as the fill colour
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  # Add column geometry and remove the legend and amend the
  # width of the column
  geom_col(width = .35, show.legend = FALSE) +
  # Add label for the area name and amend position by nudge upwards
  # to be above bar and ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = 0,
      y = fct_reorder(area_name, cases),
      label = area_name
    ),
    hjust = 0,
    position = position_nudge(y = .45),
    family = "roboto-condensed"
  ) +
  # Add label for the number of cases and amend position by nudge upwards and
  # to the left and above bar. Ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = cases,
      y = fct_reorder(area_name, cases),
      label = comma(cases),
      colour = highlight
    ),
    fontface = "bold",
    hjust = 1,
    position = position_nudge(x = -.4, y = .45),
    family = "roboto-condensed"
  )
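The data labels in this plot are formatted with `comma()` from the scales package, which turns numbers into character labels with a thousands separator. A quick sketch, using the three highest totals from the top-ten table:

```r
library(scales)

# The three highest totals from the top-ten table
cases <- c(10500, 9054, 8729)

# comma() returns character labels with a thousands separator
labels <- comma(cases)
labels
```

The result is a character vector, which is why it is mapped to `label` rather than to a position aesthetic.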
We can now remove the axis labels and amend the other elements using the `theme()` function, and add the labels to the plot. The colours of the bars and data labels can also be amended.
df %>%
  # Create plot adding the highlight variable as the fill colour
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  # Add column geometry and remove the legend and amend the
  # width of the column
  geom_col(width = .35, show.legend = FALSE) +
  # Add label for the area name and amend position by nudge upwards
  # to be above bar and ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = 0,
      y = fct_reorder(area_name, cases),
      label = area_name
    ),
    hjust = 0,
    position = position_nudge(y = .45),
    family = "roboto-condensed"
  ) +
  # Add label for the number of cases and amend position by nudge upwards and
  # to the left and above bar. Ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = cases,
      y = fct_reorder(area_name, cases),
      label = comma(cases),
      colour = highlight
    ),
    # fontface is a fixed setting, so it sits outside aes()
    fontface = "bold",
    hjust = 1,
    position = position_nudge(x = -.4, y = .45),
    family = "roboto-condensed"
  ) +
  # Amend the colour of the bars to more suitable colours
  scale_fill_manual(values = c("tomato", "lightgrey"),
                    breaks = c(TRUE, FALSE)) +
  # Amend colour of data labels
  scale_colour_manual(values = c("tomato", "black"),
                      breaks = c(TRUE, FALSE)) +
  # Amend theme elements
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    legend.position = "none",
    plot.title.position = "panel"
  ) +
  # Add final labels
  labs(title = title_text,
       subtitle = subtitle_text,
       caption = caption_text)
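In the manual scales, the colours are paired with `TRUE` and `FALSE` through `breaks`. An alternative that some may find easier to read is a named vector of colours. The sketch below uses hypothetical data (names and counts invented for illustration) and inspects the fill assigned to each bar:

```r
library(ggplot2)

# Hypothetical data, purely for illustration
toy <- data.frame(
  area_name = c("Area A", "Area B", "Area C"),
  cases     = c(300, 100, 200),
  highlight = c(TRUE, FALSE, FALSE)
)

p <- ggplot(toy, aes(cases, area_name, fill = highlight)) +
  geom_col(show.legend = FALSE) +
  # Names are matched against the values of highlight ("TRUE"/"FALSE")
  scale_fill_manual(values = c("TRUE" = "tomato", "FALSE" = "lightgrey"))

# Inspect the fill colour assigned to each bar
fills <- ggplot_build(p)$data[[1]]$fill
fills
```

Either approach should give the same chart; the named vector simply makes the pairing explicit and independent of the order of the values.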
Now that we have the final plot, let’s filter the main dataset for only those areas in North East England.
# Create list of areas to use in the filter
areas <- c(
  "Hartlepool", "Middlesbrough", "Redcar and Cleveland",
  "Stockton-on-Tees", "Darlington", "County Durham",
  "Northumberland", "Newcastle upon Tyne", "North Tyneside",
  "South Tyneside", "Sunderland", "Gateshead"
)
# Add filter for areas to the initial pipeline of commands
filter(date >= as.Date("2023-01-01"), area_name %in% areas) %>%
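To show where that filter sits in the pipeline, here is a self-contained sketch. The rows in `df_raw` are invented stand-ins for the downloaded data, and the area list is shortened to two names; only the column names follow the dataset shown earlier:

```r
library(dplyr)

# Hypothetical rows standing in for the downloaded data
df_raw <- tibble(
  area_name = c("Sunderland", "Sunderland", "Gateshead", "York", "Gateshead"),
  date = as.Date(c("2023-01-05", "2022-12-30", "2023-02-01",
                   "2023-02-01", "2023-03-10")),
  new_cases_by_specimen_date = c(4, 7, 2, 9, 3)
)

# Shortened list of areas for the sketch
areas <- c("Sunderland", "Gateshead")

north_east <- df_raw %>%
  # Keep 2023 rows for the chosen areas only
  filter(date >= as.Date("2023-01-01"), area_name %in% areas) %>%
  # Same summary steps as the main pipeline
  select(area_name, cases = new_cases_by_specimen_date) %>%
  group_by(area_name) %>%
  summarise(cases = sum(cases)) %>%
  arrange(-cases)

north_east
```

The 2022 row and the area outside the list are dropped before the totals are calculated, so the rest of the plotting code can be reused unchanged.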
Exploring different and new methods of visualising data has been fun, and I will certainly be using these techniques with other datasets.
The next post will be back to MS Excel and some tips and tricks for using the new dynamic spilled arrays.