Overview

In previous parts of this series on visualising Covid-19 data, we downloaded data from the UK Government’s Health Security Agency Covid-19 Dashboard website and plotted the data in a variety of methods, but none of these have included the actual numbers of cases submitted.

Load Data

df <- read_csv("all_areas.csv")

# A tibble: 6 × 5
  area_code area_name        date       new_cases_by_specime…¹ country
  <chr>     <chr>            <date>                      <dbl> <chr>  
1 E06000003 Redcar and Clev… 2023-04-19                      1 England
2 E06000014 York             2023-04-19                      4 England
3 E06000050 Cheshire West a… 2023-04-19                      8 England
4 E08000001 Bolton           2023-04-19                      5 England
5 E08000016 Barnsley         2023-04-19                      1 England
6 E08000031 Wolverhampton    2023-04-19                      3 England
# ℹ abbreviated name: ¹new_cases_by_specimen_date

Transform Data

Top Ten Areas

Once again, we will be creating plots for the top ten areas in this post.

df <- df %>%
  
  # Filter data for just 2023
  filter(date >= as.Date("2023-01-01")) %>%
  
  # Select the columns to use
  select(area_name, cases = new_cases_by_specimen_date) %>%
  
  # Group by the area name
  group_by(area_name) %>%
  
  # Calculate total number of cases by area_name
  summarise(cases = sum(cases)) %>%
  
  # Sort in descending order
  arrange(-cases) %>%
  
  # Get the top ten items
  top_n(n = 10, wt = cases) %>%
  
  # Add a new variable to identify the top three items
  mutate(highlight = ifelse(row_number() <= 3, TRUE, FALSE))

# A tibble: 10 × 3
   area_name       cases highlight
   <chr>           <dbl> <lgl>    
 1 Hampshire       10500 TRUE     
 2 Kent             9054 TRUE     
 3 Essex            8729 TRUE     
 4 Lancashire       8441 FALSE    
 5 Surrey           7366 FALSE    
 6 Hertfordshire    7004 FALSE    
 7 Staffordshire    6559 FALSE    
 8 Norfolk          5831 FALSE    
 9 Nottinghamshire  5765 FALSE    
10 Derbyshire       5665 FALSE

Create Plot

Create Plot Labels

As previously, we can now create the labels with some descriptive text explaining the chart to the viewer.

Create Basic Plot

df %>%
  
  # Create plot adding the highlight variable as the fill colour
  ggplot(aes(x = cases, y = area_name, fill = highlight)) +
  
  # Add column geometry
  geom_col()

Figure 1: Basic plot

Reorder Area Names

The order in which the names of each area appears should be in descending order, with the highest value at the top of the plot.

df %>%
  
  # Create plot adding the highlight variable as the fill colour 
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  
  # Add column geometry and remove the legend
  geom_col(show.legend = FALSE)

Figure 2: Basic plot with re-ordered area names

Add Data Labels

We can now add the actual numbers of each bar.

df %>%
  
  # Create plot adding the highlight variable as the fill colour 
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  
  # Add column geometry and remove the legend
  geom_col(show.legend = FALSE) +
  
  # Add text geometry to plot
  geom_text(aes(cases, area_name, label = cases), hjust = 0)

Figure 3: Basic plot with re-ordered area names and data labels

Reimagine the plot

Now we have the bars in the correct order, along with the data labels, now we can re-imagine the plot into something a little different.

If we remove the labels on the y axis and have them above the actual bar, along with the data label, we can have an original plot.

df %>%
  
  # Create plot adding the highlight variable as the fill colour
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  
  # Add column geometry and remove the legend and amend the
  # width of the column
  geom_col(width = .35, show.legend = FALSE) +
  
  # Add label for the area name and amend position by nudge upwards
  # to be above bar and ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = 0,
      y = fct_reorder(area_name, cases),
      label = area_name
    ),
    hjust = 0,
    position = position_nudge(y = .45),
    family = "roboto-condensed"
  ) +
  
  # Add label for the number of cases and amend position by nudge upwards and
  # to the left and above bar. Ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = cases,
      y = fct_reorder(area_name, cases),
      label = comma(cases),
      colour = highlight
    ),
    fontface = "bold",
    hjust = 1,
    position = position_nudge(x = -.4, y = .45),
    family = "roboto-condensed"
  )

Figure 4: Reworked plot with re-ordered area names and data labels

The Final Plot

We can now remove the labels on the y-axis and amend the other elements using the theme command and add the labels to the plot. The colour of the bars and data labels can also be amended

df %>%
  
  # Create plot adding the highlight variable as the fill colour
  # and reorder the area_name using fct_reorder
  ggplot(aes(cases, fct_reorder(area_name, cases), fill = highlight)) +
  
  # Add column geometry and remove the legend and amend the
  # width of the column
  geom_col(width = .35, show.legend = FALSE) +
  
  # Add label for the area name and amend position by nudge upwards
  # to be above bar and ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = 0,
      y = fct_reorder(area_name, cases),
      label = area_name
    ),
    hjust = 0,
    position = position_nudge(y = .45),
    family = "roboto-condensed"
  ) +
  
  # Add label for the number of cases and amend position by nudge upwards and
  # to the left and above bar. Ensure ordering is the same with fct_reorder
  # Also amend the font family to match other plot elements
  geom_text(
    aes(
      x = cases,
      y = fct_reorder(area_name, cases),
      label = comma(cases),
      colour = highlight,
      fontface = "bold"
    ),
    hjust = 1,
    position = position_nudge(x = -.4, y = .45),
    family = "roboto-condensed"
  ) +
  
  # Amend the colour of the bars to more suitable colours
  scale_fill_manual(values = c("tomato", "lightgrey"),
                    breaks = c(TRUE, FALSE)) +
  
  # Amend colour of data labels
  scale_colour_manual(values = c("tomato", "black"),
                      breaks = c(TRUE, FALSE)) +
  
  # Amend theme elements
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    legend.position = "none",
    plot.title.position = "panel"
  ) +
  
  # Add final labels
  labs(title = title_text,
       subtitle = subtitle_text,
       caption = caption_text)

Figure 5: Final plot with amended colours and elements

Filtering data for North East England

Now we have the final plot, let’s filter the main dataset for only those areas in North East England.

# Create list of areas to use in the filter
areas <- c(
  "Hartlepool","Middlesbrough","Redcar and Cleveland",
  "Stockton-on-Tees","Darlington","County Durham",
  "Northumberland","Newcastle upon Tyne","North Tyneside",
  "South Tyneside","Sunderland","Gateshead"
  )

# Add filter for areas to the initial pipeline of commands 

filter(date >= as.Date("2023-01-01"), area_name %in% areas) %>%

Figure 6: Final plot for North East England

Conclusion

Exploring different and new methods of visualising data has been fun and will certainly be using these techniques with other datasets.

The next post will be back to MS Excel and some tips and tricks for using the new dynamic spilled arrays.