Step by Step on Constructing a Great Graph

By | 4th January 2020

A graph is worth a thousand words, though not always. Not all graphs are created equal, and I would say only a great graph is worth a thousand words. So what makes a graph great?

Let’s build one from scratch, step by step. The student/teacher ratio data come from the R4DS TidyTuesday project, and GDP per capita and population come from the World Bank through the WDI package. The code below downloads the data and draws a scatter plot to examine the relationship between two variables.

library(tidyverse)
student_ratio_raw <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv")

library(WDI)

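# Pull 2015 GDP per capita and total population from the World Bank.
# iso3c is a column that WDI returns, not an indicator, so it is not requested.
WDIdata <- WDI(indicator = c("NY.GDP.PCAP.CD", "SP.POP.TOTL"),
               start = 2015, end = 2015, extra = TRUE) %>%
  mutate(country_code = as.character(iso3c),
         GDP_per_capita = NY.GDP.PCAP.CD,
         total_population = SP.POP.TOTL) %>%
  # Keep only the columns used below, so the join does not create
  # clashing country/year columns (country.x, country.y, ...).
  select(country_code, region, GDP_per_capita, total_population) %>%
  as_tibble()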
student_ratio_elementary_2015 <- student_ratio_raw %>%
  filter(indicator == "Primary Education", year==2015) %>% 
  arrange(desc(student_ratio)) %>%
  inner_join(WDIdata, by="country_code")

student_ratio_elementary_2015 %>%
  ggplot(aes(GDP_per_capita, student_ratio)) +
  geom_point() 

Step 1: Construct a scatter plot to check the relationship between the student/teacher ratio and GDP per capita. We expect a negative relationship: richer countries tend to have lower student-to-teacher ratios. However, the relationship is not very clear from this first graph.
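One caveat before reading too much into this first plot: inner_join() keeps only rows whose country_code appears in both tables, so any country missing from either source drops out silently. Below is a minimal base R sketch of how one might list the unmatched codes. The two vectors are made-up stand-ins, not the real data.

```r
# Hypothetical country codes standing in for the two real tables.
ratio_codes <- c("USA", "CHN", "IND", "XKX")   # codes in the student ratio data
wdi_codes   <- c("USA", "CHN", "IND", "BRA")   # codes in the WDI data

# Codes that an inner join would drop from each side.
only_in_ratio <- setdiff(ratio_codes, wdi_codes)
only_in_wdi   <- setdiff(wdi_codes, ratio_codes)

# Codes that survive the join.
matched <- intersect(ratio_codes, wdi_codes)

print(only_in_ratio)
print(only_in_wdi)
print(matched)
```

With the real tables, dplyr::anti_join(student_ratio_raw, WDIdata, by = "country_code") should give the same information as rows rather than codes.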

Step 2: Apply a log transformation to the two right-skewed variables. The negative relationship becomes much clearer.

student_ratio_elementary_2015 %>%
  ggplot(aes(GDP_per_capita, student_ratio)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
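Why do the log scales help? If the ratio falls roughly as a power of GDP per capita, the points bend on the original scale but line up once both axes are logged. The sketch below uses an idealised, noise-free power law to show the effect. The constants 200 and -0.25 are assumptions chosen for illustration, not estimates from the data.

```r
# Synthetic GDP values spread evenly on a log scale (illustrative only).
gdp <- 10^seq(log10(300), log10(100000), length.out = 50)

# Idealised power law: ratio = 200 * gdp^(-0.25). Constants are made up.
ratio <- 200 * gdp^(-0.25)

# Pearson correlation on the raw scale vs. the log-log scale.
cor_raw <- cor(gdp, ratio)
cor_log <- cor(log10(gdp), log10(ratio))

# The log-log relationship is exactly linear, so cor_log is -1
# (up to floating point error); cor_raw is weaker because of the curvature.
print(c(raw = cor_raw, log = cor_log))
```

Real data are noisy, so the logged correlation will not reach -1, but the same straightening is what makes the pattern easier to see.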

Step 3: Label the points with country names, and add a title and axis labels so that the reader can easily understand what the graph depicts.

student_ratio_elementary_2015 %>%
  ggplot(aes(GDP_per_capita, student_ratio)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  geom_text(aes(label=country), vjust=1,hjust=1,check_overlap=TRUE ) +
  labs(x="GDP per capita",
       y="student/teacher ratio in primary education",
       title = "GDP per capita and student/teacher ratio are negatively correlated.",
       subtitle = "in 2015") 

Step 4: Keep the 100 most populous countries, color the points by region, and size them by total population. We can now see that Sub-Saharan African countries have lower GDP per capita and higher student/teacher ratios.

student_ratio_elementary_2015 %>%
  arrange(desc(total_population)) %>%
  top_n(100,total_population) %>%
  ggplot(aes(GDP_per_capita, student_ratio, color=region)) +
  geom_point(aes(size=total_population)) +
  scale_y_log10() +
  scale_x_log10() +
  geom_text(aes(label=country), vjust=1,hjust=1,check_overlap=TRUE ) +
  labs(x="GDP per capita",
       y="student/teacher ratio in primary education",
       title = "GDP per capita and student/teacher ratio are negatively correlated.",
       subtitle = "in 2015") 

Step 5: Format the legend labels with commas for easier reading. Widen the range of dot sizes so that countries with larger populations stand out more.

student_ratio_elementary_2015 %>%
  arrange(desc(total_population)) %>%
  top_n(100,total_population) %>%
  ggplot(aes(GDP_per_capita, student_ratio, color=region)) +
  geom_point(aes(size=total_population)) +
  scale_y_log10() +
  scale_x_log10() +
  scale_size_continuous(labels = scales::comma, range = c(.25,12)) +
  geom_text(aes(label=country), vjust=1,hjust=1,check_overlap=TRUE ) +
  labs(x="GDP per capita",
       y="student/teacher ratio in primary education",
       title = "GDP per capita and student/teacher ratio are negatively correlated.",
       subtitle = "in 2015")  
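For reference, the comma style simply adds thousands separators and avoids scientific notation. scales::comma does this inside the legend. The base R sketch below only mimics the resulting text for a few made-up population values.

```r
# Made-up population values, similar in size to legend breaks.
pop_breaks <- c(250000000, 500000000, 1000000000)

# Base R equivalent of the comma style: thousands separators, no 1e+09.
comma_labels <- format(pop_breaks, big.mark = ",", scientific = FALSE, trim = TRUE)

print(comma_labels)
```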

In summary, a great graph is well labeled and packs in a lot of information. The most important criterion is that a great graph does not make the reader work to figure out what it is trying to convey.
