Gapminder Country Comparison

The gapminder dataset from the R gapminder package contains life expectancy, population, and GDP per capita for 142 countries from 1952 to 2007. The aim of today's project is to look at how average life expectancy across the 5 continents has changed over those 55 years.

First, load the packages needed.

library(tidyverse)  # Load ggplot2, dplyr, and all the other tidyverse packages
library(gapminder)  # gapminder dataset
library(here)
library(janitor)
library(skimr)

Then we look at the completeness of the data and at its first 20 rows.

skim(gapminder)
Table 1: Data summary

Name               gapminder
Number of rows     1704
Number of columns  6
_______________________
Column type frequency:
  factor           2
  numeric          4
________________________
Group variables    None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
country                0              1  FALSE         142  Afg: 12, Alb: 12, Alg: 12, Ang: 12
continent              0              1  FALSE           5  Afr: 624, Asi: 396, Eur: 360, Ame: 300

Variable type: numeric

skim_variable  n_missing  complete_rate         mean            sd        p0         p25         p50          p75          p100  hist
year                   0              1      1979.50         17.27   1952.00     1965.75     1979.50      1993.25        2007.0  ▇▅▅▅▇
lifeExp                0              1        59.47         12.92     23.60       48.20       60.71        70.85          82.6  ▁▆▇▇▇
pop                    0              1  29601212.32  106157896.74  60011.00  2793664.00  7023595.50  19585221.75  1318683096.0  ▇▁▁▁▁
gdpPercap              0              1      7215.33       9857.45    241.17     1202.06     3531.85      9325.46      113523.1  ▇▁▁▁▁

head(gapminder, 20)
## # A tibble: 20 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## 11 Afghanistan Asia       2002    42.1 25268405      727.
## 12 Afghanistan Asia       2007    43.8 31889923      975.
## 13 Albania     Europe     1952    55.2  1282697     1601.
## 14 Albania     Europe     1957    59.3  1476505     1942.
## 15 Albania     Europe     1962    64.8  1728137     2313.
## 16 Albania     Europe     1967    66.2  1984060     2760.
## 17 Albania     Europe     1972    67.7  2263554     3313.
## 18 Albania     Europe     1977    68.9  2509048     3533.
## 19 Albania     Europe     1982    70.4  2780097     3631.
## 20 Albania     Europe     1987    72    3075321     3739.
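
The skim() summary reports no missing values. As a complementary check, the following minimal sketch counts missing values per column and confirms that every country has the same number of yearly observations. It assumes only the dplyr and gapminder packages; the names missing_per_column and obs_per_country are illustrative and not part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Count missing values in each column (all zeros means nothing is missing)
missing_per_column <- colSums(is.na(gapminder))
missing_per_column

# Count observations per country; in a balanced panel this is the same for every country
obs_per_country <- gapminder %>%
  count(country, name = "n_years")

# Distinct values of n_years; a single value means all countries cover the same number of years
unique(obs_per_country$n_years)
```

If the second result contains a single value, the panel is balanced, so per-continent averages are not distorted by countries with missing years.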

We begin the analysis by producing two graphs on how life expectancy has changed over the years for the country and the continent I come from.

# filter the gapminder dataset for a specific country and continent, assigning each result to its own dataset
country_data <- gapminder %>%
  filter(country == "New Zealand")

continent_data <- gapminder %>%
  filter(continent == "Oceania")

A plot of life expectancy over time for the single country I come from is created by mapping year to the x-axis and lifeExp to the y-axis.

# create the plot
plot1 <- ggplot(data = country_data, mapping = aes(x = year, y = lifeExp)) +
  # add the data points
  geom_point() +
  # add a smoothed trend curve (loess by default, see the message printed with the plot)
  geom_smooth(se = FALSE) +
  NULL

 plot1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
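
As the message shows, geom_smooth() falls back to a loess curve here rather than a straight line. If a straight least-squares line of best fit is wanted instead, one option is to pass method = "lm". The following is a minimal sketch under that assumption; the names nz_data, plot_lm, and nz_fit are illustrative and not part of the original analysis.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Same filter as country_data, repeated so the sketch runs on its own
nz_data <- gapminder %>%
  filter(country == "New Zealand")

plot_lm <- ggplot(nz_data, aes(x = year, y = lifeExp)) +
  geom_point() +
  # method = "lm" fits a straight least-squares line instead of the default loess curve
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

plot_lm

# The same straight-line fit computed directly, to read off the slope
nz_fit <- lm(lifeExp ~ year, data = nz_data)
coef(nz_fit)["year"]  # average change in life expectancy (in years) per calendar year
```

The loess curve follows local bends in the data, while the linear fit summarises the whole period with a single slope, so which one to use depends on whether the shape or the overall rate matters more.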

Next, I add a title with the labs() function and blank out the axis labels.

# add an informative title to plot1 and blank out the axis labels
 plot1 <- plot1 +
   labs(title = "New Zealand Change in Life Expectancy 1952-2007",
       x = " ",
       y = " ") +
   NULL

 plot1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Similarly, a plot for all countries in the continent I come from is produced. The country variable is mapped to both the colour and group aesthetics, so that each country is drawn in its own colour and gets its own smoothed curve.

# create the plot
 ggplot(continent_data, mapping = aes(x = year, y = lifeExp, colour = country, group = country)) +
   geom_point() +
   geom_smooth(se = FALSE) +
   labs(title = "Oceania Change in Life Expectancy 1952-2007") +
   NULL
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Finally, using the original gapminder data, I produce a graph of life expectancy over time, faceted by continent. All legends are removed by adding theme(legend.position="none") at the end of the ggplot call.

# create the plot
 ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, color = continent)) +
   geom_point() + 
   geom_smooth(se = FALSE) +
   facet_wrap(~continent) +
   theme(legend.position="none") + # remove all legends
   labs(title = "World Change in Life Expectancy 1952-2007 Faceted by Continents") +
   NULL
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

With the graphs plotted, we can look at them more carefully. Several observations stand out, especially about the differences among continents:

  1. General Trend: The first and most general observation is that life expectancy has been increasing on every continent since 1952, likely because technological development has improved quality of life. In addition, on every continent apart from Oceania, whose sample of only two countries is too small to judge, the rate of increase in life expectancy is slowing down. This may, to a degree, reflect slowing progress in life sciences and related technologies.

  2. By Continent: At the continent level, Oceania has the highest life expectancy, followed closely by the Americas and Europe, whereas Asia and Africa lie further behind. This difference reflects, to a degree, the differences in wealth and average living standards among continents. Interesting patterns can also be seen in how life expectancy is distributed across the countries within each continent. Oceania has only two countries, and their life expectancies are rather similar. In Europe, most countries have long and similar life expectancies, apart from one outlier that was extraordinarily low from the 1950s until around 1990 but has caught up since then. This suggests that most countries in Europe are quite well developed, perhaps apart from one that only caught up after 1990. In contrast, Africa, the Americas, and Asia show a much wider spread in life expectancy by country, suggesting that levels of wealth and development vary more within these continents.
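
The observations above are read off the plots by eye. A numeric summary can back them up. The following minimal sketch computes the mean life expectancy per continent and year, the gain between successive survey years, and the spread across countries in 2007. It assumes an unweighted mean, where every country counts equally regardless of population, and the names continent_means, continent_gains, and spread_2007 are illustrative rather than part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Mean life expectancy per continent and year (unweighted: every country counts equally)
continent_means <- gapminder %>%
  group_by(continent, year) %>%
  summarise(mean_lifeExp = mean(lifeExp), .groups = "drop")

# Gain in the continent mean over each five-year step; shrinking gains point to a slowdown
continent_gains <- continent_means %>%
  group_by(continent) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(gain = mean_lifeExp - lag(mean_lifeExp)) %>%
  ungroup()

# Spread of life expectancy across the countries of each continent in 2007
spread_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    n_countries = n(),
    min_lifeExp = min(lifeExp),
    max_lifeExp = max(lifeExp),
    sd_lifeExp  = sd(lifeExp),
    .groups = "drop"
  )

continent_gains
spread_2007
```

The gain column is NA for 1952 because there is no earlier survey year to compare with. A population-weighted mean, for example weighted.mean(lifeExp, pop), would be a reasonable alternative if the question is about the average person rather than the average country.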