Draw a Mean Line in Box Plot in R

boxplot() in R

A box plot in R helps to visualize the distribution of the data by quartile and to detect the presence of outliers. You can use the geometric object geom_boxplot() from the ggplot2 library to draw a box plot in R.

We will use the airquality dataset to introduce box plots in R with ggplot2. This dataset measures the air quality of New York from May to September 1973. The dataset contains 153 observations. We will use the following variables:

  • Ozone: Numerical variable
  • Wind: Numerical variable
  • Month: May to September. Numerical variable

In this tutorial, you will learn

  • Create Box Plot
  • Box Plot with dots
  • Control aesthetic of the Box Plot
  • Box Plot with Jittered dots
  • Notched box plot

Create Box Plot

Before you start to create your first box plot in R, you need to manipulate the data as follows:

  • Step 1: Import the data
  • Step 2: Drop unnecessary variables
  • Step 3: Convert Month to a factor
  • Step 4: Create a new categorical variable dividing each month into three levels: begin, middle and end.
  • Step 5: Remove missing observations

All these steps are done with dplyr and the pipe operator %>%.

library(dplyr)
library(ggplot2)

# Step 1: import the data
data_air <- airquality %>%
  # Step 2: drop unnecessary variables
  select(-c(Solar.R, Temp)) %>%
  # Step 3: convert Month to an ordered factor
  mutate(Month = factor(Month, ordered = TRUE,
                        labels = c("May", "June", "July", "August", "September")),
         # Step 4: split each month into three periods
         day_cat = factor(ifelse(Day < 10, "Begin",
                                 ifelse(Day < 20, "Middle", "End"))))

A good practice is to check the structure of the data with the function glimpse().

glimpse(data_air)

Output:

## Observations: 153
## Variables: 5
## $ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, ...
## $ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1,
## $ Month   <ord> May, May, May, May, May, May, May, May, May, May, May,...
## $ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ day_cat <fctr> Begin, Begin, Begin, Begin, Begin, Begin, Begin, Begi...

There are NAs in the dataset. Removing them is wise.

# Step 5: remove missing observations
data_air_nona <- data_air %>%
  na.omit()
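Before dropping rows, it can help to see how many observations are affected. The following is a minimal sketch in base R (so it runs without dplyr); the object names air, na_per_column, rows_before and rows_after are illustrative and not part of the tutorial's code.

```r
# Keep the same four columns the tutorial keeps (base R, no dplyr needed)
air <- airquality[, c("Ozone", "Wind", "Month", "Day")]

# Number of missing values in each column
na_per_column <- colSums(is.na(air))
print(na_per_column)

# Row counts before and after dropping incomplete observations
rows_before <- nrow(air)
rows_after  <- nrow(na.omit(air))
cat("Rows dropped:", rows_before - rows_after, "\n")
```

In this subset only Ozone contains missing values, so na.omit() removes exactly the rows where Ozone is NA.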

Basic box plot

Let's plot the basic box plot with the distribution of ozone by month.

# Store the graph
box_plot <- ggplot(data_air_nona, aes(x = Month, y = Ozone))
# Add the geometric object box plot
box_plot +
    geom_boxplot()

Code Explanation

  • Store the graph for further use
    • box_plot: You store the graph in the variable box_plot. It is helpful for further use and avoids overly complex lines of code
  • Add the geometric object geom_boxplot()
    • You pass the dataset data_air_nona to ggplot().
    • Inside the aes() argument, you add the x-axis and y-axis.
    • The + sign means you want R to keep reading the code. It makes the code more readable by breaking it across lines.
    • Use geom_boxplot() to create a box plot
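To relate the picture to numbers, the quartiles that each box summarises can be computed directly. This is a rough sketch in base R; the names air and ozone_summary are illustrative, and the whiskers that geom_boxplot() draws may stop short of the minimum and maximum shown here when a month contains outliers.

```r
# Ozone and Month only, with incomplete rows removed
air <- na.omit(airquality[, c("Ozone", "Month")])
air$Month <- factor(air$Month,
                    labels = c("May", "June", "July", "August", "September"))

# Minimum, quartiles and maximum of Ozone for each month
ozone_summary <- t(sapply(split(air$Ozone, air$Month), quantile,
                          probs = c(0, 0.25, 0.5, 0.75, 1)))
print(round(ozone_summary, 1))
```

Each row corresponds to one box: the 25% and 75% columns are the edges of the box and the 50% column is the line inside it.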



Modify side of the graph

You can flip the side of the graph.

box_plot +
  geom_boxplot() +
  coord_flip()

Code Explanation

  • box_plot: You use the graph you stored. It avoids rewriting all the code each time you add new information to the graph.
  • geom_boxplot(): Create the box plot
  • coord_flip(): Flip the side of the graph



Modify colour of outlier

You can change the color, shape and size of the outliers.

box_plot +
    geom_boxplot(outlier.colour = "red",
        outlier.shape = 2,
        outlier.size = 3) +
    theme_classic()

Code Explanation

  • outlier.colour = "red": Control the color of the outliers
  • outlier.shape = 2: Change the shape of the outliers. 2 refers to a triangle
  • outlier.size = 3: Change the size of the triangle. The size is proportional to the number.
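It may help to see which observations end up drawn as outliers. The sketch below assumes the usual rule documented for geom_boxplot(), where points lying more than 1.5 times the interquartile range beyond the edges of the box are plotted individually. The helper find_outliers() is a hypothetical name introduced for illustration, not a ggplot2 function.

```r
# Ozone and Month only, with incomplete rows removed
air <- na.omit(airquality[, c("Ozone", "Month")])

# Hypothetical helper: values lying beyond 1.5 * IQR from the quartiles
find_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  x[x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr]
}

# One vector of outlying Ozone values per month (months are coded 5 to 9)
outliers_by_month <- lapply(split(air$Ozone, air$Month), find_outliers)
print(outliers_by_month)
```

An empty vector for a month means no point falls outside the whiskers for that month.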



Add a summary statistic

You can add a summary statistic to the box plot.

box_plot +
    geom_boxplot() +
    # fun.y is deprecated since ggplot2 3.3.0; newer versions use fun = mean
    stat_summary(fun.y = mean,
        geom = "point",
        size = 3,
        colour = "steelblue") +
    theme_classic()

Code Explanation

  • stat_summary() allows adding a summary statistic to the box plot
  • The argument fun.y controls the statistic returned. You will use mean. Since ggplot2 3.3.0, fun.y is deprecated and the argument is named fun
  • Note: Other statistics are available, such as min and max. More than one statistic can be shown in the same graph
  • geom = "point": Plot the average with a point
  • size = 3: Size of the point
  • colour = "steelblue": Color of the points
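To draw the mean as a line rather than a point, one possible approach is to reuse stat_summary() with an error bar whose lower and upper ends are both set to the mean, so that it collapses into a flat line. This is a sketch under stated assumptions, not the only way to do it: it assumes ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments), and the names air and mean_line_plot are illustrative.

```r
library(ggplot2)

# Same preparation idea as the tutorial, reduced to the two columns needed here
air <- na.omit(airquality[, c("Ozone", "Month")])
air$Month <- factor(air$Month,
                    labels = c("May", "June", "July", "August", "September"))

mean_line_plot <- ggplot(air, aes(x = Month, y = Ozone)) +
  geom_boxplot() +
  # Setting fun, fun.min and fun.max to mean collapses the error bar
  # into a single horizontal line at the mean of each month
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar",
               width = 0.75,
               linetype = "dashed",
               colour = "steelblue") +
  theme_classic()

print(mean_line_plot)
```

The dashed line makes it easy to compare the mean with the median (the solid line inside the box); a mean well above the median suggests a right-skewed month.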



Box Plot with Dots

In the next box plot, you add a dot plot layer. Each dot represents an observation.

box_plot +
    geom_boxplot() +
    geom_dotplot(binaxis = 'y',
        dotsize = 1,
        stackdir = 'center') +
    theme_classic()

Code Explanation

  • geom_dotplot() adds one dot per observation, stacked within bins of a given bin width
  • binaxis = 'y': Bin the data along the y-axis. By default, the x-axis is used
  • dotsize = 1: Size of the dots
  • stackdir = 'center': Direction in which to stack the dots. Four values are possible:
    • "up" (default),
    • "down"
    • "center"
    • "centerwhole"



Control Aesthetic of the Box Plot

Change the color of the box

You can change the colors of the groups.

ggplot(data_air_nona, aes(x = Month, y = Ozone, color = Month)) +
    geom_boxplot() +
    theme_classic()

Code Explanation

  • The colors of the groups are controlled in the aes() mapping. You can use color = Month to change the color of the box and whiskers according to the month



Box plot with multiple groups

It is also possible to add multiple groups. You can visualize the difference in the air quality according to the day of the measurement.

ggplot(data_air_nona, aes(Month, Ozone)) +
    geom_boxplot(aes(fill = day_cat)) +
    theme_classic()

Code Explanation

  • The aes() mapping of the geometric object controls the groups to display (this variable has to be a factor)
  • aes(fill = day_cat) creates three boxes for each month on the x-axis
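Splitting each month into three groups leaves few observations per box, so it is worth checking the group sizes before reading too much into the plot. The sketch below uses base R with illustrative names (air, group_sizes). It also sets the factor levels explicitly, which is an addition to the tutorial's code: without the levels argument, factor() sorts the levels alphabetically (Begin, End, Middle), and that is the order in which the boxes and legend entries appear.

```r
air <- na.omit(airquality[, c("Ozone", "Month", "Day")])
air$Month <- factor(air$Month,
                    labels = c("May", "June", "July", "August", "September"))

# Same cut points as the tutorial's day_cat variable, with an explicit level order
air$day_cat <- factor(ifelse(air$Day < 10, "Begin",
                             ifelse(air$Day < 20, "Middle", "End")),
                      levels = c("Begin", "Middle", "End"))

# Number of observations behind each box (rows: months, columns: periods)
group_sizes <- table(air$Month, air$day_cat)
print(group_sizes)
```

Boxes built from only a handful of observations should be interpreted with caution.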



Box Plot with Jittered Dots

Another way to show the dots is with jittered points. It is a convenient way to display individual points on a box plot when the x variable is categorical.

This method avoids the overlapping of the discrete data.

box_plot +
    geom_boxplot() +
    geom_jitter(shape = 15,
        color = "steelblue",
        position = position_jitter(width = 0.21)) +
    theme_classic()

Code Explanation

  • geom_jitter() adds a small random offset to each point.
  • shape = 15 changes the shape of the points. 15 represents filled squares
  • color = "steelblue": Change the color of the points
  • position = position_jitter(width = 0.21): Controls how overlapping points are spread out. width = 0.21 shifts each point horizontally by a random amount of up to 0.21, roughly 20 percent of the distance between two categories, in either direction. By default, the jitter is 40 percent of the resolution of the data.
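Because jitter is random, the plot changes slightly each time it is drawn, and when height is not specified the points are also moved vertically, which alters the displayed Ozone values a little. A possible variant, sketched below, fixes both: it passes height = 0 and a seed to position_jitter(), and hides the box plot's own outlier markers so that outliers are not drawn twice. The seed value 42 and the names air and jitter_plot are arbitrary choices for illustration.

```r
library(ggplot2)

air <- na.omit(airquality[, c("Ozone", "Month")])
air$Month <- factor(air$Month,
                    labels = c("May", "June", "July", "August", "September"))

jitter_plot <- ggplot(air, aes(x = Month, y = Ozone)) +
  # outlier.shape = NA hides the outlier markers; the jittered points show them already
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(shape = 15,
              colour = "steelblue",
              # height = 0 keeps the true Ozone values; seed makes the jitter repeatable
              position = position_jitter(width = 0.21, height = 0, seed = 42)) +
  theme_classic()

print(jitter_plot)
```

With a fixed seed, redrawing the plot places every point in the same position, which is useful for reports that need to be reproduced exactly.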



You can see the difference between the first graph, drawn with the jitter method, and the second, drawn with the point method.

box_plot +
    geom_boxplot() +
    geom_point(shape = 5,
        color = "steelblue") +
    theme_classic()


Notched Box Plot

An interesting feature of geom_boxplot() is the notched box plot. The notch narrows the box around the median. The main purpose of a notched box plot is to compare the medians between groups. There is strong evidence that two groups have different medians when their notches do not overlap. According to the ggplot2 documentation, a notch is computed as follows:

$$\text{median} \pm 1.58 \times \frac{IQR}{\sqrt{n}}$$

where IQR is the interquartile range and n is the number of observations.

box_plot +
    geom_boxplot(notch = TRUE) +
    theme_classic()

Code Explanation

  • geom_boxplot(notch = TRUE): Create a notched box plot
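The notch limits can also be computed by hand to check which months differ. The sketch below assumes the rule given in the ggplot2 documentation, where notches extend 1.58 * IQR / sqrt(n) on either side of the median. The helper notch_limits() is a hypothetical name introduced for illustration.

```r
air <- na.omit(airquality[, c("Ozone", "Month")])
air$Month <- factor(air$Month,
                    labels = c("May", "June", "July", "August", "September"))

# Hypothetical helper: notch limits as median +/- 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width,
    median = median(x),
    upper = median(x) + half_width)
}

# One row per month: lower notch limit, median, upper notch limit
notches <- t(sapply(split(air$Ozone, air$Month), notch_limits))
print(round(notches, 1))
```

Two months whose intervals between lower and upper do not overlap are likely to have different medians. For months with few observations the notch becomes wide and can extend past the edges of the box.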




We can summarize the different types of box plot in the table below:

Objective | Code
Basic box plot | ggplot(df, aes(x = x1, y = y)) + geom_boxplot()
Flip the side | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() + coord_flip()
Notched box plot | ggplot(df, aes(x = x1, y = y)) + geom_boxplot(notch = TRUE)
Box plot with jittered dots | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() + geom_jitter(position = position_jitter(width = 0.21))

