Packages used in this chapter
The following commands will install these packages if they are not already installed:
if(!require(dplyr)){install.packages("dplyr")}
if(!require(psych)){install.packages("psych")}
A cookbook approach
The examples in this book follow a “cookbook” approach as much as possible. The reader should be able to modify the examples with her own data and change the options and variable names as needed. This is more obvious with some examples than others, depending on the complexity of the code.
Color coding in this book
The text in blue in this book is R code that can be copied, pasted, and run in R. The text in red is the expected result and should not be run. In most cases I have truncated the results and included only the most relevant parts. Comments are in green. It is fine to run comments, but they have no effect on the results.
Copying and pasting code
From the website
Copying the R code pieces from the website version of this book should work flawlessly. Code can be copied from the webpages and pasted into the R console, the RStudio console, the RStudio editor, or a plain text file. All line breaks and formatting spaces should be preserved.
The only issue you may encounter is that if you paste code into the RStudio editor, leading spaces may be added to some lines. This is not usually a problem, but a way to avoid this is to paste the code into a plain text editor, save that file as a .R file, and open it from RStudio.
From the pdf
Copying the R code from the pdf version of this book may work less perfectly. Formatting spaces and even line breaks may be lost. Different pdf readers may behave differently.
It may help to paste the copied code into a plain text editor to clean it up before pasting into R or saving it as a .R file. Also, if your pdf reader has a select tool that allows you to select text in a rectangle, that works better in some readers.
A sample program
The following is an example of code for R that creates a vector called x and a vector called y, performs a correlation test between x and y, and then plots y vs. x.
This code can be copied and pasted into the console area of R or RStudio, or into the editor area of RStudio, and run. You should get the output from the correlation test and the graphical output of the plot.
x = c(1,2,3,4,5,6,7,8,9) # create a vector of values and call it x
y = c(9,7,8,6,7,5,4,3,1)
cor.test(x,y) # perform correlation test
plot(x,y) # plot y vs. x
You can run fairly large chunks of code with R, though it is probably better to run smaller pieces, examining the output before proceeding to the next piece.
This kind of code can be saved as a file in the editor section of RStudio, or can be stored separately as a plain text file. By convention, files for R code are saved as .R files. These files can be opened and edited with either a plain text editor or the RStudio editor.
Assignment operators
In my examples I will use an equal sign, =, to assign a value to a variable.
height = 127.5
In examples you find elsewhere, you will more likely see a left arrow, <-, used as the assignment operator.
height <- 127.5
These are essentially equivalent, but I think the equal sign is more readable for a beginner.
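As a quick check of that equivalence, the following sketch assigns the same value with each operator and compares the two results. The names height1 and height2 are made up for this illustration.

```r
height1 = 127.5      # assignment with the equal sign
height2 <- 127.5     # assignment with the left arrow

identical(height1, height2)   # compares the two objects; returns TRUE
```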
Comments
Comments are indicated with a number sign, #. Comments are for human readers, and are not processed by R.
Installing and loading packages
Some of the packages used in this book do not come with R automatically but need to be installed as add-on packages. For example, if you wanted to use a function in the psych package to calculate the geometric mean of x in the sample program above:
x = c(1,2,3,4,5,6,7,8,9)
First you would need to install the package psych:
install.packages("psych")
Then load the package:
library(psych)
You may then use the functions included in the package:
geometric.mean(x)
[1] 4.147166
In future sessions, you will need only to load the package; it should still be in the library from the initial installation.
If you see an error like the following, you may have misspelled the name of the package, or the package has not been installed.
library(psych)
Error in library(psych) : there is no package called ‘psych’
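One way to avoid this error is to check whether the package is available before loading it. The following is a minimal sketch using the base R function requireNamespace; the variable PackageName and the wording of the message are assumptions for illustration only.

```r
### Check whether a package is installed before trying to load it
PackageName = "psych"     # hypothetical variable holding the package name

if (requireNamespace(PackageName, quietly = TRUE)) {
  # The package is installed, so load it
  library(PackageName, character.only = TRUE)
} else {
  # The package is not installed, so print a hint instead of an error
  message("Package ", PackageName, " is not installed. ",
          "Try install.packages(\"", PackageName, "\")")
}
```

This does the same job as the if(!require(...)) lines at the start of the chapter, except that it reports the problem rather than installing the package automatically.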
Data types
There are several data types in R. Most commonly, the functions we are using will ask for input data to be a vector, a matrix, or a data frame. Data types won’t be discussed extensively here, but the examples in this book will read the data as the appropriate data type for the selected analysis.
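As a brief illustration of these three data types, the sketch below builds a vector, a matrix, and a data frame from the same four values and checks what each one is. The object names V, M, and D are made up for this example.

```r
V = c(175, 176, 162, 165)                   # a numeric vector
M = matrix(V, nrow = 2)                     # a 2 x 2 matrix, filled by column
D = data.frame(Gender = c("male", "male", "female", "female"),
               Height = V)                  # a data frame with two columns

is.vector(V)       # TRUE
is.matrix(M)       # TRUE
is.data.frame(D)   # TRUE

str(D)             # shows the type of each column in the data frame
```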
Creating data frames from a text string of data
For certain analyses you will want to select a variable from within a data frame. In most examples using data frames, I’ll create the data frame from a text string that allows us to arrange the data in columns and rows, as we normally visualize data.
A data frame can be created with the read.table function. Note that the text for the table is enclosed in simple double quotes and parentheses. read.table is fairly tolerant of extra spaces or blank lines. However, when a data frame is converted to a matrix with as.matrix, as we will do later, I’ve had errors from trailing spaces at the ends of lines.
Values in the table that will have spaces or special characters can be enclosed in simple single quotes (e.g. 'Spongebob & Patrick').
D1 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Gender Height
male 175
male 176
female 162
female 165
")
D1
Gender Height
1 male 175
2 male 176
3 female 162
4 female 165
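To illustrate values that contain spaces or special characters, the following sketch reads a table in which one value is enclosed in single quotes. The data frame name D5, the column names, and the Height values are invented for this example.

```r
### Values with spaces or special characters go in single quotes
D5 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Name                    Height
'Spongebob & Patrick'   10
Squidward               15
")

D5

levels(D5$Name)    # the quoted value is read as a single level
```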
Reading data from a file
R can also read data from a separate file. For longer data sets or complex analyses, it is helpful to keep data files and R code files separate. For example,
D2 = read.table("GenderHeight.dat", header=TRUE, stringsAsFactors=TRUE)
would read in data from a file called GenderHeight.dat found in the working directory. In this case the file could be a space-delimited text file:
Gender Height
male 175
male 176
female 162
female 165
Or, with read.csv,
D2 = read.csv("GenderHeight.csv", header=TRUE, stringsAsFactors=TRUE)
for a comma-separated file.
Gender,Height
male,175
male,176
female,162
female,165
D2
Gender Height
1 male 175
2 male 176
3 female 162
4 female 165
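If you don’t have a data file at hand, one way to try this out is to write a small data frame to a file and read it back. The sketch below does that with a temporary file; the names Original, TempFile, and D6 are assumptions for illustration, and the temporary location is chosen by R.

```r
### Write a small data frame to a temporary .csv file, then read it back
Original = data.frame(Gender = c("male", "male", "female", "female"),
                      Height = c(175, 176, 162, 165))

TempFile = tempfile(fileext = ".csv")     # hypothetical file location

write.csv(Original, TempFile, row.names = FALSE)

D6 = read.csv(TempFile, header=TRUE, stringsAsFactors=TRUE)

D6
```

Setting row.names = FALSE keeps write.csv from adding a column of row numbers to the file.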
RStudio also has an easy interface in the Tools menu to import data from a file.
The getwd function will show the location of the working directory, and setwd can be used to set the working directory.
getwd()
[1] "C:/Users/Salvatore/Documents"
setwd("C:/Users/Salvatore/Desktop")
Alternatively, file paths or URLs can be designated directly in the read.table or read.csv function.
D3 = read.csv("https://rcompanion.org/documents/GenderHeight.csv",
header=TRUE, stringsAsFactors=TRUE)
D3
Gender Height
1 male 175
2 male 176
3 female 162
4 female 165
Variables within data frames
To look at just the variable Gender in the data frame D1 created above:
D1$Gender
[1] male male female female
Levels: female male
Note that D1$Height is a vector of numbers.
D1$Height
[1] 175 176 162 165
So if you wanted the mean for this variable:
mean(D1$Height)
[1] 169.5
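The same variable can also be selected with bracket notation, which you may see in examples elsewhere. The following sketch recreates D1 and shows that the bracket forms return the same vector as the $ syntax.

```r
D1 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Gender  Height
male    175
male    176
female  162
female  165
")

D1[["Height"]]           # same vector as D1$Height
D1[ , "Height"]          # also the same vector

mean(D1[["Height"]])     # same mean as mean(D1$Height)
```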
Using dplyr to create new variables in data frames
The standard method to define new variables in data frames is to use the data.frame$variable syntax. So if we wanted to add a variable to the D1 data frame above which would double Height:
D1$ Double = D1$ Height * 2 # Spaces are optional
D1
Gender Height Double
1 male 175 350
2 male 176 352
3 female 162 324
4 female 165 330
Another method is to use the mutate function in the dplyr package:
library(dplyr)
D1 =
mutate(D1,
Triple = Height*3,
Quadruple = Height*4)
D1
Gender Height Double Triple Quadruple
1 male 175 350 525 700
2 male 176 352 528 704
3 female 162 324 486 648
4 female 165 330 495 660
The dplyr package also has functions to select only certain columns in a data frame (select function) or to filter a data frame by the value of some variable (filter function). It can be helpful for manipulating data frames.
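As a short sketch of these two functions, the code below recreates D1, keeps only the Height column with select, and keeps only the rows with Height greater than 170 with filter. The names HeightOnly and Tall and the cutoff of 170 are chosen only for illustration. The dplyr:: prefix is used so that these functions aren’t confused with functions of the same name in other packages.

```r
library(dplyr)

D1 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Gender  Height
male    175
male    176
female  162
female  165
")

### Keep only the Height column
HeightOnly = dplyr::select(D1, Height)

### Keep only the rows where Height is greater than 170
Tall = dplyr::filter(D1, Height > 170)

Tall
```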
In the examples in this book, I will use either the $ syntax or the mutate function in dplyr, depending on which I think makes the example more comprehensible.
Extracting elements from the output of a function
Sometimes it is useful to extract certain elements from the output of an analysis. For example, we can assign the output from a binomial test to a variable we’ll call Test.
Test = binom.test(7, 12, 3/4,
alternative="less",
conf.level=0.95)
To see the value of Test:
Test
Exact binomial test
number of successes = 7, number of trials = 12, p-value = 0.1576
95 percent confidence interval:
0.0000000 0.8189752
To see what elements are included in Test:
names(Test)
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"    "null.value"  "alternative"
[8] "method"      "data.name"
Or with more details:
str(Test)
To view the p-value from Test:
Test$ p.value
[1] 0.1576437
To view the confidence interval from Test:
Test$ conf.int
[1] 0.0000000 0.8189752
attr(,"conf.level")
[1] 0.95
To view the upper confidence limit from Test:
Test$ conf.int[2]
[1] 0.8189752
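Extracted elements can be stored in their own variables and reused, for example to report results in a sentence. The following is a minimal sketch; the names P and Upper and the rounding to four significant digits are choices made for this illustration.

```r
Test = binom.test(7, 12, 3/4,
                  alternative="less",
                  conf.level=0.95)

P     = Test$p.value         # the p-value
Upper = Test$conf.int[2]     # the upper confidence limit

Test$estimate                # the observed proportion of successes, 7/12

### Report the extracted values, rounded to four significant digits
cat("p-value =", signif(P, 4), "\n")
cat("Upper confidence limit =", signif(Upper, 4), "\n")
```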
Exporting graphics
R has the ability to produce a variety of plots. Simple plots can be produced with just a few lines of code. These are useful to get a quick visualization of your data or to check on the distribution of residuals from an analysis. More in-depth coding can produce publication-quality plots.
Exporting plots from the RStudio window
In the RStudio Plots window, there is an Export icon which can be used to save the plot as an image or pdf file. A method I use is to export the plot as pdf and then open this pdf with either Adobe Photoshop or the free alternative, GIMP (www.gimp.org/). These programs allow you to import the pdf at whatever resolution you need, and then crop out extra white space.
The appearance of exported plots will change depending on the size and scale of the exported file. If there are elements missing from a plot, it may be because the size is not ideal. Changing the export size is also an easy way to adjust the size of the text of a plot relative to the other elements.
An additional trick in RStudio is to change the size of the plot window after the plot is produced, but before it is exported. Sometimes this can get rid of problems where, for example, words in a plot legend are cut off.
Finally, if you export a plot as a pdf, but still need to edit it further, you can open it in Inkscape, ungroup the plot elements, adjust some plot elements, and then export as a high-resolution bitmap image. Just be sure you don’t change anything important, like how the data line up with the axes.
Exporting plots directly as a file
R also allows for the direct exporting of graphics as a .bmp, .jpg, .png, or .tif file. See ?png for details. This method allows you to specify the dimensions and resolution of the output image.
Note that dev.off() is used afterwards to redirect future output to its usual channel.
### Optional code to set the directory where the image will be saved
setwd("C:/Users/Salvatore/Desktop")
### Create data frame
D4 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
TolkienRace AvgHeight
Dwarf 130
Hobbit 105
Man 165
Elf 170
Orc 125
")
### Output a plot as a .png file
png(filename = "TolkienPlot.png",
width = 5,
height = 3.75,
units = "in",
res = 300)
barplot(AvgHeight ~ TolkienRace, data=D4)
dev.off()
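A plot can be written directly to a pdf file in a similar way with the pdf function, which takes its width and height in inches. The sketch below saves the same bar plot to R’s temporary directory and then checks that the file was created; the variable PlotFile and the file location are assumptions for illustration.

```r
### Create data frame
D4 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
TolkienRace  AvgHeight
Dwarf        130
Hobbit       105
Man          165
Elf          170
Orc          125
")

### Hypothetical location for the output file
PlotFile = file.path(tempdir(), "TolkienPlot.pdf")

### Output a plot as a .pdf file
pdf(file = PlotFile,
    width = 5,        # width in inches
    height = 3.75)    # height in inches
barplot(AvgHeight ~ TolkienRace, data=D4)
dev.off()

### Confirm that the file was written
file.exists(PlotFile)
```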