A cookbook approach
The examples in this book follow a “cookbook” approach as much as possible. The reader should be able to modify the examples with her own data, and change the options and variable names as needed. This is more obvious with some examples than others, depending on the complexity of the code.
Color coding in this book
The text in blue in this book is R code that can be copied, pasted, and run in R. The text in red is the expected result, and should not be run. In most cases I have truncated the results and included only the most relevant parts. Comments are in green. It is fine to run comments, but they have no effect on the results.
Copying and pasting code
From the website
Copying the R code pieces from the website version of this book should work flawlessly. Code can be copied from the webpages and pasted into the R console, the R Studio console, the R Studio editor, or a plain text file. All line breaks and formatting spaces should be preserved.
The only issue you may encounter is that if you paste code into the R Studio editor, leading spaces may be added to some lines. This is not usually a problem, but a way to avoid this is to paste the code into a plain text editor, save that file as a .R file, and open it from R Studio.
From the pdf
Copying the R code from the pdf version of this book may be less reliable. Formatting spaces and even line breaks may be lost. Different pdf readers may behave differently.
It may help to paste the copied code into a plain text editor to clean it up before pasting it into R or saving it as a .R file. Also, if your pdf reader has a select tool that allows you to select text in a rectangle, that works better in some readers.
A sample program
The following is an example of code for R that creates a vector called x and a vector called y, performs a correlation test between x and y, and then plots y vs. x.
This code can be copied and pasted into the console area of R or R Studio, or into the editor area of R Studio, and run. You should get the output from the correlation test and the graphical output of the plot.
x = c(1,2,3,4,5,6,7,8,9) # create a vector of values and call it x
y = c(9,7,8,6,7,5,4,3,1)
cor.test(x,y) # perform correlation test
plot(x,y) # plot y vs. x
You can run fairly large chunks of code with R, though it is probably better to run smaller pieces, examining the output before proceeding to the next piece.
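As a minimal sketch of running code in smaller pieces, the sample program above could be run one step at a time. The result of the test is stored in a variable, here given the hypothetical name Result, so that it can be examined before moving on.

```r
x = c(1,2,3,4,5,6,7,8,9)
y = c(9,7,8,6,7,5,4,3,1)

length(x) == length(y)     # step 1: check that the vectors are the same length

Result = cor.test(x, y)    # step 2: run the test and store the output
                           #   (Result is a name chosen for this sketch)

Result$estimate            # step 3: look at just the correlation coefficient
```

If any step gives an unexpected result, it is easier to find the problem at that point than after running the whole program.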
This kind of code can be saved as a file in the editor section of R Studio, or can be stored separately as a plain text file. By convention, files for R code are saved as .R files. These files can be opened and edited with either a plain text editor or with the R Studio editor.
Assignment operators
In my examples I will use an equal sign, =, to assign a value to a variable.
height = 127.5
In examples you find elsewhere, you will more likely see a left arrow, <-, used as the assignment operator.
height <- 127.5
These are essentially equivalent, but I think the equal sign is more readable for a beginner.
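The following short sketch illustrates what "essentially equivalent" means here. At the top level the two operators give the same result. The one place they differ is inside a function call, where = names an argument while <- still performs an assignment. The variable names are made up for this illustration.

```r
height_a = 127.5
height_b <- 127.5
identical(height_a, height_b)    # TRUE: both assignments give the same value

m1 = mean(x = c(2, 4, 6))        # here x is only the name of mean's argument
m2 = mean(w <- c(2, 4, 6))       # here w is created as a variable, then passed to mean
exists("w")                      # TRUE: w now exists in the workspace
```

This difference rarely matters for the examples in this book, since assignments there are made on their own lines.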
Comments
Comments are indicated with a number sign, #. Comments are for human readers, and are not processed by R.
Installing and loading packages
Some of the packages used in this book do not come with R automatically, but need to be installed as add-on packages. For example, if you wanted to use a function in the psych package to calculate the geometric mean of x in the sample program above:
x = c(1,2,3,4,5,6,7,8,9)
First you would need to install the package psych:
install.packages("psych")
Then load the package:
library(psych)
You may then use the functions included in the package:
geometric.mean(x)
[1] 4.147166
In future sessions, you will need only to load the package; it should still be in the library from the initial installation.
If you see an error like the following, you may have misspelled the name of the package, or the package has not been installed.
library(psych)
Error in library(psych) : there is no package called ‘psych’
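One way to avoid this error, sketched here rather than prescribed, is to check whether the package is available before using it. The requireNamespace function returns TRUE or FALSE instead of stopping with an error. The fallback line uses the usual definition of the geometric mean for positive values, exp(mean(log(x))), and is included only so the sketch runs whether or not psych is installed.

```r
x = c(1,2,3,4,5,6,7,8,9)

if (requireNamespace("psych", quietly = TRUE)) {
  gm = psych::geometric.mean(x)     # use the package function if it is installed
} else {
  message("psych is not installed; try install.packages(\"psych\")")
  gm = exp(mean(log(x)))            # base R calculation, for positive values only
}

gm
```

Either branch should give the same value as the geometric.mean output shown above.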
Data types
There are several data types in R. Most commonly, the functions we are using will ask for input data to be a vector, a matrix, or a data frame. Data types won’t be discussed extensively here, but the examples in this book will read the data as the appropriate data type for the selected analysis.
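As a brief illustration of these three types, the following sketch creates one of each and checks what it is. The object names and the matrix values are made up for this example.

```r
v = c(175, 176, 162, 165)                    # a vector of numbers

m = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)    # a matrix with 2 rows and 3 columns

d = data.frame(Sex    = c("male", "male", "female", "female"),
               Height = v)                   # a data frame with two variables

is.vector(v)        # TRUE
is.matrix(m)        # TRUE
is.data.frame(d)    # TRUE

str(d)              # shows the type of each variable in the data frame
```

The str function is a convenient way to check how R has interpreted a data set before running an analysis.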
Creating data frames from a text string of data
For certain analyses you will want to select a variable from within a data frame. In most examples using data frames, I’ll create the data frame from a text string that allows us to arrange the data in columns and rows, as we normally visualize data.
Here, Input is just a text string that will be converted to a data frame with the read.table function. Note that the text for the table is enclosed in straight (not curly) double quotes and parentheses.
read.table is fairly tolerant of extra spaces or blank lines. However, when a data frame is converted to a matrix with as.matrix, which we will do later, I’ve had errors from trailing spaces at the ends of lines.
Values in the table that have spaces or special characters can be enclosed in straight single quotes (e.g. 'Spongebob & Patrick').
Input =("
Sex Height
male 175
male 176
female 162
female 165
")
D1 = read.table(textConnection(Input),header=TRUE)
D1
Sex Height
1 male 175
2 male 176
3 female 162
4 female 165
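As a sketch of the single-quote rule mentioned above, the following reads a small table in which the values contain spaces and an ampersand. The data frame name D3, the variable names, and the values are invented for illustration. The example also uses the text argument of read.table, which is an alternative to textConnection for reading from a string.

```r
Input = ("
Character              Height
'Spongebob & Patrick'  10
'Mr Krabs'             12
")

D3 = read.table(text = Input, header = TRUE)   # text= reads directly from the string

D3

nrow(D3)     # 2: each quoted value is read as a single entry
```

Without the single quotes, read.table would treat each word as a separate column and would report that the lines do not have the expected number of elements.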
Reading data from a file
R can also read data from a separate file. For longer data sets or complex analyses, it is helpful to keep data files and R code files separate. For example,
D2 = read.table("male-female.dat", header=TRUE)
would read in data from a file called male-female.dat found in the working directory. In this case the file could be a space-delimited text file:
Sex Height
male 175
male 176
female 162
female 165
Or
D2 = read.table("male-female.csv", header=TRUE, sep=",")
for a comma-separated file.
Sex,Height
male,175
male,176
female,162
female,165
D2
Sex Height
1 male 175
2 male 176
3 female 162
4 female 165
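To try this without preparing a file by hand, the following sketch first writes the comma-separated data to a temporary file and then reads it back. Writing to tempdir() is a choice made for this example only; in practice the file would already exist in your working directory. The read.csv function is a shortcut for read.table with settings suited to comma-separated files.

```r
csv_file = file.path(tempdir(), "male-female.csv")   # temporary location for this sketch

writeLines(c("Sex,Height",
             "male,175",
             "male,176",
             "female,162",
             "female,165"),
           csv_file)

D2 = read.table(csv_file, header = TRUE, sep = ",")

D2b = read.csv(csv_file)    # header = TRUE and sep = "," are the defaults here

D2
```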
R Studio also has an easy interface in the Tools menu to import data from a file.
The getwd function will show the location of the working directory, and setwd can be used to set the working directory.
getwd()
[1] "C:/Users/Salvatore/Documents"
setwd("C:/Users/Salvatore/Desktop")
Alternatively, file paths or URLs can be designated directly in the read.table function.
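A small sketch of designating the path directly follows. It builds the full path with file.path and checks that the file exists before trying to read it. The variable name data_file is an assumption for this example, and male-female.dat is the same hypothetical file as above.

```r
data_file = file.path(getwd(), "male-female.dat")   # full path to the data file

if (file.exists(data_file)) {
  D2 = read.table(data_file, header = TRUE)         # read only if the file is there
} else {
  message("File not found: ", data_file)
}
```

Checking with file.exists gives a clearer message than the error from read.table when the working directory is not the one you expected.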
Variables within data frames
For the data frame D1 created above, to look at just the variable Sex in this data frame:
D1$ Sex # Note: the space is optional
[1] male male female female
Levels: female male
Note that D1$Height is a vector of numbers.
D1$ Height
[1] 175 176 162 165
So if you wanted the mean for this variable:
mean(D1$ Height)
[1] 169.5
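One caveat, offered as a note rather than as part of the original example: beginning with R version 4.0.0, read.table reads text columns as character strings rather than as factors by default, so the output for D1$Sex may not show a Levels line. The sketch below assumes you want the factor behavior shown above and requests it with stringsAsFactors=TRUE.

```r
Input = ("
Sex     Height
male    175
male    176
female  162
female  165
")

D1 = read.table(textConnection(Input), header=TRUE,
                stringsAsFactors=TRUE)     # read text columns as factors

levels(D1$Sex)                             # "female" "male"

tapply(D1$Height, D1$Sex, mean)            # mean Height for each level of Sex
                                           #   female 163.5, male 175.5
```

An existing character variable can also be converted afterwards with the factor function.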
Using dplyr to create new variables in data frames
The standard method to define new variables in data frames is to use the data.frame$ variable syntax. So if we wanted to add a variable to the D1 data frame above which would double Height:
D1$ Double = D1$ Height * 2 # Spaces are optional
D1
Sex Height Double
1 male 175 350
2 male 176 352
3 female 162 324
4 female 165 330
Another method is to use the mutate function in the dplyr package:
library(dplyr)
D1 =
   mutate(D1,
          Triple = Height*3,
          Quadruple = Height*4
          )
D1
Sex Height Double Triple Quadruple
1 male 175 350 525 700
2 male 176 352 528 704
3 female 162 324 486 648
4 female 165 330 495 660
The dplyr package also has functions to select only certain columns in a data frame (select function) or to filter a data frame by the value of some variable (filter function). It can be helpful for manipulating data frames.
In the examples in this book, I will use either the $ syntax or the mutate function in dplyr, depending on which I think makes the example more comprehensible.
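As a hedged sketch of the select and filter functions mentioned above, the following keeps only the rows for males and only the Height column. The names Males and Heights are made up for this example. So that the sketch runs even if dplyr is not installed, it falls back to equivalent base R indexing, which is a substitute and not the dplyr method itself.

```r
D1 = data.frame(Sex    = c("male", "male", "female", "female"),
                Height = c(175, 176, 162, 165))

if (requireNamespace("dplyr", quietly = TRUE)) {
  Males   = dplyr::filter(D1, Sex == "male")   # keep rows where Sex is male
  Heights = dplyr::select(D1, Height)          # keep only the Height column
} else {
  Males   = D1[D1$Sex == "male", ]             # base R equivalent of filter
  Heights = D1["Height"]                       # base R equivalent of select
}

Males
Heights
```

Writing dplyr::filter rather than just filter avoids confusion with the unrelated filter function in the stats package when dplyr has not been loaded with library.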
Extracting elements from the output of a function
Sometimes it is useful to extract certain elements from the output of an analysis. For example, we can assign the output from a binomial test to a variable we’ll call Test.
Test = binom.test(7, 12, 3/4,
                  alternative="less",
                  conf.level=0.95)
To see the value of Test:
Test
Exact binomial test
number of successes = 7, number of trials = 12, p-value = 0.1576
95 percent confidence interval:
0.0000000 0.8189752
To see what elements are included in Test:
names(Test)
[1] "statistic" "parameter" "p.value" "conf.int" "estimate" "null.value" "alternative"
[8] "method" "data.name"
Or with more details:
str(Test)
To view the p-value from Test:
Test$ p.value
[1] 0.1576437
To view the confidence interval from Test:
Test$ conf.int
[1] 0.0000000 0.8189752
attr(,"conf.level")
[1] 0.95
To view the upper confidence limit from Test:
Test$ conf.int[2]
[1] 0.8189752
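Once individual elements can be extracted, they can be collected into a small table for reporting. The following is one possible sketch. The data frame name Summary and its column names are assumptions made for this example.

```r
Test = binom.test(7, 12, 3/4,
                  alternative="less",
                  conf.level=0.95)

Summary = data.frame(
   Estimate = unname(Test$estimate),    # observed proportion, 7/12
   Lower.CL = Test$conf.int[1],         # lower confidence limit
   Upper.CL = Test$conf.int[2],         # upper confidence limit
   p.value  = Test$p.value
   )

round(Summary, 4)                       # display with four decimal places

attr(Test$conf.int, "conf.level")       # the confidence level stored with the interval
```

The unname function removes the label attached to the estimate so that it does not become a row name in the data frame.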
Exporting graphics
R has the ability to produce a variety of plots. Simple plots can be produced with just a few lines of code. These are useful to get a quick visualization of your data or to check on the distribution of residuals from an analysis. More in-depth coding can produce publication-quality plots.
In the R Studio Plots window, there is an Export icon which can be used to save the plot as an image or pdf file. A method I use is to export the plot as a pdf and then open this pdf with either Adobe Photoshop or the free alternative, GIMP (www.gimp.org/). These programs allow you to import the pdf at whatever resolution you need, and then crop out extra white space.
The appearance of exported plots will change depending on the size and scale of the exported file. If there are elements missing from a plot, it may be because the size is not ideal. Changing the export size is also an easy way to adjust the size of the text of a plot relative to the other elements.
An additional trick in R Studio is to change the size of the plot window after the plot is produced, but before it is exported. Sometimes this can get rid of problems where, for example, words in a plot legend are cut off.
Finally, if you export a plot as a pdf, but still need to edit it further, you can open it in Inkscape, ungroup the plot elements, adjust some plot elements, and then export as a high-resolution bitmap image. Just be sure you don’t change anything important, like how the data line up with the axes.
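Plots can also be exported from code rather than from the Export icon, which makes the size of the file reproducible. The sketch below is one way to do this with the pdf function in base R. The file name, the use of tempdir(), and the 6 by 4 inch size are choices made for this example only.

```r
x = c(1,2,3,4,5,6,7,8,9)
y = c(9,7,8,6,7,5,4,3,1)

plot_file = file.path(tempdir(), "xy-plot.pdf")   # where the pdf will be written

pdf(plot_file, width = 6, height = 4)   # open a pdf file; size is in inches
plot(x, y)                              # the plot is drawn into the file
dev.off()                               # close the file so it is saved

file.exists(plot_file)                  # TRUE if the file was written
```

Changing width and height here has the same effect on the relative size of the text as changing the export size in R Studio. The png function works similarly for bitmap images, with the size given in pixels by default.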