
An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

Avoiding Pitfalls in R

Grammar, spelling, and capitalization count

 

Probably the most common problems in programming in any language are syntax errors, such as forgetting a comma or misspelling the name of a variable or function.  R is also case-sensitive, so a name must be capitalized exactly as it was when the object was created.

 

Be sure to include quotes around names requiring them; also be sure to use straight quotes ( " ) and not the smart quotes that some word processors use automatically.  It is helpful to write your R code in a plain text editor or in the editor window in RStudio.
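As a small illustration of how capitalization matters, the sketch below uses a hypothetical variable, PlantHeight, which is not part of the original text.  R treats PlantHeight and plantheight as different names, so the misspelled version raises an error, which is caught here so the script can continue.

```r
# PlantHeight is a hypothetical variable used only for illustration
PlantHeight = c(10.2, 11.5, 9.8)

exists("PlantHeight")     # TRUE: the name matches exactly
exists("plantheight")     # FALSE: R is case-sensitive

# Using the misspelled name raises an error; tryCatch captures the message
Message = tryCatch(mean(plantheight),
                   error = function(e) conditionMessage(e))

Message
```

The exact wording of the error message can vary with the R version and language settings, so it is not shown here.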

 

Data types in functions

 

Probably the biggest cause of problems I had when I first started working with R was trying to feed functions the wrong data type.  For example, if a function asks for the data as a matrix, and you give it a data frame, it may return an error or an unexpected result.
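One way to check which type an object is, and to convert it if needed, is sketched below.  The data frame Data and the matrix DataMatrix are hypothetical names used only for this illustration.

```r
# Data is a hypothetical data frame used only for illustration
Data = data.frame(X = c(1, 2, 3), Y = c(4, 5, 6))

is.data.frame(Data)      # TRUE
is.matrix(Data)          # FALSE

# Convert to a matrix for a function that requires one
DataMatrix = as.matrix(Data)

is.matrix(DataMatrix)    # TRUE
```

Note that a matrix holds a single data type, so as.matrix applied to a data frame that mixes numeric and character columns will produce a character matrix.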

 

A more subtle error I’ve encountered is when a function is expecting a variable to be a factor vector, and it’s really a character (“chr”) vector.

 

For instance, if we create a variable in the global environment with the values of Gender, it will be a character vector.

 

Gender = c("male", "male", "female", "female")

str(Gender)     # What is the structure of this variable?

 

chr [1:4] "male" "male" "female" "female"

 

In the data frame D1, by contrast, Gender was read in as a factor variable:

 

str(D1$Gender)

 

Factor w/ 2 levels "female","male": 2 2 1 1
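The data frame D1 is not created in this section.  A minimal stand-in, assuming it holds only a Gender column, could be built as in the sketch below.  Since R 4.0.0, data.frame and read.table no longer convert character columns to factors by default, so stringsAsFactors = TRUE is set explicitly here.

```r
# D1 here is an assumed stand-in, not the author's original data frame
D1 = data.frame(Gender = c("male", "male", "female", "female"),
                stringsAsFactors = TRUE)

is.factor(D1$Gender)       # TRUE

# The free-standing vector is a character vector instead
Gender = c("male", "male", "female", "female")

is.factor(Gender)          # FALSE
is.character(Gender)       # TRUE
```

The functions is.factor and is.character give a quick way to confirm which of the two types a variable has before passing it to a function.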

 

 

One of the nice things about using RStudio is that it allows you to look at the structure of data frames and other objects in the Environment window.

 

Data types can be converted from one data type to another, but it may not be obvious how to do some conversions.  Functions to convert data types include factor, as.numeric, and as.character.


Gender = c("male", "male", "female", "female")

str(Gender)

 

chr [1:4] "male" "male" "female" "female"

 

 

Gender2 = factor(Gender)

Gender2

 

[1] male   male   female female

Levels: female male

 

str(Gender2)

 

Factor w/ 2 levels "female","male": 2 2 1 1

 

 

Gender3 = as.numeric(Gender2)

Gender3

 

[1] 2 2 1 1

 

str(Gender3)

 

num [1:4] 2 2 1 1

 

 

Gender4 = as.character(Gender3)

Gender4

 

[1] "2" "2" "1" "1"

 

str(Gender4)

 

chr [1:4] "2" "2" "1" "1"
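As the example above shows, as.numeric applied to a factor returns the internal level codes, not the labels.  This matters most when the labels are themselves numbers.  The sketch below uses a hypothetical factor, Score, to show the difference; converting to character first recovers the original values.

```r
# Score is a hypothetical factor whose labels look like numbers
Score = factor(c("10", "20", "10", "30"))

# Direct conversion returns the level codes
Codes = as.numeric(Score)
Codes       # 1 2 1 3

# Converting to character first returns the labelled values
Values = as.numeric(as.character(Score))
Values      # 10 20 10 30
```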

 

 

Creating data frames from vector variables

 

You can combine vector variables into a data frame with the data.frame function.  However, note that the vectors need to have the same number of observations.

 

GenderFrame = data.frame(Gender, Gender2, Gender3, Gender4)

str(GenderFrame)

 

'data.frame':     4 obs. of  4 variables:

 $ Gender : chr  "male" "male" "female" "female"

 $ Gender2: Factor w/ 2 levels "female","male": 2 2 1 1

 $ Gender3: num  2 2 1 1

 $ Gender4: chr  "2" "2" "1" "1"

 

 

GenderFrame

 

  Gender Gender2 Gender3 Gender4

1   male    male       2       2

2   male    male       2       2

3 female  female       1       1

4 female  female       1       1
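To illustrate the requirement that vectors have the same number of observations, the sketch below tries to combine vectors of different lengths.  The names A, B, and C are hypothetical.  One caveat: data.frame will recycle a shorter vector when its length divides evenly into the longer one, which can hide a mistake, so it is safest to check lengths yourself.

```r
# A, B, and C are hypothetical vectors used only for illustration
A = c(1, 2, 3, 4)
B = c("a", "b", "c")     # length 3 does not divide evenly into 4
C = c("x", "y")          # length 2 is recycled twice

# Mismatched lengths raise an error; tryCatch captures the message
Mismatch = tryCatch(data.frame(A, B),
                    error = function(e) conditionMessage(e))

Mismatch

# A length that divides evenly is recycled without any warning
Recycled = data.frame(A, C)

nrow(Recycled)           # 4
```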

 

 

Style

 

In many respects there isn’t an established style for programming in R, such as whether variable names should be capitalized.  In practice, people use different style conventions.

 

It’s helpful, though, to establish a convention for yourself.  For example, you could decide to always capitalize variable names and data frame names.

 

Some punctuation can be used in variable names.  For example, any of the following conventions could be used to create a fifth variable listing observations of genders.

 

Gender5  = factor(c("Female", "Male", "Nonbinary", "Female"))

 

Gender.5 = factor(c("Female", "Male", "Nonbinary", "Female"))

 

Gender_5 = factor(c("Female", "Male", "Nonbinary", "Female"))


Or any of the following conventions could be used to name a data frame.


mydata  = data.frame(X=c(1,2,3), Y=c(4,5,6))

myData  = data.frame(X=c(1,2,3), Y=c(4,5,6))

MyData  = data.frame(X=c(1,2,3), Y=c(4,5,6))

My.Data = data.frame(X=c(1,2,3), Y=c(4,5,6))

My_Data = data.frame(X=c(1,2,3), Y=c(4,5,6))
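Not every punctuation mark is allowed in a name.  One way to check whether a candidate name is syntactically valid is to compare it with the result of the base R function make.names, as sketched below.  The vector Candidates is hypothetical, and this check is only one possible approach.

```r
# Candidates is a hypothetical list of names to check
Candidates = c("Gender5", "Gender.5", "Gender_5", "Gender 5", "5Gender")

# A name is valid if make.names leaves it unchanged
Valid = make.names(Candidates) == Candidates

Valid       # TRUE TRUE TRUE FALSE FALSE

# make.names shows how R would repair the invalid names
Repaired = make.names(c("Gender 5", "5Gender"))

Repaired    # "Gender.5" "X5Gender"
```

Names containing a space or starting with a digit are not valid as written, which is why periods and underscores are the usual separators.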