Types of variables in SAEPER
For a discussion of the following topics, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/C_01.html).
• Organizing data–observations and variables
• Types of variables
• Long-format and wide-format data
• Nominal, ordinal, and interval/ratio data
• Discrete and continuous variables
• Levels of measurement.
Importing packages in this chapter
The following commands import the packages used in this chapter and assign them common aliases. You may need to install pandas first; io is part of the Python standard library.
import io
import pandas as pd
Types of variables in Python
Python does not use the terms nominal, ordinal, and interval/ratio for types of variables.
Base Python
Types of variables in base Python include, but are not limited to, numbers, strings, and lists.
Numbers can include integers and floating-point values.
Rating = -3
print(type(Rating))
<class 'int'>
Temperature = 12.5
print(type(Temperature))
<class 'float'>
Strings contain text. Python allows for manipulation of text data, but this topic won’t be investigated here.
Instructor = 'Louise Belcher'
print(type(Instructor))
<class 'str'>
Lists can contain different kinds of values, and the individual elements can be examined. Note that the first element in a list is referenced with an index value of 0.
Day1 = ["Beginner course", 12, 22.4]
print(Day1[0])
Beginner course
print(type(Day1[0]))
<class 'str'>
print(Day1[1])
12
print(type(Day1[1]))
<class 'int'>
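The element-by-element checks above can also be written as a loop. The following is a minimal sketch that reuses the Day1 list; the names ElementTypes, Element, and TypeName are introduced here only for illustration.

```python
Day1 = ["Beginner course", 12, 22.4]

### Collect the type name of each element in the list
ElementTypes = [type(Element).__name__ for Element in Day1]

### Print each element next to the name of its type
for Element, TypeName in zip(Day1, ElementTypes):
    print(Element, TypeName)
```

The third element, 22.4, is reported as a float, so a single list here holds a string, an integer, and a floating-point value.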
pandas data types
The pandas package implements additional data types, which will be useful for data analysis tasks.
Data can be entered as a list of values and then converted to an array with pandas functions. Each array has a dtype attribute that indicates the type of data it holds.
Nominal data
In pandas, nominal data can be of the types string or category. In a data frame, string or mixed data will be designated as object type.
Category variables are similar to factor variables in R. They contain a limited number of unique values, and the levels can be ordered. It may be necessary to identify variables as category variables when they are passed to certain functions for plotting or statistical analysis.
ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']
### Convert the list to category data
Colors = pd.Categorical(ColorsList)
print(Colors)
['Red', 'Red', 'Green', 'Green', 'Blue']
Categories (3, object): ['Blue', 'Green', 'Red']
### Note that the array has three categories: Blue, Green, and Red.
### pandas has sorted the categories alphabetically.
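To see how a category variable stores its data, it may help to look at its attributes. The sketch below assumes the same ColorsList as above and inspects the categories, codes, and ordered attributes of the resulting Categorical; the calls to tolist() are used only to get plain Python values for printing.

```python
import pandas as pd

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']
Colors = pd.Categorical(ColorsList)

### The unique categories are stored once
print(Colors.categories.tolist())

### Each observation is stored as an integer code pointing to a category
print(Colors.codes.tolist())

### By default the categories are not treated as ordered
print(Colors.ordered)
```

With the alphabetized categories, Blue is coded as 0, Green as 1, and Red as 2.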
The order of the categories can be changed. This is useful to get category results in the desired order for statistical analyses and plots. Note that the order of the actual data isn’t changed.
ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']
Colors = pd.Categorical(ColorsList,
                        categories=['Red', 'Green', 'Blue'])
print(Colors.categories)
Index(['Red', 'Green', 'Blue'], dtype='object')
print(Colors)
['Red', 'Red', 'Green', 'Green', 'Blue']
Categories (3, object): ['Red', 'Green', 'Blue']
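One place the category order shows up is in summary tables. The following sketch is an illustration rather than part of the original example: it wraps the category data in a pandas Series and counts the values, passing sort=False so that the counts are not re-sorted by frequency.

```python
import pandas as pd

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']
Colors = pd.Categorical(ColorsList,
                        categories=['Red', 'Green', 'Blue'])

### Count the observations in each category.
### sort=False keeps the rows in category order.
Counts = pd.Series(Colors).value_counts(sort=False)
print(Counts)
```

The rows of the result follow the specified order (Red, Green, Blue) rather than alphabetical order.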
We can also change the names of the categories.
Colors = Colors.rename_categories(['Rojo', 'Verde', 'Azul'])
print(Colors)
['Rojo', 'Rojo', 'Verde', 'Verde', 'Azul']
Categories (3, object): ['Rojo', 'Verde', 'Azul']
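When names are passed as a list, they are matched to the existing categories by position, so the list must be in the same order as the categories. As an alternative sketch, rename_categories also accepts a dictionary that maps each old name to a new one; the names Translation and ColorsSpanish are introduced here for illustration.

```python
import pandas as pd

Colors = pd.Categorical(['Red', 'Red', 'Green', 'Green', 'Blue'],
                        categories=['Red', 'Green', 'Blue'])

### Map each old category name to its new name explicitly
Translation = {'Red': 'Rojo', 'Green': 'Verde', 'Blue': 'Azul'}
ColorsSpanish = Colors.rename_categories(Translation)

print(ColorsSpanish.tolist())
print(ColorsSpanish.categories.tolist())
```

The dictionary form does not depend on the current order of the categories, which may make it less error-prone after the categories have been reordered.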
Numeric data
Interval/ratio data can be coded as variables with integer or floating-point classes. The following example uses integer data.
BugCountList = [1, 2, 3, 4, 5]
BugCount = pd.to_numeric(BugCountList)
print(BugCount)
[1 2 3 4 5]
print(BugCount.dtype)
int64
Mathematical operations can be performed on arrays.
BugCountSquared = BugCount ** 2
print(BugCountSquared)
[ 1 4 9 16 25]
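When given a list, pd.to_numeric returns a NumPy array, which is why the output above is printed in square brackets without an index. The sketch below, offered as one possible next step, wraps the values in a pandas Series to get access to summary methods; the name BugCountSeries is introduced here for illustration.

```python
import numpy as np
import pandas as pd

BugCountList = [1, 2, 3, 4, 5]
BugCount = pd.to_numeric(BugCountList)

### For list input, pd.to_numeric returns a NumPy array
print(isinstance(BugCount, np.ndarray))

### A pandas Series adds an index and summary methods
BugCountSeries = pd.Series(BugCount)
print(BugCountSeries.mean())
print(BugCountSeries.median())
```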
The following example uses floating-point data.
BugTemperatureList = [12.5, 13.6, 11.9, 9.4, 11.6]
BugTemperature = pd.to_numeric(BugTemperatureList)
print(BugTemperature)
[ 12.5 13.6 11.9 9.4 11.6 ]
print(BugTemperature.dtype)
float64
BugTemperatureF = BugTemperature * 9/5 + 32
print(BugTemperatureF)
[ 54.5 56.48 53.42 48.92 52.88 ]
Ordinal data
We can code ordinal data as either numeric or category variables, depending on how we will be summarizing, plotting, and analyzing them.
Dragons = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""
Tribe Length.m SizeRank
IceWings 6.4 1
MudWings 6.1 2
SeaWings 5.8 3
SkyWings 5.5 4
NightWings 5.2 5
RainWings 4.9 6
SandWings 4.6 7
"""))
Dragons.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Tribe 7 non-null object
1 Length.m 7 non-null float64
2 SizeRank 7 non-null int64
The variable SizeRank was read as an integer variable. As an ordinal variable, it can be converted to a category variable, which may be useful with some types of summary and analysis.
Likewise, as Tribe was read as an object variable, we will convert it to a category variable.
Dragons['SizeRank'] = Dragons['SizeRank'].astype('category')
Dragons['Tribe'] = Dragons['Tribe'].astype('category')
Dragons.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Tribe 7 non-null category
1 Length.m 7 non-null float64
2 SizeRank 7 non-null category
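Converting with astype('category') alone produces an unordered category variable. If the rank order should be retained, one option is to declare the categories as ordered. The following is a minimal sketch, assuming the same Dragons data; the name RankType is introduced here for illustration, and the choice of pd.CategoricalDtype is one of several ways to do this.

```python
import io
import pandas as pd

Dragons = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""
Tribe       Length.m  SizeRank
IceWings    6.4       1
MudWings    6.1       2
SeaWings    5.8       3
SkyWings    5.5       4
NightWings  5.2       5
RainWings   4.9       6
SandWings   4.6       7
"""))

### Declare SizeRank as an ordered category with ranks 1 through 7
RankType = pd.CategoricalDtype(categories=list(range(1, 8)), ordered=True)
Dragons['SizeRank'] = Dragons['SizeRank'].astype(RankType)

print(Dragons['SizeRank'].cat.ordered)

### Ordered categories support comparisons and min/max
print(Dragons['SizeRank'].max())
print(Dragons.loc[Dragons['SizeRank'] > 5, 'Tribe'].tolist())
```

With an ordered category, comparisons such as SizeRank > 5 are allowed, while arithmetic such as a mean is not, which matches how ordinal data are usually treated.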
References
Rouxzee. 2014. Wings of Fire: How Big are the Dragons? DeviantArt.com. (Since deactivated.)