Python Handbook: Types of Variables

Types of variables in SAEPER

For a discussion of the following topics, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (SAEPER; rcompanion.org/handbook/C_01.html).

• Organizing data–observations and variables

• Types of variables

• Long-format and wide-format data

• Nominal, ordinal, and interval/ratio data

• Discrete and continuous variables

• Levels of measurement

Importing packages in this chapter

The following commands import the packages used in this chapter and assign them common aliases. You may need to install pandas first; io is part of the Python standard library.

import io

import pandas as pd

Types of variables in Python

Python does not use the terms nominal, ordinal, and interval/ratio for types of variables.

Base Python

Types of variables in base Python include, but are not limited to, numbers, strings, and lists.

Numbers can include integers and floating-point values.

Rating = -3

print(type(Rating))

Temperature = 12.5

print(type(Temperature))

Strings contain text. Python allows for manipulation of text data, but this topic won’t be investigated here.

Instructor = 'Louise Belcher'

print(type(Instructor))

Lists can contain different kinds of values, and the individual elements can be examined. Note that the first element in a list is referenced with an index value of 0.

Day1 = ["Beginner course", 12, 22.4]

print(Day1[0])

Beginner course

print(type(Day1[0]))

print(Day1[1])

print(type(Day1[1]))
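As a small sketch of the points above, the following loops over the same list and reports the type of each element. The negative index and the names Element, First, and Last are additions for illustration and are not part of the original example.

```python
# A list can mix strings, integers, and floating-point values
Day1 = ["Beginner course", 12, 22.4]

# Report each element together with the name of its type
for Element in Day1:
    print(Element, type(Element).__name__)

# Indexing starts at 0; a negative index counts from the end of the list
First = Day1[0]
Last = Day1[-1]

print(First)
print(Last)
```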

pandas data types

The pandas package implements additional data types, which will be useful for data analysis tasks.

Data can be entered as a list of values and then converted to a pandas array. Each pandas array has a dtype attribute that indicates the type of data it holds.
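A minimal sketch of checking the dtype after converting a list. It assumes pd.Series as the container, which is one possible choice; the examples later in this chapter use pd.Categorical and pd.to_numeric instead. The variable names are hypothetical.

```python
import pandas as pd

# Hypothetical example data entered as plain Python lists
CountList = [1, 2, 3, 4, 5]
LengthList = [6.4, 6.1, 5.8]

# Convert each list to a pandas Series (an assumption for this sketch)
Counts = pd.Series(CountList)
Lengths = pd.Series(LengthList)

# The dtype attribute reports the type of data each Series holds
print(Counts.dtype)
print(Lengths.dtype)
```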

Nominal data

In pandas, nominal data can be of the types string or category. In a data frame, string or mixed data will be designated as object type.

Category variables are similar to factor variables in R. They contain a limited number of unique values, and the levels can be ordered. It may be necessary to identify variables as category variables when they are passed to certain functions for plotting or statistical analysis.

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']

### Convert the list to category data

Colors = pd.Categorical(ColorsList)

print(Colors)

['Red', 'Red', 'Green', 'Green', 'Blue']

Categories (3, object): ['Blue', 'Green', 'Red']

### Note that the array has three categories: Blue, Green, and Red.

### pandas has sorted the categories alphabetically.

The order of the categories can be changed. This is useful to get category results in the desired order for statistical analyses and plots. Note that the order of the actual data isn’t changed.

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']

Colors = pd.Categorical(ColorsList, categories=['Red', 'Green', 'Blue'])

print(Colors.categories)

Index(['Red', 'Green', 'Blue'], dtype='object')

print(Colors)

['Red', 'Red', 'Green', 'Green', 'Blue']

Categories (3, object): ['Red', 'Green', 'Blue']
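To illustrate why the category order matters, here is a rough sketch in which sorting and counting follow the category order rather than alphabetical order. The input list is shuffled on purpose, and wrapping the array in pd.Series is an assumption made for this example.

```python
import pandas as pd

# Same colors as above, but deliberately out of order
ColorsList = ['Blue', 'Green', 'Red', 'Green', 'Red']

# Set the category order explicitly: Red, then Green, then Blue
Colors = pd.Categorical(ColorsList, categories=['Red', 'Green', 'Blue'])

# Wrap in a Series to use the Series methods for sorting and counting
ColorSeries = pd.Series(Colors)

# Sorting follows the category order, not alphabetical order
Sorted = ColorSeries.sort_values()

# With sort=False, counts are reported in category order
Counts = ColorSeries.value_counts(sort=False)

print(list(Sorted))
print(Counts)
```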

We can also change the names of the categories.

Colors = Colors.rename_categories(['Rojo', 'Verde', 'Azul'])

print(Colors)

['Rojo', 'Rojo', 'Verde', 'Verde', 'Azul']

Categories (3, object): ['Rojo', 'Verde', 'Azul']

Numeric data

Interval/ratio data can be coded as variables with integer or floating-point types. The following example uses integer data.

BugCountList = [1, 2, 3, 4, 5]

BugCount = pd.to_numeric(BugCountList)

print(BugCount)

[1 2 3 4 5]

print(BugCount.dtype)

int64

Mathematical operations can be performed on arrays.

BugCountSquared = BugCount ** 2

print(BugCountSquared)

[ 1 4 9 16 25]
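One detail worth checking when doing arithmetic on integer arrays is whether the result stays an integer. The sketch below, with the hypothetical name BugCountHalved, suggests that squaring keeps an integer dtype while true division produces floating-point values.

```python
import pandas as pd

# For a list input, pd.to_numeric returns a NumPy array
BugCount = pd.to_numeric([1, 2, 3, 4, 5])

# Squaring integers gives integers
BugCountSquared = BugCount ** 2

# True division gives floating-point values, even for integer input
BugCountHalved = BugCount / 2

print(BugCountSquared.dtype)
print(BugCountHalved.dtype)
print(BugCountHalved)
```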

The following example uses floating-point data.

BugTemperatureList = [12.5, 13.6, 11.9, 9.4, 11.6]

BugTemperature = pd.to_numeric(BugTemperatureList)

print(BugTemperature)

[ 12.5 13.6 11.9 9.4 11.6 ]

print(BugTemperature.dtype)

float64

BugTemperatureF = BugTemperature * 9/5 + 32

print(BugTemperatureF)

[ 54.5 56.48 53.42 48.92 52.88 ]

Ordinal data

We can code ordinal data as either numeric or category variables, depending on how we will be summarizing, plotting, and analyzing them.

Dragons = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""

Tribe Length.m SizeRank

IceWings 6.4 1

MudWings 6.1 2

SeaWings 5.8 3

SkyWings 5.5 4

NightWings 5.2 5

RainWings 4.9 6

SandWings 4.6 7

"""))

### info() prints its report directly, so print() is not needed.

Dragons.info()

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Tribe 7 non-null object

1 Length.m 7 non-null float64

2 SizeRank 7 non-null int64

The variable SizeRank was read as an integer variable. As an ordinal variable, it can be converted to a category variable, which may be useful with some types of summary and analysis.

Likewise, as Tribe was read as an object variable, we will convert it to a category variable.

Dragons['SizeRank'] = Dragons['SizeRank'].astype('category')

Dragons['Tribe'] = Dragons['Tribe'].astype('category')

Dragons.info()

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Tribe 7 non-null category

1 Length.m 7 non-null float64

2 SizeRank 7 non-null category
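Note that astype('category') creates an unordered category by default. The sketch below contrasts the two codings of an ordinal variable: a numeric summary on the integer column, and an ordered category built with pd.CategoricalDtype. The data frame is retyped here so the example is self-contained, and the names RankType, RankOrdered, MedianRank, and TopThree are hypothetical. Treating rank 1 as the lowest category is an assumption about how the ranks should be ordered.

```python
import pandas as pd

# A reduced copy of the Dragons data, entered as a dictionary
Dragons = pd.DataFrame({
    'Tribe': ['IceWings', 'MudWings', 'SeaWings', 'SkyWings',
              'NightWings', 'RainWings', 'SandWings'],
    'SizeRank': [1, 2, 3, 4, 5, 6, 7]})

# Coded as numbers, the ranks can be summarized with a median
MedianRank = Dragons['SizeRank'].median()

# Coded as an ordered category, the levels keep an explicit order
# (assumption: rank 1 is treated as the lowest category)
RankType = pd.CategoricalDtype(categories=[1, 2, 3, 4, 5, 6, 7],
                               ordered=True)
RankOrdered = Dragons['SizeRank'].astype(RankType)

# Ordered categories support comparisons against a category value
TopThree = Dragons.loc[RankOrdered <= 3, 'Tribe']

print(MedianRank)
print(RankOrdered.cat.ordered)
print(list(TopThree))
```

Which coding is more convenient depends on the analysis: numeric summaries need the numeric coding, while plots and tables that should respect the rank order can use the ordered category.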

References

Rouxzee. 2014. Wings of Fire: How Big are the Dragons? DeviantArt.com. (Since deactivated.)

A Python Companion to Extension Program Evaluation
