
A Python Companion to Extension Program Evaluation

Salvatore S. Mangiafico

Types of Variables

Types of variables in SAEPER

 

For a discussion of the following topics, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/C_01.html). 

 

• Organizing data: observations and variables

• Types of variables

• Long-format and wide-format data

• Nominal, ordinal, and interval/ratio data

• Discrete and continuous variables

• Levels of measurement

 

Importing packages in this chapter

 

The following commands import the packages used in this chapter and assign pandas its common alias, pd.  The io module is part of the Python standard library, but pandas may need to be installed first.

 

import io

 

import pandas as pd

 

 

Types of variables in Python

 

Python does not use the terms nominal, ordinal, and interval/ratio for types of variables.

 

Base Python

 

Types of variables in base Python include, but are not limited to, numbers, strings, and lists.

 

Numbers can include integers and floating-point values.

 

Rating = -3

 

print(type(Rating))

 

<class 'int'>

 

 

Temperature = 12.5

 

print(type(Temperature))

 

<class 'float'>
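When integers and floating-point values are combined in arithmetic, Python returns a floating-point value. The following is a minimal sketch that reuses the Rating and Temperature values from above. The 7 / 2 and 7 // 2 lines are illustrative additions.

```python
# Values from the examples above: an integer and a floating-point number
Rating = -3
Temperature = 12.5

# Mixing int and float in arithmetic promotes the result to float
Total = Rating + Temperature
print(Total, type(Total))      # 9.5 <class 'float'>

# Dividing two integers with / returns a float,
# while floor division with // keeps the result as an int
print(7 / 2, type(7 / 2))      # 3.5 <class 'float'>
print(7 // 2, type(7 // 2))    # 3 <class 'int'>
```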

 

 

Strings contain text.  Python allows for manipulation of text data, but this topic won’t be investigated here.

 

Instructor = 'Louise Belcher'

 

print(type(Instructor))

 

<class 'str'>

 

 

Lists can contain different kinds of values, and the individual elements can be examined.  Note that the first element in a list is referenced with an index value of 0.

 

Day1 = ["Beginner course", 12, 22.4]

 

print(Day1[0])

 

Beginner course

 

 

print(type(Day1[0]))

 

<class 'str'>

 

 

print(Day1[1])

 

12

 

 

print(type(Day1[1]))

 

<class 'int'>
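Instead of examining elements one at a time, a loop can report the type of every element in the list. Negative index values count from the end of the list. The following is a minimal sketch using the Day1 list from above.

```python
Day1 = ["Beginner course", 12, 22.4]

# Report each element's index, value, and type name
for position, element in enumerate(Day1):
    print(position, element, type(element).__name__)

# An index of -1 refers to the last element of the list
print(Day1[-1])    # 22.4
```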

 

 

pandas data types

 

The pandas package implements additional data types, which will be useful for data analysis tasks.

 

Data can be entered as a list of values and then converted to an array with pandas functions.  These arrays have a dtype attribute that indicates the type of data they hold.

 

Nominal data

 

In pandas, nominal data can be of the types string or category.  In a data frame, string or mixed data will be designated as object type.

 

Category variables are similar to factor variables in R. They contain a limited number of unique values, and the levels can be ordered.  It may be necessary to identify variables as category variables when they are passed to certain functions for plotting or statistical analysis.

 

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']

 

 

### Convert the list to category data

 

Colors = pd.Categorical(ColorsList)

 

print(Colors)

 

['Red', 'Red', 'Green', 'Green', 'Blue']

 

Categories (3, object): ['Blue', 'Green', 'Red']

 

   ### Note that the array has three categories: Blue, Green, and Red.

 

   ### pandas has sorted the categories alphabetically.
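Internally, a pandas Categorical stores each observation as an integer code that points to one of its categories. The following is a minimal sketch that inspects these codes and counts the observations in each category. Wrapping the Categorical in a pd.Series to call value_counts is one convenient approach among several.

```python
import pandas as pd

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']
Colors = pd.Categorical(ColorsList)

# Categories are sorted alphabetically by default: Blue, Green, Red
print(Colors.categories.tolist())

# Each observation is stored as a code: Blue = 0, Green = 1, Red = 2
print(Colors.codes)    # [2 2 1 1 0]

# Count the observations in each category, listed in category order
print(pd.Series(Colors).value_counts(sort=False))
```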

 

 

The order of the categories can be changed.  This is useful to get category results in the desired order for statistical analyses and plots.  Note that the order of the actual data isn’t changed.

 

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']

 

Colors = pd.Categorical(ColorsList,

                        categories=['Red', 'Green', 'Blue'])

 

print(Colors.categories)

 

Index(['Red', 'Green', 'Blue'], dtype='object')

 

 

print(Colors)

 

['Red', 'Red', 'Green', 'Green', 'Blue']

 

Categories (3, object): ['Red', 'Green', 'Blue']
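The category order determines how the values are sorted. However, setting an order does not by itself mark the variable as ordered: the ordered attribute stays False unless ordered=True is passed to pd.Categorical. The following is a minimal sketch that compares the default and custom category orders. The names ColorsDefault and ColorsCustom are illustrative.

```python
import pandas as pd

ColorsList = ['Red', 'Red', 'Green', 'Green', 'Blue']

# Default: categories sorted alphabetically
ColorsDefault = pd.Categorical(ColorsList)

# Custom category order, as in the example above
ColorsCustom = pd.Categorical(ColorsList, categories=['Red', 'Green', 'Blue'])

# Sorting follows the category order
print(ColorsDefault.sort_values().tolist())   # Blue first
print(ColorsCustom.sort_values().tolist())    # Red first

# A custom category order alone does not make the variable ordered
print(ColorsCustom.ordered)                   # False
```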



We can also change the names of the categories.

 

Colors = Colors.rename_categories(['Rojo', 'Verde', 'Azul'])

 

print(Colors)

 

['Rojo', 'Rojo', 'Verde', 'Verde', 'Azul']

 

Categories (3, object): ['Rojo', 'Verde', 'Azul']
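rename_categories also accepts a dictionary that maps each old name to its new name. This can be safer than a list because it does not depend on the position of each category. The following is a minimal sketch that starts again from the English category names.

```python
import pandas as pd

Colors = pd.Categorical(['Red', 'Red', 'Green', 'Green', 'Blue'],
                        categories=['Red', 'Green', 'Blue'])

# Map each old category name to its new name
Colors = Colors.rename_categories({'Red': 'Rojo', 'Green': 'Verde', 'Blue': 'Azul'})

print(Colors.tolist())               # ['Rojo', 'Rojo', 'Verde', 'Verde', 'Azul']
print(Colors.categories.tolist())    # ['Rojo', 'Verde', 'Azul']
```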

 

 

Numeric data

 

Interval/ratio data can be coded as variables with integer or floating-point types.  The following example uses integer data.


BugCountList = [1, 2, 3, 4, 5]

 

BugCount = pd.to_numeric(BugCountList)

 

 

print(BugCount)

 

[1 2 3 4 5]

 

 

print(BugCount.dtype)

 

int64
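Data entered by hand sometimes contain entries that are not numbers. By default, pd.to_numeric raises an error for such entries. With errors='coerce', they are converted to NaN (a missing value) instead, and the result becomes floating point because NaN is a float. The following is a minimal sketch in which the list with the entry 'missing' is hypothetical.

```python
import pandas as pd

# Hypothetical counts with one non-numeric entry
BugCountList = [1, 2, 'missing', 4, 5]

# Non-numeric entries become NaN, so the result is float64
BugCount = pd.to_numeric(BugCountList, errors='coerce')

print(BugCount)
print(BugCount.dtype)    # float64
```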

 

 

Mathematical operations can be performed on arrays, and they are applied to each element.

 

BugCountSquared = BugCount ** 2

 

print(BugCountSquared)

 

[ 1  4  9 16 25]
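Given a list, pd.to_numeric returns a NumPy array, so summary statistics are available as array methods. The following is a minimal sketch. NumPy's std uses ddof=0 (the population formula) by default, so ddof=1 is passed here to get the sample standard deviation.

```python
import pandas as pd

BugCount = pd.to_numeric([1, 2, 3, 4, 5])

print(BugCount.sum())          # 15
print(BugCount.mean())         # 3.0
print(BugCount.std(ddof=1))    # sample standard deviation, about 1.58
```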

 

 

The following example uses floating-point data.

 

BugTemperatureList = [12.5, 13.6, 11.9, 9.4, 11.6]

 

BugTemperature = pd.to_numeric(BugTemperatureList)

 

print(BugTemperature)

 

[12.5 13.6 11.9  9.4 11.6]

 

 

print(BugTemperature.dtype)

 

float64

 

 

BugTemperatureF = BugTemperature * 9/5 + 32

 

print(BugTemperatureF)

 

[54.5  56.48 53.42 48.92 52.88]

 

 

Ordinal data

 

We can code ordinal data as either numeric or category variables, depending on how we will be summarizing, plotting, and analyzing them.

 

Dragons = pd.read_table(sep=r"\s+", filepath_or_buffer=io.StringIO("""
Tribe       Length.m  SizeRank
IceWings    6.4       1
MudWings    6.1       2
SeaWings    5.8       3
SkyWings    5.5       4
NightWings  5.2       5
RainWings   4.9       6
SandWings   4.6       7
"""))

 

Dragons.info()

 

 #   Column    Non-Null Count  Dtype 

---  ------    --------------  ----- 

 0   Tribe     7 non-null      object

 1   Length.m  7 non-null      float64

 2   SizeRank  7 non-null      int64

 

 

The variable SizeRank was read as an integer variable.  As an ordinal variable, it can be converted to a category variable, which may be useful with some types of summary and analysis.

 

Likewise, because Tribe was read as an object variable, we will convert it to a category variable.

 

Dragons['SizeRank'] = Dragons['SizeRank'].astype('category')

 

Dragons['Tribe']    = Dragons['Tribe'].astype('category')

 

Dragons.info()

 

 #   Column    Non-Null Count  Dtype

---  ------    --------------  -----

 0   Tribe     7 non-null      category

 1   Length.m  7 non-null      float64

 2   SizeRank  7 non-null      category
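astype('category') creates an unordered category variable, so pandas treats the ranks as labels with no meaningful order. When the order matters, for example for comparisons or for min and max, an ordered category type can be specified with pd.CategoricalDtype. The following is a minimal sketch that recreates SizeRank from the Dragons data above as a standalone Series. The names RankType and SizeRankOrdered are illustrative.

```python
import pandas as pd

# SizeRank values from the Dragons data above
SizeRank = pd.Series([1, 2, 3, 4, 5, 6, 7])

# Ordered category type: 1 < 2 < ... < 7
RankType = pd.CategoricalDtype(categories=[1, 2, 3, 4, 5, 6, 7], ordered=True)
SizeRankOrdered = SizeRank.astype(RankType)

print(SizeRankOrdered.cat.ordered)                   # True
print(SizeRankOrdered.min(), SizeRankOrdered.max())  # 1 7

# Comparisons respect the category order
print((SizeRankOrdered <= 3).tolist())
```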

 

 

References

 

Rouxzee. 2014. Wings of Fire: How Big are the Dragons? DeviantArt.com. (Since deactivated.)