2  Python Basics

The rest of this class assumes you have some basic familiarity with the programming language Python. If you need a refresher, this chapter is your friend.

The code written in this book is for versions of Python after Python 3.10.

2.1 Python, Jupyter and Code Cells

We will be working in Jupyter environments, which consist of code cells interspersed with markdown cells. Markdown cells contain text and formatting instructions, and code cells contain instructions that the computer executes in order.

When you run a code cell by pressing the “play” button (or, often, shift-enter) the code in that block will execute, line by line, in order. The code in the block might save or update a value. That value will stay stored, and be accessible the next time any code block is run in the same session. However, all stored values will disappear if you leave that session.

Try running the following code block!
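The original cell is not reproduced here. As a minimal sketch, a cell that saves a value and then prints it might look like this (the name greeting is a hypothetical choice for illustration):

```python
# Save a value to a name; it stays stored for the rest of the session
greeting = "Hello, Jupyter!"

# Print the stored value
print(greeting)
```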

There is one thing to keep in mind with this: If you run a code block multiple times, it can update values in a non-reproducible way.

Try running the following code block once, and then the second code block several times.

For this reason, we do need to be careful when working in Jupyter environments. All assignments you complete should have all relevant code, in order.

Note

Jupyter cells will print their output in one of two scenarios:

  • The print() function is called (anywhere in the cell), or

  • an object is named or produced on the last line of the cell, without saving it to a name

If the last line of a cell saves an object, that line will (typically) not print anything.

2.2 Basic Python objects

2.2.1 Naming objects and assigning values

When coding, we often want to save values to an easy-to-remember name so we can perform computations or logic with them later. In Python, the = sign assigns the value of whatever expression is on the right, to the name on the left. Objects also have types. The most basic types of objects in Python are below.

Basic Types of Python Objects

| Type | Meaning | How to initialize |
|---|---|---|
| Integer (int) | Whole number (negative, positive, or zero) | my_int = 5 |
| Float (float) | Decimal number | my_float = 5.0 |
| Boolean (bool) | True or False value | my_bool = True (or False) |
| String (str) | Sequence of characters, e.g. a word | my_str = "Anything goes between these quotes!" |

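As a quick check of the table above, the following sketch creates one object of each basic type and asks Python for its type using the built-in type() function:

```python
# One object of each basic type, using the names from the table
my_int = 5
my_float = 5.0
my_bool = True
my_str = "Anything goes between these quotes!"

# type() reports the type of each object
print(type(my_int))    # <class 'int'>
print(type(my_float))  # <class 'float'>
print(type(my_bool))   # <class 'bool'>
print(type(my_str))    # <class 'str'>
```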
Warning

There are two hard rules for naming Python objects.

  • The only characters in object names may be letters, numbers, and underscores (_)

    • This includes no periods — as we will see, the . character does something special in Python!
  • Object names cannot start with a number.

If you try to name an object and you break these rules, you will see an error, as follows.

3rd_thing = 7 
  Cell In[1], line 1
    3rd_thing = 7
    ^
SyntaxError: invalid decimal literal
Note

It is normal to get error messages when coding, especially in notebook environments! Becoming a good programmer means learning to understand, interpret, and fix error messages.

While the above details the hard rules for naming objects in Python, there are a few other guidelines. (Python also reserves a small set of keywords, such as if, for, and return, which cannot be used as names.) For instance, most objects’ names should be descriptive and follow the lowercase_with_underscores pattern. Furthermore, you should avoid overwriting existing objects and functions. For instance, you can, but should never, name an object print, because that is the name of an existing object (the print() function). That being said, if you do accidentally “break” Python by overwriting a built-in name, you can restart your kernel / session and the original functionality will be restored.

2.2.2 Performing Arithmetic with Objects

There are many reasons to save values as named objects. The primary one is to let the computer reuse a value without your writing it out in multiple places, and to make the steps of your computation clear. Compare the following two blocks of code.
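The original blocks are not reproduced here. As an illustrative sketch, the two versions below compute the same total, first with the numbers written out directly and then with named objects (the ticket scenario and all names are hypothetical):

```python
# Version 1: numbers written out directly; hard to tell what each one means
print(3 * 12.50 + 3 * 12.50 * 0.08)

# Version 2: the same computation with named objects
num_tickets = 3
ticket_price = 12.50
tax_rate = 0.08

subtotal = num_tickets * ticket_price
total = subtotal + subtotal * tax_rate
print(total)
```

In the second version, changing the number of tickets means editing one line rather than hunting for every 3 in the expression.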

2.2.2.1 Operations with Numbers

The previous section made use of a number of symbols for basic arithmetic. The symbols Python uses for basic arithmetic operations are as follows:

Important operators for numerical operations in Python

| Operator | Name | Description | Example | Result |
|---|---|---|---|---|
| + | Addition | Adds two numbers | 2 + 3 | 5 |
| - | Subtraction | Subtracts the second number from the first | 5 - 2 | 3 |
| * | Multiplication | Multiplies two numbers | 4 * 3 | 12 |
| / | Division | Divides the first number by the second (the result is always a float) | 10 / 2 | 5.0 |
| // | Floor Division | Divides and rounds the result down to the nearest whole number | 10 // 3 | 3 |
| % | Modulus | Returns the remainder of division | 10 % 3 | 1 |
| ** | Exponentiation | Raises first number to the power of second | 2 ** 3 | 8 |
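The operators in the table can be tried directly. The short sketch below also includes one negative example to show that floor division rounds down rather than simply dropping the decimal part:

```python
print(2 + 3)    # 5
print(5 - 2)    # 3
print(4 * 3)    # 12
print(10 / 2)   # 5.0 (division always gives a float)
print(10 // 3)  # 3
print(-7 // 2)  # -4 (rounds down, not toward zero)
print(10 % 3)   # 1
print(2 ** 3)   # 8
```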

2.2.3 Data structures that can be accessed using [ ]

It’s not hard to imagine that to analyze data, we might want to store several items together, and consider the collection of those items to be a meaningful object on its own. For the following types of objects, if you know where an item is stored (via an index or a key), the [ ] notation after the object’s name will access and return that item to you.

The most important data structures in basic Python for data science.

| Type of Structure | What is it? | How to Create | How to access | Returns |
|---|---|---|---|---|
| List | A collection of objects arranged in an ordered sequence. | my_list = [1, 2, "a", "things"] | Provide the index (numerical location). List indices in Python start at 0. Example: my_list[2] | "a" |
| Dictionary | A collection of “key-value” pairs, where knowing the key gives you access to the value. | my_dict = {"key1":"value1", "apple":"red"} | Provide the key (usually a string or number). Example: my_dict["apple"]. For a list of keys: my_dict.keys() | "red" or (for the second command) dict_keys(['key1', 'apple']), a list-like view of the keys |
| Tuple | Basically a list, but cannot be edited once it is created. | my_tuple = (1, 2, "a") | my_tuple[2] | "a" |
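The sketch below creates the three structures from the table and accesses items with the [ ] notation. The last few lines illustrate the difference between a list (editable) and a tuple (not editable):

```python
my_list = [1, 2, "a", "things"]
my_dict = {"key1": "value1", "apple": "red"}
my_tuple = (1, 2, "a")

print(my_list[2])            # a  (indices start at 0)
print(my_dict["apple"])      # red
print(list(my_dict.keys()))  # ['key1', 'apple']
print(my_tuple[2])           # a

# Lists can be edited after they are created
my_list[0] = 100
print(my_list)               # [100, 2, 'a', 'things']

# Tuples cannot: trying to edit one raises a TypeError
try:
    my_tuple[0] = 100
except TypeError as err:
    print("Tuples cannot be edited:", err)
```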

2.3 Functions and Methods

“Functions” and “Methods” are two different types of command to request that Python do something.

Python has several different ways of “doing things” with a few different allowable notations depending on the type of command. I want you to be exposed to the notation (syntax). These are patterns of writing code, sort of like grammar. If your code does not follow the rules, then it will not run.

2.3.1 Functions (Parentheses Notation)

One of the most common ways to combine or manipulate values in Python is by calling functions. Python comes with many built-in functions that perform common operations. Functions are a specific type of object that can do things, including to other objects. You can think of functions as action verbs.

You have already seen the print function. It takes one or more arguments, which is another word for inputs. The arguments must be contained in parentheses, and separated by commas. The print function then does the action, “print what it’s given.”

However, saving the output of print() by writing objectname = print("something") sets objectname to the value None (which has type NoneType). In programming lingo, the print function does not return anything that can be used later. Notice how this behaves by running the cells below, in order!
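The original cells are not reproduced here. A minimal sketch of the behavior described above (the name result is hypothetical):

```python
# print() displays its argument, but hands back None
result = print("something")

print(result)        # None
print(type(result))  # <class 'NoneType'>
```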

Some functions take multiple arguments, separated by commas. For example, the built-in max function returns the maximum argument passed to it, by finding and returning the furthest-right number on the number line. Run the cell below for an example.
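A short sketch of calling max with several arguments (the numbers are arbitrary). Unlike print, max returns a value that can be saved and used later:

```python
# max returns the largest of its arguments
largest = max(3, 10, -2)
print(largest)  # 10
```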

2.3.2 Some built-in Python functions

Some important built-in functions in Python are listed in the official documentation here.

In the future, you might need to work with some of the functions below:

| function | syntax | valid arguments | behavior |
|---|---|---|---|
| abs | abs(x) | Single int or float | Returns the absolute value of a number |
| help | help(obj) | The name of an object (including another function) in Python | Prints help documentation for the object, if it exists; returns None. |
| len | len(s) | s can be a string, a list, or most things that can be thought of as “made up of many simpler objects”; we will see more examples | Returns how many elements are in the object; for a string, this is how many characters (including spaces!) there are. |
| max | max(a,b,c,...) | Numbers, strings, etc., as long as they are all the same type and can be compared. (Also accepts a list of these things.) | Returns the maximum (highest) value of its arguments. This defaults to, for example, the maximum number, or the latest string in alphabetical order. |
| min | min(a,b,c,...) | Numbers, strings, etc., as long as they are all the same type and can be compared. (Also accepts a list of these things.) | Returns the minimum (lowest) value of its arguments. This defaults to, for example, the minimum number, or the earliest string in alphabetical order. |
| pow | pow(base,exp) | Numbers | Returns the value of \(base^{exp}\). (You can also write base**exp, but NOT base^exp; this does something else entirely.) |
| print | print(a,b,c,...) | Any number of objects separated by commas | Prints the objects; returns None |
| round | round(x, ndigits=None) | x must be numeric. Optionally, ndigits, if you supply a value, must be an integer (it can be negative!). | Returns the result of rounding x to the specified number of digits after the decimal place. Defaults to rounding to an integer. |
| sorted | sorted(object) | Object can be a string, a list of numbers, or a list of any other types of objects that can be compared with \(\leq\) | Returns the elements of the object in order from lowest to highest (or alphabetical order) |
| sum | sum(object) | A list of numbers | Returns the sum of all the numbers in the list |
| type | type(object) | Any object | Returns the type of the object |
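A few of the functions in the table, tried on small inputs chosen for illustration. The last line shows why base^exp should not be used for powers: in Python, ^ is a different operator (bitwise XOR).

```python
print(abs(-3.5))            # 3.5
print(len("hello world"))   # 11 (the space counts as a character)
print(max("apple", "pear")) # pear (latest in alphabetical order)
print(round(3.14159, 2))    # 3.14
print(round(1234.5, -2))    # 1200.0 (negative ndigits rounds to hundreds)
print(sorted([3, 1, 2]))    # [1, 2, 3]
print(sum([1, 2, 3]))       # 6
print(type(5.0))            # <class 'float'>

# pow(2, 3) and 2 ** 3 agree; 2 ^ 3 does something else entirely
print(pow(2, 3), 2 ** 3, 2 ^ 3)  # 8 8 1
```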

2.3.3 Methods (Dot notation)

Some objects have built-in methods. Methods use slightly different notation than functions, but also tell Python to “do something.” The following method replaces all of the space characters with dashes:
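The original cell is not reproduced here. A minimal sketch using the string from the table of basic types (the name dashed is hypothetical):

```python
my_str = "Anything goes between these quotes!"

# replace is a string method: it is called on the string with dot notation
dashed = my_str.replace(" ", "-")
print(dashed)  # Anything-goes-between-these-quotes!

# The original string is unchanged; replace returns a new string
print(my_str)
```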

The replace method we used here is a little different from the functions we used previously (e.g. print). Methods are typically called using the dot notation, e.g. stringname.replace("a","b"), because a method only works for the particular kind of object it was designed for. Here the replace method was written specifically to work with string objects. In contrast, the print function was written to work with many kinds of objects, so we don’t use the dot notation.

You should be aware, however, that dot notation is also often used in other scenarios, such as to access objects and functions that come from a package, so “using dot notation” is not the only indicator of whether you are using a function or a method. If you are ever in doubt about the right syntax (order of symbols) to use for a function or method, do not forget that you can use the help(objectname) function to access the documentation that exists for a type of object.

2.3.4 Attributes (dot notation, no parentheses)

Some objects have attributes, which are characteristics that can also be accessed using the dot notation but do not need parentheses. Attributes are like adjectives, and are descriptive.

For example, the object prof_masden might have an attribute hair_color, which you can access with the command

prof_masden.hair_color

In this case, there are no parentheses after hair_color because accessing hair_color does not do anything; it just retrieves existing information about the object prof_masden. If prof_masden were a Python object, then the command prof_masden.hair_color should return "red".

In fact, even the print function is itself an object, and it has a name that can be accessed using the special __name__ attribute!

Run the following cell to see for yourself.
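The original cell is not reproduced here. A minimal sketch:

```python
# No parentheses after print: we are accessing an attribute, not calling the function
print_name = print.__name__
print(print_name)  # print
```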

2.3.5 A comparison between functions, methods and attributes

As you get used to working with Python, this information will feel more natural. However, this comparison table should help you keep the vocabulary straight and recognize valid Python syntax.

| Method | Function | Attribute |
|---|---|---|
| Does things | Also does things | Is descriptive |
| Works only for objects of a particular type | Might work with many types of objects | Works only for objects of a particular type |
| Might require arguments | Usually requires arguments | Does not have arguments |
| Called using dot notation: object_name.method_name() or sometimes object_name.method_name(arg1,arg2) | Can be called without dots: function_name(arg1,arg2) or sometimes function_name() | Accessed using dot notation, but no parentheses: object_name.attribute_name |

2.4 Some String Methods

I will add some string methods here once we need them.

2.5 Writing a custom function

You will have to write lots of custom functions in this class.

When you write a function, you need to give it placeholders for its arguments. In this context, we call those placeholders parameters.

You then need to tell the function what to do with those arguments. All instructions must be indented an equal amount.

The end of a function is usually signified by the command return. If you want the function to actually output anything, the objects it returns should appear after the return command. A function that finishes without reaching a return command returns None.

For example, the following function is designed to compute the cost of a party based on the number of balloons, price of a balloon, number of children, and price of favors (of which each child receives one).
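The original cell is not reproduced here. The sketch below follows the description above; the function name, parameter names, and example prices are all hypothetical choices:

```python
def party_cost(num_balloons, balloon_price, num_children, favor_price):
    '''
    Compute the total cost of a party.

    Each balloon costs balloon_price, and each child
    receives one favor costing favor_price.
    '''
    balloon_total = num_balloons * balloon_price
    favor_total = num_children * favor_price
    return balloon_total + favor_total


# 20 balloons at 0.50 each, 8 children with favors at 3.00 each
print(party_cost(20, 0.5, 8, 3.0))  # 34.0

# A different party: same function, different values
print(party_cost(50, 0.5, 12, 2.0))  # 49.0
```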

Once you have defined it by running the previous cell, you can now call this function, and change values to compute the cost of many different parties.

The text between the three single quotes is called the docstring. It is what gets printed when you call help() on the function, e.g. help(function_name).

Unless a function is very simple, you should write something in the docstring so you know what it does.

2.6 Conditionals (if-then logic)

Sometimes you want a piece of code to do something different depending on a condition. To write code that runs when one condition is true, and does something else otherwise, use the format:

if condition: 
    do_thing() 
else: 
    do_something_else()

For instance, the absolute value function is given by:

\[ |x| = \begin{cases}x & \textrm{if }x\geq 0; \\ -x & \textrm{otherwise} \end{cases} \]

Even though the absolute value function is built into Python, it’s not a bad thing to see how we could write it “from scratch”.
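The original cell is not reproduced here. One way to write it, as a sketch with a single return at the end (the name my_abs is hypothetical, chosen to avoid overwriting the built-in abs):

```python
def my_abs(x):
    '''Return the absolute value of the number x.'''
    if x >= 0:
        result = x
    else:
        result = -x
    return result


print(my_abs(4))     # 4
print(my_abs(-2.5))  # 2.5
print(my_abs(0))     # 0
```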

If you run the cell above, you can also run the cell below to see that this function does what you want. Feel free to change the inputs if you want.

There are more concise ways to write this. In Python, the return command can show up at multiple places within a function, for instance, after an if statement. However, this class is not about mastering all nuances of Python.

2.7 Loops (Repeated Instructions)

If you want to tell a computer to do “basically” the same thing many times in a row, do not write out the same code over and over. Instead, use a loop.

The most common type of loop you will be expected to use in this class is the for loop, which repeats an instruction for every object in a list and/or dictionary. It follows the general syntax:

for item in iterable: 
    # do something 

For example, let’s look at the list and the dictionary from Data structures that can be accessed using [ ]. These are both iterables, and can be looped through using a for loop in a similar way, with some important distinctions. The following syntax works for both lists and dictionaries, but be careful! Notice that if you change my_list to my_dict below, the loop only lists the keys! This is intended behavior, but it may take a while to get used to.
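The original cell is not reproduced here. The sketch below shows the loop over my_list, followed by the same loop with my_list changed to my_dict so that the two can be compared:

```python
my_list = [1, 2, "a", "things"]
my_dict = {"key1": "value1", "apple": "red"}

# Looping over a list visits each item in order
for item in my_list:
    print(item)

# Looping over a dictionary visits its keys only: key1, then apple
for item in my_dict:
    print(item)
```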

Changing the line print(item) to the line print(my_dict[item]) is how you access the values of the dictionary instead of its keys.
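As a sketch of that change, using the same dictionary as before:

```python
my_dict = {"key1": "value1", "apple": "red"}

# item is a key; my_dict[item] looks up the matching value
for item in my_dict:
    print(my_dict[item])  # prints value1, then red
```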

A common iterable to use is the output of the range(n) function, which produces a range of integers from 0 to n-1.

Valid syntax includes:

  • range(n) (integers from 0 to n-1),

  • range(start, stop) (integers from “start” to “stop-1”), and

  • range(start, stop, step) which starts at “start”, stops before “stop”, and has step sizes of size step (this can be negative!)

For example, see if you can modify the below code to print only the numbers 3, 5, 7, and 9. There is more than one way to do this!
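The original cell is not reproduced here. A plausible starting point to modify, assuming the original simply printed the integers from 0 to 9:

```python
# Starting point: prints the integers 0 through 9, one per line
for i in range(10):
    print(i)
```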

You can iterate through a list using indices produced by range(). This is more similar to the process we used for dictionaries, where the iteration goes through keys. If you change the range to something with a start, stop, step, see if you can get this list to be printed backwards instead of forward.
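The original cell is not reproduced here. A minimal sketch of looping through a list by index, printing each index next to the item stored there:

```python
my_list = [1, 2, "a", "things"]

# range(len(my_list)) produces the valid indices 0, 1, 2, 3
for i in range(len(my_list)):
    print(i, my_list[i])
```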

2.8 Importing Packages

A package is a bunch of code that someone (or several other people) has written in order to enable certain Python functionality. In general, when we want to use a package, we import it.

import pandas as pd

Importing follows the format, import package_name as alias

In other words, the package_name refers to the actual name of the package, and the alias is a (usually shorter) version of the same name. Functions, objects and methods that come from a package can then be accessed using dot notation as well. For instance, the following creates a DataFrame object in Pandas and views it. This is the most important part.

df = pd.DataFrame({
    "col1": [1,2,3], 
    "col2": [4,5,6]
})

df
col1 col2
0 1 4
1 2 5
2 3 6

Side note: the DataFrame() initialization function here takes a dictionary as input! Each key becomes a column name, and each value is the list of entries in that column.

2.8.1 Installing packages locally

In general, packages do not come bundled with Python. If you’re not working in a curated environment, you need to install them yourself before you can import them. See the installation instructions in the appendix for details.

2.9 Pandas DataFrames

Pandas is the most popular Python package for tabular data - but as we will see, data does not always come in tabular format!

You should know, at least, the following:

To be added.

2.9.1 Loading a csv (comma-separated values) file in Pandas.

Lots of tabular data is available as comma-separated values (csv) files. For instance, there are many interesting csv files available at Larry Winner’s website at the University of Florida (https://users.stat.ufl.edu/~winner/data). The code below loads one of the datasets (about lead in lipsticks, documentation here) and shows its first five rows with the .head() method. If you open the file itself in a text editor (like Notepad), its first few lines look like this:

JRC_code,purchCntry,prodCntry,Pb,sdPb,shade,prodType,priceCatgry 
C135,NL,NL,3.75,0.24,Red,LP,2 
C18,FI,FI,2.29,0.07,Red,LP,2 
C20,FI,IT,1.27,0.06,Red,LP,2

Notice that the headers are separated by commas, and each line after the first gives a new row of data, but the raw file is painful to read. To load this file into pandas and view the first five rows of the table, instead run:

import pandas as pd 

lead_in_lipstick = pd.read_csv("https://users.stat.ufl.edu/~winner/data/lead_lipstick.csv")

lead_in_lipstick.head(5)
JRC_code purchCntry prodCntry Pb sdPb shade prodType priceCatgry
0 C135 NL NL 3.75 0.24 Red LP 2
1 C18 FI FI 2.29 0.07 Red LP 2
2 C20 FI IT 1.27 0.06 Red LP 2
3 C164 DE FR 1.21 0.06 Red LP 2
4 C71 MT UK 0.85 0.04 Red LP 2

2.9.2 Working with Pandas

Refer to the Python Data Science book, chapter 3, for reference material on using the Pandas package for dataframe manipulation beyond the basics (Tiffany Timbers and Heagy (2022)).