Python

Readings

Required (prior to class)

login as: tmoore
tmoore@minnow.wellesley.edu's password:
Last login: Tue Jan 24 07:12:39 2012 from puma.wellesley.edu
[tmoore@minnow ~] python
Python 2.7 (r27:82500, Sep 16 2010, 18:03:06)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'hi'
hi
>>> 3+4
7
>>>

Optional (If you're new to Python and want to learn more than what is in these notes, then these are good resources.)

Why Python?

We already know Java. Why learn Python?

Python is a very likable language. Some things that I like about it:

Python as an Environment

One of the best ways to learn Python is to type expressions into the interpreter. Just give the command python to your Linux or Mac shell, or from a PuTTY SSH terminal in Windows, and start typing expressions.

[tmoore@minnow ~] python
Python 2.7 (r27:82500, Sep 16 2010, 18:03:06)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3
>>> b = 4
>>> import math
>>> c = math.sqrt(a*a+b*b)
>>> c
5.0

To exit, invoke the quit() function or type a control-D.

Python as a Language

Python code looks like no other code that I'm familiar with. It's a complete departure from the C family of languages.

# Like shell scripts, it uses # as an end-of-line comment character

import math    # packages are "loaded" by the import statement

a = 3          # no need to declare types; very dynamic
b = 4
c = math.sqrt(a*a+b*b)

So far, not too bad. Let's see some more syntax:

if a == b:
    print 'a and b are the same'
else:
    print 'a and b differ'
print "let's go on"

Hmmm. Where are the parens and braces? Gone! Python knows that the else section is over because the indentation ends. That's right, the indentation has syntactic meaning in Python. So, the following two programs are different in Python.

#first code block
i = 0
while i < 10:
    i += 1
print i

#second code block
i = 0
while i < 10:
    i += 1
    print i

The first code block prints only the final value of i (10), while the second one prints every value from 1 through 10, because the print statement is inside the loop.

Functions in Python

Function declarations in Python are simple: a name, a formal argument list (no data types), and a body. The end of the body is, as expected, signaled by the end of indentation.
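The mathfuns.py file itself is not reproduced in these notes, so the following is only a sketch of what such a module might contain. The names hypo, gcd, and fibonacci match the calls used later in this section, but the bodies are assumptions, written so that they run under both Python 2.7 and Python 3.

```python
# Hypothetical contents for a module like mathfuns.py; the bodies are
# illustrative guesses, not the original file.
import math

def hypo(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a*a + b*b)

def gcd(a, b):
    """Return the greatest common divisor of two positive integers,
    using Euclid's algorithm by repeated subtraction."""
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a

def fibonacci(n):
    """Return the nth Fibonacci number (naive recursion, so it is
    very slow for large n)."""
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
```

Note that each body ends where the indentation returns to the left margin, and that no argument or return types are declared.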

You can try these by downloading the mathfuns.py Python file, importing the contents into Python, and running the functions:

>>> import mathfuns
>>> mathfuns.hypo(5,12)
13.0
>>> mathfuns.gcd(55,89)
1

You can avoid the filename (which is also the name of the module) by importing particular members or all members:

>>> from math import sqrt
>>> sqrt(9)
3.0
>>> from mathfuns import *
>>> fibonacci(100)   # too long!
>>> gcd(30,50)
a is 30 and b is 50
a is 30 and b is 20
a is 10 and b is 20
a is 10 and b is 10
10

Data types in Python

All the examples so far have been numeric, for no good reason but that numbers don't need much introduction. Let's look at some more interesting datatypes. To play with these test values, download this sampledata.py file.

Strings

Strings pretty much work as you expect. You can concatenate them with the + operator. You can take their length. You can print them.

>>> from sampledata import *
>>> x+x
'spam, spam, '
>>> x+x+y+' and '+x
'spam, spam, eggs,  and spam, '
>>> x+x+y+'and '+x
'spam, spam, eggs, and spam, '
>>> len(x)
6
>>> len(x+y)
12
>>> print(x+y)
spam, eggs,

You can substitute stuff into them like the C printf statement, using % and a letter code to indicate the type and format. Here are the most common codes:

Code   Data type                     Example expression     Resulting string
%s     String                        'Hi %s' % ('Tyler')    'Hi Tyler'
%i     Integer                       '%i+%i' % (1,2)        '1+2'
%f     Float                         '%f+%i' % (1.1,2)      '1.100000+2'
%.nf   Float with n decimal places   '%.2f+%i' % (1.1,2)    '1.10+2'

For complete reference see the Python documentation.
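As a small illustration of these codes working together, here is a sketch that builds a few strings from made-up values (the names name, count, and price are not from the notes). It is written to run under both Python 2.7 and Python 3.

```python
# The values below are made up for illustration.
name = 'Tyler'
count = 3
price = 2.5

# A single value can follow % directly; several values go in a tuple.
greeting = 'Hi %s' % name
summary = '%s bought %i items at %.2f each' % (name, count, price)
total = 'Total: %.2f' % (count * price)

print(greeting)   # Hi Tyler
print(summary)    # Tyler bought 3 items at 2.50 each
print(total)      # Total: 7.50
```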

Lists

Lists are denoted with square brackets with commas between the elements. You can index them numerically, and extract sub-lists. You can append stuff onto the end (actually, either end). You can store into them. Lists are one of the most useful and commonly used data structures in Python.

>>> from sampledata import *
>>> len(cheeses)
6
>>> cheeses[0]
'swiss'
>>> cheeses[1:3]
['gruyere', 'cheddar']
>>> cheeses[1:4]
['gruyere', 'cheddar', 'stilton']
>>> cheeses.append('gouda')
>>> cheeses
['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']
>>> cheeses[0] = 'emmentaler'
>>> cheeses
['emmentaler', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']

The append shows how to invoke a method on a list, and that lists are mutable, unlike tuples.
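The remark about adding to either end can be sketched as follows, using a short made-up list rather than the one in sampledata.py. The methods shown (append, insert, pop) are standard list methods; the example runs under both Python 2.7 and Python 3.

```python
# A small made-up list, so the example does not depend on sampledata.py.
cheeses = ['swiss', 'gruyere', 'cheddar']

cheeses.append('gouda')        # add to the end
cheeses.insert(0, 'brie')      # add to the front
last = cheeses.pop()           # remove and return the last element
first_two = cheeses[:2]        # slicing makes a new list; cheeses is unchanged

print(cheeses)     # ['brie', 'swiss', 'gruyere', 'cheddar']
print(last)        # gouda
print(first_two)   # ['brie', 'swiss']
```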

Tuples

Tuples are just like lists, except that they use parentheses instead of square brackets and they are immutable.

>>> from sampledata import *
>>> len(troupe)
6
>>> troupe[0] = 'Homer'   # won't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>

Dictionaries

Like all civilized languages, Python has hashtables built-in, except that Python calls them dictionaries (like Smalltalk). Dictionaries are another fundamental data structure in Python that you will use frequently.

>>> from sampledata import *
>>> uni
{'Jones': 'Oxford', 'Gilliam': 'Occidental', 'Cleese': 'Cambridge', 'Chapman': 'Cambridge', 'Idle': 'Cambridge', 'Palin': 'Oxford'}
>>> uni['Palin']
'Oxford'
>>> uni['Palin'] = 'Oxford University'
>>> uni['Palin']
'Oxford University'
>>> uni.keys()
['Jones', 'Gilliam', 'Cleese', 'Chapman', 'Idle', 'Palin']
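A few other everyday dictionary operations, sketched on a made-up subset of the data above: adding a key, testing for membership, looking up with a default, and deleting. The example runs under both Python 2.7 and Python 3.

```python
# A made-up subset of the uni dictionary.
uni = {'Jones': 'Oxford', 'Cleese': 'Cambridge'}

uni['Gilliam'] = 'Occidental'            # assigning to a new key adds an entry
has_idle = 'Idle' in uni                 # membership test on the keys
idle_uni = uni.get('Idle', 'unknown')    # get() returns a default instead of raising KeyError
del uni['Jones']                         # remove an entry

print(has_idle)             # False
print(idle_uni)             # unknown
print(sorted(uni.keys()))   # ['Cleese', 'Gilliam']
```

The keys are sorted before printing because a dictionary does not promise any particular key order in Python 2.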

Iteration in Python

We've seen the usual while loop above, which works the way you would expect. The Python for loop, however, is very powerful, due to its natural integration with iterators. You've seen iterators in Java, but the implementation in Python is much cleaner.

Essentially, if you pass a list or dictionary into a for loop, it will iterate over the list's values or the dictionary's keys.

>>> from sampledata import *
>>> for cheese in cheeses:
...     print '%s is tasty' % (cheese)
...
swiss is tasty
gruyere is tasty
cheddar is tasty
stilton is tasty
roquefort is tasty
brie is tasty

Similarly for dictionaries:

>>> for name in uni:
...     print name, "went to", uni[name]
...
Jones went to Oxford
Gilliam went to Occidental
Cleese went to Cambridge
Chapman went to Cambridge
Idle went to Cambridge
Palin went to Oxford University

There's even a built-in dictionary method, iteritems(), that will give you the key-value pairs directly as tuples:

>>> for k, v in uni.iteritems():
...     print k, "went to", v
... 
Jones went to Oxford
Gilliam went to Occidental
Cleese went to Cambridge
Chapman went to Cambridge
Idle went to Cambridge
Palin went to Oxford University

If you'd really like to emulate C-style for loops over numbers, you can do so using the range() function:

>>> range(4)
[0, 1, 2, 3]
>>> for i in range(len(cheeses)):
...     print cheeses[i]
...
swiss
gruyere
cheddar
stilton
roquefort
brie

You shouldn't normally need to do this, given that lists, dictionaries, and most objects have built-in iterators.
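When you do need the position as well as the value, the built-in enumerate() function is one alternative to range(len(...)). A minimal sketch on a made-up list, which runs under both Python 2.7 and Python 3:

```python
cheeses = ['swiss', 'gruyere', 'cheddar']   # made-up sample data

labels = []
for i, cheese in enumerate(cheeses):        # yields (index, value) pairs
    labels.append('%i: %s' % (i, cheese))

print(labels)   # ['0: swiss', '1: gruyere', '2: cheddar']
```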

List Comprehensions

Here's a common coding task: iterate over some list, perform some action on each element of that list, and store the results in a new list. Here's an example:

>>> cheeselen=[]
>>> for c in cheeses:
...     cheeselen.append(len(c))
...
>>> cheeselen
[5, 7, 7, 7, 9, 4]

It turns out that there is a way to do this in just one line of code using list comprehensions:

cheeselen=[len(c) for c in cheeses]
cheeselen

But wait, there's more! Suppose you only want to add items to the list if they meet a certain condition, say if the item begins with the letter s. Well here's the long way:

>>> scheeselen=[]
>>> for c in cheeses:
...     if c[0]=='s':
...             scheeselen.append(len(c))
...
>>> scheeselen
[5, 7]

It turns out you can even add a conditional to list comprehensions. So the 4 lines of code become this:

scheeselen=[len(c) for c in cheeses if c[0]=="s"]
scheeselen

While this may seem a bit esoteric, list comprehensions are incredibly useful when coding quickly. It's worth getting used to them.
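The element expression in a comprehension can be any expression, not just len(). The sketch below transforms and filters in one step, and also shows the closely related dictionary comprehension, which is available from Python 2.7 onward. The variable names long_names and length_of are made up for this example.

```python
cheeses = ['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie']

# Transform and filter: uppercase the names longer than five letters.
long_names = [c.upper() for c in cheeses if len(c) > 5]

# A dictionary comprehension (Python 2.7 and later) maps each name to its length.
length_of = {c: len(c) for c in cheeses}

print(long_names)          # ['GRUYERE', 'CHEDDAR', 'STILTON', 'ROQUEFORT']
print(length_of['brie'])   # 4
```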

String Manipulation

Strings in Python have a number of useful built-in methods. Two of the most useful are split() and join(). The split() method breaks up a string at each occurrence of the separator passed to it and places the pieces into a list.

>>> x="apple,orange,banana,mango"
>>> fruits=x.split(",")
>>> fruits
['apple', 'orange', 'banana', 'mango']

Called with no argument, split() splits on runs of whitespace and discards the whitespace. So for instance:

>>> "Hi there how are you      doing".split()
['Hi', 'there', 'how', 'are', 'you', 'doing']

The inverse of split is join(). Because join() is a method for strings, you invoke it after the string and pass in a list that needs gluing together. Suppose we want to rejoin our fruits list, this time with semicolons:

>>> fruits
['apple', 'orange', 'banana', 'mango']
>>> semifruits=";".join(fruits)
>>> semifruits
'apple;orange;banana;mango'

It may seem a bit strange to invoke a method directly from a string. If it just looks too weird to you, you can always assign the string to a variable first:

semicolon=";"
semifruits=semicolon.join(fruits)
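Putting split() and join() together, here is a sketch of a round trip on a single comma-separated line. The sample line and the variable names are made up; the point is that split() always yields strings, so numbers must be converted explicitly, and join() accepts only strings. It runs under both Python 2.7 and Python 3.

```python
line = "apple,34,fruit\n"              # a made-up comma-separated line

fields = line.strip().split(",")       # ['apple', '34', 'fruit']
name, count, kind = fields             # unpack the three pieces
count = int(count)                     # split() yields strings; convert explicitly

rebuilt = ",".join([name, str(count), kind])   # join() needs strings, not ints
print(rebuilt)                         # apple,34,fruit
```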

File I/O in Python

File I/O is pretty straightforward. If you have a text file, you use the built-in open() function. The easiest way to read a file line-by-line is to use the file object's built-in iterator. Suppose we want to read in the file fruitveg.csv:

apple,34,fruit
pear,3,fruit
lettuce,4,veg
potato,15,veg
mango,22,fruit

We could simply print the file out one line at a time:

>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line
...
Here is a line: apple,34,fruit

Here is a line: pear,3,fruit

Here is a line: lettuce,4,veg

Here is a line: potato,15,veg

Here is a line: mango,22,fruit

We end up with an extra line because there is a newline character at the end of each line, but one is also added at the end of a print statement. To fix that, we can use the string method strip():

>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line.strip()
...
Here is a line: apple,34,fruit
Here is a line: pear,3,fruit
Here is a line: lettuce,4,veg
Here is a line: potato,15,veg
Here is a line: mango,22,fruit

Suppose we wanted to create a dictionary mapping the fruit name to a list with the other values in each line. Here's what we would need to do:

fvmap={}
for line in open('public_html/data/fruitveg.csv'):
    bits=line.split(',')
    fvmap[bits[0]]=[bits[1],bits[2].strip()]

>>> fvmap
{'lettuce': ['4', 'veg'], 'mango': ['22', 'fruit'], 'pear': ['3', 'fruit'], 'apple': ['34', 'fruit'], 'potato': ['15', 'veg']}

Food for thought: How can you modify this code to work for any length list, not only two-element lists?

To write to a file, you also use the open() function, but this time include an additional parameter, the mode 'w'. For instance, to write the data from fruitveg.csv back out with the second and third columns swapped:

f=open('public_html/data/fruitveg2.csv','w')
for k in fvmap:
    f.write('%s,%s,%s\n'%(k,fvmap[k][1],fvmap[k][0]))

f.close()

You can also append to the end of an existing file by passing 'a' as the mode instead of 'w'.
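The difference between the write and append modes can be sketched as follows. The file name and its location in a temporary directory are made up for the example, and the with statement (available in Python 2.6 and later) is used here as one way to make sure each file is closed; it is not the style used elsewhere in these notes.

```python
import os
import tempfile

# Use a throwaway file in a temporary directory (a made-up path).
path = os.path.join(tempfile.mkdtemp(), 'fruitveg_demo.csv')

with open(path, 'w') as f:        # 'w' creates or truncates the file
    f.write('apple,34,fruit\n')

with open(path, 'a') as f:        # 'a' keeps existing content and adds to the end
    f.write('pear,3,fruit\n')

with open(path) as f:             # the default mode is read
    lines = [line.strip() for line in f]

print(lines)   # ['apple,34,fruit', 'pear,3,fruit']
```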

Variables in Python

Variables are created when they are assigned; no declaration is needed. They are also implicitly typed; there is no need to specify whether it's an int, string, float, etc.

>>> x=4
>>> y=4.2
>>> z='hamster'
>>> type(x)
<type 'int'>
>>> type(y)
<type 'float'>
>>> type(z)
<type 'str'>

For scoping, if the variable is created inside a function, the variable's scope is local to that function. If the variable is created outside a function, it is global (to that file/module).

x=3
def computeFun():
    x=4
    return "the value of fun is %i" % x

def computeFun2():
    return "the value of fun is %i" % x

>>> computeFun()
'the value of fun is 4'
>>> computeFun2()
'the value of fun is 3'

Function code can refer to (get values from) a global variable that already exists (see computeFun2() above). However, if the function wants to modify an existing global variable, then it must be explicitly referred to in the function using the global keyword. Otherwise, the interpreter would allocate a local variable just for the function:

y=1
def computeLife():
    y=42
    return "the meaning of life is %i" % y

def computeLife2():
    global y
    y=42
    return "the meaning of life is %i" % y

>>> computeLife()
'the meaning of life is 42'
>>> y
1
>>> computeLife2()
'the meaning of life is 42'
>>> y
42
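One consequence of this rule is worth seeing once: if a function both reads and assigns a global name without declaring it global, the name is treated as local throughout the function and the read fails. The sketch below uses made-up names (counter, bump_broken, bump) and runs under both Python 2.7 and Python 3.

```python
counter = 0

def bump_broken():
    # Assigning to counter anywhere in the body makes it local to the
    # function, so reading it first fails with UnboundLocalError.
    counter = counter + 1
    return counter

def bump():
    global counter          # refer to the module-level variable instead
    counter = counter + 1
    return counter

try:
    bump_broken()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

print(outcome)   # UnboundLocalError
print(bump())    # 1
```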

Executable Python Scripts and Modules

You can put a bunch of Python code, including function definitions and such, into a file and run it. Look at now_v1.py:

#!/usr/bin/python

from datetime import datetime

now = datetime.now()
print now.strftime("%Y-%m-%d")  # Use the internet standard

You can run it from the shell as follows:

python now_v1.py
2012-01-31

Sometimes it is more convenient to turn the file into an executable script:

chmod a+rx now_v1.py
./now_v1.py
2012-01-31

Modules in Python

We've also seen that we can import functions and other useful stuff from files into Python. It's smart to write your Python code so that you can use it that way, invoking it from other Python code. Look at now_v2.py:

#!/usr/bin/python

from datetime import datetime

def now():
    """Returns a string for the current day in internet format:  YYYY-MM-DD"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d")  # Use the internet standard

Here's how we could use it:

>>> import now_v2
>>> now_v2.now()
'2011-02-17'
>>> print now_v2.now()
2011-02-17

But now it doesn't work as a shell script. Can we do both? Yes, there's a trick! Look at now_v3.py:

#!/usr/bin/python

from datetime import datetime

def now():
    """Returns a string for the current day in internet format:  YYYY-MM-DD"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d")  # Use the internet standard

# the following code is only executed if this file is invoked from the
# command line as a script, rather than loaded as a module.

if __name__ == '__main__':
    print now()

And as a script:

chmod a+rx now_v3.py
./now_v3.py
2011-02-17
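A common variation on this trick, sketched below, is to keep the real work in a function that takes its input as a parameter and to put the script behavior in a separate main() function. The names format_day and main are made up for this example and are not part of the now_v*.py files. The code runs under both Python 2.7 and Python 3.

```python
from datetime import datetime

def format_day(when):
    """Return the date part of a datetime as a YYYY-MM-DD string."""
    return when.strftime("%Y-%m-%d")

def main():
    """Entry point used only when the file is run as a script."""
    print(format_day(datetime.now()))

# Only runs when the file is executed directly, not when it is imported.
if __name__ == '__main__':
    main()
```

Because format_day() takes the datetime as an argument, code that imports the module can call it with a fixed date; for example, format_day(datetime(2012, 1, 31)) returns '2012-01-31'.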

PyDoc: Self-documenting Scripts

To get the documentation on a Python module, including one you write yourself, you can use the pydoc shell command:

pydoc mathfuns

prints the documentation for mathfuns right to your screen. (You can also set up pydoc as a web server, which is very cool.)

Of course, the documentation that pydoc gives you comes from the author of the module, and when you write Python code, you shoulder the responsibility of documenting what you create.

Give every function a meaningful documentation string. Write the kind of documentation you'd like to read if you wanted to know how to use the function. The string goes (in triple-quotes, which allows for multiple lines) as the first element of the function definition.

def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42
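The documentation string is not just a comment: Python stores it on the function object, which is where tools such as pydoc and the interactive help() function find it. A minimal sketch, which runs under both Python 2.7 and Python 3:

```python
def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42

# The docstring is stored on the function object itself.
doc = computeLife.__doc__
print(doc)   # This function returns the meaning of life in integer form

# In the interpreter, help(computeLife) displays the same text.
```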

Some additional guidelines and information:

Documenting Functions
Docstring conventions

Python Semantics

Python is a language that is byte-compiled and interpreted on the fly, like Perl, PHP, JavaScript (but not Java) and many other scripting languages. It is dynamically typed, with types on objects rather than on variables, and strongly typed in the sense that it does not silently convert between unrelated types. It is mostly lexically scoped. Objects are allocated from the heap; almost nothing is allocated on the stack, so you can return objects from functions.
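To make "types on objects rather than on variables" concrete, the sketch below rebinds one name to objects of different types, and shows that combining a string with an integer raises an error rather than being converted implicitly. The variable names are made up; the code runs under both Python 2.7 and Python 3.

```python
x = 4
kind_before = type(x).__name__    # 'int'
x = 'hamster'                     # the same name can be rebound to a str
kind_after = type(x).__name__     # 'str'

try:
    result = '3' + 4              # str + int is not converted implicitly
except TypeError:
    result = 'TypeError'

print(kind_before)   # int
print(kind_after)    # str
print(result)        # TypeError
```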

Versions of Python

Python is in active development, and new versions come out regularly. As of this writing (2011), the latest stable release is version 2.7. Puma is running two versions: 2.4 and 2.6. The 2.4 version is the default version. The Linux workstations run 2.7.

There is also Python version 3. Version 3 substantially modifies the language and is not backwards-compatible with version 2. This means that packages developed for version 2 won't work under version 3 unless they have been ported. Consequently, as of this writing, almost no one actually uses version 3, since much of Python's value is the packages developed by the open-source community. We use Python version 2 in this class.

Significant portions of these notes were adapted from those used by Scott Anderson in CS304. Consequently, these notes are also released under a Creative Commons License.

In-Class Exercises

For these exercises we will be working with data from the Global Peace Index ranking of US states, obtained from this article in the Guardian.

I have extracted one sheet from the spreadsheet and converted it to CSV format, which can be downloaded here.

Exercise 1: Open the file and put it into a dictionary

For simplicity you should open the file directly from the URL. To do this, rather than typing open("filename"), you should type import urllib2, then urllib2.urlopen("http://cs.wellesley.edu/~qtw/data/peaceIndexNoHeader.csv").

Your task is to create a dictionary called state2PeaceYrs, which maps a state name to a 19-element list of index values for each of the years 1991-2009. Be sure to convert the string values to floating point values using the float() casting function.

I also encourage you to do some sanity-checking of the data structure afterwards to make sure that you are creating the data structure you expect. For example, you should count the keys to make sure that there are exactly 50 (one for each state).

Exercise 2: Create a pared-down dictionary and list comprehensions

Create another dictionary which maps the state to just the most-recent 2009 ranking (so the value should be a floating point number, not a list). Call this dictionary state2Peace.

Next, write 3 list comprehensions to create lists meeting the following criteria:

Exercise 3: Create a derivative CSV file

Suppose you are interested in the percentage change in the rankings from previous years to now. First write a function calcPctChange that accepts two parameters and returns the percentage change from the first argument to the second. For example, calcPctChange(2,3) should return 50.0, while calcPctChange(3,2) should return approximately -33.3.

After you've written the function, compute the percentage change from the first year of the study (1991) to the most recent year (2009). Then write a new CSV file called peaceStateChange.csv with four variables: State, 2009 ranking, 1991 ranking, pct change.

Solutions to In-Class Exercises

Please don't check these until you've written the code.

Exercise 1

Exercise 2

Exercise 3