Python
Readings
Required (prior to class)
- These lecture notes
- Introduction to Unix and the X Window System: read this if you aren't familiar with working in Linux.
- SSH into one of the Linux workstations (e.g., minnow) from your computer (or any remote computer) and start the Python interpreter. We will work through a Python exercise in class, so if you bring a laptop to class, you can follow along in your interpreter. Here's an example session:
```
login as: tmoore
tmoore@minnow.wellesley.edu's password:
Last login: Tue Jan 24 07:12:39 2012 from puma.wellesley.edu
[tmoore@minnow ~] python
Python 2.7 (r27:82500, Sep 16 2010, 18:03:06)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'hi'
hi
>>> 3+4
7
>>>
```
Optional (If you're new to Python and want to learn more than what is in these notes, then these are good resources.)
- Think Python, by Olin professor Allen Downey, Chapters 3 and 4. The entire book is available to read online, and offers easy-to-read explanations of all the concepts we will cover.
- The Official Python Tutorial, Sections 1-5
- Python quick reference: a very terse overview of the features that distinguish Python from other languages.
Why Python?
We already know Java. Why learn Python?
- Python has far less overhead than Java for the programmer.
- Python is handy for data manipulation and transformation, and anything "quick and dirty."
- Python is very powerful, thanks to all those extension modules.
Python is a very likable language. Some things that I like about it:
- Its syntax, while bizarre compared to conventional languages, is spare and almost beautiful. It will take some getting used to, but it's quite readable, even for a beginner. Once you're used to it, you may never want to go back to a conventional language!
- It has a lot of powerful, dynamic features: dynamic creation of objects, functions, and pretty much anything.
- It has a lot of powerful packages.
- It is easily portable.
- It has a read-eval-print loop (REPL), which makes experimentation and playing a joy.
- Its object-oriented programming is good but optional.
Python as an Environment
One of the best ways to learn Python is to type expressions into the interpreter. Just give the command python at your Linux or Mac shell prompt, or in a PuTTY SSH terminal on Windows, and start typing expressions.
```
[tmoore@minnow ~] python
Python 2.7 (r27:82500, Sep 16 2010, 18:03:06)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3
>>> b = 4
>>> import math
>>> c = math.sqrt(a*a+b*b)
>>> c
5.0
```
To exit, invoke the quit() function or type a control-D.
Python as a Language
Python code looks like no other code that I'm familiar with. It's a complete departure from the C family of languages.
```python
# Like shell scripts, Python uses # as an end-of-line comment character

import math   # packages are "loaded" by the import statement

a = 3         # no need to declare types; very dynamic
b = 4
c = math.sqrt(a*a+b*b)
```
So far, not too bad. Let's see some more syntax:
```python
if a == b:
    print 'a and b are the same'
else:
    print 'a and b differ'
print "let's go on"
```
Hmmm. Where are the parens and braces? Gone! Python knows that the else section is over because the indentation ends. That's right, the indentation has syntactic meaning in Python. So, the following two programs are different in Python.
```python
#first code block
i = 0
while i < 10:
    i += 1
print i

#second code block
i = 0
while i < 10:
    i += 1
    print i
```
The first code block prints only the final value (10), because its print statement comes after the loop; the second prints every number from 1 to 10, because its print statement is inside the loop.
Functions in Python
Function declarations in Python are simple: the def keyword, a name, a formal argument list in parentheses (no data types), a colon, and an indented body. The end of the body is, as expected, signaled by the end of the indentation.
You can try these by downloading the mathfuns.py Python file, importing its contents into Python, and running the functions:
```
>>> import mathfuns
>>> mathfuns.hypo(5,12)
13.0
>>> mathfuns.gcd(55,89)
1
```
You can avoid typing the module name (which is the filename without the .py extension) by importing particular members or all members:
```
>>> from math import sqrt
>>> sqrt(9)
3.0
>>> from mathfuns import *
>>> fibonacci(100)   # takes far too long to finish!
>>> gcd(30,50)
a is 30 and b is 50
a is 30 and b is 20
a is 10 and b is 20
a is 10 and b is 10
10
```
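The contents of mathfuns.py aren't shown in these notes. The following is a hypothetical sketch of what its three functions might look like, not the actual file: hypo() reproduces hypo(5,12) returning 13.0, and this gcd() (which uses repeated subtraction and prints its arguments on each call) produces exactly the trace shown for gcd(30,50). The naive recursive fibonacci() suggests why fibonacci(100) never seems to finish: each call spawns two more calls, so the work grows exponentially with n. Each print() takes a single argument, so the sketch runs under Python 2.7 as well as Python 3.

```python
import math

def hypo(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a*a + b*b)

def gcd(a, b):
    """Return the greatest common divisor of positive integers a and b,
    printing the arguments on each call (subtraction-based Euclid)."""
    print('a is %i and b is %i' % (a, b))
    if a == b:
        return a
    elif a > b:
        return gcd(a - b, b)
    else:
        return gcd(a, b - a)

def fibonacci(n):
    """Return the nth Fibonacci number by naive recursion (exponential time)."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```

Small inputs such as fibonacci(20) return quickly; it is only large n that becomes hopeless with this approach.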
Data types in Python
All the examples so far have been numeric, for no good reason but that numbers don't need much introduction. Let's look at some more interesting datatypes. To play with these test values, download this sampledata.py file.
Strings
Strings pretty much work as you expect. You can concatenate them with the + operator. You can take their length. You can print them.
```
>>> from sampledata import *
>>> x+x
'spam, spam, '
>>> x+x+y+' and '+x
'spam, spam, eggs,  and spam, '
>>> x+x+y+'and '+x
'spam, spam, eggs, and spam, '
>>> len(x)
6
>>> len(x+y)
12
>>> print(x+y)
spam, eggs,
```
You can substitute stuff into them like the C printf statement, using % and a letter code to indicate the type and format. Here are the most common codes:
| Code | Data type | Example expression | Result |
|---|---|---|---|
| %s | String | 'Hi %s' % ('Tyler') | 'Hi Tyler' |
| %i | Integer | '%i+%i' % (1,2) | '1+2' |
| %f | Float | '%f+%i' % (1.1,2) | '1.100000+2' |
| %.nf | Float with n decimal places | '%.2f+%i' % (1.1,2) | '1.10+2' |
For complete reference see the Python documentation.
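Each row of the table can be checked directly in the interpreter. The sketch below evaluates the examples and adds one extra: a number between the % and the letter sets a minimum field width. One detail worth knowing: ('Tyler') in the table is just a parenthesized string, not a tuple. That works because there is only one %s, but a genuine one-element tuple needs a trailing comma, as in ('Tyler',).

```python
name = 'Tyler'
greeting = 'Hi %s' % (name,)          # (name,) is a one-element tuple
total = '%i+%i' % (1, 2)
default_float = '%f+%i' % (1.1, 2)    # %f shows six decimal places by default
two_places = '%.2f+%i' % (1.1, 2)     # %.2f rounds to two decimal places
padded = '[%5i] [%-5s]' % (42, 'ab')  # minimum width 5; '-' left-justifies
print(greeting)
print(default_float)
print(padded)
```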
Lists
Lists are denoted with square brackets with commas between the elements. You can index them numerically, and extract sub-lists. You can append stuff onto the end (actually, either end). You can store into them. Lists are one of the most useful and commonly used data structures in Python.
```
>>> from sampledata import *
>>> len(cheeses)
6
>>> cheeses[0]
'swiss'
>>> cheeses[1:3]
['gruyere', 'cheddar']
>>> cheeses[1:4]
['gruyere', 'cheddar', 'stilton']
>>> cheeses.append('gouda')
>>> cheeses
['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']
>>> cheeses[0] = 'emmentaler'
>>> cheeses
['emmentaler', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda']
```
The append shows how to invoke a method on a list, and that lists are mutable, unlike tuples.
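Lists can grow at either end. Here is a small sketch using a local copy of the cheese list (typed in from the output above rather than imported, since sampledata.py isn't shown): append() adds at the end, insert(0, item) adds at the front, pop() removes and returns the last element, and negative indices count back from the end. Slices are half-open, which is why cheeses[1:3] returns the elements at indices 1 and 2 but not 3.

```python
cheeses = ['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie']

first_two = cheeses[:2]        # an omitted start means 0; stops before index 2
last = cheeses[-1]             # negative indices count back from the end
cheeses.insert(0, 'feta')      # add at the front
cheeses.append('gouda')        # add at the end
removed = cheeses.pop()        # remove and return the last element ('gouda')

print(first_two)
print(cheeses)
```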
Tuples
Tuples are just like lists, except that they use parentheses instead of square brackets and they are immutable.
```
>>> from sampledata import *
>>> len(troupe)
6
>>> troupe[0] = 'Homer'   # won't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>
```
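Immutability is what makes tuples useful in some places where lists aren't. The sketch below (with made-up values) shows tuple unpacking, a tuple used as a dictionary key (allowed because tuples can't change, whereas a list key raises TypeError), catching the same TypeError seen above, and the trailing comma a one-element tuple needs.

```python
point = (3, 4)

# Tuple unpacking assigns each element to its own name.
x, y = point

# Tuples are immutable, so they can serve as dictionary keys; lists cannot.
distances = {(0, 0): 0.0, point: 5.0}

# Attempting to modify a tuple raises TypeError, as in the session above.
try:
    point[0] = 10
    modified = True
except TypeError:
    modified = False

# A one-element tuple needs a trailing comma; (5) is just the integer 5.
single = (5,)
print(distances[point])
```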
Dictionaries
Like all civilized languages, Python has hashtables built-in, except that Python calls them dictionaries (like Smalltalk). Dictionaries are another fundamental data structure in Python that you will use frequently.
```
>>> from sampledata import *
>>> uni
{'Jones': 'Oxford', 'Gilliam': 'Occidental', 'Cleese': 'Cambridge', 'Chapman': 'Cambridge', 'Idle': 'Cambridge', 'Palin': 'Oxford'}
>>> uni['Palin']
'Oxford'
>>> uni['Palin'] = 'Oxford University'
>>> uni['Palin']
'Oxford University'
>>> uni.keys()
['Jones', 'Gilliam', 'Cleese', 'Chapman', 'Idle', 'Palin']
```
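Indexing a dictionary with a key it doesn't contain raises a KeyError. The sketch below (with uni typed in from the output above) shows two common ways to avoid that, the in operator and the get() method with a default value, and then uses get() to count how many troupe members attended each university. 'Sykes' is just an arbitrary missing key. Note that the order in which a dictionary lists its keys is not something to rely on.

```python
uni = {'Jones': 'Oxford', 'Gilliam': 'Occidental', 'Cleese': 'Cambridge',
       'Chapman': 'Cambridge', 'Idle': 'Cambridge', 'Palin': 'Oxford'}

has_idle = 'Idle' in uni                  # test membership before indexing...
missing = uni.get('Sykes', 'unknown')     # ...or let get() supply a default

# Count how many troupe members attended each university.
counts = {}
for name in uni:
    school = uni[name]
    counts[school] = counts.get(school, 0) + 1
print(counts)
```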
Iteration in Python
We've seen the usual while loop above, which works just as you'd expect. The Python for loop, however, is very powerful, due to its natural integration with iterators. You've seen iterators in Java, but the implementation in Python is much cleaner.
Essentially, if you pass a list or dictionary into a for loop, it will iterate over the list's values or the dictionary's keys.
```
>>> from sampledata import *
>>> for cheese in cheeses:
...     print '%s is tasty' % (cheese)
...
swiss is tasty
gruyere is tasty
cheddar is tasty
stilton is tasty
roquefort is tasty
brie is tasty
```
Similarly for dictionaries:
```
>>> for name in uni:
...     print name, "went to", uni[name]
...
Jones went to Oxford
Gilliam went to Occidental
Cleese went to Cambridge
Chapman went to Cambridge
Idle went to Cambridge
Palin went to Oxford University
```
Dictionaries also have a built-in method, iteritems(), that gives you the key-value pairs directly as tuples:
```
>>> for k, v in uni.iteritems():
...     print k, "went to", v
...
Jones went to Oxford
Gilliam went to Occidental
Cleese went to Cambridge
Chapman went to Cambridge
Idle went to Cambridge
Palin went to Oxford University
```
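iteritems() exists only in Python 2; Python 3 removed it. The items() method (which in Python 2 returns a list of pairs rather than an iterator) works in both versions. The sketch below uses items() and wraps it in sorted(), so the pairs come out in alphabetical order by key instead of the dictionary's arbitrary order. The values of uni are typed in from the earlier output.

```python
uni = {'Jones': 'Oxford', 'Gilliam': 'Occidental', 'Cleese': 'Cambridge',
       'Chapman': 'Cambridge', 'Idle': 'Cambridge', 'Palin': 'Oxford'}

lines = []
for k, v in sorted(uni.items()):   # items() works in Python 2 and 3
    lines.append('%s went to %s' % (k, v))
for line in lines:
    print(line)
```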
If you'd really like to emulate C-style for loops over numbers, you can do so using the range() function:
```
>>> range(4)
[0, 1, 2, 3]
>>> for i in range(len(cheeses)):
...     print cheeses[i]
...
swiss
gruyere
cheddar
stilton
roquefort
brie
```
You shouldn't normally need to do this, given that lists, dictionaries, and most objects have built-in iterators.
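When you need both the position and the element, the built-in enumerate() yields (index, value) pairs, which is usually tidier than range(len(...)). A minimal sketch with the cheese list typed in:

```python
cheeses = ['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie']

numbered = []
for i, cheese in enumerate(cheeses):   # i counts from 0 alongside each element
    numbered.append('%i: %s' % (i, cheese))
for entry in numbered:
    print(entry)
```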
List Comprehensions
Here's a common coding task: iterate over some list, perform some action on each element of that list, and store the results in a new list. Here's an example:
```
>>> cheeselen=[]
>>> for c in cheeses:
...     cheeselen.append(len(c))
...
>>> cheeselen
[5, 7, 7, 7, 9, 4]
```
It turns out that there is a way to do this in just one line of code using list comprehensions:
```
>>> cheeselen=[len(c) for c in cheeses]
>>> cheeselen
[5, 7, 7, 7, 9, 4]
```
But wait, there's more! Suppose you only want to add items to the list if they meet a certain condition, say if the item begins with the letter s. Well here's the long way:
```
>>> scheeselen=[]
>>> for c in cheeses:
...     if c[0]=='s':
...         scheeselen.append(len(c))
...
>>> scheeselen
[5, 7]
```
It turns out you can even add a conditional to list comprehensions. So the 4 lines of code become this:
```
>>> scheeselen=[len(c) for c in cheeses if c[0]=="s"]
>>> scheeselen
[5, 7]
```
While this may seem a bit esoteric, list comprehensions are incredibly useful when coding quickly. It's worth getting used to them.
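To confirm that a comprehension really is shorthand for the loop, the sketch below builds the same list both ways and compares them. The general shape is [expression for item in iterable if condition], and the expression can be anything, such as a transformed string.

```python
cheeses = ['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie']

# Long way: loop, test, append.
long_way = []
for c in cheeses:
    if c[0] == 's':
        long_way.append(len(c))

# Comprehension: [expression for item in iterable if condition]
short_way = [len(c) for c in cheeses if c[0] == 's']

# Any expression works, e.g. upper-casing the names longer than 6 letters.
shouted = [c.upper() for c in cheeses if len(c) > 6]
print(shouted)
```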
String Manipulation
Strings in Python have a number of useful built-in methods. Two of the most useful are split() and join(). split() breaks a string apart at each occurrence of the separator passed to it and returns the pieces as a list.
```
>>> x="apple,orange,banana,mango"
>>> fruits=x.split(",")
>>> fruits
['apple', 'orange', 'banana', 'mango']
```
With no argument, split() splits on runs of whitespace characters and discards any leading or trailing whitespace. So for instance:
```
>>> "Hi there how are you doing".split()
['Hi', 'there', 'how', 'are', 'you', 'doing']
```
The inverse of split() is join(). Because join() is a string method, you invoke it on the separator string and pass in the list of strings that needs gluing together. Suppose we want to rejoin our fruits list, this time with semicolons:
```
>>> fruits
['apple', 'orange', 'banana', 'mango']
>>> semifruits=";".join(fruits)
>>> semifruits
'apple;orange;banana;mango'
```
It may seem a bit strange to invoke a method directly from a string. If it just looks too weird to you, you can always assign the string to a variable first:
```python
semicolon=";"
semifruits=semicolon.join(fruits)
```
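A few more split() and join() behaviors are worth knowing. The sketch below shows that join() with the same separator undoes split(), that an optional second argument to split() limits the number of splits, that an explicit separator keeps empty fields (unlike the no-argument form), and that join() accepts only strings, so numbers must be converted first.

```python
x = "apple,orange,banana,mango"
fruits = x.split(",")
roundtrip = ",".join(fruits)           # join() with the same separator undoes split()
first, rest = x.split(",", 1)          # maxsplit of 1: split at the first comma only
words = "  Hi   there  ".split()       # no argument: runs of whitespace, ends trimmed
pieces = "a,,b".split(",")             # an explicit separator keeps empty fields
counts = ",".join([str(n) for n in [34, 3, 4]])  # join() needs strings, so convert
print(rest)
print(counts)
```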
File I/O in Python
File I/O is pretty straightforward. To open a text file, you use the built-in open() function. The easiest way to read a file line by line is to iterate over the file object itself. Suppose we want to read in the file fruitveg.csv:
```
apple,34,fruit
pear,3,fruit
lettuce,4,veg
potato,15,veg
mango,22,fruit
```
We could simply print the file out one line at a time:
```
>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line
...
Here is a line: apple,34,fruit

Here is a line: pear,3,fruit

Here is a line: lettuce,4,veg

Here is a line: potato,15,veg

Here is a line: mango,22,fruit
```
We end up with an extra blank line after each one because every line read from the file still ends with its newline character, and print adds a newline of its own. To fix that, we can use the string method strip(), which removes leading and trailing whitespace, including the newline:
```
>>> for line in open('public_html/data/fruitveg.csv'):
...     print "Here is a line: %s" % line.strip()
...
Here is a line: apple,34,fruit
Here is a line: pear,3,fruit
Here is a line: lettuce,4,veg
Here is a line: potato,15,veg
Here is a line: mango,22,fruit
```
Suppose we wanted to create a dictionary mapping the fruit name to a list with the other values in each line. Here's what we would need to do:
```
fvmap={}
for line in open('public_html/data/fruitveg.csv'):
    bits=line.split(',')
    fvmap[bits[0]]=[bits[1],bits[2].strip()]

>>> fvmap
{'lettuce': ['4', 'veg'], 'mango': ['22', 'fruit'], 'pear': ['3', 'fruit'], 'apple': ['34', 'fruit'], 'potato': ['15', 'veg']}
```
Food for thought: How can you modify this code to work for any length list, not only two-element lists?
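Because a file object is simply an iterable over its lines, the dictionary-building loop can be wrapped in a function that accepts any such iterable, which makes it easy to try on a list of strings before pointing it at the real file. A sketch, where build_fvmap is a hypothetical name; it strips the whole line before splitting, skips blank lines, and shows (in a comment) the with statement, available since Python 2.6, which closes the file automatically:

```python
def build_fvmap(lines):
    """Map the first field of each comma-separated line to a list of the
    next two fields (hypothetical helper mirroring the loop above)."""
    fvmap = {}
    for line in lines:
        if not line.strip():       # skip blank lines, e.g. a trailing empty line
            continue
        bits = line.strip().split(',')
        fvmap[bits[0]] = [bits[1], bits[2]]
    return fvmap

# A list of strings stands in for the file while testing.
sample = ['apple,34,fruit\n', 'pear,3,fruit\n', '\n', 'lettuce,4,veg\n']
fvmap = build_fvmap(sample)
print(fvmap)

# With the real file (same path as above):
# with open('public_html/data/fruitveg.csv') as f:
#     fvmap = build_fvmap(f)
```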
To write to a file, you also use the open() function, but this time pass a second argument, the mode 'w'. For instance, to write a new file with the second and third columns of fruitveg.csv swapped:
```python
f=open('public_html/data/fruitveg2.csv','w')
for k in fvmap:
    f.write('%s,%s,%s\n'%(k,fvmap[k][1],fvmap[k][0]))
f.close()
```
You can also append to the end of an existing file by using the 'a' parameter.
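A sketch of both modes, written with the with statement so each file is closed automatically. It writes to a scratch file created with the standard tempfile module rather than to public_html/data, sorts the keys so the output order is predictable, and reads the result back to show that 'a' adds to the end without disturbing what 'w' wrote:

```python
import os
import tempfile

fvmap = {'apple': ['34', 'fruit'], 'pear': ['3', 'fruit']}
path = os.path.join(tempfile.mkdtemp(), 'fruitveg2.csv')   # scratch location

# 'w' creates the file, or empties it first if it already exists.
with open(path, 'w') as f:
    for k in sorted(fvmap):
        f.write('%s,%s,%s\n' % (k, fvmap[k][1], fvmap[k][0]))

# 'a' appends to the end, leaving the existing contents in place.
with open(path, 'a') as f:
    f.write('mango,fruit,22\n')

with open(path) as f:
    contents = f.read()
print(contents)
```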
Variables in Python
Variables are created when they are assigned; no declaration is needed. They are also implicitly typed; there is no need to specify whether it's an int, string, float, etc.
```
>>> x=4
>>> y=4.2
>>> z='hamster'
>>> type(x)
<type 'int'>
>>> type(y)
<type 'float'>
>>> type(z)
<type 'str'>
```
For scoping, if the variable is created inside a function, the variable's scope is local to that function. If the variable is created outside a function, it is global (to that file/module).
```
x=3

def computeFun():
    x=4
    return "the value of fun is %i" % x

def computeFun2():
    return "the value of fun is %i" % x

>>> computeFun()
'the value of fun is 4'
>>> computeFun2()
'the value of fun is 3'
```
Function code can refer to (get values from) a global variable that already exists (see computeFun2() above). However, if the function wants to modify an existing global variable, then it must be explicitly referred to in the function using the global keyword. Otherwise, the interpreter would allocate a local variable just for the function:
```
y=1

def computeLife():
    y=42
    return "the meaning of life is %i" % y

def computeLife2():
    global y
    y=42
    return "the meaning of life is %i" % y

>>> computeLife()
'the meaning of life is 42'
>>> y
1
>>> computeLife2()
'the meaning of life is 42'
>>> y
42
```
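The "allocate a local variable" rule has a consequence that surprises many newcomers: an assignment anywhere in a function body makes that name local for the whole body, so reading the global and then assigning to it fails with UnboundLocalError rather than quietly using the global value. A sketch (counter and the function names are illustrative):

```python
counter = 0

def increment_broken():
    # The assignment makes counter local to the whole function, so the
    # read on the right-hand side happens before it has any value.
    counter = counter + 1
    return counter

def increment():
    global counter          # refer to the module-level counter instead
    counter = counter + 1
    return counter

try:
    increment_broken()
    broken_raised = False
except UnboundLocalError:
    broken_raised = True

first = increment()
second = increment()
print(counter)
```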
Executable Python Scripts and Modules
You can put a bunch of Python code, including function definitions and such, into a file and run it. Look at now_v1.py:
```python
#!/usr/bin/python

from datetime import datetime

now = datetime.now()
print now.strftime("%Y-%m-%d")   # Use the internet standard (ISO 8601) date format
```
You can run it from the shell as follows:
```
python now_v1.py
2012-01-31
```
Sometimes it is more convenient to turn the file into an executable script:
```
chmod a+rx now_v1.py
./now_v1.py
2012-01-31
```
Modules in Python
We've also seen that we can import functions and other useful stuff from files into Python. It's smart to write your Python code so that it can be used that way, with its functions invoked from other Python code. Look at now_v2.py:
```python
#!/usr/bin/python

from datetime import datetime

def now():
    """Returns a string for the current day in internet format: YYYY-MM-DD"""
    today = datetime.now()
    return today.strftime("%Y-%m-%d")   # Use the internet standard
```
Here's how we could use it:
```
>>> import now_v2
>>> now_v2.now()
'2011-02-17'
>>> print now_v2.now()
2011-02-17
```
But now it doesn't work as a shell script. Can we do both? Yes, there's a trick! Look at now_v3.py:
```python
#!/usr/bin/python

from datetime import datetime

def now():
    """Returns a string for the current day in internet format: YYYY-MM-DD"""
    today = datetime.now()
    return today.strftime("%Y-%m-%d")   # Use the internet standard

# the following code is only executed if this file is invoked from the
# command line as a script, rather than loaded as a module.

if __name__ == '__main__':
    print now()
```
And as a script:
```
chmod a+rx now_v3.py
./now_v3.py
2011-02-17
```
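The date printed above is in the internet-standard form YYYY-MM-DD (ISO 8601). With strftime() that format is spelled "%Y-%m-%d" (four-digit year, two-digit month, two-digit day). As a sketch of an alternative, date objects from the same datetime module have an isoformat() method that produces the same string. A fixed date is used here so the result is predictable:

```python
from datetime import date

d = date(2012, 1, 31)
via_strftime = d.strftime("%Y-%m-%d")   # explicit format codes
via_isoformat = d.isoformat()           # same YYYY-MM-DD result
print(via_strftime)

# For the current day:
today_string = date.today().isoformat()
print(today_string)
```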
PyDoc: Self-documenting Scripts
To get the documentation on a Python module, including one you write yourself, you can use the pydoc shell command:
pydoc mathfuns
prints the documentation for mathfuns to your screen. (You can also run pydoc as a web server, which is very cool.)
Of course, the documentation that Pydoc gives you comes from the author of the module, and when you write Python code, you shoulder the responsibility of documenting what you create.
Give every function a meaningful documentation string (docstring). Write the kind of documentation you'd like to read if you wanted to know how to use the function. The string goes (in triple quotes, which allow it to span multiple lines) as the first statement in the function body.
```python
def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42
```
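The docstring isn't just a comment: Python stores it on the function object as the __doc__ attribute, which is where pydoc and the interpreter's help() get their text. A minimal check:

```python
def computeLife():
    """This function returns the meaning of life in integer form"""
    return 42

doc = computeLife.__doc__     # the docstring is stored on the function object
print(doc)
# In the interactive interpreter, help(computeLife) displays the same text.
```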
Some additional guidelines and information:
- Documenting Functions
- Docstring conventions
Python Semantics
Python is a language that is byte-compiled and interpreted on the fly, like Perl, PHP, JavaScript (but not Java) and many other scripting languages. It is dynamically typed, with types on objects rather than on variables, but it is also strongly typed: values are not silently converted between unrelated types, so '3' + 4 raises a TypeError. It is mostly lexically scoped. Objects are allocated from the heap; almost nothing is allocated on the stack, so you can safely return objects from functions.
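A short sketch of what "types on objects rather than on variables" means in practice: a name can be rebound to an object of a different type, but Python does not silently convert between str and int, so '3' + 4 raises a TypeError and an explicit conversion such as int() is needed.

```python
x = 3
kind_before = type(x).__name__    # 'int': the type belongs to the object...
x = 'three'
kind_after = type(x).__name__     # ...so the same name can now refer to a str

# Python won't silently convert between str and int, so this raises TypeError.
try:
    result = '3' + 4
except TypeError:
    result = None

converted = int('3') + 4          # an explicit conversion is required
print(converted)
```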
Versions of Python
Python is in active development, and new versions come out regularly. As of this writing (2011), the latest stable release is version 2.7. Puma is running two versions: 2.4 and 2.6. The 2.4 version is the default version. The Linux workstations run 2.7.
There is also Python version 3. Version 3 substantially modifies the language and is not backwards-compatible with version 2, so packages developed for version 2 won't necessarily work without modification. Consequently, at the time of writing relatively few people actually use version 3, since much of Python's value is in the packages developed by the open-source community. We use Python version 2 in this class.
Significant portions of these notes were adapted from those used by Scott Anderson in CS304. Consequently, these notes are also released under a Creative Commons License.
In-Class Exercises
For these exercises we will be working with data from the Global Peace Index ranking of US states, obtained from this article in the Guardian.
I have extracted one sheet from the spreadsheet and converted it to CSV format, which can be downloaded here.
Exercise 1: Open the file and put it into a dictionary
For simplicity you should open the file directly from the URL. To do this, rather than typing open("filename"), you should type import urllib2, then urllib2.urlopen("http://cs.wellesley.edu/~qtw/data/peaceIndexNoHeader.csv").
Your task is to create a dictionary called state2PeaceYrs, which maps a state name to a 19-element list of index values for each of the years 1991-2009. Be sure to convert the string values to floating point values using the float() casting function.
I also encourage you to do some sanity-checking of the data structure afterwards to make sure that you are creating the data structure you expect. For example, you should count the keys to make sure that there are exactly 50 (one for each state).
Exercise 2: Create a pared-down dictionary and list comprehensions
Create another dictionary which maps the state to just the most-recent 2009 ranking (so the value should be a floating point number, not a list). Call this dictionary state2Peace.
Next, write 3 list comprehensions to create lists meeting the following criteria:
- States with rankings above 3
- States with rankings below 2
- Rankings for those states beginning with "New".
Exercise 3: Create a derivative CSV file
Suppose you are interested in the percentage change in the rankings from previous years to now. First write a function calcPctChange that accepts two parameters and returns the percentage change from the first argument to the second. For example, calcPctChange(2,3) should return 50.0, while calcPctChange(3,2) should return approximately -33.3.
After you've written the function, compute the percentage change from the first year of the study (1991) to the most recent year (2009). Then write a new CSV file called peaceStateChange.csv with four variables: State, 2009 ranking, 1991 ranking, pct change.
Solutions to In-Class Exercises
Please don't check these until you've written the code.
