aquamusic@gmail.com
I found lots of great references on the web and in books. But none of them were quite right for me. I was looking for something that would get me up and running quickly, but the tutorials I found were too simple or too complex, and went into too much detail or not enough. I already knew how to program, and what I wanted was to learn the mechanics and mindset for working with Python. By reading multiple sources, I finally got myself to a reasonable point of understanding, and I thought I'd document what I'd learned in order to help other people who are going through the same process. I deliberately wrote this while I was still fresh to the language and before I got too familiar with it. So my attitude is to give you the right frame of mind to feel your way through reading and writing Python code, and enough details on the language so you can knock out something basic. If you find yourself liking Python, by all means get a broader and deeper education from references and documentation.
It's worth knowing that the basic language isn't huge. A lot of the power of Python comes from its many libraries, which provide everything from regular expression operations to web access. I don't know of a shortcut for learning these; you just have to skim over the main libraries to get familiar with what they contain, and then go to the others as needed. If you have a favorite library from another language, the odds are pretty good that a similar library exists in the Python world.
The basics of the language are available all over the place. Skim through something like Dive Into Python (available here for free) to get a feeling for things.
One of the sweet things about Python is that it's an interpreted language, so you can open up the interpreter and try things out. Anything you can do in Python you can do in the interpreter.
If you need to download and install Python,
there are lots of great guides online for different platforms.
Since Python is open-source, you can download a copy for free.
You may have a copy of Python already on your computer,
ready to fire up and run
(for example, it comes pre-loaded on Macs).
To find out,
just open a shell and type python
and see if an interpreter announces itself.
If it does, type 3+5 and then a return;
Python should come back to you with their sum.
If you don't have Python pre-installed,
or your copy is an older version,
you might want to update to something newer.
I've been happy with
ActivePython
from ActiveState.
This is a complete Python installation along with a ton of pre-installed,
pre-configured libraries and documentation,
and is available for Windows, Mac OS X, Linux,
and a few other operating systems.
It's worth reading their README file,
to make sure you have a few shell and
environment variables set correctly.
Probably the most visible oddity about Python is its famous (or infamous) indenting rule. In most languages, extra white space is ignored, and you're free to format your code as you like. In particular, when you have a block (say inside a loop or after an if statement), you mark the block with something like curly braces. Most people locate those curly braces in consistent places, and use a handful of other formatting rules to create code that is both attractive and readable. I use white space all the time to format my code so that similar lines are obviously similar, and their differences stand out even at a glance.
Python does not work that way. Blocks are created by indenting. You can indent with the tab key or space key, but what matters is that the first non-white character of each line in a block is to the right of the first non-white character in the line that started the block. If this isn't clear, look at any Python code, and it will leap out at you.
If you try to format your code to your own design using white space at the start of a line, you'll probably regret it. Python is strict about indentation. Love it or hate it, that's how it is, so if you're going to work in Python, I suggest choosing to love it.
A big learning aid is the built-in help command
in the interpreter. To learn more about something called thing,
type help(thing). If it's a module, you'll need to
import it first.
Then you can learn about the
functions in that module with help(module.function).
The import directive is Python's way of reading in libraries,
and we'll see the syntax for it later on.
Python is evolving, and the documentation (and books and online references) frequently talk about "the old way" and "the new way" for different things. I'll stick mostly to the new ways. If you see something weird when reading someone else's code, they might be doing something in an "old way" and you can find an explanation online.
The examples in this document are mostly the result of
cut-and-paste directly from the interpreter, but with a twist. I'm using
color-coding to distinguish what I type (on a purple background) and what the computer replies (on a blue background). Here's why.
The interpreter normally prompts you with a prompt of three
greater-than signs like this:
>>>
and then if you're in the middle of a block, you get a continuation prompt of three
dots, like this: ...
If I included the prompts in these listings, then you wouldn't be able to select
my text, copy it, and paste it into your interpreter. I think one of the great
values of the interpreter is precisely that you can enter stuff and fool around with
it so easily, and if showing you the system's prompts interferes with that, then they
have to go. I hope this color-coding is a reasonable substitution for revealing who's
typing what. Generally speaking, if it's on purple, you can cut-and-paste right back
into the interpreter and fool around with the results. Give it a shot - you can't
break the computer!
When you're entering a code block (which we'll get to, but basically something
that's part of a function definition, or a loop body, or things like that) there's no
formal way to say "all done," like
the closing brace in many other languages.
In the interpreter, the usual way to end a block is to enter some
new code that's indented less than the block, or (if there's
no more code coming) just enter a blank line.
Because I'm not showing the prompts I'm also not showing the blank lines.
If you copy
and paste a chunk of code from here, and the interpreter is giving you a ...
prompt, that's your cue to just press the return and enter a blank line, which will
usually close all open blocks and return you to the top-level >>> prompt,
which 99 times out of 100 will be where you'll want to be.
Like many modern languages, Python does automatic garbage collection for you, so you don't have to worry about memory yourself. If you remember Ye Olden Days before garbage collecting, this is music to your ears. If you don't, it's still pretty sweet.
OK, let's dig in.
You don't declare your variables in Python, which means you don't assign types to them either. Python has types, and it does strict type checking at run time. But you need to keep track yourself of what's in your variables. By re-assigning, a string can become an integer and then a dictionary, without warning. This is a strange thing for programmers used to explicit typing, because the strong run-time checking means you have to know the types of your variables, but the lack of declarations means that there's no obvious way to know.
I can see only two solutions: either lots of comments or a consistent naming convention. Comments can be helpful or a drag, depending on their quality and quantity. I haven't yet seen a widely-adopted naming convention for Python that encodes the type of a variable in its name (like so-called Hungarian notation, used in early programming languages). So in this document I'll do what everyone else does and simply give my variables convenient names. Since I'll usually be presenting only tiny fragments of code, I think the types of my variables should be clear from context. This is a little like playing with fire, but I don't know of an alternative.
When you assign a value to a variable, Python figures out the type of that value and then internally marks the variable as having that type. Python rarely does any automatic type conversion for you. This is usually not a problem except when printing, and there are tools to help you in that special case.
Python has the usual assortment of data types in which to store numbers:
integers, floats, longs, and so on. It also has the usual
arithmetic operators on them.
If a given expression is all integers, then the
result is an integer. So the expression
7/2 is 3, while
7.0/2 is 3.5. You can just put a decimal
after a number to make it a float: 7. is just as good as 7.0.
In most other languages I know, having a floating-point value anywhere in the expression
promotes all the calculations in that expression to floating point. Not so in Python.
Take a look at these two examples:
(7/2)*1.1
(7./2)*1.1
|
In the first line,
(7/2) got evaluated as 3.
Because both numbers were integers, the result was an integer,
even though the very next thing we do is to multiply by a float.
If you wanted that expression to result in 3.5,
you need to make at least one of the operands floating-point
by including the decimal point,
regardless of the other values in the calculation. Weird. Be careful.
If you want any calculation to be floating-point, make sure that
at least one of
the values involved is explicitly identified as floating-point by
including the decimal point. This still trips me up all the time.
Python also supports single characters, identified with single and double quotes.
We'll get into other types later, including strings and your own classes.
There are three special types built into Python, and all can be used as parent classes for your own classes. They're sequences, strings, and dictionaries. Let's take them in order:
[v0, v1, ... ,vn]A sequence is a list of items that has a bunch of useful abilities attached to it. A sequence can contain other sequences. Each item in a sequence can be a different data type. You create a sequence with square brackets and commas between the elements. These are all valid sequences:
seq0 = [3] seq1 = [3, 4.2, 5] seq2 = ['a', 'Superdog', 8, ['Bob', 'Carol'], 38.5] |
(I'm jumping ahead here and using strings in the examples, but for now think of them like strings in any other language and you'll be fine).
Sequences have useful built-in properties. For example, you can retrieve a single item from a sequence by naming the index of the thing you want in square brackets:
[3,4,5][1]
z = ['apple', 35.2, 9] z[0]
|
It's too bad that square brackets are used here in two very different ways: to create a sequence, and to identify an index into a sequence. I suppose it's because there just aren't a whole lot of matched-pair symbols on a typical keyboard, so it's hard not to re-use them. But don't let this dual use confuse you: the index descriptor is not a sequence! Here, let's try it, what's the worst that can happen?
[3,4,5][1,2]
|
We get an error. At least the machine didn't blow up. A nice thing about Python is that many of the error messages are actually useful in understanding the problem that caused them, or at least they point you in the right direction. Here it's telling us that we're trying to index a list with something other than an integer, which is correct: we tried to use a list of integers, which isn't the same thing.
The first element is in a list is at index 0, the next is 1, and so on. I found that the best way to get a consistent handle on all indexing is to think of the elements in the sequence in little boxes, side by side. The index for each element sits in the gap between the box. So the index 0 is located just to the left of the first entry. 1 is just the left of entry 1, and so on, like this:
seq = ['a', 'b', 'c', 'd'] |
0 1 2 3 | +---+ | +---+ | +---+ | +---+ | | a | | | b | | | c | | | d | | +---+ | +---+ | +---+ | +---+ | | | | -4 -3 -2 -1
Note that in the figure I've included negative
numbers below the vertical bars;
Python supports counting backwards from the end using
negative indices, as shown here. So seq[0] is
'a', seq[2] is 'c', and
seq[-3] is 'b'.
Asking for an entry that's out of bounds normally raises an error.
In this case, the error would be raised for asking for an entry
greater than 3 or less than -4. Common Python idioms use
seq[0] to get the first
entry in a sequence, and seq[-1] to get the last entry.
This picture is useful because Python supports "slicing". In a
slice, you get a
sub-sequence out of a sequence by using the notation [a:b]. Typically,
a is less than b, and you'll get the elements between indices a and b. Note from the picture that this
means your sub-sequence will begin with the element to the right of
index a, and continue up to the element to the left of
index b. So seq[1:3] is a new list, ['b', 'c'].
By remembering that a slice refers
to the index values (not the entry numbers), and remembering where the index values
live in the picture, slices are easy to control and predict.
Slices are very flexible. You can leave off either entry, or both. If you leave
off the first, it defaults to "start of sequence". If you leave off the second, it
defaults to "index just past the end of sequence".
This might seem a little odd, but it's a good choice.
Suppose that a missing right-side of the slice meant "end of sequence."
In the 4-element sequence above, the last element is at
index 3, so if an empty right side of a slice defaulted to the index of the
last element,
seq[1:] would be the same as seq[1:3],
which is ['b', 'c'].
That doesn't include the element at the end of the sequence!
So Python interprets a missing argument on the right side of the slice
as an index one past the last index.
This works because slices have their own special-case rules,
where asking for a number that's too large does not raise
an error.
Compare this to what happens if you ask for an
element beyond the end of seq directly.
Here I'll try to get entry number 55,
and it blows up:
seq[55]
|
But if you use that same index as the right side of a slice, everything's fine:
seq[1:55]
|
So when you leave off the right element in a slice, Python replaces it with a value of 1 more than the index of the last element, and you get everything up to, and including, the end of the sequence.
If you leave off both values for the slice range, you get the whole list
(so seq[:] is the whole sequence). Here's a picture that demonstrates the different
slice types - I'm leaving off the negative indices for simplicity, but you can use
them freely instead of positive ones:
r = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'] |
0 1 2 3 4 5 6 7 8 | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | | a | | | b | | | c | | | d | | | e | | | f | | | g | | | h | | | i | | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | +---+ | | | | | | | | | <-----r[:2]-----> <--------r[3:6]---------> <---r[7:]----->
r[:2]
r[3:6]
r[7:]
|
Python has a bunch of sequence types, and you can even make your own. I think the most popular type is the list, so let's cover that now.
A list is a sequence, so it's created with square brackets and comma-separated elements, and it can be indexed and sliced as above. Lists support a few useful everyday operations:
+ sign:
list1 + list2
list1 = [1, 2, 3, 4] list2 = [77, [111, 222], 99] combo = list1 + list2 combo
|
* sign: list * n
list1 * 3
|
You can ask if an element is in a list with the expression val in list.
The in construct appears frequently, usually in two ways. The first is
in a test, as I'm discussing now. The other use is to extract elements from a sequence,
typically in a loop, and we'll get to that below.
Here are some Boolean expressions that use in. They evaluate to True if
the element is in the list, and False if they're not (Python's Booleans
are basically the same as in every other language - they're just little variables that
are either True or False):
list1 = [1, 2, 3, 4] list2 = [1, 2, 3, 4, 77, [111, 222], 99] 3 in list1
9 in list1
[111,222] in list2
|
The in idiom will come in handy when writing tests for if
statements and loops.
You can assign to list elements or to whole slices:
list1 = [0, 1, 2, 3, 4, 5, 6] list1[3]
list1[3] = 9 list1
list1[2:4] = [22,33] list1
|
You can create an empty list by creating a list with nothing in it:
emptyList = [] emptyList
|
If tested as a Boolean, an empty list is False.
Let's look at how to manipulate lists. First we'll look at four functions that work on lists or list elements:
del(list[n])n from the list. It returns nothing:
chars = ['a','b','c','d','e'] chars[3]
del(chars[3]) chars
|
len(list)
len([])
len(['a'])
len([1, [222, 333], 'a'])
|
min(list)max(list)
min(['banana', 'apple', 'kiwi'])
max(['banana', 'apple', 'kiwi'])
|
The rest of the functions we'll cover here are methods attached to the list
object itself. Make note of this important warning: most of these
list functions change the list! That is, they alter the
contents of the thing being operated on. If you don't want your list to be
changed on you, make a copy of it first and then process the copy. So
for example, if your list is named list1, and you want to append an object
named newThing to it, you'd call
list1.append(newThing) |
with the result that list1 will now have newThing stuck onto the end of it.
All of these operations are covered in more detail in the references. Some
have additional options I'm not covering, so if something is close but to
what you need but not quite right, take a look at the references before
rolling your own. Note that when I talk about "the elements of the list" I
mean the elements as they're visible to the list as it looks at them from
the highest level. For example, the count function tells us how many times
an object appears in a list. Let's apply it to a list containing a sublist:
myList = [1, 2, [1, 2, 3, 4, 1], 3] myList.count(1)
|
That's because myList[2] is a list itself, and not a 1. If you wanted the total
number of 1's you'd need to flatten the list first (that is, remove the hierarchy
and in this case make it a single big list of 8 entries). Flattening seems like
a pretty useful thing, but I haven't seen any built-in commands to do that.
(As we discuss more of Python's features,
you might want to think about how you could use them
to write a routine to flatten a list that contains
multiple sublists, each of which might contain sublists as well.)
Here are some useful list-related routines:
append(obj): appends obj to the end of the listcount(obj) : returns the number of times obj appears at the top level of the listindex(val) : return the index of the first occurrence of val insert(index, obj) : insert obj into the list before indexpop(index) : remove the object at index from the list and return it. If you
omit the index and call just pop() without arguments, it pops from the end of the list.sort() : in-place sort. You can provide your own comparison and scoring
functions as optional arguments. Remember, the list itself is
sorted, not a copy of it!reverse() : reverses the list in-placeremove(val) : remove the first occurrence of val from the listSorting and reversing are common operations and it would be a hassle to always make a copy and then operate on the copy every time you wanted to use them. Python provides a couple of nice shortcuts that do not modify the list:
newList = sorted(list)newList = reversed(list)
both return a new list. Like sort, you can give sorted your own
functions with which to do its job, so you can sort a bunch of songs based on who's performing
them based on your personal ranking of musicians.
(v0, v1, ... vn)A tuple is a read-only list. It's created with round parens, rather than square brackets. You cannot assign to a tuple once it's made, but you can read it with normal indexing and slicing. Tuples are great for reference objects you don't want to accidentally mangle, like a list of passwords or a list of all the lowercase letters on your keyboard.
None
Welcome to a momentary detour. I'm going to be printing out a lot of things
in the rest of this document, so let's look quickly at the print
statement.
This will take any single thing you give it, and it will do its best to print it out for you back to the standard output, or right into your interpreter window if you're running Python interactively:
print 29
print 'Hello there!'
|
When debugging, it's handy to print out the value of a variable. I'll cover
string concatenation later, but basically to put two strings together you use
a plus sign (+), like this:
print "Hi, "+"Mr. "+"Bumpercars"
|
Note that I had to put spaces into the strings myself. If you try to print out a variable this way, it fails:
apples = 42 print "I have "+apples+" apples."
|
You can see that the error message actually has a little explanation with it, which I think is terrific. The problem is that we're surrounding a number with a couple of strings, and as Python says, it can't glue numbers and strings together.
It seems pretty reasonable to me that print should try to turn
everything into a string all by itself, but that doesn't happen.
Python just really does not want to do automatic type conversion
for you, ever. Luckily, it's not hard to do it yourself.
To turn just about anything into a string,
hand it to str, and it will do its best to turn it into a string that
you can then combine with other strings and print:
apples = 42 print "I have "+str(apples)+" apples."
|
There's one other thing to mention before going forward. Just about every
language has some way of saying "nothing here". In Python, that's the special
symbol None. You can't do much with None except
test for it and assign it to things. Note that None is not the
same as an empty string, or an empty list. Those are actual things that just
don't have stuff in them. None really means "There's nothing
here at all, not even a placeholder."
It shows up in Python in a bunch of places, but I'm mentioning it at this point
because you'll see it eventually if you play around in the interpreter. It's not
necessarily an error or anything to be concerned about. For example, a function
that doesn't have an explicit return statement returns None
(I'll talk about functions later). So sometimes you'll call a function in
the interpreter and you'll see None printed out as the result,
perhaps because you forgot
to include a return statement, or you used a function in an unexpected way.
OK then, now that we can print anything, and we know what None
is all about, we're off to strings!
'stuff' or "stuff" or """stuff"""A string is a read-only list of characters. You can change a string during an expression, but if you want the results to stick around you have to save them into a new string. That's easy to do, but sometimes I still have to remind myself that I can't manually change the characters in a string. Strings are maintained by the system. It knows how long they are, so there are no forbidden characters (unlike, say, C, where the character \0 (a string of 8 bits, all 0) means "end of string", and thus cannot be part of a string. In Python, that's just another legitimate character that happens to be 0).
String have lots of methods they can carry out. Before digging into the best, it's worth noting the string module (more on modules later) provides a half-dozen really useful predefined strings. I'll name them here, but you'll never guess what these strings contain. Never. Not in a million billion years.
string.digitsstring.lettersstring.lowercasestring.printablestring.punctuationstring.uppercase
As a heads up, to use these you'll first need to import the string package.
You typically do this by putting an import statement at the
top of your file:
import string |
Now if you want to know whether or not the first letter of a string called
name is capitalized, you can test it (I'll talk more about if
statements below, but you'll get the idea):
if name[0] in string.uppercase : print 'Yes! A capital!' |
Another way that strings are special is that they know that their characters
all belong together, rather than existing merely as a list of distinct values.
So you don't use
the (v,v,v) or [v,v,v] notation to create a string,
since then you'd just get
a normal list of individual characters. Rather, you create a string just by
enclosing it in quotes:
string1 = 'Hello' string2 = "World" |
There are a bunch of details about strings that arise in virtually every language, and every language is a little different. The details are nit-picky but important because strings are so common and these issues come up all the time. Here are the things you need to know about using quotes and newlines and such things in Python strings.
There are three kinds of quotes on most keyboards: the double-quote ("),
the single-quote ('), and the backquote (`).
Python strings are created using the first two types.
First, note that your opening quote matches your closing quote, and
quotes of the other kind in between are ignored. So you can write
string1 = "Jason's car" string1
string1[0]
string2 = '"Delicious," I said.' string2
string2[0]
string3 = "mmm-hmmm" string3
|
Note that the first character of string1 is the letter
J (the first character
after the opening quote) and the first character of string2 is a single quote
(the first character after string2's opening quote). Also note that Python
used double quotes to print out string1, because the
string contains a single quote, so you might otherwise get confused.
That's pretty clever. Normally, it prints out strings with single quotes, as
in the example for string3.
You can embed newlines into your strings with the usual \n escape code (there's
also \t for tabs and \r for carriage returns). If you just look at the contents
of a string by entering it in the interpreter it'll show you these codes, but if
you hand the string to the print function it'll interpret them correctly
s1 = 'Line one\nLine\ttwo' s1
print s1
|
If you want to extend a string over several lines of input, use the backslash to indicate a continued line:
longString = "This is a \ very \ lengthy string" longString
|
Finally, triple quotes (either single or double) will let you embed newlines in your text just by writing it out, and they ignore single instances of both single and double quotes:
longText = """This is a multi-line piece of text. "Really?" she asked. "That's cool." """ longText
print longText
|
One more thing about string-making. You can defeat all escapes by creating a
"raw" string. This is useful if you want a string with backslashes in it, because
otherwise you'd have to escape them by writing \\ every time and that can get really ugly
in a hurry, particularly when using regular expressions (I won't discuss them
here, but Python has a full-featured regular expression library waiting for your
string-manipulating pleasure). You create a raw
string by putting the lower-case letter r just before
the opening quote of the string. Here are three strings with backslashes in them,
and I'll look at each one both by just asking Python its value, and then handing
it to print. I've deliberately tried to be ambiguous about whether
I want to represent the words "either" and "nor" separated by a backslash,
or the words "either" and "or" separated by a newline character:
s1 = 'either\nor' s1
print s1
s2 = 'either\\nor' s2
print s2
s3 = r'either\or' s3
print s3
|
Wow, that's a lot of work for those stupid backslashes. But if you have a long pathname to
a file, or some other string with backslashes, the raw form pays off so you're not
constantly writing \\ or getting weird returns and tabs and things in your string.
Alright, so much for the bookkeeping. Let's see what strings can do for us. There
are lots of string functions, so here is a selection that I think will give you a good start.
See the documentation for all the others, and all the variations and optional
arguments.
Remember that strings are read-only,
so except for find, all of these return a new string.
aString + bString: combine the strings into one larger string
find(pattern) : return the index of the first occurrence of the substring pattern, or -1 if none foundjoin(list) : a new list where the input elements are concatenated together with
the calling string placed between each pairsplit(pattern) : split the input string into pieces, breaking it apart at each
instance of pattern (see the notes at the end of this
document for some details on just how it does that)lower() : the string in entirely lower caseupper() : the string in entirely upper casereplace(oldstr, newstr) : replace instances of oldstr with newstrstrip() : remove leading and trailing whitespace
There's also a translate command which takes some setting up, but which will
convert all occurrences of one letter into another letter, for many such pairs, all
at once. Basically it applies a substitution cipher for you in one call, if you need
that kind of thing.
All of these methods can be applied to a variable holding a string, but Python's just as happy to let you call these operations directly on a string itself:
'This Is GREAT!'.lower()
|
The join and split commands are the
opposites of each other. The join command in
particular is a workhorse, with a little idiom you'll see over and over again. You'll
often find yourself with a list of strings that you'd like to mash together into
one big string. You could do this all kinds of ways, but the join method seems to be
the preferred option:
" ".join(["The", "best", "is", "yet", "to", "come"])
|
And you can use split to run the process the other way:
'The best is yet to come'.split(' ')
|
You'll see join and split
used like this all the time because string lists are very common.
If you join a list of strings with an empty string
between them,
the list elements just get mashed together:
"".join(["The", "best", "is", "yet", "to", "come"])
|
{ key : value, key: value ... }Dictionaries are lists of key/value pairs. The keys can be anything immutable (that is, things whose values you cannot change after they're created), though numbers, strings, and tuples seem the most common (that I've seen so far, anyway). The values can be anything, from simple data types to complex objects.
You can define a dictionary manually like this, using curly braces, colons, and commas:
myDict = { 'name' : 'Bob' , 'age' : '42' , 'haircolor': 'green' , 13 : 99 } |
Note that you can freely mix up the types of the keys and values. Now you can retrieve values from a given key:
myDict['name']
myDict.get('age')
|
These both work, but as we'll see below, the get form has some advantages.
You can also create new entries simply by assigning to them:
myDict['numberOfEyes'] = 2 myDict
|
A key point to keep in mind is that a dictionary's entries are, by definition, unordered. They are not stored in the order you provided them, and whatever order they might be in at some given time is not guaranteed to be preserved even for a moment. See the example above, where the order of the entries appears unrelated to the order in which they were defined. They don't just appear that way, they are. Never assume anything about the order of things in a dictionary. If you're using a built-in function to enumerate the entries (as below), then you're guaranteed to get all of them by the time you're done, but there's no predicting which one will be presented first, second, third, and so on. The moral is if you want something specific from a dictionary you need to do a built-in function call to search for and retrieve it; never try to save time by keeping your own shortcuts into a Python dictionary's internal structure.
Here's a quick guide to some important dictionary functions:
clear() : erase everything in the dictionarycopy() : returns a shallow copy of the dictionary.
If the values are native data
types like numbers, this is a complete copy. But if the values
are more complex objects, like lists, elements of your own classes (or even
other dictionaries) then the values are pointers to those other objects,
and so modifying them can surprise you later when you refer to them from
the first dictionary. This can get complicated, and I won't
get into it here. If you need a full (or deep) copy,
there's function called deepcopy in the copy module.fromkeys(stringlist) : given a list of strings, create a new dictionary with those
strings as keys. The values are all set to None (or you can provide a
new default as an optional argument).get(key) : if the key is in the dictionary, return the corresponding value, else return
None. Why is this here? Can't you get the value for a given key
with myDict['key']? Yes, you can, but if there is no entry in the
dictionary for that key, you'll get a KeyError, which will stop
your program (unless you catch and process it yourself, which I'll talk about later).
If you use get instead of the indexed syntax, you'll get back a value
if one exists, or the special symbol None if it doesn't.
has_key(key) : returns a Boolean telling you if the key is in the dictionary. You can
also write this as key in myDict.items() : returns a list of all key/value pairs in the dictionarykeys() : returns a list of all the keys in the dictionarypop(key) : remove this key/value pair from the dictionary and return the valuepopitem() : remove a randomly-selected key/value pair from the dictionary and return the valuesetdefault(key,val) : This call first looks to see if the key exists. If it does,
its value is returned. Otherwise, a new key/value pair is created
from the arguments, and the value is returned. It's kind of a hybrid
of setting and getting, where it checks, sets if needs to, then gets.update(dict2) : This copies over all the key/value pairs from dict2
into the calling dictionary. Any entries with the same keys are over-written.
If the calling dictionary has keys that don't exist in dict2,
they are unaffected.
values() : a list of all the values in the dictionaryUsing the examples above,
myDict.items()
myDict.keys()
myDict.values()
myDict.setdefault('language', 'French')
myDict.items()
|
Note again that the sequence of items returned by
keys(), items(), and values() are
completely uncorrelated. Each time you go to the dictionary you should think
of it as being in an unguessable state.
What if you want to run through your dictionary's entries in
some specific order?
You need to manually get the keys, put them in the order you want,
and then use the ordered list to retrieve the items.
For example, if you were happy to use the results of a standard
Python sort on myDict,
you might say:
for k in sorted(myDict.keys()) : (k, myDict[i])
|
Note that I used sorted here rather than sort.
If I'd tried to use sort I'd have received an error,
because myDict.keys() is not a sequence that sort
can sort in-place.
if test : block elif: block else: block
Where would a language could be without if statements? Nowhere, that's where.
The Python approach is the standard one: you say if and follow that with
a test that yields a Boolean result (that is, True or False), followed by a colon.
Unlike some languages, there's no need to put the test in parentheses. Then there's a code
black that's executed if it's true, and an optional else: clause to
introduce a block if it's not true:
if 8>3 : print "Yes! Whew, numbers work right in Python" else : print "Well, that's certainly unexpected"
if 2>3 : print "How could 2 be bigger than 3?" print "(that's just weird)" else : print "Python still works right" print "(what a relief!)"
|
Note how the indenting works. The lines of code that come after the if
statement are indented to the right of that statement, which makes them into a block.
As long as lines continue to be indented to the same level,
they're considered part of the block.
Then the
else clause is at the same level of indentation as the if,
with its code indented below.
There's no formal way to end these blocks; the first line that isn't indented as much
as the preceding lines marks the end of the block.
If you have a big chain of statements, you might find yourself putting new
if statements in your else clauses. If so, you can
use elif instead to save yourself some typing. I'll use some
made-up functions here to illustrate the point. We have a string with the
name of someone showing up to a party, and we want to print out the
appropriate greeting:
visitor = "Bob Michael Herringbone Smith" if visitor.IsOnGuestList() : print 'Welcome, honored guest.' elif visitor.IsRoyalty() : print 'We are honored, sire.' elif visitor.IsMyBrother() : print 'Hey, grab a beer.' else: print 'Do I know you?' |
In one of Python's very few exceptions to its draconian indenting rule, if your block is only one line long, you can put it on the same line as the expression that started it, after the colon. So we could write two of the three examples above more compactly (the second example can't be written in one line, because the blocks each contain two statements):
if 8>3 : print "Yes! Whew, numbers work right in Python" else : print "Well, that's certainly unexpected"
if visitor.IsOnGuestList() : print 'Welcome, honored guest.' elif visitor.IsRoyalty() : print 'Welcome, sire.' elif visitor.IsMyBrother() : print 'Hey, grab a beer.' else : print 'Do I know you?" |
def func(args) : block
Functions are easy. Put the word def at the start, name your function, follow it
with parens that enclose your arguments, then a colon, then give a block of indented statements.
Put a return statement at the end if you want to return something
(otherwise the function will return the special value None, which is
fine if you don't need it to return anything else).
If you want to return a bunch of things, put them in a tuple and return that.
There are more details if you want to dig into them. The only one that I think is important for you to know right now is that you can provide named arguments with defaults. Here's a function definition with a mix of arguments:
def makeMusic(songtype, speed='fast', key='G') : print "I'm making a "+speed+" "+songtype+" in the key of "+key makeMusic('Minuet')
makeMusic('Minuet', key="A flat")
makeMusic('Rock Opera', speed='pretentious')
|
If you need to support an unpredictable number of arguments, you can do that, but it gets a little complicated; see the documentation.
Here's a function that returns a value:
def superSizeMe(v) : return v*3 superSizeMe('Cola')
superSizeMe(4)
|
The last example works because * does something for both strings and integers.
This highlights my earlier point about Python's typing system.
Suppose that you inherited this piece of code from someone else,
or you wrote it yourself a long time ago and you don't remember it well.
Just by looking at the function declaration, it's not clear what type of variable the
function is expecting.
Heck, the argument v could be anything: a float, an integer, a dictionary,
or a picture of this year's fashionable designer socks.
In the example I tried a string and then an integer and they both did something,
though obviously they did very different things,
and there's frankly no way of knowing which was the intended usage.
The lesson is that there should be some comments or some other way of knowing
what type of variable is expected by this function.
It's probably not smart to be ambiguous this way except for fun.
Note that unlike some other languages, there can be only one definition of a function
in a given namespace. You can't define multiple versions that take different numbers
or types of arguments. You might then be tempted to do some kind of type checking,
for example to make sure that superSizeMe was actually given a string rather than a
number (or a dictionary or some other thing). From what I've read in the Python
documentation, you should resist that urge and bury it. Except for very rare and
specific circumstances, you should never explicitly check for the type of an object
(even though Python maintains those types, and you indeed can get at them). Instead,
you're encouraged to write your code, and name your objects, in such a way that there
will never be type confusion.
The saving grace, I guess, is that Python does do strong type checking
at run time
(e.g., if you try to take the square root of a string, you'll get an error).
But you aren't supposed to do it yourself. The result is that
you can stick your foot in your mouth
when creating something like the example above, where * almost accidentally works for
both numbers and strings. If I used a + sign instead, it would have failed:
def superSizeMe(v) : return v+3 superSizeMe(4)
superSizeMe('Cola')
|
Well, that's a reasonable error message, anyway.
I'm not sure where else these fit, but they're worth knowing about.
#:
Begin comment. Everything after a # on a line is normally a comment.
pass:
Do nothing. Empty blocks are illegal, so if you're programming up something
and you have a block of code you haven't implemented yet, you can't just leave
it blank or containing nothing but a comment. You need something there that
Python can interpret, but that has no effect. pass fills that niche.
eval expression
Execute the expression string like any other Python code, and return the result.
You can give it optional dictionaries that define the local and global namespaces
in which to execute; by default, it runs in its own little world.
range(n)range(low,high)range(low,high,step):
With one argument, this generates a list from 0 to
n-1. With two arguments, the list
goes from low to high-1. With three arguments,
the list starts with low, then each
entry is step larger than the last until the last entry is as large as possible but
less than high. Thus range(n) is the same as
range(0,n,1). The arguments may be
any integers, positive or negative, but step must not be zero. This is used all the
time to generate indices for accessing sequences. A for loop in C might
read for (i=3; i<20; i+= 4), so in Python you'd write
for i in range(3,20,4) :
str(expression):
This does its very best to turn expression into a string, suitable for printing
or further processing.
zip(s0, s1, ... sn) -> [(s0[0], s1[0], ... sn[0]), (s0[1], s1[1], ... sn[1]), ...]:
This command weaves together a bunch of sequences into a new list. The first entry
in the list is a tuple of the first entries of all the lists, in order. The
second entry is a tuple of all the second entries, and so on. The list is
only as long as the shortest input sequence; values in the longer sequences
are ignored.
False None 0 "" '' () [] {}:
These are all the
things in Python that evaluate to False when tested as a Boolean.
Everything else is True.
while and for
There are two basic loop constructs in Python: while and for.
They're much like their counterparts in other languages. You can halt evaluation of a loop
with a break statement,
and you force moving to the end of the block with continue.
Python style seems to be that most while loops are infinite. You use a condition
of True as your test, and then use break to stop.
There are plenty of while loops with real tests in the conditional,
but I see lots of the infinite kind:
while True : v = computeStuff() if stopNow(v) : break |
The for loop generally uses an in construct
to get all the elements of an object:
for v in list1 : doSomething(v) if stopNow(v) : break |
Note that this use of in is related to, but different from,
the use that tells us if a variable is a member of a sequence.
This in construction works with all kinds of types. If you're using a dictionary,
you'll typically want to pull out (key,value) pairs together:
for key,val in myDictionary : doStuff(key) if stopNow(key) : break if doneWithThisEntry(key) : continue doMoreThings(key) |
You can attach an else: clause to the end of these loops. The code in that
block is executed only if you did not leave the loop via a break statement.
for v in list1 : if stopEarly(v) : break doStuff(v) else: print "did everything in the list" |
If you're using the while True : convention, the else: clause
isn't useful because you'll never execute it.
[f(x) for x in obj]With all these lists around, you're constantly looping over them and affecting each element, writing code like this:
newList = [] for elem in myList: newList.append(doSomething(elem)) |
In words, make a new list by applying a function to each element of the input list. You might only choose to include those elements that meet some condition:
newList = [] for elem in myList: if test(elem) : newList.append(doSomething(elem)) |
This is done so much that earlier versions of Python had a bunch of routines to make
this kind of thing particularly easy. They're still around, called
map(),
apply(), and
filter(). But recent releases make it even easier to do this sort of thing
with a special syntactical shortcut. The shortcut is called a list comprehension,
and it's the name given to a special way of writing the loops above. The important two inputs
are the list to examine and a function to apply to each element of that list.
You write a little for loop to pull
out items from the sequence and hand them to the function. The list comprehension that
implements our first example above looks like this:
[doSomething(x) for x in myList] |
Note that we didn't need to explicitly construct the new list
using a variable (newList in this case)
and an append statement, because
the list comprehension itself creates and returns a list.
Let's use this to take a list and multiply each entry by 10:
a = range(10) def mul10(x) : return x*10 [mul10(x) for x in a]
|
If you're wondering if we needed to define the function and the list ahead of time, we didn't. This produces the very same result:
[x*10 for x in range(10)]
|
If you want to include a test for each element, you add that in at the end before the closing bracket. Here's the list comprehension that implements our second example (the one with the test):
[doSomething(x) for x in myList if test(x)] |
Again, because the list comprehension creates a list, this one short line replaces a four-line loop. That's convenient. Of course, if you need to keep the result around for later use, you'll want to assign the list comprehension to a variable:
newList = [doSomething(x) for x in myList if test(x)] |
You can call a defined function in a list comprehension, as suggested above, but you can also put the test right there into the expression. Let's see that in action by only multiplying the input numbers that are even:
[mul10(x) for x in a if x%2 == 0]
|
The variable x used in these examples in sometimes called a
dummy variable, because it doesn't really matter what it's called and
it's not used for anything else except simply showing Python how to use
the retrieved objects when making the function calls. Here's a variant
with a function that takes a few named arguments:
[doSomething(state='Washington', citizen=x) for x in myList if test(x)] |
Python supports making your own classes. You can do it super-simply, or get into all the gnarly detail. To do things the "new" way, put this at the top of your file, to tell Python to use the new approach:
__metaclass__ = type |
If it's not obvious on your browser, there are two underscore characters, one
right after the other, before and after the word metaclass.
We'll see this kind of thing a few more times in this document, so if something looks
like one long underline to you, remember it's probably two separate characters.
I'll assume you have this line at the top of your source code from now on.
To create a class, you simply define it and give it some methods:
class MyClass : def sayHi(self) : print 'hello' c = MyClass() c.sayHi()
|
OK, it doesn't get much simpler than that. Note that the method here has
the argument self. All methods receive a pointer to the object calling them as
their first argument. You must name it something, and you may name it anything
you like, but everyone everywhere always uses self, so it's a good
idea to do it that way. By convention, classes start with capital letters.
Classes can have constructors, called __init__ (there's that double
underscore before and after the word again). Of course, it takes the argument
self, and then it does whatever you'd like it to do. You can also give
__init__ more arguments if you like.
Here I've added an optional argument with a default
value.
class MyClass : def __init__(self, myName="classy"): self.myName = myName def sayHi(self): print "hi from "+self.myName d = MyClass() d.sayHi()
e = MyClass("a big green rocketship") e.sayHi()
|
So __init__ gets called automatically when you create an instance of the class.
You can use variables as well, simply referring to them with self.varName.
These are instance variables, but you can have class variables too.
Finally, you can of course inherit from other classes. Just name the classes you want to inherit from in parentheses after your class name. If there's more than one, separate them with commas. Unlike some other languages, when you create a new object of your class, your superclass's initializers are not automatically called. You have to call them yourself. Don't forget this! There are a number of ways to make these calls, but here's the "new" and recommended way:
class MySubclass(ParentClass) : def __init__(self) : super(MyClass, self).__init__() # do my local initialization stuff |
Basically super(MyClass, self) is looking up your parent's class for you. Then you
call its initializer.
You need that __metaclass__ = type line I mentioned above to
be at the top of your file for this to work.
Since the "new" way is the recommended way, I'd rather it was enabled by default.
You need that line only in files where you're defining new classes, but rather than hope
I remember at the right time, I'm trying to get in the habit of automatically
putting that line at the top of all my Python files.
If you want to know if an object supports a particular method, or has a variable
of a given name (these are called attributes), you can ask it with hasattr:
if hasattr(object, attribute) : print 'yes!' |
You can call this on a class or a class instance.
Sooner or later you're going to divide by 0. The question is, what happens next?
My programming experience has been to verify that my data is good before I operate on it. If there's a problem, I catch it and handle it. If there isn't, I proceed to do things with the data. For example, if I ask the user to type in an integer, I might check to make sure the input string contains only numbers before I try to do calculations with it.
The Python approach seems to go the other way around (not always, but often): assume your data is good, try to do stuff with it, and provide handlers for any errors that might arise.
The exception-handling syntax is similar to other languages. You put the code that
might fail in a try: block, and you provide except: blocks
to handle errors. You can also provide an else: block at the end, which
is executed if there were no errors.
def adder(): try: v = eval(raw_input('Enter a number: ')) v = v+3 except NameError: print "Hey! That wasn't a number you gave me." else: print 'Your number plus 3 is '+str(v) adder()
adder()
|
Here I caught NameError, because I tried this out first in the interpreter and
that was the error it raised. In the first line of the try: clause I get a string
from the user, and then I call eval to evaluate that string and gets its value.
So if the user types in a number, it evaluates to that number. The user could
also type in a variable name, or any expression, and eval will do its best to
evaluate it and return the result, raising errors along the way if needed. In
this example, if skoodle had been a variable with a numerical value, things would
have been fine. But skoodle is an undefined name, so you get a
NameError when you enter it. If I give it a value, we get what we expect:
skoodle = 45 adder()
skoodle = '45' adder()
|
This is no surprise, really, since the string '45' evaluates to
itself (that is, the string '45'),
and that's not a number. We can catch that too, and make our error messages a
little more descriptive.
Now that we know what type of error is raised (thanks, interpreter!),
we can add a new except clause to catch it:
def adder(): try: v = eval(raw_input('Enter a number: ')) v = v+3 except NameError: print "Hey! That looks like a name for something that doesn't exist." except TypeError: print "Hey! That's not a number! What're you trying to pull?" else: print 'Your number plus 3 is '+str(v) adder()
gronk=10 adder()
gronk='10' adder()
|
The documentation lists all the error types. A convenient way to get familiar with them is to just try your code in the interpreter, read the errors messages as you try your test cases, and then add handlers for them. After a while you'll start to anticipate what can go wrong and how it will be reported.
You can define your own errors if you want by subclassing existing error types;
see the documentation. You can raise any error, whether it's a system error
or your own, with raise:
raise TypeError |
OK, now we're getting a little more into what makes Python unique. We've
seen constructions like for i in list : that automatically
scan through list, producing elements one at a time,
saving them in the variable i which is then used in the body of
the for loop. There's a lot of smarts in the system and
in the list object that get co-ordinated to make this happen.
If you have a bunch of items of your own, you can probably always find a way
to walk through them one by one. But Python makes it easy (well, not too hard,
anyway) for you to "promote" your objects so that they can take part in
expressions like the for loop above. Objects that have that
ability are called iterators, and once your object is able to behave
like an iterator, you get to use all the nice syntax that those objects enjoy,
like list comprehensions.
To turn your object into an iterator, you need to implement four functions (if your object doesn't allow itself to be changed, you need only the first two):
__len__(self):
returns an integer, equal to the number of items in your object
__getitem__(self,key):
return the item associated with the given key. The key can be anything that's
convenient for you.
__setitem__(self, key, value):
set the item associated with this key to this value
__delitem__(self, key):
delete the item with this key
If the key for any of these functions is of the wrong type for you, you are
allowed to
raise TypeError. If the key is of the right type but is out of bounds, you
should raise IndexError.
This is one of the very few places you should do type checking in Python. To
check that a key is, for example, a long or an int, you might say
if not isinstance(key, (int, long)) : raise TypeError |
Iterators are great, but both the built-in iterators, and those of the form I just described above, share a kind of behavior that is sometimes desirable and sometimes awful: when you ask the iterator for its first value, in addition to making that value, it creates all the values for the iterator and saves them.
For example, suppose you give Python this for loop:
for i in range(8) : doSomething(i) |
When Python sees that range(8), it goes off to range,
hands it the argument, and asks for the sequence.
So range does its thing and produces a sequence
of 8 integers.
That sequence comes
back and Python stashes it away for further use as it continues to interpret
this line and then execute the loop.
This is no problem with a lightweight like range,
which just produces a list of integers.
But suppose your iterator is producing massive objects,
like different starting conditions for a full-planet weather simulator.
Making each of those objects can take a long time,
and storing them all could eat up huge amounts of memory.
It would be terribly wasteful to build and store them all before you
even get to work,
particularly if you might not need them all
(your loop might have a break in it and bail after processing
the first three items).
Even worse,
your iterator might be infinite,
producing an unlimited number of values.
There's simply no way Python can get an infinite sequence from you
and store it prior to running the loop!
All of these problems are solved with custom iterators, because they produce only one item at a time, when asked. If you're like me, you're a little confused now, because after all, haven't we just discussed custom iterators? Informally, yes, but Python supports two different ways for you to create your own iterator. The method discussed above seems to be referred to simply as an iterator, while the other way (which we'll discuss now) is called a custom iterator.
Custom iterators are simpler than regular iterators because they require only two procedures. Plus they have the advantage, discussed above, that they compute entries only when requested, rather than building them all at once before you even start processing the first. These points seem to argue for using the simpler custom iterator approach all the time, so that's what I do.
The magic to making it all work is that there's a hidden step
when Python uses custom iterators in expressions like loops and in constructions.
Under the covers, Python calls a routine called next()
that's attached to the iterator to retrieve the next value.
Each time next() is called, it's supposed to produce
one element from your sequence.
You can call it yourself, but usually you let Python do it for you.
So to create a custom iterator,
you give Python an object that implements next(),
and it does all the work from then on.
Here are the mechanics.
Your object implements a method called __iter__(self),
which returns an object. That returned object has an implementation for
next(self). That function returns one item of your sequence each time it's
called. If your sequence is finite and you've reached the end,
then your implementation of next() raises
StopIteration. This exception is caught by Python and
used to neatly terminate the loop.
In small examples the same object usually
implements both __iter__ and next,
so it simply returns itself as the object containing next().
In larger applications, the object can return some other object,
perhaps something big and complicated, as the object that implements next().
Here's an iterator that returns an infinite list of a few colors. I've added a number to the end to tell us how many times we've gone around the loop:
class MyIter() : def __init__(self): self.colors = ['red', 'orange', 'yellow', 'green', 'blue'] self.cindex = 0 self.loopCount = 0 def __iter__(self) : return self def next(self) : clr = self.colors[self.cindex] + str(self.loopCount) self.cindex += 1 if self.cindex >= len(self.colors) : self.cindex = 0 self.loopCount += 1 return clr |
To demonstrate this, I just create an instance of the class, and then
manually call next() on it.
iter = MyIter() iter.next()
iter.next()
|
The real beauty of course comes when you use this like any other iterator in one of Python's loops. Note that I've added a stopping condition here in the loop, since the iterator will produce an unending sequence of values if we let it:
iter = MyIter() for color in iter : print color if color == 'red3' : break
|
Beautiful. If your
next() routine is complex, you might want to package it up
in another object, and then return that object from
__iter__ rather than self. Python
doesn't care where
next() comes from or where it lives, it just needs a function to call.
There's another way to create a program that feeds next() statements, and that
approach does a lot of things implicitly. It's called a generator. You create a
generator by writing a normal function that computes some value. In the last example,
when Python called your iterator's next() function, the system ran that function
inside the state of a class instance. We used the object's own state variables to
drive the computation, and then returned a result.
But when next() points at a generator, that's different.
First, the generator is just
a function, and not a part of a class. So Python fires up this function and off it
goes, until it's computed an element to return. It then calls yield instead of
return.
Python takes a snapshot of the function just before it executes the statement
after the yield, and
stashes that state away.
Then when you call next() again, the program picks up from
where it left off. All of its local state is exactly as it was.
Like before, it runs until it
hits a yield statement (there can be lots of them in the function, if you need them)
and it freezes again, until you're ready for another.
Another next() call re-awakens it,
and it continues on from exactly that point.
So unlike a return statement that returns a value and
ends a function's execution,
the yield statement returns a value
and flash-freezes the function,
wrapping it in carbonite until you thaw it out again.
As with an iterator, you can create an infinite number of elements
simply by always returning your values with yield.
But unlike iterators,
where you raise StopIteration to signal that you have
no more elements left to serve,
in a generator you signal the end of your sequence by executing
a return statement,
which ends execution of the generator just as it does with any other function.
Python translates that into a StopIteration exception for you.
What I found a little confusing at first is that a generator has no explicit next()
statement, like iterators do. If you call next() on a generator, the first time
means "start running from the top of the function" and subsequent calls mean
"re-awaken and continue". I don't see any other way to control it, start it over again,
or change what it does. If you need to start over, you make a new generator and start
over. Here's a simple generator and its use:
def myGen(): a = 'zip' b = 'zap' yield a yield a+b yield (a+b).upper() return g = myGen() # create a generator g.next()
g.next()
g.next()
g.next()
|
By hitting the return statement in the generator,
a StopIteration error got automatically
raised. If we'd been inside a loop or an in construction, that would have neatly terminated
the loop. Here's an example
g = myGen() for i in g: print i
|
The loop terminated normally, and the interpreter was then waiting for me to type something new. If you have an infinite sequence, you might want to use a loop:
def myGen2() : mystr = '' while True : mystr += 'a' yield mystr g = myGen2() [g.next() for i in range(10)]
|
If you want to fill up your screen with junk extremely fast, run this generator without any stopping conditions. This will do it, but be sure you know how to interrupt or stop your interpreter because this will go nuts:
g = myGen2() for i in g : print i |
I like to use generators to neatly package up nested loops. For example, suppose that you want to sample a 3D volume and find the places where a function has a value above a threshold. The obvious way to go is to write a triply-nested loop. I'll use a simple function here for demonstration purposes. I picked the threshold of 3.0 experimentally so that we got back just a few results.
def loop3D() : for z in range(10) : for y in range(10) : for x in range(10) : v = (x*y*z)/200.0 if v > 3.0 : print(x,y,z,v) loop3D()
|
This is just fine. But let's do it with a generator that takes the function to be evaluated as an argument. Now the calling part of the program only has to make the generator and then iterate over it:
def genLoop3D(func) : for z in range(10) : for y in range(10) : for x in range(10) : v = func(x, y, z) if v > 3.0 : yield (x,y,z,v) returndef volumeFunc1(x,y,z) : return (x*y*z)/200.0g1 = genLoop3D(volumeFunc1)for r1 in g1 : print r1
|
The beauty of this is that the triply-nested loop is now
hidden away inside the generator. We simply call it to get
the values we want, and the details of the computation are
conveniently out of sight. Of course for this document I
kept genLoop3D simple, but if it was complex and
messy this would be a bigger win. For completeness,
let's make a new generator with a different function:
def volumeFunc2(x,y,z) : return (x+y+z)/8.0g2 = genLoop3D(volumeFunc2)for r2 in g2 : print r2
|
That strikes me as attractively minimal: we only need to define the function, create the generator, and then iterate over its returned values. The iteration even stops on its own when it's sampled the entire volume. Of course, we could parameterize the generator further, say by passing in the threshold and different ranges and step sizes along each axis.
I don't have much experience with generators yet. I think that
anything a generator can do can also be done by an iterator
(or a custom iterator),
and vice-versa.
From what I can tell today, they're essentially different
approaches to computing values in response to calls to
next(). If you like the idea of computing
a value from scratch based on local state variables, go with an iterator. If you like the idea of
writing a loop and catching the values one by one as they come out, use a generator.
Philosophically, I think iterators are well-suited when you think of
your objects as the result of a standalone computation:
you take some variables, compute with them, derive a result, and return it.
A generator is best when your objects are part of a larger computation
or an ongoing process.
For example, suppose that you were implementing a fluid-flow simulator,
tracking the positions of some toy sailboats as they slosh around inside
a bathtub on board a rocking boat.
Your generator might move time forward by a fraction of a second,
compute the state of the world and the fluid and the sailboats,
and then yield to give you the positions of the boats
so you can draw them.
Then the next time through the loop the simulator picks up where
it left off, increments time a little bit, and repeats.
This is nicely handled by a generator since the simulator is
inherently a free-running loop that delivers periodic values
as they're computed.
So to my mind, the choice of iterator or generator depends on whether it's more convenient to think of your computation as a function that operates on a set of parameters, or an ongoing process that creates a string of intermediate results.
Here's a little side note of warning. When using the interpreter, I sometimes forget the parens and say something like
t = myGen # almost surely wrong! |
That's a legal assignment, but then t is just a pointer to a function, not a magical
next()-calling state-preserving generator, so when I try to use it it doesn't behave
properly and I get lots of errors. If your generator's behaving
really erratically, make sure you didn't forget the parens when you made it.
It's easy to work with files. Here's the essence for text files.
f = open(r'C:\personal\secret\poetry.txt', 'r') |
Note that I used the "raw" form of string naming by placing the little r before the
string, so I didn't have to escape all those backslashes. The second string is the
open mode, like in most other languages, and can be
'r',
'w', or
'a' for read, write,
and append. Each of these can be followed by
'b' for binary mode, and/or
'+' to
enable both reading and writing. So if I wanted to write to the file above as well
as read it, I'd use
'r+' for the mode.
You can read a line of text from the file with
readline, or write to it with write:
line = f.readline() f.write(line) |
When you read a line of text using readline(),
Python gives you back all the characters in that line,
up to and including the final newline character.
If you don't want the newline,
you can throw it away using a slice:
line = f.readline() line
line[:-1]
|
If you continue reading lines past the last one,
your return string comes back as '';
that is, an empty string.
When you're done, call f.close() to close the file.
A common convention is to loop through the lines of the file, and because of how files
are set up, you can do this with the in construct, just like looping through the
items in a list. Files are internally set up to be iterators that crank out
one line of text each time you ask them for more stuff:
f = open(filename, 'r') for line in f: processLine(line) f.close() |
This will naturally end the for loop when you run out of lines to read.
There are a few constructions I see all the time. Probably this list should be far larger, but these are the ones that seem useful right now.
Asking for index -1 gets you the last entry of a sequence:
seq[-1] |
Asking for index 0 get the first entry of a sequence:
seq[0] |
You can unpack a list or tuple into several variables at once just by assigning them in order as they come out of the sequence:
x,y,z = [1,2,3] |
Grab a quick and dirty input string from the user, giving them a prompt first:
answer = raw_input('prompt:') |
To create a tuple of 1 element you have to have a trailing comma. It looks wierd, but without the comma you'd just have the value in parentheses, which evaluates to the value itself. The comma makes it a tuple:
(v,) |
You often want to run through all the entries in a dictionary. You can ask it for the items, and then immediately unpack the key and value into variables that you can then use in the rest of the loop:
for key,val in myDict.items() : |
The in construct is great for going through a list,
but sometimes you want to know
which index you're looking at now, perhaps so you can change it in-place.
Use enumerate to get both of these things, but remember that this index
won't be meaningful for dictionaries (see the Notes section at the end of the document
for an example of what happens if you try):
for i,val in enumerate(obj) : |
Take a string and turn it into a list of single characters:
list(str) |
Take a list of single characters and turn it into a string. More precisely, the elements of the list are placed into a new string one after the other, with the calling string between them. In this idiom, the calling string is the empty string, so the letters are mushed together.
"".join(nlilst) |
The idiom above is also useful if you have a list of words. Putting them together with a single black space between them makes them readable:
" ".join(nlilst) |
In a function, if you refer to a variable, Python assumes you're talking about a variable of that name that's local to the function. If there's a global variable out there that you're trying to affect, you won't be able to. If you do want to refer to a global variable, then declare your variable name to be a global at the start of your function. Now when you refer to it, Python will look at the globals in your namespace, rather than your function's local variables:
global var |
Run the test code for a module:
if __name__ == "__main__" : runTestCase() |
This one takes a little explaining. It's handy to develop code by writing it in
text files (conventionally ending with the ".py" suffix) and then reading it into
the interpreter with the import command (to import "cheese.py" type import cheese, but note that
if you include the .py you get an error!). Once it's been read into the interpreter,
you can use it like any other code. You can of course also import it into other
text files. So suppose you've written a bunch of routines and you've packaged them
up nicely, but you'd like to include some test cases to either verify that they're
working correctly, or just demonstrate their function. You'd like to be able to
run that package directly from the shell, e.g. "python cheese.py" (in this form,
you DO need the .py extension!). How do you get your test/demo code to run automatically
when you do this, yet not run automatically when someone merely imports your file?
This bit of magic does the trick. There's a variable named __name__ created and maintained
for you by Python. If your file is being loaded as part of something else, or being
included in another module, then __name__ has the name of your file. If your program
is being invoked from the command line, then __name__ has the value __main__. So
you test the value of __name__, and if it is __main__, you call your test program.
This sure looks ugly, but it's the way to do it, and you'll see this everywhere,
almost always at the very end of the file.
That's it! We're done!
You should be able to knock out Python code to do pretty much any classic programming tasks you want to. If you want to now talk to the web, or read mp3 files, solve the Tower of Hanoi, manipulate your computer's file system, play music, draw graphics, or create a GUI front end, you can do those things by finding the right libraries to help you.
Of course, there's more to Python than I've covered here. Consult any of the many online and printed references and books both for more information on features I've discussed, and introductions to features I haven't covered at all.
There's a common phrase in the Python world that Python "comes with batteries included". That means that if you're trying to something complicated but not terribly specialized, then someone somewhere has probably already written and debugged code to do it, and that code is either part of Python already (in the form of options to built-in routines) or it's available in library form on the web. Rolling your own complicated data structures is a great learning exercise, but if you just need one to do your work, you can probably find it out on the web, packaged up in a ready-to-run library.
There are lots of free libraries out there. Although Python is platform-independent, a lot of low-level libraries (quite reasonably) are not (for example, it doesn't make sense to execute Windows commands on a Linux box). When you find a library, make sure it's compatible with your system, and follow the installation instructions carefully.
The sort and sorted routines let you provide your
own functions to help them in their work. One function
takes two objects and returns -1, 0, or 1 depending on whether the first is
less than, equal to, or greater than, the second. The beauty of using your
own function is that those things can mean anything you like: you can sort
on colors of the rainbow or your favorite flavors or the sequence in which you
encounter animals when talking a walk clockwise through the zoo. You can also
use the built-in comparison function and instead supply your own "scoring" function,
which assigns a number to an object (they call this a key function). Or you
could replace both functions, but I can't think of any instances right now
when you'd need to - either one seems to do the trick.
Here's an example of replacing the scoring function. Remember
that sort
works in-place and modifies the string that calls it.
r=range(15) def myscore(x) : return x % 5 r.sort(key=myscore) r
|
I found the details of split kind of confusing.
Try it on something with a
bunch of repeated split characters, or with split characters at the beginning
and/or end:
st = '//how I///love/to sing!/' st
st.split('/')
|
I looked at output like this and wondered to myself just what the rules are.
I suppose I could find the source to split somewhere,
since Python is open-source,
but I thought I'd learn more by trying to work it out on my own and write my own
version of the routine. Actually that's kind of cool, since it also let me make predictions
using my code and test them against Python's built-in version. I'm offering
the result here to inspire you to try the same thing on language features
that catch your eye.
I don't guarantee the
program below will always match the built-in split,
but it's been a perfect match on everything
I've thrown at it. In words, find the first occurrence of the fence from your starting
point and return everything from your starting point to the left of the fence. Then move
the starting point to the end of the fence and repeat, until you reach the end of the input
string. The oddball cases for fences at the start and end, or repeated, are naturally
handled by this algorithm, which turns them into empty strings just like split does.
def mysplit(s, fence) : words = [] left = 0 while 1 : right = s.find(fence, left) if right < 0 : break words.append(s[left:right]) left = right + len(fence) words.append(s[left:]) return words |
Let's try it out:
mysplit(st, '/')
st.split('/') == mysplit(st,'/')
|
Of course, it had better work for fences that are longer than just one character, and it does:
s2 = "It 'tis a fish, it is, a very fishy fish it sisisis indeed." s2.split('is')
mysplit(s2,'is')
s2.split('is') == mysplit(s2,'is')
|
The real split has nice default behavior when you call it with no arguments, which
mine doesn't, so this isn't a complete replacement.
When I talked about enumerate earlier,
I said that it isn't useful for dictionaries. Let's prove it.
I'll create four entries, numbered 1 through 4, and then
enumerate them. Watch the order in which they come out:
d = {'k1':'v1', 'k2':'v2', 'k3':'v3', 'k4':'v4'} for i,v in enumerate(d) : print i, v
|
Yup, dictionary items are served up in unpredictable ways, so having the loop index around doesn't help you access them in the dictionary. The index does tell you how many items you've processed so far, though, which could be useful.
Python supports multidimensional arrays, but it doesn't need any special mechanism for it. You just create a bunch of nested lists. But to make things work out properly, you need to decide beforehand in what order you're going to access the array's elements. For example, when writing about a 2D matrix you usually write an entry as (row,col). So (4,3) refers to the entry 4 down and 3 across (or 5 down and 4 across if you're counting from zero). On the other hand, when placing points on paper, we usually write them (x,y), in which case (4,3) means 4 across and 3 down. We know whether to go (over,down) or (down,over) by convention and context. Since Python lacks those things, we need to create and enforce our own policy.
Here I'll create a 2D grid that I'm going to use as a matrix. I want to access the
cells as grid[y][x], or
grid[row][column]. Here's how I construct it:
myGrid = [[0 for w in range(NumColumns)] for h in range(NumRows)] |
The grid is filled with elements that all have the value 0
because that's the expression that gets evaluated for each element
in the innermost loop.
Now I can get the element 2 down and 6 over with myGrid[1][5]
(counting from zero, of course), as I planned.
There's a quirk here that's important to observe. I find that depending on my mood, this quirk is either entirely obvious or utterly opaque. Here's the thing: I said that I wanted to access the array by giving the row index first (that is, closest to the variable name), then the column index second (that is, farthest away), so I created the array in the opposite order: the innermost loop generates the columns, the outermost loop generates the rows.
If this strikes you as entirely reasonable and sensible, then you can skip the rest of this section. Otherwise, it may help to think about the structure of the object we're making: it's a list of lists. Python runs the inner loop to create a list of numbers, and there as many of these as there are columns. So this inner list generates one horizontal slice of our matrix. Then Python increments the outermost loop, and we generate the next horizontal slice. So the outer loop creates a list with one item for each row, and those are created by the inner loop, which creates the elements of the row. When we want to access an element, we select first the row (from the outermost list) and then the column (from the innermost list). In other words, we create the array by looping through the indices in the reverse order from how we intend to access them.
Here's an example for a 3D space. I've decided I want to access my points in the
traditional geometric way, so the point at (x,y,z) will be retrieved with
space[x][y][z]:
space = [[['x='+str(x)+' y='+str(y)+' z='+str(z) for z in range(3)] \ for y in range(3)] for x in range(3)] space
space[0]
space[0][1]
space[0][1][2]
|
(To make this fit the page, note that I split the input into two lines using the
continuation character \. I also manually reformatted the output.)
So it worked as planned. space[0] is the plane x=0,
space[0][1] is the line where x=0
and y=1, and space[0][1][2] is the point (0,1,2).
So we access a specific entry (x,y,z)
using space[x][y][z], with x in the first position, then y, then z.
But to make that work,
we constructed it in the opposite order, with z in the innermost loop, then y, then x.
I find that this often makes perfect sense. And often it seems completely backwards. Weird.
In "Beginning Python" by Magnus Lie Hetland, Chapter 9 ends with a challenge to write a program to solve the N-queens puzzle. The challenge is to place N queens on a checkerboard of size N by N so that none of the queens can capture each other. One queen can capture another if they are on the same row, column, or diagonal. Hetland's solution is a great piece of Python, both short and elegant. Before looking at it, though, I wrote my own.
I'm including it here not because it's great code, but because it isn't. I'm a beginning Python programmer without much experience, but I'm trying to think the Python way. I took a brute-force algorithm for solving the problem, and pretty much wrote brute-force code for solving it. And it worked, and it's reasonably clear, and it's pretty Python-ish. I thought I'd share it with you as a kind of example of how even a little knowledge of Python can let be useful right away.
I wrote a function that takes as input an N-by-N checkerboard. All the squares begin with the value 0, meaning that they're unoccupied and unthreatened. When I enter my queen-placing routine, I search for a square with the value 0. To place the current queen there, I make a copy of the input board, I set that queen's square on that copy to the number of that queen, and I mark all the squares in the copy that queen threatens with the negative of that queen's number. All of that is overkill - I could just mark them with, say, 1 to mean unavailable, but I liked looking at the progress of the board as it goes, and all those numbers helped with the debugging. Then I recursively hand that board to the function again, inviting it to find another 0 for the next queen, and so on.
Here's my code. It's not optimized for anything, except maybe for some clarity and an over-abundance of comments.
# 1 August 2008 / AG # Solve the N-queens problem # Set the board size and the number of queens you want to place and let 'er rip! # # The idea is we keep a board of the current state. To place queen N, we look for a # square with the value 0. When we find one, we set it to N and set all the squares # it can capture to -N, and recurse. Eventually they all find a home. Note that this # means queens are numbered [1..N] rather than [0..N-1]. # __metaclass__ = type # a good habit is to make this the first line of |
Like I said, this is really lousy code. But my goal was simply to show
a piece of running Python, rather than a nice algorithm.
The program starts with initRun() which initializes the
board to all 0's, and then calls placeQueen
for queen number 1.
The placeQueen algorithm is brute-force and greedy. It
loops through the squares in the board, looking for one with a zero
in it (that's the top two for loops).
If it finds such a square,
it makes a complete copy of the board,
puts a queen into the appropriate square in the copy,
marks the squares that the queen threatens,
and then calls itself with the next-numbered queen.
If the cell it's looking at isn't a zero,
or the recursion call returns,
then the nested loops pick up again,
scanning the original board
(not the copied one)
for the next square with value 0.
The termination condition is when we place the last
queen (that is, we place queen number NumQueens).
At that point I just print out the board and quit.
Of course, if I wanted to find all the solutions,
I could then put a 0 back into that final queen's
cell and return,
continuing the search.
Here's an example run, starting from the command prompt. The file's name is queens.py:
Octomac:playpen$ python queens.py
Octomac:playpen$ |
If you're motivated to modify this code (or write your own) to implement a more subtle and efficient algorithm, or use more sophisticated and enlightened Python structures, there's tons of fun to be had in both directions.
I think a fun modification would be to replace the main for loops
with an iterator (or generator) that produces the 2D indices.
You could also write another iterator that produces all the
cells threatened by a queen at a given cell.
I'm not sure if the program would be much shorter or clearer,
but it would be a lot Pythonier.
I didn't do all the color formatting in this document by hand. I wrote this as a plain text file with a little tag at the start and end of each block of code. I then wrote a Python script to read the text file and treat lines between those custom code tags in a special way, first figuring out if each line is input or output, and then formatting it properly. My plain text has a bit of HTML markup in it to control sections and changing font here and there, but because all of that is outside of the code blocks the script just passes it through. I got the idea for this kind of format, and swiped the HTML code to make it happen, from http://www.secnetix.de/olli/Python/list_comprehensions.hawk.
I owe a big thanks to Eric Haines, who read this document carefully and suggested numerous improvements, all of which made it better and clearer. Any problems that remain should be laid at my feet.
-- END --