Part C: Dictionaries, Arrays, Functions, Modules

Mappings

Mappings store associations between a set of keys and a set of values. They can be regarded as generalizations of sequences, since sequences are a restricted kind of mapping whose keys are a range of integers. Mappings allow more general keys, though, and impose no order on their elements. There are other essential differences between sequences and mappings, so the two should be kept clearly distinguished.

Mappings are accessed much like sequences:
mapping[key]

returns the value associated with the key, and
mapping[key]=value

changes it.

A list of all keys is available using
mapping.keys()

and a list of all values using
mapping.values()

A list of (key, value) tuples can be obtained using
mapping.items()
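As a minimal sketch of these access patterns, the following uses a plain dictionary (introduced in the next section) as the mapping. The name charge and its contents are only illustrative; the in test and the get() method are standard dictionary features that are not described above.

```python
charge = {'proton': 1, 'electron': -1, 'neutron': 0}

print(charge['electron'])        # look up a value by key
charge['positron'] = 1           # assignment adds or replaces an entry

# "in" tests for a key; get() returns a fallback instead of raising KeyError.
print('photon' in charge)
print(charge.get('photon', 0))

# items() yields (key, value) pairs, which is convenient in a for loop.
for name, value in sorted(charge.items()):
    print(name, value)
```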

Dictionaries

The most frequently used mappings, and the only kind discussed here, are dictionaries. They allow any immutable (more precisely, hashable) object to be a key. That is, a key's value cannot change during its lifetime. This means that lists, dictionaries, and arrays cannot be used as keys. However, integers, strings, and tuples of immutable objects (among others) are allowed as keys.

A dictionary's values, though, can be arbitrary objects.
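The restriction on keys can be checked directly. This is a small sketch with a made-up dictionary called grid: a tuple works as a key, while a list is rejected with a TypeError.

```python
# Tuples of immutable objects are hashable, so they can serve as keys.
grid = {(0, 0): 'origin', (1, 0): 'east'}
grid[(0, 1)] = 'north'

# A list is mutable and therefore unhashable: using it as a key fails.
try:
    grid[[1, 1]] = 'northeast'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(sorted(grid))
print(list_key_allowed)
```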

The following example creates a dictionary containing the atomic masses of some chemical elements. The element names are the keys and the masses are the values. The dictionary is a set of key:value pairs:
In [1]: atomic_mass = {'H': 1., 'C': 12, 'S': 32}

Then we can add another key:value entry using:
In [2]: atomic_mass['O'] = 16.

Let's check what we've created:
In [3]: atomic_mass.keys()
Out[3]: ['H', 'C', 'S', 'O']

In [4]: atomic_mass.values()
Out[4]: [1.0, 12, 32, 16.0]

Let's calculate the mass of a greenhouse-gas molecule.
In [5]: print(atomic_mass['C'] + 2*atomic_mass['O'])
44.0

A dictionary entry can be deleted with del dictionary[key].
In [7]: del atomic_mass['C']

In [8]: print(atomic_mass.items())
[('H', 1.0), ('S', 32), ('O', 16.0)]

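Unlike sequence objects, dictionaries are not indexed by position. Internally Python places each item according to a number computed from its key, called a hash, which gives an efficient indexing scheme. (Since Python 3.7 a dictionary does remember the order in which its items were inserted, but that order has nothing to do with the size of the keys or values.)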

Nonetheless, you may want to get a sorted list of the items in a dictionary, sorted either by key or by value. Because a dictionary is indexed by key rather than by position, it doesn't make sense to sort it directly. Here's how you can do this, though.

First, let's look at how to produce a list of values in the dictionary above sorted by keys (the atoms' symbols).
In [60]: atomic_mass = { 'S': 32. , 'H': 1., 'C': 12. }

In [61]: atom_names = list(atomic_mass.keys())

In [62]: atom_names.sort()

In [63]: atom_mass_keysorted = [ atomic_mass[name] for name in atom_names ]

In [64]: print(atom_mass_keysorted)
[12.0, 1.0, 32.0]

Notice the unusual, but rather compact for loop syntax used when producing the sorted values atom_mass_keysorted.

This is a construction called a list comprehension. It is a form that abbreviates the multiple-line code:
In [22]: atom_mass_keysorted = []

In [23]: for name in atom_names:
   ....:     atom_mass_keysorted.append(atomic_mass[name])
   ....:     
   ....:     

In [24]: print(atom_mass_keysorted)
[12.0, 1.0, 32.0]

List comprehensions collect the results of applying an arbitrary expression to a sequence of values, returning them in a new list. (You can read more on list comprehensions in the Python textbook.)
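As a further sketch of the same idea, a list comprehension can apply any expression to each element and can also filter elements with a trailing if clause. The names doubled and heavy below are illustrative only.

```python
atomic_mass = {'S': 32., 'H': 1., 'C': 12.}

# An expression applied to every key, visited in sorted key order.
doubled = [2 * atomic_mass[name] for name in sorted(atomic_mass)]

# An optional trailing "if" keeps only the elements that pass a test.
heavy = [name for name in sorted(atomic_mass) if atomic_mass[name] > 10.]

print(doubled)  # [24.0, 2.0, 64.0]
print(heavy)    # ['C', 'S']
```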

Now, let's see how to get a list of keys (atom names) sorted by value (atomic mass).
In [65]: items = atomic_mass.items()

In [66]: reversedItems = [ [v[1],v[0]] for v in items ]

In [67]: reversedItems.sort()

In [68]: atom_names_valuesorted = [ reversedItems[i][1] for i in range(0,len(reversedItems))]

In [69]: print(atom_names_valuesorted)
['H', 'C', 'S']

Make sure you understand each step above and the component constructions in each case. The syntax is a bit tricky, but it does begin to hint at the conciseness, subtlety, and power of Python.

A good exercise at this point, one that would guarantee that you do understand the above code, would be to rewrite it without using list comprehensions, using instead for loops.
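For comparison, Python's built-in sorted function offers a shorter route to both results. This is an alternative to the steps above, not a replacement for understanding them. The variable names are illustrative.

```python
atomic_mass = {'S': 32., 'H': 1., 'C': 12.}

# Values ordered by key: sorted() returns a new list and leaves its input alone.
masses_by_name = [atomic_mass[name] for name in sorted(atomic_mass)]

# Keys ordered by value: key=atomic_mass.get tells sorted() to compare
# the masses while returning the names.
names_by_mass = sorted(atomic_mass, key=atomic_mass.get)

print(masses_by_name)  # [12.0, 1.0, 32.0]
print(names_by_mass)   # ['H', 'C', 'S']
```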

Arrays

Arrays are another kind of sequence object with special properties optimized for numerical applications. Arrays and functions dealing with them are defined in the module NumPy, which also contains the mathematical functions introduced earlier. The (rather extensive) NumPy manual and further good documentation are available online at the SciPy Documentation page.

It is convenient to begin an interactive session for numerical calculations with
In [8]: from numpy import *

And we will do this here.

However, while handy for an interactive session, importing this way is not good programming practice. Why? It might lead to name collisions between numpy functions and those you write or those in other packages. More on this later, when we start writing our own packages.
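Here is one concrete sketch of such a collision, assuming a star import would replace the built-in sum with NumPy's function of the same name. To keep both visible, the example imports NumPy under the conventional alias np instead.

```python
import numpy as np

values = range(5)

# Built-in sum: the second argument is a starting value.
builtin_result = sum(values, -1)      # 0+1+2+3+4, then -1

# numpy.sum: the second positional argument is the axis to sum over.
numpy_result = np.sum(values, -1)

print(builtin_result)  # 9
print(numpy_result)    # 10
```

The same call gives two different answers depending on which sum is in scope, which is exactly the kind of surprise that an explicit prefix such as np. avoids.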

Arrays have two properties that distinguish them from other sequences. First, they can be multidimensional. And, second, their elements are all of the same type: either integers, real numbers, or complex numbers. (There are also arrays of characters, single-precision real numbers, and general Python objects. We won't discuss them for now.) The number of dimensions is limited (to 32 in NumPy releases before 2.0, and to 64 since then).

Arrays are created from other sequences by the function numpy.array. Multidimensional arrays are created from nested sequences. An optional second argument indicates the type of the array: integer, float, or complex. If the type is not indicated, it is inferred from the data.

The following creates a one-dimensional array of integers:
In [9]: from numpy import *

In [10]: integer_array = array([2, 3, 5, 7])

In [11]: print(integer_array)
[2 3 5 7]

In [12]: print(integer_array.shape)
(4,)

And the following creates a three-dimensional array of real numbers:
In [13]: real_array = array([ [ [0., 1.], [2., 3.] ],
   ....:                      [ [4., 5.], [6., 7.] ],
   ....:                      [ [8., 9.], [10., 11.] ] ])

In [14]: print(real_array.shape)
(3, 2, 2)

In [15]: print(real_array)
[[[  0.   1.]
  [  2.   3.]]

 [[  4.   5.]
  [  6.   7.]]

 [[  8.   9.]
  [ 10.  11.]]]

The shape of an array, printed above for each array, is a tuple containing the lengths of the dimensions of the array: (4,) for the first array and (3, 2, 2) for the second.

The number of dimensions is the length of the shape and is referred to as the array's rank.

The standard number objects (integers, real, and complex numbers) can be regarded as arrays of rank zero.
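The following sketch checks these statements using the attribute ndim and the function numpy.ndim, which report the rank directly. It uses the np alias rather than the star import, and the array names are illustrative.

```python
import numpy as np

int_array = np.array([2, 3, 5, 7])            # type inferred from the data
float_array = np.array([2, 3, 5, 7], float)   # type given as second argument
nested = np.array([[1., 2., 3.], [4., 5., 6.]])

print(int_array.dtype, float_array.dtype)
print(nested.shape, nested.ndim, len(nested.shape))

# A plain Python number behaves like an array of rank zero.
print(np.ndim(3.14))
```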

Constructing arrays by hand can be tedious, so the module NumPy provides a few functions that create special arrays. The function zeros(shape, type) returns an array of the specified shape (a tuple) with all elements set to zero. The second argument specifies the type (integer, float or complex) and is optional; the default is float.
In [19]: tt = zeros((3,4),float)

In [20]: print(tt)
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]

The function ones(shape, type) works similarly, setting the elements to one.

The function arange(first, last, step) works much like range, but returns an array (of rank one) and accepts real-number arguments in addition to integers. As with range, the value last itself is not included.
In [22]: ttt = arange(1.0,27.5, 0.75)

In [23]: print(ttt)
[  1.     1.75   2.5    3.25   4.     4.75   5.5    6.25   7.     7.75
   8.5    9.25  10.    10.75  11.5   12.25  13.    13.75  14.5   15.25
  16.    16.75  17.5   18.25  19.    19.75  20.5   21.25  22.    22.75
  23.5   24.25  25.    25.75  26.5   27.25]

Arrays support an extended form of the indexing used for sequences. In addition to the forms a[i] (single element) and a[i:j] (subarray from i up to, but not including, j), there is the form a[i:j:k] that extracts a subarray from i up to j with an increment of k.
In [25]: tttt = ttt[1:10:2]

In [26]: print(tttt)
[ 1.75  3.25  4.75  6.25  7.75]
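A few more slices on a short array may help fix the conventions. This sketch uses the np alias and converts results with tolist() so the values print the same way in any NumPy version.

```python
import numpy as np

t = np.arange(0.0, 2.0, 0.5)      # the end value 2.0 is not included
print(t.tolist())                 # [0.0, 0.5, 1.0, 1.5]

middle = t[1:3]                   # elements 1 and 2, but not 3
every_other = t[::2]              # omitted bounds mean "from start to end"
backwards = t[::-1]               # a negative increment walks backwards

print(middle.tolist())            # [0.5, 1.0]
print(every_other.tolist())       # [0.0, 1.0]
print(backwards.tolist())         # [1.5, 1.0, 0.5, 0.0]
```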

For multidimensional arrays, the indices and ranges are separated by commas. If there are fewer indices than dimensions, indices are assigned starting from the left.

Note: This can be confusing; well, no, it is confusing. So, in going through the following, test your understanding at each step by exploring a bit. You'll thank yourself later on by thoroughly understanding arrays now.

Indexed arrays can also be used as the target of an assignment, as for lists.

Here are some examples of array indexing. First, we create a zero array:
In [29]: a = zeros((4, 2, 3), float)

In [30]: print(a)
[[[ 0.  0.  0.]
  [ 0.  0.  0.]]
 [[ 0.  0.  0.]
  [ 0.  0.  0.]]
 [[ 0.  0.  0.]
  [ 0.  0.  0.]]
 [[ 0.  0.  0.]
  [ 0.  0.  0.]]]

Set a specific element to one:
In [31]: a[2,1,0] = 1.

In [32]: print(a)
[[[ 0.  0.  0.]
  [ 0.  0.  0.]]
 [[ 0.  0.  0.]
  [ 0.  0.  0.]]
 [[ 0.  0.  0.]
  [ 1.  0.  0.]]
 [[ 0.  0.  0.]
  [ 0.  0.  0.]]]

Set all elements with first index 0 to 2.5:
In [33]: a[0] = 2.5

In [34]: print(a)
[[[ 2.5  2.5  2.5]
  [ 2.5  2.5  2.5]]
 [[ 0.   0.   0. ]
  [ 0.   0.   0. ]]
 [[ 0.   0.   0. ]
  [ 1.   0.   0. ]]
 [[ 0.   0.   0. ]
  [ 0.   0.   0. ]]]

Print all elements with first index 0 and last index 1:
In [35]: print(a[0,:,1])
[ 2.5  2.5]

But which elements in a were these? Let's check:
In [36]: a[0,:,1] = 7.

In [37]: print(a)
[[[ 2.5  7.   2.5]
  [ 2.5  7.   2.5]]

 [[ 0.   0.   0. ]
  [ 0.   0.   0. ]]

 [[ 0.   0.   0. ]
  [ 1.   0.   0. ]]

 [[ 0.   0.   0. ]
  [ 0.   0.   0. ]]]

Be sure you're comfortable with how the slicing indicated these particular elements (7s).

There are two special indices that do not directly select elements.

First, the special index ... (three dots) "skips" dimensions such that indices following it are assigned starting from the right. For example, a[..., 0] selects all elements with 0 as their last index. This works for any number of dimensions.
In [38]: print(a[...,0])
[[ 2.5  2.5]
 [ 0.   0. ]
 [ 0.   1. ]
 [ 0.   0. ]]

Second, the special index newaxis (defined in module numpy) inserts a new dimension of length one.

For example, if a has the shape (2, 2), then a[:, numpy.newaxis, :] has the shape (2, 1, 2) and the same elements as a.
In [39]: a = array( [ [0.,1.],[2.,3.] ] )

In [40]: print(a)
[[ 0.  1.]
 [ 2.  3.]]

In [41]: aa = a[:, newaxis, :]

In [42]: print(aa)
[[[ 0.  1.]]

 [[ 2.  3.]]]

In [43]: aa.shape
Out[43]: (2, 1, 2)

Arrays can also be used as sequences. These can appear, for example, in loops. In such a situation, the first index becomes the sequence index, and the elements of the sequences are arrays with a smaller rank. This is a consequence of the rule that assigns indices to dimensions starting from the left.
In [44]: for i in a:
   ....:     print(i)
   ....:     
   ....:     
[ 0.  1.]
[ 2.  3.]

The standard arithmetic operations (addition, multiplication, and so on) and the mathematical functions from the module NumPy can be applied to arrays. The operations are applied componentwise, that is, individually to each array element.
In [94]: sqrt(aa)
Out[94]: 
array([[[ 0.        ,  1.        ]],

       [[ 1.41421356,  1.73205081]]])

For binary operations (addition and the like), the arrays must have matching shapes. "Matching" here does not mean "equal". It is possible, for example, to multiply a whole array by a single number, or to add a row to all rows of a matrix.

The precise matching rule is the following: Compare the shapes of the two arrays element by element, starting from the right and continuing until the smaller shape list is exhausted. If, for all dimensions, the lengths are equal or one of them is one, then the arrays match. A less rigorous description is that dimensions of length one are repeated along the corresponding dimension of the other array.

Example:
In [11]: a = array([[1, 2]])

In [12]: a.shape
Out[12]: (1, 2)

In [13]: b = array([[10], [20], [30]])

In [14]: b.shape
Out[14]: (3, 1)

When you ask for a+b, hidden from view, a is repeated three times along the first axis and b is repeated twice along the second axis. In effect, there is an intermediate stage:
a = array([[1, 2], [1, 2], [1, 2]])
b = array([[10, 10], [20, 20], [30, 30]])

Now both arrays have the same shape and the addition is done component-wise:
In [15]: print(a+b)
[[11 12]
 [21 22]
 [31 32]]

It turns out that the arrays are not physically replicated, so the intermediate stage costs no extra memory; only the result is allocated.
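The matching rule stated above can be sketched as a few lines of Python. The helper shapes_match is hypothetical (it is not part of NumPy) and is written only to mirror the rule: compare the shapes from the right and accept each pair of lengths that are equal or contain a one.

```python
import numpy as np

def shapes_match(shape_a, shape_b):
    """Return True if two shapes are compatible under the matching rule."""
    # Walk both shapes from the right; zip stops when the shorter one ends.
    for len_a, len_b in zip(reversed(shape_a), reversed(shape_b)):
        if len_a != len_b and len_a != 1 and len_b != 1:
            return False
    return True

print(shapes_match((1, 2), (3, 1)))     # True
print(shapes_match((4, 2, 3), (3,)))    # True
print(shapes_match((4, 2, 3), (2,)))    # False

# NumPy applies the same rule when adding a row to a column.
row = np.array([[1, 2]])
column = np.array([[10], [20], [30]])
print((row + column).shape)             # (3, 2)
```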

Binary operations also exist as functions (in module NumPy): add(a, b) is equivalent to a+b, and subtract, multiply, divide, and power can be used instead of the other binary operators.

There are additional binary operations that exist only as functions:
maximum(a, b)    larger of a and b
minimum(a, b)    smaller of a and b

Let's try these:
In [16]: a = array([1,2,3,4,5])

In [17]: print(a)
[1 2 3 4 5]

In [18]: b = array([2,3,4,5,6])

In [19]: print(b)
[2 3 4 5 6]

In [20]: maximum(a,b)
Out[20]: array([2, 3, 4, 5, 6])

In [21]: minimum(a,b)
Out[21]: array([1, 2, 3, 4, 5])

Here are some others; give them a try.
equal(a, b), not_equal(a, b)           equality test (returns an array of True/False values)
greater(a, b), greater_equal(a, b)     comparison
less(a, b), less_equal(a, b)           comparison
logical_and(a, b), logical_or(a, b)    logical and/or
logical_not(a)                         logical negation

The binary operations in this list can also be applied to combinations of elements of a single array. For example, add.reduce(a) calculates the sum of the elements of a along the first dimension, and minimum.reduce(a) returns the smallest element along that dimension. For a one-dimensional array, as here, these are simply the sum of all elements and the smallest element.
In [22]: add.reduce(a)
Out[22]: 15

In [23]: minimum.reduce(a)
Out[23]: 1

An optional second argument indicates the dimension explicitly. It can be positive (0 = first dimension) or negative (-1 = last dimension).
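The effect of the second argument is easiest to see on a two-dimensional array. The following is a small sketch with an illustrative array m, again using the np alias.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

column_sums = np.add.reduce(m)          # default: along the first dimension
row_sums = np.add.reduce(m, 1)          # second argument selects the dimension
row_minima = np.minimum.reduce(m, -1)   # -1 means the last dimension

print(column_sums.tolist())   # [5, 7, 9]
print(row_sums.tolist())      # [6, 15]
print(row_minima.tolist())    # [1, 4]
```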

A variation is accumulate: add.accumulate(a) returns an array containing the first element of a, the sum of the first two elements, the sum of the first three elements, etc. The last element is then equal to add.reduce(a).
In [25]: add.accumulate(a)
Out[25]: array([ 1,  3,  6, 10, 15])

Another way to derive an operation from binary functions is outer. The function add.outer(a, b) returns an array with the combined dimensions of a and b whose elements are all possible sum combinations of the elements of a and b.
In [29]: add.outer(a,b)
Out[29]: 
array([[ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11]])

More array operations will be described as we go along.

Functions

You have already used many Python functions and probably encountered situations for which you wished to write your own. Functions allow you to write (and test) a certain piece of code once and then use it again and again. For now we will only treat functions defined in Python. It is also possible to define functions in C or other compatible low-level languages, but this is an advanced topic that requires more detailed understanding of Python.

The following code example defines a function called distance that calculates the distance between two points in space. It calls this function for two arbitrary vectors:
In [30]: from numpy import array, sqrt

In [31]: def distance(r1, r2):
   ....:     r = r1 - r2
   ....:     rr = r*r
   ....:     return sqrt(rr.sum())
   ....: 

In [32]: a = array([1., 0., 0.])

In [33]: b = array([3., -2., 1.])

In [34]: print(distance(a,b))
3.0

In effect, a function call consists of three steps:

  1. The arguments in the function call are assigned to the corresponding variables in the function definition.
  2. The code in the function is executed.
  3. The value given to the return statement is put in the place of the original function call.

Functions can take any number of arguments (including none) and return any number of values (including none).

They can also define and use any number of variables. These are local to the function. In particular, they have no relation to variables of the same name in another function or outside any function. These variables disappear after the function has completed.

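However, a function may use the value of variables defined outside it. Assigning to such a name inside the function does not change the outside variable; it creates a new local variable instead.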
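A short sketch of these scoping rules, with illustrative names: one function reads an outside variable, another assigns to the same name and thereby only creates a local variable.

```python
scale = 10.0          # defined outside any function

def scaled(x):
    return scale * x  # reading an outside variable works

def try_to_change(x):
    scale = 2.0       # a new local variable; the outer scale is untouched
    return scale * x

print(scaled(3.0))         # 30.0
print(try_to_change(3.0))  # 6.0
print(scale)               # 10.0
```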

A function's argument variables are also local to the function. Inside the function, they can be used like any other variable.

The following function converts from Cartesian to polar coordinates. It takes two arguments and returns two values. It also defines several local variables.
In [39]: def cartesianToPolar(x, y):
   ....:     r = sqrt(x**2+y**2)
   ....:     phi = arccos(x/r)
   ....:     if y < 0.:
   ....:         phi = 2 * pi - phi
   ....:     return r, phi
   ....: 
   ....:         

In [40]: print(cartesianToPolar(-1.,0.))
(1.0, 3.1415926535897931)

In [41]: radius, angle = cartesianToPolar(1., 1.)

In [42]: print(radius)
1.41421356237

In [43]: print(angle)
0.785398163397

Functions are a kind of data object. Like all data objects, they can be assigned to variables, stored in lists and dictionaries, or passed as arguments to other functions.
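As a minimal sketch of treating functions as data, the lines below store two standard functions in a dictionary, assign one to a new variable, and pass it to another function. The names operations, root, and apply_twice are illustrative.

```python
import math

# Functions stored in a dictionary, keyed by name.
operations = {'sqrt': math.sqrt, 'cos': math.cos}

# A function assigned to another variable is the same object.
root = operations['sqrt']

def apply_twice(f, x):
    """Call f on x, then call f again on the result."""
    return f(f(x))

print(root(16.0))                 # 4.0
print(apply_twice(root, 16.0))    # 2.0
```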

The following example shows a function that prints a table of the iterates of another function, which it receives as an argument:
In [44]: def printIterates(OneDMap,InitialCondition,nIterates):
   ....:     x = InitialCondition
   ....:     for i in range(nIterates):
   ....:         x = OneDMap(x)   
   ....:         print(i, x)
   ....:         
   ....:         

In [45]: def LogisticMap(x):
   ....:     return 4.0 * x * (1.0 - x)
   ....: 

In [46]: printIterates(LogisticMap,0.3,10)
(0, 0.83999999999999997)
(1, 0.53760000000000008)
(2, 0.99434495999999994)
(3, 0.02249224209039382)
(4, 0.087945364544563753)
(5, 0.32084390959875014)
(6, 0.87161238108855688)
(7, 0.44761695288677272)
(8, 0.98902406550053368)
(9, 0.043421853445318986)

Modules

So far our programs have been written as single files containing all the required components, including importing definitions from standard modules. In this scheme, a function required in several programs must be copied to each program file. This is inconvenient for several reasons. First, it's tedious. Second, if one finds an error at some point, then one has to remember the programs in which the function is used and go correct those versions. For these and other reasons, useful function definitions are collected in modules, which are loaded by the programs that need them.

Python makes no distinction between its standard library modules and other modules; the standard library simply comes with the interpreter. The mechanisms for importing from modules are the same, no matter where they come from.

A module is a file that contains definitions (variables, functions, and so on). The name of the file must end in .py; the string before the '.' makes up the module name.

The command import gzip, for example, looks for a file called gzip.py and executes it, making the resulting definitions available to the importing program.

Note that Python does not search the whole file system for modules, which would be inefficient and error-prone. Instead, it uses a strategy similar to that used by the operating system (here I'm referring specifically to Unix-based systems) to locate programs: an environment variable called PYTHONPATH can be set before running Python. Its value is a sequence of directory names separated by colons. Python looks first in the directory of the running program (or the current directory in an interactive session), then in the PYTHONPATH directories in the order in which they occur in the string, and finally in its standard library. The complete search list is available as the variable path in the module sys. (Because of this order, a module of your own can override one of the same name in Python's standard library. This is a benefit and a danger.)
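The search list can be inspected from within Python. This sketch prints sys.path and, separately, the directories named in PYTHONPATH if that variable is set; os.pathsep is the separator used on the current system (a colon on Unix). The name extra_dirs is illustrative.

```python
import os
import sys

# sys.path is the list of directories searched, in order, by import.
for directory in sys.path:
    print(directory)

# Directories taken from PYTHONPATH; the list is empty if it is not set.
pythonpath = os.environ.get('PYTHONPATH', '')
extra_dirs = [d for d in pythonpath.split(os.pathsep) if d]
print(extra_dirs)
```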

Some of the module names that have been used before contain one or several dots. These are modules contained in packages.

A package is a module that contains other modules. Packages are used to structure libraries and to prevent name clashes between libraries written by different people.

So far we've used the numpy package, which contains the module random among others. The dots in a name trace the path from the package down to the object: numpy.random.random() is the full name of the random() function in numpy.random, which one uses to get uniformly distributed values. The name gzip.GzipFile, also used earlier, looks similar but is slightly different: gzip is a single module, and GzipFile is a class defined in it.

A full description of the package numpy and its modules can be found in its online pages. Browsing this will give an idea of how large packages are structured.

Programs with Parameters, Scripting

Once you've developed a useful Python program, you may also want to run it like any other program that takes parameters from the shell command line. That is, you might want to run a program as
python AProgram.py a b c

To do this one needs a way to access the command line parameters. The module sys contains a variable argv whose value is a list of all parameters, starting with the name of the Python program.

Here's a program stored in the file AProgram.py that illustrates this:
import sys

print(sys.argv)

for arg in sys.argv:
    print(arg)

So (in IPython) when running AProgram, sys.argv contains ['AProgram.py', 'a', 'b', 'c']:
In [128]: run AProgram.py a b c   
['AProgram.py', 'a', 'b', 'c']
AProgram.py
a
b
c
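Command-line parameters always arrive as strings, so a program that expects numbers has to convert them. The following is a sketch under that assumption; parse_numbers is a hypothetical helper, and the list example_argv stands in for sys.argv so the example can be run without a command line.

```python
import sys

def parse_numbers(argv):
    """Convert every parameter after the program name to a float."""
    # argv[0] is the program name, so the parameters start at index 1.
    return [float(arg) for arg in argv[1:]]

# In a real program you would pass sys.argv here instead.
example_argv = ['AProgram.py', '1.5', '2', '40']
numbers = parse_numbers(example_argv)

print(numbers)       # [1.5, 2.0, 40.0]
print(sum(numbers))  # 43.5
```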

On Unix-based systems, you can make Python programs directly executable; that is, without having to specify python on the command line. This requires two steps.

First, add
#!/usr/bin/python

as the first line of AProgram.py. That is, the program file now looks like:
#!/usr/bin/python
import sys

print(sys.argv)

for arg in sys.argv:
    print(arg)

You must appropriately change the path /usr/bin/python, if Python is installed somewhere else on your system.

For example, I installed Python on Mac OS X using Enthought Python Distribution (EPD) which puts it in /Library/Frameworks/Python.framework/Versions/Current/bin/python.

Second, make the file executable with the Unix utility chmod.
104}ls -lat AProgram.py 
-rw-r--r--   1 chaos  chaos  78 Apr 18 22:41 AProgram.py
105}chmod 755 AProgram.py 
106}ls -lat AProgram.py
-rwxr-xr-x   1 chaos  chaos  78 Apr 18 22:41 AProgram.py

Then you're ready to run it as if it were a command-line command (type ./AProgram.py instead if the current directory is not on your shell's PATH):
107}AProgram.py a b c
['AProgram.py', 'a', 'b', 'c']
AProgram.py
a
b
c

Such programs are often referred to as scripts.

(Windows: Let me know how to do this on Windows and I'll add the steps here.)

Commenting Python Code

The first line you inserted above in AProgram.py is ignored by Python. It is a comment: Python ignores everything from a # to the end of the line (unless the # is inside a string).

You should use comment lines to explain steps in your code. This is most often a favor to yourself, who will return after some time, having forgotten your coding strategies. It's also a favor to others, who don't have a clue as to what your intentions were.

Note that when the characters #! begin the first line, Unix/Linux systems take the rest of that line as the program to run in order to execute the file.

