Part F: Objects, Classes, and Error Handling

This Programming Lab introduces several key constructs in Python that allow one to manage the complexity of moderate to large programs.

Data types

A data type is a structured collection of data with an associated set of operations. For example, an integer is a collection of contiguous bits with arithmetic operations, such as subtraction and negation; a vector is a list of three floating-point numbers with operations such as addition, cross product, and normalization.

Successful program design and tool-building often turn on identifying data types that are appropriate to a given problem. There is no fixed recipe for doing this, but experience, and even writing preliminary draft programs, helps reveal the key organizational structures of a problem.

For example, what collections of basic data types keep reappearing together in different parts of a program? Does this or that collection have a natural interpretation? If so, it may be a candidate for organizing into a new data type.

In other circumstances, say those for which you're working within an established mathematical framework, the formalism may already indicate what the key data types are (complex numbers, quaternions, tangent spaces, submanifolds, dynamical systems, ...). Consider developing data types that support working with the entities within the available mathematical framework.

Simple goals for data types are that they should help you understand the programming task and help you document and maintain the code you're developing. Proper selection and design of data types should save you work, both programming work and mental work.

Making data types

Python allows one to define new data types that can do anything the built-in types can do. From the user's viewpoint there is no difference. In fact, many of the data types used in these tutorials are defined as data types that supplement the Python language. These include vectors, text files, interpolating functions, and the like. It is also possible to define new data types in a lower-level language like C. This is often done for computational speed or efficiency. The array data type from the NumPy package is an example of this.

Python provides a set of standard operations that any data type can use, if and when they are appropriate to the type. These operations include arithmetic operators, indexing, function call, and so on. In addition, data types can include new operations defined as methods. Methods act like functions, but they are attached to and depend on their data object. You have already seen many examples: append() is a method defined for lists, as in

In [1]: l = [1, 2, 3]
In [2]: l.append(4)
In [3]: l
Out[3]: [1, 2, 3, 4]

and close() is a method defined for files.

Again, there are many reasons for defining new data types. In addition to those already mentioned above, they help to keep programs readable—it is much clearer to write

(a+b) / 2

for two vectors a and b than to write something like

vector_scale(vector_add(a, b), 0.5)

as many programming languages require.

Artful definition of new data types helps large projects maintain modularity. Specifically, type definitions reduce the inter-dependence between different parts of a program. A program that needs to multiply two vectors should not have to know how vectors store their data: the data could be stored as a list, a tuple, an array, or three separate variables. If and when the storage method for vectors is changed for some reason, other modules that use vectors need not be affected.

Modularity also helps with debugging large programming projects by isolating the sources of error.

Finally, in the research domain, data types can be designed to mimic the theoretical framework or formalism in which one is working. In addition to the benefits noted above, a Python program with an appropriate set of data types is a kind of documentation of the structure of the theory. This provides conceptual regularity that can be very helpful when programming complex concepts.

Object-oriented programming

It is possible to write a program exclusively in terms of defining the appropriate data types, ranging from low-level general-purpose data types, such as vectors, to high-level data types describing application-specific objects—such as force fields, wavefunctions, or even dynamical systems. This approach is known as object-oriented programming (OOP).

Experience has shown that, for moderate to large programs, object-oriented programming often has advantages over the traditional procedural style. Procedural programming, which we have been using so far, structures the code according to functions and subroutines. Object-oriented programming tends to produce code that is easier to understand and easier to extend and modify, because the different parts of the overall program are more independent of one another. Python supports most of the techniques that are commonly used in object-oriented programming.

As hinted at above, the most difficult part of writing large object-oriented software systems is deciding on what the appropriate data types should be. As a rule of thumb, data types representing mathematical entities, such as arrays or functions or vector fields, and data types representing physical entities, such as molecules or vehicles, are a good choice. More abstract data types are equally important, however. Examples would be data types that represent common data structures, such as lists or queues, or common algorithms, such as a Fourier transform or Runge-Kutta integration.

Object-oriented design is best learned by experience; whenever you write a nontrivial program (more than a few lines), consider doing it in an object-oriented way.

Simulation Tools: Architecture

You will see that developing dynamical system simulators falls very nicely into the object-oriented design paradigm. We can think of the overall structure of exploratory simulators using a laboratory metaphor.

We have the (dynamical) system that we wish to study and a simulation engine that generates its evolution. This is the experiment.
We have instruments that control and measure various aspects of the system.
And we have analysis tools that either display or otherwise produce results that tell us something about the system's behavior and structure.

Each of these three components can, and should, be represented by a different kind of object. Keeping their functions separate will keep your dynamical systems simulator modular, flexible, and easy to debug.
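The laboratory metaphor can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: the logistic map as the system, the names LogisticMap, Recorder, run, and mean, and the parameter values. It is written for Python 3 and uses the class syntax explained in the Classes section below.

```python
class LogisticMap:
    """The system: x -> r * x * (1 - x). Chosen here only as an example."""

    def __init__(self, r, x0):
        self.r = r
        self.state = x0

    def step(self):
        self.state = self.r * self.state * (1.0 - self.state)


class Recorder:
    """An instrument: measures the state after every step."""

    def __init__(self):
        self.samples = []

    def measure(self, system):
        self.samples.append(system.state)


def run(system, instrument, n_steps):
    """The simulation engine: advances the system and lets the instrument observe it."""
    for _ in range(n_steps):
        system.step()
        instrument.measure(system)


def mean(samples):
    """An analysis tool: reduces the recorded data to a single number."""
    return sum(samples) / len(samples)


experiment = LogisticMap(r=2.0, x0=0.5)   # x = 0.5 is a fixed point for r = 2
recorder = Recorder()
run(experiment, recorder, 10)
print(mean(recorder.samples))
```

Because the system, the instrument, and the analysis function only meet through step(), measure(), and the list of samples, any one of them could be replaced without touching the other two.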

Classes

So much for OOP theory and motivations. How do we implement this in Python?

A definition of a new data type or object is called a class. A class defines a type's data components and their structure and also all of the operations appropriate to the data type.

The latter are defined as methods of the class. Standard operations, such as arithmetic operators, are mapped to methods with special names.

A data type also defines the initial values of its data components, so that when a new object is created the instance of the data type starts off properly initialized.

The following example shows a part of the definition of the class Vector. Only initialization, addition, and length calculation are shown explicitly. And some operations are less general than they would be in a complete class specification.

import numpy

class Vector:

    def __init__(self, x, y, z):
        self.array = numpy.array([x,y,z])

    def __add__(self, other):
        sum = self.array+other.array
        return Vector(sum[0], sum[1], sum[2])

    def length(self):
        return numpy.sqrt(numpy.add.reduce(self.array*self.array))

Methods are defined like functions and behave much like functions. However, their first argument self has a special meaning: it stands for the object on which the method is called. Calling the first argument self is only a convention; any other name would work.

If v is a Vector, then in the method call v.length(), the variable self gets the value of v.
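A small sketch may make the role of self concrete. The Vector class here is a simplified stand-in (an assumption for illustration) that stores a plain tuple in a hypothetical attribute coords rather than a NumPy array, so that the example needs only the standard library.

```python
import math


class Vector:
    """Stand-in for the tutorial's Vector, storing a plain tuple instead of a NumPy array."""

    def __init__(self, x, y, z):
        self.coords = (x, y, z)

    def length(self):
        # self is whichever object the method was called on.
        return math.sqrt(sum(c * c for c in self.coords))


v = Vector(3., 4., 0.)
print(v.length())        # self is bound to v automatically
print(Vector.length(v))  # the same call with self passed explicitly
```

Both calls compute the same length; the first form is simply shorthand for the second.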

The methods whose names begin and end with a double underscore (__name__) have a special meaning. Typically they are not called directly, although they can be.

The most important special method is __init__, which is called immediately after an object has been created. The expression v = Vector(1., 0., 2.) creates a new vector object v and then calls v's method __init__. The latter stores the three coordinates (1., 0., and 2.) in an array that is assigned to the attribute array of the new object.

The arithmetic operations are also implemented as special methods.

For example, the expression a+b is equivalent to a.__add__(b).

The other arithmetic operators have similar equivalents. See the Python Language Reference or the Learning Python book for details.
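As a sketch of how a few of the other operators map to special methods, consider the stand-in class below. It is not the Vector.py module from this tutorial: the attribute name coords and the tuple storage are assumptions, and the code is written for Python 3, where division maps to __truediv__ (Python 2 used __div__).

```python
class Vector:
    """Stand-in vector class; attribute and method choices are illustrative."""

    def __init__(self, x, y, z):
        self.coords = (x, y, z)

    def __add__(self, other):        # a + b
        return Vector(*[p + q for p, q in zip(self.coords, other.coords)])

    def __sub__(self, other):        # a - b
        return Vector(*[p - q for p, q in zip(self.coords, other.coords)])

    def __neg__(self):               # -a
        return Vector(*[-p for p in self.coords])

    def __mul__(self, scalar):       # a * 2.0
        return Vector(*[p * scalar for p in self.coords])

    def __truediv__(self, scalar):   # a / 2.0 (Python 3 name)
        return Vector(*[p / scalar for p in self.coords])


a = Vector(1., 2., 3.)
b = Vector(3., 2., 1.)
print(((a + b) / 2).coords)
print((a - b).coords)
print((-a * 2.0).coords)
```

With these methods in place, the expression (a+b) / 2 from the earlier readability discussion works as written.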

There are more special methods that implement indexing, copying, and printing, for example. Only the methods that make sense must be implemented, and only if the default behavior is not sufficient. For vectors, for example, it makes sense to define a printed representation that shows the values of the coordinates. This is achieved by adding another special representation method __repr__(self):

    def __repr__(self):
        return 'Vector(%s,%s,%s)' % (self.array[0], self.array[1], self.array[2])

A complete module with the preceding (skeletal) vector class definition is Vector.py. Load it into IPython and try some of the operations.

In [1]: run Vector.py
In [2]: Vector?
Type: classobj
String Form: __main__.Vector
Namespace: Interactive
File: /Users/chaos/Presentations/NonlinearPhysics/Software/PartF_Code/Vector.py
Docstring:
Constructor information:
Definition: Vector(self, x, y, z)
In [3]: a = Vector(1.,1.,1.)
In [4]: a
Out[4]: Vector(1.0,1.0,1.0)
In [5]: b = Vector(1.,2.,3.)
In [6]: a+b
Out[6]: Vector(2.0,3.0,4.0)

Note that Vector.py includes text just below the class definition and between triple quotes. This is the docstring referred to above, in the output of IPython's object inspection operator ("?"). You use this triple-quote syntax to include comments and documentation on the objects being defined and their uses.
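A minimal sketch of the triple-quote syntax follows. The docstring wording is invented for illustration and may differ from the text in the actual Vector.py; the attribute name coords is likewise an assumption.

```python
class Vector:
    """A vector in three-dimensional space.

    Illustrative docstring; the text in the actual Vector.py may differ.
    """

    def __init__(self, x, y, z):
        """Store the three coordinates."""
        self.coords = (x, y, z)


# The docstring is stored on the object as the attribute __doc__.
print(Vector.__doc__)
print(Vector.__init__.__doc__)
```

The built-in help() function displays the same text, so docstrings are available in the plain Python interpreter as well as in IPython.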

Attributes

An object in Python can have any number of attributes. These act like variables, except that they are attached to a specific object. Variables, in contrast, are attached to modules or to functions. (In fact, variables defined in modules are nothing other than attributes of module objects.) The notation for accessing attributes is always

object.attribute

Method names are attributes too, just as function names are variables.

The NumPy array is an attribute of each instance of the Vector class we just defined:

In [17]: a.array
Out[17]: array([ 1., 1., 1.])
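The following sketch shows data attributes and method names being accessed in the same way. The Vector class is again a simplified stand-in with a hypothetical coords attribute, and the label attribute is invented for the example.

```python
import math


class Vector:
    """Stand-in for the tutorial's Vector, storing a plain tuple."""

    def __init__(self, x, y, z):
        self.coords = (x, y, z)

    def length(self):
        return math.sqrt(sum(c * c for c in self.coords))


v = Vector(0., 3., 4.)

# Data attributes and method names are looked up the same way.
print(v.coords)
print(hasattr(v, "length"))   # the method name is an attribute
measure = v.length            # fetch the method without calling it
print(measure())

# Attributes can be added to an existing object at any time.
v.label = "velocity"
print(getattr(v, "label"))
```

Fetching v.length without parentheses yields a method object that remembers v, which is why measure() can later be called with no arguments.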

Unlike many other object-oriented languages, Python does not restrict access to attributes. Any code can use and even change any attribute in any object. For example, you can run

import numpy
numpy.sqrt = numpy.exp

to make sqrt() behave like exp(). Obviously, this is a very bad idea. Python will not protect you from masochism. Of course, this design choice on Python's part leaves open the chance of accidentally changing an attribute, but in practice this is rarely a problem.

Python: A universe of objects

Python is a consistent language. Its world view consists of nothing but objects, names, and namespaces. All data is kept in objects, but modules, functions, classes, and methods are also objects. Objects can be assigned to names, and names reside in namespaces. Every object has an associated namespace for its attributes. In addition, functions and methods provide a temporary namespace during execution—storage for local variables.

There are a few rules that determine in which namespace a given name resides:

Definitions within functions and methods are made in the temporary execution namespace. Code in a function can also use names in the surrounding module, but cannot assign to them unless they are declared with the global statement.
Definitions in modules end up in the attribute namespace of the module object.
Definitions in a class are in the attribute namespace of the class.
Finally, an object that is constructed from a class (a class instance) has its own attribute namespace, in which all assignments happen. However, when an attribute is requested that is not in this namespace, it is searched for in the class namespace. This is how methods are normally found.
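These lookup rules can be observed directly. The sketch below uses a hypothetical Counter class and a module-level name scale, both invented for illustration; the built-in vars() returns the attribute namespace of an object.

```python
scale = 10            # defined in the module namespace


class Counter:
    step = 1          # defined in the class namespace

    def __init__(self):
        self.count = 0        # defined in the instance namespace

    def advance(self):
        increment = self.step * scale   # local name; reads the module-level scale
        self.count = self.count + increment


a = Counter()
b = Counter()

a.step = 5            # assignment creates an entry in a's own namespace
a.advance()
b.advance()           # b has no step of its own, so the class value is used

print(a.count, b.count)
print("step" in vars(a), "step" in vars(b))
print(vars(Counter)["step"])
```

Assigning a.step does not change the class: the value stored in Counter is untouched and is still what b sees.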

Specializing and extending classes

Often several data types have something in common.

For example, one might be a specialization of another: we could introduce normalized vectors as a special kind of vector or we could have two-dimensional maps be a special case of a more general dynamical system type.

Sometimes several data types can share operations, but differ in detailed features. For example, one could define data types representing scalar and vector fields, which share the property of being defined on a grid, but have specific operations, such as gradient for scalar fields and divergence for vector fields. Both data types would be implemented as specializations of a data type called, for example, field. This type would define the common behavior, but not be used directly in programs. (This is sometimes called an abstract class.)

The technique for treating specialization is called inheritance. A class can inherit methods from another class, substitute those that require modification, and add some of its own. The main advantage is avoiding redundant code, which is an important source of mistakes, a maintenance headache, and also a waste of memory.

The following code (Direction.py) defines a class representing directions in space—vectors with length one—as a specialization of the vector class just defined:

from Vector import Vector

class Direction(Vector):

    def __init__(self, x, y, z):
        Vector.__init__(self, x, y, z)
        self.array = self.array/self.length()

The only method being redefined is initialization, which now normalizes the vector. Note that the initialization method first calls the initialization method of the class Vector and then applies the normalization.

Note: In the above class definition, we import from the module (Vector.py) the class definition (Vector). (Using similar names for the module file and the class can be a bit confusing.)

The class Direction inherits all the operations from Vector. This is indicated in the class definition line via Direction(Vector).

Here's an example:

In [1]: run Direction.py
In [2]: a = Direction(4.,5.,6.)
In [3]: a
Out[3]: Vector(0.455842305839,0.569802882298,0.683763458758)

The last statement uses the representation method __repr__ inherited from the Vector class, which is why the result is displayed as Vector(...).

The inherited operations act as if their code were repeated in the new class. In particular, the sum of two directions will be a vector, not another direction. To obtain a normalized result, the method __add__ would have to be redefined as well. But since adding directions is not a very useful operation, it might not be worth the effort.
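If a normalized sum were wanted, redefining __add__ might look like the sketch below. It is only one possible design, written for Python 3 against a simplified stand-in for Vector that stores a tuple in a hypothetical coords attribute rather than a NumPy array.

```python
import math


class Vector:
    """Stand-in for the tutorial's Vector, storing a plain tuple."""

    def __init__(self, x, y, z):
        self.coords = (x, y, z)

    def __add__(self, other):
        return Vector(*[p + q for p, q in zip(self.coords, other.coords)])

    def length(self):
        return math.sqrt(sum(c * c for c in self.coords))


class Direction(Vector):

    def __init__(self, x, y, z):
        Vector.__init__(self, x, y, z)
        norm = self.length()
        self.coords = tuple(c / norm for c in self.coords)

    def __add__(self, other):
        # Reuse the inherited addition, then normalize by building a Direction.
        total = Vector.__add__(self, other)
        return Direction(*total.coords)


d = Direction(1., 0., 0.) + Direction(0., 1., 0.)
print(type(d).__name__)
print(d.length())
```

Note that this sketch fails with a ZeroDivisionError when two opposite directions are added, since their sum has length zero; a fuller version would have to decide what that case should mean.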

Error handling

When an error occurs, Python prints a stack trace (the nested sequence of all active functions at the time the error occurred) and stops. This is often useful, but not always. You might want to deal with errors yourself. For example, you may wish to simply print a warning, ask for user input, or do a different calculation. Python allows any code to catch specific error conditions and deal with them in whatever way necessary.

To identify an error type, Python has several built-in error objects. One is ValueError, which indicates that a value is unsuitable for an operation. An example would be passing a negative number to the square root function.

Another is TypeError, which indicates an unsuitable data type. This would occur when you ask for the logarithm of a character string, for example.

A program can catch a specific error object, a specific collection of errors, or any error.

The general form of error catching is

try:
    x = someFunction(a)
    anotherFunction(x)
except ValueError:
    print("Something's wrong")
else:
    print("No error")

The code after try: is executed first. If a ValueError occurs, then the code after except ValueError: is executed, otherwise the code after else:. This last part is optional.
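A runnable version of this pattern, using the square root case mentioned earlier, might look as follows. The function name safe_sqrt and the choice to return None on failure are assumptions made for the sketch, which is written for Python 3.

```python
import math


def safe_sqrt(x):
    try:
        root = math.sqrt(x)          # raises ValueError for negative x
    except ValueError:
        print("Cannot take the square root of", x)
        return None
    else:
        # Runs only when the try block finished without an error.
        print("No error")
        return root


print(safe_sqrt(9.0))
print(safe_sqrt(-1.0))
```

Keeping only the risky call inside try: and the follow-up work in else: ensures that the handler reacts only to errors from that one call.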

To catch several error types, use a tuple, as in

except (ValueError, TypeError):

To catch all errors, use a blank except:.

To deal with several error types in different ways, use several except ...: clauses.
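As a sketch of several except clauses in one statement, the hypothetical function describe_log below treats an unsuitable value and an unsuitable type differently. It is written for Python 3 and uses math.log from the standard library.

```python
import math


def describe_log(value):
    try:
        result = math.log(value)
    except ValueError:
        # Right type, unsuitable value (zero or a negative number).
        return "value out of range"
    except TypeError:
        # Wrong type altogether (for example a character string).
        return "not a number"
    else:
        return "log = %.3f" % result


print(describe_log(1.0))
print(describe_log(-2.0))
print(describe_log("abc"))
```

Python tries the except clauses in order and runs the first one that matches the error that occurred.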

For additional details and uses, see the Language Reference manual or the Learning Python book.

Python programs can intentionally generate errors. This is done by using the statement

raise ErrorObject

or

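raise ErrorObject("Explanation")

to add an explanation for the user. The ErrorObject can be any predefined error object or an error class defined by the program. (Very old versions of Python also allowed a plain string to be raised; current versions do not.)

Many modules define their own error types as classes derived from Exception and let other modules import them, as shown here. In one module we define AError.

# Module A
class AError(Exception):
    pass

def someFunction(x):
    raise AError("Error in module A")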

In a second module, we import the first and catch the error:

# Module B
import A

try:
    A.someFunction(0)
except A.AError:
    print("Something went wrong")
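An error type defined as a class derived from Exception can also carry data describing what went wrong. The sketch below keeps everything in one file for brevity; the names StepSizeError and integrate are hypothetical, and the code is written for Python 3.

```python
class StepSizeError(Exception):
    """Hypothetical error type for an integrator that rejects a step size."""

    def __init__(self, step):
        Exception.__init__(self, "step size must be positive, got %r" % step)
        self.step = step      # keep the offending value for the handler


def integrate(step):
    if step <= 0:
        raise StepSizeError(step)
    return "integrating with step %g" % step


try:
    integrate(-0.1)
except StepSizeError as err:
    print("Caught:", err)
    print("Offending value:", err.step)
```

The handler receives the error object itself through the as clause, so it can inspect the stored value and decide how to recover.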
