An overview of Python
Compiled vs. interpreted languages
From https://en.wikipedia.org/wiki/Compiler:
In computing, a compiler is a computer program that translates
computer code written in one programming language (the source
language) into another language (the target language).
From https://en.wikipedia.org/wiki/Interpreter_(computing):
In computer science, an interpreter is a computer program that
directly executes instructions written in a programming or scripting
language, without requiring them previously to have been compiled into
a machine language program.
| | Compiled | Interpreted |
|---|---|---|
| Advantages | Fast execution | Dynamic types; platform independence |
| Disadvantages | Slow testing; platform dependence | Translation overhead at run time |
Actually, Python code is typically first translated to bytecode, which is then interpreted. But it can also be compiled “just-in-time” (for instance with pypy, which claims to be on average ~5 times faster than the official CPython distribution).
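You can inspect the bytecode step yourself with the standard-library dis module, which disassembles a function. The sketch below is only illustrative, and add_one is a made-up example function. The instruction names printed by dis.dis differ between CPython versions, so no output is shown for that call.

```python
import dis


def add_one(x):
    return x + 1


# Every Python function carries its compiled bytecode in a code object
code = add_one.__code__
print(type(code).__name__, "object with", len(code.co_code), "bytes of bytecode")

# Human-readable listing of the bytecode instructions
# (opcode names differ between CPython versions)
dis.dis(add_one)

# Source text can also be compiled explicitly, then executed by the interpreter
compiled = compile("result = 1 + 2", "<string>", "exec")
namespace = {}
exec(compiled, namespace)
print(namespace["result"])  # 3
```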
The Python ecosystem
Built upon:
Python language and standard library: the specification of the core language features and of the modules that ship with every installation
CPython: the official, reference implementation of the language and its standard library (on Debian-based Linux systems, packages installed system-wide through the package manager typically live in /usr/lib/python3/dist-packages/)
Alternative Python implementations and distributions (e.g. pypy, anaconda)
Python Package Index (pypi): a large, official repository of third-party modules and packages
Package managers: software to install packages and handle their dependencies (e.g. pip, poetry)
Virtual environments: isolated, local installations of Python distributions (e.g. venv, conda)
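The standard-library venv module can create a virtual environment programmatically. It is the same machinery behind `python -m venv <dir>`, and the sketch below uses it to show what such an environment looks like. The directory name demo-env and the helper in_virtualenv() are hypothetical. with_pip=False is used only so that pip is not installed into the throwaway environment.

```python
import pathlib
import sys
import tempfile
import venv


def in_virtualenv():
    # Inside a virtual environment, sys.prefix points to the environment,
    # while sys.base_prefix still points to the base Python installation
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtualenv())

with tempfile.TemporaryDirectory() as tmp:
    env_dir = pathlib.Path(tmp) / "demo-env"
    # Create an isolated environment (with_pip=False: do not install pip into it)
    venv.create(str(env_dir), with_pip=False)
    # Every environment is marked by a pyvenv.cfg file at its root
    has_cfg = (env_dir / "pyvenv.cfg").exists()

print("pyvenv.cfg created:", has_cfg)
```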
Useful links
Python tutorials proliferate on the internet. There are, however, a few authoritative sources of information:
The official documentation available at https://docs.python.org/3/library/
The integrated documentation, accessible from the interactive Python interpreter via help() and available in any IDE, notebook, etc.
In addition:
Stack Overflow provides a wealth of information on both general and very specific issues about the Python language
The Hitchhiker’s Guide to Python by Kenneth Reitz is an excellent, general-purpose guide to Python
Real Python tutorials are pretty good, too
The W3Schools tutorial is quite extensive, and its code snippets can be executed live
A matter of style
The golden rules for writing top-quality Python code are crystallized in PEP-8:
"One of Guido's key insights is that code is read much more often than
it is written. The guidelines provided here are intended to improve
the readability of code and make it consistent across the wide
spectrum of Python code. As PEP 20 says, Readability counts."
While Python tolerates ad-hoc custom conventions on spacing, naming, etc., PEP-8 provides the authoritative reference for writing good Python code. There are packages and tools that help you clean up your code automatically, such as flake8, autopep8, black…
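Here are a few PEP-8 naming and layout conventions gathered in one toy snippet. The class ParticleSystem and its method are purely illustrative and do not come from any library.

```python
# A few PEP-8 conventions at a glance (all names are illustrative)
MAX_ITERATIONS = 100  # constants: UPPER_CASE_WITH_UNDERSCORES


class ParticleSystem:  # classes: CapWords
    def __init__(self, n_particles):
        self.n_particles = n_particles  # variables and attributes: snake_case

    def total_mass(self, mass=1.0):  # functions and methods: snake_case
        # 4-space indentation, spaces around binary operators,
        # no spaces around '=' for default or keyword arguments
        return self.n_particles * mass


system = ParticleSystem(n_particles=3)
print(system.total_mass(mass=2.0))  # 6.0
```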
A matter of philosophy
The philosophy of Python coding is beautifully described by the Zen of Python (PEP-20), which you should deeply ponder every night before going to sleep:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
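The Zen ships with Python: importing the standard-library module this prints it. The second part of the sketch relies on a CPython implementation detail, namely that the text is stored ROT13-encoded in this.s. Treat it as a curiosity rather than a stable API.

```python
import codecs

# Importing the standard-library module `this` prints the Zen of Python
# (only the first time it is imported in a session)
import this

# CPython implementation detail: the text is kept ROT13-encoded in this.s
zen = codecs.decode(this.s, "rot13")
print("Readability counts." in zen)  # True
```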
Command line interface (CLI)
When Python code has to be executed from the command line, it is good practice to encapsulate its entry point in a main() function:
def main(verbose=False):
    if verbose:
        return 'Hello world!'
The main() function will then be called only when the file is executed as a script (e.g. from the command line), not when it is imported as a module:
if __name__ == '__main__':
    main()
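You can check the effect of the `__name__` guard directly. The sketch below writes a small throwaway script to a temporary directory; demo_script.py and its `called` list are hypothetical. It first imports the script as a module with importlib. It then runs it as a script with `runpy.run_path(..., run_name='__main__')`, which mimics `python demo_script.py`.

```python
import importlib.util
import pathlib
import runpy
import tempfile

# Hypothetical throwaway script that records whether main() was called
SOURCE = """
called = []


def main():
    called.append(True)


if __name__ == '__main__':
    main()
"""

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "demo_script.py"
    path.write_text(SOURCE)

    # 1) Imported as a module: __name__ is 'demo_script', so main() is NOT called
    spec = importlib.util.spec_from_file_location("demo_script", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print("imported:", module.called)  # imported: []

    # 2) Run as a script: __name__ is '__main__', so main() IS called
    namespace = runpy.run_path(str(path), run_name="__main__")
    print("as script:", namespace["called"])  # as script: [True]
```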
To parse parameters on the command line and pass them to main(), you can use the sys.argv list; however… there is a much better alternative! If you want to build a command line interface for your Python code, look no further than the argh package. Use its dispatch_command() function to create a CLI for your main() function:
def main(verbose=False):
    """Say Hello to the world"""
    if verbose:
        return 'Hello world!'

if __name__ == '__main__':
    from argh import dispatch_command
    dispatch_command(main)
Say we saved the script in /tmp/main.py. If you execute it from the command line, you will now get a nice help message that explains how to pass parameters to your Python script:
python /tmp/main.py --help
usage: main.py [-h] [-v]

Say Hello to the world

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  False
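For comparison, here is roughly the same interface written by hand with the standard-library argparse module, which argh builds upon. This is a sketch, not what argh generates internally. The cli() wrapper and its help text are assumptions made for illustration. Passing an explicit argv list makes it easy to try from an interpreter.

```python
import argparse


def main(verbose=False):
    """Say Hello to the world"""
    if verbose:
        return 'Hello world!'


def cli(argv=None):
    # Hypothetical wrapper: build by hand roughly the interface argh generates
    parser = argparse.ArgumentParser(description=main.__doc__)
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='return the greeting')
    # argv=None tells argparse to read sys.argv[1:]
    args = parser.parse_args(argv)
    result = main(verbose=args.verbose)
    if result is not None:
        print(result)
    return result


cli(['-v'])  # prints: Hello world!
```

In a real script you would finish with `if __name__ == '__main__': cli()`, so that sys.argv is parsed when the file is run from the command line.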
The dark side
Highly recommended reading: Facts and myths about Python names and values, by Ned Batchelder
Names are bindings to objects
Variables in Python behave differently than in statically typed languages like C or Fortran. See this example:
x = 1
print(x, type(x))
x = 1.0
print(x, type(x))
x = 'hello world'
print(x, type(x))
1 <class 'int'>
1.0 <class 'float'>
hello world <class 'str'>
It looks like x is born as an integer, later becomes a floating-point number, and finally a string.
What is x, actually? x is a name, which first refers to the object 1, then to the object 1.0, and finally to the object 'hello world'. Almost everything in Python is an “object”, and variables (“names”) are just references (“bindings”) to them. To sum up: names are bindings to objects.
Get the identifier of the object bound to x with the id() function:
x = 1
id(x)
9793056
Now increment x by one:
x = x + 1
id(x)
9793088
What happened? x got bound to a new object (2) with a different id. We did not modify an integer variable: we just obtained a new integer object and bound x to it! This is because in Python integers are “immutable” objects.
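The same experiment on a mutable object behaves differently. In this small sketch, += on an int rebinds the name to a new object. On a list, += extends the existing list in place, so its id does not change.

```python
x = 1
x_id_before = id(x)
x += 1  # int is immutable: += builds a new object and rebinds x
print(id(x) == x_id_before)  # False

lst = [1]
lst_id_before = id(lst)
lst += [1]  # list is mutable: += extends the very same list in place
print(id(lst) == lst_id_before)  # True
print(lst)  # [1, 1]
```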
Mutable and immutable objects
An object is mutable if its “state” (in terms of its “instance variables”) can be changed, and immutable otherwise. A name bound to an immutable object can only be rebound to another object; the object itself cannot be modified. Integers, floats and strings are examples of immutable objects: you cannot change them, you can only create new ones (or reuse existing ones).
Lists and tuples are a classic example of mutable vs. immutable data structures in Python.
Lists can be modified in place by assigning to an item with the [...] syntax:
mutable = [0, 1, 2]
mutable[0] = "x"
mutable
['x', 1, 2]
Tuples, however, are immutable, so the same assignment raises an error:
not_mutable = (0, 1, 2) # A tuple
not_mutable[0] = "x"
Traceback (most recent call last):
    not_mutable[0] = "x"
TypeError: 'tuple' object does not support item assignment
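Strings, listed above among the immutable types, behave like tuples. Item assignment fails, and string methods return new objects instead of modifying the original. A short sketch:

```python
s = "hello"
try:
    s[0] = "H"  # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'str' object does not support item assignment

t = s.upper()  # string methods return a *new* string
print(s, t, s is t)  # hello HELLO False
```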
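Numpy arrays are mutable and support item assignment too. It is crucial, however, to modify them in place, using either the [...] syntax or augmented assignment operators such as +=. A plain assignment like x = x + 1 creates a new array:
import numpy
x = numpy.array([0, 1, 2])
x[0] = 1  # item assignment (in place)
print(x, id(x))
x += 1  # in-place modification
print(x, id(x))
x = x + 1  # reassignment: x is bound to a new array
print(x, id(x))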
[1 1 2] 139648897711248
[2 2 3] 139648897711248
[3 3 4] 139648793861936
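Here are two more ways of updating a numpy array in place, compared with plain reassignment. Slice assignment (x[:] = ...) copies the result into the existing buffer. Ufuncs such as numpy.add accept an out= argument that writes their result directly into an existing array.

```python
import numpy

x = numpy.array([0, 1, 2])
original_id = id(x)

x[:] = x + 1  # x + 1 is a temporary array, then copied into x's buffer
print(x, id(x) == original_id)  # [1 2 3] True

numpy.add(x, 1, out=x)  # the ufunc writes its result directly into x
print(x, id(x) == original_id)  # [2 3 4] True

x = x + 1  # new array: x is rebound
print(x, id(x) == original_id)  # [3 4 5] False
```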
Numpy 0-arrays (zero-dimensional arrays) behave like mutable integers and floats. They will be useful when interfacing Python with Fortran (see below).
A 0-array can be declared like this:
import numpy
x = numpy.array(1)
print(x, type(x), id(x))
1 <class 'numpy.ndarray'> 139648897839344
Increment it in place using the += syntax (no reassignment):
x += 1
print(x, type(x), id(x))
2 <class 'numpy.ndarray'> 139648897839344
Note: the shape of a 0-array is an empty tuple, ().
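Here is a minimal sketch of working with a 0-array. Its shape is () and its ndim is 0. You can assign the single element through the empty-tuple index x[()], and .item() returns it as a plain Python scalar.

```python
import numpy

x = numpy.array(1.5)
print(x.shape, x.ndim)  # () 0

x += 1.0  # in-place update of the single element
x[()] = 10.0  # the empty tuple indexes the single element
print(x.item(), type(x.item()))  # 10.0 <class 'float'>
```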
Non-locality
OK, this can be tricky. Say we have two variables:
x = 1
y = x
We understand that y is bound to the same object as x:
id(x) == id(y), x is y
(True, True)
However, when we increment x, we rebind it to a new object, and y still binds to 1:
x += 1
x, y
(2, 1)
But if we do the same thing with a 0-array, both x and y “get modified”:
import numpy
x = numpy.array(1)
y = x
x += 1
x, y
(array(2), array(2))
Of course, that’s because the modification is done in place and both x and y are bound to the same object. To check whether two arrays share some data, use:
numpy.shares_memory(x, y)
True
This behavior holds in general for mutable objects and can be quite confusing, in particular when passing arguments to functions or when complex objects share data.
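Function arguments are a common place where this bites. In the sketch below, append_zero and rebind are hypothetical helpers. The parameter is just another name bound to the caller's list. Mutating it in place is visible to the caller, while rebinding it inside the function is not.

```python
def append_zero(values):
    # `values` is just another name for the caller's list: nothing is copied
    values.append(0)  # in-place change, visible to the caller


def rebind(values):
    # `values + [0]` builds a new list; rebinding the local name does not
    # affect the caller's variable
    values = values + [0]
    return values


data = [1, 2]
append_zero(data)
print(data)  # [1, 2, 0]

new = rebind(data)
print(data, new)  # [1, 2, 0] [1, 2, 0, 0]
```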
Copies
Since mutable objects can lead to “non-local” effects, which are sometimes undesirable, it is natural to ask how we can make copies of a mutable object. Just use the standard-library copy module for that:
import copy
x = [0]
y = copy.copy(x)
x is not y
True
What happens if you try to make a copy of an *immutable* object?
When we create a new object with the copy
module, there are actually two possible depth “levels”:
shallow: only the “first neighbor” references to other objects are copied
deep: all referenced objects are recursively copied
The copy.copy() function only does a shallow copy. If we want to be sure to get a full copy of the object, we must use copy.deepcopy() instead.
Let’s see an example:
import copy
x = [0]
y = [x]
z = copy.copy(y)
Now z is a newly created list, but its only element is the very same list object x. Hence, if we append an element to x, the change shows up in z too:
x.append(1)
z
[[0, 1]]
Compare with copy.deepcopy():
import copy
x = [0]
y = [x]
z = copy.deepcopy(y)
x.append(1)
z
[[0]]
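The is operator makes the difference visible. In this sketch, both copies are new outer lists, but only the deep copy holds its own copy of the inner list.

```python
import copy

x = [0]
y = [x]
shallow = copy.copy(y)
deep = copy.deepcopy(y)

print(shallow is y, shallow[0] is x)  # False True: new outer list, shared inner list
print(deep is y, deep[0] is x)  # False False: the inner list was copied too
```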
Numerical precision of numpy arrays
Bear in mind that the numerical precision of numpy arrays has limitations. The behavior of numpy may not be the one you expect, especially if you need precision higher than double (64 bits).
See for instance:
the numpy doc on precision issues
this thread on Reddit (note the comment by the OP's professor: “Python is a ‘toy language’ due to inaccuracy”)
it seems impossible to just set single precision as the default
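You can inspect a few of these limits with numpy.finfo. The sketch below shows the machine epsilon of the default 64-bit float. It also prints the platform-dependent numpy.longdouble, which may or may not offer more precision than float64 depending on the platform and compiler. Finally, it shows that single precision has to be requested explicitly through dtype.

```python
import numpy

# Machine epsilon of the default floating-point type (64-bit IEEE 754 double)
eps = numpy.finfo(numpy.float64).eps
print(eps)  # 2.220446049250313e-16

# numpy.longdouble is platform dependent: often 80-bit extended precision
# on x86 Linux, but the same as float64 on other platforms (e.g. Windows)
print(numpy.finfo(numpy.longdouble))

# 1 + eps/2 rounds back to exactly 1 in double precision
x = numpy.float64(1.0) + eps / 2
print(x == 1.0)  # True

# Single precision must be requested explicitly, e.g. through dtype
a = numpy.zeros(3, dtype=numpy.float32)
print(a.dtype)  # float32
```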
Numpy arrays can be slower than lists
Lists are very powerful and flexible data structures, but they are very slow for number-crunching calculations: accessing the entries of a list from a Python loop is pretty inefficient. Numpy arrays are generally faster… but this is not always the case. Let’s see why.
Let’s do some simple linear algebra on the elements of two 5000-element lists:
N = 5000
x = [1.0] * N
y = [2.0] * N
z = [0.0] * N
a = 1.0
# Repeat the inner loop many times and time the execution
import time
t_i = time.time()
for _ in range(1000):
    for i in range(N):
        z[i] = x[i] + a * y[i]
t_f = time.time()
print('Elapsed time: {:.2f} s'.format(t_f - t_i))
Elapsed time: 0.94 s
We have repeated the inner loop many times to increase the computational time.
Let us do the same with numpy arrays, transforming the lists x, y, z into numpy arrays:
import numpy
x = numpy.array([1.0] * N)
y = numpy.array([2.0] * N)
z = numpy.array([0.0] * N)
t_i = time.time()
for _ in range(1000):
    for i in range(N):
        z[i] = x[i] + a * y[i]
t_f = time.time()
print('Elapsed time: {:.2f} s'.format(t_f - t_i))
Elapsed time: 1.98 s
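As you can see, things have not improved at all; quite the opposite, actually! The problem is that numpy arrays are efficient only when operations are applied to whole arrays (or to array slices), not element by element. Every x[i] access from Python creates a new numpy scalar object, which is even more costly than reading a list entry. Let’s try this way: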
t_i = time.time()
for _ in range(1000):
    z[:] = x[:] + a * y[:]
t_f = time.time()
print('Elapsed time: {:.2f} s'.format(t_f - t_i))
Elapsed time: 0.01 s
This time, it is about two orders of magnitude faster! Notice how we have used a “vector syntax” to express the fact that the linear combination of the two arrays can be performed element-wise. Fortran programmers will recognize the familiar syntax for array operations (which, indeed, was introduced in Fortran years before numpy was born…).
We can also use the following syntax to get the same result:
import numpy
for _ in range(1000):
    z = x + a * y
Notice, however, that in the code above z is rebound to a newly allocated array at every loop iteration, instead of being modified in place.
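If the reallocation matters, you can compute the same linear combination without creating a new result array. This sketch uses a ufunc with the out= argument plus an in-place +=. Whether it is measurably faster than z[:] = x + a * y depends on array sizes and hardware, so time it on your own case.

```python
import numpy

N = 5000
a = 1.0
x = numpy.full(N, 1.0)
y = numpy.full(N, 2.0)
z = numpy.zeros(N)
z_id = id(z)

# z <- a * y, written into z's existing buffer (no new result array)
numpy.multiply(y, a, out=z)
# z <- z + x, in place
z += x

print(z[:3], id(z) == z_id)  # [3. 3. 3.] True
```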
It is also possible to operate on array subsections using a powerful slicing syntax, such as:
# Set the first 10 elements of z to a times the first 10 elements of y
z[0:10] = a * y[0:10]
# Can you guess what this does?
z[0:10:2] = a * y[-1:-10:-2]
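Basic slices of a numpy array are views on the same data, not copies, unlike slices of a list. A small sketch, using a fresh integer array rather than the z above:

```python
import numpy

# Basic slicing of a numpy array returns a view: no data is copied
z = numpy.arange(10)
v = z[0:10:2]  # elements 0, 2, 4, 6, 8 of z
v[:] = -1  # writing into the view modifies z as well
print(z.tolist())  # [-1, 1, -1, 3, -1, 5, -1, 7, -1, 9]
print(numpy.shares_memory(z, v))  # True

# Slicing a list, instead, builds a new list
lst = list(range(10))
w = lst[0:10:2]
w[0] = -1
print(lst[0])  # 0: the original list is untouched
```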
The numpy package provides general n-dimensional arrays to perform efficient linear algebra on arrays with an arbitrary number of dimensions. Let us define a 2x3 matrix:
import numpy
N, M = 2, 3
# Matrix with uninitialized elements (arbitrary values left in memory)
x = numpy.empty((N, M))
print(x.shape)
print(x)
# Matrix with all elements set to zero
x = numpy.zeros((N, M))
print(x)
(2, 3)
[[6.77781734e-310 5.37326426e-310 2.21576943e+214]
 [8.96368245e+276 2.31261087e-152 2.25563599e-153]]
[[0. 0. 0.]
 [0. 0. 0.]]
The shape attribute is a tuple that specifies the size of the array along each dimension.
The numpy package implements several operations on arrays and matrices, such as transpose, dot and outer products, and many others. An even more complete environment for linear algebra is provided by the scipy package.
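Here is a short sketch of the operations just mentioned (transpose, matrix-vector product, outer product) on a small 2x3 matrix. For these 2-D/1-D operands, the @ operator is equivalent to numpy.dot.

```python
import numpy

A = numpy.arange(6.0).reshape(2, 3)  # 2x3 matrix [[0, 1, 2], [3, 4, 5]]
v = numpy.array([1.0, 0.0, -1.0])

print(A.T.shape)  # (3, 2): the transpose
print(A @ v)  # [-2. -2.]: matrix-vector product
print(numpy.dot(A, v))  # same result with numpy.dot
print(numpy.outer(v, v))  # 3x3 outer product of v with itself
```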