# High Performance Python Notes

My notes for the book "High Performance Python".
Making code faster requires us to first understand where the bottlenecks in our program are. By finding the pieces of code that take the longest, we can narrow our focus to those and avoid spending time and energy on parts of the program that are already fast.
Let's say we want to measure how long a function `foo` in our program takes to run. We can do this with the `time` module, measuring the elapsed time wherever the function is called:

```python
import time

start = time.time()    # noting the start time
foo(*args, **kwargs)   # calling the function
end = time.time()      # noting the end time
print(f'The function {foo.__name__} took {end-start} seconds to run')
```
This approach, however, requires us to write the timing code everywhere the function is called. If the function is called in many places, this clutters our program.

A better approach is to use a decorator:

```python
import time

def timer_func(func):
    def time_measurer(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f'The function {func.__name__} took {end-start} seconds to run')
        return result
    return time_measurer
```
Now we only need to "decorate" the function as follows:

```python
@timer_func
def foo(*args, **kwargs):
    ...
```

The above snippet is just a fancy way of saying `foo = timer_func(foo)`. With this approach we write the timing code only once, and the decorator converts `foo` into a function that prints the time taken and returns the result. Moreover, we can time any function with this decorator.
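To make this concrete, here is a minimal self-contained example of the decorator in action (the `slow_add` function and its sleep duration are illustrative, not from the book):

```python
import time

def timer_func(func):
    def time_measurer(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f'The function {func.__name__} took {end-start} seconds to run')
        return result
    return time_measurer

@timer_func
def slow_add(a, b):
    time.sleep(0.1)  # simulate some work
    return a + b

print(slow_add(2, 3))  # prints the timing line, then 5
```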
But there's one problem with this approach. Because `timer_func` returns the inner function `time_measurer`, the decorated `foo` loses its own metadata: `foo.__name__` now reports `'time_measurer'` and the original docstring is gone. We can circumvent this issue with a small fix.
```python
import time
from functools import wraps

def timer_func(func):
    @wraps(func)
    def time_measurer(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f'The function {func.__name__} took {end-start} seconds to run')
        return result
    return time_measurer
```
The `wraps` decorator copies the attributes of `func` (such as `__name__` and `__doc__`) onto `time_measurer`, so the decorated function keeps its original identity.
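A quick check that the metadata survives (a minimal sketch; `foo` and its docstring are illustrative):

```python
import time
from functools import wraps

def timer_func(func):
    @wraps(func)  # copies func's __name__, __doc__, etc. onto the wrapper
    def time_measurer(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f'The function {func.__name__} took {end-start} seconds to run')
        return result
    return time_measurer

@timer_func
def foo():
    """An example function."""
    return 42

print(foo.__name__)  # 'foo', not 'time_measurer'
print(foo.__doc__)   # the original docstring survives
```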
In Jupyter notebooks we can use the magic command `%timeit` to time a function. This reports the mean and standard deviation of the run time over several calls:

```python
import julia_set
%timeit julia_set.calc_pure_python(desired_width = 1000, max_iterations = 300)
```
`timeit` can also be run from the command line:

```
! python -m timeit -n 5 -r 2 -s "import julia_set" "julia_set.calc_pure_python(desired_width = 1000, max_iterations = 300)"
```

The `%timeit` magic reports the mean of all the runs, while the command-line version displays the time of the best run.
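Outside a notebook, the same kind of measurement can also be scripted with the `timeit` module directly. A minimal sketch (the statement being timed is a toy one, not the Julia-set code):

```python
import timeit

# repeat() returns one total time per repeat; dividing by `number`
# gives the per-call time. Reporting min() follows timeit's own advice:
# the fastest run is the one least disturbed by the rest of the system.
times = timeit.repeat(stmt="sum(range(1000))", repeat=5, number=1000)
best_per_call = min(times) / 1000
print(f"best of 5: {best_per_call:.3e} s per call")
```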
`cProfile` is the built-in profiling tool in the standard library. It gives more detailed information at the cost of greater overhead. It can be used from the command line as below:

```
!python -m cProfile -s cumulative julia_set.py
```
To get more control over the results of cProfile, we can write them into a statistics file:

```
! python -m cProfile -o profile.stats julia_set.py
```

The above command writes the results of cProfile into a file named `profile.stats`. We can analyze this file in a separate program using the `pstats` module.
```python
import pstats

p = pstats.Stats("profile.stats")
p.sort_stats("cumulative")
```

The last line of code in the cell above sorts the functions according to the cumulative time taken by them.

```python
p.print_stats()
```
Similarly, we can sort according to total time taken as follows:

```python
p.sort_stats("tottime")
p.print_stats()
```
To get a sense of which function was called by which, we can use the `print_callers()` method:

```python
p.print_callers()
```

To print which function called which other functions, i.e. flipping the information in the previous output, we can use `print_callees()`:

```python
p.print_callees()
```
We can use the `snakeviz` visualizer to visualize the output of the cProfile profiler:

```
!pip install snakeviz
! snakeviz profile.stats
```

The above command opens an interactive graphic of the profile in the browser.
The above profiling methods only give information at the function level. That's a good starting point. Once we have figured out which functions take the most time, we can profile them further, line by line, using the `line_profiler` tool by Robert Kern:

```
!pip install line_profiler
```

Before running line_profiler from the command line, we have to decorate the functions we are interested in with the `@profile` decorator. The `kernprof` script is used to execute the code. The `-l` argument specifies that we want line-by-line profiling instead of function-level, and `-v` stands for verbose output. Without `-v` you get an `.lprof` output file which can later be analyzed with the `line_profiler` module.
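One wrinkle: the name `profile` only exists when the script runs under kernprof, which injects it into builtins. A common guard so the script also runs under plain `python` is a no-op fallback decorator (a sketch; the function body here is a placeholder, not the book's Julia-set code):

```python
# kernprof -l injects `profile` into builtins; fall back to a
# do-nothing decorator so `python julia_set.py` still works on its own.
try:
    profile
except NameError:
    def profile(func):
        return func

@profile
def calc_pure_python(desired_width, max_iterations):
    # ... the real Julia-set computation would go here ...
    return desired_width * max_iterations  # placeholder body
```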
```
! kernprof -l -v julia_set.py
```
By looking at the `% Time` column we can see that the while loop in the function `calc_pure_python` takes 39.5% of the total run time of the program. More specifically, the line `while abs(z) < 2 and n < maxiter:` takes up 39.5% of the time. We can't yet tell which side of the `and` takes the most time. We can also see that an operation as simple as `n += 1` takes up 26% of the total time. This is because of Python's dynamic lookup machinery: every time that line executes, Python checks whether the `n` object has an `__add__` method (if it didn't, Python would walk up any inherited classes to see whether any of them has it), and then the integer `1` is passed to that `__add__` method to do its thing. To reduce the cost of this rather simple operation, type specialization is the way to go (which we'll encounter later).
Now, getting back to that time-consuming while loop: we want to figure out which of `abs(z) < 2` and `n < maxiter` takes more time. Whichever takes less time and is False more often should be put on the left side of the `and`, because if the first operand of an `and` expression is False, Python doesn't even need to evaluate the second one.
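This short-circuit behaviour is easy to verify with a counter (a small sketch; `expensive_check` is an illustrative stand-in):

```python
calls = 0

def expensive_check():
    """Stand-in for a costly right-hand operand."""
    global calls
    calls += 1
    return True

result = False and expensive_check()  # left is False: right side never runs
print(result, calls)  # False 0

result = True and expensive_check()   # left is True: right side must run
print(result, calls)  # True 1
```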
We can break the two expressions on either side of the `and` into separate lines (re-evaluating them on every iteration) and then run the line profiler again:

```python
while True:
    statement1 = abs(z) < 2
    statement2 = n < maxiter
    if not (statement1 and statement2):
        break
    ...
```

```
! kernprof -l -v julia_set.py
```
It looks as if checking `abs(z) < 2` takes more time than checking `n < maxiter`. Now we need to know which of these two is `False` more often. `n < maxiter` will be False on only 1 of every 301 checks, since `maxiter = 300` in the program.
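One way to check the other condition is to instrument the loop and count which side of the `and` actually ends it. A minimal sketch over a small grid of starting points (the constant `c`, the grid bounds, and the grid size are illustrative choices, not necessarily the book's values):

```python
c = -0.62772 - 0.42193j   # illustrative Julia constant
maxiter = 300
ended_by_abs = 0          # loop ended because abs(z) >= 2 (point escaped)
ended_by_n = 0            # loop ended because n hit maxiter (point stayed bounded)

side = 20                 # small grid, just for counting
for i in range(side):
    for j in range(side):
        z = complex(-1.8 + 3.6 * i / side, -1.8 + 3.6 * j / side)
        n = 0
        while abs(z) < 2 and n < maxiter:
            z = z * z + c
            n += 1
        if abs(z) >= 2:
            ended_by_abs += 1
        else:
            ended_by_n += 1

print(f"escaped (abs >= 2): {ended_by_abs}, hit maxiter: {ended_by_n}")
```

If most points escape before `n` reaches `maxiter`, then `abs(z) < 2` is the condition that is False more often, which informs which operand to put first.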