f2py-jit#
f2py-jit
builds efficient Fortran extensions for Python at run time. It extends the machinery of f2py to provide
Compilation of Fortran source blocks as Python strings
Caching of module builds across executions
Support for Fortran derived types via f90wrap
Inlining to improve performance (experimental)
Quick start#
There are two equivalent ways to use f2py-jit
.
Start from a Fortran code.f90
with open('code.f90', 'w') as fh:
fh.write("""
subroutine hello()
print*, "Hello world!"
end subroutine hello
""")
Compile the code, import it and execute it
from f2py_jit import jit
f90 = jit('code.f90')
f90.hello()
Hello world!
You can do the same using a Python string as source block
source = """
subroutine hello()
print*, "Hello world!"
end subroutine hello
"""
f90 = jit(source)
f90.hello()
Hello world!
Usage#
Passing arguments#
Thanks to f2py, interfacing Python and Fortran 90 is pretty smooth and works mostly fine with numpy arrays - as long as you follow some basic rules.
On the Fortran side:
declare
intent
’s of all the arguments of your Fortran subroutinesdeclare your floating point variables as
real(8)
ordouble precision
, unless you know how to handles typesavoid using Fortran functions, use subroutines with
intent(out)
variables as last arguments instead - they will be returned as result
On the Python side:
use
dtype=numpy.int32
for integer arrays, unless you useinteger(8)
on the Fortran sidein-place modification of scalar variables, i.e.
intent(inout)
on the Fortran side, requires passing a 0-dimensional array (ex.numpy.array(1.0)
)shape multi-dimensional numpy arrays using Fortran layout: for instance,
numpy.ndarray((3, 10), order
'F')
will give the right contiguity in memory layout
Multiple files#
If you have different Fortran routines in different files, just pass them to jit()
as a list. Internally, they will be merged into a single source file and the module will be built from it.
Here is a simple example. The file back.f90
contains a module
module back
contains
subroutine mysum(x, y, z)
double precision, intent(in) :: x, y
double precision, intent(out) :: z
z = x + y
end subroutine mysum
end module back
which is used by another module in front.f90
module front
use back
contains
subroutine main()
double precision :: x, y, z
x = 1; y = 1
call mysum(x, y, z)
end subroutine main
end module front
This will build a single module from the two source files and execute the main function
from f2py_jit import jit
f90 = jit(['back.f90', 'front.f90'])
f90.front.main()
Note
When developing a package, the Fortran source file will be stored in the package directory itself. In this case, you must provide the absolute path of the source file to jit()
, see the snippet below how to do that. This is necessary also when executing a script from another directory than the one where the script resides.
Assuming the Fortran source file is in the same directory as the Python code, this is the safest way to call jit()
import os
pwd = os.path.dirname(__file__)
jit(os.path.join(pwd, 'code.f90'))
Derived types#
f2py-jit
supports Fortran derived types thanks to f90wrap.
This simple example shows how Fortran types are mapped to Python classes. Note the constructor syntax convention new_array
for a type named array
module types
type array
double precision, allocatable :: x(:)
end type array
contains
subroutine new_array(this, n)
type(array), intent(inout) :: this
integer, intent(in), optional :: n
if (present(n)) allocate(this % x(n))
end subroutine new_array
end module types
We create and modify an array
object from Python.
from f2py_jit import jit
f90 = jit('type_allocatable.f90')
t = f90.types.array(3)
t.x[:] = 0
print(t.x)
[0. 0. 0.]
We can store references to Python objects in components of Fortran types
via pointers
. This is useful, for instance, if you have already data stored as numpy arrays and you want to modify them in-place in either Fortran or Python code.
module types
type array
double precision, pointer :: x(:) => null()
end type array
contains
subroutine new_array(this, x)
type(array), intent(inout) :: this
double precision, intent(in), target, optional :: x(:)
if (present(x)) this % x => x
end subroutine new_array
end module types
Note how x
tracks the modification to the Fortran type component.
import numpy
from f2py_jit import jit
f90 = jit('type_pointer.f90')
x = numpy.ones(3)
t = f90.types.array(x)
t.x[:] -= 1
print('They should be all zeros:', x)
They should be all zeros: [0. 0. 0.]
Check out the f90wrap usage for more details.
Performance#
Module builds cache#
Module builds are cached by default: if we call jit()
and the module associated to the source or source files has been built already, no recompilation is attempted.
from f2py_jit import jit
jit('code.f90') # First compilation
jit('code.f90') # Use build cache
This applies both within a Python session and across sessions. This is useful to limit compilation overhead if you are running the same script many times, both in terms of CPU time and disk space.
The cache is stored in the .f2py-jit/
folder in the directory where the Python script is executed. You can change the cache path by changing the variable f2py_jit.modules_dir
, for instance
import os
modules_dir = os.path.expanduser('~/.cache/f2py-jit')
To clear the cache use f2py_jit.clear_modules()
or simply remove the directory.
Inlining#
If the Fortran source contains multiple subroutines calling one another, f2py
will not perform interprocedural optimizations - at least not in my experience with gfortran
. This can lead to significant performance loss when wrapping Fortran modules. f2py_jit
can inline the source code before compiling it and you should get a performance boost. This feature is still experimental and may not work 100%: check your results!
As a benchmark we use the following module, in which main
routine repeatedly calls kernel
.
module test
contains
subroutine kernel(n, x, y)
integer, intent(in) :: n
real(8), intent(in) :: x
real(8), intent(out) :: y
integer :: m
m = n * 2
y = log(exp(-x**m))
end subroutine kernel
subroutine main(n, z)
integer, intent(in) :: n
real(8), intent(out) :: z
real(8) :: x, y
integer :: i, k=4
z = 0
do i = 1, n
do j = 1, n
call kernel(k, x, y)
z = z + y
end do
end do
end subroutine main
end module test
Let’s time the execution of main
inside the test
module, with and without inlining of the kernel
routine
import timeit
from f2py_jit import jit, inline
module_inlined = jit('bench.f90', inline=True)
module_not_inlined = jit('bench.f90')
print('Not inlined : ', timeit.timeit('module_not_inlined.test.main(10000)', number=3, globals=globals()))
print('Inlined : ', timeit.timeit('module_inlined.test.main(10000)', number=3, globals=globals()))
Not inlined : 4.6051065729989205
Inlined : 0.3709742070059292
The inlined code is remarkably faster.
Note that inlining is enabled by default in gfortran
at -O3
, which is the default optimization option used by f2py
. So what is going wrong? The thing is that f2py
compiles with the -fPIC
option, which prevents inlining. One can see this by compiling the code from the terminal
gfortran -S -cpp -O3 -fPIC bench.f90; grep kernel bench.s | grep call # you'll see a function call
gfortran -S -cpp -O3 bench.f90; grep kernel bench.s | grep call # this will return nothing
Compiler flags#
You can specify optimization flags for the compiler when building the Fortran module. This allows you to fine tune the compilation flags at run time and squeeze the most out of your code. This would not be easy to do (although possible) when installing a Fortran extension statically using f2py
.
Here is a simple example
import timeit
from f2py_jit import jit
source = """
subroutine series(n, s)
integer, intent(in) :: n
real(8), intent(out) :: s
s = 0.0
do i = 1,n
s = s + exp(1.0/(real(i))**2)
end do
end subroutine
"""
# Try out a few gfortran flags combos
n = int(1e8)
for flags in ['-O0', '-O1', '-O3', '-O3 -ffast-math']:
f90 = jit(source, flags=flags)
time = timeit.timeit('f90.series(n)', number=3, globals=globals())
print(f'{flags:16}: {time:.4f}')
-O0 : 5.5776
-O1 : 1.6290
-O3 : 1.5598
-O3 -ffast-math : 0.7351
Notice the gap between -O0
and -O3
and the extra boost coming from less accurate exponential evaluation when using -ffast-math
, which is not set by default by f2py
.
Comparison with numba#
Numba is a popular Python package to compile Python code just-in-time. Python functions decorated with numba.jit
will generally run faster than pure Python.
Summing a series#
Here we compare the performance of numba
and f2py-jit
on a simple example (single loop)
from numba import jit
@jit(nopython=True)
def series_numba(n):
s = 0.0
for i in range(1, n+1):
s = s + 1.0/float(i)**2
return s
This is the equivalent code with f2py-jit
source = """
subroutine series(n, s)
integer, intent(in) :: n
real(8), intent(out) :: s
s = 0.0
do i = 1,n
s = s + 1.0/(real(i))**2
end do
end subroutine series
"""
from f2py_jit import jit
f90 = jit(source, flags='-O0')
f90 = jit(source)
series_f90 = f90.series
The timings are comparable in this simple example.
import timeit
n = int(1e8)
print('Numba:', timeit.timeit('series_numba(n)', number=3, globals=globals()))
print('F90:', timeit.timeit('series_f90(n)', number=3, globals=globals()))
Numba: 0.37357314999826485
F90: 0.3555460839997977
N-body calculation#
Now we compare the performance of numba
and f2py-jit
on a minimal N-body energy calculation (double loop).
This is the numba code
from numba import jit
@jit(nopython=True)
def potential(r, sigma, epsilon, n):
return epsilon * (sigma/r)**n
import math
@jit(nopython=True)
def interaction(pos, sigma, epsilon, n):
N = pos.shape[0]
U = 0.0
for i in range(N):
for j in range(i+1, N):
r = pos[i, :] - pos[j, :]
# rij = numpy.sqrt(numpy.sum(r**2)) # this is slower
rij = numpy.sum(r**2)**0.5
if rij > 0.6:
U += potential(rij, sigma, epsilon, n)
U /= N
return U
func_numba = interaction
This is the equivalent Fortran code, called nbody.f90
subroutine potential(r, sigma, epsilon, n, u)
double precision, intent(in) :: r, sigma, epsilon
integer, intent(in) :: n
double precision, intent(out) :: u
u = epsilon * (sigma/r)**n
end subroutine potential
subroutine interaction(pos, sigma, epsilon, n, U)
double precision, intent(in) :: pos(:,:), sigma, epsilon
integer, intent(in) :: n
double precision, intent(out) :: U
double precision :: rij, r(size(pos,1)), rcut, uij
U = 0.0
do i = 1,size(pos, 2)
do j = i+1,size(pos, 2)
r = pos(:, i) - pos(:, j)
rij = sum(r**2)**0.5
if (rij > 0.6) then
call potential(rij, sigma, epsilon, n, uij)
U = U + uij
end if
end do
end do
U = U / size(pos,2)
end subroutine interaction
which we now compile
from f2py_jit import jit
f90 = jit('nbody.f90')
func_f90 = f90.interaction
f90_opt = jit('nbody.f90', flags='-O3 -ffast-math')
func_f90_opt = f90_opt.interaction
Let’s get the timings
import numpy
N = 2000
L = 5.0
numpy.random.seed(1)
pos = (numpy.random.random((N, 3)) - 0.5) * L / 2
print('Numba:', timeit.timeit('func_numba(pos, sigma=1.0, epsilon=1.0, n=12)', number=3, globals=globals()))
pos = pos.transpose() # transpose the array for fortran
print('F90: ', timeit.timeit('func_f90(pos, sigma=1.0, epsilon=1.0, n=12)', number=3, globals=globals()))
print('F90 optimized: ', timeit.timeit('func_f90_opt(pos, sigma=1.0, epsilon=1.0, n=12)', number=3, globals=globals()))
Numba: 1.1736466069996823
F90: 0.23314218599989545
F90 optimized: 0.10024302700185217
In this specific example, f2py-jit
is appreciably faster than numba
. Note that inlining the code would not give an appreciable speedup here.
API reference#
f2py-jit
module#
Create python modules from Fortran code just-in-time.
Two main steps: first build, then import the module. This can be done in a single step using the jit() function.
The modules_dir directory contains the compiled modules and is created in the current path (cwd).
- f2py_jit.f2py_jit.available_modules()#
Return a list of available modules
- f2py_jit.f2py_jit.build_module(source, metadata=None, extra_args='', verbose=False, inline=False)#
Build a Fortran module from source and return its unique uid, which can be used to import the module with import_module()
- Parameters:
source (str) – Fortran source of module or subroutine to compile
metadata (dict) – Metadata to identify the module (Default value = None)
extra_args (str) – Command line arguments passed to f2py (Default value = ‘’)
verbose (bool) – (Default value = False)
inline (bool) – Inline source code (Default value = False)
- Returns:
uid – The unique module uid
- Return type:
str
- f2py_jit.f2py_jit.clear_modules()#
Clean modules from cache directory
- f2py_jit.f2py_jit.compile_module(source, name, extra_args='', verbose=True, quiet=False, source_fn=None, extension='.f90')#
Build extension module from a Fortran source string with f2py.
- Parameters:
source (str or bytes) – Fortran source of module / subroutine to compile
name (str, optional) – The name of the compiled python module
extra_args (str or list, optional) – Additional parameters passed to f2py (Default value = ‘’)
verbose (bool, optional) – Print f2py output to screen (Default value = True)
source_fn (str, optional) – Name of the file where the fortran source is written. The default is to use a temporary file with the extension provided by the extension parameter
extension ({'.f', '.f90'}, optional) – Filename extension if source_fn is not provided. The extension tells which fortran standard is used. The default is .f, which implies F77 standard.
quiet – (Default value = False)
- f2py_jit.f2py_jit.import_module(path, quiet=False)#
- Parameters:
name –
quiet – (Default value = False)
- f2py_jit.f2py_jit.jit(source, flags='', extra_args='', verbose=False, inline=False, skip='', only='')#
Single-step just-in-time build and import of Fortran source code, which can be either a path or a string with f90 code
- Parameters:
source –
flags (str) – (Default value = ‘’)
extra_args (str) – (Default value = ‘’)
verbose (bool) – (Default value = False)
inline (bool) – (Default value = False)
- Returns:
f90
- Return type:
module
finline
module#
Inline fortran 90 code
- f2py_jit.finline.inline(source, ignore=None)#
Multiprocedural optimizations (inlining) in source f90 code.
Accept a path or a f90 cource code (as string) as input and return the inlined source code.
- f2py_jit.finline.inline_source(source, ignore=None)#
Multiprocedural optimizations (inlining) in source f90 code.
Accept a path or a f90 cource code (as string) as input and return the inlined source code.