f2py-jit#

f2py-jit builds efficient Fortran extensions for Python at run time. It extends the machinery of f2py to provide

Compilation of Fortran source blocks as Python strings
Caching of module builds across executions
Support for Fortran derived types via f90wrap
Inlining to improve performance (experimental)

Quick start#

There are two equivalent ways to use f2py-jit.

Start from a Fortran code.f90

with open('code.f90', 'w') as fh:
    fh.write("""
subroutine hello()
  print*, "Hello world!"
end subroutine hello
""")

Compile the code, import it and execute it

from f2py_jit import jit
f90 = jit('code.f90')
f90.hello()

Hello world!

You can do the same using a Python string as source block

source = """
subroutine hello()
  print*, "Hello world!"
end subroutine hello
"""
f90 = jit(source)
f90.hello()

Hello world!

Usage#

Passing arguments#

Thanks to f2py, interfacing Python and Fortran 90 is pretty smooth and works mostly fine with numpy arrays - as long as you follow some basic rules.

On the Fortran side:

declare intent’s of all the arguments of your Fortran subroutines
declare your floating point variables as real(8) or double precision, unless you know how to handles types
avoid using Fortran functions, use subroutines with intent(out) variables as last arguments instead - they will be returned as result

On the Python side:

use dtype=numpy.int32 for integer arrays, unless you use integer(8) on the Fortran side
in-place modification of scalar variables, i.e. intent(inout) on the Fortran side, requires passing a 0-dimensional array (ex. numpy.array(1.0))
shape multi-dimensional numpy arrays using Fortran layout: for instance, numpy.ndarray((3, 10), order 'F') will give the right contiguity in memory layout

Multiple files#

If you have different Fortran routines in different files, just pass them to jit() as a list. Internally, they will be merged into a single source file and the module will be built from it.

Here is a simple example. The file back.f90 contains a module

module back
contains
  subroutine mysum(x, y, z)
    double precision, intent(in) :: x, y
    double precision, intent(out) :: z
    z = x + y
  end subroutine mysum
end module back

which is used by another module in front.f90

module front
  use back
contains
  subroutine main()
    double precision :: x, y, z
    x = 1; y = 1
    call mysum(x, y, z)
  end subroutine main
end module front

This will build a single module from the two source files and execute the main function

from f2py_jit import jit
f90 = jit(['back.f90', 'front.f90'])
f90.front.main()

Note

When developing a package, the Fortran source file will be stored in the package directory itself. In this case, you must provide the absolute path of the source file to jit(), see the snippet below how to do that. This is necessary also when executing a script from another directory than the one where the script resides.

Assuming the Fortran source file is in the same directory as the Python code, this is the safest way to call jit()

import os
pwd = os.path.dirname(__file__)
jit(os.path.join(pwd, 'code.f90'))

Derived types#

f2py-jit supports Fortran derived types thanks to f90wrap.

This simple example shows how Fortran types are mapped to Python classes. Note the constructor syntax convention new_array for a type named array

module types

  type array
     double precision, allocatable :: x(:)
  end type array

contains

  subroutine new_array(this, n)
    type(array), intent(inout) :: this
    integer,     intent(in), optional :: n
    if (present(n)) allocate(this % x(n))
  end subroutine new_array

end module types

We create and modify an array object from Python.

from f2py_jit import jit

f90 = jit('type_allocatable.f90')
t = f90.types.array(3)
t.x[:] = 0
print(t.x)

[0. 0. 0.]

We can store references to Python objects in components of Fortran types via pointers. This is useful, for instance, if you have already data stored as numpy arrays and you want to modify them in-place in either Fortran or Python code.

module types

  type array
     double precision, pointer :: x(:) => null()
  end type array

contains

  subroutine new_array(this, x)
    type(array), intent(inout) :: this
    double precision, intent(in), target, optional :: x(:)
    if (present(x)) this % x => x
  end subroutine new_array

end module types

Note how x tracks the modification to the Fortran type component.

import numpy
from f2py_jit import jit

f90 = jit('type_pointer.f90')
x = numpy.ones(3)
t = f90.types.array(x)
t.x[:] -= 1
print('They should be all zeros:', x)

They should be all zeros: [0. 0. 0.]

Check out the f90wrap usage for more details.

Performance#

Module builds cache#

Module builds are cached by default: if we call jit() and the module associated to the source or source files has been built already, no recompilation is attempted.

from f2py_jit import jit
jit('code.f90')  # First compilation
jit('code.f90')  # Use build cache

This applies both within a Python session and across sessions. This is useful to limit compilation overhead if you are running the same script many times, both in terms of CPU time and disk space.

The cache is stored in the .f2py-jit/ folder in the directory where the Python script is executed. You can change the cache path by changing the variable f2py_jit.modules_dir, for instance

import os
modules_dir = os.path.expanduser('~/.cache/f2py-jit')

To clear the cache use f2py_jit.clear_modules() or simply remove the directory.

Inlining#

If the Fortran source contains multiple subroutines calling one another, f2py will not perform interprocedural optimizations - at least not in my experience with gfortran. This can lead to significant performance loss when wrapping Fortran modules. f2py_jit can inline the source code before compiling it and you should get a performance boost. This feature is still experimental and may not work 100%: check your results!

As a benchmark we use the following module, in which main routine repeatedly calls kernel.

module test

contains

  subroutine kernel(n, x, y)
    integer, intent(in) :: n
    real(8), intent(in) :: x
    real(8), intent(out) :: y
    integer :: m
    m = n * 2
    y = log(exp(-x**m))
  end subroutine kernel

  subroutine main(n, z)
    integer, intent(in) :: n
    real(8), intent(out) :: z
    real(8) :: x, y
    integer :: i, k=4
    z = 0
    do i = 1, n
       do j = 1, n
          call kernel(k, x, y)
          z = z + y
       end do
    end do
  end subroutine main

end module test

Let’s time the execution of main inside the test module, with and without inlining of the kernel routine

import timeit
from f2py_jit import jit, inline

module_inlined = jit('bench.f90', inline=True)
module_not_inlined = jit('bench.f90')

print('Not inlined : ', timeit.timeit('module_not_inlined.test.main(10000)', number=3, globals=globals()))
print('Inlined     : ', timeit.timeit('module_inlined.test.main(10000)', number=3, globals=globals()))

Not inlined :  4.6051065729989205
Inlined     :  0.3709742070059292

The inlined code is remarkably faster.

Note that inlining is enabled by default in gfortran at -O3, which is the default optimization option used by f2py. So what is going wrong? The thing is that f2py compiles with the -fPIC option, which prevents inlining. One can see this by compiling the code from the terminal

gfortran -S -cpp -O3 -fPIC bench.f90; grep kernel bench.s | grep call  # you'll see a function call
gfortran -S -cpp -O3 bench.f90; grep kernel bench.s | grep call  # this will return nothing

Compiler flags#

You can specify optimization flags for the compiler when building the Fortran module. This allows you to fine tune the compilation flags at run time and squeeze the most out of your code. This would not be easy to do (although possible) when installing a Fortran extension statically using f2py.

Here is a simple example

import timeit
from f2py_jit import jit

source = """
subroutine series(n, s)
  integer, intent(in) :: n
  real(8), intent(out) :: s
  s = 0.0
  do i = 1,n
    s = s + exp(1.0/(real(i))**2)
  end do
end subroutine
"""

# Try out a few gfortran flags combos
n = int(1e8)
for flags in ['-O0', '-O1', '-O3', '-O3 -ffast-math']:
    f90 = jit(source, flags=flags)
    time = timeit.timeit('f90.series(n)', number=3, globals=globals())
    print(f'{flags:16}: {time:.4f}')

-O0             : 5.5776
-O1             : 1.6290
-O3             : 1.5598
-O3 -ffast-math : 0.7351

Notice the gap between -O0 and -O3 and the extra boost coming from less accurate exponential evaluation when using -ffast-math, which is not set by default by f2py.

Comparison with numba#

Numba is a popular Python package to compile Python code just-in-time. Python functions decorated with numba.jit will generally run faster than pure Python.

Summing a series#

Here we compare the performance of numba and f2py-jit on a simple example (single loop)

from numba import jit

@jit(nopython=True)
def series_numba(n):
    s = 0.0
    for i in range(1, n+1):
        s = s + 1.0/float(i)**2
    return s

This is the equivalent code with f2py-jit

source = """
subroutine series(n, s)
  integer, intent(in) :: n
  real(8), intent(out) :: s
  s = 0.0
  do i = 1,n
    s = s + 1.0/(real(i))**2
  end do
end subroutine series
"""
from f2py_jit import jit

f90 = jit(source, flags='-O0')
f90 = jit(source)
series_f90 = f90.series

The timings are comparable in this simple example.

import timeit

n = int(1e8)
print('Numba:', timeit.timeit('series_numba(n)', number=3, globals=globals()))
print('F90:', timeit.timeit('series_f90(n)', number=3, globals=globals()))

Numba: 0.37357314999826485
F90: 0.3555460839997977

N-body calculation#

Now we compare the performance of numba and f2py-jit on a minimal N-body energy calculation (double loop).

This is the numba code

from numba import jit

@jit(nopython=True)
def potential(r, sigma, epsilon, n):
    return epsilon * (sigma/r)**n

import math
@jit(nopython=True)
def interaction(pos, sigma, epsilon, n):
    N = pos.shape[0]
    U = 0.0
    for i in range(N):
        for j in range(i+1, N):
            r = pos[i, :] - pos[j, :]
            # rij = numpy.sqrt(numpy.sum(r**2))  # this is slower
            rij = numpy.sum(r**2)**0.5
            if rij > 0.6:
                U += potential(rij, sigma, epsilon, n)
    U /= N
    return U

func_numba = interaction

This is the equivalent Fortran code, called nbody.f90

subroutine potential(r, sigma, epsilon, n, u)
  double precision, intent(in) :: r, sigma, epsilon
  integer, intent(in) :: n
  double precision, intent(out) :: u
  u = epsilon * (sigma/r)**n
end subroutine potential

subroutine interaction(pos, sigma, epsilon, n, U)
  double precision, intent(in) :: pos(:,:), sigma, epsilon
  integer, intent(in) :: n
  double precision, intent(out) :: U
  double precision :: rij, r(size(pos,1)), rcut, uij
  U = 0.0
  do i = 1,size(pos, 2)
        do j = i+1,size(pos, 2)
            r = pos(:, i) - pos(:, j)
            rij = sum(r**2)**0.5
            if (rij > 0.6) then
                call potential(rij, sigma, epsilon, n, uij)
                U = U + uij
            end if
        end do
  end do
  U = U / size(pos,2)
end subroutine interaction

which we now compile

from f2py_jit import jit

f90 = jit('nbody.f90')
func_f90 = f90.interaction
f90_opt = jit('nbody.f90', flags='-O3 -ffast-math')
func_f90_opt = f90_opt.interaction

Let’s get the timings

import numpy

N = 2000
L = 5.0
numpy.random.seed(1)
pos = (numpy.random.random((N, 3)) - 0.5) * L / 2
print('Numba:', timeit.timeit('func_numba(pos, sigma=1.0, epsilon=1.0, n=12)', number=3, globals=globals()))
pos = pos.transpose()  # transpose the array for fortran
print('F90: ', timeit.timeit('func_f90(pos, sigma=1.0, epsilon=1.0, n=12)', number=3, globals=globals()))
print('F90 optimized: ', timeit.timeit('func_f90_opt(pos, sigma=1.0, epsilon=1.0, n=12)', number=3, globals=globals()))

Numba: 1.1736466069996823
F90:  0.23314218599989545
F90 optimized:  0.10024302700185217

In this specific example, f2py-jit is appreciably faster than numba. Note that inlining the code would not give an appreciable speedup here.

API reference#

`f2py-jit` module#

Create python modules from Fortran code just-in-time.

Two main steps: first build, then import the module. This can be done in a single step using the jit() function.

The modules_dir directory contains the compiled modules and is created in the current path (cwd).

f2py_jit.f2py_jit.available_modules()#: Return a list of available modules

f2py_jit.f2py_jit.build_module(source, metadata=None, extra_args='', verbose=False, inline=False)#

Build a Fortran module from source and return its unique uid, which can be used to import the module with import_module()

Parameters:

source (str) – Fortran source of module or subroutine to compile
metadata (dict) – Metadata to identify the module (Default value = None)
extra_args (str) – Command line arguments passed to f2py (Default value = ‘’)
verbose (bool) – (Default value = False)
inline (bool) – Inline source code (Default value = False)

Returns:

uid – The unique module uid

Return type:

str

f2py_jit.f2py_jit.clear_modules()#: Clean modules from cache directory

f2py_jit.f2py_jit.compile_module(source, name, extra_args='', verbose=True, quiet=False, source_fn=None, extension='.f90')#

Build extension module from a Fortran source string with f2py.

Parameters:

source (str or bytes) – Fortran source of module / subroutine to compile
name (str, optional) – The name of the compiled python module
extra_args (str or list, optional) – Additional parameters passed to f2py (Default value = ‘’)
verbose (bool, optional) – Print f2py output to screen (Default value = True)
source_fn (str, optional) – Name of the file where the fortran source is written. The default is to use a temporary file with the extension provided by the extension parameter
extension ({'.f', '.f90'}, optional) – Filename extension if source_fn is not provided. The extension tells which fortran standard is used. The default is .f, which implies F77 standard.
quiet – (Default value = False)

f2py_jit.f2py_jit.import_module(path, quiet=False)#

Parameters:

name –
quiet – (Default value = False)

f2py_jit.f2py_jit.jit(source, flags='', extra_args='', verbose=False, inline=False, skip='', only='')#

Single-step just-in-time build and import of Fortran source code, which can be either a path or a string with f90 code

Parameters:

source –
flags (str) – (Default value = ‘’)
extra_args (str) – (Default value = ‘’)
verbose (bool) – (Default value = False)
inline (bool) – (Default value = False)

Returns:

f90

Return type:

module

`finline` module#

Inline fortran 90 code

f2py_jit.finline.inline(source, ignore=None)#

Multiprocedural optimizations (inlining) in source f90 code.

Accept a path or a f90 cource code (as string) as input and return the inlined source code.

f2py_jit.finline.inline_source(source, ignore=None)#

Multiprocedural optimizations (inlining) in source f90 code.

Accept a path or a f90 cource code (as string) as input and return the inlined source code.

f2py-jit#

Quick start#

Usage#

Passing arguments#

Multiple files#

Derived types#

Performance#

Module builds cache#

Inlining#

Compiler flags#

Comparison with numba#

Summing a series#

N-body calculation#

API reference#

f2py-jit module#

finline module#

`f2py-jit` module#

`finline` module#