5.5. 2024-02-08 Performance study of conversion to C types from Python types¶
I have implemented invocation of a C callback along with conversion of Python arguments to C types as a Python C extension.
In details in the following purely Python code:
import ctypes
import time
import _callback
import numpy as np
from oif.core import OIF_ARRAY_F64, OIF_FLOAT64, OIF_INT, OIFArrayF64
class Callback:
def __init__(self, fn_p, id: str):
# Details of creating a callback function from a PyCapsule `fn_p`.
# Omitted as irrelevant here.
def __call__(self, *args):
c_args = []
for i, (t, v) in enumerate(zip(self.arg_types, args)):
if t == OIF_INT:
c_args.append(ctypes.c_int(v))
elif t == OIF_FLOAT64:
c_args.append(ctypes.c_double(v))
elif t == OIF_ARRAY_F64:
assert v.dtype == np.float64
nd = v.ndim
dimensions = (ctypes.c_long * len(v.shape))(*v.shape)
data = v.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
oif_array = OIFArrayF64(nd, dimensions, data)
c_args.append(ctypes.pointer(oif_array))
return self.fn_p_py(*c_args)
the method __call__
is update to use an equivalent C code.
Here I report the performance results.
With the previous code, cProfile
gives the following results:
Thu Feb 8 15:34:35 2024 profiler-results-oif-scipy_ode_dopri5
5028526 function calls (5001575 primitive calls) in 10.659 seconds
Ordered by: cumulative time
List reduced from 5549 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
758/1 0.009 0.000 10.660 10.660 {built-in method builtins.exec}
1 0.000 0.000 10.660 10.660 examples/compare_performance_ivp_burgers_eq.py:1(<module>)
1 0.000 0.000 9.822 9.822 examples/compare_performance_ivp_burgers_eq.py:96(run_one_impl)
1 0.003 0.003 9.822 9.822 examples/compare_performance_ivp_burgers_eq.py:194(_run_once)
2000 0.004 0.000 9.405 0.005 oif/interfaces/python/oif/interfaces/ivp.py:38(integrate)
2002 0.064 0.000 9.403 0.005 oif/lang_python/oif/core.py:138(call)
2000 0.006 0.000 9.275 0.005 oif_impl/impl/ivp/scipy_ode_dopri5/dopri5.py:39(integrate)
2000 0.004 0.000 9.268 0.005 <>/lib/python3.12/site-packages/scipy/integrate/_ode.py:397(integrate)
2000 0.394 0.000 9.263 0.005 <>/lib/python3.12/site-packages/scipy/integrate/_ode.py:1173(run)
89872 0.197 0.000 8.869 0.000 oif_impl/impl/ivp/scipy_ode_dopri5/dopri5.py:44(_rhs_fn_wrapper)
89872 2.109 0.000 8.672 0.000 oif_impl/python/oif/callback.py:27(__call__)
89872 0.645 0.000 5.703 0.000 oif/lang_python/oif/core.py:105(wrapper)
89872 3.066 0.000 3.914 0.000 examples/compare_performance_ivp_burgers_eq.py:73(compute_rhs)
179744 0.311 0.000 1.090 0.000 <>/lib/python3.12/site-packages/numpy/ctypeslib.py:506(as_array)
we can see that the call invocation took 2.109 seconds by itself (that is, for conversion of the arguments to C types from Python types).
With the new implementation of the __call__
method:
Thu Feb 8 15:25:22 2024 profiler-results-oif-scipy_ode_dopri5
3770321 function calls (3743370 primitive calls) in 7.942 seconds
Ordered by: cumulative time
List reduced from 5550 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
758/1 0.009 0.000 7.943 7.943 {built-in method builtins.exec}
1 0.000 0.000 7.943 7.943 examples/compare_performance_ivp_burgers_eq.py:1(<module>)
1 0.000 0.000 7.141 7.141 examples/compare_performance_ivp_burgers_eq.py:96(run_one_impl)
1 0.003 0.003 7.141 7.141 examples/compare_performance_ivp_burgers_eq.py:194(_run_once)
2000 0.004 0.000 6.742 0.003 oif/interfaces/python/oif/interfaces/ivp.py:38(integrate)
2002 0.071 0.000 6.739 0.003 oif/lang_python/oif/core.py:138(call)
2000 0.006 0.000 6.602 0.003 oif_impl/impl/ivp/scipy_ode_dopri5/dopri5.py:39(integrate)
2000 0.004 0.000 6.595 0.003 <>/lib/python3.12/site-packages/scipy/integrate/_ode.py:397(integrate)
2000 0.371 0.000 6.590 0.003 <>/lib/python3.12/site-packages/scipy/integrate/_ode.py:1173(run)
89872 0.097 0.000 6.220 0.000 oif_impl/impl/ivp/scipy_ode_dopri5/dopri5.py:44(_rhs_fn_wrapper)
89872 0.036 0.000 6.123 0.000 oif_impl/python/oif/callback.py:27(__call__)
89872 0.560 0.000 6.087 0.000 {built-in method callback.call_c_fn_from_python}
89872 0.699 0.000 5.527 0.000 oif/lang_python/oif/core.py:105(wrapper)
89872 2.864 0.000 3.673 0.000 examples/compare_performance_ivp_burgers_eq.py:73(compute_rhs)
179744 0.282 0.000 1.086 0.000 <>/lib/python3.12/site-packages/numpy/ctypeslib.py:506(as_array)
we can see now that the __call__
takes about 0.59 seconds (see
callback.callc_fn_from_python
).
This gives about 3.5 times of performance boost from the previous version.