5.4. 2024-01-31 Profiling conversion between NumPy ndarrays and OIFArrayF64 data structures

Using the script examples/compare_performance_ivp_burgers_eq_pure_comparison.py one scipy_ode_dopri5, we get the following the following results:

Profile the code with the following command:

python -m cProfile -o profiler-results \
    examples/compare_performance_ivp_burgers_eq_pure_comparison.py \
        one scipy_ode_dopri5

then using

python -m pstats profiler-results
sort cumulative
stats 20

we obtain the following results:

Wed Jan 31 16:54:54 2024    profiler-results

         4223975 function calls (4196427 primitive calls) in 8.631 seconds

   Ordered by: cumulative time
   List reduced from 5448 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    744/1    0.011    0.000    8.632    8.632 {built-in method builtins.exec}
        1    0.000    0.000    8.632    8.632 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:1(<module>)
        1    0.000    0.000    7.706    7.706 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:177(run_one_impl)
        1    0.005    0.005    7.706    7.706 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:252(_run_once)
     2000    0.005    0.000    7.378    0.004 /home/dima/sw/mambaforge/envs/um02-toy-example/lib/python3.12/site-packages/scipy/integrate/_ode.py:397(integrate)
     2000    0.497    0.000    7.373    0.004 /home/dima/sw/mambaforge/envs/um02-toy-example/lib/python3.12/site-packages/scipy/integrate/_ode.py:1173(run)
    89871    0.268    0.000    6.875    0.000 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:125(compute)
    89871    0.087    0.000    3.945    0.000 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:112(compute_rhs_native)
    89871    3.024    0.000    3.858    0.000 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:94(compute_rhs)
    89871    1.218    0.000    1.874    0.000 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:134(_c_args_from_py_args)
   826/10    0.005    0.000    1.034    0.103 <frozen importlib._bootstrap>:1349(_find_and_load)
   821/10    0.004    0.000    1.033    0.103 <frozen importlib._bootstrap>:1304(_find_and_load_unlocked)
   784/11    0.003    0.000    1.031    0.094 <frozen importlib._bootstrap>:911(_load_unlocked)
   671/11    0.002    0.000    1.031    0.094 <frozen importlib._bootstrap_external>:988(exec_module)
  1958/21    0.002    0.000    1.028    0.049 <frozen importlib._bootstrap>:480(_call_with_frames_removed)
  1006/97    0.002    0.000    0.911    0.009 <frozen importlib._bootstrap>:1390(_handle_fromlist)
   659/71    0.001    0.000    0.906    0.013 {built-in method builtins.__import__}
    90031    0.123    0.000    0.814    0.000 /home/dima/sw/mambaforge/envs/um02-toy-example/lib/python3.12/site-packages/numpy/core/fromnumeric.py:2692(max)
    89871    0.376    0.000    0.753    0.000 examples/compare_performance_ivp_burgers_eq_pure_comparison.py:154(_py_args_from_c_args)
    90230    0.210    0.000    0.692    0.000 /home/dima/sw/mambaforge/envs/um02-toy-example/lib/python3.12/site-packages/numpy/core/fromnumeric.py:71(_wrapreduction)

where we can see that inside the function compute that runs for 6.875 seconds, the functions _c_args_from_py_args and _py_args_from_c_args take \(1.874 + 0.753 = 2.627\) seconds, which constitutes 38 % of the run time for the function compute (useful work is done inside the function compute_rhs_native during the other 62 % of run time).

5.4.1. Line profiling

Line profiling was done with

    kernprof -l -v examples/compare_performance_ivp_burgers_eq_pure_comparison.py one scipy_ode_dopri5

with functions _c_args_from_py_args and _py_args_from_c_args annotated (decorator @profile from line_profiler module).

The results are:

================================================================
Solving Burgers' equation with time integration scipy_ode_dopri5
Finished
Wrote profile results to compare_performance_ivp_burgers_eq_pure_comparison.py.lprof
Timer unit: 1e-06 s

Total time: 2.43264 s
File: examples/compare_performance_ivp_burgers_eq_pure_comparison.py
Function: _c_args_from_py_args at line 135

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   135                                               @profile
   136                                               def _c_args_from_py_args(self, *args) -> list:
   137     89871      56466.2      0.6      2.3          begin = time.time()
   138     89871      33840.0      0.4      1.4          c_args: list[CtypesType] = []
   139    269613     245270.1      0.9     10.1          for i, (t, v) in enumerate(zip(self.arg_types, args)):
   140    179742      55548.1      0.3      2.3              if t == OIF_INT:
   141                                                           c_args.append(ctypes.c_int(v))
   142    179742      52412.0      0.3      2.2              elif t == OIF_FLOAT64:
   143     89871      76238.6      0.8      3.1                  c_args.append(ctypes.c_double(v))
   144     89871      28563.6      0.3      1.2              elif t == OIF_ARRAY_F64:
   145     89871     118409.0      1.3      4.9                  assert v.dtype == np.float64
   146     89871      29576.1      0.3      1.2                  nd = v.ndim
   147     89871     194035.4      2.2      8.0                  dimensions = (ctypes.c_long * len(v.shape))(*v.shape)
   148     89871      42022.9      0.5      1.7                  double_p_t = ctypes.POINTER(ctypes.c_double)
   149     89871    1001718.9     11.1     41.2                  data = v.ctypes.data_as(double_p_t)
   150
   151     89871     270073.7      3.0     11.1                  oif_array = OIFArrayF64(nd, dimensions, data)
   152     89871     109151.4      1.2      4.5                  c_args.append(ctypes.pointer(oif_array))
   153     89871      37293.8      0.4      1.5          end = time.time()
   154                                                   # print("{:35s} {:.2e}".format("Elapsed time _c_args_from_py_args", end - begin))
   155     89871      82017.4      0.9      3.4          return c_args

Total time: 0.901177 s
File: examples/compare_performance_ivp_burgers_eq_pure_comparison.py
Function: _py_args_from_c_args at line 157

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   157                                               @profile
   158                                               def _py_args_from_c_args(self, *args) -> list:
   159     89871      39935.7      0.4      4.4          begin = time.time()
   160     89871      24564.8      0.3      2.7          py_args = []
   161     89871      32901.2      0.4      3.7          py_args.append(args[0])
   162     89871      18304.0      0.2      2.0          v = args[1]
   163     89871     150222.5      1.7     16.7          arr_type = ctypes.c_double * v.contents.dimensions[0]
   164    179742      52523.6      0.3      5.8          py_args.append(
   165    179742     395453.0      2.2     43.9              np.ctypeslib.as_array(
   166     89871     107230.1      1.2     11.9                  arr_type.from_address(ctypes.addressof(v.contents.data.contents))
   167                                                       )
   168                                                   )
   175     89871      28710.8      0.3      3.2          end = time.time()
   176                                                   # print("{:35s} {:.2e}".format("Elapsed time _py_args_from_c_args", end - begin))
   177
   178     89871      51331.4      0.6      5.7          return py_args

We can see from the results of line profiling, that somehow functions ndarray.ctypes.data_as and numpy.ctypeslib.as_array are taking 40 % of run time, and look as the main targets for optimization.