5.6. 2024-02-09 Performance study of conversion of OIF arrays to NumPy arrays via C extension

I have implemented a C extension that converts OIFArrayF64 * to NumPy arrays directly, rather than via np.ctypeslib, for efficiency.
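For context, the np.ctypeslib route that the C extension replaces can be sketched as follows. This is a minimal sketch, not the OIF code: the class name `OIFArrayF64Sketch` and its field layout (`nd`, `dimensions`, `data`) are assumptions made for illustration, as is the helper `to_numpy_via_ctypeslib`.

```python
import ctypes

import numpy as np


class OIFArrayF64Sketch(ctypes.Structure):
    # Hypothetical struct layout, assumed for illustration only.
    _fields_ = [
        ("nd", ctypes.c_int),
        ("dimensions", ctypes.POINTER(ctypes.c_ssize_t)),
        ("data", ctypes.POINTER(ctypes.c_double)),
    ]


def to_numpy_via_ctypeslib(arr_ptr):
    """Wrap the C buffer as a NumPy array that shares memory with it."""
    arr = arr_ptr.contents
    shape = tuple(arr.dimensions[i] for i in range(arr.nd))
    return np.ctypeslib.as_array(arr.data, shape=shape)


# Build a small array on the Python side to exercise the conversion.
values = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
dims = (ctypes.c_ssize_t * 1)(4)
c_array = OIFArrayF64Sketch(
    nd=1,
    dimensions=ctypes.cast(dims, ctypes.POINTER(ctypes.c_ssize_t)),
    data=ctypes.cast(values, ctypes.POINTER(ctypes.c_double)),
)
result = to_numpy_via_ctypeslib(ctypes.pointer(c_array))
print(result)
```

Every step of this route runs in Python for each conversion, which is likely to add up when the conversion sits inside a callback that is invoked tens of thousands of times per run.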

These are the results of profiling:

Fri Feb  9 17:30:26 2024    profiler-results-oif-scipy_ode_dopri5

         3051649 function calls (3024692 primitive calls) in 6.099 seconds

   Ordered by: cumulative time
   List reduced from 5551 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    758/1    0.009    0.000    6.100    6.100 {built-in method builtins.exec}
        1    0.000    0.000    6.100    6.100 examples/compare_performance_ivp_burgers_eq.py:1(<module>)
        1    0.000    0.000    5.273    5.273 examples/compare_performance_ivp_burgers_eq.py:96(run_one_impl)
        1    0.003    0.003    5.273    5.273 examples/compare_performance_ivp_burgers_eq.py:194(_run_once)
     2000    0.005    0.000    4.856    0.002 oif/interfaces/python/oif/interfaces/ivp.py:38(integrate)
     2002    0.071    0.000    4.853    0.002 oif/lang_python/oif/core.py:146(call)
     2000    0.006    0.000    4.712    0.002 oif_impl/impl/ivp/scipy_ode_dopri5/dopri5.py:39(integrate)
     2000    0.004    0.000    4.706    0.002 <..>/lib/python3.12/site-packages/scipy/integrate/_ode.py:397(integrate)
     2000    0.351    0.000    4.701    0.002 <..>/lib/python3.12/site-packages/scipy/integrate/_ode.py:1173(run)
    89872    0.088    0.000    4.350    0.000 oif_impl/impl/ivp/scipy_ode_dopri5/dopri5.py:44(_rhs_fn_wrapper)
    89872    0.036    0.000    4.263    0.000 oif_impl/python/oif/callback.py:27(__call__)
    89872    0.413    0.000    4.227    0.000 {built-in method callback.call_c_fn_from_python}
    89872    0.410    0.000    3.813    0.000 oif/lang_python/oif/core.py:110(wrapper)
    89872    2.597    0.000    3.259    0.000 examples/compare_performance_ivp_burgers_eq.py:73(compute_rhs)

These results show that the run time has decreased compared to the results in the previous section, because the wrapper function now converts OIF arrays to NumPy arrays more efficiently.
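Reports in this format are produced by Python's cProfile and pstats modules. The sketch below shows one way to generate such a report, sorted by cumulative time and restricted to 20 entries. The function `work` is a placeholder workload, not the Burgers' equation benchmark, and the output file name is hypothetical.

```python
import cProfile
import io
import os
import pstats
import tempfile


def work(n=20000):
    """Placeholder workload standing in for the benchmark script."""
    return sum(i * i for i in range(n))


# Collect profiling data for the workload and dump it to a file.
stats_file = os.path.join(tempfile.mkdtemp(), "profiler-results-sketch")
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats(stats_file)

# Load the file, sort by cumulative time, and keep at most 20 entries.
buffer = io.StringIO()
stats = pstats.Stats(stats_file, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)
report = buffer.getvalue()
print(report)
```

In such a report, `cumtime` includes time spent in callees, while `tottime` counts only the time spent in the function body itself, so the difference between the two columns indicates how much of a call is spent further down the stack.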

5.6.1. Native profiling

The native profiling results are:

Fri Feb  9 17:49:50 2024    profiler-results-native-scipy_ode_dopri5

         2051233 function calls (2024295 primitive calls) in 4.467 seconds

   Ordered by: cumulative time
   List reduced from 5518 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    756/1    0.010    0.000    4.469    4.469 {built-in method builtins.exec}
        1    0.000    0.000    4.469    4.469 examples/compare_performance_ivp_burgers_eq.py:1(<module>)
        1    0.001    0.001    3.587    3.587 examples/compare_performance_ivp_burgers_eq.py:97(run_one_impl)
        1    0.004    0.004    3.586    3.586 examples/compare_performance_ivp_burgers_eq.py:195(_run_once)
     2000    0.004    0.000    3.185    0.002 <..>/lib/python3.12/site-packages/scipy/integrate/_ode.py:397(integrate)
     2000    0.305    0.000    3.181    0.002 <..>/lib/python3.12/site-packages/scipy/integrate/_ode.py:1173(run)
    89871    0.061    0.000    2.876    0.000 examples/compare_performance_ivp_burgers_eq.py:92(compute_rhs_native)
    89871    2.268    0.000    2.815    0.000 examples/compare_performance_ivp_burgers_eq.py:74(compute_rhs)
   849/11    0.005    0.000    1.026    0.093 <frozen importlib._bootstrap>:1349(_find_and_load)
   844/11    0.004    0.000    1.025    0.093 <frozen importlib._bootstrap>:1304(_find_and_load_unlocked)
   804/14    0.003    0.000    1.022    0.073 <frozen importlib._bootstrap>:911(_load_unlocked)
   683/12    0.002    0.000    1.022    0.085 <frozen importlib._bootstrap_external>:988(exec_module)
  2013/24    0.002    0.000    1.019    0.042 <frozen importlib._bootstrap>:480(_call_with_frames_removed)
   668/79    0.001    0.000    0.903    0.011 {built-in method builtins.__import__}
 1017/106    0.002    0.000    0.895    0.008 <frozen importlib._bootstrap>:1390(_handle_fromlist)
    89953    0.093    0.000    0.533    0.000 <..>/lib/python3.12/site-packages/numpy/core/fromnumeric.py:2692(max)
        1    0.000    0.000    0.482    0.482 <..>/lib/python3.12/site-packages/matplotlib/pyplot.py:1(<module>)
    90055    0.143    0.000    0.441    0.000 <..>/lib/python3.12/site-packages/numpy/core/fromnumeric.py:71(_wrapreduction)
1953/1833    0.027    0.000    0.433    0.000 {built-in method builtins.__build_class__}
      105    0.003    0.000    0.301    0.003 <..>/lib/python3.12/site-packages/matplotlib/artist.py:159(_update_set_signature_and_docstring)

5.6.2. Comparison for \(N = 1001\)

Comparing the cumulative run time of ode.integrate in the OIF profile with that in the native profile (both shown above), we obtain the following:

OIF or native    Run time, seconds
oif              4.706
native           3.185

This corresponds to a performance penalty of about 48% for \(N = 1001\).
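The penalty is the relative difference between the two cumulative run times of ode.integrate. A minimal sketch of the arithmetic follows; `relative_penalty` is a helper name introduced here for illustration only.

```python
def relative_penalty(t_oif, t_native):
    """Return the extra OIF run time as a fraction of the native run time."""
    return (t_oif - t_native) / t_native


# Cumulative times of ode.integrate from the two profiles, in seconds.
penalty = relative_penalty(4.706, 3.185)
print(f"{100 * penalty:.1f}%")
```

The same helper can be applied to timings at other resolutions to obtain values comparable with the normalized plots.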

5.6.3. Comparison of normalized performance for different \(N\)

../_images/2024-01-31-ivp_burgers_perf_normalized.png

Fig. 5.6.1 Normalized runtime relative to the “native performance” of directly calling scipy.integrate.ode.dopri5 from Python for different grid resolutions. Values less than unity are due to differences in numerical methods and implementations.

../_images/2024-02-09-ivp_burgers_perf_normalized.png

Fig. 5.6.2 Normalized runtime relative to the “native performance” of directly calling scipy.integrate.ode.dopri5 from Python for different grid resolutions. Values less than unity are due to differences in numerical methods and implementations.

We can see from these two figures that for the resolution \(N = 10001\), where the computational workload is relatively large, the performance optimizations made here and in the previous section reduce the performance penalty from 50% to 20%, that is, by more than a factor of two.