Python Acceleration

Accelerate Python Models 2871x with AADC

Prototype in Python, achieve C++ performance. Accelerate your existing Python pricing models 2871x with minimal code changes — no rewrites required.

2871x speedup with just 77 extra lines of code
Iterate in Python at C++ speeds
AADC computes Greeks 1.9x faster than hand-optimised C++

Performance Results

Version          Lines of Code   Execution Time   Speedup
Basic Python     775             32 min           1x (baseline)
NumPy            781             18.8s            102x
C++ Optimised    880             1.30s            1483x
Python + AADC    852 (+77)       0.67s            2871x

Benchmark Configuration

10 trades × 100K scenarios × 252 timesteps with 8 threads — all Greeks (Delta, Rho, Vega) computed.
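The Speedup column is the Basic Python time divided by each version's time. A quick Python check, assuming the 32 min baseline is exactly 1,920 s, recomputes it from the rounded timings; each ratio lands within 1% of the published figure, which is consistent with the timings having been rounded. It also spells out the per-pricing workload of the configuration above.

```python
# Assumption: the "32 min" Basic Python baseline is taken as exactly 1920 s.
BASELINE_S = 32 * 60

published = {
    # version: (execution time in seconds, published speedup)
    "NumPy": (18.8, 102),
    "C++ Optimised": (1.30, 1483),
    "Python + AADC": (0.67, 2871),
}

def speedup(seconds: float) -> float:
    """Speedup relative to the Basic Python baseline."""
    return BASELINE_S / seconds

for name, (seconds, reported) in published.items():
    print(f"{name:14s} {speedup(seconds):6.0f}x  (published {reported}x)")

# Workload of one full pricing: trades x scenarios x timesteps
path_steps = 10 * 100_000 * 252
print(f"{path_steps:,} GBM time steps per pricing")
```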

AADC vs Hand-Optimised C++

For pricing-only, C++ edges out AADC slightly
With Greeks, AADC is 1.9x faster than hand-optimised C++
Traditional bump-and-revalue adds +495% overhead for Greeks
AADC adds only +26% overhead while computing all Greeks

See the Difference

The Same Logic, Four Ways

Each version below is split into the same four functions. The full source code is available on request if you want to run it yourself.

  • gbm_constants
  • simulate_path
  • price_asian_option
  • price_with_greeks
basic.py - gbm_constants
32 min 775 total lines
import math

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt
basic.py - simulate_path
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_path, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = 0.0
    for t in range(num_timesteps):
        price = price * math.exp(drift + vol_sqrt_dt * Z_path[t])
        running_sum += price

    average = running_sum / num_timesteps
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff
basic.py - price_asian_option
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option - loop over all scenarios."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    payoff_sum = 0.0
    for scenario in range(num_scenarios):
        payoff_sum += simulate_path(S0, K, r, T, drift, vol_sqrt_dt,
                                    Z[scenario], num_timesteps)

    return payoff_sum / num_scenarios
basic.py - price_with_greeks
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Greeks require 4 full pricings - 4x the compute cost!
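All three bump-and-revalue versions (Basic Python, NumPy, C++) rely on one detail: the bumped pricings reuse the same random numbers Z (common random numbers). With fresh draws, the Monte Carlo noise in the price difference would be divided by a 1e-6 bump and swamp the Greek. The stdlib-only sketch below contrasts the two under assumed parameters (S0 = K = 100, r = 5%, sigma = 20%, 12 steps, 2,000 scenarios; not the benchmark setup); `asian_price` and `normals` are hypothetical helpers written for this illustration.

```python
import math
import random

def asian_price(S0, K, r, sigma, T, Z):
    """Monte Carlo price of an arithmetic-average Asian call (same model as basic.py)."""
    n = len(Z[0])
    dt = T / n
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    total = 0.0
    for z_path in Z:
        price, running_sum = S0, 0.0
        for z in z_path:
            price *= math.exp(drift + vol_sqrt_dt * z)
            running_sum += price
        total += max(running_sum / n - K, 0.0)
    return math.exp(-r * T) * total / len(Z)

def normals(seed, num_scenarios, num_timesteps):
    """Standard normal draws, one list per scenario (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_timesteps)]
            for _ in range(num_scenarios)]

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
bump = 1e-6
Z = normals(seed=1, num_scenarios=2000, num_timesteps=12)
base = asian_price(S0, K, r, sigma, T, Z)

# Common random numbers: the bumped pricing reuses exactly the same draws.
delta_crn = (asian_price(S0 + bump, K, r, sigma, T, Z) - base) / bump

# Fresh draws for the bumped pricing: the Monte Carlo noise gets divided by the bump.
Z_fresh = normals(seed=2, num_scenarios=2000, num_timesteps=12)
delta_fresh = (asian_price(S0 + bump, K, r, sigma, T, Z_fresh) - base) / bump

print(f"delta with common random numbers: {delta_crn:.4f}")
print(f"delta with fresh random numbers:  {delta_fresh:.1f}")
```

The first estimate is a sensible Delta; the second is dominated by sampling noise, which is why every version above passes the same Z to all four pricings.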
numpy_pricer.py - gbm_constants
18.8s 781 total lines
import numpy as np

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Same as basic - using np.sqrt instead of math.sqrt
numpy_pricer.py - simulate_path
def simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps):
    """Simulate all GBM paths at once using NumPy vectorization."""
    # Z is (num_scenarios, num_timesteps)
    log_increments = drift + vol_sqrt_dt * Z  # Vectorized across all scenarios
    log_prices = np.cumsum(log_increments, axis=1)
    prices = S0 * np.exp(log_prices)

    # Arithmetic average of each path (the Asian option's underlying)
    averages = np.sum(prices, axis=1) / num_timesteps

    payoffs = np.maximum(averages - K, 0.0)
    discount = np.exp(-r * T)
    return discount * payoffs

# Vectorized across scenarios: no Python loop over paths (102x faster than Basic Python overall)
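The NumPy version drops the inner time loop by working in log space: multiplying the price by exp(x_1), then exp(x_2), and so on equals S0 times exp of the running sum x_1 + x_2 + ..., which is what np.cumsum computes along axis 1. A stdlib sketch of that identity for one path (helper names and inputs are hypothetical) shows the two formulations agree to floating-point precision.

```python
import math
from itertools import accumulate

def average_price_loop(S0, drift, vol_sqrt_dt, z_path):
    """Basic Python formulation: update the price one step at a time."""
    price, running_sum = S0, 0.0
    for z in z_path:
        price = price * math.exp(drift + vol_sqrt_dt * z)
        running_sum += price
    return running_sum / len(z_path)

def average_price_cumsum(S0, drift, vol_sqrt_dt, z_path):
    """NumPy-style formulation: cumulative sum of log-increments, then exp.
    itertools.accumulate stands in for np.cumsum(..., axis=1) on a single row."""
    log_prices = accumulate(drift + vol_sqrt_dt * z for z in z_path)
    prices = [S0 * math.exp(lp) for lp in log_prices]
    return sum(prices) / len(prices)

z_path = [0.3, -1.2, 0.8, 0.05, -0.4, 1.1]  # made-up normal draws
loop_avg = average_price_loop(100.0, 0.0001, 0.0126, z_path)
cumsum_avg = average_price_cumsum(100.0, 0.0001, 0.0126, z_path)
print(loop_avg, cumsum_avg)
```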
numpy_pricer.py - price_asian_option
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option using NumPy vectorization."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    # Single vectorized call - no Python loop over scenarios
    discounted_payoffs = simulate_paths_vectorized(
        S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps
    )

    return np.mean(discounted_payoffs)

# One vectorized call replaces the per-scenario loop (18.8s vs 32 min end to end)
numpy_pricer.py - price_with_greeks
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Still 4 pricings needed - Greeks add +274% overhead
aadc_pricer.py - gbm_constants
0.67s 852 total lines
import aadc
import numpy as np

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Identical to the NumPy version: AADC works with regular Python code!
aadc_pricer.py - simulate_path
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_vals, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = aadc.idouble(0.0)                                  # AADC
    for t in range(num_timesteps):
        price = price * np.exp(drift + vol_sqrt_dt * Z_vals[t])
        running_sum = running_sum + price

    average = running_sum / num_timesteps
    payoff = np.maximum(average - K, 0.0)
    discount = np.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff

# Main change: running_sum starts as an aadc.idouble, which enables AAD (np.exp/np.maximum replace math.exp/max)
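`aadc.idouble` is an active type: arithmetic on it is recorded so that derivatives can later be propagated backwards through the calculation. To illustrate the principle only, here is a toy reverse-mode tape in pure Python; the `Active` class and the `exp`, `max0` and `backward` functions are hypothetical names for this sketch. AADC itself compiles the recorded operations into a kernel rather than walking a Python tape, so this shows the idea, not its implementation. Running the same path steps as `simulate_path` forward once and backward once yields Delta, Rho and Vega together, and a central bump confirms them (the path, strike and market data are made up).

```python
import math

class Active:
    """Toy active double: every operation is appended to a global tape so one
    reverse sweep yields the derivative w.r.t. every input (illustration only)."""
    tape = []

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.parents = parents  # ((parent node, local partial derivative), ...)
        self.adjoint = 0.0
        Active.tape.append(self)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Active) else Active(x)

    def __add__(self, other):
        other = Active.lift(other)
        return Active(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __sub__(self, other):
        other = Active.lift(other)
        return Active(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Active.lift(other)
        return Active(self.value * other.value,
                      ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

    def __truediv__(self, k):  # division by a plain number is all this sketch needs
        return Active(self.value / k, ((self, 1.0 / k),))

def exp(x):
    e = math.exp(x.value)
    return Active(e, ((x, e),))

def max0(x):
    """max(x, 0): derivative 1 above the kink, 0 below."""
    return Active(max(x.value, 0.0), ((x, 1.0 if x.value > 0.0 else 0.0),))

def backward(output):
    """Reverse sweep: nodes sit on the tape in creation order, so walking it
    backwards reaches each node only after all of its consumers."""
    output.adjoint = 1.0
    for node in reversed(Active.tape):
        for parent, local in node.parents:
            parent.adjoint += local * node.adjoint

def discounted_payoff(S0, r, sigma, K, T, z_path):
    """Same steps as simulate_path above, on Active inputs."""
    n = len(z_path)
    dt = T / n
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    price, running_sum = S0, Active(0.0)
    for z in z_path:
        price = price * exp(drift + vol_sqrt_dt * z)
        running_sum = running_sum + price
    payoff = max0(running_sum / n - K)
    return exp(r * (-T)) * payoff

Z_PATH = [0.3, -1.2, 0.8, 0.05, -0.4, 1.1, 0.6, -0.2]  # made-up normal draws

def price_only(S0, r, sigma, K=95.0, T=1.0):
    """Value only, used for the bump check."""
    Active.tape.clear()
    return discounted_payoff(Active(S0), Active(r), Active(sigma), K, T, Z_PATH).value

Active.tape.clear()
S0, r, sigma = Active(100.0), Active(0.05), Active(0.2)
out = discounted_payoff(S0, r, sigma, K=95.0, T=1.0, z_path=Z_PATH)
backward(out)  # one reverse sweep gives all three sensitivities
print(f"payoff={out.value:.4f} delta={S0.adjoint:.4f} "
      f"rho={r.adjoint:.4f} vega={sigma.adjoint:.4f}")

h = 1e-5  # central-difference check of Delta
print((price_only(100.0 + h, 0.05, 0.2) - price_only(100.0 - h, 0.05, 0.2)) / (2 * h))
```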
aadc_pricer.py - price_asian_option
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Record the pricing kernel once with AADC, then evaluate it across all scenarios."""
    workers = aadc.ThreadPool(num_threads)                           # AADC

    # --- Record computation graph ---
    funcs = aadc.Functions()                                         # AADC
    funcs.start_recording()                                          # AADC

    # Active variables (use idouble instead of float)
    S0_v    = aadc.idouble(S0);    S0_arg    = S0_v.mark_as_input()   # AADC
    r_v     = aadc.idouble(r);     r_arg     = r_v.mark_as_input()    # AADC
    sigma_v = aadc.idouble(sigma); sigma_arg = sigma_v.mark_as_input() # AADC
    K_v     = aadc.idouble(K);     K_arg     = K_v.mark_as_input_no_diff()  # AADC
    T_v     = aadc.idouble(T);     T_arg     = T_v.mark_as_input_no_diff()  # AADC

    # ... record path simulation (elided; defines discounted_payoff and the `inputs` used below) ...

    payoff_res = discounted_payoff.mark_as_output()                  # AADC
    funcs.stop_recording()                                           # AADC

    # Evaluate vectorized across scenarios
    request = {payoff_res: [S0_arg, r_arg, sigma_arg]}               # AADC
    results = aadc.evaluate(funcs, request, inputs, workers)         # AADC

    return results, payoff_res, S0_arg, r_arg, sigma_arg
aadc_pricer.py - price_with_greeks
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Compute price and Greeks via AAD (1 forward + 1 adjoint pass)."""
    results, payoff_res, S0_arg, r_arg, sigma_arg = price_asian_option(
        S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads
    )

    # --- Extract results ---                                        # AADC
    discounted_payoffs = results[0][payoff_res]                      # AADC
    price = float(np.mean(discounted_payoffs))                       # AADC

    # Greeks from single adjoint pass (no extra pricings needed!)    # AADC
    delta = float(np.mean(results[1][payoff_res][S0_arg]))           # AADC
    rho   = float(np.mean(results[1][payoff_res][r_arg]))            # AADC
    vega  = float(np.mean(results[1][payoff_res][sigma_arg]))        # AADC

    return price, delta, rho, vega

# All Greeks computed in ONE pass - +31% overhead vs +593%!
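The Greeks returned by `price_with_greeks` are averages of per-scenario adjoints (the `np.mean` over `results[1]`). For Delta the per-path derivative can be written by hand, which makes a handy cross-check for an AAD integration: every simulated price is proportional to S0, so dA/dS0 = A/S0 and the pathwise Delta is e^(-rT) · 1{A > K} · A/S0. The stdlib sketch below, with assumed market data (not the benchmark's) and hypothetical helper names, compares that estimator with a per-path forward bump on the same draws.

```python
import math
import random

def path_average(S0, drift, vol_sqrt_dt, z_path):
    """Arithmetic average of one simulated GBM path (as in simulate_path)."""
    price, running_sum = S0, 0.0
    for z in z_path:
        price *= math.exp(drift + vol_sqrt_dt * z)
        running_sum += price
    return running_sum / len(z_path)

def delta_estimates(S0, K, r, sigma, T, Z, bump=1e-6):
    """Return (pathwise delta, bumped delta), both averaged over scenarios."""
    n = len(Z[0])
    dt = T / n
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    disc = math.exp(-r * T)
    pathwise = bumped = 0.0
    for z_path in Z:
        A = path_average(S0, drift, vol_sqrt_dt, z_path)
        A_up = path_average(S0 + bump, drift, vol_sqrt_dt, z_path)
        # Per-path derivative of disc * max(A - K, 0): dA/dS0 = A / S0 above the strike
        pathwise += disc * (A / S0 if A > K else 0.0)
        # Per-path forward bump reusing the same draws
        bumped += disc * (max(A_up - K, 0.0) - max(A - K, 0.0)) / bump
    return pathwise / len(Z), bumped / len(Z)

rng = random.Random(7)
Z = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(2000)]
pathwise_delta, bumped_delta = delta_estimates(100.0, 100.0, 0.05, 0.2, 1.0, Z)
print(f"pathwise delta: {pathwise_delta:.6f}")
print(f"bumped delta:   {bumped_delta:.6f}")
```

The two agree closely because away from the strike each path's average is exactly linear in S0; the pathwise form needs no extra pricing, which is the saving AAD generalises to every input.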
optimised.cpp - gbm_constants
1.30s 880 total lines
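#include <algorithm>  // std::max
#include <cmath>      // std::exp, std::sqrt
#include <cstddef>    // size_t

// Compute GBM simulation constants (returned through the reference parameters).
void gbm_constants(double r, double vol, double T, size_t num_timesteps,
                   double& dt, double& sqrt_dt, double& drift, double& vol_sqrt_dt) {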
    dt = T / static_cast<double>(num_timesteps);
    sqrt_dt = std::sqrt(dt);
    drift = (r - 0.5 * vol * vol) * dt;
    vol_sqrt_dt = vol * sqrt_dt;
}
optimised.cpp - simulate_path
// Scalar version (portable, works on ARM/Apple Silicon).
// Simulates one GBM path and returns its undiscounted payoff;
// price_asian_option applies the discount once for all scenarios.
double simulate_path_scalar(double S0, double K, double drift, double vol_sqrt_dt,
                            const double* Z_row, size_t num_timesteps) {
    double price = S0;
    double running_sum = 0.0;

    for (size_t t = 0; t < num_timesteps; ++t) {
        price = price * std::exp(drift + vol_sqrt_dt * Z_row[t]);
        running_sum += price;
    }

    double average = running_sum / static_cast<double>(num_timesteps);
    double payoff = std::max(average - K, 0.0);
    return payoff;
}

// AVX2 SIMD version also available for x86-64 (~2x faster)
optimised.cpp - price_asian_option
// Core pricing function (USE_AVX2 selects the AVX2 or scalar path at compile time)
double price_asian_option(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps
) {
    double dt, sqrt_dt, drift, vol_sqrt_dt;
    gbm_constants(r, vol, T, num_timesteps, dt, sqrt_dt, drift, vol_sqrt_dt);

#if USE_AVX2
    // Broadcast constants to SIMD registers
    const __m256d drift_vec = _mm256_set1_pd(drift);
    const __m256d vol_sqrt_dt_vec = _mm256_set1_pd(vol_sqrt_dt);
    const __m256d S0_vec = _mm256_set1_pd(S0);
    size_t row_stride = (num_timesteps + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1);
#else
    size_t row_stride = num_timesteps;
#endif

    double payoff_sum = 0.0;
    for (size_t scenario = 0; scenario < num_scenarios; ++scenario) {
        const double* Z_row = Z + scenario * row_stride;
#if USE_AVX2
        payoff_sum += simulate_path_avx(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps,
                                        drift_vec, vol_sqrt_dt_vec, S0_vec);
#else
        payoff_sum += simulate_path_scalar(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps);
#endif
    }

    return std::exp(-r * T) * (payoff_sum / static_cast<double>(num_scenarios));
}
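In the AVX2 build each scenario's row of Z is padded: `(num_timesteps + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1)` rounds up to the next multiple of SIMD_WIDTH, and the bit trick only works when the width is a power of two. SIMD_WIDTH is defined outside this excerpt; a 256-bit AVX2 register holds four doubles, so 4 is the natural assumption. A Python mirror of the expression, using a hypothetical `padded_stride` helper:

```python
def padded_stride(num_timesteps: int, simd_width: int) -> int:
    """Round num_timesteps up to a multiple of simd_width (a power of two),
    mirroring (n + W - 1) & ~(W - 1) from the C++ above."""
    if simd_width <= 0 or simd_width & (simd_width - 1):
        raise ValueError("simd_width must be a power of two")
    # Python ints are unbounded, so ~(W - 1) acts like the size_t mask for n >= 0.
    return (num_timesteps + simd_width - 1) & ~(simd_width - 1)

print(padded_stride(252, 4))  # 252: already a multiple of 4, no padding
print(padded_stride(252, 8))  # 256
print(padded_stride(253, 4))  # 256
```

With the benchmark's 252 timesteps and a width of 4, no padding is needed; a width of 8 would pad each row to 256.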
optimised.cpp - price_with_greeks
// Greeks via bump-and-revalue (requires 4 full pricings)
constexpr double BUMP_SIZE = 1e-6;

void price_with_greeks(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps,
    double& price, double& delta, double& rho, double& vega
) {
    price = price_asian_option(S0, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dS = price_asian_option(S0 + BUMP_SIZE, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dr = price_asian_option(S0, K, r + BUMP_SIZE, vol, T, Z, num_scenarios, num_timesteps);
    double p_dv = price_asian_option(S0, K, r, vol + BUMP_SIZE, T, Z, num_scenarios, num_timesteps);
    delta = (p_dS - price) / BUMP_SIZE;
    rho   = (p_dr - price) / BUMP_SIZE;
    vega  = (p_dv - price) / BUMP_SIZE;
}

// Even with AVX2 SIMD, Greeks add +582% overhead (7 evaluations)!

Key insight: AADC is 2871x faster than Basic Python and computes Greeks 28x faster than NumPy (0.67s vs 18.8s), outperforming hand-optimised C++ by 1.9x (0.67s vs 1.30s).

Greeks via AAD: one forward and one adjoint pass, adding +26% overhead versus +298% for NumPy bump-and-revalue.

Why This Approach Works

AADC doesn't change your computation: it records your existing calculation and compiles it into a kernel that is an exact replica of the original, just accelerated. The integration code handles switching inputs to AADC active types (aadc.idouble), recording setup, and kernel compilation, exactly the kind of repetitive, pattern-based work that's easy to validate.

For Prototyping

AI-assisted integration is excellent for quickly validating potential speedup on your actual models

For Production

Use MatLogica's AADC Toolkit with debugging support and automated scripts


Technical Details

  • Arithmetic average Asian option under GBM
  • Monte Carlo simulation with Greeks (Delta, Rho, Vega)
  • Basic Python: 775 lines, 32 min
  • AADC Python: 852 lines, 0.67s
  • 10 trades × 100K scenarios × 252 timesteps
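For reference, the model all four implementations compute, transcribed from the code above (N timesteps, M scenarios, standard normal draws Z_i; the benchmark uses N = 252 and M = 100K):

```latex
\Delta t = \frac{T}{N}, \qquad
S_{t_{i+1}} = S_{t_i}\,\exp\!\Big(\big(r - \tfrac{1}{2}\sigma^2\big)\Delta t + \sigma\sqrt{\Delta t}\,Z_i\Big),
\quad i = 0,\dots,N-1

A = \frac{1}{N}\sum_{i=1}^{N} S_{t_i}, \qquad
V = e^{-rT}\,\mathbb{E}\big[\max(A - K, 0)\big]
  \approx \frac{e^{-rT}}{M}\sum_{m=1}^{M}\max\big(A^{(m)} - K, 0\big)

\text{Delta} = \frac{\partial V}{\partial S_0}, \qquad
\text{Rho} = \frac{\partial V}{\partial r}, \qquad
\text{Vega} = \frac{\partial V}{\partial \sigma}
```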

Business Impact

  • Rapid model acceleration: integrate AADC in hours, not weeks
  • Prototype in Python, achieve production performance immediately
  • Accelerate existing models without rewriting from scratch
  • Build production systems in Python without performance compromises

Ready to Accelerate Your Python Models?

See what AADC can do for your specific use case. Schedule a demo or get the Claude configuration files to try it yourself.

Tags: Python, acceleration, Monte Carlo, AADC integration, performance, Greeks, prototyping

Frequently Asked Questions

Can Python really achieve C++ performance with AADC?
Yes. AADC Python achieves a 2871x speedup over Basic Python, and when computing Greeks, AADC is 1.9x faster than hand-optimised C++. Teams are building production systems in Python with AADC, iterating at speeds that would normally require hand-optimised C++.
What's the recommended path for evaluating AADC?
Prototype in Python, observe performance on a real model, then harden for production with the toolkit. This lets you see the actual speedup on your own models before any commitment, and some teams go on to build production systems this way.
How much code change is required for AADC acceleration?
Only 77 additional lines of code (775 to 852) are needed for the 2871x speedup. The integration code switches inputs to AADC active types (aadc.idouble), sets up recording, and handles kernel compilation: boilerplate work that's straightforward to validate.
How does AADC compare to hand-optimised C++ for Greeks calculation?
AADC is 1.9x faster than hand-optimised C++ for computing Greeks (0.67s vs 1.30s). While C++ may edge out AADC slightly for pricing-only, AADC dramatically outperforms when Greeks are required. Traditional C++ adds +495% overhead for Greeks via bump-and-revalue, while AADC adds only +26%.