

Rust in Python: How We Achieved 150× with PyO3 & WebAssembly

1. Why Rust for Quantitative Finance

Python dominates quantitative finance thanks to its ecosystem (NumPy, pandas, scikit-learn) and rapid prototyping. But when the inner loop is a Monte Carlo simulation with \(10^6\) paths, or an order-book processor handling \(10^5\) events/second, CPython interpreter overhead becomes the bottleneck.
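To see that overhead in isolation, here is a minimal stdlib-only micro-benchmark (illustrative, not our production measurement): the same reduction written as a Python-level loop and as the C-implemented `sum()` builtin. Both do identical arithmetic, so the ratio between the two timings is pure interpreter overhead.

```python
import timeit

# A pure-Python loop pays bytecode dispatch and float boxing on every
# iteration; the C-implemented sum() builtin pays them once per call.
data = [0.1] * 1_000_000

def loop_sum(xs):
    total = 0.0
    for x in xs:          # one bytecode dispatch + one boxed float per element
        total += x
    return total

t_loop = timeit.timeit(lambda: loop_sum(data), number=5)
t_builtin = timeit.timeit(lambda: sum(data), number=5)

print(f"loop: {t_loop:.3f}s  builtin sum: {t_builtin:.3f}s  "
      f"ratio: {t_loop / t_builtin:.1f}x")
```

The exact ratio depends on the machine and CPython version; the point is that the arithmetic itself is a small fraction of the loop's cost.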

💡 Key finding: only 8% of Python execution time is useful computation. The remaining 92% is interpreter overhead (bytecode dispatch, object boxing, function calls).
Where CPU time goes in Python: CPython bytecode dispatch 45%, object boxing (malloc) 28%, np.random calls 18%, actual computation 8%, GC and cache effects 1%. Rust eliminates the 92% of overhead: Rust + PyO3 runs 150× faster (0.8 s vs 120 s).
Figure 1: CPU time breakdown in a Python Monte Carlo simulation. Rust eliminates interpreter overhead.

Comparison of Acceleration Approaches

| Approach | Speedup | Pros | Cons |
|---|---|---|---|
| Vectorised NumPy | 5–20× | Easy, no build step | Memory-hungry |
| Numba JIT | 30–80× | Simple decorator | Fragile compilation, limited types |
| Cython | 20–50× | Mature | Verbose syntax, manual memory management |
| C++ + pybind11 | 100–200× | Maximum performance | ⚠️ Segfaults, UB, complex builds |
| Rust + PyO3 | 100–200× | ✅ Safe, parallel, clean | Rust learning curve |

2. Concrete Example: Monte Carlo Rough Heston

Here is the classic Python implementation of a Monte Carlo Rough Heston pricer. With 5,000 paths and 100 time steps, it takes ~120 seconds:

import numpy as np

def rh_mc_put_python(S, K, T, r, H, nu, rho, kappa, theta, v0,
                     n_paths=5000, n_steps=100):
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    
    payoffs = np.zeros(n_paths)
    for p in range(n_paths):
        S_t, V_t = S, v0
        for i in range(n_steps):
            Z1 = np.random.standard_normal()
            Z2 = rho * Z1 + np.sqrt(1 - rho**2) * np.random.standard_normal()
            V_t = max(V_t + kappa * (theta - V_t) * dt
                      + nu * np.sqrt(max(V_t, 0)) * sqrt_dt * Z2, 1e-8)
            S_t *= np.exp((r - 0.5 * V_t) * dt
                          + np.sqrt(max(V_t, 0)) * sqrt_dt * Z1)
        payoffs[p] = max(K - S_t, 0)
    
    return np.exp(-r * T) * np.mean(payoffs)
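For comparison with the "Vectorised NumPy" rung of the table above, here is a hedged sketch of the same scheme with the path loop replaced by array operations, so the interpreter executes n_steps iterations instead of n_paths × n_steps. The function name `rh_mc_put_numpy` and its `rng` parameter are our illustrative additions; the update order mirrors the reference loop exactly (including the fact that H is accepted but unused).

```python
import numpy as np

def rh_mc_put_numpy(S, K, T, r, H, nu, rho, kappa, theta, v0,
                    n_paths=5000, n_steps=100, rng=None):
    # Vectorised over paths: each time step updates all paths at once.
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    S_t = np.full(n_paths, float(S))
    V_t = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        Z1 = rng.standard_normal(n_paths)
        Z2 = rho * Z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_paths)
        # Same update order as the loop version: V_t first, then S_t uses it.
        V_t = np.maximum(V_t + kappa * (theta - V_t) * dt
                         + nu * np.sqrt(np.maximum(V_t, 0)) * sqrt_dt * Z2,
                         1e-8)
        S_t *= np.exp((r - 0.5 * V_t) * dt
                      + np.sqrt(np.maximum(V_t, 0)) * sqrt_dt * Z1)
    return np.exp(-r * T) * np.mean(np.maximum(K - S_t, 0.0))
```

This buys an order of magnitude, at the cost of allocating full `n_paths`-sized arrays at every step; it is the memory-hungry middle ground before dropping to Rust.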

And here is the same logic in Rust with PyO3 and Rayon for parallelisation:

use pyo3::prelude::*;
use rayon::prelude::*;
use rand::prelude::*;
use rand_distr::StandardNormal;

// Euler scheme for a single path, mirroring the Python loop body.
fn simulate_path(
    spot: f64, t: f64, r: f64, _h: f64, nu: f64, rho: f64,
    kappa: f64, theta: f64, v0: f64, n_steps: usize,
) -> (f64, f64) {
    let mut rng = thread_rng();
    let dt = t / n_steps as f64;
    let sqrt_dt = dt.sqrt();
    let (mut s_t, mut v_t) = (spot, v0);
    for _ in 0..n_steps {
        let z1: f64 = rng.sample(StandardNormal);
        let z2: f64 = rho * z1
            + (1.0 - rho * rho).sqrt() * rng.sample::<f64, _>(StandardNormal);
        v_t = (v_t + kappa * (theta - v_t) * dt
            + nu * v_t.max(0.0).sqrt() * sqrt_dt * z2)
            .max(1e-8);
        s_t *= ((r - 0.5 * v_t) * dt + v_t.max(0.0).sqrt() * sqrt_dt * z1).exp();
    }
    (s_t, v_t)
}

#[pyfunction]
fn rh_mc_put(
    spot: f64, k: f64, t: f64, r: f64, h: f64, nu: f64, rho: f64,
    kappa: f64, theta: f64, v0: f64, n_paths: usize, n_steps: usize,
) -> f64 {
    let payoffs: Vec<f64> = (0..n_paths)
        .into_par_iter()  // ← Rayon: automatic parallelisation
        .map(|_| {
            let (s_t, _) = simulate_path(spot, t, r, h, nu, rho,
                                         kappa, theta, v0, n_steps);
            (k - s_t).max(0.0)
        })
        .collect();

    let mean = payoffs.iter().sum::<f64>() / n_paths as f64;
    (-r * t).exp() * mean
}
✅ Result: The Rust version executes in 0.8 seconds, 150× faster than Python, and matches the Python price to within Monte Carlo error.

3. Production Benchmarks

Here are the speedups measured on our production infrastructure (Apple M2 Pro, 12 cores):

Figure: execution time per module (log scale). For the Rough Heston MC: Python 120 s, Numba 36 s, Rust 0.8 s ⚡

| Module | Python | Rust | Speedup |
|---|---|---|---|
| Monte Carlo Rough Heston | 120 s | 0.8 s | 150× |
| Order Book Aggregation | 45 ms | 0.23 ms | 196× |
| Greeks (bump & reprice) | 8.2 s | 52 ms | 158× |
| Regime Detection (HMM) | 2.1 s | 18 ms | 117× |
| Portfolio Optimization | 340 ms | 3.8 ms | 89× |
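The "Greeks (bump & reprice)" row refers to finite-difference Greeks: reprice with a bumped input and take a difference quotient. Here is a minimal stdlib-only illustration using a closed-form Black–Scholes put (the helpers `bs_put` and `delta_bump_reprice` are ours, not from the production codebase, where the bumped pricer would be the Monte Carlo one above):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(S, K, T, r, sigma):
    # Closed-form Black-Scholes European put.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)

def delta_bump_reprice(pricer, S, *args, h=1e-4):
    # Central difference: reprice at S+h and S-h, divide by the bump width.
    return (pricer(S + h, *args) - pricer(S - h, *args)) / (2 * h)

delta = delta_bump_reprice(bs_put, 100.0, 100.0, 1.0, 0.02, 0.2)
```

Each Greek costs two extra repricings, which is why a 150× faster pricer translates almost directly into the 158× speedup in the table.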

4. Railway-Oriented Programming

Beyond raw performance, Rust brings a fundamental functional programming pattern for critical systems: Railway-Oriented Programming (ROP). This pattern, popularised by Scott Wlaschin, uses the Result<T, E> type for composable error handling.

Figure: the two-track railway: 1 validate() → 2 transform() → 3 compute() → 4 persist(). Each function returns Result<T, E>; errors divert onto the Err track.

Example of an ROP pipeline in Rust:

use anyhow::{Result, Context};

fn process_order(raw: &str) -> Result<ExecutedOrder> {
    let order = parse_order(raw)
        .context("Failed to parse order")?;           // ← Switch 1
    
    let validated = validate_order(order)
        .context("Order validation failed")?;          // ← Switch 2
    
    let priced = compute_price(&validated)
        .context("Pricing engine error")?;             // ← Switch 3
    
    let executed = execute_order(priced)
        .context("Execution failed")?;                 // ← Switch 4
    
    Ok(executed)
}

// Call site: no try/catch, the error lives in the type
match process_order(raw_data) {
    Ok(order) => log::info!("Executed: {:?}", order),
    Err(e) => log::error!("Pipeline failed: {:?}", e),
}
🚃 Railway-Oriented Programming: Each step can fail and "derail" to the error track. Rust's ? operator automatically propagates errors without verbosity or unhandled exceptions.
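For readers coming from Python, the two-track idea can be approximated without Rust. This is an illustrative sketch (the `Ok`/`Err` classes and the order helpers are hypothetical, not a library API); `and_then` plays the role of Rust's `?` operator:

```python
from dataclasses import dataclass

@dataclass
class Ok:
    value: object
    def and_then(self, f):
        return f(self.value)   # success track: run the next step

@dataclass
class Err:
    error: str
    def and_then(self, f):
        return self            # error track: skip every later step

def parse_qty(raw):
    return Ok(int(raw)) if raw.isdigit() else Err(f"not a number: {raw!r}")

def validate(qty):
    return Ok(qty) if qty > 0 else Err("quantity must be positive")

def price(qty):
    return Ok(qty * 101.5)     # hypothetical fill price

def process(raw):
    # The whole pipeline is one expression; any Err short-circuits the rest.
    return parse_qty(raw).and_then(validate).and_then(price)
```

Unlike Rust, nothing forces the caller to check the result; the `Result<T, E>` type plus the `?` operator is what makes the pattern safe rather than merely tidy.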

Polarway: Our High-Performance Data Engine

Polarway is our data engine built on Polars, the 100% Rust DataFrame library that is 31–72× faster than pandas. Polarway adds features specific to high-frequency finance:

import polarway as pw

# Lazy pipeline with automatic query optimisation
pipeline = (
    pw.scan_parquet("trades/*.parquet")
    .filter(pw.col("volume") > 1000)
    .with_columns([
        pw.col("price").rolling_mean(window_size=60).alias("vwap_60s"),
        pw.col("price").pct_change().alias("returns"),
    ])
    .group_by_dynamic("timestamp", every="1m")
    .agg([
        pw.col("price").last().alias("close"),
        pw.col("volume").sum().alias("volume"),
        pw.col("returns").std().alias("realized_vol"),
    ])
)

# Parallel execution across all cores
df = pipeline.collect()  # ~50× faster than the equivalent pandas code

📚 Polarway Documentation

Explore the complete documentation with examples and API reference.


5. Rust + WebAssembly: The Browser-Native Future

Beyond PyO3, Rust also compiles to WebAssembly (WASM) — enabling quantitative calculations to run directly in the browser with near-native performance:

The 4 KB hft_wasm_compute.wasm module runs at near-native speed on any WebAssembly runtime: Chrome V8, Firefox SpiderMonkey, Safari JavaScriptCore, Edge, and Node.js. Exposed functions: black_scholes(), implied_vol(), monte_carlo(), greeks().
Figure 2: WASM architecture — the same Rust code runs in the browser with native performance.

6. Polarway: Architecture and Performance

Polarway combines the strengths of Polars (Apache Arrow query engine) with HFT-specific extensions:

The stack, top to bottom: 🐍 Python API (polarway, polars) → PyO3 FFI bridge → 🦀 Polars query engine in Rust (lazy evaluation, predicate pushdown, parallel execution, SIMD vectorisation, memory-mapped I/O) → Apache Arrow columnar format (Parquet, Arrow IPC, Delta Lake, DuckDB). Overall: 31–72× faster than pandas.
Figure 3: Polarway stack — from Python to optimised I/O via Rust and Apache Arrow.
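To make "lazy evaluation" and "predicate pushdown" concrete, here is a toy Python sketch (our illustration, not Polars internals): operations are recorded into a plan rather than executed, and filters are moved ahead of the other steps when the plan finally runs.

```python
class LazyFrame:
    """Toy lazy pipeline: records operations, runs them only on collect()."""
    def __init__(self, rows):
        self.rows = rows
        self.plan = []                      # recorded, not executed

    def filter(self, pred):
        self.plan.append(("filter", pred))
        return self

    def select(self, *cols):
        self.plan.append(("select", cols))
        return self

    def collect(self):
        # "Pushdown": stable sort puts filters first, so later (more
        # expensive) steps only touch the surviving rows.
        plan = sorted(self.plan, key=lambda op: op[0] != "filter")
        rows = self.rows
        for kind, arg in plan:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows

trades = [{"price": 101.0, "volume": 2000},
          {"price": 99.5, "volume": 10}]
out = (LazyFrame(trades)
       .select("price", "volume")
       .filter(lambda r: r["volume"] > 1000)
       .collect())
```

In the real engine the optimiser does far more (projection pushdown, common-subplan caching, streaming), but the principle is the same: because nothing executes until `collect()`, the whole query is visible and can be rewritten before any I/O happens.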

7. Conclusion & Resources

By migrating the critical 20% of our codebase to Rust with PyO3, we achieved speedups of 50× to 200× while keeping a clean Python API. Adding WebAssembly allows running the same computations in the browser.

🎯 Takeaways:
  • Rust + PyO3 = C++ performance + memory safety
  • Rayon = trivial loop parallelisation
  • Railway-Oriented Programming = composable error handling
  • WASM = browser portability without recompilation
  • Polarway/Polars = DataFrames 31-72× faster than pandas

📚 Resources

🚀 Try HFThot Lab

Experiment with our Rust-accelerated labs: Monte Carlo, Greeks, Portfolio Optimization...
