1. Why Rust for Quantitative Finance
Python dominates quantitative finance thanks to its ecosystem (NumPy, pandas, scikit-learn) and rapid prototyping. But when the inner loop is a Monte Carlo simulation with \(10^6\) paths, or an order-book processor handling \(10^5\) events/second, CPython interpreter overhead becomes the bottleneck.
Comparison of Acceleration Approaches
| Approach | Speedup | Pros | Cons |
|---|---|---|---|
| Vectorised NumPy | 5–20× | Easy, no build step | Memory-hungry |
| Numba JIT | 30–80× | Simple decorator | Fragile compilation, limited types |
| Cython | 20–50× | Mature | Verbose syntax, manual memory management |
| C++ + pybind11 | 100–200× | Maximum performance | ⚠️ Segfaults, UB, complex builds |
| Rust + PyO3 | 100–200× | ✅ Safe, parallel, clean | Rust learning curve |
2. Concrete Example: Monte Carlo Rough Heston
Here is a baseline pure-Python Monte Carlo pricer (a standard Euler scheme; the rough-Heston parameter H is accepted but unused in this simplified version). With 5,000 paths and 100 time steps, it takes ~120 seconds:
```python
import numpy as np

def rh_mc_put_python(S, K, T, r, H, nu, rho, kappa, theta, v0,
                     n_paths=5000, n_steps=100):
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    payoffs = np.zeros(n_paths)
    for p in range(n_paths):
        S_t, V_t = S, v0
        for i in range(n_steps):
            Z1 = np.random.standard_normal()
            Z2 = rho * Z1 + np.sqrt(1 - rho**2) * np.random.standard_normal()
            V_t = max(V_t + kappa * (theta - V_t) * dt
                      + nu * np.sqrt(max(V_t, 0)) * sqrt_dt * Z2, 1e-8)
            S_t *= np.exp((r - 0.5 * V_t) * dt
                          + np.sqrt(max(V_t, 0)) * sqrt_dt * Z1)
        payoffs[p] = max(K - S_t, 0)
    return np.exp(-r * T) * np.mean(payoffs)
```
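For comparison, the "Vectorised NumPy" row of the table in section 1 can be applied to this same pricer: simulating all paths at once per time step removes the Python-level inner loop. A minimal sketch, using the same simplified Euler scheme as above (the function name `rh_mc_put_numpy` and the `seed` parameter are illustrative additions):

```python
import numpy as np

def rh_mc_put_numpy(S, K, T, r, H, nu, rho, kappa, theta, v0,
                    n_paths=5000, n_steps=100, seed=None):
    """Vectorised variant: one array operation per time step, no per-path loop."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    S_t = np.full(n_paths, S, dtype=float)
    V_t = np.full(n_paths, v0, dtype=float)
    for _ in range(n_steps):
        Z1 = rng.standard_normal(n_paths)
        Z2 = rho * Z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_paths)
        # Same update order as the scalar version: variance first, then spot
        V_t = np.maximum(V_t + kappa * (theta - V_t) * dt
                         + nu * np.sqrt(np.maximum(V_t, 0)) * sqrt_dt * Z2, 1e-8)
        S_t *= np.exp((r - 0.5 * V_t) * dt
                      + np.sqrt(np.maximum(V_t, 0)) * sqrt_dt * Z1)
    return np.exp(-r * T) * np.maximum(K - S_t, 0.0).mean()
```

This typically lands in the 5–20× range of the table: faster than the loop, but it still materialises full `n_paths`-sized arrays at every step, which is where the memory-hungry caveat comes from.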
And here is the same logic in Rust with PyO3 and Rayon for parallelisation:
```rust
use pyo3::prelude::*;
use rayon::prelude::*;
use rand::prelude::*;
use rand_distr::StandardNormal;

#[pyfunction]
fn rh_mc_put(
    spot: f64, k: f64, t: f64, r: f64, h: f64, nu: f64, rho: f64,
    kappa: f64, theta: f64, v0: f64, n_paths: usize, n_steps: usize,
) -> f64 {
    let payoffs: Vec<f64> = (0..n_paths)
        .into_par_iter() // ← Rayon: automatic parallelisation across cores
        .map(|_| {
            // simulate_path (one Euler path of the model) is defined elsewhere
            let (s_t, _) = simulate_path(spot, t, r, h, nu, rho,
                                         kappa, theta, v0, n_steps);
            (k - s_t).max(0.0)
        })
        .collect();
    let mean = payoffs.iter().sum::<f64>() / n_paths as f64;
    (-r * t).exp() * mean
}
```
3. Production Benchmarks
Here are the speedups measured on our production infrastructure (Apple M2 Pro, 12 cores):
Execution time per module (log scale)
| Module | Python | Rust | Speedup |
|---|---|---|---|
| Monte Carlo Rough Heston | 120s | 0.8s | 150× |
| Order Book Aggregation | 45ms | 0.23ms | 196× |
| Greeks (bump & reprice) | 8.2s | 52ms | 158× |
| Regime Detection (HMM) | 2.1s | 18ms | 117× |
| Portfolio Optimization | 340ms | 3.8ms | 89× |
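The "Greeks (bump & reprice)" row refers to central finite differences: reprice with the spot bumped up and down, reusing the same random draws on both bumps so the Monte Carlo noise cancels. A minimal self-contained sketch of the technique (the toy GBM pricer `mc_put_price` stands in for the Rust pricer; names are illustrative, not production code):

```python
import numpy as np

def mc_put_price(spot, strike=100.0, t=1.0, r=0.02, sigma=0.2,
                 n_paths=20000, seed=42):
    """Toy GBM Monte Carlo put pricer, stand-in for the real pricing engine."""
    rng = np.random.default_rng(seed)  # fixed seed => common random numbers
    z = rng.standard_normal(n_paths)
    s_t = spot * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return np.exp(-r * t) * np.maximum(strike - s_t, 0.0).mean()

def bump_and_reprice_delta(pricer, spot, h=0.01):
    """Central difference; the shared seed makes the MC noise cancel."""
    return (pricer(spot * (1 + h)) - pricer(spot * (1 - h))) / (2 * spot * h)

delta = bump_and_reprice_delta(mc_put_price, 100.0)
# A put delta lies in (-1, 0)
```

The bump-and-reprice pattern is embarrassingly parallel (one independent repricing per bumped input), which is why it parallelises so well under Rayon.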
4. Railway-Oriented Programming
Beyond raw performance, Rust brings a fundamental functional programming pattern for critical systems: Railway-Oriented Programming (ROP). This pattern, popularised by Scott Wlaschin, uses the Result<T, E> type for composable error handling.
Example of an ROP pipeline in Rust:

```rust
use anyhow::{Result, Context};

fn process_order(raw: &str) -> Result<ExecutedOrder> {
    let order = parse_order(raw)
        .context("Failed to parse order")?;       // ← Switch 1
    let validated = validate_order(order)
        .context("Order validation failed")?;     // ← Switch 2
    let priced = compute_price(&validated)
        .context("Pricing engine error")?;        // ← Switch 3
    let executed = execute_order(priced)
        .context("Execution failed")?;            // ← Switch 4
    Ok(executed)
}

// Call site — no try/catch: the error lives in the type
match process_order(raw_data) {
    Ok(order) => log::info!("Executed: {:?}", order),
    Err(e) => log::error!("Pipeline failed: {:?}", e),
}
```
The `?` operator propagates errors automatically, with no boilerplate and no unhandled exceptions.
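The same railway pattern can be sketched in plain Python, for readers who want the composability before reaching for Rust: a tiny `Result` type where each step either stays on the success track or short-circuits to the error track (all names here are illustrative, not a published library):

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err:
    error: str

Result = Union[Ok, Err]

def and_then(res: Result, f: Callable) -> Result:
    """Rust's `?` in function form: apply f on success, pass errors through."""
    return f(res.value) if isinstance(res, Ok) else res

# Toy pipeline steps
def parse_order(raw: str) -> Result:
    return Ok(float(raw)) if raw.replace(".", "", 1).isdigit() else Err("parse failed")

def validate_order(qty: float) -> Result:
    return Ok(qty) if qty > 0 else Err("validation failed")

res = and_then(and_then(parse_order("42"), validate_order), lambda q: Ok(q * 2))
# res == Ok(84.0)
```

The key property is the same as in the Rust version: once any step returns `Err`, every later step is skipped without a single `try`/`except`.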
Polarway: Our High-Performance Data Engine
Polarway is our data engine built on Polars, the DataFrame library written entirely in Rust that benchmarks 31–72× faster than pandas. Polarway adds features specific to high-frequency finance:
- 🔗 gRPC streaming for distributed pipelines
- 📊 Window functions optimised for financial rolling metrics
- 🔄 Lazy evaluation with optimised execution plan
- 💾 Columnar formats: Parquet, Arrow IPC, Delta Lake
```python
import polarway as pw

# Lazy pipeline with automatic query optimisation
pipeline = (
    pw.scan_parquet("trades/*.parquet")
    .filter(pw.col("volume") > 1000)
    .with_columns([
        pw.col("price").rolling_mean(window_size=60).alias("vwap_60s"),
        pw.col("price").pct_change().alias("returns"),
    ])
    .group_by_dynamic("timestamp", every="1m")
    .agg([
        pw.col("price").last().alias("close"),
        pw.col("volume").sum().alias("volume"),
        pw.col("returns").std().alias("realized_vol"),
    ])
)

# Parallel execution across all cores
df = pipeline.collect()  # ~50× faster than the equivalent pandas code
```
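For context, here is a rough single-threaded pandas equivalent of the pipeline above, the kind of baseline the ~50× figure is measured against (the synthetic `trades` frame and its column names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic trades stand-in for trades/*.parquet
trades = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-02 09:30", periods=600, freq="s"),
    "price": 100 + np.cumsum(np.random.default_rng(0).normal(0, 0.05, 600)),
    "volume": np.random.default_rng(1).integers(500, 5000, 600),
})

df = trades[trades["volume"] > 1000].copy()           # eager filter
df["vwap_60s"] = df["price"].rolling(60).mean()       # single-threaded rolling
df["returns"] = df["price"].pct_change()

bars = (
    df.set_index("timestamp")
      .resample("1min")                               # eager time bucketing
      .agg(close=("price", "last"),
           volume=("volume", "sum"),
           realized_vol=("returns", "std"))
)
```

Each pandas step materialises an intermediate frame, whereas the lazy pipeline hands the whole plan to the query optimiser before touching any data.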
📚 Polarway Documentation
Explore the complete documentation with examples and API reference.
5. Rust + WebAssembly: The Browser-Native Future
Beyond PyO3, Rust also compiles to WebAssembly (WASM) — enabling quantitative calculations to run directly in the browser with near-native performance:
6. Polarway: Architecture and Performance
Polarway combines the strengths of Polars (an Apache Arrow-based query engine) with HFT-specific extensions.
7. Conclusion & Resources
By migrating the critical 20% of our codebase to Rust with PyO3, we achieved speedups of 50× to 200× while keeping a clean Python API. Adding WebAssembly allows running the same computations in the browser.
- Rust + PyO3 = C++ performance + memory safety
- Rayon = trivial loop parallelisation
- Railway-Oriented Programming = composable error handling
- WASM = browser portability without recompilation
- Polarway/Polars = DataFrames 31–72× faster than pandas
📚 Resources
- PyO3 Documentation — The Rust ↔ Python bridge
- Polars — Rust DataFrame engine
- Polarway Documentation — Our data engine
- Rust WASM Book — WebAssembly with Rust
- Railway-Oriented Programming — Scott Wlaschin
🚀 Try HFThot Lab
Experiment with our Rust-accelerated labs: Monte Carlo, Greeks, Portfolio Optimization...