Zero-Cost Async: From Python asyncio to Rust Tokio

1. The problem: asyncio's hidden costs

Python asyncio is designed for concurrent I/O. It's an excellent tool for web applications, but it hides a truth every data engineer runs into sooner or later: the GIL (Global Interpreter Lock) prevents CPU-bound Python code from running in parallel. And you don't discover this truth by reading documentation. You discover it at 2 AM, when PagerDuty alerts are firing and the Grafana dashboard shows a latency curve climbing like a staircase.

We learned this the hard way. Our first HFT ingestion pipeline was beautiful asyncio code: gather() everywhere, elegant coroutines, an architecture that looked like a textbook diagram. In staging, everything worked. In production, with 50 WebSockets open simultaneously and Polars DataFrames to deserialize between ticks, throughput plateaued at 70-80 queries per second. We added CPU cores and nothing changed. The profiler showed us the reality: 94% of execution time was sequential. The GIL had turned our beautiful concurrency into a queue.

I remember the exact moment. We had just spent 3 weeks refactoring the pipeline to make it "properly asynchronous": we followed every guide, split every operation into coroutines, added Python Semaphores for rate limiting. The code was almost beautiful. We deployed on a Friday evening (yes, I know), and on Monday morning the trading team's Slack was on fire: "Why are ticks arriving with 200ms delay? We missed 3 arbitrage opportunities this morning." Three arbitrage opportunities. At $500-2000 each. In one morning. The cost of asyncio was no longer abstract: it had a dollar amount attached.

An asyncio event loop runs one coroutine at a time on one thread. The illusion of concurrency comes from coroutines yielding control at each await while they wait on I/O. But the moment your code does CPU-bound work between awaits (touching a DataFrame, serializing JSON, filtering Parquet), nothing else on the loop can run and you're back to sequential execution.
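A minimal sketch of that stall, using only the standard library. The busy loop is a stand-in for DataFrame or JSON work (an assumption for illustration, not our pipeline code): a heartbeat coroutine that should tick every 10 ms goes silent for as long as the CPU-bound call runs.

```python
import asyncio
import time


def busy_work(duration_s: float) -> None:
    """Spin the CPU for duration_s seconds without ever yielding to the event loop."""
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        pass


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Record a timestamp roughly every 10 ms for as long as the loop lets us run."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def max_heartbeat_gap(blocking_s: float) -> float:
    """Return the longest pause between two heartbeats around a CPU-bound call."""
    ticks: list = []
    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0.05)   # heartbeat ticks normally here
    busy_work(blocking_s)       # no await inside: the whole loop is stalled
    await asyncio.sleep(0.05)   # heartbeat resumes
    stop.set()
    await task
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    gap = asyncio.run(max_heartbeat_gap(0.2))
    print(f"longest heartbeat gap: {gap * 1000:.0f} ms")
```

The longest gap is at least as long as the blocking call, no matter how many cores the machine has. That is the 200ms tick delay from the Slack message, in miniature.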

Python asyncio
✗ Single-threaded event loop
✗ GIL blocks CPU-bound work
✗ Heap-allocated coroutines
✗ No backpressure by default (asyncio.Queue is unbounded unless maxsize is set)
✗ Exception-based errors
✗ Saturates at ~70 QPS (GIL)

Rust + Tokio
✓ Multi-threaded work-stealing
✓ No GIL, true parallelism
✓ Zero-cost state machines
✓ Built-in backpressure (mpsc)
✓ Monadic Result<T,E>
✓ Linear scaling to 1200+ QPS

The difference isn't about optimization. You can't "optimize" asyncio to work around the GIL: it's an architectural limitation. Every workaround (ProcessPoolExecutor, multiprocessing, uvloop) adds complexity without solving the fundamental problem: within one process, Python executes one thread's bytecode at a time. We tried each of these approaches. uvloop gave us ~15% on scheduling, but the GIL remains. ProcessPoolExecutor requires serializing DataFrames between processes, which cancels out almost all of the gain. The only real solution: escape the GIL entirely.

Let me be more specific about these workarounds, because each one cost us weeks of work before we understood we were wasting our time. uvloop was our first hope: a drop-in replacement that promises 2-4x on event loop scheduling. We installed it and benchmarked it: 15% gain on dispatch, zero gain on DataFrame serialization. The GIL doesn't care: it doesn't distinguish "fast" code from "slow" code, it locks everything the same way. Next, ProcessPoolExecutor: we launched the Parquet reads in separate processes. Sounds great, except each 50MB DataFrame has to be pickled to cross the process boundary. At 50 files, serialization time exceeded read time. We literally tried to parallelize a problem and created a worse one. Finally, multiprocessing.shared_memory for DataFrames: it works, but the code looks like 90s C++, with manual offsets, byte sizes, and silent segfaults when you're off by one byte. That's not Python anymore, that's suffering with f-strings.
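To make the ProcessPoolExecutor cost concrete, here is a small sketch of what one hand-off to a worker process implies. It assumes a plain bytes buffer as a stand-in for a DataFrame, and the helper name is hypothetical. ProcessPoolExecutor pickles arguments and results, so every payload is copied at least once in each direction.

```python
import pickle


def ipc_round_trip(payload: bytes):
    """Mimic one hand-off to a worker process: serialize, then deserialize."""
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    return len(blob), restored


if __name__ == "__main__":
    frame_bytes = bytes(50 * 1024 * 1024)   # stand-in for a 50MB DataFrame
    blob_size, copy = ipc_round_trip(frame_bytes)
    print(f"pickled size: {blob_size / 1e6:.1f} MB, same object: {copy is frame_bytes}")
```

The pickled blob is at least as large as the payload, and the worker ends up with its own copy. That is the overhead that ate our parallel speedup.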

The fundamental difference is mathematical. For $n$ CPU-bound tasks, wall-clock time $T$ scales as:

$$ T_{\text{asyncio}}(n) = O(n) \quad \text{(sequential, GIL-bound)} $$

$$ T_{\text{tokio}}(n, c) = O\!\left(\frac{n}{c}\right) \quad \text{where } c = \text{CPU cores} $$

Tokio divides work by the number of cores; asyncio cannot.
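To connect this to the profiler numbers from earlier, here is a rough worked estimate using Amdahl's law. It assumes the 94% sequential share we measured plays the role of the serial fraction $s$, which is a simplification:

$$ S(c) = \frac{1}{s + \frac{1 - s}{c}}, \qquad s = 0.94 $$

$$ S(16) = \frac{1}{0.94 + 0.06/16} \approx 1.06, \qquad \lim_{c \to \infty} S(c) = \frac{1}{0.94} \approx 1.064 $$

Under that assumption, even an unlimited number of cores buys about 6%, which matches what we saw when we added CPUs and nothing moved.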

This asymmetry is what drove us to build Polarway. Not out of love for Rust (although, let's be honest, pattern matching on Result values is addictive), but because no amount of Python cleverness can transform $O(n)$ into $O(n/c)$. It's an invariant of GIL-bound CPython, not a bug to fix. And once you accept that, truly, viscerally, not just intellectually, everything else follows naturally. You stop looking for the magic workaround. You stop reading Reddit threads promising that "Python 3.13 will change everything with sub-interpreter GIL removal". You make the hard decision: rewrite the critical layer in a language capable of real parallelism, and keep Python where it excels: user interfaces, interactive data science, prototyping. Each to its own.

2. The 4 pillars of Polarway

Polarway was designed around four non-negotiable performance invariants. I use the word "invariant" deliberately — not "goal", not "aspiration". An invariant, in mathematics, is a property that remains true regardless of the transformation applied. The four numbers below must hold true regardless of the dataset, regardless of the query pattern, regardless of the time of day. If any single one breaks, the system has regressed — and we treat it as a bug, not as acceptable degradation.

Why these four specifically? Because they correspond to the four failure modes we observed in production. Latency kills high-frequency trading: a tick arriving 2ms late is a useless tick, plain and simple. In a market that moves 0.5% in 100ms, 2ms is an eternity. Throughput limits how many feeds a single node can absorb: when you go from 10 crypto pairs to 40, you need 4x the throughput, not an excuse about why "it's enough for the demo". Memory determines whether your pipeline survives a volatile trading day without an OOM-kill at 3 AM, and nothing is more depressing than getting a PagerDuty call because your server ate 64GB of RAM during a flash crash. And CPU scalability dictates your infrastructure cost: if extra cores on a node don't buy extra throughput, every increase in load means more machines, and your cloud bill diverges like a harmonic series.

⏱️ Latency: < 1ms end-to-end (p50 = 0.8ms, p99 = 1.2ms)
🚀 Throughput: 100k+ ticks/second per core (120k measured on c6i.4xlarge)
💾 Memory: O(batch), constant (500MB @ 50GB dataset)
⚙️ CPU scalability: linear (17.1x @ 500 concurrent queries)

Key invariant: memory stays at $O(\text{batch\_size})$ regardless of dataset size. Polarway processes 50 GB with 500 MB of RAM — where Polars fails with OOM.
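The idea behind the $O(\text{batch\_size})$ bound can be sketched in a few lines of Python. This is an illustration of batch-at-a-time processing under simple assumptions (a generator stands in for the dataset, and the function names are hypothetical), not Polarway's actual engine: only one batch is alive at any moment, so peak memory depends on the batch size and not on the dataset size.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(rows: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield lists of at most batch_size rows; only one batch is alive at a time."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def streaming_mean(rows: Iterable[float], batch_size: int):
    """Compute the mean while tracking the largest batch ever held in memory."""
    total, count, peak_rows = 0.0, 0, 0
    for batch in iter_batches(rows, batch_size):
        total += sum(batch)
        count += len(batch)
        peak_rows = max(peak_rows, len(batch))
    return total / count, peak_rows


if __name__ == "__main__":
    # A generator stands in for a dataset far larger than RAM
    prices = (float(i % 100) for i in range(1_000_000))
    mean, peak = streaming_mean(prices, batch_size=10_000)
    print(mean, peak)
```

Growing the input from a million rows to a billion changes the run time, but the largest batch held in memory stays at batch_size rows.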

Each pillar is measurable and verifiable. These are not aspirational goals — they're contracts. Our CI runs benchmarks on every PR and fails if any of the four invariants regresses. A commit that improves throughput but degrades p99 latency beyond 1.2ms is automatically rejected. It's rigid, but it's what allows us to promise these numbers to users rather than hope for them.

I'll be honest: this rigidity has a cost. Three times we've blocked a PR for over a week because a memory optimization degraded p99 by 0.1ms. Three times the developer wanted to argue that "0.1ms is within noise". Three times we refused. Because today's noise is tomorrow's regression. If you accept 0.1ms this week, you'll accept 0.2ms next week, and in six months you're at 3ms wondering "how did we get here?". CI as a gatekeeper isn't popular. But stable pipelines are very popular with teams that sleep at night.
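For readers who want to build a similar gate, here is a minimal sketch of the latency check. The p99 budget of 1.2ms comes from the text above. The p50 budget, the sample data and the function names are assumptions for illustration, and our real CI job is more involved than this.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_invariants(latencies_ms: List[float], budgets_ms: Dict[float, float]) -> List[str]:
    """Return one message per violated budget; an empty list means the gate passes."""
    violations = []
    for pct, budget in sorted(budgets_ms.items()):
        observed = percentile(latencies_ms, pct)
        if observed > budget:
            violations.append(f"p{pct:g} = {observed:.2f}ms exceeds {budget:.2f}ms")
    return violations


if __name__ == "__main__":
    budgets = {50: 0.8, 99: 1.2}            # p50 budget is an assumed value
    samples = [0.7] * 98 + [1.3, 1.4]       # made-up benchmark samples
    for problem in check_invariants(samples, budgets):
        print("REGRESSION:", problem)
```

The point of returning messages instead of a boolean is that the PR author sees exactly which invariant broke and by how much, which shortens the "0.1ms is within noise" argument.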

3. Tokio: what Rust brings to the table

The Rust compiler turns async functions into compiled state machines, and the Tokio runtime schedules them: no heap allocation per await point, no interpreted scheduler overhead. That's what "zero-cost" means. And it's not a marketing slogan, it's a verifiable property: disassemble the binary and look for allocations inside an async function's chain of awaits. You'll find none beyond what your own code asks for (spawning a task does cost one allocation, for the task as a whole).

When we chose Tokio, we also evaluated Go (goroutines + channels), Zig (stack-allocated async frames), and even Glommio's io_uring engine for Tokio-free Rust. Each option had its merits. Go is easier to learn: no lifetimes, no borrow checker, a garbage collector that "just works". But every goroutine carries its own heap-allocated, growable stack (starting at about 2KB), the GC introduces pauses we can't fully control, and Go channels, while typed, don't come with the compile-time ownership checks that Rust's Send and Sync bounds give Tokio channels. Zig was tempting for total control, but the ecosystem in 2025 was too immature for production, and its young package tooling made the supply chain risky. Tokio won by default: it's the most production-tested async runtime in the industry (Cloudflare, Discord, Fly.io all use it), with a community that ships security fixes in hours, not weeks.

To understand why this matters, you need to look at what *actually* happens when you write await in Python vs Rust. In Python, each coroutine is a heap-allocated object with its own frame. The asyncio event loop is mostly written in Python (with C accelerators for tasks and futures), and even with uvloop every coroutine step still goes through the interpreter; on each loop iteration the scheduler must pick which coroutines to wake. In Rust, the compiler generates an enum where each variant represents a state of the future. Inside that future there is no allocation, no dynamic dispatch, no indirection. It's equivalent to a state machine you would have written by hand, except the compiler does the work for you.

This is a point people underestimate. When I say "the compiler generates an enum", it means your 50-line async function with 3 await points is transformed into a 4-variant enum (initial state + 3 suspension points) whose size is known at compile time: say 128 bytes, small enough to sit on the stack. In Python, the same function allocates a coroutine object (~200 bytes), a frame object (~400 bytes), and potentially a task wrapper (~100 bytes). Multiply by 10,000 concurrent coroutines and the difference is roughly 1.2MB of compact, fixed-size state for Rust versus 7MB of scattered heap objects for Python, with the allocations, the fragmentation, and the garbage collector that has to clean all of it up.
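If the phrase "state machine" feels abstract, here is a hand-written equivalent in Python of an async function with three await points. It is only an illustration of the shape of what the Rust compiler emits. The class and its poll() protocol are hypothetical, and the sketch adds an explicit terminal state that the count above leaves out.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()      # created, never polled
    AWAIT_1 = auto()    # suspended at the first await
    AWAIT_2 = auto()    # suspended at the second await
    AWAIT_3 = auto()    # suspended at the third await
    DONE = auto()       # finished; polling again is a bug


PENDING = object()  # sentinel: "not finished yet, poll me again later"

_NEXT = {
    State.AWAIT_1: State.AWAIT_2,
    State.AWAIT_2: State.AWAIT_3,
    State.AWAIT_3: State.DONE,
}


class SumThreeReads:
    """Hand-written stand-in for: total = (await a) + (await b) + (await c)."""

    def __init__(self) -> None:
        self.state = State.START
        self.total = 0

    def poll(self, ready_value=None):
        """Advance one step; return PENDING, or the total once all reads are in."""
        if self.state is State.START:
            self.state = State.AWAIT_1      # first read has been issued
            return PENDING
        if self.state is State.DONE:
            raise RuntimeError("polled after completion")
        if ready_value is None:
            return PENDING                  # the awaited read is not ready yet
        self.total += ready_value
        self.state = _NEXT[self.state]
        return self.total if self.state is State.DONE else PENDING
```

Everything the function needs between suspensions lives in two fields, state and total. In Rust the same layout is a fixed-size enum computed at compile time, which is why its footprint is known before the program runs.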

Memory representation: Python coroutine (heap, fragmented) vs Rust future (stack, compact).

Work-stealing: concurrent file reads

Let's look at the actual code. Tokio's JoinSet API is the centerpiece of our concurrent read strategy. Instead of reading Parquet files sequentially, we spawn each read in a separate Tokio task. The scheduler distributes these tasks across all available cores, and if one core finishes early, it steals work from its neighbors.

use tokio::task::JoinSet;

async fn concurrent_parquet_reads(paths: Vec<String>) -> Result<Vec<DataFrame>, Error> {
    let mut set = JoinSet::new();

    for path in paths {
        set.spawn(async move {
            // Each file read runs on a separate Tokio task
            // Work-stealing ensures optimal CPU utilization
            read_parquet(&path).await
        });
    }

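    let mut results = Vec::new();
    // join_next() yields tasks as they finish, so results are in completion
    // order, not in the order of `paths`
    while let Some(res) = set.join_next().await {
        results.push(res??);
    }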

    Ok(results)
}

Notice the double ? in res??. The first ? unwraps the JoinError (did the task panic or get cancelled?), the second unwraps our business Error. This is quintessentially Rust: errors are explicit at every layer, and the compiler forces you to handle them. In Python with asyncio, a gather() with return_exceptions=True gives you a list of Union[result, Exception], and it's up to you to sort good from bad with isinstance() checks. Here, the type system does the work.
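For comparison, this is roughly what the asyncio side of that sentence looks like. The read_file coroutine is a hypothetical stand-in that fails for some paths. The gather() call and its return_exceptions flag are standard library.

```python
import asyncio


async def read_file(path: str) -> str:
    """Hypothetical reader: paths containing 'missing' fail, others succeed."""
    await asyncio.sleep(0)
    if "missing" in path:
        raise FileNotFoundError(path)
    return f"contents of {path}"


async def read_all(paths):
    """Run every read; return (successes, failures) sorted out by hand."""
    outcomes = await asyncio.gather(
        *(read_file(p) for p in paths), return_exceptions=True
    )
    ok = [o for o in outcomes if not isinstance(o, BaseException)]
    failed = [o for o in outcomes if isinstance(o, BaseException)]
    return ok, failed


if __name__ == "__main__":
    ok, failed = asyncio.run(read_all(["a.parquet", "missing.parquet", "b.parquet"]))
    print(len(ok), "succeeded,", len(failed), "failed")
```

Nothing forces the caller to write those two isinstance() filters. Forget them and the exception objects travel downstream as if they were data.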

Tokio is also smarter than asyncio.gather() on a subtle point: work-stealing. When you spawn 100 tasks on the multi-threaded runtime, Tokio doesn't put them all in the same queue. Each worker thread has its own queue, and when a worker finishes early, it *steals* a task from another worker's queue. Result: CPU utilization is near-optimal without any intervention on your part.

Work-stealing is an elegant algorithm formalized by Blumofe and Leiserson at MIT in the 90s, and Tokio implements its own variant of it with lock-free queues. The intuition is simple: imagine 4 cooks in a kitchen. Without work-stealing, each cook has their stack of orders and processes them sequentially. If cook 1 has 20 orders and cook 4 has only 2, cook 4 stands idle while cook 1 is overwhelmed. With work-stealing, cook 4 takes orders from cook 1's stack. Everyone works all the time. The beauty is that the stealing is nearly free: in the classic design, queues are lock-free deques, the owner pushes and pops at one end (LIFO) while thieves take from the opposite end (FIFO), so contention is rare. It's elegant, and it shows in our benchmarks: CPU utilization is consistently above 92% even with variable-duration tasks.
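The kitchen analogy can be turned into a toy simulation. This is a single-threaded model of the classic deque scheme under simplifying assumptions (unit-time tasks, one steal per idle worker per round, a hypothetical function name). Tokio's real scheduler differs in its details.

```python
from collections import deque
from typing import List


def rounds_to_finish(initial_loads: List[int], steal: bool) -> int:
    """Simulate unit-time tasks; each worker runs at most one task per round."""
    queues = [deque(range(load)) for load in initial_loads]
    rounds = 0
    while any(queues):
        rounds += 1
        for own in queues:
            if own:
                own.pop()                 # owner works from the newest end (LIFO)
            elif steal:
                victim = max(queues, key=len)
                if victim:
                    victim.popleft()      # thief takes the oldest task (FIFO end)
    return rounds


if __name__ == "__main__":
    loads = [20, 6, 6, 2]                 # four cooks, uneven order stacks
    print("without stealing:", rounds_to_finish(loads, steal=False))
    print("with stealing:   ", rounds_to_finish(loads, steal=True))
```

In this toy model, loads of 20, 6, 6 and 2 take 20 rounds without stealing and 9 with it, which is the best four workers can do with 34 tasks.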

Work-stealing: thread 3 (idle) steals task E from thread 1's queue (overloaded).

On the Python side, the client code is nearly identical to a classic asyncio pipeline, but under the hood Tokio distributes work across all cores. This is Polarway's central promise: the data scientist doesn't need to know Rust. They write standard Python, and the gRPC server does the rest. Notice that the API returns Result values: each read can fail independently, and errors are filtered explicitly.

import asyncio

from polarway.async_client import AsyncPolarwayClient


async def main() -> None:
    async with AsyncPolarwayClient("localhost:50051") as client:
        # Read 100 files concurrently; Tokio work-steals across all cores
        results = await client.batch_read([
            f"data/file_{i:03d}.parquet" for i in range(100)
        ])

        # Keep the successful reads; failed ones remain in `results` as Err values
        handles = [r.unwrap() for r in results if r.is_ok()]

        # Concurrent collect
        tables = await client.batch_collect(handles)

        print(f"Processed {len(tables)} files concurrently")


asyncio.run(main())

Key point: the Python code barely changes. You replace polars.read_parquet() with client.batch_read() and gain 5x. No threads, no multiprocessing, no concurrent.futures.

This was our number one design goal: the data scientist sees *nothing* change in their workflow. Their code remains normal Python async/await. They don't need to know Rust, Tokio, or mpsc channels. The complexity is absorbed by the gRPC layer between the Python client and the Rust server. For them, it's just "Polars, but faster". And that's exactly what we wanted. A tool people *use* rather than a tool they *admire*.

There's a philosophy I borrowed from Rich Hickey (Clojure's creator) and his talk "Simple Made Easy": simple is not the same as easy. Building a tool that's *simple* to use (the Python API is 12 methods, not 120) requires *hard* work under the hood. The gRPC server handles protobuf encoding, automatic batching, connection pooling, Arrow zero-copy serialization. The Python client sees none of it. And that's hundreds of hours of design work: every time we added a parameter to the API, we asked "does the data scientist *need* to know this?". If the answer was no, we made it automatic. The best code is the code the user never has to write.

Built-in backpressure with Semaphore

The previous pattern spawns as many tasks as files, which is perfect when you have 100 files and 16 cores. But what happens when you have 10,000? You don't want 10,000 file descriptors open simultaneously. That's where Tokio's Semaphore comes in: it caps how many tasks can be reading at the same time at a configurable maximum, and excess tasks wait cleanly for their turn.

use tokio::sync::Semaphore;
use std::sync::Arc;

async fn batch_with_backpressure(paths: Vec<String>) -> Result<(), Error> {
    let semaphore = Arc::new(Semaphore::new(10));  // Max 10 concurrent

    let mut handles = vec![];

    for path in paths {
        let sem = semaphore.clone();

        handles.push(tokio::spawn(async move {
            // Suspends this task (not the thread) until a permit is free;
            // the permit is released when `_permit` is dropped
            let _permit = sem.acquire().await?;
            read_parquet(&path).await
        }));
    }

    for handle in handles {
        handle.await??;
    }

    Ok(())
}
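To see the cap in action without a Rust toolchain, the same pattern can be sketched with asyncio.Semaphore. The sleep is a stand-in for the file read and the function name is hypothetical. The counter records how many tasks are inside the guarded section at once.

```python
import asyncio


async def run_with_cap(n_tasks: int, cap: int) -> int:
    """Start n_tasks workers but let at most `cap` of them work at once; return the peak."""
    sem = asyncio.Semaphore(cap)
    in_flight = 0
    peak = 0

    async def worker() -> None:
        nonlocal in_flight, peak
        async with sem:                  # suspends here when all permits are taken
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)       # stand-in for the actual read
            in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak


if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(run_with_cap(100, 10)))
```

All 100 workers are created up front, exactly like the spawned Tokio tasks above, but the guarded section never holds more than 10 of them.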

Backpressure is a topic most asyncio tutorials completely ignore. In asyncio, if your producer sends messages faster than the consumer processes them, a default asyncio.Queue (no maxsize) grows without bound until OOM. With Tokio, the Semaphore above caps the work at 10 concurrent reads. If an 11th arrives, it waits: no panic, no exception, just a clean suspension. It's the same pattern as hydraulic circuits: a pressure relief valve rather than a pipe that bursts.

I insist on backpressure because it was the number one cause of production incidents *before* Polarway. The classic scenario: the market panics (an Elon Musk tweet, a regulatory announcement), WebSockets send 10x their normal throughput, the asyncio.Queue swells from 1,000 to 100,000 messages in seconds, RAM jumps from 2GB to 30GB, and the Linux kernel kills the process. Recovery time: 3-5 minutes counting the restart, reconnecting to exchanges, and re-synchronizing state. During those 3-5 minutes, you're not trading, and the market continues without you. With a bounded Semaphore, the worst that happens is that messages accumulate *on the producer side* (the WebSocket), which has its own TCP buffer. The consumer never sees more messages than it can handle. The system *slows down* instead of *crashing*. That's the difference between emergency braking and a car crash.
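The "slows down instead of crashing" behaviour is easy to reproduce in miniature with a bounded queue. This sketch uses asyncio.Queue with maxsize set. The item counts and the function name are made up, and the sleep is a stand-in for a slow consumer.

```python
import asyncio


async def pipeline(n_items: int, maxsize: int):
    """Fast producer, slow consumer, bounded queue; return (peak queue size, checksum)."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(n_items):
            await queue.put(i)           # suspends while the queue is full
            peak = max(peak, queue.qsize())
        await queue.put(None)            # sentinel: no more items

    async def consumer() -> int:
        total = 0
        while (item := await queue.get()) is not None:
            await asyncio.sleep(0)       # stand-in for slow processing
            total += item
        return total

    _, total = await asyncio.gather(producer(), consumer())
    return peak, total


if __name__ == "__main__":
    peak, total = asyncio.run(pipeline(1000, 8))
    print(f"peak queue size: {peak}, checksum: {total}")
```

However fast the producer tries to go, the queue never holds more than maxsize items, and every item still arrives. Memory stays flat and the producer absorbs the wait.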

4. Monadic error handling: zero exceptions

Polarway uses zero exceptions. Every operation returns a Result<T, E> or an Option<T>, exactly like Rust. This isn't just a convention: it's a type system property that makes errors impossible to ignore. And before you raise an eyebrow: yes, we're aware this is an unpopular position in Python. We stand by it.

This is probably the most controversial design decision in Polarway. In Python, exceptions are idiomatic: it's the EAFP pattern ("Easier to Ask Forgiveness than Permission"). The entire Python community is built on it. So why abandon them? Because EAFP is optimized for prototyping, not production. In a Jupyter notebook, try/except is perfect: you try something, it crashes, you fix it, you re-run. The feedback loop is 2 seconds. In production, the feedback loop is 3 hours. That's how long it takes for the bug to reach the right traffic percentile, trigger the alert, and wake the on-call engineer, who opens the laptop, reads the logs, and figures out what happened. By then, the original exception is buried under 47 lines of traceback and 3 layers of retry logic.

Because in production, exceptions *lie*. A broad try/except gives you the illusion of handling errors, but in reality it catches everything, including errors you didn't anticipate. How many times have you seen except Exception as e: logger.error(e) silently swallowing a TypeError caused by a bug in *your* code, not in the I/O? Monads make every failure point visible and composable. You can't "forget" to handle an error: it's there, in the return type, and a type checker such as Pyright complains when you treat the Result as if it were the success value.

Let me tell you about the incident that sealed our decision. June 2025. One of our ingestion pipelines had a try/except Exception that logged errors and continued. For 6 days, an AttributeError — caused by a schema change on Binance's side — was silently swallowed. The pipeline kept running, health metrics were green, but the data being written was corrupted: a field had been renamed, and our code was deserializing None where it expected a float. Six days of corrupted data. Six days of false backtests. And nobody noticed because the exception wasn't surfacing. With a Result[DataFrame, SchemaError], the problem would have been visible on the first query: schema doesn't match → Err(SchemaError::FieldMissing("price")) → pipeline stops, alert fires, and you lose 5 minutes, not 6 days.
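To make the incident concrete, here is a minimal sketch of what "the schema doesn't match, so you get an Err on the first message" can look like. The Ok and Err dataclasses, SchemaError, and parse_tick are hypothetical stand-ins written for this illustration, not Polarway's actual API. The only try/except is a narrow one at the parsing boundary, and it is converted into a value immediately.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: object


@dataclass(frozen=True)
class Err:
    error: object


@dataclass(frozen=True)
class SchemaError:
    kind: str   # e.g. "FieldMissing" or "BadType"
    field: str


def parse_tick(raw: dict) -> Union[Ok, Err]:
    """Return Ok(price) or Err(SchemaError); never a silent None."""
    if "price" not in raw:
        return Err(SchemaError("FieldMissing", "price"))
    try:
        return Ok(float(raw["price"]))
    except (TypeError, ValueError):
        return Err(SchemaError("BadType", "price"))


good = parse_tick({"symbol": "BTCUSDT", "price": "64250.5"})
# Upstream renamed the field: the failure is a value, visible on the first message.
renamed = parse_tick({"symbol": "BTCUSDT", "p": "64250.5"})
```

A caller that receives `renamed` cannot mistake it for a price: it has to branch on Err before it can write anything.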

Result monad

Here is what Polarway's Result looks like in practice. The API is designed to feel familiar to Python developers: no exotic syntax, no macros, just chained methods. You compose operations with .map(), handle errors with .or_else(), and chain fallible transformations with .and_then(). If this reminds you of Java streams or Rust's Option — that's exactly the idea.

from polarway.async_client import Result

# Chain operations with map (Functor)
result: Result[int, str] = Result.ok(42)
doubled = result.map(lambda x: x * 2)  # Ok(84)

# Handle errors with or_else
result = Result.err("Failed to read file")
recovered = result.or_else(lambda e: Result.ok(0))  # Ok(0)

# Compose with and_then (flatMap / Monad bind)
def safe_divide(x: int) -> Result[float, str]:
    if x == 0:
        return Result.err("Division by zero")
    return Result.ok(100 / x)

result = Result.ok(5).and_then(safe_divide)  # Ok(20.0)

If you come from Haskell or Scala, these patterns seem obvious. But making them ergonomic in Python was a challenge. Python has no do-notation syntax. We opted for fluent chaining (.map().and_then().or_else()) which remains readable even for a Python developer who has never heard of monads. The goal wasn't to turn people into functionalists — it was to make code more reliable without requiring them to understand category theory.
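For readers who want to see what sits behind the fluent chain, here is a minimal sketch of a Result with the three methods named above. It only mirrors the method names used in this article; the internals (the `_ok` flag, the `_payload` field) and the helpers parse_qty and check_positive are assumptions made for the example, not Polarway's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Result:
    _ok: bool
    _payload: Any

    @staticmethod
    def ok(value: Any) -> Result:
        return Result(True, value)

    @staticmethod
    def err(error: Any) -> Result:
        return Result(False, error)

    def map(self, f: Callable[[Any], Any]) -> Result:
        # Transform the value; an error passes through unchanged.
        return Result.ok(f(self._payload)) if self._ok else self

    def and_then(self, f: Callable[[Any], Result]) -> Result:
        # Like map, but f may itself fail and returns a Result.
        return f(self._payload) if self._ok else self

    def or_else(self, f: Callable[[Any], Result]) -> Result:
        # Recover from an error; a success passes through unchanged.
        return self if self._ok else f(self._payload)


def parse_qty(text: str) -> Result:
    return Result.ok(int(text)) if text.isdigit() else Result.err(f"not a number: {text!r}")


def check_positive(n: int) -> Result:
    return Result.ok(n) if n > 0 else Result.err("quantity must be positive")


happy = parse_qty("21").and_then(check_positive).map(lambda n: n * 2)
failed = parse_qty("abc").and_then(check_positive).map(lambda n: n * 2)
recovered = failed.or_else(lambda e: Result.ok(0))
```

In the failed chain, neither check_positive nor the doubling lambda is ever called: the first Err is carried to the end as-is.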

We made design mistakes along the way. Our first API exposed methods like bind() and fmap() — the academic names. One of our junior developers spent 20 minutes understanding what .bind(lambda x: Result.ok(x + 1)) did. My fault. We renamed everything: bind → and_then, fmap → map, pure → ok. Same concepts, same mathematical guarantees, but a vocabulary any Python developer can understand in 30 seconds. Haskell purists will resent us. But our target isn't Haskell purists — it's the data scientist who wants robust pipelines without having to read "Learn You a Haskell for Great Good".

Option monad

Option follows the same philosophy as Result, but for a different case: the absence of a value. Instead of returning None and crossing your fingers that the caller checks, Option makes absence explicit and composable. Every operation on an Option.nothing() is a silent no-op — no crash, no surprise NoneType.

from polarway.async_client import Option

# Handle nullable values
opt = Option.some(42)
doubled = opt.map(lambda x: x * 2)  # Some(84)

# Provide defaults
opt = Option.nothing()
value = opt.unwrap_or(0)  # 0

# Chain with and_then; read_file and filter_df are user-defined
# functions that each return an Option (Nothing on failure)
opt = (Option.some("data.parquet")
       .and_then(read_file)
       .and_then(filter_df))

Option eliminates an entire class of bugs: NoneType has no attribute. In standard Python, None is an accident waiting to happen — it propagates silently through 15 layers of code before exploding in a completely unexpected place. With Option, the absence of a value is encoded in the type. You're *forced* to decide what to do when there's nothing: provide a default with unwrap_or(), propagate with and_then(), or fail explicitly with unwrap().

Tony Hoare, the inventor of null, called it his "billion dollar mistake". Since 2009, every programming language conference cites that quote. And yet in 2026, the majority of production Python code still uses None as a return value with zero guardrails. Optional[int] in type hints is a step in the right direction, but Pyright won't stop you from calling .upper() on an Optional[str] at runtime — it *warns* you, and you ignore the warning because "it works in dev". Polarway's Option is different: opt.map(str.upper) never crashes, even if opt is Nothing. Safety isn't on the developer's honor — it's in the structure.
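The claim that opt.map(str.upper) never crashes is easy to check on a toy version. The sketch below is a hypothetical Option, not Polarway's class: it uses a private sentinel so that Some(None) stays distinguishable from Nothing, and find_symbol is an invented lookup helper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

_MISSING = object()  # sentinel: keeps Some(None) distinct from Nothing


@dataclass(frozen=True)
class Option:
    _value: Any = _MISSING

    @staticmethod
    def some(value: Any) -> Option:
        return Option(value)

    @staticmethod
    def nothing() -> Option:
        return Option()

    def is_some(self) -> bool:
        return self._value is not _MISSING

    def map(self, f: Callable[[Any], Any]) -> Option:
        # f is only called when there is a value to call it on.
        return Option.some(f(self._value)) if self.is_some() else self

    def unwrap_or(self, default: Any) -> Any:
        return self._value if self.is_some() else default


def find_symbol(book: dict, key: str) -> Option:
    return Option.some(book[key]) if key in book else Option.nothing()


book = {"btc": "btcusdt"}
hit = find_symbol(book, "btc").map(str.upper).unwrap_or("UNKNOWN")
miss = find_symbol(book, "doge").map(str.upper).unwrap_or("UNKNOWN")
```

On the miss path str.upper is never invoked, so there is no object on which an AttributeError could occur.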

Functor: the categorical foundation

Result and Option are functors in the category-theoretic sense: they implement fmap, which preserves structure. Formally:

$$ \text{fmap} : (A \to B) \to F(A) \to F(B) $$

$$ \text{fmap}(f, \text{Ok}(x)) = \text{Ok}(f(x)) \qquad \text{fmap}(f, \text{Err}(e)) = \text{Err}(e) $$

Functor laws: fmap id = id and fmap (g . f) = fmap g . fmap f
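The two laws are small enough to check by hand on a toy model. The sketch below represents a Result as a tagged tuple, ("ok", value) or ("err", error), which is a deliberately simplified stand-in chosen for this example, and checks both laws on a few sample values. It is a spot check, not a proof.

```python
from typing import Any, Callable, Tuple

Res = Tuple[str, Any]  # ("ok", value) or ("err", error): hypothetical stand-in


def fmap(f: Callable[[Any], Any], r: Res) -> Res:
    """Apply f inside an ok; return an err untouched."""
    tag, payload = r
    return ("ok", f(payload)) if tag == "ok" else r


def compose(g: Callable, f: Callable) -> Callable:
    return lambda x: g(f(x))


identity = lambda x: x
f = lambda x: x + 1
g = lambda x: x * 10

samples = [("ok", 4), ("ok", -1), ("err", "boom")]

# Law 1: fmap id = id
identity_law = all(fmap(identity, r) == r for r in samples)

# Law 2: fmap (g . f) = fmap g . fmap f
composition_law = all(
    fmap(compose(g, f), r) == fmap(g, fmap(f, r)) for r in samples
)
```

A property-based testing library can run the same two checks over generated inputs instead of three hand-picked samples.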

Here is the concrete implementation of the Functor trait in Rust, applied to Result. The fmap method takes a function f: T → U and applies it to the contained value — if and only if the Result is Ok. In case of error, it returns the error untouched. This is the very definition of a functor: applying a transformation without changing the structure.

trait Functor<T> {
    type Output<U>;
    fn fmap<U, F>(self, f: F) -> Self::Output<U>
    where
        F: FnOnce(T) -> U;
}

impl<T, E> Functor<T> for Result<T, E> {
    type Output<U> = Result<U, E>;

    fn fmap<U, F>(self, f: F) -> Result<U, E>
    where
        F: FnOnce(T) -> U,
    {
        self.map(f)
    }
}

And here is the Python equivalent. Our implementation of Result.map() is a 1:1 mirror of the Rust Functor above. One caveat applies to both languages: the Rust compiler checks that fmap has the right type signature, but it cannot prove the functor laws themselves. Those are verified by tests. On the Python side, 180 unit tests verify the algebraic properties of each monad.

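from __future__ import annotations
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
E = TypeVar("E")
U = TypeVar("U")
class Result(Generic[T, E]):
    def map(self, f: Callable[[T], U]) -> Result[U, E]:
        """Functor: fmap for Result"""
        if self.is_ok():
            return Result.ok(f(self._value))
        return Result.err(self._error)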

The beauty of the Functor is its composability. You can chain as many .map() calls as you want without ever checking whether the result is Ok or Err. If it's an error, it passes through the entire chain untouched, like water flowing through a closed pipe. This isn't magic, it's algebra: the functor laws guarantee that fmap preserves structure. Rust's type checker enforces the signatures at compile time and our test suite checks the laws, which together give us strong confidence that our error pipeline is correct.

This is where category theory stops being an academic discussion topic and becomes an engineering tool. When I read a pipeline that does result.map(normalize).map(filter).map(aggregate), I know — with *mathematical* certainty, not just intuition — that if result is an error, all three operations are short-circuited and the original error is preserved intact. I don't need to read the code of normalize, filter, or aggregate. The functor laws guarantee it. This is a form of trust that try/except can never offer — with exceptions, you have to read *every* called function to know if it can throw, and *what type* of exception. The type system does the code review's job.

Monadic pipeline: the error passes through the chain untouched; each .map() is automatically short-circuited.

In practice: file processing without exceptions

Enough theory — here is what a *complete* pipeline looks like without a single exception. This code reads files in batch, filters successful results, logs errors functionally, and collects DataFrames. Not a single try/except in sight. Every failure point is visible in the types.

import pandas as pd
from polarway.async_client import AsyncPolarwayClient

async def process_files(paths: list[str]) -> list[pd.DataFrame]:
    """Process files without exceptions — pure monadic pipeline"""

    async with AsyncPolarwayClient("localhost:50051") as client:
        # Read files — returns list[Result[Handle, Error]]
        results = await client.batch_read(paths)

        # Filter successful reads (no try/except!)
        handles = [r.unwrap() for r in results if r.is_ok()]

        # Log errors functionally
        for r in results:
            r.map_err(lambda e: print(f"⚠️ Read failed: {e}"))

        # Collect DataFrames
        tables = await client.batch_collect(handles)

        return [t.unwrap() for t in tables if t.is_ok()]

Zero try/except. Errors compose with map, and_then, or_else. If a Result is ignored, the Rust compiler warns about it (Result is marked #[must_use]), and on the Python side Pyright can flag the unused value when its unused-call-result check is enabled.
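One pattern that pairs well with batch calls like these is to split the results once into successes and failures, so the failures stay available as data instead of only being printed. This is a sketch under assumptions: Ok, Err, and partition are hypothetical names for this example, not part of Polarway's client.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


@dataclass(frozen=True)
class Ok:
    value: Any


@dataclass(frozen=True)
class Err:
    error: Any


def partition(results: Iterable[Any]) -> Tuple[List[Any], List[Any]]:
    """Split a batch of results into (values, errors), preserving order."""
    values: List[Any] = []
    errors: List[Any] = []
    for r in results:
        if isinstance(r, Ok):
            values.append(r.value)
        else:
            errors.append(r.error)
    return values, errors


results = [Ok("a.parquet"), Err("b.parquet: not found"), Ok("c.parquet")]
handles, failures = partition(results)
```

The caller can then decide what a non-empty `failures` list means for the run: log and continue, retry, or abort.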

5. Real-world: WebSocket ingestion at 100k ticks/s

Here is the complete architecture of a real-time ingestion pipeline used in production at HFThot. This isn't a documentation example — it's code extracted (and simplified) from our actual system. If you work on market data ingestion, this pattern is probably directly applicable to your infrastructure.

This pipeline ingests crypto market data (BTC, ETH, and 30+ altcoins) from Binance WebSockets, transforms them into Polars DataFrames, and persists them as Parquet partitioned by day. It all runs on a single node with 4 cores. Before Polarway, this same pipeline required 3 nodes with Kafka in between to absorb the load. Today, infrastructure cost is divided by 3 and latency by 10.

The migration story deserves to be told in detail, because it contains lessons that benchmarks don't show. The old system had 3 components: a Python asyncio ingester that read WebSockets and pushed to Kafka, a Python consumer that read from Kafka and wrote Parquet, and an Airflow scheduler that orchestrated hourly batches. Three services, three deployments, three log stores, three Grafana dashboards. When something broke (and believe me, something broke at least once a week), you had to triangulate across 3 systems to understand whether the problem was the ingester, Kafka, or the consumer. Mean time to diagnose an incident was 45 minutes. With Polarway, it's a single process, a single log file, and mean time to diagnose dropped to 8 minutes.

Before/After: migration from 3 services + Kafka to a single Polarway process.

Python Client (await client.stream()) → Tokio mpsc (bounded channel, 1000) → Rust Worker (serde + DataFrame) → Parquet / Delta (persistent storage)

Rust Server: WebSocket → Tokio Channel → Storage

Here is the heart of the ingestion server. The architecture follows a classic producer-consumer pattern, but with a fundamental difference: the bounded mpsc channel creates natural backpressure between incoming WebSockets and the storage engine. Each WebSocket connection is handled in a separate Tokio task, and ticks are deserialized via serde_json before being sent into the channel.

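use futures_util::StreamExt;
use tokio::net::TcpListener;
use tokio::sync::mpsc;
use tokio_tungstenite::accept_async;

// MarketTick, Error, tick_to_dataframe and store_to_polarway are defined elsewhere.
// Error is assumed to convert from the error types used with `?` below.
async fn websocket_ingest() -> Result<(), Error> {
    let (tx, mut rx) = mpsc::channel::<MarketTick>(1000); // Bounded = backpressure

    // Process ticks in a separate Tokio task, parallel to the receivers.
    // It must be spawned before the accept loop, which otherwise never returns.
    let consumer = tokio::spawn(async move {
        while let Some(tick) = rx.recv().await {
            let df = tick_to_dataframe(tick)?;
            store_to_polarway(df).await?;
        }
        Ok::<_, Error>(())
    });

    let listener = TcpListener::bind("0.0.0.0:8080").await?;

    // Accept WebSocket connections, one Tokio task per connection
    while let Ok((stream, _)) = listener.accept().await {
        let tx = tx.clone();

        tokio::spawn(async move {
            let ws = accept_async(stream).await?;
            let (_, mut receiver) = ws.split();

            while let Some(Ok(msg)) = receiver.next().await {
                let tick: MarketTick = serde_json::from_str(msg.to_text()?)?;
                tx.send(tick).await?; // Suspends here while the channel is full
            }

            Ok::<_, Error>(())
        });
    }

    // Only reached if accept() fails: close our sender and wait for the consumer
    drop(tx);
    consumer.await??;
    Ok(())
}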

The detail that changes everything: the mpsc channel is *bounded* at 1000. When the consumer (Parquet storage) is slow, the channel fills up. When it's full, the tx.send(tick).await call *suspends* the producer instead of dropping the message. The producer automatically slows down to the consumer's pace. This is native backpressure, with no extra infrastructure and no Kafka; the only knob is the channel capacity. In asyncio, the same pattern uses an asyncio.Queue(maxsize=1000), except that every producer and consumer of that queue shares one event loop on one thread, so the backpressure works but nothing on either side of the queue runs in parallel.
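For comparison, here is a small runnable sketch of the asyncio side of that pattern. It uses only the standard library; the queue size, item count, and sleep duration are arbitrary values picked for the demonstration. The producer records the queue depth after every put, which shows that a slow consumer caps the depth at maxsize instead of letting it grow.

```python
import asyncio
from typing import List


async def producer(q: asyncio.Queue, n: int, depths: List[int]) -> None:
    for i in range(n):
        await q.put(i)             # suspends while the queue is full
        depths.append(q.qsize())   # queue depth right after each put
    await q.put(None)              # sentinel: no more items


async def consumer(q: asyncio.Queue, received: List[int]) -> None:
    while (item := await q.get()) is not None:
        await asyncio.sleep(0.001)  # simulate slow storage
        received.append(item)


async def main(maxsize: int = 5, n: int = 50):
    q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    depths: List[int] = []
    received: List[int] = []
    await asyncio.gather(producer(q, n, depths), consumer(q, received))
    return depths, received


depths, received = asyncio.run(main())
```

Both coroutines run on the same thread here, which is exactly the limitation discussed above: the queue throttles the producer, but it cannot make the consumer's CPU work overlap with anything else.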

For people who've used Kafka, this simplification might seem suspicious. "How can you replace Kafka with an in-memory channel?" The answer is: we don't replace Kafka in the general case. Kafka is indispensable for distributed multi-consumer systems with replay. What we replace is the *misuse* of Kafka — when you use it as a buffer between two components *on the same machine* because asyncio has no backpressure. That's like taking a plane to cross the street. The mpsc channel is the right tool for intra-process communication, and Kafka remains the right tool for inter-cluster communication. Both have their place. But when you can solve the problem with 3 lines of Tokio instead of a Kafka cluster to administer, you sleep better at night.

Python Client: streaming with auto-reconnect

On the client side, the Python code is deliberately minimal. The polarway.stream() API returns an async iterator that you consume with a standard async for. Batching is handled by the user (here, every 1,000 ticks), and each write returns a Result rather than an exception — which lets you log errors without interrupting the flow.

from polarway.async_client import AsyncPolarwayClient, Result

async def ingest_market_data():
    async with AsyncPolarwayClient("localhost:50051") as polarway:
        batch: list[dict] = []

        async for tick in polarway.stream("btcusdt@trade"):
            batch.append(tick)

            if len(batch) >= 1000:
                # Batch write — returns Result, not exception
                result: Result = await polarway.write_batch(batch)
                result.map(lambda n: print(f"✅ Stored {n} ticks"))
                result.map_err(lambda e: print(f"⚠️ Write failed: {e}"))
                batch.clear()

Notice the complete absence of try/except in this production code. Each write returns a Result: success or failure, explicitly. Errors are logged via map_err without interrupting the flow. If a batch fails, the error is logged and the loop moves on to the next batch; in this simplified excerpt the failed batch is cleared rather than retried. The stream never stops. This is both simpler and more robust than the equivalent asyncio version with its nested try/except ConnectionError blocks and finally: await cleanup() handlers.

This production code has been running 24/7 for 4 months. Zero unplanned restarts. Zero data loss. Zero on-call incidents related to ingestion. Before Polarway, the same pipeline required a weekly restart on average — sometimes due to OOM (asyncio Queue without bounds), sometimes due to an uncaught exception killing the event loop, sometimes due to a deadlock in the connection pool. The most frustrating were the "ghost" incidents: those resolved by systemctl restart that left no trace in the logs because the exception had been swallowed. With monads, every error is visible, every failover is explicit, and the log tells the complete story. The on-call engineer (often me, at 3AM) no longer needs to *guess* — they read.

6. Benchmarks with commentary

All benchmarks are run on AWS c6i.4xlarge (16 vCPU, 32 GB RAM) with a dataset of 50 Parquet files × 100 MB = 5 GB. Source code is available on GitHub and detailed results on ReadTheDocs. We publish benchmarks not to impress — anyone can publish favorable numbers — but to be held accountable. If you reproduce our tests and get different results, open an issue.

An important note on our methodology: each benchmark is run 10 times, the first 2 runs are discarded (warmup), and we report the median of the remaining 8. The dataset is generated from real market data (BTC/USDT ticks) replayed into Parquet files with Snappy compression. This isn't a synthetic benchmark with random integers — it's the same type of data our pipelines process in production.

Why the median and not the mean? Because the mean lies as much as exceptions do. A single outlier run (a Python GC triggering at the wrong time, a noisy neighbor on the AWS hypervisor) can shift the mean by 15%. The median is robust to outliers. Why discard the first 2 runs? Because CPU caches (L1/L2/L3), kernel page tables, and Python's potential JIT are cold on the first run. Comparing a cold first run to a warm run is misleading. This is methodological honesty, not cheating — and it's a nuance that most "Rust is 10x faster" blog posts completely ignore.
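The protocol described above (10 runs, drop the first 2, report the median of the remaining 8) fits in a few lines. This is a sketch of the procedure, not our actual benchmark harness; the function names are illustrative and the ten timings at the bottom are made up to show how a single outlier moves the mean but not the median.

```python
import statistics
import time
from typing import Callable, List, Sequence


def trimmed_median(timings: Sequence[float], warmup: int = 2) -> float:
    """Median of the timings that remain after dropping the warmup runs."""
    return statistics.median(timings[warmup:])


def bench(fn: Callable[[], object], runs: int = 10, warmup: int = 2) -> float:
    """Time fn() `runs` times and report the trimmed median in seconds."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return trimmed_median(timings, warmup)


# Ten made-up timings (seconds): two cold runs and one noisy outlier.
timings = [3.9, 3.1, 2.8, 2.8, 2.7, 2.9, 2.8, 4.6, 2.8, 2.7]
median = trimmed_median(timings)
mean = statistics.mean(timings[2:])
```

With these illustrative numbers the 4.6 s outlier pulls the mean of the eight kept runs to about 3.01 s, while the median stays at 2.8 s.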

6.1 Batch read: speedup grows with load

Files	Polars (sync)	Polarway (async)	Speedup
10	2.3s	0.6s	3.8x
50	11.5s	2.8s	4.1x
100	23.0s	4.5s	5.1x

Why does speedup increase? Tokio's work-stealing distributes I/O tasks across $c$ cores. With more files, Python's GIL saturation becomes the dominant bottleneck.

What's striking is the *linearity* on the Polars side. 10 files in 2.3s, 100 files in 23s: that's perfectly linear, $O(n)$. Polars doesn't parallelize file reads here even though the machine has 16 cores. On the Polarway side, 10 files in 0.6s, 100 files in 4.5s: ten times the files for 7.5 times the time, so *not* linear. Speedup increases because the more tasks there are, the better Tokio can distribute the work. That's the theoretical advantage of work-stealing: the parallelism factor converges to the number of cores as $n \to \infty$.
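The arithmetic behind the table can be reproduced directly from its three rows. The snippet below only recomputes what is already published there: the speedup column and the per-file cost on each side.

```python
# (files, Polars seconds, Polarway seconds), copied from the table in 6.1
rows = [(10, 2.3, 0.6), (50, 11.5, 2.8), (100, 23.0, 4.5)]

# Speedup = Polars time / Polarway time, rounded as in the table
speedups = {n: round(polars / polarway, 1) for n, polars, polarway in rows}

# Seconds per file: constant for Polars, shrinking for Polarway
per_file = {
    n: (round(polars / n, 3), round(polarway / n, 3))
    for n, polars, polarway in rows
}
```

The per-file cost stays at 0.23 s for Polars at every size, while on the Polarway side it falls from 0.06 s to 0.045 s as the batch grows.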

An important nuance: Polars is an *extraordinary* tool. Ritchie Vink and his team have done remarkable work. The problem isn't Polars. It's that Polars lives in Python's universe, and that universe has a glass ceiling called the GIL. Polars parallelizes *within* a single operation (its Rust engine, built on the Apache Arrow memory format, uses rayon for parallel scans), but it doesn't parallelize *between* operations when orchestrated from Python. That's the crucial distinction: intra-operation vs inter-operation parallelism. Polarway solves the latter by moving orchestration into Tokio. These aren't competing tools, they're complementary. Polarway uses Polars under the hood for DataFrame operations. We added the missing parallelism, we didn't replace the library.

6.2 Memory: constant vs linear

Dataset	Polars (mem)	Polarway (mem)	Status (Polars / Polarway)
1 GB	1.2 GB	0.5 GB	✅ / ✅
5 GB	5.8 GB	0.5 GB	✅ / ✅
10 GB	11.5 GB	0.5 GB	⚠️ / ✅
50 GB	OOM ❌	0.5 GB	❌ / ✅

$$ M_{\text{polars}}(n) = O(n) \qquad M_{\text{polarway}}(n) = O(\text{batch\_size}) = O(1) $$

Polarway memory is independent of dataset size.

This is the benchmark that convinced our trading team. The "50 GB → OOM" line is the reality they face every week: a backtest on 3 months of tick-by-tick data for 40 crypto pairs easily exceeds 32 GB of RAM. The standard solution was to rent r7i.8xlarge machines (256 GB RAM, ~$1.60/h). With Polarway in streaming mode, a c6i.4xlarge (32 GB RAM, ~$0.68/h) is enough. Over a year of daily backtests, that's thousands of euros in savings — without changing a single strategy.
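To put a rough number on that saving, the sketch below uses the two hourly rates quoted above (in USD). The number of hours the backtest machine runs per day is not stated in this article, so both values tried here are assumptions.

```python
R7I_8XLARGE = 1.60   # USD/hour, as quoted above
C6I_4XLARGE = 0.68   # USD/hour, as quoted above


def yearly_saving(hours_per_day: float, days: int = 365) -> float:
    """USD saved per year by running backtests on the smaller instance."""
    return round((R7I_8XLARGE - C6I_4XLARGE) * hours_per_day * days, 2)


saving_4h = yearly_saving(4)     # assumption: 4 hours of backtests per day
saving_24h = yearly_saving(24)   # assumption: the node runs around the clock
```

Under those assumptions the difference is about $1,343 per year at 4 hours a day and about $8,059 per year for an always-on node.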

The technical trick is simple but powerful: Polarway never loads the entire dataset into memory. It reads Parquet files by row group (typically 50,000-100,000 rows), processes them, emits the result, and *frees the memory*. Python's garbage collector has nothing to do — Arrow buffers are allocated and freed on the Rust side with a deterministic allocator (jemalloc). Memory is a flat plateau, not an ascending ramp. It's the difference between reading a book page by page (O(1) memory) and photocopying the entire book before starting to read (O(n)). The metaphor is obvious, but it took 6 months of Rust to implement correctly — especially the zero-copy between row groups and the gRPC layer. The devil is in the buffers.
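The page-by-page idea can be illustrated without any Parquet machinery. In the sketch below, row_groups is a hypothetical stand-in for a row-group reader (it just generates numbers), and streaming_mean keeps only running totals, so the number of rows held at once is bounded by the group size regardless of the dataset size. It is a toy model of the principle, not Polarway's Rust implementation.

```python
from typing import Iterable, Iterator, List, Tuple


def row_groups(n_rows: int, group_size: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for a Parquet reader: yields one row group at a time."""
    for start in range(0, n_rows, group_size):
        stop = min(start + group_size, n_rows)
        yield [float(i) for i in range(start, stop)]


def streaming_mean(groups: Iterable[List[float]]) -> Tuple[float, int]:
    """Return (mean, peak rows held at once) using O(group_size) memory."""
    total, count, peak = 0.0, 0, 0
    for group in groups:
        total += sum(group)
        count += len(group)
        peak = max(peak, len(group))
        # `group` is rebound on the next iteration, so the old one can be freed
    return total / count, peak


mean, peak_rows = streaming_mean(row_groups(1_000_000, 50_000))
```

Growing the dataset tenfold changes the running time but not `peak_rows`, which is the Python-level analogue of the flat memory plateau in the table above.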

6.3 Concurrent queries: linear scalability

Figure: queries per second (QPS), Polars vs Polarway, at 1, 10, 100, and 500 concurrent queries. Labeled values: Polarway reaches 650 QPS at 100 queries and 1200 QPS at 500 queries.

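The GIL wall: Polars saturates at ~70 QPS regardless of query count. Polarway keeps scaling with concurrency: 650 QPS at 100 queries and 1200 QPS at 500, that is $\text{QPS} \approx 2.4 \times n_{\text{queries}}$ at the top of the measured range.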

This chart is what we show skeptics. The Polars plateau at ~70 QPS is visible to the naked eye: that's the GIL wall. No matter how many queries you throw at it, Python can't process more than 70 per second because each query requires a minimum of CPU work (deserialization, filtering) that cannot be parallelized. On the Polarway side, the curve keeps climbing. At 500 concurrent queries, we're at 1200 QPS, about 17x above Python's ceiling. And the curve doesn't yet show saturation, suggesting the real limit is well beyond.

I want to be honest about what this chart does *not* show. At 1 sequential query, Polars and Polarway are both at 10 QPS. Polarway isn't faster on an isolated task — it's faster when your load increases. This is a crucial point that "Rust vs Python" articles often omit: the speedup isn't free, it manifests under load. If you have a weekly backtest script that processes one file at a time, Polarway won't help you. If you have a real-time dashboard with 50 concurrent users querying a 500GB lakehouse, Polarway is a game changer. Understanding *when* a tool is useful is as important as understanding *how* it works.

6.4 WebSocket streaming

Metric	Value
Latency (p50)	0.8 ms
Latency (p99)	1.2 ms
Throughput	120k ticks/sec
Memory (1M tick window)	500 MB (constant)

7. What's next: Distributed Mode & Bus

Polarway doesn't stop at a single-process runtime. Here's what we're building — and why each piece of the puzzle is essential.

Today's Polarway is a single gRPC server exploiting a multi-threaded Tokio runtime. That's already enough to crush asyncio on a single node. But our vision goes further: we want the same client code await client.batch_read(paths) to work transparently across a cluster of nodes, without the user knowing (or needing to know) how many machines are processing their data. That's the transition from $O(n/c)$ to $O(n/(c \cdot k))$ where $k$ is the number of nodes.

This isn't just an engineer's dream. It's a market necessity. Crypto datasets grow exponentially: tick-by-tick for 500+ pairs across 3+ exchanges, that's several hundred GB per month. Today, a single Polarway node handles our current load. But we see the curve, and we know that by 2027, a single node won't be enough. The question isn't *if* we'll need distributed mode — it's *when*. And we'd rather have the infrastructure ready before demand arrives, not after.

🌐 Distributed Mode

Automatic distribution of Tokio tasks across a cluster of nodes. A distributed JoinSet that scales horizontally, with the same client code.

🚌 Event Bus

A pub/sub event bus integrated into Polarway. Publish a DataFrame, subscribe from any node. Backpressure and exactly-once guarantees.

📊 Query Federation

Federated SQL queries across heterogeneous sources (Parquet, Delta Lake, DuckDB, PostgreSQL) via a single gRPC endpoint.

🧪 Interactive REPL

A Python REPL connected to the Tokio runtime for interactive exploration of distributed DataFrames with auto-completion.

The Bus is perhaps the feature we're most excited about. Imagine a pipeline where one node publishes DataFrames of raw ticks, another subscribes to compute real-time indicators, and a third persists everything to Delta Lake — all with exactly-once guarantees and end-to-end backpressure. It's Kafka, but without Kafka: no ZooKeeper cluster, no topic management, no consumer groups to configure. Just Rust and distributed Tokio channels.
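The three-stage pipeline described above can be approximated in a single Python process with bounded queues. This sketch only illustrates the backpressure idea: it is not the Bus, it does not cross node boundaries, and it makes no attempt at exactly-once delivery. All function names are hypothetical, the "indicator" is a plain moving average, and a `None` sentinel marks end of stream.

```python
import asyncio


async def publisher(out: asyncio.Queue, n_ticks: int) -> None:
    """Stage 1: publish raw ticks."""
    for i in range(n_ticks):
        # put() suspends while the queue is full, so a slow consumer
        # automatically slows the producer down (backpressure).
        await out.put(float(i))
    await out.put(None)  # end-of-stream sentinel


async def indicator(inp: asyncio.Queue, out: asyncio.Queue, window: int = 3) -> None:
    """Stage 2: compute a moving average over the last `window` ticks."""
    recent: list[float] = []
    while (tick := await inp.get()) is not None:
        recent = (recent + [tick])[-window:]
        await out.put(sum(recent) / len(recent))
    await out.put(None)


async def sink(inp: asyncio.Queue, store: list) -> None:
    """Stage 3: persist results (here, just append to a list)."""
    while (value := await inp.get()) is not None:
        store.append(value)


async def run_pipeline(n_ticks: int, capacity: int = 2) -> list:
    raw = asyncio.Queue(maxsize=capacity)      # bounded: at most `capacity` in flight
    derived = asyncio.Queue(maxsize=capacity)
    store: list = []
    await asyncio.gather(
        publisher(raw, n_ticks),
        indicator(raw, derived),
        sink(derived, store),
    )
    return store
```

Calling `asyncio.run(run_pipeline(5))` pushes five ticks through all three stages while never holding more than two items in either queue. The queue capacity is the knob that trades latency against memory.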

If this sounds ambitious, it's because it is. We don't claim the Bus will replace Kafka for LinkedIn-scale use cases. But for a 5-50 person team processing financial data in real-time that has neither the expertise nor the budget to administer a Kafka cluster (3 brokers + ZooKeeper + Schema Registry + monitoring = minimum 6 machines), the Bus offers 80% of the functionality for 10% of the operational complexity. It's the Unix philosophy: do one thing, do it well, and let people compose. Kafka is an operating system. The Bus is a tool.

A personal word to close this section. When I started working on Polarway 18 months ago, I didn't know how to write Rust. I came from 10 years of Python. The borrow checker made me cry for the first 3 weeks. I couldn't understand why the compiler refused code that was "obviously correct". Then, slowly, something shifted. I started seeing errors not as obstacles but as guides. The compiler wasn't telling me "you're wrong" — it was telling me "you might have a data race here, and I can't prove it safe". It's a collaborator, not an adversary. And the day I compiled the first WebSocket pipeline without a single unsafe and it ran without correction for 30 consecutive days in production — that day, I understood why people fall in love with Rust. Not for the syntax (it's brutalist). For the *confidence*. When it compiles, it works. Truly.

$$ \text{Polarway}_{v2} : \underbrace{T_{\text{tokio}}(n, c)}_{\text{single-node}} \to \underbrace{T_{\text{distributed}}(n, c, k)}_{\text{k nodes}} = O\!\left(\frac{n}{c \cdot k}\right) $$

The next level: linear scalability with the number of nodes.

Try Polarway

Async pipeline, Result/Option monads, reproducible benchmarks. Everything is open-source.
