Table of Contents
  1. Introduction — Why RPC in HFT?
  2. gRPC Tonic: The Real-Time Backbone
  3. Polarway Overall Architecture
  4. Delta Lake Lakehouse: ACID Data at Every Layer
  5. Time-Travel: Regulatory Audit & Debugging
  6. Out-of-the-Box GDPR Compliance
  7. Measured Performance
  8. Integration with HFThot
  9. Conclusion

1. Introduction — Why RPC in HFT?

High-frequency trading demands microsecond-level latencies. REST/HTTP, with its JSON headers and text serialization, introduces overhead incompatible with these constraints. What we need is a binary, typed, streamable protocol — which is exactly what gRPC provides.

But speed without data reliability is worthless. That's why Polarway pairs gRPC with a Delta Lake Lakehouse: every tick, signal, and allocation decision is atomically persisted, versioned, and auditable. The infrastructure is designed as a coherent whole, not two separate systems.

Polarway in a nutshell: a hybrid Rust/Python storage engine orchestrating an LRU Cache → Parquet → DuckDB stack, exposed via both gRPC (real-time streaming) and a Delta Lake lakehouse (analytics & compliance). Source: github.com/ThotDjehuty/polarway — Docs: polarway.readthedocs.io

2. gRPC Tonic: The Real-Time Backbone

Tonic is the native Rust gRPC implementation built on Tokio (async runtime). In HFThot, the gRPC server listens on port 50053 and exposes three services: MarketService (ticks, OHLCV), SignalService (HMM regimes, MFG price), and PortfolioService (portfolio optimization via optimiz-rs).

Why gRPC over REST for HFT

Protobuf serializes 5–10× faster than JSON. HTTP/2 multiplexes streams over a single TCP connection. Bidirectional streaming allows a price tick to be broadcast to 10,000 client connections in parallel with a single server write. The .proto definition enforces a strong contract between producer and consumer.
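The size and speed gap between text and binary serialization can be illustrated with a stdlib-only sketch (plain struct packing standing in for Protobuf's wire format; the tick fields and the measured ratio are illustrative, not the article's benchmark):

```python
import json
import struct
import timeit

# A single tick as it might appear in a market feed (illustrative fields).
tick = {"symbol": "BTC-USDT", "price": 64250.5, "qty": 0.042, "ts_ms": 1736947200000}

# Text path: JSON, as a REST API would send it.
def encode_json() -> bytes:
    return json.dumps(tick).encode()

# Binary path: fixed-layout packing, standing in for Protobuf's wire format.
# "<8sddq" = 8-byte symbol, two float64s, one int64 — 32 bytes total.
def encode_binary() -> bytes:
    return struct.pack("<8sddq", tick["symbol"].encode()[:8],
                       tick["price"], tick["qty"], tick["ts_ms"])

json_size, bin_size = len(encode_json()), len(encode_binary())
json_t = timeit.timeit(encode_json, number=100_000)
bin_t = timeit.timeit(encode_binary, number=100_000)
print(f"JSON: {json_size} bytes, binary: {bin_size} bytes, speedup ~{json_t / bin_t:.1f}x")
```

Beyond raw encoding speed, the fixed binary layout means the receiver knows field offsets in advance — no text parsing at all on the hot path.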

Bidirectional streaming example

// hfthot-lab-core/src/grpc/market_service.rs
use tonic::{Request, Response, Status};
use tokio_stream::{wrappers::ReceiverStream, StreamExt};

#[tonic::async_trait]
impl MarketService for HfthotService {
    type StreamTicksStream = ReceiverStream<Result<Tick, Status>>;

    async fn stream_ticks(
        &self,
        request: Request<SubscribeRequest>,
    ) -> Result<Response<Self::StreamTicksStream>, Status> {
        let symbol = request.into_inner().symbol;
        let (tx, rx) = tokio::sync::mpsc::channel(128);

        // Subscribe to Polarway price feed (zero-copy Arrow batches)
        let mut feed = self.polarway.subscribe_feed(&symbol).await?;

        tokio::spawn(async move {
            while let Some(tick) = feed.next().await {
                let msg = Tick {
                    symbol: tick.symbol.clone(),
                    price: tick.close,
                    qty: tick.volume,
                    ts_ms: tick.timestamp_ms,
                    latency_us: tick.pipeline_latency_us,
                };
                if tx.send(Ok(msg)).await.is_err() { break; }
            }
        });

        Ok(Response::new(ReceiverStream::new(rx)))
    }
}
Zero-copy memory: Polarway pushes Apache Arrow batches into the gRPC channel without intermediate deserialization. The Arrow buffer is directly encoded as Protobuf bytes — no heap allocation between the exchange and the client.
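The zero-copy idea — handing a buffer downstream without re-serializing or re-allocating it — can be illustrated in Python with memoryview (an analogy for the Arrow-to-gRPC path, not the actual Rust implementation):

```python
# Illustrative analogy: slicing a buffer without copying, the way Arrow
# record batches can be handed to the gRPC layer as raw bytes.
batch = bytearray(1_000_000)  # pretend this is an Arrow IPC buffer
view = memoryview(batch)

# Slicing a memoryview creates a new view over the SAME memory: no copy.
tick_frame = view[0:64]
assert tick_frame.obj is batch   # still backed by the original buffer

# Mutations through the parent are visible through the view — proof that
# no intermediate allocation happened between producer and consumer.
batch[0] = 0xFF
assert tick_frame[0] == 0xFF
print("view shares the producer's memory: no copy on the hot path")
```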

3. Polarway Overall Architecture

🌐 Exchange WebSocket / REST · Binance · Coinbase · Kraken · ccxt
        ↓ Arrow normalization
⚡ gRPC Tonic (port 50053) · Tokio async · bidirectional streaming
        ↓ automatic routing
🔥 LRU Cache (~15ms · hot path) · 📦 Parquet (~30ms · warm) · 🦆 DuckDB (~45ms · analytics)
        ↓ Delta Lake writes (ACID)
🗄️ Delta Lake Lakehouse · time-travel · GDPR · audit log
        ↓ Polars lazy eval
📊 HMM regimes (optimiz-rs) · 📈 MFG signals (HJB-FP equations) · 🖥️ Streamlit real-time dashboard

Data enters at the top (exchanges), transits via gRPC, is automatically routed to the appropriate storage tier based on recency and access frequency, then flows to the Lakehouse for ACID persistence. Polars in lazy evaluation mode traverses all three tiers without loading the entire dataset into memory.
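The tier-routing decision described above can be sketched as follows — a toy policy keyed on cache membership and recency (the class name, thresholds, and tier labels are hypothetical; the article does not disclose Polarway's actual routing rules):

```python
import time
from collections import OrderedDict

class TierRouter:
    """Route reads to hot / warm / analytics tiers by recency and access
    pattern. Thresholds are illustrative, not Polarway's actual policy."""

    def __init__(self, cache_capacity: int = 1024, hot_window_s: float = 60.0):
        self.lru: OrderedDict[str, object] = OrderedDict()
        self.capacity = cache_capacity
        self.hot_window_s = hot_window_s
        self.last_seen: dict[str, float] = {}

    def route(self, key: str) -> str:
        now = time.monotonic()
        age = now - self.last_seen.get(key, float("-inf"))
        self.last_seen[key] = now
        if key in self.lru:                  # hot path: ~15ms budget
            self.lru.move_to_end(key)
            return "lru_cache"
        if age < self.hot_window_s:          # recently touched: warm Parquet
            return "parquet"
        return "duckdb"                      # cold / analytical query

    def put(self, key: str, value: object) -> None:
        self.lru[key] = value
        self.lru.move_to_end(key)
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)     # evict least-recently-used

router = TierRouter()
router.put("BTC-USDT", {"close": 64250.5})
print(router.route("BTC-USDT"))   # cached key → "lru_cache"
print(router.route("ETH-USDT"))   # never seen → "duckdb"
```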

Technology Stack

🔥 HFThot Core (Rust): Tonic gRPC · Tokio async · Axum REST · WebSocket
🏔️ Polarway (Rust/Python): Polars LazyFrame · Apache Arrow · DuckDB · Delta Lake · Parquet · LRU Cache
⚙️ optimiz-rs (Rust + PyO3): DE / SHADE · HJB-FP MFG · PyO3 bindings · WASM · Risk Parity
🐍 Python / Streamlit layer: Streamlit dashboard · ccxt market data · FastAPI REST bridge · Argon2 auth

4. Delta Lake Lakehouse: ACID Data at Every Layer

A Lakehouse combines the flexibility of a Data Lake (raw Parquet files) with the transactional guarantees of a Data Warehouse (ACID, schemas). Delta Lake adds a JSON transaction log on top of Parquet files to ensure atomicity, consistency, isolation, and durability — even when the server crashes.

# python/lakehouse/client.py (Polarway LakehouseClient)
from python.lakehouse.client import LakehouseClient

client = LakehouseClient("/app/data/lakehouse")

# ACID write: saves user API keys atomically
success = client.save_api_keys(
    user_id="user-123",
    provider_keys={
        "finnhub": {"api_key": "c8qXXX", "queries_limit": 3600},
    },
    data_sharing_consent=True
)

# The Delta transaction log records this as a new version
# → /lakehouse/api_keys/_delta_log/00000000000000000003.json
# If the write fails mid-way, Delta rolls back => no partial writes

In live trading, a server crash during a position write could leave a corrupted state. With Delta Lake, either the transaction is complete or it's absent — never half-committed. This drastically simplifies crash recovery logic.
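The all-or-nothing property rests on how the transaction log entry is published. A toy version of the mechanism — assuming a Delta-style numbered JSON log, not the actual deltalake library — uses write-then-atomic-rename so readers never observe a half-written version:

```python
import json
import os
import tempfile

def commit_version(log_dir: str, version: int, actions: list) -> None:
    """Atomically commit one Delta-style log entry (illustrative helper).

    The entry is written to a temp file first, then renamed into place.
    os.replace is atomic on POSIX: a concurrent reader sees either the
    complete version file or nothing — never a partial write.
    """
    os.makedirs(log_dir, exist_ok=True)
    final = os.path.join(log_dir, f"{version:020d}.json")
    fd, tmp = tempfile.mkstemp(dir=log_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(json.dumps(actions))
            f.flush()
            os.fsync(f.fileno())   # durable before it becomes visible
        os.replace(tmp, final)     # atomic publish of the new version
    except BaseException:
        os.unlink(tmp)             # on failure: no partial file left behind
        raise

log = tempfile.mkdtemp()
commit_version(log, 3, [{"add": {"path": "part-0003.parquet", "rows": 1440}}])
print(sorted(os.listdir(log)))  # → ['00000000000000000003.json']
```

If the process dies before the rename, the temp file is garbage to be cleaned up; the table's visible state is untouched — which is exactly the crash-recovery simplification the paragraph above describes.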

5. Time-Travel: Regulatory Audit & Debugging

Every write in Polarway creates a new numbered, timestamped Delta version. We can query the exact state of any table at any historical moment — a week, a month, a year in the past.

from datetime import datetime
from python.lakehouse.client import LakehouseClient

client = LakehouseClient("/app/data/lakehouse")

# --- Debugging: state before a production bug ---
# Was it version 12 that had the wrong quota?
users_v12 = client.read_version("users", version=12)
api_keys_v12 = client.read_version("api_keys", version=12)

# --- MiFID II audit: state at a precise date ---
audit_date = datetime(2026, 1, 15, 14, 0, 0)  # time of the disputed transaction
state = client.read_at_timestamp("api_keys", audit_date)

# --- List all available versions ---
versions = client.list_versions("market/BTC-USDT")
# [{"version": 0, "timestamp": "...", "operation": "WRITE", "rows_added": 1440},
#  {"version": 1, "timestamp": "...", "operation": "WRITE", "rows_added": 1440}]

A client disputes an API quota deduction on January 15th. With time-travel, we retrieve exactly the api_keys table at the moment of the disputed request, with the precise counter value — irrefutable proof for support or legal proceedings.

6. Out-of-the-Box GDPR Compliance

GDPR (notably Article 17 — right to erasure) mandates permanent deletion of personal data. Delta Lake enables this with VACUUM: after a soft-delete (tombstone), the vacuum physically erases Parquet files from old versions, leaving no recoverable trace.

# GDPR Article 17 — Right to erasure

# Step 1: soft-delete (tombstone, data still recoverable for 30 days)
client.delete_user_soft(user_id="user-123")

# Step 2: irreversible physical deletion (VACUUM)
# Erases every Parquet file referenced by old Delta versions
client.delete_user_permanent(user_id="user-123")
# → Removes from: users/, sessions/, api_keys/, audit_log/
# → Runs VACUUM (retention_hours=0 for immediate deletion)
# → Old Delta version files physically removed from disk

# Step 3: GDPR export (Article 20 — data portability)
export = client.export_user_data(user_id="user-123", format="json")
# Returns structured JSON: user + sessions + api_keys + audit_log
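What the permanent-deletion step does at the file level can be sketched as follows — a minimal VACUUM that removes Parquet files no longer referenced by the latest version (the function and file layout are hypothetical, not Polarway's internals):

```python
import os
import tempfile

def vacuum(table_dir: str, live_files: set) -> list:
    """Physically remove Parquet files not referenced by the latest Delta
    version — the GDPR-relevant half of VACUUM. With retention_hours=0
    semantics, every unreferenced file is deleted immediately."""
    removed = []
    for name in sorted(os.listdir(table_dir)):
        if name.endswith(".parquet") and name not in live_files:
            os.unlink(os.path.join(table_dir, name))
            removed.append(name)
    return removed

table = tempfile.mkdtemp()
for name in ("part-0001.parquet", "part-0002.parquet", "part-0003.parquet"):
    open(os.path.join(table, name), "w").close()

# After the soft-delete, the latest snapshot only references part-0003:
# the user's rows lived in the two older files.
gone = vacuum(table, live_files={"part-0003.parquet"})
print(gone)               # the user's old data files, now unrecoverable
print(os.listdir(table))  # only the live snapshot remains
```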

7. Measured Performance

The following metrics are measured on our production infrastructure (dedicated Infomaniak server, 8 cores, 32 GB RAM). Numbers represent P99 percentiles over 1 million operations.

320 µs · P99 tick → client latency (exchange → gRPC client)
15 ms · LRU cache hit (hot-path read)
30 ms · Parquet read (warm data)
45 ms · DuckDB analytics (complex SQL)
85 % · cache hit rate (recent data)
10× · speed vs REST+JSON (gRPC Protobuf gain)

The end-to-end 320µs latency includes: tick capture from exchange (WebSocket), Arrow normalization in Rust (~80µs), Protobuf encoding (~40µs), TCP/IP transmission (~150µs on local network), client decoding (~50µs). The limiting factor is the network — not our code.
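The four quantified stages in the breakdown sum exactly to the headline figure, which makes the "network-bound" claim easy to check (numbers copied from the paragraph above):

```python
# End-to-end P99 budget from the measurements above (microseconds)
budget_us = {
    "arrow_normalization": 80,
    "protobuf_encoding": 40,
    "tcp_ip_transmission": 150,   # local network
    "client_decoding": 50,
}
total = sum(budget_us.values())
print(f"total: {total}µs")          # → total: 320µs
dominant = max(budget_us, key=budget_us.get)
print(f"dominant stage: {dominant}")  # → dominant stage: tcp_ip_transmission
```

Transmission alone accounts for nearly half the budget, so further gains would have to come from the network layer, not the serialization code.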

8. Integration with HFThot: The Full Cycle

Here is how a portfolio allocation decision traverses the entire system, from price capture to order execution:

① BTC-USDT tick (320µs) · Polarway gRPC streaming
② HMM regime detection (94µs) · optimiz-rs Viterbi algorithm
        ↓ on regime change
③ Portfolio optimization (2ms) · optimiz-rs Risk Parity / CARA / Sparse L1
④ Order signal · TWAP/VWAP slicing → broker API
        ↓ persist (ACID)
⑤ Delta Lake write · tick + signal + allocation → versioned + auditable

# Full integration example
import grpc
import hfthot_pb2_grpc as stub
import hfthot_pb2 as pb
from python.lakehouse.client import LakehouseClient

# --- Setup ---
channel = grpc.secure_channel("api.hfthot-lab.eu:50053", grpc.ssl_channel_credentials())
meta = [("x-hfthot-api-key", "hft_pro_your_key")]
signals = stub.SignalServiceStub(channel)
portfolio = stub.PortfolioServiceStub(channel)
lake = LakehouseClient("/app/data/lakehouse")

# --- Streaming regime updates ---
for event in signals.StreamRegime(pb.SubscribeRequest(symbol="BTC-USDT"), metadata=meta):
    if event.confidence > 0.8:
        # Re-optimize portfolio on regime change
        weights = portfolio.Optimize(pb.OptimizeRequest(
            method="risk_parity",
            symbols=["BTC-USDT", "ETH-USDT", "SOL-USDT"],
        ), metadata=meta)

        # Persist allocation decision (ACID write to Delta Lake)
        lake.write_allocation(
            strategy=event.strategy,
            weights=dict(zip(weights.symbols, weights.values)),
            regime=event.to,
            confidence=event.confidence
        )
        print(f"Regime: {event.to} → weights: {weights}")

9. Conclusion

Polarway demonstrates that it's possible to build a complete HFT infrastructure from open-source components (Tonic, Delta Lake, Polars, DuckDB) without sacrificing real-time performance, regulatory compliance, or data reliability. The gRPC + Lakehouse combination creates a symbiosis: the former handles the flow, the latter handles the truth.

Upcoming articles will cover the Mean Field Games implementation in optimiz-rs, geometric arbitrage strategies, and zero-downtime deployment on dedicated Infomaniak servers.

Resources

Polarway Docs (ReadTheDocs) · polarway GitHub · optimiz-rs GitHub · API Reference · HFThot Architecture · Full Documentation