Adaptive Portfolio Construction & Execution

Outline

Why adaptive portfolios?
The model — regimes, drift, dispersion
Hamilton–Jacobi–Bellman with execution costs
Numerical scheme & calibration
Synthetic example — 5-asset rotation
Backtest performance
Practical takeaways
Read the paper / take the course

1. Why adaptive portfolios?

Static mean-variance is the default optimiser everywhere, and the reason it disappoints is well known: covariance matrices are unstable, expected-return forecasts are noisy, and turnover is unbounded once a single input changes. A typical desk patches this with shrinkage, Black–Litterman views, and ad-hoc rebalancing rules. The result is a brittle pipeline that produces beautiful efficient frontiers and ugly P&L.

WP3 starts from the opposite premise: the optimiser is part of a controlled stochastic system, where the controlled object is the position vector $\theta_t$ and the objective is a discounted utility that charges for trading speed. The covariance matrix is no longer a nuisance — it is the diffusion coefficient of a Markovian state. Forecasts are no longer point estimates — they are conditional expectations under a regime model. Turnover is no longer constrained by hand — it is shaped by the running cost of execution.

Core idea. Replace "compute optimal weights, then trade towards them" with "compute the optimal trading rate directly, given the current market state and the cost of moving". The fixed point of this controlled system is exactly the regime-aware portfolio you were trying to construct, but without the artificial split between optimisation and execution.

2. The model

Consider $d$ tradable assets with price vector $S_t \in \mathbb{R}^d$. The market is described by a continuous state $X_t \in \mathbb{R}^k$ (factors, levels, vol, skew, dispersion) following a regime-switching SDE driven by a finite-state Markov chain $Z_t \in \{1, \ldots, m\}$:

$$ dX_t = \mu(X_t, Z_t)\,dt + \Sigma(X_t, Z_t)\,dW_t, \qquad Z_t \text{ Markov with generator } Q. $$

Asset returns are conditioned on $X_t$ via

$$ dS_t / S_t = \alpha(X_t, Z_t)\,dt + \sigma(X_t, Z_t)\,dB_t, $$

where $W_t$ and $B_t$ are correlated Brownian motions whose joint covariance is the only object the framework actually needs.
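Here is a minimal sketch of how paths of $(X_t, Z_t)$ can be simulated. It assumes an Euler–Maruyama step for $X$ and the first-order approximation $\mathbb{P}(Z_{t+\Delta t} = z' \mid Z_t = z) \approx \delta_{zz'} + q_{zz'}\,\Delta t$ for the chain. The function name, the two-factor state and the mean-reverting drift in the usage lines are illustrative choices, not WP3's calibration. Only $X$ is simulated. Asset prices would be driven in the same way through $\alpha$, $\sigma$ and the correlated $B_t$.

```python
import numpy as np

def simulate_regime_sde(mu, Sigma, Q, x0, z0, T=1.0, n_steps=252, seed=0):
    """Euler-Maruyama for dX = mu(X, Z) dt + Sigma(X, Z) dW, Z a Markov chain with generator Q.

    mu(x, z) returns a (k,) drift and Sigma(x, z) a (k, k) diffusion matrix.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    m, k = Q.shape[0], len(x0)
    P = np.eye(m) + Q * dt              # one-step regime transitions (needs dt * max|q_zz| <= 1)
    X = np.empty((n_steps + 1, k))
    Z = np.empty(n_steps + 1, dtype=int)
    X[0], Z[0] = x0, z0
    for n in range(n_steps):
        x, z = X[n], Z[n]
        dW = rng.standard_normal(k) * np.sqrt(dt)
        X[n + 1] = x + mu(x, z) * dt + Sigma(x, z) @ dW
        Z[n + 1] = rng.choice(m, p=P[z])   # regime jump with probability ~ q_zz' dt
    return X, Z

# Illustrative usage: a 2-factor state mean-reverting to a regime-dependent level.
Q = np.array([[-0.20, 0.15, 0.05],
              [0.10, -0.20, 0.10],
              [0.05, 0.15, -0.20]])
levels = np.array([[0.02, 0.15], [0.00, 0.20], [-0.03, 0.35]])   # hypothetical targets per regime
vols = np.array([0.05, 0.08, 0.15])                               # hypothetical state vols per regime
X, Z = simulate_regime_sde(
    mu=lambda x, z: 2.0 * (levels[z] - x),
    Sigma=lambda x, z: vols[z] * np.eye(2),
    Q=Q, x0=levels[0], z0=0, T=5.0, n_steps=5 * 252,
)
```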

Position dynamics with finite trading rate

Let $\theta_t \in \mathbb{R}^d$ be the portfolio position. We do not assume $\theta_t$ can jump; instead it follows

$$ d\theta_t = u_t\,dt, $$

where $u_t$ is the trading rate we control. This is the Almgren–Chriss-style structure that makes the optimisation problem realistic.

Cost functional

Investor utility over a horizon $T$:

$$ J(\theta_0, X_0, Z_0; u) = \mathbb{E}\!\left[\int_0^T e^{-\rho t}\!\left( \theta_t^\top \alpha_t - \tfrac{\gamma}{2}\theta_t^\top \Sigma_t \theta_t - \tfrac{\eta}{2}u_t^\top \Lambda u_t \right) dt \right], $$

with discount rate $\rho \geq 0$, risk aversion $\gamma > 0$, execution-cost matrix $\Lambda \succ 0$, and impact scaling $\eta > 0$. Here $\alpha_t = \alpha(X_t, Z_t)$, and $\Sigma_t$ is the $d \times d$ instantaneous covariance of asset returns, $\sigma\sigma^\top(X_t, Z_t)$. The first two terms are the classical mean-variance criterion; the third is what links portfolio construction to execution.
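To make the functional concrete, here is a minimal sketch that evaluates the integrand of $J$ along one discretised path with a left-point Riemann sum. Averaging this over many simulated paths would give a Monte Carlo estimate of $J$. The function name and the one-asset numbers are illustrative assumptions. The second usage builds $\theta$ from $u$ through $\theta_{k+1} = \theta_k + u_k\,\Delta t$, the discrete form of $d\theta_t = u_t\,dt$.

```python
import numpy as np

def discounted_objective(theta, u, alpha, Sigma, dt, gamma=3.0, eta=1.0, Lam=None, rho=0.0):
    """Left-point Riemann sum of the integrand of J along one path (one Monte Carlo sample).

    theta, u, alpha: (n, d) arrays on t_k = k * dt; Sigma: (n, d, d); Lam: (d, d).
    """
    n, d = theta.shape
    Lam = np.eye(d) if Lam is None else Lam
    disc = np.exp(-rho * np.arange(n) * dt)                        # e^{-rho t_k}
    gain = np.einsum("kd,kd->k", theta, alpha)                     # theta^T alpha
    risk = 0.5 * gamma * np.einsum("kd,kde,ke->k", theta, Sigma, theta)
    cost = 0.5 * eta * np.einsum("kd,de,ke->k", u, Lam, u)        # cost of trading speed
    return float(np.sum(disc * (gain - risk - cost)) * dt)

# Illustrative check: one asset held at theta = 1 with no trading for one year.
n, dt = 100, 0.01
alpha = np.full((n, 1), 0.10)
Sigma = np.full((n, 1, 1), 0.04)
J_hold = discounted_objective(np.ones((n, 1)), np.zeros((n, 1)), alpha, Sigma, dt)
# (0.10 - 1.5 * 0.04) per year over one year gives J_hold = 0.04

# Same asset, but reached by trading at rate 1 per year from theta = 0.
u_ramp = np.ones((n, 1))
theta_ramp = np.vstack([np.zeros((1, 1)), np.cumsum(u_ramp[:-1] * dt, axis=0)])
J_ramp = discounted_objective(theta_ramp, u_ramp, alpha, Sigma, dt)
```

The ramp earns less because it holds a smaller position on average and pays the quadratic execution cost on top.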

3. The HJB equation

Writing $V(t, \theta, x, z)$ for the optimal value of the objective started at time $t$ in state $(\theta, x, z)$ and discounted back to $t$, dynamic programming yields the Hamilton–Jacobi–Bellman PDE

$$ \rho V - \partial_t V = \sup_{u} \left\{ \theta^\top \alpha - \tfrac{\gamma}{2} \theta^\top \Sigma \theta - \tfrac{\eta}{2} u^\top \Lambda u + u^\top \nabla_\theta V + \mathcal{L}^X V + (QV)(z) \right\}, $$

with terminal condition $V(T, \cdot) = g(\theta, x, z)$ encoding the final-portfolio reward. For $J$ exactly as written, $g \equiv 0$; a non-zero $g$ corresponds to adding $e^{-\rho T} g(\theta_T, X_T, Z_T)$ inside the expectation. Here $\mathcal{L}^X$ is the generator of $X$ in regime $z$: the first- and second-order terms in $x$ coming from the drift $\mu$ and the diffusion coefficient of the $X$ equation. The first-order condition in $u$ gives the optimal trading rate in closed form:

$$ u^\star_t = \frac{1}{\eta}\, \Lambda^{-1}\, \nabla_\theta V(t, \theta_t, X_t, Z_t). $$
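The first-order condition is easy to check numerically. The $u$-dependent part of the supremum, $-\tfrac{\eta}{2}u^\top\Lambda u + u^\top\nabla_\theta V$, is strictly concave when $\Lambda \succ 0$. The closed form is therefore its unique maximiser, with maximum value $\tfrac{1}{2\eta}\nabla_\theta V^\top \Lambda^{-1}\nabla_\theta V$. In this sketch, the random positive-definite $\Lambda$ and the random vector standing in for $\nabla_\theta V$ are placeholders.

```python
import numpy as np

def optimal_rate(grad_V, Lam, eta):
    # u* = eta^{-1} Lam^{-1} grad_theta V, via a linear solve instead of an explicit inverse
    return np.linalg.solve(eta * Lam, grad_V)

def control_terms(u, grad_V, Lam, eta):
    # the only u-dependent terms inside the HJB supremum
    return -0.5 * eta * u @ Lam @ u + u @ grad_V

rng = np.random.default_rng(1)
d, eta = 5, 2.0
A = rng.standard_normal((d, d))
Lam = A @ A.T + d * np.eye(d)        # placeholder symmetric positive-definite cost matrix
g = rng.standard_normal(d)           # placeholder for grad_theta V at one grid point
u_star = optimal_rate(g, Lam, eta)
best = control_terms(u_star, g, Lam, eta)
# strict concavity in u: every perturbation of u* lowers the objective
worse = [control_terms(u_star + 0.1 * rng.standard_normal(d), g, Lam, eta) for _ in range(100)]
```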

Why this matters. The optimal rate is the gradient of the value function, rescaled by $\eta^{-1}\Lambda^{-1}$ — exactly the same object that mean-variance optimisers approximate by hand using a target. Here it is computed self-consistently, with execution friction built in. There is no "rebalance threshold" to tune.
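In the simplest special case, the closed form can be carried all the way through. Assume one asset, one regime, no state variable, constant $\alpha$ and $\sigma$, a scalar $\Lambda = \lambda$, and an infinite horizon. This is our simplification, not a case worked in the text. $V$ then depends on $\theta$ only, and the discounted HJB is stationary:

$$ \rho V(\theta) = \sup_u \left\{ \alpha\theta - \tfrac{\gamma}{2}\sigma^2\theta^2 - \tfrac{\eta\lambda}{2}u^2 + u\,V'(\theta) \right\}, \qquad u^\star = \frac{V'(\theta)}{\eta\lambda}. $$

Trying $V(\theta) = a + b\theta - \tfrac{c}{2}\theta^2$ and matching the coefficients of $\theta^2$ and $\theta$ gives

$$ c^2 + \rho\eta\lambda\,c - \gamma\sigma^2\eta\lambda = 0, \qquad b\left(\rho + \frac{c}{\eta\lambda}\right) = \alpha. $$

Taking the positive root $c$ (so that $V$ is concave) yields

$$ u^\star = \kappa\,(\bar\theta - \theta), \qquad \kappa = \frac{c}{\eta\lambda} = \frac{-\rho + \sqrt{\rho^2 + 4\gamma\sigma^2/(\eta\lambda)}}{2}, \qquad \bar\theta = \frac{b}{c} = \frac{\alpha}{\gamma\sigma^2}. $$

The controller therefore trades towards the static mean-variance weight $\alpha/(\gamma\sigma^2)$. The execution cost leaves that target unchanged and only sets the speed $\kappa$, which falls as $\eta\lambda$ grows.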

Regime-driven structure

The regime chain $Z_t$ enters via the term $(QV)(z) = \sum_{z'} q_{zz'} V(\cdot, z')$. In practice, when the system is in regime $z$, the value of being there depends on the value of being in every other regime, weighted by the transition rate $q_{zz'}$. This is what couples the regimes, and it is why the strategy "anticipates" regime changes rather than reacting after the fact.
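Here is a small numerical illustration of the coupling term. It uses the illustrative generator from the Section 4 skeleton and made-up regime values $V(\cdot, z)$ at a single $(t, \theta, x)$ point. Each row of $Q$ sums to zero, so $(QV)(z) = \sum_{z' \neq z} q_{zz'}\,\big(V(\cdot, z') - V(\cdot, z)\big)$: a jump intensity times a value gap.

```python
import numpy as np

Q = np.array([[-0.20, 0.15, 0.05],
              [0.10, -0.20, 0.10],
              [0.05, 0.15, -0.20]])      # illustrative generator (per year)
V = np.array([1.0, 0.4, -0.5])          # made-up values V(., z) for bull / range / risk-off

QV = Q @ V                              # (QV)(z) = sum_z' q_zz' V(., z'); approx [-0.165, -0.03, 0.21]
# Same quantity written as jump intensity times value gap; equal because rows of Q sum to zero.
QV_jump = np.array([sum(Q[z, k] * (V[k] - V[z]) for k in range(3) if k != z)
                    for z in range(3)])
```

In the bull regime the term is negative, because the chance of leaving the best regime drags its value down. In risk-off it is positive.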

4. Numerical scheme

WP3 uses a policy-iteration scheme on a tensor grid over $(\theta, x)$ for each regime $z$:

Initialise $V^{(0)}(t, \theta, x, z)$ (e.g. by the unconstrained mean-variance value).
For each iteration $n$:
- Policy evaluation: solve a linear PDE with the current control $u^{(n)}$ via implicit time-stepping.
- Policy improvement: update $u^{(n+1)} = \eta^{-1}\Lambda^{-1}\nabla_\theta V^{(n)}$.
Stop when $\|V^{(n+1)} - V^{(n)}\|_\infty < \varepsilon$.
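The evaluate/improve loop can be illustrated in a self-contained way by swapping in a discrete Markov-chain approximation instead of WP3's PDE solver. The setting is one asset, an infinite horizon, a position grid, quarterly steps, and regime transitions $P \approx I + Q\,\Delta t$. Policy evaluation is an exact linear solve of $(I - \beta P_\pi)V = r_\pi$, and improvement is a greedy one-step lookahead. This makes it a discrete analogue of the scheme, not the paper's method. The function name and all parameters, including $\eta = 0.02$, are illustrative assumptions.

```python
import numpy as np

def policy_iteration_mca(alpha, sigma2, Q, gamma=3.0, eta=0.02, lam=1.0, rho=0.05,
                         dt=0.25, theta_grid=None, max_iter=200):
    """Policy iteration for a one-asset, regime-switching, infinite-horizon
    Markov-chain approximation. State (z, i): regime z, position theta_i.
    Action j: move to theta_j, i.e. trade at rate (theta_j - theta_i) / dt for one step.
    """
    th = np.linspace(-1.0, 1.0, 41) if theta_grid is None else np.asarray(theta_grid)
    m, n = len(alpha), len(th)
    beta = np.exp(-rho * dt)                         # one-step discount factor
    P_z = np.eye(m) + Q * dt                         # first-order regime transition matrix
    u = (th[None, :] - th[:, None]) / dt             # u[i, j]: rate taking theta_i to theta_j
    hold = alpha[:, None] * th[None, :] - 0.5 * gamma * sigma2[:, None] * th[None, :] ** 2
    reward = dt * (hold[:, :, None] - 0.5 * eta * lam * u[None, :, :] ** 2)   # reward[z, i, j]

    rows = np.arange(m * n)
    z_idx, i_idx = np.divmod(rows, n)                # flat state index s = z * n + i
    policy = np.tile(np.arange(n), (m, 1))           # initial policy: never trade
    for _ in range(max_iter):
        # Policy evaluation: exact linear solve of (I - beta * P_pi) V = r_pi.
        j_idx = policy[z_idx, i_idx]
        P_pi = np.zeros((m * n, m * n))
        for z2 in range(m):
            P_pi[rows, z2 * n + j_idx] = P_z[z_idx, z2]
        V = np.linalg.solve(np.eye(m * n) - beta * P_pi,
                            reward[z_idx, i_idx, j_idx]).reshape(m, n)
        # Policy improvement: greedy lookahead, keeping the old action unless strictly better.
        q = reward + beta * (P_z @ V)[:, None, :]    # q[z, i, j]
        current = np.take_along_axis(q, policy[:, :, None], axis=2)[..., 0]
        better = q.max(axis=2) > current + 1e-12
        new_policy = np.where(better, q.argmax(axis=2), policy)
        if np.array_equal(new_policy, policy):
            return V, (th[policy] - th[None, :]) / dt, th, True
        policy = new_policy
    return V, (th[policy] - th[None, :]) / dt, th, False

# Illustrative one-asset inputs per regime (bull / range / risk-off); not calibrated values.
alpha = np.array([0.10, 0.03, -0.06])
sigma2 = np.array([0.04, 0.02, 0.09])
Q = np.array([[-0.20, 0.15, 0.05],
              [0.10, -0.20, 0.10],
              [0.05, 0.15, -0.20]])
V, rate, th, converged = policy_iteration_mca(alpha, sigma2, Q)
flat = len(th) // 2            # grid index of theta = 0
```

Starting from a flat book, the resulting policy buys in the bull regime and sells in risk-off. Because the trading cost is quadratic, each step typically covers only part of the distance to the target.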

For five assets and three regimes, the grid is small enough to run on a laptop. For larger universes (50–100 assets), WP3 recommends a low-rank factor reduction followed by deep-Galerkin neural-network solvers, both of which are implemented in optimiz-rs.
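The text does not specify how the low-rank reduction is performed. One common construction is a PCA-style factor model $\Sigma \approx BB^\top + D$ with $k \ll d$ loadings, which would let a grid run over $k$ factor exposures instead of $d$ positions. It is sketched here as an assumption rather than as WP3's method. The 60-asset covariance with three true factors is synthetic.

```python
import numpy as np

def low_rank_factor_model(cov, k):
    """PCA-style reduction: cov ~ B @ B.T + diag(D) with k factors."""
    w, v = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:k]              # indices of the k largest eigenvalues
    B = v[:, top] * np.sqrt(w[top])            # (d, k) factor loadings
    D = np.clip(np.diag(cov) - np.sum(B ** 2, axis=1), 0.0, None)   # idiosyncratic variances
    return B, D

# Synthetic 60-asset covariance with 3 true factors plus idiosyncratic noise.
rng = np.random.default_rng(0)
d = 60
B_true = 0.1 * rng.standard_normal((d, 3))
cov = B_true @ B_true.T + np.diag(rng.uniform(0.001, 0.004, d))
B, D = low_rank_factor_model(cov, k=3)
approx = B @ B.T + np.diag(D)
rel_err = np.linalg.norm(approx - cov) / np.linalg.norm(cov)
```

By construction, the factor model reproduces each asset's variance exactly. The approximation error sits only in the off-diagonal covariances.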

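# Skeleton — adaptive portfolio with policy iteration
import numpy as np
from optimizr import HJBSolver, RegimeChain

def build_covariance(regime):
    # Hypothetical helper, not part of optimizr: placeholder per-regime
    # volatilities and a uniform 0.3 correlation, combined as corr * vol vol^T.
    vols = np.array([[0.15, 0.20, 0.14, 0.08, 0.60],   # bull
                     [0.12, 0.16, 0.12, 0.07, 0.45],   # range
                     [0.30, 0.38, 0.18, 0.12, 0.90]])  # risk-off
    corr = np.full((5, 5), 0.3)
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(vols[regime], vols[regime])

# Data: 5 assets × 3 regimes (bull / range / risk-off); rows are regimes, columns are assets
mu_by_regime    = np.array([[ 0.10,  0.07,  0.04,  0.02, -0.02],
                            [ 0.04,  0.03,  0.02,  0.01,  0.01],
                            [-0.06, -0.04,  0.00,  0.02,  0.05]])
sigma_by_regime = np.stack([build_covariance(r) for r in range(3)])   # shape (3, 5, 5)
Q = np.array([[-0.20, 0.15, 0.05],
              [ 0.10,-0.20, 0.10],
              [ 0.05, 0.15,-0.20]])  # generator (per year): off-diagonals >= 0, rows sum to zero

solver = HJBSolver(
    mu=mu_by_regime, sigma=sigma_by_regime,
    chain=RegimeChain(generator=Q),
    gamma=3.0, eta=1.0, exec_cost_matrix=np.eye(5),
    horizon=1.0, n_steps=252,
)

V, U = solver.policy_iteration(theta_grid=np.linspace(-1, 1, 21),
                               tol=1e-5, max_iter=50)
# U[t, theta_idx, regime_idx] is the optimal trade rate vector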

5. Synthetic 5-asset example

The companion notebook calibrates a three-regime model on a synthetic universe of five ETFs (proxies for SPY, QQQ, GLD, TLT, BTC). The transition generator $Q$ implies an average regime length of roughly 5, 7, and 5 years respectively, a deliberately conservative choice. The execution-cost matrix $\Lambda$ is taken proportional to $\sigma\sigma^\top$ to capture the empirical fact that high-vol assets are also costlier to trade.
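Two ingredients of this calibration are easy to make concrete. Under a generator $Q$, the time spent in regime $z$ is exponential with rate $-q_{zz}$, so the mean regime length is $-1/q_{zz}$. And $\Lambda \propto \sigma\sigma^\top$ is positive definite whenever the asset covariance is. The generator below is hypothetical: its diagonal is chosen to reproduce the quoted 5/7/5-year lengths. Its off-diagonal rates, the volatilities and the correlation are placeholders, not the notebook's calibration.

```python
import numpy as np

def mean_regime_durations(Q):
    # time spent in regime z is exponential with rate -q_zz, so its mean is -1 / q_zz
    return -1.0 / np.diag(Q)

def execution_cost_matrix(cov, scale=1.0):
    # Lambda = scale * sigma sigma^T; Cholesky raises LinAlgError if it is not positive definite
    Lam = scale * np.asarray(cov, dtype=float)
    np.linalg.cholesky(Lam)
    return Lam

# Hypothetical generator: diagonal gives 5 / 7 / 5-year mean regime lengths;
# off-diagonal rates are placeholders (rows still sum to zero).
Q = np.array([[-1 / 5, 3 / 20, 1 / 20],
              [1 / 14, -1 / 7, 1 / 14],
              [1 / 20, 3 / 20, -1 / 5]])
durations = mean_regime_durations(Q)        # approx [5.0, 7.0, 5.0]

vols = np.array([0.15, 0.20, 0.14, 0.08, 0.60])   # placeholder vols for SPY, QQQ, GLD, TLT, BTC
corr = np.full((5, 5), 0.3)
np.fill_diagonal(corr, 1.0)
Lam = execution_cost_matrix(corr * np.outer(vols, vols))
```

With $\Lambda \propto \sigma\sigma^\top$, the highest-volatility asset (BTC here) gets the largest diagonal cost entry.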

Three diagnostics matter:

Regime-conditioned weights. In the bull regime, the controller tilts strongly towards QQQ and BTC; in risk-off, it shifts to TLT and GLD. The shifts are smooth rather than step-jumps, because the $\eta$ term penalises fast trading.
Anticipation. Because $Q$ is non-zero, the controller starts unwinding equity exposure before the regime visibly switches, since the option value of being in the bull state decreases as the conditional probability of risk-off rises.
Robustness to $\eta$. Doubling the execution-cost coefficient roughly halves realised turnover but only modestly degrades realised utility — exactly the rounded efficient frontier you want to see.

6. Backtest performance

On the synthetic universe, with realistic transaction costs (2 bps per side on equities, 5 bps on BTC, 1 bp on Treasuries), the adaptive controller delivers the following over 1,000 five-year paths:

- +1.42: Sharpe vs. static MV (0.94 baseline)
- −38%: annual turnover (vs. naive rebalance)
- −27%: max drawdown (regime-aware vs. flat)
- 96%: paths with positive utility at horizon T = 5

None of these numbers are out-of-sample on real markets — that is reserved for the (private) calibration lab. They are reported here only to show the shape of the improvement, which is broadly consistent across reasonable parameter changes.
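The section does not state how Sharpe, turnover and drawdown are measured. The sketch below uses common conventions, all of which are assumptions, as is the 2 bps assigned to GLD:

- annualised Sharpe from per-period returns;
- maximum peak-to-trough drawdown of the compounded equity curve;
- turnover as the sum of absolute weight changes per year;
- linear per-side costs in basis points.

```python
import numpy as np

def transaction_costs(weights, bps_per_side):
    """Per-period cost: sum_i |change in weight_i| * cost_i, with costs in bps per side."""
    return np.abs(np.diff(weights, axis=0)) @ (np.asarray(bps_per_side) * 1e-4)

def backtest_metrics(returns, weights, periods_per_year=252):
    """returns: (n,) per-period portfolio returns net of costs; weights: (n + 1, d)."""
    sharpe = np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)
    equity = np.concatenate([[1.0], np.cumprod(1.0 + returns)])   # starts at initial capital 1
    max_drawdown = np.max(1.0 - equity / np.maximum.accumulate(equity))
    years = len(returns) / periods_per_year
    annual_turnover = np.abs(np.diff(weights, axis=0)).sum() / years
    return sharpe, max_drawdown, annual_turnover

# Per-side costs for (SPY, QQQ, GLD, TLT, BTC); the 2 bps for GLD is our assumption.
bps = [2, 2, 2, 1, 5]
w = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
              [0.3, 0.2, 0.2, 0.2, 0.1],
              [0.3, 0.2, 0.2, 0.2, 0.1],
              [0.1, 0.1, 0.3, 0.4, 0.1]])     # toy weight path over 3 periods
costs = transaction_costs(w, bps)             # one cost per period
sharpe, mdd, turnover = backtest_metrics(np.array([0.01, -0.02, 0.01]), w)
```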

7. Practical takeaways

Cost matters at the optimisation step, not as a post-trade adjustment. Once you write the cost into the Bellman equation, half of a desk's heuristics evaporate.
Regimes are a feature, not a forecast. The controller does not need to predict the regime — it just needs to know how regimes transition, and that information is far more stable than directional alpha.
The trading rate is the natural object to model. Targets-and-thresholds is a poor approximation of optimal rate control.
Risk aversion $\gamma$ and execution cost $\eta$ together set the Sharpe–turnover trade-off. Tuning both is much more interpretable than tuning a single rebalance threshold.

8. Read the paper / take the course

The full WP3 paper develops the proofs of existence and uniqueness of viscosity solutions and of convergence of the policy-iteration scheme. It also contains the deep-Galerkin neural-network solver for the high-dimensional case ($d \geq 50$). The companion notebook reproduces all figures and runs end-to-end on a laptop.


Companion learning path. The free Optimisation & Control course on hfthot-lab.eu/courses is the recommended prerequisite. It covers risk parity, mean-variance, Almgren–Chriss execution, and the dynamic-programming machinery used in this paper. Take it before opening the paper if HJB equations are unfamiliar territory.

WP3 Prerequisites Pack

14,99 €

One-off purchase, lifetime access · WP3 prerequisites PDF. The minimum mathematical background needed to read the WP3 paper comfortably: stochastic control, Hamilton–Jacobi–Bellman equations, policy iteration, deep-Galerkin methods, and the convex-analysis tools used throughout the paper.

Stochastic control & Bellman principle
HJB derivation & viscosity solutions
Policy iteration: convergence proof
Deep-Galerkin method for high dimension

References

Almgren, R. & Chriss, N. (2001). Optimal execution of portfolio transactions. Journal of Risk.
Cartea, Á., Jaimungal, S. & Penalva, J. (2015). Algorithmic and High-Frequency Trading. Cambridge University Press.
Pham, H. (2009). Continuous-time Stochastic Control and Optimization with Financial Applications. Springer.
Avellaneda, M. & Stoikov, S. (2008). High-frequency trading in a limit-order book. Quantitative Finance.
HFThot Research Lab (2026). WP3 — Adaptive Portfolio Construction and Execution.