Outline
  1. Why adaptive portfolios?
  2. The model — regimes, drift, dispersion
  3. Hamilton–Jacobi–Bellman with execution costs
  4. Numerical scheme & calibration
  5. Synthetic example — 5-asset rotation
  6. Backtest performance
  7. Practical takeaways
  8. Read the paper / take the course

1. Why adaptive portfolios?

Static mean-variance is the default optimiser everywhere — and the reason it disappoints is well known: covariance matrices are unstable, expected-return forecasts are noisy, and turnover is unbounded once a single input changes. A typical desk patches this with shrinkage, Black–Litterman views, and ad-hoc rebalancing rules. The result is a brittle pipeline that produces beautiful efficient frontiers and ugly P&L.

WP3 starts from the opposite premise: the optimiser is part of a controlled stochastic system, where the controlled object is the position vector $\theta_t$ and the objective is a discounted utility that charges for trading speed. The covariance matrix is no longer a nuisance — it is the diffusion coefficient of a Markovian state. Forecasts are no longer point estimates — they are conditional expectations under a regime model. Turnover is no longer constrained by hand — it is shaped by the running cost of execution.

Core idea. Replace "compute optimal weights, then trade towards them" with "compute the optimal trading rate directly, given the current market state and the cost of moving". The fixed point of this controlled system is exactly the regime-aware portfolio you were trying to construct, but without the artificial split between optimisation and execution.

2. The model

Consider $d$ tradable assets with price vector $S_t \in \mathbb{R}^d$. The market is described by a continuous state $X_t \in \mathbb{R}^k$ (factors, levels, vol, skew, dispersion) following a regime-switching SDE driven by a finite-state Markov chain $Z_t \in \{1, \ldots, m\}$:

$$ dX_t = \mu(X_t, Z_t)\,dt + \Sigma(X_t, Z_t)\,dW_t, \qquad Z_t \text{ Markov with generator } Q. $$

Asset returns are conditioned on $X_t$ via

$$ dS_t / S_t = \alpha(X_t, Z_t)\,dt + \sigma(X_t, Z_t)\,dB_t, $$

where $W_t$ and $B_t$ are correlated Brownian motions. Beyond the drift and volatility coefficients, their joint instantaneous covariance is the only property of the noise that the framework needs.
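A minimal simulation sketch makes these dynamics concrete. It assumes that the coefficients depend on the regime only (the continuous state $X_t$, and with it $W_t$, is dropped), takes the per-regime return covariance $\sigma\sigma^\top$ as input, and approximates the chain's one-step transition matrix by $I + Q\,\Delta t$, which is only valid for small $\Delta t$. The function name `simulate_regime_paths` is hypothetical and not part of optimiz-rs.

```python
import numpy as np


def simulate_regime_paths(alpha, cov, Q, s0, T=1.0, n_steps=252, seed=0):
    """Simulate a regime path Z and prices S under regime-dependent GBM.

    alpha: (m, d) annual drifts per regime; cov: (m, d, d) return covariances
    sigma @ sigma.T per regime; Q: (m, m) generator. X_t is dropped (assumption)
    and the chain uses the first-order kernel P = I + Q * dt.
    """
    rng = np.random.default_rng(seed)
    alpha, cov, Q = (np.asarray(a, dtype=float) for a in (alpha, cov, Q))
    m, d = alpha.shape
    dt = T / n_steps
    P = np.eye(m) + Q * dt
    if (P < 0).any():
        raise ValueError("dt too large for the first-order transition kernel")
    chol = np.linalg.cholesky(cov)              # batched Cholesky, shape (m, d, d)
    z = np.zeros(n_steps + 1, dtype=int)        # start in regime 0
    S = np.empty((n_steps + 1, d))
    S[0] = s0
    for k in range(n_steps):
        zk = z[k]
        shock = (chol[zk] @ rng.standard_normal(d)) * np.sqrt(dt)
        drift = (alpha[zk] - 0.5 * np.diag(cov[zk])) * dt
        S[k + 1] = S[k] * np.exp(drift + shock)   # exact log-normal step within a regime
        z[k + 1] = rng.choice(m, p=P[zk])        # regime for the next step
    return z, S
```

Within a regime each step is an exact log-normal update, so the only discretisation error comes from the regime kernel; `scipy.linalg.expm(Q * dt)` would give the exact kernel if SciPy is available.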

Position dynamics with finite trading rate

Let $\theta_t \in \mathbb{R}^d$ be the portfolio position. We do not assume $\theta_t$ can jump; instead it follows

$$ d\theta_t = u_t\,dt, $$

where $u_t$ is the trading rate we control. This is the Almgren–Chriss-style structure that makes the optimisation problem realistic.

Cost functional

Investor utility on a horizon $T$:

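$$ J(\theta_0, X_0, Z_0; u) = \mathbb{E}\!\left[\int_0^T e^{-\rho t}\!\left( \theta_t^\top \alpha_t - \tfrac{\gamma}{2}\theta_t^\top \Sigma_t \theta_t - \tfrac{\eta}{2}u_t^\top \Lambda u_t \right) dt + e^{-\rho T} g(\theta_T, X_T, Z_T) \right], $$

with discount $\rho \geq 0$, risk aversion $\gamma > 0$, execution-cost matrix $\Lambda \succ 0$, impact scaling $\eta > 0$, and terminal reward $g$. Here $\alpha_t = \alpha(X_t, Z_t)$, and $\Sigma_t = \sigma(X_t, Z_t)\,\sigma(X_t, Z_t)^\top$ is the instantaneous return covariance (not the diffusion coefficient $\Sigma(X_t, Z_t)$ of the state $X_t$). The first two running terms are the classical mean-variance criterion; the third is what links portfolio construction to execution.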
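To score a simulated strategy against $J$, a discretised version of the objective is handy. This is a sketch under stated assumptions: a left Riemann sum on a uniform time grid, paths stored as `(N, d)` arrays, $\Sigma_t$ supplied as the return covariance along the path, and an optional terminal reward that defaults to zero. The helper name `discounted_utility` is hypothetical.

```python
import numpy as np


def discounted_utility(theta, u, alpha, cov, dt, gamma, eta, Lam, rho, terminal=0.0):
    """Left-Riemann approximation of J along one path.

    theta, u, alpha: arrays of shape (N, d); cov: (N, d, d) return covariances.
    `terminal` is an optional reward g(theta_T, X_T, Z_T) discounted by exp(-rho * T).
    """
    n = theta.shape[0]
    t = np.arange(n) * dt
    mean = np.einsum("kd,kd->k", theta, alpha)            # theta' alpha
    risk = np.einsum("kd,kde,ke->k", theta, cov, theta)   # theta' Sigma theta
    impact = np.einsum("kd,de,ke->k", u, Lam, u)          # u' Lambda u
    running = mean - 0.5 * gamma * risk - 0.5 * eta * impact
    return np.sum(np.exp(-rho * t) * running) * dt + np.exp(-rho * n * dt) * terminal
```

The caller is responsible for passing a position path consistent with $d\theta_t = u_t\,dt$ (for example `theta[k + 1] = theta[k] + u[k] * dt`).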

3. The HJB equation

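Writing $V(t, \theta, x, z)$ for the value function, with rewards from $t$ onwards discounted back to time $t$, dynamic programming yields the Hamilton–Jacobi–Bellman PDE

$$ -\partial_t V + \rho V = \sup_{u} \left\{ \theta^\top \alpha - \tfrac{\gamma}{2} \theta^\top \Sigma \theta - \tfrac{\eta}{2} u^\top \Lambda u + u^\top \nabla_\theta V + \mathcal{L}^X V + (QV)(z) \right\}, $$

where $\mathcal{L}^X$ is the generator of $X_t$ in regime $z$, with terminal condition $V(T, \theta, x, z) = g(\theta, x, z)$ encoding the final-portfolio reward. Since $\Lambda$ is symmetric positive definite, the expression inside the supremum is strictly concave in $u$, and the first-order condition $-\eta\Lambda u + \nabla_\theta V = 0$ gives the optimal trading rate in closed form: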

$$ u^\star_t = \frac{1}{\eta}\, \Lambda^{-1}\, \nabla_\theta V(t, \theta_t, X_t, Z_t). $$
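The closed form is easy to sanity-check numerically. The sketch below, with hypothetical helper names, computes $u^\star$ with a linear solve rather than an explicit inverse and checks that it zeroes the gradient $-\eta\Lambda u + \nabla_\theta V$ of the $u$-dependent part of the Hamiltonian; because $\Lambda \succ 0$ that part is strictly concave, so the stationary point is its unique maximiser. It also shows that, for a fixed gradient, the rate scales as $1/\eta$.

```python
import numpy as np


def optimal_rate(grad_V, Lam, eta):
    """u* = (1/eta) Lam^{-1} grad_V, via a linear solve (no explicit inverse)."""
    return np.linalg.solve(Lam, grad_V) / eta


def hamiltonian_u_part(u, grad_V, Lam, eta):
    """The u-dependent terms inside the sup: -eta/2 u' Lam u + u' grad_V."""
    return -0.5 * eta * u @ Lam @ u + u @ grad_V


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Lam = A @ A.T + 5.0 * np.eye(5)        # illustrative symmetric positive definite Lambda
grad = rng.standard_normal(5)          # illustrative value-function gradient
u_star = optimal_rate(grad, Lam, eta=2.0)
```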

Why this matters. The optimal rate is the gradient of the value function, rescaled by $(\eta\Lambda)^{-1}$. Mean-variance pipelines approximate the same object by hand, trading at some ad-hoc speed towards a target; here it is computed self-consistently, with execution friction built in. There is no "rebalance threshold" to tune.
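A one-asset, one-regime, infinite-horizon special case with constant coefficients (an assumption made purely for tractability, not WP3's setting) shows the link explicitly. The quadratic ansatz $W(\theta) = a + b\theta - \tfrac{c}{2}\theta^2$ solves $\rho W = \alpha\theta - \tfrac{\gamma\sigma^2}{2}\theta^2 + \tfrac{(W')^2}{2\eta\lambda}$ when $c^2 + \rho\eta\lambda\,c - \gamma\sigma^2\eta\lambda = 0$, $b = \alpha/(\rho + c/(\eta\lambda))$ and $a = b^2/(2\rho\eta\lambda)$. The optimal rate is then $u^\star = \kappa(\theta_{\mathrm{MV}} - \theta)$ with $\kappa = c/(\eta\lambda)$ and $\theta_{\mathrm{MV}} = b/c = \alpha/(\gamma\sigma^2)$: a partial adjustment towards the mean-variance target, at a speed set by costs. The helper names are hypothetical.

```python
import numpy as np


def lq_policy(alpha, sigma, gamma, eta, lam, rho):
    """Coefficients (a, b, c) of W(theta) = a + b*theta - c*theta**2 / 2.

    W solves the stationary HJB
    rho*W = alpha*theta - gamma*sigma**2*theta**2/2 + W'(theta)**2 / (2*eta*lam)
    for one asset, one regime and constant coefficients (illustrative only).
    """
    if rho <= 0:
        raise ValueError("rho must be positive for a finite infinite-horizon value")
    k = eta * lam
    c = 0.5 * (-rho * k + np.sqrt((rho * k) ** 2 + 4.0 * gamma * sigma**2 * k))
    b = alpha / (rho + c / k)
    a = b**2 / (2.0 * rho * k)
    return a, b, c


def lq_rate(theta, b, c, eta, lam):
    """u* = W'(theta) / (eta*lam) = kappa * (theta_mv - theta)."""
    return (b - c * theta) / (eta * lam)


a, b, c = lq_policy(alpha=0.08, sigma=0.2, gamma=3.0, eta=1.0, lam=1.0, rho=0.05)
kappa, theta_mv = c / 1.0, b / c       # theta_mv equals alpha / (gamma * sigma**2), about 0.667
```

Raising `eta` lowers `kappa` but leaves the target unchanged, which is the sense in which costs shape the speed of rebalancing rather than where it ends up.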

Regime-driven structure

The regime chain $Z_t$ enters via the term $(QV)(z) = \sum_{z'} q_{zz'} V(\cdot, z')$. Practically: when the system is in regime $z$, the value of being there depends on the value of being in every other regime weighted by the transition rate $q_{zz'}$. This is what couples the regimes — and why the strategy "anticipates" regime changes rather than reacting after the fact.
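On a grid the coupling term is a single matrix product. Because each row of $Q$ sums to zero, it can also be written as $(QV)(z) = \sum_{z' \neq z} q_{zz'}\,[V(\cdot, z') - V(\cdot, z)]$: only value differences between regimes matter, which is the quantitative sense in which the controller prices possible switches. The storage layout `V[..., z]` and the helper name are assumptions.

```python
import numpy as np


def regime_coupling(Q, V):
    """(QV)(z) = sum_{z'} q_{zz'} V(., z') for values stored as V[..., z]."""
    return V @ Q.T


Q = np.array([[-0.20, 0.15, 0.05],
              [0.10, -0.20, 0.10],
              [0.05, 0.15, -0.20]])
V = np.random.default_rng(1).standard_normal((4, 6, 3))   # e.g. (theta, x, regime) grid
QV = regime_coupling(Q, V)
```

A value function that does not depend on the regime gives zero coupling, as the difference form makes obvious.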

4. Numerical scheme

WP3 uses a policy-iteration scheme on a tensor grid over $(\theta, x)$ for each regime $z$:

  1. Initialise $V^{(0)}(t, \theta, x, z)$ (e.g. by the unconstrained mean-variance value) and set $u^{(0)} = \eta^{-1}\Lambda^{-1}\nabla_\theta V^{(0)}$.
  2. For each iteration $n = 0, 1, 2, \ldots$:
    • Policy evaluation: with the control frozen at $u^{(n)}$, solve the resulting linear PDE for $V^{(n+1)}$ via implicit time-stepping.
    • Policy improvement: update $u^{(n+1)} = \eta^{-1}\Lambda^{-1}\nabla_\theta V^{(n+1)}$.
  3. Stop when $\|V^{(n+1)} - V^{(n)}\|_\infty < \varepsilon$.
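The evaluate/improve loop can be illustrated on a deliberately small problem. The sketch below is not WP3's scheme: it swaps the implicit PDE solve for a discrete-time Markov-chain approximation of a stationary, one-asset version of the problem (no continuous state $X_t$, first-order regime kernel $I + Q\,\Delta t$, control = next grid position, trading rate = position change divided by $\Delta t$). Policy evaluation is then an exact linear solve and improvement is a greedy arg-max. All names and defaults are hypothetical.

```python
import numpy as np


def policy_iteration_toy(alpha, sigma, Q, gamma=3.0, eta=1.0, lam=1.0, rho=0.05,
                         dt=1 / 52, theta_grid=None, tol=1e-10, max_iter=200):
    """Howard policy iteration on a Markov-chain approximation (illustrative)."""
    if theta_grid is None:
        theta_grid = np.linspace(-1.0, 1.0, 41)
    alpha, sigma = np.asarray(alpha, float), np.asarray(sigma, float)
    n, m = theta_grid.size, alpha.size
    P = np.eye(m) + np.asarray(Q, float) * dt   # first-order one-step regime kernel
    if (P < 0).any():
        raise ValueError("dt too large for the first-order kernel")
    beta = np.exp(-rho * dt)                    # one-step discount factor

    # reward[z, i, j]: one-step reward in regime z when moving theta_i -> theta_j
    th = theta_grid[:, None]
    rate = (theta_grid[None, :] - th) / dt
    reward = np.stack([
        (alpha[z] * th - 0.5 * gamma * sigma[z] ** 2 * th ** 2
         - 0.5 * eta * lam * rate ** 2) * dt
        for z in range(m)
    ])

    policy = np.tile(np.arange(n), (m, 1))      # initial policy: never trade
    V = np.zeros((m, n))
    rows = np.arange(m * n)                     # flattened state index z * n + i
    z_idx = rows // n
    for it in range(1, max_iter + 1):
        # Policy evaluation: solve (I - beta * P_pi) V = r_pi exactly.
        j_idx = policy.ravel()
        A = np.eye(m * n)
        for z2 in range(m):
            A[rows, z2 * n + j_idx] -= beta * P[z_idx, z2]
        r_pi = reward[z_idx, rows % n, j_idx]
        V_new = np.linalg.solve(A, r_pi).reshape(m, n)
        # Policy improvement: greedy choice against the evaluated values.
        cont = beta * P @ V_new                 # cont[z, j] = beta * E[V(theta_j, Z') | z]
        new_policy = (reward + cont[:, None, :]).argmax(axis=2)
        done = np.array_equal(new_policy, policy) or np.max(np.abs(V_new - V)) < tol
        policy, V = new_policy, V_new
        if done:
            break
    return theta_grid, V, policy, it
```

With a single regime the grid point closest to the mean-variance target $\alpha/(\gamma\sigma^2)$ is a no-trade point, since staying there already earns the best possible running reward at zero cost. With several regimes, the rows `policy[z]` can be compared across `z`, or across values of `eta`, to inspect how the resting positions and trading speed change.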

For five assets and three regimes, the grid is small enough to run on a laptop. For larger universes (50–100 assets), WP3 recommends a low-rank factor reduction followed by deep-Galerkin neural-network solvers — both of which are implemented in optimiz-rs.
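The exact reduction used in WP3 is not spelled out here. One common way to obtain a low-rank structure, shown purely as an assumption, is a principal-component factor model $\Sigma \approx BB^\top + D$ that keeps the top $k$ eigenpairs plus a diagonal residual, so that the control problem can be posed on $k$ factor exposures instead of $d$ positions. The helper name is hypothetical.

```python
import numpy as np


def low_rank_factor(cov, k):
    """Top-k eigen-factor approximation cov ~ B @ B.T + D, with D diagonal.

    B is d x k (loadings scaled by sqrt of eigenvalues); D restores the exact
    diagonal so that individual variances are preserved.
    """
    w, vecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]               # indices of the k largest
    B = vecs[:, idx] * np.sqrt(w[idx])
    D = np.diag(np.diag(cov - B @ B.T))
    return B, D
```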

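```python
# Skeleton: adaptive portfolio with policy iteration
import numpy as np
from optimizr import HJBSolver, RegimeChain  # optimiz-rs Python bindings


def build_covariance(regime: int) -> np.ndarray:
    """Hypothetical helper (not part of optimizr): illustrative covariance per regime.

    Volatilities and a common pairwise correlation both rise from bull (0) to risk-off (2).
    """
    base_vol = np.array([0.16, 0.22, 0.15, 0.12, 0.60])  # annualised, illustrative only
    vol = base_vol * (0.8, 1.0, 1.5)[regime]
    rho = (0.2, 0.3, 0.5)[regime]
    corr = np.full((5, 5), rho)
    np.fill_diagonal(corr, 1.0)
    return np.outer(vol, vol) * corr


# Data: 5 assets × 3 regimes (bull / range / risk-off), annualised drifts
mu_by_regime    = np.array([[ 0.10,  0.07,  0.04,  0.02, -0.02],
                            [ 0.04,  0.03,  0.02,  0.01,  0.01],
                            [-0.06, -0.04,  0.00,  0.02,  0.05]])
sigma_by_regime = np.stack([build_covariance(r) for r in range(3)])  # shape (3, 5, 5)
# Generator (per year): off-diagonal entries are switching rates and each row
# sums to zero, so the mean stay in regime z is -1 / Q[z, z] = 5 years.
Q = np.array([[-0.20, 0.15, 0.05],
              [ 0.10,-0.20, 0.10],
              [ 0.05, 0.15,-0.20]])

solver = HJBSolver(
    mu=mu_by_regime, sigma=sigma_by_regime,
    chain=RegimeChain(generator=Q),
    gamma=3.0, eta=1.0, exec_cost_matrix=np.eye(5),
    horizon=1.0, n_steps=252,
)

V, U = solver.policy_iteration(theta_grid=np.linspace(-1, 1, 21),
                               tol=1e-5, max_iter=50)
# U[t, theta_idx, regime_idx] is the optimal trade rate vector
```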

5. Synthetic 5-asset example

The companion notebook calibrates a three-regime model on a synthetic universe of five ETFs (proxies for SPY, QQQ, GLD, TLT, BTC). The transition generator $Q$ implies an average regime length of roughly 5, 7, and 5 years respectively — a deliberately conservative choice. The execution-cost matrix $\Lambda$ is taken proportional to $\sigma\sigma^\top$ to capture the empirical fact that high-vol assets are also costlier to trade.
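Regime lengths and long-run regime frequencies follow directly from $Q$: the mean sojourn in regime $z$ is $-1/q_{zz}$, and the stationary distribution $\pi$ solves $\pi Q = 0$ with $\sum_z \pi_z = 1$. For the generator in the code skeleton every diagonal entry is $-0.20$, so each regime lasts 5 years on average; a calibrated $Q$ will generally differ. The helper names below are hypothetical.

```python
import numpy as np


def mean_sojourn(Q):
    """Expected time spent in each regime per visit: -1 / q_zz."""
    return -1.0 / np.diag(Q)


def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 via least squares on the stacked system."""
    m = Q.shape[0]
    A = np.vstack([Q.T, np.ones(m)])
    rhs = np.r_[np.zeros(m), 1.0]
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi


Q = np.array([[-0.20, 0.15, 0.05],
              [0.10, -0.20, 0.10],
              [0.05, 0.15, -0.20]])
sojourn = mean_sojourn(Q)             # 5 years in every regime for this Q
pi = stationary_distribution(Q)       # [2/7, 3/7, 2/7]: the range regime is the most frequent
```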

Three diagnostics matter:

  1. Regime-conditioned weights. In the bull regime, the controller tilts strongly to QQQ and BTC; in risk-off, it shifts to TLT and GLD. The shifts are smooth — they are not step-jumps, because the $\eta$ term penalises fast trading.
  2. Anticipation. Because the off-diagonal rates in $Q$ are non-zero, the controller starts unwinding equity exposure before the regime visibly switches: the option value of being in the bull state decreases as the conditional probability of risk-off rises.
  3. Robustness to $\eta$. Doubling the execution-cost coefficient roughly halves the realised turnover but only modestly degrades realised utility, which is exactly the smooth turnover-versus-utility trade-off you want to see.

6. Backtest performance

On the synthetic universe, with realistic transaction costs (2 bps per side on equities, 5 bps on BTC, 1 bp on Treasuries), the adaptive controller delivers the following over 1 000 paths of 5 years:

  • Sharpe ratio vs. static MV: +1.42 (0.94 baseline)
  • Annual turnover: −38% (vs. naive rebalance)
  • Max drawdown: −27% (regime-aware vs. flat)
  • Paths with positive utility at horizon T=5: 96%

None of these numbers are out-of-sample on real markets — that is reserved for the (private) calibration lab. They are reported here only to show the shape of the improvement, which is broadly consistent across reasonable parameter changes.
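The headline figures rely on standard definitions whose exact conventions are not stated here. A minimal sketch, assuming per-period returns after costs, 252 periods per year, a zero risk-free rate, and linear per-side costs quoted in basis points (the helper names are hypothetical):

```python
import numpy as np


def backtest_metrics(pnl, turnover, periods_per_year=252):
    """Annualised Sharpe, annual turnover and max drawdown from per-period series.

    pnl: per-period portfolio returns after costs; turnover: per-period sum of
    |trades| as a fraction of capital. Risk-free rate taken as zero (assumption).
    """
    pnl = np.asarray(pnl, dtype=float)
    sharpe = np.sqrt(periods_per_year) * pnl.mean() / pnl.std(ddof=1)
    annual_turnover = np.sum(turnover) * periods_per_year / len(turnover)
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + pnl)])   # start from unit wealth
    peak = np.maximum.accumulate(wealth)
    max_drawdown = np.max(1.0 - wealth / peak)
    return sharpe, annual_turnover, max_drawdown


def proportional_cost(trades, bps):
    """Linear cost as a fraction of capital: sum_i |trade_i| * bps_i / 1e4."""
    return np.sum(np.abs(trades) * np.asarray(bps, dtype=float) / 1e4)
```

For the five-asset universe the cost vector would be something like `[2, 2, 2, 1, 5]` bps for SPY, QQQ, GLD, TLT and BTC, where the 2 bps for GLD is an assumption since no figure for gold is given.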

7. Practical takeaways

8. Read the paper / take the course

The full WP3 paper develops the proofs of viscosity-solution existence, uniqueness, and convergence of the policy iteration scheme; it also contains the deep-Galerkin neural-network solver for the high-dimensional regime ($d \geq 50$). The companion notebook reproduces all figures and runs end-to-end on a laptop.


Companion learning path. The free Optimisation & Control course on hfthot-lab.eu/courses is the recommended prerequisite. It covers risk-parity, mean-variance, Almgren–Chriss execution, and the dynamic-programming machinery used in this paper. Take it before opening the paper if HJB equations are unfamiliar territory.

WP3 Prerequisites Pack

14,99 €

One-off purchase, lifetime access · WP3 prerequisites PDF. The minimum mathematical baggage to read the WP3 paper comfortably: stochastic control, Hamilton–Jacobi–Bellman equations, policy iteration, deep-Galerkin methods, and the convex analysis tools needed throughout the paper.

  • Stochastic control & Bellman principle
  • HJB derivation & viscosity solutions
  • Policy iteration: convergence proof
  • Deep-Galerkin method for high dimension

References

  1. Almgren, R. & Chriss, N. (2001). Optimal execution of portfolio transactions. Journal of Risk.
  2. Cartea, Á., Jaimungal, S. & Penalva, J. (2015). Algorithmic and High-Frequency Trading. Cambridge University Press.
  3. Pham, H. (2009). Continuous-time Stochastic Control and Optimization with Financial Applications. Springer.
  4. Avellaneda, M. & Stoikov, S. (2008). High-frequency trading in a limit-order book. Quantitative Finance.
  5. HFThot Research Lab (2026). WP3 — Adaptive Portfolio Construction and Execution.