ThotBook MCP: Your Research Assistant for Physics & Mathematics

📖 Abstract

Staying current in physics and mathematics is increasingly costly: thousands of ArXiv preprints appear every week, each assuming a vast implicit prerequisite base. ThotBook MCP is an MCP (Model Context Protocol) server that lets your AI assistant search, summarize, and structure these papers in seconds, generate personalized prerequisite trees, and produce interactive courses as Jupyter notebooks.

📋 Table of Contents

  1. The research overload problem
  2. Natural-language ArXiv search
  3. Prerequisite tree generation
  4. Why fBm is not a semimartingale
  5. Feynman-Kac: the PDE–SDE bridge
  6. Knowledge as a directed acyclic graph
  7. Interactive courses (preview)
  8. Access the full notebook

1. The research overload problem

In theoretical physics and mathematics, a researcher typically needs to master 3 to 5 sub-disciplines to read a cutting-edge paper. A paper on quantum viscosity in string theory assumes knowledge of differential geometry, supersymmetry, conformal field theory, and relativistic fluids — rarely spelled out in the text.

Stat: ArXiv receives ~2,000 submissions per day in physics and mathematics. A researcher focused on a specific theme may receive >50 relevant papers per week — far beyond a reasonable reading capacity.

ThotBook MCP solves three concrete problems: quickly finding relevant papers, understanding what you need to know before reading them, and generating a structured course to accelerate skill acquisition.

2. Natural-language ArXiv search

The first capability of ThotBook MCP is semantic search across ArXiv. Rather than building a complex Boolean query, you describe your interest in natural language, and the MCP builds and executes an optimized query.

Example: rough volatility and SDEs

# Via the MCP tool in your AI assistant
result = thotbook.arxiv_search(
    query="rough volatility stochastic Volterra equations Heston model",
    max_results=5,
    categories=["q-fin.MF", "math.PR"]
)
# Returned structured metadata
[
  {
    "id": "2302.04854",
    "title": "Rough Volatility: Fact or Artefact?",
    "authors": ["Cont, R.", "Das, P."],
    "year": 2023,
    "citations": 87,
    "abstract_summary": "Challenges the rough volatility paradigm; shows that apparent roughness may stem from statistical artifacts in high-frequency data estimation...",
    "key_equations": ["Volterra SDE", "fractional Brownian motion"],
    "relevance_score": 0.94
  },
  ... 4 more results
]

The MCP returns enriched structured metadata: synthesized summary, detected key equations, relevance score. These results can be piped directly into the following tools.
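How ThotBook builds its query is not documented here. As a rough illustration of what turning keywords and categories into an optimized query can look like, the sketch below assembles a URL for the public arXiv API. The function name `build_arxiv_url` and the choice to require all terms while accepting any listed category are assumptions made for this example, not ThotBook behavior.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_arxiv_url(terms, categories, max_results=5):
    """Build an arXiv API URL: every term must match, in any of the listed categories."""
    # Multi-word terms are quoted so that they are matched as phrases.
    term_part = " AND ".join(f'all:"{t}"' if " " in t else f"all:{t}" for t in terms)
    cat_part = " OR ".join(f"cat:{c}" for c in categories)
    query = f"({term_part}) AND ({cat_part})" if categories else term_part
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url(["rough volatility", "Volterra"], ["q-fin.MF", "math.PR"])
print(url)
```

The API answers with an Atom feed; summaries, key equations, and relevance scores such as those shown above would have to be computed in a separate step.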

As an illustration, the Rough Bergomi volatility model describes instantaneous variance via a fractional SDE:

$$v_t = \xi_0(t)\,\mathcal{E}\!\left(\eta\,\sqrt{2H}\int_0^t (t-s)^{H-\frac{1}{2}}\,dW_s^1\right)$$

where \(H \in (0, \tfrac{1}{2})\) is the Hurst exponent, \(\xi_0(t)\) the initial variance term structure, and \(\mathcal{E}\) the Doléans-Dade exponential. ThotBook MCP can locate, summarize, and contextualize the foundational papers of this model in seconds.
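To make the formula concrete, here is a minimal simulation sketch. It assumes a flat initial curve \(\xi_0(t) = \xi_0\) and uses a crude left-point Riemann sum for the Volterra integral; the parameter values are illustrative only. Since the integral \(Y_t\) is Gaussian with variance \(t^{2H}\), the exponential reduces to \(\exp(\eta Y_t - \tfrac{1}{2}\eta^2 t^{2H})\).

```python
import math
import random


def simulate_rbergomi_variance(H=0.1, eta=1.9, xi0=0.04, T=1.0, n=200, seed=0):
    """Left-point sketch of v_t = xi0 * exp(eta * Y_t - 0.5 * eta^2 * t^(2H))."""
    rng = random.Random(seed)
    dt = T / n
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    v = [xi0]  # v_0 = xi0 because Y_0 = 0
    for i in range(1, n + 1):
        t = i * dt
        # Y_t = sqrt(2H) * int_0^t (t - s)^(H - 1/2) dW_s, kernel taken at s_k = k * dt
        y = math.sqrt(2 * H) * sum((t - k * dt) ** (H - 0.5) * dW[k] for k in range(i))
        v.append(xi0 * math.exp(eta * y - 0.5 * eta**2 * t ** (2 * H)))
    return v


path = simulate_rbergomi_variance()
print(min(path), max(path))
```

The left-point rule handles the kernel singularity at \(s = t\) poorly, so this is only suitable for building intuition; production code typically relies on a dedicated scheme such as the hybrid scheme or an exact Cholesky simulation.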

3. Prerequisite tree generation

After identifying a paper, the next step is understanding what you need to know to approach it. paper_prerequisites analyzes the paper content and generates a dependency tree.

prereqs = thotbook.paper_prerequisites(
    paper_id="2302.04854",
    depth=3,
    target_level="graduate"  # undergraduate | graduate | expert
)
▶ Rough Volatility (2302.04854)
├─ Stochastic Calculus (Itô integral)
│  ├─ Brownian motion & Wiener measure
│  ├─ Itô’s lemma (standard + multidimensional)
│  └─ SDEs: existence, uniqueness, Markov property
├─ Fractional Brownian Motion
│  ├─ Self-similar processes & long-range dependence
│  ├─ Hölder regularity & p-variation
│  └─ Malliavin calculus (basic)
├─ Volterra Equations
│  ├─ Kernel operators & resolvent theory
│  └─ Stochastic Volterra integral equations
└─ Volatility Models
   ├─ Black-Scholes & local volatility
   ├─ Heston stochastic volatility (mean-reversion, Feller)
   └─ VIX & realized variance estimation
      └─ HAR model, realized kernel estimators

The tree is interactive in the notebook: each node can be expanded ("explain this concept") or collapsed ("I already know this"). Exploration depth is configurable, and the target level (undergraduate to expert) adjusts the granularity of generated prerequisites.
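The collapse mechanism can be pictured with a small sketch. The nested-dictionary layout and the `remaining_gaps` helper below are hypothetical, chosen only to illustrate the idea; the actual return type of `paper_prerequisites` is not specified in this post.

```python
# Hypothetical in-memory form of a prerequisite tree: {concept: {sub-concept: {...}}}
tree = {
    "Rough Volatility": {
        "Stochastic Calculus": {"Brownian motion": {}, "Itô's lemma": {}},
        "Fractional Brownian Motion": {"Hölder regularity": {}, "Malliavin calculus": {}},
        "Volterra Equations": {"Stochastic Volterra integral equations": {}},
    }
}


def remaining_gaps(node, known):
    """Depth-first list of concepts still to learn; a known concept collapses its subtree."""
    gaps = []
    for concept, children in node.items():
        if concept in known:
            continue  # "I already know this": skip the concept and everything below it
        gaps.extend(remaining_gaps(children, known))  # sub-concepts come first
        gaps.append(concept)
    return gaps


gaps = remaining_gaps(tree, known={"Stochastic Calculus", "Hölder regularity"})
print(gaps)
```

Listing sub-concepts before their parent means the output can be read top to bottom as a study plan.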

Typical use case: A PhD student in mathematical physics wants to understand stochastic Volterra SDEs. They give the paper to the MCP, get the prerequisite tree, identify specific gaps ("I don't know Malliavin calculus"), and the MCP generates a targeted course to fill exactly those gaps.

4. Why fractional Brownian motion is not a semimartingale

This is one of the most important, and most misunderstood, results in stochastic process theory. Most SDE courses teach Itô calculus while implicitly assuming that the driving process is a semimartingale. This holds for standard Brownian motion, but fails for fractional Brownian motion with \(H \neq \tfrac{1}{2}\).

Definition and covariance structure

The fractional Brownian motion \((B^H_t)_{t \ge 0}\) with Hurst exponent \(H \in (0,1)\) is the unique centered Gaussian process satisfying:

$$\mathbb{E}[B^H_t\, B^H_s] = \frac{1}{2}\!\left(t^{2H} + s^{2H} - |t-s|^{2H}\right)$$

When \(H = \tfrac{1}{2}\) we recover standard Brownian motion: \(\mathbb{E}[B_t B_s] = \min(t,s)\). For \(H > \tfrac{1}{2}\), increments are positively correlated (long memory); for \(H < \tfrac{1}{2}\), negatively correlated (anti-persistent). This last regime, \(H < \tfrac{1}{2}\), captures the empirically observed roughness of realized volatility.
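These three regimes can be checked directly from the covariance formula. The short sketch below (function names are illustrative) computes the covariance of two adjacent unit increments, which works out to \(2^{2H-1} - 1\).

```python
def fbm_cov(t, s, H):
    """Covariance E[B^H_t B^H_s] of fractional Brownian motion."""
    return 0.5 * (t ** (2 * H) + s ** (2 * H) - abs(t - s) ** (2 * H))


def increment_cov(H):
    """Covariance of the adjacent unit increments B_1 - B_0 and B_2 - B_1."""
    return fbm_cov(2, 1, H) - fbm_cov(1, 1, H) - fbm_cov(2, 0, H) + fbm_cov(1, 0, H)


for H in (0.1, 0.5, 0.7):
    print(H, increment_cov(H))  # negative, zero, positive respectively
```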

The \(p\)-variation criterion

A continuous semimartingale \(X\) has finite \(p\)-variation for all \(p > 2\); in particular, its quadratic variation \([X]\) exists as a finite, adapted, nondecreasing process. For fBm, however, along partitions \(\Pi = \{t_k\}\) of a fixed interval:

$$\sum_{k=0}^{n-1}\!\left|B^H_{t_{k+1}} - B^H_{t_k}\right|^p \;\xrightarrow[|\Pi|\to 0]{\mathbb{P}}\; \begin{cases} 0 & \text{if } pH > 1 \\ +\infty & \text{if } pH < 1 \end{cases}$$

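The only exponent for which the \(p\)-variation is finite and non-trivial is \(p = 1/H\). Taking \(p = 2\) gives the quadratic variation. For \(H < \tfrac{1}{2}\), \(2H < 1\) and the quadratic variation is infinite, whereas every semimartingale has finite quadratic variation. For \(H > \tfrac{1}{2}\), \(2H > 1\) and the quadratic variation vanishes, so a semimartingale decomposition of \(B^H\) would have no martingale part and \(B^H\) would have finite-variation paths; but its 1-variation is infinite because \(1 \cdot H < 1\). Either way, \(B^H\) is not a semimartingale when \(H \neq \tfrac{1}{2}\).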
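The dichotomy is easy to see in expectation. On a uniform partition of \([0,T]\) with \(n\) steps, each increment is \(N(0, (T/n)^{2H})\), so the expected \(p\)-variation sum equals \(c_p\,T^{pH}\,n^{1-pH}\) with \(c_p = \mathbb{E}|Z|^p\). The sketch below evaluates this closed form; it describes the mean only and says nothing about convergence in probability.

```python
import math


def abs_moment(p):
    """E|Z|^p for a standard normal Z."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def expected_p_variation(H, p, n, T=1.0):
    """E[sum_k |B^H_{t_{k+1}} - B^H_{t_k}|^p] on a uniform n-step partition of [0, T]."""
    # Each of the n terms has mean c_p * (T / n)^(p * H).
    return n * abs_moment(p) * (T / n) ** (p * H)


for H in (0.1, 0.5, 0.7):
    # p = 2: blows up for H < 1/2, stays at T for H = 1/2, vanishes for H > 1/2
    print(H, [expected_p_variation(H, 2, n) for n in (10, 1000, 100000)])
```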

Direct consequence: Itô's formula does not apply to \(B^H\) when \(H \neq \tfrac{1}{2}\). One cannot write \(f(B^H_t) = f(B^H_0) + \int_0^t f'(B^H_s)\,dB^H_s + \tfrac{1}{2}\int_0^t f''(B^H_s)\,d[B^H]_s\) because the quadratic correction term is either zero (\(H > \tfrac{1}{2}\)) or infinite (\(H < \tfrac{1}{2}\)). A radically different framework is required.

The fix: Lyons' rough path theory

Terry Lyons (1998) introduced rough path theory to give meaning to integrals and differential equations driven by signals too irregular for classical calculus, including processes that are not semimartingales. The key idea is to enhance the signal \(B^H\) with its iterated integrals up to level \(\lfloor 1/H \rfloor\):

$$\mathbf{B}^H_{s,t} = \left(B^H_{s,t},\; \mathbb{B}^H_{s,t},\; \ldots\right), \quad \mathbb{B}^H_{s,t} := \int_s^t (B^H_{r} - B^H_s)\otimes dB^H_r$$

This "lifted rough path" \(\mathbf{B}^H\) is not unique (there is freedom in the antisymmetric part of \(\mathbb{B}^H\)), but once it is fixed, the solution of a differential equation driven by \(B^H\) is a deterministic, continuous function of the lift. This approach, recast by Gubinelli (2004) as the theory of controlled rough paths, underpins much of the pathwise analysis of rough volatility models.
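For a smooth or piecewise-linear path the second level can be computed by ordinary integration, which makes its structure visible: the symmetric part is fixed by the increment, \(\tfrac{1}{2}(X_t - X_s)\otimes(X_t - X_s)\), and only the antisymmetric part (the Lévy area) carries new information. The sketch below is illustrative only; for fBm with \(H \le \tfrac{1}{2}\) the integral has to be constructed probabilistically rather than segment by segment.

```python
def level2_signature(points):
    """S[i][j] = int (X^i_r - X^i_0) dX^j_r for a piecewise-linear path through `points`."""
    d = len(points[0])
    S = [[0.0] * d for _ in range(d)]
    x0 = points[0]
    for a, b in zip(points, points[1:]):
        dx = [b[i] - a[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Exact contribution of one linear segment
                S[i][j] += (a[i] - x0[i]) * dx[j] + 0.5 * dx[i] * dx[j]
    return S


def levy_area(S):
    """Antisymmetric part of the second level for a two-dimensional path."""
    return 0.5 * (S[0][1] - S[1][0])


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # counterclockwise unit square
S = level2_signature(square)
print(levy_area(S))  # signed area enclosed by the loop
```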

Why ThotBook detects this: A rough volatility paper may cite Lyons (1998), Friz & Victoir (2010), Bayer-Friz-Gatheral (2016), and Cont-Das (2023) without ever explaining the prerequisite chain. ThotBook MCP reconstructs this chain as a topologically sorted DAG, with "entry points" chosen according to your level.

5. Feynman-Kac: the hidden bridge between PDEs and SDEs

Many researchers learn both parabolic PDE theory and stochastic calculus without ever realizing they are studying the same thing under two different guises. The Feynman-Kac theorem is the linchpin.

The theorem

Let \((X_s)_{s \in [t,T]}\) be the diffusion started from \(x\) at time \(t\), defined by the SDE:

$$dX_s = \mu(X_s)\,ds + \sigma(X_s)\,dW_s, \quad X_t = x$$

And let \(u(t,x)\) be the solution of the backward parabolic PDE (backward Kolmogorov equation):

$$\frac{\partial u}{\partial t} + \mu(x)\frac{\partial u}{\partial x} + \frac{\sigma^2(x)}{2}\frac{\partial^2 u}{\partial x^2} - r(x)\,u = 0, \quad u(T,x) = g(x)$$

Then, under standard regularity assumptions, the solution admits the following probabilistic representation:

$$u(t,x) = \mathbb{E}^{t,x}\!\left[g(X_T)\exp\!\left(-\int_t^T r(X_s)\,ds\right)\right]$$

This result is striking: it states that solving a PDE = computing an expectation. The solution of the heat equation with potential \(r\) is exactly the expectation of the discounted terminal payoff along Brownian paths.
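The representation translates almost literally into code. The sketch below is a plain Euler-Maruyama Monte Carlo estimator of the expectation, checked against a case with a known solution: for \(\mu = 0\), \(\sigma = 1\), constant \(r\), and \(g(x) = x^2\), the PDE is solved by \(u(t,x) = e^{-r(T-t)}(x^2 + T - t)\). The function name and default sample sizes are arbitrary choices for illustration.

```python
import math
import random


def feynman_kac_mc(x, t, T, mu, sigma, r, g, n_paths=20000, n_steps=20, seed=0):
    """Euler-Maruyama estimate of E[g(X_T) * exp(-int_t^T r(X_s) ds) | X_t = x]."""
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _path in range(n_paths):
        X, integral_r = x, 0.0
        for _step in range(n_steps):
            integral_r += r(X) * dt  # left-point rule for the integral of r
            X += mu(X) * dt + sigma(X) * rng.gauss(0.0, sqrt_dt)
        total += g(X) * math.exp(-integral_r)
    return total / n_paths


estimate = feynman_kac_mc(
    1.0, 0.0, 1.0,
    mu=lambda x: 0.0, sigma=lambda x: 1.0, r=lambda x: 0.1, g=lambda x: x * x,
)
exact = math.exp(-0.1) * (1.0 + 1.0)
print(estimate, exact)
```

The estimate carries a statistical error of order \(1/\sqrt{n_{\text{paths}}}\), plus a time-discretization bias whenever the coefficients are not constant.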

Black-Scholes as a special case

Choose \(\mu(x) = rx\), \(\sigma(x) = \sigma x\) (log-normal dynamics), \(r(x) = r\) (constant risk-free rate), and \(g(x) = (x-K)^+\) (European call payoff). Writing \(S\) for the state variable \(x\) and \(V\) for \(u\), the Feynman-Kac PDE becomes exactly the Black-Scholes PDE:

$$\frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 V}{\partial S^2} - rV = 0$$

and the probabilistic representation is the Black-Scholes formula itself:

$$V(t,S) = e^{-r(T-t)}\,\mathbb{E}^{t,S}\!\left[(S_T - K)^+\right] = S\,\Phi(d_+) - Ke^{-r(T-t)}\Phi(d_-)$$

with \(d_\pm = \dfrac{\ln(S/K) + (r \pm \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}\) and \(\Phi\) the standard normal distribution function.

Non-trivial insight: This means that any Monte Carlo code that simulates geometric Brownian paths and computes expected payoffs is actually a disguised numerical PDE solver. Conversely, finite-difference schemes for the Black-Scholes equation approximate stochastic expectations. The two communities (PDE and probabilistic) often work in parallel without realizing it.
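A quick numerical check of this equivalence, as a sketch with illustrative parameters: the closed-form price, with \(d_\pm = [\ln(S/K) + (r \pm \sigma^2/2)(T-t)]/(\sigma\sqrt{T-t})\), is compared with a Monte Carlo average over exactly sampled geometric Brownian terminal values.

```python
import math
import random


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call with time to maturity tau = T - t."""
    d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return S * norm_cdf(d_plus) - K * math.exp(-r * tau) * norm_cdf(d_minus)


def mc_call(S, K, r, sigma, tau, n_paths=50000, seed=0):
    """Discounted average payoff over exact samples of S_T under the risk-neutral measure."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * tau
    vol = sigma * math.sqrt(tau)
    payoff = sum(
        max(S * math.exp(drift + vol * rng.gauss(0.0, 1.0)) - K, 0.0)
        for _ in range(n_paths)
    )
    return math.exp(-r * tau) * payoff / n_paths


closed_form = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
monte_carlo = mc_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(closed_form, monte_carlo)
```

The two numbers agree up to Monte Carlo noise, which shrinks like \(1/\sqrt{n_{\text{paths}}}\).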

Why ArXiv papers are so hard to read

A paper on PDE-based pricing may assume the reader already knows the probabilistic representation, and vice versa. A paper on Volterra SDEs will write \(\partial_t u + \mathcal{L}u = 0\) without explaining that \(\mathcal{L}\) is the infinitesimal generator of the diffusion — a concept belonging to both PDEs and SDEs but rarely taught as an explicit bridge between the two.

ThotBook MCP automatically detects these conceptual bridges and makes them explicit in the prerequisite tree, with cross-references to both literatures.

6. Mathematical knowledge as a directed acyclic graph

Here is a central idea that motivates ThotBook MCP's design: the structure of mathematical knowledge is, fundamentally, a directed acyclic graph (DAG). Every theorem depends on lemmas, every lemma on definitions, every definition on more primitive structures. There are no cycles: one cannot understand Novikov's condition without Girsanov's theorem, nor Girsanov's theorem without Wiener measure.

Topological sort = optimal reading order

Formally, let \(G = (V, E)\) be the prerequisite graph, where \((u, v) \in E\) means "understanding \(v\) requires understanding \(u\) first". A topological sort of \(G\) produces a linear ordering of nodes such that every edge points forward. Reading in this order guarantees that no concept is encountered before its prerequisites.

Theorem: A directed graph admits a topological sort if and only if it is acyclic; Kahn's algorithm (1962) computes one in \(O(|V| + |E|)\) time. The ordering is not unique in general: the valid sorts are the linear extensions of the partial order induced by the DAG, and their number can grow factorially in \(|V|\). ThotBook selects an optimal sort relative to your declared skill profile.
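A minimal sketch of Kahn's algorithm on a toy prerequisite graph follows. The edge list is an illustrative example, and nothing here reflects how ThotBook ranks the many valid orderings.

```python
from collections import deque


def topological_sort(edges):
    """Kahn's algorithm. Each edge (u, v) means 'u must be understood before v'."""
    successors, indegree = {}, {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
        successors.setdefault(v, [])
        indegree[v] = indegree.get(v, 0) + 1
        indegree.setdefault(u, 0)
    # Start from concepts that have no prerequisites.
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:  # all prerequisites of v are already in the order
                queue.append(v)
    if len(order) != len(indegree):
        raise ValueError("prerequisite graph contains a cycle")
    return order


edges = [
    ("Wiener measure", "Girsanov's theorem"),
    ("Girsanov's theorem", "Novikov's condition"),
    ("Wiener measure", "Itô's lemma"),
]
order = topological_sort(edges)
print(order)
```

Replacing the FIFO queue with a priority queue keyed on, say, estimated difficulty is one simple way to pick a particular ordering among the valid ones.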

Worked example: the Heston model

Consider the Heston model (1993), which describes the joint dynamics of price \(S_t\) and variance \(v_t\):

$$dS_t = rS_t\,dt + \sqrt{v_t}\,S_t\,dW^1_t$$
$$dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW^2_t, \quad d\langle W^1, W^2\rangle_t = \rho\,dt$$

To understand why this model is tractable (and how to derive its option price via Fourier transform), here is the complete DAG chain:

Heston option pricing
├─ Multidimensional Itô lemma  [bridge to SDE theory]
│  ├─ Cross quadratic variation \(\langle W^1, W^2\rangle\)
│  └─ Correlated Brownian motion via Cholesky factorization
├─ CIR process (\(v_t\) solves the CIR model)
│  ├─ Feller condition: \(2\kappa\theta \ge \xi^2\) (guarantees \(v_t > 0\))
│  └─ Noncentral \(\chi^2\) distribution (exact transition law)
├─ Infinitesimal generator and backward Kolmogorov equation
│  └─ Feynman-Kac bridge: \(V = \mathbb{E}[e^{-r(T-t)}(S_T-K)^+]\)
├─ Characteristic function of the log-price (closed affine form)
│  ├─ Duffie-Pan-Singleton affine systems
│  └─ Complex Riccati equations (\(\alpha, \beta\) inside the exponential)
└─ Fourier transform inversion (Carr-Madan)
   ├─ Fourier transform of square-integrable functions, tempered distributions
   └─ Gauss-Legendre quadrature for numerical inversion

This DAG, with 14 prerequisite nodes below the root, is never made explicit in Heston's original paper. A reader unfamiliar with the Feller condition will spend hours wondering why variance stays positive. A reader unaware of affine systems won't understand why the characteristic function has a closed form, even though it is a corollary of the general Duffie-Pan-Singleton (2000) theory.
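Two of these nodes fit in a few lines of code. The sketch below checks the Feller condition \(2\kappa\theta \ge \xi^2\) and draws correlated Brownian increments through the \(2\times 2\) Cholesky factor of the correlation matrix; function names and parameter values are illustrative.

```python
import math
import random


def feller_ok(kappa, theta, xi):
    """Feller condition 2 * kappa * theta >= xi^2: the CIR variance stays strictly positive."""
    return 2.0 * kappa * theta >= xi**2


def correlated_increments(rho, dt, rng):
    """One pair (dW1, dW2) with correlation rho, using the factor [[1, 0], [rho, sqrt(1 - rho^2)]]."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    scale = math.sqrt(dt)
    return scale * z1, scale * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)


rng = random.Random(0)
pairs = [correlated_increments(-0.7, 1.0, rng) for _ in range(20000)]
sample_cov = sum(a * b for a, b in pairs) / len(pairs)  # estimates rho * dt
print(feller_ok(2.0, 0.04, 0.3), sample_cov)
```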

ThotBook's automatic detection

ThotBook MCP reconstructs this DAG by analyzing the paper and enriching it with its mathematical knowledge base. It identifies:

What researchers say: "I spent 3 weeks on the Bayer-Friz-Gatheral paper before realizing I didn't have the foundations of Volterra SDEs. ThotBook would have told me in 5 seconds." — PhD student in mathematical finance, ETH Zürich.

7. Course generation — preview

The most powerful ThotBook MCP capability is generating interactive Jupyter courses from a paper or a prerequisite node. Here is a partial preview of the output:

course = thotbook.generate(
    topic="Stochastic Volterra equations — from Itô to Rough Vol",
    style="lecture",          # lecture | exercises | cheat-sheet
    depth="graduate",
    include_exercises=True,
    target_paper="2302.04854"
)
# Returns a .ipynb file with structured cells
# Cell 1 — Introduction
# Stochastic Volterra Equations: From Itô to Rough Volatility

## 1.1 Motivation: Why classical SDEs are insufficient
The Heston model postulates mean-reverting variance ...

# Cell 2 — Formal definition
$$ X_t = X_0 + \int_0^t K(t-s) b(s, X_s)\,ds + \int_0^t K(t-s)\sigma(s,X_s)\,dW_s $$

# Cell 3 — Exercise 1: verify Hölder regularity
Given K(t) = t^(H-1/2), show that sample paths have Hölder
exponent strictly less than H ...

# Cell 4 — Python simulation (numpy + scipy)
import numpy as np
from scipy.special import gamma
...
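For readers unfamiliar with the file format, a notebook is a JSON document. The sketch below builds a minimal one with the standard library, following the nbformat 4 layout (a `cells` list plus `metadata`, `nbformat`, and `nbformat_minor`). It is not ThotBook's generator; `make_notebook` is a hypothetical helper and the cell contents are placeholders.

```python
import json


def make_notebook(cells):
    """cells: list of (cell_type, source) pairs, with cell_type 'markdown' or 'code'."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {
            "cell_type": cell_type,
            "metadata": {},
            "source": source.splitlines(keepends=True),
        }
        if cell_type == "code":
            # Code cells carry two extra fields in the v4 format.
            cell.update({"execution_count": None, "outputs": []})
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


nb = make_notebook([
    ("markdown", "# Stochastic Volterra Equations\n## 1.1 Motivation"),
    ("code", "import numpy as np"),
])
notebook_json = json.dumps(nb, indent=1)
print(notebook_json[:80])
```

Writing `notebook_json` to a file with the `.ipynb` extension yields a notebook that Jupyter can open.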

8. Access the full notebook

The subscriber demo notebook covers the full ThotBook MCP pipeline:

📓 Notebook: ThotBook MCP — Physics & Maths

  • Multi-criteria ArXiv search with relevance scoring
  • Interactive prerequisite tree generation (depth=5)
  • Structured paper explanation (abstract → equations → proof)
  • Full Jupyter course generation with exercises
  • Export to Delta Lake / Polarway for persistence
  • Automated monitoring: weekly ArXiv alerts

Available in the Research Lab plan from day one.

Technology: ThotBook MCP is implemented according to Anthropic's Model Context Protocol specification. It works with Claude, GPT-4, and any MCP-compatible assistant. Integration takes one line in .vscode/mcp.json.