Newton–GMRES and Preconditioning for Stiff Chemistry
Stiff chemistry solvers are built on a chain of ideas, each one forced by the one before it. Stiffness forces implicit timestepping. Implicit timestepping forces a nonlinear solve. The nonlinear solve forces repeated linear algebra. And linear algebra at scale forces you to think carefully about structure. By the end of this article, that chain will feel obvious rather than mysterious.
This article is fully self-contained. If you read only this page, you should be able to answer:
- What stiffness is, physically and mathematically.
- Why explicit timestepping fails on stiff chemistry.
- Why implicit timestepping creates a nonlinear solve.
- How Newton converts that nonlinear solve into a sequence of linear solves.
- Why GMRES is used for those linear solves.
- Why preconditioning is the key performance lever.
- How structure-aware preconditioning connects to the architecture of real stiff-chemistry solvers.
I use one concrete 60-species toy chemistry problem throughout so every concept maps to something tangible. After section 12, you will find an interactive lab embedded directly in the page — all the solver traces from the toy run, explorable in your browser.
1) Start from the real problem
In reacting-flow and combustion simulations, species evolve by ordinary differential equations:

$$\frac{dy}{dt} = S(y)$$

where $y \in \mathbb{R}^N$ is a vector of species states (and often temperature and density terms), and $S(y)$ is the chemistry source term. $N$ can range from a handful of species in a global mechanism to thousands in a detailed one.
The painful part is that chemistry has many timescales at once. Some reactions are extremely fast — microseconds. Some are much slower — milliseconds or longer. One system, many orders of magnitude of timescale spread. That is stiffness.
2) What stiffness means (without hand-waving)
2.1 Physical meaning
If one species pair reacts in microseconds and another pathway evolves in milliseconds, your solver must represent both. The fast reactions create a kind of numerical sensitivity: a solver that is responsive to fast dynamics must take tiny steps, even when your actual engineering question is about the slow behavior.
This is the core frustration. You want to know what happens over a millisecond. But the solver is forced to walk in nanosecond steps because there happen to be fast reactions present, even if they equilibrate almost immediately.
2.2 Mathematical meaning
Linearized locally, the system is governed by the Jacobian $J = \partial S / \partial y$:

$$\frac{d(\delta y)}{dt} = J\,\delta y$$

If the eigenvalues $\lambda_i$ of $J$ span a huge range in magnitude, the system is stiff. A common stiffness indicator is the ratio of spectral extremes:

$$\kappa = \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}$$

Large $\kappa$ means a wide timescale spread. Real combustion mechanisms routinely have $\kappa$ spanning many orders of magnitude.
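To make the indicator concrete, here is a minimal sketch — a hypothetical two-species linear system with illustrative rate constants, not the toy mechanism used later in this article — computing $\kappa$ directly from the Jacobian's eigenvalues:

```python
import numpy as np

# Two-species linear toy: a fast mode (rate ~1e6 /s) coupled to a slow mode (~1 /s).
# The spread of the Jacobian's eigenvalue magnitudes is the stiffness ratio kappa.
J = np.array([[-1.0e6, 1.0e6],
              [ 1.0,  -2.0]])

lam = np.linalg.eigvals(J)
kappa = np.max(np.abs(lam)) / np.min(np.abs(lam))
print("eigenvalues:", lam)
print("stiffness ratio kappa:", kappa)
```

Both eigenvalues are negative (everything decays), but one is roughly a million times larger in magnitude than the other — that disparity, not the size of any single rate, is what makes the system stiff.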
3) Why explicit methods fail on stiff chemistry
Forward Euler advances the solution by:

$$y_{n+1} = y_n + \Delta t\, S(y_n)$$

For the scalar test equation $y' = \lambda y$ with $\operatorname{Re}(\lambda) < 0$, Forward Euler is stable only when:

$$|1 + \Delta t\,\lambda| \le 1$$

So the largest-magnitude eigenvalue in the Jacobian dictates your maximum step size: $\Delta t \lesssim 2/|\lambda_{\max}|$. If the fastest mode decays on a nanosecond timescale, $\Delta t$ must be on the order of nanoseconds — regardless of whether you care about the fast dynamics at all.
This is the explicit method's fundamental failure on stiff systems: the step size is held hostage by stability, not by accuracy. You are not taking small steps because you need resolution. You are taking small steps because the method becomes numerically unstable otherwise.
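The stability boundary is easy to see numerically. A minimal sketch on the scalar test equation, with an illustrative rate constant: one step size sits inside the limit $\Delta t \le 2/|\lambda|$ and decays, the other sits outside it and explodes.

```python
import numpy as np

lam = -1.0e4   # fast decay rate, 1/s; the true solution decays to zero

def forward_euler(dt, steps):
    # y_{n+1} = y_n + dt * lam * y_n, i.e. multiply by (1 + dt*lam) each step
    y = 1.0
    for _ in range(steps):
        y = y + dt * lam * y
    return y

# Stability requires |1 + dt*lam| <= 1, i.e. dt <= 2/|lam| = 2e-4 here.
print(forward_euler(dt=1.0e-4, steps=200))  # inside the limit: decays
print(forward_euler(dt=1.0e-3, steps=200))  # outside: amplified by |1 - 10| = 9 per step
```

Note that the unstable run is not merely inaccurate — it diverges without bound, even though the exact solution it approximates goes quietly to zero.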
4) Implicit methods solve stability — but introduce nonlinear solves
Backward Euler sidesteps the stability trap:

$$y_{n+1} = y_n + \Delta t\, S(y_{n+1})$$
For stiff problems this is dramatically more stable. You can take large steps without the solution blowing up.
But now $y_{n+1}$ appears on both sides. Define the nonlinear residual:

$$F(y) = y - y_n - \Delta t\, S(y)$$

At every timestep you must solve $F(y_{n+1}) = 0$. Implicit integration trades timestep stability for a nonlinear algebraic solve. The difficulty does not disappear — it relocates.
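For the linear scalar test problem, the implicit equation can be solved in closed form, which makes the stability gain easy to demonstrate — a sketch using the same illustrative rate as before (a genuinely nonlinear $S(y)$ has no closed form, which is what forces Newton's method):

```python
import numpy as np

lam = -1.0e4   # same fast decay rate as in the explicit example

def backward_euler(dt, steps):
    y = 1.0
    for _ in range(steps):
        # Linear problem: y_new = y + dt*lam*y_new rearranges to a division.
        y = y / (1.0 - dt * lam)
    return y

# A step 5000x larger than the explicit stability limit — still stable:
print(backward_euler(dt=1.0, steps=10))
```

The result decays smoothly toward zero no matter how large $\Delta t$ is, because for $\lambda < 0$ the amplification factor $1/(1 - \Delta t\,\lambda)$ is always below 1 in magnitude.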
5) Newton method: nonlinear solve → sequence of linear solves
Newton's method solves $F(y) = 0$ iteratively. At iteration $k$ with current guess $y^{(k)}$:

$$A\,\delta^{(k)} = -F\bigl(y^{(k)}\bigr), \qquad y^{(k+1)} = y^{(k)} + \delta^{(k)}$$

where:

$$A = \frac{\partial F}{\partial y} = I - \Delta t\, J, \qquad J = \frac{\partial S}{\partial y}$$
The key insight about Newton is that it does not remove the difficulty — it relocates it again. Newton turns one nonlinear solve into a sequence of linear system solves. If those linear solves are fast, implicit chemistry is fast. If they are slow, everything is slow.
The efficiency of the whole implicit timestepper depends on how quickly you can solve $A\,\delta = -F$.
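In one dimension the whole Newton loop fits in a few lines. A sketch with a made-up bimolecular rate (purely illustrative, not the article's 60-species model), where each "linear solve" degenerates to a scalar division:

```python
import numpy as np

# Scalar toy chemistry: S(y) = -k*y**2 (a bimolecular self-reaction).
k, dt, y_old = 50.0, 0.1, 1.0

def F(y):
    # Backward-Euler residual: F(y) = y - y_old - dt*S(y)
    return y - y_old - dt * (-k * y**2)

def Fprime(y):
    # dF/dy = 1 - dt * dS/dy = 1 + dt*2*k*y
    return 1.0 + dt * 2.0 * k * y

y = y_old                          # initial Newton guess
for it in range(8):
    delta = -F(y) / Fprime(y)      # in N dimensions this is the linear solve
    y += delta
    print(it, abs(F(y)))
```

The printed residuals shrink roughly quadratically once the iterate is close — the classic Newton signature. In $N$ dimensions the division becomes an $N \times N$ linear system, and that is where all the cost lives.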
6) The cost of solving linear systems naively
The naive approach at each Newton step is to build the dense matrix $A = I - \Delta t\,J$, factorize it with dense LU, and solve by back-substitution:
- Storage: $O(N^2)$
- Factorization: $O(N^3)$
For large mechanisms with hundreds or thousands of species, this is often the performance wall. Even worse: chemistry Jacobians are typically sparse, so dense methods spend the majority of their time multiplying zeros. You pay the full cost to handle a matrix that is mostly empty.
7) GMRES: an iterative solver that only needs matrix-vector products
GMRES (Generalized Minimal Residual) is an iterative linear solver for non-symmetric systems. Its defining property: it never needs to form $A$ explicitly. It only needs to evaluate the map $v \mapsto Av$ for given vectors $v$.
Starting from an initial residual $r_0 = b - Ax_0$, GMRES builds search directions from the Krylov subspace:

$$\mathcal{K}_m(A, r_0) = \operatorname{span}\bigl\{r_0,\; Ar_0,\; A^2 r_0,\; \dots,\; A^{m-1} r_0\bigr\}$$

At each iteration it finds the combination of these directions that minimizes $\|b - Ax\|_2$. The key advantage: chemistry Jacobians are sparse and structured, so the product $Av$ is cheap — and can even be approximated without forming $A$ at all.
8) Matrix-free Jacobian-vector product
For our Newton matrix $A = I - \Delta t\,J$, the matrix-vector product with a vector $v$ is:

$$Av = v - \Delta t\,(Jv)$$

The Jacobian-vector product can be approximated by a finite-difference directional derivative:

$$Jv \approx \frac{S(y + \epsilon v) - S(y)}{\epsilon}$$

This requires only two function evaluations — no Jacobian assembly, no storage of a dense matrix. GMRES needs matrix-vector products, and those products can be approximated cheaply using the structure of the problem.
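The finite-difference product is easy to sanity-check against an analytic Jacobian on a small system. A sketch using a hypothetical quadratic source term (not the article's mechanism), where the exact Jacobian is known in closed form:

```python
import numpy as np

# Hypothetical 3-species source term: S(y) = A @ y + y**2, so the exact
# Jacobian is dS/dy = A + diag(2*y).
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))

def S(y):
    return A @ y + y**2

def J_exact(y):
    return A + np.diag(2.0 * y)

def Jv_fd(y, v, eps=1e-7):
    # Directional derivative: (S(y + eps*v) - S(y)) / eps
    return (S(y + eps * v) - S(y)) / eps

y = rng.normal(size=3)
v = rng.normal(size=3)
err = np.linalg.norm(Jv_fd(y, v) - J_exact(y) @ v)
print("finite-difference error:", err)   # small first-order truncation error
```

The discrepancy is the $O(\epsilon)$ truncation error of one-sided differencing — small enough for GMRES, which only needs the direction of $Jv$ to reasonable accuracy, not machine precision.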
9) Why GMRES alone is not enough on stiff systems
Without preconditioning, GMRES on stiff problems may require many iterations or stagnate. The reason is spectral: if $A$ has eigenvalues spread over a wide range — which stiff chemistry almost always produces — then the Krylov directions reduce the residual slowly.
In the worst case the iteration count can approach the system dimension $N$ before convergence. For $N = 60$ that is manageable; for mechanisms with hundreds or thousands of species it becomes prohibitive.
Preconditioning is the solution.
10) Preconditioning: the dominant performance lever
Preconditioning transforms the linear system into one that GMRES can solve quickly. Instead of solving $Ax = b$ directly, solve:

$$M^{-1} A\, x = M^{-1} b$$

where $M$ is a matrix that approximates $A$ but is cheap to apply. The goal: choose $M$ such that $M^{-1}A$ has eigenvalues clustered near 1, which makes GMRES converge in very few iterations.
The trade-off is always the same: a more accurate $M$ means fewer GMRES iterations, but also a more expensive $M$ to build and apply. The best preconditioner is one that captures the dominant physics at acceptable cost.
10.1 Diagonal preconditioner
Use only the main diagonal of $A$. Applying $M^{-1}$ amounts to dividing each component by the corresponding diagonal entry — $O(N)$ per application. This captures per-variable scaling but ignores all species coupling, which is exactly what makes stiff chemistry hard. For strongly coupled systems, diagonal preconditioning gives only modest improvement.
10.2 Structure-aware (banded) preconditioner
If chemistry coupling is local — if species only interact with their neighbors — then $J$ has a near-banded structure. Build an approximate Jacobian $\tilde{J}$ by retaining only entries within bandwidth $b$, then form $M = I - \Delta t\,\tilde{J}$.
This captures the dominant coupling while remaining far cheaper than the full dense factorization. When the approximation matches the true dominant physics, GMRES iteration counts collapse dramatically.
The physical intuition: the preconditioner pre-corrects for the part of $A$ that matters most. Once that part is absorbed into $M$, the preconditioned operator $M^{-1}A$ is nearly the identity, and GMRES converges in very few steps.
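The clustering claim can be checked numerically. A sketch with a synthetic symmetric banded-plus-noise Jacobian (illustrative rates spanning four decades, not the article's toy mechanism), comparing the eigenvalue-magnitude spread of $A$ against that of the banded $M^{-1}A$:

```python
import numpy as np

# Hypothetical 30-variable stiff system: strong nearest-neighbor coupling with
# rates spanning 4 decades, plus a weak dense background outside the band.
rng = np.random.default_rng(1)
N, dt, bw = 30, 0.05, 1
rates = 10.0 ** rng.uniform(0, 4, size=N)
J = -np.diag(rates)
c = 0.3 * np.sqrt(rates[:-1] * rates[1:])   # neighbor coupling strengths
J += np.diag(c, 1) + np.diag(c, -1)
E = 0.01 * rng.normal(size=(N, N))
J += (E + E.T) / 2.0                        # weak long-range coupling
A = np.eye(N) - dt * J

def spread(B):
    ev = np.abs(np.linalg.eigvals(B))
    return ev.max() / ev.min()

M_band = np.tril(np.triu(A, -bw), bw)       # keep only entries with |i-j| <= bw
print("spread of A        :", spread(A))
print("spread of M^-1 A   :", spread(np.linalg.solve(M_band, A)))
```

The raw operator's eigenvalue magnitudes span orders of magnitude; after banded preconditioning the spectrum collapses to a tight cluster around 1, because everything $M$ missed is the weak off-band background.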
11) The 60-species toy model used here
Rather than use a real combustion mechanism — which would obscure the principles under layers of chemical detail — I built a deliberately simple but instructive synthetic system:
- Transfer: $X_i \to X_{i+1}$, with rate decreasing along the chain
- Bimolecular: $X_i + X_{i+1} \to X_{i+2}$, coupling neighboring species
- Decay: $X_i \to \varnothing$ at a small, slowly increasing per-species rate
- Source: constant injection into $X_1$
This creates nonlinearity (bimolecular term), stiffness (wide range in transfer rates), local species coupling (each species only talks to its neighbors), and a sparse near-banded Jacobian. These are exactly the features needed to demonstrate why preconditioning matters, in a setting where everything is visible and controllable.
12) What the experiment revealed
Running one implicit timestep ($\Delta t = 0.05$) with all three strategies on this system produces the same final solution in all cases — the tolerances are met, the answer is correct. But the cost of getting there is dramatically different.
Without any preconditioner, GMRES struggles on several of the Newton steps. Some steps require 20, 37, or 62 inner iterations before the linear system is solved to tolerance. The total across all 8 Newton steps is 172 GMRES iterations.
The diagonal preconditioner helps modestly. It rescales each equation, removing some per-variable disparity. But it doesn't capture species coupling, so it still hits expensive steps — one step alone costs 63 GMRES iterations. Total: 153 iterations, not much better.
The banded preconditioner changes everything. Because it captures the actual dominant coupling in the Jacobian, Newton converges in just 4 steps, and each step requires exactly 1 GMRES iteration. Total: 4 GMRES iterations — a 43× reduction.
All three strategies solve the same equation to the same accuracy. The preconditioner doesn't change the answer. It changes the path.
Explore it yourself. The lab below contains the full solver traces from this run. You can inspect the Newton convergence under each strategy, click through individual Newton steps to see GMRES residual histories on a log scale, and see the sparsity pattern that explains why banded wins.
One implicit timestep of a 60-species stiff chemistry system, solved three ways. Explore how Newton outer iterations and GMRES inner solves interact — and why the preconditioner is the dominant performance lever.
All three strategies solve the same equation F(y) = 0 to the same tolerance. The preconditioner doesn't change the problem — it changes how efficiently GMRES converges at each Newton step. Banded ILU captures the dominant local coupling in the Jacobian, which is why it collapses the iteration count so dramatically.
Each row is one Newton iteration. The bar shows the magnitude of the nonlinear residual ‖F(y)‖. The number on the right is how many GMRES iterations were needed to compute the Newton update for that step.
Pick a Newton step. Each chart shows the GMRES residual history on a log scale. A steep downward curve means fast convergence; a flat curve means GMRES is struggling. The difference between strategies is most visible at harder Newton steps.
Only 294 entries out of 3600 are nonzero — a fill fraction of just 8.2%. Each species only couples to its immediate neighbors; that local interaction is exactly what creates the near-diagonal band.
The banded preconditioner explicitly captures this coupling within bandwidth 2. The diagonal preconditioner keeps only the main diagonal, discarding all off-diagonal coupling. The unpreconditioned case leaves GMRES to discover the structure through Krylov iterations alone.
The closer your preconditioner is to the true operator, the fewer GMRES iterations you need. Banded captures the dominant physics; that is why it collapses 172 → 4.
13) The complete algorithm from start to finish
One implicit timestep, unrolled:
1. Start with $y_n$ from the previous accepted step.
2. Set the Newton initial guess $y^{(0)} = y_n$.
3. Compute the nonlinear residual $F(y^{(k)}) = y^{(k)} - y_n - \Delta t\,S(y^{(k)})$.
4. If $\|F(y^{(k)})\|$ is below tolerance: the timestep is done.
5. Build or update the preconditioner $M$ for the current $y^{(k)}$.
6. Solve $A\,\delta^{(k)} = -F(y^{(k)})$ using preconditioned GMRES, where $Av$ is evaluated matrix-free.
7. Update $y^{(k+1)} = y^{(k)} + \delta^{(k)}$.
8. Check convergence on $\|F\|$ and $\|\delta^{(k)}\|$. If not converged, return to step 3.
This is the Newton–Krylov–preconditioned workflow. Every element is forced by the previous one: implicit stepping forces the nonlinear solve, Newton forces the linear system, scale forces Krylov, and Krylov efficiency forces preconditioning.
14) Runnable reference implementation
The code below captures the full method — chemistry model, Newton loop, GMRES, and all three preconditioners — in a single self-contained script.
Run this in your browser: Open Google Colab, paste the code into a cell, and run. No installation required. Try changing the bandwidth, the timestep, or the chemistry rates and see directly how the iteration counts change.
```python
import numpy as np

# ------------------------------------------------------------
# Problem setup
# ------------------------------------------------------------
N = 60
dt = 0.05

k_transfer = 50.0 * np.exp(-0.01 * np.arange(N - 1)) + 1.0
k_bimol = 30.0 * np.exp(-0.01 * np.arange(N - 2)) + 0.5
k_decay = 0.05 + 0.005 * np.arange(N)
source = np.zeros(N)
source[0] = 2.0

y_old = np.zeros(N)
y_old[0] = 1.0

def rhs(y):
    y = np.maximum(y, 0.0)
    dy = source.copy()
    flux_transfer = k_transfer * y[:-1]
    dy[:-1] -= flux_transfer
    dy[1:] += flux_transfer
    flux_bimol = k_bimol * y[:-2] * y[1:-1]
    dy[:-2] -= flux_bimol
    dy[1:-1] -= flux_bimol
    dy[2:] += flux_bimol
    dy -= k_decay * y
    return dy

def F(y):
    # Backward-Euler residual: F(y) = y - y_old - dt * S(y)
    return y - y_old - dt * rhs(y)

def Jv_fd(y, v):
    # Matrix-free Jacobian-vector product via a finite-difference directional derivative
    vnorm = np.linalg.norm(v)
    if vnorm == 0.0:
        return np.zeros_like(v)
    eps = 1e-7 * (1.0 + np.linalg.norm(y)) / vnorm
    return (rhs(y + eps * v) - rhs(y)) / eps

def A_matvec(y, v):
    # A v = (I - dt*J) v, without ever forming A
    return v - dt * Jv_fd(y, v)

def gmres_left_preconditioned(matvec, b, Msolve=None, tol=1e-7, maxit=80):
    n = b.size
    x0 = np.zeros(n)

    def Minv(v):
        return v if Msolve is None else Msolve(v)

    r0 = Minv(b - matvec(x0))
    beta = np.linalg.norm(r0)
    hist = [beta]
    if beta < tol:
        return x0, hist
    V = np.zeros((n, maxit + 1))
    H = np.zeros((maxit + 1, maxit))
    V[:, 0] = r0 / beta
    x = x0.copy()
    for j in range(maxit):
        # Arnoldi step with modified Gram-Schmidt orthogonalization
        w = Minv(matvec(V[:, j]))
        for i in range(j + 1):
            H[i, j] = np.dot(V[:, i], w)
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 1e-14:
            V[:, j + 1] = w / H[j + 1, j]
        # Small least-squares problem for the residual-minimizing iterate
        e1 = np.zeros(j + 2)
        e1[0] = beta
        yls, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        x = x0 + V[:, :j + 1] @ yls
        r = Minv(b - matvec(x))
        rnorm = np.linalg.norm(r)
        hist.append(rnorm)
        if rnorm < tol or H[j + 1, j] < 1e-14:
            break
    return x, hist

def diagonal_preconditioner(y):
    base = rhs(y)
    diagJ = np.zeros(N)
    for j in range(N):
        eps = 1e-8 * (1.0 + abs(y[j]))
        yp = y.copy()
        yp[j] += eps
        diagJ[j] = (rhs(yp)[j] - base[j]) / eps
    diagA = 1.0 - dt * diagJ
    diagA = np.where(np.abs(diagA) < 1e-12, 1.0, diagA)
    return lambda r: r / diagA

def banded_preconditioner(y, bandwidth=2):
    base = rhs(y)
    Jband = np.zeros((N, N))
    for j in range(N):
        eps = 1e-8 * (1.0 + abs(y[j]))
        yp = y.copy()
        yp[j] += eps
        col = (rhs(yp) - base) / eps
        i0 = max(0, j - bandwidth)
        i1 = min(N, j + bandwidth + 1)
        Jband[i0:i1, j] = col[i0:i1]
    M = np.eye(N) - dt * Jband
    # Dense solve for clarity; production code would use a banded factorization
    return lambda r: np.linalg.solve(M, r)

def implicit_step_newton(y_start, kind="none", newton_tol=1e-8, max_newton=8):
    y = y_start.copy()
    for k in range(max_newton):
        Fy = F(y)
        Fnorm = np.linalg.norm(Fy)
        if Fnorm < newton_tol:
            return y
        if kind == "none":
            Msolve = None
        elif kind == "diag":
            Msolve = diagonal_preconditioner(y)
        elif kind == "band":
            Msolve = banded_preconditioner(y, bandwidth=2)
        else:
            raise ValueError("unknown preconditioner kind")
        delta, hist = gmres_left_preconditioned(
            matvec=lambda v: A_matvec(y, v),
            b=-Fy,
            Msolve=Msolve,
            tol=1e-7,
            maxit=80,
        )
        y = y + delta
        if np.linalg.norm(delta) < 1e-12 * (1.0 + np.linalg.norm(y)):
            return y
    return y

y_guess = y_old.copy()
y_none = implicit_step_newton(y_guess, kind="none")
y_diag = implicit_step_newton(y_guess, kind="diag")
y_band = implicit_step_newton(y_guess, kind="band")
print("||y_none - y_band|| =", np.linalg.norm(y_none - y_band))
print("||y_diag - y_band|| =", np.linalg.norm(y_diag - y_band))
```

15) How this architecture appears in real stiff-chemistry solvers
The toy model demonstrates the same architectural choices made in serious stiff-chemistry software:
- Exploit sparsity and structure in the Jacobian rather than treating it as dense.
- Use Krylov solvers (GMRES or similar) for the linearized systems in implicit timesteps.
- Apply structure-aware preconditioning to control Krylov iteration counts.
- Avoid full Jacobian assembly where possible, using finite-difference directional derivatives.
Real solvers extend these ideas to mechanisms with hundreds or thousands of species, using more sophisticated sparse infrastructure — compressed storage formats, graph-coloring for efficient Jacobian estimation, ILU factorizations that exploit fill patterns, and adaptive preconditioning that updates only when the solution changes significantly.
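As one small taste of that sparse infrastructure, a compressed sparse row (CSR) matrix-vector product can be sketched in a few lines — the helper names here are illustrative, not any particular library's API:

```python
import numpy as np

# Minimal CSR (compressed sparse row) storage: three flat arrays instead of
# a dense matrix, so the matvec touches only the nonzero entries.
def dense_to_csr(A):
    data, indices, indptr = [], [], [0]
    for row in A:
        nz = np.nonzero(row)[0]
        data.extend(row[nz])       # nonzero values, row by row
        indices.extend(nz)         # their column indices
        indptr.append(len(indices))  # where each row starts/ends
    return np.array(data), np.array(indices), np.array(indptr)

def csr_matvec(data, indices, indptr, v):
    out = np.zeros(len(indptr) - 1)
    for i in range(len(out)):
        lo, hi = indptr[i], indptr[i + 1]
        out[i] = np.dot(data[lo:hi], v[indices[lo:hi]])
    return out

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])
v = np.array([1.0, 2.0, 3.0])
print(csr_matvec(*dense_to_csr(A), v))   # matches A @ v
```

For a Jacobian like the toy model's (294 nonzeros out of 3600), this storage and matvec cost scale with the nonzero count, not with $N^2$ — exactly the economy GMRES exploits.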
The principles, however, are exactly the same. Understanding the toy model gives you the conceptual framework to read and interpret production stiff-chemistry code.
16) Diagnostic checklist for when you need it
When you encounter a stiff chemistry solver and need to understand its performance, ask these questions in order:
- Is the timestepper stiff-appropriate? (BDF, SDIRK, or implicit Runge-Kutta — not explicit methods.)
- Are the nonlinear solves robust? (Check Newton tolerances, update checks, maximum iterations.)
- Are the linear solves matrix-free or explicit-matrix, and why was that choice made?
- What preconditioner is used, and what physical coupling does it capture?
- Where is the runtime actually going? (Jacobian assembly, preconditioner build, GMRES iterations, or chemistry RHS evaluations?)
If you can answer all five clearly, you understand the solver.
17) The chain of reasoning
The progression from chemistry ODE to preconditioned Krylov solve is not a collection of arbitrary choices. Each step is forced by the previous one:
Stiffness forces implicit methods. Implicit methods force a nonlinear solve at every timestep. Newton turns that nonlinear solve into repeated linear system solves. Repeated stiff linear solves force Krylov methods — because direct factorization at scale is too expensive. Krylov methods on stiff operators require preconditioning to converge in acceptable iteration counts. And the best preconditioners come from exploiting the physical structure of the Jacobian.
That chain is the central idea. The solver is not arbitrary — it is the logical endpoint of confronting stiffness honestly, step by step. The interactive lab in section 12 shows you the numbers behind that claim.