What We Learned Training a Neural Network to Operate a Gas Storage Facility

We spent the last few months trying to train a neural network to operate a gas storage facility. 365 daily decisions, stochastic prices, physical constraints, demand obligations. Here’s what we hit and how we got past it.

Problem 1: RL doesn’t work here

We started where everyone starts: reinforcement learning. PPO, SAC, stable-baselines3, the standard toolkit. After 4 hours of GPU training, SAC reached $1.07M on our benchmark. Our C++ implementation with exact adjoint gradients hit $2.19M in 39 seconds on CPU. We even tried putting exact gradients inside PPO and SAC. The RL machinery made things worse. Clipping and entropy regularisation solve a problem that doesn’t exist when your gradients are exact.

Problem 2: You can’t differentiate through min/max

Gas storage has hard physical constraints. You can’t inject more than the tank holds, and you can’t withdraw below zero. In code these are min/max operations. They have a kink where the constraint starts to bind, and once an action is clipped the gradient with respect to it is exactly zero, so the optimiser gets no signal. We replaced them with smooth sigmoid approximations. It worked, but introduced a 17% bias on some curves. We were honest about this in the paper. It’s the cost of differentiability.
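To make the trade-off concrete, here is a minimal sketch of a smoothed clip. It is not the form used in the paper, which this post does not spell out. It assumes a softplus-based smooth min/max (whose derivative is a sigmoid) and a hypothetical sharpness parameter k; all function names are illustrative.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hard_clip(x, lo, hi):
    return min(max(x, lo), hi)

def hard_clip_grad(x, lo, hi):
    # Zero as soon as the constraint binds: no learning signal.
    return 1.0 if lo < x < hi else 0.0

def smooth_max(a, b, k):
    # Assumed smoothing: approaches max(a, b) as k grows.
    return b + softplus(k * (a - b)) / k

def smooth_min(a, b, k):
    # Assumed smoothing: approaches min(a, b) as k grows.
    return b - softplus(k * (b - a)) / k

def smooth_clip(x, lo, hi, k=10.0):
    return smooth_min(smooth_max(x, lo, k), hi, k)

def smooth_clip_grad(x, lo, hi, k=10.0):
    # Chain rule: each smooth min/max has a sigmoid as its derivative.
    inner = smooth_max(x, lo, k)
    d_inner = sigmoid(k * (x - lo))
    d_outer = sigmoid(k * (hi - inner))
    return d_outer * d_inner

# A requested withdrawal of 1.5 units against a limit of 1.0.
request, lo, hi = 1.5, 0.0, 1.0
print("hard  :", hard_clip(request, lo, hi), hard_clip_grad(request, lo, hi))
print("smooth:", smooth_clip(request, lo, hi), smooth_clip_grad(request, lo, hi))
# The price of smoothness: a request exactly at the limit is under-delivered.
print("bias at the limit:", hi - smooth_clip(hi, lo, hi))
```

With this particular smoothing the error at the bound itself is ln(2)/k, so a larger k shrinks the bias but also pushes the gradient of a clipped action back towards zero. That tension is the "cost of differentiability" in miniature.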

Problem 3: Demand changes everything

Real storage facilities have firm service commitments. Customers need gas delivered regardless of price. When we added stochastic demand, something unexpected happened. The smooth constraint bias stopped mattering. The neural policy’s ability to react to demand shocks more than offset the constraint approximation error. The adjoint approach went from losing to dynamic programming (DP) on 3 out of 6 curves to winning every single one, by 31–67%.

Problem 4: Everyone treats the simulator as a black box

This is the part that surprised us most. PyTorch and JAX differentiate through the neural network. But in a standard RL setup the pricing model, the inventory dynamics and the cashflow discounting sit inside the environment, and the environment is treated as a black box. REINFORCE estimates gradients by sampling random actions and weighting their log-probability gradients by the observed return. With 807 parameters, that’s noise.
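The noise gap can be seen on a deliberately tiny example. The sketch below is not our benchmark: it assumes a hypothetical one-step quadratic payoff and a Gaussian policy with a single parameter theta, and compares the score-function (REINFORCE) estimator, which only sees payoffs, with a pathwise estimator that differentiates through the payoff itself.

```python
import random
import statistics

def payoff(action):
    # Hypothetical one-step payoff, maximised at action = 1.
    return -(action - 1.0) ** 2

def payoff_grad(action):
    # Exact derivative of the payoff with respect to the action.
    return -2.0 * (action - 1.0)

def gradient_samples(theta, sigma, n, seed=0):
    """Per-sample gradient estimates of E[payoff] with respect to theta."""
    rng = random.Random(seed)
    score, pathwise = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        action = theta + sigma * eps          # action ~ N(theta, sigma^2)
        # Score function: payoff times d log pi(action) / d theta.
        score.append(payoff(action) * (action - theta) / sigma ** 2)
        # Pathwise: differentiate through the payoff; d action / d theta = 1.
        pathwise.append(payoff_grad(action))
    return score, pathwise

theta, sigma = 0.0, 0.5
score, pathwise = gradient_samples(theta, sigma, n=20000)
exact = -2.0 * (theta - 1.0)                  # closed-form gradient of E[payoff]

print("exact gradient      :", exact)
print("score-function mean :", statistics.mean(score), "std:", statistics.stdev(score))
print("pathwise mean       :", statistics.mean(pathwise), "std:", statistics.stdev(pathwise))
```

Both estimators are unbiased here, but the score-function samples are several times more spread out even with one parameter and one time step. The gap typically widens with more parameters and longer horizons.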

Differentiable simulation does something different. It records the entire simulation (price model, policy, constraints, cashflows) and compiles it into a native machine code kernel. One reverse pass gives you exact gradients through everything. Not just through the network. Through the whole simulator. That’s why 39 seconds of training beats 4 hours of RL.
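As a rough illustration of what "recording" means, the sketch below builds a tiny tape with operator overloading and runs one reverse sweep over a toy simulation. It is interpreted Python, compiles nothing, and is not AADC's API. The Var class, the one-neuron policy, the reference price, the prices and the parameter values are all hypothetical, and inventory bounds are left out for brevity.

```python
import math

class Var:
    """A scalar that remembers which values it was computed from."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents          # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = as_var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __sub__(self, other):
        other = as_var(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = as_var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

def as_var(x):
    return x if isinstance(x, Var) else Var(x)

def tanh(x):
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))

def backward(output):
    """One reverse sweep: fills .grad on every recorded value."""
    order, seen = [], set()
    stack = [(output, False)]
    while stack:                        # iterative depth-first topological sort
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        elif id(node) not in seen:
            seen.add(id(node))
            stack.append((node, True))
            stack.extend((parent, False) for parent, _ in node.parents)
    output.grad = 1.0
    for node in reversed(order):        # consumers before their inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local

def simulate(w, b, prices, rate=0.05):
    """Toy storage run: policy -> inventory -> discounted cashflows."""
    inventory = Var(1.0)
    value = Var(0.0)
    for day, price in enumerate(prices):
        # One-neuron "policy": positive action means withdraw and sell.
        action = tanh(w * (price - 3.0) + b * (inventory - 0.5))
        inventory = inventory - action
        discount = math.exp(-rate * day / 365.0)
        value = value + (discount * price) * action
    return value

prices = [2.8, 3.4, 3.1, 2.6, 3.3]
w, b = Var(0.7), Var(0.2)
value = simulate(w, b, prices)
backward(value)
print("value:", value.value, "dV/dw:", w.grad, "dV/db:", b.grad)
```

The point of the sketch is that the gradient flows through the inventory update and the discounting as well as through the policy, because all of them were recorded on the same tape.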

Problem 5: Greeks

A storage desk doesn’t just need a value. It needs hedge ratios. How does the position change if the April contract moves? If rates move? With adjoint differentiation, all 365 forward curve sensitivities come from the same reverse pass used for training. Zero additional cost. DP needs 365 bump-and-revalue re-solves. Least-squares Monte Carlo (LSMC) needs 365 re-runs.
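The cost argument can be sketched with a hand-written adjoint on a toy model. This is an illustration under stated assumptions, not the model from the paper: a fixed one-neuron policy with hypothetical parameters w and b, a made-up seasonal forward curve, no inventory bounds and no stochastic prices. One forward sweep plus one reverse sweep returns all 365 curve deltas; bump-and-revalue needs 365 extra valuations to get the same numbers.

```python
import math

REF_PRICE = 3.0      # hypothetical reference price used by the toy policy
TARGET_INV = 0.5     # hypothetical inventory level the policy steers towards

def forward_sweep(curve, w, b, rate):
    """Run the toy simulation and keep the intermediates the adjoint needs."""
    n = len(curve)
    inventory = 1.0
    actions = [0.0] * n
    discounts = [math.exp(-rate * t / 365.0) for t in range(n)]
    value = 0.0
    for t in range(n):
        z = w * (curve[t] - REF_PRICE) + b * (inventory - TARGET_INV)
        actions[t] = math.tanh(z)            # positive = withdraw and sell
        inventory -= actions[t]
        value += discounts[t] * curve[t] * actions[t]
    return value, actions, discounts

def adjoint_deltas(curve, w=0.1, b=0.2, rate=0.05):
    """Value and dV/dF_t for every day from a single reverse sweep."""
    value, actions, discounts = forward_sweep(curve, w, b, rate)
    deltas = [0.0] * len(curve)
    inv_bar = 0.0                            # adjoint of next-day inventory
    for t in reversed(range(len(curve))):
        # The action feeds today's cashflow and lowers next-day inventory.
        act_bar = discounts[t] * curve[t] - inv_bar
        z_bar = act_bar * (1.0 - actions[t] ** 2)
        # The price enters the cashflow directly and the policy input z.
        deltas[t] = discounts[t] * actions[t] + z_bar * w
        # Today's inventory feeds next-day inventory and the policy input z.
        inv_bar = inv_bar + z_bar * b
    return value, deltas

def bump_deltas(curve, w=0.1, b=0.2, rate=0.05, bump=1e-6):
    """Reference answer: one full revaluation per curve point."""
    base = forward_sweep(curve, w, b, rate)[0]
    deltas = []
    for t in range(len(curve)):
        bumped = list(curve)
        bumped[t] += bump
        deltas.append((forward_sweep(bumped, w, b, rate)[0] - base) / bump)
    return deltas

curve = [3.0 + 0.5 * math.cos(2.0 * math.pi * t / 365.0) for t in range(365)]
value, deltas = adjoint_deltas(curve)        # 1 forward + 1 reverse sweep
bumped = bump_deltas(curve)                  # 1 + 365 forward sweeps
worst = max(abs(a - f) for a, f in zip(deltas, bumped))
print("value:", value)
print("largest adjoint vs bump difference:", worst)
```

Sensitivities to rates or to the policy parameters would come out of the same reverse sweep by propagating a few more adjoints; they are omitted here to keep the sketch short.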

To our knowledge, nobody has applied adjoint differentiation to gas storage policy optimisation before. Adjoint methods are well established for Greeks in rates and equity. Neural approaches to gas storage exist. But differentiating through the full commodity simulation (price model, policy, constraints, cashflows) in one pass? That appears to be new.

Paper forthcoming in Commodity Insights Digest. Co-authored with Dmitri Goloubentsev.

Implemented using AADC, a commercial adjoint AD compiler (matlogica.com).