Hamilton-Jacobi-Bellman (HJB) Solver User Guide
Introduction
The Hamilton-Jacobi-Bellman (HJB) solver is a powerful tool for finding optimal control strategies in insurance and risk management. Unlike traditional optimization methods that may find local optima or require specific problem structures, the HJB approach provides globally optimal solutions by solving a partial differential equation (PDE) that characterizes the value function and optimal policy simultaneously.
When to Use the HJB Solver
The HJB solver is particularly effective for:
Dynamic Optimization Problems: When decisions must adapt to changing states over time
Stochastic Control: Problems with uncertainty in state evolution
Path-Dependent Strategies: When optimal decisions depend on history
Multi-dimensional State Spaces: Problems with multiple interacting state variables
Non-linear Dynamics: Systems with complex, non-linear state evolution
Risk-Sensitive Control: When risk aversion affects optimal decisions
Advantages Over Alternative Methods
- Compared to Monte Carlo Optimization:
Provides deterministic, reproducible solutions
Faster convergence for smooth problems
Yields complete policy function, not just point estimates
- Compared to Dynamic Programming:
Handles continuous state and control spaces naturally
Scales better with time horizon
Provides smooth value functions and policies
- Compared to Model Predictive Control:
Solves for entire policy offline
No need for repeated optimization during execution
Guarantees global optimality under convexity assumptions
Mathematical Foundation
The HJB Equation
The HJB equation for a finite-horizon problem is:
\[
\frac{\partial V}{\partial t} + \max_{u}\left\{ L(x,u,t) + f(x,u,t)\,\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2(x,t)\,\frac{\partial^2 V}{\partial x^2} \right\} = 0
\]
with terminal condition \(V(x,T)\) given by the terminal reward, where:
\(V(x,t)\) is the value function
\(L(x,u,t)\) is the running cost/reward
\(f(x,u,t)\) is the drift (deterministic dynamics)
\(\sigma(x,t)\) is the diffusion coefficient (stochastic component)
\(u\) is the control variable
For infinite-horizon problems with discount rate \(\rho\), the time derivative drops out and the stationary equation is:
\[
\rho V(x) = \max_{u}\left\{ L(x,u) + f(x,u)\,V'(x) + \frac{1}{2}\sigma^2(x)\,V''(x) \right\}
\]
Solution Method
The solver uses policy iteration with finite difference discretization (a minimal sketch of the loop follows the steps below):
Initialize: Start with an initial guess for the value function and policy
Policy Evaluation: Fix the policy and solve the resulting linear PDE for the value function
Policy Improvement: Update the policy by maximizing the Hamiltonian at each state
Iterate: Repeat steps 2-3 until convergence
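As a rough illustration of this loop (not the solver's internal code), here is a minimal sketch in which evaluate_policy and improve_policy are hypothetical stand-ins for the linear PDE solve and the Hamiltonian maximization:
import numpy as np
def policy_iteration(V0, policy0, evaluate_policy, improve_policy,
                     max_iterations=500, tolerance=1e-6):
    """Sketch of policy iteration; the two callbacks are placeholders."""
    V, policy = V0, policy0
    for _ in range(max_iterations):
        V_new = evaluate_policy(policy)   # Step 2: solve the linear PDE for the fixed policy
        policy = improve_policy(V_new)    # Step 3: pointwise maximization of the Hamiltonian
        if np.max(np.abs(V_new - V)) < tolerance:  # Step 4: stop when the value stabilizes
            return V_new, policy
        V = V_new
    return V, policy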
- The finite difference scheme uses (sketched after this list):
Upwind differencing for first-order terms (ensures stability)
Central differencing for second-order terms
Implicit time-stepping for robustness
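To illustrate the stencils (again a sketch, not the solver's internals), 1-D versions might look like:
import numpy as np
def upwind_first_derivative(V, dx, drift):
    # One-sided difference aligned with the drift direction keeps the
    # discretization monotone, which is what ensures stability
    forward = np.empty_like(V)
    backward = np.empty_like(V)
    forward[:-1] = (V[1:] - V[:-1]) / dx
    forward[-1] = forward[-2]        # one-sided copy at the boundary
    backward[1:] = (V[1:] - V[:-1]) / dx
    backward[0] = backward[1]
    return np.where(drift > 0, forward, backward)
def central_second_derivative(V, dx):
    # Symmetric three-point stencil for the diffusion term
    d2V = np.zeros_like(V)
    d2V[1:-1] = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx**2
    return d2V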
Core Components
State Variables
State variables define the problem dimensions:
from ergodic_insurance.hjb_solver import StateVariable, BoundaryCondition
# Wealth state with logarithmic spacing
wealth = StateVariable(
name="wealth",
min_value=1e6,
max_value=1e8,
num_points=100,
log_scale=True, # Use log spacing for large ranges
boundary_lower=BoundaryCondition.ABSORBING,
boundary_upper=BoundaryCondition.REFLECTING
)
# Time state with linear spacing
time = StateVariable(
name="time",
min_value=0,
max_value=10,
num_points=50,
log_scale=False
)
- Best Practices:
Use logarithmic spacing for variables spanning orders of magnitude (see the comparison below)
Choose boundary conditions matching the economic interpretation
Balance grid resolution with computational cost (50-200 points per dimension)
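To see why log spacing helps (plain NumPy, independent of the solver):
import numpy as np
linear = np.linspace(1e6, 1e8, 5)  # uniform absolute steps: the low end gets almost no resolution
logged = np.logspace(6, 8, 5)      # uniform relative steps: 1e6, 3.16e6, 1e7, 3.16e7, 1e8
print(linear)
print(logged)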
Control Variables
Control variables represent decisions:
from ergodic_insurance.hjb_solver import ControlVariable
# Insurance limit control
insurance_limit = ControlVariable(
name="limit",
min_value=0,
max_value=5e7,
num_points=30, # Discretization for optimization
continuous=True
)
# Retention/deductible control
retention = ControlVariable(
name="retention",
min_value=1e5,
max_value=1e7,
num_points=30
)
- Optimization Tips:
Use 20-50 control points for smooth problems
Increase resolution near expected optimal values (e.g., with a two-pass refinement, sketched below)
Consider adaptive refinement for complex policies
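One simple two-pass refinement pattern; the second-pass bounds are illustrative, chosen after inspecting the coarse solution:
# Pass 1: coarse scan over the full control range
limit_coarse = ControlVariable("limit", 0, 5e7, num_points=20)
# ...solve, then inspect where the optimal limit concentrates (say 1e7-3e7)...
# Pass 2: finer grid over the identified region only
limit_fine = ControlVariable("limit", 1e7, 3e7, num_points=50)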
Utility Functions
The solver includes several built-in utility functions:
import numpy as np
from ergodic_insurance.hjb_solver import (
LogUtility,
PowerUtility,
ExpectedWealth,
create_custom_utility
)
# Logarithmic utility (Kelly criterion)
log_utility = LogUtility(wealth_floor=1e3)
# Power utility (CRRA)
power_utility = PowerUtility(
risk_aversion=2.0, # Higher = more risk averse
wealth_floor=1e3
)
# Risk-neutral (linear) utility
linear_utility = ExpectedWealth()
# Custom utility function
def exponential_eval(w):
alpha = 0.001
return 1 - np.exp(-alpha * w)
def exponential_deriv(w):
alpha = 0.001
return alpha * np.exp(-alpha * w)
custom_utility = create_custom_utility(
evaluate_func=exponential_eval,
derivative_func=exponential_deriv
)
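A quick sanity check of the utilities above; this assumes the object returned by create_custom_utility exposes the same evaluate() method as the built-ins:
w = np.linspace(1e4, 1e7, 5)
print(log_utility.evaluate(w))    # concave: flattens as wealth grows
print(power_utility.evaluate(w))  # curvature increases with risk_aversion
print(custom_utility.evaluate(w)) # assumption: wrapper exposes evaluate()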
Complete Example: Insurance Optimization
Problem Setup
Consider a manufacturing firm optimizing its insurance program to maximize long-term growth:
import numpy as np
from ergodic_insurance.hjb_solver import (
StateVariable, StateSpace,
ControlVariable,
HJBProblem, HJBSolver, HJBSolverConfig,
LogUtility, BoundaryCondition, TimeSteppingScheme
)
# Define state space
state_vars = [
StateVariable(
name="assets",
min_value=1e6,
max_value=1e9,
num_points=80,
log_scale=True,
boundary_lower=BoundaryCondition.ABSORBING, # Bankruptcy
boundary_upper=BoundaryCondition.NEUMANN # Zero-gradient at the upper edge
),
StateVariable(
name="loss_rate",
min_value=0,
max_value=0.2,
num_points=40,
log_scale=False
)
]
state_space = StateSpace(state_vars)
# Define controls
controls = [
ControlVariable("coverage_limit", 0, 5e7, num_points=25),
ControlVariable("deductible", 1e4, 1e7, num_points=25)
]
# Use log utility for growth optimization
utility = LogUtility(wealth_floor=1e4)
Dynamics and Costs
Define the system dynamics and running costs:
def dynamics(state, control, time):
"""Asset dynamics with insurance."""
assets = state[..., 0]
loss_rate = state[..., 1]
limit = control[..., 0]
deductible = control[..., 1]
# Growth rate reduced by insurance premium
base_growth = 0.08
# Premium rate; zero when no limit is purchased (avoids division by zero)
premium_rate = np.where(
    limit > 0,
    0.02 * (limit / 1e7) * (1 - deductible / np.maximum(limit, 1e-8)),
    0.0,
)
# Expected retained losses
expected_loss = assets * loss_rate * np.minimum(1.0, deductible / assets)
# Asset drift
asset_drift = assets * (base_growth - premium_rate) - expected_loss
# Loss rate mean reversion
loss_drift = 0.1 * (0.05 - loss_rate)
return np.stack([asset_drift, loss_drift], axis=-1)
def running_cost(state, control, time):
"""Utility flow from operations."""
assets = state[..., 0]
return utility.evaluate(assets)
def terminal_value(state):
"""Terminal wealth utility."""
assets = state[..., 0]
return utility.evaluate(assets)
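Before solving, it is worth confirming that the callbacks broadcast over batches of grid points, since the solver evaluates them on whole arrays; a minimal check (assuming the utility's evaluate() is vectorized):
s = np.array([[5e7, 0.05], [1e8, 0.10]])  # two sample states
c = np.array([[2e7, 5e5], [3e7, 1e6]])    # two sample controls
assert dynamics(s, c, 0.0).shape == (2, 2)    # one drift per state variable
assert running_cost(s, c, 0.0).shape == (2,)  # one utility flow per state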
Solving the HJB Equation
Create and solve the HJB problem:
# Create HJB problem
problem = HJBProblem(
state_space=state_space,
control_variables=controls,
utility_function=utility,
dynamics=dynamics,
running_cost=running_cost,
terminal_value=terminal_value,
discount_rate=0.05,
time_horizon=20 # 20-year horizon
)
# Configure solver
config = HJBSolverConfig(
time_step=0.01,
max_iterations=500,
tolerance=1e-6,
scheme=TimeSteppingScheme.IMPLICIT,
use_sparse=True,
verbose=True
)
# Solve
solver = HJBSolver(problem, config)
value_function, optimal_policy = solver.solve()
# Check solution quality
metrics = solver.compute_convergence_metrics()
print(f"Max residual: {metrics['max_residual']:.2e}")
print(f"Policy range - Limit: {metrics['policy_stats']['coverage_limit']}")
print(f"Policy range - Deductible: {metrics['policy_stats']['deductible']}")
Using the Solution
Extract and apply the optimal policy:
# Get optimal control at specific state
current_state = np.array([5e7, 0.06]) # $50M assets, 6% loss rate
optimal_control = solver.extract_feedback_control(current_state)
print(f"Optimal coverage limit: ${optimal_control['coverage_limit']:,.0f}")
print(f"Optimal deductible: ${optimal_control['deductible']:,.0f}")
# Visualize the policy
import matplotlib.pyplot as plt
# Plot optimal limit as function of assets (fixed loss rate)
assets_range = np.logspace(6, 9, 100)  # log-spaced points match the log x-axis below
loss_rate_fixed = 0.05
optimal_limits = []
for assets in assets_range:
state = np.array([assets, loss_rate_fixed])
control = solver.extract_feedback_control(state)
optimal_limits.append(control['coverage_limit'])
plt.figure(figsize=(10, 6))
plt.semilogx(assets_range, optimal_limits)
plt.xlabel('Assets ($)')
plt.ylabel('Optimal Coverage Limit ($)')
plt.title('State-Dependent Insurance Strategy')
plt.grid(True)
plt.show()
Advanced Topics
Multi-Layer Insurance
For complex insurance programs with multiple layers:
# Define controls for each layer
controls = []
for i in range(3): # 3-layer program
controls.extend([
ControlVariable(f"limit_L{i}", 1e6 * (i+1), 1e7 * (i+1), 20),
ControlVariable(f"attachment_L{i}", 1e5 * (i+1), 1e6 * (i+1), 20),
ControlVariable(f"coinsurance_L{i}", 0.8, 1.0, 10)
])
# Dynamics account for all layers
def multi_layer_dynamics(state, control, time):
assets = state[..., 0]
total_premium = 0
for i in range(3):
limit = control[..., i*3]
attachment = control[..., i*3 + 1]
coinsurance = control[..., i*3 + 2]
# Layer-specific premium
layer_premium = compute_layer_premium(limit, attachment, coinsurance)  # helper sketched below
total_premium += layer_premium
# Continue with dynamics...
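The snippet above assumes a pricing helper; one hypothetical form is a simple rate-on-line heuristic (not the library's pricing model):
def compute_layer_premium(limit, attachment, coinsurance,
                          base_rate=0.015, decay=5e-8):
    # Rate on line decays with attachment height: excess layers are cheaper
    rate_on_line = base_rate * np.exp(-decay * attachment)
    return rate_on_line * limit * coinsurance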
Stochastic Volatility
Incorporate time-varying uncertainty:
# Add volatility as state variable
volatility = StateVariable(
name="volatility",
min_value=0.01,
max_value=0.5,
num_points=30,
log_scale=False
)
def stochastic_dynamics(state, control, time):
assets = state[..., 0]
vol = state[..., 1]
# Volatility affects growth uncertainty
growth_rate = 0.08
growth_std = vol * np.sqrt(assets)  # diffusion magnitude; feeds the second-order term, not the drift
# In HJB, we work with drift (deterministic part)
# Diffusion enters through second-order terms
drift_assets = assets * growth_rate
drift_vol = 0.2 * (0.15 - vol) # Mean reversion
return np.stack([drift_assets, drift_vol], axis=-1)
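If your HJBProblem accepts a separate diffusion callback alongside dynamics (an assumption; check the actual signature), it would mirror the drift function and supply the second-order coefficients:
def stochastic_diffusion(state, control, time):
    assets = state[..., 0]
    vol = state[..., 1]
    sigma_assets = vol * assets       # level-dependent asset volatility
    sigma_vol = 0.05 * np.sqrt(vol)   # CIR-style volatility-of-volatility
    return np.stack([sigma_assets, sigma_vol], axis=-1)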
Performance Optimization
Computational Efficiency
Grid Resolution: Start coarse, refine gradually
# Initial solve with coarse grid
wealth_coarse = StateVariable("wealth", 1e6, 1e8, num_points=30)
# Refine around optimal region
wealth_fine = StateVariable("wealth", 1e6, 1e8, num_points=100)
Sparse Matrices: Enable for large problems
config = HJBSolverConfig(use_sparse=True)
Parallel Control Optimization: Future enhancement
# Currently sequential, but structure allows parallelization
# Each state's optimization is independent
Memory Management
For very large state spaces:
# Use iterative methods instead of direct solvers
config = HJBSolverConfig(
scheme=TimeSteppingScheme.IMPLICIT,
use_sparse=True,
time_step=0.001 # Smaller time steps for stability
)
# Consider domain decomposition for 3+ dimensions
# Solve on subdomains and match at boundaries
Troubleshooting Guide
Common Issues and Solutions
- Problem: Solver doesn't converge
Increase max_iterations
Loosen tolerance (a stricter tolerance takes more iterations to reach)
Check boundary conditions match problem physics
Ensure dynamics and costs are smooth
- Problem: Numerical instabilities
Use implicit time-stepping
Reduce time step
Check for discontinuities in dynamics/costs
Increase grid resolution near discontinuities
- Problem: Unrealistic optimal policies
Verify utility function choice
Check control bounds are reasonable
Ensure dynamics correctly model the system
Validate cost/reward functions
- Problem: Out of memory
Reduce grid points
Enable sparse matrices
Use iterative solvers
Consider dimension reduction
Validation Techniques
Benchmark Against Known Solutions:
# Test with linear dynamics, quadratic cost (LQR)
# Compare with analytical Riccati solution
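For a concrete scalar benchmark (illustrative values, minimization form):
import numpy as np
# Scalar LQR: dx = (a*x + b*u) dt, running cost q*x**2 + r*u**2
a, b, q, r = -0.5, 1.0, 1.0, 0.1
# Positive root of the algebraic Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0
p = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
analytic_value = lambda x: p * x**2            # V(x) = p*x^2
analytic_policy = lambda x: -(b / r) * p * x   # u*(x) = -(b/r)*p*x
# Solve the same problem with HJBSolver and compare both on the grid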
Convergence Analysis:
resolutions = [20, 40, 80, 160]
solutions = []
for n in resolutions:
    state = StateVariable("x", 0, 1, num_points=n)
    # Solve on this grid and store the value at a fixed test point
    solutions.append(value_at_test_point)
# Treat the finest grid as reference and check how errors shrink
errors = [abs(s - solutions[-1]) for s in solutions[:-1]]
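Because each step doubles the resolution, successive error ratios give the observed order of convergence (assuming the errors above are nonzero):
import numpy as np
rates = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print("Observed convergence order per refinement:", rates)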
Policy Simulation:
# Forward simulate using optimal policy
# Check value function matches simulated rewards
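A minimal drift-only simulation reusing the dynamics and running_cost defined in the example above (Euler stepping, diffusion noise omitted for brevity):
dt, horizon, rho = 0.01, 20.0, 0.05
state = np.array([5e7, 0.06])
total = 0.0
for step in range(int(horizon / dt)):
    u = solver.extract_feedback_control(state)
    control = np.array([u['coverage_limit'], u['deductible']])
    total += np.exp(-rho * step * dt) * running_cost(state, control, step * dt) * dt
    state = state + dynamics(state, control, step * dt) * dt  # drift-only Euler step
print("Simulated discounted utility:", total)
# Compare against the value function interpolated at the initial state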
Integration with Simulation Framework
Using with OptimalController
Integrate HJB solutions with the simulation framework:
from ergodic_insurance.optimal_control import (
HJBFeedbackControl,
OptimalController,
ControlSpace
)
# After solving HJB
control_space = ControlSpace(
limits=[(1e6, 5e7)],
retentions=[(1e5, 1e7)],
coverage_percentages=[(0.8, 1.0)]
)
# Create feedback strategy
strategy = HJBFeedbackControl(
hjb_solver=solver,
control_space=control_space,
state_mapping=lambda state_dict: np.array([
state_dict['assets'],
state_dict['loss_rate']
])
)
# Create controller
controller = OptimalController(strategy, control_space)
# Use in simulation loop
for t in range(simulation_steps):
insurance = controller.apply_control(manufacturer, time=t)
# Apply insurance and simulate...
Real-Time Adaptation
For online learning and adaptation:
class AdaptiveHJBControl:
def __init__(self, base_solver):
self.solver = base_solver
self.observations = []
def update(self, state, outcome):
"""Update beliefs about system dynamics."""
self.observations.append((state, outcome))
if len(self.observations) % 100 == 0:
# Re-estimate parameters; estimate_dynamics() is user-implemented
# (e.g., fit drift parameters to the observed state transitions)
new_dynamics = self.estimate_dynamics()
# Re-solve HJB with updated model
self.solver.problem.dynamics = new_dynamics
self.solver.solve()
Best Practices Summary
Problem Formulation:
- Clearly define state and control spaces
- Choose utility function matching risk preferences
- Ensure dynamics are smooth and well-behaved
Numerical Setup:
- Start with coarse grids, refine gradually
- Use appropriate boundary conditions
- Choose time-stepping scheme based on stability needs
Validation:
- Test on problems with known solutions
- Verify convergence with grid refinement
- Simulate policies to check consistency
Performance:
- Profile to identify bottlenecks
- Use sparse matrices for large problems
- Consider approximation methods for high dimensions
Integration:
- Map between simulation and HJB state spaces carefully
- Handle edge cases in policy interpolation
- Monitor solution quality during deployment
Conclusion
The HJB solver provides a powerful framework for optimal control in insurance and risk management. Its ability to handle complex dynamics, multiple state variables, and risk-sensitive objectives makes it particularly valuable for:
Dynamic insurance optimization: Adapting coverage to changing risk profiles
Capital allocation: Balancing growth and risk in investment strategies
Operational decisions: Optimizing production and inventory under uncertainty
While computational requirements grow with dimensionality, careful problem formulation and numerical techniques enable practical solutions for real-world problems. The global optimality guarantees and complete policy characterization often justify the computational investment compared to heuristic or local methods.
For further examples and applications, see the Jupyter notebooks in ergodic_insurance/notebooks/, particularly:
- 12_hjb_optimal_control.ipynb: Complete examples with visualization
- 11_pareto_analysis.ipynb: Multi-objective optimization using HJB