================================================ Hamilton-Jacobi-Bellman (HJB) Solver User Guide ================================================ .. contents:: Table of Contents :depth: 3 :local: Introduction ============ The Hamilton-Jacobi-Bellman (HJB) solver is a powerful tool for finding optimal control strategies in insurance and risk management. Unlike traditional optimization methods that may find local optima or require specific problem structures, the HJB approach provides globally optimal solutions by solving a partial differential equation (PDE) that characterizes the value function and optimal policy simultaneously. When to Use the HJB Solver ========================== The HJB solver is particularly effective for: 1. **Dynamic Optimization Problems**: When decisions must adapt to changing states over time 2. **Stochastic Control**: Problems with uncertainty in state evolution 3. **Path-Dependent Strategies**: When optimal decisions depend on history 4. **Multi-dimensional State Spaces**: Problems with multiple interacting state variables 5. **Non-linear Dynamics**: Systems with complex, non-linear state evolution 6. **Risk-Sensitive Control**: When risk aversion affects optimal decisions Advantages Over Alternative Methods ------------------------------------ **Compared to Monte Carlo Optimization:** - Provides deterministic, reproducible solutions - Faster convergence for smooth problems - Yields complete policy function, not just point estimates **Compared to Dynamic Programming:** - Handles continuous state and control spaces naturally - Scales better with time horizon - Provides smooth value functions and policies **Compared to Model Predictive Control:** - Solves for entire policy offline - No need for repeated optimization during execution - Guarantees global optimality under convexity assumptions Mathematical Foundation ======================= The HJB Equation ---------------- The HJB equation for a finite-horizon problem is: .. math:: -\frac{\partial V}{\partial t}(x,t) + \max_u \left[ L(x,u,t) + \nabla V(x,t) \cdot f(x,u,t) + \frac{1}{2}\text{tr}(\sigma(x,t)\sigma(x,t)^T \nabla^2 V) \right] = 0 Where: - :math:`V(x,t)` is the value function - :math:`L(x,u,t)` is the running cost/reward - :math:`f(x,u,t)` is the drift (deterministic dynamics) - :math:`\sigma(x,t)` is the diffusion coefficient (stochastic component) - :math:`u` is the control variable For infinite-horizon problems with discount rate :math:`\rho`: .. math:: \rho V(x) = \max_u \left[ L(x,u) + \nabla V(x) \cdot f(x,u) + \frac{1}{2}\text{tr}(\sigma(x)\sigma(x)^T \nabla^2 V) \right] Solution Method --------------- The solver uses **policy iteration** with finite difference discretization: 1. **Initialize**: Start with an initial guess for the value function and policy 2. **Policy Evaluation**: Fix the policy and solve the resulting linear PDE for the value function 3. **Policy Improvement**: Update the policy by maximizing the Hamiltonian at each state 4. **Iterate**: Repeat steps 2-3 until convergence The finite difference scheme uses: - **Upwind differencing** for first-order terms (ensures stability) - **Central differencing** for second-order terms - **Implicit time-stepping** for robustness Core Components =============== State Variables --------------- State variables define the problem dimensions: .. code-block:: python from ergodic_insurance.hjb_solver import StateVariable, BoundaryCondition # Wealth state with logarithmic spacing wealth = StateVariable( name="wealth", min_value=1e6, max_value=1e8, num_points=100, log_scale=True, # Use log spacing for large ranges boundary_lower=BoundaryCondition.ABSORBING, boundary_upper=BoundaryCondition.REFLECTING ) # Time state with linear spacing time = StateVariable( name="time", min_value=0, max_value=10, num_points=50, log_scale=False ) **Best Practices:** - Use logarithmic spacing for variables spanning orders of magnitude - Choose boundary conditions matching the economic interpretation - Balance grid resolution with computational cost (50-200 points per dimension) Control Variables ----------------- Control variables represent decisions: .. code-block:: python from ergodic_insurance.hjb_solver import ControlVariable # Insurance limit control insurance_limit = ControlVariable( name="limit", min_value=0, max_value=5e7, num_points=30, # Discretization for optimization continuous=True ) # Retention/deductible control retention = ControlVariable( name="retention", min_value=1e5, max_value=1e7, num_points=30 ) **Optimization Tips:** - Use 20-50 control points for smooth problems - Increase resolution near expected optimal values - Consider adaptive refinement for complex policies Utility Functions ----------------- The solver includes several built-in utility functions: .. code-block:: python from ergodic_insurance.hjb_solver import ( LogUtility, PowerUtility, ExpectedWealth, create_custom_utility ) # Logarithmic utility (Kelly criterion) log_utility = LogUtility(wealth_floor=1e3) # Power utility (CRRA) power_utility = PowerUtility( risk_aversion=2.0, # Higher = more risk averse wealth_floor=1e3 ) # Risk-neutral (linear) utility linear_utility = ExpectedWealth() # Custom utility function def exponential_eval(w): alpha = 0.001 return 1 - np.exp(-alpha * w) def exponential_deriv(w): alpha = 0.001 return alpha * np.exp(-alpha * w) custom_utility = create_custom_utility( evaluate_func=exponential_eval, derivative_func=exponential_deriv ) Complete Example: Insurance Optimization ========================================= Problem Setup ------------- Consider a manufacturing firm optimizing its insurance program to maximize long-term growth: .. code-block:: python import numpy as np from ergodic_insurance.hjb_solver import ( StateVariable, StateSpace, ControlVariable, HJBProblem, HJBSolver, HJBSolverConfig, LogUtility, BoundaryCondition ) # Define state space state_vars = [ StateVariable( name="assets", min_value=1e6, max_value=1e9, num_points=80, log_scale=True, boundary_lower=BoundaryCondition.ABSORBING, # Bankruptcy boundary_upper=BoundaryCondition.NEUMANN # No constraint ), StateVariable( name="loss_rate", min_value=0, max_value=0.2, num_points=40, log_scale=False ) ] state_space = StateSpace(state_vars) # Define controls controls = [ ControlVariable("coverage_limit", 0, 5e7, num_points=25), ControlVariable("deductible", 1e4, 1e7, num_points=25) ] # Use log utility for growth optimization utility = LogUtility(wealth_floor=1e4) Dynamics and Costs ------------------ Define the system dynamics and running costs: .. code-block:: python def dynamics(state, control, time): """Asset dynamics with insurance.""" assets = state[..., 0] loss_rate = state[..., 1] limit = control[..., 0] deductible = control[..., 1] # Growth rate reduced by insurance premium base_growth = 0.08 premium_rate = 0.02 * (limit / 1e7) * (1 - deductible / limit) # Expected retained losses expected_loss = assets * loss_rate * np.minimum(1.0, deductible / assets) # Asset drift asset_drift = assets * (base_growth - premium_rate) - expected_loss # Loss rate mean reversion loss_drift = 0.1 * (0.05 - loss_rate) return np.stack([asset_drift, loss_drift], axis=-1) def running_cost(state, control, time): """Utility flow from operations.""" assets = state[..., 0] return utility.evaluate(assets) def terminal_value(state): """Terminal wealth utility.""" assets = state[..., 0] return utility.evaluate(assets) Solving the HJB Equation ------------------------ Create and solve the HJB problem: .. code-block:: python # Create HJB problem problem = HJBProblem( state_space=state_space, control_variables=controls, utility_function=utility, dynamics=dynamics, running_cost=running_cost, terminal_value=terminal_value, discount_rate=0.05, time_horizon=20 # 20-year horizon ) # Configure solver config = HJBSolverConfig( time_step=0.01, max_iterations=500, tolerance=1e-6, scheme=TimeSteppingScheme.IMPLICIT, use_sparse=True, verbose=True ) # Solve solver = HJBSolver(problem, config) value_function, optimal_policy = solver.solve() # Check solution quality metrics = solver.compute_convergence_metrics() print(f"Max residual: {metrics['max_residual']:.2e}") print(f"Policy range - Limit: {metrics['policy_stats']['coverage_limit']}") print(f"Policy range - Deductible: {metrics['policy_stats']['deductible']}") Using the Solution ------------------ Extract and apply the optimal policy: .. code-block:: python # Get optimal control at specific state current_state = np.array([5e7, 0.06]) # $50M assets, 6% loss rate optimal_control = solver.extract_feedback_control(current_state) print(f"Optimal coverage limit: ${optimal_control['coverage_limit']:,.0f}") print(f"Optimal deductible: ${optimal_control['deductible']:,.0f}") # Visualize the policy import matplotlib.pyplot as plt # Plot optimal limit as function of assets (fixed loss rate) assets_range = np.linspace(1e6, 1e9, 100) loss_rate_fixed = 0.05 optimal_limits = [] for assets in assets_range: state = np.array([assets, loss_rate_fixed]) control = solver.extract_feedback_control(state) optimal_limits.append(control['coverage_limit']) plt.figure(figsize=(10, 6)) plt.semilogx(assets_range, optimal_limits) plt.xlabel('Assets ($)') plt.ylabel('Optimal Coverage Limit ($)') plt.title('State-Dependent Insurance Strategy') plt.grid(True) plt.show() Advanced Topics =============== Multi-Layer Insurance --------------------- For complex insurance programs with multiple layers: .. code-block:: python # Define controls for each layer controls = [] for i in range(3): # 3-layer program controls.extend([ ControlVariable(f"limit_L{i}", 1e6 * (i+1), 1e7 * (i+1), 20), ControlVariable(f"attachment_L{i}", 1e5 * (i+1), 1e6 * (i+1), 20), ControlVariable(f"coinsurance_L{i}", 0.8, 1.0, 10) ]) # Dynamics account for all layers def multi_layer_dynamics(state, control, time): assets = state[..., 0] total_premium = 0 for i in range(3): limit = control[..., i*3] attachment = control[..., i*3 + 1] coinsurance = control[..., i*3 + 2] # Layer-specific premium layer_premium = compute_layer_premium(limit, attachment, coinsurance) total_premium += layer_premium # Continue with dynamics... Stochastic Volatility --------------------- Incorporate time-varying uncertainty: .. code-block:: python # Add volatility as state variable volatility = StateVariable( name="volatility", min_value=0.01, max_value=0.5, num_points=30, log_scale=False ) def stochastic_dynamics(state, control, time): assets = state[..., 0] vol = state[..., 1] # Volatility affects growth uncertainty growth_rate = 0.08 growth_std = vol * np.sqrt(assets) # In HJB, we work with drift (deterministic part) # Diffusion enters through second-order terms drift_assets = assets * growth_rate drift_vol = 0.2 * (0.15 - vol) # Mean reversion return np.stack([drift_assets, drift_vol], axis=-1) Performance Optimization ======================== Computational Efficiency ------------------------ 1. **Grid Resolution**: Start coarse, refine gradually .. code-block:: python # Initial solve with coarse grid wealth_coarse = StateVariable("wealth", 1e6, 1e8, num_points=30) # Refine around optimal region wealth_fine = StateVariable("wealth", 1e6, 1e8, num_points=100) 2. **Sparse Matrices**: Enable for large problems .. code-block:: python config = HJBSolverConfig(use_sparse=True) 3. **Parallel Control Optimization**: Future enhancement .. code-block:: python # Currently sequential, but structure allows parallelization # Each state's optimization is independent Memory Management ----------------- For very large state spaces: .. code-block:: python # Use iterative methods instead of direct solvers config = HJBSolverConfig( scheme=TimeSteppingScheme.IMPLICIT, use_sparse=True, time_step=0.001 # Smaller time steps for stability ) # Consider domain decomposition for 3+ dimensions # Solve on subdomains and match at boundaries Troubleshooting Guide ===================== Common Issues and Solutions --------------------------- **Problem: Solver doesn't converge** - Increase ``max_iterations`` - Reduce ``tolerance`` - Check boundary conditions match problem physics - Ensure dynamics and costs are smooth **Problem: Numerical instabilities** - Use implicit time-stepping - Reduce time step - Check for discontinuities in dynamics/costs - Increase grid resolution near discontinuities **Problem: Unrealistic optimal policies** - Verify utility function choice - Check control bounds are reasonable - Ensure dynamics correctly model the system - Validate cost/reward functions **Problem: Out of memory** - Reduce grid points - Enable sparse matrices - Use iterative solvers - Consider dimension reduction Validation Techniques --------------------- 1. **Benchmark Against Known Solutions**: .. code-block:: python # Test with linear dynamics, quadratic cost (LQR) # Compare with analytical Riccati solution 2. **Convergence Analysis**: .. code-block:: python resolutions = [20, 40, 80, 160] solutions = [] for n in resolutions: state = StateVariable("x", 0, 1, num_points=n) # Solve and store value at test point solutions.append(value_at_test_point) # Check convergence rate errors = [abs(s - solutions[-1]) for s in solutions[:-1]] 3. **Policy Simulation**: .. code-block:: python # Forward simulate using optimal policy # Check value function matches simulated rewards Integration with Simulation Framework ===================================== Using with OptimalController ----------------------------- Integrate HJB solutions with the simulation framework: .. code-block:: python from ergodic_insurance.optimal_control import ( HJBFeedbackControl, OptimalController, ControlSpace ) # After solving HJB control_space = ControlSpace( limits=[(1e6, 5e7)], retentions=[(1e5, 1e7)], coverage_percentages=[(0.8, 1.0)] ) # Create feedback strategy strategy = HJBFeedbackControl( hjb_solver=solver, control_space=control_space, state_mapping=lambda state_dict: np.array([ state_dict['assets'], state_dict['loss_rate'] ]) ) # Create controller controller = OptimalController(strategy, control_space) # Use in simulation loop for t in range(simulation_steps): insurance = controller.apply_control(manufacturer, time=t) # Apply insurance and simulate... Real-Time Adaptation -------------------- For online learning and adaptation: .. code-block:: python class AdaptiveHJBControl: def __init__(self, base_solver): self.solver = base_solver self.observations = [] def update(self, state, outcome): """Update beliefs about system dynamics.""" self.observations.append((state, outcome)) if len(self.observations) % 100 == 0: # Re-estimate parameters new_dynamics = self.estimate_dynamics() # Re-solve HJB with updated model self.solver.problem.dynamics = new_dynamics self.solver.solve() Best Practices Summary ====================== 1. **Problem Formulation**: - Clearly define state and control spaces - Choose utility function matching risk preferences - Ensure dynamics are smooth and well-behaved 2. **Numerical Setup**: - Start with coarse grids, refine gradually - Use appropriate boundary conditions - Choose time-stepping scheme based on stability needs 3. **Validation**: - Test on problems with known solutions - Verify convergence with grid refinement - Simulate policies to check consistency 4. **Performance**: - Profile to identify bottlenecks - Use sparse matrices for large problems - Consider approximation methods for high dimensions 5. **Integration**: - Map between simulation and HJB state spaces carefully - Handle edge cases in policy interpolation - Monitor solution quality during deployment Conclusion ========== The HJB solver provides a powerful framework for optimal control in insurance and risk management. Its ability to handle complex dynamics, multiple state variables, and risk-sensitive objectives makes it particularly valuable for: - **Dynamic insurance optimization**: Adapting coverage to changing risk profiles - **Capital allocation**: Balancing growth and risk in investment strategies - **Operational decisions**: Optimizing production and inventory under uncertainty While computational requirements grow with dimensionality, careful problem formulation and numerical techniques enable practical solutions for real-world problems. The global optimality guarantees and complete policy characterization often justify the computational investment compared to heuristic or local methods. For further examples and applications, see the Jupyter notebooks in ``ergodic_insurance/notebooks/``, particularly: - ``12_hjb_optimal_control.ipynb``: Complete examples with visualization - ``11_pareto_analysis.ipynb``: Multi-objective optimization using HJB References ========== 1. Fleming, W.H., & Soner, H.M. (2006). *Controlled Markov Processes and Viscosity Solutions*. Springer. 2. Kushner, H.J., & Dupuis, P. (2001). *Numerical Methods for Stochastic Control Problems in Continuous Time*. Springer. 3. Bertsekas, D.P. (2017). *Dynamic Programming and Optimal Control*. Athena Scientific. 4. Pham, H. (2009). *Continuous-time Stochastic Control and Optimization with Financial Applications*. Springer. 5. Touzi, N. (2012). *Optimal Stochastic Control, Stochastic Target Problems, and Backward SDE*. Springer.