Data Models and Analysis Classes

This document shows the data structures and analysis models used throughout the system. The diagrams are split into focused sections for readability.

Ergodic Analysis

The ergodic analysis subsystem implements Ole Peters’ ergodic economics framework, comparing time-average versus ensemble-average growth rates to demonstrate how insurance transforms business growth dynamics.

    classDiagram
    class ErgodicAnalyzer {
        -convergence_threshold: float
        +calculate_time_average_growth(trajectories) dict
        +calculate_ensemble_average(trajectories) dict
        +compare_scenarios(insured, uninsured, metric) dict
        +check_convergence(values, window_size) tuple
        +analyze_simulation_batch(results, label) dict
        +integrate_loss_ergodic_analysis(loss_data, insurance, manufacturer) ErgodicAnalysisResults
        +validate_insurance_ergodic_impact(...) ValidationResults
        +significance_test(insured_growth, uninsured_growth) dict
    }

    class ErgodicData {
        <<dataclass>>
        +time_series: ndarray
        +values: ndarray
        +metadata: dict
        +validate() bool
    }

    class ErgodicAnalysisResults {
        <<dataclass>>
        +time_average_growth: float
        +ensemble_average_growth: float
        +survival_rate: float
        +ergodic_divergence: float
        +insurance_impact: dict
        +validation_passed: bool
        +metadata: dict
    }

    class ValidationResults {
        <<dataclass>>
        +is_valid: bool
        +checks: dict
        +warnings: list
        +summary: str
    }

    ErgodicAnalyzer --> ErgodicData : accepts
    ErgodicAnalyzer --> ErgodicAnalysisResults : produces
    ErgodicAnalyzer --> ValidationResults : validates with
    ErgodicAnalysisResults --> ErgodicData : derived from

ErgodicAnalyzer is the core analysis engine. It accepts trajectories as ErgodicData or SimulationResults, calculates time-average and ensemble-average growth rates, performs convergence checks, and runs integrated loss-ergodic analysis. The compare_scenarios() method is the primary entry point for comparing insured versus uninsured outcomes.

ErgodicData is a lightweight dataclass holding time series arrays and metadata. It validates array length consistency before analysis.

ErgodicAnalysisResults captures the complete output of an integrated analysis, including growth rates, survival statistics, insurance impact metrics, and validation status.
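
The distinction between the two averages is easiest to see in code. Below is a minimal sketch of the underlying calculations, assuming the conventional log-growth definitions; the function names are illustrative, not the library API.

    import numpy as np

    def time_average_growth(path: np.ndarray, dt: float = 1.0) -> float:
        # Growth rate experienced along a single trajectory: g = ln(x_T / x_0) / T
        horizon = dt * (len(path) - 1)
        return float(np.log(path[-1] / path[0]) / horizon)

    def ensemble_average_growth(paths: np.ndarray, dt: float = 1.0) -> float:
        # Growth rate of the cross-sectional mean of an (n_paths, n_steps) array
        return time_average_growth(paths.mean(axis=0), dt)

For multiplicative dynamics the two numbers generally differ, and the ergodic_divergence field above presumably records that gap.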

Business Optimization

The optimization subsystem uses ergodic metrics to find insurance strategies that maximize real business outcomes such as ROE, growth rate, and survival probability.

    classDiagram
    class BusinessOptimizer {
        -manufacturer: WidgetManufacturer
        -loss_distribution: LossDistribution
        -decision_engine: InsuranceDecisionEngine
        -ergodic_analyzer: ErgodicAnalyzer
        -optimizer_config: BusinessOptimizerConfig
        +maximize_roe_with_insurance(constraints, time_horizon) OptimalStrategy
        +minimize_bankruptcy_risk(growth_targets, budget) OptimalStrategy
        +optimize_capital_efficiency(constraints) OptimalStrategy
        +optimize_business_outcomes(objectives, constraints) BusinessOptimizationResult
    }

    class OptimalStrategy {
        <<dataclass>>
        +coverage_limit: float
        +deductible: float
        +premium_rate: float
        +expected_roe: float
        +bankruptcy_risk: float
        +growth_rate: float
        +capital_efficiency: float
        +recommendations: list~str~
        +to_dict() dict
    }

    class BusinessObjective {
        <<dataclass>>
        +name: str
        +weight: float
        +target_value: float
        +optimization_direction: OptimizationDirection
        +constraint_type: str
        +constraint_value: float
    }

    class BusinessConstraints {
        <<dataclass>>
        +max_risk_tolerance: float
        +min_roe_threshold: float
        +max_leverage_ratio: float
        +min_liquidity_ratio: float
        +max_premium_budget: float
        +min_coverage_ratio: float
        +regulatory_requirements: dict
    }

    class BusinessOptimizationResult {
        <<dataclass>>
        +optimal_strategy: OptimalStrategy
        +objective_values: dict
        +constraint_satisfaction: dict
        +convergence_info: dict
        +sensitivity_analysis: dict
        +is_feasible() bool
    }

    BusinessOptimizer --> OptimalStrategy : finds
    BusinessOptimizer --> BusinessObjective : uses
    BusinessOptimizer --> BusinessConstraints : respects
    BusinessOptimizer --> BusinessOptimizationResult : produces
    BusinessOptimizationResult --> OptimalStrategy : contains

BusinessOptimizer provides multiple optimization methods: maximize_roe_with_insurance() for ROE-focused optimization, minimize_bankruptcy_risk() for safety-first strategies, optimize_capital_efficiency() for capital allocation, and optimize_business_outcomes() for multi-objective optimization using BusinessObjective definitions.

OptimalStrategy is the output dataclass capturing the recommended insurance parameters (coverage limit, deductible, premium rate) along with expected business outcomes and actionable recommendations.
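
A hypothetical invocation, using only the names shown in the diagram; the constructor signature, default values, and the pre-built manufacturer, loss_distribution, and decision_engine objects are all assumptions:

    constraints = BusinessConstraints(
        max_risk_tolerance=0.01,        # tolerate at most a 1% ruin probability
        min_roe_threshold=0.10,
        max_leverage_ratio=2.0,
        min_liquidity_ratio=0.2,
        max_premium_budget=500_000.0,
        min_coverage_ratio=1.0,
        regulatory_requirements={},
    )
    optimizer = BusinessOptimizer(
        manufacturer=manufacturer,
        loss_distribution=loss_distribution,
        decision_engine=decision_engine,
        ergodic_analyzer=ErgodicAnalyzer(),
    )
    strategy = optimizer.maximize_roe_with_insurance(constraints, time_horizon=20)
    print(strategy.to_dict())           # coverage limit, deductible, expected ROE, ...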

Risk Analysis

Risk metrics and ruin probability analysis provide the quantitative foundation for evaluating tail risk and insurance value.

    classDiagram
    class RiskMetrics {
        -losses: ndarray
        -weights: ndarray
        -rng: Generator
        +var(confidence, method, bootstrap_ci) float
        +tvar(confidence) float
        +expected_shortfall(confidence) float
        +pml(return_period) float
        +maximum_drawdown() float
        +economic_capital(confidence) float
        +tail_index(threshold) float
        +risk_adjusted_metrics() dict
        +coherence_test() dict
        +summary_statistics() dict
        +plot_distribution()
    }

    class RiskMetricsResult {
        <<dataclass>>
        +metric_name: str
        +value: float
        +confidence_level: float
        +confidence_interval: tuple
        +metadata: dict
    }

    class RuinProbabilityAnalyzer {
        -manufacturer: WidgetManufacturer
        -loss_generator: ManufacturingLossGenerator
        -insurance_program: InsuranceProgram
        -config: SimulationConfig
        +analyze_ruin_probability(config) RuinProbabilityResults
    }

    class RuinProbabilityResults {
        <<dataclass>>
        +time_horizons: ndarray
        +ruin_probabilities: ndarray
        +confidence_intervals: ndarray
        +bankruptcy_causes: dict
        +survival_curves: ndarray
        +execution_time: float
        +n_simulations: int
        +convergence_achieved: bool
        +mid_year_ruin_count: int
        +ruin_month_distribution: dict
        +summary() str
    }

    class RuinProbabilityConfig {
        <<dataclass>>
        +time_horizons: list~int~
        +n_simulations: int
        +min_assets_threshold: float
        +min_equity_threshold: float
        +early_stopping: bool
        +parallel: bool
        +n_workers: int
        +seed: int
        +n_bootstrap: int
    }

    RiskMetrics --> RiskMetricsResult : returns
    RuinProbabilityAnalyzer --> RuinProbabilityResults : produces
    RuinProbabilityAnalyzer --> RuinProbabilityConfig : configured by

RiskMetrics is initialized with a loss array and provides VaR, TVaR (equivalently CVaR/expected shortfall), PML, maximum drawdown, and other tail-risk measures. It supports both empirical and parametric estimation, with optional bootstrap confidence intervals.
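
For intuition, the empirical versions of the two headline measures reduce to a few lines of numpy. This is a sketch of the standard definitions, not the class's implementation, which additionally handles weights, parametric fits, and bootstrap intervals:

    import numpy as np

    def empirical_var(losses: np.ndarray, confidence: float = 0.99) -> float:
        # VaR: the confidence-level quantile of the loss distribution
        return float(np.quantile(losses, confidence))

    def empirical_tvar(losses: np.ndarray, confidence: float = 0.99) -> float:
        # TVaR / CVaR: mean loss conditional on exceeding VaR
        var = empirical_var(losses, confidence)
        return float(losses[losses >= var].mean())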

RuinProbabilityAnalyzer runs Monte Carlo ruin analysis across multiple time horizons, with support for parallel execution, bootstrap confidence intervals, and mid-year ruin tracking.
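
A hypothetical run, assuming analyzer is an already-constructed RuinProbabilityAnalyzer and that config fields left out fall back to defaults:

    config = RuinProbabilityConfig(
        time_horizons=[1, 5, 10, 20],   # evaluate ruin probability at each horizon
        n_simulations=10_000,
        early_stopping=True,
        parallel=True,
        n_workers=4,
        seed=42,
    )
    results = analyzer.analyze_ruin_probability(config)
    print(results.summary())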

Convergence Diagnostics

Convergence analysis ensures Monte Carlo simulations have run long enough to produce reliable results.

    classDiagram
    class ConvergenceDiagnostics {
        -r_hat_threshold: float
        -min_ess: int
        -relative_mcse_threshold: float
        +calculate_r_hat(chains) float
        +calculate_ess(chain, max_lag) float
        +calculate_batch_ess(chains, method) float
        +calculate_ess_per_second(chain, time) float
        +calculate_mcse(chain, ess) float
        +check_convergence(chains, metric_names) dict
        +geweke_test(chain) tuple
        +heidelberger_welch_test(chain, alpha) dict
    }

    class ConvergenceStats {
        <<dataclass>>
        +r_hat: float
        +ess: float
        +mcse: float
        +converged: bool
        +n_iterations: int
        +autocorrelation: float
    }

    ConvergenceDiagnostics --> ConvergenceStats : produces

ConvergenceDiagnostics implements the Gelman-Rubin R-hat statistic, effective sample size (ESS), Monte Carlo standard error (MCSE), the Geweke test, and the Heidelberger-Welch stationarity test. The check_convergence() method returns a dict with a ConvergenceStats entry for each tracked metric.
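
As a reference point, the classic (non-split) Gelman-Rubin statistic over an (n_chains, n_draws) array can be written as follows; the class may implement a split-chain or rank-normalized variant, so treat this as the textbook formulation only:

    import numpy as np

    def gelman_rubin_r_hat(chains: np.ndarray) -> float:
        # chains has shape (n_chains, n_draws)
        n = chains.shape[1]
        within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
        between = n * chains.mean(axis=1).var(ddof=1)    # B: between-chain variance
        var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
        return float(np.sqrt(var_hat / within))          # -> 1.0 as chains converge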

Loss Modeling

The loss modeling subsystem uses a composite pattern to combine attritional, large, and catastrophic loss generators into a unified manufacturing risk model.

    classDiagram
    class LossDistribution {
        <<abstract>>
        #rng: Generator
        +generate_severity(n_samples)* ndarray
        +expected_value()* float
        +reset_seed(seed) void
    }

    class LognormalLoss {
        +mean: float
        +cv: float
        +mu: float
        +sigma: float
        +generate_severity(n_samples) ndarray
        +expected_value() float
    }

    class ParetoLoss {
        +alpha: float
        +xm: float
        +generate_severity(n_samples) ndarray
        +expected_value() float
    }

    class GeneralizedParetoLoss {
        +severity_shape: float
        +severity_scale: float
        +generate_severity(n_samples) ndarray
        +expected_value() float
    }

    class LossEvent {
        <<dataclass>>
        +amount: float
        +time: float
        +loss_type: str
        +description: str
    }

    class LossData {
        <<dataclass>>
        +timestamps: ndarray
        +loss_amounts: ndarray
        +loss_types: list~str~
        +claim_ids: list~str~
        +development_factors: ndarray
        +metadata: dict
        +validate() bool
        +to_ergodic_format() ErgodicData
        +apply_insurance(program) LossData
        +from_loss_events(events)$ LossData
        +to_loss_events() list~LossEvent~
        +get_annual_aggregates(years) dict
        +calculate_statistics() dict
    }

    LossDistribution <|-- LognormalLoss
    LossDistribution <|-- ParetoLoss
    LossDistribution <|-- GeneralizedParetoLoss
    LossData --> LossEvent : converts to/from

LossDistribution is the abstract base class defining the interface for severity distributions. The three concrete implementations (Lognormal, Pareto, Generalized Pareto) cover the full spectrum from attritional to extreme tail modeling.

LossEvent is a lightweight dataclass representing a single loss occurrence with timing, amount, and type classification. LossData is the unified data container for cross-module compatibility, providing conversion to ergodic format and insurance application methods.
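
To make the interface concrete, here is a stripped-down sketch of the abstract base and the lognormal implementation. The (mu, sigma) recovery from (mean, cv) is standard moment matching and is assumed, not confirmed, to match the library's approach:

    from abc import ABC, abstractmethod
    import numpy as np

    class LossDistribution(ABC):
        def __init__(self, seed: int | None = None):
            self._rng = np.random.default_rng(seed)

        @abstractmethod
        def generate_severity(self, n_samples: int) -> np.ndarray: ...

        @abstractmethod
        def expected_value(self) -> float: ...

    class LognormalLoss(LossDistribution):
        def __init__(self, mean: float, cv: float, seed: int | None = None):
            super().__init__(seed)
            self.mean, self.cv = mean, cv
            self.sigma = float(np.sqrt(np.log(1.0 + cv**2)))   # moment matching
            self.mu = float(np.log(mean) - 0.5 * self.sigma**2)

        def generate_severity(self, n_samples: int) -> np.ndarray:
            return self._rng.lognormal(self.mu, self.sigma, n_samples)

        def expected_value(self) -> float:
            return self.mean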

Loss Generation (Composite Pattern)

The manufacturing loss generator uses the Composite pattern to combine multiple loss layer generators, each with independent frequency and severity models.

    classDiagram
    class ManufacturingLossGenerator {
        +attritional: AttritionalLossGenerator
        +large: LargeLossGenerator
        +catastrophic: CatastrophicLossGenerator
        +gpd_generator: GeneralizedParetoLoss
        +threshold_value: float
        +exposure: ExposureBase
        +generate_losses(duration, revenue) tuple
        +reseed(seed) void
        +create_simple(frequency, severity_mean, severity_std, seed)$ ManufacturingLossGenerator
        +validate_distributions(n_simulations) dict
    }

    class AttritionalLossGenerator {
        +frequency_generator: FrequencyGenerator
        +severity_distribution: LognormalLoss
        +loss_type: str
        +generate_losses(duration, revenue) list~LossEvent~
        +reseed(seed) void
    }

    class LargeLossGenerator {
        +frequency_generator: FrequencyGenerator
        +severity_distribution: LognormalLoss
        +loss_type: str
        +generate_losses(duration, revenue) list~LossEvent~
        +reseed(seed) void
    }

    class CatastrophicLossGenerator {
        +frequency_generator: FrequencyGenerator
        +severity_distribution: ParetoLoss
        +loss_type: str
        +generate_losses(duration, revenue) list~LossEvent~
        +reseed(seed) void
    }

    class FrequencyGenerator {
        +base_frequency: float
        +revenue_scaling_exponent: float
        +reference_revenue: float
        -rng: Generator
        +reseed(seed) void
        +get_scaled_frequency(revenue) float
        +generate_event_times(duration, revenue) ndarray
    }

    ManufacturingLossGenerator *-- AttritionalLossGenerator : composes
    ManufacturingLossGenerator *-- LargeLossGenerator : composes
    ManufacturingLossGenerator *-- CatastrophicLossGenerator : composes
    ManufacturingLossGenerator o-- GeneralizedParetoLoss : optional extreme
    AttritionalLossGenerator --> FrequencyGenerator : uses
    LargeLossGenerator --> FrequencyGenerator : uses
    CatastrophicLossGenerator --> FrequencyGenerator : uses
    AttritionalLossGenerator --> LognormalLoss : severity
    LargeLossGenerator --> LognormalLoss : severity
    CatastrophicLossGenerator --> ParetoLoss : severity

ManufacturingLossGenerator is the composite orchestrator that combines three loss layers (attritional, large, catastrophic) with optional GPD extreme value transformation. The create_simple() class method provides a migration-friendly factory for basic use cases. Each sub-generator pairs a FrequencyGenerator (Poisson process with revenue scaling) with a LossDistribution for severities.
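
In outline, the composite step merges the layers and sorts chronologically. This is a sketch of the orchestration only; it assumes each sub-generator returns a list of LossEvent objects, and the statistics dict is illustrative:

    def generate_losses(self, duration: float, revenue: float) -> tuple:
        events = []
        for generator in (self.attritional, self.large, self.catastrophic):
            events.extend(generator.generate_losses(duration, revenue))
        events.sort(key=lambda event: event.time)      # merge layers chronologically
        stats = {
            "n_events": len(events),
            "total_amount": sum(event.amount for event in events),
        }
        return events, stats

The revenue scaling implied by FrequencyGenerator's fields is presumably base_frequency * (revenue / reference_revenue) ** revenue_scaling_exponent, applied before sampling Poisson event times.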

Sensitivity Analysis

Sensitivity tools analyze how parameter changes affect optimization outcomes, with built-in caching for computational efficiency.

    classDiagram
    class SensitivityAnalyzer {
        -base_config: dict
        -optimizer: Any
        -results_cache: dict
        -cache_dir: Path
        +analyze_parameter(param_name, param_range, n_points) SensitivityResult
        +create_tornado_diagram(parameters, metric) dict
        +analyze_parameter_group(params, metric) dict
    }

    class SensitivityResult {
        <<dataclass>>
        +parameter: str
        +baseline_value: float
        +variations: ndarray
        +metrics: dict
        +parameter_path: str
        +units: str
        +calculate_impact(metric) float
        +get_metric_bounds(metric) tuple
        +to_dataframe() DataFrame
    }

    class TwoWaySensitivityResult {
        <<dataclass>>
        +parameter1: str
        +parameter2: str
        +values1: ndarray
        +values2: ndarray
        +metric_grid: ndarray
        +metric_name: str
        +find_optimal_region(target, tolerance) ndarray
        +to_dataframe() DataFrame
    }

    SensitivityAnalyzer --> SensitivityResult : produces
    SensitivityAnalyzer --> TwoWaySensitivityResult : produces

SensitivityAnalyzer provides one-way parameter analysis, tornado diagram generation, and parameter group analysis. It uses MD5-based caching to avoid redundant optimizer runs. Results are captured as SensitivityResult (one-way) or TwoWaySensitivityResult (two-way interaction) dataclasses with built-in DataFrame conversion.
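
The caching idea reduces to hashing everything that determines a run into a stable key. A sketch, with the key layout and serialization as assumptions:

    import hashlib
    import json

    def cache_key(param_name: str, param_range, n_points: int, base_config: dict) -> str:
        # Serialize the run-defining inputs deterministically, then hash them
        payload = json.dumps(
            {"param": param_name, "range": list(param_range),
             "n": n_points, "config": base_config},
            sort_keys=True, default=str,
        )
        return hashlib.md5(payload.encode()).hexdigest()

Identical inputs map to the same key, so repeated sweeps can be served from the in-memory dict or the on-disk cache directory instead of re-running the optimizer.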

Financial Statements

The financial statement subsystem generates a GAAP-compliant balance sheet, income statement, and cash flow statement from simulation data, with support for both the indirect and the direct (ledger-based) cash flow methods.

    classDiagram
    class FinancialStatementGenerator {
        -manufacturer: WidgetManufacturer
        -manufacturer_data: dict
        -config: FinancialStatementConfig
        -metrics_history: list
        -years_available: int
        -ledger: Ledger
        +generate_balance_sheet(year) DataFrame
        +generate_income_statement(year) DataFrame
        +generate_cash_flow_statement(year) DataFrame
        +generate_reconciliation_report(year) DataFrame
    }

    class CashFlowStatement {
        -metrics_history: list
        -config: Any
        -ledger: Ledger
        +generate_statement(year, period, method) DataFrame
    }

    class FinancialStatementConfig {
        <<dataclass>>
        +currency_symbol: str
        +decimal_places: int
        +include_yoy_change: bool
        +include_percentages: bool
        +fiscal_year_end: int
        +consolidate_monthly: bool
        +current_claims_ratio: float
    }

    FinancialStatementGenerator --> CashFlowStatement : delegates to
    FinancialStatementGenerator --> FinancialStatementConfig : configured by
    FinancialStatementGenerator ..> WidgetManufacturer : reads from

FinancialStatementGenerator is the primary entry point; it accepts a WidgetManufacturer (or a raw data dictionary) and generates formatted DataFrames for each financial statement. It supports the ledger-based direct method for cash flow when a Ledger is available. The generate_reconciliation_report() method validates the accounting equation and runs solvency checks.

CashFlowStatement handles the three-section cash flow statement (Operating, Investing, Financing) with both indirect and direct method support.
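
A hypothetical end-to-end call, assuming manufacturer has already been run through a simulation and that unspecified config fields have sensible defaults:

    config = FinancialStatementConfig(
        currency_symbol="$",
        decimal_places=0,
        include_yoy_change=True,
    )
    generator = FinancialStatementGenerator(manufacturer=manufacturer, config=config)
    balance_sheet = generator.generate_balance_sheet(year=5)
    income_statement = generator.generate_income_statement(year=5)
    cash_flows = generator.generate_cash_flow_statement(year=5)
    reconciliation = generator.generate_reconciliation_report(year=5)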

Data Flow Sequence

    sequenceDiagram
    participant LG as ManufacturingLossGenerator
    participant Sim as Simulation
    participant EA as ErgodicAnalyzer
    participant BO as BusinessOptimizer
    participant SA as SensitivityAnalyzer
    participant RM as RiskMetrics
    participant FS as FinancialStatementGenerator

    LG->>Sim: Generate losses (attritional + large + catastrophic)
    Sim->>EA: Trajectory data (insured & uninsured)
    EA->>EA: Calculate time-average growth
    EA->>EA: Calculate ensemble-average growth
    EA->>RM: Loss data for tail risk
    RM-->>EA: VaR, TVaR, drawdown metrics
    EA-->>BO: Ergodic metrics & analysis results

    BO->>BO: Define objectives & constraints
    BO->>SA: Request parameter sensitivity
    SA->>SA: Parameter sweep with caching
    SA-->>BO: SensitivityResult
    BO->>BO: Find optimal strategy via scipy.optimize
    BO-->>BO: OptimalStrategy

    BO->>FS: Generate financial statements
    FS->>FS: Build balance sheet
    FS->>FS: Build income statement
    FS->>FS: Build cash flow statement
    FS-->>BO: Formatted DataFrames

Key Design Patterns

1. Composite Pattern

  • ManufacturingLossGenerator composes AttritionalLossGenerator, LargeLossGenerator, and CatastrophicLossGenerator into a unified interface

  • Each sub-generator independently pairs a FrequencyGenerator with a LossDistribution

2. Template Method (Abstract Base Class)

  • LossDistribution (ABC) defines the interface with generate_severity() and expected_value() as abstract methods

  • LognormalLoss, ParetoLoss, and GeneralizedParetoLoss implement distribution-specific behavior

3. Dataclass Data Transfer Objects

  • ErgodicData, ErgodicAnalysisResults, OptimalStrategy, LossEvent, LossData, ConvergenceStats, RuinProbabilityResults, SensitivityResult all use @dataclass for clean data transfer between modules

4. Factory Method

  • ManufacturingLossGenerator.create_simple() provides a simplified factory for migration from legacy ClaimGenerator

  • LossData.from_loss_events() constructs data from a list of LossEvent objects (see the sketch at the end of this section)

5. Strategy Pattern

  • BusinessOptimizer supports multiple optimization strategies: ROE maximization, bankruptcy risk minimization, capital efficiency optimization, and multi-objective optimization

  • Each strategy uses different objective functions with scipy.optimize

6. Caching

  • SensitivityAnalyzer uses MD5-based in-memory and persistent disk caching to avoid redundant optimization runs during parameter sweeps
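
As a self-contained illustration of the factory-method idea, a stripped-down LossData.from_loss_events() might look like the following; fields and validation are simplified relative to the real classes:

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class LossEvent:
        amount: float
        time: float
        loss_type: str = "attritional"

    @dataclass
    class LossData:
        timestamps: np.ndarray
        loss_amounts: np.ndarray
        loss_types: list
        metadata: dict = field(default_factory=dict)

        @classmethod
        def from_loss_events(cls, events: list) -> "LossData":
            # Normalize an event list into the columnar layout analysis code expects
            events = sorted(events, key=lambda e: e.time)
            return cls(
                timestamps=np.array([e.time for e in events]),
                loss_amounts=np.array([e.amount for e in events]),
                loss_types=[e.loss_type for e in events],
            )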