# Data Models and Analysis Classes

This document shows the data structures and analysis models used throughout the system. The diagrams are split into focused sections for readability.

## Ergodic Analysis

The ergodic analysis subsystem implements Ole Peters' ergodic economics framework, comparing time-average versus ensemble-average growth rates to demonstrate how insurance transforms business growth dynamics.

```{mermaid}
classDiagram
    class ErgodicAnalyzer {
        -convergence_threshold: float
        +calculate_time_average_growth(trajectories) dict
        +calculate_ensemble_average(trajectories) dict
        +compare_scenarios(insured, uninsured, metric) dict
        +check_convergence(values, window_size) tuple
        +analyze_simulation_batch(results, label) dict
        +integrate_loss_ergodic_analysis(loss_data, insurance, manufacturer) ErgodicAnalysisResults
        +validate_insurance_ergodic_impact(...) ValidationResults
        +significance_test(insured_growth, uninsured_growth) dict
    }

    class ErgodicData {
        <<dataclass>>
        +time_series: ndarray
        +values: ndarray
        +metadata: dict
        +validate() bool
    }

    class ErgodicAnalysisResults {
        <<dataclass>>
        +time_average_growth: float
        +ensemble_average_growth: float
        +survival_rate: float
        +ergodic_divergence: float
        +insurance_impact: dict
        +validation_passed: bool
        +metadata: dict
    }

    class ValidationResults {
        <>
        +is_valid: bool
        +checks: dict
        +warnings: list
        +summary: str
    }

    ErgodicAnalyzer --> ErgodicData : accepts
    ErgodicAnalyzer --> ErgodicAnalysisResults : produces
    ErgodicAnalyzer --> ValidationResults : validates with
    ErgodicAnalysisResults --> ErgodicData : derived from
```

**ErgodicAnalyzer** is the core analysis engine. It accepts trajectories as `ErgodicData` or `SimulationResults`, calculates time-average and ensemble-average growth rates, performs convergence checks, and runs integrated loss-ergodic analysis. The `compare_scenarios()` method is the primary entry point for comparing insured versus uninsured outcomes.

**ErgodicData** is a lightweight dataclass holding time series arrays and metadata.
It validates array length consistency before analysis.

**ErgodicAnalysisResults** captures the complete output of an integrated analysis, including growth rates, survival statistics, insurance impact metrics, and validation status.

## Business Optimization

The optimization subsystem uses ergodic metrics to find insurance strategies that maximize real business outcomes such as ROE, growth rate, and survival probability.

```{mermaid}
classDiagram
    class BusinessOptimizer {
        -manufacturer: WidgetManufacturer
        -loss_distribution: LossDistribution
        -decision_engine: InsuranceDecisionEngine
        -ergodic_analyzer: ErgodicAnalyzer
        -optimizer_config: BusinessOptimizerConfig
        +maximize_roe_with_insurance(constraints, time_horizon) OptimalStrategy
        +minimize_bankruptcy_risk(growth_targets, budget) OptimalStrategy
        +optimize_capital_efficiency(constraints) OptimalStrategy
        +optimize_business_outcomes(objectives, constraints) BusinessOptimizationResult
    }

    class OptimalStrategy {
        <<dataclass>>
        +coverage_limit: float
        +deductible: float
        +premium_rate: float
        +expected_roe: float
        +bankruptcy_risk: float
        +growth_rate: float
        +capital_efficiency: float
        +recommendations: list~str~
        +to_dict() dict
    }

    class BusinessObjective {
        <>
        +name: str
        +weight: float
        +target_value: float
        +optimization_direction: OptimizationDirection
        +constraint_type: str
        +constraint_value: float
    }

    class BusinessConstraints {
        <>
        +max_risk_tolerance: float
        +min_roe_threshold: float
        +max_leverage_ratio: float
        +min_liquidity_ratio: float
        +max_premium_budget: float
        +min_coverage_ratio: float
        +regulatory_requirements: dict
    }

    class BusinessOptimizationResult {
        <>
        +optimal_strategy: OptimalStrategy
        +objective_values: dict
        +constraint_satisfaction: dict
        +convergence_info: dict
        +sensitivity_analysis: dict
        +is_feasible() bool
    }

    BusinessOptimizer --> OptimalStrategy : finds
    BusinessOptimizer --> BusinessObjective : uses
    BusinessOptimizer --> BusinessConstraints : respects
    BusinessOptimizer --> BusinessOptimizationResult : produces
    BusinessOptimizationResult --> OptimalStrategy : contains
```

**BusinessOptimizer** provides multiple optimization methods: `maximize_roe_with_insurance()` for ROE-focused optimization, `minimize_bankruptcy_risk()` for safety-first strategies, `optimize_capital_efficiency()` for capital allocation, and `optimize_business_outcomes()` for multi-objective optimization using `BusinessObjective` definitions.

**OptimalStrategy** is the output dataclass capturing the recommended insurance parameters (coverage limit, deductible, premium rate) along with expected business outcomes and actionable recommendations.

## Risk Analysis

Risk metrics and ruin probability analysis provide the quantitative foundation for evaluating tail risk and insurance value.

```{mermaid}
classDiagram
    class RiskMetrics {
        -losses: ndarray
        -weights: ndarray
        -rng: Generator
        +var(confidence, method, bootstrap_ci) float
        +tvar(confidence) float
        +expected_shortfall(confidence) float
        +pml(return_period) float
        +maximum_drawdown() float
        +economic_capital(confidence) float
        +tail_index(threshold) float
        +risk_adjusted_metrics() dict
        +coherence_test() dict
        +summary_statistics() dict
        +plot_distribution()
    }

    class RiskMetricsResult {
        <>
        +metric_name: str
        +value: float
        +confidence_level: float
        +confidence_interval: tuple
        +metadata: dict
    }

    class RuinProbabilityAnalyzer {
        -manufacturer: WidgetManufacturer
        -loss_generator: ManufacturingLossGenerator
        -insurance_program: InsuranceProgram
        -config: SimulationConfig
        +analyze_ruin_probability(config) RuinProbabilityResults
    }

    class RuinProbabilityResults {
        <<dataclass>>
        +time_horizons: ndarray
        +ruin_probabilities: ndarray
        +confidence_intervals: ndarray
        +bankruptcy_causes: dict
        +survival_curves: ndarray
        +execution_time: float
        +n_simulations: int
        +convergence_achieved: bool
        +mid_year_ruin_count: int
        +ruin_month_distribution: dict
        +summary() str
    }

    class RuinProbabilityConfig {
        <>
        +time_horizons: list~int~
        +n_simulations: int
        +min_assets_threshold: float
        +min_equity_threshold: float
        +early_stopping: bool
        +parallel: bool
        +n_workers: int
        +seed: int
        +n_bootstrap: int
    }

    RiskMetrics --> RiskMetricsResult : returns
    RuinProbabilityAnalyzer --> RuinProbabilityResults : produces
    RuinProbabilityAnalyzer --> RuinProbabilityConfig : configured by
```

**RiskMetrics** is initialized with a loss array and provides VaR, TVaR (CVaR), Expected Shortfall, PML, Maximum Drawdown, and other tail-risk measures. It supports both empirical and parametric methods with optional bootstrap confidence intervals.

**RuinProbabilityAnalyzer** runs Monte Carlo ruin analysis across multiple time horizons, with support for parallel execution, bootstrap confidence intervals, and mid-year ruin tracking.

## Convergence Diagnostics

Convergence analysis ensures Monte Carlo simulations have run long enough to produce reliable results.

```{mermaid}
classDiagram
    class ConvergenceDiagnostics {
        -r_hat_threshold: float
        -min_ess: int
        -relative_mcse_threshold: float
        +calculate_r_hat(chains) float
        +calculate_ess(chain, max_lag) float
        +calculate_batch_ess(chains, method) float
        +calculate_ess_per_second(chain, time) float
        +calculate_mcse(chain, ess) float
        +check_convergence(chains, metric_names) dict
        +geweke_test(chain) tuple
        +heidelberger_welch_test(chain, alpha) dict
    }

    class ConvergenceStats {
        <<dataclass>>
        +r_hat: float
        +ess: float
        +mcse: float
        +converged: bool
        +n_iterations: int
        +autocorrelation: float
    }

    ConvergenceDiagnostics --> ConvergenceStats : produces
```

**ConvergenceDiagnostics** implements Gelman-Rubin R-hat, Effective Sample Size (ESS), Monte Carlo Standard Error (MCSE), Geweke test, and Heidelberger-Welch stationarity test. The `check_convergence()` method returns a `ConvergenceStats` dataclass for each metric being tracked.

## Loss Modeling

The loss modeling subsystem uses a composite pattern to combine attritional, large, and catastrophic loss generators into a unified manufacturing risk model.

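
As a rough illustration of the severity layer in this subsystem, the sketch below shows how an abstract severity interface and a lognormal implementation parameterized by mean and coefficient of variation might fit together. The names `SeverityDistribution` and `LognormalSeverity` are hypothetical stand-ins for `LossDistribution` and `LognormalLoss`, and the method bodies are assumptions rather than the project's code. It uses only the standard library and returns plain lists, whereas the diagram indicates NumPy generators and arrays.

```python
import math
import random
from abc import ABC, abstractmethod


class SeverityDistribution(ABC):
    """Hypothetical stand-in for the LossDistribution interface."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    @abstractmethod
    def generate_severity(self, n_samples):
        """Return n_samples loss amounts."""

    @abstractmethod
    def expected_value(self):
        """Return the analytical mean of the distribution."""

    def reset_seed(self, seed):
        # Replace the generator so later draws are reproducible.
        self.rng = random.Random(seed)


class LognormalSeverity(SeverityDistribution):
    """Lognormal severity specified by its mean and coefficient of variation."""

    def __init__(self, mean, cv, seed=None):
        super().__init__(seed)
        if mean <= 0 or cv <= 0:
            raise ValueError("mean and cv must be positive")
        self.mean = mean
        self.cv = cv
        # Standard moment matching for the lognormal distribution:
        # sigma^2 = ln(1 + cv^2), mu = ln(mean) - sigma^2 / 2.
        self.sigma = math.sqrt(math.log(1.0 + cv ** 2))
        self.mu = math.log(mean) - 0.5 * self.sigma ** 2

    def generate_severity(self, n_samples):
        return [self.rng.lognormvariate(self.mu, self.sigma) for _ in range(n_samples)]

    def expected_value(self):
        # exp(mu + sigma^2 / 2) recovers the mean passed to the constructor.
        return math.exp(self.mu + 0.5 * self.sigma ** 2)
```

Deriving `mu` and `sigma` from `mean` and `cv` lets callers state losses in business terms while the sampler works in log space; `expected_value()` then returns the configured mean up to floating-point error.
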
```{mermaid}
classDiagram
    class LossDistribution {
        <<abstract>>
        #rng: Generator
        +generate_severity(n_samples)* ndarray
        +expected_value()* float
        +reset_seed(seed) void
    }

    class LognormalLoss {
        +mean: float
        +cv: float
        +mu: float
        +sigma: float
        +generate_severity(n_samples) ndarray
        +expected_value() float
    }

    class ParetoLoss {
        +alpha: float
        +xm: float
        +generate_severity(n_samples) ndarray
        +expected_value() float
    }

    class GeneralizedParetoLoss {
        +severity_shape: float
        +severity_scale: float
        +generate_severity(n_samples) ndarray
        +expected_value() float
    }

    class LossEvent {
        <<dataclass>>
        +amount: float
        +time: float
        +loss_type: str
        +description: str
    }

    class LossData {
        <<dataclass>>
        +timestamps: ndarray
        +loss_amounts: ndarray
        +loss_types: list~str~
        +claim_ids: list~str~
        +development_factors: ndarray
        +metadata: dict
        +validate() bool
        +to_ergodic_format() ErgodicData
        +apply_insurance(program) LossData
        +from_loss_events(events)$ LossData
        +to_loss_events() list~LossEvent~
        +get_annual_aggregates(years) dict
        +calculate_statistics() dict
    }

    LossDistribution <|-- LognormalLoss
    LossDistribution <|-- ParetoLoss
    LossDistribution <|-- GeneralizedParetoLoss
    LossData --> LossEvent : converts to/from
```

**LossDistribution** is the abstract base class defining the interface for severity distributions. The three concrete implementations (Lognormal, Pareto, Generalized Pareto) cover the full spectrum from attritional to extreme tail modeling.

**LossEvent** is a lightweight dataclass representing a single loss occurrence with timing, amount, and type classification.

**LossData** is the unified data container for cross-module compatibility, providing conversion to ergodic format and insurance application methods.

## Loss Generation (Composite Pattern)

The manufacturing loss generator uses the Composite pattern to combine multiple loss layer generators, each with independent frequency and severity models.

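
To make the composition idea concrete, here is a minimal sketch under stated assumptions: each layer draws Poisson-process arrival times from exponential inter-arrival gaps and pairs them with its own severity sampler, and a composite merges the layers into one time-ordered event list. `LayerGenerator`, `CompositeLossGenerator`, `build_example_generator`, and every numeric parameter are hypothetical; revenue scaling, the GPD transformation, and the tuple return value of the real `generate_losses(duration, revenue)` are deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass
class LossEvent:
    amount: float
    time: float
    loss_type: str


class LayerGenerator:
    """One loss layer: Poisson arrivals plus an independent severity sampler."""

    def __init__(self, loss_type, annual_frequency, severity_sampler, seed=None):
        self.loss_type = loss_type
        self.annual_frequency = annual_frequency
        self.severity_sampler = severity_sampler  # callable taking a random.Random
        self.rng = random.Random(seed)

    def generate_losses(self, duration):
        events = []
        if self.annual_frequency <= 0:
            return events
        # Exponential gaps between events yield a homogeneous Poisson process.
        t = self.rng.expovariate(self.annual_frequency)
        while t < duration:
            amount = self.severity_sampler(self.rng)
            events.append(LossEvent(amount=amount, time=t, loss_type=self.loss_type))
            t += self.rng.expovariate(self.annual_frequency)
        return events


class CompositeLossGenerator:
    """Exposes the same generate_losses() call as a single layer."""

    def __init__(self, layers):
        self.layers = list(layers)

    def generate_losses(self, duration):
        events = []
        for layer in self.layers:
            events.extend(layer.generate_losses(duration))
        return sorted(events, key=lambda e: e.time)


def build_example_generator(seed=0):
    # Illustrative parameters only; they are not taken from the project.
    return CompositeLossGenerator([
        LayerGenerator("attritional", 5.0, lambda rng: rng.lognormvariate(10.0, 0.8), seed),
        LayerGenerator("large", 0.5, lambda rng: rng.lognormvariate(13.0, 1.0), seed + 1),
        LayerGenerator("catastrophic", 0.05, lambda rng: 1e6 * rng.paretovariate(2.5), seed + 2),
    ])
```

Calling `build_example_generator(seed=42).generate_losses(10.0)` returns a single list of `LossEvent` objects sorted by time, so downstream code does not need to know how many layers contributed to it.
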
```{mermaid}
classDiagram
    class ManufacturingLossGenerator {
        +attritional: AttritionalLossGenerator
        +large: LargeLossGenerator
        +catastrophic: CatastrophicLossGenerator
        +gpd_generator: GeneralizedParetoLoss
        +threshold_value: float
        +exposure: ExposureBase
        +generate_losses(duration, revenue) tuple
        +reseed(seed) void
        +create_simple(frequency, severity_mean, severity_std, seed)$ ManufacturingLossGenerator
        +validate_distributions(n_simulations) dict
    }

    class AttritionalLossGenerator {
        +frequency_generator: FrequencyGenerator
        +severity_distribution: LognormalLoss
        +loss_type: str
        +generate_losses(duration, revenue) list~LossEvent~
        +reseed(seed) void
    }

    class LargeLossGenerator {
        +frequency_generator: FrequencyGenerator
        +severity_distribution: LognormalLoss
        +loss_type: str
        +generate_losses(duration, revenue) list~LossEvent~
        +reseed(seed) void
    }

    class CatastrophicLossGenerator {
        +frequency_generator: FrequencyGenerator
        +severity_distribution: ParetoLoss
        +loss_type: str
        +generate_losses(duration, revenue) list~LossEvent~
        +reseed(seed) void
    }

    class FrequencyGenerator {
        +base_frequency: float
        +revenue_scaling_exponent: float
        +reference_revenue: float
        -rng: Generator
        +reseed(seed) void
        +get_scaled_frequency(revenue) float
        +generate_event_times(duration, revenue) ndarray
    }

    ManufacturingLossGenerator *-- AttritionalLossGenerator : composes
    ManufacturingLossGenerator *-- LargeLossGenerator : composes
    ManufacturingLossGenerator *-- CatastrophicLossGenerator : composes
    ManufacturingLossGenerator o-- GeneralizedParetoLoss : optional extreme
    AttritionalLossGenerator --> FrequencyGenerator : uses
    LargeLossGenerator --> FrequencyGenerator : uses
    CatastrophicLossGenerator --> FrequencyGenerator : uses
    AttritionalLossGenerator --> LognormalLoss : severity
    LargeLossGenerator --> LognormalLoss : severity
    CatastrophicLossGenerator --> ParetoLoss : severity
```

**ManufacturingLossGenerator** is the composite orchestrator that combines three loss layers (attritional, large, catastrophic) with
optional GPD extreme value transformation. The `create_simple()` class method provides a migration-friendly factory for basic use cases. Each sub-generator pairs a `FrequencyGenerator` (Poisson process with revenue scaling) with a `LossDistribution` for severities.

## Sensitivity Analysis

Sensitivity tools analyze how parameter changes affect optimization outcomes, with built-in caching for computational efficiency.

```{mermaid}
classDiagram
    class SensitivityAnalyzer {
        -base_config: dict
        -optimizer: Any
        -results_cache: dict
        -cache_dir: Path
        +analyze_parameter(param_name, param_range, n_points) SensitivityResult
        +create_tornado_diagram(parameters, metric) dict
        +analyze_parameter_group(params, metric) dict
    }

    class SensitivityResult {
        <<dataclass>>
        +parameter: str
        +baseline_value: float
        +variations: ndarray
        +metrics: dict
        +parameter_path: str
        +units: str
        +calculate_impact(metric) float
        +get_metric_bounds(metric) tuple
        +to_dataframe() DataFrame
    }

    class TwoWaySensitivityResult {
        <<dataclass>>
        +parameter1: str
        +parameter2: str
        +values1: ndarray
        +values2: ndarray
        +metric_grid: ndarray
        +metric_name: str
        +find_optimal_region(target, tolerance) ndarray
        +to_dataframe() DataFrame
    }

    SensitivityAnalyzer --> SensitivityResult : produces
    SensitivityAnalyzer --> TwoWaySensitivityResult : produces
```

**SensitivityAnalyzer** provides one-way parameter analysis, tornado diagram generation, and parameter group analysis. It uses MD5-based caching to avoid redundant optimizer runs. Results are captured as `SensitivityResult` (one-way) or `TwoWaySensitivityResult` (two-way interaction) dataclasses with built-in DataFrame conversion.

## Financial Statements

The financial statement subsystem generates GAAP-compliant Balance Sheet, Income Statement, and Cash Flow Statement from simulation data, with support for both indirect and direct (ledger-based) cash flow methods.

```{mermaid}
classDiagram
    class FinancialStatementGenerator {
        -manufacturer: WidgetManufacturer
        -manufacturer_data: dict
        -config: FinancialStatementConfig
        -metrics_history: list
        -years_available: int
        -ledger: Ledger
        +generate_balance_sheet(year) DataFrame
        +generate_income_statement(year) DataFrame
        +generate_cash_flow_statement(year) DataFrame
        +generate_reconciliation_report(year) DataFrame
    }

    class CashFlowStatement {
        -metrics_history: list
        -config: Any
        -ledger: Ledger
        +generate_statement(year, period, method) DataFrame
    }

    class FinancialStatementConfig {
        <>
        +currency_symbol: str
        +decimal_places: int
        +include_yoy_change: bool
        +include_percentages: bool
        +fiscal_year_end: int
        +consolidate_monthly: bool
        +current_claims_ratio: float
    }

    FinancialStatementGenerator --> CashFlowStatement : delegates to
    FinancialStatementGenerator --> FinancialStatementConfig : configured by
    FinancialStatementGenerator ..> WidgetManufacturer : reads from
```

**FinancialStatementGenerator** is the primary entry point, accepting a `WidgetManufacturer` (or raw data dictionary) and generating formatted DataFrames for each financial statement. It supports ledger-based direct method cash flow when a `Ledger` is available. The `generate_reconciliation_report()` method validates the accounting equation and performs solvency checks.

**CashFlowStatement** handles the three-section cash flow statement (Operating, Investing, Financing) with both indirect and direct method support.

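
For readers unfamiliar with the indirect method, the following sketch shows the arithmetic behind a three-section cash flow statement in its simplest form: operating cash flow starts from net income, adds back non-cash depreciation, and subtracts the increase in working capital. The function name, its parameters, and the choice of line items are assumptions made for illustration; the project's `CashFlowStatement` works from `metrics_history` and returns a DataFrame, and it may include items not shown here.

```python
def indirect_cash_flow(net_income, depreciation, change_in_working_capital,
                       capital_expenditures, dividends_paid):
    """Hypothetical helper: three-section cash flow via the indirect method."""
    # Operating: reconcile accrual net income to cash.
    operating = net_income + depreciation - change_in_working_capital
    # Investing: cash spent on long-lived assets is an outflow.
    investing = -capital_expenditures
    # Financing: distributions to owners are an outflow.
    financing = -dividends_paid
    return {
        "operating": operating,
        "investing": investing,
        "financing": financing,
        "net_change_in_cash": operating + investing + financing,
    }


statement = indirect_cash_flow(
    net_income=1_000_000,
    depreciation=200_000,
    change_in_working_capital=150_000,
    capital_expenditures=300_000,
    dividends_paid=250_000,
)
# statement["operating"] is 1_050_000 and statement["net_change_in_cash"] is 500_000
```

The direct method would instead sum actual cash receipts and payments, which is why it depends on ledger entries being available.
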
## Data Flow Sequence

```{mermaid}
sequenceDiagram
    participant LG as ManufacturingLossGenerator
    participant Sim as Simulation
    participant EA as ErgodicAnalyzer
    participant BO as BusinessOptimizer
    participant SA as SensitivityAnalyzer
    participant RM as RiskMetrics
    participant FS as FinancialStatementGenerator

    LG->>Sim: Generate losses (attritional + large + catastrophic)
    Sim->>EA: Trajectory data (insured & uninsured)
    EA->>EA: Calculate time-average growth
    EA->>EA: Calculate ensemble-average growth
    EA->>RM: Loss data for tail risk
    RM-->>EA: VaR, TVaR, drawdown metrics
    EA-->>BO: Ergodic metrics & analysis results
    BO->>BO: Define objectives & constraints
    BO->>SA: Request parameter sensitivity
    SA->>SA: Parameter sweep with caching
    SA-->>BO: SensitivityResult
    BO->>BO: Find optimal strategy via scipy.optimize
    BO-->>BO: OptimalStrategy
    BO->>FS: Generate financial statements
    FS->>FS: Build balance sheet
    FS->>FS: Build income statement
    FS->>FS: Build cash flow statement
    FS-->>BO: Formatted DataFrames
```

## Key Design Patterns

### 1. **Composite Pattern**

- `ManufacturingLossGenerator` composes `AttritionalLossGenerator`, `LargeLossGenerator`, and `CatastrophicLossGenerator` into a unified interface
- Each sub-generator independently pairs a `FrequencyGenerator` with a `LossDistribution`

### 2. **Template Method (Abstract Base Class)**

- `LossDistribution` (ABC) defines the interface with `generate_severity()` and `expected_value()` as abstract methods
- `LognormalLoss`, `ParetoLoss`, and `GeneralizedParetoLoss` implement distribution-specific behavior

### 3. **Dataclass Data Transfer Objects**

- `ErgodicData`, `ErgodicAnalysisResults`, `OptimalStrategy`, `LossEvent`, `LossData`, `ConvergenceStats`, `RuinProbabilityResults`, `SensitivityResult` all use `@dataclass` for clean data transfer between modules

### 4. **Factory Method**

- `ManufacturingLossGenerator.create_simple()` provides a simplified factory for migration from legacy `ClaimGenerator`
- `LossData.from_loss_events()` constructs data from a list of `LossEvent` objects

### 5. **Strategy Pattern**

- `BusinessOptimizer` supports multiple optimization strategies: ROE maximization, bankruptcy risk minimization, capital efficiency optimization, and multi-objective optimization
- Each strategy uses different objective functions with `scipy.optimize`

### 6. **Caching**

- `SensitivityAnalyzer` uses MD5-based in-memory and persistent disk caching to avoid redundant optimization runs during parameter sweeps
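
The caching idea in the last pattern can be sketched as follows, assuming configurations and results are JSON-serializable. `CachedRunner`, `cache_key`, and the `misses` counter are hypothetical names introduced for illustration; how `SensitivityAnalyzer` actually builds its keys and lays out its cache directory is not described in this document.

```python
import hashlib
import json
from pathlib import Path


class CachedRunner:
    """Hypothetical two-level cache: in-memory dict first, then JSON files on disk."""

    def __init__(self, run_fn, cache_dir=None):
        self.run_fn = run_fn          # the expensive call, e.g. one optimizer run
        self.memory_cache = {}
        self.cache_dir = Path(cache_dir) if cache_dir is not None else None
        if self.cache_dir is not None:
            self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.misses = 0               # number of times run_fn was actually invoked

    @staticmethod
    def cache_key(config):
        # Sorting keys makes the hash independent of dict insertion order.
        payload = json.dumps(config, sort_keys=True, default=str)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def run(self, config):
        key = self.cache_key(config)
        if key in self.memory_cache:
            return self.memory_cache[key]
        path = self.cache_dir / f"{key}.json" if self.cache_dir is not None else None
        if path is not None and path.exists():
            # Disk hit: reuse a result written by an earlier process or instance.
            result = json.loads(path.read_text())
        else:
            self.misses += 1
            result = self.run_fn(config)
            if path is not None:
                path.write_text(json.dumps(result))
        self.memory_cache[key] = result
        return result
```

MD5 serves here only as a fast fingerprint of the configuration, not as a security measure. During a parameter sweep, repeated configurations resolve from memory, and a later session pointed at the same directory can skip runs that were already completed.
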