Uncertainty Profiler
====================

Gaussian Processes provide predictive uncertainty as a first-class output, but
raw variance values are hard to interpret. The
:class:`~gpclarity.UncertaintyProfiler` turns them into actionable diagnostics:
it classifies regions as interpolation vs. extrapolation, detects when
confidence intervals are too wide or too narrow, and can recalibrate the
uncertainty scale against held-out validation data.

.. contents:: On this page
   :local:
   :depth: 2

Initialization
--------------

.. code-block:: python

   import GPy
   import numpy as np

   import gpclarity

   # Noisy samples of sin(x) on [0, 10]
   X = np.linspace(0, 10, 50).reshape(-1, 1)
   y = np.sin(X).flatten() + 0.1 * np.random.randn(50)

   model = GPy.models.GPRegression(X, y[:, None], GPy.kern.RBF(1))
   model.optimize()

   profiler = gpclarity.UncertaintyProfiler(model, X_train=X)

Pass ``X_train`` so the profiler can distinguish interpolation from
extrapolation. Providing ``config`` (an
:class:`~gpclarity.uncertainty_analysis.UncertaintyConfig` object) lets you
adjust thresholds such as the high-uncertainty percentile cutoff.

Prediction with Intervals
-------------------------

.. code-block:: python

   # The test grid extends beyond the training range [0, 10] on both sides
   X_test = np.linspace(-2, 12, 200).reshape(-1, 1)
   result = profiler.predict(X_test)

   print(result.mean.shape)      # (200, 1)
   print(result.variance.shape)  # (200, 1)

   # 2-sigma confidence interval
   lower, upper = result.get_interval(2.0)

``predict()`` returns a
:class:`~gpclarity.uncertainty_analysis.PredictionResult` dataclass.
``get_interval(sigma)`` computes ``mean ± sigma * std``, where ``std`` is the
square root of the predictive variance.

Diagnostics
-----------

``compute_diagnostics()`` returns a plain dictionary summarising uncertainty
across the test set:

.. code-block:: python

   diag = profiler.compute_diagnostics(X_test)

   print(diag["mean_uncertainty"])          # average predictive variance
   print(diag["max_uncertainty"])           # peak variance
   print(diag["uncertainty_std"])           # spread across the test set
   print(diag["high_uncertainty_ratio"])    # fraction above the 90th percentile
   print(diag["n_extrapolation_points"])    # points outside the training hull
   print(diag["coefficient_of_variation"])  # std / mean, a scale-free spread metric

A ``coefficient_of_variation`` between 0.1 and 10.0 is considered
well-calibrated. Values outside that range suggest the uncertainty scale may
need adjustment.

Region Classification
---------------------

``classify_regions()`` assigns each test point one of four labels from the
:class:`~gpclarity.uncertainty_analysis.UncertaintyRegion` enum:

.. code-block:: python

   labels = profiler.classify_regions(X_test)

   # Each label is one of:
   #   UncertaintyRegion.INTERPOLATION: inside the training hull, low uncertainty
   #   UncertaintyRegion.BOUNDARY:      near the edge of the training data
   #   UncertaintyRegion.EXTRAPOLATION: outside the training hull
   #   UncertaintyRegion.STRUCTURAL:    high uncertainty despite dense training data

``identify_uncertainty_regions()`` returns the actual points in each category:

.. code-block:: python

   regions = profiler.identify_uncertainty_regions(X_test, threshold_percentile=90)

   print(regions["high_uncertainty_points"]["points"])
   print(regions["low_uncertainty_points"]["points"])
   print(regions["threshold"])  # variance threshold used

Uncertainty Calibration
-----------------------

If you have held-out validation data with known targets,
``calibrate_uncertainty()`` finds the scalar multiplier that brings the model's
uncertainty in line with observed prediction errors:

.. code-block:: python

   X_val = np.linspace(0, 10, 20).reshape(-1, 1)
   y_val = np.sin(X_val).flatten() + 0.1 * np.random.randn(20)

   cal = profiler.calibrate_uncertainty(X_val, y_val)

   print(cal["optimal_scale"])    # multiply raw std by this factor
   print(cal["coverage_before"])  # empirical 95% coverage before calibration
   print(cal["coverage_after"])   # coverage after applying optimal_scale

A well-calibrated model has ``coverage_after`` close to 0.95 for 2-sigma
intervals. An ``optimal_scale`` well above 1 means the model is overconfident
(its intervals are too narrow and must be widened); a value well below 1 means
it is underconfident (its intervals are wider than the observed errors
justify).

Visualization
-------------

.. code-block:: python

   profiler.plot(
       X_test,
       X_train=X,
       y_train=y,
       confidence_level=2.0,  # number of sigma for shaded band
       show_regions=True,     # colour-code extrapolation regions
   )

The plot shows the posterior mean, confidence band, training data, and
(optionally) a colour overlay for extrapolation regions.

Full Summary
------------

``get_summary()`` combines diagnostics, region classification, and
recommendations into a single dict:

.. code-block:: python

   summary = profiler.get_summary(X_test)

   print(summary["mean_uncertainty"])
   print(summary["n_extrapolation_points"])

   for rec in summary["recommendations"]:
       print("-", rec)

The ``recommendations`` list contains actionable strings such as
*"High extrapolation ratio — restrict predictions to the training domain"* or
*"Model is overconfident — consider calibrating uncertainty scale"*.
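
To make the idea behind the calibration step under *Uncertainty Calibration*
more concrete, here is a minimal NumPy-only sketch of one way a scalar
multiplier for the predictive standard deviation could be chosen. It is not
taken from gpclarity and may differ from what ``calibrate_uncertainty()`` does
internally: the helper names ``calibrate_scale`` and ``empirical_coverage``,
the quantile-based rule, and the synthetic data are all assumptions made for
illustration.

.. code-block:: python

   import numpy as np


   def empirical_coverage(mean, variance, y_true, sigma=2.0, scale=1.0):
       """Fraction of targets inside mean +/- sigma * scale * std."""
       std = scale * np.sqrt(variance)
       return float(np.mean(np.abs(y_true - mean) <= sigma * std))


   def calibrate_scale(mean, variance, y_true, sigma=2.0, target_coverage=0.95):
       """Hypothetical rule: pick the std multiplier so that sigma-wide
       intervals cover `target_coverage` of the validation targets."""
       # Absolute errors expressed in units of the predicted std
       z = np.abs(y_true - mean) / np.sqrt(variance)
       # Half-width (in std units) needed to reach the target coverage
       needed = np.quantile(z, target_coverage)
       return float(needed / sigma)


   # Synthetic validation set: the true noise std is 1.0, but the
   # simulated "model" reports std = 0.5, i.e. it is overconfident.
   rng = np.random.default_rng(0)
   n = 2000
   mean = np.zeros(n)
   y_val = rng.normal(0.0, 1.0, size=n)
   variance = np.full(n, 0.25)

   before = empirical_coverage(mean, variance, y_val)
   scale = calibrate_scale(mean, variance, y_val)
   after = empirical_coverage(mean, variance, y_val, scale=scale)

   print(f"scale={scale:.2f}  coverage before={before:.3f}  after={after:.3f}")

Because the simulated model reports half the true standard deviation, the
estimated scale comes out well above 1 (roughly 2), and the 2-sigma coverage
rises from roughly 68% to about 95%, which matches the interpretation of
``optimal_scale`` given in the calibration section.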