gpclarity.DataInfluenceMap
Constructor
- gpclarity.data_influence.__init__(model)
Initialize with GP model.
- Parameters:
model (GPy.models.GPRegression) – Trained GP model with predict and kern attributes
- Raises:
ValueError – If model lacks required attributes
Example:
    import GPy
    from gpclarity import DataInfluenceMap

    kernel = GPy.kern.RBF(input_dim=2)
    model = GPy.models.GPRegression(X_train, y_train, kernel)
    model.optimize()
    influence = DataInfluenceMap(model)
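The attribute requirement stated for the constructor can be illustrated with a small validation sketch. This is not gpclarity's actual constructor: `validate_gp_model` and `DummyModel` are hypothetical names, and the only assumption taken from the text is that the model must expose `predict` and `kern`.

```python
def validate_gp_model(model):
    """Raise ValueError unless the model exposes `predict` and `kern` (assumed contract)."""
    missing = [name for name in ("predict", "kern") if not hasattr(model, name)]
    if missing:
        raise ValueError(f"model lacks required attributes: {', '.join(missing)}")
    return model


class DummyModel:
    """Hypothetical stand-in for a trained GPy model."""
    kern = object()

    def predict(self, X):
        return X, X


validate_gp_model(DummyModel())      # passes silently

try:
    validate_gp_model(object())      # exposes neither attribute
except ValueError as exc:
    error_message = str(exc)
```

A duck-typed check of this kind accepts any object with the two attributes, not only GPy.models.GPRegression instances.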
Methods
- gpclarity.data_influence.compute_influence_scores(X_train, y_train=None, *, use_cache=True) → InfluenceResult
Compute influence scores using leverage scores (optimized O(n³)).
Leverage scores are computed from the diagonal of the hat matrix, using a cached Cholesky decomposition.
- Parameters:
X_train (np.ndarray) – Training input locations with shape (n_train, n_dims)
y_train (np.ndarray, optional) – Training outputs with shape (n_train,) or (n_train, 1). Validated if provided, but not used in the computation.
use_cache (bool) – Whether to use the internal cache for the kernel matrix. Default: True
- Returns:
InfluenceResult containing scores and metadata
- Return type:
InfluenceResult
- Raises:
ValueError – If X_train is not 2D array-like
InfluenceError – If computation fails
Example:
    import numpy as np

    result = influence.compute_influence_scores(X_train)

    # Get most influential point
    most_influential_idx = np.argmax(result.scores)
    print(f"Point {most_influential_idx}: score={result.scores[most_influential_idx]:.3f}")
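The hat-matrix computation described for compute_influence_scores can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: it assumes the hat matrix is H = K (K + noise_var·I)^-1, and the names `rbf_kernel`, `leverage_scores`, `noise_var`, and the fixed kernel hyperparameters are all hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def leverage_scores(X, noise_var=0.1):
    """Diagonal of H = K (K + noise_var * I)^-1 from one Cholesky factorisation."""
    n = X.shape[0]
    K = rbf_kernel(X, X)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))  # the O(n^3) step
    L_inv = np.linalg.solve(L, np.eye(n))
    # (K + noise_var I)^-1 = L^-T L^-1, so its diagonal is the
    # column-wise sum of squares of L^-1.
    inv_diag = np.sum(L_inv**2, axis=0)
    # Since K = (K + noise_var I) - noise_var I, H = I - noise_var (K + noise_var I)^-1.
    return 1.0 - noise_var * inv_diag


# Assumed demo data: 15 random points in 2D
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3.0, 3.0, size=(15, 2))
scores = leverage_scores(X_demo)
```

Only the diagonal of the inverse is needed, so the full hat matrix is never formed; the Cholesky factor is also the natural object to cache between calls.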
- gpclarity.data_influence.compute_loo_variance_increase(X_train, y_train, *, n_jobs=1, verbose=False) → Tuple[np.ndarray, np.ndarray]
Exact Leave-One-Out variance increase with optional parallelization.
- Parameters:
X_train (np.ndarray) – Training inputs with shape (n_train, n_dims)
y_train (np.ndarray) – Training outputs with shape (n_train,) or (n_train, 1)
n_jobs (int) – Number of parallel jobs. -1 for all cores, 1 for sequential. Default: 1
verbose (bool) – Whether to display a progress bar. Requires tqdm. Default: False
- Returns:
Tuple of (variance_increase, prediction_errors). Both arrays have shape (n_train,).
- Return type:
Tuple[np.ndarray, np.ndarray]
- Raises:
InfluenceError – If computation fails
Example:
    import numpy as np

    var_increase, pred_errors = influence.compute_loo_variance_increase(
        X_train, y_train, n_jobs=-1, verbose=True
    )

    # Identify outliers: high variance increase AND high prediction error
    outlier_mask = (
        (var_increase > np.percentile(var_increase, 95))
        & (pred_errors > np.percentile(pred_errors, 95))
    )
    outliers = np.where(outlier_mask)[0]
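What an exact leave-one-out pass involves can be sketched as follows. This is an illustration under stated assumptions, not gpclarity's code: hyperparameters are held fixed across refits, "variance increase" is taken to mean the predictive variance at a point with that point removed minus the full-model predictive variance there, and the prediction error is the absolute residual. The helper names and `noise_var` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_predict(X, y, X_star, noise_var=0.1):
    """Posterior mean and noisy predictive variance at X_star."""
    L = np.linalg.cholesky(rbf_kernel(X, X) + noise_var * np.eye(len(X)))
    K_star = rbf_kernel(X, X_star)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_star)
    mean = K_star.T @ alpha
    var = rbf_kernel(X_star, X_star).diagonal() + noise_var - np.sum(v**2, axis=0)
    return mean, var


def loo_variance_increase(X, y, noise_var=0.1):
    """Refit without each point in turn (assumed definitions, see lead-in)."""
    n = len(X)
    _, full_var = gp_predict(X, y, X, noise_var)
    var_increase = np.empty(n)
    pred_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        mean_i, var_i = gp_predict(X[keep], y[keep], X[i:i + 1], noise_var)
        var_increase[i] = var_i[0] - full_var[i]
        pred_errors[i] = abs(y[i] - mean_i[0])
    return var_increase, pred_errors


# Assumed demo data: noisy sine of the first input dimension
rng = np.random.default_rng(1)
X_demo = rng.uniform(-3.0, 3.0, size=(20, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.1 * rng.standard_normal(20)
var_increase, pred_errors = loo_variance_increase(X_demo, y_demo)
```

Each iteration of the loop is independent of the others, which is what makes a parallel n_jobs option natural for this computation.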
- gpclarity.data_influence.get_influence_report(X_train, y_train, *, compute_loo=True, n_jobs=1) → Dict[str, Any]
Comprehensive influence analysis report.
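- Parameters:
X_train (np.ndarray) – Training inputs with shape (n_train, n_dims)
y_train (np.ndarray) – Training outputs with shape (n_train,) or (n_train, 1)
compute_loo (bool) – Whether to include the leave-one-out analysis (the loo_analysis entry). Default: True
n_jobs (int) – Number of parallel jobs. Default: 1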
- Returns:
Dictionary with influence statistics and diagnostics
- Return type:
Dict[str, Any]
Return structure:
    {
        'computation_summary': {
            'total_time': float,
            'leverage_time': float,
            'n_points': int,
            'method': str
        },
        'influence_scores': {
            'mean': float,
            'std': float,
            'median': float,
            'max': float,
            'min': float,
            'p95': float,
            'p5': float
        },
        'most_influential_point': {
            'index': int,
            'location': List[float],
            'score': float
        },
        'least_influential_point': {
            'index': int,
            'location': List[float],
            'score': float
        },
        'diagnostics': {
            'high_leverage_count': int,
            'low_influence_count': int,
            'non_finite_scores': int
        },
        'loo_analysis': {  # Only if compute_loo=True
            'variance_increase': List[float],
            'prediction_errors': List[float],
            'mean_error': float,
            'max_error': float
        }
    }
Example:
    report = influence.get_influence_report(X_train, y_train, compute_loo=True)

    # Check influence distribution
    mean_score = report['influence_scores']['mean']
    high_leverage = report['diagnostics']['high_leverage_count']

    # Get most influential point details
    most_inf = report['most_influential_point']
    print(f"Most influential: Point {most_inf['index']} at {most_inf['location']}")
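How the statistics in such a report could be derived from a raw score array is sketched below. The function name `summarize_scores` is hypothetical, and the rule used for `high_leverage_count` (scores above twice the mean) is an assumption borrowed from common regression practice, not something the documentation specifies. Only part of the return structure is reproduced.

```python
import numpy as np


def summarize_scores(scores, X):
    """Build a partial report-style summary from influence scores."""
    scores = np.asarray(scores, dtype=float)
    X = np.asarray(X, dtype=float)
    finite = np.isfinite(scores)
    valid = scores[finite]
    # Mask non-finite entries so they can never be picked as extremes.
    top = int(np.argmax(np.where(finite, scores, -np.inf)))
    bottom = int(np.argmin(np.where(finite, scores, np.inf)))
    return {
        "influence_scores": {
            "mean": float(valid.mean()),
            "std": float(valid.std()),
            "median": float(np.median(valid)),
            "max": float(valid.max()),
            "min": float(valid.min()),
            "p95": float(np.percentile(valid, 95)),
            "p5": float(np.percentile(valid, 5)),
        },
        "most_influential_point": {
            "index": top,
            "location": X[top].tolist(),
            "score": float(scores[top]),
        },
        "least_influential_point": {
            "index": bottom,
            "location": X[bottom].tolist(),
            "score": float(scores[bottom]),
        },
        "diagnostics": {
            # Assumed threshold: more than twice the mean score.
            "high_leverage_count": int(np.sum(valid > 2.0 * valid.mean())),
            "non_finite_scores": int(np.sum(~finite)),
        },
    }


# Assumed demo input: five finite scores and one NaN
demo_scores = [0.1, 0.2, 0.3, 0.4, 5.0, float("nan")]
demo_X = np.arange(12).reshape(6, 2)
summary = summarize_scores(demo_scores, demo_X)
```

Counting non-finite scores separately keeps a single NaN from contaminating the mean, median, and percentile entries.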
- gpclarity.data_influence.plot_influence(X_train, influence_scores, ax=None, **scatter_kwargs) → plt.Axes
Visualize data point influence.
Delegates to gpclarity.plotting.plot_influence_map.
- Parameters:
X_train (np.ndarray) – Training input locations
influence_scores (np.ndarray or InfluenceResult) – Computed scores or InfluenceResult
ax (plt.Axes, optional) – Matplotlib axes. Created if None.
scatter_kwargs – Additional arguments passed to ax.scatter()
- Returns:
Matplotlib axes object
- Return type:
plt.Axes
- Raises:
ImportError – If matplotlib not installed
ValueError – If input dimensions > 2
Example:
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    influence.plot_influence(X_train, result, ax=ax, s=100, alpha=0.6, cmap='hot')
    plt.show()