Data Influence

The data_influence module identifies the training points that have the greatest impact on the model's predictions. It computes leverage scores with an O(n³) Cholesky solve, and it computes exact leave-one-out (LOO) variance increases, which can optionally be parallelised with joblib. High-leverage points exert the most pull on the kernel hyperparameters. Points with a large LOO-variance increase are informative, but the remaining data cannot easily interpolate around them.
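For exact GP regression, both quantities have a closed form. The following is a minimal numpy sketch, not gpclarity's implementation. It assumes a squared-exponential kernel with fixed hyperparameters and Gaussian noise variance `noise_var`, and the helper names `rbf_kernel` and `leverage_and_loo` are hypothetical. Factor A = K + σ²I as LLᵀ. The leverage of point i is then the i-th diagonal entry of the hat matrix H = KA⁻¹ = I − σ²A⁻¹, and the exact LOO predictive variance is 1/[A⁻¹]ᵢᵢ. As an assumption, the "variance increase" is taken here to be that LOO variance minus the full-data predictive variance at xᵢ.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-||x - x'||^2 / (2 * lengthscale^2))
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def leverage_and_loo(X, noise_var=0.1, **kern_kwargs):
    """Hypothetical helper: leverage scores and exact LOO variance increases."""
    n = X.shape[0]
    K = rbf_kernel(X, X, **kern_kwargs)
    A = K + noise_var * np.eye(n)

    # O(n^3) Cholesky factorisation A = L L^T, then A^{-1} = L^{-T} L^{-1}
    L = np.linalg.cholesky(A)
    L_inv = np.linalg.solve(L, np.eye(n))
    A_inv = L_inv.T @ L_inv
    a_diag = np.diag(A_inv)

    # Leverage = diagonal of the hat matrix H = K A^{-1} = I - noise_var * A^{-1}
    leverage = 1.0 - noise_var * a_diag

    # Exact LOO predictive variance of y_i, read off the full-data inverse
    loo_var = 1.0 / a_diag

    # Full-data predictive variance at x_i: latent variance (noise_var * h_ii) plus noise
    full_var = noise_var * leverage + noise_var

    return leverage, loo_var, loo_var - full_var


X = np.linspace(0.0, 5.0, 8).reshape(-1, 1)
lev, loo_var, increase = leverage_and_loo(X, noise_var=0.1)
print("highest leverage:", np.argmax(lev), "largest LOO increase:", np.argmax(increase))
```

A single O(n³) factorisation yields every LOO variance at once, so no point has to be removed and refitted. Under this sketch's assumptions, leverage depends only on the inputs X and not on the targets.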

When to use: to find outliers that are distorting the fit, remove redundant training points, or understand which observations are driving predictions in a given region.

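import numpy as np
import gpclarity

# Assumes `model` is an already-trained GP model and `X_train` holds its training inputs.
influence = gpclarity.DataInfluenceMap(model)
result = influence.compute_influence_scores(X_train)

# Index of the single most influential training point
top_idx = np.argmax(result.scores)
print(f"Most influential point: index {top_idx}, score {result.scores[top_idx]:.4f}")

# Human-readable summary of the influence analysis
report = influence.get_influence_report(X_train)
print(report["summary"])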
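After per-point scores are computed, acting on them is ordinary array work. The following sketch assumes that `result.scores` is a one-dimensional array with one score per training point, as the example above implies. The helper `split_by_influence` is hypothetical and not part of gpclarity. So is the rule of treating the k lowest-scoring points as redundant.

```python
import numpy as np


def split_by_influence(scores, k=3):
    """Hypothetical helper: top-k influential indices and a mask dropping the k least influential."""
    scores = np.asarray(scores, dtype=float)
    k = max(0, min(k, scores.size))  # clamp k to a valid range
    order = np.argsort(scores)[::-1]  # indices from most to least influential
    top_k = order[:k]  # candidates to inspect as possible outliers
    keep_mask = np.ones(scores.size, dtype=bool)
    keep_mask[order[scores.size - k:]] = False  # drop the k lowest-scoring points
    return top_k, keep_mask
```

For example, `top_idx, keep = split_by_influence(result.scores, k=5)` followed by `X_train[keep]` gives a pruned training set to refit on. Compare predictions before and after refitting to check that the dropped points really were redundant.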


Classes