Data Influence
The data_influence module identifies which training points have the greatest
impact on the model’s predictions. It computes leverage scores with an O(n³)
Cholesky solve and exact leave-one-out (LOO) variance increases, optionally
parallelized with joblib. High-leverage points drive the kernel
hyperparameters; points with a high LOO variance increase are informative but
hard to interpolate around.
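To make the two quantities concrete, here is a minimal NumPy sketch of one common way to define them for a GP with fixed hyperparameters. It is not gpclarity's implementation: the RBF kernel, the `noise_var` value, and the helper names `rbf_kernel` and `influence_sketch` are all assumptions made for illustration. Leverage is taken as the diagonal of the smoother matrix K(K + σ²I)⁻¹, and the LOO variance increase as the rise in latent posterior variance at a training input when that point is held out, using the closed-form identity based on the diagonal of (K + σ²I)⁻¹.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix for the rows of X (illustrative choice)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def influence_sketch(X, noise_var=0.1):
    """Return (leverage, loo_variance_increase) for each training input.

    Hypothetical helper: hyperparameters are held fixed, so this measures
    influence on predictions only, not on hyperparameter estimates.
    """
    n = X.shape[0]
    K = rbf_kernel(X)
    K_y = K + noise_var * np.eye(n)

    # Cholesky factorization is the O(n^3) step.
    L = np.linalg.cholesky(K_y)
    # K_y^{-1} from the factor. np.linalg.solve is a general solver;
    # scipy.linalg.solve_triangular would exploit the triangular structure.
    K_y_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    inv_diag = np.diag(K_y_inv)

    # diag(K K_y^{-1}) = 1 - noise_var * diag(K_y^{-1})
    leverage = 1.0 - noise_var * inv_diag

    # Latent posterior variance at x_i using all points: noise_var * leverage_i
    full_var = noise_var * leverage
    # Latent posterior variance at x_i with point i held out (closed-form LOO)
    loo_var = 1.0 / inv_diag - noise_var

    return leverage, loo_var - full_var


rng = np.random.default_rng(0)
X_demo = rng.uniform(-3.0, 3.0, size=(8, 1))
leverage, loo_increase = influence_sketch(X_demo, noise_var=0.1)
print(np.round(leverage, 3))
print(np.round(loo_increase, 3))
```

The closed form avoids refitting n separate models, which is why the cost stays at a single O(n³) factorization. A library that also re-optimizes hyperparameters per held-out point would need a different, more expensive procedure.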
When to use: to find outliers that distort the fit, to remove redundant training points, or to understand which observations drive predictions in a given region.
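import gpclarity
import numpy as np

# model: the GP model being analyzed; X_train: its training inputs (both defined elsewhere)
influence = gpclarity.DataInfluenceMap(model)
result = influence.compute_influence_scores(X_train)

# Index of the single highest-scoring training point
top_idx = np.argmax(result.scores)
print(f"Most influential point: index {top_idx}, score {result.scores[top_idx]:.4f}")

# Summary report over the same training inputs
report = influence.get_influence_report(X_train)
print(report["summary"])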
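
The snippet above reports only the single highest score. For the outlier and redundancy use cases, a ranked list is often more useful. The sketch below uses plain NumPy on a stand-in `scores` array, since it assumes (without confirmation from the library) that `result.scores` is a one-dimensional array in which larger values mean more influence. The values and the cutoff `k` are made up for illustration.

```python
import numpy as np

# Stand-in for result.scores; real values would come from compute_influence_scores.
scores = np.array([0.12, 0.87, 0.05, 0.43, 0.91, 0.02])

k = 3  # hypothetical cutoff
order = np.argsort(scores)       # indices sorted by ascending score
top_k = order[::-1][:k]          # k highest-scoring points, best first
bottom_k = order[:k]             # k lowest-scoring points, lowest first

print("Candidates to inspect as outliers or key observations:")
for idx in top_k:
    print(f"  index {idx}: score {scores[idx]:.4f}")

print("Candidates to review as possibly redundant:")
for idx in bottom_k:
    print(f"  index {idx}: score {scores[idx]:.4f}")
```

A low score alone does not prove a point is redundant; it is a prompt to check whether nearby points already cover that region.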