Evaluating metrics
Although the probabilistic models implemented in unlockNN use the Keras API, Keras currently has poor support for models that predict distributions rather than tensors. In particular, the keras.metrics API is often incompatible with such models. unlockNN therefore provides alternative utilities for computing metrics, including several metrics specific to uncertainty quantification.
The main interface for computing metrics is the evaluate_uq_metrics() function:
- unlocknn.metrics.evaluate_uq_metrics(prob_model: ProbNN, test_inputs: List[Union[Structure, Dict[str, Union[ndarray, List[Union[int, float]]]]]], test_targets: List[Union[float, ndarray]], metrics: List[str] = ['nll', 'sharpness', 'variation', 'mae', 'mse', 'rmse']) → Dict[str, float]
Evaluate probabilistic model metrics.
- Parameters
prob_model – The probabilistic model to evaluate.
test_inputs – The input structures or graphs.
test_targets – The target values for the structures.
metrics – A list of metrics to compute. Defaults to computing all of the currently implemented metrics.
Currently implemented metrics are given in AVAILABLE_METRICS.
- Returns
Dictionary of {metric_name: value}.
Example
Compute the metrics of the example MEGNetProbModel for predicting binary compounds’ formation energies:
>>> from unlocknn.download import load_data, load_pretrained
>>> from unlocknn.metrics import evaluate_uq_metrics
>>> binary_model = load_pretrained("binary_e_form")
>>> binary_data = load_data("binary_e_form")
>>> metrics = evaluate_uq_metrics(
...     binary_model, binary_data["structure"], binary_data["formation_energy_per_atom"]
... )
>>> for metric_name, value in metrics.items():
...     print(f"{metric_name} = {value:.3f}")
nll = -8922.768
sharpness = 0.032
variation = 0.514
mae = 0.027
mse = 0.002
rmse = 0.041
- unlocknn.metrics.AVAILABLE_METRICS: Dict[str, Callable[[List[Union[float, ndarray]], List[Union[float, ndarray]], List[Union[float, ndarray]]], float]] = {'mae': <unlocknn.metrics.MeanErrorMetric object>, 'mse': <unlocknn.metrics.MeanErrorMetric object>, 'nll': <function neg_log_likelihood>, 'rmse': <unlocknn.metrics.MeanErrorMetric object>, 'sharpness': <function sharpness>, 'variation': <function variation>}
Indicates the mapping between a metric name (a potential argument to evaluate_uq_metrics()) and the function it calls.
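The name-to-callable mapping described here can be illustrated with a small standalone sketch. This is not unlockNN's implementation: EXAMPLE_METRICS, evaluate, and the two metric functions are hypothetical names, and plain Python floats stand in for NumPy arrays. The only thing taken from the signature above is that every metric accepts (predictions, stddevs, true_vals) and returns a float.

```python
import math
from typing import Callable, Dict, List, Sequence

# Shared call signature: (predictions, stddevs, true_vals) -> float.
MetricFn = Callable[[Sequence[float], Sequence[float], Sequence[float]], float]


def mean_absolute_error(
    predictions: Sequence[float], stddevs: Sequence[float], true_vals: Sequence[float]
) -> float:
    # stddevs is unused; it is accepted so that all metrics share one signature.
    return sum(abs(p - t) for p, t in zip(predictions, true_vals)) / len(true_vals)


def root_mean_squared_error(
    predictions: Sequence[float], stddevs: Sequence[float], true_vals: Sequence[float]
) -> float:
    # stddevs is unused here as well.
    mse = sum((p - t) ** 2 for p, t in zip(predictions, true_vals)) / len(true_vals)
    return math.sqrt(mse)


# Hypothetical registry, analogous in shape to a Dict[str, Callable[..., float]].
EXAMPLE_METRICS: Dict[str, MetricFn] = {
    "mae": mean_absolute_error,
    "rmse": root_mean_squared_error,
}


def evaluate(
    names: List[str],
    predictions: Sequence[float],
    stddevs: Sequence[float],
    true_vals: Sequence[float],
) -> Dict[str, float]:
    # Look up each requested metric by name and call it with the same arguments.
    return {name: EXAMPLE_METRICS[name](predictions, stddevs, true_vals) for name in names}
```

Because every entry has the same signature, a caller can request any subset of metrics by name without special-casing those that ignore the standard deviations.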
Uncertainty quantifier-specific metrics
- unlocknn.metrics.neg_log_likelihood(predictions: List[Union[float, ndarray]], stddevs: List[Union[float, ndarray]], true_vals: List[Union[float, ndarray]]) → float
Calculate the negative log likelihood (NLL) of true values given predictions.
NLL is given by
\[\mathrm{NLL} = -\sum_i \log p_i(y_i),\]
where \(y_i\) is the \(i^\mathrm{th}\) observed (true) value and \(p_i\) is the probability density function of the \(i^\mathrm{th}\) predicted Gaussian distribution:
\[p_i(y) = \mathcal{N} \left( y \mid \hat{y}_i, \sigma_i^2 \right),\]
where \(\hat{y}_i\) is the \(i^\mathrm{th}\) predicted mean and \(\sigma_i\) is the \(i^\mathrm{th}\) predicted standard deviation.
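The NLL formula can be checked by hand with a minimal pure-Python sketch. It assumes scalar predictions and strictly positive standard deviations. The function name gaussian_nll is illustrative and is not the unlockNN implementation.

```python
import math
from typing import Sequence


def gaussian_nll(
    predictions: Sequence[float], stddevs: Sequence[float], true_vals: Sequence[float]
) -> float:
    """Sum of negative Gaussian log densities of true_vals under the predictions."""
    total = 0.0
    for mean, std, y in zip(predictions, stddevs, true_vals):
        # log N(y | mean, std^2) = -0.5 * log(2 * pi * std^2) - (y - mean)^2 / (2 * std^2)
        log_density = -0.5 * math.log(2.0 * math.pi * std**2) - (y - mean) ** 2 / (2.0 * std**2)
        total -= log_density
    return total
```

Since a probability density can exceed 1 when the standard deviation is small, each term can be negative, so a negative total NLL is possible for confident, accurate predictions.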
- unlocknn.metrics.sharpness(predictions: Optional[List[Union[float, ndarray]]], stddevs: List[Union[float, ndarray]], true_vals: Optional[List[Union[float, ndarray]]]) → float
Calculate the sharpness of predictions.
Sharpness is the root mean square of the predicted standard deviations.
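As a rough illustration of this definition, the following sketch computes the root mean square of a list of scalar standard deviations. The name rms_sharpness is hypothetical and the code is not taken from unlockNN.

```python
import math
from typing import Sequence


def rms_sharpness(stddevs: Sequence[float]) -> float:
    """Root mean square of the predicted standard deviations."""
    # Mean of the squared standard deviations (the mean predicted variance).
    mean_square = sum(s**2 for s in stddevs) / len(stddevs)
    return math.sqrt(mean_square)
```

The result has the same units as the standard deviations themselves. Smaller values indicate narrower predictive distributions.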
- unlocknn.metrics.variation(predictions: Optional[List[Union[float, ndarray]]], stddevs: List[Union[float, ndarray]], true_vals: Optional[List[Union[float, ndarray]]]) → float
Calculate the coefficient of variation of predictions.
Indicates dispersion of uncertainty estimates.
Let \(\sigma_i\) be the \(i^\mathrm{th}\) predicted standard deviation, \(\bar{\sigma}\) be the mean of the predicted standard deviations and \(N\) be the number of predictions. The coefficient of variation is given by:
\[C_v = \frac{1}{\bar{\sigma}}\sqrt{\frac{\sum_{i=1}^{N}{(\sigma_i - \bar{\sigma})^2}}{N - 1}}\]
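The formula above amounts to the sample standard deviation of the predicted standard deviations divided by their mean. A minimal sketch, assuming at least two scalar standard deviations with a non-zero mean, and using the illustrative name coefficient_of_variation:

```python
import math
from typing import Sequence


def coefficient_of_variation(stddevs: Sequence[float]) -> float:
    """Sample standard deviation of stddevs divided by their mean."""
    n = len(stddevs)
    mean_std = sum(stddevs) / n
    # Sample variance with the N - 1 denominator, as in the formula above.
    sample_variance = sum((s - mean_std) ** 2 for s in stddevs) / (n - 1)
    return math.sqrt(sample_variance) / mean_std
```

A value of zero means that every prediction was assigned the same uncertainty. Larger values mean that the uncertainty estimates differ more from one prediction to another.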