Evaluating metrics

Although the probabilistic models implemented in unlockNN use the Keras API, Keras currently has poor support for models that predict distributions rather than tensors. In particular, the keras.metrics API is often incompatible with such models. unlockNN therefore provides alternative utilities for computing metrics, including several metrics specific to uncertainty quantification.

The main interface for computing metrics is the evaluate_uq_metrics() function:

unlocknn.metrics.evaluate_uq_metrics(prob_model: ProbNN, test_inputs: List[Union[Structure, Dict[str, Union[ndarray, List[Union[int, float]]]]]], test_targets: List[Union[float, ndarray]], metrics: List[str] = ['nll', 'sharpness', 'variation', 'mae', 'mse', 'rmse']) → Dict[str, float]

Evaluate probabilistic model metrics.

Parameters

prob_model – The probabilistic model to evaluate.
test_inputs – The input structures or graphs.
test_targets – The target values for the structures.
metrics – A list of metrics to compute. Defaults to computing all of the currently implemented metrics.

Currently implemented metrics are given in AVAILABLE_METRICS.

Returns: Dictionary of {metric_name: value}.

Example

Compute the metrics of the example MEGNetProbModel for predicting binary compounds’ formation energies:

>>> from unlocknn.download import load_data, load_pretrained
>>> from unlocknn.metrics import evaluate_uq_metrics
>>> binary_model = load_pretrained("binary_e_form")
>>> binary_data = load_data("binary_e_form")
>>> metrics = evaluate_uq_metrics(
...     binary_model, binary_data["structure"], binary_data["formation_energy_per_atom"]
... )
>>> for metric_name, value in metrics.items():
...     print(f"{metric_name} = {value:.3f}")
nll = -8922.768
sharpness = 0.032
variation = 0.514
mae = 0.027
mse = 0.002
rmse = 0.041

unlocknn.metrics.AVAILABLE_METRICS: Dict[str, Callable[[List[Union[float, ndarray]], List[Union[float, ndarray]], List[Union[float, ndarray]]], float]] = {'mae': <unlocknn.metrics.MeanErrorMetric object>, 'mse': <unlocknn.metrics.MeanErrorMetric object>, 'nll': <function neg_log_likelihood>, 'rmse': <unlocknn.metrics.MeanErrorMetric object>, 'sharpness': <function sharpness>, 'variation': <function variation>}

Mapping from each metric name (a valid entry in the metrics argument of evaluate_uq_metrics()) to the function it calls.
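
The name-to-callable pattern can be illustrated with a toy stand-in. The sketch below is not unlockNN's code: TOY_METRICS, evaluate, mae and rmse are hypothetical names defined here purely for illustration, and the only detail taken from the documentation is the shared (predictions, stddevs, true_vals) call signature.

```python
import math
from typing import Callable, Dict, List, Sequence

# Every metric shares one signature, so a dispatcher can call them uniformly.
Metric = Callable[[Sequence[float], Sequence[float], Sequence[float]], float]


def mae(predictions: Sequence[float], stddevs: Sequence[float], true_vals: Sequence[float]) -> float:
    """Mean absolute error; the stddevs argument is accepted but unused."""
    return sum(abs(p - t) for p, t in zip(predictions, true_vals)) / len(true_vals)


def rmse(predictions: Sequence[float], stddevs: Sequence[float], true_vals: Sequence[float]) -> float:
    """Root mean squared error; the stddevs argument is accepted but unused."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, true_vals)) / len(true_vals))


# Hypothetical registry, analogous in shape to a name -> function mapping.
TOY_METRICS: Dict[str, Metric] = {"mae": mae, "rmse": rmse}


def evaluate(
    predictions: Sequence[float],
    stddevs: Sequence[float],
    true_vals: Sequence[float],
    metrics: List[str],
) -> Dict[str, float]:
    """Look up each requested metric by name and return {metric_name: value}."""
    return {name: TOY_METRICS[name](predictions, stddevs, true_vals) for name in metrics}


results = evaluate([1.0, 2.0], [0.1, 0.1], [1.5, 2.5], ["mae", "rmse"])
```

Keeping a uniform signature is what lets metrics that ignore some arguments (such as error metrics that do not need the standard deviations) sit in the same mapping as those that use all three.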

Uncertainty quantifier-specific metrics

unlocknn.metrics.neg_log_likelihood(predictions: List[Union[float, ndarray]], stddevs: List[Union[float, ndarray]], true_vals: List[Union[float, ndarray]]) → float

Calculate the negative log likelihood (NLL) of true values given predictions.

NLL is given by

\[\mathrm{NLL} = -\sum_i \log p_i(y_i),\]

where \(y_i\) is the \(i^\mathrm{th}\) observed (true) value and \(p_i\) is the probability density function for the \(i^\mathrm{th}\) predicted Gaussian distribution:

\[p_i \sim \mathcal{N} \left( \hat{y}_i, \sigma_i^2 \right),\]

where \(\hat{y}_i\) is the \(i^\mathrm{th}\) predicted mean and \(\sigma_i\) is the \(i^\mathrm{th}\) predicted standard deviation.
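
For a Gaussian, each term has the closed form \(-\log p_i(y_i) = \tfrac{1}{2}\log(2\pi\sigma_i^2) + (y_i - \hat{y}_i)^2 / (2\sigma_i^2)\). A minimal standard-library sketch of the summed NLL follows; gaussian_nll is a hypothetical helper written for illustration, assumes scalar values, and is not necessarily how neg_log_likelihood() is implemented.

```python
import math
from typing import Sequence


def gaussian_nll(
    predictions: Sequence[float],
    stddevs: Sequence[float],
    true_vals: Sequence[float],
) -> float:
    """Sum of negative Gaussian log densities of the true values."""
    total = 0.0
    for mu, sigma, y in zip(predictions, stddevs, true_vals):
        # Log density of N(mu, sigma^2) evaluated at y.
        log_pdf = -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
        total -= log_pdf
    return total


wide = gaussian_nll([0.0], [1.0], [0.0])    # accurate but uncertain prediction
narrow = gaussian_nll([0.0], [0.1], [0.0])  # accurate and confident prediction
```

Because a probability density can exceed 1 when \(\sigma_i\) is small, individual terms (and hence the sum) can be negative, which is why a large negative NLL such as the one in the example output above is possible.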

unlocknn.metrics.sharpness(predictions: Optional[List[Union[float, ndarray]]], stddevs: List[Union[float, ndarray]], true_vals: Optional[List[Union[float, ndarray]]]) → float

Calculate the sharpness of predictions.

Sharpness is the root mean square (RMS) of the predicted standard deviations. It depends only on stddevs.
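
As a worked illustration of that definition, the root mean square of a list of scalar standard deviations can be computed as below. rms_sharpness is a hypothetical name used only for this sketch, which assumes scalar inputs and is not the library's implementation.

```python
import math
from typing import Sequence


def rms_sharpness(stddevs: Sequence[float]) -> float:
    """Square root of the mean of the squared standard deviations."""
    mean_square = sum(s**2 for s in stddevs) / len(stddevs)
    return math.sqrt(mean_square)


value = rms_sharpness([3.0, 4.0])  # sqrt((9 + 16) / 2)
```

Smaller values correspond to narrower predicted distributions, regardless of whether those predictions are accurate.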

unlocknn.metrics.variation(predictions: Optional[List[Union[float, ndarray]]], stddevs: List[Union[float, ndarray]], true_vals: Optional[List[Union[float, ndarray]]]) → float

Calculate the coefficient of variation of predictions.

Indicates dispersion of uncertainty estimates.

Let \(\sigma_i\) be the \(i^\mathrm{th}\) predicted standard deviation, \(\bar{\sigma}\) be the mean of the predicted standard deviations and \(N\) be the number of predictions. The coefficient of variation is given by:

\[C_v = \frac{1}{\bar{\sigma}}\sqrt{\frac{\sum_{i=1}^N{(\sigma_i - \bar{\sigma})^2}}{N - 1}}\]