Quickstart for MEGNet users

UnlockNN currently provides tools for adding uncertainty quantification to a MEGNetModel with minimal overhead. After installing unlockNN (see Installation), you can apply them to any trained MEGNetModel. See also the example scripts on GitHub.

This document demonstrates how to add uncertainty quantification to MEGNet’s pre-trained formation energy model. UnlockNN’s method for adding this uncertainty quantification is explained in Probabilistic models. The essential steps are:

1. Loading/training a MEGNetModel.
2. Initializing a MEGNetProbModel.
3. A first run of training using MEGNetProbModel.train().
4. Fine tuning the model: unfreezing its "NN" layers, using MEGNetProbModel.set_frozen(), then training again.
5. Saving the model using MEGNetProbModel.save().

The model can then be reloaded using MEGNetProbModel.load().

To train the uncertainty quantifier, we will use an example dataset of binary compounds that lie on the convex hull, downloaded from the Materials Project. This example is also available in notebook format on unlockNN’s GitHub page.

Running this example script takes approximately 15 minutes on a desktop computer with an Nvidia GTX 1080 GPU, not including the time it takes to download the data.

"""Add uncertainty quantification to a MEGNetModel for predicting formation energies."""
from pathlib import Path

from megnet.models import MEGNetModel
from unlocknn.download import load_data
from unlocknn.model import MEGNetProbModel

TRAINING_RATIO: float = 0.8
NUM_INDUCING_POINTS: int = 500  # Number of inducing index points for VGP
BATCH_SIZE: int = 128
MODEL_SAVE_DIR: Path = Path("binary_e_form_example")

# Data preprocessing:
# Load binary compounds' formation energies example data,
# then split into training and validation subsets.
full_df = load_data("binary_e_form")
num_training = int(TRAINING_RATIO * len(full_df.index))
train_df = full_df[:num_training]
val_df = full_df[num_training:]
# 4217 training samples, 1055 validation samples.

train_structs = train_df["structure"]
val_structs = val_df["structure"]
train_targets = train_df["formation_energy_per_atom"]
val_targets = val_df["formation_energy_per_atom"]

# 1. Load MEGNetModel
meg_model = MEGNetModel.from_mvl_models("Eform_MP_2019")

# 2. Make probabilistic model
# Specify Kullback-Leibler divergence weighting in loss function:
kl_weight = BATCH_SIZE / num_training
# Then make the model:
prob_model = MEGNetProbModel(
    meg_model=meg_model,
    num_inducing_points=NUM_INDUCING_POINTS,
    kl_weight=kl_weight,
)


def train_model():
    """Train and save the probabilistic model."""
    prob_model.train(
        train_structs,
        train_targets,
        epochs=50,
        val_inputs=val_structs,
        val_targets=val_targets,
    )
    prob_model.save(MODEL_SAVE_DIR)


# 3. First training run: approximate the correct inducing point locations
train_model()
# 4. Unfreeze NN layers and train again for fine tuning
prob_model.set_frozen("NN", freeze=False)
train_model()
# 5. ``train_model`` also handles saving.

# We can then load the model from disk and perform some predictions
loaded_model = MEGNetProbModel.load(MODEL_SAVE_DIR)
example_struct, example_energy = train_structs[0], train_targets[0]
predicted, stddev = loaded_model.predict(example_struct)
# Two standard deviations correspond to an approximately 95% confidence interval
print(f"{example_struct.composition}: ")
print(f"Predicted E_f: {predicted.item():.3f} ± {stddev.item() * 2:.3f} eV")
print(f"Actual E_f: {example_energy:.3f} eV")
"""La2 Rh2: 
Predicted E_f: -0.739 ± 0.063 eV
Actual E_f: -0.737 eV"""
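
The numbers that appear in the script can be checked by hand. The following is a minimal, standalone sketch that does not use unlockNN or MEGNet at all. It assumes a total of 5272 samples (inferred from the 4217 + 1055 split noted in the script) and a Gaussian predictive distribution. The helper names `split_sizes`, `two_sigma_interval` and `gaussian_coverage` are hypothetical and exist only for this illustration.

```python
"""Sanity-check the arithmetic used in the quickstart script (illustrative only)."""
import math
from typing import Tuple


def split_sizes(num_samples: int, training_ratio: float) -> Tuple[int, int]:
    """Return (training, validation) sizes for a leading-slice split."""
    num_training = int(training_ratio * num_samples)
    return num_training, num_samples - num_training


def two_sigma_interval(mean: float, stddev: float) -> Tuple[float, float]:
    """Return the (lower, upper) bounds of mean +/- 2 * stddev."""
    return mean - 2 * stddev, mean + 2 * stddev


def gaussian_coverage(num_sigma: float) -> float:
    """Return the probability mass of a normal distribution within +/- num_sigma."""
    return math.erf(num_sigma / math.sqrt(2))


# Assumption: 5272 total samples, inferred from the split sizes in the script.
num_training, num_validation = split_sizes(5272, 0.8)
# Same formula as the script: BATCH_SIZE / num_training.
kl_weight = 128 / num_training

# The script prints 2 * stddev = 0.063 eV, so stddev is roughly 0.0315 eV.
lower, upper = two_sigma_interval(mean=-0.739, stddev=0.063 / 2)
actual = -0.737

print(f"Split: {num_training} training, {num_validation} validation")
print(f"kl_weight: {kl_weight:.4f}")
print(f"Two-sigma interval: [{lower:.3f}, {upper:.3f}] eV")
print(f"Actual value inside interval: {lower <= actual <= upper}")
print(f"Gaussian coverage of two sigma: {gaussian_coverage(2):.4f}")
```

For the example output above, the actual formation energy lies well inside the two-sigma interval. Under the Gaussian assumption, two standard deviations cover slightly more than 95% of the probability mass, which is why the script treats them as a 95% confidence interval.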