NEP interface

Training IO

calorine provides a number of functions for preparing input files for training NEP models, most notably the setup_training function. There are also several functions for analyzing the training process, e.g., read_loss, read_structures, and get_parity_data.

calorine.nep.get_parity_data(structures, property, selection=None, flatten=True)

Returns the predicted and target energies, forces, virials or stresses from a list of structures in a format suitable for generating parity plots.

The structures should have been read using read_structures, such that the info object is populated with keys of the form <property>_<type> where <property> is, e.g., energy or force and <type> is one of predicted or target.

The resulting parity data is returned as a DataFrame containing the predicted and target values.

Parameters:
  • structures (list[Atoms]) – List of structures as read with read_structures.

  • property (str) – One of energy, force, virial, stress, bec, dipole, polarizability, or atomic_v.

  • selection (list[str]) – Components to return, and/or the norm. Possible values are x, y, z, xx, yy, zz, yz, xz, xy, norm, and pressure.

  • flatten (bool) – If True, return flattened lists; this is useful for flattening the components of forces or virials into a simple list.

Return type:

DataFrame
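With flatten=True, get_parity_data turns per-atom force components into one flat list, which makes it straightforward to compute error metrics. The following is a minimal, standard-library-only sketch of that flattening step and of a root-mean-square error on hypothetical force values; it does not call calorine, and the helper names flatten_components and rmse are illustrative.

```python
import math


def flatten_components(vectors):
    """Flatten per-atom 3-vectors (e.g. forces) into a single list of components."""
    return [component for vector in vectors for component in vector]


def rmse(predicted, target):
    """Root-mean-square error between two sequences of equal length."""
    if len(predicted) != len(target) or not target:
        raise ValueError('predicted and target must be non-empty and of equal length')
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


# Hypothetical predicted and target forces (eV/Å) for a two-atom structure.
force_predicted = [[0.1, -0.2, 0.0], [-0.1, 0.2, 0.0]]
force_target = [[0.1, -0.1, 0.0], [-0.1, 0.1, 0.0]]

predicted_flat = flatten_components(force_predicted)
target_flat = flatten_components(force_target)
print(f'force RMSE: {rmse(predicted_flat, target_flat):.4f} eV/Å')
```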

calorine.nep.read_loss(filename)

Parses a file in loss.out format from GPUMD and returns the content as a data frame. More information concerning the file format, content, and units can be found in the GPUMD documentation.

Parameters:

filename (str) – input file name

Return type:

DataFrame

calorine.nep.read_nepfile(filename)

Returns the content of a configuration file (nep.in) as a dictionary.

Parameters:

filename (str) – input file name

Return type:

dict[str, Any]
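A nep.in file consists of lines made up of a keyword followed by one or more values (e.g., type 2 Si C). The sketch below is a hypothetical stand-in for read_nepfile that only illustrates this structure; it assumes that lines starting with # are comments and keeps all values as strings, which may differ from how calorine converts them.

```python
def parse_nep_in(text):
    """Parse nep.in-style text into a dict mapping keyword -> list of value tokens.

    Hypothetical helper: assumes one 'keyword value1 value2 ...' entry per line
    and treats lines starting with '#' as comments.
    """
    parameters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        keyword, *values = line.split()
        parameters[keyword] = values
    return parameters


example = """
type 2 Si C
cutoff 8 4
n_max 4 4
generation 100000
"""
params = parse_nep_in(example)
print(params['type'])    # ['2', 'Si', 'C']
print(params['cutoff'])  # ['8', '4']
```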

calorine.nep.read_structures(dirname)

Parses the output files with training and test data from a nep run and returns their content as two lists of structures, representing training and test data, respectively. Target and predicted data are included in the info dict of the Atoms objects.

Parameters:

dirname (str) – Directory from which to read output files.

Return type:

tuple[list[Atoms], list[Atoms]]

calorine.nep.setup_training(parameters, structures, enforced_structures=[], rootdir='.', mode='kfold', n_splits=None, train_fraction=None, seed=42, overwrite=False)

Sets up the input files for training a NEP via the nep executable of the GPUMD package.

Parameters:
  • parameters (NamedTuple) – dictionary containing the parameters to be set in the nep.in file; see the GPUMD documentation for an overview of these parameters

  • structures (List[Atoms]) – list of structures to be included

  • enforced_structures (List[int]) – structures that must be included in the training set, provided as a list of indices into the structures parameter

  • rootdir (str) – root directory in which to create the input files

  • mode (str) – how the train-test split is performed. Options: 'kfold' and 'bagging'

  • n_splits (int) – number of train-test splits of the input structures to perform; by default no split is performed and all input structures are used for training

  • train_fraction (float) – fraction of structures to use for training when mode 'bagging' is used

  • seed (int) – random number generator seed to be used; this ensures reproducibility

  • overwrite (bool) – if True overwrite the content of rootdir if it exists

Return type:

None
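The two split modes of setup_training can be illustrated with a small index-splitting helper. This is a hypothetical sketch, not calorine's implementation: it assumes that k-fold mode partitions the non-enforced structures into n_splits disjoint test folds, that bagging draws a train_fraction share of them without replacement for each split, and that enforced structures always go into the training set.

```python
import random


def split_indices(n_structures, mode='kfold', n_splits=5, train_fraction=0.9,
                  enforced=(), seed=42):
    """Return a list of (train, test) index lists (hypothetical helper).

    Structures listed in `enforced` always end up in the training set.
    """
    rng = random.Random(seed)  # seeded RNG makes the splits reproducible
    enforced = set(enforced)
    free = [i for i in range(n_structures) if i not in enforced]
    splits = []
    if mode == 'kfold':
        rng.shuffle(free)
        # Distribute the shuffled free indices over n_splits disjoint folds.
        folds = [free[k::n_splits] for k in range(n_splits)]
        for fold in folds:
            test = sorted(fold)
            train = sorted(enforced | (set(free) - set(fold)))
            splits.append((train, test))
    elif mode == 'bagging':
        n_train = round(train_fraction * len(free))
        for _ in range(n_splits):
            # Assumption: each bag is drawn without replacement.
            chosen = set(rng.sample(free, n_train))
            splits.append((sorted(enforced | chosen), sorted(set(free) - chosen)))
    else:
        raise ValueError(f'unknown mode: {mode!r}')
    return splits


for train, test in split_indices(10, mode='kfold', n_splits=5, enforced=[0]):
    print(len(train), len(test))
```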

calorine.nep.write_nepfile(parameters, dirname)

Writes the parameter file (nep.in) for NEP construction.

Parameters:
  • parameters (NamedTuple) – input parameters; see the GPUMD documentation of the nep.in file

  • dirname (str) – directory in which to place input file and links

Return type:

None

calorine.nep.write_structures(outfile, structures)

Writes structures for training or testing in a format readable by the nep executable.

Parameters:
  • outfile (str) – output filename

  • structures (list[Atoms]) – list of structures with energy, forces, and (possibly) stresses

Return type:

None

Evaluating models

TNEP models allow one to represent tensorial properties such as the dipole moment, susceptibility, or polarizability. To test and analyze these models, calorine provides several specialized functions, which can also be used to implement extended Hamiltonians.

calorine.nep.get_dipole(structure, model_filename=None, debug=False)

Calculates the dipole for a given structure. A NEP model defined by a nep.txt file needs to be provided.

Parameters:
  • structure (Atoms) – Input structure

  • model_filename (Optional[str]) – Path to NEP model in nep.txt format. Defaults to None.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output. Defaults to False.

Return type:

ndarray

calorine.nep.get_dipole_gradient(structure, model_filename=None, backend='c++', method='central difference', displacement=0.01, charge=1.0, nep_command='nep', debug=False)

Calculates the dipole gradient for a given structure using finite differences. A NEP model defined by a nep.txt file needs to be provided.

Parameters:
  • structure (Atoms) – Input structure

  • model_filename (Optional[str]) – Path to NEP model in nep.txt format. Defaults to None.

  • backend (str) – Backend to use for computing dipole gradient with finite differences. One of 'c++' (CPU), 'python' (CPU) and 'nep' (GPU). Defaults to 'c++'.

  • method (str) – Method for computing the gradient with finite differences. One of 'forward difference' and 'central difference'. Defaults to 'central difference'.

  • displacement (float) – Displacement in Å to use for finite differences. Defaults to 0.01.

  • charge (float) – System charge in units of the elementary charge. Used for correcting the dipoles before computing the gradient. Defaults to 1.0.

  • nep_command (str) – Command for running the NEP executable. Defaults to 'nep'.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output (if applicable). Defaults to False.

Return type:

ndarray

Returns:

dipole gradient with shape (N, 3, 3)
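The two finite-difference schemes accepted by the method parameter can be sketched independently of GPUMD. The example below replaces the NEP dipole model with a toy point-charge dipole, \(\mu = \sum_i q_i \mathbf{r}_i\), for which the exact gradient of atom i is q_i times the identity matrix. The helper names and the index convention gradient[i][a][b] = d mu_b / d r_{i,a} are assumptions and may not match the axis order used by get_dipole_gradient.

```python
def point_charge_dipole(positions, charges):
    """Toy dipole model mu = sum_i q_i r_i, standing in for a NEP dipole model."""
    return [sum(q * r[axis] for q, r in zip(charges, positions)) for axis in range(3)]


def dipole_gradient(positions, dipole_fn, displacement=0.01,
                    method='central difference'):
    """Finite-difference gradient; gradient[i][a][b] = d mu_b / d r_{i,a}."""
    if method not in ('central difference', 'forward difference'):
        raise ValueError(f'unknown method: {method!r}')
    mu_0 = dipole_fn(positions) if method == 'forward difference' else None
    gradient = []
    for i in range(len(positions)):
        per_atom = []
        for a in range(3):
            plus = [list(r) for r in positions]
            plus[i][a] += displacement
            mu_plus = dipole_fn(plus)
            if method == 'central difference':
                minus = [list(r) for r in positions]
                minus[i][a] -= displacement
                mu_minus = dipole_fn(minus)
                row = [(p - m) / (2 * displacement) for p, m in zip(mu_plus, mu_minus)]
            else:
                row = [(p - m) / displacement for p, m in zip(mu_plus, mu_0)]
            per_atom.append(row)
        gradient.append(per_atom)
    return gradient


charges = [1.0, -1.0]
positions = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]
grad = dipole_gradient(positions, lambda pos: point_charge_dipole(pos, charges))
# For point charges, the exact gradient block of atom i is q_i times the identity.
```

Because this toy dipole is linear in the positions, both schemes recover the exact result; for a nonlinear NEP model the central difference is second-order accurate in the displacement, whereas the forward difference is only first-order accurate.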

calorine.nep.get_polarizability(structure, model_filename=None, debug=False)

Calculates the polarizability tensor for a given structure. A NEP model defined by a nep.txt file needs to be provided. The model must be trained to predict the polarizability.

Parameters:
  • structure (Atoms) – Input structure

  • model_filename (Optional[str]) – Path to NEP model in nep.txt format. Defaults to None.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output. Defaults to False.

Return type:

ndarray

calorine.nep.get_polarizability_gradient(structure, model_filename=None, displacement=0.01, component='full', debug=False)

Calculates the polarizability gradient for a given structure using finite differences. A NEP model defined by a nep.txt file needs to be provided. This function computes the derivatives using the second-order central difference method with a C++ backend.

Parameters:
  • structure (Atoms) – Input structure.

  • model_filename (Optional[str]) – Path to NEP model in nep.txt format. Defaults to None.

  • displacement (float) – Displacement in Å to use for finite differences. Defaults to 0.01.

  • component (Union[str, List[str]]) – Component or components of the polarizability tensor that the gradient should be computed for. The following components are available: x, y, z, full. Option full computes the derivative whilst moving the atoms in each Cartesian direction, which yields a tensor of shape (N, 3, 6). Multiple components may be specified. Defaults to full.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output (if applicable). Defaults to False.

Return type:

ndarray

Returns:

polarizability gradient with shape (N, C, 6) where C is the number of components chosen.

calorine.nep.get_potential_forces_and_virials(structure, model_filename=None, debug=False)

Calculates the per-atom potential, forces and virials for a given structure. A NEP model defined by a nep.txt file needs to be provided.

Parameters:
  • structure (Atoms) – Input structure

  • model_filename (Optional[str]) – Path to NEP model. Defaults to None.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output. Defaults to False.

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

  • potential with shape (natoms,)

  • forces with shape (natoms, 3)

  • virials with shape (natoms, 9)
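The per-atom arrays returned by get_potential_forces_and_virials can be reduced to totals by summation. The sketch below uses hypothetical numbers in place of real model output, and the helper name aggregate is illustrative; since the nine virial components are summed element-wise, it makes no assumption about their ordering. Because a NEP energy is invariant under a rigid translation of all atoms, the per-atom forces should sum to approximately zero, which is a useful sanity check.

```python
def aggregate(potentials, forces, virials):
    """Sum per-atom quantities into totals (hypothetical helper).

    potentials: natoms values; forces: natoms x 3; virials: natoms x 9.
    The 9 virial components are summed element-wise, independent of their order.
    """
    energy = sum(potentials)
    net_force = [sum(f[axis] for f in forces) for axis in range(3)]
    total_virial = [sum(v[k] for v in virials) for k in range(9)]
    return energy, net_force, total_virial


# Hypothetical per-atom output for a two-atom structure.
potentials = [-4.1, -4.3]
forces = [[0.2, 0.0, -0.1], [-0.2, 0.0, 0.1]]
virials = [[0.1] * 9, [0.3] * 9]
energy, net_force, total_virial = aggregate(potentials, forces, virials)
# Translational invariance implies the net force should (approximately) vanish.
```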

Inspecting NEP models

Once a model has been trained, it can be analyzed in more detail. To this end, there are functions for accessing the descriptors and the latent space, as well as for loading the entire model. The latter function (read_model) returns a Model object, which contains all information about the model. It is thereby possible not only to query but also to manipulate the model and write the result back to disk.

calorine.nep.get_descriptors(structure, model_filename, debug=False)

Calculates the NEP descriptors for a given structure. A NEP model defined by a nep.txt file needs to be provided; the returned descriptors are those specific to this model.

Parameters:
  • structure (Atoms) – Input structure

  • model_filename (str) – Path to NEP model in nep.txt format.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output. Defaults to False.

Return type:

ndarray

calorine.nep.get_latent_space(structure, model_filename=None, debug=False)

Calculates the latent space representation of a structure, i.e., the activations in the hidden layer. A NEP model defined by a nep.txt file needs to be provided.

Parameters:
  • structure (Atoms) – Input structure

  • model_filename (Optional[str]) – Path to NEP model. Defaults to None.

  • debug (bool) – Flag to toggle debug mode. Prints GPUMD output. Defaults to False.

Return type:

ndarray

calorine.nep.read_model(filename, restart_file=None)

Parses a file in nep.txt format and returns the content in the form of a Model object.

Parameters:
  • filename (str) – Input file name.

  • restart_file (str) – If provided, also read restart parameters from this file in nep.restart format and attach them to the returned model. Defaults to None.

Return type:

Model

NEP model class

class calorine.nep.model.Model(version, model_type, types, radial_cutoff, angular_cutoff, n_basis_radial, n_basis_angular, n_max_radial, n_max_angular, l_max_3b, l_max_4b, l_max_5b, has_q_112, has_q_123, has_q_233, has_q_134, n_descriptor_radial, n_descriptor_angular, n_neuron, n_parameters, n_descriptor_parameters, n_ann_parameters, ann_parameters, q_scaler, radial_descriptor_weights, angular_descriptor_weights, sqrt_epsilon_infinity=None, restart_parameters=None, zbl=None, zbl_typewise_cutoff_factor=None, max_neighbors_radial=None, max_neighbors_angular=None, radial_typewise_cutoff_factor=None, angular_typewise_cutoff_factor=None)

Objects of this class represent a NEP model in a form suitable for inspection and manipulation. Typically a Model object is instantiated by calling the read_model function.

version

NEP version.

Type:

int

model_type

One of potential, potential_with_charges, dipole, or polarizability.

Type:

str

types

Chemical species that this model represents.

Type:

tuple[str, …]

radial_cutoff

The radial cutoff parameter in Å. In the case of typewise cutoffs, this is a list of radial cutoffs ordered according to types.

Type:

float | list[float]

angular_cutoff

The angular cutoff parameter in Å. In the case of typewise cutoffs, this is a list of angular cutoffs ordered according to types.

Type:

float | list[float]

max_neighbors_radial

Maximum number of neighbors in neighbor list for radial terms.

Type:

int

max_neighbors_angular

Maximum number of neighbors in neighbor list for angular terms.

Type:

int

radial_typewise_cutoff_factor

The radial cutoff factor if use_typewise_cutoff is used.

Type:

float

angular_typewise_cutoff_factor

The angular cutoff factor if use_typewise_cutoff is used.

Type:

float

zbl

Inner and outer cutoff for transition to ZBL potential.

Type:

tuple[float, float]

zbl_typewise_cutoff_factor

Typewise cutoff factor used when use_typewise_cutoff_zbl is enabled.

Type:

float

n_basis_radial

Number of radial basis functions \(n_\mathrm{basis}^\mathrm{R}\).

Type:

int

n_basis_angular

Number of angular basis functions \(n_\mathrm{basis}^\mathrm{A}\).

Type:

int

n_max_radial

Maximum order of Chebyshev polynomials included in the radial expansion \(n_\mathrm{max}^\mathrm{R}\).

Type:

int

n_max_angular

Maximum order of Chebyshev polynomials included in the angular expansion \(n_\mathrm{max}^\mathrm{A}\).

Type:

int

l_max_3b

Maximum expansion order for three-body terms \(l_\mathrm{max}^\mathrm{3b}\).

Type:

int

l_max_4b

Maximum expansion order for four-body terms \(l_\mathrm{max}^\mathrm{4b}\).

Type:

int

l_max_5b

Maximum expansion order for five-body terms \(l_\mathrm{max}^\mathrm{5b}\).

Type:

int

has_q_112

Flag enabling the 5-body \(q_{112}\) descriptor (0 or 1).

Type:

int

has_q_123

Flag enabling the 5-body \(q_{123}\) descriptor (0 or 1).

Type:

int

has_q_233

Flag enabling the 5-body \(q_{233}\) descriptor (0 or 1).

Type:

int

has_q_134

Flag enabling the higher-body \(q_{134}\) descriptor (0 or 1).

Type:

int

n_descriptor_radial

Dimension of radial part of descriptor.

Type:

int

n_descriptor_angular

Dimension of angular part of descriptor.

Type:

int

n_neuron

Number of neurons in hidden layer.

Type:

int

n_parameters

Total number of parameters including scalers (which are not fit parameters).

Type:

int

n_descriptor_parameters

Number of parameters in descriptor.

Type:

int

n_ann_parameters

Number of neural network weights.

Type:

int

ann_parameters

Neural network weights.

Type:

dict[tuple[str, dict[str, np.ndarray]]]

q_scaler

Scaling parameters.

Type:

List[float]

radial_descriptor_weights

Radial descriptor weights by combination of species; the array for each combination has dimensions of \((n_\mathrm{max}^\mathrm{R}+1) \times (n_\mathrm{basis}^\mathrm{R}+1)\).

Type:

dict[tuple[str, str], np.ndarray]

angular_descriptor_weights

Angular descriptor weights by combination of species; the array for each combination has dimensions of \((n_\mathrm{max}^\mathrm{A}+1) \times (n_\mathrm{basis}^\mathrm{A}+1)\).

Type:

dict[tuple[str, str], np.ndarray]

sqrt_epsilon_infinity

Square root of epsilon infinity, \(\sqrt{\epsilon_\infty}\) (only for NEP models with charges).

Type:

Optional[float]

restart_parameters

NEP restart parameters. A nested dictionary that contains the mean (mu) and standard deviation (sigma) for the ANN and descriptor parameters. It is set using the read_restart method. Defaults to None.

Type:

dict[str, dict[str, dict[str, np.ndarray]]]

add_species(species, radial_cutoff=None, angular_cutoff=None, sigma_new=0.1, sigma_factor=0.1, sigma_floor=1e-06, seed=None)

Add one or more species to the model.

Returns a new Model with the requested species added. New ANN sub-networks and descriptor weight pairs are initialised by drawing mu uniformly from [-1, 1] (matching the GPUMD fresh-model initialisation), with sigma = sigma_new in the restart. Charge-specific parameters (w1_charge) are kept at mu = 0 to preserve stability, also matching GPUMD. Existing parameters receive adaptive sigma: sigma = max(sigma_floor, sigma_factor * |mu|).

Only supported for NEP4 models. For NEP3 the ANN is shared across all species and adding a per-species sub-network is not meaningful.
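The initialization rules described above can be sketched with the standard library. The helpers init_new_parameters and adaptive_sigma are hypothetical and operate on flat lists rather than on the nested restart_parameters dictionary; they only illustrate drawing mu uniformly from [-1, 1] with sigma = sigma_new for new parameters, mu = 0 for charge-specific ones, and sigma = max(sigma_floor, sigma_factor * |mu|) for existing ones.

```python
import random


def init_new_parameters(n_params, sigma_new=0.1, seed=None, zero_mu=False):
    """Restart statistics (mu, sigma) for newly created parameters (hypothetical).

    mu is drawn uniformly from [-1, 1], or set to 0 (as for w1_charge);
    every new parameter gets sigma = sigma_new.
    """
    rng = random.Random(seed)  # an integer seed gives reproducible mu values
    mu = [0.0 if zero_mu else rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    return mu, [sigma_new] * n_params


def adaptive_sigma(mu, sigma_factor=0.1, sigma_floor=1e-6):
    """sigma = max(sigma_floor, sigma_factor * |mu|) for existing parameters."""
    return [max(sigma_floor, sigma_factor * abs(m)) for m in mu]


new_mu, new_sigma = init_new_parameters(5, seed=42)
existing_sigma = adaptive_sigma([0.0, 0.5, -2.0])  # ~[1e-06, 0.05, 0.2]
```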

Parameters:
  • species (list[str]) – New species names to add. Appended to types in the order given.

  • radial_cutoff (float | list[float]) – Radial cutoff(s) for the new species, in Å. Required when the model uses typewise cutoffs (i.e. isinstance(model.radial_cutoff, list) is True). Pass a single float or a list with one value per new species.

  • angular_cutoff (float | list[float]) – Angular cutoff(s) for the new species, in Å. Same requirements as radial_cutoff.

  • sigma_new (float) – SNES sigma assigned to all newly created parameters. Defaults to 0.1, matching the GPUMD sigma0 default.

  • sigma_factor (float) – Controls sigma for existing parameters: sigma = max(sigma_floor, sigma_factor * |mu|).

  • sigma_floor (float) – Minimum sigma for existing parameters.

  • seed (int | None) – Seed for the random number generator used to draw the initial mu values. Pass an integer for reproducible initialisation.

Returns:

New model with updated structure, weights, and restart statistics.

Return type:

Model

Raises:

ValueError – If the model version is not 4, if restart_parameters are not loaded, if any species is already in the model, or if typewise cutoffs are used and radial_cutoff/angular_cutoff are not provided.

augment(n_neuron=None, l_max_4b=None, l_max_5b=None, has_q_112=None, has_q_123=None, has_q_233=None, has_q_134=None, charge_head=False, sigma_new=0.01, sigma_factor=0.1, sigma_floor=1e-06)

Augment the model by adding neurons, descriptor terms, or a charge output head.

Returns a new Model with the requested structural changes applied. The source model is not modified. Existing parameter values are preserved exactly; new parameters are initialized to zero. The restart SNES statistics are updated as follows:

  • Existing parameters: sigma = max(sigma_floor, sigma_factor * |mu|), which re-opens the SNES search distribution while keeping parameters that were driven toward zero effectively dormant.

  • New parameters: mu = 0, sigma = sigma_new.
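Because new parameters start at zero, adding neurons leaves the model's predictions initially unchanged. The sketch below demonstrates this for a single tanh hidden layer with hypothetical weights; the functions ann_output and widen are illustrative and assume a network of the form sum_n w1[n] tanh(w0[n] · q - b0[n]) - b1, which is not taken from calorine's code.

```python
import math


def ann_output(q, w0, b0, w1, b1):
    """One hidden tanh layer: sum_n w1[n] * tanh(w0[n] . q - b0[n]) - b1."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, q)) - b)
              for row, b in zip(w0, b0)]
    return sum(w * h for w, h in zip(w1, hidden)) - b1


def widen(w0, b0, w1, n_neuron):
    """Pad the hidden layer with zero-initialized neurons up to n_neuron."""
    n_extra = n_neuron - len(w0)
    if n_extra < 0:
        raise ValueError('n_neuron must be >= the current neuron count')
    n_descriptor = len(w0[0])
    w0_new = [list(row) for row in w0] + [[0.0] * n_descriptor for _ in range(n_extra)]
    return w0_new, list(b0) + [0.0] * n_extra, list(w1) + [0.0] * n_extra


q = [0.3, -0.7]  # hypothetical descriptor vector
w0, b0, w1, b1 = [[0.5, -1.0], [1.5, 0.2]], [0.1, -0.2], [0.8, -0.4], 0.05
before = ann_output(q, w0, b0, w1, b1)
after = ann_output(q, *widen(w0, b0, w1, n_neuron=4), b1)  # equals `before`
```

Each zero neuron contributes 0 * tanh(0) = 0, so the SNES search can then move the new weights away from zero without an initial jump in the loss.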

Parameters:
  • n_neuron (int) – Target neuron count; must be >= current. None leaves unchanged.

  • l_max_4b (int) – Target 4-body l_max value; must be >= current. None leaves unchanged.

  • l_max_5b (int) – Target 5-body l_max value; must be >= current. None leaves unchanged.

  • has_q_112 (bool) – True enables the q_112 5-body descriptor; None or False leaves the current state unchanged (disabling an already-enabled term raises).

  • has_q_123 (bool) – Same as has_q_112 but for the q_123 term.

  • has_q_233 (bool) – Same as has_q_112 but for the q_233 term.

  • has_q_134 (bool) – Same as has_q_112 but for the q_134 term.

  • charge_head (bool) – If True, promote a potential model to potential_with_charges by adding a charge output head (w1_charge per species and sqrt_epsilon_infinity).

  • sigma_new (float) – SNES sigma assigned to all newly created parameters.

  • sigma_factor (float) – Controls the sigma for existing parameters: sigma = max(sigma_floor, sigma_factor * |mu|).

  • sigma_floor (float) – Minimum sigma for existing parameters; keeps near-zero (dormant) parameters from being accidentally re-activated.

Returns:

New model with updated structure, weights, and restart statistics.

Return type:

Model

Raises:

ValueError – If restart_parameters is not loaded, if n_neuron or an l_max_* target is smaller than the current value, if a has_q_* flag attempts to disable an already-enabled term, or if charge_head=True on a model that is not of type potential.

keep_species(species, sigma_factor=0.1, sigma_floor=1e-06)

Retain only the specified species, removing all others.

Convenience complement to remove_species(). Useful when the set of species to drop is large (e.g. isolating two elements from a foundation model with dozens of species).

Parameters:
  • species (list[str]) – Species names to keep. All other species are removed.

  • sigma_factor (float) – Passed to remove_species(). Controls adaptive sigma for surviving parameters when restart is loaded.

  • sigma_floor (float) – Passed to remove_species(). Minimum sigma for surviving parameters.

Returns:

New model containing only the requested species.

Return type:

Model

Raises:

ValueError – If any of the requested species is not in the model.

prune(n_neuron=None, l_max_4b=None, l_max_5b=None, has_q_112=None, has_q_123=None, has_q_233=None, has_q_134=None, charge_head=False, sigma_factor=0.1, sigma_floor=1e-06)

Prune the model by removing neurons, disabling descriptor terms, or removing the charge output head.

Returns a new Model with the requested structural changes applied. The source model is not modified. When reducing n_neuron, neurons are selected by importance score averaged over species: importance[n] = mean_s(||w0_s[n,:]||_2 * |w1_s[n]|).

All surviving parameters receive adaptive SNES sigma: sigma = max(sigma_floor, sigma_factor * |mu|).
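The neuron importance score can be computed as shown below. This sketch assumes per-species weights stored as plain nested lists (w0 as n_neuron rows of descriptor weights, w1 as one output weight per neuron) with hypothetical values and species labels; returning the kept indices in their original order is a choice made here, not necessarily what prune does.

```python
import math


def neuron_importance(w0_by_species, w1_by_species):
    """importance[n] = mean over species s of ||w0_s[n, :]||_2 * |w1_s[n]|."""
    species = list(w0_by_species)
    n_neuron = len(w1_by_species[species[0]])
    scores = []
    for n in range(n_neuron):
        per_species = [
            math.sqrt(sum(w * w for w in w0_by_species[s][n])) * abs(w1_by_species[s][n])
            for s in species
        ]
        scores.append(sum(per_species) / len(per_species))
    return scores


def neurons_to_keep(scores, n_keep):
    """Indices of the n_keep highest-scoring neurons, in their original order."""
    ranked = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical weights for two species and three neurons.
w0_by_species = {'Si': [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]],
                 'C': [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]}
w1_by_species = {'Si': [1.0, 5.0, 0.5], 'C': [2.0, 1.0, 1.0]}
scores = neuron_importance(w0_by_species, w1_by_species)  # [2.5, 0.5, 1.25]
print(neurons_to_keep(scores, 2))  # [0, 2]
```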

Parameters:
  • n_neuron (int) – Target neuron count; must be <= current. None leaves unchanged.

  • l_max_4b (int) – Target 4-body l_max; must be <= current. Setting to 0 removes the 4-body angular descriptor block. Reducing to a lower non-zero value is a header-only change (descriptor dimensions unchanged). None leaves unchanged.

  • l_max_5b (int) – Same as l_max_4b but for five-body terms.

  • has_q_112 (bool) – False disables and removes the q_112 descriptor block. None leaves unchanged. True is not valid; use augment() instead.

  • has_q_123 (bool) – Same as has_q_112 but for the q_123 term.

  • has_q_233 (bool) – Same as has_q_112 but for the q_233 term.

  • has_q_134 (bool) – Same as has_q_112 but for the q_134 term.

  • charge_head (bool) – If True, remove the charge output head from a potential_with_charges model, converting it back to potential. Removes w1_charge per species and sqrt_epsilon_infinity from the restart.

  • sigma_factor (float) – Controls sigma for surviving parameters: sigma = max(sigma_floor, sigma_factor * |mu|).

  • sigma_floor (float) – Minimum sigma for surviving parameters.

Returns:

New model with reduced structure, weights, and restart statistics.

Return type:

Model

Raises:

ValueError – If restart_parameters is not loaded, if any target value would expand the model (use augment() instead), if a has_q_* flag is set to True, or if charge_head=True on a model without charges.

read_restart(filename)

Parses a file in nep.restart format and stores its content as the mean and standard deviation of each parameter of the corresponding NEP model (see restart_parameters).

Parameters:

filename (str) – Input file name.

remove_species(species, sigma_factor=0.1, sigma_floor=1e-06)

Remove one or more species from the model.

Returns a new Model with the specified species removed. The source model is not modified.

If restart_parameters are loaded, the surviving parameters receive adaptive SNES sigma values: sigma = max(sigma_floor, sigma_factor * |mu|), re-opening the search distribution while preserving dormant parameters.

Parameters:
  • species (list[str]) – Species names to remove.

  • sigma_factor (float) – Used only when restart is loaded: sigma = max(sigma_floor, sigma_factor * |mu|) for surviving parameters.

  • sigma_floor (float) – Minimum sigma for surviving parameters when restart is loaded.

Returns:

New model with the specified species removed.

Return type:

Model

Raises:

ValueError – If any of the provided species is not found in the model.

property training_parameters: dict

Return model hyperparameters in the format accepted by write_nepfile.

Use this after any model modification (augment(), prune(), add_species(), remove_species(), keep_species()) to produce the architecture fields that must go into the new nep.in before training. Merge the result with your existing training-specific parameters (lambda_*, generation, batch, etc.) before calling write_nepfile.

Returns:

Keys version, type, cutoff, n_max, basis_size, l_max, and neuron (plus zbl when applicable) with values in the format expected by write_nepfile.

Return type:

dict
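The merge step described for training_parameters can be as simple as a dictionary union in which the architecture fields take precedence. All values below are hypothetical placeholders; whether their formats match write_nepfile exactly should be checked against the dictionary that training_parameters actually returns.

```python
# Architecture fields as they might be returned by model.training_parameters
# (hypothetical placeholder values).
architecture = {
    'version': 4,
    'type': [2, 'Si', 'C'],
    'cutoff': [8, 4],
    'n_max': [4, 4],
    'basis_size': [8, 8],
    'l_max': [4, 2, 0],
    'neuron': 40,
}

# Training-specific settings carried over from the previous nep.in
# (also hypothetical); 'neuron' is stale after, e.g., an augment() call.
training = {'generation': 100000, 'batch': 1000, 'lambda_e': 1.0, 'neuron': 30}

# Later entries win, so the architecture fields override stale values.
parameters = {**training, **architecture}
print(parameters['neuron'])  # 40
```

The merged dictionary would then be passed to write_nepfile together with the target directory.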

write(filename, restart_file=None)

Write NEP model to file in nep.txt format.

Parameters:
  • filename (str) – Output file name for the NEP model.

  • restart_file (str) – If provided, also write restart parameters to this file in nep.restart format. Defaults to None.

Return type:

None

write_restart(filename)

Write NEP restart parameters to file in nep.restart format.