Module for Ab Initio Structure Evolution (MAISE) features
* neural network-based description of interatomic interactions
* evolutionary optimization
* structure analysis
1. General info
2. Download and Installation
3. Input
4. Examples
5. Setup input tag description
MAISE has been developed by
Alexey Kolmogorov
Samad Hajinazar
Ernesto Sandoval
Current version 2.9 works on Linux platforms and combines 3 modules for modeling, optimizing, and analyzing atomic structures.
1. The neural network (NN) module builds, tests, and uses NN models to describe interatomic interactions with near-ab initio accuracy at a low computational cost compared to density functional theory calculations.
With the primary goal of using NN models to accelerate structure search, the main function of the module is to relax given structures. To simplify the NN application and comparison, we closely matched the input and output file formats with those used in the VASP software. Previously parameterized NN models available in the 'models/' directory have been generated and extensively tested for crystalline and/or nanostructured materials. First practical applications of NNs include the prediction of new synthesizable Mg-Ca and M-Sn alloys [1-3] as well as identification of more stable Cu-Pd-Ag and Au nanoparticles [4,5].
Users can create their own NN models with MAISE; these are typically trained on density functional theory (DFT) total energy and atomic force data for relatively small structures. Relevant and diverse configurations are generated separately with an 'evolutionary sampling' protocol detailed in our published work [6]. The code introduces a unique feature, 'stratified training', for building robust NNs for chemical systems with several elements [6]. NN models are developed hierarchically, first for elements, then for binaries, and so on, which enables the generation of reusable libraries for extended blocks of the periodic table.
2. The implemented evolutionary algorithm (EA) enables efficient identification of ground state configurations at a given chemical composition. Our studies have shown that the EA is particularly advantageous in dealing with large structures when no experimental structural input is available [7,8].
The searches can be performed for 3D bulk crystals, 2D films, and 0D nanoparticles. The population of structures can be generated randomly or predefined based on prior information. The essential operations are 'crossover', in which a new configuration is created from two parent structures in the previous generation, and 'mutation', in which a parent structure is randomly distorted. For 0D nanoparticles, we have introduced a multitribe evolutionary algorithm that allows efficient simultaneous optimization of clusters in a specified size range [4].
3. The analysis functions include the comparison of structures based on the radial distribution function (RDF), the determination of the space group and the Wyckoff positions with an external SPGLIB package, etc. In particular, the RDF-based structure dot product is essential for eliminating duplicate structures in EA searches and selecting different configurations in the pool of found low-energy structures.
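The idea behind an RDF-based structure dot product can be illustrated with a minimal Python sketch. This is not the MAISE implementation: it assumes a non-periodic cluster, a Gaussian-smeared histogram of pair distances, and a cosine similarity as the 'dot product'. The function names `rdf` and `rdf_dot` and the parameters `r_max`, `n_bins`, and `sigma` are hypothetical.

```python
import math


def rdf(positions, r_max=6.0, n_bins=120, sigma=0.05):
    """Gaussian-smeared histogram of interatomic distances (non-periodic)."""
    dr = r_max / n_bins
    hist = [0.0] * n_bins
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = math.dist(positions[i], positions[j])
            if d >= r_max:
                continue  # ignore pairs beyond the cutoff
            for k in range(n_bins):
                r = (k + 0.5) * dr  # bin center
                hist[k] += math.exp(-((r - d) ** 2) / (2.0 * sigma ** 2))
    return hist


def rdf_dot(h1, h2):
    """Normalized dot product (cosine similarity) of two RDF vectors."""
    norm = math.sqrt(sum(a * a for a in h1) * sum(b * b for b in h2))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(h1, h2)) / norm


cluster = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)]
shifted = [(x + 2.0, y - 1.0, z + 0.5) for x, y, z in cluster]  # same shape
scaled = [(1.3 * x, 1.3 * y, 1.3 * z) for x, y, z in cluster]   # different shape

same = rdf_dot(rdf(cluster), rdf(shifted))
different = rdf_dot(rdf(cluster), rdf(scaled))
print(f"translated copy: {same:.3f}, scaled copy: {different:.3f}")
```

Because the RDF depends only on interatomic distances, a translated copy gives a dot product of essentially 1, while a structurally different cluster gives a clearly lower value. A threshold on this value is one simple way to flag duplicates in a pool of structures.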
[1] https://pubs.rsc.org/en/content/articlelanding/2018/cp/c8cp05314f#!divAbstract
[2] https://www.nature.com/articles/s41524-022-00825-4
[3] https://pubs.rsc.org/en/content/articlelanding/2023/cp/d3cp02817h/unauth
[4] https://pubs.rsc.org/en/content/articlelanding/2019/cp/c9cp00837c#!divAbstract
[5] https://pubs.acs.org/doi/10.1021/acs.jpcc.9b08517
[6] https://journals.aps.org/prb/abstract/10.1103/PhysRevB.95.014114
[7] https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.109.075501
[8] https://journals.aps.org/prb/abstract/10.1103/PhysRevB.98.085131
The source code for MAISE can be obtained from the command line by running:
git clone https://github.com/maise-guide/maise.git
or
wget -O master.zip https://github.com/maise-guide/maise/archive/master.zip
unzip master.zip
1. Use 'make --jobs' for full compilation. For recompilation, use 'make clean' to remove object files or 'make clean-all' to remove object files and external libraries.
2. During MAISE compilation, 'make --jobs' checks whether the two required external libraries, GSL and SPGLIB v1.11.2.1 (Feb 2019), are present. If not, they will be automatically downloaded to './ext-dep' and installed in './lib' on most systems.
3. If the GSL or SPGLIB installation is not completed automatically, please compile them manually and copy (i) libgsl.a, libgslcblas.a, and libsymspg.a into the './lib' subdirectory; (ii) the 'spglib.h' header into the './lib/include' subdirectory; and (iii) all gsl headers into the './lib/include/gsl' subdirectory.
4. A 'check' script is available in the './test/' directory; it can be run after compiling the maise executable to ensure the proper functionality of the code. The script automatically checks the performance of the code in parsing the data, training the neural network, and evaluating a crystal structure. If the compilation is fine, the 'check' script will report so; otherwise, error logs will be provided with further information about the issue.
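When the external libraries are installed manually (step 3), a quick check that the files ended up in the expected places can save a failed build. The following Python sketch is a hypothetical helper, not part of MAISE; it only tests for the file locations named in step 3, relative to the MAISE source directory.

```python
from pathlib import Path

# Files that step 3 asks to copy, relative to the MAISE source directory.
REQUIRED_FILES = [
    "lib/libgsl.a",
    "lib/libgslcblas.a",
    "lib/libsymspg.a",
    "lib/include/spglib.h",
]


def missing_dependencies(root="."):
    """Return the list of expected library files and headers that are absent."""
    root = Path(root)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    gsl_headers = root / "lib" / "include" / "gsl"
    # At least one GSL header must be present in lib/include/gsl.
    if not (gsl_headers.is_dir() and any(gsl_headers.glob("*.h"))):
        missing.append("lib/include/gsl/*.h")
    return missing


# Run from the top of the MAISE source tree; an empty list means nothing is missing.
print(missing_dependencies("."))
```

The helper does not verify library versions or that the archives are usable; the 'check' script in './test/' remains the actual functional test.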
The code has been extensively tested on Linux platforms. We will appreciate users' feedback on the installation and performance of the package on different platforms.
Main input files that define a simulation are 'setup' with job settings, 'model' with NN parameters, and 'POSCAR' with atomic structure parameters in the VASP format. Conversion of atomic environments into NN inputs during the parsing stage of NN development requires a 'basis' file that specifies Behler-Parrinello symmetry functions.
EVOS | NNET | CELL | |||||
SEARCH | EXAM | PARSE | TRAIN | TEST | SIMUL | EXAM | |
setup | + | + | + | + | + | + | |
model | +* | +# | +# | ||||
basis | + | $ | |||||
SPG | + | + | |||||
GSL | + | + | |||||
* for stratified training one needs to provide individual models; $ 'basis' stored in the parsed directory is appended to 'model' at the end of the training; # 'model' has 'basis' pasted at the end once training is finished
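To give a sense of what the Behler-Parrinello symmetry functions specified in 'basis' compute, here is a minimal Python sketch of one radial symmetry function with a cosine cutoff, G_i = sum_j exp(-eta (R_ij - R_s)^2) f_c(R_ij). It assumes a non-periodic set of atoms of a single species and does not reproduce the layout of the 'basis' file or the MAISE code; the names `cutoff`, `radial_sf`, `eta`, `r_s`, and `r_c` are illustrative.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: 1 at r = 0, smoothly decaying to 0 at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_sf(positions, i, eta, r_s, r_c):
    """Radial symmetry function of atom i summed over all other atoms."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos)
        total += math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
    return total


# A three-atom chain; the input vector for the central atom is built by
# evaluating the function for several (eta, r_s) pairs.
atoms = [(-2.5, 0.0, 0.0), (0.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
descriptor = [radial_sf(atoms, 1, eta, 0.0, 6.0) for eta in (0.05, 0.5, 2.0)]
print(descriptor)
```

Each symmetry function contributes one component to the NN input vector for an atom, so the descriptor stays the same length regardless of how many neighbors fall inside the cutoff.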
The structure examination and manipulation functions are run by calling maise with a flag:
maise -flag
Flag | Flag Description |
---|---|
man | output the list of available flags |
rdf | compute and plot the RDF for POSCAR |
cxc | compute dot product for POSCAR0 and POSCAR1 using RDF |
cmp | compare RDF, space group, and volume of POSCAR0 and POSCAR1 |
spg | convert POSCAR into str.cif, CONV, PRIM |
cif | convert str.cif into CONV and PRIM |
dim | find whether POSCAR is periodic (3) or non-periodic (0) |
vol | compute volume per atom for crystal or nano structures |
rot | rotate a nanoparticle along eigenvectors of moments of inertia |
mov | move atoms along one direction by a constant shift |
box | reset the box size for nanoparticles |
sup | make a supercell specified by na x nb x nc |
eig | shift unit cell in POSCAR along a phonon eigenmode |
ord | order atoms by species |
out | extract snapshots from MD or relaxation trajectories in VASP |
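For scripted workflows, the flag calls can be wrapped in a small helper. The Python sketch below is an assumption-laden illustration: it presumes that the `maise` executable is on the PATH and that the input files named in the table (for example POSCAR, or POSCAR0 and POSCAR1) are in the working directory. The helper names `maise_command` and `run_flag` are hypothetical, and flags that need extra arguments are not covered.

```python
import subprocess

# Flags taken from the table above.
FLAGS = {
    "man", "rdf", "cxc", "cmp", "spg", "cif", "dim", "vol",
    "rot", "mov", "box", "sup", "eig", "ord", "out",
}


def maise_command(flag, executable="maise"):
    """Build the command line 'maise -flag' after validating the flag."""
    if flag not in FLAGS:
        raise ValueError(f"unknown MAISE flag: {flag!r}")
    return [executable, f"-{flag}"]


def run_flag(flag, workdir=".", executable="maise"):
    """Run 'maise -flag' in workdir and return whatever it prints."""
    result = subprocess.run(
        maise_command(flag, executable),
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(maise_command("cxc"))
```

A call such as `run_flag("cxc", workdir="pair_dir")` would then run the RDF dot product for the POSCAR0 and POSCAR1 files in the hypothetical directory `pair_dir`.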
The 'examples/' directory contains sample maise jobs for unit cell analysis and manipulation, parsing data, training neural networks, simulating structures with neural network models, evolutionary search for ground states with neural network models, molecular dynamics runs, and phonon calculations. Each example has a README file, a setup file with only the relevant tags for the particular job, and reference output files for comparison.
Main job type selector
Structure-environment
Main EVOS
EVOS operations
EVOS crossover/mutation
Molecular dynamics
Species related
I/O
General model
Neural Network model
Neural Network training
Parsing
Cell relaxation
JOBT
NMAX MMAX
CODE DENE KMSH LBOX NDIM NITR NNJB NPOP RAND RUNT SEED SITR TINI
BLOB CHOP INVS MATE MUTE PACK PLNT REFL RUBE SWAP TETR
ACRS ADST ELPS LCRS LDST MCRS SCRS SDST
CPLT CPLP ICMP DELT MOVI NSTP MDTP TMAX TMIN TSTP
ASPC NSPC TSPC
COUT DATA DEPO EVAL OTPT WDIR
NCMP NNGT NNNN NNNU NSYM
FMRK LREG NTRN NTST TEFS NPAR
EMAX FMAX FMIN VMAX VMIN MMAX
ETOL MINT MITR PGPA RLXT TIME
TAG | DESCRIPTION |
---|---|
JOBT | Structure analysis: (00) use analysis tools specified by flags; evolutionary search: (10) run (11) soft exit (12) hard exit (13) analysis; cell simulation: (20) relaxation (21) molecular dynamics (22) phonon calculations; data parsing: (30) prepare inputs for NN training; NN training: (40) full training (41) stratified training |
CODE | Type of the code in use. (0) MAISE-INT (1) VASP-EXT (2) MAISE-EXT |
NPAR | Number of cores for parallel NN training or cell simulation |
MINT | The optimizer algorithm for the neural network training and the cell optimization. GSL minimizer type: (0) BFGS2 (1) CG-FR (2) CG-PR (3) steepest descent |
MITR | Maximum number of optimization steps for NN training or cell optimization; the run stops at this limit if the desired accuracy has not been reached |
RLXT | Cell optimization type (2) force only (3) full cell (7) volume (ISIF in VASP) |
ETOL | Error tolerance for training or cell optimization convergence |
TEFS | Training target value (0) E (1) EF (2) ES (3) EFS (4) TOY |
FMRK | Fraction of atoms that will be parsed to use for EF or EFS trainings |
COUT | Output type in the OUTCAR file in cell evaluation and optimization |
NMAX | Maximum number of atoms in the unit cell |
MMAX | Maximum number of neighbors within the cutoff radius |
NSPC | Number of element types for evolutionary search, parsing the data and neural network training. |
TSPC | Atomic number of the elements specified with NSPC tag |
ASPC | Number of atoms of each element for the evolutionary search |
NSYM | Number of the Behler-Parrinello symmetry functions for parsing data using the "basis" file |
NCMP | The length of the input vector of the neural network |
NTRN | Number of structures used for neural network training (negative number means percentage) |
NTST | Number of structures used for neural network testing (negative number means percentage) |
NNNN | Number of hidden layers in the neural network (does not include input vector and output neuron) |
NNNU | Number of neurons in hidden layers |
NNGT | Activation function type for the hidden layers' neurons (0) linear (1) tanh |
EMAX | Parse only this fraction of lowest-energy structures. From 0 to 1 |
FMAX | Will not parse data with forces larger than this value |
VMIN | Will not parse data with volume/atom smaller than this value |
VMAX | Will not parse data with volume/atom larger than this value |
NDIM | Dimensionality of the unit cell in evolutionary search and cell optimization (3) crystal (2) film (0) particle |
LBOX | Box dimension for generating particles in evolutionary search in Angs (ignored for crystals) |
NPOP | Population size in the evolutionary search |
SITR | Starting iteration in the evolutionary search (0) start from random or specified structures |
NITR | Number of iterations in the evolutionary search (should be larger than SITR) |
TINI | Type of starting the evolutionary search when SITR=0 |
TIME | Maximum time for cell relaxation in evolutionary search and cell optimization |
PGPA | Pressure in GPa |
DENE | Store distinct structures generated in evolutionary search in POOL/ if within this energy/atom (eV/atom) window from the ground state |
KMSH | K-mesh density used for VASP-EVOS. Suggested values: 0.30 for semiconductors, 0.05 for metals |
SEED | Starting seed for the random number generator in evolutionary search (0) Uses time as seed (+) The seed value |
RAND | Starting seed for the parsing of the dataset. (0) Uses time as seed (+) The seed value (-) No randomization: structures are parsed in listing order |
TMIN | Minimum temperature in MD runs (K) |
TMAX | Maximum temperature in MD runs (K) |
TSTP | Temperature step in MD runs (K) when running from TMIN to TMAX |
DELT | Time step in the MD runs |
NSTP | Number of steps per temperature in MD runs |
CPLT | Coupling constant in Nose-Hoover thermostat for MD runs. Suggested: 25.0 |
CPLP | Coupling constant in Berendsen barostat for MD runs. Suggested: 100.0 |
ICMP | Isothermal compressibility in Berendsen barostat for MD runs (in 1/GPa) |
MOVI | Number of steps after which a snapshot of structure will be saved during the MD run |
MDTP | MD run type (10) NVE (20) NVT: Nose-Hoover (30) NPT: Nose-Hoover and Berendsen (40) Isobaric (11,21,31,41) runs with velocities read in from POSCAR file |
DEPO | Path to the DFT datasets to be parsed |
DATA | Location of the parsed data to parse or read for training (will be overwritten during parsing) |
OTPT | Directory for storing model parameters in the training process |
EVAL | Directory for model testing data |
WDIR | Work directory for evolutionary search, MD runs, etc. |
TETR | Fraction of the structures generated randomly using tetris operation. From 0 to 1 |
PLNT | Fraction of the structures generated from seeds. From 0 to 1 |
PACK | Fraction of the structures generated from close-packed structures. From 0 to 1 |
BLOB | Fraction of the structures generated randomly using blob shape. From 0 to 1 |
MATE | Fraction of the structures generated by crossover using two halves from each parent. From 0 to 1 |
SWAP | Fraction of the structures generated by crossover using core and shell of parents. From 0 to 1 |
RUBE | Fraction of the structures generated by Rubik's cube operation. From 0 to 1 |
REFL | Fraction of the structures generated by symmetrization via reflection. From 0 to 1 |
INVS | Fraction of the structures generated by symmetrization via inversion. From 0 to 1 |
CHOP | Fraction of the structures generated by chopping to make facets. From 0 to 1 |
MUTE | Fraction of the structures generated by random distortions to the structure. From 0 to 1 |
MCRS | 0.50 mutation rate in crossover |
SCRS | 0.00 crossover: swapping rate |
LCRS | 0.00 crossover: mutation strength for lattice vectors |
ACRS | 0.10 crossover: mutation strength for atomic positions |
SDST | 0.00 distortion: swapping rate |
LDST | 0.00 distortion: mutation strength for lattice vectors |
ADST | 0.20 distortion: mutation strength for atomic positions |
ELPS | 0.30 random: nanoparticle ellipticity |
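The crossover and distortion operations controlled by tags such as MATE, MUTE, and ADST can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions, not the MAISE algorithm: structures are plain lists of (x, y, z) coordinates for a single species, lattice vectors are ignored, and the function names `mate` and `mutate` as well as the `strength` parameter (playing a role analogous to ADST) are hypothetical.

```python
import random


def mate(parent_a, parent_b, axis=2):
    """Cut-and-splice crossover: lower half of A plus upper half of B along one axis."""
    n = len(parent_a)
    if len(parent_b) != n:
        raise ValueError("parents must contain the same number of atoms")
    a_sorted = sorted(parent_a, key=lambda atom: atom[axis])
    b_sorted = sorted(parent_b, key=lambda atom: atom[axis])
    half = n // 2
    # Taking n//2 atoms from A and the rest from B keeps the atom count fixed.
    return a_sorted[:half] + b_sorted[half:]


def mutate(structure, strength, rng):
    """Shift every coordinate by a uniform random amount within +/- strength."""
    return [
        tuple(x + rng.uniform(-strength, strength) for x in atom)
        for atom in structure
    ]


rng = random.Random(42)  # fixed seed for reproducibility
parent_a = [(rng.random(), rng.random(), rng.random()) for _ in range(8)]
parent_b = [(rng.random(), rng.random(), rng.random()) for _ in range(8)]

child = mate(parent_a, parent_b)
distorted = mutate(parent_a, strength=0.20, rng=rng)
print(len(child), len(distorted))
```

In an actual search, the fractions assigned to each operation determine how many members of the next generation come from crossover, from distortion, or from fresh random generation; the sketch only shows what a single offspring of each kind might look like.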