gaddlemaps package
Subpackages
Module contents
gaddlemaps — Change molecules in simulations with python
This python package provides an implementation of the GADDLE-Maps (General Algorithm for Discrete Object Deformations Based on Local Exchange Maps) that allows to change molecules in a molecular dynamics simulations. For example, this tool is very handy to back-map a coarse grained simulation to atomistic forcefields and also in the other way around.
TODO: Complete this description
- class gaddlemaps.Alignment(start=None, end=None)[source]
Bases:
object
Class to manage the molecules alignment.
This class serve as interface during the molecular alignment process. It will take two molecules in different representations (e.g. coarse grained and atomistic), one of them will be taken as initial representation and the other as the final one. These molecules can be passed as arguments in the object initialization or assigned after its creation.
NOTE: Once both molecules are assigned you can only set them again with the same molecule type (although it may have different position).
Once the molecules are fixed you can call the “aling_molecules” method to find the optimum overlap between both representation. Then, you should call the “init_exchange_map” method to generate the map to change between resolution for other start molecules configurations. You can also call the “write_comparative_gro” method to write a .gro file with the two aligned molecules with different residue names to visualize the found overlap in other visualization programs.
- Parameters
- exchange_map
The map to change from initial resolution to the end one. It has to be initialized with the init_exchange_map method.
- Type
- STEPS_FACTOR
The factor to apply in the calculation of the number of steps in the Monte-Carlo alignment.
- Type
- Attributes
Methods
align_molecules
([restrictions, ...])Starts the alignment engine to find the optimal overlap between molecs.
init_exchange_map
([scale_factor])Initializes the exchange map with the current molecules configuration.
Creates the widget to visually generate the restrictions for alignment.
write_comparative_gro
([fname])Writes a .gro file with start and end molecules to check the overlap.
- SIGMA_SCALE = 0.5
- STEPS_FACTOR = 5000
- align_molecules(restrictions=None, deformation_types=None, ignore_hydrogens=True, auto_guess_protein_restrictions=True)[source]
Starts the alignment engine to find the optimal overlap between molecs.
NOTE: If None is input as restrictions and the molecules to align have multiple residues and also the auto_guess_protein_restrictions parameter is True, they will be guessed by residue matching (see guess_protein_restrains function). However take into account that the time taken by this step scales with the square of the number of atoms if no restrictions are given and linearly if they are guessed for all the atoms by residue matching.
If you know that the molecules in both resolution are in the same configuration you may want to perform the alignment just rotating and translating the molecules as a whole and avoid molecular deformation by setting the “deformation_types” parameter to (0, 1). This will translate in a better performance.
On the other hand, in the most of the cases, the hydrogen atoms are not very relevant atoms for the alignment so they are by default (see “ignore_hydrogens” parameter) not taken into account in the distance minimization process when they belong to the molecule with larger number of atoms (the one that will remain still in the process).
- Parameters
restrictions (list of tuple of int, optional) –
A list of tuples with pairs of atom numbers corresponding to start and end atoms indexes in the molecules. The align will be performed privileging configurations where those atoms are close. By default is set to [].
Example: >>> restrictions = [(1, 3), (4, 5)]
IMPORTANT INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE (STARTS IN 0).
deformation_types (tuple of int, optional) –
- Specifies the type of the minimization. Possible options:
0 : Translation 1 : Rotation 2 : Individual atom move
If it is None, all the possibilities are chosen.
ignore_hydrogens (bool, optional) – If True, hydrogen atoms will not be included in the minimization of the distances. This will be only applied to the molecule which is not moved in the alignment engine.
auto_guess_protein_restrictions (bool, optional) – If True automatic restrictions will try to be guessed if the molecules to be aligned have multiple residues.
- Raises
IOError – If the automatic restrictions guess failed. In that case, consider to set “auto_guess_protein_restrictions” parameter to False.
- init_exchange_map(scale_factor=0.5)[source]
Initializes the exchange map with the current molecules configuration.
- Parameters
scale_factor (float) – The compression factor to apply to mapped molecules (see the ExchangeMap class documentation for more information).
- interactive_restrictions()[source]
Creates the widget to visually generate the restrictions for alignment.
- Return type
- Returns
restriction_widget (ipywidgets.Widget) – The widget that contains the constraint generator.
restrictions (List[Tuple[int, int]]) – The restrictions that will be generated by the widget. This will be initially empty and it will be filled as the widget is used.
- write_comparative_gro(fname=None)[source]
Writes a .gro file with start and end molecules to check the overlap.
The molecules have START and END residue names respectively to ease the representation.
- Parameters
fname (string, optional) – The output .gro file name. If it is None, fname is set as: {name}_compare.gro, where name is the name of the molecule.
- class gaddlemaps.Chi2Calculator(mol1, mol2, restrictions=None)[source]
Bases:
object
Functor that calculates the chi2 between 2 positions arrays.
This class is initialized with the restrictions that must be taken into account. This avoid repeating array accessing. Once the object is initialized it can be called with new atomic positions to calculate the new chi2 value associated with the distance between set of points. This distance is calculated quite differently depending on the given restrictions. In case of no restrictions see “chi2_molecules” method. If some restrictions are given the distances between atoms is calculated between of pairs of atoms specified in the restrictions instead of between closest atoms.
- Parameters
mol1 (numpy.ndarray((N, 3))) – Array with the atomic positions of the molecule that will be still. These positions will remain constant in future object call.
mol2 (numpy.ndarray((N, 3))) – Array with the atomic positions of the molecule that will be changing during the optimum overlap finding process.
restriction (numpy.ndarray((N, 2)) or array convertible, optional) –
A list of tuples with pairs of atom numbers corresponding to mol1 and mol2 atoms molecules. The align will be performed privileging configurations where those atoms are close. By default is set to [].
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE
Methods
__call__
(mol2)Call self as a function.
chi2_molecules
(mol2)Computes the chi2 by calculating the distance between molecules.
- chi2_molecules(mol2)[source]
Computes the chi2 by calculating the distance between molecules.
This function computes a chi2 based on the distance between nearest atoms of two molecules. Basically sums the distance between the atoms of mol2 and the nearest atom of mol1.
- Parameters
mol2 (numpy.ndarray((N,3))) – Position of the second molecule atoms.
- Returns
chi2 – Computed chi2.
- Return type
- class gaddlemaps.ExchangeMap(refmolecule, targetmolecule, scale_factor=0.5)[source]
Bases:
object
Functor to extrapolate atomic resolution to other configurations.
When this class is initialized, a functor is created. It has to be initialized with the molecules in the initial and final resolution overlapped. Then you can call this method with new molecules in the initial resolution to obtain its representation in the final one. The new coordinates of the extrapolated molecules are scaled by the “scale_factor”. This factor should be smaller than one if you want to extrapolate a complete system for future simulations. This avoids molecular overlapping and prevents the simulations to crash.
- Parameters
Examples
>>> BmimCG_ref = Molecule(fgro, fitp) >>> BmimAA_target = Molecule(groAA, itpAA)
>>> transformation = ExchangeMap(BmimCG_ref, BmimAA_target)
>>> BmimCG_new = Molecule(fgro2, fitp) >>> BmimAA_new = transformation(BmimCG_new)
- Attributes
equivalences
dict of int to list of int : {r1_atom_index: [closest_r2_atoms_indexs]}
Methods
__call__
(refmolecule)This function takes as argument a molecule like the refmolecule, but in other position and returns its targetmolecule equivalent as a new molecule with the same res_number than the input.
- class gaddlemaps.Manager(system)[source]
Bases:
object
Class to manage the mapping process of a simulation system.
This class has methods to allow you to change the resolution of a simulation from the .gro file of the system, the .itps of the molecules you want to map (avoid the mapping of solvent molecules as you can resolvate the system once it is mapped), and one .gro and one .itp for the molecules to map in the final resolution.
This class has to be initialized with a System object but it can also be initialized with the simulation files through self.from_files method. Then you should specified the molecules in the final resolution using the “add_end_molecules” method. This method will attach the molecules in the final resolution with the corresponding one in the initial resolution looking to pair of molecules that have the same name. If your molecules have different names you can attach them manually by accessing the molecule_correspondence attribute and setting the end attribute of the Alignemnt object associated to the molecule you want to map. For example, say you started a Manager with a system with POPC molecules and you want to replace them with other molecules (VTE):
Example: >>> vet_molecule = Molecule(vte_gro, vte_itp) >>> manager = Manager(System) >>> manager.molecule_correspondence[‘POPC’].end = vte_molecule
Once you have set the molecules in the final resolution you can call the “align_molecules” method toe find the optimum overlap between molecules in both resolution. Then you have to calculate the exchange maps that will be used to extrapolate the found overlap to the rest of molecular configuration in the system. This can be done calling the “calculate_exchange_maps” method. Finally, you can call the “extrapolate_system” method to write a .gro file with the system but now with the molecules in the desired final resolution.
- Parameters
system (System) – The simulation system to be mapped. It has to be an instance of System from system_components module.
- molecule_correspondence
A dictionary with the name of the loaded molecules as keys and Alignment objects as value.
- Type
dict of str: Alignment
- Attributes
complete_correspondence
dict of str: Alignment
Methods
add_end_molecule
(molecule)Add a new molecule in the end resolution to the correct Alignment.
add_end_molecules
(*molecules)Add multiple molecules at once.
align_molecules
([restrictions, ...])Starts the alignment engine to find the optimal overlap between molecules
calculate_exchange_maps
([scale_factor])Runs the alignment engine and calculate the exchange maps.
extrapolate_system
(fgro_out)Loops over the molecules in self.system and applies the exchange map.
from_files
(f_system_gro, *ftops)Build the object using the system .gro file and molecules topologies.
interactive_restrictions
([style])Creates the widget to generate the restrictions of all the species in the alignment.
parse_restrictions
([restrictions, ...])Checks the format and validates of the restrictions for the alignment.
- add_end_molecule(molecule)[source]
Add a new molecule in the end resolution to the correct Alignment.
- Parameters
molecule (Molecule) – The molecule in the end resolution.
- Raises
KeyError – If the molecule is not found in the system.
TypeError – If the molecule is not instance of Molecule
ValueError – If the molecule does not match with the start.
- add_end_molecules(*molecules)[source]
Add multiple molecules at once.
- Parameters
*molecule (Molecule) – The molecules in the end resolution.
- Raises
KeyError – If the molecule is not found in the system.
TypeError – If the molecule is not instance of Molecule
ValueError – If the molecule does not match with the start.
- align_molecules(restrictions=None, deformation_types=None, ignore_hydrogens=None, parse_restrictions=True)[source]
Starts the alignment engine to find the optimal overlap between molecules
- Parameters
restrictions (dict of str: list of tuple of int, optional) –
A dictionary with the molecules names as keys and a list of tuples with pairs of atom numbers corresponding to start and end atoms molecules. The align will be performed privileging configurations where those atoms are close. By default, restrictions will be set to [] for every molecule.
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM NUMBER IN THE .itp FILE
Note
If the restrictions were previously parsed, the parse_restrictions option must be set to False to avoid to parse the restrictions two times.
deformation_types (dict of str: tuple of int, optional) –
A dictionary with the molecules names as keys and the value specifies the type of the minimization. Possible options:
0 : Translation 1 : Rotation 2 : Individual atom move
If it is None, all the possibilities are chosen for every molecule.
ignore_hydrogens (dict of str: bool, optional) – A dictionary with the molecules names as keys and a bool as value. If True, hydrogen atoms will not be included in the minimization of the distances. This will be only applied to the molecule which is not moved in the alignment engine. If it is None, it will be set as True for every molecule.
parallel (Bool, optional) – Not implemented. In the future will run the alignment for each molecule in parallel.
- calculate_exchange_maps(scale_factor=0.5)[source]
Runs the alignment engine and calculate the exchange maps.
- Parameters
scale_factor (float, optional) – The compression factor to apply to mapped molecules.
- property complete_correspondence: Dict[str, Alignment]
Alignment A dictionary with the name of the loaded molecules as keys and Alignment objects as value if it has start and end init.
- Type
dict of str
- extrapolate_system(fgro_out)[source]
Loops over the molecules in self.system and applies the exchange map.
- Parameters
fgro_out (string) – Gro file name to save the system in the final resolution.
- Raises
SystemError – If the exchange maps are not initialized.
- classmethod from_files(f_system_gro, *ftops)[source]
Build the object using the system .gro file and molecules topologies.
- interactive_restrictions(style=None)[source]
Creates the widget to generate the restrictions of all the species in the alignment. It generates the final representation for the widget.
- Parameters
style (Optional[int]) –
An integer that determine which style will be used to represent the widget for each specie.
0: One tab per specie. 1: Accordion, when one specie opens the other collapse 2: Vertically aligned, one over the other
The default value is 2. This is the only one fully operational, in the other ones it is necessary to manually refresh the widget in the notebook when changing between species.
- Return type
- Returns
restriction_widget (ipywidgets.Widget) – The widget that contains the constraint generator for all the species
restrictions (Dict[str, List[Tuple[int, int]]]) – The dictionary with the restrictions that will be generated by the widget for each specie. This will be initially empty and it will be filled as the widget is used.
- parse_restrictions(restrictions=None, guess_proteins=False)[source]
Checks the format and validates of the restrictions for the alignment.
- Parameters
restrictions (dict of str: list of tuple of int.) –
A dictionary with the molecules names as keys and a list of tuples with pairs of atom numbers corresponding to start and end atoms molecules. The align will be performed privileging configurations where those atoms are close. By default, restrictions will be set to [] for every molecule.
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM NUMBER IN THE .itp FILE (IT USUALLY STARTS IN 1).
guess_proteins (bool, optional) – If True, restriction for proteins with more than 3 residues will be guessed using “guess_protein_restrain” function. This will overwrite the input restrains. Default False.
- Returns
new_restrictions – The validated restrictions.
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE (STARTS IN 0).
- Return type
dict of str: list of tuple of int.
- Raises
KeyError – If a molecule name is not in the system.
ValueError – If the format is wrong or the index are not in the molecules.
- gaddlemaps.accept_metropolis(energy_0, energy_1, acceptance=0.01)[source]
Evaluates if the new configuration with energy_1 is accepted or not.
This function returns True if the configuration with energy_1 is accepted and False otherwise. If energy_1 <= energy_0 the configuration is accepted. Else, it will be also accepted if a random number between 0 and 1 is smaller than acceptance*energy_0/energy_1.
- Parameters
- Returns
change – True if the change is accepted and False if not.
- Return type
Bool
- gaddlemaps.calcule_base(pos)[source]
Calculates a orthonormal base from a list of 3 atoms.
Given a list of three vectors with the position of three atoms, this function returns a vector basis and the application point. The first vector goes from the application point to the last atom. The second one is normal to the plane which contains the three atoms and perpendicular to the first vector. The last is perpendicular to the others. All are unitary forming an orthonormal basis. In case of collinear points, the second vector is set to ([vec1[1], -vec1[0], 0]).
- Parameters
pos (list) – List with three vectors (numpy.ndarray) with the position of the atoms.
- Return type
- Returns
base (tuple of numpy.ndarray) – A tuple with the three vectors of the base.
app_point (float) – The application point of the base. It corresponds to pos[0]
- gaddlemaps.find_atom_random_displ(atoms_pos, bonds_info, atom_index, sigma_scale=0.5)[source]
Finds a random displacement for the atom with a given index.
This displacement is chosen in a perpendicular direction according to the number of bonded atoms.
- Parameters
atoms_pos (numpy.ndarray) – An array with the positions of the molecule atoms in rows.
bonds_info (dictionary) –
- A dict with the information of the bonds. Example:
bonds_info = {0:[(1, 5.4), (2, 6.4)], 1:[(0, 5.4), ], …}
The keys refers to atom index and values are lists with tuples. Each tuple contains the bonded atom index and the bond length.
atom_index (integer (Optional)) – The index of the atom to move (respecting the index of atoms_pos). If None is given a random one is taken.
sigma_scale (float) – A factor to scale the sigma of the distribution of the displacement module.
- Returns
displ – The displacement vector to sum to the position of the interest atom.
- Return type
- gaddlemaps.guess_protein_restrains(mol1, mol2)[source]
Guess restrains for molecules with multiple residues.
Checks if mol1 and mol2 have the same residue names and create restrains pairing atoms with the same residues. This function only works with molecules with multiple residues.
- Parameters
- Returns
restrains – A list of tuples with pairs of atom numbers corresponding to start and end atoms molecules. The align will be performed privileging configurations where those atoms are close. By default is set to [].
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE (STARTS IN 0).
- Return type
list of tuple of int
- Raises
IOError – If the input molecules have different residue names.
- gaddlemaps.guess_residue_restrains(res1, res2, offset1=0, offset2=0)[source]
Guess restrains for molecules with one residue.
- Parameters
- Returns
restrains – A list of tuples with pairs of atom numbers corresponding to start and end atoms residues. The align will be performed privileging configurations where those atoms are close. By default is set to [].
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE (STARTS IN 0).
- Return type
list of tuple of int
- gaddlemaps.interactive_restrictions(correspondence, style=None)[source]
Creates the widget to generate the restrictions of all the species in the alignment. It generates the final representation fo the widget.
- Parameters
manager (Manager) – The object that manages all the alignment process
style (Optional[int]) –
An integer that determine which style will be used to represent the widget for each specie.
0: One tab per specie. 1: Accordion, when one specie opens the other collapse 2: Vertically aligned, one over the other
The default value is 2. This is the only one fully operational, in the other ones it is necessary to manually refresh the widget in the notebook when changing between species.
- Return type
- Returns
restriction_widget (ipywidgets.Widget) – The widget that contains the constraint generator for all the species
restrictions (Dict[str, List[Tuple[int, int]]]) – The dictionary with the restrictions that will be generated by the widget for each specie. This will be initially empty and it will be filled as the widget is used.
- gaddlemaps.minimize_molecules(mol1_positions, mol2_positions, mol2_com, sigma_scale, n_steps, restriction, mol2_bonds_info, displacement_module, sim_type)[source]
Minimizes the distance between two molecules.
- Parameters
mol1_positions (np.array((N, 3))) – The positions of the atoms of the static molecule.
mol2_positions (np.array((N, 3))) – The positions of the atoms of the mobile molecule.
mol2_com (np.array(3)) – The center of mass (or the geometric center) of the mobile molecule. This will be used to apply de rotations.
sigma_scale (float) – A number that module the amplitude of the single atom displacements.
n_steps (int) – The number of steps without changes in the chi2 in the Monte-Carlo alignment.
restriction (list of tuple of int) –
A list of tuples with pairs of atom numbers corresponding to mol1 and mol2 atoms molecules. The align will be performed privileging configurations where those atoms are close.
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE (STARTS IN 0).
mol2_bonds_info (defaultdict of int: list of tuple(int, float)) – A complex data structure that collect the information of the bond distances. The key of the property corresponds to the atom index in the molecule. The value is a list with tuples. For each tuple, the first value corresponds with the index of the bonded atom and the second is the length of the bond.
same_com (bool) – If True, translations are not allowed.
anchura (float) – A number that modules the amplitude in the translations displacements.
sim_type (tuple of int) –
- Specifies the type of the minimization. Possible options:
0 : Translation 1 : Rotation 2 : Individual atom move
If it is a list, all specified methods are combined.
- gaddlemaps.move_mol_atom(atoms_pos, bonds_info, atom_index=None, displ=None, sigma_scale=0.5)[source]
Moves an atom of a molecule respecting almost all bond distances.
By default, a random atom is picked from atom_pos and moved randomly in a certain direction (see the published article for a better description of this step).
- Parameters
atoms_pos (numpy.ndarray) – An array with the positions of the molecule atoms in rows.
bonds_info (dictionary) –
- A dict with the information of the bonds. Example:
bonds_info = {0:[(1, 5.4), (2, 6.4)], 1:[(0, 5.4), ], …}
The keys refers to atom index and values are lists with tuples. Each tuple contains the bonded atom index and the bond length.
atom_index (integer (Optional)) – The index of the atom to move (respecting the index of atoms_pos). If None is given a random one is taken.
displ (numpy.ndarray (Optional)) – The displacement vector. If None is given a random displacement is calculated in the normal plane to the line jointing most nearest atoms.
sigma_scale (float) – A factor to scale the sigma of the distribution of the displacement module.
- Returns
modified_atoms_pos – An array with the modified positions of the atoms in rows.
- Return type
- gaddlemaps.remove_hydrogens(molecule, restrictions)[source]
Returns positions of atoms that are not hydrogens and fixes restrictions.
- Parameters
molecule (Molecule) – The molecule to remove hydrogens.
restrictions (list of tuple of int) –
A list of tuples with pairs of atom numbers corresponding to start and end atoms molecules. The align will be performed privileging configurations where those atoms are close. By default is set to [].
Example
>>> restrictions = [(1, 3), (4, 5)]
IMPORTANT: INDEX ARE REFERENCED TO THE ATOM INDEX IN THE MOLECULE
- Return type
- Returns
new_positions (numpy.ndarray) – The positions of the atoms that are not hydrogens.
new_restrictions (list of tuple of int) – The restrictions with the correct indexes.