GROMACS Cluster Analysis Code Tutorial


What is cluster analysis?

Cluster analysis, or clustering, is a main task of exploratory data mining and a common technique for statistical data analysis. It is used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning. Cluster analysis is not one specific algorithm but the general task to be solved. In practice, the data preprocessing and model parameters usually have to be adjusted until the result has the required properties.

Tutorial

The knowledge needed to write a GROMACS analysis program is explained in the first two documents, but readers who are not familiar with C++ may still find it difficult to follow. The specific procedure is therefore given here for reference.

1. Declare the program to be added in src\gromacs\gmxana\gmx_ana.h:

int gmx_mdcluster(int argc, char *argv[]);

2. Register the module under the comment "// Modules from gmx_ana.h" in src\programs\legacymodules.cpp, so that it can be called from the gmx main program:

registerModule(manager, &gmx_mdcluster, "mdcluster", "md Cluster structures");

3. Combine the two files src\gromacs\gmxana\gmx_mdmat.cpp and src\gromacs\gmxana\gmx_cluster.cpp into a single file named mdcluster.cpp.

4. Tidy up the header includes of mdcluster.cpp and rename the main function to gmx_mdcluster, matching the declaration above.

5. Try to compile. If you encounter a function redefinition error, rename the offending function. Repeat until compilation succeeds and gmx mdcluster runs.

6. Write the functions you need. The key is to understand the distance matrix produced by the mdmat function and to use it in the cluster function.

7. If necessary, add further functions, using the other analysis programs that come with GROMACS as a reference.
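
The step of feeding a distance matrix into a clustering routine can be sketched on its own, outside GROMACS. The example below is a minimal illustration, not the tutorial's actual code: it assumes the distance matrix for one frame is already available as a square array, and it uses single-linkage grouping with one cutoff. The names clusterByCutoff and countClusters are hypothetical helpers introduced only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of GROMACS: groups items whose pairwise
// distance is below `cutoff`, joining groups transitively (single linkage).
// `dist` is a symmetric N x N matrix, e.g. one frame of mdmat-style output.
// Returns one cluster label per item, numbered from 0 in order of first
// appearance.
std::vector<int> clusterByCutoff(const std::vector<std::vector<double>>& dist, double cutoff)
{
    const std::size_t n = dist.size();
    for (const auto& row : dist)
    {
        assert(row.size() == n); // the matrix must be square
    }

    // Union-find: parent[i] points towards the representative of i's cluster.
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{ 0 });

    auto findRoot = [&parent](std::size_t i) {
        while (parent[i] != i)
        {
            parent[i] = parent[parent[i]]; // path halving
            i         = parent[i];
        }
        return i;
    };

    // Merge every pair that lies within the cutoff.
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j)
        {
            if (dist[i][j] < cutoff)
            {
                const std::size_t rootI = findRoot(i);
                const std::size_t rootJ = findRoot(j);
                parent[rootI]           = rootJ;
            }
        }
    }

    // Relabel the roots as consecutive cluster indices.
    std::vector<int> label(n, -1);
    std::vector<int> rootLabel(n, -1);
    int              nextLabel = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t root = findRoot(i);
        if (rootLabel[root] < 0)
        {
            rootLabel[root] = nextLabel++;
        }
        label[i] = rootLabel[root];
    }
    return label;
}

// Number of clusters in a labelling produced by clusterByCutoff.
int countClusters(const std::vector<int>& label)
{
    int maxLabel = -1;
    for (int l : label)
    {
        maxLabel = std::max(maxLabel, l);
    }
    return maxLabel + 1;
}
```

Calling clusterByCutoff once per trajectory frame and recording countClusters for each frame would give the kind of per-frame cluster count that the program writes to its xvg output. Inside GROMACS the matrix would come from the merged mdmat code rather than from a std::vector.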

Code

Instructions

gmx mdcluster -f -s -n

Options

-f: default traj.trr, the trajectory file to be analyzed

-s: default topol.tpr, the run input file

-n: default index.ndx, optional index file

-g: default cluster.log, output file, containing cluster information at each moment

-num: default num.xvg, optional output xvg file, the number of clusters at each moment

-xyz: default cluster-xyz.pdb, optional output coordinate file in xyz format, listing the coordinates of each cluster. The extension is pdb because GROMACS does not support xyz as an output file type, so pdb has to be used instead.

Other options related to cluster analysis are unchanged; refer to the cluster documentation.
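
For readers unfamiliar with the xyz layout mentioned for the -xyz option, the sketch below writes one frame in the common plain-text xyz convention: an atom count, a comment line, then one "element x y z" line per atom. It is a standalone illustration, not the tutorial's output routine. XyzAtom, writeXyzFrame, and xyzFrameToString are hypothetical names, and any unit conversion (GROMACS works in nm, while xyz files are conventionally in Angstrom) is assumed to be done by the caller.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for this sketch; not a GROMACS type.
struct XyzAtom
{
    std::string element;
    double      x;
    double      y;
    double      z;
};

// Writes one frame in plain xyz layout to `out`.
// Coordinates are written as given; the caller chooses the units.
void writeXyzFrame(std::ostream& out, const std::string& comment, const std::vector<XyzAtom>& atoms)
{
    // The comment must fit on a single line or the frame becomes unreadable.
    assert(comment.find('\n') == std::string::npos);

    out << atoms.size() << '\n' << comment << '\n';
    for (const XyzAtom& atom : atoms)
    {
        out << atom.element << ' ' << atom.x << ' ' << atom.y << ' ' << atom.z << '\n';
    }
}

// Convenience wrapper that returns the frame as a string.
std::string xyzFrameToString(const std::string& comment, const std::vector<XyzAtom>& atoms)
{
    std::ostringstream buffer;
    writeXyzFrame(buffer, comment, atoms);
    return buffer.str();
}
```

Writing one such frame per cluster, with the cluster index in the comment line, is one possible way to list the coordinates of each cluster in a single file.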

Note

  • There is only one cutoff distance used in cluster analysis, which may not be suitable for complex systems. It can be extended to use different cutoff values for different atoms.
  • It may be better to use the pdb format for the output configuration, although the file is larger.
  • Alternative cluster analysis algorithms can be used.
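
The first note, using different cutoff values for different atoms, could be approached with a lookup table indexed by a pair of atom types. The sketch below is one possible design under stated assumptions: atom types are small integer indices, the table is symmetric, and CutoffTable and isNeighbour are hypothetical names that do not exist in GROMACS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical symmetric table of cutoffs indexed by atom type.
struct CutoffTable
{
    std::size_t         numTypes;
    std::vector<double> values; // row-major numTypes x numTypes

    CutoffTable(std::size_t n, double defaultCutoff) :
        numTypes(n), values(n * n, defaultCutoff)
    {
    }

    // Sets the cutoff for the type pair (a, b) and, by symmetry, (b, a).
    void set(std::size_t a, std::size_t b, double cutoff)
    {
        assert(a < numTypes && b < numTypes);
        values[a * numTypes + b] = cutoff;
        values[b * numTypes + a] = cutoff;
    }

    double get(std::size_t a, std::size_t b) const
    {
        assert(a < numTypes && b < numTypes);
        return values[a * numTypes + b];
    }
};

// True when two atoms of the given types, separated by `distance`,
// are close enough to be placed in the same cluster.
bool isNeighbour(const CutoffTable& table, std::size_t typeI, std::size_t typeJ, double distance)
{
    return distance < table.get(typeI, typeJ);
}
```

A clustering loop would then call isNeighbour for each pair in place of a comparison against the single global cutoff.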