Condensation of Force Field Parameters from Machine Learning Predicted Distributions for High-Throughput Virtual Screening Applications

Zhang, Yuedong;Gadioli, Davide;Palermo, Gianluca
2025-01-01

Abstract

Transferable biomolecular force fields are developed by fitting either ab initio or experimental data for representative molecules and can then be used to model chemical entities similar to those they were developed for. However, once parametrized on a given dataset, they are difficult to refit when new chemical entities, sensing schemes, or functional forms are introduced. Machine Learning Force Fields (MLFFs), on the other hand, have recently gained attention for their accuracy and for the ease with which their Applicability Domain (AD) can be expanded. Nonetheless, their prediction times make them incompatible with High-Throughput Virtual Screening (HTVS) requirements. In this work, we invert the approach widely adopted for transferable force fields and propose a new condensation approach that uses machine learning algorithms to predict force field (FF) parameters on a massive scale. The numerical distribution generated for each FF parameter is then condensed into a single value that statistically captures the chemical variability of the underlying molecules that share that parameter and give rise to the distribution. This improves computational efficiency by 30x with a limited reduction in the accuracy of predicted molecular geometries. When tested on the public release of the OpenFF Industry Benchmark Season 1 v1.1 dataset, molecular structures optimized by minimizing the Potential Energy Surface built with condensed FF parameters show only a minor loss of performance in Root Mean Squared Deviation (RMSD) and Torsion Fingerprint Deviation (TFD) compared to those obtained using molecule-specific FF parameters predicted at runtime. To give more context, the original MLFF and its condensed version are also evaluated against several well-known transferable force fields widely used for biomolecular simulations.
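The condensation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter labels ("C-C", "C-O"), the numerical values, the `condense` helper, and the choice of the mean as the condensing statistic are all assumptions made for illustration. The abstract states only that each distribution is condensed into a single value in a statistical way.

```python
from collections import defaultdict
from statistics import mean, median


def condense(predictions, statistic=mean):
    """Collapse per-molecule parameter predictions into one value per key.

    `predictions` is an iterable of (parameter_key, predicted_value) pairs.
    `statistic` is the condensing function; the mean is an assumption here,
    not necessarily the statistic used in the paper.
    """
    grouped = defaultdict(list)
    for key, value in predictions:
        grouped[key].append(value)
    # One condensed value per parameter key, reusable without ML inference.
    return {key: statistic(values) for key, values in grouped.items()}


# Hypothetical ML-predicted equilibrium bond lengths (angstrom), standing in
# for values predicted across many molecules sharing the same FF parameter.
predictions = [
    ("C-C", 1.52), ("C-C", 1.54), ("C-C", 1.53),
    ("C-O", 1.42), ("C-O", 1.44),
]

condensed = condense(predictions)
condensed_median = condense(predictions, statistic=median)
```

Once such a table exists, a screening run can look up each parameter instead of invoking the ML model per molecule, which is the source of the efficiency gain the abstract reports.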
2025
Force Fields, MLFF, High Throughput Virtual Screening, Drug Discovery, HPC, Molecular Modeling
Files in this item:
condensation-of-force-field-parameters-from-machine-learning-predicted-distributions-for-high-throughput-virtual.pdf (open access; Publisher's version; 3.26 MB; Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1303269