DigFrag as a digital fragmentation method used for artificial intelligence-based drug design

Yang, Ruoqi; Zhou, Hao; Wang, Fan; Yang, Guangfu

doi:10.1038/s42004-024-01346-5

Download PDF

Article
Open access
Published: 11 November 2024

DigFrag as a digital fragmentation method used for artificial intelligence-based drug design

Communications Chemistry volume 7, Article number: 258 (2024) Cite this article

1545 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Fragment-Based Drug Design (FBDD) plays a pivotal role in the field of drug discovery and development. The construction of high-quality fragment libraries is a critical step in FBDD. Conventional fragmentation approaches often rely on rigid rules and chemical intuition, limiting their adaptability to diverse molecular structures. The rapid development of Artificial Intelligence (AI) technology offers a transformative opportunity to rethink traditional methods. Here, we present DigFrag, a digital fragmentation method that highlights important substructures by focusing locally within the molecular graph. In addition, we feed the fragments segmented by machine intelligence and human expertise into the deep generative model to compare the preference for data from different sources. Experimental results show that the structural diversity of fragments segmented by DigFrag is higher, and more desirable compounds are generated based on these fragments. These results also demonstrate that data generated based on AI methods may be more suitable for AI models. Moreover, a user-friendly platform called MolFrag (https://dpai.ccnu.edu.cn/MolFrag/) is developed based on various fragmentation techniques to support molecular segmentation.

Molecular fragmentation as a crucial step in the AI-based drug development pathway

Article Open access 01 February 2024

A deep generative model for molecule optimization via one fragment modification

Article 09 December 2021

Structure-based drug design with equivariant diffusion models

Article Open access 09 December 2024

Introduction

Fragment-based drug design (FBDD) is the process of screening for favorable fragments and combining them to create drug molecules^1,2. Compared to traditional high-throughput screening methods, FBDD allows the exploration of a much broader chemical space and the discovery of active compounds with a higher hit rate³. A high-quality fragment library is a prerequisite for FBDD. Typically, the segmentation of a molecule in accordance with predefined guidelines stands as a highly effective means of obtaining these structural fragments, and is of paramount importance in the field of FBDD⁴. However, it is noteworthy that few innovative studies on fragmentation methods have been reported. Conventional fragmentation methods, exemplified by the likes of RECAP⁵ and BRICS⁶, predominantly hinge on the principles of retrosynthesis, where a set of rules govern the breaking of chemical bonds, often emphasizing the selective cleavage of acyclic bonds. As eloquently elucidated by Diao et al., this restriction has its rationale, but it greatly reduces the novelty of the obtained fragments, especially for macrocyclic compounds^7,8. Furthermore, these methods do not allow a reasonable selection of fragment size. To elaborate, the diminution of fragment size to an extreme extent may inadvertently disrupt the pharmacophores, while an excessively large fragment may not efficiently extract smaller substituents⁹.

Over the past decade, artificial intelligence (AI) technologies have achieved great success in many fields. With the increasing abundance of biomedical data, many AI-driven predictive models have been introduced into the field of drug discovery and development, including convolutional neural networks (CNNs)¹⁰, recurrent neural networks (RNNs)¹¹, and graph neural networks (GNNs)¹². In particular, GNNs consider the input molecules as graphs with attributes, and obtain graph embeddings through the three steps of message passing, aggregation, and updating¹³. Studies have shown that the attention mechanism serves to improve both the efficiency and accuracy of the model by directing its focus towards feature information that is more relevant to the given task^14,15. Combining attention mechanisms with GNNs can better reveal the relative importance of structural fragments within a compound.

Typically, chemists with extensive expertise segment molecules based on chemical rules. In contrast, machine intelligence-oriented perspective, while lacking domain knowledge, provide a distinctive paradigm by segmenting molecules based on mathematical logic (i.e., statistics). Through a comparative analysis of these fragmentation methods, our investigation illuminates that AI-based models segments unique fragments that are feasible in the virtual world but may not be applicable in the real world. This discrepancy between reality and virtuality underscores the intrinsic advantage of AI-based methods, as they enable the exploration of uncharted territories and overcome the limitations of traditional methods. Consequently, the potential of AI to transcend the boundaries of established chemical paradigms becomes manifest, opening frontiers in the field of molecular segmentation.

On this basis, we propose DigFrag, a digital fragmentation method based on the graph attention mechanism, which gets rid of the constraints of traditional chemical rules and segments drug/pesticide molecules with attention weights (i.e., contribution to the prediction task) oriented to ultimately obtain the drug/pesticide-like fragments from the perspective of machine intelligence rather than human expertise. Comparison results show that the structural diversity of the fragments segmented by DigFrag is higher. In addition, the application of these fragments to deep generative models yields desirable compounds, highlighting the potential of DigFrag in FBDD. The scheme of this work is depicted in Fig. 1. We believe that DigFrag is a more applicable fragmentation method for AI drug discovery.

**Fig. 1: The workflow of this study is divided into three parts.**

Results

Comparison of the property distributions of segmented fragments

Prior to comparison, we trained the models to segment drug/pesticide fragments. The results of the five-fold cross-validation are presented in Supplementary Table 1. It can be seen that these models demonstrated excellent and robust performance across the three key metrics, with accuracy, area under the curve (AUC), and Matthews correlation coefficient (MCC) exceeding 0.90, 0.96, and 0.80, respectively.

Next, we conducted an in-depth evaluation of the fragments obtained by the proposed method (DigFrag), the conventional methods (RECAP⁵ and BRICS⁶), and the latest method (MacFrag⁸), focusing on the property distributions as a critical metric of comparison. As shown in Table 1 and Supplementary Fig. 1, for drug fragments, the DigFrag fragments were much closer to the BRICS fragments in terms of property distributions. For example, the differences in molecular weight and the number of hydrogen bond acceptors between the two groups of fragments were not significant. Interestingly, we found that the DigFrag fragments had a significantly higher number of rotatable bonds compared to the BRICS fragments, which may be attributed to their unique cleavage of cyclic bonds. However, for pesticide fragments, the properties of the DigFrag fragments were distributed in lower regions (Table 2 and Supplementary Fig. 2). Taking molecular weight as an example, the average molecular weight of DigFrag fragments was 137.07, signifying a substantial reduction in comparison with the other three fragments. It should be noted that all properties of the DigFrag fragments showed a similar trend, whether they were drugs or pesticides. In addition, we calculated the number of fragments obtained following the segmentation of each drug/pesticide molecule by the four methods. As shown in Supplementary Fig. 3, none of these methods, except MacFrag, segmented the molecule into more than 20 fragments. In terms of distributions, DigFrag displayed greater similarity to BRICS, and both methods tended to segment smaller fragments.

Table 1 Average properties of drug fragments segmented by the four methods (BRICS, RECAP, MacFrag and DigFrag)

Full size table

Table 2 Average properties of pesticide fragments segmented by the four methods (BRICS, RECAP, MacFrag and DigFrag)

Full size table

Comparison of the structural diversity of segmented fragments

We then performed a rigorous analysis of the four methods, with a primary focus on the structural diversity of the segmented fragments. As shown in Supplementary Table 2, our findings demonstrated a remarkable disparity. Specifically, for both drugs and pesticides, the number of duplicates between the fragments segmented by the DigFrag method and the other three methods was low (9.97–21.37% for drugs and 8.94–15.20% for pesticides). In contrast, the fragments segmented by the MacFrag method almost covered the BRICS fragments (96.42% for drugs and 99.05% for pesticides) and the RECAP fragments (76.59% for drugs and 76.31% for pesticides). This suggests that the MacFrag method is only an extension of the conventional methods, while the proposed DigFrag method is able to obtain more unique fragments. In addition, we counted the occurrence numbers of each fragment in the four fragmentation methods and visualized their spatial distribution using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. As illustrated in Supplementary Figs. 4–5, over 75% of the drug/pesticide fragments occurred in only one method. The proportion of drug/pesticide fragments shared by two methods (mainly MacFrag) was approximately 20%, while less than 5% of the remaining fragments occurred in three or four methods. It is noteworthy that despite the small number of fragments shared by the four methods, these had a more dispersed distribution in chemical space compared to the fragments shared by the three methods.

To provide a more comprehensive insight into the assessment of structural diversity, we employed the fragment cluster ratio as a quantitative measure, which provides a more intuitive reflection of the overall structural diversity within the fragment sets. As shown in Fig. 2 and Supplementary Tables 3–4, the DigFrag method consistently yielded higher fragment cluster ratios than other methods. In particular, for the most commonly used similarity thresholds (0.4 and 0.6), the fragments segmented by the DigFrag method exhibited greater structural diversity.

**Fig. 2: Fragment cluster ratio of drug fragments and pesticide fragments at different Tanimoto similarity thresholds.**

Comparison of the qualities of generated molecules

Subsequently, we conducted a comparative analysis of generative models utilizing distinct sets of fragments on the MOSES benchmarking platform¹⁶. As shown in Tables 3–4, the drug molecules generated by the MacFrag-based model achieved optimal results in terms of uniqueness and diversity, but the Filters score was significantly lower than that of the other models, which suggests that these compounds are unsatisfactory in terms of safety. In contrast, the Filters score of the drug molecules generated by the DigFrag-based model was 0.828, indicating the improved safety of compounds modified by digital fragments compared to other fragments. We speculate that this observed superiority may be attributed to the inherent capacity of deep learning models to incorporate considerations of toxicity risk, stability, and other pertinent factors during the fragmentation process, which is not possible with the other three methods. Similarly, except for a slightly higher proportion of duplications, the pesticide molecules generated by the DigFrag-based model achieved superior performance in terms of SMILES validity, novelty, scaffold diversity, and structure alerts. Moreover, an examination of the average Quantitative Estimation of Drug-likeness (QED)¹⁷ and Synthetic Accessibility (SA)¹⁸ revealed that drug/pesticide molecules generated by the DigFrag-based model outperformed those generated by the other models, as illustrated in Fig. 3. To quantitatively compare the similarity between the property distributions of the generated molecules and the MOSES dataset, the Wasserstein-1 distance provided by the platform was utilized. As shown in Supplementary Tables 5–6, the drug/pesticide molecules generated by the DigFrag model exhibited the highest similarity in terms of three property distributions (molecular weight, QED, and SA). Taken together, these results demonstrate that FBDD employing DigFrag fragments yields molecules of higher quality. Furthermore, the findings suggest a preference within AI-based models for AI-sourced data, reinforcing the potential advantages of leveraging AI in molecular design.

Table 3 Performance evaluation of different metrics for the generated drug molecules using the four fragment-based deep generative models

Full size table

Table 4 Performance evaluation of different metrics for the generated pesticide molecules using the four fragment-based deep generative models

Full size table

**Fig. 3: Evaluating the qualities of the representative molecules generated by the four fragment-based deep generative models.**

Binding affinity calculation of generated molecules to specific targets

Considering that the DeepFMPO architecture does not incorporate information about the target during the optimization of molecular structures, we performed molecular docking to assess the binding strength of the modified drug/pesticide molecules to the dopamine receptor D2 (DRD2)/4-hydroxyphenylpyruvate dioxygenase (HPPD). As anticipated, marginal disparities in binding affinity were observed between the drug/pesticide molecules generated by the four methods. Specifically, the binding free energy ranges of the generated compounds were −9.16 to −9.36 Kcal/mol (DRD2) and −8.37 to −8.53 Kcal/mol (HPPD), respectively (Supplementary Fig. 6). It is noteworthy that, despite the lack of a pronounced advantage for the drug/pesticide molecules generated by the DigFrag method, they manifested an acceptable level of binding to the corresponding targets, indicating the preservation of desired properties and high affinities.

Binding mode analysis of selected molecules to specific targets

Through the application of refined filters, we identified 24 desirable drug molecules and 20 desirable pesticide molecules that satisfy QED values greater than 0.75, SA values less than 3, and binding free energies less than the positive drug Domperidone (−10.7 Kcal/mol)¹⁹/positive pesticide Mesotrione (−8.4 Kcal/mol)²⁰. The chemical structures of these selected molecules are presented in Supplementary Figs. 7–8. Subsequently, an in-depth analysis of the interaction between several desirable drug/pesticide molecules and the corresponding targets was performed. As shown in Figs. 4–5, all drug ligands were effectively docked into the active pocket of DRD2 and formed hydrogen bonds with amino acid residues SER-193, TYR-416, THR-119, VAL-115, TRP-413, and GLU-95. Similarly, all pesticide ligands were stably bound to HPPD through hydrogen bonding interactions with amino acid residues SER-65, SER328, ASP-211, ASP-387, TYR-88, TYR-103, and LEU-78. Intriguingly, the generated compounds exhibited a distinct binding mode compared to the positive drug, suggesting a potential pharmacological mechanism that warrants further exploration in future investigations.

**Fig. 4: Binding mode analysis of generated drug molecules to DRD2 via AutoDock.**

**Fig. 5: Binding mode analysis of generated pesticide molecules to HPPD via AutoDock.**

Web server construction and application

Although several fragmentation methods have been reported, no easy-to-use web server is available. Therefore, an innovative platform called MolFrag (https://dpai.ccnu.edu.cn/MolFrag/) was constructed that seamlessly combines four molecular fragmentation approaches, ensuring accessibility to researchers with varying levels of expertise (Fig. 6). We believe that the application of MolFrag streamlines the process of molecular segmentation, making it an invaluable resource for the scientific community.

**Fig. 6: The interface of the Molecular Fragmentation in MolFrag.**

Discussion

In recent years, the field of drug discovery and development has been profoundly revolutionized by advances in AI technology^10,11,12. In this work, we introduced DigFrag, an intuitive fragmentation method that segments molecular structures based on AI-driven models, thereby circumventing the constraints inherent in traditional bond-breaking rules. We comprehensively compared the fragments obtained by machine intelligence with those obtained by human expertise. The results demonstrate the superior efficacy of DigFrag in the segmentation of diverse molecular fragments. Furthermore, we fed two initial sets of compounds with the four sets of drug/pesticide fragments into an existing deep generative model. The outcomes of this experiment exhibited the propensity of the DigFrag-derived model to generate molecules characterized by desired properties, thereby substantiating that AI-based models may be more compatible with AI-sourced data.

When we review conventional fragmentation methods, it becomes evident that they are limited in certain contexts. Chemists have successfully refined these methods through extensive experience and domain knowledge. However, in the evolving scientific environment, we are faced with increasingly complex molecular systems. Therefore, we want to explore how we can leverage the power of AI to drive the evolution of molecular fragmentation methods. In addition, we focus not only on the limitations of traditional methods, but also on the differences in data adaptability between machine intelligence and human expertise.

In the FBDD study, we integrated the fragments segmented by the four methods (BRICS, RECAP, MacFrag and DigFrag) into the DeepFMPO framework. To our surprise, AI-based models appeared to be more effective in utilizing the data derived from AI-based methods, and the generated molecules exhibited remarkable excellence in most of the evaluation metrics. Moreover, conventional fragmentation methods tended to ignore target-related information, while DigFrag was able to incorporate this information, which facilitated the segmentation of the same molecular entities into distinct fragments, thereby leading to more personalized molecular design.

However, it is imperative to acknowledge that there is discernible room for improvement within the proposed method. Firstly, an important direction for future research lies in conducting a comprehensive investigation to explore the underlying reasons why data from AI-based methods exhibit increased compatibility with AI-based models. Secondly, recognizing the limitations associated with the two-dimensional graph, which inherently neglects the geometric space of compounds, we aspire to segment molecules based on the three-dimensional graph. This endeavor is expected to yield richer conformational information about the fragments. Furthermore, since the development of fragment-based deep generative models is still in its infancy, we are committed to developing more deep learning frameworks for specific targets, thus expanding the practical application of this method.

Overall, our findings demonstrate that data generated based on AI methods may be more applicable to AI models. We strongly believe that this idea establishes a versatile research paradigm with far-reaching implications in various scientific domains (e.g., materials). For future work, we will strive to extend this idea to a broader range of subtasks, thereby generating more expansive data available for AI-based model exploitation.

Methods

Dataset curation

The modeling dataset utilized in this study was sourced from our self-constructed database called PADFrag²¹. This database primarily contains 1652 United States Food and Drug Administration (FDA)-approved drugs cataloged in DrugBank and 1259 commercial pesticides listed in Alan Wood. To improve data consistency, structurally non-standard compounds were systematically excluded. The dataset was then partitioned into distinct subsets for the purposes of training, validation, and testing in an 8:1:1 ratio.

AI-based fragmentation method

Construction of molecular graph

In the field of drug discovery and development, GNN-based architectures have been shown to achieve excellent predictive performance for various subtasks^12,22. The molecular graph, serving as the input representation of GNNs, is defined as G = (V, E), where V stands for nodes (atoms) and E stands for connecting edges (chemical bonds). In this work, the featurization of these elements was executed using the AttentiveFPAtomFeaturizer and AttentiveFPBondFeaturizer modules of the open-source toolkit DGL-LifeSci²³ (Supplementary Table 7).

Model architecture

In this study, we employed the AttentiveFP framework introduced by Xiong et al. to capture the structural information of molecules²⁴. This architecture represents a feature extraction network based on the graph attention mechanism. Specifically, the original molecular graph is initially fed into a series of attention layers to obtain individual atomic embeddings. Subsequently, these embeddings are combined into a unified vector, commonly referred to as a super node. Finally, the entire molecular embedding is obtained through multiple attention layers. It is noteworthy that the attention layer consists of two essential components: the aggregation module (Eqs. 1–3) based on the attention mechanism and the update module (Eqs. 4–5) based on the gated recurrent unit (GRU).

$${e}_{{vu}}={leaky}-{relu}\left(W\cdot \left[{h}_{v}\,,\,{h}_{u}\right]\right)$$

(1)

$${a}_{{vu}}=\frac{exp \left({e}_{{vu}}\right)}{{\sum }_{u\in N\left(v\right)} exp \left({e}_{{vu}}\right)}$$

(2)

$${C}_{v}={elu}\left({\sum}_{u\in N\left(v\right)}{a}_{{vu}}\cdot W\cdot {h}_{v}\,\right)$$

(3)

where ${h}_{v}$ and ${h}_{u}$ represent the vector of node $v$ and the neighbor node $u$, and $W$ is a trainable weight matrix.

$${C}_{v}^{k-1}={\sum}_{u\in N\left(v\right)}{M}^{k-1}\left({h}_{u}^{k-1},{h}_{v}^{k-1}\right)$$

(4)

$${h}_{v}^{k}={{GRU}}^{k-1}\left({C}_{v}^{k-1},{h}_{v}^{k-1}\right)$$

(5)

where $k$ represent iteration number, and $M$ is the message function.

Segmentation of molecules

Considering the latent relationship between structural fragments and drug/pesticide-likeness, DigFrag was used to leverage the graph attention mechanism to segment the molecular graph into subgraphs of different sizes (i.e., structural fragments). Consecutively, the fragments with relatively diminished relevance were filtered out by setting an attention weight threshold, while the fragments closely related to the drug/pesticide-likeness were retained. Moreover, the conventional fragmentation method (BRICS and RECAP) was implemented in RDKit, and the latest fragmentation method (MacFrag) was implemented through the source code released by the authors.

Comparative analysis of the fragments obtained by the four methods

In the present study, we conducted a comparative assessment of segmented fragments from two aspects. At the structural level, we generated the corresponding Morgan fingerprint for each fragment using the GetMorganFingerprintAsBitVect module of RDKit. For a more rigorous quantitative evaluation, we executed Butina clustering based on the Tanimoto Coefficient²⁵ for distinct groups of fragments using the BulkTanimotoSimilarity and ClusterData modules of RDKit and then calculated the fragment cluster ratio, which is denoted as the ratio of fragment clusters to the total number of fragments. Higher values of this metric indicate greater diversity in the structural landscape. At the property level, we used the open-source software PaDEL-Descriptor²⁶ to calculate several attributes related to the drug/pesticide-likeness rules, including molecular weight, LogP, topological polar surface area, number of hydrogen bond donors, number of hydrogen bond acceptors, number of rotatable bonds^27,28. It should be noted that the calculation of fingerprints and attributes did not consider the breaking position of the fragments, mainly due to the fact that these representations are not relevant to the broken bond information.

Fragment-based molecular generation

To further elucidate the influence of digital and traditional fragments on fragment-based deep generative models, we undertook an investigation utilizing the DeepFMPO²⁹ architecture developed by Ståhl et al. with a specific focus on its application in de novo drug design. DeepFMPO is an actor-critic reinforcement learning model that acquires the ability to generate compounds with desired properties by replacing fragments within a compound. Therefore, the generation of molecules using DeepFMPO requires a balanced binary tree library and an initial set of lead molecules. In this study, the former are the fragments segmented by different methods, and the latter are the molecules collected from the ChEMBL database³⁰ that did not fulfill the sweet spot criteria (561 molecules for DRD2 and 221 molecules for HPPD). The model architecture used a bidirectional long short-term memory (LSTM) network, with the number of neurons in the hidden layer being 128, 64, and 32, respectively. The learning rate and batch size were set to 1e-4 and 512, respectively. Model training was executed until 4000 epochs were reached, after which all molecules generated in the last 100 epochs were compiled and a random selection of 1000 molecules was sampled as representatives for subsequent analysis. Performance evaluation of each model was conducted on the MOSES benchmarking platform¹⁶.

Molecular docking of the generated molecules

Docking studies were performed to assess the protein-ligand binding affinity of representative molecules with DRD2 (drug) and HPPD (pesticide). The three-dimensional structures of the targets were obtained from the Protein Data Bank database³¹ (ID: 6CM4³² and 5YWG³³) and subsequently subjected to dehydration, hydrogenation, and charge calculation using AutoDockTools software. The three-dimensional structures of representative drug/pesticide molecules were constructed using ChemOffice software³⁴. Binding free energy calculations were performed using Autodock Vina software³⁵. Furthermore, the visualization of protein-ligand interactions was achieved through the utilization of PyMOL software³⁶.

Data availability

The data required to reproduce our results are available at https://doi.org/10.5281/zenodo.13997755. The numerical source data of main figures are available in Supplementary Data 1. All other data are available from the corresponding authors upon reasonable request.

Code availability

The source code implemented in this study is available at https://github.com/yang1rq/MolFrag.

References

Bian, Y. & Xie, X. S. Computational fragment-based drug design: current trends, strategies, and applications. Aaps. J. 20, 59 (2018).
Article PubMed Google Scholar
Fattori, D. Molecular recognition: the fragment approach in lead generation. Drug. Discov. Today 9, 229–238 (2004).
Article CAS PubMed Google Scholar
Sheng, C. & Zhang, W. Fragment informatics and computational fragment-based drug design: an overview and update. Med. Res. Rev. 33, 554–598 (2013).
Article CAS PubMed Google Scholar
Erlanson, D. A., Davis, B. J. & Jahnke, W. Fragment-based drug discovery: advancing fragments in the absence of crystal structures. Cell. Chem. Biol. 26, 9–15 (2019).
Article CAS PubMed Google Scholar
Lewell, X. Q., Judd, D. B., Watson, S. P. & Hann, M. M. RECAP–retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci. 38, 511–522 (1998).
Article CAS PubMed Google Scholar
Degen, J., Wegscheid-Gerlach, C., Zaliani, A. & Rarey, M. On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 3, 1503–1507 (2008).
Article CAS PubMed Google Scholar
Cummings, M. D. & Sekharan, S. Structure-based macrocycle design in small-molecule drug discovery and simple metrics to identify opportunities for macrocyclization of small-molecule ligands. J. Med. Chem. 62, 6843–6853 (2019).
Article CAS PubMed Google Scholar
Diao, Y., Hu, F., Shen, Z. & Li, H. MacFrag: segmenting large-scale molecules to obtain diverse fragments with high qualities. Bioinformatics 39, btad012 (2023).
Article CAS PubMed PubMed Central Google Scholar
Cramer, J., Sager, C. P. & Ernst, B. Hydroxyl groups in synthetic and natural-product-derived therapeutics: a perspective on a common functional group. J. Med. Chem. 62, 8915–8930 (2019).
Article CAS PubMed Google Scholar
Zheng, L., Fan, J. & Mu, Y. OnionNet: a multiple-layer intermolecular-contact-based convolutional neural network for protein-ligand binding affinity prediction. ACS Omega 4, 15956–15965 (2019).
Article CAS PubMed PubMed Central Google Scholar
Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Article CAS PubMed Google Scholar
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Z., Guan, J. & Zhou, S. FraGAT: a fragment-oriented multi-scale graph attention model for molecular property prediction. Bioinformatics 37, 2981–2987 (2021).
Article CAS PubMed PubMed Central Google Scholar
Cai, H., Zhang, H., Zhao, D., Wu, J. & Wang, L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 23, bbac408 (2022).
Article PubMed Google Scholar
Wu, Z. et al. Mining toxicity information from large amounts of toxicity data. J. Med. Chem. 64, 6924–6936 (2021).
Article CAS PubMed Google Scholar
Polykovskiy, D. et al. Molecular sets (MOSES): a benchmarking platform for molecular generation models. Front. Pharmacol. 11, 565644 (2020).
Article CAS PubMed PubMed Central Google Scholar
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Article PubMed PubMed Central Google Scholar
Xiang, Z., Ma, H., Mou, Y. & Xu, C. F. Association between polymorphism of dopamine D2 receptor genes and therapeutic effect of domperidone in functional dyspepsia. Turk. J. Gastroenterol. 26, 1–5 (2015).
Article CAS PubMed Google Scholar
Yu, X. H. et al. Discovery and development of 4-hydroxyphenylpyruvate dioxygenase as a novel crop fungicide target. J. Agric. Food Chem. 71, 19396–19407 (2023).
Article CAS PubMed Google Scholar
Yang, J. F. et al. PADFrag: a database built for the exploration of bioactive fragment space for drug discovery. J. Chem. Inf. Model. 58, 1725–1730 (2018).
Article CAS PubMed Google Scholar
Feinberg, E. N. et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 4, 1520–1530 (2018).
Article CAS PubMed PubMed Central Google Scholar
Li, M. et al. DGL-LifeSci: an open-source toolkit for deep learning on graphs in life science. ACS Omega 6, 27233–27238 (2021).
Article CAS PubMed PubMed Central Google Scholar
Xiong, Z. et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J. Med. Chem. 63, 8749–8760 (2020).
Article CAS PubMed Google Scholar
Zhang, B., Vogt, M., Maggiora, G. M. & Bajorath, J. Design of chemical space networks using a Tanimoto similarity variant based upon maximum common substructures. J. Comput. Aided Mol. Des. 29, 937–950 (2015).
Article CAS PubMed Google Scholar
Yap, C. W. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474 (2011).
Article CAS PubMed Google Scholar
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).
Article CAS PubMed Google Scholar
Tice, C. M. Selecting the right compounds for screening: does Lipinski’s Rule of 5 for pharmaceuticals apply to agrochemicals? Pest. Manag. Sci. 57, 3–16 (2001).
Article CAS PubMed Google Scholar
Ståhl, N., Falkman, G., Karlsson, A., Mathiason, G. & Boström, J. Deep reinforcement learning for multiparameter optimization in de novo drug design. J. Chem. Inf. Model. 59, 3166–3176 (2019).
Article PubMed Google Scholar
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Article CAS PubMed Google Scholar
Goodsell, D. S. et al. RCSB Protein Data Bank: enabling biomedical research and drug discovery. Protein Sci. 29, 52–65 (2020).
Article CAS PubMed Google Scholar
Wang, S. et al. Structure of the D2 dopamine receptor bound to the atypical antipsychotic drug risperidone. Nature 555, 269–273 (2018).
Article CAS PubMed PubMed Central Google Scholar
Lin, H. Y. et al. Molecular insights into the mechanism of 4-hydroxyphenylpyruvate dioxygenase inhibition: enzyme kinetics, X-ray crystallography and computational simulations. FEBS J. 286, 975–990 (2019).
Article CAS PubMed Google Scholar
Buntrock, R. E. ChemOffice ultra 7.0. J. Chem. Inf. Comput. Sci. 42, 1505–1506 (2002).
Article CAS PubMed Google Scholar
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
Article CAS PubMed PubMed Central Google Scholar
Yuan, S., Chan, H. C. S. & Hu, Z. Using PyMOL as a platform for computational drug design. Wires. Comput. Mol. Sci. 7, e1298 (2017).

Download references

Acknowledgements

This work is supported by the National Key Research and Development Program of China (2023YFD1700500) and the National Natural Science Foundation of China (No. 21837001).

Author information

Authors and Affiliations

State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
Ruoqi Yang, Hao Zhou, Fan Wang & Guangfu Yang

Authors

Ruoqi Yang
View author publications
You can also search for this author in PubMed Google Scholar
Hao Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Fan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Guangfu Yang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.W. and G.Y. conceived the project. R.Y. and H.Z. performed the computational experiments. F.W. and R.Y. collected and analyzed the data. R.Y. wrote the manuscript. F.W. and G.Y. provided evaluation and suggestions.

Corresponding authors

Correspondence to Fan Wang or Guangfu Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Yang, R., Zhou, H., Wang, F. et al. DigFrag as a digital fragmentation method used for artificial intelligence-based drug design. Commun Chem 7, 258 (2024). https://doi.org/10.1038/s42004-024-01346-5

Download citation

Received: 04 July 2024
Accepted: 29 October 2024
Published: 11 November 2024
DOI: https://doi.org/10.1038/s42004-024-01346-5

Subjects

Abstract

Similar content being viewed by others

Molecular fragmentation as a crucial step in the AI-based drug development pathway

A deep generative model for molecule optimization via one fragment modification

Structure-based drug design with equivariant diffusion models

Introduction

Results

Comparison of the property distributions of segmented fragments

Comparison of the structural diversity of segmented fragments

Comparison of the qualities of generated molecules

Binding affinity calculation of generated molecules to specific targets

Binding mode analysis of selected molecules to specific targets

Web server construction and application

Discussion

Methods

Dataset curation

AI-based fragmentation method

Construction of molecular graph

Model architecture

Segmentation of molecules

Comparative analysis of the fragments obtained by the four methods

Fragment-based molecular generation

Molecular docking of the generated molecules

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links