Abstract
Metabolites are small molecules that play vital roles in sustaining biological functions and supporting energy production. Investigating disease-associated metabolites gives researchers deeper insight into how they contribute to disease development. This research explores metabolite–disease associations through similarity analysis. Cancer-related diseases and their corresponding metabolites are retrieved from the Human Metabolome Database (HMDB). The metabolite and disease data are converted into numerical vectors using the TF-IDF method to enable similarity analysis. Three statistical similarity measures (Pearson correlation, Bhattacharyya distance, and Chebyshev distance) are applied to the TF-IDF dataset. The main objective of this research is to develop a graph-based model that predicts the correlation strength between metabolites and diseases using statistical measures. The research is conducted in three phases. In the first phase, the objective is to identify the most effective statistical similarity measure for detecting metabolite–disease associations. In the second phase, the similarity dataset is used as input to a Graph Attention Network (GAT) model, which classifies associations as strong, moderate, or weak. In the third phase, the TF-IDF dataset is given directly to the GAT model to predict associations between metabolites and diseases. This framework provides insight into the degree of relationship between metabolites and diseases, highlighting how metabolite imbalances may contribute to disease onset and how they may facilitate early diagnosis. In the comparative evaluation of the three measures, Pearson correlation proved to be the most suitable method for metabolite–disease analysis. Overall, the GAT model classified the metabolite–disease associations as moderate and achieved an accuracy of 99%.
Finally, the results of the GAT model on the Pearson similarity dataset are compared with the results of the GAT model on the TF-IDF dataset. While the TF-IDF approach within the GAT framework achieved 33% accuracy, the findings of this research indicate that the GAT model with Pearson similarity is particularly well suited for examining how individual metabolites are linked to multiple diseases in terms of correlation strength.
Keywords
- Metabolites
- Cancer
- Diseases
- Statistical Measures
- TF-IDF
- GAT
- Correlation Strength
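The similarity step described in the abstract (TF-IDF vectors compared with Pearson correlation, Bhattacharyya distance, and Chebyshev distance) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy text strings, the whitespace tokenizer, the smoothed IDF formula, and the normalisation of TF-IDF vectors to probability distributions before computing the Bhattacharyya distance are all assumptions made for illustration, and the abstract does not state which HMDB fields were vectorized.

```python
import numpy as np

# Hypothetical toy descriptions; these are NOT taken from HMDB.
docs = {
    "metabolite_A": "lactate glycolysis tumor hypoxia",
    "disease_X": "tumor hypoxia glycolysis carcinoma",
    "disease_Y": "bone density calcium fracture",
}

def tfidf(texts):
    """Raw term counts times a smoothed IDF (assumed variant, no row normalisation)."""
    tokens = [t.lower().split() for t in texts]
    vocab = sorted({tok for doc in tokens for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    tf = np.zeros((len(tokens), len(vocab)))
    for i, doc in enumerate(tokens):
        for tok in doc:
            tf[i, index[tok]] += 1.0
    df = (tf > 0).sum(axis=0)                       # document frequency per term
    idf = np.log((1 + len(tokens)) / (1 + df)) + 1.0
    return tf * idf

def pearson(u, v):
    """Pearson correlation; undefined (division by zero) for constant vectors."""
    uc, vc = u - u.mean(), v - v.mean()
    return float(uc @ vc / (np.linalg.norm(uc) * np.linalg.norm(vc)))

def bhattacharyya_distance(u, v):
    """Assumes non-negative vectors, rescaled here to sum to 1."""
    p, q = u / u.sum(), v / v.sum()
    bc = min(np.sum(np.sqrt(p * q)), 1.0)           # Bhattacharyya coefficient
    return float("inf") if bc == 0 else float(-np.log(bc))

def chebyshev(u, v):
    """Largest absolute coordinate-wise difference."""
    return float(np.max(np.abs(u - v)))

a, x, y = tfidf(list(docs.values()))                # one row per entry in docs

for name, vec in (("disease_X", x), ("disease_Y", y)):
    print(name, pearson(a, vec), bhattacharyya_distance(a, vec), chebyshev(a, vec))
```

Pearson correlation is a similarity (higher means more related), while the other two are distances (lower means more related), so the three scores need to be oriented consistently before they are compared or passed to a downstream classifier.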