Abstract
Machine Learning (ML) is transforming the world, with research breakthroughs driving progress across many fields. We are living in an era of data explosion: the volume of data that can be fed to models is larger than it has ever been, which further improves their output. As a result, prediction algorithms can now solve many of the complex problems we face by leveraging the power of data. These models can correlate a dataset and its features with an accuracy that humans fail to achieve. Bearing this in mind, this research takes an in-depth look into the problem-solving potential of ML in the area of Database Management Systems (DBMS). Although ML has reached significant scientific milestones, the field is still in its infancy. The limitations of ML models are also studied in this paper.
Keywords
- Machine learning
- Databases
- Intelligent data store