Abstract

This survey reviews techniques for detecting and recognizing text in images and videos. Text detection and recognition are core computer vision tasks with applications in document analysis, scene understanding, and video indexing. We review the approaches and algorithms proposed for both tasks, covering traditional methods based on image processing as well as more recent deep learning-based approaches, and we discuss open challenges and future directions. A common traditional approach to text detection applies edge detection to locate high-contrast regions that may contain text. Once candidate text regions have been identified, techniques such as connected component analysis or the stroke width transform are used to extract individual characters or words. For recognition, optical character recognition (OCR) algorithms convert the extracted text regions into a machine-readable format. Deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have shown promising results on both tasks, outperforming traditional methods in many cases. Despite this progress, complex backgrounds, varying fonts and sizes, and low-quality images remain challenging. Future research may focus on developing more robust and accurate algorithms that handle these conditions effectively.
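
As a rough illustration of the traditional pipeline summarized above, the following minimal sketch marks high-contrast pixels with a simple edge detector and then groups them into candidate regions by connected component analysis. The function names (`edge_map`, `connected_components`), the forward-difference edge detector, the 4-connectivity rule, and the threshold value are assumptions chosen for illustration; they are not taken from any specific method covered by the survey.

```python
from collections import deque


def edge_map(gray, threshold=50):
    """Mark pixels whose horizontal or vertical intensity jump exceeds threshold.

    `gray` is a list of rows of intensities (0-255). This is a plain
    forward-difference detector, used here only as a stand-in for a real
    edge detector.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Pixels on the last column/row have no forward neighbour.
            dx = abs(gray[y][x + 1] - gray[y][x]) if x + 1 < w else 0
            dy = abs(gray[y + 1][x] - gray[y][x]) if y + 1 < h else 0
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


def connected_components(binary):
    """Return bounding boxes (x_min, y_min, x_max, y_max) of 4-connected regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Toy image: two dark strokes on a bright background.
image = [
    [255, 255, 255, 255, 255, 255, 255],
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 0, 255, 255, 0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
candidate_boxes = connected_components(edge_map(image))
print(candidate_boxes)
```

In a full system, each candidate box would then be filtered (for example by size or aspect ratio) and passed to a recognition stage such as an OCR engine; those steps are omitted here.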
