Handwritten Digit Recognition on MNIST Using Transfer Learning with VGG16

Taghandiki, Kazem

doi:10.48301/jssc.2024.460608.1017

Article type: Research article

Author

Kazem Taghandiki

Faculty Member, Department of Computer Engineering, National University of Skills (NUS), Tehran, Iran

Abstract

Handwritten digit recognition on the MNIST dataset is a fundamental problem in deep learning and computer vision. This study applies transfer learning with VGG16 to recognize handwritten digits. The model, pre-trained on the ImageNet dataset, was retrained to adapt it to MNIST. Its performance was evaluated using accuracy, precision, recall, and F1 score, and the results were compared with other deep learning approaches, including convolutional neural networks (CNNs) and multilayer perceptrons (MLPs), as well as traditional machine learning algorithms. The VGG16 model with transfer learning achieved 99% accuracy in recognizing handwritten digits, higher than that of models trained from scratch. Using pre-trained models can therefore improve the performance of deep learning models for handwritten digit recognition while reducing the training time and computational resources required.
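The four evaluation metrics named in the abstract can be illustrated with a short sketch. The abstract does not state how precision, recall, and F1 were averaged across the ten digit classes, so macro averaging is assumed here. The function name `classification_metrics` and its `num_classes` parameter are hypothetical and not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, num_classes=10):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro averaging (unweighted mean over classes); a class
    with no predictions or no true samples contributes 0.0 to the mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = Counter()  # true positives per class
    fp = Counter()  # false positives per class
    fn = Counter()  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    accuracy = sum(tp.values()) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy example with two classes (illustrative labels, not MNIST results).
metrics = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(metrics["accuracy"])  # 0.75
```

On a roughly class-balanced dataset such as MNIST, macro and sample-weighted averages tend to be close, but the two can diverge when classes are imbalanced, so the averaging choice is worth reporting alongside the scores.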

Keywords

Handwritten Digit Recognition

MNIST Dataset

ImageNet Dataset

Deep Learning

VGG16 Model

Volume 1, Issue 2
Humanities
Summer 1403
Pages 45-68

Received: 20 Khordad 1403
Revised: 19 Shahrivar 1403
Accepted: 21 Aban 1403

Skill Sciences and Creativity (علوم مهارتی و خلاقیت)
