Skill Sciences and Creativity

Handwritten Digit Recognition on MNIST Using Transfer Learning with VGG16

Article type: Research article

Author
Kazem Taghandiki
Faculty Member, Department of Computer Engineering, National University of Skills (NUS), Tehran, Iran.

Abstract

Handwritten digit recognition on the MNIST dataset is a fundamental benchmark problem in deep learning and computer vision. In this study, a VGG16 model was applied to handwritten digit recognition through transfer learning. The model, pre-trained on the ImageNet dataset, was retrained to adapt it to MNIST. Its performance was evaluated using accuracy, precision, recall, and F1 score, and the results were compared with those of other deep learning models, including convolutional neural networks (CNNs) and multilayer perceptrons (MLPs), as well as traditional machine learning algorithms. The VGG16 model with transfer learning achieved an accuracy of 99% in recognizing handwritten digits, higher than that of the models trained from scratch. The use of pre-trained models can therefore improve the performance of deep learning models in handwritten digit recognition while reducing the training time and computational resources required.
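
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how precision, recall, and F1 were averaged across the ten digit classes, so macro averaging is assumed here; the function name classification_metrics and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Assumption: macro averaging over the classes that occur in y_true
    or y_pred. The paper does not state which averaging it used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed

    def safe_div(num, den):
        # Avoid division by zero for classes with no predictions or no samples.
        return num / den if den else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three digit classes (illustrative labels only).
metrics = classification_metrics([0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 1, 0])
print(metrics)
```

On a roughly class-balanced test set such as MNIST, macro-averaged scores tend to sit close to plain accuracy; with strong class imbalance the two can diverge, which is one reason to report all four values.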

Keywords

Handwritten Digit Recognition
MNIST Dataset
ImageNet Dataset
Deep Learning
VGG16 Model
Volume 1, Issue 2
Humanities
Summer 1403 (2024)
Pages 45-68

  • Received: 20 Khordad 1403 (9 June 2024)
  • Revised: 19 Shahrivar 1403 (9 September 2024)
  • Accepted: 21 Aban 1403 (11 November 2024)