Vehicle Feature VQA: Visual Question Answering for Vehicle Feature
Keywords:
Visual Question Answering, Vehicle Feature, ResNet50, Convolutional Neural Networks, feature extraction, Long Short-Term Memory, BLEU Score

Abstract
Visual Question Answering (VQA) automatically predicts answers to questions about real-world images. In this paper, we propose a VQA dataset for vehicle features to capture knowledge about vehicles. We develop a VQA model that uses ResNet50, a Convolutional Neural Network (CNN), for image feature extraction and Long Short-Term Memory (LSTM) networks for question feature extraction and answer generation. The experimental results report the training loss, evaluation loss, BLEU score, and VQA accuracy at 20 and 30 epochs. At 20 epochs, the model achieved a training loss of 1.1949, an evaluation loss of 1.7953, a BLEU score of 0.6180, and a VQA accuracy of 0.0493, and it predicted one correct answer for the test question-image pairs. At 30 epochs, with a training loss of 0.8780, an evaluation loss of 1.6634, a BLEU score of 0.6775, and a VQA accuracy of 0.0627, the model predicted five correct answers out of the fifteen test question-image pairs on vehicle features.
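For illustration, the following is a minimal PyTorch sketch of a ResNet50 + LSTM VQA architecture of the kind described above. It is not the authors' implementation: the framework, layer sizes, vocabulary sizes, and the use of answer classification over a fixed answer vocabulary (rather than the paper's LSTM answer generation) are all assumptions made for brevity.

# Hypothetical sketch, not the authors' released code: ResNet50 image encoder
# fused with an LSTM question encoder, predicting an answer from a fixed vocabulary.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class VehicleFeatureVQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, answer_vocab_size=1000):
        super().__init__()
        # Image branch: ResNet50 backbone with the final classifier removed.
        # In practice, ImageNet-pretrained weights would typically be loaded.
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)
        self.img_proj = nn.Linear(2048, hidden_dim)
        # Question branch: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Fusion and answer prediction (classification stands in for answer generation).
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, answer_vocab_size),
        )

    def forward(self, images, question_ids):
        img_feat = self.img_proj(self.cnn(images).flatten(1))   # (B, hidden_dim)
        _, (h_n, _) = self.lstm(self.embed(question_ids))       # h_n: (1, B, hidden_dim)
        q_feat = h_n[-1]                                         # last hidden state, (B, hidden_dim)
        fused = torch.cat([img_feat, q_feat], dim=1)             # simple concatenation fusion
        return self.classifier(fused)                            # answer logits

# Example forward pass with dummy data (batch of 2 images and tokenized questions).
model = VehicleFeatureVQA(vocab_size=5000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(1, 5000, (2, 12)))
print(logits.shape)  # torch.Size([2, 1000])

Concatenation is used here as the simplest image-question fusion; the resulting logits would be trained with a cross-entropy loss against the ground-truth answers.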
License
Copyright (c) 2026 Pa Pa Tun, Khin Mar Soe

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.