Contrastive Learning with Multimodal Data: Enhancing Visual Question Answering with Context-Aware Image-Text Representations
Abstract
Visual Question Answering (VQA) is an interdisciplinary field that combines computer vision and natural language processing to answer questions about images. This paper explores the use of contrastive learning techniques to enhance VQA systems by improving the representation of image-text pairs. We introduce a novel approach that leverages context-aware image-text embeddings to boost performance on VQA tasks. Experimental results demonstrate significant improvements over baseline methods, highlighting the effectiveness of contrastive learning in multimodal settings.
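The abstract does not publish implementation details, but the contrastive objective it refers to is commonly realized as a symmetric InfoNCE (CLIP-style) loss over batches of paired image and text embeddings. The following is a minimal sketch under that assumption; the function names, the temperature value, and the embedding shapes are illustrative, not the authors' code.

import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit hypersphere so the dot
    # product below is a cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE: matched image-text pairs (the diagonal of
    # the similarity matrix) are pulled together, all mismatched
    # pairs in the batch are pushed apart.
    # image_emb, text_emb: (batch, dim) arrays of context-aware
    # embeddings (assumed shapes).
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature   # (batch, batch) similarities
    n = len(logits)

    def cross_entropy(l):
        # Row-wise softmax cross-entropy against the diagonal targets,
        # with the usual max-subtraction for numerical stability.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
print(contrastive_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64))))

In practice the two embedding batches would come from an image encoder and a question/caption encoder, trained jointly so that the loss above shapes a shared image-text representation space before the VQA answering head is applied.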
Article Details
How to Cite
Contrastive Learning with Multimodal Data: Enhancing Visual Question Answering with Context-Aware Image-Text Representations. (2024). Innovative Computer Sciences Journal, 10(1). https://innovatesci-publishers.com/index.php/ICSJ/article/view/233
This work is licensed under a Creative Commons Attribution 4.0 International License.