Contrastive Learning with Multimodal Data: Enhancing Visual Question Answering with Context-Aware Image-Text Representations


Tanja Mayer

Abstract

Visual Question Answering (VQA) is an interdisciplinary field that combines computer vision and natural language processing to answer questions about images. This paper explores the use of contrastive learning techniques to enhance VQA systems by improving the representation of image-text pairs. We introduce a novel approach that leverages context-aware image-text embeddings to boost performance on VQA tasks. Experimental results demonstrate significant improvements over baseline methods, highlighting the effectiveness of contrastive learning in multimodal data settings.
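The full paper is not reproduced on this page, but the objective the abstract describes, contrastive learning over image-text pairs, is commonly realized as a symmetric InfoNCE loss in the style of CLIP. The sketch below illustrates that general idea and is not the paper's actual method: the encoders, embedding dimension, batch size, and temperature are placeholder assumptions.

```python
# Minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss
# over aligned image-text pairs. Illustrative only; dimensions and the
# temperature value are assumptions, not taken from the paper.
import torch
import torch.nn.functional as F


def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss for a batch of aligned image-text pairs.

    image_emb, text_emb: (batch, dim) embeddings where row i of each
    tensor comes from the same image-question pair, so the diagonal of
    the similarity matrix holds the positives and every off-diagonal
    entry serves as an in-batch negative.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, sharpened by the temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Random embeddings stand in for the outputs of hypothetical
    # vision and text encoders.
    img = torch.randn(8, 256)
    txt = torch.randn(8, 256)
    print(contrastive_loss(img, txt).item())
```

Pulling matched pairs together and pushing mismatched pairs apart in this way is what yields embeddings in which an image and the text relevant to it lie close together, which is the property the VQA system then exploits when grounding a question in image content.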


Article Details

How to Cite
Mayer, T. (2024). Contrastive Learning with Multimodal Data: Enhancing Visual Question Answering with Context-Aware Image-Text Representations. Innovative Computer Sciences Journal, 10(1). http://innovatesci-publishers.com/index.php/ICSJ/article/view/233