Contrastive Learning with Multimodal Data: Enhancing Visual Question Answering with Context-Aware Image-Text Representations
Abstract
Visual Question Answering (VQA) is an interdisciplinary field that combines computer vision and natural language processing to answer natural-language questions about images. This paper explores the use of contrastive learning techniques to enhance VQA systems by improving the joint representation of image-text pairs. We introduce a novel approach that leverages context-aware image-text embeddings to boost performance on VQA tasks. Experimental results demonstrate significant improvements over baseline methods, highlighting the effectiveness of contrastive learning in multimodal settings.
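The abstract does not specify the paper's exact training objective, but contrastive image-text learning is commonly formulated as a symmetric InfoNCE loss over a batch of matched pairs. The following is a minimal PyTorch sketch of that standard formulation, not the authors' method: the function name, embedding shapes, and temperature value are illustrative assumptions.

# Illustrative sketch of a symmetric contrastive (InfoNCE-style) loss for
# image-text pairs, as commonly used to align multimodal embeddings.
# The loss form, shapes, and temperature below are assumptions, since the
# abstract does not state the paper's objective.
import torch
import torch.nn.functional as F

def contrastive_image_text_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    image_emb, text_emb: (batch, dim) outputs of the two encoders;
    row i of each tensor is assumed to come from the same pair.
    """
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are positive pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    torch.manual_seed(0)
    img = torch.randn(8, 256)   # stand-in for image-encoder outputs
    txt = torch.randn(8, 256)   # stand-in for text-encoder outputs
    print(contrastive_image_text_loss(img, txt).item())

In a VQA pipeline of the kind the abstract describes, embeddings aligned by such a loss would then feed an answer-prediction head; the symmetric form trains both retrieval directions at once.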
Article Details
How to Cite
Contrastive Learning with Multimodal Data: Enhancing Visual Question Answering with Context-Aware Image-Text Representations. (2024). Innovative Computer Sciences Journal, 10(1). http://innovatesci-publishers.com/index.php/ICSJ/article/view/233
This work is licensed under a Creative Commons Attribution 4.0 International License.