CapText: Large Language Model-based Caption Generation From Image Context and Description

Ghosh, Shinjini; Anupam, Sagnik

Computer Science > Machine Learning

arXiv:2306.00301 (cs)

[Submitted on 1 Jun 2023 (v1), last revised 6 Jun 2023 (this version, v2)]

Title: CapText: Large Language Model-based Caption Generation From Image Context and Description

Authors: Shinjini Ghosh, Sagnik Anupam


Abstract: While deep-learning models have been shown to perform well on image-to-text datasets, it is difficult to use them in practice for captioning images. This is because captions traditionally tend to be context-dependent and offer complementary information about an image, while models tend to produce descriptions of the image's visual features. Prior research in caption generation has explored models that generate captions when provided with the images alongside their respective descriptions or contexts. We propose and evaluate a new approach, which leverages existing large language models to generate captions from textual descriptions and context alone, without ever processing the image directly. We demonstrate that, after fine-tuning, our approach outperforms current state-of-the-art image-text alignment models like OSCAR-VinVL on this task on the CIDEr metric.
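
The text-only setup described in the abstract can be illustrated with a minimal sketch of how a caption prompt might be assembled from a description and its surrounding context, with no image input. The function name `build_caption_prompt`, the field labels, and the template itself are assumptions made for illustration; the abstract does not specify the prompt format or the model used.

```python
def build_caption_prompt(description: str, context: str) -> str:
    """Assemble a text-only captioning prompt.

    Hypothetical template: the abstract does not state how the
    description and context are presented to the language model.
    """
    # Collapse runs of whitespace so the prompt is stable across inputs.
    description = " ".join(description.split())
    context = " ".join(context.split())
    if not description and not context:
        raise ValueError("need a description or a context")

    parts = []
    if context:
        parts.append(f"Context: {context}")
    if description:
        parts.append(f"Description: {description}")
    # The trailing label marks where the model would continue with a caption.
    parts.append("Caption:")
    return "\n".join(parts)


prompt = build_caption_prompt(
    description="A  dog on a beach",
    context="Article about pet-friendly holidays",
)
print(prompt)
```

The resulting string would be passed to whichever language model is being fine-tuned or queried; no model call is shown because the abstract names none.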

Comments:	Update 6/6/23: Fixed typographic error in abstract
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2306.00301 [cs.LG]
	(or arXiv:2306.00301v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.00301

Submission history

From: Sagnik Anupam
[v1] Thu, 1 Jun 2023 02:40:44 UTC (1,729 KB)
[v2] Tue, 6 Jun 2023 03:41:05 UTC (1,729 KB)
