shaheriar/Image-Captioning-AI

CS 228 Final Project

Enhancing Image Captioning with Deep Learning Models

Saul Gonzalez - sgonz081

Shaheriar Malik - smali032

Dataset: https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset

Abstract

Image captioning is the task of generating a descriptive sentence for an image, which is more complex than image classification. To tackle it, we adopt a well-established approach that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks, enhanced by an attention layer integrated into the decoder. This enables us to generate coherent and meaningful captions. We also use multiprocessing during the image retrieval and preprocessing stages, so that multiple images are fetched and preprocessed simultaneously. This substantially reduces training time.
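To make the attention step concrete, here is a minimal sketch of what an attention layer computes at one decoding step. It is pure Python and not the project's actual implementation. The function names `softmax` and `attend`, the toy feature values, and the choice of dot-product scoring are all assumptions made for illustration. The project may use a different scoring function, such as additive attention with learned weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Return (weights, context) for a single decoding step.

    region_features: one feature vector per image region (stand-in for CNN output).
    hidden_state: current decoder state (stand-in for the LSTM hidden state),
                  with the same length as each feature vector.
    """
    # Assumption: score each region by its dot product with the hidden state.
    scores = [
        sum(f * h for f, h in zip(feature, hidden_state))
        for feature in region_features
    ]
    weights = softmax(scores)

    # Context vector: attention-weighted sum of the region features.
    dim = len(hidden_state)
    context = [
        sum(w * feature[i] for w, feature in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three image regions with 2-dimensional features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In this toy input, the first and third regions align with the hidden state and receive equal, larger weights than the second region. In a trained model, the decoder would consume `context` together with the previous word embedding to predict the next word of the caption.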

Below is a diagram of our model, followed by three randomly selected images with the captions our model generated:
