Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
**Synthetic Scene-Text Image Samples**

The code in the `master` branch is for Python 2; Python 3 is supported in the `python3` branch.
The main dependencies are: `pygame==2.0.0`, `opencv` (`cv2`), `PIL` (`Image`), `numpy`, `matplotlib`, `h5py`, `scipy`.
To generate scene-text image samples, run:

```
python gen.py --viz [--datadir <path-to-downloaded-renderer-data>]
```

where `--datadir` points to the `renderer_data` directory included in the data torrent. Specifying this `datadir` is optional; if it is not specified, the script will automatically download and extract the same `renderer.tar.gz` data file (~24 MB).
This data file includes:

- `sample.h5`: a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note, this is just given as an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use. A minimal sketch for inspecting this file is given after this list.
- `fonts`: three sample fonts (add more fonts to this folder and then update `fonts/fontlist.txt` with their paths).
- `newsgroup`: text source (from the News Group dataset). This can be substituted with any text file. Look inside `text_utils.py` to see how the text inside this file is used by the renderer.
- `models/colors_new.cp`: color model (foreground/background text color model), learnt from the IIIT-5K word dataset.
- `models`: other cPickle files (`char_freq.cp`: frequency of each character in the text dataset; `font_px2pt.cp`: conversion from pt to px for various fonts. If you add a new font, make sure that the corresponding model is present in this file; if not, you can add it by adapting `invert_font_size.py`).
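For reference, here is a minimal sketch for inspecting `sample.h5` with `h5py`. The `image`/`depth`/`seg` group names and the `area`/`label` attributes are assumptions based on how `gen.py` reads the database, so adjust them if your file differs:

```python
import h5py

# Path to the sample database from the renderer data (adjust as needed).
with h5py.File('sample.h5', 'r') as db:
    # Assumed layout: groups 'image', 'depth', 'seg'; the segmentation
    # datasets carry 'area' and 'label' attributes for each region.
    for imname in db['image'].keys():
        img = db['image'][imname][:]                # HxWx3 background image
        depth = db['depth'][imname][:]              # depth map(s) for this image
        seg = db['seg'][imname][:]                  # integer region labels
        areas = db['seg'][imname].attrs['area']     # pixel area of each region
        labels = db['seg'][imname].attrs['label']   # region ids present in 'seg'
        print(imname, img.shape, depth.shape, seg.shape, len(labels))
```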
This script will generate random scene-text image samples and store them in an h5 file in `results/SynthText.h5`. If the `--viz` option is specified, the generated output will be visualized as the script runs; omit the `--viz` option to turn off the visualizations. If you want to visualize the results stored in `results/SynthText.h5` later, run:

```
python visualize_results.py
```
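If you prefer to inspect the generated file programmatically, the sketch below shows one way to read it. The `data` group and the `wordBB`, `charBB`, and `txt` attribute names are assumptions about how the renderer stores its output; `visualize_results.py` is the authoritative reader:

```python
import h5py

# Read the generated samples (layout assumed; see visualize_results.py).
with h5py.File('results/SynthText.h5', 'r') as db:
    for dname in db['data'].keys():
        img = db['data'][dname][:]                    # rendered image with text
        word_bb = db['data'][dname].attrs['wordBB']   # word-level bounding boxes
        char_bb = db['data'][dname].attrs['charBB']   # character-level bounding boxes
        txt = db['data'][dname].attrs['txt']          # text strings placed in the image
        print(dname, img.shape, word_bb.shape, char_bb.shape, len(txt))
```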
A dataset with approximately 800,000 synthetic scene-text images generated with this code can be found in the `SynthText.zip` file in the torrent here; the dataset details/description are in the `readme.txt` file in the same torrent.
Segmentation and depth maps are required to use new images as background. Sample scripts for obtaining these are available here:

- `predict_depth.m`: MATLAB script to regress a depth mask for a given RGB image; uses the network of Liu et al. However, more recent works (e.g., this) might give better results.
- `run_ucm.m` and `floodFill.py`: for getting segmentation masks using gPb-UCM.

For an explanation of the fields in `sample.h5` (e.g. `seg`, `area`, `label`), please check this comment.
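As an illustration only, the sketch below adds one new background image, with its depth and segmentation, to a database with the same assumed `image`/`depth`/`seg` layout as above; `my_image.jpg`, `my_depth.npy`, and `my_seg.npy` are hypothetical files you would produce with the depth and segmentation scripts referenced above:

```python
import h5py
import numpy as np
from PIL import Image

# Hypothetical inputs produced by the depth/segmentation scripts above.
imname = 'my_image.jpg'
img = np.array(Image.open(imname))    # HxWx3 uint8 RGB image
depth = np.load('my_depth.npy')       # HxW float depth map
seg = np.load('my_seg.npy')           # HxW integer region labels
labels = np.unique(seg)
labels = labels[labels != 0]          # region ids (0 assumed to mean "no region")
areas = np.array([(seg == l).sum() for l in labels])

# Append to the database (group names assumed to match gen.py's reader).
with h5py.File('sample.h5', 'a') as db:
    db.require_group('image').create_dataset(imname, data=img)
    db.require_group('depth').create_dataset(imname, data=depth)
    dset = db.require_group('seg').create_dataset(imname, data=seg)
    dset.attrs['area'] = areas
    dset.attrs['label'] = labels
```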
The 8,000 background images used in the paper, along with their segmentation and depth masks, are included in the same torrent as the pre-generated dataset, under the `bg_data` directory. The files are:
| filenames | description |
|---|---|
| `imnames.cp` | names of images which do not contain background text |
| `bg_img.tar.gz` | images (filter these using `imnames.cp`) |
| `depth.h5` | depth maps |
| `seg.h5` | segmentation maps |
`use_preproc_bg.py` provides sample code for reading this data.
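A rough sketch of reading these files follows; the exact keys (in particular the `mask` group assumed inside `seg.h5`) and the pickle encoding are assumptions, so treat `use_preproc_bg.py` as the authoritative reference:

```python
import pickle
import h5py

# Names of background images that do not already contain text.
# The file was written with Python 2's cPickle, so under Python 3 you may
# need pickle.load(f, encoding='latin1').
with open('imnames.cp', 'rb') as f:
    imnames = pickle.load(f)

depth_db = h5py.File('depth.h5', 'r')   # assumed: one dataset per image name
seg_db = h5py.File('seg.h5', 'r')       # assumed: segmentations under a 'mask' group

for imname in imnames[:5]:
    depth = depth_db[imname][:]                     # depth map for this image
    seg = seg_db['mask'][imname][:]                 # integer region labels
    areas = seg_db['mask'][imname].attrs['area']    # pixel area of each region
    labels = seg_db['mask'][imname].attrs['label']  # region ids present in 'seg'
    print(imname, depth.shape, seg.shape, len(labels))

depth_db.close()
seg_db.close()
```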
Note: We do not own the copyright to these images.
Adaptations of this pipeline by others:

- @JarveeLee has modified the pipeline for generating samples with Chinese text here.
- @adavoudi has modified it for Arabic/Persian script, which flows from right-to-left, here.
- @MichalBusta has adapted it for a number of languages (e.g. Bangla, Arabic, Chinese, Japanese, Korean) here.
- @gachiemchiep has adapted it for Japanese here.
- @gungui98 has adapted it for Vietnamese here.
- @youngkyung has adapted it for Korean here.
- @kotomiDu has developed an interactive UI for generating images with text here.
- @LaJoKoch has adapted it for German here.
Please refer to the paper for more information, or contact me (email address in the paper).