Add Image and Segmentation Extraction Scripts for Waymo Dataset #902

TaylorBrown96 · 2025-03-14T02:12:21Z

Overview

This initial commit adds a set of core scripts that extract images and segmentation masks from Waymo Open Dataset parquet files, convert the parquet data to CSV for inspection, and check that extracted images and masks pair up, so the output can feed machine learning workflows directly.

The scripts target segmentation tasks but can be adapted to other computer vision work, such as object detection. Converting the parquet files into easily usable file formats gives a foundation for training a wider range of machine learning models.

Added Files

entry.py
- Orchestrates the entire extraction and conversion process.
- Checks for the existence of required files and prompts for downloads if missing.
- Converts parquet files to CSV for easier inspection.
- Extracts images and segmentation masks, and reports the number of matching pairs.
extract_images.py
- Extracts images from parquet files where corresponding segmentation keys exist.
- Ensures efficient storage by avoiding duplicate extractions.
extract_segmentations.py
- Extracts segmentation masks only if a corresponding image exists to ensure data consistency.
- Utilizes parquet key matching for accurate alignment.
parquetToCSV.py
- Converts parquet datasets into CSV format for human-readable inspection and validation.
schemaExtraction.py
- Removes metadata from parquet schemas and prints them in an indented, readable format.
- Provides row counts for both image and segmentation datasets.
README.md
- Provides an overview of the project, setup instructions, and usage guidelines.
- Includes explanations for each script and how they interact within the data processing pipeline.
- Outlines requirements and expected input/output formats.
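
The key matching described for `extract_images.py` and `extract_segmentations.py` could look roughly like the sketch below. It is not the code from this PR: the key column names are an assumption about the parquet schema, rows are modeled as plain dicts rather than parquet records, and `row_key` and `matching_pairs` are hypothetical helper names.

```python
# Assumed key columns; the actual parquet schema may name them differently.
KEY_COLUMNS = (
    "key.segment_context_name",
    "key.frame_timestamp_micros",
    "key.camera_name",
)


def row_key(row: dict) -> tuple:
    """Build a hashable key from the assumed key columns of one row."""
    return tuple(row[column] for column in KEY_COLUMNS)


def matching_pairs(image_rows, seg_rows):
    """Return (image_row, seg_row) pairs whose keys appear in both inputs.

    Each key is emitted at most once, so a repeated image row does not
    cause a duplicate extraction.
    """
    seg_by_key = {row_key(row): row for row in seg_rows}
    seen = set()
    pairs = []
    for image_row in image_rows:
        key = row_key(image_row)
        if key in seg_by_key and key not in seen:
            seen.add(key)
            pairs.append((image_row, seg_by_key[key]))
    return pairs
```

Indexing the segmentation rows by key first keeps the pass over the image rows linear, which matters because image tables are typically much larger than segmentation tables.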

Testing & Validation

- Verified the extraction pipeline using sample Waymo parquet files.
- Ensured that only matching image-segmentation pairs are processed.
- Confirmed the correctness of the CSV outputs.
- Validated schema extraction for clarity and accuracy.
- Reviewed the README.md for completeness and clarity.
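
A pairing check like the one above can also be repeated on the extracted files. The sketch below is only an illustration and makes several assumptions: images and masks are written to two separate directories, a pair shares the same file stem, and the extensions are `.jpg` and `.png`. The function name `count_matching_pairs` is hypothetical and not taken from this PR.

```python
from pathlib import Path


def count_matching_pairs(image_dir, mask_dir, image_ext=".jpg", mask_ext=".png"):
    """Count extracted images and masks that share a file stem.

    The directory layout and extensions are assumptions for this sketch.
    """
    image_stems = {p.stem for p in Path(image_dir).glob(f"*{image_ext}")}
    mask_stems = {p.stem for p in Path(mask_dir).glob(f"*{mask_ext}")}
    matched = image_stems & mask_stems
    return {
        "pairs": len(matched),
        "images_without_mask": len(image_stems - matched),
        "masks_without_image": len(mask_stems - matched),
    }
```

If extraction only writes matching pairs, both unmatched counts should be zero; a nonzero value points at a key mismatch or an interrupted run.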

- Added `entry.py` to coordinate the extraction and conversion process for images and segmentation masks from Waymo Open Dataset.
- Implemented `extractImages.py` for extracting images with corresponding segmentation keys.
- Implemented `extractSegmentations.py` for extracting segmentation masks from parquet files.
- Added `parquetToCSV.py` for converting parquet data to CSV format for easier inspection.
- Included `schemaExtraction.py` for inspecting and formatting the schema of parquet files, removing metadata for clarity.

These additions establish the foundation for processing and analyzing Waymo dataset files, enabling image and segmentation extraction, format conversion, and schema analysis.
