
Add Image and Segmentation Extraction Scripts for Waymo Dataset #902



TaylorBrown96

Overview

This initial commit adds a set of core scripts for extracting images and segmentation masks from Waymo Open Dataset parquet files, converting the data into formats that are easier to inspect, and validating the results for use in machine learning workflows.

Although the scripts target segmentation, the same approach can be adapted to other computer vision tasks such as object detection. Converting the parquet files into readily usable file formats provides a starting point for training a wider range of machine learning models.


Added Files

  1. entry.py

    • Orchestrates the entire extraction and conversion process.
    • Checks for the existence of required files and prompts for downloads if missing.
    • Converts parquet files to CSV for easier inspection.
    • Extracts images and segmentation masks, and reports the number of matching pairs.
  2. extract_images.py

    • Extracts images from parquet files where corresponding segmentation keys exist.
    • Skips duplicate extractions so that each image is stored only once.
  3. extract_segmentations.py

    • Extracts segmentation masks only if a corresponding image exists to ensure data consistency.
    • Matches parquet keys so that each mask lines up with its image.
  4. parquetToCSV.py

    • Converts parquet datasets into CSV format for human-readable inspection and validation.
  5. schemaExtraction.py

    • Removes metadata from parquet schemas and prints them in an indented, readable format.
    • Provides row counts for both image and segmentation datasets.
  6. README.md

    • Provides an overview of the project, setup instructions, and usage guidelines.
    • Includes explanations for each script and how they interact within the data processing pipeline.
    • Outlines requirements and expected input/output formats.
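
As an illustration of the key matching described for the two extraction scripts, here is a minimal sketch. It is not the code in this PR. It assumes the rows are already loaded as dictionaries and that an image and its mask share the key columns listed in `KEY_COLUMNS`; those column names and the helper names `row_key` and `matching_image_rows` are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Assumed key columns for this sketch; the PR description does not list them.
KEY_COLUMNS = (
    "key.segment_context_name",
    "key.frame_timestamp_micros",
    "key.camera_name",
)

Row = Dict[str, object]
Key = Tuple[object, ...]


def row_key(row: Row) -> Key:
    """Build a hashable key from the assumed key columns of one row."""
    return tuple(row[column] for column in KEY_COLUMNS)


def matching_image_rows(
    image_rows: Iterable[Row], segmentation_rows: Iterable[Row]
) -> Iterator[Row]:
    """Yield each image row whose key also appears in the segmentation rows.

    A key that was already yielded is skipped, so duplicates are not
    extracted twice.
    """
    segmentation_keys: Set[Key] = {row_key(r) for r in segmentation_rows}
    seen: Set[Key] = set()
    for row in image_rows:
        key = row_key(row)
        if key in segmentation_keys and key not in seen:
            seen.add(key)
            yield row
```

Building the set of segmentation keys first keeps each lookup constant-time, so the image rows can be streamed once without holding them all in memory.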

Testing & Validation

  • Verified the extraction pipeline using sample Waymo parquet files.
  • Ensured that only matching image-segmentation pairs are processed.
  • Confirmed the correctness of the CSV outputs.
  • Validated schema extraction for clarity and accuracy.
  • Reviewed the README.md for completeness and clarity.
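
The pair check can be reproduced with a small helper along these lines. This is a sketch under an assumed naming convention, namely that an image and its mask share the same file stem (for example `frame_001.jpg` and `frame_001.png`); the function names are hypothetical and not taken from the scripts.

```python
from pathlib import PurePath
from typing import Iterable, List, Set, Tuple


def stems(names: Iterable[str]) -> Set[str]:
    """Return the file stems (names without directory or extension)."""
    return {PurePath(name).stem for name in names}


def count_matching_pairs(
    image_names: Iterable[str], mask_names: Iterable[str]
) -> int:
    """Count stems that appear in both the image and the mask listing."""
    return len(stems(image_names) & stems(mask_names))


def unmatched(
    image_names: Iterable[str], mask_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (images without a mask, masks without an image), sorted."""
    image_stems, mask_stems = stems(image_names), stems(mask_names)
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)
```

The inputs can be any listing of file names, for instance the result of `os.listdir` on each output directory. If every extracted file has a partner, `unmatched` returns two empty lists.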

- Added `entry.py` to coordinate the extraction and conversion process for images and segmentation masks from the Waymo Open Dataset.
- Implemented `extractImages.py` for extracting images with corresponding segmentation keys.
- Implemented `extractSegmentations.py` for extracting segmentation masks from parquet files.
- Added `parquetToCSV.py` for converting parquet data to CSV format for easier inspection.
- Included `schemaExtraction.py` for inspecting and formatting the schema of parquet files, removing metadata for clarity.

These additions establish the foundation for processing and analyzing Waymo dataset files, enabling image and segmentation extraction, format conversion, and schema analysis.
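
For the CSV conversion step, one way to keep the output readable is to replace binary columns (such as encoded images) with a short size placeholder. The sketch below shows that idea using only the standard library. It assumes rows are already available as dictionaries; the placeholder format and the names `summarize_value` and `write_rows_as_csv` are assumptions for illustration, not the behavior of `parquetToCSV.py`.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def summarize_value(value: object) -> object:
    """Replace raw bytes with a size placeholder; pass other values through."""
    if isinstance(value, (bytes, bytearray)):
        return f"<{len(value)} bytes>"
    return value


def write_rows_as_csv(
    rows: Iterable[Dict[str, object]], columns: List[str], out: TextIO
) -> int:
    """Write the selected columns of each row as CSV and return the row count."""
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow({c: summarize_value(row.get(c)) for c in columns})
        count += 1
    return count
```

Writing to an `io.StringIO` buffer instead of a file is a quick way to preview the first few rows before converting a full dataset.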