Video DataType support #5054
stayrascal started this conversation in Roadmaps
Replies: 3 comments
-
Discussion in community slack: https://dist-data.slack.com/archives/C052CA6Q9N1/p1756267131368819
-
@stayrascal I think this is a great idea. It aligns well with some of my personal goals for …
-
@samster25 is back and will take a look here!
-
Background
Video data, as a critical type of multimodal data, uniquely integrates visual, audio, and temporal dimensions, inherently fusing spatial (image-based) and temporal information. It has been widely adopted across domains including short-video platforms, live streaming, public security, healthcare, and autonomous driving.
Given the large volume of video data, most processing paradigms typically involve streaming-based reading and processing to minimize memory footprint. This distinguishes it from image data, which generally requires full initial loading into memory prior to processing.
Thus, when introducing the Video data type into Daft, it should avoid storing the entire dataset in memory. Drawing inspiration from the `File` data type, we can either store merely a URL reference to the video data or directly reuse the underlying data structure of the `File` data type as its internal representation.

Beyond the core content of video data, it is critical to extract key metadata to facilitate subsequent filtering of target videos prior to processing. Videos carry extensive metadata, such as frame count, resolution (height/width), time base, duration, pixel format, bit rate, codec name, and profile, among others. However, incorporating all such metadata into the Video data type is impractical from a memory-efficiency standpoint. Instead, we prioritize only the essential fields: frame count, height, width, and FPS. Additional metadata can be retrieved dynamically during video processing as needed.
Design
DataType
Based on all the preceding points, we will introduce a logical data type `Video`, whose physical data type initially includes a file reference, frame count, height, and width.
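As a rough illustration of that shape, here is a minimal Python sketch of what a single `Video` cell could hold. The class and field names are hypothetical, not Daft's actual internals; `fps` is included because the Background section lists it among the essential fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VideoValue:
    """Hypothetical in-memory shape of one Video cell: a reference to the
    underlying bytes plus only the essential metadata fields."""
    file_ref: str                       # URL or path; the bytes stay out of memory
    frame_count: Optional[int] = None
    height: Optional[int] = None
    width: Optional[int] = None
    fps: Optional[float] = None

    def duration_seconds(self) -> Optional[float]:
        # Derivable metadata need not be stored: duration = frames / fps.
        if self.frame_count is None or not self.fps:
            return None
        return self.frame_count / self.fps

clip = VideoValue("s3://bucket/clip.mp4",
                  frame_count=300, height=1080, width=1920, fps=30.0)
```

Keeping only these fields lets derived quantities such as duration be computed on demand instead of widening the physical type.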
Functions
With the Video data type implemented, we must address how to apply functions and UDFs to Video data. Common use cases include extracting key frames, splitting videos by key frames, and extracting audio.
As discussed in #4824 and #4820, namespaced expression syntax will be deprecated in v0.6. Accordingly, we will introduce dedicated functions `key_frames()`, `split_video()`, and `audio()` to fulfill these objectives.
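To make the proposed call shapes concrete, here is a toy Python sketch of the three functions operating on fabricated in-memory stand-ins. All names are assumptions taken from the proposal, not an existing Daft API, and a real implementation would decode frames lazily rather than holding them inline:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    index: int
    is_key: bool

@dataclass
class Video:
    path: str
    frames: List[Frame]   # toy: frames held inline; a real impl streams them
    audio_track: bytes

def key_frames(v: Video) -> List[Frame]:
    """Proposed key_frames(): keep only frames flagged as keyframes."""
    return [f for f in v.frames if f.is_key]

def split_video(v: Video) -> List[List[Frame]]:
    """Proposed split_video(): start a new segment at each keyframe."""
    segments, current = [], []
    for f in v.frames:
        if f.is_key and current:
            segments.append(current)
            current = []
        current.append(f)
    if current:
        segments.append(current)
    return segments

def audio(v: Video) -> bytes:
    """Proposed audio(): extract the audio track."""
    return v.audio_track

v = Video("clip.mp4",
          [Frame(0, True), Frame(1, False), Frame(2, True), Frame(3, False)],
          b"pcm")
```

Exposing these as top-level functions rather than a `.video` namespace matches the direction set by the deprecation of namespaced expressions.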
Example
Building on the `Video` data type and its associated functions, we present a usage example centered on key frame extraction.

`VideoArray`
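One way the pieces could compose is to filter on cheap metadata first and decode only the survivors. The following plain-Python sketch uses fabricated rows and a hypothetical `key_frames` helper as stand-ins for a Daft DataFrame pipeline:

```python
# Toy rows standing in for a DataFrame with a Video column; in Daft these
# would be real columns and key_frames() a proposed expression function.
rows = [
    {"path": "a.mp4", "frame_count": 300, "height": 1080, "width": 1920,
     "key_frame_indices": [0, 120, 240]},
    {"path": "b.mp4", "frame_count": 60, "height": 480, "width": 640,
     "key_frame_indices": [0]},
]

def key_frames(row):
    """Hypothetical key-frame extraction: here we just read precomputed
    indices; a real implementation would lazily decode the referenced file."""
    return row["key_frame_indices"]

# Filter on stored metadata first, so only matching videos are ever decoded.
hd = [r for r in rows if r["height"] >= 720]
extracted = {r["path"]: key_frames(r) for r in hd}
```

This is exactly the workflow the stored metadata is meant to enable: predicates on `frame_count`, `height`, and `width` run without touching the video bytes.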
Roadmap
Alternatives
1. Extend the `File` data type to add a `metadata` field.

Summary: add a new `metadata` map to the physical data type of `File`. The key distinction between `Video` and `File` lies in our intent to extract specific metadata to facilitate downstream filtering. If the `File` data type natively supported metadata, there would be no need for distinct `Video` or `Audio` data types. Furthermore, metadata is also the defining characteristic that differentiates the `File` type from the `Url` type.
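For comparison, here is a minimal sketch of this alternative: a `File`-like record carrying a generic metadata map. The names are hypothetical and do not reflect Daft's actual `File` internals:

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class FileValue:
    """Hypothetical File physical type extended with a metadata map. With
    this in place, video-specific fields (frame_count, height, ...) could
    live in `metadata` instead of requiring a dedicated Video type."""
    reference: str                                        # URL/path, as today
    metadata: Dict[str, Union[int, float, str]] = field(default_factory=dict)

video_file = FileValue("s3://bucket/clip.mp4",
                       {"frame_count": 300, "height": 1080, "width": 1920})

# Downstream filtering can consult the map without decoding the bytes:
is_hd = video_file.metadata.get("height", 0) >= 720
```

The trade-off is typing: a generic map loses the schema-level guarantees a dedicated `Video` type would give to its frame count, height, and width columns.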