Video DataType support #5054
stayrascal started this conversation in Roadmaps
Replies: 3 comments
-
Discussion in community slack: https://dist-data.slack.com/archives/C052CA6Q9N1/p1756267131368819
-
@stayrascal I think this is a great idea. It aligns well with some of my personal goals for …
-
@samster25 is back and will take a look here!
-
Background
Video data, as a critical type of multimodal data, uniquely integrates visual, audio, and temporal dimensions, inherently fusing spatial (image-based) and temporal information. It has been widely adopted across domains including short-video platforms, live streaming, public security, healthcare, and autonomous driving.
Given the large volume of video data, most processing paradigms typically involve streaming-based reading and processing to minimize memory footprint. This distinguishes it from image data, which generally requires full initial loading into memory prior to processing.
Thus, when introducing the Video data type into Daft, it should avoid storing the entire dataset in memory. Drawing inspiration from the `File` data type, we can either store merely a URL reference to the video data or directly reuse the underlying data structure of the `File` data type as its internal representation.

Beyond the core content of video data, it is critical to extract key metadata to facilitate subsequent filtering of target videos prior to processing. Videos carry extensive metadata, such as frame count, resolution (height/width), time base, duration, pixel format, bit rate, codec name, and profile, among others. However, incorporating all such metadata into the Video data type is impractical from a memory-efficiency standpoint. Instead, we prioritize only the essential fields: frame count, height, width, and FPS. Additional metadata can be retrieved dynamically during video processing as needed.
Design
DataType
Based on all the preceding points, we will introduce a logical data type `Video`, whose physical data type initially includes a file reference, frame count, height, and width.
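As a rough illustration of that shape, here is a minimal Python sketch of what a single `Video` cell could hold. The class and field names are hypothetical, not Daft's actual internals; `fps` is included because the Background section lists it among the essential fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VideoValue:
    """Hypothetical in-memory shape of one Video cell: a reference to the
    underlying bytes plus only the essential metadata fields."""
    file_ref: str                       # URL or path; the bytes stay out of memory
    frame_count: Optional[int] = None
    height: Optional[int] = None
    width: Optional[int] = None
    fps: Optional[float] = None

    def duration_seconds(self) -> Optional[float]:
        # Derivable metadata need not be stored: duration = frames / fps.
        if self.frame_count is None or not self.fps:
            return None
        return self.frame_count / self.fps

clip = VideoValue("s3://bucket/clip.mp4",
                  frame_count=300, height=1080, width=1920, fps=30.0)
```

Keeping only these fields lets derived quantities such as duration be computed on demand instead of widening the physical type.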
Functions
With the Video data type implemented, we must address how to apply functions and UDFs to Video data. Common use cases include extracting key frames, splitting videos by key frames, and extracting audio.
As discussed in #4824 and #4820, namespaced expression syntax will be deprecated in v0.6. Accordingly, we will introduce dedicated functions `key_frames()`, `split_video()`, and `audio()` to fulfill these objectives.
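To make the proposed call shapes concrete, here is a toy Python sketch of the three functions operating on fabricated in-memory stand-ins. All names are assumptions taken from the proposal, not an existing Daft API, and a real implementation would decode frames lazily rather than holding them inline:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    index: int
    is_key: bool

@dataclass
class Video:
    path: str
    frames: List[Frame]   # toy: frames held inline; a real impl streams them
    audio_track: bytes

def key_frames(v: Video) -> List[Frame]:
    """Proposed key_frames(): keep only frames flagged as keyframes."""
    return [f for f in v.frames if f.is_key]

def split_video(v: Video) -> List[List[Frame]]:
    """Proposed split_video(): start a new segment at each keyframe."""
    segments, current = [], []
    for f in v.frames:
        if f.is_key and current:
            segments.append(current)
            current = []
        current.append(f)
    if current:
        segments.append(current)
    return segments

def audio(v: Video) -> bytes:
    """Proposed audio(): extract the audio track."""
    return v.audio_track

v = Video("clip.mp4",
          [Frame(0, True), Frame(1, False), Frame(2, True), Frame(3, False)],
          b"pcm")
```

Exposing these as top-level functions rather than a `.video` namespace matches the direction set by the deprecation of namespaced expressions.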
Example
Building on the `Video` data type and its associated functions, we present a usage example centered on key frame extraction.

`VideoArray`
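One way the pieces could compose is to filter on cheap metadata first and decode only the survivors. The following plain-Python sketch uses fabricated rows and a hypothetical `key_frames` helper as stand-ins for a Daft DataFrame pipeline:

```python
# Toy rows standing in for a DataFrame with a Video column; in Daft these
# would be real columns and key_frames() a proposed expression function.
rows = [
    {"path": "a.mp4", "frame_count": 300, "height": 1080, "width": 1920,
     "key_frame_indices": [0, 120, 240]},
    {"path": "b.mp4", "frame_count": 60, "height": 480, "width": 640,
     "key_frame_indices": [0]},
]

def key_frames(row):
    """Hypothetical key-frame extraction: here we just read precomputed
    indices; a real implementation would lazily decode the referenced file."""
    return row["key_frame_indices"]

# Filter on stored metadata first, so only matching videos are ever decoded.
hd = [r for r in rows if r["height"] >= 720]
extracted = {r["path"]: key_frames(r) for r in hd}
```

This is exactly the workflow the stored metadata is meant to enable: predicates on `frame_count`, `height`, and `width` run without touching the video bytes.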
Roadmap
Alternatives
1. Extend the `File` data type to add a `metadata` field.

Summary: add a new `metadata` map to the physical data type of `File`. The key distinction between `Video` and `File` lies in our intent to extract specific metadata to facilitate downstream filtering. If the `File` data type natively supported metadata, there would be no need for distinct `Video` or `Audio` data types. Furthermore, metadata is also the defining characteristic that differentiates the `File` type from the `Url` type.
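For comparison, here is a minimal sketch of this alternative: a `File`-like record carrying a generic metadata map. The names are hypothetical and do not reflect Daft's actual `File` internals:

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class FileValue:
    """Hypothetical File physical type extended with a metadata map. With
    this in place, video-specific fields (frame_count, height, ...) could
    live in `metadata` instead of requiring a dedicated Video type."""
    reference: str                                        # URL/path, as today
    metadata: Dict[str, Union[int, float, str]] = field(default_factory=dict)

video_file = FileValue("s3://bucket/clip.mp4",
                       {"frame_count": 300, "height": 1080, "width": 1920})

# Downstream filtering can consult the map without decoding the bytes:
is_hd = video_file.metadata.get("height", 0) >= 720
```

The trade-off is typing: a generic map loses the schema-level guarantees a dedicated `Video` type would give to its frame count, height, and width columns.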