
Clarification needed on the discrepancy between Figure 3 of the paper and the actual dataset clip durations #13

@hongluzhou


Thank you for sharing the code and data!
If I understand Figure 3 (Section 3.2) correctly, it shows more than 50k clips longer than 180 seconds. However, when I checked 'miradata_v1_330k.csv', I found only about 35.5k such clips (35,548). I'm confused by this discrepancy. Am I misreading Figure 3?

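import pandas as pd

# Load the released metadata (one row per clip)
df = pd.read_csv('miradata_v1_330k.csv', encoding='utf-8')
print(len(df))
# 330313 will be printed

# Keep only clips strictly longer than 180 seconds
filtered_df = df[df['seconds'] > 180]
print(len(filtered_df))
# 35548 will be printed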
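To narrow down where the two numbers diverge, one could tabulate the durations into bins and compare them against Figure 3 directly. The following is a minimal sketch under stated assumptions: the bin edges in `DEFAULT_EDGES` and the helper names `duration_histogram` and `threshold_counts` are hypothetical, because the figure's exact bins are not given here. It also reports whether a strict `>` versus an inclusive `>=` cut, or missing values in the `seconds` column, changes the count.

```python
import pandas as pd

# Hypothetical bin edges in seconds; Figure 3's actual bins are not stated here.
DEFAULT_EDGES = [0, 30, 60, 120, 180, float("inf")]


def duration_histogram(df: pd.DataFrame, column: str = "seconds", edges=None) -> pd.Series:
    """Count clips per duration bin using right-closed intervals (a, b]."""
    edges = DEFAULT_EDGES if edges is None else edges
    # include_lowest=True makes the first bin [0, 30] so zero-length clips are kept
    bins = pd.cut(df[column], bins=edges, right=True, include_lowest=True)
    # sort=False keeps bins in ascending duration order; NaN durations are dropped
    return bins.value_counts(sort=False)


def threshold_counts(df: pd.DataFrame, column: str = "seconds", threshold: float = 180) -> dict:
    """Compare strict vs inclusive thresholds and count missing durations."""
    durations = df[column]
    return {
        "greater": int((durations > threshold).sum()),
        "greater_or_equal": int((durations >= threshold).sum()),
        "missing": int(durations.isna().sum()),
    }


if __name__ == "__main__":
    # Synthetic durations for illustration only; substitute
    # pd.read_csv("miradata_v1_330k.csv") to inspect the real file.
    demo = pd.DataFrame({"seconds": [12.0, 45.5, 180.0, 181.2, 300.0, None]})
    print(duration_histogram(demo))
    print(threshold_counts(demo))
```

If the histogram of the CSV matches the shape of Figure 3 but not its absolute counts, that would suggest the figure was drawn from a different (for example, pre-filtering) version of the data. This is only a hypothesis that the authors would need to confirm.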
