Dataset Versions API Does Not Return All Versions Due to LIMIT and OFFSET Placement in SQL Query #2944

@inanalper

Description: I've encountered an issue with the Marquez dataset versions API where not all dataset versions are returned, even when the limit parameter is set higher than the total number of versions.

Steps to Reproduce:

Prepare Data: Download the dump (130 KB) and load it into your database: https://drive.google.com/file/d/1T8LI-NRHg7Qxj_pi7CN0sRm0ZcssooxU/view
API Request: Call the endpoint /api/v1/namespaces/s3a%3A%2F%2Fproduct-data/datasets/%2F4f5e4a74-d608-48b9-968b-b638ff80654f/versions
Set Limit: Set the limit parameter to 25, 100, and 1000 in turn. The returned list sizes are 1, 3, and 6 respectively, while the totalCount property is always 6.
Observe: For limits of 25 and 100, the API returns fewer versions than the 6 that exist.
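
The steps above can be scripted roughly as follows. This is only a sketch: the base URL `http://localhost:5000` and the `versions` key in the response body are assumptions about the local setup and response shape, and the helper names are made up for illustration. Only the path, the limit values, and `totalCount` come from this report.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: where the Marquez API is listening
NAMESPACE = "s3a://product-data"
DATASET = "/4f5e4a74-d608-48b9-968b-b638ff80654f"


def versions_url(limit, base_url=BASE_URL):
    """Build the dataset versions URL with the namespace and dataset percent-encoded."""
    ns = urllib.parse.quote(NAMESPACE, safe="")
    ds = urllib.parse.quote(DATASET, safe="")
    return f"{base_url}/api/v1/namespaces/{ns}/datasets/{ds}/versions?limit={limit}"


def fetch_counts(limit):
    """Return (number of versions in the response, totalCount) for one limit value."""
    with urllib.request.urlopen(versions_url(limit)) as resp:
        body = json.load(resp)
    # assumption: the list of versions is under the "versions" key
    return len(body["versions"]), body["totalCount"]


def report(limits=(25, 100, 1000)):
    """Print returned size vs. totalCount for each limit (needs a running API)."""
    for limit in limits:
        returned, total = fetch_counts(limit)
        print(f"limit={limit}: returned={returned}, totalCount={total}")
```

Calling `report()` against a database loaded from the dump should show the mismatch between the returned list size and totalCount described above.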

Expected Behavior:

The API should return all dataset versions up to the specified limit. If the limit exceeds the total number of versions, all versions should be returned.

Actual Behavior:

The API returns fewer versions than expected, and the number of versions returned does not match the total count, even when the limit is sufficiently high.

Cause:

The issue is due to the placement of the LIMIT and OFFSET clauses within the SQL query used in the DatasetVersionDao.findAll method. The LIMIT and OFFSET are applied within a Common Table Expression (CTE) before grouping and filtering, so they count intermediate rows rather than dataset versions. After grouping, the result can contain fewer versions than the requested limit.
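
To illustrate the effect, here is a minimal sketch using an in-memory SQLite database. The two tables and both queries are hypothetical stand-ins, not the actual Marquez schema or the query in DatasetVersionDao.findAll. They only show how a LIMIT applied before GROUP BY truncates joined rows instead of versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each version joins to several field rows.
conn.executescript("""
CREATE TABLE dataset_versions (version_id INTEGER PRIMARY KEY, created_at INTEGER);
CREATE TABLE version_fields (version_id INTEGER, field_name TEXT);
""")
for v in range(1, 7):  # 6 versions, 4 fields each -> 24 joined rows
    conn.execute("INSERT INTO dataset_versions VALUES (?, ?)", (v, v))
    for f in range(4):
        conn.execute("INSERT INTO version_fields VALUES (?, ?)", (v, f"field_{f}"))

# LIMIT/OFFSET inside the CTE: applied to joined rows, before grouping.
LIMIT_INSIDE_CTE = """
WITH joined AS (
    SELECT dv.version_id, dv.created_at, vf.field_name
    FROM dataset_versions dv
    JOIN version_fields vf ON vf.version_id = dv.version_id
    ORDER BY dv.created_at DESC
    LIMIT :limit OFFSET :offset
)
SELECT version_id, group_concat(field_name) AS fields
FROM joined
GROUP BY version_id
ORDER BY version_id DESC
"""

# LIMIT/OFFSET on the outer query: applied to grouped versions.
LIMIT_AFTER_GROUPING = """
WITH joined AS (
    SELECT dv.version_id, dv.created_at, vf.field_name
    FROM dataset_versions dv
    JOIN version_fields vf ON vf.version_id = dv.version_id
)
SELECT version_id, group_concat(field_name) AS fields
FROM joined
GROUP BY version_id
ORDER BY version_id DESC
LIMIT :limit OFFSET :offset
"""


def count_versions(query, limit, offset=0):
    """Run a query and return how many grouped versions come back."""
    rows = conn.execute(query, {"limit": limit, "offset": offset}).fetchall()
    return len(rows)


print(count_versions(LIMIT_INSIDE_CTE, 10))      # 3: ten joined rows span only 3 versions
print(count_versions(LIMIT_AFTER_GROUPING, 10))  # 6: all versions fit under the limit
```

In this toy setup a limit of 10 exceeds the 6 existing versions, yet the first query returns only 3, which mirrors the pattern reported above.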

I am going to open a PR that fixes the placement, following your guidelines.
