
[Feature]: External Filter Function in the SDK #39914

@PwzXxm

Description


Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

For users whose filtering expressions are too large to transmit over the network in practice, we should provide simple syntactic sugar. Specifically, we want to help users achieve the following:

  • Given an external filtering function and a batch_size, the SDK runs a KNN search, applies the function to the returned results and, if fewer than batch_size results pass, keeps iterating until there are batch_size valid results.
    This is a pure SDK feature; all of the logic lives in the SDK.
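The refill loop described above can be sketched in plain Python. This is a minimal, self-contained sketch under stated assumptions, not the pymilvus implementation: ExternallyFilteredIterator, its fetch_next callable (standing in for one server-side page and returning an empty list once the server is exhausted) and make_fake_server are hypothetical names. One design choice worth flagging: when a refill overshoots batch_size, the surplus valid hits are buffered for the next call instead of being discarded, so no results are lost between pages.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ExternallyFilteredIterator(Generic[T]):
    """Hypothetical SDK-side wrapper that refills until batch_size hits pass the filter."""

    def __init__(
        self,
        fetch_next: Callable[[], List[T]],
        external_filter_func: Callable[[List[T]], List[T]],
        batch_size: int,
    ) -> None:
        self._fetch_next = fetch_next          # one server-side page; [] when exhausted
        self._filter = external_filter_func    # user-supplied, runs only in the SDK
        self._batch_size = batch_size
        self._buffer: List[T] = []             # valid hits not yet handed out
        self._exhausted = False

    def next(self) -> List[T]:
        # Keep pulling server batches until enough hits survive the filter.
        while len(self._buffer) < self._batch_size and not self._exhausted:
            batch = self._fetch_next()
            if not batch:
                self._exhausted = True
                break
            self._buffer.extend(self._filter(batch))
        # Return at most batch_size hits and keep the surplus for the next call.
        page = self._buffer[: self._batch_size]
        self._buffer = self._buffer[self._batch_size :]
        return page


def make_fake_server(ids: List[int], page_size: int) -> Callable[[], List[int]]:
    """Hypothetical stand-in for the server: serves ids in pages of page_size."""
    pages = iter([ids[k : k + page_size] for k in range(0, len(ids), page_size)])
    return lambda: next(pages, [])


# Demo: the server pages 10 ids at a time, the external filter keeps even ids.
demo = ExternallyFilteredIterator(
    make_fake_server(list(range(100)), page_size=10),
    lambda batch: [x for x in batch if x % 2 == 0],
    batch_size=15,
)
while True:
    page = demo.next()
    if not page:
        break
    print(len(page), page[0], page[-1])  # 15 0 28, then 15 30 58, 15 60 88, 5 90 98
```

Each call to next() may trigger several server round trips (three for the first page here), and the final page is short because the server ran out of results.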

It can be combined with filter: filter is evaluated on the server, while external_filter_func runs in the SDK.
For example, suppose the user specifies batch_size=30 and filter=id > 10:

  1. The server returns a batch of 30 results, all satisfying id > 10.
  2. If the user also specifies an external filter that only accepts id % 2 == 0, this condition is applied to those 30 returned results.
  3. If fewer than 30 results remain after the external filter, the SDK calls next again until there are 30 results, the server returns no more results, or the limit is reached.
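To make the numbered walk-through concrete, the sketch below traces only the first page, using plain integer ids in place of hits and assuming candidates arrive in ranking order. first_page is a hypothetical helper, and treating a non-negative limit as a cap on how many results may be returned is an assumption. With ids 0 to 199, batch_size=30, the server filter id > 10 and the external filter id % 2 == 0, the first server batch (ids 11 to 40) yields only 15 even ids, so a second server call (ids 41 to 70) is needed to reach 30.

```python
from typing import Callable, List, Tuple


def first_page(
    candidate_ids: List[int],
    server_filter: Callable[[int], bool],
    external_filter: Callable[[int], bool],
    batch_size: int,
    limit: int = -1,
) -> Tuple[List[int], int]:
    """Hypothetical trace of the first page: returns (page, number of server calls)."""
    # Step 1: the simulated server applies `filter` and serves batches of batch_size.
    matching = [i for i in candidate_ids if server_filter(i)]
    server_batches = [matching[k : k + batch_size] for k in range(0, len(matching), batch_size)]
    # Assumption: a non-negative limit caps how many results may be returned.
    target = batch_size if limit < 0 else min(batch_size, limit)
    page: List[int] = []
    calls = 0
    for batch in server_batches:
        calls += 1
        # Step 2: the SDK applies the external filter to the returned batch.
        page.extend(i for i in batch if external_filter(i))
        # Step 3: stop once enough results pass; running out of batches also ends the loop.
        if len(page) >= target:
            break
    return page[:target], calls


page, calls = first_page(list(range(200)), lambda i: i > 10, lambda i: i % 2 == 0, batch_size=30)
print(calls, page[0], page[-1], len(page))  # 2 12 70 30
```

The example shows the cost of a selective external filter: each server call contributes only the fraction of hits that pass, so the number of round trips per page grows as that fraction shrinks.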

PS: limit will be deprecated in the future.

Describe the solution you'd like.

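import numpy as np

# milvus_client, collection_name, DIM, PICTURE, USER_ID and AGE are assumed
# to be defined earlier (collection setup is omitted here).
rng = np.random.default_rng()
vector_to_search = rng.random((1, DIM), np.float32)
expr = f"10 <= {AGE} <= 25"
valid_ids = {1, 12, 123, 1234}  # a set makes each membership check O(1)

# Option 1: filter() keeps hits whose id is in valid_ids, in their original order
def external_filter_func(hits: "Hits"):
    return list(filter(lambda hit: hit.id in valid_ids, hits))

# Option 2: an equivalent explicit loop (either function can be passed below)
def external_filter_func_loop(hits: "Hits"):
    results = []
    for hit in hits:
        if hit.id in valid_ids:
            results.append(hit)
    return results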

search_iterator = milvus_client.search_iterator(
    collection_name=collection_name,
    data=vector_to_search,
    batch_size=100,
    anns_field=PICTURE,
    filter=expr,
    external_filter_func=external_filter_func,
    output_fields=[USER_ID, AGE]
)

page_idx = 0
while True:
    res = search_iterator.next()
    if len(res) == 0:
        print("search iteration with external filter finished, close")
        search_iterator.close()
        break
    for i in range(len(res)):
        print(res[i])
    page_idx += 1
    print(f"page{page_idx}-------------------------")
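Because the external filter runs entirely in the SDK, it can be unit-tested without a Milvus server. The sketch below assumes only that each hit exposes an id attribute, as the proposed filter does; FakeHit is a hypothetical stand-in, not a pymilvus type. Returning the kept hits in their original order matters: search results come back ranked by distance, and the filtered page should stay ranked.

```python
from typing import List, NamedTuple


class FakeHit(NamedTuple):
    """Hypothetical stand-in for a search hit; only id and distance are modeled."""
    id: int
    distance: float


valid_ids = {1, 12, 123, 1234}


def external_filter_func(hits: List[FakeHit]) -> List[FakeHit]:
    # Same rule as the proposal: keep hits whose primary key is allowed, preserving order.
    return [hit for hit in hits if hit.id in valid_ids]


batch = [FakeHit(1, 0.10), FakeHit(2, 0.20), FakeHit(12, 0.30), FakeHit(99, 0.40)]
kept = external_filter_func(batch)
print([hit.id for hit in kept])  # [1, 12]
```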

Targeting Milvus Version: V2.5.5, V2.6.x

Describe an alternate solution.

No response

Anything else? (Additional Context)

SDKs:

Metadata

Labels

kind/feature (Issues related to feature request from users)
