[Bug]: Load failed and search failed after milvus restarted #45721

@zhuwenxing

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.6-20251119-f288f698-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Load failed:


[2025-11-19T17:22:13.432Z]         # release and reload with changed replicas
[2025-11-19T17:22:13.432Z]         collection_w.release()
[2025-11-19T17:22:13.432Z]         replica_number = 1
[2025-11-19T17:22:13.432Z]         if replicas_loaded in [0, 1] and len(ms.query_nodes) >= 2:
[2025-11-19T17:22:13.432Z]             replica_number = 2
[2025-11-19T17:22:13.432Z] >       collection_w.load(replica_number=replica_number)
[2025-11-19T17:22:13.432Z]
[2025-11-19T17:22:13.432Z] testcases/test_action_second_deployment.py:220:
[2025-11-19T17:22:13.432Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/orm/collection.py:430: in load
[2025-11-19T17:22:13.432Z]     conn.load_collection(
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:254: in handler
[2025-11-19T17:22:13.432Z]     raise e from e
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:250: in handler
[2025-11-19T17:22:13.432Z]     return func(*args, **kwargs)
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:297: in handler
[2025-11-19T17:22:13.432Z]     return func(self, *args, **kwargs)
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:195: in handler
[2025-11-19T17:22:13.432Z]     raise e from e
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:165: in handler
[2025-11-19T17:22:13.432Z]     return func(*args, **kwargs)
[2025-11-19T17:22:13.432Z] /usr/local/lib/python3.10/dist-packages/pymilvus/client/grpc_handler.py:1335: in load_collection
[2025-11-19T17:22:13.432Z]     check_status(response)
[2025-11-19T17:22:13.432Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-11-19T17:22:13.432Z]
[2025-11-19T17:22:13.432Z] status = error_code: UnexpectedError
[2025-11-19T17:22:13.432Z] reason: "call query coordinator LoadCollection: when load 2 replica count: service resourc...Collection: when load 2 replica count: service resource insufficient[currentStreamingNode=1][expectedStreamingNode=2]"
[2025-11-19T17:22:13.432Z]
[2025-11-19T17:22:13.432Z]
[2025-11-19T17:22:13.432Z]     def check_status(status: Status):
[2025-11-19T17:22:13.432Z]         if status.code != 0 or status.error_code != 0:
[2025-11-19T17:22:13.432Z] >           raise MilvusException(status.code, status.reason, status.error_code)
[2025-11-19T17:22:13.432Z] E           pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=call query coordinator LoadCollection: when load 2 replica count: service resource insufficient[currentStreamingNode=1][expectedStreamingNode=2])>
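
The bracketed suffix of the load error carries the two values being compared (`currentStreamingNode=1` vs `expectedStreamingNode=2`). To pull those values out of the message when triaging similar failures, a minimal sketch follows. It assumes the `[key=value]` suffix format seen in this report's error messages holds in general, which the report does not confirm; `parse_error_fields` is a hypothetical helper, not part of pymilvus.

```python
import re

# Matches one "[key=value]" group, e.g. "[currentStreamingNode=1]".
# Assumption: keys are word characters and values contain no "]".
_FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_error_fields(message: str) -> dict:
    """Return the [key=value] pairs found in an error message as a dict of strings."""
    return {key: value for key, value in _FIELD.findall(message)}


# Message text copied from the traceback in this report.
load_error = (
    "call query coordinator LoadCollection: when load 2 replica count: "
    "service resource insufficient"
    "[currentStreamingNode=1][expectedStreamingNode=2]"
)

fields = parse_error_fields(load_error)
print(fields)
```

Comparing `int(fields["currentStreamingNode"])` against `int(fields["expectedStreamingNode"])` then makes the mismatch explicit in a test log.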

Search failed:


[2025-11-19T17:22:13.436Z] >       collection_w.search(vectors_to_search[:default_nq], default_search_field,
[2025-11-19T17:22:13.436Z]                             search_params, default_limit,
[2025-11-19T17:22:13.436Z]                             default_search_exp,
[2025-11-19T17:22:13.436Z]                             output_fields=[ct.default_int64_field_name],
[2025-11-19T17:22:13.436Z]                             check_task=check_task,
[2025-11-19T17:22:13.436Z]                             check_items={"nq": default_nq,
[2025-11-19T17:22:13.436Z]                                          "limit": default_limit})
[2025-11-19T17:22:13.436Z]
[2025-11-19T17:22:13.436Z] testcases/test_action_second_deployment.py:151:
[2025-11-19T17:22:13.436Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/orm/collection.py:810: in search
[2025-11-19T17:22:13.436Z]     resp = conn.search(
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:254: in handler
[2025-11-19T17:22:13.436Z]     raise e from e
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:250: in handler
[2025-11-19T17:22:13.436Z]     return func(*args, **kwargs)
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:297: in handler
[2025-11-19T17:22:13.436Z]     return func(self, *args, **kwargs)
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:195: in handler
[2025-11-19T17:22:13.436Z]     raise e from e
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py:165: in handler
[2025-11-19T17:22:13.436Z]     return func(*args, **kwargs)
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/client/grpc_handler.py:975: in search
[2025-11-19T17:22:13.436Z]     return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/client/grpc_handler.py:910: in _execute_search
[2025-11-19T17:22:13.436Z]     raise e from e
[2025-11-19T17:22:13.436Z] /usr/local/lib/python3.10/dist-packages/pymilvus/client/grpc_handler.py:899: in _execute_search
[2025-11-19T17:22:13.436Z]     check_status(response.status)
[2025-11-19T17:22:13.436Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2025-11-19T17:22:13.436Z]
[2025-11-19T17:22:13.436Z] status = error_code: UnexpectedError
[2025-11-19T17:22:13.436Z] reason: "failed to search: loaded collection do not found any channel in target, may be in...ction do not found any channel in target, may be in recovery: collection on recovering[collection=462309007702185671]"
[2025-11-19T17:22:13.436Z]
[2025-11-19T17:22:13.436Z]
[2025-11-19T17:22:13.436Z]     def check_status(status: Status):
[2025-11-19T17:22:13.436Z]         if status.code != 0 or status.error_code != 0:
[2025-11-19T17:22:13.436Z] >           raise MilvusException(status.code, status.reason, status.error_code)
[2025-11-19T17:22:13.436Z] E           pymilvus.exceptions.MilvusException: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=462309007702185671])>
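
Both messages ("service resource insufficient" and "may be in recovery: collection on recovering") could describe a state that clears once the restarted nodes finish registering, or a state that never clears. One way to tell the two apart in a test is to retry the call a bounded number of times and see whether it still fails. The sketch below is only that: `retry_while_recovering`, its parameters, and the marker list are hypothetical, and treating these two messages as possibly transient is an assumption, not something this report establishes. It also assumes the exception's string form contains the server message, as it does in the tracebacks above.

```python
import time

# Assumption: substrings that *may* indicate a post-restart recovery window.
TRANSIENT_MARKERS = (
    "service resource insufficient",
    "collection on recovering",
)


def retry_while_recovering(action, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Call action(); retry only when the exception text matches a marker.

    Re-raises immediately for any other error, and re-raises the last
    error once `attempts` calls have been made.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            transient = any(marker in str(exc) for marker in TRANSIENT_MARKERS)
            if not transient or attempt == attempts:
                raise
            sleep(delay_s)


# Hypothetical usage against the calls from the failing test:
#   retry_while_recovering(lambda: collection_w.load(replica_number=2))
#   retry_while_recovering(lambda: collection_w.search(...))
```

If the call still raises after the last attempt, the failure is not a short recovery window, which is the case worth reporting.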

The streaming nodes (sn) and query nodes (qn) are all running, and there are at least 2 of each:

pulsar-cluster-reinstall-3337-milvus-datanode-655467594f-29vcq    1/1     Running                  1 (28m ago)     28m     10.104.25.92    4am-node30   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-datanode-655467594f-9p7xd    1/1     Running                  1 (28m ago)     28m     10.104.13.195   4am-node16   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-mixcoord-78d758cdc8-mh9jj    1/1     Running                  1 (28m ago)     28m     10.104.19.48    4am-node28   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-proxy-5566f84f46-kkrbz       1/1     Running                  1 (28m ago)     28m     10.104.19.49    4am-node28   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-querynode-f856cd968-8rq8c    1/1     Running                  2 (27m ago)     28m     10.104.18.22    4am-node25   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-querynode-f856cd968-j5vlp    1/1     Running                  1 (28m ago)     28m     10.104.14.197   4am-node18   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-querynode-f856cd968-pfhhp    1/1     Running                  1 (28m ago)     28m     10.104.19.50    4am-node28   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-streamingnode-5d684cf2z9xv   1/1     Running                  2 (18m ago)     28m     10.104.30.250   4am-node38   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-streamingnode-5d684cflmz9m   1/1     Running                  1 (28m ago)     28m     10.104.25.93    4am-node30   <none>           <none>
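
The node counts can be checked mechanically from the listing itself. The sketch below tallies Running pods per component and records restart counts. It assumes the columns are NAME, READY, STATUS, RESTARTS in that order and that the component name is the token following `-milvus-` in the pod name; `summarize_pods` is a hypothetical helper written for this listing only.

```python
from collections import Counter

# Pod listing copied from this report (no header row).
POD_LISTING = """\
pulsar-cluster-reinstall-3337-milvus-datanode-655467594f-29vcq    1/1     Running                  1 (28m ago)     28m     10.104.25.92    4am-node30   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-datanode-655467594f-9p7xd    1/1     Running                  1 (28m ago)     28m     10.104.13.195   4am-node16   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-mixcoord-78d758cdc8-mh9jj    1/1     Running                  1 (28m ago)     28m     10.104.19.48    4am-node28   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-proxy-5566f84f46-kkrbz       1/1     Running                  1 (28m ago)     28m     10.104.19.49    4am-node28   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-querynode-f856cd968-8rq8c    1/1     Running                  2 (27m ago)     28m     10.104.18.22    4am-node25   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-querynode-f856cd968-j5vlp    1/1     Running                  1 (28m ago)     28m     10.104.14.197   4am-node18   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-querynode-f856cd968-pfhhp    1/1     Running                  1 (28m ago)     28m     10.104.19.50    4am-node28   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-streamingnode-5d684cf2z9xv   1/1     Running                  2 (18m ago)     28m     10.104.30.250   4am-node38   <none>           <none>
pulsar-cluster-reinstall-3337-milvus-streamingnode-5d684cflmz9m   1/1     Running                  1 (28m ago)     28m     10.104.25.93    4am-node30   <none>           <none>
"""


def summarize_pods(listing: str):
    """Count Running pods per Milvus component and record restarts per pod."""
    running = Counter()
    restarts = {}
    for line in listing.splitlines():
        fields = line.split()
        # Need at least NAME, READY, STATUS, RESTARTS; skip unrelated lines.
        if len(fields) < 4 or "-milvus-" not in fields[0]:
            continue
        name, status = fields[0], fields[2]
        # Component is the token right after "-milvus-", e.g. "querynode".
        component = name.split("-milvus-", 1)[1].split("-", 1)[0]
        if status == "Running":
            running[component] += 1
        restarts[name] = int(fields[3])
    return running, restarts


running, restarts = summarize_pods(POD_LISTING)
print(dict(running))
print({name: count for name, count in restarts.items() if count > 1})
```

For this listing the tally is 3 query nodes and 2 streaming nodes, and two pods show 2 restarts: one query node (27m ago) and one streaming node (18m ago).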

Expected Behavior

No response

Steps To Reproduce

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/3337/pipeline

cluster: 4am
ns: chaos-testing

Anything else?

No response

Labels

- kind/bug: Issues or changes related to a bug.
- priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
- severity/critical: Critical, lead to crash, data missing, wrong result, function totally doesn't work.
- triage/accepted: Indicates an issue or PR is ready to be actively worked on.
