Skip to content

Conversation

@abd-770
Copy link
Contributor

@abd-770 abd-770 commented Feb 24, 2025

Description:

This PR implements support for both client verification of server (1-way TLS) and mutual authentication (2-way TLS) for all the bulk import functions in pymilvus.

  • bulk_import
  • get_import_progress
  • list_import_jobs

Now, users with TLS-mode milvus can use these functions in pymilvus, using their certificate's.

@sre-ci-robot
Copy link

Welcome @abd-770! It looks like this is your first PR to milvus-io/pymilvus 🎉

@abd-770
Copy link
Contributor Author

abd-770 commented Feb 26, 2025

@longjiquan @XuanYang-cn
I'll modify the commit details after all the changes are finalised.
Also, is there an way to test the changes to confirm it's working as expected.

@abd-770
Copy link
Contributor Author

abd-770 commented Mar 3, 2025

@longjiquan @XuanYang-cn
Can you guys review the PR?

@XuanYang-cn
Copy link
Contributor

@abd-770 We are reviewing this PR. In the mean time, could you please fix the DCO error?

@XuanYang-cn XuanYang-cn modified the milestones: 2.5.5, 2.5.6 Mar 3, 2025
@yhmo
Copy link
Contributor

yhmo commented Mar 3, 2025

@abd-770

The "**kwargs" is passed into _post_request() method for each API, and eventually passed into the requests.post() method.

I suppose you can call the get_import_progress() like this:

resp = get_import_progress(
    url=uri,
    job_id="455462875463422512",
    verify=CERTPATH
)

@XuanYang-cn XuanYang-cn removed this from the 2.5.6 milestone Mar 3, 2025
@XuanYang-cn XuanYang-cn added the wontfix This will not be worked on label Mar 3, 2025
@abd-770
Copy link
Contributor Author

abd-770 commented Mar 3, 2025

@yhmo @XuanYang-cn
I see, but as for the user convenience, it's much preferred to add a dedicated certificate field to the function, since many users connect to milvus via TLS.
Alot of users, including me, were under the impression that these function only supports non-TLS calls and we were using the deprecated bulk import functions, until I looked into the pymilvus source code and found out about this.

We can also update the documentation for the same and provide certificate path to the syntax of the function.

@yhmo
Copy link
Contributor

yhmo commented Mar 4, 2025

@abd-770

request has two parameters for tls:

    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.

In pymilvus, for one-way tls, the client needs to pass a ca file: https://github.com/milvus-io/pymilvus/blob/master/examples/cert/example_tls1.py
For two-way tls, the client needs to provide two files: a pem and a client key: https://github.com/milvus-io/pymilvus/blob/master/examples/cert/example_tls2.py

Looks like the "verify" parameter is for one-way tls, only passing one path of ca file. It can be a boolean value which means internal ca.
The "cert" parameter is for two-way tls, which can be a tuple of two files. Seems it also can be a string for one-way tls, one ca path, I am not sure.
Your code change only handles one case: the "verify" for one ca path.

I think the cert_path should cover the three cases, I suggest to rename the cert_path to "cert"

def _post_request(
    url: str, api_key: str, params: {}, timeout: int = 20, cert: Optional[bool, str, tuple] = None, **kwargs
)

If cert is a bool, pass it as "verify".
If cert is a ca path, pass it as request("verify"), maybe also ok to pass it to request("cert"), I am not sure.
If cert is a tuple, pass it as request("cert").

@abd-770
Copy link
Contributor Author

abd-770 commented Mar 5, 2025

@yhmo
I can work on implementing the 2-way TLS for these functions.
I will create a seperate issue/PR for it.

Meanwhile for this PR, can we get it merged?

If cert is a bool, pass it as "verify".

For this case, you mean setting the value of verify=true?

        resp = requests.post(
            url=url, headers=_http_headers(api_key), json=params, timeout=timeout, verify=true, **kwargs
        )

@yhmo
Copy link
Contributor

yhmo commented Mar 5, 2025

You can read the source code of "requests.api", under the path [path_to_python_packages]/site-packages/requests/api.py

def post(url, data=None, json=None, **kwargs):
    return request("post", url, data=data, json=json, **kwargs)

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content_type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

The description of "param verify" and "param cert".
"verify" can be a boolean or a path of certificate file
"cert" can be a path of cert file or a tuple of two cert paths

@abd-770
Copy link
Contributor Author

abd-770 commented Mar 18, 2025

@yhmo @XuanYang-cn
Two-way TLS with client side authentication is added to the bulk functions.
Can you take a look at the changes and suggest any improvements.
Also, is there any way to test these functionalities.

Abdullah-Ahmed2 added 2 commits April 13, 2025 05:31
@abd-770
Copy link
Contributor Author

abd-770 commented Apr 13, 2025

@yhmo @XuanYang-cn @longjiquan Please review this PR.
It's been open for some time. I've also resolved the new merge conflicts in the file

@XuanYang-cn XuanYang-cn removed the wontfix This will not be worked on label Apr 15, 2025
@XuanYang-cn
Copy link
Contributor

@abd-770 THX for the effort.

Please fix the coding style check use the following instructions:

pip install -e ".[dev]"
make format

@yhmo
Copy link
Contributor

yhmo commented Apr 15, 2025

@abd-770
You can test the tls functionalities with a milvus standalone(deployed by docker-compose). The server-side configurations: https://milvus.io/docs/tls.md
Once the server is launched with tls, you can try these examples:
one-way tls client-side example: https://github.com/milvus-io/pymilvus/blob/master/examples/cert/example_tls1.py
two-way tls client-side example: https://github.com/milvus-io/pymilvus/blob/master/examples/cert/example_tls2.py

To verify the new parameters for bulk_import/get_import_progress/list_import_jobs, you can call them with fake collection name and fake file paths. If the tls parameter takes effect, you will see error like "collection doesn't exist" returned from the server, instead of a connection exception.

Signed-off-by: Abdullah-Ahmed2 <[email protected]>
@XuanYang-cn XuanYang-cn added this to the 2.6.0 milestone Apr 16, 2025
@XuanYang-cn XuanYang-cn added the PR | need to cherry-pick to 2.x This PR need to be cherry-picked to 2.x branch label Apr 16, 2025
@mergify mergify bot added the ci-passed label Apr 16, 2025
Copy link
Contributor

@XuanYang-cn XuanYang-cn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sre-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abd-770, XuanYang-cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 2d73314 into milvus-io:master Apr 16, 2025
11 checks passed
@abd-770 abd-770 changed the title Add Optional CA Certificate Support to Bulk Import Functions Add 1-Way and 2-Way TLS Support to Bulk Import Functions Apr 17, 2025
XuanYang-cn pushed a commit to XuanYang-cn/pymilvus that referenced this pull request Apr 21, 2025
…io#2672)

Issue: milvus-io#2671
This PR addresses the above issue by adding optional CA certificate
support to the below bulk import functions:
- **bulk_import**
- **get_import_progress**
- **list_import_jobs**

Now, users with TLS-mode milvus can use these functions in pymilvus,
using their SSL-certificate.

Co-authored-by: Abdullah-Ahmed2 <[email protected]>
Signed-off-by: yangxuan <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Apr 23, 2025
…2739)

Issue: #2671 This PR
addresses the above issue by adding optional CA certificate support to
the below bulk import functions:
- **bulk_import**
- **get_import_progress**
- **list_import_jobs**

Now, users with TLS-mode milvus can use these functions in pymilvus,
using their SSL-certificate.

Signed-off-by: yangxuan <[email protected]>
Co-authored-by: Abdullah Ahmed <[email protected]>
Co-authored-by: Abdullah-Ahmed2 <[email protected]>
@XuanYang-cn XuanYang-cn added PR | cherry-picked to 2.x PR already cherry-picked to branch 2.x and removed PR | need to cherry-pick to 2.x This PR need to be cherry-picked to 2.x branch labels May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants