Add registry proxying section #66

dmcgowan · 2019-07-26T00:01:56Z

Define repository namespace query parameter for proxying.

Closes #12

Giving time for registry operators to weigh in

Maintainer approval

(mike: I updated the maintainer list in this vote to reflect the current maintainer list)

mikebrow

See comments & questions.

spec.md

jzelinskie · 2019-08-09T19:13:22Z

Why do clients need to know anything about pull-through caching if it's implemented server-side?

dmcgowan · 2019-08-09T20:55:11Z

Why do clients need to know anything about pull-through caching if it's implemented server-side?

The clients should know how the registry host was resolved from a given image reference. The clients don't care how the server is implemented, but they SHOULD provide information to the server that indicates which reference is being asked for. Just as an HTTP client connecting to a proxy server must communicate what the upstream server is, the same is true here. Today the protocol doesn't define any way to communicate what the upstream is, and proxies end up being hardcoded to a single upstream. In a few cases, proxies use a custom domain per upstream and require users to change the names of their images in order to use them.

josephschorr · 2019-08-12T13:21:50Z

Today the protocol doesn't define any way to communicate what the upstream is, and proxies end up being hardcoded to a single upstream.

Right... isn't that the point? If I encode that "myregistry/mynamespace/myrepo" goes to "upstream/foo/bar", that's a detail for the maintainer of myrepo and one that the client, ideally, doesn't need to know; the whole point is the puller thinks they are getting "myregistry/mynamespace/myrepo".

If the goal is to allow the client to specify "upstream/foo/bar", then I'd say that the target is not really a repository anymore, but simply a working proxy. A different protocol parameter might therefore be useful, but registries should have the option to not support said parameter.

dmcgowan · 2019-08-13T18:33:03Z

Right... isn't that the point?

That is one use case that will still work. In the example you mentioned, when a repository is proxied in that fashion, the puller often does know of this detail as they must explicitly provide myregistry with the intent of getting some upstream content. The use case where myregistry is some sort of blessed version of upstream is reasonable, but not the intent of the namespace parameter here.

If the goal is to allow the client to specify "upstream/foo/bar"

This is the use case here, and proxy may be better terminology, but that is really a detail of the registry. The registry may act as a proxy, proxy-cache, or active mirror; that is out of scope for definition here. This parameter just enables all of those features to work across multiple namespaces. For example, if you want public images from both docker.io/* and quay.io/* to be cached in the same registry proxy today, you would need the server to have two hostnames (something like docker.io.myproxy and quay.io.myproxy) and then have clients configured to do that mapping for each namespace. This query parameter provides a much simpler option to clients and servers. If a server does not support it, it ignores the parameter. If a client is configured to send all requests to a server which does not support it, that is no different from any other misconfiguration by clients today.
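
As a rough illustration of the point above, the following sketch shows how a client might address a single proxy for images from several upstream registries by passing the upstream host in `ns`. The `/v2/<name>/manifests/<reference>` path comes from the distribution API; the function name, the `myproxy.example` host, and the simplified reference parsing (explicit host, tag-only references, no digests) are assumptions made for this example, not something defined in this PR.

```python
from urllib.parse import urlencode


def proxied_manifest_url(proxy_host: str, image_ref: str) -> str:
    """Build a manifest URL on a proxy, passing the upstream host as `ns`.

    Assumes image_ref looks like "<host>/<name>[:<tag>]" (hypothetical,
    simplified parsing; digest references are not handled).
    """
    # Split "docker.io/library/alpine:3.20" into the upstream host and the rest.
    upstream_host, _, remainder = image_ref.partition("/")
    # Split the repository name from the tag; fall back to "latest" if absent.
    name, _, tag = remainder.partition(":")
    tag = tag or "latest"
    # The upstream host travels as the `ns` query parameter.
    query = urlencode({"ns": upstream_host})
    return f"https://{proxy_host}/v2/{name}/manifests/{tag}?{query}"


if __name__ == "__main__":
    # Both upstreams are served by the same proxy hostname.
    print(proxied_manifest_url("myproxy.example", "docker.io/library/alpine:3.20"))
    print(proxied_manifest_url("myproxy.example", "quay.io/coreos/etcd"))
```

With this shape, the proxy needs only one hostname, and the client needs no per-namespace hostname mapping.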

josephschorr · 2019-08-13T19:35:27Z

For example if you want public images from both docker.io/* and quay.io/* to be cached in the same registry proxy today, you would need the server to have two hostnames (something like docker.io.myproxy and quay.io.myproxy) then have clients configured to do that mapping for each namespace

Or configure two repositories, one for each? (especially since combining them could lead to merge conflicts).

I'm concerned we're adding quite a bit of complexity to address a use case that has simpler solutions when configured on the registry side.

dmcgowan · 2019-08-13T20:26:25Z

I'm concerned we're adding quite a bit of complexity to address a use case that has simpler solutions when configured on the registry side.

Can you elaborate here? Configuring a repository for each mirror is non-trivial. Configuring a domain for each upstream and routing to the upstream based on the domain is not easier; that would still require the same routing on the server side that an implementation of this would require. The client-side implementation to support per-registry configuration is not simple and inherently requires catch-all conditions when trying to enforce proxying through a gateway.

I did a client-side implementation of this to demonstrate the feature and to give server-side implementations a client to test against. On the client side, it is not complex at all, since clients should already know how to handle 404s when multiple registry endpoints are configured. On the server side, the complexity to support this isn't much more than existing proxy-cache support.

josephschorr · 2019-08-13T20:36:11Z

Configuring a repository for each mirror is non-trivial.

It's non-trivial, but it's not that difficult either :)

My concern remains around complexity: the document as outlined, for example, says that the ns should not be sent to non-mirroring registries... but how does the client know that? Is it the registry's job to report back an error if that argument is found but unsupported? How will clients know to be able to check for this capability?

If we feel that pass-through proxying of other registries is, in and of itself, a feature of the protocol (rather than something configured on the registry side), then I suspect we need to give significantly more thought to the end-to-end user experience. For example, I could imagine some paths supporting proxying and others not.

dmcgowan · 2019-08-13T20:56:17Z

how does the client know that?

The clients have the most context, and this really does not need to be defined here, only that a client SHOULD make that distinction to avoid sending unnecessary redundant information. The clients themselves have both the configuration and the endpoint resolution logic, so they have multiple options for determining this. In the implementation I sent, I simply did this by checking whether the endpoint was configured without push support, as this could indicate that the registry being communicated with may not be the upstream source. However, I will probably add a check there for ns != host, since some setups never have push configurations (such as with a Kubernetes runtime). Either way, this is trivial and not required.
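
One possible reading of the client-side check described above, as a minimal sketch: send `ns` only when the endpoint is configured without push support and its host differs from the namespace. The `Endpoint` type, its fields, and the way the two conditions are combined are assumptions for illustration; this is not the code from the referenced client implementation.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical client-side endpoint configuration."""
    host: str
    can_push: bool


def should_send_ns(endpoint: Endpoint, ns: str) -> bool:
    """Decide whether to attach the `ns` query parameter to a request."""
    # If the endpoint is the upstream itself, `ns` would be redundant.
    if endpoint.host == ns:
        return False
    # An endpoint without push support may be a proxy rather than the source.
    return not endpoint.can_push


if __name__ == "__main__":
    print(should_send_ns(Endpoint("myproxy.example", can_push=False), "docker.io"))
    print(should_send_ns(Endpoint("docker.io", can_push=False), "docker.io"))
```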

Is it the registry's job to report back an error if that argument is found but unsupported?

No, the registry can simply ignore it. This is like asking a registry today which was configured to mirror docker.io to return an error if the client actually meant quay.io; the registry just isn't expected to have the same amount of context as a client with regard to the intent of the entire pull process. If the registry chooses to handle the ns parameter and not support it, it is as easy as returning a 404 for unconfigured upstreams.
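
A minimal sketch of the server-side behavior described above, for a registry that chooses to handle `ns`: requests without `ns` behave as before, and requests naming an unconfigured upstream get a 404. The upstream set, the default upstream, and the function names are hypothetical.

```python
from typing import Optional, Set


def resolve_upstream(ns: Optional[str], configured: Set[str], default: str) -> Optional[str]:
    """Return the upstream to serve from, or None if the request should 404."""
    # No `ns` sent: behave like a proxy hardcoded to a single upstream.
    if ns is None:
        return default
    # `ns` names a configured upstream: scope the request to it.
    if ns in configured:
        return ns
    # `ns` names an upstream this registry is not configured for.
    return None


def status_for(ns: Optional[str], configured: Set[str], default: str) -> int:
    """Map the routing decision to an HTTP status (content lookup omitted)."""
    return 200 if resolve_upstream(ns, configured, default) is not None else 404


if __name__ == "__main__":
    upstreams = {"docker.io", "quay.io"}  # hypothetical configuration
    print(status_for("quay.io", upstreams, "docker.io"))
    print(status_for("ghcr.io", upstreams, "docker.io"))
```

A registry that does not support the parameter at all would never inspect `ns` and would behave like the `ns is None` branch for every request.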

How will clients know to be able to check for this capability?

They aren't expected to check for it, but rather to be explicitly configured for it. A client will know if it is configured to always use a specific mirror or a mirror for multiple namespaces. I think what you are suggesting here, though, is the idea of registry discovery. That is a much larger topic that I would still love to see happen; with that feature, a client could start with zero knowledge (except, of course, the domain: quay.io, docker.io, etc.) and discover registry capabilities and endpoints.

spec.md

vbatts · 2019-08-14T21:41:21Z

Discussion ensued on the call today.
This sounds like a decent addition, but it needs a clear use case for the behavior, and clarity on whether a registry implementation MUST support it.

jzelinskie · 2019-08-14T22:04:06Z

pinging @thomasmckay and @kurtismullins who are implementing mirroring on Quay -- they probably have feedback and want to track this thread

RCMariko · 2019-08-15T14:08:52Z

Pulp container plugin team will want to keep an eye on this thread as well. Any feedback @ipanova @dkliban @asmacdo ?

vbatts · 2019-08-15T14:11:09Z

Is `ns` already used, so that it's best to continue with that mnemonic? Or could it be something that does not collide with the outdated concept that images would only be named "transport/namespace/name:tag"?

dmcgowan · 2019-08-19T22:13:55Z

@vbatts I use ns or namespace here because namespace is a very generic concept. Certainly as a generic concept it has been used to mean many things. However generally namespace would refer to additional context (such as a prefix) on another name. In the distribution spec case, the name given to the registry would be the part on the URL path, the namespace just gives that additional context to the name. Existing distribution clients today parse the name as you described <sometransport/host/whatever>/<name given to the registry>, in which case when given just <name given to registry>, the <sometransport/host/whatever> would be the namespace of that. You could continue to divide those parts in smaller names in other namespaces, such as the Docker hub does with usernames/reponames, but that is out of scope here. Capturing this in elegant words is kind of tough, recommendations on which parts are unclear or how to make it better are appreciated.

stevvooe

LGTM

I think this is a great addition. It might be a good idea to add a few combinatoric examples of how ns and the repo name are combined to calculate the upstream and local mirroring location.

vbatts · 2019-12-16T17:07:39Z

@dmcgowan one thing i'm unclear on here is: can i have a single registry mirror that will be usable for more than one remote registry (i.e. remote of docker.io/..., quay.io/..., etc)

spec.md

vbatts · 2020-04-01T21:53:41Z

Notes from today's call:

(jz) quay does something different for mirroring. This should be called "proxying"
(dmcg) this is really for client side caching proxy, and is needed.

Please let's find a way to classify this language (whether client or server side), so we can close out or merge this.

jzelinskie · 2020-04-01T22:13:46Z

Reiterating our convo from the meeting:

I actually see a lot of value in adding this query parameter, but removing any connotation that it is the blessed solution for repository mirroring. I think that by including this value, a proxy could implement lots of different behavior for the client that need not be directly related to repository mirroring.

amouat · 2020-04-09T13:29:41Z

I note that harbor uses the term "replication" rather than proxy-cache or mirroring, which I quite like https://goharbor.io/docs/1.10/administration/configuring-replication/

brandond · 2025-06-17T02:29:44Z

Yeah I think it's safe to say that ns has become a de facto standard despite this PR stalling out.

sudo-bmitch · 2025-06-30T13:53:19Z

If the registry ignores a ns value, what are the allowed responses? Should the proxy notify the client about which upstream registry it used? I'm worried about dependency confusion attacks where a client thinks it is pulling from one registry, but the proxy responds with content from another location. A header from the proxy, indicating the upstream registry, would allow the client to cross reference the expected upstream and reject the proxy response on a mismatch.

brandond · 2025-06-30T19:37:44Z

I'm worried about dependency confusion attacks where a client thinks it is pulling from one registry, but the proxy responds with content from another location

This is not a problem that needs to be solved here. Tag mutability, and figuring out what a tag points at when pulling from different registries or mirrors, is a known issue. If this matters to you, the correct solution is to use digests instead of tags. Beyond that, it doesn't matter where the content comes from. Allowing the client to make decisions based on where a registry reports that it is getting its content from violates separation of concerns.

Pull a digest instead of a tag, and you're guaranteed to get what you want. Any hacks around "trusting" content because the server tells you it got it from a specific location, are insecure by design.

sudo-bmitch · 2025-06-30T21:34:13Z

I'm worried about dependency confusion attacks where a client thinks it is pulling from one registry, but the proxy responds with content from another location

This is not a problem that needs to be solved here. Tag mutability, and figuring out what a tag points at when pulling from different registries or mirrors, is a known issue. If this matters to you, the correct solution is to use digests instead of tags. Beyond that, it doesn't matter where the content comes from. Allowing the client to make decisions based on where a registry reports that it is getting its content from violates separation of concerns.

Pull a digest instead of a tag, and you're guaranteed to get what you want. Any hacks around "trusting" content because the server tells you it got it from a specific location, are insecure by design.

I completely agree with using digests and signing for security, even when no proxies are involved. But we have a scenario where it's less the client trusting the proxy, and more the proxy telling the client "you asked for X and I'm choosing to give you Y instead". Or we could say proxies should not do that. Or we can keep the current language that says a proxy is within the spec to return different content than what the client requested without any notification to the client. Of the three options, another header to give the client some feedback seems the least intrusive and the most flexible for implementations.

brandond · 2025-06-30T22:23:26Z

I think making the client at all aware of where the request is being served from is problematic. Say I have local registry A acting as pull-through cache that runs alongside my cluster. That is backed by an organization-level registry B with only approved images, which are populated from Docker Hub via a manual sync process that requires approval. If I ask for an image from Docker Hub from either of these registries, with ?ns=docker.io, what should they return as the upstream for the request?

And why should the client care at all, if the only right way to ensure you get what you want is to use the content digest?

The addressable content digest is the integrity check. Anything else is theater.
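
To make the integrity argument concrete, here is a minimal sketch of checking fetched bytes against a requested digest, independent of which registry or proxy served them. It assumes a `sha256:<hex>` digest string; the function name is made up for this example.

```python
import hashlib


def verify_digest(content: bytes, expected: str) -> bool:
    """Check that content matches a digest of the form "sha256:<hex>"."""
    algorithm, _, hex_value = expected.partition(":")
    # Only sha256 is handled in this sketch.
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    return hashlib.sha256(content).hexdigest() == hex_value


if __name__ == "__main__":
    blob = b"example manifest bytes"
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    print(verify_digest(blob, digest))
    print(verify_digest(blob + b"tampered", digest))
```

A pull by tag has no such check available up front, which is the gap the surrounding comments disagree about.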

dmcgowan · 2025-06-30T22:23:32Z

@sudo-bmitch I understand your point, but what is the logic you are considering there? From a client perspective: header is included and matches... good; header is not included... also good? Header is included and wrong... fail with what error? From the registry perspective: if it is supported and the registry has the content, return the header and the content; if it is supported and the registry does not have the content, then 404. Either way, 404 is the way for a registry to signal that it understands the request and does not have the content. With proxies, it is more important for clients to understand the proxies they are communicating with and to ensure that each proxy is trusted. A trusted proxy, signed content, or content by digest are the only approaches we should encourage for these use cases.

sudo-bmitch · 2025-07-01T01:48:31Z

I understand your point, but what is the logic you are considering there?

I think different tools may have their own logic, but the version I'm considering is:

If the header is returned and matches the expected value, use it.
If the header is missing, and the proxy is directly configured for a specific registry, use it.
If the header is missing, and the proxy is a default for all registries, send non-content addressable requests upstream.
If the header does not match, send non-content addressable requests upstream.
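
The four rules above could be sketched roughly as follows. The header value is the one the proxy would return in `OCI-Namespace`; the function name, parameters, and return values are hypothetical, and treating requests by digest as still safe to serve from the proxy is one reading of "send non-content addressable requests upstream".

```python
from typing import Optional


def choose_source(
    expected_ns: str,
    header_ns: Optional[str],
    proxy_is_global_default: bool,
    by_digest: bool,
) -> str:
    """Return "proxy" to use the proxy response, or "upstream" to bypass it."""
    # Rule 1: header returned and matches the expected upstream.
    if header_ns == expected_ns:
        return "proxy"
    # Rule 2: header missing, proxy explicitly configured for this registry.
    if header_ns is None and not proxy_is_global_default:
        return "proxy"
    # Rules 3 and 4: header missing on a catch-all proxy, or header mismatch.
    # Only requests that are not content addressable are sent upstream.
    return "proxy" if by_digest else "upstream"


if __name__ == "__main__":
    print(choose_source("docker.io", "docker.io", True, by_digest=False))
    print(choose_source("docker.io", "quay.io", False, by_digest=False))
```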

This could be a user configurable behavior, not unlike how TLS verification is configurable in most tools. The value add to me is that there are registries that merge content from multiple upstream sources into a single global namespace. If that registry is used as a proxy, a header would make it possible to detect that content is being returned from a potentially malicious squatter on a repository path that happens to be a different mirror than the expected upstream source. Either the registry wouldn't return a header and tooling should assume it only proxies a single registry, or it should return a header indicating that the content came from a different upstream than expected.

Having many different types of proxies (the pull-through cache of a single registry, a manual mirror of approved content, a mash-up from multiple upstream sources, and proxies that understand and use the new ns parameter), combined with different configurations of clients (from explicitly setting a proxy for each upstream to having a global default proxy for everything), means that it would be easy to misconfigure the combination. Having an extra handshake in the process means clients can require additional user verification before trusting it, just as clients today don't automatically fall back to insecure TLS settings, but there is a configuration for the exceptions.

phillebaba · 2025-07-01T09:42:17Z

I can only speak for how Spegel implements resolving tags with the ns parameter. The mirror registry should include the registry as part of the tag resolve process to avoid any name squatting. If a mirror merges multiple upstream registries it should only resolve the tag if the full registry, repository and tag matches. That is at least how Spegel implements this.
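
The idea of resolving a tag only when registry, repository, and tag all match can be illustrated with a small lookup keyed on all three parts. This is a hypothetical sketch of the concept, not Spegel's actual code; the class, method names, and placeholder digests are invented for the example.

```python
from typing import Dict, Optional, Tuple

# Key: (registry, repository, tag). Value: manifest digest.
TagKey = Tuple[str, str, str]


class MirrorIndex:
    """Hypothetical tag index for a mirror that merges several upstreams."""

    def __init__(self) -> None:
        self._tags: Dict[TagKey, str] = {}

    def record(self, registry: str, repository: str, tag: str, digest: str) -> None:
        self._tags[(registry, repository, tag)] = digest

    def resolve(self, ns: str, repository: str, tag: str) -> Optional[str]:
        # The registry from `ns` is part of the key, so the same repository
        # path on a different upstream can never satisfy the lookup.
        return self._tags.get((ns, repository, tag))


if __name__ == "__main__":
    index = MirrorIndex()
    index.record("docker.io", "library/app", "v1", "sha256:aaa")
    index.record("quay.io", "library/app", "v1", "sha256:bbb")
    print(index.resolve("docker.io", "library/app", "v1"))
    print(index.resolve("ghcr.io", "library/app", "v1"))
```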

As for the trust aspect, I think the same rules apply as they do today. The responsibility is on the end user to use a registry that they trust. There is nothing stopping a bad acting registry from returning whatever digest it likes to the client. It does not really matter if it is a mirror or not.

sudo-bmitch

Following up on the Thursday call discussion, I added a commit with the OCI-Namespace header. If that's blocking other maintainers from approving, I can split that into a separate PR.

Changes were addressed followed by a LGTM comment.

mikebrow

couple nits

spec.md

sudo-bmitch

Should I add back the change from commit 6386ae2, or was its removal intentional?

A registry that uses the ns query parameter to scope the request SHOULD return the ns query parameter value in the OCI-Namespace header.
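
A minimal sketch of that sentence from the registry side: if the request carries an `ns` value that the registry used to scope the request, echo it back in `OCI-Namespace`. The function name, the set of supported upstreams, and the choice to return no header for unsupported or ambiguous values are assumptions for illustration.

```python
from typing import Dict, Set
from urllib.parse import parse_qs, urlsplit


def namespace_header(request_url: str, supported: Set[str]) -> Dict[str, str]:
    """Return the response headers to add for a request, based on its `ns`."""
    values = parse_qs(urlsplit(request_url).query).get("ns", [])
    # No `ns`, or more than one value: the request was not scoped by `ns`.
    if len(values) != 1:
        return {}
    ns = values[0]
    # Only echo the value when it was actually used to scope the request.
    if ns not in supported:
        return {}
    return {"OCI-Namespace": ns}


if __name__ == "__main__":
    url = "https://proxy.example/v2/library/alpine/manifests/latest?ns=docker.io"
    print(namespace_header(url, {"docker.io"}))
```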

mikebrow · 2025-11-06T20:00:11Z

@dmcgowan I updated the maintainer vote list..

note brandon's query ^^

Define repository namespace query parameter for proxying. Signed-off-by: Derek McGowan <[email protected]>

Signed-off-by: Brandon Mitchell <[email protected]>

dmcgowan · 2025-11-06T20:05:57Z

Updated, not intentionally removed, just never had it in my local branch when rebased

sudo-bmitch

LGTM

dmcgowan added this to the v1.0.0-rc1 milestone Jul 26, 2019

dmcgowan force-pushed the mirror-query-param branch from 42589cd to e5df360 Compare July 26, 2019 00:05

dmcgowan mentioned this pull request Jul 26, 2019

Add support for pull-through cache with registry containerd/cri#1196

Closed

mikebrow reviewed Aug 8, 2019

dmcgowan force-pushed the mirror-query-param branch from e5df360 to 29fdf25 Compare August 9, 2019 20:45

vbatts reviewed Aug 14, 2019

stevvooe previously approved these changes Aug 24, 2019

majewsky reviewed Dec 17, 2019

jzelinskie modified the milestones: v1.0.0-rc1, v1.0.0-next May 6, 2020

dmcgowan mentioned this pull request May 20, 2020

Send X-Forwarded-Host when using wildcard mirror containerd/cri#1434

Closed

dmcgowan force-pushed the mirror-query-param branch 2 times, most recently from 658a97c to 1e3a9a4 Compare June 26, 2020 03:59

dmcgowan dismissed vbatts’s stale review via e57f6f2 June 26, 2025 17:15

dmcgowan force-pushed the mirror-query-param branch 3 times, most recently from 669cad7 to 68508bd Compare June 26, 2025 20:54

sudo-bmitch previously approved these changes Jul 4, 2025

BenTheElder mentioned this pull request Jul 17, 2025

fallback partial userspace proxy? kubernetes/kubernetes#132955

Open

sudo-bmitch mentioned this pull request Aug 15, 2025

add "ns" query param to registry mirror requests regclient/regclient#976

Merged

4 tasks

mikebrow reviewed Nov 6, 2025

dmcgowan dismissed sudo-bmitch’s stale review via ba13430 November 6, 2025 18:41

dmcgowan force-pushed the mirror-query-param branch from 6386ae2 to ba13430 Compare November 6, 2025 18:41

jdolitsky previously approved these changes Nov 6, 2025

sudo-bmitch requested changes Nov 6, 2025

dmcgowan dismissed jdolitsky’s stale review via 2f1b040 November 6, 2025 20:03

dmcgowan force-pushed the mirror-query-param branch from ba13430 to 2f1b040 Compare November 6, 2025 20:03

dmcgowan and others added 2 commits November 6, 2025 12:04

Add registry proxying section

4828340

Define repository namespace query parameter for proxying. Signed-off-by: Derek McGowan <[email protected]>

Add the OCI-Namespace header in the proxy response

ca746bd

Signed-off-by: Brandon Mitchell <[email protected]>

dmcgowan force-pushed the mirror-query-param branch from 2f1b040 to ca746bd Compare November 6, 2025 20:05

sudo-bmitch approved these changes Nov 6, 2025

jdolitsky approved these changes Nov 6, 2025
