Conversation

abhishekmadan30
Contributor

Fixes #221. Created a bunch of helpers to use this package in taskcluster.py. I also didn't completely deprecate the URL creation, as that is still used. I bypassed it and replaced all uses where a GET request is made.

Collaborator

@ahal ahal left a comment

This is shaping up nicely, thanks!

I have some changes to request. Once you put up a new revision I'll pull this down locally and see if it causes any differences in the graph (both here and in Gecko).

@abhishekmadan30 abhishekmadan30 requested a review from ahal August 19, 2025 19:45
@ahal
Collaborator

ahal commented Aug 19, 2025

Thanks! I'll take a closer look at this tomorrow. I'll try and test it against gecko_taskgraph as well.

Collaborator

@ahal ahal left a comment

Thanks, I tested this patch both in Taskgraph and the Gecko repo and it looks like it's working as expected!

But get_retry_post_session, _do_request and _handle_artifact are all unused in taskcluster.py now. Can you remove them and their associated tests, please?

@ahal
Collaborator

ahal commented Aug 20, 2025

Also, I think I changed my mind about leaving use_proxy in the function signature just for backwards compatibility. We should remove it rather than confuse consumers who attempt to set it to False (or whatnot) and then wonder why it isn't working.

We can do a major version bump for this; no big deal.

Collaborator

@ahal ahal left a comment

Thank you!

This is mostly cleanup, but there are two issues I neglected to catch in my first review:

  1. The dropped _handle_artifact handling for YAML artifacts
  2. The .capitalize() issue

response.raw.read = functools.partial(response.raw.read, decode_content=True)
return response.raw


def get_artifact_url(task_id, path, use_proxy=False):
Collaborator

I think we should leave this for a follow-up issue, but you can actually get artifact urls and the like from the Taskcluster client as well, using something like:

queue.buildUrl("getArtifact", task_id, path)

If we used this in get_artifact_url, get_index_url and get_task_url, I think we could get rid of the get_root_url method as well as the dependency on taskcluster_urls.

But as I mentioned, this is close to landing, so let's get this landed and maybe consider this change later.

return _handle_artifact(path, response)
queue = get_taskcluster_client("queue")
response = queue.getArtifact(task_id, path)
return response
Collaborator

This could cause some problems. At least if I'm reading this right, we used to load YAML artifacts and now we don't. I think the taskcluster client will handle JSON for us, but not YAML.

So maybe we can add the _handle_artifact logic back in? Sorry, I said to remove it but I was focused on the fact that it was unused and didn't notice the regression.

break
tasks = response.get("tasks", [])
for task in tasks:
    results.append(task)
Collaborator

This loop doesn't seem to do anything

if "taskId" in t:
    task_id = t.get("taskId")
    if task_id:
        task_ids.append(task_id)
Collaborator

There's some superfluous existence checks here. You could re-write this as:

for t in results:
    if task_id := t.get("taskId"):
        task_ids.append(task_id)

(look up the "walrus operator" if that confuses you)

But this can be further simplified using a list comprehension:

task_ids = [t["taskId"] for t in results if "taskId" in t]
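As a quick sanity check, both simplified forms agree on sample data (the results list here is made up):

```python
results = [{"taskId": "abc"}, {"foo": "bar"}, {"taskId": "def"}]

# Walrus-operator form: assign and test in a single expression.
task_ids = []
for t in results:
    if task_id := t.get("taskId"):
        task_ids.append(task_id)

# The list-comprehension form produces the same result here.
assert task_ids == [t["taskId"] for t in results if "taskId" in t]
```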

namespace = task.get("namespace")  # type: ignore
task_id = task.get("taskId")  # type: ignore
if namespace and task_id:
    task_ids[namespace] = task_id
Collaborator

This can be re-written using a dictionary comprehension:

task_ids = {
    t["namespace"]: t["taskId"]
    for t in tasks
    if "namespace" in t and "taskId" in t
}
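On illustrative data, the comprehension drops any entry missing either key:

```python
tasks = [
    {"namespace": "proj.a", "taskId": "t1"},
    {"namespace": "proj.b"},  # no taskId: skipped
    {"taskId": "t3"},         # no namespace: skipped
]

task_ids = {
    t["namespace"]: t["taskId"]
    for t in tasks
    if "namespace" in t and "taskId" in t
}
assert task_ids == {"proj.a": "t1"}
```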

else:
    options = taskcluster.optionsFromEnvironment()

return getattr(taskcluster, service.capitalize())(options)
Collaborator

I realized the use of capitalize here is going to break the purge_cache function (and also WorkerManager, but we don't use that here).

E.g:

>>> "purgeCache".capitalize()
'Purgecache'

I know you copied this code from something I wrote, so my bad. Maybe instead of using .capitalize() we can do something like service[0].upper() + service[1:]
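A small runnable sketch of the difference (the service_class_name helper name is made up for illustration):

```python
def service_class_name(service):
    # str.capitalize() lowercases everything after the first character,
    # which mangles camelCase service names like "purgeCache".
    # Upper-casing only the first character preserves the rest.
    return service[0].upper() + service[1:]


assert "purgeCache".capitalize() == "Purgecache"        # the bug
assert service_class_name("purgeCache") == "PurgeCache"  # the fix
assert service_class_name("queue") == "Queue"            # unchanged for simple names
```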

"""Requests a cache purge from the purge-caches service."""
if testing:
    logger.info(f"Would have purged {provisioner_id}/{worker_type}/{cache_name}.")
else:
    logger.info(f"Purging {provisioner_id}/{worker_type}/{cache_name}.")
    purge_cache_url = get_purge_cache_url(provisioner_id, worker_type, use_proxy)
    _do_request(purge_cache_url, json={"cacheName": cache_name})
    purge_cache_client = get_taskcluster_client("purgeCache")
Collaborator

(see my note about this above)

tasks = response.get("tasks", [])
for task in tasks:
    if "status" in task:  # type: ignore
        status = task["status"]  # type: ignore
Collaborator

Good opportunity to try out the walrus operator ;)
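That check collapses to one expression with the walrus operator, e.g. (sample data is illustrative):

```python
tasks = [{"status": "completed"}, {"taskId": "t2"}]

statuses = []
for task in tasks:
    # Assign and test in one step; entries without a "status" key are skipped.
    if (status := task.get("status")) is not None:
        statuses.append(status)
```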

continue

raise e
continue
Collaborator

Are you sure the taskcluster client would return None on a 404? I would have thought it would still raise an exception, though it would probably come from aiohttp rather than requests.

@@ -57,29 +58,31 @@ def params():
),
),
)
def test_index_search(caplog, responses, params, state, expires, expected, logs):
Collaborator

This is fine, but FYI there is an aioresponses mock library designed to be used with aiohttp that we could make use of in a similar way.

Successfully merging this pull request may close these issues.

Use the Taskcluster Python client rather than direct http requests