Skip to content

Conversation

@smiet
Copy link
Contributor

@smiet smiet commented Jun 12, 2024

It would be amazing to have all of the QUASR configurations accessible with just one command.

I implemented a simple interface using the 'requests' library, that downloads any QUASR field.

I tried to make it so that it picks a random configuration, but apparently not all integers between 0 and 300,000 correspond to an entry in the database.
@andrewgiuliani is there a list of which entries exist, so that a robust random picker can be returned?
Or even better, the user could directly query the database for a list of constraints (NFP, A, iota, coil length, etc) and get a random config that fits their needs.

Something like this must be already implemented on the website, one would only have to adapt this to python. I am happy to do this but would need more info on the database itself. Would this be of interest @andrewgiuliani?

@smiet smiet self-assigned this Jun 12, 2024
@codecov
Copy link

codecov bot commented Jun 12, 2024

Codecov Report

Attention: Patch coverage is 19.60784% with 41 lines in your changes missing coverage. Please review.

Project coverage is 91.78%. Comparing base (73104bf) to head (751bd9c).
Report is 92 commits behind head on master.

Files with missing lines Patch % Lines
src/simsopt/configs/zoo.py 19.60% 41 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
- Coverage   92.01%   91.78%   -0.23%     
==========================================
  Files          82       82              
  Lines       16013    16043      +30     
==========================================
- Hits        14734    14725       -9     
- Misses       1279     1318      +39     
Flag Coverage Δ
unittests 91.78% <19.60%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

landreman
landreman previously approved these changes Jun 19, 2024
return curves, currents, ma

elif return_style == 'json':
return coils, ma, surfaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add else: case to raise an exception if return_style is something else

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is now an exception raised higher up if return_style is something else

return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
like the other configurations in the zoo.
If 'json' the function will return the full set of coils, magnetic axis and a number of surfaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "a number of surfaces", do you specifically mean a list of SurfaceXYZTensorFourier objects? If so it would be preferable to give this specific information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is clarified in the latest commit

ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/
Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655
return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "json" isn't an intuitive description of what the function returns. Maybe "surfaces"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?

Copy link
Contributor

@andrewgiuliani andrewgiuliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR is ready yet. We still need to rerun code coverage to make sure no lines are missing, and address @landreman's comments.

Copy link

@jsoules jsoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
I'm the author/maintainer of the QUASR navigator website. @andrewgiuliani asked me to take a look at this PR. Really glad for your interest and to see that the site is useful!

I've made a couple suggestions here. (I'm not a part of this project so I don't consider anything I say to be a blocker for this PR--just suggestions.)

Regarding the test, how often would you expect this to be run? While I don't anticipate a big issue, I would want to confirm with our admins before encouraging anyone to run lots of automated downloads from the site as part of a test suite; I would think that it would be better to do a unit test run on every commit & an integration test run on PR merge or whatever. (But I'm not sure whether that works with the existing test setup.)

In your PR, you asked whether there is a list of which integers correspond to a database entry, as you noted they are not contiguous. Unfortunately there isn't a fixed list of good IDs--it's subject to change with each release of new backing data. We could give you a subset, but I don't think a random draw from the complete set of devices is practical at this time. (For context, I was originally enforcing constraints on the valid ID sets in the website routing, but I realized it wouldn't be worthwhile given the likelihood that they would change.)

You also asked about querying against the database for a range of figures of merit. Unfortunately this is also not possible at present--the site actually works by downloading the full set of metadata for all devices, so there isn't a persistent database server to query against. In practice all filtering of the downloaded devices is done client-side, in the browser.

That said, while I appreciate the rigor of selecting random devices, I think using a fixed set of IDs to download should suffice here--we're hopefully not that adversarial an environment 😄 What I might consider would be testing several example IDs of different lengths & a range of device structures, to make sure the full gamut of known-good IDs can be handled by your code & the correct data is downloaded; but honestly even that seems like it might be overkill...

Again, thanks for your interest and please let me know if you'd like to discuss further!

parametrized in Boozer coordinates.
"""

assert return_style in ['default', 'json']
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions can be disabled by caller/system config [e.g. if PYTHONOPTIMIZE is set], so it might be better not to rely on assert for a true runtime (not-debugging) check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this! thanks for letting me know, we've changed the assert to an if-statement

Comment on lines 217 to 219
id_str = f"{ID:07d}"
# string to 7 digits
url = f'https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but note that we reserve the right to change the ID structure in new versions of the database, which would break this code. (Fair warning!)

I might be able to set up an endpoint on QUASR for converting an ID (arbitrary length) to the correct location, but I wouldn't have a chance to do that for at least a few weeks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted, thanks!

print(f"Configuration with ID {ID:07} downloaded successfully")
surfaces, ma, coils = json.loads(r.content, cls=GSONDecoder)
else:
raise ValueError(f"Configuration ID {ID:07d} does not exist. Status code: {r.status_code}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly accurate--the download might've failed for some other reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated string here, thanks!

Comment on lines 6 to 7
curves, currents, ma = get_QUASR_data(952)
coils, ma, surfaces = get_QUASR_data(952, return_style='json')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also be useful to have assertions verifying (from the returned data) that these are the same device record.

Copy link
Contributor

@andrewgiuliani andrewgiuliani May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, checked that the first coil/surface of the downloaded one is as expected

ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/
Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655
return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?

@smiet smiet marked this pull request as draft November 12, 2024 09:16
@smiet
Copy link
Contributor Author

smiet commented Nov 12, 2024

Sorry to leave this hanging so long, but I think this would be a great addition, and am wanting to use this as I am looking at more QUASRs with pyoculus, and it would be great to directly grab them from a script.

@jsoules, has the content of the json changed with newer entries to the database? For example https://quasr.flatironinstitute.org/model/1258083 only returns surfaces and coils

@andrewgiuliani, lets divide the work, it should be pretty minimal, hard-code a few numbers into the tests and adapt to the new format of QUASR (no magnetic axis anymore :'( )

@jsoules
Copy link

jsoules commented Nov 12, 2024

@jsoules, has the content of the json changed with newer entries to the database? For example https://quasr.flatironinstitute.org/model/1258083 only returns surfaces and coils

Hi,
Not sure what you mean--I'm looking at that link and I see it populating with surfaces, coils, modB, and currents, as well as populating the Poincare plots. (All those assets are present on the site back-end.)

The data in the individual-record JSON is an extract of the relevant row for the whole-database JSON; this is what populates the "Device Metadata" block on the right-hand side in the web UI.

The set of fields included in the JSON (i.e. the quantities of interest) have changed in different releases, but without knowing the time point you're comparing to, I'm hesitant to give a definitive answer. But any changes would affect all devices, as the same format is used for the entire database.

@smiet
Copy link
Contributor Author

smiet commented Nov 13, 2024

When we wrote the scripts in June, the database would return a json file containing surfaces, magnetic_axis, coils, which allowed us to mirror the return style of other configurations in the simsopt zoo such as simsopt.configs.get_w7x_data().

This would allow a user to more easily swap out a quasr configuration in an existing script.

Is it also possible for the user to request the entire row from the database? These quantities could be useful in scripts that further deal with this data.

@andrewgiuliani
Copy link
Contributor

Hi Chris, you're right! I have those json files too, we will swap them out so the axis is returned as well. As for the row of the data frame, that should be doable as well. Let me discuss with Jeff

@missing-user
Copy link
Contributor

missing-user commented Jan 8, 2025

Hi, I also think this is a great addition and would further suggest to optionally cache the requests to avoid unnecessary server load. I'd suggest such a minimally invasive implementation: #468

Also related to @jsoules comment #425 (review) since the requests cache for the unit test could be stored as an artifact, avoiding any traffic generated by CI.

@mishapadidar
Copy link
Contributor

@smiet is this PR still in progress or should we close?

@smiet
Copy link
Contributor Author

smiet commented May 14, 2025

@andrew and I were planning to have a look-see this week and wrap it up. Currently stuck a bit with how to 'Mock' a request to a server in the testing...

Also looking into the caching of previous results to minimize the clobbering of the database. Let us keep this PR open and add the code from #468 if we choose so (it creates a folder upon import, which I would rather not, we will discuss the solution).

@jsoules
Copy link

jsoules commented May 14, 2025

@andrew and I were planning to have a look-see this week and wrap it up. Currently stuck a bit with how to 'Mock' a request to a server in the testing...

Let me know if you need a hand with that--it can be a bit idiosyncratic until you get used to it (mocking a context manager, in particular, can be fussy).

@andrewgiuliani andrewgiuliani requested review from jsoules and mishapadidar and removed request for jsoules May 16, 2025 20:24
Copy link
Contributor

@andrewgiuliani andrewgiuliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mishapadidar @smiet @jsoules I've mocked the requests.get call so that this is a true unit test, please give this a look.

I have not written an integration test - I don't know if we have the infrastructure set up to run a subset of tests only at merging. Can we do this @mbkumar ?

@mbkumar
Copy link
Collaborator

mbkumar commented May 16, 2025

I don't know if we have the infrastructure set up to run a subset of tests only at merging. Can we do this @mbkumar ?

We can definitely do this. It's a bunch of if conditions in the Action file.

@andrewgiuliani andrewgiuliani self-requested a review May 16, 2025 20:40
@andrewgiuliani andrewgiuliani marked this pull request as ready for review May 16, 2025 20:40
Copy link

@jsoules jsoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At @andrewgiuliani's request, I've taken a look at this.

In general I think it's in really good shape; I caught a couple minor things, and made a few broader-scope suggestions that are not germane to this pull request.

The one thing that I think could be improved before merging is how this test will interact with the caching system in the implementation in zoo.py (explained in my in-line comments below).

I think you were also wondering about the integration test. I'll leave it to the local experts to determine how to actually flag a test to be run rarely (& what conditions should trigger running it), but apart from that it would just be as simple as running lines 34-40 of test_QUASR_downloader without mocking request, so that you actually hit the QUASR Navigator web server.

Comment on lines 244 to 245
else:
raise ValueError #should not be reached as we check before download to avoid clobbering the database.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd probably leave this in even if it's supposed to be unreachable. Better to have a noisy error than just silently not do anything.

Comment on lines +16 to +19
This unit test checks that the get_QUASR_data functionality works as expected.
We download the device with ID=0000952 is downloaded correctly. We also check that
exceptions are raised if an ID is requested, but the associated device
does not exist, or if the improper return style is passed.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I tend to write separate tests for these different conditions, since the failures then give you a more specific idea of what's wrong, but of course you should follow whatever the project's standard practice is.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh also there's a typo in "We download the device with ID=.. is downloaded correctly."

mock_response = MagicMock()
mock_response.status_code = 200

with open(THIS_DIR / '../test_files/serial0000952.json', "rb") as f:
Copy link

@jsoules jsoules May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you have this string (& the numbers '952' in it) reused a couple times, it might be a good idea to do something like:

serial_number = 952
test_json_filename = f"serial{serial_number:07}.json"
path_to_test_data_file = THIS_DIR / f"../test_files/{test_json_filename}"
with open(path_to_test_data_file, "rb") as f:
...
curves, currents = get_QUASR_data(serial_number, return_style='simsopt-style')
...

to make it easier to switch which test file you're using & where it's located. Not a big deal but it ensures consistency across the different usage points.

Comment on lines 262 to 263
if return_style == 'simsopt-style':
nfp = surfaces[0].nfp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray thought--this is code to convert the downloaded JSON to internal object types, right? Is this repeated in several places in your project? Just feels like the kind of thing that might be, in which case might consider making it a separate function. (I don't want to expand the scope of this PR, just wanted to flag it as a thing to think about.)

Comment on lines +42 to +46
surfaces, coils = get_QUASR_data(952, return_style='quasr-style')
assert isinstance(coils[0], Coil)
assert isinstance(surfaces[0], SurfaceXYZTensorFourier)
np.testing.assert_allclose(surfaces[0].x, true_surfaces[0].x)
np.testing.assert_allclose(coils[0].x, true_coils[0].x)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about is the way that running the test might impact the environment where the test is being run. In general, it's not good if the test makes changes that are visible outside the test, because that could impact subsequent tests. I mention it because in this case I think the test will interact with the caching code--the get_QUASR_data function is expected to create a cache file the first time it runs for a particular ID, and then read from the cache file on subsequent runs for the same ID.

For this test, that means the very first time you run it you'll be creating the cache file, but on subsequent runs (and every time, for the code I've highlighted here) you'll be reading from the cache file.

  1. I would recommend checking for, and cleaning up, the cache file at the end of the test.
  2. If you care about the caching behavior, you might want to check that it's happening between these two calls.

I think you could handle deleting the cache file by using the FILE_PATH = THIS_DIR / f'QUASR_cache/serial{id_str}.json' logic you have in the actual implementing function (i.e. get_QUASR_data()) and then doing FILE_PATH.is_file() to confirm it exists and doing FILE_PATH.unlink() if it does (you'd still need to clear the QUASR_cache directory as well, I think there's a rmdir() method in pathlib).

Asserting not FILE_PATH.is_file() before the first call to get_QUASR_data(), and FILE_PATH.is_file() after it, would check that the cache file is getting created. If you'd like to check that the cache is actually getting used, you could also do mock_get.assert_not_called() before the first get_QUASR_data(), then mock_get.assert_called_once() after the SECOND get_QUASR_data() (to make sure it actually used the cache). Equivalently, you can also assert against the mock_get.call_count variable.

I could also see mocking builtins.print and inspecting the values it was called with, if you want to check anything about the function's reporting, but I'm not sure whether I'd bother realistically since "does the thing print this" is unlikely to be mission-critical functionality 😄 The only reason I'd do it is if you wanted to inspect what it was printing to confirm that it says it's using the cache.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to avoid having to do the cleanup by using a TemporaryDirectory context in this test; temporary directories are deleted when you exit the context. See https://adamj.eu/tech/2024/12/30/python-temporary-files-directories-unittest/ for an example (except you'd want to use https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory instead of NamedTemporaryFile).

However, using the temporary directory would require changing the definition of THIS_DIR in configs/zoo.py (which defines get_QUASR_data()), which might be complicated, so you're probably best off just deleting the cache file that you expect to have created in the test.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be convenient to let the user configure where the parent directory for the downloads should be located (for the functions in zoo.py)--so that you can use that instead of the THIS_DIR which I think (?) is relative to where the zoo.py file is located. But that's obviously well outside the scope of this PR.

# mock os.makedir only here so it throws an error
with patch("simsopt.configs.zoo.os.makedirs") as mock_makedirs:
mock_makedirs.side_effect = Exception("Failed to create directory")
_, _ = get_QUASR_data(953, return_style='quasr-style')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here of course you've hard-coded 953, because if you used 952 again you'd hit the cache and makedir wouldn't be called. Subtle! 😄

If you go with the defined serial_number variable I suggested earlier, you could use serial_number + 1 or something.

Alternatively, if this were a separate test, you would theoretically have deleted the cached file at the end of whatever test created it.

id_str = f"{ID:07d}" # string to 7 digits
url = f'https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json'

FILE_PATH = THIS_DIR / f'QUASR_cache/serial{id_str}.json'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if this will still do the right thing on environments where / is not the path separator (e.g. windows). If you care, could change to FILE_PATH = THIS_DIR / 'QUASR_cache' / f"serial{id_str}.json" (and push it onto pathlib).

smiet added 2 commits June 3, 2025 18:41
`get_quasr_data` always attempted to cache in the install directory of simsopt. Now it only does that if the directory is writeable, otherwise in the pwd. And an option to skip the caching alltogether.
@smiet
Copy link
Contributor Author

smiet commented Jun 3, 2025

It is getting there! I changed the caching to test if the directory (relative to zoo.py, in the install directory) is writeable, if not make a cache in the cwd. (in some types of installation this might be the case, i.e. it is installed in base system python and a user imports it).

@andrewgiuliani can you address the last few issues by @jsoules? We should not let perfect be the enemy of good enough, and this is a really nice functionality to have, so let's merge it soon!

success = True

if use_cache:
with open(FILE_PATH, 'wb') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if the directory doesn't exist yet.

Suggested change
with open(FILE_PATH, 'wb') as f:
os.makedirs(FILE_PATH.parent, exist_ok=True)
with open(FILE_PATH, 'wb') as f:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants