added a QUASR-downloader #425
base: master
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
- Coverage   92.01%   91.78%   -0.23%
==========================================
  Files          82       82
  Lines       16013    16043      +30
==========================================
- Hits        14734    14725       -9
- Misses       1279     1318      +39

Flags with carried forward coverage won't be shown.
View full report in Codecov by Sentry.
src/simsopt/configs/zoo.py
Outdated
        return curves, currents, ma

    elif return_style == 'json':
        return coils, ma, surfaces
Add an else case to raise an exception if return_style is something else.
There is now an exception raised higher up if return_style is something else.
src/simsopt/configs/zoo.py
Outdated
    return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
        like the other configurations in the zoo.
        If 'json' the function will return the full set of coils, magnetic axis and a number of surfaces
By "a number of surfaces", do you specifically mean a list of SurfaceXYZTensorFourier objects? If so it would be preferable to give this specific information.
This is clarified in the latest commit.
src/simsopt/configs/zoo.py
Outdated
    ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/
        Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655
    return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
The name "json" isn't an intuitive description of what the function returns. Maybe "surfaces"?
Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?
I don't think this PR is ready yet. We still need to rerun code coverage to make sure no lines are missing, and address @landreman's comments.
Hi,
I'm the author/maintainer of the QUASR navigator website. @andrewgiuliani asked me to take a look at this PR. Really glad for your interest and to see that the site is useful!
I've made a couple suggestions here. (I'm not a part of this project so I don't consider anything I say to be a blocker for this PR--just suggestions.)
Regarding the test, how often would you expect this to be run? While I don't anticipate a big issue, I would want to confirm with our admins before encouraging anyone to run lots of automated downloads from the site as part of a test suite; I would think that it would be better to do a unit test run on every commit & an integration test run on PR merge or whatever. (But I'm not sure whether that works with the existing test setup.)
In your PR, you asked whether there is a list of which integers correspond to a database entry, as you noted they are not contiguous. Unfortunately there isn't a fixed list of good IDs--it's subject to change with each release of new backing data. We could give you a subset, but I don't think a random draw from the complete set of devices is practical at this time. (For context, I was originally enforcing constraints on the valid ID sets in the website routing, but I realized it wouldn't be worthwhile given the likelihood that they would change.)
You also asked about querying against the database for a range of figures of merit. Unfortunately this is also not possible at present--the site actually works by downloading the full set of metadata for all devices, so there isn't a persistent database server to query against. In practice all filtering of the downloaded devices is done client-side, in the browser.
That said, while I appreciate the rigor of selecting random devices, I think using a fixed set of IDs to download should suffice here--we're hopefully not that adversarial an environment 😄 What I might consider would be testing several example IDs of different lengths & a range of device structures, to make sure the full gamut of known-good IDs can be handled by your code & the correct data is downloaded; but honestly even that seems like it might be overkill...
Again, thanks for your interest and please let me know if you'd like to discuss further!
src/simsopt/configs/zoo.py
Outdated
    parametrized in Boozer coordinates.
    """

    assert return_style in ['default', 'json']
Assertions can be disabled by caller/system config [e.g. if PYTHONOPTIMIZE is set], so it might be better not to rely on assert for a true runtime (not-debugging) check.
I didn't know this! Thanks for letting me know; we've changed the assert to an if-statement.
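For reference, a minimal sketch of that change, assuming the 'default'/'json' values from the docstring quoted above; the exact error message is illustrative:

```python
# Minimal sketch of replacing the assert with an explicit runtime check; the
# accepted values follow the docstring quoted above, and the message is illustrative.
if return_style not in ('default', 'json'):
    raise ValueError(f"Unknown return_style {return_style!r}; expected 'default' or 'json'")
```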
src/simsopt/configs/zoo.py
Outdated
    id_str = f"{ID:07d}"
    # string to 7 digits
    url = f'https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json'
This is fine, but note that we reserve the right to change the ID structure in new versions of the database, which would break this code. (Fair warning!)
I might be able to set up an endpoint on QUASR for converting an ID (arbitrary length) to the correct location, but I wouldn't have a chance to do that for at least a few weeks.
Noted, thanks!
src/simsopt/configs/zoo.py
Outdated
| print(f"Configuration with ID {ID:07} downloaded successfully") | ||
| surfaces, ma, coils = json.loads(r.content, cls=GSONDecoder) | ||
| else: | ||
| raise ValueError(f"Configuration ID {ID:07d} does not exist. Status code: {r.status_code}") |
This isn't strictly accurate--the download might've failed for some other reason.
Updated the string here, thanks!
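For illustration only (the exact wording adopted in the PR may differ), a message in the quoted else branch that does not assume the ID is the problem could read:

```python
# Illustrative replacement for the quoted else branch; ID and r are the variables
# from get_QUASR_data shown above.  Not necessarily the PR's final wording.
raise ValueError(
    f"Could not download configuration {ID:07d} (HTTP status {r.status_code}); "
    "the ID may not exist, or the request may have failed for another reason."
)
```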
tests/configs/test_zoo.py
Outdated
    curves, currents, ma = get_QUASR_data(952)
    coils, ma, surfaces = get_QUASR_data(952, return_style='json')
Might also be useful to have assertions verifying (from the returned data) that these are the same device record.
Done; we check that the first coil/surface of the downloaded configuration is as expected.
src/simsopt/configs/zoo.py
Outdated
    ID: the ID of the configuration to download. The database is navigatable at https://quasr.flatironinstitute.org/
        Alternatively, you can download the latest full set of devices from https://zenodo.org/doi/10.5281/zenodo.10050655
    return_style: 'default' or 'json'. If 'default', the function will return the curves, currents and magnetic axis
Agreed--it also looks like "default" is only returning one half-period? Am I reading that right?
Sorry to leave this hanging so long, but I think this would be a great addition. I want to use it myself, as I am looking at more QUASR devices with pyoculus, and it would be great to grab them directly from a script. @jsoules, has the content of the json changed with newer entries to the database? For example, https://quasr.flatironinstitute.org/model/1258083 only returns surfaces and coils. @andrewgiuliani, let's divide the work; it should be pretty minimal: hard-code a few numbers into the tests and adapt to the new format of QUASR (no magnetic axis anymore :'( ).
Hi, the data in the individual-record JSON is an extract of the relevant row from the whole-database JSON; this is what populates the "Device Metadata" block on the right-hand side in the web UI. The set of fields included in the JSON (i.e. the quantities of interest) has changed in different releases, but without knowing the time point you're comparing to, I'm hesitant to give a definitive answer. Any changes would, however, affect all devices, as the same format is used for the entire database.
When we wrote the scripts in June, the database would return a json file containing … This would allow a user to more easily swap out a QUASR configuration in an existing script. Is it also possible for the user to request the entire row from the database? These quantities could be useful in scripts that further deal with this data.
Hi Chris, you're right! I have those json files too; we will swap them out so that the axis is returned as well. As for the row of the data frame, that should be doable too. Let me discuss with Jeff.
Hi, I also think this is a great addition, and I would further suggest optionally caching the requests to avoid unnecessary server load. I'd suggest a minimally invasive implementation like #468. This is also related to @jsoules' comment in #425 (review), since the requests cache for the unit test could be stored as an artifact, avoiding any traffic generated by CI.
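A minimal sketch of the check-cache-then-download pattern being suggested, assuming the QUASR_cache/serial&lt;id&gt;.json path convention that appears later in this thread; the helper name is hypothetical and this is not the exact implementation of #468:

```python
# Sketch of a simple request cache; the cache path convention follows the
# QUASR_cache/serial<id>.json pattern shown later in this thread, and the helper
# name fetch_quasr_json is hypothetical.
from pathlib import Path

import requests


def fetch_quasr_json(ID: int, cache_dir: Path = Path("QUASR_cache")) -> bytes:
    id_str = f"{ID:07d}"
    cache_file = cache_dir / f"serial{id_str}.json"
    if cache_file.is_file():
        return cache_file.read_bytes()          # serve from cache, no server traffic
    url = f"https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json"
    r = requests.get(url)
    r.raise_for_status()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(r.content)           # populate the cache for next time
    return r.content
```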
@smiet, is this PR still in progress or should we close it?
@andrew and I were planning to have a look-see this week and wrap it up. We're currently a bit stuck on how to mock a request to a server in the testing... We're also looking into caching previous results to minimize the clobbering of the database. Let's keep this PR open and add the code from #468 if we so choose (it creates a folder upon import, which I would rather avoid; we will discuss the solution).
Let me know if you need a hand with that--it can be a bit idiosyncratic until you get used to it (mocking a context manager, in particular, can be fussy). |
@mishapadidar @smiet @jsoules I've mocked the requests.get call so that this is a true unit test; please give it a look.
I have not written an integration test -- I don't know if we have the infrastructure set up to run a subset of tests only at merging. Can we do this, @mbkumar?
We can definitely do this. It's a bunch of if conditions in the Action file.
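Regarding the mocked requests.get call mentioned above, a minimal sketch of the pattern; the patched module path, the import of get_QUASR_data, and the test-file location are assumptions based on other snippets in this thread:

```python
# Sketch of mocking requests.get with a canned response so the test never hits
# the QUASR server.  The patched path ("simsopt.configs.zoo.requests.get"), the
# import of get_QUASR_data, and the test-file location are assumptions based on
# other snippets in this thread, not the final test code.
from unittest.mock import MagicMock, patch

from simsopt.configs.zoo import get_QUASR_data

with patch("simsopt.configs.zoo.requests.get") as mock_get:
    mock_response = MagicMock()
    mock_response.status_code = 200
    with open("tests/test_files/serial0000952.json", "rb") as f:
        mock_response.content = f.read()
    mock_get.return_value = mock_response

    # Inside the context, get_QUASR_data receives the canned response instead
    # of performing a real download (return values depend on return_style).
    result = get_QUASR_data(952)
```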
At @andrewgiuliani's request, I've taken a look at this.
In general I think it's in really good shape; I caught a couple minor things, and made a few broader-scope suggestions that are not germane to this pull request.
The one thing that I think could be improved before merging is how this test will interact with the caching system in the implementation in zoo.py (explained in my in-line comments below).
I think you were also wondering about the integration test. I'll leave it to the local experts to determine how to actually flag a test to be run rarely (& what conditions should trigger running it), but apart from that it would just be as simple as running lines 34-40 of test_QUASR_downloader without mocking request, so that you actually hit the QUASR Navigator web server.
src/simsopt/configs/zoo.py
Outdated
    else:
        raise ValueError  # should not be reached as we check before download to avoid clobbering the database.
Personally, I'd probably leave this in even if it's supposed to be unreachable. Better to have a noisy error than just silently not do anything.
    This unit test checks that the get_QUASR_data functionality works as expected.
    We download the device with ID=0000952 is downloaded correctly. We also check that
    exceptions are raised if an ID is requested, but the associated device
    does not exist, or if the improper return style is passed.
Personally I tend to write separate tests for these different conditions, since the failures then give you a more specific idea of what's wrong, but of course you should follow whatever the project's standard practice is.
Oh also there's a typo in "We download the device with ID=.. is downloaded correctly."
        mock_response = MagicMock()
        mock_response.status_code = 200

        with open(THIS_DIR / '../test_files/serial0000952.json', "rb") as f:
Given that you have this string (& the numbers '952' in it) reused a couple times, it might be a good idea to do something like:
serial_number = 952
test_json_filename = f"serial{serial_number:07}.json"
path_to_test_data_file = THIS_DIR / f"../test_files/{test_json_filename}"
with open(path_to_test_data_file, "rb") as f:
    ...
curves, currents = get_QUASR_data(serial_number, return_style='simsopt-style')
...to make it easier to switch which test file you're using & where it's located. Not a big deal but it ensures consistency across the different usage points.
    if return_style == 'simsopt-style':
        nfp = surfaces[0].nfp
Stray thought--this is code to convert the downloaded JSON to internal object types, right? Is this repeated in several places in your project? Just feels like the kind of thing that might be, in which case might consider making it a separate function. (I don't want to expand the scope of this PR, just wanted to flag it as a thing to think about.)
        surfaces, coils = get_QUASR_data(952, return_style='quasr-style')
        assert isinstance(coils[0], Coil)
        assert isinstance(surfaces[0], SurfaceXYZTensorFourier)
        np.testing.assert_allclose(surfaces[0].x, true_surfaces[0].x)
        np.testing.assert_allclose(coils[0].x, true_coils[0].x)
Something to think about is the way that running the test might impact the environment where the test is being run. In general, it's not good if the test makes changes that are visible outside the test, because that could impact subsequent tests. I mention it because in this case I think the test will interact with the caching code--the get_QUASR_data function is expected to create a cache file the first time it runs for a particular ID, and then read from the cache file on subsequent runs for the same ID.
For this test, that means the very first time you run it you'll be creating the cache file, but on subsequent runs (and every time, for the code I've highlighted here) you'll be reading from the cache file.
- I would recommend checking for, and cleaning up, the cache file at the end of the test.
- If you care about the caching behavior, you might want to check that it's happening between these two calls.
I think you could handle deleting the cache file by using the FILE_PATH = THIS_DIR / f'QUASR_cache/serial{id_str}.json' logic you have in the actual implementing function (i.e. get_QUASR_data()) and then doing FILE_PATH.is_file() to confirm it exists and doing FILE_PATH.unlink() if it does (you'd still need to clear the QUASR_cache directory as well, I think there's a rmdir() method in pathlib).
Asserting not FILE_PATH.is_file() before the first call to get_QUASR_data(), and FILE_PATH.is_file() after it, would check that the cache file is getting created. If you'd like to check that the cache is actually getting used, you could also do mock_get.assert_not_called() before the first get_QUASR_data(), then mock_get.assert_called_once() after the SECOND get_QUASR_data() (to make sure it actually used the cache). Equivalently, you can also assert against the mock_get.call_count variable.
I could also see mocking builtins.print and inspecting the values it was called with, if you want to check anything about the function's reporting, but I'm not sure whether I'd bother realistically since "does the thing print this" is unlikely to be mission-critical functionality 😄 The only reason I'd do it is if you wanted to inspect what it was printing to confirm that it says it's using the cache.
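A compact sketch of the cache checks and cleanup described above; the cache location and the use of the patched mock_get from the surrounding test are assumptions based on this thread:

```python
# Sketch of the cache assertions and cleanup suggested above.  cache_file must
# point at the same location get_QUASR_data() writes to (assumed here, per this
# thread, to be QUASR_cache/serial<id>.json next to zoo.py); mock_get is the
# patched requests.get from the surrounding test.
from pathlib import Path

import simsopt.configs.zoo as zoo

cache_file = Path(zoo.__file__).parent / "QUASR_cache" / f"serial{952:07d}.json"

mock_get.assert_not_called()
assert not cache_file.is_file()                      # no cache before the first call

zoo.get_QUASR_data(952, return_style='quasr-style')  # first call: mocked download, writes the cache
assert cache_file.is_file()

zoo.get_QUASR_data(952, return_style='quasr-style')  # second call: should be served from the cache
assert mock_get.call_count == 1

cache_file.unlink()                                  # leave no trace for subsequent tests
cache_file.parent.rmdir()
```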
You might be able to avoid having to do the cleanup by using a TemporaryDirectory context in this test; temporary directories are deleted when you exit the context. See https://adamj.eu/tech/2024/12/30/python-temporary-files-directories-unittest/ for an example (except you'd want to use https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory instead of NamedTemporaryFile).
However, using the temporary directory would require changing the definition of THIS_DIR in configs/zoo.py (which defines get_QUASR_data()), which might be complicated, so you're probably best off just deleting the cache file that you expect to have created in the test.
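If one did want to go the temporary-directory route, here is a hedged sketch that patches THIS_DIR rather than changing its definition; this assumes THIS_DIR is a plain module-level Path in simsopt.configs.zoo, which may not hold exactly:

```python
# Hedged sketch: redirect the cache into a throwaway directory by patching the
# module-level THIS_DIR used by get_QUASR_data().  Assumes THIS_DIR is a plain
# Path attribute of simsopt.configs.zoo; combine with the requests.get mock so
# no real download happens.
import tempfile
from pathlib import Path
from unittest.mock import patch

from simsopt.configs.zoo import get_QUASR_data

with tempfile.TemporaryDirectory() as tmpdir:
    with patch("simsopt.configs.zoo.THIS_DIR", Path(tmpdir)):
        get_QUASR_data(952, return_style='quasr-style')
# tmpdir, and any cache written inside it, is deleted on exiting the context.
```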
It might be convenient to let the user configure where the parent directory for the downloads should be located (for the functions in zoo.py)--so that you can use that instead of the THIS_DIR which I think (?) is relative to where the zoo.py file is located. But that's obviously well outside the scope of this PR.
        # mock os.makedir only here so it throws an error
        with patch("simsopt.configs.zoo.os.makedirs") as mock_makedirs:
            mock_makedirs.side_effect = Exception("Failed to create directory")
            _, _ = get_QUASR_data(953, return_style='quasr-style')
And here of course you've hard-coded 953, because if you used 952 again you'd hit the cache and makedir wouldn't be called. Subtle! 😄
If you go with the defined serial_number variable I suggested earlier, you could use serial_number + 1 or something.
Alternatively, if this were a separate test, you would theoretically have deleted the cached file at the end of whatever test created it.
src/simsopt/configs/zoo.py
Outdated
    id_str = f"{ID:07d}"  # string to 7 digits
    url = f'https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json'

    FILE_PATH = THIS_DIR / f'QUASR_cache/serial{id_str}.json'
I'm unsure if this will still do the right thing on environments where / is not the path separator (e.g. windows). If you care, could change to FILE_PATH = THIS_DIR / 'QUASR_cache' / f"serial{id_str}.json" (and push it onto pathlib).
`get_quasr_data` always attempted to cache in the install directory of simsopt. Now it only does that if the directory is writable; otherwise it caches in the pwd. There is also an option to skip the caching altogether.
It is getting there! I changed the caching to test whether the directory (relative to zoo.py, in the install directory) is writable; if not, the cache is made in the cwd. (In some types of installation this might be the case, e.g. simsopt is installed in the base system Python and a user imports it.) @andrewgiuliani, can you address the last few issues raised by @jsoules? We should not let perfect be the enemy of good enough, and this is really nice functionality to have, so let's merge it soon!
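A minimal sketch of that fallback logic, assuming os.access is used for the writability test (the actual code in the PR may differ):

```python
# Minimal sketch of the cache-directory fallback described above; the os.access
# writability test is an assumption, not necessarily the PR's exact code.
import os
from pathlib import Path

THIS_DIR = Path(__file__).parent               # directory containing zoo.py
if os.access(THIS_DIR, os.W_OK):
    cache_dir = THIS_DIR / "QUASR_cache"       # install directory is writable
else:
    cache_dir = Path.cwd() / "QUASR_cache"     # fall back to the current working directory
```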
            success = True

    if use_cache:
        with open(FILE_PATH, 'wb') as f:
This will fail if the directory doesn't exist yet.
Suggested change:
-        with open(FILE_PATH, 'wb') as f:
+        os.makedirs(FILE_PATH.parent, exist_ok=True)
+        with open(FILE_PATH, 'wb') as f:
It would be amazing to have all of the QUASR configurations accessible with just one command.
I implemented a simple interface using the 'requests' library that downloads any QUASR field.
I tried to make it so that it picks a random configuration, but apparently not all integers between 0 and 300,000 correspond to an entry in the database.
@andrewgiuliani, is there a list of which entries exist, so that a robust random picker can be implemented?
Or even better, the user could directly query the database for a list of constraints (NFP, A, iota, coil length, etc.) and get a random config that fits their needs.
Something like this must already be implemented on the website; one would only have to adapt it to Python. I am happy to do this but would need more info on the database itself. Would this be of interest, @andrewgiuliani?
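For context, a minimal sketch of the requests-based download described here, using the URL pattern quoted earlier in the thread; the function name is hypothetical, and the PR's actual entry point is get_QUASR_data in src/simsopt/configs/zoo.py, which decodes the payload with simsopt's GSONDecoder:

```python
# Illustrative sketch only: fetch a QUASR configuration JSON by ID, using the
# URL pattern quoted earlier in this thread.  The function name is hypothetical;
# the PR's real entry point is get_QUASR_data in src/simsopt/configs/zoo.py.
import requests


def download_quasr_config(ID: int) -> bytes:
    id_str = f"{ID:07d}"
    url = f"https://quasr.flatironinstitute.org/simsopt_serials/{id_str[0:4]}/serial{id_str}.json"
    r = requests.get(url)
    if r.status_code != 200:
        raise ValueError(f"Could not download configuration {id_str} (status code {r.status_code})")
    return r.content
```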