-
Notifications
You must be signed in to change notification settings - Fork 77
Re-allow restart in driven solver by parsing csv table from file #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b20b718
to
0c9f142
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One big bug right now, which is that if you run a non-restarting config file and there was already a postpro folder in the same location, which previously would have overwritten, it will now error saying the data to restart from is inconsistent.
There's also some minor things with the printing to terminal of the total iteration. I tested restarting the uniform cpw case a few times on either excitation and it seems to work on that.
Aside from that looks ok. I have some simplifications on hughcars/fix_restart_printing
and have extended the julia testing to allow looking at an already populated postpro folder which makes this simpler. Once this is in we should seriously look at refactoring the csv stuff away, because at this point we're reading and writing so could drastically simplify things by using an existing library. That's beyond the scope of this patch
pr however.
When ready, don't forget to add an entry in the CHANGELOG mentioning the bug fix.
palace/drivers/drivensolver.cpp
Outdated
std::size_t total_step_i = indexing.restart - 1; | ||
for (const auto &[exc_step_i, exc_kv] : | ||
EnumerateView(port_excitations.excitations, indexing.excitation_n0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for this abstraction, nor the indexing helper, this is pretty straightforward integer arithmetic. Can then delete the EnumerateView
and its file.
const int restart_excitation_idx = (iodata.solver.driven.restart - 1) / omega_sample.Size() + 1;
const int freq_restart_idx = (iodata.solver.driven.restart - 1) % int(omega_sample.size());
for (const auto &[excitation_idx, excitation_spec] : port_excitations)
{
if (++excitation_counter < restart_excitation_idx)
{
continue;
}
....
// Frequency loop.
for (std::size_t omega_i = ((excitation_counter == restart_excitation_idx) ? freq_restart_idx : 0); omega_i < omega_sample.size(); omega_i++)
{
auto omega = omega_sample[omega_i];
Given restart is only applied to the online loop of the adaptive solve too, the same integer loops can be used there. See hughcars/fix_restarting_printing
for details and also some changes to the julia testing calls to allow checking it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EnumerateView
and DrivenSolverIndexing
came about is because I had bugs with indexing while coding these loops and so decided I need some structure that help and are easy to test.
Originally, DrivenSolverIndexing
was a more complicated structure that generated nested excitation/frequency iterators — until I thought that this was too specific for the driver solver. Then I coded up EnumerateView
because we can use it elsewhere in the code too. Also it is just a a feature of the ranges
library that we will get in future C++ standards, so this is just bridge to then. Importantly, this class is easily testable in isolation. DrivenSolverIndexing
is now just a collection of POD that I am happy to remove.
There's no need for this abstraction, nor the indexing helper, this is pretty straightforward integer arithmetic. Can then delete the
EnumerateView
and its file.
Sure it is simple integer arithmetic. And I am not particularly attached to EnumerateView
. But there is the unfortunate fact that on your hughcars/fix_restarting_printing
branch you introduced two separate indexing errors:
- In the uniform sweep look (line 125) you wrote:
const int restart_excitation_idx = (iodata.solver.driven.restart - 1) / (omega_sample.size() + 1) + 1;
I - In the adaptive sweep loop (line 373) you wrote
const int restart_excitation_idx =(iodata.solver.driven.restart - 1) / port_excitations.Size() + 1
;
These should be the same and equal what you wrote in your github comment: const int restart_excitation_idx = (iodata.solver.driven.restart - 1) /omega_sample.Size() + 1;
(except Size()
should be size()
).
The rest looks correct (I think?), but I still had to draw a little 2d table on a pieced of paper and check various combinations. Specifically:
- The above assumes that
restart_excitation_idx
is 1-index, which is different fromfreq_restart_idx
which is 0-index. Which is super-confusing. Double so since you initializeint excitation_counter = 0;
and immediately increment. restart_excitation_idx
andfreq_restart_idx
should probably be called_counter
not_idx
sinceexcitation_idx
refers to something else (the value of excitation indexing being simulated).++excitation_counter < restart_excitation_idx
is correct but I would argue hard to read.
Not sure how to test this in a quick unit test, without running a bunch of small driven solves.
Like with our discussion on strong-typed indices, I would continue to suggest clean, readable and testable code, even when that requires a few 100s lines of tested boilerplate helpers. But I understand that what counts as "small helpers" vs "unnecessary overhead" is subjective.
Edit: Please make a decision exactly what solution you want here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: For port_excitations
we use std::map
. We actually use std::map
in all sorts of places for the map API but these things are typically tiny amounts of data, so std::map
is pretty inefficient. Also we often want both a "counter index" & "semantic index" (excitation, port, etc). So we could do this with map + enumerate. But we could also consider making a class with a vector backed and the two different indices like boost vector property map does. That would also give us random-access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 7 frequency samples, and four excitations, testing restarting in each of the excitation sweeps for cpw, then
const int excitation_restart_counter =
((iodata.solver.driven.restart - 1) / omega_sample.size()) + 1;
const int freq_restart_idx = (iodata.solver.driven.restart - 1) % omega_sample.size();
works. excitation_restart_counter
is a better name as it's supposed to be reflecting the excitation_counter
anyway. It can be changed to 0-based instead, but that means needing if (++excitation_counter < excitation_restart_counter)
to become if (excitation_counter++ < excitation_restart_counter)
which requires a "mid - left - right" parse, rather than left to right parse.
EnumerateView requires opening another file and then internalizing that, and maintaining that file and testing for it, for one call site, It's not worth it. I don't want to be introducing extra things for developers to have to learn only to loop over a 2 dimensional array.
palace/drivers/drivensolver.hpp
Outdated
// Mini helper class that stored indexing information for printing and restart. | ||
struct DrivenSolverIndexing | ||
{ | ||
std::size_t nr_total_samples; | ||
|
||
// Restart (1-Based indexing from config file) | ||
std::size_t restart = 1; | ||
|
||
// Offsets from restart (0-Based) | ||
std::size_t excitation_n0 = 0; | ||
std::size_t omega_n0 = 0; | ||
|
||
DrivenSolverIndexing(std::size_t nr_port_excitations, std::size_t nr_freq_samples, | ||
std::size_t restart_) | ||
: nr_total_samples(nr_port_excitations * nr_freq_samples), restart(restart_), | ||
excitation_n0(std::size_t(restart - 1) / nr_freq_samples), | ||
omega_n0(std::size_t(restart - 1) % nr_freq_samples) | ||
{ | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed, see the loops doing away with EnumerateView
.
palace/drivers/drivensolver.cpp
Outdated
|
||
total_step_i++; // Increment combined counter. | ||
} | ||
indexing.omega_n0 = 0; // No offset in omega from "Restart" in later excitations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, can compute this on the fly in the print message from the excitation and omega indices when using integer loops.
test/unit/test-view_helper.cpp
Outdated
@@ -0,0 +1,214 @@ | |||
#include <iterator> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete with EnumerateView
.
|
||
template <ProblemType solver_t> | ||
void PostOperatorCSV<solver_t>::MoveTableValidateReload(TableWithCSVFile &t_csv_base, | ||
Table &&t_ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears pretty complicated for loading a csv, then checking the number of rows and columns match the existing dimensions. Additionally it suggests you couldn't restart a uniform run with existing data, by adding an extra excitation index after the fact, as the two sizes wouldn't match exactly, but the loaded be a subset of the to be filled size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me respond to this in reverse:
it suggests you couldn't restart a uniform run with existing data, by adding an extra excitation index after the fact
This is correct — one absolutely cannot do that. Changing (including appending) which frequencies or excitations are being run was not a feature we envisaged and would require bigger changes.
Why? Fundamentally, because in a restart we don't parse the "old-config.json" file. We would need this in order to fully understand what the old data corresponds to and figure out what data is missing that the user asked for the "new-config.json". And even if we had the old-config.json a single restart
integer has issues if the shape of frequency / excitations changes (see below).
Importantly, we cannot currently reconstruct everything we need of "old-config.json" from parsing in the "data.csv" files we read in. Instead, we have to assume that the "data.csv" we parse has the same format required by the new-config.json. What MoveTableValidateReload
does is (a) checks to validate everything is consistent for the driver solver loop to continue error-free (b) get the information we cannot recover from parsing by looking at the default t_ref
table.
Consider things that go wrong with your idea:
-
Let's assume that we have a uniform run with two excitations (2,4) already done. Now we want to add an extra excitation index 1 after the start. Currently, we store and iterator over excitations in a map, so excitation indices are sorted. 1 has to pre-pend columns to the the table. So the "restart" indexing with a single inter no longer works. It gets worse if we want to add both 1,3 since then we have to interleave column blocks. In principle, we could print column blocks as 2,4,1,3 but we would need extra structure in config.json that encodes the run order of excitations.
-
No it get's worse if we start off with a uniform run with a single excitation (e.g. 2). Because of backward compatibility we decided to not print the excitation index if there is only a single excitation (except in the S matrix). So when we load the table we don't know what the original excitation was. Now I pass a new-config.json that asks for excitations 1,2,3 and a restart index. Even setting aside what the restart index means in this case, how does it know to move the existing data to column block 2 (and add the
[2]
print label to it)?
With more effort there are things I could have done with parsing information, but don't solve the indexing problem above:
- There are columns options (like float print precision and empty space padding) that I could parse from "data.csv" right now but don't. I just inherit them from
t_ref
for convenience. These are annoying to parse and would make the parsing in tableTable
using more involved (see your complaints above). Also, if we decide to change the padding default, I think the new table defaults should overwrite the old one anyway. - I don't validate the parse frequencies values printed in the data.csv against the one in "new-config.json" (only the overall expected length). I could do this for extra security.
This appears pretty complicated for loading a csv, then checking the number of rows and columns match the existing dimensions.
Could you please point to what you think is unnecessary in the checks / reload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think given the current state the settings and checks are necessary, my point is that we have arrived at a very complicated situation, for something that is deep down pretty simple: writing a string of numbers to a file, separated by commas and whitespace, and the ability to pick up where we left off. That we can't do something like deciding to add another excitation onto an existing set of runs, is indicative we've gone astray. This is tangent to this PR, but reviewing this has really made me appreciate it.
Yes, there was a typo the print message. The frequency index was divided by port excitation size which is why you were seeing something wrong:
should have been
This is fixed in the "Address PR Commit", but see below about integrating with the changes on your branch. |
I see. "No restart" still sets restart=1 and so if there is data in the tables already there, it will try and load existing tables and check that it can start writing at position 1 (which it can't since there is data there or the previous run had a different shape). Since we allow overwriting runs, we can just short circuit restart=1 to ignore existing tables. Edit below in commit: |
c18b36a
to
5beff28
Compare
Rebased this branch ( |
5beff28
to
52613df
Compare
One open issue is still testing of the indexing in the But I've not actually tested anything that does a solve, since I'm not sure the unit-test system is quite set up for that yet.
There are probably testing refactors one can do to make components more individually testable, but that is a longer term project. Your commit about the julia option change looks ok, but I do you have something specific in mind for hooking this into the CI? |
I wasn't intending to get this as an explicit CI run, as it's such a niche feature. It was just a capability I was using to perform testing with, which showed things up. I've also opted to just disable Restart on the adaptive runs on the branch i was using, as it doesn't make sense with them being the "cheap" piece, and the expensive piece of assembling the PROM having to be done no matter what. |
fe75043
to
a3b7f45
Compare
- Add DrivenSolverIndex data container to group values - Restart acts on total sample (freq and excitation combined) - Add EnumerateView helper for C++17 compatibility - Add tests
- Don't allow empty_cell_val per column for consistent re-parsing - Fix lshift bug in Column - Remove unused drive_source_index in PostOperator CSV - Remove unused AppendHeader, Append Row in TableWithCSVFile - Add comments - Add small tweaks for measurement table parsing Propagate table column block index - Needed for row alignment check during table reload
- Assumes all entries are all double
- Add function to return csv filepath for potential error messages
- Replace pointer to common options with explicit passing, in order to not write custom move
- Must validate table structure against expectation - Must copy some parameter from reference that can't be parsed - Validate cursor location with restart - Properly set nr_expected_measurement_rows for all solvers
- Thread through objects needed for init and print in call
- Move fem_op_t so PostOperatorCSV ctor can use it
- Move expected filling pattern into free function for testing - Add test case csv files to load
In PostOperatorCSV: - m_idx_row -> row_i - m_idx_value -> row_idx_v - excitation_idx_all -> ex_idx_v_all - SingleColBlock() -> HasSingleExIdx() - may_reload_table() -> MayReloadTable Also some test names changed.
a3b7f45
to
fa7bcc2
Compare
This resolves #376.
Palace had the "Restart" flag in the driven solver. Previously this was simple because printing always appended to files row-by-row. A started run did not need to know what happened or what was written before the restart.
With then new_postpro changes (#302) and multi-excitation support (#309), this is more complicated since all data for the full run is stored by PostOperatorCSV. This is all due to the fact that multi-excitations can't just append to rows below but need to change columns to the side. To re-enable restart, we have to parse back the csv file from disk, which we do using scnlib. When we load from the csv, we have to compare to the expected table, in order to (a) validate that the read in tables are in a good state (b) recover information about the Table state (like internal column names) not in the csv.
Additionally, we clean up various parts of the code for simplicity and to make it more testable. We add several unit tests for the table reload.
The commits have been squashed and cleaned to be understandable one-by-one.