Skip to content

Conversation

ekluzek
Copy link
Collaborator

@ekluzek ekluzek commented Aug 21, 2025

Description of changes

This adds a framework for adding some structure to tell the model that you want to bypass some parts of the model for some different types of testing that you may want to do. It's controlled by a namelist read as part of the SelfTestDriver code. Then there are some methods to control the bypass functionality mainly in clm_initialize and clm_driver.

There is some more work I think that should happen before this comes in. But, this gives the rough structure. I need this to get to this point so that I can use this branch in my other ones.

This does part of #3301 but in a cleaner way.

Specific notes

Contributors other than yourself, if any: @samsrabin and others for ideas

CTSM Issues Fixed (include github issue #):
This enables work in #2995
Starts #3276
Some work here #3295

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? Yes
New namelist items for self tests

Does this create a need to change or add documentation? Did you do so? Maybe? But, no to second

Testing performed, if any: So far running a self test case. Will run the decomp_init testlist as well as aux_clm

Definition of done:

  • Get feedback on the general structure
  • Apply the suggested changes
  • Figure out what the error checking should look like
  • Decide on what things above should come in now, vs opened in an issue and done later in separate PR's
  • Have CTSM SE group review it
  • Add error checking in build-namelist
  • Add error checking in the code
  • Changes to the logical functions for robustness
  • Add some documentation around this

@ekluzek ekluzek requested a review from samsrabin August 21, 2025 23:23
@ekluzek ekluzek self-assigned this Aug 21, 2025
@ekluzek ekluzek added the enhancement new capability or improved behavior of existing capability label Aug 21, 2025
@ekluzek ekluzek added testing additions or changes to tests bfb bit-for-bit devops Development Operations to improve development throughput, E.g., adding GitHub Workflows labels Aug 21, 2025
@ekluzek ekluzek changed the base branch from master to b4b-dev August 21, 2025 23:24
@samsrabin
Copy link
Member

@ekluzek This is the PR I was asking about earlier. Is it ready for review or still under development?

@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 22, 2025

@ekluzek This is the PR I was asking about earlier. Is it ready for review or still under development?

Ahh, actually both. I want to get some feedback on it now. But, I think there are several things I need to do, before a final review. I want to get feedback on if this looks like even a good direction to pursue, as if not I'll brainstorm on a different direction. I also want to get some feedback now, if it does look promising on the list of things to do before merging it into b4b-dev.

Copy link
Member

@samsrabin samsrabin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly how I was envisioning you implementing the self tests, so feel free to continue!

Fix spelling from review.

Co-authored-by: Sam Rabin <[email protected]>
Copy link
Collaborator Author

@ekluzek ekluzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes I'm making on things to do.

push @groups, "clm_canopy_inparm";
push @groups, "prigentroughness";
push @groups, "zendersoilerod";
push @groups, "for_testing_options";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should add more error checking to the build-namelist for these bypass options. I also think that these testing options should maybe do things like ensure that history and restart files are off the like. There also might be some error checking that some of the advanced options are NOT turned on with the bypass options, and that sort of thing. And it will be important to make sure it's clear that these bypass and testing options are turned on -- and they don't get turned on accidentally. It will take some thinking to figure that out.

! CalcIrrigationNeeded. Simply declaring this variable makes the ICE go away.
real(r8), allocatable :: dummy1_to_make_pgi_happy(:)
!-----------------------------------------------------------------------
if ( for_testing_bypass_run_except_clock_advance() ) return
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this short circuits the code -- it's worth thinking if this should be a simple return statement or a more explicit if structure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsrabin this is one thing I'd like to hear feedback on. Let me know what you think about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this one isn't just an early return; it actually short-circuits the entire subroutine. So why not just wrap its call in an if statement?

Copy link
Member

@samsrabin samsrabin Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing now that the other returns are the same way, so same question there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like your idea @samsrabin. For performance it's of course better to do at the higher level. For readability it might depend on where you want these things to be seen.

But, in this case it would remove this strange testing option from code that's a mix of science/software infrastructure to the higher level in the NUOPC cap that is pretty much just for SE's. So it helps with readability as well as separation of concerns.

call bgc_vegetation_inst%Init2(bounds_proc, NLFilename)
end if

if ( .not. for_testing_bypass_init_after_self_tests() )then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to indent the lines below. I left it unindents until I saw if it could run, and that this is the spot where it should be.

if (nsrest == nsrContinue ) then
call htapes_fieldlist()
end if
end if
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent lines above.

call hist_htapes_build()
end if

if ( .not. for_testing_bypass_init_after_self_tests() )then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing about the indents.

It's also not completelly clear how many of these type of things there should be.

One reason that I need to do this here -- rather than having return statements, is that I need the timers to work. So if a timer has been started in the code above, I can't return in the middle without that timer being messed up. So it needs to exucute the part of the code where the timer is stopped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsrabin this and the one above about the return statements are things I'd like to hear from you on. In here they need to be if statements as I say above, but in the above section they could be either.

Although, maybe because this is a weird pathway in the code -- the return statements should be preferred so it doesn't disrupt the code flow as much as if statements. It's easy to miss returns in the code, flow but that's only important when it's something that might happen commonly enough. There was a code standard in my past that recommended to not have return statements in the midst of subroutines, because they are easy to miss. And I do see that point...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer early returns as much as possible, especially since as you said these are rare cases. Avoiding returns by using if statements creates its own readability issues: How far indented am I right now? Under what conditions?

The bigger readability problem with this module is how enormous the subroutines are. I'd prefer to see things like initialize2 getting refactored before we start worrying about ifs vs. returns. Maybe that could be something you do before this PR? I.e., refactor that subroutine to create at least two new ones:

  1. Everything inside the first new if
  2. Everything inside the second new if

Maybe also (3) everything between the new ifs.

I wouldn't want to see that done in this PR, though, in the interest of keeping things easy to review and test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring could also avoid return statements too: Break subroutines into two smaller subroutines, with the call of the first being wrapped in an if statement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good points, thanks for the comments. I'll be thinking about all this. I can't do a lot of work in refactoring clm_initialize and won't do it here. But maybe it would help to do some simple things in a different small PR. Hmmm...


! Initialize urban time varying data
call urbantv_inst%Init(bounds, NLFilename)
if ( .not. for_testing_bypass_init_after_self_tests() )then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some of my decompInit testing work, I found where the urbantv calll specifically was problematic, so I'm explicitly just avoiding this one call here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this isn't where the problem was.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I should probably remove this one.

character(len=*), parameter :: nmlname = 'for_testing_options'
!-----------------------------------------------------------------------

namelist /for_testing_options/ for_testing_bypass_init, for_testing_bypass_run
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the other for_testing namelist options should move to here as well.

call shr_mpi_bcast (for_testing_bypass_init, mpicom)
call shr_mpi_bcast (for_testing_bypass_run, mpicom)

if (masterproc) then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There also should be some error checking done here in the Fortran code.

!-----------------------------------------------------------------------

logical function for_testing_bypass_init_after_self_tests()
! Determine if should exit initialization early after having run the self tests
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these should get more sosphisticated and do things like NOT return true until after the self-tests are run. And ensure that the self_test namelist was read in, or else abort. The run phase bypass should maybe NOT return true until after the init phase has passed and things like that.

 Conflicts:
	bld/namelist_files/namelist_definition_ctsm.xml
	src/cpl/nuopc/lnd_comp_nuopc.F90
	src/main/clm_varctl.F90
	src/self_tests/SelfTestDriver.F90
…sting_work

 Conflicts:
	cime_config/testdefs/ExpectedTestFails.xml
…sting_work

 Conflicts:
	cime_config/testdefs/testmods_dirs/clm/for_testing_fastsetup_bypassrun/user_nl_clm
	cime_config/testdefs/testmods_dirs/clm/run_self_tests/user_nl_clm
… stuff in it when use_noio is TRUE

Work on reconciling timers and for_testing bypass code from the mpi_scan
branch.
 Conflicts:
	src/main/clm_initializeMod.F90
… out, and so that the self-tests can run to completion afterwards

 Conflicts:
	src/cpl/nuopc/lnd_comp_nuopc.F90
	src/main/clm_initializeMod.F90
…Init_lnd timers to around the calls rather than for the entire subroutine, because the things at the top that may abort will then have a broken timer

 Conflicts:
	src/cpl/nuopc/lnd_comp_nuopc.F90
	src/main/decompInitMod.F90
…_exit_after_self_tests, change the self tests testmod so that its about initialization, this works with a compset with SATM, but hangs -- because nothing stops the run

 Conflicts:
	cime_config/testdefs/testmods_dirs/clm/run_self_tests/README
	cime_config/testdefs/testmods_dirs/clm/run_self_tests/shell_commands
	cime_config/testdefs/testmods_dirs/clm/run_self_tests/user_nl_clm
	src/cpl/nuopc/lnd_comp_nuopc.F90
… of atm_present and adjustments to how send_to_atm was done
…hould be

 Conflicts:
	src/main/clm_initializeMod.F90
@ekluzek ekluzek changed the title [WIP] For testing add a framework to bypass parts of the code For testing add a framework to bypass parts of the code Oct 1, 2025
@ekluzek ekluzek moved this from Todo to In Progress in LMWG: Sprint Planning Board Oct 6, 2025
@ekluzek ekluzek moved this from In Progress to Stalled in LMWG: Sprint Planning Board Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bfb bit-for-bit devops Development Operations to improve development throughput, E.g., adding GitHub Workflows enhancement new capability or improved behavior of existing capability testing additions or changes to tests

Projects

Status: Stalled

Development

Successfully merging this pull request may close these issues.

2 participants