Setting a user-defined atmosphere iteration number or time on GPUs causes illegal memory crash #566

@taimoorsohail

I am trying to run a ClimaOcean.jl simulation by loading a set of u, v, T, S, and e fields and launching the simulation at a specified time and iteration number (essentially a rudimentary checkpointer). Here is a working example: https://github.com/taimoorsohail/ocean-ensembles/blob/ts/parallel-run/test/synching_clock.jl

Unfortunately, synchronizing the simulation and atmosphere clocks causes an error when calling run!(simulation). This happens on GPUs only; the CPU works fine.

Specifically, I get the error:

ERROR: LoadError: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)

The error originates from this line in Oceananigans.jl: https://github.com/CliMA/Oceananigans.jl/blob/fc8db79871b673811d6cc5df0e7057af869d601f/src/Models/HydrostaticFreeSurfaceModels/SplitExplicitFreeSurfaces/compute_slow_tendencies.jl#L42

I tested several combinations of time and iteration number. The code runs when the iteration number is set to any value but the time is set to 0, or when the atmosphere and simulation times are set to 0 but the ocean time is set to a user-defined value.
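
The combinations described above can be laid out as a small sketch. This is only an illustration with stand-in objects: the `SketchClock` type, the `set_clock!` helper, and the numeric restart values are assumptions made for the example, not the Oceananigans or ClimaOcean API, and the third case is one reading of the failing configuration (all clocks moved to the same nonzero time).

```julia
# Stand-in clock type for illustration only; not the Oceananigans `Clock`.
mutable struct SketchClock
    time::Float64
    iteration::Int
end

# Hypothetical helper: overwrite a clock's time and iteration in place.
function set_clock!(clock::SketchClock; time, iteration)
    clock.time = time
    clock.iteration = iteration
    return clock
end

# Illustrative restart values (assumed, not taken from the linked script).
restart_time = 86400.0      # seconds
restart_iteration = 1000

# Each entry: clock times per component, plus the GPU behaviour reported above.
cases = [
    (ocean = 0.0,          atmosphere = 0.0,          simulation = 0.0,          gpu = "works"),
    (ocean = restart_time, atmosphere = 0.0,          simulation = 0.0,          gpu = "works"),
    (ocean = restart_time, atmosphere = restart_time, simulation = restart_time, gpu = "illegal memory access"),
]

for case in cases
    # Fresh stand-in clocks for each case, all given the same iteration number.
    ocean      = set_clock!(SketchClock(0.0, 0); time = case.ocean,      iteration = restart_iteration)
    atmosphere = set_clock!(SketchClock(0.0, 0); time = case.atmosphere, iteration = restart_iteration)
    simulation = set_clock!(SketchClock(0.0, 0); time = case.simulation, iteration = restart_iteration)
    println((ocean.time, atmosphere.time, simulation.time, ocean.iteration), " => ", case.gpu)
end
```

Laid out this way, the only configuration that fails on the GPU is the one where the atmosphere and simulation times are nonzero; the iteration number alone does not appear to matter.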

Is there something specific to GPUs I am hitting here?
