Wetland area and areas of landunits in coastal grid cells should be determined more rigorously

### Brief summary of bug

I believe that there are some problems with how we calculate wetland area, both in mksurfdata_map and at runtime. These problems lead to a situation where I believe there is currently no correct answer to the question, "Should PCT cover raw datasets specify the percent of the total grid cell or just the percent of the land area in which the given landunit appears?"

@olyson @lawrencepj1 @dlawrenncar  @wwieder @ekluzek @slevisconsulting @mvertens 

### General bug information

**CTSM version you are using:** master

**Does this bug cause significantly incorrect results in the model's science?** Yes, at least to some degree – though errors are probably relatively small

**Configurations affected:** All

### Background / motivation

@olyson recently asked:

> Should the raw input dataset fields (e.g., PCT_URBAN in 0.05deg raw input urban dataset) represent the fields in terms of a percentage of total gricell area, or in terms of a percentage of the total land area in the gridcell?

After giving this some thought, I have come to the conclusion that either choice will do some things correctly and some things incorrectly. I feel that it would be helpful if we could provide a clear answer to this question and have the resulting landunit fractions in the model be consistent with the raw datasets.

I should note that I have *not* carefully examined all of the relevant code or tried running various scenarios: My thoughts below are based mainly on my recollections of how mksurfdata_map works. Also note that, although I refer to mksurfdata_map in my comments, really I don't imagine any changes being made to that deprecated code: all changes can be applied to the new mksurfdata_esmf.

### Problems with the current formulation

#### Problems with specifying raw dataset PCT cover as percent of the entire grid cell

(A) If a raw dataset (e.g., for glacier) specifies PCT cover as percent of the entire grid cell, for a grid cell that is partially ocean, then we can end up with too much vegetated area and too little area of this special landunit in the final surface dataset.

For example, consider the idealized scenario where all raw data sets are at the same resolution, all raw data sets agree about the land/ocean mask/fraction, the model resolution matches this raw data set resolution, and the ocean model grid also agrees exactly with this land/ocean fraction. Let's consider a 0.05 deg grid cell in this scenario where the raw data sees 98% ocean, 1% glacier and 1% vegetated. If the glacier raw data set encoded this as 1% glacier, then I think that mksurfdata_map would create a model grid cell that is 1% glacier and 99% vegetated. When run in the model, with a landfrac of 2%, the final grid cell cover would be 98% ocean (i.e., owned by the ocean model), 0.02% glacier and 1.98% vegetated; this would be wrong. If, instead, the glacier raw data set encoded this as 50% glacier, then I think mksurfdata_map would create a model grid cell that is 50% glacier and 50% vegetated. When run in the model, with a landfrac of 2%, the final grid cell cover would be 98% ocean, 1% glacier and 1% vegetated, as desired. So for this idealized scenario where all data sets agree in their resolution and land fraction and no mapping is needed, I think we would want raw data sets to give percent cover in terms of percentage of just the land area in the grid cell, at least given the current treatments of wetland area.

#### Problems with specifying raw dataset PCT cover as percent of just the land area

However, there are also problems with doing the reverse – specifying raw dataset PCT cover as percent of just the land area. I can think of two off-hand:

(B) Averaging multiple source grid cells to form a destination cell: Consider a model (destination) grid cell that exactly contains two raw data (source) grid cells. One of the source grid cells is as above – with 98% ocean, 1% glacier and 1% vegetated; if we encode glacier area as percent of just the land area, the raw dataset would prescribe 50% glacier area. The other source grid cell is 100% land, but has 0% glacier. As above, let's assume that the ocean grid agrees exactly with the land fractions on the raw data sets. Then the resulting land grid cell would be 51% land and 49% ocean. The glacier landunit in this grid cell would cover 25% of the land area – since it is determined by averaging the 50% in grid cell 1 with 0% in grid cell 2. This is much more glacier area than there should really be: since grid cell 1 is mostly ocean, its percent glacier cover should have gotten much less weight in forming the average, and the actual glacier percent over the land portion of this grid cell should be a little less than 1%. (We could handle this by having a land fraction on each raw data set and including that in the mapping, but this could be hard to get right if different raw data sets have different land masks, and I believe that there is a simpler solution, as I explain below.)

(C) Filling missing areas with wetlands: Another scenario is one where the ocean model grid doesn't resolve the coast line exactly, so CTSM needs to fill in some missing areas with wetland. Our logic for how much wetland area to apply is currently inconsistent: see https://github.com/ESCOMP/CTSM/issues/1001#issuecomment-678518813 , where I state:

> I think that the faulty logic is in what we do with wetland % in grid cells with pftdata_mask > 1e-6: I'm starting to think that our handling of wetlands with respect to lake and glacier is correct (in places with pftdata_mask = 0, fill the remainder of the grid cell with wetlands where it wasn't filled with glacier or lake), and that the real problem is that, anywhere where we have pftdata_mask > 1e-6, we fill the whole grid cell with vegetation.

(see that issue comment for more elaboration). However, I believe that comment is only correct if the glacier and lake raw data sets give percent cover in terms of percentage of the entire grid cell.

Consider a scenario like that in (A), but now the ocean model grid does not fully resolve the coast line, so the land model grid cell in question ends up with 100% landfrac. In this case, we would want a wetland area of 98%. I think there is currently no way to achieve this 98% wetland cover in this scenario, but we certainly cannot achieve it with a glacier raw data set that prescribes 50% glacier cover, as suggested in (A).

(This was my motivation for this comment last year: https://github.com/ESCOMP/CTSM/issues/1406#issuecomment-871555350 )

### Suggested path forward

I believe that we can handle these scenarios and others correctly if we put in place the following conventions and changes:

(1) **Raw data sets of PCT cover should represent the fields in terms of percentage of the total grid cell area.** This keeps the averaging of raw data grid cells (scenario (B)) simple, without requiring use of the land fraction associated with each raw data set. This may already be how things are done; I'm not sure.

(2) **Change the setting of wetland cover in mksurfdata_map so that it fills all area outside the current raw data set masks, rather than the current logic where wetland area is only added if** `pctlnd_pft(n) < 1.e-6_r8`. More exactly, I think that it would make sense to consider two areas: (a) the mapped land fraction given by `pctlnd_pft`, and (b) the total special landunit area. It is important to consider (b) so that we don't overwrite some special landunits with wetland where these special landunits extend beyond the PFT data's land fraction. (This is particularly important for glaciers, where we can have floating ice shelves: see https://github.com/ESCOMP/CTSM/issues/545 .) We could probably do even better by considering each raw data set's notion of the land fraction, but thinking about these multiple, disagreeing land fractions makes my head hurt. So I propose treating the PFT data's land fraction as the definitive land fraction for the sake of setting wetland area, but imposing a limit that ensures that it doesn't overwrite any special landunits: something like:

``` fortran
pctwet(n) = min(100._r8 - pctlnd_pft(n), 100._r8 - pctgla(n) - pctlak(n) - pcturb(n))
pctwet(n) = max(pctwet(n), 0._r8)
```

In addition to setting wetland area more correctly, this formulation will also deal with scenario (A): The problem with the current code that leads to the issue in scenario (A) is that, for a grid cell with `pctlnd_pft(n) > 1e-6`, there is no wetland area and the natural vegetated landunit is given all of the remaining area in the grid cell. With the proposal here, most of the grid cell in scenario (A) would be filled with wetland, and natural vegetation would only fill the remainder: 1% of the grid cell, as desired.

Notes about the above:
- I am assuming that `pctlnd_pft` is formed by mapping a field that specifies the land *fraction* – not just land *mask* – in each source grid cell. That is, I'm assuming that the source data for this field can be any value between 0 and 1 on the raw data grid. I'm not sure whether that is currently the case.
- Note that all special land units are now included here, in contrast to the current code for setting wetland area, which ignores urban area. (Maybe that should have been changed in the context of https://github.com/ESCOMP/CTSM/pull/1587 but I didn't think of it then.) I am *not* including crop area... I think it's correct not to include crop area in this because I think it's effectively handled via `pctlnd_pft` in this respect (though I'm not positive about this).
  - Update (2022-11-17): I think I *will* include crop area: see https://github.com/ESCOMP/CTSM/issues/1716#issuecomment-1318748012
- This isn't completely correct in some cases – e.g., consider the case where `pctlnd_pft(n) = 50` and `pctgla(n) = 50`, but all of the glacier cover is floating ice shelves that are complementary to the land fraction. Then we would really want `pctwet(n)` to be 0, but we would end up with it being 50. I think that handling this entirely correctly could be done with some extra mappings, but I think we can live with this error.

(See also https://github.com/ESCOMP/CTSM/issues/1001#issuecomment-678518813 for a similar suggestion, though I think I got the exact solution wrong there.)

Update (2022-04-29): If we wanted to accommodate inland wetlands (which are no longer used by default but may still be allowed), then we would need to add the above expression to any existing wetland areas read from the wetland raw dataset.

(3) **Change CTSM's runtime code so that, in grid cells that have landfrac < 1, wetland area is preferentially decreased and then the remaining landunit areas (including any remaining wetland areas) are renormalized appropriately.** This is needed because, within CTSM, we think of each landunit's weight as being the fractional area of the land-owned portion of the grid cell, *not* the fractional area of the entire grid cell. I'm not aware of this correction being done currently, but I'm pretty sure that it should be done – at least if we do the other things suggested above. This is particularly important with the change proposed in (2), since now we will more often have fractional wetland areas in a grid cell. (Without the change in (2), I think it has often – though not always – been the case that wetland area is either 0% or 100% of a grid cell, so this change (3) hasn't been as important.)

**Care will need to be taken here to ensure that this same reweighting is also applied to transient land cover – including possibly the dynamic glacier areas that come from CISM (I want to give this glacier issue a little more thought).**
- Update (2022-04-25): see https://github.com/ESCOMP/CTSM/issues/1716#issuecomment-1108959121 for thoughts on the dynamic glacier areas from CISM. The summary is that I think these are already expressed as a fraction of the land-owned area, so they do NOT need to be reweighted – i.e., this coupling can stay as it currently is.
- Update (2022-04-29): My initial thought here is that, in initialization, we would compute a scaling factor for each grid cell based on a combination of landfrac and the wetland area on the surface dataset. We would store a gridcell-level array with those scaling factors. Then, when reading landunit areas from the landuse timeseries file, we would apply the same saved scaling factors.

For example, in scenario (A), the surface dataset would specify 98% wetland, 1% glacier and 1% natural vegetation. The runtime land fraction is 2%. Wetland area would be preferentially decreased – to exactly 0% in this case – and then the remaining areas would be renormalized so that the resulting grid cell has 50% glacier and 50% natural vegetation over the small land area – as desired.

In scenario (C) we again have 98% wetland, 1% glacier and 1% natural vegetation on the surface dataset, but now the runtime land fraction is 100%. There would be no decrease of wetland area in this scenario, and the resulting grid cell would remain with 98% wetland, 1% glacier and 1% natural vegetation – again, as desired.

Also consider an intermediate scenario with the same specification on the surface dataset but where the runtime land fraction is 50%. Wetland area would be decreased to 48%, then the remaining landunit areas would be renormalized, resulting in a grid cell where the land-owned areas are covered by 96% wetland, 2% glacier and 2% natural vegetation. (So, in terms of the total grid cell area, we would have 50% ocean, `0.5*96%` = 48% wetland `0.5*2%` = 1% glacier, and `0.5*2%` = 1% natural vegetation – once again, as desired.)

Finally, consider a scenario where the runtime land fraction is greater than the wetland area: again, the surface dataset specifies 98% wetland, 1% glacier and 1% natural vegetation, but now the runtime land fraction is 1%. This would work out the same as the first example: wetland area would be completely decreased, and other landunit areas would be renormalized. No explicit changes are needed to account for the land fraction being smaller than the specified wetland area.

(4) Update (2022-05-03): I realized that a fourth piece is needed, too: **Change the remapping of PCT landunit areas in mksurfdata to NOT normalize by the mask.** See https://github.com/escomp/ctsm/issues/1716#issuecomment-1116375444 for details.

### Final thoughts / summary about how this changes the treatment of wetland areas

In the past, we have *sort of* had a rule that wetlands covered either 0% or 100% of a grid cell. Before the fix for https://github.com/ESCOMP/CTSM/issues/545 the exception was lakes, and after that fix the exception was lake and glacier areas. But I am coming to see that – in addition to the inconsistency in how different landunits are treated with respect to determining wetland area – this rule makes it hard/impossible to treat other aspects of landunit areas correctly.

I am therefore proposing a more rigorous determination of wetland areas, both in mksurfdata_map and at runtime, which I believe will allow us to have a more correct and more conceptually consistent treatment of landunit areas in coastal grid cells.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wetland area and areas of landunits in coastal grid cells should be determined more rigorously #1716

Brief summary of bug

General bug information

Background / motivation

Problems with the current formulation

Problems with specifying raw dataset PCT cover as percent of the entire grid cell

Problems with specifying raw dataset PCT cover as percent of just the land area

Suggested path forward

Final thoughts / summary about how this changes the treatment of wetland areas

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Wetland area and areas of landunits in coastal grid cells should be determined more rigorously #1716

Description

Brief summary of bug

General bug information

Background / motivation

Problems with the current formulation

Problems with specifying raw dataset PCT cover as percent of the entire grid cell

Problems with specifying raw dataset PCT cover as percent of just the land area

Suggested path forward

Final thoughts / summary about how this changes the treatment of wetland areas

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions