Note that `ref="main"` is required; the default branch is "main", and must be
+referred to explicitly.
+
 ### Source-level install for developers
 
 If you want to work with and potentially change the `growthcleanr` code itself,
@@ -507,8 +518,9 @@ The following options change the behavior of the growthcleanr algorithm.
 and included as valid measurements for cleaning.
 
 - `sd.extreme` - default `25`; a very extreme value check on modified
-(recentered) Z-scores used as a first-pass elimination of clearly implausable
+(recentered) Z-scores used as a first-pass elimination of clearly implausible
 values, often due to misplaced decimals.
+
 - `z.extreme` - default `25`; similar usage as `sd.extreme`, for absolute
 Z-scores.
 
@@ -555,10 +567,23 @@ techniques.
 - `flag.both` - in case of two measurements with at least one beyond
 thresholds, flag both instead of one (as in default)
 
-- `sd.recenter` - defaults to NA; data frame or table w/median SD-scores per day
-of life by gender and parameter. Columns must include param, sex, agedays, and
-sd.median (referred to elsewhere as "modified Z-score"). By default, median
-values will be calculated using growth data to be cleaned.
+- `sd.recenter` - default `NA`; specifies how to recenter medians. May be a data frame
+or table w/median SD-scores per day of life by gender and parameter, or "`nhanes`"
+or "`derive`" as a character vector.
+
+- If `sd.recenter` is specified as a data set, use the data set
+- If `sd.recenter` is specified as "`nhanes`", use NHANES reference medians
+- If `sd.recenter` is specified as "`derive`", derive from input
+- If `sd.recenter` is not specified or `NA`:
+  - If the input set has at least 5,000 observations, derive medians from input
+  - If the input set has fewer than 5,000 observations, use NHANES
+
+If specifying a data set, columns must include param, sex, agedays, and sd.median
+(referred to elsewhere as "modified Z-score"), and those medians will be used for
+centering. This data set must include a row for every ageday present in the dataset
+to be cleaned; the NHANES reference medians include a row for every ageday in the
+range (731-7305 days). A summary of how the NHANES reference medians were derived is
+below under [NHANES reference data](#nhanes).
 
 ### Operational options
 
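The `sd.recenter` rules added in this hunk amount to a small decision table. The sketch below restates them in Python purely for illustration; it is not the package's R implementation. The names `choose_recentering` and `missing_agedays` are hypothetical, `None` stands in for R's `NA`, and a list of row dictionaries stands in for a user-supplied data frame of medians.

```python
from typing import Optional, Union

# Threshold taken from the option description: when sd.recenter is unset,
# inputs with at least this many observations derive their own medians.
DERIVE_MIN_OBSERVATIONS = 5000


def choose_recentering(sd_recenter: Optional[Union[str, list]],
                       n_observations: int) -> str:
    """Return which medians would be used: 'user', 'nhanes', or 'derive'.

    Hypothetical helper: None plays the role of NA, a list of rows plays
    the role of a user-supplied medians table.
    """
    if sd_recenter is None:
        # Unset: fall back on input size.
        if n_observations >= DERIVE_MIN_OBSERVATIONS:
            return "derive"
        return "nhanes"
    if isinstance(sd_recenter, str):
        if sd_recenter in ("nhanes", "derive"):
            return sd_recenter
        raise ValueError(f"unrecognized sd.recenter value: {sd_recenter!r}")
    # Anything else is treated as a user-supplied medians table.
    return "user"


def missing_agedays(medians: list, agedays_to_clean: set) -> set:
    """Agedays present in the data to be cleaned but absent from the medians."""
    covered = {row["agedays"] for row in medians}
    return set(agedays_to_clean) - covered
```

A non-empty result from `missing_agedays` would indicate that a supplied table does not meet the "row for every ageday" requirement described above. A fuller check would likely also group by `param` and `sex`.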
@@ -959,20 +984,77 @@ for `cleangrowth()`.
 
 ## <a name="related"></a>Related tools
 
-The CDC provides a
-[SAS Program for the 2000 CDC Growth Charts](https://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm)
-which can also be used to identify biologically implausible values using a different
-approach, as also implemented for `growthcleanr` in the function `ext_bmiz()`, described
-above under [Computing BMI percentiles and Z-scores](#bmi).
+The CDC provides a [SAS Program for the 2000 CDC Growth
+Charts](https://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm) which can
+also be used to identify biologically implausible values using a different approach, as
+also implemented for `growthcleanr` in the function `ext_bmiz()`, described above under
+[Computing BMI percentiles and Z-scores](#bmi).
 
 [GrowthViz](https://github.com/mitre/GrowthViz) provides insights into how
-`growthcleanr` assesses data, packaged in a Jupyter notebook. It ships with the
-same `syngrowth` synthetic example dataset as `growthcleanr`, with results
-included.
+`growthcleanr` assesses data, packaged in a Jupyter notebook. It ships with the same
+`syngrowth` synthetic example dataset as `growthcleanr`, with results included.
+
+## <a name="nhanes"></a>NHANES reference medians
+
+`growthcleanr` [releases](https://github.com/carriedaymont/growthcleanr/releases) up to
+1.2.4 offered two options for recentering medians, either the default of deriving
+medians from the input set, or supplying an externally-defined set of medians. These
+left out an option for researchers working with either small datasets or with data
+which might otherwise not be representative of the population, as deriving medians from
+the input set in those cases might be problematic. To provide a standard default
+reference to address these latter cases, a set of medians was derived from the
+[National Health and Nutrition Examination
+Survey](https://wwwn.cdc.gov/nchs/nhanes/Default.aspx) (NHANES). A summary of that
+process is below. As of release 1.2.5, the default behavior is:
+
+- If `sd.recenter` is specified as a data set, use the data set
+- If `sd.recenter` is specified as `nhanes`, use NHANES
+- If `sd.recenter` is specified as `derive`, derive from input
+- If `sd.recenter` is not specified or `NA`:
+  - If the input set has at least 5,000 observations, derive medians from input
+  - If the input set has fewer than 5,000 observations, use NHANES
+
+With the verbose `cleangrowth()` option `quietly = FALSE`, the recentering medians
+approach used will be noted in the output. If the input set has fewer than 100
+observations for any age-year, this will also be noted in the output.
+
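The "fewer than 100 observations for any age-year" note can be pictured as a simple tally. The sketch below is illustrative only: `sparse_age_years` is a hypothetical name, and defining an age-year as `floor(agedays / 365.25)` is an assumption, since the text above does not say how the package bins ages.

```python
from collections import Counter

DAYS_PER_YEAR = 365.25   # assumption: how agedays map onto age-years
MIN_PER_AGE_YEAR = 100   # threshold quoted in the text above


def sparse_age_years(agedays, minimum=MIN_PER_AGE_YEAR):
    """Return {age_year: count} for age-years with fewer than `minimum` rows.

    Age-years with no observations at all are not reported, because they
    never appear in the tally.
    """
    counts = Counter(int(d // DAYS_PER_YEAR) for d in agedays)
    return {year: n for year, n in sorted(counts.items()) if n < minimum}
```

For example, an input with 100 observations at 800 days and 5 at 1,200 days would report only age-year 3 as sparse.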
+The NHANES reference medians are based primarily on data from NHANES 2009-2010 through
+2017-2018, including approximately 39,000 heights/lengths and weights from children and
+adolescents between the ages of 0 months and <240 months. Weight and height SD scores
+were calculated from the [L, M, and S
+parameters](https://www.cdc.gov/growthcharts/percentile_data_files.htm) for the [CDC
+growth charts](https://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm), which
+were used as the reference for the NHANES 2009-2010
+through 2017-2018 samples. Based on the distributions of age-days in children at 0
+months, an age adjustment was made based on the median number of days among these
+infants. This adjustment was made after consultation with the National Center for
+Health Statistics confirmed that a general assumption of ages occurring at the midpoint
+of the indicated integer month of age did not apply to children recorded as 0 months;
+0.75 months is used instead.
+
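As a rough illustration of the two calculations in this paragraph, the sketch below applies the standard LMS transformation published with the CDC growth chart data files, plus the age adjustment described above. The function names are hypothetical, and converting months to days with 30.4375 days per month (365.25 / 12) and rounding is an assumption, not something the text states.

```python
import math

DAYS_PER_MONTH = 30.4375  # assumption: 365.25 / 12


def lms_zscore(x: float, l: float, m: float, s: float) -> float:
    """SD score (Z-score) of measurement x given LMS parameters.

    Standard LMS formula: ((x / M) ** L - 1) / (L * S), with the
    logarithmic limit ln(x / M) / S when L is zero.
    """
    if x <= 0 or m <= 0 or s <= 0:
        raise ValueError("x, M, and S must be positive")
    if abs(l) < 1e-12:
        return math.log(x / m) / s
    return ((x / m) ** l - 1.0) / (l * s)


def agemos_to_agedays(agemos: int) -> int:
    """Estimate age in days from an integer month of age.

    Uses the midpoint of the month, except for children recorded as
    0 months, where 0.75 months is used as described above.
    """
    months = 0.75 if agemos == 0 else agemos + 0.5
    return round(months * DAYS_PER_MONTH)
```

Under these assumptions a child recorded as 0 months maps to 23 days, and a measurement equal to the median `M` always yields an SD score of 0.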
+Weights were supplemented with a random sample of birthweights from NCHS's [Vital
+Statistics Natality Birth
+Data](https://www.nber.org/research/data/vital-statistics-natality-birth-data) for 2018. These had sample weights assigned so that the sum of the sample weights for the
+sample equalled the sum of the sample weights for each month for infants in NHANES, as
+NHANES is a multi-stage complex survey. The reference data was then smoothed using the
+`svysmooth()` function in the R
+[`survey`](https://cran.r-project.org/web/packages/survey/index.html) package to
+estimate the weight and height SD scores for each day up to 7,305 days, with a
+bandwidth chosen to balance between over- and under-fitting, and interpolation between
+the estimates from this function was used to obtain an estimate for each day of age.
+Predictions from a regression model fit to smoothed height SDs between 23 and 365 days
+(the youngest child in NHANES had an estimated age in days of 23) were used to extend
+smoothed height SD scores to children between 1 and 22 days of age.
 
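The step from smoothed estimates to one value per day of age can be sketched as follows. The text does not say which interpolation method was used, so linear interpolation here is an assumption, and `interpolate_daily` is a hypothetical name. The smoothing itself (`svysmooth()` in R) and the regression used for the youngest ages are not reproduced.

```python
def interpolate_daily(knots):
    """Linearly interpolate (ageday, sd_score) pairs to every integer day.

    `knots` must be sorted by strictly increasing integer ageday.
    Returns a dict mapping each day from the first to the last knot
    (inclusive) to an interpolated SD score.
    """
    if not knots:
        raise ValueError("at least one knot is required")
    daily = {}
    for (d0, v0), (d1, v1) in zip(knots, knots[1:]):
        if d1 <= d0:
            raise ValueError("agedays must be strictly increasing")
        for day in range(d0, d1):
            frac = (day - d0) / (d1 - d0)
            daily[day] = v0 + frac * (v1 - v0)
    # The loop stops short of the final knot, so add it explicitly.
    last_day, last_value = knots[-1]
    daily[last_day] = last_value
    return daily
```

For example, estimates at days 23, 27, and 29 would produce seven daily values, with each intermediate day lying on the straight line between its neighbouring estimates.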
 ## <a name="changes"></a>Changes
 
-For a detailed history of released versions, see `NEWS.md`.
+For a detailed history of released versions, see `NEWS.md`. Tagged releases, starting