-
Notifications
You must be signed in to change notification settings - Fork 322
Adding InitializationTelemetry - e.g. guard rails reporting #7287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Under an unreasonably strict SecurityManager, even reading core properties can be blocked, so moving checks into the try.
Auto-injection telemetry requires different notifications for each case
Committing early, so I can share the branch
Adding InitializationTelemetry to the bootstrap InitializationTelemetry is used to report guard rail check failures via a helper process installed by single step Most of the real changes are in AgentBootstrap with a lot of attention towards working with restrictive SecurityManagers. With the most restrictive SecurityManagers access to env & properties are blocked, so SecurityPermission & Throwable checks are usually liberally throughout. For constructing the JSON that goes to the helper process, we don't want to introduce dependencies into the bootstrap. So I've implemented a small JsonBuffer class modeled loosely after GSON's JsonWriter.
dd-java-agent/src/main/java/datadog/trace/bootstrap/AgentBootstrap.java
Outdated
Show resolved
Hide resolved
dd-java-agent/src/main/java/datadog/trace/bootstrap/AgentBootstrap.java
Outdated
Show resolved
Hide resolved
dd-java-agent/src/main/java/datadog/trace/bootstrap/AgentBootstrap.java
Outdated
Show resolved
Hide resolved
dd-java-agent/src/main/java/datadog/trace/bootstrap/AgentBootstrap.java
Outdated
Show resolved
Hide resolved
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 15 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.048 s) : 0, 1048480
Total [baseline] (8.514 s) : 0, 8514043
Agent [candidate] (1.045 s) : 0, 1045417
Total [candidate] (8.541 s) : 0, 8540898
section iast
Agent [baseline] (1.171 s) : 0, 1171235
Total [baseline] (8.961 s) : 0, 8960865
Agent [candidate] (1.176 s) : 0, 1175992
Total [candidate] (8.973 s) : 0, 8972714
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.178 s) : 0, 1178384
Total [baseline] (8.983 s) : 0, 8982520
Agent [candidate] (1.17 s) : 0, 1170064
Total [candidate] (8.929 s) : 0, 8929213
section iast_TELEMETRY_OFF
Agent [baseline] (1.176 s) : 0, 1176264
Total [baseline] (8.975 s) : 0, 8975043
Agent [candidate] (1.177 s) : 0, 1176895
Total [candidate] (9.0 s) : 0, 9000272
gantt
title insecure-bank - break down per module: candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (669.346 ms) : 0, 669346
BytebuddyAgent [candidate] (666.905 ms) : 0, 666905
GlobalTracer [baseline] (305.893 ms) : 0, 305893
GlobalTracer [candidate] (305.693 ms) : 0, 305693
AppSec [baseline] (51.512 ms) : 0, 51512
AppSec [candidate] (51.303 ms) : 0, 51303
Remote Config [baseline] (673.054 µs) : 0, 673
Remote Config [candidate] (660.96 µs) : 0, 661
Telemetry [baseline] (7.608 ms) : 0, 7608
Telemetry [candidate] (7.471 ms) : 0, 7471
section iast
BytebuddyAgent [baseline] (778.347 ms) : 0, 778347
BytebuddyAgent [candidate] (780.597 ms) : 0, 780597
GlobalTracer [baseline] (295.415 ms) : 0, 295415
GlobalTracer [candidate] (296.917 ms) : 0, 296917
AppSec [baseline] (51.277 ms) : 0, 51277
AppSec [candidate] (49.24 ms) : 0, 49240
IAST [baseline] (22.594 ms) : 0, 22594
IAST [candidate] (27.243 ms) : 0, 27243
Remote Config [baseline] (599.418 µs) : 0, 599
Remote Config [candidate] (588.972 µs) : 0, 589
Telemetry [baseline] (9.566 ms) : 0, 9566
Telemetry [candidate] (7.919 ms) : 0, 7919
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (782.48 ms) : 0, 782480
BytebuddyAgent [candidate] (777.8 ms) : 0, 777800
GlobalTracer [baseline] (297.673 ms) : 0, 297673
GlobalTracer [candidate] (296.289 ms) : 0, 296289
AppSec [baseline] (50.047 ms) : 0, 50047
AppSec [candidate] (48.261 ms) : 0, 48261
IAST [baseline] (22.075 ms) : 0, 22075
IAST [candidate] (24.909 ms) : 0, 24909
Remote Config [baseline] (610.774 µs) : 0, 611
Remote Config [candidate] (598.671 µs) : 0, 599
Telemetry [baseline] (11.958 ms) : 0, 11958
Telemetry [candidate] (8.728 ms) : 0, 8728
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (781.797 ms) : 0, 781797
BytebuddyAgent [candidate] (780.698 ms) : 0, 780698
GlobalTracer [baseline] (298.824 ms) : 0, 298824
GlobalTracer [candidate] (298.18 ms) : 0, 298180
AppSec [baseline] (49.579 ms) : 0, 49579
AppSec [candidate] (52.02 ms) : 0, 52020
IAST [baseline] (23.216 ms) : 0, 23216
IAST [candidate] (23.201 ms) : 0, 23201
Remote Config [baseline] (586.7 µs) : 0, 587
Remote Config [candidate] (599.911 µs) : 0, 600
Telemetry [baseline] (8.72 ms) : 0, 8720
Telemetry [candidate] (8.641 ms) : 0, 8641
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.056 s) : 0, 1055895
Total [baseline] (10.364 s) : 0, 10364411
Agent [candidate] (1.056 s) : 0, 1055750
Total [candidate] (10.404 s) : 0, 10404176
section appsec
Agent [baseline] (1.171 s) : 0, 1170596
Total [baseline] (10.421 s) : 0, 10421251
Agent [candidate] (1.172 s) : 0, 1172178
Total [candidate] (10.497 s) : 0, 10497078
section iast
Agent [baseline] (1.179 s) : 0, 1179248
Total [baseline] (10.842 s) : 0, 10841674
Agent [candidate] (1.181 s) : 0, 1181425
Total [candidate] (10.874 s) : 0, 10873508
section profiling
Agent [baseline] (1.261 s) : 0, 1260683
Total [baseline] (10.66 s) : 0, 10660170
Agent [candidate] (1.258 s) : 0, 1257614
Total [candidate] (10.666 s) : 0, 10665634
gantt
title petclinic - break down per module: candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (674.17 ms) : 0, 674170
BytebuddyAgent [candidate] (674.01 ms) : 0, 674010
GlobalTracer [baseline] (308.362 ms) : 0, 308362
GlobalTracer [candidate] (308.659 ms) : 0, 308659
AppSec [baseline] (51.66 ms) : 0, 51660
AppSec [candidate] (51.472 ms) : 0, 51472
Remote Config [baseline] (669.678 µs) : 0, 670
Remote Config [candidate] (677.575 µs) : 0, 678
Telemetry [baseline] (7.518 ms) : 0, 7518
Telemetry [candidate] (7.402 ms) : 0, 7402
section appsec
BytebuddyAgent [baseline] (678.718 ms) : 0, 678718
BytebuddyAgent [candidate] (680.933 ms) : 0, 680933
GlobalTracer [baseline] (300.059 ms) : 0, 300059
GlobalTracer [candidate] (300.875 ms) : 0, 300875
AppSec [baseline] (157.098 ms) : 0, 157098
AppSec [candidate] (158.659 ms) : 0, 158659
Remote Config [baseline] (607.48 µs) : 0, 607
Remote Config [candidate] (623.333 µs) : 0, 623
Telemetry [baseline] (9.935 ms) : 0, 9935
Telemetry [candidate] (8.414 ms) : 0, 8414
IAST [baseline] (21.498 ms) : 0, 21498
IAST [candidate] (18.29 ms) : 0, 18290
section iast
BytebuddyAgent [baseline] (783.275 ms) : 0, 783275
BytebuddyAgent [candidate] (784.048 ms) : 0, 784048
GlobalTracer [baseline] (297.429 ms) : 0, 297429
GlobalTracer [candidate] (298.447 ms) : 0, 298447
AppSec [baseline] (49.982 ms) : 0, 49982
AppSec [candidate] (53.162 ms) : 0, 53162
Remote Config [baseline] (572.185 µs) : 0, 572
Remote Config [candidate] (594.997 µs) : 0, 595
Telemetry [baseline] (9.422 ms) : 0, 9422
Telemetry [candidate] (8.961 ms) : 0, 8961
IAST [baseline] (25.038 ms) : 0, 25038
IAST [candidate] (22.63 ms) : 0, 22630
section profiling
ProfilingAgent [baseline] (95.267 ms) : 0, 95267
ProfilingAgent [candidate] (97.177 ms) : 0, 97177
BytebuddyAgent [baseline] (673.612 ms) : 0, 673612
BytebuddyAgent [candidate] (668.706 ms) : 0, 668706
GlobalTracer [baseline] (392.794 ms) : 0, 392794
GlobalTracer [candidate] (393.162 ms) : 0, 393162
AppSec [baseline] (53.178 ms) : 0, 53178
AppSec [candidate] (52.892 ms) : 0, 52892
Remote Config [baseline] (678.729 µs) : 0, 679
Remote Config [candidate] (692.804 µs) : 0, 693
Telemetry [baseline] (7.503 ms) : 0, 7503
Telemetry [candidate] (7.423 ms) : 0, 7423
Profiling [baseline] (95.291 ms) : 0, 95291
Profiling [candidate] (97.201 ms) : 0, 97201
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 18 unstable metrics. Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section baseline
no_agent (369.022 µs) : 349, 389
. : milestone, 369,
iast (484.788 µs) : 462, 508
. : milestone, 485,
iast_FULL (554.528 µs) : 533, 577
. : milestone, 555,
iast_GLOBAL (506.438 µs) : 484, 529
. : milestone, 506,
iast_HARDCODED_SECRET_DISABLED (494.913 µs) : 472, 518
. : milestone, 495,
iast_INACTIVE (443.625 µs) : 423, 464
. : milestone, 444,
iast_TELEMETRY_OFF (476.454 µs) : 454, 498
. : milestone, 476,
tracing (445.267 µs) : 425, 466
. : milestone, 445,
section candidate
no_agent (368.696 µs) : 349, 389
. : milestone, 369,
iast (486.704 µs) : 464, 509
. : milestone, 487,
iast_FULL (546.537 µs) : 525, 568
. : milestone, 547,
iast_GLOBAL (521.291 µs) : 499, 544
. : milestone, 521,
iast_HARDCODED_SECRET_DISABLED (480.521 µs) : 459, 502
. : milestone, 481,
iast_INACTIVE (444.036 µs) : 423, 465
. : milestone, 444,
iast_TELEMETRY_OFF (475.523 µs) : 453, 498
. : milestone, 476,
tracing (439.437 µs) : 419, 460
. : milestone, 439,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section baseline
no_agent (1.343 ms) : 1324, 1362
. : milestone, 1343,
appsec (1.715 ms) : 1690, 1739
. : milestone, 1715,
appsec_no_iast (1.704 ms) : 1679, 1729
. : milestone, 1704,
iast (1.48 ms) : 1457, 1503
. : milestone, 1480,
profiling (1.539 ms) : 1513, 1565
. : milestone, 1539,
tracing (1.476 ms) : 1452, 1500
. : milestone, 1476,
section candidate
no_agent (1.353 ms) : 1332, 1373
. : milestone, 1353,
appsec (1.74 ms) : 1716, 1763
. : milestone, 1740,
appsec_no_iast (1.713 ms) : 1688, 1739
. : milestone, 1713,
iast (1.485 ms) : 1462, 1508
. : milestone, 1485,
profiling (1.53 ms) : 1505, 1555
. : milestone, 1530,
tracing (1.466 ms) : 1442, 1490
. : milestone, 1466,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section baseline
no_agent (15.447 s) : 15447000, 15447000
. : milestone, 15447000,
appsec (14.919 s) : 14919000, 14919000
. : milestone, 14919000,
iast (18.601 s) : 18601000, 18601000
. : milestone, 18601000,
iast_GLOBAL (17.909 s) : 17909000, 17909000
. : milestone, 17909000,
profiling (15.426 s) : 15426000, 15426000
. : milestone, 15426000,
tracing (14.922 s) : 14922000, 14922000
. : milestone, 14922000,
section candidate
no_agent (15.43 s) : 15430000, 15430000
. : milestone, 15430000,
appsec (14.902 s) : 14902000, 14902000
. : milestone, 14902000,
iast (18.993 s) : 18993000, 18993000
. : milestone, 18993000,
iast_GLOBAL (17.733 s) : 17733000, 17733000
. : milestone, 17733000,
profiling (15.105 s) : 15105000, 15105000
. : milestone, 15105000,
tracing (14.968 s) : 14968000, 14968000
. : milestone, 14968000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.39.0-SNAPSHOT~e1e17fca28, baseline=1.39.0-SNAPSHOT~a4f73e2ed0
dateFormat X
axisFormat %s
section baseline
no_agent (1.466 ms) : 1455, 1478
. : milestone, 1466,
appsec (2.225 ms) : 2190, 2260
. : milestone, 2225,
iast (1.991 ms) : 1948, 2034
. : milestone, 1991,
iast_GLOBAL (2.032 ms) : 1988, 2076
. : milestone, 2032,
profiling (1.882 ms) : 1846, 1918
. : milestone, 1882,
tracing (1.841 ms) : 1808, 1873
. : milestone, 1841,
section candidate
no_agent (1.462 ms) : 1451, 1474
. : milestone, 1462,
appsec (2.236 ms) : 2200, 2271
. : milestone, 2236,
iast (1.99 ms) : 1947, 2033
. : milestone, 1990,
iast_GLOBAL (2.033 ms) : 1989, 2077
. : milestone, 2033,
profiling (1.864 ms) : 1830, 1899
. : milestone, 1864,
tracing (1.847 ms) : 1814, 1880
. : milestone, 1847,
|
Adding methods that return File instead of URL - avoids getting path then converting back to a File to delete Added convenience method that takes File for classpath - avoids conversions
Added SystemUtils which has getPropertyOrDefault, tryGetProperty, getEnvOrDefault, tryGetEnv These helpers handled SecurityException and return null or default value. Similarly added tryGetVersion and getVersionOrDefault to AgentJar
Adding tests to make sure that AgentBootstrap doesn't fail even with a very strict SecurityManager These tests are implemented via SecurityManagerCheck which is a simple main designed to be used with a javaagent and a TestSecurityManager TestSecurityManager is a SecurityManager with a minimal set of permissions for the javaagent to run normally TestSecurityManager has various sub-classes that restrict access to different things used by the javaagent. The aim is to make that under no circumstance does the javaagent fail in a way that prevents the program from running normally. More tests to follow
Fixing oversight in telemetry forwarder code - caught by adding more tests
I neglected to escape \ previously.
Rounded out InitializationTelemetryTest to cover... - complete failure case - normal start-up case - different failure cases - both complete & incomplete ...including unable to send telemetry because the forwarder is blocked Fixed problem with major failure inside Agent being improperly marked as complete
Pushing InitializationTelemetry down into Agent & AgentCLI - introduced InitializationTelemetry next to Agent & AgentCLI - wraps the bootstrap initialization telemetry via MethodHandles - changed InitializationTelemetry to BootstrapInitializationTelemetry to disambiguate - altered (Bootstrap)InitializationTelemetry to track completion status - allows callers or InitializationTelemetry to indicate completeness without having to return status everywhere - removed prior returning of booleans in favor of passing (Bootstrap)InitializationTelemetry through to various init routines
…Dog/dd-trace-java into dougqh/guardrails-telemetry
…Dog/dd-trace-java into dougqh/guardrails-telemetry
PerfectSlayer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an issue with telemetry initialization
| if (bootstrapInitTelemetry == null) { | ||
| return new BootstrapProxy(bootstrapInitTelemetry); | ||
| } else { | ||
| return InitializationTelemetry.noneInstance(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to invert the if cases. BootstrapProxy is always created with a null instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I need to figure out a way to test this better, too.
| */ | ||
| public abstract class InitializationTelemetry { | ||
| /** Returns a proxy around a BoostrapInitializationTelemetry object */ | ||
| public static final InitializationTelemetry proxy(Object bootstrapInitTelemetry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final InitializationTelemetry proxy(Object bootstrapInitTelemetry) { | |
| public static InitializationTelemetry proxy(Object bootstrapInitTelemetry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep final almost everywhere, but I decided to remove the final where it is redundant because the class is final.
In this case, the class isn't final. So I'm keeping the final because I don't want someone to override this method on a child class.
| } | ||
|
|
||
| /** Returns a singleton of the none InitializationTelemetry */ | ||
| public static final InitializationTelemetry noneInstance() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final InitializationTelemetry noneInstance() { | |
| public static InitializationTelemetry noneInstance() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually have noop rather than none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
| final BootstrapInitializationTelemetry initTelemetry, | ||
| final String agentArgs, | ||
| final Instrumentation inst) | ||
| throws IOException, URISyntaxException, ClassNotFoundException, NoSuchMethodException, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClassNotFoundException seems never thrown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned up the throws clause. I removed ClassNotFoundException and switched to ReflectiveOperationException.
I'm not really sure what I want this clause to be. I let the refactor define it, but I still feel it is a bit messy.
|
|
||
| /** Telemetry class used to relay information about tracer activation. */ | ||
| public abstract class BootstrapInitializationTelemetry { | ||
| public static final BootstrapInitializationTelemetry noneInstance() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final BootstrapInitializationTelemetry noneInstance() { | |
| public static BootstrapInitializationTelemetry noneInstance() { |
| public class SystemUtils { | ||
| private SystemUtils() {} | ||
|
|
||
| public static final String tryGetEnv(String envVar) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final String tryGetEnv(String envVar) { | |
| public static String tryGetEnv(String envVar) { |
| return getEnvOrDefault(envVar, null); | ||
| } | ||
|
|
||
| public static final String getEnvOrDefault(String envVar, String defaultValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final String getEnvOrDefault(String envVar, String defaultValue) { | |
| public static String getEnvOrDefault(String envVar, String defaultValue) { |
| } | ||
| } | ||
|
|
||
| public static final String tryGetProperty(String property) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final String tryGetProperty(String property) { | |
| public static String tryGetProperty(String property) { |
| } | ||
| } | ||
|
|
||
| public static final String getPropertyOrDefault(String property, String defaultValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final String getPropertyOrDefault(String property, String defaultValue) { | |
| public static String getPropertyOrDefault(String property, String defaultValue) { |
dd-java-agent/src/main/java/datadog/trace/bootstrap/SystemUtils.java
Outdated
Show resolved
Hide resolved
This reverts commit 13e5272.
Co-authored-by: Bruce Bujon <[email protected]>
…Dog/dd-trace-java into dougqh/guardrails-telemetry
Introducing Result class to replace just returning exitCode In subsequent change, Result class will also provide the forwarded JSON telemetry information
InitializationTelemetryCheck now sets up a forwarder script that captures the output to a file That output is then placed into the Result object in the new field telemetryJson InitializationTelemetryTest has update to spot check the results of the Result.telemetryJson
What Does This Do
Adding InitializationTelemetry for use by single step
Motivation
BootstrapInitializationTelemetry & InitializationTelemetry are used to report guard rail check failures via a helper process installed by single step
Additional Notes
Most of the real changes are in AgentBootstrap with a lot of attention towards working with restrictive SecurityManagers. With the most restrictive SecurityManagers access to env, properties, and process execution can be blocked.
To make dealing with SecurityExceptions less ugly, I've introduced a SystemUtils that helps with getProperty and getEnv.
But in many places, SecurityExceptions are still checked directly.
AgentBootstrap uses BootstrapInitializationTelemetry which records the gathered information into JSON buffers in memory and then sends to the helper process on finish.
For constructing the JSON that goes to the helper process, we don't want to introduce dependencies into the bootstrap. So I've implemented a small JsonBuffer class modeled loosely after GSON's JsonWriter.
To provide telemetry inside Agent, Agent is passed a BootstrapInitializationTelemetry by AgentBootstrap. The BootstrapInitializationTelemetry is wrapped by an InitializationTelemetry that uses MethodHandles to cross the ClassLoader divide.
APMLP-109