Agent Developer Mode
The Agent Developer Mode allows the user to collect a wide array of metrics concerning the performance of the agent itself. It provides visibility into bottlenecks when writing an AgentCheck and when making changes to the collector core.
Developer mode can be enabled by adding the following line to your datadog.conf file:
developer_mode: yes
Be sure to restart the agent after modifying the configuration file.
The datadog.conf setting can also be overridden with the --profile command-line flag (e.g. python agent.py start --profile). When in developer mode, the following functionality is enabled in the agent:
- Metrics for collection time, emit time and CPU used are sent to Datadog on every collector run.
- The collector loop is profiled using cProfile. At an interval specified by collector_profile_interval in the configuration file, the pstats output for the collector loop is dumped to log.debug as well as to the file ./collector-stats.dmp.
- An additional check, agent_metrics, is run at the end of every collector loop. This check collects a variety of metrics about the collector's performance, and can be configured with the same interface used to configure regular AgentChecks. Source code for this check can be found under checks.d/agent_metrics.py
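The interval-based profiling described above can be approximated with the standard library alone. The following is a minimal sketch, not the agent's implementation: run_collector_once is a hypothetical stand-in for one collector run, and the interval is assumed to be counted in collector runs, which the text above does not specify.

```python
import cProfile
import io
import pstats


def run_collector_once():
    """Hypothetical stand-in for one pass of the collector loop."""
    return sum(i * i for i in range(10000))


def profiled_loop(runs, profile_interval):
    """Profile the loop and return one pstats report per interval.

    Assumption: the interval is counted in collector runs.
    """
    reports = []
    profiler = cProfile.Profile()
    profiler.enable()
    for run in range(1, runs + 1):
        run_collector_once()
        if run % profile_interval == 0:
            profiler.disable()
            buf = io.StringIO()
            stats = pstats.Stats(profiler, stream=buf)
            stats.sort_stats("cumulative").print_stats()
            # The agent sends this text to log.debug and to a dump file;
            # the sketch only collects it in memory.
            reports.append(buf.getvalue())
            # Start a fresh profiler for the next interval.
            profiler = cProfile.Profile()
            profiler.enable()
    profiler.disable()
    return reports


reports = profiled_loop(runs=6, profile_interval=3)
```

Each entry in reports is the text that pstats would print for one interval; writing it to a logger or a file is left out to keep the sketch free of side effects.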
Here is an example configuration for the agent_metrics check:
init_config:
    process_metrics:
        - name: get_memory_info
          type: gauge
          active: yes
        - name: get_io_counters
          type: rate
          active: yes
        - name: get_connections
          type: gauge
          active: no
instances:
    [{}]
Each element in the process_metrics list represents a single psutil.Process method that will be executed against the running collector process. The name field specifies the name of the method, the type field specifies the metric type (currently only gauge and rate are supported), and the active field is a utility flag to activate or deactivate certain method calls during the check. Note that the method specified in name is executed only when:
- The method is available on the psutil.Process class as of psutil==2.1.1
- The underlying OS supports the execution of that method (e.g. get_io_counters is not available for OS X processes)
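The two conditions above amount to a guarded method lookup followed by a guarded call. A minimal sketch of that pattern follows. FakeProcess is a hypothetical stand-in for psutil.Process so the example has no dependencies, and collect is an illustrative helper, not a function from the agent.

```python
import logging

log = logging.getLogger("agent_metrics_sketch")


class FakeProcess(object):
    """Hypothetical stand-in for psutil.Process (no dependency needed)."""

    def get_memory_info(self):
        return {"rss": 1024, "vms": 4096}

    def get_io_counters(self):
        # Simulates a platform where the call is not supported.
        raise NotImplementedError("not available on this OS")


def collect(process, process_metrics):
    """Call each active method, skipping ones that are missing or fail."""
    results = {}
    for options in process_metrics:
        name = options["name"]
        if not options.get("active", False):
            continue  # deactivated in the configuration
        method = getattr(process, name, None)
        if method is None:
            log.warning("Method %s is not available on this process class", name)
            continue
        try:
            results[name] = (options["type"], method())
        except Exception as exc:
            # Log and move on so one unsupported method does not stop the check.
            log.warning("Could not execute %s: %s", name, exc)
    return results


config = [
    {"name": "get_memory_info", "type": "gauge", "active": True},
    {"name": "get_io_counters", "type": "rate", "active": True},
    {"name": "get_connections", "type": "gauge", "active": False},
    {"name": "get_not_a_method", "type": "gauge", "active": True},
]
collected = collect(FakeProcess(), config)
```

Only get_memory_info produces a result here: get_io_counters fails at call time, get_connections is inactive, and get_not_a_method (a made-up name) does not exist on the class.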
If the agent_metrics check cannot execute a particular method, it logs a warning and continues with the remaining methods. For debugging, the list of metrics collected in this check is available in the log (grep for AGENT STATS).
Metrics collected via the psutil methods are parsed and aggregated in a namespace derived from the method name and its output. For example, get_memory_info is parsed to datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms. The logic for this parsing lives here and here. Once computed, these metrics are aggregated and forwarded to Datadog as with any other AgentCheck.
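One way to picture the naming scheme is to strip the get_ prefix from the method name and append each field of the returned named tuple. The sketch below illustrates that idea under stated assumptions: metric_names and MemInfo are hypothetical names, and the handling of outputs that are not named tuples is a guess rather than the agent's documented behavior.

```python
from collections import namedtuple

PREFIX = "datadog.agent.collector"


def metric_names(method_name, output):
    """Map a method name and its output to {metric_name: value}."""
    if method_name.startswith("get_"):
        base = method_name[len("get_"):]
    else:
        base = method_name
    if hasattr(output, "_asdict"):
        # Named tuple: one metric per field.
        return {
            "%s.%s.%s" % (PREFIX, base, field): value
            for field, value in output._asdict().items()
        }
    # Assumption: any other output is reported as a single metric.
    return {"%s.%s" % (PREFIX, base): output}


# Hypothetical stand-in for the named tuple psutil returns.
MemInfo = namedtuple("MemInfo", ["rss", "vms"])
metrics = metric_names("get_memory_info", MemInfo(rss=1024, vms=4096))
```

With this input, metrics holds the two names given in the text, datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms, mapped to the sample values.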
It is sometimes useful to profile individual checks to spot bottlenecks and critical paths in agent performance. When used with agent.py check, the --profile flag dumps profiling information to stdout. Presently this consists of the following:
- Check runtime
- Memory use and Disk I/O if available
- Pstats output restricted to 20 calls.
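The runtime and restricted pstats items in this list can be reproduced for any callable with the standard library. This is a rough sketch, not the code behind the --profile flag: run_check is a hypothetical stand-in for a single check run, and memory and disk I/O figures are omitted because they require psutil.

```python
import cProfile
import io
import pstats
import time


def run_check():
    """Hypothetical stand-in for a single AgentCheck run."""
    return sorted(str(i) for i in range(5000))


def profile_check(check, max_calls=20):
    """Return (runtime in seconds, pstats text limited to max_calls entries)."""
    profiler = cProfile.Profile()
    start = time.time()
    profiler.enable()
    check()
    profiler.disable()
    runtime = time.time() - start
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # print_stats(n) restricts the report to the first n entries.
    stats.sort_stats("cumulative").print_stats(max_calls)
    return runtime, buf.getvalue()


runtime, report = profile_check(run_check)
```

Sorting by cumulative time before restricting the output keeps the most expensive call paths in the report; the sort order used by the agent is not stated in the text above.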
Here is an example of what you see when profiling the network check