aws-glue-alpha: not possible to disable metrics on glue jobs #35149

@pepdekpd

Describe the bug

In the various Glue Spark job constructs, the argument --enable-metrics is always passed.
According to the Glue documentation, however, passing this argument enables metrics regardless of its value.
The relevant code in the construct:

    protected nonExecutableCommonArguments(props: SparkJobProps): {[key: string]: string} {
      // Enable CloudWatch metrics and continuous logging by default as a best practice
      const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging);
      const profilingMetricsArgs = { '--enable-metrics': '' };
      const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' };

--enable-metrics
Enables the collection of metrics for job profiling for this job run. These metrics are available on the AWS Glue console and the Amazon CloudWatch console. The value of this parameter is not relevant. To enable this feature, you can provide this parameter with any value, but true is recommended for clarity. To disable this feature, remove this parameter from your job configuration.
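
To make the consequence of that wording concrete, here is a minimal sketch of the presence-based semantics as I read them from the documentation quoted above. The function name `profiling_metrics_enabled` is hypothetical and only models the documented behaviour; it is not part of Glue or the CDK.

```python
METRICS_FLAG = "--enable-metrics"


def profiling_metrics_enabled(default_arguments: dict[str, str]) -> bool:
    """Model of the documented rule: only the presence of the key matters."""
    # The value is deliberately ignored, as the documentation says it is not relevant.
    return METRICS_FLAG in default_arguments


# The empty string emitted by the construct still counts as enabled.
print(profiling_metrics_enabled({"--enable-metrics": ""}))       # True
# Setting the value to "false" would not help either.
print(profiling_metrics_enabled({"--enable-metrics": "false"}))  # True
# Only leaving the key out disables the feature.
print(profiling_metrics_enabled({"--job-language": "python"}))   # False
```

Under this reading, overriding the value through the job's default arguments cannot switch the feature off; the key has to be absent from the job configuration.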

As these metrics seem to be part of the custom CloudWatch namespace "Glue", we incur costs under the usage type "EU-CW:MetricMonitorUsage".
We would therefore like to disable this metric collection and see whether that lets us control our CloudWatch costs.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Library Version

No response

Expected Behavior

There should be a way to disable metric collection on Glue Spark jobs.

Current Behavior

Metrics are always collected, because --enable-metrics is always added to the job's default arguments.

Reproduction Steps

```python
from aws_cdk import aws_glue_alpha as glue_alpha, aws_iam as iam, Stack
from constructs import Construct


class BugStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Role assumed by the Glue service when running the job
        glue_job_role = iam.Role(
            self,
            "glue-job-role",
            assumed_by=iam.CompositePrincipal(
                iam.ServicePrincipal("glue.amazonaws.com"),
            ),
        )

        # No metrics-related option is set here, yet --enable-metrics
        # still ends up in the synthesized DefaultArguments
        self.glue_job = glue_alpha.PySparkEtlJob(
            scope=self,
            id="blabla",
            job_name="blabla-job",
            glue_version=glue_alpha.GlueVersion.V5_0,
            script=glue_alpha.Code.from_asset(path="requirements.txt"),
            max_concurrent_runs=1,
            role=glue_job_role,
            max_retries=0,
            number_of_workers=2,
        )
```

If you synthesize this stack, the CloudFormation template contains:

    "blablaE9B37712": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "PythonVersion": "3",
          "ScriptLocation": {
            "Fn::Join": [
              "",
              [
                "s3://",
                { "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" },
                "/5bc8938db6aed20519ef1629110549ff89b31be88860574351eed55b96b4a3fc.txt"
              ]
            ]
          }
        },
        "DefaultArguments": {
          "--job-language": "python",
          "--enable-continuous-cloudwatch-log": "true",
          "--enable-metrics": "",
          "--enable-observability-metrics": "true"
        },
        "ExecutionProperty": { "MaxConcurrentRuns": 1 },
        "GlueVersion": "5.0",
        "JobRunQueuingEnabled": false,
        "MaxRetries": 0,
        "Name": "blabla-job",
        "NumberOfWorkers": 2,
        "Role": { "Fn::GetAtt": [ "gluejobroleAAC64F87", "Arn" ] },
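
To illustrate the outcome I am after, here is a rough sketch that post-processes a synthesized template (loaded as a plain dict) and drops --enable-metrics from every AWS::Glue::Job. The helper name `strip_profiling_metrics` and the trimmed sample template are hypothetical and only mirror the output above; this is not an existing CDK feature. Inside a CDK app, an escape hatch such as `add_property_deletion_override` on the underlying CfnJob might achieve the same result, but I have not verified that for this construct.

```python
import copy

METRICS_FLAG = "--enable-metrics"


def strip_profiling_metrics(template: dict) -> tuple[dict, list[str]]:
    """Return a patched copy of the template and the logical IDs that were changed."""
    patched = copy.deepcopy(template)  # leave the caller's template untouched
    changed = []
    for logical_id, resource in patched.get("Resources", {}).items():
        if resource.get("Type") != "AWS::Glue::Job":
            continue
        args = resource.get("Properties", {}).get("DefaultArguments")
        # Skip jobs without literal default arguments (e.g. an intrinsic function)
        if isinstance(args, dict) and METRICS_FLAG in args:
            del args[METRICS_FLAG]
            changed.append(logical_id)
    return patched, changed


# Trimmed-down sample based on the synthesized output shown above
template = {
    "Resources": {
        "blablaE9B37712": {
            "Type": "AWS::Glue::Job",
            "Properties": {
                "DefaultArguments": {
                    "--job-language": "python",
                    "--enable-continuous-cloudwatch-log": "true",
                    "--enable-metrics": "",
                    "--enable-observability-metrics": "true",
                },
            },
        },
    },
}

patched, changed = strip_profiling_metrics(template)
print(changed)
print(patched["Resources"]["blablaE9B37712"]["Properties"]["DefaultArguments"])
```

The sketch only removes --enable-metrics and leaves --enable-observability-metrics in place, since that is a separate argument.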

Possible Solution

No response

Additional Information/Context

No response

AWS CDK Library version (aws-cdk-lib)

2.208.0

AWS CDK CLI version

2.1023.0

Node.js Version

22.12.0

OS

ubuntu

Language

Python

Language Version

3.12

Other information

No response

Metadata

Assignees

No one assigned

    Labels

    @aws-cdk/aws-glue (Related to AWS Glue), effort/medium (Medium work item – several days of effort), feature-request (A feature should be added or improved.), p2
