
[BUG] java.lang.ArrayIndexOutOfBoundsException on multi-node cluster run #2278

@bjm88620

Description


SynapseML version

com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3

System information

  • Language version (e.g. python 3.8, scala 2.12): python 3.9
  • Spark Version (e.g. 3.2.3): 3.3.2
  • Spark Platform (e.g. Synapse, Databricks): Databricks

Describe the problem

I have a LightGBM fit job that runs in a for-loop for rolling back-validation.
The job failed on a multi-node cluster with a Connection Refused error in the log. When I checked the failed tasks, I found that an executor had failed with java.lang.ArrayIndexOutOfBoundsException, and that failure caused the Connection Refused error.

Meanwhile, the same job runs on a single-node cluster without any issue.

The DataFrame passed to the model has around 48,000 records, partitioned as follows:

Partition 0 has 19000 records
Partition 1 has 18000 records
Partition 2 has 7000 records
Partition 3 has 4000 records

The issue is not fixed by df.repartition(5).
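As a quick arithmetic check of the layout above, the four partitions sum to the ~48,000 records, and the largest holds 4.75 times as many rows as the smallest. In PySpark, per-partition counts like these can typically be obtained with df.groupBy(spark_partition_id()).count(); this sketch works only from the reported numbers, and skew_summary is a hypothetical helper. The balanced_size value is the average partition size after df.repartition(5), which (with no columns given) spreads rows round-robin, so each partition should hold roughly 9,600 rows. Since the same failure was seen after repartitioning, this skew alone may not be the cause.

```python
# Per-partition record counts as reported in the issue.
PARTITION_COUNTS = {0: 19000, 1: 18000, 2: 7000, 3: 4000}


def skew_summary(counts: dict[int, int], target_partitions: int) -> dict:
    """Summarize partition skew (hypothetical helper, for illustration only)."""
    total = sum(counts.values())
    largest = max(counts.values())
    smallest = min(counts.values())
    return {
        "total": total,
        # Fraction of all records held by each partition, rounded to 3 places.
        "shares": {p: round(c / total, 3) for p, c in counts.items()},
        # How many times larger the biggest partition is than the smallest.
        "max_to_min": largest / smallest,
        # Average rows per partition if the data were split evenly.
        "balanced_size": total / target_partitions,
    }


if __name__ == "__main__":
    print(skew_summary(PARTITION_COUNTS, target_partitions=5))
```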

[Screenshot, 2024-09-04 21:16:29]

Code to reproduce issue

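from pyspark.sql import functions as sf

# train_merged_df, actual_merged_df and model are defined earlier in the job (not shown).
# This is one iteration of the rolling-validation loop.
max_base_date = '2024-09-01'
tmp_train_df = train_merged_df.where(sf.col('base_date') < max_base_date).cache()
tmp_actual_df = actual_merged_df.where(sf.col('base_date') < max_base_date).cache()
model.fit(tmp_train_df, tmp_actual_df)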
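The reproduction above shows a single iteration with max_base_date = '2024-09-01'. The surrounding rolling-validation loop might look roughly like the sketch below. It assumes monthly, first-of-month cutoffs (the issue does not say how cutoffs are chosen), uses plain lists of dicts in place of Spark DataFrames, and month_starts_back and rolling_folds are hypothetical helpers. The filter mirrors where(sf.col('base_date') < max_base_date): ISO 'YYYY-MM-DD' strings sort chronologically, so string comparison gives the same ordering as date comparison.

```python
from datetime import date
from typing import Iterator


def month_starts_back(last: str, n_folds: int) -> list[str]:
    """Return n_folds first-of-month cutoffs ending at the month of `last`, oldest first.

    Hypothetical helper: monthly cutoffs are an assumption, not taken from the issue.
    """
    d = date.fromisoformat(last)
    year, month = d.year, d.month
    cutoffs = []
    for _ in range(n_folds):
        cutoffs.append(date(year, month, 1).isoformat())
        # Step back one calendar month, wrapping January to the previous December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(cutoffs))


def rolling_folds(rows: list[dict], cutoffs: list[str]) -> Iterator[tuple[str, list[dict]]]:
    """Yield (cutoff, training rows) pairs, keeping rows with base_date < cutoff."""
    for cutoff in cutoffs:
        yield cutoff, [r for r in rows if r["base_date"] < cutoff]


if __name__ == "__main__":
    rows = [{"base_date": "2024-06-15"}, {"base_date": "2024-07-20"}, {"base_date": "2024-08-31"}]
    for cutoff, train in rolling_folds(rows, month_starts_back("2024-09-01", 3)):
        # In the real job, this is where the filtered DataFrames are cached and model.fit is called.
        print(cutoff, len(train))
```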

Other info / logs

No response

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations
