Cast column types with select() instead of withColumn() #396

rli602 · 2021-11-08T20:26:19Z

Issue #, if available: N/A

Description of changes:
We're seeing slowness when running the ColumnProfiler on large datasets. According to the PySpark documentation, calling withColumn() multiple times, for example in a for-loop, can cause performance issues; it suggests using select() with multiple columns at once instead. This change replaces the repeated withColumn() calls used to cast column types with a single select().

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

cast column type with select instead of withColumn

4eb9cae
