Replies: 2 comments
-
I think the misunderstanding here is that So if
then
The limit is at the front because of limit pushdown. And so it is not always possible to modify In your example, it is probably fine, but we would need to somehow save the plan and the shown output together, which we don't do for memory reasons. Wdyt?
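The limit pushdown point can be sketched with a toy model in plain Python. This is not Daft's implementation: `udf1`, `lazy_plan`, and this `show` are hypothetical stand-ins, and the only assumption is that a preview applies its row limit before the UDF runs.

```python
from itertools import islice

calls = []  # records every row the stand-in UDF actually processes

def udf1(x):
    # Hypothetical stand-in for a user UDF with an observable side effect.
    calls.append(x)
    return x * 2

def lazy_plan(rows):
    # A lazy "plan": nothing runs until rows are pulled from the generator.
    return (udf1(x) for x in rows)

def show(rows, n=2):
    # Toy show(): the limit sits in front, so only n rows reach the UDF.
    return list(islice(lazy_plan(rows), n))

data = [1, 2, 3, 4, 5]
preview = show(data)
```

In this sketch the preview covers only the first two rows, so it could not stand in for the full result unless the plan and the partial output were stored together.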
-
Thanks for your reply, @srilman. I understand and agree with your point. However, what I wanted to highlight is that in some cases, calling df.show() twice may trigger two executions of the UDF, which matters especially if the UDF has side effects. Algorithm developers often do not have much experience with how Daft (or other SQL-like engines) works internally, and this behavior can be confusing for them. If we don't have a good solution to this issue and decide to address it through better documentation and user guidance instead, I'm fine with that approach as well.
-
Suppose we have two UDFs, udf1 and udf2, and the following code:
In this case, udf1 will be triggered twice, which is not what the user expects.
We can work around this by using collect():
However, this requires the user to have experience with how Daft works, which can be challenging for algorithm developers.
In conclusion, users must understand how Daft works and what happens when calling show(). Otherwise, this behavior may confuse users.
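A minimal sketch of the behavior described in this post, written as a toy model in plain Python rather than real Daft code. `ToyLazyFrame` and `with_udf` are hypothetical names; only `show()` and `collect()` mirror the Daft method names, and their internals here are assumptions made for illustration.

```python
udf1_calls = []  # one entry per row the stand-in UDF processes

def udf1(x):
    # Hypothetical stand-in for a user UDF with a side effect.
    udf1_calls.append(x)
    return x + 1

class ToyLazyFrame:
    """Toy lazy frame: stores rows plus pending steps, runs them on demand."""

    def __init__(self, rows, steps=()):
        self._rows = list(rows)
        self._steps = tuple(steps)

    def with_udf(self, fn):
        # Only records the step; nothing executes yet.
        return ToyLazyFrame(self._rows, self._steps + (fn,))

    def _run(self, rows):
        out = []
        for row in rows:
            for fn in self._steps:
                row = fn(row)
            out.append(row)
        return out

    def show(self, n=8):
        # Re-runs the pending steps on every call; nothing is cached.
        return self._run(self._rows[:n])

    def collect(self):
        # Runs the steps once and returns a frame with no pending steps.
        return ToyLazyFrame(self._run(self._rows))

df = ToyLazyFrame([1, 2, 3]).with_udf(udf1)
df.show()
df.show()
calls_without_collect = len(udf1_calls)  # the UDF ran for each show()

udf1_calls.clear()
materialized = df.collect()
materialized.show()
materialized.show()
calls_with_collect = len(udf1_calls)  # the UDF ran only during collect()
```

In this toy model, two `show()` calls on the lazy frame run `udf1` over the rows twice, while calling `collect()` first runs it once and later previews read the materialized rows.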