Skip to content

Conversation

@gitmodimo
Copy link
Contributor

@gitmodimo gitmodimo commented Mar 5, 2025

Rationale for this change

Concurrent queue primitive used in acero is also useful for writing custom acero nodes. Especially the Backpressure version which is very sepcific to acero. Making this primitive public simplifies and point into right direction when implementing custom nodes with backpressure.

What changes are included in this PR?

Apparently only header name change is required for this header to be installed. No symbols are exported as the queues are templates.

Are these changes tested?

It works on my windows and linux build.

Are there any user-facing changes?

Queue definition is available in new installed header.

@gitmodimo gitmodimo requested a review from westonpace as a code owner March 5, 2025 09:55
@github-actions
Copy link

github-actions bot commented Mar 5, 2025

⚠️ GitHub issue #45673 has been automatically assigned in GitHub to PR creator.

Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 14, 2025
@zanmato1984
Copy link
Contributor

This PR fails on the lint of preventing inclusion of <mutex> in public headers. The lint rule was introduced in #2225 (https://issues.apache.org/jira/browse/ARROW-1722), likely due to the compatibility issue of <mutex> under C++/CLI.

Hi @kou @pitrou , do you known more details about the issue? And any idea how we can mitigate it? Thanks.

@kou kou changed the title GH-45673: Make ConcurrentQueue and BackpressureConcurrentQueue public interface GH-45673: [C++] Make ConcurrentQueue and BackpressureConcurrentQueue public interface Mar 14, 2025
@kou
Copy link
Member

kou commented Mar 14, 2025

I haven't used C++/CLI but it doesn't support <mutex>: #16761

Can we hide <mutex> into .cc? Or can we disable this when __cplusplus_cli is defined? #16761 (comment)

@kou
Copy link
Member

kou commented Mar 14, 2025

BTW, it seems that we can use nullptr with C++/CLI: https://learn.microsoft.com/en-us/cpp/extensions/nullptr-cpp-component-extensions?view=msvc-170

@zanmato1984
Copy link
Contributor

Thanks for the info.

Can we hide <mutex> into .cc? Or can we disable this when __cplusplus_cli is defined? #16761 (comment)

The classes being made public are templates so I'm afraid it won't be easy to hide <mutex> into .cc. I think it makes more sense to just exclude the classes for __cplusplus_cli. Hi @gitmodimo , could you make the change accordingly? Thanks.

@zanmato1984
Copy link
Contributor

I think it makes more sense to just exclude the classes for __cplusplus_cli. Hi @gitmodimo , could you make the change accordingly? Thanks.

Oops, just realized that this won't be easy too. The lint rule is doing hard text matching so it won't understand the semantic of <mutex> being excluded by #ifndef __cplusplus_cli.

@zanmato1984
Copy link
Contributor

--git a/cpp/src/arrow/acero/concurrent_queue_internal.h b/cpp/src/arrow/acero/concurrent_queue_internal.h
index a751db7026..40c7857d2a 100644
--- a/cpp/src/arrow/acero/concurrent_queue_internal.h
+++ b/cpp/src/arrow/acero/concurrent_queue_internal.h
@@ -20,7 +20,11 @@
 #include <condition_variable>
 #include <mutex>
 #include <queue>
+
 #include "arrow/acero/backpressure_handler.h"
+#include "arrow/util/macros.h"
+
+#include MUTEX
 
 namespace arrow::acero {
 
--git a/cpp/src/arrow/util/macros.h b/cpp/src/arrow/util/macros.h
index af29fd636b..f3b36495d8 100644
--- a/cpp/src/arrow/util/macros.h
+++ b/cpp/src/arrow/util/macros.h
@@ -129,6 +129,16 @@
 
 #endif  // ifndef NULLPTR
 
+#ifndef MUTEX
+
+#  ifdef __cplusplus_cli
+#    error "<mutex> not supported in C++/CLI"
+#  else
+#    define MUTEX <mutex>
+#  endif
+
+#endif  // ifndef MUTEX
+
 // ----------------------------------------------------------------------
 
 // clang-format off

Hi @kou , does the workaround above make sense?

@kou
Copy link
Member

kou commented Mar 14, 2025

Ah...

The workaround may not work because ConcurrentQueue class still uses std::mutex without #include <mutex>.

It may be better that we remove the lint check and add a CI job that uses C++/CLI. I'm not familiar with C++/CLI but it seems that we can use the COMMON_LANGUAGE_RUNTIME CMake target property: https://cmake.org/cmake/help/latest/prop_tgt/COMMON_LANGUAGE_RUNTIME.html

@pitrou
Copy link
Member

pitrou commented Mar 14, 2025

Does this imply we're promising to maintain these APIs and keep them compatible?

@zanmato1984
Copy link
Contributor

Does this imply we're promising to maintain these APIs and keep them compatible?

Yes. I think acero is designed to be customizable for users to implement their own plan nodes, most likely source and sink nodes. For source nodes, the concurrent queue can be a quite useful common utility.

@gitmodimo
Copy link
Contributor Author

Maybe I just use to arrow/util/mutex.h ?

@zanmato1984
Copy link
Contributor

Maybe I just use to arrow/util/mutex.h ?

Oh, nice finding! I think this concludes our discussion :)

@gitmodimo
Copy link
Contributor Author

I am not really sure what limiations c++/cli has. Is it safe to use std::condition_variable?

void WaitUntilNonEmpty(std::unique_lock<std::mutex>& lock) {
cond_.wait(lock, [&] { return !queue_.empty(); });
}

@pitrou
Copy link
Member

pitrou commented Mar 14, 2025

Let's be more radical: do we actually care about C++/CLI? We've never had any CI for it.

@gitmodimo
Copy link
Contributor Author

gitmodimo commented Mar 14, 2025

FWIW this works. Let me know which way to proceed.
https://github.com/gitmodimo/arrow/compare/PublicConcurrentQueue..PublicConcurrentQueueMutex
Had to add a bit clunky wait interface in Guard

@zanmato1984
Copy link
Contributor

I am not really sure what limiations c++/cli has. Is it safe to use std::condition_variable?

I guess not.

Let's be more radical: do we actually care about C++/CLI? We've never had any CI for it.

I'm not sure. Maybe @kou knows more? Or should we discuss it in ML?

@kou
Copy link
Member

kou commented Mar 15, 2025

@TobyShaw Are you still using Arrow with C++/CLI?

@TobyShaw
Copy link

TobyShaw commented Mar 15, 2025 via email

@kou
Copy link
Member

kou commented Mar 16, 2025

OK.
Let's drop support for C++/CLI: #45810

@gitmodimo
Copy link
Contributor Author

Upon further deliberation I have some doubts whether this should be public.
Correct me if I am wrong, but it seems to be kind of anit-pattern of acero used in asof_join and sorted_merge nodes tu use additional external thread to do the processing. The joining and and merging should ideally occur in InputReceived tasks.
In time series processing something like SerialSequencingQueue with backpressure would be better fit.

@zanmato1984
Copy link
Contributor

Upon further deliberation I have some doubts whether this should be public.

Correct me if I am wrong, but it seems to be kind of anit-pattern of acero used in asof_join and sorted_merge nodes tu use additional external thread to do the processing. The joining and and merging should ideally occur in InputReceived tasks.

In time series processing something like SerialSequencingQueue with backpressure would be better fit.

I agree that it seems a bit anti-pattern of asof join and sorted merge for using an external thread to do the processing. However imho that's independent of whether the concurrent queue can be useful enough for implementing customized nodes to be made public. If this queue is necessary for your node, then we can/should do it. Or if you feel it is more reasonable and available to utilize other queue implementation, then we can defer/close this pr.

No rush and no worry. Taking a step back of making something public is a good thing. And take your time to find the best way to implement your node. What do you think? @gitmodimo

@gitmodimo
Copy link
Contributor Author

No rush and no worry. Taking a step back of making something public is a good thing. And take your time to find the best way to implement your node. What do you think? @gitmodimo

Thank you for your patience. I think I will first try to implement my node using similar pattern as sink_node - it seems to be using pull based access pattern, like the one I need. Can we close this PR for now and open it at later time when/if needed?

@zanmato1984
Copy link
Contributor

Of course. I'm closing this as you currently wish. Feel free to reopen it once you change your mind. Thank you.

@zanmato1984 zanmato1984 closed this Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants