@andygrove
Member

@andygrove andygrove commented Jan 29, 2020

This PR adds DataFusion examples for a Flight client and server where the client can send a SQL query to the server and then receive the results.

I have manually tested with a Java client as well to confirm that it works.

@nevi-me
Contributor

nevi-me commented Jan 29, 2020

Hi @andygrove, please test if my changes work.

@andygrove
Member Author

Thanks @nevi-me! Testing with a Java client, the server returns one FlightData instance without error, but the client fails with:

org.apache.arrow.flight.FlightRuntimeException: CallStatus{code=INTERNAL, cause=null, description='Stream completed without receiving schema.'}

@andygrove
Member Author

I guess my next step here might be to get a Java Flight server returning the same data and use Wireshark to compare the two servers.

@nevi-me
Contributor

nevi-me commented Jan 29, 2020

> I guess my next step here might be to get a Java Flight server returning the same data and use Wireshark to compare the two servers.

I know what the issue is: we're supposed to send the schema first, before any of the record batches.
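
The ordering rule can be illustrated with a minimal sketch in plain Rust. This is not the Arrow Flight API: `Message`, `Batch`, `build_stream`, and `read_stream` are hypothetical names invented for the illustration, and the schema is reduced to a list of column names. The point is only that the sender emits the schema before any batch, and the reader reports an error if the stream ends without one.

```rust
/// One record batch, reduced to a list of integer columns (illustrative only).
pub type Batch = Vec<Vec<i64>>;

/// Simplified stand-in for the messages on a result stream.
/// Hypothetical type; not the real Flight `FlightData` message.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    /// Column names only; a real schema also carries types and metadata.
    Schema(Vec<String>),
    RecordBatch(Batch),
}

/// Sender side: emit the schema first, then every batch in order.
pub fn build_stream(schema: Vec<String>, batches: Vec<Batch>) -> Vec<Message> {
    let mut out = Vec::with_capacity(batches.len() + 1);
    out.push(Message::Schema(schema));
    out.extend(batches.into_iter().map(Message::RecordBatch));
    out
}

/// Reader side: refuse to accept batches until a schema has arrived,
/// and fail if the stream ends without ever delivering one.
pub fn read_stream(stream: &[Message]) -> Result<(Vec<String>, Vec<Batch>), String> {
    let mut schema: Option<Vec<String>> = None;
    let mut batches = Vec::new();
    for msg in stream {
        match msg {
            Message::Schema(s) => schema = Some(s.clone()),
            Message::RecordBatch(b) => {
                if schema.is_none() {
                    return Err("record batch received before schema".to_string());
                }
                batches.push(b.clone());
            }
        }
    }
    match schema {
        Some(s) => Ok((s, batches)),
        None => Err("stream completed without receiving schema".to_string()),
    }
}
```

In this model, a stream produced by `build_stream` always reads back successfully, while the same stream with its first message dropped is rejected by `read_stream`.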

@andygrove
Member Author

@nevi-me It works! I was able to get the batch from a Java client!

@andygrove
Member Author

I think the next steps for this PR are:

  • Fix the release verification issues (new inter-crate dependencies were added)
  • Add a flight-client.rs so we can test end to end
  • Add a README explaining the examples

I am traveling for the next two days but will pick this up when I can.

@andygrove andygrove changed the title ARROW-7684: [Rust] Example Flight server for DataFusion [WIP] ARROW-7684: [Rust] Example Flight client and server for DataFusion Feb 1, 2020
@andygrove andygrove requested a review from paddyhoran February 1, 2020 19:12
@andygrove
Member Author

@nevi-me The only thing missing now is for the client to read the RecordBatch out of the returned FlightData. Do you think you'll have time to help with that? If not, I can have a go.

@andygrove
Member Author

Well, I managed to get the client parsing the schema and data, but it seems pretty hacky at the moment.

@andygrove andygrove requested a review from nevi-me February 1, 2020 21:38
@nevi-me
Contributor

nevi-me commented Feb 1, 2020

I only just saw your messages, and I'm looking at the changes that you've made. We could create something that takes the Flight stream and returns a RecordBatchReader.

ipc.header_as_schema().map(|schema| fb_to_schema(schema))
}

pub fn recordbatch_from_bytes(
Contributor

You'd need both the data_header and the data_body to read a record batch correctly. See my commit fixing this.
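
As a rough illustration of why both fields are needed, here is a toy model in plain Rust. It does not use the Arrow crates, and the header layout is an assumption made up for this sketch (real Flight messages carry a FlatBuffers-encoded IPC message in `data_header`). `ToyFlightData`, `encode_i32_column`, and `decode_i32_column` are hypothetical names. The header says how many rows there are and where the column starts; the body holds the raw values, so neither half can be decoded alone.

```rust
/// Toy stand-in for one Flight message (hypothetical, for illustration).
/// `data_header` layout assumed here: bytes 0..4 = row count (u32, little-endian),
/// bytes 4..8 = offset of the column inside `data_body` (u32, little-endian).
pub struct ToyFlightData {
    pub data_header: Vec<u8>,
    pub data_body: Vec<u8>,
}

/// Read a little-endian u32 at `at`, or None if the slice is too short.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let s = bytes.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Encode a single i32 column: metadata goes in the header, values in the body.
pub fn encode_i32_column(values: &[i32]) -> ToyFlightData {
    let mut data_header = Vec::with_capacity(8);
    data_header.extend_from_slice(&(values.len() as u32).to_le_bytes());
    data_header.extend_from_slice(&0u32.to_le_bytes()); // column starts at body offset 0
    let mut data_body = Vec::with_capacity(values.len() * 4);
    for v in values {
        data_body.extend_from_slice(&v.to_le_bytes());
    }
    ToyFlightData { data_header, data_body }
}

/// Decode the column. The header tells us how to slice the body,
/// so a missing or truncated header or body yields None.
pub fn decode_i32_column(msg: &ToyFlightData) -> Option<Vec<i32>> {
    let rows = read_u32(&msg.data_header, 0)? as usize;
    let offset = read_u32(&msg.data_header, 4)? as usize;
    let end = offset.checked_add(rows.checked_mul(4)?)?;
    let bytes = msg.data_body.get(offset..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

In this sketch, a message built by `encode_i32_column` round-trips through `decode_i32_column`, while a message with an empty header or an empty body fails to decode.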

Member Author

Well, that looks much better! Thanks for doing that.

@andygrove
Member Author

@nevi-me @paddyhoran any objection to me merging this one?

Contributor

@nevi-me nevi-me left a comment

No objection from my side.

@andygrove andygrove closed this in d091894 Feb 3, 2020
kszucs pushed a commit that referenced this pull request Feb 7, 2020
This PR adds DataFusion examples for a Flight client and server where the client can send a SQL query to the server and then receive the results.

I have manually tested with a Java client as well to confirm that it works.

Closes #6308 from andygrove/datafusion-flight-example and squashes the following commits:

788feef <Andy Grove> code cleanup
9c47338 <Neville Dipale> Complete flight client's record batch reader
1337b98 <Andy Grove> parse recordbatch
459bef3 <Andy Grove> client parses schema from ipc batches
31c894b <Andy Grove> update release test script
efe05ae <Andy Grove> update release test script
5ecea83 <Andy Grove> formatting
8b419da <Andy Grove> update release test script
03d2c84 <Andy Grove> client streams results
0a39a51 <Andy Grove> client can stream batches
e72c605 <Andy Grove> add starting point for flight-client example
ab28da8 <Andy Grove> get schema from query plan instead of from first batch
0901a3f <Neville Dipale> Merge branch 'datafusion-flight-example' of https://github.com/andygrove/arrow into datafusion-flight-example
ad2e3b0 <Neville Dipale> send schema before batches
996f2a4 <Andy Grove> Use PARQUET_TEST_DATA env var
260f9ca <Neville Dipale> fix license violation
516b66d <Neville Dipale> add helpers to convert record batch to flight data proto message
6beb4ea <Andy Grove> WIP example Flight server for DataFusion

Lead-authored-by: Andy Grove <[email protected]>
Co-authored-by: Neville Dipale <[email protected]>
Signed-off-by: Andy Grove <[email protected]>