ARROW-1257: Plasma documentation #881

pcmoritz · 2017-07-24T22:42:59Z

Thanks a lot to @crystalzyan who did all the heavy lifting for this PR!

pcmoritz · 2017-07-24T22:52:00Z

python/doc/source/plasma.rst

Ideally we would use get_record_batch_size here, but it doesn't account for the metadata.

wesm

Will have to look through the rest in more detail later

wesm · 2017-07-25T01:58:59Z

python/doc/source/plasma.rst

Seems like we might merge these with the more general build instructions? Though I suspect that most users will obtain the plasma client via pip or conda packages

crystalzyan · 2017-07-25T09:07:36Z

Hey Philipp,

Five things came up when I was reviewing plasma.rst (the plasma python tutorial):

Mac OS X Installation Instructions Don't Work For Me

... I still don't know what's wrong with my Mac, but following the installation instructions for Plasma still doesn't work. I get this error when trying to import plasma:

I've tried the installation instructions with both your pcmoritz/arrow plasma-cython branch and the actual apache/arrow repo, and I did update my dependency packages, but this still happens. Did something go wrong in the "install pyarrow + plasma" step?

Also, there's a paragraph in the Mac OS X Installation Instructions I had left which goes something like:

Plasma also requires the build-essential, curl, unzip, libboost-all-dev, and libjemalloc-dev packages. MacOS should already come with curl, unzip, and the compilation tools found in build-essential.

This was honestly more of a note to myself than anything, so I'm not sure if this paragraph is still necessary (it might confuse users).

Does PlasmaClient.get Take Only Lists?

I noticed that in the tutorial, all calls made to PlasmaClient.get seem to require list brackets for the argument and return result:

[buffer2] = client2.get([object_id])

It's a little unexpected, but this is how the method's syntax behaves, correct? If so, we could add a sentence to the Getting an Object section saying that PlasmaClient.get only takes and returns lists (in contrast to ray.get).

Also, if it's like ray.get in that it can get multiple object IDs at once, we might want to include a code example of that capability:

Note that client.get takes the single argument object_id wrapped in a list and returns the single Plasma object wrapped in a list. This is because client.get also supports getting multiple object IDs at once. To get multiple objects, pass the IDs in as a list and unpack the returned list in the same way, as follows:

[buffer_A, buffer_B] = client2.get([object_id_A, object_id_B])
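To illustrate the list-in/list-out convention without a running Plasma store, here is a tiny in-memory stand-in. `FakePlasmaClient` and its `put_raw` method are hypothetical: they mimic only the shape of the `client.get` calls discussed above, not the real `pyarrow.plasma` API, and returning `None` for a missing object is an assumption of this sketch:

```python
from typing import Dict, List, Optional


class FakePlasmaClient:
    """Hypothetical stand-in that mimics only the list-in/list-out shape of
    PlasmaClient.get. Not the pyarrow.plasma API."""

    def __init__(self) -> None:
        self._objects: Dict[bytes, bytes] = {}

    def put_raw(self, object_id: bytes, data: bytes) -> None:
        # Hypothetical helper used only to populate the stand-in.
        self._objects[object_id] = data

    def get(self, object_ids: List[bytes]) -> List[Optional[bytes]]:
        # One result per requested ID, in request order.
        # Assumption for this sketch: a missing object comes back as None.
        return [self._objects.get(oid) for oid in object_ids]


client2 = FakePlasmaClient()
object_id_A = b"A" * 20  # 20-byte IDs, the same length as Plasma ObjectIDs
object_id_B = b"B" * 20
client2.put_raw(object_id_A, b"first")
client2.put_raw(object_id_B, b"second")

# A single object is still passed in and returned inside a list...
[buffer2] = client2.get([object_id_A])
# ...and several objects are fetched in one call and unpacked in order.
[buffer_A, buffer_B] = client2.get([object_id_A, object_id_B])
```

Unpacking with `[buffer2] = ...` doubles as a cheap sanity check: Python raises `ValueError` if the call returns a different number of results than the pattern expects.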

Reword the Timeout Explanation

Under the Getting an Object section, you included a mention of the timeout_ms argument for the PlasmaClient.get function:

If the object has not been sealed yet, then the call to client.get will block until the object has been sealed by the client constructing the object. Using the timeout_ms argument to get, you can specify a timeout for this (in milliseconds). After the timeout, the interpreter will yield control back.

These last two sentences are very brief and don't show an example of passing the timeout_ms argument, which I would suggest adding for comparison with a normal call to PlasmaClient.get.

Also, I found these two sentences confusing at first: the word get wasn't highlighted as code or written out in full, so I thought it was part of the sentence's grammar rather than a reference to the Python function PlasmaClient.get.

We could instead do something like:

If the object has not been sealed yet, then the call to client.get will block until the object has been sealed by the client constructing the object. However, we can limit how long client.get can block by passing in an optional timeout_ms argument.

By setting timeout_ms, we specify a timeout for this function call (in milliseconds). If the call takes longer than timeout_ms milliseconds, client.get returns early (whether or not the object has been sealed) and the interpreter regains control. Here is an example of using timeout_ms with client.get:

[buffer2] = client2.get([object_id], timeout_ms=100)  # This call times out after 100 ms
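The blocking-with-timeout behaviour described above can be modelled with a condition variable. This is a minimal sketch, not the Plasma implementation: `FakeBlockingStore` is hypothetical, and two details are assumptions of the sketch, namely that a negative `timeout_ms` means "wait indefinitely" and that objects still unsealed after the timeout come back as `None`:

```python
import threading
import time
from typing import Dict, List, Optional


class FakeBlockingStore:
    """Hypothetical stand-in: get() blocks until every requested object is sealed
    or until timeout_ms elapses. Not the pyarrow.plasma API."""

    def __init__(self) -> None:
        self._sealed: Dict[bytes, bytes] = {}
        self._cond = threading.Condition()

    def seal(self, object_id: bytes, data: bytes) -> None:
        # Make the object visible and wake up any blocked get() calls.
        with self._cond:
            self._sealed[object_id] = data
            self._cond.notify_all()

    def get(self, object_ids: List[bytes], timeout_ms: int = -1) -> List[Optional[bytes]]:
        # Assumption: a negative timeout_ms means "wait indefinitely".
        timeout = None if timeout_ms < 0 else timeout_ms / 1000.0
        with self._cond:
            self._cond.wait_for(
                lambda: all(oid in self._sealed for oid in object_ids), timeout
            )
            # Assumption: objects still unsealed after the timeout come back as None.
            return [self._sealed.get(oid) for oid in object_ids]


store = FakeBlockingStore()
object_id = b"a" * 20

# Nothing has been sealed yet, so this call gives up after roughly 100 ms.
start = time.monotonic()
[buffer2] = store.get([object_id], timeout_ms=100)
elapsed = time.monotonic() - start

# Another thread seals the object after 50 ms; a get() without a timeout waits for it.
threading.Timer(0.05, store.seal, args=(object_id, b"payload")).start()
[buffer3] = store.get([object_id])
```

`Condition.wait_for` re-checks the predicate after every wakeup and tracks the remaining time itself, which keeps the timeout logic short.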

Pandas Reference Link Broken

In the Storing Pandas DataFrames in Plasma section, I had originally included an rst link to the Using PyArrow with Pandas page of the Arrow documentation, to let users know that they could check out the conversion charts between pandas and Arrow:

One can instead use pyarrow and its supportive API as an intermediary step to import the Pandas DataFrame into Plasma. Arrow has multiple equivalent types to the various Pandas structures, see the :ref:`pandas` page for more.

However, this :ref: link is currently broken, since the corresponding link anchor I put in pandas.rst has been removed. We should remove this broken link reference entirely, then.

Include One-Liners for Converting Plasma Objects Back to Arrow/Pandas

This is just an idea for convenience, but after we explain to users the conversion steps PlasmaBuffer -> Arrow reader -> Arrow tensor -> numpy array in Getting Arrow Objects from Plasma (and, similarly, PlasmaBuffer -> Arrow BufferReader -> Arrow RecordBatchStreamReader -> Arrow RecordBatch -> Pandas DataFrame in Getting Pandas DataFrames from Plasma), we could also provide an equivalent condensed one-liner for each code example. This would show that the conversion steps aren't really that intimidating or difficult:

For Arrow:

We can condense the entire procedure described above into one-liners as follows:

# Get the arrow object by ObjectID.
[buf2] = client.get([object_id])

# Equivalent one-liner to convert Plasma buffer back to Arrow tensor
tensor2 = pa.read_tensor(pa.BufferReader(buf2))

# Equivalent one-liner to convert Plasma buffer back to numpy array
array = pa.read_tensor(pa.BufferReader(buf2)).to_numpy()

For Pandas:

The above conversion procedures may seem lengthy, but we can put them all together into a one-liner as follows:

# Fetch the Plasma object
[data] = client.get([object_id])

# Equivalent one-liner to convert the Plasma buffer back to a pandas DataFrame
result = pa.RecordBatchStreamReader(pa.BufferReader(data)).read_next_batch().to_pandas()

…for Starting the Object Store, Creating Clients, Creating Objects, Getting Objects, Transferring to Remote Stores, Querying Status, Releasing Objects, and Shutting Down Clients and Stores. Basically all of the PlasmaClient API. Warning- I could not get C++ running on my machine to verify that any of the code runs properly/works. Please verify all code and tutorial content

robertnishihara · 2017-08-01T05:21:13Z

I pushed a few small changes.

This looks good to me, nice job @crystalzyan :)

xhochy · 2017-08-01T13:54:29Z

cpp/apidoc/tutorials/plasma.md

+the Plasma store in this case, issue the command below:
+
+```shell
+killall plasma_store &


Why is killall sent to the background?

xhochy · 2017-08-01T13:56:27Z

cpp/apidoc/tutorials/plasma.md

+Alternatively, you can run the Plasma store in the background and ignore all
+message output with the following terminal command:
+
+```shell


Does using these annotations work in your doxygen version?

robertnishihara · 2017-08-01

The shell ones aren't obviously doing anything (want me to remove them?), but the cpp ones definitely help. Using doxygen 1.8.13.

I can remove the cpp ones also if you prefer.

wesm · 2017-08-01

I just checked this out locally with doxygen 1.8.13. The rendered output looks OK, but it doesn't seem like shell is supported by the rendering engine (I tried tracking down where Doxygen's support for GH-flavored markdown is coming from but couldn't find anything conclusive).

The C++ looks good though so definitely leave that =)

robertnishihara · 2017-08-01

Ok, I removed the shell keyword.

wesm

+1, very nice. thanks all!

pcmoritz commented Jul 24, 2017

wesm reviewed Jul 25, 2017

pcmoritz force-pushed the plasma-docs branch from 1bb76fd to 724fd2f on July 25, 2017 05:07

pcmoritz mentioned this pull request Jul 25, 2017

Fix typo in plasma protocol. #878

Closed

crystalzyan and others added 16 commits July 31, 2017 20:39

Plasma documentation- initial writeup of installation for linux. Inst…

c02955b

…allation for mac incomplete

Plasma documentation- Copied and edited Plasma API section, added a c…

5cf63e9

…ontents header at top, minor tweaks to Linux Installation section. Still need to do Installation on Mac OS and storing Arrow/Panda in Plasma

Plasma documentation- tweaked contents headings hierarchy, added a bi…

25abf83

…t to 'Getting an Object' subsection in Plasma API.

Plasma documentation- Added parts on using Arrow with Plasma

a49e122

Plasma documentation- Added using Pandas with Plasma sections.

2be9eab

remove old test.py

f51f41e

fix plasma documentation

3f3f373

complete installation instructions on macOS

bc078ff

Plasma C++ tutorial documentation - minor formatting fixes

caac479

edit the C++ tutorial (work in progress)

9a8437c

update C++ documentation

84141b6

unify installation instructions

193e00b

fix docs

ba8b0df

more fixes

c884720

cleanup

80aaf89

pcmoritz force-pushed the plasma-docs branch from 1cba64b to 80aaf89 on August 1, 2017 03:39

pcmoritz and others added 3 commits July 31, 2017 20:53

API changes

791e5b0

Some changes to plasma.md and add syntax highlighting.

4163ccf

Small changes to python plasma documentation.

21bdc01

pcmoritz changed the title ~~[WIP] ARROW-1257: Plasma documentation~~ ARROW-1257: Plasma documentation Aug 1, 2017

xhochy reviewed Aug 1, 2017

robertnishihara added 2 commits August 1, 2017 10:32

Fix typo.

4b987e8

Remove unsupported shell keyword from plasma.md.

c4ab47e

robertnishihara force-pushed the plasma-docs branch from ec53098 to c4ab47e on August 1, 2017 19:09

wesm approved these changes Aug 2, 2017

asfgit closed this in 7e7861c Aug 2, 2017

robertnishihara deleted the plasma-docs branch August 2, 2017 06:16
