Skip to content

Conversation

@pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented Aug 27, 2017

Note that this PR breaks the PlasmaClient API (which is still unstable at this point, so this is acceptable). It renames PlasmaClient.get to PlasmaClient.get_buffers and introduces two new functions, PlasmaClient.put and PlasmaClient.get which can put Python objects into the object store and provide access to their content. The old get was renamed to get_buffers because most users will want to use the new get method and therefore it should have the more concise name.

There is some freedom in designing the API; I tried to make it so there is a unified API between getting one and multiple objects (the latter is supported to limit the number of IPC roundtrips with the plasma store when we get many small objects). I also introduced a special object that is returned if one of the objects was not available within the timeout. We could use "None" here, but then it would be hard to distinguish between getting a "None" object and a timeout.

release_delay, num_retries))
return result

def put(PlasmaClient client, value, object_id=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to specify types here for value like list value (since that's the only thing supported for now), and ObjectID object_id=None.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to make these methods on the PlasmaClient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why these are not methods on the PlasmaClient is that there is already a method get on there, which gets the underlying Buffers. If that's preferable we can rename that method to get_buffers and have get and put be methods of the PlasmaClient. What do you think?

client.seal(id)
return id

def get(PlasmaClient client, object_ids, timeout_ms=-1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout_ms should be given a type of int

-------
The object ID associated to the Python object.
"""
cdef ObjectID id = object_id if object_id else ObjectID.from_random()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally try to avoid using id as a variable name since it is also a built in python function

release_delay, num_retries))
return result

def put(PlasmaClient client, list value, ObjectID object_id=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of forcing the user to pass in a list, why not just allow them to pass in anything and then wrap it in a list?

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Aug 28, 2017

@wesm What do you think about the new API? An alternative that we might want to consider is to leave the PlasmaClient API as it is and add the new API as object_id = pyarrow.plasma.dump(client, value) and pyarrow.plasma.load(client, object_id) or object_id = client.dump(value) and client.load(object_id), with terminology from Python pickle.

A potential issue of the current one is that users might try to mix high-level and low-level APIs in inappropriate ways (like call client.create and then client.get on the object_id).

I think the Windows build error is unrelated.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this looks very nice to me as is. I'm not sure what's up with the Appveyor build failure but I opened ARROW-1420 to figure it out

@asfgit asfgit closed this in c027a03 Aug 28, 2017
@robertnishihara robertnishihara deleted the plasma-putget branch August 28, 2017 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants