-
Couldn't load subscription status.
- Fork 706
add quotation to external APIs #1763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
36dec0d to
d766dbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks good but I have a suggestion about how Quoted works in general.
I think right now the logic with passing quoted explicitly / implicitly (and Quoted.internal) is a bit fuzzy. Can we do instead something similar to cause in exceptions?
Let's make Quoted nested and basically putting quoted on method declaration means you want to pass information about this call site. And then if you have quoted parameter in the scope & apply quoted macro - put quoted from the context as .cause call site information.
I don't think we really need Quoted.internal in this case and overall logic seems more clear to me (quoted parameter - want to save information about call site).
Also Quoted become more or less call site information in this case (which maybe is a better name?).
As cons for that I can see is - you need to somehow encode how different properties (such as used field projection) propagate through cause. But I think it actually represents what we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is both exciting, but a bit scary. This touches virtually every public API and make everything binary incompatible.
In exchange for that, it is not yet clear what we are getting. I'd like to see maybe a quoted branch where we can use the final product with say Parquet to do some filter pushdown or projection pushdown.
If we don't have that as a killer app, I'm nervous about merging and getting to it later.
What do you think about making a quoted branch to make this PR into and when we have filter/projection pushdown working, we merge into develop?
| * to operate in the mappers. Just call take. | ||
| */ | ||
| override def bufferedTake(n: Int) = take(n) | ||
| override def bufferedTake(n: Int)(implicit q: Quoted) = take(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need a Quoted here. Not clear to me. Just to get position information of the caller?
|
|
||
| class NoStackLineNumberTest extends WordSpec { | ||
|
|
||
| "No Stack Shouldn't block getting line number info" should { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we replace this with an equivalent test that we can see the caller correctly?
Yeah, it's an intrusive change. I thought it was clear that it'd be necessary when we discussed it, though.
That's #1754. These pull requests are just a break down of that initial one. Note that only projection pushdown is in the scope of this change. |
@ttim I can't see the benefit of this proposal. The quotation propagation is just a way to make the initial information available in the final
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing the point, but I'd love to minimize the callsites that have to change.
To me, that could be done by only accepting Quoted where we take a lambda and only where we expect we could in principle do some filter pushdown or projection pushdown. Does that make sense?
Why else do we want the quoted?
|
|
||
| class FlattenLeftJoin3[KEY, KLL[KLL_K, +KLL_V] <: KeyedListLike[KLL_K, KLL_V, KLL], A, B, C](nested: KLL[KEY, ((A, B), C)]) { | ||
| def flattenValueTuple: KLL[KEY, (A, B, C)] = nested.mapValues { tup => FlattenGroup.flattenNestedTuple(tup) } | ||
| def flattenValueTuple(implicit q: Quoted): KLL[KEY, (A, B, C)] = nested.mapValues { tup => FlattenGroup.flattenNestedTuple(tup) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused why we need the implicits here. Can you comment? Can't we statically see anything we need for the purposes of projection? Is this just to collect line numbers?
| * For now, just do a normal take. | ||
| */ | ||
| override def bufferedTake(n: Int): CoGrouped[K, R] = | ||
| override def bufferedTake(n: Int)(implicit q: Quoted): CoGrouped[K, R] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not take an implicit here if it can't contribute to row/column filtering, which I can't see how it can in the case of take.
| */ | ||
| sealed trait Sortable[+T, +Sorted[+_]] { | ||
| def withSortOrdering[U >: T](so: Ordering[U]): Sorted[T] | ||
| def withSortOrdering[U >: T](so: Ordering[U])(implicit q: Quoted): Sorted[T] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't project/filter here right since we know sorting is 1 in 1 out.
| UnsortedIdentityReduce(keyOrdering, mapped.mapValues(fn), reducers, descriptions) | ||
|
|
||
| override def sum[U >: V1](implicit sg: Semigroup[U]) = { | ||
| override def sum[U >: V1](implicit sg: Semigroup[U], q: Quoted) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since semigroups emit the whole object, do we really need the quoted here? Seems like we can assume there can be no projection/filtering here.
| * prefer this to take(1). | ||
| */ | ||
| def head: This[K, T] = sum(HeadSemigroup[T]()) | ||
| def head(implicit q: Quoted): This[K, T] = sum(HeadSemigroup[T](), q) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't really filter here, can we? why to we need quoted?
| * planner sees that you want to force a shuffle. For expert tuning | ||
| */ | ||
| def forceToReducers: This[K, T] = | ||
| def forceToReducers(implicit q: Quoted): This[K, T] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we avoid quoted here?
| * Use Algebird Aggregator to do the reduction | ||
| */ | ||
| def aggregate[B, C](agg: Aggregator[T, B, C]): This[K, C] = | ||
| def aggregate[B, C](agg: Aggregator[T, B, C])(implicit q: Quoted): This[K, C] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can look inside the aggregrator and filter, can we? Can we really use the quoted meaningfully?
| * For each key, selects all elements except first n ones. | ||
| */ | ||
| def drop(n: Int): This[K, T] = | ||
| def drop(n: Int)(implicit q: Quoted): This[K, T] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't filter here, can we?
| * and this method | ||
| */ | ||
| def fold[V](f: Fold[T, V]): This[K, V] = | ||
| def fold[V](f: Fold[T, V])(implicit q: Quoted): This[K, V] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we look inside the Fold?
| * This assumes the values for each key can fit in memory. | ||
| */ | ||
| def distinctValues: This[K, T] = toSet[T].flattenValues | ||
| def distinctValues(implicit q: Quoted): This[K, T] = toSet[T].flattenValues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't filter here, can we?
@johnynek take a look at the description on #1754. We'd also like to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.... It is scary, touch every API, but I guess we do it...
Have you tried to build twitter's Source against this change? i want to know what we are signing up for. Is there 100 places we need to change, or 5000?
| // every 10k items, compact into a CMS to prevent very slow mappers | ||
| lazy implicit val batchedSG: com.twitter.algebird.Semigroup[Batched[CMS[Bytes]]] = Batched.compactingSemigroup[CMS[Bytes]](10000) | ||
|
|
||
| implicit val q: Quoted = Quoted.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we take a Quoted in the constructor actually? That should be created where we call .sketch which I think could improve reporting on internal methods.
|
|
||
| def reducers = Some(numReducers) | ||
|
|
||
| private implicit val q: Quoted = Quoted.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment: should we take implicit q: Quoted on the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| * If T <:< U, then this is safe to treat as TypedPipe[U] due to covariance | ||
| */ | ||
| protected def raiseTo[U](implicit ev: T <:< U): TypedPipe[U] = | ||
| protected def raiseTo[U](implicit ev: T <:< U, m: Quoted): TypedPipe[U] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are using m here are we? This is really just a typesafe cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| TypedPipe.DebugPipe(this).withQuoted | ||
|
|
||
| /** adds a description to the pipe */ | ||
| def withDescription(description: String): TypedPipe[T] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we want Quoted here. Can you add the comment why? Is it the assumption that users get full control of the note, and there is no need to add the line/file info that Quoted can give?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just updated the scaladoc
| ordering: Ordering[K]): Sketched[K, V] = | ||
| ordering: Ordering[K], | ||
| m: Quoted): Sketched[K, V] = | ||
| Sketched(ev(this), reducers, delta, eps, seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pass m explicitly to Sketched so we can see that it is not discarded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| object CascadingBackend { | ||
| import TypedPipe._ | ||
|
|
||
| private implicit val q: Quoted = Quoted.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm nervous about this hanging out implicitly. Can we not use it implicitly and instead explicitly call it: interalQuoted where needed? I fear this will actually destroy our Quoted information when we are doing composing operations, and defeat the whole purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| t match { | ||
| case WithDescriptionTypedPipe(i, desc, ded) => | ||
| case WithDescriptionTypedPipe(i, desc, ded, quoted) => | ||
| loop(i, (desc, ded) :: acc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to accumulate the quoted? Seems like we need a Semigroup on quoted (maybe a Monoid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnynek desc already has the description information from quoted (see TypedPipe.withQuoted). I think we don't need to accumulate quoted at this point since the purpose is to set the description only (no projections).
|
|
||
| def plan[T](m: Memo, tp: TypedPipe[T]): (Memo, Op[T]) = | ||
| m.plan(tp) { | ||
| implicit val q: Quoted = Quoted.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to above, can we not keep dummy implicit Quoted around. I fear we won't know where we are actually sending them. I prefer to be explicit with the dummies (and add a comment to that effect).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
@johnynek I'm still working on updating source to develop without this change. I'll test these changes after it |
|
Flavio Brasil seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
This PR is the second part of #1754. It adds the
Quotedimplicit param to the user-facing APIs. After this change, the last PR will use the projection information to do automatic pushdown.