Initial setup #1

haberdashPI · 2022-06-06T15:38:58Z

This defines two functions that are handy for computing joins over time spans:

interval_join and
groupby_interval_join

Rows match in this join if their time spans overlap.
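
To make the matching rule concrete, here is a minimal sketch of an overlap join in plain Julia. It assumes half-open spans `[start, stop)`. The names `Span`, `overlaps`, and `naive_interval_join` are hypothetical and used only for illustration; they are not part of this package, which builds on `find_intersections` from Intervals.jl rather than on a quadratic loop like this one.

```julia
using Dates

# Hypothetical stand-in for an interval: a half-open span [start, stop).
struct Span
    start::DateTime
    stop::DateTime
end

# Two half-open spans overlap when each one starts before the other stops.
overlaps(a::Span, b::Span) = a.start < b.stop && b.start < a.stop

# Naive O(n*m) overlap join: collect the index pairs (i, j) of overlapping rows.
function naive_interval_join(left::Vector{Span}, right::Vector{Span})
    return [(i, j) for (i, l) in enumerate(left)
                   for (j, r) in enumerate(right) if overlaps(l, r)]
end
```

Spans that merely touch at an endpoint do not match under this half-open convention.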

There is also a simple utility function (quantile_windows) for joining over regularly spaced intervals.
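
As a rough illustration of what "regularly spaced intervals" means here, the sketch below cuts one span into `n` adjacent windows of equal length. The helper `equal_windows` and its `(start, stop)` named-tuple output are assumptions made for this example; the signature and return type of `quantile_windows` itself may differ.

```julia
using Dates

# Hypothetical helper: split [start, stop) into n equal, adjacent windows.
function equal_windows(start::DateTime, stop::DateTime, n::Integer)
    total = Dates.value(stop - start)  # DateTime differences are in milliseconds
    # n + 1 edges give n windows; fld keeps every edge on a whole millisecond.
    edges = [start + Millisecond(fld(total * k, n)) for k in 0:n]
    return [(start=edges[k], stop=edges[k + 1]) for k in 1:n]
end
```

Each window stops exactly where the next one starts, so together they cover the original span without gaps or overlap.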

Remaining actions:

~~Close Minimal IntervalSet type invenia/Intervals.jl#193, so we can use my newly created find_intersections function from that repo.~~
~~Remove Manifest.toml entries from this repo and verify that CI now works.~~
Add JuliaFormatter / reviewdog setup

I plan to release this as 0.0.1, and then add a 0.1.0 release once invenia/Intervals.jl#193 merges.

palday · 2022-06-17T19:11:45Z

Manifest.toml

@@ -0,0 +1,308 @@
+# This file is machine-generated - editing it directly is not advised
+
+[[ArgTools]]


should not be checked in

Leaving this for now. In its draft form this PR will not work without the manifest, since invenia/Intervals.jl#193 has not yet merged.

Right now, this code depends on pointing to a specific git commit that will be garbage collected (assuming that Invenia deletes branches upon merging / closing a PR) at some point. When we do this with internal repositories, we can do things like tag a commit to guarantee that it survives for future reference, but we can't do that for repositories we don't control. That's a very unstable dependency structure and is almost definitely blocking for this package having any type of release.

(I see now that there are TODOs in the top-level comment, but it might be worth opening a full issue. ❤️ )

#193 from Intervals.jl should merge in the next day or two. My plan is to not merge this until #193 has merged.

LICENSE

docs/Manifest.toml

docs/make.jl

docs/src/index.md

test/Manifest.toml

src/DataFrameIntervals.jl

Project.toml

README.md

Co-authored-by: Phillip Alday <[email protected]>

haberdashPI · 2022-06-21T18:39:18Z

src/DataFrameIntervals.jl

+    right_used = filter(x -> x isa String, right_groups)
+    right_unused = filter(x -> x isa Unused, right_groups)
+    left_used = filter(x -> x isa String, left_groups)
+    left_unused = filter(x -> x isa Unused, left_groups)


The name unused is confusing; rename to invalid? Maybe rename valid_columns.

Also, the comment remains unclear: maybe spell out that we get the valid columns for right and the valid columns for left (as well as the invalid ones).

kolia

see comments

src/DataFrameIntervals.jl

kolia · 2022-06-21T17:57:57Z

src/DataFrameIntervals.jl

+end
+
+"""
+    split_into(left, right; spancol=:span)


This is a join, why not match the DataFrames naming conventions for joins? So use the on keyword instead of spancol, and support on=:leftcol => :rightcol pairs.

kolia · 2022-06-21T18:07:20Z

src/DataFrameIntervals.jl

+function split_into(left, right; spancol=:span)
+    regions = find_intersections_(view(right, :, spancol), view(left, :, spancol))
+    left_side, right_side = split(regions, left, right)
+    joined = hcat(view(right_side, :, Not(spancol)),


When left and right have the same column, this will throw an error suggesting passing in makeunique.

Might be worth taking in and passing hcat's makeunique and copycols kwargs through to the hcat call, and maybe even throwing a split_into specific error suggesting passing makeunique to split_into instead.

Also for consistency this could be called intersectjoin or splitjoin or something like that to match the DataFrames naming convention.

kolia · 2022-06-21T18:38:54Z

src/DataFrameIntervals.jl

+end
+
+function spans_for_split!(df, left_span, right_span)
+    df[!, :left_span] = left_span


Can it happen that df already has a column called left_span or right_span, what error gets thrown then?

kolia · 2022-06-21T18:44:17Z

src/DataFrameIntervals.jl

+    end
+end
+toval(x::TimePeriod) = float(Dates.value(convert(Nanosecond, x)))
+toperiod(x::Real) = Nanosecond(round(Int, x, RoundDown))


Would be good for this name toperiod to convey that the x::Real arg is interpreted as a number of Nanoseconds.
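
A minimal sketch of how the naming concern could be addressed, using only the Dates standard library. `toval` is copied from the diff above; `asnanoseconds` is the name that appears in later revisions of this PR, and the assumption here is that it simply replaces `toperiod` with the same body.

```julia
using Dates

# Convert any time period to a floating-point count of nanoseconds.
toval(x::TimePeriod) = float(Dates.value(convert(Nanosecond, x)))

# Interpret a real number as a count of nanoseconds, rounding down.
# The name states the unit, so callers do not have to guess it.
asnanoseconds(x::Real) = Nanosecond(round(Int, x, RoundDown))
```

With the unit in the name, a call such as `asnanoseconds(0.75 * toval(d))` reads as "three quarters of `d`, in nanoseconds".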

kolia · 2022-06-21T19:23:58Z

src/DataFrameIntervals.jl

+`combine(groupby(split_into(left, right), groups), pairs...)`. The one caveat is that
+the only column from `right` that `pairs` can reference is `:right_span`.
+"""
+function split_into_combine(left, right, groups, pairs...; spancol=:span, kwds...)


As discussed in huddle, a better interface for this would be for the splitjoin / intersectjoin to return some kind of lazy object on which groupby and combine methods get defined, so that the interface mirrors the DataFrames API more closely. But that's a lot more work.

Minimal modifications here that I would find useful:

if the join operation is called intersectjoin, maybe this could be called combine_intersectjoined

having an example or two in the docstring would be very useful

ararslan · 2022-06-21T20:46:58Z

Note that one of Phillip's comments was resolved but is still applicable. Also, it would be good to fix the formatting issues and set up reviewdog + JuliaFormatter.

Co-authored-by: Phillip Alday <[email protected]>

haberdashPI · 2022-06-23T18:11:27Z

Note that one of Phillip's comments was resolved but is still applicable.

Is this the comment about the authorship that I missed above?

github-actions

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter

src/DataFrameIntervals.jl|279|
src/DataFrameIntervals.jl|281|
src/DataFrameIntervals.jl|287|
src/DataFrameIntervals.jl|308|
src/DataFrameIntervals.jl|314|
src/DataFrameIntervals.jl|317|
src/DataFrameIntervals.jl|321|
test/runtests.jl|17|
test/runtests.jl|22|
test/runtests.jl|32|
test/runtests.jl|36|
test/runtests.jl|39|
test/runtests.jl|41|
test/runtests.jl|45|
test/runtests.jl|52|
test/runtests.jl|56|
test/runtests.jl|59|
test/runtests.jl|63|
test/runtests.jl|67|

docs/make.jl

src/DataFrameIntervals.jl

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

src/DataFrameIntervals.jl

github-actions · 2022-06-27T19:42:30Z

src/DataFrameIntervals.jl

+    return map(steps[1:end-1], steps[2:end]) do start, stop
+        return backto(el, Interval{eltype(steps), Closed, Open}(start, stop))


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-    return map(steps[1:end-1], steps[2:end]) do start, stop
-        return backto(el, Interval{eltype(steps), Closed, Open}(start, stop))
+    return map(steps[1:(end - 1)], steps[2:end]) do start, stop
+        return backto(el, Interval{eltype(steps),Closed,Open}(start, stop))

src/DataFrameIntervals.jl

github-actions · 2022-06-27T19:42:30Z

src/DataFrameIntervals.jl

+    splits = intervals(range_(first(span), last(span); length=n+1), span_)
+    min_duration = if isnothing(min_duration) 
+        asnanoseconds(0.75*toval(Intervals.span(interval(first(splits)))))


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-    splits = intervals(range_(first(span), last(span); length=n+1), span_)
-    min_duration = if isnothing(min_duration) 
-        asnanoseconds(0.75*toval(Intervals.span(interval(first(splits)))))
+    splits = intervals(range_(first(span), last(span); length=n + 1), span_)
+    min_duration = if isnothing(min_duration)
+        asnanoseconds(0.75 * toval(Intervals.span(interval(first(splits)))))

github-actions · 2022-06-27T19:42:31Z

src/DataFrameIntervals.jl

+    else
+        min_duration
+    end
+    df = DataFrame(;(spancol => splits, label_helper(label) => value_helper(label, n))...)


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-    df = DataFrame(;(spancol => splits, label_helper(label) => value_helper(label, n))...)
+    df = DataFrame(; (spancol => splits, label_helper(label) => value_helper(label, n))...)

test/runtests.jl

github-actions · 2022-06-27T19:42:33Z

test/runtests.jl

+    df2 = DataFrame(label = rand(('a':'d'), n), sublabel = rand(('k':'n'), n), x = rand(n), span = spans)
+    df2_split = combine(groupby_interval_join(df2, quarters, on=:span, Cols(Between(:label, :sublabel), :quarter)), :x => mean)
+    df2_manual = combine(groupby(interval_join(df2, quarters, on=:span), Cols(Between(:label, :sublabel), :quarter)), :x => mean)


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-    df2 = DataFrame(label = rand(('a':'d'), n), sublabel = rand(('k':'n'), n), x = rand(n), span = spans)
-    df2_split = combine(groupby_interval_join(df2, quarters, on=:span, Cols(Between(:label, :sublabel), :quarter)), :x => mean)
-    df2_manual = combine(groupby(interval_join(df2, quarters, on=:span), Cols(Between(:label, :sublabel), :quarter)), :x => mean)
+    df2 = DataFrame(; label=rand(('a':'d'), n), sublabel=rand(('k':'n'), n), x=rand(n),
+                    span=spans)
+    df2_split = combine(groupby_interval_join(df2, quarters; on=:span,
+                                              Cols(Between(:label, :sublabel), :quarter)),
+                        :x => mean)
+    df2_manual = combine(groupby(interval_join(df2, quarters; on=:span),
+                                 Cols(Between(:label, :sublabel), :quarter)), :x => mean)

test/runtests.jl

github-actions · 2022-06-27T19:42:33Z

test/runtests.jl

+        Aqua.test_all(DataFrameIntervals; 
+                    project_extras=true,
+                    stale_deps=true,
+                    deps_compat=true,
+                    project_toml_formatting=true,
+                    ambiguities=false)


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-        Aqua.test_all(DataFrameIntervals; 
-                    project_extras=true,
-                    stale_deps=true,
-                    deps_compat=true,
-                    project_toml_formatting=true,
-                    ambiguities=false)
+        Aqua.test_all(DataFrameIntervals;
+                      project_extras=true,
+                      stale_deps=true,
+                      deps_compat=true,
+                      project_toml_formatting=true,
+                      ambiguities=false)

src/DataFrameIntervals.jl

test/runtests.jl

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions · 2022-06-28T20:39:36Z

src/DataFrameIntervals.jl

+const IntervalTuple = Union{NamedTuple{(:start, :stop)}, NamedTuple{(:stop, :start)}}
+interval_type(x::Type{<:T}) where T<:IntervalTuple = Union{T.parameters[2].parameters...}


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-const IntervalTuple = Union{NamedTuple{(:start, :stop)}, NamedTuple{(:stop, :start)}}
-interval_type(x::Type{<:T}) where T<:IntervalTuple = Union{T.parameters[2].parameters...}
+const IntervalTuple = Union{NamedTuple{(:start, :stop)},NamedTuple{(:stop, :start)}}
+interval_type(x::Type{<:T}) where {T<:IntervalTuple} = Union{T.parameters[2].parameters...}

github-actions · 2022-06-28T20:39:36Z

src/DataFrameIntervals.jl

+function IntervalArray(x::AbstractVector{<:IntervalTuple}) 
+    return IntervalArray{typeof(x), Interval{interval_type(eltype(x)), Closed, Open}}(x)


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-function IntervalArray(x::AbstractVector{<:IntervalTuple}) 
-    return IntervalArray{typeof(x), Interval{interval_type(eltype(x)), Closed, Open}}(x)
+function IntervalArray(x::AbstractVector{<:IntervalTuple})
+    return IntervalArray{typeof(x),Interval{interval_type(eltype(x)),Closed,Open}}(x)

github-actions · 2022-06-28T20:39:36Z

src/DataFrameIntervals.jl

+interval(x::IntervalTuple) = Interval{interval_type(x), Closed, Open}(x.start, x.stop)
+backto(::NamedTuple{(:start, :stop)}, x::Interval) = (;start=first(x), stop=last(x))
+backto(::NamedTuple{(:stop, :start)}, x::Interval) = (;stop=last(x), start=first(x))


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-interval(x::IntervalTuple) = Interval{interval_type(x), Closed, Open}(x.start, x.stop)
-backto(::NamedTuple{(:start, :stop)}, x::Interval) = (;start=first(x), stop=last(x))
-backto(::NamedTuple{(:stop, :start)}, x::Interval) = (;stop=last(x), start=first(x))
+interval(x::IntervalTuple) = Interval{interval_type(x),Closed,Open}(x.start, x.stop)
+backto(::NamedTuple{(:start, :stop)}, x::Interval) = (; start=first(x), stop=last(x))
+backto(::NamedTuple{(:stop, :start)}, x::Interval) = (; stop=last(x), start=first(x))

github-actions · 2022-06-28T20:39:36Z

test/runtests.jl

+    @test isapprox(duration(quarters.span[2]), duration(quarters.span[3]),
+                   atol=Nanosecond(1))
+    @test isapprox(duration(quarters.span[2]), duration(quarters.span[3]);
+                    atol=Nanosecond(1)) ||


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-                    atol=Nanosecond(1)) ||
+                   atol=Nanosecond(1)) ||

github-actions · 2022-06-28T20:39:36Z

test/runtests.jl

+    nt_spans = [(;start=start(x), stop=stop(x)) for x in spans]
+    df1_nt = hcat(df1[!, Not(:span)], DataFrame(;span = nt_spans))


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-    nt_spans = [(;start=start(x), stop=stop(x)) for x in spans]
-    df1_nt = hcat(df1[!, Not(:span)], DataFrame(;span = nt_spans))
+    nt_spans = [(; start=start(x), stop=stop(x)) for x in spans]
+    df1_nt = hcat(df1[!, Not(:span)], DataFrame(; span=nt_spans))

github-actions · 2022-06-28T20:39:37Z

test/runtests.jl

+                    span=spans)
+    df2_split = combine(groupby_interval_join(df2, quarters,
+                                              Cols(Between(:label, :sublabel), :quarter);
+                                              on=:span,),


[JuliaFormatter] _{reported by reviewdog 🐶}

Suggested change

-                                              on=:span,),
+                                              on=:span),

- Updated readme to use the actual names I ended up going with in #1
- Fix bug when passing `Pair` object with `on`
- Fix bug in method definition for `quantile_windows`
- Test various keyword arguments for `interval_join`

haberdashPI added 3 commits May 13, 2022 15:11

Initial, PkgTemplates setup

03b4521

initial draft, unfinished tests

454b2a6

starting to work on tests...

6e75201

haberdashPI marked this pull request as draft June 6, 2022 15:39

haberdashPI added 2 commits June 14, 2022 21:22

fixing some tests

c2173c1

working tests?

c4be972

palday reviewed Jun 17, 2022


haberdashPI and others added 5 commits June 17, 2022 21:37

tests working locally

b1237d7

Remove license

1ea329a

Remove doc manifest

977f5b5

add a comment

2685f89

Apply suggestions from code review

11d15f3

Co-authored-by: Phillip Alday <[email protected]>

haberdashPI marked this pull request as ready for review June 17, 2022 21:52

haberdashPI requested a review from kolia June 17, 2022 21:54

update julia versions

445b3eb

haberdashPI self-assigned this Jun 21, 2022

added example to readme

3f1f88b

haberdashPI commented Jun 21, 2022


kolia approved these changes Jun 21, 2022


haberdashPI and others added 3 commits June 22, 2022 20:41

wip refactor

985da80

initial draft of code review revisions

4ce41e1

Update Project.toml

1099b80

Co-authored-by: Phillip Alday <[email protected]>

haberdashPI added 6 commits June 23, 2022 16:21

tests working locally

e8eb202

some tweaks

a050b5c

YAS formatting

b641861

tagbot

7321364

updated comments

e834bd7

add empty examples dir

17fbda1

fixed misplaced file

f2a8c3c

github-actions bot reviewed Jun 27, 2022


Formatting fixes

1a9566e

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot reviewed Jun 27, 2022


formatting fixes

bd45ceb

github-actions bot reviewed Jun 27, 2022


haberdashPI and others added 2 commits June 27, 2022 15:54

Moar formatting

600a84b

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

handle named tuples

7190dfc

github-actions bot reviewed Jun 28, 2022


haberdashPI added 4 commits June 28, 2022 20:47

formatting

28f93f6

project setup

3ab48e7

formatting fixes

991e5b6

make this 0.0.1 release

a157e9a

haberdashPI merged commit 39fcd6a into main Jun 28, 2022

haberdashPI deleted the dfl/initial-setup branch June 28, 2022 21:04

haberdashPI mentioned this pull request Jun 30, 2022

some small repairs #7

Merged
