Skip to content

Conversation

JCQuintas
Copy link
Member

@JCQuintas JCQuintas commented Jul 24, 2025

Resolves #7930

I've changed the API to only need links array to be defined. The node is an object that can be used to provide configuration to the nodes if needed.

By default the link/source/target order is used as the node's order. Otherwise, the node array is used when present.

To improve

There are two props used for styling. nodeOptions and linkOptions. Currently they have a mix of styling and configuration. Eg: nodeOptions.color style, nodeOptions.padding configuration (not real padding, used in the sankey layout fn). It might not be the clearest 🤔

Another possible option is to have a nodeStyle/linkStyle and move configuration to the root series, eg: { nodeStyle: { fill: 'blue', ...(accept css props maybe?) }, nodePadding: 10 }

Questions

  • Do we need legend? (assuming no)

Usage

  const data: SankeyData = {
    nodes: [
      { id: 'B', color: 'red' },
    ],
    links: [
      { source: 'A', target: 'B', value: 5, color: 'red' },
      { source: 'A', target: 'C', value: 3 },
      { source: 'B', target: 'D', value: 3 },
      { source: 'C', target: 'D', value: 1 },
    ],
  };

  return (
    <SankeyChart
      series={{data}}
    />
  );

Examples

Screenshot 2025-07-24 at 12 54 38

A more complex sankey, with custom color for each node/link.

Changelog

  • Add new SankeyChart under the Unstable_ key.
Screenshot 2025-07-24 at 12 54 33

@JCQuintas JCQuintas self-assigned this Jul 24, 2025
@JCQuintas JCQuintas added type: new feature Expand the scope of the product to solve a new problem. scope: charts Changes related to the charts. labels Jul 24, 2025
@mui-bot
Copy link

mui-bot commented Jul 24, 2025

Deploy preview: https://deploy-preview-18895--material-ui-x.netlify.app/

Updated pages:

Bundle size report

Bundle Parsed size Gzip size
@mui/x-data-grid 0B(0.00%) 0B(0.00%)
@mui/x-data-grid-pro 0B(0.00%) 0B(0.00%)
@mui/x-data-grid-premium 0B(0.00%) 0B(0.00%)
@mui/x-charts 🔺+60B(+0.02%) 🔺+25B(+0.03%)
@mui/x-charts-pro 🔺+13KB(+3.28%) 🔺+4.06KB(+3.39%)
@mui/x-date-pickers 0B(0.00%) 0B(0.00%)
@mui/x-date-pickers-pro 0B(0.00%) 0B(0.00%)
@mui/x-tree-view 0B(0.00%) 0B(0.00%)
@mui/x-tree-view-pro 0B(0.00%) 0B(0.00%)

Details of bundle changes

Generated by 🚫 dangerJS against 4dda1f8

Copy link

codspeed-hq bot commented Jul 24, 2025

CodSpeed Performance Report

Merging #18895 will not alter performance

Comparing JCQuintas:sankey (4dda1f8) with master (90d94c7)1

Summary

✅ 10 untouched benchmarks

Footnotes

  1. No successful run was found on master (22c79bb) during the generation of this report, so 90d94c7 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@bernardobelchior
Copy link
Member

bernardobelchior commented Jul 24, 2025

Great speed, good job 💪


An alternative would be to only accept links[] where the source/target are the node's labels, thought it may be simpler, it is not as flexible, like, where would I customise each node color?

We can make it so that the links are the source of truth, and the nodes provide additional information. E.g., the node 'C' below would have the label "C" because it's its ID and there's no label set for it.

  const data: SankeyValueType = {
    nodes: [
      { id: 'A', label: 'Node A' },
      { id: 'B', label: 'Node B' },
      { id: 'D', label: 'Node D' },
    ],
    links: [
      { source: 'A', target: 'B', value: 5, color: 'red' },
      { source: 'A', target: 'C', value: 3 },
      { source: 'B', target: 'D', value: 3 },
      { source: 'C', target: 'D', value: 1 },
    ],
  };

The benefit of such approach is that all sources and targets in links will be shown in the chart. The downside is there isn't a way to define a link to a non-existent node, but I'm failing to see why we would want that.


I've currently opted for nodes[] and links[] to be divided as they are better for customisation.

Another potential improvement is to define the nodes as an object, rather than an array:

  const data: SankeyValueType = {
    nodes: {
      'A': { label: 'Node A' },
      'B': { label: 'Node B' },
      'C': { label: 'Node C' },
      'D': { label: 'Node D' },
    },
    links: [
      { source: 'A', target: 'B', value: 5, color: 'red' },
      { source: 'A', target: 'C', value: 3 },
      { source: 'B', target: 'D', value: 3 },
      { source: 'C', target: 'D', value: 1 },
    ],
  };

That way we have O(1) access to nodes and we ensure there aren't multiple nodes with the same ID. Do you see any downside?


Finally, what's the plan for dealing with cycles? It seems d3-sankey just throws. Do we just want to let that bubble up?

@JCQuintas
Copy link
Member Author

The benefit of such approach is that all sources and targets in links will be shown in the chart. The downside is there isn't a way to define a link to a non-existent node, but I'm failing to see why we would want that.

I think the downside is that you can't have "nodes" without links. You could technically want that for, in the example of energy, display untapped potential. Though the current code doesn't support that too.


Another potential improvement is to define the nodes as an object, rather than an array:

I thought about it, but the underlying d3 func needs nodes as array


Finally, what's the plan for dealing with cycles? It seems d3-sankey just throws. Do we just want to let that bubble up?

I had a fix for it ready, it's just checking if the source === target and ignore the link and warn the user.

@bernardobelchior
Copy link
Member

bernardobelchior commented Jul 24, 2025

I think the downside is that you can't have "nodes" without links. You could technically want that for, in the example of energy, display untapped potential. Though the current code doesn't support that too.

I see.

How would that work, though? If the value is in the links, there isn't a way to define a "untapped potential", right? How would you imagine that working in terms of API?

I thought about it, but the underlying d3 func needs nodes as array

We could convert it, right? IMO, the array API looks better, but I also like the idea of making impossible states impossible to represent, which in this case would mean using an object keyed by node ID.

I had a fix for it ready, it's just checking if the source === target and ignore the link and warn the user.

There can also be indirect loops, right?

Something like:

  const data: SankeyValueType = {
    links: [
      { source: 'A', target: 'B', value: 5 },
      { source: 'A', target: 'C', value: 3 },
      { source: 'C', target: 'A', value: 3 },
    ],
  };

Creates a link A -> B -> C -> A. Looking at a single link's source and target isn't enough to catch this case.

I'm not saying we need to implement this cycle detection algorithm ourselves. My question was more like: do we want to catch d3's circular link error and treat it differently? E.g., provide a onCircularLinkError prop, or throw a more descriptive error in dev?

@JCQuintas
Copy link
Member Author

Creates a look A -> B -> C -> A. Looking at a single link's source and target isn't enough to catch this case.

🤦

@alexfauquette
Copy link
Member

Node format

About getting nodes as an object instead of an array, I'm more in favor of the opposite direction. Because the array as an order and this order can be used to sort nodes.

Just by using the D3 methods, you get

  • .nodeSort(undefined): vertical order of nodes within each column will be determined automatically by the layout (default behavior)
  • .nodeSort(null): vertical order of nodes order is fixed by the input

If you already know the fix order you're looking for, providing the nodes array in a good format could do all you need

Cycles

Of course, D3 needs to compute the Topological ordering of DAG to slice nodes per column.

I think we can raise the error. Because even if you catch the cycle, you don't know which element should be removed, so we don't know what to display

Comment on lines 86 to 90
// Calculate layout based on data and dimensions
const layout: SankeyLayout = React.useMemo(
() => calculateSankeyLayout(data, drawingArea, nodeWidth, nodeGap, iterations),
[drawingArea, data, nodeWidth, nodeGap, iterations],
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more a comment for latter. WHile playing with Echarts, it looke like the sa,key ayout is a state that get saved in a state, such that external modification can have different behaviors:

  • drawing area update => reshape the current structure with linear interpolation between the previouse and next area available
  • data update => do a full re-computation

This is especially visible when you play with the drag and drop

Capture.video.du.2025-08-01.11-36-24.mp4

@alexfauquette
Copy link
Member

About choice between array/Map strucutre, the internal frequently convert from array to either Map or Set so not a big deel if we add a similar transormation one more time in our code
https://github.com/d3/d3-sankey/blob/bb7233ea0004669878d922e55fe83084ef3cb555/src/sankey.js#L132

In case you wonder how the circular get find, It's a brutal O(N^2) algorithm

https://github.com/d3/d3-sankey/blob/bb7233ea0004669878d922e55fe83084ef3cb555/src/sankey.js#L157-L173

@JCQuintas
Copy link
Member Author

About getting nodes as an object instead of an array, I'm more in favor of the opposite direction. Because the array as an order and this order can be used to sort nodes.

An object also has consistent sorting. We can allow the provided object to serve as the order, though I would prefer we just accept a sorting function instead for custom orders.

In case you wonder how the circular get find, It's a brutal O(N^2) algorithm

https://github.com/d3/d3-sankey/blob/bb7233ea0004669878d922e55fe83084ef3cb555/src/sankey.js#L157-L173

We can just return the error for now I guess.

Comment on lines 76 to 80
<defs>
<clipPath id={clipId}>
<path d={`M${x0},${y0} L${x1},${y0} L${x1},${y1} L${x0},${y1} Z`} />
</clipPath>
</defs>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen this alternative solution to avoid clipPath?

https://observablehq.com/@enjalot/weird-sankey-links

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but his solution creates this weird shape

Screenshot 2025-08-19 at 18 44 18

It gets more pronounced the more the Y axis is moved apart.

Screenshot 2025-08-19 at 18 46 51

I have current solution that creates a shape without these issues though.

I'm just not sure the styling drawbacks were worth it, though it is up for discussion. I still have the implementation in a stash.

Mainly the line can be dashed and the browser does all the work for us. The code to generate good lines is complex.

In short it:

  • Gets the current link data
  • Creates a curve based on the given parameters
  • Split this curve into multiple paths
  • Calculate the angle of each path
  • Create a new point above+below the original point using the angle (extruding the line)
  • Ensure no point/shape is crossing the boundaries
  • Transform into a path.d

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some testing, my custom approach looks better on regular use-cases.

Screenshot 2025-08-20 at 15 33 16

While the naive approach looks better on smaller width or sankeys with too many nodes 🤔

Screenshot 2025-08-20 at 15 32 40

Not sure if we should combine the two somehow or pick one 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

  • if link height > link width => D3+clip-path
  • else => naive aproach

I spent to much time trying to find a formula that tells us when D3 link will do weird stuff or not without success

Copy link
Member

@alexfauquette alexfauquette Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally used a math solver to fix all my by-hand mistakes
And there is a magical formula that can tell you if the parallel will have a "weird shape"

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a "math solver"? 🤔

Thanks for the formula, will try it out. I also spent some time trying to figure it out, but couldn't.

I've noticed I forgot to share the repo I was trying stuff on, if you want to play. 😓 https://github.com/JCQuintas/sankey-link-path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used sympy to run the maths. The exact terms might be "symbolic computation" since I did not use the solver method :)

If you want to have a look

from sympy import *

init_printing()
H, W, h, t = symbols('H W h t', real=True)

# Besier curve starting at (0, 0) going to (W, H) with control points at (W/2, 0), (W/2, H)
x = t**3*W - 3/2*W*t*t + 3*t*W/2
y = -t**3*2*H + 3*H*t*t

# The derivative of the besier curve
xd = diff(x, t)
yd = diff(y, t)

# The norm of the derivative
nd = sqrt(xd**2 + yd**2)

# The parallel curve at a distance h/2 from the initial curve
xp = x + h/2 * yd / nd
yp = y - h/2 * xd / nd


# compute x'(t) for the parallel curves
result = simplify(diff(xp, t))

# If a weird shape occurs, at the end the curve will go from right to left instead of left to right.
# So x'(t=1) will be negative
print(simplify(result.replace(t, 1)) )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen this 😮

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've simple maths to do you can try
https://www.wolframalpha.com/input?i=cos%28atan%28x%29%29

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using the naive approach and adding a prop instead to better control the control points of the bezier curve. This offers nice looking curves, paths, and an option for the user to adjust the looks if they want the links to look nicer/fuller

Screen.Recording.2025-08-21.at.14.36.45.mov

Copy link
Member

@alexfauquette alexfauquette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor concern about item identifier type. The rest looks good 👍


return {
identifier,
color: node.color!,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All those color! and value! are sign of a type issue. If you're sure those are with default value TS should be aware, because user doing their own tooltip will not know be able to guess it.

IMO the root cause is your bending of the notion of identifier.

The initial idea was to provide the minimal set of data needed to find an item. Most of the time the

  • series type
  • series id
  • data index

such that if one value is different you know it's not the same item.

Your SankeyItemIdentifier contains much more information. For example it can get the color, width, ..

For me the SankeyItemIdentifier should only contains

  • series type
  • series id
  • (type="node", node id) | (type"link", source id, target id)

And from this identifier the tooltip should get the full data.

For example when user will want to control the highlight, the highlightedItem prop will get type SankeyItemIdentifier which allow them to provide color and other useless properties.

Providing a more complete object for onClick events looks good, but we should have two distinct types:

  • SankeyItemIdentifier: A dry type that only focus on the necessary data. Used for our internal comparisons
  • SankeyItemIdentifierWithData: A type that also contains all the data we know about the link/node related to this item. Used for user facing callback

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause is the sankey types in d3, the identifier types have nothing to do with the ! assertions. The d3 sankey types assume the link/node are both initialised and not initialised, so the types are always the "non-defaultized" way, to put it into our internal language.

This causes this issues, I can change that by making everything required, and it should fix the types here, I just didn't think much of that at the time.


But regarding the identifier overloading, I'm not sure you suggestion is a good approach for sankey.

Both us and the user would have to traverse both the node and links array and re-create the data back in order to display the tooltip, while all these values are already calculated. We could store this and provide a hooks for that, but for what reason if the data needed can be readily available?

IMO it adds a lot more overhead for no real value except the idea that the identifier should be "simple".

It works for the other types because the data is simple, but as we start to work towards more complex visualisations, I believe we will need to change this concept anyways.

Eg: how would an identifier of node in a treeview look like? We could argue a path of ids like ['451ad', '1212b', 'jki40'], or the node entity itself with a parent ref node.parent are both valid, but which one would be best?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both us and the user would have to traverse both the node and links array and re-create the data back in order to display the tooltip, while all these values are already calculated. We could store this and provide a hooks for that, but for what reason if the data needed can be readily available?

I think there is a miss-understanding. If the data is available, I'm in favor of providing it to the user. Having an onClick?: (event, fullNodeData) => void is ok. I even think it could be an improvement for other charts in next majors.

My issue is with the typing. If you reintroduce the highlight feature, you will get itemHighlighted?: SankeyItemIdentifier | null. And for me it feels wrong that the autocomplet in itemHighlighted proposes to provide the node value, color, and other fields.

For example you have a dashboard about energy with multiple charts, and want you have a feature allowing you to select between electricity, gaz, or fuel to highlight/modify charts. In that case I would expect to write something as follow, and don't see any other property supported by TS

<SankeyChart highlighterItem={{type: 'sankey', subType: 'node', id: energyType }} />

Idealy we would have SankeyItemIdentifierWithData extends SankeyItemIdentifier to simplify controll

const [nodeItem, setNodeItem] = React.useState<SankeyItemIdentifierWithData | null>(null);

<SankeyChart 
  highlighterItem={nodeItem} // I pass more data than needed but that's ok
  onNodeClick={(_, newNode) =>  setNodeItem(newNode)} // I get all the data available
/>

It works for the other types because the data is simple, but as we start to work towards more complex visualisations, I believe we will need to change this concept anyways.
Eg: how would an identifier of node in a treeview look like? We could argue a path of ids like ['451ad', '1212b', 'jki40'], or the node entity itself with a parent ref node.parent are both valid, but which one would be best?

I don't get your example. There are multiple ways to represent data, but that's not an issue. We can pick the one we prefer.

In your case have an internal representation with

/** The unique identifier **/
interface TreeMapItemIdentifier {
  type: 'tree-map';
  seriesId: SeriesId;
  path: string[];
}

/** The item identifier used for callback**/
interface TreeMapItemIdentifierWithData extends TreeMapItemIdentifier {
  value: number;
  color: string;
  formattedValue: string;
  ...
}

interface TreeMapProps {
  highlightedItem?: TreeMapItemIdentifier | null;
  onItemClick?: (event, item: TreeMapItemIdentifierWithData) => void;
}

Copy link
Member Author

@JCQuintas JCQuintas Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SankeyItemIdentifierWithData extends SankeyItemIdentifier

There is no way to implement this right now I guess. Since we require the itemIdentifier from the ChartSeriesConfig.

I can make the type have an optional node/link, so the identifier becomes the nodeId/targetId/sourceId

export type SankeyItemIdentifier = {
  type: 'sankey';
  /**
   * Unique identifier for the series
   */
  seriesId: SeriesId;
} & (
  | {
      /**
       * Subtype to differentiate between node and link
       */
      subType: 'node';
      /**
       * The id of the node
       */
      nodeId: SankeyNodeId;
      /**
       * The node object with all the calculated properties
       */
      node?: SankeyLayoutNode;
    }
  | {
      /**
       * Subtype to differentiate between node and link
       */
      subType: 'link';
      /**
       * The id of the source node
       */
      sourceId: SankeyNodeId;
      /**
       * The id of the target node
       */
      targetId: SankeyNodeId;
      /**
       * The link object with all the calculated properties
       */
      link?: SankeyLayoutLink;
    }
);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way the link/node are optional in the identifier if you are passing that in. At least for now 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way the link/node are optional in the identifier if you are passing that in. At least for now 🤷

Looks like the worst of both worlds 😿

  • for controlled values, TS tells you to pass some data that will not be used
  • for custom components, TS tells you something is optional, whereas we know it's populated

There is no way to implement this right now I guess. Since we require the itemIdentifier from the ChartSeriesConfig.

Technically speekiing it's feasible. You introduce an attribute itemIdentifierWithData in the config with same type as the itemIdentifer for all series except the sankey.

Then you define this helper

export type ChartItemIdentifierWithData<T extends ChartSeriesType> =
  ChartsSeriesConfig[T]['itemIdentifierWithData'];

And now you can update the definition of the tooltip getter to be

export type TooltipGetter<TSeriesType extends ChartSeriesType> = (params: {
   series: ChartSeriesDefaultized<TSeriesType>;
   axesConfig: TooltipGetterAxesConfig;
   getColor: ColorGetter<TSeriesType>;
-  identifier: ChartItemIdentifier<TSeriesType> | null;
+  identifier: ChartItemIdentifierWithData<TSeriesType> | null;
 }) =>

The TS passes if in SankeyItemIdentifier the node can be undefined. If we mark it as mandatory (which is the interesting part) it fails for a more interesting reason. The result of the selector can not be assigned to the tooltipGetter.

If I resume types in a scheme, I get

  • The user provides two distinct source of truth. The data and the controlled identifier.
  • We can store the identifier with or without data:
    • identifier without data comes from controlled values
    • identifier with data comes from our own interaction management
  • In case the identifier has no data, the selector (or another piece of code) should add it
image

Proposal

Since the charts has no controlled value for now, we can have an identifier with node and link required.

Before adding a controlled highlight or tooltip, we should introduce this distinction between identifiers with/without data and the piece of code that transforms an identifier without data to an identifier with data

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented it on this PR, I was under the impression it would need more changes, but was quite simple

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually moved the changes here so it is easier to review: #19295

Copy link
Member

@alexfauquette alexfauquette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clean. I left minor comments

key={`${link.source.id}-${link.target.id}`}
link={link}
opacity={linkOptions?.opacity}
onClick={onItemClick}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing onItemClick by onNodeCLick/onLinkClick would allow to simplify the type of the context (i.e. onNOdeCLick only getting node context).

Plus the cursor={onClick ? 'pointer' : 'default'} would only be applied when needed. No pointer on link if only nodes have an onclick interaction

@JCQuintas JCQuintas merged commit 3dcf6c2 into mui:master Aug 26, 2025
22 checks passed
@JCQuintas JCQuintas deleted the sankey branch August 26, 2025 17:00
@dberardo-com
Copy link

Are dropoffs included in the first release?

@JCQuintas
Copy link
Member Author

@dberardo-com they are in our todo list as a sub-issue from the main issue, though there is no timeline for it yet: #7930

@dberardo-com
Copy link

@dberardo-com they are in our todo list as a sub-issue from the main issue, though there is no timeline for it yet: #7930

Thanks for the fast reply, I couldn't find the list of requested features that's why i asked (I guess notions having troubles at the moment).

I believe dropoffs are quite a differentiating factor, but that depends on what your typical user's use cases are.

Looking forward for the official docu to come up to see what the current API can achieve.

Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope: charts Changes related to the charts. type: new feature Expand the scope of the product to solve a new problem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[charts] Sankey charts
5 participants