Sankey Diagram: A Guide to Flow Data Visualization

Define Sankey diagram nodes and links to visualize complex data flows. Discover best practices for mapping conversion funnels and resource audits.

A Sankey diagram is a data visualization that shows how volume or value flows from one state to another. You can recognize these diagrams by their curved paths, where the width of each path matches the quantity it represents. Marketers use them to track how users move through a website or a sales funnel.

What is a Sankey Diagram?

A Sankey diagram emphasizes the flow, movement, or change of an extensive property within a system. These diagrams consist of nodes (the stages or items being connected) and links (the connections between them). The thickness of each link is proportional to the flow rate, making it easy to identify the most significant contributions to a process.
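The proportional-width rule can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's method: the function name `link_widths` and the `max_width_px` parameter are assumptions made for this example, and the flow values reuse the homepage scenario from the Examples section.

```python
def link_widths(flows, max_width_px=40.0):
    """Scale each flow to a stroke width so the largest flow gets max_width_px.

    `flows` maps (source, target) pairs to quantities. Both the function name
    and max_width_px are hypothetical, chosen only for this sketch.
    """
    largest = max(flows.values())
    if largest <= 0:
        raise ValueError("at least one flow must be positive")
    # Width is linear in the flow value, so doubling the flow doubles the width.
    return {link: max_width_px * value / largest for link, value in flows.items()}


flows = {
    ("Home", "Products"): 3000,
    ("Home", "Blog"): 1500,
    ("Home", "Exit"): 500,
}
widths = link_widths(flows)
for (source, target), width in widths.items():
    print(f"{source} -> {target}: {width:.1f}px")
```

Because the scaling is linear, the "Blog" link comes out half as wide as the "Products" link, which is the visual cue readers rely on.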

The visualization is named after [Captain Matthew Sankey, who used it in 1898 to show the energy efficiency of a steam engine] (Wikipedia). While the first black-and-white versions showed a single type of flow, modern versions use color to represent additional variables and complex relationships.

Why Sankey diagrams matter

Sankey diagrams help you spot patterns that data tables might hide. In an SEO or marketing context, they allow you to:

  • Visualize traffic paths: [Google Analytics uses Sankey diagrams to show how users move from one page to the next on a website] (Google Charts).
  • Identify dropout points: You can see exactly where potential customers leave your conversion funnel.
  • Audit resource allocation: Map out how budgets or personnel are distributed across different departments or projects.
  • Analyze hiring or sales journeys: Track how individuals progress through stages, such as [moving from 52 job applications to a single accepted offer] (storytelling with data).
  • Compare complex accounts: Review budgets, financial results, or national energy consumption in a single view.

How a Sankey diagram works

A Sankey layout is computed from a list of links, each defined by a "source," a "target," and a weight.

  1. Define the categories: Identify your starting points (sources) and end points (targets).
  2. Assign weights: Give each connection a numerical value representing its strength or volume.
  3. Map the connections: Formatting typically follows a simple structure: Source [Amount] Target.
  4. Automatic layout: [Layout engines often use specialized code to arrange nodes automatically for better readability] (Google Charts).
  5. Interactive rendering: Many web-based diagrams use SVG or VML to render in a browser, allowing you to hover over links to see specific values.
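The first three steps can be sketched as a small parser for the Source [Amount] Target structure. This is an illustrative sketch rather than any tool's actual parser: the names `parse_links` and `node_totals` are assumptions, and the budget figures are made-up sample values.

```python
import re
from collections import defaultdict

# One link per line: "Source [Amount] Target".
LINK_PATTERN = re.compile(r"^\s*(.+?)\s*\[\s*([0-9]*\.?[0-9]+)\s*\]\s*(.+?)\s*$")


def parse_links(text):
    """Turn 'Source [Amount] Target' lines into (source, target, amount) tuples."""
    links = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = LINK_PATTERN.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        source, amount, target = match.groups()
        links.append((source, target, float(amount)))
    return links


def node_totals(links):
    """Sum the weight flowing into and out of every node."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, amount in links:
        outflow[source] += amount
        inflow[target] += amount
    return dict(inflow), dict(outflow)


# Hypothetical sample data, for illustration only.
text = """
Product Sales [70] Total Budget
Service Fees [30] Total Budget
Total Budget [40] Marketing
Total Budget [35] R&D
Total Budget [25] Operations
"""
links = parse_links(text)
inflow, outflow = node_totals(links)
print(inflow["Total Budget"], outflow["Total Budget"])
```

A layout engine would take a link list like this as input. The per-node totals are what determine each node's height in the final diagram.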

Types of Sankey diagrams

While the core concept remains the same, specialized versions exist for specific data needs.

Type | Purpose | Best Used For
Process Map | Shows steps in a defined workflow. | Identifying waste in production or hiring.
Conversion Funnel | Tracks user progress toward a goal. | E-commerce checkout or sales pipelines.
Energy/Material Flow | Maps the physical journey of resources. | Physics, engineering, or sustainability audits.
Multilevel Sankey | Displays many stages of a journey. | Visualizing global migration or energy budgets.

Best practices

  • Avoid data cycles: Do not link a node back to itself or to a previous stage, as [cycles will prevent the chart from rendering] (Google Charts).
  • Limit the number of nodes: Too many stages can create "visual clutter" that makes the diagram overwhelming.
  • Use contrasting colors: Apply different hues to distinct flow types to help viewers track specific paths through the system.
  • Optimize layout iterations: For complex diagrams, [increasing the number of layout engine iterations will result in a more pleasing arrangement] (Google Charts).
  • Label clearly: Add currency symbols or suffixes (like "B" for billion) so viewers understand the scale of the data immediately.
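The cycle rule above can be checked before the data reaches a charting tool. The sketch below uses Kahn's topological-sort algorithm, a standard graph technique swapped in here for illustration. It is not taken from any charting library, and the function name `has_cycle` is an assumption.

```python
from collections import defaultdict, deque


def has_cycle(links):
    """Return True if the (source, target) pairs contain a cycle or self-loop.

    Kahn's algorithm: repeatedly remove nodes with no incoming links.
    If some nodes can never be removed, they sit on a cycle.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for source, target in links:
        successors[source].append(target)
        indegree[target] += 1
        nodes.update((source, target))

    queue = deque(node for node in nodes if indegree[node] == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return ordered != len(nodes)


print(has_cycle([("Home", "Products"), ("Products", "Checkout")]))
print(has_cycle([("Home", "Products"), ("Products", "Home")]))
```

If the check reports a cycle, one common workaround is to give the returning flow its own node, such as a separate "Home (return visit)" stage, so every link still points forward.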

Common mistakes

  • Mistake: Using a Sankey for categorical data with no flow.
    Fix: Use a bar chart or dot plot if there is no directional relationship between items.
  • Mistake: Overlapping flows that obscure data.
    Fix: Adjust node padding or drag nodes manually to clear the path.
  • Mistake: Expecting precise comparisons of link widths.
    Fix: Always include data labels with exact numbers for critical values.
  • Mistake: Leaving links fully opaque.
    Fix: Use [fill opacity settings (like 0.8)] (Google Charts) to ensure overlapping links remain visible.

Examples

  • Example scenario: A website owner wants to see where visitors go after the homepage. The Sankey shows 5,000 users at the "Home" node, with 3,000 flowing to "Products," 1,500 to "Blog," and 500 exiting the site.
  • Example scenario: A CFO visualizes the company budget. Revenue flows from "Product Sales" and "Service Fees" into a central "Total Budget" node, then splits into "Marketing," "R&D," and "Operations" proportional to their funding.
  • Example scenario: An election analyst tracks runoff voting. The diagram shows votes from [eliminated candidates moving to surviving candidates in the next round] (SankeyMATIC).

Feature | Sankey Diagram | Alluvial Diagram | Parallel Coordinate Plot
Goal | Show flow volume across stages. | Show how items re-group over time. | Compare variables across dimensions.
Link Width | Proportional to volume. | Proportional to volume. | Uniform line thickness.
Direction | Inherent directional flow. | Change across time. | No specific order or flow.

FAQ

What makes a Sankey diagram different from other flow charts?
The defining characteristic is the width of the links. In a standard flow chart, arrows simply show direction. In a Sankey, the arrows are sized proportionally to the quantity they carry. This allows you to see at a glance where most of the "mass" in a system is moving.

Can I use Sankey diagrams for financial reporting?
Yes. They are increasingly used by finance leaders to [turn complex income statements into intuitive infographics] (SankeyArt). They are particularly helpful for showing how income is split across expense categories such as taxes, housing, and savings.

Are there browser limitations for these diagrams?
Some modern libraries [do not support older browsers like Internet Explorer 8] (Google Charts). Most tools currently render using SVG, which works in all modern web browsers.

How do I handle "lost" values in a funnel?
You should create a specific node for drop-offs, labeled as "Rejected," "Exited," or "Unsuccessful." This ensures the data remains "conserved" within the system boundaries, showing where 100% of the input went.
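As a rough sketch of that idea, the helper below compares what each stage receives with what it passes on and routes the difference to a drop-off node. The function name `add_dropoff_links`, its parameters, and the funnel numbers are all hypothetical, chosen only to illustrate the bookkeeping.

```python
from collections import defaultdict


def add_dropoff_links(links, final_nodes, dropoff_label="Exited"):
    """Append a link to `dropoff_label` for every stage that loses volume.

    `links` holds (source, target, amount) tuples. `final_nodes` names the
    stages where the flow is meant to end, so they are not treated as leaks.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, amount in links:
        outflow[source] += amount
        inflow[target] += amount

    balanced = list(links)
    for node, received in inflow.items():
        if node in final_nodes:
            continue
        lost = received - outflow[node]
        if lost > 0:
            # Whatever arrived but did not move on is shown as a drop-off.
            balanced.append((node, dropoff_label, lost))
    return balanced


# Hypothetical funnel, for illustration only.
funnel = [
    ("Visits", "Cart", 1000),
    ("Cart", "Checkout", 400),
    ("Checkout", "Purchase", 250),
]
balanced = add_dropoff_links(funnel, final_nodes={"Purchase"})
for source, target, amount in balanced:
    print(f"{source} [{amount:g}] {target}")
```

With the drop-off links added, every intermediate stage has equal inflow and outflow, so the diagram accounts for all of the input.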
