Sankey Plot in R

A Sankey plot is a type of flow diagram in which the width of each arrow is proportional to the quantity that flows through it. Sankey plots are especially useful for showing transfers of data or energy, or the movement of materials between stages or categories. They make it easy to see which sources contribute most to a flow and how the parts of a system are connected.

Steps to Create a Sankey Plot in R

In R, there are several ways to make Sankey plots. This guide uses the networkD3 package because it is easy to use and fairly flexible. The process has five steps:

  1. Install and Load Required Packages
  2. Prepare the Data
  3. Create the Sankey Plot
  4. Customize the Plot
  5. Save or Export the Plot

1. Install and Load Required Packages

First, install the networkD3 package if you haven’t already. Then load the package into your R session:

install.packages("networkD3")
library(networkD3)

2. Prepare the Data

You need two data frames: one for the nodes and one for the links.

  • The nodes data frame lists all the entities involved.
  • The links data frame specifies the connections between these nodes and their flow values.

# Create nodes data frame
nodes <- data.frame(name = c("Source A", "Source B", "Source C", "Destination 1", "Destination 2"))
# Create links data frame
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value = c(10, 20, 30, 5, 15, 25))

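The nodes are labeled “Source A”, “Source B”, “Source C”, “Destination 1”, and “Destination 2”. The links specify the flow from each source to each destination, with the corresponding values. The source and target columns refer to nodes by zero-based position in the nodes data frame, so 0 is “Source A” and 3 is “Destination 1”.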
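Writing index numbers by hand becomes error-prone as the node list grows. As a minimal sketch, assuming you start from a table of flows keyed by node name (the flows data frame and its from and to columns are hypothetical, not part of networkD3), base R's match() can derive the zero-based indices:

```r
# Node names, in the order they should be indexed
nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"))

# Hypothetical input: flows described by node name rather than by index
flows <- data.frame(from  = c("Source A", "Source B", "Source C",
                              "Source A", "Source B", "Source C"),
                    to    = c("Destination 1", "Destination 1", "Destination 1",
                              "Destination 2", "Destination 2", "Destination 2"),
                    value = c(10, 20, 30, 5, 15, 25))

# match() returns 1-based positions, so subtract 1 to get zero-based indices
links <- data.frame(source = match(flows$from, nodes$name) - 1,
                    target = match(flows$to, nodes$name) - 1,
                    value  = flows$value)
```

The resulting links data frame has the same source, target, and value columns as the hand-written version.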

3. Create the Sankey Plot

Use the sankeyNetwork function to create the plot, as shown below:

sankeyPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                            Source = "source", Target = "target",
                            Value = "value", NodeID = "name",
                            units = "TWh", fontSize = 12, nodeWidth = 30)

# Print the object to display the plot
sankeyPlot
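One possible cause of a blank or incorrect plot is a link that points at an index with no matching row in nodes, for example when the indices start at 1 instead of 0. A minimal sketch of a sanity check, using a hypothetical helper named check_links() written in base R (it is not a networkD3 function):

```r
# Hypothetical helper: TRUE when every link index is a whole number
# between 0 and nrow(nodes) - 1
check_links <- function(links, nodes) {
  idx <- c(links$source, links$target)
  all(idx == floor(idx)) && all(idx >= 0) && all(idx <= nrow(nodes) - 1)
}

nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"))
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value  = c(10, 20, 30, 5, 15, 25))

ok <- check_links(links, nodes)          # TRUE: indices run from 0 to 4

# Shifting to one-based indices creates a target of 5, which has no node
bad_links <- transform(links, source = source + 1, target = target + 1)
bad <- check_links(bad_links, nodes)     # FALSE
```

The check only catches indices that fall outside the valid range; it cannot tell whether an in-range index points at the node you intended.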

4. Customize the Plot

Customizing a Sankey plot can make the data easier to read and the differences between flows easier to see. The networkD3 package lets you adjust features such as the width and spacing of the nodes, the font of the labels, and the colors of the nodes and links. Basic customization options are:

Adjusting Node Width and Font Size

  • nodeWidth: Sets the width of the nodes.
  • fontSize: Sets the font size for node labels.
  • units: Adds a unit of measurement to the values displayed.

sankeyPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                            Source = "source", Target = "target",
                            Value = "value", NodeID = "name",
                            units = "TWh", fontSize = 12, nodeWidth = 30)

By default, networkD3 shows a tooltip with details when you hover over a node or a link, which makes the plot interactive.
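Node color is one of the customizations mentioned in this section. As a sketch, assuming a hypothetical group column added to nodes and two arbitrarily chosen hex colors, the NodeGroup and colourScale arguments of sankeyNetwork can color sources and destinations differently:

```r
library(networkD3)

nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"))
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value  = c(10, 20, 30, 5, 15, 25))

# Assumed grouping column: one label per node, without spaces
nodes$group <- c("source", "source", "source", "destination", "destination")

# D3 ordinal scale mapping each group label to an arbitrarily chosen color
groupColors <- JS('d3.scaleOrdinal().domain(["source", "destination"]).range(["#1f77b4", "#ff7f0e"])')

coloredPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                             Source = "source", Target = "target",
                             Value = "value", NodeID = "name",
                             NodeGroup = "group", colourScale = groupColors,
                             units = "TWh", fontSize = 12, nodeWidth = 30)
coloredPlot
```

The group labels and colors here are placeholders; any labels work as long as the domain in the color scale lists the same values that appear in the group column.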

5. Save or Export the Plot

To save the plot as an HTML file, use the saveWidget function from the htmlwidgets package:

library(htmlwidgets)
saveWidget(sankeyPlot, file = "sankey_plot.html")
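By default, saveWidget bundles everything into a single HTML file (selfcontained = TRUE), a step that relies on pandoc and may fail on systems where pandoc is not installed. As a sketch for that situation, setting selfcontained = FALSE writes the supporting JavaScript and CSS files to a folder next to the HTML file instead:

```r
library(networkD3)
library(htmlwidgets)

nodes <- data.frame(name = c("Source A", "Source B", "Source C",
                             "Destination 1", "Destination 2"))
links <- data.frame(source = c(0, 1, 2, 0, 1, 2),
                    target = c(3, 3, 3, 4, 4, 4),
                    value  = c(10, 20, 30, 5, 15, 25))

sankeyPlot <- sankeyNetwork(Links = links, Nodes = nodes,
                            Source = "source", Target = "target",
                            Value = "value", NodeID = "name",
                            units = "TWh", fontSize = 12, nodeWidth = 30)

# selfcontained = FALSE skips the bundling step; the HTML file then
# depends on the folder of supporting files written alongside it
saveWidget(sankeyPlot, file = "sankey_plot.html", selfcontained = FALSE)
```

If you share a plot saved this way, send the folder of supporting files together with the HTML file.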

Here are a few more examples of Sankey plots with different datasets and customizations:

Energy Flow Sankey Plot

The Sankey plot for the energy flow example depicts how energy inputs (Coal, Oil, and Gas) are converted into electricity and how that electricity is then distributed to the Industry, Residential, and Commercial sectors. The width of each link varies with the amount of energy transferred, so the major energy inputs and outputs are easy to spot.

# Load necessary libraries
library(networkD3)
library(htmlwidgets)

# Create the nodes data frame
nodes <- data.frame(name = c("Coal", "Oil", "Gas", "Electricity", "Industry", 
                             "Residential", "Commercial"))

# Create the links data frame
links <- data.frame(source = c(0, 1, 2, 3, 3, 3),
                    target = c(3, 3, 3, 4, 5, 6),
                    value = c(50, 30, 20, 60, 20, 20))

# Create the Sankey plot
sankeyPlotEnergy <- sankeyNetwork(Links = links, Nodes = nodes,
                                  Source = "source", Target = "target",
                                  Value = "value", NodeID = "name",
                                  units = "TWh", fontSize = 12, nodeWidth = 30)

# Save the plot as an HTML file
saveWidget(sankeyPlotEnergy, file = "sankey_plot_energy.html")

Output:

sankey_plot_energy.html

When you execute the code for the Energy Flow example, the Sankey plot will display the following:

  • Nodes: The energy sources, the intermediate carrier, and the consuming sectors: “Coal”, “Oil”, “Gas”, “Electricity”, “Industry”, “Residential”, and “Commercial”.
  • Links: They show how energy from coal, oil, and gas flows into electricity generation and how that electricity is distributed among the industrial, residential, and commercial sectors.
  • Flow Widths: The width of each link is proportional to the energy transferred, expressed in terawatt-hours (TWh). For example, the link from “Coal” to “Electricity” (50 TWh) is wider than the link from “Gas” to “Electricity” (20 TWh).

This plot helps you understand the proportion contributed by each energy source and how electricity is distributed across the sectors.
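In the data for this example, the energy entering the Electricity node equals the energy leaving it, which is why the links on both sides of that node add up to the same height. A short base R check of that arithmetic, using the same links data frame:

```r
links <- data.frame(source = c(0, 1, 2, 3, 3, 3),
                    target = c(3, 3, 3, 4, 5, 6),
                    value  = c(50, 30, 20, 60, 20, 20))

# "Electricity" is the fourth node, so its zero-based index is 3
inflow  <- sum(links$value[links$target == 3])   # 50 + 30 + 20 = 100
outflow <- sum(links$value[links$source == 3])   # 60 + 20 + 20 = 100

balanced <- inflow == outflow                    # TRUE
```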

Website User Flow Sankey Plot

The Sankey diagram for the website user flow maps the paths that visitors take through a site. Starting from the Home page, it shows how many users go on to the About and Services pages, then to the Contact page, and finally to the Purchase page. The plot makes it easy to identify the most frequently used routes and the points where users are likely to drop off.

# Load necessary libraries
library(networkD3)
library(htmlwidgets)

# Create the nodes data frame
nodes <- data.frame(name = c("Home", "About", "Services", "Contact", "Purchase"))

# Create the links data frame
links <- data.frame(source = c(0, 0, 1, 2, 3),
                    target = c(1, 2, 3, 3, 4),
                    value = c(1000, 500, 200, 100, 50))

# Create the Sankey plot
sankeyPlotWeb <- sankeyNetwork(Links = links, Nodes = nodes,
                               Source = "source", Target = "target",
                               Value = "value", NodeID = "name",
                               units = "Users", fontSize = 12, nodeWidth = 30)

# Save the plot as an HTML file
saveWidget(sankeyPlotWeb, file = "sankey_plot_web.html")

Output:

sankey_plot_web.html

When you run the code for the Website User Flow example, the Sankey plot will illustrate:

  • Nodes: They represent the webpages: “Home”, “About”, “Services”, “Contact”, and “Purchase”.
  • Links: They show the progression of users from one page to another. In this data, visitors move from “Home” to “About” or “Services”, from either of those pages to “Contact”, and from “Contact” to “Purchase”.
  • Flow Widths: The width of each link is proportional to the number of users transitioning between pages. For instance, a wide link from “Home” to “About” indicates that many users visit the About page from the Home page.

This plot helps visualize user navigation paths on a website, showing the most common routes and highlighting any significant drop-offs or conversions.
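The drop-offs can also be computed directly from the links data. The sketch below assumes that users who enter a page and have no recorded onward link have left the site; the inflow, outflow, and dropoff columns are names introduced here for illustration:

```r
nodes <- data.frame(name = c("Home", "About", "Services", "Contact", "Purchase"))
links <- data.frame(source = c(0, 0, 1, 2, 3),
                    target = c(1, 2, 3, 3, 4),
                    value  = c(1000, 500, 200, 100, 50))

# Zero-based index of each node, matching the source and target columns
idx <- seq_len(nrow(nodes)) - 1

# Total users arriving at and leaving each page
nodes$inflow  <- sapply(idx, function(i) sum(links$value[links$target == i]))
nodes$outflow <- sapply(idx, function(i) sum(links$value[links$source == i]))

# Keep only intermediate pages, which have both incoming and outgoing links
middle <- nodes[nodes$inflow > 0 & nodes$outflow > 0, ]
middle$dropoff <- middle$inflow - middle$outflow
middle
```

With this data, About loses 800 of its 1000 visitors, Services loses 400 of 500, and Contact loses 250 of 300.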

Conclusion

Sankey diagrams are an effective way to show flows and the connections between related items. The networkD3 package in R makes it easy to create and customize these plots. By following the steps in this guide, you can create Sankey plots for different purposes, such as energy flow diagrams and tracking user activity on a website.

For options and features not covered here, refer to the networkD3 package documentation, which describes the other parameters the package provides.