Data Visualization
Follow these guidelines when visualizing data in our product. Have questions? Talk to Richard at [email protected]

Foreword

Epilogue from Edward Tufte’s The Visual Display of Quantitative Information:
Design is choice. The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper. Most principles of design should be greeted with some skepticism, for word authority can dominate our vision, and we may come to see only through the lenses of word authority rather than with our own eyes.
What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult–that is, the revelation of the complex.

Principles

Communicate data effectively

Know the question you want to answer. Visualizations should be focused on answering a single question. Attempting to answer multiple questions can increase the complexity of a visualization, making it harder to understand.
Show the data. A customer should focus on the data, not the design or anything else. Visualizations should encourage customers to compare different data points. Data should be shown at several levels of detail, allowing customers to see the complete picture. Consider display metrics showing a broad overview followed by a line graph depicting the historical context of that display metric.
Make complexity approachable. Visualizations should use clarity, precision, and efficiency to communicate complex data in a simple design. Customers should find large data sets approachable and easy to understand.
Example: A customer may send over one billion messages. Showing rates with 3 decimal places is necessary to provide the needed precision. A customer with only ten thousand sends needs rates with just 2 decimal places to get a similar level of precision.
Data tells a story. There’s often a narrative about the data. Consider not only the question the data is answering, but stories being told by the data.
Newly created messages combined with transition of bounced to suppressed
Example: Take the above Undelivered graph. What story might the data be telling? A history of 0 messages created followed by a slow increase tells us this customer is most likely integrating push. We know that when a push fails because the token is invalid or unregistered, we generate a bounce. Then, further messages are not sent and marked as suppressed. Seeing this relationship in rates combined with a decrease in bounce rate could mean a customer is fixing their issue with invalid tokens.
Now what if we know that’s a common story and we can codify it? Maybe it’s: X days of 0 messages sent, followed by 0 < n < 100 messages sent and a >50% failure rate. If we see it beginning to unfold for other customers, we can alert them with links for debugging their integration.
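As a rough illustration, the codified story could be checked against daily counts like this. It is a minimal sketch, not an existing implementation: the function name, the `quiet_days` default (standing in for the unspecified "X days"), and the input shape are all assumptions.

```python
from typing import Sequence


def looks_like_new_integration(
    daily_sent: Sequence[int],
    daily_failed: Sequence[int],
    quiet_days: int = 7,            # assumption: stands in for "X days"
    max_sent: int = 100,            # upper bound from "0 < n < 100"
    min_failure_rate: float = 0.5,  # ">50% failure rate"
) -> bool:
    """Return True if any day follows `quiet_days` days of 0 sends and has
    0 < sent < max_sent with a failure rate above min_failure_rate."""
    for day in range(quiet_days, len(daily_sent)):
        was_quiet = all(n == 0 for n in daily_sent[day - quiet_days:day])
        sent = daily_sent[day]
        if was_quiet and 0 < sent < max_sent:
            if daily_failed[day] / sent > min_failure_rate:
                return True
    return False


# Seven quiet days, then 10 sends of which 8 failed.
print(looks_like_new_integration([0] * 7 + [10], [0] * 7 + [8]))  # True
```

A match would be the trigger for alerting the customer; what that alert contains is outside this sketch.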

Consider the nature of the data

Understanding the data set will help determine how to best show the data, and ultimately how to best answer the question asked. The same data in different graphics will tell different stories, which will lead to different interpretations. Knowing the nature of the data will help determine how to present the data.
Example: Consider a count of email messages sent per month for the past year. That data will most likely consist of high numbers (in the hundreds of thousands) with a small magnitude of difference. So while a bar graph will show those counts over time, it will be hard to see the month-over-month difference, because a bar graph’s y-axis requires equal increments starting at 0. To better visualize the trend in monthly send volumes given that data, we could show:
  • A line graph depicting the counts with an adjusted y-axis (y-min to y-max),
  • A line graph depicting percentage change month-to-month,
  • Or something else entirely.
When showing this data, we have to take into account the fact that the data is made up of large numbers with a small magnitude of difference.
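To make the two line-graph options concrete, here is a small sketch that derives both series from monthly counts. The function names and the sample numbers are hypothetical and only serve the illustration.

```python
from typing import Optional


def y_axis_bounds(counts: list[int]) -> tuple[int, int]:
    """Adjusted y-axis for a line graph: y-min to y-max of the data."""
    return min(counts), max(counts)


def month_over_month_change(counts: list[int]) -> list[Optional[float]]:
    """Percentage change from each month to the next.

    Returns None for a month whose predecessor is 0, because the
    percentage change is undefined in that case.
    """
    changes: list[Optional[float]] = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            changes.append(None)
        else:
            changes.append((current - previous) / previous * 100)
    return changes


monthly_sends = [400_000, 410_000, 405_000]  # hypothetical data
print(y_axis_bounds(monthly_sends))
print([round(c, 2) for c in month_over_month_change(monthly_sends)])
```

With counts this close together, the percentage series makes a 2.5% rise followed by a 1.22% dip visible, which a zero-based bar graph would flatten.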

Maintain graphical integrity

While we won’t intentionally mislead customers, we strive to reduce the possibility of unintentionally misleading our customers. Use clear and detailed labeling to combat ambiguity.
A good rule of thumb is that the number of information-carrying dimensions depicted should not exceed the number of dimensions in the data. For example, take a bar graph displaying the counts of email sends. If the bar graph were 3D, depth would be introduced as an information-carrying dimension even though it does not provide any additional data.
What information is depth providing? Please don't do this.
For any printed graphics, we should ensure the physical measurements of graphics are directly proportional to the numerical quantities represented.

Alert customers to insights

When there is an opportunity, provide customers with insights. Helping customers understand and compare data empowers them to take action.
Example: Consider our spam rate threshold. When a customer’s spam rate goes above a certain level, we alert them to potential deliverability issues with a tooltip. That’s great, but it’s reactive.
By including that rate on a graph showcasing spam rate over time, we’re able to be proactive. A customer can now see the direction their spam rate is trending and can proactively take action.
Proactively addressing the issue and never hitting the 2.00% mark!
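One possible way to turn that trend into a proactive insight is to fit a straight line through recent spam rates and estimate how many periods remain before it reaches the threshold. This is only a sketch under assumptions: the 2.00% default comes from the caption above, while the function name and the choice of a least-squares line are illustrative.

```python
from typing import Optional, Sequence


def periods_until_threshold(
    rates: Sequence[float], threshold: float = 2.0
) -> Optional[float]:
    """Estimate periods left before a linear trend reaches `threshold`.

    Returns 0.0 if the latest rate is already at or above the threshold,
    and None if there are too few points or the trend is not rising.
    """
    n = len(rates)
    if n < 2:
        return None
    if rates[-1] >= threshold:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    # Ordinary least-squares slope over x = 0, 1, ..., n - 1.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rates))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope
    return max(crossing - (n - 1), 0.0)


print(periods_until_threshold([0.5, 1.0, 1.5]))  # 1.0
```

A straight-line fit is a deliberate simplification; real spam rates are noisy, so any alert built on it would need a tolerance.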

Guidelines

1. Define your intent

Start by defining the question you want to answer. Remember: visualizations should be focused on answering a single question. Next, determine what data is needed to answer your question.
Some considerations:
  • Narrative: Is there a story you want to tell? Does the data tell a story?
  • Level of detail: Is this a high-level metric? Or is it targeted toward granular details?
  • Accuracy: Is a high level of accuracy needed to convey a trend or pattern?
  • Insights: Are there insights we can alert customers to?

2. Choosing a visualization

| I want to show... | And I want to... | Recommendation is... |
| --- | --- | --- |
| Magnitude: Size comparisons, relative or absolute. Typically a count rather than rate. | Compare size of elements using a common, understood pattern. Show the size changing over time. | Bar Graph (Vertical) |
| | Compare size of elements using a common, understood pattern. Have many items, category labels are long, and data is not a time series. | Bar Graph (Horizontal) |
| | Display single data points, typically the current state. | Display Metrics |
| | Show large amounts of data where values are more important than trends. Enable scanning or sorting to facilitate comparing the data. | Table |
| Change over time: Changing trends. | Show the relationship between points: peaks, valleys, or direction of trend. Have rates to show. | Line Graph |
| | Provide historical context to a display metric. Have a small amount of space. Have many historical data points. | Sparkline Graph |
| | Provide historical context to a display metric. Have a small amount of space. Emphasize change since last time period. | Delta label |
| | Show how the count has changed over time. | Bar Graph (Vertical) |
| | Show the relationship between a count (bar) and a rate (line). | Bar Graph (Vertical) with Line |
| | Show the change to the total (of components) by emphasizing the relationship between the data and zero. | Area Graph |
| Part-to-whole: How a single item can be broken down into its components. Consider magnitude if the intention is on the size of the components vs. the breakdown. | Add detail/data to a standard bar graph by showing its component pieces. | Stacked Bar Graph |
| | Make Tufte mad or want dessert. “A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them…” It’s hard for humans to accurately compare the size of the segments, so pie charts should be avoided. | Pie/Donut Chart |
| | Show parts-to-whole relationship when there are many sub-pieces. | Sunburst |
| Flow: Volumes of movements between states, logical sequences. | Show the one-directional flow from one state to others. Often depicting outcomes from a process. | Sankey |
| Deviation: Variations (+/-) from a fixed reference point (0, target, long-term average). Also used for positive/neutral/negative. | | See table below for examples |
| Correlation: Relationship between two or more variables. Warning: customers will most likely infer a causal relationship. | | See table below for examples |
| Ranking: If an item’s position in an ordered list is more important than absolute or relative value. | | See table below for examples |
| Distribution: How often values occur. | | See table below for examples |

3. Formatting the visualization

Now that the right graph is chosen, a few overall tips:
  • Use all elements together: words, numbers, and the graph’s drawing
  • Reflect a balance, a proportion, a sense of relevant scale
  • Display an accessible complexity of detail
  • Avoid content-free decoration or visual elements that are not necessary to comprehend the data
  • Be cognizant of other graphics on the page to avoid series of higher complexity graphs

Content

In addition to the data, graphs should contain:
  1. Heading (top left)
  2. Legend (top right)
  3. Frame (as needed)
  4. Gridlines (as needed)
  5. Unit of measurement
  6. Reference to time
  7. Labels
  8. Filters
  9. Interactions (as needed)
  10. Table (as needed)
Two things to consider when depicting the data: data-ink and data density.
Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented. Additionally, the data-ink ratio is equal to data-ink / total ink used to print the graphic.
A few guidelines:
  1. Above all else, show the data
  2. Maximize the data-ink ratio, within reason
  3. Erase non-data-ink, within reason
  4. Erase redundant data-ink, within reason
A few examples:
Good: A nice balance of data-ink and non-data-ink
Bad: Too much non-data-ink
Bad: Too little data-ink
Data density of a display is equal to the number of entries in data matrix / area of data display. Low density prompts suspicions: did they cherry-pick data? What did they leave out? Aim to maximize data density and the size of the data matrix, within reason.
Note that graphics can typically be shrunk to increase the density. For example, changing the y-axis scale in a line graph to go from y-min to y-max.
Good: Scaled y-axis and more data points increase data density.
Bad: Full 0-100% y-axis and minimal data points decrease data density.
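The data density definition above is simple enough to compute directly when comparing two candidate layouts. The sketch below assumes the display area is measured in pixels, which is an assumption of this example (any consistent unit works), and the function name is hypothetical.

```python
def data_density(rows: int, columns: int, width: float, height: float) -> float:
    """Entries in the data matrix divided by the area of the data display."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (rows * columns) / (width * height)


# Same 600 x 300 px graph: 90 data points vs. 30 data points.
dense = data_density(rows=90, columns=1, width=600, height=300)
sparse = data_density(rows=30, columns=1, width=600, height=300)
print(dense > sparse)  # True
```

The absolute number matters less than the comparison: between two layouts of the same data, the denser one is usually the better starting point, within reason.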

Size and Spacing

Graphs, by default, should be greater in width than height. A few reasons:
  • Humans are naturally practiced in detecting deviations from the horizon.
  • Labels are easier to write on a single line, and thus easier to read left-to-right.
  • Many graphs are set up with the y-axis representing the effect and the x-axis representing the cause. A larger width adds more detail to the causal variable.
A rule of thumb is to aim for a maximum ratio of 2:1 (width:height). A graph should be tall enough to properly plot data, but the entire graph and its supporting context should not exceed the height of the screen.

Axis

Scale. When displaying counts, the axis should start at zero. When displaying rates, let the data’s lowest and highest values set the axis range to maximize the data density. An exception to this is when a specific rate is an expected value. For example, a spam rate of 0.0% is an expected rate (and encouraged!) so the axis for a spam rate over time graph should start at 0.0%.
Good: Scaled y-axis to increase data density.
Good: Scale starts at an expected value (0.00%)
Bad: Y-axis scale decreases data density
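The scale rules above could be expressed as a small helper. This is a sketch, not an existing utility: the function name, the `kind` values, and the `expected` parameter are assumptions made for illustration.

```python
from typing import Optional, Sequence


def y_axis_range(
    values: Sequence[float], kind: str, expected: Optional[float] = None
) -> tuple[float, float]:
    """Pick the y-axis range for a series.

    kind="count": the axis starts at zero.
    kind="rate":  the axis spans the data's min and max, widened to
                  include `expected` when an expected value is given.
    """
    low, high = min(values), max(values)
    if kind == "count":
        return (0, high)
    if kind != "rate":
        raise ValueError("kind must be 'count' or 'rate'")
    if expected is not None:
        low, high = min(low, expected), max(high, expected)
    return (low, high)


print(y_axis_range([120, 300], "count"))                # (0, 300)
print(y_axis_range([94.2, 96.8], "rate"))               # (94.2, 96.8)
print(y_axis_range([0.4, 1.1], "rate", expected=0.0))   # (0.0, 1.1)
```

The last call mirrors the spam rate case, where 0.0% is an expected value and therefore anchors the axis.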
Two Y-Axes. When using two y-axes, follow these guidelines:
  • The higher priority data uses the left y-axis.
  • Gridlines should be used only with the left y-axis.
  • Properly label the axes to communicate which axis is for which data. Aim to align these with the labels in the legend.
Good: Two y-axes are labeled appropriately with the legend and the rate is using the left y-axis.
Other ideas. Note: You most likely won't ever need to consider these options.
When appropriate, consider reducing non-data-ink by using a range-frame. A range-frame only shows the y-axis line from y-min to y-max.
Example of a range-frame to reduce non-data-ink
An iteration on this is a range-frame with range-labels. In the above example, a range-frame with range-labels would add labels for the y-min and y-max data points, but with no gridlines (only at round numbers).
Example of range-frame with range-labels. Helpful to communicate those y-min and y-max values.
Another option to consider to reduce non-data-ink is to use data-based labels instead of rounded value tick marks. In some scenarios, data-based labels increase the comprehension of the data (almost like a column in the table), especially if no gridlines are used.
Example of data-based labels where you can read the data values without hovering.

Gridlines

To increase the data-ink ratio, gridlines should be reduced in color (gray-200) and weight (1px). This allows labels and then data values to be higher in visual hierarchy. Gridlines should be evenly spaced, set on rounded units.
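For "evenly spaced, set on rounded units", a common approach is to round the step up to 1, 2, or 5 times a power of ten. The sketch below shows that approach; the function name and the `target_lines` default are assumptions, and a charting library's own tick logic may differ.

```python
import math


def rounded_gridlines(y_min: float, y_max: float, target_lines: int = 5) -> list[float]:
    """Return evenly spaced gridline values on rounded units covering the range."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    raw_step = (y_max - y_min) / target_lines
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Smallest "round" step (1, 2, 5, or 10 times a power of ten) >= raw_step.
    step = next(m * magnitude for m in (1, 2, 5, 10) if m * magnitude >= raw_step)
    first = math.floor(y_min / step)
    last = math.ceil(y_max / step)
    # round() trims floating-point noise from fractional steps.
    return [round(i * step, 10) for i in range(first, last + 1)]


print(rounded_gridlines(0, 300))      # [0, 100, 200, 300]
print(rounded_gridlines(94.2, 96.8))  # [94.0, 95.0, 96.0, 97.0]
```

Because the step is rounded up, the result may contain fewer lines than `target_lines`, which errs on the side of less non-data-ink.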

Labeling

A few guidelines:
  • The x-axis and y-axis should be clearly labeled.
  • Labels should be clear, concise, and accurate.
  • Center labels on tick marks
  • Don’t put labels at an angle
  • Maintain consistency across values
  • Remove repetition if possible
  • Abbreviate when appropriate
  • It’s ok to skip labels in regular intervals if clear tick marks are used

Formatting (initial attempt)

| Type | Case | Format |
| --- | --- | --- |
| Percentages | | |
| Rate | Default | 95.13% |
| | For metrics with a denominator > 1,000,000 | 95.128% |
| Numeric | 1,000,000 | 1m |
| | 1,000,000,000 | 1b |
| Dates | | |
| Hour | Default | 2pm |
| Day | Default | Jan 1 |
| | With hour | Jan 1 2pm |
| | Within a week | Mon, Wed |
| | Over multiple years | Dec 31, 2018 |
| | Over multiple years with hour | Dec 31, 2018 2pm |
| Week | Default | Jan 1 - Jan 7 |
| | Over multiple years | Dec 25, 2018 - Jan 2, 2019 |
| Month | Default | Jan |
| | Over multiple years. First value and any year change is listed. | Nov 2018, Dec, Jan 2019, Feb, Mar |
| Year | Default | 2019 |
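A few of the formatting rules above translate directly into small helpers. The following is a sketch with hypothetical function names. The handling of values the table does not cover (for example 1,500,000, or counts below one million) is an assumption of this example, not part of the guideline.

```python
MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_rate(numerator: int, denominator: int) -> str:
    """Two decimals by default; three when the denominator exceeds 1,000,000."""
    decimals = 3 if denominator > 1_000_000 else 2
    return f"{numerator / denominator * 100:.{decimals}f}%"


def abbreviate_count(value: int) -> str:
    """1,000,000 -> '1m' and 1,000,000,000 -> '1b'."""
    for threshold, suffix in ((1_000_000_000, "b"), (1_000_000, "m")):
        if value >= threshold:
            # Assumption: keep at most one decimal and drop a trailing ".0".
            text = f"{value / threshold:.1f}".rstrip("0").rstrip(".")
            return f"{text}{suffix}"
    # Assumption: smaller counts keep thousands separators.
    return f"{value:,}"


def month_labels(months: list[tuple[int, int]]) -> list[str]:
    """Label (year, month) pairs, showing the year on the first value
    and whenever the year changes."""
    labels = []
    previous_year = None
    for year, month in months:
        name = MONTH_NAMES[month - 1]
        labels.append(name if year == previous_year else f"{name} {year}")
        previous_year = year
    return labels


print(format_rate(9_513, 10_000))            # 95.13%
print(format_rate(95_128_000, 100_000_000))  # 95.128%
print(abbreviate_count(1_000_000_000))       # 1b
print(month_labels([(2018, 11), (2018, 12), (2019, 1), (2019, 2), (2019, 3)]))
```

The sketch does not handle rounding at the boundary between suffixes (a value just under one billion would render as "1000m"), so a production version would need an explicit rule for that case.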

Colors

Single-hue. Used when data is a range of one numeric value. Colors used:
| Data Point | Default | Emphasis | Negative |
| --- | --- | --- | --- |
| First | Ink-500 | Purple-500 | Red-500 |
| Second | Ink-300 | | Red-300 |
| Third | Ink-900 | | Red-900 |
| Fourth | Ink-700 | | |
Order data points in a logical order and use your best judgement. For example...
...In this Volume chart, we have three data points that are in a logical order of Sent (Ink-300) → Opened (Ink-500) → Clicked (Ink-900)
...In this Undelivered chart, we have three data points that are in a logical order of Failed (Red-900) → Bounced (Red-500) → Suppressed (Red-300).
Also note the Created bars are not ink-500 and the Failed, Bounced, and Suppressed are semi-transparent. The focus on this chart is the Failed, Bounced, and Suppressed data. Using ink-500 would be distracting and draw attention away from the core data. This is also the case with the Spammed chart (below).
Good: Colors put the focus on Failed, Bounced, and Suppressed data
Bad: Harder to see the Failed, Bounced, and Suppressed data
Good: Able to visualize the Spammed rate data
Bad: Harder to see the trend in the Spammed rate
Multi-hue. Used when comparing data across different categories. Colors used:
| Data Point | Color | Notes |
| --- | --- | --- |
| First | Ink-500 | |
| Second | Purple-500 | |
| Third | Blue-500 | |
| Fourth | Plum-500 | Careful when combining with purple |
| Fifth | Teal-500 | |
| Sixth | Clementine-500 | |
| Seventh | Raspberry-500 | Careful, it could look negative |
| Eighth | Yellow-500 | Careful, it could communicate a warning |
If the graph data corresponds to an item, use those colors. For example, graphs for A/B testing can use the colors associated with the variations.
With the three variations, the chart utilizes the 500 level to depict the data
Biased. Used when showing data as positive or negative. Colors used:
| Data Point | Negative | Positive | Neutral |
| --- | --- | --- | --- |
| First | Red-500 | Green-500 | Gray-500 |
| Second | Red-300 | Green-300 | |
| Third | Red-900 | Green-900 | |
Note: Order data points in order of severity. For example:
  • With 5 data points, most negative is Red-500 and most positive is Green-500.
  • With 7 data points, most negative is Red-900 and most positive is Green-900.

Specifics by Type

Tables
  • Limit content in data cells, move repetitive content into labels
  • Reduce non-data-ink by using subtle horizontal borders over zebra striping.
  • By default, content should be left aligned. The exception to this rule is when a different alignment helps comprehension of the content. For example, currency or numeric values should be right-aligned.
  • Header row uses “Label”, aligned with their associated content, sorting icons (placement in relation to column header, icon)
  • If a cell has no value, leave the cell empty
  • Bulk actions are visible on hover. If space allows, links are displayed as text. If there isn’t enough space, actions are consolidated into the 3-dot icon on the far right.
  • Tables may include pagination. If included:
    • Pagination should communicate which results are being shown
    • Pagination should communicate the total number of results
    • Pagination should allow you to go to First, Prev, Next, Last
  • If a table has more columns than can be displayed, then the table should be horizontally scrollable. If horizontal scrolling is needed, one or more identifying columns need to remain fixed; typically the first column.
Vertical Bar Graph
  • Starts the y-axis at 0.
  • The width of a bar should be twice the space between the bars.
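The bar-width rule can be turned into a quick layout calculation. With n bars and n - 1 gaps, where each bar is twice the gap, the plot width equals (3n - 1) gaps. The sketch below assumes no padding at the outer edges of the plot area, and the function name is hypothetical.

```python
def bar_layout(plot_width: float, bar_count: int) -> tuple[float, float]:
    """Return (bar_width, gap) so that each bar is twice as wide as a gap.

    Assumes no outer padding: plot_width = n * bar + (n - 1) * gap.
    """
    if bar_count < 1:
        raise ValueError("bar_count must be at least 1")
    gap = plot_width / (3 * bar_count - 1)
    return 2 * gap, gap


print(bar_layout(290, 10))  # (20.0, 10.0)
```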
Horizontal Bar Graph
  • Organize in meaningful way (size, alphabetical, date).
Sparkline
  • Metric has a label + value.
  • Sparkline has line + acceptable range if needed.
Pie/Donut Chart
  • Avoid if possible.
  • Try to use five or fewer parts.
  • Start at 12 o’clock, go clockwise.
  • Slices in order of size.
  • Ensure it adds up to 100%, add uncategorized if needed.
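If a pie or donut chart cannot be avoided, the ordering and 100% rules above can be applied before rendering. This is a minimal sketch: the function name, the placement of the "Uncategorized" slice at the end, and the sample labels are assumptions for illustration.

```python
def prepare_pie_slices(parts: dict[str, float]) -> list[tuple[str, float]]:
    """Order slices largest first and pad to 100% with 'Uncategorized'.

    `parts` maps a label to its percentage of the whole.
    """
    total = sum(parts.values())
    if total > 100 + 1e-9:
        raise ValueError("parts add up to more than 100%")
    slices = sorted(parts.items(), key=lambda item: item[1], reverse=True)
    remainder = round(100 - total, 2)
    if remainder > 0:
        # Assumption: the filler slice goes last regardless of its size.
        slices.append(("Uncategorized", remainder))
    return slices


# Hypothetical breakdown that only covers 90% of the whole.
print(prepare_pie_slices({"Gmail": 50.0, "Yahoo": 15.0, "Outlook": 25.0}))
```

Rendering the returned list in order, starting at 12 o’clock and moving clockwise, satisfies the remaining rules.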

4. Adding interactivity

Adding interactivity to data visualizations can help customers view and compare the data. This is especially true when revealing the data at multiple levels. Take a line graph that is (1) successful at depicting the trend (slope), but (2) has a tooltip that reveals the exact values on hover.

Viewing data

To enhance the viewing of a specific data point, consider:
  • Showing a tooltip (on hover) that lists the exact values. Tooltips should default to being centered on the data point, but keeping the entire data point visible should be prioritized.
  • Adding supplementary information to that tooltip to provide context. For example, a spam rate graph may include the spam rate threshold in the tooltip, plus a deliverability warning if the spam rate for that data point is above the threshold.
  • Highlighting the data point hovered by adding a background color, a dot to a line chart, or other methods.

Comparing data

To enhance the comparison of data points, consider giving customers the ability to add a datapoint to a table below the chart for comparison.
Example: Consider a bar chart covering the past 9 quarters and a customer wanting to compare Q4 over the past 3 years. Adding a button within a tooltip allowing a customer to add that datapoint to a table below the graph enables them to directly compare those specific data points.

Adding context

To provide more understanding around a data point, consider adding the ability for customers to add annotations to a chart. This allows customers to note specific points in time that help explain the “why” behind the data.
Example: a customer makes significant changes to the campaign and wants to add an annotation so they can visualize the difference in performance before and after the change.
Adding annotation to a chart