Data Visualization
Follow these guidelines when visualizing data in our product. Have questions? Talk to Richard at richard@customer.io.
Epilogue from Edward Tufte’s The Visual Display of Quantitative Information:
Design is choice. The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper. Most principles of design should be greeted with some skepticism, for word authority can dominate our vision, and we may come to see only through the lenses of word authority rather than with our own eyes.
What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult–that is, the revelation of the complex.
Know the question you want to answer. Visualizations should be focused on answering a single question. Attempting to answer multiple questions can increase the complexity of a visualization, making it harder to understand.
Show the data. A customer should focus on the data, not the design or anything else. Visualizations should encourage customers to compare different data points. Data should be shown at several levels of detail, allowing customers to see the complete picture. Consider display metrics showing a broad overview followed by a line graph depicting the historical context of that display metric.
Make complexity approachable. Visualizations should use clarity, precision, and efficiency to communicate complex data in a simple design. Customers should find large data sets approachable and easy to understand.
Example: A customer may send over one billion messages. Showing rates with 3 decimal places is necessary to provide the needed precision. A customer with only ten thousand sends needs rates with only 2 decimal places to get a similar level of precision.
Data tells a story. There’s often a narrative about the data. Consider not only the question the data is answering, but stories being told by the data.
Example: Take the above Undelivered graph. What story might the data be telling? A history of 0 messages created followed by a slow increase tells us this customer is most likely integrating push. We know that when a push fails because the token is invalid or unregistered, we generate a bounce. Then, further messages are not sent and marked as suppressed. Seeing this relationship in rates combined with a decrease in bounce rate could mean a customer is fixing their issue with invalid tokens.
Now what if we know that’s a common story and we can codify it? Maybe it’s: X days of 0 messages sent, followed by 0 < n < 100 messages sent and a >50% failure rate. If we see it beginning to unfold for other customers, we can alert them with links for debugging their integration.
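As a rough illustration of codifying that story, here is a minimal Python sketch. The `DailyStats` record, the function name, and the default of 7 quiet days (standing in for "X") are all hypothetical; only the 0 < n < 100 and >50% failure conditions come from the text above.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    """Hypothetical per-day record; field names are assumptions."""
    sent: int    # messages sent that day
    failed: int  # messages that failed that day


def looks_like_new_push_integration(days, quiet_days=7):
    """Check whether the latest day matches the codified story.

    Pattern: `quiet_days` days of 0 messages sent, followed by a day with
    0 < sent < 100 and a failure rate above 50%.
    """
    if len(days) < quiet_days + 1:
        return False  # not enough history to judge

    # Split the most recent window into the quiet period and the latest day.
    *history, latest = days[-(quiet_days + 1):]

    was_quiet = all(day.sent == 0 for day in history)
    low_volume = 0 < latest.sent < 100
    failure_rate = latest.failed / latest.sent if latest.sent else 0.0
    return was_quiet and low_volume and failure_rate > 0.5
```

A customer whose history is seven days of zero sends followed by 40 sends with 30 failures would match; the same history with 10 failures would not.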
Understanding the data set will help determine how to best show the data, and ultimately how to best answer the question asked. The same data in different graphics will tell different stories, which will lead to different interpretations. Knowing the nature of the data will help determine how to present the data.
Example: Consider a count of email messages sent per month for the past year. That data will most likely consist of high numbers (in the hundreds of thousands) with a small magnitude of difference. So while a bar graph will show those counts over time, it will be hard to visualize the month-over-month difference, as a bar graph’s y-axis requires equal increments starting at 0. To better visualize the trend in monthly send volumes given that data, we could show:
A line graph depicting the counts with an adjusted y-axis (y-min to y-max),
A line graph depicting percentage change month-to-month,
Or something else entirely.
When showing this data, we have to take into account the fact that the data is made up of large numbers with a small magnitude of difference.
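To make the second option concrete, the sketch below derives month-to-month percentage change from raw monthly counts. The function name and the sample figures are made up for illustration.

```python
def month_over_month_change(counts):
    """Return the percentage change between each pair of consecutive months.

    A month that follows a zero count has no defined percentage change,
    so None is returned for it.
    """
    changes = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            changes.append(None)
        else:
            changes.append(round((current - previous) / previous * 100, 2))
    return changes


# Illustrative monthly send counts (not real data).
monthly_sends = [400_000, 410_000, 405_000]
print(month_over_month_change(monthly_sends))  # [2.5, -1.22]
```

Counts that differ by only a few thousand on a base of hundreds of thousands become visibly different values once expressed as percentage change.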
While we won’t intentionally mislead customers, we strive to reduce the possibility of unintentionally misleading our customers. Use clear and detailed labeling to combat ambiguity.
A good rule of thumb is that the number of information-carrying dimensions depicted should not exceed the number of dimensions in the data. For example, take a bar graph displaying the counts of email sends. If the bar graph were a 3D bar graph, depth would be introduced as an information-carrying dimension even though it does not provide any additional data.
For any printed graphics, we should ensure the physical measurements of graphics are directly proportional to the numerical quantities represented.
When there is an opportunity, provide customers with insights. Helping customers understand and compare data empowers them to take action.
Example: Consider our spam rate threshold. When a customer’s spam rate goes above a certain level, we alert them to potential deliverability issues with a tooltip. That’s great, but it’s reactive.
By including that rate on a graph showcasing spam rate over time, we’re able to be proactive. A customer can now see the direction their spam rate is trending and can proactively take action.
Start by defining the question you want to answer. Remember: visualizations should be focused on answering a single question. Next, determine what data is needed to answer your question.
Some considerations:
Narrative: Is there a story you want to tell? Does the data tell a story?
Level of detail: Is this a high-level metric? Or is it targeted toward granular details?
Accuracy: Is a high level of accuracy needed to convey a trend or pattern?
Insights: Are there insights we can alert customers to?
I want to show... | And I want to... | Recommendation is... |
Magnitude. Size comparisons, relative or absolute. Typically a count rather than a rate. | Compare size of elements using a common, understood pattern. Show the size changing over time. | Bar Graph (Vertical) |
Magnitude | Compare size of elements using a common, understood pattern. Have many items, category labels are long, and data is not a time series. | Bar Graph (Horizontal) |
Magnitude | Display single data points, typically the current state. | Display Metrics |
Magnitude | Show large amounts of data where values are more important than trends. Enable scanning or sorting to facilitate comparing the data. | Table |
Change over time. Changing trends. | Show the relationship between points: peaks, valleys, or direction of trend. Have rates to show. | Line Graph |
Change over time | Provide historical context to a display metric. Have a small amount of space. Have many historical data points. | Sparkline Graph |
Change over time | Provide historical context to a display metric. Have a small amount of space. Emphasize change since last time period. | Delta label |
Change over time | Show how the count has changed over time. | Bar Graph (Vertical) |
Change over time | Show the relationship between a count (bar) and a rate (line). | Bar Graph (Vertical) with Line |
Change over time | Show the change to the total (of components) by emphasizing the relationship between the data and zero. | Area Graph |
Part-to-whole. How a single item can be broken down into its components. Consider magnitude if the intention is on the size of the components vs. the breakdown. | Add detail/data to a standard bar graph by showing its component pieces. | Stacked Bar Graph |
Part-to-whole | Make Tufte mad or want dessert. “A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them…” It’s hard for humans to accurately compare the size of the segments, so these charts should be avoided. | Pie/Donut Chart |
Part-to-whole | Show parts-to-whole relationship when there are many sub-pieces. | Sunburst |
Flow. Volumes of movements between states, logical sequences. | Show the one-directional flow from one state to others. Often depicting outcomes from a process. | Sankey |
Deviation. Variations (+/-) from a fixed reference point (0, target, long-term average). Also used for positive/neutral/negative. | See table below for examples | |
Correlation. Relationship between two or more variables. Warning: customers will most likely infer a causal relationship. | See table below for examples | |
Ranking. An item’s position in an ordered list is more important than its absolute or relative value. | See table below for examples | |
Distribution. How often values occur. | See table below for examples | |
I want to show... | And the list above doesn't cut it or I need inspiration... |
Change over time | |
Magnitude | |
Part-to-whole | |
Flow | |
Deviation | |
Correlation | |
Ranking | |
Distribution |
Now that the right graph is chosen, a few overall tips:
Use all elements together: words, numbers, and the graph’s drawing
Reflect a balance, a proportion, a sense of relevant scale
Display an accessible complexity of detail
Avoid content-free decoration or visual elements that are not necessary to comprehend the data
Be cognizant of other graphics on the page to avoid a series of high-complexity graphs
In addition to the data, graphs should contain:
Heading (top left)
Legend (top right)
Frame (as needed)
Gridlines (as needed)
Unit of measurement
Reference to time
Labels
Filters
Interactions (as needed)
Table (as needed)
Two things to consider when depicting the data: data-ink and data density.
Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented. Additionally, the data-ink ratio is equal to data-ink / total ink used to print the graphic.
A few guidelines:
Above all else, show the data
Maximize the data-ink ratio, within reason
Erase non-data-ink, within reason
Erase redundant data-ink, within reason.
A few examples:
Data density of a display is equal to the number of entries in the data matrix / area of the data display. Low density prompts suspicions: did they cherry-pick data? What did they leave out? Aim to maximize data density and the size of the data matrix, within reason.
Note that graphics can typically be shrunk to increase the density. For example, changing the y-axis scale in a line graph to go from y-min to y-max.
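Both measures are simple ratios. The sketch below works through them with made-up numbers; how "ink" and "area" are measured (pixels, square inches) is left to the caller and is an assumption here, as are the function names.

```python
def data_ink_ratio(data_ink, total_ink):
    """Data-ink ratio = data-ink / total ink used to print the graphic."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


def data_density(entries, width, height):
    """Data density = number of entries in the data matrix / display area."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return entries / (width * height)


# Twelve monthly values drawn in a 6 x 3 area, then shrunk to 4 x 2.
print(data_density(12, 6, 3))  # about 0.67 entries per unit of area
print(data_density(12, 4, 2))  # 1.5 entries per unit of area
```

Shrinking the same graphic raises its density without changing the data shown.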
Graphs, by default, should be greater in width than height. A few reasons:
Humans are naturally practiced in detecting deviations from the horizon.
Labels are easier to write on a single line, and thus easier to read left-to-right.
Many graphs are set up with the y-axis representing the effect and the x-axis representing the cause. A larger width adds more detail to the causal variable.
A rule of thumb is to aim for a maximum ratio of 2:1 (width:height). A graph should be tall enough to properly plot data, but the entire graph and its supporting context should not exceed the height of the screen.
Scale. When displaying counts, the axis should start at zero. When displaying rates, allow the data to be the axis’ lowest and highest value to maximize the data density. An exception to this is when a specific rate is an expected value. For example, a spam rate of 0.0% is an expected rate (and encouraged!) so the axis for a spam rate over time graph should start at 0.0%.
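The scale rule can be expressed as a small helper. This is a sketch under stated assumptions: the function name, the `"count"`/`"rate"` labels, and the `expected` parameter are hypothetical and not part of any existing charting code.

```python
def y_axis_range(values, kind, expected=None):
    """Pick the y-axis bounds for a graph.

    - Counts start at zero.
    - Rates span the data's own lowest and highest values.
    - If a rate has an expected value (e.g. a 0.0% spam rate), the axis is
      stretched so that value is always included.
    """
    if not values:
        raise ValueError("values must not be empty")
    if kind not in ("count", "rate"):
        raise ValueError("kind must be 'count' or 'rate'")

    low, high = min(values), max(values)
    if kind == "count":
        return (0, high)
    if expected is not None:
        low, high = min(low, expected), max(high, expected)
    return (low, high)


print(y_axis_range([120, 340, 290], "count"))             # (0, 340)
print(y_axis_range([94.2, 95.1, 96.3], "rate"))           # (94.2, 96.3)
print(y_axis_range([0.02, 0.05], "rate", expected=0.0))   # (0.0, 0.05)
```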
Two Y-Axes. When using two y-axes, follow these guidelines:
The higher priority data uses the left y-axis.
Gridlines should be used only with the left y-axis.
Properly label the axes to communicate which axis is for which data. Aim to align these with the labels in the legend.
Other ideas. Note: You most likely won't ever need to consider these options.
When appropriate, consider reducing non-data-ink by using a range-frame. A range-frame only shows the y-axis line from y-min to y-max.
An iteration on this is a range-frame with range-labels. In the above example, a range-frame with range-labels would add labels for the y-min and y-max data points, but with no gridlines (only at round numbers).
Another option to consider to reduce non-data-ink is to use data-based labels instead of rounded value tick marks. In some scenarios, data-based labels increase the comprehension of the data (almost like a column in the table), especially if no gridlines are used.
To increase the data-ink ratio, gridlines should be reduced in color (gray-200) and weight (1px). This allows labels and then data values to be higher in visual hierarchy. Gridlines should be evenly spaced, set on rounded units.
A few guidelines:
X and Y-axis should be clearly labeled.
Labels should be clear, concise, and accurate.
Center labels on tick mark
Don’t put labels at an angle
Maintain consistency across values
Remove repetition if possible
Abbreviate when appropriate
It’s ok to skip labels in regular intervals if clear tick marks are used
Type | Case | Format |
Percentages | | |
Rate | Default | 95.13% |
Rate | For metrics with a denominator > 1,000,000 | 95.128% |
Numeric | | |
 | 1,000,000 | 1m |
 | 1,000,000,000 | 1b |
Dates | | |
Hour | Default | 2pm |
Day | Default | Jan 1 |
Day | With hour | Jan 1 2pm |
Day | Within a week | Mon Wed |
Day | Over multiple years | Dec 31, 2018 |
Day | Over multiple years with hour | Dec 31, 2018 2pm |
Week | Default | Jan 1 - Jan 7 |
Week | Over multiple years | Dec 25, 2018 - Jan 2, 2019 |
Month | Default | Jan |
Month | Over multiple years. First value and any year change is listed. | Nov 2018, Dec, Jan 2019, Feb, Mar |
Year | Default | 2019 |
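A few of the formats above lend themselves to small helpers. The following is a minimal sketch, not an existing implementation: the function names are hypothetical, and the handling of values the table does not cover (for example 2,500,000 or counts under one million) is an assumption.

```python
import calendar


def format_rate(numerator, denominator):
    """Rates use 2 decimals, or 3 when the denominator exceeds 1,000,000."""
    decimals = 3 if denominator > 1_000_000 else 2
    rate = numerator / denominator * 100
    return f"{rate:.{decimals}f}%"


def abbreviate_count(value):
    """Abbreviate millions as 'm' and billions as 'b'.

    Assumption: one decimal is kept when needed (2.5m) and smaller
    values are shown in full with thousands separators.
    """
    for threshold, suffix in ((1_000_000_000, "b"), (1_000_000, "m")):
        if value >= threshold:
            scaled = f"{value / threshold:.1f}".rstrip("0").rstrip(".")
            return scaled + suffix
    return f"{value:,}"


def month_labels(months):
    """Label (year, month) pairs; over multiple years, show the year on the
    first value and on any year change."""
    spans_years = len({year for year, _ in months}) > 1
    labels, previous_year = [], None
    for year, month in months:
        name = calendar.month_abbr[month]
        if spans_years and year != previous_year:
            labels.append(f"{name} {year}")
        else:
            labels.append(name)
        previous_year = year
    return labels


print(format_rate(9_513, 10_000))             # 95.13%
print(format_rate(95_128_000, 100_000_000))   # 95.128%
print(abbreviate_count(1_000_000))            # 1m
print(month_labels([(2018, 11), (2018, 12), (2019, 1), (2019, 2), (2019, 3)]))
# ['Nov 2018', 'Dec', 'Jan 2019', 'Feb', 'Mar']
```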
Single-hue. Used when data is a range of one numeric value. Colors used:
Data Point | Default | Emphasis | Negative |
First | Ink-500 | Purple-500 | Red-500 |
Second | Ink-300 | | Red-300 |
Third | Ink-900 | | Red-900 |
Fourth | Ink-700 | | |
Order data points in a logical order and use your best judgement. For example...
...In this Volume chart, we have three data points that are in a logical order of Sent (Ink-300) → Opened (Ink-500) → Clicked (Ink-900)
...In this Undelivered chart, we have three data points that are in a logical order of Failed (Red-900) → Bounced (Red-500) → Suppressed (Red-300).
Also note the Created bars are not Ink-500 and the Failed, Bounced, and Suppressed are semi-transparent. The focus on this chart is the Failed, Bounced, and Suppressed data. Using Ink-500 would be distracting and draw attention away from the core data. This is also the case with the Spammed chart (below).
Multi-hue. Used when comparing data across different categories. Colors used:
Data Point | Color | Notes |
First | Ink-500 | |
Second | Purple-500 | |
Third | Blue-500 | |
Fourth | Plum-500 | Careful when combining with purple |
Fifth | Teal-500 | |
Sixth | Clementine-500 | |
Seventh | Raspberry-500 | Careful, it could look negative |
Eighth | Yellow-500 | Careful, it could communicate a warning |
If the graph data corresponds to an item, use those colors. For example, graphs for A/B testing can use the colors associated with the variations.
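One way to apply the multi-hue order in code is sketched below. The color names come from the table above; the function name, the `item_colors` parameter, and the choice to skip palette colors already claimed by an item are assumptions made for illustration.

```python
# Palette order taken from the multi-hue table above.
MULTI_HUE_ORDER = [
    "Ink-500", "Purple-500", "Blue-500", "Plum-500",
    "Teal-500", "Clementine-500", "Raspberry-500", "Yellow-500",
]


def assign_series_colors(series, item_colors=None):
    """Map each series name to a color.

    Series with an associated item color (e.g. an A/B test variation) keep
    it; the rest take the next unused color in palette order.
    """
    item_colors = item_colors or {}
    reserved = set(item_colors.values())
    palette = iter(c for c in MULTI_HUE_ORDER if c not in reserved)

    assigned = {}
    for name in series:
        if name in item_colors:
            assigned[name] = item_colors[name]
            continue
        try:
            assigned[name] = next(palette)
        except StopIteration:
            raise ValueError("more series than available colors") from None
    return assigned


print(assign_series_colors(["Email", "Push", "SMS"]))
# {'Email': 'Ink-500', 'Push': 'Purple-500', 'SMS': 'Blue-500'}
```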
Biased. Used when showing data as positive or negative. Colors used:
Data Point | Negative | Positive | Neutral |
First | Red-500 | Green-500 | Gray-500 |
Second | Red-300 | Green-300 | |
Third | Red-900 | Green-900 |
Note: Order data points in order of severity. For example:
With 5 data points, most negative is Red-500 and most positive is Green-500.
With 7 data points, most negative is Red-900 and most positive is Green-900.
Tables
Limit content in data cells, move repetitive content into labels
Reduce non-data-ink by using subtle horizontal borders over zebra striping.
By default, content should be left aligned. The exception to this rule is when a different alignment helps comprehension of the content. For example, currency or numeric values should be right-aligned.
Header row uses “Label”, aligned with their associated content, sorting icons (placement in relation to column header, icon)
If a cell has no value, leave the cell empty
Bulk actions are visible on hover. If space allows, links are displayed as text. If there isn’t enough space, actions are consolidated into the 3-dot icon on the far right.
Tables may include pagination, if included:
Pagination should communicate which results are being shown
Pagination should communicate the total amount of results
Pagination should allow you to go to First, Prev, Next, Last
If a table has more columns than can be displayed, the table should be horizontally scrollable. If horizontal scrolling is needed, an identifying column (typically the first) needs to remain fixed.
Vertical Bar Graph
Starts the y-axis at 0.
Bar width should be twice the space between the bars.
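The 2:1 bar-to-gap rule fixes the layout once the plot width and number of bars are known. A small sketch, assuming no padding before the first bar or after the last one (the function name is hypothetical):

```python
def bar_layout(plot_width, bar_count):
    """Return (bar_width, gap) so that each bar is twice as wide as a gap.

    With n bars there are n - 1 gaps, so:
        n * bar + (n - 1) * gap = plot_width, with bar = 2 * gap
        => gap = plot_width / (3n - 1)
    """
    if bar_count < 1:
        raise ValueError("bar_count must be at least 1")
    gap = plot_width / (3 * bar_count - 1)
    return 2 * gap, gap


print(bar_layout(290, 10))  # (20.0, 10.0)
```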
Horizontal Bar Graph
Organize in a meaningful way (size, alphabetical, date).
Sparkline
Metric has a label + value.
Sparkline has line + acceptable range if needed.
Pie/Donut Chart
Avoid if possible.
Try to use five or fewer parts.
Start at 12 o’clock, go clockwise.
Slices in order of size.
Ensure it adds up to 100%, add uncategorized if needed.
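If a pie or donut chart cannot be avoided, the ordering and 100% rules can be applied before drawing. The sketch below is illustrative only: the function name is hypothetical, and placing the "Uncategorized" slice last regardless of size is an assumption.

```python
def prepare_pie_slices(shares):
    """Order slices by size and make sure they add up to 100%.

    `shares` maps a label to its percentage of the whole. Any shortfall is
    added as a final "Uncategorized" slice.
    """
    total = sum(shares.values())
    if total > 100 + 1e-9:
        raise ValueError("shares add up to more than 100%")

    # Largest slice first; drawing starts at 12 o'clock and goes clockwise.
    slices = sorted(shares.items(), key=lambda item: item[1], reverse=True)

    remainder = round(100 - total, 2)
    if remainder > 0:
        slices.append(("Uncategorized", remainder))
    return slices


print(prepare_pie_slices({"Gmail": 45, "Yahoo": 15, "Outlook": 30}))
# [('Gmail', 45), ('Outlook', 30), ('Yahoo', 15), ('Uncategorized', 10)]
```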
Adding interactivity to data visualizations can help customers view and compare the data. This is especially true when revealing the data at multiple levels. Take a line graph that is (1) successful at depicting the trend (slope), but (2) has a tooltip that reveals the exact values on hover.
To enhance the viewing of a specific data point, consider:
Showing a tooltip (on hover) that lists the exact values. Tooltips should default to being centered on the data point, but viewing the entire data point should be prioritized.
Adding supplementary information to that tooltip to provide context. For example, a spam rate graph may include the spam rate threshold in the tooltip, plus a deliverability warning if the spam rate for that data point is above the threshold.
Highlighting the data point hovered by adding a background color, a dot to a line chart, or other methods.
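The spam rate tooltip described above could be assembled along these lines. This is a sketch only: the function name, the wording of each line, and the threshold value used in the example are assumptions, not the product's actual copy or threshold.

```python
def spam_rate_tooltip(date_label, spam_rate, threshold):
    """Build the tooltip lines for one data point on a spam rate graph.

    Lists the exact value, adds the threshold for context, and appends a
    deliverability warning when the rate is above the threshold.
    """
    lines = [
        date_label,
        f"Spam rate: {spam_rate:.2f}%",
        f"Threshold: {threshold:.2f}%",
    ]
    if spam_rate > threshold:
        lines.append("Warning: spam rate is above the threshold and may affect deliverability.")
    return lines


# The 0.10% threshold here is a placeholder value for illustration.
for line in spam_rate_tooltip("Jan 1", 0.14, 0.10):
    print(line)
```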
Comparing data
To enhance the comparison of data points, consider giving customers the ability to add a data point to a table below the chart for comparison.
Example: Consider a bar chart covering the past 9 quarters and a customer wanting to compare Q4 over the past 3 years. Adding a button within a tooltip allowing a customer to add that datapoint to a table below the graph enables them to directly compare those specific data points.
To provide more understanding around a data point, consider adding the ability for customers to add annotations to a chart. This allows customers to note specific points in time that help explain the “why” behind the data.
Example: a customer makes significant changes to the campaign and wants to add an annotation so they can visualize the difference in performance before and after the change.
Don't make customers think about the chart. Stick to common chart types. This allows customers to focus on the data. If you do have to use a more uncommon or complex chart type, provide tools and information to help customers. Consistent use of charts helps comprehension. For example, if a "Message Volume" chart is used across multiple channels, ensure "messages sent" is always the same color so customers don't need to re-learn the chart's legend.
Communicate data from multiple levels. While a chart is likely detailing a specific timeframe or subset of the data, consider ways to communicate the context or high-level trends. Take the common example of a stock price chart for Apple. The most visual piece is the chart which is displaying the details of the pricing during the day. Above the chart, there's current price and today's change as the amount and percentage. Below the chart, there's the open price, high price, low price, and more data points to help provide context to the chart.
This goes beyond data: provide descriptive text to help make the chart easier to understand. Even above, the "today" is providing clarity to the percentage change and the chart below. It reinforces the timeline selected. Most weather apps do a good job of this by adding "Rain for the hour" or "Clear skies" next to the temperature and forecast icon.
Match the chart's visual prominence to its purpose. Our summary charts, like those on the dashboard or transactional list, are intentionally devoid of numerous labels and annotations. Individually, they are only a third or half of the page's width. The goal is to provide a quick insight and answer the question of "Is this trend what I expected or do I need to investigate further?" Whereas on the analysis page, there are tools to toggle chart types, display multiple series of data, and more. These charts take up the full width because they're meant to be the primary focus.