Epilogue from Edward Tufte’s The Visual Display of Quantitative Information:
Design is choice. The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper. Most principles of design should be greeted with some skepticism, for word authority can dominate our vision, and we may come to see only through the lenses of word authority rather than with our own eyes.
What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult–that is, the revelation of the complex.
Know the question you want to answer. Visualizations should be focused on answering a single question. Attempting to answer multiple questions can increase the complexity of a visualization, making it harder to understand.
Show the data. A customer should focus on the data, not the design or anything else. Visualizations should encourage customers to compare different data points. Data should be shown at several levels of detail, allowing customers to see the complete picture. Consider display metrics showing a broad overview followed by a line graph depicting the historical context of that display metric.
Make complexity approachable. Visualizations should use clarity, precision, and efficiency to communicate complex data in a simple design. Customers should find large data sets approachable and easy to understand.
Example: A customer may send over one billion messages. Showing rates with 3 decimal places is necessary to provide the needed precision. A customer with only ten thousand sends needs rates with just 2 decimal places to get a similar level of precision.
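One way to reason about this is to pick the number of decimals from the send volume. The sketch below is only an illustration: the function names, the "one message should be able to move the last digit" heuristic, and the cap of 3 decimals are assumptions chosen to match the two examples above, not an established rule.

```python
def rate_decimals(total_sends: int, max_decimals: int = 3) -> int:
    """Pick decimal places for a percentage rate based on send volume.

    Assumed heuristic: add decimals until a single message could change
    the last displayed digit, but never exceed max_decimals.
    """
    decimals = 0
    # One message is 100 / total_sends percent. With d decimals the
    # smallest visible step is 10**-d percent.
    while decimals < max_decimals and total_sends > 100 * 10 ** decimals:
        decimals += 1
    return decimals


def format_rate(events: int, total_sends: int) -> str:
    """Format events / total_sends as a percentage string."""
    if total_sends <= 0:
        return "0%"
    decimals = rate_decimals(total_sends)
    rate = 100 * events / total_sends
    return f"{rate:.{decimals}f}%"


print(format_rate(25, 10_000))                # ten thousand sends -> 2 decimals
print(format_rate(1_234_567, 1_000_000_000))  # one billion sends -> 3 decimals
```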
Data tells a story. There’s often a narrative about the data. Consider not only the question the data is answering, but also the stories being told by the data.
Newly created messages combined with transition of bounced to suppressed
Example: Take the above Undelivered graph. What story might the data be telling? A history of 0 messages created followed by a slow increase tells us this customer is most likely integrating push. We know that when a push fails because the token is invalid or unregistered, we generate a bounce. Then, further messages are not sent and are marked as suppressed. Seeing this relationship in rates combined with a decrease in bounce rate could mean a customer is fixing their issue with invalid tokens.
Now what if we know that’s a common story and we can codify it? Maybe it’s: X days of 0 messages sent, followed by 0 < n < 100 messages sent and a >50% failure rate. If we see it beginning to unfold for other customers, we can alert them with links for debugging their integration.
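Codifying a story like this could look roughly like the following. This is a minimal sketch, not an existing detector: `DailyStats`, `looks_like_new_integration`, and the default of 7 quiet days (the "X" above) are hypothetical, and the volume rule is read as 0 < n < 100.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    sent: int    # messages sent that day
    failed: int  # messages that failed that day


def looks_like_new_integration(days: list[DailyStats], quiet_days: int = 7) -> bool:
    """Return True when the history ends with `quiet_days` days of zero sends
    followed by a day with 0 < n < 100 sends and a failure rate above 50%.
    """
    if len(days) < quiet_days + 1:
        return False
    # Split the most recent window into the quiet stretch and the latest day.
    *history, latest = days[-(quiet_days + 1):]
    was_quiet = all(day.sent == 0 for day in history)
    small_volume = 0 < latest.sent < 100
    mostly_failing = small_volume and latest.failed / latest.sent > 0.5
    return was_quiet and mostly_failing


# Illustrative data: a week of silence, then 40 sends with 30 failures.
history = [DailyStats(0, 0)] * 7 + [DailyStats(40, 30)]
print(looks_like_new_integration(history))
```

A check like this would only flag the pattern; what to show the customer (for example, debugging links) is a separate design decision.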
Understanding the data set will help determine how to best show the data, and ultimately how to best answer the question asked. The same data in different graphics will tell different stories, which will lead to different interpretations. Knowing the nature of the data will help determine how to present the data.
Example: Consider a count of email messages sent per month for the past year. That data will most likely consist of high numbers (in the hundreds of thousands) with a small magnitude of difference. So while a bar graph will show those counts over time, it will be hard to visualize the month-over-month difference, as a bar graph’s y-axis requires equal increments starting at 0. To better visualize the trend in monthly send volumes given that data, we could show:
- A line graph depicting the counts with an adjusted y-axis (y-min to y-max),
- A line graph depicting percentage change month-to-month,
- Or something else entirely.
When showing this data, we have to take into account the fact that the data is made up of large numbers with a small magnitude of difference.
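To make the percentage-change option concrete, here is a small sketch of how the month-to-month series could be derived from the raw counts. The function name and the sample numbers are made up for illustration, and the sketch assumes no month has zero sends.

```python
def month_over_month_change(counts: list[int]) -> list[float]:
    """Percentage change from each month to the next.

    Assumes every month has a non-zero count, since the previous month
    is used as the divisor.
    """
    return [
        (current - previous) / previous * 100
        for previous, current in zip(counts, counts[1:])
    ]


# Made-up monthly send counts: large numbers, small differences.
monthly_sends = [400_000, 404_000, 398_000, 410_000]
for change in month_over_month_change(monthly_sends):
    print(f"{change:+.2f}%")
```

Plotted as counts from 0, these four months would look nearly identical. Plotted as percentage change, the differences become the whole picture.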
While we won’t intentionally mislead customers, we strive to reduce the possibility of unintentionally misleading our customers. Use clear and detailed labeling to combat ambiguity.
A good rule of thumb is that the number of information-carrying dimensions depicted should not exceed the number of dimensions in the data. For example, take a bar graph displaying the counts of email sends. If the bar graph were a 3D bar graph, depth would be introduced as an information-carrying dimension even though it does not provide any additional data.
What information is depth providing? Please don't do this.
For any printed graphics, we should ensure the physical measurements of graphics are directly proportional to the numerical quantities represented.
When there is an opportunity, provide customers with insights. Helping customers understand and compare data empowers them to take action.
Example: Consider our spam rate threshold. When a customer’s spam rate goes above a certain level, we alert them to potential deliverability issues with a tooltip. That’s great, but it’s reactive.
By including that threshold rate on a graph showcasing spam rate over time, we’re able to be proactive. A customer can now see the direction their spam rate is trending and can proactively take action.
Proactively addressing the issue and never hitting the 2.00% mark!
Start by defining the question you want to answer. Remember: visualizations should be focused on answering a single question. Next, determine what data is needed to answer your question.
- Narrative: Is there a story you want to tell? Does the data tell a story?
- Level of detail: Is this a high-level metric, or is it targeted towards granular details?
- Accuracy: Is a high level of accuracy needed to convey a trend or pattern?
- Insights: Are there insights we can alert customers to?
Now that the right graph is chosen, a few overall tips:
- Use all elements together: words, numbers, and the graph’s drawing
- Reflect a balance, a proportion, a sense of relevant scale
- Display an accessible complexity of detail
- Avoid content-free decoration or visual elements that are not necessary to comprehend the data
- Be cognizant of other graphics on the page to avoid a series of high-complexity graphs
In addition to the data, graphs should contain:
- 1. Heading (top left)
- 2. Legend (top right)
- 3. Frame (as needed)
- 4. Gridlines (as needed)
- 5. Unit of measurement
- 6. Reference to time
- 9. Interactions (as needed)
- 10. Table (as needed)
Two things to consider when depicting the data: data-ink and data density.
Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented. Additionally, the data-ink ratio is equal to data-ink / total ink used to print the graphic.
A few guidelines:
- 1. Above all else, show the data
- 2. Maximize the data-ink ratio, within reason
- 3. Erase non-data-ink, within reason
- 4. Erase redundant data-ink, within reason
A few examples:
Good: A nice balance of data-ink and non-data-ink
Bad: Too much non-data-ink
Bad: Too little data-ink
Data density of a display is equal to the number of entries in data matrix / area of data display. Low density prompts suspicions: did they cherry-pick data? What did they leave out? Aim to maximize data density and the size of the data matrix, within reason.
Note that graphics can typically be shrunk to increase the density. For example, changing the y-axis scale in a line graph to go from y-min to y-max.
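The data density ratio above is simple enough to compute directly. The sketch below is illustrative only: the function name and the pixel dimensions are assumptions, and area is measured in whatever unit the display uses.

```python
def data_density(entries: int, width: float, height: float) -> float:
    """Number of entries in the data matrix per unit of display area."""
    return entries / (width * height)


# Hypothetical chart: 30 daily points for 2 series = 60 entries.
full_size = data_density(60, width=600, height=300)
shrunk = data_density(60, width=400, height=200)

# The same data in a smaller graphic yields a higher density.
print(f"{full_size:.6f} vs {shrunk:.6f} entries per square pixel")
```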
Good: Scaled y-axis and increased data points increase data density.
Bad: Full 0-100% y-axis and minimal data points decreases data density
Graphs, by default, should be greater in width than height. A few reasons:
- Humans are naturally practiced in detecting deviations from the horizon.
- Labels are easier to write on a single line, and thus easier to read left-to-right.
- Many graphs are set up with the y-axis representing the effect and the x-axis representing the cause. A larger width adds more detail to the causal variable.
A rule of thumb is to aim for a maximum ratio of 2:1 (width:height). A graph should be tall enough to properly plot data, but the entire graph and its supporting context should not exceed the height of the screen.
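As a rough sketch of how this sizing rule might be applied, the function below starts from the widest allowed shape and shrinks the graph if it would not fit on screen. The function name, its parameters, and the reading of "maximum ratio" as "width is at most twice the height" are assumptions.

```python
def chart_size(
    available_width: float,
    screen_height: float,
    context_height: float = 0.0,
    max_ratio: float = 2.0,
) -> tuple[float, float]:
    """Return (width, height) for a graph no wider than max_ratio:1.

    context_height is the space taken by the heading, legend, and other
    supporting context that must fit on screen with the graph.
    """
    max_height = screen_height - context_height
    if max_height <= 0:
        raise ValueError("supporting context leaves no room for the graph")
    width = available_width
    height = width / max_ratio  # widest shape the rule of thumb allows
    if height > max_height:
        # Too tall for the screen: cap the height, then narrow the graph
        # so it still respects the maximum ratio.
        height = max_height
        width = min(width, height * max_ratio)
    return width, height


print(chart_size(800, screen_height=900, context_height=100))
print(chart_size(1200, screen_height=700, context_height=200))
```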
Scale. When displaying counts, the axis should start at zero. When displaying rates, allow the data to be the axis’ lowest and highest value to maximize the data density. An exception to this is when a specific rate is an expected value. For example, a spam rate of 0.0% is an expected rate (and encouraged!) so the axis for a spam rate over time graph should start at 0.0%.
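The scale rules above can be expressed as a small helper. This is a sketch under assumptions: the function name, the `kind` strings, and the `expected_value` parameter are hypothetical and not part of any existing charting API.

```python
from typing import Optional


def y_axis_range(
    values: list[float],
    kind: str,
    expected_value: Optional[float] = None,
) -> tuple[float, float]:
    """Return (y_min, y_max) for a graph.

    kind is "count" or "rate". Counts always start at zero. Rates use the
    data's own minimum and maximum, widened to include expected_value
    when one is given (for example 0.0 for a spam rate).
    """
    if kind == "count":
        return 0, max(values)
    low, high = min(values), max(values)
    if expected_value is not None:
        low = min(low, expected_value)
        high = max(high, expected_value)
    return low, high


print(y_axis_range([120, 340, 90], "count"))
print(y_axis_range([97.2, 98.4, 99.1], "rate"))
print(y_axis_range([0.04, 0.11, 0.07], "rate", expected_value=0.0))
```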
Good: Scaled y-axis to increase data density.
Good: Scale starts at an expected value (0.00%)
Bad: Y-axis scale decreases data density
Two Y-Axes. When using two y-axes, follow these guidelines:
- The higher priority data uses the left y-axis.
- Gridlines should be used only with the left y-axis.
- Properly label the axes to communicate which axis is for which data. Aim to align these with the labels in the legend.
Good: Two y-axes are labeled appropriately with the legend and the rate is using the left y-axis.
Other ideas. Note: You most likely won't ever need to consider these options.
When appropriate, consider reducing non-data-ink by using a range-frame. A range-frame only shows the y-axis line from y-min to y-max.
Example of a range-frame to reduce non-data-ink
An iteration on this is a range-frame with range-labels. In the above example, a range-frame with range-labels would add labels for the y-min and y-max data points, but with no gridlines (only at round numbers).
Example of range-frame with range-labels. Helpful to communicate those y-min and y-max values.
Another option to consider to reduce non-data-ink is to use data-based labels instead of rounded value tick marks. In some scenarios, data-based labels increase the comprehension of the data (almost like a column in the table), especially if no gridlines are used.
Example of data-based labels where you can read the data values without hovering.
To increase the data-ink ratio, gridlines should be reduced in color (gray-200) and weight (1px). This allows labels and then data values to be higher in visual hierarchy. Gridlines should be evenly spaced, set on rounded units.
A few guidelines:
- The x-axis and y-axis should be clearly labeled.
- Labels should be clear, concise, and accurate.
- Center labels on tick mark
- Don’t put labels at an angle
- Maintain consistency across values
- Remove repetition if possible
- Abbreviate when appropriate
- It’s ok to skip labels at regular intervals if clear tick marks are used
Single-hue. Used when data is a range of one numeric value. Colors used:
Order data points in a logical order and use your best judgement. For example...
...In this Volume chart, we have three data points that are in a logical order of Sent (Ink-300) → Opened (Ink-500) → Clicked (Ink-900)
...In this Undelivered chart, we have three data points that are in a logical order of Failed (Red-900) → Bounced (Red-500) → Suppressed (Red-300).
Also note the Created bars are not ink-500 and the Failed, Bounced, and Suppressed are semi-transparent. The focus of this chart is the Failed, Bounced, and Suppressed data. Using ink-500 would be distracting and draw attention away from the core data. This is also the case with the Spammed chart (below).
Good: Colors put the focus on Failed, Bounced, and Suppressed data
Bad: Harder to see the Failed, Bounced, and Suppressed data
Good: Able to visualize the Spammed rate data
Bad: Harder to see the trend in the Spammed rate
Multi-hue. Used when comparing data across different categories. Colors used:
If the graph data corresponds to an item, use those colors. For example, graphs for A/B testing can use the colors associated with the variations.
With the three variations, the chart utilizes the 500 level to depict the data
Biased. Used when showing data as positive or negative. Colors used:
Note: Order data points in order of severity. For example:
- With 5 data points, most negative is Red-500 and most positive is Green-500.
- With 7 data points, most negative is Red-900 and most positive is Green-900.
- Limit content in data cells, move repetitive content into labels
- Reduce non-data-ink by using subtle horizontal borders over zebra striping.
- By default, content should be left aligned. The exception to this rule is when a different alignment helps comprehension of the content. For example, currency or numeric values should be right-aligned.
- Header row uses “Label”, aligned with their associated content, sorting icons (placement in relation to column header, icon)
- If a cell has no value, leave the cell empty
- Bulk actions are visible on hover. If space allows, links are displayed as text. If there isn’t enough space, actions are consolidated into the 3-dot icon on the far right.
- Tables may include pagination. If included:
  - Pagination should communicate which results are being shown
  - Pagination should communicate the total number of results
  - Pagination should allow you to go to First, Prev, Next, Last
- If a table has more columns than can be displayed, then the table should be horizontally scrollable. If horizontal scrolling is needed, one or more identifying columns need to remain fixed; typically the first column.
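The pagination guidelines above can be sketched as two small helpers: one for the text that communicates which results are shown out of the total, and one for which of First, Prev, Next, and Last are available. The function names and the exact label wording are assumptions, not an existing component API.

```python
from typing import Optional


def pagination_summary(page: int, page_size: int, total: int) -> str:
    """Describe which results are shown and how many exist in total."""
    if total == 0:
        return "No results"
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return f"Showing {first}-{last} of {total}"


def pagination_targets(page: int, page_size: int, total: int) -> dict[str, Optional[int]]:
    """Page numbers for First, Prev, Next, Last (None when not applicable)."""
    last_page = max(1, -(-total // page_size))  # ceiling division
    return {
        "first": 1 if page > 1 else None,
        "prev": page - 1 if page > 1 else None,
        "next": page + 1 if page < last_page else None,
        "last": last_page if page < last_page else None,
    }


print(pagination_summary(2, 20, 95))
print(pagination_targets(2, 20, 95))
```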
Vertical Bar Graph
- Starts the y-axis at 0.
- The width of each bar should be twice the space between the bars.
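The bar-width guideline above translates into simple arithmetic. The sketch below assumes the bars fill the plot area edge to edge, with gaps only between bars; the function name is hypothetical.

```python
def bar_layout(plot_width: float, bar_count: int) -> tuple[float, float]:
    """Return (bar_width, gap) with each bar twice as wide as the gap.

    Assumes bar_count bars separated by bar_count - 1 gaps, so
    plot_width = bar_count * (2 * gap) + (bar_count - 1) * gap.
    """
    if bar_count < 1:
        raise ValueError("bar_count must be at least 1")
    gap = plot_width / (3 * bar_count - 1)
    return 2 * gap, gap


bar_width, gap = bar_layout(plot_width=290, bar_count=10)
print(bar_width, gap)
```

If the design also calls for padding at the left and right edges, the gap count in the formula would need to change accordingly.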
Horizontal Bar Graph
- Organize in meaningful way (size, alphabetical, date).
- Metrics has a label + value.
- Sparkline has line + acceptable range if needed.
- Avoid if possible.
- Try to use five or fewer parts.
- Start at 12 o’clock, go clockwise.
- Slices in order of size.
- Ensure it adds up to 100%, add uncategorized if needed.
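As an illustration of the ordering and 100% rules above, the sketch below sorts slices by size and adds an "Uncategorized" slice for any remainder. The function name and the choice to place "Uncategorized" last regardless of its size are assumptions. It does not enforce the five-part limit.

```python
def pie_slices(counts: dict[str, int], total: int) -> list[tuple[str, float]]:
    """Return (label, percent) pairs, largest first, that sum to 100%.

    Any part of `total` not covered by `counts` becomes an
    "Uncategorized" slice, placed last (an assumed convention).
    """
    slices = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    remainder = total - sum(counts.values())
    if remainder > 0:
        slices.append(("Uncategorized", remainder))
    return [(label, 100 * value / total) for label, value in slices]


# Made-up channel breakdown that only accounts for 90 of 100 messages.
for label, percent in pie_slices({"Push": 30, "Email": 50, "SMS": 10}, total=100):
    print(f"{label}: {percent:.0f}%")
```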
Adding interactivity to data visualizations can help customers view and compare the data. This is especially true when revealing the data at multiple levels. Take a line graph that is (1) successful at depicting the trend (slope), but (2) has a tooltip that reveals the exact values on hover.
To enhance the viewing of a specific data point, consider:
- Showing a tooltip (on hover) that lists the exact values. Tooltips should default to being centered on the data point, but keeping the entire data point visible should be prioritized.
- Adding supplementary information to that tooltip to provide context. For example, a spam rate graph may include the spam rate threshold in the tooltip, plus a deliverability warning if the spam rate for that data point is above the threshold.
- Highlighting the data point hovered by adding a background color, a dot to a line chart, or other methods.
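The tooltip guidance above could be sketched as a function that builds the tooltip's lines for one data point. The function name, the line wording, and the 2.00% default threshold (taken from the spam rate example earlier in this document) are illustrative assumptions.

```python
def spam_tooltip(date_label: str, spam_rate: float, threshold: float = 2.0) -> list[str]:
    """Build tooltip lines for one data point on a spam rate graph.

    Always lists the exact value and the threshold for context, and adds
    a deliverability warning only when the rate exceeds the threshold.
    """
    lines = [
        date_label,
        f"Spam rate: {spam_rate:.2f}%",
        f"Threshold: {threshold:.2f}%",
    ]
    if spam_rate > threshold:
        lines.append("Warning: spam rate is above the threshold and may affect deliverability.")
    return lines


for line in spam_tooltip("Mar 4", 2.4):
    print(line)
```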
To enhance the comparison of data points, consider giving customers the ability to add a datapoint to a table below the chart for comparison.
Example: Consider a bar chart covering the past 9 quarters and a customer wanting to compare Q4 over the past 3 years. Adding a button within a tooltip allowing a customer to add that datapoint to a table below the graph enables them to directly compare those specific data points.
To provide more understanding around a data point, consider adding the ability for customers to add annotations to a chart. This allows customers to note specific points in time that help explain the “why” behind the data.
Example: a customer makes significant changes to the campaign and wants to add an annotation so they can visualize the difference in performance before and after the change.
Adding annotation to a chart
Don't make customers think about the chart. Stick to common chart types. This allows customers to focus on the data. If you do have to use a less common or more complex chart type, provide tools and information to help customers. Consistent use of charts helps comprehension. For example, if a "Message Volume" chart is used across multiple channels, ensure "messages sent" is always the same color so customers don't need to re-learn the chart's legend.
Communicate data from multiple levels. While a chart is likely detailing a specific timeframe or subset of the data, consider ways to communicate the context or high-level trends. Take the common example of a stock price chart for Apple. The most visual piece is the chart which is displaying the details of the pricing during the day. Above the chart, there's current price and today's change as the amount and percentage. Below the chart, there's the open price, high price, low price, and more data points to help provide context to the chart.
This goes beyond data: provide descriptive text to help make the chart easier to understand. Even above, the "today" is providing clarity to the percentage change and the chart below. It reinforces the timeline selected. Most weather apps do a good job of this by adding "Rain for the hour" or "Clear skies" next to the temperature and forecast icon.