Tufte's Principles for Data Visualisation

Table of Contents

Adapted from the work of Dr. Angelica Parente.

See also Analytical Design, Sparklines, and Layering for extensions from Envisioning Information, Visual Explanations, and Beautiful Evidence.

1. Graphical Excellence

Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency.

Core qualities

  • Show the data.
  • Induce the viewer to think about substance, not methodology or design.
  • Avoid distorting what the data have to say.
  • Present many numbers in a small space.
  • Make large datasets coherent.
  • Encourage eye comparison of different pieces of data.
  • Reveal data at several levels of detail (broad overview to fine structure).
  • Serve a reasonably clear purpose.
  • Integrate closely with statistical and verbal descriptions.

Questions to ask

  • Does the graphic show the data clearly?
  • Does it encourage thinking about content over decoration?
  • Can the viewer compare data elements easily?

2. Graphical Integrity

Graphics must tell the truth about the data.

The Lie Factor

Lie Factor = Size of effect shown in graphic / Size of effect in data
  • Lie Factor = 1.0 → truthful.
  • Lie Factor > 1.05 or < 0.95 → distortion.

Six principles of graphical integrity

  1. Representation of numbers should be directly proportional to quantities represented.
  2. Clear, detailed, thorough labeling defeats distortion.
  3. Show data variation, not design variation.
  4. In time-series displays, standardise money (deflate) and use consistent baselines.
  5. Dimensions of graphics should not exceed dimensions of data.
  6. Graphics must not quote data out of context.

Common violations

  • 3D effects on 2D data (adds fake dimension).
  • Truncated axes that exaggerate change.
  • Inconsistent intervals.
  • Area/volume encoding of linear data.
  • Missing context or baselines.

3. Data-Ink Ratio

The data-ink ratio is the proportion of a graphic's ink devoted to the non-redundant display of data-information.

Data-Ink Ratio = Data-ink / Total ink used in graphic

Maximise the data-ink ratio within reason:

  1. Erase non-data-ink (decoration, heavy grids, boxes).
  2. Erase redundant data-ink (3D when 2D suffices).
  3. Revise and edit.

Non-data-ink to eliminate

  • Heavy gridlines.
  • Unnecessary borders and boxes.
  • Redundant labels.
  • Decorative elements.
  • Excessive tick marks.
  • 3D effects that add no information.

The eraser test: if you can erase something without losing data information, erase it.

4. Chartjunk

Chartjunk is the interior decoration of graphics that does not convey information.

Three categories

A. Unintentional optical art (moiré vibration)

  • Busy patterns that create visual noise.
  • Cross-hatching that vibrates.
  • Competing visual frequencies.

B. The Grid

  • Heavy grids compete with data.
  • Grids should be muted or eliminated.
  • If needed, use light grey or dotted lines.

C. The Duck (self-promoting graphics)

  • Graphics that draw attention to their own design.
  • Decoration masquerading as information.
  • Style over substance.

Chartjunk indicators

  • Viewer notices the design before the data.
  • Colours/patterns used for decoration, not encoding.
  • 3D effects, shadows, gradients without purpose.
  • Clip art, icons, or illustrations that don't carry data.

5. Small Multiples

Small multiples are series of graphics showing the same combination of variables, indexed by changes in another variable.

Characteristics

  • Same design structure repeated.
  • Changes in data, not design.
  • Enables direct visual comparison.
  • High information density.
  • Eye moves across variations effortlessly.

When to use

  • Comparing across categories, time periods, or conditions.
  • Showing change or variation.
  • Revealing patterns across groups.
  • When animation would work but static display is needed.

Design guidelines

  • Identical scales across all panels.
  • Consistent visual encoding.
  • Minimal between-panel decoration.
  • Clear labelling of what varies.
  • Tight spacing (data should dominate).

6. Data Density & Information Resolution

Data density = numbers plotted per unit area.

High data density is a sign of graphical quality. Maps and time-series can achieve thousands of numbers per square inch.

Shrink principle

Graphics can often be reduced significantly while maintaining readability and gaining impact. Consider:

  • Sparklines (word-sized graphics).
  • Condensed time-series.
  • Small multiple arrays.

Resolution thinking

  • What is the minimum size that remains readable?
  • Can we show more data in the same space?
  • Are we wasting white space?

7. Multifunctioning Graphical Elements

Every graphical element should serve multiple purposes when possible.

Data measures that can serve as: data point, label, scale marker, grid reference.

Examples

  • Data points that also serve as labels (scatter plots with text).
  • Axis that is also a data series.
  • Range frames (axis shows data range, not arbitrary extent).

8. Aesthetics and Technique

Balance complexity and simplicity

  • Simple design, complex data.
  • Complexity should come from data, not decoration.

Visual hierarchy

Data > Labels > Annotations > Grids > Borders. Prominent elements should carry the most information.

Colour use

  • Use sparingly and purposefully.
  • Ensure accessibility (colorblind-safe palettes).
  • Grey as default; colour for emphasis or encoding.
  • Avoid "rainbow" colour maps for sequential data.

Typography

  • Clear, readable fonts.
  • Appropriate sizing hierarchy.
  • Horizontal text when possible.
  • Labels close to the data they describe.

Quick Reference: The Tufte Test

For any visualisation, ask:

  1. Data-Ink — can I erase any element without losing data? (Erase it.)
  2. Integrity — does the visual effect match the data effect? (Lie Factor ≈ 1.)
  3. Chartjunk — does any element exist for decoration only? (Remove it.)
  4. Excellence — does it reveal the data at multiple levels? (Broad + detailed.)
  5. Comparison — can the viewer easily compare data elements? (Enable it.)
  6. Density — could this show more data in the same space? (Condense.)
  7. Context — is all necessary context provided? (Labels, sources, scales.)

For the extended 14-question test see Analytical Design, Sparklines, and Layering.

Emacs 29.1 (Org mode 9.6.6)