The System Usability Scale (SUS) is a 10-item questionnaire that produces a single score from 0 to 100 to measure perceived usability. John Brooke originally developed it as a "quick and dirty" tool for usability testing. For marketers and SEO practitioners, it provides a standardized grade to determine if website changes actually improve user experience or if usability issues are hurting conversions.
What is the System Usability Scale?
SUS was created by John Brooke in 1986 as a "quick and dirty" scale to administer after usability tests. It consists of 10 statements rated on a 5-point Likert scale, ranging from "Strongly Disagree" to "Strongly Agree". SUS is technology independent and has been tested on hardware, consumer software, websites, cell phones, IVRs and even the yellow pages.
The questionnaire covers both positive and negative aspects of usability. Items include statements like "I thought the system was easy to use" and "I found the system unnecessarily complex". Scoring converts the responses to a single number from 0 to 100, though the result is not a percentage. SUS is an industry standard with references in over 600 publications.
Why the System Usability Scale matters
- Benchmarks performance: The average SUS score from 500 studies is 68. Scores above 68 indicate above-average usability; below signals problems.
- Works with small samples: SUS can be used on very small sample sizes (as few as two users) and still generate reliable results. This makes it viable for tight budgets.
- Enables A/B comparisons: You can compare two website versions or track improvements across releases using a single number, even for dissimilar systems.
- Splits usability and learnability: Research shows SUS provides sub-scales of usability (8 items) and learnability (items 4 and 10), letting you track which dimension needs work.
- Valid and reliable: SUS correlates with other usability questionnaires and outperforms home-grown surveys in detecting differences.
How the System Usability Scale works
Administer the questionnaire immediately after users complete key tasks. Then calculate:
- Score each item: For odd-numbered items (1, 3, 5, 7, 9), subtract 1 from the user response. For even-numbered items (2, 4, 6, 8, 10), subtract the user response from 5. This scales all values from 0 to 4.
- Calculate total: Add the converted responses for each user. Multiply the total by 2.5 to convert the range from 0-40 to 0-100.
- Interpret the grade: A score above 80.3 earns an A (top 10%), 68 is a C (average), and below 51 is an F (bottom 15%). Convert scores to percentiles to communicate with stakeholders; a raw score of 74 equals the 70th percentile.
- Analyze sub-scales: Items 4 and 10 measure Learnability; the remaining eight measure Usability.
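The scoring steps above can be sketched in a few lines of Python. The function name and the example responses are illustrative, not part of any official SUS tooling:

```python
def sus_score(responses):
    """Compute a SUS score from ten Likert ratings (1-5), item 1 first."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten ratings between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:          # odd items (positive statements): response - 1
            total += r - 1
        else:                   # even items (negative statements): 5 - response
            total += 5 - r
    return total * 2.5          # scale the 0-40 sum up to 0-100

# A user who answers 4 on every positive item and 2 on every negative item:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # → 75.0
```

Averaging `sus_score` across all respondents gives the study-level score used for benchmarking.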
Note that SUS is not diagnostic. It signals that problems exist but does not identify specific interface issues.
Best practices
- Compute confidence intervals: Small samples produce reliable but imprecise estimates. Calculate confidence intervals around your sample mean to understand the range of possible true scores.
- Convert to percentiles for reporting: Since SUS scores are not percentages, convert them to percentile ranks or letter grades when presenting to stakeholders unfamiliar with the scale.
- Administer immediately: Give the questionnaire right after task completion while the experience is fresh.
- Track both scales: Monitor the global SUS score alongside the Learnability and Usability sub-scales to pinpoint specific friction points.
- Benchmark consistently: Use SUS to compare your site against previous versions or competitors using the same grading curve.
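The confidence-interval practice above can be sketched as follows. This is a minimal illustration using a normal-approximation critical value (z = 1.96); for the very small samples SUS often involves, a t-distribution critical value would be wider and more appropriate. The sample scores are hypothetical:

```python
import math
import statistics

def sus_confidence_interval(scores, z=1.96):
    """Approximate 95% confidence interval for the mean SUS score.

    Uses a normal critical value as a rough sketch; a t critical value
    is the better choice for small n."""
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)
    return (mean - z * sem, mean + z * sem)

scores = [72.5, 65.0, 80.0, 55.0, 77.5]  # five hypothetical users
low, high = sus_confidence_interval(scores)
print(f"mean {statistics.mean(scores):.1f}, approx 95% CI [{low:.1f}, {high:.1f}]")
```

The wide interval from only five users is exactly the point: report the range, not just the mean.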
Common mistakes
- Treating the score as a percentage: A score of 70 does not mean 70% usable or 70th percentile. Since the average is 68, a 70 sits only around the 50th-60th percentile, barely above average.
- Using SUS for diagnosis: Low scores indicate you should review session recordings or conduct further testing. SUS will not tell you which button is broken.
- Ignoring sample size variability: While SUS works with two users, small samples generate imprecise estimates of the population score. Do not treat a score from five users as exact.
- Expecting perfect task correlation: SUS scores show only modest correlation with task completion rates (r=.24), explaining only 6% of the variance. Users may fail tasks but rate the system highly due to other factors.
- Neglecting the learnability dimension: Focusing only on the total score misses whether users struggle to learn the system versus daily use friction.
Examples
Example scenario: Homepage redesign comparison
You test two homepage layouts with 20 users each. Version A scores 72 (C+), Version B scores 81 (A). You ship Version B knowing it performs in the top 10% of systems tested.
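Before shipping on a 9-point gap, it is worth checking that the difference is unlikely to be noise. One reasonable way (not prescribed by SUS itself) is Welch's t statistic on the per-user scores; the samples below are hypothetical and shortened for illustration:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    mean_diff = statistics.mean(b) - statistics.mean(a)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return mean_diff / se

# Hypothetical per-user SUS scores for the two layouts.
version_a = [70.0, 75.0, 67.5, 72.5, 77.5, 65.0, 72.5, 75.0, 70.0, 75.0]
version_b = [82.5, 77.5, 85.0, 80.0, 87.5, 75.0, 80.0, 82.5, 77.5, 82.5]

t = welch_t(version_a, version_b)
print(f"Welch t = {t:.2f}")  # |t| well above ~2 suggests a real difference
```

For a proper p-value you would also need the Welch-Satterthwaite degrees of freedom, but as a rule of thumb a |t| comfortably above 2 supports shipping the higher-scoring version.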
Example scenario: Quarterly benchmarking
Your e-commerce site scores 65 (below average). You fix checkout flow issues. Three months later, the score rises to 74 (70th percentile). The grade improvement validates the redesign investment.
Example scenario: Go/no-go decision
A new internal tool scores 48 (F). Despite being "functional," the SUS score signals severe usability issues. You delay launch to conduct task analysis, avoiding a rollout that would have hurt productivity.
System Usability Scale vs alternative questionnaires
| Feature | SUS | SUMI | SUPR-Q | QUIS |
|---|---|---|---|---|
| Length | 10 items | 50 items | 13 items | Varies (diagnostic) |
| Focus | Overall usability and learnability | Software usability | Usability, trust, appearance, loyalty | User interaction satisfaction |
| Diagnostic detail | Low (global score only) | Moderate | Moderate | High (guides redesigns) |
| Best for | Quick benchmarking, A/B tests | Deep software analysis | Website experience metrics | Identifying specific redesign needs |
Rule of thumb: Use SUS when you need a fast, reliable grade. Switch to SUMI or QUIS when you need to diagnose specific interaction problems.
FAQ
What exactly does SUS measure?
SUS measures perceived usability and learnability. It captures how users feel about the system after use, not their task success rate or specific error locations.
How many users do I need for valid SUS results?
You can obtain reliable results with as few as two users, but larger samples (20+) provide more precise population estimates. Always report confidence intervals with small samples.
Is a SUS score of 70 good?
No. The average score is 68, so 70 is only slightly above average (roughly 50th-60th percentile). Scores above 80.3 are considered excellent (A grade).
Can I modify the SUS questions?
Minor wording changes, such as replacing "system" with "website," are common and generally accepted. However, significant alterations can affect comparability with industry benchmarks, so keep modifications as small as possible if you intend to compare against published norms.
Why doesn't my SUS score match my task success data?
SUS shows only a modest correlation with task performance (r=.24). Users may complete tasks but find the process painful, or fail yet blame themselves rather than the interface.
How often should I run SUS?
Run SUS after major redesigns or quarterly for ongoing products. Use it consistently to track trends rather than one-off snapshots.