Get people’s feedback on how to improve the justice system & what innovation works

User testing evaluation methods help us understand what people want from services, documents, & tech.

It can guide court, legal aid, and civic teams to understand what people can use and how best to support them. Strategically, user testing can help teams to put diverse people’s needs first — and imagine new offerings that better work for them.

This Guide to User Testing

What are all the ways I can do user testing in making the justice system better?

User Testing Overview

How do I get early feedback on ideas for justice innovations?

Early Stage User Testing

How do I get feedback on something I’m building, to improve it?

User Testing During Development

How can I test my pilot project with users, to see if it improves their outcomes & experiences?

Pilot User Testing

How can courts get ongoing feedback from users to improve the system?

Ongoing User Feedback

What more can I read to understand user testing & find examples of it in action?

Reading List on User Testing

User Testing Overview

How do you get stakeholders’ input, to improve the justice system?

User testing can help you define what areas and ideas for innovation are most promising and impactful, especially from a community’s perspective. It can also be used later, when your team is developing, piloting, or evaluating a new intervention. User testing can tell you if your innovation has value and impact on the public, and what changes you should make to improve its effectiveness.

User testing can be combined with other evaluation methods — including expert reviews of legal outcomes, and experimental or quasi-experimental studies to track impact.

On this page, you can dive into specific examples and methods for user testing. But first you can see an overview of the types of user testing purposes and techniques.

Early Stage User Testing, when you are defining your agenda:

Priority Sort: have people look at a wide variety of high-level ideas and judge their relative value. They will put them in buckets, spending pretend money on the ideas.
Idea book: make concept posters or other high-level presentations of your various ideas or features. Put them in a single book, like a catalog. Have the testers look through and rank which of the ideas they’d like and why.

Development User Testing, when you’re refining a new proposal

Feedback Interview: show your new design prototype to a stakeholder and interview them about how usable, useful, and engaging it is.
Over-the-shoulder observation: give them the prototype and watch as they try to use it. Note down breakdowns, confusion, and payoffs. You can possibly give them a persona card to help them understand what POV they are using it from.
Survey instruments: have the tester fill out a short survey, usually with Likert scale responses of 1-7 (levels of agreement). The questions can draw from surveys around Usability, Design for Dignity, and Procedural Justice.

Pilot User Testing, as you launch a live intervention

Capability-Improvement Testing: have people use the design prototype, and then give them a quiz to measure how much of the important content they have understood and retained, and whether their strategy-making and confidence have improved.
Administrative Burden Testing, to see how costly or time-consuming it is to use the innovation.
Ongoing Procedural Justice Testing, to see if it makes people feel the justice system is more fair, efficient, and transparent.

Ongoing User Testing & Feedback

Exit ratings to get quick reactions after using a service or tool
Follow-up Surveys and Interviews, as people are leaving a courtroom, office, or hybrid setting, or in the weeks and months after
Focus groups and community design sessions

Ethical Guidance for User Testing

The Legal Design Lab released a guide in 2017 that covers some of the design training that we give to our students before they go into the field to conduct interviews or testing with members of the community.

Use this guide as you get started planning & running user testing with community members.

Early-Stage User Testing

Is your team interested in defining an agenda for justice innovation?

Are you looking for fresh new ideas, and deciding what should be on your community’s agenda for improving the justice system?

You can use early-stage user testing methods to sort, rank, and improve ideas for innovation. These might be ideas that have come from grant proposals, design sprints, community brainstorms, or a committee discussion.

Before you invest time and money in building a pilot of an idea, you can do user testing to validate that the idea has value and potential impact for your key stakeholders.

Early-Stage User Testing Guide

The Legal Design Lab presents this user testing guide to walk court, legal aid, and law school teams through practical design and testing methods. It’s particularly meant for on-site testing of new ideas in a courthouse.

Our team wrote this guide to walk through exactly how we ran user testing for new traffic court-oriented redesigns. We captured the steps, tools, and ethical considerations we took when doing early-stage testing of new prototypes.

Games to Rank Ideas & Set Agenda

Do you have multiple ideas for justice innovations? Do you want input on which idea or area is of the highest value for your stakeholders? Try Idea Sorting & Ranking games.

Idea Sorting & Ranking games offer a method to have multiple stakeholders compare and prioritize high-level ideas. This kind of user testing evaluation can let your team gather a large number of stakeholders’ feedback about which ideas should move forward to the agenda.

Your team can put the ideas on cards or post-its, and the users can individually rank them — or a working group can together come to a consensus about which of the High-Medium-Low-No value categories each idea should be placed into.

Your team can run ‘Priority Sorts’ or ‘Idea Book Voting’ to have people rank ideas, and sort which should be abandoned and which should be forwarded.

You can give early, sketchy prototypes to target users, like in the form of drawings, posters, slide decks, or short descriptions. Let people explore them, try to use them, ask questions of them, or talk about why they like them or don’t.

But then make the person choose: where does this idea go – high, medium, or low priority? How much money would you give this idea?

Often we give fictional money to participants, to distribute among different ideas. Rather than having people sort cards of ideas, they are putting fictional money down among a short menu of ideas.

An Idea Ranking exercise, with fictional money allocated by users to different prospective innovations.
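As a rough illustration of how a team might tally a fictional-money exercise afterward, here is a minimal Python sketch. The idea names, the amounts, and the function name `tally_fictional_money` are hypothetical examples, not data from an actual session.

```python
from collections import defaultdict

def tally_fictional_money(allocations):
    """Sum the pretend money each idea received across all participants."""
    totals = defaultdict(int)
    for participant_allocation in allocations:
        for idea, amount in participant_allocation.items():
            totals[idea] += amount
    # Sort so that the highest-funded ideas come first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data: three participants, each distributing $100 of pretend money
allocations = [
    {"Text reminders": 60, "Plain-language forms": 40},
    {"Text reminders": 30, "Plain-language forms": 50, "Court kiosk": 20},
    {"Plain-language forms": 70, "Court kiosk": 30},
]

ranking = tally_fictional_money(allocations)
for idea, total in ranking:
    print(f"{idea}: ${total}")
```

The totals only show relative priority. The reasons people give while placing their money are usually just as important to record.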

Quick Review Sheet for Early Ideas

The Idea Review is an initial survey-like tool that stakeholders can use to quickly evaluate a concept that has been proposed.

We tend to use it when there are a handful of ideas in contention for development. This review can be done when groups are presenting their idea at a ‘demo’ or ‘idea presentation’ session. Usually, it’s filled out by different experts and users. The team then collects the different ratings and comments to decide which ideas should move forward.

We use the Idea Review at the Legal Design Lab in order to judge ideas as they have emerged out of brainstorms, and after they’ve been sketched out in rough prototypes.

We present this review tool both to subject matter experts and target users, for them to give feedback that we can easily process. Often people have it when listening to a presentation or a demo of a new thing. They can fill in the Idea Review sheet to help us rank whether and how to move forward with it.

It’s a quick tool to get a mix of quantitative and qualitative feedback — and it’s best used when more than one idea is being reviewed.

User Testing During Development

Once your team has chosen an idea to prototype and pilot, you can continue user testing throughout the development cycle.

The user testing methods here will be more structured, and there are many more formal protocols, scales, and other methods to evaluate prototypes of new tech, services, and policies.

Two guides, the Civic User Testing Guide and 18F’s Usability Testing guide, walk you through how to run user testing for a new justice website, app, or digital service. These guides include practical tips about recruitment, privacy, compensation, structure, and analysis.

Civic User Testing Guide

The Smart Chicago Collaborative has a guide to running usability tests for civic (or public interest) websites, apps, and other technology tools.

It presents a very practical overview of how to operate User Experience and usability testing, especially with diverse community members in public places.

Use this guide to structure recruitment, research protocols, compensation, analysis, and other detailed tasks.

18F Remote Usability Testing Guide

The federal government design & tech group 18F has a series of extensive articles that walk through how civic teams can run usability tests of digital services, especially in a virtual situation.

These articles from 18F include examples, research protocols, ethical considerations, and more guidance on how to run remote user-centered tests well.

In this section, we walk you through some specific methods and instruments you can use when user-testing your new prototype or pilot of a justice innovation.

Early Prototype Review

Once one idea is chosen as the likely new pilot, there are methods to evaluate the prototypes of this new intervention. This is still in the early, pre-pilot stage, but it helps your group make a new service much more likely to be engaging, usable, and user-friendly.

As your team makes medium- or high-fidelity mockups of new documents, products, tech, services, or policy, you can invite stakeholders in to review these prototypes’ details to give you critical feedback on them.

Are they worth continuing on? How should they be edited? What must be improved before we invest more resources in them?

Have participants draw on your project plans, rank the different features and options, and give user experience and usability assessments of them.

Usability Protocols

Design and technology researchers have established protocols for assessing websites, documents, services, and applications.

These usability protocols can help justice teams figure out how usable (or difficult) interventions are. These protocols can be used for any consumer-facing intervention, whether in the form of a paper, technology tool, human service, or otherwise.

We go over these scales and questions here. They include:

Net Promoter Score (NPS)
System Usability Scale (SUS)
Single Ease Question task analysis
NASA-TLX post-task workload
Usability & Dignity evaluation

Net Promoter Score (NPS) for an overall UX metric

Justice teams can use the Net Promoter Score (NPS) metric as a very simple, straightforward way to assess the quality of their intervention, from the user’s point of view. How do you use the NPS? After the user has used the intervention, you ask them the question:

How likely are you to recommend this [intervention type, like website/form] to your friend?

Answer on a scale of 0 (low) to 10 (high).

Then the team can count how many Promoters (ratings of 9-10), Passives (ratings of 7-8), and Detractors (ratings of 0-6) they have. The overall NPS is the percentage of Promoters minus the percentage of Detractors, so it can range from -100 to 100.

The team will then have the NPS score for their intervention. They can use this to understand, overall, what people’s user experience of the intervention is.
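The calculation can be sketched in a few lines of Python. The ratings below are made-up examples, and the function name `net_promoter_score` is our own.

```python
def net_promoter_score(ratings):
    """Return the NPS (from -100 to 100) for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("At least one rating is needed")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("Ratings must be between 0 and 10")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0-6
    # Passives (7-8) count toward the total but are neither added nor subtracted
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical ratings from ten testers
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
nps = net_promoter_score(ratings)
print(nps)
```

Here 4 of 10 testers are Promoters (40%) and 3 of 10 are Detractors (30%), so the NPS is 10.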

System Usability Scale 10-question survey

Justice teams can use the popular System Usability Scale (SUS) to ask user-testers about how friendly or difficult their intervention is. The SUS is a set of questions to ask a tester after they have gone through the document, website, tool, or other ‘thing’.

The team would have given the tester their scenario & then given them the intervention to try to use. Then the team should ask them out loud, or have them fill in a paper or online survey with these questions. The answers will be on a Likert scale of 1-5, from Strongly Disagree to Strongly Agree.

1. I think that I would like to use this system frequently.

2. I found the system unnecessarily complex.

3. I thought the system was easy to use.

4. I think that I would need the support of a technical person to be able to use this system.

5. I found the various functions in this system were well integrated.

6. I thought there was too much inconsistency in this system.

7. I would imagine that most people would learn to use this system very quickly.

8. I found the system very cumbersome to use.

9. I felt very confident using the system.

10. I needed to learn a lot of things before I could get going with this system.

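Then the team can score each tester’s responses. In the standard SUS scoring, each odd-numbered (positively worded) item contributes its rating minus 1, and each even-numbered (negatively worded) item contributes 5 minus its rating. Add up the ten contributions and multiply by 2.5 to get a usability score from 0 to 100, then average the scores across testers.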
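Here is a minimal Python sketch of the standard SUS scoring rule for one tester, assuming the ten questions are asked in the order above on the 1-5 scale. The sample responses are invented for illustration.

```python
def sus_score(responses):
    """Score one tester's ten SUS responses (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("Each response must be between 1 and 5")
    total = 0
    for number, rating in enumerate(responses, start=1):
        if number % 2 == 1:
            # Odd-numbered items are positively worded: higher is better
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: lower is better
            total += 5 - rating
    return total * 2.5

# Hypothetical responses from one tester, in question order 1-10
responses = [4, 2, 4, 1, 4, 2, 5, 2, 4, 2]
score = sus_score(responses)
print(score)
```

A team would typically average these per-tester scores to get one number for the intervention.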

Single Ease Question task analysis

This protocol focuses on each part of an intervention (rather than the intervention as a whole). The team would ask this question regularly as a person is going through the intervention. It can help the team spot which parts of their workflow are problematic, and which are usable.

The team would ask the user, after different tasks within the intervention, this 1-7 Likert scale question:

Overall, this task was?

On a scale of 1-7, Very Difficult to Very Easy.

The task will still be fresh in the user’s mind, and the team can record the different difficulty levels regularly to compare the tasks’ difficulty.
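One simple way to compare tasks is to average the ratings for each one and list the hardest first. This is a sketch with hypothetical task names and ratings. Averaging is our assumption here, and a team could just as well use medians.

```python
from statistics import mean

def mean_ease_by_task(responses):
    """Average the 1-7 ease ratings per task, hardest task first."""
    ratings_by_task = {}
    for task, rating in responses:
        ratings_by_task.setdefault(task, []).append(rating)
    averages = {task: mean(r) for task, r in ratings_by_task.items()}
    # Lower average means the task was rated as more difficult
    return sorted(averages.items(), key=lambda item: item[1])

# Hypothetical (task, rating) pairs collected from two testers
responses = [
    ("Find the form", 6), ("Find the form", 5),
    ("Fill in the form", 3), ("Fill in the form", 2),
    ("File the form", 4), ("File the form", 5),
]

ease_ranking = mean_ease_by_task(responses)
for task, average in ease_ranking:
    print(f"{task}: {average}")
```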

NASA-TLX: Post-Task Workload

Another task-by-task analysis is from the world of high-stakes, complex products (like in aerospace or the military). This can assess how demanding and difficult individual tasks are in an overall workflow.

The NASA-TLX instrument has 6 questions that are on a 21-point scale from Very Low to Very High. The team asks these questions after each task. Then they also ask the user to weight which of these questions/categories matter most to them. NASA has an application to use for this complex set of questions and weighting.

Mental Demand: How mentally demanding was this task?
Physical Demand: How physically demanding was the task?
Temporal Demand: How hurried or rushed was the pace of the task?
Performance: How successful were you in accomplishing what you were asked to do?
Effort: How hard did you have to work to accomplish your level of performance?
Frustration: How insecure, discouraged, irritated, stressed, and annoyed were you?
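For teams that want to compute the weighted score by hand, here is a minimal Python sketch. It assumes the ratings have already been converted to a 0-100 scale, and that the weights come from the 15 pairwise comparisons between the six categories (so each weight is the number of times a category was picked, and the weights add up to 15). The sample numbers and short category keys are invented.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")

def weighted_tlx(ratings, weights):
    """Combine six 0-100 ratings into one weighted workload score."""
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("Ratings and weights must cover all six subscales")
    if sum(weights.values()) != 15:
        raise ValueError("Weights from 15 pairwise comparisons must sum to 15")
    # Each rating counts in proportion to how often its category was chosen
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15

# Hypothetical ratings (0-100) and weights for one task by one tester
ratings = {"mental": 70, "physical": 10, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 80}
weights = {"mental": 4, "physical": 0, "temporal": 2,
           "performance": 1, "effort": 3, "frustration": 5}

workload = weighted_tlx(ratings, weights)
print(round(workload, 1))
```

Some teams skip the weighting step and simply average the six ratings, which is quicker to run in a courthouse setting.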

Usability and Dignity Evaluation Instrument

This evaluation protocol combines usability-oriented assessments with changes to someone’s sense of dignity.

When we ask people for short feedback on our new technology offerings, service designs, or information design, we use an evaluation instrument that we’ve created. It’s a short survey evaluation that incorporates assessments from established survey instruments to evaluate software’s usability, to get citizens’ feedback on government services, and to assess people’s sense of procedural justice and dignity while using an offering.

For each of these questions, we use a Likert scale, of 0 (Disagree Strongly) to 7 (Agree Strongly).

I think that I would like to use this system often to help me [insert objective: communicate with the court, navigate court process, etc.]

I thought the [design name] was easy to use.

I felt very confident using the [design name].

This will help me to get through court more efficiently.

This gave me clear, helpful information.

I felt that I was understood using the [design name].

I wish I could take [design name] around [place/system name] with me.

I felt the [design name] provided most of the information I was looking for.

I felt that the [design name] could be improved.

User Testing with Persona Task Cards

While testing out a new website, guide, hotline, app, or service, you can use persona task cards to help different team members experience this innovation from different perspectives. You can give your tester a ‘persona card’, so they know whose point of view they are reviewing the innovation from.

Often in very early-stage testing, we have people test from a different person’s perspective. We give them personas to play, so that they scrutinize the design from these various points of view. We know that role-played personas are not as good as testing with a wide range of people from these different backgrounds. But they work as a test run, to see what issues we can spot with a design before investing in wider testing.

Here are some example personas that we give to people:

Persona 1: 22-year-old digital native, very confident in technology, prefers to text over phone calls and sometimes even over in-person communication, feels higher confidence in their ability to figure things out especially using Google and looking through social media, but feels relatively out of their depth in the legal system
Persona 2: 65-year-old, who is a first-time user of a legal system, but has dealt with lots of other complex social systems like health insurance, social security, taxes, etc. They are definitely not very confident with technology, but do email a lot, still use AOL, and just moved to the most basic smartphone this year upon the insistence of their kids.
Persona 3: 42-year-old who has been to court several times to deal with divorce, custody, and parenting plans. They have had enough repeat visits to feel confident about how to navigate the system and the relationships. They feel literate, but still want support to get things right
Persona 4: 31-year-old who has very limited English proficiency. They have been through immigration proceedings with the help of family and friends before, but they definitely don’t feel confident in going to court by themselves because of the language and because of the unfamiliarity of the system.
Persona 5: 18-year-old who is coming with their older family member to help translate for them in court. They are literate in English, and feel confident with technology. But they are not familiar with the legal system at all. They grew up in the US, and feel they can also help with the cultural translation for their family members

The Persona Card approach can help your court or legal team get started on reviewing your new innovation from different users’ perspectives. That said, you should test with real community members, including past and prospective users to make sure you’re getting diverse, accurate input.

Pilot User Testing

Once your new innovation is almost ready to be piloted, or even after it is formally launched, it’s time for another kind of User Testing. These methods are focused on measuring whether the innovation is having the desired impact on user empowerment or access to justice.

You can keep testing for usability and value, but these additional methods will help you see whether the fully-developed, detailed new website, app, flier, poster, bot, or other innovation is in fact improving the justice system & people’s outcomes.

User Testing in this pilot phase should focus on Legal Capability improvements, Procedural Justice and Dignity improvements, and measurement of Administrative Burden. Because your innovation is now formal and detailed enough for a person to fully use, it’s ready to test out its impacts and costs.

Legal Capability Improvement

Your group likely cares about more than usability. You likely also want to evaluate whether a justice innovation is improving a person’s legal empowerment and their likely outcomes in the justice system.

You could then user-test a prototype for Legal Capability Improvement. This technique can help you determine what effect your intervention (like a new form, website, or app) might have on a person’s ability to navigate their legal problem and solution.

This technique can help you determine if your intervention can make people more likely to engage with the legal tasks, more informed about the correct info, and more strategic in making choices that are in their best interest.

Most early-stage Capability Improvement tests focus on measuring usability, user experience, and knowledge. This means having a small number of people, representative of the target population, use your new intervention (and possibly some other versions). Your testing team will gather both qualitative and quantitative feedback.

Qualitative Information on engagement and capability:

Ask: What did you like?

Ask: What did you find confusing?

Observe what they skipped or ignored. Ask them, why did you skip this?

Observe what they complain about, or where they express frustration. Ask them, why was this frustrating?

Ask: What made you feel a sense of dignity?

Ask: What made you feel more knowledgeable?

Ask: What about this would make you recommend it to someone else going through this problem?

In addition, you gather Quantitative Information on changes to the testers’ legal capabilities:

Do they fully engage with all of the tasks?

Do they complete the process?

Do they pay attention to all that is being communicated to them (measured by eye-tracking or page-recording)?

After they use the intervention, do they answer key Knowledge Questions correctly (in a quiz)?

After they use the intervention, do they have a concrete, expert-approved strategy of next steps?

How long do they have to spend to understand the information, and answer questions/form strategies correctly?

Quiz-based legal capability testing

Often Capability Testing is done by giving participants scenarios, and having them try to answer ‘quiz’ questions that test their knowledge and their strategy-making. For example, this is a Legal Capability evaluation from Catrina Denvir in a study of legal education online. She gave recruited participants a fictional scenario, with a ‘persona’ to play. Then she asked them legal knowledge & strategy questions, to determine if a new website intervention improved their ability to correctly answer the questions. The quiz questions can help more directly measure the impact of an intervention in improving a person’s legal knowledge, and thus their capability to deal with their justice problem.
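If your team runs a quiz like this, one rough way to summarize it is to compare the share of correct answers between testers who used the intervention and testers who did not. The Python sketch below uses an invented four-question answer key and invented responses. It is not drawn from the study described above, and with small groups the difference is only a signal, not a statistical finding.

```python
from statistics import mean

def percent_correct(answers, answer_key):
    """Return the percentage of quiz questions one tester got right."""
    correct = sum(1 for question, right_answer in answer_key.items()
                  if answers.get(question) == right_answer)
    return 100 * correct / len(answer_key)

def group_average(group, answer_key):
    """Average the percent-correct scores across a group of testers."""
    return mean(percent_correct(answers, answer_key) for answers in group)

# Hypothetical answer key and responses (unanswered questions count as wrong)
answer_key = {"q1": "b", "q2": "a", "q3": "d", "q4": "c"}
used_intervention = [
    {"q1": "b", "q2": "a", "q3": "d", "q4": "a"},
    {"q1": "b", "q2": "a", "q3": "d", "q4": "c"},
]
comparison_group = [
    {"q1": "b", "q2": "c", "q3": "a", "q4": "a"},
    {"q1": "a", "q2": "a", "q3": "d", "q4": "b"},
]

with_score = group_average(used_intervention, answer_key)
without_score = group_average(comparison_group, answer_key)
print(with_score, without_score, with_score - without_score)
```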

Administrative Burden Testing

Will your intervention be low-burden enough that people will be able to engage with it? Or will it be so time-consuming, costly, or overwhelming that people will disengage with it, delay using it, get it wrong, or resent its costs?

User testing for Administrative Burden is useful (and sometimes required) for a public service or technology tool.

Administrative burden testing measures the cost, in time and expenses, of using a tool, document, or service. It can put a number on how difficult it might be to use a form, tool, or service while trying to resolve a legal problem.

Many federal agencies are required to do this Burden Cost evaluation whenever they change the procedures people go through for social security, taxes, disability benefits, or food stamps. It’s required at the federal level by the Paperwork Reduction Act, which requires an agency to measure the effect of a procedure change by looking at:

The time it takes for an average person to fill in the given form or do the required task
The time it takes to prepare the documents or get the information to correctly fill in the form/do the task
Assume that this costs a person $15/hour
Calculate the number of people who will have to go through this on average

This basic calculation will allow you to produce a numeric amount of how much this form or step costs ‘the public’:

(Time to fill + Time to prep) × $15/hour × # of people doing this = Burden Cost
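As a worked example of this formula, here is a short Python sketch. The times and the number of people are hypothetical, and the function name `burden_cost` is our own. The $15/hour rate is the assumption stated above.

```python
def burden_cost(fill_hours, prep_hours, num_people, hourly_rate=15.0):
    """Burden Cost = (time to fill + time to prep) x hourly rate x people."""
    if min(fill_hours, prep_hours, num_people, hourly_rate) < 0:
        raise ValueError("Inputs cannot be negative")
    return (fill_hours + prep_hours) * hourly_rate * num_people

# Hypothetical form: 30 minutes to fill in, 90 minutes to gather documents,
# and 10,000 people who have to go through it
cost = burden_cost(fill_hours=0.5, prep_hours=1.5, num_people=10_000)
print(f"${cost:,.0f}")
```

With these made-up inputs, each person spends 2 hours, which works out to $30 per person and $300,000 across everyone who has to complete the form.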

Having Burden Costs, or comparing them across different proposed forms, can be a very influential way to pressure policy-makers or support an argument around process simplification. In the access to justice space, you could do Burden Cost calculations that include:

Time to search for and find the correct form
Time to look up and understand the words being used in the form
(Optional) Time to get help at a Self Help Center, including waiting in line and being seen
(Optional) Time to call legal aid, get screened, see if they can help you, and be helped
Time to read the form and fill in the questions
Time to prep/make copies/ get ready for filing
Time to file it in the courthouse
Time to deal with any problems with filing

You can calculate these time costs by doing these steps yourselves, or by having research participants do some or all of them. You could also gather the timings from experts who have data or informed estimates.
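When several participants time each step, a team needs some way to roll the timings up into one per-person figure. The sketch below takes the median time for each step and adds the medians together. The step names, the timings, and the choice of the median are all our assumptions for illustration.

```python
from statistics import median

def typical_minutes(step_timings):
    """Add up the median time (in minutes) recorded for each step."""
    return sum(median(times) for times in step_timings.values())

def cost_per_person(step_timings, hourly_rate=15.0):
    """Convert the typical total time into a dollar cost for one person."""
    return typical_minutes(step_timings) * hourly_rate / 60

# Hypothetical minutes recorded by three research participants per step
step_timings = {
    "Find the correct form": [20, 30, 25],
    "Look up words in the form": [15, 10, 20],
    "Read and fill in the form": [40, 45, 60],
    "Prep and make copies": [10, 10, 15],
    "File at the courthouse": [30, 60, 45],
    "Deal with filing problems": [0, 20, 0],
}

minutes = typical_minutes(step_timings)
per_person = cost_per_person(step_timings)
print(minutes, per_person)
```

Optional steps, like waiting at a Self Help Center, can be added as extra entries to see how much they change the total.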

Ongoing User Testing

As your innovation goes from a prototype, to a pilot, to an ongoing ‘thing’, there is another phase of User Testing to do. This phase is about ongoing evaluation and feedback, both about a specific intervention and the overall system.

Exit rating

A quick way to get user feedback on an experience, service, or product, is to ask them to use a very simple rating on their ‘exit’. It can be on a text message line, on a tablet (for an in-person service), on a browser window (for a web-based service), or on a paper sheet (again for in-person).

This might be an NPS survey question: how likely are you to recommend this to another person?

Ideally, the user testing will run through a very quick and visual interface, that lets the user quickly put a rating on what they’ve just experienced.

Exit & Follow-Up Surveys

User testing can happen after a real user has used an intervention or gone through the legal system.

Many courts, legal aid groups, and experts have created exit surveys for when a person has just concluded their justice journey.

These exit surveys can gather important feedback on the quality of their experience, their outcomes, and their ideas for improvement.

LaGratta Consulting. “Court Voices Project: Using Court User Feedback to Guide Courts’ Pandemic Responses,” August 2022. www.lagratta.com/court-voices-project-user-feedback.

Sample exit survey from California Courts, California Courts. “Customer Satisfaction Survey.” https://www.courts.ca.gov/partners/documents/customersatisfactionsurvey.pdf.

Read more about feedback surveys and techniques to use with this report: “Trial Court Research and Improvement Consortium Executive Program Assessment Tool: Assistance to Self-Represented Litigants Revised Draft,” 2005. https://www.srln.org/node/43/trial-court-research-and-improvement-consortium-tcric-self-help-program-assessment-tool-2005.

Also explore how to use text message follow-up to get more user input, with this report from Legal Aid Society of Cleveland and Michigan Legal Help. “Texting for Outcomes Toolkit,” 2020. https://www.lsntap.org/sites/lsntap.org/files/Texting for Outcomes Toolkit %2810.18.2021%29 Final w App.pdf

Focus Groups & Design Workshops

Courts and legal aid groups can also gather ongoing feedback through interactive, qualitative sessions with past users and community members. These kinds of deep-dive sessions can help groups get in-depth information about what works, what doesn’t, and what new opportunities exist.

Listen > Learn > Lead report

This report from IAALS and collaborators from university design labs profiles how to involve court users and other stakeholders in deep, qualitative feedback sessions about systemic change.

Institute for the Advancement of the American Legal System, Margaret Hagan, Dan Jackson, and Lois Lupica. “Listen> Learn> Lead: A Guide to Improving Court Services through User-Centered Design.” Denver, https://iaals.du.edu/publications/listen-learn-lead

Reading List on User Testing

Read more about how to run user testing and find examples of how others do it.

Civic User Testing book

O’Neil, Daniel X, and Smart Chicago Collaborative. Civic User Testing Group as a New Model for UX Testing, Digital Skills Development, and Community Engagement in Civic Tech. Chicago: The CUT Group, 2019, https://irp-cdn.multiscreensite.com/9614ecbe/files/uploaded/TheCUTGroupBook.pdf

User Testing 4 Different Prototypes

Hagan, Margaret. “Community Testing 4 Innovations for Traffic Court Justice.” Legal Design and Innovation, 2017. https://medium.com/legal-design-and-innovation/community-testing-4-innovations-for-traffic-court-justice-df439cb7bcd9

Pop-Up User Research in the Courts

Aldunate, Guillermo, Margaret Hagan, Jorge Gabriel Jimenez, Janet Martinez, and Jane Wong. “Doing User Research in the Courts on the Future of Access to Justice.” Legal Design and Innovation. Stanford, CA, July 2018. https://medium.com/legal-design-and-innovation/doing-user-research-in-the-courts-on-the-future-of-access-to-justice-cb7a75dc3a4b

Remote Usability Testing

Maier, Andrew, and Sarah Eckert. “Introduction to Remote Moderated Usability Testing, Part 2: How.” 18F, US General Services Administration agency, November 20, 2018. https://18f.gsa.gov/2018/11/20/introduction-to-remote-moderated-usability-testing-part-2-how/

18F Toolkit for User-Centered Design

18F. “18F Methods: A Collection of Tools to Bring Human-Centered Design into Your Project.” US General Services Administration, 2020. https://methods.18f.gov/

Participatory Design of Justice Innovations

Hagan, Margaret. “Participatory Design for Innovation in Access to Justice.” Daedalus 148, no. 1 (2019): 120–27. https://doi.org/10.1162/DAED_a_00544 .

Abstract: Most access-to-justice technologies are designed by lawyers and reflect lawyers’ perspectives on what people need. Most of these technologies do not fulfill their promise because the people they are designed to serve do not use them. Participatory design, which was developed in Scandinavia as a process for creating better software, brings end-users and other stakeholders into the design process to help decide what problems need to be solved and how. Work at the Stanford Legal Design Lab highlights new insights about what tools can provide the assistance that people actually need, and about where and how they are likely to access and use those tools. These participatory design models lead to more effective innovation and greater community engagement with courts and the legal system.

Doing Human-Centered Design with Courts

Hagan, M.D., 2018. “A Human-Centered Design Approach to Access to Justice: Generating New Prototypes and Hypotheses for Intervention to Make Courts User-Friendly.” Indiana Journal of Law and Social Equality, 6(2), pp.199–239. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3186101

Abstract: How can the court system be made more navigable and comprehensible to unrepresented laypeople trying to use it to solve their family, housing, debt, employment, or other life problems? This Article chronicles human-centered design work to generate solutions to this fundamental challenge of access to justice. It presents a new methodology: human-centered design research that can identify key opportunity areas for interventions, user requirements for interventions, and a shortlist of vetted ideas for interventions. This research presents both the methodology and these “design deliverables” based on work with California state courts’ Self Help Centers. It identifies seven key areas for courts to improve their usability, and, in each area, proposes a range of new interventions that emerged from the class’s design work. This research lays the groundwork for pilots and randomized control trials, with its proposed hypotheses and prototypes for new interventions, that can be piloted, evaluated, and — ideally — have a practical effect on how comprehensible, navigable, and efficient the civil court system is.

Legal Help Design Testing & Strategies

The Michigan Advocacy Project & Graphic Advocacy Project have a guide to using legal design strategies to improve your legal technology efforts.

These include guides for user testing & principles about how to make legal help that works for many different types of people.

More Evaluation Resources

Some groups, like the World Bank and the UK government, have assembled handbooks that collect many different instruments that groups can use to evaluate the impact of their policies and programs.

These texts and slide decks are useful field guides to evaluating new policies, services, and other interventions in the field.

Impact Evaluation in Practice

This free training book from the World Bank presents strategies and tools to evaluate programs in the field.

UK Magenta Book on evaluation

This set of resources from the UK Government, called the ‘Magenta Book’ and connected slide decks and appendices, goes through how to evaluate policies in practice.
