Evaluation Methods of Justice Innovations

What works in increasing access to justice?

Evaluation methods can help us create more effective legal help. They can also make sure that the service is working as intended.

Use these evaluation methods in the design process, and then also for services as they are piloted and scaled.

  1. Early-Stage Evaluation during the Design of a Legal Help Service: How do we know what kind of justice innovation is needed? Does our new idea for legal help work? How can we make the strongest version of a new idea?
  2. Pilot Evaluation during the early Deployment of a Legal Help Service: Is this thing working as intended? What wrong assumptions or choices in the design need to be fixed? What bugs or performance issues must be improved? Does it increase people’s access to justice, on key indicators?

And how do we know whether it works as intended?

How do we know what the most pressing needs of the community are? How do we determine what the most promising designs and engagement strategies are?

How do we evaluate ideas’ impact early and often, to see if our proposed solutions work? And how can we best embed solutions into a community, so people engage with them?

This Evaluation Methods section of our website collects together resources, experiments, and case studies that you can use to create better innovation in the justice system.

  • Please explore early-stage evaluation of ideas with resources on our User Testing page.
  • Look to our Indicators page to consider the outcomes and metrics you can measure, to determine if your intervention is improving access to justice.

Please explore below as well, to see some high-level outlines of what you might consider doing in your design and innovation work for access to justice.

We are collecting methods for the whole journey of an innovation process. They are meant for practitioners in courts and legal aid groups, for designers and technologists working on legal and social system innovation, and for academics who are studying this area.

The methods are grouped into use cases, depending on what types of knowledge and outcomes they produce. We link to toolkits that demonstrate these methods in greater detail and give examples. We also link to model stories that describe specific implementations of the methods in a particular context.


Early Stage Evaluation

Here is a quick overview of tools — with more explored in depth below.

  1. Priority Sorts
  2. Over-the-shoulder observation
  3. Usage and interviews
  4. Dot votes at an idea fair
  5. Field Tests: counting usage
  6. Usability evaluation – Likert scale
  7. Dignity/procedural justice evaluation
  8. Comprehension quiz
  9. Affinity/aesthetics tests




[Image: UX Heuristic Review, Idea Review sheet for wise design]


There is real value in costing out and comparing the (Administrative) Burden Costs of a given form, task, or other process step in a person’s legal journey.

Many federal agencies are required to do this Burden Cost evaluation whenever they change the procedures involved in users’ interactions with social security, taxes, or disability processes, or change who can access food stamps. It’s required at the federal level by the Paperwork Reduction Act, which requires an agency to measure the effect of a procedure change by looking at:

  • The time it takes for an average person to fill in the given form or do the required task
  • The time it takes to prepare the documents or get the information to correctly fill in the form/do the task
  • Assume that this costs a person $15/hour 
  • Calculate the number of people who will have to go through this on average

This basic calculation will allow you to produce a numeric amount of how much this form or step costs ‘the public’:

((Time to fill + Time to prep, in hours) * $15/hour) * # of people doing this = Burden Cost
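The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not an official Paperwork Reduction Act tool: the function name `burden_cost`, its parameters, and the example numbers are all assumptions made up for this sketch. Only the formula and the $15/hour rate come from the text.

```python
HOURLY_RATE_USD = 15.0  # the $15/hour assumption stated in the text


def burden_cost(fill_hours: float, prep_hours: float, people: int,
                hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Burden Cost = (time to fill + time to prep) * hourly rate * number of people.

    Times are in hours. All names here are hypothetical, chosen for this sketch.
    """
    if fill_hours < 0 or prep_hours < 0 or people < 0:
        raise ValueError("times and number of people must be non-negative")
    return (fill_hours + prep_hours) * hourly_rate * people


# Hypothetical example (made-up numbers): 0.5 hours to fill in the form,
# 1.5 hours to prepare documents, and 1,000 people filing.
example = burden_cost(fill_hours=0.5, prep_hours=1.5, people=1000)
print(f"${example:,.2f}")  # → $30,000.00
```

Keeping the hourly rate as a parameter makes it easy to rerun the estimate if your team or agency uses a different value for people's time.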

Having Burden Costs, or comparing them across different proposed forms, can be a very influential way to pressure policy-makers or support an argument for process simplification. In the access to justice space, you could do Burden Cost calculations that include:

  • Time to search for and find the correct form
  • Time to look up and understand the words being used in the form
    • (Optional: time to get help at Self Help Center, including waiting in line and being seen)
    • (Optional: time to call legal aid, get screened, see if they can help you, be helped)
  • Time to read the form and fill in the questions
  • Time to prep/make copies/get ready for filing
  • Time to file it in the courthouse
  • Time to deal with any problems with filing

You can calculate these time costs by doing the steps yourselves, or by having research participants do some or all of them. You could also gather the timings from experts who have data or informed estimates.
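As a rough sketch of how step-by-step timings could feed a Burden Cost comparison between two versions of a form, the Python below averages the observed minutes for each step, sums them, and applies the $15/hour rate. The step names follow the list above. Every timing, the count of 1,000 filers, and the function names are hypothetical values invented for illustration, not measured data.

```python
from statistics import mean
from typing import Mapping, Sequence

HOURLY_RATE_USD = 15.0  # the $15/hour assumption stated in the text


def per_person_minutes(step_timings: Mapping[str, Sequence[float]]) -> float:
    """Sum the average observed minutes across all steps."""
    return sum(mean(observations) for observations in step_timings.values())


def burden_cost_from_steps(step_timings: Mapping[str, Sequence[float]],
                           people: int,
                           hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Convert per-person minutes to hours, then apply the rate and head count."""
    hours = per_person_minutes(step_timings) / 60
    return hours * hourly_rate * people


# Hypothetical timings in minutes, two observations per step (made-up numbers).
current_form = {
    "find the correct form": [20, 40],
    "look up unfamiliar words": [15, 15],
    "read and fill in the form": [45, 75],
    "prep and make copies": [30, 30],
    "file at the courthouse": [60, 60],
    "deal with filing problems": [0, 30],
}
redesigned_form = {
    "find the correct form": [10, 10],
    "look up unfamiliar words": [5, 5],
    "read and fill in the form": [30, 30],
    "prep and make copies": [30, 30],
    "file at the courthouse": [60, 60],
    "deal with filing problems": [0, 30],
}

people = 1000  # hypothetical number of filers
savings = (burden_cost_from_steps(current_form, people)
           - burden_cost_from_steps(redesigned_form, people))
print(f"Estimated savings: ${savings:,.2f}")  # → Estimated savings: $15,000.00
```

The optional steps, such as time at a Self Help Center, can be added as extra dictionary entries when your team has timings for them.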


Benchmarking is a key evaluation and research technique, especially for a discrete work product like a new Form or Info Sheet your team is creating. In benchmarking, you compare your intervention to established principles and criteria from past design work. If benchmark standards do not yet exist, you can invest in establishing them yourselves by looking at others’ practices.
For those working on legal communications, for example, there are Benchmark Criteria from other groups working on efforts parallel to forms.
The first is from the Simplification Centre, which works on simplifying government documents generally. The second is from the UK’s Behavioural Insights Team, from their massive study of improving privacy policy documents.
You can compare your new legal communication design (like your FAQ, Info Sheet, Letter, or Form) to these established principles and make tweaks as they become clear. You can then communicate to the Judicial Council that you are following established benchmark standards.
(See the full 16 criteria from the Simplification Centre and explanation here, and see more details on the Behavioural Insights Team’s experiments/goals at https://www.bi.team/blogs/terms-conditions-apply/)


Another method to evaluate an early prototype is Capability Improvement. This technique can help you determine what effect your intervention (like a new form, website, or app) might have on a person’s ability to navigate their legal problem and solution.

It can show whether your intervention makes people more likely to engage with the legal tasks, more informed about the correct info, and more strategic in making choices that are in their best interest.

Most early-stage Capability Improvement tests focus on measuring usability, user experience, and knowledge-testing. This means having a small number of people, representative of the target population, use your new intervention (and possibly some other versions). Your testing team will gather

Qualitative Information on engagement and capability:

  • What they like,
  • What they find confusing,
  • What they skip or ignore,
  • What they complain about
  • What they say improves their sense of dignity, knowledge, or likelihood to use the intervention/recommend it to friends

In addition, you gather Quantitative Information on changes to the testers’ legal capabilities:

  • Do they fully engage with all of the tasks?
  • Do they complete the process?
  • Do they pay attention to all that is being communicated to them (measured by eye-tracking or page-recording)?
  • After they use the intervention, do they answer key Knowledge Questions correctly (in a quiz)?
  • After they use the intervention, do they have a concrete, expert-approved strategy for next steps?
  • How long do they have to spend to understand the information, and answer questions/form strategies correctly?

Often Capability Testing is done by giving participants scenarios, and having them try to answer ‘quiz’ questions that test their knowledge and their strategy-making. For example, this is a Legal Capability evaluation from Catrina Denvir in a study of legal education online. She gave recruited participants a fictional scenario, with a ‘persona’ to play. Then she asked them legal knowledge & strategy questions, to determine if a new website intervention improved their ability to correctly answer the questions. The quiz questions can help more directly measure the impact of an intervention in improving a person’s legal knowledge, and thus their capability to deal with their justice problem.
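One simple way to tally quiz results like these is to score each participant against an answer key and compare the average score of people who used the intervention with the average of those who did not. The Python below is a minimal sketch of that idea and not the scoring method from the study mentioned above. The answer key, the responses, and the function names are all hypothetical. With a small number of testers, treat the difference as a directional signal rather than a statistical finding.

```python
from statistics import mean
from typing import Mapping, Sequence

# Hypothetical answer key for four Knowledge Questions (made up for this sketch).
ANSWER_KEY = {"q1": "b", "q2": "a", "q3": "d", "q4": "c"}


def quiz_score(responses: Mapping[str, str],
               answer_key: Mapping[str, str] = ANSWER_KEY) -> float:
    """Share of key questions answered correctly; unanswered counts as wrong."""
    correct = sum(1 for question, right in answer_key.items()
                  if responses.get(question) == right)
    return correct / len(answer_key)


def average_score(group: Sequence[Mapping[str, str]]) -> float:
    """Mean quiz score across all participants in a group."""
    return mean(quiz_score(responses) for responses in group)


# Hypothetical responses from two small groups of testers.
with_intervention = [
    {"q1": "b", "q2": "a", "q3": "d", "q4": "c"},  # 4 of 4 correct
    {"q1": "b", "q2": "a", "q3": "d", "q4": "a"},  # 3 of 4 correct
]
without_intervention = [
    {"q1": "b", "q2": "c", "q3": "a", "q4": "a"},  # 1 of 4 correct
    {"q1": "b", "q2": "a"},                        # 2 of 4 correct, 2 skipped
]

improvement = average_score(with_intervention) - average_score(without_intervention)
print(f"Difference in average score: {improvement:.0%}")  # → Difference in average score: 50%
```

The same scoring function could be run on the same people before and after they use the intervention, if your test uses a before/after design instead of two groups.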



Evaluations of the Intervention in the Field

This free training book from the World Bank presents strategies and tools to evaluate programs in the field.

In addition, this set of resources from the UK Government, called the ‘Magenta Book’ and connected slide-decks and appendices, goes through how to evaluate policies in practice.


[Image: User Feedback machine from an airport]

Margaret