Evaluation Methods of Justice Innovations

What works in increasing access to justice?

Evaluation methods can help us create more effective legal help interventions. They can also confirm that an intervention is working as intended.

On this page, you can find resources, experiments, and case studies on user testing, pilot evaluation, and other outcomes research on justice innovations.


Why Evaluate a Justice Innovation?

Your team may have several different motives to evaluate legal help services. You could be required to run an evaluation by a funder or other group. Or perhaps you are interested in how your work is impacting people.

Our team at Stanford Legal Design Lab is interested in improving how our community evaluates legal help services. From our review of practices and the literature, we have found:

There are many metrics and instruments in use, but they are often disparate and divided. Even if many overlap, they’re not used consistently across jurisdictions, policy areas, or intervention areas. Many groups might share their research protocols at conferences, in papers, or with online reports, but there is no coordinated discussion about which ones should be used and scaled.

Many of the metrics and instruments aren’t used in practice. In on-the-ground civil justice settings, like courts, legal aid groups, and nonprofit offices, there is often little awareness of which metrics to track or which instruments to use. That means when groups are trying to measure the status quo or evaluate a pilot, they often rely on their own instincts or on a consultant to devise an evaluation plan. Or they might forgo an evaluation plan entirely and rely on anecdotes or casual observation to decide what works.

Ideally, better metrics and instruments could drive positive service innovations in the civil justice system. Evaluation can make stakeholders more aware of the impact of their current legal services and of people’s justice needs, and give them stronger knowledge about what works. Well-defined, usable metrics and instruments can spotlight whether and how legal service providers are having positive impacts on the key outcomes that matter to them, their funders, and their clients. They can keep consistent attention on the impacts and outcomes the system produces, so that services, funding, and strategies go to the initiatives that most effectively promote access to justice.

Better metrics and instruments can also lead to system change, as new indicators shift the perspective of leaders in civil justice. Metrics can become the “key performance indicators” (KPIs) at the heart of these organizations’ long-term planning, education, and agenda-setting. The more something is measured, the more it can become a central priority in organizational change management. If access to justice is measured more widely, it might become a KPI for courts and government agencies.

What do you want to evaluate?

Do you have court forms, legal aid forms, or other structured documents that you want to evaluate?

Use this evaluation plan to understand if the form is well-designed, if people can find and use it, and if it is having the legal outcomes that you intend.

Many courts, legal aid groups, and commissions build websites to help people understand and deal with their legal problem. These websites offer guides, FAQs, service directories, form-filling tools, and other features to help people access the justice system.

What makes for a good website, and how do you know if your website is performing at a high level?

Use the Lab’s Legal Help Online Dashboard to rank your website. How is your site doing in the 4 key areas: Discovery, Tech, Content, and Design?

Use our rankings to understand how you are doing.

Then use our guides to improve in all 4 areas and serve the public better.

Which Metrics to Evaluate?

When your team is evaluating a justice innovation, what are the indicators that a technology, service, or policy is succeeding? Which metrics should you evaluate it against?

Based on our team’s literature review, we have found a few categories of metrics that the justice community has prioritized as the most important to measure.

High-Level, Big-Picture Outcomes to Measure

Ideally, the justice system (including courts, legal aid, and private services) will help people live better lives, with conflicts resolved and with more stability and security. To evaluate legal help services at this big-picture level, the evaluation metrics fall into the following four categories.

4 Broad Outcome Areas for Access to Justice

1: Better Social Stability and Prevention of Poverty

These metrics focus on the medium- and long-term outcomes experienced by the person with the justice problem. A positive outcome would be that the person’s (and family’s) life is more stable, with stable housing, finances, and family relationships. A small life problem has not spiraled into a larger set of problems that risk pushing them into instability and poverty. A good justice experience and set of services will help stabilize a person’s life, resolve the conflict they’re experiencing, and make their life better.

2: Access to the Civil Justice System

This area of metrics focuses on a person’s ability to use the courts, legal aid, and other public justice services if they wish to. A positive outcome would be that, as they go through their justice journey, they do not encounter barriers that make it too difficult, costly, or otherwise burdensome to use the justice system. Another positive outcome would be that they have a meaningful choice about what actions to take, and about how to use (or not use) the justice institutions to resolve their problem. Equity of access is also a key part of this outcome area: all demographic groups should be able to access the civil justice system in a low-burden way.

3: Improved Substantive Justice

In this area of metrics, the focus is on the application of the law to a person’s situation. A positive outcome would be that the person enjoys their full set of rights, defenses, and claims within the legal regime of their jurisdiction. Their conflict is resolved in a just manner because the law is applied correctly. They also know enough about their rights, their defenses, and legal procedure to raise all the claims they choose to. They are able to use the law to advocate for themselves, and the order or settlement is a just resolution for both themselves and the other party.

4: Improved Procedural Justice

This area of metrics looks at a person’s experience of the justice system. A positive outcome would be that a person feels dignified, respected, and included in the justice system. They would feel that the system is transparent and fair. During and after their justice journey, they understand what is happening, feel included in the proceedings, and respect that the system works and that its outcome should be followed. If they have a legal problem in the future, they would use the justice system again, or recommend it to a neighbor or family member.

Intermediate, Specific Outcomes to Measure

When looking at the big-picture, high-level outcomes, it can be hard to figure out if these are being achieved or not. How can your team know if people have access to justice, or improved procedural justice, or better social outcomes? How can you tell if a service is improving one or all of these areas?

Also, it can be hard for any one service, policy, or technology to improve these big-picture outcomes. A justice problem (like an eviction, custody dispute, guardianship, or debt lawsuit) is often a ‘wicked problem’: one that is not easily resolved and has many contributing factors. If your team measured a new intervention only on big-picture outcomes, it would be hard to see direct, immediate success.

Instead, it can be useful to measure more intermediate outcomes. These offer more specific metrics that can be quantified or analyzed more easily than the big-picture ones. They are also more likely to occur, and when they do, they can indicate that the big-picture outcomes might follow.

Intermediate Access to Justice Outcomes to Measure

  • Avoidance of Bad Life Outcomes, like forced displacement, domestic violence, or collection activity.
  • Improved Legal Capability, including legal knowledge, strategy, and confidence.
  • More Participation, so that people are not defaulting, avoiding, or dropping out of their justice journey.
  • Increased Uptake of legal help services.
  • Lower Administrative Burden to participate, with lower costs and less time to complete legal tasks.
  • Saved Money, so a person owes less than they otherwise would.
  • Higher Satisfaction with the outcomes, process, and service.
  • Correct Application of Law to the case, and use of legal rights, defenses, and claims.
  • Equitable Use of services and participation in court by different demographic groups.
  • Improved Sense of Belonging, with fewer social identity threats and less sense of exclusion.
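Several of these intermediate outcomes can be computed directly from case records. As a minimal sketch, assuming a hypothetical list of case records that each note a demographic group and whether the person defaulted, the Python below computes a participation (non-default) rate per group, which speaks to both the participation and the equity outcomes listed above. The field names `group` and `defaulted` are illustrative, not a standard schema, and the records are made up.

```python
from collections import defaultdict


def participation_rate_by_group(cases):
    """Share of cases in each group where the person did not default.

    `cases` is a list of dicts with hypothetical keys:
      "group":     a demographic group label
      "defaulted": True if the person defaulted (did not participate)
    """
    totals = defaultdict(int)
    participated = defaultdict(int)
    for case in cases:
        totals[case["group"]] += 1
        if not case["defaulted"]:
            participated[case["group"]] += 1
    return {group: participated[group] / totals[group] for group in totals}


# Illustrative records only, not real data.
cases = [
    {"group": "A", "defaulted": False},
    {"group": "A", "defaulted": True},
    {"group": "B", "defaulted": False},
    {"group": "B", "defaulted": False},
]
print(participation_rate_by_group(cases))
# {'A': 0.5, 'B': 1.0}
```

A large gap between groups would flag an equity concern worth investigating; with small groups, such rates are noisy and should be read with caution.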

How to Run Evaluations of Justice Innovations?

Possible Evaluation Protocols

How do you actually run an evaluation? What plan can you follow to see if your legal help service is performing as you expect?

Find the protocol and instrument that fits your phase, and what you wish to understand.

  1. Live Survey: Ask questions of real users who are currently dealing with a justice problem.
  2. Lab Scenario Survey: Give participants scenarios and measure their behavior.
  3. Behavior Measuring/Costing: Measure sample users’ behavior, and then generalize to make cost assessments.
  4. Benchmark Analysis: Assess the service using expert principles or a heuristics list. Or have experts, like a judge or court staff, evaluate the quality of a service or its output.
  5. Administrative Data: Track how people in real scenarios behaved, and what outcomes they experienced, using case records or other agencies’ datasets.
  6. Case File Evaluation: Look in detail at what happened in a person’s court case, including their claims, decisions, process, and outcomes.
  7. Observation of Behavior: See what actions people take in court, for example through a court watch.
  8. Controlled Experiment: Assign people to groups that either receive an intervention or do not. Estimate the intervention’s effect by comparing the results of one group against the other. This could be a pre-/post-intervention study, in which you compare outcomes in the period before the pilot with outcomes after the pilot begins. Or it could be a randomized controlled trial, in which people are randomly chosen to receive the intervention or not.
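The pre-/post-intervention design in item 8 can be sketched in a few lines. This is a minimal illustration, assuming hypothetical case records with a filing date and a default flag, plus a made-up pilot launch date; it compares the default rate before and after the pilot begins. A simple pre/post difference cannot rule out other changes over the same period, which is why a randomized design is stronger.

```python
from datetime import date


def default_rate(cases):
    """Fraction of cases that ended in default (None if there are no cases)."""
    if not cases:
        return None
    return sum(c["defaulted"] for c in cases) / len(cases)


def pre_post_comparison(cases, pilot_start):
    """Split cases at a (hypothetical) pilot start date and compare default rates."""
    before = [c for c in cases if c["filed"] < pilot_start]
    after = [c for c in cases if c["filed"] >= pilot_start]
    return default_rate(before), default_rate(after)


# Illustrative records only, not real data.
cases = [
    {"filed": date(2023, 1, 10), "defaulted": True},
    {"filed": date(2023, 2, 3), "defaulted": False},
    {"filed": date(2023, 7, 15), "defaulted": False},
    {"filed": date(2023, 8, 1), "defaulted": False},
]
print(pre_post_comparison(cases, pilot_start=date(2023, 6, 1)))
# (0.5, 0.0)
```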

Phases of Evaluation

Evaluation is not only something to do after a service, policy, or technology project is up and running. It can happen throughout the development and pilot of a new ‘innovation’. It can also be applied to long-standing services, policies, and technology implementations to see how they are performing.

It’s useful to think of four different times to be evaluating. The first is early in the research & development cycle. The second phase is when a ‘live’ pilot of the thing is running for the first time. The third phase is the evaluation of an ‘established’ legal help service. And the fourth is an ongoing collection of feedback and performance data.

Phases when you may run a justice innovation evaluation

Use these evaluation methods in the design process & then for services as they are piloted and scaled.

  • Phase 1: Early-Stage Evaluation During the Design of a Legal Help Innovation: How do we know what kind of justice innovation is needed? Does our new idea for legal help work? How can we make the strongest version of a new idea?
  • Phase 2: Pilot Evaluation During the Early Deployment of a Legal Help Innovation: Is this thing working as intended? What wrong assumptions or choices were made in the design that need to be fixed? What bugs or performance issues must be addressed? Does it increase people’s access to justice on key indicators?
  • Phase 3: Evaluation of an Established Legal Help Service: Even if a clinic, website, policy, or other ‘intervention’ is well-established, it is still worth gathering feedback about whether it is making an impact & what ideas there are for improving it.
  • Phase 4: Ongoing Feedback about a Legal Help Service: Apart from evaluating the performance of a particular service, justice institutions can have regular feedback and data gathering from their clients and peers. This ongoing evaluation can deliver you information about what is changing, what is needed, and ideas for improvement.

Please explore these resources to find better ways to develop promising justice interventions, and to gather constant feedback on impact in order to improve the system.

Phase 1: Early Stage Evaluation

What are the methods we can use to understand if our new ideas, rough prototypes, or proposals are feasible, viable, and desirable for stakeholders and the system?

Your team may be developing a new app, service, policy, or rule meant to make the justice system better. How do you know if people are going to engage with it as you intend? And how do you ensure that it brings the value and impact that you intend it to?

Your team can use Early-Stage User Feedback tools. These include user testing, as well as comparing your product to Benchmark standards that expert stakeholders have established.

Early Stage User Testing Methods

You can run these activities with your team, in controlled ‘lab’ situations, or in the field. These methods allow you to rank ideas, choose which have the most potential, and decide which ideas to take to pilot. Ideally, you will involve many different kinds of stakeholders in these sessions.

These early-stage evaluation methods allow for quick, affordable ways to screen ideas, and to refine them to be more likely to work with the target users.

Read more about these kinds of evaluation protocols at our User Testing page.

Benchmark Methods

As your team is developing a new innovation, you can also evaluate it by comparing it to sets of principles, guidance, and best practices that expert stakeholders have created.

These benchmarks help you scrutinize your possible innovation, reducing possible harms and increasing its likelihood of success.

They help you screen your idea, prototype, and final proposal to make sure it avoids the failures that past projects have experienced.

Phases 2-3: Evaluating Interventions in the Field

Overviews of Field-Based Impact Evaluation Protocols

Some groups, like the World Bank and the UK government, have assembled handbooks that collect many different instruments that groups can use to evaluate the impact of their policies and programs.

These texts and slide decks are useful field guides to evaluating new policies, services, and other interventions in the field.

Impact Evaluation in Practice

This free training book from the World Bank presents strategies and tools to evaluate programs in the field.

UK Magenta Book on evaluation

This set of resources from the UK Government, the ‘Magenta Book’ and its accompanying slide decks and appendices, explains how to evaluate policies in practice.


Just like in Phase 1 evaluation, benchmark tools can also be used in Phase 2 or 3. These same lists of principles, best practices, and heuristics can be used to evaluate pilots and existing programs.

Explore Benchmark tools that can help you evaluate legal documents, technology, forms, and more.

Administrative Burden Costing

As part of the evaluation of forms, websites, and other legal help tools, many of these assessments can be combined to establish a tool’s administrative burden. Costing the time and expenses involved gives a quantitative measure of how difficult it might be to use a form, tool, or service while trying to resolve a legal problem.

Many U.S. federal agencies are required to run this Burden Cost evaluation whenever they change procedures involved in people’s interactions with social security, taxes, or disability processes, or in determining who can access food stamps. At the federal level, it is required by the Paperwork Reduction Act, which requires an agency to measure the effect of a new procedural change by looking at:

  • The time it takes for an average person to fill in the given form or do the required task
  • The time it takes to prepare the documents or get the information to correctly fill in the form or do the task
  • The cost of that time to the person, assumed to be $15/hour
  • The number of people who will have to go through this on average

This basic calculation produces a dollar figure for how much the form or step costs ‘the public’:

$$\text{Burden Cost} = \left(\text{Time to fill in} + \text{Time to prepare}\right) \times \$15/\text{hour} \times \text{Number of people doing this}$$
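The burden cost calculation can also be written as a small Python function. This is a sketch, assuming times are entered in hours and using the $15/hour figure from the list above as a default; the function name and the numbers in the usage line are illustrative, not measured.

```python
def burden_cost(fill_hours, prep_hours, num_people, hourly_cost=15.0):
    """Public burden cost of a form or step, in dollars.

    (time to fill in + time to prepare) x hourly cost x number of people
    """
    return (fill_hours + prep_hours) * hourly_cost * num_people


# Illustrative: 0.5 h to fill in, 1.5 h to prepare, 10,000 people.
print(burden_cost(0.5, 1.5, 10_000))
# 300000.0
```

Making the hourly cost a parameter lets you rerun the estimate with a different wage assumption without changing the rest of the calculation.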

Having Burden Costs, or comparing them across different proposed forms, can be a very influential way to pressure policy-makers or support an argument for process simplification. In the access to justice space, you could do Burden Cost calculations that include:

  • Time to search for and find the correct form
  • Time to look up and understand the words being used in the form
    • (Optional: time to get help at Self Help Center, including waiting in line and being seen)
    • (Optional: time to call legal aid, get screened, see if they can help you, be helped)
  • Time to read the form and fill in the questions
  • Time to prepare, make copies, and get ready for filing
  • Time to file it in the courthouse
  • Time to deal with any problems with filing

You can calculate these time costs by doing these steps yourselves, or by having research participants do some or all of them. You could also gather data or informed estimates of these timings from experts.
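Here is a minimal sketch of how per-step timings could be rolled up into a burden cost, assuming a hypothetical set of minutes recorded for each step by three research participants. It averages each step, optionally adds the help-seeking steps marked optional in the list above, and converts the total to dollars at $15/hour. The step names, timings, and function names are illustrative.

```python
HOURLY_COST = 15.0  # dollars per hour, as in the basic Burden Cost formula

# Hypothetical minutes recorded for each step by three participants.
step_minutes = {
    "find the correct form": [20, 30, 25],
    "understand the words used": [15, 10, 20],
    "read and fill in the form": [45, 60, 45],
    "prepare and copy for filing": [10, 15, 20],
    "file at the courthouse": [60, 90, 75],
    "resolve filing problems": [0, 30, 0],
}
optional_minutes = {
    "get help at Self Help Center": [120, 90, 150],
}


def average(values):
    return sum(values) / len(values)


def average_minutes_per_person(steps, optional=None):
    """Sum the average time for each step, plus optional steps if given."""
    total = sum(average(times) for times in steps.values())
    if optional:
        total += sum(average(times) for times in optional.values())
    return total


def burden_cost_from_minutes(minutes_per_person, num_people):
    """Convert minutes per person into a total dollar burden for num_people."""
    return minutes_per_person * HOURLY_COST / 60 * num_people


per_person = average_minutes_per_person(step_minutes)
print(per_person)  # 190.0
print(burden_cost_from_minutes(per_person, num_people=1_000))  # 47500.0
```

Keeping optional steps separate makes it easy to report the burden both with and without help-seeking time.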

Randomized Controlled Trials

A Randomized Controlled Trial (RCT for short) is an empirically rigorous way to determine whether something you are doing (let’s call it an ‘intervention’) is having the effect you intended. It involves careful planning of exactly what you are testing: identifying the ‘variables’ and the ‘conditions’, and setting up multiple testing groups with different variations of these variables.

The goal is to have one group that has used the intervention whose efficacy you are testing, and a similar group, the ‘control’ group, that has not used it. Because people are assigned to the groups at random, the groups should differ mainly in whether they received the intervention. You can then collect data that more clearly demonstrates whether the group with the intervention has noticeably different results than the control group.
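A minimal sketch of the two core RCT steps, random assignment and comparing group outcomes, assuming hypothetical participant IDs and a yes/no outcome (for example, whether a person appeared at their hearing). It uses Python’s standard `random` module with a fixed seed so the assignment is reproducible. The outcomes below are made up only to exercise the functions. A real trial would also need a pre-planned sample size and a statistical test of whether the difference could be due to chance; neither is shown here.

```python
import random


def randomly_assign(participant_ids, seed=2024):
    """Shuffle participants, then split them into intervention and control groups.

    A fixed seed makes the assignment reproducible and auditable.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"intervention": ids[:half], "control": ids[half:]}


def difference_in_rates(groups, outcome):
    """Outcome rate in the intervention group minus the rate in the control group.

    `outcome` maps each participant ID to 1 if the outcome occurred, else 0.
    """
    def rate(ids):
        return sum(outcome[i] for i in ids) / len(ids)

    return rate(groups["intervention"]) - rate(groups["control"])


# Hypothetical participant IDs.
participants = [f"P{n:03d}" for n in range(1, 101)]
groups = randomly_assign(participants)

# Made-up outcomes; real outcomes would come from follow-up data.
outcome = {pid: 0 for pid in participants}
for pid in groups["intervention"][:30]:
    outcome[pid] = 1
for pid in groups["control"][:20]:
    outcome[pid] = 1

print(round(difference_in_rates(groups, outcome), 2))
# 0.2
```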

Read more about RCTs, and examples of how to run them, at BetterEvaluation.

Phase 4: Ongoing Evaluation & Feedback

Exit Rating

A quick way to get user feedback on an experience, service, or product is to ask users for a very simple rating on their ‘exit’. It can be collected on a text message line, on a tablet (for an in-person service), in a browser window (for a web-based service), or on a paper sheet (again for in-person services).

Ideally, it will be a very quick, visual interface that lets the user quickly rate what they’ve just experienced.
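As a minimal sketch of summarizing exit ratings, assuming a hypothetical three-option scale (‘good’, ‘okay’, ‘bad’, as a face-icon tablet screen might offer), the code below counts responses, ignores unrecognized entries, and reports the share of ‘good’ ratings. The scale, function name, and responses are illustrative.

```python
from collections import Counter

SCALE = ("good", "okay", "bad")  # hypothetical three-face rating scale


def summarize_exit_ratings(ratings):
    """Count each rating level and compute the share of 'good' responses."""
    counts = Counter(r for r in ratings if r in SCALE)  # drop unrecognized entries
    total = sum(counts.values())
    share_good = counts["good"] / total if total else None
    return {level: counts[level] for level in SCALE}, share_good


# Illustrative responses only, not real data.
ratings = ["good", "good", "okay", "bad", "good", "skipped"]
print(summarize_exit_ratings(ratings))
# ({'good': 3, 'okay': 1, 'bad': 1}, 0.6)
```

Tracking the share of positive ratings week by week is one simple way to turn exit ratings into ongoing feedback.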

Follow-Up Surveys

Many courts, legal aid groups, and experts have created exit surveys for when a person has just concluded their justice journey. These exit surveys can gather important feedback on the quality of their experience, their outcomes, and their ideas for improvement.

Sample exit survey from California Courts

California Courts. “Customer Satisfaction Survey.” https://www.courts.ca.gov/partners/documents/customersatisfactionsurvey.pdf.

Greacen, John M. “Trial Court Research and Improvement Consortium Executive Program Assessment Tool: Assistance to Self-Represented Litigants Revised Draft,” 2005. https://www.srln.org/node/43/trial-court-research-and-improvement-consortium-tcric-self-help-program-assessment-tool-2005.

Legal Aid Society of Cleveland, and Michigan Legal Help. “Texting for Outcomes Toolkit,” 2020. https://www.lsntap.org/sites/lsntap.org/files/Texting for Outcomes Toolkit %2810.18.2021%29 Final w App.pdf

LaGratta Consulting. “Court Voices Project: Using Court User Feedback to Guide Courts’ Pandemic Responses,” August 2022. www.lagratta.com/court-voices-project-user-feedback.

Focus Groups & Design Workshops

Courts and legal aid groups can also gather ongoing feedback through interactive, qualitative sessions with past users and community members. These kinds of deep-dive sessions can help groups get in-depth information about what works, what doesn’t, and what new opportunities exist.

Listen > Learn > Lead report

This report from IAALS and collaborators from university design labs profiles how to involve court users and other stakeholders in deep, qualitative feedback sessions about systemic change.

Institute for the Advancement of the American Legal System, Margaret Hagan, Dan Jackson, and Lois Lupica. “Listen > Learn > Lead: A Guide to Improving Court Services through User-Centered Design.” Denver. https://iaals.du.edu/publications/listen-learn-lead