What works in increasing access to justice?

Evaluation methods can help us create more effective legal help interventions, and they can confirm that an intervention is working as intended.

On this page, you can find resources, experiments, and case studies on user testing, pilot evaluation, and other outcomes research on justice innovations.


Why Evaluate a Justice Innovation?

Your team may have several different motives to evaluate legal help services. You could be required to run an evaluation by a funder or other group. Or perhaps you are interested in how your work is impacting people.

Our team at Stanford Legal Design Lab is interested in improving how our community evaluates legal help services. From our review of practices and the literature, we have found:

There are many metrics and instruments in use, but they are often disparate and divided. Even if many overlap, they’re not used consistently across jurisdictions, policy areas, or intervention areas. Many groups might share their research protocols at conferences, in papers, or with online reports, but there is no coordinated discussion about which ones should be used and scaled.

Many of the metrics and instruments aren’t used practically. In on-the-ground civil justice settings, like courts, legal aid groups, and nonprofit offices, there is little awareness of which metrics to track or which instruments to use. That means when groups try to measure the status quo or evaluate a pilot, they often rely on their own instincts or on a consultant to devise an evaluation plan. Or they might forgo an evaluation plan altogether and rely on anecdotes or casual observation to decide what works.

Ideally, better metrics and instruments could drive positive service innovations in the civil justice system. Evaluation can make stakeholders more aware of the impact of their current legal services and of people’s justice needs, and give them stronger knowledge about what works to make a positive impact. Well-defined, usable metrics and instruments can spotlight whether and how legal service providers are having positive impacts on the key outcomes that matter to them, their funders, and their clients. They can ensure that consistent attention is paid to the impacts and outcomes the system produces, and that services, funding, and strategies are allocated to the most effective initiatives that promote access to justice.

Better metrics and instruments can also lead to system change, with these new indicators changing the perspective of leaders in civil justice. Metrics can be the “key performance indicators” (KPIs) that are at the heart of long-term planning, education, and agenda-setting in these organizations. The more that something is measured, the more it can become the basis for organizational change management that puts that thing as a central priority. If there is more access to justice measurement, then it might become a KPI for courts and government agencies.

What Justice Innovations & Legal Services Can You Evaluate?

When we talk about improving the justice system, we could be talking about direct-to-people services, meant to help people with information, advice, counseling, and other support. We call these “Legal Help Services”.

We could also be talking about justice innovations that are more abstract, involving changes to the legal system’s rules, options, and branding. This is a different category of intervention, one that needs to be measured with different kinds of outcomes and instruments.

On this page, we mainly discuss evaluation mechanisms for the first category, Legal Help Services.

What is a legal help service? On this page, we use the term to cover interventions that go directly to a person with a legal problem and aim to help that person make progress on their justice journey of resolving the problem.

Here are some examples of what a legal help service intervention might include:

  • Official communications, like court summons, government forms, legal aid explainers, or text message reminders
  • Legal help information service, like a website, guide book, FAQ, or chatbot. These tend not to be staffed by humans or customized to the person with the problem. They might be interactive but are mainly about connecting people with non-customized information about their legal options, services, and rights.
  • Brief tech help service, like a document assembly tool, a chatbot, or another tool that helps a person fill in a form, file it with the courts, sign up for a program, or schedule an appointment. It might even help them resolve the problem, by guiding them through a screener or dispute resolution program.
  • Brief human help service, like a hotline, clinic, navigator session, self-help center session, or other interaction in which a professional with some legal expertise gives the person help with a task (like filling in a form or writing a letter) or gives them advice about what their options and rights are.
  • Ongoing human help service, where a person gets continual, deep-dive support with their legal problem from an expert team of lawyers, paralegals, and other professionals. The team helps them through all of the tasks and decisions they must do to resolve their problem.

System-level Policy Interventions

This website does not yet discuss the evaluation of system-level interventions. This category of interventions also aims for the same high-level, big-picture outcomes. But the nature of these interventions means that they are often more difficult to assess through user testing, surveys, or other evaluation protocols we would use for a legal help service.

These abstract, system-level interventions include:

  • A right, defense, or counterclaim
  • A court rule or business process
  • Legislation, ordinance, or other similar act of a legislature
  • Funding source or strategy
  • A messaging campaign that aims to expand brand awareness or improve outreach

These kinds of interventions can be very valuable, but they are of a different nature than a legal help service. They are not about providing a legal help service to an individual — they are about changing the system itself. More discussion is needed for each of these intervention types, to determine how to evaluate their performance and impact.

What Metrics to Evaluate With?

When your team is evaluating a justice innovation, what are the indicators that a technology, service, or policy is succeeding? What are the metrics that you should be evaluating for?

Based on our team’s literature review, we have found a few categories of metrics that the justice community has prioritized as the most important to measure.

High-Level, Big-Picture Outcomes to Measure

Ideally, the justice system (including court, legal aid, and private services) will help people live better lives, with conflicts resolved and with more stability and security. If we want to evaluate legal help services at the biggest-picture level, the evaluation metrics fall into the following four categories.

4 Broad Outcome Areas For Access to Justice

Outcome Area 1: Better Social Stability and Prevention of Poverty

These metrics focus on the medium and long-term outcomes experienced by the person with the justice problem. A positive outcome would be that a person’s (and family’s) life is more stable, with stable housing, finances, and family relationships. A small life problem has not spiraled into a larger set of problems that risks pushing them into instability and poverty. A good justice experience and set of services will help stabilize a person’s life, resolve the conflict they’re experiencing, and make their life better.

Outcome Area 2: Access to the Civil Justice System

This area of metrics focuses on a person’s ability to use the courts, legal aid, and other public justice services if they wish to. A positive outcome would be that, as they go through their justice journey, they do not encounter barriers that make it too difficult, costly, or otherwise burdensome to use the justice system. Another positive outcome would be that they have a genuine choice about what actions to take, and how to use (or not use) the justice institutions to resolve their problem. Equity of access is also a key part of this outcome area: all demographic groups should be able to access the civil justice system in a low-burden way.

Outcome Area 3: Improved Substantive Justice

In this area of metrics, the focus is on the application of the law to a person’s situation. A positive outcome would be that the person enjoys their full set of rights, defenses, and claims within the legal regime of their jurisdiction. Their conflict is resolved in a just manner because the law is applied correctly. Also, they know about their rights, their defenses, and legal procedure enough to raise all claims that they choose to. They are able to use the law to advocate for themselves, and the order or settlement is a just resolution for themselves and the other party.

Outcome Area 4: Improved Procedural Justice

This area of metrics looks at a person’s experience of the justice system. A positive outcome would be that a person feels dignified, respected, and included in the justice system. They would feel that the system is transparent and fair. During and after their justice journey, they understand what is happening, feel included in the proceedings, and trust that the system works — and that the outcome should be followed. If they have a legal problem in the future, they would use the justice system, or recommend that a neighbor or family member use it.

Intermediate, Specific Outcomes to Measure

When looking at the big-picture, high-level outcomes, it can be hard to figure out if these are being achieved or not. How can your team know if people have access to justice, or improved procedural justice, or better social outcomes? How can you tell if a service is improving one or all of these areas?

Also, it can be hard for one service, policy, or technology to improve these big-picture outcomes. A justice problem (like an eviction, custody dispute, guardianship, or debt lawsuit) is often a wicked problem that is not easily resolved and has many contributing factors. If your team only measured a new intervention against big-picture outcomes, it would be hard to see direct, immediate success.

Rather, it can be useful to measure more intermediate outcomes. These outcomes offer more specific metrics that can be quantified or analyzed more easily than the big-picture ones. They are also more likely to occur — and if they do, they can be an indicator that the big-picture outcomes might also occur.

Intermediate Access to Justice Outcomes to Measure

  • Avoidance of bad outcomes, like forced displacement, domestic violence, or collection activity.
  • Improved Legal Capability including legal knowledge, strategy, and confidence. 
  • More Participation, with fewer people defaulting, avoiding, or dropping out of their justice journey.
  • Increased Uptake of legal help services.
  • Lower Administrative Burden to participate, with lower costs & time to complete legal tasks.
  • Saved Money, so a person owes less than they otherwise would.
  • Higher Satisfaction with the outcomes, process, and service.
  • Correct application of law to the case, and use of legal rights, defenses, and claims.
  • Equitable use of services & participation in court, by different demographic groups.
  • Improved sense of belonging, with reduced social identity threats and sense of exclusion.

How to Run Evaluations of Justice Innovations?

Possible Evaluation Protocols

How do you actually run an evaluation? What is the plan you can follow, to see if your legal help service is performing as you expect?

Find the protocol and instrument that fit your phase and what you wish to understand.

  1. Live Survey: Ask questions of real users who are currently dealing with a justice problem
  2. Lab Scenario Survey: Give participants scenarios & measure behavior
  3. Behavior Measuring/Costing: Measure sample users’ behavior, and then generalize to make cost assessments
  4. Benchmark Analysis: Assess the service using expert principles or a heuristics list. Or have experts, like a judge or court staff, evaluate the quality of a service or its output.
  5. Administrative Data: Tracking how people in real scenarios behaved, and what outcomes they experienced, in case records or other agencies’ data
  6. Case File Evaluation: Looking in detail at what happened in a person’s court case, including their claims, decisions, process, and outcomes.
  7. Observation of behavior: to see what actions a person takes in court. This could be through a court watch or otherwise.
  8. Controlled Experiment: Assigning people to groups that receive an intervention (or do not), then seeing what effect the intervention has by comparing the results of one group against the other. This could be a pre-/post-intervention study, in which you measure people’s outcomes in the period before the pilot and after the pilot begins. Or it could be a randomized controlled study, in which people are randomly chosen to receive the intervention or not.

Phases of Evaluation

Evaluation is not just a thing for after a service, policy, or technology project is up and running. Evaluation can happen throughout the development and pilot of a new ‘innovation’. It can also be used for long-standing services, policies, and technology implementation to see how this existing thing is performing.

It’s useful to think of four different times to evaluate. The first is early in the research & development cycle. The second phase is when a ‘live’ pilot of the thing is running for the first time. The third phase is the evaluation of an ‘established’ legal help service. And the fourth is an ongoing collection of feedback and performance data.

Phases when you may run a justice innovation evaluation

Use these evaluation methods in the design process & then for services as they are piloted and scaled.

  • Phase 1: Early-Stage Evaluation During the Design of a Legal Help Innovation: How do we know what kind of justice innovation is needed? Does our new idea for legal help work? How can we make the strongest version of a new idea?
  • Phase 2: Pilot Evaluation During the Early Deployment of a Legal Help Innovation: Is this thing working as intended? What wrong assumptions or choices were made in the design that need to be fixed? What bugs or performance issues must be addressed? Does it increase people’s access to justice, on key indicators?
  • Phase 3: Evaluation of an Established Legal Help Service: Even if a clinic, website, policy, or other ‘intervention’ is well-established, it is still worth gathering feedback about whether it is making an impact & what ideas there are for improving it.
  • Phase 4: Ongoing Feedback about a Legal Help Service: Apart from evaluating the performance of a particular service, justice institutions can have regular feedback and data gathering from their clients and peers. This ongoing evaluation can deliver you information about what is changing, what is needed, and ideas for improvement.

Please explore these resources to find better ways to develop promising justice interventions, and to gather constant feedback on impact in order to improve the system.

Phase 1: Early Stage Evaluation

What are the methods we can use to understand if our new ideas, rough prototypes, or proposals are feasible, viable, and desirable for stakeholders and the system?

Your team may be developing a new app, service, policy, or rule meant to make the justice system better. How do you know if people are going to engage with it as you intend? And how do you ensure that it brings the value and impact that you intend it to?

Your team can use Early-Stage User Feedback tools. These include user testing, and also comparing your product to Benchmark standards that expert stakeholders have established.

Early Stage User Testing Methods

You can run these activities with your team, or in controlled ‘lab’ situations, or in the field. These methods allow you to rank ideas, choose which have the most potential, and decide which ideas you take to pilot. Ideally, you will be involving many different kinds of stakeholders in these sessions.

These early-stage evaluation methods allow for quick, affordable ways to screen ideas, and to refine them to be more likely to work with the target users.

Read more about these kinds of evaluation protocols at our User Testing page.

Benchmark Methods

As your team is developing a new innovation, you can also evaluate it by comparing it to sets of principles, guidance, and best practices that expert stakeholders have created.

These benchmarks can help ensure that you are scrutinizing your possible innovation, to reduce possible harms and increase its likelihood of success.

They help you screen your idea, prototype, and final proposal to make sure that it does not fall into failures that past projects have experienced.

Phase 2-3: Evaluations of Interventions in the Field

Overviews of Field-Based Impact Evaluation Protocols

Some groups, like the World Bank and the UK government, have assembled handbooks that collect many different instruments that groups can use to evaluate the impact of their policies and programs.

These texts and slide decks are useful field guides to evaluating new policies, services, and other interventions in the field.

Impact Evaluation in Practice

This free training book from the World Bank presents strategies and tools to evaluate programs in the field.

UK Magenta Book on evaluation

This set of resources from the UK Government, called the ‘Magenta Book’ and connected slide decks and appendices, goes through how to evaluate policies in practice.

Benchmarking

Just like in Phase 1 evaluation, benchmark tools can also be used in Phase 2 or 3. These same lists of principles, best practices, and heuristics can be used to evaluate pilots and existing programs.

Explore Benchmark tools that can help you evaluate legal documents, technology, forms, and more.

Form & Tool Evaluation

Any form or tool that is meant to help a person through their justice journey can be evaluated with a combination of user-testing, data analytics, expert evaluation, and costing. This combo of measurements can determine if this form or technology tool is actually helping people do what it intends to, and in the most impactful and low-burden way.

For example, when evaluating a form, the evaluation process should use a combination of assessments that can measure a phase 1 of discovery and uptake, a phase 2 of usability and usefulness of the thing itself, and a phase 3 of effectiveness in getting the person towards a just resolution of their legal problem.

1. “Pre-Form” Uptake and Discovery Assessment

  • Discoverability and reach of the form. How many people who are likely to need this form are actually finding it? This can be evaluated through calculations of the expected audience with this legal problem or in need of this form/tool. Then compare this to analytics or administrative data numbers about the number of people who visit this form page or access it.
  • Usage rates of the form. Of those who find the form, how many actually engage with it, and try to fill it in or file it? This can be tracked through websites’ analytics or in-person clinic visits, to see rates of bounces, drops, or incompletion.
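The uptake and discovery metrics above reduce to a few simple ratios. The sketch below shows one way to compute them; all counts and the audience estimate are illustrative assumptions, not real data:

```python
# Hypothetical sketch: estimating a form's reach and usage rates from
# analytics-style counts. Every number here is an illustrative placeholder.

def uptake_metrics(expected_audience, page_visits, form_starts, form_completions):
    """Return discoverability, engagement, and completion rates as fractions."""
    return {
        "reach_rate": page_visits / expected_audience,      # who finds the form
        "engagement_rate": form_starts / page_visits,       # who tries it
        "completion_rate": form_completions / form_starts,  # who finishes it
    }

metrics = uptake_metrics(
    expected_audience=10_000,  # e.g., estimated annual filers in a county
    page_visits=2_500,
    form_starts=1_000,
    form_completions=400,
)
print(metrics)
# → {'reach_rate': 0.25, 'engagement_rate': 0.4, 'completion_rate': 0.4}
```

Comparing these ratios across time, or across different versions of a form page, is one way to see whether discoverability or usability is the bigger bottleneck.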

2. Metrics around the “Form Itself”

  • Usability of the form as a way to enter in the necessary information. How many people actually complete the form? How many do it in the correct, intended way? This can be evaluated through user testing sessions, in which people attempt to use the form and then give qualitative rankings and quantitative assessments about usability.
  • Readability of the form questions. How easy is it for a person to understand what the form is asking for? This can be measured through readability scoring tools, and in-person evaluation.
  • Cost of filling in the form. How expensive or burdensome is it to fill in the form and supply everything required to complete it? Do you need to hire someone or go through a service-seeking journey to be able to fill it in? This can be evaluated through interviews with past users to understand their costs, as well as by having user testers do this in a lab simulation.
  • Time to fill in the form. How long does it take to fill in? The time can be measured through website analytics of form tools, clinic and self-help center management records, or user tests in lab simulations.
  • Error rates in complete and comprehensive form info. How many times do people fill in information incorrectly, not supplying the data that the form was intended to gather? How many times do they choose things arbitrarily, not because they actually want to (like opting to raise all defenses and claims listed out)? This can be measured by lab simulations with test users, or by sampling forms that have been created or filed to evaluate them for errors or arbitrary responses.
  • Procedural Justice of the form. Does the person then feel the justice system is transparent, fair, and open to them? Or does the form experience make them feel the opposite? This can be measured with exit surveys of actual users of the form, or with lab simulations with test users.
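Readability scoring, mentioned above, can be sketched with the classic Flesch Reading Ease formula. This is a rough illustration only: the syllable counter below is a crude vowel-group heuristic, so treat its scores as signals rather than precise measurements; dedicated readability tools will be more accurate.

```python
# Rough sketch of Flesch Reading Ease scoring for form text.
# Higher scores mean easier text; the syllable heuristic is approximate.
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

legalese = "The aforementioned respondent shall effectuate service forthwith."
plain = "You must send a copy of this form to the other side."
print(flesch_reading_ease(legalese), flesch_reading_ease(plain))
# Plain-language phrasing should score well above the legalese version.
```

Running the same scorer over old and redesigned versions of a form gives a quick, quantitative check that a plain-language rewrite actually moved the needle.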

3. Metrics around “Post-Form” Process & Decisions

  • Filing rates. How many of the completed forms are filed with the court? This can be measured by comparing analytics about form tool usage or form downloads, versus filing rates in court administrative data.
  • The acceptance rate of the form’s output. Of the filed forms, how many are accepted by the clerk and entered into the record? This can be measured with court administrative data about rejection rates.
  • Judicial usability of the form entries. When the judge and clerk are reviewing the case file, triaging it to the correct process, and making decisions about outcomes — is the information in the form usable and useful to them? This can be measured through benchmarking evaluation sessions with clerks and judge teams, in which they review a sample of filings to evaluate them based on criteria they might usually use as informal heuristics when reviewing a case. These sessions can help them formally identify the criteria that makes a filing usable and useful to them, which the team can then use to score other filings.
  • Substantive Justice outcomes. Do people who use the form better represent their case, claims, and evidence? Do they get the judicial time to spend more time on their case and take care in applying the law correctly? Do they raise more claims and defenses persuasively? Do they end up with judgments more often in their favor? Substantive justice outcomes can be measured by case file reviews, to see how many claims and defenses are raised, how many of them moved forward with serious consideration by the judicial decision-makers, and how many of them were ultimately decided in favor of the litigant. It might also be assessed with exit surveys with litigants about whether they felt their problem was resolved and they received a just outcome.

Administrative Burden Costing

As part of the evaluation of forms, websites, and other legal help tools, many of these assessments can be grouped together to establish the tool’s Administrative Burden. Administrative burden costing of time and expenses can put a quantitative value on how difficult it is to use a form, tool, or service while trying to resolve a legal problem.

Many federal agencies are required to do this Burden Cost evaluation whenever they change procedures involved in users’ interactions with social security, taxes, or disability processes, or with who can access food stamps. It’s required at the federal level by the Paperwork Reduction Act, which requires an agency to measure the effect of a procedural change by looking at:

  • The time it takes for an average person to fill in the given form or do the required task
  • The time it takes to prepare the documents or get the information to correctly fill in the form/do the task
  • Assume that this costs a person $15/hour 
  • Calculate the number of people who will have to go through this on average

This basic calculation will allow you to produce a numeric amount of how much this form or step costs ‘the public’:

(Time to fill in + time to prep) × $15/hour × (number of people doing this) = Burden Cost

Having Burden Costs — or comparing them across different proposed forms — can be a very influential way to pressure policy-makers or support an argument for process simplification. In the access to justice space, you could do Burden Cost calculations that include:

  • Time to search for and find the correct form
  • Time to look up and understand the words being used in the form
    • (Optional: time to get help at Self Help Center, including waiting in line and being seen)
    • (Optional: time to call legal aid, get screened, see if they can help you, be helped)
  • Time to read the form and fill in the questions
  • Time to prep / make copies / get ready for filing
  • Time to file it in the courthouse
  • Time to deal with any problems with filing

You can calculate these time costs by doing these steps yourself, by having research participants do some or all of them, or by gathering informed estimates from experts.
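As a rough sketch, the Burden Cost formula above could be scripted like this. The $15/hour rate comes from the calculation described in the text; the individual time estimates and the population size are illustrative assumptions:

```python
# Sketch of the Paperwork Reduction Act-style Burden Cost calculation.
# The hourly rate matches the text above; all other numbers are placeholders.

HOURLY_RATE = 15.0  # dollars per hour, per the calculation described above

def burden_cost(task_minutes, people_per_year, hourly_rate=HOURLY_RATE):
    """Total annual cost to 'the public' of a form or procedural step.

    task_minutes: time estimates (in minutes) for each sub-task, e.g.
    finding the form, understanding it, filling it in, filing it.
    """
    total_hours = sum(task_minutes) / 60
    return total_hours * hourly_rate * people_per_year

# Illustrative time estimates for one form's justice journey:
minutes = [
    20,  # search for and find the correct form
    30,  # look up and understand the form's wording
    45,  # read the form and fill in the questions
    15,  # prep / make copies / get ready for filing
    60,  # file it at the courthouse, incl. waiting and fixing problems
]
cost = burden_cost(minutes, people_per_year=5_000)
print(f"${cost:,.0f}")  # → $212,500
```

Swapping in measured timings from user tests or clinic records, instead of these placeholder minutes, turns the sketch into a defensible burden estimate that can be compared across proposed form designs.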

Randomized Controlled Trials

A Randomized Controlled Trial (RCT for short) is an empirically rigorous way to determine if a thing you are doing (let’s call it an ‘intervention’) is having the effect you intended it to. It involves careful planning of what exactly you are testing — with identification of the ‘variables’ and the ‘conditions’, and having multiple testing groups with different variations of these variables.

The goal is to have a group that has used the intervention whose efficacy you’re testing, and a similar group that has not used it — the ‘control’ group. Then you can collect data that will more clearly demonstrate whether the intervention group has noticeably different results than the control group.

Read more about RCTs and examples of how to run them here, at BetterEvaluation.
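To make the RCT logic concrete, here is a minimal sketch of random assignment and a difference-in-rates comparison, using only Python’s standard library. The participants and outcomes are simulated placeholders; a real study would need power analysis, ethical review, and a proper significance test.

```python
# Minimal sketch of an RCT-style analysis: randomly assign participants to
# treatment/control, then compare outcome rates. All data is simulated.
import random
from statistics import mean

random.seed(42)  # reproducible assignment for this sketch

participants = [f"case_{i}" for i in range(200)]
random.shuffle(participants)
treatment = set(participants[:100])  # receive the legal help intervention
control = set(participants[100:])    # do not

def simulated_outcome(case_id):
    """Placeholder outcome: 1 if the person appeared in court, 0 if defaulted."""
    base_rate = 0.55 if case_id in treatment else 0.40  # assumed effect size
    return 1 if random.random() < base_rate else 0

outcomes = {p: simulated_outcome(p) for p in participants}
treat_rate = mean(outcomes[p] for p in treatment)
ctrl_rate = mean(outcomes[p] for p in control)
print(f"appearance rate: treatment={treat_rate:.2f}, control={ctrl_rate:.2f}, "
      f"estimated effect={treat_rate - ctrl_rate:+.2f}")
```

The key design choice is that assignment is random, so any sizable gap between the two rates can more plausibly be attributed to the intervention rather than to pre-existing differences between the groups.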

Phase 4: Ongoing Evaluation & Feedback

Exit rating

A quick way to get user feedback on an experience, service, or product, is to ask them to use a very simple rating on their ‘exit’. It can be on a text message line, on a tablet (for an in-person service), on a browser window (for a web-based service), or a paper sheet (again for in-person).

Ideally, it will be a very quick and visual interface, that lets the user quickly put a rating on what they’ve just experienced.

Follow-Up Surveys

Many courts, legal aid groups, and experts have created exit surveys for when a person has just concluded their justice journey. These exit surveys can gather important feedback on the quality of their experience, their outcomes, and their ideas for improvement.

Sample exit survey from California Courts

California Courts. “Customer Satisfaction Survey.” https://www.courts.ca.gov/partners/documents/customersatisfactionsurvey.pdf.

Greacen, John M. “Trial Court Research and Improvement Consortium Executive Program Assessment Tool: Assistance to Self-Represented Litigants Revised Draft,” 2005. https://www.srln.org/node/43/trial-court-research-and-improvement-consortium-tcric-self-help-program-assessment-tool-2005.

Legal Aid Society of Cleveland, and Michigan Legal Help. “Texting for Outcomes Toolkit,” 2020. https://www.lsntap.org/sites/lsntap.org/files/Texting for Outcomes Toolkit %2810.18.2021%29 Final w App.pdf

LaGratta Consulting. “COURT VOICES PROJECT Using Court User Feedback to Guide Courts’ Pandemic Responses,” August 2022. www.lagratta.com/court-voices-project-user-feedback.

Focus Groups & Design Workshops

Courts and legal aid groups can also gather ongoing feedback through interactive, qualitative sessions with past users and community members. These kinds of deep-dive sessions can help groups get in-depth information about what works, what doesn’t, and what new opportunities exist.

Listen > Learn > Lead report

This report from IAALS and collaborators from university design labs profiles how to involve court users and other stakeholders in deep, qualitative feedback sessions about systemic change.

Institute for the Advancement of the American Legal System, Margaret Hagan, Dan Jackson, and Lois Lupica. “Listen> Learn> Lead: A Guide to Improving Court Services through User-Centered Design.” Denver, https://iaals.du.edu/publications/listen-learn-lead
