
Can LLMs help streamline legal aid intake?

Insights from Quinten Steenhuis at the AI + Access to Justice Research Seminar

Recently, the Stanford Legal Design Lab hosted its latest installment of the AI+Access to Justice Research Seminar, featuring a presentation from Quinten Steenhuis.

Quinten is a professor and innovator-in-residence at Suffolk Law School’s LIT Lab. He’s also a former housing attorney in Massachusetts who has made a significant impact with projects like Court Forms Online and MADE, a tool for automating eviction help. His group Lemma Legal works with organizations to develop legal tech for interviews, forms, and documents.

His presentation in April 2025 focused on a project he’s been working on in collaboration with Hannes Westermann from the Maastricht Law & Tech Lab. This R&D project focuses on whether large language models (LLMs) are effective at tasks that might streamline intake in civil legal services. This work is being developed in partnership with Legal Aid of Eastern Missouri, along with other legal aid groups and funding from the U.S. Department of Housing and Urban Development.

The central question addressed was: Can LLMs help people get through the legal intake process faster and more accurately?

The Challenge: Efficient, Accurate Legal Aid Intake and Triage

For many people, legal aid is hard to access, in part because of the intake process required to apply for help from a local legal aid group. The current intake and triage process can be time-consuming and frustrating. Imagine calling a legal aid hotline, stressed out about a problem with your housing, family, finances, or job, only to wait on hold for an hour or more. When your call is finally answered, the intake worker needs to determine whether you qualify for help based on a complex and often open-textured set of rules. These rules vary significantly depending on jurisdiction, issue area, and individual circumstances — from citizenship and income requirements to more subjective judgments like whether a case is a “good” or “bad” fit for the program’s priorities or funding streams.

Intake protocols are typically documented internally for staff members in narrative guides, sometimes as long as seven pages, containing a mix of rules, sample scripts, timelines, and sub-rules that differ by zip code and issue type. These rules are rarely published online, as they can be too complex for users to interpret on their own. Legal aid programs may also worry about misinterpretation and inaccurate self-screening by clients. Instead, they keep these screening rules private to their staff.

Moreover, the intake process can involve up to 30+ rules about which cases to accept. These rules can vary between legal aid groups and can also change frequently, often because the underlying funding shifts. This “rules complexity” makes it hard for call center workers to provide consistent, accurate determinations about whose case will be accepted, leading to long wait times and inconsistent screening results. The challenge is to reduce the time legal aid workers spend screening without incorrectly denying services to those who qualify.

The Proposed Intervention: Integrating LLMs for Faster, Smarter Intake

To address this issue, Quinten, Hannes, and their partners have been exploring whether LLMs can help automate parts of the intake process. Specifically, they asked:

  • Can LLMs quickly determine whether someone qualifies for legal aid?
  • Can this system reduce the time spent on screening and make intake more efficient?

The solution they developed is part of the Missouri Tenant Help project, a hybrid system that combines rule-based questions with LLM-powered responses. The site’s intake system begins by asking straightforward, rules-based questions about citizenship, income, location, and problem description. It uses DocAssemble, a flexible platform that integrates Missouri-specific legal screening questions with central rules from Suffolk’s Court Forms Online for income limits and federal guidelines.
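
To make the hybrid design concrete, below is a minimal sketch of what the rules-based portion of such a screen might look like. It is an illustration only, not the project's actual code: the income limits, counties, and field names are invented assumptions, not the real Missouri eligibility rules.

```python
# Hypothetical, simplified rules-based pre-screen. Thresholds, counties,
# and fields are illustrative only, not the actual Missouri intake rules.
from dataclasses import dataclass

@dataclass
class Applicant:
    household_size: int
    monthly_income: float
    county: str
    is_eviction_case: bool

# Assumed income limits (e.g., tied to federal poverty guidelines);
# real programs publish their own figures and update them regularly.
INCOME_LIMITS = {1: 1883.0, 2: 2555.0, 3: 3228.0, 4: 3900.0}

SERVED_COUNTIES = {"St. Louis", "St. Louis City", "St. Charles"}  # illustrative

def passes_basic_screen(applicant: Applicant) -> bool:
    """Return True if the applicant clears the simple, objective rules."""
    limit = INCOME_LIMITS.get(applicant.household_size, 4572.0)
    return (
        applicant.monthly_income <= limit
        and applicant.county in SERVED_COUNTIES
        and applicant.is_eviction_case
    )
```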

At one point in the intake workflow, the system prompts users to describe their problem in a free-text box. The LLM then analyzes the input, cross-referencing it with the legal aid group’s eligibility rules. If the system still lacks sufficient data, it generates follow-up questions in real time, using a low temperature setting to keep the output consistent and cautious.

For example, if a user says, “I got kicked out of my house,” the system might follow up with, “Did your landlord give you any formal notice or involve the court before evicting you?” The goal is to quickly assess whether the person might qualify for legal help while minimizing unnecessary back-and-forth. The LLM’s job is to identify the legal problem at issue, and then match this specific legal problem with the case types that legal aid groups around Missouri may take (or may not).

If the LLM works perfectly, it would be able to predict correctly whether a legal aid group is likely to take on this case, is likely to decline it, or if it is borderline.
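
The LLM step described above might be wired up roughly as follows. This is a hypothetical reconstruction rather than the team's code: it assumes an OpenAI-style chat API, uses a low temperature for consistency, and invents the prompt wording and output format.

```python
# Hypothetical sketch of the LLM screening step: classify a free-text
# problem description against a program's intake rules, or ask a
# follow-up question if the facts are insufficient. Prompt wording,
# model name, and rule text are assumptions, not the project's own.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = """You are screening a tenant's description of a housing
problem against a legal aid program's intake rules. Respond with JSON:
{"decision": "likely_accept" | "likely_decline" | "borderline" | "need_more_info",
 "follow_up_question": "<question or empty string>"}"""

def screen_problem(description: str, program_rules: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4-turbo",          # the study compared several models
        temperature=0.1,              # low temperature for consistent output
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Intake rules:\n{program_rules}\n\nClient said:\n{description}"},
        ],
    )
    return json.loads(response.choices[0].message.content)
```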

The Experiment: Testing Different LLMs

To evaluate the system, the team conducted an experiment using 16 scenarios, 3 sets of legal aid program rules, and 8 different LLMs (including open-source, commercial, and popular models). The main question was whether the system could accurately match the “accept” or “reject” labels that legal experts had assigned to the scenarios.

The team found that the LLMs were fairly accurate at predicting which cases should be accepted. Overall, the LLMs correctly predicted acceptance or rejection with 84% precision, with GPT-4 Turbo performing best.

Of particular interest was the rate of incorrect predictions to reject a case. The system rarely made incorrect denials, which is critical for avoiding unjust exclusion from services. Rather, the LLM erred on the side of caution, often generating follow-up questions rather than making definitive, potentially incorrect judgments.

However, the system sometimes asked for unnecessary follow-up information even when it already had enough data. This could degrade the user experience, asking for redundant details and delaying a decision, though the issue here was one of efficiency rather than accuracy.
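
For readers who want to picture the evaluation, a small scoring harness along these lines could tally agreement with the expert labels and separate out the error types discussed above. The record format and metric names are illustrative, not the study's actual dataset or code.

```python
# Generic evaluation sketch: compare model predictions with expert labels
# across scenarios and rule sets. The data format here is a placeholder.
from collections import Counter

def score_predictions(records: list[dict]) -> dict:
    """Each record: {"expert": "accept"|"reject", "model": "accept"|"reject"|"ask_more"}."""
    counts = Counter()
    for r in records:
        if r["model"] == "ask_more":
            counts["follow_up"] += 1          # model deferred instead of deciding
        elif r["model"] == r["expert"]:
            counts["correct"] += 1
        elif r["model"] == "reject" and r["expert"] == "accept":
            counts["false_denial"] += 1       # the error type to minimize
        else:
            counts["false_accept"] += 1
    decided = counts["correct"] + counts["false_denial"] + counts["false_accept"]
    return {
        "accuracy_on_decided_cases": counts["correct"] / decided if decided else 0.0,
        "false_denials": counts["false_denial"],
        "deferred_to_follow_up": counts["follow_up"],
    }
```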

Challenges and Insights

One surprising result was that the LLMs sometimes caught errors made by human labelers. For example, in one case involving a support animal in Kansas City, the model correctly identified that a KC legal aid group was likely to accept this case, while the human reviewer mistakenly marked it as a likely denial. This underscores the potential of LLMs to enhance accuracy when paired with human oversight.

However, the LLMs also faced unique challenges.

  • Some models, like Gemini, refused to engage with topics related to domestic violence due to content moderation settings. This raised questions about whether AI developers understand the nuances of legal contexts. It also underscored the importance of screening candidate models for whether they censor legal topics before putting them into use.
  • The system also struggled with ambiguous scenarios, like evaluating whether “flimsy doors and missing locks” constituted a severe issue. Such situations highlighted the need for more tailored training and model configuration.

User Feedback and Next Steps

The system has been live for a month and a half and is currently offered as an optional self-screening tool on the Missouri Tenant Help website. Early feedback from legal aid partners has been positive, with high satisfaction ratings from users who tested the system. Some service providers noted they would like to see more follow-up questions to gather comprehensive details upfront — envisioning the LLM doing even more data-gathering, beyond what is needed to determine if a case is likely to be accepted or rejected.

In the future, the team aims to continue refinement and planning work, including to:

  1. Refine the LLM prompts and training data to better capture nuanced legal issues.
  2. Improve system accuracy by integrating rules-based reasoning with LLM flexibility.
  3. Explore more cost-effective models to keep the service affordable — currently around 5 cents per interaction.
  4. Enhance error handling by implementing model switching when a primary LLM fails to respond or disengages due to sensitive content.

Can LLMs and Humans Work Together?

This project exemplifies how LLMs and human experts can complement each other. Rather than fully automating intake, the system serves as a first-pass filter. It gives community members a quicker tool to get a high-level read on whether they are likely to get services from a legal aid group, or whether it would be better for them to pursue another service.

Rather than waiting for hours on a phone line, the user can choose to use this tool to get quicker feedback. They can still call the program — the system does not issue a rejection, but rather just gives them a prediction of what the legal aid program will tell them.

The next phase will involve ongoing live testing and iterative improvements to balance speed, accuracy, and user experience.

The Future of Improving Legal Intake with AI

As legal aid programs increasingly look to AI and LLMs to streamline intake, several key opportunities and challenges are emerging.

1. Enhancing Accuracy and Contextual Understanding:

One promising avenue is the development of more nuanced models that can better interpret ambiguous or context-dependent situations. For instance, instead of flagging a potential denial based solely on rigid rule interpretations, the system could use context-aware prompts that take into account local regulations and specific case details. This might involve combining rule-based logic with adaptive LLM responses to better handle edge cases, like domestic violence scenarios or complex tenancy disputes.

2. Adaptive Model Switching:

Another promising approach is to implement a hybrid model system that dynamically switches between different LLMs depending on the context. For example, if a model like Gemini refuses to address sensitive topics, the system could automatically switch to a more legally knowledgeable model or one with fewer content moderation constraints. This could be facilitated by a router API that monitors for censorship or errors and adjusts the model in real time.
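
A router along those lines might be sketched as follows; the refusal heuristic, model names, and retry logic are assumptions for illustration, not a tested production design.

```python
# Hypothetical fallback router: if the primary model errors out or refuses
# (e.g., on a sensitive topic), retry with an alternate model. The refusal
# heuristic and model identifiers here are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "unable to discuss")

def looks_like_refusal(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def ask_with_fallback(prompt: str, models=("gpt-4-turbo", "gpt-4o-mini")) -> str:
    last_error = None
    for model in models:
        try:
            resp = client.chat.completions.create(
                model=model,
                temperature=0.1,
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp.choices[0].message.content or ""
            if not looks_like_refusal(answer):
                return answer
        except Exception as exc:          # API error, timeout, etc.
            last_error = exc
    raise RuntimeError(f"All models failed or refused; last error: {last_error}")
```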

3. More Robust Fact Gathering:

A significant future goal is to enhance the system’s ability to collect comprehensive facts during intake. Legal aid workers noted that they often needed follow-up information after the initial screening, especially when the client’s problem involved specific housing issues or complex legal nuances. The next version of the system will focus on expanding the follow-up question logic to reduce the need for manual callbacks. This could involve developing predefined question trees for common issues while maintaining the model’s ability to generate context-specific follow-up questions.

4. Tailoring to Local Needs and Specific Use Cases:

One of the biggest challenges for scaling AI-based intake systems is ensuring that they are flexible enough to adapt to local legal nuances. The team is considering ways to contextualize the system for individual jurisdictions, potentially using open-source approaches to allow local legal aid programs to train their own versions. This could enable more customized intake systems that better reflect local policies, tenant protections, and court requirements.

5. Real-Time Human-AI Collaboration:

Looking further ahead, there is potential for building integrated systems where AI actively assists call center workers in real time. For instance, instead of having the AI conduct intake independently, it could listen to live calls and provide real-time suggestions to human operators, similar to how customer support chatbots assist agents. This would allow AI to augment rather than replace human judgment, helping to maintain quality control and legal accuracy.

6. Privacy and Ethical Considerations:

As these systems evolve, maintaining data privacy and ethical standards will be crucial. The current setup already segregates personal information from AI processing, but as models become more integrated into intake workflows, new strategies may be needed. Exploring privacy-preserving AI methods and data anonymization techniques will help maintain compliance while leveraging the full potential of LLMs.
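
As a rough illustration of segregating personal information from AI processing, a pipeline might strip obvious identifiers before any text is sent to a model. The few patterns below are only a sketch; real anonymization requires much more than this.

```python
# Minimal redaction sketch: remove obvious identifiers before text is sent
# to an LLM. Real anonymization needs far more than these few patterns.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call me at 314-555-0123 or email jane@example.org"))
# -> "Call me at [PHONE] or email [EMAIL]"
```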

7. Cost and Efficiency Optimization:

At the current cost of around 5 cents per interaction, the system remains relatively affordable, but as more users engage, maintaining cost efficiency will be key. The team plans to experiment with more affordable model versions and optimize the routing strategy to ensure that high-quality responses are delivered at a sustainable price. The goal is to make the intake process not just faster but also economically feasible for widespread adoption.

Building the Next Generation of Legal Aid Systems

Quinten’s presentation at the AI + Access to Justice seminar made it clear that while LLMs hold tremendous potential for improving legal intake, human oversight and adaptive systems are crucial to ensure reliability and fairness. The current system’s success — 84% precision, minimal false denials, and positive user feedback — shows that AI-human collaboration is not only possible but also promising.

As the team continues to refine the system, they aim to create a model that can balance efficiency with accuracy, while being adaptable to the diverse and dynamic needs of legal aid programs. The long-term vision is to develop a scalable, open-source tool that local programs can fine-tune and deploy independently, making access to legal support faster and more reliable for those who need it most.

Read the research article in detail here.

See more at Quinten’s group Lemma Legal: https://lemmalegal.com/

Read more about Hannes at Maastricht University: https://cris.maastrichtuniversity.nl/en/persons/hannes-westermann


Justice AI Co-Pilots

The Stanford Legal Design Lab is proud to announce a new initiative funded by the Gates Foundation that aims to bring the power of artificial intelligence (AI) into the hands of legal aid professionals. With this new project, we’re building and testing AI systems—what we’re calling “AI co-pilots”—to support legal aid attorneys and staff in two of the most urgent areas of civil justice: eviction defense and reentry debt mitigation.

This work continues our Lab’s mission to design and deploy innovative, human-centered solutions that expand access to justice, especially for those who face systemic barriers to legal support.

A Justice Gap That Demands Innovation

Across the United States, millions of people face high-stakes legal problems without any legal representation. Eviction cases and post-incarceration debt are two such areas, where legal complexity meets chronic underrepresentation—leading to outcomes that can reinforce poverty, destabilize families, and erode trust in the justice system.

Legal aid organizations are often the only line of defense for people navigating these challenges, but these nonprofits are severely under-resourced. They are on the front lines of help, yet are often stretched thin on staffing, technology, and funding.

The Project: Building AI Co-Pilots for Legal Aid Workflows

In collaboration with two outstanding legal aid partners—Legal Aid Foundation of Los Angeles (LAFLA) and Legal Aid Services of Oklahoma (LASO)—we are designing and piloting four AI co-pilot prototypes: two for eviction defense, and two for reentry debt mitigation.

These AI tools will be developed to assist legal aid professionals with tasks such as:

  • Screening and intake
  • Issue spotting and triage
  • Drafting legal documents
  • Preparing litigation strategies
  • Interpreting complex legal rules

Rather than replacing human judgment, these tools are meant to augment legal professionals’ work. The aim is to free up time for higher-value legal advocacy, enable legal teams to take on more clients, and help non-expert legal professionals assist in more specialized areas.

The goal is to use a deliberate, human-centered process to first identify low-risk, high-impact tasks for AI to do in legal teams’ workflows, and then to develop, test, pilot, and evaluate new AI solutions that can offer safe, meaningful improvements to legal service delivery & people’s social outcomes.

Why Eviction and Reentry Debt?

These two areas were chosen because of their widespread and devastating impacts on people’s housing, financial stability, and long-term well-being.

Eviction Defense

Over 3 million eviction lawsuits are filed each year in the U.S., with the vast majority of tenants going unrepresented. Without legal advocacy, many tenants are unaware of their rights or defenses. It’s also hard to fill out the many complicated legal documents required to participate in the system, protect one’s rights, and avoid a default judgment. This makes it difficult to negotiate with landlords, comply with court requirements, and protect one’s housing and money.

Evictions often happen in a matter of weeks, and with a confusing mix of local and state laws, it can be hard for even experienced attorneys to respond quickly. The AI co-pilots developed through this project will help legal aid staff navigate these rules and prepare more efficiently—so they can support more tenants, faster.

Reentry Debt

When people return home after incarceration, they often face legal financial obligations that can include court fines, restitution, supervision fees, and other penalties. This kind of debt can make it hard for a person to achieve stability in housing, employment, driver’s licenses, and family life.

According to the Brennan Center for Justice, over 10 million Americans owe more than $50 billion in reentry-related legal debt. Yet there are few tools to help people navigate, reduce, or resolve these obligations. By working with LASO, we aim to prototype tools that can help legal professionals advise clients on debt relief options, identify eligibility for fee waivers, and support court filings.

What Will the AI Co-Pilots Actually Do?

Each AI co-pilot will be designed for real use in legal aid organizations. They’ll be integrated into existing workflows and tailored to the needs of specific roles—like intake specialists, paralegals, or staff attorneys. Examples of potential functionality include (a rough sketch of the first item follows the list below):

  • Summarizing client narratives and flagging relevant legal issues
  • Filling in common forms and templates based on structured data
  • Recommending next steps based on jurisdictional rules and case data
  • Generating interview questions for follow-up conversations
  • Cross-referencing legal codes with case facts
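
As a purely hypothetical sketch of that first item, an issue-spotting helper might summarize a client narrative and flag candidate issues for attorney review. The prompt, issue tags, and model are invented for illustration and are not the project's design.

```python
# Hypothetical sketch of an issue-spotting helper: summarize a client's
# narrative and flag possible legal issues for a staff attorney to review.
# The prompt, issue list, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

ISSUE_TAGS = ["nonpayment of rent", "habitability / repairs", "illegal lockout",
              "retaliation", "court fines and fees", "license suspension"]

def spot_issues(client_narrative: str) -> str:
    prompt = (
        "Summarize this client's situation in two sentences, then list which "
        f"of these issues may apply: {', '.join(ISSUE_TAGS)}. "
        "Flag anything uncertain for attorney review.\n\n" + client_narrative
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```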

The design process will be collaborative and iterative, involving continuous feedback from attorneys, advocates, and technologists. We will pilot and evaluate each tool rigorously to ensure its effectiveness, usability, and alignment with legal ethics.

Spreading the Impact

While the immediate goal is to support LAFLA and LASO, we are designing the project with national impact in mind. Our team plans to publish:

  • Open-source protocols and sample workflows
  • Evaluation reports and case studies
  • Responsible use guidelines for AI in legal aid
  • Collaboration pathways with legal tech vendors

This way, other legal aid organizations can replicate and adapt the tools to their own contexts—amplifying the reach of the project across the U.S.

“There’s a lot of curiosity in the legal aid field about AI—but very few live examples to learn from,” Hagan said. “We hope this project can be one of those examples, and help the field move toward thoughtful, responsible adoption.”

Responsible AI in Legal Services

At the Legal Design Lab, we know that AI is not a silver bullet. Tools must be designed thoughtfully, with attention to risks, biases, data privacy, and unintended consequences.

This project is part of our broader commitment to responsible AI development. That means:

  • Using human-centered design
  • Maintaining transparency in how tools work and make suggestions
  • Prioritizing data privacy and user control
  • Ensuring that tools do not replace human judgment in critical decisions

Our team will work closely with our legal aid partners, domain experts, and the communities served to ensure that these tools are safe, equitable, and truly helpful.

Looking Ahead

Over the next two years, we’ll be building, testing, and refining our AI co-pilots—and sharing what we learn along the way. We’ll also be connecting with national networks of eviction defense and reentry lawyers to explore broader deployment and partnerships.

If you’re interested in learning more, getting involved, or following along with project updates, sign up for our newsletter or follow the Lab on social media.

We’re grateful to the Gates Foundation for their support, and to our partners at LAFLA and LASO for their leadership, creativity, and deep dedication to the clients they serve.

Together, we hope to demonstrate how AI can be used responsibly to strengthen—not replace—the critical human work of legal aid.


ICAIL workshop on AI & Access to Justice

The Legal Design Lab is excited to co-organize a new workshop at the International Conference on Artificial Intelligence and Law (ICAIL 2025):

AI for Access to Justice (AI4A2J@ICAIL 2025)
📍 Where? Northwestern University, Chicago, Illinois, USA
🗓 When? June 20, 2025 (Hybrid – in-person and virtual participation available)
📄 Submission Deadline: May 4, 2025
📬 Acceptance Notification: May 18, 2025

Submit a paper here https://easychair.org/cfp/AI4A2JICAIL25

This workshop brings together researchers, technologists, legal aid practitioners, court leaders, policymakers, and interdisciplinary collaborators to explore the potential and pitfalls of using artificial intelligence (AI) to expand access to justice (A2J). It is part of the larger ICAIL 2025 conference, the leading international forum for AI and law research, hosted this year at Northwestern University in Chicago.


Why this workshop?

Legal systems around the world are struggling to meet people’s needs—especially in housing, immigration, debt, and family law. AI tools are increasingly being tested and deployed to address these gaps: from chatbots and form fillers to triage systems and legal document classifiers. Yet these innovations also raise serious questions around risk, bias, transparency, equity, and governance.

This workshop will serve as a venue to:

  • Share and critically assess emerging work on AI-powered legal tools
  • Discuss design, deployment, and evaluation of AI systems in real-world legal contexts
  • Learn from cross-disciplinary perspectives to better guide responsible innovation in justice systems


What are we looking for?

We welcome submissions from a wide range of contributors—academic researchers, practitioners, students, community technologists, court innovators, and more.

We’re seeking:

  • Research papers on AI and A2J
  • Case studies of AI tools used in courts, legal aid, or nonprofit contexts
  • Design proposals or system demos
  • Critical perspectives on the ethics, policy, and governance of AI for justice
  • Evaluation frameworks for AI used in legal services
  • Collaborative, interdisciplinary, or community-centered work

Topics might include (but are not limited to):

  • Legal intake and triage using large language models (LLMs)
  • AI-guided form completion and document assembly
  • Language access and plain language tools powered by AI
  • Risk scoring and case prioritization
  • Participatory design and co-creation with affected communities
  • Bias detection and mitigation in legal AI systems
  • Evaluation methods for LLMs in legal services
  • Open-source or public-interest AI tools

We welcome both completed projects and works-in-progress. Our goal is to foster a diverse conversation that supports learning, experimentation, and critical thinking across the access to justice ecosystem.


Workshop Format

The workshop will be held on June 20, 2025 in hybrid format—with both in-person sessions in Chicago, Illinois and the option for virtual participation. Presenters and attendees are welcome to join from anywhere.


Workshop Committee

  • Hannes Westermann, Maastricht University Faculty of Law
  • Jaromír Savelka, Carnegie Mellon University
  • Marc Lauritsen, Capstone Practice Systems
  • Margaret Hagan, Stanford Law School, Legal Design Lab
  • Quinten Steenhuis, Suffolk University Law School


Submit Your Work

For full submission guidelines, visit the official workshop site:
https://suffolklitlab.org/ai-for-access-to-justice-at-the-international-conference-on-ai-and-law-2025-ai4a2j-icail25/

Submit your paper at EasyChair here.

Submissions are due by May 4, 2025.
Notifications of acceptance will be sent by May 18, 2025.


We’re thrilled to help convene this conversation on the future of AI and justice—and we hope to see your ideas included. Please spread the word to others in your network who are building, researching, or questioning the role of AI in the justice system.


Measuring What Matters: A Quality Rubric for Legal AI Answers

by Margaret Hagan, Executive Director of the Legal Design Lab

As more people turn to AI for legal advice, a pressing issue emerges: How do we know whether AI-generated legal answers are actually helpful? While legal professionals and regulators may have instincts about good and bad answers, there has been no clear, standardized way to evaluate AI’s performance in this space — until now.

What makes a good answer on a chatbot, clinic, livechat, or LLM site?

My paper for the JURIX 2024 conference, Measuring What Matters: Developing Human-Centered Legal Q-and-A Quality Standards through Multi-Stakeholder Research, tackles this challenge head-on. Through a series of empirical studies, the paper develops a human-centered framework for evaluating AI-generated legal answers, ensuring that quality benchmarks align with what actually helps people facing legal problems. The findings provide valuable guidance for legal aid organizations, product developers, and policymakers who are shaping the future of AI-driven legal assistance.

Why Quality Standards for AI Legal Help Matter

When people receive a legal notice — like an eviction warning or a debt collection letter — they often turn to the internet for guidance. Platforms such as Reddit’s r/legaladvice, free legal aid websites, and now AI chatbots have become common sources of legal information. However, the reliability and usefulness of these answers vary widely.

AI’s increasing role in legal Q&A raises serious questions:

  • Are AI-generated answers accurate and actionable?
  • Do they actually help users solve legal problems?
  • Could they mislead people, causing harm rather than good?

My research addresses these concerns by involving multiple stakeholders — end users, legal experts, and technologists — to define what makes a legal answer “good.”

The paper reveals several surprising insights about what actually matters when evaluating AI’s performance in legal Q&A. Here are some key takeaways that challenge conventional assumptions:

1. Accuracy Alone Isn’t Enough — Actionability Matters More

One of the biggest surprises is that accuracy is necessary but not sufficient. While many evaluations of legal AI focus on whether an answer is legally correct, the study finds that what really helps people is whether the answer provides clear, actionable steps. A technically accurate response that doesn’t tell someone what to do next is not as valuable as a slightly less precise but highly actionable answer.

Example of accuracy that is not helpful to user’s outcome:

  • AI says: “Your landlord is violating tenant laws in your state.” (Accurate but vague)
  • AI says: “You should file a response within a short time period — often 7 days. (Though this 7 days may be different depending on your exact situation.) Here’s a link to your county’s tenant protection forms and a local legal aid service.” (Actionable and useful)

2. Accurate Information Is Not Always Good for the User

The study highlights that some legal rights exist on paper but can be risky to exercise in practice — especially without proper guidance. For example, withholding rent is a legal remedy in many states if a landlord fails to make necessary repairs. However, in reality, exercising this right can backfire:

  • Many landlords retaliate by starting eviction proceedings.
  • The tenant may misapply the law, thinking they qualify when they don’t.
  • Even when legally justified, withholding rent can lead to court battles that tenants often lose if they don’t follow strict procedural steps.

This is a case where AI-generated legal advice could be technically accurate but still harmful if it doesn’t include risk disclosures. The study suggests that high-risk legal actions should always come with clear warnings about potential consequences. Instead of simply stating, “You have the right to withhold rent,” a high-quality AI response should add:

  • “Withholding rent is legally allowed in some cases, but it carries huge risks, including eviction. It’s very hard to withhold rent correctly. Reach out to this tenants’ rights organization before trying to do it on your own.”

This principle applies to other “paper rights” too — such as recording police interactions, filing complaints against employers, or disputing debts — where following the law technically might expose a person to serious retaliation or legal consequences.

Legal answers should not just state rights but also warn about practical risks — helping users make informed, strategic decisions rather than leading them into legal traps.

3. Legal Citations Aren’t That Valuable for Users

Legal experts often assume that providing citations to statutes and case law is crucial for credibility. However, both users and experts in the study ranked citations as a lower-priority feature. Most users don’t actually read or use legal citations — instead, they prefer practical, easy-to-understand guidance.

However, citations do help in one way: they allow users to verify information and use it as leverage in disputes (e.g., showing a landlord they know their rights). The best AI responses include citations sparingly and with context, rather than overwhelming users with legal references.

4. Overly Cautious Warnings Can Be Harmful

Many AI systems include disclaimers like “Consult a lawyer before taking any action.” While this seems responsible, the study found that excessive warnings can discourage people from acting at all.

Since most people seeking legal help online don’t have access to a lawyer, AI responses should avoid paralyzing users with fear and instead guide them toward steps they can take on their own — such as contacting free legal aid or filing paperwork themselves.

5. Misleading Answers Are More Dangerous Than Completely Wrong Ones

AI-generated legal answers that contain partial truths or misrepresentations are actually more dangerous than completely wrong ones. Users tend to trust AI responses by default, so if an answer sounds authoritative but gets key details wrong (like deadlines or filing procedures), it can lead to serious harm (e.g., missing a legal deadline).

The study found that the most harmful AI errors were related to procedural law — things like incorrect filing deadlines, court names, or legal steps. Even small errors in these areas can cause major problems for users.

6. The Best AI Answers Function Like a “Legal GPS”

Rather than replacing lawyers, users want AI to act like a smart navigation system — helping them spot legal issues, identify paths forward, and get to the right help. The most helpful answers do this by:

  • Quickly diagnosing the problem (understanding what the user is asking about).
  • Giving step-by-step guidance (telling the user exactly what to do next).
  • Providing links to relevant forms and local services (so users can act on the advice).

Instead of just stating the law, AI should orient users, give them confidence, and point them toward useful actions — even if that means simplifying some details to keep them engaged.

AI’s Role in Legal Help Is About Empowerment, Not Just Information

The research challenges the idea that AI legal help should be measured only by how well it mimics a lawyer’s expertise. Instead, the most effective AI legal Q&A focuses on empowering users with clear, actionable, and localized guidance — helping them take meaningful steps rather than just providing abstract legal knowledge.

Key Takeaways for Legal Aid, AI Developers, and Policymakers

The paper’s findings offer important lessons for different stakeholders in the legal AI ecosystem.

1. Legal Aid Organizations: Ensuring AI Helps, Not Hurts

Legal aid groups may increasingly rely on AI to extend their reach, but they must be cautious about its limitations. The research highlights that users want AI tools that:

  • Provide clear, step-by-step guidance on what to do next.
  • Offer jurisdiction-specific advice rather than generic legal principles.
  • Refer users to real-world resources, such as legal aid offices or court forms.
  • Are easy to read and understand, avoiding legal jargon.

Legal aid groups should ensure that the AI tools they deploy adhere to these quality benchmarks. Otherwise, users may receive vague, confusing, or even misleading responses that could worsen their legal situations.

2. AI Product Developers: Building Legal AI Responsibly & Knowing Justice Use Cases

AI developers must recognize that accuracy alone is not enough. The paper identifies four key criteria for evaluating the quality of AI legal answers:

  1. Accuracy — Does the answer provide correct legal information? And when legal information is accurate but high-risk, does it tell people about rights and options with sufficient context?
  2. Actionability — Does it offer concrete steps that the user can take?
  3. Empowerment — Does it help users feel capable of handling their problem?
  4. Strategic Caution — Does it avoid causing unnecessary fear or discouraging action?

One surprising insight is that legal citations — often seen as a hallmark of credibility — are not as critical as actionability. Users care less about legal precedents and more about what they can do next. Developers should focus on designing AI responses that prioritize usability over technical legal accuracy alone.

3. Policymakers: Regulating AI for Consumer Protection & Outcomes

For regulators, the study underscores the need for clear, enforceable quality standards for AI-generated legal guidance. Without such standards, AI-generated legal help may range from extremely useful to dangerously misleading.

Key regulatory considerations include:

  • Transparency: AI platforms should disclose how they generate answers and whether they have been reviewed by legal experts.
  • Accuracy Audits: Regulators should develop auditing protocols to ensure AI legal help is not systematically providing incorrect or harmful advice.
  • Consumer Protections: Policies should prevent AI tools from deterring users from seeking legal aid when needed.

Policymakers ideally will be in conversation with frontline practitioners, product/model developers, and community members to understand what is important to measure, how to measure it, and how to increase the quality and safety of performance. Evaluation based on concepts like Unauthorized Practice of Law does not necessarily correspond to consumers’ outcomes, needs, and priorities. Rather, figuring out what is beneficial to consumers should be based on what matters to the community and frontline providers.

The Research Approach: A Human-Centered Framework

How did we identify these insights and standards? The study used a three-part research process to hear from community members, frontline legal help providers, and access to justice experts. (Thanks to the Legal Design Lab team for helping me with interviews and study mechanics!)

  1. User Interviews: 46 community members tested AI legal help tools and shared feedback on their usefulness and trustworthiness.
  2. Expert Evaluations: 21 legal professionals ranked the importance of various quality criteria for AI-generated legal answers.
  3. AI Response Ratings: Legal experts assessed real AI-generated answers to legal questions, identifying common pitfalls and best practices.

This participatory, multi-stakeholder approach ensures that quality metrics reflect the real-world needs of legal aid seekers, not just theoretical legal standards.

The Legal Q-and-A Quality Rubric

What’s Next? Implementing the Quality Rubric

The research concludes with a proposed Quality Rubric that can serve as a blueprint for AI developers, researchers, and regulators. This rubric provides a scoring system that evaluates legal AI answers based on their strengths and weaknesses across key quality dimensions.

Potential next steps include:

  • Regular AI audits using the Quality Rubric to track performance.
  • Collaboration between legal aid groups and AI developers to refine AI-generated responses.
  • Policy frameworks that hold AI platforms accountable for misleading or harmful legal information.

Other groups may be developing internal quality reviews of the RAG bots and AI systems on their own websites and tools. They can use the rubric above as they conduct safety and quality checks, or as they train human labelers or automated AI judges to carry out those checks.
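
For teams building that kind of automated check, a scorer using the paper's four criteria might look roughly like the sketch below. The prompt wording, scale, and model choice are assumptions for illustration, not a validated judging protocol.

```python
# Hypothetical sketch of using rubric dimensions to score an AI answer.
# The dimension names follow the paper's four criteria; the weights,
# prompt, and model are assumptions for illustration only.
import json
from openai import OpenAI

client = OpenAI()

DIMENSIONS = ["accuracy", "actionability", "empowerment", "strategic_caution"]

JUDGE_PROMPT = """Score the following AI answer to a legal question on a
1-5 scale for each dimension: accuracy, actionability, empowerment,
strategic_caution. Respond as JSON, e.g. {"accuracy": 4, ...}."""

def score_answer(question: str, answer: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.0,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Question:\n{question}\n\nAnswer:\n{answer}"},
        ],
    )
    scores = json.loads(resp.choices[0].message.content)
    return {dim: scores.get(dim) for dim in DIMENSIONS}
```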

Conclusion: Measuring AI for Better Access to Justice

AI holds great promise for expanding access to legal help, but it must be measured and managed effectively. My research provides a concrete roadmap for ensuring that AI legal assistance is not just technically impressive but genuinely useful to people in need.

For legal aid organizations, the priority should be integrating AI tools that align with the study’s quality criteria. For AI developers, the challenge is to design products that go beyond accuracy and focus on usability, actionability, and strategic guidance. And for policymakers, the responsibility lies in crafting regulations that ensure AI-driven legal help does more good than harm.

As AI continues to transform how people access legal information, establishing clear, human-centered quality standards will be essential in shaping a fair and effective legal tech landscape.

Need for More Benchmarks of More Legal Tasks

In addition to this current focus on Legal Q-and-A, the justice community also needs to create similar evaluation standards and protocols for other tasks. Beyond answering brief legal questions, there are many other tasks whose quality matters to people’s outcomes, rights, and justice. This is the first part of a much bigger effort to have measurable, meaningful justice interventions.

This focus on delineated tasks, each with its own quality measures, will be essential for building quality products and models that serve the public, and for unlocking greater scale and support for innovation.


Class Presentations for AI for Legal Help

Last week, the 5 student teams in Autumn Quarter’s AI for Legal Help class made their final presentations about whether and how generative AI could assist legal aid organizations, courts, and bar associations in providing legal help to the public.

The class’s 5 student groups have been working over the 9-week quarter with partners including the American Bar Association, Legal Aid Society of San Bernardino, Neighborhood Legal Services of LA, and LA Superior Court Help Center. The partners came to the class with some ideas, and the student teams worked with them to scope & prototype new AI agents to do legal tasks, including:

  • Demand letters for reasonable accommodations
  • Motions to set aside to stop an impending eviction/forcible set-out
  • Triaging court litigants to direct them to appropriate services
  • Analyzing eviction litigants’ case details to spot defenses
  • Improving lawyers’ responses to online brief advice clinic users’ questions

The AI agents are still in early stages. We’ll be continuing refinement, testing, and pilot-planning next quarter.


AI + Access to Justice Summit 2024

On October 17 and 18, 2024 Stanford Legal Design Lab hosted the first-ever AI and Access to Justice Summit.

The Summit’s primary goal was to build strong relationships and a national, coordinated roadmap of how AI can responsibly be deployed and held accountable to close the justice gap.

AI + A2J Summit at Stanford Law School

Who was at the Summit?

Two law firm sponsors, K&L Gates and DLA Piper, supported the Summit through travel scholarships, program costs, and strategic guidance.

The main group of invitees were frontline legal help providers at legal aid groups, law help website teams, and the courts. We know they are key players in deciding what kinds of AI should and could be impactful for closing the justice gap. They’ll also be key partners in developing, piloting, and evaluating new AI solutions.

Key supporters and regional leaders from bar foundations, philanthropies, and pro bono groups were also invited. Their knowledge about funding, scaling, past initiatives, and spreading projects from one organization and region to others was key to the Summit.

Technology developers also came, both from big technology companies like Google and Microsoft and legal technology companies like Josef, Thomson Reuters, Briefpoint, and Paladin. Some of these groups already have AI tools for legal services, but not all of them have focused in on access to justice use cases.

In addition, we invited researchers who are developing responsible, privacy-forward, efficient ways to build specialized AI solutions that could help people in the justice sphere, and who can draw lessons from how AI is being deployed in parallel fields like medicine and mental health.

Finally, we had participants who work in regulation and policy-making at state bars, to talk about policy, ethics, and balancing innovation with consumer protection. The ‘rules of the road’ about what kinds of AI can be built and deployed, and what standards they need to follow, are essential for clarity and predictability among developers.

What Happened at the Summit?

The Summit was a 2-day event, split intentionally into 5 sections:

  • Hands-On AI Training: Examples and Research to upskill legal professionals. There were demos, explainers, and strategies about what AI solutions are already in use or possible for legal services. Big tech, legal tech, and computer science researchers gave participants a hands-on, practical, detailed tour of AI tools, examples, and protocols that can be useful in developing new solutions to close the justice gap.
  • Big Vision: Margaret Hagan and Richard Susskind opened up the 2nd day with a challenge: where does the access to justice community want to be in 2030 when it comes to AI and the justice gap? How can individual organizations collaborate, build common infrastructure, and learn from each other to reach our big-picture goals?
  • AI+A2J as of 2024: In the morning of the second day, two panels presented on what is already happening in AI and Access to Justice — including an inventory of current pilots, demos of some early legal aid chatbots, regulators’ guidelines, and innovation sandboxes. This helped the group understand the early-stage developments and policies.
  • Design & Development of New Initiatives. In the afternoon of the second day, we led breakout design workshops on specific use cases: housing law, immigration law, legal aid intake, and document preparation. The diverse stakeholders worked together using our AI Legal Design workbook to scope out a proposal for a new solution — whether that might mean building new technology or adapting off-the-shelf tech to the needs.
  • Support & Collaboration. In the final session, we heard from a panel who could talk through support: financial support, pro bono partnership support, technology company licensing and architecture support, and other ways to build more new interdisciplinary relationships that could unlock the talent, strategy, momentum, and finances necessary to make AI innovation happen. We also discussed support around evaluation so that there could be more data and more feeling of safety in deploying these new tools.

Takeaways from the Summit

The Summit built strong relationships & common understanding among technologists, providers, researchers, and supporters. Our hope is to run the Summit annually, to track year-to-year progress in tackling the justice gap with AI and to watch these relationships, collaborations, and their impact develop and scale.

In addition, some key points emerged from the training, panels, workshops, and down-time discussions.

Common Infrastructure for AI Development

Though many AI pilots are going to have to be local to a specific organization in a specific region, the national (or international) justice community can be working on common resources that can serve as infrastructure to support AI for justice.

  • Common AI Trainings: Regional leaders, who are newly being hired by state bars and bar foundations to train and explore how AI can fit with legal services, should be working together to develop common training, common resources, and common best practices.
  • Project Repository: National organizations and networks should be thinking about a common repository of projects. This inventory could track what tech provider is being used, what benchmark is being used for evaluation, what AI model is being deployed, what data it was fine-tuned on, and if and how others could replicate it.
  • Rules of the Road Trainings. National organizations and local regulators could give more guidance to leadership like legal aid executive directors about what is allowed or not allowed, what is risky or safe, or other clarification that can help more leadership be brave and knowledgeable about how to deploy AI responsibly. When is an AI project sufficiently tested to be released to the public? How should the team be maintaining and tracking an AI project, to ensure it’s mitigating risk sufficiently?
  • Public Education. Technology companies, regulators, and frontline providers need to be talking more about how to make sure that the AI that is already out there (like ChatGPT, Gemini, and Claude) is reliable, has enough guardrails, and is consumer-safe. More research needs to be done on how to encourage strategic caution among the public, so they can use the AI safely and avoid user mistakes with it (like overreliance or misunderstanding).
  • Regulators<->Frontline Providers. More frontline legal help providers need to be in conversation with regulators (like bar associations, attorneys general, or other state/federal agencies) to talk about their perspective on if and how AI can be useful in closing the justice gap. Their perspective on risks, consumer harms, opportunities, and needs from regulators can ensure that rules are being set to maximize positive impact and minimize consumer harm & technology chilling.
  • Bar Foundation Collaboration. Statewide funders (especially bar foundations) can be talking to each other about their funding, scaling, and AI strategies. Well-resourced bar foundations can share how they are distributing money, what kinds of projects they’re incentivizing, how they are holding the projects accountable, and what local resources or protocols they could share with others.

AI for Justice Should be Going Upstream & Going Big

Richard Susskind charged the group with thinking big about AI for justice. His charges & insights inspired many of the participants throughout the Summit, particularly on two points.

Going Big. Susskind called on legal leaders and technologists not to do piecemeal AI innovation (which might well be the default pathway). Rather, he called on them to work in coordination across the country (if not the globe). The focus should be on reimagining how to use AI as a way to make a fundamental, beneficial shift in justice services. This means not just doing small optimizations or tweaks, but shifting the system to work better for users and providers.

Susskind charged us with thinking beyond augmentation to models of serving the public with their justice needs.

Going Upstream. He also charged us with going upstream, figuring out more early ways to spot and get help to people. This means not just adding AI into the current legal aid or court workflow — but developing new service offerings, data links, or community partnerships. Can we prevent more legal problems by using AI before a small problem spirals into a court case or large conflict?

After Susskind’s remarks, I focused in on coordination among legal actors across the country for AI development. Compared to the last 20 years of legal technology development, are there ways to be more coordinated, and also more focused on impact and accountability?

There might be strategic leaders in different regions of the US and in different issue areas (housing, immigration, debt, family, etc.) who are spreading:

  • best practices
  • evaluation protocols and benchmarks
  • licensing arrangements with technology companies
  • bridges with the technology companies
  • conversations with the regulators.

How can the Access to Justice community be more organized so that their voice can be heard as

  • the rules of the road are being defined?
  • technology companies are building and releasing models that the public is going to be using?
  • technology vendors decide if and how they are going to enter this market, and what their pricing and licensing are going to look like?

Ideally, legal aid groups, courts, and bars will be collaborating together to build AI models, agents, and evaluations that can get a significant number of people the legal help they need to resolve their problems — and to ensure that the general, popular AI tools are doing a good job at helping people with their legal problems.

Privacy Engineering & Confidentiality Concerns

One of the main barriers to AI R&D for justice is confidentiality. Legal aid and other help providers have a duty to keep their clients’ data confidential, which restricts their ability to use past data to train models or to use current data to execute tasks through AI. In practice, many legal leaders are nervous about any new technology that requires client data: will it lead to data leaks, client harms, regulatory actions, bad press, or other concerning outcomes?

Our technology developers and researchers had cutting-edge proposals for privacy-forward AI development that could address some of these concerns around confidentiality. Though these privacy engineering strategies are foreign to many lawyers, the technologists broke them down into step-by-step explanations with examples, to help more legal professionals think about data protection in a systematic, engineering way.

Synthetic Data. One of the privacy-forward strategies discussed was synthetic data. With this solution, a developer doesn’t use real, confidential data to train a system. Rather, they create a parallel but fictional set of data — like a doppelganger to the original client data. It’s structurally similar to confidential client data, but it contains no real people’s information. Synthetic data is a common strategy in healthcare technology, where there is a similar emphasis on patient confidentiality.

Neel Guha explained to the participants how synthetic data works, and how they might build a synthetic dataset that is free of identifiable data and does not violate ethical duties to confidentiality. He emphasized that the more legal aid and court groups can develop datasets that are share-able to researchers and the public, the more that researchers and technologists will be attracted to working on justice-tech challenges. More synthetic datasets will both be ethically safe & beneficial to collaboration, scaling, and innovation.
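
As a minimal sketch of the idea, the open-source Faker library can fabricate records that mirror the structure of intake data while containing no real person's information. The fields and categories below are invented for illustration.

```python
# Minimal synthetic-data sketch: fabricate intake records that mirror the
# structure of real client data without containing any real person's
# information. Field names and categories are illustrative assumptions.
import random
from faker import Faker

fake = Faker()

CASE_TYPES = ["eviction", "security deposit", "repairs", "debt collection"]

def synthetic_intake_record() -> dict:
    return {
        "name": fake.name(),                       # fictional person
        "address": fake.address().replace("\n", ", "),
        "monthly_income": round(random.uniform(800, 4000), 2),
        "household_size": random.randint(1, 6),
        "case_type": random.choice(CASE_TYPES),
        "intake_date": fake.date_between(start_date="-1y", end_date="today").isoformat(),
    }

dataset = [synthetic_intake_record() for _ in range(1000)]
```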

Federated Model Training. Another privacy/confidentiality strategy was federated model training. The Google DeepMind team presented on this approach, drawing examples from the health system.

In one example, multiple hospitals wanted to work on the same project: training an AI model to better spot tuberculosis and other issues on lung X-rays. Each hospital wanted to train the model on its existing X-ray data, but did not want that confidential data to leave its servers for a centralized server. Sharing the data would break their confidentiality requirements.

So instead, the hospitals adopted a federated model training protocol. An initial version of the AI model was sent from the central server to each hospital’s local servers. The local copy of the model trained on that hospital’s X-ray data, and the updated model was then sent back to the central server, where the learning from every site accumulated into a single, smarter model. The local hospital data was never shared.

In this way, legal aid groups or courts could explore building a centralized model while still keeping their confidential data sources on their own private, secure servers. Individual case data stays local on each organization’s servers, while the collective model lives in a central place and gradually improves. The technique can also support ongoing training, so the model keeps learning as the data grows.
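
For the technically curious, the federated pattern described above can be sketched in a few lines: the center sends the current model to each site, each site computes an update on its own data, and the center averages the updates. The toy linear model below is an illustration of the pattern, not the presenters' system.

```python
# Toy federated-averaging sketch (FedAvg-style): each site trains on its own
# local data, and only model weights, never raw records, travel to the center.
# The linear model and random data here are stand-ins for illustration.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.01, epochs: int = 5) -> np.ndarray:
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights: np.ndarray, sites: list[tuple]) -> np.ndarray:
    # Each site receives the global model, trains locally, returns weights only.
    updates = [local_update(global_weights, X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    return np.average(updates, axis=0, weights=sizes)  # size-weighted average

rng = np.random.default_rng(0)
sites = [(rng.normal(size=(100, 3)), rng.normal(size=100)) for _ in range(4)]
weights = np.zeros(3)
for round_num in range(10):
    weights = federated_round(weights, sites)
```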

Towards the Next Year of AI for Access to Justice

The Legal Design Lab team thanks all of our participants and sponsors for a tremendous event. We learned so much and built new relationships that we look forward to deepening with more collaborations & projects.

We were excited to hear frontline providers walk away with new ideas, concrete plans for how to borrow from others’ AI pilots, and an understanding of what might be feasible. We were also excited to see new pro bono and funding relationships develop, that can unlock more resources in this space.

Stay tuned as we continue our work on AI R&D, evaluation, and community-building in the access to justice community. We look forward to working towards closing the justice gap, through technology and otherwise!


Housing Law experts wanted for AI evaluation research

We are recruiting Housing Law experts to participate in a study of AI answers to landlord-tenant questions. Please sign up here if you are a housing law practitioner interested in this study.

Experts who participate in interviews and AI-ranking sessions will receive Amazon gift cards for their participation.


Design Workbook for Legal Help AI Pilots

For our upcoming AI+Access to Justice Summit and our AI for Legal Help class, our team has made a new design workbook to guide people through scoping a new AI pilot.

We encourage others to use and explore this AI Design Workbook to help think through:

  • Use Cases and Workflows
  • Specific Legal Tasks that AI could do (or should not do)
  • User Personas, and how they might need or worry about AI — or how they might be affected by it
  • Data plans for training AI and for deploying it
  • Risks, laws, ethics brainstorming about what could go wrong or what regulators might require, and mitigation/prevention plans to proactively deal with these concerns
  • Quality and Efficiency Benchmarks to aim for with a new intervention (and how to compare the tech with the human service)
  • Support needed to go into the next phases of tech prototyping and pilot deployment

Responsible AI development should be going through these 3 careful stages — design and policy research, tech prototyping and benchmark evaluation, and piloting in a controlled, careful way. We hope this workbook can be useful to groups who want to get started on this journey!


Interviewing Legal Experts on the Quality of AI Answers

This month, our team commenced interviews with landlord-tenant subject matter experts, including court help staff, legal aid attorneys, and hotline operators. These experts are comparing and rating various AI responses to commonly asked landlord-tenant questions that individuals may get when they go online to find help.

Learned Hands Battle Mode

Our team has developed a new ‘Battle Mode’ of our rating/classification platform Learned Hands. In a Battle Mode game on Learned Hands, experts compare two distinct AI answers to the same user’s query and determine which one is superior. Additionally, we have the experts speak aloud as they are playing, asking that they articulate their reasoning. This allows us to gain insights into why a particular response is deemed good or bad, helpful or harmful.
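
For readers wondering how pairwise “battle” judgments become a ranking, one common approach is an Elo-style update over the comparison results. The sketch below is a generic illustration, not the Learned Hands platform's own scoring code.

```python
# Generic Elo-style aggregation of pairwise "battle" results into a ranking.
# This is an illustrative sketch, not the Learned Hands platform's own code.
from collections import defaultdict

def elo_rankings(battles: list[tuple], k: float = 32.0) -> dict:
    """battles: list of (winner_model, loser_model) pairs from expert judgments."""
    ratings = defaultdict(lambda: 1500.0)
    for winner, loser in battles:
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1.0 - expected_win)
        ratings[loser]  -= k * (1.0 - expected_win)
    return dict(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))

example = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
print(elo_rankings(example))
```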

Our group will be publishing a report that evaluates the performance of various AI models in answering everyday landlord-tenant questions. Our goal is to establish a standardized approach for auditing and benchmarking AI’s evolving ability to address people’s legal inquiries. This standardized approach will be applicable to major AI platforms, as well as local chatbots and tools developed by individual groups and startups. By doing so, we hope to refine our methods for conducting audits and benchmarks, ensuring that we can accurately assess AI’s capabilities in answering people’s legal questions.

Instead of speculating about potential pitfalls, we aim to hear directly from on-the-ground experts about how these AI answers might help or harm a tenant who has gone onto the Internet to problem-solve. This means regular, qualitative sessions with housing attorneys and service providers, to have them closely review what AI is telling people when asked for information on a landlord-tenant problem. These experts have real-world experience in how people use (or don’t) the information they get online, from friends, or from other experts — and how it plays out for their benefit or their detriment. 

We also believe that regular review by experts can help us spot concerning trends as early as possible. AI answers might change in the coming months & years. We want to keep an eye on the evolving trends in how large tech companies’ AI platforms respond to people’s legal help problem queries, and have front-line experts flag where there might be a big harm or benefit that has policy consequences.

Stay tuned for the results of our expert-led rating games and feedback sessions!

If you are a legal expert in landlord-tenant law, please sign up to be one of our expert interviewees below.

https://airtable.com/embed/appMxYCJsZZuScuTN/pago0ZNPguYKo46X8/form


Autumn 24 AI for Legal Help

Our team is excited to announce the new, 2024-25 version of our ongoing class, AI for Legal Help. This school year, we’re moving from background user and expert research towards AI R&D and pilot development.

Can AI increase access to justice, by helping people resolve their legal problems in more accessible, equitable, and effective ways? What are the risks that AI poses for people seeking legal guidance, that technical and policy guardrails should mitigate?

In this course, students will design and develop new demonstration AI projects and pilot plans, combining human-centered design, tech & data work, and law & policy knowledge. 

Students will work on interdisciplinary teams, each partnered with frontline legal aid and court groups interested in using AI to improve their public services. Student teams will help their partners scope specific AI projects, spot and mitigate risks, train a model, test its performance, and think through a plan to safely pilot the AI. 

By the end of the class, students and their partners will co-design new tech pilots to help people dealing with legal problems like evictions, reentry from the criminal justice system, debt collection, and more.

Students will get experience in human-centered AI development, and critical thinking about if and how technology projects can be used in helping the public with a high-stakes legal problem. Along with their AI pilot, teams will establish important guidelines to ensure that new AI projects are centered on the needs of people, and developed with a careful eye towards ethical and legal principles.

Join our policy lab team to do R&D that defines the future of AI for legal help. Apply for the class at the SLS Policy Lab link here.