
Legal Aid Intake & Screening AI

A Report on an AI-Powered Intake & Screening Workflow for Legal Aid Teams 

AI for Legal Help, Legal Design Lab, 2025

This report provides a write-up of the AI for Housing Legal Aid Intake & Screening class project, which was one track of the “AI for Legal Help” Policy Lab during the Autumn 2024 and Winter 2025 quarters. The AI for Legal Help course involved working with legal aid and court groups that provide legal help services to the public, to understand where responsible AI innovations might be possible and to design and prototype initial solutions, along with pilot and evaluation plans.

One of the project tracks focused on improving the workflows of legal aid teams that provide housing help, particularly their struggle with high demand from community members and a lack of clarity about whether, and how, a person can be served by the legal aid group. Between Autumn 2024 and Winter 2025, an interdisciplinary team of Stanford University students partnered with the Legal Aid Society of San Bernardino (LASSB) to understand the current design of housing intake & screening, and to propose an improved, AI-powered workflow.

This report details the problem identified by LASSB, the proposed AI-powered intake & screening workflow developed by the student team, and recommendations for future development and implementation. 

We share it in the hopes that legal aid and court help center leadership might also be interested in exploring responsible AI development for intake & screening, and that funders, researchers, and technologists might collaborate on developing and testing successful solutions for this task.

Thank you to the students on this team: Favour Nerisse, Gretel Cannon, Tatiana Zhang, and other collaborators. And a big thank you to our LASSB colleagues: Greg Armstrong, Pablo Ramirez, and more.

Introduction

The Legal Aid Society of San Bernardino (LASSB) is a nonprofit law firm serving low-income residents across San Bernardino and Riverside Counties, where housing issues – especially evictions – are the most common legal problems facing the community. Like many legal aid organizations, LASSB operates under severe resource constraints and high demand.

In the first half of 2024 alone, LASSB assisted over 1,200 households (3,261 individuals) with eviction prevention and landlord-tenant support. Yet many more people seek help than LASSB can serve, and those who do seek help often face barriers like long hotline wait times or lack of transportation to clinics. These challenges make the intake process – the initial screening and information-gathering when a client asks for help – a critical bottleneck. If clients cannot get through intake or are screened out improperly, they effectively have no access to justice.

Against this backdrop, LASSB partnered with a team of Stanford students in the AI for Legal Help practicum to explore an AI-based solution. The task selected was housing legal intake: using an AI “Intake Agent” to streamline eligibility screening and initial fact-gathering for clients with housing issues (especially evictions). The proposed solution was a chatbot-style AI assistant that could interview applicants about their legal problem and situation, apply LASSB’s intake criteria, and produce a summary for legal aid staff. By handling routine, high-volume intake questions, the AI agent aimed to reduce client wait times and expand LASSB’s reach to those who can’t easily come in or call during business hours. The students planned a phased evaluation and implementation: first prototyping the agent with sample data, then testing its accuracy and safety with LASSB staff, before moving toward a limited pilot deployment. This report details the development of that prototype AI Intake Agent across the Autumn and Winter quarters, including the use case rationale, current vs. future workflow, technical design, evaluation findings, and recommendations for next steps.

1: The Use Case – AI-Assisted Housing Intake

Defining the Use Case of Intake & Screening

The project focused on legal intake for housing legal help, specifically tenants seeking assistance with eviction or unsafe housing. Intake is the process by which legal aid determines who qualifies for help and gathers the facts of their case. For a tenant facing eviction, this means answering questions about income, household, and the eviction situation, so the agency can decide if the case falls within their scope (for example, within income limits and legal priorities).

Intake is a natural first use case because it is a gateway to justice: a short phone interview or online form is often all that stands between a person in crisis and the help they need. Yet many people never complete this step due to practical barriers (long hold times, lack of childcare or transportation, fear or embarrassment). 

By improving intake, LASSB could assist more people early, preventing more evictions or legal problems from escalating.

Why LASSB Chose Housing Intake 

LASSB and the student team selected the housing intake scenario for several reasons. First, housing is LASSB’s highest-demand area – eviction defense was 62% of cases for a neighboring legal aid and similarly dominant for LASSB. This high volume means intake workers spend enormous time screening housing cases, and many eligible clients are turned away simply because staff can’t handle all the calls. Improving intake throughput could thus have an immediate impact. Second, housing intake involves highly repetitive and rules-based questions (e.g. income eligibility, case type triage) that are well-suited to automation. These are precisely the kind of routine, information-heavy tasks that AI can assist with at scale. 

Third, an intake chatbot could increase privacy and reach: clients could complete intake online 24/7, at their own pace, without waiting on hold or revealing personal stories to a stranger right away. This could especially help those in rural areas or those uncomfortable with an in-person or phone interview. In short, housing intake was seen as a high-impact, AI-ready use case where automation might improve efficiency while preserving quality of service.

Why Intake Matters for Access to Justice

Intake may seem mundane, but it is a cornerstone of access to justice. It is the “front door” of legal aid – if the door is locked or the line too long, people simply don’t get help. Studies show that only a small fraction of people with civil legal issues ever consult a lawyer, often because they don’t recognize their problem as legal or face obstacles seeking help. Even among those who do reach out to legal aid (nearly 2 million requests in 2022), about half are turned away due to insufficient resources. Many turn-aways happen at the intake stage, when agencies must triage cases. Improving intake can thus shrink the “justice gap” by catching more issues early and providing at least some guidance to those who would otherwise get nothing. 

Moreover, a well-designed intake process can empower clients – by helping them tell their story, identifying their urgent needs, and connecting them to appropriate next steps. On the flip side, a bad intake experience (confusing questions, long delays, or perfunctory denials) can discourage people from pursuing their rights, effectively denying justice. By focusing on intake, the project aimed to make the path to legal help smoother and more equitable.

Why AI Is a Good Fit for Housing Intake

Legal intake involves high volume, repetitive Q&A, and standard decision rules, which are conditions where AI can excel. A large language model (LLM) can be programmed to ask the same questions an intake worker would, in a conversational manner, and interpret the answers. 

Because LLMs can process natural language, an AI agent can understand a client’s narrative of their housing problem and spot relevant details or legal issues (e.g. identifying an illegal lockout vs. a formal eviction) to ask appropriate follow-ups. This dynamic questioning is something LLMs have demonstrated success in – for example, a recent experiment in Missouri showed that an LLM could generate follow-up intake questions “in real-time” based on a user’s description, like asking whether a landlord gave formal notice after a tenant said “I got kicked out.” AI can also help standardize decisions: by encoding eligibility rules into the prompt or system, it can apply the same criteria every time, potentially reducing inconsistent screening outcomes. Importantly, initial research found that GPT-4-based models could predict legal aid acceptance/rejection decisions with about 84% accuracy, and they erred on the side of caution (usually not rejecting a case unless clearly ineligible). This suggests AI intake systems can be tuned to minimize false denials, a critical requirement for fairness.

Beyond consistency and accuracy, AI offers scalability and extended reach. Once developed, an AI intake agent can handle multiple clients at once, anytime. For LASSB, this could mean a client with an eviction notice can start an intake at midnight rather than waiting anxious days for a callback. Other legal aid groups have already seen the potential: Legal Aid of North Carolina’s chatbot “LIA” has engaged in over 21,000 conversations in its first year, answering common legal questions and freeing up staff time. LASSB hopes for similar gains – the Executive Director noted plans to test AI tools to “reduce client wait times” and extend services to rural communities that in-person clinics don’t reach. Finally, an AI intake agent can offer a degree of client comfort – some individuals might prefer typing out their story to a bot rather than speaking to a person, especially on sensitive issues like domestic violence intersecting with an eviction. In summary, the volume, repetitive structure, and outreach potential of intake made it an ideal candidate for an AI solution.

2: Status Quo and Future Vision

Current Human-Led Workflow 

At present, LASSB’s intake process is entirely human-driven. A typical workflow might begin with a client calling LASSB’s hotline or walking into a clinic. An intake coordinator or paralegal then screens for eligibility, asking a series of standard questions: Are you a U.S. citizen or eligible immigrant? What is your household size and income? What is your zip code or county? What type of legal issue do you have? These questions correspond to LASSB’s internal eligibility rules (for example, income below a percentage of the poverty line, residence in the service area, and case type within program priorities). 

The intake worker usually follows a scripted guide – these guides can run 7+ pages of rules and flowcharts for different scenarios. If the client passes initial screening, the staffer moves on to information-gathering: taking down details of the legal problem. In a housing case, they might ask: “When did you receive the eviction notice? Did you already go to court? How many people live in the unit? Do you have any disabilities or special circumstances?” This helps determine the urgency and possible defenses (for instance, disability could mean a reasonable accommodation letter might help, or a lockout without court order is illegal). The intake worker must also gauge if the case fits LASSB’s current priorities or grant requirements – a subtle judgment call often based on experience. 

Once information is collected, the case is handed off internally: if it’s straightforward and within scope, they may schedule the client for a legal clinic or assign a staff attorney for advice. If it’s a tougher or out-of-scope case, the client might be given a referral to another agency or a “brief advice” appointment where an attorney only gives counsel and not full representation. In some instances, there are multiple handoffs – for example, the person who does the phone screening might not be the one who ultimately provides the legal advice, requiring good note-taking and case summaries.

User Personas in the Workflow

The team crafted sample user and staff personas describing who would be interacting with the new workflow and AI agent.


Pain Points in the Status Quo

This human-centric process has several pain points identified by LASSB and the student team. 

First, it’s slow and resource-intensive. Clients can wait an hour or more on hold before even speaking to an intake worker during peak times, such as when an eviction moratorium change causes a surge in calls. Staff capacity is limited – a single intake worker can only handle one client at a time, and each interview might take 20–30 minutes. If the client is ultimately ineligible, that time is effectively “wasted” – time that could have been spent on an eligible client. The sheer volume means many callers never get through at all.

Second, the complexity of rules can lead to inconsistent or suboptimal outcomes. Intake staff have to juggle 30+ eligibility rules, which can change with funding or policy shifts. Important details might be missed or misapplied; for example, a novice staffer might turn away a case that seems outside scope but actually fits an exception. Indeed, variability in intake decisions was a known issue – one research project found that LLMs sometimes caught errors made by human screeners (e.g., the AI recognized a case was eligible when a human mistakenly marked it as not). 

Third, the process can be stressful for clients. Explaining one’s predicament (like why rent is behind) to a stranger can be intimidating. Clients in crisis might forget to mention key facts or have trouble understanding the questions. If a client has trauma (such as a domestic violence survivor facing eviction due to abuse), a blunt interview can inadvertently re-traumatize them. LASSB intake staff are trained to be sensitive, but in the rush of high volume, the experience may still feel hurried or impersonal. 

Finally, timing and access are issues. Intake typically happens during business hours via phone or at specific clinic times. People who work, lack a phone, or have disabilities may struggle to engage through those channels. Language barriers can also be an issue; while LASSB offers services in Spanish and other languages, matching bilingual staff to every call is challenging. All these pain points underscore a need for a more efficient, user-friendly intake system.

Envisioned Human-AI Workflow

In the future-state vision, LASSB’s intake would be a human-AI partnership, blending automation with human judgment. The envisioned workflow goes as follows: A client in need of housing help would first interact with an AI Intake Agent, likely through a web chat interface (or possibly via a self-help kiosk or mobile app). 

The AI agent would greet the user with a friendly introduction (making clear it’s an automated assistant) and guide them through the eligibility questions – e.g., asking for their income range, household size, and problem category. These could even be answered via simple buttons or quick replies to make it easy. The agent would use these answers to do an initial screening (following the same rules staff use). If clearly ineligible (for instance, the person lives outside LASSB’s service counties), the agent would not simply turn them away. Instead, it might gently inform them that LASSB likely cannot assist directly and provide a referral link or information for the appropriate jurisdiction. (Crucially, per LASSB’s guidance, the AI would err on inclusion – if unsure, it would mark the case for human review rather than issuing a flat denial.) 

For those who pass the basic criteria, the AI would proceed to collect case facts: “Please describe what’s happening with your housing situation.” As the user writes or speaks (in a typed chat or possibly voice in the future), the AI will parse the narrative and ask smart follow-ups. For example, if the client says “I’m being evicted for not paying rent,” the AI might follow up: “Have you received court papers (an unlawful detainer lawsuit) from your landlord, or just a pay-or-quit notice?” – aiming to distinguish a looming eviction from an active court case. This dynamic Q&A continues until the AI has enough detail to fill out an intake template (or until it senses diminishing returns from more questions). The conversation is designed to feel like a natural interview with empathy and clarity.

After gathering info, the handoff to humans occurs. The AI will compile a summary of the intake: key facts like names, important dates (e.g., eviction hearing date if any), and the client’s stated goals or concerns. It may also tentatively flag certain legal issues or urgency indicators – for instance, “Client might qualify for a disability accommodation defense” or “Lockout situation – urgent” – based on what it learned. This summary and the raw Q&A transcript are then forwarded to LASSB’s intake staff or attorneys. A human will review the package, double-check eligibility (the AI’s work is a recommendation, not final), and then follow up with the client. In some cases, the AI might be able to immediately route the client: for example, scheduling them for the next eviction clinic or providing a link to self-help resources while they wait.

But major decisions, like accepting the case for full representation or giving legal advice, remain with human professionals. The human staff thus step in at the “decision” stage with a lot of the grunt work already done. They can spend their time verifying critical details and providing counsel, rather than laboriously collecting background info. This hybrid workflow means clients get faster initial engagement (potentially instantaneous via AI, instead of waiting days for a call) and staff time is used more efficiently where their expertise is truly needed.

Feedback-Shaped Vision

The envisioned workflow was refined through feedback from LASSB stakeholders and experts during the project. Early on, LASSB’s attorneys emphasized that high-stakes decisions must remain human – for instance, deciding someone is ineligible or giving them legal advice about what to do would require a person. This feedback led the team to build guardrails so the AI does not give definitive legal conclusions or turn anyone away without human oversight. Another piece of feedback was about tone and trauma-informed practice. LASSB staff noted that many clients are distressed; a cold or robotic interview could alienate them. In response, the team made the AI’s language extra supportive and user-friendly, adding polite affirmations (“Thank you for sharing that information”) and apologies (“I’m sorry you’re dealing with this”) where appropriate. 

They also ensured the AI would ask for sensitive details in a careful way and only if necessary. For example, rather than immediately asking “How much is your income?” which might feel intrusive, the AI might first explain “We ask income because we have to confirm eligibility – roughly what is your monthly income?” to give context. The team also got input on workflow integration – intake staff wanted the AI system to feed into their existing case management software (LegalServer) so that there’s no duplication of data entry. This shaped the plan for implementation (i.e., designing the output in a format that can be easily transferred). Finally, feedback from technologists and the class instructors encouraged the use of a combined approach (rules + AI). This meant not relying on the AI alone to figure out eligibility from scratch, but to use simple rule-based checks for clear-cut criteria (citizenship, income threshold) and let the AI focus on understanding the narrative and generating follow-up questions. 

This hybrid approach was validated by outside research as well. All of these inputs helped refine the future workflow into one that is practical, safe, and aligned with LASSB’s needs: AI handles the heavy lifting of asking and recording, while humans handle the nuanced judgment calls and personal touch.
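
To make the division of labor concrete, a minimal sketch of the rule-based side of this hybrid is shown below. The thresholds, county list, and field names are illustrative assumptions for this report, not LASSB’s actual eligibility rules, which would continue to live in its intake manual.

```python
# Illustrative sketch of the "rules + AI" split: clear-cut criteria are checked
# deterministically before any model call, and only the open-ended narrative is
# handed to the language model. Thresholds, counties, and field names are
# hypothetical placeholders, not LASSB's actual eligibility rules.
from dataclasses import dataclass

SERVICE_COUNTIES = {"San Bernardino", "Riverside"}  # assumed service area
INCOME_LIMIT_MULTIPLIER = 1.25                      # e.g., 125% of a poverty guideline

@dataclass
class ScreeningAnswers:
    county: str
    household_size: int
    monthly_income: float
    poverty_guideline_monthly: float  # looked up for the household size

def rule_based_prescreen(a: ScreeningAnswers) -> str:
    """Return 'pass', 'refer_out', or 'human_review' -- never a flat denial."""
    if a.county not in SERVICE_COUNTIES:
        # Outside the service area: offer a referral, not a rejection.
        return "refer_out"
    if a.monthly_income > a.poverty_guideline_monthly * INCOME_LIMIT_MULTIPLIER:
        # Over-income cases may still qualify under exceptions,
        # so they are flagged for staff rather than screened out.
        return "human_review"
    return "pass"
```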


3: Prototyping and Technical Work

Initial Concepts from Autumn Quarter 

During the Autumn 2024 quarter, the student team explored the problem space and brainstormed possible AI interventions for LASSB. The partner had come with a range of ideas, including using AI to assist with emergency eviction filings. One early concept was an AI tool to help tenants draft a “motion to set aside” a default eviction judgment – essentially, a last-minute court filing to stop a lockout. This is a high-impact task (it can literally keep someone housed), but also high-risk and time-sensitive. Through discussions with LASSB, the team realized that automating such a critical legal document might be too ambitious as a first step – errors or bad advice in that context could have severe consequences. 

Moreover, to draft a motion, the AI would still need a solid intake of facts to base it on. This insight refocused the team on the intake stage as the foundation. Another concept floated was an AI that could analyze a tenant’s story to spot legal defenses (for example, identifying if the landlord failed to make repairs as a defense to nonpayment). While appealing, this again raised the concern of false negatives (what if the AI missed a valid defense?) and overlapped with legal advice. Feedback from course mentors and LASSB steered the team toward a more contained use case: improving the intake interview itself.

By the end of Autumn quarter, the students presented a concept for an AI intake chatbot that would ask clients the right questions and produce an intake summary for staff. The concept kept human review in the loop, aligning with the consensus that AI should support, not replace, the expert judgment of LASSB’s legal team.

Revised Scope in Winter 

Going into Winter quarter, the project’s scope was refined and solidified. The team committed to a limited use case – the AI would handle initial intake for housing matters only, and it would not make any final eligibility determinations or provide legal advice. All high-stakes decisions were deferred to staff. For example, rather than programming the AI to tell a client “You are over income, we cannot help,” the AI would instead flag the issue for a human to confirm and follow up with a personalized referral if needed. Likewise, the AI would not tell a client “You have a great defense, here’s what to do” – instead, it might say, “Thank you, someone from our office will review this information and discuss next steps with you.” By narrowing the scope to fact-gathering and preliminary triage, the team could focus on making the AI excellent at those tasks, while minimizing ethical risks.

They also limited the domain to housing (evictions, landlord/tenant issues) rather than trying to cover every legal issue LASSB handles. This allowed the prototype to be more finely tuned with housing-specific terminology and questions.

The Winter quarter also shifted toward implementation details – deciding on the tech stack and data inputs – now that the “what” was determined. The result was a clear mandate: build a prototype AI intake agent for housing that asks the right questions, captures the necessary data, and hands off to humans appropriately.

Prototype Development Details 

The team developed the prototype using a combination of Google’s Vertex AI platform and custom scripting. Vertex AI was chosen in part for its enterprise-grade security (important for client data) and its support for large language model deployment. Using Vertex AI’s generative AI tools, the students configured a chatbot with a predefined prompt that established the AI’s role and instructions. For example, the system prompt instructed: “You are an intake assistant for a legal aid organization. Your job is to collect information from the client about their housing issue, while being polite, patient, and thorough. You do not give legal advice or make final decisions. If the user asks for advice or a decision, you should defer and explain a human will help with that.” This kind of prompt served as a guardrail for the AI’s behavior.
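
The exact prototype configuration is not reproduced here, but a minimal sketch of how such a role-and-guardrail prompt might be wired up, assuming the Vertex AI Python SDK’s generative_models interface and a placeholder project and model name, looks roughly like this:

```python
# Minimal sketch, assuming the Vertex AI Python SDK (vertexai.generative_models).
# The project ID, model name, and instruction wording are placeholders, not the
# prototype's actual configuration.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project

SYSTEM_INSTRUCTION = (
    "You are an intake assistant for a legal aid organization. "
    "Collect information about the client's housing issue politely, patiently, and thoroughly. "
    "Do not give legal advice or make final eligibility decisions. "
    "If the user asks for advice or a decision, explain that a human will help with that."
)

model = GenerativeModel("gemini-1.5-pro", system_instruction=SYSTEM_INSTRUCTION)
chat = model.start_chat()
reply = chat.send_message("Hi, I just got an eviction notice and I don't know what to do.")
print(reply.text)
```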

They also input a structured intake script derived from LASSB’s actual intake checklist. This script included key questions (citizenship, income, etc.) and conditional logic – for instance, if the client indicated a domestic violence issue tied to housing, the AI should ask a few DV-related questions (given LASSB has special protocols for DV survivors). Some of this logic was handled by embedding cues in the prompt like: “If the client mentions domestic violence, express empathy and ensure they are safe, then ask if they have a restraining order or need emergency assistance.” The team had to balance not making the AI too rigidly scripted (losing the flexibility of natural conversation) with not leaving it totally open-ended (which could lead to random or irrelevant questions). They achieved this by a hybrid approach: a few initial questions were fixed and rule-based (using Vertex AI’s dialogue flow control), then the narrative part used the LLM’s generative ability to ask appropriate follow-ups. 
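
One way to keep those fixed questions and branches auditable is to represent the script as plain data rather than burying it all in the prompt. The structure below is a simplified, hypothetical rendering of that idea, not LASSB’s actual checklist:

```python
# Simplified, hypothetical rendering of a scripted intake with a conditional
# branch (extra questions when domestic violence is mentioned). Question text
# and keywords are illustrative, not LASSB's actual checklist.
INTAKE_SCRIPT = [
    {"id": "citizenship", "question": "Are you a U.S. citizen or eligible immigrant?"},
    {"id": "household", "question": "How many people live in your household?"},
    {"id": "income", "question": "Roughly what is your monthly household income? "
                                 "We ask because we have income limits to confirm eligibility."},
    {"id": "issue", "question": "Please describe what's happening with your housing situation."},
]

DV_FOLLOW_UPS = [
    "I'm sorry you're dealing with this. Are you safe right now?",
    "Do you have a restraining order, or do you need emergency assistance?",
]

def next_questions(answers: dict) -> list:
    """Fixed questions first; add the DV branch if the narrative mentions abuse."""
    pending = [q["question"] for q in INTAKE_SCRIPT if q["id"] not in answers]
    narrative = answers.get("issue", "").lower()
    if any(term in narrative for term in ("domestic violence", "abuse", "abusive")):
        pending += DV_FOLLOW_UPS
    return pending
```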

The sample data used to develop and test the bot included a set of hypothetical client scenarios. The students wrote out example intakes (based on real patterns LASSB described) – e.g., “Client is a single mother behind 2 months rent after losing job; received 3-day notice; has an eviction hearing in 2 weeks; also mentions apartment has mold”. They fed these scenarios to the chatbot during development to see how it responded. This helped them identify gaps – for example, early versions of the bot forgot to ask whether the client had received court papers, and sometimes it didn’t ask about deadlines like a hearing date. Each iteration, they refined the prompt or added guidance until the bot consistently covered those crucial points.

Key Design Decisions

A number of design decisions were made to ensure the AI agent was effective and aligned with LASSB’s values.

Trauma-Informed Questioning 

The bot’s dialogue was crafted to be empathetic and empowering. Instead of bluntly asking “Why didn’t you pay your rent?,” it would use a non-judgmental tone: “Can you share a bit about why you fell behind on rent? (For example, loss of income, unexpected expenses, etc.) This helps us understand your situation.” 

The AI was also set to avoid repetitive pressing on distressing details. If a client had already said plenty about a conflict with their landlord, the AI would acknowledge that (“Thank you, I understand that must be very stressful”) and not re-ask the same thing just to fill a form. These choices were informed by trauma-informed lawyering principles LASSB adheres to, aiming to make clients feel heard and not blamed.

Tone and Language 

The AI speaks in plain, layperson’s language, not legalese. Internal rules like “FPI at 125% for XYZ funding” were translated into simple terms or hidden from the user. For instance, instead of asking “Is your income under 125% of the federal poverty guidelines?” the bot asks “Do you mind sharing your monthly income (approximately)? We have income limits to determine eligibility.” It also explains why it’s asking things, to build trust. The tone is conversational but professional – akin to a friendly paralegal. 

The team included some small talk elements at the start (“I’m here to help you with your housing issue. I will ask some questions to understand your situation.”) to put users at ease. Importantly, the bot never pretends to be a lawyer or a human; it was transparent that it’s a virtual assistant helping gather info for the legal aid.

Guardrails

Several guardrails were programmed to keep the AI on track. A major one was a do-not-do list in the prompt: do not provide legal advice, do not make guarantees, do not deviate into unrelated topics even if user goes off-track. If the user asked a legal question (“What should I do about X?”), the bot was instructed to reply with something like: “I’m not able to give legal advice, but I will record your question for our attorneys. Let’s focus on getting the details of your situation, and our team will advise you soon.” 

Another guardrail was content moderation – e.g., if a user described intentions of self-harm or violence, the bot would give a compassionate response and alert a human immediately. Vertex AI’s content filter was leveraged to catch extreme situations. Additionally, the bot was prevented from asking for information that LASSB staff said they never need at intake (to avoid over-intrusive behavior). For example, it wouldn’t ask for Social Security Number or any passwords, etc., which also helps with security.
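
A hedged sketch of how those guardrails might be expressed in code is shown below; the safety thresholds, red-flag phrases, and deflection wording are assumptions for illustration, with the SafetySetting classes drawn from the Vertex AI Python SDK the team describes using.

```python
# Sketch of two guardrail layers: platform-level content filtering (Vertex AI
# safety settings, passed to GenerativeModel(..., safety_settings=...)) and an
# application-level check that deflects advice requests and escalates red flags.
# Thresholds, phrases, and wording are illustrative assumptions.
from vertexai.generative_models import SafetySetting, HarmCategory, HarmBlockThreshold

SAFETY_SETTINGS = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
]

ADVICE_DEFLECTION = (
    "I'm not able to give legal advice, but I will record your question for our attorneys. "
    "Let's focus on the details of your situation, and our team will advise you soon."
)

RED_FLAG_PHRASES = ("hurt myself", "end my life", "doing something drastic")  # simplistic example

def check_user_message(text: str):
    """Return an override reply (and, in a real system, alert a human) if needed."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in RED_FLAG_PHRASES):
        return ("I'm really sorry you're going through this. "
                "I'm alerting our team right now so someone can help you directly.")
    if "what should i do" in lowered:
        return ADVICE_DEFLECTION
    return None  # no override; continue the normal intake conversation
```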

User Flow and Interface

The user flow was deliberately kept simple. The prototype interface (tested in a web browser) would show one question at a time, and allow the user to either type a response or select from suggested options when applicable. The design avoids giant text boxes that might overwhelm users; instead, it breaks the interview into bite-sized exchanges (a principle from online form usability). 

After the last question, the bot would explicitly ask “Is there anything else you want us to know?” giving the user a chance to add details in their own words. Then the bot would confirm it has what it needs and explain the next steps: e.g., “Thank you for all this information. Our legal team will review it immediately. You should receive a call or email from us within 1 business day. If you have an urgent court date, you can also call our hotline at …” This closure message was included to ensure the user isn’t left wondering what happens next, a common complaint with some automated systems.

Risk Mitigation

The team reviewed what could go wrong: what risks of harm might an intake agent pose? They then brainstormed the design, tech, and policy decisions that could mitigate each of those risks.

Risks and Mitigations: Screening Agent

Risk: The client is monolingual, does not understand the AI’s questions, and does not provide sufficient or correct information to the Agent.
Mitigation: We are working toward the Screening Agent having multilingual capabilities, particularly Spanish-language skills.

Risk: The client is vision or hearing impaired and the Screening Agent does not understand the client.
Mitigation: The Screening Agent has voice-to-text for vision-impaired clients and text-based options for hearing-impaired clients. We can also train the Screening Agent to produce a list of questions it did not get answers to and route those to a Paralegal to ask.

Risk: The Screening Agent does not understand the client properly and generates incorrect information.
Mitigation: The Screening Agent will confirm and spell back important identifying information, such as names and addresses. The Screening Agent will be programmed to route back to an intake worker or Paralegal if the AI cannot understand the client. A LASSB attorney will review and confirm any final product with the client.

Risk: The client is insulted or in some other way offended by the Screening Agent.
Mitigation: The Screening Agent’s scope is limited to the Screening questions. It will also be trained on trauma-informed care. LASSB should also obtain the client’s consent before referring them to the Screening Agent.

Training and Iteration

Notably, the team did not train a new machine learning model from scratch; instead they used a pre-existing LLM (from Vertex, analogous to GPT-4 or PaLM2) and focused on prompt engineering and few-shot examples to refine its performance. They created a few example dialogues as part of the prompt to show the AI what a good intake looks like. For instance, an example Q&A in the prompt might demonstrate the AI asking clarifying questions and the user responding, so the model could mimic that style. 

The prototype’s development was highly iterative: the students would run simulated chats (playing the user role themselves or with peers) and analyze the output. When the AI did something undesirable – like asking a redundant question or missing a key fact – they would adjust the instructions or add a conditional rule. They also experimented with model parameters like temperature (choosing a relatively low temperature for more predictable, consistent questioning rather than creative, off-the-cuff responses). Over the Winter quarter, dozens of test conversations were conducted.
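
For reference, a sketch of how a few-shot example and a low temperature might be supplied through the Vertex AI Python SDK follows; the model name, example dialogue, and parameter values are illustrative, not the prototype’s actual settings.

```python
# Sketch of supplying a few-shot example and a low temperature, assuming the
# Vertex AI Python SDK. Model name, dialogue, and values are illustrative.
from vertexai.generative_models import GenerativeModel, GenerationConfig

ROLE_INSTRUCTION = "You are an intake assistant for a legal aid organization; gather facts, do not advise."
FEW_SHOT_EXAMPLE = (
    "Example of a good intake exchange:\n"
    "Client: My landlord is kicking me out.\n"
    "Assistant: I'm sorry to hear that. Have you received court papers "
    "(an unlawful detainer lawsuit), or just a notice such as a 3-day pay-or-quit notice?"
)

model = GenerativeModel(
    "gemini-1.5-pro",  # placeholder model name
    system_instruction=[ROLE_INSTRUCTION, FEW_SHOT_EXAMPLE],
    generation_config=GenerationConfig(temperature=0.2, max_output_tokens=512),
)
```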

Midway, they also invited LASSB staff to test the bot with sample scenarios. An intake supervisor typed in a scenario of a tenant family being evicted after one member lost a job, and based on that feedback, the team tweaked the bot to be more sensitive when asking about income (the supervisor felt the bot should explicitly mention services are free and confidential, to reassure clients as they disclose personal info). The final prototype by March 2025 was able to handle a realistic intake conversation end-to-end: from greeting to summary output. 

The output was formatted as a structured text report (with sections for client info, issue summary, and any urgent flags) that a human could quickly read. The technical work thus culminated in a working demo of the AI intake agent ready for evaluation.
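
As an illustration of that handoff format, a hypothetical version of the structured summary is sketched below; the field names and report layout are assumptions, not the prototype’s exact schema.

```python
# Hypothetical shape of the structured intake summary handed off to staff.
# Field names and layout are illustrative, not the prototype's exact schema.
from dataclasses import dataclass, field

@dataclass
class IntakeSummary:
    client_name: str
    contact: str
    county: str
    issue_summary: str
    key_dates: dict = field(default_factory=dict)    # e.g., {"hearing": "2025-04-01"}
    urgent_flags: list = field(default_factory=list) # e.g., ["lockout - urgent"]
    eligibility_note: str = "Pending human review"

    def to_report(self) -> str:
        dates = ", ".join(f"{k}: {v}" for k, v in self.key_dates.items()) or "none"
        flags = "; ".join(self.urgent_flags) or "none"
        return (
            f"CLIENT INFO\n{self.client_name} | {self.contact} | {self.county}\n\n"
            f"ISSUE SUMMARY\n{self.issue_summary}\n\n"
            f"KEY DATES\n{dates}\n\n"
            f"URGENT FLAGS\n{flags}\n\n"
            f"ELIGIBILITY\n{self.eligibility_note}"
        )
```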

4: Evaluation and Lessons Learned

Evaluating Quality and Usefulness

The team approached evaluation on multiple dimensions – accuracy of the intake, usefulness to staff, user experience, and safety. 

First, the team created a quality rubric describing what ‘good’ or ‘bad’ performance would look like.

Good-Bad Rubric on Screening Performance

A successful agent will be able to obtain answers from the client for all relevant Screening questions in the format best suited to the client (i.e., verbally or in writing, and in English or Spanish). A successful agent will also be able to ask some open-ended questions about the client’s legal problem, to save time for the Housing Attorney and Clinic Attorney in discussing the client’s legal problem. Ultimately, a successful AI Screening agent will be able to perform pre-screening and Screening for clients.

✅A good Screening agent will be able to accurately detail all the client’s information and ensure that there are no mistakes in the spelling or otherwise of the information. 

❌A bad Screening agent would produce incorrect information and misunderstand the clients.  A bad solution would require the LASSB users to cross-check and amend lots of the information with the client.

✅A good Screening agent will be user-friendly for the clients in a format already familiar with the client, such as text or phone call.

❌ A bad Screening agent would require clients, many of whom may be unsophisticated, to use systems they are not familiar with and would be difficult to use.

✅A good Screening agent would be multilingual.

❌ A bad Screening agent would only understand clients who spoke very clearly and in a particular format.

✅ A good Screening agent would be accessible for clients with disabilities, including vision or audio impaired clients.  

❌A bad Screening agent would not be accessible to clients with disabilities. A bad solution would not be accessible on a client’s phone.

✅A good Screening agent will respond to the clients in a trauma-informed manner. A good AI Screening agent will appear kind and make the clients feel comfortable.

❌A bad Screening agent would offend the clients and make the clients reluctant to answer the questions.

✅A good Screening agent will produce a transcript of the interview that enables the LASSB attorneys and paralegals to understand the client’s situation efficiently. To do this, the agent could produce a summary of the key points from the Screening questions.  It is also important the transcript is searchable and easy to navigate so that the LASSB attorneys can easily locate information.

❌A bad Screening agent would produce a transcript that is difficult to navigate and identify key information.  For example, it may produce a large PDF that is not searchable and not provide any easy way to find the responses to the questions. 

✅A good Screening agent need not get through the questions as quickly as possible, but it must be able to redirect the client to the questions to ensure that the client answers all the necessary questions.

❌A bad Screening agent would get distracted from the clients’ responses and not obtain answers to all the questions.

In summary, the main metrics against which the Screening Agent should be measured include:

  1. Accuracy: whether the agent matches human performance or produces errors in fewer cases;
  2. User satisfaction: how happy the client & LASSB personnel using the agent are; and
  3. Efficiency: how much time the agent takes to obtain answers to all 114 pre-screening and Screening questions.
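
As a rough illustration of how the accuracy metric could be scored against expert-labeled scenarios, a small sketch follows; the scenario labels below are hypothetical stand-ins, not the team’s actual test set.

```python
# Rough sketch of scoring the agent's eligibility calls against expert labels.
# The scenario labels below are hypothetical stand-ins for the 16-scenario test set.
EXPECTED = {"s01": "eligible", "s02": "ineligible", "s03": "human_review"}     # expert labels
PREDICTED = {"s01": "eligible", "s02": "human_review", "s03": "human_review"}  # agent's calls

def eligibility_alignment(expected: dict, predicted: dict) -> float:
    """Share of scenarios where the agent matched the human expert."""
    return sum(expected[k] == predicted[k] for k in expected) / len(expected)

def false_denials(expected: dict, predicted: dict) -> int:
    """Critical-error count: eligible clients the agent outright rejected."""
    return sum(1 for k in expected
               if expected[k] == "eligible" and predicted[k] == "ineligible")

print(f"alignment: {eligibility_alignment(EXPECTED, PREDICTED):.0%}, "
      f"false denials: {false_denials(EXPECTED, PREDICTED)}")
```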

Testing the Prototype

To test accuracy, they compared the AI’s screening and issue-spotting to that of human experts. They prepared 16 sample intake scenarios (inspired by real cases, similar to what other researchers have done) and for each scenario they had a law student or attorney determine the expected “intake outcome” (e.g., eligible vs. not eligible, and key issues identified). Then they ran each scenario through the AI chatbot and examined the results. The encouraging finding was that the AI correctly identified eligibility in the vast majority of cases, and when uncertain, it appropriately refrained from a definitive judgment – often saying a human would review. For example, in a scenario where the client’s income was slightly above the normal cutoff but they had a disability (which could qualify them under an exception), the AI noted the income issue but did not reject the case; it tagged it for staff review. This behavior aligned with the design goal of avoiding false negatives. 

In fact, across the test scenarios, the AI never outright “turned away” an eligible client. At worst, it sometimes told an ineligible client that it “might not” qualify and a human would confirm – a conservative approach that errs on inclusion. In terms of issue-spotting, the AI’s performance was good but not flawless. It correctly zeroed in on the main legal issue (e.g., nonpayment eviction, illegal lockout, landlord harassment) in nearly all cases. In a few complex scenarios, it missed secondary issues – for instance, a scenario involved both eviction and a housing code violation (mold), and the AI summary focused on the eviction but didn’t highlight the possible habitability claim. When attorneys reviewed this, they noted a human intake worker likely would have flagged the mold issue for potential affirmative claims. This indicated a learning: the AI might need further training or prompts to capture all legal issues, not just the primary one.

To gauge usefulness and usability, the team turned to qualitative feedback. They had LASSB intake staff and a couple of volunteer testers act as users in mock intake interviews with the AI. Afterward, they surveyed them on the experience. The intake staff’s perspective was crucial: they reviewed the AI-generated summaries alongside what typical human intake notes would look like. The staff generally found the AI summaries usable and in many cases more structured than human notes. The AI provided a coherent narrative of the problem and neatly listed relevant facts (dates, amounts, etc.), which some staff said could save them a few minutes per case in writing up memos. One intake coordinator commented that the AI “asked all the questions I would have asked” in a standard tenancy termination case – a positive sign of completeness.

On the client side, volunteer testers noted that the AI was understandable and polite, though a few thought it was a bit “formal” in phrasing. This might reflect the fine line between professional and conversational tone – a point for possible adjustment. Importantly, testers reported that they “would be comfortable using this tool” and would trust that their information gets to a real lawyer. The presence of clear next-step messaging (that staff would follow up) seemed to reassure users that they weren’t just shouting into a void. The team also looked at efficiency metrics: In simulation, the AI interview took about 5–10 minutes of user time on average, compared to ~15 minutes for a typical phone intake. Of course, these were simulated users; real clients might take longer to type or might need more clarification. But it suggested the AI could potentially cut intake time by around 30-50% for straightforward cases, a significant efficiency gain.

Benchmarks for AI Performance

In designing evaluation, the team drew on emerging benchmarks in the AI & justice field. They set some target benchmarks such as: 

  • Zero critical errors (no client who should be helped is mistakenly rejected by the AI, and no obviously wrong information given), 
  • at least 80% alignment with human experts on identifying case eligibility (they achieved ~90% in testing), and 
  • high user satisfaction (measured informally via feedback forms). 

For safety, a benchmark was that the AI should trigger human intervention in 100% of cases where certain red flags appear (like mention of self-harm or urgent safety concerns). In test runs, there was one scenario where a client said something like “I have nowhere to go, I’m so desperate I’m thinking of doing something drastic.” 

The AI appropriately responded with empathy and indicated that it would notify the team for immediate assistance – meeting the safety benchmark. Another benchmark was privacy and confidentiality – the team checked that the AI was not inadvertently storing data outside approved channels. All test data was kept in a sandbox environment and they planned that any actual deployment would comply with confidentiality policies (e.g., not retaining chat transcripts longer than needed and storing them in LASSB’s secure system).

Feedback from Attorneys and Technologists

The prototype was demonstrated to a group of LASSB attorneys, intake staff, and a few technology advisors in late Winter quarter. The attorneys provided candid feedback. One housing lawyer was initially skeptical – concerned an AI might miss the human nuance – but after seeing the demo, they remarked that “the output is like what I’d expect from a well-trained intern or paralegal.” They appreciated that the AI didn’t attempt to solve the case but simply gathered information systematically. Another attorney asked about bias – whether the AI might treat clients differently based on how they talk (for instance, if a client is less articulate, would the AI misunderstand?). 

In response, the team showed how the AI asks gentle clarifying questions if it’s unsure, and they discussed plans for continuous monitoring to catch any biased outcomes. The intake staff reiterated that the tool could be very helpful as an initial filter, especially during surges. They did voice a concern: “How do we ensure the client’s story is accurately understood?” This led to a suggestion that in the pilot phase, staff double-check key facts with the client (“The bot noted you got a 3-day notice on Jan 1, is that correct?”) to verify nothing was lost in translation. 

Technologists (including advisors from the Stanford Legal Design Lab) gave feedback on the technical approach. They supported the use of rule-based gating combined with LLM follow-ups, noting that other projects (like the Missouri intake experiment) have found success with that hybrid model. They also advised to keep the model updated with policy changes – e.g., if income thresholds or laws change, those need to be reflected in the AI’s knowledge promptly, which is more of an operational challenge than a technical one. Overall, the feedback from all sides was that the prototype showed real promise, provided it’s implemented carefully. Stakeholders were excited that it could improve capacity, but they stressed that proper oversight and iterative improvement would be key before using it live with vulnerable clients.

What Worked Well in Testing

Several aspects of the project went well. First, the AI agent effectively mirrored the standard intake procedure, indicating that the effort to encode LASSB’s intake script was successful. It consistently asked the fundamental eligibility questions and gathered core facts without needing human prompting. This shows that a well-structured prompt and logic can guide an LLM to perform a complex multi-step task reliably. 

Second, the LLM’s natural language understanding proved advantageous. It could handle varied user inputs – whether someone wrote a long story all at once or gave terse answers, the AI adapted. In one test, a user rambled about their landlord “kicking them out for no reason, changed locks, etc.” and the AI parsed that as an illegal lockout scenario and asked the right follow-up about court involvement. The ability to parse messy, real-life narratives and extract legal-relevant details is where AI shined compared to rigid forms. 

Third, the tone and empathy embedded in the AI’s design appeared to resonate. Test users noted that the bot was “surprisingly caring”. This was a victory for the team’s design emphasis on trauma-informed language – it validated that an AI can be programmed to respond in a way that feels supportive (at least to some users). 

Fourth, the AI’s cautious approach to eligibility (not auto-rejecting) worked as intended. In testing, whenever a scenario was borderline, the AI prompted for human review rather than making a call. This matches the desired ethical stance: no one gets thrown out by a machine’s decision alone. Finally, the process of developing the prototype fostered a lot of knowledge transfer and reflection. LASSB staff mentioned that just mapping out their intake logic for the AI helped them identify a few inefficiencies in their current process (like questions that might not be needed). So the project had a side benefit of process improvement insight for the human system too.

What Failed or Fell Short in Testing

Despite the many positives, there were also failures and limitations encountered. One issue was over-questioning. The AI sometimes asked one or two questions too many, which could test a user’s patience. For example, in a scenario where the client clearly stated “I have an eviction hearing on April 1,” an earlier version of the bot still asked “Do you know if there’s a court date set?” which was redundant. This kind of repetition, while minor, could annoy a real user. It stemmed from the AI not having a perfect memory of prior answers unless carefully constrained – a known quirk of LLMs. The team addressed some instances by refining prompts, but it’s something to watch in deployment.

Another shortcoming was handling of multi-issue situations. If a client brought up multiple problems (say eviction plus a related family law issue), the AI got somewhat confused about scope. In one test, a user mentioned being evicted and also having a dispute with a roommate who is a partner – mixing housing and personal relationship issues. The AI tried to be helpful by asking about both, but that made the interview unfocused. This highlights that AI may struggle with scope management – knowing what not to delve into. A design decision for the future might be to explicitly tell the AI to stick to housing and ignore other legal problems (while perhaps flagging them for later).

Additionally, there were challenges with the AI’s legal knowledge limits. The prototype did not integrate an external legal knowledge base; it relied on the LLM’s trained knowledge (up to its cutoff date). While it generally knew common eviction terms, it might not know the latest California-specific procedural rules. For instance, if a user asked, “What is an Unlawful Detainer?” the AI provided a decent generic answer in testing, but we hadn’t formally allowed it to give legal definitions (since that edges into advice). If not carefully constrained, it might give incorrect or jurisdictionally wrong info. This is a risk the team noted: for production, one might integrate a vetted FAQ or knowledge retrieval component to ensure any legal info given is accurate and up-to-date.

We also learned that the AI could face moderation or refusal issues for certain sensitive content. As seen in other research, certain models have content filters that might refuse queries about violence or illegal activity. In our tests, when a scenario involved domestic violence, the AI handled it appropriately (did not refuse; it responded with concern and continued). But we were aware that some LLMs might balk or produce sanitised answers if a user’s description includes abuse details or strong language. Ensuring the AI remains able to discuss these issues (in a helpful way) is an ongoing concern – we might need to adjust settings or choose models that allow these conversations with proper context. 

Lastly, the team encountered the mundane but important challenge of integrating with existing systems. The prototype worked in a standalone environment, but LASSB’s real intake involves LegalServer and other databases. We didn’t fully solve how to plug the AI into those systems in real-time. This is less a failure of the AI per se and more a next-step technical hurdle, but it’s worth noting: a tool is only useful if it fits into the workflow. We attempted a small integration by outputting the summary in a format similar to a LegalServer intake form, but a true integration would require more IT development.

Why These Issues Arose

Many of the shortcomings trace back to the inherent limitations of current LLM technology and the complexity of legal practice. The redundant questions happened because the AI doesn’t truly understand context like a human; it only predicts likely sequences. If not explicitly instructed, it might err on asking again to be safe. Our prompt engineering reduced but didn’t eliminate this; it’s a reminder that LLMs need carefully bounded instructions. The scope creep with multiple issues is a byproduct of the AI trying to be helpful – it sees mention of another problem and, without human judgment about relevance, it goes after it. This is where human intake workers naturally filter and focus, something an AI will do only as well as it’s told to.

Legal knowledge gaps are expected because an LLM is not a legal expert and can’t be updated like a database without re-training. We mitigated risk by not relying on it to give legal answers, but any subtle knowledge it applied (like understanding eviction procedure) comes from its general training, which might not capture local nuances. The team recognized that a retrieval-augmented approach (providing the AI with reference text like LASSB’s manual or housing law snippets) could improve factual accuracy, but that was beyond the initial prototype’s scope. 
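
A minimal sketch of what that retrieval-augmented grounding could look like appears below; the snippet store and keyword matching are deliberately naive placeholders rather than a production retrieval system, and the reference text itself would need to be vetted by LASSB attorneys.

```python
# Naive retrieval-augmented sketch: pull vetted reference snippets into the
# prompt so any explanation of legal terms is grounded in approved text.
# The snippet store and keyword matching are placeholders, not a real retriever.
REFERENCE_SNIPPETS = {
    "unlawful detainer": "An unlawful detainer is the court case a landlord files to evict a tenant...",
    "3-day notice": "A 3-day notice to pay rent or quit gives the tenant three days to pay or move out...",
}

def retrieve_context(user_text: str, k: int = 2) -> list:
    lowered = user_text.lower()
    return [text for key, text in REFERENCE_SNIPPETS.items() if key in lowered][:k]

def build_grounded_prompt(user_text: str) -> str:
    context = "\n".join(retrieve_context(user_text)) or "(no approved reference found)"
    return (
        "Use only the approved reference text below when explaining terms; "
        "if it does not cover the question, say a staff member will explain.\n\n"
        f"REFERENCE:\n{context}\n\nCLIENT MESSAGE:\n{user_text}"
    )
```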

Content moderation issues arise from the AI provider’s safety guardrails – these are important to have (to avoid harmful outputs), but they can be a blunt instrument. Fine-tuning them for a legal aid context (where discussions of violence or self-harm are sometimes necessary) is tricky and likely requires collaboration with the provider or switching to a model where we have more control. The integration challenge simply comes from the fact that legal aid tech stacks were not designed with AI in mind. Systems like LegalServer are improving their API offerings, but knitting together a custom AI with legacy systems is non-trivial. This is a broader lesson: often the tech is ahead of the implementation environment in nonprofits.

Lessons on Human-AI Teaming and Client Protection 

Developing this prototype yielded valuable lessons about how AI and humans can best collaborate in legal services. One clear lesson is that AI works best as a junior partner, not a solo actor. Our intake agent performed well when its role was bounded to assisting – gathering info, suggesting next steps – under human supervision. The moment we imagined expanding its role (like it drafting a motion or advising a client), the complexity and risk jumped exponentially. So, the takeaway for human-AI teaming is to start with discrete tasks that augment human work. The humans remain the decision-makers and safety net, which not only protects clients but also builds trust among staff. Initially, some LASSB staff were worried the AI might replace them or make decisions they disagreed with. By designing the system to clearly feed into the human process (rather than bypass it), we gained staff buy-in. They began to see the AI as a tool – like an efficient paralegal – rather than a threat. This cultural acceptance is crucial for any such project to succeed.

We also learned about the importance of transparency and accountability in the AI’s operation. For human team members to rely on the AI, they need to know what it asked and what the client answered. Black-box summaries aren’t enough. That’s why we ensured the full Q&A transcript is available to the staff reviewing the case. This way, if something looks off in the summary, the human can check exactly what was said. It’s a form of accountability for the AI. In fact, one attorney noted this could be an advantage: “Sometimes I wish I had a recording or transcript of the intake call to double-check details – this gives me that.” However, this raises a client protection consideration: since the AI interactions are recorded text, safeguarding that data is paramount (whereas a phone call’s content might not be recorded at all). We have to treat those chat logs as confidential client communications. This means robust data security and policies on who can access them.

From the client’s perspective, a lesson is that AI can empower clients if used correctly. Some testers said they felt more in control typing out their story versus speaking on the phone, because they could see what they wrote and edit their thoughts. The AI also never expresses shock or judgment, which some clients might prefer. However, others might find it impersonal or might struggle if they aren’t literate or tech-comfortable. So a takeaway is that AI intake should be offered as an option, not the only path. Clients should be able to choose a human interaction if they want. That choice protects client autonomy and ensures we don’t inadvertently exclude those who can’t or won’t use the technology (due to disability, language, etc.).

Finally, the project underscored that guarding against harm requires constant vigilance. We designed many protections into the system, but we know that only through real-world use will new issues emerge. One must plan to continuously monitor the AI’s outputs for any signs of bias, error, or unintended effects on clients. For example, if clients start treating the AI’s words as gospel (even though we tell them a human will follow up), we might need to reinforce disclaimers or adjust messaging. Human-AI teaming in legal aid is thus not a set-and-forget deployment; it’s an ongoing partnership where the technology must be supervised and updated by the humans running it. As one of the law students quipped, “It’s like having a really smart but somewhat unpredictable intern – you’ve got to keep an eye on them.” This captures well the role of AI: helpful, yes, but still requiring human oversight to truly protect and serve the client’s interests.

5: Recommendations and Next Steps

Immediate Next Steps for LASSB

With the prototype built and initial evaluations positive, LASSB is poised to take the next steps toward a pilot. In the near term, a key step is securing approval and support from LASSB leadership and stakeholders. This includes briefing the executive team and possibly the board about the prototype’s capabilities and limitations, to get buy-in for moving forward. (Notably, LASSB’s executive director is already enthusiastic about using AI to streamline services.) 

Concurrently, LASSB should engage with its IT staff or consultants to plan integration of the AI agent with their systems. This means figuring out how the AI will receive user inquiries (e.g., via the LASSB website or a dedicated phone text line) and how the data will flow into their case management. 

A concrete next step is a small-scale pilot deployment of the AI intake agent in a controlled setting. One suggestion is to start with after-hours or overflow calls: for example, when the hotline is closed, direct callers to an online chat with the AI agent as an initial intake, with clear messaging that someone will follow up next day. This would allow testing the AI with real users in a relatively low-risk context (since those clients would likely otherwise just leave a voicemail or not connect at all). Another approach is to use the AI internally first – e.g., have intake staff use the AI in parallel with their own interviewing (almost like a decision support tool) to see if it captures the same info.

LASSB should also pursue any necessary training or policy updates. Staff will need to be trained on how to review AI-collected information, and coached not to trust it blindly but to verify critical pieces. Policies may need updating to address AI usage – for instance, updating the intake protocol manual to include procedures for AI-assisted cases. 

Additionally, client consent and awareness must be addressed. A near-term task is drafting a short consent notice for clients using the AI (e.g., “You are interacting with LASSB’s virtual assistant. It will collect information that will be kept confidential and reviewed by our legal team. This assistant is not a lawyer and cannot give legal advice. By continuing you consent to this process.”). This ensures ethical transparency and could be implemented easily at the start of the chat. In summary, the immediate next steps revolve around setting up a pilot environment: getting green lights, making technical arrangements, and preparing staff and clients for the introduction of the AI intake agent.
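
To make this concrete, here is a minimal sketch of how such a consent notice could gate the start of the chat. This is not LASSB's implementation; the function names (start_intake_chat, send_message, wait_for_reply) are hypothetical placeholders for whatever chat framework is ultimately used.

    CONSENT_NOTICE = (
        "You are interacting with LASSB's virtual assistant. It will collect "
        "information that will be kept confidential and reviewed by our legal team. "
        "This assistant is not a lawyer and cannot give legal advice. "
        "By continuing you consent to this process."
    )

    def start_intake_chat(send_message, wait_for_reply):
        """Show the consent notice and only proceed if the user agrees."""
        send_message(CONSENT_NOTICE)
        send_message("Type YES to continue, or NO to be connected with a person.")
        reply = wait_for_reply().strip().lower()
        if reply.startswith("y"):
            return True   # proceed with AI-assisted intake
        send_message("No problem - we will make sure a staff member contacts you instead.")
        return False      # fall back to human intake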

Toward Pilot and Deployment

To move from prototype to a live pilot, a few things are needed. 

Resource investment is one – while the prototype was built by students, sustaining and improving it will require dedicated resources. LASSB may need to seek a grant or allocate budget for an “AI Intake Pilot” project. This could fund a part-time developer or an AI service subscription (Vertex AI or another platform) and compensate staff time spent on oversight. Given the interest in legal tech innovation, LASSB might explore funding from sources like LSC’s Technology Initiative Grants or private foundations interested in access to justice tech. 

Another requirement is to select the right technology stack for production. The prototype used Vertex AI; LASSB will need to decide whether to continue with that (ensuring compliance with confidentiality) or shift to a different solution. Some legal aid organizations are exploring open-source models or on-premises solutions for greater control. The trade-offs (development effort vs. control) should be weighed. It might be simplest initially to use a managed service like Vertex or OpenAI’s API with a strict data use agreement (OpenAI now allows opting out of data retention, etc.). 
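
As a rough illustration of what a managed-service integration could look like, here is a minimal sketch using OpenAI's Python SDK as one example. This is not the prototype's actual Vertex AI code; the model name, system prompt, and next_intake_turn function are illustrative assumptions, and the same shape would apply to other providers.

    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    SYSTEM_PROMPT = (
        "You are an intake assistant for a legal aid organization. "
        "Collect facts about the caller's housing problem, ask one question at a time, "
        "and never give legal advice. If the user asks for advice, explain that a "
        "staff member will follow up."
    )

    def next_intake_turn(conversation):
        """conversation: list of {'role': ..., 'content': ...} messages so far."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",          # illustrative model choice
            temperature=0,                # favor consistent, cautious output
            messages=[{"role": "system", "content": SYSTEM_PROMPT}] + conversation,
        )
        return response.choices[0].message.content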

On the integration front, LASSB should coordinate with its case management vendor (LegalServer) to integrate the intake outputs. LegalServer has an API and web intake forms; the AI could populate a hidden web intake form with the collected data or attach a summary to the client’s record. Close collaboration with the vendor could streamline this – it may also be an opportunity for the vendor to pilot the integration, since many legal aid organizations are likely to want this functionality.
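
The handoff itself could be as simple as a structured POST from the intake agent to the case management system. The sketch below is hypothetical: the endpoint path, authentication scheme, and field names are placeholders, not LegalServer's actual API, and a real integration would follow the vendor's documentation.

    import requests

    def push_intake_summary(intake, api_base, api_token):
        """Send one AI-collected intake record to the case management system."""
        payload = {
            "first_name": intake["first_name"],
            "last_name": intake["last_name"],
            "county": intake["county"],
            "problem_code": intake["problem_code"],     # e.g., eviction
            "ai_summary": intake["summary"],            # text reviewed by staff
            "ai_transcript": intake["transcript"],      # full Q&A kept for accountability
        }
        resp = requests.post(
            f"{api_base}/intakes",                      # placeholder endpoint
            json=payload,
            headers={"Authorization": f"Bearer {api_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()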

As deployment nears, testing and monitoring protocols must be in place. For the pilot, LASSB should define how it will measure success: e.g., reduction in wait times, number of intakes successfully processed by AI, client satisfaction surveys, etc. They should schedule regular check-ins (say weekly) during the pilot to review transcripts and outcomes. Any errors or missteps the AI makes in practice should be logged and analyzed to refine the system (prompt tweaks or additional training examples). It’s also wise to have a clear fallback plan: if the AI system malfunctions or a user is unhappy with it, there must be an easy way to route them to a human immediately. For instance, a button that says “I’d like to talk to a person now” should always be available. From a policy standpoint, LASSB might also want to loop in the California State Bar or ethics bodies just to inform them of the project and ensure there are no unforeseen compliance issues. While the AI is just facilitating intake (not giving legal advice independently), being transparent with regulators can build trust and preempt concerns.
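
To support those weekly check-ins, even a lightweight log of each AI intake session gives the team concrete numbers to review. The sketch below assumes a simple CSV log; the metric names are illustrative, not a prescribed schema.

    import csv
    from datetime import datetime, timezone

    FIELDS = ["timestamp", "session_id", "completed", "escalated_to_human",
              "wait_seconds", "questions_asked", "client_rating"]

    def log_pilot_session(path, session_id, completed, escalated_to_human,
                          wait_seconds, questions_asked, client_rating=None):
        """Append one row describing how a single AI intake session went."""
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if f.tell() == 0:            # write a header if the file is new
                writer.writeheader()
            writer.writerow({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "session_id": session_id,
                "completed": completed,
                "escalated_to_human": escalated_to_human,
                "wait_seconds": wait_seconds,
                "questions_asked": questions_asked,
                "client_rating": client_rating,
            })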

Broader Lessons for Replication 

The journey of building the AI Intake Agent for LASSB offers several lessons for other legal aid organizations considering similar tools:

Start Small and Specific

One lesson is to narrow the use case initially. Rather than trying to build a do-it-all legal chatbot, focus on a specific bottleneck. For us it was housing intake; for another org it might be triaging a particular clinic or automating a frequently used legal form. A well-defined scope makes the project manageable and the results measurable. It also limits the risk surface. Others can take note that the success of both the Missouri project and ours came from targeting a concrete task (intake triage) rather than the whole legal counseling process.

Human-Centered Design is Key

Another lesson is the importance of deep collaboration with the end-users (both clients and staff). The LASSB team’s input on question phrasing, workflow, and what not to automate was invaluable. Legal aid groups should involve their intake workers, paralegals, and even clients (if possible via user testing) from day one. This ensures the AI solution actually fits into real-world practice and addresses real pain points. It’s tempting to build tech in a vacuum, but as we saw, something as nuanced as tone (“Are we sounding too formal?”) only gets addressed through human feedback. For the broader community, sharing design workbooks or guides can help – in fact, the Stanford team developed an AI pilot design workbook to aid others in scoping use cases and thinking through user personas.

Combine Rules and AI for Reliability

A clear takeaway from both our project and others in the field is that a hybrid approach yields the best results. Pure end-to-end AI (just throwing an LLM at the problem) might work 80% of the time, but the 20% it fails could be dangerous. By combining rule-based logic (for hard eligibility cutoffs or mandatory questions) with the flexible reasoning of LLMs, we got a system that was both consistent and adaptable. Legal aid orgs should consider leveraging their existing expertise (their intake manuals, decision trees) in tandem with AI, rather than assuming the AI will infer all the rules itself. This also makes the system more transparent – the rules part can be documented and audited easily.
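
A minimal sketch of that hybrid pattern, assuming illustrative eligibility figures and a classify_problem_with_llm helper that wraps whatever model is used: the deterministic rules run first and are easy to audit, and only the open-ended narrative is handed to the LLM.

    INCOME_LIMIT_BY_HOUSEHOLD_SIZE = {1: 1883, 2: 2555, 3: 3228}   # example figures only
    SERVICE_COUNTIES = {"San Bernardino", "Riverside"}

    def screen_applicant(applicant, classify_problem_with_llm):
        """Apply hard rules first; use the LLM only for open-ended reasoning."""
        # Rule 1: geographic eligibility (deterministic, easy to audit)
        if applicant["county"] not in SERVICE_COUNTIES:
            return {"decision": "refer_out", "reason": "outside service area"}

        # Rule 2: income cutoff (deterministic)
        limit = INCOME_LIMIT_BY_HOUSEHOLD_SIZE.get(applicant["household_size"])
        if limit is not None and applicant["monthly_income"] > limit:
            return {"decision": "escalate", "reason": "over income guideline"}

        # Only now hand the narrative to the LLM to identify the legal problem
        problem = classify_problem_with_llm(applicant["narrative"])
        return {"decision": "continue_intake", "problem_type": problem}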

Don’t Neglect Data Privacy and Ethics

Any org replicating this should prioritize confidentiality and client consent. Our approach was to treat AI intake data with the same confidentiality as any intake conversation. Others should do the same and ensure their AI vendors comply. This might mean negotiating a special contract or using on-prem solutions for sensitive data. Ethically, always disclose to users that they’re interacting with AI. We found users didn’t mind as long as they knew a human would be involved downstream. But failing to disclose could undermine trust severely if discovered. Additionally, groups should be wary of algorithmic bias.

Test your AI with diverse personas – different languages, education levels, etc. – to see if it performs equally well. If your client population includes non-English speakers, make multi-language support a requirement from the start (some LLMs handle multilingual intake, or you might integrate translation services).

Benchmark and Share Outcomes

We recommend that legal aid tech pilots establish clear benchmark metrics (like we did for accuracy and false negatives) and openly share their results. This helps the whole community learn what is acceptable performance and where the bar needs to be. As AI in legal aid is still new, a shared evidence base is forming. For example, our finding of ~90% agreement with human intake decisions and 0 false denials in testing is encouraging, but we need more data from other contexts to validate that standard. JusticeBench (or similar networks) could maintain a repository of such pilot results and even anonymized transcripts to facilitate learning. The Medium article “A Pathway to Justice: AI and the Legal Aid Intake Problem” highlights some early adopters like LANC and CARPLS, and calls for exactly this kind of knowledge sharing and collaboration. Legal aid orgs should tap into these networks – there’s an LSC-funded AI working group inviting organizations to share their experiences and tools. Replication will be faster and safer if we learn from each other.

Policy and Regulatory Considerations

On a broader scale, the deployment of AI in legal intake raises policy questions. Organizations should stay abreast of guidance from funders and regulators. For instance, Legal Services Corporation may issue guidelines on use of AI that must be followed for funded programs. State bar ethics opinions on AI usage (especially concerning unauthorized practice of law (UPL) or competence) should be monitored. 

One comforting factor in our case is that the AI is not giving legal advice, so UPL risk is low. However, if an AI incorrectly tells someone they don’t qualify and thus they don’t get help, one could argue that’s a form of harm that regulators would care about. Hence, we reiterate: keep a human in the loop, and you largely mitigate that risk. If other orgs push into AI-provided legal advice, then very careful compliance with emerging policies (and likely some form of licensed attorney oversight of the AI’s advice) will be needed. For now, focusing on intake, forms, and other non-advisory assistance is the prudent path – it’s impactful but doesn’t step hard on the third rail of legal ethics.

Maintain the Human Touch

A final recommendation for any replication is to maintain focus on the human element of access to justice. AI is a tool, not an end in itself. Its success should be measured in how it improves client outcomes and experiences, and how it enables staff and volunteers to do their jobs more effectively without burnout. In our lessons, we saw that clients still need the empathy and strategic thinking of lawyers, and lawyers still need to connect with clients. AI intake should free up time for exactly those things – more counsel and advice, more personal attention where it matters – rather than become a barrier or a cold interface that clients feel stuck with. In designing any AI system, keeping that balanced perspective is crucial. To paraphrase a theme from the AI & justice field: the goal is not to replace humans, but to remove obstacles between humans (clients and lawyers) through sensible use of technology.

Policy and Ethical Considerations

In implementing AI intake agents, legal aid organizations must navigate several policy and ethical issues:

Confidentiality & Data Security

Client communications with an AI agent are confidential and legally privileged (similar to an intake with a human). Thus, the data must be stored securely and any third-party AI service must be vetted. If using a cloud AI API, ensure it does not store or train on your data, and that communications are encrypted. Some orgs may opt for self-hosted models to have full control. Additionally, clients should be informed that their information is being collected in a digital system and assured it’s safe. This transparency aligns with ethical duties of confidentiality.

Transparency & Disclosure

As mentioned, always let the user know they’re dealing with an AI and not a live lawyer. This can be in a welcome message or a footnote on the chat interface. Users have a right to know and to choose an alternative. Also, make it clear that the AI is not giving legal advice, to manage expectations and avoid confusion about the attorney-client relationship. Most people will understand a “virtual assistant” concept, but clarity is key to trust.

Guarding Against Improper Gatekeeping

Perhaps the biggest internal ethical concern is avoiding improper denial of service. If the AI were to mistakenly categorize someone as ineligible or as a case not worth taking, and that person got turned away, that would be a serious justice failure. To counter this, our approach (and our general recommendation) is to set the AI’s threshold so that it prefers false positives to false negatives. In practice, this means any close call gets escalated to a human. 

Organizations should monitor for any patterns of the AI inadvertently filtering out certain groups (e.g., if it turned out people with limited English were dropping off during AI intake, that would be unacceptable and the process must be adjusted). Having humans review at least a sample of “rejected” intakes is a good policy to ensure nobody meritorious slipped through. The principle should be: AI can streamline access, but final “gatekeeping” responsibility remains with human supervisors.
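
One way to operationalize that principle is a routing policy in which the AI can never issue a final denial; anything short of a confident positive goes to a person. The sketch below is illustrative, and the confidence threshold is an assumption to be tuned with real pilot data.

    ACCEPT_THRESHOLD = 0.85   # illustrative; tune against real pilot data

    def route_intake(ai_label, ai_confidence):
        """Map the AI's tentative label to an action that keeps humans in control."""
        if ai_label == "likely_eligible" and ai_confidence >= ACCEPT_THRESHOLD:
            return "queue_for_staff_review"          # still reviewed, just prioritized
        if ai_label == "likely_ineligible":
            return "escalate_to_human"               # no automated denials
        return "escalate_to_human"                   # close calls always go to a person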

Bias and Fairness

AI systems can inadvertently perpetuate biases present in their training data. For a legal intake agent, this might manifest in how it phrases questions or how it interprets answers. For example, if a client writes in a style that the AI (trained on generic internet text) associates with untruthfulness, it might respond less helpfully or weigh their answers differently. We must actively guard against such bias. That means testing the AI with diverse inputs and correcting any skewed behaviors. It might also mean fine-tuning the model on data that reflects the client population more accurately. 

Ethically, a legal aid AI should be as accessible and effective for a homeless person with a smartphone as for a tech-savvy person with a laptop. Fairness also extends to disability access – e.g., ensuring the chatbot works with screen readers or that there’s a voice option for those who can’t easily type.

Accuracy and Accountability

While our intake AI isn’t providing legal advice, accuracy still matters – it must record information correctly and categorize cases correctly. Any factual errors (like mistyping a date or mixing up who is landlord vs. tenant in the summary) could have real impacts. Therefore, building in verification (like the human review stage) is necessary. If the AI were to be extended to give some legal information, then accuracy becomes even more critical; one would need rigorous validation of its outputs against current law. 

Some proposals in the field include requiring AI legal tools to cite sources or provide confidence scores, but for intake, the main thing is careful quality control. On the accountability side, the organization using the AI must accept responsibility for its operation – meaning if something goes wrong, it’s on the organization, not some nebulous “computer.” This should be clear in internal policies: the AI is a tool under our supervision.

UPL and Ethical Practice

We touched on unauthorized practice of law concerns. Since our intake agent doesn’t give advice, it should not cross UPL lines. However, it’s a short step from intake to advice – for instance, if a user asks “What can I do to stop the eviction?” the AI has to hold the line and not give advice. Ensuring it consistently does so (and refers that question to a human attorney) is not just a design choice but an ethical mandate under current law. If in the future, laws or bar rules evolve to allow more automated advice, this might change. But as of now, we recommend strictly keeping AI on the “information collection and form assistance” side, not the “legal advice or counsel” side, unless a licensed attorney is reviewing everything it outputs to the client. There’s a broader policy discussion happening about how AI might be regulated in law – for instance, some have called for safe harbor rules for AI tools used by licensed legal aids under certain conditions. Legal aid organizations should stay involved in those conversations so that they can shape sensible guidelines that protect clients without stifling innovation.

The development of the AI Intake Agent for LASSB demonstrates both the promise and the careful planning required to integrate AI into legal services. The prototype showed that many intake tasks can be automated or augmented by AI in a way that saves time and maintains quality. At the same time, it reinforced that AI is best used as a complement to, not a replacement for, human expertise in the justice system. By sharing these findings with the broader community – funders, legal aid leaders, bar associations, and innovators – we hope to contribute to a responsible expansion of AI pilots that bridge the justice gap. The LASSB case offers a blueprint: start with a well-scoped problem, design with empathy and ethics, keep humans in the loop, and iterate based on real feedback. Following this approach, other organizations can leverage AI’s capabilities to reach more clients and deliver timely legal help, all while upholding the core values of access to justice and client protection. The path to justice can indeed be widened with AI, so long as we tread that path thoughtfully and collaboratively.

Categories
AI + Access to Justice Current Projects

A Call for Statewide Legal Help AI Stewards

Shaping the Future of AI for Access to Justice

By Margaret Hagan, originally published on Legal Design & Innovation

If AI is going to advance access to justice rather than deepen the justice gap, the public-interest legal field needs more than speculation and pilots — we need statewide stewardship.

2 missions of an AI steward, for a state’s legal help service provider community

We need specific people and institutions in every state who wake up each morning responsible for two things:

  1. AI readiness and vision for the legal services ecosystem: getting organizations knowledgeable, specific, and proactive about where AI can responsibly improve outcomes for people with legal problems — and improve the performance of services. This can ensure the intelligent and impactful adoption of AI solutions as they are developed.
  2. AI R&D encouragement and alignment: getting vendors, builders, researchers, and benchmark makers on the same page about concrete needs; matchmaking them with real service teams; guiding, funding, evaluating, and communicating so the right tools get built and adopted.

Ideally, these local state stewards will be talking with each other regularly. In this way, there can be federated research & development of AI solutions for legal service providers and the public struggling with legal problems.

This essay outlines what AI + Access to Justice stewardship could look like in practice — who can play the role, how it works alongside court help centers and legal aid, and the concrete, near-term actions a steward can take to make AI useful, safe, and truly public-interest.

State stewards can help local legal providers — legal aid groups, court help centers, pro bono networks, and community justice workers — to set a clear vision for AI futures & help execute it.

Why stewardship — why now?

Every week, new tools promise to draft, translate, summarize, triage, and file. Meanwhile, most legal aid organizations and court help centers are still asking foundational questions: What’s safe? What’s high-value? What’s feasible with our staff and privacy rules? How do we avoid vendor lock-in? How do we keep equity and client dignity at the center?

Without stewardship, AI adoption will be fragmented, extractive, and inequitable. With stewardship, states can:

  • Focus AI where it demonstrably helps clients and staff. Prioritize tech based on community and provider stakeholders’ needs and preferences — not just what vendors are selling.
  • Prepare data and knowledge so tools work in local contexts, and so they can be trained safely and benchmarked responsibly with relevant data that is masked and safe.
  • Align funders, vendors, and researchers around real service needs, so that these stakeholder groups direct their capacity to support, build, and evaluate emerging technology at opportunities that are meaningful.
  • Develop shared evaluation and governance so we build trust, not backlash.

Who can play the Statewide AI Steward role?

“Steward” is a role, not a single job title. Different kinds of groups can carry it, depending on how your state is organized:

  • Access to Justice Commissions / Bar associations / Bar foundations that convene stakeholders, fund statewide initiatives, and set standards.
  • Legal Aid Executive Directors (or cross-org consortia) with authority to coordinate practice areas and operations.
  • Court innovation offices / judicial councils that lead technology, self-help, and rules-of-court implementations.
  • University labs / legal tech nonprofits that have capacity for research, evaluation, data stewardship, and product prototyping.
  • Regional collaboratives with a track record of shared infrastructure and implementation.

Any of these can steward. The common denominator: local trusted relationships, coordination power, and delivery focus. The steward must be able to convene local stakeholders, communicate with them, work with them on shared training and data efforts, and move from talk to action.

The steward’s two main missions

Mission 1: AI readiness + vision (inside the legal ecosystem)

The steward gets legal organizations — executive directors, supervising/managing attorneys, practice leads, intake supervisors, operations staff — knowledgeable and specific about where AI can responsibly improve outcomes. This means:

  • Translating AI into service-level opportunities (not vague “innovation”).
  • Running short, targeted training sessions for leaders and teams.
  • Co-designing workflow pilots with clear review and safety protocols.
  • Building a roadmap: which portfolios, which tools, what sequence, what KPIs.
  • Clarifying ethical, privacy, and consumer/client safety priorities and strategies, so that teams can talk about risks and worries in specific, technically informed ways that provide sufficient protection to users and orgs, and don’t fall into inaction because of ill-defined concerns about risk.

The result: organizations are in charge of the change rather than passive recipients of vendor pitches or media narratives.

Mission 2: AI tech encouragement + alignment (across the supply side)

The steward gets the groups who specialize in building and evaluating technology — vendors, tech groups, university researchers, benchmarkers — pointed at the right problems with the right real-world partnerships:

  • Publishing needs briefs by portfolio (housing, reentry, debt, family, etc).
  • Matchmaking teams and vendors; structuring pilots with data, milestones, evaluation, and governance. Helping organizations choose a best-in-class vendor and then also manage this relationship with regular evaluation.
  • Contributing to benchmarks, datasets, and red-teaming so the field learns together. Building the infrastructure that can lead to effective, ongoing evaluation of how AI systems are performing.
  • Helping fund and scale what works; communicating results frankly. Ensuring that prototypes and pilots’ outcomes are shared to inform others of what they might adopt, or what changes must happen to the AI solutions for them to be adopted or scaled.

The result: useful and robust AI solutions built with frontline reality, evaluated transparently, and ready to adopt responsibly.

What Stewards Could Do Month-to-Month

I have been brainstorming specific actions that a statewide steward could do. Many of these actions could also be done in concert with a federated network of stewards.

Some of the things a state steward could do to advance responsible, impactful AI for Access to Justice in their region.

Map the State’s Ecosystem of Legal Help

Too often, we think in terms of organizations — “X Legal Aid,” “Y Court Help Center” — instead of understanding who’s doing the actual legal work.

Each state needs to start by identifying the legal teams operating within its borders.

  • Who is doing eviction defense?
  • Who helps people with no-fault divorce filings?
  • Who handles reasonable accommodation letters for tenants?
  • Who runs the reentry clinic or expungement help line?
  • Who offers debt relief letter assistance?
  • Who does restraining order help?

This means mapping not just legal help orgs, but service portfolios and delivery models. What are teams doing? What are they not doing? And what are the unmet legal needs that clients consistently face?

This is a service-level analysis — an inventory of the “market” of help provided and the legal needs not yet met.

AI Training for Leaders + Broader Legal Organizations

Most legal aid and court help staff are understandably cautious about AI. Many don’t feel in control of the changes coming — they feel like they’re watching the train leave the station without them.

The steward’s job is to change that.

  • Demystify AI: Explain what these systems are and how they can support (or undermine) legal work.
  • Coach teams: Help practice leads and service teams see which parts of their work are ripe for AI support.
  • Invite ownership: Position AI not as a threat, but as a design space — a place where legal experts get to define how tools should work, and where lawyers and staff retain the power to review and direct.

To do this, stewards can run short briefings for EDs, intake leads, and practice heads on LLM basics, use cases, risks, UPL and confidentiality, and adoption playbooks. Training aims to get them conversant in the basics of the technology and help them envision where responsible opportunities might be. Let them see real-world examples of how other legal help providers are using AI behind the scenes or directly to the public.

Brainstorm + Opportunity Mapping Workshops with Legal Teams

Bring housing teams, family law facilitator teams, reentry teams, or other specific legal teams together. Have them map out their workflows and choose which of their day-to-day tasks is AI-opportune. Which of the tasks are routine, templated, and burdensome?

As stewards run these workshops, they can be on the lookout for where legal teams in their state can build, buy, or adopt an AI solution in 3 areas.

When running an AI opportunity brainstorm, it’s worth considering these 3 zones: where can we add to existing full-representation legal services, where can we add to brief or pro bono services, and where can we add services that legal teams don’t currently offer?

Brainstorm 1: AI Copilots for Services Legal Teams Already Offer

This is the lowest-risk, highest-benefit space. Legal teams are already helping with eviction defense, demand letters, restraining orders, criminal record clearing, etc.

Here, AI can act as a copilot for the expert — a tool that does things that the expert lawyer, paralegal, or legal secretary is already doing in a rote way:

  • Auto-generates first drafts based on intake data
  • Summarizes client histories
  • Auto-fills court forms
  • Suggests next actions or deadlines
  • Creates checklists, declarations, or case timelines

These copilots don’t replace lawyers. They reduce drudge work, improve quality, and make staff more effective.

Brainstorm 2: AI Copilots for Services That Could Be Done by Pro Bono or Volunteers

Many legal aid organizations know where they could use more help: limited-scope letters, form reviews, answering FAQs, or helping users navigate next steps.

AI can play a key role in unlocking pro bono, brief advice, and volunteer capacity:

  • Automating burdensome tasks like collecting or reviewing database records
  • Helping them write high-quality letters or motions
  • Pre-filling petitions and forms with data that has been gathered
  • Providing them with step-by-step guidance
  • Flagging errors, inconsistencies, or risks in drafts
  • Offering language suggestions or plain-language explanations

Think of this as AI-powered “training wheels” that help volunteers help more people, with less handholding from staff.

Brainstorm 3: AI Tools for Services That Aren’t Currently Offered — But Should Be

There are many legal problems where there is high demand, but legal help orgs don’t currently offer help because of capacity limits.

Common examples of these under-served areas include:

  • Security deposit refund letters
  • Creating demand letters
  • Filing objections to default judgments
  • Answering brief questions

In these cases, AI systems — carefully designed, tested, and overseen — can offer direct-to-consumer services that supplement the safety net:

  • Structured interviews that guide users through legal options
  • AI-generated letters/forms with oversight built in
  • Clear red flags for when human review is needed

This is the frontier: responsibly extending the reach of legal help to people who currently get none. The brainstorm might also include reviewing existing direct-to-consumer AI tools from other legal orgs, and deciding which they might want to host or link to from their website.

The steward can hold these brainstorming and prioritization sessions to help legal teams find these legal team co-pilots, pro bono tools, and new service offerings in their issue area. The stewards and legal teams can then move the AI vision forward & prepare a clear scope for the AI that should be built.

Data Readiness + Knowledge Base Building

Work with legal and court teams to inventory what data they have that could be used to train or evaluate some of the legal AI use cases they have envisioned. Support them with tools & protocols to mask PII in these documents and make them safe to use in AI R&D.

This could mean getting anonymized completed forms, documents, intake notes, legal answers, data reports, or other legal workflow items. Likely, much of this data will have to be labeled, scored, and marked up so that it’s useful in training and evaluation.

The steward can help the groups that hold this data to understand what data they hold, how to prepare it and share it, and how to mark it up with helpful labels.

Part of this is also to build a Local Legal Help Knowledge Base — not just about the laws and statutes on the books, but about the practical, procedural, and service knowledge that people need when trying to deal with a legal problem.

Much of this knowledge is in legal aid lawyers’ and court staff’s heads, or training decks and events, or internal knowledge management systems and memos.

Stewards can help these local organizations contribute this knowledge about local legal rules, procedures, timelines, forms, services, and step-by-step guides into a statewide knowledge base. This knowledge base can then be used by the local providers. It will be a key piece of infrastructure on which new AI tools and services can be built.

Adoption Logistics

As local AI development visions come together, the steward can lead on adoption logistics.

The steward can make sure that the local orgs don’t reinvent what might already exist, or spend money in a wasteful way.

They can do tool evaluations to see which LLMs and specific AI solutions perform best on the scoped tasks. They can identify researchers and evaluators to help with this. They can also help organizations procure these tools or even create a pool of multiple organizations with similar needs for a shared procurement process.

They might also negotiate beneficial, affordable licenses or access to AI tools that can help with the desired functions. They can also ensure that case management and document management systems are responsive to the AI R&D needs, so that the legacy technology systems will integrate well with the new tools.

Ideally, the steward will help the statewide group and the local orgs make smart investments in the tech they might need to buy or build — and can help clear the way when hurdles emerge.

Bigger-Picture Steward Strategies

In addition to these possible actions, statewide stewards can also follow a few broader strategies to get a healthy AI R&D ecosystem in their state and beyond.

Be specific to legal teams

As I’ve already mentioned throughout this essay, stewards should be focused on the ‘team’ level, rather than the ‘organization’ one. It’s important that they develop relationships and run activities with teams that are in charge of specific workflows — and that means the specific kind of legal problem they help with.

Stewardship should organize a statewide network of named teams and named services, for example:

  • Housing law teams & their workflows: hotline consults, eviction defense prep, answers, motions to set aside, trial prep, RA letters for habitability issues, security-deposit demand letters.
  • Reentry teams & their workflows: record clearance screening, fines & fees relief, petitions, supporting declarations, RAP sheet interpretation, collateral consequences counseling.
  • Debt/consumer teams & their workflows: answer filing, settlement letters, debt verification, exemptions, repair counseling, FDCPA dispute letters.
  • Family law teams & their workflows: form prep (custody, DV orders), parenting plans, mediation prep, service and filing instructions, deadline tracking.

The steward can make progress on its 2 main goals — AI readiness and R&D encouragement — if it can build a strong local network among the teams that work on similar workflows, with similar data and documents, with similar audiences.

Put ethics, privacy, and operational safeguards at the center

Stewardship builds trust by making ethics operational rather than an afterthought. This all happens when AI conversations are grounded, informed, and specific among legal teams and communities. It also happens when they work with trained evaluators, who know how to evaluate the performance of AI rigorously, not based on anecdotes and speculation.

The steward network can help by planning out and vetting common, proven strategies to ensure quality & consumer protection are designed into the AI systems. They could work on:

  • Competence & supervision protocols: helping legal teams plan for the future of expert review of AI systems, clarifying “eyes-on” review models with staff trainings and tools. Stewards can also help them plan for escalation paths, when human reviewers find problems with the AI’s performance. Stewards might also work on standard warnings, verification prompts, and other key designs to ensure that reviewers are effectively watching AI’s performance.
  • Professional ethics rules clarity: help the teams design internal policies that ensure they’re in compliance with all ethical rules and responsibilities. Stewards can also help them plan out effective disclosures and consent protocols, so consumers know what is happening and have transparency.
  • Confidentiality & privacy: This can happen at the federated/national level. Stewards can set rules for data flows, retention, de-identification/masking — which otherwise can be overwhelming for specific orgs. Stewards can also vet vendors for security and subprocessing.
  • Accountability & Improvements: Stewards can help organizations and vendors plan for good data-gathering & feedback cycles about AI’s performance. This can include guidance on document versioning, audit logs, failure reports, and user feedback loops.

Stewards can help bake safeguards into workflows and procurement, so that there are ethics and privacy by design in the technical systems that are being piloted.

Networking stewards into a federated ecosystem

For statewide stewardship to matter beyond isolated pilots, stewards need to network into a federated ecosystem — a light but disciplined network that preserves local autonomy while aligning on shared methods, shared infrastructure, and shared learning.

The value of federation is compounding: each state adapts tools to local law and practice, contributes back what it learns, and benefits from the advances of others. Also, many of the tasks of a steward — educating about AI, building ethics and safeguards, measuring AI, setting up good procurement — will be quite similar state-to-state. Stewards can share resources and materials to implement locally.

What follows reframes “membership requirements” as the operating norms of that ecosystem and explains how they translate into concrete habits, artifacts, and results.

Quarterly check-ins become the engine of national learning. Stewards participate in a regular virtual cohort, not as a status ritual but as an R&D loop. Each session surfaces what was tried, what worked, and what failed — brief demos, before/after metrics, and annotated playbooks.

Stewards use these meetings to co-develop materials, evaluation rubrics, funding strategies, disclosure patterns, and policy stances, and to retire practices that didn’t pan out. Over time, this cadence produces a living canon of benchmarks and templates that any newcomer steward can adopt on day one.

Each year, the steward could champion at least one pilot or evaluation (for example, reasonable-accommodation letters in housing or security-deposit demand letters in consumer law), making sure it has clear success criteria, review protocols, and an exit ramp if risks outweigh benefits. This can help the pilots spread to other jurisdictions more effectively.

Shared infrastructure is how federation stays interoperable. Rather than inventing new frameworks in every state, stewards lean on common platforms for evaluation, datasets, and reusable workflows. Practically, that means contributing test cases and localized content, adopting shared rubrics and disclosure patterns, and publishing results in a comparable format.

It also means using common identifiers and metadata conventions so that guides, form logic, and service directories can be exchanged or merged without bespoke cleanup. When a state localizes a workflow or improves a safety check, it pushes the enhancement upstream, so other states can pull it down and adapt with minimal effort.

Annual reporting turns stories into evidence and standards. Each steward could publish a concise yearly report that covers: progress made, obstacles encountered, datasets contributed (and their licensing status), tools piloted or adopted (and those intentionally rejected), equity and safety findings, and priorities for the coming year.

Because these reports follow a common outline, they are comparable across states and can be aggregated nationally to show impact, surface risks, and redirect effort. They also serve as onboarding guides for new teams: “Here’s what to try first, here’s what to avoid, here’s who to call.”

Success in 12–18 months looks concrete and repeatable. In a healthy federation, we could point to a public, living directory of AI-powered teams and services by portfolio, with visible gaps prioritized for action.

  • We could have several legal team copilots embedded in high-volume workflows — say, demand letters, security-deposit letters, or DV packet preparation — with documented time savings, quality gains, and staff acceptance.
  • We could have volunteer unlocks, where a clinic or pro bono program helps two to three times more people in brief-service matters because a copilot provides structure, drafting support, and built-in review checks.
  • We could have at least one direct-to-public workflow launched in a high-demand, manageable-risk area, with clear disclosures, escalation rules, and usage metrics.
  • We would see more contributions to data-driven evaluation practices and R&D protocols. This could be localized guides, triage logic, form metadata, anonymized samples, and evaluation results. Or it could be an ethics and safety playbook that is not just written but operationalized in training, procurement, and audits.

A federation of stewards doesn’t need heavy bureaucracy. It could be a set of light, disciplined habits that make local work easier and national progress faster. Quarterly cohort exchanges prevent wheel-reinventing. Local duties anchor AI in real services. Shared infrastructure keeps efforts compatible. Governance protects the public-interest character of the work. Annual reports convert experience into standards.

Put together, these practices allow stewards to move quickly and responsibly — delivering tangible improvements for clients and staff while building a body of knowledge the entire field can trust and reuse.

Stewardship as the current missing piece

Our team at Stanford Legal Design Lab is aiming for an impactful, ethical, robust ecosystem of AI in legal services. We are building the platform JusticeBench to be a home base for those working on AI R&D for access to justice. We are also building justice co-pilots directly with several legal aid groups.

But to build this robust ecosystem, we need local stewards for state jurisdictions across the country — who can take on key leadership roles and decisions — and make sure that there can be A2J AI that responds to local needs but benefits from national resources. Stewards can also help activate local legal teams, so that they are directing the development of AI solutions rather than reacting to others’ AI visions.

We can build legal help AI state by state, team by team, workflow by workflow. But we need stewards who keep clients, communities, and frontline staff at the center, while moving their state forward.

That’s how AI becomes a force for justice — because we designed it that way.

Categories
AI + Access to Justice Current Projects

Can LLMs help streamline legal aid intake?

Insights from Quinten Steenhuis at the AI + Access to Justice Research Seminar

Recently, the Stanford Legal Design Lab hosted its latest installment of the AI + Access to Justice Research Seminar, featuring a presentation from Quinten Steenhuis.

Quinten is a professor and innovator-in-residence at Suffolk Law School’s LIT Lab. He’s also a former housing attorney in Massachusetts who has made a significant impact with projects like Court Forms Online and MADE, a tool for automating eviction help. His group Lemma Legal works with organizations to develop legal tech for interviews, forms, and documents.

His presentation in April 2025 focused on a project he’s been working on in collaboration with Hannes Westermann from the Maastricht Law & Tech Lab. This R&D project focuses on whether large language models (LLMs) are effective at tasks that might streamline intake in civil legal services. This work is being developed in partnership with Legal Aid of Eastern Missouri, along with other legal aid groups and funding from the U.S. Department of Housing and Urban Development.

The central question addressed was: Can LLMs help people get through the legal intake process faster and more accurately?

The Challenge: Efficient, Accurate Legal Aid Intake and Triage

For many people, legal aid is hard to access. That is in part because of the intake process required to apply for help from a local legal aid group, which can be time-consuming and frustrating. Imagine calling a legal aid hotline, stressed out about a problem with your housing, family, finances, or job, only to wait on hold for an hour or more. When your call is finally answered, the intake worker needs to determine whether you qualify for help based on a complex and often open-textured set of rules. These rules vary significantly depending on jurisdiction, issue area, and individual circumstances — from citizenship and income requirements to more subjective judgments like whether a case is a “good” or “bad” fit for the program’s priorities or funding streams.

Intake protocols are typically documented internally for staff members in narrative guides, sometimes as long as seven pages, containing a mix of rules, sample scripts, timelines, and sub-rules that differ by zip code and issue type. These rules are rarely published online, as they can be too complex for users to interpret on their own. Legal aid programs may also worry about misinterpretation and inaccurate self-screening by clients. Instead, they keep these screening rules private to their staff.

Moreover, the intake process can involve up to 30+ rules about which cases to accept. These rules can vary between legal aid groups and can also change frequently (often in part because of funding that changes frequently). This “rules complexity” makes it hard for call center workers to provide consistent, accurate determinations about whose case will be accepted, leading to long wait times and inconsistent screening results. The challenge is to reduce the time legal aid workers spend screening without incorrectly denying services to those who qualify.

The Proposed Intervention: Integrating LLMs for Faster, Smarter Intake

To address this issue, Quinten, Hannes, and their partners have been exploring whether LLMs can help automate parts of the intake process. Specifically, they asked:

  • Can LLMs quickly determine whether someone qualifies for legal aid?
  • Can this system reduce the time spent on screening and make intake more efficient?

The solution they developed is part of the Missouri Tenant Help project, a hybrid system that combines rule-based questions with LLM-powered responses. The site’s intake system begins by asking straightforward, rules-based questions about citizenship, income, location, and problem description. It uses DocAssemble, a flexible platform that integrates Missouri-specific legal screening questions with central rules from Suffolk’s Court Forms Online for income limits and federal guidelines.

At one point in the intake workflow, the system prompts users to describe their problem in a free-text box. The LLM then analyzes the input, cross-referencing it with the legal aid group’s eligibility rules. If the system still lacks sufficient data, it generates follow-up questions in real-time, using a low-temperature model version to ensure consistent and cautious output.

For example, if a user says, “I got kicked out of my house,” the system might follow up with, “Did your landlord give you any formal notice or involve the court before evicting you?” The goal is to quickly assess whether the person might qualify for legal help while minimizing unnecessary back-and-forth. The LLM’s job is to identify the legal problem at issue, and then match this specific legal problem with the case types that legal aid groups around Missouri may take (or may not).

If the LLM works perfectly, it would be able to predict correctly whether a legal aid group is likely to take on this case, is likely to decline it, or if it is borderline.
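
A rough sketch of that screening pattern, not the actual Missouri Tenant Help code: the applicant's narrative and the program's rules go to a low-temperature model, which either returns a prediction (accept, decline, or borderline) or asks one more follow-up question. The prompt wording, JSON contract, and call_llm helper are illustrative assumptions; the real system is built in DocAssemble.

    import json

    PROMPT_TEMPLATE = """You are screening a housing problem against a legal aid
    program's intake rules. Rules:
    {rules}

    The applicant wrote: "{narrative}"

    Respond with JSON only:
    {{"prediction": "accept" | "decline" | "borderline",
      "follow_up_question": "...or null if you have enough information..."}}"""

    def screen_narrative(narrative, rules, call_llm):
        """call_llm(prompt, temperature) -> str; low temperature for cautious output."""
        prompt = PROMPT_TEMPLATE.format(rules=rules, narrative=narrative)
        raw = call_llm(prompt, temperature=0.1)
        result = json.loads(raw)
        if result.get("follow_up_question"):
            return {"action": "ask", "question": result["follow_up_question"]}
        return {"action": "predict", "prediction": result["prediction"]}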

The Experiment: Testing Different LLMs

To evaluate the system, the team conducted an experiment using 16 scenarios, 3 sets of legal aid program rules, and 8 different LLMs (including open-source, commercial, and popular models). The main question was whether the system could accurately match the “accept” or “reject” labels that legal experts had assigned to the scenarios.

The team found that the LLMs did a fairly accurate job at predicting which cases should be accepted or not. Overall, the LLMs correctly predicted acceptance or rejection with 84% precision, and GPT-4 Turbo performed the best.

Of particular interest was the rate of incorrect predictions to reject a case. The system rarely made incorrect denials, which is critical for avoiding unjust exclusion from services. Rather, the LLM erred on the side of caution, often generating follow-up questions rather than making definitive, potentially incorrect judgments.

However, it sometimes asked for unnecessary follow-up information even when it already had enough data. This could lead to a worse user experience, with redundant questions delaying the decision. The problem here was not accuracy, though, but efficiency.
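
For teams running similar experiments, the two headline numbers (overall agreement with expert labels and the false-denial rate) are easy to compute once model predictions are paired with expert decisions. The sketch below is illustrative and does not reproduce the study's data.

    def score_predictions(pairs):
        """pairs: list of (expert_label, model_label), each 'accept', 'reject', or 'borderline'."""
        total = len(pairs)
        correct = sum(1 for expert, model in pairs if expert == model)
        false_denials = sum(1 for expert, model in pairs
                            if expert == "accept" and model == "reject")
        return {
            "agreement": correct / total if total else 0.0,
            "false_denial_rate": false_denials / total if total else 0.0,
        }

    # Example: 16 scenarios; the model matches the expert on 13 and never falsely denies
    example = [("accept", "accept")] * 10 + [("reject", "reject")] * 3 \
              + [("accept", "borderline")] * 3
    print(score_predictions(example))   # borderline counts as disagreement, not a denial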

Challenges and Insights

One surprising result was that the LLMs sometimes caught errors made by human labelers. For example, in one case involving a support animal in Kansas City, the model correctly identified that a KC legal aid group was likely to accept this case, while the human reviewer mistakenly marked it as a likely denial. This underscores the potential of LLMs to enhance accuracy when paired with human oversight.

However, the LLMs also faced unique challenges.

  • Some models, like Gemini, refused to engage with topics related to domestic violence due to content moderation settings. This raised questions about whether AI developers understand the nuances of legal contexts. It also flagged the importance of screening possible models for use, depending on whether they censor legal topics.
  • The system also struggled with ambiguous scenarios, like evaluating whether “flimsy doors and missing locks” constituted a severe issue. Such situations highlighted the need for more tailored training and model configuration.

User Feedback and Next Steps

The system has been live for a month and a half and is currently offered as an optional self-screening tool on the Missouri Tenant Help website. Early feedback from legal aid partners has been positive, with high satisfaction ratings from users who tested the system. Some service providers noted they would like to see more follow-up questions to gather comprehensive details upfront — envisioning the LLM doing even more data-gathering, beyond what is needed to determine if a case is likely to be accepted or rejected.

In the future, the team aims to continue refinement and planning work, including to:

  1. Refine the LLM prompts and training data to better capture nuanced legal issues.
  2. Improve system accuracy by integrating rules-based reasoning with LLM flexibility.
  3. Explore more cost-effective models to keep the service affordable — currently around 5 cents per interaction.
  4. Enhance error handling by implementing model switching when a primary LLM fails to respond or disengages due to sensitive content.

Can LLMs and Humans Work Together?

This project exemplifies how LLMs and human experts can complement each other. Rather than fully automating intake, the system serves as a first-pass filter. It gives community members a quicker tool to get a high-level read on whether they are likely to get services from a legal aid group, or whether it would be better for them to pursue another service.

Rather than waiting for hours on a phone line, the user can choose to use this tool to get quicker feedback. They can still call the program — the system does not issue a rejection, but rather just gives them a prediction of what the legal aid will tell them.

The next phase will involve ongoing live testing and iterative improvements to balance speed, accuracy, and user experience.

The Future of Improving Legal Intake with AI

As legal aid programs increasingly look to AI and LLMs to streamline intake, several key opportunities and challenges are emerging.

1. Enhancing Accuracy and Contextual Understanding:

One promising avenue is the development of more nuanced models that can better interpret ambiguous or context-dependent situations. For instance, instead of flagging a potential denial based solely on rigid rule interpretations, the system could use context-aware prompts that take into account local regulations and specific case details. This might involve combining rule-based logic with adaptive LLM responses to better handle edge cases, like domestic violence scenarios or complex tenancy disputes.

2. Adaptive Model Switching:

Another promising approach is to implement a hybrid model system that dynamically switches between different LLMs depending on the context. For example, if a model like Gemini refuses to address sensitive topics, the system could automatically switch to a more legally knowledgeable model or one with fewer content moderation constraints. This could be facilitated by a router API that monitors for censorship or errors and adjusts the model in real time.
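
A minimal sketch of that fallback idea, assuming a call_model helper that wraps each provider's API and a simple refusal heuristic; a production router would need more careful refusal detection and logging.

    REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "unable to discuss")

    def looks_like_refusal(text):
        lowered = text.lower()
        return any(marker in lowered for marker in REFUSAL_MARKERS)

    def robust_generate(prompt, call_model, model_chain=("model-a", "model-b")):
        """call_model(model_name, prompt) -> str; try each model in order."""
        last_error = None
        for model_name in model_chain:
            try:
                reply = call_model(model_name, prompt)
                if not looks_like_refusal(reply):
                    return reply                      # usable answer from this model
            except Exception as err:                  # API failure, timeout, etc.
                last_error = err
        raise RuntimeError(f"All models refused or failed: {last_error}")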

3. More Robust Fact Gathering:

A significant future goal is to enhance the system’s ability to collect comprehensive facts during intake. Legal aid workers noted that they often needed follow-up information after the initial screening, especially when the client’s problem involved specific housing issues or complex legal nuances. The next version of the system will focus on expanding the follow-up question logic to reduce the need for manual callbacks. This could involve developing predefined question trees for common issues while maintaining the model’s ability to generate context-specific follow-up questions.
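
One way to structure this is to keep predefined question trees for the common issues and only fall back to the model when the tree runs out. The tree content and the generate_follow_up helper below are illustrative assumptions, not the project's actual design.

    QUESTION_TREES = {
        "eviction": [
            "Did you receive a written notice from your landlord?",
            "Is there already a court date or case number?",
            "Do you live in subsidized or public housing?",
        ],
        "habitability": [
            "What repairs are needed in your home?",
            "Have you told your landlord in writing about the problem?",
        ],
    }

    def next_question(issue_type, answers_so_far, generate_follow_up):
        """Return the next scripted question, or fall back to an LLM follow-up."""
        scripted = QUESTION_TREES.get(issue_type, [])
        if len(answers_so_far) < len(scripted):
            return scripted[len(answers_so_far)]
        # Tree exhausted or unknown issue: let the model draft a context-specific question
        return generate_follow_up(issue_type, answers_so_far)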

4. Tailoring to Local Needs and Specific Use Cases:

One of the biggest challenges for scaling AI-based intake systems is ensuring that they are flexible enough to adapt to local legal nuances. The team is considering ways to contextualize the system for individual jurisdictions, potentially using open-source approaches to allow local legal aid programs to train their own versions. This could enable more customized intake systems that better reflect local policies, tenant protections, and court requirements.

5. Real-Time Human-AI Collaboration:

Looking further ahead, there is potential for building integrated systems where AI actively assists call center workers in real time. For instance, instead of having the AI conduct intake independently, it could listen to live calls and provide real-time suggestions to human operators, similar to how customer support chatbots assist agents. This would allow AI to augment rather than replace human judgment, helping to maintain quality control and legal accuracy.

6. Privacy and Ethical Considerations:

As these systems evolve, maintaining data privacy and ethical standards will be crucial. The current setup already segregates personal information from AI processing, but as models become more integrated into intake workflows, new strategies may be needed. Exploring privacy-preserving AI methods and data anonymization techniques will help maintain compliance while leveraging the full potential of LLMs.

7. Cost and Efficiency Optimization:

At the current cost of around 5 cents per interaction, the system remains relatively affordable, but as more users engage, maintaining cost efficiency will be key. The team plans to experiment with more affordable model versions and optimize the routing strategy to ensure that high-quality responses are delivered at a sustainable price. The goal is to make the intake process not just faster but also economically feasible for widespread adoption.

Building the Next Generation of Legal Aid Systems

Quinten’s presentation at the AI + Access to Justice Research Seminar made it clear that while LLMs hold tremendous potential for improving legal intake, human oversight and adaptive systems are crucial to ensure reliability and fairness. The current system’s success — 84% precision, minimal false denials, and positive user feedback — shows that AI-human collaboration is not only possible but also promising.

As the team continues to refine the system, they aim to create a model that can balance efficiency with accuracy, while being adaptable to the diverse and dynamic needs of legal aid programs. The long-term vision is to develop a scalable, open-source tool that local programs can fine-tune and deploy independently, making access to legal support faster and more reliable for those who need it most.

Read the research article in detail here.

See more at Quinten’s group Lemma Legal: https://lemmalegal.com/

Read more about Hannes at Maastricht University: https://cris.maastrichtuniversity.nl/en/persons/hannes-westermann

Categories
AI + Access to Justice Current Projects

Justice AI Co-Pilots

The Stanford Legal Design Lab is proud to announce a new initiative funded by the Gates Foundation that aims to bring the power of artificial intelligence (AI) into the hands of legal aid professionals. With this new project, we’re building and testing AI systems—what we’re calling “AI co-pilots”—to support legal aid attorneys and staff in two of the most urgent areas of civil justice: eviction defense and reentry debt mitigation.

This work continues our Lab’s mission to design and deploy innovative, human-centered solutions that expand access to justice, especially for those who face systemic barriers to legal support.

A Justice Gap That Demands Innovation

Across the United States, millions of people face high-stakes legal problems without any legal representation. Eviction cases and post-incarceration debt are two such areas, where legal complexity meets chronic underrepresentation—leading to outcomes that can reinforce poverty, destabilize families, and erode trust in the justice system.

Legal aid organizations are often the only line of defense for people navigating these challenges, but these nonprofits are severely under-resourced. They are on the front lines of help, yet often stretched thin on staffing, technology, and other resources.

The Project: Building AI Co-Pilots for Legal Aid Workflows

In collaboration with two outstanding legal aid partners—Legal Aid Foundation of Los Angeles (LAFLA) and Legal Aid Services of Oklahoma (LASO)—we are designing and piloting four AI co-pilot prototypes: two for eviction defense, and two for reentry debt mitigation.

These AI tools will be developed to assist legal aid professionals with tasks such as:

  • Screening and intake
  • Issue spotting and triage
  • Drafting legal documents
  • Preparing litigation strategies
  • Interpreting complex legal rules

Rather than replacing human judgment, these tools are meant to augment legal professionals’ work. The aim is to free up time for higher-value legal advocacy, enable legal teams to take on more clients, and help non-expert legal professionals assist in more specialized areas.

The goal is to use a deliberate, human-centered process to first identify low-risk, high-impact tasks for AI to do in legal teams’ workflows, and then to develop, test, pilot, and evaluate new AI solutions that can offer safe, meaningful improvements to legal service delivery & people’s social outcomes.

Why Eviction and Reentry Debt?

These two areas were chosen because of their widespread and devastating impacts on people’s housing, financial stability, and long-term well-being.

Eviction Defense

Over 3 million eviction lawsuits are filed each year in the U.S., with the vast majority of tenants going unrepresented. Without legal advocacy, many tenants are unaware of their rights or defenses. It's also hard to fill out the many complicated legal documents required to participate in the system, protect one's rights, and avoid a default judgment. This makes it difficult to negotiate with landlords, comply with court requirements, and protect one's housing and money.

Evictions often happen in a matter of weeks, and with a confusing mix of local and state laws, it can be hard for even experienced attorneys to respond quickly. The AI co-pilots developed through this project will help legal aid staff navigate these rules and prepare more efficiently—so they can support more tenants, faster.

Reentry Debt

When people return home after incarceration, they often face legal financial obligations that can include court fines, restitution, supervision fees, and other penalties. This kind of debt can make it hard for a person to regain stability in housing, employment, driver's licenses, and family life.

According to the Brennan Center for Justice, over 10 million Americans owe more than $50 billion in reentry-related legal debt. Yet there are few tools to help people navigate, reduce, or resolve these obligations. By working with LASO, we aim to prototype tools that can help legal professionals advise clients on debt relief options, identify eligibility for fee waivers, and support court filings.

What Will the AI Co-Pilots Actually Do?

Each AI co-pilot will be designed for real use in legal aid organizations. They'll be integrated into existing workflows and tailored to the needs of specific roles—like intake specialists, paralegals, or staff attorneys. Examples of potential functionality include the following (a rough sketch of the first item appears after the list):

  • Summarizing client narratives and flagging relevant legal issues
  • Filling in common forms and templates based on structured data
  • Recommending next steps based on jurisdictional rules and case data
  • Generating interview questions for follow-up conversations
  • Cross-referencing legal codes with case facts
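
For example, the first item above, summarizing client narratives and flagging relevant legal issues, might look something like the rough sketch below. The prompt wording and the call_llm() helper are placeholders and assumptions, not a finished product design.

```python
# A hedged sketch of the "summarize and issue-spot" co-pilot task; the prompt
# text and the call_llm() helper are placeholders, not a real implementation.
ISSUE_SPOTTING_PROMPT = """You are assisting a legal aid intake specialist.
Summarize the client's narrative in 3-4 sentences, then list possible legal
issues (e.g., habitability, retaliation, improper notice) with a one-line
explanation for each. Do not give legal advice to the client directly.

Client narrative:
{narrative}
"""

def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM API an organization chooses to use."""
    raise NotImplementedError

def summarize_and_flag(narrative: str) -> str:
    """Return a staff-facing summary plus flagged issues for human review."""
    return call_llm(ISSUE_SPOTTING_PROMPT.format(narrative=narrative))
```

Any output of this kind would go to a staff member for review, consistent with the project's emphasis on keeping human judgment in the loop.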

The design process will be collaborative and iterative, involving continuous feedback from attorneys, advocates, and technologists. We will pilot and evaluate each tool rigorously to ensure its effectiveness, usability, and alignment with legal ethics.

Spreading the Impact

While the immediate goal is to support LAFLA and LASO, we are designing the project with national impact in mind. Our team plans to publish:

  • Open-source protocols and sample workflows
  • Evaluation reports and case studies
  • Responsible use guidelines for AI in legal aid
  • Collaboration pathways with legal tech vendors

This way, other legal aid organizations can replicate and adapt the tools to their own contexts—amplifying the reach of the project across the U.S.

“There’s a lot of curiosity in the legal aid field about AI—but very few live examples to learn from,” Hagan said. “We hope this project can be one of those examples, and help the field move toward thoughtful, responsible adoption.”

Responsible AI in Legal Services

At the Legal Design Lab, we know that AI is not a silver bullet. Tools must be designed thoughtfully, with attention to risks, biases, data privacy, and unintended consequences.

This project is part of our broader commitment to responsible AI development. That means:

  • Using human-centered design
  • Maintaining transparency in how tools work and make suggestions
  • Prioritizing data privacy and user control
  • Ensuring that tools do not replace human judgment in critical decisions

Our team will work closely with our legal aid partners, domain experts, and the communities served to ensure that these tools are safe, equitable, and truly helpful.

Looking Ahead

Over the next two years, we’ll be building, testing, and refining our AI co-pilots—and sharing what we learn along the way. We’ll also be connecting with national networks of eviction defense and reentry lawyers to explore broader deployment and partnerships.

If you’re interested in learning more, getting involved, or following along with project updates, sign up for our newsletter or follow the Lab on social media.

We’re grateful to the Gates Foundation for their support, and to our partners at LAFLA and LASO for their leadership, creativity, and deep dedication to the clients they serve.

Together, we hope to demonstrate how AI can be used responsibly to strengthen—not replace—the critical human work of legal aid.

Categories
AI + Access to Justice Current Projects

ICAIL workshop on AI & Access to Justice

The Legal Design Lab is excited to co-organize a new workshop at the International Conference on Artificial Intelligence and Law (ICAIL 2025):

AI for Access to Justice (AI4A2J@ICAIL 2025)
📍 Where? Northwestern University, Chicago, Illinois, USA
🗓 When? June 20, 2025 (Hybrid – in-person and virtual participation available)
📄 Submission Deadline: May 4, 2025
📬 Acceptance Notification: May 18, 2025

Submit a paper here: https://easychair.org/cfp/AI4A2JICAIL25

This workshop brings together researchers, technologists, legal aid practitioners, court leaders, policymakers, and interdisciplinary collaborators to explore the potential and pitfalls of using artificial intelligence (AI) to expand access to justice (A2J). It is part of the larger ICAIL 2025 conference, the leading international forum for AI and law research, hosted this year at Northwestern University in Chicago.


Why this workshop?

Legal systems around the world are struggling to meet people’s needs—especially in housing, immigration, debt, and family law. AI tools are increasingly being tested and deployed to address these gaps: from chatbots and form fillers to triage systems and legal document classifiers. Yet these innovations also raise serious questions around risk, bias, transparency, equity, and governance.

This workshop will serve as a venue to:

  • Share and critically assess emerging work on AI-powered legal tools
  • Discuss design, deployment, and evaluation of AI systems in real-world legal contexts
  • Learn from cross-disciplinary perspectives to better guide responsible innovation in justice systems


What are we looking for?

We welcome submissions from a wide range of contributors—academic researchers, practitioners, students, community technologists, court innovators, and more.

We’re seeking:

  • Research papers on AI and A2J
  • Case studies of AI tools used in courts, legal aid, or nonprofit contexts
  • Design proposals or system demos
  • Critical perspectives on the ethics, policy, and governance of AI for justice
  • Evaluation frameworks for AI used in legal services
  • Collaborative, interdisciplinary, or community-centered work

Topics might include (but are not limited to):

  • Legal intake and triage using large language models (LLMs)
  • AI-guided form completion and document assembly
  • Language access and plain language tools powered by AI
  • Risk scoring and case prioritization
  • Participatory design and co-creation with affected communities
  • Bias detection and mitigation in legal AI systems
  • Evaluation methods for LLMs in legal services
  • Open-source or public-interest AI tools

We welcome both completed projects and works-in-progress. Our goal is to foster a diverse conversation that supports learning, experimentation, and critical thinking across the access to justice ecosystem.


Workshop Format

The workshop will be held on June 20, 2025 in hybrid format—with both in-person sessions in Chicago, Illinois and the option for virtual participation. Presenters and attendees are welcome to join from anywhere.


Workshop Committee

  • Hannes Westermann, Maastricht University Faculty of Law
  • Jaromír Savelka, Carnegie Mellon University
  • Marc Lauritsen, Capstone Practice Systems
  • Margaret Hagan, Stanford Law School, Legal Design Lab
  • Quinten Steenhuis, Suffolk University Law School


Submit Your Work

For full submission guidelines, visit the official workshop site:
https://suffolklitlab.org/ai-for-access-to-justice-at-the-international-conference-on-ai-and-law-2025-ai4a2j-icail25/

Submit your paper at EasyChair here.

Submissions are due by May 4, 2025.
Notifications of acceptance will be sent by May 18, 2025.


We’re thrilled to help convene this conversation on the future of AI and justice—and we hope to see your ideas included. Please spread the word to others in your network who are building, researching, or questioning the role of AI in the justice system.

Categories
AI + Access to Justice Current Projects

Measuring What Matters: A Quality Rubric for Legal AI Answers

by Margaret Hagan, Executive Director of the Legal Design Lab


As more people turn to AI for legal advice, a pressing issue emerges: How do we know whether AI-generated legal answers are actually helpful? While legal professionals and regulators may have instincts about good and bad answers, there has been no clear, standardized way to evaluate AI’s performance in this space — until now.

What makes a good answer on a chatbot, clinic, livechat, or LLM site?

My paper for the JURIX 2024 conference, Measuring What Matters: Developing Human-Centered Legal Q-and-A Quality Standards through Multi-Stakeholder Research, tackles this challenge head-on. Through a series of empirical studies, the paper develops a human-centered framework for evaluating AI-generated legal answers, ensuring that quality benchmarks align with what actually helps people facing legal problems. The findings provide valuable guidance for legal aid organizations, product developers, and policymakers who are shaping the future of AI-driven legal assistance.

Why Quality Standards for AI Legal Help Matter

When people receive a legal notice — like an eviction warning or a debt collection letter — they often turn to the internet for guidance. Platforms such as Reddit’s r/legaladvice, free legal aid websites, and now AI chatbots have become common sources of legal information. However, the reliability and usefulness of these answers vary widely.

AI’s increasing role in legal Q&A raises serious questions:

  • Are AI-generated answers accurate and actionable?
  • Do they actually help users solve legal problems?
  • Could they mislead people, causing harm rather than good?

My research addresses these concerns by involving multiple stakeholders — end users, legal experts, and technologists — to define what makes a legal answer “good.”

The paper reveals several surprising insights about what actually matters when evaluating AI’s performance in legal Q&A. Here are some key takeaways that challenge conventional assumptions:

1. Accuracy Alone Isn’t Enough — Actionability Matters More

One of the biggest surprises is that accuracy is necessary but not sufficient. While many evaluations of legal AI focus on whether an answer is legally correct, the study finds that what really helps people is whether the answer provides clear, actionable steps. A technically accurate response that doesn’t tell someone what to do next is not as valuable as a slightly less precise but highly actionable answer.

Example of accuracy that is not helpful to user’s outcome:

  • AI says: “Your landlord is violating tenant laws in your state.” (Accurate but vague)
  • AI says: "You should file a response within a short time period — often 7 days. (The exact deadline may differ depending on your situation.) Here's a link to your county's tenant protection forms and a local legal aid service." (Actionable and useful)

2. Accurate Information Is Not Always Good for the User

The study highlights that some legal rights exist on paper but can be risky to exercise in practice — especially without proper guidance. For example, withholding rent is a legal remedy in many states if a landlord fails to make necessary repairs. However, in reality, exercising this right can backfire:

  • Many landlords retaliate by starting eviction proceedings.
  • The tenant may misapply the law, thinking they qualify when they don’t.
  • Even when legally justified, withholding rent can lead to court battles that tenants often lose if they don’t follow strict procedural steps.

This is a case where AI-generated legal advice could be technically accurate but still harmful if it doesn’t include risk disclosures. The study suggests that high-risk legal actions should always come with clear warnings about potential consequences. Instead of simply stating, “You have the right to withhold rent,” a high-quality AI response should add:

  • “Withholding rent is legally allowed in some cases, but it carries huge risks, including eviction. It’s very hard to withhold rent correctly. Reach out to this tenants’ rights organization before trying to do it on your own.”

This principle applies to other “paper rights” too — such as recording police interactions, filing complaints against employers, or disputing debts — where following the law technically might expose a person to serious retaliation or legal consequences.

Legal answers should not just state rights but also warn about practical risks — helping users make informed, strategic decisions rather than leading them into legal traps.

3. Legal Citations Aren’t That Valuable for Users

Legal experts often assume that providing citations to statutes and case law is crucial for credibility. However, both users and experts in the study ranked citations as a lower-priority feature. Most users don’t actually read or use legal citations — instead, they prefer practical, easy-to-understand guidance.

However, citations do help in one way: they allow users to verify information and use it as leverage in disputes (e.g., showing a landlord they know their rights). The best AI responses include citations sparingly and with context, rather than overwhelming users with legal references.

4. Overly Cautious Warnings Can Be Harmful

Many AI systems include disclaimers like “Consult a lawyer before taking any action.” While this seems responsible, the study found that excessive warnings can discourage people from acting at all.

Since most people seeking legal help online don’t have access to a lawyer, AI responses should avoid paralyzing users with fear and instead guide them toward steps they can take on their own — such as contacting free legal aid or filing paperwork themselves.

5. Misleading Answers Are More Dangerous Than Completely Wrong Ones

AI-generated legal answers that contain partial truths or misrepresentations are actually more dangerous than completely wrong ones. Users tend to trust AI responses by default, so if an answer sounds authoritative but gets key details wrong (like deadlines or filing procedures), it can lead to serious harm (e.g., missing a legal deadline).

The study found that the most harmful AI errors were related to procedural law — things like incorrect filing deadlines, court names, or legal steps. Even small errors in these areas can cause major problems for users.

6. The Best AI Answers Function Like a “Legal GPS”

Rather than replacing lawyers, users want AI to act like a smart navigation system — helping them spot legal issues, identify paths forward, and get to the right help. The most helpful answers do this by:

  • Quickly diagnosing the problem (understanding what the user is asking about).
  • Giving step-by-step guidance (telling the user exactly what to do next).
  • Providing links to relevant forms and local services (so users can act on the advice).

Instead of just stating the law, AI should orient users, give them confidence, and point them toward useful actions — even if that means simplifying some details to keep them engaged.

AI’s Role in Legal Help Is About Empowerment, Not Just Information

The research challenges the idea that AI legal help should be measured only by how well it mimics a lawyer’s expertise. Instead, the most effective AI legal Q&A focuses on empowering users with clear, actionable, and localized guidance — helping them take meaningful steps rather than just providing abstract legal knowledge.

Key Takeaways for Legal Aid, AI Developers, and Policymakers

The paper’s findings offer important lessons for different stakeholders in the legal AI ecosystem.

1. Legal Aid Organizations: Ensuring AI Helps, Not Hurts

Legal aid groups may increasingly rely on AI to extend their reach, but they must be cautious about its limitations. The research highlights that users want AI tools that:

  • Provide clear, step-by-step guidance on what to do next.
  • Offer jurisdiction-specific advice rather than generic legal principles.
  • Refer users to real-world resources, such as legal aid offices or court forms.
  • Are easy to read and understand, avoiding legal jargon.

Legal aid groups should ensure that the AI tools they deploy adhere to these quality benchmarks. Otherwise, users may receive vague, confusing, or even misleading responses that could worsen their legal situations.

2. AI Product Developers: Building Legal AI Responsibly & Knowing Justice Use Cases

AI developers must recognize that accuracy alone is not enough. The paper identifies four key criteria for evaluating the quality of AI legal answers (a simple scoring sketch follows the list):

  1. Accuracy — Does the answer provide correct legal information? And when legal information is accurate but high-risk, does it tell people about rights and options with sufficient context?
  2. Actionability — Does it offer concrete steps that the user can take?
  3. Empowerment — Does it help users feel capable of handling their problem?
  4. Strategic Caution — Does it avoid causing unnecessary fear or discouraging action?
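
As one way to operationalize these criteria, the hypothetical sketch below encodes them as a simple reviewer scorecard. The 0-2 scale and equal weighting are illustrative assumptions, not the rubric published in the paper.

```python
from dataclasses import dataclass

# A hypothetical scorecard for the four criteria; the 0-2 scale and equal
# weights are illustrative assumptions, not the paper's actual rubric.
@dataclass
class AnswerScores:
    accuracy: int           # 0 = wrong, 1 = partly correct, 2 = correct with context
    actionability: int      # 0 = no next steps, 2 = clear, concrete next steps
    empowerment: int        # 0 = discouraging, 2 = builds user confidence
    strategic_caution: int  # 0 = reckless or fear-inducing, 2 = balanced warnings

    def total(self) -> int:
        return self.accuracy + self.actionability + self.empowerment + self.strategic_caution

score = AnswerScores(accuracy=2, actionability=1, empowerment=2, strategic_caution=1)
print(score.total())  # 6 out of a possible 8
```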

One surprising insight is that legal citations — often seen as a hallmark of credibility — are not as critical as actionability. Users care less about legal precedents and more about what they can do next. Developers should focus on designing AI responses that prioritize usability over technical legal accuracy alone.

3. Policymakers: Regulating AI for Consumer Protection & Outcomes

For regulators, the study underscores the need for clear, enforceable quality standards for AI-generated legal guidance. Without such standards, AI-generated legal help may range from extremely useful to dangerously misleading.

Key regulatory considerations include:

  • Transparency: AI platforms should disclose how they generate answers and whether they have been reviewed by legal experts.
  • Accuracy Audits: Regulators should develop auditing protocols to ensure AI legal help is not systematically providing incorrect or harmful advice.
  • Consumer Protections: Policies should prevent AI tools from deterring users from seeking legal aid when needed.

Policymakers ideally will be in conversation with frontline practitioners, product/model developers, and community members to understand what is important to measure, how to measure it, and how to increase the quality and safety of performance. Evaluation based on concepts like Unauthorized Practice of Law does not necessarily correspond to consumers’ outcomes, needs, and priorities. Rather, figuring out what is beneficial to consumers should be based on what matters to the community and frontline providers.

The Research Approach: A Human-Centered Framework

How did we identify these insights and standards? The study used a three-part research process to hear from community members, frontline legal help providers, and access to justice experts. (Thanks to the Legal Design Lab team for helping me with interviews and study mechanics!)

  1. User Interviews: 46 community members tested AI legal help tools and shared feedback on their usefulness and trustworthiness.
  2. Expert Evaluations: 21 legal professionals ranked the importance of various quality criteria for AI-generated legal answers.
  3. AI Response Ratings: Legal experts assessed real AI-generated answers to legal questions, identifying common pitfalls and best practices.

This participatory, multi-stakeholder approach ensures that quality metrics reflect the real-world needs of legal aid seekers, not just theoretical legal standards.

The Legal Q-and-A Quality Rubric

What’s Next? Implementing the Quality Rubric

The research concludes with a proposed Quality Rubric that can serve as a blueprint for AI developers, researchers, and regulators. This rubric provides a scoring system that evaluates legal AI answers based on their strengths and weaknesses across key quality dimensions.

Potential next steps include:

  • Regular AI audits using the Quality Rubric to track performance.
  • Collaboration between legal aid groups and AI developers to refine AI-generated responses.
  • Policy frameworks that hold AI platforms accountable for misleading or harmful legal information.

Others might be developing internal quality reviews of the RAG bots and AI systems on their own websites and tools. They can use the rubric above as they conduct safety and quality checks, or as they train human labelers or automated AI judges to perform those checks.
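
For teams experimenting with automated judges, a hedged sketch of that setup might look like the following. The judge prompt, the 0-2 scale, and the call_llm() helper are illustrative assumptions; any such judge would itself need validation against human expert ratings.

```python
import json

# A hedged sketch of an automated "AI judge" applying the rubric dimensions;
# the prompt text and call_llm() helper are illustrative placeholders.
JUDGE_PROMPT = """Rate the following AI-generated legal answer from 0 to 2 on each of:
accuracy, actionability, empowerment, strategic_caution.
Return JSON only, for example:
{{"accuracy": 2, "actionability": 1, "empowerment": 2, "strategic_caution": 1}}

Question: {question}
Answer: {answer}
"""

def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM API is used as the judge."""
    raise NotImplementedError

def judge_answer(question: str, answer: str) -> dict:
    """Ask the judge model to score one answer and parse its JSON scores."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(raw)
```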

Conclusion: Measuring AI for Better Access to Justice

AI holds great promise for expanding access to legal help, but it must be measured and managed effectively. My research provides a concrete roadmap for ensuring that AI legal assistance is not just technically impressive but genuinely useful to people in need.

For legal aid organizations, the priority should be integrating AI tools that align with the study’s quality criteria. For AI developers, the challenge is to design products that go beyond accuracy and focus on usability, actionability, and strategic guidance. And for policymakers, the responsibility lies in crafting regulations that ensure AI-driven legal help does more good than harm.

As AI continues to transform how people access legal information, establishing clear, human-centered quality standards will be essential in shaping a fair and effective legal tech landscape.

Need for More Benchmarks of More Legal Tasks

In addition to this current focus on Legal Q-and-A, the justice community also needs to create similar evaluation standards and protocols for other tasks. Besides answering brief legal questions, there are other quality questions that matter to people’s outcomes, rights, and justice. This is the first part of a much bigger effort to have measurable, meaningful justice interventions.

This focus on delineated tasks & quality measures for each will be essential for quality products and models — serving the public — and unlocking greater scale and support of innovation.

Categories
AI + Access to Justice Class Blog Current Projects

Class Presentations for AI for Legal Help

Last week, the 5 student teams in Autumn Quarter's AI for Legal Help class made their final presentations about whether and how generative AI could assist legal aid groups, courts, and bar associations in providing legal help to the public.

The class’s 5 student groups have been working over the 9-week quarter with partners including the American Bar Association, Legal Aid Society of San Bernardino, Neighborhood Legal Services of LA, and LA Superior Court Help Center. The partners came to the class with some ideas, and the student teams worked with them to scope & prototype new AI agents to do legal tasks, including:

  • Demand letters for reasonable accommodations
  • Motions to set aside to stop an impending eviction/forcible set-out
  • Triaging court litigants to direct them to appropriate services
  • Analyzing eviction litigants’ case details to spot defenses
  • Improving lawyers’ responses to online brief advice clinic users’ questions

The AI agents are still in early stages. We’ll be continuing refinement, testing, and pilot-planning next quarter.

Categories
AI + Access to Justice Current Projects

AI + Access to Justice Summit 2024

On October 17 and 18, 2024, the Stanford Legal Design Lab hosted the first-ever AI and Access to Justice Summit.

The Summit’s primary goal was to build strong relationships and a national, coordinated roadmap of how AI can responsibly be deployed and held accountable to close the justice gap.

AI + A2J Summit at Stanford Law School

Who was at the Summit?

Two law firm sponsors, K&L Gates and DLA Piper, supported the Summit through travel scholarships, program costs, and strategic guidance.

The main group of invitees were frontline legal help providers at legal aid groups, law help website teams, and the courts. We know they are key players in deciding what kinds of AI should and could be impactful for closing the justice gap. They’ll also be key partners in developing, piloting, and evaluating new AI solutions.

Key supporters and regional leaders from bar foundations, philanthropies, and pro bono groups were also invited. Their knowledge about funding, scaling, past initiatives, and spreading projects from one organization and region to others was key to the Summit.

Technology developers also came, both from big technology companies like Google and Microsoft and from legal technology companies like Josef, Thomson Reuters, Briefpoint, and Paladin. Some of these groups already have AI tools for legal services, but not all of them have focused on access to justice use cases.

In addition, we invited researchers who are developing responsible, privacy-forward, and efficient ways to build specialized AI solutions for people in the justice sphere, and who can draw lessons from how AI is being deployed in parallel fields like medicine and mental health.

Finally, we had participants who work in regulation and policy-making at state bars, to talk about policy, ethics, and balancing innovation with consumer protection. The ‘rules of the road’ about what kinds of AI can be built and deployed, and what standards they need to follow, are essential for clarity and predictability among developers.

What Happened at the Summit?

The Summit was a 2-day event, split intentionally into 5 sections:

  • Hands-On AI Training: Examples and Research to upskill legal professionals. There were demos, explainers, and strategies about what AI solutions are already in use or possible for legal services. Big tech, legal tech, and computer science researchers gave participants a hands-on, practical, detailed tour of AI tools, examples, and protocols that can be useful in developing new solutions to close the justice gap.
  • Big Vision: Margaret Hagan and Richard Susskind opened up the 2nd day with a challenge: where does the access to justice community want to be in 2030 when it comes to AI and the justice gap? How can individual organizations collaborate, build common infrastructure, and learn from each other to reach our big-picture goals?
  • AI+A2J as of 2024: In the morning of the second day, two panels presented on what is already happening in AI and Access to Justice — including an inventory of current pilots, demos of some early legal aid chatbots, regulators' guidelines, and innovation sandboxes. This helped the group understand the early-stage developments and policies.
  • Design & Development of New Initiatives. In the afternoon of the second day, we led breakout design workshops on specific use cases: housing law, immigration law, legal aid intake, and document preparation. The diverse stakeholders worked together using our AI Legal Design workbook to scope out a proposal for a new solution — whether that might mean building new technology or adapting off-the-shelf tech to the needs.
  • Support & Collaboration. In the final session, we heard from a panel on support: financial support, pro bono partnership support, technology company licensing and architecture support, and other ways to build new interdisciplinary relationships that could unlock the talent, strategy, momentum, and finances necessary to make AI innovation happen. We also discussed support around evaluation, so that there could be more data and greater confidence in deploying these new tools.

Takeaways from the Summit

The Summit built strong relationships and common understanding among technologists, providers, researchers, and supporters. Our hope is to run the Summit annually, to track year-to-year progress in tackling the justice gap with AI and to watch these relationships, collaborations, and their impact grow.

In addition, some key points emerged from the training, panels, workshops, and down-time discussions.

Common Infrastructure for AI Development

Though many AI pilots are going to have to be local to a specific organization in a specific region, the national (or international) justice community can be working on common resources that can serve as infrastructure to support AI for justice.

  • Common AI Trainings: Regional leaders, who are newly being hired by state bars and bar foundations to train and explore how AI can fit with legal services, should be working together to develop common training, common resources, and common best practices.
  • Project Repository: National organizations and networks should be thinking about a common repository of projects. This inventory could track what tech provider is being used, what benchmark is being used for evaluation, what AI model is being deployed, what data it was fine-tuned on, and if and how others could replicate it.
  • Rules of the Road Trainings. National organizations and local regulators could give more guidance to leadership like legal aid executive directors about what is allowed or not allowed, what is risky or safe, or other clarification that can help more leadership be brave and knowledgeable about how to deploy AI responsibly. When is an AI project sufficiently tested to be released to the public? How should the team be maintaining and tracking an AI project, to ensure it’s mitigating risk sufficiently?
  • Public Education. Technology companies, regulators, and frontline providers need to be talking more about how to make sure that the AI that is already out there (like ChatGPT, Gemini, and Claude) is reliable, has enough guardrails, and is consumer-safe. More research needs to be done on how to encourage strategic caution among the public, so they can use the AI safely and avoid user mistakes with it (like overreliance or misunderstanding).
  • Regulators<->Frontline Providers. More frontline legal help providers need to be in conversation with regulators (like bar associations, attorneys general, or other state/federal agencies) to talk about their perspective on if and how AI can be useful in closing the justice gap. Their perspective on risks, consumer harms, opportunities, and needs from regulators can ensure that rules are being set to maximize positive impact and minimize consumer harm & technology chilling.
  • Bar Foundation Collaboration. Statewide funders (especially bar foundations) can be talking to each other about their funding, scaling, and AI strategies. Well-resourced bar foundations can share how they are distributing money, what kinds of projects they’re incentivizing, how they are holding the projects accountable, and what local resources or protocols they could share with others.

AI for Justice Should be Going Upstream & Going Big

Richard Susskind charged the group with thinking big about AI for justice. His charges & insights inspired many of the participants throughout the Summit, particularly on two points.

Going Big. Susskind called on legal leaders and technologists not to do piecemeal AI innovation (which might well be the default pathway). Rather, he called on them to work in coordination across the country (if not the globe). The focus should be on reimagining how to use AI as a way to make a fundamental, beneficial shift in justice services. This means not just doing small optimizations or tweaks, but shifting the system to work better for users and providers.

Susskind charged us with thinking beyond augmentation, toward new models of serving the public's justice needs.

Going Upstream. He also charged us with going upstream, figuring out more early ways to spot and get help to people. This means not just adding AI into the current legal aid or court workflow — but developing new service offerings, data links, or community partnerships. Can we prevent more legal problems by using AI before a small problem spirals into a court case or large conflict?

After Susskind's remarks, I focused on coordination among legal actors across the country for AI development. Compared to the last 20 years of legal technology development, are there ways to be more coordinated, and also more focused on impact and accountability?

There might be strategic leaders in different regions of the US and in different issue areas (housing, immigration, debt, family, etc.) who are spreading:

  • best practices,
  • evaluation protocols and benchmarks,
  • licensing arrangements with technology companies,
  • bridges to the technology companies, and
  • conversations with the regulators.

How can the Access to Justice community be more organized so that its voice can be heard as

  • the rules of the road are being defined?
  • technology companies are building and releasing models that the public is going to be using?
  • technology vendors decide if and how they are going to enter this market, and what their pricing and licensing are going to look like?

Ideally, legal aid groups, courts, and bars will be collaborating together to build AI models, agents, and evaluations that can get a significant number of people the legal help they need to resolve their problems — and to ensure that the general, popular AI tools are doing a good job at helping people with their legal problems.

Privacy Engineering & Confidentiality Concerns

One of the main barriers to AI R&D for justice is confidentiality. Legal aid and other help providers have a duty to keep their clients' data confidential, which restricts their ability to use past data to train models or to use current data to execute tasks through AI. In practice, many legal leaders are nervous about any new technology that requires client data: will it lead to data leaks, client harms, regulatory actions, bad press, or other concerning outcomes?

Our technology developers and researchers had cutting-edge proposals for privacy-forward AI development that could address some of these confidentiality concerns. Though these privacy engineering strategies are foreign to many lawyers, the technologists broke them down into step-by-step explanations with examples, to help more legal professionals think about data protection in a systematic, engineering-minded way.

Synthetic Data. One of the privacy-forward strategies discussed was synthetic data. With this solution, a developer doesn’t use real, confidential data to train a system. Rather, they create a parallel but fictional set of data — like a doppelganger to the original client data. It’s structurally similar to confidential client data, but it contains no real people’s information. Synthetic data is a common strategy in healthcare technology, where there is a similar emphasis on patient confidentiality.

Neel Guha explained to the participants how synthetic data works, and how they might build a synthetic dataset that is free of identifiable data and does not violate ethical duties to confidentiality. He emphasized that the more legal aid and court groups can develop datasets that are share-able to researchers and the public, the more that researchers and technologists will be attracted to working on justice-tech challenges. More synthetic datasets will both be ethically safe & beneficial to collaboration, scaling, and innovation.
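
As a small illustration of the idea, the sketch below generates a fictional intake record that mirrors the structure of a real one. It uses the open-source Faker library for fictional names and addresses; the field names and value ranges are illustrative assumptions, not any organization's actual schema.

```python
import random
from faker import Faker  # open-source library for generating fictional personal data

fake = Faker()

# Illustrative case types and fields only; not a real organization's schema.
CASE_TYPES = ["eviction - nonpayment", "eviction - lease violation", "habitability", "lockout"]

def synthetic_intake_record() -> dict:
    """Return a record that is structurally like real intake data but fully fictional."""
    return {
        "client_name": fake.name(),
        "address": fake.address().replace("\n", ", "),
        "household_size": random.randint(1, 6),
        "monthly_income": random.randint(800, 4000),
        "case_type": random.choice(CASE_TYPES),
        "notice_received": fake.date_this_year().isoformat(),
    }

print(synthetic_intake_record())
```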

Federated Model Training. Another privacy/confidentiality strategy was Federated Model Training. A Google DeepMind team presented on this strategy, drawing examples from the health system.

In one example, multiple hospitals wanted to work on the same project: training an AI model to better spot tuberculosis and other issues on lung X-rays. Each hospital wanted to train the model on its existing X-ray data, but did not want that confidential data to leave its servers and go to a centralized server. Sharing the data would break their confidentiality requirements.

So instead, the hospitals went with a federated model training protocol. Here, an initial version of the AI model is copied from the central server onto each hospital's local servers. Each local copy trains on that hospital's X-ray data, and the locally updated model (not the underlying data) is sent back to the central server, where the updates are combined into a stronger shared model. The local hospital data is never shared.

In this way, legal aid groups or courts could explore building a shared central model while keeping their confidential data sources on their own private, secure servers. Individual case data stays local, and the collective model lives in a central place and gradually improves. The same technique can support ongoing training, so the model keeps getting smarter as each organization's information and data continue to grow.
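
A toy sketch of the core federated-averaging loop is below: each organization updates a local copy of the model on its own data, and only the model weights travel back to be averaged. The linear model, learning rate, and random data are illustrative assumptions, not anything presented at the Summit.

```python
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One step of local training (here, plain linear-regression gradient descent)."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(global_weights: np.ndarray, org_datasets) -> np.ndarray:
    """Each org trains a local copy; the center averages the returned weights."""
    local_weights = [local_update(global_weights.copy(), X, y) for X, y in org_datasets]
    return np.mean(local_weights, axis=0)

# Toy example: two organizations with private datasets that are never pooled.
rng = np.random.default_rng(0)
orgs = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(2)]
weights = np.zeros(3)
for _ in range(5):
    weights = federated_round(weights, orgs)
print(weights)
```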

Towards the Next Year of AI for Access to Justice

The Legal Design Lab team thanks all of our participants and sponsors for a tremendous event. We learned so much and built new relationships that we look forward to deepening with more collaborations & projects.

We were excited to hear frontline providers walk away with new ideas, concrete plans for how to borrow from others' AI pilots, and an understanding of what might be feasible. We were also excited to see new pro bono and funding relationships develop that can unlock more resources in this space.

Stay tuned as we continue our work on AI R&D, evaluation, and community-building in the access to justice community. We look forward to working towards closing the justice gap, through technology and otherwise!

Categories
AI + Access to Justice Current Projects

Housing Law experts wanted for AI evaluation research

We are recruiting Housing Law experts to participate in a study of AI answers to landlord-tenant questions. Please sign up here if you are a housing law practitioner interested in this study.

Experts who participate in interviews and AI-ranking sessions will receive Amazon gift cards for their participation.

Categories
AI + Access to Justice Current Projects

Design Workbook for Legal Help AI Pilots

For our upcoming AI+Access to Justice Summit and our AI for Legal Help class, our team has made a new design workbook to guide people through scoping a new AI pilot.

We encourage others to use and explore this AI Design Workbook to help think through:

  • Use Cases and Workflows
  • Specific Legal Tasks that AI could do (or should not do)
  • User Personas, and how they might need or worry about AI — or how they might be affected by it
  • Data plans for training AI and for deploying it
  • Risks, laws, ethics brainstorming about what could go wrong or what regulators might require, and mitigation/prevention plans to proactively deal with these concerns
  • Quality and Efficiency Benchmarks to aim for with a new intervention (and how to compare the tech with the human service)
  • Support needed to go into the next phases, of tech prototyping and pilot deployment

Responsible AI development should go through these three careful stages: design and policy research, tech prototyping and benchmark evaluation, and piloting in a controlled, careful way. We hope this workbook can be useful to groups who want to get started on this journey!