
Legal Aid Intake & Screening AI

A Report on an AI-Powered Intake & Screening Workflow for Legal Aid Teams 

AI for Legal Help, Legal Design Lab, 2025

This report provides a write-up of the AI for Housing Legal Aid Intake & Screening class project, which was one track of the “AI for Legal Help” Policy Lab during the Autumn 2024 and Winter 2025 quarters. The AI for Legal Help course involved work with legal and court groups that provide legal help services to the public, to understand where responsible AI innovations might be possible and to design and prototype initial solutions, along with pilot and evaluation plans.

One of the project tracks focused on improving the workflows of legal aid teams that provide housing help, particularly their struggle with high demand from community members and a lack of clarity about whether and how a person can be served by the legal aid group. Between Autumn 2024 and Winter 2025, an interdisciplinary team of Stanford University students partnered with the Legal Aid Society of San Bernardino (LASSB) to understand the current design of housing intake & screening, and to propose an improved, AI-powered workflow.

This report details the problem identified by LASSB, the proposed AI-powered intake & screening workflow developed by the student team, and recommendations for future development and implementation. 

We share it in the hopes that legal aid and court help center leadership might also be interested in exploring responsible AI development for intake & screening, and that funders, researchers, and technologists might collaborate on developing and testing successful solutions for this task.

Thank you to the students on this team: Favour Nerisse, Gretel Cannon, Tatiana Zhang, and other collaborators. And a big thank you to our LASSB colleagues: Greg Armstrong, Pablo Ramirez, and more.

Introduction

The Legal Aid Society of San Bernardino (LASSB) is a nonprofit law firm serving low-income residents across San Bernardino and Riverside Counties, where housing issues – especially evictions – are the most common legal problems facing the community. Like many legal aid organizations, LASSB operates under severe resource constraints and high demand.

In the first half of 2024 alone, LASSB assisted over 1,200 households (3,261 individuals) with eviction prevention and landlord-tenant support. Yet many more people seek help than LASSB can serve, and those who do seek help often face barriers like long hotline wait times or lack of transportation to clinics. These challenges make the intake process – the initial screening and information-gathering when a client asks for help – a critical bottleneck. If clients cannot get through intake or are screened out improperly, they effectively have no access to justice.

Against this backdrop, LASSB partnered with a team of Stanford students in the AI for Legal Help practicum to explore an AI-based solution. The task selected was housing legal intake: using an AI “Intake Agent” to streamline eligibility screening and initial fact-gathering for clients with housing issues (especially evictions). The proposed solution was a chatbot-style AI assistant that could interview applicants about their legal problem and situation, apply LASSB’s intake criteria, and produce a summary for legal aid staff. By handling routine, high-volume intake questions, the AI agent aimed to reduce client wait times and expand LASSB’s reach to those who can’t easily come in or call during business hours. The students planned a phased evaluation and implementation: first prototyping the agent with sample data, then testing its accuracy and safety with LASSB staff, before moving toward a limited pilot deployment. This report details the development of that prototype AI Intake Agent across the Autumn and Winter quarters, including the use case rationale, current vs. future workflow, technical design, evaluation findings, and recommendations for next steps.

1: The Use Case – AI-Assisted Housing Intake

Defining the Use Case of Intake & Screening

The project focused on legal intake for housing legal help, specifically tenants seeking assistance with eviction or unsafe housing. Intake is the process by which legal aid determines who qualifies for help and gathers the facts of their case. For a tenant facing eviction, this means answering questions about income, household, and the eviction situation, so the agency can decide if the case falls within their scope (for example, within income limits and legal priorities).

Intake is a natural first use case because it is a gateway to justice: a short phone interview or online form is often all that stands between a person in crisis and the help they need. Yet many people never complete this step due to practical barriers (long hold times, lack of childcare or transportation, fear or embarrassment). 

By improving intake, LASSB could assist more people early, preventing more evictions or legal problems from escalating.

Why LASSB Chose Housing Intake 

LASSB and the student team selected the housing intake scenario for several reasons. First, housing is LASSB’s highest-demand area – eviction defense was 62% of cases for a neighboring legal aid and similarly dominant for LASSB. This high volume means intake workers spend enormous time screening housing cases, and many eligible clients are turned away simply because staff can’t handle all the calls. Improving intake throughput could thus have an immediate impact. Second, housing intake involves highly repetitive and rules-based questions (e.g. income eligibility, case type triage) that are well-suited to automation. These are precisely the kind of routine, information-heavy tasks that AI can assist with at scale. 

Third, an intake chatbot could increase privacy and reach: clients could complete intake online 24/7, at their own pace, without waiting on hold or revealing personal stories to a stranger right away. This could especially help those in rural areas or those uncomfortable with an in-person or phone interview. In short, housing intake was seen as a high-impact, AI-ready use case where automation might improve efficiency while preserving quality of service.

Why Intake Matters for Access to Justice

Intake may seem mundane, but it is a cornerstone of access to justice. It is the “front door” of legal aid – if the door is locked or the line too long, people simply don’t get help. Studies show that only a small fraction of people with civil legal issues ever consult a lawyer, often because they don’t recognize their problem as legal or face obstacles seeking help. Even among those who do reach out to legal aid (nearly 2 million requests in 2022), about half are turned away due to insufficient resources. Many turn-aways happen at the intake stage, when agencies must triage cases. Improving intake can thus shrink the “justice gap” by catching more issues early and providing at least some guidance to those who would otherwise get nothing. 

Moreover, a well-designed intake process can empower clients – by helping them tell their story, identifying their urgent needs, and connecting them to appropriate next steps. On the flip side, a bad intake experience (confusing questions, long delays, or perfunctory denials) can discourage people from pursuing their rights, effectively denying justice. By focusing on intake, the project aimed to make the path to legal help smoother and more equitable.

Why AI Is a Good Fit for Housing Intake

Legal intake involves high volume, repetitive Q&A, and standard decision rules, which are conditions where AI can excel. A large language model (LLM) can be programmed to ask the same questions an intake worker would, in a conversational manner, and interpret the answers. 

Because LLMs can process natural language, an AI agent can understand a client’s narrative of their housing problem and spot relevant details or legal issues (e.g. identifying an illegal lockout vs. a formal eviction) to ask appropriate follow-ups. This dynamic questioning is something LLMs have demonstrated success in – for example, a recent experiment in Missouri showed that an LLM could generate follow-up intake questions “in real-time” based on a user’s description, like asking whether a landlord gave formal notice after a tenant said “I got kicked out.” AI can also help standardize decisions: by encoding eligibility rules into the prompt or system, it can apply the same criteria every time, potentially reducing inconsistent screening outcomes. Importantly, initial research found that GPT-4-based models could predict legal aid acceptance/rejection decisions with about 84% accuracy, and they erred on the side of caution (usually not rejecting a case unless clearly ineligible). This suggests AI intake systems can be tuned to minimize false denials, a critical requirement for fairness.

Beyond consistency and accuracy, AI offers scalability and extended reach. Once developed, an AI intake agent can handle multiple clients at once, anytime. For LASSB, this could mean a client with an eviction notice can start an intake at midnight rather than waiting anxious days for a callback. Other legal aid groups have already seen the potential: Legal Aid of North Carolina’s chatbot “LIA” has engaged in over 21,000 conversations in its first year, answering common legal questions and freeing up staff time. LASSB hopes for similar gains – the Executive Director noted plans to test AI tools to “reduce client wait times” and extend services to rural communities that in-person clinics don’t reach. Finally, an AI intake agent can offer a degree of client comfort – some individuals might prefer typing out their story to a bot rather than speaking to a person, especially on sensitive issues like domestic violence intersecting with an eviction. In summary, the volume, repetitive structure, and outreach potential of intake made it an ideal candidate for an AI solution.

2: Status Quo and Future Vision

Current Human-Led Workflow 

At present, LASSB’s intake process is entirely human-driven. A typical workflow might begin with a client calling LASSB’s hotline or walking into a clinic. An intake coordinator or paralegal then screens for eligibility, asking a series of standard questions: Are you a U.S. citizen or eligible immigrant? What is your household size and income? What is your zip code or county? What type of legal issue do you have? These questions correspond to LASSB’s internal eligibility rules (for example, income below a percentage of the poverty line, residence in the service area, and case type within program priorities). 

The intake worker usually follows a scripted guide – these guides can run 7+ pages of rules and flowcharts for different scenarios. If the client passes initial screening, the staffer moves on to information-gathering: taking down details of the legal problem. In a housing case, they might ask: “When did you receive the eviction notice? Did you already go to court? How many people live in the unit? Do you have any disabilities or special circumstances?” This helps determine the urgency and possible defenses (for instance, disability could mean a reasonable accommodation letter might help, or a lockout without court order is illegal). The intake worker must also gauge if the case fits LASSB’s current priorities or grant requirements – a subtle judgment call often based on experience. 

Once information is collected, the case is handed off internally: if it’s straightforward and within scope, they may schedule the client for a legal clinic or assign a staff attorney for advice. If it’s a tougher or out-of-scope case, the client might be given a referral to another agency or a “brief advice” appointment where an attorney only gives counsel and not full representation. In some instances, there are multiple handoffs – for example, the person who does the phone screening might not be the one who ultimately provides the legal advice, requiring good note-taking and case summaries.

User Personas in the Workflow

The team crafted sample user and staff personas of the people who would be interacting with the new workflow and AI agent.


Pain Points in the Status Quo

This human-centric process has several pain points identified by LASSB and the student team. 

First, it’s slow and resource-intensive. Clients can wait an hour or more on hold before even speaking to an intake worker during peak times, such as when an eviction moratorium change causes a surge in calls. Staff capacity is limited – a single intake worker can only handle one client at a time, and each interview might take 20–30 minutes. If the client is ultimately ineligible, that time might be “wasted” that could have been spent on an eligible client. The sheer volume means many callers never get through at all. 

Second, the complexity of rules can lead to inconsistent or suboptimal outcomes. Intake staff have to juggle 30+ eligibility rules, which can change with funding or policy shifts. Important details might be missed or misapplied; for example, a novice staffer might turn away a case that seems outside scope but actually fits an exception. Indeed, variability in intake decisions was a known issue – one research project found that LLMs sometimes caught errors made by human screeners (e.g., the AI recognized a case was eligible when a human mistakenly marked it as not). 

Third, the process can be stressful for clients. Explaining one’s predicament (like why rent is behind) to a stranger can be intimidating. Clients in crisis might forget to mention key facts or have trouble understanding the questions. If a client has trauma (such as a domestic violence survivor facing eviction due to abuse), a blunt interview can inadvertently re-traumatize them. LASSB intake staff are trained to be sensitive, but in the rush of high volume, the experience may still feel hurried or impersonal. 

Finally, timing and access are issues. Intake typically happens during business hours via phone or at specific clinic times. People who work, lack a phone, or have disabilities may struggle to engage through those channels. Language barriers can also be an issue; while LASSB offers services in Spanish and other languages, matching bilingual staff to every call is challenging. All these pain points underscore a need for a more efficient, user-friendly intake system.

Envisioned Human-AI Workflow

In the future-state vision, LASSB’s intake would be a human-AI partnership, blending automation with human judgment. The envisioned workflow goes as follows: A client in need of housing help would first interact with an AI Intake Agent, likely through a web chat interface (or possibly via a self-help kiosk or mobile app). 

The AI agent would greet the user with a friendly introduction (making clear it’s an automated assistant) and guide them through the eligibility questions – e.g., asking for their income range, household size, and problem category. These could even be answered via simple buttons or quick replies to make it easy. The agent would use these answers to do an initial screening (following the same rules staff use). If clearly ineligible (for instance, the person lives outside LASSB’s service counties), the agent would not simply turn them away. Instead, it might gently inform them that LASSB likely cannot assist directly and provide a referral link or information for the appropriate jurisdiction. (Crucially, per LASSB’s guidance, the AI would err on inclusion – if unsure, it would mark the case for human review rather than issuing a flat denial.) 
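
To make the screening step concrete, here is a minimal sketch in Python of how the clear-cut eligibility checks might be encoded so the agent errs on inclusion. The thresholds, county list, and issue categories are illustrative assumptions, not LASSB’s actual rules:

```python
from dataclasses import dataclass
from enum import Enum


class ScreeningResult(Enum):
    LIKELY_ELIGIBLE = "likely_eligible"
    HUMAN_REVIEW = "human_review"   # default whenever anything is unclear
    REFER_OUT = "refer_out"         # e.g., clearly outside the service area


@dataclass
class Applicant:
    county: str | None
    household_size: int | None
    monthly_income: float | None
    issue_type: str | None


# Illustrative values only -- the real thresholds and counties would be configured by LASSB staff.
SERVICE_COUNTIES = {"san bernardino", "riverside"}
INCOME_LIMITS_BY_HOUSEHOLD = {1: 1882.0, 2: 2555.0, 3: 3227.0, 4: 3900.0}  # hypothetical monthly limits
HOUSING_ISSUES = {"eviction", "lockout", "habitability", "landlord_tenant"}


def pre_screen(a: Applicant) -> ScreeningResult:
    """Apply only the clear-cut rules; anything ambiguous goes to a human."""
    # Outside the service area is the one case the agent routes to a referral.
    if a.county and a.county.lower() not in SERVICE_COUNTIES:
        return ScreeningResult.REFER_OUT

    # Missing answers never produce a denial.
    if a.household_size is None or a.monthly_income is None or a.issue_type is None:
        return ScreeningResult.HUMAN_REVIEW

    limit = INCOME_LIMITS_BY_HOUSEHOLD.get(a.household_size)
    if limit is None or a.monthly_income > limit:
        # Over-income cases may still qualify under exceptions, so flag rather than reject.
        return ScreeningResult.HUMAN_REVIEW

    if a.issue_type.lower() not in HOUSING_ISSUES:
        return ScreeningResult.HUMAN_REVIEW

    return ScreeningResult.LIKELY_ELIGIBLE
```

The design choice mirrored in this sketch is that only a clearly out-of-area case triggers a referral; every other uncertain path resolves to human review rather than a denial.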

For those who pass the basic criteria, the AI would proceed to collect case facts: “Please describe what’s happening with your housing situation.” As the user writes or speaks (in a typed chat or possibly voice in the future), the AI will parse the narrative and ask smart follow-ups. For example, if the client says “I’m being evicted for not paying rent,” the AI might follow up: “Have you received court papers (an unlawful detainer lawsuit) from your landlord, or just a pay-or-quit notice?” – aiming to distinguish a looming eviction from an active court case. This dynamic Q&A continues until the AI has enough detail to fill out an intake template (or until it senses diminishing returns from more questions). The conversation is designed to feel like a natural interview with empathy and clarity.

After gathering info, the handoff to humans occurs. The AI will compile a summary of the intake: key facts like names, important dates (e.g., eviction hearing date if any), and the client’s stated goals or concerns. It may also tentatively flag certain legal issues or urgency indicators – for instance, “Client might qualify for a disability accommodation defense” or “Lockout situation – urgent” – based on what it learned. This summary and the raw Q&A transcript are then forwarded to LASSB’s intake staff or attorneys. A human will review the package, double-check eligibility (the AI’s work is a recommendation, not final), and then follow up with the client. In some cases, the AI might be able to immediately route the client: for example, scheduling them for the next eviction clinic or providing a link to self-help resources while they wait.

But major decisions, like accepting the case for full representation or giving legal advice, remain with human professionals. The human staff thus step in at the “decision” stage with a lot of the grunt work already done. They can spend their time verifying critical details and providing counsel, rather than laboriously collecting background info. This hybrid workflow means clients get faster initial engagement (potentially instantaneous via AI, instead of waiting days for a call) and staff time is used more efficiently where their expertise is truly needed.

Feedback-Shaped Vision

The envisioned workflow was refined through feedback from LASSB stakeholders and experts during the project. Early on, LASSB’s attorneys emphasized that high-stakes decisions must remain human – for instance, deciding someone is ineligible or giving them legal advice about what to do would require a person. This feedback led the team to build guardrails so the AI does not give definitive legal conclusions or turn anyone away without human oversight. Another piece of feedback was about tone and trauma-informed practice. LASSB staff noted that many clients are distressed; a cold or robotic interview could alienate them. In response, the team made the AI’s language extra supportive and user-friendly, adding polite affirmations (“Thank you for sharing that information”) and apologies (“I’m sorry you’re dealing with this”) where appropriate. 

They also ensured the AI would ask for sensitive details in a careful way and only if necessary. For example, rather than immediately asking “How much is your income?” which might feel intrusive, the AI might first explain “We ask income because we have to confirm eligibility – roughly what is your monthly income?” to give context. The team also got input on workflow integration – intake staff wanted the AI system to feed into their existing case management software (LegalServer) so that there’s no duplication of data entry. This shaped the plan for implementation (i.e., designing the output in a format that can be easily transferred). Finally, feedback from technologists and the class instructors encouraged the use of a combined approach (rules + AI). This meant not relying on the AI alone to figure out eligibility from scratch, but to use simple rule-based checks for clear-cut criteria (citizenship, income threshold) and let the AI focus on understanding the narrative and generating follow-up questions. 

This hybrid approach was validated by outside research as well. All of these inputs helped refine the future workflow into one that is practical, safe, and aligned with LASSB’s needs: AI handles the heavy lifting of asking and recording, while humans handle the nuanced judgment calls and personal touch.


3: Prototyping and Technical Work

Initial Concepts from Autumn Quarter 

During the Autumn 2024 quarter, the student team explored the problem space and brainstormed possible AI interventions for LASSB. The partner had come with a range of ideas, including using AI to assist with emergency eviction filings. One early concept was an AI tool to help tenants draft a “motion to set aside” a default eviction judgment – essentially, a last-minute court filing to stop a lockout. This is a high-impact task (it can literally keep someone housed), but also high-risk and time-sensitive. Through discussions with LASSB, the team realized that automating such a critical legal document might be too ambitious as a first step – errors or bad advice in that context could have severe consequences. 

Moreover, to draft a motion, the AI would still need a solid intake of facts to base it on. This insight refocused the team on the intake stage as the foundation. Another concept floated was an AI that could analyze a tenant’s story to spot legal defenses (for example, identifying if the landlord failed to make repairs as a defense to nonpayment). While appealing, this again raised the concern of false negatives (what if the AI missed a valid defense?) and overlapped with legal advice. Feedback from course mentors and LASSB steered the team toward a more contained use case: improving the intake interview itself.

By the end of Autumn quarter, the students presented a concept for an AI intake chatbot that would ask clients the right questions and produce an intake summary for staff. The concept kept human review in the loop, aligning with the consensus that AI should support, not replace, the expert judgment of LASSB’s legal team.

Revised Scope in Winter 

Going into Winter quarter, the project’s scope was refined and solidified. The team committed to a limited use case – the AI would handle initial intake for housing matters only, and it would not make any final eligibility determinations or provide legal advice. All high-stakes decisions were deferred to staff. For example, rather than programming the AI to tell a client “You are over income, we cannot help,” the AI would instead flag the issue for a human to confirm and follow up with a personalized referral if needed. Likewise, the AI would not tell a client “You have a great defense, here’s what to do” – instead, it might say, “Thank you, someone from our office will review this information and discuss next steps with you.” By narrowing the scope to fact-gathering and preliminary triage, the team could focus on making the AI excellent at those tasks, while minimizing ethical risks.

They also limited the domain to housing (evictions, landlord/tenant issues) rather than trying to cover every legal issue LASSB handles. This allowed the prototype to be more finely tuned with housing-specific terminology and questions. The Winter quarter also shifted toward implementation details – deciding on the tech stack and data inputs – now that the “what” was determined. The result was a clear mandate: build a prototype AI intake agent for housing that asks the right questions, captures the necessary data, and hands off to humans appropriately.

Prototype Development Details 

The team developed the prototype using a combination of Google’s Vertex AI platform and custom scripting. Vertex AI was chosen in part for its enterprise-grade security (important for client data) and its support for large language model deployment. Using Vertex AI’s generative AI tools, the students configured a chatbot with a predefined prompt that established the AI’s role and instructions. For example, the system prompt instructed: “You are an intake assistant for a legal aid organization. Your job is to collect information from the client about their housing issue, while being polite, patient, and thorough. You do not give legal advice or make final decisions. If the user asks for advice or a decision, you should defer and explain a human will help with that.” This kind of prompt served as a guardrail for the AI’s behavior.
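
As a rough illustration of how such a guardrail prompt can be attached to a hosted model, the sketch below uses the Vertex AI Python SDK’s GenerativeModel with a system instruction. The project ID, model name, and exact wording are assumptions for illustration, not the team’s precise configuration:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project ID

SYSTEM_PROMPT = """You are an intake assistant for a legal aid organization.
Collect information from the client about their housing issue while being polite,
patient, and thorough. Do not give legal advice or make final eligibility decisions.
If the user asks for advice or a decision, explain that a human will help with that."""

intake_model = GenerativeModel(
    "gemini-1.5-pro",                 # assumed model; the prototype used a Vertex-hosted LLM
    system_instruction=SYSTEM_PROMPT,  # the guardrail prompt described above
)

chat = intake_model.start_chat()
reply = chat.send_message("Hi, my landlord gave me a paper saying I have to leave.")
print(reply.text)
```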

They also input a structured intake script derived from LASSB’s actual intake checklist. This script included key questions (citizenship, income, etc.) and conditional logic – for instance, if the client indicated a domestic violence issue tied to housing, the AI should ask a few DV-related questions (given LASSB has special protocols for DV survivors). Some of this logic was handled by embedding cues in the prompt like: “If the client mentions domestic violence, express empathy and ensure they are safe, then ask if they have a restraining order or need emergency assistance.” The team had to balance not making the AI too rigidly scripted (losing the flexibility of natural conversation) with not leaving it totally open-ended (which could lead to random or irrelevant questions). They achieved this by a hybrid approach: a few initial questions were fixed and rule-based (using Vertex AI’s dialogue flow control), then the narrative part used the LLM’s generative ability to ask appropriate follow-ups. 
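
The hybrid structure might look something like the following sketch, where a handful of fixed questions run as plain code before the narrative portion is handed to the model. The chat object is the Vertex chat session from the earlier sketch, ask_user stands in for whatever interface shows a question and returns the client’s answer, and the NO_MORE_QUESTIONS sentinel is an invented convention, not the team’s actual implementation:

```python
FIXED_QUESTIONS = [
    ("county", "Which county do you live in?"),
    ("household_size", "How many people live in your household, including you?"),
    ("monthly_income", "Roughly what is your household's monthly income? We ask because we have income limits."),
]


def run_intake(chat, ask_user, max_followups: int = 8) -> dict:
    """Rule-based questions first, then LLM-generated follow-ups on the narrative."""
    answers = {}
    for field, question in FIXED_QUESTIONS:
        answers[field] = ask_user(question)

    # Conditional, rule-based branch: special handling when domestic violence is mentioned.
    narrative = ask_user("Please describe what's happening with your housing situation.")
    if "domestic violence" in narrative.lower() or "abuse" in narrative.lower():
        answers["dv_safety"] = ask_user(
            "I'm sorry you're going through this. Are you safe right now, "
            "and do you have a restraining order or need emergency assistance?"
        )

    # Hand the narrative to the LLM and let it drive a bounded number of follow-ups.
    reply = chat.send_message(
        f"Client narrative: {narrative}\nAsk the single most important follow-up question."
    )
    for _ in range(max_followups):
        followup = reply.text.strip()
        if "NO_MORE_QUESTIONS" in followup:  # sentinel the system prompt asks the model to emit when done
            break
        answer = ask_user(followup)
        reply = chat.send_message(
            f"Client answered: {answer}\nAsk the next follow-up, or say NO_MORE_QUESTIONS."
        )

    answers["narrative"] = narrative
    return answers
```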

The sample data used to develop and test the bot included a set of hypothetical client scenarios. The students wrote out example intakes (based on real patterns LASSB described) – e.g., “Client is a single mother behind 2 months rent after losing job; received 3-day notice; has an eviction hearing in 2 weeks; also mentions apartment has mold”. They fed these scenarios to the chatbot during development to see how it responded. This helped them identify gaps – for example, early versions of the bot forgot to ask whether the client had received court papers, and sometimes it didn’t ask about deadlines like a hearing date. Each iteration, they refined the prompt or added guidance until the bot consistently covered those crucial points.
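
One simple way to systematize that kind of gap-checking is a coverage test over each simulated transcript. The scenarios and keyword lists below are invented stand-ins for the team’s actual test assets:

```python
# Hypothetical test scenarios and the points an intake must cover for each.
SCENARIOS = [
    {
        "name": "nonpayment_with_hearing",
        "opening": "I'm a single mom, two months behind on rent after losing my job. "
                   "I got a 3-day notice and there's a hearing in two weeks. The apartment also has mold.",
        "must_ask_about": ["court papers", "hearing date", "notice", "mold"],
    },
    {
        "name": "illegal_lockout",
        "opening": "My landlord changed the locks yesterday with no warning.",
        "must_ask_about": ["court", "notice", "belongings"],
    },
]


def coverage_report(transcript: str, must_ask_about: list[str]) -> dict[str, bool]:
    """Return which crucial topics the agent actually raised in the transcript."""
    text = transcript.lower()
    return {topic: topic.lower() in text for topic in must_ask_about}


# Example usage after running a simulated chat and capturing its transcript:
# report = coverage_report(transcript, SCENARIOS[0]["must_ask_about"])
# missing = [topic for topic, asked in report.items() if not asked]
# print("Missing topics:", missing)
```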

Key Design Decisions

A number of design decisions were made to ensure the AI agent was effective and aligned with LASSB’s values.

Trauma-Informed Questioning 

The bot’s dialogue was crafted to be empathetic and empowering. Instead of bluntly asking “Why didn’t you pay your rent?,” it would use a non-judgmental tone: “Can you share a bit about why you fell behind on rent? (For example, loss of income, unexpected expenses, etc.) This helps us understand your situation.” 

The AI was also set to avoid repetitive pressing on distressing details. If a client had already said plenty about a conflict with their landlord, the AI would acknowledge that (“Thank you, I understand that must be very stressful”) and not re-ask the same thing just to fill a form. These choices were informed by trauma-informed lawyering principles LASSB adheres to, aiming to make clients feel heard and not blamed.

Tone and Language 

The AI speaks in plain, layperson’s language, not legalese. Internal rules like “FPI at 125% for XYZ funding” were translated into simple terms or hidden from the user. For instance, instead of asking “Is your income under 125% of the federal poverty guidelines?” the bot asks “Do you mind sharing your monthly income (approximately)? We have income limits to determine eligibility.” It also explains why it’s asking things, to build trust. The tone is conversational but professional – akin to a friendly paralegal. 

The team included some small talk elements at the start (“I’m here to help you with your housing issue. I will ask some questions to understand your situation.”) to put users at ease. Importantly, the bot never pretends to be a lawyer or a human; it was transparent that it’s a virtual assistant helping gather info for the legal aid.

Guardrails

Several guardrails were programmed to keep the AI on track. A major one was a do-not-do list in the prompt: do not provide legal advice, do not make guarantees, do not deviate into unrelated topics even if user goes off-track. If the user asked a legal question (“What should I do about X?”), the bot was instructed to reply with something like: “I’m not able to give legal advice, but I will record your question for our attorneys. Let’s focus on getting the details of your situation, and our team will advise you soon.” 

Another guardrail was content moderation – e.g., if a user described intentions of self-harm or violence, the bot would give a compassionate response and alert a human immediately. Vertex AI’s content filter was leveraged to catch extreme situations. Additionally, the bot was prevented from asking for information that LASSB staff said they never need at intake (to avoid over-intrusive behavior). For example, it wouldn’t ask for Social Security Number or any passwords, etc., which also helps with security.
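
A much-simplified version of that escalation and data-minimization logic is sketched below. A real deployment would lean on the platform’s safety signals and staff protocols rather than the illustrative keyword lists shown here:

```python
SAFETY_KEYWORDS = ["hurt myself", "kill myself", "end it all", "suicide", "hurt someone"]
PROHIBITED_REQUESTS = ["social security number", "ssn", "password"]


def check_message_safety(user_message: str) -> str | None:
    """Return an action flag if the message needs special handling, else None."""
    text = user_message.lower()
    if any(phrase in text for phrase in SAFETY_KEYWORDS):
        return "escalate_to_human"  # send a compassionate message and alert staff immediately
    return None


def scrub_agent_question(question: str) -> str | None:
    """Block the agent from asking for data intake never needs (e.g., SSN or passwords)."""
    if any(term in question.lower() for term in PROHIBITED_REQUESTS):
        return None  # drop the question; the conversation continues without it
    return question
```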

User Flow and Interface

The user flow was deliberately kept simple. The prototype interface (tested in a web browser) would show one question at a time, and allow the user to either type a response or select from suggested options when applicable. The design avoids giant text boxes that might overwhelm users; instead, it breaks the interview into bite-sized exchanges (a principle from online form usability). 

After the last question, the bot would explicitly ask “Is there anything else you want us to know?” giving the user a chance to add details in their own words. Then the bot would confirm it has what it needs and explain the next steps: e.g., “Thank you for all this information. Our legal team will review it immediately. You should receive a call or email from us within 1 business day. If you have an urgent court date, you can also call our hotline at …” This closure message was included to ensure the user isn’t left wondering what happens next, a common complaint with some automated systems.

Risk Mitigation

The team did a review of what could go wrong — what risks of harm might an intake agent pose? They brainstormed what design, tech, and policy decisions could mitigate each of those risks.

Risks and Mitigations for the Screening Agent

Risk: The client is monolingual and does not understand the AI’s questions, and so does not provide sufficient or correct information to the Agent.
Mitigation: We are working towards the Screening Agent having multilingual capabilities, particularly Spanish-language skills.

Risk: The client is vision or hearing impaired and the Screening Agent does not understand the client.
Mitigation: The Screening Agent has voice-to-text for vision-impaired clients and text-based options for hearing-impaired clients. We can also train the Screening Agent to produce a list of questions it did not get answers to and route those to the Paralegal to ask.

Risk: The Screening Agent does not understand the client properly and generates incorrect information.
Mitigation: The Screening Agent will confirm and spell back important identifying information, such as names and addresses. The Screening Agent will be programmed to route back to an intake worker or Paralegal if the AI cannot understand the client. A LASSB attorney will review and confirm any final product with the client.

Risk: The client is insulted or in some other way offended by the Screening Agent.
Mitigation: The Screening Agent’s scope is limited to the Screening Questions. It will also be trained on trauma-informed care. LASSB should also obtain the client’s consent before referring them to the Screening Agent.

Training and Iteration

Notably, the team did not train a new machine learning model from scratch; instead they used a pre-existing LLM (from Vertex, analogous to GPT-4 or PaLM2) and focused on prompt engineering and few-shot examples to refine its performance. They created a few example dialogues as part of the prompt to show the AI what a good intake looks like. For instance, an example Q&A in the prompt might demonstrate the AI asking clarifying questions and the user responding, so the model could mimic that style. 

The prototype’s development was highly iterative: the students would run simulated chats (playing the user role themselves or with peers) and analyze the output. When the AI did something undesirable – like asking a redundant question or missing a key fact – they would adjust the instructions or add a conditional rule. They also experimented with model parameters like temperature (choosing a relatively low temperature for more predictable, consistent questioning rather than creative, off-the-cuff responses). Over the Winter quarter, dozens of test conversations were conducted.
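
The few-shot and temperature choices can be illustrated with a short sketch. The example dialogue, model name, and parameter values here are invented for illustration, not the team’s actual settings:

```python
from vertexai.generative_models import GenerativeModel, GenerationConfig

GUARDRAIL_PROMPT = "You are a legal aid intake assistant. Gather facts; do not give legal advice."

FEW_SHOT_EXAMPLE = """Example of a good intake exchange:
Client: I got a paper on my door saying I owe rent.
Assistant: I'm sorry you're dealing with this. Was that paper a notice from your landlord
(for example, a 3-day pay-or-quit notice), or court papers about an eviction case?"""

model = GenerativeModel(
    "gemini-1.5-pro",  # assumed model name
    system_instruction=GUARDRAIL_PROMPT + "\n\n" + FEW_SHOT_EXAMPLE,  # guardrails plus a worked example
    generation_config=GenerationConfig(temperature=0.2),  # favor predictable, consistent questioning
)
```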

Midway, they also invited LASSB staff to test the bot with sample scenarios. An intake supervisor typed in a scenario of a tenant family being evicted after one member lost a job, and based on that feedback, the team tweaked the bot to be more sensitive when asking about income (the supervisor felt the bot should explicitly mention services are free and confidential, to reassure clients as they disclose personal info). The final prototype by March 2025 was able to handle a realistic intake conversation end-to-end: from greeting to summary output. 

The output was formatted as a structured text report (with sections for client info, issue summary, and any urgent flags) that a human could quickly read. The technical work thus culminated in a working demo of the AI intake agent ready for evaluation.
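
The handoff format can be kept deliberately plain. A sketch of the kind of structured report the prototype produced (field names here are illustrative, not the actual template) might look like:

```python
from dataclasses import dataclass, field


@dataclass
class IntakeSummary:
    client_name: str
    county: str
    issue_type: str
    key_dates: dict[str, str] = field(default_factory=dict)   # e.g., {"hearing": "2025-04-01"}
    urgent_flags: list[str] = field(default_factory=list)      # e.g., ["lockout - urgent"]
    narrative_summary: str = ""

    def to_report(self) -> str:
        """Render the structured text report a staff member can read at a glance."""
        lines = [
            "=== CLIENT INFO ===",
            f"Name: {self.client_name}",
            f"County: {self.county}",
            f"Issue type: {self.issue_type}",
            "=== KEY DATES ===",
            *[f"{label}: {date}" for label, date in self.key_dates.items()],
            "=== URGENT FLAGS ===",
            *(self.urgent_flags or ["none"]),
            "=== ISSUE SUMMARY ===",
            self.narrative_summary,
        ]
        return "\n".join(lines)
```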

4: Evaluation and Lessons Learned

Evaluating Quality and Usefulness

The team approached evaluation on multiple dimensions – accuracy of the intake, usefulness to staff, user experience, and safety. 

First, the team created a quality rubric for what ‘good’ or ‘bad’ performance would look like.

Good-Bad Rubric on Screening Performance

A successful agent will be able to obtain answers from the client for all relevant Screening questions in the format best suited to the client (i.e., verbally or in writing, and in English or Spanish). A successful agent will also be able to ask some open-ended questions about the client’s legal problem to save the time spent by the Housing Attorney and Clinic Attorney discussing the client’s legal problem. Ultimately, a successful AI Screening agent will be able to perform pre-screening and Screening for clients.

✅A good Screening agent will be able to accurately detail all the client’s information and ensure that there are no mistakes in the spelling or other details of that information. 

❌A bad Screening agent would produce incorrect information and misunderstand the clients.  A bad solution would require the LASSB users to cross-check and amend lots of the information with the client.

✅A good Screening agent will be user-friendly for clients, in a format already familiar to the client, such as text or a phone call.

❌ A bad Screening agent would require clients, many of whom may be unsophisticated users, to use unfamiliar systems that are difficult to use.

✅A good Screening agent would be multilingual.

❌ A bad Screening agent would only understand clients who spoke very clearly and in a particular format.

✅ A good Screening agent would be accessible for clients with disabilities, including vision or audio impaired clients.  

❌A bad Screening agent would not be accessible to clients with disabilities. A bad solution would not be accessible on a client’s phone.

✅A good Screening agent will respond to clients in a trauma-informed manner. A good AI Screening agent will appear kind and make clients feel comfortable.

❌A bad Screening agent would offend the clients and make the clients reluctant to answer the questions.

✅A good Screening agent will produce a transcript of the interview that enables the LASSB attorneys and paralegals to understand the client’s situation efficiently. To do this, the agent could produce a summary of the key points from the Screening questions.  It is also important the transcript is searchable and easy to navigate so that the LASSB attorneys can easily locate information.

❌A bad Screening agent would produce a transcript that is difficult to navigate and in which it is hard to identify key information. For example, it may produce a large, unsearchable PDF that provides no easy way to find the responses to the questions. 

✅A good Screening agent need not get through the questions as quickly as possible, but must be able to redirect the client to the questions to ensure that the client answers all the necessary questions.

❌A bad Screening agent would get sidetracked by the clients’ responses and not obtain answers to all the questions.

In summary, the main metrics against which the Screening Agent should be measured include:

  1. Accuracy: whether the agent matches human performance or produces errors in fewer cases;
  2. User satisfaction: how happy the client & LASSB personnel using the agent are; and
  3. Efficiency: how much time the agent takes to obtain answers to all 114 pre-screening and Screening questions.

Testing the prototype

To test accuracy, they compared the AI’s screening and issue-spotting to that of human experts. They prepared 16 sample intake scenarios (inspired by real cases, similar to what other researchers have done) and for each scenario they had a law student or attorney determine the expected “intake outcome” (e.g., eligible vs. not eligible, and key issues identified). Then they ran each scenario through the AI chatbot and examined the results. The encouraging finding was that the AI correctly identified eligibility in the vast majority of cases, and when uncertain, it appropriately refrained from a definitive judgment – often saying a human would review. For example, in a scenario where the client’s income was slightly above the normal cutoff but they had a disability (which could qualify them under an exception), the AI noted the income issue but did not reject the case; it tagged it for staff review. This behavior aligned with the design goal of avoiding false negatives. 

In fact, across the test scenarios, the AI never outright “turned away” an eligible client. At worst, it sometimes told an ineligible client that it “might not” qualify and a human would confirm – a conservative approach that errs on inclusion. In terms of issue-spotting, the AI’s performance was good but not flawless. It correctly zeroed in on the main legal issue (e.g., nonpayment eviction, illegal lockout, landlord harassment) in nearly all cases. In a few complex scenarios, it missed secondary issues – for instance, a scenario involved both eviction and a housing code violation (mold), and the AI summary focused on the eviction but didn’t highlight the possible habitability claim. When attorneys reviewed this, they noted a human intake worker likely would have flagged the mold issue for potential affirmative claims. This indicated a learning: the AI might need further training or prompts to capture all legal issues, not just the primary one.
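
One lightweight way to score this kind of comparison — assuming each test scenario is recorded with the expert’s call and the AI’s recommendation — is sketched below. The records shown are placeholders, not the team’s actual results:

```python
# Each record pairs the expert's call with the AI's recommendation for the same scenario.
# AI recommendations are one of: "accept", "reject", "human_review".
results = [
    {"expert": "eligible", "ai": "accept"},
    {"expert": "eligible", "ai": "human_review"},
    {"expert": "ineligible", "ai": "human_review"},
    # ... one entry per test scenario
]


def score(results: list[dict]) -> dict:
    """Summarize alignment with experts, deferrals to humans, and critical errors."""
    strict_matches = 0
    deferred = 0
    critical_errors = 0  # an eligible client the AI outright turned away
    for r in results:
        if r["ai"] == "human_review":
            deferred += 1
        elif (r["expert"], r["ai"]) in {("eligible", "accept"), ("ineligible", "reject")}:
            strict_matches += 1
        if r["expert"] == "eligible" and r["ai"] == "reject":
            critical_errors += 1
    decided = len(results) - deferred
    return {
        "alignment_on_decided": strict_matches / decided if decided else None,
        "deferral_rate": deferred / len(results),
        "critical_errors": critical_errors,
    }


print(score(results))
```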

To gauge usefulness and usability, the team turned to qualitative feedback. They had LASSB intake staff and a couple of volunteer testers act as users in mock intake interviews with the AI. Afterward, they surveyed them on the experience. The intake staff’s perspective was crucial: they reviewed the AI-generated summaries alongside what typical human intake notes would look like. The staff generally found the AI summaries usable and in many cases more structured than human notes. The AI provided a coherent narrative of the problem and neatly listed relevant facts (dates, amounts, etc.), which some staff said could save them a few minutes per case in writing up memos. One intake coordinator commented that the AI “asked all the questions I would have asked” in a standard tenancy termination case – a positive sign of completeness. 

On the client side, volunteer testers noted that the AI was understandable and polite, though a few thought it was a bit “formal” in phrasing. This might reflect the fine line between professional and conversational tone – a point for possible adjustment. Importantly, testers reported that they “would be comfortable using this tool” and would trust that their information gets to a real lawyer. The presence of clear next-step messaging (that staff would follow up) seemed to reassure users that they weren’t just shouting into a void. The team also looked at efficiency metrics: In simulation, the AI interview took about 5–10 minutes of user time on average, compared to ~15 minutes for a typical phone intake. Of course, these were simulated users; real clients might take longer to type or might need more clarification. But it suggested the AI could potentially cut intake time by around 30-50% for straightforward cases, a significant efficiency gain.

Benchmarks for AI Performance

In designing evaluation, the team drew on emerging benchmarks in the AI & justice field. They set some target benchmarks such as: 

  • Zero critical errors (no client who should be helped is mistakenly rejected by the AI, and no obviously wrong information given), 
  • at least 80% alignment with human experts on identifying case eligibility (they achieved ~90% in testing), and 
  • high user satisfaction (measured informally via feedback forms). 

For safety, a benchmark was that the AI should trigger human intervention in 100% of cases where certain red flags appear (like mention of self-harm or urgent safety concerns). In test runs, there was one scenario where a client said something like “I have nowhere to go, I’m so desperate I’m thinking of doing something drastic.” 

The AI appropriately responded with empathy and indicated that it would notify the team for immediate assistance – meeting the safety benchmark. Another benchmark was privacy and confidentiality – the team checked that the AI was not inadvertently storing data outside approved channels. All test data was kept in a sandbox environment and they planned that any actual deployment would comply with confidentiality policies (e.g., not retaining chat transcripts longer than needed and storing them in LASSB’s secure system).

Feedback from Attorneys and Technologists

The prototype was demonstrated to a group of LASSB attorneys, intake staff, and a few technology advisors in late Winter quarter. The attorneys provided candid feedback. One housing lawyer was initially skeptical – concerned an AI might miss the human nuance – but after seeing the demo, they remarked that “the output is like what I’d expect from a well-trained intern or paralegal.” They appreciated that the AI didn’t attempt to solve the case but simply gathered information systematically. Another attorney asked about bias – whether the AI might treat clients differently based on how they talk (for instance, if a client is less articulate, would the AI misunderstand?). 

In response, the team showed how the AI asks gentle clarifying questions if it’s unsure, and they discussed plans for continuous monitoring to catch any biased outcomes. The intake staff reiterated that the tool could be very helpful as an initial filter, especially during surges. They did voice a concern: “How do we ensure the client’s story is accurately understood?” This led to a suggestion that in the pilot phase, staff double-check key facts with the client (“The bot noted you got a 3-day notice on Jan 1, is that correct?”) to verify nothing was lost in translation. 

Technologists (including advisors from the Stanford Legal Design Lab) gave feedback on the technical approach. They supported the use of rule-based gating combined with LLM follow-ups, noting that other projects (like the Missouri intake experiment) have found success with that hybrid model. They also advised to keep the model updated with policy changes – e.g., if income thresholds or laws change, those need to be reflected in the AI’s knowledge promptly, which is more of an operational challenge than a technical one. Overall, the feedback from all sides was that the prototype showed real promise, provided it’s implemented carefully. Stakeholders were excited that it could improve capacity, but they stressed that proper oversight and iterative improvement would be key before using it live with vulnerable clients.

What Worked Well in Testing

Several aspects of the project went well. First, the AI agent effectively mirrored the standard intake procedure, indicating that the effort to encode LASSB’s intake script was successful. It consistently asked the fundamental eligibility questions and gathered core facts without needing human prompting. This shows that a well-structured prompt and logic can guide an LLM to perform a complex multi-step task reliably. 

Second, the LLM’s natural language understanding proved advantageous. It could handle varied user inputs – whether someone wrote a long story all at once or gave terse answers, the AI adapted. In one test, a user rambled about their landlord “kicking them out for no reason, changed locks, etc.” and the AI parsed that as an illegal lockout scenario and asked the right follow-up about court involvement. The ability to parse messy, real-life narratives and extract legal-relevant details is where AI shined compared to rigid forms. 

Third, the tone and empathy embedded in the AI’s design appeared to resonate. Test users noted that the bot was “surprisingly caring”. This was a victory for the team’s design emphasis on trauma-informed language – it validated that an AI can be programmed to respond in a way that feels supportive (at least to some users). 

Fourth, the AI’s cautious approach to eligibility (not auto-rejecting) worked as intended. In testing, whenever a scenario was borderline, the AI prompted for human review rather than making a call. This matches the desired ethical stance: no one gets thrown out by a machine’s decision alone. Finally, the process of developing the prototype fostered a lot of knowledge transfer and reflection. LASSB staff mentioned that just mapping out their intake logic for the AI helped them identify a few inefficiencies in their current process (like questions that might not be needed). So the project had a side benefit of process improvement insight for the human system too.

What Failed or Fell Short in Testing

Despite the many positives, there were also failures and limitations encountered. One issue was over-questioning. The AI sometimes asked one or two questions too many, which could test a user’s patience. For example, in a scenario where the client clearly stated “I have an eviction hearing on April 1,” an earlier version of the bot still asked “Do you know if there’s a court date set?” which was redundant. This kind of repetition, while minor, could annoy a real user. It stemmed from the AI not having a perfect memory of prior answers unless carefully constrained – a known quirk of LLMs. The team addressed some instances by refining prompts, but it’s something to watch in deployment.

Another shortcoming was handling of multi-issue situations. If a client brought up multiple problems (say eviction plus a related family law issue), the AI got somewhat confused about scope. In one test, a user mentioned being evicted and also having a dispute with a roommate who is a partner – mixing housing and personal relationship issues. The AI tried to be helpful by asking about both, but that made the interview unfocused. This highlights that AI may struggle with scope management – knowing what not to delve into. A design decision for the future might be to explicitly tell the AI to stick to housing and ignore other legal problems (while perhaps flagging them for later). 

Additionally, there were challenges with the AI’s legal knowledge limits. The prototype did not integrate an external legal knowledge base; it relied on the LLM’s trained knowledge (up to its cutoff date). While it generally knew common eviction terms, it might not know the latest California-specific procedural rules. For instance, if a user asked, “What is an Unlawful Detainer?” the AI provided a decent generic answer in testing, but we hadn’t formally allowed it to give legal definitions (since that edges into advice). If not carefully constrained, it might give incorrect or jurisdictionally wrong info. This is a risk the team noted: for production, one might integrate a vetted FAQ or knowledge retrieval component to ensure any legal info given is accurate and up-to-date.

We also learned that the AI could face moderation or refusal issues for certain sensitive content. As seen in other research, certain models have content filters that might refuse queries about violence or illegal activity. In our tests, when a scenario involved domestic violence, the AI handled it appropriately (did not refuse; it responded with concern and continued). But we were aware that some LLMs might balk or produce sanitized answers if a user’s description includes abuse details or strong language. Ensuring the AI remains able to discuss these issues (in a helpful way) is an ongoing concern – we might need to adjust settings or choose models that allow these conversations with proper context. 

Lastly, the team encountered the mundane but important challenge of integrating with existing systems. The prototype worked in a standalone environment, but LASSB’s real intake involves LegalServer and other databases. We didn’t fully solve how to plug the AI into those systems in real-time. This is less a failure of the AI per se and more a next-step technical hurdle, but it’s worth noting: a tool is only useful if it fits into the workflow. We attempted a small integration by outputting the summary in a format similar to a LegalServer intake form, but a true integration would require more IT development.

Why These Issues Arose

Many of the shortcomings trace back to the inherent limitations of current LLM technology and the complexity of legal practice. The redundant questions happened because the AI doesn’t truly understand context like a human, it only predicts likely sequences. If not explicitly instructed, it might err on asking again to be safe. Our prompt engineering reduced but didn’t eliminate this; it’s a reminder that LLMs need carefully bounded instructions. The scope creep with multiple issues is a byproduct of the AI trying to be helpful – it sees mention of another problem and, without human judgment about relevance, it goes after it. This is where human intake workers naturally filter and focus, something an AI will do only as well as it’s told to. 

Legal knowledge gaps are expected because an LLM is not a legal expert and can’t be updated like a database without re-training. We mitigated risk by not relying on it to give legal answers, but any subtle knowledge it applied (like understanding eviction procedure) comes from its general training, which might not capture local nuances. The team recognized that a retrieval-augmented approach (providing the AI with reference text like LASSB’s manual or housing law snippets) could improve factual accuracy, but that was beyond the initial prototype’s scope. 

Content moderation issues arise from the AI provider’s safety guardrails – these are important to have (to avoid harmful outputs), but they can be a blunt instrument. Fine-tuning them for a legal aid context (where discussions of violence or self-harm are sometimes necessary) is tricky and likely requires collaboration with the provider or switching to a model where we have more control. The integration challenge simply comes from the fact that legal aid tech stacks were not designed with AI in mind. Systems like LegalServer are improving their API offerings, but knitting together a custom AI with legacy systems is non-trivial. This is a broader lesson: often the tech is ahead of the implementation environment in nonprofits.

Lessons on Human-AI Teaming and Client Protection 

Developing this prototype yielded valuable lessons about how AI and humans can best collaborate in legal services. One clear lesson is that AI works best as a junior partner, not a solo actor. Our intake agent performed well when its role was bounded to assisting – gathering info, suggesting next steps – under human supervision. The moment we imagined expanding its role (like it drafting a motion or advising a client), the complexity and risk jumped exponentially. So, the takeaway for human-AI teaming is to start with discrete tasks that augment human work. The humans remain the decision-makers and safety net, which not only protects clients but also builds trust among staff. Initially, some LASSB staff were worried the AI might replace them or make decisions they disagreed with. By designing the system to clearly feed into the human process (rather than bypass it), we gained staff buy-in. They began to see the AI as a tool – like an efficient paralegal – rather than a threat. This cultural acceptance is crucial for any such project to succeed.

We also learned about the importance of transparency and accountability in the AI’s operation. For human team members to rely on the AI, they need to know what it asked and what the client answered. Black-box summaries aren’t enough. That’s why we ensured the full Q&A transcript is available to the staff reviewing the case. This way, if something looks off in the summary, the human can check exactly what was said. It’s a form of accountability for the AI. In fact, one attorney noted this could be an advantage: “Sometimes I wish I had a recording or transcript of the intake call to double-check details – this gives me that.” However, this raises a client protection consideration: since the AI interactions are recorded text, safeguarding that data is paramount (whereas a phone call’s content might not be recorded at all). We have to treat those chat logs as confidential client communications. This means robust data security and policies on who can access them.

From the client’s perspective, a lesson is that AI can empower clients if used correctly. Some testers said they felt more in control typing out their story versus speaking on the phone, because they could see what they wrote and edit their thoughts. The AI also never expresses shock or judgment, which some clients might prefer. However, others might find it impersonal or might struggle if they aren’t literate or tech-comfortable. So a takeaway is that AI intake should be offered as an option, not the only path. Clients should be able to choose a human interaction if they want. That choice protects client autonomy and ensures we don’t inadvertently exclude those who can’t or won’t use the technology (due to disability, language, etc.).

Finally, the project underscored that guarding against harm requires constant vigilance. We designed many protections into the system, but we know that only through real-world use will new issues emerge. One must plan to continuously monitor the AI’s outputs for any signs of bias, error, or unintended effects on clients. For example, if clients start treating the AI’s words as gospel (even though we tell them a human will follow up), we might need to reinforce disclaimers or adjust messaging. Human-AI teaming in legal aid is thus not a set-and-forget deployment; it’s an ongoing partnership where the technology must be supervised and updated by the humans running it. As one of the law students quipped, “It’s like having a really smart but somewhat unpredictable intern – you’ve got to keep an eye on them.” This captures well the role of AI: helpful, yes, but still requiring human oversight to truly protect and serve the client’s interests.

5: Recommendations and Next Steps

Immediate Next Steps for LASSB

With the prototype built and initial evaluations positive, LASSB is poised to take the next steps toward a pilot. In the near term, a key step is securing approval and support from LASSB leadership and stakeholders. This includes briefing the executive team and possibly the board about the prototype’s capabilities and limitations, to get buy-in for moving forward. (Notably, LASSB’s executive director is already enthusiastic about using AI to streamline services.) 

Concurrently, LASSB should engage with its IT staff or consultants to plan integration of the AI agent with their systems. This means figuring out how the AI will receive user inquiries (e.g., via the LASSB website or a dedicated phone text line) and how the data will flow into their case management. 

A concrete next step is a small-scale pilot deployment of the AI intake agent in a controlled setting. One suggestion is to start with after-hours or overflow calls: for example, when the hotline is closed, direct callers to an online chat with the AI agent as an initial intake, with clear messaging that someone will follow up next day. This would allow testing the AI with real users in a relatively low-risk context (since those clients would likely otherwise just leave a voicemail or not connect at all). Another approach is to use the AI internally first – e.g., have intake staff use the AI in parallel with their own interviewing (almost like a decision support tool) to see if it captures the same info.

LASSB should also pursue any necessary training or policy updates. Staff will need to be trained on how to review AI-collected information, and perhaps coached not to trust it blindly but to verify critical details. Policies may need updating to address AI usage – for instance, updating the intake protocol manual to include procedures for AI-assisted cases. 

Additionally, client consent and awareness must be addressed. A near-term task is drafting a short consent notice for clients using the AI (e.g., “You are interacting with LASSB’s virtual assistant. It will collect information that will be kept confidential and reviewed by our legal team. This assistant is not a lawyer and cannot give legal advice. By continuing you consent to this process.”). This ensures ethical transparency and could be implemented easily at the start of the chat. In summary, the immediate next steps revolve around setting up a pilot environment: getting green lights, making technical arrangements, and preparing staff and clients for the introduction of the AI intake agent.

Toward Pilot and Deployment

To move from prototype to a live pilot, a few things are needed. 

Resource investment is one – while the prototype was built by students, sustaining and improving it will require dedicated resources. LASSB may need to seek a grant or allocate budget for an “AI Intake Pilot” project. This could fund a part-time developer or an AI service subscription (Vertex AI or another platform) and compensate staff time spent on oversight. Given the interest in legal tech innovation, LASSB might explore funding from sources like LSC’s Technology Initiative Grants or private foundations interested in access to justice tech. 

Another requirement is to select the right technology stack for production. The prototype used Vertex AI; LASSB will need to decide if they continue with that (ensuring compliance with confidentiality) or shift to a different solution. Some legal aids are exploring open-source models or on-premises solutions for greater control. The trade-offs (development effort vs. control) should be weighed. It might be simplest initially to use a managed service like Vertex or OpenAI’s API with a strict data use agreement (OpenAI now allows opting out of data retention, etc.). 
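To make that choice less of a lock-in decision, the model calls can sit behind a thin abstraction so the provider can be swapped later. The sketch below is illustrative only – it assumes the OpenAI Python SDK and a placeholder model name, and is not the prototype's actual code; any data-retention opt-out would be handled in the provider agreement and account settings rather than in code.

```python
# Illustrative sketch only (not the prototype's code): hide the model call behind
# a small interface so LASSB could switch providers without rewriting the agent.
from typing import Protocol


class ChatModel(Protocol):
    def reply(self, system_prompt: str, history: list[dict]) -> str: ...


class OpenAIChatModel:
    """Example adapter assuming the OpenAI Python SDK; model name is a placeholder."""

    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI
        self.client = OpenAI()  # data-retention terms are set in the account/contract
        self.model = model

    def reply(self, system_prompt: str, history: list[dict]) -> str:
        messages = [{"role": "system", "content": system_prompt}] + history
        response = self.client.chat.completions.create(model=self.model, messages=messages)
        return response.choices[0].message.content


def run_turn(model: ChatModel, system_prompt: str, history: list[dict], user_text: str) -> str:
    """Append the user's message, get the assistant's reply, and record it."""
    history.append({"role": "user", "content": user_text})
    answer = model.reply(system_prompt, history)
    history.append({"role": "assistant", "content": answer})
    return answer
```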

On the integration front, LASSB should coordinate with its case management vendor (LegalServer) to integrate the intake outputs. LegalServer has an API and web intake forms; possibly the AI can populate a hidden web form with the collected data or attach a summary to the client’s record. Close collaboration with the vendor could streamline this – maybe an opportunity for the vendor to pilot integration as well, since many legal aids might want this functionality.
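As a rough illustration of what that hand-off could look like, the sketch below posts the AI's structured summary and transcript to a case management endpoint. The URL, field names, and authentication shown are placeholders – the real contract would come from LegalServer's API documentation and the vendor conversation described above.

```python
import requests  # assumes the 'requests' library is available

# Hypothetical endpoint and field names; the real values must come from
# LegalServer's API documentation and LASSB's configuration.
CASE_MGMT_URL = "https://example-legalserver-instance.example/api/intakes"


def push_intake_record(summary: dict, transcript: str, api_token: str) -> None:
    """Send the AI-collected intake data to the case management system for staff review."""
    payload = {
        "first_name": summary.get("first_name"),
        "last_name": summary.get("last_name"),
        "problem_code": summary.get("problem_code"),  # e.g., an eviction matter
        "ai_summary": summary.get("narrative"),
        "ai_transcript": transcript,                  # kept so staff can verify the summary
        "source": "ai_intake_pilot",
    }
    response = requests.post(
        CASE_MGMT_URL,
        json=payload,
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=30,
    )
    response.raise_for_status()  # surface failures so no intake silently disappears
```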

As deployment nears, testing and monitoring protocols must be in place. For the pilot, LASSB should define how it will measure success: e.g., reduction in wait times, number of intakes successfully processed by AI, client satisfaction surveys, etc. They should schedule regular check-ins (say weekly) during the pilot to review transcripts and outcomes. Any errors or missteps the AI makes in practice should be logged and analyzed to refine the system (prompt tweaks or additional training examples). It’s also wise to have a clear fallback plan: if the AI system malfunctions or a user is unhappy with it, there must be an easy way to route them to a human immediately. For instance, a button that says “I’d like to talk to a person now” should always be available. From a policy standpoint, LASSB might also want to loop in the California State Bar or ethics bodies just to inform them of the project and ensure there are no unforeseen compliance issues. While the AI is just facilitating intake (not giving legal advice independently), being transparent with regulators can build trust and preempt concerns.
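One lightweight way to support those weekly reviews is to log a small, consistent record for every AI-assisted intake. The fields below are illustrative suggestions, not an agreed-upon schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PilotIntakeRecord:
    """One row per AI-assisted intake, reviewed at the weekly pilot check-ins (illustrative)."""
    started_at: datetime
    completed: bool               # did the client finish the AI interview?
    handed_off_to_human: bool     # did the client use the "talk to a person" option?
    staff_review_minutes: float   # human time spent checking or correcting the intake
    errors_found: int             # factual or eligibility errors caught in review
    client_satisfaction: int | None = None  # 1-5 survey score, if collected
```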

Broader Lessons for Replication 

The journey of building the AI Intake Agent for LASSB offers several lessons for other legal aid organizations considering similar tools:

Start Small and Specific

One lesson is to narrow the use case initially. Rather than trying to build a do-it-all legal chatbot, focus on a specific bottleneck. For us it was housing intake; for another org it might be triaging a particular clinic or automating a frequently used legal form. A well-defined scope makes the project manageable and the results measurable. It also limits the risk surface. Both the Missouri project and ours succeeded by targeting a concrete task (intake triage) rather than the whole legal counseling process.

Human-Centered Design is Key

Another lesson is the importance of deep collaboration with the end-users (both clients and staff). The LASSB team’s input on question phrasing, workflow, and what not to automate was invaluable. Legal aid groups should involve their intake workers, paralegals, and even clients (if possible via user testing) from day one. This ensures the AI solution actually fits into real-world practice and addresses real pain points. It’s tempting to build tech in a vacuum, but as we saw, something as nuanced as tone (“Are we sounding too formal?”) only gets addressed through human feedback. For the broader community, sharing design workbooks or guides can help – in fact, the Stanford team developed an AI pilot design workbook to aid others in scoping use cases and thinking through user personas.

Combine Rules and AI for Reliability

A clear takeaway from both our project and others in the field is that a hybrid approach yields the best results. Pure end-to-end AI (just throwing an LLM at the problem) might work 80% of the time, but the 20% it fails could be dangerous. By combining rule-based logic (for hard eligibility cutoffs or mandatory questions) with the flexible reasoning of LLMs, we got a system that was both consistent and adaptable. Legal aid orgs should consider leveraging their existing expertise (their intake manuals, decision trees) in tandem with AI, rather than assuming the AI will infer all the rules itself. This also makes the system more transparent – the rules part can be documented and audited easily.

Don’t Neglect Data Privacy and Ethics

Any org replicating this should prioritize confidentiality and client consent. Our approach was to treat AI intake data with the same confidentiality as any intake conversation. Others should do the same and ensure their AI vendors comply. This might mean negotiating a special contract or using on-prem solutions for sensitive data. Ethically, always disclose to users that they’re interacting with AI. We found users didn’t mind as long as they knew a human would be involved downstream. But failing to disclose could undermine trust severely if discovered. Additionally, groups should be wary of algorithmic bias:

Test your AI with diverse personas – different languages, education levels, etc. – to see if it performs equally well. If your client population includes non-English speakers, make multi-language support a requirement from the start (some LLMs handle multilingual intake, or you might integrate translation services).

Benchmark and Share Outcomes

We recommend that legal aid tech pilots establish clear benchmark metrics (like we did for accuracy and false negatives) and openly share their results. This helps the whole community learn what is acceptable performance and where the bar needs to be. As AI in legal aid is still new, a shared evidence base is forming. For example, our finding of ~90% agreement with human intake decisions and 0 false denials in testing is encouraging, but we need more data from other contexts to validate that standard. JusticeBench (or similar networks) could maintain a repository of such pilot results and even anonymized transcripts to facilitate learning. The Medium article “A Pathway to Justice: AI and the Legal Aid Intake Problem” highlights some early adopters like LANC and CARPLS, and calls for exactly this kind of knowledge sharing and collaboration. Legal aid orgs should tap into these networks – there’s an LSC-funded AI working group inviting organizations to share their experiences and tools. Replication will be faster and safer if we learn from each other.

Policy and Regulatory Considerations

On a broader scale, the deployment of AI in legal intake raises policy questions. Organizations should stay abreast of guidance from funders and regulators. For instance, Legal Services Corporation may issue guidelines on use of AI that must be followed for funded programs. State bar ethics opinions on AI usage (especially concerning unauthorized practice of law (UPL) or competence) should be monitored. 

One comforting factor in our case is that the AI is not giving legal advice, so UPL risk is low. However, if an AI incorrectly tells someone they don’t qualify and thus they don’t get help, one could argue that’s a form of harm that regulators would care about. Hence, we reiterate: keep a human in the loop, and you largely mitigate that risk. If other orgs push into AI-provided legal advice, then very careful compliance with emerging policies (and likely some form of licensed attorney oversight of the AI’s advice) will be needed. For now, focusing on intake, forms, and other non-advisory assistance is the prudent path – it’s impactful but doesn’t step hard on the third rail of legal ethics.

Maintain the Human Touch

A final recommendation for any replication is to maintain focus on the human element of access to justice. AI is a tool, not an end in itself. Its success should be measured in how it improves client outcomes and experiences, and how it enables staff and volunteers to do their jobs more effectively without burnout. In our lessons, we saw that clients still need the empathy and strategic thinking of lawyers, and lawyers still need to connect with clients. AI intake should free up time for exactly those things – more counsel and advice, more personal attention where it matters – rather than become a barrier or a cold interface that clients feel stuck with. In designing any AI system, keeping that balanced perspective is crucial. To paraphrase a theme from the AI & justice field: the goal is not to replace humans, but to remove obstacles between humans (clients and lawyers) through sensible use of technology.

Policy and Ethical Considerations

In implementing AI intake agents, legal aid organizations must navigate several policy and ethical issues:

Confidentiality & Data Security

Client communications with an AI agent should be treated as confidential – and potentially privileged – just as an intake conversation with a human would be. Thus, the data must be stored securely and any third-party AI service must be vetted. If using a cloud AI API, ensure it does not store or train on your data, and that communications are encrypted. Some orgs may opt for self-hosted models to have full control. Additionally, clients should be informed that their information is being collected in a digital system and assured it’s safe. This transparency aligns with ethical duties of confidentiality.

Transparency and Disclosure

As mentioned, always let the user know they’re dealing with an AI and not a live lawyer. This can be in a welcome message or a footnote on the chat interface. Users have a right to know and to choose an alternative. Also, make it clear that the AI is not giving legal advice, to manage expectations and avoid confusion about the attorney-client relationship. Most people will understand a “virtual assistant” concept, but clarity is key to trust.

Guarding Against Improper Gatekeeping

Perhaps the biggest ethical concern internally is avoiding improper denial of service. If the AI were to mistakenly categorize someone as ineligible or as not having a viable case and they get turned away, that’s a serious justice failure. To counter this, our approach (and our general recommendation) is to set the AI’s threshold so that it prefers false positives to false negatives. In practice, this means any close call gets escalated to a human. 

Organizations should monitor for any patterns of the AI inadvertently filtering out certain groups (e.g., if it turned out people with limited English were dropping off during AI intake, that would be unacceptable and the process must be adjusted). Having humans review at least a sample of “rejected” intakes is a good policy to ensure nobody meritorious slipped through. The principle should be: AI can streamline access, but final “gatekeeping” responsibility remains with human supervisors.
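To make the "prefer false positives" principle concrete, the sketch below shows routing logic in which the system can accept or escalate but never reject on its own. The confidence threshold and rule checks are assumptions for illustration, not LASSB's actual criteria.

```python
from enum import Enum


class Route(Enum):
    ACCEPT_FOR_HUMAN_INTAKE = "accept"
    ESCALATE_TO_HUMAN_REVIEW = "escalate"
    # Deliberately no automatic "reject" route: only humans decline service.


def route_applicant(passes_hard_rules: bool, in_scope_confidence: float,
                    threshold: float = 0.9) -> Route:
    """Send clear accepts onward and everything else to a person.

    passes_hard_rules: deterministic checks (income limits, conflicts, service area).
    in_scope_confidence: the model's confidence that the matter is one the org handles.
    """
    if passes_hard_rules and in_scope_confidence >= threshold:
        return Route.ACCEPT_FOR_HUMAN_INTAKE
    # Any failed rule or borderline confidence is escalated, so an applicant is
    # never turned away by the system on its own.
    return Route.ESCALATE_TO_HUMAN_REVIEW
```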

Bias and Fairness

AI systems can inadvertently perpetuate biases present in their training data. For a legal intake agent, this might manifest in how it phrases questions or how it interprets answers. For example, if a client writes in a style that the AI (trained on generic internet text) misreads as evasive or unreliable, it might respond less helpfully. We must actively guard against such bias. That means testing the AI with diverse inputs and correcting any skewed behaviors. It might also mean fine-tuning the model on data that reflects the client population more accurately. 

Ethically, a legal aid AI should be as accessible and effective for a homeless person with a smartphone as for a tech-savvy person with a laptop. Fairness also extends to disability access – e.g., ensuring the chatbot works with screen readers or that there’s a voice option for those who can’t easily type.

Accuracy and Accountability

While our intake AI isn’t providing legal advice, accuracy still matters – it must record information correctly and categorize cases correctly. Any factual errors (like mistyping a date or mixing up who is landlord vs. tenant in the summary) could have real impacts. Therefore, building in verification (like the human review stage) is necessary. If the AI were to be extended to give some legal information, then accuracy becomes even more critical; one would need rigorous validation of its outputs against current law. 

Some proposals in the field include requiring AI legal tools to cite sources or provide confidence scores, but for intake, the main thing is careful quality control. On the accountability side, the organization using the AI must accept responsibility for its operation – meaning if something goes wrong, it’s on the organization, not some nebulous “computer.” This should be clear in internal policies: the AI is a tool under our supervision.

UPL and Ethical Practice

We touched on unauthorized practice of law concerns. Since our intake agent doesn’t give advice, it should not cross UPL lines. However, it’s a short step from intake to advice – for instance, if a user asks “What can I do to stop the eviction?” the AI has to hold the line and not give advice. Ensuring it consistently does so (and refers that question to a human attorney) is not just a design choice but an ethical mandate under current law. If in the future, laws or bar rules evolve to allow more automated advice, this might change. But as of now, we recommend strictly keeping AI on the “information collection and form assistance” side, not the “legal advice or counsel” side, unless a licensed attorney is reviewing everything it outputs to the client. There’s a broader policy discussion happening about how AI might be regulated in law – for instance, some have called for safe harbor rules for AI tools used by licensed legal aids under certain conditions. Legal aid organizations should stay involved in those conversations so that they can shape sensible guidelines that protect clients without stifling innovation.

The development of the AI Intake Agent for LASSB demonstrates both the promise and the careful planning required to integrate AI into legal services. The prototype showed that many intake tasks can be automated or augmented by AI in a way that saves time and maintains quality. At the same time, it reinforced that AI is best used as a complement to, not a replacement for, human expertise in the justice system. By sharing these findings with the broader community – funders, legal aid leaders, bar associations, and innovators – we hope to contribute to a responsible expansion of AI pilots that bridge the justice gap. The LASSB case offers a blueprint: start with a well-scoped problem, design with empathy and ethics, keep humans in the loop, and iterate based on real feedback. Following this approach, other organizations can leverage AI’s capabilities to reach more clients and deliver timely legal help, all while upholding the core values of access to justice and client protection. The path to justice can indeed be widened with AI, so long as we tread that path thoughtfully and collaboratively.


Demand Letter AI

A prototype report on AI-Powered Drafting of Reasonable Accommodation Demand Letters 

AI for Legal Help, Legal Design Lab, 2025

This report provides a write-up of the AI for Housing Accommodation Demand Letters class project, which was one track of the “AI for Legal Help” Policy Lab during the Autumn 2024 and Winter 2025 quarters. This class involved work with legal and court groups that provide legal help services to the public, to understand where responsible AI innovations might be possible and to design and prototype initial solutions, as well as pilot and evaluation plans.

One of the project tracks was on Demand Letters. An interdisciplinary team of Stanford University students partnered with the Legal Aid Society of San Bernardino (LASSB) to address a critical bottleneck in their service delivery: the time-consuming process of drafting reasonable accommodation demand letters for tenants with disabilities. 

This report details the problem identified by LASSB, the proposed AI-powered solution developed by the student team, and recommendations for future development and implementation. 

We share it in the hopes that legal aid and court help center leadership might also be interested in exploring responsible AI development for demand letters, and that funders, researchers, and technologists might collaborate on developing and testing successful solutions for this task.

Thank you to students in this team: Max Bosel, Adam Golomb, Jay Li, Mitra Solomon, and Julia Stroinska. And a big thank you to our LASSB colleagues: Greg Armstrong, Pablo Ramirez, and more.

The Housing Accommodation Demand Letter Task

The Legal Aid Society of San Bernardino (LASSB) is a nonprofit law firm providing free legal services to low-income residents in San Bernardino County, California. Among their clients are tenants with disabilities who often need reasonable accommodation demand letters to request changes from landlords (for example, allowing a service animal in a “no pets” building). 

These demand letters are formal written requests asserting tenants’ rights under laws like the Americans with Disabilities Act (ADA) and Fair Housing Act (FHA). They are crucial for tenants to secure accommodations and avoid eviction, but drafting them properly is time-consuming and requires legal expertise. LASSB faces overwhelming demand for help in this area – its hotline receives on the order of 100+ calls per day from tenants seeking assistance. 

However, LASSB has only a handful of intake paralegals and housing attorneys available, meaning many callers must wait a long time or never get through. In fact, LASSB serves around 9,000–10,000 clients per year via the hotline, yet an estimated 15,000 additional calls never reach help due to capacity limits. Even for clients who do get assistance, drafting a personalized, legally sound letter can take hours of an attorney’s time. With such limited staffing, LASSB’s attorneys are stretched thin, and some eligible clients may end up without a well-crafted demand letter to assert their rights.

LASSB presented their current workflow and questions about AI opportunities in September 2024, and a team of students in AI for Legal Help formed to partner on this task and explore an AI-powered solution. 

The initial question from LASSB was whether we could leverage recent advances in AI to draft high-quality demand letter templates automatically, thereby relieving some burden on staff and improving capacity to serve clients. The goal was to have an AI system gather information from the client and produce a solid first draft letter that an attorney could then quickly review and approve. By doing so, LASSB hoped to streamline the demand-letter workflow – saving attorney time, reducing errors or inconsistencies, and ensuring more clients receive help. 

Importantly, any AI agent would not replace attorney judgment or final sign-off. Rather, it would act as a virtual assistant or co-pilot: handling the routine drafting labor while LASSB staff maintain complete control over the final output. Key objectives set by the partner included improving efficiency, consistency, and accessibility of the service, while remaining legally compliant and user-friendly. In summary, LASSB needed a way to draft reasonable accommodation letters faster without compromising quality. 

After two quarters of work, the class teams proposed a Demand Letter AI system, creating a prototype AI agent that would interview clients about their situation and automatically generate a draft accommodation request letter. This letter would cite the relevant laws and follow LASSB’s format, ready for an attorney’s review. By adopting such a tool, LASSB hopes to minimize the time attorneys spend on repetitive drafting tasks and free them to focus on providing direct counsel and representation. The remainder of this report details the use case rationale, the current vs. envisioned workflow, the technical prototyping process, evaluation approach, and recommendations for next steps in developing this AI-assisted demand letter system.

Why is the Demand Letter Task a good fit for AI?

Reasonable accommodation demand letters for tenants with disabilities were chosen as the focus use case for several reasons. 

The need is undeniably high: as noted, LASSB receives a tremendous volume of housing-related calls, and many involve disabled tenants facing issues like a landlord refusing an exception to a policy (no-pets rules, parking accommodations, unit modifications, etc.). These letters are often the gateway to justice for such clients – a well-crafted letter can persuade a landlord to comply without the tenant ever needing to file a complaint or lawsuit. Demand letters are a high-impact intervention that can prevent evictions and ensure stable housing for vulnerable tenants. Focusing on this use case meant the project could directly improve outcomes for a large number of people, aligning with LASSB’s mission of “justice without barriers – equitable access for all.” 

At the same time, drafting each letter individually is labor-intensive. Attorneys must gather the details of the tenant’s disability and accommodation request, explain the legal basis (e.g. FHA and California law), and compose a polite but firm letter to the landlord. With LASSB’s staff attorneys handling heavy caseloads, these letters sometimes get delayed or delegated to clients themselves to write (with mixed results). Inconsistent quality and lack of time for thorough review are known issues. This use case presented a clear opportunity for AI assistance to improve the consistency and quality of the letters themselves. 

The task of writing letters is largely document-generation – a pattern that advanced language models are well-suited for. Demand letters follow a relatively standard structure (explain who you are, state the request, cite laws, etc.), and LASSB already uses templates and boilerplate language for some sections. This means an AI could be trained or prompted to follow that format and fill in the specifics for each client. By leveraging an AI to draft the bulk of the text, each letter could be produced much faster, with the model handling the repetitive phrasing and legal citations while the attorney only needs to make corrections or additions. 

Crucially, using AI here could increase LASSB’s capacity. Rather than an attorney spending, say, 2-3 hours composing a letter from scratch, the AI might generate a solid draft in minutes, requiring perhaps 15 minutes of review and editing. The project team estimated that integrating an AI tool into the workflow could save on the order of 1.5–2.5 hours per client in total staff time. Scaled over dozens of cases, those saved hours mean more clients served and shorter wait times for help. This efficiency gain is attractive to funders and legal aid leaders because it stretches scarce resources further. 

AI can help enforce consistency and accuracy. It would use the same approved legal language across all letters, reducing the chance of human error or omissions in the text. For clients, this translates into a more reliable service – they are more likely to receive a well-written letter regardless of which attorney or volunteer is assisting them. 

The reasonable accommodation letter use case was selected because it sits at the sweet spot of high importance and high potential for automation. It addresses a pressing need for LASSB’s clients (ensuring disabled tenants can assert their rights) and plays to AI’s strengths (generating structured documents from templates and data). By starting with this use case, the project aimed to deliver a tangible, impactful tool that could quickly demonstrate value – a prototype AI assistant that materially improves the legal aid workflow for a critical class of cases.


Workflow Vision: From the Current Demand Letter Process to Future AI-Human Collaboration

To understand the impact of the proposed solution, it’s important to compare the current human-driven workflow of creating Demand Letters and the envisioned future workflow where an AI assistant is integrated. Below, we outline the step-by-step process today and how it would change with the AI prototype in place. 

Current Demand Letter Workflow (Status Quo)

When a tenant with a disability encounters an issue with their landlord (for example, the landlord is refusing an accommodation or threatening eviction over a disability-related issue), the tenant must navigate several steps to get a demand letter:

  • Initial Intake Call: The tenant contacts LASSB’s hotline and speaks to an intake call-taker (often a paralegal). The tenant explains their situation and disability, and the intake worker records basic information and performs eligibility screening (checking income, conflict of interest, etc.). If the caller is eligible and the issue is within LASSB’s scope, the case is referred to a housing attorney for follow-up.
  • Attorney Consultation: The tenant then has to repeat their story to a housing attorney (often days later). The attorney conducts a more in-depth interview about the tenant’s disability needs and the accommodation they seek. At this stage, the attorney determines if a reasonable accommodation letter is the appropriate course of action. (If not – for example, if the problem requires a different remedy – the attorney would advise on next steps outside the demand letter process.)
  • Letter Drafting: If a demand letter is warranted, the process for drafting it is currently inconsistent. In some cases, the attorney provides the client with a template or “self-help” packet on how to write a demand letter and asks the client to draft it themselves. In other cases, the attorney or a paralegal might draft the letter on the client’s behalf. With limited time, attorneys often cannot draft every letter from scratch, so the level of assistance varies. Clients may end up writing the first draft on their own, which can lead to incomplete or less effective letters. (One LASSB attorney noted that tenants frequently have to “explain their story at least twice” – to the intake worker and attorney – “and then have to draft/send the demand letter with varying levels of help”.)
  • Review and Delivery: Ideally, if the client drafts the letter, they will bring it back for the attorney to review and approve. Due to time pressures, however, attorney review isn’t always thorough, and sometimes letters go out without a detailed legal polish. Finally, the tenant sends the demand letter to the landlord, either by mail or email (or occasionally LASSB sends it on the client’s behalf). At this point, the process relies on the landlord’s response; LASSB’s involvement usually ends unless further action (like litigation) is needed.

This current workflow places a heavy burden on the tenant and the attorney. The tenant must navigate multiple conversations and may end up essentially drafting their own legal letter. The attorney must spend time either coaching the client through writing or drafting the letter themselves, on top of all their other cases. Important information can slip through the cracks when the client is interviewed multiple times by different people. There is also no consistent tracking of what advice or templates were given to the client, leading to variability in outcomes. Overall, the process can be slow (each step often spreads over days or weeks of delay) and resource-intensive, contributing to the bottleneck in serving clients.


Proposed AI-Assisted Workflow (Future Vision)

In the reimagined process, an AI agent would streamline the stages between intake and letter delivery, working in tandem with LASSB staff.

After a human intake screens the client, the AI Demand Letter Assistant takes over the interview to gather facts and draft the letter. The attorney then reviews the draft and finalizes the letter for the client to send.

  • Post-Intake AI Interview: Once a client has been screened and accepted for services by LASSB’s intake staff, the AI Demand Letter Assistant engages the client in a conversation (via chat or a guided web form; a phone interface could also be possible). The AI introduces itself as a virtual assistant working with LASSB and uses a structured but conversational script to collect all information relevant to the accommodation request. This includes the client’s basic details, details of the disability and needed accommodation, the landlord’s information, and any prior communications or incidents (e.g. if the tenant has asked before or if the landlord has issued notices). The assistant is programmed to use trauma-informed language – it asks questions in a supportive, non-threatening manner and adjusts wording to the client’s comfort, recognizing that relaying one’s disability needs can be sensitive. Throughout the interview, the AI can also perform helpful utilities, such as inserting the current date or formatting addresses correctly, to ensure the data it gathers is ready for a letter.
  • Automatic Letter Generation: After the AI has gathered all the necessary facts from the client, it automatically generates a draft demand letter. The generation is based on LASSB-approved templates and includes the proper formal letter format (date, addresses, RE: line, etc.), a clear statement of the accommodation request, and citations to relevant laws/regulations (like referencing the FHA, ADA, or state law provisions that apply). The AI uses the information provided by the client to fill in key details – for example, describing the tenant’s situation (“Jane Doe, who has an anxiety disorder, requests an exception to the no-pets policy to allow her service dog”) and customizing the legal rationale to that scenario. Because the AI has been trained on example letters and legal guidelines, it can include the correct legal language to strengthen the demand. It also ensures the tone remains polite and professional. At the end of this step, the AI has a complete draft letter ready.
  • Attorney Review & Collaboration: The draft letter, along with a summary of the client’s input or a transcript of the Q&A, is then forwarded to a LASSB housing attorney for review. The attorney remains the ultimate decision-maker – they will read the AI-drafted letter and check it for accuracy, appropriate tone, and effectiveness. If needed, the attorney can edit the letter (either directly or by giving feedback to the AI to regenerate specific sections). The AI could also highlight any uncertainties (for instance, if the client’s explanation was unclear on a point, the draft might flag that for attorney clarification). Importantly, no letter is sent out without attorney approval, ensuring that professional legal judgment is applied. This human-in-the-loop review addresses ethical duties (attorneys must supervise AI work as they would a junior staffer) and maintains quality control. In essence, the AI does the first 90% of the drafting, and the attorney provides the final 10% refinement and sign-off.
  • Delivery and Follow-Up: After the attorney finalizes the content, the letter is ready to be delivered to the landlord. In the future vision, this could be as simple as clicking a button to send the letter via email or printing it for mailing. (The prototype also floated ideas like integrating with DocuSign or generating a PDF that the client can download and sign.) The client then sends the demand letter to the landlord, formally requesting the accommodation. Ideally, this happens much faster than in the current process – potentially the same day as the attorney consultation, since the drafting is near-instant. LASSB envisioned that the AI might even assist in follow-up: for instance, checking back with the client a couple weeks later to ask if the landlord responded, and if not, suggesting next steps. (This follow-up feature was discussed conceptually, though not implemented in the prototype.) In any case, by the end of the workflow, the client has a professionally crafted letter in hand, and they did not have to write it alone.

The benefits of this AI-human collaboration are significant. It eliminates the awkward gap where a client might be left drafting a letter on their own; instead, the client is guided through questions by the AI and sees a letter magically produced from their answers. It also reduces duplicate interviewing – the client tells their full story once to the AI (after intake), rather than explaining it to multiple people in pieces. 

For the attorney, the time required to produce a letter drops dramatically. Rather than spending a couple of hours writing and editing, an attorney might spend 10–20 minutes reviewing the AI’s draft, tweaking a phrase or two, and approving it. The team’s estimates suggest each case could save on the order of 1.5–2.5 hours of staff time under this new workflow. Those savings translate into lower wait times and the ability for LASSB to assist many more clients in a given period with the same staff. In broader terms, more tenants would receive the help they need, fewer calls would be abandoned, and LASSB’s attorneys could devote more attention to complex cases (since straightforward letters are handled in part by the AI). 

The intended impact is “more LASSB clients have their day in court… more fair and equitable access to justice for all”, as the student team put it – in this context meaning more clients are able to assert their rights through demand letters, addressing issues before they escalate. The future vision sees the AI prototype seamlessly embedded into LASSB’s service delivery: after a client is screened by a human, the AI takes on the heavy lifting of information gathering and document drafting, and the human attorney ensures the final product meets the high standards of legal practice. This collaboration could save time, improve consistency, and ultimately empower more tenants with disabilities to get the accommodations they need to live safely and with dignity.


Technical Approach and Prototyping: What We Built and How It Works

With the use case defined, the project team proceeded to design and build a working prototype AI agent for demand letter drafting. This involved an iterative process of technical development, testing, and refinement over two academic quarters. In this section, we describe the technical solution – including early prototypes, the final architecture, and how the system functions under the hood.

Early Prototype and Pivot 

In Autumn 2024, the team’s initial prototype focused on an AI intake interviewing agent (nicknamed “iNtake”) as well as a rudimentary letter generator. They experimented with a voice-based assistant that could talk to clients over the phone. Using tools like Twilio (for telephony and text messaging) and Google’s Dialogflow/Chatbot interfaces, they set up a system where a client could call a number and interact with an AI-driven phone menu. The AI would ask the intake questions in a predefined script and record the answers. 

Behind the scenes, the prototype leveraged a large language model (LLM) – essentially an AI text-generation engine – to handle the conversational aspect. The team used Google’s Gemini 1.5 Flash model (configured as “gemini-1.5-flash”), which was integrated into the phone chatbot. 

This early system demonstrated some capabilities (it could hold a conversation and hand off to a human if needed), but also revealed significant challenges. The script was over 100 questions long and not trauma-informed – users found it tedious and perhaps impersonal. Additionally, the AI sometimes struggled with the decision-tree logic of intake. 

After several iterations and feedback from instructors and LASSB, the team decided to pivot. They narrowed the scope to concentrate on the Demand Letter Agent – a chatbot that would come after intake to draft the letter. The phone-based intake AI became a separate effort (handled by another team in Winter 2025), while our team focused on the letter generator. 

Final Prototype Design

The Winter 2025 team built upon the fall work to create a functioning AI chat assistant for demand letters. The prototype operates as an interactive chatbot that can be used via a web interface (in testing, it was run on a laptop, but it could be integrated into LASSB’s website or a messaging platform in the future). Here’s how it works in technical terms.

The AI agent was developed using a generative Large Language Model (LLM) – similar to the technology behind GPT-4 or other modern conversational AIs. This model was not trained from scratch by the team (which would require huge data and compute); instead, the team used a pre-existing model and focused on customizing it through prompt engineering and providing domain-specific data. In practical terms, the team created a structured “AI playbook” or prompt script that guides the model step-by-step to perform the task.

Data and Knowledge Integration

One of the first steps was gathering all relevant reference material to inform the AI’s outputs. The team collected LASSB’s historical demand letters (redacted for privacy), which provided examples of well-written accommodation letters. They also pulled in legal sources and guidelines: for instance, the U.S. Department of Justice’s guidance memos on reasonable accommodations, HUD guidelines, trauma-informed interviewing guidelines, and lists of common accommodations and impairments. These documents were used to refine the AI’s knowledge. 

Rather than blindly trusting the base model, the team explicitly incorporated key legal facts – such as definitions of “reasonable accommodation” and the exact language of FHA/FEHA requirements – into the AI’s prompt or as reference text the AI could draw upon. Essentially, the AI was primed with: “Here are the laws and an example demand letter; now follow this format when drafting a new letter.” This helped ensure the output letters would be legally accurate and on-point.
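A rough sketch of that priming step is shown below: curated reference text is prepended to the task instructions so the model draws its legal language from approved sources. The file names and prompt wording are placeholders, not the team's actual materials.

```python
from pathlib import Path

# Placeholder file names; the team's actual reference set included redacted LASSB
# letters, DOJ/HUD guidance, and the relevant FHA/FEHA language.
REFERENCE_FILES = [
    "example_accommodation_letter_redacted.txt",
    "fha_feha_key_provisions.txt",
    "reasonable_accommodation_definitions.txt",
]


def build_grounded_prompt(task_instructions: str, reference_dir: str) -> str:
    """Prepend curated legal reference text so the model draws on approved sources."""
    sections = []
    for name in REFERENCE_FILES:
        text = Path(reference_dir, name).read_text(encoding="utf-8")
        sections.append(f"--- REFERENCE: {name} ---\n{text.strip()}")
    references = "\n\n".join(sections)
    return (
        "Use ONLY the reference material below for legal language and citations.\n\n"
        f"{references}\n\n--- TASK ---\n{task_instructions}"
    )
```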

Prompt Engineering

The heart of the prototype is a carefully designed prompt/instruction set given to the AI model. The team gave the AI a persona and explicit instructions on how to conduct the conversation and draft the letter. For example, the assistant introduces itself as “Sofia, the Legal Aid Society of San Bernardino’s Virtual Assistant” and explains its role to the client (to help draft a letter). The prompt includes step-by-step instructions for the interview: ask the client’s name, ask what accommodation they need, confirm details, etc., in a logical order (it’s almost like a decision-tree written in natural language form). A snippet of the prompt (from the “Generative AI playbook”) is shown below:

Excerpt from the AI assistant’s instruction script. The agent is given a line-by-line guide to greet the client, collect information (names, addresses, disability details, etc.), and even call a date-time tool to insert the current date for the letter. 

The prompt also explicitly instructs the AI on legal and ethical boundaries. For instance, it was told: “Your goal is to write and generate a demand letter for reasonable accommodations… You do not provide legal advice; you only assist with drafting the letter.” This was crucial to prevent the AI from straying into giving advice or making legal determinations, which must remain the attorney’s domain. By iteratively testing and refining this prompt, the team taught the AI to stay in its lane: ask relevant questions, be polite and empathetic, and focus on producing the letter.
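Because the full playbook is not reproduced in this report, the following is a hedged reconstruction of the kind of system instructions described above. Apart from the quoted boundary language, the wording and question order are illustrative, not the team's exact prompt.

```python
# Reconstructed system prompt in the spirit of the team's playbook; apart from the
# quoted boundary language, the wording and ordering are illustrative.
SYSTEM_PROMPT = """
You are Sofia, the Legal Aid Society of San Bernardino's Virtual Assistant.
Your goal is to write and generate a demand letter for reasonable accommodations.
You do not provide legal advice; you only assist with drafting the letter.

Conduct the interview one question at a time, in this order:
1. Greet the client and explain your role.
2. Ask for the client's full name and mailing address (street, city, state, ZIP).
3. Ask for the landlord's name and mailing address.
4. Ask what accommodation the client needs and how it relates to their disability.
5. Ask whether the client has requested this accommodation before.
6. Confirm the collected details, call the date tool to get today's date,
   and then present the complete draft letter for the client's review.

Tone rules: be warm and trauma-informed, thank the client for sharing, never press
for unnecessary medical detail, and use the client's own words where possible.
If the client asks a legal question, explain that an LASSB attorney will answer it
after reviewing the draft, and continue with the interview.
"""
```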

Trauma-Informed and Bias-Mitigation Features

A major design consideration was ensuring the AI’s tone and behavior were appropriate for vulnerable clients. The team trained the AI (through examples and instructions) to use empathetic language – e.g., thanking the client for sharing information, acknowledging difficulties – and to avoid any phrasing that might come off as judgmental or overly clinical. The AI was also instructed to use the client’s own words when possible and not to press sensitive details unnecessarily. On the technical side, the model was tested for biases. The team used diverse example scenarios to ensure the AI’s responses wouldn’t differ inappropriately based on the nature of the disability or other client attributes. Regular audits of outputs were done to catch any bias. For example, they made sure the AI did not default to male pronouns for landlords or assume anything stereotypical about a client’s condition. These measures align with best practices to ensure the AI’s output is fair and respects all users.

Automated Tools Integration

The prototype included some clever integrations of simple tools to enhance accuracy. One such tool was a date function. In early tests, the AI sometimes forgot to put the current date on the letter or used a generic placeholder. To fix this, the team connected the AI to a utility that fetches the current date. During the conversation, if the user is ready to draft the letter, the AI will call this date function and insert the actual current date into the letter heading. This ensures the generated letter always shows (for example) “May 19, 2023” rather than a hardcoded date. Similarly, the AI was guided to properly format addresses and other elements (it asks for each component like city, state, ZIP and then concatenates them in the letter format). These might seem like small details, but they significantly improve the professionalism of the output.
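A minimal sketch of that date-tool pattern appears below: a plain function returns today's date, and a generic tool description tells the model it may request it. The wiring shown is generic rather than the exact Vertex AI configuration the team used.

```python
from datetime import date


def get_current_date() -> str:
    """Tool the assistant can call so letters carry today's date, not a placeholder."""
    return date.today().strftime("%B %d, %Y")  # e.g., "March 3, 2025"


# Generic tool description in the style most function-calling APIs accept;
# the exact schema depends on the platform actually used.
DATE_TOOL_SPEC = {
    "name": "get_current_date",
    "description": "Returns today's date formatted for the heading of a formal letter.",
    "parameters": {"type": "object", "properties": {}},
}


def handle_tool_call(tool_name: str) -> str:
    """Execute a tool the model requested and return the result to feed back to it."""
    if tool_name == "get_current_date":
        return get_current_date()
    raise ValueError(f"Unknown tool requested: {tool_name}")
```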

Draft Letter Generation

Once the AI has all the needed info, it composes the letter in real-time. It follows the structure from the prompt and templates: the letter opens with the date and address, a reference line (“RE: Request for Reasonable Accommodation”), a greeting, and an introduction of the client. Then it lays out the request and the justification, citing the laws, and closes with a polite sign-off. The content of the letter is directly based on the client’s answers. For instance, if the client said they have an anxiety disorder and a service dog, the letter will include those details and explain why the dog is needed. The AI’s legal knowledge ensures that it inserts the correct references to the FHA and California Fair Employment and Housing Act (FEHA), explaining that landlords must provide reasonable accommodations unless it’s an undue burden. 
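Conceptually, the letter can be thought of as a fixed skeleton whose placeholders are filled from the interview answers. The simplified template below is an illustration of that structure, not LASSB's actual template.

```python
# Simplified letter skeleton for illustration; LASSB's actual template differs.
LETTER_SKELETON = """{current_date}

{landlord_name}
{landlord_address}

RE: Request for Reasonable Accommodation

Dear {landlord_salutation},

I am {tenant_name}, a tenant at {tenant_address}. I am writing to request a
reasonable accommodation: {accommodation_requested}.

{disability_connection_paragraph}

Under the federal Fair Housing Act and the California Fair Employment and
Housing Act, housing providers must grant reasonable accommodations for
tenants with disabilities unless doing so would impose an undue burden.
{legal_basis_paragraph}

Please respond to this request by {response_deadline}. Thank you for your
attention to this matter.

Sincerely,
{tenant_signature_block}
"""

# Whether the model writes free-form text in this shape or the application fills
# the placeholders directly (e.g., LETTER_SKELETON.format(**answers)), these
# structural elements are what the reviewing attorney checks for.
```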

An example output is shown below:

Sample excerpt from an AI-generated reasonable accommodation letter. In this case, the tenant (Jane Doe) is requesting an exception to a “no pets” policy to allow her service dog. The AI’s draft includes the relevant law citations (FHA and FEHA) and a clear explanation of why the accommodation is necessary. 

As seen in the example above, the AI’s letter closely resembles one an attorney might write. It addresses the landlord respectfully (“Dear Mr. Jones”), states the tenant’s name and address, and the accommodation requested (permission to keep a service animal despite a no-pet policy). It then cites the Fair Housing Act and California law, explaining that these laws require exceptions to no-pet rules as a reasonable accommodation for persons with disabilities. It describes the tenant’s specific circumstances (the service dog helps manage her anxiety, etc.) in a factual and supportive tone. It concludes with a request for a response within a timeframe and a polite thank you. All of this text was generated by the AI based on patterns it learned from training data and the prompt instructions – the team did not manually write any of these sentences for this particular letter, showing the generative power of the AI. The attorney’s role would then be to review this draft. 

In our tests, attorneys found the drafts to be surprisingly comprehensive. They might only need to tweak a phrase or add a specific detail. For example, an attorney might insert a line offering to provide medical documentation if needed, or adjust the deadline given to the landlord. But overall, the AI-generated letters were on point and required only light editing. 

Testing and Iteration

The development of the prototype involved iterative testing and debugging. Early on, the team encountered some issues typical of advanced AI systems and worked to address them.

Getting the agent to perform consistently

Initially, the AI misunderstood its task at times. In the first demos, when asked to draft a letter, the AI would occasionally respond with “I’m sorry, I can’t write a letter for you”, treating it like a prohibited action. This happened because base language models often have safety rules about not producing legal documents. The team resolved this by refining the prompt to clarify that the AI is allowed and expected to draft the letter as part of its role (since an attorney will review it). Once the AI “understood” it had permission to assist, it complied.

Ensuring the agent produced the right output

The AI also sometimes ended the interview without producing the letter. Test runs showed that if the user didn’t explicitly ask for the letter, the AI might stop after gathering info. To fix this, the team adjusted the instructions to explicitly tell the AI that once it has all the information, it should automatically present the draft letter to the client for review. After adding this, the AI reliably output the draft at the end of the conversation.

The agent also sometimes offered to perform unsolicited tasks, like sending an email, even though that wasn’t in its configuration – it was improvising off-script (discussed further under “Dealing with fake or inaccurate info” below).

Un-sticking the agent, caught in a loop

There were issues with the AI getting stuck or repeating itself. For example, in one scenario, the AI began to loop, apologizing and asking the same question multiple times even after the user answered. 

A screenshot from testing shows the AI repeating “Sorry, something went wrong, can you repeat?” in a loop when it hit an unexpected input. These glitches were tricky to debug – the team adjusted the conversation flow and added checks (like if the user already answered, do not ask again), which reduced but did not completely eliminate such looping. We identified that these loops often stemmed from the model’s uncertainty or minor differences in phrasing that weren’t accounted for in the script.
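A simple version of the "do not ask again" guard is sketched below: the application tracks which interview slots already have answers and only asks about the ones still empty. The slot names are illustrative.

```python
# Illustrative guard against re-asking: track which interview slots are filled
# and only ask about the ones still empty. Slot names are placeholders.
INTERVIEW_SLOTS = [
    "client_name", "client_address", "landlord_name",
    "landlord_address", "accommodation_requested", "disability_connection",
]


def next_unanswered_slot(answers: dict[str, str]) -> str | None:
    """Return the next empty slot to ask about, or None when the interview is complete."""
    for slot in INTERVIEW_SLOTS:
        if not answers.get(slot, "").strip():
            return slot
    return None


def record_answer(answers: dict[str, str], slot: str, user_text: str) -> None:
    """Fill a slot only if it is still empty, so a repeated answer never restarts a question."""
    if not answers.get(slot, "").strip():
        answers[slot] = user_text.strip()
```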

Dealing with fake or inaccurate info

Another issue was occasional hallucinations or extraneous content. For instance, the AI at one point started offering to “email the letter to the landlord” out of nowhere, even though that wasn’t in its instructions (and it had no email capability). This was the model improvising beyond its intended scope. The team addressed this by tightening the prompt instructions, explicitly telling the AI not to do anything with email and to stick to generating the letter text only. After adding such constraints, these hallucinations became rarer.

Getting consistent letter formatting

The formatting of the letter (dates, addresses, signature line) needed fine-tuning. The AI initially had minor formatting quirks (like sometimes missing the landlord’s address or not knowing how to sign off). By providing a template example and explicitly instructing the inclusion of those elements, the final prototype reliably produced a correctly formatted letter with a placeholder for the client’s signature.

Throughout development, whenever an issue was discovered, the team would update the prompt or the data and test again. This iterative loop – test, observe output, refine instructions – is a hallmark of developing AI solutions and was very much present in this project. 

Over time, the outputs improved significantly in quality and reliability. For example, by the end of the Winter quarter, the AI was consistently using the correct current date (thanks to the date tool integration) and writing in a supportive tone (thanks to the trauma-informed training), which were clear improvements from earlier versions. That said, some challenges remained unsolved due to time limits. 

The AI still showed some inconsistent behaviors occasionally – such as repeating a question in a rare case, or failing to recognize an atypical user response (like if a user gave an extremely long-winded answer that confused the model). The team documented these lingering issues so that future developers can target them. They suspected that further fine-tuning of the model or using a more advanced model could help mitigate these quirks. 

In its final state at the end of Winter 2025, the prototype was able to conduct a full simulated interview and generate a reasonable accommodation demand letter that LASSB attorneys felt was about 80–90% ready to send, requiring only minor edits. 

The technical architecture was a single-page web application interfacing with the AI model (running on a cloud AI platform) plus some back-end scripts for the date tool and data storage. It was not yet integrated into LASSB’s production systems, but it provided a compelling proof-of-concept. 

Observers in the final presentation could watch “Sofia” chat with a hypothetical client (e.g., Martin who needed an emotional support animal) and within minutes, produce a letter addressed to the landlord citing the FHA – something that would normally take an attorney a couple of hours. 

Overall, the technical journey of this project was one of rapid prototyping and user-centered adjustment. The team combined off-the-shelf AI technology with domain-specific knowledge to craft a tool tailored for legal aid. They learned how small changes in instructions can greatly affect an AI’s behavior, and they progressively molded the system to align with LASSB’s needs and values. The result is a working prototype of an AI legal assistant that shows real promise in easing the burden of document drafting in a legal aid context.

Evaluation Framework: Testing, Quality Standards, and Lessons Learned

From the outset, the team and LASSB agreed that rigorous evaluation would be critical before any AI tool could be deployed in practice. The project developed an evaluation framework to measure the prototype’s performance and ensure it met both efficiency goals and legal quality standards. Additionally, throughout development the team reflected on broader lessons learned about using AI in a legal aid environment. This section discusses the evaluation criteria, testing methods, and key insights gained.

Quality Standards and Benchmarks

The primary measure of success for the AI-generated letters was that they be indistinguishable (in quality) from letters written by a competent housing attorney. To that end, the team established several concrete quality benchmarks (a simple review-rubric sketch follows the list below):

  • No “Hallucinations”: The AI draft should contain no fabricated facts, case law, or false statements. All information in the letter must come from the client’s provided data or be generally accepted legal knowledge. For example, the AI should never cite a law that doesn’t exist or insert details about the tenant’s situation that the tenant didn’t actually tell it. Attorneys reviewing the letters specifically check for any such hallucinated content.
  • Legal Accuracy: Any legal assertions in the letter (e.g. quoting the Fair Housing Act’s requirements) must be precisely correct. The letter should not misstate the law or the landlord’s obligations. Including direct quotes or citations from statutes/regulations was one method used to ensure accuracy. LASSB attorneys would verify that the AI correctly references ADA, FHA, FEHA, or other laws as applicable.
  • Proper Structure and Tone: The format of the letter should match what LASSB attorneys expect in a formal demand letter. That means: the letter has a date, addresses for both parties, a clear subject line, an introduction, body paragraphs that state the request and legal basis, and a courteous closing. The tone should be professional – firm but not aggressive, and certainly not rude. One benchmark was that an AI-drafted letter “reads like” an attorney’s letter in terms of formality and clarity. If an attorney would normally include or avoid certain phrases (for instance, saying “Thank you for your attention to this matter” at the end, or avoiding contractions in a formal letter), the AI’s output is expected to do the same.
  • Completeness: The letter should cover all key points necessary to advocate for the client. This includes specifying the accommodation being requested, briefly describing the disability connection, citing the legal right to the accommodation, and possibly mentioning an attached verification if relevant. An incomplete letter (one that, say, only requests but doesn’t cite any law) would not meet the standard. Attorneys reviewing would ensure nothing crucial was missing from the draft.
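Here is the review-rubric sketch referenced above: one way to capture the attorney review as a structured record per letter, so pass rates and common edits can be tracked over time. The field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class LetterReviewRubric:
    """Per-letter attorney review record mirroring the benchmarks above (illustrative)."""
    no_hallucinations: bool           # no fabricated facts, laws, or client details
    legally_accurate: bool            # FHA/FEHA/ADA statements are stated correctly
    proper_structure_and_tone: bool   # formal format; professional, courteous tone
    complete: bool                    # request, disability link, and legal basis all present
    edits_required: list[str] = field(default_factory=list)  # notes on changes the attorney made

    def passes(self) -> bool:
        return all([self.no_hallucinations, self.legally_accurate,
                    self.proper_structure_and_tone, self.complete])
```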

In addition to letter quality, efficiency metrics were part of the evaluation. The team intended to log how long the AI-agent conversation took and how long the model took to generate the letter, aiming to show a reduction in total turnaround time compared to the status quo. Another metric was the effect on LASSB’s capacity: for example, could implementing this tool reduce the number of calls that drop off due to long waits? In theory, if attorneys spend less time per client, more calls can be returned. The team proposed tracking number of clients served before and after deploying the AI as a long-term metric of success. 

Evaluation Methods

To assess these criteria, the evaluation plan included several components.

Internal Performance Testing

The team performed timed trials of the AI system. They measured the duration of a full simulated interview and letter draft generation. In later versions, the interview took roughly 10–15 minutes (depending on how much detail the client gives), and the letter was generated almost instantly thereafter (within a few seconds). They compared this to an estimate of human drafting time. These trials demonstrated the raw efficiency gain – a consistent turnaround of under 20 minutes for a draft letter, which is far better than the days or weeks it might take in the normal process. They also tracked if any technical slowdowns occurred (for instance, if the AI had to call external tools like the date function, did that introduce delays? It did not measurably – the date lookup was near-instant).

Expert Review (Quality Control)

LASSB attorneys and subject matter experts were involved in reviewing the AI-generated letters. The team conducted sessions where an attorney would read an AI draft and score it on accuracy, tone, and completeness. The feedback from these reviews was generally positive – attorneys found the drafts surprisingly thorough. They did note small issues (e.g., “we wouldn’t normally use this phrasing” or “the letter should also mention that the client can provide a doctor’s note if needed”). 

These observations were fed back into improving the prompt. The expert review process is something that would continue regularly if the tool is deployed: LASSB could institute, say, a policy that attorneys must double-check every AI-drafted letter and log any errors or required changes. Over time, this can be used to measure whether the AI’s quality is improving (i.e., fewer edits needed).

User Feedback

Another angle was evaluating the system’s usability and acceptance by both LASSB staff and clients. The team gathered informal feedback from users who tried the chatbot demo (including a couple of law students role-playing as clients). They also got input from LASSB’s intake staff on whether they felt such a chatbot would be helpful. In a deployed scenario, the plan is to collect structured feedback via surveys. For example, clients could be asked if they found the virtual interview process easy to understand, and attorneys could be surveyed on their satisfaction with the draft letters. High satisfaction ratings would indicate the system is meeting needs, whereas any patterns of confusion or dissatisfaction would signal where to improve (perhaps the interface or the language the AI uses).

Long-term Monitoring

The evaluation framework emphasizes that evaluation isn’t a one-time event. The team recommended continuous monitoring if the prototype moves to production. This would involve regular check-ins (monthly or quarterly meetings) among stakeholders – the legal aid attorneys, paralegals, technical team, etc. – to review how things are going. They could review statistics (number of letters generated, average time saved) and any incidents (e.g., “the AI produced an incorrect statement in a letter on March 3, we caught it in review”). This ongoing evaluation ensures that any emerging issues (perhaps a new type of accommodation request the AI wasn’t trained on) are caught and addressed. It’s akin to maintenance: the AI tool would be continually refined based on real-world use data to ensure it remains effective and trustworthy.

Risk and Ethical Considerations

Part of the evaluation also involved analyzing potential risks. The team did a thorough risk, ethics, and regulation analysis in their final report to make sure any deployment of the AI would adhere to legal and professional standards. Some key points from that analysis:

Data Privacy & Security

The AI will be handling sensitive client information (details about disabilities, etc.). The team stressed the need for strict privacy safeguards – for instance, if using cloud AI services, ensuring they are HIPAA-compliant or covered by appropriate data agreements. They proposed measures like encryption of stored transcripts and obtaining client consent for using an AI tool. Any integration with LASSB’s case management (LegalServer) would have to follow data protection policies.
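
As a rough illustration of encrypting transcripts at rest, the sketch below uses the Fernet scheme from the widely used cryptography library; key management (where the key lives, who can access it) is left out here and would need its own policy.

```python
from cryptography.fernet import Fernet

# In practice the key would come from a managed secret store, not be generated inline.
key = Fernet.generate_key()
fernet = Fernet(key)

transcript = "Client: I am requesting an exception to the no-pet policy for my service dog..."
encrypted = fernet.encrypt(transcript.encode("utf-8"))   # store this token, not the plain text

# Decrypt only when authorized staff need to read the transcript back.
restored = fernet.decrypt(encrypted).decode("utf-8")
assert restored == transcript
```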

Bias and Fairness

They cautioned that AI models can inadvertently produce biased outputs if not properly checked. For example, might the AI’s phrasing be less accommodating to a client with a certain type of disability due to training data bias? The mitigation is ongoing bias testing and using a diverse dataset for development. The project incorporated an ethical oversight process to regularly audit letters for any bias or inappropriate language.

Acceptance by Courts/Opposing Parties

A unique consideration for legal documents is whether an AI-drafted letter (or brief) will be treated differently by its recipient. The team noted recent cases of courts being skeptical of lawyers’ use of ChatGPT, emphasizing lawyers’ duty to verify AI outputs. For demand letters (which are not filed in court but sent to landlords), the risk is lower than in litigation, but still LASSB must ensure the letters are accurate to maintain credibility. If a case did go to court, an attorney might need to attest that they supervised the drafting. Essentially, maintaining transparency and trust is important – LASSB might choose to inform clients about the AI-assisted system (to manage expectations) and would certainly ensure any letter that ends up as evidence has been vetted by an attorney.

Professional Responsibility

The team aligned the project with guidance from the American Bar Association and California State Bar on AI in law practice. These guidelines say that using AI is permissible as long as attorneys ensure competence, confidentiality, and no unreasonable fees are charged for it. In practice, that means LASSB attorneys must be trained on how to use the AI tool correctly, must keep client data safe, and must review the AI’s work. The attorney remains ultimately responsible for the content of the letter. The project’s design – always having a human in the loop – was very much informed by these professional standards.

Lessons Learned

Over the course of the project, the team gained valuable insights, both in terms of the technology and the human element of implementing AI in legal services. Some of the key lessons include the following.

AI is an Augmenting Tool, Not a Replacement for Human Expertise

Perhaps the most important realization was that AI cannot replace human empathy or judgment in legal aid. The team initially hoped the AI might handle more of the process autonomously, but they learned that the human touch is irreplaceable for sensitive client interactions. For example, the AI can draft a letter, but it cannot (and should not) decide whether a client should get a letter or what strategic advice to give – that remains with the attorney. Moreover, clients often need empathy and reassurance that an AI cannot provide on its own. As one reflection noted, the AI might be very efficient, “however, we learned that AI cannot replace human empathy, which is why the final draft letter always goes to an attorney for final review and client-centered adjustment.” In practice, the AI assists, and the attorney still personalizes the counsel.

Importance of Partner Collaboration and User-Centered Design

The close collaboration with LASSB staff was crucial. Early on, the team had some misaligned assumptions (e.g., focusing on a technical solution that wasn’t actually practical in LASSB’s context, like the phone intake bot). By frequently communicating with the partner – including weekly check-ins and showing prototype demos – the team was able to pivot and refine the solution to fit what LASSB would actually use. One lesson was to always “keep the end user in mind”. In this case, the end users were both the LASSB attorneys and the clients. Every design decision (from the tone of the chatbot to the format of the output) was run through the filter of “Is this going to work for the people who have to use it?” For instance, the move from a phone interface to a chat interface was influenced by partner feedback that a phone bot might be less practical, whereas a web-based chat that produces a printable letter fits more naturally into their workflow.

Prototype Iteratively and Be Willing to Pivot

The project reinforced the value of an iterative, agile approach. The team did not stick stubbornly to the initial plan when it proved flawed. They gathered data (user feedback, technical performance data) and made a mid-course correction to narrow the project’s scope. This pivot ultimately led to a more successful outcome. The lesson for future projects is to embrace flexibility – it’s better to achieve a smaller goal that truly works than to chase a grand vision that doesn’t materialize. As noted in the team’s retrospective, “Be willing to pivot and challenge assumptions” was key to their progress.

AI Development Requires Cross-Disciplinary Skills

The students came from law and engineering backgrounds, and both skill sets were needed. They had to “upskill to learn what you need” on the fly – for example, law students learned some prompt-engineering and coding; engineering students learned about fair housing law and legal ethics. For legal aid organizations, this is a lesson that implementing AI will likely require new trainings and collaboration between attorneys and tech experts.

AI Output Continues to Improve with Feedback

Another positive lesson was that the AI’s performance did improve significantly with targeted adjustments. Initially, some doubted whether a model could ever draft a decent legal letter. But by the end, the results were quite compelling. This taught the team that small tweaks can yield big gains in AI behavior – you just have to systematically identify what isn’t working (e.g., the AI refusing to write, or using the wrong tone) and address it. It’s an ongoing process of refinement, which doesn’t end when the class ends. The team recognized that deploying an AI tool means committing to monitor and improve it continuously. As they put it, “there is always more that can be done to improve the models – make them more informed, reliable, thorough, ethical, etc.”. This mindset of continuous improvement is itself a key lesson, ensuring that complacency doesn’t set in just because the prototype works in a demo.

Ethical Guardrails Are Essential and Feasible

Initially, there was concern about whether an AI could be used ethically for legal drafting. The project showed that with the right guardrails – human oversight, clear ethical policies, transparency – it is not only possible but can be aligned with professional standards. The lesson is that legal aid organizations can innovate with AI responsibly, as long as they proactively address issues of confidentiality, accuracy, and attorney accountability. LASSB leadership was very interested in the tool but also understandably cautious; seeing the ethical framework helped build their confidence that this could be done in a way that enhances service quality rather than risks it.

In conclusion, the evaluation phase of the project confirmed that the AI prototype can meet high quality standards (with attorney oversight) and significantly improve efficiency. It also surfaced areas to watch – for example, ensuring the AI remains updated and bias-free – which will require ongoing evaluation post-deployment. The lessons learned provide a roadmap for both this project and similar initiatives: keep the technology user-centered, maintain rigorous quality checks, and remember that AI is best used to augment human experts, not replace them. By adhering to these principles, LASSB and other legal aid groups can harness AI’s benefits while upholding their duty to clients and justice.

Next Steps

Future Development, Open Questions, and Recommendations

The successful prototyping of the AI demand letter assistant is just the beginning. Moving forward, there are several steps to be taken before this tool can be fully implemented in production at LASSB. The project team compiled a set of recommendations and priorities for future development, as well as open questions that need to be addressed. Below is an outline of the next steps:

Expand and Refine the Training Data

To improve the AI’s consistency and reliability, the next development team should incorporate additional data sources into the model’s knowledge base. During Winter 2025, the team gathered a trove of relevant documents (DOJ guidance, HUD memos, sample letters, etc.), but not all of this material was fully integrated into the prototype’s prompts.

Organizing and inputting this data will help the AI handle a wider range of scenarios. For example, there may be types of reasonable accommodations (like a request for a wheelchair ramp installation, or an exemption from a parking fee) that were not explicitly tested yet. Feeding the AI examples or templates of those cases will ensure it can draft letters for various accommodation types, not just the service-animal case.

The Winter team has prepared a well-structured archive of resources and notes for the next team, documenting their reasoning and changes made. It includes, for instance, an explanation of why they decided to focus exclusively on accommodation letters (as opposed to tackling both accommodations and modifications in one agent) – knowledge that will help guide future developers so they don’t reinvent the wheel. Leveraging this prepared data and documentation will be a top priority in the next phase.

Improve the AI’s Reliability and Stability

While the prototype is functional, the team observed intermittent issues like the AI repeating itself or getting stuck in loops under certain conditions. Addressing these glitches is critical for a production rollout. The recommendation is to conduct deeper testing and debugging of the model’s behavior under various inputs. Future developers might use techniques like adversarial testing – intentionally inputting confusing or complex information to see where the AI breaks – and then adjusting the prompts or model settings accordingly. There are a few specific issues to fix:

  • The agent occasionally repeats the same question or answer multiple times (this looping behavior might be due to how the conversation history is managed or a quirk of the model). This needs to be debugged so the AI moves on in the script and doesn’t frustrate the user.
  • The agent sometimes fails to recognize certain responses – for example, if a user says “Yeah” instead of “Yes,” will it understand? Ensuring the AI can handle different phrasings and a range of user expressions (including when users go on tangents or express emotion) is important for robustness; a simple normalization approach is sketched after this list.
  • Rarely, the agent might still hallucinate or provide an odd response (e.g., referring to sending an email when it shouldn’t). Further fine-tuning and possibly using a more advanced model with better instruction-following could reduce these occurrences. Exploring the underlying model’s parameters or switching to a model known for higher reliability (if available through the AI platform LASSB chooses) could be an option.
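
The right fix will depend on debugging, but as a rough illustration of the response-handling issue above, here is a minimal sketch (not the team's implementation) of normalizing short affirmative or negative replies before falling back to the model or a clarifying question.

```python
import re

AFFIRMATIVE = {"yes", "y", "yeah", "yep", "sure", "correct", "that's right", "ok", "okay"}
NEGATIVE = {"no", "n", "nope", "nah", "not really", "incorrect"}

def normalize_yes_no(user_reply: str):
    """Map common short replies to 'yes'/'no'; return None if unclear.

    Anything that does not match should go to the model (or trigger a gentle
    clarifying question) rather than be guessed at.
    """
    cleaned = re.sub(r"[^\w\s']", "", user_reply).strip().lower()
    if cleaned in AFFIRMATIVE:
        return "yes"
    if cleaned in NEGATIVE:
        return "no"
    return None
```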

One open question is “why” the model exhibits these occasional errors – it’s often not obvious, because AI models are black boxes to some degree. Future work could involve more diagnostics, such as checking the conversation logs in detail or using interpretability tools to see where the model’s attention is going. Understanding the root causes could lead to more systemic fixes. The team noted that sometimes the model’s mistakes had no clear trigger, which is a reminder that continuous monitoring (as described in evaluation) will be needed even post-launch.

Enhance Usability and Human-AI Collaboration Features

The prototype currently produces a letter draft, but in a real-world setting, the workflow can be made even more user-friendly for both clients and attorneys. Several enhancements are recommended:

Editing Interface

Allow the attorney (or even the client, if appropriate) to easily edit the AI-generated letter in the interface. For instance, after the AI presents the draft, there could be an “Edit” button that opens the text in a word processor-like environment. This would save the attorney from having to copy-paste into a separate document. The edits made could even be fed back to the AI (as learning data) to continuously improve it.
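
One simple way to capture those edits would be to store a diff between the AI draft and the attorney's final version; the sketch below uses Python's standard difflib module and is illustrative only.

```python
import difflib

def edit_diff(ai_draft: str, attorney_final: str) -> str:
    """Return a unified diff showing what the attorney changed in the AI draft."""
    return "\n".join(difflib.unified_diff(
        ai_draft.splitlines(),
        attorney_final.splitlines(),
        fromfile="ai_draft",
        tofile="attorney_final",
        lineterm="",
    ))

# Stored diffs become a lightweight record of recurring fixes (phrasing to avoid,
# clauses to add) that can inform future prompt updates.
```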

Download/Export Options

Integrate a feature to download the letter as a PDF or Word document. LASSB staff indicated they would want the final letter in a standard format for record-keeping and for the client to send. Automating this (the AI agent could fill a PDF template or use a document assembly tool) would streamline the process. One idea is to integrate with LASSB’s existing document system or use a platform like Documate or Gavel (which LASSB uses for other forms) – the AI could output data into those systems to produce a nicely formatted letter on LASSB letterhead.
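
As a sketch of the export step, the example below writes a draft into a Word document using the python-docx library; in a real deployment this might instead load LASSB letterhead or push data into Documate/Gavel.

```python
from docx import Document

def export_letter_docx(letter_text: str, path: str = "accommodation_letter.docx") -> str:
    """Write the AI-drafted letter into a Word document for attorney review and mailing."""
    doc = Document()  # a production version might start from a letterhead template instead
    for paragraph in letter_text.split("\n\n"):
        doc.add_paragraph(paragraph.strip())
    doc.save(path)
    return path
```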

Transcript and Summary for Attorneys

When the AI finishes the interview, it can provide not just the letter but also a concise summary of the client’s situation along with the full interview transcript to the attorney. The summary could be a paragraph saying, e.g., “Client Jane Doe requests an exception to no-pet policy for her service dog. Landlord: ABC Properties. No prior requests made. Client has anxiety disorder managed by dog.”

Such a summary, generated automatically, would allow the reviewing attorney to very quickly grasp the context without reading the entire Q&A transcript. The transcript itself should be saved and accessible (perhaps downloadable as well) so the attorney can refer back to any detail if needed. These features will decrease the need for the attorney to re-interview the client, thus preserving the efficiency gains.
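
A summary like this could be produced with one additional model call over the transcript. The sketch below uses the OpenAI chat-completions API with a placeholder model name and prompt wording, not the team's production prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def summarize_for_attorney(transcript: str) -> str:
    """Condense the interview transcript into a short summary for the reviewing attorney."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": ("Summarize this legal intake transcript in 3-4 sentences for a reviewing "
                         "attorney: client name, accommodation requested, landlord, disability "
                         "connection, and any prior requests.")},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```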

User Interface and Guidance

On the client side, ensure the chat interface is easy to use. Future improvements could include adding progress indicators (to show the client how many questions or sections are left), the ability to go back and change an answer, or even a voice option for clients who have difficulty typing (this ties into accessibility, discussed next). Essentially, polish the UI so that it is client-friendly and accessible.

Integration into LASSB’s Workflow 

In addition to the front-end enhancements, the tool should be integrated with LASSB’s backend systems. A recommendation is to connect the AI assistant to LASSB’s case management software (LegalServer) via API. This way, when a letter is generated, a copy could automatically be saved to the client’s case file in LegalServer. It could also pull basic info (like the client’s name, address) from LegalServer to avoid re-entering data. Another integration point is the hotline system – if in the future the screening AI is deployed, linking the two AIs could be beneficial (for example, intake answers collected by the screening agent could be passed directly to the letter agent, so the client doesn’t repeat information). These integrations, while technical, would ensure the AI tool fits seamlessly into the existing workflow rather than as a stand-alone app.
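
The shape of such an integration might look like the sketch below; the endpoint path, payload fields, and authentication are hypothetical placeholders and would need to be confirmed against LegalServer's actual API.

```python
import requests

def save_letter_to_case(case_id: str, letter_text: str, api_base: str, api_token: str) -> dict:
    """Attach a generated letter to a client's case file (hypothetical endpoint and fields)."""
    response = requests.post(
        f"{api_base}/cases/{case_id}/documents",  # hypothetical path, not a documented endpoint
        headers={"Authorization": f"Bearer {api_token}"},
        json={"title": "Reasonable Accommodation Demand Letter", "body": letter_text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```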

Broaden Accessibility and Language Support

San Bernardino County has a diverse population, and LASSB serves many clients for whom English is not a first language or who have disabilities that might make a standard chat interface challenging. Therefore, a key next step is to add multilingual capabilities and other accessibility features. The priority is Spanish language support, as a significant portion of LASSB’s client base is Spanish-speaking. This could involve developing a Spanish version of the AI agent – using a bilingual model or translating the prompt and output. The AI should ideally be able to conduct the interview in Spanish and draft the letter in Spanish, which the attorney could then review (noting that the final letter might need to be in English if sent to an English-speaking landlord, but at least the client interaction can be in their language). 
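
One simple way to support this would be to maintain parallel system prompts per language and select one at the start of the session; the prompt wording below is illustrative only.

```python
# Illustrative only: parallel system prompts so the same agent logic can run in either language.
SYSTEM_PROMPTS = {
    "en": ("You are an intake assistant for a legal aid housing team. Interview the client about "
           "their reasonable accommodation request, in plain English, one question at a time."),
    "es": ("Eres un asistente de admisión para un equipo legal de vivienda. Entrevista al cliente "
           "sobre su solicitud de adaptación razonable, en español sencillo, una pregunta a la vez."),
}

def system_prompt_for(language_code: str) -> str:
    """Fall back to English if an unsupported language is requested."""
    return SYSTEM_PROMPTS.get(language_code, SYSTEM_PROMPTS["en"])
```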

In addition, for clients with visual impairments, the interface should be compatible with screen readers (text-to-speech for the questions, etc.), and for those with low literacy or who prefer oral communication, a voice interface could be offered (perhaps reintroducing a refined version of the phone-based system, but integrated with the letter agent’s logic). Essentially, the tool should follow universal design principles so that no client is left out due to the technology format. This may require consulting accessibility experts and doing user testing with clients who have disabilities. 

Plan for Deployment and Pilot Testing

Before a full rollout, the team recommends a controlled pilot phase. In a pilot, a subset of LASSB staff and clients would use the AI tool on actual cases (with close supervision). Data from the pilot – success stories, any problems encountered, time saved metrics – should be collected and evaluated. This will help answer some open questions, such as: 

  • How do clients feel about interacting with an AI for part of their legal help? 
  • Does it change the attorney-client dynamic in any way? 
  • Are there cases where the AI approach doesn’t fit well (for instance, if a client has multiple legal issues intertwined, can the AI handle the nuance or does it confuse things)? 

These practical considerations will surface in a pilot. The pilot can also inform best practices for training staff on using the tool. Perhaps attorneys need a short training session on how to review AI drafts effectively, or intake staff need a script to explain to clients what the AI assistant is when transferring them. Developing guidelines and training materials is part of deployment. Additionally, during the pilot, establishing a feedback loop (maybe a weekly meeting to discuss all AI-drafted letters that week) will help ensure any kinks are worked out before scaling up. 

Address Open Questions and Long-Term Considerations

Some broader questions remain as this project moves forward.

How to Handle Reasonable Modifications

The current prototype focuses on reasonable accommodations (policy exceptions or services). A related need is reasonable modifications (physical changes to property, like installing a ramp). Initially, the team planned to include both, but they narrowed the scope to accommodations for manageability. Eventually, it would be beneficial to expand the AI’s capabilities to draft modification request letters as well, since the legal framework is similar but not identical. This might involve adding a branch in the conversation: if the client is requesting a physical modification, the letter would cite slightly different laws (e.g., California Civil Code related to modifications) and possibly include different information (like who will pay for the modification, etc.). The team left this as a future expansion area. In the interim, LASSB should be aware that the current AI might need additional training/examples before it can reliably handle modification cases.

Ensuring Ongoing Ethical Compliance

As the tool evolves, LASSB will need to regularly review it against ethical guidelines. For instance, if State Bar rules on AI use get updated, the system’s usage might need to be adjusted. Keeping documentation of how the AI works (so it can be explained to courts if needed) will be important. Questions like “Should clients be informed an AI helped draft this letter?” might arise – currently the plan would be to disclose if asked, but since an attorney is reviewing and signing off, the letter is essentially an attorney work product. LASSB might decide internally whether to be explicit about AI assistance or treat it as part of their workflow like using a template.

Maintenance and Ownership 

Who will maintain the AI system long-term? The recommendation is that LASSB identify either an internal team or an external partner (perhaps continuing with Stanford or another tech partner) to assume responsibility for piloting and updates.

AI models and integrations require maintenance – for example, if new housing laws pass, the model/prompt should be updated to include that. If the AI service (API) being used releases a new version that’s better/cheaper, someone should handle the upgrade. Funding might be needed for ongoing API usage costs or server costs. Planning for these practical aspects will ensure the project’s sustainability.

Scaling to Other Use Cases

If the demand letter agent proves successful, it could inspire similar tools for other high-volume legal aid tasks (for instance, generating answers to eviction lawsuits or drafting simple wills). One open question is how easily the approach here can be generalized. The team believes the framework (AI + human review) is generalizable, but each new use case will require its own careful curation of data and prompts. 

The success in the housing domain suggests LASSB and Stanford may collaborate to build AI assistants for other domains in the future (like an Unlawful Detainer Answer generator, etc.). This project can serve as a model for those efforts.

Finally, the team offered some encouraging closing thoughts: The progress so far shows that a tool like this “could significantly improve the situation and workload for staff at LASSB, allowing many more clients to receive legal assistance.” There is optimism that, with further development, the AI assistant can be deployed and start making a difference in the community. However, they also caution that “much work remains before this model can reach the deployment phase”.

It will be important for future teams to continue with the same diligent approach – testing, iterating, and addressing the AI’s flaws – rather than rushing to deploy without refinement. The team emphasized a balance of excitement and caution: AI has great potential for legal aid, but it must be implemented thoughtfully. The next steps revolve around deepening the AI’s capabilities, hardening its reliability, improving the user experience, and carefully planning a real-world rollout. By following these recommendations, LASSB can move from a successful prototype to a pilot and eventually to a fully integrated tool that helps their attorneys and clients every day. The vision is that in the near future, a tenant with a disability in San Bernardino can call LASSB and, through a combination of compassionate human lawyers and smart AI assistance, quickly receive a strong demand letter that protects their rights – a true melding of legal expertise and technology to advance access to justice.

With continued effort, collaboration, and care, this prototype AI agent can become an invaluable asset in LASSB’s mission to serve the most vulnerable members of the community. The foundation has been laid; the next steps will bring it to fruition.

Categories
AI + Access to Justice Current Projects

A Call for Statewide Legal Help AI Stewards

Shaping the Future of AI for Access to Justice

By Margaret Hagan, originally published on Legal Design & Innovation

If AI is going to advance access to justice rather than deepen the justice gap, the public-interest legal field needs more than speculation and pilots — we need statewide stewardship.

2 missions of an AI steward for a state’s legal help service provider community

We need specific people and institutions in every state who wake up each morning responsible for two things:

  1. AI readiness and vision for the legal services ecosystem: getting organizations knowledgeable, specific, and proactive about where AI can responsibly improve outcomes for people with legal problems — and improve the performance of services. This can ensure the intelligent and impactful adoption of AI solutions as they are developed.
  2. AI R&D encouragement and alignment: getting vendors, builders, researchers, and benchmark makers on the same page about concrete needs; matchmaking them with real service teams; guiding, funding, evaluating, and communicating so the right tools get built and adopted.

Ideally, these local state stewards will be talking with each other regularly. In this way, there can be federated research & development of AI solutions for legal service providers and the public struggling with legal problems.

This essay outlines what AI + Access to Justice stewardship could look like in practice — who can play the role, how it works alongside court help centers and legal aid, and the concrete, near-term actions a steward can take to make AI useful, safe, and truly public-interest.

State stewards can help local legal providers — legal aid groups, court help centers, pro bono networks, and community justice workers — to set a clear vision for AI futures & help execute it.

Why stewardship — why now?

Every week, new tools promise to draft, translate, summarize, triage, and file. Meanwhile, most legal aid organizations and court help centers are still asking foundational questions: What’s safe? What’s high-value? What’s feasible with our staff and privacy rules? How do we avoid vendor lock-in? How do we keep equity and client dignity at the center?

Without stewardship, AI adoption will be fragmented, extractive, and inequitable. With stewardship, states can:

  • Focus AI where it demonstrably helps clients and staff. Prioritize tech based on community and provider stakeholders’ needs and preferences — not just what is being sold by vendors.
  • Prepare data and knowledge so tools work in local contexts, and so they can be trained safely and benchmarked responsibly with relevant data that is masked and safe.
  • Align funders, vendors, and researchers around real service needs, so that all of these stakeholder groups – with their capacity to support, build, and evaluate emerging technology – direct that capacity at opportunities that are meaningful.
  • Develop shared evaluation and governance so we build trust, not backlash.

Who can play the Statewide AI Steward role?

“Steward” is a role, not a single job title. Different kinds of groups can carry it, depending on how your state is organized:

  • Access to Justice Commissions / Bar associations / Bar foundations that convene stakeholders, fund statewide initiatives, and set standards.
  • Legal Aid Executive Directors (or cross-org consortia) with authority to coordinate practice areas and operations.
  • Court innovation offices / judicial councils that lead technology, self-help, and rules-of-court implementations.
  • University labs / legal tech nonprofits that have capacity for research, evaluation, data stewardship, and product prototyping.
  • Regional collaboratives with a track record of shared infrastructure and implementation.

Any of these can steward. The common denominator: local trusted relationships, coordination power, and delivery focus. The steward must be able to convene local stakeholders, communicate with them, work with them on shared training and data efforts, and move from talk to action.

The steward’s two main missions

Mission 1: AI readiness + vision (inside the legal ecosystem)

The steward gets legal organizations — executive directors, supervising/managing attorneys, practice leads, intake supervisors, operations staff — knowledgeable and specific about where AI can responsibly improve outcomes. This means:

  • Translating AI into service-level opportunities (not vague “innovation”).
  • Running short, targeted training sessions for leaders and teams.
  • Co-designing workflow pilots with clear review and safety protocols.
  • Building a roadmap: which portfolios, which tools, what sequence, what KPIs.
  • Clarifying ethical, privacy, and consumer/client safety priorities and strategies, so teams can talk about risks and worries in specific, technically informed ways that provide sufficient protection to users and orgs – and don’t fall into inaction because of ill-defined concerns about risk.

The result: organizations are in charge of the change rather than passive recipients of vendor pitches or media narratives.

Mission 2: AI tech encouragement + alignment (across the supply side)

The steward gets the groups who specialize in building and evaluating technology — vendors, tech groups, university researchers, benchmarkers — pointed at the right problems with the right real-world partnerships:

  • Publishing needs briefs by portfolio (housing, reentry, debt, family, etc).
  • Matchmaking teams and vendors; structuring pilots with data, milestones, evaluation, and governance. Helping organizations choose a best-in-class vendor and then also manage this relationship with regular evaluation.
  • Contributing to benchmarks, datasets, and red-teaming so the field learns together, and building the infrastructure that can support effective, ongoing evaluation of how AI systems are performing.
  • Helping fund and scale what works; communicating results frankly. Ensuring that prototypes and pilots’ outcomes are shared to inform others of what they might adopt, or what changes must happen to the AI solutions for them to be adopted or scaled.

The result: useful and robust AI solutions built with frontline reality, evaluated transparently, and ready to adopt responsibly.

What Stewards Could Do Month-to-Month

I have been brainstorming specific actions that a statewide steward could take to advance responsible, impactful AI for Access to Justice in their region. Many of these actions could also be done in concert with a federated network of stewards.

Map the State’s Ecosystem of Legal Help

Too often, we think in terms of organizations — “X Legal Aid,” “Y Court Help Center” — instead of understanding who’s doing the actual legal work.

Each state needs to start by identifying the legal teams operating within its borders.

  • Who is doing eviction defense?
  • Who helps people with no-fault divorce filings?
  • Who handles reasonable accommodation letters for tenants?
  • Who runs the reentry clinic or expungement help line?
  • Who offers debt relief letter assistance?
  • Who does restraining order help?

This means mapping not just legal help orgs, but service portfolios and delivery models. What are teams doing? What are they not doing? And what are the unmet legal needs that clients consistently face?

This is a service-level analysis — an inventory of the “market” of help provided and the legal needs not yet met.

AI Training for Leaders + Broader Legal Organizations

Most legal aid and court help staff are understandably cautious about AI. Many don’t feel in control of the changes coming — they feel like they’re watching the train leave the station without them.

The steward’s job is to change that.

  • Demystify AI: Explain what these systems are and how they can support (or undermine) legal work.
  • Coach teams: Help practice leads and service teams see which parts of their work are ripe for AI support.
  • Invite ownership: Position AI not as a threat, but as a design space — a place where legal experts get to define how tools should work, and where lawyers and staff retain the power to review and direct.

To do this, stewards can run short briefings for EDs, intake leads, and practice heads on LLM basics, use cases, risks, UPL and confidentiality, and adoption playbooks. Training aims to get them conversant in the basics of the technology and help them envision where responsible opportunities might be. Let them see real-world examples of how other legal help providers are using AI behind the scenes or offering it directly to the public.

Brainstorm + Opportunity Mapping Workshops with Legal Teams

Bring housing teams, family law facilitator teams, reentry teams, or other specific legal teams together. Have them map out their workflows and choose which of their day-to-day tasks are AI-opportune. Which of the tasks are routine, templated, and burdensome?

As stewards run these workshops, they can be on the lookout for where legal teams in their state can build, buy, or adopt an AI solution in 3 areas.

When running an AI opportunity brainstorm, it’s worth considering these 3 zones: where can we add to existing full-representation legal services, where can we add to brief or pro bono services, and where can we add services that legal teams don’t currently offer?

Brainstorm 1: AI Copilots for Services Legal Teams Already Offer

This is the lowest-risk, highest-benefit space. Legal teams are already helping with eviction defense, demand letters, restraining orders, criminal record clearing, etc.

Here, AI can act as a copilot for the expert — a tool that does things that the expert lawyer, paralegal, or legal secretary is already doing in a rote way:

  • Auto-generates first drafts based on intake data
  • Summarizes client histories
  • Auto-fills court forms
  • Suggests next actions or deadlines
  • Creates checklists, declarations, or case timelines

These copilots don’t replace lawyers. They reduce drudge work, improve quality, and make staff more effective.

Brainstorm 2: AI Copilots for Services That Could Be Done by Pro Bono or Volunteers

Many legal aid organizations know where they could use more help: limited-scope letters, form reviews, answering FAQs, or helping users navigate next steps.

AI can play a key role in unlocking pro bono, brief advice, and volunteer capacity:

  • Automating burdensome tasks like collecting or reviewing database records
  • Helping them write high-quality letters or motions
  • Pre-filling petitions and forms with data that has been gathered
  • Providing them with step-by-step guidance
  • Flagging errors, inconsistencies, or risks in drafts
  • Offering language suggestions or plain-language explanations

Think of this as AI-powered “training wheels” that help volunteers help more people, with less handholding from staff.

Brainstorm 3: AI Tools for Services That Aren’t Currently Offered — But Should Be

There are many legal problems where there is high demand, but legal help orgs don’t currently offer help because of capacity limits.

Common examples of these under-served areas include:

  • Security deposit refund letters
  • Creating demand letters
  • Filing objections to default judgments
  • Answering brief questions

In these cases, AI systems — carefully designed, tested, and overseen — can offer direct-to-consumer services that supplement the safety net:

  • Structured interviews that guide users through legal options
  • AI-generated letters/forms with oversight built in
  • Clear red flags for when human review is needed

This is the frontier: responsibly extending the reach of legal help to people who currently get none. The brainstorm might also include reviewing existing direct-to-consumer AI tools from other legal orgs, and deciding which they might want to host or link to from their website.

The steward can hold these brainstorming and prioritization sessions to help legal teams find these legal team co-pilots, pro bono tools, and new service offerings in their issue area. The stewards and legal teams can then move the AI vision forward and prepare a clear scope for what AI should be built.

Data Readiness + Knowledge Base Building

Work with legal and court teams to inventory what data they have that could be used to train or evaluate some of the legal AI use cases they have envisioned. Support them with tools & protocols by which to mask PII in these documents and make them safe to use in AI R&D.

This could mean getting anonymized completed forms, documents, intake notes, legal answers, data reports, or other legal workflow items. Likely, much of this data will have to be labeled, scored, and marked up so that it’s useful in training and evaluation.
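
As a starting point, masking can catch structured identifiers with simple patterns; the sketch below handles emails and phone numbers only, and names, addresses, and case numbers would still need an NER model or manual review on top of it.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"(\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def mask_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholders before data is shared for AI R&D."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(mask_pii("Call Jane at (909) 555-0134 or email jane.doe@example.org about the hearing."))
# -> "Call Jane at [PHONE] or email [EMAIL] about the hearing."
# Note: the name "Jane" is untouched, which is why NER or manual review is still needed.
```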

The steward can help the groups that hold this data to understand what data they hold, how to prepare it and share it, and how to mark it up with helpful labels.

Part of this is also to build a Local Legal Help Knowledge Base — not just about the laws and statutes on the books, but about the practical, procedural, and service knowledge that people need when trying to deal with a legal problem.

Much of this knowledge is in legal aid lawyers’ and court staff’s heads, or training decks and events, or internal knowledge management systems and memos.

Stewards can help these local organizations contribute this knowledge about local legal rules, procedures, timelines, forms, services, and step-by-step guides into a statewide knowledge base. This knowledge base can then be used by the local providers. It will be a key piece of infrastructure on which new AI tools and services can be built.

Adoption Logistics

As local AI development visions come together, the steward can lead on adoption logistics.

The steward can make sure that the local orgs don’t reinvent what might already exist, or spend money in a wasteful way.

They can do tool evaluations to see which LLMs and specific AI solutions perform best on the scoped tasks. They can identify researchers and evaluators to help with this. They can also help organizations procure these tools or even create a pool of multiple organizations with similar needs for a shared procurement process.

They might also negotiate beneficial, affordable licenses or access to AI tools that can help with the desired functions. They can also ensure that case management and document management systems are responsive to the AI R&D needs, so that the legacy technology systems will integrate well with the new tools.

Ideally, the steward will help the statewide group and the local orgs make smart investments in the tech they might need to buy or build — and can help clear the way when hurdles emerge.

Bigger-Picture Steward Strategies

In addition to these possible actions, statewide stewards can also follow a few broader strategies to get a healthy AI R&D ecosystem in their state and beyond.

Be specific to legal teams

As I’ve already mentioned throughout this essay, stewards should be focused on the ‘team’ level, rather than the ‘organization’ one. It’s important that they develop relationships and run activities with teams that are in charge of specific workflows — and that means the specific kind of legal problem they help with.

Stewards should organize their statewide network around named teams and named services, for example:

  • Housing law teams & their workflows: hotline consults, eviction defense prep, answers, motions to set aside, trial prep, RA letters for habitability issues, security-deposit demand letters.
  • Reentry teams & their workflows: record clearance screening, fines & fees relief, petitions, supporting declarations, RAP sheet interpretation, collateral consequences counseling.
  • Debt/consumer teams & their workflows: answer filing, settlement letters, debt verification, exemptions, repair counseling, FDCPA dispute letters.
  • Family law teams & their workflows: form prep (custody, DV orders), parenting plans, mediation prep, service and filing instructions, deadline tracking.

The steward can make progress on its 2 main goals — AI readiness and R&D encouragement — if it can build a strong local network among the teams that work on similar workflows, with similar data and documents, with similar audiences.

Put ethics, privacy, and operational safeguards at the center

Stewardship builds trust by making ethics operational rather than an afterthought. This all happens when AI conversations are grounded, informed, and specific among legal teams and communities. It also happens when they work with trained evaluators, who know how to evaluate the performance of AI rigorously, not based on anecdotes and speculation.

The steward network can help by planning out and vetting common, proven strategies to ensure quality & consumer protection are designed into the AI systems. They could work on:

  • Competence & supervision protocols: helping legal teams plan for the future of expert review of AI systems, clarifying “eyes-on” review models with staff trainings and tools. Stewards can also help them plan for escalation paths, when human reviewers find problems with the AI’s performance. Stewards might also work on standard warnings, verification prompts, and other key designs to ensure that reviewers are effectively watching AI’s performance.
  • Professional ethics rules clarity: help the teams design internal policies that ensure they’re in compliance with all ethical rules and responsibilities. Stewards can also help them plan out effective disclosures and consent protocols, so consumers know what is happening and have transparency.
  • Confidentiality & privacy: This can happen at the federated/national level. Stewards can set rules for data flows, retention, de-identification/masking — which otherwise can be overwhelming for specific orgs. Stewards can also vet vendors for security and subprocessing.
  • Accountability & Improvements: Stewards can help organizations and vendors plan for good data-gathering & feedback cycles about AI’s performance. This can include guidance on document versioning, audit logs, failure reports, and user feedback loops.

Stewards can help bake safeguards into workflows and procurement, so that there are ethics and privacy by design in the technical systems that are being piloted.

Networking stewards into a federated ecosystem

For statewide stewardship to matter beyond isolated pilots, stewards need to network into a federated ecosystem — a light but disciplined network that preserves local autonomy while aligning on shared methods, shared infrastructure, and shared learning.

The value of federation is compounding: each state adapts tools to local law and practice, contributes back what it learns, and benefits from the advances of others. Also, many of the tasks of a steward — educating about AI, building ethics and safeguards, measuring AI, setting up good procurement — will be quite similar state-to-state. Stewards can share resources and materials to implement locally.

What follows reframes “membership requirements” as the operating norms of that ecosystem and explains how they translate into concrete habits, artifacts, and results.

Quarterly check-ins become the engine of national learning. Stewards participate in a regular virtual cohort, not as a status ritual but as an R&D loop. Each session surfaces what was tried, what worked, and what failed — brief demos, before/after metrics, and annotated playbooks.

Stewards use these meetings to co-develop materials, evaluation rubrics, funding strategies, disclosure patterns, and policy stances, and to retire practices that didn’t pan out. Over time, this cadence produces a living canon of benchmarks and templates that any newcomer steward can adopt on day one.

Each year, the steward could champion at least one pilot or evaluation (for example, reasonable-accommodation letters in housing or security-deposit demand letters in consumer law), making sure it has clear success criteria, review protocols, and an exit ramp if risks outweigh benefits. This can help the pilots spread to other jurisdictions more effectively.

Shared infrastructure is how federation stays interoperable. Rather than inventing new frameworks in every state, stewards lean on common platforms for evaluation, datasets, and reusable workflows. Practically, that means contributing test cases and localized content, adopting shared rubrics and disclosure patterns, and publishing results in a comparable format.

It also means using common identifiers and metadata conventions so that guides, form logic, and service directories can be exchanged or merged without bespoke cleanup. When a state localizes a workflow or improves a safety check, it pushes the enhancement upstream, so other states can pull it down and adapt with minimal effort.

Annual reporting turns stories into evidence and standards. Each steward could publish a concise yearly report that covers: progress made, obstacles encountered, datasets contributed (and their licensing status), tools piloted or adopted (and those intentionally rejected), equity and safety findings, and priorities for the coming year.

Because these reports follow a common outline, they are comparable across states and can be aggregated nationally to show impact, surface risks, and redirect effort. They also serve as onboarding guides for new teams: “Here’s what to try first, here’s what to avoid, here’s who to call.”

Success in 12–18 months looks concrete and repeatable. In a healthy federation, we could point to a public, living directory of AI-powered teams and services by portfolio, with visible gaps prioritized for action.

  • We could have several legal team copilots embedded in high-volume workflows — say, demand letters, security-deposit letters, or DV packet preparation — with documented time savings, quality gains, and staff acceptance.
  • We could have volunteer unlocks, where a clinic or pro bono program helps two to three times more people in brief-service matters because a copilot provides structure, drafting support, and review checks.
  • We could have at least one direct-to-public workflow launched in a high-demand, manageable-risk area, with clear disclosures, escalation rules, and usage metrics.
  • We would see more contributions to data-driven evaluation practices and R&D protocols. This could be localized guides, triage logic, form metadata, anonymized samples, and evaluation results. Or it could be an ethics and safety playbook that is not just written but operationalized in training, procurement, and audits.

A federation of stewards doesn’t need heavy bureaucracy. It could be a set of light, disciplined habits that make local work easier and national progress faster. Quarterly cohort exchanges prevent wheel-reinventing. Local duties anchor AI in real services. Shared infrastructure keeps efforts compatible. Governance protects the public-interest character of the work. Annual reports convert experience into standards.

Put together, these practices allow stewards to move quickly and responsibly — delivering tangible improvements for clients and staff while building a body of knowledge the entire field can trust and reuse.

Stewardship as the current missing piece

Our team at Stanford Legal Design Lab is aiming for an impactful, ethical, robust ecosystem of AI in legal services. We are building the platform JusticeBench to be a home base for those working on AI R&D for access to justice. We are also building justice co-pilots directly with several legal aid groups.

But to build this robust ecosystem, we need local stewards for state jurisdictions across the country — who can take on key leadership roles and decisions — and make sure that there can be A2J AI that responds to local needs but benefits from national resources. Stewards can also help activate local legal teams, so that they are directing the development of AI solutions rather than reacting to others’ AI visions.

We can build legal help AI state by state, team by team, workflow by workflow. But we need stewards who keep clients, communities, and frontline staff at the center, while moving their state forward.

That’s how AI becomes a force for justice — because we designed it that way.

Categories
AI + Access to Justice

Human-Centered AI R&D at ICAIL’s Access to Justice Workshop

By Margaret Hagan, Executive Director of the Legal Design Lab

At this year’s International Conference on Artificial Intelligence and Law (ICAIL 2025) in Chicago, we co-hosted the AI for Access to Justice (AI4A2J) workshop—a full-day gathering of researchers, technologists, legal practitioners, and policy experts, all working to responsibly harness artificial intelligence to improve public access to justice.

The workshop was co-organized by an international team: myself (Margaret Hagan) from Stanford Legal Design Lab, Quinten Steenhuis of Suffolk University Law School/LIT Lab; Hannes Westermann of Maastricht University; Marc Lauritsen, Capstone Practice Systems; and Jaromir Savelka of Carnegie Mellon University. Together, we brought together 22 papers from contributors across the globe, representing deep work from Brazil, Czechia, Singapore, the UK, Canada, Italy, Finland, Australia, Taiwan, India, and the United States.

A Truly Global Conversation

What stood out most was the breadth of global participation and the specificity of solutions offered. Rather than high-level speculation, nearly every presentation shared tangible, grounded proposals or findings: tools developed and deployed, evaluative frameworks created, and real user experiences captured.

Whether it was a legal aid chatbot deployed in British Columbia, a framework for human-centered AI development from India, or benchmark models to evaluate AI-generated legal work product in Brazil, the contributions showcased the power of bottom-up experimentation and user-centered development.

A Diversity of Roles and Perspectives

Participants included legal researchers, practicing attorneys, judges, technologists, policy designers, and evaluation experts. The diversity of professional backgrounds allowed for robust discussion across multiple dimensions of justice system transformation. Each participant brought a unique lens—whether from working directly with vulnerable litigants, building AI systems, or establishing ethical and regulatory frameworks for new technologies.

Importantly, the workshop centered interdisciplinary collaboration. It wasn’t just legal professionals theorizing about AI, or technologists proposing disconnected tools. Instead, we heard from hybrid teams conducting qualitative user research, sharing open-source datasets, running field pilots, and conducting responsible evaluations of AI interventions in real-world settings.

Emerging Themes Across the Day

Across four themed panels, several core themes emerged:

  1. Human-Centered AI for Legal Aid and Self-Help
    Projects focused on building AI copilots and tools to support legal aid organizations and self-represented litigants. Presenters shared tools to help tenants facing eviction, systems to automate form filling with contextual guidance, and bots that assist in court navigation. Importantly, these tools were being built in partnership with legal aid teams and directly with users, with ongoing evaluations of quality, safety, and impact.
  2. Legal Writing, Research, and Data Tools
    A second group of projects explored how AI could help professionals and SRLs write legal documents, draft arguments, and find relevant precedent more efficiently. These systems included explainable outcome predictors for custody disputes, multilingual legal writing assistants, and knowledge graphs built from court filings. Many papers detailed methods for aligning AI output with local legal contexts, language needs, and cultural sensitivity.
  3. Systems-Level Innovation and AI Infrastructure
    A third set of papers zoomed out to the system level. Projects explored how AI could enable better triage and referral systems, standardized data pipelines, and early intervention mechanisms (e.g., detecting legal risk from text messages or scanned notices). We also heard from teams building open-source infrastructure for courts, public defenders, and justice tech startups to use.
  4. Ethics, Evaluation, and Responsible Design
    Finally, the workshop closed with discussions of AI benchmarks, regulatory models, and ethical frameworks to guide the development and deployment of legal AI tools. How do we measure the accuracy, fairness, and usefulness of a generative AI system when giving legal guidance? What does it mean to provide “good enough” help when full representation isn’t possible? Multiple projects proposed evaluation toolkits, participatory design processes, and accountability models for institutions adopting these tools.

Building on Past Work and Sharing New Ideas

Many workshop presenters built directly on prior research, tools, and evaluation methods developed through the Legal Design Lab and our broader community. We were especially excited to see our Lab’s Legal Q&A Evaluation Rubrics, originally developed to benchmark the quality of automated legal information, being adopted in People’s Law School’s Beagle+ project in British Columbia as they deploy and test a user-facing AI chatbot to answer people’s common legal questions.

Another compelling example came from Georgetown University, where our previous visual legal design work product (patterns and communication tools) is now inspiring a new AI-powered visual design creator built by Brian Rhindress. Their tool helps legal aid organizations and court staff visually and textually explain legal processes to self-represented litigants—leveraging human-centered design and large language models to generate tailored explanations and visuals. A group can take their text materials and convert them into human-centered visual designs, using LLMs + examples (including those from our Lab and other university/design labs).

We’re excited to see these threads of design and evaluation from previous Stanford Legal Design Lab work continuing to evolve across jurisdictions.

The Need for Empirical Grounding and Regulatory Innovation

A major takeaway from the group discussions was the urgent need for new empirical research on how people actually interact with legal AI tools—what kinds of explanations they want, what kinds of help they trust, and what types of disclosures and safeguards are meaningful. Rather than assuming that strict unauthorized practice of law (UPL) rules will protect consumers, several papers challenged us to develop smarter, more nuanced models of consumer protection, ones grounded in real user behavior and real-world harms and benefits.

This opens the door for a new generation of research—not just about what AI can do, but about what regulatory frameworks and professional norms will ensure the tools truly serve the public good.

Highlights

There were many exciting contributions among the 22 presentations. Here is a short overview, and I encourage you to explore all the draft papers.

Tracking and Improving AI Tools with Real-World Usage Data: The Beagle+ Experiment in British Columbia

One of the standout implementations shared came from British Columbia’s People’s Law School’s Beagle+ project. This legal chatbot, launched in early 2024, builds on years of legal aid innovation to offer natural-language assistance to users navigating everyday legal questions. What makes Beagle+ especially powerful is its integrated feedback and monitoring system: each interaction is logged with Langfuse, recording inputs, outputs, system prompts, retrieval sources, and more.

The team uses this real-world usage data to monitor system accuracy, cost, latency, and user empowerment over time—allowing iterative improvements that directly respond to user behavior.
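
To make the monitoring concrete, here is a minimal sketch of the kind of per-interaction record such a system might capture. The field names and the log_interaction helper are illustrative assumptions, not the Beagle+ team’s actual schema or Langfuse’s API.

  import json
  import time
  from dataclasses import dataclass, asdict, field

  @dataclass
  class InteractionRecord:
      # Illustrative fields only; the real Beagle+ logging schema lives in Langfuse.
      user_question: str
      system_prompt: str
      retrieved_sources: list
      model_output: str
      model_name: str
      latency_ms: float
      cost_usd: float
      user_feedback: str | None = None
      timestamp: float = field(default_factory=time.time)

  def log_interaction(record: InteractionRecord, path: str = "interactions.jsonl") -> None:
      """Append one chatbot interaction to a JSONL file for later review and aggregation."""
      with open(path, "a", encoding="utf-8") as f:
          f.write(json.dumps(asdict(record)) + "\n")

Records like these can then be aggregated to track accuracy, cost, latency, and feedback trends over time.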

They also presented experiments in generative legal editing, exploring the chatbot’s ability to diagnose or correct contract clauses, with promising results. Yet, the team emphasized that no AI tool is perfect out of the box—for now, human review and thoughtful system design remain essential for safe deployment.


Helping Workers Navigate Employment Disputes in the UK: AI-Powered ODR Recommenders

Glory Ogbonda and Sarah Nason presented a pioneering tool from the UK designed to help workers triage their employment disputes and find the right online dispute resolution (ODR) system. Funded by the Solicitors Regulation Authority, this research uncovered what users in employment disputes really want: not just legal signposting, but a guided journey. Their proposed ODR-matching system uses RAG (retrieval-augmented generation) to give users an intuitive flow: first collecting plain-language descriptions of their workplace conflict, then offering legal framing, suggested next steps, and profiles of potential legal tools.
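
As a rough sketch of that flow, under the assumption of a small catalogue of ODR option profiles and a simple keyword-overlap retriever standing in for real embedding search (the option entries, rank_options, and build_guidance_prompt are all illustrative):

  # Toy ODR-matching step: score a plain-language dispute description against
  # short profiles of resolution options, then build a prompt asking an LLM for
  # legal framing and next steps. Keyword overlap stands in for embedding retrieval.
  ODR_OPTIONS = {
      "Acas early conciliation": "free conciliation for UK workplace disputes before a tribunal claim",
      "Employment tribunal": "formal adjudication of claims such as unfair dismissal or unpaid wages",
      "Internal grievance process": "raising the issue through the employer's own grievance procedure",
  }

  def rank_options(description: str) -> list[tuple[str, int]]:
      """Rank options by naive keyword overlap with the user's description."""
      words = set(description.lower().split())
      scored = [(name, len(words & set(profile.lower().split())))
                for name, profile in ODR_OPTIONS.items()]
      return sorted(scored, key=lambda pair: pair[1], reverse=True)

  def build_guidance_prompt(description: str, top_n: int = 2) -> str:
      """Assemble an LLM prompt offering legal framing plus the top-matched option profiles."""
      top = [name for name, _ in rank_options(description)[:top_n]]
      profiles = "\n".join(f"- {name}: {ODR_OPTIONS[name]}" for name in top)
      return ("Explain in plain language the likely legal framing of this workplace issue, "
              "then suggest next steps using only the options below.\n\n"
              f"User description: {description}\n\nOptions:\n{profiles}")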

User testing revealed a tension between formal legal accuracy and the empathy and clarity that users crave. The project underscores a core dilemma in legal AI: how to balance actionable, user-centered advice with the guardrails of legal ethics and system limits.


Empowering Litigants through Education and AI-Augmented Practice: The Cybernetic Legal Approach

Zoe Dolan (working with Aiden the AI) shared insights from her hands-on project working directly with self-represented litigants in an appeals clinic. Dolan + Aiden trained cohorts of participants to use a custom-configured GPT-based tool, enhanced with rules, court guides, and tone instructions. Participants learned how to prompt the tool effectively, verify responses, and use it to file real motions and navigate the courts.

The project foregrounds empowerment, rather than outcome, as the key success metric—helping users avoid defaults, feel agency, and move confidently through procedures. Notably, Dolan found that many SRLs had developed their own sophisticated AI usage patterns, often outpacing legal professionals in strategic prompting and tool adoption. The project points to a future where legal literacy includes both procedural knowledge and AI fluency.


AI-Driven Early Intervention in Family and Housing Law in Chicago

Chlece Walker-Neal presented AJusticeLink, a preventative justice project from Chicago focused on identifying legal and psycho-legal risk through SMS messages. The tool analyzes texts that users send to friends, family, and others, detecting language that signals legal issues—such as risk of eviction or custody disputes—and assigning an urgency score. Based on this, users are linked to appropriate legal services. The project aims to intervene before a crisis reaches the courthouse, helping families address issues upstream. This early-warning approach exemplifies a shift in justice innovation: from reactive court services to proactive legal health interventions.
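
As an illustration of the general idea (not AJusticeLink’s actual pipeline), a toy urgency scorer might map risk phrases to weights and the resulting score to a referral tier:

  # Toy urgency scorer: flag legal-risk language in a message and map the score
  # to a referral tier. The phrases, weights, and thresholds are invented for
  # illustration; a production system would use trained classifiers and careful
  # consent and privacy handling.
  RISK_SIGNALS = {
      "notice to vacate": 3, "eviction": 3, "evicted": 3,
      "custody": 2, "child support": 2,
      "late rent": 1, "landlord": 1,
  }

  def urgency_score(message: str) -> int:
      text = message.lower()
      return sum(weight for phrase, weight in RISK_SIGNALS.items() if phrase in text)

  def referral_tier(score: int) -> str:
      if score >= 3:
          return "urgent: connect to legal services now"
      if score >= 1:
          return "monitor: share self-help resources"
      return "no legal signal detected"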


PONK: Helping Czech Litigants Write Better Legal Texts with Structured AI Guidance

The PONK project out of Czechia presented a tool for improving client-oriented legal writing by using structured AI and rule-based enhancements. Drawing on a dataset of over 250 annotated legal documents, the system helps convert raw legal text into clearer argumentation following a Fact–Rule–Conclusion (FRC) structure. This project is part of a broader movement to bring explainable AI into legal drafting and aims to serve both litigants and legal aid professionals by making documents more structured, persuasive, and usable across a wider audience. It showcases how small linguistic and structural refinements, guided by AI, can produce outsized impact in real-world justice communications.
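
One way to picture the Fact–Rule–Conclusion restructuring is as a constrained prompt to a language model. The sketch below is a loose illustration inspired by the FRC idea, not the PONK implementation, and the frc_prompt helper is hypothetical:

  def frc_prompt(raw_text: str) -> str:
      """Build a prompt asking a model to restructure legal text into Fact / Rule / Conclusion.
      A loose, hypothetical sketch; the PONK system layers rule-based enhancements on top."""
      return (
          "Rewrite the passage below as three clearly labelled sections:\n"
          "FACT: the concrete facts of the client's situation.\n"
          "RULE: the legal rule or provision that applies.\n"
          "CONCLUSION: how the rule applies to the facts and what follows.\n"
          "Keep the original meaning and do not add new facts.\n\n"
          f"Passage:\n{raw_text}"
      )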


Can AI Help Fix Hard-to-Use Court Forms and Text-Heavy Guides? A Visual Design AI Prototype

Brian Rhindress presented a provocative question: can AI be trained to reformat legal documents and court forms into something more visually accessible? And could the backlog of training materials in PDFs, docs, and text-heavy powerpoints be converted into something more akin to comic books, visuals, fliers, and other engaging materials?

Inspired by design principles from Stanford Legal Design Lab and the U.S. Digital Service, and building off materials from the Graphic Advocacy Project, Harvard A2J Lab, Legal Design Lab and more, the project tested generative models on their ability to re-layout legal forms. While early versions showed promise for ideation and inspiration, they often suffered from inconsistent checkbox placement, odd visual hierarchy, or poor design language.

Still, the vision is compelling: a future where AI-powered layout tools assist courts in publishing more user-friendly, standardized forms—potentially across jurisdictions. Future versions may build on configuration workflows and clear design templates to reduce hallucinations and increase reliability. The idea is to lower the entry barrier for underserved communities by combining proven legal messaging with compelling visual storytelling. Rather than developing entirely new tools, teams explored how off-the-shelf systems, paired with smart examples and curated prompts, can deliver real-time, audience-tailored legal visuals.


Building Transparent, Customizable AI Systems for Sentencing and Immigration Support

Aparna Komarla from Redo.io and colleagues from OpenProBono demonstrated the power of open, configurable AI agents in the justice system. In California’s “Second Look” sentencing reviews, attorneys can use this custom-built AI system to query multi-agency incarceration datasets and assign “suitability scores” to prioritize eligible individuals who might have claims the attorneys can assist with. The innovation lies in giving attorneys—not algorithms—the power to define and adjust the weights of relevant factors, helping to maintain transparency and align tools with local values and judicial discretion.
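
The core mechanic is simple enough to sketch: a weighted sum over normalized factors, where attorneys rather than the system set the weights. The factor names, weights, and data below are invented for illustration and are not Redo.io’s actual model:

  # Toy attorney-controlled suitability scoring. Factors, weights, and data are
  # invented for illustration; in practice attorneys define and tune these against
  # multi-agency incarceration datasets.
  DEFAULT_WEIGHTS = {
      "years_served": 0.40,
      "age": 0.20,
      "disciplinary_free_years": 0.25,
      "program_completion": 0.15,
  }

  def suitability_score(person: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
      """Weighted sum over normalized factors (each factor value expected in 0..1)."""
      return sum(weights[factor] * person.get(factor, 0.0) for factor in weights)

  # An attorney can re-weight factors to reflect local priorities and discretion:
  custom = dict(DEFAULT_WEIGHTS, program_completion=0.30, age=0.05)
  candidate = {"years_served": 0.8, "age": 0.6,
               "disciplinary_free_years": 1.0, "program_completion": 0.5}
  print(round(suitability_score(candidate, custom), 3))  # prints 0.75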


Place Matters: How Location Affects AI Hallucination Rates in Legal Answers

Damian Curran and colleagues explored an increasingly urgent issue: do large language models (LLMs) perform differently depending on the geographic context of the legal question? Their findings say yes—and in sometimes surprising ways. For instance, while LLMs hallucinated less often on employment law queries in Sydney, their housing law performance there was riddled with errors. In contrast, models did better on average with Los Angeles queries—possibly due to the volume of U.S.-based training data. The study underscores the importance of localization in AI legal tools, especially for long-tail or low-resourced jurisdictions where statutory nuance or recent reforms may not be well represented in AI training data.


Drawing the Line Between Legal Info and Legal Advice in India’s Emerging Chatbot Landscape

Avanti Durani and Shivani Sathe presented a critical user research study from India that investigates how AI-powered legal chatbots respond to user queries—and whether they stay within the bounds of offering legal information rather than unlicensed advice. Their analysis of six tools found widespread inconsistencies in tone, disclaimers, and the framing of legal responses. Some tools subtly slipped into strategic advice or overly narrow guidance, even as disclaimers were buried or hard to find. These gaps, they argue, pose real risks for low-literacy and legally vulnerable users. Their work raises important regulatory questions: should the standard for chatbots be defined only by unauthorized practice of law rules? Or should we also integrate user preferences and the expectations of trusted community intermediaries, such as social workers or legal aid navigators?


The full collection of 22 papers is available here with links to preprint drafts where available. We encourage everyone to explore the work and reach out to the authors—many are actively seeking collaborators, reviewers, and pilot partners.

Together, these contributions mark a new chapter in access to justice research—one where AI innovation is rigorously evaluated, deeply grounded in the legal domain, and shaped by the real needs of the people and professionals who use it.

What Comes Next

The enthusiasm and rigor from this year’s submissions reaffirmed that AI for access to justice is not a hypothetical field—it’s happening now, and it’s advancing rapidly.

The ICAIL AI4A2J workshop served as a global convening point where ideas were shared not just to impress, but to be replicated, scaled, and improved upon. Multiple projects made their datasets and prototypes publicly available, inviting others to test and build on them. Several are looking to collaborate across jurisdictions and domains to study effectiveness in new environments.

Our Stanford Legal Design Lab team left the workshop energized to continue our own work on AI co-pilots for eviction defense and debt relief, and newly inspired to integrate ideas from peers across the globe. We’re especially focused on how to:

  • Embed evaluation and quality standards from the start,
  • Design human-AI partnerships that support (not replace) frontline legal workers,
  • Spread and scale the best tools and protocols in ways that preserve trust, dignity, and legal integrity, and
  • Develop policies and regulation that are grounded in empirical data, human behavior, and actual consumer protection.

Thank You

We’re deeply grateful to our co-organizers and all the presenters who contributed to making this workshop a meaningful step forward. And a special thanks to the ICAIL community, which continues to be a space where technical innovation and public interest values come together in thoughtful dialogue.

Stay tuned—our program committee is considering next steps around publications and subsequent conferences, and we hope this is just the beginning of an ongoing, cross-border conversation about how AI can truly improve access to justice.

Please also see my colleague Quinten’s write-up of his takeaways from the workshop!

Categories
Class Blog Design Research

3 Kinds of Access to Justice Conflicts

(And the Different Ways to Design for Them)

by Margaret Hagan

In the access to justice world, we often talk about “the justice gap” as if it’s one massive, monolithic challenge. But if we want to truly serve the public, we need to be more precise. People encounter different kinds of legal problems, with different stakes, emotional dynamics, and system barriers. And those differences matter.

At the Legal Design Lab, we find it helpful to divide the access to justice landscape into three distinct types of problems. Each has its own logic — and each requires different approaches to research, design, technology, and intervention.

3 Types of Conflicts that we talk about when we talk about Access to Justice

1. David vs. Goliath Conflicts

This is the classic imbalance. An individual — low on time, legal knowledge, money, or support — faces off against a repeat player: a bank, a corporate landlord, a debt collector, or a government agency.

These Goliaths have teams of lawyers, streamlined filing systems, institutional knowledge, predictive data, and now increasingly, AI-powered legal automation and strategies. They can file thousands of cases a month — many of which go uncontested because people don’t understand the process, can’t afford help, or assume there’s no point trying.

This is the world of:

  • Eviction lawsuits from corporate landlords
  • Mass debt collection actions
  • Robo-filed claims, often incorrect but rarely challenged

The problem isn’t just unfairness — it’s non-participation. Most “Davids” default. They don’t get their day in court. And as AI makes robo-filing even faster and cheaper, we can expect the imbalance in knowledge, strategy, and participation to grow even worse.

What David vs. Goliath Conflicts Need

Designing for this space means understanding the imbalance and structuring tools to restore procedural fairness. That might mean:

  • Tools that help people respond before defaulting. These could be pre-filing defense tools that detect illegal filings or notice issues, or tools that prepare people to negotiate from a stronger position.
  • Systems that detect and challenge low-quality filings. These could also flag repeat abusive behavior from institutional actors.
  • Interfaces that simplify legal documents into plain language. Simplified, visual tools to help people understand their rights and the process quickly.
  • Research into procedural justice and scalable human-AI support models

2. Person vs. Person Conflicts

This second type of case is different. Here, both parties are individuals, and neither has a lawyer.

In this world, both sides are unrepresented and lack institutional or procedural knowledge. There’s real conflict — often with emotional, financial, or relational stakes — but neither party knows how to navigate the system.

Think about emotionally charged, high-stakes cases of everyday life:

  • Family law disputes (custody, divorce, child support)
  • Mom-and-pop landlord-tenant disagreements
  • Small business vs. customer conflicts
  • Neighbor disputes and small claims lawsuits

Both people are often confused. They don’t know which forms to use, how to prepare for court, how to present evidence, or what will persuade a judge. They’re frustrated, emotional, and worried about losing something precious — time with their child, their home, their reputation. The conflict is real and felt deeply, but both sides are likely confused about the legal process.

Often, these conflicts escalate unnecessarily — not because the people are bad, but because the system offers them no support in finding resolution. And with the rise of generative AI, we must be cautious: if each person gets an AI assistant that just encourages them to “win” and “fight harder,” we could see a wave of escalation, polarization, and breakdowns in courtrooms and relationships.

We have to design for a future legal system that might, with AI usage increasing, become more adversarial, less just, and harder to resolve.

What Person vs. Person Conflicts Need

In person vs. person conflicts, the goal should be to get to mutual resolutions that avoid protracted ‘high’ conflict. The designs needed are about understanding and navigation, but also about de-escalation, emotional intelligence, and procedural scaffolding.

  • Tools that promote resolution and de-escalation, not just empowerment. They can ideally support shared understanding and finding a solution that can work for both parties.
  • Shared interfaces that help both parties prepare for court fairly. Technology can help parties prepare for court, but also explore off-ramps like mediation.
  • Mediation-oriented AI prompts and conflict-resolution scaffolding. New tools could have narrative builders that let people explain their story or make requests without hostility. AI prompts and assistants could calibrate to reduce conflict, not intensify it.
  • Design research that prioritizes relational harm and trauma awareness.

This is not just a legal problem. It’s a human problem — about communication, trust, and fairness. Interventions here also need to think about parties that are not directly involved in the conflict (like the children in a family law dispute between separating spouses).

3. Person vs. Bureaucracy

Finally, we have a third kind of justice issue — one that’s not so adversarial. Here, a person is simply trying to navigate a complex system to claim a right or access a service.

These kinds of conflicts might be:

  • Applying for public benefits, or appealing a denial
  • Dealing with a traffic ticket
  • Restoring a suspended driver’s license
  • Paying off fines or clearing a record
  • Filing taxes or appealing a tax decision
  • Correcting an error on a government file
  • Getting work authorization or housing assistance

There’s no opposing party. Just forms, deadlines, portals, and rules that seem designed to trip you up. People fall through the cracks because they don’t know what to do, can’t track all the requirements, or don’t have the documents ready. It’s not a courtroom battle. It’s a maze.

Here many of the people caught in these systems do have rights and options. They just don’t know it. Or they can’t get through all the procedural hoops to claim them. It’s a quiet form of injustice — made worse by fragmented service systems and hard-to-reach agencies.

What Person vs. Bureaucracy Conflicts Need

For people vs. bureaucracy conflicts, the key word is navigation. People need supportive, clarifying tools that coach and guide them through the process — and that might also make the process simpler to begin with.

  • Seamless navigation tools that walk people through every step. These could be digital co-pilots that walk people through complex government workflows, and keep them knowledgeable and encouraged at each step.
  • Clear eligibility screeners and document checklists. These could be intake simplification tools that flag whether the person is in the right place, and sets expectations about what forms someone needs and when.
  • Text-based reminders and deadline alerts, to keep people on top of complicated and lengthy processes. These procedural coaches can keep people from ending up in endless continuances or falling off the process altogether. Personal timelines and checklists can track each step and provide nudges.
  • Privacy-respecting data sharing so users don’t have to “start over” every time. This could mean administrative systems with document collection and data verification features that gather and store the proofs (income, ID, residence) that people are asked to supply over and over again. It could also mean carrying a person’s choices and details across trusted systems, so they don’t need to fill in yet another form.

This space is ripe for good technology. But it also needs regulatory design and institutional tech improvements, so that systems become easier to plug into — and easier to fix. Aside from user-facing designs, we also need to work on standardizing forms, moving from form-dependencies to structured data, and improving the tech operations of these systems.

Why These Distinctions Matter

These three types of justice problems are different in form, in emotional tone, and in what people need to succeed. That means we need to study them differently, run stakeholder sessions differently, evaluate them with slightly different metrics, and employ different design patterns and principles.

Each of these problem types requires a different kind of solution and ideal outcome.

  • In David vs. Goliath, we need defense, protection, and fairness. We need to help reduce the massive imbalance in knowledge, capacity, and relationships, and ensure everyone can have their fair day in court.
  • In Person vs. Person, we need resolution, dignity, and de-escalation. We need to help people focus on mutually agreeable, sustainable resolutions to their problems with each other.
  • In Person vs. Bureaucracy, we need clarity, speed, and guided action. We must aim for seamless, navigable, efficient systems.

Each type of problem requires different work by researchers, designers, and policymakers. These include different kinds of:

  • User research methods, and ways to bring stakeholders together for collaborative design sessions
  • Product and service designs, and the patterns of tools, interfaces, and messages that will engage and serve users in this conflict.
  • Evaluation criteria, about what success looks like
  • AI safety guidelines, about how to prevent bias, capture, inaccuracies, and other possible harms. We can expect these 3 different conflicts to change as AI usage grows among litigants, lawyers, and court systems.

If we blur these lines, we risk building one-size-fits-none tools.

How might the coming wave of AI in the legal system affect these 3 different kinds of Access to Justice problems?

Toward Smarter Justice Innovation

At the Legal Design Lab, we believe this three-type framework can help researchers, funders, courts, and technologists build smarter interventions — and avoid repeating old mistakes.

We can still learn across boundaries. For example:

  • How conflict resolution tools from family law might help in small business disputes
  • How navigational tools in benefits access could simplify court prep
  • How due process protections in eviction can inform other administrative hearings

But we also need to be honest: not every justice problem is built the same. And not every innovation should look the same.

By naming and studying these three zones of access to justice problems, we can better target our interventions, avoid unintended harm, and build systems that actually serve the people who need them most.

Categories
AI + Access to Justice Current Projects

Can LLMs help streamline legal aid intake?

Insights from Quinten Steenhuis at the AI + Access to Justice Research Seminar

Recently, the Stanford Legal Design Lab hosted its latest installment of the AI+Access to Justice Research Seminar, featuring a presentation from Quinten Steenhuis.

Quinten is a professor and innovator-in-residence at Suffolk Law School’s LIT Lab. He’s also a former housing attorney in Massachusetts who has made a significant impact with projects like Court Forms Online and MADE, a tool for automating eviction help. His group Lemma Legal works with organizations on developing legal tech for interviews, forms, and documents.

His presentation in April 2025 focused on a project he’s been working on in collaboration with Hannes Westermann from the Maastricht Law & Tech Lab. This R&D project focuses on whether large language models (LLMs) are effective at tasks that might streamline intake in civil legal services. This work is being developed in partnership with Legal Aid of Eastern Missouri, along with other legal aid groups and funding from the U.S. Department of Housing and Urban Development.

The central question addressed was: Can LLMs help people get through the legal intake process faster and more accurately?

The Challenge: Efficient, Accurate Legal Aid Intake and Triage

For many people, legal aid is hard to access. That is in part because of the intake process required to apply for help from a local legal aid group. It can be time-consuming and frustrating for people to go through the current legal aid intake and triage process. Imagine calling a legal aid hotline, stressed out about a problem with your housing, family, finances, or job, only to wait on hold for an hour or more. When your call is finally answered, the intake worker needs to determine whether you qualify for help based on a complex and often open-textured set of rules. These rules vary significantly depending on jurisdiction, issue area, and individual circumstances — from citizenship and income requirements to more subjective judgments like whether a case is a “good” or “bad” fit for the program’s priorities or funding streams.

Intake protocols are typically documented internally for staff members in narrative guides, sometimes as long as seven pages, containing a mix of rules, sample scripts, timelines, and sub-rules that differ by zip code and issue type. These rules are rarely published online, as they can be too complex for users to interpret on their own. Legal aid programs may also worry about misinterpretation and inaccurate self-screening by clients. Instead, they keep these screening rules private to their staff.

Moreover, the intake process can involve up to 30+ rules about which cases to accept. These rules can vary between legal aid groups and can also change frequently (often because funding streams change). This “rules complexity” makes it hard for call center workers to provide consistent, accurate determinations about whose case will be accepted, leading to long wait times and inconsistent screening results. The challenge is to reduce the time legal aid workers spend screening without incorrectly denying services to those who qualify.

The Proposed Intervention: Integrating LLMs for Faster, Smarter Intake

To address this issue, Quinten, Hannes, and their partners have been exploring whether LLMs can help automate parts of the intake process. Specifically, they asked:

  • Can LLMs quickly determine whether someone qualifies for legal aid?
  • Can this system reduce the time spent on screening and make intake more efficient?

The solution they developed is part of the Missouri Tenant Help project, a hybrid system that combines rule-based questions with LLM-powered responses. The site’s intake system begins by asking straightforward, rules-based questions about citizenship, income, location, and problem description. It is built with DocAssemble, a flexible open-source interview platform, and combines Missouri-specific legal screening questions with central rules from Suffolk’s Court Forms Online for income limits and federal guidelines.

At one point in the intake workflow, the system prompts users to describe their problem in a free-text box. The LLM then analyzes the input, cross-referencing it with the legal aid group’s eligibility rules. If the system still lacks sufficient data, it generates follow-up questions in real time, using a low temperature setting to keep output consistent and cautious.

For example, if a user says, “I got kicked out of my house,” the system might follow up with, “Did your landlord give you any formal notice or involve the court before evicting you?” The goal is to quickly assess whether the person might qualify for legal help while minimizing unnecessary back-and-forth. The LLM’s job is to identify the legal problem at issue, and then match this specific legal problem with the case types that legal aid groups around Missouri may take (or may not).

If the LLM works perfectly, it would be able to predict correctly whether a legal aid group is likely to take on this case, is likely to decline it, or if it is borderline.
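
A simplified sketch of that loop is below. The call_llm parameter, the prompt wording, and the rule text are placeholders, not the Missouri Tenant Help implementation; the real system runs inside a DocAssemble interview.

  # Simplified hybrid intake loop: rules-based screening first, then an LLM pass
  # over the free-text problem description at low temperature. call_llm is a
  # placeholder for whatever chat-completion client the project uses.
  from typing import Callable

  PROGRAM_RULES = "Placeholder rule text: accept eviction defense for tenants in covered counties; ..."

  def screen(problem_text: str, income_ok: bool, location_ok: bool,
             call_llm: Callable[[str], str]) -> dict:
      if not (income_ok and location_ok):
          return {"prediction": "likely decline", "reason": "failed rules-based screening"}

      prompt = (
          "You screen intake for a legal aid program. Given the rules and the applicant's "
          "description, answer with exactly one of: LIKELY ACCEPT, LIKELY DECLINE, or "
          "NEED MORE INFO followed on the next line by one follow-up question.\n\n"
          f"Rules:\n{PROGRAM_RULES}\n\nApplicant: {problem_text}"
      )
      answer = call_llm(prompt).strip()
      if answer.upper().startswith("NEED MORE INFO"):
          follow_up = answer.partition("\n")[2].strip() or "Could you tell us more?"
          return {"prediction": "borderline", "follow_up": follow_up}
      return {"prediction": answer.lower()}

In the live tool, the borderline branch is what produces follow-up questions like the notice question in the example above.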

The Experiment: Testing Different LLMs

To evaluate the system, the team conducted an experiment using 16 scenarios, 3 sets of legal aid program rules, and 8 different LLMs (including open-source, commercial, and popular models). The main question was whether the system could accurately match the “accept” or “reject” labels that legal experts had assigned to the scenarios.

The team found that the LLMs did a fairly accurate job at predicting which cases should be accepted or not. Overall, the LLMs correctly predicted acceptance or rejection with 84% precision, and GPT-4 Turbo performed the best.
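
A stripped-down version of this kind of evaluation harness is sketched below; the scenario format, the predict callable, and the choice of precision on “accept” predictions are assumptions for illustration, not the study’s exact protocol.

  # Stripped-down evaluation harness: run every (model, rule set, scenario) combination,
  # compare the predicted accept/decline label to the expert label, and report per-model
  # precision on "accept" predictions. Data formats here are placeholders.
  from collections import defaultdict

  def evaluate(scenarios, rule_sets, models, predict):
      """predict(model, rules, facts) -> "accept" or "decline" (placeholder callable).
      Each scenario is a dict like {"facts": "...", "label": "accept"}."""
      counts = defaultdict(lambda: {"tp": 0, "fp": 0})
      for model in models:
          for rules in rule_sets:
              for scenario in scenarios:
                  if predict(model, rules, scenario["facts"]) == "accept":
                      key = "tp" if scenario["label"] == "accept" else "fp"
                      counts[model][key] += 1
      return {model: (c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else None)
              for model, c in counts.items()}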

Of particular interest was the rate at which the system incorrectly predicted that a case would be rejected. The system rarely made these incorrect denials, which is critical for avoiding unjust exclusion from services. Rather, the LLM erred on the side of caution, often generating follow-up questions rather than making definitive, potentially incorrect judgments.

However, it sometimes asked for unnecessary follow-up information even when it already had enough data. This can make for a worse user experience, with the system asking for redundant details and delaying its decision. The issue here was one of efficiency, though, not accuracy.

Challenges and Insights

One surprising result was that the LLMs sometimes caught errors made by human labelers. For example, in one case involving a support animal in Kansas City, the model correctly identified that a KC legal aid group was likely to accept this case, while the human reviewer mistakenly marked it as a likely denial. This underscores the potential of LLMs to enhance accuracy when paired with human oversight.

However, the LLMs also faced unique challenges.

  • Some models, like Gemini, refused to engage with topics related to domestic violence due to content moderation settings. This raised questions about whether AI developers understand the nuances of legal contexts. It also flagged the importance of screening candidate models for whether they censor legal topics before putting them into use.
  • The system also struggled with ambiguous scenarios, like evaluating whether “flimsy doors and missing locks” constituted a severe issue. Such situations highlighted the need for more tailored training and model configuration.

User Feedback and Next Steps

The system has been live for a month and a half and is currently offered as an optional self-screening tool on the Missouri Tenant Help website. Early feedback from legal aid partners has been positive, with high satisfaction ratings from users who tested the system. Some service providers noted they would like to see more follow-up questions to gather comprehensive details upfront — envisioning the LLM doing even more data-gathering, beyond what is needed to determine if a case is likely to be accepted or rejected.

In the future, the team aims to continue refinement and planning work, including to:

  1. Refine the LLM prompts and training data to better capture nuanced legal issues.
  2. Improve system accuracy by integrating rules-based reasoning with LLM flexibility.
  3. Explore more cost-effective models to keep the service affordable — currently around 5 cents per interaction.
  4. Enhance error handling by implementing model switching when a primary LLM fails to respond or disengages due to sensitive content.

Can LLMs and Humans Work Together?

This project exemplifies how LLMs and human experts can complement each other. Rather than fully automating intake, the system serves as a first-pass filter. It gives community members a quicker tool to get a high-level read on whether they are likely to get services from a legal aid group, or whether it would be better for them to pursue another service.

Rather than waiting for hours on a phone line, the user can choose to use this tool to get quicker feedback. They can still call the program — the system does not issue a rejection, but rather just gives them a prediction of what the legal aid group will tell them.

The next phase will involve ongoing live testing and iterative improvements to balance speed, accuracy, and user experience.

The Future of Improving Legal Intake with AI

As legal aid programs increasingly look to AI and LLMs to streamline intake, several key opportunities and challenges are emerging.

1. Enhancing Accuracy and Contextual Understanding:

One promising avenue is the development of more nuanced models that can better interpret ambiguous or context-dependent situations. For instance, instead of flagging a potential denial based solely on rigid rule interpretations, the system could use context-aware prompts that take into account local regulations and specific case details. This might involve combining rule-based logic with adaptive LLM responses to better handle edge cases, like domestic violence scenarios or complex tenancy disputes.

2. Adaptive Model Switching:

Another promising approach is to implement a hybrid model system that dynamically switches between different LLMs depending on the context. For example, if a model like Gemini refuses to address sensitive topics, the system could automatically switch to a more legally knowledgeable model or one with fewer content moderation constraints. This could be facilitated by a router API that monitors for censorship or errors and adjusts the model in real time.
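
A bare-bones version of that fallback logic might look like the sketch below; the refusal markers and the ordered list of model clients are illustrative placeholders, not a specific router API.

  # Bare-bones model-fallback router: try the primary model first, and if the call
  # errors out or the reply looks like a content-moderation refusal, move on to the
  # next model in the list. The refusal heuristic and clients are placeholders.
  REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "content policy")

  def looks_like_refusal(text: str) -> bool:
      lowered = text.lower()
      return any(marker in lowered for marker in REFUSAL_MARKERS)

  def ask_with_fallback(prompt: str, clients: list) -> str:
      """clients: ordered callables, each taking a prompt string and returning reply text."""
      last_error = None
      for client in clients:
          try:
              reply = client(prompt)
              if not looks_like_refusal(reply):
                  return reply
          except Exception as err:  # e.g. timeouts, rate limits
              last_error = err
      raise RuntimeError(f"All models refused or failed (last error: {last_error})")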

3. More Robust Fact Gathering:

A significant future goal is to enhance the system’s ability to collect comprehensive facts during intake. Legal aid workers noted that they often needed follow-up information after the initial screening, especially when the client’s problem involved specific housing issues or complex legal nuances. The next version of the system will focus on expanding the follow-up question logic to reduce the need for manual callbacks. This could involve developing predefined question trees for common issues while maintaining the model’s ability to generate context-specific follow-up questions.

4. Tailoring to Local Needs and Specific Use Cases:

One of the biggest challenges for scaling AI-based intake systems is ensuring that they are flexible enough to adapt to local legal nuances. The team is considering ways to contextualize the system for individual jurisdictions, potentially using open-source approaches to allow local legal aid programs to train their own versions. This could enable more customized intake systems that better reflect local policies, tenant protections, and court requirements.

5. Real-Time Human-AI Collaboration:

Looking further ahead, there is potential for building integrated systems where AI actively assists call center workers in real time. For instance, instead of having the AI conduct intake independently, it could listen to live calls and provide real-time suggestions to human operators, similar to how customer support chatbots assist agents. This would allow AI to augment rather than replace human judgment, helping to maintain quality control and legal accuracy.

6. Privacy and Ethical Considerations:

As these systems evolve, maintaining data privacy and ethical standards will be crucial. The current setup already segregates personal information from AI processing, but as models become more integrated into intake workflows, new strategies may be needed. Exploring privacy-preserving AI methods and data anonymization techniques will help maintain compliance while leveraging the full potential of LLMs.

7. Cost and Efficiency Optimization:

At the current cost of around 5 cents per interaction, the system remains relatively affordable, but as more users engage, maintaining cost efficiency will be key. The team plans to experiment with more affordable model versions and optimize the routing strategy to ensure that high-quality responses are delivered at a sustainable price. The goal is to make the intake process not just faster but also economically feasible for widespread adoption.

Building the Next Generation of Legal Aid Systems

Quinten’s presentation at the AI + Access to Justice seminar made it clear that while LLMs hold tremendous potential for improving legal intake, human oversight and adaptive systems are crucial to ensure reliability and fairness. The current system’s success — 84% precision, minimal false denials, and positive user feedback — shows that AI-human collaboration is not only possible but also promising.

As the team continues to refine the system, they aim to create a model that can balance efficiency with accuracy, while being adaptable to the diverse and dynamic needs of legal aid programs. The long-term vision is to develop a scalable, open-source tool that local programs can fine-tune and deploy independently, making access to legal support faster and more reliable for those who need it most.

Read the research article in detail here.

See more at Quinten’s group Lemma Legal: https://lemmalegal.com/

Read more about Hannes at Maastricht University: https://cris.maastrichtuniversity.nl/en/persons/hannes-westermann

Categories
AI + Access to Justice Current Projects

Justice AI Co-Pilots

The Stanford Legal Design Lab is proud to announce a new initiative funded by the Gates Foundation that aims to bring the power of artificial intelligence (AI) into the hands of legal aid professionals. With this new project, we’re building and testing AI systems—what we’re calling “AI co-pilots”—to support legal aid attorneys and staff in two of the most urgent areas of civil justice: eviction defense and reentry debt mitigation.

This work continues our Lab’s mission to design and deploy innovative, human-centered solutions that expand access to justice, especially for those who face systemic barriers to legal support.

A Justice Gap That Demands Innovation

Across the United States, millions of people face high-stakes legal problems without any legal representation. Eviction cases and post-incarceration debt are two such areas, where legal complexity meets chronic underrepresentation—leading to outcomes that can reinforce poverty, destabilize families, and erode trust in the justice system.

Legal aid organizations are often the only line of defense for people navigating these challenges, but these nonprofits are severely under-resourced. These organizations are on the front lines of help, but often are stretched thin with staffing, tech, and resources.

The Project: Building AI Co-Pilots for Legal Aid Workflows

In collaboration with two outstanding legal aid partners—Legal Aid Foundation of Los Angeles (LAFLA) and Legal Aid Services of Oklahoma (LASO)—we are designing and piloting four AI co-pilot prototypes: two for eviction defense, and two for reentry debt mitigation.

These AI tools will be developed to assist legal aid professionals with tasks such as:

  • Screening and intake
  • Issue spotting and triage
  • Drafting legal documents
  • Preparing litigation strategies
  • Interpreting complex legal rules

Rather than replacing human judgment, these tools are meant to augment legal professionals’ work. The aim is to free up time for higher-value legal advocacy, enable legal teams to take on more clients, and help non-expert legal professionals assist in more specialized areas.

The goal is to use a deliberate, human-centered process to first identify low-risk, high-impact tasks for AI to do in legal teams’ workflows, and then to develop, test, pilot, and evaluate new AI solutions that can offer safe, meaningful improvements to legal service delivery & people’s social outcomes.

Why Eviction and Reentry Debt?

These two areas were chosen because of their widespread and devastating impacts on people’s housing, financial stability, and long-term well-being.

Eviction Defense

Over 3 million eviction lawsuits are filed each year in the U.S., with the vast majority of tenants going unrepresented. Without legal advocacy, many tenants are unaware of their rights or defenses. It’s also hard to fill out the many complicated legal documents required to participate in the system, protect one’s rights, and avoid a default judgment. This makes it difficult to negotiate with landlords, comply with court requirements, and protect one’s housing and money.

Evictions often happen in a matter of weeks, and with a confusing mix of local and state laws, it can be hard for even experienced attorneys to respond quickly. The AI co-pilots developed through this project will help legal aid staff navigate these rules and prepare more efficiently—so they can support more tenants, faster.

Reentry Debt

When people return home after incarceration, they often face legal financial obligations that can include court fines, restitution, supervision fees, and other penalties. This kind of debt can make it hard for a person to regain stability with housing, employment, driver’s licenses, and family life.

According to the Brennan Center for Justice, over 10 million Americans owe more than $50 billion in reentry-related legal debt. Yet there are few tools to help people navigate, reduce, or resolve these obligations. By working with LASO, we aim to prototype tools that can help legal professionals advise clients on debt relief options, identify eligibility for fee waivers, and support court filings.

What Will the AI Co-Pilots Actually Do?

Each AI co-pilot will be designed for real use in legal aid organizations. They’ll be integrated into existing workflows and tailored to the needs of specific roles—like intake specialists, paralegals, or staff attorneys. Examples of potential functionality include:

  • Summarizing client narratives and flagging relevant legal issues
  • Filling in common forms and templates based on structured data
  • Recommending next steps based on jurisdictional rules and case data
  • Generating interview questions for follow-up conversations
  • Cross-referencing legal codes with case facts

The design process will be collaborative and iterative, involving continuous feedback from attorneys, advocates, and technologists. We will pilot and evaluate each tool rigorously to ensure its effectiveness, usability, and alignment with legal ethics.

Spreading the Impact

While the immediate goal is to support LAFLA and LASO, we are designing the project with national impact in mind. Our team plans to publish:

  • Open-source protocols and sample workflows
  • Evaluation reports and case studies
  • Responsible use guidelines for AI in legal aid
  • Collaboration pathways with legal tech vendors

This way, other legal aid organizations can replicate and adapt the tools to their own contexts—amplifying the reach of the project across the U.S.

“There’s a lot of curiosity in the legal aid field about AI—but very few live examples to learn from,” Hagan said. “We hope this project can be one of those examples, and help the field move toward thoughtful, responsible adoption.”

Responsible AI in Legal Services

At the Legal Design Lab, we know that AI is not a silver bullet. Tools must be designed thoughtfully, with attention to risks, biases, data privacy, and unintended consequences.

This project is part of our broader commitment to responsible AI development. That means:

  • Using human-centered design
  • Maintaining transparency in how tools work and make suggestions
  • Prioritizing data privacy and user control
  • Ensuring that tools do not replace human judgment in critical decisions

Our team will work closely with our legal aid partners, domain experts, and the communities served to ensure that these tools are safe, equitable, and truly helpful.

Looking Ahead

Over the next two years, we’ll be building, testing, and refining our AI co-pilots—and sharing what we learn along the way. We’ll also be connecting with national networks of eviction defense and reentry lawyers to explore broader deployment and partnerships.

If you’re interested in learning more, getting involved, or following along with project updates, sign up for our newsletter or follow the Lab on social media.

We’re grateful to the Gates Foundation for their support, and to our partners at LAFLA and LASO for their leadership, creativity, and deep dedication to the clients they serve.

Together, we hope to demonstrate how AI can be used responsibly to strengthen—not replace—the critical human work of legal aid.

Categories
AI + Access to Justice Current Projects

ICAIL workshop on AI & Access to Justice

The Legal Design Lab is excited to co-organize a new workshop at the International Conference on Artificial Intelligence and Law (ICAIL 2025):

AI for Access to Justice (AI4A2J@ICAIL 2025)
📍 Where? Northwestern University, Chicago, Illinois, USA
🗓 When? June 20, 2025 (Hybrid – in-person and virtual participation available)
📄 Submission Deadline: May 4, 2025
📬 Acceptance Notification: May 18, 2025

Submit a paper here https://easychair.org/cfp/AI4A2JICAIL25

This workshop brings together researchers, technologists, legal aid practitioners, court leaders, policymakers, and interdisciplinary collaborators to explore the potential and pitfalls of using artificial intelligence (AI) to expand access to justice (A2J). It is part of the larger ICAIL 2025 conference, the leading international forum for AI and law research, hosted this year at Northwestern University in Chicago.


Why this workshop?

Legal systems around the world are struggling to meet people’s needs—especially in housing, immigration, debt, and family law. AI tools are increasingly being tested and deployed to address these gaps: from chatbots and form fillers to triage systems and legal document classifiers. Yet these innovations also raise serious questions around risk, bias, transparency, equity, and governance.

This workshop will serve as a venue to:

  • Share and critically assess emerging work on AI-powered legal tools
  • Discuss design, deployment, and evaluation of AI systems in real-world legal contexts
  • Learn from cross-disciplinary perspectives to better guide responsible innovation in justice systems


What are we looking for?

We welcome submissions from a wide range of contributors—academic researchers, practitioners, students, community technologists, court innovators, and more.

We’re seeking:

  • Research papers on AI and A2J
  • Case studies of AI tools used in courts, legal aid, or nonprofit contexts
  • Design proposals or system demos
  • Critical perspectives on the ethics, policy, and governance of AI for justice
  • Evaluation frameworks for AI used in legal services
  • Collaborative, interdisciplinary, or community-centered work

Topics might include (but are not limited to):

  • Legal intake and triage using large language models (LLMs)
  • AI-guided form completion and document assembly
  • Language access and plain language tools powered by AI
  • Risk scoring and case prioritization
  • Participatory design and co-creation with affected communities
  • Bias detection and mitigation in legal AI systems
  • Evaluation methods for LLMs in legal services
  • Open-source or public-interest AI tools

We welcome both completed projects and works-in-progress. Our goal is to foster a diverse conversation that supports learning, experimentation, and critical thinking across the access to justice ecosystem.


Workshop Format

The workshop will be held on June 20, 2025 in hybrid format—with both in-person sessions in Chicago, Illinois and the option for virtual participation. Presenters and attendees are welcome to join from anywhere.


Workshop Committee

  • Hannes Westermann, Maastricht University Faculty of Law
  • Jaromír Savelka, Carnegie Mellon University
  • Marc Lauritsen, Capstone Practice Systems
  • Margaret Hagan, Stanford Law School, Legal Design Lab
  • Quinten Steenhuis, Suffolk University Law School


Submit Your Work

For full submission guidelines, visit the official workshop site:
https://suffolklitlab.org/ai-for-access-to-justice-at-the-international-conference-on-ai-and-law-2025-ai4a2j-icail25/

Submit your paper at EasyChair here.

Submissions are due by May 4, 2025.
Notifications of acceptance will be sent by May 18, 2025.


We’re thrilled to help convene this conversation on the future of AI and justice—and we hope to see your ideas included. Please spread the word to others in your network who are building, researching, or questioning the role of AI in the justice system.

Categories
AI + Access to Justice Current Projects

How AI is Augmenting Human-Led Legal Advice at Citizens Advice

Caddy Chatbot to Support Supervision of Legal Advisers and Improve Q&A

The Citizens Advice network in England and Wales is a cornerstone of free legal and social support, comprising 270 local organizations operating across 2,540 locations. In 2024 alone, it provided advice to 2.8 million people via phone, email, and web chat. However, the rising cost-of-living crisis in the UK has increased the demand for legal assistance, particularly in areas such as energy disputes, welfare benefits, and debt management.

The growing complexity of cases, coupled with a shortage of experienced supervisors, has created a bottleneck. Trainees require more guidance, supervisors are overburdened, and delays in responses mean clients wait longer for critical help.

At the March 7, 2025 Stanford AI and Access to Justice research webinar, Stuart Pearson of the Citizens Advice SORT group (part of the broader Citizens Advice network in England) shared how they are using Generative AI (GenAI) to support their advisers responsibly — not replace them. Their AI system, Caddy, was designed to amplify human interaction, reduce response times, and increase the efficiency of advisers and supervisors. But critically, Citizens Advice remains committed to a human-led service model, ensuring that AI enhances, rather than replaces, human expertise.

The Challenge: More Demand, Fewer Experts

Historically, when a trainee adviser encountered a complex case, they would reach out to a supervisor via chat for guidance. Supervisors would step in, identify key legal issues, and suggest an appropriate course of action.

However, as demand for legal help surged, this model became unsustainable:

  • More cases required complex supervision.
  • Supervisors faced an overwhelming number of trainee queries.
  • Delays in responses led to bottlenecks in service delivery.
  • Clients experienced longer wait times.

The question was: Could AI alleviate some of the pressure on supervisors while maintaining quality and ethical standards?

The Caddy Solution: AI as a Support Tool, Not a Replacement

Caddy is a human-in-the-loop Q&A tool: a more junior adviser uses Caddy, and a supervisor reviews the output, to get the right answer to a user’s question.

How Caddy Works

Caddy was designed as an AI-powered assistant embedded in a group’s work software environment (like Microsoft 365 or Google Workspace). It allows trainees and supervisors to:

  1. Ask Caddy a question about a client’s legal issue that has come up in an adviser-client interaction.
  2. Caddy searches only trusted sources (currently two well-maintained websites: Gov.uk and Citizens Advice’s own knowledge base).
  3. Caddy generates a proposed response, including relevant links, that is meant to guide the adviser in their interactions with the client.
  4. A supervisor reviews, edits, and approves the response. They have a box for edits, and two buttons (thumbs up or thumbs down) to approve or reject the response.
  5. If the supervisor gives a thumbs up, Caddy notifies the adviser and passes along any extra context the supervisor added.
  6. The adviser relays the verified answer to the client, reframing or contextualizing it to make sure the client can understand the details and rules.

Caddy does not replace human decision-making. Instead, it streamlines research, reduces supervisor workload, and increases response speed. It also does not communicate directly with a member of the public. It is drafting guidance for a service provider to use in their interactions with the user.
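
To summarize the workflow above in code form, the sketch below shows the general shape of the approval loop. Function names and the search and generate callables are placeholder assumptions, not Caddy’s actual implementation (which is open source on GitHub).

  # Illustrative shape of the Caddy-style human-in-the-loop flow: a draft answer
  # built only from trusted sources reaches the adviser only after supervisor approval.
  # Function names and callables are placeholders, not Caddy's actual code.
  TRUSTED_SOURCES = ["gov.uk", "citizensadvice.org.uk"]

  def draft_answer(question: str, search, generate) -> dict:
      """search(question, sources) -> passages; generate(question, passages) -> draft text."""
      passages = search(question, TRUSTED_SOURCES)
      return {"question": question, "draft": generate(question, passages), "sources": passages}

  def supervisor_review(proposal: dict, approved: bool, notes: str = "") -> dict:
      """Record the supervisor's thumbs up/down and any added context."""
      proposal["status"] = "approved" if approved else "rejected"
      proposal["supervisor_notes"] = notes
      return proposal

  def relay_to_adviser(proposal: dict) -> str:
      if proposal["status"] != "approved":
          return "Supervisor rejected the draft; handle the query without Caddy."
      return f"{proposal['draft']}\n\nSupervisor context: {proposal['supervisor_notes']}"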

Core Ethical Principles

From the outset, Citizens Advice set clear ethical guidelines to ensure AI was used responsibly and inclusively:

1. Clients must always speak to a human.
2. Every AI-generated response must be reviewed by a supervisor.
3. Caddy only uses pre-approved, trusted sources.
4. Transparency: Advisers know when they are using AI-generated information.

This approach aligns with the UK Government’s Algorithmic Transparency Recording Standards (ATRS), ensuring AI applications are openly documented and publicly accountable.

Pilot Program: Testing AI’s Impact in Legal Advice

To assess Caddy’s real-world effectiveness, Citizens Advice ran a 4–6 week pilot in six local offices, measuring key near-term outcomes:

  • Accuracy of AI-generated responses
  • Time saved per case
  • Adviser feedback
  • Government evaluation on AI in public services

From the initial pilot testing, the group has been gathering responses that are largely positive and bode well for future use.

Accuracy rates were quite high. 80% of AI responses were supervisor-approved — Caddy provided correct answers 8 out of 10 times.

Time saved was another positive outcome. Response times dropped by more than half, from 10 minutes down to 4 minutes, allowing tens of thousands more clients to be helped.

For qualitative stakeholder feedback, advisers appreciated the efficiency but wanted more features. They had some ideas about improving performance, workflows, approval protocols, and other points.

The pilot responses helped identify some important drawbacks that the team is working on. Where was the 20% inaccuracy coming from? How can the advisers and users be more satisfied?

Limited Information Sources

Caddy was initially restricted to two websites. While these were high-quality sources, they weren’t always comprehensive — especially for specialized welfare or debt cases.

Now the team is exploring a possible solution. They’re considering expanding Caddy’s trusted source list while maintaining accuracy controls.

Issues with Vague Queries

AI struggled with unclear or incomplete questions, leading to lower-quality responses. A possible solution here is to train advisers on better prompting techniques and add follow-up question capabilities.

Supervisor Bottlenecks

Some advisers wanted the ability to approve AI responses without waiting for a supervisor in low-risk cases. The solution here involves exploring self-approval options for experienced advisers. They wouldn’t have to wait for a supervisor to approve before they can proceed with Caddy’s response.

Ensuring AI is Inclusive and Ethical

Citizens Advice took a proactive approach to public engagement and ethical AI governance. Many of their strategies can be used by other groups interested in the responsible development of AI.

Engaging Clients Through a “People’s Panel”

The team partnered with Manchester Metropolitan University, which had independently been creating an AI Advisory Panel of citizens. This university-led effort recruited members of the public to join the panel and attend AI boot camps educating them about AI’s role in legal advice. Panel members were then presented with projects like Caddy and gave feedback on the tool, its risks, ethics, and features.

Governance and Risk Management

The team also worked through planning requirements and standards for its tool, including steps like:

  • Consequence Scanning: What are the risks of using AI in legal advice?
  • Planning for Trust & Reputation: Citizens Advice has existed since 1939 — maintaining public trust is paramount. Any new tech tool must enhance this reputation, rather than endanger it.
  • Constructing Shared Infrastructure for Scalability and Transparency: Caddy is open-source and available on GitHub so other nonprofits can build their own AI tools.

Future Developments: Expanding Caddy’s Capabilities

Here are some of the changes and improvements coming to Caddy in the near future.

Expanding Pilot to a National Rollout

Later this year, Caddy will roll out to the national Citizens Advice network, beyond its first pilot locations. This deliberate expansion will come after the team has had a chance to learn and address the issues that arose during the local pilots.

Conversational AI for More Dynamic Responses

Caddy will soon ask follow-up questions to refine responses in real-time. This can help address issues around vague questions that lead to answers that are not helpful or not accurate.

Building a Bank of “100% Accurate” Answers

The goal is to create a repository of vetted AI-generated responses that could be used without supervisor review. If successful, Caddy could be rolled out as a client-facing chatbot for basic legal queries.

AI-Powered Training Tools for Advisers

Here, the system could use call transcripts to auto-generate case notes and quality assessments. It could identify gaps in adviser knowledge by analyzing the types of questions they ask.

Or it could develop virtual clients for AI-powered role-playing training sessions.

Lessons from the Caddy Experiment: The Future of AI in Access to Justice

Caddy’s pilot program offers a blueprint for AI-assisted legal services. The key takeaways, for AI for legal help (at least at the beginning of 2025):

  • AI should be an assistive tool, not a replacement for human advisers. Especially as a generative AI pilot is in its first stage, it’s good to pilot it in an assistant role, with humans still providing substantial oversight over it.
  • Supervision and human oversight are crucial for ethical AI in legal services.
  • Training on prompting and follow-up questions improves AI accuracy.
  • Community involvement is essential — clients must have a say in AI’s role. Partnering with a university is a great way to get more client and community members’ input.
  • Transparency and governance are key to maintaining trust.

Citizens Advice’s journey with Caddy highlights that responsible AI can enhance access to justice while ensuring that legal support remains human-centered, ethical, and inclusive. As AI continues to evolve, the real challenge will be balancing innovation with trust, oversight, and accountability — a challenge that Citizens Advice is well-positioned to lead.

Categories
Current Projects Eviction Innovation

Data to Advance Access to Justice Efforts Around the Country

Lessons from the Eviction Diversion Initiative for Coordinated Data & Outcomes Measurement

Eviction diversion programs aim to prevent unnecessary displacement by connecting tenants and landlords with referrals, resources, and resolution options such as mediation. As these programs expand, evaluating their effectiveness is crucial for improving services, influencing policy, and ensuring they meet the needs of vulnerable communities.

At a recent working group session of the Access to Justice Network/Self-Represented Litigation Network on research for access to justice, Neil Steinkamp of Stout led a discussion on strategies for measuring program impact, refining data collection, and translating insights into policy action. Neil has led the evaluation & impact assessment of the Eviction Diversion Initiative (EDI). The National Center for State Courts is leading a cohort of state courts in the EDI to build, refine, and evaluate new eviction diversion efforts around the country and recently released an interim evaluation report on the EDI.

Below are key takeaways from the conversation about Neil’s learnings in building a multi-jurisdiction, standardized (but customizable) evaluation framework that gathers comparable data and makes it useful to stakeholders across many different jurisdictions.

Building a Framework for Evaluation

The primary goal of evaluating eviction diversion programs is often to understand who is being served, what their experiences are, and how well programs link them to resources. Instead of starting with rigid hypotheses, evaluators should approach this work with curiosity and open-ended questions:

  • What do we need to learn about the impact of eviction diversion?
  • What data is both useful and feasible to collect?
  • How can data collection evolve over time to stay relevant?

A flexible, iterative approach ensures that evaluation remains meaningful as conditions change. Some data points may become less useful, while new insights emerge that reshape how success is measured.

Balancing Consistency & Flexibility in Data Collection

Data consistency across jurisdictions is essential for comparisons, trend analysis, and deeper conversations on outcomes. However, differences in court structures, available resources, and local policies mean a one-size-fits-all approach won’t work.

A practical balance found in the EDI was 80% standardized questions for cross-jurisdictional alignment, with 20% tailored to local needs. This allows for:

  • Identifying national trends in eviction prevention.
  • Accounting for regional differences in housing, financial aid, and social service access.
  • Enabling courts and service providers to track their unique challenges and successes.

The key is avoiding overburdening staff or participants with excessive data collection while still capturing essential insights that can help demonstrate impact, identify opportunities for improvement, and assist in advocating for sustainable funding.
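
To make the split concrete, here is a minimal sketch in Python of how a shared intake record might be structured. The field names are hypothetical illustrations, not the EDI’s actual data dictionary: every jurisdiction collects the same core fields, and each adds its own locally tailored questions.

# Hypothetical sketch of an "80% shared / 20% local" intake record; the
# field names are illustrative, not the EDI's actual data dictionary.
from dataclasses import dataclass, field

@dataclass
class CoreIntakeRecord:
    """Standardized fields collected in every participating jurisdiction."""
    case_id: str
    household_size: int
    monthly_rent: float
    rental_arrears: float
    received_rental_assistance: bool
    case_outcome: str  # e.g. "dismissed", "mediated agreement", "judgment"

@dataclass
class JurisdictionIntakeRecord(CoreIntakeRecord):
    """Core fields plus locally tailored questions for one jurisdiction."""
    local_responses: dict[str, str] = field(default_factory=dict)

if __name__ == "__main__":
    record = JurisdictionIntakeRecord(
        case_id="2024-001",
        household_size=3,
        monthly_rent=1200.0,
        rental_arrears=850.0,
        received_rental_assistance=True,
        case_outcome="mediated agreement",
        local_responses={"referred_to_county_utility_program": "yes"},
    )
    print(record)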

Tools & Methods: Making Data Collection Work for Courts

Many courts lack the flexibility to modify their existing case management systems. To address this, evaluators used separate, simple tools like Microsoft Forms and Microsoft Excel Online to streamline intake and reporting, reducing staff workload.

To make sense of the collected data, they developed visual dashboards that:

  • Simplify complex datasets.
  • Enable courts and policymakers to track progress in real time.
  • Highlight gaps in services or emerging trends in housing instability.

By using easy-to-implement tools, courts were able to enhance efficiency without major technology overhauls.

Voluntary Data Collection & Trauma-Informed Approaches

All demographic and personal data collected in the evaluation was voluntary, and most participants were willing to share information. However, a trauma-informed approach recognizes that some individuals may feel uncomfortable disclosing details, especially in high-stress legal situations.

Evaluators emphasized the importance of creating a safe, respectful data collection process — one that builds trust and ensures that participation does not feel coercive or invasive.

Key Insights from the Evaluation for Eviction Prevention Policymaking

Data collected through the eviction diversion pilot programs revealed critical insights into eviction patterns and tenant needs:

  • Eviction is often preventable with modest financial support — Small rental arrears, often less than $1,000, are a key driver of many evictions.
  • Affordability crises go beyond rent payments — Many tenants face financial instability due to job loss, medical issues, or family obligations.
  • A mix of services is essential — Rental assistance alone is not always enough; tenants benefit significantly from legal aid, mediation, and access to other social services.

One major takeaway is that data should be treated as a living resource — each year, evaluators should reassess what they are tracking to ensure they are capturing the most relevant and impactful information.

From Data to Policy: The Power of Evidence-Driven Decisions

Coordinated eviction diversion data plays a powerful role in shaping policy and influencing resource allocation. Some key ways data has been used in practice include:

  • Informing state and local housing policies — Some jurisdictions used data insights to refine funding strategies for eviction prevention programs.
  • Stakeholder engagement — Some jurisdictions used data to inform effective dialogue among community stakeholders including the landlord community, tenant advocates, agencies in the continuum of care, and other local stakeholders.
  • Strengthening legal and mediation services — Data demonstrated that legal aid and mediation can be as crucial as rental assistance, leading to investments in expanded legal support.
  • Improving landlord-tenant relationships — Greater transparency about rental arrears and eviction patterns has helped courts and service providers create more effective intervention strategies.

While policymakers often focus on one intervention type, such as rental assistance, evaluators advocate for a holistic “AND” approach — combining legal support, financial aid, and mediation to achieve the best outcomes.

What’s Next? Refining the Future of Eviction Diversion

Looking ahead, the focus will be on:

  • Refining data collection practices — Enhancing the consistency and efficiency of data gathering tools.
  • Maintaining adaptability — Regularly reassessing what data matters most.
  • Encouraging a comprehensive approach to eviction prevention — Strengthening connections between legal and social support services.
  • Using data to inform and advocate for policy changes — Ensuring decision-makers understand that eviction diversion is more than just financial aid — it’s a system of interventions that must work together.

By grounding policy and program design in real-world data, eviction diversion efforts can be more effective, equitable, and responsive to community needs.

As this work continues, stakeholders across courts, legal aid, housing advocacy, and the landlord community must keep asking:

What do we need to learn next? And how can we use that knowledge to prevent unnecessary evictions?

Expanding the Impact: How Federated Data Collection Can Benefit Legal Services

Beyond eviction diversion programs, federated, standardized data collection has the potential to transform other areas of legal services, such as court help centers, brief advice clinics, and nonprofit legal aid organizations. These groups often work in silos, collecting data in ways that are inconsistent across jurisdictions, making it difficult to compare outcomes, identify systemic issues, or advocate for policy changes.

By adopting a shared framework for data collection — where core metrics are standardized but allow for local customization — legal service providers could gain richer insights into who they are serving, what legal issues are most pressing, and which interventions lead to the best outcomes. For example, help centers could track common questions and barriers to accessing services, while brief advice clinics could measure the impact of legal guidance on case outcomes.

This type of data-driven coordination could help funders, policymakers, and service providers make smarter investments, target resources more effectively, and ultimately improve access to justice at scale.

Join the Access to Justice Network to learn more about projects like this one. The Network is a community of justice professionals that share best practices & spread innovations across jurisdictions.

See a webinar where Neil and his colleagues present more on evaluating the Eviction Diversion Initiative here.