A Report on an AI-Powered Intake & Screening Workflow for Legal Aid Teams
AI for Legal Help, Legal Design Lab, 2025
This report provides a write-up of the AI for Housing Legal Aid Intake & Screening class project, one track of the “AI for Legal Help” Policy Lab during the Autumn 2024 and Winter 2025 quarters. The AI for Legal Help course involved working with legal and court groups that provide legal help services to the public, to understand where responsible AI innovations might be possible, and to design and prototype initial solutions along with pilot and evaluation plans.
One of the project tracks focused on improving the workflows of legal aid teams who provide housing help, particularly their struggle with high demand from community members and a lack of clarity about whether, and how, a person can be served by the legal aid group. Between Autumn 2024 and Winter 2025, an interdisciplinary team of Stanford University students partnered with the Legal Aid Society of San Bernardino (LASSB) to understand the current design of housing intake & screening, and to propose an improved, AI-powered workflow.
This report details the problem identified by LASSB, the proposed AI-powered intake & screening workflow developed by the student team, and recommendations for future development and implementation.
We share it in the hopes that legal aid and court help center leadership might also be interested in exploring responsible AI development for intake & screening, and that funders, researchers, and technologists might collaborate on developing and testing successful solutions for this task.
Thank you to the students on this team: Favour Nerisse, Gretel Cannon, Tatiana Zhang, and other collaborators. And a big thank you to our LASSB colleagues: Greg Armstrong, Pablo Ramirez, and more.
Introduction
The Legal Aid Society of San Bernardino (LASSB) is a nonprofit law firm serving low-income residents across San Bernardino and Riverside Counties, where housing issues – especially evictions – are the most common legal problems facing the community. Like many legal aid organizations, LASSB operates under severe resource constraints and high demand.

In the first half of 2024 alone, LASSB assisted over 1,200 households (3,261 individuals) with eviction prevention and landlord-tenant support. Yet many more people seek help than LASSB can serve, and those who do seek help often face barriers like long hotline wait times or lack of transportation to clinics. These challenges make the intake process – the initial screening and information-gathering when a client asks for help – a critical bottleneck. If clients cannot get through intake or are screened out improperly, they effectively have no access to justice.
Against this backdrop, LASSB partnered with a team of Stanford students in the AI for Legal Help practicum to explore an AI-based solution. The task selected was housing legal intake: using an AI “Intake Agent” to streamline eligibility screening and initial fact-gathering for clients with housing issues (especially evictions). The proposed solution was a chatbot-style AI assistant that could interview applicants about their legal problem and situation, apply LASSB’s intake criteria, and produce a summary for legal aid staff. By handling routine, high-volume intake questions, the AI agent aimed to reduce client wait times and expand LASSB’s reach to those who can’t easily come in or call during business hours. The students planned a phased evaluation and implementation: first prototyping the agent with sample data, then testing its accuracy and safety with LASSB staff, before moving toward a limited pilot deployment. This report details the development of that prototype AI Intake Agent across the Autumn and Winter quarters, including the use case rationale, current vs. future workflow, technical design, evaluation findings, and recommendations for next steps.
1: The Use Case – AI-Assisted Housing Intake

Defining the Use Case of Intake & Screening
The project focused on legal intake for housing legal help, specifically tenants seeking assistance with eviction or unsafe housing. Intake is the process by which legal aid determines who qualifies for help and gathers the facts of their case. For a tenant facing eviction, this means answering questions about income, household, and the eviction situation, so the agency can decide if the case falls within their scope (for example, within income limits and legal priorities).
Intake is a natural first use case because it is a gateway to justice: a short phone interview or online form is often all that stands between a person in crisis and the help they need. Yet many people never complete this step due to practical barriers (long hold times, lack of childcare or transportation, fear or embarrassment).

By improving intake, LASSB could assist more people early, preventing more evictions or legal problems from escalating.
Why LASSB Chose Housing Intake
LASSB and the student team selected the housing intake scenario for several reasons. First, housing is LASSB’s highest-demand area – eviction defense was 62% of cases for a neighboring legal aid and similarly dominant for LASSB. This high volume means intake workers spend enormous time screening housing cases, and many eligible clients are turned away simply because staff can’t handle all the calls. Improving intake throughput could thus have an immediate impact. Second, housing intake involves highly repetitive and rules-based questions (e.g. income eligibility, case type triage) that are well-suited to automation. These are precisely the kind of routine, information-heavy tasks that AI can assist with at scale.
Third, an intake chatbot could increase privacy and reach: clients could complete intake online 24/7, at their own pace, without waiting on hold or revealing personal stories to a stranger right away. This could especially help those in rural areas or those uncomfortable with an in-person or phone interview. In short, housing intake was seen as a high-impact, AI-ready use case where automation might improve efficiency while preserving quality of service.
Why Intake Matters for Access to Justice
Intake may seem mundane, but it is a cornerstone of access to justice. It is the “front door” of legal aid – if the door is locked or the line too long, people simply don’t get help. Studies show that only a small fraction of people with civil legal issues ever consult a lawyer, often because they don’t recognize their problem as legal or face obstacles seeking help. Even among those who do reach out to legal aid (nearly 2 million requests in 2022), about half are turned away due to insufficient resources. Many turn-aways happen at the intake stage, when agencies must triage cases. Improving intake can thus shrink the “justice gap” by catching more issues early and providing at least some guidance to those who would otherwise get nothing.
Moreover, a well-designed intake process can empower clients – by helping them tell their story, identifying their urgent needs, and connecting them to appropriate next steps. On the flip side, a bad intake experience (confusing questions, long delays, or perfunctory denials) can discourage people from pursuing their rights, effectively denying justice. By focusing on intake, the project aimed to make the path to legal help smoother and more equitable.
Why AI Is a Good Fit for Housing Intake
Legal intake involves high volume, repetitive Q&A, and standard decision rules, which are conditions where AI can excel. A large language model (LLM) can be programmed to ask the same questions an intake worker would, in a conversational manner, and interpret the answers.
Because LLMs can process natural language, an AI agent can understand a client’s narrative of their housing problem and spot relevant details or legal issues (e.g. identifying an illegal lockout vs. a formal eviction) to ask appropriate follow-ups. This dynamic questioning is something LLMs have demonstrated success in – for example, a recent experiment in Missouri showed that an LLM could generate follow-up intake questions “in real-time” based on a user’s description, like asking whether a landlord gave formal notice after a tenant said “I got kicked out.” AI can also help standardize decisions: by encoding eligibility rules into the prompt or system, it can apply the same criteria every time, potentially reducing inconsistent screening outcomes. Importantly, initial research found that GPT-4-based models could predict legal aid acceptance/rejection decisions with about 84% accuracy, and they erred on the side of caution (usually not rejecting a case unless clearly ineligible). This suggests AI intake systems can be tuned to minimize false denials, a critical requirement for fairness.
Beyond consistency and accuracy, AI offers scalability and extended reach. Once developed, an AI intake agent can handle multiple clients at once, anytime. For LASSB, this could mean a client with an eviction notice can start an intake at midnight rather than waiting anxious days for a callback. Other legal aid groups have already seen the potential: Legal Aid of North Carolina’s chatbot “LIA” has engaged in over 21,000 conversations in its first year, answering common legal questions and freeing up staff time. LASSB hopes for similar gains – the Executive Director noted plans to test AI tools to “reduce client wait times” and extend services to rural communities that in-person clinics don’t reach. Finally, an AI intake agent can offer a degree of client comfort – some individuals might prefer typing out their story to a bot rather than speaking to a person, especially on sensitive issues like domestic violence intersecting with an eviction. In summary, the volume, repetitive structure, and outreach potential of intake made it an ideal candidate for an AI solution.
2: Status Quo and Future Vision
Current Human-Led Workflow
At present, LASSB’s intake process is entirely human-driven. A typical workflow might begin with a client calling LASSB’s hotline or walking into a clinic. An intake coordinator or paralegal then screens for eligibility, asking a series of standard questions: Are you a U.S. citizen or eligible immigrant? What is your household size and income? What is your zip code or county? What type of legal issue do you have? These questions correspond to LASSB’s internal eligibility rules (for example, income below a percentage of the poverty line, residence in the service area, and case type within program priorities).

The intake worker usually follows a scripted guide – these guides can run 7+ pages of rules and flowcharts for different scenarios. If the client passes initial screening, the staffer moves on to information-gathering: taking down details of the legal problem. In a housing case, they might ask: “When did you receive the eviction notice? Did you already go to court? How many people live in the unit? Do you have any disabilities or special circumstances?” This helps determine the urgency and possible defenses (for instance, disability could mean a reasonable accommodation letter might help, or a lockout without court order is illegal). The intake worker must also gauge if the case fits LASSB’s current priorities or grant requirements – a subtle judgment call often based on experience.
Once information is collected, the case is handed off internally: if it’s straightforward and within scope, they may schedule the client for a legal clinic or assign a staff attorney for advice. If it’s a tougher or out-of-scope case, the client might be given a referral to another agency or a “brief advice” appointment where an attorney only gives counsel and not full representation. In some instances, there are multiple handoffs – for example, the person who does the phone screening might not be the one who ultimately provides the legal advice, requiring good note-taking and case summaries.
User Personas in the Workflow
The team crafted sample user and staff personas of the people who would interact with the new workflow and AI agent.







Pain Points in the Status Quo
This human-centric process has several pain points identified by LASSB and the student team.
First, it’s slow and resource-intensive. Clients can wait an hour or more on hold before even speaking to an intake worker during peak times, such as when an eviction moratorium change causes a surge in calls. Staff capacity is limited – a single intake worker can only handle one client at a time, and each interview might take 20–30 minutes. If the client is ultimately ineligible, that time is effectively “wasted” time that could have been spent on an eligible client. The sheer volume means many callers never get through at all.
Second, the complexity of rules can lead to inconsistent or suboptimal outcomes. Intake staff have to juggle 30+ eligibility rules, which can change with funding or policy shifts. Important details might be missed or misapplied; for example, a novice staffer might turn away a case that seems outside scope but actually fits an exception. Indeed, variability in intake decisions was a known issue – one research project found that LLMs sometimes caught errors made by human screeners (e.g., the AI recognized a case was eligible when a human mistakenly marked it as not).
Third, the process can be stressful for clients. Explaining one’s predicament (like why rent is behind) to a stranger can be intimidating. Clients in crisis might forget to mention key facts or have trouble understanding the questions. If a client has trauma (such as a domestic violence survivor facing eviction due to abuse), a blunt interview can inadvertently re-traumatize them. LASSB intake staff are trained to be sensitive, but in the rush of high volume, the experience may still feel hurried or impersonal.
Finally, timing and access are issues. Intake typically happens during business hours via phone or at specific clinic times. People who work, lack a phone, or have disabilities may struggle to engage through those channels. Language barriers can also be an issue; while LASSB offers services in Spanish and other languages, matching bilingual staff to every call is challenging. All these pain points underscore a need for a more efficient, user-friendly intake system.
Envisioned Human-AI Workflow
In the future-state vision, LASSB’s intake would be a human-AI partnership, blending automation with human judgment. The envisioned workflow goes as follows: A client in need of housing help would first interact with an AI Intake Agent, likely through a web chat interface (or possibly via a self-help kiosk or mobile app).

The AI agent would greet the user with a friendly introduction (making clear it’s an automated assistant) and guide them through the eligibility questions – e.g., asking for their income range, household size, and problem category. These could even be answered via simple buttons or quick replies to make it easy. The agent would use these answers to do an initial screening (following the same rules staff use). If clearly ineligible (for instance, the person lives outside LASSB’s service counties), the agent would not simply turn them away. Instead, it might gently inform them that LASSB likely cannot assist directly and provide a referral link or information for the appropriate jurisdiction. (Crucially, per LASSB’s guidance, the AI would err on inclusion – if unsure, it would mark the case for human review rather than issuing a flat denial.)
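To make the screening step concrete, here is a minimal sketch of how such a rule-based pre-screen might work while erring toward human review. The counties, income figures, and field names are illustrative assumptions, not LASSB’s actual eligibility criteria.

```python
from dataclasses import dataclass

# Illustrative only: these counties, thresholds, and field names are
# placeholders, not LASSB's actual eligibility rules.
SERVICE_COUNTIES = {"San Bernardino", "Riverside"}
BASE_MONTHLY_INCOME_LIMIT = 2_000  # hypothetical cutoff for a one-person household
PER_PERSON_INCREMENT = 500         # hypothetical adjustment per additional household member


@dataclass
class PreScreenAnswers:
    county: str
    monthly_income: float
    household_size: int
    issue_type: str  # e.g. "eviction", "lockout", "repairs"


def rule_based_prescreen(a: PreScreenAnswers) -> str:
    """Apply only the clear-cut rules; anything uncertain goes to a human.

    Returns "likely_eligible", "refer_out", or "needs_human_review".
    The agent never issues a flat denial on its own.
    """
    if a.county not in SERVICE_COUNTIES:
        # Outside the service area: offer a referral, not a rejection.
        return "refer_out"
    limit = BASE_MONTHLY_INCOME_LIMIT + PER_PERSON_INCREMENT * max(a.household_size - 1, 0)
    if a.monthly_income > limit:
        # Exceptions (e.g. disability) may apply, so err on inclusion.
        return "needs_human_review"
    if a.issue_type not in {"eviction", "lockout", "repairs"}:
        return "needs_human_review"
    return "likely_eligible"
```

Anything returning "refer_out" or "needs_human_review" would be routed to staff with an explanation, rather than presented to the client as a final decision.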
For those who pass the basic criteria, the AI would proceed to collect case facts: “Please describe what’s happening with your housing situation.” As the user writes or speaks (in a typed chat or possibly voice in the future), the AI will parse the narrative and ask smart follow-ups. For example, if the client says “I’m being evicted for not paying rent,” the AI might follow up: “Have you received court papers (an unlawful detainer lawsuit) from your landlord, or just a pay-or-quit notice?” – aiming to distinguish a looming eviction from an active court case. This dynamic Q&A continues until the AI has enough detail to fill out an intake template (or until it senses diminishing returns from more questions). The conversation is designed to feel like a natural interview with empathy and clarity.

After gathering info, the handoff to humans occurs. The AI will compile a summary of the intake: key facts like names, important dates (e.g., eviction hearing date if any), and the client’s stated goals or concerns. It may also tentatively flag certain legal issues or urgency indicators – for instance, “Client might qualify for a disability accommodation defense” or “Lockout situation – urgent” – based on what it learned. This summary and the raw Q&A transcript are then forwarded to LASSB’s intake staff or attorneys. A human will review the package, double-check eligibility (the AI’s work is a recommendation, not final), and then follow up with the client. In some cases, the AI might be able to immediately route the client: for example, scheduling them for the next eviction clinic or providing a link to self-help resources while they wait.
But major decisions, like accepting the case for full representation or giving legal advice, remain with human professionals. The human staff thus step in at the “decision” stage with a lot of the grunt work already done. They can spend their time verifying critical details and providing counsel, rather than laboriously collecting background info. This hybrid workflow means clients get faster initial engagement (potentially instantaneous via AI, instead of waiting days for a call) and staff time is used more efficiently where their expertise is truly needed.

Feedback-Shaped Vision
The envisioned workflow was refined through feedback from LASSB stakeholders and experts during the project. Early on, LASSB’s attorneys emphasized that high-stakes decisions must remain human – for instance, deciding someone is ineligible or giving them legal advice about what to do would require a person. This feedback led the team to build guardrails so the AI does not give definitive legal conclusions or turn anyone away without human oversight. Another piece of feedback was about tone and trauma-informed practice. LASSB staff noted that many clients are distressed; a cold or robotic interview could alienate them. In response, the team made the AI’s language extra supportive and user-friendly, adding polite affirmations (“Thank you for sharing that information”) and apologies (“I’m sorry you’re dealing with this”) where appropriate.
They also ensured the AI would ask for sensitive details in a careful way and only if necessary. For example, rather than immediately asking “How much is your income?” which might feel intrusive, the AI might first explain “We ask income because we have to confirm eligibility – roughly what is your monthly income?” to give context. The team also got input on workflow integration – intake staff wanted the AI system to feed into their existing case management software (LegalServer) so that there’s no duplication of data entry. This shaped the plan for implementation (i.e., designing the output in a format that can be easily transferred). Finally, feedback from technologists and the class instructors encouraged the use of a combined approach (rules + AI). This meant not relying on the AI alone to figure out eligibility from scratch, but to use simple rule-based checks for clear-cut criteria (citizenship, income threshold) and let the AI focus on understanding the narrative and generating follow-up questions.
This hybrid approach was validated by outside research as well. All of these inputs helped refine the future workflow into one that is practical, safe, and aligned with LASSB’s needs: AI handles the heavy lifting of asking and recording, while humans handle the nuanced judgment calls and personal touch.
3: Prototyping and Technical Work
Initial Concepts from Autumn Quarter
During the Autumn 2024 quarter, the student team explored the problem space and brainstormed possible AI interventions for LASSB. The partner had come with a range of ideas, including using AI to assist with emergency eviction filings. One early concept was an AI tool to help tenants draft a “motion to set aside” a default eviction judgment – essentially, a last-minute court filing to stop a lockout. This is a high-impact task (it can literally keep someone housed), but also high-risk and time-sensitive. Through discussions with LASSB, the team realized that automating such a critical legal document might be too ambitious as a first step – errors or bad advice in that context could have severe consequences.
Moreover, to draft a motion, the AI would still need a solid intake of facts to base it on. This insight refocused the team on the intake stage as the foundation. Another concept floated was an AI that could analyze a tenant’s story to spot legal defenses (for example, identifying if the landlord failed to make repairs as a defense to nonpayment). While appealing, this again raised the concern of false negatives (what if the AI missed a valid defense?) and overlapped with legal advice. Feedback from course mentors and LASSB steered the team toward a more contained use case: improving the intake interview itself.
By the end of Autumn quarter, the students presented a concept for an AI intake chatbot that would ask clients the right questions and produce an intake summary for staff. The concept kept human review in the loop, aligning with the consensus that AI should support, not replace, the expert judgment of LASSB’s legal team.
Revised Scope in Winter
Going into Winter quarter, the project’s scope was refined and solidified. The team committed to a limited use case – the AI would handle initial intake for housing matters only, and it would not make any final eligibility determinations or provide legal advice. All high-stakes decisions were deferred to staff. For example, rather than programming the AI to tell a client “You are over income, we cannot help,” the AI would instead flag the issue for a human to confirm and follow up with a personalized referral if needed. Likewise, the AI would not tell a client “You have a great defense, here’s what to do” – instead, it might say, “Thank you, someone from our office will review this information and discuss next steps with you.” By narrowing the scope to fact-gathering and preliminary triage, the team could focus on making the AI excellent at those tasks, while minimizing ethical risks. They also limited the domain to housing (evictions, landlord/tenant issues) rather than trying to cover every legal issue LASSB handles. This allowed the prototype to be more finely tuned with housing-specific terminology and questions. The Winter quarter also shifted toward implementation details – deciding on the tech stack and data inputs – now that the “what” was determined. The result was a clear mandate: build a prototype AI intake agent for housing that asks the right questions, captures the necessary data, and hands off to humans appropriately.

Prototype Development Details
The team developed the prototype using a combination of Google’s Vertex AI platform and custom scripting. Vertex AI was chosen in part for its enterprise-grade security (important for client data) and its support for large language model deployment. Using Vertex AI’s generative AI tools, the students configured a chatbot with a predefined prompt that established the AI’s role and instructions. For example, the system prompt instructed: “You are an intake assistant for a legal aid organization. Your job is to collect information from the client about their housing issue, while being polite, patient, and thorough. You do not give legal advice or make final decisions. If the user asks for advice or a decision, you should defer and explain a human will help with that.” This kind of prompt served as a guardrail for the AI’s behavior.
They also input a structured intake script derived from LASSB’s actual intake checklist. This script included key questions (citizenship, income, etc.) and conditional logic – for instance, if the client indicated a domestic violence issue tied to housing, the AI should ask a few DV-related questions (given LASSB has special protocols for DV survivors). Some of this logic was handled by embedding cues in the prompt like: “If the client mentions domestic violence, express empathy and ensure they are safe, then ask if they have a restraining order or need emergency assistance.” The team had to balance not making the AI too rigidly scripted (losing the flexibility of natural conversation) with not leaving it totally open-ended (which could lead to random or irrelevant questions). They achieved this by a hybrid approach: a few initial questions were fixed and rule-based (using Vertex AI’s dialogue flow control), then the narrative part used the LLM’s generative ability to ask appropriate follow-ups.
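A minimal sketch of how such a chatbot might be configured with the Vertex AI Python SDK appears below. The project ID, model name, and prompt wording are placeholders, and the prototype’s dialogue-flow controls are not reproduced; this illustrates the pattern rather than the team’s exact implementation.

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

# Placeholders: substitute a real Google Cloud project and region.
vertexai.init(project="your-gcp-project", location="us-central1")

SYSTEM_PROMPT = (
    "You are an intake assistant for a legal aid organization. Collect information "
    "about the client's housing issue politely, patiently, and thoroughly. Do not "
    "give legal advice or make final decisions; if asked, explain that a human will "
    "help with that. If the client mentions domestic violence, express empathy, ask "
    "whether they are safe, and ask if they have a restraining order or need "
    "emergency assistance."
)

model = GenerativeModel(
    "gemini-1.5-pro",  # assumed model name; the prototype may have used a different Vertex model
    system_instruction=SYSTEM_PROMPT,
    generation_config=GenerationConfig(temperature=0.2),  # low temperature for consistent questioning
)

chat = model.start_chat()
reply = chat.send_message("I'm being evicted for not paying rent.")
print(reply.text)  # expect an empathetic follow-up, e.g. about court papers or a pay-or-quit notice
```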
The sample data used to develop and test the bot included a set of hypothetical client scenarios. The students wrote out example intakes (based on real patterns LASSB described) – e.g., “Client is a single mother behind 2 months rent after losing job; received 3-day notice; has an eviction hearing in 2 weeks; also mentions apartment has mold”. They fed these scenarios to the chatbot during development to see how it responded. This helped them identify gaps – for example, early versions of the bot forgot to ask whether the client had received court papers, and sometimes it didn’t ask about deadlines like a hearing date. Each iteration, they refined the prompt or added guidance until the bot consistently covered those crucial points.
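One lightweight way to catch gaps like the missed court-papers question is to replay each written scenario and check the resulting transcript for required topics. The sketch below shows the idea; the keyword lists and the commented-out `run_intake_chat` helper are hypothetical stand-ins for the team’s actual test setup.

```python
# Hypothetical coverage check over a test transcript. run_intake_chat() is a
# stand-in for however a written scenario is replayed against the chatbot;
# it is assumed to return the bot's side of the conversation as one string.

REQUIRED_TOPICS = {
    "court papers": ["court papers", "unlawful detainer", "summons"],
    "hearing/deadline": ["hearing", "court date", "deadline"],
    "notice": ["notice", "pay or quit"],
    "income": ["income"],
}


def missing_topics(transcript: str) -> list[str]:
    """Return the required topics the bot never asked about in this transcript."""
    text = transcript.lower()
    return [
        topic
        for topic, keywords in REQUIRED_TOPICS.items()
        if not any(keyword in text for keyword in keywords)
    ]


# Example usage with the hypothetical scenario runner:
# transcript = run_intake_chat(
#     "Single mother behind 2 months rent after losing job; 3-day notice; hearing in 2 weeks; mold."
# )
# print(missing_topics(transcript))  # e.g. ["hearing/deadline"] flags a gap to fix in the prompt
```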
Key Design Decisions
A number of design decisions were made to ensure the AI agent was effective and aligned with LASSB’s values.

Trauma-Informed Questioning
The bot’s dialogue was crafted to be empathetic and empowering. Instead of bluntly asking “Why didn’t you pay your rent?,” it would use a non-judgmental tone: “Can you share a bit about why you fell behind on rent? (For example, loss of income, unexpected expenses, etc.) This helps us understand your situation.”
The AI was also set to avoid repetitive pressing on distressing details. If a client had already said plenty about a conflict with their landlord, the AI would acknowledge that (“Thank you, I understand that must be very stressful”) and not re-ask the same thing just to fill a form. These choices were informed by trauma-informed lawyering principles LASSB adheres to, aiming to make clients feel heard and not blamed.
Tone and Language
The AI speaks in plain, layperson’s language, not legalese. Internal rules like “FPI at 125% for XYZ funding” were translated into simple terms or hidden from the user. For instance, instead of asking “Is your income under 125% of the federal poverty guidelines?” the bot asks “Do you mind sharing your monthly income (approximately)? We have income limits to determine eligibility.” It also explains why it’s asking things, to build trust. The tone is conversational but professional – akin to a friendly paralegal.
The team included some small talk elements at the start (“I’m here to help you with your housing issue. I will ask some questions to understand your situation.”) to put users at ease. Importantly, the bot never pretends to be a lawyer or a human; it was transparent that it’s a virtual assistant helping gather info for the legal aid.
Guardrails
Several guardrails were programmed to keep the AI on track. A major one was a do-not-do list in the prompt: do not provide legal advice, do not make guarantees, do not deviate into unrelated topics even if user goes off-track. If the user asked a legal question (“What should I do about X?”), the bot was instructed to reply with something like: “I’m not able to give legal advice, but I will record your question for our attorneys. Let’s focus on getting the details of your situation, and our team will advise you soon.”
Another guardrail was content moderation – e.g., if a user described intentions of self-harm or violence, the bot would give a compassionate response and alert a human immediately. Vertex AI’s content filter was leveraged to catch extreme situations. Additionally, the bot was prevented from asking for information that LASSB staff said they never need at intake (to avoid over-intrusive behavior). For example, it wouldn’t ask for Social Security Number or any passwords, etc., which also helps with security.
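A simple post-processing layer can back up these prompt-level guardrails. The sketch below is illustrative only: the phrase lists are deliberately simplified, and the `notify_staff` hook is a placeholder for however LASSB would alert a human on call.

```python
# Illustrative guardrail layer; the phrase lists are deliberately simplified and
# notify_staff() is a placeholder for however LASSB would alert a human on call.

ADVICE_PATTERNS = ["what should i do", "can i win", "do i have a case"]
SAFETY_PATTERNS = ["hurt myself", "kill myself", "end it all"]

DEFER_MESSAGE = (
    "I'm not able to give legal advice, but I will record your question for our "
    "attorneys. Let's focus on getting the details of your situation, and our team "
    "will advise you soon."
)


def notify_staff(reason: str, message: str) -> None:
    # Placeholder: in deployment this might page an on-call staff member or
    # create an urgent task in the case management system.
    print(f"[ALERT] {reason}: {message}")


def apply_guardrails(user_message: str) -> str | None:
    """Return an override reply if a guardrail triggers; otherwise None (let the LLM respond)."""
    text = user_message.lower()
    if any(p in text for p in SAFETY_PATTERNS):
        notify_staff("possible safety concern", user_message)
        return (
            "I'm so sorry you're going through this. I'm flagging your message so a "
            "member of our team can reach out right away. If you are in immediate "
            "danger, please call 911 or the 988 crisis line."
        )
    if any(p in text for p in ADVICE_PATTERNS):
        return DEFER_MESSAGE
    return None
```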
User Flow and Interface
The user flow was deliberately kept simple. The prototype interface (tested in a web browser) would show one question at a time, and allow the user to either type a response or select from suggested options when applicable. The design avoids giant text boxes that might overwhelm users; instead, it breaks the interview into bite-sized exchanges (a principle from online form usability).
After the last question, the bot would explicitly ask “Is there anything else you want us to know?” giving the user a chance to add details in their own words. Then the bot would confirm it has what it needs and explain the next steps: e.g., “Thank you for all this information. Our legal team will review it immediately. You should receive a call or email from us within 1 business day. If you have an urgent court date, you can also call our hotline at …” This closure message was included to ensure the user isn’t left wondering what happens next, a common complaint with some automated systems.
Risk Mitigation
The team did a review of what could go wrong: what risks of harm might an intake agent pose? They brainstormed the design, tech, and policy decisions that could mitigate each of those risks.
| Risk | Mitigation |
| --- | --- |
| Screening Agent | |
| The client is monolingual, does not understand the AI’s questions, and does not provide sufficient or correct information to the Agent. | We are working towards the Screening Agent having multilingual capabilities, particularly Spanish-language skills. |
| The client is vision or hearing impaired and the Screening Agent does not understand the client. | The Screening Agent has voice-to-text for vision impaired clients and text-based options for hearing impaired clients. We can also train the Screening Agent to produce a list of questions it did not get answers to and route those questions to a Paralegal to ask. |
| The Screening Agent does not understand the client properly and generates incorrect information. | The Screening Agent will confirm and spell back important identifying information, such as names and addresses. The Screening Agent will be programmed to route back to an intake worker or Paralegal if the AI cannot understand the client. A LASSB attorney will review and confirm any final product with the client. |
| The client is insulted or in some other way offended by the Screening Agent. | The Screening Agent’s scope is limited to the Screening Questions. It will also be trained on trauma-informed care. LASSB should also obtain the client’s consent before referring them to the Screening Agent. |
Training and Iteration
Notably, the team did not train a new machine learning model from scratch; instead they used a pre-existing LLM (from Vertex, analogous to GPT-4 or PaLM2) and focused on prompt engineering and few-shot examples to refine its performance. They created a few example dialogues as part of the prompt to show the AI what a good intake looks like. For instance, an example Q&A in the prompt might demonstrate the AI asking clarifying questions and the user responding, so the model could mimic that style.
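To illustrate the few-shot idea, an exemplar dialogue like the (made-up) one below can be appended to the system instructions so the model imitates the desired clarifying-question style.

```python
# Illustrative few-shot exemplar appended to the system instructions so the model
# imitates the desired clarifying-question style; the wording is made up, not the
# prototype's actual exemplar.
BASE_INSTRUCTIONS = "You are an intake assistant for a legal aid organization. ..."  # as sketched earlier

FEW_SHOT_EXAMPLE = """
Example intake exchange:
Client: My landlord taped a paper to my door saying I have 3 days to pay.
Assistant: I'm sorry you're dealing with this. That sounds like a pay-or-quit notice.
Have you also received any court papers, such as an unlawful detainer (eviction
lawsuit), or just the notice so far?
Client: Just the notice.
Assistant: Thank you. Roughly how much rent does the notice say is owed?
"""

SYSTEM_INSTRUCTION = BASE_INSTRUCTIONS + "\n" + FEW_SHOT_EXAMPLE
```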
The prototype’s development was highly iterative: the students would run simulated chats (playing the user role themselves or with peers) and analyze the output. When the AI did something undesirable – like asking a redundant question or missing a key fact – they would adjust the instructions or add a conditional rule. They also experimented with model parameters like temperature (choosing a relatively low temperature for more predictable, consistent questioning rather than creative, off-the-cuff responses[28][18]). Over the Winter quarter, dozens of test conversations were conducted.
Midway, they also invited LASSB staff to test the bot with sample scenarios. An intake supervisor typed in a scenario of a tenant family being evicted after one member lost a job, and based on that feedback, the team tweaked the bot to be more sensitive when asking about income (the supervisor felt the bot should explicitly mention services are free and confidential, to reassure clients as they disclose personal info). The final prototype by March 2025 was able to handle a realistic intake conversation end-to-end: from greeting to summary output.
The output was formatted as a structured text report (with sections for client info, issue summary, and any urgent flags) that a human could quickly read. The technical work thus culminated in a working demo of the AI intake agent ready for evaluation.
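The structured report described above could be rendered from a small data structure along the lines of the sketch below; the field names are illustrative, and the section headings simply mirror those mentioned in the text (client info, issue summary, urgent flags).

```python
from dataclasses import dataclass, field


@dataclass
class IntakeSummary:
    # Field names are illustrative; the prototype's exact template may differ.
    client_name: str
    county: str
    issue_summary: str
    key_dates: dict[str, str] = field(default_factory=dict)
    urgent_flags: list[str] = field(default_factory=list)

    def to_report(self) -> str:
        """Render the structured text report a staff member would review."""
        lines = [
            "=== CLIENT INFO ===",
            f"Name: {self.client_name}",
            f"County: {self.county}",
            "=== ISSUE SUMMARY ===",
            self.issue_summary,
            "=== KEY DATES ===",
            *[f"{label}: {date}" for label, date in self.key_dates.items()],
            "=== URGENT FLAGS ===",
            *(self.urgent_flags or ["None noted"]),
        ]
        return "\n".join(lines)
```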
4: Evaluation and Lessons Learned
Evaluating Quality and Usefulness
The team approached evaluation on multiple dimensions – accuracy of the intake, usefulness to staff, user experience, and safety.
First, the team created a quality rubric describing what ‘good’ or ‘bad’ performance would look like.
Good-Bad Rubric on Screening Performance
A successful agent will be able to obtain answers from the client for all relevant Screening questions in the format best suited to the client (i.e., verbally or in writing, and in English or Spanish). A successful agent will also be able to ask some open-ended questions about the client’s legal problem, saving the time the Housing Attorney and Clinic Attorney would otherwise spend discussing the client’s legal problem. Ultimately, a successful AI Screening agent will be able to perform both pre-screening and Screening for clients.
✅A good Screening agent will be able to accurately record all of the client’s information and ensure that there are no mistakes in spelling or other details.
❌A bad Screening agent would produce incorrect information and misunderstand the clients. A bad solution would require the LASSB users to cross-check and amend lots of the information with the client.
✅A good Screening agent will be user-friendly for clients, in a format already familiar to them, such as text or a phone call.
❌ A bad Screening agent would require clients, many of whom may be unsophisticated technology users, to use unfamiliar systems that are difficult to use.
✅A good Screening agent would be multilingual.
❌ A bad Screening agent would only understand clients who spoke very clearly and in a particular format.
✅ A good Screening agent would be accessible for clients with disabilities, including vision or audio impaired clients.
❌A bad Screening agent would not be accessible to clients with disabilities. A bad solution would not be accessible on a client’s phone.
✅A good Screening agent will respond to clients in a trauma-informed manner. A good Screening agent will appear kind and make clients feel comfortable.
❌A bad Screening agent would offend the clients and make the clients reluctant to answer the questions.
✅A good Screening agent will produce a transcript of the interview that enables the LASSB attorneys and paralegals to understand the client’s situation efficiently. To do this, the agent could produce a summary of the key points from the Screening questions. It is also important the transcript is searchable and easy to navigate so that the LASSB attorneys can easily locate information.
❌A bad Screening agent would produce a transcript that is difficult to navigate and identify key information. For example, it may produce a large PDF that is not searchable and not provide any easy way to find the responses to the questions.
✅A good Screening agent need not get through the questions as quickly as possible, but it must be able to redirect the client to the questions to ensure that the client answers all the necessary questions.
❌A bad Screening agent would get distracted from the clients’ responses and not obtain answers to all the questions.
In summary, the main metrics against which the Screening Agent should be measured include the following (a measurement sketch follows this list):
- Accuracy: whether the agent matches human performance or produces errors in fewer cases;
- User satisfaction: how happy the client & LASSB personnel using the agent are; and
- Efficiency: how much time the agent takes to obtain answers to all 114 pre-screening and Screening questions.
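The sketch below illustrates how the first and third metrics might be scored against expert-labeled test scenarios; the scenario entries and timings are placeholders, not the project’s actual data.

```python
# Illustrative scoring against expert labels; the entries below are placeholders,
# not the project's actual 16 test scenarios. Each pairs the expert's call with
# the AI's recommendation ("eligible", "ineligible", or "needs_human_review").
scenarios = [
    {"expert": "eligible", "ai": "eligible", "minutes": 7},
    {"expert": "ineligible", "ai": "needs_human_review", "minutes": 6},
    {"expert": "eligible", "ai": "needs_human_review", "minutes": 9},
    # ... the remaining test scenarios would go here
]

agreement = sum(1 for s in scenarios if s["ai"] == s["expert"])
# Critical error: the AI rejects someone the expert would have accepted.
false_denials = sum(
    1 for s in scenarios if s["expert"] == "eligible" and s["ai"] == "ineligible"
)
avg_minutes = sum(s["minutes"] for s in scenarios) / len(scenarios)

print(f"Alignment with experts: {agreement / len(scenarios):.0%}")
print(f"False denials (must be zero): {false_denials}")
print(f"Average interview time: {avg_minutes:.1f} minutes")
```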
Testing the Prototype
To test accuracy, they compared the AI’s screening and issue-spotting to that of human experts. They prepared 16 sample intake scenarios (inspired by real cases, similar to what other researchers have done) and for each scenario they had a law student or attorney determine the expected “intake outcome” (e.g., eligible vs. not eligible, and key issues identified). Then they ran each scenario through the AI chatbot and examined the results. The encouraging finding was that the AI correctly identified eligibility in the vast majority of cases, and when uncertain, it appropriately refrained from a definitive judgment – often saying a human would review. For example, in a scenario where the client’s income was slightly above the normal cutoff but they had a disability (which could qualify them under an exception), the AI noted the income issue but did not reject the case; it tagged it for staff review. This behavior aligned with the design goal of avoiding false negatives.
In fact, across the test scenarios, the AI never outright “turned away” an eligible client. At worst, it sometimes told an ineligible client that it “might not” qualify and a human would confirm – a conservative approach that errs on inclusion. In terms of issue-spotting, the AI’s performance was good but not flawless. It correctly zeroed in on the main legal issue (e.g., nonpayment eviction, illegal lockout, landlord harassment) in nearly all cases. In a few complex scenarios, it missed secondary issues – for instance, a scenario involved both eviction and a housing code violation (mold), and the AI summary focused on the eviction but didn’t highlight the possible habitability claim. When attorneys reviewed this, they noted a human intake worker likely would have flagged the mold issue for potential affirmative claims. This indicated a learning: the AI might need further training or prompts to capture all legal issues, not just the primary one.
To gauge usefulness and usability, the team turned to qualitative feedback. They had LASSB intake staff and a couple of volunteer testers act as users in mock intake interviews with the AI. Afterward, they surveyed them on the experience. The intake staff’s perspective was crucial: they reviewed the AI-generated summaries alongside what typical human intake notes would look like. The staff generally found the AI summaries usable and in many cases more structured than human notes. The AI provided a coherent narrative of the problem and neatly listed relevant facts (dates, amounts, etc.), which some staff said could save them a few minutes per case in writing up memos. One intake coordinator commented that the AI “asked all the questions I would have asked” in a standard tenancy termination case – a positive sign of completeness.
On the client side, volunteer testers noted that the AI was understandable and polite, though a few thought it was a bit “formal” in phrasing. This might reflect the fine line between professional and conversational tone – a point for possible adjustment. Importantly, testers reported that they “would be comfortable using this tool” and would trust that their information gets to a real lawyer. The presence of clear next-step messaging (that staff would follow up) seemed to reassure users that they weren’t just shouting into a void. The team also looked at efficiency metrics: In simulation, the AI interview took about 5–10 minutes of user time on average, compared to ~15 minutes for a typical phone intake. Of course, these were simulated users; real clients might take longer to type or might need more clarification. But it suggested the AI could potentially cut intake time by around 30-50% for straightforward cases, a significant efficiency gain.
Benchmarks for AI Performance
In designing evaluation, the team drew on emerging benchmarks in the AI & justice field. They set some target benchmarks such as:
- Zero critical errors (no client who should be helped is mistakenly rejected by the AI, and no obviously wrong information given),
- at least 80% alignment with human experts on identifying case eligibility (they achieved ~90% in testing), and
- high user satisfaction (measured informally via feedback forms).
For safety, a benchmark was that the AI should trigger human intervention in 100% of cases where certain red flags appear (like mention of self-harm or urgent safety concerns). In test runs, there was one scenario where a client said something like “I have nowhere to go, I’m so desperate I’m thinking of doing something drastic.”
The AI appropriately responded with empathy and indicated that it would notify the team for immediate assistance – meeting the safety benchmark. Another benchmark was privacy and confidentiality – the team checked that the AI was not inadvertently storing data outside approved channels. All test data was kept in a sandbox environment and they planned that any actual deployment would comply with confidentiality policies (e.g., not retaining chat transcripts longer than needed and storing them in LASSB’s secure system).
Feedback from Attorneys and Technologists
The prototype was demonstrated to a group of LASSB attorneys, intake staff, and a few technology advisors in late Winter quarter. The attorneys provided candid feedback. One housing lawyer was initially skeptical – concerned an AI might miss the human nuance – but after seeing the demo, they remarked that “the output is like what I’d expect from a well-trained intern or paralegal.” They appreciated that the AI didn’t attempt to solve the case but simply gathered information systematically. Another attorney asked about bias – whether the AI might treat clients differently based on how they talk (for instance, if a client is less articulate, would the AI misunderstand?).
In response, the team showed how the AI asks gentle clarifying questions if it’s unsure, and they discussed plans for continuous monitoring to catch any biased outcomes. The intake staff reiterated that the tool could be very helpful as an initial filter, especially during surges. They did voice a concern: “How do we ensure the client’s story is accurately understood?” This led to a suggestion that in the pilot phase, staff double-check key facts with the client (“The bot noted you got a 3-day notice on Jan 1, is that correct?”) to verify nothing was lost in translation.
Technologists (including advisors from the Stanford Legal Design Lab) gave feedback on the technical approach. They supported the use of rule-based gating combined with LLM follow-ups, noting that other projects (like the Missouri intake experiment) have found success with that hybrid model. They also advised to keep the model updated with policy changes – e.g., if income thresholds or laws change, those need to be reflected in the AI’s knowledge promptly, which is more of an operational challenge than a technical one. Overall, the feedback from all sides was that the prototype showed real promise, provided it’s implemented carefully. Stakeholders were excited that it could improve capacity, but they stressed that proper oversight and iterative improvement would be key before using it live with vulnerable clients.
What Worked Well in Testing
Several aspects of the project went well. First, the AI agent effectively mirrored the standard intake procedure, indicating that the effort to encode LASSB’s intake script was successful. It consistently asked the fundamental eligibility questions and gathered core facts without needing human prompting. This shows that a well-structured prompt and logic can guide an LLM to perform a complex multi-step task reliably.
Second, the LLM’s natural language understanding proved advantageous. It could handle varied user inputs – whether someone wrote a long story all at once or gave terse answers, the AI adapted. In one test, a user rambled about their landlord “kicking them out for no reason, changed locks, etc.” and the AI parsed that as an illegal lockout scenario and asked the right follow-up about court involvement. The ability to parse messy, real-life narratives and extract legal-relevant details is where AI shined compared to rigid forms.
Third, the tone and empathy embedded in the AI’s design appeared to resonate. Test users noted that the bot was “surprisingly caring”. This was a victory for the team’s design emphasis on trauma-informed language – it validated that an AI can be programmed to respond in a way that feels supportive (at least to some users).
Fourth, the AI’s cautious approach to eligibility (not auto-rejecting) worked as intended. In testing, whenever a scenario was borderline, the AI prompted for human review rather than making a call. This matches the desired ethical stance: no one gets thrown out by a machine’s decision alone. Finally, the process of developing the prototype fostered a lot of knowledge transfer and reflection. LASSB staff mentioned that just mapping out their intake logic for the AI helped them identify a few inefficiencies in their current process (like questions that might not be needed). So the project had a side benefit of process improvement insight for the human system too.
What Failed or Fell Short in Testing
Despite the many positives, there were also failures and limitations encountered. One issue was over-questioning. The AI sometimes asked one or two questions too many, which could test a user’s patience. For example, in a scenario where the client clearly stated “I have an eviction hearing on April 1,” an earlier version of the bot still asked “Do you know if there’s a court date set?” which was redundant. This kind of repetition, while minor, could annoy a real user. It stemmed from the AI not having a perfect memory of prior answers unless carefully constrained – a known quirk of LLMs. The team addressed some instances by refining prompts, but it’s something to watch in deployment. Another shortcoming was handling of multi-issue situations. If a client brought up multiple problems (say eviction plus a related family law issue), the AI got somewhat confused about scope. In one test, a user mentioned being evicted and also having a dispute with a roommate who is a partner – mixing housing and personal relationship issues. The AI tried to be helpful by asking about both, but that made the interview unfocused. This highlights that AI may struggle with scope management – knowing what not to delve into. A design decision for the future might be to explicitly tell the AI to stick to housing and ignore other legal problems (while perhaps flagging them for later).
Additionally, there were challenges with the AI’s legal knowledge limits. The prototype did not integrate an external legal knowledge base; it relied on the LLM’s trained knowledge (up to its cutoff date). While it generally knew common eviction terms, it might not know the latest California-specific procedural rules. For instance, if a user asked, “What is an Unlawful Detainer?” the AI provided a decent generic answer in testing, but we hadn’t formally allowed it to give legal definitions (since that edges into advice). If not carefully constrained, it might give incorrect or jurisdictionally wrong info. This is a risk the team noted: for production, one might integrate a vetted FAQ or knowledge retrieval component to ensure any legal info given is accurate and up-to-date.
We also learned that the AI could face moderation or refusal issues for certain sensitive content. As seen in other research, certain models have content filters that might refuse queries about violence or illegal activity. In our tests, when a scenario involved domestic violence, the AI handled it appropriately (did not refuse; it responded with concern and continued). But we were aware that some LLMs might balk or produce sanitised answers if a user’s description includes abuse details or strong language. Ensuring the AI remains able to discuss these issues (in a helpful way) is an ongoing concern – we might need to adjust settings or choose models that allow these conversations with proper context.
Lastly, the team encountered the mundane but important challenge of integrating with existing systems. The prototype worked in a standalone environment, but LASSB’s real intake involves LegalServer and other databases. We didn’t fully solve how to plug the AI into those systems in real-time. This is less a failure of the AI per se and more a next-step technical hurdle, but it’s worth noting: a tool is only useful if it fits into the workflow. We attempted a small integration by outputting the summary in a format similar to a LegalServer intake form, but a true integration would require more IT development.
Why These Issues Arose
Many of the shortcomings trace back to the inherent limitations of current LLM technology and the complexity of legal practice. The redundant questions happened because the AI doesn’t truly understand context like a human, it only predicts likely sequences. If not explicitly instructed, it might err on asking again to be safe. Our prompt engineering reduced but didn’t eliminate this; it’s a reminder that LLMs need carefully bounded instructions. The scope creep with multiple issues is a byproduct of the AI trying to be helpful – it sees mention of another problem and, without human judgment about relevance, it goes after it. This is where human intake workers naturally filter and focus, something an AI will do only as well as it’s told to.
Legal knowledge gaps are expected because an LLM is not a legal expert and can’t be updated like a database without re-training. We mitigated risk by not relying on it to give legal answers, but any subtle knowledge it applied (like understanding eviction procedure) comes from its general training, which might not capture local nuances. The team recognized that a retrieval-augmented approach (providing the AI with reference text like LASSB’s manual or housing law snippets) could improve factual accuracy, but that was beyond the initial prototype’s scope.
Content moderation issues arise from the AI provider’s safety guardrails – these are important to have (to avoid harmful outputs), but they can be a blunt instrument. Fine-tuning them for a legal aid context (where discussions of violence or self-harm are sometimes necessary) is tricky and likely requires collaboration with the provider or switching to a model where we have more control. The integration challenge simply comes from the fact that legal aid tech stacks were not designed with AI in mind. Systems like LegalServer are improving their API offerings, but knitting together a custom AI with legacy systems is non-trivial. This is a broader lesson: often the tech is ahead of the implementation environment in nonprofits.
Lessons on Human-AI Teaming and Client Protection
Developing this prototype yielded valuable lessons about how AI and humans can best collaborate in legal services. One clear lesson is that AI works best as a junior partner, not a solo actor. Our intake agent performed well when its role was bounded to assisting – gathering info, suggesting next steps – under human supervision. The moment we imagined expanding its role (like it drafting a motion or advising a client), the complexity and risk jumped exponentially. So, the takeaway for human-AI teaming is to start with discrete tasks that augment human work. The humans remain the decision-makers and safety net, which not only protects clients but also builds trust among staff. Initially, some LASSB staff were worried the AI might replace them or make decisions they disagreed with. By designing the system to clearly feed into the human process (rather than bypass it), we gained staff buy-in. They began to see the AI as a tool – like an efficient paralegal – rather than a threat. This cultural acceptance is crucial for any such project to succeed.
We also learned about the importance of transparency and accountability in the AI’s operation. For human team members to rely on the AI, they need to know what it asked and what the client answered. Black-box summaries aren’t enough. That’s why we ensured the full Q&A transcript is available to the staff reviewing the case. This way, if something looks off in the summary, the human can check exactly what was said. It’s a form of accountability for the AI. In fact, one attorney noted this could be an advantage: “Sometimes I wish I had a recording or transcript of the intake call to double-check details – this gives me that.” However, this raises a client protection consideration: since the AI interactions are recorded text, safeguarding that data is paramount (whereas a phone call’s content might not be recorded at all). We have to treat those chat logs as confidential client communications. This means robust data security and policies on who can access them.
From the client’s perspective, a lesson is that AI can empower clients if used correctly. Some testers said they felt more in control typing out their story versus speaking on the phone, because they could see what they wrote and edit their thoughts. The AI also never expresses shock or judgment, which some clients might prefer. However, others might find it impersonal or might struggle if they aren’t literate or tech-comfortable. So a takeaway is that AI intake should be offered as an option, not the only path. Clients should be able to choose a human interaction if they want. That choice protects client autonomy and ensures we don’t inadvertently exclude those who can’t or won’t use the technology (due to disability, language, etc.).
Finally, the project underscored that guarding against harm requires constant vigilance. We designed many protections into the system, but we know that only through real-world use will new issues emerge. One must plan to continuously monitor the AI’s outputs for any signs of bias, error, or unintended effects on clients. For example, if clients start treating the AI’s words as gospel (even though we tell them a human will follow up), we might need to reinforce disclaimers or adjust messaging. Human-AI teaming in legal aid is thus not a set-and-forget deployment; it’s an ongoing partnership where the technology must be supervised and updated by the humans running it. As one of the law students quipped, “It’s like having a really smart but somewhat unpredictable intern – you’ve got to keep an eye on them.” This captures well the role of AI: helpful, yes, but still requiring human oversight to truly protect and serve the client’s interests.
5: Recommendations and Next Steps
Immediate Next Steps for LASSB
With the prototype built and initial evaluations positive, LASSB is poised to take the next steps toward a pilot. In the near term, a key step is securing approval and support from LASSB leadership and stakeholders. This includes briefing the executive team and possibly the board about the prototype’s capabilities and limitations, to get buy-in for moving forward. (Notably, LASSB’s executive director is already enthusiastic about using AI to streamline services.)
Concurrently, LASSB should engage with its IT staff or consultants to plan integration of the AI agent with their systems. This means figuring out how the AI will receive user inquiries (e.g., via the LASSB website or a dedicated phone text line) and how the data will flow into their case management.
A concrete next step is a small-scale pilot deployment of the AI intake agent in a controlled setting. One suggestion is to start with after-hours or overflow calls: for example, when the hotline is closed, direct callers to an online chat with the AI agent as an initial intake, with clear messaging that someone will follow up next day. This would allow testing the AI with real users in a relatively low-risk context (since those clients would likely otherwise just leave a voicemail or not connect at all). Another approach is to use the AI internally first – e.g., have intake staff use the AI in parallel with their own interviewing (almost like a decision support tool) to see if it captures the same info.
LASSB should also pursue any necessary training or policy updates. Staff will need to be trained on how to review AI-collected information, and perhaps coached to not simply trust it blindly but verify critical pieces. Policies may need updating to address AI usage – for instance, updating the intake protocol manual to include procedures for AI-assisted cases.
Additionally, client consent and awareness must be addressed. A near-term task is drafting a short consent notice for clients using the AI (e.g., “You are interacting with LASSB’s virtual assistant. It will collect information that will be kept confidential and reviewed by our legal team. This assistant is not a lawyer and cannot give legal advice. By continuing you consent to this process.”). This ensures ethical transparency and could be implemented easily at the start of the chat. In summary, the immediate next steps revolve around setting up a pilot environment: getting green lights, making technical arrangements, and preparing staff and clients for the introduction of the AI intake agent.
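As a concrete illustration, a minimal sketch of how such a consent notice could gate the start of a chat session appears below. The function names are hypothetical, not part of the prototype; the message text mirrors the draft notice above.

```python
# Illustrative sketch of a consent gate shown before the AI intake chat begins.
# The message text mirrors the draft notice above; function names are hypothetical.
from typing import Callable

CONSENT_NOTICE = (
    "You are interacting with LASSB's virtual assistant. It will collect "
    "information that will be kept confidential and reviewed by our legal team. "
    "This assistant is not a lawyer and cannot give legal advice. "
    "By continuing you consent to this process."
)

def start_intake_session(send_message: Callable[[str], None],
                         wait_for_reply: Callable[[], str]) -> bool:
    """Show the consent notice and only proceed if the client agrees."""
    send_message(CONSENT_NOTICE)
    send_message("Type YES to continue, or NO to be connected with a person instead.")
    if wait_for_reply().strip().lower() in {"yes", "y"}:
        return True   # proceed to the AI-led interview
    send_message("No problem. We will have a staff member follow up with you.")
    return False      # route to the human intake queue instead
```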
Toward Pilot and Deployment
To move from prototype to a live pilot, a few things are needed.
Resource investment is one – while the prototype was built by students, sustaining and improving it will require dedicated resources. LASSB may need to seek a grant or allocate budget for an “AI Intake Pilot” project. This could fund a part-time developer or an AI service subscription (Vertex AI or another platform) and compensate staff time spent on oversight. Given the interest in legal tech innovation, LASSB might explore funding from sources like LSC’s Technology Initiative Grants or private foundations interested in access to justice tech.
Another requirement is to select the right technology stack for production. The prototype used Vertex AI; LASSB will need to decide whether to continue with it (ensuring it meets confidentiality requirements) or shift to a different solution. Some legal aids are exploring open-source models or on-premises solutions for greater control. The trade-offs (development effort vs. control) should be weighed. It might be simplest initially to use a managed service like Vertex or OpenAI’s API with a strict data use agreement (OpenAI now allows opting out of data retention, etc.).
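For illustration only, the sketch below shows roughly what a call to a managed model through the Vertex AI Python SDK could look like (this is not the prototype’s code). The project ID, region, and model name are placeholders, module paths can differ across SDK versions, and any real use would require a vetted data-use and retention agreement first.

```python
# Minimal sketch (not the prototype's code) of calling a managed model through the
# Vertex AI Python SDK. Project ID, region, and model name are placeholders, and
# module paths may differ across SDK versions; confirm data-use and retention terms
# before sending any client information.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")  # placeholder model name

def summarize_intake(transcript: str) -> str:
    """Ask the model for a short, staff-facing summary of an intake transcript."""
    prompt = (
        "Summarize this housing intake conversation for a legal aid staff member. "
        "List the client's county, household size, income, and any eviction deadline "
        "mentioned. Do not give legal advice.\n\n" + transcript
    )
    return model.generate_content(prompt).text
```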
On the integration front, LASSB should coordinate with its case management vendor (LegalServer) to integrate the intake outputs. LegalServer has an API and web intake forms; possibly the AI can populate a hidden web form with the collected data or attach a summary to the client’s record. Close collaboration with the vendor could streamline this – it could also be an opportunity for the vendor to pilot the integration, since many legal aid organizations are likely to want the same functionality.
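The general shape of such an integration might look like the sketch below: the AI-generated summary is pushed onto the client’s record over an authenticated API call and labeled as pending human review. The endpoint path and field names are placeholders, not LegalServer’s actual API; the real details would come from the vendor’s documentation.

```python
# Hypothetical integration sketch: pushing the AI-generated summary into the case
# management system over HTTPS. The endpoint URL and field names are placeholders,
# not LegalServer's actual API.
import requests

def push_intake_summary(base_url: str, api_token: str, client_id: str, summary: str) -> None:
    """POST the intake summary so it appears on the client's record for staff review."""
    resp = requests.post(
        f"{base_url}/api/intake-notes",          # placeholder endpoint
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "client_id": client_id,
            "note_type": "AI intake summary (pending human review)",
            "body": summary,
        },
        timeout=30,
    )
    resp.raise_for_status()
```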
As deployment nears, testing and monitoring protocols must be in place. For the pilot, LASSB should define how it will measure success: e.g., reduction in wait times, number of intakes successfully processed by AI, client satisfaction surveys, etc. They should schedule regular check-ins (say weekly) during the pilot to review transcripts and outcomes. Any errors or missteps the AI makes in practice should be logged and analyzed to refine the system (prompt tweaks or additional training examples). It’s also wise to have a clear fallback plan: if the AI system malfunctions or a user is unhappy with it, there must be an easy way to route them to a human immediately. For instance, a button that says “I’d like to talk to a person now” should always be available. From a policy standpoint, LASSB might also want to loop in the California State Bar or ethics bodies just to inform them of the project and ensure there are no unforeseen compliance issues. While the AI is just facilitating intake (not giving legal advice independently), being transparent with regulators can build trust and preempt concerns.
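Two of those safeguards, the always-available handoff to a person and transcript logging for weekly review, could be implemented along the lines of the sketch below. The trigger phrases, log format, and function names are assumptions for illustration, not part of the prototype.

```python
# Illustrative sketch of two pilot safeguards: an always-available "talk to a person"
# escape hatch and transcript logging for weekly staff review.
import json
from datetime import datetime, timezone

HANDOFF_PHRASES = ("talk to a person", "speak to a human", "real person")

def wants_human(message: str) -> bool:
    """Return True when the client asks to leave the AI chat for a person."""
    text = message.strip().lower()
    return any(phrase in text for phrase in HANDOFF_PHRASES)

def log_turn(session_id: str, role: str, text: str,
             path: str = "pilot_transcripts.jsonl") -> None:
    """Append one chat turn so staff can audit transcripts during weekly check-ins."""
    record = {
        "session": session_id,
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,  # "client" or "assistant"
        "text": text,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```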
Broader Lessons for Replication
The journey of building the AI Intake Agent for LASSB offers several lessons for other legal aid organizations considering similar tools:
Start Small and Specific
One lesson is to narrow the use case initially. Rather than trying to build a do-it-all legal chatbot, focus on a specific bottleneck. For us it was housing intake; for another org it might be triaging a particular clinic or automating a frequently used legal form. A well-defined scope makes the project manageable and the results measurable. It also limits the risk surface. Note that both the Missouri project’s success and ours came from targeting a concrete task (intake triage) rather than the whole legal counseling process.
Human-Centered Design is Key
Another lesson is the importance of deep collaboration with the end-users (both clients and staff). The LASSB team’s input on question phrasing, workflow, and what not to automate was invaluable. Legal aid groups should involve their intake workers, paralegals, and even clients (if possible via user testing) from day one. This ensures the AI solution actually fits into real-world practice and addresses real pain points. It’s tempting to build tech in a vacuum, but as we saw, something as nuanced as tone (“Are we sounding too formal?”) only gets addressed through human feedback. For the broader community, sharing design workbooks or guides can help – in fact, the Stanford team developed an AI pilot design workbook to aid others in scoping use cases and thinking through user personas.
Combine Rules and AI for Reliability
A clear takeaway from both our project and others in the field is that a hybrid approach yields the best results. Pure end-to-end AI (just throwing an LLM at the problem) might work 80% of the time, but the 20% of cases where it fails could be dangerous. By combining rule-based logic (for hard eligibility cutoffs or mandatory questions) with the flexible reasoning of LLMs, we got a system that was both consistent and adaptable. Legal aid orgs should consider leveraging their existing expertise (their intake manuals, decision trees) in tandem with AI, rather than assuming the AI will infer all the rules itself. This also makes the system more transparent – the rules part can be documented and audited easily.
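To make the pattern concrete, here is a minimal sketch of that hybrid structure: hard cutoffs live in plain, auditable code, and the LLM is confined to labeling the client’s free-text description. The county list, income figures, and helper names are illustrative only, not LASSB’s actual criteria.

```python
# Hybrid screening sketch: deterministic rules first, LLM only for the open-ended part.
from dataclasses import dataclass
from typing import Callable

# Example figures only; real cutoffs would come from the organization's intake criteria.
SERVED_COUNTIES = {"San Bernardino", "Riverside"}
INCOME_LIMIT_BY_HOUSEHOLD = {1: 2500, 2: 3400, 3: 4300, 4: 5200}

@dataclass
class Applicant:
    county: str
    household_size: int
    monthly_income: float
    problem_description: str

def screen(applicant: Applicant, classify_issue: Callable[[str], str]) -> dict:
    """Apply rule-based cutoffs first; use the LLM only to label the free-text issue."""
    limit = INCOME_LIMIT_BY_HOUSEHOLD.get(
        applicant.household_size, max(INCOME_LIMIT_BY_HOUSEHOLD.values())
    )
    rules = {
        "in_service_area": applicant.county in SERVED_COUNTIES,
        "income_eligible": applicant.monthly_income <= limit,
    }
    issue_type = classify_issue(applicant.problem_description)  # e.g. "eviction"
    return {
        "rules": rules,
        "issue_type": issue_type,
        # Never auto-reject: anything that fails a rule goes to a human, not to a denial.
        "route": "ai_intake" if all(rules.values()) else "human_review",
    }
```

In a real deployment, classify_issue would wrap the LLM call, and the rule table would be generated directly from the intake manual so staff can audit and update it without touching the model.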
Don’t Neglect Data Privacy and Ethics
Any org replicating this should prioritize confidentiality and client consent. Our approach was to treat AI intake data with the same confidentiality as any intake conversation. Others should do the same and ensure their AI vendors comply. This might mean negotiating a special contract or using on-prem solutions for sensitive data. Ethically, always disclose to users that they’re interacting with AI. We found users didn’t mind as long as they knew a human would be involved downstream. But failing to disclose could undermine trust severely if discovered. Additionally, groups should be wary of algorithmic bias.
Test your AI with diverse personas – different languages, education levels, etc. – to see if it performs equally well. If your client population includes non-English speakers, make multi-language support a requirement from the start (some LLMs handle multilingual intake, or you might integrate translation services).
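One lightweight way to operationalize that testing is a small persona harness along the lines sketched below. The personas and the run_intake callable are illustrative assumptions, not artifacts of the prototype.

```python
# Illustrative persona harness: run deliberately varied test inputs through the
# intake flow and flag unacceptable outcomes.
from typing import Callable

TEST_PERSONAS = [
    {"name": "Spanish-speaking tenant", "message": "Recibí un aviso de desalojo de 3 días."},
    {"name": "Low-literacy texter", "message": "landlord say i got to go friday help"},
    {"name": "Out-of-area caller", "message": "I am being evicted from my apartment in Los Angeles County."},
]

def run_persona_checks(run_intake: Callable[[str], dict]) -> list[str]:
    """Return a list of failures; an empty list means every persona was handled acceptably."""
    failures = []
    for persona in TEST_PERSONAS:
        result = run_intake(persona["message"])
        # Core fairness check: no persona should be dropped or denied without a human.
        if not result.get("route") or result["route"] == "auto_reject":
            failures.append(f"{persona['name']}: unacceptable outcome {result!r}")
    return failures
```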
Benchmark and Share Outcomes
We recommend that legal aid tech pilots establish clear benchmark metrics (like we did for accuracy and false negatives) and openly share their results. This helps the whole community learn what is acceptable performance and where the bar needs to be. As AI in legal aid is still new, a shared evidence base is forming. For example, our finding of ~90% agreement with human intake decisions and 0 false denials in testing is encouraging, but we need more data from other contexts to validate that standard. JusticeBench (or similar networks) could maintain a repository of such pilot results and even anonymized transcripts to facilitate learning. The Medium article “A Pathway to Justice: AI and the Legal Aid Intake Problem” highlights some early adopters like LANC and CARPLS, and calls for exactly this kind of knowledge sharing and collaboration. Legal aid orgs should tap into these networks – there’s an LSC-funded AI working group inviting organizations to share their experiences and tools. Replication will be faster and safer if we learn from each other.
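For teams setting up similar benchmarks, the core calculation is simple. The sketch below assumes a labeled test set in which each AI decision is paired with the human reviewer’s decision; the label strings are illustrative.

```python
# Sketch of the two benchmark numbers discussed above, computed over a labeled test
# set where each AI decision is paired with the human reviewer's decision.
def benchmark(ai_decisions: list[str], human_decisions: list[str]) -> dict:
    """Return agreement rate and the count of false denials (the number to drive to zero)."""
    assert ai_decisions and len(ai_decisions) == len(human_decisions)
    pairs = list(zip(ai_decisions, human_decisions))
    agreement = sum(a == h for a, h in pairs) / len(pairs)
    false_denials = sum(a == "ineligible" and h == "eligible" for a, h in pairs)
    return {"agreement_rate": agreement, "false_denials": false_denials, "n": len(pairs)}
```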
Policy and Regulatory Considerations
On a broader scale, the deployment of AI in legal intake raises policy questions. Organizations should stay abreast of guidance from funders and regulators. For instance, Legal Services Corporation may issue guidelines on use of AI that must be followed for funded programs. State bar ethics opinions on AI usage (especially concerning unauthorized practice of law (UPL) or competence) should be monitored.
One comforting factor in our case is that the AI is not giving legal advice, so UPL risk is low. However, if an AI incorrectly tells someone they don’t qualify and thus they don’t get help, one could argue that’s a form of harm that regulators would care about. Hence, we reiterate: keep a human in the loop, and you largely mitigate that risk. If other orgs push into AI-provided legal advice, then very careful compliance with emerging policies (and likely some form of licensed attorney oversight of the AI’s advice) will be needed. For now, focusing on intake, forms, and other non-advisory assistance is the prudent path – it’s impactful but doesn’t step hard on the third rail of legal ethics.
Maintain the Human Touch
A final recommendation for any replication is to maintain focus on the human element of access to justice. AI is a tool, not an end in itself. Its success should be measured in how it improves client outcomes and experiences, and how it enables staff and volunteers to do their jobs more effectively without burnout. In our lessons, we saw that clients still need the empathy and strategic thinking of lawyers, and lawyers still need to connect with clients. AI intake should free up time for exactly those things – more counsel and advice, more personal attention where it matters – rather than become a barrier or a cold interface that clients feel stuck with. In designing any AI system, keeping that balanced perspective is crucial. To paraphrase a theme from the AI & justice field: the goal is not to replace humans, but to remove obstacles between humans (clients and lawyers) through sensible use of technology.
Policy and Ethical Considerations
In implementing AI intake agents, legal aid organizations must navigate several policy and ethical issues:
Confidentiality & Data Security
Client communications with an AI agent are confidential and legally privileged (similar to an intake with a human). Thus, the data must be stored securely and any third-party AI service must be vetted. If using a cloud AI API, ensure it does not store or train on your data, and that communications are encrypted. Some orgs may opt for self-hosted models to have full control. Additionally, clients should be informed that their information is being collected in a digital system and assured it’s safe. This transparency aligns with ethical duties of confidentiality.
Informed Consent and Transparency
As mentioned, always let the user know they’re dealing with an AI and not a live lawyer. This can be in a welcome message or a footnote on the chat interface. Users have a right to know and to choose an alternative. Also, make it clear that the AI is not giving legal advice, to manage expectations and avoid confusion about the attorney-client relationship. Most people will understand a “virtual assistant” concept, but clarity is key to trust.
Guarding Against Improper Gatekeeping
Perhaps the biggest ethical concern internally is avoiding improper denial of service. If the AI were to mistakenly categorize someone as ineligible or as not having a viable case, and they were turned away as a result, that would be a serious justice failure. To counter this, our approach (and the one we recommend generally) is to set the AI’s threshold such that it prefers false positives to false negatives. In practice, this means any close call gets escalated to a human.
Organizations should monitor for any patterns of the AI inadvertently filtering out certain groups (e.g., if it turned out people with limited English were dropping off during AI intake, that would be unacceptable and the process must be adjusted). Having humans review at least a sample of “rejected” intakes is a good policy to ensure that no one with a meritorious case slipped through. The principle should be: AI can streamline access, but final “gatekeeping” responsibility remains with human supervisors.
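The “prefer false positives” routing rule can be expressed very simply, as in the sketch below. The confidence threshold is an assumption; the essential property is that the AI never issues a denial on its own.

```python
# Illustrative routing rule for the "prefer false positives" principle.
# The 0.9 threshold is an assumption; the AI never issues a denial on its own.
ESCALATION_THRESHOLD = 0.9

def route_intake(rule_results: dict, model_confidence: float) -> str:
    """Decide where an intake goes next; borderline or negative screens go to people."""
    passes_rules = all(rule_results.values())
    if passes_rules and model_confidence >= ESCALATION_THRESHOLD:
        return "continue_ai_intake"       # clearly in scope: keep gathering facts
    if passes_rules:
        return "human_review_close_call"  # rules pass but the model is unsure
    return "human_review_flagged"         # looks ineligible, but a human makes the call
```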
Bias and Fairness
AI systems can inadvertently perpetuate biases present in their training data. For a legal intake agent, this might manifest in how it phrases questions or how it interprets answers. For example, if a client writes in a way that the AI (trained on generic internet text) associates with untruthfulness, it might respond less helpfully. We must actively guard against such bias. That means testing the AI with diverse inputs and correcting any skewed behaviors. It might also mean fine-tuning the model on data that reflects the client population more accurately.
Ethically, a legal aid AI should be as accessible and effective for a homeless person with a smartphone as for a tech-savvy person with a laptop. Fairness also extends to disability access – e.g., ensuring the chatbot works with screen readers or that there’s a voice option for those who can’t easily type.
Accuracy and Accountability
While our intake AI isn’t providing legal advice, accuracy still matters – it must record information correctly and categorize cases correctly. Any factual errors (like mistyping a date or mixing up who is landlord vs. tenant in the summary) could have real impacts. Therefore, building in verification (like the human review stage) is necessary. If the AI were to be extended to give some legal information, then accuracy becomes even more critical; one would need rigorous validation of its outputs against current law.
Some proposals in the field include requiring AI legal tools to cite sources or provide confidence scores, but for intake, the main thing is careful quality control. On the accountability side, the organization using the AI must accept responsibility for its operation – meaning if something goes wrong, it’s on the organization, not some nebulous “computer.” This should be clear in internal policies: the AI is a tool under our supervision.
UPL and Ethical Practice
We touched on unauthorized practice of law concerns. Since our intake agent doesn’t give advice, it should not cross UPL lines. However, it’s a short step from intake to advice – for instance, if a user asks “What can I do to stop the eviction?” the AI has to hold the line and not give advice. Ensuring it consistently does so (and refers that question to a human attorney) is not just a design choice but an ethical mandate under current law. If in the future, laws or bar rules evolve to allow more automated advice, this might change. But as of now, we recommend strictly keeping AI on the “information collection and form assistance” side, not the “legal advice or counsel” side, unless a licensed attorney is reviewing everything it outputs to the client. There’s a broader policy discussion happening about how AI might be regulated in law – for instance, some have called for safe harbor rules for AI tools used by licensed legal aids under certain conditions. Legal aid organizations should stay involved in those conversations so that they can shape sensible guidelines that protect clients without stifling innovation.
Conclusion
The development of the AI Intake Agent for LASSB demonstrates both the promise and the careful planning required to integrate AI into legal services. The prototype showed that many intake tasks can be automated or augmented by AI in a way that saves time and maintains quality. At the same time, it reinforced that AI is best used as a complement to, not a replacement for, human expertise in the justice system. By sharing these findings with the broader community – funders, legal aid leaders, bar associations, and innovators – we hope to contribute to a responsible expansion of AI pilots that bridge the justice gap. The LASSB case offers a blueprint: start with a well-scoped problem, design with empathy and ethics, keep humans in the loop, and iterate based on real feedback. Following this approach, other organizations can leverage AI’s capabilities to reach more clients and deliver timely legal help, all while upholding the core values of access to justice and client protection. The path to justice can indeed be widened with AI, so long as we tread that path thoughtfully and collaboratively.