Categories
AI + Access to Justice Class Blog Current Projects

AI + Legal Help 2026 class

We are happy to announce the launch of the fourth round of our class, “AI for Legal Help.” It is cross-listed at Stanford Law School and the Design School.

Students will work with real-world, public interest legal groups to develop AI solutions in a responsible, practical way — solutions that can help scale high-need legal services.

Here is the class description:

Want to build AI that actually matters? AI for Legal Help is a two-quarter, hands-on course where law, design, computer science, and policy students team up with legal aid organizations and court self-help centers to take on one of the biggest challenges in tech today: using AI to expand access to justice.

You’ll work directly with real-world partners to uncover where AI could make legal services faster, more scalable, and more effective—while ensuring it’s safe, ethical, and grounded in the realities of public service. From mapping workflows to spotting opportunities, from creating benchmarks and datasets to designing AI “co-pilots” or system proposals, you’ll help shape the future of AI in the justice system.

Along the way, you’ll learn how to evaluate whether AI is the right fit for a task, design human–AI teams that work, build privacy-forward and trustworthy systems, and navigate the policy and change-management challenges of introducing AI into high-stakes environments.

By the end, your team will have produced a substantial, real-world deliverable—such as a UX research report, benchmark dataset, evaluation rubric, system design proposal, or prototype concept—giving you practical experience in public interest technology, AI system design, and leadership engagement. This is your chance to create AI that works for people, in practice, where it’s needed most.

Categories
AI + Access to Justice Current Projects

The Legal Help Task Taxonomy at Jurix ’25

What Legal Help Actually Requires: Building a Task Taxonomy for AI, Research, and Access to Justice

In December 2025, I presented a new piece of research at the JURIX Conference in Turin, Italy, as part of the workshop on AI, Dispute Resolution, and Access to Justice. The workshop brought together legal scholars, technologists, and practitioners from around the world to examine how artificial intelligence is already shaping legal systems—and how it should shape them in the future.

My paper focuses on a deceptively simple question: What do legal help teams and consumers actually do when trying to resolve legal problems?

This question sits at the heart of access to justice. Around the world, billions of people face legal problems without sufficient help. Courts, legal aid organizations, and community groups work tirelessly to close this gap—but the work itself is often invisible, fragmented, and poorly documented. At the same time, AI tools are rapidly being developed for legal use, often without a clear understanding of the real tasks they are meant to support.

The work I presented in Turin proposes a way forward: a Legal Help Task Taxonomy—a structured, shared framework that defines the core tasks involved in legal help delivery, across jurisdictions, problem types, and service models. (See a first version here at the JusticeBench site or at our Airtable version.)

This blog post explains why that taxonomy matters, how it was developed, and what we discussed at JURIX about making it usable and impactful—not just theoretically elegant.


Why a Task Taxonomy for Legal Help?

Legal help work is often described in broad strokes: “legal advice,” “representation,” “self-help,” or “court assistance.” But these labels obscure what actually happens on the ground.

In reality, legal help consists of dozens of discrete tasks:

  • identifying what legal issue is present in a messy life situation,
  • explaining a confusing notice or summons,
  • calculating deadlines,
  • selecting the correct form,
  • helping someone tell their story clearly,
  • preparing evidence,
  • filing documents,
  • following up to ensure nothing is missed.

Some of these tasks are done by lawyers, others by navigators, librarians, court staff, or volunteers. Many are done partly by consumers themselves. Some are repetitive and high-volume; others are complex and high-risk.

Despite this, there has never been a shared, cross-jurisdictional vocabulary for describing these tasks. This absence makes it harder to:

  • study what legal help systems actually do,
  • design technology that fits real workflows,
  • evaluate AI tools responsibly,
  • or collaborate across organizations and states.

Without task-level clarity, we end up talking past each other—using the same words to mean very different things.


How the Task Taxonomy Emerged

The Legal Help Task Taxonomy did not start as a top-down academic exercise. It emerged organically over several years of applied work with:

  • legal aid organizations,
  • court self-help centers,
  • statewide legal help websites,
  • pro bono clinics,
  • and national access-to-justice networks.

As teams tried to build AI tools, improve workflows, and evaluate outcomes, the same problem kept arising: we couldn’t clearly articulate what task a tool was actually performing.

Was a chatbot answering questions—or triaging users?
Was a form tool drafting documents—or just collecting data?
Was an AI system explaining a notice—or giving legal advice?

To address this, we began mapping tasks explicitly, using practitioner workshops, brainstorming sessions, and analysis of real workflows. Over time, patterns emerged across jurisdictions and issue areas.

The result is a taxonomy organized into seven categories of tasks, spanning the full justice journey:

  1. Getting Brief Help (e.g., legal Q&A, document explanation, issue-spotting)
  2. Providing Brief Help (e.g., guide writing, content review, translation)
  3. Service Onboarding (e.g., intake, eligibility verification, conflicts checks)
  4. Work Product (e.g., form filling, narrative drafting, evidence preparation)
  5. Case Management (e.g., scheduling, reminders, filing screening)
  6. Administration & Strategy (e.g., data extraction, grant reporting)
  7. Tech Tooling (e.g., form creation, interview design, user testing)

Each task is defined in plain language, with clear boundaries. The taxonomy is intentionally general—not tied to one legal issue or country—so that teams can collaborate on shared solutions.
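
For teams that want to work with the taxonomy programmatically, here is a minimal sketch of how the seven categories and their example tasks could be represented as structured data. This is only an illustration, not the official JusticeBench schema; the category keys and task slugs are shorthand for the examples listed above.

```python
# A hypothetical, machine-readable sketch of the Legal Help Task Taxonomy.
# Category keys and task slugs are illustrative shorthand for the examples in
# this post; the full taxonomy lives on the JusticeBench site and Airtable.
LEGAL_HELP_TASK_TAXONOMY = {
    "getting_brief_help": ["legal_qa", "document_explanation", "issue_spotting"],
    "providing_brief_help": ["guide_writing", "content_review", "translation"],
    "service_onboarding": ["intake", "eligibility_verification", "conflicts_check"],
    "work_product": ["form_filling", "narrative_drafting", "evidence_preparation"],
    "case_management": ["scheduling", "reminders", "filing_screening"],
    "administration_and_strategy": ["data_extraction", "grant_reporting"],
    "tech_tooling": ["form_creation", "interview_design", "user_testing"],
}


def find_category(task_slug: str) -> str | None:
    """Return the category a task belongs to, or None if it is not listed."""
    for category, tasks in LEGAL_HELP_TASK_TAXONOMY.items():
        if task_slug in tasks:
            return category
    return None


if __name__ == "__main__":
    print(find_category("document_explanation"))  # prints: getting_brief_help
```

A shared structure like this makes it easy for different teams to tag their tools, datasets, and evaluations with the same task labels, which is what enables cross-organization comparison.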


What We Discussed at JURIX

Presenting this work at JURIX was particularly meaningful because the audience sits at the intersection of law, AI, and knowledge representation. The discussions went beyond whether a taxonomy is useful (there was broad agreement that it is) and focused instead on how to make it actionable.

Three themes stood out.

1. Tasks as the Right Unit for AI Evaluation

One of the most productive conversations was about evaluation. Rather than asking whether an AI system is “good at legal help,” the taxonomy allows us to ask more precise questions:

  • Can this system accurately explain documents?
  • Can it safely calculate deadlines?
  • Can it help draft narratives without hallucinating facts?

This task-based framing makes it possible to benchmark AI systems honestly—recognizing that some tasks (like rewriting text) may be feasible with general-purpose models, while others (like eligibility determination or deadline calculation) require grounded, jurisdiction-specific data.
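
To make that concrete, here is a small, hypothetical sketch of a task-scoped benchmark: each test case is tagged with a task label, and scores are reported per task rather than as one overall “legal help” number. The model call, test cases, and keyword-based scoring below are placeholders, not a real evaluation protocol.

```python
# Hypothetical task-scoped benchmark: scores are reported per task so that
# weakness on one task (e.g., deadline calculation) is not hidden inside an
# overall average.
from collections import defaultdict

TEST_CASES = [
    {
        "task": "document_explanation",
        "prompt": "What does this eviction summons mean?",
        "expected_keywords": ["respond", "deadline"],
    },
    {
        "task": "deadline_calculation",
        "prompt": "I was served on 2025-03-03 and have 30 days to answer. When is my answer due?",
        "expected_keywords": ["2025-04-02"],
    },
]


def placeholder_model(prompt: str) -> str:
    """Stand-in for the real system under test (e.g., an LLM API call)."""
    return "You should respond to the summons before the deadline listed on it."


def keyword_score(answer: str, expected_keywords: list[str]) -> float:
    """Toy metric: the fraction of expected keywords that appear in the answer."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer.lower())
    return hits / len(expected_keywords)


def run_benchmark() -> dict[str, float]:
    per_task_scores = defaultdict(list)
    for case in TEST_CASES:
        answer = placeholder_model(case["prompt"])
        per_task_scores[case["task"]].append(keyword_score(answer, case["expected_keywords"]))
    return {task: sum(scores) / len(scores) for task, scores in per_task_scores.items()}


if __name__ == "__main__":
    # e.g., strong on document_explanation, failing on deadline_calculation
    print(run_benchmark())
```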

2. Usability Matters More Than Completeness

Another theme was usability. A taxonomy that is theoretically comprehensive but practically overwhelming will not be adopted.

At the workshop, we discussed:

  • staging tasks for review in manageable sections,
  • writing definitions in practitioner language,
  • allowing feedback and iteration,
  • and supporting partial adoption (teams don’t need to use every task at once).

The goal is not to impose a rigid structure, but to create a living, testable framework that practitioners recognize as reflecting their real work.

3. Interoperability and Shared Infrastructure

Finally, we discussed how a task taxonomy can serve as connective tissue between other standards—such as legal issue taxonomies, document schemas, and service directories.

By aligning tasks with standards like LIST, Akoma Ntoso, and SALI, the taxonomy can support interoperability across tools and datasets. This is especially important for AI development: shared task definitions make it easier to reuse data, compare results, and avoid duplicating effort.


What Comes Next

The taxonomy presented at JURIX is not the final word. It is a proposal—one that is now moving toward publication and broader validation.

Next steps include:

  • structured review by legal help professionals,
  • refinement based on feedback,
  • use in AI evaluation benchmarks,
  • and integration into JusticeBench as a shared research resource.

Ultimately, the aim is simple but ambitious: to make legal help work visible, describable, and improvable.

If we want AI to genuinely advance access to justice—rather than add confusion or risk—we need to start by naming the work it is meant to support. This task taxonomy is one step toward that clarity.

Categories
AI + Access to Justice Current Projects

AI+A2J 2025 Summit Takeaways

The Stanford Legal Design Lab hosted its second annual AI & Access to Justice Summit, a gathering of leaders from legal aid organizations, technology companies, academia, philanthropy, and private practice. These professionals came together to discuss the potential of generative AI and — most crucially at this moment in Autumn 2025 — to strategize about how to make AI work at scale to address the justice gap.

The summit’s mission was clear: to move beyond the hype cycle and forge a concrete path forward for a sustainable AI & A2J ecosystem across the US and beyond. The central question posed was how the legal community could work as an ecosystem to harness this technology, setting an agenda for 2, 5, and 10-year horizons to create applications, infrastructure, and new service/business models that can get more people access to justice.

The Arc of the Summit

The summit was structured over 2 days to help the diverse participants learn about AI tools, pilots, case studies, and lessons learned for legal teams — and then to give participants the opportunity to design new interventions and strategies for a stronger AI R&D ecosystem.

Day 1 was dedicated to learning and inspiration, featuring a comprehensive slate of speakers who presented hands-on demonstrations of cutting-edge AI tools, shared detailed case studies of successful pilots, and offered insights from the front lines of legal tech innovation.

Day 1 -> Day 2’s mission

Day 2 was designed to shift the focus from listening to doing, challenging attendees to synthesize the previous day’s knowledge into strategic designs, collaborative agendas, and new partnerships. This structure was designed to build a shared foundation of knowledge before embarking on the collaborative work of building the future.

The Summit began by equipping attendees with a new arsenal of technological capabilities, showcasing the tools that serve as the building blocks for this new era in justice.

Our Key AI + A2J Ecosystem Moment

The key theme of this year’s AI+A2J Summit was building a strong, coordinated R&D ecosystem, because our community of legal help providers, researchers, public interest tech-builders, and strategists is at a key moment.

It’s been over 3 years now since the launch of ChatGPT. Where are we going with AI in access to justice?

We are several years into the LLM era now — past the first wave of surprise, demos, and hype — and into the phase where real institutions are deciding what to do with these tools. People are already using AI to solve problems in their everyday lives, including legal problems, whether courts and legal aid organizations are ready or not. That means the “AI moment” is no longer hypothetical: it’s shaping expectations, workflows, and trust right now. But still many justice leaders are confused, overwhelmed, or unsure about how to get to positive impact in this new AI era.

Leaders are not sure how to make progress.

This is exactly why an AI+A2J Summit like this matters. We’re at a pivot point where the field can either coordinate and build durable public-interest infrastructure — or fragment into disconnected experiments that don’t translate into meaningful service capacity. A2J leaders are balancing urgency with caution, and the choices made in the next year or two will set patterns that could last a decade: what gets adopted, what gets regulated, what gets trusted, and what gets abandoned.

What will 2030 look like for A2J?

Some of the possible futures are rosy; others are far more devastating.

Which of these possible near futures will we have in 2030 for access to justice?

A robust, accessible marketplace of services — where everyone who has a problem with their landlord, debt collector, spouse, employer, neighbor, or government can easily get the help they need in the form they want?

Or will we have a hugely underserved public that is frustrated and angry, facing an ever-growing asymmetry of robo-filed lawsuits and relying on low-quality AI help?

What is stopping great innovation impact?

Some of the key trends that could stop our community from delivering great outcomes in the next five years include:

  • too much chilling regulation,
  • under-performing, insufficiently safety-tested solutions that lead to real harms and bad headlines,
  • not enough money flowing to build solutions, with everyone reinventing the wheel on their own and delivering fragile, costly local solutions, and
  • a failure to build substantive, meaningful solutions, focusing instead on small, peripheral tasks.

The primary barriers are not just technical — they’re operational, institutional, and human. Legal organizations need tools that are reliable enough to use with real people, real deadlines, and real consequences. But today, many pilots struggle with consistency, integration into daily workflows, and the basic “plumbing” that makes technology usable at scale: identity management, knowledge management, access controls, and clear accountability when something goes wrong.

Trust is also fragile in high-stakes settings, and the cost of a failure is unusually high. A single under-tested tool can create public harm, undermine confidence internally, and trigger an overcorrection that chills innovation. In parallel, many organizations are already stretched thin and running on complex legacy systems. Without shared standards, shared evaluation, and shared implementation support, the burden of “doing AI responsibly” becomes too heavy for individual teams to carry alone.

At the Summit, we worked on 3 different strategy levels to try to prevent these blocks from pushing us to low impact or a continued status quo.

3 Levels of Strategic Work to Set us towards a Good Ecosystem

The goal of the Summit was to get leaders from across the A2J world to clearly define 3 levels of strategy. That means going beyond the usual strategic track, which focuses only on defining the policies and tech agenda for one’s own organization.

This meant focusing on both project mode (cool ideas and use cases) and strategy mode — so we can shape where this goes, rather than react to whatever the market and technology deliver. We convened people who are already experimenting with AI in courts, legal aid, libraries, and community justice organizations, and asked them to step back and make intentional choices about what they will build, buy, govern, and measure over the next 12–24 months. The point is to move from isolated pilots to durable capacity: tools that can be trusted, maintained, and integrated into real workflows, with clear guardrails for privacy, security, and quality.

To do that, the Summit was designed to push work at three linked levels of strategy.

The 3 levels of strategy

Strategy Level 1: Internal Org Strategy around AI

First is internal, organizational strategy: what each institution needs to do internally — data governance, procurement standards, evaluation protocols, staff training, change management, and the operational “plumbing” that makes AI usable and safe.

Strategy Level 2: Ecosystem Strategy

Second is ecosystem strategy, which covers how different A2J organizations can collaborate to increase capacity and impact.

Thinking through an Ecosystem approach to share capacity and improve outcomes

This can scope out what we should build together — shared playbooks, common evaluation and certification approaches, interoperable data and knowledge standards, and shared infrastructure that prevents every jurisdiction from reinventing fragile, costly solutions.

Strategy Level 3: Towards Big Tech & A2J

Third is strategy vis-à-vis big tech: how the justice community can engage major AI platform providers with clear expectations and leverage — so the next wave of product decisions, safety defaults, partnerships, and pricing structures actually support access to justice rather than widen gaps.

As more people and providers go to Big Tech for their answers and development work, how do we get to better A2J impact and outcomes?

The Summit is ultimately about making a coordinated, public-interest plan now — so that by 2030 we have a legal help ecosystem that is more trustworthy, more usable, more interoperable, and able to serve far more people with far less friction.

The Modern A2J Toolbox: A Growing Set of AI-Powered Solutions

Equipping justice professionals with the right technology is a cornerstone of modernizing access to justice. The Summit provided a tour of AI tools available to the community, ranging from comprehensive legal platforms designed for large-scale litigation to custom-built solutions tailored for specific legal aid workflows. This tour of the growing AI toolbox revealed an expanding arsenal of capabilities designed to augment legal work, streamline processes, and extend the reach of legal services.

Research & Case Management Assistants

Teams from many different AI and legal tech companies presented their solutions and explained how they can be used to expand access to justice.

  • Notebook LM: The Notebook LM tool from Google empowers users to create intelligent digital notebooks from their case files and documents. Its capabilities have been significantly enhanced, featuring an expanded context window of up to 1 million tokens, allowing it to digest and analyze vast amounts of information. The platform is fully multilingual, supporting over 100 languages for both queries and content generation. This enables it to generate a wide range of work products, from infographics and slide decks to narrated video overviews, making it a versatile tool for both internal analysis and client communication.
  • Harvey: Harvey is an AI platform built specifically for legal professionals, structured around three core components. The Assistant functions as a conversational interface for asking complex legal questions based on uploaded files and integrated research sources like LexisNexis. The Vault serves as a secure repository for case documents, enabling deep analysis across up to 10,000 different documents at once. Finally, Workflows provide one-click solutions for common, repeatable tasks like building case timelines or translating documents, with the ability for organizations to create and embed their own custom playbooks.
  • Thomson Reuters’ CoCounsel: CoCounsel is designed to leverage an organization’s complete universe of information — from its own internal data and knowledge management systems to the primary law available through Westlaw. This comprehensive integration allows it to automate and assist with tasks across the entire client representation lifecycle, from initial intake and case assessment to legal research and discovery preparation. The platform is built to function like a human colleague, capable of pulling together disparate information sources to efficiently construct the building blocks of legal practice. TR also has an AI for Justice program that leverages CoCounsel and its team to help legal aid organizations.
  • VLex’s Vincent AI: Vincent AI adopts a workflow-based approach to legal tasks, offering dedicated modules for legal research, contract analysis, complaint review, and large-scale document review. Its design is particularly user-friendly for those with “prompting anxiety,” as it can automatically analyze an uploaded document (such as a lease or complaint) and suggest relevant next steps and analyses. A key feature is its ability to process not just text but also audio and video content, opening up powerful applications for tasks like analyzing client intake calls or video interviews to rapidly identify key issues.

AI on Case Management & E-Discovery Platforms

  • Legal Server: As a long-standing case management system, Legal Server has introduced an AI assistant named “Ellis.” The platform’s core approach to AI is rooted in data privacy and relevance. Rather than drawing on the open internet, Ellis is trained exclusively on an individual client organization’s own isolated data repository, including its help documentation, case notes, and internal documents. This ensures that answers are grounded in the organization’s specific context and expertise while maintaining strict client confidentiality.
  • Relativity: Relativity’s e-discovery platform is made available to justice-focused organizations through its “Justice for Change” program. The platform includes powerful generative AI features like AIR for Review, which can analyze hundreds of thousands of documents to identify key people, terms, and events in an investigation. It also features integrated translation tools that support over 100 languages, including right-to-left languages like Hebrew, allowing legal teams to seamlessly work with multilingual case documents within a single, secure environment.

These tools represent a leap in technological capability. They all show the growing ability of AI to help legal teams synthesize information, work with documents, conduct research, produce key work product, and automate workflows. But how do we go from tech tools to real-world impact: solutions that are deployed at scale and reach high performance numbers? The Summit moved from tech demos to case studies, offering accounts of how to get to value and impact.

From Pilots to Impact: AI in Action Across the Justice Sector

In the second half of Day 1, the Summit moved beyond product demonstrations to showcase a series of compelling case studies from across the justice sector. These presentations offered proof points of how organizations are already leveraging AI to serve more people, improve service quality, and create new efficiencies, delivering concrete value to their clients and communities today.

  • Legal Aid Society of Middle Tennessee & The Cumberlands — Automating Expungement Petitions: The “ExpungeMate” project was created to tackle the manual, time-consuming process of reviewing criminal records and preparing expungement petitions. By building a custom GPT to analyze records and an automated workflow to generate the necessary legal forms, the organization dramatically transformed its expungement clinics. At a single event, their output surged from 70 expungements to 751. This newfound efficiency freed up attorneys to provide holistic advice and enabled a more comprehensive service model that brought judges, district attorneys, and clerks on-site to reinstate driver’s licenses and waive court debt in real-time.
  • Citizens Advice (UK) — Empowering Advisors with Caddy: Citizens Advice developed Caddy (Citizens Advice Digital Assistant), an internal chatbot designed to support its network of advisors, particularly new trainees. Caddy uses Retrieval-Augmented Generation (RAG), a method that grounds the AI’s answers in a private, trusted knowledge base to ensure accuracy and prevent hallucination. A key feature is its “human-in-the-loop” workflow, where supervisors can quickly validate answers before they are given to clients. A six-week trial demonstrated significant impact, with the evaluation finding that Caddy halved the response time for advisors seeking supervisory support, unlocking capacity to help thousands more people.
  • Frontline Justice — Supercharging Community Justice Workers: To support its network of non-lawyer “justice workers” in Alaska, Frontline Justice deployed an AI tool designed not just as a Q&A bot, but as a peer-to-peer knowledge hub. While the AI provides initial, reliable answers to legal questions, the system empowers senior justice workers to review, edit, and enrich these answers with practical, on-the-ground knowledge like local phone numbers or helpful infographics. This creates a dynamic, collaborative knowledge base where the expertise of one experienced worker in a remote village can be instantly shared with over 200 volunteers across the state.
  • Lone Star Legal Aid — Building a Secure Chatbot Ecosystem: Lone Star Legal Aid embarked on an ambitious in-house project to build three distinct chatbots on a secure RAG architecture to serve different user groups. One internal bot, LSLAsks, is for administrative information in their legal aid group. Their internal bot for legal staff, Juris, was designed to centralize legal knowledge and defeat the administrative burden of research. A core part of their strategy involved rigorous A/B testing of four different search models (cleverly named after the Ninja Turtles) to meticulously measure accuracy, relevancy, and speed, with the ultimate goal of eliminating hallucinations and building user trust in the system.
  • People’s Law School (British Columbia) — Ensuring Quality in Public-Facing AI: The team behind the public-facing Beagle+ chatbot shared their journey of ensuring high-quality, reliable answers for the public. Their development process involved intensive pre- and post-launch evaluation. Before launch, they used a 42-question dataset of real-world legal questions to test different models and prompts until they achieved 99% accuracy. After launch, a team of lawyers reviewed every single one of the first 5,400 conversations to score them for safety and value, using the findings to continuously refine the system and maintain its high standard of quality.

These successful implementations offered more than just inspiration; they surfaced a series of critical strategic debates that the entire access to justice community must now navigate.

Lessons Learned and Practical Strategies from the First Generation of AI+A2J Work

A consistent “lesson learned” from Day 1 was that legal aid AI only works when it’s treated as mission infrastructure, not as a cool add-on. Leaders emphasized values as practical guardrails: put people first (staff + clients), keep the main thing the main thing (serving clients), and plan for the long term — especially because large legal aid organizations are “big ships” that can’t pivot overnight.

Smart choice of projects: In practice, that means choosing projects that reduce friction in frontline work, don’t distract from service delivery, and can be sustained after the initial burst of experimentation.

An ecosystem of specific solutions: On the build side, teams stressed scoping and architecture choices that intentionally reduce risk. One practical pattern was a “one tool = one problem” approach, with different bots for different users and workflows (internal legal research, internal admin FAQs, and client-facing triage) rather than trying to make a single chatbot do everything.

Building security- and privacy-forward solutions: Security and privacy were treated as design requirements, not compliance afterthoughts — e.g., selecting an enterprise cloud environment already inside the organization’s security perimeter and choosing retrieval-augmented generation (RAG) to keep answers grounded in verified sources.

Keeping Knowledge Fresh: Teams also described curating the knowledge base (black-letter law + SME guidance) and setting a maintenance cadence so the sources stay trustworthy over time.

Figure out What You’re Measuring & How: On evaluation, Day 1 emphasized that “accuracy” isn’t a vibe — you have to measure it, iterate, and keep monitoring after launch. Practical approaches included: (1) building a small but meaningful test set from real questions, (2) defining what an “ideal answer” must include, and (3) scoring outputs on safety and value across model/prompt/RAG variations.

Teams also used internal testing with non-developer legal staff to ask real workflow questions, paired with lightweight feedback mechanisms (thumbs up/down + reason codes) and operational metrics like citations used, speed, and cost per question. A key implementation insight was that some “AI errors” are actually content errors — post-launch quality improved by fixing source content (even single missing words) and tightening prompts, supported by ongoing monitoring.
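
As a rough sketch of what those practices can look like in code, the example below pairs a test question that defines what an ideal answer must (and must not) include with a simple reviewer feedback record. The specific criteria, reason codes, and metric fields are hypothetical illustrations, not any team’s actual rubric.

```python
# Hypothetical evaluation rubric: each test question defines what an "ideal
# answer" must include (and must avoid), and reviewer feedback is logged with
# simple reason codes alongside basic operational metrics.
from dataclasses import dataclass, field


@dataclass
class TestQuestion:
    question: str
    must_include: list[str]      # elements an ideal answer should contain
    must_not_include: list[str]  # e.g., overpromising or unsafe advice


@dataclass
class ReviewRecord:
    safe: bool
    valuable: bool
    reason_codes: list[str] = field(default_factory=list)  # e.g., "missing_citation"
    seconds_to_answer: float = 0.0
    cost_usd: float = 0.0


def check_answer(item: TestQuestion, answer: str) -> dict[str, bool]:
    """Check an answer against the must-include / must-not-include criteria."""
    text = answer.lower()
    return {
        "covers_required": all(req.lower() in text for req in item.must_include),
        "avoids_prohibited": not any(bad.lower() in text for bad in item.must_not_include),
    }


if __name__ == "__main__":
    item = TestQuestion(
        question="My landlord gave me a 3-day notice. What can I do?",
        must_include=["deadline", "legal aid"],
        must_not_include=["you will definitely win"],
    )
    answer = "Check the deadline on the notice and contact your local legal aid office."
    print(check_answer(item, answer))
    print(ReviewRecord(safe=True, valuable=True, seconds_to_answer=4.2, cost_usd=0.01))
```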

Be Ready with Policies & Governance: On deployment governance, teams highlighted a bias toward containment, transparency, and safe failure modes. One practical RAG pattern: show citations down to the page/section, display the excerpt used, and if the system can’t answer from the verified corpus, it should say so — explicitly.
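
A minimal sketch of that containment pattern follows, assuming a toy keyword retriever over a small verified corpus. A production system would use real documents and proper retrieval, but the behavior to note is the same: cite the source, show the excerpt, and refuse explicitly when nothing relevant is found.

```python
# Hypothetical grounded-answer pattern: cite the source down to page/section,
# show the excerpt used, and refuse explicitly when the verified corpus has no
# relevant passage rather than guessing.
from dataclasses import dataclass


@dataclass
class Passage:
    doc_title: str
    page: int
    section: str
    text: str


VERIFIED_CORPUS = [
    Passage("Tenant Handbook", 12, "3.2",
            "A tenant generally has five days to respond to an eviction summons."),
]


def retrieve(question: str, min_overlap: int = 2) -> Passage | None:
    """Toy keyword-overlap retrieval; a real system would use embeddings or search."""
    question_words = set(question.lower().split())
    best, best_score = None, 0
    for passage in VERIFIED_CORPUS:
        score = len(question_words & set(passage.text.lower().split()))
        if score > best_score:
            best, best_score = passage, score
    return best if best_score >= min_overlap else None


def grounded_answer(question: str) -> str:
    passage = retrieve(question)
    if passage is None:
        # Safe failure mode: say so explicitly instead of improvising an answer.
        return "I can't answer that from the verified sources I have access to."
    return (
        f"Based on {passage.doc_title}, p. {passage.page}, sec. {passage.section}:\n"
        f'Excerpt used: "{passage.text}"'
    )


if __name__ == "__main__":
    print(grounded_answer("How many days do I have to respond to an eviction summons?"))
    print(grounded_answer("Can my employer fire me without notice?"))
```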

There was also a clear warning about emerging security risks (especially prompt injection and attack surfaces when tools start browsing or pulling from the open internet) and the need to think about cybersecurity as capability scales from pilots to broader use. Teams described practical access controls (like 2FA) and “shareable internal agents” as ways to grow adoption without losing governance.

Be Ready for Data Access Blocks: Several Day 1 discussions surfaced the external blockers that legal aid teams can’t solve alone — especially data access and interoperability with courts and other systems.

Even when internal workflows are ready, teams run into constraints like restrictions on scraping or fragmented, jurisdiction-specific data practices, which makes replication harder and increases costs for every new deployment. That’s one reason the “lessons learned” kept circling back to shared infrastructure: common patterns for grounded knowledge, testing protocols, security hardening, and the data pathways needed to make these tools reliable in day-to-day legal work.

Strategic Crossroads: Key Debates Shaping the Future of the AI+A2J Ecosystem

The proliferation of AI has brought the access to justice community to a strategic crossroads. The Summit revealed that organizations are grappling with fundamental decisions about how to acquire, build, and deploy this technology. The choices made in the coming years will define the technological landscape of the sector, determining the cost, accessibility, and control that legal aid organizations have over their digital futures.

The Build vs. Buy Dilemma

A central tension emerged between building custom solutions and purchasing sophisticated off-the-shelf platforms. We might end up with a ‘yes, and’ approach that involves both.

The Case for Building:

Organizations like Maryland Legal Aid and Lone Star Legal Aid are pursuing an in-house development path. This is not just a cost-and-security decision but a strategic choice about building organizational capacity.

The primary drivers are significantly lower long-term costs — Maryland Legal Aid reported running their custom platform for their entire staff for less than $100 per month — and enhanced data security and privacy, achieved through direct control over the tech stack and zero-data-retention agreements with API providers.

Building allows for the precise tailoring of tools to unique organizational workflows and empowers staff to become creators.

The Case for Buying:

Conversely, presentations from Relativity, Harvey, Thomson Reuters, vLex/Clio, and others showcased the immense power of professionally developed, pre-built platforms. The argument for buying centers on leveraging cutting-edge technology and complex features without the significant upfront investment in hiring and maintaining an in-house development team.

This path offers immediate access to powerful tools for organizations that lack the capacity or desire to become software developers themselves.

Centralized Expertise vs. Empowered End-Users

A parallel debate surfaced around who should be building AI applications. The traditional model, exemplified by Lone Star Legal Aid, involves a specialized technical team that designs and develops tools for the rest of the organization.

In contrast, Maryland Legal Aid presented a more democratized vision, empowering tech-curious attorneys and paralegals to engage in “vibe coding.”

This approach envisions non-technical staff becoming software creators themselves, using new, user-friendly AI development tools to rapidly build and deploy solutions. It transforms end-users into innovators, allowing legal aid organizations to “start solving their own problems” fast, cheaply, and in-house.

Navigating the Role of Big Tech in Justice Services

The summit highlighted the inescapable and growing role of major technology companies in the justice space. The debate here centers on the nature of the engagement.

One path involves close collaboration, such as licensing tools like Notebook LM from Google or leveraging APIs from OpenAI to power custom applications.

The alternative is a more cautious approach that prioritizes advocacy for regulation and taxation, licensing of legal organizations’ knowledge and tools, and the implementation of robust public interest protections to ensure that the deployment of large-scale AI serves, rather than harms, the public good.

These strategic debates are shaping the immediate future of legal technology, but the summit also issued a more profound challenge: to use this moment not just to optimize existing processes, but to reimagine the very foundations of justice itself.

AI Beyond Automation: Reimagining the Fundamentals of the Justice System

The conversation at the summit elevated from simply making the existing justice system more efficient to fundamentally transforming it for a new era.

In a thought-provoking remote address, Professor Richard Susskind challenged attendees to look beyond the immediate applications of AI and consider how it could reshape the core principles of dispute resolution and legal help. This forward-looking perspective urged the community to avoid merely automating the past and instead use technology to design a more accessible, preventative, and outcome-focused system of justice.

The Automation Fallacy

Susskind warned against what he termed “technological myopia” — the tendency to view new technology only through the lens of automating existing tasks. He argued that simply replacing human lawyers with AI to perform the same work is an uninspired goal. Using a powerful analogy, he urged the legal community to avoid focusing on the equivalent of “robotic surgery” (perfecting an old process) and instead seek out the legal equivalents of “non-invasive therapy” and “preventative medicine” — entirely new, more effective ways to achieve just outcomes.

Focusing Upstream

This call to action was echoed in a broader directive to shift focus from downstream dispute resolution to upstream interventions. The goal is to leverage technology and data not just to manage conflicts once they arise, but to prevent them from escalating in the first place. This concept was vividly captured by Susskind’s metaphor of a society that is better served by “putting a fence at the top of the cliff rather than an ambulance at the bottom.”

The Future of Dispute Resolution

Susskind posed the provocative question, “Can AI replace judges?” but quickly reframed it to be more productive. Instead of asking if a machine can replicate a human judge, he argued the focus should be on outcomes: can AI systems generate reliable legal determinations with reasons?

He envisioned a future, perhaps by 2030, where citizens might prefer state-supported, AI-underpinned dispute services over traditional courts. In this vision, parties could submit their evidence and arguments to a “comfortingly branded” AI system that could cheaply, cheerfully, and immediately deliver a conclusion, transforming the speed and accessibility of justice.

Achieving such ambitious, long-term visions requires more than just technological breakthroughs; it demands the creation of a practical, collaborative infrastructure to build and sustain this new future.

Building Funding and Capacity for this Work

On the panel about building a National AI + A2J ecosystem, panelists discussed how to increase capacity and impact in this space.

The Need to Make this Space Legible as a Market

The panel framed the “economics” conversation as a market-making challenge: if we want new tech to actually scale in access to justice, we have to make the space legible — not just inspiring. For example, there could be a clearer market for navigation tech in low-income “fork-in-the-road” moments. The panel highlighted that the nascent ecosystem needs three things to become investable and durable:

  • clearly defined problems,
  • shared infrastructure that makes building and scaling easier, and
  • business models that sustain products over time.

A key through-line in the panel’s commentary was: we can’t pretend grant funding alone will carry the next decade of AI+A2J delivery. Panelists suggested we need experimentation to find new payers — for example, employer-funded benefits and EAP dollars, or insurer/health-adjacent funding tied to social determinants of health — paired with stronger evidence that tools improve outcomes. This is connected to the need for shared benchmarks and evaluation methods that can influence how developers build and how funders (and institutions) decide what to back.

A Warning Not to Build New Tech on Bad Processes

The panel also brought a grounding reality check: even the best tech will underperform — or do harm — if it’s layered onto broken processes. Past projects where technology sat on top of high-default systems contributed to worse outcomes.

The economic implication was clear: funders and institutions should pay for process repair and procedural barrier removal as seriously as they pay for new tools, because the ROI of AI depends on the underlying system actually functioning.

The Role of Impact Investing as a new source of capital

Building this ecosystem requires a new approach to funding. Kate Fazio framed the justice gap as a fundamental “market failure” in the realm of “people law” — the everyday legal problems faced by individuals. She argued that the two traditional sources of capital are insufficient to solve this failure: traditional venture capital is misaligned, seeking massive returns that “people law” cannot generate, while philanthropy is vital but chronically resource-constrained.

The missing piece, Fazio argued, is impact investing: a form of patient, flexible capital that seeks to generate both a measurable social impact and a financial return. This provides a crucial middle ground for funding sustainable, scalable models that may not offer explosive growth but can create enormous social value. But she highlighted a stark reality: of the 17 UN Sustainable Development Goals, Goal 16 (Peace, Justice, and Strong Institutions) currently receives almost no impact investment capital. This presents both a monumental challenge and a massive opportunity for the A2J community to articulate its value and attract a new, powerful source of funding to build the future of justice.

This talk of new capital, market-making, and funding strategies started to point the group to a clear strategic imperative. To overcome the risk of fragmented pilots and siloed innovation, the A2J community must start coalescing into a coherent ecosystem. This means embracing collaborative infrastructure, which can be hand-in-hand with attracting new forms of capital.

By reframing the “market failure” in people law as a generational opportunity for impact investing, the sector can secure the sustainable funding needed to scale the transformative, preventative, and outcome-focused systems of justice envisioned throughout the summit.

Forging an AI+A2J Ecosystem: The Path to Sustainable Scale and Impact

On Day 2, we challenged groups to envision how to build a strong AI and A2J development, evaluation, and market ecosystem. They came up with so many ideas, and we try to capture them below. Much of it is about having common infrastructure, shared capacity, and better ways to strengthen and share organic DIY AI tools.

A significant risk facing the A2J community is fragmentation, a scenario where “a thousand pilots bloom” but ultimately fail to create lasting, widespread change because efforts are siloed and unsustainable. The summit issued a clear call to counter this risk by adopting a collaborative ecosystem approach.

The working groups on Day 2 highlighted some of the key things that our community can work on, to build a stronger and more successful A2J provider ecosystem. This infrastructure-centered strategy emphasizes sharing knowledge, resources, and infrastructure to ensure that innovations are not only successful in isolation but can be sustained, scaled, and adapted across the entire sector.

Throughout the summit, presenters and participants highlighted the essential capacities and infrastructure that individual organizations must develop to succeed with AI. Building these capabilities in every single organization is inefficient and unrealistic. An ecosystem approach recognizes the need for shared infrastructure, including the playbooks, knowledge/data standards, privacy and security tooling, evaluation and certification, and more.

Replicable Playbooks to Prevent Parallel Duplication

Many groups at the Summit called for replicable solution playbooks that go beyond sharing repositories on GitHub and making conference presentations, and instead provide the teams and resources that can help more legal teams replicate successful AI solutions and localize them to their jurisdiction and organization.

A2J organizations don’t just need inspiration — they need proven patterns they can adopt with confidence. Replicable “how-tos” turn isolated success stories into field-level capability: how to scope a use case, how to choose a model approach, how to design a safe workflow, how to test and monitor performance, and how to roll out tools to staff without creating chaos. These playbooks reduce the cost of learning, lower risk, and help organizations move from pilots to sustained operations.

Replicable guidance also helps prevent duplication. Right now, too many teams are solving the same early-stage problems in parallel: procurement questions, privacy questions, evaluation questions, prompt and retrieval design, and governance questions. If the field can agree on shared building blocks and publish them in usable formats, innovation becomes cumulative — each new project building on the last instead of starting over.

A Common Agenda of Which Tasks and Issues to Build Solutions For

Without a shared agenda, the field risks drifting into fragmentation: dozens of pilots, dozens of platforms, and no cumulative progress. A common agenda does not mean one centralized solution — it means alignment on what must be built together, what must be measured, and what must be stewarded over time. It creates shared language, shared priorities, and shared accountability across courts, legal aid, community organizations, researchers, funders, and vendors.

This is the core reason the Legal Design Lab held the Summit: to convene the people who can shape that shared agenda and to produce a practical roadmap that others can adopt. The goal is to protect this moment from predictable failure modes — over-chill, backlash, duplication, and under-maintained tools — and instead create an ecosystem where responsible innovation compounds, trust grows, and more people get real legal help when they need it.

Evaluation Protocols and Certifications

Groups also called for more and easier evaluation and certification. They want high-quality, standardized methods for evaluation, testing, and long-term maintenance.

In high-stakes legal settings, “seems good” is not good enough. The field needs clear definitions of quality and safety, and credible evaluation protocols that different organizations can use consistently. This doesn’t mean one rigid standard for every tool — but it does mean shared expectations: what must be tested, what must be logged, what harms must be monitored, and what “good enough” looks like for different risk levels.

Certification — or at least standard conformance levels — can also shift the market. If courts and legal aid groups can point to transparent evaluation and safety practices, then vendors and internal builders alike have a clear target. That reduces fear-driven overreaction and replaces it with evidence-driven decision-making. Over time, it supports responsible procurement, encourages better products, and protects the public by making safety and accountability visible.

In addition, creating legal benchmarks for the most common and significant legal tasks can push LLM developers to improve their foundation models for justice use cases.

Practical, Clear Privacy Protections

A blocker for many of the possible solutions is the safe use of AI with highly confidential, high-risk data. Privacy is not a footnote in A2J — it is the precondition for using AI with real people. Many of the highest-value workflows involve sensitive information: housing instability, family safety, immigration status, disability, finances, or criminal history. If legal teams cannot confidently protect client data, they will either avoid the tools entirely or use them in risky ways that expose clients and organizations to harm.

What is needed is privacy-by-design infrastructure: clear rules for data handling, retention, and access; secure deployment patterns; strong vendor contract terms; and practical training for staff about what can and cannot be used in which tools. The Summit is a place to align on what “acceptable privacy posture” should look like across the ecosystem — so privacy does not become an innovation-killer, and innovation does not become a privacy risk.

More cybersecurity, testing, reliability engineering, and ongoing monitoring

Along with privacy risks, participants noted that many of the organic, DIY solutions are not prepared for cybersecurity risks. As AI tools become embedded in legal workflows, they become targets — both for accidental failures and deliberate attacks. Prompt injection, data leakage, insecure integrations, and overbroad permissions can turn a helpful tool into a security incident. And reliability matters just as much as brilliance: a tool that works 80% of the time may still be unusable in high-stakes practice if the failures are unpredictable.

The field needs a stronger norm of “safety engineering”: threat modeling, red-teaming, testing protocols, incident response plans, and ongoing monitoring after deployment. This is also where shared infrastructure helps most. Individual organizations should not each have to invent cybersecurity practices for AI from scratch. A common set of testing and security baselines would let innovators move faster while reducing systemic risk.

Inter-Agency/Court Data Connections

Many groups need to access and work with data from other agencies — like court docket files and records, other legal aid groups’ data, and more — in order to build highly effective, AI-powered workflows.

Participants called for more standards and data contracts that can facilitate systematic data access, collection, and preparation. Many of the biggest A2J bottlenecks are not about “knowing the law” — they’re about navigating fragmented systems. People have to repeat their story across multiple offices, programs, and portals. Providers can’t see what happened earlier in the journey. Courts don’t receive information in consistent, structured ways. The result is duplication, delay, and drop-off — exactly where AI could help, but only if the data ecosystem supports it.

Data Contracts for Interoperable Knowledge Bases

Many local innovators are starting to build out structured, authoritative knowledge on court procedure, forms and documents, strategies, legal authorities, service directories, and more. This knowledge is built to power their local legal AI solutions, but right now it is stored in unique, local ways.

This investment in local authoritative legal knowledge bases makes sense. LLMs are powerful, but they are not a substitute for authoritative, maintainable legal knowledge. The most dependable AI systems in legal help will be grounded in structured knowledge: jurisdiction-specific procedures, deadlines, forms, filing rules, court locations, service directories, eligibility rules, and “what happens next” pathways.

But the worry among participants is that all of these highly localized knowledge bases will be one-off builds for a specific org or solution. Ideally, when teams invest in building these local knowledge bases, they would follow some key shared standards so the knowledge performs well and can be updated, audited, and reused across tools and regions.

This is why knowledge bases and data exchanges are central to the ecosystem approach. Instead of each organization maintaining its own isolated universe of content, we can build shared registries and common schemas that allow local control while enabling cross-jurisdiction learning and reuse. The aim is not uniformity for its own sake — it’s reliability, maintainability, and the ability to scale help without scaling confusion.

More training and change management so legal teams are ready

Even the best tools fail if people don’t adopt them in real workflows. Legal organizations are human systems with deeply embedded habits, risk cultures, and informal processes. Training and change management are not “nice to have” — they determine whether AI becomes a daily capability or a novelty used by a handful of early adopters.

What’s needed is practical, role-based readiness support: training for leadership on governance and procurement, training for frontline staff on safe use and workflow integration, and support for managers who must redesign processes and measure outcomes. The Summit is a step toward building a shared approach to readiness — so the field can absorb change without burnout, fragmentation, or loss of trust.

Building Capability & Lowering Costs of Development/Use

One of the biggest barriers to AI-A2J impact is that the “real” version of these tools — secure deployments, quality evaluation, integration into existing systems, and sustained maintenance — can be unaffordable when each court or legal aid organization tries to do it alone. The result is a familiar pattern: a few well-resourced organizations build impressive pilots, while most teams remain stuck with limited access, short-term experiments, or tools that can’t safely touch real client data.

Coordination is the way out of this trap. When the field aligns on shared priorities and shared building blocks, we reduce duplication and shift spending away from reinventing the same foundational components toward improving what actually matters for service delivery.

Through coordination, the ecosystem can also change the economics of AI itself. Shared evaluation protocols, reference architectures, and standard data contracts mean vendors and platform providers can build once and serve many — lowering per-organization cost and making procurement less risky. Collective demand can also create better terms: pooled negotiation for pricing, clearer requirements for privacy/security, and shared expectations about model behavior and transparency.

Just as importantly, coordinated open infrastructure — structured knowledge bases, service directories, and interoperable intake/referral data — reduces reliance on expensive bespoke systems by making high-value components reusable across jurisdictions.

The goal is not uniformity, but a commons: a set of shared standards and assets that makes safe, high-quality AI deployment feasible for the median organization, not just the best-funded one.

Conclusion

The AI + Access to Justice Summit is designed as a yearly convening point — because this work can’t be finished in a single event. Each year, we’ll take stock of what’s changing in the technology, what’s working on the ground in courts and legal aid, and where the biggest gaps remain. More importantly, we’ll use the Summit to move from discussion to shared commitments: clearer priorities, stronger relationships across the ecosystem, and concrete next steps that participants can carry back into their organizations and collaborations.

We are also building the Summit as a launchpad for follow-through. In the months after convening, we will work with participants to continue progress on common infrastructure: evaluation and safety protocols, privacy and security patterns, interoperable knowledge and data standards, and practical implementation playbooks that make adoption feasible across diverse jurisdictions. The aim is to make innovation cumulative — so promising work does not remain isolated in a single pilot site, but becomes reusable and improvable across the field.

We are deeply grateful to the sponsors who made this convening possible, and to the speakers who shared lessons, hard-won insights, and real examples from the frontlines.

Most of all, thank you to the participants — justice leaders, technologists, researchers, funders, and community partners — who showed up ready to collaborate, challenge assumptions, and build something larger than any single organization can create alone. Your energy and seriousness are exactly what this moment demands, and we’re excited to keep working together toward a better 2030.

Categories
AI + Access to Justice Current Projects

Jurix 2025 AIDA2J Workshop

The Stanford Legal Design Lab is so happy to be a sponsoring co-host of the third consecutive AI and Access to Justice workshop at the JURIX conference. This round, the conference is in Turin, Italy, in December 2025. The theme is AI, Dispute Resolution, and Access to Justice.

See the main workshop website here.

The workshop will involve the collaboration of the Suffolk LIT Lab, the Stanford Legal Design Lab, the Maastricht Law and Tech Lab, Libra.law, Vrije Universiteit Brussel, Swansea University, and the University of Turin, and will be part of the larger Jurix 2025 conference hosted this year in Italy.

The workshop will focus on three topics: 

  • Data issues related to access to justice (building reusable, sharable datasets for research) 
  • AI for access to justice generally 
  • AI for dispute resolution 

The provisional schedule is as follows:

Session 1 – Interfaces and Knowledge Tools

LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents
CourtPressGER: A German Court Decision to Press Release Summarization Dataset
A Voice-First AI Service for People-Centred Justice in Niger
Designing Clarity with AI: Improving the Usability of Case Law Databases of International Courts
CaseConnect: Cross-Lingual Legal Case Retrieval with Semantic Embeddings and Structure-Aware Segmentation
Glitter: Visualizing Lexical Surprisal for Readability in Administrative Texts

Session 2 – Global AI for Legal Help, Prediction and Dispute Resolution

Understanding Rights Through AI: The Role of Legal Chatbots in Access to Justice
The Private Family Forecast: A Predictive Method for an Effective and Informed Access to Justice
Artificial Intelligence Enabled Justice Tools for Refugees in Tanzania
Artificial Intelligence and Access to Justice in Chile
AI and Judicial Transformation: Comparative Analysis of Predictive Tools in EU Labour Law
From Textual Simplification to Epistemic Justice: Rethinking Digital Dispute Resolution Through AI

Session 3 – Workflows, Frameworks and Governance of Legal AI

PLDF – A Private Legal Declarative Document Generation Framework
How the ECtHR Frames Artificial Intelligence: A Distant Reading Analysis
What Legal Help Teams and Consumers Actually Do: A Legal Help Task Taxonomy
Packaging Thematic Analysis as an AI Workflow for Legal Research
Gender Bias in LLMs: Preliminary Evidence from Shared Parenting Scenario in Czech Family Law
AI-Powered Resentencing: Bridging California’s Second-Chance Gap
AI Assistance for Court Review of Default Judgments

Interactive Workshop – Global Legal Data Availability

Categories
AI + Access to Justice Current Projects

AI+A2J Summit 2025

The Stanford Legal Design Lab hosted the second annual AI and Access to Justice Summit on November 20-21, 2025. Over 150 legal professionals, technologists, regulators, strategists, and funders came together to tackle one big question: how can we build a strong, sustainable national/international AI and Access to Justice Ecosystem?

We will be synthesizing all of the presentations, feedback, proposals and discussions into a report that lays out:

  • The current toolbox that legal help teams and users can employ to accomplish key legal tasks like Q&A, triage and referrals, conducting intake interviews, drafting documents, doing legal research, reviewing draft documents, and more.
  • The strategies, practical steps, and methods with which to design, develop, evaluate, and maintain AI so that it is valuable, safe, and affordable.
  • Exemplary case studies of what AI solutions are being built, how they are being implemented in new service and business models, and how they might be scaled or replicated.
  • An agenda of how to encourage more coordination of AI technology, evaluation, and capability-building, so that successful solutions can be available to as many legal teams and users as possible — and have the largest positive impact on people’s housing, financial, family, and general stability.

Thank you to all of our speakers, participants, and sponsors!

Categories
AI + Access to Justice Class Blog Current Projects Project updates

Legal Aid Intake & Screening AI

A Report on an AI-Powered Intake & Screening Workflow for Legal Aid Teams 

AI for Legal Help, Legal Design Lab, 2025

This report provides a write-up of the AI for Housing Legal Aid Intake & Screening class project, which was one track of the “AI for Legal Help” Policy Lab during the Autumn 2024 and Winter 2025 quarters. The AI for Legal Help course involved work with legal and court groups that provide legal help services to the public, to understand where responsible AI innovations might be possible and to design and prototype initial solutions, as well as pilot and evaluation plans.

One of the project tracks focused on improving the workflows of legal aid teams who provide housing help, particularly their struggle with high demand from community members and a lack of clarity about exactly whether and how a person can be served by the legal aid group. Between Autumn 2024 and Winter 2025, an interdisciplinary team of Stanford University students partnered with the Legal Aid Society of San Bernardino (LASSB) to understand the current design of housing intake & screening, and to propose an improved, AI-powered workflow.

This report details the problem identified by LASSB, the proposed AI-powered intake & screening workflow developed by the student team, and recommendations for future development and implementation. 

We share it in the hopes that legal aid and court help center leadership might also be interested in exploring responsible AI development for intake & screening, and that funders, researchers, and technologists might collaborate on developing and testing successful solutions for this task.

Thank you to the students on this team: Favour Nerisse, Gretel Cannon, Tatiana Zhang, and other collaborators. And a big thank you to our LASSB colleagues: Greg Armstrong, Pablo Ramirez, and more.

Introduction

The Legal Aid Society of San Bernardino (LASSB) is a nonprofit law firm serving low-income residents across San Bernardino and Riverside Counties, where housing issues – especially evictions – are the most common legal problems facing the community. Like many legal aid organizations, LASSB operates under severe resource constraints and high demand.

In the first half of 2024 alone, LASSB assisted over 1,200 households (3,261 individuals) with eviction prevention and landlord-tenant support. Yet many more people seek help than LASSB can serve, and those who do seek help often face barriers like long hotline wait times or lack of transportation to clinics. These challenges make the intake process – the initial screening and information-gathering when a client asks for help – a critical bottleneck. If clients cannot get through intake or are screened out improperly, they effectively have no access to justice.

Against this backdrop, LASSB partnered with a team of Stanford students in the AI for Legal Help practicum to explore an AI-based solution. The task selected was housing legal intake: using an AI “Intake Agent” to streamline eligibility screening and initial fact-gathering for clients with housing issues (especially evictions). The proposed solution was a chatbot-style AI assistant that could interview applicants about their legal problem and situation, apply LASSB’s intake criteria, and produce a summary for legal aid staff. By handling routine, high-volume intake questions, the AI agent aimed to reduce client wait times and expand LASSB’s reach to those who can’t easily come in or call during business hours. The students planned a phased evaluation and implementation: first prototyping the agent with sample data, then testing its accuracy and safety with LASSB staff, before moving toward a limited pilot deployment. This report details the development of that prototype AI Intake Agent across the Autumn and Winter quarters, including the use case rationale, current vs. future workflow, technical design, evaluation findings, and recommendations for next steps.

1: The Use Case – AI-Assisted Housing Intake

Defining the Use Case of Intake & Screening

The project focused on legal intake for housing legal help, specifically tenants seeking assistance with eviction or unsafe housing. Intake is the process by which legal aid determines who qualifies for help and gathers the facts of their case. For a tenant facing eviction, this means answering questions about income, household, and the eviction situation, so the agency can decide if the case falls within their scope (for example, within income limits and legal priorities).

Intake is a natural first use case because it is a gateway to justice: a short phone interview or online form is often all that stands between a person in crisis and the help they need. Yet many people never complete this step due to practical barriers (long hold times, lack of childcare or transportation, fear or embarrassment). 

By improving intake, LASSB could assist more people early, preventing more evictions or legal problems from escalating.

Why LASSB Chose Housing Intake 

LASSB and the student team selected the housing intake scenario for several reasons. First, housing is LASSB’s highest-demand area – eviction defense was 62% of cases for a neighboring legal aid and similarly dominant for LASSB. This high volume means intake workers spend enormous time screening housing cases, and many eligible clients are turned away simply because staff can’t handle all the calls. Improving intake throughput could thus have an immediate impact. Second, housing intake involves highly repetitive and rules-based questions (e.g. income eligibility, case type triage) that are well-suited to automation. These are precisely the kind of routine, information-heavy tasks that AI can assist with at scale. 

Third, an intake chatbot could increase privacy and reach: clients could complete intake online 24/7, at their own pace, without waiting on hold or revealing personal stories to a stranger right away. This could especially help those in rural areas or those uncomfortable with an in-person or phone interview. In short, housing intake was seen as a high-impact, AI-ready use case where automation might improve efficiency while preserving quality of service.

Why Intake Matters for Access to Justice

Intake may seem mundane, but it is a cornerstone of access to justice. It is the “front door” of legal aid – if the door is locked or the line too long, people simply don’t get help. Studies show that only a small fraction of people with civil legal issues ever consult a lawyer, often because they don’t recognize their problem as legal or face obstacles seeking help. Even among those who do reach out to legal aid (nearly 2 million requests in 2022), about half are turned away due to insufficient resources. Many turn-aways happen at the intake stage, when agencies must triage cases. Improving intake can thus shrink the “justice gap” by catching more issues early and providing at least some guidance to those who would otherwise get nothing. 

Moreover, a well-designed intake process can empower clients – by helping them tell their story, identifying their urgent needs, and connecting them to appropriate next steps. On the flip side, a bad intake experience (confusing questions, long delays, or perfunctory denials) can discourage people from pursuing their rights, effectively denying justice. By focusing on intake, the project aimed to make the path to legal help smoother and more equitable.

Why AI Is a Good Fit for Housing Intake

Legal intake involves high volume, repetitive Q&A, and standard decision rules, which are conditions where AI can excel. A large language model (LLM) can be programmed to ask the same questions an intake worker would, in a conversational manner, and interpret the answers. 

Because LLMs can process natural language, an AI agent can understand a client’s narrative of their housing problem and spot relevant details or legal issues (e.g. identifying an illegal lockout vs. a formal eviction) to ask appropriate follow-ups. This dynamic questioning is something LLMs have demonstrated success in – for example, a recent experiment in Missouri showed that an LLM could generate follow-up intake questions “in real-time” based on a user’s description, like asking whether a landlord gave formal notice after a tenant said “I got kicked out.” AI can also help standardize decisions: by encoding eligibility rules into the prompt or system, it can apply the same criteria every time, potentially reducing inconsistent screening outcomes. Importantly, initial research found that GPT-4-based models could predict legal aid acceptance/rejection decisions with about 84% accuracy, and they erred on the side of caution (usually not rejecting a case unless clearly ineligible). This suggests AI intake systems can be tuned to minimize false denials, a critical requirement for fairness.

Beyond consistency and accuracy, AI offers scalability and extended reach. Once developed, an AI intake agent can handle multiple clients at once, anytime. For LASSB, this could mean a client with an eviction notice can start an intake at midnight rather than waiting anxious days for a callback. Other legal aid groups have already seen the potential: Legal Aid of North Carolina’s chatbot “LIA” has engaged in over 21,000 conversations in its first year, answering common legal questions and freeing up staff time. LASSB hopes for similar gains – the Executive Director noted plans to test AI tools to “reduce client wait times” and extend services to rural communities that in-person clinics don’t reach. Finally, an AI intake agent can offer a degree of client comfort – some individuals might prefer typing out their story to a bot rather than speaking to a person, especially on sensitive issues like domestic violence intersecting with an eviction. In summary, the volume, repetitive structure, and outreach potential of intake made it an ideal candidate for an AI solution.

2: Status Quo and Future Vision

Current Human-Led Workflow 

At present, LASSB’s intake process is entirely human-driven. A typical workflow might begin with a client calling LASSB’s hotline or walking into a clinic. An intake coordinator or paralegal then screens for eligibility, asking a series of standard questions: Are you a U.S. citizen or eligible immigrant? What is your household size and income? What is your zip code or county? What type of legal issue do you have? These questions correspond to LASSB’s internal eligibility rules (for example, income below a percentage of the poverty line, residence in the service area, and case type within program priorities). 

The intake worker usually follows a scripted guide – these guides can run 7+ pages of rules and flowcharts for different scenarios. If the client passes initial screening, the staffer moves on to information-gathering: taking down details of the legal problem. In a housing case, they might ask: “When did you receive the eviction notice? Did you already go to court? How many people live in the unit? Do you have any disabilities or special circumstances?” This helps determine the urgency and possible defenses (for instance, disability could mean a reasonable accommodation letter might help, or a lockout without court order is illegal). The intake worker must also gauge if the case fits LASSB’s current priorities or grant requirements – a subtle judgment call often based on experience. 

Once information is collected, the case is handed off internally: if it’s straightforward and within scope, they may schedule the client for a legal clinic or assign a staff attorney for advice. If it’s a tougher or out-of-scope case, the client might be given a referral to another agency or a “brief advice” appointment where an attorney only gives counsel and not full representation. In some instances, there are multiple handoffs – for example, the person who does the phone screening might not be the one who ultimately provides the legal advice, requiring good note-taking and case summaries.

User Personas in the Workflow

The team crafted sample user and staff personas to illustrate who would be interacting with the new workflow and AI agent.


Pain Points in the Status Quo

This human-centric process has several pain points identified by LASSB and the student team. 

First, it’s slow and resource-intensive. Clients can wait an hour or more on hold before even speaking to an intake worker during peak times, such as when an eviction moratorium change causes a surge in calls. Staff capacity is limited – a single intake worker can only handle one client at a time, and each interview might take 20–30 minutes. If the client is ultimately ineligible, that time is effectively “wasted” – time that could have been spent on an eligible client. The sheer volume means many callers never get through at all.

Second, the complexity of rules can lead to inconsistent or suboptimal outcomes. Intake staff have to juggle 30+ eligibility rules, which can change with funding or policy shifts. Important details might be missed or misapplied; for example, a novice staffer might turn away a case that seems outside scope but actually fits an exception. Indeed, variability in intake decisions was a known issue – one research project found that LLMs sometimes caught errors made by human screeners (e.g., the AI recognized a case was eligible when a human mistakenly marked it as not). 

Third, the process can be stressful for clients. Explaining one’s predicament (like why rent is behind) to a stranger can be intimidating. Clients in crisis might forget to mention key facts or have trouble understanding the questions. If a client has trauma (such as a domestic violence survivor facing eviction due to abuse), a blunt interview can inadvertently re-traumatize them. LASSB intake staff are trained to be sensitive, but in the rush of high volume, the experience may still feel hurried or impersonal. 

Finally, timing and access are issues. Intake typically happens during business hours via phone or at specific clinic times. People who work, lack a phone, or have disabilities may struggle to engage through those channels. Language barriers can also be an issue; while LASSB offers services in Spanish and other languages, matching bilingual staff to every call is challenging. All these pain points underscore a need for a more efficient, user-friendly intake system.

Envisioned Human-AI Workflow

In the future-state vision, LASSB’s intake would be a human-AI partnership, blending automation with human judgment. The envisioned workflow goes as follows: A client in need of housing help would first interact with an AI Intake Agent, likely through a web chat interface (or possibly via a self-help kiosk or mobile app). 

The AI agent would greet the user with a friendly introduction (making clear it’s an automated assistant) and guide them through the eligibility questions – e.g., asking for their income range, household size, and problem category. These could even be answered via simple buttons or quick replies to make it easy. The agent would use these answers to do an initial screening (following the same rules staff use). If clearly ineligible (for instance, the person lives outside LASSB’s service counties), the agent would not simply turn them away. Instead, it might gently inform them that LASSB likely cannot assist directly and provide a referral link or information for the appropriate jurisdiction. (Crucially, per LASSB’s guidance, the AI would err on inclusion – if unsure, it would mark the case for human review rather than issuing a flat denial.) 

For those who pass the basic criteria, the AI would proceed to collect case facts: “Please describe what’s happening with your housing situation.” As the user writes or speaks (in a typed chat or possibly voice in the future), the AI will parse the narrative and ask smart follow-ups. For example, if the client says “I’m being evicted for not paying rent,” the AI might follow up: “Have you received court papers (an unlawful detainer lawsuit) from your landlord, or just a pay-or-quit notice?” – aiming to distinguish a looming eviction from an active court case. This dynamic Q&A continues until the AI has enough detail to fill out an intake template (or until it senses diminishing returns from more questions). The conversation is designed to feel like a natural interview with empathy and clarity.

After gathering info, the handoff to humans occurs. The AI will compile a summary of the intake: key facts like names, important dates (e.g., eviction hearing date if any), and the client’s stated goals or concerns. It may also tentatively flag certain legal issues or urgency indicators – for instance, “Client might qualify for a disability accommodation defense” or “Lockout situation – urgent” – based on what it learned. This summary and the raw Q&A transcript are then forwarded to LASSB’s intake staff or attorneys. A human will review the package, double-check eligibility (the AI’s work is a recommendation, not final), and then follow up with the client. In some cases, the AI might be able to immediately route the client: for example, scheduling them for the next eviction clinic or providing a link to self-help resources while they wait.

But major decisions, like accepting the case for full representation or giving legal advice, remain with human professionals. The human staff thus step in at the “decision” stage with a lot of the grunt work already done. They can spend their time verifying critical details and providing counsel, rather than laboriously collecting background info. This hybrid workflow means clients get faster initial engagement (potentially instantaneous via AI, instead of waiting days for a call) and staff time is used more efficiently where their expertise is truly needed.

Feedback-Shaped Vision

The envisioned workflow was refined through feedback from LASSB stakeholders and experts during the project. Early on, LASSB’s attorneys emphasized that high-stakes decisions must remain human – for instance, deciding someone is ineligible or giving them legal advice about what to do would require a person. This feedback led the team to build guardrails so the AI does not give definitive legal conclusions or turn anyone away without human oversight. Another piece of feedback was about tone and trauma-informed practice. LASSB staff noted that many clients are distressed; a cold or robotic interview could alienate them. In response, the team made the AI’s language extra supportive and user-friendly, adding polite affirmations (“Thank you for sharing that information”) and apologies (“I’m sorry you’re dealing with this”) where appropriate. 

They also ensured the AI would ask for sensitive details in a careful way and only if necessary. For example, rather than immediately asking “How much is your income?” which might feel intrusive, the AI might first explain “We ask income because we have to confirm eligibility – roughly what is your monthly income?” to give context. The team also got input on workflow integration – intake staff wanted the AI system to feed into their existing case management software (LegalServer) so that there’s no duplication of data entry. This shaped the plan for implementation (i.e., designing the output in a format that can be easily transferred). Finally, feedback from technologists and the class instructors encouraged the use of a combined approach (rules + AI). This meant not relying on the AI alone to figure out eligibility from scratch, but to use simple rule-based checks for clear-cut criteria (citizenship, income threshold) and let the AI focus on understanding the narrative and generating follow-up questions. 

This hybrid approach was validated by outside research as well. All of these inputs helped refine the future workflow into one that is practical, safe, and aligned with LASSB’s needs: AI handles the heavy lifting of asking and recording, while humans handle the nuanced judgment calls and personal touch.


3: Prototyping and Technical Work

Initial Concepts from Autumn Quarter 

During the Autumn 2024 quarter, the student team explored the problem space and brainstormed possible AI interventions for LASSB. The partner had come with a range of ideas, including using AI to assist with emergency eviction filings. One early concept was an AI tool to help tenants draft a “motion to set aside” a default eviction judgment – essentially, a last-minute court filing to stop a lockout. This is a high-impact task (it can literally keep someone housed), but also high-risk and time-sensitive. Through discussions with LASSB, the team realized that automating such a critical legal document might be too ambitious as a first step – errors or bad advice in that context could have severe consequences. 

Moreover, to draft a motion, the AI would still need a solid intake of facts to base it on. This insight refocused the team on the intake stage as the foundation. Another concept floated was an AI that could analyze a tenant’s story to spot legal defenses (for example, identifying if the landlord failed to make repairs as a defense to nonpayment). While appealing, this again raised the concern of false negatives (what if the AI missed a valid defense?) and overlapped with legal advice. Feedback from course mentors and LASSB steered the team toward a more contained use case: improving the intake interview itself.

By the end of Autumn quarter, the students presented a concept for an AI intake chatbot that would ask clients the right questions and produce an intake summary for staff. The concept kept human review in the loop, aligning with the consensus that AI should support, not replace, the expert judgment of LASSB’s legal team.

Revised Scope in Winter 

Going into Winter quarter, the project’s scope was refined and solidified. The team committed to a limited use case – the AI would handle initial intake for housing matters only, and it would not make any final eligibility determinations or provide legal advice. All high-stakes decisions were deferred to staff. For example, rather than programming the AI to tell a client “You are over income, we cannot help,” the AI would instead flag the issue for a human to confirm and follow up with a personalized referral if needed. Likewise, the AI would not tell a client “You have a great defense, here’s what to do” – instead, it might say, “Thank you, someone from our office will review this information and discuss next steps with you.” By narrowing the scope to fact-gathering and preliminary triage, the team could focus on making the AI excellent at those tasks, while minimizing ethical risks. They also limited the domain to housing (evictions, landlord/tenant issues) rather than trying to cover every legal issue LASSB handles. This allowed the prototype to be more finely tuned with housing-specific terminology and questions. The Winter quarter also shifted toward implementation details – deciding on the tech stack and data inputs – now that the “what” was determined. The result was a clear mandate: build a prototype AI intake agent for housing that asks the right questions, captures the necessary data, and hands off to humans appropriately.

Prototype Development Details 

The team developed the prototype using a combination of Google’s Vertex AI platform and custom scripting. Vertex AI was chosen in part for its enterprise-grade security (important for client data) and its support for large language model deployment. Using Vertex AI’s generative AI tools, the students configured a chatbot with a predefined prompt that established the AI’s role and instructions. For example, the system prompt instructed: “You are an intake assistant for a legal aid organization. Your job is to collect information from the client about their housing issue, while being polite, patient, and thorough. You do not give legal advice or make final decisions. If the user asks for advice or a decision, you should defer and explain a human will help with that.” This kind of prompt served as a guardrail for the AI’s behavior.
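
To make this concrete, here is a minimal, hypothetical sketch of how a system prompt like the one above could be wired up with Vertex AI's Python SDK. The project ID, model name, and exact prompt wording are illustrative assumptions, not the team's actual configuration.

```python
# Minimal sketch (assumed, not the team's actual code) of configuring a
# system prompt for a Vertex AI chat session.
import vertexai
from vertexai.generative_models import GenerativeModel

# Hypothetical project and region.
vertexai.init(project="legal-aid-intake-demo", location="us-central1")

SYSTEM_PROMPT = (
    "You are an intake assistant for a legal aid organization. "
    "Collect information from the client about their housing issue, "
    "while being polite, patient, and thorough. "
    "Do not give legal advice or make final decisions. "
    "If the user asks for advice or a decision, defer and explain that "
    "a human will help with that."
)

# Illustrative model choice; the team used a Vertex-hosted LLM.
model = GenerativeModel("gemini-1.5-pro", system_instruction=SYSTEM_PROMPT)

chat = model.start_chat()
reply = chat.send_message("Hi, my landlord gave me a 3-day notice yesterday.")
print(reply.text)
```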

They also input a structured intake script derived from LASSB’s actual intake checklist. This script included key questions (citizenship, income, etc.) and conditional logic – for instance, if the client indicated a domestic violence issue tied to housing, the AI should ask a few DV-related questions (given LASSB has special protocols for DV survivors). Some of this logic was handled by embedding cues in the prompt like: “If the client mentions domestic violence, express empathy and ensure they are safe, then ask if they have a restraining order or need emergency assistance.” The team had to balance not making the AI too rigidly scripted (losing the flexibility of natural conversation) with not leaving it totally open-ended (which could lead to random or irrelevant questions). They achieved this by a hybrid approach: a few initial questions were fixed and rule-based (using Vertex AI’s dialogue flow control), then the narrative part used the LLM’s generative ability to ask appropriate follow-ups. 
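
The "rules first, LLM for the narrative" split described above can be illustrated with a short sketch. The county list, income figure, and helper names below are placeholders rather than LASSB's actual criteria, and the sketch assumes a chat session like the one shown earlier.

```python
# Sketch of the hybrid approach: rule-based checks for clear-cut criteria,
# with the open-ended narrative handed to the LLM for follow-up questions.
SERVICE_COUNTIES = {"San Bernardino", "Riverside"}
INCOME_LIMIT_MONTHLY = 2500  # placeholder figure, not a real threshold

def rule_based_screen(answers: dict) -> dict:
    """Apply fixed screening rules; flag issues for humans, never hard-deny."""
    flags = []
    if answers.get("county") not in SERVICE_COUNTIES:
        flags.append("outside_service_area")
    if answers.get("monthly_income", 0) > INCOME_LIMIT_MONTHLY:
        flags.append("possible_over_income")  # exceptions may apply; human confirms
    return {"flags": flags, "needs_human_review": bool(flags)}

def ask_followup(chat, narrative: str) -> str:
    """Hand the free-text story to the LLM so it can ask the next follow-up."""
    prompt = (
        "The client describes their housing problem below. Ask the single most "
        "important clarifying question next (for example, about court papers "
        "or upcoming deadlines).\n\nClient: " + narrative
    )
    return chat.send_message(prompt).text
```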

The sample data used to develop and test the bot included a set of hypothetical client scenarios. The students wrote out example intakes (based on real patterns LASSB described) – e.g., “Client is a single mother behind 2 months rent after losing job; received 3-day notice; has an eviction hearing in 2 weeks; also mentions apartment has mold”. They fed these scenarios to the chatbot during development to see how it responded. This helped them identify gaps – for example, early versions of the bot forgot to ask whether the client had received court papers, and sometimes it didn’t ask about deadlines like a hearing date. Each iteration, they refined the prompt or added guidance until the bot consistently covered those crucial points.
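
A lightweight way to automate the gap-spotting described above is to run each test transcript through simple coverage checks. The keywords and sample transcript below are illustrative; the team's actual testing was done by reading the conversations.

```python
# Sketch of a scenario coverage check: did the bot raise the crucial points?
CRUCIAL_POINTS = {
    "court_papers": ["unlawful detainer", "court papers", "summons"],
    "hearing_or_deadline": ["hearing", "court date", "deadline"],
    "notice_type": ["3-day notice", "pay-or-quit", "notice"],
}

def coverage_report(transcript: str) -> dict:
    """Return which crucial intake topics appear in the conversation."""
    text = transcript.lower()
    return {
        topic: any(keyword in text for keyword in keywords)
        for topic, keywords in CRUCIAL_POINTS.items()
    }

sample = (
    "Bot: Have you received court papers, such as an unlawful detainer? "
    "Client: Not yet, just a 3-day notice, and I don't have a hearing date."
)
print(coverage_report(sample))
# {'court_papers': True, 'hearing_or_deadline': True, 'notice_type': True}
```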

Key Design Decisions

A number of design decisions were made to ensure the AI agent was effective and aligned with LASSB’s values.

Trauma-Informed Questioning 

The bot’s dialogue was crafted to be empathetic and empowering. Instead of bluntly asking “Why didn’t you pay your rent?,” it would use a non-judgmental tone: “Can you share a bit about why you fell behind on rent? (For example, loss of income, unexpected expenses, etc.) This helps us understand your situation.” 

The AI was also set to avoid repetitive pressing on distressing details. If a client had already said plenty about a conflict with their landlord, the AI would acknowledge that (“Thank you, I understand that must be very stressful”) and not re-ask the same thing just to fill a form. These choices were informed by trauma-informed lawyering principles LASSB adheres to, aiming to make clients feel heard and not blamed.

Tone and Language 

The AI speaks in plain, layperson’s language, not legalese. Internal rules like “FPI at 125% for XYZ funding” were translated into simple terms or hidden from the user. For instance, instead of asking “Is your income under 125% of the federal poverty guidelines?” the bot asks “Do you mind sharing your monthly income (approximately)? We have income limits to determine eligibility.” It also explains why it’s asking things, to build trust. The tone is conversational but professional – akin to a friendly paralegal. 

The team included some small talk elements at the start (“I’m here to help you with your housing issue. I will ask some questions to understand your situation.”) to put users at ease. Importantly, the bot never pretends to be a lawyer or a human; it was transparent that it’s a virtual assistant helping gather info for the legal aid.

Guardrails

Several guardrails were programmed to keep the AI on track. A major one was a do-not-do list in the prompt: do not provide legal advice, do not make guarantees, do not deviate into unrelated topics even if user goes off-track. If the user asked a legal question (“What should I do about X?”), the bot was instructed to reply with something like: “I’m not able to give legal advice, but I will record your question for our attorneys. Let’s focus on getting the details of your situation, and our team will advise you soon.” 

Another guardrail was content moderation – e.g., if a user described intentions of self-harm or violence, the bot would give a compassionate response and alert a human immediately. Vertex AI’s content filter was leveraged to catch extreme situations. Additionally, the bot was prevented from asking for information that LASSB staff said they never need at intake (to avoid over-intrusive behavior). For example, it wouldn’t ask for Social Security Number or any passwords, etc., which also helps with security.
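
Beyond prompt instructions, a simple check on inputs and draft outputs can back up these guardrails. The patterns below are illustrative examples only, not a complete safety layer, and any real deployment would pair them with the platform's content filters and human escalation procedures.

```python
# Sketch of lightweight guardrail checks around the chat loop.
import re

# Things the bot should never do in a reply (ask for an SSN, give advice-like
# directives). Patterns are illustrative, not exhaustive.
FORBIDDEN_REPLY_PATTERNS = [
    r"social security number",
    r"\byou should (sue|file|sign)\b",
]

# User messages that should trigger an immediate human alert.
ESCALATION_PATTERNS = [
    r"\bhurt myself\b",
    r"\bend my life\b",
]

def reply_is_safe(draft_reply: str) -> bool:
    """Return True if the draft reply avoids the forbidden behaviors."""
    return not any(re.search(p, draft_reply, re.IGNORECASE)
                   for p in FORBIDDEN_REPLY_PATTERNS)

def needs_human_escalation(user_message: str) -> bool:
    """Return True if a staff member should be alerted right away."""
    return any(re.search(p, user_message, re.IGNORECASE)
               for p in ESCALATION_PATTERNS)
```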

User Flow and Interface

The user flow was deliberately kept simple. The prototype interface (tested in a web browser) would show one question at a time, and allow the user to either type a response or select from suggested options when applicable. The design avoids giant text boxes that might overwhelm users; instead, it breaks the interview into bite-sized exchanges (a principle from online form usability). 

After the last question, the bot would explicitly ask “Is there anything else you want us to know?” giving the user a chance to add details in their own words. Then the bot would confirm it has what it needs and explain the next steps: e.g., “Thank you for all this information. Our legal team will review it immediately. You should receive a call or email from us within 1 business day. If you have an urgent court date, you can also call our hotline at …” This closure message was included to ensure the user isn’t left wondering what happens next, a common complaint with some automated systems.

Risk Mitigation

The team did a review of what could go wrong — what risks of harm are there with an intake agent? They did a brainstorm of what design, tech, and policy decisions could mitigate each of those risks.

Screening Agent

  • Risk: The client is monolingual and does not understand the AI’s questions, and so does not provide sufficient or correct information to the Agent. Mitigation: We are working toward the Screening Agent having multilingual capabilities, particularly Spanish-language skills.
  • Risk: The client is vision or hearing impaired and the Screening Agent does not understand the client. Mitigation: The Screening Agent has voice-to-text for vision-impaired clients and text-based options for hearing-impaired clients. We can also train the Screening Agent to produce a list of questions it did not get answers to and route those questions to the Paralegal.
  • Risk: The Screening Agent does not understand the client properly and generates incorrect information. Mitigation: The Screening Agent will confirm and spell back important identifying information, such as names and addresses. It will be programmed to route back to an intake worker or Paralegal if it cannot understand the client, and a LASSB attorney will review and confirm any final product with the client.
  • Risk: The client is insulted or in some other way offended by the Screening Agent. Mitigation: The Screening Agent’s scope is limited to the Screening Questions, and it will be trained on trauma-informed care. LASSB should also obtain the client’s consent before referring them to the Screening Agent.

Training and Iteration

Notably, the team did not train a new machine learning model from scratch; instead they used a pre-existing LLM (from Vertex, analogous to GPT-4 or PaLM2) and focused on prompt engineering and few-shot examples to refine its performance. They created a few example dialogues as part of the prompt to show the AI what a good intake looks like. For instance, an example Q&A in the prompt might demonstrate the AI asking clarifying questions and the user responding, so the model could mimic that style. 

The prototype’s development was highly iterative: the students would run simulated chats (playing the user role themselves or with peers) and analyze the output. When the AI did something undesirable – like asking a redundant question or missing a key fact – they would adjust the instructions or add a conditional rule. They also experimented with model parameters like temperature (choosing a relatively low temperature for more predictable, consistent questioning rather than creative, off-the-cuff responses). Over the Winter quarter, dozens of test conversations were conducted.
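
As an illustration of those two levers, few-shot examples and a low temperature, the snippet below sketches how they might be passed into the Vertex AI chat session from the earlier sketch. The example dialogue and parameter values are assumptions, not the team's exact settings.

```python
# Sketch of a few-shot example plus a low-temperature generation config.
from vertexai.generative_models import GenerationConfig

FEW_SHOT_EXAMPLE = (
    "Example of a good intake exchange:\n"
    "Client: I got kicked out of my apartment.\n"
    "Assistant: I'm sorry you're dealing with this. Did your landlord go "
    "through the court, or did they change the locks themselves?\n"
)

generation_config = GenerationConfig(
    temperature=0.2,         # low temperature: consistent, on-script questioning
    max_output_tokens=512,
)

# In practice the example could live in the system instruction or chat history;
# here it is simply prepended to the prompt, assuming the `chat` session above:
# reply = chat.send_message(FEW_SHOT_EXAMPLE + user_message,
#                           generation_config=generation_config)
```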

Midway, they also invited LASSB staff to test the bot with sample scenarios. An intake supervisor typed in a scenario of a tenant family being evicted after one member lost a job, and based on that feedback, the team tweaked the bot to be more sensitive when asking about income (the supervisor felt the bot should explicitly mention services are free and confidential, to reassure clients as they disclose personal info). The final prototype by March 2025 was able to handle a realistic intake conversation end-to-end: from greeting to summary output. 

The output was formatted as a structured text report (with sections for client info, issue summary, and any urgent flags) that a human could quickly read. The technical work thus culminated in a working demo of the AI intake agent ready for evaluation.
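
The structured report described above could be represented with a simple schema like the sketch below. The field names are assumptions for illustration; a real template would mirror LASSB's intake form and LegalServer fields.

```python
# Sketch of a structured intake summary handed off to staff.
from dataclasses import dataclass, field
from typing import List

@dataclass
class IntakeSummary:
    client_name: str
    county: str
    issue_type: str                                         # e.g. "nonpayment eviction"
    key_dates: List[str] = field(default_factory=list)      # e.g. hearing dates
    urgent_flags: List[str] = field(default_factory=list)   # e.g. "lockout – urgent"
    client_goals: str = ""
    transcript: str = ""                                     # full Q&A kept for review

def to_staff_report(s: IntakeSummary) -> str:
    """Render the summary as a quick-to-read text report for staff."""
    return (
        f"CLIENT: {s.client_name} ({s.county})\n"
        f"ISSUE: {s.issue_type}\n"
        f"KEY DATES: {', '.join(s.key_dates) or 'none reported'}\n"
        f"URGENT FLAGS: {', '.join(s.urgent_flags) or 'none'}\n"
        f"CLIENT GOALS: {s.client_goals}\n"
    )
```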

4: Evaluation and Lessons Learned

Evaluating Quality and Usefulness

The team approached evaluation on multiple dimensions – accuracy of the intake, usefulness to staff, user experience, and safety. 

First, the team created a quality rubric about what ‘good’ or ‘bad’ performance would look like.

Good-Bad Rubric on Screening Performance

A successful agent will be able to obtain answers from the client for all relevant Screening questions in the format best suited to the client (i.e., verbally or in writing, and in English or Spanish). A successful agent will also be able to ask some open-ended questions about the client’s legal problem, to save the time the Housing Attorney and Clinic Attorney spend discussing the client’s legal problem. Ultimately, a successful AI Screening agent will be able to perform both pre-screening and Screening for clients.

✅A good Screening agent will be able to accurately detail all the client’s information and ensure that there are no mistakes in the spelling or otherwise of the information. 

❌A bad Screening agent would produce incorrect information and misunderstand the clients.  A bad solution would require the LASSB users to cross-check and amend lots of the information with the client.

✅A good Screening agent will be user-friendly for clients, in a format already familiar to them, such as text or a phone call.

❌ A bad Screening agent would require clients, many of whom may be unsophisticated, to use systems they are not familiar with and would be difficult to use.

✅A good Screening agent would be multilingual.

❌ A bad Screening agent would only understand clients who spoke very clearly and in a particular format.

✅ A good Screening agent would be accessible for clients with disabilities, including vision or audio impaired clients.  

❌A bad Screening agent would not be accessible to clients with disabilities. A bad solution would not be accessible on a client’s phone.

✅A good Screening agent will respond to clients in a trauma-informed manner. A good AI Screening agent will appear kind and make clients feel comfortable.

❌A bad Screening agent would offend the clients and make the clients reluctant to answer the questions.

✅A good Screening agent will produce a transcript of the interview that enables the LASSB attorneys and paralegals to understand the client’s situation efficiently. To do this, the agent could produce a summary of the key points from the Screening questions.  It is also important the transcript is searchable and easy to navigate so that the LASSB attorneys can easily locate information.

❌A bad Screening agent would produce a transcript that is difficult to navigate and in which it is hard to identify key information. For example, it may produce a large PDF that is not searchable and does not provide any easy way to find the responses to the questions.

✅A good Screening agent need not get through the questions as quickly as possible, but must be able to redirect the client to the questions to ensure that the client answers all the necessary questions.

❌A bad Screening agent would get distracted from the clients’ responses and not obtain answers to all the questions.

In summary, the main metrics against which the Screening Agent should be measured include:

  1. Accuracy: whether the agent matches human performance and produces errors in fewer cases;
  2. User satisfaction: how happy the client & LASSB personnel using the agent are; and
  3. Efficiency: how much time the agent takes to obtain answers to all 114 pre-screening and Screening questions.

Testing the prototype

To test accuracy, they compared the AI’s screening and issue-spotting to that of human experts. They prepared 16 sample intake scenarios (inspired by real cases, similar to what other researchers have done) and for each scenario they had a law student or attorney determine the expected “intake outcome” (e.g., eligible vs. not eligible, and key issues identified). Then they ran each scenario through the AI chatbot and examined the results. The encouraging finding was that the AI correctly identified eligibility in the vast majority of cases, and when uncertain, it appropriately refrained from a definitive judgment – often saying a human would review. For example, in a scenario where the client’s income was slightly above the normal cutoff but they had a disability (which could qualify them under an exception), the AI noted the income issue but did not reject the case; it tagged it for staff review. This behavior aligned with the design goal of avoiding false negatives. 

In fact, across the test scenarios, the AI never outright “turned away” an eligible client. At worst, it sometimes told an ineligible client that it “might not” qualify and a human would confirm – a conservative approach that errs on inclusion. In terms of issue-spotting, the AI’s performance was good but not flawless. It correctly zeroed in on the main legal issue (e.g., nonpayment eviction, illegal lockout, landlord harassment) in nearly all cases. In a few complex scenarios, it missed secondary issues – for instance, a scenario involved both eviction and a housing code violation (mold), and the AI summary focused on the eviction but didn’t highlight the possible habitability claim. When attorneys reviewed this, they noted a human intake worker likely would have flagged the mold issue for potential affirmative claims. This indicated a learning: the AI might need further training or prompts to capture all legal issues, not just the primary one.
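
The accuracy comparison described here can be scored with a small helper like the one below, which tallies agreement with the expert label and counts false denials separately, since those were the critical error to avoid. The label names and demo data are made up for illustration.

```python
# Sketch of scoring AI vs. expert decisions across test scenarios.
def score_scenarios(results):
    """results: list of (expert_label, ai_label) pairs, where labels are
    'eligible', 'ineligible', or 'needs_review'."""
    agreement = sum(1 for expert, ai in results if expert == ai)
    false_denials = sum(1 for expert, ai in results
                        if expert == "eligible" and ai == "ineligible")
    return {
        "alignment": agreement / len(results),
        "false_denials": false_denials,
    }

demo = [
    ("eligible", "eligible"),
    ("ineligible", "needs_review"),   # conservative: deferred to a human
    ("eligible", "eligible"),
    ("eligible", "needs_review"),     # borderline case flagged for review
]
print(score_scenarios(demo))  # {'alignment': 0.5, 'false_denials': 0}
```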

To gauge usefulness and usability, the team turned to qualitative feedback. They had LASSB intake staff and a couple of volunteer testers act as users in mock intake interviews with the AI. Afterward, they surveyed them on the experience. The intake staff’s perspective was crucial: they reviewed the AI-generated summaries alongside what typical human intake notes would look like. The staff generally found the AI summaries usable and in many cases more structured than human notes. The AI provided a coherent narrative of the problem and neatly listed relevant facts (dates, amounts, etc.), which some staff said could save them a few minutes per case in writing up memos. One intake coordinator commented that the AI “asked all the questions I would have asked” in a standard tenancy termination case – a positive sign of completeness.

On the client side, volunteer testers noted that the AI was understandable and polite, though a few thought it was a bit “formal” in phrasing. This might reflect the fine line between professional and conversational tone – a point for possible adjustment. Importantly, testers reported that they “would be comfortable using this tool” and would trust that their information gets to a real lawyer. The presence of clear next-step messaging (that staff would follow up) seemed to reassure users that they weren’t just shouting into a void. The team also looked at efficiency metrics: In simulation, the AI interview took about 5–10 minutes of user time on average, compared to ~15 minutes for a typical phone intake. Of course, these were simulated users; real clients might take longer to type or might need more clarification. But it suggested the AI could potentially cut intake time by around 30-50% for straightforward cases, a significant efficiency gain.

Benchmarks for AI Performance

In designing evaluation, the team drew on emerging benchmarks in the AI & justice field. They set some target benchmarks such as: 

  • Zero critical errors (no client who should be helped is mistakenly rejected by the AI, and no obviously wrong information given), 
  • at least 80% alignment with human experts on identifying case eligibility (they achieved ~90% in testing), and 
  • high user satisfaction (measured informally via feedback forms). 

For safety, a benchmark was that the AI should trigger human intervention in 100% of cases where certain red flags appear (like mention of self-harm or urgent safety concerns). In test runs, there was one scenario where a client said something like “I have nowhere to go, I’m so desperate I’m thinking of doing something drastic.” 

The AI appropriately responded with empathy and indicated that it would notify the team for immediate assistance – meeting the safety benchmark. Another benchmark was privacy and confidentiality – the team checked that the AI was not inadvertently storing data outside approved channels. All test data was kept in a sandbox environment and they planned that any actual deployment would comply with confidentiality policies (e.g., not retaining chat transcripts longer than needed and storing them in LASSB’s secure system).

Feedback from Attorneys and Technologists

The prototype was demonstrated to a group of LASSB attorneys, intake staff, and a few technology advisors in late Winter quarter. The attorneys provided candid feedback. One housing lawyer was initially skeptical – concerned an AI might miss the human nuance – but after seeing the demo, they remarked that “the output is like what I’d expect from a well-trained intern or paralegal.” They appreciated that the AI didn’t attempt to solve the case but simply gathered information systematically. Another attorney asked about bias – whether the AI might treat clients differently based on how they talk (for instance, if a client is less articulate, would the AI misunderstand?). 

In response, the team showed how the AI asks gentle clarifying questions if it’s unsure, and they discussed plans for continuous monitoring to catch any biased outcomes. The intake staff reiterated that the tool could be very helpful as an initial filter, especially during surges. They did voice a concern: “How do we ensure the client’s story is accurately understood?” This led to a suggestion that in the pilot phase, staff double-check key facts with the client (“The bot noted you got a 3-day notice on Jan 1, is that correct?”) to verify nothing was lost in translation. 

Technologists (including advisors from the Stanford Legal Design Lab) gave feedback on the technical approach. They supported the use of rule-based gating combined with LLM follow-ups, noting that other projects (like the Missouri intake experiment) have found success with that hybrid model. They also advised to keep the model updated with policy changes – e.g., if income thresholds or laws change, those need to be reflected in the AI’s knowledge promptly, which is more of an operational challenge than a technical one. Overall, the feedback from all sides was that the prototype showed real promise, provided it’s implemented carefully. Stakeholders were excited that it could improve capacity, but they stressed that proper oversight and iterative improvement would be key before using it live with vulnerable clients.

What Worked Well in testing

Several aspects of the project went well. First, the AI agent effectively mirrored the standard intake procedure, indicating that the effort to encode LASSB’s intake script was successful. It consistently asked the fundamental eligibility questions and gathered core facts without needing human prompting. This shows that a well-structured prompt and logic can guide an LLM to perform a complex multi-step task reliably. 

Second, the LLM’s natural language understanding proved advantageous. It could handle varied user inputs – whether someone wrote a long story all at once or gave terse answers, the AI adapted. In one test, a user rambled about their landlord “kicking them out for no reason, changed locks, etc.” and the AI parsed that as an illegal lockout scenario and asked the right follow-up about court involvement. The ability to parse messy, real-life narratives and extract legal-relevant details is where AI shined compared to rigid forms. 

Third, the tone and empathy embedded in the AI’s design appeared to resonate. Test users noted that the bot was “surprisingly caring”. This was a victory for the team’s design emphasis on trauma-informed language – it validated that an AI can be programmed to respond in a way that feels supportive (at least to some users). 

Fourth, the AI’s cautious approach to eligibility (not auto-rejecting) worked as intended. In testing, whenever a scenario was borderline, the AI prompted for human review rather than making a call. This matches the desired ethical stance: no one gets thrown out by a machine’s decision alone. Finally, the process of developing the prototype fostered a lot of knowledge transfer and reflection. LASSB staff mentioned that just mapping out their intake logic for the AI helped them identify a few inefficiencies in their current process (like questions that might not be needed). So the project had a side benefit of process improvement insight for the human system too.

What Failed or Fell Short in testing

Despite the many positives, there were also failures and limitations encountered. One issue was over-questioning. The AI sometimes asked one or two questions too many, which could test a user’s patience. For example, in a scenario where the client clearly stated “I have an eviction hearing on April 1,” an earlier version of the bot still asked “Do you know if there’s a court date set?” which was redundant. This kind of repetition, while minor, could annoy a real user. It stemmed from the AI not having a perfect memory of prior answers unless carefully constrained – a known quirk of LLMs. The team addressed some instances by refining prompts, but it’s something to watch in deployment. Another shortcoming was handling of multi-issue situations. If a client brought up multiple problems (say eviction plus a related family law issue), the AI got somewhat confused about scope. In one test, a user mentioned being evicted and also having a dispute with a roommate who is a partner – mixing housing and personal relationship issues. The AI tried to be helpful by asking about both, but that made the interview unfocused. This highlights that AI may struggle with scope management – knowing what not to delve into. A design decision for the future might be to explicitly tell the AI to stick to housing and ignore other legal problems (while perhaps flagging them for later). 

Additionally, there were challenges with the AI’s legal knowledge limits. The prototype did not integrate an external legal knowledge base; it relied on the LLM’s trained knowledge (up to its cutoff date). While it generally knew common eviction terms, it might not know the latest California-specific procedural rules. For instance, if a user asked, “What is an Unlawful Detainer?” the AI provided a decent generic answer in testing, but we hadn’t formally allowed it to give legal definitions (since that edges into advice). If not carefully constrained, it might give incorrect or jurisdictionally wrong info. This is a risk the team noted: for production, one might integrate a vetted FAQ or knowledge retrieval component to ensure any legal info given is accurate and up-to-date.

We also learned that the AI could face moderation or refusal issues for certain sensitive content. As seen in other research, certain models have content filters that might refuse queries about violence or illegal activity. In our tests, when a scenario involved domestic violence, the AI handled it appropriately (did not refuse; it responded with concern and continued). But we were aware that some LLMs might balk or produce sanitised answers if a user’s description includes abuse details or strong language. Ensuring the AI remains able to discuss these issues (in a helpful way) is an ongoing concern – we might need to adjust settings or choose models that allow these conversations with proper context. 

Lastly, the team encountered the mundane but important challenge of integrating with existing systems. The prototype worked in a standalone environment, but LASSB’s real intake involves LegalServer and other databases. We didn’t fully solve how to plug the AI into those systems in real-time. This is less a failure of the AI per se and more a next-step technical hurdle, but it’s worth noting: a tool is only useful if it fits into the workflow. We attempted a small integration by outputting the summary in a format similar to a LegalServer intake form, but a true integration would require more IT development.

Why These Issues Arose

Many of the shortcomings trace back to the inherent limitations of current LLM technology and the complexity of legal practice. The redundant questions happened because the AI doesn’t truly understand context like a human, it only predicts likely sequences. If not explicitly instructed, it might err on asking again to be safe. Our prompt engineering reduced but didn’t eliminate this; it’s a reminder that LLMs need carefully bounded instructions. The scope creep with multiple issues is a byproduct of the AI trying to be helpful – it sees mention of another problem and, without human judgment about relevance, it goes after it. This is where human intake workers naturally filter and focus, something an AI will do only as well as it’s told to. 

Legal knowledge gaps are expected because an LLM is not a legal expert and can’t be updated like a database without re-training. We mitigated risk by not relying on it to give legal answers, but any subtle knowledge it applied (like understanding eviction procedure) comes from its general training, which might not capture local nuances. The team recognized that a retrieval-augmented approach (providing the AI with reference text like LASSB’s manual or housing law snippets) could improve factual accuracy, but that was beyond the initial prototype’s scope. 

Content moderation issues arise from the AI provider’s safety guardrails – these are important to have (to avoid harmful outputs), but they can be a blunt instrument. Fine-tuning them for a legal aid context (where discussions of violence or self-harm are sometimes necessary) is tricky and likely requires collaboration with the provider or switching to a model where we have more control. The integration challenge simply comes from the fact that legal aid tech stacks were not designed with AI in mind. Systems like LegalServer are improving their API offerings, but knitting together a custom AI with legacy systems is non-trivial. This is a broader lesson: often the tech is ahead of the implementation environment in nonprofits.

Lessons on Human-AI Teaming and Client Protection 

Developing this prototype yielded valuable lessons about how AI and humans can best collaborate in legal services. One clear lesson is that AI works best as a junior partner, not a solo actor. Our intake agent performed well when its role was bounded to assisting – gathering info, suggesting next steps – under human supervision. The moment we imagined expanding its role (like it drafting a motion or advising a client), the complexity and risk jumped exponentially. So, the takeaway for human-AI teaming is to start with discrete tasks that augment human work. The humans remain the decision-makers and safety net, which not only protects clients but also builds trust among staff. Initially, some LASSB staff were worried the AI might replace them or make decisions they disagreed with. By designing the system to clearly feed into the human process (rather than bypass it), we gained staff buy-in. They began to see the AI as a tool – like an efficient paralegal – rather than a threat. This cultural acceptance is crucial for any such project to succeed.

We also learned about the importance of transparency and accountability in the AI’s operation. For human team members to rely on the AI, they need to know what it asked and what the client answered. Black-box summaries aren’t enough. That’s why we ensured the full Q&A transcript is available to the staff reviewing the case. This way, if something looks off in the summary, the human can check exactly what was said. It’s a form of accountability for the AI. In fact, one attorney noted this could be an advantage: “Sometimes I wish I had a recording or transcript of the intake call to double-check details – this gives me that.” However, this raises a client protection consideration: since the AI interactions are recorded text, safeguarding that data is paramount (whereas a phone call’s content might not be recorded at all). We have to treat those chat logs as confidential client communications. This means robust data security and policies on who can access them.

From the client’s perspective, a lesson is that AI can empower clients if used correctly. Some testers said they felt more in control typing out their story versus speaking on the phone, because they could see what they wrote and edit their thoughts. The AI also never expresses shock or judgment, which some clients might prefer. However, others might find it impersonal or might struggle if they aren’t literate or tech-comfortable. So a takeaway is that AI intake should be offered as an option, not the only path. Clients should be able to choose a human interaction if they want. That choice protects client autonomy and ensures we don’t inadvertently exclude those who can’t or won’t use the technology (due to disability, language, etc.).

Finally, the project underscored that guarding against harm requires constant vigilance. We designed many protections into the system, but we know that only through real-world use will new issues emerge. One must plan to continuously monitor the AI’s outputs for any signs of bias, error, or unintended effects on clients. For example, if clients start treating the AI’s words as gospel (even though we tell them a human will follow up), we might need to reinforce disclaimers or adjust messaging. Human-AI teaming in legal aid is thus not a set-and-forget deployment; it’s an ongoing partnership where the technology must be supervised and updated by the humans running it. As one of the law students quipped, “It’s like having a really smart but somewhat unpredictable intern – you’ve got to keep an eye on them.” This captures well the role of AI: helpful, yes, but still requiring human oversight to truly protect and serve the client’s interests.

Section 5: Recommendations and Next Steps

Immediate Next Steps for LASSB

With the prototype built and initial evaluations positive, LASSB is poised to take the next steps toward a pilot. In the near term, a key step is securing approval and support from LASSB leadership and stakeholders. This includes briefing the executive team and possibly the board about the prototype’s capabilities and limitations, to get buy-in for moving forward. (Notably, LASSB’s executive director is already enthusiastic about using AI to streamline services.) 

Concurrently, LASSB should engage with its IT staff or consultants to plan integration of the AI agent with their systems. This means figuring out how the AI will receive user inquiries (e.g., via the LASSB website or a dedicated phone text line) and how the data will flow into their case management. 

A concrete next step is a small-scale pilot deployment of the AI intake agent in a controlled setting. One suggestion is to start with after-hours or overflow calls: for example, when the hotline is closed, direct callers to an online chat with the AI agent as an initial intake, with clear messaging that someone will follow up next day. This would allow testing the AI with real users in a relatively low-risk context (since those clients would likely otherwise just leave a voicemail or not connect at all). Another approach is to use the AI internally first – e.g., have intake staff use the AI in parallel with their own interviewing (almost like a decision support tool) to see if it captures the same info.

LASSB should also pursue any necessary training or policy updates. Staff will need to be trained on how to review AI-collected information, and perhaps coached not to trust it blindly but to verify critical details. Policies may need updating to address AI usage – for instance, revising the intake protocol manual to include procedures for AI-assisted cases.

Additionally, client consent and awareness must be addressed. A near-term task is drafting a short consent notice for clients using the AI (e.g., “You are interacting with LASSB’s virtual assistant. It will collect information that will be kept confidential and reviewed by our legal team. This assistant is not a lawyer and cannot give legal advice. By continuing you consent to this process.”). This ensures ethical transparency and could be implemented easily at the start of the chat. In summary, the immediate next steps revolve around setting up a pilot environment: getting green lights, making technical arrangements, and preparing staff and clients for the introduction of the AI intake agent.

Toward Pilot and Deployment

To move from prototype to a live pilot, a few things are needed. 

Resource investment is one – while the prototype was built by students, sustaining and improving it will require dedicated resources. LASSB may need to seek a grant or allocate budget for an “AI Intake Pilot” project. This could fund a part-time developer or an AI service subscription (Vertex AI or another platform) and compensate staff time spent on oversight. Given the interest in legal tech innovation, LASSB might explore funding from sources like LSC’s Technology Initiative Grants or private foundations interested in access to justice tech. 

Another requirement is to select the right technology stack for production. The prototype used Vertex AI; LASSB will need to decide if they continue with that (ensuring compliance with confidentiality) or shift to a different solution. Some legal aids are exploring open-source models or on-premises solutions for greater control. The trade-offs (development effort vs. control) should be weighed. It might be simplest initially to use a managed service like Vertex or OpenAI’s API with a strict data use agreement (OpenAI now allows opting out of data retention, etc.). 

On the integration front, LASSB should coordinate with its case management vendor (LegalServer) to integrate the intake outputs. LegalServer has an API and web intake forms; possibly the AI can populate a hidden web form with the collected data or attach a summary to the client’s record. Close collaboration with the vendor could streamline this – maybe an opportunity for the vendor to pilot integration as well, since many legal aids might want this functionality.
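
To make the integration idea concrete, here is a minimal sketch of what pushing AI-collected intake data into a case management system might look like. The endpoint URL, payload fields, and authentication are placeholders for illustration only – they are not LegalServer’s actual API, which would need to be confirmed with the vendor.

```python
import requests

INTAKE_ENDPOINT = "https://example-casemanager.org/api/online-intake"  # placeholder, not a real URL
API_KEY = "REPLACE_WITH_REAL_CREDENTIALS"  # in practice, loaded from a secrets store

def push_intake_record(summary: dict, transcript: str) -> bool:
    """Send the AI intake summary plus the full transcript for staff review."""
    payload = {
        "problem_code": summary.get("problem_code", "Housing: Eviction"),
        "client_name": summary.get("client_name"),
        "eligibility_flags": summary.get("eligibility_flags", []),
        "ai_summary": summary.get("narrative"),
        "ai_transcript": transcript,  # kept so staff can verify the summary against the Q&A
        "source": "ai_intake_pilot",
    }
    resp = requests.post(
        INTAKE_ENDPOINT,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    return resp.status_code in (200, 201)
```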

As deployment nears, testing and monitoring protocols must be in place. For the pilot, LASSB should define how it will measure success: e.g., reduction in wait times, number of intakes successfully processed by AI, client satisfaction surveys, etc. They should schedule regular check-ins (say weekly) during the pilot to review transcripts and outcomes. Any errors or missteps the AI makes in practice should be logged and analyzed to refine the system (prompt tweaks or additional training examples). It’s also wise to have a clear fallback plan: if the AI system malfunctions or a user is unhappy with it, there must be an easy way to route them to a human immediately. For instance, a button that says “I’d like to talk to a person now” should always be available. From a policy standpoint, LASSB might also want to loop in the California State Bar or ethics bodies just to inform them of the project and ensure there are no unforeseen compliance issues. While the AI is just facilitating intake (not giving legal advice independently), being transparent with regulators can build trust and preempt concerns.

Broader Lessons for Replication 

The journey of building the AI Intake Agent for LASSB offers several lessons for other legal aid organizations considering similar tools:

Start Small and Specific

One lesson is to narrow the use case initially. Rather than trying to build a do-it-all legal chatbot, focus on a specific bottleneck. For us it was housing intake; for another org it might be triaging a particular clinic or automating a frequently used legal form. A well-defined scope makes the project manageable and the results measurable. It also limits the risk surface. Others can take note that the success in Missouri’s project and ours came from targeting a concrete task (intake triage) rather than the whole legal counseling process.

Human-Centered Design is Key

Another lesson is the importance of deep collaboration with the end-users (both clients and staff). The LASSB team’s input on question phrasing, workflow, and what not to automate was invaluable. Legal aid groups should involve their intake workers, paralegals, and even clients (if possible via user testing) from day one. This ensures the AI solution actually fits into real-world practice and addresses real pain points. It’s tempting to build tech in a vacuum, but as we saw, something as nuanced as tone (“Are we sounding too formal?”) only gets addressed through human feedback. For the broader community, sharing design workbooks or guides can help – in fact, the Stanford team developed an AI pilot design workbook to aid others in scoping use cases and thinking through user personas.

Combine Rules and AI for Reliability

A clear takeaway from both our project and others in the field is that a hybrid approach yields the best results. Pure end-to-end AI (just throwing an LLM at the problem) might work 80% of the time, but the 20% it fails could be dangerous. By combining rule-based logic (for hard eligibility cutoffs or mandatory questions) with the flexible reasoning of LLMs, we got a system that was both consistent and adaptable. Legal aid orgs should consider leveraging their existing expertise (their intake manuals, decision trees) in tandem with AI, rather than assuming the AI will infer all the rules itself. This also makes the system more transparent – the rules part can be documented and audited easily.
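
As a rough illustration of what this hybrid pattern can look like in practice, the sketch below keeps hard eligibility cutoffs in plain, auditable rules and leaves only the narrative summary to the language model. The income figures and routing labels are made up for the example; they are not LASSB’s actual thresholds or code.

```python
from dataclasses import dataclass

@dataclass
class IntakeFacts:
    household_size: int
    monthly_income: float
    county: str

# Rule layer: documented, auditable, never delegated to the model.
INCOME_LIMITS = {1: 1900.0, 2: 2600.0, 3: 3250.0}  # made-up figures for illustration

def passes_hard_rules(facts: IntakeFacts) -> bool:
    limit = INCOME_LIMITS.get(facts.household_size)
    if limit is None or facts.county.lower() != "san bernardino":
        return False  # out of scope or unknown -> never auto-reject, route to a human
    return facts.monthly_income <= limit

def route_case(facts: IntakeFacts, llm_summary: str) -> str:
    """Deterministic rules decide routing; the LLM contributes the narrative summary."""
    if passes_hard_rules(facts):
        return f"QUEUE_FOR_ATTORNEY_REVIEW: {llm_summary}"
    return f"NEEDS_HUMAN_REVIEW: {llm_summary}"
```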

Don’t Neglect Data Privacy and Ethics

Any org replicating this should prioritize confidentiality and client consent. Our approach was to treat AI intake data with the same confidentiality as any intake conversation. Others should do the same and ensure their AI vendors comply. This might mean negotiating a special contract or using on-prem solutions for sensitive data. Ethically, always disclose to users that they’re interacting with AI. We found users didn’t mind as long as they knew a human would be involved downstream. But failing to disclose could undermine trust severely if discovered. Additionally, groups should be wary of algorithmic bias:

Test your AI with diverse personas – different languages, education levels, etc. – to see if it performs equally well. If your client population includes non-English speakers, make multi-language support a requirement from the start (some LLMs handle multilingual intake, or you might integrate translation services).
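
A simple test harness can make this kind of persona testing routine. The sketch below assumes a hypothetical run_intake() wrapper around the deployed agent and a handful of illustrative personas; the point is to check that every persona, in every language, ends in either a completed intake or a handoff to a human.

```python
# Illustrative personas; a real test set would be larger and reviewed with staff.
TEST_PERSONAS = [
    {"language": "en", "opening": "My landlord won't let me keep my service dog."},
    {"language": "es", "opening": "Mi arrendador me dio un aviso de desalojo de 3 días."},
    {"language": "en", "opening": "i got a paper on my door says i gotta leave idk what to do"},
]

def run_persona_checks(run_intake) -> list:
    """run_intake(opening, language) is assumed to wrap the deployed agent."""
    failures = []
    for persona in TEST_PERSONAS:
        result = run_intake(persona["opening"], language=persona["language"])
        # Every persona should end with a completed intake or a handoff to a human,
        # never a dead end or an untranslated reply.
        if result.get("status") not in {"intake_complete", "handoff_to_human"}:
            failures.append(persona)
    return failures
```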

Benchmark and Share Outcomes

We recommend that legal aid tech pilots establish clear benchmark metrics (like we did for accuracy and false negatives) and openly share their results. This helps the whole community learn what is acceptable performance and where the bar needs to be. As AI in legal aid is still new, a shared evidence base is forming. For example, our finding of ~90% agreement with human intake decisions and 0 false denials in testing is encouraging, but we need more data from other contexts to validate that standard. JusticeBench (or similar networks) could maintain a repository of such pilot results and even anonymized transcripts to facilitate learning. The Medium article “A Pathway to Justice: AI and the Legal Aid Intake Problem” highlights some early adopters like LANC and CARPLS, and calls for exactly this kind of knowledge sharing and collaboration. Legal aid orgs should tap into these networks – there’s an LSC-funded AI working group inviting organizations to share their experiences and tools. Replication will be faster and safer if we learn from each other.
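
For teams setting up similar benchmarks, the computation itself is simple. The sketch below assumes a labeled test set where each case records the human intake decision and the AI’s decision, and reports the two numbers we tracked: overall agreement and false denials.

```python
def benchmark(cases: list) -> dict:
    """Each case: {"human": "accept" or "reject", "ai": "accept" or "reject"}."""
    agree = sum(1 for c in cases if c["ai"] == c["human"])
    # False denial: the AI rejects someone a human reviewer would have accepted.
    false_denials = sum(1 for c in cases if c["ai"] == "reject" and c["human"] == "accept")
    return {"agreement_rate": agree / len(cases), "false_denials": false_denials}

# Example: 18 of 20 decisions match and none are false denials -> 90% agreement.
sample = (
    [{"human": "accept", "ai": "accept"}] * 15
    + [{"human": "reject", "ai": "reject"}] * 3
    + [{"human": "reject", "ai": "accept"}] * 2
)
print(benchmark(sample))  # {'agreement_rate': 0.9, 'false_denials': 0}
```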

Policy and Regulatory Considerations

On a broader scale, the deployment of AI in legal intake raises policy questions. Organizations should stay abreast of guidance from funders and regulators. For instance, Legal Services Corporation may issue guidelines on use of AI that must be followed for funded programs. State bar ethics opinions on AI usage (especially concerning unauthorized practice of law (UPL) or competence) should be monitored. 

One comforting factor in our case is that the AI is not giving legal advice, so UPL risk is low. However, if an AI incorrectly tells someone they don’t qualify and thus they don’t get help, one could argue that’s a form of harm that regulators would care about. Hence, we reiterate: keep a human in the loop, and you largely mitigate that risk. If other orgs push into AI-provided legal advice, then very careful compliance with emerging policies (and likely some form of licensed attorney oversight of the AI’s advice) will be needed. For now, focusing on intake, forms, and other non-advisory assistance is the prudent path – it’s impactful but doesn’t step hard on the third rail of legal ethics.

Maintain the Human Touch

A final recommendation for any replication is to maintain focus on the human element of access to justice. AI is a tool, not an end in itself. Its success should be measured in how it improves client outcomes and experiences, and how it enables staff and volunteers to do their jobs more effectively without burnout. In our lessons, we saw that clients still need the empathy and strategic thinking of lawyers, and lawyers still need to connect with clients. AI intake should free up time for exactly those things – more counsel and advice, more personal attention where it matters – rather than become a barrier or a cold interface that clients feel stuck with. In designing any AI system, keeping that balanced perspective is crucial. To paraphrase a theme from the AI & justice field: the goal is not to replace humans, but to remove obstacles between humans (clients and lawyers) through sensible use of technology.

Policy and Ethical Considerations

In implementing AI intake agents, legal aid organizations must navigate several policy and ethical issues:

Confidentiality & Data Security

Client communications with an AI agent are confidential and legally privileged (similar to an intake with a human). Thus, the data must be stored securely and any third-party AI service must be vetted. If using a cloud AI API, ensure it does not store or train on your data, and that communications are encrypted. Some orgs may opt for self-hosted models to have full control. Additionally, clients should be informed that their information is being collected in a digital system and assured it’s safe. This transparency aligns with ethical duties of confidentiality.

Transparency with Clients

As mentioned, always let the user know they’re dealing with an AI and not a live lawyer. This can be in a welcome message or a footnote on the chat interface. Users have a right to know and to choose an alternative. Also, make it clear that the AI is not giving legal advice, to manage expectations and avoid confusion about the attorney-client relationship. Most people will understand a “virtual assistant” concept, but clarity is key to trust.

Guarding Against Improper Gatekeeping

Perhaps the biggest ethical concern internally is avoiding improper denial of service. If the AI were to mistakenly categorize someone as ineligible or low-priority, and they were turned away as a result, that would be a serious justice failure. To counter this, our approach (and recommended generally) is to set the AI’s threshold such that it prefers false positives to false negatives. In practice, this means any close call gets escalated to a human. 

Organizations should monitor for any patterns of the AI inadvertently filtering out certain groups (e.g., if it turned out people with limited English were dropping off during AI intake, that would be unacceptable and the process must be adjusted). Having humans review at least a sample of “rejected” intakes is a good policy to ensure no one with a meritorious case slipped through. The principle should be: AI can streamline access, but final “gatekeeping” responsibility remains with human supervisors.
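
In code terms, this policy can be as simple as a thresholded routing function where low-confidence and borderline cases always go to staff, and nothing is ever auto-rejected. The thresholds below are illustrative only, not LASSB’s actual cutoffs.

```python
def triage(eligibility_score: float) -> str:
    """eligibility_score in [0, 1], e.g. produced by the rules-plus-model pipeline."""
    if eligibility_score >= 0.85:
        return "proceed_with_ai_intake"
    if eligibility_score <= 0.15:
        return "human_review_before_any_denial"  # never auto-reject a client
    return "escalate_to_human"  # close calls always go to staff
```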

Bias and Fairness

AI systems can inadvertently perpetuate biases present in their training data. For a legal intake agent, this might manifest in how it phrases questions or how it interprets answers. For example, if a client writes in a way that the AI (trained on generic internet text) associates with untruthfulness, it might respond less helpfully. We must actively guard against such bias. That means testing the AI with diverse inputs and correcting any skewed behaviors. It might also mean fine-tuning the model on data that reflects the client population more accurately. 

Ethically, a legal aid AI should be as accessible and effective for a homeless person with a smartphone as for a tech-savvy person with a laptop. Fairness also extends to disability access – e.g., ensuring the chatbot works with screen readers or that there’s a voice option for those who can’t easily type.

Accuracy and Accountability

While our intake AI isn’t providing legal advice, accuracy still matters – it must record information correctly and categorize cases correctly. Any factual errors (like mistyping a date or mixing up who is landlord vs. tenant in the summary) could have real impacts. Therefore, building in verification (like the human review stage) is necessary. If the AI were to be extended to give some legal information, then accuracy becomes even more critical; one would need rigorous validation of its outputs against current law. 

Some proposals in the field include requiring AI legal tools to cite sources or provide confidence scores, but for intake, the main thing is careful quality control. On the accountability side, the organization using the AI must accept responsibility for its operation – meaning if something goes wrong, it’s on the organization, not some nebulous “computer.” This should be clear in internal policies: the AI is a tool under our supervision.

UPL and Ethical Practice

We touched on unauthorized practice of law concerns. Since our intake agent doesn’t give advice, it should not cross UPL lines. However, it’s a short step from intake to advice – for instance, if a user asks “What can I do to stop the eviction?” the AI has to hold the line and not give advice. Ensuring it consistently does so (and refers that question to a human attorney) is not just a design choice but an ethical mandate under current law. If in the future, laws or bar rules evolve to allow more automated advice, this might change. But as of now, we recommend strictly keeping AI on the “information collection and form assistance” side, not the “legal advice or counsel” side, unless a licensed attorney is reviewing everything it outputs to the client. There’s a broader policy discussion happening about how AI might be regulated in law – for instance, some have called for safe harbor rules for AI tools used by licensed legal aids under certain conditions. Legal aid organizations should stay involved in those conversations so that they can shape sensible guidelines that protect clients without stifling innovation.

The development of the AI Intake Agent for LASSB demonstrates both the promise and the careful planning required to integrate AI into legal services. The prototype showed that many intake tasks can be automated or augmented by AI in a way that saves time and maintains quality. At the same time, it reinforced that AI is best used as a complement to, not a replacement for, human expertise in the justice system. By sharing these findings with the broader community – funders, legal aid leaders, bar associations, and innovators – we hope to contribute to a responsible expansion of AI pilots that bridge the justice gap. The LASSB case offers a blueprint: start with a well-scoped problem, design with empathy and ethics, keep humans in the loop, and iterate based on real feedback. Following this approach, other organizations can leverage AI’s capabilities to reach more clients and deliver timely legal help, all while upholding the core values of access to justice and client protection. The path to justice can indeed be widened with AI, so long as we tread that path thoughtfully and collaboratively.

Categories
AI + Access to Justice Class Blog Current Projects Project updates

Demand Letter AI

A prototype report on AI-Powered Drafting of Reasonable Accommodation Demand Letters

AI for Legal Help, Legal Design Lab, 2025

This report provides a write-up of the AI for Housing Accommodation Demand Letters class project, which was one track of the “AI for Legal Help” Policy Lab during the Autumn 2024 and Winter 2025 quarters. The class involved work with legal and court groups that provide legal help services to the public, to understand where responsible AI innovations might be possible and to design and prototype initial solutions, as well as pilot and evaluation plans.

One of the project tracks was on Demand Letters. An interdisciplinary team of Stanford University students partnered with the Legal Aid Society of San Bernardino (LASSB) to address a critical bottleneck in their service delivery: the time-consuming process of drafting reasonable accommodation demand letters for tenants with disabilities. 

This report details the problem identified by LASSB, the proposed AI-powered solution developed by the student team, and recommendations for future development and implementation. 

We share it in the hopes that legal aid and court help center leadership might also be interested in exploring responsible AI development for demand letters, and that funders, researchers, and technologists might collaborate on developing and testing successful solutions for this task.

Thank you to the students on this team: Max Bosel, Adam Golomb, Jay Li, Mitra Solomon, and Julia Stroinska. And a big thank you to our LASSB colleagues: Greg Armstrong, Pablo Ramirez, and others.

The Housing Accommodation Demand Letter Task

The Legal Aid Society of San Bernardino (LASSB) is a nonprofit law firm providing free legal services to low-income residents in San Bernardino County, California. Among their clients are tenants with disabilities who often need reasonable accommodation demand letters to request changes from landlords (for example, allowing a service animal in a “no pets” building). 

These demand letters are formal written requests asserting tenants’ rights under laws like the Americans with Disabilities Act (ADA) and Fair Housing Act (FHA). They are crucial for tenants to secure accommodations and avoid eviction, but drafting them properly is time-consuming and requires legal expertise. LASSB faces overwhelming demand for help in this area – its hotline receives on the order of 100+ calls per day from tenants seeking assistance. 

However, LASSB has only a handful of intake paralegals and housing attorneys available, meaning many callers must wait a long time or never get through. In fact, LASSB serves around 9,000–10,000 clients per year via the hotline, yet an estimated 15,000 additional calls never reach help due to capacity limits. Even for clients who do get assistance, drafting a personalized, legally sound letter can take hours of an attorney’s time. With such limited staffing, LASSB’s attorneys are stretched thin, and some eligible clients may end up without a well-crafted demand letter to assert their rights.

LASSB presented their current workflow and questions about AI opportunities in September 2024, and a team of students in AI for Legal Help formed to partner on this task and explore an AI-powered solution. 

The initial question from LASSB was whether we could leverage recent advances in AI to draft high-quality demand letter templates automatically, thereby relieving some burden on staff and improving capacity to serve clients. The goal was to have an AI system gather information from the client and produce a solid first draft letter that an attorney could then quickly review and approve. By doing so, LASSB hoped to streamline the demand-letter workflow – saving attorney time, reducing errors or inconsistencies, and ensuring more clients receive help. 

Importantly, any AI agent would not replace attorney judgment or final sign-off. Rather, it would act as a virtual assistant or co-pilot: handling the routine drafting labor while LASSB staff maintain complete control over the final output. Key objectives set by the partner included improving efficiency, consistency, and accessibility of the service, while remaining legally compliant and user-friendly. In summary, LASSB needed a way to draft reasonable accommodation letters faster without compromising quality. 

After two quarters of work, the class teams proposed a Demand Letter AI system, creating a prototype AI agent that interviews clients about their situation and automatically generates a draft accommodation request letter. This letter would cite the relevant laws and follow LASSB’s format, ready for an attorney’s review. By adopting such a tool, LASSB hopes to minimize the time attorneys spend on repetitive drafting tasks and free them to focus on providing direct counsel and representation. The remainder of this report details the use case rationale, the current vs. envisioned workflow, the technical prototyping process, evaluation approach, and recommendations for next steps in developing this AI-assisted demand letter system.

Why is the Demand Letter Task a good fit for AI?

Reasonable accommodation demand letters for tenants with disabilities were chosen as the focus use case for several reasons. 

The need is undeniably high: as noted, LASSB receives a tremendous volume of housing-related calls, and many involve disabled tenants facing issues like a landlord refusing an exception to a policy (no-pets rules, parking accommodations, unit modifications, etc.). These letters are often the gateway to justice for such clients – a well-crafted letter can persuade a landlord to comply without the tenant ever needing to file a complaint or lawsuit. Demand letters are a high-impact intervention that can prevent evictions and ensure stable housing for vulnerable tenants. Focusing on this use case meant the project could directly improve outcomes for a large number of people, aligning with LASSB’s mission of “justice without barriers – equitable access for all.” 

At the same time, drafting each letter individually is labor-intensive. Attorneys must gather the details of the tenant’s disability and accommodation request, explain the legal basis (e.g. FHA and California law), and compose a polite but firm letter to the landlord. With LASSB’s staff attorneys handling heavy caseloads, these letters sometimes get delayed or delegated to clients themselves to write (with mixed results). Inconsistent quality and lack of time for thorough review are known issues. This use case therefore presented a clear opportunity for AI to assist, improving the consistency and quality of the letters themselves. 

The task of writing letters is largely document-generation – a pattern that advanced language models are well-suited for. Demand letters follow a relatively standard structure (explain who you are, state the request, cite laws, etc.), and LASSB already uses templates and boilerplate language for some sections. This means an AI could be trained or prompted to follow that format and fill in the specifics for each client. By leveraging an AI to draft the bulk of the text, each letter could be produced much faster, with the model handling the repetitive phrasing and legal citations while the attorney only needs to make corrections or additions. 

Crucially, using AI here could increase LASSB’s capacity. Rather than an attorney spending, say, 2-3 hours composing a letter from scratch, the AI might generate a solid draft in minutes, requiring perhaps 15 minutes of review and editing. The project team estimated that integrating an AI tool into the workflow could save on the order of 1.5–2.5 hours per client in total staff time. Scaled over dozens of cases, those saved hours mean more clients served and shorter wait times for help. This efficiency gain is attractive to funders and legal aid leaders because it stretches scarce resources further. 

AI can help enforce consistency and accuracy. It would use the same approved legal language across all letters, reducing the chance of human error or omissions in the text. For clients, this translates into a more reliable service – they are more likely to receive a well-written letter regardless of which attorney or volunteer is assisting them. 

The reasonable accommodation letter use case was selected because it sits at the sweet spot of high importance and high potential for automation. It addresses a pressing need for LASSB’s clients (ensuring disabled tenants can assert their rights) and plays to AI’s strengths (generating structured documents from templates and data). By starting with this use case, the project aimed to deliver a tangible, impactful tool that could quickly demonstrate value – a prototype AI assistant that materially improves the legal aid workflow for a critical class of cases.


Workflow Vision: From Current Demand Letter Process to Future AI-Human Collaboration

To understand the impact of the proposed solution, it’s important to compare the current human-driven workflow of creating Demand Letters and the envisioned future workflow where an AI assistant is integrated. Below, we outline the step-by-step process today and how it would change with the AI prototype in place. 

Current Demand Letter Workflow (Status Quo)

When a tenant with a disability encounters an issue with their landlord (for example, the landlord is refusing an accommodation or threatening eviction over a disability-related issue), the tenant must navigate several steps to get a demand letter:

  • Initial Intake Call: The tenant contacts LASSB’s hotline and speaks to an intake call-taker (often a paralegal). The tenant explains their situation and disability, and the intake worker records basic information and performs eligibility screening (checking income, conflict of interest, etc.). If the caller is eligible and the issue is within LASSB’s scope, the case is referred to a housing attorney for follow-up.
  • Attorney Consultation: The tenant then has to repeat their story to a housing attorney (often days later). The attorney conducts a more in-depth interview about the tenant’s disability needs and the accommodation they seek. At this stage, the attorney determines if a reasonable accommodation letter is the appropriate course of action. (If not – for example, if the problem requires a different remedy – the attorney would advise on next steps outside the demand letter process.)
  • Letter Drafting: If a demand letter is warranted, the process for drafting it is currently inconsistent. In some cases, the attorney provides the client with a template or “self-help” packet on how to write a demand letter and asks the client to draft it themselves. In other cases, the attorney or a paralegal might draft the letter on the client’s behalf. With limited time, attorneys often cannot draft every letter from scratch, so the level of assistance varies. Clients may end up writing the first draft on their own, which can lead to incomplete or less effective letters. (One LASSB attorney noted that tenants frequently have to “explain their story at least twice” – to the intake worker and attorney – “and then have to draft/send the demand letter with varying levels of help”.)
  • Review and Delivery: Ideally, if the client drafts the letter, they will bring it back for the attorney to review and approve. Due to time pressures, however, attorney review isn’t always thorough, and sometimes letters go out without a detailed legal polish. Finally, the tenant sends the demand letter to the landlord, either by mail or email (or occasionally LASSB sends it on the client’s behalf). At this point, the process relies on the landlord’s response; LASSB’s involvement usually ends unless further action (like litigation) is needed.

This current workflow places a heavy burden on the tenant and the attorney. The tenant must navigate multiple conversations and may end up essentially drafting their own legal letter. The attorney must spend time either coaching the client through writing or drafting the letter themselves, on top of all their other cases. Important information can slip through the cracks when the client is interviewed multiple times by different people. There is also no consistent tracking of what advice or templates were given to the client, leading to variability in outcomes. Overall, the process can be slow (each step often spreads over days or weeks of delay) and resource-intensive, contributing to the bottleneck in serving clients.


Proposed AI-Assisted Workflow (Future Vision)

In the reimagined process, an AI agent would streamline the stages between intake and letter delivery, working in tandem with LASSB staff.

After a human intake screens the client, the AI Demand Letter Assistant takes over the interview to gather facts and draft the letter. The attorney then reviews the draft and finalizes the letter for the client to send.

  • Post-Intake AI Interview: Once a client has been screened and accepted for services by LASSB’s intake staff, the AI Demand Letter Assistant engages the client in a conversation (via chat or a guided web form; a phone interface could also be possible). The AI introduces itself as a virtual assistant working with LASSB and uses a structured but conversational script to collect all information relevant to the accommodation request. This includes the client’s basic details, details of the disability and needed accommodation, the landlord’s information, and any prior communications or incidents (e.g. if the tenant has asked before or if the landlord has issued notices). The assistant is programmed to use trauma-informed language – it asks questions in a supportive, non-threatening manner and adjusts wording to the client’s comfort, recognizing that relaying one’s disability needs can be sensitive. Throughout the interview, the AI can also perform helpful utilities, such as inserting the current date or formatting addresses correctly, to ensure the data it gathers is ready for a letter.
  • Automatic Letter Generation: After the AI has gathered all the necessary facts from the client, it automatically generates a draft demand letter. The generation is based on LASSB-approved templates and includes the proper formal letter format (date, addresses, RE: line, etc.), a clear statement of the accommodation request, and citations to relevant laws/regulations (like referencing the FHA, ADA, or state law provisions that apply). The AI uses the information provided by the client to fill in key details – for example, describing the tenant’s situation (“Jane Doe, who has an anxiety disorder, requests an exception to the no-pets policy to allow her service dog”) and customizing the legal rationale to that scenario. Because the AI has been trained on example letters and legal guidelines, it can include the correct legal language to strengthen the demand. It also ensures the tone remains polite and professional. At the end of this step, the AI has a complete draft letter ready.
  • Attorney Review & Collaboration: The draft letter, along with a summary of the client’s input or a transcript of the Q&A, is then forwarded to a LASSB housing attorney for review. The attorney remains the ultimate decision-maker – they will read the AI-drafted letter and check it for accuracy, appropriate tone, and effectiveness. If needed, the attorney can edit the letter (either directly or by giving feedback to the AI to regenerate specific sections). The AI could also highlight any uncertainties (for instance, if the client’s explanation was unclear on a point, the draft might flag that for attorney clarification). Importantly, no letter is sent out without attorney approval, ensuring that professional legal judgment is applied. This human-in-the-loop review addresses ethical duties (attorneys must supervise AI work as they would a junior staffer) and maintains quality control. In essence, the AI does the first 90% of the drafting, and the attorney provides the final 10% refinement and sign-off.
  • Delivery and Follow-Up: After the attorney finalizes the content, the letter is ready to be delivered to the landlord. In the future vision, this could be as simple as clicking a button to send the letter via email or printing it for mailing. (The prototype also floated ideas like integrating with DocuSign or generating a PDF that the client can download and sign.) The client then sends the demand letter to the landlord, formally requesting the accommodation. Ideally, this happens much faster than in the current process – potentially the same day as the attorney consultation, since the drafting is near-instant. LASSB envisioned that the AI might even assist in follow-up: for instance, checking back with the client a couple weeks later to ask if the landlord responded, and if not, suggesting next steps. (This follow-up feature was discussed conceptually, though not implemented in the prototype.) In any case, by the end of the workflow, the client has a professionally crafted letter in hand, and they did not have to write it alone.

The benefits of this AI-human collaboration are significant. It eliminates the awkward gap where a client might be left drafting a letter on their own; instead, the client is guided through questions by the AI and sees a letter magically produced from their answers. It also reduces duplicate interviewing – the client tells their full story once to the AI (after intake), rather than explaining it to multiple people in pieces. 

For the attorney, the time required to produce a letter drops dramatically. Rather than spending a couple of hours writing and editing, an attorney might spend 10–20 minutes reviewing the AI’s draft, tweaking a phrase or two, and approving it. The team’s estimates suggest each case could save on the order of 1.5–2.5 hours of staff time under this new workflow. Those savings translate into lower wait times and the ability for LASSB to assist many more clients in a given period with the same staff. In broader terms, more tenants would receive the help they need, fewer calls would be abandoned, and LASSB’s attorneys could devote more attention to complex cases (since straightforward letters are handled in part by the AI). 

The intended impact is “more LASSB clients have their day in court… more fair and equitable access to justice for all”, as the student team put it – in this context meaning more clients are able to assert their rights through demand letters, addressing issues before they escalate. The future vision sees the AI prototype seamlessly embedded into LASSB’s service delivery: after a client is screened by a human, the AI takes on the heavy lifting of information gathering and document drafting, and the human attorney ensures the final product meets the high standards of legal practice. This collaboration could save time, improve consistency, and ultimately empower more tenants with disabilities to get the accommodations they need to live safely and with dignity.
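
To make the hand-off concrete, here is a high-level sketch of the envisioned flow. The helper functions are stand-ins for the AI interview, letter generation, and attorney review steps described above; they are not the prototype’s actual code.

```python
def interview_client(client_id: str) -> dict:
    # Placeholder for the AI-guided chat that follows human intake screening.
    return {"client": client_id, "request": "an exception to the no-pets policy for a service dog"}

def draft_letter(facts: dict) -> str:
    # Placeholder for template-plus-LLM letter generation.
    return f"Draft demand letter requesting {facts['request']} on behalf of {facts['client']}."

def attorney_review(draft: str) -> tuple:
    # Placeholder for the human attorney editing and approving (or rejecting) the draft.
    return draft, True

def demand_letter_workflow(client_id: str) -> str:
    facts = interview_client(client_id)
    draft = draft_letter(facts)
    final_letter, approved = attorney_review(draft)
    if not approved:
        raise RuntimeError("No letter goes out without attorney sign-off")
    return final_letter  # the client (or LASSB) then sends it to the landlord
```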


Technical Approach and Prototyping: What We Built and How It Works

With the use case defined, the project team proceeded to design and build a working prototype AI agent for demand letter drafting. This involved an iterative process of technical development, testing, and refinement over two academic quarters. In this section, we describe the technical solution – including early prototypes, the final architecture, and how the system functions under the hood.

Early Prototype and Pivot 

In Autumn 2024, the team’s initial prototype focused on an AI intake interviewing agent (nicknamed “iNtake”) as well as a rudimentary letter generator. They experimented with a voice-based assistant that could talk to clients over the phone. Using tools like Twilio (for telephony and text messaging) and Google’s Dialogflow/Chatbot interfaces, they set up a system where a client could call a number and interact with an AI-driven phone menu. The AI would ask the intake questions in a predefined script and record the answers. 

Behind the scenes, the prototype leveraged a large language model (LLM) – essentially an AI text-generation engine – to handle the conversational aspect. The team used a model configuration referred to as “gemini-1.5-flash”, which was integrated into the phone chatbot. 
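
For readers curious what “leveraging an LLM” looks like in practice, the sketch below shows roughly how a conversational turn can be generated with a Gemini model through the Vertex AI Python SDK. The project ID, system text, and message are placeholders; the team’s actual configuration differed and ran behind the phone interface.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholder project

model = GenerativeModel(
    "gemini-1.5-flash",
    system_instruction="You are a legal aid intake assistant. Ask one question at a time.",
)
chat = model.start_chat()
reply = chat.send_message("Hi, I got a 3-day notice from my landlord.")
print(reply.text)
```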

This early system demonstrated some capabilities (it could hold a conversation and hand off to a human if needed), but also revealed significant challenges. The script was over 100 questions long and not trauma-informed – users found it tedious and perhaps impersonal. Additionally, the AI sometimes struggled with the decision-tree logic of intake. 

After several iterations and feedback from instructors and LASSB, the team decided to pivot. They narrowed the scope to concentrate on the Demand Letter Agent – a chatbot that would come after intake to draft the letter. The phone-based intake AI became a separate effort (handled by another team in Winter 2025), while our team focused on the letter generator. 

Final Prototype Design

The Winter 2025 team built upon the fall work to create a functioning AI chat assistant for demand letters. The prototype operates as an interactive chatbot that can be used via a web interface (in testing, it was run on a laptop, but it could be integrated into LASSB’s website or a messaging platform in the future). Here’s how it works in technical terms.

The AI agent was developed using a generative Large Language Model (LLM) – similar to the technology behind GPT-4 or other modern conversational AIs. This model was not trained from scratch by the team (which would require huge data and compute); instead, the team used a pre-existing model and focused on customizing it through prompt engineering and providing domain-specific data. In practical terms, the team created a structured “AI playbook” or prompt script that guides the model step-by-step to perform the task.

Data and Knowledge Integration

One of the first steps was gathering all relevant reference material to inform the AI’s outputs. The team collected LASSB’s historical demand letters (redacted for privacy), which provided examples of well-written accommodation letters. They also pulled in legal sources and guidelines: for instance, the U.S. Department of Justice’s guidance memos on reasonable accommodations, HUD guidelines, trauma-informed interviewing guidelines, and lists of common accommodations and impairments. These documents were used to refine the AI’s knowledge. 

Rather than blindly trusting the base model, the team explicitly incorporated key legal facts – such as definitions of “reasonable accommodation” and the exact language of FHA/FEHA requirements – into the AI’s prompt or as reference text the AI could draw upon. Essentially, the AI was primed with: “Here are the laws and an example demand letter; now follow this format when drafting a new letter.” This helped ensure the output letters would be legally accurate and on-point.

Prompt Engineering

The heart of the prototype is a carefully designed prompt/instruction set given to the AI model. The team gave the AI a persona and explicit instructions on how to conduct the conversation and draft the letter. For example, the assistant introduces itself as “Sofia, the Legal Aid Society of San Bernardino’s Virtual Assistant” and explains its role to the client (to help draft a letter). The prompt includes step-by-step instructions for the interview: ask the client’s name, ask what accommodation they need, confirm details, etc., in a logical order (it’s almost like a decision-tree written in natural language form). A snippet of the prompt (from the “Generative AI playbook”) is shown below:

Excerpt from the AI assistant’s instruction script. The agent is given a line-by-line guide to greet the client, collect information (names, addresses, disability details, etc.), and even call a date-time tool to insert the current date for the letter. 

The prompt also explicitly instructs the AI on legal and ethical boundaries. For instance, it was told: “Your goal is to write and generate a demand letter for reasonable accommodations… You do not provide legal advice; you only assist with drafting the letter.” This was crucial to prevent the AI from straying into giving advice or making legal determinations, which must remain the attorney’s domain. By iteratively testing and refining this prompt, the team taught the AI to stay in its lane: ask relevant questions, be polite and empathetic, and focus on producing the letter.
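
The sketch below is a heavily condensed, paraphrased illustration of what such a playbook prompt can look like – persona, interview steps, tool use, and guardrails in one instruction block. It is not the team’s exact script.

```python
PLAYBOOK_PROMPT = """
You are Sofia, the Legal Aid Society of San Bernardino's Virtual Assistant.
Your goal is to help draft a reasonable accommodation demand letter.
You do not provide legal advice; you only assist with drafting the letter.

Interview the client one question at a time, in a warm, non-judgmental tone:
1. Ask for the client's full name and mailing address.
2. Ask for the landlord's or property manager's name and address.
3. Ask what accommodation is needed and how it relates to the client's disability,
   without pressing for unnecessary medical detail.
4. Confirm the details back to the client before drafting.

When all information is collected, call the date tool for today's date and present
a complete draft letter citing the Fair Housing Act and FEHA, formatted with the
date, both addresses, an RE: line, body paragraphs, and a signature block.
If the client asks for legal advice, explain that an attorney will follow up.
"""
```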

Trauma-Informed and Bias-Mitigation Features

A major design consideration was ensuring the AI’s tone and behavior were appropriate for vulnerable clients. The team trained the AI (through examples and instructions) to use empathetic language – e.g., thanking the client for sharing information, acknowledging difficulties – and to avoid any phrasing that might come off as judgmental or overly clinical. The AI was also instructed to use the client’s own words when possible and not to press sensitive details unnecessarily. On the technical side, the model was tested for biases. The team used diverse example scenarios to ensure the AI’s responses wouldn’t differ inappropriately based on the nature of the disability or other client attributes. Regular audits of outputs were done to catch any bias. For example, they made sure the AI did not default to male pronouns for landlords or assume anything stereotypical about a client’s condition. These measures align with best practices to ensure the AI’s output is fair and respects all users.

Automated Tools Integration

The prototype included some clever integrations of simple tools to enhance accuracy. One such tool was a date function. In early tests, the AI sometimes forgot to put the current date on the letter or used a generic placeholder. To fix this, the team connected the AI to a utility that fetches the current date. During the conversation, if the user is ready to draft the letter, the AI will call this date function and insert the actual current date into the letter heading. This ensures the generated letter always shows (for example) “May 19, 2023” rather than a hardcoded date. Similarly, the AI was guided to properly format addresses and other elements (it asks for each component like city, state, ZIP and then concatenates them in the letter format). These might seem like small details, but they significantly improve the professionalism of the output.
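
For illustration, the sketch below shows one way such a date utility can be wired up as a callable tool using Vertex AI function calling: the model asks for the date, the surrounding code fetches it, and the real date is handed back so it lands in the letter heading. The exact wiring in the prototype may have differed.

```python
import datetime
import vertexai
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Part, Tool

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholder project

get_current_date = FunctionDeclaration(
    name="get_current_date",
    description="Return today's date, formatted for the heading of a letter.",
    parameters={"type": "object", "properties": {}},
)
date_tool = Tool(function_declarations=[get_current_date])

model = GenerativeModel("gemini-1.5-flash", tools=[date_tool])
chat = model.start_chat()

response = chat.send_message("I've answered everything - please draft the letter now.")
part = response.candidates[0].content.parts[0]
if part.function_call.name == "get_current_date":
    today = datetime.date.today().strftime("%B %d, %Y")
    # Hand the real date back to the model so it appears in the letter heading.
    response = chat.send_message(
        Part.from_function_response(name="get_current_date", response={"date": today})
    )
print(response.text)
```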

Draft Letter Generation

Once the AI has all the needed info, it composes the letter in real-time. It follows the structure from the prompt and templates: the letter opens with the date and address, a reference line (“RE: Request for Reasonable Accommodation”), a greeting, and an introduction of the client. Then it lays out the request and the justification, citing the laws, and closes with a polite sign-off. The content of the letter is directly based on the client’s answers. For instance, if the client said they have an anxiety disorder and a service dog, the letter will include those details and explain why the dog is needed. The AI’s legal knowledge ensures that it inserts the correct references to the FHA and California Fair Employment and Housing Act (FEHA), explaining that landlords must provide reasonable accommodations unless it’s an undue burden. 

An example output is shown below:

Sample excerpt from an AI-generated reasonable accommodation letter. In this case, the tenant (Jane Doe) is requesting an exception to a “no pets” policy to allow her service dog. The AI’s draft includes the relevant law citations (FHA and FEHA) and a clear explanation of why the accommodation is necessary. 

As seen in the example above, the AI’s letter closely resembles one an attorney might write. It addresses the landlord respectfully (“Dear Mr. Jones”), states the tenant’s name and address, and the accommodation requested (permission to keep a service animal despite a no-pet policy). It then cites the Fair Housing Act and California law, explaining that these laws require exceptions to no-pet rules as a reasonable accommodation for persons with disabilities. It describes the tenant’s specific circumstances (the service dog helps manage her anxiety, etc.) in a factual and supportive tone. It concludes with a request for a response within a timeframe and a polite thank you. All of this text was generated by the AI based on patterns it learned from training data and the prompt instructions – the team did not manually write any of these sentences for this particular letter, showing the generative power of the AI. The attorney’s role would then be to review this draft. 

In our tests, attorneys found the drafts to be surprisingly comprehensive. They might only need to tweak a phrase or add a specific detail. For example, an attorney might insert a line offering to provide medical documentation if needed, or adjust the deadline given to the landlord. But overall, the AI-generated letters were on point and required only light editing. 

Testing and Iteration

The development of the prototype involved iterative testing and debugging. Early on, the team encountered some issues typical of advanced AI systems and worked to address them.

Getting the agent to perform consistently

Initially, the AI misunderstood its task at times. In the first demos, when asked to draft a letter, the AI would occasionally respond with “I’m sorry, I can’t write a letter for you”, treating it like a prohibited action. This happened because base language models often have safety rules about not producing legal documents. The team resolved this by refining the prompt to clarify that the AI is allowed and expected to draft the letter as part of its role (since an attorney will review it). Once the AI “understood” it had permission to assist, it complied.

Ensuring the agent produced the right output

The AI also sometimes ended the interview without producing the letter. Test runs showed that if the user didn’t explicitly ask for the letter, the AI might stop after gathering info. To fix this, the team adjusted the instructions to explicitly tell the AI that once it has all the information, it should automatically present the draft letter to the client for review. After adding this, the AI reliably output the draft at the end of the conversation.


Un-sticking the agent, caught in a loop

There were issues with the AI getting stuck or repeating itself. For example, in one scenario, the AI began to loop, apologizing and asking the same question multiple times even after the user answered. 

A screenshot from testing shows the AI repeating “Sorry, something went wrong, can you repeat?” in a loop when it hit an unexpected input. These glitches were tricky to debug – the team adjusted the conversation flow and added checks (like if the user already answered, do not ask again), which reduced but did not completely eliminate such looping. We identified that these loops often stemmed from the model’s uncertainty or minor differences in phrasing that weren’t accounted for in the script.

Dealing with fake or inaccurate info

Another issue was occasional hallucinations or extraneous content. For instance, the AI at one point started offering to “email the letter to the landlord” out of nowhere, even though that wasn’t in its instructions (and it had no email capability). This was the model improvising beyond its intended scope. The team addressed this by tightening the prompt instructions, explicitly telling the AI not to do anything with email and to stick to generating the letter text only. After adding such constraints, these hallucinations became rarer.

Getting consistent letter formatting

The formatting of the letter (dates, addresses, signature line) needed fine-tuning. The AI initially had minor formatting quirks (like sometimes missing the landlord’s address or not knowing how to sign off). By providing a template example and explicitly instructing the inclusion of those elements, the final prototype reliably produced a correctly formatted letter with a placeholder for the client’s signature.

Throughout development, whenever an issue was discovered, the team would update the prompt or the data and test again. This iterative loop – test, observe output, refine instructions – is a hallmark of developing AI solutions and was very much present in this project. 

Over time, the outputs improved significantly in quality and reliability. For example, by the end of the Winter quarter, the AI was consistently using the correct current date (thanks to the date tool integration) and writing in a supportive tone (thanks to the trauma-informed training), which were clear improvements from earlier versions. That said, some challenges remained unsolved due to time limits. 

The AI still showed some inconsistent behaviors occasionally – such as repeating a question in a rare case, or failing to recognize an atypical user response (like if a user gave an extremely long-winded answer that confused the model). The team documented these lingering issues so that future developers can target them. They suspected that further fine-tuning of the model or using a more advanced model could help mitigate these quirks. 

In its final state at the end of Winter 2025, the prototype was able to conduct a full simulated interview and generate a reasonable accommodation demand letter that LASSB attorneys felt was about 80–90% ready to send, requiring only minor edits. 

The technical architecture was a single-page web application interfacing with the AI model (running on a cloud AI platform) plus some back-end scripts for the date tool and data storage. It was not yet integrated into LASSB’s production systems, but it provided a compelling proof-of-concept. 
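
As a rough picture of that architecture, the sketch below shows a thin back end with one endpoint for chat turns and one for the date utility. FastAPI is assumed here purely for illustration; the report does not specify the web framework the team used, and the model call is stubbed out.

```python
import datetime
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatTurn(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
def chat(turn: ChatTurn) -> dict:
    # In the prototype, this is where the cloud-hosted model would be called with
    # the playbook prompt and the conversation history for turn.session_id.
    reply = f"(model reply to: {turn.message})"  # stubbed out for this sketch
    return {"reply": reply}

@app.get("/current-date")
def current_date() -> dict:
    # The small date utility the agent calls when drafting the letter.
    return {"date": datetime.date.today().strftime("%B %d, %Y")}
```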

Observers in the final presentation could watch “Sofia” chat with a hypothetical client (e.g., Martin who needed an emotional support animal) and within minutes, produce a letter addressed to the landlord citing the FHA – something that would normally take an attorney a couple of hours. 

Overall, the technical journey of this project was one of rapid prototyping and user-centered adjustment. The team combined off-the-shelf AI technology with domain-specific knowledge to craft a tool tailored for legal aid. They learned how small changes in instructions can greatly affect an AI’s behavior, and they progressively molded the system to align with LASSB’s needs and values. The result is a working prototype of an AI legal assistant that shows real promise in easing the burden of document drafting in a legal aid context.

Evaluation Framework: Testing, Quality Standards, and Lessons Learned

From the outset, the team and LASSB agreed that rigorous evaluation would be critical before any AI tool could be deployed in practice. The project developed an evaluation framework to measure the prototype’s performance and ensure it met both efficiency goals and legal quality standards. Additionally, throughout development the team reflected on broader lessons learned about using AI in a legal aid environment. This section discusses the evaluation criteria, testing methods, and key insights gained.

Quality Standards and Benchmarks

The primary measure of success for the AI-generated letters was that they be indistinguishable (in quality) from letters written by a competent housing attorney. To that end, the team established several concrete quality benchmarks:

  • No “Hallucinations”: The AI draft should contain no fabricated facts, case law, or false statements. All information in the letter must come from the client’s provided data or be generally accepted legal knowledge. For example, the AI should never cite a law that doesn’t exist or insert details about the tenant’s situation that the tenant didn’t actually tell it. Attorneys reviewing the letters specifically check for any such hallucinated content.
  • Legal Accuracy: Any legal assertions in the letter (e.g. quoting the Fair Housing Act’s requirements) must be precisely correct. The letter should not misstate the law or the landlord’s obligations. Including direct quotes or citations from statutes/regulations was one method used to ensure accuracy. LASSB attorneys would verify that the AI correctly references ADA, FHA, FEHA, or other laws as applicable.
  • Proper Structure and Tone: The format of the letter should match what LASSB attorneys expect in a formal demand letter. That means: the letter has a date, addresses for both parties, a clear subject line, an introduction, body paragraphs that state the request and legal basis, and a courteous closing. The tone should be professional – firm but not aggressive, and certainly not rude. One benchmark was that an AI-drafted letter “reads like” an attorney’s letter in terms of formality and clarity. If an attorney would normally include or avoid certain phrases (for instance, saying “Thank you for your attention to this matter” at the end, or avoiding contractions in a formal letter), the AI’s output is expected to do the same.
  • Completeness: The letter should cover all key points necessary to advocate for the client. This includes specifying the accommodation being requested, briefly describing the disability connection, citing the legal right to the accommodation, and possibly mentioning an attached verification if relevant. An incomplete letter (one that, say, only requests but doesn’t cite any law) would not meet the standard. Attorneys reviewing would ensure nothing crucial was missing from the draft.

In addition to letter quality, efficiency metrics were part of the evaluation. The team intended to log how long the AI-agent conversation took and how long the model took to generate the letter, aiming to show a reduction in total turnaround time compared to the status quo. Another metric was the effect on LASSB’s capacity: for example, could implementing this tool reduce the number of calls that drop off due to long waits? In theory, if attorneys spend less time per client, more calls can be returned. The team proposed tracking number of clients served before and after deploying the AI as a long-term metric of success. 
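
Some of these checks can be automated to sit alongside attorney review. The sketch below flags missing structural elements in a draft (date, RE: line, FHA/FEHA citations, closing) and times each step; the required-element patterns mirror the benchmarks above, and the helper names are ours, not the team’s.

```python
import re
import time

# Required elements mirror the quality benchmarks above; patterns are illustrative.
REQUIRED_ELEMENTS = {
    "date": r"(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}, \d{4}",
    "re_line": r"RE:\s*Request for Reasonable Accommodation",
    "fha_citation": r"Fair Housing Act",
    "feha_citation": r"FEHA|Fair Employment and Housing Act",
    "closing": r"Sincerely|Thank you for your attention",
}

def completeness_report(letter_text: str) -> dict:
    """Report which required structural elements are present in a draft."""
    return {name: bool(re.search(pattern, letter_text)) for name, pattern in REQUIRED_ELEMENTS.items()}

def timed(step_fn, *args):
    """Run one step (interview, generation) and record how long it took, in seconds."""
    start = time.monotonic()
    result = step_fn(*args)
    return result, time.monotonic() - start
```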

Evaluation Methods

To assess these criteria, the evaluation plan included several components.

Internal Performance Testing

The team performed timed trials of the AI system. They measured the duration of a full simulated interview and letter draft generation. In later versions, the interview took roughly 10–15 minutes (depending on how much detail the client gives), and the letter was generated almost instantly thereafter (within a few seconds). They compared this to an estimate of human drafting time. These trials demonstrated the raw efficiency gain – a consistent turnaround of under 20 minutes for a draft letter, which is far better than the days or weeks it might take in the normal process. They also tracked if any technical slowdowns occurred (for instance, if the AI had to call external tools like the date function, did that introduce delays? It did not measurably – the date lookup was near-instant).

Expert Review (Quality Control)

LASSB attorneys and subject matter experts were involved in reviewing the AI-generated letters. The team conducted sessions where an attorney would read an AI draft and score it on accuracy, tone, and completeness. The feedback from these reviews was generally positive – attorneys found the drafts surprisingly thorough. They did note small issues (e.g., “we wouldn’t normally use this phrasing” or “the letter should also mention that the client can provide a doctor’s note if needed”). 

These observations were fed back into improving the prompt. The expert review process is something that would continue regularly if the tool is deployed: LASSB could institute, say, a policy that attorneys must double-check every AI-drafted letter and log any errors or required changes. Over time, this can be used to measure whether the AI’s quality is improving (i.e., fewer edits needed).
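To make this review loop concrete, here is a minimal sketch of what such a log could look like, assuming a simple Python record per reviewed letter; the field names are hypothetical, but they map onto the quality criteria described above (hallucinations, legal accuracy, tone and structure, completeness) and the "fewer edits over time" measure.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean

@dataclass
class LetterReview:
    """One attorney's review of one AI-drafted letter (hypothetical schema)."""
    letter_id: str
    review_date: date
    hallucination_found: bool      # fabricated facts, laws, or client details
    legal_accuracy_ok: bool        # ADA/FHA/FEHA references stated correctly
    tone_and_structure_ok: bool    # reads like a formal attorney demand letter
    complete: bool                 # request, disability link, and legal basis all present
    edits_required: int            # how many changes the attorney had to make
    notes: str = ""

def monthly_edit_trend(reviews: list[LetterReview]) -> dict[str, float]:
    """Average edits per letter, grouped by month, to show whether quality is improving."""
    by_month: dict[str, list[int]] = {}
    for r in reviews:
        by_month.setdefault(r.review_date.strftime("%Y-%m"), []).append(r.edits_required)
    return {month: mean(edits) for month, edits in sorted(by_month.items())}
```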

User Feedback

Another angle was evaluating the system’s usability and acceptance by both LASSB staff and clients. The team gathered informal feedback from users who tried the chatbot demo (including a couple of law students role-playing as clients). They also got input from LASSB’s intake staff on whether they felt such a chatbot would be helpful. In a deployed scenario, the plan is to collect structured feedback via surveys. For example, clients could be asked if they found the virtual interview process easy to understand, and attorneys could be surveyed on their satisfaction with the draft letters. High satisfaction ratings would indicate the system is meeting needs, whereas any patterns of confusion or dissatisfaction would signal where to improve (perhaps the interface or the language the AI uses).

Long-term Monitoring

The evaluation framework emphasizes that evaluation isn’t a one-time event. The team recommended continuous monitoring if the prototype moves to production. This would involve regular check-ins (monthly or quarterly meetings) among stakeholders – the legal aid attorneys, paralegals, technical team, etc. – to review how things are going. They could review statistics (number of letters generated, average time saved) and any incidents (e.g., “the AI produced an incorrect statement in a letter on March 3, we caught it in review”). This ongoing evaluation ensures that any emerging issues (perhaps a new type of accommodation request the AI wasn’t trained on) are caught and addressed. It’s akin to maintenance: the AI tool would be continually refined based on real-world use data to ensure it remains effective and trustworthy.

Risk and Ethical Considerations

Part of the evaluation also involved analyzing potential risks. The team did a thorough risk, ethics, and regulation analysis in their final report to make sure any deployment of the AI would adhere to legal and professional standards. Some key points from that analysis:

Data Privacy & Security

The AI will be handling sensitive client information (details about disabilities, etc.). The team stressed the need for strict privacy safeguards – for instance, if using cloud AI services, ensuring they are HIPAA-compliant or covered by appropriate data agreements. They proposed measures like encryption of stored transcripts and obtaining client consent for using an AI tool. Any integration with LASSB’s case management (LegalServer) would have to follow data protection policies.

Bias and Fairness

They cautioned that AI models can inadvertently produce biased outputs if not properly checked. For example, might the AI’s phrasing be less accommodating to a client with a certain type of disability due to training data bias? The mitigation is ongoing bias testing and using a diverse dataset for development. The project incorporated an ethical oversight process to regularly audit letters for any bias or inappropriate language.

Acceptance by Courts/Opposing Parties

A unique consideration for legal documents is whether an AI-drafted letter (or brief) will be treated differently by its recipient. The team noted recent cases of courts being skeptical of lawyers’ use of ChatGPT, emphasizing lawyers’ duty to verify AI outputs. For demand letters (which are not filed in court but sent to landlords), the risk is lower than in litigation, but LASSB must still ensure the letters are accurate to maintain credibility. If a case did go to court, an attorney might need to attest that they supervised the drafting. Essentially, maintaining transparency and trust is important – LASSB might choose to inform clients about the AI-assisted system (to manage expectations) and would certainly ensure any letter that ends up as evidence has been vetted by an attorney.

Professional Responsibility

The team aligned the project with guidance from the American Bar Association and California State Bar on AI in law practice. These guidelines say that using AI is permissible as long as attorneys ensure competence, confidentiality, and no unreasonable fees are charged for it. In practice, that means LASSB attorneys must be trained on how to use the AI tool correctly, must keep client data safe, and must review the AI’s work. The attorney remains ultimately responsible for the content of the letter. The project’s design – always having a human in the loop – was very much informed by these professional standards.

Lessons Learned

Over the course of the project, the team gained valuable insights, both in terms of the technology and the human element of implementing AI in legal services. Some of the key lessons include the following.

AI is an Augmenting Tool, Not a Replacement for Human Expertise

Perhaps the most important realization was that AI cannot replace human empathy or judgment in legal aid. The team initially hoped the AI might handle more of the process autonomously, but they learned that the human touch is irreplaceable for sensitive client interactions. For example, the AI can draft a letter, but it cannot (and should not) decide whether a client should get a letter or what strategic advice to give – that remains with the attorney. Moreover, clients often need empathy and reassurance that an AI cannot provide on its own. As one reflection noted, the AI might be very efficient, “however, we learned that AI cannot replace human empathy, which is why the final draft letter always goes to an attorney for final review and client-centered adjustment.” In practice, the AI assists, and the attorney still personalizes the counsel.

Importance of Partner Collaboration and User-Centered Design

The close collaboration with LASSB staff was crucial. Early on, the team had some misaligned assumptions (e.g., focusing on a technical solution that wasn’t actually practical in LASSB’s context, like the phone intake bot). By frequently communicating with the partner – including weekly check-ins and showing prototype demos – the team was able to pivot and refine the solution to fit what LASSB would actually use. One lesson was to always “keep the end user in mind”. In this case, the end users were both the LASSB attorneys and the clients. Every design decision (from the tone of the chatbot to the format of the output) was run through the filter of “Is this going to work for the people who have to use it?” For instance, the move from a phone interface to a chat interface was influenced by partner feedback that a phone bot might be less practical, whereas a web-based chat that produces a printable letter fits more naturally into their workflow.

Prototype Iteratively and Be Willing to Pivot

The project reinforced the value of an iterative, agile approach. The team did not stick stubbornly to the initial plan when it proved flawed. They gathered data (user feedback, technical performance data) and made a mid-course correction to narrow the project’s scope. This pivot ultimately led to a more successful outcome. The lesson for future projects is to embrace flexibility – it’s better to achieve a smaller goal that truly works than to chase a grand vision that doesn’t materialize. As noted in the team’s retrospective, “Be willing to pivot and challenge assumptions” was key to their progress.

AI Development Requires Cross-Disciplinary Skills

The students came from law and engineering backgrounds, and both skill sets were needed. They had to “upskill to learn what you need” on the fly – for example, law students learned some prompt-engineering and coding; engineering students learned about fair housing law and legal ethics. For legal aid organizations, this is a lesson that implementing AI will likely require new trainings and collaboration between attorneys and tech experts.

AI Output Continues to Improve with Feedback

Another positive lesson was that the AI’s performance did improve significantly with targeted adjustments. Initially, some doubted whether a model could ever draft a decent legal letter. But by the end, the results were quite compelling. This taught the team that small tweaks can yield big gains in AI behavior – you just have to systematically identify what isn’t working (e.g., the AI refusing to write, or using the wrong tone) and address it. It’s an ongoing process of refinement, which doesn’t end when the class ends. The team recognized that deploying an AI tool means committing to monitor and improve it continuously. As they put it, “there is always more that can be done to improve the models – make them more informed, reliable, thorough, ethical, etc.”. This mindset of continuous improvement is itself a key lesson, ensuring that complacency doesn’t set in just because the prototype works in a demo.

Ethical Guardrails Are Essential and Feasible

Initially, there was concern about whether an AI could be used ethically for legal drafting. The project showed that with the right guardrails – human oversight, clear ethical policies, transparency – it is not only possible but can be aligned with professional standards. The lesson is that legal aid organizations can innovate with AI responsibly, as long as they proactively address issues of confidentiality, accuracy, and attorney accountability. LASSB leadership was very interested in the tool but also understandably cautious; seeing the ethical framework helped build their confidence that this could be done in a way that enhances service quality rather than risks it.

In conclusion, the evaluation phase of the project confirmed that the AI prototype can meet high quality standards (with attorney oversight) and significantly improve efficiency. It also surfaced areas to watch – for example, ensuring the AI remains updated and bias-free – which will require ongoing evaluation post-deployment. The lessons learned provide a roadmap for both this project and similar initiatives: keep the technology user-centered, maintain rigorous quality checks, and remember that AI is best used to augment human experts, not replace them. By adhering to these principles, LASSB and other legal aid groups can harness AI’s benefits while upholding their duty to clients and justice.

Next Steps

Future Development, Open Questions, and Recommendations

The successful prototyping of the AI demand letter assistant is just the beginning. Moving forward, there are several steps to be taken before this tool can be fully implemented in production at LASSB. The project team compiled a set of recommendations and priorities for future development, as well as open questions that need to be addressed. Below is an outline of the next steps:

Expand and Refine the Training Data

To improve the AI’s consistency and reliability, the next development team should incorporate additional data sources into the model’s knowledge base. During Winter 2025, the team gathered a trove of relevant documents (DOJ guidance, HUD memos, sample letters, etc.), but not all of this material was fully integrated into the prototype’s prompts.

Organizing and inputting this data will help the AI handle a wider range of scenarios. For example, there may be types of reasonable accommodations (like a request for a wheelchair ramp installation, or an exemption from a parking fee) that were not explicitly tested yet. Feeding the AI examples or templates of those cases will ensure it can draft letters for various accommodation types, not just the service-animal case.

The Winter team has prepared a well-structured archive of resources and notes for the next team, documenting their reasoning and changes made. It includes, for instance, an explanation of why they decided to focus exclusively on accommodation letters (as opposed to tackling both accommodations and modifications in one agent) – knowledge that will help guide future developers so they don’t reinvent the wheel. Leveraging this prepared data and documentation will be a top priority in the next phase.

Improve the AI’s Reliability and Stability

While the prototype is functional, the team observed intermittent issues like the AI repeating itself or getting stuck in loops under certain conditions. Addressing these glitches is critical for a production rollout. The recommendation is to conduct deeper testing and debugging of the model’s behavior under various inputs. Future developers might use techniques like adversarial testing – intentionally inputting confusing or complex information to see where the AI breaks – and then adjusting the prompts or model settings accordingly. There are a few specific issues to fix:

  • The agent occasionally repeats the same question or answer multiple times (this looping behavior might be due to how the conversation history is managed or a quirk of the model). This needs to be debugged so the AI moves on in the script and doesn’t frustrate the user.
  • The agent sometimes fails to recognize certain responses – for example, if a user says “Yeah” instead of “Yes,” will it understand? Ensuring the AI can handle different phrasings and a range of user expressions (including when users might go on tangents or express emotion) is important for robustness (see the sketch after this list).
  • Rarely, the agent might still hallucinate or provide an odd response (e.g., referring to sending an email when it shouldn’t). Further fine-tuning and possibly using a more advanced model with better instruction-following could reduce these occurrences. Exploring the underlying model’s parameters or switching to a model known for higher reliability (if available through the AI platform LASSB chooses) could be an option.
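As one illustration of the kind of fix involved, below is a minimal sketch, with illustrative word lists and thresholds rather than the prototype's actual logic, of a small rule-based layer that normalizes informal yes/no replies and flags when the agent is about to repeat a question it just asked.

```python
# Minimal sketch: normalize informal yes/no replies and detect repeated questions.
# The word lists and repeat threshold are illustrative assumptions, not LASSB's actual rules.

YES_WORDS = {"yes", "yeah", "yep", "sure", "correct", "si", "sí"}
NO_WORDS = {"no", "nope", "nah", "not really"}

def normalize_yes_no(user_reply: str) -> str | None:
    """Map informal phrasings to 'yes'/'no'; return None if the reply needs the model's judgment."""
    cleaned = user_reply.strip().lower().rstrip(".!")
    if cleaned in YES_WORDS:
        return "yes"
    if cleaned in NO_WORDS:
        return "no"
    return None  # tangents, emotional replies, or detailed answers go to the model as-is

def is_looping(recent_agent_turns: list[str], new_question: str, max_repeats: int = 2) -> bool:
    """Flag when the agent is about to ask the same question it already asked recently."""
    repeats = sum(1 for turn in recent_agent_turns[-6:] if turn.strip() == new_question.strip())
    return repeats >= max_repeats
```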

One open question is “why” the model exhibits these occasional errors – it’s often not obvious, because AI models are black boxes to some degree. Future work could involve more diagnostics, such as checking the conversation logs in detail or using interpretability tools to see where the model’s attention is going. Understanding the root causes could lead to more systemic fixes. The team noted that sometimes the model’s mistakes had no clear trigger, which is a reminder that continuous monitoring (as described in evaluation) will be needed even post-launch.

Enhance Usability and Human-AI Collaboration Features

The prototype currently produces a letter draft, but in a real-world setting, the workflow can be made even more user-friendly for both clients and attorneys. Several enhancements are recommended:

Editing Interface

Allow the attorney (or even the client, if appropriate) to easily edit the AI-generated letter in the interface. For instance, after the AI presents the draft, there could be an “Edit” button that opens the text in a word processor-like environment. This would save the attorney from having to copy-paste into a separate document. The edits made could even be fed back to the AI (as learning data) to continuously improve it.

Download/Export Options

Integrate a feature to download the letter as a PDF or Word document. LASSB staff indicated they would want the final letter in a standard format for record-keeping and for the client to send. Automating this (the AI agent could fill a PDF template or use a document assembly tool) would streamline the process. One idea is to integrate with LASSB’s existing document system or use a platform like Documate or Gavel (which LASSB uses for other forms) – the AI could output data into those systems to produce a nicely formatted letter on LASSB letterhead.
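As a rough illustration of the export step, the sketch below uses the python-docx library to drop a generated draft into a Word file; the letterhead line and file name are placeholders, and a document-assembly platform such as Gavel could serve the same role through its own templates.

```python
# Sketch: export an AI-drafted letter to a Word document for attorney review and client use.
# Requires the python-docx package; the letterhead line and filename are placeholder assumptions.
from docx import Document

def export_letter(draft_text: str, client_name: str, path: str = "accommodation_letter.docx") -> str:
    doc = Document()
    doc.add_paragraph("LASSB")  # placeholder for the organization's letterhead
    doc.add_paragraph("")       # spacing before the body
    for paragraph in draft_text.split("\n\n"):
        doc.add_paragraph(paragraph)
    doc.add_paragraph("")
    doc.add_paragraph(f"Prepared for: {client_name} (draft pending attorney review)")
    doc.save(path)
    return path
```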

Transcript and Summary for Attorneys

When the AI finishes the interview, it can provide not just the letter but also a concise summary of the client’s situation along with the full interview transcript to the attorney. The summary could be a paragraph saying, e.g., “Client Jane Doe requests an exception to no-pet policy for her service dog. Landlord: ABC Properties. No prior requests made. Client has anxiety disorder managed by dog.”

Such a summary, generated automatically, would allow the reviewing attorney to very quickly grasp the context without reading the entire Q&A transcript. The transcript itself should be saved and accessible (perhaps downloadable as well) so the attorney can refer back to any detail if needed. These features will decrease the need for the attorney to re-interview the client, thus preserving the efficiency gains.
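A minimal sketch of how that summary step could work is below, assuming the OpenAI Python SDK as a stand-in for whichever AI platform LASSB ultimately chooses; the model name and prompt wording are illustrative assumptions.

```python
# Sketch: generate a short attorney-facing summary from the interview transcript.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def summarize_for_attorney(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": (
                "Summarize this reasonable-accommodation intake interview in 3-4 sentences "
                "for a reviewing attorney: the accommodation requested, the disability connection, "
                "the landlord, and any prior requests. Do not add facts not in the transcript."
            )},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```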

User Interface and Guidance

On the client side, ensure the chat interface is easy to use. Future improvements could include adding progress indicators (to show the client how many questions or sections are left), the ability to go back and change an answer, or even a voice option for clients who have difficulty typing (this ties into accessibility, discussed next). Essentially, polish the UI so that it is client-friendly and accessible.

Integration into LASSB’s Workflow 

In addition to the front-end enhancements, the tool should be integrated with LASSB’s backend systems. A recommendation is to connect the AI assistant to LASSB’s case management software (LegalServer) via API. This way, when a letter is generated, a copy could automatically be saved to the client’s case file in LegalServer. It could also pull basic info (like the client’s name, address) from LegalServer to avoid re-entering data. Another integration point is the hotline system – if in the future the screening AI is deployed, linking the two AIs could be beneficial (for example, intake answers collected by the screening agent could be passed directly to the letter agent, so the client doesn’t repeat information). These integrations, while technical, would ensure the AI tool fits seamlessly into the existing workflow rather than as a stand-alone app.
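The exact integration would follow LegalServer's documented API and LASSB's data policies, but the general shape is a single authenticated call that attaches the finished letter to the client's case record. The sketch below is purely illustrative: the endpoint path, field names, and token-based authentication are hypothetical placeholders, not LegalServer's real interface.

```python
# Purely illustrative sketch of saving a generated letter to a case management system.
# The URL, field names, and token-based auth are hypothetical placeholders; a real
# integration would follow LegalServer's documented API and LASSB's data policies.
import requests

def save_letter_to_case(case_id: str, letter_text: str, api_base: str, api_token: str) -> bool:
    response = requests.post(
        f"{api_base}/cases/{case_id}/documents",          # hypothetical endpoint
        headers={"Authorization": f"Bearer {api_token}"},  # hypothetical auth scheme
        json={
            "title": "AI-drafted reasonable accommodation letter (pending attorney review)",
            "body": letter_text,
        },
        timeout=30,
    )
    return response.ok
```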

Broaden Accessibility and Language Support

San Bernardino County has a diverse population, and LASSB serves many clients for whom English is not a first language or who have disabilities that might make a standard chat interface challenging. Therefore, a key next step is to add multilingual capabilities and other accessibility features. The priority is Spanish language support, as a significant portion of LASSB’s client base is Spanish-speaking. This could involve developing a Spanish version of the AI agent – using a bilingual model or translating the prompt and output. The AI should ideally be able to conduct the interview in Spanish and draft the letter in Spanish, which the attorney could then review (noting that the final letter might need to be in English if sent to an English-speaking landlord, but at least the client interaction can be in their language). 
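One lightweight way to support this, sketched below under the assumption that the agent is driven by a system prompt, is to parameterize the interview language and the letter's output language separately; the prompt wording here is illustrative, not the prototype's actual prompt.

```python
# Sketch: parameterize the interview language and the letter's output language separately.
# The prompt wording is an illustrative assumption, not the prototype's actual prompt.
INTERVIEW_PROMPTS = {
    "en": "You are an intake assistant for a reasonable accommodation request. "
          "Ask one question at a time, in plain English.",
    "es": "Eres un asistente de admisión para una solicitud de adaptación razonable. "
          "Haz una pregunta a la vez, en español sencillo.",
}

def build_system_prompt(interview_language: str = "en", letter_language: str = "en") -> str:
    base = INTERVIEW_PROMPTS.get(interview_language, INTERVIEW_PROMPTS["en"])
    letter_lang_name = "English" if letter_language == "en" else "Spanish"
    return f"{base} Draft the final letter in {letter_lang_name}, for attorney review."
```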

In addition, for clients with visual impairments, the interface should be compatible with screen readers (text-to-speech for the questions, etc.), and for those with low literacy or who prefer oral communication, a voice interface could be offered (perhaps reintroducing a refined version of the phone-based system, but integrated with the letter agent’s logic). Essentially, the tool should follow universal design principles so that no client is left out due to the technology format. This may require consulting accessibility experts and doing user testing with clients who have disabilities. 

Plan for Deployment and Pilot Testing

Before a full rollout, the team recommends a controlled pilot phase. In a pilot, a subset of LASSB staff and clients would use the AI tool on actual cases (with close supervision). Data from the pilot – success stories, any problems encountered, time saved metrics – should be collected and evaluated. This will help answer some open questions, such as: 

  • How do clients feel about interacting with an AI for part of their legal help? 
  • Does it change the attorney-client dynamic in any way? 
  • Are there cases where the AI approach doesn’t fit well (for instance, if a client has multiple legal issues intertwined, can the AI handle the nuance or does it confuse things)? 

These practical considerations will surface in a pilot. The pilot can also inform best practices for training staff on using the tool. Perhaps attorneys need a short training session on how to review AI drafts effectively, or intake staff need a script to explain to clients what the AI assistant is when transferring them. Developing guidelines and training materials is part of deployment. Additionally, during the pilot, establishing a feedback loop (maybe a weekly meeting to discuss all AI-drafted letters that week) will help ensure any kinks are worked out before scaling up. 

Address Open Questions and Long-Term Considerations

Some broader questions remain as this project moves forward.

How to Handle Reasonable Modifications

The current prototype focuses on reasonable accommodations (policy exceptions or services). A related need is reasonable modifications (physical changes to property, like installing a ramp). Initially, the team planned to include both, but they narrowed the scope to accommodations for manageability. Eventually, it would be beneficial to expand the AI’s capabilities to draft modification request letters as well, since the legal framework is similar but not identical. This might involve adding a branch in the conversation: if the client is requesting a physical modification, the letter would cite slightly different laws (e.g., California Civil Code related to modifications) and possibly include different information (like who will pay for the modification, etc.). The team left this as a future expansion area. In the interim, LASSB should be aware that the current AI might need additional training/examples before it can reliably handle modification cases.

Ensuring Ongoing Ethical Compliance

As the tool evolves, LASSB will need to regularly review it against ethical guidelines. For instance, if State Bar rules on AI use get updated, the system’s usage might need to be adjusted. Keeping documentation of how the AI works (so it can be explained to courts if needed) will be important. Questions like “Should clients be informed an AI helped draft this letter?” might arise – currently the plan would be to disclose if asked, but since an attorney is reviewing and signing off, the letter is essentially an attorney work product. LASSB might decide internally whether to be explicit about AI assistance or treat it as part of their workflow like using a template.

Maintenance and Ownership 

Who will maintain the AI system long-term? The recommendation is that LASSB identify either an internal team or an external partner (perhaps continuing with Stanford or another tech partner) to assume responsibility for piloting and updates.

AI models and integrations require maintenance – for example, if new housing laws pass, the model/prompt should be updated to include that. If the AI service (API) being used releases a new version that’s better/cheaper, someone should handle the upgrade. Funding might be needed for ongoing API usage costs or server costs. Planning for these practical aspects will ensure the project’s sustainability.

Scaling to Other Use Cases

If the demand letter agent proves successful, it could inspire similar tools for other high-volume legal aid tasks (for instance, generating answers to eviction lawsuits or drafting simple wills). One open question is how easily the approach here can be generalized. The team believes the framework (AI + human review) is generalizable, but each new use case will require its own careful curation of data and prompts. 

The success in the housing domain suggests LASSB and Stanford may collaborate to build AI assistants for other domains in the future (like an Unlawful Detainer Answer generator, etc.). This project can serve as a model for those efforts.

Finally, the team offered some encouraging closing thoughts: The progress so far shows that a tool like this “could significantly improve the situation and workload for staff at LASSB, allowing many more clients to receive legal assistance.” There is optimism that, with further development, the AI assistant can be deployed and start making a difference in the community. However, they also cautioned that “much work remains before this model can reach the deployment phase.”

It will be important for future teams to continue with the same diligent approach – testing, iterating, and addressing the AI’s flaws – rather than rushing to deploy without refinement. The team emphasized a balance of excitement and caution: AI has great potential for legal aid, but it must be implemented thoughtfully. The next steps revolve around deepening the AI’s capabilities, hardening its reliability, improving the user experience, and carefully planning a real-world rollout. By following these recommendations, LASSB can move from a successful prototype to a pilot and eventually to a fully integrated tool that helps their attorneys and clients every day. The vision is that in the near future, a tenant with a disability in San Bernardino can call LASSB and, through a combination of compassionate human lawyers and smart AI assistance, quickly receive a strong demand letter that protects their rights – a true melding of legal expertise and technology to advance access to justice.

With continued effort, collaboration, and care, this prototype AI agent can become an invaluable asset in LASSB’s mission to serve the most vulnerable members of the community. The foundation has been laid; the next steps will bring it to fruition.

Categories
AI + Access to Justice Current Projects

A Call for Statewide Legal Help AI Stewards

Shaping the Future of AI for Access to Justice

By Margaret Hagan, originally published on Legal Design & Innovation

If AI is going to advance access to justice rather than deepen the justice gap, the public-interest legal field needs more than speculation and pilots — we need statewide stewardship.

2 missions of an AI steward, for a state’s legal help service provider community

We need specific people and institutions in every state who wake up each morning responsible for two things:

  1. AI readiness and vision for the legal services ecosystem: getting organizations knowledgeable, specific, and proactive about where AI can responsibly improve outcomes for people with legal problems — and improve the performance of services. This can ensure the intelligent and impactful adoption of AI solutions as they are developed.
  2. AI R&D encouragement and alignment: getting vendors, builders, researchers, and benchmark makers on the same page about concrete needs; matchmaking them with real service teams; guiding, funding, evaluating, and communicating so the right tools get built and adopted.

Ideally, these local state stewards will be talking with each other regularly. In this way, there can be federated research & development of AI solutions for legal service providers and the public struggling with legal problems.

This essay outlines what AI + Access to Justice stewardship could look like in practice — who can play the role, how it works alongside court help centers and legal aid, and the concrete, near-term actions a steward can take to make AI useful, safe, and truly public-interest.

State stewards can help local legal providers — legal aid groups, court help centers, pro bono networks, and community justice workers — to set a clear vision for AI futures & help execute it.

Why stewardship — why now?

Every week, new tools promise to draft, translate, summarize, triage, and file. Meanwhile, most legal aid organizations and court help centers are still asking foundational questions: What’s safe? What’s high-value? What’s feasible with our staff and privacy rules? How do we avoid vendor lock-in? How do we keep equity and client dignity at the center?

Without stewardship, AI adoption will be fragmented, extractive, and inequitable. With stewardship, states can:

  • Focus AI where it demonstrably helps clients and staff. Prioritize tech based on community and provider stakeholders’ needs and preferences — not just what is being sold by vendors.
  • Prepare data and knowledge so tools work in local contexts, and so they can be trained safely and benchmarked responsibly with relevant data that has been masked and made safe.
  • Align funders, vendors, and researchers around real service needs, so that these stakeholder groups direct their capacity to support, build, and evaluate emerging technology toward opportunities that are meaningful.
  • Develop shared evaluation and governance so we build trust, not backlash.

Who can play the Statewide AI Steward role?

“Steward” is a role, not a single job title. Different kinds of groups can carry it, depending on how your state is organized:

  • Access to Justice Commissions / Bar associations / Bar foundations that convene stakeholders, fund statewide initiatives, and set standards.
  • Legal Aid Executive Directors (or cross-org consortia) with authority to coordinate practice areas and operations.
  • Court innovation offices / judicial councils that lead technology, self-help, and rules-of-court implementations.
  • University labs / legal tech nonprofits that have capacity for research, evaluation, data stewardship, and product prototyping.
  • Regional collaboratives with a track record of shared infrastructure and implementation.

Any of these can steward. The common denominator: local trusted relationships, coordination power, and delivery focus. The steward must be able to convene local stakeholders, communicate with them, work with them on shared training and data efforts, and move from talk to action.

The steward’s two main missions

Mission 1: AI readiness + vision (inside the legal ecosystem)

The steward gets legal organizations — executive directors, supervising/managing attorneys, practice leads, intake supervisors, operations staff — knowledgeable and specific about where AI can responsibly improve outcomes. This means:

  • Translating AI into service-level opportunities (not vague “innovation”).
  • Running short, targeted training sessions for leaders and teams.
  • Co-designing workflow pilots with clear review and safety protocols.
  • Building a roadmap: which portfolios, which tools, what sequence, what KPIs.
  • Clarifying ethical, privacy, and consumer/client safety priorities and strategies, so teams can talk about risks and worries in specific, technically informed ways that provide sufficient protection to users and orgs, rather than falling into inaction because of ill-defined concerns about risk.

The result: organizations are in charge of the change rather than passive recipients of vendor pitches or media narratives.

Mission 2: AI tech encouragement + alignment (across the supply side)

The steward gets the groups who specialize in building and evaluating technology — vendors, tech groups, university researchers, benchmarkers — pointed at the right problems with the right real-world partnerships:

  • Publishing needs briefs by portfolio (housing, reentry, debt, family, etc).
  • Matchmaking teams and vendors; structuring pilots with data, milestones, evaluation, and governance. Helping organizations choose a best-in-class vendor and then also manage this relationship with regular evaluation.
  • Contributing to benchmarks, datasets, and red-teaming so the field learns together. Building the infrastructure that can lead to effective, ongoing evaluation of how AI systems are performing.
  • Helping fund and scale what works; communicating results frankly. Ensuring that prototypes and pilots’ outcomes are shared to inform others of what they might adopt, or what changes must happen to the AI solutions for them to be adopted or scaled.

The result: useful and robust AI solutions built with frontline reality, evaluated transparently, and ready to adopt responsibly.

What Stewards Could Do Month-to-Month

I have been brainstorming specific actions that a statewide steward could take. Many of these actions could also be done in concert with a federated network of stewards.

Some of the things a state steward could do to advance responsible, impactful AI for Access to Justice in their region.

Map the State’s Ecosystem of Legal Help

Too often, we think in terms of organizations — “X Legal Aid,” “Y Court Help Center” — instead of understanding who’s doing the actual legal work.

Each state needs to start by identifying the legal teams operating within its borders.

  • Who is doing eviction defense?
  • Who helps people with no-fault divorce filings?
  • Who handles reasonable accommodation letters for tenants?
  • Who runs the reentry clinic or expungement help line?
  • Who offers debt relief letter assistance?
  • Who does restraining order help?

This means mapping not just legal help orgs, but service portfolios and delivery models. What are teams doing? What are they not doing? And what are the unmet legal needs that clients consistently face?

This is a service-level analysis — an inventory of the “market” of help provided and the legal needs not yet met.

AI Training for Leaders + Broader Legal Organizations

Most legal aid and court help staff are understandably cautious about AI. Many don’t feel in control of the changes coming — they feel like they’re watching the train leave the station without them.

The steward’s job is to change that.

  • Demystify AI: Explain what these systems are and how they can support (or undermine) legal work.
  • Coach teams: Help practice leads and service teams see which parts of their work are ripe for AI support.
  • Invite ownership: Position AI not as a threat, but as a design space — a place where legal experts get to define how tools should work, and where lawyers and staff retain the power to review and direct.

To do this, stewards can run short briefings for EDs, intake leads, and practice heads on LLM basics, use cases, risks, UPL and confidentiality, and adoption playbooks. Training aims to get them conversant in the basics of the technology and help them envision where responsible opportunities might be. Let them see real-world examples of how other legal help providers are using AI behind the scenes or directly with the public.

Brainstorm + Opportunity Mapping Workshops with Legal Teams

Bring housing teams, family law facilitator teams, reentry teams, or other specific legal teams together. Have them map out their workflows and choose which of their day-to-day tasks is AI-opportune. Which of the tasks are routine, templated, and burdensome?

As stewards run these workshops, they can be on the lookout for where legal teams in their state can build, buy, or adopt an AI solution in 3 areas.

When running an AI opportunity brainstorm, it’s worth considering these 3 zones: where can we add to existing full-representation legal services, where can we add to brief or pro bono services, and where can we add services that legal teams don’t currently offer?

Brainstorm 1: AI Copilots for Services Legal Teams Already Offer

This is the lowest-risk, highest-benefit space. Legal teams are already helping with eviction defense, demand letters, restraining orders, criminal record clearing, etc.

Here, AI can act as a copilot for the expert — a tool that does things that the expert lawyer, paralegal, or legal secretary is already doing in a rote way:

  • Auto-generates first drafts based on intake data
  • Summarizes client histories
  • Auto-fills court forms
  • Suggests next actions or deadlines
  • Creates checklists, declarations, or case timelines

These copilots don’t replace lawyers. They reduce drudge work, improve quality, and make staff more effective.

Brainstorm 2: AI Copilots for Services That Could Be Done by Pro Bono or Volunteers

Many legal aid organizations know where they could use more help: limited-scope letters, form reviews, answering FAQs, or helping users navigate next steps.

AI can play a key role in unlocking pro bono, brief advice, and volunteer capacity:

  • Automating burdensome tasks like collecting or reviewing database records
  • Helping them write high-quality letters or motions
  • Pre-filling petitions and forms with data that has been gathered
  • Providing them with step-by-step guidance
  • Flagging errors, inconsistencies, or risks in drafts
  • Offering language suggestions or plain-language explanations

Think of this as AI-powered “training wheels” that help volunteers help more people, with less handholding from staff.

Brainstorm 3: AI Tools for Services That Aren’t Currently Offered — But Should Be

There are many legal problems where there is high demand, but legal help orgs don’t currently offer help because of capacity limits.

Common examples of these under-served areas include:

  • Security deposit refund letters
  • Creating demand letters
  • Filing objections to default judgments
  • Answering brief questions

In these cases, AI systems — carefully designed, tested, and overseen — can offer direct-to-consumer services that supplement the safety net:

  • Structured interviews that guide users through legal options
  • AI-generated letters/forms with oversight built in
  • Clear red flags for when human review is needed

This is the frontier: responsibly extending the reach of legal help to people who currently get none. The brainstorm might also include reviewing existing direct-to-consumer AI tools from other legal orgs, and deciding which they might want to host or link to from their website.

The steward can hold these brainstorming and prioritization sessions to help legal teams find these legal team co-pilots, pro bono tools, and new service offerings in their issue area. The stewards and legal teams can then move the AI vision forward and prepare a clear scope for what AI should be built.

Data Readiness + Knowledge Base Building

Work with legal and court teams to inventory what data they have that could be used to train or evaluate some of the legal AI use cases they have envisioned. Support them with tools and protocols to mask PII in these documents and make them safe to use in AI R&D.

This could mean getting anonymized completed forms, documents, intake notes, legal answers, data reports, or other legal workflow items. Likely, much of this data will have to be labeled, scored, and marked up so that it’s useful in training and evaluation.

The steward can help the groups that hold this data to understand what data they hold, how to prepare it and share it, and how to mark it up with helpful labels.
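As one small example of what a masking protocol can look like in practice, here is a minimal regex-based sketch that redacts obvious identifiers (emails, phone numbers, and a staff-supplied list of names) before a document is shared for AI R&D; a real protocol would pair this with human review and more thorough de-identification tooling.

```python
# Minimal sketch: redact obvious PII before sharing documents for AI training or evaluation.
# Regex patterns and placeholder tags are illustrative; production use would add human review
# and more capable de-identification tooling.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")

def mask_pii(text: str, known_names: list[str]) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in known_names:  # names flagged by staff who know the case
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text
```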

Part of this is also to build a Local Legal Help Knowledge Base — not just about the laws and statutes on the books, but about the practical, procedural, and service knowledge that people need when trying to deal with a legal problem.

Much of this knowledge is in legal aid lawyers’ and court staff’s heads, or training decks and events, or internal knowledge management systems and memos.

Stewards can help these local organizations contribute this knowledge about local legal rules, procedures, timelines, forms, services, and step-by-step guides into a statewide knowledge base. This knowledge base can then be used by the local providers. It will be a key piece of infrastructure on which new AI tools and services can be built.

Adoption Logistics

As local AI development visions come together, the steward can lead on adoption logistics.

The steward can make sure that the local orgs don’t reinvent what might already exist, or spend money in a wasteful way.

They can do tool evaluations to see which LLMs and specific AI solutions perform best on the scoped tasks. They can identify researchers and evaluators to help with this. They can also help organizations procure these tools or even create a pool of multiple organizations with similar needs for a shared procurement process.

They might also negotiate beneficial, affordable licenses or access to AI tools that can help with the desired functions. They can also ensure that case management and document management systems are responsive to the AI R&D needs, so that the legacy technology systems will integrate well with the new tools.

Ideally, the steward will help the statewide group and the local orgs make smart investments in the tech they might need to buy or build — and can help clear the way when hurdles emerge.

Bigger-Picture Steward Strategies

In addition to these possible actions, statewide stewards can also follow a few broader strategies to get a healthy AI R&D ecosystem in their state and beyond.

Be specific to legal teams

As I’ve already mentioned throughout this essay, stewards should be focused on the ‘team’ level, rather than the ‘organization’ one. It’s important that they develop relationships and run activities with teams that are in charge of specific workflows — and that means the specific kind of legal problem they help with.

Stewardship should organize its statewide network around named teams and named services, for example:

  • Housing law teams & their workflows: hotline consults, eviction defense prep, answers, motions to set aside, trial prep, RA letters for habitability issues, security-deposit demand letters.
  • Reentry teams & their workflows: record clearance screening, fines & fees relief, petitions, supporting declarations, RAP sheet interpretation, collateral consequences counseling.
  • Debt/consumer teams & their workflows: answer filing, settlement letters, debt verification, exemptions, repair counseling, FDCPA dispute letters.
  • Family law teams & their workflows: form prep (custody, DV orders), parenting plans, mediation prep, service and filing instructions, deadline tracking.

The steward can make progress on its 2 main goals — AI readiness and R&D encouragement — if it can build a strong local network among the teams that work on similar workflows, with similar data and documents, with similar audiences.

Put ethics, privacy, and operational safeguards at the center

Stewardship builds trust by making ethics operational rather than an afterthought. This all happens when AI conversations are grounded, informed, and specific among legal teams and communities. It also happens when they work with trained evaluators, who know how to evaluate the performance of AI rigorously, not based on anecdotes and speculation.

The steward network can help by planning out and vetting common, proven strategies to ensure quality & consumer protection are designed into the AI systems. They could work on:

  • Competence & supervision protocols: helping legal teams plan for the future of expert review of AI systems, clarifying “eyes-on” review models with staff trainings and tools. Stewards can also help them plan for escalation paths, when human reviewers find problems with the AI’s performance. Stewards might also work on standard warnings, verification prompts, and other key designs to ensure that reviewers are effectively watching AI’s performance.
  • Professional ethics rules clarity: help the teams design internal policies that ensure they’re in compliance with all ethical rules and responsibilities. Stewards can also help them plan out effective disclosures and consent protocols, so consumers know what is happening and have transparency.
  • Confidentiality & privacy: This can happen at the federated/national level. Stewards can set rules for data flows, retention, and de-identification/masking — which otherwise can be overwhelming for specific orgs. Stewards can also vet vendors for security and subprocessing.
  • Accountability & Improvements: Stewards can help organizations and vendors plan for good data-gathering & feedback cycles about AI’s performance. This can include guidance on document versioning, audit logs, failure reports, and user feedback loops.

Stewards can help bake safeguards into workflows and procurement, so that there are ethics and privacy by design in the technical systems that are being piloted.

Networking stewards into a federated ecosystem

For statewide stewardship to matter beyond isolated pilots, stewards need to network into a federated ecosystem — a light but disciplined network that preserves local autonomy while aligning on shared methods, shared infrastructure, and shared learning.

The value of federation is compounding: each state adapts tools to local law and practice, contributes back what it learns, and benefits from the advances of others. Also, many of the tasks of a steward — educating about AI, building ethics and safeguards, measuring AI, setting up good procurement — will be quite similar state-to-state. Stewards can share resources and materials to implement locally.

What follows reframes “membership requirements” as the operating norms of that ecosystem and explains how they translate into concrete habits, artifacts, and results.

Quarterly check-ins become the engine of national learning. Stewards participate in a regular virtual cohort, not as a status ritual but as an R&D loop. Each session surfaces what was tried, what worked, and what failed — brief demos, before/after metrics, and annotated playbooks.

Stewards use these meetings to co-develop materials, evaluation rubrics, funding strategies, disclosure patterns, and policy stances, and to retire practices that didn’t pan out. Over time, this cadence produces a living canon of benchmarks and templates that any newcomer steward can adopt on day one.

Each year, the steward could champion at least one pilot or evaluation (for example, reasonable-accommodation letters in housing or security-deposit demand letters in consumer law), making sure it has clear success criteria, review protocols, and an exit ramp if risks outweigh benefits. This can help the pilots spread to other jurisdictions more effectively.

Shared infrastructure is how federation stays interoperable. Rather than inventing new frameworks in every state, stewards lean on common platforms for evaluation, datasets, and reusable workflows. Practically, that means contributing test cases and localized content, adopting shared rubrics and disclosure patterns, and publishing results in a comparable format.

It also means using common identifiers and metadata conventions so that guides, form logic, and service directories can be exchanged or merged without bespoke cleanup. When a state localizes a workflow or improves a safety check, it pushes the enhancement upstream, so other states can pull it down and adapt with minimal effort.

Annual reporting turns stories into evidence and standards. Each steward could publish a concise yearly report that covers: progress made, obstacles encountered, datasets contributed (and their licensing status), tools piloted or adopted (and those intentionally rejected), equity and safety findings, and priorities for the coming year.

Because these reports follow a common outline, they are comparable across states and can be aggregated nationally to show impact, surface risks, and redirect effort. They also serve as onboarding guides for new teams: “Here’s what to try first, here’s what to avoid, here’s who to call.”

Success in 12–18 months looks concrete and repeatable. In a healthy federation, we could point to a public, living directory of AI-powered teams and services by portfolio, with visible gaps prioritized for action.

  • We could have several legal team copilots embedded in high-volume workflows — say, demand letters, security-deposit letters, or DV packet preparation — with documented time savings, quality gains, and staff acceptance.
  • We could have volunteer unlocks, where a clinic or pro bono program helps two to three times more people in brief-service matters because a copilot provides structure, drafting support, and review checkers.
  • We could have at least one direct-to-public workflow launched in a high-demand, manageable-risk area, with clear disclosures, escalation rules, and usage metrics.
  • We would see more contributions to data-driven evaluation practices and R&D protocols. This could be localized guides, triage logic, form metadata, anonymized samples, and evaluation results. Or it could be an ethics and safety playbook that is not just written but operationalized in training, procurement, and audits.

A federation of stewards doesn’t need heavy bureaucracy. It could be a set of light, disciplined habits that make local work easier and national progress faster. Quarterly cohort exchanges prevent wheel-reinventing. Local duties anchor AI in real services. Shared infrastructure keeps efforts compatible. Governance protects the public-interest character of the work. Annual reports convert experience into standards.

Put together, these practices allow stewards to move quickly and responsibly — delivering tangible improvements for clients and staff while building a body of knowledge the entire field can trust and reuse.

Stewardship as the current missing piece

Our team at Stanford Legal Design Lab is aiming for an impactful, ethical, robust ecosystem of AI in legal services. We are building the platform JusticeBench to be a home base for those working on AI R&D for access to justice. We are also building justice co-pilots directly with several legal aid groups.

But to build this robust ecosystem, we need local stewards for state jurisdictions across the country — who can take on key leadership roles and decisions — and make sure that there can be A2J AI that responds to local needs but benefits from national resources. Stewards can also help activate local legal teams, so that they are directing the development of AI solutions rather than reacting to others’ AI visions.

We can build legal help AI state by state, team by team, workflow by workflow. But we need stewards who keep clients, communities, and frontline staff at the center, while moving their state forward.

That’s how AI becomes a force for justice — because we designed it that way.

Categories
AI + Access to Justice

Human-Centered AI R&D at ICAIL’s Access to Justice Workshop

By Margaret Hagan, Executive Director of the Legal Design Lab

At this year’s International Conference on Artificial Intelligence and Law (ICAIL 2025) in Chicago, we co-hosted the AI for Access to Justice (AI4A2J) workshop—a full-day gathering of researchers, technologists, legal practitioners, and policy experts, all working to responsibly harness artificial intelligence to improve public access to justice.

The workshop was co-organized by an international team: myself (Margaret Hagan) from Stanford Legal Design Lab; Quinten Steenhuis of Suffolk University Law School/LIT Lab; Hannes Westermann of Maastricht University; Marc Lauritsen of Capstone Practice Systems; and Jaromir Savelka of Carnegie Mellon University. Together, we gathered 22 papers from contributors across the globe, representing deep work from Brazil, Czechia, Singapore, the UK, Canada, Italy, Finland, Australia, Taiwan, India, and the United States.

A Truly Global Conversation

What stood out most was the breadth of global participation and the specificity of solutions offered. Rather than high-level speculation, nearly every presentation shared tangible, grounded proposals or findings: tools developed and deployed, evaluative frameworks created, and real user experiences captured.

Whether it was a legal aid chatbot deployed in British Columbia, a framework for human-centered AI development from India, or benchmark models to evaluate AI-generated legal work product in Brazil, the contributions showcased the power of bottom-up experimentation and user-centered development.

A Diversity of Roles and Perspectives

Participants included legal researchers, practicing attorneys, judges, technologists, policy designers, and evaluation experts. The diversity of professional backgrounds allowed for robust discussion across multiple dimensions of justice system transformation. Each participant brought a unique lens—whether from working directly with vulnerable litigants, building AI systems, or establishing ethical and regulatory frameworks for new technologies.

Importantly, the workshop centered interdisciplinary collaboration. It wasn’t just legal professionals theorizing about AI, or technologists proposing disconnected tools. Instead, we heard from hybrid teams conducting qualitative user research, sharing open-source datasets, running field pilots, and conducting responsible evaluations of AI interventions in real-world settings.

Emerging Themes Across the Day

Across four themed panels, several core themes emerged:

  1. Human-Centered AI for Legal Aid and Self-Help
    Projects focused on building AI copilots and tools to support legal aid organizations and self-represented litigants. Presenters shared tools to help tenants facing eviction, systems to automate form filling with contextual guidance, and bots that assist in court navigation. Importantly, these tools were being built in partnership with legal aid teams and directly with users, with ongoing evaluations of quality, safety, and impact.
  2. Legal Writing, Research, and Data Tools
    A second group of projects explored how AI could help professionals and SRLs write legal documents, draft arguments, and find relevant precedent more efficiently. These systems included explainable outcome predictors for custody disputes, multilingual legal writing assistants, and knowledge graphs built from court filings. Many papers detailed methods for aligning AI output with local legal contexts, language needs, and cultural sensitivity.
  3. Systems-Level Innovation and AI Infrastructure
    A third set of papers zoomed out to the system level. Projects explored how AI could enable better triage and referral systems, standardized data pipelines, and early intervention mechanisms (e.g., detecting legal risk from text messages or scanned notices). We also heard from teams building open-source infrastructure for courts, public defenders, and justice tech startups to use.
  4. Ethics, Evaluation, and Responsible Design
    Finally, the workshop closed with discussions of AI benchmarks, regulatory models, and ethical frameworks to guide the development and deployment of legal AI tools. How do we measure the accuracy, fairness, and usefulness of a generative AI system when giving legal guidance? What does it mean to provide “good enough” help when full representation isn’t possible? Multiple projects proposed evaluation toolkits, participatory design processes, and accountability models for institutions adopting these tools.

Building on Past Work and Sharing New Ideas

Many workshop presenters built directly on prior research, tools, and evaluation methods developed through the Legal Design Lab and our broader community. We were especially excited to see our Lab’s Legal Q&A Evaluation Rubrics, originally developed to benchmark the quality of automated legal information, being adopted in People’s Law School’s Beagle+ project in British Columbia as they deploy and test a user-facing AI chatbot to answer people’s common legal questions.

Another compelling example came from Georgetown University, where our previous Visual Legal design work products, patterns, and communication tools are now inspiring a new AI-powered visual design creator built by Brian Rhindress. Their tool helps legal aid organizations and court staff visually and textually explain legal processes to self-represented litigants—leveraging human-centered design and large language models to generate tailored explanations and visuals. A group can take its text materials and convert them into human-centered visual designs, using LLMs + examples (including those from our Lab and other university/design labs).

We’re excited to see these threads of design and evaluation from previous Stanford Legal Design Lab work continuing to evolve across jurisdictions.

The Need for Empirical Grounding and Regulatory Innovation

A major takeaway from the group discussions was the urgent need for new empirical research on how people actually interact with legal AI tools—what kinds of explanations they want, what kinds of help they trust, and what types of disclosures and safeguards are meaningful. Rather than assuming that strict unauthorized practice of law (UPL) rules will protect consumers, several papers challenged us to develop smarter, more nuanced models of consumer protection, ones grounded in real user behavior and real-world harms and benefits.

This opens the door for a new generation of research—not just about what AI can do, but about what regulatory frameworks and professional norms will ensure the tools truly serve the public good.

Highlights

There were many exciting contributions among the 22 presentations. Here is a short overview, and I encourage you to explore all the draft papers.

Tracking and Improving AI Tools with Real-World Usage Data: The Beagle+ Experiment in British Columbia

One of the standout implementations shared at the workshop came from People’s Law School in British Columbia: the Beagle+ project. This legal chatbot, launched in early 2024, builds on years of legal aid innovation to offer natural-language assistance to users navigating everyday legal questions. What makes Beagle+ especially powerful is its integrated feedback and monitoring system: each interaction is logged with Langfuse, recording inputs, outputs, system prompts, retrieval sources, and more.

The team uses this real-world usage data to monitor system accuracy, cost, latency, and user empowerment over time—allowing iterative improvements that directly respond to user behavior.
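For teams considering a similar monitoring setup, here is a minimal, hypothetical sketch of the kind of per-interaction record such a system might capture and aggregate. It is my own simplification, not Beagle+’s implementation or the Langfuse API; the field names and the sample interaction are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean

@dataclass
class InteractionLog:
    """One chatbot turn, with the kinds of fields a monitoring tool might record."""
    user_input: str
    system_prompt: str
    retrieval_sources: list[str]
    model_output: str
    latency_ms: float
    cost_usd: float
    reviewer_grade: str | None = None  # e.g. "accurate", "partially", "hallucinated"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class UsageMonitor:
    """Accumulates logs so a team can track accuracy, cost, and latency over time."""
    def __init__(self) -> None:
        self.logs: list[InteractionLog] = []

    def record(self, log: InteractionLog) -> None:
        self.logs.append(log)

    def summary(self) -> dict:
        graded = [l for l in self.logs if l.reviewer_grade is not None]
        return {
            "interactions": len(self.logs),
            "avg_latency_ms": mean(l.latency_ms for l in self.logs),
            "total_cost_usd": sum(l.cost_usd for l in self.logs),
            "accuracy_rate": (
                sum(l.reviewer_grade == "accurate" for l in graded) / len(graded)
                if graded else None
            ),
        }

# Illustrative usage with a single made-up interaction
monitor = UsageMonitor()
monitor.record(InteractionLog(
    user_input="My landlord kept my damage deposit. What can I do?",
    system_prompt="You are a legal information assistant for British Columbia...",
    retrieval_sources=["tenancy-deposits-guide"],
    model_output="There are deadlines for returning a deposit; see the tenancy guide...",
    latency_ms=2400.0,
    cost_usd=0.004,
    reviewer_grade="accurate",
))
print(monitor.summary())
```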

They also presented experiments in generative legal editing, exploring the chatbot’s ability to diagnose or correct contract clauses, with promising results. Yet, the team emphasized that no AI tool is perfect out of the box—for now, human review and thoughtful system design remain essential for safe deployment.


Helping Workers Navigate Employment Disputes in the UK: AI-Powered ODR Recommenders

Glory Ogbonda and Sarah Nason presented a pioneering tool from the UK designed to help workers triage their employment disputes and find the right online dispute resolution (ODR) system. Funded by the Solicitors Regulation Authority, this research uncovered what users in employment disputes really want: not just legal signposting, but a guided journey. Their proposed ODR-matching system uses RAG (retrieval-augmented generation) to give users an intuitive flow: first collecting plain-language descriptions of their workplace conflict, then offering legal framing, suggested next steps, and profiles of potential legal tools.

User testing revealed a tension between formal legal accuracy and the empathy and clarity that users crave. The project underscores a core dilemma in legal AI: how to balance actionable, user-centered advice with the guardrails of legal ethics and system limits.


Empowering Litigants through Education and AI-Augmented Practice: The Cybernetic Legal Approach

Zoe Dolan (working with Aiden the AI) shared insights from her hands-on project working directly with self-represented litigants in an appeals clinic. Dolan + Aiden trained cohorts of participants to use a custom-configured GPT-based tool, enhanced with rules, court guides, and tone instructions. Participants learned how to prompt the tool effectively, verify responses, and use it to file real motions and navigate the courts.

The project foregrounds empowerment, rather than outcome, as the key success metric—helping users avoid defaults, feel agency, and move confidently through procedures. Notably, Dolan found that many SRLs had developed their own sophisticated AI usage patterns, often outpacing legal professionals in strategic prompting and tool adoption. The project points to a future where legal literacy includes both procedural knowledge and AI fluency.


AI-Driven Early Intervention in Family and Housing Law in Chicago

Chlece Walker-Neal presented AJusticeLink, a preventative justice project from Chicago focused on identifying legal and psycho-legal risk through SMS messages. The tool analyzes texts that users send to friends, family, and others, detecting language that signals legal issues—such as risk of eviction or custody disputes—and assigning an urgency score. Based on this, users are linked to appropriate legal services. The project aims to intervene before a crisis reaches the courthouse, helping families address issues upstream. This early-warning approach exemplifies a shift in justice innovation: from reactive court services to proactive legal health interventions.
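To make the “urgency score” idea concrete, here is a highly simplified, hypothetical sketch of flagging legal-risk language in a message and routing it. It is not AJusticeLink’s actual model, which presumably relies on much more sophisticated language analysis; the patterns, weights, and thresholds below are invented for illustration.

```python
import re

# Illustrative patterns only; a real system would use trained language models
# and far more careful definitions of risk, consent, and privacy.
RISK_PATTERNS = {
    "eviction": (re.compile(r"\b(evict|eviction|pay or quit|lease terminat)\w*", re.I), 3),
    "custody": (re.compile(r"\b(custody|visitation|child support)\b", re.I), 2),
    "debt": (re.compile(r"\b(collection agency|garnish|past due)\b", re.I), 1),
}

def score_message(text: str) -> tuple[int, list[str]]:
    """Return an urgency score and the legal issue labels detected in a message."""
    score, issues = 0, []
    for label, (pattern, weight) in RISK_PATTERNS.items():
        if pattern.search(text):
            score += weight
            issues.append(label)
    return score, issues

def route(text: str) -> str:
    """Map the urgency score to a (hypothetical) referral action."""
    score, issues = score_message(text)
    if score >= 3:
        return f"urgent referral to legal aid ({', '.join(issues)})"
    if score > 0:
        return f"send self-help resources ({', '.join(issues)})"
    return "no legal issue detected"

print(route("My landlord taped a pay or quit notice to the door, I think she wants to evict us"))
```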


PONK: Helping Czech Litigants Write Better Legal Texts with Structured AI Guidance

The PONK project out of Czechia presented a tool for improving client-oriented legal writing by using structured AI and rule-based enhancements. Drawing on a dataset of over 250 annotated legal documents, the system helps convert raw legal text into clearer argumentation following a Fact–Rule–Conclusion (FRC) structure. This project is part of a broader movement to bring explainable AI into legal drafting and aims to serve both litigants and legal aid professionals by making documents more structured, persuasive, and usable across a wider audience. It showcases how small linguistic and structural refinements, guided by AI, can produce outsized impact in real-world justice communications.
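As a loose illustration of the Fact–Rule–Conclusion idea, a drafting aid might check whether each component of an argument is present and prompt the writer for whatever is missing. This is my own sketch, not PONK’s system, which works from a large annotated Czech corpus and richer rules.

```python
# A hypothetical sketch of an FRC (Fact-Rule-Conclusion) completeness check.
FRC_SECTIONS = ("fact", "rule", "conclusion")

def frc_gaps(argument: dict[str, str]) -> list[str]:
    """Return which FRC components are missing or empty in a drafted argument."""
    return [s for s in FRC_SECTIONS if not argument.get(s, "").strip()]

def coaching_prompt(argument: dict[str, str]) -> str:
    """Turn the gap list into a plain-language nudge for the writer."""
    gaps = frc_gaps(argument)
    if not gaps:
        return "Argument has all three FRC components."
    return "Please add the missing component(s): " + ", ".join(gaps)

draft = {
    "fact": "The tenant paid rent on the 3rd of each month for two years.",
    "rule": "",
    "conclusion": "Therefore the late-fee claim should be dismissed.",
}
print(coaching_prompt(draft))  # -> asks the writer to supply the governing rule
```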


Can AI Help Fix Hard-to-Use Court Forms and Text-Heavy Guides? A Visual Design AI Prototype

Brian Rhindress presented a provocative question: can AI be trained to reformat legal documents and court forms into something more visually accessible? And could the backlog of training materials in PDFs, docs, and text-heavy powerpoints be converted into something more akin to comic books, visuals, fliers, and other engaging materials?

Inspired by design principles from the Stanford Legal Design Lab and the U.S. Digital Service, and building on materials from the Graphic Advocacy Project, the Harvard A2J Lab, the Legal Design Lab, and more, the project tested generative models on their ability to re-lay out legal forms. While early versions showed promise for ideation and inspiration, they often suffered from inconsistent checkbox placement, odd visual hierarchy, or poor design language.

Still, the vision is compelling: a future where AI-powered layout tools assist courts in publishing more user-friendly, standardized forms—potentially across jurisdictions. Future versions may build on configuration workflows and clear design templates to reduce hallucinations and increase reliability. The idea is to lower the entry barrier for underserved communities by combining proven legal messaging with compelling visual storytelling. Rather than developing entirely new tools, teams explored how off-the-shelf systems, paired with smart examples and curated prompts, can deliver real-time, audience-tailored legal visuals.
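One way to read “off-the-shelf systems, paired with smart examples and curated prompts” is as a prompt-assembly step. Here is a hypothetical sketch of assembling such a request; the guidelines, audience, and source text are made up, and the model call is a stub rather than any particular tool’s API.

```python
# A hypothetical sketch of assembling a curated prompt for an off-the-shelf
# generative model that turns text-heavy legal material into a visual explainer.

DESIGN_GUIDELINES = [
    "6th-grade reading level",
    "one action per panel",
    "deadlines and court address visually prominent",
]

def build_visual_prompt(source_text: str, audience: str, example_captions: list[str]) -> str:
    """Combine source material, audience, design rules, and curated examples into one prompt."""
    examples = "\n".join(f"- {c}" for c in example_captions)
    rules = "\n".join(f"- {g}" for g in DESIGN_GUIDELINES)
    return (
        f"Create a one-page visual guide for {audience}.\n"
        f"Follow these design rules:\n{rules}\n"
        f"Here are captions from well-tested legal visuals to imitate:\n{examples}\n"
        f"Source material:\n{source_text}\n"
    )

def generate_visual(prompt: str) -> str:
    # Placeholder for a call to whichever image/layout model a team chooses.
    return f"[layout spec generated from a {len(prompt)}-character prompt]"

prompt = build_visual_prompt(
    source_text="You have a short deadline from the date on the summons to file an answer...",
    audience="tenants responding to an eviction summons",
    example_captions=["Step 1: Read the summons", "Step 2: Fill out the answer form"],
)
print(generate_visual(prompt))
```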


Building Transparent, Customizable AI Systems for Sentencing and Immigration Support

Aparna Komarla from Redo.io and colleagues from OpenProBono demonstrated the power of open, configurable AI agents in the justice system. In California’s “Second Look” sentencing reviews, attorneys can use this custom-built AI system to query multi-agency incarceration datasets and assign “suitability scores” to prioritize eligible individuals who might have claims the attorneys can assist with. The innovation lies in giving attorneys—not algorithms—the power to define and adjust the weights of relevant factors, helping to maintain transparency and align tools with local values and judicial discretion.
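The attorney-configurable scoring idea can be illustrated with a small sketch: the attorney, not the system, decides which factors matter and how much. The factor names, weights, and cases below are hypothetical, not Redo.io’s actual model or data.

```python
# A hypothetical sketch of attorney-configurable suitability scoring for
# "Second Look" case review. Factor names and weights are illustrative only.

def suitability_score(case: dict, weights: dict[str, float]) -> float:
    """Weighted sum over whichever factors the attorney chose to include."""
    return sum(weights[factor] * case.get(factor, 0.0) for factor in weights)

def prioritize(cases: list[dict], weights: dict[str, float]) -> list[str]:
    """Return case ids ordered from highest to lowest suitability score."""
    ranked = sorted(cases, key=lambda c: suitability_score(c, weights), reverse=True)
    return [c["id"] for c in ranked]

# Attorneys adjust these weights to reflect local values and judicial discretion.
attorney_weights = {
    "years_served": 0.4,
    "programming_completed": 0.35,
    "age_at_offense_was_young": 0.25,
}

cases = [
    {"id": "A-102", "years_served": 0.9, "programming_completed": 0.7, "age_at_offense_was_young": 1.0},
    {"id": "B-311", "years_served": 0.5, "programming_completed": 0.9, "age_at_offense_was_young": 0.0},
]
print(prioritize(cases, attorney_weights))  # highest-priority case id first
```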


Place Matters: How Location Affects AI Hallucination Rates in Legal Answers

Damian Curran and colleagues explored an increasingly urgent issue: do large language models (LLMs) perform differently depending on the geographic context of the legal question? Their findings say yes—and in sometimes surprising ways. For instance, while LLMs hallucinated less often on employment law queries in Sydney, their housing law performance there was riddled with errors. In contrast, models did better on average with Los Angeles queries—possibly due to the volume of U.S.-based training data. The study underscores the importance of localization in AI legal tools, especially for long-tail or low-resourced jurisdictions where statutory nuance or recent reforms may not be well represented in AI training data.
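The comparison behind this kind of finding is, at its core, a grouped error rate. A minimal sketch of that aggregation, using made-up grades rather than the authors’ data, might look like this:

```python
from collections import defaultdict

# Hypothetical graded answers: (city, legal domain, was the answer hallucinated?)
graded_answers = [
    ("Sydney", "employment", False), ("Sydney", "employment", False),
    ("Sydney", "housing", True),     ("Sydney", "housing", True),
    ("Los Angeles", "housing", False), ("Los Angeles", "housing", True),
]

def hallucination_rates(rows: list[tuple[str, str, bool]]) -> dict[tuple[str, str], float]:
    """Share of hallucinated answers per (city, domain) group."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for city, domain, hallucinated in rows:
        counts[(city, domain)][0] += int(hallucinated)
        counts[(city, domain)][1] += 1
    return {key: bad / total for key, (bad, total) in counts.items()}

for (city, domain), rate in sorted(hallucination_rates(graded_answers).items()):
    print(f"{city:12s} {domain:12s} {rate:.0%}")
```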


Drawing the Line Between Legal Info and Legal Advice in India’s Emerging Chatbot Landscape

Avanti Durani and Shivani Sathe presented a critical user research study from India that investigates how AI-powered legal chatbots respond to user queries—and whether they stay within the bounds of offering legal information rather than unlicensed advice. Their analysis of six tools found widespread inconsistencies in tone, disclaimers, and the framing of legal responses. Some tools subtly slipped into strategic advice or overly narrow guidance, even as disclaimers were buried or hard to find. These gaps, they argue, pose real risks for low-literacy and legally vulnerable users. Their work raises important regulatory questions: should the standard for chatbots be defined only by unauthorized practice of law rules? Or should we also integrate user preferences and the expectations of trusted community intermediaries, such as social workers or legal aid navigators?


The full collection of 22 papers is available here, with links to preprint drafts where available. We encourage everyone to explore the work and reach out to the authors—many are actively seeking collaborators, reviewers, and pilot partners.

Together, these contributions mark a new chapter in access to justice research—one where AI innovation is rigorously evaluated, deeply grounded in the legal domain, and shaped by the real needs of the people and professionals who use it.

What Comes Next

The enthusiasm and rigor from this year’s submissions reaffirmed that AI for access to justice is not a hypothetical field—it’s happening now, and it’s advancing rapidly.

The ICAIL AI4A2J workshop served as a global convening point where ideas were shared not just to impress, but to be replicated, scaled, and improved upon. Multiple projects made their datasets and prototypes publicly available, inviting others to test and build on them. Several are looking to collaborate across jurisdictions and domains to study effectiveness in new environments.

Our Stanford Legal Design Lab team left the workshop energized to continue our own work on AI co-pilots for eviction defense and debt relief, and newly inspired to integrate ideas from peers across the globe. We’re especially focused on how to:

  • Embed evaluation and quality standards from the start,
  • Design human-AI partnerships that support (not replace) frontline legal workers,
  • Spread and scale the best tools and protocols in ways that preserve trust, dignity, and legal integrity, and
  • Develop policies and regulation that are grounded in empirical data, human behavior, and actual consumer protection.

Thank You

We’re deeply grateful to our co-organizers and all the presenters who contributed to making this workshop a meaningful step forward. And a special thanks to the ICAIL community, which continues to be a space where technical innovation and public interest values come together in thoughtful dialogue.

Stay tuned—our program committee is considering next steps around publications and subsequent conferences, and we hope this is just the beginning of an ongoing, cross-border conversation about how AI can truly improve access to justice.

Please also see my colleague Quinten’s write-up of his takeaways from the workshop!

Categories
Class Blog Design Research

3 Kinds of Access to Justice Conflicts

(And the Different Ways to Design for Them)

by Margaret Hagan

In the access to justice world, we often talk about “the justice gap” as if it’s one massive, monolithic challenge. But if we want to truly serve the public, we need to be more precise. People encounter different kinds of legal problems, with different stakes, emotional dynamics, and system barriers. And those differences matter.

At the Legal Design Lab, we find it helpful to divide the access to justice landscape into three distinct types of problems. Each has its own logic — and each requires different approaches to research, design, technology, and intervention.

3 Types of Conflicts that we talk about when we talk about Access to Justice

1. David vs. Goliath Conflicts

This is the classic imbalance. An individual — low on time, legal knowledge, money, or support — faces off against a repeat player: a bank, a corporate landlord, a debt collector, or a government agency.

These Goliaths have teams of lawyers, streamlined filing systems, institutional knowledge, predictive data, and now increasingly, AI-powered legal automation and strategies. They can file thousands of cases a month — many of which go uncontested because people don’t understand the process, can’t afford help, or assume there’s no point trying.

This is the world of:

  • Eviction lawsuits from corporate landlords
  • Mass debt collection actions
  • Robo-filed claims, often incorrect but rarely challenged

The problem isn’t just unfairness — it’s non-participation. Most “Davids” default. They don’t get their day in court. And as AI makes robo-filing even faster and cheaper, we can expect the imbalance in knowledge, strategy, and participation to grow worse.

What David vs. Goliath Conflicts Need

Designing for this space means understanding the imbalance and structuring tools to restore procedural fairness. That might mean:

  • Tools that help people respond before defaulting. These could be pre-filing defense tools that detect illegal filings or notice issues, or tools that prepare people to negotiate from a stronger position.
  • Systems that detect and challenge low-quality filings, including systems that flag repeated abusive behavior by institutional actors.
  • Interfaces that translate legal documents into plain language, with clear, visual tools to help people understand their rights and the process quickly.
  • Research into procedural justice and scalable human-AI support models

2. Person vs. Person Conflicts

This second type of case is different. Here, both parties are individuals, and neither has a lawyer.

In this world, both sides are unrepresented and lack institutional or procedural knowledge. There’s real conflict — often with emotional, financial, or relational stakes — but neither party knows how to navigate the system.

Think about emotionally charged, high-stakes cases of everyday life:

  • Family law disputes (custody, divorce, child support)
  • Mom-and-pop landlord-tenant disagreements
  • Small business vs. customer conflicts
  • Neighbor disputes and small claims lawsuits

Both people are often confused. They don’t know which forms to use, how to prepare for court, how to present evidence, or what will persuade a judge. They’re frustrated, emotional, and worried about losing something precious — time with their child, their home, their reputation. The conflict is real and felt deeply, but both sides are likely confused about the legal process.

Often, these conflicts escalate unnecessarily — not because the people are bad, but because the system offers them no support in finding resolution. And with the rise of generative AI, we must be cautious: if each person gets an AI assistant that just encourages them to “win” and “fight harder,” we could see a wave of escalation, polarization, and breakdowns in courtrooms and relationships.

We have to design for a future legal system that might, with AI usage increasing, become more adversarial, less just, and harder to resolve.

What Person vs. Person Conflicts Need

In person vs. person conflicts, the goal should be to get to mutual resolutions that avoid protracted ‘high’ conflict. The designs needed are about understanding and navigation, but also about de-escalation, emotional intelligence, and procedural scaffolding.

  • Tools that promote resolution and de-escalation, not just empowerment. They can ideally support shared understanding and finding a solution that can work for both parties.
  • Shared interfaces that help both parties prepare for court fairly. Technology can help parties prepare for court, but also explore off-ramps like mediation.
  • Mediation-oriented AI prompts and conflict-resolution scaffolding. New tools could include narrative builders that let people explain their story or make requests without hostility. AI prompts and assistants could be calibrated to reduce conflict, not intensify it.
  • Design research that prioritizes relational harm and trauma awareness.

This is not just a legal problem. It’s a human problem — about communication, trust, and fairness. Interventions here also need to think about parties that are not directly involved in the conflict (like the children in a family law dispute between separating spouses).

3. Person vs. Bureaucracy

Finally, we have a third kind of justice issue — one that’s not so adversarial. Here, a person is simply trying to navigate a complex system to claim a right or access a service.

These kinds of conflicts might be:

  • Applying for public benefits, or appealing a denial
  • Dealing with a traffic ticket
  • Restoring a suspended driver’s license
  • Paying off fines or clearing a record
  • Filing taxes or appealing a tax decision
  • Correcting an error on a government file
  • Getting work authorization or housing assistance

There’s no opposing party. Just forms, deadlines, portals, and rules that seem designed to trip you up. People fall through the cracks because they don’t know what to do, can’t track all the requirements, or don’t have the documents ready. It’s not a courtroom battle. It’s a maze.

Here, many of the people caught in these systems do have rights and options. They just don’t know it, or they can’t get through all the procedural hoops to claim them. It’s a quiet form of injustice — made worse by fragmented service systems and hard-to-reach agencies.

What Person vs. Bureaucracy Conflicts Need

For people vs. bureaucracy conflicts, the key word is navigation. People need supportive, clarifying tools that coach and guide them through the process — and that might also make the process simpler to begin with.

  • Seamless navigation tools that walk people through every step. These could be digital co-pilots that walk people through complex government workflows, and keep them knowledgeable and encouraged at each step.
  • Clear eligibility screeners and document checklists. These could be intake simplification tools that flag whether the person is in the right place, and sets expectations about what forms someone needs and when.
  • Text-based reminders and deadline alerts, to keep people on top of complicated and lengthy processes. These procedural coaches can keep people from ending up in endless continuances or falling off the process altogether. Personal timelines and checklists can track each step and provide nudges (see the sketch after this list).
  • Privacy-respecting data sharing so users don’t have to “start over” every time. This could mean administrative systems with document collection and data verification features that gather and store the proofs (income, ID, residence) people are asked to supply over and over again. It could also mean carrying their choices and details across trusted systems, so they don’t need to fill in yet another form.
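As a small illustration of the “procedural coach” pattern mentioned above, here is a hypothetical sketch of a deadline checklist with reminder nudges. The step names, timings, and reminder window are invented, not drawn from any particular benefits or court process.

```python
from datetime import date, timedelta

def build_checklist(hearing_date: date) -> list[dict]:
    """Hypothetical steps and lead times for a generic administrative process."""
    return [
        {"step": "Gather proof of income", "due": hearing_date - timedelta(days=21), "done": False},
        {"step": "Submit appeal form",      "due": hearing_date - timedelta(days=14), "done": False},
        {"step": "Confirm hearing time",    "due": hearing_date - timedelta(days=2),  "done": False},
    ]

def reminders_due(checklist: list[dict], today: date, window_days: int = 3) -> list[str]:
    """Return nudge messages for incomplete steps due within the reminder window."""
    return [
        f"Reminder: '{item['step']}' is due {item['due'].isoformat()}"
        for item in checklist
        if not item["done"] and today <= item["due"] <= today + timedelta(days=window_days)
    ]

checklist = build_checklist(hearing_date=date(2026, 3, 20))
print(reminders_due(checklist, today=date(2026, 3, 5)))  # -> nudges the appeal form step
```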

This space is ripe for good technology. But it also needs regulatory design and institutional tech improvements, so that systems become easier to plug into — and easier to fix. Aside from user-facing designs, we also need to work on standardizing forms, moving from form dependencies to structured data, and improving the tech operations of these systems.

Why These Distinctions Matter

These three types of justice problems are different in form, in emotional tone, and in what people need to succeed. That means we need to study them differently, run stakeholder sessions differently, evaluate them with slightly different metrics, and employ different design patterns and principles.

Each of these problem types requires a different kind of solution and ideal outcome.

  • In David vs. Goliath, we need defense, protection, and fairness. We need to help reduce the massive imbalance in knowledge, capacity, and relationships, and ensure everyone can have their fair day in court.
  • In Person vs. Person, we need resolution, dignity, and de-escalation. We need to help people focus on mutually agreeable, sustainable resolutions to their problems with each other.
  • In Person vs. Bureaucracy, we need clarity, speed, and guided action. We must aim for seamless, navigable, efficient systems.

Each type of problem requires different work by researchers, designers, and policymakers. These include different kinds of:

  • User research methods, and ways to bring stakeholders together for collaborative design sessions
  • Product and service designs, and the patterns of tools, interfaces, and messages that will engage and serve users in this conflict.
  • Evaluation criteria, about what success looks like
  • AI safety guidelines, about how to prevent bias, capture, inaccuracies, and other possible harms. We can expect these three kinds of conflicts to change as AI usage grows among litigants, lawyers, and court systems.

If we blur these lines, we risk building one-size-fits-none tools.

How might the coming wave of AI in the legal system affect these 3 different kinds of Access to Justice problems?

Toward Smarter Justice Innovation

At the Legal Design Lab, we believe this three-type framework can help researchers, funders, courts, and technologists build smarter interventions — and avoid repeating old mistakes.

We can still learn across boundaries. For example:

  • How conflict resolution tools from family law might help in small business disputes
  • How navigational tools in benefits access could simplify court prep
  • How due process protections in eviction can inform other administrative hearings

But we also need to be honest: not every justice problem is built the same. And not every innovation should look the same.

By naming and studying these three zones of access to justice problems, we can better target our interventions, avoid unintended harm, and build systems that actually serve the people who need them most.