
Can LLMs help streamline legal aid intake?

Insights from Quinten Steenhuis at the AI + Access to Justice Research Seminar

Recently, the Stanford Legal Design Lab hosted the latest installment of its AI + Access to Justice Research Seminar, featuring a presentation from Quinten Steenhuis.

Quinten is a professor and innovator-in-residence at Suffolk Law School’s LIT Lab. He’s also a former housing attorney in Massachusetts who has made a significant impact with projects like Court Forms Online and MADE, a tool for automating eviction help. His consultancy, Lemma Legal, works with organizations to develop legal tech for interviews, forms, and documents.

His April 2025 presentation focused on a project he has been developing in collaboration with Hannes Westermann from the Maastricht Law & Tech Lab. The R&D project examines whether large language models (LLMs) are effective at tasks that could streamline intake in civil legal services. The work is being carried out in partnership with Legal Aid of Eastern Missouri and other legal aid groups, with funding from the U.S. Department of Housing and Urban Development.

The central question addressed was: Can LLMs help people get through the legal intake process faster and more accurately?

The Challenge: Efficient, Accurate Legal Aid Intake and Triage

For many people, legal aid is hard to access, in part because of the intake process required to apply for help from a local legal aid group. Going through the current intake and triage process can be time-consuming and frustrating. Imagine calling a legal aid hotline, stressed about a problem with your housing, family, finances, or job, only to wait on hold for an hour or more. When your call is finally answered, the intake worker must determine whether you qualify for help based on a complex and often open-textured set of rules. These rules vary significantly by jurisdiction, issue area, and individual circumstances, ranging from citizenship and income requirements to more subjective judgments like whether a case is a “good” or “bad” fit for the program’s priorities or funding streams.

Intake protocols are typically documented internally for staff in narrative guides, sometimes as long as seven pages, containing a mix of rules, sample scripts, timelines, and sub-rules that differ by zip code and issue type. These rules are rarely published online, since they can be too complex for applicants to interpret on their own, and legal aid programs may worry about misinterpretation and inaccurate self-screening by clients. Instead, programs keep the screening rules internal to their staff.

Moreover, the intake process can involve more than 30 rules about which cases to accept. These rules vary between legal aid groups and change frequently, often because funding sources change. This “rules complexity” makes it hard for call center workers to provide consistent, accurate determinations about whose case will be accepted, leading to long wait times and inconsistent screening results. The challenge is to reduce the time legal aid workers spend screening without incorrectly denying services to those who qualify.

The Proposed Intervention: Integrating LLMs for Faster, Smarter Intake

To address this issue, Quinten, Hannes, and their partners have been exploring whether LLMs can help automate parts of the intake process. Specifically, they asked:

  • Can LLMs quickly determine whether someone qualifies for legal aid?
  • Can this system reduce the time spent on screening and make intake more efficient?

The solution they developed is part of the Missouri Tenant Help project: a hybrid system that combines rule-based questions with LLM-powered responses. The site’s intake system begins by asking straightforward, rules-based questions about citizenship, income, location, and the problem at hand. It is built on DocAssemble, a flexible open-source interview platform, and combines Missouri-specific legal screening questions with centrally maintained rules from Suffolk’s Court Forms Online, such as income limits based on federal guidelines.
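
Before any LLM is involved, those hard rules can be checked deterministically. Below is a minimal sketch of such a pre-screen, assuming annual income is compared against a multiple of the federal poverty guidelines; the 125% multiplier and the guideline figures are illustrative assumptions, not the project’s actual thresholds.

```python
# Hypothetical rules-based pre-screen, run before the LLM triage step.
# The guideline figures and 125% multiplier are illustrative assumptions.

POVERTY_BASE = 15_060        # 2024 federal poverty guideline, household of 1 (USD)
POVERTY_PER_PERSON = 5_380   # added for each additional household member
INCOME_MULTIPLIER = 1.25     # many programs screen at a multiple of the guideline

def income_eligible(annual_income: float, household_size: int) -> bool:
    """True if income falls under the program's screening cap."""
    cap = (POVERTY_BASE + POVERTY_PER_PERSON * (household_size - 1)) * INCOME_MULTIPLIER
    return annual_income <= cap

def passes_pre_screen(answers: dict) -> bool:
    """Deterministic checks (location, income) before free-text triage."""
    return answers.get("state") == "MO" and income_eligible(
        answers["annual_income"], answers["household_size"]
    )
```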

At one point in the intake workflow, the system prompts users to describe their problem in a free-text box. The LLM then analyzes the input, cross-referencing it against the legal aid group’s eligibility rules. If the system still lacks sufficient information, it generates follow-up questions in real time, using a low temperature setting to keep the output consistent and cautious.

For example, if a user says, “I got kicked out of my house,” the system might follow up with, “Did your landlord give you any formal notice or involve the court before evicting you?” The goal is to quickly assess whether the person might qualify for legal help while minimizing unnecessary back-and-forth. The LLM’s job is to identify the legal problem at issue and then match it against the case types that legal aid groups around Missouri may (or may not) take.
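
As a concrete illustration, here is a minimal sketch of what that triage call might look like, assuming an OpenAI-style chat API with JSON output. The prompt, schema, and field names are assumptions for illustration, not the project’s actual implementation.

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative prompt; the project's real prompts and rule encodings differ.
SYSTEM_PROMPT = (
    "You screen legal aid intake. Given a program's case acceptance rules "
    "and a tenant's problem description, reply with JSON: "
    '{"verdict": "likely_accept" | "likely_reject" | "needs_more_info", '
    '"follow_up_question": string or null}. '
    "If unsure, ask a follow-up question rather than guessing."
)

def triage(rules: str, description: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4-turbo",   # the experiment's best performer
        temperature=0,         # low temperature: consistent, cautious output
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Rules:\n{rules}\n\nProblem:\n{description}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

# E.g., triage(eviction_rules, "I got kicked out of my house") might return
# {"verdict": "needs_more_info",
#  "follow_up_question": "Did your landlord give you any formal notice?"}
```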

If the LLM works perfectly, it should correctly predict whether a legal aid group is likely to take the case, likely to decline it, or whether the case is borderline.

The Experiment: Testing Different LLMs

To evaluate the system, the team ran an experiment using 16 scenarios, 3 sets of legal aid program rules, and 8 different LLMs, including both open-source and commercial models. The main question was whether the system could accurately match the “accept” or “reject” labels that legal experts had assigned to each scenario.

The team found that the LLMs did a fairly accurate job of predicting which cases should be accepted. Overall, the LLMs matched the expert labels with 84% precision, with GPT-4 Turbo performing best.

Of particular interest was the rate of incorrect rejection predictions. The system rarely made wrongful denials, which is critical for avoiding unjust exclusion from services. Instead, the LLM erred on the side of caution, often generating follow-up questions rather than making definitive, potentially incorrect judgments.

However, the system sometimes asked for unnecessary follow-up information even when it already had enough data. This can hurt the user experience: redundant questions delay the decision, even when the eventual answer is accurate.
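
For concreteness, the headline numbers could be computed roughly as below, assuming each trial pairs an expert label with a model prediction and that the reported 84% precision corresponds to agreement with the expert labels. This is an illustrative sketch, not the team’s actual evaluation harness.

```python
# Illustrative scoring of (expert_label, model_prediction) pairs.
def score(trials: list[tuple[str, str]]) -> dict:
    total = len(trials)
    agree = sum(1 for label, pred in trials if label == pred)
    # Incorrect denials matter most: they wrongly turn eligible people away.
    false_denials = sum(
        1 for label, pred in trials if label == "accept" and pred == "reject"
    )
    return {
        "agreement_with_experts": agree / total,     # the reported ~84%
        "false_denial_rate": false_denials / total,  # kept near zero
    }
```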

Challenges and Insights

One surprising result was that the LLMs sometimes caught errors made by human labelers. For example, in one case involving a support animal in Kansas City, the model correctly identified that a KC legal aid group was likely to accept this case, while the human reviewer mistakenly marked it as a likely denial. This underscores the potential of LLMs to enhance accuracy when paired with human oversight.

However, the LLMs also faced unique challenges.

  • Some models, like Gemini, refused to engage with topics related to domestic violence because of content moderation settings. This raised questions about whether AI developers understand the nuances of legal contexts. It also underscored the importance of vetting candidate models for use, based on whether they censor legal topics.
  • The system also struggled with ambiguous scenarios, like evaluating whether “flimsy doors and missing locks” constituted a severe issue. Such situations highlighted the need for more tailored training and model configuration.

User Feedback and Next Steps

The system has been live for a month and a half and is currently offered as an optional self-screening tool on the Missouri Tenant Help website. Early feedback from legal aid partners has been positive, with high satisfaction ratings from users who tested the system. Some service providers noted they would like to see more follow-up questions to gather comprehensive details upfront — envisioning the LLM doing even more data-gathering, beyond what is needed to determine if a case is likely to be accepted or rejected.

In the future, the team aims to continue refining the system, including plans to:

  1. Refine the LLM prompts and training data to better capture nuanced legal issues.
  2. Improve system accuracy by integrating rules-based reasoning with LLM flexibility.
  3. Explore more cost-effective models to keep the service affordable — currently around 5 cents per interaction.
  4. Enhance error handling by implementing model switching when a primary LLM fails to respond or disengages due to sensitive content.

Can LLMs and Humans Work Together?

This project exemplifies how LLMs and human experts can complement each other. Rather than fully automating intake, the system serves as a first-pass filter. It gives community members a quicker tool to get a high-level read on whether they are likely to get services from a legal aid group, or whether it would be better for them to pursue another service.

Rather than waiting for hours on a phone line, users can choose this tool to get quicker feedback. They can still call the program: the system does not issue a rejection, it simply gives a prediction of what the legal aid program will tell them.

The next phase will involve ongoing live testing and iterative improvements to balance speed, accuracy, and user experience.

The Future of Improving Legal Intake with AI

As legal aid programs increasingly look to AI and LLMs to streamline intake, several key opportunities and challenges are emerging.

1. Enhancing Accuracy and Contextual Understanding:

One promising avenue is the development of more nuanced models that can better interpret ambiguous or context-dependent situations. For instance, instead of flagging a potential denial based solely on rigid rule interpretations, the system could use context-aware prompts that take into account local regulations and specific case details. This might involve combining rule-based logic with adaptive LLM responses to better handle edge cases, like domestic violence scenarios or complex tenancy disputes.

2. Adaptive Model Switching:

Another promising approach is to implement a hybrid model system that dynamically switches between different LLMs depending on the context. For example, if a model like Gemini refuses to address sensitive topics, the system could automatically switch to a more legally knowledgeable model or one with fewer content moderation constraints. This could be facilitated by a router API that monitors for censorship or errors and adjusts the model in real time.
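
A minimal sketch of that router idea might look like the following, where a primary model is tried first and the system falls back on refusal or error. The model names, the refusal check, and the `call_model` wrapper are hypothetical placeholders, not a real provider API.

```python
# Hypothetical fallback router: try the primary model, then fall back on
# refusal or error. REFUSAL_MARKERS and call_model() are illustrative stand-ins.
REFUSAL_MARKERS = ("i can't help", "i cannot assist", "content policy")

def is_refusal(text: str) -> bool:
    return any(marker in text.lower() for marker in REFUSAL_MARKERS)

def route(prompt: str, models: tuple = ("model-a", "model-b", "model-c")) -> str:
    for model in models:
        try:
            reply = call_model(model, prompt)  # hypothetical provider wrapper
        except Exception:
            continue  # provider error: move on to the next model
        if not is_refusal(reply):
            return reply
    raise RuntimeError("All models refused or failed; escalate to a human.")
```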

3. More Robust Fact Gathering:

A significant future goal is to enhance the system’s ability to collect comprehensive facts during intake. Legal aid workers noted that they often needed follow-up information after the initial screening, especially when the client’s problem involved specific housing issues or complex legal nuances. The next version of the system will focus on expanding the follow-up question logic to reduce the need for manual callbacks. This could involve developing predefined question trees for common issues while maintaining the model’s ability to generate context-specific follow-up questions.
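
One plausible shape for those predefined question trees is a small nested structure per issue type, with LLM-generated follow-ups taking over once the tree is exhausted. The issues and questions below are illustrative assumptions.

```python
# Illustrative question tree for common issues; when it runs out, the
# system can fall back to LLM-generated, context-specific follow-ups.
QUESTION_TREE = {
    "eviction": {
        "question": "Have you received a written notice or court papers?",
        "yes": {"question": "What date appears on the notice or filing?"},
        "no": {"question": "Has your landlord changed the locks or removed your belongings?"},
    },
    "repairs": {
        "question": "Have you told your landlord about the problem in writing?",
    },
}

def next_question(issue: str, answers: list) -> str | None:
    """Walk the tree along prior yes/no answers; None means the tree is
    exhausted and LLM follow-ups should take over."""
    node = QUESTION_TREE.get(issue)
    for answer in answers:
        node = node.get(answer) if isinstance(node, dict) else None
    return node["question"] if node else None
```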

4. Tailoring to Local Needs and Specific Use Cases:

One of the biggest challenges for scaling AI-based intake systems is ensuring that they are flexible enough to adapt to local legal nuances. The team is considering ways to contextualize the system for individual jurisdictions, potentially using open-source approaches to allow local legal aid programs to train their own versions. This could enable more customized intake systems that better reflect local policies, tenant protections, and court requirements.

5. Real-Time Human-AI Collaboration:

Looking further ahead, there is potential for building integrated systems where AI actively assists call center workers in real time. For instance, instead of having the AI conduct intake independently, it could listen to live calls and provide real-time suggestions to human operators, similar to how customer support chatbots assist agents. This would allow AI to augment rather than replace human judgment, helping to maintain quality control and legal accuracy.

6. Privacy and Ethical Considerations:

As these systems evolve, maintaining data privacy and ethical standards will be crucial. The current setup already segregates personal information from AI processing, but as models become more integrated into intake workflows, new strategies may be needed. Exploring privacy-preserving AI methods and data anonymization techniques will help maintain compliance while leveraging the full potential of LLMs.
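
A minimal sketch of that segregation, assuming free-text descriptions are scrubbed of obvious identifiers before the LLM call: the regex patterns below are illustrative only and would not be sufficient on their own.

```python
import re

# Illustrative patterns only; real anonymization needs far broader coverage.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace obvious identifiers with placeholders before LLM processing.
    Names, addresses, and contact details stay in the structured intake
    record and are never included in the prompt."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```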

7. Cost and Efficiency Optimization:

At the current cost of around 5 cents per interaction, the system remains relatively affordable, but as more users engage, maintaining cost efficiency will be key. The team plans to experiment with more affordable model versions and optimize the routing strategy to ensure that high-quality responses are delivered at a sustainable price. The goal is to make the intake process not just faster but also economically feasible for widespread adoption.
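
As a back-of-envelope check on that figure, per-interaction cost is roughly token volume times price. The token counts and per-token prices below are assumptions; only the roughly five-cent total comes from the talk.

```python
# Assumed prices (USD per 1K tokens); only the ~5-cent total is from the talk.
PRICE_PER_1K_INPUT = 0.01
PRICE_PER_1K_OUTPUT = 0.03

def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# E.g., rules + description + follow-ups ~ 4,000 input and 300 output tokens:
print(interaction_cost(4000, 300))  # 0.049 -> about 5 cents
```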

Building the Next Generation of Legal Aid Systems

Quinten’s presentation at the AI + Access to Justice seminar made it clear that while LLMs hold tremendous potential for improving legal intake, human oversight and adaptive systems are crucial to ensure reliability and fairness. The current system’s success (84% precision, minimal false denials, and positive user feedback) shows that AI-human collaboration is not only possible but also promising.

As the team continues to refine the system, they aim to create a model that can balance efficiency with accuracy, while being adaptable to the diverse and dynamic needs of legal aid programs. The long-term vision is to develop a scalable, open-source tool that local programs can fine-tune and deploy independently, making access to legal support faster and more reliable for those who need it most.

Read the research article in detail here.

See more at Quinten’s group Lemma Legal: https://lemmalegal.com/

Read more about Hannes at Maastricht University: https://cris.maastrichtuniversity.nl/en/persons/hannes-westermann
