Measuring What Matters: A Quality Rubric for Legal AI Answers

by Margaret Hagan, Executive Director of the Legal Design Lab

As more people turn to AI for legal advice, a pressing issue emerges: How do we know whether AI-generated legal answers are actually helpful? While legal professionals and regulators may have instincts about good and bad answers, there has been no clear, standardized way to evaluate AI’s performance in this space — until now.

What makes a good answer on a chatbot, legal clinic site, live chat, or LLM site?

My paper for the JURIX 2024 conference, Measuring What Matters: Developing Human-Centered Legal Q-and-A Quality Standards through Multi-Stakeholder Research, tackles this challenge head-on. Through a series of empirical studies, the paper develops a human-centered framework for evaluating AI-generated legal answers, ensuring that quality benchmarks align with what actually helps people facing legal problems. The findings provide valuable guidance for legal aid organizations, product developers, and policymakers who are shaping the future of AI-driven legal assistance.

Why Quality Standards for AI Legal Help Matter

When people receive a legal notice — like an eviction warning or a debt collection letter — they often turn to the internet for guidance. Platforms such as Reddit’s r/legaladvice, free legal aid websites, and now AI chatbots have become common sources of legal information. However, the reliability and usefulness of these answers vary widely.

AI’s increasing role in legal Q&A raises serious questions:

  • Are AI-generated answers accurate and actionable?
  • Do they actually help users solve legal problems?
  • Could they mislead people, causing harm rather than good?

My research addresses these concerns by involving multiple stakeholders — end users, legal experts, and technologists — to define what makes a legal answer “good.”

The paper reveals several surprising insights about what actually matters when evaluating AI’s performance in legal Q&A. Here are some key takeaways that challenge conventional assumptions:

1. Accuracy Alone Isn’t Enough — Actionability Matters More

One of the biggest surprises is that accuracy is necessary but not sufficient. While many evaluations of legal AI focus on whether an answer is legally correct, the study finds that what really helps people is whether the answer provides clear, actionable steps. A technically accurate response that doesn’t tell someone what to do next is not as valuable as a slightly less precise but highly actionable answer.

Example of an accurate answer that does not help the user's outcome, compared with an actionable one:

  • AI says: “Your landlord is violating tenant laws in your state.” (Accurate but vague)
  • AI says: “You should file a response within a short time period, often 7 days (the exact deadline may differ depending on your situation). Here’s a link to your county’s tenant protection forms and a local legal aid service.” (Actionable and useful)

2. Accurate Information Is Not Always Good for the User

The study highlights that some legal rights exist on paper but can be risky to exercise in practice — especially without proper guidance. For example, withholding rent is a legal remedy in many states if a landlord fails to make necessary repairs. However, in reality, exercising this right can backfire:

  • Many landlords retaliate by starting eviction proceedings.
  • The tenant may misapply the law, thinking they qualify when they don’t.
  • Even when legally justified, withholding rent can lead to court battles that tenants often lose if they don’t follow strict procedural steps.

This is a case where AI-generated legal advice could be technically accurate but still harmful if it doesn’t include risk disclosures. The study suggests that high-risk legal actions should always come with clear warnings about potential consequences. Instead of simply stating, “You have the right to withhold rent,” a high-quality AI response should add:

  • “Withholding rent is legally allowed in some cases, but it carries huge risks, including eviction. It’s very hard to withhold rent correctly. Reach out to this tenants’ rights organization before trying to do it on your own.”

This principle applies to other “paper rights” too — such as recording police interactions, filing complaints against employers, or disputing debts — where following the law technically might expose a person to serious retaliation or legal consequences.

Legal answers should not just state rights but also warn about practical risks — helping users make informed, strategic decisions rather than leading them into legal traps.

3. Legal Citations Aren’t That Valuable for Users

Legal experts often assume that providing citations to statutes and case law is crucial for credibility. However, both users and experts in the study ranked citations as a lower-priority feature. Most users don’t actually read or use legal citations — instead, they prefer practical, easy-to-understand guidance.

However, citations do help in one way: they allow users to verify information and use it as leverage in disputes (e.g., showing a landlord they know their rights). The best AI responses include citations sparingly and with context, rather than overwhelming users with legal references.

4. Overly Cautious Warnings Can Be Harmful

Many AI systems include disclaimers like “Consult a lawyer before taking any action.” While this seems responsible, the study found that excessive warnings can discourage people from acting at all.

Since most people seeking legal help online don’t have access to a lawyer, AI responses should avoid paralyzing users with fear and instead guide them toward steps they can take on their own — such as contacting free legal aid or filing paperwork themselves.

5. Misleading Answers Are More Dangerous Than Completely Wrong Ones

AI-generated legal answers that contain partial truths or misrepresentations are actually more dangerous than completely wrong ones. Users tend to trust AI responses by default, so if an answer sounds authoritative but gets key details wrong (like deadlines or filing procedures), it can lead to serious harm (e.g., missing a legal deadline).

The study found that the most harmful AI errors were related to procedural law — things like incorrect filing deadlines, court names, or legal steps. Even small errors in these areas can cause major problems for users.

6. The Best AI Answers Function Like a “Legal GPS”

Rather than replacing lawyers, users want AI to act like a smart navigation system — helping them spot legal issues, identify paths forward, and get to the right help. The most helpful answers do this by:

  • Quickly diagnosing the problem (understanding what the user is asking about).
  • Giving step-by-step guidance (telling the user exactly what to do next).
  • Providing links to relevant forms and local services (so users can act on the advice).

Instead of just stating the law, AI should orient users, give them confidence, and point them toward useful actions — even if that means simplifying some details to keep them engaged.
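
To make that “legal GPS” shape concrete, here is a minimal sketch, entirely my own illustration rather than anything from the paper, of how a response could be structured so that a diagnosis, next steps, and local resources are always present:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LegalGPSAnswer:
    """Hypothetical structure for a 'legal GPS' style answer.

    Field names are illustrative assumptions, not a standard schema.
    """
    diagnosis: str                                           # what legal issue the user appears to have
    next_steps: List[str]                                    # concrete, ordered actions to take
    resources: List[str] = field(default_factory=list)       # links to forms, courts, legal aid
    risk_warnings: List[str] = field(default_factory=list)   # practical cautions, if any

# Hypothetical example content:
answer = LegalGPSAnswer(
    diagnosis="This looks like an eviction notice that requires a written response to the court.",
    next_steps=[
        "File a written response with the court before the deadline printed on the notice.",
        "Contact a local tenants' rights organization or legal aid office for help with the forms.",
    ],
    resources=["https://example.org/county-tenant-forms"],
    risk_warnings=["Missing the response deadline can result in a default judgment."],
)
```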

AI’s Role in Legal Help Is About Empowerment, Not Just Information

The research challenges the idea that AI legal help should be measured only by how well it mimics a lawyer’s expertise. Instead, the most effective AI legal Q&A focuses on empowering users with clear, actionable, and localized guidance — helping them take meaningful steps rather than just providing abstract legal knowledge.

Key Takeaways for Legal Aid, AI Developers, and Policymakers

The paper’s findings offer important lessons for different stakeholders in the legal AI ecosystem.

1. Legal Aid Organizations: Ensuring AI Helps, Not Hurts

Legal aid groups may increasingly rely on AI to extend their reach, but they must be cautious about its limitations. The research highlights that users want AI tools that:

  • Provide clear, step-by-step guidance on what to do next.
  • Offer jurisdiction-specific advice rather than generic legal principles.
  • Refer users to real-world resources, such as legal aid offices or court forms.
  • Are easy to read and understand, avoiding legal jargon.

Legal aid groups should ensure that the AI tools they deploy adhere to these quality benchmarks. Otherwise, users may receive vague, confusing, or even misleading responses that could worsen their legal situations.

2. AI Product Developers: Building Legal AI Responsibly & Knowing Justice Use Cases

AI developers must recognize that accuracy alone is not enough. The paper identifies four key criteria for evaluating the quality of AI legal answers:

  1. Accuracy — Does the answer provide correct legal information? And when legal information is accurate but high-risk, does it tell people about rights and options with sufficient context?
  2. Actionability — Does it offer concrete steps that the user can take?
  3. Empowerment — Does it help users feel capable of handling their problem?
  4. Strategic Caution — Does it avoid causing unnecessary fear or discouraging action?

One surprising insight is that legal citations — often seen as a hallmark of credibility — are not as critical as actionability. Users care less about legal precedents and more about what they can do next. Developers should focus on designing AI responses that prioritize usability over technical legal accuracy alone.
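
As a rough sketch only, assuming a simple 1-to-5 scale that the paper does not itself prescribe, the four criteria above could be encoded as scoring dimensions for human reviewers or automated checks:

```python
from dataclasses import dataclass

# The four dimension names follow the paper; the prompts and the 1-5 scale are my assumptions.
RUBRIC_DIMENSIONS = {
    "accuracy": "Is the legal information correct, with context for high-risk rights?",
    "actionability": "Does the answer give concrete steps the user can take?",
    "empowerment": "Does it help the user feel capable of handling the problem?",
    "strategic_caution": "Does it avoid unnecessary fear while flagging real risks?",
}

@dataclass
class RubricScore:
    """One reviewer's scores for one AI-generated answer (1 = poor, 5 = excellent)."""
    accuracy: int
    actionability: int
    empowerment: int
    strategic_caution: int

    def overall(self) -> float:
        # Unweighted average; a real deployment would choose its own weighting.
        return (self.accuracy + self.actionability
                + self.empowerment + self.strategic_caution) / 4
```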

3. Policymakers: Regulating AI for Consumer Protection & Outcomes

For regulators, the study underscores the need for clear, enforceable quality standards for AI-generated legal guidance. Without such standards, AI-generated legal help may range from extremely useful to dangerously misleading.

Key regulatory considerations include:

  • Transparency: AI platforms should disclose how they generate answers and whether they have been reviewed by legal experts.
  • Accuracy Audits: Regulators should develop auditing protocols to ensure AI legal help is not systematically providing incorrect or harmful advice.
  • Consumer Protections: Policies should prevent AI tools from deterring users from seeking legal aid when needed.

Ideally, policymakers will be in conversation with frontline practitioners, product and model developers, and community members to understand what is important to measure, how to measure it, and how to improve the quality and safety of performance. Evaluation based on concepts like the Unauthorized Practice of Law does not necessarily correspond to consumers’ outcomes, needs, and priorities. Instead, determining what is beneficial to consumers should be grounded in what matters to the community and to frontline providers.

The Research Approach: A Human-Centered Framework

How did we identify these insights and standards? The study used a three-part research process to hear from community members, frontline legal help providers, and access to justice experts. (Thanks to the Legal Design Lab team for helping me with interviews and study mechanics!)

  1. User Interviews: 46 community members tested AI legal help tools and shared feedback on their usefulness and trustworthiness.
  2. Expert Evaluations: 21 legal professionals ranked the importance of various quality criteria for AI-generated legal answers.
  3. AI Response Ratings: Legal experts assessed real AI-generated answers to legal questions, identifying common pitfalls and best practices.

This participatory, multi-stakeholder approach ensures that quality metrics reflect the real-world needs of legal aid seekers, not just theoretical legal standards.

[Figure: The Legal Q-and-A Quality Rubric]

What’s Next? Implementing the Quality Rubric

The research concludes with a proposed Quality Rubric that can serve as a blueprint for AI developers, researchers, and regulators. This rubric provides a scoring system that evaluates legal AI answers based on their strengths and weaknesses across key quality dimensions.
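
One way to operationalize that kind of scoring, sketched here under the assumption of a simple 1-to-5 scale per dimension (the rubric defines the dimensions; this exact scheme is my illustration), is to record reviewer scores per answer and track per-dimension averages across an audit batch:

```python
from statistics import mean
from typing import Dict, List

# Dimension names follow the rubric; the numeric scale is an assumption for illustration.
DIMENSIONS = ["accuracy", "actionability", "empowerment", "strategic_caution"]

def audit_summary(scored_answers: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each rubric dimension across a batch of reviewed answers.

    `scored_answers` holds one dict per reviewed AI answer,
    e.g. {"accuracy": 4, "actionability": 2, ...}.
    """
    return {dim: mean(scores[dim] for scores in scored_answers) for dim in DIMENSIONS}

# Hypothetical audit batch of three reviewed answers:
batch = [
    {"accuracy": 5, "actionability": 2, "empowerment": 3, "strategic_caution": 4},
    {"accuracy": 4, "actionability": 4, "empowerment": 4, "strategic_caution": 3},
    {"accuracy": 3, "actionability": 5, "empowerment": 4, "strategic_caution": 5},
]
print(audit_summary(batch))
# approximately {'accuracy': 4.0, 'actionability': 3.67, 'empowerment': 3.67, 'strategic_caution': 4.0}
```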

Potential next steps include:

  • Regular AI audits using the Quality Rubric to track performance.
  • Collaboration between legal aid groups and AI developers to refine AI-generated responses.
  • Policy frameworks that hold AI platforms accountable for misleading or harmful legal information.

Others might be developing internal quality reviews of the RAG bots and AI systems on their websites and in their tools. They can use the rubric above as they conduct safety and quality checks, or as they train human labelers or automated AI judges to carry out these checks.
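
For teams experimenting with automated judges, here is a hedged sketch of how the rubric’s dimensions might be turned into a grading prompt. The `grade_with_llm` call at the end stands in for whatever model API a team actually uses; it is not a real library function.

```python
JUDGE_PROMPT_TEMPLATE = """You are reviewing an AI-generated answer to a legal question.
Rate the answer from 1 (poor) to 5 (excellent) on each dimension, and explain briefly:

1. Accuracy: Is the legal information correct, with context for high-risk rights?
2. Actionability: Does it give concrete next steps the user can take?
3. Empowerment: Does it help the user feel capable of handling the problem?
4. Strategic caution: Does it flag real risks without discouraging action?

Question: {question}
Answer under review: {answer}
"""

def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the rubric-based grading prompt for one question/answer pair."""
    return JUDGE_PROMPT_TEMPLATE.format(question=question, answer=answer)

# A team would send this prompt to the model of their choice, for example:
# scores = grade_with_llm(build_judge_prompt(user_question, ai_answer))  # hypothetical call
```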

Conclusion: Measuring AI for Better Access to Justice

AI holds great promise for expanding access to legal help, but it must be measured and managed effectively. My research provides a concrete roadmap for ensuring that AI legal assistance is not just technically impressive but genuinely useful to people in need.

For legal aid organizations, the priority should be integrating AI tools that align with the study’s quality criteria. For AI developers, the challenge is to design products that go beyond accuracy and focus on usability, actionability, and strategic guidance. And for policymakers, the responsibility lies in crafting regulations that ensure AI-driven legal help does more good than harm.

As AI continues to transform how people access legal information, establishing clear, human-centered quality standards will be essential in shaping a fair and effective legal tech landscape.

Need for More Benchmarks of More Legal Tasks

In addition to this current focus on legal Q-and-A, the justice community also needs to create similar evaluation standards and protocols for other tasks. Beyond answering brief legal questions, there are many other tasks whose quality matters to people’s outcomes, rights, and justice. This is the first part of a much bigger effort to make justice interventions measurable and meaningful.

This focus on delineating tasks, with quality measures for each, will be essential for building quality products and models that serve the public, and for unlocking greater scale and support for innovation.
