From Bench to Bar: Harnessing Gen AI for Legal Excellence

When Judge Kevin Newsom of the U.S. Court of Appeals for the Eleventh Circuit turned to ChatGPT and Google Bard to analyze the meaning of “landscaping” in Snell v. United Specialty Ins. (11th Cir. 2024), he sparked a groundbreaking shift. This moment signifies more than just a novel approach to legal analysis from the federal appellate bench; it heralds a new era in which Generative AI becomes an indispensable tool for all legal professionals. As we stand on the cusp of this AI revolution, it’s crucial to understand how Gen AI can transform legal practice, from research and analysis to courtroom strategy.

I applaud Judge Newsom for sharing his Gen AI experience and encouraging discussion among the bench and bar about this emerging and evolving technology. His approach provides valuable insight into how American jurists are grappling with the introduction of Gen AI to our profession.

My contribution to the dialogue – which hopefully aids all legal professionals, not just judges – suggests how Judge Newsom might have gone further in his analysis using Gen AI.

I’ve been sharing this since February 2023 – the early days of ChatGPT – and it bears repeating: good Gen AI responses require context. It’s now over a year later, and we should avail ourselves of ChatGPT’s many worthy rivals. To get the best of all worlds, I recommend sending the same context-filled prompts to many large language models (and, if resources permit, doing so multiple times) and then using Gen AI to synthesize and distill the collected responses to arrive at a quality conclusion.

Background

Snell is an insurance coverage case that, in normal circumstances, would have turned on judicial interpretation of the “plain meaning” or “ordinary meaning” of a single word in the insurance policy: “landscaping.” Traditionally, judges turn to dictionaries to determine ordinary meaning: what regular folks would mean by a word. Fortunately for us, a quirk in Alabama state insurance law and the facts of the case disposed of this question, giving Judge Newsom an opportunity to wax on how Gen AI might supplement ordinary-meaning analysis.

James Snell was a landscaper in Alabama who was hired to install an in-ground trampoline in a customer’s lawn. A child’s injury on the trampoline led to lawsuits against everyone involved, including Snell, whose carrier denied insurance coverage on grounds that trampoline installation is not “landscaping.” Snell sued his carrier, but ultimately the federal courts sided with the insurance company.

Judge Newsom’s Concurrence

Judge Newsom’s concurrence reports that he had spent hours contemplating and coalescing dictionary definitions of “landscaping” – similar to what S.D. Ala. Senior District Judge Granade had considered in her summary judgment order – only to be frustrated by the unclear result in these factual circumstances.

He then turned to Gen AI.

Most of the 29-page concurring opinion details what led to this decision and Judge Newsom’s wide-ranging observations from using Gen AI. I’ll focus here on ways he could have leveraged the technology more effectively.

In the concurrence’s Appendix, the judge provided transcripts of his interactions with ChatGPT and Google Bard (now rebranded as Google Gemini). Judge Newsom gave both models an identical pair of prompts:

  • “What is the ordinary meaning of ‘landscaping’?” and
  • “Is installing an in-ground trampoline ‘landscaping’?”

Despite the sparse prompts, the Gen AI responses were good – especially Bard’s reasoned analysis on the ultimate issue about the trampoline. With additional context to these prompts, the outputs could have been more robust for a legal application.

Context for Gen AI

Gen AI operates in a black box. Unless it is explicitly given proper “context” – the important information it needs to perform its task – it must make inferences. This guessing contributes to the dreaded “hallucinations” phenomenon: responses that appear authoritative and truthful but are inaccurate or false. This is why context is crucial.

In providing a prompt, I suggest revealing to the Gen AI tool who you are, explaining why you’re seeking its AI assistance, and providing it with the same necessary information otherwise given to a human assistant (here, the dictionary definitions of “landscaping”).

Moreover, tell the Gen AI the level of work you expect it to apply and the type of output you desire. If possible, also adjust the Gen AI’s “temperature”: the level of creativity/randomness in its response.

Combined, these details should cause the Gen AI, within its neural network’s sea of words and associations, to focus on relational probabilities between words that more closely align with your goals.
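To make “temperature” concrete, here is a minimal Python sketch of the textbook mechanism: a model’s raw scores for candidate next words are divided by the temperature before being converted into probabilities. The candidate words and scores are invented for illustration, and commercial tools may layer other sampling controls on top, so read this as a picture of the general idea rather than any vendor’s implementation.

```python
import math

def next_word_probabilities(scores, temperature):
    """Convert raw scores to probabilities using softmax with temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use a small value to approximate 0")
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return {word: w / total for word, w in zip(scores, weights)}

# Hypothetical scores for the word that follows "Landscaping involves planting ..."
scores = {"trees": 2.0, "shrubs": 1.5, "trampolines": 0.2}

for t in (0.2, 1.0, 2.0):
    probs = next_word_probabilities(scores, t)
    print(t, {w: round(p, 3) for w, p in probs.items()})
```

Lower temperatures concentrate probability on the top-scoring word, which makes answers more repeatable. Higher temperatures spread probability toward less likely words, which makes answers more varied.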

Had I been Judge Newsom’s law clerk, I would have drafted an initial prompt like this:

I am a U.S. federal appellate judge. I am seeking your AI assistance in researching and thinking through a discrete issue. I will be making my own independent judgment based on all the facts, law and circumstances that you are not privy to.

You will use your analytical abilities and neural network to shed light on the “ordinary meaning” of a particular word, in American English usage. Legal theorists provide guidance that “[t]he ordinary meaning rule is the most fundamental semantic rule of interpretation” for constitutions, statutes, rules, contracts, etc., and “[w]ords are to be understood in their ordinary, everyday meanings.”

The particular word at issue here is “landscaping.” At the federal district court, the parties submitted dictionary definitions, which the district court organized in its opinion in the following table:

[Table omitted for space considerations – see accompanying full transcript.]

Carefully and thoroughly analyze these dictionary definitions as well as your corpus of training data, and describe to me in detail the American English “ordinary meaning” of “landscaping.”

This context-filled approach gives a Gen AI tool much more to chew on, including real-world examples that it may not have received in its training data and clear direction on how to proceed. I gave this prompt to ChatGPT-3.5 (the original version of ChatGPT, which I presume Judge Newsom used), and it yielded the following result:

Based on the compilation of dictionary definitions provided and considering the ordinary usage of the term “landscaping” in American English, we can distill its ordinary meaning as follows:

  1. Modification or Enhancement of Natural Scenery: Many definitions emphasize the act of modifying or enhancing the natural features of a piece of land. This can include altering plant cover, changing contours, and laying out grounds to resemble natural scenery.
  2. Beautification and Decoration: The primary purpose of landscaping is often seen as beautifying or enhancing the appearance of an area of land. This can involve the planning and planting of gardens and grounds to produce picturesque and harmonious effects.
  3. Activities Involved: Landscaping encompasses a range of activities, including planting trees, shrubs, grass, flowers, and other vegetation. It may also involve the installation of structural or decorative features such as walkways, retaining walls, fences, benches, lighting, and water features like reflective pools and fountains.
  4. Supportive Elements: In addition to the primary activities of planting and installing features, landscaping may also involve other supportive elements like irrigation systems, ponds, watercourses, mulch, topsoil, pavers, and decorative rock. Furthermore, it may include the preservation, protection, or replacement of existing natural features like wetlands, trees, and shrubs.
  5. Scope: The term “landscaping” is broad and can refer to activities ranging from small-scale residential yard improvements to extensive commercial or municipal projects involving large areas of land.
  6. Intent: The intent behind landscaping is to improve the appearance of the land, whether it’s a yard, garden, park, or other outdoor space. This improvement is achieved through careful planning, design, and execution of various landscaping activities.
  7. Inclusive Nature: The definition provided by Law Insider is particularly expansive, including both living plant material and structural or decorative features, as well as supportive elements and the preservation or replacement of existing natural features. This underscores the inclusive nature of landscaping and the diverse range of elements it can encompass.

In summary, the ordinary meaning of “landscaping” in American English involves the deliberate modification, enhancement, and beautification of outdoor spaces through a variety of activities aimed at improving their appearance and functionality. This includes planting vegetation, installing features, and incorporating supportive elements to create aesthetically pleasing and functional landscapes.

Compare with ChatGPT’s response to the judge’s seven-word prompt:

“Landscaping” refers to the process of altering the visible features of an area of land, typically a yard, garden or outdoor space, for aesthetic or practical purposes. This can include activities such as planting trees, shrubs, flowers, or grass, as well as installing paths, fences, water features, and other elements to enhance the appearance and functionality of the outdoor space.

Providing more context delivered a more robust result.
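For readers who reuse this pattern often, the elements of a context-filled prompt (who you are, why you are asking, the task, the supporting materials, and the expected level of work) can be kept in a small template. The Python sketch below is one possible arrangement; the function name and its parameters are my own illustration, and the sample text abbreviates the prompt shown earlier.

```python
def build_context_prompt(role, purpose, task, materials, instructions):
    """Assemble a context-filled prompt from its parts, in a fixed order."""
    sections = [
        f"I am {role}. {purpose}",
        task,
        materials,
        instructions,
    ]
    # Drop empty parts and separate the rest with blank lines.
    return "\n\n".join(part.strip() for part in sections if part and part.strip())

# Abbreviated, illustrative wording; substitute the full text for real use.
prompt = build_context_prompt(
    role="a U.S. federal appellate judge",
    purpose="I am seeking your AI assistance in researching a question of word meaning.",
    task='Shed light on the "ordinary meaning" of "landscaping" in American English usage.',
    materials="[Dictionary definitions submitted by the parties go here.]",
    instructions="Carefully and thoroughly analyze these definitions and describe the ordinary meaning in detail.",
)
print(prompt)
```

Keeping the parts separate makes it easy to swap in a different word or set of definitions while leaving the role and instructions unchanged.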

Seek Second Opinions From Other Gen AI Tools

Judge Newsom’s approach of supplying the same prompts to two Gen AI models – ChatGPT and Google Bard – was a wonderful start. I call it “getting a second opinion,” but I wouldn’t stop at two.

With several free or low-cost foundational Gen AI tools available, each operating on different large language models (LLMs) and thus applying unique statistical relationships between words within their neural networks, a real benefit exists in casting a wide net.

Variety in LLMs likely hews closer to the judge’s goal of discovering an ordinary meaning of “landscaping.”

Obtaining this variety is just a matter of copying and pasting the context-filled initial prompt. The accompanying transcript collects the responses received from the following Gen AI chatbots:

  1. ChatGPT-3.5
  2. ChatGPT-4
  3. ChatGPT-4o
  4. Claude 3 Opus
  5. Gemini 1.5 Pro (with Temperature set to 0)
  6. Gemini (Drafts 1, 2, and 3)
  7. Microsoft Copilot (set to “More Precise”)
  8. Perplexity.ai (which runs on GPT-3.5)
  9. Latimer.ai

Although not performed in this exercise, I also suggest collecting at least three responses to the prompt from each model – each generated in a new chat, to start from a clean slate – to obtain a more representative sample from each LLM.
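Copying and pasting works fine, but the same routine can be scripted. The Python sketch below shows only the bookkeeping: one prompt, several models, several fresh runs per model. Each “model” is a plain function that takes a prompt and returns text. The two stub functions are placeholders so the example runs on its own; real wrappers around each vendor’s tool are assumed and not shown.

```python
from typing import Callable, Dict, List

# A "model" here is any function that takes a prompt and returns text.
AskFn = Callable[[str], str]

def collect_responses(prompt: str, models: Dict[str, AskFn], runs: int = 3) -> Dict[str, List[str]]:
    """Send the same prompt to every model `runs` times, one fresh call per run."""
    collected: Dict[str, List[str]] = {}
    for name, ask in models.items():
        # Each call stands in for a brand-new chat, so no run sees an earlier answer.
        collected[name] = [ask(prompt) for _ in range(runs)]
    return collected

# Stand-in models so the sketch runs without network access.
def stub_model_a(prompt: str) -> str:
    return f"[model A answer to a {len(prompt.split())}-word prompt]"

def stub_model_b(prompt: str) -> str:
    return f"[model B answer to a {len(prompt.split())}-word prompt]"

responses = collect_responses(
    "What is the ordinary meaning of 'landscaping'?",
    {"model_a": stub_model_a, "model_b": stub_model_b},
)
for name, answers in responses.items():
    print(name, len(answers), "responses")
```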

While the result is a seemingly daunting amount of text, the next step of digesting that content is where Gen AI shines.

Gen AI Synthesis

A leading use case of Gen AI is the summarization of long texts. A collection of Gen AI responses to an initial prompt tends to be a long, redundant, and nuanced text. Distillation of this content, though, is right up Gen AI’s alley.

A handful of the popular foundational Gen AI chatbots – including ChatGPT and Claude – can read uploaded files. Expanded “context windows” – the term for the amount of memory a Gen AI tool has during a chat – combined with the ability to read documents gives this technology a massive functionality boost, especially for the legal field.

I started new chat threads with four Gen AI tools and fed each a Word doc containing the initial prompt drafted above and its resulting responses, along with this prompt:

In the attached Word document are my initial prompt and various AI assistants’ responses to that prompt. Carefully and thoroughly analyze the Word document, understand the context in which I provided the initial prompt, and synthesize the various AI outputs into a cohesive ordinary meaning of “landscaping”. Take your time and be thoughtful.

The four Gen AI tools used were:

  1. Claude 3 Opus
  2. ChatGPT-4o
  3. Gemini 1.5 Pro (with Temperature set to 0)
  4. Perplexity.ai (running on GPT-3.5)

The results, which you can see in the transcript, boiled the sprawling text down to a more manageable 1,750 words. These Gen AI summaries come from analyzing the outputs of many other Gen AI models, all spurred by the same prompts. They’re crowdsourced, using the collected knowledge and reasoning of all available LLMs rather than relying on the idiosyncrasies of just one.
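The transcript file itself is simple to assemble. As a rough Python sketch, assuming the responses have already been saved as text, the prompt and each labeled response can be joined into one document, with a word count to gauge whether it will fit a given tool’s upload or context limits. The labels and placeholder answers are illustrative, and a word count is only a loose proxy for the token counts that tools actually enforce.

```python
def build_transcript(initial_prompt, responses):
    """Combine the prompt and labeled responses into one text for a synthesis pass."""
    parts = ["INITIAL PROMPT", initial_prompt.strip()]
    for model_name, answers in responses.items():
        for run_number, answer in enumerate(answers, start=1):
            parts.append(f"RESPONSE: {model_name} (run {run_number})")
            parts.append(answer.strip())
    return "\n\n".join(parts)

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

# Placeholder inputs; real responses would be pasted or loaded from saved chats.
transcript = build_transcript(
    "Describe the ordinary meaning of 'landscaping'.",
    {"model_a": ["First placeholder answer."], "model_b": ["Second placeholder answer."]},
)
print(transcript)
print(word_count(transcript), "words")
```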

Gen AI and the Ultimate Issue

We’ve come this far, so why not? Let’s have Gen AI opine on the ultimate issue: is in-ground trampoline installation considered “landscaping”?

Judge Newsom attempted this with his second seven-word prompt, posed to ChatGPT and Bard. My recommended approach is to provide more models with more context—a prompt plus a file containing all prior Gen AI analyses—and submit it at least three times to each Gen AI model to get a stronger statistical sample.

I provided Claude 3 Opus, ChatGPT-4o, and Gemini 1.5 Pro (Perplexity.ai couldn’t handle the file size) this prompt along with the transcript file, and did so three times:

In the attached Word document are 1. my initial prompt and various AI assistants’ responses to that prompt, followed by 2. my second prompt asking certain AI assistants to synthesize the responses and those AI assistants’ collected outputs. Carefully and thoroughly analyze the Word document, understand the context in which I provided the prompts and received the outputs, synthesize the various AI outputs into a cohesive American English ordinary meaning of “landscaping”, and then consider and detail whether the following fall under the American English ordinary meaning of “landscaping”:

  1. Installing a wood deck on a portion of a large lawn of a home.
  2. Installing a gazebo on a large lawn of a home.
  3. Installing an in-ground swimming pool in the middle of a large lawn of a home.
  4. Installing an in-ground trampoline in the middle of a large lawn of a home.

Take your time and be thoughtful.

I chose to inquire about the three other scenarios – deck, gazebo, and pool – to prime the Gen AI tools with similar examples before arriving at the subject trampoline. This allowed for observing if the tools detected and thought through any distinguishing factors.

The results are highlighted in the accompanying transcript. The three chats with three Gen AI tools yielded 6,000 words, once again a large haul.

ChatGPT-4o seemed to lean toward the trampoline installation qualifying as landscaping, although its third response wanted to see the install as part of a “broader” project to be considered “landscaping.” Claude 3 Opus, on the other hand, would only ascribe a trampoline installation to the ordinary meaning of “landscaping” if it were part of a “comprehensive” project. Gemini 1.5 Pro, meanwhile, found that the “recreational” aspect of the trampoline generally disqualified it from the ordinary meaning of “landscaping.”

While the Gen AI tools all wrote well-organized responses that permit a quick scan to their trampoline sections, a final use of Gen AI in this exercise is to determine the consensus on the ultimate issue among the tools. I performed this with a direct prompt and an upload of the full transcript:

Analyze the attached. What is the various Gen AI tools’ consensus on whether installing an in-ground trampoline is “landscaping” under American English ordinary meaning?

The consensus? The trampoline installation was not within the ordinary meaning of “landscaping.”
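I reached that consensus by asking Gen AI. A reader who prefers a manual cross-check could instead read each response, label its bottom line by hand, and count the labels. The Python sketch below shows that alternative tally; it is not the method used above, and the labels in it are placeholders, not the results of this exercise.

```python
from collections import Counter

def tally_verdicts(verdicts):
    """Count verdict labels across runs and report the most common one and its share."""
    labels = [label for runs in verdicts.values() for label in runs]
    if not labels:
        raise ValueError("no verdicts to tally")
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return counts, top_label, top_count / len(labels)

# Placeholder labels only; these are NOT the results reported in this article.
example = {
    "model_a": ["yes", "yes", "conditional"],
    "model_b": ["conditional", "no", "no"],
    "model_c": ["no", "no", "no"],
}
counts, top_label, share = tally_verdicts(example)
print(dict(counts), top_label, round(share, 2))
```

When two labels tie, the tally reports whichever it encountered first, so a close count deserves a careful human read rather than a mechanical answer.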

Final Thoughts

This approach – giving adequate context and using multiple Gen AI tools multiple times – hopefully inspires legal professionals to experiment with Gen AI in legal scenarios.

For expediency, I left two things out of this exercise. First was volume: Querying more models, and doing so more than three times, would have enhanced the statistical strength of the results without much additional time invested (but a diminishing-returns threshold likely exists, so one should avoid over-doing it).
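The diminishing-returns point can be illustrated with a standard formula. If each response is treated as an independent yes-or-no vote, the standard error of the observed share of “yes” answers is the square root of p(1 - p)/n. The Python sketch below evaluates it for a few sample sizes. The independence assumption is a strong one, since models share training data and a single model’s runs are correlated, so take this as a rough guide only.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion, assuming n independent responses."""
    return math.sqrt(p * (1 - p) / n)

# p = 0.5 is the worst case (largest uncertainty) for a yes/no question.
for n in (3, 9, 27, 81):
    print(n, round(standard_error(0.5, n), 3))
```

Under that assumption, tripling the number of responses shrinks the uncertainty by a factor of only about 1.7, so each additional round of queries buys less than the one before it.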

The second omission was multimodality: Both ChatGPT-4o and Claude 3 Opus can interpret images. It would have been interesting to see how those models grappled with the in-ground trampoline images that Judge Newsom included in his Snell concurrence while determining the ultimate issue.

The integration of Gen AI into legal practice is not just an opportunity—it’s an imperative. From Judge Newsom’s chambers to law offices across the country, Gen AI is reshaping how legal professionals approach their work. By embracing this technology and partnering with experts who understand both the legal and technological landscapes, lawyers and judges can elevate their practice to new heights of efficiency, insight, and excellence.

Ready to harness the power of Gen AI for your legal practice? Contact our team to learn how BriefCatch can guide you through this exciting transformation.

Kyle Bahr, J.D., Product Manager of AI and New Products at BriefCatch

In his role as Product Manager of AI and New Products, Kyle Bahr brings deep industry insights from his experience as Fortune 200 in-house counsel, litigator at White & Case and Reed Smith, federal judicial clerk, and legal tech consultant focused on responsible adoption of Generative AI. Kyle focuses on building AI-enabled products that help legal professionals communicate more effectively and catapult their work to the next level.
