Tuesday, April 19, 2022

The Opaque, Incomplete Corpus Linguistics Analysis in the Mask Mandate Ruling

As virtually all news outlets have reported, Judge Kathryn Kimball Mizelle of the United States District Court, Middle District of Florida, recently struck down the CDC's requirement that people wear masks in certain settings, including planes, train stations, buses, and other public transit settings. The ruling is here. The case is Health Freedom Defense Fund, Inc. v. Biden.

Reaction to the ruling has been swift. Most of the commentary is in an early phase, with initial reactions widespread on Twitter. For some initial, detailed discussion, Ilya Somin writes about the opinion here, suggesting that it is more defensible than critics claim, but that it may still be vulnerable to being overturned on appeal. The Wall Street Journal's editorial page writes favorably of the ruling here. The Washington Post has dueling takes for and against the ruling.

More undoubtedly will be written as the days go on and as commentators parse the 59-page decision. Initial reactions suggest that the textualist methodology employed by the court is lacking. (See, e.g., here and here). I want to focus on one portion of the analysis: the court's use of corpus linguistics methodology in support of its conclusion.

I won't delve into the intricacies of the dispute and all of the arguments made by the parties and addressed in the ruling. In brief, the court addressed whether 42 U.S.C. § 264(a) was a sufficient basis for the CDC's mask requirement. The court concluded it was not, finding, among other things, that section 264(a)'s grant of power to provide for "sanitation" did not apply to requiring masks.

In reaching this conclusion, one method the court employed was corpus linguistics--a method in which databases of documents and texts are searched for instances in which words and phrases are used. In theory, one trying to determine the meaning of a word or phrase can type that word or phrase into a corpus linguistics database and examine the instances in which that word or phrase is used across a wide variety of texts. In doing so, patterns may emerge demonstrating multiple usages, common trends in meaning, and other information that may aid in determining the definition (or definitions) of a term. This method of interpretation has gained steam in recent years, particularly in originalist circles where it is hailed as a groundbreaking method for determining the original public meaning of constitutional provisions. (See articles by Lawrence Solum and Thomas Lee and James Phillips advocating the use of corpus linguistics in the originalist context). As I've noted in recent work coauthored with Alexander Hiland, this methodology raises a fair share of concerns, including a lack of transparency as to how a judge undertook the corpus linguistics analysis. The corpus linguistics analysis in Judge Mizelle's opinion demonstrates that this concern is well-founded.

Here's the excerpt of the ruling addressing corpus linguistics (from pages 17-18 of the ruling):

Customary usage at the time agrees. One method to assess the ordinary meaning of a term is to search a database of naturally occurring language. A search returns the desired word as well as its context and, with a sufficient sample size, search results permit inferences on how a word was used. This method is known as corpus linguistics.[FN 2] The Court here searched the Corpus of Historical American English (COHA) [FN 3] to find uses of "sanitation" between 1930 and 1944. Of the 507 results, the most frequent usage of sanitation fit the primary sense described above: a positive act to make a thing or place clean. Common examples referred to sanitation in the context of garbage disposal, sewage and plumbing, or direct cleaning of a dirty or contaminated object. In contrast, by far the least common usage--hovering around 5% of the data set--was of sanitation as a measure to maintain a status of cleanliness, or as a barrier to keep something clean. And so, the COHA search results are consistent with the contextual clues of the active words surrounding sanitation in § 264(a).
[FN 2]: "Corpus linguistics is an empirical approach to the study of language that uses large, electronic databases" of language gathered from sources such as books, magazines, and newspapers. Thomas R. Lee & Stephen C. Mouritsen, Judging Ordinary Meaning, 127 YALE L.J. 788, 828 (2018) (footnote omitted) (describing this tool).
[FN 3]: The COHA corpus is publicly available. See CORPUS OF HISTORICAL AMERICAN ENGLISH, https://www.english-corpora.org/coha/ (last visited Apr. 12, 2022). It is "the largest structured corpus of historical English." Id. Because Congress enacted the PHSA in 1944, the Court searched for uses of the word "sanitation" and variants like "sanitary" and "sanitize" between 1930 and 1944. The search returned 507 hits, or "concordance lines."

This description of the analysis that the court undertook and the conclusions drawn from the analysis lack transparency and raise a number of questions.

To start, it is unclear what search (or searches) the court undertook. In footnote 3, the court indicates that it searched for the word "sanitation," and also "variants like 'sanitary' and 'sanitize'." The court does not specify how many variants of "sanitation" it searched for--only providing two examples of such variants. It appears that the court conducted a single search, although this is unclear as well, given the acknowledgment that the court searched for variations on "sanitation." This suggests that there was a single search for "sanitation," along with some of its variants (say: "sanitation OR sanitary OR sanitize"). The court's failure to specify the precise terms of its search, however, leaves the reader to speculate as to how the court conducted its search of the database.
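To make concrete what a fully disclosed search might look like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample sentences are invented and are not drawn from COHA, the three search terms are simply the ones the opinion names, and the `concordance` helper is hypothetical rather than anything the court or the COHA interface actually uses.

```python
import re

# Hypothetical stand-in for a corpus: (year, text) pairs. These sentences are
# invented for illustration and are NOT drawn from COHA.
SAMPLE_CORPUS = [
    (1928, "The sanitation of the camp was poor."),
    (1935, "The city improved sanitation by extending the sewer lines."),
    (1941, "Sanitary conditions on the ward were inspected weekly."),
    (1943, "Workers must sanitize each bottle before filling."),
    (1950, "Modern sanitation depends on clean water."),
]

# Disclosed search terms: every variant is listed explicitly, so a reader can
# see exactly which word forms were (and were not) searched.
SEARCH_TERMS = ["sanitation", "sanitary", "sanitize"]
PATTERN = re.compile(r"\b(" + "|".join(SEARCH_TERMS) + r")\b", re.IGNORECASE)


def concordance(corpus, start_year, end_year, width=30):
    """Return (year, matched_term, context) for each hit within the date range."""
    lines = []
    for year, text in corpus:
        if not start_year <= year <= end_year:
            continue
        for match in PATTERN.finditer(text):
            left = max(match.start() - width, 0)
            right = min(match.end() + width, len(text))
            lines.append((year, match.group(1).lower(), text[left:right]))
    return lines


# Same date window the opinion describes (1930 through 1944).
hits = concordance(SAMPLE_CORPUS, 1930, 1944)
for year, term, context in hits:
    print(year, term, "|", context)
```

The point of the sketch is only that the term list, the date window, and the resulting concordance lines can all be stated in a form that someone else could rerun and check.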

The court's analysis also lacks transparency regarding how it coded the results of its search. The court gives two apparent examples of its coding: results reflecting the meaning, "a positive act to make a thing or place clean" and results reflecting "a measure to maintain a status of cleanliness, or as a barrier to keep something clean." The court does not specify whether its search uncovered alternate meanings, and the percentage of results that fell into these meanings. Indeed, the only percentage specified in the opinion is that roughly five percent of results were consistent with the "measure to maintain a status of cleanliness" meaning. The court is silent as to the percentage of results that fell into the other meaning it identifies, stating only that the "most frequent usage of sanitation" fit that sense. It is unclear whether sanitation as "a positive act to make a thing or place clean" was a majority of the results or a plurality of the results. Even if this sense was a majority of the results, there is still room for the possibility of a frequently used, third meaning that the court does not identify here. Without a breakdown of how results were coded and the frequency of hits for each definition that was coded, one cannot know how the court reached its decision or evaluate the significance of its conclusions regarding the frequency of definitions.
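The kind of breakdown that is missing can be sketched in a few lines of Python. The category labels and counts below are entirely hypothetical, chosen only to show how a "most frequent" sense can be a plurality rather than a majority while the least common sense still sits at five percent; they say nothing about what the court's 507 results actually contained.

```python
from collections import Counter

# Hypothetical coding sheet: one label per concordance line. The labels and
# counts are invented for illustration and do not reflect the court's data.
CODED_HITS = (
    ["active_cleaning"] * 9
    + ["institutional_name"] * 6   # e.g. a "Department of Sanitation"
    + ["ambiguous"] * 4
    + ["maintain_or_barrier"] * 1
)


def frequency_table(labels):
    """Return (label, count, percent) rows, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        (label, n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    ]


table = frequency_table(CODED_HITS)
for label, n, pct in table:
    print(f"{label:22s} {n:3d} {pct:5.1f}%")

# Distinguish a majority from a mere plurality for the top category.
top_label, _, top_pct = table[0]
print(top_label, "is a", "majority" if top_pct > 50 else "plurality")
```

With these made-up numbers, the "active cleaning" sense is the most frequent at 45 percent, yet it is not a majority, and two further categories account for half of the hits. A reader given only "most frequent" and "around 5%" could not tell this situation apart from one in which the leading sense made up 95 percent of the results.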

The court also fails to list sufficient examples detailing how it decided to categorize the search results. The court notes that one of its definitions was "a positive act to make a thing or place clean," but goes on to provide purported "examples" that list little more than characterizations of the context of the terms. Some of these terms are likely loaded: "sewage" and "garbage disposal," for instance, suggest that some of these results may have been uses of the term, "sanitation," in a specialized sense to describe a "Department of Sanitation." The court does not specify how it classified search results like this, or whether it treated such results differently from the use of "sanitation" in non-departmental contexts. Departments of Sanitation likely carry out a variety of activities, including taking positive actions to make places clean and to maintain a state of cleanliness. Accordingly, it is likely that the court's analysis included hits that, themselves, could have been interpreted in multiple ways. The court does not specify whether, in these cases, it selected one of the alternate definitions, whether it discounted the result from its analysis, or whether it coded the result as including multiple definitions.

Additionally, this analysis illustrates an overall issue with corpus linguistics analysis: the methodology does not contain a basis for selecting among multiple meanings that a search of the relevant corpus uncovers, or whether choosing a single meaning is appropriate. Here, the court notes that the results of its analysis indicated that there were at least two potential meanings of "sanitation." The court ultimately concludes that it should use the variation that appears most frequently, but does not state why the most frequent usage is the only usage that should be employed (for a critique of this assumption, see Donald Drakeman's essay on corpus linguistics; although see Neal Goldfarb's arguments to the contrary). Moreover, selecting a particular meaning of "sanitation" is, itself, a judgment call as to how broadly to read the statute. If one takes a wide view of the CDC's power, one may be inclined to read "sanitation" to cover all of its potential meanings, so as to allow the CDC to take a broad range of actions. Limiting the reading of "sanitation" to only one of its alternate definitions is, itself, a judgment call as to the appropriate breadth of the agency's power--yet no basis or justification for this assumption is set forth.

Advocates of corpus linguistics claim that it can bring a level of empirical rigor to legal interpretation. But as the court's ruling in this case demonstrates, corpus linguistics can backfire and lead to conclusions based on methodology that is impossible to examine or verify. Advocates of corpus linguistics will undoubtedly argue that misapplications of the methodology should not count against the method itself. But in a world where attorneys may increasingly seek to use corpus linguistics in a one-sided manner to convince judges that their position is correct, or where judges themselves employ corpus linguistics without the necessary transparency, the costs of this method to judicial transparency may outweigh the benefits. A judge or attorney may abuse dictionary definitions by selecting a particular dictionary or one particular definition among alternate, plausible definitions. But these abuses can be identified and critiqued. This is not the case with incomplete corpus linguistics analysis, in which a failure to disclose search terms, coding methods, and percentages of coded results makes it impossible to evaluate the interpretive methods employed. The court's decision in Health Freedom Defense Fund illustrates how this opaque, incomplete methodology can impact the lives of millions.

In Praise of Unoriginal Scholarship

Authors of law review articles frequently claim that their article "fills a gap in the literature," that they are making an "original" contribution, or that their take on an issue (that has likely been the subject of prolonged debate for decades, if not centuries) is a completely new perspective. Critics scoff at these claims, arguing that they are almost always exaggerated and that, when it comes to big ideas about the law, there's nothing new under the sun (an example of this criticism is here). These critiques aren't far off. A lot of scholarship addresses arguments that have already been made elsewhere in the legal literature or in the academic literature of adjacent scholarly disciplines (which more than a few academic legal writers tend to avoid). 

Despite this criticism, authors continue to claim originality and take what they may well believe to be original--albeit potentially unusual or ridiculous--positions on issues. They do so in the hope that they'll end up making a novel contribution that ends up resonating (or, less charitably, with the hope that they can at least dupe editors and readers into thinking that they're making a truly original point).

Perhaps those caught up in the modern race to fill literature gaps and make original claims should take to heart the words of Jeremiah Smith in his article, "The Use of Maxims in Jurisprudence," where he prefaces his critique of various legal maxims with this disclaimer:

Those who are wont to eulogize maxims may not unreasonably require their critics to "file a specification." In compliance with this request, we proceed to furnish specific criticisms of some specific maxims. And the objections to these maxims will be stated, so far as practicable, in the words of jurists of acknowledged reputation. One who has the temerity to attack popular idols can hardly expect even to obtain a hearing, much less to convince, if he relies solely on the views "evolved from his own inner consciousness." The convincing force, if any such there be, of this article will consist in its want of originality. (emphasis added).

The landscape of legal scholarship may become just a bit less ridiculous if more authors and editors meditate on the last sentence of that paragraph.

Wednesday, April 13, 2022

Originalism and the Dual Critiques of Indeterminacy and Dishonesty

At the Originalism Blog, Michael Ramsey writes about a couple of recent columns by Eric Segall and Andrew Koppelman, both of which are inspired by Justice Ketanji Brown Jackson's confirmation hearing. This post focuses on one of Ramsey's quick reactions to Koppelman's column.

Koppelman writes:

Originalism has three central problems. It doesn’t really constrain judges. Even if it did, it would do so randomly and chaotically. But in fact, as it has been deployed in the Supreme Court, it is a fraud: The self-styled originalists don’t really care about historical evidence. They manipulate it to reach the results they find politically congenial, and then parade their virtue by saying they are merely following the law.

Ramsey responds:

This essay, like others in a similar vein, seems to suffer from inconsistent claims: either (a) original meaning is indeterminate and so isn't useful as a way of constraining judges, or (b) purportedly originalist judges are frauds in that they ignore the best historical evidence to follow their political preferences. These are both potentially powerful critiques but they're inconsistent so the critic needs to pick one or the other. (Lawyers can argue in the alternative but scholars shouldn't.)

Ramsey goes on to respond to each of these issues, but I want to focus on this initial point. This claim of inconsistency is misguided, as both of these problems may exist simultaneously. One way to illustrate this is to take into account multiple categories of originalist actors. I focus on two categories here: (1) academic originalists; and (2) judges and Justices who claim to be originalists. Critiques of indeterminacy tend to have their place in the academic debate, while critiques of dishonesty tend to be directed at judges and Justices.

First, consider critiques of originalism as presented by its scholarly proponents. A critic may correctly argue that original meaning may be difficult or impossible to determine. There may also be multiple potential original interpretations of particular provisions. Academic originalists aren't without responses. They often acknowledge that determining meaning is only one step of interpreting and applying the Constitution to cases before courts or to guide one's behavior. Interpreters may rely on particular rules to determine what interpretation should ultimately be implemented--for example, rules that the most commonly used meanings of terms should be employed in cases where provisions have multiple meanings, or rules that ambiguous provisions should be interpreted with an eye to the "spirit" of the provisions (or the Constitution as a whole). Critics, in turn, may respond that the choice of what rules to use may inject further indeterminacy into what meaning is ultimately implemented. They may also argue that certain approaches to translating the meaning of constitutional provisions into implementable legal rules and determinations may introduce plenty of opportunities for vagueness and personal opinions to sway decisions. It may turn out that these layers of potential inconsistency are enough to doom originalism as a desirable approach to constitutional interpretation. The debate goes on.

While the battle over academic theories of originalism rages in the pages of law reviews and on legal blogs, judges and Justices are making decisions on real world cases. Some of these decisions may be made on purportedly originalist grounds. And it may not take much analysis to realize that these originalist grounds ultimately have little to no connection to any accepted version of the original public meaning of the Constitution. Through selective citations to the vast originalist literature (vast, in part, due to the sheer quantity of theorizing necessary to translate indeterminate provisions into implementable interpretations), judges and Justices may back up their goal-oriented decisions with enough citations and historical hand-waving to create an appearance of legitimacy.

There doesn't seem to be anything inconsistent about pointing out these two problems with originalist interpretation. Originalism may ultimately provide a range of potential meanings--a range that may be so broad, or so lacking in a principled manner of choosing between options, that it fails to meaningfully constrain interpreters. While this issue, the potential meanings, and the methods for choosing between meanings are debated at the academic level, disingenuous, goal-oriented judges and attorneys may purport to take an originalist approach, yet go beyond the range of potential meanings originalism suggests. They may also reach conclusions that turn out to be consistent with originalism by happenstance, should those results end up being consistent with the desired result.

Both of these problems can exist simultaneously, and both must be accounted for should originalists ever hope for their theory to be applied in a consistent, meaningful manner. Highlighting the seeming inconsistency of these problems in an effort to avoid criticism is nothing more than a dodge. 

One final point:

"Lawyers can argue in the alternative but scholars shouldn't." This seems to be a throwaway line, but it's a revealing example of legal academia's disconnect from the practice of law. There's been some more discussion lately of the ongoing trend of hiring professors with substantial academic, clerking, and fellowship credentials, but with little to no practice experience. Here, the disconnect is made explicit: "lawyers" are separated from "scholars," with the tactics of the former group having no place in the discourse of the latter. Perhaps this is an attempt to justify legal scholarship's increasing disconnect from the realities of practice--if the argumentative methods of lawyers have no place in legal academia, perhaps there is no loss as a result of the widening gap between practice and scholarship.

This instinct seems misguided. While legal scholarship certainly appears different from the arguments attorneys make in briefs and in court, it is (at least in theory) written with the purpose of describing or influencing the legal landscape. If legal scholarship is disconnected from practice (including the practical aspects of lawmaking, judicial opinions, and other real-world aspects of the law), it will end up having little to no impact beyond the theoretical universe it inhabits.