Tuesday, March 23, 2021

Citations, Hierarchies, and Algorithms in Legal Scholarship

I read Brian Frye's Techdirt article, "It's the End of Citation As We Know It & I Feel Fine," where he makes the bold claim that the "worst thing about legal scholarship is the footnotes." In a field where article lengths are expanding to 100 pages and beyond, where costly submission software crowds out students and professionals who are outside of the academy, and where journals will publish anything--even ridiculous articles about shooting fish with guns--there is serious competition for the title of what aspect of legal scholarship is "the worst."

I won't deny that footnotes in legal scholarship can be a bit overwhelming for the uninitiated, and that some editors demand citations for everything under the sun. But I'm not sure that the footnote craze is as horrible as Frye makes it out to be. To start, I suspect that the focus on footnotes originates, at least in part, with the legal writing that many law students will go on to prepare in practice, in which arguments referencing cases and statutes require frequent citations to support the claims being made. To the extent that law review articles--particularly those with a more doctrinal focus like 50-state surveys of laws governing the shooting of fish with guns--include citations to case law or statutes, those footnotes should be encouraged to confirm that the legal claim being made has a basis in legal authority. 

Even when citations are to scholarly, rather than legal, authorities, frequent footnotes can be helpful. They may be a resource for those doing research in the area to find related scholarship on particular issues. They can serve as a substitute for literature reviews, reducing the length of what may already be a too-long piece. Footnotes to scholarship may also provide a barometer as to the legitimacy of claims being made. Overreliance on single sources, or--God forbid--citations to one's own work may undermine claims that are presented as well-established. Finally, numerous, repetitive, and useless footnotes (I'm thinking especially of introduction footnotes beginning with "See infra Section __") should be dealt with by the author, who can refuse to include such footnotes in the initial draft and who can (and should) push back on editors who demand such useless additions.

Frye turns to a discussion of ScholarSift, a platform created by Rob Anderson and Trent Wenzel that purports to analyze legal scholarship to "identify the most relevant articles." From what little I can find out about ScholarSift, people can upload an article (whether a draft, a completed piece, or an already-published article), and the system locates "relevant" articles based on an analysis of the article's text and citations.

Frye suggests that ScholarSift could be used as a substitute for footnotes by finding sources that are similar or relevant to the text being analyzed. This does not seem feasible, as the software appears to be built around connecting authors to similar, or "relevant" sources based on the whole of a draft. It does not appear that the system is designed to connect one particular statement or proposition in an article to a source (or sources) that support that statement--instead, it generates a list of "related" articles (and, I think, cases, laws, and maybe books) that are "relevant" to the article as a whole. Replacing footnotes with this program would be similar to a law review article listing a bibliography at the end and telling the reader to look through all the sources to confirm whether the article's contents are accurate. As much work as sorting through footnotes may be, this approach sounds like much more of a burden.

(I admit that I do not have a ScholarSift account--which you apparently can only get by submitting a request by email. If my description of how the system uses submissions to generate results is therefore incorrect, I welcome corrections.)

But Frye's discussion of ScholarSift raises some interesting notions about how it may assist legal research and help legal scholarship as a whole. Frye writes:

It works really well. As far as I can tell, ScholarSift is kind of like Turnitin in reverse. It compares the text of a law review article to a huge database of law review articles and tells you which ones are similar. Unsurprisingly, it turns out that machine learning is really good at identifying relevant scholarship. And ScholarSift seems to do a better job at identifying relevant scholarship than pricey legacy platforms like Westlaw and Lexis.
 
One of the many cool things about ScholarSift is its potential to make legal scholarship more equitable. In legal scholarship, as everywhere, fame begets fame. All too often, fame means the usual suspects get all the attention, and it’s a struggle for marginalized scholars to get the attention they deserve. Unlike other kinds of machine learning programs, which seem almost designed to reinforce unfortunate prejudices, ScholarSift seems to do the opposite, highlighting authors who might otherwise be overlooked. That’s important and valuable. I think Anderson and Wenzel are on to something, and I agree that ScholarSift could improve citation practices in legal scholarship.

I'm a bit less optimistic than Frye about ScholarSift, largely for three reasons: I cannot find any information on how it works; it is unclear what database of documents ScholarSift pulls from; and, to the extent that it relies on a database of existing legal scholarship, the hierarchical problems that Frye identifies in his article may still be imported into its results.

Regarding the lack of information, ScholarSift's website contains virtually no information about how the system operates. I have not been able to locate additional written information on ScholarSift anywhere else--although my search for such information was admittedly a cursory one. I located, and listened to, this Ipse Dixit podcast interview of Rob Anderson, who describes how ScholarSift works. But, as is the case with the website, the information is presented in largely conclusory terms--describing how the system will "look at" the text and citations of an article uploaded to it and analyze "relationships" with other articles to sift through a database of "a few hundred thousand articles" and list results in order of what is most "closely-related." It remains unclear how "relevance" or "closely-related" determinations are made, although it appears that this is done through an analysis of the text, including commonly used words and phrases, and maybe combinations or proximities of words and phrases to one another.
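To make the opacity concern concrete, here is a minimal sketch of one common way a "closely-related" ranking of this kind could be built: TF-IDF weighting over words and two-word phrases, with cosine similarity used to order the database. This is purely my own illustration of the general technique the podcast seems to gesture at; it is not ScholarSift's actual code, and the function and variable names are invented.

# Speculative sketch: one way a "closely-related" ranking could work,
# using TF-IDF over words and short phrases plus cosine similarity.
# Illustration only, not ScholarSift's actual algorithm.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_related(submission_text, database_texts):
    # Treat the uploaded draft as a query against the article database.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    matrix = vectorizer.fit_transform([submission_text] + database_texts)
    # Compare the submission (row 0) against every article in the database.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    # Return database indices ordered from most to least "closely-related."
    return sorted(range(len(database_texts)), key=lambda i: scores[i], reverse=True)

Whether ScholarSift uses anything like this, something more sophisticated, or something else entirely is exactly the question the public materials leave unanswered.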

The makeup of ScholarSift's database of articles and sources is also unclear. The contents of the database, how determinations are made on what to include, and the age of what is included are all mysterious. On the podcast, Anderson notes that scholars using the platform can upload their own drafts or articles to ensure that they are part of the database, but I expect this would only account for a small portion of what makes up the database. Without more information on the database, its contents, and how its contents are selected, it is impossible to conclude that ScholarSift can conduct exhaustive searches of potentially relevant material.

Finally, Frye and Anderson note that ScholarSift may help break down hierarchy problems in legal academia, where big names from prestigious institutions tend to be overcited, and where the body of scholarship consists largely of articles written by white, male authors. If the processes for locating "relevant" articles truly focus on an article's text, perhaps the platform will have some impact. But I have my doubts.

First, the platform is meant to analyze articles and their citations in locating relevant results (although this will supposedly change as the system develops). If citations are included as inputs, though, they will influence the searches, and authors' biases in selecting their own sources will likely be reflected in the results. 
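To illustrate the concern, imagine a scoring rule that blends text similarity with the overlap between the author's own footnotes and a candidate article's citations. The formula and weights below are entirely hypothetical; the point is only that, under any blend of this kind, an author's citation choices flow straight into the "relevance" ranking.

# Hypothetical illustration: if a blended score rewards overlap with the
# author's own citations, whatever biases shaped that citation list are
# carried directly into the ranking. The weights and formula are assumptions,
# not anything documented about ScholarSift.
def blended_score(text_similarity, submission_citations, candidate_citations,
                  text_weight=0.7, citation_weight=0.3):
    shared = len(set(submission_citations) & set(candidate_citations))
    union = len(set(submission_citations) | set(candidate_citations)) or 1
    citation_overlap = shared / union  # Jaccard overlap of cited sources
    return text_weight * text_similarity + citation_weight * citation_overlap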

Second, and related to my concern about the database's contents, ScholarSift presumably draws from the existing body of legal scholarship, in which white male authors are overrepresented (especially if historical writing is taken into account). Even if determinations of relevance are based on the text of submissions and of the articles in the database, the results will still skew toward white male authors if their work makes up the bulk of what is included.

Third, Anderson notes that there will be features permitting searchers to filter results in various ways, including by high citation rates. This suggests that the system at least includes information on articles' citation rates, and that information may influence which results are deemed "relevant"--which may in turn perpetuate the hierarchies in which authors from the most prestigious institutions are overcited. This last concern is, admittedly, speculative, but without information on how the algorithm works, it shouldn't be dismissed. Additionally, as the program develops (and, especially, if it is used to generate a profit), there may be pressure to prioritize results that account for the "prestige" of an author or publication--which could defeat the very purpose of the platform.
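The same worry can be stated in miniature: even a modest ranking boost for highly cited pieces tilts every search toward the already famous. Again, the boost formula below is invented for illustration; there is no public indication that ScholarSift ranks this way.

# Speculative sketch of the third concern: a small boost for highly cited
# articles keeps the usual suspects surfacing first. The formula is invented
# for illustration; nothing public says ScholarSift ranks this way.
import math

def boosted_score(relevance_score, citation_count, boost_weight=0.1):
    # A log-scaled boost keeps heavily cited pieces from dominating outright,
    # but still tilts results toward already-famous work.
    return relevance_score + boost_weight * math.log1p(citation_count)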

If ScholarSift is truly text-focused (as Frye describes, "Turnitin in reverse"), it may have a positive impact on legal research and lead to increased diversity in citations. Hopefully that will be the case. And Anderson notes that ScholarSift may end up being an alternate means of submitting articles to law reviews--an outcome I would be happy to see. I think it is still early days and that it is too soon to be overly optimistic. But ScholarSift may be worth watching and including as one of many tools used by authors of legal scholarship.
