It works really well. As far as I can tell, ScholarSift is kind of like Turnitin in reverse. It compares the text of a law review article to a huge database of law review articles and tells you which ones are similar. Unsurprisingly, it turns out that machine learning is really good at identifying relevant scholarship. And ScholarSift seems to do a better job at identifying relevant scholarship than pricey legacy platforms like Westlaw and Lexis.
One of the many cool things about ScholarSift is its potential to make legal scholarship more equitable. In legal scholarship, as everywhere, fame begets fame. All too often, fame means the usual suspects get all the attention, and it’s a struggle for marginalized scholars to get the attention they deserve. Unlike other kinds of machine learning programs, which seem almost designed to reinforce unfortunate prejudices, ScholarSift seems to do the opposite, highlighting authors who might otherwise be overlooked. That’s important and valuable. I think Anderson and Wenzel are on to something, and I agree that ScholarSift could improve citation practices in legal scholarship.
I'm a bit less optimistic than Frye about ScholarSift, largely for three reasons: I cannot find much information about how it works; it is unclear what database of documents ScholarSift draws from; and I am concerned that, to the extent the system relies on a database of legal scholarship, the hierarchy problems Frye identifies in his article may be imported into its results.
Regarding the lack of information, ScholarSift's website contains virtually no detail about how the system operates. I have not been able to locate additional written information on ScholarSift anywhere else--although my search was admittedly a cursory one. I did locate, and listen to, this Ipse Dixit podcast interview with Rob Anderson, in which he describes how ScholarSift works. But, as with the website, the information is presented in largely conclusory terms--the system will "look at" the text and citations of an article uploaded to it, analyze "relationships" among articles, sift through a database of "a few hundred thousand articles," and list results in order of what is most "closely-related." It remains unclear how "relevance" or "closely-related" determinations are made, although it appears that they rest on an analysis of the text, including commonly used words and phrases and perhaps the combinations or proximities of those words and phrases.
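To make concrete what a purely text-driven notion of "closely-related" might look like, here is a minimal sketch using TF-IDF vectors and cosine similarity. To be clear, this is my own illustration of a common technique, not ScholarSift's actual algorithm, which is not publicly documented; the corpus and draft below are placeholder strings.

```python
# A minimal sketch of one way a text-based relevance ranker could work,
# using TF-IDF vectors and cosine similarity. This illustrates the general
# technique only; ScholarSift's actual method is not publicly documented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "An article about trademark dilution and consumer confusion ...",
    "An article about copyright fair use in parody cases ...",
    "An article about patent eligibility under section 101 ...",
]  # stand-in for a database of "a few hundred thousand articles"

draft = "My draft discusses parody, fair use, and transformative works ..."

# Vectorize the corpus and the draft on the same vocabulary.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
corpus_vectors = vectorizer.fit_transform(corpus)
draft_vector = vectorizer.transform([draft])

# Rank corpus articles by similarity to the draft, most "closely-related" first.
scores = cosine_similarity(draft_vector, corpus_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {corpus[idx][:60]}")
```

Whether ScholarSift uses word frequencies, phrase proximities, embeddings, or something else entirely is exactly the kind of detail its public materials do not reveal.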
The makeup of ScholarSift's database of articles and sources is also unclear. What the database contains, how decisions are made about what to include, and how old the included material is are all mysteries. On the podcast, Anderson notes that scholars using the platform can upload their own drafts or articles to ensure that they are part of the database, but I expect this accounts for only a small portion of the database. Without more information on the database, its contents, and how its contents are selected, it is impossible to conclude that ScholarSift can conduct exhaustive searches of potentially relevant material.
Finally, Frye and Anderson note that ScholarSift may help break down hierarchy problems in legal academia, where big names from prestigious institutions tend to be overcited and where the body of scholarship consists largely of articles written by white, male authors. If the process for locating "relevant" articles truly focuses on an article's text, perhaps the platform will have some impact. But I have my doubts.
First, the platform is meant to analyze both the text of articles and their citations when locating relevant results (although this will supposedly change as the system develops). If citations are included as inputs, though, they will influence the searches, and authors' biases in selecting their own sources will likely be reflected in the results, as the sketch below illustrates.
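Here is a hypothetical illustration of that point. The scoring function and weight are my own assumptions, not anything ScholarSift has described; the point is simply that once citation overlap enters a relevance score, the draft author's citation choices steer the ranking.

```python
# Hypothetical relevance score mixing text similarity with citation overlap.
# Not ScholarSift's documented method; the weight of 0.5 is an arbitrary
# assumption. Any nonzero citation weight means the draft's citation choices
# (and whatever biases shaped them) influence which articles rank highly.
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of cited works."""
    return len(a & b) / len(a | b) if a | b else 0.0

def relevance(text_similarity: float,
              draft_citations: set,
              candidate_citations: set,
              citation_weight: float = 0.5) -> float:
    return ((1 - citation_weight) * text_similarity
            + citation_weight * jaccard(draft_citations, candidate_citations))

# Two candidates with identical text similarity: the one sharing the draft's
# (possibly skewed) citations ranks higher.
draft_cites = {"Famous Prof 2001", "Famous Prof 2005"}
print(relevance(0.60, draft_cites, {"Famous Prof 2001"}))         # boosted
print(relevance(0.60, draft_cites, {"Overlooked Scholar 2019"}))  # not boosted
```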
Second, and related to my concern about the database's contents, ScholarSift presumably draws from the existing body of legal scholarship, in which white male authors are overrepresented (especially if historical writing is taken into account). Even if relevance determinations are based purely on the text of submissions and of the articles in the database, the results will still skew toward white male authors if their work makes up the bulk of what the database includes.
Third, Anderson notes that there will be features permitting searchers to filter results in various ways, including by high citation rates. This suggests that the system at least includes information on articles' citation rates, and that information may influence which results are deemed "relevant"--which may in turn perpetuate the practice of overciting authors from the most prestigious institutions. This last concern is, admittedly, speculative, but without information on how the algorithm works, it shouldn't be dismissed. Additionally, as the program develops (and, especially, if it is used to generate a profit), there may be pressure to prioritize results that account for the "prestige" of an author or publication--which could defeat the very purpose of the platform.
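To show how little it would take for that to happen, here is a toy example. Again, this is purely my speculation--ScholarSift has not said it weights citation counts in ranking--but it shows how blending even a modest "popularity" term into a relevance score can let a heavily cited piece outrank a more textually relevant one.

```python
# Speculative sketch (my assumption, not a documented ScholarSift feature)
# of how blending citation counts into ranking re-introduces prestige bias.
import math

def ranked_score(text_similarity: float, citation_count: int,
                 popularity_weight: float = 0.3) -> float:
    # Log-scaled, roughly normalized citation count: blockbusters don't
    # dominate entirely, but they get a persistent boost.
    popularity = min(1.0, math.log1p(citation_count) / math.log1p(5000))
    return (1 - popularity_weight) * text_similarity + popularity_weight * popularity

# A closely related but rarely cited article vs. a loosely related,
# heavily cited one: the "usual suspect" edges ahead.
print(ranked_score(text_similarity=0.80, citation_count=3))     # overlooked scholar
print(ranked_score(text_similarity=0.55, citation_count=1200))  # usual suspect
```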
If ScholarSift is truly text-focused (as Frye describes, "Turnitin in reverse"), it may have a positive impact on legal research and lead to increased diversity in citations. Hopefully that will be the case. And Anderson notes that ScholarSift may end up being an alternative means of submitting articles to law reviews--an outcome I would be happy to see. I think it is still early days, though, and that it is too soon to be overly optimistic. But ScholarSift may be worth watching and worth including as one of the many tools used by authors of legal scholarship.