The Efficacy of ChatGPT: Is it Time for the Librarians to Go Home?


On November 30, 2022, OpenAI released ChatGPT (the "GPT" stands for “Generative Pre-trained Transformer”) to the public. OpenAI is an artificial intelligence research company with a hybrid for-profit/non-profit structure. ChatGPT is an artificial intelligence language model that generates human-like text in response to a prompt. It was trained on massive data sets of text. The Harvard Business Review notes, “while versions of GPT have been around for a while, this model has crossed a threshold: It’s genuinely useful for a wide range of tasks, from creating software to generating business ideas to writing a wedding toast.”

In preparation for a presentation about race and academic libraries, I tried ChatGPT (January 9 version) to see what it (they?) had to say. I was curious about how it worked and how accurately it responded to queries. The system is not quite ready for prime time, as even OpenAI acknowledges: “While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.” OpenAI’s philosophy is to “release these models into the wild before all the guardrails are in place, in the hopes that feedback from users will help the company find and address harms based on interaction in the real world.”

For this assessment, I posed queries to ChatGPT about racism and whiteness in academic libraries, asked follow-up questions for more information, and then asked the system to provide citations on the topic. (See Appendix I: Transcript for a transcript of the interaction with ChatGPT.)

Analysis

In a series of interactions, ChatGPT was asked to provide information about racism and whiteness in academic libraries. The responses were credible and clearly written, and the program was able to convey nuance. The quality of the writing is roughly on par with a good Wikipedia entry. At times the question had to be rephrased before the program understood what was being asked of it. For example, when asked to find articles that discuss whiteness as a racial characteristic in academic libraries, it responded, “I’m sorry, I am unable to find articles in academic libraries,” as if it thought it was being asked to retrieve the articles themselves.

Where ChatGPT failed miserably was in the citations it provided. Half of the 29 citations checked came from just two publications, Journal of Library Administration and Library and Information Science Research Journal. While the citations were adequately formed, they were typically incomplete, generally lacking volume or issue numbers.

The main problem with ChatGPT is that the citations refer to articles that don’t exist (see Appendix II: Citations). They are phantom citations leading nowhere. Each citation was searched in ProQuest’s Library and Information Science Collection, and the most common response was: “Your search for ‘article title’ found 0 results.” Of the 29 citations checked, only one was accurate, one was correct but had the title transposed, and one pointed to a real article but named the wrong source journal. When questioned about the accuracy of its citations, ChatGPT grew indignant, claiming, “The articles and studies I listed were published in reputable academic journals and written by experts in their field.” Well, maybe not so much. When pressed on this point, ChatGPT vacillated somewhat, offering “there may be instances where the information I provide may not be completely accurate or up-to-date.” Agreed. The system did give a shout-out to librarians, noting, “It is also recommended to consult with librarians and other subject matter experts, to ensure the accuracy of the information and to get the latest information.”

As an extra check, each journal title/year combination was examined to confirm that the article did not exist. Every issue of the journal for the specified year was checked, and we found no trace of the items cited.

Phil Davis notes, “ChatGPT works in the same way your phone’s autocomplete function works while texting; it simply puts words together that are statistically likely to follow one another.” AI programs are first exposed to training data, which gives them a knowledge base from which to draw inferences. The solution to the citation problem is to train ChatGPT on data from the academic realm, such as the JSTOR corpus, content from ScienceDirect, or one of the citation-tracking databases. This would enable ChatGPT to cite actual articles and other works, rather than making them up like a first-year student with a paper due the next morning.
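To make the autocomplete analogy concrete, here is a toy sketch in Python. It is a simple bigram model: it counts which word follows which in a tiny, made-up corpus, then repeatedly appends the most frequent next word. The corpus and the function names `train_bigrams` and `autocomplete` are hypothetical, and ChatGPT itself is a far larger neural network, so this illustrates only the general principle of statistical next-word prediction, not OpenAI's implementation.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, how often every other word immediately follows it."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily extend `start` with the statistically most likely next word."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:  # no known continuation, so stop early
            break
        out.append(options.most_common(1)[0][0])
    return out


# Hypothetical toy corpus standing in for real training data.
corpus = (
    "whiteness in academic libraries . "
    "racism in library administration . "
    "diversity in library administration . "
    "whiteness in library collections ."
)

model = train_bigrams(corpus)
phrase = " ".join(autocomplete(model, "whiteness", 3))
print(phrase)            # whiteness in library administration
print(phrase in corpus)  # False: the phrase never appears in the corpus
```

Every word pair in the generated phrase occurs somewhere in the corpus, yet the phrase as a whole was never "written" there: text that is statistically plausible can still describe something that does not exist, a small-scale analogue of the phantom citations described above.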

ChatGPT was asked whether it had been trained on the academic literature. It had not; rather, it was trained on a diverse array of sources from the public Internet and social media sites. When asked about incorporating academic literature into its training base, the program described a rational process for working with large academic data sets, although it was vague when pressed for a timetable of deliverables for such a project. (See Appendix I: Transcript)

As for the librarians, I think we had better stick around for a while longer.

Editor’s Note: This article is republished with the permission of the author; it was first published on The Scholarly Kitchen.
