Thank you for your insightful comments …
would have been the proper way to end this conversation - had it indeed been a conversation.
In fact, although it reads as a perfectly reasonable dialogue about a question that may not be very profound but is not trivial either, all the “answers” were generated by the chatbot. They can easily be mistaken for the answers of a thoughtful, knowledgeable human respondent, but that is not what they are.
The quality of the large language model (LLM) at the core of the application makes it quite hard to realize that I was not asking a human for an opinion or an insight. What I was actually doing at the outset of our dialogue was offering a computer program the following input: “given the statistical distribution of words in the vast public corpus of (English) text processed by the LLM, what words are most likely triggered by the keywords and phrases in the question?” (a formula borrowed from M. Shanahan, Talking about Large Language Models).
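To make Shanahan’s formula a little more tangible, here is a deliberately crude sketch of the underlying idea: a bigram model that counts which word follows which in a toy corpus, and then extends a prompt by sampling from that distribution. The corpus and all names are invented for illustration; a real LLM learns an incomparably richer distribution over a vastly larger corpus, but the principle - continue the text with statistically likely words - is the same.

```python
import random
from collections import Counter, defaultdict

# A toy corpus standing in for the "vast public corpus of (English) text".
corpus = (
    "the blind leading the blind . "
    "the figures are blindly following each other . "
    "one blind man leads another into the ditch ."
).split()

# Count which word follows which: the crudest possible
# "statistical distribution of words".
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def continue_prompt(prompt_word: str, length: int = 6) -> str:
    """Extend the prompt by repeatedly sampling a likely next word."""
    words = [prompt_word]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        # Sample in proportion to observed frequency, as an LLM samples
        # from its (vastly more sophisticated) learned distribution.
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(continue_prompt("the"))
```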
And the communication algorithm made the response quite convincingly anthropomorphic by inserting the descriptive detail “in the painting, the figures are blindly following each other”, suggesting that the bot has actually “seen” the picture, which of course it hasn’t.
To generate the initial response and a conversation of this quality, OpenAI had to build a colossal dataset of texts and classify its content to make sense of the information it contains. We do not know the precise composition of the dataset. It is not publicly available - “OpenAI” is not that open - but even if it were, it would be far too large to be evaluated by human users. That is precisely why AI applications like this one are being developed in the first place. The pace at which the internet’s pool of data grows keeps accelerating, so the gap between what is aggregated and what a human brain can process widens all the time.
And that development will not stop. This accelerating expansion also holds for the textual and visual sources humanities researchers work with, so they too will have to rely on information technology to keep a grip on their material.
The sheer size of the corpus of raw data is overwhelming and leads to a certain level of opacity. That size is a fact of life, and we simply have to deal with it. It is not, however, the main concern of researchers who try to evaluate just how useful AI applications can be. In a sense we have all become used to this opacity: in everyday life we have stopped asking how search engines select and prioritise query results. We accept those results for what they are, and we no longer wonder what remains hidden from view.
However, in the context of research the classification used to transform raw data into information should be transparent and accessible for inspection.
As researchers we must know how raw data are processed, and how these massive corpora of source material are classified and made searchable: how, for example, scholarly comments about a painting like the Breughel are scraped from the internet, digested, reorganized, and then read back to us by a chatbot…
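What “scraped, digested, reorganized” might mean in practice can be illustrated with a minimal sketch. The URL below is invented for illustration, and the fixed-size chunking is only one common way such text is prepared for a training set or retrieval index; real pipelines are far more elaborate, which is exactly why their workings deserve scrutiny.

```python
import re
import urllib.request

# Hypothetical address of a scholarly comment on the painting;
# the URL is invented for illustration.
URL = "https://example.org/essays/blind-leading-the-blind.html"

def scrape(url: str) -> str:
    """Fetch a page ('scraped from the internet')."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def digest(html: str) -> list[str]:
    """Strip the markup and cut the text into passages ('digested')."""
    text = re.sub(r"<[^>]+>", " ", html)      # drop HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # normalise whitespace
    # Reorganize into fixed-size chunks, the shape in which such text
    # typically enters a training set or a retrieval index.
    words = text.split()
    return [" ".join(words[i:i + 200]) for i in range(0, len(words), 200)]

if __name__ == "__main__":
    for passage in digest(scrape(URL)):
        print(passage[:80], "…")
```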
As the selection and interpretation of sources is fundamental to the work of most of us in the humanities, we need to know how the content of digitized sources is classified and made accessible. This holds all the more strongly for the visual sources we consult, since textual descriptors are still an indispensable element of the metadata we need for their retrieval.
Every classification represents a certain worldview; every vocabulary is biased in some way. So is Iconclass. Unlike the classifications applied by “Big Tech”, however, it is published on the internet in its entirety, so it is available for public inspection and evaluation. Its raw data, moreover, are deposited in a public repository (GitHub) as open data, and anyone who sees shortcomings can suggest improvements and corrections.
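That openness is easy to verify for yourself. The sketch below fetches the machine-readable record for a single Iconclass notation; the “.json” URL scheme and the example notation are assumptions based on the public browser at iconclass.org at the time of writing, so adjust them if the interface differs.

```python
import json
import urllib.request

# One example notation: "25F" sits at the top of the 'animals' branch.
NOTATION = "25F"

def fetch_notation(notation: str) -> dict:
    """Retrieve the open, inspectable record for an Iconclass notation."""
    url = f"https://iconclass.org/{notation}.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

record = fetch_notation(NOTATION)
# The record is plain data: the notation, its textual correlates, and
# its place in the hierarchy - nothing is hidden from view.
print(record)
```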
Hence it seems like a good idea to start offering a course to disseminate knowledge about Iconclass and its potential for computer-assisted but transparent image analysis.
(If you agree to this, let us know … we have some ideas for a course that we would like to test)
Whatever the consequences of AI chatbots entering the field of the Humanities - and there will be many - one thing is clear: if we want to be able to critically assess what they produce, we must make sure that our sources are processed with tools that are open and transparent, and that the community of researchers using them is able to modify and improve them.
I decided to pose a related question to ChatGPT, using a slightly different metaphor. Ignoring the chatbot’s puzzling combination of the emblem and the saying by Seneca - it would be useless to ask for its “source”, as we have seen above - we can find some consolation in its “answer”.
How can we connect Alciato’s emblem “Mutuum auxilium” to the concept of artificial intelligence?