I was very pleased, recently, to come across Hans Brandhorst’s essay, “The elephant in the room: Iconography, Iconclass and Artificial Intelligence”. As I read it, it makes three critical points about the use of ChatGPT for iconographic research and cataloguing (and I agree with all of them). These are: i) hallucinated Iconclass notations, ii) inaccurate pictorial descriptions and unreliable transcriptions, which then lead to iii) incorrect interpretations and notations. Other (also legitimate) issues about LLMs in general are raised in the essay as further contributing factors.
I was pleased because the essay is directly relevant to two resources I’ve been building over the last few weeks: rijksmuseum-iconclass-mcp and its companion resource, rijksmuseum-mcp+. As it happens, both were designed primarily to address Hans’ first criticism of LLMs (hallucinated references) and to make at least some progress towards addressing the third (incorrect interpretations). What they do, in a nutshell, is direct an LLM to draw on a pre-defined data source when answering a prompt, instead of falling back on its pre-trained background knowledge. For the first resource, this is an online database of Iconclass notations; for the second, it is the Rijksmuseum’s catalogue of artworks. The two are designed to work together, but for the sake of this discussion, let’s set aside rijksmuseum-mcp+ and focus on rijksmuseum-iconclass-mcp, which can, in any case, also be used as a standalone resource.
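To make the grounding idea a little more concrete, here is a minimal Python sketch. It is not how rijksmuseum-iconclass-mcp is implemented; it assumes a tiny in-memory dictionary (TOY_DB, with placeholder notations that are deliberately not real Iconclass codes) standing in for the online database, and the function names are hypothetical. It shows the two halves of the pattern: a lookup that can only return records that actually exist in the data source, and a check that flags any notation in a model’s answer that the data source does not contain.

```python
# A minimal sketch of "grounding": answers are drawn from a fixed data source
# rather than from the model's background knowledge.
# ASSUMPTION: TOY_DB is a hypothetical stand-in for the Iconclass database;
# its notations are placeholders, not real Iconclass codes.
TOY_DB: dict[str, str] = {
    "T1": "animals",
    "T1A": "mammals",
    "T1A1": "elephant",
    "T1A2": "lion",
}


def lookup(term: str, db: dict[str, str]) -> list[tuple[str, str]]:
    """Tool-style keyword lookup: return only (notation, label) pairs that
    exist in db and whose label contains term (case-insensitive)."""
    needle = term.lower()
    return [(notation, label) for notation, label in db.items()
            if needle in label.lower()]


def unverified(claimed: list[str], db: dict[str, str]) -> list[str]:
    """Return the notations in a model's answer that are absent from db,
    i.e. candidates for hallucination."""
    return [notation for notation in claimed if notation not in db]


if __name__ == "__main__":
    print(lookup("elephant", TOY_DB))           # [('T1A1', 'elephant')]
    print(unverified(["T1A1", "T9Z"], TOY_DB))  # ['T9Z']
```

The point of the design is that the model never has to “remember” a notation: it asks the tool, and anything it reports can be checked against the same source.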
Once you ‘connect’ your LLM to rijksmuseum-iconclass-mcp (how to do so is described on the site), you can query it as usual in natural language, for example by asking it to “look up the Iconclass notation for an elephant”, which it will then search for in its online database. The resource offers more than that, however: it also allows you to search the Iconclass database by concept or meaning (i.e. semantic search) rather than just by keyword, to list all the key-expanded variants of a concept, or to explore the different places a concept appears within the Iconclass hierarchy. In my experience of building and testing the resource (admittedly still limited, and not that of an Iconclass expert), hallucinated notations or labels are now rare.
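The difference between keyword and semantic search can be sketched in a few lines of Python. Semantic search ranks entries by the similarity of their vector representations (embeddings) rather than by shared words. In this sketch, hypothetical hand-made three-dimensional vectors stand in for the embeddings a real system would compute with a model, and the labels are toy examples rather than Iconclass records; nothing here describes the resource’s actual search backend. A query such as “pachyderm” finds nothing by keyword, but its (assumed) vector still ranks “elephant” first by cosine similarity.

```python
import math

# ASSUMPTION: toy labels with hand-made vectors standing in for learned
# embeddings; a real semantic search would compute these with a model.
TOY_VECTORS: dict[str, list[float]] = {
    "elephant": [0.9, 0.1, 0.0],
    "lion": [0.2, 0.9, 0.1],
    "ship": [0.0, 0.1, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def keyword_search(query: str, labels: list[str]) -> list[str]:
    """Return the labels that literally contain the query string."""
    return [label for label in labels if query.lower() in label.lower()]


def semantic_search(query_vec: list[float],
                    vectors: dict[str, list[float]],
                    top_k: int = 2) -> list[str]:
    """Rank labels by cosine similarity to the query vector, best first."""
    ranked = sorted(vectors,
                    key=lambda label: cosine(query_vec, vectors[label]),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical embedding of the query "pachyderm", close to "elephant".
    query_vec = [0.8, 0.2, 0.0]
    print(keyword_search("pachyderm", list(TOY_VECTORS)))  # []
    print(semantic_search(query_vec, TOY_VECTORS))         # ['elephant', 'lion']
```

Because results still come only from the supplied entries, semantic search widens what a query can match without reopening the door to invented notations.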
With respect to Hans’ second (or rather, in my summary, third) point, about incorrect interpretations, these are, of course, due in large part to LLMs being unable to interpret the details of images correctly or to transcribe historical texts accurately. But (and here is where I disagree a little with his approach, though not with his conclusion) they are also partly due to how the LLM was prompted in the ‘Elephants’ essay. LLMs live and die by the context they are given, and if very little is given (e.g. “interpret this” or “describe that”), they will try to guess what exactly was meant by ‘interpret’ or ‘describe’ and consequently often guess wrong. But if you provide an LLM with more guidance and context, in the form of a more detailed and well-structured prompt, it is likely to do much better. This, in effect, is what lies behind the resource’s ‘research skills’ feature, which is nothing more than a detailed set of instructions and guidance, in natural language (i.e. more context), to help the LLM better address Iconclass queries. Now, the ‘skill’ file I have created is very much geared towards teaching the LLM how to query rijksmuseum-iconclass-mcp. It says little or nothing about how to conduct iconographic research, what constitutes a ‘good’ interpretation, which matters to focus on or avoid, and so on. Nor can it, in its present form, because I’m really not an expert in this area. But it could, with the help of collaborators, and I suspect (based on my experience of working with LLMs) that once given this added context, an LLM could also offer far better interpretations and cataloguing advice. But this remains a hypothesis to be tested!
Both resources are still under development and are, in part, only technology demonstrators, but of course I also want them to serve a practical function and provide real value to users. For this reason, I’d be very grateful for any and all feedback and criticism from members of this forum.