
READ THIS PDF FIRST

A word is worth a thousand pictures

Why the use of Iconclass will make Artificial Intelligence smarter. An analysis of Iconclass and AI by Hans Brandhorst.

Iconclass AI Test Set

A test dataset and challenge to apply machine learning to collections described with the Iconclass classification system.

"The quality of your achieved results always depends on the quality (and quantity) of your training data" - Prof. Dr. Harald Sack, Karlsruher Institut für Technologie, Do Neural Networks Dream of Semantics? (see presentation)

To facilitate the creation of better models in the cultural heritage domain, and to promote research on tools and techniques using Iconclass, we are making this dataset freely available. All that we ask is that any use is acknowledged and that results are shared so that we can all benefit. The content is sampled from the Arkyves database. Please consider requesting a subscription from your institutional librarian. If you are interested in having your cultural heritage collection added to Arkyves, get in touch.

Please contact us with comments or suggestions, or if you do something interesting with this dataset, so that we can share it with the community.

21 February 2020

Feedback at posthumus@brill.com or connect on @epoz
Suggested citation: Etienne Posthumus, "Brill Iconclass AI Test Set", Feb 2020, https://labs.brill.com/ictestset/

Data Description

A dataset of 87749 images with assigned Iconclass notations to be used in training. The dataset is available as a 3.1GB zip file (MD5: 779ba2ca9e977c58d818e3823a676973). All image files are in a single directory, so unzip the file into a dedicated directory rather than straight into your working directory.
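One way to handle the download is to check the MD5 value quoted above and then extract the archive into a directory of its own. The following is a minimal sketch using only the Python standard library; the archive filename and the target directory name are hypothetical, as are the helper names md5_of_file and extract_into.

```python
import hashlib
import zipfile
from pathlib import Path

EXPECTED_MD5 = "779ba2ca9e977c58d818e3823a676973"  # checksum quoted in the text


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_into(zip_path, target_dir):
    """Extract every archive member into target_dir and return it as a Path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


if __name__ == "__main__":
    archive_path = Path("iconclass_testset.zip")  # hypothetical filename
    if archive_path.exists():
        if md5_of_file(archive_path) != EXPECTED_MD5:
            raise SystemExit("MD5 mismatch: the download may be incomplete")
        extract_into(archive_path, "iconclass_testset")  # hypothetical directory
```

Reading the file in chunks keeps memory use flat, which matters for an archive of this size.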

The data.json file is a map of filenames to Iconclass notations. Here is what the first few entries look like:

{
  "IIHIM_1956438510.jpg": [
    "31A235",
    "31A24(+1)",
    "61B(+54)",
    "31A2212(+1)",
    "31D14"
  ],
  "IIHIM_-859728949.jpg": [
    "41D92",
    "25G41"
  ],
  "IIHIM_1207680098.jpg": [
    "11H",
    "11I35",
    "11I36"
  ],
  "IIHIM_-743518586.jpg": [
    "11F25",
    "11FF25",
    "41E2"
  ]
}
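Since each image can carry several notations, this is a multi-label dataset. The sketch below shows one possible way to load the mapping, count how often each notation occurs, and turn the labels into 0/1 vectors for training. The function names are illustrative, and the sample dict simply repeats the entries quoted above so that the code runs without the full file.

```python
import json
from collections import Counter


def load_labels(path):
    """Read data.json: a dict mapping image filename -> list of notations."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def notation_counts(labels):
    """Count how many images carry each notation."""
    counts = Counter()
    for notations in labels.values():
        counts.update(set(notations))  # set(): count each image once per notation
    return counts


def multi_hot(labels, vocabulary):
    """Encode each image as a 0/1 list aligned with the vocabulary order."""
    index = {notation: i for i, notation in enumerate(vocabulary)}
    encoded = {}
    for filename, notations in labels.items():
        row = [0] * len(vocabulary)
        for notation in notations:
            if notation in index:  # notations outside the vocabulary are skipped
                row[index[notation]] = 1
        encoded[filename] = row
    return encoded


# The first entries quoted above, used as a stand-in for the full file.
sample = {
    "IIHIM_1956438510.jpg": ["31A235", "31A24(+1)", "61B(+54)", "31A2212(+1)", "31D14"],
    "IIHIM_-859728949.jpg": ["41D92", "25G41"],
    "IIHIM_1207680098.jpg": ["11H", "11I35", "11I36"],
    "IIHIM_-743518586.jpg": ["11F25", "11FF25", "41E2"],
}

counts = notation_counts(sample)
vocabulary = sorted(counts)
vectors = multi_hot(sample, vocabulary)
print(len(vocabulary), "distinct notations in the sample")
```

For the real dataset, replace sample with load_labels("data.json"). With tens of thousands of images the vocabulary is likely to be large and sparse, so you may want to keep only notations above some frequency threshold.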

Image files

The image files in this dataset have a maximum size of 500 pixels on the longest side, and have been sampled randomly from the Arkyves database. If you discover images that are erroneous or that should be removed from the sample for any reason, please let us know so that action can be taken as soon as possible.

Rights

All images used in this display have either been digitized by Arkyves or been supplied by partner institutions under an open license. It has been suggested that more information would be useful for interested parties wishing to re-use these images. Great idea! We will compile an additional data file with more information as soon as possible.

Using Iconclass from Python

To perform analysis and extract further meaning from the assigned classifications you can use the Iconclass Python package. It can be installed with:

pip install iconclass

Then in a Python interpreter you can use the package like this:

>>> import iconclass
>>> iconclass.get('0')

NOTE: The first time you do an iconclass.get(...) call, the complete Iconclass system will be downloaded as a sqlite file to the current working directory. This is a circa 50MB file, and it also contains a wealth of useful data for analysis purposes.

To get structural information on the classification codes for an image, you can for example say:

>>> iconclass.get('25G41')

{'c': ['25G41(...)',
       '25G411',
       '25G412',
       '25GG41',
       '25G41(+0)',
       '25G41(+1)',
       '25G41(+2)',
       '25G41(+3)'],
 'kw': {'de': ['Blume'],
        'en': ['flower'],
        'es': ['flor'],
        'fi': ['kukka'],
        'fr': ['fleur'],
        'it': ['fiore'],
        'nl': ['bloem'],
        'pt': ['flor']},
 'n': '25G41',
 'p': ['2', '25', '25G', '25G4', '25G41'],
 'txt': {'de': 'Blumen',
         'en': 'flowers',
         'fi': 'kukat',
         'fr': 'fleurs',
         'it': 'fiori',
         'nl': 'Bloemen (met NAAM)',
         'pt': 'flores',
         'zh': '花卉'}}

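The keys in the returned dict are:

n: the notation itself
p: the path from the top of the hierarchy down to this notation
c: the child notations
txt: the textual description, keyed by language code
kw: the keywords, keyed by language code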

You can also retrieve several notations at once by passing a list:

>>> iconclass.get_list(["11F25", "41E2"])
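Once you have records in the shape shown above for '25G41', a few small helpers make them easier to analyse. The sketch below works on plain dicts, so it runs without the iconclass package. The helper names are illustrative, and the record is a trimmed copy of the example output.

```python
def english_text(record):
    """Return the English description of a record, or None if absent."""
    return record.get("txt", {}).get("en")


def ancestors(record):
    """Return the notations above this one, from the top of the hierarchy down."""
    path = record.get("p", [])
    return path[:-1]  # in the example the last element of 'p' is the notation itself


def group_by_top_level(records):
    """Group notations by the first element of their 'p' path."""
    groups = {}
    for record in records:
        path = record.get("p") or [record["n"]]
        groups.setdefault(path[0], []).append(record["n"])
    return groups


# Trimmed copy of the record shown above for '25G41'.
flowers = {
    "n": "25G41",
    "p": ["2", "25", "25G", "25G4", "25G41"],
    "txt": {"en": "flowers", "de": "Blumen"},
}

print(english_text(flowers))  # flowers
print(ancestors(flowers))     # ['2', '25', '25G', '25G4']
```

Assuming iconclass.get_list returns a list of such dicts, group_by_top_level could be applied to its result to see which main divisions of Iconclass the notations of an image fall under.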

CC0
To the extent possible under law, Brill has waived all copyright and related or neighboring rights to The Iconclass classification codes in Iconclass AI Test Set. This work is published from: Netherlands.


Big thanks to Ian Gilman for his most amazing work on OpenSeadragon which has made a thousand flowers bloom.