
MLLM Art Appreciation Evaluation Results and Correct Response Terms Appendix

Dataset posted on 2024-03-14, 11:20, authored by Tace McNamara

Multi-modal large language models (MLLMs) are primarily evaluated on objective measures such as reasoning, common sense, and pattern recognition. However, there is a notable lack of testing involving open-ended responses that require human evaluation. In response, this paper presents a comparative analysis of the capacities of GPT-4V, Gemini Pro, Gemini Ultra, and mPLUG-Owl2 in visual art appreciation, a domain requiring complex competencies demonstrative of higher-order cognitive fluency, and thus a ripe area for evaluating human-like intelligence.

A framework for machine art appreciation was developed, using an established model of human aesthetic experience as its foundation. Seven questions were designed to assess each stage of this framework, which outlines the nuanced capacities by which MLLMs can appreciate a visual art image. MLLMs were assessed on their long-form responses to this question set for ten distinct art images spanning a range of styles and mediums.


