Beyond FID: Human Perceptual Judgments Reveal Systematic Blind Spots in GAN Face Evaluation

Abstract
Generative Adversarial Networks (GANs) can synthesize highly realistic facial images from random noise vectors. The Fréchet Inception Distance (FID) is widely used as a standard metric for automatically evaluating the quality of GAN-generated images. However, it remains unclear to what extent this statistical measure reflects human perceptual judgments, which ultimately define image realism in practical applications. To address this, we conducted a psychophysical study in which participants (n = 20) performed a two-alternative forced-choice task, classifying actual photographs and GAN-generated images as real or fake. We show that while FID provides a reliable global ordering of image quality, it systematically fails for localized semantic artifacts (e.g., eyewear and skin texture) that disproportionately affect human realism judgments. These results indicate that FID and human perception are not merely noisy versions of the same signal: FID has systematic blind spots precisely where human observers are most sensitive.

        
@inproceedings{10.2312:egs.20261007,
  booktitle = {Eurographics 2026 - Short Papers},
  editor    = {Musialski, Przemyslaw and Lim, Isaak},
  title     = {{Beyond FID: Human Perceptual Judgments Reveal Systematic Blind Spots in GAN Face Evaluation}},
  author    = {Nierula, Birgit and Melnik, Anna and Stephani, Tilman and Bosse, Sebastian and Barthel, Florian and Brama, Aileen and Hilsmann, Anna and Eisert, Peter and Nikulin, Vadim V. and Gaebler, Michael and Klotzsche, Felix and Chen, Yonghao},
  year      = {2026},
  publisher = {The Eurographics Association},
  ISSN      = {2309-5059},
  ISBN      = {978-3-03868-299-8},
  DOI       = {10.2312/egs.20261007}
}