Researchers from the University of California, Los Angeles report that deep convolutional networks do not classify objects by global shape
A team of researchers from the University of California, Los Angeles (UCLA) conducted five experiments on deep convolutional networks (DCNNs) and found that the networks are easy to fool and that their method of identifying objects differs substantially from human vision. In the first experiment, the team showed VGG-19, one of the best-performing deep learning networks, color images of animals and objects whose surfaces had been altered. For instance, the surface of a golf ball was displayed on a teapot. The network ranked its top choices and selected the correct item as its first choice for only five of the 40 objects.
Moreover, VGG-19 estimated only a 0.41% chance that the teapot was a teapot; its first choice for the teapot was a golf ball. Lead author Nicholas Baker, a UCLA psychology graduate student, said this shows that the network attends to an object's texture more than to its shape. According to Philip Kellman, a UCLA distinguished professor of psychology and a senior author of the study, humans identify objects primarily from their shape, whereas the computer networks appear to use a different method. In the second experiment, the team showed images of glass figurines to VGG-19 and AlexNet, both of which were trained to recognize objects using the ImageNet image database. Although VGG-19 performed better than AlexNet, both networks performed poorly at identifying the glass figurines.
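Rankings like "first choice" and confidence figures like the 0.41% teapot probability are typically read off a softmax over the network's raw output scores. The sketch below illustrates that mechanics with made-up logits and labels (the values are hypothetical, not the study's actual outputs):

```python
import numpy as np

def top_k_predictions(logits, labels, k=5):
    """Convert raw network outputs (logits) into a ranked top-k list
    of (label, probability) pairs using a numerically stable softmax."""
    exps = np.exp(logits - np.max(logits))  # subtract max to avoid overflow
    probs = exps / exps.sum()
    order = np.argsort(probs)[::-1][:k]    # indices of the k largest probs
    return [(labels[i], float(probs[i])) for i in order]

# Hypothetical scores for illustration only.
labels = ["golf ball", "teapot", "bubble", "ladle", "coffeepot", "vase"]
logits = np.array([4.1, -1.4, 2.0, 0.3, 1.2, -0.5])

for name, p in top_k_predictions(logits, labels):
    print(f"{name}: {p:.2%}")
```

With these invented scores, "golf ball" dominates the ranking even though a human would see the object's shape as a teapot, mirroring the pattern the study describes.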
In the third experiment, the team showed both networks 40 drawings outlined in black with interiors in white. The networks again failed to identify items such as a butterfly, an airplane, and a banana. The fourth experiment used 40 images in solid black; both networks produced the correct object label among their top five choices for around 50% of the objects. In the final experiment, the team scrambled images to increase the difficulty while preserving local pieces of the objects. Six images that VGG-19 had previously recognized were scrambled. The researchers found these images hard to recognize themselves, whereas VGG-19 identified five of the six and was close on the sixth. The research was published in the journal PLOS Computational Biology on December 7, 2018.
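Scrambling of the kind described, which destroys global shape while preserving local pieces, can be sketched as cutting an image into tiles and shuffling them. This is a minimal illustration of the idea, not the authors' exact procedure:

```python
import numpy as np

def scramble_tiles(img, tile=8, seed=0):
    """Cut a 2-D image into tile x tile blocks and shuffle the blocks.
    Global shape is destroyed, but each local piece stays intact."""
    h, w = img.shape[:2]
    assert h % tile == 0 and w % tile == 0, "dimensions must divide evenly"
    rows, cols = h // tile, w // tile
    blocks = [img[r*tile:(r+1)*tile, c*tile:(c+1)*tile]
              for r in range(rows) for c in range(cols)]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(blocks))        # random reordering of tiles
    out = np.empty_like(img)
    for i, j in enumerate(perm):
        r, c = divmod(i, cols)
        out[r*tile:(r+1)*tile, c*tile:(c+1)*tile] = blocks[j]
    return out

# Example: scramble a 32x32 grayscale gradient into 8x8 tiles.
img = np.arange(32 * 32, dtype=np.uint8).reshape(32, 32)
scrambled = scramble_tiles(img, tile=8)
```

Because every pixel survives the shuffle, texture statistics are largely preserved, which is consistent with a texture-reliant network still recognizing the scrambled images while humans, who rely on shape, cannot.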