Creative process


The process consisted of using a model trained on the ImageNet dataset to perform an operation on each image. The operation outputs a set of numbers, the embedding vector, which is unique to every image and represents its relation to the ImageNet categories. For this representation to be displayed in a meaningful way, another operation must be performed on every embedding, namely a dimensionality reduction, since the resulting vector originally has 2048 dimensions. First, a direct reduction to the two-dimensional x and y locations of a Cartesian plane using Principal Component Analysis (PCA) was tried, with poor results (Tipping and Bishop 1999). A two-step approach was then followed: PCA was used to reduce the embeddings to one hundred dimensions, after which a further reduction to two variables was performed with the t-SNE method (t-distributed stochastic neighbor embedding), an extension of stochastic neighbor embedding (Hinton and Roweis 2002). After tweaking parameters to make full use of the horizontal space of the tableau, the k-means algorithm (Lloyd 1982) was applied to estimate how representative each image is of its cluster, a value that was then visualized as the image's size.
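A minimal sketch of this pipeline with scikit-learn follows; the file name, the perplexity value, the number of clusters, and the choice to run k-means on the 2-D coordinates rather than on the raw embeddings are all assumptions for illustration, not the exact parameters used in the project:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Hypothetical input: one 2048-dimensional embedding per image
embeddings = np.load("embeddings.npy")            # shape: (n_images, 2048)

# Step 1: PCA brings the 2048 dimensions down to one hundred
reduced = PCA(n_components=100).fit_transform(embeddings)

# Step 2: t-SNE maps the hundred dimensions to x/y positions on the plane
xy = TSNE(n_components=2, perplexity=30).fit_transform(reduced)

# k-means groups the points; images closer to their cluster centre are
# treated as more representative of it and therefore drawn larger
kmeans = KMeans(n_clusters=20, n_init=10).fit(xy)
dist = np.linalg.norm(xy - kmeans.cluster_centers_[kmeans.labels_], axis=1)
sizes = 1.0 / (1.0 + dist)                        # grows as the distance shrinks
```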

Other visualizations were explored, such as a fixed grid and the Voronoi diagram, which partitions the plane into regions, creating a visualization of the clusters of different categories (Voronoi 1908). Different models, such as VGG16 and VGG19, were also tried (these yield 4096-dimensional embeddings) (Simonyan and Zisserman 2015). It is worth mentioning that the OpenCLIP model, which will be reviewed when we address text-to-image platforms, should provide a more natural, continuous segmentation of subjects, less based on separate classes. But the ResNet50 model offered the possibility of using weights pre-trained on ImageNet (He et al. 2015). For its historical relevance and the opportunity to study and reveal its shortcomings, this was the chosen approach. The code is published on the author's GitHub repository and can be used by anyone willing to visually organize a large collection of pictures.
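For the first step, a minimal sketch of extracting those 2048-dimensional embeddings with an ImageNet-pretrained ResNet50 in Keras; the file names are placeholders, and this illustrates the general approach rather than reproducing the code in the author's repository:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# ImageNet-pretrained ResNet50 without its classification head; global
# average pooling yields one 2048-dimensional vector per image
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0)[0]         # shape: (2048,)

# Placeholder file names; in practice, iterate over the whole collection
embeddings = np.stack([embed(p) for p in ["img_0001.jpg", "img_0002.jpg"]])
```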

Different clustering methods

 

PCA with VGG16

 

PCA with VGG19

 

PCA with ResNet50

 

PCA + t-SNE with VGG16

 

PCA + t-SNE with VGG19

 

PCA + t-SNE with ResNet50 (final choice)

 

Different methods for displaying the clusters

 

Sparse with constant size

 

Fixed grid

 

Voronoi clusters

 

Sparse with size depending on distance to center of cluster (final choice)
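As an illustration of the two display strategies shown above, a minimal sketch with SciPy and Matplotlib, reusing `xy`, `sizes`, and `kmeans` from the reduction sketch earlier; the marker scaling and colours are arbitrary choices:

```python
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# Voronoi cells around the cluster centres partition the plane, giving
# each cluster of categories its own region (Voronoi 1908)
vor = Voronoi(kmeans.cluster_centers_)
fig, ax = plt.subplots(figsize=(12, 6))
voronoi_plot_2d(vor, ax=ax, show_vertices=False, line_colors="gray")

# Final choice: sparse scatter with marker size growing as an image
# gets closer to the centre of its cluster
ax.scatter(xy[:, 0], xy[:, 1], s=300 * sizes, c=kmeans.labels_, cmap="tab20")
plt.show()
```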

References for this page:

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” http://arxiv.org/abs/1512.03385 (18 May 2022).

Hinton, Geoffrey E., and Sam Roweis. 2002. “Stochastic Neighbor Embedding.” Advances in Neural Information Processing Systems 15.


Lloyd, S. 1982. “Least Squares Quantization in PCM.” IEEE Transactions on Information Theory 28(2): 129–37.

Simonyan, Karen, and Andrew Zisserman. 2015. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” http://arxiv.org/abs/1409.1556 (3 April 2023).

Tipping, Michael E., and Christopher M. Bishop. 1999. “Mixtures of Probabilistic Principal Component Analyzers.” Neural Computation 11(2): 443–82.

Voronoi, Georges. 1908. “Nouvelles Applications Des Paramètres Continus à La Théorie Des Formes Quadratiques. Premier Mémoire. Sur Quelques Propriétés Des Formes Quadratiques Positives Parfaites.” Journal für die reine und angewandte Mathematik (Crelles Journal) 1908(133): 97–102.