Abstract
In this work, a Generative Adversarial Network (GAN) is used to synthesize images from features extracted through audio analysis. Image generation relies on a pretrained BigGAN architecture: https://arxiv.org/pdf/1809.11096.pdf
The BigGAN model is pretrained on the ImageNet dataset at 128×128 resolution across 1000 classes. The implementation of BigGAN and the audio analysis builds on the Deep Music Visualizer repository on GitHub: https://github.com/msieg/deep-music-visualizer
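To make the setup concrete, the following is a minimal sketch of generating a single frame with a pretrained 128×128 BigGAN, using the pytorch-pretrained-biggan package that the Deep Music Visualizer builds on. The class index, truncation value, and file name are illustrative assumptions, not settings taken from this project.

```python
# Minimal sketch: render one image with a pretrained 128x128 BigGAN
# (pytorch-pretrained-biggan package). Parameters here are assumptions.
import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_int,
                                       truncated_noise_sample, save_as_images)

model = BigGAN.from_pretrained('biggan-deep-128')  # ImageNet, 128x128, 1000 classes
truncation = 0.5

# One-hot class vector (here: arbitrary ImageNet class 207) and a truncated noise vector
class_vector = torch.from_numpy(one_hot_from_int([207], batch_size=1))
noise_vector = torch.from_numpy(truncated_noise_sample(truncation=truncation, batch_size=1))

with torch.no_grad():
    output = model(noise_vector, class_vector, truncation)  # image tensor in [-1, 1]

save_as_images(output, file_name='frame')  # writes frame_0.png
```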
Pitch analysis is used to switch between classes, while the gradient of the music signal controls the noise vector that is fed to the GAN to generate each image; a sketch of this mapping follows below. I used my own piece “Void” as the audio input for this study.
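Below is a simplified sketch of how the audio could steer BigGAN, in the spirit of the Deep Music Visualizer: the chromagram (pitch content) weights the class vector, and the gradient of the signal's energy perturbs the noise vector frame by frame. The audio file name, the chosen ImageNet classes, and the scaling are illustrative assumptions, not the repository's exact code.

```python
# Sketch: map audio features to per-frame class and noise vectors for BigGAN.
# File name, class choices, and scaling below are assumptions for illustration.
import numpy as np
import librosa

y, sr = librosa.load('void.wav')  # hypothetical file name for the piece "Void"
hop = 512

chroma = librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=hop)  # (12, n_frames), pitch content
rms = librosa.feature.rms(y=y, hop_length=hop)[0]                # signal energy per frame
grad = np.clip(np.gradient(rms), 0, None)                        # positive change in energy
grad = grad / (grad.max() + 1e-8)

classes = np.random.choice(1000, size=12, replace=False)  # one ImageNet class per pitch class

n_frames = min(chroma.shape[1], len(grad))
class_vectors = np.zeros((n_frames, 1000), dtype=np.float32)
noise_vectors = np.zeros((n_frames, 128), dtype=np.float32)
noise = np.random.randn(128).astype(np.float32)

for t in range(n_frames):
    # Pitch analysis drives class switching: the stronger a pitch class,
    # the more its assigned ImageNet class contributes to the class vector.
    weights = chroma[:, t] / (chroma[:, t].sum() + 1e-8)
    class_vectors[t, classes] = weights
    # The signal gradient nudges the noise vector, so the image changes with the music.
    noise = noise + grad[t] * np.random.randn(128).astype(np.float32)
    noise_vectors[t] = noise

# Each (noise_vectors[t], class_vectors[t]) pair can then be fed to the BigGAN
# sketched above to render one video frame.
```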
Google Colab Link
https://colab.research.google.com/drive/1qqOWH_oxG13OzwNcQMkJSDh1C520qxV-?usp=sharing