From Music to Deep Image Synthesis

Abstract

In this work, a Generative Adversarial Network (GAN) is used for image synthesis based on input data gathered from sound analysis. For image generation, a pretrained BigGAN architecture is used: https://arxiv.org/pdf/1809.11096.pdf

The BigGAN used here was trained on the ImageNet dataset at 128×128 resolution over 1000 classes. For the BigGAN implementation and the sound analysis, the Deep Music Visualizer repository from GitHub is utilized, which can be found here: https://github.com/msieg/deep-music-visualizer
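
Below is a minimal sketch of how such a pretrained 128×128 BigGAN can be loaded and sampled. It assumes the pytorch-pretrained-biggan package (which Deep Music Visualizer builds on); the class name "trumpet", the truncation value, and the output file name are illustrative choices, not the exact settings of this project:

```python
import torch
from pytorch_pretrained_biggan import (BigGAN, truncated_noise_sample,
                                       one_hot_from_names, save_as_images)

# Load the pretrained 128x128 BigGAN (1000 ImageNet classes).
model = BigGAN.from_pretrained('biggan-deep-128')
model.eval()

truncation = 0.4  # lower truncation -> higher fidelity, less variety

# Sample a truncated noise vector and pick an illustrative class.
noise_vector = torch.from_numpy(
    truncated_noise_sample(batch_size=1, truncation=truncation))
class_vector = torch.from_numpy(one_hot_from_names(['trumpet'], batch_size=1))

# Generate one 3x128x128 image in [-1, 1] and save it as a PNG.
with torch.no_grad():
    output = model(noise_vector, class_vector, truncation)
save_as_images(output, file_name='frame')
```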

Pitch analysis is used for switching between classes. The gradient of the music signal is calculated and used to control the noise vector, which is fed to the GAN model to generate each frame. I used my piece “Void” as the audio input in this study.
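
A rough sketch of this mapping is given below, assuming librosa for the audio analysis. The specific features (a chromagram for the pitch analysis, the gradient of the RMS power for the noise updates), the hop length, and the step size are assumptions in the spirit of Deep Music Visualizer, not the exact code used here:

```python
import numpy as np
import librosa

# Load the audio track ("void.mp3" stands in for the actual file).
y, sr = librosa.load('void.mp3')
hop = 512  # illustrative hop length; one analysis frame per video frame

# Pitch analysis: a 12-bin chromagram per frame. Each pitch class can be
# mapped to one of 12 chosen ImageNet classes to build the class vector.
chroma = librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=hop)  # (12, T)
class_weights = chroma / (chroma.sum(axis=0, keepdims=True) + 1e-8)

# Gradient of the signal's power: sudden loudness changes yield large
# steps in the noise vector, so the imagery moves with the music.
power = librosa.feature.rms(y=y, hop_length=hop)[0]              # (T,)
power_grad = np.clip(np.gradient(power), 0.0, None)
power_grad /= power_grad.max() + 1e-8

rng = np.random.default_rng(0)
noise = rng.standard_normal(128).astype(np.float32)  # BigGAN z is 128-dim
for t in range(chroma.shape[1]):
    # Step the latent vector in proportion to the power gradient;
    # 0.5 is an illustrative step size.
    noise = noise + 0.5 * power_grad[t] * rng.standard_normal(128)
    # class_weights[:, t] -> 1000-dim class vector; noise -> BigGAN input.
```

Clipping the gradient to its rising edges keeps the latent vector nearly still during quiet passages and makes it jump on loud onsets, so the generated frames move with the music.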

Google Colab Link

https://colab.research.google.com/drive/1qqOWH_oxG13OzwNcQMkJSDh1C520qxV-?usp=sharing

Results

Some Small Trials