Using Machine Learning to Transmogrify Physical Input

In the vein of turning one thing into another, in this project I use Machine Learning and Gene Kogan's Doodle Classifier to transmogrify physical drawings into sound explorations. The project consists of a main piece in which doodles trigger diverse soundscapes (Sounds of New York City, Rain & Thunder, Crickets in the jungle, and Birdsong) and three accompanying experiments which further explore the concept and the Doodle Classifier tool, including a final experiment which reimagines the main soundscape piece using Regression as opposed to Classification. This project was inspired by Doodle Tunes by Andreas Refsgaard and Gene Kogan.



Creative Motivation

By studying and practicing tools used in Machine Learning, my hope is to demystify terminology and practices of machine learning, deep learning, and AI. Moreover, my goal is to use this newly acquired power for good(!) and to explore making my interactive projects and overall design practice more meaningful and engaging.

The benefits of Machine Learning in Creative Practice that I have gathered to date are:

1) A project can be uniquely personalized by creating an interaction that works best for the performer.
2) The interaction, especially that of audio-based projects, can be refined to enable precise, controlled output as opposed to using a sensor or input alone.
3) During production there is a strong feeling of collaboration between the designer/performer and the computer/software.
4) Once the piece is finished and is on display, the participant holds the power.

In his article for Digimag in the Summer of 2017, Danish Creative Technologist Andreas Refsgaard writes, “by enabling people to decide upon and train their own unique controls for a system, the creative power shifts from the designer of a system to the person interacting with it.”

This is somewhat reminiscent of early Dadaist art where, for the first time, the role of the artist as skilled creator was disrupted. When Marcel Duchamp presented his Readymades and elevated a mass-produced object as opposed to painting an original artwork, the distance between viewer and artist was closed. The participant was required to engage with the art to complete the work, and therefore the artist and the work itself were irrelevant without the contribution of a viewer. In Dadaism the spectator is empowered, and in digital art involving machine learning the person interacting with the work is empowered.

Beyond the shift of power, I wanted to explore the idea of collaboration between computer and designer. This can be seen in Cat Chorus which uses continuous OSC messages sent from the Doodle Classifier to MAX. I control the inputs and have a general sense of which sounds will be triggered, but whether I draw the samples well, whether the classes will be interpreted correctly, what messages are sent, the frequency of those messages, and how MAX will interpret those messages are all out of my control.

Cat Chorus

To a lesser degree this collaboration is explored in the main piece, Doodle Soundscapes. As opposed to Cat Chorus, the doodles were drawn in advance in order to be consistent with the training set. The classify button is pressed only when the parameters have been optimized and the paper is aligned well with the webcam. There is still a level of controlled chance because it is unclear what part of the soundscape will be played when triggered.

The inimitable science fiction writer Arthur C. Clarke wrote, “any sufficiently advanced technology is indistinguishable from magic,” and being able to write code, understand the tools used, and not be limited by a lack of ability allows one to be the magician: to craft the spectacle and experience. Inputs such as a Leap Motion or small imperceptible sensors allow an interaction to feel and look magical. An invisible interface also allows for less rigid physical interaction than traditional computer keyboard and mouse inputs. When touch-less, an interface has a greater propensity for discovery, which is ultimately more engaging for the user.

Kogan’s Doodle Classifier is powerful, but the input is limited to a static camera and the OSC messages, even when running continuously, are limited to a discrete class label output. The messages sent when using Regression are numeric and continuous, allowing for smooth manipulation of several outputs. To test an invisible interface, to free myself from the traditional computer inputs, and to compare both the experience and result of similar projects, one made using Classification and the other using Regression, I created a third experiment titled Leap Scapes.
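To make the difference between the two kinds of message concrete, here is a minimal sketch that builds both as raw OSC 1.0 packets using only the Python standard library. The address "/classification" and the label "rain" are hypothetical placeholders, not taken from the Doodle Classifier source. "/wek/outputs" is, as far as I know, Wekinator's default output address, and the four float values are made up for illustration.

```python
import struct


def osc_pad(raw: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires."""
    padded_len = (len(raw) // 4 + 1) * 4
    return raw.ljust(padded_len, b"\x00")


def osc_message(address: str, *args) -> bytes:
    """Encode an OSC message whose arguments are strings and/or floats."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, str):
            type_tags += "s"
            payload += osc_pad(arg.encode("ascii"))
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)  # big-endian 32-bit float
        else:
            raise TypeError("this sketch only handles str and float")
    return (osc_pad(address.encode("ascii"))
            + osc_pad(type_tags.encode("ascii"))
            + payload)


# Classification: one discrete label per message (hypothetical address/label).
class_msg = osc_message("/classification", "rain")

# Regression: several continuous values in one message (illustrative values).
regress_msg = osc_message("/wek/outputs", 0.8, 0.1, 0.0, 0.35)
```

The classification packet carries a single string that the receiver can only match against known labels, while the regression packet carries four numbers that can each drive a parameter directly. Either packet could be sent over UDP with the standard socket module.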


In Leap Scapes I use Wekinator to map specific Leap Motion hand positions and gestures to volume sliders in MAX. The Leap allows for fluid, touch-free input, and the trained samples and messages from Wekinator permit smooth and continuous transitions between four soundscapes. Although I programmed fades between the audio files in Doodle Soundscapes, the discrete classifications resulted in audio files being either on or off. In Leap Scapes, regression allows for any or all sounds to be playing with a volume from inaudible to highly audible.
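The on/off versus continuous distinction can be sketched in a few lines of Python. This is an illustration of the idea only, not the MAX patch used in the project: the soundscape names, the function names, and the 0 to 1 volume range are all assumptions.

```python
# Illustrative labels for the four soundscapes (assumed, not from the patch).
SCAPES = ["city", "rain", "crickets", "birdsong"]


def gains_from_class(label: str) -> list:
    """Classification: the chosen soundscape is fully on, the others are off."""
    return [1.0 if name == label else 0.0 for name in SCAPES]


def gains_from_regression(outputs: list) -> list:
    """Regression: each output sets one volume, clamped to the 0..1 range."""
    return [min(1.0, max(0.0, value)) for value in outputs]


print(gains_from_class("rain"))                       # [0.0, 1.0, 0.0, 0.0]
print(gains_from_regression([0.8, 0.1, -0.05, 1.2]))  # [0.8, 0.1, 0.0, 1.0]
```

With a class label only one slider can be up at a time, whereas regression outputs let any combination of the four sit anywhere between silent and full volume. The clamp is there because a regression model can overshoot slightly outside the range it was trained on.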

While this project was intended initially for myself as an exploration of machine learning, I could imagine a version of this idea being used in an engaging experiential installation or in an educational capacity in a museum setting. To test this, I made a fourth experiment titled Random Animals. An image of an animal is presented to a camera and animal sounds are played randomly upon classification. The goal is to align the animal with its corresponding sound. This would be an entertaining way of teaching children about animals while also engaging with the child.




Project Reflection

I paired Gene Kogan’s Doodle Classifier, both the binary app and the openFrameworks project file, with MAX 8 to create Doodle Soundscapes and Cat Chorus. I was initially unable to modify the openFrameworks project file to use an external camera, so I turned to the binary and used my webcam. Eventually, I found a solution and modified the openFrameworks file to set up an external PlayStation Eye camera. This allowed me to point a camera at a flat surface and make Cat Chorus.

When I began this project I had only a loose idea of my end goal and felt unsure of the concepts, procedures, and the tools available, but in spite of that I think this project is playful and successful. With more time and better equipment the project could be modified to suit a number of implementations.