Research, Prototyping, Testing, Writing
2018
Solo project
3 weeks
Advisory: Prof. Palle Dahlstedt (Interaction Design, Department of Computer Science and Engineering)
Through my research, I arrived at what I believe is a new augmentation of sound, which I call "Sonic Augmented Reality." Sonic Augmented Reality is the technique of combining "sonic interactions" (sounds that convey information and meaning through their interactive context) from our technology-driven environment with computer-generated sound information, layered as a second layer on top of the music we are listening to.
In this project, I examine the impact of the increasing use of headphones and earbuds, which isolate their wearers from the urban sonic environment. I investigate whether "Machine Learning" (ML) with "Urban Sound Classification" is a useful tool to reduce accidents involving moving urban objects such as cars, trams, and bicycles, while preserving the enjoyment of listening to music.
I began my research by building an understanding of what sound is and how it works, covering topics such as:
Second, I researched projects in the field of hearing that are relevant to the problem I want to solve, looking at existing devices and art projects such as:
Since I was not sure what my solution to the problem could look like, I ran four different experiments that might lead to a concept I could develop further. Detailed information about each experiment is included in the research paper.
Experiment FOUR consists of a video prototype watched by a participant who spoke their thoughts aloud (Think Aloud); a semi-structured interview followed the viewing. In the video prototype, a person walks through New York City while Warning Sounds are generated based on the current environmental sound situation. Notifications are delivered via computer-generated voice and a variety of notification sounds, and the user should feel cared for. Along with Urban Sound Classification, the system uses different sources of 'Geographic Information Services' to provide more context for the classified sounds. For example, if you are in a high-crime area, the system can warn you that a high level of pickpocket crime is present.
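The notification logic described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the class labels, area tags, warning phrasings, and confidence threshold are my assumptions, not part of the project's implementation.

```python
# Hypothetical sketch: combining an urban-sound classifier's output with
# geographic context to produce the text of a spoken warning.
# All labels, messages, and the threshold below are illustrative assumptions.

SOUND_WARNINGS = {
    "tram_bell": "Caution: a tram is approaching.",
    "car_horn": "Caution: a car horn nearby.",
    "siren": "Caution: an emergency vehicle is nearby.",
}

AREA_CONTEXT = {
    "high_crime": "You are in an area with a high level of pickpocket crime.",
}

def build_notification(sound_label, confidence, area_tags, threshold=0.8):
    """Return the warning text, or None when there is nothing to announce."""
    messages = []
    # Only announce a classified sound when the classifier is confident enough.
    if confidence >= threshold and sound_label in SOUND_WARNINGS:
        messages.append(SOUND_WARNINGS[sound_label])
    # Add geographic context (e.g. crime statistics for the current area).
    messages.extend(AREA_CONTEXT[t] for t in area_tags if t in AREA_CONTEXT)
    return " ".join(messages) or None

print(build_notification("tram_bell", 0.93, ["high_crime"]))
```

The resulting string would then be handed to a text-to-speech engine and mixed over the user's music.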
Since the fourth experiment was promising, I attempted to build a high-fidelity prototype to enable better test results in the field, which meant overcoming several limitations. Despite my best efforts, I could not make the prototype work. In my first machine learning environment, I tried to train a model to recognize a tram bell. Training took a considerable amount of time due to the slow processing power of my computer, and the resulting model was not accurate enough to recognize the same sound through my headphones. This can also be attributed to the poor quality of my headphone microphone: it produced too much background noise, so I did some research into getting it to work. Background noise from the microphone, combined with the noise of an urban environment (white noise, traffic, people, etc.), causes a 'Demixing Problem': "Demixing is the problem of identifying multiple structured signals from an overlaid, undersampled, and noisy observation". To solve this problem, I would have to apply preprocessing filter algorithms. Capturing sounds such as a tram bell in an urban environment also takes a lot of time and patience, so instead of standing outside for hours waiting for a tram to ring its bell, I used YouTube and the YouTube-8M dataset to gather training sounds.
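One common family of preprocessing filters for roughly stationary background noise (such as a constant microphone hiss) is spectral subtraction: estimate the noise's magnitude spectrum from a noise-only segment and subtract it from each frame of the signal. The sketch below is a minimal NumPy-only illustration under that assumption; it is not the filter used in the project, and real systems would add overlapping windows and smoothing.

```python
import numpy as np

def spectral_subtraction(signal, noise_sample, frame=512):
    """Frame-wise spectral subtraction: estimate the noise magnitude
    spectrum from a noise-only sample and subtract it from each frame."""
    # Average magnitude spectrum of the noise-only recording.
    usable = len(noise_sample) // frame * frame
    noise_frames = noise_sample[:usable].reshape(-1, frame)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    out = np.zeros(len(signal) // frame * frame)
    for i in range(0, len(out), frame):
        spec = np.fft.rfft(signal[i:i + frame])
        # Subtract the noise estimate, flooring magnitudes at zero.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        phase = np.exp(1j * np.angle(spec))  # keep the original phase
        out[i:i + frame] = np.fft.irfft(mag * phase, n=frame)
    return out

# Demo: a 440 Hz tone (a stand-in for a tram bell) buried in white noise.
rng = np.random.default_rng(0)
t = np.arange(8192) / 16000.0
noise_only = 0.5 * rng.standard_normal(16384)
noisy = np.sin(2 * np.pi * 440 * t) + 0.5 * rng.standard_normal(8192)
clean = spectral_subtraction(noisy, noise_only)
```

Because the subtraction can only shrink frame magnitudes, the filtered signal's energy is strictly reduced, ideally leaving mostly the tonal component for the classifier.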
I created a video prototype showing the learnings from my fourth experiment and from my further research on the high-fidelity prototype that failed. This last test resulted in a list of learnings for the research paper. Below is a prototype of the final product.
The following experience prototype illustrates various scenarios in which Sonic Augmented Reality might be used.