SIGITE 22 Presentation - Availability of Voice Deepfake Technology and its Impact for Good and Evil

Presented at SIGITE ’22: Proceedings of the 23rd Annual Conference on Information Technology Education, September 2022, Pages 23–28. https://doi.org/10.1145/3537674.3554742

Naroa Amezaga did some wonderful research and made an excellent presentation at the SIGITE ’22 conference.

Abstract

Artificial Intelligence, and especially Machine Learning and Deep Learning techniques, are increasingly populating today’s technological and social landscape. These advancements have contributed overwhelmingly to the development of Speech Synthesis, also known as Text-To-Speech, where speech is artificially produced from text by means of computer technology. Currently, however, these systems share a fundamental drawback: unnatural, robotic, and impersonal synthesized voices.

So, what happens when the robotic computer voice no longer sounds like a computer, but sounds like you? That’s where Voice Cloning technology comes into play, which allows one to generate artificial speech that resembles a targeted human voice. This new practice offers many benefits, but with its development, the generation of fake voices and videos, known as “deepfakes”, has risen, causing a loss of trust in technology and greater fear of it.

Accordingly, the objective of this paper is to analyze the availability of voice deepfake technologies, their ease of construction, and their impact for good and evil. We chose to focus on the educational field by implementing a “deepfake professor” via a survey of readily available voice deepfake technologies. The goal is then to demonstrate the potential capabilities for good and for evil that need to be considered with this technology, so we also analyze its misuse, its current regulation, and its future.

The results of the case study show that it is possible to clone someone’s voice with a standard laptop, with no need for high-performance computing resources and based on just a few seconds of reference audio. This creates a superior user experience, but at the same time reveals how easily anyone can gain access to voice cloning. This underscores the importance of the new challenges opened up by this powerful technology and the need for safeguards and regulation that future generations will have to deal with. There is no doubt that future research is needed to understand the dynamics and impact of voice cloning and to reach more solid conclusions.