Over the past few months/years we have been reading a lot about AI being used to identify emotions like fear and confusion, and even traits like dishonesty or trustworthiness, by analyzing video and audio recordings of a person. This is driving innovation in recruiting, criminal investigations, etc. In fact, the global emotion detection and recognition market is estimated to witness a compound annual growth rate of 32.7% between 2018 and 2023, driving the market to reach USD 24.74 billion by 2020. So a lot of companies are focusing their efforts in this space, as AI applications that are emotionally aware give a more realistic experience for users. However, there are multiple privacy implications of having a system detect a person’s emotional state while they interact with an online system.
So to counter this trend of systems becoming more and more aware, a group of researchers has come up with an AI-based countermeasure to mask emotion in spoken words, kicking off an arms race between the two factions. The idea is to automatically convert emotional speech into “normal” speech using AI.
Their method for masking emotion involves collecting speech, analyzing it, and extracting emotional features from the raw signal. Next, an AI program trains on this signal and replaces the emotional indicators in the speech, flattening them. Finally, a voice synthesizer regenerates the normalized speech from the AI's outputs, and that is what gets sent to the cloud. The researchers say that this method reduced emotional identification by 96 percent in an experiment, although speech recognition accuracy suffered, with a word error rate of 35 percent.
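To get a rough feel for what "flattening" could mean, here is a toy sketch in Python. It is not the researchers' method (they use a trained model and a voice synthesizer); it just assumes we already have a per-frame pitch contour and pulls every value toward the average, which removes most of the swings that tend to give emotion away. The function name `flatten_contour`, the `strength` parameter and the sample pitch values are all made up for illustration.

```python
from statistics import mean, pstdev


def flatten_contour(values, strength=1.0):
    """Pull each value toward the contour mean.

    strength=0.0 leaves the contour untouched, strength=1.0 makes it
    completely flat. This is an illustrative stand-in, not the learned
    transformation described in the paper.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    centre = mean(values)
    return [v + strength * (centre - v) for v in values]


# Hypothetical per-frame pitch values (Hz) for an agitated utterance.
pitch_hz = [180.0, 240.0, 150.0, 260.0, 170.0]

# Keep 10% of the original variation so the voice is not fully robotic.
flattened = flatten_contour(pitch_hz, strength=0.9)

print("spread before:", round(pstdev(pitch_hz), 2))
print("spread after: ", round(pstdev(flattened), 2))
```

The average pitch stays the same while the spread around it shrinks, so a classifier that relies on pitch variation has much less to work with. A real system would have to do this for many features at once and then resynthesize audio, which is presumably where the hit to speech recognition accuracy comes from.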
In a way it's quite cool because it removes a potential privacy issue, but if you extrapolate from existing research, we could be in for bigger headaches in the future. We can currently remove emotion from an audio recording, so how difficult would it be to add emotion to one? Not too difficult, if you go through the ongoing research. That would give us a system that can take an audio/video recording and change the emotion from sadness to mocking, or from happy to sad. Combined with the deepfake apps that are already on the market, this will cause huge headaches for the public, as it would be really hard for us to determine whether a given audio/video recording is authentic or altered.
Article: Researchers Created AI That Hides Your Emotions From Other AI
Paper: Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants
Well, this is all for now. Will write more later.
– Suramya