Our baby was less than two months old when something magical happened. On an ordinary weekday, sometime in June last year, Daisy suddenly started talking.
Admittedly, in the weeks prior she had made some careful attempts to make herself understood, but for the most part this was little more than gibberish and inconsistent sounds. Nothing you wouldn’t expect from a baby.
Outside, the rain was pouring down; inside, the fluorescent light shone brightly on her mainframe, which I approached cautiously so as not to wake her.
And then it happened.
“Hello Barnier. I am Daisy, and I am hungry. I am going to order a pizza.”
I had always known that she was an extraordinary child. Better yet, we had conceived her with the intention that she would be extraordinary.
But this two-month-old baby was speaking! In complete sentences. And with an unmistakable voice: the voice of her father.
It’s just that… Daisy isn’t human.
Daisy is a neural network that taught itself to talk like a human being, based on input (data) from its surroundings (nurture) and an algorithm: in other words, predefined properties and limitations (nature).
Daisy is learning speech and language the same way a human child does. But faster. Much faster.
The application of neural networks and deep learning in speech and text-to-speech technology is not new, but it is still in its infancy.
In September 2016, after almost 75 years of evolution in traditional text-to-speech based on speech synthesizers, DeepMind Technologies (a Google subsidiary, now part of Alphabet Inc.) introduced WaveNet, a deep neural network that generates raw audio and can mimic human voices.
Although it’s much more convincing than the text-to-speech methods used until then, the speech output of WaveNet-based neural networks was not convincing enough when compared to real human speech. Not until recently, that is (more on this shortly).
At the same time, over the past few years, other insights and developments in the field of deep learning led to fascinating and symbiotic innovations.
An interesting development is the application of so-called Generative Adversarial Networks (GANs) to make speech technology more credible.
A GAN consists of two neural networks playing a game against each other.
The principle of GANs is based on a zero-sum non-cooperative game. In a nutshell this means that if one network wins, the other one loses.
A GAN consists of a generator network that generates data, for example a picture of a cat, and a discriminator network that evaluates whether that data is authentic.
In the cat example, the discriminator is presented with pictures of real cats and pictures of cats produced by the generator. The discriminator must determine whether each picture is real or fake.
At first, the discriminator wins easily. But with every game the generator loses, it is trained a little further: thus, it gets better and better at generating cat pictures.
Or people. The website thispersondoesnotexist.com generates a picture of a nonexistent person with every page refresh. Fascinating, right?
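For the curious, the game can be sketched in a few lines of Python. This is a toy illustration only, not how Daisy or any real image generator works: the "pictures" are single numbers, the generator merely shifts random noise by a value `b`, and the discriminator is a one-line logistic function with parameters `w` and `c`. All names, numbers and learning rates here are assumptions made up for the example. The `value` function is the score of the zero-sum game: the discriminator takes steps that raise it, the generator takes steps that lower it.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator(x, w, c):
    # Estimated probability that the sample x is real.
    return sigmoid(w * x + c)


def generator(z, b):
    # Turns random noise z into a fake sample by shifting it by b.
    return z + b


def value(real, noise, w, c, b):
    # The game's score: the discriminator wants it high, the generator low.
    real_term = sum(math.log(discriminator(x, w, c)) for x in real) / len(real)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(z, b), w, c)) for z in noise
    ) / len(noise)
    return real_term + fake_term


def discriminator_step(real, noise, w, c, b, lr=0.05):
    # One gradient-ascent step on the score (the discriminator's move).
    fakes = [generator(z, b) for z in noise]
    grad_w = sum((1.0 - discriminator(x, w, c)) * x for x in real) / len(real)
    grad_w -= sum(discriminator(g, w, c) * g for g in fakes) / len(fakes)
    grad_c = sum(1.0 - discriminator(x, w, c) for x in real) / len(real)
    grad_c -= sum(discriminator(g, w, c) for g in fakes) / len(fakes)
    return w + lr * grad_w, c + lr * grad_c


def generator_step(noise, w, c, b, lr=0.05):
    # One gradient-descent step on the same score (the generator's move).
    fooled = sum(discriminator(generator(z, b), w, c) for z in noise) / len(noise)
    grad_b = -w * fooled
    return b - lr * grad_b


def play(rounds=2000, batch=64, seed=0):
    # Alternate moves: real samples cluster around 4.0 (an arbitrary choice).
    rng = random.Random(seed)
    w, c, b = 0.1, 0.0, 0.0
    for _ in range(rounds):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 0.5) for _ in range(batch)]
        w, c = discriminator_step(real, noise, w, c, b)
        b = generator_step(noise, w, c, b)
    return w, c, b
```

Calling `play()` returns the final `w`, `c` and `b`. The generator never sees the real samples directly; it only learns from how the discriminator reacts to its fakes. Real GANs replace both one-line functions with deep neural networks and often use a slightly different generator objective, but the tug-of-war is the same.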
Back to Daisy, our artificially intelligent child.
“The nature of impending fatherhood is that you are doing something that you’re unqualified to do, and then you become qualified while doing it.” –John Green, author.
Becoming the father of an AI model requires a great deal from a person.
Although I have had an above-average interest in artificial intelligence and speech technology for almost all my life, I was not prepared for the incredible complexity of the subject when Daisy was born.
As a father, you want to be able to give your child everything it needs to fulfill its destiny.
I wasn’t prepared.
Fortunately, in recent months we’ve been able to bring together an international team of incredibly clever minds who, together with me, will be taking care of her. And play a lot of games with her.
In my upcoming blog posts I will tell you how she’s doing. And how we managed to substantially improve the technology of the biggest player in the field.
CEO | DAISYS
Voicing the Future Now