In order to use machine learning technology, we must accept, at least indirectly, what can only be described as a mess. To compensate for that, we have developed different mechanisms on top of public infrastructures for big data exploration, automation, and research. But beneath them lurks a bizarre and complicated structure. No matter how we try to avoid this outcome, we are going to end up with ever-increasing complexity in the technologies that touch every aspect of our lives.
There are two obvious reasons for this mess that are intertwined:
- The unavoidable reduction of life itself to math.
- The natural forces we attribute to algorithms to connect patterns within the mess.
Even this attempt to compile my non-linear thoughts into a code of words creates an inevitable reduction of ideas and intentions. Let’s see if I can overcome it.
Reduction of data
The basic nature of mathematics assumes that we can find the perfect and most accurate equation to calculate the most complex and mysterious phenomena. We want to understand how things connect and use mathematics to unravel the universe’s secrets through the relationships between numbers. We need to give algorithms numbers in order for them to proceed and operate. This demands that we look into the core of the subject, seek patterns, and locate their absolute essence. Machine learning technology demands that we master nature; it compels us to falsely believe that we can perfectly model human behavior, desires, and motivation.
We know that humans are not all numbers. It is impossible to find patterns in history without taking into consideration the attached social, economic, political, personal, and unexpected data. Much effort is invested in trying to find the perfect equation to represent random, diverse, and complex behaviors, human systems, and minds. We insist on it, blindly driven by our obsession to replicate ourselves and to live forever, even as machines.
Mathematical models are based on the past and the assumption that patterns will repeat
Ironically, even if such an operation could be computed, we cannot find patterns that will predict our future, since we simply don’t have the data. Our historical heritage is built upon dominant cultures and dominant men. Instead of acknowledging that, our mathematical models are trying to calculate it. This will either fix what is broken or build life-changing structures upon rotten ones. Time will tell.
Algorithmic forces
Engineers are composing equations that look for patterns to speed up and scale our code, and to compute complex operations that infer human-like outcomes. We use the human brain as an inspiration and develop models that mirror our neurons in the shape of neural networks. Our associative and long-term memory drive our decision-making process through shortcuts that quicken our actions based on what we have already learned. The various combinations of shortcuts are where the magic happens; they are what differentiate us from one another and where many of our emotional motivations lie.
These combinations push us forward as humans, and are affected by time and space in ways we don’t yet know. Humans are motivated by the desire to be loved and connected to one another. Machines are not, and I’m not sure if they ever will be. In my last post, I elaborated on this and cited Daniel Kahneman’s term emergent weirdness. The shortcuts made by the neural network will not create the same human-like meanings without eliminating context and content completely. We are currently in the midst (or is it still just the beginning?) of a state that forces immature algorithms to find meanings and order in chaotic data within the matrix, the high-dimensional space.
I am fascinated by the current state of these models, where the outcomes can not only reveal something about the power structures that stand behind them but also provide new meanings that expand on reality and logic. Machine learning technology will probably improve, and this stuttering condition will probably be replaced by another, but it’s a good moment to stop and zoom into the phenomenon, as it greatly resembles how we consume and make sense of the world.
We spend most of our lives looking for meaning in all that we’re experiencing, hoping to uncover the mysterious reasons for our existence, speculating on the purpose of life. We are asking our machines to do the same. Looking for constant meanings in daily inputs and outputs is one cause of the existential crisis of our time. Could it also be one of the existential crises our machines will experience? If there is ever a day of independently thinking entities, will they embody these depressing phenomena of our time?
This is part of ongoing research I’m doing about the possibility of mental illness in the intelligent machines we create.
Confusion guided by a clear sense of purpose
An input will trigger a machine learning network to look into its existing data for similarities, or common patterns, that will hopefully generate a desirable output. Machine learning models are trained to provide an output. It might not be the right or correct output, but there will always be one.
For example, in our early days at Volume (a machine learning tool to reconstruct 2D images in 3D space), Or Fleisher and I used a Convolutional Neural Network that was pre-trained on indoor images to test how we could convert a single-view image to a 3D model. When we gave the model indoor scenes, and especially corridors, the results were promising.
But when we gave it something outside the training data, we received a range of mistakes. Some of them were surprisingly inspiring (like clouds that were perceived as ghost-like objects with tangible mass). My point is: machine learning models will look at the input through the lens of the data they were trained on.
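To make that concrete, here is a minimal sketch, assuming a publicly available pre-trained depth model (MiDaS, loaded through torch.hub) rather than the exact network we used at Volume. It predicts a depth value for every pixel of a single image and lifts the pixels into a rough point cloud; nothing in it can refuse an input that falls outside its training data.

```python
# Minimal sketch: single-image depth estimation with a pre-trained model,
# then a naive "2.5D" lift of every pixel into a 3D point cloud.
import cv2
import numpy as np
import torch

# Load a small pre-trained monocular depth network and its input transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

# Any image works here; the model will estimate depth for corridors and clouds alike.
img = cv2.cvtColor(cv2.imread("corridor.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))            # relative depth at the model's resolution
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().numpy()                           # resampled back to the image's resolution

# Treat the predicted depth as a z coordinate for every pixel.
h, w = depth.shape
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
points = np.stack([xs, ys, depth], axis=-1).reshape(-1, 3)
print(points.shape)  # one 3D point per pixel, ready to mesh or render
```

The model answers with a depth map no matter what it is shown; whether the result is a usable 3D scene or a ghost-like mass depends entirely on how close the input is to what the network was trained on.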
One of my favorite cases was discovered recently by Reddit users and deals with the effects of neural machine translation. Google Translate works with a recurrent neural network that is designed to learn the mapping from an input sequence to an output sequence through the connections between words.
Since context can give the same words different meanings, we need to teach the machine to look into the hidden layer of language. In the last decade, this approach has improved the results of Google Translate, but it also requires training on more contextual data from the source language. What happens when there is a limited amount of existing English translation from a specific language? Three languages in particular, Somali, Hawaiian, and Vietnamese, reveal strange results, so strange that you could feel like someone is talking to you from beyond.
The explanation is dry: there simply weren’t as many written texts with English translations to train on for Somali, Hawaiian, and Vietnamese as there were for more common languages. Also, the texts that did happen to be available were, according to one source, Christian Bibles. The combination of a small and unrelated dataset and the fact that the machine was trained to produce something at any cost yields magical outputs.
When you try it yourself, you can feel how the machine is constantly looking for meaning in data that has none. The best the machine can do in this case is to provide something fluent, something that resembles human language. This tricks us into thinking that there might be a hidden intention, someone talking to us from the other side. We humanize the results, reflecting ourselves, looking for meanings in the algorithms that look for meanings.
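A toy sketch of that mechanic (an untrained encoder-decoder with made-up vocabulary sizes, nothing like Google Translate's production system) shows why there is always an answer: the decoding loop emits a word at every step until it hits its length limit, whether or not the source sequence carried any meaning.

```python
# Toy, untrained encoder-decoder: the encoder compresses the source sentence
# into a hidden state; the decoder unrolls target words from it, one per step.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, HIDDEN, START = 50, 50, 64, 0  # tiny, hypothetical sizes

class TinySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, HIDDEN)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, src_ids, max_len=20):
        _, hidden = self.encoder(self.src_emb(src_ids))   # summarize the input sequence
        token = torch.full((src_ids.size(0), 1), START, dtype=torch.long)
        generated = []
        for _ in range(max_len):                           # decode step by step
            step, hidden = self.decoder(self.tgt_emb(token), hidden)
            token = self.out(step).argmax(dim=-1)          # pick the likeliest next word
            generated.append(token)
            # Nothing here checks whether the source made sense: the decoder
            # emits a word at every step regardless of the input.
        return torch.cat(generated, dim=1)

model = TinySeq2Seq()
gibberish = torch.randint(1, SRC_VOCAB, (1, 15))  # e.g. one word repeated over and over
print(model(gibberish))                            # an output sequence comes back either way
```

Training on parallel text, attention, and far larger vocabularies make real systems fluent, but they don't change the basic contract: given an input, the decoder produces an output.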
This inspired me to test bible texts on a different model. Instead of text-to-text, I wanted to experiment with a text-to-image generator. Together with Ziv Schneider, I drew a few verses from the first and second chapters of the Old Testament, and with the help of Eyal Gruss we ran it on a generative model — AttnGAN.
This model was trained on the COCO dataset, a dataset of concrete objects and items from the physical world. We asked: How would a model that was trained on pre-labeled images of buildings, cars, and birds react to “God saw everything that he made, and indeed it was very good”? How would this model, with limited knowledge of social, cultural, and political structures, extract meaning from an obscure sentence such as “And the rib that God had taken from the man he made into a woman and brought her to the man”?
I was curious to explore what happens when we force the machine to look for order in cultural texts that have no concrete reason or explicit logic within them. The results were mesmerizing. They seem to communicate core elements of the general emotional atmosphere of each verse, something that ranges between spiritual connection and dark humor. I found within the results a hidden space of error and loss. Could that be the portal to working with the machine's mental state?
One of the outcomes of mental illness is that reality is experienced through specific lenses, not necessarily related to the input received. Isn’t that exactly what we are receiving from our machines? Could this be a new way for us to understand and reflect on human distortions and mental states?
As part of my research on the possibility of mental illness in machine learning, I’m working with an amazing team to develop an interactive web experience (Marrow).
We are exploring machine learning models through psychological lenses, designing a dynamic story structure that encourages interactions and unpredictable actions from machine learning characters. As part of our development and user research, the technical director Cristobal Valenzuela (in collaboration with Runway ML) developed a public interface to play with the text-to-image concept. We set certain conditions and let them grow in the wild, wild web. We are thrilled to see the excitement this demo caused. It seemed to give users an immediate route to creative expression with a machine, which inspired interesting tests.
We communicate with machine learning for most of our day through different applications designed to work seemingly perfectly. One of Marrow’s goals is to use a storytelling system to help make sense of these complex systems, to include more people, and to broaden the conversation. The narrative mostly used around machine learning is kind of ridiculous: we keep categorizing this technology as horror instead of connecting the public to the meaning of the data and the models themselves. We need applications that allow absurdity, error, and unexpected results to be embraced as something that adds to our understanding of machine learning, a departure from the practical considerations of creating products for consumption. We can use these experiments as a way to ask difficult questions and to educate people in alternative ways.
Speculating on the future we are currently developing is a political and social act. But when the infrastructure that dictates our future feels inaccessible to the public, the opportunity for reflection and input becomes limited to the experts. This inherently closes off a diversity of perspectives and voices. Our work aims to welcome a wider public into a crucial discussion about the future we want to see.
Thank you, Emma Dessau, for editing this.