If, a hundred years from now, deep learning finally has a recognized mathematical theory as its foundation, one that can explain the various mysteries seen in experiments, what will that theory look like?

2022-01-06

A: According to the Cobham–Edmonds thesis, a good computational model belongs to the polynomial-time complexity class: any problem such a model can solve can also be solved by a general-purpose Turing machine, and although the performance of these models varies, the gap between them is at most polynomial. There are already many good explanatory theories for deep learning, including general machine learning theory, which establishes its universality as a learning model, and nonlinear dynamics and phase-space methods, which address its convergence; newer frameworks include geometric flows and topological flows. Whatever the foundation, probability distributions or anything else, the implication of the CE thesis is that while the resulting pattern may be fascinating and intricate, explaining deep learning one way rather than another will make no significant difference in computational power. And yet the stunning practical performance of deep learning on certain problems makes that difference in power hard to dismiss as insignificant. Unless we grant that the existing foundational theories of deep learning are already good enough to count as the "accepted theory" the questioner asks for, this tension remains a hidden difficulty. The questioner expects the new theory to explain the "various mysteries" in experiments, but mystery always reappears as nature's last line of defense, and without mystery there is no engineering. This already suggests that deep learning is a highly engineering-driven discipline: a complex engineering process is largely unexplainable, and deep learning, stitched together from countless techniques, is clearly such a process.
I have two small opinions on what form this new theory might take:

  1. The book "Scale" discusses Geoffrey West's distinction between superlinear and sublinear growth: superlinear growth describes cities, whose innovation and vitality never dry up and whose economies grow more prosperous with scale, while sublinear growth describes companies, which have their own life cycles and can grow, age, and die. Today's deep learning is merely a product, with its own data sources and algorithm implementations. As data infrastructure standardizes and algorithms improve, old deep learning products will die off or be displaced, and new ones will take their place. In a larger sense, even a truly general artificial intelligence, like humans themselves, will face the challenge of knowledge iteration. What escapes this fate over its whole lifetime is not the individual but "immortal" things such as great cities and civilizations. An individual human confronts plenty of mystery, but for human society as a whole the problem is much weaker: innovation, vitality, and diversity are the marks of the "undying", which will ultimately dissolve every mystery. What can create such an "immortal thing", and how deep learning could become one, is a complicated question of mechanism. One thing is certain, though: today's deep learning is still just an accessory to individual human knowledge work. Many of the problems deep learning handles could be handled without it; the sky would not fall if deep learning vanished, and we use it for the sake of excellence. If in the future deep learning can become a sense, a level of thinking on par with rationality and sensibility, a foundation for imagination and action, rather than an accessory to the worldview shaped by those senses, thoughts, and imaginings, then a theory of deep learning with that stature may indeed "explain the various mysteries in experiments".
Unfortunately, I cannot offer a more mathematical answer, only a more imaginative one, simply because I believe the former, in the shadow of the CE thesis, tends to spin in place, while the latter opens a window onto chaos and possibility.

  2. New possibilities in practice. Designers love tuning hyperparameters, so why not tune the numbers themselves? Try manipulating a neural network's weight vectors directly, forcing a particular position to take a higher or lower value. Dynamically speaking, this is just a special perturbation: it may leave the computed result unchanged, or it may make the result strange. But could we stop seeking convergence or other stable algorithmic properties, and instead treat this as a kind of behavioral art? Some answers under this question mention "emergence", but in my view "emergence" holds automatically in the context of complex systems, which amounts to saying nothing: as long as the system built and managed by deep learning is regarded as a complex system, the property exists by definition. The key is what attitude to take toward emergence. Unlike some other respondents, I believe complex systems necessarily give rise to emergence. Decoupling a complex system into another kind of system may look like a solution, but the complexity of a complex system comes from the complexity facing the model's designer rather than from the model itself, and decoupling merely transfers complexity without reducing it. For example, I can design a model that handles many situations and recognizes different user contexts; users then have less to do, but the model has more: multi-scenario recognition, plus applying pre-designed solutions for each scenario. And users still need to do certain things, including at least knowing the model's requirements and how to integrate it. If we now decouple the model and disperse it into multiple systems, we increase the cost of coordination, integration, and maintenance between systems, along with the users' learning cost; or else we turn it into a multi-functional super-interface system.
Either way, the complexity shifts onto the users: they must understand the complex interfaces and functions. The world is a vast complex system, and to make a model useful, to connect it with the world, the world's complexity must flow into the model. A model that ignores complexity is useless, while a model that accepts complexity is trivial. Suppose we are a band of restless genes: the cheetah's gallop and the hummingbird's dive are like the engineering feats we pursue, record-breaking performance. But was the project that gave birth to the civilization of Homo sapiens among such projects? I cannot tell. What I can say is that this new project, the new theory of deep learning, will certainly not follow the many ordinary paths we see today.
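The "tune the numbers themselves" experiment from point 2 can be sketched in a few lines. This is only a minimal toy, assuming nothing beyond NumPy; the network sizes, the perturbed position `[2, 3]`, and the magnitude `0.5` are all illustrative choices, not a prescription:

```python
import numpy as np

# Toy two-layer network with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, W1, W2):
    # ReLU hidden layer followed by a linear readout.
    h = np.maximum(0.0, x @ W1)
    return h @ W2

x = rng.normal(size=(1, 4))
baseline = forward(x, W1, W2)

# "Adjust a number": nudge one weight at an arbitrary position
# and watch how (or whether) the output moves.
W1_perturbed = W1.copy()
W1_perturbed[2, 3] += 0.5

perturbed = forward(x, W1_perturbed, W2)
shift = np.abs(perturbed - baseline).max()
print("largest output shift:", shift)
```

Depending on the input, the ReLU may zero out the perturbed unit entirely (shift of exactly zero) or amplify it downstream, which is precisely the point made above: the same perturbation may leave the result unchanged or make it strange.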