In 2017, the International Conference on Learning Representations (ICLR) gave its best paper award to a paper from UC Berkeley that made a wrong claim.
The senior author of the paper is Dawn Song, and its title is: Making Neural Programming Architectures Generalize via Recursion.
The paper claimed that “recursion makes it easier for the network to learn the right program and generalize to unknown situations.” This claim is wrong, and the justification provided in the paper is incorrect. In the next section of this article, I explain why in simple terms for readers with no background in machine learning, theoretical neuroscience, or artificial intelligence.
The General Chairs of the conference were Yann LeCun and Yoshua Bengio. These gentlemen failed to distinguish the best paper from a wrong one.
Shortly after the conference, I contacted the authors of the paper and the organizers of the conference, asking them to release the simulations and to answer a few questions about their results. Neither the authors nor the organizers ever responded. I am therefore publishing this article to make clear that, for reasons unknown to me, the 2017 ICLR conference gave its best paper award to a wrong paper.
I am asking the UC Berkeley research integrity office to investigate this and to release the simulations behind Song’s paper.
Why the paper is wrong:
Here is a simple explanation, for non-experts, of why the paper is wrong:
In machine learning, generalization means learning a general rule from a few examples. For example, if you see the sequence 1, 3, 5, 7, 9, 11, 13, 15, you realize that each number is obtained by adding 2 to the previous one (3 = 1 + 2, 5 = 3 + 2, and so on). You can then easily generalize: if anyone asks what the next number is, you will correctly say 17 (= 15 + 2). We also say that this algorithm is recursive, because you repeat the same operation (adding 2 to the previous number) over and over again.
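The rule-learning described above can be sketched in a few lines of code. This is only a toy illustration of what "learning the rule and extrapolating" means for this arithmetic sequence; it is not taken from the paper, and the function names are my own:

```python
# Toy illustration (not from the paper): "generalizing" the add-2 rule.
# From the observed sequence we infer the constant step by looking at
# consecutive differences, then extrapolate the next term.

def infer_step(seq):
    """Infer a constant additive step from consecutive differences."""
    diffs = {b - a for a, b in zip(seq, seq[1:])}
    assert len(diffs) == 1, "sequence is not arithmetic"
    return diffs.pop()

def next_term(seq):
    """Apply the inferred rule once more to predict the next number."""
    return seq[-1] + infer_step(seq)

seq = [1, 3, 5, 7, 9, 11, 13, 15]
print(next_term(seq))  # 17
```

Generalization, in this tiny example, is the step from the eight observed numbers to a rule that produces the ninth.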
In the paper, Song states: “We find that recursion makes it easier for the network to learn the right program and generalize to unknown situations.”
She also states: “Empirically, we observe that the learned recursive programs solve all valid inputs with 100% accuracy after training on a very small number of examples, out-performing previous generalization results. Given verification sets that cover all the base cases and reduction rules, we can provide proofs that these learned programs generalize perfectly. This is the first time one can provide provable guarantees of perfect generalization for neural programs.”
Song uses a number of algorithms, such as bubble sort, to justify this claim. However, in every case, the key functions of those algorithms are supplied to the neural network by hand. In other words, the network does not learn those functions; they are given to it, and the network only learns trivial relations between them (for example, trivial if-then rules governing when each function is invoked).
The network generalizes 100% because it is cheating, in the sense that the key functions are handed to it and it only needs to learn the trivial relations between them (and it is well known that learning such trivial relations is a trivial task for neural networks).
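To make the critique concrete, here is a minimal sketch, in my own words and code rather than the paper's, of what is left to learn once the key primitives of bubble sort are supplied by hand. The `compare` and `swap` primitives stand in for the hand-provided functions; the only remaining "program" is a one-line conditional between them:

```python
# Illustrative sketch of the critique (not the paper's code): when the
# key primitives of bubble sort -- COMPARE and SWAP -- are given, the
# remaining "program" to be learned is a trivial if-then rule.

def compare(xs, i):
    """Supplied primitive: are neighbours xs[i], xs[i+1] out of order?"""
    return xs[i] > xs[i + 1]

def swap(xs, i):
    """Supplied primitive: exchange neighbours xs[i] and xs[i+1]."""
    xs[i], xs[i + 1] = xs[i + 1], xs[i]

def controller(xs, i):
    """The only part left to 'learn': call SWAP when COMPARE fires."""
    if compare(xs, i):
        swap(xs, i)

def bubble_sort(xs):
    """Bubble sort falls out of repeatedly applying the controller."""
    for _ in range(len(xs)):
        for i in range(len(xs) - 1):
            controller(xs, i)
    return xs

print(bubble_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
```

With `compare` and `swap` fixed in advance, perfect accuracy on the dispatch rule in `controller` says nothing about learning the sorting algorithm itself.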
Therefore, the paper is in effect claiming that neural networks can perform trivial operations, and Yann LeCun and Yoshua Bengio gave it the best paper award for this trivial claim. Note that the paper's claim is wrong because the authors pretend that the network has not cheated and has learned everything, including the key functions of the algorithms, from scratch.
I am asking UC Berkeley to release the simulations
Again, to prove me wrong, Song and the research integrity office at UC Berkeley are more than welcome to release the simulations.
To the readers of this article: please ask the UC Berkeley research integrity office to release the simulations. Once UC Berkeley has released them, come up with a recursive algorithm different from those discussed in the paper and generate numbers from it. Then feed those numbers as training data into Song’s simulation and see whether her model learns to generalize. I assure you that it will not generalize at all. There will be no generalization whatsoever.
Let me repeat one more time: if anyone believes I am wrong, please release the simulations, and please make it very easy for people to feed the simulations their own training and test data to verify the paper.
I am asking the research integrity office at UC Berkeley to investigate this. It is unfair that a wrong paper gets the best paper award.
None of the authors of this paper has a PhD in machine learning, theoretical neuroscience, or AI. In other words, people without a PhD published a wrong paper at a conference and received the best paper award for it.
Yann LeCun and Yoshua Bengio were the General Chairs of ICLR 2017. I request that both gentlemen release all electronic communications that led to this wrong paper receiving the best paper award at ICLR 2017.
Reza Moazzezi, PhD (Theoretical Neuroscience)
12/15/2022
United States of America