Figure 12. From: Learning without loss

Top: Behavior of RRR generalization (distribution of test error in the final 10 epochs of 10 trials) with increasing depth \(n\) of the majority-gate generated data. Bottom: Same as the top panel, but for 100 trials of SGD. Aside from some outliers, SGD does better on average for the deepest data but, unlike RRR, fails to achieve perfect generalization for \(n=2\) data on the small architecture.