Figure 12 | Fixed Point Theory and Algorithms for Sciences and Engineering

Figure 12

From: Learning without loss

Top: Behavior of RRR generalization (distribution of test error in the final 10 epochs of 10 trials) with increasing depth \(n\) of the majority-gate generated data. Bottom: Same as the top panel, but for 100 trials of SGD. Aside from some outliers, SGD does better on average on the deepest data but, unlike RRR, fails to reach perfect generalization for \(n=2\) data on the small architecture.
