Figure 11
From: Learning without loss
Distribution of training and test (generalization) error for RRR-trained networks on the depth 3 majority-gate circuit data. The four distributions differ only in the value of the margin Δ used in training.