Google Brain publishes the Switch Transformer, scaling to 1.6 trillion parameters using a simplified mixture-of-experts design. Sparse scaling becomes a viable path to trillion-parameter models.
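The core trick is top-1 routing: a learned router sends each token to a single expert, so per-token compute stays roughly constant while total parameters grow with the number of experts. Below is a minimal NumPy sketch of that routing idea, with made-up dimensions and single linear maps standing in for the full feed-forward experts used in the paper; it is an illustration of the mechanism, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 6

# Router: one logit per expert (hypothetical random weights).
W_router = rng.normal(size=(d_model, n_experts))

# Each "expert" here is a single linear map for brevity;
# Switch Transformer uses a full feed-forward block per expert.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

tokens = rng.normal(size=(n_tokens, d_model))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Top-1 (Switch) routing: each token activates exactly one expert,
# so per-token FLOPs don't grow as experts are added.
probs = softmax(tokens @ W_router)           # (n_tokens, n_experts)
choice = probs.argmax(axis=-1)               # chosen expert per token
gate = probs[np.arange(n_tokens), choice]    # gate value scales the output

out = np.empty_like(tokens)
for e in range(n_experts):
    mask = choice == e
    if mask.any():
        # Only tokens routed to expert e are processed by it.
        out[mask] = gate[mask, None] * (tokens[mask] @ experts[e])

print(choice)  # which expert each token was routed to
```

Adding experts multiplies the parameter count while the dense compute path per token stays fixed, which is what makes the trillion-parameter scale tractable.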