Google Brain publishes the Switch Transformer, scaling to 1.6 trillion parameters using a simplified mixture-of-experts design. Sparse scaling becomes a viable path to trillion-parameter models.
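The core trick is top-1 routing: a learned router sends each token to a single expert, so per-token compute stays roughly constant while total parameters grow with the number of experts. Below is a minimal NumPy sketch of that routing idea, with made-up dimensions and single linear maps standing in for the full feed-forward experts used in the paper; it is an illustration of the mechanism, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 6

# Router: one logit per expert (hypothetical random weights).
W_router = rng.normal(size=(d_model, n_experts))

# Each "expert" here is a single linear map for brevity;
# Switch Transformer uses a full feed-forward block per expert.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

tokens = rng.normal(size=(n_tokens, d_model))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Top-1 (Switch) routing: each token activates exactly one expert,
# so per-token FLOPs don't grow as experts are added.
probs = softmax(tokens @ W_router)           # (n_tokens, n_experts)
choice = probs.argmax(axis=-1)               # chosen expert per token
gate = probs[np.arange(n_tokens), choice]    # gate value scales the output

out = np.empty_like(tokens)
for e in range(n_experts):
    mask = choice == e
    if mask.any():
        # Only tokens routed to expert e are processed by it.
        out[mask] = gate[mask, None] * (tokens[mask] @ experts[e])

print(choice)  # which expert each token was routed to
```

Adding experts multiplies the parameter count while the dense compute path per token stays fixed, which is what makes the trillion-parameter scale tractable.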