Source: ByteDance

ByteDance’s Doubao Team Introduces UltraMem: A Sparse Model Architecture with Significant Performance Boost

ByteDance’s Doubao model team has unveiled UltraMem, a sparse model architecture that, like Mixture of Experts (MoE), decouples computation from parameters. According to the team, UltraMem addresses the memory access bottleneck in inference while maintaining model performance.

The team reports that the new architecture resolves the high memory access cost that MoE incurs during inference, achieving a 2-6x speedup and cutting inference costs by up to 83%.
