ByteDance Unveils Technical Report for Seed-Thinking-v1.5, a Next-Gen Reasoning Model with MoE Architecture

April 15, 2025

On April 15, ByteDance officially released the technical details of its latest reasoning-focused large model, Seed-Thinking-v1.5. The model will be made publicly accessible via Volcengine’s API starting April 17.

Seed-Thinking-v1.5 demonstrates exceptional performance in math reasoning, competitive programming, scientific inference, and creative writing, establishing itself as a strong contender among state-of-the-art language models. Built on a Mixture of Experts (MoE) architecture, it has 200 billion total parameters, of which 20 billion are active per inference, resulting in a 50% lower inference cost than DeepSeek R1.
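To illustrate why only a fraction of an MoE model's parameters are active per inference, here is a minimal sketch of top-k expert routing. The article does not describe the model's router, so the function names, the softmax gate, and the choice of k are all assumptions made for illustration.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Softmax over all experts (shifted by the max for numerical stability).
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Toy setup (hypothetical): 10 scalar "experts", 1 active per token.
experts = [lambda x, s=s: s * x for s in range(10)]
gate_logits = [0.0] * 10
gate_logits[3] = 5.0  # the gate strongly prefers expert 3

routes = top_k_route(gate_logits, k=1)
output = moe_forward(2.0, experts, gate_logits, k=1)

# Ratio reported in the article: 20B active out of 200B total parameters.
active_fraction = 20e9 / 200e9
```

Only the selected experts are evaluated, which is where the compute saving comes from. The reported 10% ratio counts parameters, not experts, so the one-in-ten toy setup above is only a loose analogy.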

Technical report: https://github.com/ByteDance-Seed/Seed-Thinking-v1.5

Fusion of verifiable and creative data

Performance Highlights

Domain-Specific Tasks:

Math Reasoning: Achieved an AIME 2024 score of 86.7, matching OpenAI’s o3-mini-high
Programming: Codeforces pass@8 reached 55.0%, comparable to Gemini 2.5 Pro
Scientific Reasoning: Scored 77.3% on GPQA, approaching the industry-leading level of o3-mini-high
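For readers unfamiliar with the pass@8 metric, the sketch below shows one common way to compute pass@k: the unbiased estimator based on n samples per problem, of which c pass. The article does not say how the Codeforces figure was computed, so this formula and the function names are assumptions, not the team's evaluation code.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of pass@k from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results for four problems, 8 samples each.
results = [(8, 0), (8, 1), (8, 8), (8, 0)]
score = mean_pass_at_k(results, k=8)
```

With n equal to k, the estimator reduces to "solved if any of the 8 attempts passes", so a problem with a single passing sample counts as fully solved.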

General Tasks:

Outperformed DeepSeek R1 by 8% in human evaluation across a wide range of scenarios, reflecting stronger creative and reasoning abilities.

Cost Efficiency:

Achieves 50% lower inference cost per unit compared to DeepSeek R1, optimizing both performance and efficiency.

Optimized Data Strategy for Reasoning and Generation

To balance verifiability and creativity, the training data pipeline was tailored to each task type:

Verifiable Data (e.g., Math, Code):
Over 1 million samples went through a triple-filtering process (manual filtering → model filtering → multi-model verification)
Retained 100,000 high-difficulty problems
Introduced techniques like answer normalization and sandboxed verification to ensure accurate reasoning chains
Non-Verifiable Data (e.g., Creative Writing):
Built on the Doubao 1.5 Pro dataset, with low-quality samples filtered out
Employed pairwise comparison reward modeling to optimize generation quality
New Benchmark Dataset – BeyondAIME:
100 challenging math problems without answers, created to address limitations in current benchmark granularity.
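Answer normalization, mentioned above for verifiable data, can be sketched as follows. The article does not list the actual normalization rules, so the rules here (unwrapping `\boxed{}`, rewriting `\frac`, dropping thousands separators, reducing fractions) and the function names are illustrative assumptions.

```python
import re
from fractions import Fraction

def normalize_answer(raw):
    """Map surface variants of a final answer to one canonical string."""
    s = raw.strip().strip("$").strip()
    # Unwrap a LaTeX \boxed{...} wrapper if the whole answer is boxed.
    m = re.fullmatch(r"\\boxed\{(.*)\}", s)
    if m:
        s = m.group(1).strip()
    # Rewrite \frac{a}{b} (or \dfrac) as a/b.
    s = re.sub(r"\\d?frac\{(-?\d+)\}\{(-?\d+)\}", r"\1/\2", s)
    # Drop thousands separators, a trailing period, and inner spaces.
    s = re.sub(r"(?<=\d),(?=\d{3}\b)", "", s)
    s = s.rstrip(".").replace(" ", "")
    try:
        # Fraction parses integers, decimals, and a/b, and reduces to lowest terms.
        return str(Fraction(s))
    except (ValueError, ZeroDivisionError):
        return s.lower()  # non-numeric answers fall back to a case-insensitive string

def answers_match(predicted, reference):
    """Compare two answers after normalization."""
    return normalize_answer(predicted) == normalize_answer(reference)
```

For example, `answers_match(r"$\boxed{\frac{2}{4}}$", "0.5")` holds because both sides reduce to `1/2`. A string-level check like this is brittle for symbolic answers, which may be one reason the article describes a later verifier that goes beyond string matching.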

Reward Modeling: Dual-Track Calibration for Balanced Training

Seed-Thinking-v1.5 introduces a dual-track reward system to address both objective and subjective tasks:

For Verifiable Tasks:
Developed two generations of Seed-Verifiers, evolving from string-level to step-wise reasoning match
Achieved over 99% accuracy on training/testing sets, eliminating “reward hacking”
For Non-Verifiable Tasks:
Used large-scale A/B pairwise comparison training (over 10 million tests) to capture human preferences for creativity, tone, and emotion
Dual-Track Fusion:
Combines hard metrics (accuracy) with soft preferences (quality), enabling full-spectrum model training.
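Pairwise comparison training for non-verifiable tasks is often implemented with a Bradley-Terry style loss, sketched below. The article does not state which loss the team used, so treat the formula and function names as assumptions about a typical setup rather than a description of the actual reward model.

```python
import math

def pairwise_loss(score_chosen, score_rejected):
    """Bradley-Terry negative log-likelihood: -log(sigmoid(chosen - rejected))."""
    margin = score_chosen - score_rejected
    # Both branches compute log(1 + exp(-margin)); the split avoids overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def preference_probability(score_a, score_b):
    """Modeled probability that response A is preferred over response B."""
    return math.exp(-pairwise_loss(score_a, score_b))

# Hypothetical reward-model scores for two responses to the same prompt.
loss_correct_order = pairwise_loss(2.0, 0.0)
loss_wrong_order = pairwise_loss(0.0, 2.0)
```

The loss is small when the human-preferred response already scores higher and grows roughly linearly when the ordering is wrong, which pushes the reward model toward the annotators' ranking.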

Training Pipeline: Two-Stage Optimization with SFT + RL

Seed-Thinking-v1.5 is trained in two stages, supervised fine-tuning followed by reinforcement learning:

Supervised Fine-Tuning (SFT):
400,000 curated samples (300k verifiable + 100k non-verifiable)
Constructed a long-chain reasoning dataset to align model thinking with human patterns
Reinforcement Learning (RL):
Driven by a tri-engine data framework (verifiable/general/mixed)
Introduced innovations like value pretraining and decoupled GAE
Online adaptation keeps data distribution dynamically optimized for model performance
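The idea behind decoupled GAE can be sketched as computing Generalized Advantage Estimation twice with different lambda values, one for the policy update and one for the critic's regression target. The article gives no formulas or hyperparameters, so the default values and function names below are illustrative assumptions.

```python
def gae(rewards, values, gamma, lam):
    """Generalized Advantage Estimation over one finished trajectory.

    values[t] is the critic's estimate at step t; the value after the
    final step is taken as 0.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def decoupled_gae(rewards, values, gamma=1.0, lam_policy=0.95, lam_critic=1.0):
    """Use separate lambda values for the policy advantage and the critic target."""
    policy_adv = gae(rewards, values, gamma, lam_policy)
    critic_adv = gae(rewards, values, gamma, lam_critic)
    # Critic regression target = advantage + baseline value.
    value_targets = [a + v for a, v in zip(critic_adv, values)]
    return policy_adv, value_targets

# Toy trajectory: a single terminal reward, as in a verified final answer.
policy_adv, value_targets = decoupled_gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

With `lam_critic=1.0` and `gamma=1.0`, every value target equals the full return, so a sparse terminal reward reaches early tokens of a long response without decay, while the smaller policy lambda trades some of that signal for lower variance.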

Infrastructure: Scalable Foundation for 20B MoE Training

To support training of the MoE model, with 20B active and 200B total parameters, the team built a robust infrastructure:

HybridFlow Programming Model:
Enables fast algorithm experimentation and efficient distributed training
Streaming Reasoning System (SRS):
Decouples model iteration from inference via stream-based reasoning, tripling training speed
Delivers 95% stability even under trillion-parameter loads
Triple Parallelism Architecture:
Combines tensor, expert, and sequence-level parallelism
Uses the KARP scheduling algorithm to dynamically balance workloads and maximize GPU utilization
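The article names a KARP scheduling algorithm but gives no details of how it works. As a stand-in, the sketch below shows the general load-balancing problem with a simpler greedy heuristic (longest sequence first, assigned to the least-loaded worker). It is not the KARP algorithm, and the function name and inputs are hypothetical.

```python
import heapq

def balance_sequences(lengths, num_workers):
    """Greedy longest-first assignment of sequences to the least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for idx in order:
        load, w = heapq.heappop(heap)  # worker with the smallest load so far
        assignment[w].append(idx)
        heapq.heappush(heap, (load + lengths[idx], w))
    loads = [sum(lengths[i] for i in group) for group in assignment]
    return assignment, loads

# Hypothetical token counts for five sequences, split across two workers.
lengths = [8, 7, 6, 5, 4]
assignment, loads = balance_sequences(lengths, num_workers=2)
```

Balancing total token counts per worker matters because a training step finishes only when the slowest worker does, so uneven sequence lengths leave the other GPUs idle.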

Topics:

AI, ByteDance, model training pipeline, MoE architecture, Seed-Thinking-v1.5


Copyright © 2023 Echoiz, All rights reserved. Powered by MoxCreative
