XiaomiMiMo/MiMo: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Unlocking the Reasoning Potential of Language Model
From Pretraining to Posttraining

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

This code repository is licensed under the Apache 2.0 License.

Currently, most successful RL work, including open-source research, relies on relatively large base models, e.g., 32B models, particularly for enhancing code reasoning capabilities. Moreover, it has been widely considered challenging to achieve uniform and simultaneous improvements in both mathematical and code capabilities within a small model. Nonetheless, we believe that the effectiveness of an RL-trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.

In this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments from MiMo-7B-Base show that our model possesses extraordinary reasoning potential, even surpassing much larger 32B models. Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.

We open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model.
We believe this report, along with the models, will provide valuable insights for developing powerful reasoning LLMs that benefit the larger community.

Models are available at https://huggingface.co/XiaomiMiMo

Benchmark	GPT-4o-0513	Claude-3.5-Sonnet-1022	OpenAI o1-mini	QwQ-32B-Preview	R1-Distill-Qwen-14B	R1-Distill-Qwen-7B	MiMo-7B-RL
General
GPQA Diamond (Pass@1)	49.9	65.0	60.0	54.5	59.1	49.1	54.4
SuperGPQA (Pass@1)	42.4	48.2	45.2	43.6	40.6	28.9	40.5
DROP (3-shot F1)	83.7	88.3	83.9	71.2	85.5	77.0	78.7
MMLU-Pro (EM)	72.6	78.0	80.3	52.0	68.8	53.5	58.6
IF-Eval (Prompt Strict)	84.3	86.5	84.8	40.4	78.3	60.5	61.0
Mathematics
MATH-500 (Pass@1)	74.6	78.3	90.0	90.6	93.9	92.8	95.8
AIME 2024 (Pass@1)	9.3	16.0	63.6	50.0	69.7	55.5	68.2
AIME 2025 (Pass@1)	11.6	7.4	50.7	32.4	48.2	38.8	55.4
Code
LiveCodeBench v5 (Pass@1)	32.9	38.9	53.8	41.9	53.1	37.6	57.8
LiveCodeBench v6 (Pass@1)	30.9	37.2	46.8	39.1	31.9	23.9	49.3

MiMo-7B series

Benchmark	MiMo-7B-Base	MiMo-7B-RL-Zero	MiMo-7B-SFT	MiMo-7B-RL
Mathematics
MATH-500 (Pass@1)	37.4	93.6	93.0	95.8
AIME 2024 (Pass@1)	32.9	56.4	58.7	68.2
AIME 2025 (Pass@1)	24.3	46.3	44.3	55.4
Code
LiveCodeBench v5 (Pass@1)	32.9	49.1	52.3	57.8
LiveCodeBench v6 (Pass@1)	29.1	42.9	45.5	49.3

Important

The evaluations are conducted with temperature=0.6.

AIME 2024 and AIME 2025 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA Diamond, and IF-Eval scores are averaged over 8 repetitions. MATH-500 and SuperGPQA are evaluated with a single run.
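The averaging described above can be sketched as follows. This is a minimal illustration, not the evaluation harness used for the reported numbers: the function name `averaged_pass_at_1`, the nested-list input layout, and the toy data are all assumptions, and the sketch assumes each repetition is scored as the percentage of problems solved before the per-repetition scores are averaged.

```python
from statistics import mean


def averaged_pass_at_1(results):
    """Average Pass@1 (in percent) over independent repetitions.

    `results[r][p]` is True if repetition `r` solved problem `p`.
    This layout is a hypothetical convention chosen for illustration.
    """
    if not results or not all(results):
        raise ValueError("need at least one repetition with at least one problem")
    # Score each repetition as the percentage of problems solved,
    # then average those per-repetition scores.
    per_run = [100.0 * sum(run) / len(run) for run in results]
    return mean(per_run)


# Toy data (made up): 4 repetitions over 5 problems.
runs = [
    [True, False, True, True, False],
    [True, True, True, False, False],
    [False, False, True, True, False],
    [True, False, True, True, True],
]
score = averaged_pass_at_1(runs)
print(f"Averaged Pass@1: {score:.1f}")
```

With 32 repetitions on a 30-problem benchmark such as AIME, a single run moves in steps of roughly 3.3 points, which is presumably why the smaller benchmarks are averaged over more repetitions than the larger ones.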

[Recommended] We officially support inference with MiMo-MTP using our fork of vLLM.

Example script

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": ""
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)

Alternatively, you can register a vLLM loader for MiMo without loading the MTP parameters.

Copy registry/register_mimo_in_vllm.py to your working directory and import it with:

import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

Example script (Hugging Face Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/MiMo"
# Load the checkpoint and its tokenizer from the same path
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Tokenize a short prompt and generate up to 100 new tokens
inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output.tolist()[0]))

Recommended environment and prompts

We recommend using our fork of vLLM, which is based on vLLM 0.7.3.
We recommend using an empty system prompt.
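As a small sketch of the empty-system-prompt recommendation, the helper below builds a message list in the same layout as the `conversation` variable in the vLLM example script. The name `build_conversation` and its signature are hypothetical and not part of the MiMo repository or of vLLM.

```python
def build_conversation(user_prompt, system_prompt=""):
    """Return a chat message list with an empty system prompt by default.

    Hypothetical helper for illustration; it mirrors the message layout
    of the `conversation` list in the vLLM example script.
    """
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


conversation = build_conversation(
    "Prove that the sum of two even integers is even."
)
print(conversation)
```

The resulting list can be passed as the first argument to `llm.chat(...)` in the same way as the `conversation` list in the vLLM example script.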

We haven’t verified MiMo with other inference engines and welcome contributions based on the model definition in the Hugging Face repo 💻.

@misc{xiaomi2025mimo,
      title={MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining}, 
      author={{Xiaomi LLM-Core Team}},
      year={2025},
      primaryClass={cs.CL},
      url={https://github.com/XiaomiMiMo/MiMo}, 
}

Please contact us at mimo@xiaomi.com or open an issue if you have any questions.
