Models/Systems
Qwen3 - Alibaba - 04/29/25
- Article: Alibaba unveils Qwen3, a family of ‘hybrid’ AI reasoning models ⭐️
- Announcement: https://qwenlm.github.io/blog/qwen3/
- Tweet (@Alibaba_Qwen, April 28, 2025): “Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…”
Byte Latent Transformer (BLT) - Meta - 04/30/25
- Hugging Face model card: facebook/blt
- Paper: Byte Latent Transformer: Patches Scale Better Than Tokens
- code: facebookresearch/blt
Phi-4 - Microsoft - 04/30/25
- Announcement: One year of Phi: Small language models making big leaps in AI
- Article: Microsoft’s most capable new Phi 4 AI model rivals the performance of far larger systems
- Paper: Phi-4-reasoning Technical Report
Mellum - JetBrains - 04/30/25
- Announcement: Mellum Goes Open Source: A Purpose-Built LLM for Developers, Now on Hugging Face
- Hugging Face model card: JetBrains/Mellum-4b-base
OLMo 2 - AllenAI - 05/01/25
- Project page: OLMo 2
- Hugging Face Collection: OLMo 2
- Paper: 2 OLMo 2 Furious
Llama-Nemotron - NVIDIA - 05/02/25
- Paper: Llama-Nemotron: Efficient Reasoning Models
- Announcement: NVIDIA Llama Nemotron Ultra Open Model Delivers Groundbreaking Reasoning Accuracy
- Hugging Face space
F Lite - 04/29/25
Agents
Papers
- OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
- The Leaderboard Illusion - 04/29/25
- Phi-4-reasoning Technical Report - 04/30/25
- Byte Latent Transformer: Patches Scale Better Than Tokens
- All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning - 05/03/25
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability
- Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA
- Practical Efficiency of Muon for Pretraining - 05/04/25
Articles
- Why We Think by Lilian Weng
Lectures
- Yann LeCun: Models of SSL - 04/29/25