BakeLab
The Science Behind Reliable Agents
Developed in Research. Proven in Deployment.
1.5k+ Stars
400+ Model Extensions
600+ Derivative Datasets
HuggingFace Daily #1
Expert-Level Data Quality & Verification
Agent Failure Diagnosis & Taxonomy
Evaluation Methodology for Real-World Tasks
Human-AI Collaboration in Data Production
Publications & Blog
Open-source research from our founding team, published at top venues.
Paper Preprint Dec 2025
PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
Personalization & Agentic Memory
Paper ICLR 2026 Oct 2025
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Agent Safety
Paper ICLR 2026 Oct 2025
CoDA: Agentic Systems for Collaborative Data Visualization
Agentic System
Paper Preprint Oct 2025
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
Tool-Agentic Data for SFT & RL
Paper NeurIPS 2025 Sep 2025
ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions
Scientific Data for SFT
Paper Preprint May 2025
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL
Multi-Modal Data for SFT & RL
Paper ACL 2025 Mar 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Coding Data for SFT & RL
Paper NAACL 2025 Nov 2024
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Response Data for SFT & DPO
Paper ICLR 2025 Jun 2024
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Question Data for SFT & DPO
Collaborate with us
We work with frontier labs, universities, and research teams. If you're working on hard problems in data quality, agent evaluation, or failure diagnosis, let's talk.
Talk to an Expert