New: We released Crust, AI Agent Safety Infrastructure
The infrastructure where
AI learns to do real work.
Agents need more than static data. They need environments, evaluation, and expert guidance.
Diagnosis · Evaluation · Expert Data · RL Environments
Leading AI labs use our open-source infrastructure to make AI systems work in the real world.
Benchmarks & Infrastructure
AI Can't Game
Product
Bake AI Infrastructures
From failure diagnosis to real-world training
BakeLens
Evaluation & Diagnosis
Know if your agents will fail.
Proof
Expert-Level Data
Fix them with human expert signals.
RL & Interactive
Learning Environments
Test before you deploy.
Domain Coverage
Critical Decisions
- Finance & Risk-Sensitive Decision Making
- Safety, Alignment, Red-Teaming
Verifiable Tasks
- Coding & Software Engineering Agents
- STEM & Multimodal Reasoning
Human Judgment
- Writing, Judgment, EQ
- Aesthetic & Art
BakeLab Research
The science behind reliable AI.
PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
Personalization & Agentic Memory · Preprint
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Agent Safety · ICLR 2026
CoDA: Agentic Systems for Collaborative Data Visualization
Agentic System · ICLR 2026
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
Tool-Agentic Data for SFT & RL · Preprint
ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions
Scientific Data for SFT · NeurIPS 2025