Writeups, guides, & projects.
A hub for everything published here: ventures, robotics architectures, ML infrastructure notes, and a complete companion series for Stanford's CS 224R.
Ventures & Studios
— what I'm building
Robotics & Architecture
— hardware meets policy
yerobot — Teaching SO-101 to Clean Up From 8 Demos
π0.5 + ACT writeup: data collection, training loop, and what actually generalized.
Architecture: AxiDraw MCP Architecture
How an MCP server drives an AxiDraw plotter — components, control flow, and contracts.
Architecture: Smooth Dancing Robot — Component Architecture
Stack diagram for a robot that moves to music without looking like a microwave.
ML Foundations
— the math & the metal
CS 224R · Companion Series
— Stanford reinforcement learning
CS 224R · Complete Guide Series
The hub for the “from zero” companion guides — start here.
HW1 · P1: BC with Regression, From Zero
Behavior cloning as supervised regression — loss, data, and where it breaks.
HW1 · P2: Flow Matching, From Zero
From velocity fields to a working flow-matching trainer, derivations included.
HW1 · P3: DAgger, From Zero
Dataset aggregation as the answer to BC's compounding-error problem.
HW2 · P1: Tabular Q-Learning, From Zero
Bellman backups in a gridworld — convergence, exploration, and the table itself.
HW2 · P2: PPO, From Zero
The clipped surrogate objective derived end-to-end, then implemented.
HW2 · P3: Off-Policy Actor-Critic, From Zero
Replay buffers, target networks, and the gradient that actually flows.
HW3 · P1: AWAC, From Zero
Advantage-weighted regression as a bridge from offline data to online fine-tuning.
HW3 · P2: IQL, From Zero
Implicit Q-learning — expectile regression and avoiding out-of-distribution actions.