AI Tools & Agents•March 13, 2026•KR

Running autoresearch Hands-On — Overnight Experiments on a Single GPU

Covers environment setup, agent execution, and analysis of overnight results, plus a tuning guide for smaller GPUs and practical tips.

In Part 1, we looked at how Karpathy's autoresearch is structured. Here's the three-line summary:

A single train.py contains the GPT model + optimizer + training loop.
An AI agent (Claude Code, etc.) modifies this file, trains for 5 minutes, and keeps the change if val_bpb improves — otherwise discards it.
program.md defines the agent's behavior rules. Humans only edit this markdown file.
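The keep-or-discard rule in the summary above can be sketched as a greedy loop. This is a minimal illustration, not the project's code: a plain dict stands in for the contents of train.py, and keep_or_discard_loop, toy_train_and_eval, and make_toy_proposer are hypothetical names invented for this example. In the real workflow the proposal is an agent's edit to train.py and the score comes from an actual 5-minute training run.

```python
import random
from typing import Callable

Config = dict[str, float]


def keep_or_discard_loop(
    propose_change: Callable[[Config], Config],
    train_and_eval: Callable[[Config], float],
    baseline: Config,
    num_experiments: int,
) -> tuple[Config, float, list[tuple[int, float, bool]]]:
    """Greedy accept/reject: a candidate is kept only if val_bpb goes down."""
    best_config = dict(baseline)
    best_bpb = train_and_eval(best_config)
    history: list[tuple[int, float, bool]] = []
    for step in range(num_experiments):
        candidate = propose_change(best_config)  # the agent's proposed edit
        bpb = train_and_eval(candidate)          # stands in for one 5-minute run
        kept = bpb < best_bpb                    # lower bits-per-byte is better
        if kept:
            best_config, best_bpb = candidate, bpb
        history.append((step, bpb, kept))
    return best_config, best_bpb, history


def toy_train_and_eval(config: Config) -> float:
    # Hypothetical stand-in for training: a smooth bowl with its minimum at
    # lr = 0.02. Real runs are noisy and take the full time budget.
    return 1.0 + (config["lr"] - 0.02) ** 2 * 100.0


def make_toy_proposer(seed: int) -> Callable[[Config], Config]:
    rng = random.Random(seed)

    def propose(config: Config) -> Config:
        # Hypothetical "edit": rescale the learning rate by a random factor.
        candidate = dict(config)
        candidate["lr"] = max(1e-5, config["lr"] * rng.uniform(0.5, 1.5))
        return candidate

    return propose


baseline = {"lr": 0.1}
best_config, best_bpb, history = keep_or_discard_loop(
    make_toy_proposer(0), toy_train_and_eval, baseline, num_experiments=20
)
kept_count = sum(1 for _, _, kept in history if kept)
print(f"kept {kept_count} of {len(history)} experiments, best val_bpb {best_bpb:.4f}")
```

Because every candidate is derived from the current best, discarded experiments leave no trace in the working state. Only improvements accumulate, which is what makes an unattended overnight run safe to leave alone.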

In Part 2, we'll set up the environment, launch the agent, and analyze the results from an overnight run.

Environment Setup — Getting Started

Requirements

Item	Minimum	Recommended
GPU	NVIDIA GPU (CUDA support)	H100 80GB
Python	3.10+	3.12
Package Manager	uv	uv
Agent	Claude Code or Codex	Claude Code

You don't need an H100. It also runs on an RTX 4090, A100, RTX 3090, and other CUDA-capable NVIDIA GPUs. What changes is how many tokens get processed within the fixed 5-minute budget, so a slower GPU completes less training per experiment. We'll cover GPU-specific tuning later.
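To make the fixed-budget point concrete, here is a small arithmetic sketch. The throughput figures are placeholders chosen for easy arithmetic, not measurements of any GPU, and tokens_in_budget is a hypothetical helper written for this example.

```python
TIME_BUDGET_SECONDS = 5 * 60  # the fixed 5-minute training budget


def tokens_in_budget(
    tokens_per_second: float, budget_seconds: float = TIME_BUDGET_SECONDS
) -> int:
    """Tokens processed in one experiment at a given sustained throughput."""
    return int(tokens_per_second * budget_seconds)


# Placeholder throughputs (NOT benchmarks): a faster and a slower GPU.
fast_gpu_tokens = tokens_in_budget(400_000)
slow_gpu_tokens = tokens_in_budget(100_000)

print(f"fast GPU: {fast_gpu_tokens:,} tokens per experiment")
print(f"slow GPU: {slow_gpu_tokens:,} tokens per experiment")
print(f"ratio:    {fast_gpu_tokens / slow_gpu_tokens:.1f}x")
```

Since the wall-clock budget is fixed, throughput is the only term that varies between GPUs. A card with a quarter of the throughput sees a quarter of the tokens per experiment.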

Installation and Data Preparation

bash

git clone https://github.com/karpathy/autoresearch
cd autoresearch

# Install uv package manager (skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies (PyTorch 2.9.1, pyarrow, rustbpe, tiktoken, etc.)
uv sync

# Download data + train tokenizer (~2 min)
uv run prepare.py

uv run prepare.py does two things:

Downloads parquet shards from the climbmix-400b-shuffle dataset on Hugging Face. By default, it fetches 10 training shards plus 1 fixed validation shard (shard_06542).
Trains a BPE tokenizer with rustbpe, using a vocab_size of 8192 and a GPT-4 style split pattern.
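For readers who have not seen BPE training before, the idea behind the second step can be sketched in pure Python. This is a toy illustration of byte-level BPE, not rustbpe's API or implementation: train_toy_bpe, merge_pair, encode, and decode are hypothetical names, and the sketch skips the regex pre-splitting that a GPT-4 style split pattern performs before merging.

```python
from collections import Counter

Merges = dict[tuple[int, int], int]


def merge_pair(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of pair with new_id."""
    out: list[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_toy_bpe(text: str, vocab_size: int) -> Merges:
    """Learn merges until the vocabulary reaches vocab_size or nothing repeats."""
    if vocab_size < 256:
        raise ValueError("vocab_size must cover the 256 raw byte values")
    ids = list(text.encode("utf-8"))
    merges: Merges = {}
    next_id = 256  # ids 0..255 are reserved for raw bytes
    while next_id < vocab_size and len(ids) >= 2:
        pair, count = Counter(zip(ids, ids[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another merge would not compress
        ids = merge_pair(ids, pair, next_id)
        merges[pair] = next_id
        next_id += 1
    return merges


def encode(text: str, merges: Merges) -> list[int]:
    ids = list(text.encode("utf-8"))
    # Apply merges in the order they were learned (dicts keep insertion order).
    for pair, new_id in merges.items():
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: Merges) -> str:
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


sample = "aaabdaaabac"
merges = train_toy_bpe(sample, vocab_size=259)
token_ids = encode(sample, merges)
print(f"{len(sample.encode('utf-8'))} bytes -> {len(token_ids)} tokens")
```

With a vocab_size of 8192, the same procedure would learn up to 8192 - 256 = 7936 merges on top of the raw byte values. A production tokenizer such as rustbpe does this far more efficiently and on pre-split text chunks.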
