Running autoresearch Hands-On — Overnight Experiments on a Single GPU
From environment setup to agent execution and overnight results analysis. Tuning guide for smaller GPUs and practical tips.

In Part 1, we looked at how Karpathy's autoresearch is structured. Here's the three-line summary:
- A single `train.py` contains the GPT model + optimizer + training loop.
- An AI agent (Claude Code, etc.) modifies this file, trains for 5 minutes, and keeps the change if val_bpb improves; otherwise it discards the change.
- `program.md` defines the agent's behavior rules. Humans only edit this markdown file.
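The keep-or-discard rule in the summary can be sketched as a small greedy loop. This is only an illustration, not the project's actual code: `run_experiments`, `train_and_eval`, and the candidate names are hypothetical, and it assumes that "improves" means a lower val_bpb.

```python
from typing import Callable, Iterable, List, Tuple


def run_experiments(
    baseline_bpb: float,
    candidates: Iterable[str],
    train_and_eval: Callable[[str], float],
) -> Tuple[float, List[str]]:
    """Greedy keep-or-discard loop (assumption: lower val_bpb is better)."""
    best_bpb = baseline_bpb
    kept: List[str] = []
    for change in candidates:
        # Hypothetical stand-in for "edit train.py, then train for 5 minutes".
        val_bpb = train_and_eval(change)
        if val_bpb < best_bpb:
            # Improvement: keep the change and raise the bar for the next one.
            best_bpb = val_bpb
            kept.append(change)
        # No improvement: the change is dropped and best_bpb stays as it was.
    return best_bpb, kept


# Toy data only: these val_bpb values are made up for the example.
fake_results = {"wider-mlp": 1.02, "higher-lr": 0.98, "no-warmup": 0.99}
best, kept = run_experiments(1.00, fake_results, fake_results.__getitem__)
print(best, kept)
```

With the toy values, only the second candidate beats the running best, so it is the only change kept. Note that the third candidate beats the original baseline but not the updated best, so it is discarded.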
In Part 2, we'll set up the environment, launch the agent, and analyze the results from an overnight run.
Environment Setup — Getting Started
Requirements
| Item | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA GPU (CUDA support) | H100 80GB |
| Python | 3.10+ | 3.12 |
| Package Manager | uv | uv |
| Agent | Claude Code or Codex | Claude Code |
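A quick preflight check against the table might look like the sketch below. It is not part of autoresearch: `check_requirements` is a hypothetical helper, and finding `nvidia-smi` on the PATH is only a rough proxy for a working NVIDIA driver, not proof that CUDA training will run.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Map each requirement from the table to True/False (rough checks only)."""
    return {
        # Python 3.10+ is the stated minimum.
        "python": sys.version_info[:2] >= min_python,
        # uv is the package manager listed in the table.
        "uv": shutil.which("uv") is not None,
        # Assumption: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```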
You don't need an H100. It also runs on an RTX 4090, A100, RTX 3090, and other CUDA-capable NVIDIA GPUs. What changes is how many tokens get processed within the fixed 5-minute budget, so a slower GPU simply sees less data per experiment. We'll cover GPU-specific tuning later.
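To make the fixed-budget point concrete, here is a back-of-the-envelope sketch. The throughput figures are placeholders chosen for easy arithmetic, not measurements of any particular GPU, and `tokens_in_budget` is a hypothetical helper.

```python
BUDGET_SECONDS = 5 * 60  # the fixed 5-minute training budget


def tokens_in_budget(tokens_per_second: float,
                     budget_seconds: int = BUDGET_SECONDS) -> int:
    """Tokens processed in one experiment at a given sustained throughput."""
    return int(tokens_per_second * budget_seconds)


# Placeholder throughputs (NOT benchmarks): a "fast" and a "slow" GPU.
fast = tokens_in_budget(1_000_000)
slow = tokens_in_budget(250_000)
print(fast, slow, fast / slow)
```

Because the wall-clock budget is the same for both, the ratio of tokens seen equals the ratio of throughputs. That is why smaller GPUs need their own tuning rather than the defaults chosen for an H100.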