Roadmap
Your journey to a first break in AI
Your learning path — one journey, step by step. The cohort runs 1 May to 30 June 2026 (two months). Use your AI-based IDE and the community to complete each step. This roadmap is a work in progress; new steps are added as the cohort grows.
Make sure you’ve completed the Checklist — set up your accounts and join the Discord so you can ask questions and share progress as you go.
Step 1
First use of AI for coding
Set up a Quarto blog and host it on GitHub with an about-me page, blog posts, “Today I learned,” and other pages.
Tasks:
- Set up the project locally, link it to a GitHub repo, and configure GitHub Pages for deployment.
- Use your AI-based IDE to complete this setup.
You will learn:
- GitHub basics refresher — branches, commits, PRs, merge conflicts
- Setting up a personal and blogging website with Quarto
- How AI coding tools and SWE agents work in practice
Step 2
Run a model locally
Run Qwen3 0.6B locally in pure C — trace every operation from tokenization to token output.
You will learn:
- Basics of inference: decoding, KV cache, sampling
- Chat templates and system prompts (ChatML format, special tokens)
- Tokenization — why subword units, vocabulary size trade-offs
- Temperature, top-p, and how they control output randomness
- GGUF vs SafeTensors — model file formats and quantization
- Transformer architecture: self-attention, MHA, positional encoding, decoder-only design
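Temperature and top-p are simple to state precisely. Here is a minimal sketch in plain Python, assuming raw logits as input (function name and values are illustrative, not from any library):

```python
import math

def filter_logits(logits, temperature=1.0, top_p=0.9):
    """Temperature-scaled softmax followed by nucleus (top-p) filtering."""
    scaled = [l / temperature for l in logits]   # T < 1 sharpens, T > 1 flattens
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)           # renormalise the survivors
    return {i: probs[i] / mass for i in kept}
```

Sampling then draws from the returned distribution; as temperature approaches zero the distribution collapses onto the single most likely token, which is greedy decoding.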
Guides:
- Run Qwen3 0.6B in pure C — inference from first principles (tokenization, chat templates, attention, KV cache).
- GGUF vs SafeTensors — model weight formats, security, quantization, and why we start with pure C.
Learning resources — follow in this order:
- KV Cache explained (video, ~10 min) — start here. Visual walkthrough of why the KV cache exists, how it grows with context length, and what practical models do to keep it small. Directly answers why GPU memory is the bottleneck in inference.
- The Illustrated Transformer — Jay Alammar — the most referenced visual guide to transformer architecture. Covers embeddings, self-attention, multi-head attention, and positional encoding with animated diagrams. Read this before the paper — it will make the paper feel simple.
- Attention Is All You Need (2017 paper) — the original transformer paper. After reading the Alammar blog, this is approachable. The key contributions: removal of recurrence, self-attention as a core operator, multi-head attention, positional encoding. Shorter than you expect (~15 pages).
- Agentic models are the future — Junyang Lin — former Qwen team lead on why reasoning/thinking models are a stepping stone, and why agentic models (RL-trained, tool-using, multi-step) are where the field is heading. Read this last — it gives you a mental map of where everything you are learning is going.
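The core claim of the KV-cache explainer can be reproduced in a few lines: at each decode step only the new token's key and value are computed; earlier ones sit in an append-only cache and are reused rather than recomputed. A toy single-head sketch in plain Python (names and shapes are illustrative):

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention for one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

class KVCache:
    """Append-only cache: reuses earlier keys/values instead of recomputing them."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)      # memory grows linearly with context length,
        self.values.append(v)    # which is why GPU memory bounds long contexts
        return attend(q, self.keys, self.values)
```

The pure-C guide above walks through the same mechanism with real model tensors.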
Office Hours sessions:
- Qwen3 inference concepts — temperature, chat templates, speculative decoding (13 Mar)
- Why attention? LSTM and RNN limitations (27 Mar)
- Transformer breakthroughs — self-attention, MHA, positional encoding (27 Mar)
- Decoder-only architecture in modern LLMs (27 Mar)
- Model taxonomy — dense vs MoE, precision, thinking vs agentic (27 Mar)
- KV cache — caveats after the explainer (10 Apr)
- Benchmark literacy — reading model cards (10 Apr)
- RoPE in practice — RoFormer paper and code (10 Apr)
Step 3
Inference deep dive
Go beyond running a model — understand how inference works under the hood and how to serve models at scale.
You will learn:
- Inference engines and runtimes (vLLM, TGI, llama.cpp server)
- Batching, continuous batching, and throughput vs. latency trade-offs
- Quantization (GGUF, GPTQ, AWQ) and when to use each format
- Speculative decoding — how draft models speed up large model inference
- Structured output, function calling, and tool use
- Serving and API design for inference endpoints
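Speculative decoding is easy to demystify with toy deterministic "models" (plain functions here). This is a sketch with greedy acceptance, not the probabilistic acceptance rule real systems use, but it shows the shape of the algorithm: the draft proposes cheaply, the target verifies, and output is unchanged from decoding with the target alone.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Toy speculative decoding with greedy acceptance.

    `draft` proposes k tokens cheaply; `target` keeps the longest agreeing
    prefix, then always emits one token itself so progress is guaranteed.
    """
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # 1. draft proposes k tokens autoregressively
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. target verifies the proposal left to right
        ctx = list(seq)
        for t in proposal:
            if target(ctx) != t:
                break
            ctx.append(t)
        seq = ctx
        # 3. target contributes one token of its own
        seq.append(target(seq))
    return seq[: len(prompt) + max_new]
```

Real implementations verify all k drafted tokens in a single batched target forward pass (that is where the speedup comes from) and accept them with a rejection-sampling rule so the output distribution matches the target model exactly.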
Office Hours sessions:
- Unsloth and LLM efficiency (13 Mar)
- Benchmarking in AI (27 Mar)
- The three pillars of model development (27 Mar)
Step 4
Training fundamentals
Build the foundations to train and fine-tune models from scratch — from a single GPU to distributed multi-node setups.
You will learn:
- PyTorch fundamentals: tensors, autograd, training loops
- Modelling: architectures (transformers, attention, MLP), building blocks from scratch
- Data pipelines: datasets, dataloaders, preprocessing, tokenization at scale
- Fine-tuning: LoRA, QLoRA, full fine-tune, adapters — when to use each
- Distributed training: DDP, FSDP, multi-GPU and multi-node setups
- Experiment tracking and evaluation (Weights & Biases, validation loss curves)
- Parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
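One concrete number worth internalising before the fine-tuning topics: LoRA replaces a full weight update (d_out × d_in entries) with a rank-r product B·A, so trainable parameters per matrix drop from d_in·d_out to r·(d_in + d_out). A quick check in Python (the 4096 dimensions are illustrative, roughly one attention projection in a mid-size LLM):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters: full fine-tune of one matrix vs a LoRA update."""
    full = d_in * d_out            # every entry of the weight matrix
    lora = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, lora

full, lora = lora_params(4096, 4096, rank=8)   # 16_777_216 vs 65_536
```

That 256× reduction per matrix is a large part of why LoRA and QLoRA can fine-tune billion-parameter models on a single consumer GPU.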
Projects you’ll be ready for:
- nanoGPT speedrun — train GPT-2 scale to a target validation loss as fast as possible
- Megatron / Picotron — read and understand production-grade distributed training code
Office Hours sessions:
- DDP and the parallelism ladder (27 Mar)
- Speedrun and Auto Research GPT (27 Mar)
- Speedrun — nanoGPT, target loss, Tyler Romero’s worklog (10 Apr)
- DDP — gradients and all-reduce (10 Apr)
- Communication bubble, NVLink, DeepSeek-style constraints (10 Apr)
- Muon optimizer — from contest to production (10 Apr)
- Training script walkthrough — ranks, loops, collectives (10 Apr)
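The "gradients and all-reduce" session boils down to one invariant: after the backward pass, every rank averages its local gradients with all others, so every replica takes the identical optimizer step. A pure-Python simulation of that collective (no real process group, just lists standing in for ranks):

```python
def all_reduce_mean(grads_per_rank):
    """Simulate DDP's gradient all-reduce: sum across ranks, divide by world size."""
    world_size = len(grads_per_rank)
    n = len(grads_per_rank[0])
    summed = [sum(rank[i] for rank in grads_per_rank) for i in range(n)]
    averaged = [s / world_size for s in summed]
    # every rank receives the same averaged gradient
    return [list(averaged) for _ in grads_per_rank]
```

In real DDP this runs as a ring or tree all-reduce over NCCL, bucketed and overlapped with the backward pass; the communication-bubble session covers what happens when that overlap breaks down.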
Step 5 — coming soon
Build an AI product
Ship an AI-powered product end to end — from idea to deployed, monitored application.
You will learn:
- Product thinking: problem → solution → users — scoping a project that can be shipped
- Building with APIs, RAG, agents, and tool use
- Frontend/backend integration for AI features
- Deployment, monitoring, and iteration — keeping it running after launch
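Of the building blocks above, RAG is the most mechanical: embed the query, rank stored document embeddings by similarity, and prepend the top hits to the prompt. The retrieval step can be sketched in plain Python (the embeddings here are hand-made toy vectors, not from a real embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query embedding."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Production systems swap the lists for a vector database and a learned embedding model, but the ranking logic is essentially this.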
Step 6 — coming soon
Capstone project or open-source contribution
Prove what you’ve learned. Pick one: ship a capstone project or make a meaningful contribution to an open-source AI project.
Options:
- Capstone: End-to-end project combining inference, training, or product skills — deployed, documented, and added to your public portfolio
- Open-source contribution: Submit a PR to an AI repo (model, library, dataset, docs) — get reviewed, merged, and credited
- Present your work to the cohort; get peer feedback
Why it matters: A shipped project or merged PR is the strongest signal on your profile when applying for your first AI role.
More steps will be added as the roadmap grows. Suggest new modules via CONTRIBUTING.md or a pull request.