Roadmap
Your journey to a first break in AI
Your learning path — one journey, step by step. The cohort runs 1 May to 30 June 2026 (two months). Use your AI-based IDE and the community to complete each step. This roadmap is a work in progress; new steps are added as the cohort grows.
Make sure you’ve completed the Checklist — set up your accounts and join the Discord so you can ask questions and share progress as you go.
Step 1
First use of AI for coding
Set up a Quarto blog and host it on GitHub with an about-me page, blog posts, “Today I learned,” and other pages.
Tasks:
- Set up the project locally, link it to a GitHub repo, and configure GitHub Pages for deployment.
- Use your AI-based IDE to complete this setup.
You will learn:
- GitHub basics refresher — branches, commits, PRs, merge conflicts
- Setting up a personal and blogging website with Quarto
- How AI coding tools and SWE agents work in practice
Step 2
Run a model locally
Run Qwen3 0.6B locally in pure C — trace every operation from tokenization to token output.
You will learn:
- Basics of inference: decoding, KV cache, sampling
- Chat templates and system prompts (ChatML format, special tokens)
- Tokenization — why subword units, vocabulary size trade-offs
- Temperature, top-p, and how they control output randomness
- GGUF vs SafeTensors — model file formats and quantization
- Transformer architecture: self-attention, MHA, positional encoding, decoder-only design
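Temperature and top-p are simple to state precisely. Here is a minimal sketch in plain Python, assuming raw logits as input (function name and values are illustrative, not from any library):

```python
import math

def filter_logits(logits, temperature=1.0, top_p=0.9):
    """Temperature-scaled softmax followed by nucleus (top-p) filtering."""
    scaled = [l / temperature for l in logits]   # T < 1 sharpens, T > 1 flattens
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)           # renormalise the survivors
    return {i: probs[i] / mass for i in kept}
```

Sampling then draws from the returned distribution; as temperature approaches zero the distribution collapses onto the single most likely token, which is greedy decoding.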
Guides:
- Run Qwen3 0.6B in pure C — inference from first principles (tokenization, chat templates, attention, KV cache).
- GGUF vs SafeTensors — model weight formats, security, quantization, and why we start with pure C.
Learning resources — follow in this order:
- KV Cache explained (video, ~10 min) — start here. Visual walkthrough of why the KV cache exists, how it grows with context length, and what practical models do to keep it small. Directly answers why GPU memory is the bottleneck in inference.
- The Illustrated Transformer — Jay Alammar — the most referenced visual guide to transformer architecture. Covers embeddings, self-attention, multi-head attention, and positional encoding with animated diagrams. Read this before the paper — it will make the paper feel simple.
- Attention Is All You Need (2017 paper) — the original transformer paper. After reading the Alammar blog, this is approachable. The key contributions: removal of recurrence, self-attention as a core operator, multi-head attention, positional encoding. Shorter than you expect (~15 pages).
- Agentic models are the future — Junyang Lin — former Qwen team lead on why reasoning/thinking models are a stepping stone, and why agentic models (RL-trained, tool-using, multi-step) are where the field is heading. Read this last — it gives you a mental map of where everything you are learning is going.
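The core claim of the KV-cache explainer can be reproduced in a few lines: at each decode step only the new token's key and value are computed; earlier ones sit in an append-only cache and are reused rather than recomputed. A toy single-head sketch in plain Python (names and shapes are illustrative):

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention for one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

class KVCache:
    """Append-only cache: reuses earlier keys/values instead of recomputing them."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)      # memory grows linearly with context length,
        self.values.append(v)    # which is why GPU memory bounds long contexts
        return attend(q, self.keys, self.values)
```

The pure-C guide above walks through the same mechanism with real model tensors.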
Office Hours sessions:
- Qwen3 inference concepts — temperature, chat templates, speculative decoding (13 Mar)
- Why attention? LSTM and RNN limitations (27 Mar)
- Transformer breakthroughs — self-attention, MHA, positional encoding (27 Mar)
- Decoder-only architecture in modern LLMs (27 Mar)
- Model taxonomy — dense vs MoE, precision, thinking vs agentic (27 Mar)
- KV cache — caveats after the explainer (10 Apr)
- Benchmark literacy — reading model cards (10 Apr)
- RoPE in practice — RoFormer paper and code (10 Apr)
Step 3
Inference deep dive
Go beyond running a model — understand how inference works under the hood and how to serve models at scale.
You will learn:
- Inference engines and runtimes (vLLM, TGI, llama.cpp server)
- Batching, continuous batching, and throughput vs. latency trade-offs
- Quantization (GGUF, GPTQ, AWQ) and when to use each format
- Speculative decoding — how draft models speed up large model inference
- Structured output, function calling, and tool use
- Serving and API design for inference endpoints
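Speculative decoding is easy to demystify with toy deterministic "models" (plain functions here). This is a sketch with greedy acceptance, not the probabilistic acceptance rule real systems use, but it shows the shape of the algorithm: the draft proposes cheaply, the target verifies, and output is unchanged from decoding with the target alone.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Toy speculative decoding with greedy acceptance.

    `draft` proposes k tokens cheaply; `target` keeps the longest agreeing
    prefix, then always emits one token itself so progress is guaranteed.
    """
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # 1. draft proposes k tokens autoregressively
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. target verifies the proposal left to right
        ctx = list(seq)
        for t in proposal:
            if target(ctx) != t:
                break
            ctx.append(t)
        seq = ctx
        # 3. target contributes one token of its own
        seq.append(target(seq))
    return seq[: len(prompt) + max_new]
```

Real implementations verify all k drafted tokens in a single batched target forward pass (that is where the speedup comes from) and accept them with a rejection-sampling rule so the output distribution matches the target model exactly.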
Office Hours sessions:
- Unsloth and LLM efficiency (13 Mar)
- Benchmarking in AI (27 Mar)
- The three pillars of model development (27 Mar)
Step 4
Training fundamentals
Build the foundations to train and fine-tune models from scratch — from a single GPU to distributed multi-node setups.
You will learn:
- PyTorch fundamentals: tensors, autograd, training loops
- Modelling: architectures (transformers, attention, MLP), building blocks from scratch
- Data pipelines: datasets, dataloaders, preprocessing, tokenization at scale
- Fine-tuning: LoRA, QLoRA, full fine-tune, adapters — when to use each
- Distributed training: DDP, FSDP, multi-GPU and multi-node setups
- Experiment tracking and evaluation (Weights & Biases, validation loss curves)
- Parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
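One concrete number worth internalising before the fine-tuning topics: LoRA replaces a full weight update (d_out × d_in entries) with a rank-r product B·A, so trainable parameters per matrix drop from d_in·d_out to r·(d_in + d_out). A quick check in Python (the 4096 dimensions are illustrative, roughly one attention projection in a mid-size LLM):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters: full fine-tune of one matrix vs a LoRA update."""
    full = d_in * d_out            # every entry of the weight matrix
    lora = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, lora

full, lora = lora_params(4096, 4096, rank=8)   # 16_777_216 vs 65_536
```

That 256× reduction per matrix is a large part of why LoRA and QLoRA can fine-tune billion-parameter models on a single consumer GPU.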
Projects you’ll be ready for:
- nanoGPT speedrun — train GPT-2 scale to a target validation loss as fast as possible
- Megatron / Picotron — read and understand production-grade distributed training code
Office Hours sessions:
- DDP and the parallelism ladder (27 Mar)
- Speedrun and Auto Research GPT (27 Mar)
- Speedrun — nanoGPT, target loss, Tyler Romero’s worklog (10 Apr)
- DDP — gradients and all-reduce (10 Apr)
- Communication bubble, NVLink, DeepSeek-style constraints (10 Apr)
- Muon optimizer — from contest to production (10 Apr)
- Training script walkthrough — ranks, loops, collectives (10 Apr)
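The "gradients and all-reduce" session boils down to one invariant: after the backward pass, every rank averages its local gradients with all others, so every replica takes the identical optimizer step. A pure-Python simulation of that collective (no real process group, just lists standing in for ranks):

```python
def all_reduce_mean(grads_per_rank):
    """Simulate DDP's gradient all-reduce: sum across ranks, divide by world size."""
    world_size = len(grads_per_rank)
    n = len(grads_per_rank[0])
    summed = [sum(rank[i] for rank in grads_per_rank) for i in range(n)]
    averaged = [s / world_size for s in summed]
    # every rank receives the same averaged gradient
    return [list(averaged) for _ in grads_per_rank]
```

In real DDP this runs as a ring or tree all-reduce over NCCL, bucketed and overlapped with the backward pass; the communication-bubble session covers what happens when that overlap breaks down.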
Step 5 — coming soon
Build an AI product
Ship an AI-powered product end to end — from idea to deployed, monitored application.
You will learn:
- Product thinking: problem → solution → users — scoping a project that can be shipped
- Building with APIs, RAG, agents, and tool use
- Frontend/backend integration for AI features
- Deployment, monitoring, and iteration — keeping it running after launch
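Of the building blocks above, RAG is the most mechanical: embed the query, rank stored document embeddings by similarity, and prepend the top hits to the prompt. The retrieval step can be sketched in plain Python (the embeddings here are hand-made toy vectors, not from a real embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query embedding."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Production systems swap the lists for a vector database and a learned embedding model, but the ranking logic is essentially this.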
Step 6 — coming soon
Capstone project or open-source contribution
Prove what you’ve learned. Pick one: ship a capstone project or make a meaningful contribution to an open-source AI project.
Options:
- Capstone: End-to-end project combining inference, training, or product skills — deployed, documented, and added to your public portfolio
- Open-source contribution: Submit a PR to an AI repo (model, library, dataset, docs) — get reviewed, merged, and credited
- Present your work to the cohort; get peer feedback
Why it matters: A shipped project or merged PR is the strongest signal on your profile when applying for your first AI role.
More steps will be added as the roadmap grows. Suggest new modules via CONTRIBUTING.md or a pull request.