EAG V3 Curriculum

Act 1 — Foundations

01 Foundations of Transformer Architecture

Understand the neural networks, attention mechanisms, and positional encodings that power modern AI. You can't debug what you don't understand.

Core Topics

Neural networks and the Universal Approximation Theorem
Backpropagation walkthrough
The attention mechanism: why it works, what it computes
Multi-head attention and what different heads learn
Positional encoding: why position matters in sequences
Embeddings: from discrete tokens to continuous space
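
As a rough illustration of what the attention mechanism computes, the sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in plain Python. The toy token vectors and the function names are assumptions chosen for illustration; a real model would use learned projections and a tensor library.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy "embeddings" for three tokens (made-up numbers, self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, which is why every output vector is a weighted average of the value vectors.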

Sample Assignment

Chrome extension that interacts with an LLM API.

02 Modern LLM Internals & The 2026 Model Landscape

Learn tokenization, scaling laws, RLHF alignment, and the current state of reasoning and multi-modal models.

Core Topics

Tokenization deep dive (BPE, SentencePiece)
Scaling laws: Chinchilla, compute-optimal training
Causal language modeling, pre-training objectives
SFT, RLHF, DPO — aligning models to follow instructions
Emergent abilities: phase transitions at scale thresholds
NEW The 2026 model landscape — reasoning models, multi-modal models, local models
NEW What models can and cannot do — setting realistic expectations
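
To make the BPE item concrete, here is a minimal sketch of the merge-learning loop on a toy corpus. It is a teaching simplification: ties are broken alphabetically (an assumption made for determinism), and there is no byte-level fallback or end-of-word marker as production tokenizers typically have.

```python
from collections import Counter


def most_frequent_pair(words):
    """words maps a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return None
    # Tie-break on the pair itself so the result is deterministic (assumption).
    return max(pairs, key=lambda p: (pairs[p], p))


def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with the joined symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    # Start from characters: each word is a tuple of single-character symbols.
    words = {tuple(w): c for w, c in Counter(corpus.split()).items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, vocab = learn_bpe("low low low lower lowest", 2)
print(merges)
```

After two merges the frequent word "low" has collapsed into a single symbol, while the rarer suffixes remain split into characters.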

Sample Assignment

Enhanced Chrome extension using Gemini Flash / Claude Haiku with streaming.

03 Developer Foundations & Your First Agent

Build Python and Node.js skills, then create your first goal-directed agent with a working web UI.

Core Topics

Python essentials: async/await, decorators, type hints, dataclasses
NEW Node.js/NPM basics: project setup, package.json, Express server
NEW UI fundamentals: React (or vanilla JS) + Vite
Three pillars of agency: goal-directed behavior, interactive capacity, autonomous decision-making
LLM vs RAG vs Agents — the spectrum
Build your first agent step by step (perception → decision → action)
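
The perception → decision → action loop can be sketched in a few lines. In this hedged example the `decide` method is a hard-coded placeholder standing in for an LLM call, and the `Agent` class, the `add` tool, and the `result:` prefix are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


def add_numbers(expression: str) -> str:
    """Hypothetical tool: adds two integers written as 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


@dataclass
class Agent:
    goal: str
    tools: dict[str, Callable[[str], str]]
    history: list[str] = field(default_factory=list)

    def perceive(self, observation: str) -> None:
        # Perception: record what the agent has seen so far.
        self.history.append(observation)

    def decide(self) -> tuple[str, str]:
        # Decision: placeholder policy; a real agent would prompt an LLM here.
        last = self.history[-1]
        if last.startswith("result:"):
            return "finish", last.removeprefix("result:").strip()
        return "add", self.goal

    def act(self, tool_name: str, tool_input: str) -> str:
        # Action: invoke the chosen tool and wrap its output as an observation.
        return "result: " + self.tools[tool_name](tool_input)

    def run(self, max_steps: int = 5) -> str:
        self.perceive(f"goal: {self.goal}")
        for _ in range(max_steps):
            name, arg = self.decide()
            if name == "finish":
                return arg
            self.perceive(self.act(name, arg))
        raise RuntimeError("step budget exhausted")


agent = Agent(goal="2 + 3", tools={"add": add_numbers})
answer = agent.run()
print(answer)  # prints 5
```

The `max_steps` bound is a deliberate design choice: even a toy loop should not be able to run forever.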

Sample Assignment

Full-stack agent with a working web UI — backend in Python, frontend in Node.js/React. Agent takes a goal and executes it with at least one tool.

04 MCP — The Tool Protocol

Master the Model Context Protocol for tool registration, discovery, and invocation across servers.

Core Topics

The journey: Foundation LLMs → Function Calling → Agentic AI → MCP
MCP Server/Client architecture
stdio vs SSE transport, JSON-RPC communication
Tool registration, invocation, and result handling
Building MCP servers in Python AND TypeScript
Dynamic tool discovery: summary layer, hint-based filtering
NEW MCP as the first layer of the protocol stack (MCP → A2A → A2UI/AG-UI)
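
The registration, discovery, and invocation flow can be illustrated with a toy in-process JSON-RPC dispatcher. This sketch does not use the official MCP SDKs and omits the transport and the initialization handshake; the `tools/list` and `tools/call` method names and the result shapes follow the MCP specification as best understood here, while the `get_weather` tool and its canned reply are hypothetical.

```python
import json

# Hypothetical tool registry: name -> metadata plus a Python handler.
TOOLS = {
    "get_weather": {
        "description": "Return a canned forecast for a city (illustrative only).",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": lambda args: f"Sunny in {args['city']}",
    }
}


def _error(req_id, code, message):
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    req = json.loads(raw)
    method, params = req["method"], req.get("params", {})
    if method == "tools/list":
        # Discovery: advertise every registered tool without its handler.
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req["id"], -32602, "Unknown tool")
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(req["id"], -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "get_weather", "arguments": {"city": "Pune"}}}
print(handle(json.dumps(request)))
```

Over stdio the same strings would travel as newline-delimited messages; the dispatcher logic stays the same regardless of transport.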

Sample Assignment

Build a custom MCP server wrapping a real-world API (weather, stocks, email, calendar; each student picks a different one). Connect it to your agent from Session 3.

Act 2 — Intelligence

05 Planning, Reasoning & Structured Prompting

Implement Chain-of-Thought, ReAct patterns, and self-validating task decomposition for intelligent agents.

Core Topics

Chain-of-Thought: activating latent reasoning circuits
ReAct: interleaving reasoning with tool actions
Structured prompting formats: input-output templates, step-labeled reasoning
Self-validation: agents that check their own work
Task decomposition and dependency-aware planning
NEW When to let the model reason vs. when to enforce structure
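
A minimal sketch of the ReAct loop, interleaving model "thoughts" with tool calls and observations, is shown below. The `Action: tool[input]` and `Final Answer:` text format is one common convention and is assumed here; `scripted_model` is a stand-in for a real LLM call, and the `length` tool is hypothetical.

```python
import re

# Assumed action syntax: "Action: tool_name[tool input]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(question, model, tools, max_steps=4):
    """Run a bounded thought/action/observation loop and return (answer, transcript)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip(), transcript
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = match.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # Feed the tool result back so the next model call can reason over it.
        transcript += f"Observation: {observation}\n"
    return None, transcript


def scripted_model(transcript):
    # Stand-in for an LLM: first asks for a tool, then answers from the observation.
    if "Observation:" not in transcript:
        return "Thought: I should count the letters.\nAction: length[agent]"
    observed = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The tool answered.\nFinal Answer: {observed}"


answer, transcript = react("How many letters are in 'agent'?", scripted_model,
                           {"length": lambda s: str(len(s))})
print(answer)  # prints 5
```

The strict action regex is an example of enforcing structure around the model, while the free-form "Thought" lines are where it is left to reason.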

Sample Assignment

Multi-step agent that decomposes a complex goal, plans tool usage, executes, and self-validates results.

06 Cognitive Architecture & Adaptive Planning

Design the 4-layer cognitive pipeline with strategy profiles and adaptive retry loops.

Core Topics

4-layer cognitive pipeline: Perception → Memory → Decision → Action
Pydantic for typed data flow between layers
Strategy profiles: Conservative, Exploratory, Fallback
Agent-written Python plans: LLM generates solve() functions
Adaptive planning loops: controlled retry, tool switching, bounded retries
Memory-aware planning and planner introspection
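
The idea of strategy profiles driving a bounded, tool-switching retry loop might look like the sketch below. The retry counts and the exact meaning given to each profile are assumptions for illustration, only two of the three named profiles are modelled, and standard-library dataclasses stand in for Pydantic models to keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyProfile:
    name: str
    max_retries: int    # extra attempts after the first one (assumed semantics)
    switch_tools: bool  # whether a retry may move on to the next tool


# Hypothetical settings; a real system would tune these per problem type.
PROFILES = {
    "conservative": StrategyProfile("conservative", max_retries=1, switch_tools=False),
    "exploratory": StrategyProfile("exploratory", max_retries=3, switch_tools=True),
}


def run_step(task, tools, profile):
    """tools is an ordered list of (name, callable); returns (result, attempt log)."""
    attempts = []
    candidates = tools if profile.switch_tools else tools[:1]
    for i in range(profile.max_retries + 1):  # hard bound on total attempts
        name, tool = candidates[i % len(candidates)]
        try:
            result = tool(task)
        except Exception as exc:
            attempts.append((name, f"failed: {exc}"))
            continue
        attempts.append((name, "ok"))
        return result, attempts
    return None, attempts


def flaky(task):
    raise TimeoutError("primary tool timed out")


def backup(task):
    return f"done: {task}"


tools = [("primary", flaky), ("backup", backup)]
safe_result, safe_log = run_step("fetch report", tools, PROFILES["conservative"])
bold_result, bold_log = run_step("fetch report", tools, PROFILES["exploratory"])
print(safe_result, bold_result)
```

The conservative profile gives up after retrying the same tool, while the exploratory profile recovers by switching tools, which is the behavioural difference the profiles are meant to encode.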

Sample Assignment

4-module agent with user preference input, strategy selection, and adaptive retry across different problem types.

07 Memory Systems & Modern RAG

Build 3-tier memory (preferences, episodic, factual) with hybrid retrieval and semantic chunking.

Core Topics

Hybrid retrieval (semantic + BM25 + RRF) — modernized beyond basic FAISS
When RAG beats huge context windows (and when it doesn't)
Embedding models (Gemini, Nomic, Ollama-based)
Semantic chunking strategies
3-tier memory: REMME (Preferences), Episodic (Recipes), Factual (Knowledge)
Memory injection into agent prompts at planning time
NEW MarkItDown, Trafilatura, PyMuPDF4LLM for document processing
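
Reciprocal Rank Fusion (RRF), the step that combines the semantic and BM25 result lists, is small enough to sketch directly. Each document scores the sum of 1 / (k + rank) over the lists it appears in; k = 60 is the commonly used default, and the document IDs below are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["doc_a", "doc_b", "doc_c"]  # e.g. from a vector index
bm25 = ["doc_b", "doc_d", "doc_a"]      # e.g. from a keyword index
fused = reciprocal_rank_fusion([semantic, bm25])
print(fused)  # prints ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only ranks, it needs no score normalization between the two retrievers, which is a large part of its appeal for hybrid retrieval.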

Sample Assignment

Agent with 3-tier memory that learns preferences, recalls past workflows, and retrieves facts. Demonstrate persistence across sessions.

08 Multi-Agent Systems & DAG Architecture

Coordinate multiple agents using directed acyclic graphs with parallel execution and session persistence.

Core Topics

Single vs multi-agent trade-offs
Coordination patterns: Parallel, Sequential, Loop, Router
“Don't build loops; build graphs” — the paradigm shift
NetworkX DiGraph as the execution substrate
Topological sorting for parallel-safe execution
Blackboard architecture via shared session state
NEW Fallback strategies — 3x code_variants per step, fallback nodes
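
Topological sorting for parallel-safe execution can be sketched with the standard library. The curriculum names NetworkX as the execution substrate; this example deliberately swaps in `graphlib.TopologicalSorter` to stay dependency-free, and the step names in the plan are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_layers(dependencies):
    """Group DAG nodes into layers; nodes in one layer can run concurrently.

    dependencies maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are done
        layers.append(ready)
        sorter.done(*ready)
    return layers


plan = {
    "research": set(),
    "outline": {"research"},
    "fact_check": {"research"},
    "draft": {"outline"},
    "publish": {"draft", "fact_check"},
}
layers = parallel_layers(plan)
print(layers)
```

Here `outline` and `fact_check` land in the same layer because both depend only on `research`, so an executor could dispatch them in parallel.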

Sample Assignment

Multi-agent DAG executor with 3+ agent types supporting parallel execution, session persistence, and resumption after interruption.

Act 3 — Capabilities

09 Browser Agents & Autonomous Web

Automate web browsing with Playwright, vision-capable navigation, and multi-source research pipelines.

Core Topics

Playwright-based browser automation (headless and headed)
Chrome DevTools Protocol (CDP) integration
Waterfall search strategy across 5+ engines
Triple extraction: Trafilatura vs Readability vs BeautifulSoup
Vision-capable browsing: screenshots + VLM analysis
Autonomous navigation: form filling, multi-page workflows
NEW Browser profiles, session persistence, cookie management
NEW Anti-detection patterns and ethical scraping
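
One plausible reading of the waterfall search strategy is "try engines in priority order and fall through to the next one on failure or thin results". The sketch below implements that interpretation; the engine functions are stubs returning placeholder URLs, and the `min_results` threshold is an assumed parameter.

```python
def waterfall_search(query, engines, min_results=3):
    """Query engines in order until enough unique results are collected.

    engines is an ordered list of (name, search_fn); search_fn returns URLs.
    """
    collected, log = [], []
    for name, search in engines:
        try:
            hits = search(query)
        except Exception as exc:
            # A failing engine is logged and skipped, not fatal.
            log.append((name, f"error: {exc}"))
            continue
        added = 0
        for hit in hits:
            if hit not in collected:  # de-duplicate across engines
                collected.append(hit)
                added += 1
        log.append((name, f"{added} new"))
        if len(collected) >= min_results:
            break  # enough results; lower-priority engines are never called
    return collected, log


# Stub engines standing in for real search backends.
def engine_down(query):
    raise ConnectionError("timeout")


def engine_sparse(query):
    return ["https://example.com/a"]


def engine_rich(query):
    return ["https://example.com/a", "https://example.com/b", "https://example.com/c"]


def engine_unused(query):
    return ["https://example.com/z"]


results, log = waterfall_search("agent frameworks", [
    ("down", engine_down), ("sparse", engine_sparse),
    ("rich", engine_rich), ("unused", engine_unused),
])
print(results)
print(log)
```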

Sample Assignment

Agent that autonomously researches a topic across 5+ sources, extracts structured data, compares results, and generates a synthesis report.

10 Computer Use & Desktop Agents

Control desktop applications using screen understanding, accessibility trees, and OS-level automation.

Core Topics

Anthropic Computer Use API: a widely used reference for screen interaction
Screen understanding with VLMs
UI element detection: YOLO/ONNX model for buttons, text fields, menus
Accessibility tree integration
Multi-modal perception pipeline: screenshot → YOLO → VLM → action
NEW Cross-platform considerations (macOS, Linux, Windows)
NEW Application-specific automation patterns

Sample Assignment

Agent that operates a desktop application to complete a real task using vision + accessibility tree, not just scripted clicks.

11 Channel Architecture, Voice & Gateway (New)

Connect agents to WhatsApp, Slack, Discord, voice, and 20+ channels through a unified adapter pattern.

Core Topics

The channel adapter pattern: one interface, many implementations
Gateway architecture: WebSocket control plane, session management
Building adapters for: WhatsApp, Telegram, Slack, Discord, Signal, Teams, LINE, IRC, Matrix, and more
Voice as first-class modality: STT, TTS, real-time bidirectional conversation
Multi-channel inbox: unified message handling
Daemon installation: launchd (macOS), systemd (Linux)
NEW Device nodes — companion apps exposing camera, screen, location
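
The channel adapter pattern ("one interface, many implementations") can be sketched with an abstract base class and a small gateway that routes through it. The payload shapes below are invented for illustration and do not match any real messaging platform's API; all class and field names are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Message:
    channel: str
    sender: str
    text: str


class ChannelAdapter(ABC):
    """One interface that every channel implements."""
    name: str

    @abstractmethod
    def parse_inbound(self, payload: dict) -> Message: ...

    @abstractmethod
    def format_outbound(self, text: str) -> dict: ...


class JsonWebhookAdapter(ChannelAdapter):
    name = "webhook"

    def parse_inbound(self, payload):
        return Message(self.name, payload["user"], payload["text"])

    def format_outbound(self, text):
        return {"text": text}


class PlainTextAdapter(ChannelAdapter):
    name = "plaintext"

    def parse_inbound(self, payload):
        # Assumed wire format: "sender: message".
        sender, _, text = payload["raw"].partition(": ")
        return Message(self.name, sender, text)

    def format_outbound(self, text):
        return {"raw": f"agent: {text}"}


class Gateway:
    """Routes ingress to the agent and the reply back out via the same adapter."""

    def __init__(self, agent, adapters):
        self.agent = agent
        self.adapters = {adapter.name: adapter for adapter in adapters}

    def handle(self, channel, payload):
        adapter = self.adapters[channel]
        message = adapter.parse_inbound(payload)
        return adapter.format_outbound(self.agent(message))


def echo_agent(message):
    return f"Hello {message.sender}, you said: {message.text}"


gateway = Gateway(echo_agent, [JsonWebhookAdapter(), PlainTextAdapter()])
print(gateway.handle("webhook", {"user": "ana", "text": "hi"}))
print(gateway.handle("plaintext", {"raw": "bo: status?"}))
```

The agent never sees channel-specific payloads, so adding a new channel means writing one adapter class and nothing else.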

Sample Assignment

Each student/group picks a different channel to integrate. Build a complete adapter that connects to your agent pipeline with message ingress, formatting, and reply routing.

12 Error Correction, Safety & Container Isolation

Implement circuit breakers, JSON repair, and Docker-based sandboxing for safe agent execution.

Core Topics

JSON repair pipeline: fenced-block extraction, balanced-brace matching, then json_repair
Code variants resilience: 3 attempts per step
State machine design: pending → running → completed | failed | stopped
Circuit breaker pattern: CLOSED → OPEN → HALF_OPEN
Container-first isolation: Docker, Apple Container
Per-agent isolated filesystem with explicit mount policies
Cost management: threshold enforcement, budget-aware execution
NEW Security logging and audit trails
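
The CLOSED → OPEN → HALF_OPEN circuit breaker can be sketched as a small state machine. The thresholds and the injectable `clock` parameter are assumptions made for illustration and testability, not settings prescribed by the course.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # allow a single trial call through
            else:
                raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.state = "CLOSED"
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


def failing_tool():
    raise ValueError("tool crashed")


now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
for _ in range(2):
    try:
        breaker.call(failing_tool)
    except ValueError:
        pass
print(breaker.state)  # prints OPEN
```

While the circuit is open, calls fail fast without touching the misbehaving tool, which is what keeps a failing dependency from consuming an agent's retry and cost budget.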

Sample Assignment

Container-isolated agent execution with circuit breaker. Demonstrate that a misbehaving agent cannot access the host system.

Act 4 — Interoperability

13 A2A — Agent-to-Agent Protocol (New)

Enable cross-vendor agent collaboration using Google's Agent2Agent protocol with capability discovery and task delegation.

Core Topics

Why A2A? Cross-vendor agent coordination
Google's Agent2Agent protocol (50+ partners, Linux Foundation governance)
Agent Cards: JSON capability advertisements for discovery
JSON-RPC 2.0 over HTTP(S) communication
Three interaction modes: synchronous, streaming (SSE), async push
Building A2A servers and clients
Federated agent systems across organizations
NEW gRPC support, signed security cards
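
Capability discovery through Agent Cards could be sketched roughly as follows. The cards here are a simplified, hand-written subset of fields (name, url, capabilities, skills with tags) modelled loosely on the A2A specification; the agents, URLs, and the `find_agents` helper are hypothetical, and the exact schema should be checked against the current spec before relying on it.

```python
# Simplified, illustrative Agent Cards; not a complete or validated A2A schema.
CARDS = [
    {
        "name": "travel-planner",
        "url": "https://agents.example.com/travel",
        "capabilities": {"streaming": True},
        "skills": [{"id": "plan_trip", "name": "Plan a trip", "tags": ["travel", "planning"]}],
    },
    {
        "name": "invoice-bot",
        "url": "https://agents.example.com/invoices",
        "capabilities": {"streaming": False},
        "skills": [
            {"id": "parse_invoice", "name": "Parse invoice", "tags": ["finance", "ocr"]},
            {"id": "plan_budget", "name": "Plan budget", "tags": ["finance", "planning"]},
        ],
    },
]


def find_agents(cards, required_tag, need_streaming=False):
    """Return (agent name, skill id) pairs whose skill tags include required_tag."""
    matches = []
    for card in cards:
        streaming = card.get("capabilities", {}).get("streaming", False)
        if need_streaming and not streaming:
            continue
        for skill in card.get("skills", []):
            if required_tag in skill.get("tags", []):
                matches.append((card["name"], skill["id"]))
    return matches


print(find_agents(CARDS, "planning"))
print(find_agents(CARDS, "planning", need_streaming=True))
```

A delegating agent would then send its task to the `url` of the chosen card using JSON-RPC 2.0 over HTTP(S), as listed in the topics above.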

Sample Assignment

Build an A2A-compliant agent that other students' agents can discover and invoke. Demonstrate cross-agent task delegation to at least 2 other agents.

14 A2UI / AG-UI — Agent-to-User Interface (New)

Build agents that generate dynamic, interactive UIs at runtime using declarative and event-based protocols.

Core Topics

The third protocol layer: MCP + A2A + A2UI/AG-UI
A2UI (Google): declarative components, native rendering, security-first
AG-UI (CopilotKit/Oracle/Microsoft): event-based streaming, ~16 event types
Generative UI patterns: agents creating interfaces at runtime
Canvas/live visual runtime: WebSocket-synchronized surfaces
A2UI vs AG-UI — when to use which
NEW The Vercel v0 model — agents as UI generators

Sample Assignment

Agent that generates dynamic, interactive UIs — e.g., custom dashboards or comparison tables with interactive filters using A2UI or AG-UI protocol.

15 Model Routing, Agent Economics & Observability (New)

Implement intelligent multi-model routing with cost tracking, budget controls, and OpenTelemetry instrumentation.

Core Topics

Multi-model landscape: frontier (Opus, GPT-5), mid-tier (Sonnet, Flash), local (Llama, Phi, Qwen)
Role-based model selection and ModelManager
Intelligent routing: task complexity → automatic model selection
Cost tracking: per-request, per-agent, per-session metering
Prompt caching strategies for cost/latency optimization
OpenTelemetry: spans, traces, Jaeger visualization
NEW Budget-aware autonomous agents that optimize their own cost
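
Complexity-based routing with per-tier cost metering might be sketched as below. The tier names, prices, complexity thresholds, and keyword heuristic are all hypothetical placeholders; a real router would likely use a classifier or a cheap model call rather than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_complexity: int       # highest complexity score this tier should handle
    usd_per_1k_tokens: float  # made-up price, for illustration only


# Ordered cheapest first so the router picks the cheapest adequate tier.
TIERS = [
    ModelTier("local", 3, 0.0),
    ModelTier("mid-tier", 7, 0.002),
    ModelTier("frontier", 10, 0.03),
]


def estimate_complexity(task: str) -> int:
    """Crude heuristic (assumption): length plus reasoning-heavy keywords."""
    score = min(len(task.split()) // 10, 5)
    score += 3 * sum(kw in task.lower() for kw in ("prove", "plan", "refactor"))
    return min(score, 10)


@dataclass
class Router:
    spent: dict = field(default_factory=dict)  # tier name -> USD spent so far

    def route(self, task: str, tokens: int):
        complexity = estimate_complexity(task)
        tier = next(t for t in TIERS if complexity <= t.max_complexity)
        cost = tokens / 1000 * tier.usd_per_1k_tokens
        self.spent[tier.name] = self.spent.get(tier.name, 0.0) + cost
        return tier.name, cost

    def total(self) -> float:
        return sum(self.spent.values())


router = Router()
easy = router.route("Summarize this sentence.", tokens=500)
hard = router.route("Plan and refactor the billing module", tokens=2000)
print(easy, hard, router.total())
```

Comparing `router.total()` with the cost of sending every request to the most expensive tier gives the kind of savings figure the assignment asks for.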

Sample Assignment

Intelligent model router with cost dashboard. Auto-select between 3+ models based on task complexity and demonstrate cost savings versus always using a frontier model.

Act 5 — Autonomy & Production

16 Event-Driven Autonomous Agents

Shift from reactive to proactive agents that monitor event streams, evaluate relevance, and act autonomously.

Core Topics

From reactive to proactive: “ask → answer” to “event → decide → act”
Cron jobs, webhooks, Gmail Pub/Sub
Event bus architecture: publish-subscribe with history replay
Autonomous decision-making: evaluate relevance, decide whether to act
Karpathy's autoresearch principles: “never stop” autonomy, constraint design, markdown-as-code
Fixed metrics for evaluation, accept/reject with git
NEW Real-time telemetry streaming during autonomous operation
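
A publish-subscribe event bus with history replay, plus an "event → decide → act" handler, can be sketched in a few lines. The topic name, event fields, and relevance rule are hypothetical, and the in-memory history is an assumption; a production bus would persist events.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)
        self.history = []  # every (topic, payload) ever published, in order

    def publish(self, topic, payload):
        self.history.append((topic, payload))
        for handler in list(self.subscribers[topic]):
            handler(payload)

    def subscribe(self, topic, handler, replay=False):
        if replay:
            # Late subscribers can catch up on events they missed.
            for past_topic, past_payload in self.history:
                if past_topic == topic:
                    handler(past_payload)
        self.subscribers[topic].append(handler)


actions = []


def on_ci_event(event):
    # Decide: only failed builds are relevant; everything else is ignored.
    if event["status"] == "failed":
        # Act: recorded as a string here instead of calling a real API.
        actions.append(f"open issue for {event['repo']}")


bus = EventBus()
bus.publish("ci", {"repo": "demo", "status": "failed"})
bus.publish("ci", {"repo": "demo", "status": "passed"})
bus.subscribe("ci", on_ci_event, replay=True)  # replays the two earlier events
bus.publish("ci", {"repo": "other", "status": "failed"})
print(actions)  # prints ['open issue for demo', 'open issue for other']
```

The `actions` list doubles as a tiny audit log: every autonomous action is recorded in a form a human can review afterwards.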

Sample Assignment

Agent that monitors real event streams (GitHub webhooks, email, or custom) and acts autonomously for at least 1 hour, producing a human-reviewable audit log.

17 Agentic Coding & Markdown-as-Code Skills

Build coding agents with System 2 reasoning, codebase navigation, and markdown-driven skill injection.

Core Topics

How coding agents work: Claude Code, Cursor, Windsurf architecture
File system awareness, diff generation, test running, git integration
Context management across large codebases
System 2 Reasoning Engine: Draft-Verify-Refine loop
Markdown-as-Code Skills: GenericSkill reads SKILL.md files
Karpathy: “You are programming the program.md”
NEW JitRL Query Optimizer — rewriting queries before planning
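
The Draft-Verify-Refine loop can be illustrated with stubbed-out model calls. In this sketch `draft` and `refine` are hard-coded stand-ins for an LLM, `verify` plays the role of the test runner, and the buggy `add` function is a contrived example. Note that it uses `exec` on generated code purely for demonstration; real agent-written code should run inside a sandbox.

```python
def draft():
    # Stand-in for a model's first attempt; it contains a deliberate bug.
    return "def add(a, b):\n    return a - b\n"


def verify(source):
    """Run the candidate against a fixed test and return (passed, feedback)."""
    namespace = {}
    exec(source, namespace)  # demonstration only; sandbox this in practice
    got = namespace["add"](2, 3)
    if got == 5:
        return True, "all tests passed"
    return False, f"add(2, 3) returned {got}, expected 5"


def refine(source, feedback):
    # Stand-in for a model revising its draft in response to test feedback.
    return source.replace("a - b", "a + b")


def draft_verify_refine(draft_fn, verify_fn, refine_fn, max_rounds=3):
    candidate = draft_fn()
    for round_no in range(1, max_rounds + 1):
        passed, feedback = verify_fn(candidate)
        if passed:
            return candidate, round_no
        candidate = refine_fn(candidate, feedback)
    return None, max_rounds  # gave up within the round budget


fixed_source, rounds = draft_verify_refine(draft, verify, refine)
print(rounds)  # prints 2
print(fixed_source)
```

The key property is that acceptance is decided by the verifier, not by the model's own confidence in its draft.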

Sample Assignment

Coding agent that reads a codebase, identifies a bug, generates a fix, runs tests, and iterates until tests pass using System 2 reasoning and SKILL.md.

18 Agent Evaluation, Benchmarking & Capstone Prep

Design custom eval harnesses, run GAIA/SWE-bench benchmarks, and prepare capstone proposals.

Core Topics

GAIA benchmarks: multi-step reasoning evaluation
SWE-bench: software engineering task evaluation
Custom eval harnesses for domain-specific benchmarks
Regression testing for agent behavior
A/B testing: planning strategies, model configs, prompt variations
Measuring what matters: accuracy, cost, latency, safety
Capstone requirements: 30-day project, ArXiv-style paper, public demo
NEW Agent safety evaluation — prompt injection, tool misuse, cost runaway
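
A custom eval harness with regression detection can start very small. The sketch below scores two agent configurations on the same cases by exact match and reports which cases regressed; the stub agents, the test cases, and exact-match scoring are assumptions chosen for illustration.

```python
def run_suite(agent, cases):
    """cases is a list of (name, prompt, expected); returns name -> passed."""
    return {name: agent(prompt) == expected for name, prompt, expected in cases}


def compare(baseline, candidate):
    """Summarise a candidate run against a baseline run of the same suite."""
    regressions = sorted(n for n in baseline
                         if baseline[n] and not candidate.get(n, False))
    fixes = sorted(n for n in candidate
                   if candidate[n] and not baseline.get(n, False))
    accuracy = sum(candidate.values()) / len(candidate)
    return {"accuracy": accuracy, "regressions": regressions, "fixes": fixes}


# Stub "agents": the candidate silently breaks on longer inputs.
def baseline_agent(prompt):
    return prompt.upper()


def candidate_agent(prompt):
    return prompt.upper() if len(prompt) < 6 else prompt


CASES = [
    ("short", "hi", "HI"),
    ("long", "longer text", "LONGER TEXT"),
    ("empty", "", ""),
]

report = compare(run_suite(baseline_agent, CASES), run_suite(candidate_agent, CASES))
print(report)
```

Tracking regressions separately from accuracy matters because a configuration change can keep the headline score flat while breaking cases that previously passed.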

Sample Assignment

Custom eval harness with 20+ test cases, automated scoring, regression detection, and a report comparing two configurations. Plus: capstone proposal draft.

19 Arcturus 2.0 — Full Integration & The Complete Platform

Integrate all three protocol layers (MCP + A2A + A2UI/AG-UI) into a production-ready agentic platform.

Core Topics

Complete protocol stack in one system: MCP + A2A + A2UI/AG-UI
Arcturus 2.0 architecture: how all 19 sessions integrate
Production deployment: Docker Compose, health checks, restart policies
Gateway API platform: auth, rate limiting, metering, webhooks
Live demo: task → DAG → agents → MCP → A2UI → channel
What's next: agent economies, self-improving systems, governance

Sample Assignment

Finalize capstone with GitHub Projects plan, or submit a PR to Arcturus 2.0 implementing a course concept. 2-minute lightning preview of capstone idea.

20 Capstone Pitches & Arcturus 2.0 Contributions

Student presentations and the beginning of the 4-week capstone execution window.

Format

5-minute pitch per team + 2–3 minutes Q&A
GitHub Projects reviewed
Go/no-go decision on proposals
4-week execution window begins
