CS 124 is introductory computer science (CS1) at the University of Illinois, enrolling over 2,000 students per year. 80% are non-majors. Here’s what we’re doing and why.
Geoffrey Challen has been teaching the course since Fall 2017, and along the way we’ve learned a lot about what works when teaching intro CS at scale. A recent talk at the University of Sydney surveys many of the course innovations described below.
CS 124 doesn’t hold lectures. Instead, students work through daily interactive lessons that mix text, runnable code examples, interactive walkthroughs—more on those below—short videos, and practice problems. Students engage at their own pace, which matters a lot when you have a wide range of prior experience in the room—from students who’ve never written a line of code to those with years of experience.
Daily engagement also achieves spaced repetition naturally. Rather than cramming before a midterm, students encounter new material every day and revisit earlier topics as they go. And because we’re not constrained by lecture slots, we’ve built up explanations from multiple instructors, so if one explanation doesn’t click, there’s usually another that might.
One of our more distinctive innovations is what we call interactive walkthroughs—recorded code editing sessions that students can replay and interact with. These aren’t videos. They’re actual editor replays: students see code being written character by character, can pause and edit the code themselves, and resume the walkthrough. It’s closer to watching someone code live, except you can rewind, speed up, and experiment along the way. You can see examples at learncs.online.
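Under the hood, a walkthrough like this can be modeled as a timestamped log of edit events replayed against a text buffer. The event format and `replay` function below are a hypothetical illustration of the idea, not the course's actual implementation:

```python
# Hypothetical sketch of replaying a recorded editing session.
# Each event records where text was inserted or deleted, and when.

from dataclasses import dataclass

@dataclass
class EditEvent:
    time: float      # seconds from the start of the recording
    offset: int      # character position in the buffer
    deleted: int     # number of characters removed at that position
    inserted: str    # text added at that position

def apply_event(buffer: str, event: EditEvent) -> str:
    """Apply a single edit to the buffer."""
    return buffer[:event.offset] + event.inserted + buffer[event.offset + event.deleted:]

def replay(events, until: float) -> str:
    """Reconstruct the buffer at any point in time: this is what lets a
    student pause mid-walkthrough, edit, and resume."""
    buffer = ""
    for event in sorted(events, key=lambda e: e.time):
        if event.time > until:
            break
        buffer = apply_event(buffer, event)
    return buffer

events = [
    EditEvent(0.0, 0, 0, "print('Hello')"),
    EditEvent(2.5, 7, 5, "Hello, world"),  # replace "Hello" with "Hello, world"
]
print(replay(events, until=1.0))   # buffer after the first edit
print(replay(events, until=10.0))  # final buffer
```

Pausing is just replaying up to a chosen timestamp; the student can then edit the resulting buffer themselves and later resume from the log.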
A student interacting with a walkthrough
An instructor recording a walkthrough
CS 124 has no final exam, no midterm, and no high-stakes assessments. Instead, students complete a daily homework problem and take a weekly 50-minute proctored quiz in a dedicated computer-based testing facility. Each quiz is worth only about 2.5% of the grade. Everything is autograded with immediate feedback, and students get unlimited attempts on programming problems—until the deadline or quiz timer runs out.
We’ve found that frequent small assessment may be the single most important component of student success. Students study regularly, we catch struggles early and reach out when someone does poorly on a quiz, and the low stakes reduce anxiety. Frequent data points also enable learning-focused policies that would be impossible with high-stakes exams: dropping lowest scores, allowing quiz retakes where students return to questions from previous weeks, and catch-up grading where doing better on a later quiz raises earlier scores.
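To make the policy mechanics concrete, here is a toy sketch of score-dropping plus a retake model in which questions from one quiz reappear on the next, so a later success raises the earlier score. The numbers, drop count, and retake model are illustrative, not the course's actual formula:

```python
# Toy sketch of learning-focused grading policies enabled by frequent
# quizzes: drop the lowest scores, and let a better result on a retake
# (here modeled as the immediately following quiz) raise earlier scores.

def quiz_average(scores, drops=2):
    raised = scores[:]
    # If the retake on quiz i+1 goes better, quiz i's score rises.
    for i in range(len(raised) - 1):
        raised[i] = max(raised[i], raised[i + 1])
    kept = sorted(raised)[drops:]  # drop the `drops` lowest scores
    return sum(kept) / len(kept)

# A rough week hurts much less once later improvement propagates back.
print(quiz_average([60, 90, 70, 100, 80]))
```

The point of the sketch is the incentive structure: a bad week is recoverable, so students are rewarded for eventually mastering the material rather than punished for the first attempt.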
Counterintuitively, this is also more rigorous. In Fall 2017, students wrote code in a proctored setting exactly once—on a paper final exam. Now they complete multiple autograded programming challenges every week, totaling 15 hours of proctored assessment per semester versus 3–4 previously.
For more on this approach, see Geoffrey’s talk on frequent assessment.
Frequent assessment creates a demand for problems. We needed a way to author them fast and accurately.
The insight behind our autograder, Questioner, is that when autograding, the solution is known. This is fundamentally different from software testing, where only the desired behavior is known. So instead of maintaining three sources of truth—a description, a solution, and tests—the author provides just a description and a reference solution, and Questioner generates and validates the testing strategy automatically using source code mutation.
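The core idea can be sketched in a few lines. This toy version mutates the reference solution by naive operator substitution and checks whether a candidate test set distinguishes every mutant from the reference; `make_mutants` and `kills_all_mutants` are illustrative stand-ins, not Questioner's actual machinery:

```python
# Toy mutation-based validation: because the reference solution is
# known, we can check that a test set "kills" every simple mutant of it.

def make_mutants(source: str):
    """Yield variants of the source with one operator or call flipped."""
    swaps = [("<", "<="), (">", ">="), ("max", "min"), ("min", "max")]
    for old, new in swaps:
        if old in source:
            yield source.replace(old, new, 1)

def kills_all_mutants(source: str, func_name: str, inputs) -> bool:
    """A test set is valid if every mutant disagrees with the
    reference solution on at least one input."""
    namespace = {}
    exec(source, namespace)
    reference = namespace[func_name]
    expected = [reference(*args) for args in inputs]
    for mutant_source in make_mutants(source):
        namespace = {}
        exec(mutant_source, namespace)
        mutant = namespace[func_name]
        try:
            if [mutant(*args) for args in inputs] == expected:
                return False  # this mutant survived: the tests are too weak
        except Exception:
            pass  # a crashing mutant also counts as killed
    return True

SOLUTION = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
print(kills_all_mutants(SOLUTION, "clamp", [(5, 0, 10), (-5, 0, 10), (15, 0, 10)]))
print(kills_all_mutants(SOLUTION, "clamp", [(0, 0, 10)]))  # weak test set
```

If a mutant survives, the generated tests need strengthening before the problem ships; that is the validation step that the old hand-written process lacked.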
The old process took hours per problem with unknown accuracy. The new process produces several problems per hour with validated accuracy. We’ve authored over 700 problems since Fall 2020 across multiple question types—code writing, debugging, tracing, and conceptual questions. Questioner also evaluates code quality—not just “does it work?” but “is it good?”—giving students instant feedback on complexity, style, and efficiency.
The same mutation engine powers debugging exercises. We mutate correct student submissions to create buggy versions, and students must find and fix the bug without rewriting. This forces them to read others’ code and exposes them to different solution approaches.
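In the same toy style, generating a debugging exercise is just applying one small mutation to a correct program. The mutation list and `make_buggy` helper here are illustrative:

```python
# Hypothetical sketch of turning a correct submission into a debugging
# exercise: introduce a single small bug, then ask the student to find
# and fix it without rewriting the solution.

import random

MUTATIONS = [("<=", "<"), ("+ 1", "- 1"), ("==", "!=")]

def make_buggy(source: str, seed: int = 0) -> str:
    rng = random.Random(seed)  # seeded so each student sees a stable bug
    applicable = [(old, new) for old, new in MUTATIONS if old in source]
    old, new = rng.choice(applicable)
    return source.replace(old, new, 1)

SUBMISSION = """def count_positive(values):
    total = 0
    for v in values:
        if v > 0:
            total = total + 1
    return total
"""
print(make_buggy(SUBMISSION))
```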
Our tutoring model is built around peer tutors—recent CS 124 graduates who took the course themselves. They remember what was hard, they know the material, and they’re motivated to help.
The core of the system is an online-first tutoring platform that provides immediate 1-on-1 support throughout the day. When a student has a question—at 9 AM or 9 PM—a tutor is usually available within minutes. No waiting for office hours, no standing in line.
Student-tutor online tutoring interaction flow
Staff are organized into tiers: volunteer assistants gaining experience, paid associate mentors, TAs, and head TAs. When a student struggles on a quiz, tutors proactively reach out to offer support. We’re not waiting for students to come to us.
This is where things have gotten really interesting.
The journey. In Fall 2024, we began allowing students to use AI on the course project as an experiment. Over the summer of 2025, Geoffrey tested whether Claude could complete the Fall 2025 project from test suites alone—and it could, completing almost everything with minimal human intervention. In Fall 2025, we tried a compromise: keeping the same project structure while allowing AI. But it didn’t work. Traditional programming assignments hinge on a specification that students translate to code, and AI coding agents have become too good at that translation step. The tension between precise specifications—needed for fair automated grading—and imprecise specifications—needed to prevent AI from doing all the work—proved irreconcilable.
The key insight. If AI handles the translation from specification to code, the valuable human contribution is formulating the idea and specification itself. That realization drove the redesign.
My Project. Starting in Spring 2026, every student designs and builds their own unique Android app, working with AI coding agents—specifically Claude Code—throughout the process. Students aren’t translating our specification into code—they’re creating their own specification and learning to communicate it effectively to an AI agent.
The preparatory pipeline. We don’t just hand students an AI tool and wish them luck. The project includes a carefully scaffolded sequence of activities.
Each activity builds on the previous one, moving students from “I have no idea what to build” to “I have a plan and know how to execute it with AI.”
Assessment strategy. We weight proctored quizzes at 70% of the grade. These are taken without AI in a computer-based testing facility, ensuring students develop real programming fundamentals. The project is 20% of the grade and is where students practice AI-assisted development. This split lets us teach both traditional programming and AI collaboration without one undermining the other.
Learning objectives for AI collaboration center on formulating a clear specification and communicating it effectively to a coding agent.
For the full story, see Geoffrey’s presentation to CS 124 students at the end of the Fall 2025 semester, and a follow-up talk on using and teaching coding agents.
Beyond the project, LLMs are woven throughout CS 124 to solve specific educational problems that arise when teaching 2,000+ students.
AI Teaching Assistant. Students need help outside office hours, and even with a large staff, 2,000+ students can’t all get 1-on-1 time whenever they need it. A chat assistant answers questions using course content while maintaining academic integrity guardrails—it won’t solve homework problems. It uses retrieval-augmented generation (RAG) to search transcribed lessons and walkthroughs, so answers stay grounded in what’s actually been taught rather than generic internet knowledge.
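A minimal sketch of the RAG flow, with a bag-of-words stand-in for the real embedding model and the guardrail expressed as a prompt instruction. Every name here is hypothetical:

```python
# Hypothetical RAG flow: retrieve the most relevant transcript chunks,
# then instruct the model to answer only from them.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks):
    context = "\n".join(retrieve(question, chunks))
    return (
        "Answer using ONLY the course material below. "
        "Do not write solution code for graded problems.\n\n"
        f"Material:\n{context}\n\nQuestion: {question}"
    )

chunks = [
    "Walkthrough 12: recursion on linked lists, base case is the empty list",
    "Walkthrough 3: declaring and initializing variables in Java",
    "Quiz policies: each quiz is worth about 2.5 percent of the grade",
]
print(build_prompt("how do I use recursion with linked lists?", chunks))
```

The retrieved context is what keeps answers grounded in course material, and the instruction not to produce homework solutions is one form the integrity guardrail can take.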
Automatic Transcription. Hundreds of hours of walkthrough recordings and videos need to be searchable and accessible. WhisperX runs locally to transcribe all audio with word-level timestamps, enabling both the search pipeline and student-facing transcripts and captions. This makes the entire content library accessible to students who prefer reading, need captions, or want to search for a specific topic across all recordings.
Semantic Search. With a large and growing content library, students and the AI assistant need to find relevant material quickly. Transcripts are chunked, embedded, and stored in a vector database for hybrid semantic and keyword search across all course content. A student asking “how do I use recursion with linked lists?” gets pointed to the right walkthrough segment, not just a keyword match.
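As a rough illustration of the chunking step, here is how a word-timestamped transcript might be split into overlapping chunks that remember where they start, so a search hit can link to the exact moment in a walkthrough. The sizes and the `chunk_transcript` helper are illustrative:

```python
# Hypothetical chunking of a word-timestamped transcript for indexing.

def chunk_transcript(words, size=50, overlap=10):
    """words: list of (word, start_seconds) pairs. Returns chunks that
    carry the timestamp of their first word, ready to embed and index."""
    chunks = []
    step = size - overlap  # overlap keeps context intact across boundaries
    for i in range(0, max(len(words) - overlap, 1), step):
        window = words[i:i + size]
        if not window:
            break
        chunks.append({
            "start": window[0][1],
            "text": " ".join(w for w, _ in window),
        })
    return chunks

# A fake 120-word transcript, one word every 0.4 seconds.
words = [(f"word{i}", i * 0.4) for i in range(120)]
chunks = chunk_transcript(words)
print(len(chunks), [c["start"] for c in chunks])
```

Each chunk's `start` timestamp is what lets a result deep-link into the right walkthrough segment rather than just naming the recording.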
Pitch Practice Feedback. Students preparing elevator pitches for their project ideas need a way to practice without requiring staff time for every attempt. They record themselves, get an automatic transcription, and receive encouraging feedback on clarity, delivery, and timing—letting them iterate before presenting to real people.
Submission Validation. Hundreds of project plan submissions need basic quality screening. An LLM validates that submitted plans are genuine implementation plans rather than placeholder text, providing instant feedback so students can fix issues immediately rather than waiting for manual review.
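A toy sketch of what such a screen might look like: a cheap heuristic pre-check for obvious placeholder text, plus the shape of a validation prompt. The markers, thresholds, and rubric wording are illustrative, not the course's actual pipeline:

```python
# Hypothetical screening step for project plan submissions.

PLACEHOLDER_MARKERS = ("lorem ipsum", "todo", "tbd", "asdf")

def looks_like_placeholder(plan: str, min_words: int = 50) -> bool:
    """Cheap heuristics that catch obvious non-plans before (or
    alongside) the LLM check."""
    text = plan.lower()
    if any(marker in text for marker in PLACEHOLDER_MARKERS):
        return True
    return len(text.split()) < min_words

def validation_prompt(plan: str) -> str:
    # The model receives the plan plus a rubric and must decide whether
    # it is a genuine implementation plan, enabling instant feedback.
    return (
        "You are screening project plans for an intro CS course. "
        "Decide whether the following is a genuine implementation plan "
        "or placeholder text, and explain briefly.\n\nPlan:\n" + plan
    )

print(looks_like_placeholder("TODO: write plan later"))
```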