CS 124 is introductory computer science (CS1) at the University of Illinois, enrolling over 2,000 students per year. 80% are non-majors. Here’s what we’re doing and why.
Geoffrey Challen has been teaching the course since Fall 2017, and along the way we’ve learned a lot about what works when teaching intro CS at scale. A recent talk at the University of Sydney surveys many of the course innovations described below.
CS 124 doesn’t hold lectures. Instead, students work through daily interactive lessons that mix text, runnable code examples, interactive walkthroughs—more on those below—short videos, and practice problems. Students engage at their own pace, which matters a lot when you have a wide range of prior experience in the room—from students who’ve never written a line of code to those with years of experience.
Daily engagement also achieves spaced repetition naturally. Rather than cramming before a midterm, students encounter new material every day and revisit earlier topics as they go. And because we’re not constrained by lecture slots, we’ve built up explanations from multiple instructors, so if one explanation doesn’t click, there’s usually another that might.
One of our more distinctive innovations is what we call interactive walkthroughs—recorded code editing sessions that students can replay and interact with. These aren’t videos. They’re actual editor replays: students see code being written character by character, can pause and edit the code themselves, and resume the walkthrough. It’s closer to watching someone code live, except you can rewind, speed up, and experiment along the way. You can see examples at learncs.online.
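Under the hood, a walkthrough like this can be modeled as a timestamped log of edit events replayed against a text buffer. The event format and `replay` function below are a hypothetical illustration of the idea, not the course's actual implementation:

```python
# Hypothetical sketch of replaying a recorded editing session.
# Each event records where text was inserted or deleted, and when.

from dataclasses import dataclass

@dataclass
class EditEvent:
    time: float      # seconds from the start of the recording
    offset: int      # character position in the buffer
    deleted: int     # number of characters removed at that position
    inserted: str    # text added at that position

def apply_event(buffer: str, event: EditEvent) -> str:
    """Apply a single edit to the buffer."""
    return buffer[:event.offset] + event.inserted + buffer[event.offset + event.deleted:]

def replay(events, until: float) -> str:
    """Reconstruct the buffer at any point in time: this is what lets a
    student pause mid-walkthrough, edit, and resume."""
    buffer = ""
    for event in sorted(events, key=lambda e: e.time):
        if event.time > until:
            break
        buffer = apply_event(buffer, event)
    return buffer

events = [
    EditEvent(0.0, 0, 0, "print('Hello')"),
    EditEvent(2.5, 7, 5, "Hello, world"),  # replace "Hello" with "Hello, world"
]
print(replay(events, until=1.0))   # buffer after the first edit
print(replay(events, until=10.0))  # final buffer
```

Pausing is just replaying up to a chosen timestamp; the student can then edit the resulting buffer themselves and later resume from the log.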
A student interacting with a walkthrough
An instructor recording a walkthrough
CS 124 has no final exam, no midterm, and no high-stakes assessments. Instead, students complete a daily homework problem and take a weekly 50-minute proctored quiz in a dedicated computer-based testing facility. Each quiz is worth only about 2.5% of the grade. Everything is autograded with immediate feedback, and students get unlimited attempts on programming problems—until the deadline or quiz timer runs out.
We’ve found that frequent small assessment may be the single most important component of student success. Students study regularly, we catch struggles early and reach out when someone does poorly on a quiz, and the low stakes reduce anxiety. Frequent data points also enable learning-focused policies that would be impossible with high-stakes exams: dropping lowest scores, allowing quiz retakes where students return to questions from previous weeks, and catch-up grading where doing better on a later quiz raises earlier scores.
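To make the policy mechanics concrete, here is a toy sketch of score-dropping plus a retake model in which questions from one quiz reappear on the next, so a later success raises the earlier score. The numbers, drop count, and retake model are illustrative, not the course's actual formula:

```python
# Toy sketch of learning-focused grading policies enabled by frequent
# quizzes: drop the lowest scores, and let a better result on a retake
# (here modeled as the immediately following quiz) raise earlier scores.

def quiz_average(scores, drops=2):
    raised = scores[:]
    # If the retake on quiz i+1 goes better, quiz i's score rises.
    for i in range(len(raised) - 1):
        raised[i] = max(raised[i], raised[i + 1])
    kept = sorted(raised)[drops:]  # drop the `drops` lowest scores
    return sum(kept) / len(kept)

# A rough week hurts much less once later improvement propagates back.
print(quiz_average([60, 90, 70, 100, 80]))
```

The point of the sketch is the incentive structure: a bad week is recoverable, so students are rewarded for eventually mastering the material rather than punished for the first attempt.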
Counterintuitively, this is also more rigorous. In Fall 2017, students wrote code in a proctored setting exactly once—on a paper final exam. Now they complete multiple autograded programming challenges every week, totaling 15 hours of proctored assessment per semester versus 3–4 previously.
For more on this approach, see Geoffrey’s talk on frequent assessment.
Frequent assessment creates a demand for problems. We needed a way to author them fast and accurately.
The insight behind our autograder, Questioner, is that when autograding, the solution is known. This is fundamentally different from software testing, where only the desired behavior is known. So instead of maintaining three sources of truth—a description, a solution, and tests—the author provides just a description and a reference solution, and Questioner generates and validates the testing strategy automatically using source code mutation.
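The core idea can be sketched in a few lines. This toy version mutates the reference solution by naive operator substitution and checks whether a candidate test set distinguishes every mutant from the reference; `make_mutants` and `kills_all_mutants` are illustrative stand-ins, not Questioner's actual machinery:

```python
# Toy mutation-based validation: because the reference solution is
# known, we can check that a test set "kills" every simple mutant of it.

def make_mutants(source: str):
    """Yield variants of the source with one operator or call flipped."""
    swaps = [("<", "<="), (">", ">="), ("max", "min"), ("min", "max")]
    for old, new in swaps:
        if old in source:
            yield source.replace(old, new, 1)

def kills_all_mutants(source: str, func_name: str, inputs) -> bool:
    """A test set is valid if every mutant disagrees with the
    reference solution on at least one input."""
    namespace = {}
    exec(source, namespace)
    reference = namespace[func_name]
    expected = [reference(*args) for args in inputs]
    for mutant_source in make_mutants(source):
        namespace = {}
        exec(mutant_source, namespace)
        mutant = namespace[func_name]
        try:
            if [mutant(*args) for args in inputs] == expected:
                return False  # this mutant survived: the tests are too weak
        except Exception:
            pass  # a crashing mutant also counts as killed
    return True

SOLUTION = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
print(kills_all_mutants(SOLUTION, "clamp", [(5, 0, 10), (-5, 0, 10), (15, 0, 10)]))
print(kills_all_mutants(SOLUTION, "clamp", [(0, 0, 10)]))  # weak test set
```

If a mutant survives, the generated tests need strengthening before the problem ships; that is the validation step that the old hand-written process lacked.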
The old process took hours per problem with unknown accuracy. The new process produces several problems per hour with validated accuracy. We’ve authored over 700 problems since Fall 2020 across multiple question types—code writing, debugging, tracing, and conceptual questions. Questioner also evaluates code quality—not just “does it work?” but “is it good?”—giving students instant feedback on complexity, style, and efficiency.
The same mutation engine powers debugging exercises. We mutate correct student submissions to create buggy versions, and students must find and fix the bug without rewriting. This forces them to read others’ code and exposes them to different solution approaches.
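In the same toy style, generating a debugging exercise is just applying one small mutation to a correct program. The mutation list and `make_buggy` helper here are illustrative:

```python
# Hypothetical sketch of turning a correct submission into a debugging
# exercise: introduce a single small bug, then ask the student to find
# and fix it without rewriting the solution.

import random

MUTATIONS = [("<=", "<"), ("+ 1", "- 1"), ("==", "!=")]

def make_buggy(source: str, seed: int = 0) -> str:
    rng = random.Random(seed)  # seeded so each student sees a stable bug
    applicable = [(old, new) for old, new in MUTATIONS if old in source]
    old, new = rng.choice(applicable)
    return source.replace(old, new, 1)

SUBMISSION = """def count_positive(values):
    total = 0
    for v in values:
        if v > 0:
            total = total + 1
    return total
"""
print(make_buggy(SUBMISSION))
```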
Our tutoring model is built around peer tutors—recent CS 124 graduates who took the course themselves. They remember what was hard, they know the material, and they’re motivated to help.
The core of the system is an online-first tutoring platform that provides immediate 1-on-1 support throughout the day. When a student has a question—at 9 AM or 9 PM—a tutor is usually available within minutes. No waiting for office hours, no standing in line.
Student-tutor online tutoring interaction flow
Staff are organized into tiers: volunteer assistants gaining experience, paid associate mentors, TAs, and head TAs. When a student struggles on a quiz, tutors proactively reach out to offer support. We’re not waiting for students to come to us.
This is where things have gotten really interesting.
The journey. In Fall 2024, we began allowing students to use AI on the course project as an experiment. Over the summer of 2025, Geoffrey tested whether Claude could complete the Fall 2025 project from test suites alone—and it could, completing almost everything with minimal human intervention. In Fall 2025, we tried a compromise: keeping the same project structure while allowing AI. But it didn’t work. Traditional programming assignments hinge on a specification that students translate to code, and AI coding agents have become too good at that translation step. The tension between precise specifications—needed for fair automated grading—and imprecise specifications—needed to prevent AI from doing all the work—proved irreconcilable.
The key insight. If AI handles the translation from specification to code, the valuable human contribution is formulating the idea and specification itself. That realization drove the redesign.
My Project. Starting in Spring 2026, every student designs and builds their own unique Android app, working with AI coding agents—specifically Claude Code—throughout the process. Students aren’t translating our specification into code—they’re creating their own specification and learning to communicate it effectively to an AI agent.
The preparatory pipeline. We don’t just hand students an AI tool and wish them luck. The project includes a carefully scaffolded sequence of activities.
Each activity builds on the previous one, moving students from “I have no idea what to build” to “I have a plan and know how to execute it with AI.”
Assessment strategy. We weight proctored quizzes at 70% of the grade. These are taken without AI in a computer-based testing facility, ensuring students develop real programming fundamentals. The project is 20% of the grade and is where students practice AI-assisted development. This split lets us teach both traditional programming and AI collaboration without one undermining the other.
Learning objectives for AI collaboration center on formulating a clear specification and communicating it effectively to a coding agent.
For the full story, see Geoffrey’s presentation to CS 124 students at the end of the Fall 2025 semester, and a follow-up talk on using and teaching coding agents.
Beyond the project, LLMs are woven throughout CS 124 to solve specific educational problems that arise when teaching 2,000+ students.
AI Teaching Assistant. Students need help outside office hours, and even with a large staff, 2,000+ students can’t all get 1-on-1 time whenever they need it. A chat assistant answers questions using course content while maintaining academic integrity guardrails—it won’t solve homework problems. It uses retrieval-augmented generation (RAG) to search transcribed lessons and walkthroughs, so answers stay grounded in what’s actually been taught rather than generic internet knowledge.
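A minimal sketch of the RAG flow, with a bag-of-words stand-in for the real embedding model and the guardrail expressed as a prompt instruction. Every name here is hypothetical:

```python
# Hypothetical RAG flow: retrieve the most relevant transcript chunks,
# then instruct the model to answer only from them.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks):
    context = "\n".join(retrieve(question, chunks))
    return (
        "Answer using ONLY the course material below. "
        "Do not write solution code for graded problems.\n\n"
        f"Material:\n{context}\n\nQuestion: {question}"
    )

chunks = [
    "Walkthrough 12: recursion on linked lists, base case is the empty list",
    "Walkthrough 3: declaring and initializing variables in Java",
    "Quiz policies: each quiz is worth about 2.5 percent of the grade",
]
print(build_prompt("how do I use recursion with linked lists?", chunks))
```

The retrieved context is what keeps answers grounded in course material, and the instruction not to produce homework solutions is one form the integrity guardrail can take.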
Automatic Transcription. Hundreds of hours of walkthrough recordings and videos need to be searchable and accessible. WhisperX runs locally to transcribe all audio with word-level timestamps, enabling both the search pipeline and student-facing transcripts and captions. This makes the entire content library accessible to students who prefer reading, need captions, or want to search for a specific topic across all recordings.
Semantic Search. With a large and growing content library, students and the AI assistant need to find relevant material quickly. Transcripts are chunked, embedded, and stored in a vector database for hybrid semantic and keyword search across all course content. A student asking “how do I use recursion with linked lists?” gets pointed to the right walkthrough segment, not just a keyword match.
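As a rough illustration of the chunking step, here is how a word-timestamped transcript might be split into overlapping chunks that remember where they start, so a search hit can link to the exact moment in a walkthrough. The sizes and the `chunk_transcript` helper are illustrative:

```python
# Hypothetical chunking of a word-timestamped transcript for indexing.

def chunk_transcript(words, size=50, overlap=10):
    """words: list of (word, start_seconds) pairs. Returns chunks that
    carry the timestamp of their first word, ready to embed and index."""
    chunks = []
    step = size - overlap  # overlap keeps context intact across boundaries
    for i in range(0, max(len(words) - overlap, 1), step):
        window = words[i:i + size]
        if not window:
            break
        chunks.append({
            "start": window[0][1],
            "text": " ".join(w for w, _ in window),
        })
    return chunks

# A fake 120-word transcript, one word every 0.4 seconds.
words = [(f"word{i}", i * 0.4) for i in range(120)]
chunks = chunk_transcript(words)
print(len(chunks), [c["start"] for c in chunks])
```

Each chunk's `start` timestamp is what lets a result deep-link into the right walkthrough segment rather than just naming the recording.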
Pitch Practice Feedback. Students preparing elevator pitches for their project ideas need a way to practice without requiring staff time for every attempt. They record themselves, get an automatic transcription, and receive encouraging feedback on clarity, delivery, and timing—letting them iterate before presenting to real people.
Submission Validation. Hundreds of project plan submissions need basic quality screening. An LLM validates that submitted plans are genuine implementation plans rather than placeholder text, providing instant feedback so students can fix issues immediately rather than waiting for manual review.
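A toy sketch of what such a screen might look like: a cheap heuristic pre-check for obvious placeholder text, plus the shape of a validation prompt. The markers, thresholds, and rubric wording are illustrative, not the course's actual pipeline:

```python
# Hypothetical screening step for project plan submissions.

PLACEHOLDER_MARKERS = ("lorem ipsum", "todo", "tbd", "asdf")

def looks_like_placeholder(plan: str, min_words: int = 50) -> bool:
    """Cheap heuristics that catch obvious non-plans before (or
    alongside) the LLM check."""
    text = plan.lower()
    if any(marker in text for marker in PLACEHOLDER_MARKERS):
        return True
    return len(text.split()) < min_words

def validation_prompt(plan: str) -> str:
    # The model receives the plan plus a rubric and must decide whether
    # it is a genuine implementation plan, enabling instant feedback.
    return (
        "You are screening project plans for an intro CS course. "
        "Decide whether the following is a genuine implementation plan "
        "or placeholder text, and explain briefly.\n\nPlan:\n" + plan
    )

print(looks_like_placeholder("TODO: write plan later"))
```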