# Race Conditions — When Threads Collide
You've probably shipped one without knowing it. I spent three days chasing a bug that only appeared on Tuesday afternoons under load. Turned out to be two lines of perfectly reasonable code that happened to run at the wrong moment. This is that story — and everything I learned from it.
The first time I encountered a race condition in production, it cost us eight hours of engineering time and a very uncomfortable call with a client. The dashboard was showing negative inventory counts. Not wrong counts — negative ones. The kind that make your stomach drop.
The frustrating part? The code looked fine. Both functions were correct in isolation. The bug only appeared when two users clicked "Purchase" at almost exactly the same moment. I've since come to think of race conditions less as bugs and more as timing landmines buried in perfectly reasonable code.
## Section 01: So what actually is a race condition?
At its core, a race condition happens when the correctness of your program depends on the order in which threads execute — and you haven't guaranteed that order. Two threads are racing to access the same shared data, and whoever wins changes the outcome.
The deeper problem is that thread scheduling is handled by the operating system, which doesn't care about your application's logic. It can pause your thread mid-instruction and give CPU time to something else. That pause can happen at the worst possible moment.
"The bug isn't in any single line of code. It lives in the gap between two lines — in the moment when another thread slips through." — Every engineer who's debugged a race condition at 2am
Picture a joint account holding ₹1,000. You're at an ATM withdrawing ₹700 while Alice, the other account holder, is simultaneously doing an online transfer of ₹700. Both systems check the balance, both see ₹1,000, both think "yep, sufficient funds," and both proceed. The bank has now allowed ₹1,400 to leave a ₹1,000 account.
Nobody wrote buggy code. The ATM logic is correct. The transfer logic is correct. The problem is they ran concurrently against shared state without coordinating. That's a race condition.
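The bank story maps directly onto code. Here's a minimal Python sketch of it — a toy, not real banking code, and the `time.sleep` stands in for an unlucky context switch so the race fires on every run instead of once a month:

```python
import threading
import time

balance = 1000  # the shared account

def withdraw(amount):
    global balance
    if balance >= amount:     # check: "yep, sufficient funds"
        time.sleep(0.01)      # an unlucky pause -- the other channel runs now
        balance -= amount     # act: both channels end up here

atm = threading.Thread(target=withdraw, args=(700,))
transfer = threading.Thread(target=withdraw, args=(700,))
atm.start(); transfer.start()
atm.join(); transfer.join()

print(balance)  # -400: the bank let ₹1,400 leave a ₹1,000 account
```

Each `withdraw` is correct on its own; only the interleaving is broken — which is exactly the point.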
## Section 02: The classic example — and why it's deceptive
Let me show you the simplest possible version. Two threads, one counter:
```python
import threading

counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1  # Looks like one step. It isn't.

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()

print(counter)
# What you expect: 200,000
# What you get:    147,382 (different every run)
```
Here's what trips people up: `counter += 1` looks atomic. It's one line. But the CPU doesn't execute it in one step. It expands to three: read the current value from memory, add 1 in a register, then write the result back. The OS can pause your thread between any of those three steps.
When Thread B reads the value before Thread A has written its result back, Thread B is working from stale data. Both threads end up writing the same value. One increment vanishes into thin air.
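You don't have to take that on faith — the standard `dis` module will show you the split. A quick sketch (exact opcode names vary across CPython versions):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1   # one line of source...

# ...but several interpreter instructions: a load, an add, and a store.
# A thread can be paused between any two of them.
dis.dis(increment)
```

The disassembly ends with a separate `STORE_GLOBAL` — the write-back step that can arrive too late and clobber another thread's work.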
## Section 03: Let's slow it down — a frame-by-frame breakdown
Say counter = 5 at this moment. Here's one way two threads can interleave — and lose an increment:
| Step | Thread A | Thread B |
|---|---|---|
| 1 | Reads `counter` → gets 5 | |
| 2 | | Reads `counter` → gets 5 |
| 3 | Adds 1, holds 6 in a register | |
| 4 | Writes 6 back to `counter` | |
| 5 | | Adds 1 to its stale 5, holds 6 |
| 6 | | Writes 6 → `counter` stays 6 ✗ |

Expected: `counter` = 7. Actual: 6. One increment vanished.
## Section 04: Where I've actually seen this hurt real systems
Textbook examples are fine, but let me give you the situations where this actually bites production systems.
A concert drops and 50,000 people hit the site at once. The code checks seats_available > 0, then books the seat. With 1 seat left and 200 concurrent requests, many of them pass the check before any of them decrements. You've now sold 40 tickets for 1 seat. The refund emails are not fun to write.
This isn't hypothetical. Major ticketing platforms have had this exact failure during high-demand releases. The fix isn't a better check — it's making the check-and-decrement atomic.
Two services read a shared config file, each modify a different key in memory, each write the full file back. Service B's write stomps on Service A's changes. Neither service errors. The config just quietly reverts. You spend an hour wondering why a feature you deployed last week keeps turning itself off.
You click a button 5 times fast. Each click triggers an async `setState` that reads `this.state.count`, adds 1, and writes back. By the time the second click fires, the first `setState` hasn't flushed. Both read the same stale value. You clicked 5 times, count went up by 2. This is why `setState(prev => prev + 1)` exists — it reads from the queued state, not a snapshot that might already be stale.
## Section 05: The check-then-act trap
Most race conditions share a common shape called check-then-act. You check a condition, then act on it — but the condition can change between the check and the act. Here's what that looks like in a Node.js order handler:
```javascript
async function purchaseItem(itemId, userId) {
  // Step 1: check if stock exists
  const item = await db.query(
    'SELECT quantity FROM items WHERE id = ?', [itemId]
  );

  if (item.quantity > 0) {
    // ⚠️ Right here. Another 400 requests just passed this check.
    // They all read quantity = 1. They all think they're fine.
    await db.query(
      'UPDATE items SET quantity = quantity - 1 WHERE id = ?', [itemId]
    );
    await db.query(
      'INSERT INTO orders (item_id, user_id) VALUES (?, ?)',
      [itemId, userId]
    );
  }
}
```
The check and the decrement are two separate database round-trips. In the gap between them, every other request in your queue can also pass the check. The `SELECT` and the `UPDATE` are not atomic. That gap is the bug.
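The same shape is easy to reproduce without a database. Here's a toy in-memory version of the handler — the `time.sleep` widens the gap so every run oversells:

```python
import threading
import time

stock = 1      # one item left
orders = []    # who got one

def purchase(user_id):
    global stock
    if stock > 0:          # the check
        time.sleep(0.01)   # the gap -- everyone else passes the check here
        stock -= 1         # the act
        orders.append(user_id)

threads = [threading.Thread(target=purchase, args=(i,)) for i in range(50)]
for t in threads: t.start()
for t in threads: t.join()

print(len(orders))  # far more than 1 -- the single item was oversold
print(stock)        # negative, just like the dashboard that started this story
```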
## Section 06: How to actually fix this
The good news: once you understand race conditions, the fixes are conceptually simple. You're always trying to achieve one of three things — eliminate sharing, eliminate mutation, or make the critical section atomic.
- **Locks / mutexes.** Only one thread can hold the lock at a time. Everything else waits. Simple, reliable, but can become a bottleneck under high contention.
- **Atomic operations.** Hardware instructions that complete without interruption — compare-and-swap, fetch-and-add. Very low overhead: the CPU itself guarantees atomicity.
- **Database transactions.** ACID guarantees exist exactly for this. Wrap the read and write in one transaction with the right isolation level and let the DB engine coordinate.
- **Optimistic concurrency control.** Read with a version stamp. Update only if the version still matches — retry on conflict. Excellent for low-contention workloads that need high throughput.
- **Queue serialisation.** Route all writes through a single-consumer queue. No two handlers can touch the same record simultaneously — the queue is your serialiser.
- **Immutability.** Don't mutate shared objects — create new ones. If nothing can be changed in place, there's nothing to race over. Functional programming's cleanest win.
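To make the optimistic-concurrency idea concrete, here's an illustrative Python sketch. The class and method names are mine, and a small internal lock stands in for the atomic compare that a real database or a hardware compare-and-swap would provide:

```python
import threading

class VersionedValue:
    """Optimistic concurrency sketch: a write succeeds only if the
    version hasn't changed since we read it; otherwise the caller retries."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._guard = threading.Lock()   # stands in for the DB's atomic compare

    def read(self):
        with self._guard:
            return self.value, self.version

    def try_update(self, new_value, expected_version):
        with self._guard:
            if self.version != expected_version:
                return False             # someone else wrote first -- conflict
            self.value = new_value
            self.version += 1
            return True

counter = VersionedValue(0)

def increment():
    for _ in range(10_000):
        while True:                      # retry loop: re-read on conflict
            value, version = counter.read()
            if counter.try_update(value + 1, version):
                break

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter.value)  # 20000 -- no lost updates, no long-held lock
```

Notice the trade-off: no thread ever blocks while holding the value, but under heavy contention the retry loop burns cycles — which is why this shines for low-contention workloads.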
Here's the fixed version of the order handler — making the check and decrement a single atomic SQL operation:
```javascript
async function purchaseItem(itemId, userId) {
  // The WHERE clause and SET happen as one atomic operation inside the DB.
  // 400 concurrent requests → exactly 1 succeeds if quantity is 1.
  const result = await db.query(`
    UPDATE items
    SET quantity = quantity - 1
    WHERE id = ? AND quantity > 0
  `, [itemId]);

  if (result.affectedRows === 0) {
    throw new Error('Out of stock'); // Clean failure, no oversell
  }

  await db.query(
    'INSERT INTO orders (item_id, user_id) VALUES (?, ?)',
    [itemId, userId]
  );
  // ✅ No gap. No race. If affectedRows is 0, we stop.
}
```
The database executes the `WHERE quantity > 0` check and the `SET quantity = quantity - 1` as a single indivisible unit. It doesn't matter how many requests fire this simultaneously — the database serialises the writes for you. Exactly one request gets the last item.
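The same pattern works in any SQL engine. Here's a self-contained sketch using Python's built-in `sqlite3`, with a table layout assumed to mirror the example above:

```python
import sqlite3

# In-memory stand-in for the items table (column names assumed).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, quantity INTEGER)")
db.execute("INSERT INTO items VALUES (1, 1)")   # one item left

def purchase_item(item_id):
    # Check and decrement in one statement: the DB applies it atomically.
    cur = db.execute(
        "UPDATE items SET quantity = quantity - 1 WHERE id = ? AND quantity > 0",
        (item_id,),
    )
    return cur.rowcount == 1   # True only if we actually got an item

print(purchase_item(1))  # True  -- got the last item
print(purchase_item(1))  # False -- quantity is 0, the WHERE matches nothing
```

`cur.rowcount` plays the role of `affectedRows`: zero rows touched means someone else got there first, and you report "out of stock" instead of overselling.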
And the Python fix — wrapping the critical section in a `threading.Lock`:
```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1  # Only one thread in here at a time.

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()

print(counter)  # 200,000. Every single time. ✅
```
## Section 07: Tools that find races you can't reproduce manually
The hardest thing about race conditions is that they don't show up in normal testing. They require specific timing — the exact wrong interleaving. These tools force that timing to happen:
| Tool | Language | How it works |
|---|---|---|
| ThreadSanitizer (TSan) | C, C++, Go, Rust | Instruments the binary at compile time; reports concurrent accesses at runtime with near-zero false positives |
| Helgrind / DRD | C, C++ | Valgrind plugins that detect POSIX thread misuse and lock-ordering violations |
| Java PathFinder | Java | Model checker — systematically explores every possible thread interleaving to find races |
| `go run -race` | Go | Built into the standard toolchain. Just add `-race` and run your tests. No setup needed. |
| Stress + slow tests | Python | Run concurrent tests with artificial scheduling delays to surface timing-sensitive bugs |
## Final thoughts: The real lesson from three days of chasing ghosts
Here's what those three days debugging negative inventory actually taught me: race conditions aren't a sign of careless code. They're a structural problem. The code I wrote was correct. It just wasn't concurrent-correct — and those are different things.
The mental model I use now: every time I write code that reads shared state and then acts on it, I ask "what happens if another thread runs this same code between my read and my act?" If the answer is "bad things," I need to make those two steps atomic.
Once you develop that instinct, race conditions stop being mysterious. They're just a class of bugs with a clear cause and a finite set of well-understood fixes. The checklist below is what I keep on my wall:
- Reach for atomic primitives — `std::atomic`, `AtomicInteger`, or `sync/atomic` — before a full mutex.
- Run your concurrent tests under a race detector — `go run -race` takes 2 seconds.

If this saved you from a Tuesday afternoon debugging session, I'll consider it worth writing.
Next up: Deadlocks — when the fix for a race condition introduces a new, equally annoying problem.

