Session 4: Reproducing and Fixing Intermittent Failures

Synopsis

Addresses flaky bugs caused by timing, networking, and environmental variation, with methods for controlled reproduction and systematic diagnosis.

Duration: ~45 minutes
Audience: Python developers with basic programming knowledge learning MicroPython on Raspberry Pi Pico 2 W
Format: Theory + hands-on exercises


Session Goals

By the end of this session, learners will be able to:

  • Recognize common causes of intermittent failures in embedded MicroPython programs
  • Reproduce flaky behavior in a controlled way
  • Use logging, timing, and assertions to isolate problems
  • Fix race-like behavior caused by timing, blocking code, and unstable inputs
  • Apply practical debugging techniques to Pico 2 W IoT and hardware projects

Prerequisites

  • Raspberry Pi Pico 2 W
  • USB cable
  • Thonny IDE installed
  • Basic familiarity with:
      • variables, functions, loops, and exceptions in Python
      • uploading and running MicroPython code on Pico
  • Optional hardware for exercises:
      • built-in LED
      • push button
      • external LED + 220Ω resistor
      • breadboard and jumper wires

Development Environment Setup

Thonny Setup

  1. Install Thonny on your computer.
  2. Connect the Raspberry Pi Pico 2 W via USB.
  3. In Thonny:
      • Go to Tools → Options → Interpreter
      • Select MicroPython (Raspberry Pi Pico)
      • Choose the correct serial port for the Pico
  4. Confirm the REPL works by typing:
print("Hello from Pico")

Expected output:

Hello from Pico
  • Save code to the Pico as main.py for auto-run on boot
  • Use Ctrl+C in Thonny REPL to stop a running program
  • Use Ctrl+D to soft reboot the Pico
  • Add clear debug prints when testing intermittent behavior

Session Outline

  1. What intermittent failures look like
  2. Common embedded causes
  3. Reproducing flaky behavior reliably
  4. Debugging tools and strategies
  5. Hands-on lab: button-triggered LED failure
  6. Fixing the issue with debouncing and timing safeguards
  7. Wrap-up and best practices

1) Theory: What Are Intermittent Failures?

Intermittent failures are problems that do not happen every time. They may appear:

  • only on some button presses
  • only after running for a while
  • only when power is unstable
  • only when sensor readings change quickly
  • only when code blocks too long

Typical Symptoms

  • LED sometimes does not turn on
  • sensor values occasionally jump to invalid numbers
  • Wi-Fi reconnects unpredictably
  • a button sometimes works and sometimes double-triggers
  • program freezes after a few minutes

Why They Matter

Intermittent bugs are hard to reproduce, which makes them easy to misdiagnose. In embedded systems, timing and hardware interactions often expose bugs that never show up in desktop Python.


2) Common Causes in MicroPython on Pico 2 W

A. Button Bounce

Mechanical buttons do not produce a single clean transition. One press may create multiple rapid signals.
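To make the effect concrete, the sketch below counts press-like transitions in a hand-written sample trace. The trace values are invented for illustration, not captured from hardware, and count_falling_edges is a hypothetical helper name.

```python
# Simulated samples from an active-low button (1 = released, 0 = pressed).
# The values are made up to illustrate bounce; real traces vary per switch.
bouncy_press = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1]

def count_falling_edges(samples):
    """Count 1 -> 0 transitions, which naive code treats as presses."""
    count = 0
    previous = samples[0]
    for value in samples[1:]:
        if previous == 1 and value == 0:
            count += 1
        previous = value
    return count

print("Falling edges seen:", count_falling_edges(bouncy_press))
```

For this trace the count is 4, even though it represents a single physical press and release.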

B. Timing Issues

Code that depends on exact timing may fail if:

  • sleep() is too short or too long
  • work inside a loop blocks other tasks
  • a sensor needs stabilization time
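One way to check whether a loop really runs at the pace you expect is to record a timestamp per pass and look at the gaps. This is a minimal sketch: interval_stats is a hypothetical helper, the timestamps are invented, and plain subtraction is used on the assumption that the values do not wrap around.

```python
def interval_stats(timestamps):
    """Return (shortest, longest, average) gap between consecutive timestamps."""
    gaps = []
    for i in range(1, len(timestamps)):
        gaps.append(timestamps[i] - timestamps[i - 1])
    if not gaps:
        return None
    return min(gaps), max(gaps), sum(gaps) / len(gaps)

# Invented loop timestamps in milliseconds; one pass was delayed.
loop_times = [0, 10, 20, 31, 95, 105]
print("min/max/avg gap:", interval_stats(loop_times))
```

A longest gap far above the average (64 ms against 21 ms here) suggests that something occasionally blocks the loop.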

C. Shared State Bugs

Variables changed from multiple places can create inconsistent behavior.
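The sketch below simulates one such bug: a press counter updated by a handler while the main loop reads and clears it. PressCounter and its method names are hypothetical, and the "interrupt" is just a function call placed between the read and the write so the failure is reproducible.

```python
class PressCounter:
    """Press count shared by a handler and the main loop (simulated)."""

    def __init__(self):
        self.pending = 0

    def on_press(self):
        # Stand-in for an interrupt handler recording one press.
        self.pending += 1

    def unsafe_take(self, interrupt_midway=False):
        # Read, then clear: a press arriving in between is wiped out.
        seen = self.pending
        if interrupt_midway:
            self.on_press()
        self.pending = 0
        return seen

    def safer_take(self, interrupt_midway=False):
        # Remove only what was read, so a late press stays pending.
        seen = self.pending
        if interrupt_midway:
            self.on_press()
        self.pending -= seen
        return seen

a = PressCounter()
a.on_press()
print("unsafe: took", a.unsafe_take(True), "left", a.pending)

b = PressCounter()
b.on_press()
print("safer:  took", b.safer_take(True), "left", b.pending)
```

In both runs two presses occur. The unsafe version reports one and leaves none pending, so a press is lost; the safer version leaves the late press pending. The simulation only interrupts at one point. On real hardware an interrupt could also land inside the subtraction, so the read-and-adjust step is typically wrapped in machine.disable_irq() and machine.enable_irq().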

D. Weak Hardware Connections

Loose wires, poor grounding, or unstable power can cause random failures.

E. Network Instability

For IoT projects, Wi-Fi drops or slow responses can look like application bugs.

F. Resource Exhaustion

Creating too many objects, opening files repeatedly, or using memory carelessly can cause strange behavior over time.
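A simple way to spot slow memory loss is to log free heap space at intervals and check whether it only ever goes down. The sketch below assumes MicroPython's gc.mem_free(); free_memory and steadily_falling are hypothetical helper names, and the readings are invented.

```python
import gc

def free_memory():
    """Return free heap bytes on MicroPython, or None where unsupported."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)  # not available on desktop Python
    return mem_free() if mem_free else None

def steadily_falling(readings):
    """True if each free-memory reading is lower than the previous one."""
    if len(readings) < 2:
        return False
    for i in range(1, len(readings)):
        if readings[i] >= readings[i - 1]:
            return False
    return True

print("free now:", free_memory())
# Invented readings taken once per minute, in bytes.
print("leak suspected:", steadily_falling([151200, 149800, 148100, 146900]))
```

A steady decline across many readings, even after gc.collect(), is a hint that something keeps references alive, such as a list that grows on every loop pass.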


3) Reproducing Intermittent Failures

To fix a flaky issue, first make it easier to reproduce.

Strategies

  • Reduce the problem to the smallest possible example
  • Add timestamps to events
  • Log state changes
  • Trigger the issue repeatedly in a loop
  • Use artificial delays to expose race-like timing issues
  • Test one hardware input at a time
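The "trigger the issue repeatedly" strategy can be sketched as a small harness that runs a check many times and records which runs failed. count_failures and flaky_check are hypothetical names; flaky_check fails on a fixed schedule only so the example is repeatable.

```python
def count_failures(test_fn, runs):
    """Call test_fn(run_index) repeatedly; return the indices that failed."""
    failed = []
    for i in range(runs):
        try:
            ok = test_fn(i)
        except Exception as exc:
            print("run", i, "raised", repr(exc))
            ok = False
        if not ok:
            failed.append(i)
    return failed

# Stand-in for a flaky operation: fails on every 7th run.
def flaky_check(i):
    return i % 7 != 6

failures = count_failures(flaky_check, 20)
print("failed runs:", failures, "of 20")
```

Knowing the failure rate before a change lets you judge afterwards whether a fix actually helped or the bug simply did not show up that time.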

Example Debug Logging Pattern

from machine import Pin
from time import ticks_ms, sleep_ms

led = Pin("LED", Pin.OUT)  # Pico 2 W: the built-in LED is addressed as "LED", not GP25

while True:
    print("t=", ticks_ms(), "LED ON")
    led.value(1)
    sleep_ms(500)

    print("t=", ticks_ms(), "LED OFF")
    led.value(0)
    sleep_ms(500)

Example output:

t= 1024 LED ON
t= 1526 LED OFF
t= 2027 LED ON
t= 2529 LED OFF

4) Debugging Tools and Strategies

Use Print Statements Wisely

Print:

  • input values
  • state changes
  • timestamps
  • error messages

Avoid printing in very tight loops unless needed, because it can slow the program and affect timing.
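One way to keep diagnostics without printing on every pass is to store recent events in memory and print them only when something goes wrong. This is a minimal sketch; EventLog is a hypothetical class, and the timestamps are invented (on the Pico you would pass ticks_ms()).

```python
class EventLog:
    """Keep the most recent events in memory; print them on demand."""

    def __init__(self, size):
        self.size = size
        self.entries = []

    def add(self, timestamp, message):
        self.entries.append((timestamp, message))
        if len(self.entries) > self.size:
            self.entries.pop(0)   # drop the oldest entry

    def dump(self):
        for timestamp, message in self.entries:
            print("t=", timestamp, message)

log = EventLog(3)
log.add(1000, "button down")
log.add(1004, "button up")
log.add(1011, "button down")
log.add(1300, "LED toggled")
log.dump()   # only the three newest events remain
```

Appending to a short list is much cheaper than printing over USB serial, so the logging itself disturbs the timing less.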

Use Assertions

Assertions can catch impossible states early.

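def set_brightness(level):
    # Fail immediately if a caller passes a value outside 0..100
    assert 0 <= level <= 100, "brightness must be between 0 and 100"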

Use Small Test Programs

Instead of testing the full application, isolate:

  • button input
  • LED output
  • sensor reading
  • Wi-Fi connection

Use Time-Based Measurements

MicroPython provides ticks_ms() for timestamps and ticks_diff() for measuring the elapsed time between two of them.

from time import ticks_ms, ticks_diff

start = ticks_ms()
# do something
elapsed = ticks_diff(ticks_ms(), start)
print("Elapsed:", elapsed, "ms")
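The reason to use ticks_diff() rather than plain subtraction is that the tick counter wraps around. The sketch below emulates the wrap-aware arithmetic in plain Python so it can be tried anywhere. TICKS_PERIOD here is an assumed value chosen for illustration (the real period is port-specific), and emulated_ticks_diff is a hypothetical stand-in, not the library function.

```python
TICKS_PERIOD = 2 ** 30   # assumed for illustration; the real period depends on the port
HALF = TICKS_PERIOD // 2

def emulated_ticks_diff(end, start):
    """Signed difference end - start that stays correct across one wrap."""
    return ((end - start + HALF) % TICKS_PERIOD) - HALF

start = TICKS_PERIOD - 100   # 100 ms before the counter wraps
end = 150                    # 150 ms after the wrap

print("plain subtraction:", end - start)
print("wrap-aware diff:  ", emulated_ticks_diff(end, start))
```

Plain subtraction yields a large negative number, while the wrap-aware version yields the expected 250 ms. A debounce check written with plain subtraction would therefore misbehave once, at the wrap, which is exactly the kind of bug that appears only after the device has been running for a long time.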

Hands-On Exercise 1: Reproduce a Flaky Button Toggle

Goal

Create a program that intentionally behaves badly when a button is pressed, then observe intermittent failure caused by button bounce.

Hardware

  • Pico 2 W
  • push button
  • built-in LED or external LED
  • jumper wires
  • breadboard

Wiring

Option A: Use the built-in LED

No external LED wiring needed.

Button Wiring

Connect:

  • one side of button → GP14
  • other side of button → GND

Use internal pull-up resistor in software.


Faulty Code: No Debounce

Save as main.py:

from machine import Pin
from time import sleep

# On the Pico W and Pico 2 W the built-in LED is wired to the wireless chip,
# so it is addressed by name ("LED") rather than as GP25
led = Pin("LED", Pin.OUT)

# Button connected to GP14 and GND
button = Pin(14, Pin.IN, Pin.PULL_UP)

# Track LED state
led_state = 0

print("Press the button. Notice that the LED may toggle multiple times per press.")

while True:
    # Button is active-low because we use PULL_UP
    if button.value() == 0:
        led_state = 1 - led_state
        led.value(led_state)
        print("Button detected, LED state =", led_state)
        sleep(0.05)  # Small delay, but not enough to fix bounce
    sleep(0.01)

Expected Behavior

Sometimes one press toggles the LED once. Sometimes it toggles multiple times.

Example Output

Press the button. Notice that the LED may toggle multiple times per press.
Button detected, LED state = 1
Button detected, LED state = 0
Button detected, LED state = 1

Observation

The LED may appear to flicker or end in the wrong state after a single press.


Discussion

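Two effects combine here. The button contacts physically bounce, producing several quick transitions from one press. In addition, the loop reacts to the button level rather than to the moment it is pressed, so a press held longer than about 60 ms (the 50 ms and 10 ms sleeps together) is detected again on the next pass. Your code responds to every low reading it sees.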


Hands-On Exercise 2: Add Debouncing to Fix the Failure

Goal

Modify the button program so each press is counted only once.

Debounced Code

from machine import Pin
from time import sleep_ms, ticks_ms, ticks_diff

# Built-in LED (addressed as "LED" on the Pico 2 W, not GP25)
led = Pin("LED", Pin.OUT)

# Button with internal pull-up
button = Pin(14, Pin.IN, Pin.PULL_UP)

led_state = 0
last_press_time = 0
debounce_ms = 200

print("Press the button. Each press should toggle the LED once.")

while True:
    now = ticks_ms()

    # Detect active-low button press
    if button.value() == 0:
        # Ignore presses that happen too soon after the last one
        if ticks_diff(now, last_press_time) > debounce_ms:
            led_state = 1 - led_state
            led.value(led_state)
            last_press_time = now
            print("Button accepted at", now, "ms, LED state =", led_state)

    sleep_ms(10)

Example Output

Press the button. Each press should toggle the LED once.
Button accepted at 15342 ms, LED state = 1
Button accepted at 16831 ms, LED state = 0
Button accepted at 18119 ms, LED state = 1

Notes

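  • debounce_ms blocks repeated triggers within the debounce window
  • ticks_ms() wraps around eventually; ticks_diff() still returns the correct elapsed time across the wrap, so compare timestamps with ticks_diff() rather than plain subtraction
  • The loop checks the button level, so holding the button longer than debounce_ms toggles the LED again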
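The time-window idea can also be pulled into a small object and tested with made-up timestamps, without any hardware. This is a sketch under stated assumptions: PressFilter is a hypothetical class, the timestamps are invented, and unlike the exercise code it always accepts the very first press.

```python
class PressFilter:
    """Accept a press only if enough time has passed since the last accepted one."""

    def __init__(self, window_ms, diff=None):
        self.window_ms = window_ms
        # diff(now, earlier) -> elapsed ms; on the Pico, pass time.ticks_diff
        self.diff = diff or (lambda now, earlier: now - earlier)
        self.last_accepted = None

    def accept(self, now_ms):
        if self.last_accepted is not None:
            if self.diff(now_ms, self.last_accepted) <= self.window_ms:
                return False
        self.last_accepted = now_ms
        return True

press_filter = PressFilter(200)
# Invented detection times in ms: bursts of bounce around three real presses.
detections = [1000, 1004, 1011, 1300, 1350, 1600]
accepted = [t for t in detections if press_filter.accept(t)]
print("accepted:", accepted)
```

Six detections collapse to three accepted presses (1000, 1300, 1600). Because the filter takes time as an argument, the same logic can be exercised on a desktop before it is trusted on the board.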

Intermittent problems often come from assumptions like:

  • “This sensor will always be ready immediately.”
  • “This Wi-Fi request will always return quickly.”
  • “This loop will always run at the same speed.”

In microcontroller projects, those assumptions are fragile.

Example: Blocking Code

If one part of your loop takes too long, you may miss input events.

from machine import Pin
from time import sleep

led = Pin("LED", Pin.OUT)  # built-in LED on the Pico 2 W

while True:
    led.toggle()
    sleep(2)   # Very long delay

This is simple, but during each 2-second sleep the program cannot check any input, so a short button press can go unnoticed.


Hands-On Exercise 3: Observe Timing Drift and Missed Events

Goal

See how long blocking delays can make a program appear unreliable.

Code

from machine import Pin
from time import sleep_ms, ticks_ms

led = Pin("LED", Pin.OUT)  # built-in LED on the Pico 2 W
button = Pin(14, Pin.IN, Pin.PULL_UP)

print("Press the button while the LED blinks slowly.")

while True:
    led.toggle()
    print("LED toggled at", ticks_ms(), "ms")

    # Blocking delay makes the system slow to react to button presses
    sleep_ms(1000)

    if button.value() == 0:
        print("Button was pressed, but maybe not noticed immediately.")

What to Observe

  • Button presses may feel delayed
  • Short presses may be missed
  • The main loop is too slow to react reliably

Discussion

A common fix is to structure code so input checks happen frequently and long tasks are split into smaller steps.
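A minimal sketch of that structure follows, using simulated time so it runs without hardware. Blinker is a hypothetical class and the button press window is invented. The sketch uses plain integer comparison for clarity; on the Pico the same idea would normally use ticks_ms(), ticks_diff(), and ticks_add() to stay correct across counter wraparound.

```python
class Blinker:
    """Toggle a state every interval_ms without sleeping for the whole interval."""

    def __init__(self, interval_ms, start_ms=0):
        self.interval_ms = interval_ms
        self.next_toggle = start_ms + interval_ms
        self.state = 0

    def update(self, now_ms):
        """Call often; returns True when the state toggled on this call."""
        if now_ms - self.next_toggle >= 0:
            self.state = 1 - self.state
            self.next_toggle += self.interval_ms
            return True
        return False

blinker = Blinker(1000)
toggles = []
press_seen_at = None
for now in range(0, 2501, 10):            # simulated 10 ms loop passes
    if blinker.update(now):
        toggles.append(now)
    button_pressed = 1230 <= now < 1260   # simulated 30 ms press
    if button_pressed and press_seen_at is None:
        press_seen_at = now
print("toggles at", toggles, "press seen at", press_seen_at)
```

The LED still toggles once per second (at 1000 and 2000), but the 30 ms press is caught at 1230 because the loop keeps running every 10 ms instead of sleeping through it.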


6) Best-Practice Fixes for Intermittent Failures

A. Debounce Inputs

Use time-based filtering for buttons and switches.

B. Avoid Long Blocking Delays

Prefer short sleeps or non-blocking state machines.

C. Add State Tracking

Detect edges or transitions rather than repeatedly reacting to a steady input level.

D. Validate Inputs

Reject invalid sensor readings or out-of-range values.
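As an illustration, a reading can be checked against a plausible range and against the last good value before it is used. is_plausible is a hypothetical helper, and the limits and sample values below are invented for the example rather than taken from any particular sensor.

```python
def is_plausible(reading, previous, low, high, max_step):
    """Accept a reading only if it is in range and not an implausible jump."""
    if reading is None:
        return False
    if reading < low or reading > high:
        return False
    if previous is not None and abs(reading - previous) > max_step:
        return False
    return True

# Invented temperature samples in degrees C; 85.0 and -127.0 stand in for glitches.
samples = [21.5, 21.6, 85.0, 21.7, -127.0, 21.9]
accepted = []
last_good = None
for value in samples:
    if is_plausible(value, last_good, low=-10.0, high=60.0, max_step=5.0):
        accepted.append(value)
        last_good = value
    else:
        print("rejected:", value)
print("accepted:", accepted)
```

Printing the rejected values, rather than silently dropping them, keeps a record of how often the glitch occurs.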

E. Use Retry Logic for Network Tasks

Wi-Fi and cloud services fail occasionally; retry cleanly.
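One possible shape for a reusable retry helper is sketched below. retry and flaky_request are hypothetical names, the doubling delay is one common choice rather than a requirement, and the sleep function is passed in so the helper can be tested without real delays (on the Pico you would pass time.sleep).

```python
def retry(operation, attempts, first_delay_s, wait):
    """Call operation() until it returns a truthy value or attempts run out."""
    delay = first_delay_s
    for attempt in range(1, attempts + 1):
        try:
            result = operation()
        except OSError as exc:   # network calls in MicroPython commonly raise OSError
            print("attempt", attempt, "raised", repr(exc))
            result = None
        if result:
            return result
        if attempt < attempts:
            wait(delay)
            delay *= 2           # back off: 1 s, 2 s, 4 s, ...
    return None

# Stand-in for a network request that fails twice, then succeeds.
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated timeout")
    return "ok"

waits = []
outcome = retry(flaky_request, 5, 1, waits.append)
print("result:", outcome, "waited:", waits)
```

Bounding the number of attempts matters as much as retrying: a loop that retries forever turns an occasional network failure into a program that appears frozen.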

F. Keep Hardware Wiring Simple

Check:

  • correct GPIO pins
  • common ground
  • secure connections
  • correct resistor values


Hands-On Exercise 4: Fix a Flaky LED Sensor Demo with State Tracking

Goal

Use a simulated input pattern to practice detecting transitions cleanly.

Code

from machine import Pin
from time import sleep_ms

led = Pin("LED", Pin.OUT)  # built-in LED on the Pico 2 W

# Simulated signal pattern with repeated 1s and 0s
signal = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]

last_value = signal[0]

print("Detecting only changes in input state...")

for value in signal:
    if value != last_value:
        print("State changed from", last_value, "to", value)
        led.value(value)
        last_value = value
    else:
        print("No change:", value)

    sleep_ms(200)

print("Done.")

Example Output

Detecting only changes in input state...
No change: 0
No change: 0
State changed from 0 to 1
No change: 1
No change: 1
State changed from 1 to 0
No change: 0
State changed from 0 to 1
State changed from 1 to 0
No change: 0
Done.

Learning Point

This pattern mirrors how real hardware state should often be handled: react to transitions, not repeated identical values.
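Transition detection can be combined with a different debouncing technique from the time window used earlier: accept a new state only after several consecutive samples agree. The sketch below is one way to do that for two-valued inputs. StableInput is a hypothetical class, and the sample trace is invented.

```python
class StableInput:
    """Report a new 0/1 state only after `needed` consecutive differing samples."""

    def __init__(self, initial, needed):
        self.state = initial
        self.needed = needed
        self.count = 0

    def update(self, sample):
        """Feed one sample; return True if the stable state just changed."""
        if sample == self.state:
            self.count = 0          # back at the stable state: restart counting
            return False
        self.count += 1
        if self.count >= self.needed:
            self.state = sample
            self.count = 0
            return True
        return False

# Invented active-low trace: bounce on press, a held press, bounce on release.
trace = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
button = StableInput(initial=1, needed=3)
changes = []
for index, sample in enumerate(trace):
    if button.update(sample):
        changes.append((index, button.state))
print("stable changes:", changes)
```

Despite the bouncing in the trace, only two changes are reported: one press (state 0 at index 8) and one release (state 1 at index 14). With a 10 ms sampling loop, needed=3 would correspond to roughly 30 ms of stable signal.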


7) Mini IoT Example: Robust Wi-Fi Retry Loop

Intermittent failures are common in network code too. A clean retry loop is essential.

Example: Wi-Fi Connection with Retry

import network
from time import sleep

SSID = "YOUR_WIFI_NAME"
PASSWORD = "YOUR_WIFI_PASSWORD"

wlan = network.WLAN(network.STA_IF)
wlan.active(True)

def connect_wifi(max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        if wlan.isconnected():
            print("Already connected:", wlan.ifconfig())
            return True

        print("Connecting attempt", attempt, "of", max_attempts)
        wlan.connect(SSID, PASSWORD)

        for _ in range(10):
            if wlan.isconnected():
                print("Connected:", wlan.ifconfig())
                return True
            sleep(1)

        print("Attempt", attempt, "failed")

    print("Could not connect to Wi-Fi")
    return False

connect_wifi()

Example Output

Connecting attempt 1 of 5
Attempt 1 failed
Connecting attempt 2 of 5
Connected: ('192.168.1.42', '255.255.255.0', '192.168.1.1', '8.8.8.8')

8) Troubleshooting Checklist

When a failure seems random, check:

  • Is the input bouncing?
  • Are wires loose?
  • Is the correct GPIO pin used?
  • Is there a common ground?
  • Is the code blocking too long?
  • Are retries missing for Wi-Fi?
  • Is the program printing enough diagnostic information?
  • Is the power supply stable?
  • Are there race-like effects between repeated events?

9) Practice Task

Write a small program that:

  1. Reads a button on GP14
  2. Toggles the built-in LED on each valid press
  3. Prints a timestamp for every accepted press
  4. Ignores repeated presses within 150 ms

Starter Template

from machine import Pin
from time import sleep_ms, ticks_ms, ticks_diff

led = Pin("LED", Pin.OUT)  # built-in LED on the Pico 2 W
button = Pin(14, Pin.IN, Pin.PULL_UP)

# TODO: initialize state variables
# TODO: implement debounce logic
# TODO: print accepted press times

while True:
    # TODO: detect button press
    sleep_ms(10)

10) Session Recap

Key Takeaways

  • Intermittent failures are often caused by timing, bounce, wiring, or network instability
  • The first step is to reproduce the failure consistently
  • Logging and timestamps make hidden behavior visible
  • Debouncing and state tracking fix many hardware input issues
  • Blocking code can make otherwise healthy programs appear unreliable
  • Retry logic is essential for Wi-Fi and IoT applications

11) Suggested Homework

Homework A

Modify the button debounce example so:

  • a short press toggles the LED
  • a long press prints LONG PRESS

Homework B

Create a Wi-Fi reconnect loop that:

  • checks connection status every 5 seconds
  • attempts reconnection if disconnected
  • prints each state change only once


MicroPython APIs used in this session:

  • machine.Pin
  • time.sleep_ms()
  • time.ticks_ms()
  • time.ticks_diff()
  • network.WLAN()

Reference Code Summary

Debounced Button Behavior

  • Reads button
  • Waits for a valid press interval
  • Toggles LED once per physical press

Retry-Based Wi-Fi Behavior

  • Activates station mode
  • Attempts connection repeatedly
  • Reports success or failure clearly

Expected Learning Outcome

A learner completing this session can identify and fix many real-world “it works most of the time” problems in Pico 2 W projects, especially those involving buttons, LEDs, timing, and Wi-Fi connectivity.

