Session 4: Reproducing and Fixing Intermittent Failures
Synopsis
Addresses flaky bugs caused by timing, networking, and environmental variation, with methods for controlled reproduction and systematic diagnosis.
Session Content
Duration: ~45 minutes
Audience: Python developers with basic programming knowledge learning MicroPython on Raspberry Pi Pico 2 W
Format: Theory + hands-on exercises
Session Goals
By the end of this session, learners will be able to:
- Recognize common causes of intermittent failures in embedded MicroPython programs
- Reproduce flaky behavior in a controlled way
- Use logging, timing, and assertions to isolate problems
- Fix race-like behavior caused by timing, blocking code, and unstable inputs
- Apply practical debugging techniques to Pico 2 W IoT and hardware projects
Prerequisites
- Raspberry Pi Pico 2 W
- USB cable
- Thonny IDE installed
- Basic familiarity with:
  - variables, functions, loops, and exceptions in Python
  - uploading and running MicroPython code on Pico
- Optional hardware for exercises:
  - built-in LED
  - push button
  - external LED + 220Ω resistor
  - breadboard and jumper wires
Development Environment Setup
Thonny Setup
- Install Thonny on your computer.
- Connect the Raspberry Pi Pico 2 W via USB.
- In Thonny:
  - Go to Tools → Options → Interpreter
  - Select MicroPython (Raspberry Pi Pico)
  - Choose the correct serial port for the Pico
- Confirm the REPL works by typing:
print("Hello from Pico")
Expected output:
Hello from Pico
Recommended Workflow
- Save code to the Pico as main.py for auto-run on boot
- Use Ctrl+C in the Thonny REPL to stop a running program
- Use Ctrl+D to soft reboot the Pico
- Add clear debug prints when testing intermittent behavior
Session Outline
- What intermittent failures look like
- Common embedded causes
- Reproducing flaky behavior reliably
- Debugging tools and strategies
- Hands-on lab: button-triggered LED failure
- Fixing the issue with debouncing and timing safeguards
- Wrap-up and best practices
1) Theory: What Are Intermittent Failures?
Intermittent failures are problems that do not happen every time. They may appear:
- only on some button presses
- only after running for a while
- only when power is unstable
- only when sensor readings change quickly
- only when code blocks too long
Typical Symptoms
- LED sometimes does not turn on
- sensor values occasionally jump to invalid numbers
- Wi-Fi reconnects unpredictably
- a button sometimes works and sometimes double-triggers
- program freezes after a few minutes
Why They Matter
Intermittent bugs are hard to reproduce, which makes them easy to misdiagnose. In embedded systems, timing and hardware interactions often expose bugs that never show up in desktop Python.
2) Common Causes in MicroPython on Pico 2 W
A. Button Bounce
Mechanical buttons do not produce a single clean transition. One press may create multiple rapid signals.
B. Timing Issues
Code that depends on exact timing may fail if:
- sleep() is too short or too long
- work inside a loop blocks other tasks
- a sensor needs stabilization time
C. Shared State Bugs
Variables changed from multiple places can create inconsistent behavior.
D. Weak Hardware Connections
Loose wires, poor grounding, or unstable power can cause random failures.
E. Network Instability
For IoT projects, Wi-Fi drops or slow responses can look like application bugs.
F. Resource Exhaustion
Creating too many objects, opening files repeatedly, or using memory carelessly can cause strange behavior over time.
3) Reproducing Intermittent Failures
To fix a flaky issue, first make it easier to reproduce.
Strategies
- Reduce the problem to the smallest possible example
- Add timestamps to events
- Log state changes
- Trigger the issue repeatedly in a loop
- Use artificial delays to expose race-like timing issues
- Test one hardware input at a time
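Timestamps and logged state changes from the list above can be combined in one small helper. The sketch below is an illustration only: EventLog and its method names are hypothetical, and the timestamps are invented. The caller passes timestamps in, so the same code can be tried on a desktop Python before moving it to the Pico.

```python
class EventLog:
    """Keep the most recent events with their timestamps (hypothetical helper)."""

    def __init__(self, max_events=20):
        self.max_events = max_events
        self.events = []  # list of (timestamp_ms, label) tuples

    def record(self, timestamp_ms, label):
        self.events.append((timestamp_ms, label))
        # Drop the oldest entry so memory use stays bounded
        if len(self.events) > self.max_events:
            self.events.pop(0)

    def gaps(self):
        """Return the time in ms between consecutive recorded events."""
        return [
            self.events[i][0] - self.events[i - 1][0]
            for i in range(1, len(self.events))
        ]


log = EventLog(max_events=3)
# Invented timestamps: three detections a few ms apart, then one much later
for t in (1000, 1004, 1009, 2500):
    log.record(t, "press")

print(log.events)  # only the last three events are kept
print(log.gaps())  # very small gaps hint at bounce rather than separate presses
```

On the Pico, the timestamp would come from ticks_ms(), and the subtraction in gaps() would be better done with ticks_diff() so that counter wraparound is handled.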
Example Debug Logging Pattern
from machine import Pin
from time import ticks_ms, sleep_ms
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
while True:
    print("t=", ticks_ms(), "LED ON")
    led.value(1)
    sleep_ms(500)
    print("t=", ticks_ms(), "LED OFF")
    led.value(0)
    sleep_ms(500)
Example output:
t= 1024 LED ON
t= 1526 LED OFF
t= 2027 LED ON
t= 2529 LED OFF
4) Debugging Tools and Strategies
Use Print Statements Wisely
Print:
- input values
- state changes
- timestamps
- error messages
Avoid printing in very tight loops unless needed, because it can slow the program and affect timing.
Use Assertions
Assertions can catch impossible states early.
def set_brightness(level):
    # Fail immediately if a caller passes an out-of-range value
    assert 0 <= level <= 100, "level must be between 0 and 100"
Use Small Test Programs
Instead of testing the full application, isolate:
- button input
- LED output
- sensor reading
- Wi-Fi connection
Use Time-Based Measurements
MicroPython provides ticks_ms() for measuring elapsed time.
from time import ticks_ms, ticks_diff
start = ticks_ms()
# do something
elapsed = ticks_diff(ticks_ms(), start)
print("Elapsed:", elapsed, "ms")
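The same measurement can be repeated to find the slowest run of a piece of code, which is often what matters for intermittent problems. This is a minimal sketch: measure() and sample_task() are hypothetical names, and the except branch is an assumption added only so the example also runs on a desktop Python, where ticks_ms() and ticks_diff() do not exist.

```python
try:
    from time import ticks_ms, ticks_diff  # available in MicroPython
except ImportError:
    # Assumption: desktop fallback so the sketch can be tried off-device
    import time

    def ticks_ms():
        return int(time.monotonic() * 1000)

    def ticks_diff(end, start):
        return end - start


def measure(task, repeats=5):
    """Run task() several times; return (slowest_ms, all_durations_ms)."""
    durations = []
    for _ in range(repeats):
        start = ticks_ms()
        task()
        durations.append(ticks_diff(ticks_ms(), start))
    return max(durations), durations


def sample_task():
    # Stand-in for real work such as reading a sensor
    total = 0
    for i in range(1000):
        total += i
    return total


slowest, durations = measure(sample_task)
print("Slowest run:", slowest, "ms; all runs:", durations)
```

If the slowest run is much longer than the typical one, that outlier is a good place to start looking for missed inputs.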
Hands-On Exercise 1: Reproduce a Flaky Button Toggle
Goal
Create a program that intentionally behaves badly when a button is pressed, then observe intermittent failure caused by button bounce.
Hardware
- Pico 2 W
- push button
- built-in LED or external LED
- jumper wires
- breadboard
Wiring
Option A: Use the built-in LED
No external LED wiring needed.
Button Wiring
Connect:
- one side of button → GP14
- other side of button → GND
Use the internal pull-up resistor in software.
Faulty Code: No Debounce
Save as main.py:
from machine import Pin
from time import sleep
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
# Button connected to GP14 and GND
button = Pin(14, Pin.IN, Pin.PULL_UP)
# Track LED state
led_state = 0
print("Press the button. Notice that the LED may toggle multiple times per press.")
while True:
    # Button is active-low because we use PULL_UP
    if button.value() == 0:
        led_state = 1 - led_state
        led.value(led_state)
        print("Button detected, LED state =", led_state)
        sleep(0.05)  # Small delay, but not enough to make presses reliable
    sleep(0.01)
Expected Behavior
Sometimes one press toggles the LED once. Sometimes it toggles multiple times.
Example Output
Press the button. Notice that the LED may toggle multiple times per press.
Button detected, LED state = 1
Button detected, LED state = 0
Button detected, LED state = 1
Observation
The LED may appear to flicker or end in the wrong state after a single press.
Discussion
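The code does what it was written to do; the input is messier than the code assumes. Two effects combine. The button contacts bounce, producing several quick transitions for one press. The loop also reacts to the button level rather than to the moment of pressing, so a press that is still held after the 50 ms delay is detected again on the next pass.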
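Looking at a recorded trace can make the bounce easier to picture. The sample values below are invented for illustration, not measured from a real button, and count_falling_edges() is a hypothetical helper. It counts how often an active-low input goes from released (1) to pressed (0).

```python
def count_falling_edges(samples):
    """Count 1 -> 0 transitions in a list of sampled pin levels."""
    count = 0
    previous = 1  # released: the pull-up holds the pin at 1
    for level in samples:
        if previous == 1 and level == 0:
            count += 1
        previous = level
    return count


# One physical press: bounce on contact, a steady hold, bounce on release
samples = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
edges = count_falling_edges(samples)
print("Falling edges seen for one press:", edges)
```

A program that reacted to every falling edge in this trace would register four presses where the user made one.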
Hands-On Exercise 2: Add Debouncing to Fix the Failure
Goal
Modify the button program so each press is counted only once.
Debounced Code
from machine import Pin
from time import sleep_ms, ticks_ms, ticks_diff
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
# Button with internal pull-up
button = Pin(14, Pin.IN, Pin.PULL_UP)
led_state = 0
# Start the window at the current tick so ticks_diff() always compares nearby values
last_press_time = ticks_ms()
debounce_ms = 200
print("Press the button. Each press should toggle the LED once.")
while True:
    now = ticks_ms()
    # Detect active-low button press
    if button.value() == 0:
        # Ignore presses that happen too soon after the last one
        if ticks_diff(now, last_press_time) > debounce_ms:
            led_state = 1 - led_state
            led.value(led_state)
            last_press_time = now
            print("Button accepted at", now, "ms, LED state =", led_state)
    sleep_ms(10)
Example Output
Press the button. Each press should toggle the LED once.
Button accepted at 15342 ms, LED state = 1
Button accepted at 16831 ms, LED state = 0
Button accepted at 18119 ms, LED state = 1
Notes
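- debounce_ms blocks repeated triggers within the debounce window
- ticks_diff() gives the correct elapsed time even when the ticks_ms() counter wraps around
- The check looks at the button level, so holding the button longer than debounce_ms toggles the LED again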
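The time-window check reads the button level, so a press held longer than debounce_ms would be accepted a second time. One possible way around this, offered as a sketch rather than as the session's official solution, is to react only to changes in level and to ignore any change that follows another change too closely. DebouncedButton is a hypothetical name, the trace values are invented, and the except branch is an assumption so the example also runs on a desktop Python.

```python
try:
    from time import ticks_diff  # available in MicroPython
except ImportError:
    def ticks_diff(end, start):  # assumption: desktop fallback for off-device testing
        return end - start


class DebouncedButton:
    """Accept one press per physical press of an active-low button."""

    def __init__(self, debounce_ms=200):
        self.debounce_ms = debounce_ms
        self.last_level = 1         # released: the pull-up reads 1
        self.last_change_ms = None  # time of the most recent level change

    def update(self, level, now_ms):
        """Feed one sample; return True when a new press is accepted."""
        if level == self.last_level:
            return False  # steady level, including a held button
        quiet = (
            self.last_change_ms is None
            or ticks_diff(now_ms, self.last_change_ms) > self.debounce_ms
        )
        self.last_level = level
        self.last_change_ms = now_ms
        # Accept only a press (level 0) that follows a quiet period
        return quiet and level == 0


button = DebouncedButton(debounce_ms=200)
# (time in ms, level): bouncy press, long hold, bouncy release, second press
trace = [(0, 1), (10, 0), (20, 1), (30, 0), (40, 0), (400, 0),
         (410, 1), (420, 0), (430, 1), (900, 0)]
accepted = [t for t, level in trace if button.update(level, t)]
print(accepted)
```

On the Pico, update() would be called on every loop pass with button.value() and ticks_ms(). The trade-off is that a genuine second press within debounce_ms of the release is also ignored.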
5) Theory: Timing-Related Failures
Intermittent problems often come from assumptions like:
- “This sensor will always be ready immediately.”
- “This Wi-Fi request will always return quickly.”
- “This loop will always run at the same speed.”
In microcontroller projects, those assumptions are fragile.
Example: Blocking Code
If one part of your loop takes too long, you may miss input events.
from machine import Pin
from time import sleep
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
while True:
    led.toggle()
    sleep(2)  # Very long delay
This is simple, but it prevents the system from reacting quickly to inputs.
Hands-On Exercise 3: Observe Timing Drift and Missed Events
Goal
See how long blocking delays can make a program appear unreliable.
Code
from machine import Pin
from time import sleep_ms, ticks_ms
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
button = Pin(14, Pin.IN, Pin.PULL_UP)
print("Press the button while the LED blinks slowly.")
while True:
    led.toggle()
    print("LED toggled at", ticks_ms(), "ms")
    # Blocking delay makes the system slow to react to button presses
    sleep_ms(1000)
    # The button is only checked once per second, right after the delay
    if button.value() == 0:
        print("Button was pressed, but maybe not noticed immediately.")
What to Observe
- Button presses may feel delayed
- Short presses may be missed
- The main loop is too slow to react reliably
Discussion
A common fix is to structure code so input checks happen frequently and long tasks are split into smaller steps.
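One way to split a long delay into small steps is to check on every pass whether an interval has elapsed instead of sleeping through it. The sketch below is an assumption-based illustration: IntervalTimer is a hypothetical helper, the loop is simulated with invented time values so it runs without hardware, and the except branch exists only for desktop Python.

```python
try:
    from time import ticks_diff  # available in MicroPython
except ImportError:
    def ticks_diff(end, start):  # assumption: desktop fallback for off-device testing
        return end - start


class IntervalTimer:
    """Report when a fixed interval has elapsed, without sleeping."""

    def __init__(self, interval_ms, start_ms=0):
        self.interval_ms = interval_ms
        self.last_ms = start_ms

    def due(self, now_ms):
        if ticks_diff(now_ms, self.last_ms) >= self.interval_ms:
            self.last_ms = now_ms
            return True
        return False


# Simulated main loop: one pass every 10 ms for 2.5 seconds
blink = IntervalTimer(interval_ms=1000)
toggle_times = []
for now in range(0, 2501, 10):
    # A real loop would also read the button here, on every pass
    if blink.due(now):
        toggle_times.append(now)

print(toggle_times)
```

On the Pico, the simulated loop would become a while True loop that passes ticks_ms() to due(), calls led.toggle() when it returns True, and ends each pass with sleep_ms(10). The button is then checked about every 10 ms instead of once per second.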
6) Best-Practice Fixes for Intermittent Failures
A. Debounce Inputs
Use time-based filtering for buttons and switches.
B. Avoid Long Blocking Delays
Prefer short sleeps or non-blocking state machines.
C. Add State Tracking
Detect edges or transitions rather than repeatedly reacting to a steady input level.
D. Validate Inputs
Reject invalid sensor readings or out-of-range values.
E. Use Retry Logic for Network Tasks
Wi-Fi and cloud services fail occasionally; retry cleanly.
F. Keep Hardware Wiring Simple
Check:
- correct GPIO pins
- common ground
- secure connections
- correct resistor values
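Item D above (validate inputs) can be sketched as a small filter function. Everything here is an assumption for illustration: accept_reading() is a hypothetical helper, the limits are arbitrary, and the readings are invented rather than taken from a real sensor.

```python
def accept_reading(value, low, high, previous=None, max_step=None):
    """Return True if value is in range and not an implausible jump."""
    if value is None:
        return False
    if not (low <= value <= high):
        return False
    if previous is not None and max_step is not None:
        if abs(value - previous) > max_step:
            return False
    return True


# Invented temperature readings; 85.0 and -127.0 stand in for glitch values
readings = [21.4, 21.6, 85.0, 21.5, -127.0, 21.7]
clean = []
last_good = None
for r in readings:
    if accept_reading(r, low=-40, high=60, previous=last_good, max_step=5):
        clean.append(r)
        last_good = r

print(clean)
```

Logging each rejected value with a timestamp, rather than dropping it silently, keeps the evidence needed to find out why the glitch happened.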
Hands-On Exercise 4: Fix a Flaky LED Sensor Demo with State Tracking
Goal
Use a simulated input pattern to practice detecting transitions cleanly.
Code
from machine import Pin
from time import sleep_ms
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
# Simulated signal pattern with repeated 1s and 0s
signal = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
last_value = signal[0]
print("Detecting only changes in input state...")
for value in signal:
    if value != last_value:
        print("State changed from", last_value, "to", value)
        led.value(value)
        last_value = value
    else:
        print("No change:", value)
    sleep_ms(200)
print("Done.")
Example Output
Detecting only changes in input state...
No change: 0
No change: 0
State changed from 0 to 1
No change: 1
No change: 1
State changed from 1 to 0
No change: 0
State changed from 0 to 1
State changed from 1 to 0
No change: 0
Done.
Learning Point
This pattern mirrors how real hardware state should often be handled: react to transitions, not repeated identical values.
7) Mini IoT Example: Robust Wi-Fi Retry Loop
Intermittent failures are common in network code too. A clean retry loop is essential.
Example: Wi-Fi Connection with Retry
import network
from time import sleep
SSID = "YOUR_WIFI_NAME"
PASSWORD = "YOUR_WIFI_PASSWORD"
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
def connect_wifi(max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        if wlan.isconnected():
            print("Already connected:", wlan.ifconfig())
            return True
        print("Connecting attempt", attempt, "of", max_attempts)
        wlan.connect(SSID, PASSWORD)
        # Poll once per second, for up to 10 seconds per attempt
        for _ in range(10):
            if wlan.isconnected():
                print("Connected:", wlan.ifconfig())
                return True
            sleep(1)
        print("Attempt", attempt, "failed")
    print("Could not connect to Wi-Fi")
    return False
connect_wifi()
Example Output
Connecting attempt 1 of 5
Attempt 1 failed
Connecting attempt 2 of 5
Connected: ('192.168.1.42', '255.255.255.0', '192.168.1.1', '8.8.8.8')
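The retry idea can be separated from the Wi-Fi calls so it can be tested without a network. The sketch below is a variation, not the session's method: retry() is a hypothetical helper, the doubling delay is an added assumption, and fake_connect() simulates an operation that fails twice before succeeding.

```python
import time


def retry(operation, max_attempts=5, base_delay_s=1, wait=None):
    """Call operation() until it returns True or attempts run out.

    Returns the attempt number that succeeded, or None on failure.
    wait is the pause function; it defaults to time.sleep and can be
    replaced in tests so they do not actually sleep.
    """
    if wait is None:
        wait = time.sleep
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        if operation():
            return attempt
        if attempt < max_attempts:
            wait(delay)
            delay *= 2  # back off: 1 s, 2 s, 4 s, ...
    return None


# Simulated operation: fails twice, then succeeds
outcomes = [False, False, True]
delays = []


def fake_connect():
    return outcomes.pop(0)


attempts_used = retry(fake_connect, wait=delays.append)
print("Succeeded on attempt", attempts_used, "after waits of", delays)
```

On the Pico, operation could be a function that calls wlan.connect() and then polls wlan.isconnected() for a few seconds before returning True or False.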
8) Troubleshooting Checklist
When a failure seems random, check:
- Is the input bouncing?
- Are wires loose?
- Is the correct GPIO pin used?
- Is there a common ground?
- Is the code blocking too long?
- Are retries missing for Wi-Fi?
- Is the program printing enough diagnostic information?
- Is the power supply stable?
- Are there race-like effects between repeated events?
9) Practice Task
Write a small program that:
- Reads a button on GP14
- Toggles the built-in LED on each valid press
- Prints a timestamp for every accepted press
- Ignores repeated presses within 150 ms
Starter Template
from machine import Pin
from time import sleep_ms, ticks_ms, ticks_diff
# On the Pico 2 W the built-in LED is addressed by name, not as GP25
led = Pin("LED", Pin.OUT)
button = Pin(14, Pin.IN, Pin.PULL_UP)
# TODO: initialize state variables
# TODO: implement debounce logic
# TODO: print accepted press times
while True:
    # TODO: detect button press
    sleep_ms(10)
10) Session Recap
Key Takeaways
- Intermittent failures are often caused by timing, bounce, wiring, or network instability
- The first step is to reproduce the failure consistently
- Logging and timestamps make hidden behavior visible
- Debouncing and state tracking fix many hardware input issues
- Blocking code can make otherwise healthy programs appear unreliable
- Retry logic is essential for Wi-Fi and IoT applications
11) Suggested Homework
Homework A
Modify the button debounce example so:
- a short press toggles the LED
- a long press prints LONG PRESS
Homework B
Create a Wi-Fi reconnect loop that:
- checks connection status every 5 seconds
- attempts reconnection if disconnected
- prints each state change only once
Appendix: Recommended MicroPython Functions Used
- machine.Pin
- time.sleep_ms()
- time.ticks_ms()
- time.ticks_diff()
- network.WLAN()
Reference Code Summary
Debounced Button Behavior
- Reads button
- Waits for a valid press interval
- Toggles LED once per physical press
Retry-Based Wi-Fi Behavior
- Activates station mode
- Attempts connection repeatedly
- Reports success or failure clearly
Expected Learning Outcome
A learner completing this session can identify and fix many real-world “it works most of the time” problems in Pico 2 W projects, especially those involving buttons, LEDs, timing, and Wi-Fi connectivity.