/Register

Multicore debugging challenges in Zephyr RTOS: Part 2 – Cache coherency

March 31, 20265 min read
Multicore debugging challenges in Zephyr RTOS: Part 2 – Cache coherency

By Bea Ben Ali

Introduction

In part 1 of our three-part blog series, we discussed how to debug and analyze race conditions.


This second installment focuses on how to detect cache coherency issues, which are another common problem in multicore systems with multiple cores that must play well together.

Before we dive in, let’s first review how caches can be organized in a multicore system.
The diagram below shows how cache memory in a quad-core system is typically divided into L1, L2, and L3.

As developers, we choose which level to use based on the specific needs of our data.
L1 is the fastest but smallest, while L3 is largest and shared across multiple cores.

For this blog, we will focus on a dual-core system as we present ideas on how to trace and handle cache coherency problems when each core holds its own cache.

Debugging cache coherency problems

Cache coherency problems occur when several cores each have their own copy of the same data and access it at the wrong time. If one copy is changed, others may still use stale data, causing unpredictable behavior. Traditional debuggers show only main memory, not stale cached values.

Debugging with printf() can sometimes create “Heisenbugs,” where observing the system changes its timing.
Below is an animation showing how values on two cores diverge when cache is not handled:

Video Player

00:00

00:18

Overcoming cache coherency issues on Zephyr RTOS systems with SystemView with Ozone support

As shown in part 1, we must add instrumentation using SystemView to log system events. Initialize SystemView before adding event macros in key sections.

Enable debugging features in Zephyr

As a starting point, let’s configure Zephyr for multi-core debugging and tracing:

CONFIG_MP_MAX_NUM_CPUS=2

Enable the cache-related macro:

CONFIG_CACHE_MANAGEMENT=y

(Editor’s note: Enabling this config means the kernel handles cache coherency in AMP systems. If missing, use the tips below.)

Instrument the code for tracing

The next code snippet uses atomic operations, which are useful for dual-core systems.
SystemView provides SEGGER_SYSVIEW_Printf() and SEGGER_SYSVIEW_PrintfHost() for tracing without printf().

We create two threads running on separate cores. Shared data is in an external file accessible by both cores.

Attempt 1: Inconsistent shared data

main for core 1

#define STACKSIZE 1024

#

SEGGER_SYSVIEW_PrintfHost("Core1: shared_data = %d", shared_data);

#

k_msleep(10);

Code snippet: main for core 0

#define STACKSIZE 1024

#

SEGGER_SYSVIEW_PrintfHost("Core0: shared_data = %d", shared_data);

#

k_msleep(10);

(Editor’s note: shared_data will be declared extern so both cores can access it.)

Attempt 2: Consistent shared data

main for core 0

#define STACKSIZE 1024

#

SEGGER_SYSVIEW_PrintfHost("Core0: Pre-Increment: %d", atomic_get(&shared_data));

#

atomic_inc(&shared_data);

#

SEGGER_SYSVIEW_PrintfHost("Core0: Post-Increment: %d", atomic_get(&shared_data));

#

k_msleep(10);

Code snippet: main for core 1

#define STACKSIZE 1024

#

SEGGER_SYSVIEW_PrintfHost("Core1: Pre-Increment: %d", atomic_get(&shared_data));

#

atomic_inc(&shared_data);

#

SEGGER_SYSVIEW_PrintfHost("Core1: Post-Increment: %d", atomic_get(&shared_data));

#

k_msleep(10);

Spinlock Code snippet:

static atomic_t lock;

(Editor’s note: Zephyr also provides k_spinlock APIs:

DMB Code snippet:

void core1_thread(void *p1, void *p2, void *p3) {

#

SEGGER_SYSVIEW_PrintfHost("Core1: Pre-Increment: %d", atomic_get(&shared_data));

#

atomic_inc(&shared_data);

#

SEGGER_SYSVIEW_PrintfHost("Core1: Post-Increment: %d", atomic_get(&shared_data));

#

k_msleep(10);

Debugging workflow tips

Step 1: Use SystemView to detect race conditions.
Step 2: Use Ozone to inspect memory, registers, cache control, and coherency.

Zephyr cache APIs

Zephyr provides cache maintenance APIs such as cache_data_invd_range() and cache_data_flush_range().
These work only on Cortex-M7 because it has an L1 cache.

Up next: Inter-core messaging

In the next part of this series, we’ll discuss inter-core messaging.

References

© ThroughPut Marketing 2026All rights reserved
LMR Media & Marketing | LLC dba ThroughPut Marketing | 3104 E Camelback Rd | Unit #7781, Phoenix, AZ 85016