Basics and tools for multi-core debugging

In the past, debugging meant hunting for variables that had been written with wrong values. These days, it’s completely different: for the multi-core systems now used in automotive control units, debugging means managing deadlocks, resource conflicts or timing issues of real-time applications.


By Jens Braunes, PLS


The paradigm shift and the dramatic increase in complexity represent a big challenge for silicon vendors as well as for tool providers, and they can only master it together. The reason is that the on-chip debug functions integrated by silicon vendors become fully effective only with powerful software tools that can utilize them completely and make them efficiently usable for developers. In the consumer market, multi-core systems have been mainstream for more than 10 years. But in deeply embedded systems, such as motor control, the technology shift towards multi-core took place only in recent years, and often hesitantly. One reason is certainly the high demands on safety, reliability and real-time behavior, which have the highest priority across the whole area of automotive applications. Another is the huge existing portfolio of proven and well-tested software modules for single-core systems, whose porting to multiple heterogeneous cores would require significant effort.

If we look at the world of PCs or consumer electronics dominated by Windows, Linux or Android operating systems, the CPUs used are based on homogeneous multi-core architectures. Identical cores with identical instruction sets, performance and interconnect to the other on-chip units allow any OS task or process to be executed by any core. Task creation and core allocation take place dynamically at run-time in order to balance the load and optimize the run-time behavior.

Figure 1. AURIX multi-core architecture (source: Infineon)

 

In the world of automotive control applications the multi-core approach is completely different. In general, tasks have dedicated processing times and slots, and must guarantee a response within a specific time limit. The tasks are also heterogeneous: they place very different demands on performance, communication resources and instruction set features. For this reason, mostly heterogeneous multi-core architectures are used, with several different cores tailored to the needs of specific tasks. One example of such a microcontroller is the AURIX from Infineon, a complete device family of multi-core controllers that is widely used in engine control units today. Although the main cores all come from the TriCore architecture family, they differ in details. Some AURIX devices implement different flavors of the core architecture, e.g. performance cores (P-cores) and economy cores (E-cores). Some of the cores feature additional lockstep cores, enabling them to fulfill higher safety requirements such as those stipulated by ISO 26262. The lockstep cores are based on the same core architecture and execute the same code. The results of the actual computational core and of the lockstep core are compared with each other, so that the reliability of code execution and calculations is checked continuously. If a deviation arises, the whole system has to be reset into a safe state.
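
The lockstep principle can be illustrated with a small software analogy. This is only a sketch: on a real device the comparison is done in hardware, not in application code, and all names used here (`run_in_lockstep`, `LockstepMismatch`, `scale`, `faulty_scale`) are hypothetical.

```python
from typing import Callable


class LockstepMismatch(Exception):
    """Raised when the main core and the checker core disagree."""


def run_in_lockstep(main: Callable[[int], int],
                    checker: Callable[[int], int],
                    value: int) -> int:
    """Run the same computation twice and compare the results."""
    main_result = main(value)
    check_result = checker(value)
    if main_result != check_result:
        # On real hardware a deviation would force the system
        # into a safe state; here we only raise an exception.
        raise LockstepMismatch(f"{main_result} != {check_result}")
    return main_result


def scale(x: int) -> int:
    """Hypothetical computation executed by both cores."""
    return 3 * x + 1


def faulty_scale(x: int) -> int:
    """Same computation with one flipped result bit (simulated fault)."""
    return (3 * x + 1) ^ 0x4
```

Calling `run_in_lockstep(scale, scale, 5)` returns the common result, whereas `run_in_lockstep(scale, faulty_scale, 5)` raises `LockstepMismatch`.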

Optimized algorithms for advanced timing control, as needed for PWMs for electric and hybrid drives, or for complex signal processing are supported by an additional core of the AURIX, namely the GTM (Generic Timer Module), an intellectual property (IP) block from Bosch. The GTM is completely different from the TriCore-based cores. It has many units dedicated to signal generation and processing as well as a number of processing units which can be programmed in a RISC-like manner. The tasks on the GTM run loosely coupled to the TriCore tasks but can communicate with them using a kind of shared memory. It is quite obvious that allocating tasks to the different cores is not only a question of load balancing. It is rather a question of which core is the most appropriate one for executing the task. That decision can hardly be made by an operating system at run-time. As a consequence, which core takes over which task has to be determined during the design phase of the software.
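
As a rough illustration of such a design-time decision, the following sketch assigns each task to the first core that offers what the task needs. The core names, feature labels and tasks are invented for this example and do not describe a real AURIX configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    features: frozenset
    lockstep: bool = False


@dataclass(frozen=True)
class Task:
    name: str
    needs: frozenset = frozenset()
    safety_critical: bool = False


def allocate(tasks, cores):
    """Return {task name: core name}, fixed before the software is built."""
    mapping = {}
    for task in tasks:
        for core in cores:
            has_features = task.needs <= core.features
            safe_enough = core.lockstep or not task.safety_critical
            if has_features and safe_enough:
                mapping[task.name] = core.name
                break
        else:
            raise ValueError(f"no suitable core for {task.name}")
    return mapping


# Hypothetical device: one economy core, one lockstep performance core, GTM.
CORES = [
    Core("cpu1_e", frozenset()),
    Core("cpu0_p", frozenset({"fpu"}), lockstep=True),
    Core("gtm", frozenset({"timer_units"})),
]
# Hypothetical tasks with their requirements.
TASKS = [
    Task("injection_control", frozenset({"fpu"}), safety_critical=True),
    Task("pwm_generation", frozenset({"timer_units"})),
    Task("diagnostics"),
]
ALLOCATION = allocate(TASKS, CORES)
```

The result is a static table rather than a scheduler decision: the suitability of a core, not its current load, determines where a task runs.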

Figure 2. OCDS trigger switch of Infineon AURIX

 

Applications which execute distributed across several cores and have to cope with high real-time demands are often the most challenging for debugging, test and system analysis. The typically strong dependencies between tasks running on different cores have a considerable influence on run-mode debugging, also known as stop-go debugging. It can be quite dangerous to break a single core while the others keep running. In the worst case, the whole application ends up in chaos or crashes. Sometimes the other cores and also the peripherals have to be halted as well so that the application does not get into an undefined state. The problem is that heterogeneous cores with different clocking and execution pipelines do not allow a truly synchronous stop. In practice, there will always be a delay. There is also the opposite case: if, for instance, another, completely independent application is running in parallel on the same processor but on different cores, then it might be dangerous to halt the complete multi-core system. These scenarios show the importance of flexible, synchronous run-control in a multi-core debug infrastructure.

A second, no less important aspect is analyzing the run-time behavior without influencing it at the same time. This non-intrusive system observation plays an important role not only for real-time critical applications but also for profiling tasks or monitoring communication between cores. Often it is desirable to read out the system state from the target by means of the debugger at a certain point in time. However, if we halt the application for that purpose, the system behavior changes fundamentally and no longer has anything to do with the behavior of the application running later without an attached debugger. As a consequence, trace is indispensable for efficient non-intrusive system observation.

Before we take a deeper look into trace features, we first come back to synchronous run-control. Synchronous run-control necessarily requires short signal paths between the cores, which can only be realized by on-chip debug hardware. Signaling stop and go requests from the outside, for example from the debug probe via the debug interface, takes too much time, in particular at today's high clock rates. By the time the complete system has finally stopped, the states of the individual tasks have lost their coherence completely.

Figure 3. ARM CoreSight debug and trace infrastructure with cross-triggering

 

All silicon vendors provide their own on-chip debug solution. There is no real standard in that area. Infineon, for example, calls its solution OCDS (on-chip debug support). The central component of OCDS for run-control is a trigger switch, which propagates halt and suspend signals via so-called trigger lines to all cores and also to peripherals. The trigger switch can be configured so that individual cores as well as peripherals are stopped and started again at the same time, without affecting the others. In addition, the trigger lines can be connected to pins and thus made available to the outside world. This offers interesting opportunities. For example, signals can be connected to an oscilloscope, or a break can be triggered externally.
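
The idea of a configurable trigger switch can be captured in a toy model. The sketch below is purely conceptual: it does not reflect the OCDS register interface, and the class, line and target names are assumptions made for illustration.

```python
class TriggerSwitch:
    """Toy model: a trigger line forwards halt/resume to its listeners."""

    def __init__(self, targets):
        # Every core or peripheral starts in the running state.
        self.running = {name: True for name in targets}
        self.lines = {}  # line name -> set of targets listening on it

    def connect(self, line, listeners):
        """Configure which targets react to a given trigger line."""
        unknown = set(listeners) - set(self.running)
        if unknown:
            raise KeyError(f"unknown targets: {sorted(unknown)}")
        self.lines[line] = set(listeners)

    def halt(self, line):
        """Stop all targets on the line; everything else keeps running."""
        for target in self.lines[line]:
            self.running[target] = False

    def resume(self, line):
        """Restart all targets on the line together."""
        for target in self.lines[line]:
            self.running[target] = True


# Hypothetical device with three cores, the GTM and a CAN peripheral.
switch = TriggerSwitch(["cpu0", "cpu1", "cpu2", "gtm", "can"])
switch.connect("line0", ["cpu0", "cpu1", "can"])
switch.halt("line0")
```

After `switch.halt("line0")`, `cpu0`, `cpu1` and `can` are stopped while `cpu2` and `gtm` continue, which corresponds to the case of an independent application running on other cores.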

Of course, besides the AURIX, a number of other microcontroller architectures are used for automotive applications. Two examples are SoCs based on the ARM Cortex-R architecture and the Power Architecture-based SPC5 from STMicroelectronics. Both bring their own implementation of on-chip run-control support. On the ARM side, it is called CoreSight. Let’s have a look at this.

In CoreSight a so-called cross-trigger matrix (CTM) is used to propagate break and go signals across the cores. The cores themselves can trigger such signals and respond to them, but not directly: a cross-trigger interface (CTI) attached to each core takes care of it. Up to four channels in a CTM broadcast the signals to all attached CTIs. Each CTI can be configured either to pass the run-control signals on to its core or to block them. In this way a core is either halted along with the others or not. Because of the handshake mechanisms that are necessary between the different components, there is a small delay of several clock cycles. The actual length of that delay depends heavily on the implementation, and avoiding it is technically not possible. One drawback of the ARM solution is that CoreSight is in fact a set of components and IP blocks from which silicon vendors can choose. As a consequence, debug tool vendors cannot rely on the existence of CTMs and CTIs in a particular multi-core SoC.
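
The interplay of CTM and CTIs can be sketched in the same conceptual style. This is a simplified model under stated assumptions: it ignores the handshake delay mentioned above, it is not based on the CoreSight register map, and the class and method names are hypothetical.

```python
class CrossTriggerInterface:
    """Per-core gate: passes a channel event to the core or blocks it."""

    def __init__(self, core):
        self.core = core
        self.enabled_channels = set()
        self.halted = False

    def on_channel_event(self, channel):
        # Only channels enabled in this CTI reach the attached core.
        if channel in self.enabled_channels:
            self.halted = True


class CrossTriggerMatrix:
    """Broadcasts an event on one channel to every attached CTI."""

    def __init__(self, num_channels=4):
        self.num_channels = num_channels
        self.ctis = []

    def attach(self, cti):
        self.ctis.append(cti)

    def broadcast(self, channel):
        if not 0 <= channel < self.num_channels:
            raise ValueError(f"no such channel: {channel}")
        for cti in self.ctis:
            cti.on_channel_event(channel)


ctm = CrossTriggerMatrix()
ctis = {name: CrossTriggerInterface(name) for name in ("core0", "core1", "core2")}
for cti in ctis.values():
    ctm.attach(cti)

# core0 and core1 form one group on channel 0; core2 blocks that channel.
ctis["core0"].enabled_channels.add(0)
ctis["core1"].enabled_channels.add(0)
ctm.broadcast(0)
```

Here the break request on channel 0 halts `core0` and `core1`, while `core2` keeps running because its CTI does not pass the channel on.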

As expected, the Power Architecture-based controllers of the SPC5 family also support synchronous run-control in hardware. The unit in charge is called DCI (Debug and Calibration Interface). The advantage compared to CoreSight is that, as we already know from the trigger switch of the AURIX, peripherals are also connected to the debug signals. That allows halting the complete system, not only the cores.

Figure 4. Multi-core run-control management of UDE

 

In real life developers don’t want to take care of all these details. For this reason multi-core debuggers like the Universal Debug Engine® (UDE) from PLS make the complex configuration of on-chip debug units transparent to the users. The integrated run-control management, for example, easily allows creating run-control groups containing all the cores which should be stopped and started synchronously.

Especially when it comes to debugging and system analysis of real-time applications, on-chip trace is mandatory, and it is available for almost all high-end microcontrollers like the AURIX or SPC5. STMicroelectronics, for example, implements Nexus class 3 for tracing; for Infineon microcontrollers the on-chip trace is called MCDS (Multi-Core Debug Solution); and for ARM the trace hardware blocks come from the already familiar CoreSight. What they all have in common is that they can capture trace data from different cores in parallel. Timestamps allow the data of the different trace sources to be correlated in time, so that the exact sequence of events can be reconstructed. This makes it possible to detect deadlocks and race conditions and to find communication bottlenecks.
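
How timestamps make such a reconstruction possible can be shown with a few lines of Python. The event records below are invented for illustration and are not the output format of any particular trace unit; the sketch only assumes that each per-core stream is already ordered by timestamp.

```python
import heapq
from typing import NamedTuple


class TraceEvent(NamedTuple):
    timestamp: int  # common time base, e.g. in clock ticks
    core: str
    event: str


def merge_traces(*streams):
    """Merge per-core streams (each sorted by timestamp) into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


# Hypothetical traces of two cores competing for resources A and B.
CPU0 = [TraceEvent(100, "cpu0", "lock A"), TraceEvent(250, "cpu0", "wait B")]
CPU1 = [TraceEvent(120, "cpu1", "lock B"), TraceEvent(300, "cpu1", "wait A")]

TIMELINE = merge_traces(CPU0, CPU1)
```

In the merged timeline, `cpu0` locks A, `cpu1` locks B, and then each core waits for the resource held by the other. Only the combined view reveals this classic deadlock pattern; neither per-core trace shows it on its own.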

Now, the most challenging task is transferring the captured trace data off-chip so that it can be analyzed by the debugger. Current trace systems offer two ways to do so. Either the trace data is buffered in an on-chip trace memory and transferred via the standard debug interface, or a high-bandwidth trace interface exists. The former allows a much higher bandwidth between the trace sources (CPUs, buses) and the trace sink, namely the on-chip trace memory. Its major drawback is the very limited capacity. The latter allows a theoretically unlimited observation period, but here the bandwidth is the limiting parameter. In both cases, clever filter and trigger mechanisms as part of the trace solution can help. They qualify the trace data as it is created so that only the really necessary data is recorded. They also make cross-triggering possible. Cross-triggering allows, for example, starting the trace recording for one core when a specific event arises at another core. That function is helpful for debugging inter-core communication.
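
The effect of trigger and filter qualification on a small trace memory can be sketched as follows. The function and the event tuples are hypothetical and only model the principle: recording starts when a trigger condition on one core is met, and only events that pass the filter occupy space in the bounded buffer.

```python
def record(events, capacity, trigger, keep):
    """Record qualifying events into a bounded buffer once a trigger fires."""
    buffer = []
    armed = False
    for event in events:
        if not armed and trigger(event):
            armed = True  # cross-trigger: start recording from here on
        if armed and keep(event) and len(buffer) < capacity:
            buffer.append(event)
    return buffer


# Hypothetical event stream: (timestamp, core, kind)
EVENTS = [
    (10, "cpu0", "call"),
    (20, "cpu1", "msg_sent"),
    (30, "cpu0", "irq"),
    (40, "cpu1", "call"),
    (50, "cpu0", "msg_received"),
    (60, "cpu0", "call"),
]

# Start tracing cpu0 as soon as cpu1 sends a message; buffer holds 2 entries.
RECORDED = record(
    EVENTS,
    capacity=2,
    trigger=lambda e: e[1] == "cpu1" and e[2] == "msg_sent",
    keep=lambda e: e[1] == "cpu0",
)
```

With these settings only the two `cpu0` events following the message from `cpu1` are stored. Without the trigger and filter, the two available entries would already be used up by events that occur before the communication of interest begins.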

Experience has shown that for an effective use of trace the user needs comprehensive support from debug tools. That is true not only for the analysis of the recorded data but also for the definition of trace tasks and the configuration of the filters and triggers. UDE, for example, provides a graphical tool for that purpose which makes even complex cross-triggers easy to manage. Several tools help to analyze the recorded trace: from visualization of parallel execution of cores, via profiling, to providing code coverage information.

