Embedded technology is most commonly deployed in real-world environments, which means that real-time performance matters a great deal. Imagine a pilot steering an airplane: if one of the wings does not respond exactly on time, things can go really, really wrong. For an embedded engineer, real-time performance is perhaps the most crucial aspect of developing a piece of hardware or software. For an embedded software engineer, this means that your code has to be efficient, optimized, and robust against errors, and that it has to handle a range of problems and situations.
In order to measure the real-time performance of MCUs, software and hardware engineers have over time developed different test methods that we call "benchmarks". So what exactly are benchmarks?
A benchmark usually consists of a program, or a set of programs, that engineers run to test the performance of a CPU. These programs put a heavy workload on the CPU in order to test how well it handles the operations involved, which often include integer math, floating-point math, word processing, graphics processing, etc. A benchmark can also contain a key algorithm of an application that is known to be demanding. That way, engineers can optimize that part of the application and make it more efficient.
Not all that Glitters is Gold
No matter how well made a benchmark is, there are always things you should look out for. Benchmarks can produce unrealistic results that cannot be replicated in real usage. One example is the effect of compiler optimizations. If a benchmark contains a mathematical operation that the compiler recognizes, the optimizer may replace it with a faster and/or shorter operation that gives an equivalent result. The benchmark then reports a better score than what is realistically achievable in an application.
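The effect can be sketched in plain C for a desktop compiler. In the first benchmark function below, the input is a compile-time constant, so an optimizing compiler is free to compute the result at build time and emit no loop at all. The second reads its input through a volatile object, which forces the work to happen at run time. All function names are hypothetical, and whether a particular compiler actually folds the first loop depends on its optimization settings.

```c
#include <stdint.h>

/* Hypothetical workload: sum of the squares 0^2 + 1^2 + ... + (n-1)^2. */
static uint32_t sum_of_squares(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += i * i;
    }
    return acc;
}

/* Foldable: the argument is a constant, so the optimizer may replace
 * the whole call with a precomputed number. Timing this function can
 * end up measuring almost nothing. */
uint32_t bench_foldable(void)
{
    return sum_of_squares(1000u);
}

/* Opaque: the volatile read hides the loop bound from the optimizer,
 * so the loop has to be executed on the target. */
uint32_t bench_opaque(void)
{
    volatile uint32_t n = 1000u;
    return sum_of_squares(n);
}
```

Both functions return the same value; the difference shows up only in the generated code and therefore in the measured time.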
Dhrystone Benchmarking
For almost one and a half decades, Dhrystone was the dominant benchmark for MCU cores. Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker, and it was intended to be representative of system (integer) programming. It is a simple program, carefully designed to statistically mimic the processor usage of a common set of programs. Dhrystone does not report the MIPS (Millions of Instructions per Second) of the device; rather, it shows how many times the MCU executed the program in a given time frame, usually one second.
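Dhrystone scores are often quoted as DMIPS, obtained by dividing the number of Dhrystones per second by 1757, the score conventionally attributed to the VAX 11/780 reference machine. A minimal sketch of that conversion follows; the function and macro names are hypothetical and do not come from any Dhrystone distribution.

```c
#include <assert.h>

/* Conventional reference score: Dhrystones per second of the VAX 11/780,
 * which is defined as 1 DMIPS. */
#define VAX_DHRYSTONES_PER_SECOND 1757.0

/* Convert a raw Dhrystone rate into DMIPS. */
double dhrystones_to_dmips(double dhrystones_per_second)
{
    assert(dhrystones_per_second >= 0.0);
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND;
}

/* Normalize by core clock so that chips running at different
 * frequencies can be compared (DMIPS/MHz). */
double dmips_per_mhz(double dhrystones_per_second, double core_mhz)
{
    assert(core_mhz > 0.0);
    return dhrystones_to_dmips(dhrystones_per_second) / core_mhz;
}
```

For example, a device that completes 175,700 Dhrystone passes per second at 100 MHz would score 100 DMIPS, or 1 DMIPS/MHz. The conversion only applies when the count is in individual Dhrystone passes.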
Comparing the new STM32F7 to the STM32F4
At the end of 2015, mikroElektronika published support for the all-new STM32F7 MCU from STMicroelectronics. We here at the firmware department decided to write a simple benchmark to see just how powerful the M7 is in comparison to the M4. Here's a simple tutorial on how we tested and compared these two beasts.
1. Setting up the Timer
We wanted to see how much the chips can do in a particular time frame, so we set up the timer to generate an interrupt every second.
    RCC_APB1ENR.TIM2EN = 1;         // Enable clock gating for timer module 2
    TIM2_CR1.CEN = 0;               // Disable timer
    TIM2_PSC = 1199;                // Set timer prescaler.
    TIM2_ARR = 62499;               // Set timer auto-reload value
    NVIC_IntEnable(IVT_INT_TIM2);   // Enable timer interrupt
    TIM2_DIER.UIE = 1;              // Update interrupt enable
    TIM2_CR1.CEN = 1;               // Enable timer
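The prescaler and auto-reload values determine the interrupt period: an up-counting timer produces one update event every (PSC + 1) * (ARR + 1) timer clock ticks. The sketch below works through that arithmetic on a desktop compiler. The function names are hypothetical, and the listing above does not state the timer input clock; the 75 MHz used in the usage note is only what the register values imply for a one-second period.

```c
#include <assert.h>
#include <stdint.h>

/* Timer clock ticks between two update events: (PSC + 1) * (ARR + 1). */
uint64_t timer_ticks_per_update(uint32_t psc, uint32_t arr)
{
    return ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
}

/* Update period in seconds for a given timer input clock in Hz. */
double timer_update_period_s(uint32_t psc, uint32_t arr,
                             uint32_t timer_clock_hz)
{
    assert(timer_clock_hz > 0u);
    return (double)timer_ticks_per_update(psc, arr) / (double)timer_clock_hz;
}
```

With PSC = 1199 and ARR = 62499, timer_ticks_per_update() gives 1200 * 62500 = 75,000,000 ticks, so the interrupt fires once per second if the timer is clocked at 75 MHz. With a different timer clock, the same values would give a proportionally different period.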
The interrupt routine clears the update flag and increments a counter that holds the number of interrupts so far, which in this case is the number of seconds that have passed since the timer was started.
    void Timer2_interrupt() iv IVT_INT_TIM2
    {
        TIM2_SR.UIF = 0;    // Clear the update interrupt flag
        int_ctr++;          // One more second has elapsed
    }
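The declaration of int_ctr is not shown in the article. A counter that is written in an interrupt routine and read in the main loop is normally declared volatile, so that the compiler re-reads it from memory instead of caching it in a register. The following host-side sketch illustrates the idea; timer_tick() is a hypothetical stand-in for the interrupt routine, and the helper names are assumptions made for illustration.

```c
#include <stdint.h>

/* Shared between the "interrupt" and the main code. volatile tells the
 * compiler that the value can change outside the normal program flow. */
static volatile uint32_t int_ctr = 0;

/* Hypothetical stand-in for the timer interrupt body. On the target,
 * clearing the hardware flag would come first. */
void timer_tick(void)
{
    int_ctr++;
}

/* Number of ticks (seconds) counted so far. */
uint32_t seconds_elapsed(void)
{
    return int_ctr;
}

/* True once at least `seconds` ticks have been counted. Comparing with
 * >= rather than == still works if the main loop happens to miss the
 * exact value. */
int has_elapsed(uint32_t seconds)
{
    return int_ctr >= seconds;
}
```

On a desktop machine, calling timer_tick() five times makes has_elapsed(5) return true, which mirrors what the timer interrupt does on the MCU once per second.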
2. Getting the Test Code
Sharing is caring, especially when it comes to code, and we owe a special thank you to the people behind BEEBS, who provide an open-source benchmark suite on GitHub. Out of all these beautiful test procedures, we chose two for our benchmark: an FFT (Fast Fourier Transform) algorithm and a Dhrystone implementation.
3. Implementation
We decided to run our program for 5 seconds and see how many times each chip could carry out one of the tasks, FFT or Dhrystone. The M4 was running at an impressive 140 MHz, but the M7 was pushing an amazing 216 MHz.
    while (1)
    {
        benchmark_dhry();               // execute dhrystone algorithm
        // benchmark_fft();             // execute fft algorithm
        ctr++;                          // count the number of iterations
        if (int_ctr == 5)               // if 5 seconds have passed
        {
            inttostr(ctr, txt);         // print out the number of iterations
            uart1_write_text(txt);
            break;
        }
    }
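The same count-iterations-in-a-fixed-window pattern can be tried on a desktop machine before moving to the target. The sketch below is not the firmware used in this article: it replaces the timer interrupt with the standard clock() function, which measures processor time rather than wall-clock time, and the names count_iterations and dummy_workload are hypothetical.

```c
#include <assert.h>
#include <time.h>

typedef void (*workload_fn)(void);

/* Run `work` repeatedly until `seconds` of processor time have passed
 * and return the number of completed iterations. */
unsigned long count_iterations(workload_fn work, double seconds)
{
    assert(work != 0 && seconds > 0.0);

    const clock_t budget = (clock_t)(seconds * CLOCKS_PER_SEC);
    const clock_t start = clock();
    unsigned long iterations = 0;

    while ((clock() - start) < budget) {
        work();          /* the code under test */
        iterations++;    /* count the number of iterations */
    }
    return iterations;
}

/* Hypothetical stand-in workload. The volatile sink keeps the compiler
 * from optimizing the loop away. */
static volatile unsigned long sink;

void dummy_workload(void)
{
    for (unsigned i = 0; i < 1000u; i++) {
        sink += i;
    }
}
```

Calling count_iterations(dummy_workload, 5.0) mirrors the five-second window used here. The last iteration always finishes after the window has closed, so the window should be long compared with a single iteration.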
4. Results
As expected, the STM32F7 was faster: it managed 35,157 FFT iterations in 5 seconds, while the M4 could only crank out 26,397, which is about 33% more work in the same time. The Dhrystone results are similar: the M7 did 55,398 iterations in 5 seconds, while the M4 did 42,777, or about 30% more. Both chips are very powerful, and in most cases will do a great job accomplishing their tasks. However, if you have a burning need for speed, and you simply need as much power as you can get, you can always count on the M7!
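The raw counts can be put on a common footing with a little arithmetic: iterations per second, the ratio between the two chips, and iterations per second per MHz of core clock. The sketch below uses only the figures reported in this article; the type and function names are hypothetical. Compiler settings and memory configuration are not stated here, so the numbers describe these particular runs and nothing more.

```c
#include <assert.h>

typedef struct {
    double iterations;  /* iterations completed during the run */
    double seconds;     /* length of the run */
    double core_mhz;    /* core clock during the run */
} bench_result;

/* Figures reported in this article. */
static const bench_result fft_m7  = { 35157.0, 5.0, 216.0 };
static const bench_result fft_m4  = { 26397.0, 5.0, 140.0 };
static const bench_result dhry_m7 = { 55398.0, 5.0, 216.0 };
static const bench_result dhry_m4 = { 42777.0, 5.0, 140.0 };

/* Throughput in iterations per second. */
double iterations_per_second(bench_result r)
{
    assert(r.seconds > 0.0);
    return r.iterations / r.seconds;
}

/* Throughput normalized by core clock. */
double iterations_per_second_per_mhz(bench_result r)
{
    assert(r.core_mhz > 0.0);
    return iterations_per_second(r) / r.core_mhz;
}

/* How many times more iterations per second `a` achieved than `b`. */
double speedup(bench_result a, bench_result b)
{
    return iterations_per_second(a) / iterations_per_second(b);
}
```

With these figures, speedup() gives roughly 1.33 for the FFT and 1.30 for Dhrystone, while the clock ratio is 216 / 140, or about 1.54. Normalized by clock, the M7 comes out at about 32.6 FFT and 51.3 Dhrystone iterations per second per MHz, against about 37.7 and 61.1 for the M4, so in these runs the M7's lead in absolute throughput was smaller than its lead in clock frequency.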
Conclusion
If you are building a demanding project in which your MCU must run as smoothly as possible, benchmarking is always a good idea. Test the demanding tasks, see how much processing power your MCU needs for them, optimize your code, cut the execution time, and in the end, deliver an amazing project you can rely on!