CCS News

Tech Note: Unrolling Loops with a Duff's Device

Wednesday 22 July, 2020

Do you remember when you learned programming languages and your teacher or instruction materials told you to never use a GOTO because it could lead to code that was difficult to understand? Here is something that you might not have been warned about - Duff's device. Duff's device was created by Tom Duff, and its design was to unroll and speed up loops by removing conditional statements from the loop. By doing this, the execution time of the loop is decreased. A Duff's device is basically a switchcase statement, with the break operations removed so the code can continue by falling to the next case. Let us look at how it works and look at some real measurements of speed increases on a PIC18F MCU.

Before jumping into a Duff's device, it would be useful to review the switch-case statement in C:

switch(state)
{
case 0:
doState0();
break;

case 1:
doState1();
break;

case 2:
doState2();

case 3:
doState3();
break;
}

switch() reviews the value passed to it, the variable 'state' in the above example, and jumps to matching case statement that matches this variable. For most cases, the C compiler will create a jump table at the switch() to jump to the variable that matches. That means for a switch() statement, most of all the branching logic happens at the switch() statement. The break statement tells the C compiler to jump out of the current switch().

Notice anything odd about the case 2 in the above example? It does not have a break at the end! Without this break, you are telling the compiler that you want to continue execution. That means in the above example, when the case 2 has finished it will keep rolling into the case 3. Many compilers (including CCS C) and lint tools will generate a warning that you forgot a break statement, because for many control loops the developer only intended one block of code to execute for each case. But in a Duff's device we are going to leverage the ability to continue execution to the next case.

Let us look at a simple loop, maybe being used to send pulses or clock data on the IO:

#define PULSE() output_toggle(PIN_PULSE);
output_toggle(PIN_PULSE)

void send(int n)
{
while(n--)
{
PULSE();
}
}

This is a simple function that generates n pulses, using a while loop to decrement n and exit when n is 0. That means for every pulse, the value of n has to be decremented and checked against 0. Look at the timing of pulses generated on a PIC18F MCU running at 4MHz (one instruction every 1us):



This generated a 100KHz signal. You will notice the discrepancy of the duty cycle; the high time is 1us but the low time is 9us. That's because it took 8us to execute the while(n--) portion of the loop.

Now let us replace it with a Duff's device:

#define PULSE() output_toggle(PIN_PULSE);
output_toggle(PIN_PULSE)

void send(int n)
{
switch(n)
{
case 8:
PULSE();

case 7:
PULSE();

case 6:
PULSE();

case 5:
PULSE();

case 4:
PULSE();

case 3:
PULSE();

case 2:
PULSE();

case 1:
PULSE();
}
}

In the above example, the switch(n) creates a jump table of up to 8 pulses and jumps to position with n pulses. After the jump is calculated and executed, the following pulses run without any other conditional code. Lets look at the timing of pulses generated on a PIC18F running at 4MHz (one instruction every 1us):



This creates a 500KHz signal, 5 times faster than using a while loop. Also of interest is the duty cycle of the signal, which is now 50% (even time high and low) because the pulses execute without any other conditional checks. If you look at the LST file to view the instructions generated, the reason for this is clear (BTG instruction is used to perform a bit toggle, in this case toggling the register that controls this GPIO pin):

.................... case 8:
.................... PULSE();
00016: BTG 3FBD.0
00018: BTG 3FBD.0
.................... case 7:
.................... PULSE();
0001A: BTG 3FBD.0
0001C: BTG 3FBD.0
.................... case 6:
.................... PULSE();
0001E: BTG 3FBD.0
00020: BTG 3FBD.0

The next time you have a loop and you were looking to optimize it for speed, the Duff's device is a great method for accomplishing this!


Like us on Facebook. Follow us on Twitter.

About CCS:

CCS is a leading worldwide supplier of embedded software development tools that enable companies to develop premium products based on Microchip PIC® MCU and dsPIC® DSC devices. Complete proven tool chains from CCS include a code optimizing C compiler, application specific hardware platforms and software development kits. CCS' products accelerate development of energy saving industrial automation, wireless and wired communication, automotive, medical device and consumer product applications. Established in 1992, CCS is a Microchip Premier 3rd Party Partner. For more information, please visit http://www.ccsinfo.com.

PIC® MCU, MPLAB® IDE, MPLAB® ICD2, MPLAB® ICD3 and dsPIC® are registered trademarks of Microchip Technology Inc. in the U.S. and other countries.