Assembly language for Power Architecture, Part 4: Function calls and the PowerPC 64-bit ABI


Listing 1. Function to square a number using the simplified ABI

###FUNCTION ENTRY POINT DECLARATION###
.section .opd, "aw"
.align 3

.global my_square
my_square: #this is the name of the function as seen
.quad .my_square, .TOC.@tocbase, 0
#Tell the linker that this is a function reference
.type my_square, @function

###FUNCTION CODE HERE###
.text
.my_square: #This is the label for the code itself (referenced in the "opd")
#Parameter 1 -- number to be squared -- in register 3

#Multiply it by itself, and store it back into register 3
mulld 3, 3, 3

#The return value is now in register 3, so we just need to leave
blr

Previously, you were using the .opd section for declaring the program's entry point, but here you're also using it to declare a function. These are called official procedure descriptors, and they contain the information the linker needs to combine position-independent code from different shared object files together. The most important field is the first one, which is the address of the start of the code for the procedure. The second field is the TOC pointer used for the function. The third field is an environment pointer for languages that use one, but is normally just set to zero. Notice that the only symbol definition that is exported globally is the official procedure descriptor.
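As a rough mental model, the three doublewords emitted by the .quad line can be pictured as a C structure. This is only an illustrative sketch: the names proc_descriptor and make_descriptor are hypothetical and do not come from the ABI headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C view of one .opd entry: three doublewords, in order. */
struct proc_descriptor {
    uint64_t entry; /* address of the first instruction (the .my_square label) */
    uint64_t toc;   /* TOC pointer used by the function */
    uint64_t env;   /* environment pointer, normally 0 */
};

/* Mirrors ".quad .my_square, .TOC.@tocbase, 0": the third field is zero. */
static struct proc_descriptor make_descriptor(uint64_t entry, uint64_t toc) {
    struct proc_descriptor d = { entry, toc, 0 };
    assert(sizeof d == 3 * sizeof(uint64_t)); /* three 8-byte fields, no padding */
    return d;
}
```

Each descriptor occupies 24 bytes, which is why the section is aligned with .align 3 (an 8-byte boundary).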

The C language prototype for this function is:

Listing 2. C prototype for number-squaring function

typedef long long int64;
int64 my_square(int64 val);

Here is the C code for using the function (enter as my_square_tester.c):

Listing 3. C code for calling the my_square function

#include <stdio.h>

/* make declarations easier to write */
typedef long long int64;

int64 my_square(int64);

int main() {
int64 a = 32; /* must be 64 bits wide to match the %lld format */
printf("The square of %lld is %lld.\n", a, my_square(a));
return 0;
}

The simple way to compile and run this code is to do the following:

Listing 4. Compiling and running my_square_tester

gcc -m64 my_square.s my_square_tester.c -o my_square_tester
./my_square_tester

The -m64 flag tells the compiler to use 64-bit instructions, compile using the 64-bit ABI and libraries, and use the 64-bit ABI for linking. It then takes care of all of the linking issues for you (and there are several -- you can see the full linking command line by appending -v to the command line).

As you can see, writing functions using the simplified PowerPC ABI is very straightforward. The issues come in when the functions don't meet these criteria.


The stack

Now let's get into the more complicated parts of the ABI. The most important part of any ABI is the details of how to make use of the stack, which is the area of memory that holds local function data.

The need for a stack

The best way to see why stacks are needed is to look at recursive functions. For simplicity, let's look at the recursive implementation of the factorial function:

Listing 5. Factorial function

typedef long long int64;
int64 factorial(int64 num) {
//BASE CASE
if (num == 0) {
return 1;
//RECURSIVE CASE
} else {
return num * factorial(num - 1);
}
}

This may be easy enough to understand conceptually, but let's examine it concretely. What is going on here? What happens, for instance, if you try to find the value of the factorial of 4? Let's follow the sequence:

First, the function will be called, and num will be set equal to 4. Then, because num is not 0, factorial will be called again, this time with 3. Now, in the new call to factorial, num is set to 3. However, even though it is the same variable name in the same code, this num refers to a different memory location than the previous one. This is because each time a function is called, it has an activation record (also called a stack frame) associated with it. The activation record contains all of the call-specific data for the function, including parameters and local variables. This is how recursive functions keep from trashing the values of the variables in other, active function calls. Each call gets its own activation record, so each time it is called the variables get their own storage space within that activation record. Only when the function call is completely finished is the space for the activation record released for reuse (more on this later).

So, with 3 as the value of num, we go through the function again, then with 2, then with 1, then with 0. However, with 0, the function has reached its base case. The base case is the point where it ceases to call itself, and instead returns. So, with 0 as num, it returns 1 as the result. The previous function call picks up where it left off (calling factorial(0)) and multiplies the result, 1, with the value in its own num, also 1. This is returned, and the next function waiting is reactivated. This one multiplies the result, 1, with its value of num, which is 2, and the result, 2, is then returned. The next waiting function call is then reactivated, and the previous result is multiplied by this function's value of num, which is 3, resulting in 6. This number is returned to our original function, whose value of num is 4. This is multiplied with the previous result to get 24.

As you can see, each time a function calls another function, its own values and state are suspended while the next function invocation occurs. This is true for all functions, not just recursive ones. If that function again calls other functions, its state is likewise suspended. When a function returns, the function that called it is revived and it continues from there. So, as we progress, the "live" function calls stack up on top of each other with each function call, and then are removed from the stack with every function return. The result looks like this (factorial will be abbreviated as fac):

1. fac(4) [active]
2. fac(4) [suspended], fac(3) [active]
3. fac(4) [suspended], fac(3) [suspended], fac(2) [active]
4. fac(4) [suspended], fac(3) [suspended], fac(2) [suspended], fac(1) [active]
5. fac(4) [suspended], fac(3) [suspended], fac(2) [suspended], fac(1) [suspended], fac(0) [active]
6. fac(4) [suspended], fac(3) [suspended], fac(2) [suspended], fac(1) [active]
7. fac(4) [suspended], fac(3) [suspended], fac(2) [active]
8. fac(4) [suspended], fac(3) [active]
9. fac(4) [active]

As you can see, the suspended function activation records "stack up", and then, when each function returns, it gets taken off of the stack.
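To make the stacking concrete, here is a minimal C sketch that computes the factorial with an explicit array of activation records instead of real recursion. The names activation and factorial_explicit are hypothetical, and the sketch assumes an input between 0 and 20 so that the result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 21 /* records needed for inputs 0..20 */

/* One activation record: just this call's private copy of num. */
struct activation { int64_t num; };

static int64_t factorial_explicit(int64_t num, int *max_depth) {
    struct activation stack[MAX_DEPTH];
    int top = 0; /* number of live records */
    int64_t result;

    assert(num >= 0 && num < MAX_DEPTH);

    /* "Call" phase: push one record per invocation until the base case. */
    stack[top++].num = num;
    while (stack[top - 1].num != 0) {
        stack[top].num = stack[top - 1].num - 1;
        top++;
    }
    if (max_depth) *max_depth = top;

    /* "Return" phase: fac(0) returns 1 and its record is released... */
    result = 1;
    top--;
    /* ...then each suspended call wakes up and multiplies by its own num. */
    while (top > 0) {
        top--;
        result *= stack[top].num;
    }
    return result;
}
```

For an input of 4, the sketch holds five records at its deepest point, matching step 5 of the trace above, and returns 24.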

The stack layout

To implement this idea, a range of memory called the program stack is allocated for each program. All PowerPC programs start off with a pointer to this stack in register 1. In the PowerPC ABI, register 1 always points to the top of the stack. This makes it easy for a function to know where its activation record is: everything in it is simply defined in terms of the stack pointer. If a function is executing, then the stack pointer is pointing to the top of the whole stack, which is also the top of that function's activation record. Because activation records are implemented on a stack, they are often referred to as stack frames; the two terms are equivalent.

Now, when the "top of the stack" is referred to, that is a conceptual designation. Physically, in memory, the stack grows downward, from large-numbered memory addresses to small-numbered ones. Therefore, register 1 will have a pointer to the conceptual top of the stack, and references to stack positions that have positive offsets will actually be below the top of the stack conceptually, and negative offsets will be conceptually above. So, 0(1) refers to the conceptual top of the stack, 4(1) refers to four bytes down from the top (conceptually), 24(1) is even lower conceptually, and 100(1) is lower still.

Now that you understand how the stack looks conceptually and physically, let's look at what exactly the individual stack frames hold. Here is the layout of the stack according to the 64-bit PowerPC ABI, from a physical memory standpoint (stack offsets, where given, refer to the beginning of this location in memory):

Table 1. Stack frame layout
Contains | Size | Beginning stack offset
Floating point non-volatile register save area | Varies | Varies
General non-volatile register save area | Varies | Varies
VRSAVE | 4 bytes | Varies
Alignment padding | 4 or 12 bytes | Varies
Vector non-volatile register save area | Varies | Varies (must be quadword-aligned)
Local variable storage | Varies | Varies
Parameters for function calls | Varies (minimum 64 bytes) | 48(1)
TOC save area | 8 bytes | 40(1)
Link editor area | 8 bytes | 32(1)
Compiler area | 8 bytes | 24(1)
Link Register save area | 8 bytes | 16(1)
Condition Register save area | 8 bytes | 8(1)
Pointer to top of previous stack frame | 8 bytes | 0(1)

I won't concern you with the floating point, VRSAVE, Vector, or alignment space. Those topics deal with floating point and vector processing and are outside the scope of this article. All stack values must be doubleword (8-byte) aligned, and the whole frame should be quadword (16-byte) aligned. All parameters must be doubleword-aligned.
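The sizes and alignment rules above can be turned into a small frame-size calculation. The following C sketch is only an illustration: the helper names are hypothetical, and it assumes a function with no floating point or vector save areas.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FRAME_HEADER_BYTES = 48, /* back pointer, CR, LR, compiler, link editor, TOC */
    MIN_PARAM_BYTES    = 64, /* minimum size of the parameter area */
    FRAME_ALIGN        = 16  /* the whole frame is quadword-aligned */
};

/* Round n up to the next multiple of align. */
static uint64_t round_up(uint64_t n, uint64_t align) {
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Hypothetical helper: frame size in bytes, ignoring float and vector areas. */
static uint64_t frame_size(uint64_t param_bytes, uint64_t local_bytes,
                           uint64_t saved_gprs) {
    uint64_t raw;
    if (param_bytes < MIN_PARAM_BYTES)
        param_bytes = MIN_PARAM_BYTES;
    raw = FRAME_HEADER_BYTES
        + param_bytes
        + round_up(local_bytes, 8) /* stack values are doubleword-aligned */
        + saved_gprs * 8;          /* one doubleword per saved register */
    return round_up(raw, FRAME_ALIGN);
}
```

With no locals and no saved registers this gives 48 + 64 = 112 bytes, which is already a multiple of 16. Adding one 8-byte local gives 120 bytes, which rounds up to 128.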

Now, let's look at what each part of the stack frame does.

Non-volatile register save areas

The first part of the stack frame is the non-volatile register save area. Registers in the PowerPC ABI are divided into three basic classes: dedicated, volatile, and non-volatile. Dedicated registers are registers that have a predefined, permanent function, like the stack pointer (register 1) and the TOC pointer (register 2). Registers 3-12 are volatile registers, which means that any function can modify them freely without having to restore their previous value. However, this means that any time a function calls another function, it should assume that registers 3-12 will be overwritten by that function.

On the other hand, registers 14 through 31 are considered non-volatile registers (register 13 is reserved by the 64-bit ABI as the thread pointer, so it should be left alone). This means that a function can use them provided their value is restored before returning from the function. Therefore, before using a non-volatile register in a function, its value must be saved in the function's stack frame, and then restored before the function returns. Likewise, a function may also assume that the values it assigns to non-volatile registers will not be modified (or at least will be restored) when it makes calls to other functions. A function may use as little or as much memory in this save area as needed.

Now you can see why our earlier rules for the simplified ABI required that only registers 3 through 12 should be used: the others are non-volatile and require stack space to save them. However, the ABI actually has a way to work around this limitation. Functions that do not call other functions are free to use the 288 bytes that are physically below the stack pointer. Therefore, functions using the simplified ABI actually can save, use, and restore non-volatile registers by using negative offsets from the stack pointer.

Local variable storage

The local variable storage area is a general-purpose area for saving function-specific data. Often this is not needed because of the large number of registers available for use in the PowerPC architecture. However, this space is often used for local arrays. This area can be any size needed by the function.

Parameters for function calls

Function parameters are handled a little differently from other local data. The PowerPC ABI actually puts the storage space for the function parameters in the calling function's stack space. Now, as you saw earlier, function calls actually pass their parameters through registers. However, space must still be reserved for parameters in case the values need to be saved, especially since the parameters are passed using volatile registers. This space is also used for overflow: if there are more parameters than registers available for use, then they need to go in the stack space. Since this parameter area is shared by all functions called from the current one, when a function sets up its stack space, it has to reserve space for the largest number of parameters it will use in a function call.

So that a function can know where its parameters are, parameters are stored from the bottom of memory to the top. The first parameter is in 48(1), while the second parameter is in 56(1). This way, the function being called can know the exact offset of each parameter, no matter how big the parameter list area is. Remember, the parameter list area is defined for all of the calls made by a function, and therefore will likely be bigger than necessary for any individual function call.

Now, since the save area for the parameters passed to a function is actually in the calling function's stack frame, when a function establishes its own stack frame, the offsets to the parameter list have to be adjusted to account for the function's own stack frame size. So, let's say that function func1 calls function func2 with three parameters, and func2 has a 112-byte stack frame. If func2 wants to access the memory for its first parameter, it would refer to it as 160(1), because it has to go past its own stack frame (112 bytes) and then reach the first parameter slot in the caller's frame (48 bytes further).
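The offset arithmetic can be written down as two small helpers. This is a sketch with hypothetical function names; it assumes every parameter occupies one doubleword and that the index is zero-based.

```c
#include <assert.h>
#include <stdint.h>

enum { PARAM_AREA_START = 48, PARAM_SLOT_BYTES = 8 };

/* Offset of parameter `index` from the caller's stack pointer,
   before the called function has set up its own frame. */
static uint64_t param_offset_before_frame(unsigned index) {
    return PARAM_AREA_START + (uint64_t)index * PARAM_SLOT_BYTES;
}

/* Offset of the same slot after the called function has reserved
   `own_frame_bytes` with stdu: it must skip over its own frame first. */
static uint64_t param_offset_after_frame(unsigned index,
                                         uint64_t own_frame_bytes) {
    assert(own_frame_bytes % 16 == 0); /* frames are quadword-aligned */
    return own_frame_bytes + param_offset_before_frame(index);
}
```

For func2 with its 112-byte frame, the first parameter slot is at 112 + 48 = 160, and the second is at 168.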


Thankfully, functions rarely have to access their parameter save area because most parameters are passed by register, not in the parameter save area. However, space must be allocated for them even if there is nothing stored there. Functions must assume that for the first eight parameters, they are only passed by register, but they will still have a save area available if they need to be stored by the program. This space must also be a minimum of 64 bytes large.

TOC, link editor, and compiler areas

The TOC save area, compiler area, and linker area are all reserved for system use, and are not modified by programmers, but the programmer must reserve space for them.

Link register save area

The link register save area is different from the other parts of the ABI. When a function begins, it actually saves the link register in the calling function's stack frame, not its own, and then only if it needs to save it. Most functions that call other functions will need it, though.

Condition register save area

The condition register save area is needed if any of the non-volatile fields of the condition register are modified. The non-volatile fields are cr2, cr3, and cr4. The condition register should be saved in its area of the stack before any of these fields are modified, and then restored before returning.
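One way to picture this rule is as a bit mask over the 32-bit condition register, which is made up of eight 4-bit fields with cr0 in the most significant bits. The C sketch below uses hypothetical helper names and simply checks whether a set of written fields overlaps cr2, cr3, or cr4.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering one 4-bit CR field; cr0 occupies the top four bits. */
static uint32_t cr_field_mask(unsigned field) {
    assert(field < 8);
    return (uint32_t)0xF << (4 * (7 - field));
}

/* The fields the text lists as non-volatile: cr2, cr3, and cr4. */
static uint32_t cr_nonvolatile_mask(void) {
    return cr_field_mask(2) | cr_field_mask(3) | cr_field_mask(4);
}

/* Hypothetical check: does a function that writes these fields
   need to save and restore the condition register? */
static int must_save_cr(uint32_t written_fields_mask) {
    return (written_fields_mask & cr_nonvolatile_mask()) != 0;
}
```

A function that only compares into cr0, as a plain cmpdi does, does not touch any of the non-volatile fields and can skip the save.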

Pointer to the previous stack frame

The final item in the stack frame is a pointer to the previous stack frame, often called the back pointer.

Writing a function that uses the stack

Functions create the stack frame at the beginning of the function (called the function prologue) and tear it down at the end of the function (called the function epilogue).

A function's prologue usually follows this sequence:

1. Reserve stack space and save the old stack pointer, using stdu 1, -SIZE_OF_STACK(1) (where SIZE_OF_STACK is the size of the stack frame for this function). This will save the old stack pointer and allocate stack memory atomically.

2. If this function will call another function, or use the link register in any way, save the link register: copy it into register 0 with the instruction mflr 0, then store it into the link register save area of the function that called this one, using the instruction std 0, SIZE_OF_STACK+16(1).

3. Save all non-volatile registers that will be used during this function (including the condition register, if any of its non-volatile fields will be used).

The function's epilogue follows the reverse sequence, restoring what had been saved, and then destroying the stack frame using ld 1, 0(1), which loads the previous stack pointer back into the stack pointer register.
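The effect of the prologue and epilogue on memory can be simulated in C. The sketch below is a toy model, not real machine behavior: the machine structure, the byte-array stack, and the helper names are all assumptions made for illustration, and r1 holds a byte offset into the array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 1024

/* Toy machine: a byte array for the stack, plus r1 and the link register. */
struct machine {
    uint8_t  mem[STACK_BYTES];
    uint64_t r1; /* stack pointer, as a byte offset into mem */
    uint64_t lr; /* link register */
};

static void store64(struct machine *m, uint64_t addr, uint64_t value) {
    assert(addr + 8 <= STACK_BYTES);
    memcpy(&m->mem[addr], &value, 8);
}

static uint64_t load64(const struct machine *m, uint64_t addr) {
    uint64_t value;
    assert(addr + 8 <= STACK_BYTES);
    memcpy(&value, &m->mem[addr], 8);
    return value;
}

/* Models: stdu 1, -size(1) ; mflr 0 ; std 0, size+16(1) */
static void prologue(struct machine *m, uint64_t size) {
    uint64_t old_sp = m->r1;
    assert(size % 16 == 0 && old_sp >= size);
    m->r1 = old_sp - size;                /* the stack grows downward */
    store64(m, m->r1, old_sp);            /* back pointer at 0(1) */
    store64(m, m->r1 + size + 16, m->lr); /* LR goes in the caller's frame */
}

/* Models: ld 0, size+16(1) ; mtlr 0 ; ld 1, 0(1) */
static void epilogue(struct machine *m, uint64_t size) {
    m->lr = load64(m, m->r1 + size + 16);
    m->r1 = load64(m, m->r1);             /* follow the back pointer */
}
```

Starting with r1 at 512 and a 112-byte frame, the prologue moves r1 to 400 and writes 512 at that location. The epilogue then reads it back, so the caller sees the same stack pointer and link register it had before the call.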

Now, let's return to the function that we originally implemented without a stack, and as an example, look and see what it would look like with a stack (enter as my_square.s and compile and run as before):

Listing 6. Function to square a number using a stack

###FUNCTION ENTRY POINT DECLARATION###
.section .opd, "aw"
.align 3

.global my_square
my_square: #this is the name of the function as seen
.quad .my_square, .TOC.@tocbase, 0
.type my_square, @function

###FUNCTION CODE HERE###
.text
.my_square: #This is the label for the code itself (Referenced in the "opd")
##PROLOGUE##
#Set up stack frame & back pointer (112 bytes -- minimum stack)
stdu 1, -112(1)
#Save LR (optional)
mflr 0
std 0, 128(1)
#Save non-volatile registers (we don't have any)

##FUNCTION BODY##
#Parameter 1 -- number to be squared -- in register 3
mulld 3, 3, 3

#The return value is now in register 3, so we just need to leave

##EPILOGUE##
#Restore non-volatile registers (we don't have any)
#Restore LR (not needed in this function, but here anyway)
ld 0, 128(1)
mtlr 0
#Restore stack frame atomically
ld 1, 0(1)
#Return
blr

That's exactly the same code as before, just wrapped with prologue and epilogue code. As mentioned, this code is simple enough that it doesn't need prologue and epilogue code and is perfectly fine using the simplified ABI. However, it is a good example of how to set up and tear down a stack frame.

Now, let's return to the factorial function. This function, since it calls itself, makes very good use of stack frames. Let's look at how the factorial function would work in assembly language (enter as factorial.s):

Listing 7. The factorial function in assembly language

###ENTRY POINT###
.section .opd, "aw"
.align 3

.global factorial
factorial:
.quad .factorial, .TOC.@tocbase, 0
.type factorial, @function

###CODE###
.text
.factorial:
#Prologue
#Reserve Space
#48 (save areas) + 64 (parameter area) + 8 (local variable) = 120 bytes.
#aligned to 16-byte boundary = 128 bytes
stdu 1, -128(1)
#Save Link Register
mflr 0
std 0, 144(1)

#Function body

#Base Case? (register 3 == 0)
cmpdi 3, 0
bt- eq, return_one

#Not base case - recursive call
#Save local variable
std 3, 112(1)
#NOTE - it could also have been stored in the parameter save area.
# parameter 1 would have been at 176(1)

#Subtract One
subi 3, 3, 1

#Call the function (branch and set the link register to the return address)
bl factorial
#Linker word
nop

#Restore local variable (but to a different register -
#register 3 is now the return value from the last factorial
#function)
ld 4, 112(1)
#Multiply by return value
mulld 3, 3, 4
#Result is in register 3, which is the return value register

factorial_return:
#Epilogue
#Restore Link Register
ld 0, 144(1)
mtlr 0
#Restore stack
ld 1, 0(1)
#Return
blr

return_one:
#Set return value to 1
li 3, 1
#Return
b factorial_return

To test it from C, enter the following (enter as factorial_caller.c):

Listing 8. Program to call factorial function

#include <stdio.h>
typedef long long int64;
int64 factorial(int64);

int main() {
int64 a = 10;
printf("The factorial of %lld is %lld\n", a, factorial(a));
return 0;
}

Compile and run as follows:

Listing 9. Compiling and running factorial

gcc -m64 factorial.s factorial_caller.c -o factorial
./factorial

There are a few features of this factorial function that are interesting. First of all, we are making use of the local variable storage space. We are saving the current parameter in 112(1). Now, since this is a function parameter, we could have saved an extra doubleword of stack space and stored it in the caller's parameter area.

Another interesting thing in the program is the nop instruction after the function call. That is required by the ABI. That extra instruction allows the linker to insert additional code if necessary during the linking process. For example, if you have a program that has enough symbols to warrant multiple TOCs (TOCs were discussed in "Assembly language for Power Architecture, Part 2: The art of loading and storing on PowerPC"), the linker will emit an instruction (or multiple instructions using a branch) to swap around TOCs for you.

Finally, notice that the branch target for the function call is not the code that starts it, but the .opd entry point descriptor. The linker will take care of converting this to point to the correct code. However, this will let the linker know additional information about the function, including which TOC it is using, so it can emit the code to swap these around if necessary.


Creating dynamic libraries

Now that you know how to make functions, you can put them together into a library. You don't need to write any additional code; you just need to compile it all together. To combine the factorial and my_square functions into a single library (let's call it libmymath.so), just enter the following:

Listing 10. Compiling shared libraries

gcc -m64 -shared factorial.s my_square.s -o libmymath.so

This instructs the compiler to produce a shared object called libmymath.so. To link this into executables, you need to enable both the compile-time linker and the run-time dynamic linker to find it. To compile the factorial calling function to use the shared object, compile and run like this:

Listing 11. Using the shared library

#-L tells what directories to search, -l tells what libraries to find
gcc -m64 factorial_caller.c -o factorial -L. -lmymath
#Tell the dynamic linker what additional directories to search
export LD_LIBRARY_PATH=.
#Run the program
./factorial
