Zero-Extension of 32-bit Results in x86-64
Zero-extension widens a binary value by filling the added high-order bits with zeros. In this post, we will explore how zero-extension of 32-bit results works in the x86-64 architecture.
How does zero-extension in x86-64 work?
In general, byte and word operands are stored in the low 8 or 16 bits of GPRs without modifying their high 56 or 48 bits, respectively. Doubleword operands, however, are normally stored in the low 32 bits of GPRs and zero-extended to 64 bits.
- From AMD64 Architecture Programmer’s Manual Volume 1: Application Programming
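The rule quoted above can be modeled in portable C by treating a GPR as a plain 64-bit integer. This is only a sketch: the helper names `write8`, `write16`, and `write32` are made up for illustration, and `write8` models the low-byte registers (such as AL) only, not the legacy high-byte registers such as AH.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the new 64-bit register value
 * after a write of the given width in 64-bit mode. */

/* Byte write (e.g. AL): the upper 56 bits are preserved. */
static inline uint64_t write8(uint64_t reg, uint8_t value) {
    return (reg & ~UINT64_C(0xFF)) | value;
}

/* Word write (e.g. AX): the upper 48 bits are preserved. */
static inline uint64_t write16(uint64_t reg, uint16_t value) {
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* Doubleword write (e.g. EAX): the old contents are discarded and
 * the value is zero-extended to 64 bits. */
static inline uint64_t write32(uint64_t reg, uint32_t value) {
    (void)reg; /* the previous value has no influence on the result */
    return (uint64_t)value;
}
```

Only the doubleword case loses the old upper bits; the byte and word cases merge the new value into the existing register contents.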
For example, if the value in register RAX is 0x0001000100010001, what will be the value of RAX after the following instruction is executed?
add eax, eax
The answer is 0x0000000000020002: the 32-bit result 0x00020002 is written to EAX, and the upper 32 bits of RAX are automatically cleared.
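To check this arithmetic without writing assembly, the instruction can be emulated in C. The function below is a hypothetical model of `add eax, eax`, not real register access; it assumes only that `uint32_t` arithmetic wraps modulo 2^32, which the C standard guarantees for unsigned types.

```c
#include <stdint.h>

/* Hypothetical model of `add eax, eax` applied to a full RAX value. */
static inline uint64_t add_eax_eax(uint64_t rax) {
    uint32_t eax = (uint32_t)rax; /* only the low 32 bits take part */
    uint32_t sum = eax + eax;     /* wraparound drops any carry out of bit 31 */
    return (uint64_t)sum;         /* zero-extension: bits 63..32 become 0 */
}
```

With this model, an input of 0x0001000100010001 yields 0x0000000000020002. An input whose low half is 0x80000000 yields 0, because the carry out of bit 31 is discarded rather than propagated into the upper half.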
Tips and Tricks
A common zeroing idiom in IA-32 is to use the XOR instruction instead of MOV to set a register to 0, because the XOR form has a shorter encoding.
For example, the following instruction is preferred when clearing register EAX:
xor eax, eax
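The size difference can be made concrete by writing both encodings out as byte arrays. The bytes below are the encodings assemblers commonly emit for these two instructions; the array names and the `bytes_saved` helper are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* mov eax, 0   : opcode B8 followed by a 4-byte immediate */
static const uint8_t MOV_EAX_0[] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* xor eax, eax : opcode 31 followed by one ModRM byte */
static const uint8_t XOR_EAX_EAX[] = { 0x31, 0xC0 };

/* Number of code bytes saved by choosing the XOR form. */
static inline size_t bytes_saved(void) {
    return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX;
}
```

The MOV form has to carry the constant 0 as a 32-bit immediate, which accounts for the three extra bytes.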
How about clearing register RAX in the 64-bit mode of x86-64? Let's experiment with the following C code hello.c:
#include <stdio.h>
int main() {
    printf("hello, world\n");
    return 0;
}
Use gcc to generate assembly code at the -O2 optimization level:
gcc hello.c -O2 -S -o hello.S
The contents of hello.S are as follows:
    .file "hello.c"
    .text
    .section .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "hello, world"
    .section .text.startup,"ax",@progbits
    .p2align 4
    .globl main
    .type main, @function
main:
.LFB11:
    .cfi_startproc
    subq $8, %rsp
    .cfi_def_cfa_offset 16
    leaq .LC0(%rip), %rdi
    call puts@PLT
    xorl %eax, %eax
    addq $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE11:
    .size main, .-main
    .ident "GCC: (GNU) 15.2.1 20251112"
    .section .note.GNU-stack,"",@progbits
Under the x86-64 calling convention, integer return values are passed back in RAX. The instruction that sets the return value to 0 is therefore:
xorl %eax, %eax
As mentioned earlier, a 32-bit result written to a GPR is automatically zero-extended to 64 bits. Clearing EAX therefore implicitly sets the upper 32 bits of RAX to 0 as well.
You may wonder why the compiler emits xor eax, eax (written xorl %eax, %eax in the AT&T syntax that gcc prints) rather than xor rax, rax. The reason is the difference in encoding length:
| Instruction | Encoding (hex) |
|---|---|
| xor eax, eax | 31 c0 |
| xor rax, rax | 48 31 c0 |
As you can see, the 32-bit form is one byte shorter.
The extra leading byte 0x48 is a REX prefix (specifically REX.W, which selects the 64-bit operand size). According to the AMD64 manual:
For most instructions, the default operand size in 64-bit mode is 32 bits. To access 16-bit operand sizes, an instruction must contain an operand-size prefix (66h), as described in Section 3.2.3, “Operand Sizes and Overrides,” on page 41. To access the full 64-bit operand size, most instructions must contain a REX prefix.
- From AMD64 Architecture Programmer’s Manual Volume 1: Application Programming
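To see why 0x48 in particular means "64-bit operand size", it helps to split the byte into its fields. In 64-bit mode a REX prefix has the binary form 0100WRXB, so it always lies in the range 0x40 to 0x4F. The decoder below is a minimal sketch; the `rex_fields` type and both function names are invented for this example, and it does not check that the byte actually sits in prefix position within an instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four flag bits of a REX prefix. */
typedef struct {
    bool w; /* 1 = 64-bit operand size */
    bool r; /* extends the ModRM reg field */
    bool x; /* extends the SIB index field */
    bool b; /* extends the ModRM r/m, SIB base, or opcode reg field */
} rex_fields;

/* True if the byte has the REX bit pattern 0100xxxx.
 * Only meaningful in 64-bit mode; in legacy modes these byte
 * values encode one-byte INC/DEC instructions instead. */
static inline bool is_rex_prefix(uint8_t byte) {
    return (byte & 0xF0) == 0x40;
}

/* Extract the W, R, X, and B bits from a REX prefix byte. */
static inline rex_fields decode_rex(uint8_t byte) {
    rex_fields f = {
        .w = (byte >> 3) & 1,
        .r = (byte >> 2) & 1,
        .x = (byte >> 1) & 1,
        .b = byte & 1,
    };
    return f;
}
```

Decoding 0x48 gives W = 1 and R = X = B = 0: the prefix does nothing except widen the operand size of the following 31 c0 from 32 to 64 bits.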