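x86-64 Memory Architecture and the mov Instruction: A Close Look at Addressing and Stack Operations
A detailed guide to memory addressing and data-movement instructions on x86-64. It covers the matrix organization of physical memory, the register hierarchy, how each addressing mode computes its address, the difference between zero-extending and sign-extending moves, how the stack grows downward through the stack pointer, and common mistakes to avoid. Assembly is set beside C code to show how the CPU and memory interact.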

Modern main memory is implemented using a matrix structure, rather than a simple linear arrangement. The core motivation behind this design is to avoid the physical wiring challenges caused by 'extremely long memory address lines.'
Core Idea: Map a one-dimensional address into a two-dimensional physical space through row and column decoding, significantly reducing the complexity of address decoding circuits.
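
The row and column split described above can be sketched in C. This is only an illustration under stated assumptions: the matrix has a power-of-two number of columns, the low address bits select the column, and the names `cell_pos` and `decode_address` are hypothetical rather than taken from any real memory controller.

```c
#include <assert.h>
#include <stdint.h>

/* Row/column coordinates of one cell in a hypothetical memory matrix. */
struct cell_pos {
    uint32_t row;
    uint32_t col;
};

/* Split a linear address: the low col_bits select the column,
 * the remaining high bits select the row. */
static struct cell_pos decode_address(uint32_t addr, unsigned col_bits)
{
    struct cell_pos pos;

    assert(col_bits > 0 && col_bits < 32);   /* keep the shifts well defined */
    pos.col = addr & ((UINT32_C(1) << col_bits) - 1);
    pos.row = addr >> col_bits;
    return pos;
}
```

With a 16-bit address and `col_bits = 8`, `decode_address(0x1234, 8)` yields row 0x12 and column 0x34. A single 16-bit decoder would need 65,536 output lines, whereas the two 8-bit decoders in this split need 256 each.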
In the x86-64 architecture, the memory system has the following key characteristics:
| Feature | Description |
|---|---|
| Address Space | Theoretically supports 2^64 distinct memory addresses |
| Address Length | Each memory address requires 64 bits in binary representation |
| Addressing Granularity | Each address points to a single byte (1 byte = 8 bits) |
| Byte Order | Little-Endian |
Although the architecture supports a 2^64 address space, actual implementations typically use only 48 bits (256 TB) to avoid excessively large page tables.
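
Byte addressing and little-endian order together determine where each byte of a multi-byte value lands. The sketch below writes a 64-bit value into a byte buffer in little-endian order by hand, so it behaves the same on any host; `store_le64` is a hypothetical helper name introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into buf[0..7] in little-endian order:
 * the least significant byte goes to the lowest address. */
static void store_le64(uint8_t *buf, uint64_t v)
{
    assert(buf != NULL);
    for (size_t i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (8 * i));
    }
}
```

Storing 0x0011223344556677 this way puts 0x77 at offset 0 and 0x00 at offset 7, which is the layout an x86-64 movq to memory produces.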

x86-64 assembly instructions use a single-character suffix to explicitly indicate the operand size. This is fundamental to understanding instruction behavior:
| Suffix | Full Name | Size | Example Instruction |
|---|---|---|---|
| b | byte | 1 byte (8 bits) | movb |
| w | word | 2 bytes (16 bits) | movw |
| l | long/double word | 4 bytes (32 bits) | movl |
| q | quad word | 8 bytes (64 bits) | movq |

The mov instruction takes the same four suffixes:
| Instruction | Effect |
|---|---|
| movb | move byte |
| movw | move word |
| movl | move double word |
| movq | move quad word |
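The ordinary movq encoding holds only a 32-bit immediate, which the CPU sign-extends to 64 bits. A constant that does not fit in 32 bits therefore cannot be loaded this way, and movabsq, which carries a full 64-bit immediate, must be used instead. An assembler may reject the first form below or silently select the 64-bit encoding, so the truncated result is shown only to illustrate the 32-bit limit:
# Incorrect: only the low 32 bits fit in the immediate field, and they are then sign-extended
movq $0x0011223344556677, %rax # %rax = 0x0000000044556677 ❌
# Correct: full 64-bit immediate transfer
movabsq $0x0011223344556677, %rax # %rax = 0x0011223344556677 ✓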
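
The 32-bit limit on movq immediates can be modeled in C. This is a sketch of the arithmetic only, not of any assembler's behavior; `fits_simm32`, `sign_extend32`, and `demo_movq_immediate` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v can be stored in a 32-bit immediate that sign-extends back to v. */
static bool fits_simm32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* What the CPU does with a 32-bit immediate field: copy bit 31 into bits 63..32. */
static uint64_t sign_extend32(uint32_t low)
{
    if (low & UINT32_C(0x80000000)) {
        return UINT64_C(0xFFFFFFFF00000000) | low;
    }
    return low;
}

/* Replays the two cases discussed in the text. */
static void demo_movq_immediate(void)
{
    uint64_t wanted = UINT64_C(0x0011223344556677);

    /* Too wide for the 32-bit field, so movabsq is required. */
    assert(!fits_simm32((int64_t)wanted));
    /* Keeping only the low 32 bits would produce the truncated value. */
    assert(sign_extend32((uint32_t)wanted) == UINT64_C(0x0000000044556677));

    /* -1 fits: the field 0xFFFFFFFF sign-extends to all ones. */
    assert(fits_simm32(-1));
    assert(sign_extend32(UINT32_C(0xFFFFFFFF)) == UINT64_C(0xFFFFFFFFFFFFFFFF));
}
```

A check like `fits_simm32` is the decision a programmer makes when choosing between movq and movabsq for a constant.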
x86-64 registers support partial access, a design that keeps 32-, 16-, and 8-bit code working unchanged.

Data Consistency of Registers:
The differently sized names in each row of the table below are nested views of the same physical register: %eax is the low 32 bits of %rax, %ax the low 16 bits, and %al the low 8 bits.
Registers in the same family therefore share their contents, and a write through one name is visible through the others. Writing to an 8-bit or 16-bit name leaves the remaining upper bits unchanged.
In the x86-64 specification, writing to a 32-bit register (such as %eax) automatically zeroes the upper 32 bits.
| 64-bit | 32-bit | 16-bit | 8-bit | Usage |
|---|---|---|---|---|
| %rax | %eax | %ax | %al | Accumulator, return value |
| %rbx | %ebx | %bx | %bl | Base register |
| %rcx | %ecx | %cx | %cl | Counter, loop variable |
| %rdx | %edx | %dx | %dl | Data register |
| %rsi | %esi | %si | %sil | Source index |
| %rdi | %edi | %di | %dil | Destination index |
| %r8-%r15 | %r8d-%r15d | %r8w-%r15w | %r8b-%r15b | Extended registers |
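
The effect of partial register writes can be modeled with plain integer masks. In this sketch a `uint64_t` stands in for %rax, and the helper names (`write_low8`, `write_low16`, `write_low32`, `demo_partial_writes`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a movb into %al: only bits 7..0 change. */
static uint64_t write_low8(uint64_t reg, uint8_t v)
{
    return (reg & ~UINT64_C(0xFF)) | v;
}

/* Model of a movw into %ax: only bits 15..0 change. */
static uint64_t write_low16(uint64_t reg, uint16_t v)
{
    return (reg & ~UINT64_C(0xFFFF)) | v;
}

/* Model of a movl into %eax: bits 31..0 are written, bits 63..32 become zero. */
static uint64_t write_low32(uint64_t reg, uint32_t v)
{
    (void)reg;   /* the old contents do not survive a 32-bit write */
    return v;
}

/* Applies each write to the same starting value. */
static void demo_partial_writes(void)
{
    uint64_t rax = UINT64_C(0x0011223344556677);

    assert(write_low8(rax, 0xAA) == UINT64_C(0x00112233445566AA));
    assert(write_low16(rax, 0xBBBB) == UINT64_C(0x001122334455BBBB));
    assert(write_low32(rax, UINT32_C(0xCCCCCCCC)) == UINT64_C(0x00000000CCCCCCCC));
}
```

The asymmetry is the point of the sketch: the 8-bit and 16-bit writes merge into the old value, while the 32-bit write replaces it entirely.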
In functions, %rdi generally holds the first argument and %rsi the second, following the System V AMD64 calling convention. %rcx often ends up holding a loop counter, such as the C index variable i.
These 64-bit names match 8-byte types such as long and pointers. A narrower parameter is accessed through the matching sub-register: an int argument, for example, is read from %edi or %esi.
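x86-64 provides flexible operand forms, which fall into three categories: Immediate (a constant), Register (the contents of a register), and Memory (the contents of a memory location).
A memory operand names an address, usually computed from register contents, and its value is whatever is stored at that address; the stored value may be ordinary data or another address (a pointer). In the table, R[ra] denotes the contents of register ra and M[addr] the value stored in memory at address addr.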
| Type | Form | Operand Value | Name |
|---|---|---|---|
| Immediate | $Imm | Imm | Immediate |
| Register | ra | R[ra] | Register |
| Memory | Imm | M[Imm] | Absolute |
| Memory | (ra) | M[R[ra]] | Indirect |
| Memory | Imm(rb) | M[Imm+R[rb]] | Base + displacement |
| Memory | (rb, ri) | M[R[rb]+R[ri]] | Indexed |
| Memory | (rb, ri, s) | M[R[rb]+R[ri]*s] | Scaled indexed |
📝 Scale factor s: can only be 1, 2, 4, 8 (convenient for array element access)
Assume the following register and memory state:
| Address | Value | Register | Value |
|---|---|---|---|
| 0x100 | 0xFF | %rax | 0x100 |
| 0x104 | 0xAB | %rcx | 0x1 |
| 0x108 | 0x13 | %rdx | 0x3 |
| 0x10C | 0x11 | - | - |
Then the operand values are:
| Operand | Calculation | Result |
|---|---|---|
| %rax | R[%rax] | 0x100 |
| 0x104 | M[0x104] | 0xAB |
| $0x108 | 0x108 | 0x108 |
| (%rax) | M[0x100] | 0xFF |
| 4(%rax) | M[0x100 + 4] | 0xAB |
| 9(%rax, %rdx) | M[0x100 + 0x3 + 9] | 0x11 |
| 260(%rcx, %rdx) | M[0x1 + 0x3 + 0x104] | 0x13 |
| (%rax, %rdx, 4) | M[0x100 + 0x3 * 4] | 0x11 |
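
The calculations in the table can be checked with a small C model. It is a sketch under assumptions: `effective_address` implements the general form Imm(rb, ri, s), with the simpler forms obtained by passing 0 for unused parts and a scale of 1, and `load_byte` is a hypothetical toy memory that contains only the four example cells.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for Imm(rb, ri, s): Imm + R[rb] + R[ri] * s. */
static uint64_t effective_address(int64_t imm, uint64_t base,
                                  uint64_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return (uint64_t)imm + base + index * scale;
}

/* Toy memory holding the four example cells from the state table. */
static uint8_t load_byte(uint64_t addr)
{
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0x00;   /* assumption: unlisted cells read as zero */
    }
}
```

For example, with %rax = 0x100 and %rdx = 0x3, `effective_address(0, 0x100, 0x3, 4)` gives 0x10C, and `load_byte` of that address returns 0x11, matching the last row of the table.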
Extend smaller data to larger data, filling high bits with zero:
| Instruction | Source | Destination | Effect |
|---|---|---|---|
| movzbw | 8-bit | 16-bit | zero-extend to word |
| movzbl | 8-bit | 32-bit | zero-extend to double word |
| movzbq | 8-bit | 64-bit | zero-extend to quad word |
| movzwl | 16-bit | 32-bit | zero-extend to double word |
| movzwq | 16-bit | 64-bit | zero-extend to quad word |
Extend smaller data to larger data, copying the sign bit into higher bits:
| Instruction | Source | Destination | Effect |
|---|---|---|---|
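| movsbw | 8-bit | 16-bit | sign-extend byte to word |
| movsbl | 8-bit | 32-bit | sign-extend byte to double word |
| movsbq | 8-bit | 64-bit | sign-extend byte to quad word |
| movswl | 16-bit | 32-bit | sign-extend word to double word |
| movswq | 16-bit | 64-bit | sign-extend word to quad word |
| movslq | 32-bit | 64-bit | sign-extend double word to quad word |
| cltq | %eax | %rax | sign-extend %eax into %rax (same effect as movslq %eax, %rax) |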
The following sequence contrasts a plain byte move with the sign-extending and zero-extending forms:
movq $0x0011223344556677, %rax # exceeds movq's 32-bit sign-extended immediate; truncation would give %rax = 0x0000000044556677
movabsq $0x0011223344556677, %rax # %rax = 0x0011223344556677
movb $0xAA, %dl # %dl = 0xAA (10101010)
movb %dl, %al # %rax = 0x00112233445566AA (only lowest byte modified)
movsbq %dl, %rax # %rax = 0xFFFFFFFFFFFFFFAA (sign-extended, 1-filled)
movzbq %dl, %rax # %rax = 0x00000000000000AA (zero-extended, 0-filled)
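
The same two extensions can be written in C. This sketch spells out the bit manipulation rather than relying on signed casts, so every step is portable; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* movzbq model: the upper 56 bits become zero. */
static uint64_t zero_extend_byte(uint8_t b)
{
    return (uint64_t)b;
}

/* movsbq model: bit 7 of the byte is copied into the upper 56 bits. */
static uint64_t sign_extend_byte(uint8_t b)
{
    if (b & 0x80) {
        return UINT64_C(0xFFFFFFFFFFFFFF00) | b;
    }
    return (uint64_t)b;
}

/* Replays the 0xAA example and a byte whose sign bit is clear. */
static void demo_extension(void)
{
    assert(zero_extend_byte(0xAA) == UINT64_C(0x00000000000000AA));
    assert(sign_extend_byte(0xAA) == UINT64_C(0xFFFFFFFFFFFFFFAA));
    /* With bit 7 clear, both extensions agree. */
    assert(sign_extend_byte(0x7F) == zero_extend_byte(0x7F));
}
```

In everyday C the compiler picks between these automatically: converting an unsigned narrow type to a wider one zero-extends, and converting a signed narrow type sign-extends.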
In the x86-64 architecture, the stack is more than an abstract data structure: it is a region of memory that the hardware manages through the %rsp register, which always holds the address of the current top element. The stack grows toward lower addresses, so a push decrements %rsp before writing and a pop reads before incrementing %rsp.
The stack is used for temporarily storing data such as local variables and saved register values.
push and pop each bundle two steps, a pointer adjustment and a memory access:
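pushq %rax behaves like the following pair (except that subq also updates the condition flags, which pushq leaves untouched):
subq $8, %rsp # 1. stack pointer moves downward (address decreases)
movq %rax, (%rsp) # 2. write data to new top address
popq %rax behaves like:
movq (%rsp), %rax # 1. read data from the current top address
addq $8, %rsp # 2. stack pointer moves upward (address increases)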
Below is a trace analysis assuming %rsp initially equals 0x8000:
| Instruction | %rax Value | %rsp (Stack Top Address) | Memory State | Description |
|---|---|---|---|---|
| movq $0x123, %rax | 0x123 | 0x8000 | M[0x8000] = ? | Initialize register |
| pushq %rax | 0x123 | 0x7FF8 | M[0x7FF8] = 0x123 | Pointer moves down 8 bytes and writes |
| movq $0x22, %rax | 0x22 | 0x7FF8 | M[0x7FF8] = 0x123 | Register overwritten, memory unchanged |
| pushq %rax | 0x22 | 0x7FF0 | M[0x7FF0] = 0x22 | Pointer moves down again and writes |
| popq %rax | 0x22 | 0x7FF8 | M[0x7FF8] = 0x123, M[0x7FF0] = 0x22 | Data loaded into register, pointer moves up 8 bytes |
Observe the last row, popq %rax:
The pop operation copies the top value into the register and moves %rsp; it does not clear or erase the old data in memory!
The 0x22 at address 0x7FF0 is still physically present. It now lies below the stack top, so it is treated as garbage data and will be overwritten by the next push.
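
The trace can be replayed with a toy model in C. This is a sketch under assumptions: a small byte array stands in for the memory just below 0x8000, and `struct machine`, `push64`, `pop64`, and `peek64` are hypothetical names, not part of any real API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BASE  UINT64_C(0x7F00)   /* lowest simulated address (assumed) */
#define STACK_LIMIT UINT64_C(0x8000)   /* initial %rsp from the trace */

/* Toy machine: a byte array covers addresses STACK_BASE..STACK_LIMIT-1. */
struct machine {
    uint64_t rsp;
    uint8_t  mem[STACK_LIMIT - STACK_BASE];
};

static void machine_init(struct machine *m)
{
    m->rsp = STACK_LIMIT;
    memset(m->mem, 0, sizeof m->mem);
}

/* pushq: decrement %rsp by 8, then store the value at the new top. */
static void push64(struct machine *m, uint64_t value)
{
    assert(m->rsp >= STACK_BASE + 8);        /* room for 8 more bytes */
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp - STACK_BASE], &value, sizeof value);
}

/* popq: load from the current top, then increment %rsp by 8.
 * Nothing is written to memory, so the old bytes stay in place. */
static uint64_t pop64(struct machine *m)
{
    uint64_t value;

    assert(m->rsp + 8 <= STACK_LIMIT);       /* stack must not be empty */
    memcpy(&value, &m->mem[m->rsp - STACK_BASE], sizeof value);
    m->rsp += 8;
    return value;
}

/* Read 8 bytes at any simulated address, including below %rsp. */
static uint64_t peek64(const struct machine *m, uint64_t addr)
{
    uint64_t value;

    assert(addr >= STACK_BASE && addr + 8 <= STACK_LIMIT);
    memcpy(&value, &m->mem[addr - STACK_BASE], sizeof value);
    return value;
}
```

After `machine_init`, pushing 0x123 and 0x22 and then popping once leaves `rsp` at 0x7FF8, while `peek64(&m, 0x7FF0)` still returns 0x22: the stale value remains until a later push overwrites it.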
The following mov instructions are invalid; each row gives the reason and a corrected form:
| Incorrect Instruction | Reason | Correct Form |
|---|---|---|
| movb $0xF, (%ebx) | Cannot use %ebx (32-bit) as address register | movb $0xF, (%rbx) |
| movl %rax, (%rsp) | Suffix l does not match 64-bit register rax | movq %rax, (%rsp) |
| movw (%rax), 4(%rsp) | Source and destination cannot both be memory | Use register as intermediary |
| movb %al, %sl | %sl register does not exist | movb %al, %sil |
| movq %rax, $0x123 | Immediate cannot be destination operand | movq $0x123, %rax |
| movl %eax, %dx | Destination operand size incorrect | movl %eax, %edx |
| movb %si, 8(%rbp) | Suffix b does not match 16-bit register si | movw %si, 8(%rbp) |
As a complete example, consider a C function that swaps two values through pointers:
void swap(long *xp, long *yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}
Corresponding x86-64 assembly:
swap:
    movq (%rdi), %rax # t0 = *xp (xp in %rdi)
    movq (%rsi), %rdx # t1 = *yp (yp in %rsi)
    movq %rdx, (%rdi) # *xp = t1
    movq %rax, (%rsi) # *yp = t0
    ret
Assume the base address of long long array[8] is in %rdx and the index 4 is in %rcx.
Goal: load array[4] into %rax
movq (%rdx, %rcx, 8), %rax # 1. Compute address %rdx + 4*8
# 2. Move 8 bytes from that memory address into %rax
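
The same address arithmetic can be written out in C to show what the scaled-index operand computes. This is a sketch: `element_address` and `load_element` are hypothetical helpers, and the code assumes `long long` is 8 bytes, as it is on x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element i computed the way (%rdx, %rcx, 8) does: base + i * 8. */
static long long *element_address(long long *base, size_t i)
{
    assert(sizeof(long long) == 8);   /* matches the scale factor 8 */
    return (long long *)((char *)base + i * sizeof(long long));
}

/* Equivalent of movq (%rdx, %rcx, 8), %rax: load the 8-byte element. */
static long long load_element(long long *base, size_t i)
{
    return *element_address(base, i);
}
```

For any array, `element_address(array, 4)` equals `&array[4]`, which is why a compiler can translate `array[i]` into a single scaled-index memory operand.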
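Starting from the matrix implementation of main memory, this article reviewed the following aspects of the x86-64 architecture: operand-size suffixes and the mov family, the nested register hierarchy, the operand addressing modes, zero-extending and sign-extending moves, and stack operations through %rsp.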
Understanding these low-level mechanisms forms a solid foundation for subsequent study of arithmetic logic instructions, control flow, procedure calls, and memory hierarchy structures. Main memory is not only a container for data but also the core interface between the CPU and software. Mastering its working principles is essential to truly understanding the complete picture of computer systems.
