x86-64 Memory Architecture and mov Instructions: Deep Dive into Addressing Mechanisms, Stack Operati

x86-64 Memory Architecture and mov Instructions: Deep Dive into Addressing Mechanisms, Stack Operati

本文为纯手打原创硬核干货,适合学习计算机组成汇编CSAPP 的同学,欢迎真实阅读、交流。


Based on the x86-64 architecture, this article starts with the matrix-based physical implementation of main memory, systematically breaks down the memory addressing mechanism, the family of data transfer instructions, and the logic of stack operations. It will help you fully grasp the underlying principles of CPU-memory interaction and thoroughly understand the essence of the mov instruction.


I. Physical Implementation of Main Memory and Addressing

1.1 Matrix-Based Storage Structure

Modern main memory is implemented using a matrix structure, rather than a simple linear arrangement. The core motivation behind this design is to avoid the physical wiring challenges caused by “extremely long memory address lines.”

Core Idea: Map a one-dimensional address into a two-dimensional physical space through row and column decoding, significantly reducing the complexity of address decoding circuits.

1.2 Memory Model in x86-64

In the x86-64 architecture, the memory system has the following key characteristics:

FeatureDescription
Address SpaceTheoretically supports 2642^{64}264 distinct memory addresses
Address LengthEach memory address requires 64 bits in binary representation
Addressing GranularityEach address points to a single byte (1 byte = 8 bits)
Byte OrderLittle-Endian
Although the architecture supports a 2642^{64}264 address space, actual implementations typically use only 48 bits (256 TB) to avoid excessively large page tables.

II. Data Formats and mov Instruction Rules

x86-64 Data Types

x86-64 assembly instructions use a single-character suffix to explicitly indicate the operand size. This is fundamental to understanding instruction behavior:

SuffixFull NameSizeExample Instruction
bbyte1 byte (8 bits)movb
wword2 bytes (16 bits)movw
llong/double word4 bytes (32 bits)movl
qquad word8 bytes (64 bits)movq

2.1 Data Movement Instructions (mov)

InstructionEffect
movbmove byte
movwmove word
movlmove double word
movqmove quad word
InstructionEffect
movbmove byte
movwmove word

2.2 Special Transfer Instruction: movabsq

When handling 64-bit immediate values, the standard movq may cause truncation (sign extension). In such cases, movabsq must be used:

; Incorrect: immediate is truncated to 32 bits and then sign-extended movq $0x0011223344556677, %rax ; %rax = 0x0000000044556677 ❌ ; Correct: full 64-bit immediate transfer movabsq $0x0011223344556677, %rax ; %rax = 0x0011223344556677 ✓ 

III. x86-64 Register System

3.1 Hierarchical Structure of General-Purpose Registers

x86-64 registers support partial access, which is a key design for backward compatibility with 32/16/8-bit code:

Common x86-64 Registers

Data Consistency of Registers:
As shown above, these registers of different sizes are physically nested accesses to the same underlying storage unit.

This means registers of the same type completely share their contents:

  • Write synchronization: Writing to a lower-sized register (such as %eax) will also affect the corresponding 64-bit register (%rax).
  • Partial read: No matter which size alias you operate on, you are accessing the same data. If you write a 64-bit integer into %rax, you can later read its lower 32 bits through %eax or its lowest byte through %al.
In the x86-64 specification, writing to a 32-bit register (such as %eax) will automatically zero the upper 32 bits.

3.2 Common Registers Overview

64-bit32-bit16-bit8-bitUsage
%rax%eax%ax%alAccumulator, return value
%rbx%ebx%bx%blBase register
%rcx%ecx%cx%clCounter, loop variable
%rdx%edx%dx%dlData register
%rsi%esi%si%silSource index
%rdi%edi%di%dilDestination index
%r8-%r15%r8d-%r15d%r8w-%r15w%r8b-%r15bExtended registers

In functions, %rdi generally represents the first argument, %rsi the second argument, and %rcx often corresponds to the C-language index “i”.
Note that these are 64-bit registers corresponding to the long type. If the parameter type is different, the corresponding register variant must be used. For example, if the parameter is int, use %edi and %esi.


IV. Operand Addressing Modes

x86-64 provides flexible addressing methods, which can be divided into three categories: Immediate, Register, and Memory.
Memory refers to the value stored at the address contained in a register, which may be a value or another address.

4.1 Addressing Mode Quick Reference

TypeFormOperand ValueName
Immediate$ImmImmImmImmImmediate
RegisterraR[ra]R[ra]R[ra]Register
AbsoluteImmM[Imm]M[Imm]M[Imm]Absolute
Indirect(ra)M[R[ra]]M[R[ra]]M[R[ra]]Indirect
Base + displacementImm(rb)M[Imm+R[rb]]M[Imm + R[rb]]M[Imm+R[rb]]Base + displacement
Indexed(rb, ri)M[R[rb]+R[ri]]M[R[rb] + R[ri]]M[R[rb]+R[ri]]Indexed
Scaled indexed(rb, ri, s)M[R[rb]+R[ri]⋅s]M[R[rb] + R[ri] \cdot s]M[R[rb]+R[ri]s]Scaled indexed
📝 Scale factor s: can only be 1, 2, 4, 8 (convenient for array element access)

4.2 Address Calculation Example

Assume the following register and memory state:

AddressValueRegisterValue
0x1000xFF%rax0x100
0x1040xAB%rcx0x1
0x1080x13%rdx0x3
0x10C0x11--

Then the operand values are:

OperandCalculationResult
%raxR[%rax]0x100
0x104M[0x104]0xAB
$0x1080x1080x108
(%rax)M[0x100]0xFF
4(%rax)M[0x100 + 4]0xAB
9(%rax, %rdx)M[0x100 + 0x3 + 9]0x11
260(%rcx, %rdx)M[0x1 + 0x3 + 0x104]0x13
(%rax, %rdx, 4)M[0x100 + 0x3 * 4]0x11
  • Common scenario: %rdi is the array A in parameters, %rcx represents offset i
    • Registers and addressing logic
      %rdi = A (array base address/pointer)
      %rcx = i (index/offset)
    • Addressing mapping for different data types
      int *A: (%rdi, %rcx, 4) -> A[i]
      char *A: (%rdi, %rcx, 1) -> A[i]
      long *A: (%rdi, %rcx, 8) -> A[i]

V. Data Extension Transfer Instructions

5.1 Zero Extension

Extend smaller data to larger data, filling high bits with zero:

InstructionSourceDestinationEffect
movzbw8-bit16-bitzero-extend to word
movzbl8-bit32-bitzero-extend to double word
movzbq8-bit64-bitzero-extend to quad word
movzwl16-bit32-bitzero-extend to double word
movzwq16-bit64-bitzero-extend to quad word
  • Special equivalence: movzlq %ecx, %rax is equivalent to movl %ecx, %eax, because a 32-bit move automatically zero-extends to 64 bits.

5.2 Sign Extension

Extend smaller data to larger data, copying the sign bit into higher bits:

InstructionSourceDestination
movsbw8-bit16-bit
movsbl8-bit32-bit
movsbq8-bit64-bit
movswl16-bit32-bit
movswq16-bit64-bit
movslq32-bit64-bit
cltq%eax%rax(abbreviation of movslq %eax, %rax)
  • cltq = movslq %eax, %rax => %rax = sign-extend(%eax)

5.3 Extension Instruction Comparison Experiment

movq $0x0011223344556677, %rax ; %rax = $0x0000000044556677 (only low 8 bytes modified) movabsq $0x0011223344556677, %rax ; %rax = 0x0011223344556677 movb $0xAA, %dl ; %dl = 0xAA (10101010) movb %dl, %al ; %rax = 0x00112233445566AA (only lowest byte modified) movsbq %dl, %rax ; %rax = 0xFFFFFFFFFFFFFFAA (sign-extended, 1-filled) movzbq %dl, %rax ; %rax = 0x00000000000000AA (zero-extended, 0-filled) 

VI. Stack Operations: Push and Pop

In the x86-64 architecture, the stack is not merely a data structure; it is a hardware-encoded memory region maintained by the %rsp register. Its core logic lies in the reverse growth of addresses synchronized with pointer dereferencing.

6.1 Core Concept: Counterintuitive “Downward Growth”

The stack is a special region of memory used for temporarily storing data (such as local variables and saved register states).

  • Stack pointer: The %rsp register always stores the starting address of the current top element of the stack.
  • Downward growth: A push operation causes %rsp to move toward lower addresses. In other words, as more data is pushed, the numerical value in %rsp becomes smaller.
  • Operation unit: In x86-64 systems, the standard stack operation unit is a Quadword (8 bytes / 64 bits).

6.2 Instruction Decomposition: Micro-Operation Equivalence

push and pop are highly encapsulated instructions. Their execution flow can be decomposed into pointer arithmetic and memory access:

  • Push
    Executing pushq %rax causes the CPU to perform:
  1. Pointer decrement: subtract 8 from %rsp (move downward by 8 bytes to allocate space).
  2. Memory write: store the data in %rax into the new address pointed to by %rsp.
pushq %rax is fully equivalent to: subq $8, %rsp ; 1. stack pointer moves downward (address decreases) movq %rax, (%rsp) ; 2. write data to new top address 
  • Pop
    Executing popq %rax performs the reverse sequence:
  1. Memory read: copy the value pointed to by %rsp into %rax.
  2. Pointer increment: add 8 to %rsp (reclaim the 8-byte space).
popq %rax is fully equivalent to: movq (%rsp), %rax addq $8, %rsp 

6.3 State Trace: Register and Memory Instantaneous States

Below is a trace analysis assuming %rsp initially equals 0x8000:

Instruction%rax Value%rsp (Stack Top Address)Memory StateDescription
movq $0x123, %rax0x1230x8000M[0x8000] = ?Initialize register
pushq %rax0x1230x7FF8M[0x7FF8] = 0x123Pointer moves down 8 bytes and writes
movq $0x22, %rax0x220x7FF8M[0x7FF8] = 0x123Register overwritten, memory unchanged
pushq %rax0x220x7FF0M[0x7FF0] = 0x22Pointer moves down again and writes
popq %rax0x220x7FF8M[0x7FF8] = 0x123、M[0x7FF0] = 0x22Data loaded into register, pointer restored
Observe the last row popq %rax:
The pop operation only changes the %rsp pointer; it does not clear or erase old data in memory!
The 0x22 at address 0x7FF0 is not physically erased. Until overwritten by a future push, it is considered garbage data.

VII. Common Errors and Notes

7.1 Illegal Operand Combinations

Incorrect InstructionReasonCorrect Form
movb $0xF, (%ebx)Cannot use %ebx (32-bit) as address registermovb $0xF, (%rbx)
movl %rax, (%rsp)Suffix l does not match 64-bit register raxmovq %rax, (%rsp)
movw (%rax), 4(%rsp)Source and destination cannot both be memoryUse register as intermediary
movb %al, %sl%sl register does not existmovb %al, %sil
movq %rax, $0x123Immediate cannot be destination operandmovq $0x123, %rax
movl %eax, %dxDestination operand size incorrectmovl %eax, %edx
movb %si, 8(%rbp)Suffix b does not match 16-bit register simovw %si, 8(%rbp)

7.2 Key Principles

  1. mov is just a copy: No persistent link is established between source and destination; it is only a value copy.
  2. Memory-to-memory prohibited: x86-64 does not allow a single instruction to directly transfer data between two memory locations.
  3. Immediate constraints: Immediate values cannot be destination operands, and their size is limited by the instruction suffix.

VIII. Practice: C Code and Assembly Comparison

8.1 Swap Function

voidswap(long*xp,long*yp){long t0 =*xp;long t1 =*yp;*xp = t1;*yp = t0;}

Corresponding x86-64 assembly:

swap: movq (%rdi), %rax ; t0 = *xp (xp in %rdi) movq (%rsi), %rdx ; t1 = *yp (yp in %rsi) movq %rdx, (%rdi) ; *xp = t1 movq %rax, (%rsi) ; *yp = t0 ret 

Execution trace:

  • Initial: %rdi=0x120 (stores 123), %rsi=0x100 (stores 456)
  • After execution: 0x120 becomes 456, 0x100 becomes 123

8.2 Array Addressing

Assume long long array[8] base address is in %rdx and index 4 is in %rcx.
Goal: store array[4] into %rax

movq (%rdx, %rcx, 8), %rax # 1. Compute address %rdx + 4*8 # 2. Move 8 bytes from that memory address into %rax 

IX. Summary and Outlook

Starting from the matrix implementation of main memory, this article systematically reviewed the following aspects of the x86-64 architecture:

  1. Addressing system: 2642^{64}264 byte address space, byte-level addressing
  2. Data formats: b/w/l/q sizes with explicit instruction suffixes
  3. Register design: hierarchical partial access mechanism
  4. Flexible addressing: 9 memory addressing modes with scaled indexing
  5. Data extension: precise control of zero and sign extension
  6. Stack mechanism: push/pop operations growing toward lower addresses

Understanding these low-level mechanisms forms a solid foundation for subsequent study of arithmetic logic instructions, control flow, procedure calls, and memory hierarchy structures. Main memory is not only a container for data but also the core interface between the CPU and software. Mastering its working principles is essential to truly understanding the complete picture of computer systems.


References

  • “Computer Organization” course notes
  • Computer Systems: A Programmer’s Perspective (CS:APP), Chapter 3
  • Intel 64 and IA-32 Architectures Software Developer’s Manual

Last updated: 2026-02-26

Read more

Pycharm中Github Copilot插件安装与配置全攻略(2023最新版)

PyCharm中GitHub Copilot:从安装到实战的深度配置指南 如果你是一位Python开发者,最近可能已经被各种关于AI编程助手的讨论所包围。GitHub Copilot,这个由GitHub和OpenAI联手打造的“结对编程”伙伴,已经不再是科技新闻里的概念,而是实实在在地进入了我们的开发工作流。特别是在PyCharm这样的专业IDE中,Copilot的集成能带来怎样的化学反应?是效率的倍增,还是全新的编码体验?这篇文章,我将从一个深度使用者的角度,带你走完从零安装到高效实战的全过程,并分享一些官方文档里不会告诉你的配置技巧和实战心得。 1. 环境准备与账号激活:迈出第一步 在开始安装插件之前,我们需要确保两件事:一个可用的GitHub Copilot订阅,以及一个正确版本的PyCharm IDE。很多人第一步就卡在了这里。 首先,关于订阅。GitHub Copilot提供个人和商业两种订阅计划。对于个人开发者,尤其是学生和开源项目维护者,GitHub有相应的优惠甚至免费政策。你需要一个GitHub账号,并前往 GitHub Copilot 官方页面 进行注册和订

VSCode + Copilot下:配置并使用 DeepSeek

以下是关于在 VSCode + Copilot 中,通过 OAI Compatible Provider for Copilot 插件配置使用 DeepSeek 系列模型 (deepseek-chat, deepseek-reasoner, deepseek-coder) 的完整汇总指南。 🎯 核心目标 通过该插件,将支持 OpenAI API 格式的第三方大模型(此处为 DeepSeek)接入 VSCode 的官方 Copilot 聊天侧边栏,实现调用。 📦 第一步:准备工作 在开始配置前,确保完成以下准备: 步骤操作说明1. 安装插件在 VSCode 扩展商店搜索并安装 OAI Compatible Provider for Copilot。这是连接 Copilot 与第三方模型的核心桥梁。2. 获取 API

低代码开发,企业应用搭建的新捷径

低代码开发,企业应用搭建的新捷径

低代码开发,让企业应用搭建像搭积木一样简单 你知道吗?在当今数字化时代,企业对于应用系统的需求日益增长,但传统开发方式往往耗时耗力,成本高昂。而低代码开发的出现,为企业带来了全新的解决方案,让应用搭建变得像搭积木一样简单。 一、低代码开发的优势 效率大幅提升:与传统开发相比,低代码开发无需编写大量代码,通过可视化界面和拖拽操作,即可快速构建应用。这大大缩短了开发周期,从数月甚至数年缩短至数周或数月。例如,某企业原本需要6个月开发一个客户关系管理系统,采用低代码开发后,仅用了2个月就完成了搭建。 降低技术门槛:低代码开发平台通常提供了丰富的组件和模板,非技术人员也能够轻松上手。这使得企业内部的业务人员可以参与到应用开发中来,更好地满足业务需求。同时,也减少了对专业开发人员的依赖,降低了开发成本。 易于维护和更新:低代码开发平台生成的应用具有良好的可维护性和可扩展性。当业务需求发生变化时,可以快速对应用进行修改和更新,无需重新开发整个系统。这使得企业能够更加灵活地应对市场变化,提高竞争力。 二、低代码开发在企业中的应用场景 办公自动化(OA)系统:OA系统是企业日常办公中

5分钟部署Qwen2.5极速对话机器人,CPU也能流畅体验AI聊天

5分钟部署Qwen2.5极速对话机器人,CPU也能流畅体验AI聊天 1. 引言 随着大语言模型的快速发展,越来越多开发者希望在本地或边缘设备上快速部署轻量级AI对话服务。然而,大多数高性能模型对硬件要求较高,尤其依赖GPU进行推理,这限制了其在低算力环境中的应用。 本文将介绍如何通过 Qwen/Qwen2.5-0.5B-Instruct 镜像,在仅使用CPU的情况下,5分钟内完成一个极速AI对话机器人的部署。该模型专为边缘计算优化,体积小、启动快、响应迅速,适合资源受限场景下的中文问答、文案生成与基础代码编写任务。 本方案无需复杂配置,支持一键拉起Web聊天界面,真正实现“开箱即用”的AI交互体验。 💡 适用人群: - 希望在无GPU环境下运行LLM的开发者 - 边缘计算、IoT设备集成AI功能的技术人员 - 快速搭建Demo原型的产品经理和学生 2. 技术背景与核心优势 2.1 Qwen2.5-0.5B-Instruct 模型简介 Qwen/Qwen2.5-0.5B-Instruct