# Disassembly Operations

Disassemble EVM bytecode into assembly from several input formats, including bytes, hex strings, and iterators. Each decoded instruction carries detailed metadata, and opcodes are interpreted against the selected Ethereum hard fork.

## Capabilities

### Single Instruction Disassembly

Disassemble a single EVM instruction from bytecode, extracting complete instruction metadata including operands.

```python { .api }
def disassemble_one(bytecode, pc: int = 0, fork: str = DEFAULT_FORK) -> Instruction:
    """
    Disassemble a single instruction from bytecode.

    Parameters:
    - bytecode (str | bytes | bytearray | iterator): The bytecode stream
    - pc (int, optional): Program counter of the instruction. Default: 0
    - fork (str, optional): Fork name. Default: DEFAULT_FORK ("istanbul")

    Returns:
        Instruction: An Instruction object with complete metadata, or None if no instruction found

    Raises:
        ParseError: If bytecode is malformed or insufficient for operand parsing
    """
```

**Usage Examples:**

```python
from pyevmasm import disassemble_one

# Disassemble from bytes
instruction = disassemble_one(b'\x60\x40')  # PUSH1 0x40
print(f"Name: {instruction.name}")  # PUSH1
print(f"Operand: 0x{instruction.operand:x}")  # 0x40
print(f"Gas: {instruction.fee}")  # 3

# From a hex string: decode to bytes first, because a str argument is
# treated as raw Latin-1 bytes rather than as hex digits
instruction = disassemble_one(bytes.fromhex("6040"))
print(f"Same instruction: {instruction.name}")  # PUSH1

# Opcodes with no assigned meaning decode as INVALID
# (0x0c is unassigned; 0xff is SELFDESTRUCT, not INVALID)
invalid = disassemble_one(b'\x0c')
print(f"Invalid opcode: {invalid.name}")  # INVALID

# With program counter
instruction = disassemble_one(b'\x56', pc=100)  # JUMP
print(f"PC: {instruction.pc}")  # 100
```
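
To make the operand extraction concrete, the following is a minimal standard-library sketch of how a PUSH operand can be located and decoded. It is not PyEVMAsm's implementation: `operand_size` and `decode_one` are hypothetical helpers, and only the PUSH1 to PUSH32 opcode range (0x60 to 0x7f) is modelled.

```python
from typing import Optional, Tuple

PUSH1, PUSH32 = 0x60, 0x7F  # PUSHn opcodes occupy this contiguous range


def operand_size(opcode: int) -> int:
    """Number of immediate bytes that follow the opcode (0 for non-PUSH)."""
    if PUSH1 <= opcode <= PUSH32:
        return opcode - PUSH1 + 1
    return 0


def decode_one(code: bytes, offset: int = 0) -> Optional[Tuple[int, int, int]]:
    """Return (opcode, operand, size), or None if input is empty or truncated."""
    if offset >= len(code):
        return None
    opcode = code[offset]
    width = operand_size(opcode)
    raw = code[offset + 1 : offset + 1 + width]
    if len(raw) < width:
        return None  # not enough bytes left for the operand
    return opcode, int.from_bytes(raw, "big"), 1 + width


print(decode_one(b"\x60\x40"))  # (96, 64, 2)
```

The returned size (opcode byte plus operand bytes) is what a caller would add to the program counter to reach the next instruction.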

### Multiple Instruction Disassembly

Disassemble all instructions in a bytecode sequence, returning a generator for memory-efficient processing of large bytecode.

```python { .api }
def disassemble_all(bytecode, pc: int = 0, fork: str = DEFAULT_FORK):
    """
    Disassemble all instructions in bytecode.

    Parameters:
    - bytecode (str | bytes | bytearray | iterator): An EVM bytecode (binary)
    - pc (int, optional): Program counter of the first instruction. Default: 0
    - fork (str, optional): Fork name. Default: DEFAULT_FORK ("istanbul")

    Returns:
        Generator[Instruction]: A generator of Instruction objects

    Note:
        Generator stops when no more valid instructions can be parsed
    """
```

**Usage Examples:**

```python
from pyevmasm import disassemble_all

# Disassemble complete bytecode
bytecode = b'\x60\x80\x60\x40\x52\x60\x04\x36\x10'
instructions = list(disassemble_all(bytecode))

for instr in instructions:
    print(f"{instr.pc:08x}: {instr}")
# Output:
# 00000000: PUSH1 0x80
# 00000002: PUSH1 0x40
# 00000004: MSTORE
# 00000005: PUSH1 0x4
# 00000007: CALLDATASIZE
# 00000008: LT

# Memory-efficient processing of large bytecode
# (large_bytecode is a stand-in here; substitute real contract code)
large_bytecode = bytecode + b'\x57'  # append JUMPI so a branch is present
for instruction in disassemble_all(large_bytecode):
    if instruction.is_branch:
        print(f"Branch at 0x{instruction.pc:x}: {instruction}")
```
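
The generator behaviour described above can be sketched without the library. The hypothetical `sweep` function below walks any iterable of byte values lazily, advances the program counter by each instruction's size, and stops on truncated input. It only illustrates the linear-sweep idea and should not be read as PyEVMAsm's actual code.

```python
from typing import Iterable, Iterator, Tuple


def sweep(code: Iterable[int], pc: int = 0) -> Iterator[Tuple[int, int, bytes]]:
    """Lazily yield (pc, opcode, operand_bytes); stop at end or on truncation."""
    stream = iter(code)
    for opcode in stream:
        # PUSH1..PUSH32 (0x60..0x7f) carry 1..32 immediate bytes
        width = opcode - 0x5F if 0x60 <= opcode <= 0x7F else 0
        # zip() checks range() first, so it never over-consumes the stream
        operand = bytes(b for _, b in zip(range(width), stream))
        if len(operand) < width:
            return  # operand is cut short: end the generator
        yield pc, opcode, operand
        pc += 1 + width


for pc, opcode, operand in sweep(b"\x60\x80\x60\x40\x52"):
    print(f"{pc:08x}: opcode=0x{opcode:02x} operand={operand.hex()}")
```

Because the function consumes an iterator, nothing is decoded until the caller asks for the next instruction, which is what keeps memory use flat on large inputs.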

### Text Disassembly

Disassemble bytecode to human-readable assembly text format, suitable for display and analysis.

```python { .api }
def disassemble(bytecode, pc: int = 0, fork: str = DEFAULT_FORK) -> str:
    """
    Disassemble an EVM bytecode to text representation.

    Parameters:
    - bytecode (str | bytes | bytearray): Binary representation of EVM bytecode
    - pc (int, optional): Program counter of the first instruction. Default: 0
    - fork (str, optional): Fork name. Default: DEFAULT_FORK ("istanbul")

    Returns:
        str: The text representation of the assembly code (newline-separated)
    """
```

### Hexadecimal Disassembly

Disassemble hexadecimal bytecode strings to assembly text, handling common hex formats automatically.

```python { .api }
def disassemble_hex(bytecode: str, pc: int = 0, fork: str = DEFAULT_FORK) -> str:
    """
    Disassemble hexadecimal EVM bytecode to assembly text.

    Parameters:
    - bytecode (str): Canonical representation of EVM bytecode (hexadecimal, with or without 0x prefix)
    - pc (int, optional): Program counter of the first instruction. Default: 0
    - fork (str, optional): Fork name. Default: DEFAULT_FORK ("istanbul")

    Returns:
        str: The text representation of the assembly code (newline-separated)
    """
```

**Usage Examples:**

```python
from pyevmasm import disassemble, disassemble_hex

# From binary bytecode
binary_code = b'\x60\x80\x60\x40\x52'
assembly = disassemble(binary_code)
print(assembly)
# PUSH1 0x80
# PUSH1 0x40
# MSTORE

# From hex string (most common usage)
hex_code = "0x608060405260043610"
assembly = disassemble_hex(hex_code)
print(assembly)
# PUSH1 0x80
# PUSH1 0x40
# MSTORE
# PUSH1 0x4
# CALLDATASIZE
# LT

# Hex without 0x prefix also works
assembly = disassemble_hex("608060405260043610")
# Same result
```

## Input Format Support

PyEVMAsm's disassembly functions accept multiple input formats:

### Binary Formats
- **bytes**: Native Python bytes objects
- **bytearray**: Mutable byte arrays
- **str (binary)**: Latin-1 encoded strings (legacy support); these are read as raw bytes, not as hex digits
- **iterator**: Any iterator yielding integer byte values

### Hexadecimal Formats
- **0x-prefixed**: "0x608060405260043610"
- **Plain hex**: "608060405260043610"
- **Mixed case**: Case-insensitive hex parsing
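
When input can arrive in any of these shapes, one option is to normalize it to `bytes` before disassembling. The `to_bytes` helper below is a hypothetical, standard-library-only sketch. Note that it treats every `str` as hexadecimal, which differs from the legacy Latin-1 handling listed above.

```python
from typing import Iterable, Union


def to_bytes(data: Union[str, bytes, bytearray, Iterable[int]]) -> bytes:
    """Normalize hex text, byte containers, or iterators of ints to bytes."""
    if isinstance(data, str):
        text = data.strip()
        if text[:2].lower() == "0x":
            text = text[2:]  # drop the optional prefix
        return bytes.fromhex(text)  # case-insensitive; raises ValueError on bad hex
    return bytes(data)  # bytes, bytearray, or any iterable of ints in 0..255


print(to_bytes("0x6080604052").hex())  # 6080604052
```

The resulting `bytes` object can then be passed to any of the binary-input functions.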
180
181
### Special Format Handling
182
183
The disassembly functions include intelligent format detection:
184
185
```python
186
from pyevmasm import disassemble_hex
187
188
# Automatically handles 0x prefix
189
code1 = disassemble_hex("0x6080604052")
190
code2 = disassemble_hex("6080604052")
191
assert code1 == code2
192
193
# Binary Ninja format detection (EVM prefix)
194
code3 = disassemble_hex("EVM6080604052") # Strips "EVM" prefix
195
196
# All-hex string detection
197
mixed_format = "6080604052" # Detected as hex even without 0x
198
```

## Fork-Specific Disassembly

Different Ethereum forks have different instruction sets. PyEVMAsm provides accurate disassembly for each fork:

```python
from pyevmasm import disassemble_one

# Byzantium introduced RETURNDATASIZE (0x3d)
instr = disassemble_one(b'\x3d', fork="byzantium")
print(instr.name)  # "RETURNDATASIZE"

# Same opcode in frontier fork
instr = disassemble_one(b'\x3d', fork="frontier")
print(instr.name)  # "INVALID"

# Constantinople introduced shift operations
instr = disassemble_one(b'\x1b', fork="constantinople")
print(instr.name)  # "SHL"

instr = disassemble_one(b'\x1b', fork="byzantium")
print(instr.name)  # "INVALID"
```
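
One way to picture fork-aware lookup is as a chain of opcode tables in which each fork extends its predecessor. The sketch below uses hypothetical, heavily abridged tables (intermediate forks and most opcodes are omitted) and a hypothetical `mnemonic` helper. It illustrates the idea only and does not reproduce the library's internal data structures.

```python
from typing import Dict

# Abridged, illustrative tables: opcode -> mnemonic
FRONTIER: Dict[int, str] = {0x00: "STOP", 0x01: "ADD", 0x56: "JUMP"}
BYZANTIUM: Dict[int, str] = {
    **FRONTIER,
    0x3D: "RETURNDATASIZE",
    0x3E: "RETURNDATACOPY",
    0xFD: "REVERT",
}
CONSTANTINOPLE: Dict[int, str] = {**BYZANTIUM, 0x1B: "SHL", 0x1C: "SHR", 0x1D: "SAR"}

TABLES: Dict[str, Dict[int, str]] = {
    "frontier": FRONTIER,
    "byzantium": BYZANTIUM,
    "constantinople": CONSTANTINOPLE,
}


def mnemonic(opcode: int, fork: str) -> str:
    """Look up an opcode in the fork's table; unknown opcodes map to INVALID."""
    return TABLES[fork].get(opcode, "INVALID")


print(mnemonic(0x1B, "constantinople"))  # SHL
print(mnemonic(0x1B, "byzantium"))       # INVALID
```

Building each table from the previous one keeps later forks a strict superset here, which matches the examples above where an opcode is INVALID before the fork that introduced it.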

## Error Handling

Disassembly functions handle various error conditions gracefully:

- **Insufficient data**: Returns None (single) or stops the generator (multiple) when there are not enough bytes for an operand
- **Invalid opcodes**: Creates INVALID instruction objects for unknown opcodes
- **Empty input**: Returns None (single) or an empty generator (multiple)
- **Malformed hex**: Raises an exception from hex decoding (for example, on odd-length or non-hex input)
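
Callers who prefer a sentinel over an exception for malformed hex can validate input before disassembling. The sketch below is standard-library only. `try_unhex` is a hypothetical helper, and decoding with `binascii.unhexlify` is an assumption made for illustration rather than a statement about the library's internals.

```python
import binascii
from typing import Optional


def try_unhex(text: str) -> Optional[bytes]:
    """Decode hex with an optional 0x prefix; return None on malformed input."""
    if text.startswith("0x"):
        text = text[2:]
    try:
        return binascii.unhexlify(text)
    except (binascii.Error, ValueError):
        # binascii.Error covers odd-length and non-hex input;
        # ValueError covers non-ASCII strings
        return None


print(try_unhex("0x6080"))  # b'`\x80'
print(try_unhex("608"))     # None
```

A `None` result can then be handled the same way as the `None` returned for empty or truncated bytecode.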