EIP3540 - EVM Object Format (EOF) v1
# Abstract
We introduce an extensible and versioned container format for the EVM with a once-off validation at deploy time. The version described here brings the tangible benefit of code and data separation, and allows for easy introduction of a variety of changes in the future. This change relies on the reserved byte introduced by EIP-3541.
To summarise, EOF bytecode has the following layout:
format, magic, version, (section_kind, section_size)+, 0, <section contents>
# Motivation
On-chain deployed EVM bytecode contains no pre-defined structure today. Code is typically validated in clients to the extent of JUMPDEST
analysis at runtime, every single time prior to execution. This poses not only an overhead, but also a challenge for introducing new or deprecating existing features.
Validating code during the contract creation process allows code versioning without an additional version field in the account. Versioning is a useful tool for introducing or deprecating features, especially for larger changes (such as significant changes to control flow, or features like account abstraction).
The format described in this EIP introduces a simple and extensible container with a minimal set of changes required to both clients and languages, and introduces validation.
The first tangible feature it provides is separation of code and data. This separation is especially beneficial for on-chain code validators (like those utilised by layer-2 scaling tools, such as Optimism), because they can distinguish code and data (this includes deployment code and constructor arguments too). Currently they a) require changes prior to contract deployment; b) implement a fragile method; or c) implement an expensive and restrictive jump analysis. Code and data separation can result in ease of use and significant gas savings for such use cases. Additionally, various (static) analysis tools can also benefit, though off-chain tools can already deal with existing code, so the impact is smaller.
A non-exhaustive list of proposed changes which could benefit from this format:
- Including a
JUMPDEST
-table (to avoid analysis at execution time) and/or removingJUMPDEST
s entirely. - Introducing static jumps (with relative addresses) and jump tables, and disallowing dynamic jumps at the same time.
- Requiring code section(s) to be terminated by
STOP
. (Assumptions like this can provide significant speed improvements in interpreters, such as a speed up of ~7% seen in evmone (opens new window).) - Multi-byte opcodes without any workarounds.
- Representing functions as individual code sections instead of subroutines.
- Introducing special sections for different use cases, notably Account Abstraction.
# Specification
We use RFC2119 (opens new window) keywords in this section.
In order to guarantee that every EOF-formatted contract in the state is valid, we need to prevent already deployed (and not validated) contracts from being recognized as such format. This is achieved by choosing a byte sequence for the magic that doesn't exist in any of the already deployed contracts.
# Remarks
For purely reference purposes we call the 0xEF
byte the FORMAT
.
The initcode is the code executed in the context of the create transaction, CREATE
, or CREATE2
instructions. The initcode returns code (via the RETURN
instruction), which is inserted into the account. See section 7 ("Contract Creation") in the Yellow Paper for more information.
The opcode 0xEF
is currently an undefined instruction, therefore: It pops no stack items and pushes no stack items, and it causes an exceptional abort when executed. This means initcode or already deployed code starting with this instruction will continue to abort execution.
# Code validation
In this fork we introduce code validation for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.
We define the EOF prefix as the concatenation of FORMAT
and the magic.
At block.number == HF_BLOCK
new contract creation is modified:
- if initcode or code starts with the EOF prefix, it is considered to be EOF formatted and will undergo validation specified in the following sections,
- else if code starts with
0xEF
, creation continues to result in an exceptional abort (the rule introduced in EIP-3540), - otherwise code is considered legacy code and the following rules do not apply to it.
# Container specification
The container starts with the header:
description | length | value | |
---|---|---|---|
format | 1-byte | 0xEF | |
magic | 2-byte | 0xCA 0xFE | Subject to change, may be removed. |
version | 1-byte | 0x01 | means EOF1 |
This is followed by one or more section headers:
description | length | |
---|---|---|
section_kind | 1-byte | Encoded as a 8-bit unsigned number. |
section_size | 2-bytes | Encoded as a 16-bit unsigned big-endian number. |
The section kinds are defined as follows:
section_kind | meaning |
---|---|
0 | terminator |
1 | code |
2 | data |
If the terminator is encountered, section size MUST NOT follow.
The section contents follow after the header, in the order and size they are defined, without any padding bytes.
To summarise, the bytecode has the following layout:
format, magic, version, (section_kind, section_size)+, 0, <section contents>
# Validation rules
A bytestream starting with the EOF prefix declares itself conforming to the rules according to its version.
- The rules of
version=1
are specified below:
section_size
MUST NOT be 0.- Exactly one code section MUST be present.
- The code section MUST be the first section.
- A single data section MAY follow the code section.
- Stray bytes outside of sections MUST NOT be present. This includes trailing bytes after the last section.
- Any other version is invalid.
(Note: Contract creation code SHOULD set the section size of the data section so that the constructor arguments fit it.)
# Changes to execution semantics
For clarity, the container refers to the complete account code, while code refers to the contents of the code section only.
- JUMPDEST analysis is only run on the code.
- Execution starts at the first byte of the code, and
PC
is set to 0. - If
PC
goes outside of the code section bounds, execution aborts with failure. PC
returns the current position within the code.JUMP
/JUMPI
uses an absolute offset within the code.CODECOPY
/CODESIZE
/EXTCODECOPY
/EXTCODESIZE
/EXTCODEHASH
keeps operating on the entire container.- The input to
CREATE
/CREATE2
is still the entire container.
# Changes to contract creation semantics
For clarity, the EOF prefix together with a version number n is denoted as the EOFn prefix, e.g. EOF1 prefix.
- If initcode's container has EOF1 prefix it must be valid EOF1 code.
- If code's container has EOF1 prefix it must be valid EOF1 code.
# Rationale
EVM and/or account versioning has been discussed numerous times over the past years. This proposal aims to learn from them. See this collection of previous proposals (opens new window) for a good starting point.
# Execution vs. creation time validation
This specification introduces creation time validation, which means:
- All created contracts with EOFn prefix are valid according to version n rules. This is very strong and useful property. The client can trust that the deployed code is well-formed.
- In future, this allows to serialize
JUMPDEST
map in the EOF container and eliminate the need of implicitJUMPDEST
analysis required before execution. - Or to completely remove the need for
JUMPDEST
instructions. - This helps with deprecating EVM instructions and/or features.
- The biggest disadvantage is that deploy-time validation of EOF code must be enabled in two hard-forks. However, the first step (EIP-3541) is already deployed in London.
The alternative is to have execution time validation for EOF. This is performed every single time a contract is executed, however clients may be able to cache validation results. This alternative approach has the following properties:
- Because the validation is consensus-level execution step, it means the execution always requires the entire code. This makes code merkleization impractical.
- Can be enabled via a single hard-fork.
- Better backwards compatibility: data contracts starting with the
0xEF
byte or the EOF prefix can be deployed. This is a dubious benefit however.
# Contract creation restrictions
The Changes to contact creation semantics section defines minimal set of restrictions related to the contract creation: if initcode or code has the EOF1 container prefix it must be validated. This adds two validation steps in the contract creation, any of it failing will result in contract creation failure.
Since initcode and code are evaluated for EOF1 independently, number of interesting combinations are allowed:
- Create transaction with EOF1 initcode can deploy legacy contract,
- EOF1 contract can execute
CREATE
instruction with legacy initcode to create new legacy contract, - Legacy contract can execute
CREATE
instruction with EOF1 initcode to create new EOF1 contract, - Legacy contract can execute
CREATE
instruction with EOF1 initcode to create new legacy contract, - etc.
To limit the number of exotic bytecode version combinations, additional restrictions are considered, but currently are not part of the specification:
- The EOF version of initcode must much the version of code.
- An EOF1 contract must not create legacy contracts.
Finally, create transaction must be allowed to contain legacy initcode and deploy legacy code because otherwise there is no transition period allowing upgrading transaction signing tools. Deprecating such transactions may be considered in future.
# The FORMAT byte
The 0xEF
byte was chosen because it is reserved for this purpose by EIP-3541.
# Section structure
We have considered different questions for the sections:
- Streaming headers (i.e.
section_header, section_data, section_header, section_data, ...
) are used in some other formats (such as WebAssembly). They are handy for formats which are subject to editing (adding/removing sections). That is not a useful feature for EVM. One minor benefit applicable to our case is that they do not require a specific "header terminator". On the other hand they seem to play worse with code chunking / merkleization, as it is better to have all section headers in a single chunk. - Whether to have a header terminator or to encode
number_of_sections
ortotal_size_of_headers
. Both raise the question how large of a value these fields should be able to hold. While today there will be only two sections, in case each "EVM function" would become a separate code section, a fixed 8-bit field may not be big enough. A terminator byte seems to avoid these problems. - Whether to encode
section_size
as a fixed 16-bit value or some kind of variable length field (e.g. LEB128 (opens new window)). We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see EIP-170 and EIP-2677). Should this be limiting in the future, a new EOF version could change the format. Besides simplifying client implementations, not using LEB128 also greatly simplifies on-chain parsing.
# PC starts with 0 at the code section
The values for PC
and JUMP
/JUMPI
start with 0 and are within the code section. We considered keeping PC
/JUMP
/JUMPI
values to operate on the whole container and be consistent with CODECOPY
/EXTCODECOPY
but in the end decided otherwise. It looks to be much easier to propose EOF extensions that affect jumps and jumpdests when JUMP
/JUMPI
already operates on indexes within code section only. This also feels more natural and easier to implement in EVM: the new EOF EVM should only care about traversing code and accessing other parts of the container only on special occasions (e.g. in CODECOPY
instruction).
# Backwards Compatibility
This is a breaking change given that any code starting with 0xEF
was not deployable before (and resulted in exceptional abort if executed), but now some subset of such codes can be deployed and executed successfully.
The choice of magic guarantees that none of the contracts existing on the chain are affected by the new rules.
# Test Cases
# EOF validation
# Valid cases
- Code section without data section
- Code section with data section
# Invalid cases
These cases use 00
as magic.
Bytecode | Validation error |
---|---|
EF | No magic |
EFFF01010002020004006000AABBCCDD | Invalid magic |
EF00 | No version |
EF0000010002020004006000AABBCCDD | Invalid version |
EF0002010002020004006000AABBCCDD | Invalid version |
EF00FF010002020004006000AABBCCDD | Invalid version |
EF0001 | No header |
EF000100 | No code section |
EF000101 | No code section size |
EF00010100 | Code section size incomplete |
EF0001010002 | No section terminator |
EF000101000200 | No code section contents |
EF00010100020060 | Code section contents incomplete |
EF0001010002006000DEADBEEF | Trailing bytes after code section |
EF00010100020100020060006000 | Multiple code sections |
EF000101000000 | Empty code section |
EF000101000002000200AABB | Empty code section |
EF000102000401000200AABBCCDD6000 | Data section preceding code section |
EF0001020004AABBCCDD | Data section without code section |
EF000101000202 | No data section size |
EF00010100020200 | Data section size incomplete |
EF0001010002020004 | No section terminator |
EF0001010002020004006000 | No data section contents |
EF0001010002020004006000AABBCC | Data section contents incomplete |
EF0001010002020004006000AABBCCDDEE | Trailing bytes after data section |
EF0001010002020004020004006000AABBCCDDAABBCCDD | Multiple data sections |
EF0001010002020000006000 | Empty data section |
EF0001010002030004006000AABBCCDD | Unknown section (id = 3) |
# Contract creation
All cases should be checked for creation transaction, CREATE
and CREATE2
.
- Legacy init code
- Returns legacy code
- Returns valid EOF1 code
- Returns invalid EOF1 code
- Returns 0xEF not followed by EOF1 code
- Valid EOF1 init code
- Returns legacy code
- Returns valid EOF1 code
- Returns invalid EOF1 code
- Returns 0xEF not followed by EOF1 code
- Invalid EOF1 init code
# Contract execution
- Valid EOF code containing
JUMP
/JUMPI
- offsets relative to code section start are used JUMP
/JUMPI
to5B
(JUMPDEST
) byte outside of code section - exceptional abort- EOF code containing
PC
opcode - offset inside code section is returned PUSH*
instructions- Complete push data - no changes expected
- Truncated push data without data section - execution ends with exceptional abort
- Truncated push data with data section - execution ends with exceptional abort
- Execution flows out of code section bounds (i.e. PC gets to
code_section_size
) - exceptional abort - EOF code containing
CODECOPY/CODESIZE
- works as in legacy codeCODESIZE
returns the size of entire containerCODECOPY
can copy from code sectionCODECOPY
can copy from data sectionCODECOPY
can copy from the EOF headerCODECOPY
can copy entire container
EXTCODECOPY/EXTCODESIZE/EXTCODEHASH
with the EOF target contract - works as with legacy target contractEXTCODESIZE
returns the size of entire target containerEXTCODEHASH
returns the hash of entire target containerEXTCODECOPY
can copy from target's code sectionEXTCODECOPY
can copy from target's data sectionEXTCODECOPY
can copy from target's EOF headerEXTCODECOPY
can copy entire target container- Results don't differ when executed inside legacy or EOF contract
# Reference Implementation
# Generic Implementation
FORMAT = 0xEF
MAGIC = 0x00 # To be defined
VERSION = 0x01
S_TERMINATOR = 0x00
S_CODE = 0x01
S_DATA = 0x02
def validate_eof(code: bytes):
# Old-style contracts are still allowed
if len(code) == 0 or code[0] != FORMAT:
return
# Validate format and magic
assert(len(code) >= 3 and code[1] == MAGIC and code[2] == VERSION)
# Process section headers
section_sizes = {S_CODE: 0, S_DATA: 0}
pos = 3
while True:
# Terminator not found
assert(pos < len(code))
section_id = code[pos]
pos += 1
if section_id == S_TERMINATOR:
break
# Disallow unknown sections
assert(section_id in section_sizes)
# Data section preceding code section
assert(not (section_id == S_DATA and section_sizes[S_CODE] == 0))
# Multiple sections with the same id
assert(section_sizes[section_id] == 0)
# Truncated section size
assert((pos + 1) < len(code))
section_sizes[section_id] = (code[pos] << 8) | code[pos + 1]
pos += 2
# Empty section
assert(section_sizes[section_id] != 0)
# Code section cannot be absent
assert(section_sizes[S_CODE] != 0)
# The entire container must be scanned
assert(len(code) == (pos + section_sizes[S_CODE] + section_sizes[S_DATA]))
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
# Simplified Implementation
Given the rigid rules of EOF1 it is possible to implement support for the container in clients using very simple pattern matching:
FORMAT = 0xEF
MAGIC = 0x00 # To be defined
VERSION = 0x01
S_TERMINATOR = 0x00
S_CODE = 0x01
S_DATA = 0x02
def validate_eof(code: bytes):
total_size = 0
if len(code) > 7 and code[0] == FORMAT and code[1] == MAGIC and code[2] == VERSION and code[3] == S_CODE and code[6] == S_TERMINATOR:
total_size = 7 + ((code[4] << 8) | code[5])
elif len(code) > 10 and code[0] == FORMAT and code[1] == MAGIC and code[2] == VERSION and code[3] == S_CODE and code[6] == S_DATA and code[9] == S_TERMINATOR:
total_size = 10 + ((code[4] << 8) | code[5]) + ((code[7] << 8) | code[8])
else:
assert(len(code) == 0 or code[0] != FORMAT)
assert(len(code) == total_size)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
However, future versions may introduce more sections or loosen up restrictions, requiring clients to actually parse sections instead of pattern matching.
# Security Considerations
Proposed validation rules can be checked at constant time, therefore it should not be easily attackable. This is subject to change with future extensions.
Currently initcode validation has no extra cost and the currently charged creation costs should be sufficient, however we consider adding an additional gas cost for contract creation.
# Copyright
Copyright and related rights waived via CC0 (opens new window).