EIP2315 - Simple Subroutines for the EVM
# Abstract
This proposal introduces two opcodes to support simple subroutines: RJUMPSUB
and RETURNSUB
.
Taken together with other recent propoposals it provides an efficient, static, and safe control-flow facility.
# Motivation
The EVM does not provide subroutines as primitives.
Instead, calls can be synthesized by fetching and pushing the return address and subroutine address on the data stack and executing JUMP
to the subroutine; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. These conventions cost gas and use stack slots unnecessarily. And they create unnecessary complexity that is borne by the humans and programs writing, reading, and analyzing EVM code. Note: A stack-rolling operator could reduce the complexity of the code -- it could could move return address to the top of the stack and implicitly shift the intervening items. But it would be an expensive operation with a dynamic gas cost.
As an alternative, we propose to provide for subroutines with two simple operations: RJUMPSUB
to call a subroutine and RETURNSUB
to return from it. Taken together with other recent propoposals they provide an efficient, static, and safe control-flow facility.
Efficient. Substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines -- as much as 56% savings in gas in the analysis below. Substantial performance advantages for static jumps are reported elsewhere.
Static. All possible jumps are known at contract creation time.
Safe. Valid contracts will not halt with an exception unless they run out of gas or recursively overflow stack.
# History
Facilities to directly support subroutines are provided by all but one of the real and virtual machines we have programmed, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, Sun SPARC, a few generations of Intel silicon, ARM, UCSD p-machine, Sun JVM, Wasm, and the sole exception -- the EVM. In whatever form, these operations provide for
- capturing the current context of execution,
- transferring control to a new context, and
- returning to the original context
- after possible further transfers of control
- to some arbitrary depth.
The concept (opens new window) goes back to Turing's Automatic Computing Engine of 1946 (opens new window):
... We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation. ... When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines, (1024) with the most recent last...
Turing's machine held data in delay lines made of mercury-filled crystals. We have better hardware now, but the concept is simple and by now familiar: Turing's "subsidiary operations" are subroutines, and his "list of notes" is a stack of return addresses.
We propose to follow Turing's lead in the design of our subroutine facility, as specified below.
# Efficiency Analysis
We show here how these instructions can be used to reduce the complexity and gas costs of both ordinary subroutine calls and low-level optimizations compared to using JUMP
.
We assume that RJUMPSUB
costs 5 gas, and RETURNSUB
costs 8 gas. We justify these costs below, but they do not make a large difference in this analysis, as much of the improvement is due to PUSH
and SWAP
operations that are no longer needed.
# Simple Subroutine Call
Consider this example of calling a fairly minimal subroutine.
Subroutine call, using RJUMPSUB
:
TEST_SQUARE:
0x02 ; 3 gas
jumpsub SQUARE ; 5 gas
returnsub ; 3 gas
SQUARE:
dup1 ; 3 gas
mul ; 5 gas
returnsub ; 3 gas
Total 22 gas.
2
3
4
5
6
7
8
9
10
11
Subroutine call, using JUMP
:
TEST_SQUARE:
jumpdest ; 1 gas
RTN_SQUARE ; 3 gas
0x02 ; 3 gas
SQUARE ; 3 gas
jump ; 8 gas
RTN_SQUARE:
jumpdest ; 1 gas
swap1 ; 3 gas
jump ; 8 gas
SQUARE:
jumpdest ; 1 gas
dup1 ; 3 gas
mul ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
Total: 50 gas
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Using RJUMPSUB
saves 50 - 22 = 28 gas versus using JUMP
-- a 56% improvement.
# Tail Call Optimization
Of course in cases like this one we can optimize the tail call, so that the return from SQUARE
actually returns from TEST_SQUARE
.
Tail call optimization, using RJUMP
and RETURNSUB
:
TEST_SQUARE:
0x02 ; 3 gas
rjump SQUARE ; 3 gas
SQUARE:
dup1 ; 3 gas
mul ; 5 gas
returnsub ; 3 gas
Total: 17 gas
2
3
4
5
6
7
8
9
10
Tail call optimization, using JUMP
:
TEST_SQUARE:
jumpdest ; 1 gas
0x02 ; 3 gas
SQUARE ; 3 gas
jump ; 8 gas
SQUARE:
jumpdest ; 1 gas
dup1 ; 3 gas
mul ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
Total: 33 gas
2
3
4
5
6
7
8
9
10
11
12
13
14
Using RJUMPSUB
versus JUMP
saves 33 - 17 = 16 gas -- a 48% improvement.
# Tail Call Elimination
We can even take advantage of SQUARE
just happening to directly follow TEST_SQUARE
and just fall through rather than jump at all.
Tail call elimination
, using JUMPSUB:
TEST_SQUARE:
0x02 ; 3 gas
SQUARE:
dup1 ; 3 gas
mul ; 5 gas
returnsub ; 3 gas
Total 14 gas.
2
3
4
5
6
7
8
9
Tail call elimination, using JUMP:
TEST_SQUARE:
jumpdest ; 1 gas
0x02 ; 3 gas
SQUARE:
jumpdest ; 1 gas
dup1 ; 3 gas
mul ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
Total: 24 gas
2
3
4
5
6
7
8
9
10
11
12
Using RETURNSUB
versus JUMP
saves 24 - 14 = 10 gas -- a 42% improvement.
# Call Using Data Stack
We can also consider an alternative call mechanism -- call it CALLSUB
-- that pushes its return address on the data_stack
:
TEST_SQUARE:
jumpdest ; 1 gas
0x02 ; 3 gas
callsub SQUARE ; 5 gas
returnsub ; 3 gas
SQUARE:
jumpdest ; 1 gas
dup1 ; 3 gas
mul ; 5 gas
swap1 ; 3 gas
returncall ; 3 gas
2
3
4
5
6
7
8
9
10
11
12
Total 28 gas, compared to 22 gas for the RJUMPSUB
version. Possibly more, as using the wide-integer data stack is probably less efficient than using a stack of native integers.
So we can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using JUMP
.
Note: Clearly, the benefits acrrue less to programs composed of larger subroutines rather than being factored into smaller ones.
# Specification
To support calls and returns we specify an EVM return stack
in addition to the existing data stack
. The return stack
, like the data stack
, is limited to 1024
items. This stack supports two new instructions for subroutines.
# Instructions
# RJUMPSUB (0x5e) jmpdest
Transfers control to a subroutine. The destination address is relative to current PC.
- Decode the
jmpdest
from the immediate data. The data is encoded as two bytes, MSB-first.- Set
PC
toPC + jmpdest
.- Push
PC + 1
on thereturn stack
.The cost is low.
# RETURNSUB (0x5f)
Returns control to the caller of a subroutine.
- Pop
PC
off thereturn stack
.The cost is verylow.
Notes:
- If a resulting
PC
to be executed is beyond the last instruction then the opcode is implicitly aSTOP
, which is not an error. - Values popped off the
return stack
do not need to be validated, since they are alterable only byRJUMPSUB
andRETURNSUB
. - The description above lays out the semantics of these instructions in terms of a
return stack
. But the actual state of thereturn stack
is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may codeRJUMPSUB
to unobservably pushPC
on thereturn stack
rather thanPC + 1
, which is allowed so long asRETURNSUB
observably returns control to thePC + 1
location.) - The
return stack
is the functional equivalent of Turing's "delay line".
The low cost of RJUMPSUB
versus JUMP
is justified by needing only to add the immediate two byte destination to the PC
and push the return address on the return stack
, all using native arithmetric, versus using the data stack with emulated 256-bit instructions.
The verylow cost of RETURNSUB
is justified by needing only to pop the return stack
into the PC
. Benchmarking will be needed to tell if the costs are well-balanced.
# Safety
We define safety here as avoiding exceptional halting states, as defined in the Yellow Paper. A validator can always detect three of these states at validation time:
- Insufficient stack items
- Invalid jump destination
- Invalid instruction
A validator can detect stack overflow only for non-recursive programs. So two states will still require tests at runtime:
- Stack overflow
- Insufficient gas
EIP-3779 specifies creation-time validation to detect invalid contracts.
# Rationale
There are at least two designs for a subroutine facility.
We have chosen Turing's design, as have Forth, the JVM, Wasm, and others. We keep return addresses on a dedicated FIFOreturn stack
. The instruction to call a subroutine will
- push the return address onto the
return stack
and - jump to the first instruction of the subroutine.
The instruction to return from a subroutine will
- jump to the address popped off of the
return stack
.
We give an example of an alternative design above, using the single data stack
in RAM for both data and return addresses. Similar designs are used by many register machines. The instruction to call a subroutine will
- push the return address on the
data stack
and - jump to the first instruction of the subroutine.
The instruction to return from a subroutine will
- jump to the address popped off of the
data stack
. Note that the program must first ensure that the return address is the first element on the stack.
We prefer Turing's design for a few reasons.
- It maintains a clear separation between calculation and flow of control. It's impossible to overwrite the return stack, and the data stack is uncluttered with return addresses.
- It improves performance by
- using native arithmetic rather than 256-bit EVM instructions for the return address,
- not needing a
data stack
slot for the return address, - and not needing to move parameters and return values around the return address.
- It has a 75-year history of working well, especially for stack machines.
# Backwards Compatibility
These changes affect the semantics of existing EVM code.
These changes are compatible with EIP-3779 -- contracts following all of the rules and tests given there will be valid.
# Security Considerations
These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The RJUMPSUB
semantics are similar to JUMP
whereas the RETURNSUB
instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).
# Test Cases
# Simple routine
This should jump into a subroutine, back out and stop.
Bytecode: 0x60045e005b5d
(PUSH1 0x04, JUMPSUB, STOP, JUMPDEST, RETURNSUB
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | JUMPSUB | 5 | [] | [] |
3 | RETURNSUB | 5 | [] | [0] |
4 | STOP | 0 | [] | [] |
Output: 0x
Consumed gas: 10
# Two levels of subroutines
This should execute fine, going into one two depths of subroutines
Bytecode: 0x6800000000000000000c5e005b60115e5d5b5d
(PUSH9 0x00000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | JUMPSUB | 5 | [] | [] |
3 | JUMPSUB | 5 | [] | [0] |
4 | RETURNSUB | 5 | [] | [0,3] |
5 | RETURNSUB | 5 | [] | [3] |
6 | STOP | 0 | [] | [] |
Consumed gas: 20
# Failure 1: invalid jump
This should fail, since the given location is outside of the code-range. The code is the same as previous example,
except that the pushed location is 0x01000000000000000c
instead of 0x0c
.
Bytecode: (PUSH9 0x01000000000000000c, JUMPSUB,
0x6801000000000000000c5e005b60115e5d5b5d, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | JUMPSUB | 10 | [18446744073709551628] | [] |
Error: at pc=10, op=JUMPSUB: invalid jump destination
# Failure 2: shallow return stack
This should fail at first opcode, due to shallow return_stack
Bytecode: 0x5d5858
(RETURNSUB, PC, PC
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RETURNSUB | 5 | [] | [] |
Error: at pc=0, op=RETURNSUB: invalid retsub
# Subroutine at end of code
In this example. the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error
Bytecode: 0x6005565b5d5b60035e
(PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | PUSH1 | 3 | [] | [] |
2 | JUMP | 8 | [5] | [] |
5 | JUMPDEST | 1 | [] | [] |
6 | JUMPSUB | 5 | [] | [] |
2 | RETURNSUB | 5 | [] | [2] |
7 | STOP | 0 | [] | [] |
Consumed gas: 30
# Copyright
Copyright and related rights waived via CC0 (opens new window).