EIP2315 - Simple Subroutines for the EVM

# Abstract

This proposal introduces two opcodes to support simple subroutines: RJUMPSUB and RETURNSUB.

Taken together with other recent propoposals it provides an efficient, static, and safe control-flow facility.

# Motivation

The EVM does not provide subroutines as primitives.

Instead, calls can be synthesized by fetching and pushing the return address and subroutine address on the data stack and executing JUMP to the subroutine; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. These conventions cost gas and use stack slots unnecessarily. And they create unnecessary complexity that is borne by the humans and programs writing, reading, and analyzing EVM code. Note: A stack-rolling operator could reduce the complexity of the code -- it could could move return address to the top of the stack and implicitly shift the intervening items. But it would be an expensive operation with a dynamic gas cost.

As an alternative, we propose to provide for subroutines with two simple operations: RJUMPSUB to call a subroutine and RETURNSUB to return from it. Taken together with other recent propoposals they provide an efficient, static, and safe control-flow facility.

Efficient. Substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines -- as much as 56% savings in gas in the analysis below. Substantial performance advantages for static jumps are reported elsewhere.

Static. All possible jumps are known at contract creation time.

Safe. Valid contracts will not halt with an exception unless they run out of gas or recursively overflow stack.

# History

Facilities to directly support subroutines are provided by all but one of the real and virtual machines we have programmed, including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, Sun SPARC, a few generations of Intel silicon, ARM, UCSD p-machine, Sun JVM, Wasm, and the sole exception -- the EVM. In whatever form, these operations provide for

capturing the current context of execution,
transferring control to a new context, and
returning to the original context
- after possible further transfers of control
- to some arbitrary depth.

The concept (opens new window) goes back to Turing's Automatic Computing Engine of 1946 (opens new window):

... We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation. ... When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines, (1024) with the most recent last...

Turing's machine held data in delay lines made of mercury-filled crystals. We have better hardware now, but the concept is simple and by now familiar: Turing's "subsidiary operations" are subroutines, and his "list of notes" is a stack of return addresses.

We propose to follow Turing's lead in the design of our subroutine facility, as specified below.

# Efficiency Analysis

We show here how these instructions can be used to reduce the complexity and gas costs of both ordinary subroutine calls and low-level optimizations compared to using JUMP.

We assume that RJUMPSUB costs 5 gas, and RETURNSUB costs 8 gas. We justify these costs below, but they do not make a large difference in this analysis, as much of the improvement is due to PUSH and SWAP operations that are no longer needed.

# Simple Subroutine Call

Consider this example of calling a fairly minimal subroutine.

Subroutine call, using RJUMPSUB:

TEST_SQUARE:
    0x02            ; 3 gas
    jumpsub SQUARE  ; 5 gas
    returnsub       ; 3 gas

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total 22 gas.

1
2
3
4
5
6
7
8
9
10
11

Subroutine call, using JUMP:

TEST_SQUARE:
    jumpdest        ; 1 gas
    RTN_SQUARE      ; 3 gas
    0x02            ; 3 gas
    SQUARE          ; 3 gas
    jump            ; 8 gas
RTN_SQUARE:
    jumpdest        ; 1 gas
    swap1           ; 3 gas
    jump            ; 8 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 50 gas

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Using RJUMPSUB saves 50 - 22 = 28 gas versus using JUMP -- a 56% improvement.

# Tail Call Optimization

Of course in cases like this one we can optimize the tail call, so that the return from SQUARE actually returns from TEST_SQUARE.

Tail call optimization, using RJUMP and RETURNSUB:

TEST_SQUARE:
    0x02            ; 3 gas
    rjump SQUARE    ; 3 gas

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total: 17 gas

1
2
3
4
5
6
7
8
9
10

Tail call optimization, using JUMP:

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    SQUARE          ; 3 gas
    jump            ; 8 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 33 gas

1
2
3
4
5
6
7
8
9
10
11
12
13
14

Using RJUMPSUB versus JUMP saves 33 - 17 = 16 gas -- a 48% improvement.

# Tail Call Elimination

We can even take advantage of SQUARE just happening to directly follow TEST_SQUARE and just fall through rather than jump at all.

Tail call elimination, using JUMPSUB:

TEST_SQUARE:
    0x02            ; 3 gas

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

Total 14 gas.

1
2
3
4
5
6
7
8
9

Tail call elimination, using JUMP:

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

Total: 24 gas

1
2
3
4
5
6
7
8
9
10
11
12

Using RETURNSUB versus JUMP saves 24 - 14 = 10 gas -- a 42% improvement.

# Call Using Data Stack

We can also consider an alternative call mechanism -- call it CALLSUB -- that pushes its return address on the data_stack:

TEST_SQUARE:
    jumpdest        ; 1 gas
    0x02            ; 3 gas
    callsub SQUARE  ; 5 gas
    returnsub       ; 3 gas

SQUARE:
    jumpdest        ; 1 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    returncall      ; 3 gas

1
2
3
4
5
6
7
8
9
10
11
12

Total 28 gas, compared to 22 gas for the RJUMPSUB version. Possibly more, as using the wide-integer data stack is probably less efficient than using a stack of native integers.

So we can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using JUMP.

Note: Clearly, the benefits acrrue less to programs composed of larger subroutines rather than being factored into smaller ones.

# Specification

To support calls and returns we specify an EVM return stack in addition to the existing data stack. The return stack, like the data stack, is limited to 1024 items. This stack supports two new instructions for subroutines.

# Instructions

# `RJUMPSUB (0x5e) jmpdest`

Transfers control to a subroutine. The destination address is relative to current PC.

Decode the jmpdest from the immediate data. The data is encoded as two bytes, MSB-first.

Set PC to PC + jmpdest.

Push PC + 1 on the return stack.

The cost is low.

# `RETURNSUB (0x5f)`

Returns control to the caller of a subroutine.

Pop PC off the return stack.

The cost is verylow.

Notes:

If a resulting PC to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.
Values popped off the return stack do not need to be validated, since they are alterable only by RJUMPSUB and RETURNSUB.
The description above lays out the semantics of these instructions in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code RJUMPSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 1 location.)
The return stack is the functional equivalent of Turing's "delay line".

The low cost of RJUMPSUB versus JUMP is justified by needing only to add the immediate two byte destination to the PC and push the return address on the return stack, all using native arithmetric, versus using the data stack with emulated 256-bit instructions.

The verylow cost of RETURNSUB is justified by needing only to pop the return stack into the PC. Benchmarking will be needed to tell if the costs are well-balanced.

# Safety

We define safety here as avoiding exceptional halting states, as defined in the Yellow Paper. A validator can always detect three of these states at validation time:

Insufficient stack items
Invalid jump destination
Invalid instruction

A validator can detect stack overflow only for non-recursive programs. So two states will still require tests at runtime:

Stack overflow
Insufficient gas

EIP-3779 specifies creation-time validation to detect invalid contracts.

# Rationale

There are at least two designs for a subroutine facility.

We have chosen Turing's design, as have Forth, the JVM, Wasm, and others. We keep return addresses on a dedicated FIFOreturn stack. The instruction to call a subroutine will

push the return address onto the return stack and
jump to the first instruction of the subroutine.

The instruction to return from a subroutine will

jump to the address popped off of thereturn stack.

We give an example of an alternative design above, using the single data stack in RAM for both data and return addresses. Similar designs are used by many register machines. The instruction to call a subroutine will

push the return address on the data stack and
jump to the first instruction of the subroutine.

The instruction to return from a subroutine will

jump to the address popped off of the data stack. Note that the program must first ensure that the return address is the first element on the stack.

We prefer Turing's design for a few reasons.

It maintains a clear separation between calculation and flow of control. It's impossible to overwrite the return stack, and the data stack is uncluttered with return addresses.
It improves performance by
- using native arithmetic rather than 256-bit EVM instructions for the return address,
- not needing a data stack slot for the return address,
- and not needing to move parameters and return values around the return address.
It has a 75-year history of working well, especially for stack machines.

# Backwards Compatibility

These changes affect the semantics of existing EVM code.
These changes are compatible with EIP-3779 -- contracts following all of the rules and tests given there will be valid.

# Security Considerations

These changes do introduce new flow control instructions, so any software which does static/dynamic analysis of EVM code needs to be modified accordingly. The RJUMPSUB semantics are similar to JUMP whereas the RETURNSUB instruction is different, since it can 'land' on any opcode (but the possible destinations can be statically inferred).

# Test Cases

# Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x60045e005b5d (PUSH1 0x04, JUMPSUB, STOP, JUMPDEST, RETURNSUB)

Pc	Op	Cost	Stack	RStack
0	JUMPSUB	5	[]	[]
3	RETURNSUB	5	[]	[0]
4	STOP	0	[]	[]

Output: 0x Consumed gas: 10

# Two levels of subroutines

This should execute fine, going into one two depths of subroutines

Bytecode: 0x6800000000000000000c5e005b60115e5d5b5d (PUSH9 0x00000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)

Pc	Op	Cost	Stack	RStack
0	JUMPSUB	5	[]	[]
3	JUMPSUB	5	[]	[0]
4	RETURNSUB	5	[]	[0,3]
5	RETURNSUB	5	[]	[3]
6	STOP	0	[]	[]

Consumed gas: 20

# Failure 1: invalid jump

This should fail, since the given location is outside of the code-range. The code is the same as previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.

Bytecode: (PUSH9 0x01000000000000000c, JUMPSUB,0x6801000000000000000c5e005b60115e5d5b5d, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)

Pc	Op	Cost	Stack	RStack
0	JUMPSUB	10	[18446744073709551628]	[]

Error: at pc=10, op=JUMPSUB: invalid jump destination

# Failure 2: shallow `return stack`

This should fail at first opcode, due to shallow return_stack

Bytecode: 0x5d5858 (RETURNSUB, PC, PC)

Pc	Op	Cost	Stack	RStack
0	RETURNSUB	5	[]	[]

Error: at pc=0, op=RETURNSUB: invalid retsub

# Subroutine at end of code

In this example. the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error

Bytecode: 0x6005565b5d5b60035e (PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB)

Pc	Op	Cost	Stack	RStack
0	PUSH1	3	[]	[]
2	JUMP	8	[5]	[]
5	JUMPDEST	1	[]	[]
6	JUMPSUB	5	[]	[]
2	RETURNSUB	5	[]	[2]
7	STOP	0	[]	[]

Consumed gas: 30