
Enable execution of WASM bytecode for smart contracts #6393

@joe-p

Description


This issue proposes an alternative VM for executing smart contracts on Algorand. It focuses on WASM in particular, which I think is the best-fitting VM for our use case, but there are other instruction sets, such as RISC-V, that offer many of the same benefits.

Proposal

Currently, Algorand smart contracts have an ApprovalProgram and a ClearStateProgram, both of which are AVM bytecode. This proposal advocates for a new WasmProgram, which is WASM bytecode. The WASM bytecode would live on chain and could be prefetched and AOT-compiled prior to execution. Of course, AVM bytecode would still need to be supported alongside the WasmProgram. During execution, everything about the architecture of smart contracts (outside of the VM itself) stays the same with WASM; a rough sketch of what this might look like at the transaction level follows the list below. This means:

  • Inner transactions are still supported (and a WASM program could call an AVM program, or vice versa)
  • Same state model (box, global, local)
  • Same information available (global variables, app/asset/account params, etc.)
  • Same resource model (i.e. tx.Access)
  • Similar compute limits (roughly equivalent opcode budget)
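
Purely to illustrate the shape of this change (a hypothetical sketch, not a concrete spec; the field name, placement, and elided details are invented for illustration), the transaction-level view might simply be an additional program field next to the existing ones:

// Hypothetical sketch only: ApplicationCallTxnFields growing a WASM program
// field alongside the existing AVM programs. Existing fields are elided.
type ApplicationCallTxnFields struct {
	// ... existing fields elided ...
	ApprovalProgram   []byte
	ClearStateProgram []byte

	// New (hypothetical): raw WASM bytecode, stored on chain and
	// AOT-compiled by nodes ahead of execution.
	WasmProgram []byte
}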

Rationale

More Primitives "For Free"

If we adopt WASM, we get access to more primitives. By primitives I mean essential building blocks for apps. I am mostly focusing on crypto primitives, but there are others (e.g. math wider than u512) that would also be beneficial to the ecosystem. When I say primitives, think of things that could reasonably be opcodes in the current AVM. A common use case that is also driving other ecosystems, such as ETH, to consider other VMs (RISC-V in ETH's case) is zero-knowledge proofs. Currently on Algorand, we support Plonk circuits with BN254 and BLS12-381. Merging the new opcodes to support this took at least eight months (and that's not counting the time for the release and consensus upgrade). Additionally, the verification logic for Plonk circuits had to be implemented at the application layer, which is certainly non-trivial, and then all the off-chain tooling had to be built to actually use it. If we wanted to support a new type of ZK proof, such as halo2 (most notably used in Zcash and recently chosen by Midnight), we'd need to go through this process once again. We'd need to add support for the Pasta curves and likely the Poseidon hash function. A new AVM verifier would need to be written, along with all the off-chain tooling to actually leverage the new functionality.

I say "for free" in this section's header because if we imagine a world where we could executed on-chain logic via WASM, writing a contract that verifies a halo2 proof would simply involve importing the existing halo2 Rust library. This means there would be no engineering hours required from Algorand Technologies, Algorand Foundation, or any other contributor to go-algorand. This also means there would not need to be any engineer hours required to update smart contract languages (i.e Puya) to support new opcodes. Also, perhaps more importantly, this would result in significantly higher veolcity for the availablility of bleeding edge technology to be available in smart contracts. In the world of WASM on Algorand, if we need to make the migration to post-quantum primitives in ZK circuits we can do that as soon as they are available in WASM-compatible languages.

Widely Adopted Tooling

Right now, to build on Algorand you need to use the Puya compiler. You can choose either Python or TypeScript as a frontend, but both still have a learning curve due to Algorand specifics. The implementation of the Puya compiler is non-trivial and requires a LOT of engineering hours to implement and maintain. With WASM as an option, compilers (i.e. Puya) become an entire class of software that we (currently the Algorand Foundation, but more generally the Algorand ecosystem) would no longer need to maintain. This not only saves time and money, but also allows those resources to be put into other areas of improvement. Given that WASM is primarily aimed at browsers, it's probably safe to assume it is not going anywhere anytime soon. This means we not only get access to current languages that support WASM (Rust, Zig, C, AssemblyScript, Go, etc.) but also to future languages (i.e. Mojo). By enabling the usage of these languages, we also get access to their libraries. There are many fundamental libraries developers often take for granted, such as PRNGs, that require a non-trivial amount of time and effort to make available on the AVM.

External Resources

The tooling for WASM is not just widely adopted by users, but also by contributors. Currently, the only developers interested in contributing to the AVM (and its ecosystem) are the developers using Algorand. WASM, on the other hand, has a wide group of developers with a vested interest. This not only means there are external (external to the Algorand ecosystem) contributions to the WASM VM and runtime, but also to other tooling such as libraries, languages, auditing tools, debuggers, etc. This means the Algorand development ecosystem can grow along with the entire WASM ecosystem and not rely solely on internal contributions.

Performance

Many WASM runtimes support AOT compilation, which means the program is compiled to native instructions for whichever machine it is currently on. This results in significantly faster execution times. For Algorand, the value of faster contract execution wouldn't be faster block times (contract eval is already a small percentage of total block eval time), but rather more possibilities within the same block time. A good example of what WASM has enabled for other chains is NEAR's Aurora protocol. Aurora is how NEAR achieves EVM compatibility. To do this, they have re-implemented the entire EVM interpreter in a smart contract, meaning one can call the Aurora EVM contract just like any other contract on the network. Having more compute per program also means we can avoid common UX problems we have with the current model, such as requiring a bunch of op-up transactions (which, to be fair, is solvable within the AVM).

WASM Is Fit for Purpose

As previously mentioned, there are other instruction sets worth considering, such as RISC-V, but I think WASM makes the most sense for contract execution. WASM is designed to run in browsers, a notably hostile environment. This means it is designed from the ground up to be secure and isolated from the system it is running on. By default, WASM has access to a pre-allocated slice of linear memory on the host, but nothing else. This means WASM binaries cannot execute system calls, so there is no filesystem access and there are no network calls. It also means a WASM program cannot affect any other program running on the host machine. This is possible with RISC-V, but it is not a feature out of the box. For example, the Polkadot VM requires modifications to RISC-V bytecode to make it safe to execute on-chain:

PolkaVM currently supports riscv64emac with the Zbb extension, but unlike most (all?) other RISC-V VMs it doesn’t run RISC-V binaries as-is (it’s not actually a RISC-V VM!). Offline we ingest normal RISC-V ELF binaries that you build using a normal compiler, and we translate them into a more constrained and efficient custom bytecode that’s designed for use in a VM (and not native hardware, like RISC-V is). The idea is to remove as much complexity out of the VM itself (which needs to run on-chain), put as much of that complexity as possible into offline tooling (which can run off-chain), and improve security by removing unnecessary features (e.g. in vanilla RISC-V you can jump to any address; in PolkaVM bytecode the code address space is fully virtualized, you can’t jump everywhere, and the bytecode isn’t even loaded into memory that’s accessible by the program).

Source

With WASM, we don't need the extra tooling complexity to make contracts secure.
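
To make the "nothing by default" point concrete, here is a minimal sketch (assuming wazero, the runtime used in the proof of concept below; the file name, memory limit, and "program" entry point are illustrative) of running a module that gets a capped linear memory and no host imports at all:

package main

import (
	"context"
	"os"

	"github.com/tetratelabs/wazero"
)

func main() {
	ctx := context.Background()

	// Cap the guest's linear memory at 16 pages (1 MiB).
	cfg := wazero.NewRuntimeConfig().WithMemoryLimitPages(16)
	r := wazero.NewRuntimeWithConfig(ctx, cfg)
	defer r.Close(ctx)

	wasmBytes, err := os.ReadFile("contract.wasm")
	if err != nil {
		panic(err)
	}

	// No WASI and no host modules are instantiated, so the guest has no
	// filesystem, network, or syscall access; it only gets its linear
	// memory and whatever host functions we deliberately export to it.
	mod, err := r.Instantiate(ctx, wasmBytes)
	if err != nil {
		panic(err)
	}
	_, _ = mod.ExportedFunction("program").Call(ctx)
}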

Proof Of Concept

Current Implementation

The implementation for this proof of concept can be seen in this branch. It should be noted that this is a proof of concept: many of the implementation details would need to be improved, and this POC does not include all of the required functionality. For example, WASM programs are not currently stored in the ledger (create only).

Runtime

This POC uses wazero as the runtime, which is a WASM runtime implemented in pure Go with zero dependencies.

AOT Compilation

Wazero supports AOT compilation with a compilation cache. This means WASM modules can be compiled to native instructions and then saved on disk. For existing apps, this can happen during catchup. For new apps, this can happen in the prefetcher.
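
To give a sense of what this looks like, below is a minimal sketch (illustrative function names and cache directory, not the POC's actual code) of configuring wazero with an on-disk compilation cache and compiling a program ahead of time:

package wasmrt

import (
	"context"

	"github.com/tetratelabs/wazero"
)

// newCachedRuntime configures a compiler-backed wazero runtime whose
// compiled native code is persisted to cacheDir and reused across restarts.
func newCachedRuntime(ctx context.Context, cacheDir string) (wazero.Runtime, error) {
	cache, err := wazero.NewCompilationCacheWithDir(cacheDir)
	if err != nil {
		return nil, err
	}
	cfg := wazero.NewRuntimeConfigCompiler().WithCompilationCache(cache)
	return wazero.NewRuntimeWithConfig(ctx, cfg), nil
}

// compileProgram AOT-compiles a WASM program (e.g. during catchup for
// existing apps or in the prefetcher for new ones), populating the cache.
func compileProgram(ctx context.Context, r wazero.Runtime, wasmBytes []byte) (wazero.CompiledModule, error) {
	return r.CompileModule(ctx, wasmBytes)
}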

Module Instantiation

Currently modules are instantiated by the prefetcher. This helps avoid long startup times when the transaction is actually evaluated.
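
A rough sketch of how that split might look (illustrative names; it assumes the runtime and compiled module from the previous sketch, and the "program" entry point used by the POC's contracts):

package wasmrt

import (
	"context"

	"github.com/tetratelabs/wazero"
	wazeroapi "github.com/tetratelabs/wazero/api"
)

// prefetchModule instantiates the compiled module ahead of time so the
// relatively expensive instantiation stays off the evaluation critical path.
func prefetchModule(ctx context.Context, r wazero.Runtime, compiled wazero.CompiledModule) (wazeroapi.Module, error) {
	return r.InstantiateModule(ctx, compiled, wazero.NewModuleConfig())
}

// evaluateProgram later calls the exported entry point on the already
// instantiated module during transaction evaluation.
func evaluateProgram(ctx context.Context, mod wazeroapi.Module) (uint64, error) {
	results, err := mod.ExportedFunction("program").Call(ctx)
	if err != nil {
		return 0, err
	}
	return results[0], nil
}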

Ledger Access

Ledger access is possible in WASM by exporting Go functions and calling them from WASM. For example, this is the Go function for accessing global state:

getGlobalUint := func(ctx context.Context, stack []uint64) {
	if evalContext, ok := ctx.Value("evalContext").(*logic.EvalContext); ok {
		appId := stack[0]
		keyPointer := wazeroapi.DecodeI32(stack[1])
		keyLength := wazeroapi.DecodeI32(stack[2])

		mem := envModule.Memory()

		// TODO(wasm): Figure out best way to handle out of range memory access
		key, _ := mem.Read(uint32(keyPointer), uint32(keyLength))

		tv, _, err := evalContext.Ledger.GetGlobal(basics.AppIndex(appId), string(key))
		if err != nil {
			fmt.Printf("Error getting global value: %v\n", err)
			// TODO(wasm): Figure out best way to handle errors in general
			stack[0] = 1337
			return
		}

		stack[0] = tv.Uint
		return
	}
	panic("getGlobalUint called without evalContext in context")
}

We export this function with the name host_get_global_uint and then call it like any other external C ABI function from the language in which we are writing the contract. For example, in Rust:

#[link(wasm_import_module = "algorand")]
unsafe extern "C" {
    fn host_get_global_uint(app: u64, key: *const u8, len: i32) -> u64;
}
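
For completeness, this is roughly how a host function like getGlobalUint above might be registered under the "algorand" module with wazero's HostModuleBuilder (a sketch that assumes r is the wazero runtime; not the exact POC wiring):

// Export getGlobalUint so guests importing "algorand"."host_get_global_uint"
// can call it: (i64 appId, i32 keyPointer, i32 keyLength) -> i64 value.
_, err := r.NewHostModuleBuilder("algorand").
	NewFunctionBuilder().
	WithGoFunction(wazeroapi.GoFunc(getGlobalUint),
		[]wazeroapi.ValueType{wazeroapi.ValueTypeI64, wazeroapi.ValueTypeI32, wazeroapi.ValueTypeI32},
		[]wazeroapi.ValueType{wazeroapi.ValueTypeI64}).
	Export("host_get_global_uint").
	Instantiate(ctx)
if err != nil {
	panic(err)
}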

Benchmarks

These benchmarks compare the total group eval time (not just contract execution time) of a TEAL program vs a WASM program (compiled from Rust). There are three different types of programs included in these benchmarks.

StateLoop is a program that increments a counter in global state until a specific value is reached. Fibo is a recursive fibonacci implementation that calculates fibonacci(7). Int1 is a program that does nothing other than return 1. The TEAL and Rust source code for StateLoop and Fibo can be viewed below.

goos: darwin
goarch: arm64
pkg: github.com/algorand/go-algorand/ledger
cpu: Apple M4 Pro
BenchmarkWasmStateLoop
BenchmarkWasmStateLoop/rust_group_of_1_with_prefetcher
BenchmarkWasmStateLoop/rust_group_of_1_with_prefetcher-14                    500            439827 ns/op           83137 B/op        336 allocs/op
BenchmarkWasmStateLoop/teal_group_of_1_with_prefetcher
BenchmarkWasmStateLoop/teal_group_of_1_with_prefetcher-14                    500            473263 ns/op           83483 B/op        334 allocs/op
BenchmarkWasmStateLoop/rust_group_of_2_with_prefetcher
BenchmarkWasmStateLoop/rust_group_of_2_with_prefetcher-14                    500            556629 ns/op          140631 B/op        441 allocs/op
BenchmarkWasmStateLoop/teal_group_of_2_with_prefetcher
BenchmarkWasmStateLoop/teal_group_of_2_with_prefetcher-14                    500            812080 ns/op          144700 B/op        524 allocs/op
BenchmarkWasmStateLoop/teal_group_of_16_with_prefetcher
BenchmarkWasmStateLoop/teal_group_of_16_with_prefetcher-14                   500           5561834 ns/op         1063913 B/op       3204 allocs/op
BenchmarkWasmStateLoop/rust_group_of_16_with_prefetcher
BenchmarkWasmStateLoop/rust_group_of_16_with_prefetcher-14                   500           2349579 ns/op          996859 B/op       1932 allocs/op
BenchmarkWasmInt1
BenchmarkWasmInt1/rust_group_of_1_with_prefetcher
BenchmarkWasmInt1/rust_group_of_1_with_prefetcher-14                         500            263212 ns/op           76977 B/op        248 allocs/op
BenchmarkWasmInt1/teal_group_of_1_with_prefetcher
BenchmarkWasmInt1/teal_group_of_1_with_prefetcher-14                         500            231245 ns/op           77349 B/op        239 allocs/op
BenchmarkWasmInt1/teal_group_of_2_with_prefetcher
BenchmarkWasmInt1/teal_group_of_2_with_prefetcher-14                         500            346577 ns/op          136845 B/op        334 allocs/op
BenchmarkWasmInt1/rust_group_of_2_with_prefetcher
BenchmarkWasmInt1/rust_group_of_2_with_prefetcher-14                         500            395911 ns/op          138299 B/op        351 allocs/op
BenchmarkWasmInt1/rust_group_of_16_with_prefetcher
BenchmarkWasmInt1/rust_group_of_16_with_prefetcher-14                        500           2229678 ns/op         1000279 B/op       1815 allocs/op
BenchmarkWasmInt1/teal_group_of_16_with_prefetcher
BenchmarkWasmInt1/teal_group_of_16_with_prefetcher-14                        500           1892801 ns/op          995342 B/op       1681 allocs/op
BenchmarkWasmRustVsTealFibo
BenchmarkWasmRustVsTealFibo/teal_group_of_1_with_prefetcher
BenchmarkWasmRustVsTealFibo/teal_group_of_1_with_prefetcher-14               500            381887 ns/op           77307 B/op        248 allocs/op
BenchmarkWasmRustVsTealFibo/rust_group_of_1_with_prefetcher
BenchmarkWasmRustVsTealFibo/rust_group_of_1_with_prefetcher-14               500            290020 ns/op           76326 B/op        248 allocs/op
BenchmarkWasmRustVsTealFibo/rust_group_of_2_with_prefetcher
BenchmarkWasmRustVsTealFibo/rust_group_of_2_with_prefetcher-14               500            446074 ns/op          137684 B/op        351 allocs/op
BenchmarkWasmRustVsTealFibo/teal_group_of_2_with_prefetcher
BenchmarkWasmRustVsTealFibo/teal_group_of_2_with_prefetcher-14               500            672087 ns/op          138726 B/op        352 allocs/op
BenchmarkWasmRustVsTealFibo/rust_group_of_16_with_prefetcher
BenchmarkWasmRustVsTealFibo/rust_group_of_16_with_prefetcher-14              500           2494342 ns/op          995038 B/op       1814 allocs/op
BenchmarkWasmRustVsTealFibo/teal_group_of_16_with_prefetcher
BenchmarkWasmRustVsTealFibo/teal_group_of_16_with_prefetcher-14              500           4479521 ns/op         1015953 B/op       1825 allocs/op
PASS
ok      github.com/algorand/go-algorand/ledger  37.272s

StateLoop TEAL


#pragma version 11
b program
get_counter:
	byte "counter"
	app_global_get
	retsub
increment_counter:
	byte "counter"
	dup
	app_global_get
	int 1
	+
	app_global_put
	retsub

program:
	callsub increment_counter
	callsub get_counter
	int 25
	<
	bnz program

int 1
return

StateLoop Rust

use algokit::{get_current_application_id, get_global_uint, set_global_uint};

pub fn increment_counter() {
    let key = b"counter";
    let app_id = get_current_application_id();

    set_global_uint(app_id, key, get_global_uint(app_id, key) + 1);
}

pub fn get_counter() -> u64 {
    get_global_uint(get_current_application_id(), b"counter")
}

#[unsafe(no_mangle)]
pub fn program() -> u64 {
    while get_counter() < 25 {
        increment_counter();
    }

    get_counter()
}

Fibo TEAL


#pragma version 11
b program

// fibonacci(n: uint64): uint64
fibonacci:
	proto 1 1

	// *if1_condition
	// examples/calculator/calculator.algo.ts:49
	// n <= 1
	frame_dig -1 // n: uint64
	int 1
	<=
	bz *if1_end

	// *if1_consequent
	// examples/calculator/calculator.algo.ts:50
	// return n;
	frame_dig -1 // n: uint64
	retsub

*if1_end:
	// examples/calculator/calculator.algo.ts:52
	// return this.fibonacci(n - 1) + this.fibonacci(n - 2);
	frame_dig -1 // n: uint64
	int 1
	-
	callsub fibonacci
	frame_dig -1 // n: uint64
	pushint 2
	-
	callsub fibonacci
	+
	retsub


program:
	int 7
	callsub fibonacci
	return

Fibo Rust

fn fibo(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }
    fibo(n - 1) + fibo(n - 2)
}

#[unsafe(no_mangle)]
pub fn program() -> u64 {
    // 7 is the highest we can go in TEAL without hitting opcode limit
    return fibo(7);
}

Impacts

Developer Language Migration

Currently, most developers are using Algorand Python or Algorand TypeScript (or TEALScript). Since AVM bytecode programs would still be supported, migration would not be urgent. I would imagine the early days of WASM support would see existing AVM apps calling into WASM apps for access to more compute or new primitives (similar to ETH pre-compiles). Eventually, though, we'd probably want developers to slowly move to targeting WASM. This would require one of two things (or both):

A) We keep Algorand Python and Algorand TypeScript frontends the same but target WASM (either by changing Puya, direct to WASM, or transpile to C/Rust/Zig)

B) We deprecate Algorand Python and Algorand TypeScript in favor of existing WASM-friendly languages

The obvious benefit to A is that there is essentially no conscious migration path needed for developers. This, however, would mean that a lot of engineering hours would still need to be dedicated to these Algorand-specific languages. There might also be a lot of complexity in how these languages target WASM. The most straightforward approach is probably transpiling to another language, such as C, Rust, or Zig, that can then compile down to WASM. B would involve more migration pain for developers, but seems a bit more sustainable in the long term since no one has to maintain an Algorand-specific language. With B we would still need to maintain WASM libraries for Algorand in whichever languages we choose to support, but that seems much easier than maintaining a compiler/transpiler.

Loss of TEAL Elegance

One of the many upsides to TEAL is the elegance of the language. TEAL is very easy to read and thus easy to audit. To be fair, this is true for simple programs, but more complex programs, especially ones with complex data structures like dynamic arrays, can become difficult to follow. Below is a recursive fibonacci program in decompiled WAT, followed by the same program in decompiled TEAL. Note that the WASM program has all symbols stripped for a fair comparison against a decompiled TEAL program (which also strips branch names).

(module
  (type (;0;) (func (param i64) (result i64)))
  (type (;1;) (func (result i64)))
  (import "env" "memory" (memory (;0;) 16))
  (func (;0;) (type 0) (param i64) (result i64)
    (local i64)
    i64.const 0
    local.set 1
    block  ;; label = @1
      loop  ;; label = @2
        local.get 0
        i64.const 2
        i64.lt_u
        br_if 1 (;@1;)
        local.get 0
        i64.const -1
        i64.add
        call 0
        local.get 1
        i64.add
        local.set 1
        local.get 0
        i64.const -2
        i64.add
        local.set 0
        br 0 (;@2;)
      end
    end
    local.get 0
    local.get 1
    i64.add)
  (func (;1;) (type 1) (result i64)
    i64.const 7
    call 0)
  (global (;0;) (mut i32) (i32.const 1048576))
  (global (;1;) i32 (i32.const 1048576))
  (global (;2;) i32 (i32.const 1048576))
  (export "program" (func 1))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2)))
label5:
proto 1 1
frame_dig -1
intc_0 // 1
<=
bz label7
frame_dig -1
retsub
label7:
frame_dig -1
intc_0 // 1
-
callsub label5
frame_dig -1
pushint 2
-
callsub label5
+
retsub
label4:
pushint 7
callsub label5
retsub

Complexity Tradeoff

In other sections of this issue, I describe WASM integration as a reduction of complexity in client software. I do want to acknowledge that this is not entirely fair, because WASM runtimes are fairly complex themselves. The reduced complexity exists within the scope of the Algorand-specific source code required for a client implementation. I think the complexity can be categorized into three areas:

A) Client implementation-specific source code
B) External dependencies for client implementation
C) Out-of-client complexity

Right now, the AVM results in moderate complexity in A, low complexity in B, and high complexity in C (think of all the AVM-specific tools that need to be maintained for development, debugging, testing, auditing, etc.). Adopting WASM would reduce A and C at the cost of increasing B. This, however, seems like a worthwhile tradeoff for the long-term sustainability of the protocol.
