1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 26.04.2025 | 1.11.3.4 | :sparkles: add bus lock feature | [#1245](https://github.com/stnolting/neorv32/pull/1245) |
| 26.04.2025 | 1.11.3.3 | optimize round-robin bus switch: remove idle cycles | [#1244](https://github.com/stnolting/neorv32/pull/1244) |
| 22.04.2025 | 1.11.3.2 | :bug: fix the privilege level with which the bootloader boots an application image | [#1241](https://github.com/stnolting/neorv32/pull/1241) |
| 22.04.2025 | 1.11.3.1 | add new top generic (`OCD_HW_BREAKPOINT`) to enable/disable OCD's hardware trigger; :warning: hardwire `tdata1.dmode` to `1` - only debug-mode can use the trigger module; hardwire `tdata1.action` to `0001` - debug-mode entry only | [#1239](https://github.com/stnolting/neorv32/pull/1239) |
53 changes: 28 additions & 25 deletions docs/datasheet/cpu.adoc
@@ -380,16 +380,17 @@ always valid when set.
|=======================
| Signal | Width | Description
3+^| **In-Band Signals**
| `addr` | 32 | Access address (byte addressing)
| `data` | 32 | Write data
| `ben` | 4 | Byte-enable for each byte in `data`
| `stb` | 1 | Request trigger ("strobe", single-shot)
| `rw` | 1 | Access direction (`0` = read, `1` = write)
| `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store)
| `priv` | 1 | Set if privileged (M-mode) access
| `debug` | 1 | Set if debug mode access
| `amo` | 1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>)
| `amoop` | 4 | Type of atomic memory operation (<<_atomic_memory_access>>)
| `addr` | 32 | Access address using byte-wise addressing. Half-word (16-bit) and word (32-bit) addresses are aligned accordingly (e.g. LSB = 0 for half-word accesses).
| `data` | 32 | Write data. Writing individual bytes is controlled via `ben`.
| `ben` | 4 | Byte-enable for each byte in `data`.
| `stb` | 1 | Request trigger ("strobe"). This signal is high for exactly one cycle starting a bus request.
| `rw` | 1 | Access direction (`0` = read, `1` = write).
| `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store).
| `lock` | 1 | Set if contiguous transfer; the current transfer is part of a group of transfers that should not be interleaved. This signal is relevant for the bus network only and may be ignored.
| `priv` | 1 | Set if privileged (Machine-mode) access.
| `debug` | 1 | Set if debug-mode access (access from the on-chip debugger).
| `amo` | 1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>).
| `amoop` | 4 | Actual type of atomic memory operation (<<_atomic_memory_access>>). Irrelevant if `amo` is not set.
3+^| **Out-Of-Band Signals**
| `fence` | 1 | Data (load/store; `fence`) or instruction (instruction-fetch; `fence.i`) fence request; single-shot; see <<_memory_coherence>>
|=======================
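The `addr`, `data` and `ben` rows above imply a byte-lane mapping for sub-word stores. The Python sketch below is a behavioral illustration, not the RTL. It computes `ben` and `data` for a naturally aligned store of 1, 2 or 4 bytes, assuming little-endian byte order on the 32-bit bus. The function name `store_lanes` is hypothetical. The sketch shifts the value into its byte lane. An implementation may equally replicate the value across all four lanes, since only the bytes selected by `ben` are written.

```python
from typing import Tuple


def store_lanes(addr: int, value: int, size: int) -> Tuple[int, int]:
    """Return (ben, data) for a store of `size` bytes (1, 2 or 4) at `addr`.

    Hypothetical helper; assumes a little-endian 32-bit bus with byte addressing.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    if addr % size != 0:
        # half-words need addr[0] = 0, words need addr[1:0] = 00
        raise ValueError("misaligned access")
    lane = addr & 0x3                                        # byte offset inside the 32-bit word
    ben = ((1 << size) - 1) << lane                          # one enable bit per written byte
    data = (value & ((1 << (8 * size)) - 1)) << (8 * lane)   # move value into its byte lane(s)
    return ben, data
```

For example, a half-word store of `0xBEEF` to address `0x1002` yields `ben = 0b1100` and `data = 0xBEEF0000`.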
@@ -425,7 +426,7 @@ The figure below shows three exemplary bus accesses:
. A write access to address `B_addr` writing `wdata` (fastest response; `ACK` arrives right in the next cycle).
. A failing read access to address `C_addr` (slow response; `ERR` arrives after several cycles).

.Three Exemplary Bus Transactions (showing only in-band signals; privileged non-debug non-atomic accesses)
.Three Exemplary Bus Transactions (privileged non-locked non-debug non-atomic accesses)
[wavedrom, format="svg", align="center"]
----
{signal: [
@@ -438,10 +439,12 @@ The figure below shows three exemplary bus accesses:
{name: 'stb', wave: '010|..10.10|..', node: '.a....d..f....'},
{name: 'rw', wave: 'x0.|.x1.x0.|..', node: '..............'},
{name: 'src', wave: 'x0.|.x0.x0.|.x'},
{name: 'lock', wave: '0..|.......|..'},
{name: 'priv', wave: 'x1.|.x1.x1.|.x'},
{name: 'debug', wave: 'x0.|.x0.x0.|.x'},
{name: 'amo', wave: 'x0.|.x0.x0.|.x'},
{name: 'amoop', wave: 'x0.|.x0.x0.|.x'},
{name: 'amoop', wave: 'x..|.......|..'},
{name: 'fence', wave: '0..|.......|..'},
],
{},
[
@@ -470,21 +473,21 @@ For the CPU, the atomic memory accesses are handled as plain load/store operatio
The `amoop` signal specifies the actual atomic processing operation:

.AMO Operation Type Encoding (Bus Interface Signals)
[cols="<1,<4,<4"]
[cols="<2,<4,<4"]
[options="header",grid="rows"]
|=======================
| `bus_req_t.amoop` | Description | ISA Extension
| `-000` | swap | <<_zaamo_isa_extension,`Zaamo`>>
| `-001` | unsigned add | <<_zaamo_isa_extension,`Zaamo`>>
| `-010` | logical xor | <<_zaamo_isa_extension,`Zaamo`>>
| `-011` | logical and | <<_zaamo_isa_extension,`Zaamo`>>
| `-100` | logical or | <<_zaamo_isa_extension,`Zaamo`>>
| `0110` | unsigned minimum | <<_zaamo_isa_extension,`Zaamo`>>
| `0111` | unsigned maximum | <<_zaamo_isa_extension,`Zaamo`>>
| `1110` | signed minimum | <<_zaamo_isa_extension,`Zaamo`>>
| `1111` | signed maximum | <<_zaamo_isa_extension,`Zaamo`>>
| `1000` | load-reservate | <<_zalrsc_isa_extension,`Zalrsc`>>
| `1001` | store-conditional | <<_zalrsc_isa_extension,`Zalrsc`>>
| `amoop` | Description | Required ISA Extension
| `-000` | swap | <<_zaamo_isa_extension,`Zaamo`>>
| `-001` | unsigned ADD | <<_zaamo_isa_extension,`Zaamo`>>
| `-010` | logical XOR | <<_zaamo_isa_extension,`Zaamo`>>
| `-011` | logical AND | <<_zaamo_isa_extension,`Zaamo`>>
| `-100` | logical OR | <<_zaamo_isa_extension,`Zaamo`>>
| `0110` | unsigned minimum | <<_zaamo_isa_extension,`Zaamo`>>
| `0111` | unsigned maximum | <<_zaamo_isa_extension,`Zaamo`>>
| `1110` | signed minimum | <<_zaamo_isa_extension,`Zaamo`>>
| `1111` | signed maximum | <<_zaamo_isa_extension,`Zaamo`>>
| `1000` | load-reservate | <<_zalrsc_isa_extension,`Zalrsc`>>
| `1001` | store-conditional | <<_zalrsc_isa_extension,`Zalrsc`>>
|=======================
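The encoding table above can be read as a small pattern-matching decoder. Note that, as written, `-000` also matches `1000` and `-001` also matches `1001`. The Python sketch below assumes that the exact `Zalrsc` codes take precedence, so it checks them first. It also assumes `amoop` is only evaluated when `amo` is set. The names `AMO_TABLE` and `decode_amoop` are hypothetical.

```python
from typing import Optional, Tuple

# Rows of the AMO encoding table as (pattern, operation, required extension).
# '-' is a don't-care bit. Exact LR/SC codes come first (assumed precedence).
AMO_TABLE = [
    ("1000", "load-reservate", "Zalrsc"),
    ("1001", "store-conditional", "Zalrsc"),
    ("-000", "swap", "Zaamo"),
    ("-001", "unsigned ADD", "Zaamo"),
    ("-010", "logical XOR", "Zaamo"),
    ("-011", "logical AND", "Zaamo"),
    ("-100", "logical OR", "Zaamo"),
    ("0110", "unsigned minimum", "Zaamo"),
    ("0111", "unsigned maximum", "Zaamo"),
    ("1110", "signed minimum", "Zaamo"),
    ("1111", "signed maximum", "Zaamo"),
]


def decode_amoop(amoop: int) -> Optional[Tuple[str, str]]:
    """Return (operation, extension) for a 4-bit amoop, or None if unassigned."""
    bits = format(amoop & 0xF, "04b")  # MSB first, e.g. 0b1010 -> "1010"
    for pattern, operation, extension in AMO_TABLE:
        # every pattern bit must be a don't-care or equal the input bit
        if all(p in ("-", b) for p, b in zip(pattern, bits)):
            return operation, extension
    return None
```

With this ordering, `0b1010` decodes as logical XOR because bit 3 is a don't-care. The unlisted codes `0101` and `1101` decode to `None`.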

.Cache Coherency
32 changes: 21 additions & 11 deletions rtl/core/neorv32_bus.vhd
@@ -36,7 +36,7 @@ architecture neorv32_bus_switch_rtl of neorv32_bus_switch is

type state_t is (S_IDLE, S_BUSY_A, S_BUSY_B);
signal state, state_nxt : state_t;
signal prev, prev_nxt, a_req, b_req, sel, stb : std_ulogic;
signal prio, prio_nxt, a_req, b_req, sel, stb : std_ulogic;

begin

@@ -46,12 +46,12 @@ begin
begin
if (rstn_i = '0') then
state <= S_IDLE;
prev <= '0';
prio <= '0';
a_req <= '0';
b_req <= '0';
elsif rising_edge(clk_i) then
state <= state_nxt;
prev <= prev_nxt;
prio <= prio_nxt;
if (state = S_BUSY_A) then -- clear request
a_req <= '0';
else -- buffer request
@@ -68,11 +68,11 @@ begin

-- Access Arbiter Comb --------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
arbiter_fsm: process(state, prev, a_req, b_req, a_req_i, b_req_i, x_rsp_i)
arbiter_fsm: process(state, prio, a_req, b_req, a_req_i, b_req_i, x_rsp_i)
begin
-- defaults --
state_nxt <= state;
prev_nxt <= prev;
prio_nxt <= prio;
sel <= '0';
stb <= '0';

@@ -81,23 +81,31 @@ begin

when S_BUSY_A => -- port A access in progress
-- ------------------------------------------------------------
prev_nxt <= '0';
sel <= '0';
sel <= '0';
if (a_req_i.lock = '1') then -- give port A prioritized access in the next cycle
prio_nxt <= '1';
else
prio_nxt <= '0';
end if;
if (x_rsp_i.err = '1') or (x_rsp_i.ack = '1') then
state_nxt <= S_IDLE;
end if;

when S_BUSY_B => -- port B access in progress
-- ------------------------------------------------------------
prev_nxt <= '1';
sel <= '1';
sel <= '1';
if (b_req_i.lock = '1') then -- give port B prioritized access in the next cycle
prio_nxt <= '0';
else
prio_nxt <= '1';
end if;
if (x_rsp_i.err = '1') or (x_rsp_i.ack = '1') then
state_nxt <= S_IDLE;
end if;

when others => -- wait for requests
-- ------------------------------------------------------------
if (prev = '1') or (ROUND_ROBIN_EN = false) then -- port B has just been served OR static prioritization
if (prio = '1') or (ROUND_ROBIN_EN = false) then -- serve port A first OR use static prioritization
if (a_req_i.stb = '1') or (a_req = '1') then -- request from port A (prioritized)?
sel <= '0';
stb <= '1';
@@ -107,7 +115,7 @@ begin
stb <= '1';
state_nxt <= S_BUSY_B;
end if;
else -- port A has just been served
else -- serve port B first
if (b_req_i.stb = '1') or (b_req = '1') then -- request from port B (prioritized)?
sel <= '1';
stb <= '1';
@@ -128,6 +136,7 @@ begin
x_req_o.addr <= a_req_i.addr when (sel = '0') else b_req_i.addr;
x_req_o.amo <= a_req_i.amo when (sel = '0') else b_req_i.amo;
x_req_o.amoop <= a_req_i.amoop when (sel = '0') else b_req_i.amoop;
x_req_o.lock <= a_req_i.lock when (sel = '0') else b_req_i.lock;
x_req_o.priv <= a_req_i.priv when (sel = '0') else b_req_i.priv;
x_req_o.debug <= a_req_i.debug when (sel = '0') else b_req_i.debug;
x_req_o.src <= a_req_i.src when (sel = '0') else b_req_i.src;
@@ -799,6 +808,7 @@ begin
sys_req_o.stb <= '1' when (arbiter.state = S_WRITE) else core_req_i.stb;
sys_req_o.rw <= '1' when (arbiter.state = S_WRITE) or (arbiter.state = S_WRITE_WAIT) else core_req_i.rw;
sys_req_o.src <= core_req_i.src;
sys_req_o.lock <= core_req_i.lock;
sys_req_o.priv <= core_req_i.priv;
sys_req_o.debug <= core_req_i.debug;
sys_req_o.amo <= core_req_i.amo;
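The arbiter changes in `neorv32_bus_switch` can be summarized by a transaction-level Python model. This is a behavioral sketch, not the RTL. It ignores cycle timing and the `a_req`/`b_req` request buffers. It also assumes that every queued transfer is already pending whenever the arbiter returns to `S_IDLE`, and that a request's `lock` flag is stable for the whole transfer. The function names `pick`, `next_prio` and `schedule` are hypothetical. `prio_a = True` stands for `prio = '1'`, which means port A is served first.

```python
from typing import List, Optional


def pick(prio_a: bool, a_req: bool, b_req: bool, round_robin: bool = True) -> Optional[str]:
    """Grant decision in S_IDLE."""
    if prio_a or not round_robin:  # serve port A first OR static prioritization
        if a_req:
            return "A"
        if b_req:
            return "B"
    else:                          # serve port B first
        if b_req:
            return "B"
        if a_req:
            return "A"
    return None


def next_prio(port: str, lock: bool) -> bool:
    """Value of prio after a transfer from `port` completes."""
    if port == "A":
        return lock       # locked A transfer keeps A first, otherwise B goes next
    return not lock       # locked B transfer keeps B first, otherwise A goes next


def schedule(a_locks: List[bool], b_locks: List[bool], round_robin: bool = True) -> List[str]:
    """Grant order for two queues of pending transfers, each given by its lock flag."""
    a, b = list(a_locks), list(b_locks)
    prio_a = False        # prio resets to '0'
    order: List[str] = []
    while a or b:
        port = pick(prio_a, bool(a), bool(b), round_robin)
        lock = (a if port == "A" else b).pop(0)
        order.append(port)
        prio_a = next_prio(port, lock)
    return order
```

In this model, `schedule([True, True, False], [False, False])` keeps port A's three transfers together and returns `['B', 'A', 'A', 'A', 'B']`. Clearing all lock flags gives plain alternation, `['B', 'A', 'B', 'A', 'A']`. With `round_robin=False`, port A always wins while it has requests pending.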
33 changes: 11 additions & 22 deletions rtl/core/neorv32_cache.vhd
@@ -175,13 +175,14 @@ begin
-- host response defaults --
host_rsp_o <= rsp_terminate_c; -- all-zero

-- bus interface defaults (default = host access) --
-- bus interface defaults (default = cached host access) --
bus_req_o.addr <= addr.tag & addr.idx & addr.ofs & "00"; -- always word-aligned
bus_req_o.data <= cache_i.data;
bus_req_o.ben <= (others => '1'); -- full-word writes only
bus_req_o.stb <= '0'; -- no request by default
bus_req_o.rw <= '0';
bus_req_o.src <= host_req_i.src; -- pass-through
bus_req_o.lock <= '1'; -- cache block updates are contiguous transfers
bus_req_o.priv <= host_req_i.priv; -- pass-through
bus_req_o.debug <= host_req_i.debug; -- pass-through
bus_req_o.amo <= '0'; -- cache accesses cannot be atomic
@@ -223,14 +224,12 @@ begin
host_rsp_o.ack <= '1';
end if;
ctrl_nxt.state <= S_IDLE;
else -- cache miss
if (cache_i.sta_dir = '1') and (READ_ONLY = false) then -- block is dirty, upload first
addr_nxt.tag <= cache_i.sta_tag(31 downto 32-tag_size_c); -- tag of accessed block
ctrl_nxt.state <= S_UPLOAD_GET;
else -- block is clean, replace by new block
addr_nxt.tag <= host_req_i.addr(31 downto 32-tag_size_c); -- tag of referenced block
ctrl_nxt.state <= S_DOWNLOAD_REQ;
end if;
elsif (cache_i.sta_dir = '1') and (READ_ONLY = false) then -- cache miss: block is dirty, upload first
addr_nxt.tag <= cache_i.sta_tag(31 downto 32-tag_size_c); -- tag of accessed block
ctrl_nxt.state <= S_UPLOAD_GET;
else -- cache miss: block is clean, replace by new block
addr_nxt.tag <= host_req_i.addr(31 downto 32-tag_size_c); -- tag of referenced block
ctrl_nxt.state <= S_DOWNLOAD_REQ;
end if;


@@ -240,11 +239,6 @@ begin
bus_req_o.stb <= '1';
ctrl_nxt.buf_req <= '0'; -- access (about to be) completed
ctrl_nxt.state <= S_DIRECT_RSP;
-- -- update cache if accessed address is cached --
-- [NOTE] not implemented: this would make atomic memory access / memory coherence even more difficult to understand
-- if (cache_i.sta_hit = '1') and (host_req_i.rw = '1') and (READ_ONLY = false) then -- cache write hit
-- cache_o.we <= host_req_i.ben;
-- end if;

when S_DIRECT_RSP => -- wait for direct (uncached) access response
-- ------------------------------------------------------------
@@ -254,12 +248,6 @@ begin
if (bus_rsp_i.ack = '1') or (bus_rsp_i.err = '1') then
ctrl_nxt.state <= S_IDLE;
end if;
-- -- update cache if accessed address is cached --
-- [NOTE] not implemented: this would make atomic memory access / memory coherence even more difficult to understand
-- cache_o.data <= bus_rsp_i.data;
-- if (cache_i.sta_hit = '1') and (host_req_i.rw = '0') and (bus_rsp_i.ack = '1') then -- cache read hit
-- cache_o.we <= (others => '1');
-- end if;


when S_DOWNLOAD_REQ => -- download new cache block: request new word
@@ -277,10 +265,10 @@ begin
cache_o.cmd_new <= '1'; -- set new block (set tag, make valid & clean)
bus_req_o.rw <= '0'; -- read access
--
if (bus_rsp_i.err = '1') then --
cache_o.we <= (others => '1'); -- just keep writing full words
if (bus_rsp_i.err = '1') then -- bus error
ctrl_nxt.state <= S_DOWNLOAD_ERR;
elsif (bus_rsp_i.ack = '1') then
cache_o.we <= (others => '1'); -- cache: full-word write
addr_nxt.ofs <= std_ulogic_vector(unsigned(addr.ofs) + 1);
if (and_reduce_f(addr.ofs) = '1') then -- block completed
ctrl_nxt.state <= S_DOWNLOAD_DONE;
@@ -328,6 +316,7 @@ begin
cache_o.addr <= addr.tag & addr.idx & addr.ofs & "00";
bus_req_o.rw <= '1'; -- write access
cache_o.cmd_new <= '1'; -- set new block (set tag, make valid & clean)
--
if (bus_rsp_i.err = '1') then -- bus error (this is really bad...)
ctrl_nxt.state <= S_ERROR;
elsif (bus_rsp_i.ack = '1') then
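The cache builds its bus address as `tag & idx & ofs & "00"` and now drives `lock` high for its bus requests. As a result, a block download or upload appears on the bus as a run of word accesses that the bus switch keeps together. The Python sketch below splits a byte address into these fields and lists the word addresses of the enclosing block. It assumes the offset counter starts at 0 and stops once all offset bits are set, as the `and_reduce_f(addr.ofs)` check suggests. The names `split_address` and `block_word_addresses` are hypothetical. The index and offset widths are free parameters, not values taken from this diff.

```python
from typing import List, Tuple


def split_address(addr: int, idx_bits: int, ofs_bits: int) -> Tuple[int, int, int]:
    """Split a 32-bit byte address into (tag, idx, ofs); the 2 LSBs are the byte offset."""
    ofs = (addr >> 2) & ((1 << ofs_bits) - 1)                 # word offset inside the block
    idx = (addr >> (2 + ofs_bits)) & ((1 << idx_bits) - 1)    # cache line index
    tag = (addr & 0xFFFFFFFF) >> (2 + ofs_bits + idx_bits)    # remaining upper bits
    return tag, idx, ofs


def block_word_addresses(addr: int, idx_bits: int, ofs_bits: int) -> List[int]:
    """Word addresses of the block containing `addr`, in transfer order (ofs = 0 .. all-ones)."""
    tag, idx, _ = split_address(addr, idx_bits, ofs_bits)
    base = (tag << (2 + ofs_bits + idx_bits)) | (idx << (2 + ofs_bits))
    return [base | (ofs << 2) for ofs in range(1 << ofs_bits)]  # tag & idx & ofs & "00"
```

For a 4-word block (`ofs_bits=2`) and 16 lines (`idx_bits=4`), address `0x80001234` splits into tag `0x800012`, index 3 and word offset 1. Its block is transferred as `0x80001230`, `0x80001234`, `0x80001238`, `0x8000123C`.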
4 changes: 2 additions & 2 deletions rtl/core/neorv32_clint.vhd
@@ -201,8 +201,8 @@ begin
end if;
end process bus_access;

bus_rsp_o.ack <= ack_q;
bus_rsp_o.err <= '0';
bus_rsp_o.ack <= ack_q;
bus_rsp_o.err <= '0';


end neorv32_clint_rtl;
2 changes: 1 addition & 1 deletion rtl/core/neorv32_cpu_control.vhd
@@ -368,7 +368,7 @@ begin
exe_engine_nxt.state <= EX_DISPATCH; -- stay here another round until hwtrig_start arrives in trap_ctrl.env_pending
elsif (frontend_i.valid = '1') then -- new instruction word available
if_ack <= '1'; -- instruction data is about to be consumed
trap_ctrl.instr_be <= frontend_i.error; -- access fault during instruction fetch
trap_ctrl.instr_be <= frontend_i.fault; -- access fault during instruction fetch
exe_engine_nxt.ci <= frontend_i.compr; -- this is a de-compressed instruction
exe_engine_nxt.ir <= frontend_i.instr; -- instruction word
exe_engine_nxt.pc <= exe_engine.pc2(XLEN-1 downto 1) & '0'; -- PC <= next PC
7 changes: 4 additions & 3 deletions rtl/core/neorv32_cpu_frontend.vhd
@@ -127,6 +127,7 @@ begin
ibus_req_o.ben <= (others => '0'); -- read-only
ibus_req_o.rw <= '0'; -- read-only
ibus_req_o.src <= '1'; -- always "instruction fetch" access
ibus_req_o.lock <= '0'; -- always single access
ibus_req_o.priv <= fetch.priv; -- current effective privilege level
ibus_req_o.debug <= ctrl_i.cpu_debug; -- CPU is in debug mode
ibus_req_o.amo <= '0'; -- cannot be an atomic memory operation
@@ -220,7 +221,7 @@ begin
issue.aclr <= '0';
-- start at LOW half-word --
if (issue.algn = '0') then
frontend_o.error <= ipb.rdata(0)(16);
frontend_o.fault <= ipb.rdata(0)(16);
if (ipb.rdata(0)(1 downto 0) /= "11") then -- compressed, consume IPB(0) entry
issue.aset <= ipb.avail(0); -- start of next instruction word is NOT 32-bit-aligned
issue.ack <= "01";
Expand All @@ -235,7 +236,7 @@ begin
end if;
-- start at HIGH half-word --
else
frontend_o.error <= ipb.rdata(1)(16);
frontend_o.fault <= ipb.rdata(1)(16);
if (ipb.rdata(1)(1 downto 0) /= "11") then -- compressed, consume IPB(1) entry
issue.aclr <= ipb.avail(1); -- start of next instruction word is 32-bit-aligned again
issue.ack <= "10";
@@ -270,7 +271,7 @@ begin
frontend_o.valid <= ipb.avail(0);
frontend_o.instr <= ipb.rdata(1)(15 downto 0) & ipb.rdata(0)(15 downto 0);
frontend_o.compr <= '0';
frontend_o.error <= ipb.rdata(0)(16);
frontend_o.fault <= ipb.rdata(0)(16);
end generate;


7 changes: 4 additions & 3 deletions rtl/core/neorv32_cpu_lsu.vhd
@@ -74,7 +74,7 @@ begin
begin
if (rstn_i = '0') then
dbus_req_o.rw <= '0';
dbus_req_o.priv <= priv_mode_m_c;
dbus_req_o.priv <= '0';
dbus_req_o.debug <= '0';
dbus_req_o.amo <= '0';
dbus_req_o.amoop <= (others => '0');
@@ -84,9 +84,9 @@ begin
if (ctrl_i.lsu_mo_we = '1') then
-- type identifiers --
dbus_req_o.rw <= ctrl_i.lsu_rw; -- read/write
dbus_req_o.amo <= ctrl_i.lsu_amo; -- atomic memory operation
dbus_req_o.priv <= ctrl_i.lsu_priv; -- privilege level
dbus_req_o.debug <= ctrl_i.cpu_debug; -- debug-mode access
dbus_req_o.amo <= ctrl_i.lsu_amo; -- atomic memory operation
dbus_req_o.amoop <= amo_cmd;
-- data alignment + byte-enable --
case ctrl_i.ir_funct3(1 downto 0) is
@@ -108,7 +108,8 @@ begin
end process mem_do_reg;

-- hardwired signals --
dbus_req_o.src <= '0'; -- always "data" access
dbus_req_o.src <= '0'; -- always data access
dbus_req_o.lock <= '0'; -- always single access

-- out-of-band signals --
dbus_req_o.fence <= ctrl_i.lsu_fence;
15 changes: 7 additions & 8 deletions rtl/core/neorv32_cpu_pmp.vhd
@@ -272,6 +272,13 @@ begin
end process addr_masking;
end generate; -- /nap_mode_enable

-- NAPOT disabled --
nap_mode_disable:
if not NAP_EN generate
addr_mask_napot(r) <= (others => '0');
addr_mask(r) <= (others => '0');
end generate;


-- check region address match --
-- NA4 and NAPOT --
@@ -337,14 +344,6 @@ begin
end generate;


-- NAPOT disabled --
nap_mode_disable:
if not NAP_EN generate
addr_mask_napot <= (others => (others => '0'));
addr_mask <= (others => (others => '0'));
end generate;


-- Access Permission Check (using static prioritization) ----------------------------------
-- -------------------------------------------------------------------------------------------

1 change: 1 addition & 0 deletions rtl/core/neorv32_dma.vhd
@@ -260,6 +260,7 @@ begin
dma_req_o.rw <= engine.rw;
dma_req_o.addr <= engine.dst_addr when (engine.state = S_WRITE) else engine.src_addr;
dma_req_o.src <= '0'; -- source = data access
dma_req_o.lock <= '0'; -- always single access
dma_req_o.priv <= priv_mode_m_c; -- DMA accesses are always privileged
dma_req_o.debug <= '0'; -- can never ever be in debug mode
dma_req_o.amo <= '0'; -- no atomic memory operation possible