
stnolting
Owner

@stnolting stnolting commented May 16, 2025

This PR removes the hardware spinlocks module and also the inter-core communication links.

⚠️ Remove Hardware Spinlocks

The hardware spinlocks are (another) example of over-engineering: they cost additional hardware that is not needed. The same functionality can be implemented in software using the A ISA extension and its sub-extensions:

  • Zalrsc: Use load-reserved and store-conditional instructions to implement a spinlock. This approach adds only minimal hardware overhead. However, it is also less efficient, as the processor implements only a single global reservation set.
  • Zaamo: Use atomic read-modify-write operations for implementing a spinlock. This is very efficient in terms of performance but adds a significant hardware overhead.
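As a rough illustration of the two options above, here is a minimal sketch of a software spinlock written with C11 atomics. The function names (`amo_try_acquire`, `cas_try_acquire`, and so on) are hypothetical and not part of the NEORV32 software library. Which instructions the compiler actually emits depends on the toolchain and the `-march` setting: an atomic exchange is typically lowered to a single AMO instruction when Zaamo is available, while a compare-and-swap is typically lowered to a load-reserved/store-conditional loop (Zalrsc).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock word convention (an assumption of this sketch): 0 = free, 1 = taken. */

/* Variant 1: atomic exchange (read-modify-write).
 * Swaps 1 into the lock word and inspects the previous value. */
static inline bool amo_try_acquire(atomic_uint *lock) {
    return atomic_exchange_explicit(lock, 1u, memory_order_acquire) == 0u;
}

static inline void amo_acquire(atomic_uint *lock) {
    while (!amo_try_acquire(lock)) {
        /* busy-wait until the previous owner releases the lock */
    }
}

/* Variant 2: compare-and-swap.
 * Only writes 1 if the lock word is still 0. */
static inline bool cas_try_acquire(atomic_uint *lock) {
    unsigned expected = 0u;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1u, memory_order_acquire, memory_order_relaxed);
}

static inline void cas_acquire(atomic_uint *lock) {
    while (!cas_try_acquire(lock)) {
        /* busy-wait until the previous owner releases the lock */
    }
}

/* Releasing is the same for both variants: a plain store with release order. */
static inline void lock_release(atomic_uint *lock) {
    atomic_store_explicit(lock, 0u, memory_order_release);
}

/* Single-threaded sanity check of the lock protocol (not a concurrency test). */
static inline void lock_selftest(void) {
    atomic_uint lock = 0u;
    assert(amo_try_acquire(&lock));   /* free lock can be taken     */
    assert(!cas_try_acquire(&lock));  /* taken lock cannot be taken */
    lock_release(&lock);
    assert(cas_try_acquire(&lock));   /* free again after release   */
}
```

A critical section would then be wrapped as `amo_acquire(&lock); /* ... */ lock_release(&lock);`, with the lock word placed in memory that both cores can access.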

⚠️ Remove Inter-Core Communication

Until now, the ICC mechanism was mainly used for booting the second core. The ICC queues were too small for efficient communication between the cores (only 4 entries). Larger queues would cost significantly more hardware and also affect the critical path of the processor, as they are mapped directly into the CSR logic of the core. In addition, since the queues are ultimately only FIFOs, they only provide serial access to the data. However, the nice thing about the ICC was its low latency.

Communication via shared memory, on the other hand, is much more flexible. For example, C's _Atomic variables can be used to build various communication structures.
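One such structure is a single-producer/single-consumer mailbox in shared memory, which can stand in for a hardware FIFO link while allowing an arbitrary depth. The following is only a sketch: the type `mbox_t`, the functions `mbox_send`/`mbox_recv` and the depth `MBOX_SLOTS` are hypothetical names chosen for illustration, and it assumes exactly one core sends and exactly one core receives per mailbox.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 16u /* hypothetical depth; must be a power of two */
static_assert((MBOX_SLOTS & (MBOX_SLOTS - 1u)) == 0u,
              "MBOX_SLOTS must be a power of two");

typedef struct {
    _Atomic uint32_t head;     /* free-running write index, advanced by the sender   */
    _Atomic uint32_t tail;     /* free-running read index, advanced by the receiver  */
    uint32_t data[MBOX_SLOTS]; /* message slots                                      */
} mbox_t;

/* Sender side: returns false if the mailbox is full. */
static inline bool mbox_send(mbox_t *m, uint32_t msg) {
    uint32_t head = atomic_load_explicit(&m->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_acquire);
    if (head - tail == MBOX_SLOTS) {
        return false; /* all slots in use */
    }
    m->data[head % MBOX_SLOTS] = msg;
    /* release: the slot write becomes visible before the new head */
    atomic_store_explicit(&m->head, head + 1u, memory_order_release);
    return true;
}

/* Receiver side: returns false if the mailbox is empty. */
static inline bool mbox_recv(mbox_t *m, uint32_t *msg) {
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&m->head, memory_order_acquire);
    if (head == tail) {
        return false; /* nothing to read */
    }
    *msg = m->data[tail % MBOX_SLOTS];
    /* release: the slot read completes before the slot is handed back */
    atomic_store_explicit(&m->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index is written by only one side, plain atomic loads and stores are enough here and no lock is needed. Unlike a hardware FIFO, the slots live in ordinary memory, so the depth and the message type can be changed freely.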

@stnolting stnolting self-assigned this May 16, 2025
@stnolting stnolting added HW Hardware-related cleanup Clean-up the codebase labels May 16, 2025
@stnolting stnolting marked this pull request as ready for review May 17, 2025 08:59
@stnolting stnolting merged commit 22e6ee4 into main May 17, 2025
7 checks passed
@stnolting stnolting deleted the ipc_cleanup branch May 17, 2025 10:29
@NikLeberg
Collaborator

Hey @stnolting
The last few changes focused on shortening the core's critical path. I like it! Do you have a ballpark number for the speed gained? I.e. what frequencies are you able to achieve on your systems?

@stnolting
Owner Author

This is not so easy to answer, as the critical path always changes a little with each new placement. But without the FIFO links between the cores, the synthesizer can apparently pack the cores more densely.

I.e. what frequencies are you able to achieve on your systems?

I have a vintage Cyclone IV that runs at 120 MHz. My Artix7-35 is running at 150 MHz. Both setups use the dual-core configuration and almost all optional ISA extensions and peripheral modules.
