Commit bdb1bbc

Merge pull request #32 from riscv-non-isa/Issue-23
Reworked multi-memory-access data trace requirements - Issue-23
2 parents 3ce8023 + 33c1873 commit bdb1bbc

1 file changed (+30 -67 lines changed)

src/hti.adoc

Lines changed: 30 additions & 67 deletions
@@ -586,66 +586,6 @@ been returned to the hart from the memory system.
 
 The two parts of a split load are associated by use of a transaction ID.
 
-The Zc (code-size reduction) extension introduced push and pop
-instructions (_cm.push_, _cm.pop_, _cm.popret_ and _cm.popretz_) that
-each result in multiple loads or stores. To allow the resulting loads or
-stores to be associated with the correct instruction, these
-multi-memory-access instructions (and any other future instructions with
-similar characteristics) must be reported on the instruction trace
-interface multiple times (once for each individual load or store) using
-*itype* 0 except for the final load or store, which must retire using
-the natural *itype* for the instruction (for example, a _cm.popret_
-instruction must use *itype* 13 for the final load to signal the
-return). The instruction address reported will be the same for each
-occurrence.
-
-The following illustrations show the retirement sequences when a single
-_cm.push_ or _cm.popret_ is used to push or pop 4 registers from the
-stack. They assume a RISC-V to encoder interface that can report a block
-of 1 or more retired instructions and one load or store per cycle. Each
-comprises 4 elements, and shows the instruction information reported for
-each load and store. As detailed in section
-<<sec:InstructionTraceInterface>>, this takes the form of the address
-of an instruction, the length of the block (1 for a single instruction)
-and the type of the last instruction in the block. In each element,
-’Block’ indicates a block of 1 or more instructions (i.e. could also be
-a single instruction), whereas ’Single’ indicates a single instruction
-(i.e. a block with a length of 1).
-
-A _cm.push_ is equivalent to 4 store instructions:
-
-. Block - last instruction is _cm.push_, *itype* 0 (data trace interface
-reports 1st store);
-. Single - _cm.push_, *itype* 0 (data trace interface reports 2nd
-store);
-. Single - _cm.push_, *itype* 0 (data trace interface reports 3rd
-store);
-. Block - 1st instruction is _cm.push_, *itype* dependent on last
-instruction in block (data trace interface reports 4th store);
-
-A _cm.popret_ is equivalent to 4 loads and a return:
-
-. Block - last instruction is _cm.popret_, *itype* 0 (data trace
-interface reports 1st load);
-. Single - _cm.popret_, *itype* 0 (data trace interface reports 2nd
-load);
-. Single - _cm.popret_, *itype* 0 (data trace interface reports 3rd
-load);
-. Single - _cm.popret_, *itype* 13 (data trace interface reports 4th
-load);
-
-If an exception occurs part way through the sequence of loads or stores
-initiated by such an instruction, and the instruction is re-executed
-after the exception handler has been serviced, the load or store
-sequence must recommence from the beginning.
-
-[NOTE]
-====
-This is required for data trace only. If data trace is not
-implemented, the push or pop may instead be reported just once in the
-normal way when all associated loads or stores complete successfully.
-====
-
 [[sec:DataTraceInterface]]
 === Data Trace Interface
 
@@ -695,7 +635,7 @@ is high
 |*lid*[_lrid_width_p_-1:0] | S | Split Load ID. Valid when *lresp* is
 non-zero
 |*sdata*[_sdata_width_p_-1:0] | S | Store data. Valid when *dretire* is
-high and access is a store (*dtype* is 1) or atomic (*dtype* is 8 - 14).
+high and access is a store (*dtype* is 1 or 3) or atomic (*dtype* is 8 - 14).
 |*ldata*[_ldata_width_p_-1:0] | S | Load data. Valid when *lresp* is
 non-zero
 |===
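The revised *sdata* validity condition, read together with the *dtype* values this commit assigns to multi-memory-access loads (2) and stores (3), can be expressed as a small predicate. The following Python sketch is illustrative only: the constant and function names are hypothetical, *dretire* is modelled as a single bit, and *dtype* values not shown in this diff (such as 7 and 15) are not modelled.

```python
# Hypothetical names; dtype values taken from the dtype table in this commit.
DTYPE_LOAD = 0
DTYPE_STORE = 1
DTYPE_MULTI_LOAD = 2   # multi-memory-access load (previously reserved)
DTYPE_MULTI_STORE = 3  # multi-memory-access store (previously reserved)
DTYPE_CSR_RW = 4
DTYPE_CSR_RS = 5
DTYPE_CSR_RC = 6
ATOMIC_DTYPES = range(8, 15)  # atomics are dtype 8 - 14 inclusive


def is_multi_access(dtype: int) -> bool:
    """True if the access was made by a multi-memory-access instruction."""
    return dtype in (DTYPE_MULTI_LOAD, DTYPE_MULTI_STORE)


def sdata_valid(dretire: bool, dtype: int) -> bool:
    """sdata is valid when dretire is high and the access is a store
    (dtype 1 or 3) or an atomic (dtype 8 - 14)."""
    is_store = dtype in (DTYPE_STORE, DTYPE_MULTI_STORE)
    return dretire and (is_store or dtype in ATOMIC_DTYPES)


if __name__ == "__main__":
    print(sdata_valid(True, DTYPE_MULTI_STORE))  # True
    print(sdata_valid(True, DTYPE_MULTI_LOAD))   # False
```

A multi-memory-access load (dtype 2), like an ordinary load, carries no store data, so only dtype 3 is added to the store case.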
@@ -709,8 +649,8 @@ non-zero
 | *Value* | *Description*
 |0| Load
 |1 | Store
-|2 | reserved
-|3 | reserved
+|2 | Multi-memory-access Load
+|3 | Multi-memory-access Store
 |4 | CSR read-write
 |5 | CSR read-set
 |6 | CSR read-clear
@@ -726,8 +666,7 @@ non-zero
 |===
 
 The maximum value of _dtype_width_p_ is 4. However, if only loads and
-stores are supported, _dtype_width_p_ can be 1. If CSRs are supported
-but atomics are not, _dtype_width_p_ can be 3.
+stores are supported, _dtype_width_p_ can be 1. If multi-memory-access instructions are also supported, _dtype_width_p_ can be 2. If CSRs are supported but atomics are not, _dtype_width_p_ can be 3.
 
 Atomic and CSR accesses have either both load and store data, or store
 data and an operand. For CSRs and unified atomics, both values are
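The _dtype_width_p_ guidance above follows from the largest *dtype* value an implementation can emit: the field only needs enough bits to encode that value. A minimal Python sketch of the arithmetic, assuming the value ranges in this commit's *dtype* table (loads and stores 0 - 1, multi-memory-access 2 - 3, CSRs 4 - 6, atomics 8 - 14). The function name is hypothetical.

```python
def min_dtype_width(multi_access: bool = False, csrs: bool = False,
                    atomics: bool = False) -> int:
    """Smallest dtype_width_p able to encode every dtype value in use."""
    highest = 1                     # load (0) and store (1) are always present
    if multi_access:
        highest = max(highest, 3)   # multi-memory-access load/store: 2, 3
    if csrs:
        highest = max(highest, 6)   # CSR read-write/set/clear: 4 - 6
    if atomics:
        highest = max(highest, 14)  # atomics: 8 - 14
    return highest.bit_length()     # number of bits needed to hold `highest`


if __name__ == "__main__":
    print(min_dtype_width())                              # 1
    print(min_dtype_width(multi_access=True))             # 2
    print(min_dtype_width(multi_access=True, csrs=True))  # 3
    print(min_dtype_width(csrs=True, atomics=True))       # 4
```

The last case reproduces the stated maximum width of 4.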
@@ -769,8 +708,8 @@ However, that is not the case for data trace. Consider two scenarios:
 * Case 2: 1st block contains instructions 1, 2; second block contains 3,
 4, 5
 
-Given that *iretire* is non-zero in the same cycle that the data access
-retires, the encoder knows the address of the 1st and last instructions
+The signals from the <<sec:InstructionTraceInterface>> that provide details about the instruction block containing a data access instruction must be valid when *dretire* is active.
+Given this, the encoder knows the address of the 1st and last instructions
 in a block, but does not know precisely where in the block the data
 access is. In both cases, the first block matches the filtering criteria
 (it contains the address of instruction 1), and the second block does
@@ -796,3 +735,27 @@ access is associated with, so the encoder knows which block address to
 combine with the LSBs in order to construct the actual data access
 instruction address. 1 bit for 2 blocks per cycle, 2 bits for 4, and so
 on.
+
+==== Multi-memory-access Instructions
+
+[NOTE]
+====
+This section introduces a normative change from the behavior described in the E-Trace specification. The change is necessary because the original behavior would result in incorrect N-Trace instruction trace. It could also result in misleading E-Trace instruction trace when one of these instructions traps part way through.
+
+The previous requirement to report multi-memory-access instructions as retired multiple times (once for each load or store) is withdrawn. They should be reported as retired only once, in the same way as any other instruction.
+====
+The Zcmp and Vector extensions include instructions that result in multiple loads or stores, and future extensions may add other instructions with similar characteristics. It is not practical for the hart to provide information about all the loads or stores initiated by such an instruction simultaneously when the instruction retires; instead, the information must be provided for each load or store in turn as it completes. Load/store information for these instructions is therefore provided to the encoder before the instruction itself retires, which means that *dretire* will be active in cycles where *iretire* is not. Consequently, the signals from the <<sec:InstructionTraceInterface>> that provide details about the instruction, which are normally defined to be valid when *iretire* is non-zero, must also be valid when *dretire* is 1.
+
+Multi-memory-access instructions may be interrupted part-way through their access sequence, or a trap may occur on one of the accesses. Either case can make it difficult to correlate the data and instruction trace streams. Consider the following sequence of events:
+
+* Instruction "X" retires;
+* A vector store that performs 4 stores starts executing;
+* 2 stores are completed and then a trap occurs on the 3rd store;
+* The trap handler includes at least one store;
+* On returning from the trap handler, the vector store instruction resumes, performs the remaining stores and then retires.
+
+The program flow decoded from the instruction trace stream will show a trap following instruction "X", with the vector store retiring after the return from the trap handler. The data trace stream, on the other hand, will contain the 2 vector stores that occurred before the trap, then the loads and stores from the trap handler, then the remaining vector stores. The order of the stores in the data trace stream therefore does not match the order in which the corresponding instructions retire in the instruction trace stream.
+
+To allow the decoder to resolve ordering issues such as this, and to correlate each load and store with the correct instruction, loads and stores from multi-memory-access instructions are identified by dedicated *dtype* encodings (2 and 3). On encountering these loads or stores in the data trace stream, the decoder must set them aside in a stack-like structure until it encounters a multi-memory-access load or store instruction in the instruction trace stream. It can then correlate the appropriate number of the most recent multi-memory-access loads or stores with that instruction.
+
+Some multi-memory-access instructions restart from the beginning following a trap. This complicates the correlation process, but the repeated loads or stores can be identified from their addresses. For example, a _cm.push_ instruction that performs 4 stores and is interrupted after the 2nd store will result in a sequence of 6 stores at addresses A, A+1, A, A+1, A+2, A+3. This can only happen when the multi-memory-access load or store instruction retires immediately after a trap return instruction (such as _mret_ or _sret_), and the number of repeated accesses can be determined from the addressing pattern.
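The stack-based correlation described above can be sketched as follows. This is a hypothetical Python model, not a reference decoder. It assumes that the decoder has already turned both trace streams into a single sequence of events in the order it processes them, that ordinary loads and stores are correlated by the usual mechanism (not modelled here), and that the decoder can work out how many accesses each multi-memory-access instruction made. Restarted instructions are not handled. The event stream models the vector-store scenario above, with the trap handler itself executing a two-store multi-memory-access instruction to show why the most recent records must be claimed first. All class names, function names, addresses and instruction names are illustrative.

```python
from dataclasses import dataclass, field

MULTI_ACCESS_DTYPES = {2, 3}  # multi-memory-access load / store


@dataclass
class DataRecord:
    dtype: int
    address: int


@dataclass
class Retirement:
    name: str
    n_accesses: int = 0  # accesses made, for multi-memory-access instructions only
    accesses: list = field(default_factory=list)


def correlate(events):
    """Attach set-aside multi-memory-access records to the instructions that made them."""
    pending = []  # stack of unclaimed multi-memory-access records, oldest first
    retired = []
    for ev in events:
        if isinstance(ev, DataRecord):
            if ev.dtype in MULTI_ACCESS_DTYPES:
                pending.append(ev)  # set aside until its instruction retires
            continue  # ordinary accesses: correlated by the normal mechanism
        if ev.n_accesses:
            if ev.n_accesses > len(pending):
                raise ValueError(f"{ev.name}: too few pending records")
            # Claim the n most recent records; slicing keeps their issue order.
            ev.accesses = pending[-ev.n_accesses:]
            del pending[-ev.n_accesses:]
        retired.append(ev)
    return retired


# Hypothetical event stream for the vector-store scenario described above.
events = [
    Retirement("X"),
    DataRecord(3, 0x100), DataRecord(3, 0x104),  # vector store: 1st, 2nd
    # trap on the 3rd store; the handler saves two registers with cm.push
    DataRecord(3, 0x200), DataRecord(3, 0x204),
    Retirement("cm.push", n_accesses=2),
    DataRecord(1, 0x300),                        # ordinary store in the handler
    Retirement("sw"),
    Retirement("mret"),
    DataRecord(3, 0x108), DataRecord(3, 0x10C),  # vector store resumes
    Retirement("vse32.v", n_accesses=4),
]

for r in correlate(events):
    print(r.name, [hex(a.address) for a in r.accesses])
```

Because the handler's _cm.push_ retires before the vector store resumes, it claims the two most recent records. The vector store's first two records stay on the stack until the vector store itself retires and claims all four.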

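For instructions that restart from the beginning, such as _cm.push_, the records left by an aborted attempt repeat the leading part of the completed access sequence. The Python sketch below shows one hypothetical way to discard them using addresses only, with the schematic addresses A, A+1, ... from the _cm.push_ example. It assumes the caller knows the number of accesses _n_ and whether the instruction retired immediately after a trap return. It could misclassify an older, unrelated pending record whose address happens to match. The function name is illustrative.

```python
def claim_accesses(pending, n, after_trap_return):
    """Claim the last n multi-memory-access addresses for a retiring instruction.

    pending: addresses of unclaimed records, oldest first.
    Returns (claimed, remaining) with aborted earlier attempts discarded.
    """
    if not 1 <= n <= len(pending):
        raise ValueError("n must be between 1 and len(pending)")
    claimed = pending[-n:]    # the final, complete access sequence
    remaining = pending[:-n]
    if after_trap_return:
        # An aborted attempt completed at most n - 1 accesses, repeats the
        # start of the completed sequence, and sits directly below it.
        # Strip such repeats; there may be more than one aborted attempt.
        stripped = True
        while stripped:
            stripped = False
            for k in range(min(n - 1, len(remaining)), 0, -1):
                if remaining[-k:] == claimed[:k]:
                    del remaining[-k:]
                    stripped = True
                    break
    return claimed, remaining


A = 0x100  # schematic base address, as in the cm.push example above
seq = [A, A + 1, A, A + 1, A + 2, A + 3]  # interrupted after the 2nd store
print(claim_accesses(seq, 4, after_trap_return=True))
```

Without the trap-return condition, the two stale records at A and A+1 would stay on the stack and be wrongly claimed by a later multi-memory-access instruction.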
0 commit comments
