If an exception occurs part way through the sequence of loads or stores
638
-
initiated by such an instruction, and the instruction is re-executed
639
-
after the exception handler has been serviced, the load or store
640
-
sequence must recommence from the beginning.
641
-
642
-
[NOTE]
643
-
====
644
-
This is required for data trace only. If data trace is not
645
-
implemented, the push or pop may instead be reported just once in the
646
-
normal way when all associated loads or stores complete successfully.
647
-
====
648
-
649
589
[[sec:DataTraceInterface]]
650
590
=== Data Trace Interface
651
591
@@ -695,7 +635,7 @@ is high
695
635
|*lid*[_lrid_width_p_-1:0] | S | Split Load ID. Valid when *lresp* is
696
636
non-zero
697
637
|*sdata*[_sdata_width_p_-1:0] | S | Store data. Valid when *dretire* is
698
-
high and access is a store (*dtype* is 1) or atomic (*dtype* is 8 - 14).
638
+
high and access is a store (*dtype* is 1 or 3) or atomic (*dtype* is 8 - 14).
699
639
|*ldata*[_ldata_width_p_-1:0] | S | Load data. Valid when *lresp* is
700
640
non-zero
701
641
|===
@@ -709,8 +649,8 @@ non-zero
709
649
| *Value* | *Description*
710
650
|0| Load
711
651
|1 | Store
712
-
|2 | reserved
713
-
|3 | reserved
652
+
|2 | Multi-memory-access Load
653
+
|3 | Multi-memory-access Store
714
654
|4 | CSR read-write
715
655
|5 | CSR read-set
716
656
|6 | CSR read-clear
@@ -726,8 +666,7 @@ non-zero
726
666
|===
727
667
728
668
The maximum value of _dtype_width_p_ is 4. However, if only loads and
729
-
stores are supported, _dtype_width_p_ can be 1. If CSRs are supported
730
-
but atomics are not, _dtype_width_p_ can be 3.
669
+
stores are supported, _dtype_width_p_ can be 1. If multi-memory-access instructions are also required, _dtype_width_p_ can be 2. If CSRs are supported but atomics are not, _dtype_width_p_ can be 3.
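
The width rule above can be sketched as a small helper. This is only an illustration: the function name and its boolean arguments are hypothetical and are not part of the specification. It assumes the *dtype* ranges shown in the table (0 - 1 loads and stores, 2 - 3 multi-memory-access, 4 upwards CSRs, 8 - 14 atomics).

```python
def min_dtype_width(multi_access: bool, csrs: bool, atomics: bool) -> int:
    """Smallest dtype_width_p that can encode every supported access type.

    Hypothetical helper: the argument names are illustrative only.
    """
    if atomics:        # dtype values 8 - 14 need all 4 bits
        return 4
    if csrs:           # CSR encodings start at 4, so 3 bits are needed
        return 3
    if multi_access:   # dtype values 2 and 3 need 2 bits
        return 2
    return 1           # plain load (0) and store (1) only


print(min_dtype_width(multi_access=True, csrs=False, atomics=False))
```

The checks are ordered from the widest requirement to the narrowest, so a hart that supports atomics always gets the maximum width regardless of the other options.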
731
670
732
671
Atomic and CSR accesses have either both load and store data, or store
733
672
data and an operand. For CSRs and unified atomics, both values are
@@ -769,8 +708,8 @@ However, that is not the case for data trace. Consider two scenarios:
769
708
* Case 2: 1st block contains instructions 1, 2; second block contains 3,
770
709
4, 5
771
710
772
-
Given that *iretire* is non-zero in the same cycle that the data access
773
-
retires, the encoder knows the address of the 1st and last instructions
711
+
The signals from the <<sec:InstructionTraceInterface>> that provide details about the instruction block containing a data access instruction must be valid when *dretire* is active.
712
+
Given this, the encoder knows the address of the 1st and last instructions
774
713
in a block, but does not know precisely where in the block the data
775
714
access is. In both cases, the first block matches the filtering criteria
776
715
(it contains the address of instruction 1), and the second block does
@@ -796,3 +735,27 @@ access is associated with, so the encoder knows which block address to
796
735
combine with the LSBs in order to construct the actual data access
797
736
instruction address. 1 bit for 2 blocks per cycle, 2 bits for 4, and so
798
737
on.
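
As a rough illustration of the "1 bit for 2 blocks per cycle, 2 bits for 4" rule, the number of block-index bits can be computed as below. The function name is hypothetical, and rounding up for block counts that are not a power of two is an assumption rather than something the text states.

```python
def block_index_bits(blocks_per_cycle: int) -> int:
    """Bits needed to say which retired block a data access belongs to.

    Hypothetical helper; rounds up for non-power-of-two counts (assumption).
    """
    if blocks_per_cycle < 1:
        raise ValueError("at least one block per cycle is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return (blocks_per_cycle - 1).bit_length()


for n in (1, 2, 4, 8):
    print(n, block_index_bits(n))
```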
738
+
739
+
==== Multi-memory-access Instructions
740
+
741
+
[NOTE]
742
+
====
743
+
This section introduces a normative change from the behavior described in the E-Trace spec. This is necessary because the original behavior would result in incorrect N-Trace instruction trace. It could also result in misleading E-Trace instruction trace when one of these instructions traps part way through.
744
+
745
+
The previous requirement to report multi-access instructions as retired multiple times (once for each load or store) is withdrawn. They should be reported as retired only once in the same way as any other instruction.
746
+
====
747
+
The Zcmp and Vector extensions include instructions that result in multiple loads or stores, and future extensions may add other instructions with similar characteristics. It is not practical for the hart to provide information about all the loads or stores initiated by such an instruction simultaneously when the instruction retires; it must be provided for each load or store in turn as it completes. This means that load/store information for these instructions is provided to the decoder before the instruction itself retires. Consequently, *dretire* will be active when *iretire* is not, which requires that the signals from the <<sec:InstructionTraceInterface>> that provide details about the instruction, and that are normally defined to be valid when *iretire* is non-zero, must also be valid when *dretire* is 1.
748
+
749
+
These multi-memory-access instructions may be interrupted part-way through their access sequence, or a trap may occur on one of the accesses. This can lead to difficulties correlating the data and instruction trace streams. Consider the following situation:
750
+
751
+
* Instruction "X" retires;
752
+
* A vector store that performs 4 stores starts executing;
753
+
* 2 stores are completed and then a trap occurs on the 3rd store;
754
+
* The trap handler includes at least one store;
755
+
* On returning from the trap handler, the vector store instruction resumes, performs the remaining stores and then retires.
756
+
757
+
The program flow decoded from the instruction trace stream will show that a trap occurred following instruction "X", with the vector store following the trap handler return. On the other hand, the data trace stream will have the 2 vector stores that occurred before the trap, then the stores and loads from the trap handler, then the remaining vector stores. It can be seen here that the order of the stores in the data trace stream does not match the order of store retirements in the instruction trace stream.
758
+
759
+
In order to allow the decoder to resolve ordering issues such as this, and correlate the loads and stores with the correct instruction, loads or stores from multi-memory-access instructions are identified using dedicated *dtype* encodings. On encountering these loads or stores in the data trace stream, the decoder must set these aside in a stack-like structure until a multi-access load or store instruction is encountered in the instruction trace stream. It can then correlate the appropriate number of the most recent multi-memory-access loads or stores with the instruction.
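
The set-aside-and-correlate behavior described above might look like the following decoder-side sketch. It is not a reference implementation: the class and method names are hypothetical, and it assumes the decoder can work out the number of accesses from the decoded instruction.

```python
from dataclasses import dataclass, field

MULTI_LOAD, MULTI_STORE = 2, 3  # dedicated dtype encodings from the table


@dataclass
class Access:
    dtype: int
    addr: int


@dataclass
class MultiAccessCorrelator:
    """Hypothetical decoder helper; names are illustrative only."""
    pending: list = field(default_factory=list)

    def on_data(self, access: Access) -> bool:
        """Set aside multi-memory-access loads/stores; return True if held."""
        if access.dtype in (MULTI_LOAD, MULTI_STORE):
            self.pending.append(access)
            return True
        return False  # ordinary accesses are correlated in the normal way

    def on_multi_access_instruction(self, count: int) -> list:
        """Claim the `count` most recent held accesses for this instruction."""
        if count > len(self.pending):
            raise ValueError("not enough pending multi-memory-access records")
        start = len(self.pending) - count
        taken = self.pending[start:]
        del self.pending[start:]
        return taken


# The vector store scenario: 2 stores, a trap handler store, 2 more stores.
c = MultiAccessCorrelator()
stream = [Access(3, 0x100), Access(3, 0x104), Access(1, 0x900),
          Access(3, 0x108), Access(3, 0x10C)]
held = [c.on_data(a) for a in stream]
stores = c.on_multi_access_instruction(4)
print([hex(a.addr) for a in stores])
```

The trap handler's ordinary store (*dtype* 1) is never held, so the four vector stores are still grouped with the vector store instruction once it appears in the instruction trace stream.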
760
+
761
+
Some multi-access instructions restart from the beginning following a trap, and this complicates the correlation process, but such repeat loads or stores can be identified from their addresses. For example, a _cm.push_ instruction that performs 4 stores interrupted after the 2nd store will result in a sequence of 6 stores at addresses A, A+1, A, A+1, A+2, A+3. This will only happen when the multi-access load/store retires immediately after a *ret, and the number of repeated accesses can be determined from the addressing pattern.
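
One way a decoder could derive the number of repeated accesses from the addressing pattern is sketched below. This is an assumption-laden illustration, not a method defined by the specification: the function name is hypothetical, and it assumes that a single attempt never accesses its first address twice, so each reappearance of that address marks a restart.

```python
def count_repeated_accesses(addrs: list) -> int:
    """Number of leading accesses that belong to abandoned attempts.

    Hypothetical helper: assumes the instruction restarts from its first
    address and that one attempt never touches that address twice.
    """
    if not addrs:
        return 0
    first = addrs[0]
    # The final, completed attempt begins at the last occurrence of `first`.
    return max(i for i, a in enumerate(addrs) if a == first)


A = 0x1000
# The cm.push example: 4 stores, interrupted after the 2nd store.
pattern = [A, A + 1, A, A + 1, A + 2, A + 3]
repeats = count_repeated_accesses(pattern)
print(repeats, len(pattern) - repeats)
```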