Updated documentation and test vectors

Added AesGenerator1R test
Added benchmark hints if large pages fail
2026-03-04 21:57:36 -05:00 · 2019-06-22 17:42:26 +02:00
parent 91cd35ff13
commit 8282413154
5 changed files with 67 additions and 23 deletions
--- a/doc/design.md
+++ b/doc/design.md
@@ -157,7 +157,11 @@ The IADD_RS instruction utilizes the address calculation logic of CPUs and can b

 Because integer division is not fully pipelined in CPUs and can be made faster in ASICs, the IMUL_RCP instruction requires only one division per program to calculate the reciprocal. This forces an ASIC to include a hardware divider without giving it a performance advantage during program execution.

-#### 2.4.3 ISWAP_R
+#### 2.4.3 IROR_R/IROL_R
+
+Rotation instructions are split between rotate right and rotate left with a 4:1 ratio. Rotate right has a higher frequency because some architectures (like ARM) don't support rotate left natively (it must be emulated using rotate right).
+
+#### 2.4.4 ISWAP_R

 This instruction can be executed efficiently by CPUs that support register renaming/move elimination.
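
The note in section 2.4.3 about emulating rotate left with rotate right can be illustrated with a minimal sketch. It assumes 64-bit integer registers and a rotation count reduced modulo 64; the helper names `ror64` and `rol64_via_ror` are hypothetical and do not come from the RandomX sources.

```python
MASK64 = (1 << 64) - 1  # assumption: registers are 64 bits wide


def ror64(x: int, n: int) -> int:
    """Rotate the 64-bit value x right by n bits (n taken modulo 64)."""
    n &= 63
    return ((x >> n) | (x << (64 - n))) & MASK64


def rol64_via_ror(x: int, n: int) -> int:
    """Rotate left by n, expressed only through a rotate right.

    Rotating left by n is the same as rotating right by (64 - n) mod 64,
    which is how a target without a native rotate-left can emulate it.
    """
    return ror64(x, (64 - (n & 63)) & 63)


def rol64_direct(x: int, n: int) -> int:
    """Reference rotate left, used here only to cross-check the emulation."""
    n &= 63
    return ((x << n) | (x >> (64 - n))) & MASK64
```

The emulation costs one extra subtraction on the rotation count when the count is not known in advance, which is consistent with rotate right being given the higher frequency.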

--- a/doc/specs.md
+++ b/doc/specs.md
@@ -567,8 +567,8 @@ For integer instructions, the destination is always an integer register (registe
 |2/256|INEG_R|R|-|-|`dst = -dst`|
 |15/256|IXOR_R|R|R|`src = imm32`|`dst = dst ^ src`|
 |5/256|IXOR_M|R|R|`src = 0`|`dst = dst ^ [mem]`|
-|10/256|IROR_R|R|R|`src = imm32`|`dst = dst >>> src`|
-|0/256|IROL_R|R|R|`src = imm32`|`dst = dst <<< src`|
+|8/256|IROR_R|R|R|`src = imm32`|`dst = dst >>> src`|
+|2/256|IROL_R|R|R|`src = imm32`|`dst = dst <<< src`|
 |4/256|ISWAP_R|R|R|`src = dst`|`temp = src; src = dst; dst = temp`|

 #### 5.2.1 IADD_RS
@@ -616,13 +616,13 @@ All floating point operations are rounded according to the current value of the

 |frequency|instruction|dst|src|operation|
 |-|-|-|-|-|
-|8/256|FSWAP_R|F+E|-|`(dst0, dst1) = (dst1, dst0)`|
-|20/256|FADD_R|F|A|`(dst0, dst1) = (dst0 + src0, dst1 + src1)`|
+|4/256|FSWAP_R|F+E|-|`(dst0, dst1) = (dst1, dst0)`|
+|16/256|FADD_R|F|A|`(dst0, dst1) = (dst0 + src0, dst1 + src1)`|
 |5/256|FADD_M|F|R|`(dst0, dst1) = (dst0 + [mem][0], dst1 + [mem][1])`|
-|20/256|FSUB_R|F|A|`(dst0, dst1) = (dst0 - src0, dst1 - src1)`|
+|16/256|FSUB_R|F|A|`(dst0, dst1) = (dst0 - src0, dst1 - src1)`|
 |5/256|FSUB_M|F|R|`(dst0, dst1) = (dst0 - [mem][0], dst1 - [mem][1])`|
 |6/256|FSCAL_R|F|-|<code>(dst0, dst1) = (-2<sup>x0</sup> * dst0, -2<sup>x1</sup> * dst1)</code>|
-|20/256|FMUL_R|E|A|`(dst0, dst1) = (dst0 * src0, dst1 * src1)`|
+|32/256|FMUL_R|E|A|`(dst0, dst1) = (dst0 * src0, dst1 * src1)`|
 |4/256|FDIV_M|E|R|`(dst0, dst1) = (dst0 / [mem][0], dst1 / [mem][1])`|
 |6/256|FSQRT_R|E|-|`(dst0, dst1) = (√dst0, √dst1)`|
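
As a quick sanity check on the hunks above, the following sketch compares the old and new numerators (out of 256) for the six instructions whose frequency changed. It only uses values visible in this diff; the dictionary names `OLD` and `NEW` are illustrative, and the untouched rows are assumed to keep their frequencies.

```python
# Frequency numerators (out of 256) for the rows changed in doc/specs.md.
OLD = {"IROR_R": 10, "IROL_R": 0, "FSWAP_R": 8,
       "FADD_R": 20, "FSUB_R": 20, "FMUL_R": 20}
NEW = {"IROR_R": 8, "IROL_R": 2, "FSWAP_R": 4,
       "FADD_R": 16, "FSUB_R": 16, "FMUL_R": 32}

INTEGER = ("IROR_R", "IROL_R")
FLOAT = ("FSWAP_R", "FADD_R", "FSUB_R", "FMUL_R")


def total(table: dict, names: tuple) -> int:
    """Sum the numerators of the named instructions."""
    return sum(table[name] for name in names)


# The rotation budget is redistributed, not enlarged.
integer_unchanged = total(OLD, INTEGER) == total(NEW, INTEGER)
# The same holds for the changed floating point rows.
float_unchanged = total(OLD, FLOAT) == total(NEW, FLOAT)
# Rotate right to rotate left ratio after the change.
rotation_ratio = NEW["IROR_R"] / NEW["IROL_R"]
```

Both group totals stay the same (10 for the rotations, 68 for the changed floating point rows), so the change shifts weight between instructions without altering the overall budget, and the rotation ratio matches the 4:1 split described in `doc/design.md`.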