Assembly Diffs

linux arm

Diffs are based on 2,230,531 contexts (825,130 MinOpts, 1,405,401 FullOpts).

MISSED contexts: 77,526 (3.36%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.linux.arm.checked.mch | 45,977 | 5,279 | 40,698 | 1,423 (3.00%) | 1,423 (3.00%) |
| benchmarks.run_pgo.linux.arm.checked.mch | 159,274 | 58,093 | 101,181 | 3,553 (2.18%) | 3,553 (2.18%) |
| benchmarks.run_tiered.linux.arm.checked.mch | 71,355 | 38,077 | 33,278 | 1,124 (1.55%) | 1,124 (1.55%) |
| coreclr_tests.run.linux.arm.checked.mch | 471,423 | 259,093 | 212,330 | 7,618 (1.59%) | 7,618 (1.59%) |
| libraries.crossgen2.linux.arm.checked.mch | 195,441 | 14 | 195,427 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm.checked.mch | 269,967 | 6 | 269,961 | 9,462 (3.39%) | 9,462 (3.39%) |
| libraries_tests.run.linux.arm.Release.mch | 708,260 | 442,850 | 265,410 | 17,521 (2.41%) | 17,521 (2.41%) |
| librariestestsnotieredcompilation.run.linux.arm.Release.mch | 272,764 | 21,565 | 251,199 | 35,091 (11.40%) | 35,091 (11.40%) |
| realworld.run.linux.arm.checked.mch | 36,070 | 153 | 35,917 | 1,734 (4.59%) | 1,734 (4.59%) |
| | 2,230,531 | 825,130 | 1,405,401 | 77,526 (3.36%) | 77,526 (3.36%) |
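
The missed percentages in the context table appear to be computed against diffed plus missed contexts, not against diffed contexts alone. A minimal Python sketch, assuming that reading (the helper name `missed_pct` is hypothetical and not part of the report tooling), reproduces the reported figures:

```python
# Rows copied from the linux arm context table: (diffed contexts, missed contexts).
ROWS = {
    "benchmarks.run": (45_977, 1_423),
    "benchmarks.run_pgo": (159_274, 3_553),
    "benchmarks.run_tiered": (71_355, 1_124),
    "coreclr_tests.run": (471_423, 7_618),
    "libraries.crossgen2": (195_441, 0),
    "libraries.pmi": (269_967, 9_462),
    "libraries_tests.run": (708_260, 17_521),
    "librariestestsnotieredcompilation.run": (272_764, 35_091),
    "realworld.run": (36_070, 1_734),
}


def missed_pct(diffed: int, missed: int) -> float:
    """Missed share of all contexts (diffed + missed), in percent.

    Assumed formula: it is inferred from the numbers, not taken from the tool.
    """
    total = diffed + missed
    return 100.0 * missed / total if total else 0.0


total_diffed = sum(d for d, _ in ROWS.values())
total_missed = sum(m for _, m in ROWS.values())

for name, (diffed, missed) in ROWS.items():
    print(f"{name}: {missed_pct(diffed, missed):.2f}%")
print(f"total: {total_diffed:,} diffed, {total_missed:,} missed, "
      f"{missed_pct(total_diffed, total_missed):.2f}%")
```

Dividing by diffed contexts alone would give about 3.48% for the total row, which does not match the reported 3.36%.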


windows x86

Diffs are based on 2,292,278 contexts (840,452 MinOpts, 1,451,826 FullOpts).

MISSED contexts: 6,850 (0.30%)

Overall (-3,338 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.windows.x86.checked.mch | 6,964,581 | -117 |
| benchmarks.run_pgo.windows.x86.checked.mch | 45,650,701 | -126 |
| benchmarks.run_tiered.windows.x86.checked.mch | 9,318,080 | -117 |
| coreclr_tests.run.windows.x86.checked.mch | 308,781,934 | -672 |
| libraries.pmi.windows.x86.checked.mch | 48,079,459 | -565 |
| libraries_tests.run.windows.x86.Release.mch | 187,430,449 | -896 |
| librariestestsnotieredcompilation.run.windows.x86.Release.mch | 102,689,706 | -845 |

FullOpts (-3,338 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.windows.x86.checked.mch | 6,964,302 | -117 |
| benchmarks.run_pgo.windows.x86.checked.mch | 39,037,822 | -126 |
| benchmarks.run_tiered.windows.x86.checked.mch | 5,050,641 | -117 |
| coreclr_tests.run.windows.x86.checked.mch | 107,087,629 | -672 |
| libraries.pmi.windows.x86.checked.mch | 47,984,145 | -565 |
| libraries_tests.run.windows.x86.Release.mch | 89,274,018 | -896 |
| librariestestsnotieredcompilation.run.windows.x86.Release.mch | 94,019,998 | -845 |
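
As a quick consistency check, the per-collection deltas should sum to the overall figure. The Overall and FullOpts deltas are identical, which suggests that MinOpts code is unchanged. A small Python sketch using only the numbers from the two windows x86 size tables (the variable names are illustrative, and reading the base-size difference as the MinOpts share is an assumption):

```python
# (overall base size, FullOpts base size, diff) in bytes, copied from the windows x86 tables.
SIZES = {
    "benchmarks.run": (6_964_581, 6_964_302, -117),
    "benchmarks.run_pgo": (45_650_701, 39_037_822, -126),
    "benchmarks.run_tiered": (9_318_080, 5_050_641, -117),
    "coreclr_tests.run": (308_781_934, 107_087_629, -672),
    "libraries.pmi": (48_079_459, 47_984_145, -565),
    "libraries_tests.run": (187_430_449, 89_274_018, -896),
    "librariestestsnotieredcompilation.run": (102_689_706, 94_019_998, -845),
}

# Sum of the per-collection deltas; the report states -3,338 bytes overall.
total_diff = sum(diff for _, _, diff in SIZES.values())
print(f"total diff: {total_diff:,} bytes")

for name, (overall, fullopts, diff) in SIZES.items():
    # Assumption: overall base = MinOpts base + FullOpts base.
    minopts_base = overall - fullopts
    rel = 100.0 * diff / fullopts
    print(f"{name}: {rel:+.4f}% of FullOpts code, MinOpts base {minopts_base:,} bytes")
```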

Example diffs

benchmarks.run.windows.x86.checked.mch

-117 (-11.17%) : 22326.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -9,23 +9,23 @@ ; Final local variable assignments ; ; V00 arg0 [V00,T13] ( 4, 4 ) byref -> edi single-def
-; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0xB4] single-def
+; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0x74] single-def
; V02 arg2 [V02,T14] ( 3, 3 ) int -> [ebp+0x10] single-def ; V03 arg3 [V03,T15] ( 2, 2 ) struct ( 8) [ebp+0x08] do-not-enreg[S] single-def <System.ReadOnlySpan`1[ushort]>
-; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0xB8] spill-single-def
+; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0x78] spill-single-def
; V05 loc1 [V05,T00] ( 19, 93.50) byref -> ebx ; V06 loc2 [V06,T30] ( 5, 10 ) simd16 -> [ebp-0x1C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V07 loc3 [V07,T31] ( 5, 10 ) simd16 -> [ebp-0x2C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]>
-; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0xBC] single-def
+; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0x7C] single-def
; V09 loc5 [V09,T32] ( 3, 8.50) simd32 -> [ebp-0x4C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V10 loc6 [V10,T33] ( 3, 8.50) simd32 -> [ebp-0x6C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0xC0] spill-single-def
+; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0x80] spill-single-def
; V12 loc8 [V12,T20] ( 4, 14 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V13 loc9 [V13,T01] ( 5, 66 ) int -> esi ; V14 loc10 [V14,T07] ( 3, 32.50) byref -> edi ; V15 loc11 [V15,T21] ( 4, 14 ) simd16 -> mm2 <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V16 loc12 [V16,T02] ( 5, 66 ) int -> esi
-; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0xC4] spill-single-def
+; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0x84] spill-single-def
;* V18 tmp0 [V18 ] ( 0, 0 ) int -> zero-ref ;* V19 tmp1 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" ;* V20 tmp2 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" @@ -39,10 +39,10 @@ ; V28 tmp10 [V28,T23] ( 3, 12 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V29 tmp11 [V29,T24] ( 3, 12 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V30 tmp12 [V30,T25] ( 3, 12 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> [ebp-0x8C] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V32 tmp14 [V32,T35] ( 2, 8 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V33 tmp15 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> [ebp-0xAC] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V35 tmp17 [V35,T16] ( 4, 16 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V36 tmp18 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V37 tmp19 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -99,10 +99,10 @@ ;* V88 tmp70 [V88 ] ( 0, 0 ) int -> zero-ref "field V80._length (fldOffset=0x4)" P-INDEP ;* V89 tmp71 [V89 ] ( 0, 0 ) byref -> zero-ref "field V82._reference (fldOffset=0x0)" P-INDEP ;* V90 tmp72 [V90 ] ( 0, 0 ) int -> zero-ref "field V82._length (fldOffset=0x4)" P-INDEP
-; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0xC8] spill-single-def "V03.[000..004)"
-; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0xB0] spill-single-def "V03.[004..008)"
+; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0x88] spill-single-def "V03.[000..004)"
+; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0x70] spill-single-def "V03.[004..008)"
;
-; Lcl frame size = 188
+; Lcl frame size = 124
G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp @@ -110,25 +110,25 @@ G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} push edi push esi push ebx
- sub esp, 188
+ sub esp, 124
vzeroupper mov edi, ecx ; byrRegs +[edi] mov esi, edx ; byrRegs +[esi] mov ebx, dword ptr [ebp+0x10]
- ;; size=22 bbWeight=1 PerfScore 7.00
+ ;; size=19 bbWeight=1 PerfScore 7.00
G_M48875_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C0 {esi edi}, byref, isz mov eax, bword ptr [ebp+0x08] ; byrRegs +[eax]
- mov bword ptr [ebp-0xC8], eax
+ mov bword ptr [ebp-0x88], eax
; GC ptr vars +{V91} mov edx, dword ptr [ebp+0x0C]
- mov dword ptr [ebp-0xB0], edx
- mov eax, bword ptr [ebp-0xC8]
+ mov dword ptr [ebp-0x70], edx
+ mov eax, bword ptr [ebp-0x88]
cmp ebx, 16 jge SHORT G_M48875_IG04
- ;; size=29 bbWeight=1 PerfScore 6.25
+ ;; size=26 bbWeight=1 PerfScore 6.25
G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, gcvars, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -137,16 +137,16 @@ G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs= call [<unknown method>] ; gcrRegs -[ecx edx] ; byrRegs -[eax]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] ;; size=22 bbWeight=0.50 PerfScore 2.25 G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, byref mov dword ptr [ebp+0x10], ebx lea ecx, bword ptr [esi+2*ebx] ; byrRegs +[ecx]
- mov bword ptr [ebp-0xB8], ecx
+ mov bword ptr [ebp-0x78], ecx
; GC ptr vars +{V04}
- mov bword ptr [ebp-0xB4], esi
+ mov bword ptr [ebp-0x74], esi
; GC ptr vars +{V01} mov ebx, esi ; byrRegs +[ebx] @@ -156,7 +156,7 @@ G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {e vmovups xmmword ptr [ebp-0x2C], xmm1 cmp dword ptr [ebp+0x10], 32 jl G_M48875_IG17
- ;; size=49 bbWeight=1 PerfScore 16.75
+ ;; size=43 bbWeight=1 PerfScore 16.75
G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, gcvars, byref ; byrRegs -[esi edi] vmovaps ymm2, ymm0 @@ -167,9 +167,9 @@ G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc vmovups ymmword ptr [ebp-0x6C], ymm3 lea edi, bword ptr [ecx-0x40] ; byrRegs +[edi]
- mov bword ptr [ebp-0xC0], edi
+ mov bword ptr [ebp-0x80], edi
; GC ptr vars +{V11}
- ;; size=39 bbWeight=0.50 PerfScore 4.00
+ ;; size=36 bbWeight=0.50 PerfScore 4.00
G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref, isz ; byrRegs -[ecx edi] vmovups ymm4, ymmword ptr [ebx] @@ -184,41 +184,36 @@ G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, g vpand ymm5, ymm5, ymmword ptr [@RWD32] vmovups ymm7, ymmword ptr [@RWD64] vpshufb ymm5, ymm7, ymm5
- vmovups ymmword ptr [ebp-0xAC], ymm5
  vpand ymm6, ymm6, ymmword ptr [@RWD96]
  vpcmpub k1, ymm6, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm6, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm6, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm7, ymm5, ymm6, -54
- vpand ymm5, ymm7, ymmword ptr [ebp-0xAC]
+ vpblendmb ymm6 {k1}, ymm6, ymm7
+ vpand ymm5, ymm6, ymm5
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm5, ymm5, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
  vpxor ymm5, ymm5, ymm6
- vmovups ymmword ptr [ebp-0x8C], ymm5
  vpsrld ymm6, ymm4, 5
  vpand ymm6, ymm6, ymmword ptr [@RWD32]
  vmovups ymm7, ymmword ptr [@RWD64]
  vpshufb ymm6, ymm7, ymm6
  vpand ymm4, ymm4, ymmword ptr [@RWD96]
  vpcmpub k1, ymm4, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm4, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm4, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm5, ymm4, -54
- vpand ymm4, ymm7, ymm6
- vxorps ymm5, ymm5, ymm5
- vpcmpeqb ymm4, ymm4, ymm5
- vpcmpeqd ymm5, ymm5, ymm5
- vpxor ymm4, ymm4, ymm5
- vmovups ymm5, ymmword ptr [ebp-0x8C]
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
+ vxorps ymm6, ymm6, ymm6
+ vpcmpeqb ymm4, ymm4, ymm6
+ vpcmpeqd ymm6, ymm6, ymm6
+ vpxor ymm4, ymm4, ymm6
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
  je SHORT G_M48875_IG11
- ;; size=274 bbWeight=4 PerfScore 348.00
+ ;; size=232 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref vpermq ymm4, ymm4, -40 vpmovmskb esi, ymm4 @@ -229,7 +224,7 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { lea edi, bword ptr [ebx+2*edi] ; byrRegs +[edi] movzx ecx, word ptr [edi]
- push dword ptr [ebp-0xB0]
+ push dword ptr [ebp-0x70]
movsx edx, cx mov ecx, eax ; byrRegs +[ecx] @@ -239,19 +234,19 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { jne SHORT G_M48875_IG12 blsr esi, esi jne SHORT G_M48875_IG10
- ;; size=40 bbWeight=16 PerfScore 192.00
+ ;; size=37 bbWeight=16 PerfScore 192.00
G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[edi] add ebx, 64
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
; byrRegs +[edi] cmp ebx, edi
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] vmovups ymm2, ymmword ptr [ebp-0x4C] vmovups ymm3, ymmword ptr [ebp-0x6C] jbe G_M48875_IG06
- mov ecx, bword ptr [ebp-0xB8]
+ mov ecx, bword ptr [ebp-0x78]
; byrRegs +[ecx] cmp ebx, ecx je SHORT G_M48875_IG14 @@ -261,13 +256,13 @@ G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {e ; byrRegs -[esi] cmp esi, 32 jle SHORT G_M48875_IG16
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
mov ebx, edi jmp G_M48875_IG06
- ;; size=65 bbWeight=4 PerfScore 75.00
+ ;; size=56 bbWeight=4 PerfScore 75.00
G_M48875_IG10: ; bbWeight=8, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[eax ecx edi]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] jmp SHORT G_M48875_IG08 ;; size=8 bbWeight=8 PerfScore 24.00 @@ -279,10 +274,10 @@ G_M48875_IG12: ; bbWeight=0.50, gcVars=0000000000001000 {V01}, gcrefRegs= ; GC ptr vars -{V04 V11 V91} mov eax, edi ; byrRegs +[eax]
- sub eax, dword ptr [ebp-0xB4]
+ sub eax, dword ptr [ebp-0x74]
; byrRegs -[eax] shr eax, 1
- ;; size=10 bbWeight=0.50 PerfScore 1.38
+ ;; size=7 bbWeight=0.50 PerfScore 1.38
G_M48875_IG13: ; bbWeight=0.50, epilog, nogc, extend vzeroupper lea esp, [ebp-0x0C] @@ -317,10 +312,10 @@ G_M48875_IG16: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc G_M48875_IG17: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, byref lea edi, bword ptr [ecx-0x20] ...

benchmarks.run_pgo.windows.x86.checked.mch

-126 (-11.73%) : 94556.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

@@ -9,23 +9,23 @@ ; Final local variable assignments ; ; V00 arg0 [V00,T13] ( 4, 4 ) byref -> edi single-def
-; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0xB4] single-def
+; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0x74] single-def
; V02 arg2 [V02,T14] ( 3, 3 ) int -> [ebp+0x10] single-def ; V03 arg3 [V03,T15] ( 2, 2 ) struct ( 8) [ebp+0x08] do-not-enreg[S] single-def <System.ReadOnlySpan`1[ushort]>
-; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0xB8] spill-single-def
+; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0x78] spill-single-def
; V05 loc1 [V05,T00] ( 19, 93.50) byref -> ebx ; V06 loc2 [V06,T30] ( 5, 10 ) simd16 -> [ebp-0x1C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V07 loc3 [V07,T31] ( 5, 10 ) simd16 -> [ebp-0x2C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]>
-; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0xBC] single-def
+; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0x7C] single-def
; V09 loc5 [V09,T32] ( 3, 8.50) simd32 -> [ebp-0x4C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V10 loc6 [V10,T33] ( 3, 8.50) simd32 -> [ebp-0x6C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0xC0] spill-single-def
+; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0x80] spill-single-def
; V12 loc8 [V12,T20] ( 4, 14 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V13 loc9 [V13,T01] ( 5, 66 ) int -> esi ; V14 loc10 [V14,T07] ( 3, 32.50) byref -> edi ; V15 loc11 [V15,T21] ( 4, 14 ) simd16 -> mm2 <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V16 loc12 [V16,T02] ( 5, 66 ) int -> esi
-; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0xC4] spill-single-def
+; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0x84] spill-single-def
;* V18 tmp0 [V18 ] ( 0, 0 ) int -> zero-ref ;* V19 tmp1 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" ;* V20 tmp2 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" @@ -39,10 +39,10 @@ ; V28 tmp10 [V28,T23] ( 3, 12 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V29 tmp11 [V29,T24] ( 3, 12 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V30 tmp12 [V30,T25] ( 3, 12 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> [ebp-0x8C] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V32 tmp14 [V32,T35] ( 2, 8 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V33 tmp15 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> [ebp-0xAC] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V35 tmp17 [V35,T16] ( 4, 16 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V36 tmp18 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V37 tmp19 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -99,10 +99,10 @@ ;* V88 tmp70 [V88 ] ( 0, 0 ) int -> zero-ref "field V80._length (fldOffset=0x4)" P-INDEP ;* V89 tmp71 [V89 ] ( 0, 0 ) byref -> zero-ref "field V82._reference (fldOffset=0x0)" P-INDEP ;* V90 tmp72 [V90 ] ( 0, 0 ) int -> zero-ref "field V82._length (fldOffset=0x4)" P-INDEP
-; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0xC8] spill-single-def "V03.[000..004)"
-; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0xB0] spill-single-def "V03.[004..008)"
+; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0x88] spill-single-def "V03.[000..004)"
+; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0x70] spill-single-def "V03.[004..008)"
;
-; Lcl frame size = 188
+; Lcl frame size = 124
G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp @@ -110,31 +110,31 @@ G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} push edi push esi push ebx
- sub esp, 188
+ sub esp, 124
vzeroupper mov edi, ecx ; byrRegs +[edi] mov esi, edx ; byrRegs +[esi] mov ebx, dword ptr [ebp+0x10]
- ;; size=22 bbWeight=1 PerfScore 7.00
+ ;; size=19 bbWeight=1 PerfScore 7.00
G_M48875_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C0 {esi edi}, byref mov eax, bword ptr [ebp+0x08] ; byrRegs +[eax]
- mov bword ptr [ebp-0xC8], eax
+ mov bword ptr [ebp-0x88], eax
; GC ptr vars +{V91} mov ecx, dword ptr [ebp+0x0C]
- mov dword ptr [ebp-0xB0], ecx
+ mov dword ptr [ebp-0x70], ecx
cmp ebx, 16 jl G_M48875_IG25
- ;; size=27 bbWeight=1 PerfScore 5.25
+ ;; size=24 bbWeight=1 PerfScore 5.25
G_M48875_IG03: ; bbWeight=1, gcVars=0000000000000020 {V91}, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, gcvars, byref mov dword ptr [ebp+0x10], ebx lea edx, bword ptr [esi+2*ebx] ; byrRegs +[edx]
- mov bword ptr [ebp-0xB8], edx
+ mov bword ptr [ebp-0x78], edx
; GC ptr vars +{V04}
- mov bword ptr [ebp-0xB4], esi
+ mov bword ptr [ebp-0x74], esi
; GC ptr vars +{V01} mov ebx, esi ; byrRegs +[ebx] @@ -144,7 +144,7 @@ G_M48875_IG03: ; bbWeight=1, gcVars=0000000000000020 {V91}, gcrefRegs=000 vmovups xmmword ptr [ebp-0x2C], xmm1 cmp dword ptr [ebp+0x10], 32 jl G_M48875_IG16
- ;; size=49 bbWeight=1 PerfScore 16.75
+ ;; size=43 bbWeight=1 PerfScore 16.75
G_M48875_IG04: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gcrefRegs=00000000 {}, byrefRegs=0000000D {eax edx ebx}, gcvars, byref ; byrRegs -[esi edi] vmovaps ymm2, ymm0 @@ -155,10 +155,10 @@ G_M48875_IG04: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc vmovups ymmword ptr [ebp-0x6C], ymm3 lea edi, bword ptr [edx-0x40] ; byrRegs +[edi]
- mov bword ptr [ebp-0xC0], edi
+ mov bword ptr [ebp-0x80], edi
; GC ptr vars +{V11}
- ;; size=39 bbWeight=0.50 PerfScore 4.00
-G_M48875_IG05: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref
+ ;; size=36 bbWeight=0.50 PerfScore 4.00
+G_M48875_IG05: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref, isz
; byrRegs -[edx edi] vmovups ymm4, ymmword ptr [ebx] vmovups ymm5, ymmword ptr [ebx+0x20] @@ -172,41 +172,36 @@ G_M48875_IG05: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, g vpand ymm5, ymm5, ymmword ptr [@RWD32] vmovups ymm7, ymmword ptr [@RWD64] vpshufb ymm5, ymm7, ymm5
- vmovups ymmword ptr [ebp-0xAC], ymm5
  vpand ymm6, ymm6, ymmword ptr [@RWD96]
  vpcmpub k1, ymm6, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm6, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm6, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm7, ymm5, ymm6, -54
- vpand ymm5, ymm7, ymmword ptr [ebp-0xAC]
+ vpblendmb ymm6 {k1}, ymm6, ymm7
+ vpand ymm5, ymm6, ymm5
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm5, ymm5, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
  vpxor ymm5, ymm5, ymm6
- vmovups ymmword ptr [ebp-0x8C], ymm5
  vpsrld ymm6, ymm4, 5
  vpand ymm6, ymm6, ymmword ptr [@RWD32]
  vmovups ymm7, ymmword ptr [@RWD64]
  vpshufb ymm6, ymm7, ymm6
  vpand ymm4, ymm4, ymmword ptr [@RWD96]
  vpcmpub k1, ymm4, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm4, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm4, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm5, ymm4, -54
- vpand ymm4, ymm7, ymm6
- vxorps ymm5, ymm5, ymm5
- vpcmpeqb ymm4, ymm4, ymm5
- vpcmpeqd ymm5, ymm5, ymm5
- vpxor ymm4, ymm4, ymm5
- vmovups ymm5, ymmword ptr [ebp-0x8C]
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
+ vxorps ymm6, ymm6, ymm6
+ vpcmpeqb ymm4, ymm4, ymm6
+ vpcmpeqd ymm6, ymm6, ymm6
+ vpxor ymm4, ymm4, ymm6
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
- je G_M48875_IG10
- ;; size=278 bbWeight=4 PerfScore 348.00
+ je SHORT G_M48875_IG10
+ ;; size=232 bbWeight=4 PerfScore 313.33
G_M48875_IG06: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref vpermq ymm4, ymm4, -40 vpmovmskb esi, ymm4 @@ -231,16 +226,16 @@ G_M48875_IG07: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { G_M48875_IG08: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[edi] add ebx, 64
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
; byrRegs +[edi] cmp ebx, edi
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax]
- mov ecx, dword ptr [ebp-0xB0]
+ mov ecx, dword ptr [ebp-0x70]
vmovups ymm2, ymmword ptr [ebp-0x4C] vmovups ymm3, ymmword ptr [ebp-0x6C] jbe G_M48875_IG05
- mov edx, bword ptr [ebp-0xB8]
+ mov edx, bword ptr [ebp-0x78]
; byrRegs +[edx] cmp ebx, edx je SHORT G_M48875_IG13 @@ -250,17 +245,17 @@ G_M48875_IG08: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {e ; byrRegs -[esi] cmp esi, 32 jle SHORT G_M48875_IG15
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
mov ebx, edi jmp G_M48875_IG05
- ;; size=71 bbWeight=4 PerfScore 79.00
+ ;; size=59 bbWeight=4 PerfScore 79.00
G_M48875_IG09: ; bbWeight=8, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[eax edx edi]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax]
- mov ecx, dword ptr [ebp-0xB0]
+ mov ecx, dword ptr [ebp-0x70]
jmp SHORT G_M48875_IG07
- ;; size=14 bbWeight=8 PerfScore 32.00
+ ;; size=11 bbWeight=8 PerfScore 32.00
G_M48875_IG10: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref, isz jmp SHORT G_M48875_IG08 ;; size=2 bbWeight=2 PerfScore 4.00 @@ -269,10 +264,10 @@ G_M48875_IG11: ; bbWeight=0.50, gcVars=0000000000001000 {V01}, gcrefRegs= ; GC ptr vars -{V04 V11 V91} mov eax, edi ; byrRegs +[eax]
- sub eax, dword ptr [ebp-0xB4]
+ sub eax, dword ptr [ebp-0x74]
; byrRegs -[eax] shr eax, 1
- ;; size=10 bbWeight=0.50 PerfScore 1.38
+ ;; size=7 bbWeight=0.50 PerfScore 1.38
G_M48875_IG12: ; bbWeight=0.50, epilog, nogc, extend vzeroupper lea esp, [ebp-0x0C] @@ -307,9 +302,9 @@ G_M48875_IG15: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc G_M48875_IG16: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=0000000D {eax edx ebx}, byref lea edi, bword ptr [edx-0x20] ; byrRegs +[edi]
- mov bword ptr [ebp-0xBC], edi
+ mov bword ptr [ebp-0x7C], edi
; GC ptr vars +{V08}
- ;; size=9 bbWeight=0.50 PerfScore 0.75
+ ;; size=6 bbWeight=0.50 PerfScore 0.75
G_M48875_IG17: ; bbWeight=4, gcVars=0000000000001620 {V01 V04 V08 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref ; byrRegs -[edx edi] ; GC ptr vars -{V05 V09 V12} @@ -327,12 +322,11 @@ G_M48875_IG17: ; bbWeight=4, gcVars=0000000000001620 {V01 V04 V08 V91}, g vpshufb xmm3, xmm5, xmm3 vpand xmm4, xmm4, xmmword ptr [@RWD96] vpcmpub k1, xmm4, xmmword ptr [@RWD128], 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm4, xmmword ptr [@RWD160]
- vpshufb xmm6, xmm1, xmm6
...

benchmarks.run_tiered.windows.x86.checked.mch

-117 (-11.17%) : 44440.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

@@ -9,23 +9,23 @@ ; Final local variable assignments ; ; V00 arg0 [V00,T13] ( 4, 4 ) byref -> edi single-def
-; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0xB4] single-def
+; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0x74] single-def
; V02 arg2 [V02,T14] ( 3, 3 ) int -> [ebp+0x10] single-def ; V03 arg3 [V03,T15] ( 2, 2 ) struct ( 8) [ebp+0x08] do-not-enreg[S] single-def <System.ReadOnlySpan`1[ushort]>
-; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0xB8] spill-single-def
+; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0x78] spill-single-def
; V05 loc1 [V05,T00] ( 19, 93.50) byref -> ebx ; V06 loc2 [V06,T30] ( 5, 10 ) simd16 -> [ebp-0x1C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V07 loc3 [V07,T31] ( 5, 10 ) simd16 -> [ebp-0x2C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]>
-; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0xBC] single-def
+; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0x7C] single-def
; V09 loc5 [V09,T32] ( 3, 8.50) simd32 -> [ebp-0x4C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V10 loc6 [V10,T33] ( 3, 8.50) simd32 -> [ebp-0x6C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0xC0] spill-single-def
+; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0x80] spill-single-def
; V12 loc8 [V12,T20] ( 4, 14 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V13 loc9 [V13,T01] ( 5, 66 ) int -> esi ; V14 loc10 [V14,T07] ( 3, 32.50) byref -> edi ; V15 loc11 [V15,T21] ( 4, 14 ) simd16 -> mm2 <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V16 loc12 [V16,T02] ( 5, 66 ) int -> esi
-; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0xC4] spill-single-def
+; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0x84] spill-single-def
;* V18 tmp0 [V18 ] ( 0, 0 ) int -> zero-ref ;* V19 tmp1 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" ;* V20 tmp2 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" @@ -39,10 +39,10 @@ ; V28 tmp10 [V28,T23] ( 3, 12 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V29 tmp11 [V29,T24] ( 3, 12 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V30 tmp12 [V30,T25] ( 3, 12 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> [ebp-0x8C] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V32 tmp14 [V32,T35] ( 2, 8 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V33 tmp15 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> [ebp-0xAC] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V35 tmp17 [V35,T16] ( 4, 16 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V36 tmp18 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V37 tmp19 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -99,10 +99,10 @@ ;* V88 tmp70 [V88 ] ( 0, 0 ) int -> zero-ref "field V80._length (fldOffset=0x4)" P-INDEP ;* V89 tmp71 [V89 ] ( 0, 0 ) byref -> zero-ref "field V82._reference (fldOffset=0x0)" P-INDEP ;* V90 tmp72 [V90 ] ( 0, 0 ) int -> zero-ref "field V82._length (fldOffset=0x4)" P-INDEP
-; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0xC8] spill-single-def "V03.[000..004)"
-; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0xB0] spill-single-def "V03.[004..008)"
+; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0x88] spill-single-def "V03.[000..004)"
+; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0x70] spill-single-def "V03.[004..008)"
;
-; Lcl frame size = 188
+; Lcl frame size = 124
G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp @@ -110,25 +110,25 @@ G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} push edi push esi push ebx
- sub esp, 188
+ sub esp, 124
vzeroupper mov edi, ecx ; byrRegs +[edi] mov esi, edx ; byrRegs +[esi] mov ebx, dword ptr [ebp+0x10]
- ;; size=22 bbWeight=1 PerfScore 7.00
+ ;; size=19 bbWeight=1 PerfScore 7.00
G_M48875_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C0 {esi edi}, byref, isz mov eax, bword ptr [ebp+0x08] ; byrRegs +[eax]
- mov bword ptr [ebp-0xC8], eax
+ mov bword ptr [ebp-0x88], eax
; GC ptr vars +{V91} mov edx, dword ptr [ebp+0x0C]
- mov dword ptr [ebp-0xB0], edx
- mov eax, bword ptr [ebp-0xC8]
+ mov dword ptr [ebp-0x70], edx
+ mov eax, bword ptr [ebp-0x88]
cmp ebx, 16 jge SHORT G_M48875_IG04
- ;; size=29 bbWeight=1 PerfScore 6.25
+ ;; size=26 bbWeight=1 PerfScore 6.25
G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, gcvars, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -137,16 +137,16 @@ G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs= call [<unknown method>] ; gcrRegs -[ecx edx] ; byrRegs -[eax]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] ;; size=22 bbWeight=0.50 PerfScore 2.25 G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, byref mov dword ptr [ebp+0x10], ebx lea ecx, bword ptr [esi+2*ebx] ; byrRegs +[ecx]
- mov bword ptr [ebp-0xB8], ecx
+ mov bword ptr [ebp-0x78], ecx
; GC ptr vars +{V04}
- mov bword ptr [ebp-0xB4], esi
+ mov bword ptr [ebp-0x74], esi
; GC ptr vars +{V01} mov ebx, esi ; byrRegs +[ebx] @@ -156,7 +156,7 @@ G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {e vmovups xmmword ptr [ebp-0x2C], xmm1 cmp dword ptr [ebp+0x10], 32 jl G_M48875_IG17
- ;; size=49 bbWeight=1 PerfScore 16.75
+ ;; size=43 bbWeight=1 PerfScore 16.75
G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, gcvars, byref ; byrRegs -[esi edi] vmovaps ymm2, ymm0 @@ -167,9 +167,9 @@ G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc vmovups ymmword ptr [ebp-0x6C], ymm3 lea edi, bword ptr [ecx-0x40] ; byrRegs +[edi]
- mov bword ptr [ebp-0xC0], edi
+ mov bword ptr [ebp-0x80], edi
; GC ptr vars +{V11}
- ;; size=39 bbWeight=0.50 PerfScore 4.00
+ ;; size=36 bbWeight=0.50 PerfScore 4.00
G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref, isz ; byrRegs -[ecx edi] vmovups ymm4, ymmword ptr [ebx] @@ -184,41 +184,36 @@ G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, g vpand ymm5, ymm5, ymmword ptr [@RWD32] vmovups ymm7, ymmword ptr [@RWD64] vpshufb ymm5, ymm7, ymm5
- vmovups ymmword ptr [ebp-0xAC], ymm5
vpand ymm6, ymm6, ymmword ptr [@RWD96] vpcmpub k1, ymm6, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm6, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm6, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm7, ymm5, ymm6, -54
- vpand ymm5, ymm7, ymmword ptr [ebp-0xAC]
+ vpblendmb ymm6 {k1}, ymm6, ymm7
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 vpxor ymm5, ymm5, ymm6
- vmovups ymmword ptr [ebp-0x8C], ymm5
vpsrld ymm6, ymm4, 5 vpand ymm6, ymm6, ymmword ptr [@RWD32] vmovups ymm7, ymmword ptr [@RWD64] vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymmword ptr [@RWD96] vpcmpub k1, ymm4, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm4, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm4, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm5, ymm4, -54
- vpand ymm4, ymm7, ymm6
- vxorps ymm5, ymm5, ymm5
- vpcmpeqb ymm4, ymm4, ymm5
- vpcmpeqd ymm5, ymm5, ymm5
- vpxor ymm4, ymm4, ymm5
- vmovups ymm5, ymmword ptr [ebp-0x8C]
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
+ vxorps ymm6, ymm6, ymm6
+ vpcmpeqb ymm4, ymm4, ymm6
+ vpcmpeqd ymm6, ymm6, ymm6
+ vpxor ymm4, ymm4, ymm6
vpand ymm4, ymm5, ymm4 vptest ymm4, ymm4 je SHORT G_M48875_IG11
- ;; size=274 bbWeight=4 PerfScore 348.00
+ ;; size=232 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref vpermq ymm4, ymm4, -40 vpmovmskb esi, ymm4 @@ -229,7 +224,7 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { lea edi, bword ptr [ebx+2*edi] ; byrRegs +[edi] movzx ecx, word ptr [edi]
- push dword ptr [ebp-0xB0]
+ push dword ptr [ebp-0x70]
movsx edx, cx mov ecx, eax ; byrRegs +[ecx] @@ -239,19 +234,19 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { jne SHORT G_M48875_IG12 blsr esi, esi jne SHORT G_M48875_IG10
- ;; size=40 bbWeight=16 PerfScore 192.00
+ ;; size=37 bbWeight=16 PerfScore 192.00
G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[edi] add ebx, 64
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
; byrRegs +[edi] cmp ebx, edi
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] vmovups ymm2, ymmword ptr [ebp-0x4C] vmovups ymm3, ymmword ptr [ebp-0x6C] jbe G_M48875_IG06
- mov ecx, bword ptr [ebp-0xB8]
+ mov ecx, bword ptr [ebp-0x78]
; byrRegs +[ecx] cmp ebx, ecx je SHORT G_M48875_IG14 @@ -261,13 +256,13 @@ G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {e ; byrRegs -[esi] cmp esi, 32 jle SHORT G_M48875_IG16
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
mov ebx, edi jmp G_M48875_IG06
- ;; size=65 bbWeight=4 PerfScore 75.00
+ ;; size=56 bbWeight=4 PerfScore 75.00
G_M48875_IG10: ; bbWeight=8, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[eax ecx edi]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] jmp SHORT G_M48875_IG08 ;; size=8 bbWeight=8 PerfScore 24.00 @@ -279,10 +274,10 @@ G_M48875_IG12: ; bbWeight=0.50, gcVars=0000000000001000 {V01}, gcrefRegs= ; GC ptr vars -{V04 V11 V91} mov eax, edi ; byrRegs +[eax]
- sub eax, dword ptr [ebp-0xB4]
+ sub eax, dword ptr [ebp-0x74]
; byrRegs -[eax] shr eax, 1
- ;; size=10 bbWeight=0.50 PerfScore 1.38
+ ;; size=7 bbWeight=0.50 PerfScore 1.38
G_M48875_IG13: ; bbWeight=0.50, epilog, nogc, extend vzeroupper lea esp, [ebp-0x0C] @@ -317,10 +312,10 @@ G_M48875_IG16: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc G_M48875_IG17: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, byref lea edi, bword ptr [ecx-0x20] ...
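
Several blocks above shrink by multiples of 3 bytes even though only an `ebp`-relative offset changed. One plausible reading (an interpretation, not something the report states) is that offsets which moved into the signed 8-bit range now encode as a 1-byte displacement instead of a 4-byte one. This is consistent with the two 32-byte `ymm` spill slots (`[ebp-0xAC]`, `[ebp-0x8C]`) disappearing and every offset moving up by 0x40. A minimal Python sketch of that arithmetic; `disp_bytes` and `saving` are hypothetical helper names, not part of any JIT tooling:

```python
def disp_bytes(offset):
    """Displacement bytes for an [ebp+offset] operand in 32-bit x86 code.

    A signed 8-bit displacement covers -128..127; anything else needs 32 bits.
    """
    return 1 if -128 <= offset <= 127 else 4


def saving(old_offset, new_offset):
    """Bytes saved by one instruction when its frame offset changes."""
    return disp_bytes(old_offset) - disp_bytes(new_offset)


# (old, new) offset pairs copied from the IG09 lines of the diff above.
ig09_pairs = [(-0xC0, -0x80), (-0xC8, -0x88), (-0xB8, -0x78), (-0xC0, -0x80)]
ig09_saved = sum(saving(old, new) for old, new in ig09_pairs)

# (old, new) offset pairs copied from the IG04 lines of the diff above.
ig04_pairs = [(-0xB8, -0x78), (-0xB4, -0x74)]
ig04_saved = sum(saving(old, new) for old, new in ig04_pairs)
```

Under this assumption `ig09_saved` matches the reported IG09 change (size=65 to size=56) and `ig04_saved` matches IG04 (size=49 to size=43), while the `-0xC8` to `-0x88` moves save nothing because -0x88 is still outside the 8-bit range.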

coreclr_tests.run.windows.x86.checked.mch

-28 (-1.89%) : 469361.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)

@@ -289,10 +289,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpd k1, ymm2, ymmword ptr [ebp-0x28], 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -340,10 +339,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpd k1, ymm2, ymm1, 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=20 bbWeight=1 PerfScore 7.58
G_M1266_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -391,10 +389,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpd k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -442,10 +439,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpd k1, ymm1, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -600,6 +596,6 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1483, prolog size 26, PerfScore 1134.08, instruction count 335, allocated bytes for code 1483 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1455, prolog size 26, PerfScore 1129.42, instruction count 331, allocated bytes for code 1455 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
; ============================================================
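
The recurring rewrite in these diffs replaces `vpmovm2d` + `vpternlogd ..., -54` with a single `vpblendmd`. The immediate -54 is 0xCA as an unsigned byte, which is the truth table for "first operand ? second : third", so both sequences are a per-lane select. The following Python sketch checks that equivalence on small made-up inputs; all function names are hypothetical, and it models only the lane semantics, not the JIT's actual transformation.

```python
IMM = -54 & 0xFF  # the disassembly prints the immediate as a signed byte


def ternlog_bit(imm8, a, b, c):
    """Result bit of vpternlog: imm8 is indexed by the 3-bit value (a, b, c)."""
    return (imm8 >> ((a << 2) | (b << 1) | c)) & 1


def is_select(imm8):
    """True when imm8 encodes 'a ? b : c' for every input combination."""
    return all(
        ternlog_bit(imm8, a, b, c) == (b if a else c)
        for a in (0, 1) for b in (0, 1) for c in (0, 1)
    )


def ternlog_lane(dst, src1, src2, imm8, width=32):
    """Apply the ternary function bit by bit to one lane."""
    out = 0
    for i in range(width):
        a, b, c = (dst >> i) & 1, (src1 >> i) & 1, (src2 >> i) & 1
        out |= ternlog_bit(imm8, a, b, c) << i
    return out


def old_sequence(mask, x, y, width=32):
    """vpmovm2d dst, k ; vpternlogd dst, x, y, 0xCA."""
    all_ones = (1 << width) - 1
    expanded = [all_ones if m else 0 for m in mask]  # vpmovm2d
    return [ternlog_lane(d, xi, yi, IMM, width)
            for d, xi, yi in zip(expanded, x, y)]


def new_sequence(mask, y, x):
    """vpblendmd dst {k}, y, x: take x where the mask bit is set, else y."""
    return [xi if m else yi for m, xi, yi in zip(mask, x, y)]


mask = [1, 0, 1, 0]
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
same = old_sequence(mask, x, y) == new_sequence(mask, y, x)
```

Note the operand order: the blend takes the "mask clear" source first, which is why `ymm1` and `ymm2` appear swapped between the removed and added lines.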

-28 (-1.89%) : 207935.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)

@@ -289,10 +289,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpd k1, ymm2, ymmword ptr [ebp-0x28], 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -340,10 +339,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpd k1, ymm2, ymm1, 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=20 bbWeight=1 PerfScore 7.58
G_M1266_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -391,10 +389,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpd k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -442,10 +439,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M1266_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpd k1, ymm1, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -600,6 +596,6 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1483, prolog size 26, PerfScore 1134.08, instruction count 335, allocated bytes for code 1483 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
+; Total bytes of code 1455, prolog size 26, PerfScore 1129.42, instruction count 331, allocated bytes for code 1455 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
; ============================================================

-28 (-1.86%) : 207945.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)

@@ -291,10 +291,9 @@ G_M44299_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M44299_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpb k1, ymm0, ymmword ptr [ebp-0x28], 2
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M44299_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -342,10 +341,9 @@ G_M44299_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M44299_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpb k1, ymm0, ymm2, 2
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=20 bbWeight=1 PerfScore 9.25
G_M44299_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -393,10 +391,9 @@ G_M44299_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M44299_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpb k1, ymm0, ymmword ptr [ebp-0x28], 5
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M44299_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -444,10 +441,9 @@ G_M44299_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M44299_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpb k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M44299_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -602,6 +598,6 @@ G_M44299_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1503, prolog size 26, PerfScore 1296.58, instruction count 336, allocated bytes for code 1503 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)
+; Total bytes of code 1475, prolog size 26, PerfScore 1294.58, instruction count 332, allocated bytes for code 1475 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)
; ============================================================

-28 (-1.86%) : 207944.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)

@@ -291,10 +291,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M8563_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpw k1, ymm0, ymmword ptr [ebp-0x28], 2
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -342,10 +341,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M8563_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpw k1, ymm0, ymm2, 2
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=20 bbWeight=1 PerfScore 9.25
G_M8563_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -393,10 +391,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M8563_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpw k1, ymm0, ymmword ptr [ebp-0x28], 5
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -444,10 +441,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M8563_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpw k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -602,6 +598,6 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1503, prolog size 26, PerfScore 1296.58, instruction count 336, allocated bytes for code 1503 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)
+; Total bytes of code 1475, prolog size 26, PerfScore 1294.58, instruction count 332, allocated bytes for code 1475 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)
; ============================================================

-28 (-0.63%) : 207938.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)

@@ -699,9 +699,8 @@ G_M59915_IG22: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -710,7 +709,7 @@ G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG25
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG24: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -839,9 +838,8 @@ G_M59915_IG27: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x44], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -850,7 +848,7 @@ G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG30
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG29: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -979,9 +977,8 @@ G_M59915_IG32: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -990,7 +987,7 @@ G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG35
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG34: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1119,9 +1116,8 @@ G_M59915_IG37: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -1130,7 +1126,7 @@ G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG40
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG39: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1542,6 +1538,6 @@ G_M59915_IG53: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} ret 16 ;; size=11 bbWeight=1 PerfScore 4.50
-; Total bytes of code 4476, prolog size 32, PerfScore 917.83, instruction count 1037, allocated bytes for code 4476 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)
+; Total bytes of code 4448, prolog size 32, PerfScore 913.83, instruction count 1033, allocated bytes for code 4452 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)
; ============================================================
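
The headline figure on each diff (for example `-28 (-1.63%)`-style entries) can be recomputed from the two `Total bytes of code` lines. A small Python sketch, assuming the percentage is simply the byte delta over the base size rounded to two decimals; `size_delta` is a hypothetical helper, and the inputs are copied from the totals reported above:

```python
def size_delta(base_bytes, diff_bytes):
    """Return the byte delta and a headline-style string such as '-28 (-1.89%)'."""
    delta = diff_bytes - base_bytes
    percent = 100.0 * delta / base_bytes
    return delta, f"{delta} ({percent:.2f}%)"


# Base and diff totals for the int, byte and long VectorRelOp instantiations.
relop_int = size_delta(1483, 1455)
relop_byte = size_delta(1503, 1475)
relop_long = size_delta(4476, 4448)
```

Each method loses 7 bytes at four sites, which accounts for the identical 28-byte delta across instantiations; the percentage differs only because the base sizes differ.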

-28 (-0.63%) : 469362.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)

@@ -699,9 +699,8 @@ G_M59915_IG22: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -710,7 +709,7 @@ G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG25
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG24: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -839,9 +838,8 @@ G_M59915_IG27: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x44], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -850,7 +848,7 @@ G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG30
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG29: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -979,9 +977,8 @@ G_M59915_IG32: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -990,7 +987,7 @@ G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG35
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG34: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1119,9 +1116,8 @@ G_M59915_IG37: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x44]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [ebp-0x24]
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -1130,7 +1126,7 @@ G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG40
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=63 bbWeight=1 PerfScore 26.50
G_M59915_IG39: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1542,6 +1538,6 @@ G_M59915_IG53: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} ret 16 ;; size=11 bbWeight=1 PerfScore 4.50
-; Total bytes of code 4476, prolog size 32, PerfScore 917.83, instruction count 1037, allocated bytes for code 4476 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 4448, prolog size 32, PerfScore 913.83, instruction count 1033, allocated bytes for code 4452 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
; ============================================================

libraries.pmi.windows.x86.checked.mch

-21 (-20.59%) : 273965.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -16,7 +16,7 @@ ;* V05 loc2 [V05 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V09 tmp4 [V09 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; ; Lcl frame size = 0 @@ -31,23 +31,20 @@ G_M27576_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M27576_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {ecx}, byref ; byrRegs +[ecx] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm1, zmm0, -54
- vpcmpub k1, zmm0, zmm1, 6
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [ecx], zmm2
- ;; size=69 bbWeight=1 PerfScore 16.33
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm0, zmm1
+ vpcmpub k2, zmm0, zmm1, 6
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [ecx], zmm0
+ ;; size=48 bbWeight=1 PerfScore 14.83
G_M27576_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp ret 128 ;; size=7 bbWeight=1 PerfScore 3.50
-; Total bytes of code 102, prolog size 6, PerfScore 28.08, instruction count 19, allocated bytes for code 102 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 81, prolog size 6, PerfScore 26.58, instruction count 16, allocated bytes for code 81 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================

-21 (-20.59%) : 274022.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -16,7 +16,7 @@ ;* V05 loc2 [V05 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V09 tmp4 [V09 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; ; Lcl frame size = 0 @@ -31,23 +31,20 @@ G_M10214_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M10214_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {ecx}, byref ; byrRegs +[ecx] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm0, zmm1, -54
- vpcmpub k1, zmm0, zmm1, 1
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [ecx], zmm2
- ;; size=69 bbWeight=1 PerfScore 16.33
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm1, zmm0
+ vpcmpub k2, zmm0, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [ecx], zmm0
+ ;; size=48 bbWeight=1 PerfScore 14.83
G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp ret 128 ;; size=7 bbWeight=1 PerfScore 3.50
-; Total bytes of code 102, prolog size 6, PerfScore 28.08, instruction count 19, allocated bytes for code 102 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 81, prolog size 6, PerfScore 26.58, instruction count 16, allocated bytes for code 81 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================

-21 (-20.59%) : 273943.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T02] ( 4, 4 ) simd64 -> mm1 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V08 loc5 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument" @@ -31,23 +31,20 @@ G_M22834_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M22834_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {ecx}, byref ; byrRegs +[ecx] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm1, zmm0, -54
- vpcmpub k1, zmm0, zmm1, 6
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [ecx], zmm2
- ;; size=69 bbWeight=1 PerfScore 16.33
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm0, zmm1
+ vpcmpub k2, zmm0, zmm1, 6
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [ecx], zmm0
+ ;; size=48 bbWeight=1 PerfScore 14.83
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp ret 128 ;; size=7 bbWeight=1 PerfScore 3.50
-; Total bytes of code 102, prolog size 6, PerfScore 28.08, instruction count 19, allocated bytes for code 102 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 81, prolog size 6, PerfScore 26.58, instruction count 16, allocated bytes for code 81 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================

-7 (-5.47%) : 4816.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -35,20 +35,19 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e
        vpshufb ymm1, ymm2, ymm1
        vpand ymm0, ymm0, ymmword ptr [@RWD64]
        vpcmpub k1, ymm0, ymmword ptr [@RWD96], 6
-       vpmovm2b ymm2, k1
-       vpsubb ymm3, ymm0, ymmword ptr [@RWD128]
-       vmovups ymm4, ymmword ptr [ebp+0x28]
-       vpshufb ymm3, ymm4, ymm3
-       vmovups ymm4, ymmword ptr [ebp+0x48]
-       vpshufb ymm0, ymm4, ymm0
-       vpternlogd ymm2, ymm3, ymm0, -54
-       vpand ymm0, ymm2, ymm1
+       vpsubb ymm2, ymm0, ymmword ptr [@RWD128]
+       vmovups ymm3, ymmword ptr [ebp+0x28]
+       vpshufb ymm2, ymm3, ymm2
+       vmovups ymm3, ymmword ptr [ebp+0x48]
+       vpshufb ymm0, ymm3, ymm0
+       vpblendmb ymm0 {k1}, ymm0, ymm2
+       vpand ymm0, ymm0, ymm1
        vxorps ymm1, ymm1, ymm1
        vpcmpeqb ymm0, ymm0, ymm1
        vpcmpeqd ymm1, ymm1, ymm1
        vpxor ymm0, ymm0, ymm1
        vmovups ymmword ptr [ecx], ymm0
-       ;; size=110 bbWeight=1 PerfScore 35.50
+       ;; size=103 bbWeight=1 PerfScore 35.00
 G_M53822_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        pop ebp
@@ -61,6 +60,6 @@ RWD96 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD128 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 128, prolog size 6, PerfScore 45.25, instruction count 26, allocated bytes for code 128 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 121, prolog size 6, PerfScore 44.75, instruction count 25, allocated bytes for code 121 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================

-15 (-5.21%) : 4815.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -16,8 +16,8 @@
 ; V05 loc1 [V05,T05] ( 3, 3 ) simd32 -> mm3 <System.Runtime.Intrinsics.Vector256`1[ushort]>
 ; V06 loc2 [V06,T06] ( 3, 3 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ; V07 loc3 [V07,T07] ( 3, 3 ) simd32 -> mm2 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V08 loc4 [V08,T15] ( 2, 2 ) simd32 -> mm1 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V09 loc5 [V09,T16] ( 2, 2 ) simd32 -> mm0 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V08 loc4 [V08,T15] ( 2, 2 ) simd32 -> mm0 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V09 loc5 [V09,T16] ( 2, 2 ) simd32 -> mm1 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ;* V10 loc6 [V10 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ; V11 tmp1 [V11,T17] ( 2, 2 ) simd32 -> [ebp-0x20] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ; V12 tmp2 [V12,T02] ( 4, 4 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
@@ -26,7 +26,7 @@
 ;* V15 tmp5 [V15 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ;* V16 tmp6 [V16 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ;* V17 tmp7 [V17 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V18 tmp8 [V18,T18] ( 2, 2 ) simd32 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V18 tmp8 [V18,T18] ( 2, 2 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ; V19 tmp9 [V19,T03] ( 4, 4 ) simd32 -> mm2 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ;* V20 tmp10 [V20 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
 ;* V21 tmp11 [V21 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
@@ -36,16 +36,17 @@
 ; V25 cse1 [V25,T09] ( 3, 3 ) simd32 -> mm5 "CSE - moderate"
 ; V26 cse2 [V26,T10] ( 3, 3 ) simd32 -> mm6 "CSE - moderate"
 ; V27 cse3 [V27,T11] ( 3, 3 ) simd32 -> mm7 "CSE - moderate"
-; V28 cse4 [V28,T12] ( 3, 3 ) simd32 -> [ebp-0x40] spill-single-def "CSE - moderate"
+; V28 cse4 [V28,T12] ( 3, 3 ) simd32 -> mm3 "CSE - moderate"
 ;
-; Lcl frame size = 64
+; Lcl frame size = 32
 G_M59405_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
        push ebp
        mov ebp, esp
-       sub esp, 64
+       sub esp, 32
        vzeroupper
-       ;; size=9 bbWeight=1 PerfScore 2.50
+       vmovups ymm1, ymmword ptr [ebp+0x08]
+       ;; size=14 bbWeight=1 PerfScore 6.50
 G_M59405_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000006 {ecx edx}, byref
        ; byrRegs +[ecx edx]
        vmovups ymm2, ymmword ptr [edx]
@@ -67,40 +68,37 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000006 {e
        vpand ymm4, ymm4, ymm6
        vmovups ymm7, ymmword ptr [@RWD128]
        vpcmpub k1, ymm4, ymm7, 6
-       vpmovm2b ymm3, k1
-       vmovups ymm0, ymmword ptr [@RWD160]
-       vmovups ymmword ptr [ebp-0x40], ymm0
-       vpsubb ymm1, ymm4, ymm0
-       vmovups ymm0, ymmword ptr [ebp+0x08]
-       vpshufb ymm1, ymm0, ymm1
-       vmovups ymm0, ymmword ptr [ebp+0x28]
-       vpshufb ymm4, ymm0, ymm4
-       vpternlogd ymm3, ymm1, ymm4, -54
-       vpand ymm1, ymm3, ymmword ptr [ebp-0x20]
-       vxorps ymm3, ymm3, ymm3
-       vpcmpeqb ymm1, ymm1, ymm3
-       vpcmpeqd ymm3, ymm3, ymm3
-       vpxor ymm1, ymm1, ymm3
-       vpsrld ymm3, ymm2, 5
-       vpand ymm3, ymm3, ymm5
-       vmovups ymm4, ymmword ptr [@RWD64]
-       vpshufb ymm3, ymm4, ymm3
+       vmovups ymm3, ymmword ptr [@RWD160]
+       vpsubb ymm0, ymm4, ymm3
+       vmovups ymmword ptr [ebp+0x08], ymm1
+       vpshufb ymm0, ymm1, ymm0
+       vmovups ymm1, ymmword ptr [ebp+0x28]
+       vpshufb ymm4, ymm1, ymm4
+       vpblendmb ymm0 {k1}, ymm4, ymm0
+       vpand ymm0, ymm0, ymmword ptr [ebp-0x20]
+       vxorps ymm4, ymm4, ymm4
+       vpcmpeqb ymm0, ymm0, ymm4
+       vpcmpeqd ymm4, ymm4, ymm4
+       vpxor ymm0, ymm0, ymm4
+       vpsrld ymm4, ymm2, 5
+       vpand ymm4, ymm4, ymm5
+       vmovups ymm5, ymmword ptr [@RWD64]
+       vpshufb ymm4, ymm5, ymm4
        vpand ymm2, ymm2, ymm6
        vpcmpub k1, ymm2, ymm7, 6
-       vpmovm2b ymm4, k1
-       vpsubb ymm5, ymm2, ymmword ptr [ebp-0x40]
-       vmovups ymm6, ymmword ptr [ebp+0x08]
-       vpshufb ymm5, ymm6, ymm5
-       vpshufb ymm0, ymm0, ymm2
-       vpternlogd ymm4, ymm5, ymm0, -54
-       vpand ymm0, ymm4, ymm3
+       vpsubb ymm3, ymm2, ymm3
+       vmovups ymm5, ymmword ptr [ebp+0x08]
+       vpshufb ymm3, ymm5, ymm3
+       vpshufb ymm1, ymm1, ymm2
+       vpblendmb ymm1 {k1}, ymm1, ymm3
+       vpand ymm1, ymm1, ymm4
        vxorps ymm2, ymm2, ymm2
-       vpcmpeqb ymm0, ymm0, ymm2
+       vpcmpeqb ymm1, ymm1, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
-       vpxor ymm0, ymm0, ymm2
-       vpand ymm0, ymm1, ymm0
+       vpxor ymm1, ymm1, ymm2
+       vpand ymm0, ymm0, ymm1
        vmovups ymmword ptr [ecx], ymm0
-       ;; size=270 bbWeight=1 PerfScore 95.33
+       ;; size=250 bbWeight=1 PerfScore 88.67
 G_M59405_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        mov esp, ebp
@@ -115,6 +113,6 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 288, prolog size 9, PerfScore 101.58, instruction count 60, allocated bytes for code 288 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 273, prolog size 9, PerfScore 98.92, instruction count 58, allocated bytes for code 273 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================

-7 (-4.38%) : 273746.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -39,16 +39,15 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e
        vpandd zmm3, zmm4, dword ptr [@RWD128] {1to16}
        vpord zmm4, zmm3, dword ptr [@RWD132] {1to16}
        vptestnmd k1, zmm2, zmm2
-       vpmovm2d zmm2, k1
-       vpslld zmm5, zmm4, 1
-       vpternlogd zmm2, zmm4, zmm5, -54
+       vpslld zmm2, zmm4, 1
+       vpblendmd zmm2 {k1}, zmm2, zmm4
        vpslld zmm0, zmm0, 13
        vpandd zmm0, zmm0, dword ptr [@RWD136] {1to16}
        vpaddd zmm0, zmm0, zmm2
        vsubps zmm0, zmm0, zmm3
        vpord zmm0, zmm0, zmm1
        vmovups zmmword ptr [ecx], zmm0
-       ;; size=137 bbWeight=1 PerfScore 29.00
+       ;; size=130 bbWeight=1 PerfScore 28.00
 G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        pop ebp
@@ -64,6 +63,6 @@ RWD132 dd 38000000h
 RWD136 dd 0FFFE000h
-; Total bytes of code 160, prolog size 6, PerfScore 37.75, instruction count 26, allocated bytes for code 160 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 153, prolog size 6, PerfScore 36.75, instruction count 25, allocated bytes for code 153 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================

libraries_tests.run.windows.x86.Release.mch

-14 (-17.72%) : 370873.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,19 +34,17 @@ G_M10273_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e
        vpcmpeqd xmm2, xmm0, xmm1
        vxorps xmm3, xmm3, xmm3
        vpcmpud k1, xmm0, xmm3, 1
-       vpmovm2d xmm3, k1
-       vpternlogd xmm3, xmm0, xmm1, -54
+       vpblendmd xmm3 {k1}, xmm1, xmm0
        vpcmpud k1, xmm0, xmm1, 1
-       vpmovm2d xmm4, k1
-       vpternlogd xmm4, xmm0, xmm1, -54
-       vpternlogd xmm2, xmm3, xmm4, -54
+       vpblendmd xmm0 {k1}, xmm1, xmm0
+       vpternlogd xmm2, xmm3, xmm0, -54
        vmovups xmmword ptr [ecx], xmm2
-       ;; size=59 bbWeight=1 PerfScore 12.33
+       ;; size=45 bbWeight=1 PerfScore 10.00
 G_M10273_IG03: ; bbWeight=1, epilog, nogc, extend
        pop ebp
        ret 32
        ;; size=4 bbWeight=1 PerfScore 2.50
-; Total bytes of code 79, prolog size 6, PerfScore 23.08, instruction count 17, allocated bytes for code 79 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 65, prolog size 6, PerfScore 20.75, instruction count 15, allocated bytes for code 65 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================
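
The diffs in this report share one pattern: a `vpmovm2*` (mask to vector) followed by `vpternlogd ..., -54` is replaced by a single `vpblendm*` under the mask. The following is a minimal Python sketch of why the two forms agree. It assumes the usual reading of the immediate (-54 is 0xCA as an unsigned byte, which selects `a ? b : c` per bit) and of the blend operand order (`dst {k}, src1, src2` takes `src2` where the mask bit is set). The helpers `ternlog`, `mask_to_vector`, `blend`, `old_sequence`, and `new_sequence` are illustrative models only, not JIT or hardware APIs.

```python
from itertools import product


def ternlog(imm8: int, a: int, b: int, c: int, width: int = 32) -> int:
    """Model of vpternlogd on one lane: bit i of the result is imm8[(a_i<<2)|(b_i<<1)|c_i]."""
    imm8 &= 0xFF  # the disassembly prints the immediate as a signed byte: -54 == 0xCA
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> idx) & 1) << i
    return result


def mask_to_vector(k: int, lanes: int, lane_bits: int = 32) -> list[int]:
    """Model of vpmovm2d: each mask bit expands to an all-ones or all-zeros lane."""
    all_ones = (1 << lane_bits) - 1
    return [all_ones if (k >> i) & 1 else 0 for i in range(lanes)]


def blend(k: int, src1: list[int], src2: list[int]) -> list[int]:
    """Model of vpblendmd dst {k}, src1, src2: take src2 where the mask bit is set."""
    return [s2 if (k >> i) & 1 else s1 for i, (s1, s2) in enumerate(zip(src1, src2))]


def old_sequence(k: int, x0: list[int], x1: list[int]) -> list[int]:
    # vpmovm2d   t, k
    # vpternlogd t, x0, x1, -54   ; t = t ? x0 : x1
    t = mask_to_vector(k, len(x0))
    return [ternlog(-54, m, a, b) for m, a, b in zip(t, x0, x1)]


def new_sequence(k: int, x0: list[int], x1: list[int]) -> list[int]:
    # vpblendmd t {k}, x1, x0     ; t = k ? x0 : x1
    return blend(k, x1, x0)


x0 = [1, 2, 3, 4]
x1 = [10, 20, 30, 40]
# Check every 4-lane mask value, not just one example.
for bits in product([0, 1], repeat=4):
    k = sum(bit << i for i, bit in enumerate(bits))
    assert old_sequence(k, x0, x1) == new_sequence(k, x0, x1)

print(new_sequence(0b0101, x0, x1))  # [1, 20, 3, 40]
```

In this model the blend reads the mask register directly, so the intermediate all-ones/all-zeros vector is never materialized, which matches the instruction that disappears in each hunk.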

-14 (-17.72%) : 366898.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,19 +34,17 @@ G_M23551_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e
        vpcmpeqd xmm2, xmm0, xmm1
        vxorps xmm3, xmm3, xmm3
        vpcmpud k1, xmm0, xmm3, 1
-       vpmovm2d xmm3, k1
-       vpternlogd xmm3, xmm1, xmm0, -54
+       vpblendmd xmm3 {k1}, xmm0, xmm1
        vpcmpud k1, xmm0, xmm1, 6
-       vpmovm2d xmm4, k1
-       vpternlogd xmm4, xmm0, xmm1, -54
-       vpternlogd xmm2, xmm3, xmm4, -54
+       vpblendmd xmm0 {k1}, xmm1, xmm0
+       vpternlogd xmm2, xmm3, xmm0, -54
        vmovups xmmword ptr [ecx], xmm2
-       ;; size=59 bbWeight=1 PerfScore 12.33
+       ;; size=45 bbWeight=1 PerfScore 10.00
 G_M23551_IG03: ; bbWeight=1, epilog, nogc, extend
        pop ebp
        ret 32
        ;; size=4 bbWeight=1 PerfScore 2.50
-; Total bytes of code 79, prolog size 6, PerfScore 23.08, instruction count 17, allocated bytes for code 79 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 65, prolog size 6, PerfScore 20.75, instruction count 15, allocated bytes for code 65 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================

-14 (-17.72%) : 366792.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,19 +34,17 @@ G_M10273_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e
        vpcmpeqd xmm2, xmm0, xmm1
        vxorps xmm3, xmm3, xmm3
        vpcmpud k1, xmm0, xmm3, 1
-       vpmovm2d xmm3, k1
-       vpternlogd xmm3, xmm0, xmm1, -54
+       vpblendmd xmm3 {k1}, xmm1, xmm0
        vpcmpud k1, xmm0, xmm1, 1
-       vpmovm2d xmm4, k1
-       vpternlogd xmm4, xmm0, xmm1, -54
-       vpternlogd xmm2, xmm3, xmm4, -54
+       vpblendmd xmm0 {k1}, xmm1, xmm0
+       vpternlogd xmm2, xmm3, xmm0, -54
        vmovups xmmword ptr [ecx], xmm2
-       ;; size=59 bbWeight=1 PerfScore 12.33
+       ;; size=45 bbWeight=1 PerfScore 10.00
 G_M10273_IG03: ; bbWeight=1, epilog, nogc, extend
        pop ebp
        ret 32
        ;; size=4 bbWeight=1 PerfScore 2.50
-; Total bytes of code 79, prolog size 6, PerfScore 23.08, instruction count 17, allocated bytes for code 79 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 65, prolog size 6, PerfScore 20.75, instruction count 15, allocated bytes for code 65 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================

librariestestsnotieredcompilation.run.windows.x86.Release.mch

-14 (-1.97%) : 167868.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

@@ -78,8 +78,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}
        vmovups ymm1, ymmword ptr [ecx+0x08]
        vmovups ymmword ptr [ebp-0x48], ymm1
        vpcmpud k1, ymm0, ymm1, 6
-       vpmovm2d ymm2, k1
-       vpternlogd ymm2, ymm0, ymm1, -54
+       vpblendmd ymm2 {k1}, ymm1, ymm0
        vmovups ymmword ptr [ebp-0x68], ymm2
        mov ecx, 0xD1FFAB1E ; System.Action`2[int,uint]
        ; gcrRegs -[ecx]
@@ -129,9 +128,9 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}
        vmovups ymm2, ymmword ptr [ebp-0xA8]
        vextracti128 xmm0, ymm2, 1
        vmovd edx, xmm0
-       ;; size=272 bbWeight=1 PerfScore 104.50
-G_M21446_IG03: ; bbWeight=1, extend
        push edx
+       ;; size=266 bbWeight=1 PerfScore 104.33
+G_M21446_IG03: ; bbWeight=1, extend
        mov edx, 4
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
@@ -167,9 +166,8 @@ G_M21446_IG03: ; bbWeight=1, extend
        vmovups ymm0, ymmword ptr [ebp-0x28]
        vmovups ymm1, ymmword ptr [ebp-0x48]
        vpcmpud k1, ymm0, ymm1, 2
-       vpmovm2d ymm2, k1
-       vpternlogd ymm2, ymm0, ymm1, -54
-       vmovups ymmword ptr [ebp-0x88], ymm2
+       vpblendmd ymm0 {k1}, ymm1, ymm0
+       vmovups ymmword ptr [ebp-0x88], ymm0
        mov ecx, 0xD1FFAB1E ; System.Action`2[int,uint]
        call CORINFO_HELP_NEWSFAST
        ; gcrRegs +[eax]
@@ -181,70 +179,70 @@ G_M21446_IG03: ; bbWeight=1, extend
        ; gcrRegs -[eax esi]
        ; byrRegs -[edx]
        mov dword ptr [edi+0x0C], 0xD1FFAB1E
-       vmovups ymm2, ymmword ptr [ebp-0x88]
-       vmovups ymmword ptr [ebp-0xC8], ymm2
-       vmovd edx, xmm2
+       vmovups ymm0, ymmword ptr [ebp-0x88]
+       vmovups ymmword ptr [ebp-0xC8], ymm0
+       vmovd edx, xmm0
        push edx
        xor edx, edx
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovdqu xmm0, xmmword ptr [ebp-0xC8]
-       vpextrd edx, xmm0, 1
+       vmovdqu xmm1, xmmword ptr [ebp-0xC8]
+       vpextrd edx, xmm1, 1
        push edx
        mov edx, 1
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovdqu xmm0, xmmword ptr [ebp-0xC8]
-       vpextrd edx, xmm0, 2
+       vmovdqu xmm1, xmmword ptr [ebp-0xC8]
+       vpextrd edx, xmm1, 2
        push edx
        mov edx, 2
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovdqu xmm0, xmmword ptr [ebp-0xC8]
-       vpextrd edx, xmm0, 3
+       vmovdqu xmm1, xmmword ptr [ebp-0xC8]
+       vpextrd edx, xmm1, 3
        push edx
        mov edx, 3
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovups ymm2, ymmword ptr [ebp-0xC8]
-       vextracti128 xmm0, ymm2, 1
-       vmovd edx, xmm0
+       vmovups ymm0, ymmword ptr [ebp-0xC8]
+       vextracti128 xmm1, ymm0, 1
+       vmovd edx, xmm1
        push edx
        mov edx, 4
-       ;; size=304 bbWeight=1 PerfScore 128.75
-G_M21446_IG04: ; bbWeight=1, extend
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovups ymm2, ymmword ptr [ebp-0xC8]
-       vextracti128 xmm0, ymm2, 1
-       vpextrd edx, xmm0, 1
+       ;; size=302 bbWeight=1 PerfScore 131.58
+G_M21446_IG04: ; bbWeight=1, extend
+       vmovups ymm0, ymmword ptr [ebp-0xC8]
+       vextracti128 xmm1, ymm0, 1
+       vpextrd edx, xmm1, 1
        push edx
        mov edx, 5
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovups ymm2, ymmword ptr [ebp-0xC8]
-       vextracti128 xmm0, ymm2, 1
-       vpextrd edx, xmm0, 2
+       vmovups ymm0, ymmword ptr [ebp-0xC8]
+       vextracti128 xmm1, ymm0, 1
+       vpextrd edx, xmm1, 2
        push edx
        mov edx, 6
        mov ecx, gword ptr [edi+0x04]
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx]
-       vmovups ymm2, ymmword ptr [ebp-0xC8]
-       vextracti128 xmm0, ymm2, 1
+       vmovups ymm0, ymmword ptr [ebp-0xC8]
+       vextracti128 xmm0, ymm0, 1
        vpextrd edx, xmm0, 3
        push edx
        mov edx, 7
@@ -252,7 +250,7 @@ G_M21446_IG04: ; bbWeight=1, extend
        ; gcrRegs +[ecx]
        call [edi+0x0C]<unknown method>
        ; gcrRegs -[ecx edi]
-       ;; size=102 bbWeight=1 PerfScore 50.75
+       ;; size=96 bbWeight=1 PerfScore 45.75
 G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        lea esp, [ebp-0x08]
@@ -266,6 +264,6 @@ G_M21446_IG06: ; bbWeight=0, gcVars=00000000 {}, gcrefRegs=00000000 {}, b
        int3
        ;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 709, prolog size 14, PerfScore 292.50, instruction count 167, allocated bytes for code 709 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 695, prolog size 14, PerfScore 290.17, instruction count 165, allocated bytes for code 695 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================

Details

Improvements/regressions per collection

Collection Contexts with diffs Improvements Regressions Same size Improvements (bytes) Regressions (bytes)
benchmarks.run.windows.x86.checked.mch 1 1 0 0 -117 +0
benchmarks.run_pgo.windows.x86.checked.mch 1 1 0 0 -126 +0
benchmarks.run_tiered.windows.x86.checked.mch 1 1 0 0 -117 +0
coreclr_tests.run.windows.x86.checked.mch 16 16 0 0 -672 +0
libraries.crossgen2.windows.x86.checked.mch 0 0 0 0 -0 +0
libraries.pmi.windows.x86.checked.mch 24 24 0 0 -565 +0
libraries_tests.run.windows.x86.Release.mch 10 10 0 0 -896 +0
librariestestsnotieredcompilation.run.windows.x86.Release.mch 7 7 0 0 -845 +0
realworld.run.windows.x86.checked.mch 0 0 0 0 -0 +0
60 60 0 0 -3,338 +0
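
As a quick consistency check on the table above, the per-collection rows can be summed and compared with the unlabeled totals row. A small Python sketch, with the numbers copied by hand from the table and the collection names shortened for readability:

```python
# (contexts with diffs, improvement bytes) per collection, copied from the table above.
per_collection = {
    "benchmarks.run": (1, -117),
    "benchmarks.run_pgo": (1, -126),
    "benchmarks.run_tiered": (1, -117),
    "coreclr_tests.run": (16, -672),
    "libraries.crossgen2": (0, 0),
    "libraries.pmi": (24, -565),
    "libraries_tests.run": (10, -896),
    "librariestestsnotieredcompilation.run": (7, -845),
    "realworld.run": (0, 0),
}

total_contexts = sum(contexts for contexts, _ in per_collection.values())
total_bytes = sum(delta for _, delta in per_collection.values())

print(total_contexts, total_bytes)  # 60 -3338
```

Both sums match the last row of the table (60 contexts with diffs, -3,338 bytes) and the "Overall (-3,338 bytes)" figure for windows x86.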

Context information

Collection Diffed contexts MinOpts FullOpts Missed, base Missed, diff
benchmarks.run.windows.x86.checked.mch 24,263 4 24,259 223 (0.91%) 223 (0.91%)
benchmarks.run_pgo.windows.x86.checked.mch 119,596 41,883 77,713 237 (0.20%) 237 (0.20%)
benchmarks.run_tiered.windows.x86.checked.mch 47,814 28,723 19,091 166 (0.35%) 166 (0.35%)
coreclr_tests.run.windows.x86.checked.mch 574,204 320,026 254,178 531 (0.09%) 531 (0.09%)
libraries.crossgen2.windows.x86.checked.mch 242,344 15 242,329 0 (0.00%) 0 (0.00%)
libraries.pmi.windows.x86.checked.mch 302,978 6 302,972 2,071 (0.68%) 2,071 (0.68%)
libraries_tests.run.windows.x86.Release.mch 631,078 427,921 203,157 1,208 (0.19%) 1,208 (0.19%)
librariestestsnotieredcompilation.run.windows.x86.Release.mch 314,404 21,871 292,533 2,024 (0.64%) 2,024 (0.64%)
realworld.run.windows.x86.checked.mch 35,597 3 35,594 390 (1.08%) 390 (1.08%)
2,292,278 840,452 1,451,826 6,850 (0.30%) 6,850 (0.30%)

jit-analyze output

benchmarks.run.windows.x86.checked.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 6964581 (overridden on cmd)
Total bytes of diff: 6964464 (overridden on cmd)
Total bytes of delta: -117 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -117 : 22326.dasm (-11.17 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -117 (-11.17 % of base) : 22326.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
        -117 (-11.17 % of base) : 22326.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).
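
The summary reports `-0.00 % of base` even though the delta is non-zero, because the percentage is taken against the whole collection and printed with two decimals. A short Python sketch of that arithmetic follows. The helper `percent_of_base` is hypothetical and only mirrors the figures printed in this report; it is not taken from the jit-analyze source.

```python
def percent_of_base(delta_bytes: int, base_bytes: int) -> float:
    """Relative size change as a percentage of the base size."""
    return 100.0 * delta_bytes / base_bytes


# Collection-level totals from the summary above.
base, diff = 6_964_581, 6_964_464
delta = diff - base
print(f"{delta} ({percent_of_base(delta, base):.2f} % of base)")  # -117 (-0.00 % of base)

# Method-level figures from the diffs earlier in this report (base size -> diff size).
print(f"{percent_of_base(81 - 102, 102):.2f}")  # -20.59
print(f"{percent_of_base(65 - 79, 79):.2f}")    # -17.72
```

The same formula yields the per-method percentages because the base there is the size of a single method rather than the whole collection.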


benchmarks.run_pgo.windows.x86.checked.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 45650701 (overridden on cmd)
Total bytes of diff: 45650575 (overridden on cmd)
Total bytes of delta: -126 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -126 : 94556.dasm (-11.73 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -126 (-11.73 % of base) : 94556.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

Top method improvements (percentages):
        -126 (-11.73 % of base) : 94556.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).


benchmarks.run_tiered.windows.x86.checked.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 9318080 (overridden on cmd)
Total bytes of diff: 9317963 (overridden on cmd)
Total bytes of delta: -117 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -117 : 44440.dasm (-11.17 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -117 (-11.17 % of base) : 44440.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

Top method improvements (percentages):
        -117 (-11.17 % of base) : 44440.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).


coreclr_tests.run.windows.x86.checked.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 308781934 (overridden on cmd)
Total bytes of diff: 308781262 (overridden on cmd)
Total bytes of delta: -672 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 207939.dasm (-3.64 % of base)
         -56 : 207947.dasm (-1.24 % of base)
         -56 : 469364.dasm (-3.64 % of base)
         -56 : 469367.dasm (-3.69 % of base)
         -56 : 207941.dasm (-3.64 % of base)
         -56 : 207946.dasm (-3.69 % of base)
         -56 : 469368.dasm (-1.24 % of base)
         -56 : 469363.dasm (-3.64 % of base)
         -28 : 469362.dasm (-0.63 % of base)
         -28 : 207938.dasm (-0.63 % of base)
         -28 : 207945.dasm (-1.86 % of base)
         -28 : 469365.dasm (-1.86 % of base)
         -28 : 207935.dasm (-1.89 % of base)
         -28 : 207944.dasm (-1.86 % of base)
         -28 : 469361.dasm (-1.89 % of base)
         -28 : 469366.dasm (-1.86 % of base)

16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-3.64 % of base) : 469364.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-3.64 % of base) : 207941.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
         -56 (-3.69 % of base) : 469367.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-3.69 % of base) : 207946.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)
         -56 (-1.24 % of base) : 469368.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-1.24 % of base) : 207947.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Tier0-FullOpts)
         -56 (-3.64 % of base) : 469363.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -56 (-3.64 % of base) : 207939.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Tier0-FullOpts)
         -28 (-1.86 % of base) : 469366.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.86 % of base) : 207945.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)
         -28 (-1.89 % of base) : 469361.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.89 % of base) : 207935.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
         -28 (-0.63 % of base) : 469362.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-0.63 % of base) : 207938.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)
         -28 (-1.86 % of base) : 469365.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -28 (-1.86 % of base) : 207944.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)

Top method improvements (percentages):
         -56 (-3.69 % of base) : 469367.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-3.69 % of base) : 207946.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)
         -56 (-3.64 % of base) : 469364.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-3.64 % of base) : 207941.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
         -56 (-3.64 % of base) : 469363.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -56 (-3.64 % of base) : 207939.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Tier0-FullOpts)
         -28 (-1.89 % of base) : 469361.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.89 % of base) : 207935.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
         -28 (-1.86 % of base) : 469366.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.86 % of base) : 207945.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)
         -28 (-1.86 % of base) : 469365.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -28 (-1.86 % of base) : 207944.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)
         -56 (-1.24 % of base) : 469368.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-1.24 % of base) : 207947.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Tier0-FullOpts)
         -28 (-0.63 % of base) : 469362.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-0.63 % of base) : 207938.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)

16 total methods with Code Size differences (16 improved, 0 regressed).


libraries.pmi.windows.x86.checked.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 48079459 (overridden on cmd)
Total bytes of diff: 48078894 (overridden on cmd)
Total bytes of delta: -565 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -117 : 4822.dasm (-11.17 % of base)
         -56 : 273944.dasm (-20.36 % of base)
         -56 : 274001.dasm (-20.36 % of base)
         -28 : 273946.dasm (-16.47 % of base)
         -28 : 274003.dasm (-16.47 % of base)
         -21 : 273943.dasm (-20.59 % of base)
         -21 : 274022.dasm (-20.59 % of base)
         -21 : 273965.dasm (-20.59 % of base)
         -21 : 274000.dasm (-20.59 % of base)
         -20 : 4817.dasm (-7.07 % of base)
         -15 : 4815.dasm (-5.21 % of base)
         -14 : 273942.dasm (-17.07 % of base)
         -14 : 273998.dasm (-17.72 % of base)
         -14 : 273999.dasm (-17.07 % of base)
         -14 : 274002.dasm (-14.43 % of base)
         -14 : 274021.dasm (-17.07 % of base)
         -14 : 273964.dasm (-17.07 % of base)
         -14 : 273963.dasm (-17.72 % of base)
         -14 : 274020.dasm (-17.72 % of base)
         -14 : 273941.dasm (-17.72 % of base)

24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -117 (-11.17 % of base) : 4822.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -56 (-20.36 % of base) : 273944.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.36 % of base) : 274001.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -28 (-16.47 % of base) : 273946.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-16.47 % of base) : 274003.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -21 (-20.59 % of base) : 273943.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.59 % of base) : 273965.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.59 % of base) : 274000.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.59 % of base) : 274022.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -20 (-7.07 % of base) : 4817.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -15 (-5.21 % of base) : 4815.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 273941.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-14.43 % of base) : 273945.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-17.07 % of base) : 273942.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 273963.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.07 % of base) : 273964.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 273998.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-14.43 % of base) : 274002.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-17.07 % of base) : 273999.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 274020.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

Top method improvements (percentages):
         -21 (-20.59 % of base) : 273943.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.59 % of base) : 273965.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.59 % of base) : 274000.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.59 % of base) : 274022.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -56 (-20.36 % of base) : 273944.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.36 % of base) : 274001.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -14 (-17.72 % of base) : 273941.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 273963.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 273998.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.72 % of base) : 274020.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.07 % of base) : 273942.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.07 % of base) : 273964.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.07 % of base) : 273999.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.07 % of base) : 274021.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-16.47 % of base) : 273946.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-16.47 % of base) : 274003.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-14.43 % of base) : 273945.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-14.43 % of base) : 274002.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
        -117 (-11.17 % of base) : 4822.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -20 (-7.07 % of base) : 4817.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

24 total methods with Code Size differences (24 improved, 0 regressed).


libraries_tests.run.windows.x86.Release.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 187430449 (overridden on cmd)
Total bytes of diff: 187429553 (overridden on cmd)
Total bytes of delta: -896 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -182 : 369103.dasm (-18.00 % of base)
        -182 : 367511.dasm (-18.00 % of base)
        -154 : 369537.dasm (-17.44 % of base)
         -98 : 363521.dasm (-14.78 % of base)
         -98 : 369394.dasm (-14.78 % of base)
         -84 : 370885.dasm (-10.05 % of base)
         -56 : 318312.dasm (-5.28 % of base)
         -14 : 366898.dasm (-17.72 % of base)
         -14 : 370873.dasm (-17.72 % of base)
         -14 : 366792.dasm (-17.72 % of base)

10 total files with Code Size differences (10 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -182 (-18.00 % of base) : 369103.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
        -182 (-18.00 % of base) : 367511.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
        -154 (-17.44 % of base) : 369537.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (Tier0-FullOpts)
         -98 (-14.78 % of base) : 363521.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -98 (-14.78 % of base) : 369394.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -84 (-10.05 % of base) : 370885.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](System.ReadOnlySpan`1[uint],System.ReadOnlySpan`1[uint],System.Span`1[uint]) (Tier1)
         -56 (-5.28 % of base) : 318312.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)
         -14 (-17.72 % of base) : 366898.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.72 % of base) : 370873.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.72 % of base) : 366792.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

Top method improvements (percentages):
        -182 (-18.00 % of base) : 369103.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
        -182 (-18.00 % of base) : 367511.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
         -14 (-17.72 % of base) : 366898.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.72 % of base) : 370873.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.72 % of base) : 366792.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
        -154 (-17.44 % of base) : 369537.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (Tier0-FullOpts)
         -98 (-14.78 % of base) : 363521.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -98 (-14.78 % of base) : 369394.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -84 (-10.05 % of base) : 370885.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](System.ReadOnlySpan`1[uint],System.ReadOnlySpan`1[uint],System.Span`1[uint]) (Tier1)
         -56 (-5.28 % of base) : 318312.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

10 total methods with Code Size differences (10 improved, 0 regressed).


librariestestsnotieredcompilation.run.windows.x86.Release.mch

To reproduce these diffs on Windows x86: superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 102689706 (overridden on cmd)
Total bytes of diff: 102688861 (overridden on cmd)
Total bytes of delta: -845 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -182 : 165267.dasm (-18.00 % of base)
        -182 : 167191.dasm (-18.00 % of base)
        -154 : 167215.dasm (-17.44 % of base)
        -117 : 149348.dasm (-11.17 % of base)
         -98 : 166467.dasm (-14.78 % of base)
         -98 : 167075.dasm (-14.78 % of base)
         -14 : 167868.dasm (-1.97 % of base)

7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -182 (-18.00 % of base) : 167191.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-18.00 % of base) : 165267.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.44 % of base) : 167215.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
        -117 (-11.17 % of base) : 149348.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -98 (-14.78 % of base) : 166467.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-14.78 % of base) : 167075.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -14 (-1.97 % of base) : 167868.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

Top method improvements (percentages):
        -182 (-18.00 % of base) : 167191.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-18.00 % of base) : 165267.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.44 % of base) : 167215.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-14.78 % of base) : 166467.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-14.78 % of base) : 167075.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
        -117 (-11.17 % of base) : 149348.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-1.97 % of base) : 167868.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

7 total methods with Code Size differences (7 improved, 0 regressed).