Assembly Diffs

linux arm64

Diffs are based on 2,505,351 contexts (1,011,240 MinOpts, 1,494,111 FullOpts).

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---:|---:|---:|---:|---:|
| benchmarks.run.linux.arm64.checked.mch | 34,852 | 3,148 | 31,704 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.arm64.checked.mch | 151,104 | 59,296 | 91,808 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.linux.arm64.checked.mch | 71,207 | 53,989 | 17,218 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.arm64.checked.mch | 627,221 | 383,796 | 243,425 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.arm64.checked.mch | 234,183 | 15 | 234,168 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 295,043 | 6 | 295,037 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.arm64.Release.mch | 734,812 | 489,338 | 245,474 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 304,797 | 21,560 | 283,237 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.arm64.checked.mch | 33,103 | 85 | 33,018 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 19,029 | 7 | 19,022 | 0 (0.00%) | 0 (0.00%) |
| | 2,505,351 | 1,011,240 | 1,494,111 | 0 (0.00%) | 0 (0.00%) |


linux x64

Diffs are based on 2,512,262 contexts (977,780 MinOpts, 1,534,482 FullOpts).

Overall (-10,720 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 16,454,856 | -42 |
| coreclr_tests.run.linux.x64.checked.mch | 403,726,743 | -528 |
| libraries.pmi.linux.x64.checked.mch | 60,288,822 | -469 |
| libraries_tests.run.linux.x64.Release.mch | 342,241,520 | -8,897 |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 132,684,790 | -756 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,195,910 | -28 |

MinOpts (-248 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---:|---:|
| coreclr_tests.run.linux.x64.checked.mch | 279,817,920 | -192 |
| libraries_tests.run.linux.x64.Release.mch | 183,917,771 | -56 |

FullOpts (-10,472 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 16,190,683 | -42 |
| coreclr_tests.run.linux.x64.checked.mch | 123,908,823 | -336 |
| libraries.pmi.linux.x64.checked.mch | 60,175,952 | -469 |
| libraries_tests.run.linux.x64.Release.mch | 158,323,749 | -8,841 |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 122,026,342 | -756 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,194,961 | -28 |
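
As a quick sanity check on the linux x64 tables above, the per-collection deltas can be summed and compared against the section totals. This is a minimal sketch: the dictionary names, the helper `total`, and the shortened collection keys are my own labels, not part of the report.

```python
# Byte deltas copied from the linux x64 tables above.
# Keys are shortened collection names (an assumption made for readability).
overall = {
    "benchmarks": -42,
    "coreclr_tests": -528,
    "libraries.pmi": -469,
    "libraries_tests": -8897,
    "libraries_tests_no_tiered": -756,
    "smoke_tests": -28,
}
min_opts = {
    "coreclr_tests": -192,
    "libraries_tests": -56,
}
full_opts = {
    "benchmarks": -42,
    "coreclr_tests": -336,
    "libraries.pmi": -469,
    "libraries_tests": -8841,
    "libraries_tests_no_tiered": -756,
    "smoke_tests": -28,
}


def total(deltas):
    """Sum the byte deltas of one table."""
    return sum(deltas.values())


def split_matches_overall():
    """Check that MinOpts + FullOpts equals Overall for every collection."""
    return all(
        min_opts.get(name, 0) + full_opts.get(name, 0) == delta
        for name, delta in overall.items()
    )


if __name__ == "__main__":
    print("Overall :", total(overall))
    print("MinOpts :", total(min_opts))
    print("FullOpts:", total(full_opts))
    print("Per-collection split consistent:", split_matches_overall())
```

Collections that do not appear in the MinOpts table are treated as contributing 0 bytes there.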

Example diffs

benchmarks.run.linux.x64.checked.mch

-14 (-4.14%) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)

@@ -161,27 +161,25 @@ G_M56402_IG17: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
 G_M56402_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, byref, isz
        vmovups zmm2, zmmword ptr [rdi+2*rax]
        vpcmpeqw k1, zmm0, zmm2
-       vpmovm2w zmm3, k1
-       vpternlogd zmm3, zmm1, zmm2, -54
-       vmovups zmmword ptr [rsi+2*rax], zmm3
+       vpblendmw zmm2 {k1}, zmm2, zmm1
+       vmovups zmmword ptr [rsi+2*rax], zmm2
        add rax, 32
        cmp rax, r8
        jb SHORT G_M56402_IG18
-       ;; size=42 bbWeight=4 PerfScore 42.00
+       ;; size=35 bbWeight=4 PerfScore 40.00
 G_M56402_IG19: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, byref
        vmovups zmm2, zmmword ptr [rdi+2*r8]
        vpcmpeqw k1, zmm0, zmm2
-       vpmovm2w zmm0, k1
-       vpternlogd zmm0, zmm1, zmm2, -54
+       vpblendmw zmm0 {k1}, zmm2, zmm1
        vmovups zmmword ptr [rsi+2*r8], zmm0
-       ;; size=33 bbWeight=0.50 PerfScore 4.50
+       ;; size=26 bbWeight=0.50 PerfScore 4.25
 G_M56402_IG20: ; bbWeight=0.50, epilog, nogc, extend
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=0.50 PerfScore 1.25
-; Total bytes of code 338, prolog size 7, PerfScore 172.38, instruction count 86, allocated bytes for code 338 (MethodHash=34bd23ad) for method System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
+; Total bytes of code 324, prolog size 7, PerfScore 170.12, instruction count 84, allocated bytes for code 324 (MethodHash=34bd23ad) for method System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
 ; ============================================================
 Unwind Info:
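
For readers checking why the replacement in the diff above is safe: the immediate `-54` is the byte `0xCA`, which as a ternary-logic truth table selects the second operand where the first operand's bit is set and the third operand otherwise. With the first operand being the mask expanded by `vpmovm2w`, that is a per-lane select, which is what a masked blend produces directly. The sketch below is only a rough Python model of that reasoning; the function names are hypothetical, and it models instruction semantics as I read them, not the JIT's implementation.

```python
# -54 viewed as an unsigned 8-bit immediate.
IMM8 = -54 & 0xFF


def ternlog_bits(a, b, c, imm8, width):
    """Bitwise ternary logic: result bit = imm8[(a_bit << 2) | (b_bit << 1) | c_bit]."""
    result = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


def mask_then_ternlog(mask, if_set, if_clear, lane_bits=16):
    """Model of the removed pair: expand mask bits to full lanes, then select bitwise."""
    all_ones = (1 << lane_bits) - 1
    lanes = []
    for i, (x, y) in enumerate(zip(if_set, if_clear)):
        lane_mask = all_ones if (mask >> i) & 1 else 0  # mask-to-vector expansion
        lanes.append(ternlog_bits(lane_mask, x, y, IMM8, lane_bits))
    return lanes


def masked_blend(mask, if_set, if_clear):
    """Model of the added instruction: pick a whole lane based on its mask bit."""
    return [x if (mask >> i) & 1 else y
            for i, (x, y) in enumerate(zip(if_set, if_clear))]


if __name__ == "__main__":
    xs = [0x1234, 0xFFFF, 0x0000, 0xA5A5]
    ys = [0x4321, 0x0000, 0xFFFF, 0x5A5A]
    for mask in range(16):
        assert mask_then_ternlog(mask, xs, ys) == masked_blend(mask, xs, ys)
    print(hex(IMM8), "acts as a select; both sequences agree on all 4-lane masks")
```

The model uses four 16-bit lanes purely to keep the exhaustive mask check small; the argument is the same for any lane count or width.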

-28 (-2.59%) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -106,13 +106,13 @@
 ; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
 ; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
 ; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
 ; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
 ; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
 ; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
 ; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
 ; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
 ;
 ; Lcl frame size = 136
@@ -194,13 +194,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
        vpand ymm6, ymm6, ymm8
        vmovups ymm9, ymmword ptr [reloc @RWD128]
        vpcmpub k1, ymm6, ymm9, 6
-       vpmovm2b ymm10, k1
-       vmovups ymm11, ymmword ptr [reloc @RWD160]
-       vpsubb ymm12, ymm6, ymm11
-       vpshufb ymm12, ymm3, ymm12
+       vmovups ymm10, ymmword ptr [reloc @RWD160]
+       vpsubb ymm11, ymm6, ymm10
+       vpshufb ymm11, ymm3, ymm11
        vpshufb ymm6, ymm2, ymm6
-       vpternlogd ymm10, ymm12, ymm6, -54
-       vpand ymm5, ymm10, ymm5
+       vpblendmb ymm6 {k1}, ymm6, ymm11
+       vpand ymm5, ymm6, ymm5
        vxorps ymm6, ymm6, ymm6
        vpcmpeqb ymm5, ymm5, ymm6
        vpcmpeqd ymm6, ymm6, ymm6
@@ -211,12 +210,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
        vpshufb ymm6, ymm7, ymm6
        vpand ymm4, ymm4, ymm8
        vpcmpub k1, ymm4, ymm9, 6
-       vpmovm2b ymm7, k1
-       vpsubb ymm8, ymm4, ymm11
-       vpshufb ymm8, ymm3, ymm8
+       vpsubb ymm7, ymm4, ymm10
+       vpshufb ymm7, ymm3, ymm7
        vpshufb ymm4, ymm2, ymm4
-       vpternlogd ymm7, ymm8, ymm4, -54
-       vpand ymm4, ymm7, ymm6
+       vpblendmb ymm4 {k1}, ymm4, ymm7
+       vpand ymm4, ymm4, ymm6
        vxorps ymm6, ymm6, ymm6
        vpcmpeqb ymm4, ymm4, ymm6
        vpcmpeqd ymm6, ymm6, ymm6
@@ -224,7 +222,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
        vpand ymm4, ymm5, ymm4
        vptest ymm4, ymm4
        je G_M48875_IG11
-       ;; size=254 bbWeight=4 PerfScore 328.00
+       ;; size=240 bbWeight=4 PerfScore 324.00
 G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
        vpermq ymm4, ymm4, -40
        vpmovmskb r12d, ymm4
@@ -356,13 +354,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
        vpand xmm4, xmm4, xmm7
        vmovups xmm8, xmmword ptr [reloc @RWD128]
        vpcmpub k1, xmm4, xmm8, 6
-       vpmovm2b xmm9, k1
-       vmovups xmm10, xmmword ptr [reloc @RWD160]
-       vpsubb xmm11, xmm4, xmm10
-       vpshufb xmm11, xmm1, xmm11
+       vmovups xmm9, xmmword ptr [reloc @RWD160]
+       vpsubb xmm10, xmm4, xmm9
+       vpshufb xmm10, xmm1, xmm10
        vpshufb xmm4, xmm0, xmm4
-       vpternlogd xmm9, xmm11, xmm4, -54
-       vpand xmm3, xmm9, xmm3
+       vpblendmb xmm4 {k1}, xmm4, xmm10
+       vpand xmm3, xmm4, xmm3
        vxorps xmm4, xmm4, xmm4
        vpcmpeqb xmm3, xmm3, xmm4
        vpcmpeqd xmm4, xmm4, xmm4
@@ -372,12 +369,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
        vpshufb xmm4, xmm6, xmm4
        vpand xmm2, xmm2, xmm7
        vpcmpub k1, xmm2, xmm8, 6
-       vpmovm2b xmm5, k1
-       vpsubb xmm6, xmm2, xmm10
-       vpshufb xmm6, xmm1, xmm6
+       vpsubb xmm5, xmm2, xmm9
+       vpshufb xmm5, xmm1, xmm5
        vpshufb xmm2, xmm0, xmm2
-       vpternlogd xmm5, xmm6, xmm2, -54
-       vpand xmm2, xmm5, xmm4
+       vpblendmb xmm2 {k1}, xmm2, xmm5
+       vpand xmm2, xmm2, xmm4
        vxorps xmm4, xmm4, xmm4
        vpcmpeqb xmm2, xmm2, xmm4
        vpcmpeqd xmm4, xmm4, xmm4
@@ -385,7 +381,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
        vpand xmm2, xmm3, xmm2
        vptest xmm2, xmm2
        je G_M48875_IG23
-       ;; size=244 bbWeight=4 PerfScore 240.00
+       ;; size=230 bbWeight=4 PerfScore 236.00
 G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
        vpmovmskb r12d, xmm2
        ;; size=4 bbWeight=2 PerfScore 4.00
@@ -473,7 +469,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1081, prolog size 43, PerfScore 1253.25, instruction count 240, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1053, prolog size 43, PerfScore 1245.25, instruction count 236, allocated bytes for code 1053 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
 ; ============================================================
 Unwind Info:
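
The header of each example diff (for instance `-14 (-4.14%)` and `-28 (-2.59%)` above) follows from the two `Total bytes of code` lines at the end of that diff. A small sketch of the arithmetic, with `size_change` being a hypothetical helper name of my own:

```python
def size_change(base_bytes, diff_bytes):
    """Return (byte delta, percent change rounded to two decimals)."""
    delta = diff_bytes - base_bytes
    percent = round(100.0 * delta / base_bytes, 2)
    return delta, percent


if __name__ == "__main__":
    # (base, diff) pairs taken from the "Total bytes of code" lines above.
    for name, base, diff in [
        ("ReplaceValueType[ushort]", 338, 324),
        ("IndexOfAnyVectorized", 1081, 1053),
    ]:
        delta, percent = size_change(base, diff)
        print(f"{delta} ({percent:.2f}%) : {name}")
```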

coreclr_tests.run.linux.x64.checked.mch

-28 (-1.60%) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)

@@ -312,10 +312,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M1266_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm2, ymmword ptr [rbp-0x70]
        vpcmpd k1, ymm2, ymmword ptr [rbp-0x30], 2
-       vpmovm2d ymm3, k1
-       vpternlogd ymm3, ymm2, ymm1, -54
+       vpblendmd ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 10.75
+       ;; size=24 bbWeight=1 PerfScore 9.58
 G_M1266_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -366,10 +365,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M1266_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm1, ymmword ptr [rbp-0x50]
        vpcmpd k1, ymm2, ymm1, 2
-       vpmovm2d ymm3, k1
-       vpternlogd ymm3, ymm2, ymm1, -54
+       vpblendmd ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=27 bbWeight=1 PerfScore 8.75
+       ;; size=20 bbWeight=1 PerfScore 7.58
 G_M1266_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -420,10 +418,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M1266_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm2, ymmword ptr [rbp-0x70]
        vpcmpd k1, ymm2, ymmword ptr [rbp-0x30], 5
-       vpmovm2d ymm3, k1
-       vpternlogd ymm3, ymm2, ymm1, -54
+       vpblendmd ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 10.75
+       ;; size=24 bbWeight=1 PerfScore 9.58
 G_M1266_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -474,10 +471,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M1266_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm1, ymmword ptr [rbp-0x50]
        vpcmpd k1, ymm1, ymmword ptr [rbp-0x30], 5
-       vpmovm2d ymm3, k1
-       vpternlogd ymm3, ymm2, ymm1, -54
+       vpblendmd ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 10.75
+       ;; size=24 bbWeight=1 PerfScore 9.58
 G_M1266_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -641,7 +637,7 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        ret
        ;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1747, prolog size 35, PerfScore 1104.08, instruction count 335, allocated bytes for code 1747 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1719, prolog size 35, PerfScore 1099.42, instruction count 331, allocated bytes for code 1719 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
 ; ============================================================
 Unwind Info:

-28 (-1.57%) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)

@@ -312,10 +312,9 @@ G_M59915_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M59915_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm2, ymmword ptr [rbp-0x70]
        vpcmpq k1, ymm2, ymmword ptr [rbp-0x30], 2
-       vpmovm2q ymm3, k1
-       vpternlogq ymm3, ymm2, ymm1, -54
+       vpblendmq ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 10.75
+       ;; size=24 bbWeight=1 PerfScore 9.58
 G_M59915_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -366,10 +365,9 @@ G_M59915_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm1, ymmword ptr [rbp-0x50]
        vpcmpq k1, ymm2, ymm1, 2
-       vpmovm2q ymm3, k1
-       vpternlogq ymm3, ymm2, ymm1, -54
+       vpblendmq ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=27 bbWeight=1 PerfScore 8.75
+       ;; size=20 bbWeight=1 PerfScore 7.58
 G_M59915_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -420,10 +418,9 @@ G_M59915_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M59915_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm2, ymmword ptr [rbp-0x70]
        vpcmpq k1, ymm2, ymmword ptr [rbp-0x30], 5
-       vpmovm2q ymm3, k1
-       vpternlogq ymm3, ymm2, ymm1, -54
+       vpblendmq ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 10.75
+       ;; size=24 bbWeight=1 PerfScore 9.58
 G_M59915_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -474,10 +471,9 @@ G_M59915_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M59915_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm1, ymmword ptr [rbp-0x50]
        vpcmpq k1, ymm1, ymmword ptr [rbp-0x30], 5
-       vpmovm2q ymm3, k1
-       vpternlogq ymm3, ymm2, ymm1, -54
+       vpblendmq ymm3 {k1}, ymm1, ymm2
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 10.75
+       ;; size=24 bbWeight=1 PerfScore 9.58
 G_M59915_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -641,7 +637,7 @@ G_M59915_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        ret
        ;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1783, prolog size 35, PerfScore 1110.08, instruction count 335, allocated bytes for code 1783 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 1755, prolog size 35, PerfScore 1105.42, instruction count 331, allocated bytes for code 1755 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
 ; ============================================================
 Unwind Info:

-28 (-1.57%) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)

@@ -314,10 +314,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M8563_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpw k1, ymm0, ymmword ptr [rbp-0x30], 2
-       vpmovm2w ymm3, k1
-       vpternlogd ymm3, ymm0, ymm2, -54
+       vpblendmw ymm3 {k1}, ymm2, ymm0
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 11.75
+       ;; size=24 bbWeight=1 PerfScore 11.25
 G_M8563_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -368,10 +367,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm2, ymmword ptr [rbp-0x50]
        vpcmpw k1, ymm0, ymm2, 2
-       vpmovm2w ymm3, k1
-       vpternlogd ymm3, ymm0, ymm2, -54
+       vpblendmw ymm3 {k1}, ymm2, ymm0
        xor ebx, ebx
-       ;; size=27 bbWeight=1 PerfScore 9.75
+       ;; size=20 bbWeight=1 PerfScore 9.25
 G_M8563_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -422,10 +420,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M8563_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpw k1, ymm0, ymmword ptr [rbp-0x30], 5
-       vpmovm2w ymm3, k1
-       vpternlogd ymm3, ymm0, ymm2, -54
+       vpblendmw ymm3 {k1}, ymm2, ymm0
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 11.75
+       ;; size=24 bbWeight=1 PerfScore 11.25
 G_M8563_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -476,10 +473,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
 G_M8563_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm2, ymmword ptr [rbp-0x50]
        vpcmpw k1, ymm2, ymmword ptr [rbp-0x30], 5
-       vpmovm2w ymm3, k1
-       vpternlogd ymm3, ymm0, ymm2, -54
+       vpblendmw ymm3 {k1}, ymm2, ymm0
        xor ebx, ebx
-       ;; size=31 bbWeight=1 PerfScore 11.75
+       ;; size=24 bbWeight=1 PerfScore 11.25
 G_M8563_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        mov edi, ebx
        vmovups ymmword ptr [rbp-0x90], ymm3
@@ -643,7 +639,7 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        ret
        ;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1789, prolog size 35, PerfScore 1266.58, instruction count 336, allocated bytes for code 1789 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
+; Total bytes of code 1761, prolog size 35, PerfScore 1264.58, instruction count 332, allocated bytes for code 1761 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
 ; ============================================================
 Unwind Info:

-16 (-0.39%) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)

@@ -437,14 +437,13 @@ G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M59915_IG18
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 2
-       vpmovm2q ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x100], edi
        jmp G_M59915_IG25
-       ;; size=72 bbWeight=1 PerfScore 23.25
+       ;; size=68 bbWeight=1 PerfScore 22.25
 G_M59915_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x100], 4
        jae G_M59915_IG54
@@ -523,14 +522,13 @@ G_M59915_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M59915_IG23
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpq k1, ymm0, ymmword ptr [rbp-0x90], 2
-       vpmovm2q ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x108], edi
        jmp G_M59915_IG30
-       ;; size=72 bbWeight=1 PerfScore 23.25
+       ;; size=68 bbWeight=1 PerfScore 22.25
 G_M59915_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x108], 4
        jae G_M59915_IG54
@@ -609,14 +607,13 @@ G_M59915_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M59915_IG28
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
-       vpmovm2q ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x110], edi
        jmp G_M59915_IG35
-       ;; size=72 bbWeight=1 PerfScore 23.25
+       ;; size=68 bbWeight=1 PerfScore 22.25
 G_M59915_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x110], 4
        jae G_M59915_IG54
@@ -695,14 +692,13 @@ G_M59915_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M59915_IG33
        vmovups ymm0, ymmword ptr [rbp-0x90]
        vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
-       vpmovm2q ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x118], edi
        jmp G_M59915_IG40
-       ;; size=75 bbWeight=1 PerfScore 23.25
+       ;; size=71 bbWeight=1 PerfScore 22.25
 G_M59915_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x118], 4
        jae G_M59915_IG54
@@ -964,7 +960,7 @@ G_M59915_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        int3
        ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4101, prolog size 78, PerfScore 831.26, instruction count 657, allocated bytes for code 4101 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
+; Total bytes of code 4085, prolog size 78, PerfScore 827.26, instruction count 653, allocated bytes for code 4089 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
 ; ============================================================
 Unwind Info:

-16 (-0.39%) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

@@ -440,14 +440,13 @@ G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M8563_IG18
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 2
-       vpmovm2w ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x100], edi
        jmp G_M8563_IG25
-       ;; size=72 bbWeight=1 PerfScore 24.25
+       ;; size=68 bbWeight=1 PerfScore 24.25
 G_M8563_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x100], 16
        jae G_M8563_IG54
@@ -526,14 +525,13 @@ G_M8563_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M8563_IG23
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpw k1, ymm0, ymmword ptr [rbp-0x90], 2
-       vpmovm2w ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x108], edi
        jmp G_M8563_IG30
-       ;; size=72 bbWeight=1 PerfScore 24.25
+       ;; size=68 bbWeight=1 PerfScore 24.25
 G_M8563_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x108], 16
        jae G_M8563_IG54
@@ -612,14 +610,13 @@ G_M8563_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M8563_IG28
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
-       vpmovm2w ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x110], edi
        jmp G_M8563_IG35
-       ;; size=72 bbWeight=1 PerfScore 24.25
+       ;; size=68 bbWeight=1 PerfScore 24.25
 G_M8563_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x110], 16
        jae G_M8563_IG54
@@ -698,14 +695,13 @@ G_M8563_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M8563_IG33
        vmovups ymm0, ymmword ptr [rbp-0x90]
        vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
-       vpmovm2w ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x118], edi
        jmp G_M8563_IG40
-       ;; size=75 bbWeight=1 PerfScore 24.25
+       ;; size=71 bbWeight=1 PerfScore 24.25
 G_M8563_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x118], 16
        jae G_M8563_IG54
@@ -967,7 +963,7 @@ G_M8563_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}
        int3
        ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4104, prolog size 58, PerfScore 860.01, instruction count 660, allocated bytes for code 4104 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
+; Total bytes of code 4088, prolog size 58, PerfScore 860.01, instruction count 656, allocated bytes for code 4092 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
 ; ============================================================
 Unwind Info:

-16 (-0.39%) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)

@@ -440,14 +440,13 @@ G_M44299_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M44299_IG18
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 2
-       vpmovm2b ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x100], edi
        jmp G_M44299_IG25
-       ;; size=72 bbWeight=1 PerfScore 24.25
+       ;; size=68 bbWeight=1 PerfScore 24.25
 G_M44299_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x100], 32
        jae G_M44299_IG54
@@ -526,14 +525,13 @@ G_M44299_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M44299_IG23
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpb k1, ymm0, ymmword ptr [rbp-0x90], 2
-       vpmovm2b ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x108], edi
        jmp G_M44299_IG30
-       ;; size=72 bbWeight=1 PerfScore 24.25
+       ;; size=68 bbWeight=1 PerfScore 24.25
 G_M44299_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x108], 32
        jae G_M44299_IG54
@@ -612,14 +610,13 @@ G_M44299_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M44299_IG28
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
-       vpmovm2b ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x110], edi
        jmp G_M44299_IG35
-       ;; size=72 bbWeight=1 PerfScore 24.25
+       ;; size=68 bbWeight=1 PerfScore 24.25
 G_M44299_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x110], 32
        jae G_M44299_IG54
@@ -698,14 +695,13 @@ G_M44299_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M44299_IG33
        vmovups ymm0, ymmword ptr [rbp-0x90]
        vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
-       vpmovm2b ymm0, k1
-       vmovups ymm1, ymmword ptr [rbp-0x70]
-       vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+       vmovups ymm0, ymmword ptr [rbp-0x90]
+       vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rbp-0xD0], ymm0
        xor edi, edi
        mov dword ptr [rbp-0x118], edi
        jmp G_M44299_IG40
-       ;; size=75 bbWeight=1 PerfScore 24.25
+       ;; size=71 bbWeight=1 PerfScore 24.25
 G_M44299_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        cmp dword ptr [rbp-0x118], 32
        jae G_M44299_IG54
@@ -967,7 +963,7 @@ G_M44299_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        int3
        ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4104, prolog size 58, PerfScore 860.01, instruction count 660, allocated bytes for code 4104 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
+; Total bytes of code 4088, prolog size 58, PerfScore 860.01, instruction count 656, allocated bytes for code 4092 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
 ; ============================================================
 Unwind Info:

libraries.pmi.linux.x64.checked.mch

-21 (-20.19%) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -17,7 +17,7 @@
 ;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;
 ; Lcl frame size = 0
@@ -32,26 +32,23 @@ G_M10214_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M10214_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref
        ; byrRegs +[rdi]
        vpcmpeqb k1, zmm0, zmm1
-       vpmovm2b zmm2, k1
-       vxorps ymm3, ymm3, ymm3
-       vpcmpub k1, zmm0, zmm3, 1
-       vpmovm2b zmm3, k1
-       vpternlogd zmm3, zmm0, zmm1, -54
-       vpcmpub k1, zmm0, zmm1, 1
-       vpmovm2b zmm4, k1
-       vpternlogd zmm4, zmm0, zmm1, -54
-       vpternlogd zmm2, zmm3, zmm4, -54
-       vmovups zmmword ptr [rdi], zmm2
+       vxorps ymm2, ymm2, ymm2
+       vpcmpub k2, zmm0, zmm2, 1
+       vpblendmb zmm2 {k2}, zmm1, zmm0
+       vpcmpub k2, zmm0, zmm1, 1
+       vpblendmb zmm0 {k2}, zmm1, zmm0
+       vpblendmb zmm0 {k1}, zmm0, zmm2
+       vmovups zmmword ptr [rdi], zmm0
        mov rax, rdi
        ; byrRegs +[rax]
-       ;; size=72 bbWeight=1 PerfScore 16.58
+       ;; size=51 bbWeight=1 PerfScore 15.08
 G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
 ; ============================================================
 Unwind Info:

-21 (-20.19%) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T02] ( 4, 4 ) simd64 -> mm1 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" @@ -32,26 +32,23 @@ G_M22834_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, G_M22834_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref ; byrRegs +[rdi] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1 - vxorps ymm3, ymm3, ymm3 - vpcmpub k1, zmm0, zmm3, 1 - vpmovm2b zmm3, k1 - vpternlogd zmm3, zmm1, zmm0, -54 - vpcmpub k1, zmm0, zmm1, 6 - vpmovm2b zmm4, k1 - vpternlogd zmm4, zmm0, zmm1, -54 - vpternlogd zmm2, zmm3, zmm4, -54 - vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2 + vpcmpub k2, zmm0, zmm2, 1 + vpblendmb zmm2 {k2}, zmm0, zmm1 + vpcmpub k2, zmm0, zmm1, 6 + vpblendmb zmm0 {k2}, zmm1, zmm0 + vpblendmb zmm0 {k1}, zmm0, zmm2 + vmovups zmmword ptr [rdi], zmm0
mov rax, rdi ; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp ret ;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:
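
The pattern in the diff above (a `vpmovm2b` mask expansion followed by `vpternlogd ..., -54` being replaced by a single `vpblendmb`) can be sanity-checked with a small model. The following is only an illustrative Python sketch, not JIT code: the helper names (`ternlog`, `mask_to_vector`, `old_sequence`, `blend`) are hypothetical, and it assumes the documented instruction semantics, namely that the `vpternlog` immediate is a truth table indexed by the bits of (destination, source 1, source 2), and that `vpblendmb dst {k}, src1, src2` takes `src2` where the mask bit is set and `src1` otherwise. The immediate `-54` is `0xCA` as an unsigned byte.

```python
def ternlog(a, b, c, imm8, width=8):
    """Model of vpternlog on one lane: each result bit is looked up in imm8
    at index (a_bit << 2) | (b_bit << 1) | c_bit."""
    out = 0
    for bit in range(width):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        out |= ((imm8 >> idx) & 1) << bit
    return out


def mask_to_vector(k, lanes):
    """Model of vpmovm2b: a lane becomes all-ones if its mask bit is set, else zero."""
    return [0xFF if (k >> i) & 1 else 0x00 for i in range(lanes)]


def old_sequence(k, b, c):
    """vpmovm2b m, k ; vpternlogd m, b, c, -54  (imm8 0xCA: m ? b : c)."""
    m = mask_to_vector(k, len(b))
    return [ternlog(m[i], b[i], c[i], -54 & 0xFF) for i in range(len(b))]


def blend(k, src1, src2):
    """Model of vpblendmb dst {k}, src1, src2: src2 where the mask bit is set, else src1."""
    return [src2[i] if (k >> i) & 1 else src1[i] for i in range(len(src1))]


if __name__ == "__main__":
    b = [0x11, 0x22, 0x33, 0x44]
    c = [0xAA, 0xBB, 0xCC, 0xDD]
    k = 0b0101
    # The ternlog operand order (b, c) corresponds to the blend order (c, b).
    print(old_sequence(k, b, c) == blend(k, c, b))
```

Note the swapped operand order between the two forms: `vpternlogd zmm3, zmm1, zmm0, -54` selects `zmm1` where the mask is set, which matches `vpblendmb zmm2 {k2}, zmm0, zmm1`.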

-21 (-20.19%) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T01] ( 5, 5 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" @@ -32,26 +32,23 @@ G_M30188_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, G_M30188_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref ; byrRegs +[rdi] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1 - vxorps ymm3, ymm3, ymm3 - vpcmpub k1, zmm0, zmm3, 1 - vpmovm2b zmm3, k1 - vpternlogd zmm3, zmm0, zmm1, -54 - vpcmpub k1, zmm0, zmm1, 1 - vpmovm2b zmm4, k1 - vpternlogd zmm4, zmm0, zmm1, -54 - vpternlogd zmm2, zmm3, zmm4, -54 - vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2 + vpcmpub k2, zmm0, zmm2, 1 + vpblendmb zmm2 {k2}, zmm1, zmm0 + vpcmpub k2, zmm0, zmm1, 1 + vpblendmb zmm0 {k2}, zmm1, zmm0 + vpblendmb zmm0 {k1}, zmm0, zmm2 + vmovups zmmword ptr [rdi], zmm0
mov rax, rdi ; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M30188_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp ret ;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=29ab8a13) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=29ab8a13) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-14 (-5.19%) : 20784.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -37,7 +37,7 @@ ; V26 cse1 [V26,T09] ( 3, 3 ) simd32 -> mm5 "CSE - moderate" ; V27 cse2 [V27,T10] ( 3, 3 ) simd32 -> mm6 "CSE - moderate" ; V28 cse3 [V28,T11] ( 3, 3 ) simd32 -> mm7 "CSE - moderate"
-; V29 cse4 [V29,T12] ( 3, 3 ) simd32 -> mm9 "CSE - moderate"
+; V29 cse4 [V29,T12] ( 3, 3 ) simd32 -> mm8 "CSE - moderate"
; ; Lcl frame size = 0 @@ -68,13 +68,12 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, vpand ymm4, ymm4, ymm6 vmovups ymm7, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm4, ymm7, 6
- vpmovm2b ymm8, k1 - vmovups ymm9, ymmword ptr [reloc @RWD160] - vpsubb ymm10, ymm4, ymm9 - vpshufb ymm10, ymm1, ymm10
+ vmovups ymm8, ymmword ptr [reloc @RWD160] + vpsubb ymm9, ymm4, ymm8 + vpshufb ymm9, ymm1, ymm9
vpshufb ymm4, ymm0, ymm4
- vpternlogd ymm8, ymm10, ymm4, -54 - vpand ymm3, ymm8, ymm3
+ vpblendmb ymm4 {k1}, ymm4, ymm9 + vpand ymm3, ymm4, ymm3
vxorps ymm4, ymm4, ymm4 vpcmpeqb ymm3, ymm3, ymm4 vpcmpeqd ymm4, ymm4, ymm4 @@ -85,12 +84,11 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, vpshufb ymm4, ymm5, ymm4 vpand ymm2, ymm2, ymm6 vpcmpub k1, ymm2, ymm7, 6
- vpmovm2b ymm5, k1 - vpsubb ymm6, ymm2, ymm9 - vpshufb ymm1, ymm1, ymm6
+ vpsubb ymm5, ymm2, ymm8 + vpshufb ymm1, ymm1, ymm5
vpshufb ymm0, ymm0, ymm2
- vpternlogd ymm5, ymm1, ymm0, -54 - vpand ymm0, ymm5, ymm4
+ vpblendmb ymm0 {k1}, ymm0, ymm1 + vpand ymm0, ymm0, ymm4
vxorps ymm1, ymm1, ymm1 vpcmpeqb ymm0, ymm0, ymm1 vpcmpeqd ymm1, ymm1, ymm1 @@ -99,7 +97,7 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, vmovups ymmword ptr [rdi], ymm0 mov rax, rdi ; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 78.25
+ ;; size=234 bbWeight=1 PerfScore 77.25
G_M59405_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp @@ -113,7 +111,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 270, prolog size 7, PerfScore 91.00, instruction count 56, allocated bytes for code 270 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 256, prolog size 7, PerfScore 90.00, instruction count 54, allocated bytes for code 256 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-7 (-4.32%) : 206873.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -40,9 +40,8 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vpandd zmm3, zmm4, dword ptr [reloc @RWD128] {1to16} vpord zmm4, zmm3, dword ptr [reloc @RWD132] {1to16} vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1 - vpslld zmm5, zmm4, 1 - vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1 + vpblendmd zmm2 {k1}, zmm2, zmm4
vpslld zmm0, zmm0, 13 vpandd zmm0, zmm0, dword ptr [reloc @RWD136] {1to16} vpaddd zmm0, zmm0, zmm2 @@ -51,7 +50,7 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vmovups zmmword ptr [rdi], zmm0 mov rax, rdi ; byrRegs +[rax]
- ;; size=140 bbWeight=1 PerfScore 29.25
+ ;; size=133 bbWeight=1 PerfScore 28.25
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp @@ -67,7 +66,7 @@ RWD132 dd 38000000h RWD136 dd 0FFFE000h
-; Total bytes of code 162, prolog size 7, PerfScore 37.00, instruction count 27, allocated bytes for code 162 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 155, prolog size 7, PerfScore 36.00, instruction count 26, allocated bytes for code 155 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================ Unwind Info:

-28 (-2.59%) : 20791.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -102,13 +102,13 @@ ; V91 cse1 [V91,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate" ; V92 cse2 [V92,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate" ; V93 cse3 [V93,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V94 cse4 [V94,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V94 cse4 [V94,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V95 cse5 [V95,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate" ; V96 cse6 [V96,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate" ; V97 cse7 [V97,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate" ; V98 cse8 [V98,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate" ; V99 cse9 [V99,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V100 cse10 [V100,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V100 cse10 [V100,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
; ; Lcl frame size = 168 @@ -187,13 +187,12 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1 - vmovups ymm11, ymmword ptr [reloc @RWD160] - vpsubb ymm12, ymm6, ymm11 - vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160] + vpsubb ymm11, ymm6, ymm10 + vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54 - vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11 + vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -204,12 +203,11 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1 - vpsubb ymm8, ymm4, ymm11 - vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10 + vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54 - vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7 + vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm4, ymm4, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -218,7 +216,7 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vmovups ymmword ptr [rbp-0xB0], ymm4 vptest ymm4, ymm4 je SHORT G_M48875_IG07
- ;; size=265 bbWeight=4 PerfScore 336.00
+ ;; size=251 bbWeight=4 PerfScore 332.00
G_M48875_IG05: ; bbWeight=2, gcVars=0000000000000201 {V04 V05}, gcrefRegs=0000 {}, byrefRegs=6008 {rbx r13 r14}, gcvars, byref ; byrRegs -[rcx] mov edi, 1 @@ -342,13 +340,12 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm4, xmm4, xmm7 vmovups xmm8, xmmword ptr [reloc @RWD128] vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1 - vmovups xmm10, xmmword ptr [reloc @RWD160] - vpsubb xmm11, xmm4, xmm10 - vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160] + vpsubb xmm10, xmm4, xmm9 + vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54 - vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10 + vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm3, xmm3, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -358,12 +355,11 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb xmm4, xmm6, xmm4 vpand xmm2, xmm2, xmm7 vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1 - vpsubb xmm6, xmm2, xmm10 - vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9 + vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54 - vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5 + vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm2, xmm2, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -371,7 +367,7 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm2, xmm3, xmm2 vptest xmm2, xmm2 je G_M48875_IG19
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG15: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref vpmovmskb r12d, xmm2 ;; size=4 bbWeight=2 PerfScore 4.00 @@ -459,7 +455,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1080, prolog size 43, PerfScore 1246.50, instruction count 241, allocated bytes for code 1080 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1052, prolog size 43, PerfScore 1238.50, instruction count 237, allocated bytes for code 1052 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

libraries_tests.run.linux.x64.Release.mch

-28 (-20.29%) : 447031.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)

@@ -37,30 +37,26 @@ G_M12292_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1 - vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 {k1}, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1 - vpternlogd xmm4, xmm0, xmm1, -54 - vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0 + vpternlogd xmm2, xmm3, xmm0, -54
vpshufd xmm0, xmm2, -79 vpcmpeqd xmm1, xmm2, xmm0 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm2, xmm3, 1
- vpmovm2d xmm3, k1 - vpternlogd xmm3, xmm0, xmm2, -54
+ vpblendmd xmm3 {k1}, xmm2, xmm0
vpcmpud k1, xmm2, xmm0, 6
- vpmovm2d xmm4, k1 - vpternlogd xmm4, xmm2, xmm0, -54 - vpternlogd xmm1, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm0, xmm2 + vpternlogd xmm1, xmm3, xmm0, -54
vmovd eax, xmm1
- ;; size=124 bbWeight=1 PerfScore 24.67
+ ;; size=96 bbWeight=1 PerfScore 20.00
G_M12292_IG03: ; bbWeight=1, epilog, nogc, extend pop rbp ret ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 138, prolog size 7, PerfScore 31.42, instruction count 27, allocated bytes for code 138 (MethodHash=4174cffb) for method System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
+; Total bytes of code 110, prolog size 7, PerfScore 26.75, instruction count 23, allocated bytes for code 110 (MethodHash=4174cffb) for method System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
; ============================================================ Unwind Info:

-14 (-17.28%) : 433093.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,22 +34,20 @@ G_M36523_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1 - vpternlogd xmm3, xmm0, xmm1, -54
+ vpblendmd xmm3 {k1}, xmm1, xmm0
vpcmpud k1, xmm0, xmm1, 1
- vpmovm2d xmm4, k1 - vpternlogd xmm4, xmm0, xmm1, -54 - vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0 + vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [rdi], xmm2 mov rax, rdi ; byrRegs +[rax]
- ;; size=62 bbWeight=1 PerfScore 12.58
+ ;; size=48 bbWeight=1 PerfScore 10.25
G_M36523_IG03: ; bbWeight=1, epilog, nogc, extend pop rbp ret ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 81, prolog size 7, PerfScore 22.33, instruction count 18, allocated bytes for code 81 (MethodHash=772b7154) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 7, PerfScore 20.00, instruction count 16, allocated bytes for code 67 (MethodHash=772b7154) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================ Unwind Info:

-14 (-17.28%) : 432810.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -35,22 +35,20 @@ G_M23551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1 - vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 {k1}, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1 - vpternlogd xmm4, xmm0, xmm1, -54 - vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0 + vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [rdi], xmm2 mov rax, rdi ; byrRegs +[rax]
- ;; size=62 bbWeight=1 PerfScore 12.58
+ ;; size=48 bbWeight=1 PerfScore 10.25
G_M23551_IG03: ; bbWeight=1, epilog, nogc, extend pop rbp ret ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 81, prolog size 7, PerfScore 22.33, instruction count 18, allocated bytes for code 81 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 7, PerfScore 20.00, instruction count 16, allocated bytes for code 67 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================ Unwind Info:

-7 (-2.86%) : 431769.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)

@@ -47,9 +47,8 @@ G_M25547_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpternlogd ymm1, ymm2, ymmword ptr [rbp+0x10], -54 vmovups ymm2, ymmword ptr [rbp-0x50] vpcmpud k1, ymm2, ymmword ptr [rbp-0x30], 1
- vpmovm2d ymm2, k1 - vmovups ymm3, ymmword ptr [rbp+0x30] - vpternlogd ymm2, ymm3, ymmword ptr [rbp+0x10], -54
+ vmovups ymm2, ymmword ptr [rbp+0x10] + vpblendmd ymm2 {k1}, ymm2, ymmword ptr [rbp+0x30]
vpternlogd ymm0, ymm1, ymm2, -54 vmovups ymmword ptr [rbp-0x70], ymm0 mov rdi, 0xD1FFAB1E @@ -61,7 +60,7 @@ G_M25547_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vmovups ymmword ptr [rax], ymm0 mov rax, bword ptr [rbp-0x08]
- ;; size=174 bbWeight=1 PerfScore 62.50
+ ;; size=167 bbWeight=1 PerfScore 61.50
G_M25547_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper add rsp, 240 @@ -69,7 +68,7 @@ G_M25547_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 245, prolog size 55, PerfScore 79.33, instruction count 43, allocated bytes for code 245 (MethodHash=df969c34) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
+; Total bytes of code 238, prolog size 55, PerfScore 78.33, instruction count 42, allocated bytes for code 239 (MethodHash=df969c34) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
; ============================================================ Unwind Info:

-7 (-2.86%) : 431752.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)

@@ -47,9 +47,8 @@ G_M22549_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpternlogd ymm1, ymm2, ymmword ptr [rbp+0x10], -54 vmovups ymm2, ymmword ptr [rbp-0x30] vpcmpud k1, ymm2, ymmword ptr [rbp-0x50], 6
- vpmovm2d ymm2, k1 - vmovups ymm3, ymmword ptr [rbp+0x10] - vpternlogd ymm2, ymm3, ymmword ptr [rbp+0x30], -54
+ vmovups ymm2, ymmword ptr [rbp+0x30] + vpblendmd ymm2 {k1}, ymm2, ymmword ptr [rbp+0x10]
vpternlogd ymm0, ymm1, ymm2, -54 vmovups ymmword ptr [rbp-0x70], ymm0 mov rdi, 0xD1FFAB1E @@ -61,7 +60,7 @@ G_M22549_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vmovups ymmword ptr [rax], ymm0 mov rax, bword ptr [rbp-0x08]
- ;; size=174 bbWeight=1 PerfScore 62.50
+ ;; size=167 bbWeight=1 PerfScore 61.50
G_M22549_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper add rsp, 240 @@ -69,7 +68,7 @@ G_M22549_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 245, prolog size 55, PerfScore 79.33, instruction count 43, allocated bytes for code 245 (MethodHash=7ad0a7ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
+; Total bytes of code 238, prolog size 55, PerfScore 78.33, instruction count 42, allocated bytes for code 239 (MethodHash=7ad0a7ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
; ============================================================ Unwind Info:

-28 (-2.61%) : 378853.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)

@@ -103,7 +103,7 @@ ; V91 cse1 [V91,T23] ( 3, 17.08) simd32 -> mm7 "CSE - aggressive" ; V92 cse2 [V92,T24] ( 3, 17.08) simd32 -> mm8 "CSE - aggressive" ; V93 cse3 [V93,T25] ( 3, 17.08) simd32 -> mm9 "CSE - aggressive"
-; V94 cse4 [V94,T26] ( 3, 17.08) simd32 -> mm11 "CSE - aggressive"
+; V94 cse4 [V94,T26] ( 3, 17.08) simd32 -> mm10 "CSE - aggressive"
; ; Lcl frame size = 168 @@ -180,13 +180,12 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1 - vmovups ymm11, ymmword ptr [reloc @RWD160] - vpsubb ymm12, ymm6, ymm11 - vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160] + vpsubb ymm11, ymm6, ymm10 + vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54 - vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11 + vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -197,14 +196,13 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1 - vpsubb ymm8, ymm4, ymm11
+ vpsubb ymm7, ymm4, ymm10
vmovups ymmword ptr [rbp-0x90], ymm3
- vpshufb ymm8, ymm3, ymm8
+ vpshufb ymm7, ymm3, ymm7
vmovups ymmword ptr [rbp-0x70], ymm2 vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54 - vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7 + vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm4, ymm4, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -213,7 +211,7 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vmovups ymmword ptr [rbp-0xB0], ymm4 vptest ymm4, ymm4 jne SHORT G_M48875_IG07
- ;; size=278 bbWeight=5.69 PerfScore 489.70
+ ;; size=264 bbWeight=5.69 PerfScore 484.00
G_M48875_IG05: ; bbWeight=4.77, gcVars=0000000000000401 {V04 V05}, gcrefRegs=0000 {}, byrefRegs=C008 {rbx r14 r15}, gcvars, byref ; byrRegs -[rcx] mov rcx, bword ptr [rbp-0xC0] @@ -335,12 +333,11 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpshufb xmm3, xmm5, xmm3 vpand xmm4, xmm4, xmmword ptr [reloc @RWD96] vpcmpub k1, xmm4, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm5, k1 - vpsubb xmm6, xmm4, xmmword ptr [reloc @RWD160] - vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm4, xmmword ptr [reloc @RWD160] + vpshufb xmm5, xmm1, xmm5
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm5, xmm6, xmm4, -54 - vpand xmm3, xmm5, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm5 + vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm3, xmm3, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -351,12 +348,11 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpshufb xmm4, xmm5, xmm4 vpand xmm2, xmm2, xmmword ptr [reloc @RWD96] vpcmpub k1, xmm2, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm5, k1 - vpsubb xmm6, xmm2, xmmword ptr [reloc @RWD160] - vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmmword ptr [reloc @RWD160] + vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54 - vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5 + vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm2, xmm2, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -364,7 +360,7 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpand xmm2, xmm3, xmm2 vptest xmm2, xmm2 je G_M48875_IG25
- ;; size=250 bbWeight=0.07 PerfScore 4.43
+ ;; size=236 bbWeight=0.07 PerfScore 4.36
G_M48875_IG18: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rbx r14 r15}, byref vpmovmskb r12d, xmm2 ;; size=4 bbWeight=0.07 PerfScore 0.13 @@ -474,7 +470,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1074, prolog size 42, PerfScore 652.51, instruction count 236, allocated bytes for code 1074 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
+; Total bytes of code 1046, prolog size 42, PerfScore 646.75, instruction count 232, allocated bytes for code 1046 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
; ============================================================ Unwind Info:

librariestestsnotieredcompilation.run.linux.x64.Release.mch

-28 (-2.59%) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -106,13 +106,13 @@ ; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate" ; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate" ; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate" ; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate" ; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate" ; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate" ; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
; ; Lcl frame size = 136 @@ -194,13 +194,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1 - vmovups ymm11, ymmword ptr [reloc @RWD160] - vpsubb ymm12, ymm6, ymm11 - vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160] + vpsubb ymm11, ymm6, ymm10 + vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54 - vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11 + vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -211,12 +210,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1 - vpsubb ymm8, ymm4, ymm11 - vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm4, ymm4, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
@@ -224,7 +222,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
  je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
 G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
  vpermq ymm4, ymm4, -40
  vpmovmskb r12d, ymm4
@@ -356,13 +354,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand xmm4, xmm4, xmm7
  vmovups xmm8, xmmword ptr [reloc @RWD128]
  vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
  vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
  vxorps xmm4, xmm4, xmm4
  vpcmpeqb xmm3, xmm3, xmm4
  vpcmpeqd xmm4, xmm4, xmm4
@@ -372,12 +369,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpshufb xmm4, xmm6, xmm4
  vpand xmm2, xmm2, xmm7
  vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
  vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
  vxorps xmm4, xmm4, xmm4
  vpcmpeqb xmm2, xmm2, xmm4
  vpcmpeqd xmm4, xmm4, xmm4
@@ -385,7 +381,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand xmm2, xmm3, xmm2
  vptest xmm2, xmm2
  je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
 G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
  vpmovmskb r12d, xmm2
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -473,7 +469,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1081, prolog size 43, PerfScore 1253.25, instruction count 240, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1053, prolog size 43, PerfScore 1245.25, instruction count 236, allocated bytes for code 1053 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
 ; ============================================================
 Unwind Info:

-14 (-1.65%) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

@@ -98,8 +98,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vmovups ymm1, ymmword ptr [rdi+0x10]
  vmovups ymmword ptr [rbp-0x50], ymm1
  vpcmpud k1, ymm0, ymm1, 6
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
+ vpblendmd ymm2 {k1}, ymm1, ymm0
  vmovups ymmword ptr [rbp-0x70], ymm2
  mov rdi, 0xD1FFAB1E ; System.Action`2[int,uint]
  ; gcrRegs -[rdi]
@@ -138,9 +137,9 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- ;; size=313 bbWeight=1 PerfScore 86.50
-G_M21446_IG03: ; bbWeight=1, extend
  vmovdqu xmm0, xmmword ptr [rbp-0xB0]
+ ;; size=314 bbWeight=1 PerfScore 88.33
+G_M21446_IG03: ; bbWeight=1, extend
  vpextrd edx, xmm0, 3
  mov esi, 3
  mov rdi, gword ptr [r15+0x08]
@@ -182,9 +181,8 @@ G_M21446_IG03: ; bbWeight=1, extend
  vmovups ymm0, ymmword ptr [rbp-0x30]
  vmovups ymm1, ymmword ptr [rbp-0x50]
  vpcmpud k1, ymm0, ymm1, 2
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
- vmovups ymmword ptr [rbp-0x90], ymm2
+ vpblendmd ymm0 {k1}, ymm1, ymm0
+ vmovups ymmword ptr [rbp-0x90], ymm0
  mov rdi, 0xD1FFAB1E ; System.Action`2[int,uint]
  call CORINFO_HELP_NEWSFAST
  ; gcrRegs +[rax]
@@ -199,63 +197,63 @@ G_M21446_IG03: ; bbWeight=1, extend
  ; byrRegs -[rdi]
  mov rdx, 0xD1FFAB1E ; code for <unknown method>
  mov qword ptr [r15+0x18], rdx
- vmovups ymm2, ymmword ptr [rbp-0x90]
- vmovups ymmword ptr [rbp-0xD0], ymm2
- vmovd edx, xmm2
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vmovups ymmword ptr [rbp-0xD0], ymm0
+ vmovd edx, xmm0
  xor esi, esi
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 1
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 1
  mov esi, 1
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 2
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 2
  mov esi, 2
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 3
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 3
  mov esi, 3
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vmovd edx, xmm0
- ;; size=368 bbWeight=1 PerfScore 139.25
-G_M21446_IG04: ; bbWeight=1, extend
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vmovd edx, xmm1
  mov esi, 4
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
+ ;; size=362 bbWeight=1 PerfScore 137.33
+G_M21446_IG04: ; bbWeight=1, extend
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 1
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 1
  mov esi, 5
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 2
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 2
  mov esi, 6
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm0, ymm0, 1
  vpextrd edx, xmm0, 3
  mov esi, 7
  mov rdi, gword ptr [r15+0x08]
@@ -263,7 +261,7 @@ G_M21446_IG04: ; bbWeight=1, extend
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi r15]
  nop
- ;; size=113 bbWeight=1 PerfScore 48.25
+ ;; size=104 bbWeight=1 PerfScore 46.00
 G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
  vzeroupper
  add rsp, 208
@@ -277,7 +275,7 @@ G_M21446_IG06: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
  int3
  ;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 847, prolog size 31, PerfScore 283.75, instruction count 165, allocated bytes for code 847 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 833, prolog size 31, PerfScore 281.42, instruction count 163, allocated bytes for code 833 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
 ; ============================================================
 Unwind Info:
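
The diff above swaps a `vpmovm2d` + `vpternlogd ..., -54` pair for a single `vpblendmd`. The Python sketch below is a simplified lane-level model of both sequences, meant only to illustrate why they would be expected to produce the same result. The helper names (`ternlog32`, `movm2d`, `blendmd`, `base_sequence`, `diff_sequence`) and the test values are made up for this illustration and are not part of the JIT; the operand semantics are an assumption based on a plain reading of the instructions.

```python
IMM8 = -54 & 0xFF  # the disassembly prints the immediate as a signed byte; this is 0xCA

def ternlog32(a, b, c, imm8):
    """Model one 32-bit lane of vpternlogd: each result bit is looked up in
    imm8 at index (a_bit << 2) | (b_bit << 1) | c_bit."""
    out = 0
    for i in range(32):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out

def movm2d(k, lanes):
    """Model vpmovm2d: expand each mask bit to an all-ones or all-zeros lane."""
    return [0xFFFFFFFF if (k >> i) & 1 else 0 for i in range(lanes)]

def blendmd(k, src1, src2):
    """Model vpblendmd dst {k}, src1, src2: take src2 where the mask bit is set."""
    return [s2 if (k >> i) & 1 else s1
            for i, (s1, s2) in enumerate(zip(src1, src2))]

def base_sequence(k, ymm0, ymm1):
    # vpmovm2d ymm2, k1 ; vpternlogd ymm2, ymm0, ymm1, -54
    ymm2 = movm2d(k, len(ymm0))
    return [ternlog32(m, a, b, IMM8) for m, a, b in zip(ymm2, ymm0, ymm1)]

def diff_sequence(k, ymm0, ymm1):
    # vpblendmd ymm2 {k1}, ymm1, ymm0
    return blendmd(k, ymm1, ymm0)

# Arbitrary lane values; every 8-bit mask is checked.
ymm0 = [0x11111111 * (i + 1) for i in range(8)]
ymm1 = [0xF0F0F0F0 ^ i for i in range(8)]
agree = all(base_sequence(k, ymm0, ymm1) == diff_sequence(k, ymm0, ymm1)
            for k in range(256))
print(hex(IMM8), agree)
```

In this model, immediate `0xCA` acts as a bitwise select (first operand chooses between the second and third), which is why a mask expanded to full lanes followed by the ternary-logic op collapses to a masked blend.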

smoke_tests.nativeaot.linux.x64.checked.mch

-28 (-2.61%) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -105,13 +105,13 @@
 ; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
 ; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
 ; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
 ; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
 ; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
 ; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
 ; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
 ; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
 ;
 ; Lcl frame size = 136
@@ -193,13 +193,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand ymm6, ymm6, ymm8
  vmovups ymm9, ymmword ptr [reloc @RWD128]
  vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
  vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm5, ymm5, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
@@ -210,12 +209,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpshufb ymm6, ymm7, ymm6
  vpand ymm4, ymm4, ymm8
  vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm4, ymm4, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
@@ -223,7 +221,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
  je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
 G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
  vpermq ymm4, ymm4, -40
  vpmovmskb r12d, ymm4
@@ -355,13 +353,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand xmm4, xmm4, xmm7
  vmovups xmm8, xmmword ptr [reloc @RWD128]
  vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
  vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
  vxorps xmm4, xmm4, xmm4
  vpcmpeqb xmm3, xmm3, xmm4
  vpcmpeqd xmm4, xmm4, xmm4
@@ -371,12 +368,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpshufb xmm4, xmm6, xmm4
  vpand xmm2, xmm2, xmm7
  vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
  vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
  vxorps xmm4, xmm4, xmm4
  vpcmpeqb xmm2, xmm2, xmm4
  vpcmpeqd xmm4, xmm4, xmm4
@@ -384,7 +380,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand xmm2, xmm3, xmm2
  vptest xmm2, xmm2
  je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
 G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
  vpmovmskb r12d, xmm2
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -472,7 +468,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1072, prolog size 43, PerfScore 1188.50, instruction count 240, allocated bytes for code 1072 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1044, prolog size 43, PerfScore 1180.50, instruction count 236, allocated bytes for code 1044 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
 ; ============================================================
 Cfi Info:

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.linux.x64.checked.mch | 2 | 2 | 0 | 0 | -42 | +0 |
| benchmarks.run_pgo.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.linux.x64.checked.mch | 16 | 16 | 0 | 0 | -528 | +0 |
| libraries.crossgen2.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.linux.x64.checked.mch | 24 | 24 | 0 | 0 | -469 | +0 |
| libraries_tests.run.linux.x64.Release.mch | 75 | 75 | 0 | 0 | -8,897 | +0 |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 7 | 7 | 0 | 0 | -756 | +0 |
| realworld.run.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| | 125 | 125 | 0 | 0 | -10,720 | +0 |
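
The final row of the table is the column-wise total of the rows above it. As a quick sanity check, the short Python sketch below re-adds two of the columns; the values are copied from the table, while the variable names and the shortened collection labels are only illustrative.

```python
# (collection, contexts with diffs, improvement bytes), copied from the table above
rows = [
    ("benchmarks.run", 2, -42),
    ("benchmarks.run_pgo", 0, 0),
    ("benchmarks.run_tiered", 0, 0),
    ("coreclr_tests.run", 16, -528),
    ("libraries.crossgen2", 0, 0),
    ("libraries.pmi", 24, -469),
    ("libraries_tests.run", 75, -8897),
    ("librariestestsnotieredcompilation.run", 7, -756),
    ("realworld.run", 0, 0),
    ("smoke_tests.nativeaot", 1, -28),
]

total_contexts = sum(contexts for _, contexts, _ in rows)
total_bytes = sum(size for _, _, size in rows)
print(total_contexts, total_bytes)  # 125 -10720
```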

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.linux.x64.checked.mch | 42,857 | 3,142 | 39,715 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.x64.checked.mch | 158,377 | 60,175 | 98,202 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.linux.x64.checked.mch | 56,500 | 42,284 | 14,216 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.x64.checked.mch | 596,771 | 354,686 | 242,085 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.x64.checked.mch | 234,032 | 15 | 234,017 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.x64.checked.mch | 296,234 | 6 | 296,228 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.x64.Release.mch | 761,652 | 495,580 | 266,072 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 305,348 | 21,873 | 283,475 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.x64.checked.mch | 33,069 | 9 | 33,060 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.x64.checked.mch | 27,422 | 10 | 27,412 | 0 (0.00%) | 0 (0.00%) |
| | 2,512,262 | 977,780 | 1,534,482 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

benchmarks.run.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16454856 (overridden on cmd)
Total bytes of diff: 16454814 (overridden on cmd)
Total bytes of delta: -42 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
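
The `-0.00 %` figure in these summaries does not mean the delta is zero; the change is simply too small relative to the base to show at two decimal places. The Python sketch below recomputes the delta and percentage from the two totals above. The function name `size_delta` and the two-decimal formatting are assumptions made for illustration, not a description of how jit-analyze is implemented.

```python
def size_delta(base_bytes, diff_bytes):
    """Return (delta in bytes, delta as a percentage of the base size)."""
    delta = diff_bytes - base_bytes
    pct = 100.0 * delta / base_bytes
    return delta, pct

# Totals from the benchmarks.run.linux.x64.checked.mch summary above.
delta, pct = size_delta(16454856, 16454814)
print(f"{delta} ({pct:.2f} % of base)")  # -42 (-0.00 % of base)
print(f"{pct:.6f}")                      # more digits show the delta is nonzero
```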

Detail diffs



Top file improvements (bytes):
         -28 : 27287.dasm (-2.59 % of base)
         -14 : 10422.dasm (-4.14 % of base)

2 total files with Code Size differences (2 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.59 % of base) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-4.14 % of base) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)

Top method improvements (percentages):
         -14 (-4.14 % of base) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
         -28 (-2.59 % of base) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

2 total methods with Code Size differences (2 improved, 0 regressed).


coreclr_tests.run.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 403726743 (overridden on cmd)
Total bytes of diff: 403726215 (overridden on cmd)
Total bytes of delta: -528 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 491717.dasm (-3.11 % of base)
         -56 : 491721.dasm (-3.14 % of base)
         -56 : 491718.dasm (-3.07 % of base)
         -56 : 491722.dasm (-3.09 % of base)
         -32 : 205671.dasm (-0.78 % of base)
         -32 : 205678.dasm (-0.78 % of base)
         -32 : 205673.dasm (-0.77 % of base)
         -32 : 205679.dasm (-0.77 % of base)
         -28 : 491719.dasm (-1.57 % of base)
         -28 : 491715.dasm (-1.60 % of base)
         -28 : 491720.dasm (-1.57 % of base)
         -28 : 491716.dasm (-1.57 % of base)
         -16 : 205676.dasm (-0.39 % of base)
         -16 : 205670.dasm (-0.39 % of base)
         -16 : 205669.dasm (-0.40 % of base)
         -16 : 205675.dasm (-0.39 % of base)

16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-3.07 % of base) : 491718.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-3.14 % of base) : 491721.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-3.09 % of base) : 491722.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-3.11 % of base) : 491717.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -32 (-0.77 % of base) : 205673.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 205678.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 205679.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 205671.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -28 (-1.57 % of base) : 491720.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.60 % of base) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.57 % of base) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.57 % of base) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -16 (-0.39 % of base) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.40 % of base) : 205669.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

Top method improvements (percentages):
         -56 (-3.14 % of base) : 491721.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-3.11 % of base) : 491717.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -56 (-3.09 % of base) : 491722.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-3.07 % of base) : 491718.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -28 (-1.60 % of base) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.57 % of base) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.57 % of base) : 491720.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.57 % of base) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -32 (-0.78 % of base) : 205678.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 205671.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 205679.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 205673.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -16 (-0.40 % of base) : 205669.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

16 total methods with Code Size differences (16 improved, 0 regressed).


libraries.pmi.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 60288822 (overridden on cmd)
Total bytes of diff: 60288353 (overridden on cmd)
Total bytes of delta: -469 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 207128.dasm (-20.51 % of base)
         -56 : 207071.dasm (-20.51 % of base)
         -28 : 207073.dasm (-17.18 % of base)
         -28 : 207130.dasm (-17.18 % of base)
         -28 : 20791.dasm (-2.59 % of base)
         -21 : 207070.dasm (-20.19 % of base)
         -21 : 207092.dasm (-20.19 % of base)
         -21 : 207127.dasm (-20.19 % of base)
         -21 : 207149.dasm (-20.19 % of base)
         -14 : 207072.dasm (-15.56 % of base)
         -14 : 207068.dasm (-17.28 % of base)
         -14 : 207091.dasm (-16.67 % of base)
         -14 : 207126.dasm (-16.67 % of base)
         -14 : 207148.dasm (-16.67 % of base)
         -14 : 20784.dasm (-5.19 % of base)
         -14 : 207125.dasm (-17.28 % of base)
         -14 : 207129.dasm (-15.56 % of base)
         -14 : 207147.dasm (-17.28 % of base)
         -14 : 20786.dasm (-5.41 % of base)
         -14 : 207069.dasm (-16.67 % of base)

24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-20.51 % of base) : 207071.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.51 % of base) : 207128.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -28 (-2.59 % of base) : 20791.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -28 (-17.18 % of base) : 207073.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-17.18 % of base) : 207130.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -21 (-20.19 % of base) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207092.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-5.41 % of base) : 20786.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-5.19 % of base) : 20784.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207068.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-15.56 % of base) : 207072.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.67 % of base) : 207069.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207090.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207091.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207125.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-15.56 % of base) : 207129.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.67 % of base) : 207126.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207147.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

Top method improvements (percentages):
         -56 (-20.51 % of base) : 207071.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.51 % of base) : 207128.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -21 (-20.19 % of base) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207092.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207068.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207090.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207125.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207147.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -28 (-17.18 % of base) : 207073.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-17.18 % of base) : 207130.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-16.67 % of base) : 207069.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207091.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207126.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207148.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-15.56 % of base) : 207072.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-15.56 % of base) : 207129.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
          -7 (-5.51 % of base) : 20787.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-5.41 % of base) : 20786.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

24 total methods with Code Size differences (24 improved, 0 regressed).


libraries_tests.run.linux.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 342241520 (overridden on cmd)
Total bytes of diff: 342232623 (overridden on cmd)
Total bytes of delta: -8897 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -350 : 432478.dasm (-17.22 % of base)
        -350 : 432506.dasm (-19.84 % of base)
        -350 : 433621.dasm (-15.43 % of base)
        -350 : 433873.dasm (-15.43 % of base)
        -350 : 437375.dasm (-15.26 % of base)
        -350 : 437419.dasm (-15.26 % of base)
        -350 : 439257.dasm (-19.56 % of base)
        -350 : 439478.dasm (-15.26 % of base)
        -350 : 439729.dasm (-15.26 % of base)
        -350 : 446906.dasm (-15.43 % of base)
        -350 : 446913.dasm (-17.22 % of base)
        -350 : 446980.dasm (-15.43 % of base)
        -350 : 448831.dasm (-15.26 % of base)
        -350 : 448983.dasm (-15.26 % of base)
        -350 : 449381.dasm (-15.26 % of base)
        -350 : 449490.dasm (-15.26 % of base)
        -350 : 449527.dasm (-17.02 % of base)
        -350 : 449532.dasm (-19.56 % of base)
        -182 : 443696.dasm (-17.43 % of base)
        -182 : 443767.dasm (-17.42 % of base)

62 total files with Code Size differences (62 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -350 (-17.22 % of base) : 432478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-17.22 % of base) : 446913.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-19.84 % of base) : 432506.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-17.02 % of base) : 449527.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 439257.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 449532.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 433873.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 446906.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 433621.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 446980.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 437375.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 439478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 448831.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 449381.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 437419.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 439729.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 448983.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 449490.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -182 (-17.43 % of base) : 443696.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.43 % of base) : 446405.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)

Top method improvements (percentages):
         -28 (-20.29 % of base) : 447031.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
        -350 (-19.84 % of base) : 432506.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 439257.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 449532.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -182 (-17.43 % of base) : 443696.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.43 % of base) : 446405.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.42 % of base) : 443767.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.42 % of base) : 446518.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
         -14 (-17.28 % of base) : 437547.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]](System.Runtime.Intrinsics.Vector128`1[ulong]):ulong (Tier1)
         -14 (-17.28 % of base) : 449076.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.Runtime.Intrinsics.Vector128`1[ulong]):ulong (Tier1)
         -14 (-17.28 % of base) : 432810.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.28 % of base) : 447015.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.28 % of base) : 433842.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.28 % of base) : 433093.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
        -350 (-17.22 % of base) : 432478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-17.22 % of base) : 446913.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
         -14 (-17.07 % of base) : 437311.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-17.07 % of base) : 437354.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-17.07 % of base) : 439184.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-17.07 % of base) : 449015.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)


librariestestsnotieredcompilation.run.linux.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 132684790 (overridden on cmd)
Total bytes of diff: 132684034 (overridden on cmd)
Total bytes of delta: -756 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
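
The delta reported above is the diff total minus the base total, and the percentage shows as -0.00 because 756 bytes is far below one hundredth of a percent of the collection. A minimal Python sketch of that arithmetic, using the totals printed for this collection (the helper name `code_size_delta` is illustrative and is not part of superpmi.py):

```python
def code_size_delta(base_bytes: int, diff_bytes: int) -> tuple:
    """Return (delta in bytes, delta as a percentage of base formatted to two decimals)."""
    delta = diff_bytes - base_bytes
    percent = f"{delta / base_bytes * 100:.2f}"
    return delta, percent


# Totals copied from the summary for this collection.
delta, percent = code_size_delta(132684790, 132684034)
print(f"Total bytes of delta: {delta} ({percent} % of base)")
```

A negative delta means the diff build produced less code than the base build, which is why the report labels it an improvement.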

Detail diffs



Top file improvements (bytes):
        -182 : 160726.dasm (-17.91 % of base)
        -182 : 160167.dasm (-17.91 % of base)
        -154 : 160688.dasm (-17.21 % of base)
         -98 : 160706.dasm (-15.38 % of base)
         -98 : 160519.dasm (-15.38 % of base)
         -28 : 142512.dasm (-2.59 % of base)
         -14 : 161431.dasm (-1.65 % of base)

7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -182 (-17.91 % of base) : 160726.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.91 % of base) : 160167.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.21 % of base) : 160688.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.38 % of base) : 160706.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.38 % of base) : 160519.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.59 % of base) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-1.65 % of base) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

Top method improvements (percentages):
        -182 (-17.91 % of base) : 160726.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.91 % of base) : 160167.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.21 % of base) : 160688.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.38 % of base) : 160706.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.38 % of base) : 160519.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.59 % of base) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-1.65 % of base) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

7 total methods with Code Size differences (7 improved, 0 regressed).


smoke_tests.nativeaot.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 4195910 (overridden on cmd)
Total bytes of diff: 4195882 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 2104.dasm (-2.61 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.61 % of base) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
         -28 (-2.61 % of base) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).



osx arm64

Diffs are based on 2,236,017 contexts (927,360 MinOpts, 1,308,657 FullOpts).

No diffs found.

Details

Context information

Collection Diffed contexts MinOpts FullOpts Missed, base Missed, diff
benchmarks.run_pgo.osx.arm64.checked.mch 84,826 48,345 36,481 0 (0.00%) 0 (0.00%)
benchmarks.run_tiered.osx.arm64.checked.mch 48,316 37,331 10,985 0 (0.00%) 0 (0.00%)
coreclr_tests.run.osx.arm64.checked.mch 586,585 358,028 228,557 0 (0.00%) 0 (0.00%)
libraries.crossgen2.osx.arm64.checked.mch 233,760 15 233,745 0 (0.00%) 0 (0.00%)
libraries.pmi.osx.arm64.checked.mch 315,616 18 315,598 0 (0.00%) 0 (0.00%)
libraries_tests.run.osx.arm64.Release.mch 632,257 462,062 170,195 0 (0.00%) 0 (0.00%)
librariestestsnotieredcompilation.run.osx.arm64.Release.mch 303,114 21,558 281,556 0 (0.00%) 0 (0.00%)
realworld.run.osx.arm64.checked.mch 31,543 3 31,540 0 (0.00%) 0 (0.00%)
2,236,017 927,360 1,308,657 0 (0.00%) 0 (0.00%)


windows arm64

Diffs are based on 2,314,798 contexts (929,692 MinOpts, 1,385,106 FullOpts).

No diffs found.

Details

Context information

Collection Diffed contexts MinOpts FullOpts Missed, base Missed, diff
benchmarks.run.windows.arm64.checked.mch 24,447 4 24,443 0 (0.00%) 0 (0.00%)
benchmarks.run_pgo.windows.arm64.checked.mch 96,983 48,066 48,917 0 (0.00%) 0 (0.00%)
benchmarks.run_tiered.windows.arm64.checked.mch 48,473 36,693 11,780 0 (0.00%) 0 (0.00%)
coreclr_tests.run.windows.arm64.checked.mch 595,703 362,539 233,164 0 (0.00%) 0 (0.00%)
libraries.crossgen2.windows.arm64.checked.mch 243,831 15 243,816 0 (0.00%) 0 (0.00%)
libraries.pmi.windows.arm64.checked.mch 304,871 6 304,865 0 (0.00%) 0 (0.00%)
libraries_tests.run.windows.arm64.Release.mch 626,054 460,799 165,255 0 (0.00%) 0 (0.00%)
librariestestsnotieredcompilation.run.windows.arm64.Release.mch 317,037 21,559 295,478 0 (0.00%) 0 (0.00%)
realworld.run.windows.arm64.checked.mch 33,244 3 33,241 0 (0.00%) 0 (0.00%)
smoke_tests.nativeaot.windows.arm64.checked.mch 24,155 8 24,147 0 (0.00%) 0 (0.00%)
2,314,798 929,692 1,385,106 0 (0.00%) 0 (0.00%)


windows x64

Diffs are based on 2,373,201 contexts (928,756 MinOpts, 1,444,445 FullOpts).

Overall (-2,856 bytes)

Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,749,502 -28
coreclr_tests.run.windows.x64.checked.mch 393,893,406 -528
libraries.pmi.windows.x64.checked.mch 61,525,850 -499
libraries_tests.run.windows.x64.Release.mch 279,744,051 -1,014
librariestestsnotieredcompilation.run.windows.x64.Release.mch 137,525,226 -759
smoke_tests.nativeaot.windows.x64.checked.mch 5,089,881 -28

MinOpts (-248 bytes)

Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.windows.x64.checked.mch 273,505,068 -192
libraries_tests.run.windows.x64.Release.mch 175,004,596 -56

FullOpts (-2,608 bytes)

Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,749,141 -28
coreclr_tests.run.windows.x64.checked.mch 120,388,338 -336
libraries.pmi.windows.x64.checked.mch 61,412,331 -499
libraries_tests.run.windows.x64.Release.mch 104,739,455 -958
librariestestsnotieredcompilation.run.windows.x64.Release.mch 126,648,064 -759
smoke_tests.nativeaot.windows.x64.checked.mch 5,088,934 -28
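
The three windows x64 tables above are related: each collection's overall delta should equal its MinOpts delta plus its FullOpts delta, and the column sums should match the section headings. A short Python sketch that cross-checks the figures as printed (collection names are shortened here for readability, and a collection missing from a table is assumed to contribute 0 bytes):

```python
# Per-collection deltas in bytes, copied from the windows x64 tables.
overall = {
    "benchmarks.run": -28,
    "coreclr_tests.run": -528,
    "libraries.pmi": -499,
    "libraries_tests.run": -1014,
    "librariestestsnotieredcompilation.run": -759,
    "smoke_tests.nativeaot": -28,
}
min_opts = {
    "coreclr_tests.run": -192,
    "libraries_tests.run": -56,
}
full_opts = {
    "benchmarks.run": -28,
    "coreclr_tests.run": -336,
    "libraries.pmi": -499,
    "libraries_tests.run": -958,
    "librariestestsnotieredcompilation.run": -759,
    "smoke_tests.nativeaot": -28,
}


def mismatched_collections() -> list:
    """List collections whose overall delta is not MinOpts + FullOpts."""
    return [
        name
        for name, total in overall.items()
        if min_opts.get(name, 0) + full_opts.get(name, 0) != total
    ]


print("Overall:", sum(overall.values()))
print("MinOpts:", sum(min_opts.values()))
print("FullOpts:", sum(full_opts.values()))
print("Mismatches:", mismatched_collections())
```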

Example diffs

benchmarks.run.windows.x64.checked.mch

-28 (-2.50%) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm1, ymm3, ymm1
  vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm1, ymm1, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm2, ymm3, ymm2
  vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm0, ymm0, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand ymm0, ymm1, ymm0
  vptest ymm0, ymm0
  je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
  G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpermq ymm0, ymm0, -40
  vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm1, xmm10, xmm1
  vpand xmm2, xmm2, xmm11
  vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm1, xmm1, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm2, xmm10, xmm2
  vpand xmm0, xmm0, xmm11
  vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm0, xmm0, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand xmm0, xmm1, xmm0
  vptest xmm0, xmm0
  je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
  G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpmovmskb ebp, xmm0
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
  RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
  ; ============================================================
  Unwind Info:

coreclr_tests.run.windows.x64.checked.mch

-28 (-1.46%) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)

@@ -331,10 +331,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm8, ymm6, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -389,10 +388,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm8, ymm7, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -447,10 +445,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm8, ymm6, 5
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -505,10 +502,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm7, ymm6, 5
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -687,7 +683,7 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
  ret
  ;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1922, prolog size 82, PerfScore 1043.58, instruction count 381, allocated bytes for code 1922 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1894, prolog size 82, PerfScore 1038.92, instruction count 377, allocated bytes for code 1894 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
  ; ============================================================
  Unwind Info:
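
The diff above replaces a `vpmovm2d` + `vpternlogd ..., -54` pair with a single `vpblendmd`. The immediate -54 is 0xCA as an unsigned byte, which is the ternary-logic truth table for "A ? B : C", so both sequences pick the second source where the mask bit is set and the first where it is clear. The following Python sketch models one 32-bit element to illustrate that equivalence; the function names are hypothetical, and it is a scalar model of the instruction semantics as documented by Intel, not the JIT's implementation:

```python
IMM8 = -54 & 0xFF  # 0xCA: truth table for A ? B : C


def ternlog(a: int, b: int, c: int, imm8: int, width: int = 32) -> int:
    """Bitwise ternary logic: each bit triple (a, b, c) indexes into imm8."""
    result = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


def select_via_ternlog(mask_bit: int, if_clear: int, if_set: int, width: int = 32) -> int:
    """Old sequence: expand the mask bit to all-ones/all-zeros, then ternlog."""
    expanded = (1 << width) - 1 if mask_bit else 0  # models vpmovm2d for one element
    return ternlog(expanded, if_set, if_clear, imm8=IMM8, width=width)


def select_via_blend(mask_bit: int, if_clear: int, if_set: int) -> int:
    """New sequence: masked blend picks if_set where the mask bit is 1."""
    return if_set if mask_bit else if_clear


for mask_bit in (0, 1):
    old = select_via_ternlog(mask_bit, if_clear=0x11111111, if_set=0xDEADBEEF)
    new = select_via_blend(mask_bit, if_clear=0x11111111, if_set=0xDEADBEEF)
    print(mask_bit, hex(old), hex(new), old == new)
```

Because the blend consumes the `k1` mask directly, the intermediate mask-to-vector expansion is no longer needed, which matches the 7-byte saving per site shown in the size annotations (22 to 15).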

-28 (-1.43%) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)

@@ -331,10 +331,9 @@ G_M59915_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm8, ymm6, 2
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -389,10 +388,9 @@ G_M59915_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm8, ymm7, 2
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -447,10 +445,9 @@ G_M59915_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm8, ymm6, 5
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -505,10 +502,9 @@ G_M59915_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm7, ymm6, 5
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -687,7 +683,7 @@ G_M59915_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
  ret
  ;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1956, prolog size 82, PerfScore 1049.58, instruction count 381, allocated bytes for code 1956 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 1928, prolog size 82, PerfScore 1044.92, instruction count 377, allocated bytes for code 1928 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
  ; ============================================================
  Unwind Info:

-28 (-1.42%) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)

@@ -334,10 +334,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M8563_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpw k1, ymm6, ymm7, 2
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
  G_M8563_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -392,10 +391,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpw k1, ymm6, ymm8, 2
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
  G_M8563_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -450,10 +448,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M8563_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpw k1, ymm6, ymm7, 5
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
  G_M8563_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -508,10 +505,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M8563_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpw k1, ymm8, ymm7, 5
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
  G_M8563_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -690,7 +686,7 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
  ret
  ;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1968, prolog size 82, PerfScore 1206.33, instruction count 383, allocated bytes for code 1968 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
+; Total bytes of code 1940, prolog size 82, PerfScore 1204.33, instruction count 379, allocated bytes for code 1940 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
  ; ============================================================
  Unwind Info:

-16 (-0.39%) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)

@@ -437,14 +437,13 @@ G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2q ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x100], ecx jmp G_M59915_IG25
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 4 jae G_M59915_IG54 @@ -523,14 +522,13 @@ G_M59915_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x108], ecx jmp G_M59915_IG30
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 4 jae G_M59915_IG54 @@ -609,14 +607,13 @@ G_M59915_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x110], ecx jmp G_M59915_IG35
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 4 jae G_M59915_IG54 @@ -695,14 +692,13 @@ G_M59915_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x118], ecx jmp G_M59915_IG40
- ;; size=75 bbWeight=1 PerfScore 23.25
+ ;; size=71 bbWeight=1 PerfScore 22.25
G_M59915_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 4 jae G_M59915_IG54 @@ -964,7 +960,7 @@ G_M59915_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4100, prolog size 77, PerfScore 831.26, instruction count 657, allocated bytes for code 4100 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
+; Total bytes of code 4084, prolog size 77, PerfScore 827.26, instruction count 653, allocated bytes for code 4088 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
; ============================================================ Unwind Info:

-16 (-0.39%) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)

@@ -443,14 +443,13 @@ G_M44299_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x100], ecx jmp G_M44299_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 32 jae G_M44299_IG54 @@ -529,14 +528,13 @@ G_M44299_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x108], ecx jmp G_M44299_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 32 jae G_M44299_IG54 @@ -615,14 +613,13 @@ G_M44299_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x110], ecx jmp G_M44299_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 32 jae G_M44299_IG54 @@ -701,14 +698,13 @@ G_M44299_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x118], ecx jmp G_M44299_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M44299_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 32 jae G_M44299_IG54 @@ -970,7 +966,7 @@ G_M44299_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4123, prolog size 77, PerfScore 865.01, instruction count 663, allocated bytes for code 4123 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
+; Total bytes of code 4107, prolog size 77, PerfScore 865.01, instruction count 659, allocated bytes for code 4111 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
; ============================================================ Unwind Info:

-16 (-0.39%) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

@@ -443,14 +443,13 @@ G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x100], ecx jmp G_M8563_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 16 jae G_M8563_IG54 @@ -529,14 +528,13 @@ G_M8563_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x108], ecx jmp G_M8563_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 16 jae G_M8563_IG54 @@ -615,14 +613,13 @@ G_M8563_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x110], ecx jmp G_M8563_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 16 jae G_M8563_IG54 @@ -701,14 +698,13 @@ G_M8563_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x118], ecx jmp G_M8563_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M8563_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 16 jae G_M8563_IG54 @@ -970,7 +966,7 @@ G_M8563_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {} int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4123, prolog size 77, PerfScore 865.01, instruction count 663, allocated bytes for code 4123 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
+; Total bytes of code 4107, prolog size 77, PerfScore 865.01, instruction count 659, allocated bytes for code 4111 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
; ============================================================ Unwind Info:

libraries.pmi.windows.x64.checked.mch

-21 (-20.39%) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -17,7 +17,7 @@ ; V06 tmp1 [V06,T05] ( 3, 3 ) simd64 -> mm1 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V07 tmp2 [V07,T06] ( 3, 3 ) simd64 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm0 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V11 cse0 [V11,T03] ( 5, 5 ) simd64 -> mm0 "CSE - aggressive" ; V12 cse1 [V12,T04] ( 4, 4 ) simd64 -> mm2 "CSE - aggressive" @@ -34,25 +34,22 @@ G_M27576_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vmovups zmm2, zmmword ptr [r8] vmovaps zmm3, zmm2 vpcmpeqb k1, zmm1, zmm3
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm0, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm1, zmm3, 6
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm0, zmm2, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm0, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm1, zmm3, 6
+ vpblendmb zmm0 {k2}, zmm2, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx ; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M27576_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret ;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-21 (-20.39%) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -17,7 +17,7 @@ ; V06 tmp1 [V06,T05] ( 3, 3 ) simd64 -> mm1 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V07 tmp2 [V07,T06] ( 3, 3 ) simd64 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm0 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V11 cse0 [V11,T03] ( 5, 5 ) simd64 -> mm2 "CSE - aggressive" ; V12 cse1 [V12,T04] ( 4, 4 ) simd64 -> mm0 "CSE - aggressive" @@ -34,25 +34,22 @@ G_M10214_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vmovups zmm2, zmmword ptr [r8] vmovaps zmm3, zmm2 vpcmpeqb k1, zmm3, zmm1
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm2, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm3, zmm1, 1
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm2, zmm0, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm2, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm3, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm0, zmm2
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx ; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret ;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-21 (-20.39%) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T01] ( 3, 6 ) byref -> r8 single-def ; V03 loc0 [V03,T05] ( 3, 3 ) simd64 -> mm1 <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V04 loc1 [V04,T06] ( 3, 3 ) simd64 -> mm3 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T07] ( 2, 2 ) simd64 -> mm4 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T07] ( 2, 2 ) simd64 -> mm0 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" @@ -34,25 +34,22 @@ G_M22834_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vmovups zmm2, zmmword ptr [r8] vmovaps zmm3, zmm2 vpcmpeqb k1, zmm1, zmm3
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm0, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm1, zmm3, 6
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm0, zmm2, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm0, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm1, zmm3, 6
+ vpblendmb zmm0 {k2}, zmm2, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx ; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret ;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-7 (-5.65%) : 27696.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -35,14 +35,13 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r vpshufb ymm1, ymm2, ymm1 vpand ymm0, ymm0, ymmword ptr [reloc @RWD64] vpcmpub k1, ymm0, ymmword ptr [reloc @RWD96], 6
- vpmovm2b ymm2, k1
- vmovups ymm3, ymmword ptr [r8]
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD128]
- vpshufb ymm3, ymm3, ymm4
- vmovups ymm4, ymmword ptr [rdx]
- vpshufb ymm0, ymm4, ymm0
- vpternlogd ymm2, ymm3, ymm0, -54
- vpand ymm0, ymm2, ymm1
+ vmovups ymm2, ymmword ptr [r8]
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD128]
+ vpshufb ymm2, ymm2, ymm3
+ vmovups ymm3, ymmword ptr [rdx]
+ vpshufb ymm0, ymm3, ymm0
+ vpblendmb ymm0 {k1}, ymm0, ymm2
+ vpand ymm0, ymm0, ymm1
vxorps ymm1, ymm1, ymm1 vpcmpeqb ymm0, ymm0, ymm1 vpcmpeqd ymm1, ymm1, ymm1 @@ -50,7 +49,7 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r vmovups ymmword ptr [rcx], ymm0 mov rax, rcx ; byrRegs +[rax]
- ;; size=117 bbWeight=1 PerfScore 42.75
+ ;; size=110 bbWeight=1 PerfScore 42.25
G_M53822_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret @@ -62,7 +61,7 @@ RWD96 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD128 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 124, prolog size 3, PerfScore 45.75, instruction count 24, allocated bytes for code 124 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 117, prolog size 3, PerfScore 45.25, instruction count 23, allocated bytes for code 117 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-7 (-4.58%) : 293786.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -39,9 +39,8 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, vpandd zmm3, zmm4, dword ptr [reloc @RWD128] {1to16} vpord zmm4, zmm3, dword ptr [reloc @RWD132] {1to16} vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1
- vpslld zmm5, zmm4, 1
- vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1
+ vpblendmd zmm2 {k1}, zmm2, zmm4
vpslld zmm0, zmm0, 13 vpandd zmm0, zmm0, dword ptr [reloc @RWD136] {1to16} vpaddd zmm0, zmm0, zmm2 @@ -50,7 +49,7 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, vmovups zmmword ptr [rcx], zmm0 mov rax, rcx ; byrRegs +[rax]
- ;; size=146 bbWeight=1 PerfScore 33.25
+ ;; size=139 bbWeight=1 PerfScore 32.25
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret @@ -65,7 +64,7 @@ RWD132 dd 38000000h RWD136 dd 0FFFE000h
-; Total bytes of code 153, prolog size 3, PerfScore 36.25, instruction count 24, allocated bytes for code 153 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 146, prolog size 3, PerfScore 35.25, instruction count 23, allocated bytes for code 146 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================ Unwind Info:

-28 (-2.50%) : 27702.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb ymm1, ymm3, ymm1 vpand ymm2, ymm2, ymmword ptr [reloc @RWD96] vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2 vpcmpeqb ymm1, ymm1, ymm2 vpcmpeqd ymm2, ymm2, ymm2 @@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb ymm2, ymm3, ymm2 vpand ymm0, ymm0, ymmword ptr [reloc @RWD96] vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2 vpcmpeqb ymm0, ymm0, ymm2 vpcmpeqd ymm2, ymm2, ymm2 @@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpand ymm0, ymm1, ymm0 vptest ymm0, ymm0 je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref vpermq ymm0, ymm0, -40 vpmovmskb ebp, ymm0 @@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb xmm1, xmm10, xmm1 vpand xmm2, xmm2, xmm11 vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2 vpcmpeqb xmm1, xmm1, xmm2 vpcmpeqd xmm2, xmm2, xmm2 @@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb xmm2, xmm10, xmm2 vpand xmm0, xmm0, xmm11 vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2 vpcmpeqb xmm0, xmm0, xmm2 vpcmpeqd xmm2, xmm2, xmm2 @@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpand xmm0, xmm1, xmm0 vptest xmm0, xmm0 je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref vpmovmskb ebp, xmm0 ;; size=4 bbWeight=2 PerfScore 4.00 @@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

libraries_tests.run.windows.x64.Release.mch

-14 (-16.67%) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)

@@ -37,21 +37,19 @@ G_M11551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vpcmpeqq xmm4, xmm1, xmm3 vxorps xmm5, xmm5, xmm5 vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4 mov rax, rcx ; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M11551_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================ Unwind Info:

-14 (-16.67%) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)

@@ -36,21 +36,19 @@ G_M1813_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r8 vpcmpeqq xmm4, xmm1, xmm3 vxorps xmm5, xmm5, xmm5 vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4 mov rax, rcx ; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M1813_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=f881f8ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=f881f8ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================ Unwind Info:

-14 (-16.67%) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)

@@ -37,21 +37,19 @@ G_M11551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vpcmpeqq xmm4, xmm1, xmm3 vxorps xmm5, xmm5, xmm5 vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4 mov rax, rcx ; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M11551_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================ Unwind Info:

-7 (-2.52%) : 392581.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)

@@ -59,16 +59,17 @@ G_M12395_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpternlogq ymm1, ymm2, ymmword ptr [rcx], -54 vmovups ymm2, ymmword ptr [rbp-0x50] vpcmpuq k1, ymm2, ymmword ptr [rbp-0x30], 1
- vpmovm2q ymm2, k1
mov rcx, bword ptr [rbp+0x20]
- vmovups ymm3, ymmword ptr [rcx] - mov rcx, bword ptr [rbp+0x18] - vpternlogq ymm2, ymm3, ymmword ptr [rcx], -54
+ mov rax, bword ptr [rbp+0x18] + ; byrRegs +[rax] + vmovups ymm2, ymmword ptr [rax] + vpblendmq ymm2 {k1}, ymm2, ymmword ptr [rcx]
vpternlogq ymm0, ymm1, ymm2, -54 vmovups ymmword ptr [rbp-0x70], ymm0 mov rcx, 0xD1FFAB1E ; byrRegs -[rcx] call CORINFO_HELP_COUNTPROFILE32
+ ; byrRegs -[rax]
mov rcx, 0xD1FFAB1E call CORINFO_HELP_COUNTPROFILE32 mov rax, bword ptr [rbp+0x10] @@ -76,7 +77,7 @@ G_M12395_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vmovups ymmword ptr [rax], ymm0 mov rax, bword ptr [rbp+0x10]
- ;; size=200 bbWeight=1 PerfScore 77.00
+ ;; size=193 bbWeight=1 PerfScore 76.00
G_M12395_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper add rsp, 272 @@ -84,7 +85,7 @@ G_M12395_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 278, prolog size 54, PerfScore 95.83, instruction count 53, allocated bytes for code 278 (MethodHash=3d67cf94) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
+; Total bytes of code 271, prolog size 54, PerfScore 94.83, instruction count 52, allocated bytes for code 272 (MethodHash=3d67cf94) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
; ============================================================ Unwind Info:

-7 (-2.52%) : 392539.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)

@@ -59,16 +59,17 @@ G_M63669_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpternlogq ymm1, ymm2, ymmword ptr [rcx], -54
  vmovups ymm2, ymmword ptr [rbp-0x30]
  vpcmpuq k1, ymm2, ymmword ptr [rbp-0x50], 6
- vpmovm2q ymm2, k1
  mov rcx, bword ptr [rbp+0x18]
- vmovups ymm3, ymmword ptr [rcx]
- mov rcx, bword ptr [rbp+0x20]
- vpternlogq ymm2, ymm3, ymmword ptr [rcx], -54
+ mov rax, bword ptr [rbp+0x20]
+ ; byrRegs +[rax]
+ vmovups ymm2, ymmword ptr [rax]
+ vpblendmq ymm2 {k1}, ymm2, ymmword ptr [rcx]
  vpternlogq ymm0, ymm1, ymm2, -54
  vmovups ymmword ptr [rbp-0x70], ymm0
  mov rcx, 0xD1FFAB1E
  ; byrRegs -[rcx]
  call CORINFO_HELP_COUNTPROFILE32
+ ; byrRegs -[rax]
  mov rcx, 0xD1FFAB1E
  call CORINFO_HELP_COUNTPROFILE32
  mov rax, bword ptr [rbp+0x10]
@@ -76,7 +77,7 @@ G_M63669_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vmovups ymm0, ymmword ptr [rbp-0x70]
  vmovups ymmword ptr [rax], ymm0
  mov rax, bword ptr [rbp+0x10]
- ;; size=200 bbWeight=1 PerfScore 77.00
+ ;; size=193 bbWeight=1 PerfScore 76.00
  G_M63669_IG03: ; bbWeight=1, epilog, nogc, extend
  vzeroupper
  add rsp, 272
@@ -84,7 +85,7 @@ G_M63669_IG03: ; bbWeight=1, epilog, nogc, extend
  ret
  ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 278, prolog size 54, PerfScore 95.83, instruction count 53, allocated bytes for code 278 (MethodHash=dc23074a) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
+; Total bytes of code 271, prolog size 54, PerfScore 94.83, instruction count 52, allocated bytes for code 272 (MethodHash=dc23074a) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
; ============================================================ Unwind Info:

-28 (-2.40%) : 342446.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)

@@ -172,12 +172,11 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
  vpshufb ymm1, ymm3, ymm1
  vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm1, ymm1, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -188,12 +187,11 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
  vpshufb ymm2, ymm3, ymm2
  vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm0, ymm0, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -201,7 +199,7 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
  vpand ymm10, ymm1, ymm0
  vptest ymm10, ymm10
  jne SHORT G_M48875_IG07
- ;; size=248 bbWeight=4.37 PerfScore 346.74
+ ;; size=234 bbWeight=4.37 PerfScore 342.37
  G_M48875_IG05: ; bbWeight=3.45, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  add r15, 64
  cmp r15, rsi
@@ -318,12 +316,11 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
  vpshufb xmm1, xmm3, xmm1
  vpand xmm2, xmm2, xmmword ptr [reloc @RWD96]
  vpcmpub k1, xmm2, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmmword ptr [reloc @RWD160]
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmmword ptr [reloc @RWD160]
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm1, xmm1, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -334,12 +331,11 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
  vpshufb xmm2, xmm3, xmm2
  vpand xmm0, xmm0, xmmword ptr [reloc @RWD96]
  vpcmpub k1, xmm0, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmmword ptr [reloc @RWD160]
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmmword ptr [reloc @RWD160]
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm0, xmm0, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -347,7 +343,7 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
  vpand xmm0, xmm1, xmm0
  vptest xmm0, xmm0
  je SHORT G_M48875_IG21
- ;; size=248 bbWeight=0.08 PerfScore 5.05
+ ;; size=234 bbWeight=0.08 PerfScore 4.97
  G_M48875_IG18: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpmovmskb ebp, xmm0
  ;; size=4 bbWeight=0.07 PerfScore 0.14
@@ -444,7 +440,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
  RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1165, prolog size 86, PerfScore 510.92, instruction count 247, allocated bytes for code 1165 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
+; Total bytes of code 1137, prolog size 86, PerfScore 506.47, instruction count 243, allocated bytes for code 1137 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
; ============================================================ Unwind Info:

librariestestsnotieredcompilation.run.windows.x64.Release.mch

-28 (-2.50%) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm1, ymm3, ymm1
  vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm1, ymm1, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm2, ymm3, ymm2
  vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm0, ymm0, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand ymm0, ymm1, ymm0
  vptest ymm0, ymm0
  je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
  G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpermq ymm0, ymm0, -40
  vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm1, xmm10, xmm1
  vpand xmm2, xmm2, xmm11
  vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm1, xmm1, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm2, xmm10, xmm2
  vpand xmm0, xmm0, xmm11
  vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm0, xmm0, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand xmm0, xmm1, xmm0
  vptest xmm0, xmm0
  je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
  G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpmovmskb ebp, xmm0
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
  RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

-17 (-1.75%) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

@@ -15,7 +15,7 @@
  ;* V04 loc3 [V04 ] ( 0, 0 ) simd32 -> zero-ref <System.Numerics.Vector`1[uint]>
  ; V05 loc4 [V05,T16] ( 2, 2 ) simd32 -> mm8 <System.Numerics.Vector`1[uint]>
  ;* V06 loc5 [V06 ] ( 0, 0 ) simd32 -> zero-ref <System.Numerics.Vector`1[uint]>
-; V07 loc6 [V07,T17] ( 2, 2 ) simd32 -> mm8 <System.Numerics.Vector`1[uint]>
+; V07 loc6 [V07,T17] ( 2, 2 ) simd32 -> mm6 <System.Numerics.Vector`1[uint]>
  ;* V08 loc7 [V08 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op <System.Nullable`1[int]>
  ; V09 OutArgs [V09 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
  ;* V10 tmp1 [V10 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "NewObj constructor temp" <System.Numerics.Tests.GenericVectorTests+<>c__DisplayClass670_0`1[uint]>
@@ -26,7 +26,7 @@
  ;* V15 tmp6 [V15,T08] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
  ; V16 tmp7 [V16,T12] ( 9, 18 ) simd32 -> mm8 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
  ;* V17 tmp8 [V17,T09] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V18 tmp9 [V18,T13] ( 9, 18 ) simd32 -> mm8 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
+; V18 tmp9 [V18,T13] ( 9, 18 ) simd32 -> mm6 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
  ;* V19 tmp10 [V19,T10] ( 0, 0 ) ubyte -> zero-ref single-def "field V08.hasValue (fldOffset=0x0)" P-INDEP
  ;* V20 tmp11 [V20,T11] ( 0, 0 ) int -> zero-ref single-def "field V08.value (fldOffset=0x4)" P-INDEP
  ; V21 tmp12 [V21,T02] ( 4, 8 ) struct ( 8) [rsp+0x20] do-not-enreg[SF] "by-value struct argument" <System.Nullable`1[int]>
@@ -104,8 +104,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  jl G_M21446_IG06
  vmovups ymm7, ymmword ptr [rcx+0x10]
  vpcmpud k1, ymm6, ymm7, 6
- vpmovm2d ymm8, k1
- vpternlogd ymm8, ymm6, ymm7, -54
+ vpblendmd ymm8 {k1}, ymm7, ymm6
  mov rcx, 0xD1FFAB1E ; System.Action`2[int,uint]
  ; gcrRegs -[rcx]
  vextractf128 xmm9, ymm6, 1
@@ -151,7 +150,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
  vextractf128 xmm12, ymm8, 1
- ;; size=312 bbWeight=1 PerfScore 88.00
+ ;; size=305 bbWeight=1 PerfScore 86.83
  G_M21446_IG03: ; bbWeight=1, extend
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
@@ -208,10 +207,9 @@ G_M21446_IG03: ; bbWeight=1, extend
  vinsertf128 ymm6, ymm6, xmm9, 1
  vinsertf128 ymm7, ymm7, xmm10, 1
  vpcmpud k1, ymm6, ymm7, 2
- vpmovm2d ymm8, k1
- vpternlogd ymm8, ymm6, ymm7, -54
+ vpblendmd ymm6 {k1}, ymm7, ymm6
  mov rcx, 0xD1FFAB1E ; System.Action`2[int,uint]
- vextractf128 xmm6, ymm8, 1
+ vextractf128 xmm7, ymm6, 1
  call CORINFO_HELP_NEWSFAST
  ; gcrRegs +[rax]
  ; gcr arg pop 0
@@ -226,79 +224,79 @@ G_M21446_IG03: ; bbWeight=1, extend
  ; byrRegs -[rcx]
  mov r8, 0xD1FFAB1E ; code for <unknown method>
  mov qword ptr [rsi+0x18], r8
- vinsertf128 ymm8, ymm8, xmm6, 1
- vmovd r8d, xmm8
+ vinsertf128 ymm6, ymm6, xmm7, 1
+ vmovd r8d, xmm6
  xor edx, edx
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vmovaps ymm0, ymm8
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vmovaps ymm0, ymm6
  vpextrd r8d, xmm0, 1
  mov edx, 1
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vmovaps ymm0, ymm8
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vmovaps ymm0, ymm6
  vpextrd r8d, xmm0, 2
  mov edx, 2
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- ;; size=353 bbWeight=1 PerfScore 120.75
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ ;; size=350 bbWeight=1 PerfScore 121.58
  G_M21446_IG04: ; bbWeight=1, extend
- vinsertf128 ymm8, ymm8, xmm7, 1
- vmovaps ymm0, ymm8
+ vmovaps ymm0, ymm6
  vpextrd r8d, xmm0, 3
  mov edx, 3
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
  vmovd r8d, xmm0
  mov edx, 4
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
  vpextrd r8d, xmm0, 1
  mov edx, 5
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
  vpextrd r8d, xmm0, 2
  mov edx, 6
  mov rcx, gword ptr [rsi+0x08]
  ; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
  call [rsi+0x18]<unknown method>
  ; gcrRegs -[rcx]
  ; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
  vpextrd r8d, xmm0, 3
  mov edx, 7
  mov rcx, gword ptr [rsi+0x08]
@@ -307,7 +305,7 @@ G_M21446_IG04: ; bbWeight=1, extend
  ; gcrRegs -[rcx rsi]
  ; gcr arg pop 0
  nop
- ;; size=173 bbWeight=1 PerfScore 66.75
+ ;; size=166 bbWeight=1 PerfScore 64.75
  G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
  vmovaps xmm6, xmmword ptr [rsp+0x90]
  vmovaps xmm7, xmmword ptr [rsp+0x80]
@@ -328,7 +326,7 @@ G_M21446_IG06: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
  int3
  ;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 973, prolog size 67, PerfScore 325.25, instruction count 194, allocated bytes for code 973 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 956, prolog size 67, PerfScore 322.92, instruction count 192, allocated bytes for code 956 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================ Unwind Info:

smoke_tests.nativeaot.windows.x64.checked.mch

-28 (-2.52%) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -185,12 +185,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm1, ymm3, ymm1
  vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm1, ymm1, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -201,12 +200,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm2, ymm3, ymm2
  vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm0, ymm0, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -214,7 +212,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand ymm0, ymm1, ymm0
  vptest ymm0, ymm0
  je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
  G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpermq ymm0, ymm0, -40
  vpmovmskb ebp, ymm0
@@ -337,12 +335,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm1, xmm10, xmm1
  vpand xmm2, xmm2, xmm11
  vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm1, xmm1, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -352,12 +349,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm2, xmm10, xmm2
  vpand xmm0, xmm0, xmm11
  vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm0, xmm0, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -365,7 +361,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand xmm0, xmm1, xmm0
  vptest xmm0, xmm0
  je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
  G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpmovmskb ebp, xmm0
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -432,7 +428,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
  RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1109, prolog size 86, PerfScore 1189.83, instruction count 240, allocated bytes for code 1109 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1081, prolog size 86, PerfScore 1181.83, instruction count 236, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 16 | 16 | 0 | 0 | -528 | +0 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 24 | 24 | 0 | 0 | -499 | +0 |
| libraries_tests.run.windows.x64.Release.mch | 29 | 29 | 0 | 0 | -1,014 | +0 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 7 | 7 | 0 | 0 | -759 | +0 |
| realworld.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| | 78 | 78 | 0 | 0 | -2,856 | +0 |
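
The final row of the table above is the column-wise sum of the per-collection rows. As a quick cross-check, the following Python sketch re-adds the "Contexts with diffs" and "Improvements (bytes)" columns; the tuple layout is an illustrative assumption, and the values are copied from the table.

```python
# (collection, contexts with diffs, improvement bytes), copied from the table above.
rows = [
    ("benchmarks.run.windows.x64.checked.mch", 1, -28),
    ("benchmarks.run_pgo.windows.x64.checked.mch", 0, 0),
    ("benchmarks.run_tiered.windows.x64.checked.mch", 0, 0),
    ("coreclr_tests.run.windows.x64.checked.mch", 16, -528),
    ("libraries.crossgen2.windows.x64.checked.mch", 0, 0),
    ("libraries.pmi.windows.x64.checked.mch", 24, -499),
    ("libraries_tests.run.windows.x64.Release.mch", 29, -1014),
    ("librariestestsnotieredcompilation.run.windows.x64.Release.mch", 7, -759),
    ("realworld.run.windows.x64.checked.mch", 0, 0),
    ("smoke_tests.nativeaot.windows.x64.checked.mch", 1, -28),
]

total_contexts = sum(contexts for _, contexts, _ in rows)
total_bytes = sum(delta for _, _, delta in rows)
print(total_contexts, total_bytes)
```

Both sums agree with the totals row (78 contexts, -2,856 bytes).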

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 28,086 | 4 | 28,082 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 101,718 | 49,794 | 51,924 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 54,385 | 36,847 | 17,538 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 573,989 | 340,983 | 233,006 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 243,425 | 15 | 243,410 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 308,498 | 6 | 308,492 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 673,287 | 479,208 | 194,079 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 320,511 | 21,885 | 298,626 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 36,890 | 3 | 36,887 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 32,412 | 11 | 32,401 | 0 (0.00%) | 0 (0.00%) |
| | 2,373,201 | 928,756 | 1,444,445 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

benchmarks.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 8749502 (overridden on cmd)
Total bytes of diff: 8749474 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 24358.dasm (-2.50 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.50 % of base) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
         -28 (-2.50 % of base) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).


coreclr_tests.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 393893406 (overridden on cmd)
Total bytes of diff: 393892878 (overridden on cmd)
Total bytes of delta: -528 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 174838.dasm (-2.83 % of base)
         -56 : 174841.dasm (-2.86 % of base)
         -56 : 174837.dasm (-2.83 % of base)
         -56 : 174842.dasm (-2.82 % of base)
         -32 : 429090.dasm (-0.77 % of base)
         -32 : 429091.dasm (-0.77 % of base)
         -32 : 429094.dasm (-0.78 % of base)
         -32 : 429095.dasm (-0.77 % of base)
         -28 : 174840.dasm (-1.42 % of base)
         -28 : 174836.dasm (-1.43 % of base)
         -28 : 174835.dasm (-1.46 % of base)
         -28 : 174839.dasm (-1.42 % of base)
         -16 : 429088.dasm (-0.39 % of base)
         -16 : 429092.dasm (-0.39 % of base)
         -16 : 429093.dasm (-0.39 % of base)
         -16 : 429089.dasm (-0.39 % of base)

16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-2.83 % of base) : 174838.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-2.86 % of base) : 174841.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-2.82 % of base) : 174842.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-2.83 % of base) : 174837.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -32 (-0.77 % of base) : 429091.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 429094.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429095.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429090.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -28 (-1.42 % of base) : 174840.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.46 % of base) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.43 % of base) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.42 % of base) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -16 (-0.39 % of base) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429088.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

Top method improvements (percentages):
         -56 (-2.86 % of base) : 174841.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-2.83 % of base) : 174838.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-2.83 % of base) : 174837.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -56 (-2.82 % of base) : 174842.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -28 (-1.46 % of base) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.43 % of base) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.42 % of base) : 174840.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.42 % of base) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -32 (-0.78 % of base) : 429094.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429095.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429091.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429090.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429088.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

16 total methods with Code Size differences (16 improved, 0 regressed).


libraries.pmi.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 61525850 (overridden on cmd)
Total bytes of diff: 61525351 (overridden on cmd)
Total bytes of delta: -499 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 293984.dasm (-20.66 % of base)
         -56 : 294041.dasm (-20.66 % of base)
         -32 : 27695.dasm (-10.85 % of base)
         -28 : 27702.dasm (-2.50 % of base)
         -28 : 293986.dasm (-16.28 % of base)
         -28 : 294043.dasm (-16.28 % of base)
         -26 : 27697.dasm (-8.78 % of base)
         -21 : 294040.dasm (-20.39 % of base)
         -21 : 294062.dasm (-20.39 % of base)
         -21 : 293983.dasm (-20.39 % of base)
         -21 : 294005.dasm (-20.39 % of base)
         -14 : 293982.dasm (-16.28 % of base)
         -14 : 294004.dasm (-16.28 % of base)
         -14 : 294038.dasm (-16.87 % of base)
         -14 : 294042.dasm (-13.73 % of base)
         -14 : 294061.dasm (-16.28 % of base)
         -14 : 293981.dasm (-16.87 % of base)
         -14 : 293985.dasm (-13.73 % of base)
         -14 : 294003.dasm (-16.87 % of base)
         -14 : 294039.dasm (-16.28 % of base)

24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-20.66 % of base) : 293984.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.66 % of base) : 294041.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -32 (-10.85 % of base) : 27695.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-2.50 % of base) : 27702.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -28 (-16.28 % of base) : 293986.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-16.28 % of base) : 294043.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -26 (-8.78 % of base) : 27697.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294040.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 293981.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-13.73 % of base) : 293985.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 293982.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294003.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.28 % of base) : 294004.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294038.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-13.73 % of base) : 294042.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 294039.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294060.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

Top method improvements (percentages):
         -56 (-20.66 % of base) : 293984.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.66 % of base) : 294041.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -21 (-20.39 % of base) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294040.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 293981.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294003.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294038.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294060.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.28 % of base) : 293982.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-16.28 % of base) : 293986.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 294004.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.28 % of base) : 294039.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-16.28 % of base) : 294043.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 294061.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-13.73 % of base) : 293985.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-13.73 % of base) : 294042.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -32 (-10.85 % of base) : 27695.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -26 (-8.78 % of base) : 27697.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

24 total methods with Code Size differences (24 improved, 0 regressed).
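
Each entry in the method lists above has the shape `<delta> (<percent> % of base) : <file>.dasm - <method signature> (<tier>)`. As a rough sketch, assuming that shape holds for every entry (the regex and the names `ENTRY` and `parse_entry` are illustrative and not part of superpmi.py), a line can be split into fields and the base size of the method estimated from the delta and the percentage:

```python
import re

# Assumed entry shape: "<delta> (<pct> % of base) : <file>.dasm - <method> (<tier>)"
ENTRY = re.compile(
    r"^\s*(?P<delta>-?\d+) \((?P<pct>-?\d+\.\d+) % of base\) : "
    r"(?P<file>\d+\.dasm) - (?P<method>.+) \((?P<tier>[^()]+)\)$"
)


def parse_entry(line):
    """Parse one 'Top method' line; return None if it does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    delta = int(m["delta"])
    pct = float(m["pct"])
    # The percentage is rounded to two decimals in the report,
    # so the base size recovered here is only an estimate.
    base_estimate = round(delta * 100.0 / pct) if pct else None
    return {
        "delta": delta,
        "pct": pct,
        "file": m["file"],
        "method": m["method"],
        "tier": m["tier"],
        "base_estimate": base_estimate,
    }


sample = (
    "         -56 (-20.66 % of base) : 293984.dasm - "
    "System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:"
    "Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)"
)
entry = parse_entry(sample)
print(entry["file"], entry["delta"], entry["tier"], entry["base_estimate"])
```

For the sample line the estimated base size is about 271 bytes, since 56 / 0.2066 is roughly 271.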


libraries_tests.run.windows.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 279744051 (overridden on cmd)
Total bytes of diff: 279743037 (overridden on cmd)
Total bytes of delta: -1014 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -368 : 393123.dasm (-15.51 % of base)
         -84 : 386605.dasm (-10.98 % of base)
         -84 : 393118.dasm (-10.81 % of base)
         -84 : 393473.dasm (-11.02 % of base)
         -84 : 386805.dasm (-10.98 % of base)
         -32 : 342443.dasm (-10.85 % of base)
         -28 : 342446.dasm (-2.40 % of base)
         -26 : 342451.dasm (-8.78 % of base)
         -14 : 385681.dasm (-16.09 % of base)
         -14 : 385680.dasm (-16.09 % of base)
         -14 : 385965.dasm (-16.09 % of base)
         -14 : 393121.dasm (-16.09 % of base)
         -14 : 393190.dasm (-16.67 % of base)
         -14 : 393191.dasm (-16.67 % of base)
         -14 : 393192.dasm (-16.09 % of base)
         -14 : 393193.dasm (-16.09 % of base)
         -14 : 385964.dasm (-16.67 % of base)
         -14 : 386233.dasm (-16.67 % of base)
         -14 : 393122.dasm (-16.09 % of base)
          -7 : 392275.dasm (-3.27 % of base)

29 total files with Code Size differences (29 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -368 (-15.51 % of base) : 393123.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
         -84 (-10.98 % of base) : 386605.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-11.02 % of base) : 393473.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.81 % of base) : 393118.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386805.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -32 (-10.85 % of base) : 342443.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
         -28 (-2.40 % of base) : 342446.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
         -26 (-8.78 % of base) : 342451.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
         -14 (-16.67 % of base) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385681.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393193.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385680.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393192.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 385964.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385965.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393122.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393121.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
          -7 (-5.79 % of base) : 342450.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)

Top method improvements (percentages):
         -14 (-16.67 % of base) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 385964.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385681.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393193.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385680.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393192.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385965.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393122.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393121.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
        -368 (-15.51 % of base) : 393123.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
         -84 (-11.02 % of base) : 393473.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386605.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386805.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -32 (-10.85 % of base) : 342443.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
         -84 (-10.81 % of base) : 393118.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -26 (-8.78 % of base) : 342451.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
          -7 (-5.79 % of base) : 342450.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
          -7 (-5.65 % of base) : 342442.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)

29 total methods with Code Size differences (29 improved, 0 regressed).


librariestestsnotieredcompilation.run.windows.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 137525226 (overridden on cmd)
Total bytes of diff: 137524467 (overridden on cmd)
Total bytes of delta: -759 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -182 : 168894.dasm (-17.50 % of base)
        -182 : 168616.dasm (-17.50 % of base)
        -154 : 169032.dasm (-17.09 % of base)
         -98 : 168947.dasm (-15.15 % of base)
         -98 : 168825.dasm (-15.15 % of base)
         -28 : 150728.dasm (-2.50 % of base)
         -17 : 169934.dasm (-1.75 % of base)

7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -182 (-17.50 % of base) : 168894.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.50 % of base) : 168616.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.09 % of base) : 169032.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.15 % of base) : 168947.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.15 % of base) : 168825.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

Top method improvements (percentages):
        -182 (-17.50 % of base) : 168894.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.50 % of base) : 168616.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.09 % of base) : 169032.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.15 % of base) : 168947.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.15 % of base) : 168825.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

7 total methods with Code Size differences (7 improved, 0 regressed).


smoke_tests.nativeaot.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 5089881 (overridden on cmd)
Total bytes of diff: 5089853 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 19903.dasm (-2.52 % of base)

1 total file with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.52 % of base) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
         -28 (-2.52 % of base) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total method with Code Size differences (1 improved, 0 regressed).