Assembly Diffs

linux arm64

Diffs are based on 2,498,787 contexts (1,011,240 MinOpts, 1,487,547 FullOpts).

MISSED contexts: 6,564 (0.26%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run.linux.arm64.checked.mch | 34,614 | 3,148 | 31,466 | 238 (0.68%) | 238 (0.68%) |
| benchmarks.run_pgo.linux.arm64.checked.mch | 150,972 | 59,296 | 91,676 | 132 (0.09%) | 132 (0.09%) |
| benchmarks.run_tiered.linux.arm64.checked.mch | 71,125 | 53,989 | 17,136 | 82 (0.12%) | 82 (0.12%) |
| coreclr_tests.run.linux.arm64.checked.mch | 626,766 | 383,796 | 242,970 | 455 (0.07%) | 455 (0.07%) |
| libraries.crossgen2.linux.arm64.checked.mch | 234,183 | 15 | 234,168 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 293,135 | 6 | 293,129 | 1,908 (0.65%) | 1,908 (0.65%) |
| libraries_tests.run.linux.arm64.Release.mch | 733,491 | 489,338 | 244,153 | 1,321 (0.18%) | 1,321 (0.18%) |
| librariestestsnotieredcompilation.run.linux.arm64.Release.mch | 302,708 | 21,560 | 281,148 | 2,089 (0.69%) | 2,089 (0.69%) |
| realworld.run.linux.arm64.checked.mch | 32,766 | 85 | 32,681 | 337 (1.02%) | 337 (1.02%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 19,027 | 7 | 19,020 | 2 (0.01%) | 2 (0.01%) |
| Total | 2,498,787 | 1,011,240 | 1,487,547 | 6,564 (0.26%) | 6,564 (0.26%) |
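The context table above can be sanity-checked mechanically. The following is a minimal sketch, not part of the SuperPMI tooling: the dictionary keys are shortened collection names chosen for illustration, and the numbers are copied from the linux arm64 table. It checks that MinOpts plus FullOpts equals the diffed context count for every collection, and that the columns add up to the reported totals.

```python
# (diffed, minopts, fullopts, missed) per collection, copied from the linux arm64 table.
# Keys are shortened collection names used only in this sketch.
rows = {
    "benchmarks.run": (34_614, 3_148, 31_466, 238),
    "benchmarks.run_pgo": (150_972, 59_296, 91_676, 132),
    "benchmarks.run_tiered": (71_125, 53_989, 17_136, 82),
    "coreclr_tests.run": (626_766, 383_796, 242_970, 455),
    "libraries.crossgen2": (234_183, 15, 234_168, 0),
    "libraries.pmi": (293_135, 6, 293_129, 1_908),
    "libraries_tests.run": (733_491, 489_338, 244_153, 1_321),
    "librariestestsnotieredcompilation.run": (302_708, 21_560, 281_148, 2_089),
    "realworld.run": (32_766, 85, 32_681, 337),
    "smoke_tests.nativeaot": (19_027, 7, 19_020, 2),
}

# Totals as reported in the last row of the table.
reported_totals = (2_498_787, 1_011_240, 1_487_547, 6_564)

# Every diffed context is either MinOpts or FullOpts.
for name, (diffed, minopts, fullopts, _missed) in rows.items():
    assert minopts + fullopts == diffed, name

# Column sums must match the totals row.
column_sums = tuple(sum(column) for column in zip(*rows.values()))
assert column_sums == reported_totals
print(column_sums)
```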


linux x64

Diffs are based on 2,505,358 contexts (977,766 MinOpts, 1,527,592 FullOpts).

MISSED contexts: 6,904 (0.27%)

Overall (-10,720 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.linux.x64.checked.mch | 16,154,102 | -42 |
| coreclr_tests.run.linux.x64.checked.mch | 403,032,058 | -528 |
| libraries.pmi.linux.x64.checked.mch | 58,945,732 | -469 |
| libraries_tests.run.linux.x64.Release.mch | 340,323,621 | -8,897 |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 131,037,400 | -756 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,193,336 | -28 |

MinOpts (-248 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| coreclr_tests.run.linux.x64.checked.mch | 279,817,336 | -192 |
| libraries_tests.run.linux.x64.Release.mch | 183,915,696 | -56 |

FullOpts (-10,472 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.linux.x64.checked.mch | 15,889,929 | -42 |
| coreclr_tests.run.linux.x64.checked.mch | 123,214,722 | -336 |
| libraries.pmi.linux.x64.checked.mch | 58,832,862 | -469 |
| libraries_tests.run.linux.x64.Release.mch | 156,407,925 | -8,841 |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 120,378,952 | -756 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,192,387 | -28 |
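The Overall, MinOpts and FullOpts figures for linux x64 are related by simple addition. As a rough cross-check (a sketch only, with shortened collection names as illustrative keys and the byte deltas copied from the three tables), the per-collection MinOpts and FullOpts deltas should sum to the Overall delta, and each table should sum to its heading:

```python
# Byte deltas copied from the linux x64 tables; keys are shortened names for this sketch.
overall = {
    "benchmarks.run": -42,
    "coreclr_tests.run": -528,
    "libraries.pmi": -469,
    "libraries_tests.run": -8_897,
    "librariestestsnotieredcompilation.run": -756,
    "smoke_tests.nativeaot": -28,
}
minopts = {
    "coreclr_tests.run": -192,
    "libraries_tests.run": -56,
}
fullopts = {
    "benchmarks.run": -42,
    "coreclr_tests.run": -336,
    "libraries.pmi": -469,
    "libraries_tests.run": -8_841,
    "librariestestsnotieredcompilation.run": -756,
    "smoke_tests.nativeaot": -28,
}

# Each table sums to the figure in its heading.
assert sum(overall.values()) == -10_720
assert sum(minopts.values()) == -248
assert sum(fullopts.values()) == -10_472

# Per collection, MinOpts + FullOpts reproduces the Overall delta
# (a collection missing from a table contributes 0 there).
for name, delta in overall.items():
    assert minopts.get(name, 0) + fullopts.get(name, 0) == delta, name

print(sum(minopts.values()) + sum(fullopts.values()))
```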

Example diffs

benchmarks.run.linux.x64.checked.mch

-14 (-4.14%) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)

@@ -161,27 +161,25 @@ G_M56402_IG17: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
 G_M56402_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, byref, isz
        vmovups zmm2, zmmword ptr [rdi+2*rax]
        vpcmpeqw k1, zmm0, zmm2
-       vpmovm2w zmm3, k1
-       vpternlogd zmm3, zmm1, zmm2, -54
-       vmovups zmmword ptr [rsi+2*rax], zmm3
+       vpblendmw zmm2 {k1}, zmm2, zmm1
+       vmovups zmmword ptr [rsi+2*rax], zmm2
        add rax, 32
        cmp rax, r8
        jb SHORT G_M56402_IG18
-       ;; size=42 bbWeight=4 PerfScore 42.00
+       ;; size=35 bbWeight=4 PerfScore 40.00
 G_M56402_IG19: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, byref
        vmovups zmm2, zmmword ptr [rdi+2*r8]
        vpcmpeqw k1, zmm0, zmm2
-       vpmovm2w zmm0, k1
-       vpternlogd zmm0, zmm1, zmm2, -54
+       vpblendmw zmm0 {k1}, zmm2, zmm1
        vmovups zmmword ptr [rsi+2*r8], zmm0
-       ;; size=33 bbWeight=0.50 PerfScore 4.50
+       ;; size=26 bbWeight=0.50 PerfScore 4.25
 G_M56402_IG20: ; bbWeight=0.50, epilog, nogc, extend
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=0.50 PerfScore 1.25

-; Total bytes of code 338, prolog size 7, PerfScore 172.38, instruction count 86, allocated bytes for code 338 (MethodHash=34bd23ad) for method System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
+; Total bytes of code 324, prolog size 7, PerfScore 170.12, instruction count 84, allocated bytes for code 324 (MethodHash=34bd23ad) for method System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
 ; ============================================================

 Unwind Info:
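In the diff above, the pair `vpmovm2w` + `vpternlogd ..., -54` is replaced by a single `vpblendmw`. The following sketch illustrates why the two forms compute the same value. It models one 16-bit element in plain Python, on the understanding that `vpternlog` uses the bits of its destination, first source and second source as a 3-bit index into the immediate, and that `vpblendm` selects its second source where the mask bit is set. The helper names (`ternlog`, `mask_to_vector`, `blend`) are hypothetical and exist only for this illustration; this is not how the JIT implements the transformation.

```python
from itertools import product

# The disassembly prints the immediate as a signed byte; as an unsigned byte it is 0xCA.
IMM8 = -54 & 0xFF

def ternlog(a, b, c, imm8, width=16):
    """Bitwise three-input lookup: at each bit position the bits of a, b, c
    (a is the high bit) form an index into imm8, and that bit of imm8 is the result."""
    out = 0
    for i in range(width):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> index) & 1) << i
    return out

def mask_to_vector(k, width=16):
    """Model of vpmovm2w for one element: all ones if the mask bit is set, else zero."""
    return (1 << width) - 1 if k else 0

def blend(k, if_clear, if_set):
    """Model of a masked blend for one element."""
    return if_set if k else if_clear

samples = [0x0000, 0xFFFF, 0x1234, 0xABCD]
for k, x, y in product((0, 1), samples, samples):
    old = ternlog(mask_to_vector(k), x, y, IMM8)  # vpmovm2w + vpternlogd ..., -54
    new = blend(k, if_clear=y, if_set=x)          # vpblendmw
    assert old == new

print(hex(IMM8))
```

With control byte 0xCA the lookup reduces to "first operand ? second operand : third operand" per bit, so materializing the mask as a vector is only needed to feed the ternary-logic instruction; the blend consumes the mask register directly.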

-28 (-2.59%) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -106,13 +106,13 @@ ; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate" ; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate" ; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate" ; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate" ; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate" ; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate" ; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
; ; Lcl frame size = 136 @@ -194,13 +194,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1 - vmovups ymm11, ymmword ptr [reloc @RWD160] - vpsubb ymm12, ymm6, ymm11 - vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160] + vpsubb ymm11, ymm6, ymm10 + vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54 - vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11 + vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -211,12 +210,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1 - vpsubb ymm8, ymm4, ymm11 - vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10 + vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54 - vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7 + vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm4, ymm4, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -224,7 +222,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm4, ymm5, ymm4 vptest ymm4, ymm4 je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref vpermq ymm4, ymm4, -40 vpmovmskb r12d, ymm4 @@ -356,13 +354,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm4, xmm4, xmm7 vmovups xmm8, xmmword ptr [reloc @RWD128] vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1 - vmovups xmm10, xmmword ptr [reloc @RWD160] - vpsubb xmm11, xmm4, xmm10 - vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160] + vpsubb xmm10, xmm4, xmm9 + vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54 - vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10 + vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm3, xmm3, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -372,12 +369,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb xmm4, xmm6, xmm4 vpand xmm2, xmm2, xmm7 vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1 - vpsubb xmm6, xmm2, xmm10 - vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9 + vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54 - vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5 + vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm2, xmm2, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -385,7 +381,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm2, xmm3, xmm2 vptest xmm2, xmm2 je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref vpmovmskb r12d, xmm2 ;; size=4 bbWeight=2 PerfScore 4.00 @@ -473,7 +469,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1081, prolog size 43, PerfScore 1253.25, instruction count 240, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1053, prolog size 43, PerfScore 1245.25, instruction count 236, allocated bytes for code 1053 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:
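Each example header gives a byte delta and a percentage. Assuming the percentage is the delta relative to the base "Total bytes of code" value (an assumption that matches the figures shown here, not a documented formula), the two headers in this collection can be recomputed as follows. The function name `size_delta` and the dictionary keys are hypothetical.

```python
def size_delta(base_bytes, diff_bytes):
    """Return the byte delta and its percentage of the base size, formatted to two decimals."""
    delta = diff_bytes - base_bytes
    return delta, f"{100 * delta / base_bytes:.2f}%"

# (base, diff) "Total bytes of code" values copied from the two examples above.
examples = {
    "ReplaceValueType[ushort]": (338, 324),
    "IndexOfAnyVectorized": (1081, 1053),
}

results = {name: size_delta(base, diff) for name, (base, diff) in examples.items()}
for name, (delta, percent) in results.items():
    print(f"{delta} ({percent}) : {name}")
```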

coreclr_tests.run.linux.x64.checked.mch

-28 (-1.60%) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)

@@ -312,10 +312,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M1266_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm2, ymmword ptr [rbp-0x70] vpcmpd k1, ymm2, ymmword ptr [rbp-0x30], 2
- vpmovm2d ymm3, k1 - vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -366,10 +365,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M1266_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm1, ymmword ptr [rbp-0x50] vpcmpd k1, ymm2, ymm1, 2
- vpmovm2d ymm3, k1 - vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=20 bbWeight=1 PerfScore 7.58
G_M1266_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -420,10 +418,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M1266_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm2, ymmword ptr [rbp-0x70] vpcmpd k1, ymm2, ymmword ptr [rbp-0x30], 5
- vpmovm2d ymm3, k1 - vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -474,10 +471,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M1266_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm1, ymmword ptr [rbp-0x50] vpcmpd k1, ymm1, ymmword ptr [rbp-0x30], 5
- vpmovm2d ymm3, k1 - vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -641,7 +637,7 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ret ;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1747, prolog size 35, PerfScore 1104.08, instruction count 335, allocated bytes for code 1747 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1719, prolog size 35, PerfScore 1099.42, instruction count 331, allocated bytes for code 1719 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
; ============================================================ Unwind Info:

-28 (-1.57%) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)

@@ -312,10 +312,9 @@ G_M59915_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M59915_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm2, ymmword ptr [rbp-0x70] vpcmpq k1, ymm2, ymmword ptr [rbp-0x30], 2
- vpmovm2q ymm3, k1 - vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M59915_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -366,10 +365,9 @@ G_M59915_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm1, ymmword ptr [rbp-0x50] vpcmpq k1, ymm2, ymm1, 2
- vpmovm2q ymm3, k1 - vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=20 bbWeight=1 PerfScore 7.58
G_M59915_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -420,10 +418,9 @@ G_M59915_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M59915_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm2, ymmword ptr [rbp-0x70] vpcmpq k1, ymm2, ymmword ptr [rbp-0x30], 5
- vpmovm2q ymm3, k1 - vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M59915_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -474,10 +471,9 @@ G_M59915_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M59915_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm1, ymmword ptr [rbp-0x50] vpcmpq k1, ymm1, ymmword ptr [rbp-0x30], 5
- vpmovm2q ymm3, k1 - vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M59915_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -641,7 +637,7 @@ G_M59915_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ret ;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1783, prolog size 35, PerfScore 1110.08, instruction count 335, allocated bytes for code 1783 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 1755, prolog size 35, PerfScore 1105.42, instruction count 331, allocated bytes for code 1755 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
; ============================================================ Unwind Info:

-28 (-1.57%) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)

@@ -314,10 +314,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M8563_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0x30], 2
- vpmovm2w ymm3, k1 - vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -368,10 +367,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm2, ymmword ptr [rbp-0x50] vpcmpw k1, ymm0, ymm2, 2
- vpmovm2w ymm3, k1 - vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=20 bbWeight=1 PerfScore 9.25
G_M8563_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -422,10 +420,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M8563_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0x30], 5
- vpmovm2w ymm3, k1 - vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -476,10 +473,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref G_M8563_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm2, ymmword ptr [rbp-0x50] vpcmpw k1, ymm2, ymmword ptr [rbp-0x30], 5
- vpmovm2w ymm3, k1 - vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov edi, ebx vmovups ymmword ptr [rbp-0x90], ymm3 @@ -643,7 +639,7 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ret ;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1789, prolog size 35, PerfScore 1266.58, instruction count 336, allocated bytes for code 1789 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
+; Total bytes of code 1761, prolog size 35, PerfScore 1264.58, instruction count 332, allocated bytes for code 1761 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
; ============================================================ Unwind Info:

-16 (-0.39%) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)

@@ -437,14 +437,13 @@ G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2q ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x100], edi jmp G_M59915_IG25
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 4 jae G_M59915_IG54 @@ -523,14 +522,13 @@ G_M59915_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2q ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x108], edi jmp G_M59915_IG30
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 4 jae G_M59915_IG54 @@ -609,14 +607,13 @@ G_M59915_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x110], edi jmp G_M59915_IG35
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 4 jae G_M59915_IG54 @@ -695,14 +692,13 @@ G_M59915_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x118], edi jmp G_M59915_IG40
- ;; size=75 bbWeight=1 PerfScore 23.25
+ ;; size=71 bbWeight=1 PerfScore 22.25
G_M59915_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 4 jae G_M59915_IG54 @@ -964,7 +960,7 @@ G_M59915_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4101, prolog size 78, PerfScore 831.26, instruction count 657, allocated bytes for code 4101 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
+; Total bytes of code 4085, prolog size 78, PerfScore 827.26, instruction count 653, allocated bytes for code 4089 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
; ============================================================ Unwind Info:

-16 (-0.39%) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

@@ -440,14 +440,13 @@ G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2w ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x100], edi jmp G_M8563_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 16 jae G_M8563_IG54 @@ -526,14 +525,13 @@ G_M8563_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2w ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x108], edi jmp G_M8563_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 16 jae G_M8563_IG54 @@ -612,14 +610,13 @@ G_M8563_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x110], edi jmp G_M8563_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 16 jae G_M8563_IG54 @@ -698,14 +695,13 @@ G_M8563_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x118], edi jmp G_M8563_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M8563_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 16 jae G_M8563_IG54 @@ -967,7 +963,7 @@ G_M8563_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {} int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4104, prolog size 58, PerfScore 860.01, instruction count 660, allocated bytes for code 4104 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
+; Total bytes of code 4088, prolog size 58, PerfScore 860.01, instruction count 656, allocated bytes for code 4092 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
; ============================================================ Unwind Info:

-16 (-0.39%) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)

@@ -440,14 +440,13 @@ G_M44299_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2b ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x100], edi jmp G_M44299_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 32 jae G_M44299_IG54 @@ -526,14 +525,13 @@ G_M44299_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2b ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x108], edi jmp G_M44299_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 32 jae G_M44299_IG54 @@ -612,14 +610,13 @@ G_M44299_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x110], edi jmp G_M44299_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 32 jae G_M44299_IG54 @@ -698,14 +695,13 @@ G_M44299_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1 - vmovups ymm1, ymmword ptr [rbp-0x70] - vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90] + vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor edi, edi mov dword ptr [rbp-0x118], edi jmp G_M44299_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M44299_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 32 jae G_M44299_IG54 @@ -967,7 +963,7 @@ G_M44299_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4104, prolog size 58, PerfScore 860.01, instruction count 660, allocated bytes for code 4104 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
+; Total bytes of code 4088, prolog size 58, PerfScore 860.01, instruction count 656, allocated bytes for code 4092 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
; ============================================================ Unwind Info:

libraries.pmi.linux.x64.checked.mch

-21 (-20.19%) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -17,7 +17,7 @@
 ;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
 ;
 ; Lcl frame size = 0
@@ -32,26 +32,23 @@ G_M10214_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M10214_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref
        ; byrRegs +[rdi]
        vpcmpeqb k1, zmm0, zmm1
-       vpmovm2b zmm2, k1
-       vxorps ymm3, ymm3, ymm3
-       vpcmpub k1, zmm0, zmm3, 1
-       vpmovm2b zmm3, k1
-       vpternlogd zmm3, zmm0, zmm1, -54
-       vpcmpub k1, zmm0, zmm1, 1
-       vpmovm2b zmm4, k1
-       vpternlogd zmm4, zmm0, zmm1, -54
-       vpternlogd zmm2, zmm3, zmm4, -54
-       vmovups zmmword ptr [rdi], zmm2
+       vxorps ymm2, ymm2, ymm2
+       vpcmpub k2, zmm0, zmm2, 1
+       vpblendmb zmm2 {k2}, zmm1, zmm0
+       vpcmpub k2, zmm0, zmm1, 1
+       vpblendmb zmm0 {k2}, zmm1, zmm0
+       vpblendmb zmm0 {k1}, zmm0, zmm2
+       vmovups zmmword ptr [rdi], zmm0
        mov rax, rdi
        ; byrRegs +[rax]
-       ;; size=72 bbWeight=1 PerfScore 16.58
+       ;; size=51 bbWeight=1 PerfScore 15.08
 G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=1 PerfScore 2.50

-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
 ; ============================================================

 Unwind Info:

-21 (-20.19%) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T02] ( 4, 4 ) simd64 -> mm1 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" @@ -32,26 +32,23 @@ G_M22834_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, G_M22834_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref ; byrRegs +[rdi] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm1, zmm0, -54
- vpcmpub k1, zmm0, zmm1, 6
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm0, zmm1
+ vpcmpub k2, zmm0, zmm1, 6
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [rdi], zmm0
mov rax, rdi ; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp ret ;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:
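
The diff above swaps each `vpmovm2b` + `vpternlogd ..., -54` pair for one masked `vpblendmb`. The immediate `-54` is `0xCA` as an unsigned byte, which is the ternary-logic truth table for a bitwise select (`A ? B : C`), so the two sequences should agree whenever the first operand is an all-ones/all-zeros lane mask. A minimal Python sketch of that equivalence follows; the helper names and the 32-bit lane width are assumptions made for illustration and are not taken from the JIT.

```python
import random

LANE_BITS = 32                      # assumed lane width (the "d" instruction forms)
LANE_MASK = (1 << LANE_BITS) - 1


def ternlog(a, b, c, imm8):
    """Model one lane of a ternary-logic op: each result bit is
    imm8[(a_bit << 2) | (b_bit << 1) | c_bit]."""
    out = 0
    for bit in range(LANE_BITS):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        out |= ((imm8 >> idx) & 1) << bit
    return out


def old_sequence(k, b, c):
    """Hypothetical model of: expand mask k to a vector, then ternlog with -54."""
    imm8 = -54 & 0xFF               # 0xCA, i.e. "A ? B : C"
    mask = [LANE_MASK if (k >> i) & 1 else 0 for i in range(len(b))]
    return [ternlog(m, x, y, imm8) for m, x, y in zip(mask, b, c)]


def new_sequence(k, b, c):
    """Hypothetical model of a masked blend: lane i takes b[i] if bit i of k is set."""
    return [x if (k >> i) & 1 else y for i, (x, y) in enumerate(zip(b, c))]


def check(trials=200, lanes=4):
    """Compare both models on pseudo-random inputs (fixed seed for repeatability)."""
    rng = random.Random(0)
    for _ in range(trials):
        k = rng.getrandbits(lanes)
        b = [rng.getrandbits(LANE_BITS) for _ in range(lanes)]
        c = [rng.getrandbits(LANE_BITS) for _ in range(lanes)]
        if old_sequence(k, b, c) != new_sequence(k, b, c):
            return False
    return True
```

Calling `check()` compares the two models lane by lane. Because the select is purely bitwise, the same reasoning applies to the byte-granular `vpmovm2b`/`vpblendmb` forms, provided the mask vector is all-ones or all-zeros per element.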

-21 (-20.19%) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T01] ( 5, 5 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" @@ -32,26 +32,23 @@ G_M30188_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, G_M30188_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref ; byrRegs +[rdi] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm0, zmm1, -54
- vpcmpub k1, zmm0, zmm1, 1
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm1, zmm0
+ vpcmpub k2, zmm0, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [rdi], zmm0
mov rax, rdi ; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M30188_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp ret ;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=29ab8a13) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=29ab8a13) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-14 (-5.19%) : 20784.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -37,7 +37,7 @@ ; V26 cse1 [V26,T09] ( 3, 3 ) simd32 -> mm5 "CSE - moderate" ; V27 cse2 [V27,T10] ( 3, 3 ) simd32 -> mm6 "CSE - moderate" ; V28 cse3 [V28,T11] ( 3, 3 ) simd32 -> mm7 "CSE - moderate"
-; V29 cse4 [V29,T12] ( 3, 3 ) simd32 -> mm9 "CSE - moderate"
+; V29 cse4 [V29,T12] ( 3, 3 ) simd32 -> mm8 "CSE - moderate"
; ; Lcl frame size = 0 @@ -68,13 +68,12 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, vpand ymm4, ymm4, ymm6 vmovups ymm7, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm4, ymm7, 6
- vpmovm2b ymm8, k1
- vmovups ymm9, ymmword ptr [reloc @RWD160]
- vpsubb ymm10, ymm4, ymm9
- vpshufb ymm10, ymm1, ymm10
+ vmovups ymm8, ymmword ptr [reloc @RWD160]
+ vpsubb ymm9, ymm4, ymm8
+ vpshufb ymm9, ymm1, ymm9
vpshufb ymm4, ymm0, ymm4
- vpternlogd ymm8, ymm10, ymm4, -54
- vpand ymm3, ymm8, ymm3
+ vpblendmb ymm4 {k1}, ymm4, ymm9
+ vpand ymm3, ymm4, ymm3
vxorps ymm4, ymm4, ymm4 vpcmpeqb ymm3, ymm3, ymm4 vpcmpeqd ymm4, ymm4, ymm4 @@ -85,12 +84,11 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, vpshufb ymm4, ymm5, ymm4 vpand ymm2, ymm2, ymm6 vpcmpub k1, ymm2, ymm7, 6
- vpmovm2b ymm5, k1
- vpsubb ymm6, ymm2, ymm9
- vpshufb ymm1, ymm1, ymm6
+ vpsubb ymm5, ymm2, ymm8
+ vpshufb ymm1, ymm1, ymm5
vpshufb ymm0, ymm0, ymm2
- vpternlogd ymm5, ymm1, ymm0, -54
- vpand ymm0, ymm5, ymm4
+ vpblendmb ymm0 {k1}, ymm0, ymm1
+ vpand ymm0, ymm0, ymm4
vxorps ymm1, ymm1, ymm1 vpcmpeqb ymm0, ymm0, ymm1 vpcmpeqd ymm1, ymm1, ymm1 @@ -99,7 +97,7 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, vmovups ymmword ptr [rdi], ymm0 mov rax, rdi ; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 78.25
+ ;; size=234 bbWeight=1 PerfScore 77.25
G_M59405_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp @@ -113,7 +111,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 270, prolog size 7, PerfScore 91.00, instruction count 56, allocated bytes for code 270 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 256, prolog size 7, PerfScore 90.00, instruction count 54, allocated bytes for code 256 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-7 (-4.32%) : 206873.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -40,9 +40,8 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vpandd zmm3, zmm4, dword ptr [reloc @RWD128] {1to16} vpord zmm4, zmm3, dword ptr [reloc @RWD132] {1to16} vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1
- vpslld zmm5, zmm4, 1
- vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1
+ vpblendmd zmm2 {k1}, zmm2, zmm4
vpslld zmm0, zmm0, 13 vpandd zmm0, zmm0, dword ptr [reloc @RWD136] {1to16} vpaddd zmm0, zmm0, zmm2 @@ -51,7 +50,7 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vmovups zmmword ptr [rdi], zmm0 mov rax, rdi ; byrRegs +[rax]
- ;; size=140 bbWeight=1 PerfScore 29.25
+ ;; size=133 bbWeight=1 PerfScore 28.25
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop rbp @@ -67,7 +66,7 @@ RWD132 dd 38000000h RWD136 dd 0FFFE000h
-; Total bytes of code 162, prolog size 7, PerfScore 37.00, instruction count 27, allocated bytes for code 162 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 155, prolog size 7, PerfScore 36.00, instruction count 26, allocated bytes for code 155 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================ Unwind Info:

-28 (-2.59%) : 20791.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -102,13 +102,13 @@ ; V91 cse1 [V91,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate" ; V92 cse2 [V92,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate" ; V93 cse3 [V93,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V94 cse4 [V94,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V94 cse4 [V94,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V95 cse5 [V95,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate" ; V96 cse6 [V96,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate" ; V97 cse7 [V97,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate" ; V98 cse8 [V98,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate" ; V99 cse9 [V99,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V100 cse10 [V100,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V100 cse10 [V100,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
; ; Lcl frame size = 168 @@ -187,13 +187,12 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -204,12 +203,11 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm4, ymm4, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -218,7 +216,7 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vmovups ymmword ptr [rbp-0xB0], ymm4 vptest ymm4, ymm4 je SHORT G_M48875_IG07
- ;; size=265 bbWeight=4 PerfScore 336.00
+ ;; size=251 bbWeight=4 PerfScore 332.00
G_M48875_IG05: ; bbWeight=2, gcVars=0000000000000201 {V04 V05}, gcrefRegs=0000 {}, byrefRegs=6008 {rbx r13 r14}, gcvars, byref ; byrRegs -[rcx] mov edi, 1 @@ -342,13 +340,12 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm4, xmm4, xmm7 vmovups xmm8, xmmword ptr [reloc @RWD128] vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm3, xmm3, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -358,12 +355,11 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb xmm4, xmm6, xmm4 vpand xmm2, xmm2, xmm7 vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm2, xmm2, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -371,7 +367,7 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm2, xmm3, xmm2 vptest xmm2, xmm2 je G_M48875_IG19
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG15: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref vpmovmskb r12d, xmm2 ;; size=4 bbWeight=2 PerfScore 4.00 @@ -459,7 +455,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1080, prolog size 43, PerfScore 1246.50, instruction count 241, allocated bytes for code 1080 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1052, prolog size 43, PerfScore 1238.50, instruction count 237, allocated bytes for code 1052 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:
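
Each entry header, for example `-28 (-2.59%)`, appears to report the change in total code bytes and that change as a percentage of the base size. The sketch below recomputes those figures from the "Total bytes of code" lines shown above; the function name `size_delta` and the short dictionary keys are hypothetical, chosen only for this illustration.

```python
def size_delta(base_bytes, diff_bytes):
    """Return (byte delta, percent change vs. base rounded to 2 decimals)."""
    delta = diff_bytes - base_bytes
    return delta, round(100.0 * delta / base_bytes, 2)


# (base, diff) pairs copied from the "Total bytes of code" lines above.
entries = {
    "MaxMagnitudeOperator`1[ubyte]:Invoke": (104, 83),
    "ProbabilisticMap:ContainsMask32CharsAvx2": (270, 256),
    "HalfAsWidenedUInt32ToSingle_Vector512": (162, 155),
    "ProbabilisticMap:IndexOfAnyVectorized": (1080, 1052),
}

for name, (base, diff) in entries.items():
    delta, pct = size_delta(base, diff)
    print(f"{delta} ({pct:.2f}%) : {name}")
```

The recomputed values match the headers in this report, which suggests the percentage is taken against the base method size rather than the diff size.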

libraries_tests.run.linux.x64.Release.mch

-28 (-20.29%) : 447031.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)

@@ -37,30 +37,26 @@ G_M12292_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 {k1}, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vpshufd xmm0, xmm2, -79 vpcmpeqd xmm1, xmm2, xmm0 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm2, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm0, xmm2, -54
+ vpblendmd xmm3 {k1}, xmm2, xmm0
vpcmpud k1, xmm2, xmm0, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm2, xmm0, -54
- vpternlogd xmm1, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm0, xmm2
+ vpternlogd xmm1, xmm3, xmm0, -54
vmovd eax, xmm1
- ;; size=124 bbWeight=1 PerfScore 24.67
+ ;; size=96 bbWeight=1 PerfScore 20.00
G_M12292_IG03: ; bbWeight=1, epilog, nogc, extend pop rbp ret ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 138, prolog size 7, PerfScore 31.42, instruction count 27, allocated bytes for code 138 (MethodHash=4174cffb) for method System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
+; Total bytes of code 110, prolog size 7, PerfScore 26.75, instruction count 23, allocated bytes for code 110 (MethodHash=4174cffb) for method System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
; ============================================================ Unwind Info:

-14 (-17.28%) : 433093.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,22 +34,20 @@ G_M36523_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm0, xmm1, -54
+ vpblendmd xmm3 {k1}, xmm1, xmm0
vpcmpud k1, xmm0, xmm1, 1
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [rdi], xmm2 mov rax, rdi ; byrRegs +[rax]
- ;; size=62 bbWeight=1 PerfScore 12.58
+ ;; size=48 bbWeight=1 PerfScore 10.25
G_M36523_IG03: ; bbWeight=1, epilog, nogc, extend pop rbp ret ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 81, prolog size 7, PerfScore 22.33, instruction count 18, allocated bytes for code 81 (MethodHash=772b7154) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 7, PerfScore 20.00, instruction count 16, allocated bytes for code 67 (MethodHash=772b7154) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================ Unwind Info:

-14 (-17.28%) : 432810.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -35,22 +35,20 @@ G_M23551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 {k1}, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [rdi], xmm2 mov rax, rdi ; byrRegs +[rax]
- ;; size=62 bbWeight=1 PerfScore 12.58
+ ;; size=48 bbWeight=1 PerfScore 10.25
G_M23551_IG03: ; bbWeight=1, epilog, nogc, extend pop rbp ret ;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 81, prolog size 7, PerfScore 22.33, instruction count 18, allocated bytes for code 81 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 7, PerfScore 20.00, instruction count 16, allocated bytes for code 67 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================ Unwind Info:

-7 (-2.86%) : 431769.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)

@@ -47,9 +47,8 @@ G_M25547_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpternlogd ymm1, ymm2, ymmword ptr [rbp+0x10], -54 vmovups ymm2, ymmword ptr [rbp-0x50] vpcmpud k1, ymm2, ymmword ptr [rbp-0x30], 1
- vpmovm2d ymm2, k1
- vmovups ymm3, ymmword ptr [rbp+0x30]
- vpternlogd ymm2, ymm3, ymmword ptr [rbp+0x10], -54
+ vmovups ymm2, ymmword ptr [rbp+0x10]
+ vpblendmd ymm2 {k1}, ymm2, ymmword ptr [rbp+0x30]
vpternlogd ymm0, ymm1, ymm2, -54 vmovups ymmword ptr [rbp-0x70], ymm0 mov rdi, 0xD1FFAB1E @@ -61,7 +60,7 @@ G_M25547_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vmovups ymmword ptr [rax], ymm0 mov rax, bword ptr [rbp-0x08]
- ;; size=174 bbWeight=1 PerfScore 62.50
+ ;; size=167 bbWeight=1 PerfScore 61.50
G_M25547_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper add rsp, 240 @@ -69,7 +68,7 @@ G_M25547_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 245, prolog size 55, PerfScore 79.33, instruction count 43, allocated bytes for code 245 (MethodHash=df969c34) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
+; Total bytes of code 238, prolog size 55, PerfScore 78.33, instruction count 42, allocated bytes for code 239 (MethodHash=df969c34) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
; ============================================================ Unwind Info:

-7 (-2.86%) : 431752.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)

@@ -47,9 +47,8 @@ G_M22549_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpternlogd ymm1, ymm2, ymmword ptr [rbp+0x10], -54 vmovups ymm2, ymmword ptr [rbp-0x30] vpcmpud k1, ymm2, ymmword ptr [rbp-0x50], 6
- vpmovm2d ymm2, k1
- vmovups ymm3, ymmword ptr [rbp+0x10]
- vpternlogd ymm2, ymm3, ymmword ptr [rbp+0x30], -54
+ vmovups ymm2, ymmword ptr [rbp+0x30]
+ vpblendmd ymm2 {k1}, ymm2, ymmword ptr [rbp+0x10]
vpternlogd ymm0, ymm1, ymm2, -54 vmovups ymmword ptr [rbp-0x70], ymm0 mov rdi, 0xD1FFAB1E @@ -61,7 +60,7 @@ G_M22549_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vmovups ymm0, ymmword ptr [rbp-0x70] vmovups ymmword ptr [rax], ymm0 mov rax, bword ptr [rbp-0x08]
- ;; size=174 bbWeight=1 PerfScore 62.50
+ ;; size=167 bbWeight=1 PerfScore 61.50
G_M22549_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper add rsp, 240 @@ -69,7 +68,7 @@ G_M22549_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 245, prolog size 55, PerfScore 79.33, instruction count 43, allocated bytes for code 245 (MethodHash=7ad0a7ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
+; Total bytes of code 238, prolog size 55, PerfScore 78.33, instruction count 42, allocated bytes for code 239 (MethodHash=7ad0a7ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
; ============================================================ Unwind Info:

-28 (-2.61%) : 378853.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)

@@ -103,7 +103,7 @@ ; V91 cse1 [V91,T23] ( 3, 17.08) simd32 -> mm7 "CSE - aggressive" ; V92 cse2 [V92,T24] ( 3, 17.08) simd32 -> mm8 "CSE - aggressive" ; V93 cse3 [V93,T25] ( 3, 17.08) simd32 -> mm9 "CSE - aggressive"
-; V94 cse4 [V94,T26] ( 3, 17.08) simd32 -> mm11 "CSE - aggressive"
+; V94 cse4 [V94,T26] ( 3, 17.08) simd32 -> mm10 "CSE - aggressive"
; ; Lcl frame size = 168 @@ -180,13 +180,12 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -197,14 +196,13 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
+ vpsubb ymm7, ymm4, ymm10
vmovups ymmword ptr [rbp-0x90], ymm3
- vpshufb ymm8, ymm3, ymm8
+ vpshufb ymm7, ymm3, ymm7
vmovups ymmword ptr [rbp-0x70], ymm2 vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm4, ymm4, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -213,7 +211,7 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vmovups ymmword ptr [rbp-0xB0], ymm4 vptest ymm4, ymm4 jne SHORT G_M48875_IG07
- ;; size=278 bbWeight=5.69 PerfScore 489.70
+ ;; size=264 bbWeight=5.69 PerfScore 484.00
G_M48875_IG05: ; bbWeight=4.77, gcVars=0000000000000401 {V04 V05}, gcrefRegs=0000 {}, byrefRegs=C008 {rbx r14 r15}, gcvars, byref ; byrRegs -[rcx] mov rcx, bword ptr [rbp-0xC0] @@ -335,12 +333,11 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpshufb xmm3, xmm5, xmm3 vpand xmm4, xmm4, xmmword ptr [reloc @RWD96] vpcmpub k1, xmm4, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm4, xmmword ptr [reloc @RWD160]
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm4, xmmword ptr [reloc @RWD160]
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm5, xmm6, xmm4, -54
- vpand xmm3, xmm5, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm5
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm3, xmm3, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -351,12 +348,11 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpshufb xmm4, xmm5, xmm4 vpand xmm2, xmm2, xmmword ptr [reloc @RWD96] vpcmpub k1, xmm2, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmmword ptr [reloc @RWD160]
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmmword ptr [reloc @RWD160]
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm2, xmm2, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -364,7 +360,7 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb vpand xmm2, xmm3, xmm2 vptest xmm2, xmm2 je G_M48875_IG25
- ;; size=250 bbWeight=0.07 PerfScore 4.43
+ ;; size=236 bbWeight=0.07 PerfScore 4.36
G_M48875_IG18: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rbx r14 r15}, byref vpmovmskb r12d, xmm2 ;; size=4 bbWeight=0.07 PerfScore 0.13 @@ -474,7 +470,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1074, prolog size 42, PerfScore 652.51, instruction count 236, allocated bytes for code 1074 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
+; Total bytes of code 1046, prolog size 42, PerfScore 646.75, instruction count 232, allocated bytes for code 1046 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
; ============================================================ Unwind Info:

librariestestsnotieredcompilation.run.linux.x64.Release.mch

-28 (-2.59%) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -106,13 +106,13 @@ ; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate" ; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate" ; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate" ; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate" ; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate" ; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate" ; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
; ; Lcl frame size = 136 @@ -194,13 +194,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm6, ymm6, ymm8 vmovups ymm9, ymmword ptr [reloc @RWD128] vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm5, ymm5, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -211,12 +210,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb ymm6, ymm7, ymm6 vpand ymm4, ymm4, ymm8 vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6 vpcmpeqb ymm4, ymm4, ymm6 vpcmpeqd ymm6, ymm6, ymm6 @@ -224,7 +222,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand ymm4, ymm5, ymm4 vptest ymm4, ymm4 je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref vpermq ymm4, ymm4, -40 vpmovmskb r12d, ymm4 @@ -356,13 +354,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm4, xmm4, xmm7 vmovups xmm8, xmmword ptr [reloc @RWD128] vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm3, xmm3, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -372,12 +369,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpshufb xmm4, xmm6, xmm4 vpand xmm2, xmm2, xmm7 vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54 - vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5 + vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4 vpcmpeqb xmm2, xmm2, xmm4 vpcmpeqd xmm4, xmm4, xmm4 @@ -385,7 +381,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r vpand xmm2, xmm3, xmm2 vptest xmm2, xmm2 je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref vpmovmskb r12d, xmm2 ;; size=4 bbWeight=2 PerfScore 4.00 @@ -473,7 +469,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1081, prolog size 43, PerfScore 1253.25, instruction count 240, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1053, prolog size 43, PerfScore 1245.25, instruction count 236, allocated bytes for code 1053 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

-14 (-1.65%) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

@@ -98,8 +98,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vmovups ymm1, ymmword ptr [rdi+0x10]
  vmovups ymmword ptr [rbp-0x50], ymm1
  vpcmpud k1, ymm0, ymm1, 6
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
+ vpblendmd ymm2 {k1}, ymm1, ymm0
  vmovups ymmword ptr [rbp-0x70], ymm2
  mov rdi, 0xD1FFAB1E ; System.Action`2[int,uint]
  ; gcrRegs -[rdi]
@@ -138,9 +137,9 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- ;; size=313 bbWeight=1 PerfScore 86.50
-G_M21446_IG03: ; bbWeight=1, extend
  vmovdqu xmm0, xmmword ptr [rbp-0xB0]
+ ;; size=314 bbWeight=1 PerfScore 88.33
+G_M21446_IG03: ; bbWeight=1, extend
  vpextrd edx, xmm0, 3
  mov esi, 3
  mov rdi, gword ptr [r15+0x08]
@@ -182,9 +181,8 @@ G_M21446_IG03: ; bbWeight=1, extend
  vmovups ymm0, ymmword ptr [rbp-0x30]
  vmovups ymm1, ymmword ptr [rbp-0x50]
  vpcmpud k1, ymm0, ymm1, 2
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
- vmovups ymmword ptr [rbp-0x90], ymm2
+ vpblendmd ymm0 {k1}, ymm1, ymm0
+ vmovups ymmword ptr [rbp-0x90], ymm0
  mov rdi, 0xD1FFAB1E ; System.Action`2[int,uint]
  call CORINFO_HELP_NEWSFAST
  ; gcrRegs +[rax]
@@ -199,63 +197,63 @@ G_M21446_IG03: ; bbWeight=1, extend
  ; byrRegs -[rdi]
  mov rdx, 0xD1FFAB1E ; code for <unknown method>
  mov qword ptr [r15+0x18], rdx
- vmovups ymm2, ymmword ptr [rbp-0x90]
- vmovups ymmword ptr [rbp-0xD0], ymm2
- vmovd edx, xmm2
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vmovups ymmword ptr [rbp-0xD0], ymm0
+ vmovd edx, xmm0
  xor esi, esi
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 1
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 1
  mov esi, 1
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 2
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 2
  mov esi, 2
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 3
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 3
  mov esi, 3
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vmovd edx, xmm0
- ;; size=368 bbWeight=1 PerfScore 139.25
-G_M21446_IG04: ; bbWeight=1, extend
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vmovd edx, xmm1
  mov esi, 4
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
+ ;; size=362 bbWeight=1 PerfScore 137.33
+G_M21446_IG04: ; bbWeight=1, extend
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 1
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 1
  mov esi, 5
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 2
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 2
  mov esi, 6
  mov rdi, gword ptr [r15+0x08]
  ; gcrRegs +[rdi]
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm0, ymm0, 1
  vpextrd edx, xmm0, 3
  mov esi, 7
  mov rdi, gword ptr [r15+0x08]
@@ -263,7 +261,7 @@ G_M21446_IG04: ; bbWeight=1, extend
  call [r15+0x18]<unknown method>
  ; gcrRegs -[rdi r15]
  nop
- ;; size=113 bbWeight=1 PerfScore 48.25
+ ;; size=104 bbWeight=1 PerfScore 46.00
 G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
  vzeroupper
  add rsp, 208
@@ -277,7 +275,7 @@ G_M21446_IG06: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
  int3
  ;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 847, prolog size 31, PerfScore 283.75, instruction count 165, allocated bytes for code 847 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 833, prolog size 31, PerfScore 281.42, instruction count 163, allocated bytes for code 833 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================ Unwind Info:
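
The hunks above replace a `vpmovm2d` / `vpternlogd ..., -54` pair with a single masked `vpblendmd`. The following is a minimal sketch of why the two forms select the same elements; it is an illustration written for this note, not the JIT's implementation, and the helper names (`ternlog`, `old_sequence`, `new_sequence`) are hypothetical. It assumes the usual ternary-logic indexing, where the destination operand supplies the high index bit, and relies on the fact that -54 as an unsigned byte is 0xCA, the truth table for a bitwise `A ? B : C`.

```python
def ternlog(a, b, c, imm8, width=32):
    """Bitwise ternary logic: result bit i is imm8[(a_i << 2) | (b_i << 1) | c_i]."""
    result = 0
    for i in range(width):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result


def old_sequence(k_bit, x, y, width=32):
    """Model of one element of: vpmovm2d m, k1 ; vpternlogd m, x, y, -54."""
    # vpmovm2d expands the mask bit to an all-ones or all-zeros element.
    mask = (1 << width) - 1 if k_bit else 0
    # -54 & 0xFF == 0xCA, i.e. "mask ? x : y" applied bit by bit.
    return ternlog(mask, x, y, -54 & 0xFF, width)


def new_sequence(k_bit, x, y):
    """Model of one element of: vpblendmd dst {k1}, y, x."""
    # The masked blend takes x where the mask bit is set, else y.
    return x if k_bit else y


if __name__ == "__main__":
    x, y = 0x12345678, 0x9ABCDEF0
    for k_bit in (0, 1):
        print(k_bit, hex(old_sequence(k_bit, x, y)), hex(new_sequence(k_bit, x, y)))
```

Because the mask produced by `vpmovm2d` is all-ones or all-zeros per element, the bitwise select degenerates to a per-element select, which is what the masked blend computes directly from `k1`.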

smoke_tests.nativeaot.linux.x64.checked.mch

-28 (-2.61%) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -105,13 +105,13 @@
 ; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
 ; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
 ; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
 ; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
 ; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
 ; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
 ; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
 ; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
 ;
 ; Lcl frame size = 136
@@ -193,13 +193,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand ymm6, ymm6, ymm8
  vmovups ymm9, ymmword ptr [reloc @RWD128]
  vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
  vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm5, ymm5, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
@@ -210,12 +209,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpshufb ymm6, ymm7, ymm6
  vpand ymm4, ymm4, ymm8
  vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm4, ymm4, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
@@ -223,7 +221,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
  je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
 G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
  vpermq ymm4, ymm4, -40
  vpmovmskb r12d, ymm4
@@ -355,13 +353,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand xmm4, xmm4, xmm7
  vmovups xmm8, xmmword ptr [reloc @RWD128]
  vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
  vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
  vxorps xmm4, xmm4, xmm4
  vpcmpeqb xmm3, xmm3, xmm4
  vpcmpeqd xmm4, xmm4, xmm4
@@ -371,12 +368,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpshufb xmm4, xmm6, xmm4
  vpand xmm2, xmm2, xmm7
  vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
  vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
  vxorps xmm4, xmm4, xmm4
  vpcmpeqb xmm2, xmm2, xmm4
  vpcmpeqd xmm4, xmm4, xmm4
@@ -384,7 +380,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
  vpand xmm2, xmm3, xmm2
  vptest xmm2, xmm2
  je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
 G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
  vpmovmskb r12d, xmm2
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -472,7 +468,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1072, prolog size 43, PerfScore 1188.50, instruction count 240, allocated bytes for code 1072 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1044, prolog size 43, PerfScore 1180.50, instruction count 236, allocated bytes for code 1044 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Cfi Info:
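
The per-method headers (for example `-28 (-2.61%)`) can be cross-checked against the `;; size=` and `Total bytes of code` lines in the diff. The sketch below does that arithmetic with numbers copied from the diffs in this report. The helper name `size_delta` is hypothetical, and rounding to two decimals is an assumption that happens to match the printed values.

```python
def size_delta(base_bytes, diff_bytes):
    """Return (byte delta, percent of base rounded to two decimals)."""
    delta = diff_bytes - base_bytes
    return delta, round(100.0 * delta / base_bytes, 2)


# (base, diff) block sizes from the ";; size=" lines of IG06 and IG18
# in System.Buffers.ProbabilisticMap:IndexOfAnyVectorized.
blocks = [(254, 240), (244, 230)]
block_delta = sum(diff - base for base, diff in blocks)

# "Total bytes of code" for the same method in the nativeaot collection.
method_delta, method_pct = size_delta(1072, 1044)

if __name__ == "__main__":
    print(block_delta, method_delta, method_pct)
    # TestConditionalSelect[uint] from libraries_tests: 847 -> 833 bytes.
    print(size_delta(847, 833))
```

The two changed blocks account for the whole method delta, which fits the diff: each block drops two `vpmovm2b` instructions and nothing else changes size.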

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.linux.x64.checked.mch | 2 | 2 | 0 | 0 | -42 | +0 |
| benchmarks.run_pgo.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.linux.x64.checked.mch | 16 | 16 | 0 | 0 | -528 | +0 |
| libraries.crossgen2.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.linux.x64.checked.mch | 24 | 24 | 0 | 0 | -469 | +0 |
| libraries_tests.run.linux.x64.Release.mch | 75 | 75 | 0 | 0 | -8,897 | +0 |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 7 | 7 | 0 | 0 | -756 | +0 |
| realworld.run.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| | 125 | 125 | 0 | 0 | -10,720 | +0 |
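
As a quick consistency check, the unlabeled last row of this table should equal the column sums. The sketch below recomputes two of the columns from the rows above; the dictionary layout and variable names are hypothetical, and only collections with nonzero diffs are listed.

```python
# collection -> (contexts with diffs, improvement bytes), copied from the table
rows = {
    "benchmarks.run": (2, -42),
    "coreclr_tests.run": (16, -528),
    "libraries.pmi": (24, -469),
    "libraries_tests.run": (75, -8897),
    "librariestestsnotieredcompilation.run": (7, -756),
    "smoke_tests.nativeaot": (1, -28),
}

total_contexts = sum(contexts for contexts, _ in rows.values())
total_bytes = sum(size for _, size in rows.values())

if __name__ == "__main__":
    print(total_contexts, total_bytes)
```

Both sums agree with the totals row (125 contexts, -10,720 bytes) and with the "Overall (-10,720 bytes)" heading for linux x64.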

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.linux.x64.checked.mch | 42,535 | 3,142 | 39,393 | 322 (0.75%) | 322 (0.75%) |
| benchmarks.run_pgo.linux.x64.checked.mch | 158,221 | 60,171 | 98,050 | 156 (0.10%) | 156 (0.10%) |
| benchmarks.run_tiered.linux.x64.checked.mch | 56,416 | 42,280 | 14,136 | 84 (0.15%) | 84 (0.15%) |
| coreclr_tests.run.linux.x64.checked.mch | 596,312 | 354,685 | 241,627 | 459 (0.08%) | 459 (0.08%) |
| libraries.crossgen2.linux.x64.checked.mch | 234,032 | 15 | 234,017 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.x64.checked.mch | 294,187 | 6 | 294,181 | 2,047 (0.69%) | 2,047 (0.69%) |
| libraries_tests.run.linux.x64.Release.mch | 760,255 | 495,575 | 264,680 | 1,397 (0.18%) | 1,397 (0.18%) |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 303,253 | 21,873 | 281,380 | 2,095 (0.69%) | 2,095 (0.69%) |
| realworld.run.linux.x64.checked.mch | 32,730 | 9 | 32,721 | 339 (1.03%) | 339 (1.03%) |
| smoke_tests.nativeaot.linux.x64.checked.mch | 27,417 | 10 | 27,407 | 5 (0.02%) | 5 (0.02%) |
| | 2,505,358 | 977,766 | 1,527,592 | 6,904 (0.27%) | 6,904 (0.27%) |
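
The "Missed" percentages in this table appear to be computed against the sum of diffed and missed contexts rather than against diffed contexts alone. That is inferred from the numbers here, not taken from the SuperPMI source, so treat the formula in this sketch as an assumption; the function name `missed_pct` is hypothetical.

```python
def missed_pct(diffed, missed):
    """Missed contexts as a percentage of all contexts (diffed + missed)."""
    total = diffed + missed
    return round(100.0 * missed / total, 2) if total else 0.0


# (diffed contexts, missed contexts) copied from the table
samples = {
    "benchmarks.run": (42535, 322),
    "libraries.pmi": (294187, 2047),
    "realworld.run": (32730, 339),
    "total": (2505358, 6904),
}

if __name__ == "__main__":
    for name, (diffed, missed) in samples.items():
        print(name, missed_pct(diffed, missed))
```

Dividing by diffed contexts alone would give slightly larger values for some rows, for example about 0.70% instead of the reported 0.69% for libraries.pmi, which is why the combined denominator looks like the better fit.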

jit-analyze output

benchmarks.run.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16154102 (overridden on cmd)
Total bytes of diff: 16154060 (overridden on cmd)
Total bytes of delta: -42 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 27287.dasm (-2.59 % of base)
         -14 : 10422.dasm (-4.14 % of base)

2 total files with Code Size differences (2 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.59 % of base) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-4.14 % of base) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)

Top method improvements (percentages):
         -14 (-4.14 % of base) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
         -28 (-2.59 % of base) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

2 total methods with Code Size differences (2 improved, 0 regressed).


coreclr_tests.run.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 403032058 (overridden on cmd)
Total bytes of diff: 403031530 (overridden on cmd)
Total bytes of delta: -528 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 491718.dasm (-3.07 % of base)
         -56 : 491721.dasm (-3.14 % of base)
         -56 : 491717.dasm (-3.11 % of base)
         -56 : 491722.dasm (-3.09 % of base)
         -32 : 205671.dasm (-0.78 % of base)
         -32 : 205673.dasm (-0.77 % of base)
         -32 : 205678.dasm (-0.78 % of base)
         -32 : 205679.dasm (-0.77 % of base)
         -28 : 491716.dasm (-1.57 % of base)
         -28 : 491720.dasm (-1.57 % of base)
         -28 : 491715.dasm (-1.60 % of base)
         -28 : 491719.dasm (-1.57 % of base)
         -16 : 205676.dasm (-0.39 % of base)
         -16 : 205670.dasm (-0.39 % of base)
         -16 : 205669.dasm (-0.40 % of base)
         -16 : 205675.dasm (-0.39 % of base)

16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-3.07 % of base) : 491718.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-3.14 % of base) : 491721.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-3.09 % of base) : 491722.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-3.11 % of base) : 491717.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -32 (-0.77 % of base) : 205673.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 205678.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 205679.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 205671.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -28 (-1.57 % of base) : 491720.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.60 % of base) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.57 % of base) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.57 % of base) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -16 (-0.39 % of base) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.40 % of base) : 205669.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

Top method improvements (percentages):
         -56 (-3.14 % of base) : 491721.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-3.11 % of base) : 491717.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -56 (-3.09 % of base) : 491722.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-3.07 % of base) : 491718.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -28 (-1.60 % of base) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.57 % of base) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.57 % of base) : 491720.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.57 % of base) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -32 (-0.78 % of base) : 205678.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 205671.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 205679.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 205673.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -16 (-0.40 % of base) : 205669.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

16 total methods with Code Size differences (16 improved, 0 regressed).


libraries.pmi.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 58945732 (overridden on cmd)
Total bytes of diff: 58945263 (overridden on cmd)
Total bytes of delta: -469 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 207071.dasm (-20.51 % of base)
         -56 : 207128.dasm (-20.51 % of base)
         -28 : 207073.dasm (-17.18 % of base)
         -28 : 207130.dasm (-17.18 % of base)
         -28 : 20791.dasm (-2.59 % of base)
         -21 : 207070.dasm (-20.19 % of base)
         -21 : 207092.dasm (-20.19 % of base)
         -21 : 207127.dasm (-20.19 % of base)
         -21 : 207149.dasm (-20.19 % of base)
         -14 : 207125.dasm (-17.28 % of base)
         -14 : 207148.dasm (-16.67 % of base)
         -14 : 207069.dasm (-16.67 % of base)
         -14 : 207072.dasm (-15.56 % of base)
         -14 : 20784.dasm (-5.19 % of base)
         -14 : 207068.dasm (-17.28 % of base)
         -14 : 207090.dasm (-17.28 % of base)
         -14 : 207091.dasm (-16.67 % of base)
         -14 : 207126.dasm (-16.67 % of base)
         -14 : 207129.dasm (-15.56 % of base)
         -14 : 207147.dasm (-17.28 % of base)

24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-20.51 % of base) : 207071.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.51 % of base) : 207128.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -28 (-2.59 % of base) : 20791.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -28 (-17.18 % of base) : 207073.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-17.18 % of base) : 207130.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -21 (-20.19 % of base) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207092.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-5.41 % of base) : 20786.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-5.19 % of base) : 20784.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207068.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-15.56 % of base) : 207072.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.67 % of base) : 207069.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207090.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207091.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207125.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-15.56 % of base) : 207129.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.67 % of base) : 207126.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207147.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

Top method improvements (percentages):
         -56 (-20.51 % of base) : 207071.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.51 % of base) : 207128.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -21 (-20.19 % of base) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207092.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.19 % of base) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207068.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207090.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207125.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-17.28 % of base) : 207147.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -28 (-17.18 % of base) : 207073.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-17.18 % of base) : 207130.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-16.67 % of base) : 207069.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207091.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207126.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.67 % of base) : 207148.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-15.56 % of base) : 207072.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-15.56 % of base) : 207129.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
          -7 (-5.51 % of base) : 20787.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-5.41 % of base) : 20786.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

24 total methods with Code Size differences (24 improved, 0 regressed).


libraries_tests.run.linux.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 340323621 (overridden on cmd)
Total bytes of diff: 340314724 (overridden on cmd)
Total bytes of delta: -8897 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -350 : 432506.dasm (-19.84 % of base)
        -350 : 433621.dasm (-15.43 % of base)
        -350 : 433873.dasm (-15.43 % of base)
        -350 : 437375.dasm (-15.26 % of base)
        -350 : 437419.dasm (-15.26 % of base)
        -350 : 439257.dasm (-19.56 % of base)
        -350 : 439478.dasm (-15.26 % of base)
        -350 : 439729.dasm (-15.26 % of base)
        -350 : 446906.dasm (-15.43 % of base)
        -350 : 446913.dasm (-17.22 % of base)
        -350 : 446980.dasm (-15.43 % of base)
        -350 : 448831.dasm (-15.26 % of base)
        -350 : 448983.dasm (-15.26 % of base)
        -350 : 449381.dasm (-15.26 % of base)
        -350 : 449490.dasm (-15.26 % of base)
        -350 : 449527.dasm (-17.02 % of base)
        -350 : 449532.dasm (-19.56 % of base)
        -350 : 432478.dasm (-17.22 % of base)
        -182 : 443696.dasm (-17.43 % of base)
        -182 : 443767.dasm (-17.42 % of base)

62 total files with Code Size differences (62 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -350 (-17.22 % of base) : 446913.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-17.22 % of base) : 432478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-19.84 % of base) : 432506.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-17.02 % of base) : 449527.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 439257.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 449532.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 433873.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 446906.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 433621.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.43 % of base) : 446980.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 437375.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 439478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 448831.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 449381.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 437419.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 439729.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 448983.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -350 (-15.26 % of base) : 449490.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
        -182 (-17.43 % of base) : 443696.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.43 % of base) : 446405.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)

Top method improvements (percentages):
         -28 (-20.29 % of base) : 447031.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
        -350 (-19.84 % of base) : 432506.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 439257.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -350 (-19.56 % of base) : 449532.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
        -182 (-17.43 % of base) : 443696.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.43 % of base) : 446405.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.42 % of base) : 443767.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
        -182 (-17.42 % of base) : 446518.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
         -14 (-17.28 % of base) : 437547.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]](System.Runtime.Intrinsics.Vector128`1[ulong]):ulong (Tier1)
         -14 (-17.28 % of base) : 449076.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.Runtime.Intrinsics.Vector128`1[ulong]):ulong (Tier1)
         -14 (-17.28 % of base) : 432810.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.28 % of base) : 447015.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.28 % of base) : 433842.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -14 (-17.28 % of base) : 433093.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
        -350 (-17.22 % of base) : 446913.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
        -350 (-17.22 % of base) : 432478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
         -14 (-17.07 % of base) : 437311.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-17.07 % of base) : 437354.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-17.07 % of base) : 439184.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-17.07 % of base) : 449015.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)


librariestestsnotieredcompilation.run.linux.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 131037400 (overridden on cmd)
Total bytes of diff: 131036644 (overridden on cmd)
Total bytes of delta: -756 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -182 : 160167.dasm (-17.91 % of base)
        -182 : 160726.dasm (-17.91 % of base)
        -154 : 160688.dasm (-17.21 % of base)
         -98 : 160519.dasm (-15.38 % of base)
         -98 : 160706.dasm (-15.38 % of base)
         -28 : 142512.dasm (-2.59 % of base)
         -14 : 161431.dasm (-1.65 % of base)

7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -182 (-17.91 % of base) : 160726.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.91 % of base) : 160167.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.21 % of base) : 160688.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.38 % of base) : 160519.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.38 % of base) : 160706.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.59 % of base) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-1.65 % of base) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

Top method improvements (percentages):
        -182 (-17.91 % of base) : 160726.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.91 % of base) : 160167.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.21 % of base) : 160688.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.38 % of base) : 160519.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.38 % of base) : 160706.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.59 % of base) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -14 (-1.65 % of base) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

7 total methods with Code Size differences (7 improved, 0 regressed).


smoke_tests.nativeaot.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 4193336 (overridden on cmd)
Total bytes of diff: 4193308 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 2104.dasm (-2.61 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.61 % of base) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
         -28 (-2.61 % of base) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).



osx arm64

Diffs are based on 2,229,935 contexts (927,360 MinOpts, 1,302,575 FullOpts).

MISSED contexts: 6,082 (0.27%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| benchmarks.run_pgo.osx.arm64.checked.mch | 84,643 | 48,345 | 36,298 | 183 (0.22%) | 183 (0.22%) |
| benchmarks.run_tiered.osx.arm64.checked.mch | 48,253 | 37,331 | 10,922 | 63 (0.13%) | 63 (0.13%) |
| coreclr_tests.run.osx.arm64.checked.mch | 586,148 | 358,028 | 228,120 | 437 (0.07%) | 437 (0.07%) |
| libraries.crossgen2.osx.arm64.checked.mch | 233,760 | 15 | 233,745 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.osx.arm64.checked.mch | 313,588 | 18 | 313,570 | 2,028 (0.64%) | 2,028 (0.64%) |
| libraries_tests.run.osx.arm64.Release.mch | 631,294 | 462,062 | 169,232 | 963 (0.15%) | 963 (0.15%) |
| librariestestsnotieredcompilation.run.osx.arm64.Release.mch | 301,031 | 21,558 | 279,473 | 2,083 (0.69%) | 2,083 (0.69%) |
| realworld.run.osx.arm64.checked.mch | 31,218 | 3 | 31,215 | 325 (1.03%) | 325 (1.03%) |
| | 2,229,935 | 927,360 | 1,302,575 | 6,082 (0.27%) | 6,082 (0.27%) |


windows arm64

Diffs are based on 2,308,464 contexts (929,692 MinOpts, 1,378,772 FullOpts).

MISSED contexts: 6,334 (0.27%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| benchmarks.run.windows.arm64.checked.mch | 24,218 | 4 | 24,214 | 229 (0.94%) | 229 (0.94%) |
| benchmarks.run_pgo.windows.arm64.checked.mch | 96,879 | 48,066 | 48,813 | 104 (0.11%) | 104 (0.11%) |
| benchmarks.run_tiered.windows.arm64.checked.mch | 48,412 | 36,693 | 11,719 | 61 (0.13%) | 61 (0.13%) |
| coreclr_tests.run.windows.arm64.checked.mch | 595,265 | 362,539 | 232,726 | 438 (0.07%) | 438 (0.07%) |
| libraries.crossgen2.windows.arm64.checked.mch | 243,831 | 15 | 243,816 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.arm64.checked.mch | 302,817 | 6 | 302,811 | 2,054 (0.67%) | 2,054 (0.67%) |
| libraries_tests.run.windows.arm64.Release.mch | 625,154 | 460,799 | 164,355 | 900 (0.14%) | 900 (0.14%) |
| librariestestsnotieredcompilation.run.windows.arm64.Release.mch | 314,858 | 21,559 | 293,299 | 2,179 (0.69%) | 2,179 (0.69%) |
| realworld.run.windows.arm64.checked.mch | 32,878 | 3 | 32,875 | 366 (1.10%) | 366 (1.10%) |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 24,152 | 8 | 24,144 | 3 (0.01%) | 3 (0.01%) |
| | 2,308,464 | 929,692 | 1,378,772 | 6,334 (0.27%) | 6,334 (0.27%) |


windows x64

Diffs are based on 2,366,413 contexts (928,740 MinOpts, 1,437,673 FullOpts).

MISSED contexts: 6,788 (0.29%)

Overall (-2,856 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.windows.x64.checked.mch | 8,538,947 | -28 |
| coreclr_tests.run.windows.x64.checked.mch | 393,235,161 | -528 |
| libraries.pmi.windows.x64.checked.mch | 60,138,217 | -499 |
| libraries_tests.run.windows.x64.Release.mch | 278,258,792 | -1,014 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 135,861,173 | -759 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,087,315 | -28 |

MinOpts (-248 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| coreclr_tests.run.windows.x64.checked.mch | 273,504,444 | -192 |
| libraries_tests.run.windows.x64.Release.mch | 175,002,442 | -56 |

FullOpts (-2,608 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run.windows.x64.checked.mch | 8,538,586 | -28 |
| coreclr_tests.run.windows.x64.checked.mch | 119,730,717 | -336 |
| libraries.pmi.windows.x64.checked.mch | 60,024,698 | -499 |
| libraries_tests.run.windows.x64.Release.mch | 103,256,350 | -958 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 124,984,011 | -759 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,086,368 | -28 |

Example diffs

benchmarks.run.windows.x64.checked.mch

-28 (-2.50%) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm1, ymm3, ymm1
  vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm1, ymm1, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb ymm2, ymm3, ymm2
  vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
  vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
  vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
  vxorps ymm2, ymm2, ymm2
  vpcmpeqb ymm0, ymm0, ymm2
  vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand ymm0, ymm1, ymm0
  vptest ymm0, ymm0
  je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
  G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpermq ymm0, ymm0, -40
  vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm1, xmm10, xmm1
  vpand xmm2, xmm2, xmm11
  vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm1, xmm1, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpshufb xmm2, xmm10, xmm2
  vpand xmm0, xmm0, xmm11
  vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
  vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
  vxorps xmm2, xmm2, xmm2
  vpcmpeqb xmm0, xmm0, xmm2
  vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
  vpand xmm0, xmm1, xmm0
  vptest xmm0, xmm0
  je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
  G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
  vpmovmskb ebp, xmm0
  ;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
  RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
  ; ============================================================
  Unwind Info:

coreclr_tests.run.windows.x64.checked.mch

-28 (-1.46%) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)

@@ -331,10 +331,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm8, ymm6, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -389,10 +388,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm8, ymm7, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -447,10 +445,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm8, ymm6, 5
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -505,10 +502,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M1266_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpd k1, ymm7, ymm6, 5
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M1266_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -687,7 +683,7 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
  ret
  ;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1922, prolog size 82, PerfScore 1043.58, instruction count 381, allocated bytes for code 1922 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1894, prolog size 82, PerfScore 1038.92, instruction count 377, allocated bytes for code 1894 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
  ; ============================================================
  Unwind Info:
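
The hunks in the diff above replace a `vpmovm2d` + `vpternlogd ..., -54` pair with a single `vpblendmd`. The following is a minimal Python sketch of why the two sequences produce the same lanes, assuming the documented instruction semantics: `vpmovm2d` expands each mask bit to an all-ones or all-zeros lane, `vpternlogd` uses the immediate as an 8-entry truth table indexed by (destination bit, second operand bit, third operand bit), and `vpblendmd dst {k1}, src1, src2` takes `src2` where the mask bit is set and `src1` otherwise. All function names here are hypothetical helpers for illustration; this is a model of the instructions, not the JIT's implementation.

```python
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def ternlog(a, b, c, imm8):
    """Model of vpternlog on one lane: per bit, look up imm8[(a << 2) | (b << 1) | c]."""
    out = 0
    for i in range(LANE_BITS):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out


def movm2d(k, lanes):
    """Model of vpmovm2d: expand each mask bit to an all-ones or all-zeros lane."""
    return [LANE_MASK if (k >> i) & 1 else 0 for i in range(lanes)]


def old_sequence(k, src2, src3):
    """vpmovm2d dst, k1 followed by vpternlogd dst, src2, src3, -54."""
    imm8 = -54 & 0xFF  # the signed immediate -54 is the byte 0xCA
    dst = movm2d(k, len(src2))
    return [ternlog(d, b, c, imm8) for d, b, c in zip(dst, src2, src3)]


def new_sequence(k, src1, src2):
    """vpblendmd dst {k1}, src1, src2: src2 where the mask bit is set, else src1."""
    return [b if (k >> i) & 1 else a for i, (a, b) in enumerate(zip(src1, src2))]


# Stand-ins for the ymm7 and ymm8 register contents (8 dword lanes each).
ymm7 = [0x11111111 * (i + 1) & LANE_MASK for i in range(8)]
ymm8 = [0xDEADBEEF ^ (i * 0x01010101) for i in range(8)]

# Every possible 8-bit mask yields identical results for both sequences.
equivalent = all(
    old_sequence(k, ymm8, ymm7) == new_sequence(k, ymm7, ymm8) for k in range(256)
)
```

With the truth table `0xCA`, each output bit is "first input ? second input : third input", so once the first input is a lane-wide mask the ternary-logic step is a plain select, which is what the masked blend does directly from `k1`.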

-28 (-1.43%) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)

@@ -331,10 +331,9 @@ G_M59915_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm8, ymm6, 2
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -389,10 +388,9 @@ G_M59915_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm8, ymm7, 2
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -447,10 +445,9 @@ G_M59915_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm8, ymm6, 5
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -505,10 +502,9 @@ G_M59915_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  ;; size=11 bbWeight=4 PerfScore 6.00
  G_M59915_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
  vpcmpq k1, ymm7, ymm6, 5
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
  xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
  G_M59915_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
  mov ecx, ebx
  vmovups ymmword ptr [rsp+0x20], ymm9
@@ -687,7 +683,7 @@ G_M59915_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
  ret
  ;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1956, prolog size 82, PerfScore 1049.58, instruction count 381, allocated bytes for code 1956 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 1928, prolog size 82, PerfScore 1044.92, instruction count 377, allocated bytes for code 1928 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
  ; ============================================================
  Unwind Info:

-28 (-1.42%) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)

@@ -334,10 +334,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=11 bbWeight=4 PerfScore 6.00 G_M8563_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpcmpw k1, ymm6, ymm7, 2
- vpmovm2w ymm9, k1 - vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov ecx, ebx vmovups ymmword ptr [rsp+0x20], ymm9 @@ -392,10 +391,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=11 bbWeight=4 PerfScore 6.00 G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpcmpw k1, ymm6, ymm8, 2
- vpmovm2w ymm9, k1 - vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov ecx, ebx vmovups ymmword ptr [rsp+0x20], ymm9 @@ -450,10 +448,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=11 bbWeight=4 PerfScore 6.00 G_M8563_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpcmpw k1, ymm6, ymm7, 5
- vpmovm2w ymm9, k1 - vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov ecx, ebx vmovups ymmword ptr [rsp+0x20], ymm9 @@ -508,10 +505,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref ;; size=11 bbWeight=4 PerfScore 6.00 G_M8563_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref vpcmpw k1, ymm8, ymm7, 5
- vpmovm2w ymm9, k1 - vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz mov ecx, ebx vmovups ymmword ptr [rsp+0x20], ymm9 @@ -690,7 +686,7 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, ret ;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1968, prolog size 82, PerfScore 1206.33, instruction count 383, allocated bytes for code 1968 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
+; Total bytes of code 1940, prolog size 82, PerfScore 1204.33, instruction count 379, allocated bytes for code 1940 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
; ============================================================ Unwind Info:
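
The recurring change in these diffs replaces a mask-to-vector expansion (`vpmovm2*`) followed by `vpternlog*` with immediate `-54` by a single `vpblendm*`. The sketch below is only a scalar model written for illustration, not JIT code: the helper names (`ternlog`, `movm2`, `old_sequence`, `new_sequence`) are hypothetical, and lanes are modeled as plain Python integers. It assumes the usual reading of the immediate, where `-54` as an unsigned byte is `0xCA` and each result bit is looked up in the immediate by the three input bits, which gives "first operand ? second : third".

```python
def ternlog(a, b, c, imm8, width=16):
    """Model of a bitwise ternary-logic op: each result bit is imm8[(a<<2)|(b<<1)|c]."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out


def movm2(mask, lanes, width=16):
    """Model of vpmovm2*: expand each mask bit into an all-ones or all-zeros lane."""
    ones = (1 << width) - 1
    return [ones if (mask >> i) & 1 else 0 for i in range(lanes)]


def old_sequence(mask, x, y, width=16):
    """Old shape: expand the mask, then select with ternary logic (imm8 = 0xCA)."""
    expanded = movm2(mask, len(x), width)
    return [ternlog(m, xi, yi, 0xCA, width) for m, xi, yi in zip(expanded, x, y)]


def new_sequence(mask, x, y):
    """New shape: masked blend, taking x where the mask bit is set and y elsewhere."""
    return [xi if (mask >> i) & 1 else yi for i, (xi, yi) in enumerate(zip(x, y))]
```

Under this model both functions return the same lanes for any mask, which is consistent with the diffs dropping one instruction per select while keeping the same operands.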

-16 (-0.39%) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)

@@ -437,14 +437,13 @@ G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x100], ecx jmp G_M59915_IG25
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 4 jae G_M59915_IG54 @@ -523,14 +522,13 @@ G_M59915_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x108], ecx jmp G_M59915_IG30
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 4 jae G_M59915_IG54 @@ -609,14 +607,13 @@ G_M59915_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x110], ecx jmp G_M59915_IG35
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 4 jae G_M59915_IG54 @@ -695,14 +692,13 @@ G_M59915_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M59915_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x118], ecx jmp G_M59915_IG40
- ;; size=75 bbWeight=1 PerfScore 23.25
+ ;; size=71 bbWeight=1 PerfScore 22.25
G_M59915_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 4 jae G_M59915_IG54 @@ -964,7 +960,7 @@ G_M59915_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4100, prolog size 77, PerfScore 831.26, instruction count 657, allocated bytes for code 4100 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
+; Total bytes of code 4084, prolog size 77, PerfScore 827.26, instruction count 653, allocated bytes for code 4088 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
; ============================================================ Unwind Info:

-16 (-0.39%) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)

@@ -443,14 +443,13 @@ G_M44299_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x100], ecx jmp G_M44299_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 32 jae G_M44299_IG54 @@ -529,14 +528,13 @@ G_M44299_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x108], ecx jmp G_M44299_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 32 jae G_M44299_IG54 @@ -615,14 +613,13 @@ G_M44299_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x110], ecx jmp G_M44299_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 32 jae G_M44299_IG54 @@ -701,14 +698,13 @@ G_M44299_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M44299_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x118], ecx jmp G_M44299_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M44299_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 32 jae G_M44299_IG54 @@ -970,7 +966,7 @@ G_M44299_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 { int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4123, prolog size 77, PerfScore 865.01, instruction count 663, allocated bytes for code 4123 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
+; Total bytes of code 4107, prolog size 77, PerfScore 865.01, instruction count 659, allocated bytes for code 4111 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
; ============================================================ Unwind Info:

-16 (-0.39%) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

@@ -443,14 +443,13 @@ G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG18 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x100], ecx jmp G_M8563_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x100], 16 jae G_M8563_IG54 @@ -529,14 +528,13 @@ G_M8563_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG23 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x108], ecx jmp G_M8563_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x108], 16 jae G_M8563_IG54 @@ -615,14 +613,13 @@ G_M8563_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG28 vmovups ymm0, ymmword ptr [rbp-0x70] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x110], ecx jmp G_M8563_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x110], 16 jae G_M8563_IG54 @@ -701,14 +698,13 @@ G_M8563_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref jl G_M8563_IG33 vmovups ymm0, ymmword ptr [rbp-0x90] vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0 xor ecx, ecx mov dword ptr [rbp-0x118], ecx jmp G_M8563_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M8563_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref cmp dword ptr [rbp-0x118], 16 jae G_M8563_IG54 @@ -970,7 +966,7 @@ G_M8563_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {} int3 ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4123, prolog size 77, PerfScore 865.01, instruction count 663, allocated bytes for code 4123 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
+; Total bytes of code 4107, prolog size 77, PerfScore 865.01, instruction count 659, allocated bytes for code 4111 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
; ============================================================ Unwind Info:

libraries.pmi.windows.x64.checked.mch

-21 (-20.39%) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -17,7 +17,7 @@ ; V06 tmp1 [V06,T05] ( 3, 3 ) simd64 -> mm1 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V07 tmp2 [V07,T06] ( 3, 3 ) simd64 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm0 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V11 cse0 [V11,T03] ( 5, 5 ) simd64 -> mm0 "CSE - aggressive" ; V12 cse1 [V12,T04] ( 4, 4 ) simd64 -> mm2 "CSE - aggressive" @@ -34,25 +34,22 @@ G_M27576_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vmovups zmm2, zmmword ptr [r8] vmovaps zmm3, zmm2 vpcmpeqb k1, zmm1, zmm3
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm0, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm1, zmm3, 6
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm0, zmm2, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm0, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm1, zmm3, 6
+ vpblendmb zmm0 {k2}, zmm2, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx ; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M27576_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret ;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-21 (-20.39%) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -17,7 +17,7 @@ ; V06 tmp1 [V06,T05] ( 3, 3 ) simd64 -> mm1 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V07 tmp2 [V07,T06] ( 3, 3 ) simd64 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm0 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V11 cse0 [V11,T03] ( 5, 5 ) simd64 -> mm2 "CSE - aggressive" ; V12 cse1 [V12,T04] ( 4, 4 ) simd64 -> mm0 "CSE - aggressive" @@ -34,25 +34,22 @@ G_M10214_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vmovups zmm2, zmmword ptr [r8] vmovaps zmm3, zmm2 vpcmpeqb k1, zmm3, zmm1
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm2, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm3, zmm1, 1
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm2, zmm0, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm2, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm3, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm0, zmm2
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx ; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret ;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-21 (-20.39%) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T01] ( 3, 6 ) byref -> r8 single-def ; V03 loc0 [V03,T05] ( 3, 3 ) simd64 -> mm1 <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; V04 loc1 [V04,T06] ( 3, 3 ) simd64 -> mm3 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T07] ( 2, 2 ) simd64 -> mm4 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T07] ( 2, 2 ) simd64 -> mm0 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" @@ -34,25 +34,22 @@ G_M22834_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vmovups zmm2, zmmword ptr [r8] vmovaps zmm3, zmm2 vpcmpeqb k1, zmm1, zmm3
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm0, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm1, zmm3, 6
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm0, zmm2, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm0, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm1, zmm3, 6
+ vpblendmb zmm0 {k2}, zmm2, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx ; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret ;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-7 (-5.65%) : 27696.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -35,14 +35,13 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r vpshufb ymm1, ymm2, ymm1 vpand ymm0, ymm0, ymmword ptr [reloc @RWD64] vpcmpub k1, ymm0, ymmword ptr [reloc @RWD96], 6
- vpmovm2b ymm2, k1
- vmovups ymm3, ymmword ptr [r8]
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD128]
- vpshufb ymm3, ymm3, ymm4
- vmovups ymm4, ymmword ptr [rdx]
- vpshufb ymm0, ymm4, ymm0
- vpternlogd ymm2, ymm3, ymm0, -54
- vpand ymm0, ymm2, ymm1
+ vmovups ymm2, ymmword ptr [r8]
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD128]
+ vpshufb ymm2, ymm2, ymm3
+ vmovups ymm3, ymmword ptr [rdx]
+ vpshufb ymm0, ymm3, ymm0
+ vpblendmb ymm0 {k1}, ymm0, ymm2
+ vpand ymm0, ymm0, ymm1
vxorps ymm1, ymm1, ymm1 vpcmpeqb ymm0, ymm0, ymm1 vpcmpeqd ymm1, ymm1, ymm1 @@ -50,7 +49,7 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r vmovups ymmword ptr [rcx], ymm0 mov rax, rcx ; byrRegs +[rax]
- ;; size=117 bbWeight=1 PerfScore 42.75
+ ;; size=110 bbWeight=1 PerfScore 42.25
G_M53822_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret @@ -62,7 +61,7 @@ RWD96 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD128 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 124, prolog size 3, PerfScore 45.75, instruction count 24, allocated bytes for code 124 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 117, prolog size 3, PerfScore 45.25, instruction count 23, allocated bytes for code 117 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================ Unwind Info:

-7 (-4.58%) : 293786.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -39,9 +39,8 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, vpandd zmm3, zmm4, dword ptr [reloc @RWD128] {1to16} vpord zmm4, zmm3, dword ptr [reloc @RWD132] {1to16} vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1
- vpslld zmm5, zmm4, 1
- vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1
+ vpblendmd zmm2 {k1}, zmm2, zmm4
vpslld zmm0, zmm0, 13 vpandd zmm0, zmm0, dword ptr [reloc @RWD136] {1to16} vpaddd zmm0, zmm0, zmm2 @@ -50,7 +49,7 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, vmovups zmmword ptr [rcx], zmm0 mov rax, rcx ; byrRegs +[rax]
- ;; size=146 bbWeight=1 PerfScore 33.25
+ ;; size=139 bbWeight=1 PerfScore 32.25
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper ret @@ -65,7 +64,7 @@ RWD132 dd 38000000h RWD136 dd 0FFFE000h
-; Total bytes of code 153, prolog size 3, PerfScore 36.25, instruction count 24, allocated bytes for code 153 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 146, prolog size 3, PerfScore 35.25, instruction count 23, allocated bytes for code 146 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================ Unwind Info:

-28 (-2.50%) : 27702.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb ymm1, ymm3, ymm1 vpand ymm2, ymm2, ymmword ptr [reloc @RWD96] vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2 vpcmpeqb ymm1, ymm1, ymm2 vpcmpeqd ymm2, ymm2, ymm2 @@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb ymm2, ymm3, ymm2 vpand ymm0, ymm0, ymmword ptr [reloc @RWD96] vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2 vpcmpeqb ymm0, ymm0, ymm2 vpcmpeqd ymm2, ymm2, ymm2 @@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpand ymm0, ymm1, ymm0 vptest ymm0, ymm0 je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref vpermq ymm0, ymm0, -40 vpmovmskb ebp, ymm0 @@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb xmm1, xmm10, xmm1 vpand xmm2, xmm2, xmm11 vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2 vpcmpeqb xmm1, xmm1, xmm2 vpcmpeqd xmm2, xmm2, xmm2 @@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpshufb xmm2, xmm10, xmm2 vpand xmm0, xmm0, xmm11 vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2 vpcmpeqb xmm0, xmm0, xmm2 vpcmpeqd xmm2, xmm2, xmm2 @@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r vpand xmm0, xmm1, xmm0 vptest xmm0, xmm0 je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref vpmovmskb ebp, xmm0 ;; size=4 bbWeight=2 PerfScore 4.00 @@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

libraries_tests.run.windows.x64.Release.mch

-14 (-16.67%) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)

@@ -37,21 +37,19 @@ G_M11551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vpcmpeqq xmm4, xmm1, xmm3 vxorps xmm5, xmm5, xmm5 vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4 mov rax, rcx ; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M11551_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================ Unwind Info:

-14 (-16.67%) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)

@@ -36,21 +36,19 @@ G_M1813_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r8 vpcmpeqq xmm4, xmm1, xmm3 vxorps xmm5, xmm5, xmm5 vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4 mov rax, rcx ; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M1813_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=f881f8ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=f881f8ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================ Unwind Info:

-14 (-16.67%) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)

@@ -37,21 +37,19 @@ G_M11551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r vpcmpeqq xmm4, xmm1, xmm3 vxorps xmm5, xmm5, xmm5 vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4 mov rax, rcx ; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M11551_IG03: ; bbWeight=1, epilog, nogc, extend ret ;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================ Unwind Info:
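
The diff above appears to replace each `vpmovm2q` + `vpternlogq ..., -54` pair with a single `vpblendmq`. The immediate -54 is 0xCA as an unsigned byte, which is the truth table for a bitwise select (first operand ? second : third). The Python sketch below models both sequences on two 64-bit lanes to illustrate why they should produce the same result. The helper names (`ternlog`, `movm2`, `blendm`, `sequences_agree`) are hypothetical and exist only for this illustration; they are not JIT or runtime APIs.

```python
import random

LANES = 2                      # a 128-bit vector holds two 64-bit lanes
LANE_BITS = 64
LANE_MASK = (1 << LANE_BITS) - 1

def ternlog(a, b, c, imm8):
    """Model of vpternlog on one lane: imm8 is a truth table indexed by the bits (a, b, c)."""
    imm8 &= 0xFF               # -54 becomes 0xCA
    out = 0
    for bit in range(LANE_BITS):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        out |= ((imm8 >> idx) & 1) << bit
    return out

def movm2(k):
    """Model of vpmovm2q: expand each mask bit to an all-ones or all-zeros lane."""
    return [LANE_MASK if (k >> i) & 1 else 0 for i in range(LANES)]

def blendm(k, src1, src2):
    """Model of vpblendmq dst {k}, src1, src2: take src2 where the mask bit is set, else src1."""
    return [src2[i] if (k >> i) & 1 else src1[i] for i in range(LANES)]

def old_sequence(k, x, y):
    # vpmovm2q t, k ; vpternlogq t, x, y, -54
    t = movm2(k)
    return [ternlog(t[i], x[i], y[i], -54) for i in range(LANES)]

def new_sequence(k, x, y):
    # vpblendmq t {k}, y, x
    return blendm(k, y, x)

def sequences_agree(trials=100, seed=0):
    """Compare both models for every mask value on random lane contents."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.getrandbits(LANE_BITS) for _ in range(LANES)]
        y = [rng.getrandbits(LANE_BITS) for _ in range(LANES)]
        for k in range(1 << LANES):
            if old_sequence(k, x, y) != new_sequence(k, x, y):
                return False
    return True
```

Note the operand order: the blend lists the "mask clear" source first, so the two vector operands appear swapped relative to the `vpternlogq` form, as they do in the diff.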

-7 (-2.52%) : 392581.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)

@@ -59,16 +59,17 @@ G_M12395_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vpternlogq ymm1, ymm2, ymmword ptr [rcx], -54
        vmovups ymm2, ymmword ptr [rbp-0x50]
        vpcmpuq k1, ymm2, ymmword ptr [rbp-0x30], 1
-       vpmovm2q ymm2, k1
        mov rcx, bword ptr [rbp+0x20]
-       vmovups ymm3, ymmword ptr [rcx]
-       mov rcx, bword ptr [rbp+0x18]
-       vpternlogq ymm2, ymm3, ymmword ptr [rcx], -54
+       mov rax, bword ptr [rbp+0x18]
+       ; byrRegs +[rax]
+       vmovups ymm2, ymmword ptr [rax]
+       vpblendmq ymm2 {k1}, ymm2, ymmword ptr [rcx]
        vpternlogq ymm0, ymm1, ymm2, -54
        vmovups ymmword ptr [rbp-0x70], ymm0
        mov rcx, 0xD1FFAB1E
        ; byrRegs -[rcx]
        call CORINFO_HELP_COUNTPROFILE32
+       ; byrRegs -[rax]
        mov rcx, 0xD1FFAB1E
        call CORINFO_HELP_COUNTPROFILE32
        mov rax, bword ptr [rbp+0x10]
@@ -76,7 +77,7 @@ G_M12395_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rax], ymm0
        mov rax, bword ptr [rbp+0x10]
-       ;; size=200 bbWeight=1 PerfScore 77.00
+       ;; size=193 bbWeight=1 PerfScore 76.00
 G_M12395_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        add rsp, 272
@@ -84,7 +85,7 @@ G_M12395_IG03: ; bbWeight=1, epilog, nogc, extend
        ret
        ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 278, prolog size 54, PerfScore 95.83, instruction count 53, allocated bytes for code 278 (MethodHash=3d67cf94) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
+; Total bytes of code 271, prolog size 54, PerfScore 94.83, instruction count 52, allocated bytes for code 272 (MethodHash=3d67cf94) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
; ============================================================ Unwind Info:

-7 (-2.52%) : 392539.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)

@@ -59,16 +59,17 @@ G_M63669_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vpternlogq ymm1, ymm2, ymmword ptr [rcx], -54
        vmovups ymm2, ymmword ptr [rbp-0x30]
        vpcmpuq k1, ymm2, ymmword ptr [rbp-0x50], 6
-       vpmovm2q ymm2, k1
        mov rcx, bword ptr [rbp+0x18]
-       vmovups ymm3, ymmword ptr [rcx]
-       mov rcx, bword ptr [rbp+0x20]
-       vpternlogq ymm2, ymm3, ymmword ptr [rcx], -54
+       mov rax, bword ptr [rbp+0x20]
+       ; byrRegs +[rax]
+       vmovups ymm2, ymmword ptr [rax]
+       vpblendmq ymm2 {k1}, ymm2, ymmword ptr [rcx]
        vpternlogq ymm0, ymm1, ymm2, -54
        vmovups ymmword ptr [rbp-0x70], ymm0
        mov rcx, 0xD1FFAB1E
        ; byrRegs -[rcx]
        call CORINFO_HELP_COUNTPROFILE32
+       ; byrRegs -[rax]
        mov rcx, 0xD1FFAB1E
        call CORINFO_HELP_COUNTPROFILE32
        mov rax, bword ptr [rbp+0x10]
@@ -76,7 +77,7 @@ G_M63669_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        vmovups ymm0, ymmword ptr [rbp-0x70]
        vmovups ymmword ptr [rax], ymm0
        mov rax, bword ptr [rbp+0x10]
-       ;; size=200 bbWeight=1 PerfScore 77.00
+       ;; size=193 bbWeight=1 PerfScore 76.00
 G_M63669_IG03: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        add rsp, 272
@@ -84,7 +85,7 @@ G_M63669_IG03: ; bbWeight=1, epilog, nogc, extend
        ret
        ;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 278, prolog size 54, PerfScore 95.83, instruction count 53, allocated bytes for code 278 (MethodHash=dc23074a) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
+; Total bytes of code 271, prolog size 54, PerfScore 94.83, instruction count 52, allocated bytes for code 272 (MethodHash=dc23074a) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
; ============================================================ Unwind Info:

-28 (-2.40%) : 342446.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)

@@ -172,12 +172,11 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
        vpshufb ymm1, ymm3, ymm1
        vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
        vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
-       vpmovm2b ymm3, k1
-       vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
-       vpshufb ymm4, ymm9, ymm4
+       vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+       vpshufb ymm3, ymm9, ymm3
        vpshufb ymm2, ymm8, ymm2
-       vpternlogd ymm3, ymm4, ymm2, -54
-       vpand ymm1, ymm3, ymm1
+       vpblendmb ymm2 {k1}, ymm2, ymm3
+       vpand ymm1, ymm2, ymm1
        vxorps ymm2, ymm2, ymm2
        vpcmpeqb ymm1, ymm1, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
@@ -188,12 +187,11 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
        vpshufb ymm2, ymm3, ymm2
        vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
        vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
-       vpmovm2b ymm3, k1
-       vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
-       vpshufb ymm4, ymm9, ymm4
+       vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+       vpshufb ymm3, ymm9, ymm3
        vpshufb ymm0, ymm8, ymm0
-       vpternlogd ymm3, ymm4, ymm0, -54
-       vpand ymm0, ymm3, ymm2
+       vpblendmb ymm0 {k1}, ymm0, ymm3
+       vpand ymm0, ymm0, ymm2
        vxorps ymm2, ymm2, ymm2
        vpcmpeqb ymm0, ymm0, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
@@ -201,7 +199,7 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
        vpand ymm10, ymm1, ymm0
        vptest ymm10, ymm10
        jne SHORT G_M48875_IG07
-       ;; size=248 bbWeight=4.37 PerfScore 346.74
+       ;; size=234 bbWeight=4.37 PerfScore 342.37
 G_M48875_IG05: ; bbWeight=3.45, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
        add r15, 64
        cmp r15, rsi
@@ -318,12 +316,11 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
        vpshufb xmm1, xmm3, xmm1
        vpand xmm2, xmm2, xmmword ptr [reloc @RWD96]
        vpcmpub k1, xmm2, xmmword ptr [reloc @RWD128], 6
-       vpmovm2b xmm3, k1
-       vpsubb xmm4, xmm2, xmmword ptr [reloc @RWD160]
-       vpshufb xmm4, xmm7, xmm4
+       vpsubb xmm3, xmm2, xmmword ptr [reloc @RWD160]
+       vpshufb xmm3, xmm7, xmm3
        vpshufb xmm2, xmm6, xmm2
-       vpternlogd xmm3, xmm4, xmm2, -54
-       vpand xmm1, xmm3, xmm1
+       vpblendmb xmm2 {k1}, xmm2, xmm3
+       vpand xmm1, xmm2, xmm1
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm1, xmm1, xmm2
        vpcmpeqd xmm2, xmm2, xmm2
@@ -334,12 +331,11 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
        vpshufb xmm2, xmm3, xmm2
        vpand xmm0, xmm0, xmmword ptr [reloc @RWD96]
        vpcmpub k1, xmm0, xmmword ptr [reloc @RWD128], 6
-       vpmovm2b xmm3, k1
-       vpsubb xmm4, xmm0, xmmword ptr [reloc @RWD160]
-       vpshufb xmm4, xmm7, xmm4
+       vpsubb xmm3, xmm0, xmmword ptr [reloc @RWD160]
+       vpshufb xmm3, xmm7, xmm3
        vpshufb xmm0, xmm6, xmm0
-       vpternlogd xmm3, xmm4, xmm0, -54
-       vpand xmm0, xmm3, xmm2
+       vpblendmb xmm0 {k1}, xmm0, xmm3
+       vpand xmm0, xmm0, xmm2
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm0, xmm0, xmm2
        vpcmpeqd xmm2, xmm2, xmm2
@@ -347,7 +343,7 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
        vpand xmm0, xmm1, xmm0
        vptest xmm0, xmm0
        je SHORT G_M48875_IG21
-       ;; size=248 bbWeight=0.08 PerfScore 5.05
+       ;; size=234 bbWeight=0.08 PerfScore 4.97
 G_M48875_IG18: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
        vpmovmskb ebp, xmm0
        ;; size=4 bbWeight=0.07 PerfScore 0.14
@@ -444,7 +440,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1165, prolog size 86, PerfScore 510.92, instruction count 247, allocated bytes for code 1165 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
+; Total bytes of code 1137, prolog size 86, PerfScore 506.47, instruction count 243, allocated bytes for code 1137 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
; ============================================================ Unwind Info:

librariestestsnotieredcompilation.run.windows.x64.Release.mch

-28 (-2.50%) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb ymm1, ymm3, ymm1
        vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
        vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
-       vpmovm2b ymm3, k1
-       vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
-       vpshufb ymm4, ymm9, ymm4
+       vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+       vpshufb ymm3, ymm9, ymm3
        vpshufb ymm2, ymm8, ymm2
-       vpternlogd ymm3, ymm4, ymm2, -54
-       vpand ymm1, ymm3, ymm1
+       vpblendmb ymm2 {k1}, ymm2, ymm3
+       vpand ymm1, ymm2, ymm1
        vxorps ymm2, ymm2, ymm2
        vpcmpeqb ymm1, ymm1, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb ymm2, ymm3, ymm2
        vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
        vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
-       vpmovm2b ymm3, k1
-       vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
-       vpshufb ymm4, ymm9, ymm4
+       vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+       vpshufb ymm3, ymm9, ymm3
        vpshufb ymm0, ymm8, ymm0
-       vpternlogd ymm3, ymm4, ymm0, -54
-       vpand ymm0, ymm3, ymm2
+       vpblendmb ymm0 {k1}, ymm0, ymm3
+       vpand ymm0, ymm0, ymm2
        vxorps ymm2, ymm2, ymm2
        vpcmpeqb ymm0, ymm0, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpand ymm0, ymm1, ymm0
        vptest ymm0, ymm0
        je SHORT G_M48875_IG09
-       ;; size=248 bbWeight=4 PerfScore 317.33
+       ;; size=234 bbWeight=4 PerfScore 313.33
 G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
        vpermq ymm0, ymm0, -40
        vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb xmm1, xmm10, xmm1
        vpand xmm2, xmm2, xmm11
        vpcmpub k1, xmm2, xmm12, 6
-       vpmovm2b xmm3, k1
-       vpsubb xmm4, xmm2, xmm13
-       vpshufb xmm4, xmm7, xmm4
+       vpsubb xmm3, xmm2, xmm13
+       vpshufb xmm3, xmm7, xmm3
        vpshufb xmm2, xmm6, xmm2
-       vpternlogd xmm3, xmm4, xmm2, -54
-       vpand xmm1, xmm3, xmm1
+       vpblendmb xmm2 {k1}, xmm2, xmm3
+       vpand xmm1, xmm2, xmm1
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm1, xmm1, xmm2
        vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb xmm2, xmm10, xmm2
        vpand xmm0, xmm0, xmm11
        vpcmpub k1, xmm0, xmm12, 6
-       vpmovm2b xmm3, k1
-       vpsubb xmm4, xmm0, xmm13
-       vpshufb xmm4, xmm7, xmm4
+       vpsubb xmm3, xmm0, xmm13
+       vpshufb xmm3, xmm7, xmm3
        vpshufb xmm0, xmm6, xmm0
-       vpternlogd xmm3, xmm4, xmm0, -54
-       vpand xmm0, xmm3, xmm2
+       vpblendmb xmm0 {k1}, xmm0, xmm3
+       vpand xmm0, xmm0, xmm2
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm0, xmm0, xmm2
        vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpand xmm0, xmm1, xmm0
        vptest xmm0, xmm0
        je SHORT G_M48875_IG19
-       ;; size=200 bbWeight=4 PerfScore 168.00
+       ;; size=186 bbWeight=4 PerfScore 164.00
 G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
        vpmovmskb ebp, xmm0
        ;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

-17 (-1.75%) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

@@ -15,7 +15,7 @@
 ;* V04 loc3 [V04 ] ( 0, 0 ) simd32 -> zero-ref <System.Numerics.Vector`1[uint]>
 ; V05 loc4 [V05,T16] ( 2, 2 ) simd32 -> mm8 <System.Numerics.Vector`1[uint]>
 ;* V06 loc5 [V06 ] ( 0, 0 ) simd32 -> zero-ref <System.Numerics.Vector`1[uint]>
-; V07 loc6 [V07,T17] ( 2, 2 ) simd32 -> mm8 <System.Numerics.Vector`1[uint]>
+; V07 loc6 [V07,T17] ( 2, 2 ) simd32 -> mm6 <System.Numerics.Vector`1[uint]>
 ;* V08 loc7 [V08 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op <System.Nullable`1[int]>
 ; V09 OutArgs [V09 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V10 tmp1 [V10 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "NewObj constructor temp" <System.Numerics.Tests.GenericVectorTests+<>c__DisplayClass670_0`1[uint]>
@@ -26,7 +26,7 @@
 ;* V15 tmp6 [V15,T08] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
 ; V16 tmp7 [V16,T12] ( 9, 18 ) simd32 -> mm8 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
 ;* V17 tmp8 [V17,T09] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V18 tmp9 [V18,T13] ( 9, 18 ) simd32 -> mm8 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
+; V18 tmp9 [V18,T13] ( 9, 18 ) simd32 -> mm6 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
 ;* V19 tmp10 [V19,T10] ( 0, 0 ) ubyte -> zero-ref single-def "field V08.hasValue (fldOffset=0x0)" P-INDEP
 ;* V20 tmp11 [V20,T11] ( 0, 0 ) int -> zero-ref single-def "field V08.value (fldOffset=0x4)" P-INDEP
 ; V21 tmp12 [V21,T02] ( 4, 8 ) struct ( 8) [rsp+0x20] do-not-enreg[SF] "by-value struct argument" <System.Nullable`1[int]>
@@ -104,8 +104,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        jl G_M21446_IG06
        vmovups ymm7, ymmword ptr [rcx+0x10]
        vpcmpud k1, ymm6, ymm7, 6
-       vpmovm2d ymm8, k1
-       vpternlogd ymm8, ymm6, ymm7, -54
+       vpblendmd ymm8 {k1}, ymm7, ymm6
        mov rcx, 0xD1FFAB1E ; System.Action`2[int,uint]
        ; gcrRegs -[rcx]
        vextractf128 xmm9, ymm6, 1
@@ -151,7 +150,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
        vextractf128 xmm12, ymm8, 1
-       ;; size=312 bbWeight=1 PerfScore 88.00
+       ;; size=305 bbWeight=1 PerfScore 86.83
 G_M21446_IG03: ; bbWeight=1, extend
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
@@ -208,10 +207,9 @@ G_M21446_IG03: ; bbWeight=1, extend
        vinsertf128 ymm6, ymm6, xmm9, 1
        vinsertf128 ymm7, ymm7, xmm10, 1
        vpcmpud k1, ymm6, ymm7, 2
-       vpmovm2d ymm8, k1
-       vpternlogd ymm8, ymm6, ymm7, -54
+       vpblendmd ymm6 {k1}, ymm7, ymm6
        mov rcx, 0xD1FFAB1E ; System.Action`2[int,uint]
-       vextractf128 xmm6, ymm8, 1
+       vextractf128 xmm7, ymm6, 1
        call CORINFO_HELP_NEWSFAST
        ; gcrRegs +[rax]
        ; gcr arg pop 0
@@ -226,79 +224,79 @@ G_M21446_IG03: ; bbWeight=1, extend
        ; byrRegs -[rcx]
        mov r8, 0xD1FFAB1E ; code for <unknown method>
        mov qword ptr [rsi+0x18], r8
-       vinsertf128 ymm8, ymm8, xmm6, 1
-       vmovd r8d, xmm8
+       vinsertf128 ymm6, ymm6, xmm7, 1
+       vmovd r8d, xmm6
        xor edx, edx
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vmovaps ymm0, ymm8
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       vmovaps ymm0, ymm6
        vpextrd r8d, xmm0, 1
        mov edx, 1
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vmovaps ymm0, ymm8
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       vmovaps ymm0, ymm6
        vpextrd r8d, xmm0, 2
        mov edx, 2
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       ;; size=353 bbWeight=1 PerfScore 120.75
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       ;; size=350 bbWeight=1 PerfScore 121.58
 G_M21446_IG04: ; bbWeight=1, extend
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vmovaps ymm0, ymm8
+       vmovaps ymm0, ymm6
        vpextrd r8d, xmm0, 3
        mov edx, 3
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vextracti128 xmm0, ymm8, 1
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       vextracti128 xmm0, ymm6, 1
        vmovd r8d, xmm0
        mov edx, 4
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vextracti128 xmm0, ymm8, 1
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       vextracti128 xmm0, ymm6, 1
        vpextrd r8d, xmm0, 1
        mov edx, 5
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vextracti128 xmm0, ymm8, 1
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       vextracti128 xmm0, ymm6, 1
        vpextrd r8d, xmm0, 2
        mov edx, 6
        mov rcx, gword ptr [rsi+0x08]
        ; gcrRegs +[rcx]
-       vextractf128 xmm7, ymm8, 1
+       vextractf128 xmm8, ymm6, 1
        call [rsi+0x18]<unknown method>
        ; gcrRegs -[rcx]
        ; gcr arg pop 0
-       vinsertf128 ymm8, ymm8, xmm7, 1
-       vextracti128 xmm0, ymm8, 1
+       vinsertf128 ymm6, ymm6, xmm8, 1
+       vextracti128 xmm0, ymm6, 1
        vpextrd r8d, xmm0, 3
        mov edx, 7
        mov rcx, gword ptr [rsi+0x08]
@@ -307,7 +305,7 @@ G_M21446_IG04: ; bbWeight=1, extend
        ; gcrRegs -[rcx rsi]
        ; gcr arg pop 0
        nop
-       ;; size=173 bbWeight=1 PerfScore 66.75
+       ;; size=166 bbWeight=1 PerfScore 64.75
 G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
        vmovaps xmm6, xmmword ptr [rsp+0x90]
        vmovaps xmm7, xmmword ptr [rsp+0x80]
@@ -328,7 +326,7 @@ G_M21446_IG06: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        int3
        ;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 973, prolog size 67, PerfScore 325.25, instruction count 194, allocated bytes for code 973 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 956, prolog size 67, PerfScore 322.92, instruction count 192, allocated bytes for code 956 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================ Unwind Info:
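
In the diff above, `vpcmpud` writes a per-lane mask into `k1` and the new `vpblendmd` consumes that mask directly. The immediates 6 and 2 are the unsigned "not less-or-equal" (greater than) and "less-or-equal" predicates. The Python sketch below models the two compare-and-blend pairs on eight 32-bit lanes, under the assumption that the blend picks its last source where the mask bit is set; with the operand order shown in the diff, that yields a lane-wise maximum and minimum. The function names `vpcmpu` and `vpblendm` and the sample values are illustrative only.

```python
# Unsigned compare predicates for the imm8 values that appear in this report.
PREDICATES = {
    1: lambda a, b: a < b,    # LT
    2: lambda a, b: a <= b,   # LE
    6: lambda a, b: a > b,    # NLE (greater than)
}

def vpcmpu(a, b, imm8):
    """Model of vpcmpu*: return a bitmask with bit i set when the predicate holds for lane i."""
    k = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if PREDICATES[imm8](x, y):
            k |= 1 << i
    return k

def vpblendm(k, src1, src2):
    """Model of vpblendm* dst {k}, src1, src2: take src2 where the mask bit is set, else src1."""
    return [s2 if (k >> i) & 1 else s1
            for i, (s1, s2) in enumerate(zip(src1, src2))]

# Hypothetical lane values standing in for ymm6 and ymm7.
ymm6 = [5, 1, 7, 7, 0, 0xFFFFFFFF, 3, 9]
ymm7 = [2, 4, 7, 8, 0, 1, 3, 10]

# vpcmpud k1, ymm6, ymm7, 6 ; vpblendmd ymm8 {k1}, ymm7, ymm6
greater = vpblendm(vpcmpu(ymm6, ymm7, 6), ymm7, ymm6)

# vpcmpud k1, ymm6, ymm7, 2 ; vpblendmd ymm6 {k1}, ymm7, ymm6
lesser = vpblendm(vpcmpu(ymm6, ymm7, 2), ymm7, ymm6)
```

Because the mask register feeds the blend directly, the `vpmovm2d` step that first expanded the mask into a full vector is no longer needed, which matches the smaller instruction count reported for this method.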

smoke_tests.nativeaot.windows.x64.checked.mch

-28 (-2.52%) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -185,12 +185,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb ymm1, ymm3, ymm1
        vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
        vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
-       vpmovm2b ymm3, k1
-       vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
-       vpshufb ymm4, ymm9, ymm4
+       vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+       vpshufb ymm3, ymm9, ymm3
        vpshufb ymm2, ymm8, ymm2
-       vpternlogd ymm3, ymm4, ymm2, -54
-       vpand ymm1, ymm3, ymm1
+       vpblendmb ymm2 {k1}, ymm2, ymm3
+       vpand ymm1, ymm2, ymm1
        vxorps ymm2, ymm2, ymm2
        vpcmpeqb ymm1, ymm1, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
@@ -201,12 +200,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb ymm2, ymm3, ymm2
        vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
        vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
-       vpmovm2b ymm3, k1
-       vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
-       vpshufb ymm4, ymm9, ymm4
+       vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+       vpshufb ymm3, ymm9, ymm3
        vpshufb ymm0, ymm8, ymm0
-       vpternlogd ymm3, ymm4, ymm0, -54
-       vpand ymm0, ymm3, ymm2
+       vpblendmb ymm0 {k1}, ymm0, ymm3
+       vpand ymm0, ymm0, ymm2
        vxorps ymm2, ymm2, ymm2
        vpcmpeqb ymm0, ymm0, ymm2
        vpcmpeqd ymm2, ymm2, ymm2
@@ -214,7 +212,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpand ymm0, ymm1, ymm0
        vptest ymm0, ymm0
        je SHORT G_M48875_IG09
-       ;; size=248 bbWeight=4 PerfScore 317.33
+       ;; size=234 bbWeight=4 PerfScore 313.33
 G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
        vpermq ymm0, ymm0, -40
        vpmovmskb ebp, ymm0
@@ -337,12 +335,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb xmm1, xmm10, xmm1
        vpand xmm2, xmm2, xmm11
        vpcmpub k1, xmm2, xmm12, 6
-       vpmovm2b xmm3, k1
-       vpsubb xmm4, xmm2, xmm13
-       vpshufb xmm4, xmm7, xmm4
+       vpsubb xmm3, xmm2, xmm13
+       vpshufb xmm3, xmm7, xmm3
        vpshufb xmm2, xmm6, xmm2
-       vpternlogd xmm3, xmm4, xmm2, -54
-       vpand xmm1, xmm3, xmm1
+       vpblendmb xmm2 {k1}, xmm2, xmm3
+       vpand xmm1, xmm2, xmm1
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm1, xmm1, xmm2
        vpcmpeqd xmm2, xmm2, xmm2
@@ -352,12 +349,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpshufb xmm2, xmm10, xmm2
        vpand xmm0, xmm0, xmm11
        vpcmpub k1, xmm0, xmm12, 6
-       vpmovm2b xmm3, k1
-       vpsubb xmm4, xmm0, xmm13
-       vpshufb xmm4, xmm7, xmm4
+       vpsubb xmm3, xmm0, xmm13
+       vpshufb xmm3, xmm7, xmm3
        vpshufb xmm0, xmm6, xmm0
-       vpternlogd xmm3, xmm4, xmm0, -54
-       vpand xmm0, xmm3, xmm2
+       vpblendmb xmm0 {k1}, xmm0, xmm3
+       vpand xmm0, xmm0, xmm2
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm0, xmm0, xmm2
        vpcmpeqd xmm2, xmm2, xmm2
@@ -365,7 +361,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
        vpand xmm0, xmm1, xmm0
        vptest xmm0, xmm0
        je SHORT G_M48875_IG19
-       ;; size=200 bbWeight=4 PerfScore 168.00
+       ;; size=186 bbWeight=4 PerfScore 164.00
 G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
        vpmovmskb ebp, xmm0
        ;; size=4 bbWeight=2 PerfScore 4.00
@@ -432,7 +428,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
 RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1109, prolog size 86, PerfScore 1189.83, instruction count 240, allocated bytes for code 1109 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1081, prolog size 86, PerfScore 1181.83, instruction count 236, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================ Unwind Info:

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 16 | 16 | 0 | 0 | -528 | +0 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 24 | 24 | 0 | 0 | -499 | +0 |
| libraries_tests.run.windows.x64.Release.mch | 29 | 29 | 0 | 0 | -1,014 | +0 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 7 | 7 | 0 | 0 | -759 | +0 |
| realworld.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| Total | 78 | 78 | 0 | 0 | -2,856 | +0 |
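
As a quick consistency check, the final row of the table above should equal the column sums of the per-collection rows. A minimal Python sketch, with the figures copied from the table and the variable names chosen only for this example:

```python
# (contexts with diffs, improvement bytes) per collection, copied from the table above.
rows = {
    "benchmarks.run": (1, -28),
    "benchmarks.run_pgo": (0, 0),
    "benchmarks.run_tiered": (0, 0),
    "coreclr_tests.run": (16, -528),
    "libraries.crossgen2": (0, 0),
    "libraries.pmi": (24, -499),
    "libraries_tests.run": (29, -1014),
    "librariestestsnotieredcompilation.run": (7, -759),
    "realworld.run": (0, 0),
    "smoke_tests.nativeaot": (1, -28),
}

total_contexts = sum(contexts for contexts, _ in rows.values())
total_bytes = sum(delta for _, delta in rows.values())

print(total_contexts, total_bytes)
```

Both sums agree with the reported totals of 78 contexts and -2,856 bytes.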

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 27,854 | 4 | 27,850 | 232 (0.83%) | 232 (0.83%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 101,589 | 49,789 | 51,800 | 129 (0.13%) | 129 (0.13%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 54,309 | 36,842 | 17,467 | 76 (0.14%) | 76 (0.14%) |
| coreclr_tests.run.windows.x64.checked.mch | 573,547 | 340,982 | 232,565 | 442 (0.08%) | 442 (0.08%) |
| libraries.crossgen2.windows.x64.checked.mch | 243,425 | 15 | 243,410 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 306,302 | 6 | 306,296 | 2,196 (0.71%) | 2,196 (0.71%) |
| libraries_tests.run.windows.x64.Release.mch | 672,162 | 479,203 | 192,959 | 1,125 (0.17%) | 1,125 (0.17%) |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 318,324 | 21,885 | 296,439 | 2,187 (0.68%) | 2,187 (0.68%) |
| realworld.run.windows.x64.checked.mch | 36,492 | 3 | 36,489 | 398 (1.08%) | 398 (1.08%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 32,409 | 11 | 32,398 | 3 (0.01%) | 3 (0.01%) |
| Total | 2,366,413 | 928,740 | 1,437,673 | 6,788 (0.29%) | 6,788 (0.29%) |

jit-analyze output

benchmarks.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 8538947 (overridden on cmd)
Total bytes of diff: 8538919 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 24358.dasm (-2.50 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.50 % of base) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
         -28 (-2.50 % of base) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).
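
The summary figures for this collection can be reproduced by hand: the delta is the diff total minus the base total, and the percentage is that delta relative to the base. The Python sketch below uses the totals reported above; the variable names are illustrative and the two-decimal formatting is an assumption about how the percentage is rounded.

```python
base_bytes = 8538947   # "Total bytes of base"
diff_bytes = 8538919   # "Total bytes of diff"

delta = diff_bytes - base_bytes
percent = 100.0 * delta / base_bytes

# A 28-byte change in roughly 8.5 MB rounds to -0.00 at two decimals.
summary = f"Total bytes of delta: {delta} ({percent:.2f} % of base)"
print(summary)
```

This is why a collection can show a real improvement in a single method (-2.50 % of that method's base size) while the collection-wide percentage still prints as -0.00.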


coreclr_tests.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 393235161 (overridden on cmd)
Total bytes of diff: 393234633 (overridden on cmd)
Total bytes of delta: -528 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -56 : 174838.dasm (-2.83 % of base)
         -56 : 174841.dasm (-2.86 % of base)
         -56 : 174837.dasm (-2.83 % of base)
         -56 : 174842.dasm (-2.82 % of base)
         -32 : 429091.dasm (-0.77 % of base)
         -32 : 429094.dasm (-0.78 % of base)
         -32 : 429095.dasm (-0.77 % of base)
         -32 : 429090.dasm (-0.77 % of base)
         -28 : 174840.dasm (-1.42 % of base)
         -28 : 174835.dasm (-1.46 % of base)
         -28 : 174836.dasm (-1.43 % of base)
         -28 : 174839.dasm (-1.42 % of base)
         -16 : 429089.dasm (-0.39 % of base)
         -16 : 429092.dasm (-0.39 % of base)
         -16 : 429093.dasm (-0.39 % of base)
         -16 : 429088.dasm (-0.39 % of base)

16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-2.83 % of base) : 174838.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-2.86 % of base) : 174841.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-2.82 % of base) : 174842.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -56 (-2.83 % of base) : 174837.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -32 (-0.77 % of base) : 429091.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -32 (-0.78 % of base) : 429094.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429095.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429090.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -28 (-1.42 % of base) : 174840.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.46 % of base) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.43 % of base) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.42 % of base) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -16 (-0.39 % of base) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429088.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

Top method improvements (percentages):
         -56 (-2.86 % of base) : 174841.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -56 (-2.83 % of base) : 174838.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -56 (-2.83 % of base) : 174837.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -56 (-2.82 % of base) : 174842.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -28 (-1.46 % of base) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -28 (-1.43 % of base) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -28 (-1.42 % of base) : 174840.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -28 (-1.42 % of base) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -32 (-0.78 % of base) : 429094.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429095.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429091.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
         -32 (-0.77 % of base) : 429090.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429088.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
         -16 (-0.39 % of base) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)

16 total methods with Code Size differences (16 improved, 0 regressed).


libraries.pmi.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 60138217 (overridden on cmd)
Total bytes of diff: 60137718 (overridden on cmd)
Total bytes of delta: -499 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
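
As a sanity check on the summary above, the delta and its percentage can be recomputed from the two totals. This is a minimal sketch: the helper name `code_size_delta` is hypothetical (it is not part of superpmi.py), and the two-decimal rounding is assumed from the printed format.

```python
def code_size_delta(base_bytes: int, diff_bytes: int) -> tuple[int, float]:
    """Return (delta in bytes, delta as a percentage of base)."""
    delta = diff_bytes - base_bytes
    percent = 100.0 * delta / base_bytes
    return delta, percent


# Totals copied from the libraries.pmi.windows.x64.checked.mch summary.
delta, percent = code_size_delta(60138217, 60137718)
print(f"Total bytes of delta: {delta} ({percent:.2f} % of base)")
```

A delta this small relative to the base rounds to -0.00 % at two decimals, which is why the per-method percentages further down are the more informative figures.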

Detail diffs



Top file improvements (bytes):
         -56 : 294041.dasm (-20.66 % of base)
         -56 : 293984.dasm (-20.66 % of base)
         -32 : 27695.dasm (-10.85 % of base)
         -28 : 293986.dasm (-16.28 % of base)
         -28 : 27702.dasm (-2.50 % of base)
         -28 : 294043.dasm (-16.28 % of base)
         -26 : 27697.dasm (-8.78 % of base)
         -21 : 293983.dasm (-20.39 % of base)
         -21 : 294005.dasm (-20.39 % of base)
         -21 : 294040.dasm (-20.39 % of base)
         -21 : 294062.dasm (-20.39 % of base)
         -14 : 293982.dasm (-16.28 % of base)
         -14 : 294042.dasm (-13.73 % of base)
         -14 : 293985.dasm (-13.73 % of base)
         -14 : 294003.dasm (-16.87 % of base)
         -14 : 294038.dasm (-16.87 % of base)
         -14 : 294060.dasm (-16.87 % of base)
         -14 : 294061.dasm (-16.28 % of base)
         -14 : 293981.dasm (-16.87 % of base)
         -14 : 294004.dasm (-16.28 % of base)

24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -56 (-20.66 % of base) : 293984.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.66 % of base) : 294041.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -32 (-10.85 % of base) : 27695.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-2.50 % of base) : 27702.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -28 (-16.28 % of base) : 293986.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -28 (-16.28 % of base) : 294043.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -26 (-8.78 % of base) : 27697.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294040.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 293981.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-13.73 % of base) : 293985.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 293982.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294003.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.28 % of base) : 294004.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294038.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-13.73 % of base) : 294042.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 294039.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294060.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

Top method improvements (percentages):
         -56 (-20.66 % of base) : 293984.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -56 (-20.66 % of base) : 294041.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -21 (-20.39 % of base) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294040.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -21 (-20.39 % of base) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 293981.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294003.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294038.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.87 % of base) : 294060.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -14 (-16.28 % of base) : 293982.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-16.28 % of base) : 293986.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 294004.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-16.28 % of base) : 294039.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -28 (-16.28 % of base) : 294043.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -14 (-16.28 % of base) : 294061.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -14 (-13.73 % of base) : 293985.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -14 (-13.73 % of base) : 294042.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -32 (-10.85 % of base) : 27695.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -26 (-8.78 % of base) : 27697.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

24 total methods with Code Size differences (24 improved, 0 regressed).


libraries_tests.run.windows.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 278258792 (overridden on cmd)
Total bytes of diff: 278257778 (overridden on cmd)
Total bytes of delta: -1014 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -368 : 393123.dasm (-15.51 % of base)
         -84 : 393118.dasm (-10.81 % of base)
         -84 : 386805.dasm (-10.98 % of base)
         -84 : 393473.dasm (-11.02 % of base)
         -84 : 386605.dasm (-10.98 % of base)
         -32 : 342443.dasm (-10.85 % of base)
         -28 : 342446.dasm (-2.40 % of base)
         -26 : 342451.dasm (-8.78 % of base)
         -14 : 385680.dasm (-16.09 % of base)
         -14 : 386233.dasm (-16.67 % of base)
         -14 : 393193.dasm (-16.09 % of base)
         -14 : 385681.dasm (-16.09 % of base)
         -14 : 385965.dasm (-16.09 % of base)
         -14 : 393122.dasm (-16.09 % of base)
         -14 : 385964.dasm (-16.67 % of base)
         -14 : 393121.dasm (-16.09 % of base)
         -14 : 393190.dasm (-16.67 % of base)
         -14 : 393191.dasm (-16.67 % of base)
         -14 : 393192.dasm (-16.09 % of base)
          -7 : 342450.dasm (-5.79 % of base)

29 total files with Code Size differences (29 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -368 (-15.51 % of base) : 393123.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
         -84 (-11.02 % of base) : 393473.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386605.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.81 % of base) : 393118.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386805.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -32 (-10.85 % of base) : 342443.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
         -28 (-2.40 % of base) : 342446.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
         -26 (-8.78 % of base) : 342451.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
         -14 (-16.67 % of base) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393193.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385681.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385680.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393192.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 385964.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385965.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393122.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393121.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
          -7 (-5.79 % of base) : 342450.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)

Top method improvements (percentages):
         -14 (-16.67 % of base) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.67 % of base) : 385964.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393193.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385681.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385680.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393192.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 385965.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393122.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
         -14 (-16.09 % of base) : 393121.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
        -368 (-15.51 % of base) : 393123.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
         -84 (-11.02 % of base) : 393473.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386605.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -84 (-10.98 % of base) : 386805.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -32 (-10.85 % of base) : 342443.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
         -84 (-10.81 % of base) : 393118.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
         -26 (-8.78 % of base) : 342451.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
          -7 (-5.79 % of base) : 342450.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
          -7 (-5.65 % of base) : 342442.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)

29 total methods with Code Size differences (29 improved, 0 regressed).


librariestestsnotieredcompilation.run.windows.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 135861173 (overridden on cmd)
Total bytes of diff: 135860414 (overridden on cmd)
Total bytes of delta: -759 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -182 : 168894.dasm (-17.50 % of base)
        -182 : 168616.dasm (-17.50 % of base)
        -154 : 169032.dasm (-17.09 % of base)
         -98 : 168947.dasm (-15.15 % of base)
         -98 : 168825.dasm (-15.15 % of base)
         -28 : 150728.dasm (-2.50 % of base)
         -17 : 169934.dasm (-1.75 % of base)

7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -182 (-17.50 % of base) : 168894.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.50 % of base) : 168616.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.09 % of base) : 169032.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.15 % of base) : 168947.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.15 % of base) : 168825.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

Top method improvements (percentages):
        -182 (-17.50 % of base) : 168894.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -182 (-17.50 % of base) : 168616.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -154 (-17.09 % of base) : 169032.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -98 (-15.15 % of base) : 168947.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -98 (-15.15 % of base) : 168825.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

7 total methods with Code Size differences (7 improved, 0 regressed).
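
The per-method lines in this report follow a fixed shape (bytes, percent of base, .dasm file, method signature, tier), so the improved/regressed tally can be recomputed from them. The sketch below is illustrative only: the regular expression and the names `LINE_RE` and `parse_method_line` are assumptions made for this example, not code from superpmi.py.

```python
import re

# Assumed line shape:
#   "<bytes> (<pct> % of base) : <N>.dasm - <method signature> (<tier>)"
LINE_RE = re.compile(
    r"^\s*(?P<bytes>[+-]?\d+) \((?P<pct>[+-]?\d+\.\d+) % of base\) : "
    r"(?P<file>\d+\.dasm) - (?P<method>.+) \((?P<tier>[^()]+)\)$"
)


def parse_method_line(line):
    """Parse one per-method diff line; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "bytes": int(m["bytes"]),
        "pct": float(m["pct"]),
        "file": m["file"],
        "method": m["method"],
        "tier": m["tier"],
    }


# Two lines copied from the section above.
sample = [
    "         -28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)",
    "         -17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)",
]

rows = [r for r in (parse_method_line(s) for s in sample) if r is not None]
improved = sum(1 for r in rows if r["bytes"] < 0)   # negative delta: smaller code
regressed = sum(1 for r in rows if r["bytes"] > 0)  # positive delta: larger code
print(f"{len(rows)} parsed ({improved} improved, {regressed} regressed)")
```

Note that the "Top method improvements" lists are truncated to the largest entries, so a tally built from them matches the reported totals only when the collection has few diffs, as it does here.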


smoke_tests.nativeaot.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 5087315 (overridden on cmd)
Total bytes of diff: 5087287 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -28 : 19903.dasm (-2.52 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -28 (-2.52 % of base) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
         -28 (-2.52 % of base) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).