Assembly Diffs
linux arm64
Diffs are based on 2,505,351 contexts (1,011,240 MinOpts, 1,494,111 FullOpts).
No diffs found.
Details
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run.linux.arm64.checked.mch | 34,852 | 3,148 | 31,704 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.arm64.checked.mch | 151,104 | 59,296 | 91,808 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.linux.arm64.checked.mch | 71,207 | 53,989 | 17,218 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.arm64.checked.mch | 627,221 | 383,796 | 243,425 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.arm64.checked.mch | 234,183 | 15 | 234,168 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 295,043 | 6 | 295,037 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.arm64.Release.mch | 734,812 | 489,338 | 245,474 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 304,797 | 21,560 | 283,237 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.arm64.checked.mch | 33,103 | 85 | 33,018 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 19,029 | 7 | 19,022 | 0 (0.00%) | 0 (0.00%) |
| | 2,505,351 | 1,011,240 | 1,494,111 | 0 (0.00%) | 0 (0.00%) |
linux x64
Diffs are based on 2,512,262 contexts (977,780 MinOpts, 1,534,482 FullOpts).
Overall (-10,720 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.linux.x64.checked.mch | 16,454,856 | -42 |
| coreclr_tests.run.linux.x64.checked.mch | 403,726,743 | -528 |
| libraries.pmi.linux.x64.checked.mch | 60,288,822 | -469 |
| libraries_tests.run.linux.x64.Release.mch | 342,241,520 | -8,897 |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 132,684,790 | -756 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,195,910 | -28 |
MinOpts (-248 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| coreclr_tests.run.linux.x64.checked.mch | 279,817,920 | -192 |
| libraries_tests.run.linux.x64.Release.mch | 183,917,771 | -56 |
FullOpts (-10,472 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.linux.x64.checked.mch | 16,190,683 | -42 |
| coreclr_tests.run.linux.x64.checked.mch | 123,908,823 | -336 |
| libraries.pmi.linux.x64.checked.mch | 60,175,952 | -469 |
| libraries_tests.run.linux.x64.Release.mch | 158,323,749 | -8,841 |
| libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch | 122,026,342 | -756 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 4,194,961 | -28 |
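As a quick consistency check, the per-collection deltas in the linux x64 tables can be summed and compared against the section headings (-10,720 overall, -248 MinOpts, -10,472 FullOpts). The sketch below is only an illustration: the dictionary names and the shortened collection keys are made up for this example, and the numbers are copied by hand from the tables.

```python
# Size deltas in bytes, copied from the linux x64 tables.
# Keys are shortened collection names (an assumption for readability).
overall = {
    "benchmarks": -42,
    "coreclr_tests": -528,
    "libraries.pmi": -469,
    "libraries_tests": -8897,
    "no_tiered_compilation": -756,
    "nativeaot": -28,
}
min_opts = {"coreclr_tests": -192, "libraries_tests": -56}
full_opts = {
    "benchmarks": -42,
    "coreclr_tests": -336,
    "libraries.pmi": -469,
    "libraries_tests": -8841,
    "no_tiered_compilation": -756,
    "nativeaot": -28,
}

overall_total = sum(overall.values())
min_total = sum(min_opts.values())
full_total = sum(full_opts.values())

# Each collection's overall delta should split into MinOpts + FullOpts.
# Collections absent from a table contribute 0 there.
for name, delta in overall.items():
    assert min_opts.get(name, 0) + full_opts.get(name, 0) == delta, name

print(overall_total, min_total, full_total)
```

With the values above, the three sums match the headings and the MinOpts and FullOpts totals add up to the overall total.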
Example diffs
benchmarks.run.linux.x64.checked.mch
-14 (-4.14%) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
@@ -161,27 +161,25 @@ G_M56402_IG17: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
G_M56402_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, byref, isz
vmovups zmm2, zmmword ptr [rdi+2*rax]
vpcmpeqw k1, zmm0, zmm2
- vpmovm2w zmm3, k1
- vpternlogd zmm3, zmm1, zmm2, -54
- vmovups zmmword ptr [rsi+2*rax], zmm3
+ vpblendmw zmm2 {k1}, zmm2, zmm1
+ vmovups zmmword ptr [rsi+2*rax], zmm2
add rax, 32
cmp rax, r8
jb SHORT G_M56402_IG18
- ;; size=42 bbWeight=4 PerfScore 42.00
+ ;; size=35 bbWeight=4 PerfScore 40.00
G_M56402_IG19: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi}, byref
vmovups zmm2, zmmword ptr [rdi+2*r8]
vpcmpeqw k1, zmm0, zmm2
- vpmovm2w zmm0, k1
- vpternlogd zmm0, zmm1, zmm2, -54
+ vpblendmw zmm0 {k1}, zmm2, zmm1
vmovups zmmword ptr [rsi+2*r8], zmm0
- ;; size=33 bbWeight=0.50 PerfScore 4.50
+ ;; size=26 bbWeight=0.50 PerfScore 4.25
G_M56402_IG20: ; bbWeight=0.50, epilog, nogc, extend
vzeroupper
pop rbp
ret
;; size=5 bbWeight=0.50 PerfScore 1.25
-; Total bytes of code 338, prolog size 7, PerfScore 172.38, instruction count 86, allocated bytes for code 338 (MethodHash=34bd23ad) for method System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
+; Total bytes of code 324, prolog size 7, PerfScore 170.12, instruction count 84, allocated bytes for code 324 (MethodHash=34bd23ad) for method System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
; ============================================================
Unwind Info:
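The recurring change in these diffs replaces a `vpmovm2*` plus `vpternlog*` pair (immediate -54) with a single `vpblendm*` under a `k` mask. The sketch below is a one-bit Python model, not JIT code, and its function names are hypothetical. It assumes the usual reading of the ternary-logic immediate as a truth table indexed by (destination bit, first source bit, second source bit), and it checks that -54 (0xCA as an unsigned byte) encodes "destination ? first source : second source", which is the select that a masked blend performs.

```python
from itertools import product

def ternlog_bit(a: int, b: int, c: int, imm8: int) -> int:
    """One-bit model of a ternary-logic op: (a, b, c) index the imm8 truth table.

    a is the destination operand bit (here: the mask vector made by vpmovm2*),
    b is the first source bit, c is the second source bit.
    """
    index = (a << 2) | (b << 1) | c
    return (imm8 >> index) & 1

def select_bit(mask: int, if_set: int, if_clear: int) -> int:
    """One-bit model of a masked blend: pick if_set where the mask bit is 1."""
    return if_set if mask else if_clear

# The disassembly prints the immediate as a signed byte.
IMM = -54 & 0xFF

# Exhaustively compare the two models over all 8 input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert ternlog_bit(a, b, c, IMM) == select_bit(a, b, c)

print(hex(IMM))
```

Under that model, `vpternlogd zmm3, zmm1, zmm2, -54` with `zmm3` holding the expanded mask selects `zmm1` where the mask is set and `zmm2` elsewhere, which matches `vpblendmw zmm2 {k1}, zmm2, zmm1` in the new code.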
-28 (-2.59%) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -106,13 +106,13 @@
; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
;
; Lcl frame size = 136
@@ -194,13 +194,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm6, ymm6, ymm8
vmovups ymm9, ymmword ptr [reloc @RWD128]
vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm5, ymm5, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -211,12 +210,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb ymm6, ymm7, ymm6
vpand ymm4, ymm4, ymm8
vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm4, ymm4, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -224,7 +222,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm4, ymm5, ymm4
vptest ymm4, ymm4
je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpermq ymm4, ymm4, -40
vpmovmskb r12d, ymm4
@@ -356,13 +354,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm4, xmm4, xmm7
vmovups xmm8, xmmword ptr [reloc @RWD128]
vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm3, xmm3, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -372,12 +369,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb xmm4, xmm6, xmm4
vpand xmm2, xmm2, xmm7
vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm2, xmm2, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -385,7 +381,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm2, xmm3, xmm2
vptest xmm2, xmm2
je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpmovmskb r12d, xmm2
;; size=4 bbWeight=2 PerfScore 4.00
@@ -473,7 +469,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1081, prolog size 43, PerfScore 1253.25, instruction count 240, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1053, prolog size 43, PerfScore 1245.25, instruction count 236, allocated bytes for code 1053 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
coreclr_tests.run.linux.x64.checked.mch
-28 (-1.60%) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
@@ -312,10 +312,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M1266_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm2, ymmword ptr [rbp-0x70]
vpcmpd k1, ymm2, ymmword ptr [rbp-0x30], 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -366,10 +365,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M1266_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm1, ymmword ptr [rbp-0x50]
vpcmpd k1, ymm2, ymm1, 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=20 bbWeight=1 PerfScore 7.58
G_M1266_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -420,10 +418,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M1266_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm2, ymmword ptr [rbp-0x70]
vpcmpd k1, ymm2, ymmword ptr [rbp-0x30], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -474,10 +471,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M1266_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm1, ymmword ptr [rbp-0x50]
vpcmpd k1, ymm1, ymmword ptr [rbp-0x30], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M1266_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -641,7 +637,7 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ret
;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1747, prolog size 35, PerfScore 1104.08, instruction count 335, allocated bytes for code 1747 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1719, prolog size 35, PerfScore 1099.42, instruction count 331, allocated bytes for code 1719 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
; ============================================================
Unwind Info:
-28 (-1.57%) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
@@ -312,10 +312,9 @@ G_M59915_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M59915_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm2, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm2, ymmword ptr [rbp-0x30], 2
- vpmovm2q ymm3, k1
- vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M59915_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -366,10 +365,9 @@ G_M59915_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm1, ymmword ptr [rbp-0x50]
vpcmpq k1, ymm2, ymm1, 2
- vpmovm2q ymm3, k1
- vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=20 bbWeight=1 PerfScore 7.58
G_M59915_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -420,10 +418,9 @@ G_M59915_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M59915_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm2, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm2, ymmword ptr [rbp-0x30], 5
- vpmovm2q ymm3, k1
- vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M59915_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -474,10 +471,9 @@ G_M59915_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M59915_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm1, ymmword ptr [rbp-0x50]
vpcmpq k1, ymm1, ymmword ptr [rbp-0x30], 5
- vpmovm2q ymm3, k1
- vpternlogq ymm3, ymm2, ymm1, -54
+ vpblendmq ymm3 {k1}, ymm1, ymm2
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=24 bbWeight=1 PerfScore 9.58
G_M59915_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -641,7 +637,7 @@ G_M59915_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ret
;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1783, prolog size 35, PerfScore 1110.08, instruction count 335, allocated bytes for code 1783 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 1755, prolog size 35, PerfScore 1105.42, instruction count 331, allocated bytes for code 1755 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
; ============================================================
Unwind Info:
-28 (-1.57%) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
@@ -314,10 +314,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M8563_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0x30], 2
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -368,10 +367,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm2, ymmword ptr [rbp-0x50]
vpcmpw k1, ymm0, ymm2, 2
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=20 bbWeight=1 PerfScore 9.25
G_M8563_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -422,10 +420,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M8563_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0x30], 5
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -476,10 +473,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
G_M8563_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm2, ymmword ptr [rbp-0x50]
vpcmpw k1, ymm2, ymmword ptr [rbp-0x30], 5
- vpmovm2w ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmw ymm3 {k1}, ymm2, ymm0
xor ebx, ebx
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=24 bbWeight=1 PerfScore 11.25
G_M8563_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov edi, ebx
vmovups ymmword ptr [rbp-0x90], ymm3
@@ -643,7 +639,7 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ret
;; size=15 bbWeight=1 PerfScore 3.75
-; Total bytes of code 1789, prolog size 35, PerfScore 1266.58, instruction count 336, allocated bytes for code 1789 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
+; Total bytes of code 1761, prolog size 35, PerfScore 1264.58, instruction count 332, allocated bytes for code 1761 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
; ============================================================
Unwind Info:
-16 (-0.39%) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
@@ -437,14 +437,13 @@ G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG18
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x100], edi
jmp G_M59915_IG25
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x100], 4
jae G_M59915_IG54
@@ -523,14 +522,13 @@ G_M59915_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG23
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x108], edi
jmp G_M59915_IG30
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x108], 4
jae G_M59915_IG54
@@ -609,14 +607,13 @@ G_M59915_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG28
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x110], edi
jmp G_M59915_IG35
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x110], 4
jae G_M59915_IG54
@@ -695,14 +692,13 @@ G_M59915_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG33
vmovups ymm0, ymmword ptr [rbp-0x90]
vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x118], edi
jmp G_M59915_IG40
- ;; size=75 bbWeight=1 PerfScore 23.25
+ ;; size=71 bbWeight=1 PerfScore 22.25
G_M59915_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x118], 4
jae G_M59915_IG54
@@ -964,7 +960,7 @@ G_M59915_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4101, prolog size 78, PerfScore 831.26, instruction count 657, allocated bytes for code 4101 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
+; Total bytes of code 4085, prolog size 78, PerfScore 827.26, instruction count 653, allocated bytes for code 4089 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
; ============================================================
Unwind Info:
-16 (-0.39%) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
@@ -440,14 +440,13 @@ G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG18
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x100], edi
jmp G_M8563_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x100], 16
jae G_M8563_IG54
@@ -526,14 +525,13 @@ G_M8563_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG23
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x108], edi
jmp G_M8563_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x108], 16
jae G_M8563_IG54
@@ -612,14 +610,13 @@ G_M8563_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG28
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x110], edi
jmp G_M8563_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x110], 16
jae G_M8563_IG54
@@ -698,14 +695,13 @@ G_M8563_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG33
vmovups ymm0, ymmword ptr [rbp-0x90]
vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x118], edi
jmp G_M8563_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M8563_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x118], 16
jae G_M8563_IG54
@@ -967,7 +963,7 @@ G_M8563_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4104, prolog size 58, PerfScore 860.01, instruction count 660, allocated bytes for code 4104 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
+; Total bytes of code 4088, prolog size 58, PerfScore 860.01, instruction count 656, allocated bytes for code 4092 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
; ============================================================
Unwind Info:
-16 (-0.39%) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
@@ -440,14 +440,13 @@ G_M44299_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG18
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x100], edi
jmp G_M44299_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x100], 32
jae G_M44299_IG54
@@ -526,14 +525,13 @@ G_M44299_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG23
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpb k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x108], edi
jmp G_M44299_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x108], 32
jae G_M44299_IG54
@@ -612,14 +610,13 @@ G_M44299_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG28
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x110], edi
jmp G_M44299_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x110], 32
jae G_M44299_IG54
@@ -698,14 +695,13 @@ G_M44299_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG33
vmovups ymm0, ymmword ptr [rbp-0x90]
vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor edi, edi
mov dword ptr [rbp-0x118], edi
jmp G_M44299_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M44299_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x118], 32
jae G_M44299_IG54
@@ -967,7 +963,7 @@ G_M44299_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4104, prolog size 58, PerfScore 860.01, instruction count 660, allocated bytes for code 4104 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
+; Total bytes of code 4088, prolog size 58, PerfScore 860.01, instruction count 656, allocated bytes for code 4092 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
; ============================================================
Unwind Info:
libraries.pmi.linux.x64.checked.mch
-21 (-20.19%) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
@@ -17,7 +17,7 @@
;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;
; Lcl frame size = 0
@@ -32,26 +32,23 @@ G_M10214_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M10214_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref
; byrRegs +[rdi]
vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm0, zmm1, -54
- vpcmpub k1, zmm0, zmm1, 1
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm1, zmm0
+ vpcmpub k2, zmm0, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [rdi], zmm0
mov rax, rdi
; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
pop rbp
ret
;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-21 (-20.19%) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
@@ -13,7 +13,7 @@
; V02 arg1 [V02,T02] ( 4, 4 ) simd64 -> mm1 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
@@ -32,26 +32,23 @@ G_M22834_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M22834_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref
; byrRegs +[rdi]
vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm1, zmm0, -54
- vpcmpub k1, zmm0, zmm1, 6
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm0, zmm1
+ vpcmpub k2, zmm0, zmm1, 6
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [rdi], zmm0
mov rax, rdi
; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
pop rbp
ret
;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-21 (-20.19%) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
@@ -13,7 +13,7 @@
; V02 arg1 [V02,T01] ( 5, 5 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
@@ -32,26 +32,23 @@ G_M30188_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M30188_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byref
; byrRegs +[rdi]
vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm0, zmm1, -54
- vpcmpub k1, zmm0, zmm1, 1
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm1, zmm0
+ vpcmpub k2, zmm0, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [rdi], zmm0
mov rax, rdi
; byrRegs +[rax]
- ;; size=72 bbWeight=1 PerfScore 16.58
+ ;; size=51 bbWeight=1 PerfScore 15.08
G_M30188_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
pop rbp
ret
;; size=5 bbWeight=1 PerfScore 2.50
-; Total bytes of code 104, prolog size 7, PerfScore 27.33, instruction count 20, allocated bytes for code 104 (MethodHash=29ab8a13) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 83, prolog size 7, PerfScore 25.83, instruction count 17, allocated bytes for code 83 (MethodHash=29ab8a13) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-14 (-5.19%) : 20784.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
@@ -37,7 +37,7 @@
; V26 cse1 [V26,T09] ( 3, 3 ) simd32 -> mm5 "CSE - moderate"
; V27 cse2 [V27,T10] ( 3, 3 ) simd32 -> mm6 "CSE - moderate"
; V28 cse3 [V28,T11] ( 3, 3 ) simd32 -> mm7 "CSE - moderate"
-; V29 cse4 [V29,T12] ( 3, 3 ) simd32 -> mm9 "CSE - moderate"
+; V29 cse4 [V29,T12] ( 3, 3 ) simd32 -> mm8 "CSE - moderate"
;
; Lcl frame size = 0
@@ -68,13 +68,12 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi},
vpand ymm4, ymm4, ymm6
vmovups ymm7, ymmword ptr [reloc @RWD128]
vpcmpub k1, ymm4, ymm7, 6
- vpmovm2b ymm8, k1
- vmovups ymm9, ymmword ptr [reloc @RWD160]
- vpsubb ymm10, ymm4, ymm9
- vpshufb ymm10, ymm1, ymm10
+ vmovups ymm8, ymmword ptr [reloc @RWD160]
+ vpsubb ymm9, ymm4, ymm8
+ vpshufb ymm9, ymm1, ymm9
vpshufb ymm4, ymm0, ymm4
- vpternlogd ymm8, ymm10, ymm4, -54
- vpand ymm3, ymm8, ymm3
+ vpblendmb ymm4 {k1}, ymm4, ymm9
+ vpand ymm3, ymm4, ymm3
vxorps ymm4, ymm4, ymm4
vpcmpeqb ymm3, ymm3, ymm4
vpcmpeqd ymm4, ymm4, ymm4
@@ -85,12 +84,11 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi},
vpshufb ymm4, ymm5, ymm4
vpand ymm2, ymm2, ymm6
vpcmpub k1, ymm2, ymm7, 6
- vpmovm2b ymm5, k1
- vpsubb ymm6, ymm2, ymm9
- vpshufb ymm1, ymm1, ymm6
+ vpsubb ymm5, ymm2, ymm8
+ vpshufb ymm1, ymm1, ymm5
vpshufb ymm0, ymm0, ymm2
- vpternlogd ymm5, ymm1, ymm0, -54
- vpand ymm0, ymm5, ymm4
+ vpblendmb ymm0 {k1}, ymm0, ymm1
+ vpand ymm0, ymm0, ymm4
vxorps ymm1, ymm1, ymm1
vpcmpeqb ymm0, ymm0, ymm1
vpcmpeqd ymm1, ymm1, ymm1
@@ -99,7 +97,7 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=00C0 {rsi rdi},
vmovups ymmword ptr [rdi], ymm0
mov rax, rdi
; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 78.25
+ ;; size=234 bbWeight=1 PerfScore 77.25
G_M59405_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
pop rbp
@@ -113,7 +111,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 270, prolog size 7, PerfScore 91.00, instruction count 56, allocated bytes for code 270 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 256, prolog size 7, PerfScore 90.00, instruction count 54, allocated bytes for code 256 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-7 (-4.32%) : 206873.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
@@ -40,9 +40,8 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr
vpandd zmm3, zmm4, dword ptr [reloc @RWD128] {1to16}
vpord zmm4, zmm3, dword ptr [reloc @RWD132] {1to16}
vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1
- vpslld zmm5, zmm4, 1
- vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1
+ vpblendmd zmm2 {k1}, zmm2, zmm4
vpslld zmm0, zmm0, 13
vpandd zmm0, zmm0, dword ptr [reloc @RWD136] {1to16}
vpaddd zmm0, zmm0, zmm2
@@ -51,7 +50,7 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr
vmovups zmmword ptr [rdi], zmm0
mov rax, rdi
; byrRegs +[rax]
- ;; size=140 bbWeight=1 PerfScore 29.25
+ ;; size=133 bbWeight=1 PerfScore 28.25
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
pop rbp
@@ -67,7 +66,7 @@ RWD132 dd 38000000h
RWD136 dd 0FFFE000h
-; Total bytes of code 162, prolog size 7, PerfScore 37.00, instruction count 27, allocated bytes for code 162 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 155, prolog size 7, PerfScore 36.00, instruction count 26, allocated bytes for code 155 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================
Unwind Info:
-28 (-2.59%) : 20791.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -102,13 +102,13 @@
; V91 cse1 [V91,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
; V92 cse2 [V92,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
; V93 cse3 [V93,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V94 cse4 [V94,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V94 cse4 [V94,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V95 cse5 [V95,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
; V96 cse6 [V96,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
; V97 cse7 [V97,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
; V98 cse8 [V98,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
; V99 cse9 [V99,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V100 cse10 [V100,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V100 cse10 [V100,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
;
; Lcl frame size = 168
@@ -187,13 +187,12 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm6, ymm6, ymm8
vmovups ymm9, ymmword ptr [reloc @RWD128]
vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm5, ymm5, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -204,12 +203,11 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb ymm6, ymm7, ymm6
vpand ymm4, ymm4, ymm8
vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm4, ymm4, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -218,7 +216,7 @@ G_M48875_IG04: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vmovups ymmword ptr [rbp-0xB0], ymm4
vptest ymm4, ymm4
je SHORT G_M48875_IG07
- ;; size=265 bbWeight=4 PerfScore 336.00
+ ;; size=251 bbWeight=4 PerfScore 332.00
G_M48875_IG05: ; bbWeight=2, gcVars=0000000000000201 {V04 V05}, gcrefRegs=0000 {}, byrefRegs=6008 {rbx r13 r14}, gcvars, byref
; byrRegs -[rcx]
mov edi, 1
@@ -342,13 +340,12 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm4, xmm4, xmm7
vmovups xmm8, xmmword ptr [reloc @RWD128]
vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm3, xmm3, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -358,12 +355,11 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb xmm4, xmm6, xmm4
vpand xmm2, xmm2, xmm7
vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm2, xmm2, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -371,7 +367,7 @@ G_M48875_IG14: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm2, xmm3, xmm2
vptest xmm2, xmm2
je G_M48875_IG19
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG15: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpmovmskb r12d, xmm2
;; size=4 bbWeight=2 PerfScore 4.00
@@ -459,7 +455,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1080, prolog size 43, PerfScore 1246.50, instruction count 241, allocated bytes for code 1080 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1052, prolog size 43, PerfScore 1238.50, instruction count 237, allocated bytes for code 1052 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
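The recurring change in these diffs is a two-instruction select (`vpmovm2b`/`vpmovm2d` followed by `vpternlogd ..., -54`) collapsing into a single `vpblendmb`/`vpblendmd`. The sketch below is a scalar Python model, not JIT code, that checks the two sequences select the same lanes. The helper names (`vpmovm2b`, `vpternlog`, `vpblendm`, `old_sequence`, `new_sequence`) are illustrative, lanes are modeled as 8-bit integers, and the immediate `-54` is treated as the unsigned byte `0xCA`.

```python
LANE_BITS = 8  # assumption: model byte lanes; the ternary logic is bitwise, so width does not matter


def vpmovm2b(mask_bits):
    """Expand each mask bit into an all-ones or all-zeros lane."""
    return [0xFF if bit else 0x00 for bit in mask_bits]


def vpternlog(a, b, c, imm8):
    """Bitwise ternary logic: each result bit is imm8[(a_bit << 2) | (b_bit << 1) | c_bit]."""
    out = []
    for x, y, z in zip(a, b, c):
        lane = 0
        for bit in range(LANE_BITS):
            idx = (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)
            lane |= ((imm8 >> idx) & 1) << bit
        out.append(lane)
    return out


def vpblendm(mask_bits, src1, src2):
    """Masked blend: take src2 where the mask bit is set, src1 otherwise."""
    return [s2 if bit else s1 for bit, s1, s2 in zip(mask_bits, src1, src2)]


def old_sequence(mask_bits, b, c):
    # vpmovm2b tmp, k ; vpternlogd tmp, b, c, -54   (0xCA means "tmp ? b : c")
    return vpternlog(vpmovm2b(mask_bits), b, c, 0xCA)


def new_sequence(mask_bits, b, c):
    # vpblendmb dst {k}, c, b   (operands swap relative to the vpternlogd form)
    return vpblendm(mask_bits, c, b)
```

Note the operand order: `vpternlogd tmp, b, c, -54` picks `b` where the mask is set, while `vpblendm* dst {k}, c, b` takes its second source on a set mask bit, which is why the source registers appear swapped in the diffs above.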
libraries_tests.run.linux.x64.Release.mch
-28 (-20.29%) : 447031.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
@@ -37,30 +37,26 @@ G_M12292_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpeqd xmm2, xmm0, xmm1
vxorps xmm3, xmm3, xmm3
vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 {k1}, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vpshufd xmm0, xmm2, -79
vpcmpeqd xmm1, xmm2, xmm0
vxorps xmm3, xmm3, xmm3
vpcmpud k1, xmm2, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm0, xmm2, -54
+ vpblendmd xmm3 {k1}, xmm2, xmm0
vpcmpud k1, xmm2, xmm0, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm2, xmm0, -54
- vpternlogd xmm1, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm0, xmm2
+ vpternlogd xmm1, xmm3, xmm0, -54
vmovd eax, xmm1
- ;; size=124 bbWeight=1 PerfScore 24.67
+ ;; size=96 bbWeight=1 PerfScore 20.00
G_M12292_IG03: ; bbWeight=1, epilog, nogc, extend
pop rbp
ret
;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 138, prolog size 7, PerfScore 31.42, instruction count 27, allocated bytes for code 138 (MethodHash=4174cffb) for method System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
+; Total bytes of code 110, prolog size 7, PerfScore 26.75, instruction count 23, allocated bytes for code 110 (MethodHash=4174cffb) for method System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
; ============================================================
Unwind Info:
-14 (-17.28%) : 433093.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
@@ -34,22 +34,20 @@ G_M36523_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr
vpcmpeqd xmm2, xmm0, xmm1
vxorps xmm3, xmm3, xmm3
vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm0, xmm1, -54
+ vpblendmd xmm3 {k1}, xmm1, xmm0
vpcmpud k1, xmm0, xmm1, 1
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [rdi], xmm2
mov rax, rdi
; byrRegs +[rax]
- ;; size=62 bbWeight=1 PerfScore 12.58
+ ;; size=48 bbWeight=1 PerfScore 10.25
G_M36523_IG03: ; bbWeight=1, epilog, nogc, extend
pop rbp
ret
;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 81, prolog size 7, PerfScore 22.33, instruction count 18, allocated bytes for code 81 (MethodHash=772b7154) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 7, PerfScore 20.00, instruction count 16, allocated bytes for code 67 (MethodHash=772b7154) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================
Unwind Info:
-14 (-17.28%) : 432810.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
@@ -35,22 +35,20 @@ G_M23551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0080 {rdi}, byr
vpcmpeqd xmm2, xmm0, xmm1
vxorps xmm3, xmm3, xmm3
vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 {k1}, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 {k1}, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [rdi], xmm2
mov rax, rdi
; byrRegs +[rax]
- ;; size=62 bbWeight=1 PerfScore 12.58
+ ;; size=48 bbWeight=1 PerfScore 10.25
G_M23551_IG03: ; bbWeight=1, epilog, nogc, extend
pop rbp
ret
;; size=2 bbWeight=1 PerfScore 1.50
-; Total bytes of code 81, prolog size 7, PerfScore 22.33, instruction count 18, allocated bytes for code 81 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 7, PerfScore 20.00, instruction count 16, allocated bytes for code 67 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================
Unwind Info:
-7 (-2.86%) : 431769.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
@@ -47,9 +47,8 @@ G_M25547_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogd ymm1, ymm2, ymmword ptr [rbp+0x10], -54
vmovups ymm2, ymmword ptr [rbp-0x50]
vpcmpud k1, ymm2, ymmword ptr [rbp-0x30], 1
- vpmovm2d ymm2, k1
- vmovups ymm3, ymmword ptr [rbp+0x30]
- vpternlogd ymm2, ymm3, ymmword ptr [rbp+0x10], -54
+ vmovups ymm2, ymmword ptr [rbp+0x10]
+ vpblendmd ymm2 {k1}, ymm2, ymmword ptr [rbp+0x30]
vpternlogd ymm0, ymm1, ymm2, -54
vmovups ymmword ptr [rbp-0x70], ymm0
mov rdi, 0xD1FFAB1E
@@ -61,7 +60,7 @@ G_M25547_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rax], ymm0
mov rax, bword ptr [rbp-0x08]
- ;; size=174 bbWeight=1 PerfScore 62.50
+ ;; size=167 bbWeight=1 PerfScore 61.50
G_M25547_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
add rsp, 240
@@ -69,7 +68,7 @@ G_M25547_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 245, prolog size 55, PerfScore 79.33, instruction count 43, allocated bytes for code 245 (MethodHash=df969c34) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
+; Total bytes of code 238, prolog size 55, PerfScore 78.33, instruction count 42, allocated bytes for code 239 (MethodHash=df969c34) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
; ============================================================
Unwind Info:
-7 (-2.86%) : 431752.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
@@ -47,9 +47,8 @@ G_M22549_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogd ymm1, ymm2, ymmword ptr [rbp+0x10], -54
vmovups ymm2, ymmword ptr [rbp-0x30]
vpcmpud k1, ymm2, ymmword ptr [rbp-0x50], 6
- vpmovm2d ymm2, k1
- vmovups ymm3, ymmword ptr [rbp+0x10]
- vpternlogd ymm2, ymm3, ymmword ptr [rbp+0x30], -54
+ vmovups ymm2, ymmword ptr [rbp+0x30]
+ vpblendmd ymm2 {k1}, ymm2, ymmword ptr [rbp+0x10]
vpternlogd ymm0, ymm1, ymm2, -54
vmovups ymmword ptr [rbp-0x70], ymm0
mov rdi, 0xD1FFAB1E
@@ -61,7 +60,7 @@ G_M22549_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rax], ymm0
mov rax, bword ptr [rbp-0x08]
- ;; size=174 bbWeight=1 PerfScore 62.50
+ ;; size=167 bbWeight=1 PerfScore 61.50
G_M22549_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
add rsp, 240
@@ -69,7 +68,7 @@ G_M22549_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 245, prolog size 55, PerfScore 79.33, instruction count 43, allocated bytes for code 245 (MethodHash=7ad0a7ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
+; Total bytes of code 238, prolog size 55, PerfScore 78.33, instruction count 42, allocated bytes for code 239 (MethodHash=7ad0a7ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector256`1[uint],System.Runtime.Intrinsics.Vector256`1[uint]):System.Runtime.Intrinsics.Vector256`1[uint] (Instrumented Tier0)
; ============================================================
Unwind Info:
-28 (-2.61%) : 378853.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
@@ -103,7 +103,7 @@
; V91 cse1 [V91,T23] ( 3, 17.08) simd32 -> mm7 "CSE - aggressive"
; V92 cse2 [V92,T24] ( 3, 17.08) simd32 -> mm8 "CSE - aggressive"
; V93 cse3 [V93,T25] ( 3, 17.08) simd32 -> mm9 "CSE - aggressive"
-; V94 cse4 [V94,T26] ( 3, 17.08) simd32 -> mm11 "CSE - aggressive"
+; V94 cse4 [V94,T26] ( 3, 17.08) simd32 -> mm10 "CSE - aggressive"
;
; Lcl frame size = 168
@@ -180,13 +180,12 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb
vpand ymm6, ymm6, ymm8
vmovups ymm9, ymmword ptr [reloc @RWD128]
vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm5, ymm5, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -197,14 +196,13 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb
vpshufb ymm6, ymm7, ymm6
vpand ymm4, ymm4, ymm8
vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
+ vpsubb ymm7, ymm4, ymm10
vmovups ymmword ptr [rbp-0x90], ymm3
- vpshufb ymm8, ymm3, ymm8
+ vpshufb ymm7, ymm3, ymm7
vmovups ymmword ptr [rbp-0x70], ymm2
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm4, ymm4, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -213,7 +211,7 @@ G_M48875_IG04: ; bbWeight=5.69, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb
vmovups ymmword ptr [rbp-0xB0], ymm4
vptest ymm4, ymm4
jne SHORT G_M48875_IG07
- ;; size=278 bbWeight=5.69 PerfScore 489.70
+ ;; size=264 bbWeight=5.69 PerfScore 484.00
G_M48875_IG05: ; bbWeight=4.77, gcVars=0000000000000401 {V04 V05}, gcrefRegs=0000 {}, byrefRegs=C008 {rbx r14 r15}, gcvars, byref
; byrRegs -[rcx]
mov rcx, bword ptr [rbp-0xC0]
@@ -335,12 +333,11 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb
vpshufb xmm3, xmm5, xmm3
vpand xmm4, xmm4, xmmword ptr [reloc @RWD96]
vpcmpub k1, xmm4, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm4, xmmword ptr [reloc @RWD160]
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm4, xmmword ptr [reloc @RWD160]
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm5, xmm6, xmm4, -54
- vpand xmm3, xmm5, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm5
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm3, xmm3, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -351,12 +348,11 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb
vpshufb xmm4, xmm5, xmm4
vpand xmm2, xmm2, xmmword ptr [reloc @RWD96]
vpcmpub k1, xmm2, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmmword ptr [reloc @RWD160]
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmmword ptr [reloc @RWD160]
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm2, xmm2, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -364,7 +360,7 @@ G_M48875_IG17: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rb
vpand xmm2, xmm3, xmm2
vptest xmm2, xmm2
je G_M48875_IG25
- ;; size=250 bbWeight=0.07 PerfScore 4.43
+ ;; size=236 bbWeight=0.07 PerfScore 4.36
G_M48875_IG18: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C00A {rcx rbx r14 r15}, byref
vpmovmskb r12d, xmm2
;; size=4 bbWeight=0.07 PerfScore 0.13
@@ -474,7 +470,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1074, prolog size 42, PerfScore 652.51, instruction count 236, allocated bytes for code 1074 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
+; Total bytes of code 1046, prolog size 42, PerfScore 646.75, instruction count 232, allocated bytes for code 1046 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
; ============================================================
Unwind Info:
librariestestsnotieredcompilation.run.linux.x64.Release.mch
-28 (-2.59%) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -106,13 +106,13 @@
; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
;
; Lcl frame size = 136
@@ -194,13 +194,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm6, ymm6, ymm8
vmovups ymm9, ymmword ptr [reloc @RWD128]
vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm5, ymm5, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -211,12 +210,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb ymm6, ymm7, ymm6
vpand ymm4, ymm4, ymm8
vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm4, ymm4, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -224,7 +222,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm4, ymm5, ymm4
vptest ymm4, ymm4
je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpermq ymm4, ymm4, -40
vpmovmskb r12d, ymm4
@@ -356,13 +354,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm4, xmm4, xmm7
vmovups xmm8, xmmword ptr [reloc @RWD128]
vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm3, xmm3, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -372,12 +369,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb xmm4, xmm6, xmm4
vpand xmm2, xmm2, xmm7
vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm2, xmm2, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -385,7 +381,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm2, xmm3, xmm2
vptest xmm2, xmm2
je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpmovmskb r12d, xmm2
;; size=4 bbWeight=2 PerfScore 4.00
@@ -473,7 +469,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1081, prolog size 43, PerfScore 1253.25, instruction count 240, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1053, prolog size 43, PerfScore 1245.25, instruction count 236, allocated bytes for code 1053 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
-14 (-1.65%) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
@@ -98,8 +98,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm1, ymmword ptr [rdi+0x10]
vmovups ymmword ptr [rbp-0x50], ymm1
vpcmpud k1, ymm0, ymm1, 6
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
+ vpblendmd ymm2 {k1}, ymm1, ymm0
vmovups ymmword ptr [rbp-0x70], ymm2
mov rdi, 0xD1FFAB1E ; System.Action`2[int,uint]
; gcrRegs -[rdi]
@@ -138,9 +137,9 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- ;; size=313 bbWeight=1 PerfScore 86.50
-G_M21446_IG03: ; bbWeight=1, extend
vmovdqu xmm0, xmmword ptr [rbp-0xB0]
+ ;; size=314 bbWeight=1 PerfScore 88.33
+G_M21446_IG03: ; bbWeight=1, extend
vpextrd edx, xmm0, 3
mov esi, 3
mov rdi, gword ptr [r15+0x08]
@@ -182,9 +181,8 @@ G_M21446_IG03: ; bbWeight=1, extend
vmovups ymm0, ymmword ptr [rbp-0x30]
vmovups ymm1, ymmword ptr [rbp-0x50]
vpcmpud k1, ymm0, ymm1, 2
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
- vmovups ymmword ptr [rbp-0x90], ymm2
+ vpblendmd ymm0 {k1}, ymm1, ymm0
+ vmovups ymmword ptr [rbp-0x90], ymm0
mov rdi, 0xD1FFAB1E ; System.Action`2[int,uint]
call CORINFO_HELP_NEWSFAST
; gcrRegs +[rax]
@@ -199,63 +197,63 @@ G_M21446_IG03: ; bbWeight=1, extend
; byrRegs -[rdi]
mov rdx, 0xD1FFAB1E ; code for <unknown method>
mov qword ptr [r15+0x18], rdx
- vmovups ymm2, ymmword ptr [rbp-0x90]
- vmovups ymmword ptr [rbp-0xD0], ymm2
- vmovd edx, xmm2
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vmovups ymmword ptr [rbp-0xD0], ymm0
+ vmovd edx, xmm0
xor esi, esi
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 1
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 1
mov esi, 1
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 2
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 2
mov esi, 2
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovdqu xmm0, xmmword ptr [rbp-0xD0]
- vpextrd edx, xmm0, 3
+ vmovdqu xmm1, xmmword ptr [rbp-0xD0]
+ vpextrd edx, xmm1, 3
mov esi, 3
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vmovd edx, xmm0
- ;; size=368 bbWeight=1 PerfScore 139.25
-G_M21446_IG04: ; bbWeight=1, extend
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vmovd edx, xmm1
mov esi, 4
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
+ ;; size=362 bbWeight=1 PerfScore 137.33
+G_M21446_IG04: ; bbWeight=1, extend
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 1
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 1
mov esi, 5
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 2
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 2
mov esi, 6
mov rdi, gword ptr [r15+0x08]
; gcrRegs +[rdi]
call [r15+0x18]<unknown method>
; gcrRegs -[rdi]
- vmovups ymm2, ymmword ptr [rbp-0xD0]
- vextracti128 xmm0, ymm2, 1
+ vmovups ymm0, ymmword ptr [rbp-0xD0]
+ vextracti128 xmm0, ymm0, 1
vpextrd edx, xmm0, 3
mov esi, 7
mov rdi, gword ptr [r15+0x08]
@@ -263,7 +261,7 @@ G_M21446_IG04: ; bbWeight=1, extend
call [r15+0x18]<unknown method>
; gcrRegs -[rdi r15]
nop
- ;; size=113 bbWeight=1 PerfScore 48.25
+ ;; size=104 bbWeight=1 PerfScore 46.00
G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
vzeroupper
add rsp, 208
@@ -277,7 +275,7 @@ G_M21446_IG06: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
int3
;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 847, prolog size 31, PerfScore 283.75, instruction count 165, allocated bytes for code 847 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 833, prolog size 31, PerfScore 281.42, instruction count 163, allocated bytes for code 833 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================
Unwind Info:
smoke_tests.nativeaot.linux.x64.checked.mch
-28 (-2.61%) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -105,13 +105,13 @@
; V95 cse1 [V95,T30] ( 3, 12 ) simd32 -> mm7 "CSE - moderate"
; V96 cse2 [V96,T31] ( 3, 12 ) simd32 -> mm8 "CSE - moderate"
; V97 cse3 [V97,T32] ( 3, 12 ) simd32 -> mm9 "CSE - moderate"
-; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm11 "CSE - moderate"
+; V98 cse4 [V98,T33] ( 3, 12 ) simd32 -> mm10 "CSE - moderate"
; V99 cse5 [V99,T34] ( 3, 12 ) simd16 -> mm4 "CSE - moderate"
; V100 cse6 [V100,T35] ( 3, 12 ) simd16 -> mm5 "CSE - moderate"
; V101 cse7 [V101,T36] ( 3, 12 ) simd16 -> mm6 "CSE - moderate"
; V102 cse8 [V102,T37] ( 3, 12 ) simd16 -> mm7 "CSE - moderate"
; V103 cse9 [V103,T38] ( 3, 12 ) simd16 -> mm8 "CSE - moderate"
-; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm10 "CSE - moderate"
+; V104 cse10 [V104,T39] ( 3, 12 ) simd16 -> mm9 "CSE - moderate"
;
; Lcl frame size = 136
@@ -193,13 +193,12 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm6, ymm6, ymm8
vmovups ymm9, ymmword ptr [reloc @RWD128]
vpcmpub k1, ymm6, ymm9, 6
- vpmovm2b ymm10, k1
- vmovups ymm11, ymmword ptr [reloc @RWD160]
- vpsubb ymm12, ymm6, ymm11
- vpshufb ymm12, ymm3, ymm12
+ vmovups ymm10, ymmword ptr [reloc @RWD160]
+ vpsubb ymm11, ymm6, ymm10
+ vpshufb ymm11, ymm3, ymm11
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm10, ymm12, ymm6, -54
- vpand ymm5, ymm10, ymm5
+ vpblendmb ymm6 {k1}, ymm6, ymm11
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm5, ymm5, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -210,12 +209,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb ymm6, ymm7, ymm6
vpand ymm4, ymm4, ymm8
vpcmpub k1, ymm4, ymm9, 6
- vpmovm2b ymm7, k1
- vpsubb ymm8, ymm4, ymm11
- vpshufb ymm8, ymm3, ymm8
+ vpsubb ymm7, ymm4, ymm10
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm8, ymm4, -54
- vpand ymm4, ymm7, ymm6
+ vpblendmb ymm4 {k1}, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm4, ymm4, ymm6
vpcmpeqd ymm6, ymm6, ymm6
@@ -223,7 +221,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand ymm4, ymm5, ymm4
vptest ymm4, ymm4
je G_M48875_IG11
- ;; size=254 bbWeight=4 PerfScore 328.00
+ ;; size=240 bbWeight=4 PerfScore 324.00
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpermq ymm4, ymm4, -40
vpmovmskb r12d, ymm4
@@ -355,13 +353,12 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm4, xmm4, xmm7
vmovups xmm8, xmmword ptr [reloc @RWD128]
vpcmpub k1, xmm4, xmm8, 6
- vpmovm2b xmm9, k1
- vmovups xmm10, xmmword ptr [reloc @RWD160]
- vpsubb xmm11, xmm4, xmm10
- vpshufb xmm11, xmm1, xmm11
+ vmovups xmm9, xmmword ptr [reloc @RWD160]
+ vpsubb xmm10, xmm4, xmm9
+ vpshufb xmm10, xmm1, xmm10
vpshufb xmm4, xmm0, xmm4
- vpternlogd xmm9, xmm11, xmm4, -54
- vpand xmm3, xmm9, xmm3
+ vpblendmb xmm4 {k1}, xmm4, xmm10
+ vpand xmm3, xmm4, xmm3
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm3, xmm3, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -371,12 +368,11 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpshufb xmm4, xmm6, xmm4
vpand xmm2, xmm2, xmm7
vpcmpub k1, xmm2, xmm8, 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm2, xmm10
- vpshufb xmm6, xmm1, xmm6
+ vpsubb xmm5, xmm2, xmm9
+ vpshufb xmm5, xmm1, xmm5
vpshufb xmm2, xmm0, xmm2
- vpternlogd xmm5, xmm6, xmm2, -54
- vpand xmm2, xmm5, xmm4
+ vpblendmb xmm2 {k1}, xmm2, xmm5
+ vpand xmm2, xmm2, xmm4
vxorps xmm4, xmm4, xmm4
vpcmpeqb xmm2, xmm2, xmm4
vpcmpeqd xmm4, xmm4, xmm4
@@ -384,7 +380,7 @@ G_M48875_IG18: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r
vpand xmm2, xmm3, xmm2
vptest xmm2, xmm2
je G_M48875_IG23
- ;; size=244 bbWeight=4 PerfScore 240.00
+ ;; size=230 bbWeight=4 PerfScore 236.00
G_M48875_IG19: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=600A {rcx rbx r13 r14}, byref
vpmovmskb r12d, xmm2
;; size=4 bbWeight=2 PerfScore 4.00
@@ -472,7 +468,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1072, prolog size 43, PerfScore 1188.50, instruction count 240, allocated bytes for code 1072 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1044, prolog size 43, PerfScore 1180.50, instruction count 236, allocated bytes for code 1044 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Cfi Info:
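The asm diffs above repeatedly replace a `vpmovm2b`/`vpmovm2d` plus `vpternlogd ..., -54` pair with a single `vpblendmb`/`vpblendmd` under `{k1}`. The immediate `-54` is `0xCA` as an unsigned byte, which is the ternary-logic truth table for a bitwise select (`A ? B : C`). The sketch below is a lane-level model in Python, not JIT code: the helper names (`ternlog`, `old_sequence`, `new_sequence`, `check_all_masks`) are hypothetical, and it assumes 32-bit lanes as in the `vpblendmd` case. It only illustrates why the two sequences would produce the same lanes when the mask vector is expanded from a `k` register.

```python
LANE_BITS = 32
ALL_ONES = (1 << LANE_BITS) - 1

def ternlog(a, b, c, imm8):
    """Bitwise ternary logic on one lane: result bit i is
    imm8[(a_i << 2) | (b_i << 1) | c_i], with a taken from the destination."""
    out = 0
    for i in range(LANE_BITS):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out

def old_sequence(k, src_true, src_false):
    # vpmovm2d: expand each mask bit into an all-ones or all-zeros lane.
    mask_vec = [ALL_ONES if (k >> i) & 1 else 0 for i in range(len(src_true))]
    # vpternlogd mask_vec, src_true, src_false, -54  (imm8 = 0xCA)
    imm8 = -54 & 0xFF
    return [ternlog(m, t, f, imm8)
            for m, t, f in zip(mask_vec, src_true, src_false)]

def new_sequence(k, src_true, src_false):
    # vpblendmd dst {k}, src_false, src_true: take src_true where the mask bit is set.
    return [t if (k >> i) & 1 else f
            for i, (t, f) in enumerate(zip(src_true, src_false))]

def check_all_masks():
    # Eight 32-bit lanes, as in a ymm register; values are arbitrary test data.
    a = [0x01234567, 0x89ABCDEF, 0, ALL_ONES, 0xDEADBEEF, 7, 0x80000000, 0x55555555]
    b = [0xFEDCBA98, 0x76543210, ALL_ONES, 0, 0x0BADF00D, 9, 0x7FFFFFFF, 0xAAAAAAAA]
    return all(old_sequence(k, a, b) == new_sequence(k, a, b) for k in range(256))
```

Calling `check_all_masks()` compares both models for every 8-bit mask value. The byte-sized `vpblendmb` case follows the same reasoning with 8-bit lanes.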
Details
Improvements/regressions per collection
| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|:---|---:|---:|---:|---:|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 2 | 2 | 0 | 0 | -42 | +0 |
| benchmarks.run_pgo.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.linux.x64.checked.mch | 16 | 16 | 0 | 0 | -528 | +0 |
| libraries.crossgen2.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.linux.x64.checked.mch | 24 | 24 | 0 | 0 | -469 | +0 |
| libraries_tests.run.linux.x64.Release.mch | 75 | 75 | 0 | 0 | -8,897 | +0 |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 7 | 7 | 0 | 0 | -756 | +0 |
| realworld.run.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| | 125 | 125 | 0 | 0 | -10,720 | +0 |
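The totals row above can be cross-checked against the per-collection rows. The snippet below is a small sanity check, with values copied from the table; the variable names are illustrative only and the collection names are shortened.

```python
# (contexts with diffs, improvement bytes) per collection, copied from the table.
per_collection = {
    "benchmarks.run": (2, -42),
    "benchmarks.run_pgo": (0, 0),
    "benchmarks.run_tiered": (0, 0),
    "coreclr_tests.run": (16, -528),
    "libraries.crossgen2": (0, 0),
    "libraries.pmi": (24, -469),
    "libraries_tests.run": (75, -8897),
    "librariestestsnotieredcompilation.run": (7, -756),
    "realworld.run": (0, 0),
    "smoke_tests.nativeaot": (1, -28),
}

# Sum each column across all collections.
total_contexts = sum(contexts for contexts, _ in per_collection.values())
total_bytes = sum(delta for _, delta in per_collection.values())

print(total_contexts, total_bytes)
```

The sums agree with the reported totals of 125 contexts with diffs and -10,720 bytes overall.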
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|:---|---:|---:|---:|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 42,857 | 3,142 | 39,715 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.x64.checked.mch | 158,377 | 60,175 | 98,202 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.linux.x64.checked.mch | 56,500 | 42,284 | 14,216 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.x64.checked.mch | 596,771 | 354,686 | 242,085 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.x64.checked.mch | 234,032 | 15 | 234,017 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.x64.checked.mch | 296,234 | 6 | 296,228 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.x64.Release.mch | 761,652 | 495,580 | 266,072 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 305,348 | 21,873 | 283,475 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.x64.checked.mch | 33,069 | 9 | 33,060 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.x64.checked.mch | 27,422 | 10 | 27,412 | 0 (0.00%) | 0 (0.00%) |
| | 2,512,262 | 977,780 | 1,534,482 | 0 (0.00%) | 0 (0.00%) |
jit-analyze output
benchmarks.run.linux.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 16454856 (overridden on cmd)
Total bytes of diff: 16454814 (overridden on cmd)
Total bytes of delta: -42 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-28 : 27287.dasm (-2.59 % of base)
-14 : 10422.dasm (-4.14 % of base)
2 total files with Code Size differences (2 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-28 (-2.59 % of base) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-14 (-4.14 % of base) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
Top method improvements (percentages):
-14 (-4.14 % of base) : 10422.dasm - System.SpanHelpers:ReplaceValueType[ushort](byref,byref,ushort,ushort,ulong) (FullOpts)
-28 (-2.59 % of base) : 27287.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
2 total methods with Code Size differences (2 improved, 0 regressed).
coreclr_tests.run.linux.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 403726743 (overridden on cmd)
Total bytes of diff: 403726215 (overridden on cmd)
Total bytes of delta: -528 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-56 : 491717.dasm (-3.11 % of base)
-56 : 491721.dasm (-3.14 % of base)
-56 : 491718.dasm (-3.07 % of base)
-56 : 491722.dasm (-3.09 % of base)
-32 : 205671.dasm (-0.78 % of base)
-32 : 205678.dasm (-0.78 % of base)
-32 : 205673.dasm (-0.77 % of base)
-32 : 205679.dasm (-0.77 % of base)
-28 : 491719.dasm (-1.57 % of base)
-28 : 491715.dasm (-1.60 % of base)
-28 : 491720.dasm (-1.57 % of base)
-28 : 491716.dasm (-1.57 % of base)
-16 : 205676.dasm (-0.39 % of base)
-16 : 205670.dasm (-0.39 % of base)
-16 : 205669.dasm (-0.40 % of base)
-16 : 205675.dasm (-0.39 % of base)
16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-56 (-3.07 % of base) : 491718.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
-56 (-3.14 % of base) : 491721.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
-56 (-3.09 % of base) : 491722.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
-56 (-3.11 % of base) : 491717.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
-32 (-0.77 % of base) : 205673.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
-32 (-0.78 % of base) : 205678.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
-32 (-0.77 % of base) : 205679.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
-32 (-0.78 % of base) : 205671.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
-28 (-1.57 % of base) : 491720.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
-28 (-1.60 % of base) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
-28 (-1.57 % of base) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
-28 (-1.57 % of base) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
-16 (-0.39 % of base) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
-16 (-0.40 % of base) : 205669.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
-16 (-0.39 % of base) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
-16 (-0.39 % of base) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
Top method improvements (percentages):
-56 (-3.14 % of base) : 491721.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
-56 (-3.11 % of base) : 491717.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
-56 (-3.09 % of base) : 491722.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
-56 (-3.07 % of base) : 491718.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
-28 (-1.60 % of base) : 491715.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
-28 (-1.57 % of base) : 491716.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
-28 (-1.57 % of base) : 491720.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
-28 (-1.57 % of base) : 491719.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
-32 (-0.78 % of base) : 205678.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
-32 (-0.78 % of base) : 205671.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
-32 (-0.77 % of base) : 205679.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
-32 (-0.77 % of base) : 205673.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
-16 (-0.40 % of base) : 205669.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
-16 (-0.39 % of base) : 205670.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
-16 (-0.39 % of base) : 205676.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
-16 (-0.39 % of base) : 205675.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
16 total methods with Code Size differences (16 improved, 0 regressed).
libraries.pmi.linux.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 60288822 (overridden on cmd)
Total bytes of diff: 60288353 (overridden on cmd)
Total bytes of delta: -469 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-56 : 207128.dasm (-20.51 % of base)
-56 : 207071.dasm (-20.51 % of base)
-28 : 207073.dasm (-17.18 % of base)
-28 : 207130.dasm (-17.18 % of base)
-28 : 20791.dasm (-2.59 % of base)
-21 : 207070.dasm (-20.19 % of base)
-21 : 207092.dasm (-20.19 % of base)
-21 : 207127.dasm (-20.19 % of base)
-21 : 207149.dasm (-20.19 % of base)
-14 : 207072.dasm (-15.56 % of base)
-14 : 207068.dasm (-17.28 % of base)
-14 : 207091.dasm (-16.67 % of base)
-14 : 207126.dasm (-16.67 % of base)
-14 : 207148.dasm (-16.67 % of base)
-14 : 20784.dasm (-5.19 % of base)
-14 : 207125.dasm (-17.28 % of base)
-14 : 207129.dasm (-15.56 % of base)
-14 : 207147.dasm (-17.28 % of base)
-14 : 20786.dasm (-5.41 % of base)
-14 : 207069.dasm (-16.67 % of base)
24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-56 (-20.51 % of base) : 207071.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-56 (-20.51 % of base) : 207128.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-28 (-2.59 % of base) : 20791.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-28 (-17.18 % of base) : 207073.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-28 (-17.18 % of base) : 207130.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-21 (-20.19 % of base) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.19 % of base) : 207092.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.19 % of base) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.19 % of base) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-14 (-5.41 % of base) : 20786.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-5.19 % of base) : 20784.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207068.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-15.56 % of base) : 207072.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-14 (-16.67 % of base) : 207069.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207090.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-16.67 % of base) : 207091.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207125.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-15.56 % of base) : 207129.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-14 (-16.67 % of base) : 207126.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207147.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
Top method improvements (percentages):
-56 (-20.51 % of base) : 207071.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-56 (-20.51 % of base) : 207128.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-21 (-20.19 % of base) : 207070.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.19 % of base) : 207092.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.19 % of base) : 207127.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.19 % of base) : 207149.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207068.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207090.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207125.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-17.28 % of base) : 207147.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-28 (-17.18 % of base) : 207073.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-28 (-17.18 % of base) : 207130.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-14 (-16.67 % of base) : 207069.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.67 % of base) : 207091.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.67 % of base) : 207126.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.67 % of base) : 207148.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-15.56 % of base) : 207072.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-14 (-15.56 % of base) : 207129.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-7 (-5.51 % of base) : 20787.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-5.41 % of base) : 20786.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
24 total methods with Code Size differences (24 improved, 0 regressed).
libraries_tests.run.linux.x64.Release.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 342241520 (overridden on cmd)
Total bytes of diff: 342232623 (overridden on cmd)
Total bytes of delta: -8897 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-350 : 432478.dasm (-17.22 % of base)
-350 : 432506.dasm (-19.84 % of base)
-350 : 433621.dasm (-15.43 % of base)
-350 : 433873.dasm (-15.43 % of base)
-350 : 437375.dasm (-15.26 % of base)
-350 : 437419.dasm (-15.26 % of base)
-350 : 439257.dasm (-19.56 % of base)
-350 : 439478.dasm (-15.26 % of base)
-350 : 439729.dasm (-15.26 % of base)
-350 : 446906.dasm (-15.43 % of base)
-350 : 446913.dasm (-17.22 % of base)
-350 : 446980.dasm (-15.43 % of base)
-350 : 448831.dasm (-15.26 % of base)
-350 : 448983.dasm (-15.26 % of base)
-350 : 449381.dasm (-15.26 % of base)
-350 : 449490.dasm (-15.26 % of base)
-350 : 449527.dasm (-17.02 % of base)
-350 : 449532.dasm (-19.56 % of base)
-182 : 443696.dasm (-17.43 % of base)
-182 : 443767.dasm (-17.42 % of base)
62 total files with Code Size differences (62 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-350 (-17.22 % of base) : 432478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
-350 (-17.22 % of base) : 446913.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
-350 (-19.84 % of base) : 432506.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
-350 (-17.02 % of base) : 449527.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
-350 (-19.56 % of base) : 439257.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
-350 (-19.56 % of base) : 449532.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
-350 (-15.43 % of base) : 433873.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
-350 (-15.43 % of base) : 446906.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
-350 (-15.43 % of base) : 433621.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
-350 (-15.43 % of base) : 446980.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 437375.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 439478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 448831.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 449381.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 437419.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 439729.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 448983.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-350 (-15.26 % of base) : 449490.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-182 (-17.43 % of base) : 443696.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
-182 (-17.43 % of base) : 446405.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
Top method improvements (percentages):
-28 (-20.29 % of base) : 447031.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[uint,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]](System.Runtime.Intrinsics.Vector128`1[uint]):uint (Tier1)
-350 (-19.84 % of base) : 432506.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
-350 (-19.56 % of base) : 439257.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
-350 (-19.56 % of base) : 449532.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[ulong,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[ulong],System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,ulong,byref,ulong) (Tier1)
-182 (-17.43 % of base) : 443696.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
-182 (-17.43 % of base) : 446405.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
-182 (-17.42 % of base) : 443767.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
-182 (-17.42 % of base) : 446518.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier1)
-14 (-17.28 % of base) : 437547.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]](System.Runtime.Intrinsics.Vector128`1[ulong]):ulong (Tier1)
-14 (-17.28 % of base) : 449076.dasm - System.Numerics.Tensors.TensorPrimitives:HorizontalAggregate[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.Runtime.Intrinsics.Vector128`1[ulong]):ulong (Tier1)
-14 (-17.28 % of base) : 432810.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
-14 (-17.28 % of base) : 447015.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
-14 (-17.28 % of base) : 433842.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
-14 (-17.28 % of base) : 433093.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
-350 (-17.22 % of base) : 432478.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
-350 (-17.22 % of base) : 446913.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanScalarIntoSpan>g__Vectorized256|223_2[uint,System.Numerics.Tensors.TensorPrimitives+IdentityOperator`1[uint],System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[uint]](byref,uint,byref,ulong) (Tier1)
-14 (-17.07 % of base) : 437311.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-17.07 % of base) : 437354.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-17.07 % of base) : 439184.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-17.07 % of base) : 449015.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
librariestestsnotieredcompilation.run.linux.x64.Release.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 132684790 (overridden on cmd)
Total bytes of diff: 132684034 (overridden on cmd)
Total bytes of delta: -756 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-182 : 160726.dasm (-17.91 % of base)
-182 : 160167.dasm (-17.91 % of base)
-154 : 160688.dasm (-17.21 % of base)
-98 : 160706.dasm (-15.38 % of base)
-98 : 160519.dasm (-15.38 % of base)
-28 : 142512.dasm (-2.59 % of base)
-14 : 161431.dasm (-1.65 % of base)
7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-182 (-17.91 % of base) : 160726.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-182 (-17.91 % of base) : 160167.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-154 (-17.21 % of base) : 160688.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
-98 (-15.38 % of base) : 160706.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-98 (-15.38 % of base) : 160519.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-28 (-2.59 % of base) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-14 (-1.65 % of base) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
Top method improvements (percentages):
-182 (-17.91 % of base) : 160726.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-182 (-17.91 % of base) : 160167.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-154 (-17.21 % of base) : 160688.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
-98 (-15.38 % of base) : 160706.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-98 (-15.38 % of base) : 160519.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-28 (-2.59 % of base) : 142512.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-14 (-1.65 % of base) : 161431.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
7 total methods with Code Size differences (7 improved, 0 regressed).
smoke_tests.nativeaot.linux.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 4195910 (overridden on cmd)
Total bytes of diff: 4195882 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-28 : 2104.dasm (-2.61 % of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-28 (-2.61 % of base) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
Top method improvements (percentages):
-28 (-2.61 % of base) : 2104.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
1 total methods with Code Size differences (1 improved, 0 regressed).
osx arm64
Diffs are based on 2,236,017 contexts (927,360 MinOpts, 1,308,657 FullOpts).
No diffs found.
Details
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run_pgo.osx.arm64.checked.mch | 84,826 | 48,345 | 36,481 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.osx.arm64.checked.mch | 48,316 | 37,331 | 10,985 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.osx.arm64.checked.mch | 586,585 | 358,028 | 228,557 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.osx.arm64.checked.mch | 233,760 | 15 | 233,745 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.osx.arm64.checked.mch | 315,616 | 18 | 315,598 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.osx.arm64.Release.mch | 632,257 | 462,062 | 170,195 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.osx.arm64.Release.mch | 303,114 | 21,558 | 281,556 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.osx.arm64.checked.mch | 31,543 | 3 | 31,540 | 0 (0.00%) | 0 (0.00%) |
| | 2,236,017 | 927,360 | 1,308,657 | 0 (0.00%) | 0 (0.00%) |
windows arm64
Diffs are based on 2,314,798 contexts (929,692 MinOpts, 1,385,106 FullOpts).
No diffs found.
Details
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run.windows.arm64.checked.mch | 24,447 | 4 | 24,443 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.arm64.checked.mch | 96,983 | 48,066 | 48,917 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.windows.arm64.checked.mch | 48,473 | 36,693 | 11,780 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.arm64.checked.mch | 595,703 | 362,539 | 233,164 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.windows.arm64.checked.mch | 243,831 | 15 | 243,816 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.arm64.checked.mch | 304,871 | 6 | 304,865 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.arm64.Release.mch | 626,054 | 460,799 | 165,255 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.windows.arm64.Release.mch | 317,037 | 21,559 | 295,478 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.arm64.checked.mch | 33,244 | 3 | 33,241 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 24,155 | 8 | 24,147 | 0 (0.00%) | 0 (0.00%) |
| | 2,314,798 | 929,692 | 1,385,106 | 0 (0.00%) | 0 (0.00%) |
windows x64
Diffs are based on 2,373,201 contexts (928,756 MinOpts, 1,444,445 FullOpts).
Overall (-2,856 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.windows.x64.checked.mch | 8,749,502 | -28 |
| coreclr_tests.run.windows.x64.checked.mch | 393,893,406 | -528 |
| libraries.pmi.windows.x64.checked.mch | 61,525,850 | -499 |
| libraries_tests.run.windows.x64.Release.mch | 279,744,051 | -1,014 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 137,525,226 | -759 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,089,881 | -28 |
MinOpts (-248 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| coreclr_tests.run.windows.x64.checked.mch | 273,505,068 | -192 |
| libraries_tests.run.windows.x64.Release.mch | 175,004,596 | -56 |
FullOpts (-2,608 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.windows.x64.checked.mch | 8,749,141 | -28 |
| coreclr_tests.run.windows.x64.checked.mch | 120,388,338 | -336 |
| libraries.pmi.windows.x64.checked.mch | 61,412,331 | -499 |
| libraries_tests.run.windows.x64.Release.mch | 104,739,455 | -958 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 126,648,064 | -759 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,088,934 | -28 |
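The three windows x64 size tables above are consistent with one another: each collection's overall delta is the sum of its MinOpts and FullOpts deltas, and the column totals match the section headings. A small Python sketch of that cross-check follows; the dictionary layout is an illustrative assumption, and the numbers are copied from the tables.

```python
# Per-collection byte deltas copied from the windows x64 tables above.
overall = {
    "benchmarks.run": -28,
    "coreclr_tests.run": -528,
    "libraries.pmi": -499,
    "libraries_tests.run": -1014,
    "librariestestsnotieredcompilation.run": -759,
    "smoke_tests.nativeaot": -28,
}
minopts = {"coreclr_tests.run": -192, "libraries_tests.run": -56}
fullopts = {
    "benchmarks.run": -28,
    "coreclr_tests.run": -336,
    "libraries.pmi": -499,
    "libraries_tests.run": -958,
    "librariestestsnotieredcompilation.run": -759,
    "smoke_tests.nativeaot": -28,
}

# Collections missing from a table contributed no diff in that mode.
mismatches = [
    name
    for name, total in overall.items()
    if minopts.get(name, 0) + fullopts.get(name, 0) != total
]

print(sum(overall.values()), sum(minopts.values()), sum(fullopts.values()))
print(mismatches)
```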
Example diffs
benchmarks.run.windows.x64.checked.mch
-28 (-2.50%) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm1, ymm3, ymm1
vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm1, ymm1, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm2, ymm3, ymm2
vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm0, ymm0, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand ymm0, ymm1, ymm0
vptest ymm0, ymm0
je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpermq ymm0, ymm0, -40
vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm1, xmm10, xmm1
vpand xmm2, xmm2, xmm11
vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm2, xmm10, xmm2
vpand xmm0, xmm0, xmm11
vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm0, xmm0, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand xmm0, xmm1, xmm0
vptest xmm0, xmm0
je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpmovmskb ebp, xmm0
;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
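The diff above replaces a `vpmovm2b` + `vpternlogd ..., -54` pair with a single masked `vpblendmb`. The immediate `-54` is `0xCA` as an unsigned byte, which is the ternary-logic truth table for a bitwise select (`A ? B : C`), so both sequences pick lanes by the `k1` mask. The following Python model is only a sketch of that equivalence at 32-bit lane width; the helper names are hypothetical and it is not how the JIT implements the transformation.

```python
def ternlog(a, b, c, imm8, width=32):
    """Bitwise ternary logic: each result bit is imm8[(a_bit<<2)|(b_bit<<1)|c_bit]."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out

def movm2(mask, lanes, width=32):
    """Expand each mask bit into an all-ones or all-zeros lane (vpmovm2* style)."""
    full = (1 << width) - 1
    return [full if (mask >> j) & 1 else 0 for j in range(lanes)]

def blendm(mask, src1, src2):
    """Masked blend (vpblendm* style): take src2 where the mask bit is set, else src1."""
    return [s2 if (mask >> j) & 1 else s1
            for j, (s1, s2) in enumerate(zip(src1, src2))]

IMM8 = -54 & 0xFF  # the immediate printed in the disassembly, as an unsigned byte

k1 = 0b10110010                               # arbitrary sample mask
on_true = [0x1000 + j for j in range(8)]      # lanes chosen where k1 is set
on_false = [0x2000 + j for j in range(8)]     # lanes chosen where k1 is clear

# Base codegen: expand the mask to a vector, then select with ternlog.
old = [ternlog(m, t, f, IMM8) for m, t, f in zip(movm2(k1, 8), on_true, on_false)]
# Diff codegen: blend directly under the mask register.
new = blendm(k1, on_false, on_true)

print(hex(IMM8))
print(old == new)
```

Because the selector vector produced from a mask register is all-ones or all-zeros per lane, the bitwise select and the per-lane blend agree, which is why the shorter sequence is a valid replacement here.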
coreclr_tests.run.windows.x64.checked.mch
-28 (-1.46%) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
@@ -331,10 +331,9 @@ G_M1266_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M1266_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpd k1, ymm8, ymm6, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M1266_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -389,10 +388,9 @@ G_M1266_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M1266_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpd k1, ymm8, ymm7, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M1266_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -447,10 +445,9 @@ G_M1266_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M1266_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpd k1, ymm8, ymm6, 5
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M1266_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -505,10 +502,9 @@ G_M1266_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M1266_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpd k1, ymm7, ymm6, 5
- vpmovm2d ymm9, k1
- vpternlogd ymm9, ymm8, ymm7, -54
+ vpblendmd ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M1266_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -687,7 +683,7 @@ G_M1266_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ret
;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1922, prolog size 82, PerfScore 1043.58, instruction count 381, allocated bytes for code 1922 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
+; Total bytes of code 1894, prolog size 82, PerfScore 1038.92, instruction count 377, allocated bytes for code 1894 (MethodHash=881ffb0d) for method VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
; ============================================================
Unwind Info:
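The per-block annotations in the `VectorRelopTest`1[int]` diff above account for the whole method delta: four instruction groups each shrink from 22 to 15 bytes and each lose one instruction. A short Python check using only numbers printed in that diff (variable names are illustrative):

```python
# (base size, diff size) in bytes for IG18, IG22, IG26 and IG30 in the diff above.
block_sizes = [(22, 15), (22, 15), (22, 15), (22, 15)]

bytes_saved = sum(base - diff for base, diff in block_sizes)
# Each block drops two instructions and adds one.
instructions_saved = len(block_sizes) * (2 - 1)

print(bytes_saved, 1922 - 1894)        # block savings vs. method total bytes
print(instructions_saved, 381 - 377)   # block savings vs. method instruction count
```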
-28 (-1.43%) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
@@ -331,10 +331,9 @@ G_M59915_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M59915_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpq k1, ymm8, ymm6, 2
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M59915_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -389,10 +388,9 @@ G_M59915_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpq k1, ymm8, ymm7, 2
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M59915_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -447,10 +445,9 @@ G_M59915_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M59915_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpq k1, ymm8, ymm6, 5
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M59915_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -505,10 +502,9 @@ G_M59915_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M59915_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpq k1, ymm7, ymm6, 5
- vpmovm2q ymm9, k1
- vpternlogq ymm9, ymm8, ymm7, -54
+ vpblendmq ymm9 {k1}, ymm7, ymm8
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 4.75
+ ;; size=15 bbWeight=1 PerfScore 3.58
G_M59915_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -687,7 +683,7 @@ G_M59915_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ret
;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1956, prolog size 82, PerfScore 1049.58, instruction count 381, allocated bytes for code 1956 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 1928, prolog size 82, PerfScore 1044.92, instruction count 377, allocated bytes for code 1928 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
; ============================================================
Unwind Info:
-28 (-1.42%) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
@@ -334,10 +334,9 @@ G_M8563_IG17: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M8563_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpw k1, ymm6, ymm7, 2
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG19: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -392,10 +391,9 @@ G_M8563_IG21: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpw k1, ymm6, ymm8, 2
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG23: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -450,10 +448,9 @@ G_M8563_IG25: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M8563_IG26: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpw k1, ymm6, ymm7, 5
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG27: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -508,10 +505,9 @@ G_M8563_IG29: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
;; size=11 bbWeight=4 PerfScore 6.00
G_M8563_IG30: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpcmpw k1, ymm8, ymm7, 5
- vpmovm2w ymm9, k1
- vpternlogd ymm9, ymm6, ymm8, -54
+ vpblendmw ymm9 {k1}, ymm8, ymm6
xor ebx, ebx
- ;; size=22 bbWeight=1 PerfScore 5.75
+ ;; size=15 bbWeight=1 PerfScore 5.25
G_M8563_IG31: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov ecx, ebx
vmovups ymmword ptr [rsp+0x20], ymm9
@@ -690,7 +686,7 @@ G_M8563_IG42: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ret
;; size=73 bbWeight=1 PerfScore 35.25
-; Total bytes of code 1968, prolog size 82, PerfScore 1206.33, instruction count 383, allocated bytes for code 1968 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
+; Total bytes of code 1940, prolog size 82, PerfScore 1204.33, instruction count 379, allocated bytes for code 1940 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
; ============================================================
Unwind Info:
-16 (-0.39%) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
@@ -437,14 +437,13 @@ G_M59915_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG18
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x100], ecx
jmp G_M59915_IG25
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x100], 4
jae G_M59915_IG54
@@ -523,14 +522,13 @@ G_M59915_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG23
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x108], ecx
jmp G_M59915_IG30
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x108], 4
jae G_M59915_IG54
@@ -609,14 +607,13 @@ G_M59915_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG28
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x110], ecx
jmp G_M59915_IG35
- ;; size=72 bbWeight=1 PerfScore 23.25
+ ;; size=68 bbWeight=1 PerfScore 22.25
G_M59915_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x110], 4
jae G_M59915_IG54
@@ -695,14 +692,13 @@ G_M59915_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M59915_IG33
vmovups ymm0, ymmword ptr [rbp-0x90]
vpcmpq k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogq ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmq ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x118], ecx
jmp G_M59915_IG40
- ;; size=75 bbWeight=1 PerfScore 23.25
+ ;; size=71 bbWeight=1 PerfScore 22.25
G_M59915_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x118], 4
jae G_M59915_IG54
@@ -964,7 +960,7 @@ G_M59915_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4100, prolog size 77, PerfScore 831.26, instruction count 657, allocated bytes for code 4100 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
+; Total bytes of code 4084, prolog size 77, PerfScore 827.26, instruction count 653, allocated bytes for code 4088 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
; ============================================================
Unwind Info:
-16 (-0.39%) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
@@ -443,14 +443,13 @@ G_M44299_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG18
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x100], ecx
jmp G_M44299_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x100], 32
jae G_M44299_IG54
@@ -529,14 +528,13 @@ G_M44299_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG23
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpb k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x108], ecx
jmp G_M44299_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x108], 32
jae G_M44299_IG54
@@ -615,14 +613,13 @@ G_M44299_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG28
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x110], ecx
jmp G_M44299_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M44299_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x110], 32
jae G_M44299_IG54
@@ -701,14 +698,13 @@ G_M44299_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M44299_IG33
vmovups ymm0, ymmword ptr [rbp-0x90]
vpcmpb k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2b ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmb ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x118], ecx
jmp G_M44299_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M44299_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x118], 32
jae G_M44299_IG54
@@ -970,7 +966,7 @@ G_M44299_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4123, prolog size 77, PerfScore 865.01, instruction count 663, allocated bytes for code 4123 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
+; Total bytes of code 4107, prolog size 77, PerfScore 865.01, instruction count 659, allocated bytes for code 4111 (MethodHash=09db52f4) for method VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
; ============================================================
Unwind Info:
-16 (-0.39%) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
@@ -443,14 +443,13 @@ G_M8563_IG22: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG18
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x100], ecx
jmp G_M8563_IG25
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG23: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x100], 16
jae G_M8563_IG54
@@ -529,14 +528,13 @@ G_M8563_IG27: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG23
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0x90], 2
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x108], ecx
jmp G_M8563_IG30
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG28: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x108], 16
jae G_M8563_IG54
@@ -615,14 +613,13 @@ G_M8563_IG32: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG28
vmovups ymm0, ymmword ptr [rbp-0x70]
vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x110], ecx
jmp G_M8563_IG35
- ;; size=72 bbWeight=1 PerfScore 24.25
+ ;; size=68 bbWeight=1 PerfScore 24.25
G_M8563_IG33: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x110], 16
jae G_M8563_IG54
@@ -701,14 +698,13 @@ G_M8563_IG37: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M8563_IG33
vmovups ymm0, ymmword ptr [rbp-0x90]
vpcmpw k1, ymm0, ymmword ptr [rbp-0xB0], 5
- vpmovm2w ymm0, k1
- vmovups ymm1, ymmword ptr [rbp-0x70]
- vpternlogd ymm0, ymm1, ymmword ptr [rbp-0x90], -54
+ vmovups ymm0, ymmword ptr [rbp-0x90]
+ vpblendmw ymm0 {k1}, ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rbp-0xD0], ymm0
xor ecx, ecx
mov dword ptr [rbp-0x118], ecx
jmp G_M8563_IG40
- ;; size=75 bbWeight=1 PerfScore 24.25
+ ;; size=71 bbWeight=1 PerfScore 24.25
G_M8563_IG38: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
cmp dword ptr [rbp-0x118], 16
jae G_M8563_IG54
@@ -970,7 +966,7 @@ G_M8563_IG54: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 4123, prolog size 77, PerfScore 865.01, instruction count 663, allocated bytes for code 4123 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
+; Total bytes of code 4107, prolog size 77, PerfScore 865.01, instruction count 659, allocated bytes for code 4111 (MethodHash=33a1de8c) for method VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
; ============================================================
Unwind Info:
libraries.pmi.windows.x64.checked.mch
-21 (-20.39%) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
@@ -17,7 +17,7 @@
; V06 tmp1 [V06,T05] ( 3, 3 ) simd64 -> mm1 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
; V07 tmp2 [V07,T06] ( 3, 3 ) simd64 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm0 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
; V11 cse0 [V11,T03] ( 5, 5 ) simd64 -> mm0 "CSE - aggressive"
; V12 cse1 [V12,T04] ( 4, 4 ) simd64 -> mm2 "CSE - aggressive"
@@ -34,25 +34,22 @@ G_M27576_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
vmovups zmm2, zmmword ptr [r8]
vmovaps zmm3, zmm2
vpcmpeqb k1, zmm1, zmm3
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm0, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm1, zmm3, 6
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm0, zmm2, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm0, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm1, zmm3, 6
+ vpblendmb zmm0 {k2}, zmm2, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M27576_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
ret
;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-21 (-20.39%) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
@@ -17,7 +17,7 @@
; V06 tmp1 [V06,T05] ( 3, 3 ) simd64 -> mm1 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
; V07 tmp2 [V07,T06] ( 3, 3 ) simd64 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V08 tmp3 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V09 tmp4 [V09,T07] ( 2, 2 ) simd64 -> mm0 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V10 tmp5 [V10 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
; V11 cse0 [V11,T03] ( 5, 5 ) simd64 -> mm2 "CSE - aggressive"
; V12 cse1 [V12,T04] ( 4, 4 ) simd64 -> mm0 "CSE - aggressive"
@@ -34,25 +34,22 @@ G_M10214_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
vmovups zmm2, zmmword ptr [r8]
vmovaps zmm3, zmm2
vpcmpeqb k1, zmm3, zmm1
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm2, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm3, zmm1, 1
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm2, zmm0, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm2, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm3, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm0, zmm2
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
ret
;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-21 (-20.39%) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
@@ -13,7 +13,7 @@
; V02 arg1 [V02,T01] ( 3, 6 ) byref -> r8 single-def
; V03 loc0 [V03,T05] ( 3, 3 ) simd64 -> mm1 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
; V04 loc1 [V04,T06] ( 3, 3 ) simd64 -> mm3 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T07] ( 2, 2 ) simd64 -> mm4 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T07] ( 2, 2 ) simd64 -> mm0 <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
@@ -34,25 +34,22 @@ G_M22834_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
vmovups zmm2, zmmword ptr [r8]
vmovaps zmm3, zmm2
vpcmpeqb k1, zmm1, zmm3
- vpmovm2b zmm4, k1
- vxorps ymm5, ymm5, ymm5
- vpcmpub k1, zmm0, zmm5, 1
- vpmovm2b zmm5, k1
- vpternlogd zmm5, zmm2, zmm0, -54
- vpcmpub k1, zmm1, zmm3, 6
- vpmovm2b zmm1, k1
- vpternlogd zmm1, zmm0, zmm2, -54
- vpternlogd zmm4, zmm5, zmm1, -54
- vmovups zmmword ptr [rcx], zmm4
+ vxorps ymm4, ymm4, ymm4
+ vpcmpub k2, zmm0, zmm4, 1
+ vpblendmb zmm4 {k2}, zmm0, zmm2
+ vpcmpub k2, zmm1, zmm3, 6
+ vpblendmb zmm0 {k2}, zmm2, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm4
+ vmovups zmmword ptr [rcx], zmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=96 bbWeight=1 PerfScore 25.08
+ ;; size=75 bbWeight=1 PerfScore 23.58
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
ret
;; size=4 bbWeight=1 PerfScore 2.00
-; Total bytes of code 103, prolog size 3, PerfScore 28.08, instruction count 19, allocated bytes for code 103 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 82, prolog size 3, PerfScore 26.58, instruction count 16, allocated bytes for code 82 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-7 (-5.65%) : 27696.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
@@ -35,14 +35,13 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpshufb ymm1, ymm2, ymm1
vpand ymm0, ymm0, ymmword ptr [reloc @RWD64]
vpcmpub k1, ymm0, ymmword ptr [reloc @RWD96], 6
- vpmovm2b ymm2, k1
- vmovups ymm3, ymmword ptr [r8]
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD128]
- vpshufb ymm3, ymm3, ymm4
- vmovups ymm4, ymmword ptr [rdx]
- vpshufb ymm0, ymm4, ymm0
- vpternlogd ymm2, ymm3, ymm0, -54
- vpand ymm0, ymm2, ymm1
+ vmovups ymm2, ymmword ptr [r8]
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD128]
+ vpshufb ymm2, ymm2, ymm3
+ vmovups ymm3, ymmword ptr [rdx]
+ vpshufb ymm0, ymm3, ymm0
+ vpblendmb ymm0 {k1}, ymm0, ymm2
+ vpand ymm0, ymm0, ymm1
vxorps ymm1, ymm1, ymm1
vpcmpeqb ymm0, ymm0, ymm1
vpcmpeqd ymm1, ymm1, ymm1
@@ -50,7 +49,7 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vmovups ymmword ptr [rcx], ymm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=117 bbWeight=1 PerfScore 42.75
+ ;; size=110 bbWeight=1 PerfScore 42.25
G_M53822_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
ret
@@ -62,7 +61,7 @@ RWD96 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD128 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 124, prolog size 3, PerfScore 45.75, instruction count 24, allocated bytes for code 124 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 117, prolog size 3, PerfScore 45.25, instruction count 23, allocated bytes for code 117 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
-7 (-4.58%) : 293786.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
@@ -39,9 +39,8 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpandd zmm3, zmm4, dword ptr [reloc @RWD128] {1to16}
vpord zmm4, zmm3, dword ptr [reloc @RWD132] {1to16}
vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1
- vpslld zmm5, zmm4, 1
- vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1
+ vpblendmd zmm2 {k1}, zmm2, zmm4
vpslld zmm0, zmm0, 13
vpandd zmm0, zmm0, dword ptr [reloc @RWD136] {1to16}
vpaddd zmm0, zmm0, zmm2
@@ -50,7 +49,7 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vmovups zmmword ptr [rcx], zmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=146 bbWeight=1 PerfScore 33.25
+ ;; size=139 bbWeight=1 PerfScore 32.25
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
ret
@@ -65,7 +64,7 @@ RWD132 dd 38000000h
RWD136 dd 0FFFE000h
-; Total bytes of code 153, prolog size 3, PerfScore 36.25, instruction count 24, allocated bytes for code 153 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 146, prolog size 3, PerfScore 35.25, instruction count 23, allocated bytes for code 146 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================
Unwind Info:
-28 (-2.50%) : 27702.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm1, ymm3, ymm1
vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm1, ymm1, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm2, ymm3, ymm2
vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm0, ymm0, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand ymm0, ymm1, ymm0
vptest ymm0, ymm0
je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpermq ymm0, ymm0, -40
vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm1, xmm10, xmm1
vpand xmm2, xmm2, xmm11
vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm2, xmm10, xmm2
vpand xmm0, xmm0, xmm11
vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm0, xmm0, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand xmm0, xmm1, xmm0
vptest xmm0, xmm0
je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpmovmskb ebp, xmm0
;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
libraries_tests.run.windows.x64.Release.mch
-14 (-16.67%) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
@@ -37,21 +37,19 @@ G_M11551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
vpcmpeqq xmm4, xmm1, xmm3
vxorps xmm5, xmm5, xmm5
vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4
mov rax, rcx
; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M11551_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================
Unwind Info:
-14 (-16.67%) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
@@ -36,21 +36,19 @@ G_M1813_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r8
vpcmpeqq xmm4, xmm1, xmm3
vxorps xmm5, xmm5, xmm5
vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4
mov rax, rcx
; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M1813_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=f881f8ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=f881f8ea) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================
Unwind Info:
-14 (-16.67%) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
@@ -37,21 +37,19 @@ G_M11551_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r
vpcmpeqq xmm4, xmm1, xmm3
vxorps xmm5, xmm5, xmm5
vpcmpuq k1, xmm0, xmm5, 1
- vpmovm2q xmm5, k1
- vpternlogq xmm5, xmm2, xmm0, -54
+ vpblendmq xmm5 {k1}, xmm0, xmm2
vpcmpuq k1, xmm1, xmm3, 6
- vpmovm2q xmm1, k1
- vpternlogq xmm1, xmm0, xmm2, -54
- vpternlogq xmm4, xmm5, xmm1, -54
+ vpblendmq xmm0 {k1}, xmm2, xmm0
+ vpternlogq xmm4, xmm5, xmm0, -54
vmovups xmmword ptr [rcx], xmm4
mov rax, rcx
; byrRegs +[rax]
- ;; size=80 bbWeight=1 PerfScore 21.08
+ ;; size=66 bbWeight=1 PerfScore 18.75
G_M11551_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 84, prolog size 3, PerfScore 23.08, instruction count 17, allocated bytes for code 84 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
+; Total bytes of code 70, prolog size 3, PerfScore 20.75, instruction count 15, allocated bytes for code 70 (MethodHash=e91fd2e0) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
; ============================================================
Unwind Info:
-7 (-2.52%) : 392581.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
@@ -59,16 +59,17 @@ G_M12395_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogq ymm1, ymm2, ymmword ptr [rcx], -54
vmovups ymm2, ymmword ptr [rbp-0x50]
vpcmpuq k1, ymm2, ymmword ptr [rbp-0x30], 1
- vpmovm2q ymm2, k1
mov rcx, bword ptr [rbp+0x20]
- vmovups ymm3, ymmword ptr [rcx]
- mov rcx, bword ptr [rbp+0x18]
- vpternlogq ymm2, ymm3, ymmword ptr [rcx], -54
+ mov rax, bword ptr [rbp+0x18]
+ ; byrRegs +[rax]
+ vmovups ymm2, ymmword ptr [rax]
+ vpblendmq ymm2 {k1}, ymm2, ymmword ptr [rcx]
vpternlogq ymm0, ymm1, ymm2, -54
vmovups ymmword ptr [rbp-0x70], ymm0
mov rcx, 0xD1FFAB1E
; byrRegs -[rcx]
call CORINFO_HELP_COUNTPROFILE32
+ ; byrRegs -[rax]
mov rcx, 0xD1FFAB1E
call CORINFO_HELP_COUNTPROFILE32
mov rax, bword ptr [rbp+0x10]
@@ -76,7 +77,7 @@ G_M12395_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rax], ymm0
mov rax, bword ptr [rbp+0x10]
- ;; size=200 bbWeight=1 PerfScore 77.00
+ ;; size=193 bbWeight=1 PerfScore 76.00
G_M12395_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
add rsp, 272
@@ -84,7 +85,7 @@ G_M12395_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 278, prolog size 54, PerfScore 95.83, instruction count 53, allocated bytes for code 278 (MethodHash=3d67cf94) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
+; Total bytes of code 271, prolog size 54, PerfScore 94.83, instruction count 52, allocated bytes for code 272 (MethodHash=3d67cf94) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
; ============================================================
Unwind Info:
-7 (-2.52%) : 392539.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
@@ -59,16 +59,17 @@ G_M63669_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogq ymm1, ymm2, ymmword ptr [rcx], -54
vmovups ymm2, ymmword ptr [rbp-0x30]
vpcmpuq k1, ymm2, ymmword ptr [rbp-0x50], 6
- vpmovm2q ymm2, k1
mov rcx, bword ptr [rbp+0x18]
- vmovups ymm3, ymmword ptr [rcx]
- mov rcx, bword ptr [rbp+0x20]
- vpternlogq ymm2, ymm3, ymmword ptr [rcx], -54
+ mov rax, bword ptr [rbp+0x20]
+ ; byrRegs +[rax]
+ vmovups ymm2, ymmword ptr [rax]
+ vpblendmq ymm2 {k1}, ymm2, ymmword ptr [rcx]
vpternlogq ymm0, ymm1, ymm2, -54
vmovups ymmword ptr [rbp-0x70], ymm0
mov rcx, 0xD1FFAB1E
; byrRegs -[rcx]
call CORINFO_HELP_COUNTPROFILE32
+ ; byrRegs -[rax]
mov rcx, 0xD1FFAB1E
call CORINFO_HELP_COUNTPROFILE32
mov rax, bword ptr [rbp+0x10]
@@ -76,7 +77,7 @@ G_M63669_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vmovups ymm0, ymmword ptr [rbp-0x70]
vmovups ymmword ptr [rax], ymm0
mov rax, bword ptr [rbp+0x10]
- ;; size=200 bbWeight=1 PerfScore 77.00
+ ;; size=193 bbWeight=1 PerfScore 76.00
G_M63669_IG03: ; bbWeight=1, epilog, nogc, extend
vzeroupper
add rsp, 272
@@ -84,7 +85,7 @@ G_M63669_IG03: ; bbWeight=1, epilog, nogc, extend
ret
;; size=12 bbWeight=1 PerfScore 2.75
-; Total bytes of code 278, prolog size 54, PerfScore 95.83, instruction count 53, allocated bytes for code 278 (MethodHash=dc23074a) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
+; Total bytes of code 271, prolog size 54, PerfScore 94.83, instruction count 52, allocated bytes for code 272 (MethodHash=dc23074a) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Instrumented Tier0)
; ============================================================
Unwind Info:
-28 (-2.40%) : 342446.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
@@ -172,12 +172,11 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
vpshufb ymm1, ymm3, ymm1
vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm1, ymm1, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -188,12 +187,11 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
vpshufb ymm2, ymm3, ymm2
vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm0, ymm0, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -201,7 +199,7 @@ G_M48875_IG04: ; bbWeight=4.37, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
vpand ymm10, ymm1, ymm0
vptest ymm10, ymm10
jne SHORT G_M48875_IG07
- ;; size=248 bbWeight=4.37 PerfScore 346.74
+ ;; size=234 bbWeight=4.37 PerfScore 342.37
G_M48875_IG05: ; bbWeight=3.45, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
add r15, 64
cmp r15, rsi
@@ -318,12 +316,11 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
vpshufb xmm1, xmm3, xmm1
vpand xmm2, xmm2, xmmword ptr [reloc @RWD96]
vpcmpub k1, xmm2, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmmword ptr [reloc @RWD160]
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmmword ptr [reloc @RWD160]
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -334,12 +331,11 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
vpshufb xmm2, xmm3, xmm2
vpand xmm0, xmm0, xmmword ptr [reloc @RWD96]
vpcmpub k1, xmm0, xmmword ptr [reloc @RWD128], 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmmword ptr [reloc @RWD160]
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmmword ptr [reloc @RWD160]
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm0, xmm0, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -347,7 +343,7 @@ G_M48875_IG17: ; bbWeight=0.08, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rs
vpand xmm0, xmm1, xmm0
vptest xmm0, xmm0
je SHORT G_M48875_IG21
- ;; size=248 bbWeight=0.08 PerfScore 5.05
+ ;; size=234 bbWeight=0.08 PerfScore 4.97
G_M48875_IG18: ; bbWeight=0.07, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpmovmskb ebp, xmm0
;; size=4 bbWeight=0.07 PerfScore 0.14
@@ -444,7 +440,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1165, prolog size 86, PerfScore 510.92, instruction count 247, allocated bytes for code 1165 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
+; Total bytes of code 1137, prolog size 86, PerfScore 506.47, instruction count 243, allocated bytes for code 1137 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
; ============================================================
Unwind Info:
librariestestsnotieredcompilation.run.windows.x64.Release.mch
-28 (-2.50%) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -186,12 +186,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm1, ymm3, ymm1
vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm1, ymm1, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -202,12 +201,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm2, ymm3, ymm2
vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm0, ymm0, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -215,7 +213,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand ymm0, ymm1, ymm0
vptest ymm0, ymm0
je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpermq ymm0, ymm0, -40
vpmovmskb ebp, ymm0
@@ -338,12 +336,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm1, xmm10, xmm1
vpand xmm2, xmm2, xmm11
vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -353,12 +350,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm2, xmm10, xmm2
vpand xmm0, xmm0, xmm11
vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm0, xmm0, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -366,7 +362,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand xmm0, xmm1, xmm0
vptest xmm0, xmm0
je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpmovmskb ebp, xmm0
;; size=4 bbWeight=2 PerfScore 4.00
@@ -433,7 +429,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1118, prolog size 86, PerfScore 1254.58, instruction count 240, allocated bytes for code 1118 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1090, prolog size 86, PerfScore 1246.58, instruction count 236, allocated bytes for code 1090 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
-17 (-1.75%) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
@@ -15,7 +15,7 @@
;* V04 loc3 [V04 ] ( 0, 0 ) simd32 -> zero-ref <System.Numerics.Vector`1[uint]>
; V05 loc4 [V05,T16] ( 2, 2 ) simd32 -> mm8 <System.Numerics.Vector`1[uint]>
;* V06 loc5 [V06 ] ( 0, 0 ) simd32 -> zero-ref <System.Numerics.Vector`1[uint]>
-; V07 loc6 [V07,T17] ( 2, 2 ) simd32 -> mm8 <System.Numerics.Vector`1[uint]>
+; V07 loc6 [V07,T17] ( 2, 2 ) simd32 -> mm6 <System.Numerics.Vector`1[uint]>
;* V08 loc7 [V08 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op <System.Nullable`1[int]>
; V09 OutArgs [V09 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V10 tmp1 [V10 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "NewObj constructor temp" <System.Numerics.Tests.GenericVectorTests+<>c__DisplayClass670_0`1[uint]>
@@ -26,7 +26,7 @@
;* V15 tmp6 [V15,T08] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
; V16 tmp7 [V16,T12] ( 9, 18 ) simd32 -> mm8 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
;* V17 tmp8 [V17,T09] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V18 tmp9 [V18,T13] ( 9, 18 ) simd32 -> mm8 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
+; V18 tmp9 [V18,T13] ( 9, 18 ) simd32 -> mm6 ld-addr-op "Inlining Arg" <System.Numerics.Vector`1[uint]>
;* V19 tmp10 [V19,T10] ( 0, 0 ) ubyte -> zero-ref single-def "field V08.hasValue (fldOffset=0x0)" P-INDEP
;* V20 tmp11 [V20,T11] ( 0, 0 ) int -> zero-ref single-def "field V08.value (fldOffset=0x4)" P-INDEP
; V21 tmp12 [V21,T02] ( 4, 8 ) struct ( 8) [rsp+0x20] do-not-enreg[SF] "by-value struct argument" <System.Nullable`1[int]>
@@ -104,8 +104,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
jl G_M21446_IG06
vmovups ymm7, ymmword ptr [rcx+0x10]
vpcmpud k1, ymm6, ymm7, 6
- vpmovm2d ymm8, k1
- vpternlogd ymm8, ymm6, ymm7, -54
+ vpblendmd ymm8 {k1}, ymm7, ymm6
mov rcx, 0xD1FFAB1E ; System.Action`2[int,uint]
; gcrRegs -[rcx]
vextractf128 xmm9, ymm6, 1
@@ -151,7 +150,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
vextractf128 xmm12, ymm8, 1
- ;; size=312 bbWeight=1 PerfScore 88.00
+ ;; size=305 bbWeight=1 PerfScore 86.83
G_M21446_IG03: ; bbWeight=1, extend
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
@@ -208,10 +207,9 @@ G_M21446_IG03: ; bbWeight=1, extend
vinsertf128 ymm6, ymm6, xmm9, 1
vinsertf128 ymm7, ymm7, xmm10, 1
vpcmpud k1, ymm6, ymm7, 2
- vpmovm2d ymm8, k1
- vpternlogd ymm8, ymm6, ymm7, -54
+ vpblendmd ymm6 {k1}, ymm7, ymm6
mov rcx, 0xD1FFAB1E ; System.Action`2[int,uint]
- vextractf128 xmm6, ymm8, 1
+ vextractf128 xmm7, ymm6, 1
call CORINFO_HELP_NEWSFAST
; gcrRegs +[rax]
; gcr arg pop 0
@@ -226,79 +224,79 @@ G_M21446_IG03: ; bbWeight=1, extend
; byrRegs -[rcx]
mov r8, 0xD1FFAB1E ; code for <unknown method>
mov qword ptr [rsi+0x18], r8
- vinsertf128 ymm8, ymm8, xmm6, 1
- vmovd r8d, xmm8
+ vinsertf128 ymm6, ymm6, xmm7, 1
+ vmovd r8d, xmm6
xor edx, edx
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vmovaps ymm0, ymm8
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vmovaps ymm0, ymm6
vpextrd r8d, xmm0, 1
mov edx, 1
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vmovaps ymm0, ymm8
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vmovaps ymm0, ymm6
vpextrd r8d, xmm0, 2
mov edx, 2
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- ;; size=353 bbWeight=1 PerfScore 120.75
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ ;; size=350 bbWeight=1 PerfScore 121.58
G_M21446_IG04: ; bbWeight=1, extend
- vinsertf128 ymm8, ymm8, xmm7, 1
- vmovaps ymm0, ymm8
+ vmovaps ymm0, ymm6
vpextrd r8d, xmm0, 3
mov edx, 3
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
vmovd r8d, xmm0
mov edx, 4
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
vpextrd r8d, xmm0, 1
mov edx, 5
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
vpextrd r8d, xmm0, 2
mov edx, 6
mov rcx, gword ptr [rsi+0x08]
; gcrRegs +[rcx]
- vextractf128 xmm7, ymm8, 1
+ vextractf128 xmm8, ymm6, 1
call [rsi+0x18]<unknown method>
; gcrRegs -[rcx]
; gcr arg pop 0
- vinsertf128 ymm8, ymm8, xmm7, 1
- vextracti128 xmm0, ymm8, 1
+ vinsertf128 ymm6, ymm6, xmm8, 1
+ vextracti128 xmm0, ymm6, 1
vpextrd r8d, xmm0, 3
mov edx, 7
mov rcx, gword ptr [rsi+0x08]
@@ -307,7 +305,7 @@ G_M21446_IG04: ; bbWeight=1, extend
; gcrRegs -[rcx rsi]
; gcr arg pop 0
nop
- ;; size=173 bbWeight=1 PerfScore 66.75
+ ;; size=166 bbWeight=1 PerfScore 64.75
G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend
vmovaps xmm6, xmmword ptr [rsp+0x90]
vmovaps xmm7, xmmword ptr [rsp+0x80]
@@ -328,7 +326,7 @@ G_M21446_IG06: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
int3
;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 973, prolog size 67, PerfScore 325.25, instruction count 194, allocated bytes for code 973 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 956, prolog size 67, PerfScore 322.92, instruction count 192, allocated bytes for code 956 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================
Unwind Info:
smoke_tests.nativeaot.windows.x64.checked.mch
-28 (-2.52%) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
@@ -185,12 +185,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm1, ymm3, ymm1
vpand ymm2, ymm2, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm2, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm2, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm2, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm2, ymm8, ymm2
- vpternlogd ymm3, ymm4, ymm2, -54
- vpand ymm1, ymm3, ymm1
+ vpblendmb ymm2 {k1}, ymm2, ymm3
+ vpand ymm1, ymm2, ymm1
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm1, ymm1, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -201,12 +200,11 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb ymm2, ymm3, ymm2
vpand ymm0, ymm0, ymmword ptr [reloc @RWD96]
vpcmpub k1, ymm0, ymmword ptr [reloc @RWD128], 6
- vpmovm2b ymm3, k1
- vpsubb ymm4, ymm0, ymmword ptr [reloc @RWD160]
- vpshufb ymm4, ymm9, ymm4
+ vpsubb ymm3, ymm0, ymmword ptr [reloc @RWD160]
+ vpshufb ymm3, ymm9, ymm3
vpshufb ymm0, ymm8, ymm0
- vpternlogd ymm3, ymm4, ymm0, -54
- vpand ymm0, ymm3, ymm2
+ vpblendmb ymm0 {k1}, ymm0, ymm3
+ vpand ymm0, ymm0, ymm2
vxorps ymm2, ymm2, ymm2
vpcmpeqb ymm0, ymm0, ymm2
vpcmpeqd ymm2, ymm2, ymm2
@@ -214,7 +212,7 @@ G_M48875_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand ymm0, ymm1, ymm0
vptest ymm0, ymm0
je SHORT G_M48875_IG09
- ;; size=248 bbWeight=4 PerfScore 317.33
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpermq ymm0, ymm0, -40
vpmovmskb ebp, ymm0
@@ -337,12 +335,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm1, xmm10, xmm1
vpand xmm2, xmm2, xmm11
vpcmpub k1, xmm2, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm2, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm2, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm2, xmm6, xmm2
- vpternlogd xmm3, xmm4, xmm2, -54
- vpand xmm1, xmm3, xmm1
+ vpblendmb xmm2 {k1}, xmm2, xmm3
+ vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -352,12 +349,11 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpshufb xmm2, xmm10, xmm2
vpand xmm0, xmm0, xmm11
vpcmpub k1, xmm0, xmm12, 6
- vpmovm2b xmm3, k1
- vpsubb xmm4, xmm0, xmm13
- vpshufb xmm4, xmm7, xmm4
+ vpsubb xmm3, xmm0, xmm13
+ vpshufb xmm3, xmm7, xmm3
vpshufb xmm0, xmm6, xmm0
- vpternlogd xmm3, xmm4, xmm0, -54
- vpand xmm0, xmm3, xmm2
+ vpblendmb xmm0 {k1}, xmm0, xmm3
+ vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm0, xmm0, xmm2
vpcmpeqd xmm2, xmm2, xmm2
@@ -365,7 +361,7 @@ G_M48875_IG16: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi r
vpand xmm0, xmm1, xmm0
vptest xmm0, xmm0
je SHORT G_M48875_IG19
- ;; size=200 bbWeight=4 PerfScore 168.00
+ ;; size=186 bbWeight=4 PerfScore 164.00
G_M48875_IG17: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=C0C8 {rbx rsi rdi r14 r15}, byref
vpmovmskb ebp, xmm0
;; size=4 bbWeight=2 PerfScore 4.00
@@ -432,7 +428,7 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F
RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 1109, prolog size 86, PerfScore 1189.83, instruction count 240, allocated bytes for code 1109 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 1081, prolog size 86, PerfScore 1181.83, instruction count 236, allocated bytes for code 1081 (MethodHash=36e94114) for method System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
; ============================================================
Unwind Info:
Details
Improvements/regressions per collection
| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 16 | 16 | 0 | 0 | -528 | +0 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 24 | 24 | 0 | 0 | -499 | +0 |
| libraries_tests.run.windows.x64.Release.mch | 29 | 29 | 0 | 0 | -1,014 | +0 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 7 | 7 | 0 | 0 | -759 | +0 |
| realworld.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -28 | +0 |
| | 78 | 78 | 0 | 0 | -2,856 | +0 |
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 28,086 | 4 | 28,082 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 101,718 | 49,794 | 51,924 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 54,385 | 36,847 | 17,538 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 573,989 | 340,983 | 233,006 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 243,425 | 15 | 243,410 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 308,498 | 6 | 308,492 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 673,287 | 479,208 | 194,079 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 320,511 | 21,885 | 298,626 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 36,890 | 3 | 36,887 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 32,412 | 11 | 32,401 | 0 (0.00%) | 0 (0.00%) |
| | 2,373,201 | 928,756 | 1,444,445 | 0 (0.00%) | 0 (0.00%) |
jit-analyze output
benchmarks.run.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 8749502 (overridden on cmd)
Total bytes of diff: 8749474 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-28 : 24358.dasm (-2.50 % of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-28 (-2.50 % of base) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
Top method improvements (percentages):
-28 (-2.50 % of base) : 24358.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
1 total methods with Code Size differences (1 improved, 0 regressed).
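The byte deltas and percentages in these summaries appear to be derived from base and diff totals as "diff minus base" and "delta over base". The sketch below is a minimal illustration of that arithmetic using figures quoted in this report. The function names are hypothetical, and the two-decimal formatting is an assumption about how the tool prints, not taken from its source.

```python
# Illustrative arithmetic only; function names are hypothetical.

def size_delta(base_bytes, diff_bytes):
    """Return (delta in bytes, delta as a percentage of base)."""
    delta = diff_bytes - base_bytes
    return delta, 100.0 * delta / base_bytes

def format_delta(base_bytes, diff_bytes):
    """Format a delta in the style used by the summaries above."""
    delta, pct = size_delta(base_bytes, diff_bytes)
    return f"{delta} ({pct:.2f} % of base)"

# Collection totals from the benchmarks.run summary above.
print(format_delta(8749502, 8749474))

# Per-method totals for IndexOfAnyVectorized (FullOpts) as quoted earlier
# in this report: 1118 bytes of code in base, 1090 in diff.
print(format_delta(1118, 1090))
```

A negative delta means the diff produced less code than the base, which is why every entry here is reported as an improvement.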
coreclr_tests.run.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 393893406 (overridden on cmd)
Total bytes of diff: 393892878 (overridden on cmd)
Total bytes of delta: -528 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-56 : 174838.dasm (-2.83 % of base)
-56 : 174841.dasm (-2.86 % of base)
-56 : 174837.dasm (-2.83 % of base)
-56 : 174842.dasm (-2.82 % of base)
-32 : 429090.dasm (-0.77 % of base)
-32 : 429091.dasm (-0.77 % of base)
-32 : 429094.dasm (-0.78 % of base)
-32 : 429095.dasm (-0.77 % of base)
-28 : 174840.dasm (-1.42 % of base)
-28 : 174836.dasm (-1.43 % of base)
-28 : 174835.dasm (-1.46 % of base)
-28 : 174839.dasm (-1.42 % of base)
-16 : 429088.dasm (-0.39 % of base)
-16 : 429092.dasm (-0.39 % of base)
-16 : 429093.dasm (-0.39 % of base)
-16 : 429089.dasm (-0.39 % of base)
16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-56 (-2.83 % of base) : 174838.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
-56 (-2.86 % of base) : 174841.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
-56 (-2.82 % of base) : 174842.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
-56 (-2.83 % of base) : 174837.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
-32 (-0.77 % of base) : 429091.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
-32 (-0.78 % of base) : 429094.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
-32 (-0.77 % of base) : 429095.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
-32 (-0.77 % of base) : 429090.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
-28 (-1.42 % of base) : 174840.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
-28 (-1.46 % of base) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
-28 (-1.43 % of base) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
-28 (-1.42 % of base) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
-16 (-0.39 % of base) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429088.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
Top method improvements (percentages):
-56 (-2.86 % of base) : 174841.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
-56 (-2.83 % of base) : 174838.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
-56 (-2.83 % of base) : 174837.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
-56 (-2.82 % of base) : 174842.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
-28 (-1.46 % of base) : 174835.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
-28 (-1.43 % of base) : 174836.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
-28 (-1.42 % of base) : 174840.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
-28 (-1.42 % of base) : 174839.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
-32 (-0.78 % of base) : 429094.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Instrumented Tier0)
-32 (-0.77 % of base) : 429095.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Instrumented Tier0)
-32 (-0.77 % of base) : 429091.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Instrumented Tier0)
-32 (-0.77 % of base) : 429090.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429088.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429089.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429093.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Instrumented Tier0)
-16 (-0.39 % of base) : 429092.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Instrumented Tier0)
16 total methods with Code Size differences (16 improved, 0 regressed).
libraries.pmi.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 61525850 (overridden on cmd)
Total bytes of diff: 61525351 (overridden on cmd)
Total bytes of delta: -499 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-56 : 293984.dasm (-20.66 % of base)
-56 : 294041.dasm (-20.66 % of base)
-32 : 27695.dasm (-10.85 % of base)
-28 : 27702.dasm (-2.50 % of base)
-28 : 293986.dasm (-16.28 % of base)
-28 : 294043.dasm (-16.28 % of base)
-26 : 27697.dasm (-8.78 % of base)
-21 : 294040.dasm (-20.39 % of base)
-21 : 294062.dasm (-20.39 % of base)
-21 : 293983.dasm (-20.39 % of base)
-21 : 294005.dasm (-20.39 % of base)
-14 : 293982.dasm (-16.28 % of base)
-14 : 294004.dasm (-16.28 % of base)
-14 : 294038.dasm (-16.87 % of base)
-14 : 294042.dasm (-13.73 % of base)
-14 : 294061.dasm (-16.28 % of base)
-14 : 293981.dasm (-16.87 % of base)
-14 : 293985.dasm (-13.73 % of base)
-14 : 294003.dasm (-16.87 % of base)
-14 : 294039.dasm (-16.28 % of base)
24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-56 (-20.66 % of base) : 293984.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-56 (-20.66 % of base) : 294041.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-32 (-10.85 % of base) : 27695.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-28 (-2.50 % of base) : 27702.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-28 (-16.28 % of base) : 293986.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-28 (-16.28 % of base) : 294043.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-26 (-8.78 % of base) : 27697.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 294040.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 293981.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-13.73 % of base) : 293985.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-14 (-16.28 % of base) : 293982.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 294003.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-16.28 % of base) : 294004.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 294038.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-13.73 % of base) : 294042.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-14 (-16.28 % of base) : 294039.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 294060.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
Top method improvements (percentages):
-56 (-20.66 % of base) : 293984.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-56 (-20.66 % of base) : 294041.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-21 (-20.39 % of base) : 293983.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 294005.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 294040.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-21 (-20.39 % of base) : 294062.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 293981.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 294003.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 294038.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-16.87 % of base) : 294060.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
-14 (-16.28 % of base) : 293982.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-28 (-16.28 % of base) : 293986.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-14 (-16.28 % of base) : 294004.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-16.28 % of base) : 294039.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-28 (-16.28 % of base) : 294043.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
-14 (-16.28 % of base) : 294061.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-14 (-13.73 % of base) : 293985.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-14 (-13.73 % of base) : 294042.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
-32 (-10.85 % of base) : 27695.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
-26 (-8.78 % of base) : 27697.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
24 total methods with Code Size differences (24 improved, 0 regressed).
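The per-method entries above share one layout: byte delta, percentage of base, .dasm file, method signature, and compilation tier. A minimal sketch of pulling those fields out of a single entry follows. The regular expression and the `parse_entry` helper are assumptions inferred from the lines in this report; they are not part of superpmi.py.

```python
import re

# Hypothetical pattern for one "Top method improvements" entry; inferred from
# the lines in this report, not taken from superpmi.py.
ENTRY_RE = re.compile(
    r"^(?P<delta>-?\d+) \((?P<pct>-?\d+\.\d+) % of base\) : "
    r"(?P<file>\d+\.dasm) - (?P<method>.+) \((?P<tier>[^()]+)\)$"
)


def parse_entry(line):
    """Return the fields of one entry as a dict, or None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "delta_bytes": int(match["delta"]),
        "percent": float(match["pct"]),
        "file": match["file"],
        "method": match["method"],
        "tier": match["tier"],
    }


# Entry copied from the libraries.pmi.windows.x64.checked.mch list above.
sample = (
    "-32 (-10.85 % of base) : 27695.dasm - "
    "System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2("
    "System.Runtime.Intrinsics.Vector256`1[ubyte],"
    "System.Runtime.Intrinsics.Vector256`1[ubyte],byref):"
    "System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)"
)
entry = parse_entry(sample)
print(entry["delta_bytes"], entry["file"], entry["tier"])
```

The tier is taken as the last parenthesised group on the line, so the parentheses inside the method signature stay part of the method name. Lines in the "Top file improvements" lists use a different layout and return None here.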
libraries_tests.run.windows.x64.Release.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 279744051 (overridden on cmd)
Total bytes of diff: 279743037 (overridden on cmd)
Total bytes of delta: -1014 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-368 : 393123.dasm (-15.51 % of base)
-84 : 386605.dasm (-10.98 % of base)
-84 : 393118.dasm (-10.81 % of base)
-84 : 393473.dasm (-11.02 % of base)
-84 : 386805.dasm (-10.98 % of base)
-32 : 342443.dasm (-10.85 % of base)
-28 : 342446.dasm (-2.40 % of base)
-26 : 342451.dasm (-8.78 % of base)
-14 : 385681.dasm (-16.09 % of base)
-14 : 385680.dasm (-16.09 % of base)
-14 : 385965.dasm (-16.09 % of base)
-14 : 393121.dasm (-16.09 % of base)
-14 : 393190.dasm (-16.67 % of base)
-14 : 393191.dasm (-16.67 % of base)
-14 : 393192.dasm (-16.09 % of base)
-14 : 393193.dasm (-16.09 % of base)
-14 : 385964.dasm (-16.67 % of base)
-14 : 386233.dasm (-16.67 % of base)
-14 : 393122.dasm (-16.09 % of base)
-7 : 392275.dasm (-3.27 % of base)
29 total files with Code Size differences (29 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-368 (-15.51 % of base) : 393123.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-84 (-10.98 % of base) : 386605.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-84 (-11.02 % of base) : 393473.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-84 (-10.81 % of base) : 393118.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-84 (-10.98 % of base) : 386805.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-32 (-10.85 % of base) : 342443.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
-28 (-2.40 % of base) : 342446.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier1)
-26 (-8.78 % of base) : 342451.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
-14 (-16.67 % of base) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.67 % of base) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.09 % of base) : 385681.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393193.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.67 % of base) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.09 % of base) : 385680.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393192.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.67 % of base) : 385964.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.09 % of base) : 385965.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393122.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393121.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-7 (-5.79 % of base) : 342450.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
Top method improvements (percentages):
-14 (-16.67 % of base) : 393191.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.67 % of base) : 386233.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.67 % of base) : 393190.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.67 % of base) : 385964.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector128`1[ulong],System.Runtime.Intrinsics.Vector128`1[ulong]):System.Runtime.Intrinsics.Vector128`1[ulong] (Tier1)
-14 (-16.09 % of base) : 385681.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393193.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 385680.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393192.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 385965.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393122.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-14 (-16.09 % of base) : 393121.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]:Invoke(System.Runtime.Intrinsics.Vector256`1[ulong],System.Runtime.Intrinsics.Vector256`1[ulong]):System.Runtime.Intrinsics.Vector256`1[ulong] (Tier1)
-368 (-15.51 % of base) : 393123.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|220_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
-84 (-11.02 % of base) : 393473.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-84 (-10.98 % of base) : 386605.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-84 (-10.98 % of base) : 386805.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-32 (-10.85 % of base) : 342443.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
-84 (-10.81 % of base) : 393118.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](System.ReadOnlySpan`1[ulong],System.ReadOnlySpan`1[ulong],System.Span`1[ulong]) (Tier1)
-26 (-8.78 % of base) : 342451.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
-7 (-5.79 % of base) : 342450.dasm - System.Buffers.ProbabilisticMap:IsCharBitSet(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
-7 (-5.65 % of base) : 342442.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (Tier1)
29 total methods with Code Size differences (29 improved, 0 regressed).
librariestestsnotieredcompilation.run.windows.x64.Release.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 137525226 (overridden on cmd)
Total bytes of diff: 137524467 (overridden on cmd)
Total bytes of delta: -759 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-182 : 168894.dasm (-17.50 % of base)
-182 : 168616.dasm (-17.50 % of base)
-154 : 169032.dasm (-17.09 % of base)
-98 : 168947.dasm (-15.15 % of base)
-98 : 168825.dasm (-15.15 % of base)
-28 : 150728.dasm (-2.50 % of base)
-17 : 169934.dasm (-1.75 % of base)
7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-182 (-17.50 % of base) : 168894.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-182 (-17.50 % of base) : 168616.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-154 (-17.09 % of base) : 169032.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
-98 (-15.15 % of base) : 168947.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-98 (-15.15 % of base) : 168825.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
Top method improvements (percentages):
-182 (-17.50 % of base) : 168894.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-182 (-17.50 % of base) : 168616.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
-154 (-17.09 % of base) : 169032.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
-98 (-15.15 % of base) : 168947.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-98 (-15.15 % of base) : 168825.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
-28 (-2.50 % of base) : 150728.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
-17 (-1.75 % of base) : 169934.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
7 total methods with Code Size differences (7 improved, 0 regressed).
smoke_tests.nativeaot.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 5089881 (overridden on cmd)
Total bytes of diff: 5089853 (overridden on cmd)
Total bytes of delta: -28 (-0.00 % of base)
diff is an improvement.
relative diff is an improvement.
Detail diffs
Top file improvements (bytes):
-28 : 19903.dasm (-2.52 % of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-28 (-2.52 % of base) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
Top method improvements (percentages):
-28 (-2.52 % of base) : 19903.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
1 total methods with Code Size differences (1 improved, 0 regressed).
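Each "Summary of Code Size diffs" block reports a base total, a diff total, and a delta. The sketch below recomputes the delta and its percentage of base from the totals printed above for the four Windows x64 collections in this part of the report. The `SUMMARIES` table and the `code_size_delta` helper are illustrative names, not superpmi.py APIs.

```python
# Totals copied from the "Summary of Code Size diffs" blocks above:
# collection name -> (total bytes of base, total bytes of diff).
SUMMARIES = {
    "libraries.pmi.windows.x64.checked.mch": (61525850, 61525351),
    "libraries_tests.run.windows.x64.Release.mch": (279744051, 279743037),
    "librariestestsnotieredcompilation.run.windows.x64.Release.mch": (137525226, 137524467),
    "smoke_tests.nativeaot.windows.x64.checked.mch": (5089881, 5089853),
}


def code_size_delta(base, diff):
    """Return (delta in bytes, delta as a percentage of base); negative means smaller code."""
    delta = diff - base
    percent = 100.0 * delta / base
    return delta, percent


for name, (base, diff) in SUMMARIES.items():
    delta, percent = code_size_delta(base, diff)
    print(f"{name}: {delta} bytes ({percent:.2f} % of base)")
```

Every delta here is well under 0.005 % of its base in magnitude, which is why the summaries print "-0.00 % of base" even though each collection shrinks.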