Assembly Diffs
osx arm64
Diffs are based on 2,029,386 contexts (927,368 MinOpts, 1,102,018 FullOpts).
MISSED contexts: 109 (0.01%)
No diffs found.
Details
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.osx.arm64.checked.mch | 24,861 | 5 | 24,856 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.osx.arm64.checked.mch | 84,163 | 48,254 | 35,909 | 13 (0.02%) | 13 (0.02%) |
| benchmarks.run_tiered.osx.arm64.checked.mch | 48,057 | 37,339 | 10,718 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.osx.arm64.checked.mch | 584,881 | 356,502 | 228,379 | 7 (0.00%) | 7 (0.00%) |
| libraries.crossgen2.osx.arm64.checked.mch | 1,881 | 0 | 1,881 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.osx.arm64.checked.mch | 316,291 | 18 | 316,273 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.osx.arm64.Release.mch | 634,566 | 463,650 | 170,916 | 83 (0.01%) | 83 (0.01%) |
| librariestestsnotieredcompilation.run.osx.arm64.Release.mch | 303,144 | 21,597 | 281,547 | 2 (0.00%) | 2 (0.00%) |
| realworld.run.osx.arm64.checked.mch | 31,542 | 3 | 31,539 | 1 (0.00%) | 1 (0.00%) |
| | 2,029,386 | 927,368 | 1,102,018 | 109 (0.01%) | 109 (0.01%) |
windows arm64
Diffs are based on 2,070,850 contexts (937,853 MinOpts, 1,132,997 FullOpts).
MISSED contexts: 139 (0.01%)
No diffs found.
Details
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.arm64.checked.mch | 24,455 | 4 | 24,451 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.arm64.checked.mch | 97,527 | 48,627 | 48,900 | 13 (0.01%) | 13 (0.01%) |
| benchmarks.run_tiered.windows.arm64.checked.mch | 49,174 | 36,718 | 12,456 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.arm64.checked.mch | 595,172 | 362,437 | 232,735 | 11 (0.00%) | 11 (0.00%) |
| libraries.crossgen2.windows.arm64.checked.mch | 2,130 | 0 | 2,130 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.arm64.checked.mch | 305,519 | 6 | 305,513 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.windows.arm64.Release.mch | 646,533 | 468,460 | 178,073 | 107 (0.02%) | 107 (0.02%) |
| librariestestsnotieredcompilation.run.windows.arm64.Release.mch | 317,022 | 21,598 | 295,424 | 4 (0.00%) | 4 (0.00%) |
| realworld.run.windows.arm64.checked.mch | 33,241 | 3 | 33,238 | 1 (0.00%) | 1 (0.00%) |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 77 | 0 | 77 | 0 (0.00%) | 0 (0.00%) |
| | 2,070,850 | 937,853 | 1,132,997 | 139 (0.01%) | 139 (0.01%) |
windows x64
Diffs are based on 2,098,432 contexts (926,221 MinOpts, 1,172,211 FullOpts).
MISSED contexts: 138 (0.01%)
Overall (-151 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 8,730,756 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 35,773,696 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 12,546,772 | +0 |
| libraries.pmi.windows.x64.checked.mch | 61,645,293 | -16 |
| libraries_tests.run.windows.x64.Release.mch | 278,809,463 | +2 |
| realworld.run.windows.x64.checked.mch | 13,946,185 | -137 |
FullOpts (-151 bytes)
| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 8,730,393 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 21,741,615 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,451,035 | +0 |
| libraries.pmi.windows.x64.checked.mch | 61,531,772 | -16 |
| libraries_tests.run.windows.x64.Release.mch | 106,634,847 | +2 |
| realworld.run.windows.x64.checked.mch | 13,559,576 | -137 |
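
As a quick sanity check on the windows x64 summary, the per-collection byte deltas in the size table can be summed and compared with the reported overall figure. The snippet below is only an illustrative sketch: the name `diff_bytes` is hypothetical, and the numbers are copied from the table rather than read from any SuperPMI output file.

```python
# Per-collection size deltas (bytes), copied by hand from the windows x64 table.
# `diff_bytes` is a hypothetical name used only for this illustration.
diff_bytes = {
    "benchmarks.run.windows.x64.checked.mch": 0,
    "benchmarks.run_pgo.windows.x64.checked.mch": 0,
    "benchmarks.run_tiered.windows.x64.checked.mch": 0,
    "libraries.pmi.windows.x64.checked.mch": -16,
    "libraries_tests.run.windows.x64.Release.mch": 2,
    "realworld.run.windows.x64.checked.mch": -137,
}

# Sum the deltas and print them in the same signed style as the report.
total = sum(diff_bytes.values())
print(f"Overall: {total:+d} bytes")
```

The Overall and FullOpts tables list the same deltas, which is consistent with all of the size changes coming from FullOpts contexts.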
Example diffs
benchmarks.run.windows.x64.checked.mch
+0 (0.00%) : 16504.dasm - Algorithms.VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (FullOpts)
@@ -220,8 +220,8 @@ G_M3972_IG07: ; bbWeight=128, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=000
vaddps ymm5, ymm5, ymm16
vcmpps ymm5, ymm5, ymm10, 2
vpcmpd k1, ymm6, ymm7, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm5, ymm9, ymm4, -128
+ vpmovm2d ymm16, k1
+ vpternlogd ymm5, ymm16, ymm4, -128
vmovaps ymm4, ymm5
vptest ymm4, ymm4
vmovups ymm1, ymmword ptr [rsp+0x20]
benchmarks.run_pgo.windows.x64.checked.mch
+0 (0.00%) : 31047.dasm - System.Text.Ascii:EqualsIgnoreCase[ushort,ushort,System.Text.Ascii+PlainLoader`1[ushort]](byref,byref,ulong):ubyte (Tier1)
@@ -160,8 +160,8 @@ G_M2558_IG04: ; bbWeight=0.95, gcrefRegs=0000 {}, byrefRegs=0107 {rax rcx
vpor xmm4, xmm4, xmm0
vpor xmm5, xmm5, xmm0
vpsubw xmm16, xmm4, xmm1
- vpandd xmm6, xmm16, xmm6
- vpcmpuw k1, xmm6, xmm2, 6
+ vpandd xmm16, xmm16, xmm6
+ vpcmpuw k1, xmm16, xmm2, 6
kortestb k1, k1
setne r10b
movzx r10, r10b
benchmarks.run_tiered.windows.x64.checked.mch
+0 (0.00%) : 32432.dasm - Algorithms.VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (Tier1-OSR)
@@ -214,8 +214,8 @@ G_M57953_IG09: ; bbWeight=64, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=000
vaddpd ymm5, ymm5, ymm16
vcmppd ymm5, ymm5, ymm9, 2
vpcmpq k1, ymm6, ymm10, 2
- vpmovm2q ymm3, k1
- vpternlogq ymm5, ymm3, ymm2, -128
+ vpmovm2q ymm16, k1
+ vpternlogq ymm5, ymm16, ymm2, -128
vmovaps ymm2, ymm5
vptest ymm2, ymm2
jne SHORT G_M57953_IG09
libraries.pmi.windows.x64.checked.mch
-16 (-5.93%) : 27601.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
@@ -47,18 +47,16 @@
; V36 cse1 [V36,T11] ( 3, 3 ) simd16 -> mm3 "CSE - aggressive"
; V37 cse2 [V37,T12] ( 3, 3 ) simd16 -> mm4 "CSE - aggressive"
; V38 cse3 [V38,T13] ( 3, 3 ) simd16 -> mm5 "CSE - aggressive"
-; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm6 "CSE - aggressive"
-; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
-; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
-; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm19 "CSE - aggressive"
+; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
+; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
+; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm18 "CSE - aggressive"
+; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm20 "CSE - aggressive"
;
-; Lcl frame size = 24
+; Lcl frame size = 0
G_M35004_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- sub rsp, 24
vzeroupper
- vmovaps xmmword ptr [rsp], xmm6
- ;; size=12 bbWeight=1 PerfScore 3.25
+ ;; size=3 bbWeight=1 PerfScore 1.00
G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r8 r9}, byref
; byrRegs +[rcx rdx r8-r9]
vmovups xmm0, xmmword ptr [r9]
@@ -77,15 +75,15 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpshufb xmm1, xmm4, xmm1
vmovups xmm5, xmmword ptr [reloc @RWD48]
vpand xmm2, xmm2, xmm5
- vmovups xmm6, xmmword ptr [reloc @RWD64]
- vpcmpub k1, xmm2, xmm6, 6
- vmovups xmm16, xmmword ptr [r8]
- vmovups xmm17, xmmword ptr [reloc @RWD80]
- vpsubb xmm18, xmm2, xmm17
- vpshufb xmm18, xmm16, xmm18
- vmovups xmm19, xmmword ptr [rdx]
- vpshufb xmm2, xmm19, xmm2
- vpblendmb xmm2 {k1}, xmm2, xmm18
+ vmovups xmm16, xmmword ptr [reloc @RWD64]
+ vpcmpub k1, xmm2, xmm16, 6
+ vmovups xmm17, xmmword ptr [r8]
+ vmovups xmm18, xmmword ptr [reloc @RWD80]
+ vpsubb xmm19, xmm2, xmm18
+ vpshufb xmm19, xmm17, xmm19
+ vmovups xmm20, xmmword ptr [rdx]
+ vpshufb xmm2, xmm20, xmm2
+ vpblendmb xmm2 {k1}, xmm2, xmm19
vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
@@ -95,10 +93,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpand xmm2, xmm2, xmm3
vpshufb xmm2, xmm4, xmm2
vpand xmm0, xmm0, xmm5
- vpcmpub k1, xmm0, xmm6, 6
- vpsubb xmm3, xmm0, xmm17
- vpshufb xmm3, xmm16, xmm3
- vpshufb xmm0, xmm19, xmm0
+ vpcmpub k1, xmm0, xmm16, 6
+ vpsubb xmm3, xmm0, xmm18
+ vpshufb xmm3, xmm17, xmm3
+ vpshufb xmm0, xmm20, xmm0
vpblendmb xmm0 {k1}, xmm0, xmm3
vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
@@ -109,12 +107,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 65.25
+ ;; size=250 bbWeight=1 PerfScore 65.25
G_M35004_IG03: ; bbWeight=1, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp]
- add rsp, 24
ret
- ;; size=10 bbWeight=1 PerfScore 5.25
+ ;; size=1 bbWeight=1 PerfScore 1.00
RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16 dq 0707070707070707h, 0707070707070707h
RWD32 dq 8040201008040201h, 8040201008040201h
@@ -123,7 +119,7 @@ RWD64 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh
RWD80 dq 1010101010101010h, 1010101010101010h
-Total bytes of code 270, prolog size 12, PerfScore 73.75, instruction count 53, allocated bytes for code 270 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
+Total bytes of code 254, prolog size 3, PerfScore 67.25, instruction count 49, allocated bytes for code 254 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
@@ -131,11 +127,8 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x0C
- CountOfUnwindCodes: 3
+ SizeOfProlog : 0x00
+ CountOfUnwindCodes: 0
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x0C UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 0 * 16 = 0 = 0x00000
- CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18
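
The two removed unwind codes describe the frame that this method no longer sets up. In the Windows x64 unwind format, `UWOP_ALLOC_SMALL` stores the allocation size as `OpInfo * 8 + 8`, and `UWOP_SAVE_XMM128` stores its stack offset scaled by 16, which is the arithmetic the dump prints. A minimal sketch of that decoding follows; the function names are hypothetical and are not part of any JIT or SuperPMI API.

```python
def alloc_small_size(op_info: int) -> int:
    """Bytes allocated by UWOP_ALLOC_SMALL: OpInfo * 8 + 8 (8..128 bytes)."""
    if not 0 <= op_info <= 15:
        raise ValueError("OpInfo is a 4-bit field")
    return op_info * 8 + 8


def save_xmm128_offset(scaled_offset: int) -> int:
    """Stack offset of a UWOP_SAVE_XMM128 slot: the stored value times 16."""
    return scaled_offset * 16


# Values taken from the removed unwind codes in the diff.
print(alloc_small_size(2))    # frame size removed from the prolog
print(save_xmm128_offset(0))  # offset of the XMM6 save slot
```

With XMM6 no longer used, both the 24-byte allocation and the save slot disappear, so the unwind code count drops to 0.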
libraries_tests.run.windows.x64.Release.mch
-16 (-5.93%) : 339303.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
@@ -48,18 +48,16 @@
; V36 cse1 [V36,T11] ( 3, 3 ) simd16 -> mm3 "CSE - aggressive"
; V37 cse2 [V37,T12] ( 3, 3 ) simd16 -> mm4 "CSE - aggressive"
; V38 cse3 [V38,T13] ( 3, 3 ) simd16 -> mm5 "CSE - aggressive"
-; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm6 "CSE - aggressive"
-; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
-; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
-; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm19 "CSE - aggressive"
+; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
+; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
+; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm18 "CSE - aggressive"
+; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm20 "CSE - aggressive"
;
-; Lcl frame size = 24
+; Lcl frame size = 0
G_M35004_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- sub rsp, 24
vzeroupper
- vmovaps xmmword ptr [rsp], xmm6
- ;; size=12 bbWeight=1 PerfScore 3.25
+ ;; size=3 bbWeight=1 PerfScore 1.00
G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r8 r9}, byref
; byrRegs +[rcx rdx r8-r9]
vmovups xmm0, xmmword ptr [r9]
@@ -78,15 +76,15 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpshufb xmm1, xmm4, xmm1
vmovups xmm5, xmmword ptr [reloc @RWD48]
vpand xmm2, xmm2, xmm5
- vmovups xmm6, xmmword ptr [reloc @RWD64]
- vpcmpub k1, xmm2, xmm6, 6
- vmovups xmm16, xmmword ptr [r8]
- vmovups xmm17, xmmword ptr [reloc @RWD80]
- vpsubb xmm18, xmm2, xmm17
- vpshufb xmm18, xmm16, xmm18
- vmovups xmm19, xmmword ptr [rdx]
- vpshufb xmm2, xmm19, xmm2
- vpblendmb xmm2 {k1}, xmm2, xmm18
+ vmovups xmm16, xmmword ptr [reloc @RWD64]
+ vpcmpub k1, xmm2, xmm16, 6
+ vmovups xmm17, xmmword ptr [r8]
+ vmovups xmm18, xmmword ptr [reloc @RWD80]
+ vpsubb xmm19, xmm2, xmm18
+ vpshufb xmm19, xmm17, xmm19
+ vmovups xmm20, xmmword ptr [rdx]
+ vpshufb xmm2, xmm20, xmm2
+ vpblendmb xmm2 {k1}, xmm2, xmm19
vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
@@ -96,10 +94,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpand xmm2, xmm2, xmm3
vpshufb xmm2, xmm4, xmm2
vpand xmm0, xmm0, xmm5
- vpcmpub k1, xmm0, xmm6, 6
- vpsubb xmm3, xmm0, xmm17
- vpshufb xmm3, xmm16, xmm3
- vpshufb xmm0, xmm19, xmm0
+ vpcmpub k1, xmm0, xmm16, 6
+ vpsubb xmm3, xmm0, xmm18
+ vpshufb xmm3, xmm17, xmm3
+ vpshufb xmm0, xmm20, xmm0
vpblendmb xmm0 {k1}, xmm0, xmm3
vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
@@ -110,12 +108,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 65.25
+ ;; size=250 bbWeight=1 PerfScore 65.25
G_M35004_IG03: ; bbWeight=1, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp]
- add rsp, 24
ret
- ;; size=10 bbWeight=1 PerfScore 5.25
+ ;; size=1 bbWeight=1 PerfScore 1.00
RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16 dq 0707070707070707h, 0707070707070707h
RWD32 dq 8040201008040201h, 8040201008040201h
@@ -124,7 +120,7 @@ RWD64 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh
RWD80 dq 1010101010101010h, 1010101010101010h
-Total bytes of code 270, prolog size 12, PerfScore 73.75, instruction count 53, allocated bytes for code 270 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
+Total bytes of code 254, prolog size 3, PerfScore 67.25, instruction count 49, allocated bytes for code 254 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
; ============================================================
Unwind Info:
@@ -132,11 +128,8 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x0C
- CountOfUnwindCodes: 3
+ SizeOfProlog : 0x00
+ CountOfUnwindCodes: 0
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x0C UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 0 * 16 = 0 = 0x00000
- CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18
+2 (+0.10%) : 385984.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
@@ -248,21 +248,20 @@
; V236 tmp204 [V236,T07] ( 4, 7.03) long -> r8 "Cast away GC"
; V237 cse0 [V237,T12] ( 4, 3.51) long -> rbx "CSE - conservative"
;
-; Lcl frame size = 88
+; Lcl frame size = 72
G_M219_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rsi
push rbx
- sub rsp, 88
+ sub rsp, 72
vzeroupper
- vmovaps xmmword ptr [rsp+0x40], xmm6
- vmovaps xmmword ptr [rsp+0x30], xmm7
- vmovaps xmmword ptr [rsp+0x20], xmm8
+ vmovaps xmmword ptr [rsp+0x30], xmm6
+ vmovaps xmmword ptr [rsp+0x20], xmm7
vxorps xmm4, xmm4, xmm4
vmovdqu xmmword ptr [rsp+0x08], xmm4
xor eax, eax
mov qword ptr [rsp+0x18], rax
- ;; size=44 bbWeight=1 PerfScore 12.83
+ ;; size=38 bbWeight=1 PerfScore 10.83
G_M219_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r8}, byref
; byrRegs +[rcx rdx r8]
mov rax, r8
@@ -340,8 +339,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x20]
vmovups ymm1, ymmword ptr [r11+0x20]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -349,8 +348,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x40]
vmovups ymm1, ymmword ptr [r11+0x40]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -358,8 +357,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x60]
vmovups ymm1, ymmword ptr [r11+0x60]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -380,19 +379,19 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0xA0]
vmovups ymm1, ymmword ptr [r11+0xA0]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
vpternlogq ymm5, ymm16, ymm0, -54
vmovups ymm0, ymmword ptr [r10+0xC0]
- ;; size=363 bbWeight=3.54 PerfScore 428.52
+ ;; size=370 bbWeight=3.54 PerfScore 428.52
G_M219_IG07: ; bbWeight=3.54, extend
vmovups ymm1, ymmword ptr [r11+0xC0]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -400,8 +399,8 @@ G_M219_IG07: ; bbWeight=3.54, extend
vmovups ymm0, ymmword ptr [r10+0xE0]
vmovups ymm1, ymmword ptr [r11+0xE0]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -415,7 +414,7 @@ G_M219_IG07: ; bbWeight=3.54, extend
add rbx, 256
add r9, -32
jmp G_M219_IG05
- ;; size=174 bbWeight=3.54 PerfScore 148.74
+ ;; size=177 bbWeight=3.54 PerfScore 148.74
G_M219_IG08: ; bbWeight=0.88, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
mov rcx, r10
; byrRegs +[rcx]
@@ -542,15 +541,14 @@ G_M219_IG20: ; bbWeight=0.99, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymmword ptr [rax], ymm2
;; size=4 bbWeight=0.99 PerfScore 1.98
G_M219_IG21: ; bbWeight=0.99, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp+0x40]
- vmovaps xmm7, xmmword ptr [rsp+0x30]
- vmovaps xmm8, xmmword ptr [rsp+0x20]
+ vmovaps xmm6, xmmword ptr [rsp+0x30]
+ vmovaps xmm7, xmmword ptr [rsp+0x20]
vzeroupper
- add rsp, 88
+ add rsp, 72
pop rbx
pop rsi
ret
- ;; size=28 bbWeight=0.99 PerfScore 15.13
+ ;; size=22 bbWeight=0.99 PerfScore 11.17
G_M219_IG22: ; bbWeight=0, gcVars=00000000000000000000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, gcvars, byref
cmp r9, 32
jb G_M219_IG08
@@ -568,8 +566,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x20]
vmovups ymm1, ymmword ptr [r11+0x20]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -577,8 +575,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x40]
vmovups ymm1, ymmword ptr [r11+0x40]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -586,8 +584,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x60]
vmovups ymm1, ymmword ptr [r11+0x60]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -608,19 +606,19 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0xA0]
vmovups ymm1, ymmword ptr [r11+0xA0]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
vpternlogq ymm5, ymm16, ymm0, -54
vmovups ymm0, ymmword ptr [r10+0xC0]
- ;; size=363 bbWeight=0 PerfScore 0.00
+ ;; size=370 bbWeight=0 PerfScore 0.00
G_M219_IG24: ; bbWeight=0, extend
vmovups ymm1, ymmword ptr [r11+0xC0]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -628,8 +626,8 @@ G_M219_IG24: ; bbWeight=0, extend
vmovups ymm0, ymmword ptr [r10+0xE0]
vmovups ymm1, ymmword ptr [r11+0xE0]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -645,18 +643,17 @@ G_M219_IG24: ; bbWeight=0, extend
cmp r9, 32
jae G_M219_IG23
jmp G_M219_IG08
- ;; size=184 bbWeight=0 PerfScore 0.00
+ ;; size=187 bbWeight=0 PerfScore 0.00
G_M219_IG25: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
; byrRegs -[rax]
- vmovaps xmm6, xmmword ptr [rsp+0x40]
- vmovaps xmm7, xmmword ptr [rsp+0x30]
- vmovaps xmm8, xmmword ptr [rsp+0x20]
+ vmovaps xmm6, xmmword ptr [rsp+0x30]
+ vmovaps xmm7, xmmword ptr [rsp+0x20]
vzeroupper
- add rsp, 88
+ add rsp, 72
pop rbx
pop rsi
ret
- ;; size=28 bbWeight=0 PerfScore 0.00
+ ;; size=22 bbWeight=0 PerfScore 0.00
RWD00 dd G_M219_IG20 - G_M219_IG02
dd G_M219_IG19 - G_M219_IG02
dd G_M219_IG18 - G_M219_IG02
@@ -668,7 +665,7 @@ RWD00 dd G_M219_IG20 - G_M219_IG02
dd G_M219_IG12 - G_M219_IG02
-Total bytes of code 2004, prolog size 44, PerfScore 732.42, instruction count 346, allocated bytes for code 2004 (MethodHash=9888ff24) for method System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
+Total bytes of code 2006, prolog size 38, PerfScore 726.45, instruction count 343, allocated bytes for code 2006 (MethodHash=9888ff24) for method System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
; ============================================================
Unwind Info:
@@ -676,17 +673,15 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x1B
- CountOfUnwindCodes: 9
+ SizeOfProlog : 0x15
+ CountOfUnwindCodes: 7
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x1B UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM8 (8)
- Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x15 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM7 (7)
- Scaled Small Offset: 3 * 16 = 48 = 0x00030
+ Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x0F UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 4 * 16 = 64 = 0x00040
- CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 10 * 8 + 8 = 88 = 0x58
+ Scaled Small Offset: 3 * 16 = 48 = 0x00030
+ CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 8 * 8 + 8 = 72 = 0x48
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
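
The net +2 bytes for this method is the sum of several per-instruction-group changes. The prolog and both epilogs shrink because xmm8 is no longer saved and restored, while the loop bodies grow, presumably because instructions that now reference ymm16 need the longer EVEX encoding. The sketch below tallies the `;; size=` values shown in the hunks above; the name `ig_sizes` is hypothetical and the numbers are copied by hand from the diff.

```python
# (base size, diff size) in bytes for each instruction group whose
# ";; size=" line changed in the diff. Copied by hand from the hunks.
ig_sizes = {
    "G_M219_IG01": (44, 38),    # prolog: one fewer XMM save
    "G_M219_IG06": (363, 370),
    "G_M219_IG07": (174, 177),
    "G_M219_IG21": (28, 22),    # epilog: one fewer XMM restore
    "G_M219_IG23": (363, 370),
    "G_M219_IG24": (184, 187),
    "G_M219_IG25": (28, 22),    # cold epilog
}

# Per-group delta, then the net change across the method.
deltas = {ig: diff - base for ig, (base, diff) in ig_sizes.items()}
net = sum(deltas.values())

for ig, delta in deltas.items():
    print(f"{ig}: {delta:+d}")
print(f"net: {net:+d}")
```

The net matches the change in "Total bytes of code" from 2004 to 2006, even though the PerfScore drops from 732.42 to 726.45.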
+2 (+0.71%) : 393286.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
@@ -85,11 +85,11 @@ G_M8683_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, -79
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne SHORT G_M8683_IG06
- ;; size=83 bbWeight=1 PerfScore 32.33
+ ;; size=85 bbWeight=1 PerfScore 32.33
G_M8683_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref
; byrRegs -[rcx]
vpternlogd xmm0, xmm0, xmm4, 85
@@ -130,7 +130,7 @@ G_M8683_IG07: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx
jmp G_M8683_IG03
;; size=53 bbWeight=0.16 PerfScore 1.49
-Total bytes of code 280, prolog size 18, PerfScore 74.19, instruction count 59, allocated bytes for code 284 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
+Total bytes of code 282, prolog size 18, PerfScore 74.19, instruction count 59, allocated bytes for code 286 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
; ============================================================
Unwind Info:
+4 (+0.88%) : 397867.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
@@ -104,11 +104,11 @@ G_M33561_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vandnps xmm6, xmm4, xmm0
vcmpps xmm7, xmm5, xmm6, 1
vcmpps xmm5, xmm5, xmm6, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M33561_IG09
- ;; size=75 bbWeight=1 PerfScore 29.50
+ ;; size=77 bbWeight=1 PerfScore 29.50
G_M33561_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[rcx rdx]
vpternlogd xmm5, xmm5, xmm7, 85
@@ -121,11 +121,11 @@ G_M33561_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vandnps xmm6, xmm4, xmm0
vcmpps xmm7, xmm5, xmm6, 1
vcmpps xmm5, xmm5, xmm6, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M33561_IG08
- ;; size=75 bbWeight=1 PerfScore 17.00
+ ;; size=77 bbWeight=1 PerfScore 17.00
G_M33561_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
vpternlogd xmm5, xmm5, xmm7, 85
vblendvps xmm1 xmm1, xmm0, xmm5
@@ -197,7 +197,7 @@ G_M33561_IG09: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
RWD00 dq 8000000080000000h, 8000000080000000h
-Total bytes of code 457, prolog size 30, PerfScore 97.71, instruction count 91, allocated bytes for code 463 (MethodHash=5eda7ce6) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
+Total bytes of code 461, prolog size 30, PerfScore 97.71, instruction count 91, allocated bytes for code 467 (MethodHash=5eda7ce6) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
; ============================================================
Unwind Info:
+4 (+1.47%) : 395837.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
@@ -68,11 +68,11 @@ G_M8683_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, 78
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
je SHORT G_M8683_IG04
- ;; size=45 bbWeight=1 PerfScore 21.33
+ ;; size=47 bbWeight=1 PerfScore 21.33
G_M8683_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, byref
vpcmpgtd xmm2, xmm3, xmm2
vxorps xmm6, xmm6, xmm6
@@ -99,11 +99,11 @@ G_M8683_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, -79
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
je SHORT G_M8683_IG06
- ;; size=83 bbWeight=1 PerfScore 32.33
+ ;; size=85 bbWeight=1 PerfScore 32.33
G_M8683_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref
; byrRegs -[rcx]
vpcmpgtd xmm6, xmm3, xmm2
@@ -129,7 +129,7 @@ G_M8683_IG07: ; bbWeight=1, epilog, nogc, extend
ret
;; size=16 bbWeight=1 PerfScore 9.25
-Total bytes of code 273, prolog size 18, PerfScore 78.42, instruction count 58, allocated bytes for code 277 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
+Total bytes of code 277, prolog size 18, PerfScore 78.42, instruction count 58, allocated bytes for code 281 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
; ============================================================
Unwind Info:
+6 (+1.47%) : 393288.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
@@ -93,11 +93,11 @@ G_M46251_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vextractf128 xmm2, ymm2, 1
vcmpps xmm4, xmm1, xmm0, 14
vcmpps xmm5, xmm1, xmm0, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M46251_IG09
- ;; size=59 bbWeight=1 PerfScore 25.83
+ ;; size=61 bbWeight=1 PerfScore 25.83
G_M46251_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[rcx rdx]
vpternlogd xmm5, xmm5, xmm4, 85
@@ -108,11 +108,11 @@ G_M46251_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpshufd xmm2, xmm3, 78
vcmpps xmm4, xmm1, xmm0, 14
vcmpps xmm5, xmm1, xmm0, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M46251_IG08
- ;; size=67 bbWeight=1 PerfScore 16.33
+ ;; size=69 bbWeight=1 PerfScore 16.33
G_M46251_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
vpternlogd xmm5, xmm5, xmm4, 85
vblendvps xmm1 xmm1, xmm0, xmm5
@@ -122,11 +122,11 @@ G_M46251_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
vpshufd xmm3, xmm0, -79
vcmpps xmm4, xmm1, xmm2, 14
vcmpps xmm5, xmm1, xmm2, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne SHORT G_M46251_IG07
- ;; size=63 bbWeight=1 PerfScore 16.33
+ ;; size=65 bbWeight=1 PerfScore 16.33
G_M46251_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogd xmm1, xmm1, xmm4, 85
vblendvps xmm0 xmm0, xmm3, xmm1
@@ -179,7 +179,7 @@ G_M46251_IG09: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
jmp G_M46251_IG03
;; size=54 bbWeight=0.16 PerfScore 1.09
-Total bytes of code 409, prolog size 24, PerfScore 86.73, instruction count 82, allocated bytes for code 415 (MethodHash=f1994b54) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
+Total bytes of code 415, prolog size 24, PerfScore 86.73, instruction count 82, allocated bytes for code 421 (MethodHash=f1994b54) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
; ============================================================
Unwind Info:
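The +2 bytes per instruction group in the diff above line up with the zero register moving from xmm6 to xmm16: registers 16 to 31 are only reachable through the 4-byte EVEX prefix, while `vxorps xmm6, xmm6, xmm6` fits the 2-byte VEX form. The following is a rough sketch of that size arithmetic, not an encoder. The helper name `simd_reg_reg_length` is hypothetical, and the model assumes register-only operands, opcode map 0F, and W=0, which holds for `vxorps` and `vcmpps`.

```python
# Simplified size model for register-to-register SIMD instructions on x64.
# Assumptions (not taken from the report): opcode map 0F, W=0, no memory operand.
VEX2_PREFIX = 2  # C5 xx
VEX3_PREFIX = 3  # C4 xx xx
EVEX_PREFIX = 4  # 62 xx xx xx


def simd_reg_reg_length(reg, vvvv, rm, force_evex=False, imm8=False):
    """Estimate the encoded length in bytes.

    reg, vvvv, rm are register numbers (0-31) for the ModRM.reg, VEX.vvvv and
    ModRM.rm operands. force_evex models instructions that are EVEX-only,
    such as a compare that writes a mask register.
    """
    if force_evex or max(reg, vvvv, rm) >= 16:
        prefix = EVEX_PREFIX   # xmm16-xmm31 need EVEX
    elif rm >= 8:
        prefix = VEX3_PREFIX   # the VEX.B bit only exists in the 3-byte form
    else:
        prefix = VEX2_PREFIX
    opcode_and_modrm = 2
    return prefix + opcode_and_modrm + (1 if imm8 else 0)


# vxorps xmm6, xmm6, xmm6  versus  vxorps xmm16, xmm16, xmm16
xor_delta = simd_reg_reg_length(16, 16, 16) - simd_reg_reg_length(6, 6, 6)

# vcmpps k1, xmm5, xmm6, 4  versus  vcmpps k1, xmm5, xmm16, 4 (EVEX either way)
cmp_delta = (simd_reg_reg_length(1, 5, 16, force_evex=True, imm8=True)
             - simd_reg_reg_length(1, 5, 6, force_evex=True, imm8=True))

# Three instruction groups change in the Vector256 method.
method_delta = 3 * (xor_delta + cmp_delta)
print(xor_delta, cmp_delta, method_delta)
```

Under this model only the `vxorps` grows, by 2 bytes per group, which matches the reported 59 to 61, 67 to 69, and 63 to 65 block sizes and the 409 to 415 method total.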
realworld.run.windows.x64.checked.mch
+3 (+0.08%) : 1544.dasm - BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
@@ -170,7 +170,7 @@
; V159 tmp109 [V159,T78] ( 6, 22 ) simd32 -> mm3 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
;* V160 tmp110 [V160 ] ( 0, 0 ) struct (96) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <BepuUtilities.Vector3Wide>
;* V161 tmp111 [V161 ] ( 0, 0 ) struct (96) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <BepuUtilities.Vector3Wide>
-; V162 tmp112 [V162,T148] ( 3, 10 ) simd32 -> mm7 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
+; V162 tmp112 [V162,T148] ( 3, 10 ) simd32 -> mm16 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
; V163 tmp113 [V163,T95] ( 4, 16 ) simd32 -> mm0 ld-addr-op "Inline stloc first use temp" <System.Numerics.Vector`1[float]>
;* V164 tmp114 [V164 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
;* V165 tmp115 [V165 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
@@ -324,7 +324,7 @@
; V313 cse6 [V313,T101] ( 4, 16 ) simd32 -> mm23 "CSE - conservative"
; V314 cse7 [V314,T65] ( 2, 5 ) long -> rdx hoist "CSE - conservative"
; V315 cse8 [V315,T64] ( 2, 5 ) byref -> [rbp+0x10] spill-single-def hoist "CSE - conservative"
-; V316 rat0 [V316,T77] ( 3, 24 ) simd32 -> mm8 "ReplaceWithLclVar is creating a new local variable"
+; V316 rat0 [V316,T77] ( 3, 24 ) simd32 -> mm7 "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 1848
@@ -982,16 +982,16 @@ G_M11466_IG22: ; bbWeight=4, extend
vaddps ymm16, ymm16, ymm21
vmovups ymm21, ymmword ptr [rbp+0x408]
vmulps ymm21, ymm21, ymmword ptr [rbp+0x408]
- vaddps ymm7, ymm16, ymm21
- vxorps ymm8, ymm8, ymm8
- vcmpps ymm8, ymm7, ymm8, 14
- vptest ymm8, ymm8
+ vaddps ymm16, ymm16, ymm21
+ vxorps ymm21, ymm21, ymm21
+ vcmpps ymm7, ymm16, ymm21, 14
+ vptest ymm7, ymm7
je G_M11466_IG24
- ;; size=328 bbWeight=4 PerfScore 689.33
+ ;; size=329 bbWeight=4 PerfScore 689.33
G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r12}, byref
- vmulps ymm16, ymm19, ymm19
+ vmulps ymm19, ymm19, ymm19
vmulps ymm2, ymm2, ymm2
- vaddps ymm2, ymm16, ymm2
+ vaddps ymm2, ymm19, ymm2
vmulps ymm0, ymm0, ymm0
vaddps ymm0, ymm2, ymm0
vsqrtps ymm0, ymm0
@@ -1000,14 +1000,14 @@ G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
vmovups ymm14, ymmword ptr [rbp+0x60]
vmulps ymm2, ymm14, ymm14
vmovups ymm15, ymmword ptr [rbp+0x40]
- vmulps ymm16, ymm15, ymm15
- vaddps ymm2, ymm2, ymm16
- vmovups ymm8, ymmword ptr [rbp+0x20]
- vmulps ymm16, ymm8, ymm8
- vaddps ymm2, ymm2, ymm16
+ vmulps ymm19, ymm15, ymm15
+ vaddps ymm2, ymm2, ymm19
+ vmovups ymm7, ymmword ptr [rbp+0x20]
+ vmulps ymm19, ymm7, ymm7
+ vaddps ymm2, ymm2, ymm19
vsqrtps ymm2, ymm2
vaddps ymm0, ymm0, ymm2
- vsqrtps ymm2, ymm7
+ vsqrtps ymm2, ymm16
vaddps ymm16, ymm0, ymmword ptr [rbp+0x380]
vaddps ymm0, ymm0, ymmword ptr [rbp+0x360]
vmulps ymm2, ymm2, ymm16
@@ -1031,7 +1031,7 @@ G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
vaddps ymm4, ymm4, ymm0
vaddps ymm5, ymm5, ymm0
vaddps ymm1, ymm1, ymm0
- ;; size=240 bbWeight=2 PerfScore 348.00
+ ;; size=242 bbWeight=2 PerfScore 348.00
G_M11466_IG24: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r12}, byref, isz
vmovups ymm0, ymmword ptr [rbp+0x3A0]
vminps ymm4, ymm0, ymm4
@@ -1062,14 +1062,14 @@ G_M11466_IG24: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
vmovups ymm15, ymmword ptr [rbp+0x40]
vaddps ymm0, ymm15, ymmword ptr [rbp+0x240]
vmovups ymmword ptr [rbp+0x240], ymm0
- vmovups ymm8, ymmword ptr [rbp+0x20]
- vaddps ymm0, ymm8, ymmword ptr [rbp+0x260]
+ vmovups ymm7, ymmword ptr [rbp+0x20]
+ vaddps ymm0, ymm7, ymmword ptr [rbp+0x260]
vmovups ymmword ptr [rbp+0x260], ymm0
vaddps ymm0, ymm14, ymmword ptr [rbp+0x1C0]
vmovups ymmword ptr [rbp+0x1C0], ymm0
vaddps ymm0, ymm15, ymmword ptr [rbp+0x1E0]
vmovups ymmword ptr [rbp+0x1E0], ymm0
- vaddps ymm0, ymm8, ymmword ptr [rbp+0x200]
+ vaddps ymm0, ymm7, ymmword ptr [rbp+0x200]
vmovups ymmword ptr [rbp+0x200], ymm0
xor ecx, ecx
test r14d, r14d
@@ -1208,7 +1208,7 @@ RWD76 dd 3AB60B61h ; 0.00138889
RWD80 dd C0000000h ; -2
-Total bytes of code 3990, prolog size 154, PerfScore 10975.17, instruction count 746, allocated bytes for code 3990 (MethodHash=0979d335) for method BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
+Total bytes of code 3993, prolog size 154, PerfScore 10975.17, instruction count 746, allocated bytes for code 3993 (MethodHash=0979d335) for method BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
; ============================================================
Unwind Info:
Details
Improvements/regressions per collection
| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -16 | +0 |
| libraries_tests.run.windows.x64.Release.mch | 6 | 1 | 5 | 0 | -16 | +18 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.windows.x64.checked.mch | 2 | 1 | 1 | 0 | -140 | +3 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 12 | 3 | 6 | 3 | -172 | +21 |
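As a quick sanity check on the table above, the unlabeled final row should equal the column-wise sum of the per-collection rows. A minimal sketch, with the values transcribed by hand from the table and collection names shortened for readability:

```python
# Columns: contexts with diffs, improvements, regressions, same size,
# improvement bytes, regression bytes (transcribed from the table above).
rows = {
    "benchmarks.run": (1, 0, 0, 1, 0, 0),
    "benchmarks.run_pgo": (1, 0, 0, 1, 0, 0),
    "benchmarks.run_tiered": (1, 0, 0, 1, 0, 0),
    "coreclr_tests.run": (0, 0, 0, 0, 0, 0),
    "libraries.crossgen2": (0, 0, 0, 0, 0, 0),
    "libraries.pmi": (1, 1, 0, 0, -16, 0),
    "libraries_tests.run": (6, 1, 5, 0, -16, 18),
    "librariestestsnotieredcompilation.run": (0, 0, 0, 0, 0, 0),
    "realworld.run": (2, 1, 1, 0, -140, 3),
    "smoke_tests.nativeaot": (0, 0, 0, 0, 0, 0),
}
reported_total = (12, 3, 6, 3, -172, 21)

# Sum each column across all collections.
computed_total = tuple(sum(column) for column in zip(*rows.values()))

# Improvements, regressions and same-size contexts partition the diffed contexts.
partition_ok = all(r[0] == r[1] + r[2] + r[3] for r in rows.values())

print(computed_total == reported_total, partition_ok)
```

The net effect on Windows x64 is therefore -172 + 21 = -151 bytes across 12 contexts.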
Context information
| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 27,913 | 4 | 27,909 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 102,631 | 50,161 | 52,470 | 19 (0.02%) | 19 (0.02%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 54,331 | 36,871 | 17,460 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 573,719 | 341,128 | 232,591 | 8 (0.00%) | 8 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 2,104 | 0 | 2,104 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 309,142 | 6 | 309,136 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 671,200 | 476,124 | 195,076 | 111 (0.02%) | 111 (0.02%) |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 320,485 | 21,924 | 298,561 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 36,840 | 3 | 36,837 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 67 | 0 | 67 | 0 (0.00%) | 0 (0.00%) |
| | 2,098,432 | 926,221 | 1,172,211 | 138 (0.01%) | 138 (0.01%) |
jit-analyze output
benchmarks.run.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 8730756 (overridden on cmd)
Total bytes of diff: 8730756 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
relative diff is a regression.
Detail diffs
0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed).
benchmarks.run_pgo.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 35773696 (overridden on cmd)
Total bytes of diff: 35773696 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
relative diff is a regression.
Detail diffs
0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed).
benchmarks.run_tiered.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 12546772 (overridden on cmd)
Total bytes of diff: 12546772 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
relative diff is a regression.
Detail diffs
0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed).
libraries.pmi.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 61645293 (overridden on cmd)
Total bytes of diff: 61645277 (overridden on cmd)
Total bytes of delta: -16 (-0.00 % of base)
relative diff is a regression.
Detail diffs
0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed).
libraries_tests.run.windows.x64.Release.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 278809463 (overridden on cmd)
Total bytes of diff: 278809465 (overridden on cmd)
Total bytes of delta: 2 (0.00 % of base)
relative diff is a regression.
Detail diffs
0 total files with Code Size differences (0 improved, 0 regressed), 6 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed).
realworld.run.windows.x64.checked.mch
To reproduce these diffs on Windows x64:
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 13946185 (overridden on cmd)
Total bytes of diff: 13946048 (overridden on cmd)
Total bytes of delta: -137 (-0.00 % of base)
relative diff is a regression.
Detail diffs
0 total files with Code Size differences (0 improved, 0 regressed), 2 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed).
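The jit-analyze byte totals can be cross-checked against the per-collection improvement and regression bytes reported earlier: for each collection, diff minus base should equal the net of the two. A small sketch with the numbers copied by hand from this report (collection names shortened):

```python
# (base bytes, diff bytes, improvement bytes, regression bytes)
collections = {
    "libraries.pmi": (61645293, 61645277, -16, 0),
    "libraries_tests.run": (278809463, 278809465, -16, 18),
    "realworld.run": (13946185, 13946048, -140, 3),
}


def delta_matches(base, diff, improved, regressed):
    """True when the jit-analyze delta equals the net of the table columns."""
    return diff - base == improved + regressed


results = {name: delta_matches(*values) for name, values in collections.items()}
for name, ok in results.items():
    base, diff, _, _ = collections[name]
    print(f"{name}: delta {diff - base:+d} bytes, consistent={ok}")
```

The three collections with non-zero deltas (-16, +2 and -137 bytes) agree with the improvement and regression columns.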