Assembly Diffs

osx arm64

Diffs are based on 2,029,386 contexts (927,368 MinOpts, 1,102,018 FullOpts).

MISSED contexts: 109 (0.01%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run.osx.arm64.checked.mch | 24,861 | 5 | 24,856 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.osx.arm64.checked.mch | 84,163 | 48,254 | 35,909 | 13 (0.02%) | 13 (0.02%) |
| benchmarks.run_tiered.osx.arm64.checked.mch | 48,057 | 37,339 | 10,718 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.osx.arm64.checked.mch | 584,881 | 356,502 | 228,379 | 7 (0.00%) | 7 (0.00%) |
| libraries.crossgen2.osx.arm64.checked.mch | 1,881 | 0 | 1,881 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.osx.arm64.checked.mch | 316,291 | 18 | 316,273 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.osx.arm64.Release.mch | 634,566 | 463,650 | 170,916 | 83 (0.01%) | 83 (0.01%) |
| librariestestsnotieredcompilation.run.osx.arm64.Release.mch | 303,144 | 21,597 | 281,547 | 2 (0.00%) | 2 (0.00%) |
| realworld.run.osx.arm64.checked.mch | 31,542 | 3 | 31,539 | 1 (0.00%) | 1 (0.00%) |
| Total | 2,029,386 | 927,368 | 1,102,018 | 109 (0.01%) | 109 (0.01%) |
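
As a quick consistency check on the osx arm64 table, the per-collection rows can be summed and compared with the totals row. The following is a minimal Python sketch: the numbers are transcribed by hand from the table, the variable names are illustrative only, and the choice of "diffed contexts" as the denominator for the missed percentage is an assumption rather than something the report states.

```python
# Rows transcribed from the osx arm64 table:
# (diffed contexts, MinOpts, FullOpts, missed contexts)
rows = {
    "benchmarks.run": (24_861, 5, 24_856, 0),
    "benchmarks.run_pgo": (84_163, 48_254, 35_909, 13),
    "benchmarks.run_tiered": (48_057, 37_339, 10_718, 0),
    "coreclr_tests.run": (584_881, 356_502, 228_379, 7),
    "libraries.crossgen2": (1_881, 0, 1_881, 0),
    "libraries.pmi": (316_291, 18, 316_273, 3),
    "libraries_tests.run": (634_566, 463_650, 170_916, 83),
    "librariestestsnotieredcompilation.run": (303_144, 21_597, 281_547, 2),
    "realworld.run": (31_542, 3, 31_539, 1),
}

# Column-wise totals, to compare against the last row of the table.
totals = tuple(sum(column) for column in zip(*rows.values()))

# Every row should split cleanly into MinOpts + FullOpts contexts.
rows_consistent = all(
    diffed == minopts + fullopts
    for diffed, minopts, fullopts, _ in rows.values()
)

# Assumed denominator: diffed contexts (the report does not say which one it uses).
missed_percent = round(100 * totals[3] / totals[0], 2)

print(totals, rows_consistent, missed_percent)
```

With the transcribed values, the column sums reproduce the totals row, and the overall missed rate rounds to the 0.01% shown in the summary line.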


windows arm64

Diffs are based on 2,070,850 contexts (937,853 MinOpts, 1,132,997 FullOpts).

MISSED contexts: 139 (0.01%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run.windows.arm64.checked.mch | 24,455 | 4 | 24,451 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.arm64.checked.mch | 97,527 | 48,627 | 48,900 | 13 (0.01%) | 13 (0.01%) |
| benchmarks.run_tiered.windows.arm64.checked.mch | 49,174 | 36,718 | 12,456 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.arm64.checked.mch | 595,172 | 362,437 | 232,735 | 11 (0.00%) | 11 (0.00%) |
| libraries.crossgen2.windows.arm64.checked.mch | 2,130 | 0 | 2,130 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.arm64.checked.mch | 305,519 | 6 | 305,513 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.windows.arm64.Release.mch | 646,533 | 468,460 | 178,073 | 107 (0.02%) | 107 (0.02%) |
| librariestestsnotieredcompilation.run.windows.arm64.Release.mch | 317,022 | 21,598 | 295,424 | 4 (0.00%) | 4 (0.00%) |
| realworld.run.windows.arm64.checked.mch | 33,241 | 3 | 33,238 | 1 (0.00%) | 1 (0.00%) |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 77 | 0 | 77 | 0 (0.00%) | 0 (0.00%) |
| Total | 2,070,850 | 937,853 | 1,132,997 | 139 (0.01%) | 139 (0.01%) |


windows x64

Diffs are based on 2,098,432 contexts (926,221 MinOpts, 1,172,211 FullOpts).

MISSED contexts: 138 (0.01%)

Overall (-151 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.windows.x64.checked.mch | 8,730,756 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 35,773,696 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 12,546,772 | +0 |
| libraries.pmi.windows.x64.checked.mch | 61,645,293 | -16 |
| libraries_tests.run.windows.x64.Release.mch | 278,809,463 | +2 |
| realworld.run.windows.x64.checked.mch | 13,946,185 | -137 |

FullOpts (-151 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|--:|--:|
| benchmarks.run.windows.x64.checked.mch | 8,730,393 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 21,741,615 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,451,035 | +0 |
| libraries.pmi.windows.x64.checked.mch | 61,531,772 | -16 |
| libraries_tests.run.windows.x64.Release.mch | 106,634,847 | +2 |
| realworld.run.windows.x64.checked.mch | 13,559,576 | -137 |
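
The Overall and FullOpts tables for windows x64 can be cross-checked against the headline figure of -151 bytes. The sketch below is illustrative Python with hand-transcribed values. It assumes that whatever is counted in Overall but not in FullOpts is MinOpts code, which the report implies but does not state.

```python
# (base size in bytes, size delta in bytes), transcribed from the two tables.
overall = {
    "benchmarks.run": (8_730_756, 0),
    "benchmarks.run_pgo": (35_773_696, 0),
    "benchmarks.run_tiered": (12_546_772, 0),
    "libraries.pmi": (61_645_293, -16),
    "libraries_tests.run": (278_809_463, 2),
    "realworld.run": (13_946_185, -137),
}
fullopts = {
    "benchmarks.run": (8_730_393, 0),
    "benchmarks.run_pgo": (21_741_615, 0),
    "benchmarks.run_tiered": (3_451_035, 0),
    "libraries.pmi": (61_531_772, -16),
    "libraries_tests.run": (106_634_847, 2),
    "realworld.run": (13_559_576, -137),
}

overall_delta = sum(delta for _, delta in overall.values())
fullopts_delta = sum(delta for _, delta in fullopts.values())

# Assumption: Overall = MinOpts + FullOpts, so the remainder is MinOpts.
minopts = {
    name: (overall[name][0] - fullopts[name][0],
           overall[name][1] - fullopts[name][1])
    for name in overall
}

print(overall_delta, fullopts_delta, minopts)
```

Under that assumption every MinOpts delta comes out as zero, so all size changes in this run are in FullOpts code.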

Example diffs

benchmarks.run.windows.x64.checked.mch

+0 (0.00%) : 16504.dasm - Algorithms.VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (FullOpts)

@@ -220,8 +220,8 @@ G_M3972_IG07: ; bbWeight=128, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=000
vaddps ymm5, ymm5, ymm16
vcmpps ymm5, ymm5, ymm10, 2
vpcmpd k1, ymm6, ymm7, 2
- vpmovm2d ymm9, k1
- vpternlogd ymm5, ymm9, ymm4, -128
+ vpmovm2d ymm16, k1
+ vpternlogd ymm5, ymm16, ymm4, -128
vmovaps ymm4, ymm5
vptest ymm4, ymm4
vmovups ymm1, ymmword ptr [rsp+0x20]

benchmarks.run_pgo.windows.x64.checked.mch

+0 (0.00%) : 31047.dasm - System.Text.Ascii:EqualsIgnoreCase[ushort,ushort,System.Text.Ascii+PlainLoader`1[ushort]](byref,byref,ulong):ubyte (Tier1)

@@ -160,8 +160,8 @@ G_M2558_IG04: ; bbWeight=0.95, gcrefRegs=0000 {}, byrefRegs=0107 {rax rcx
vpor xmm4, xmm4, xmm0
vpor xmm5, xmm5, xmm0
vpsubw xmm16, xmm4, xmm1
- vpandd xmm6, xmm16, xmm6
- vpcmpuw k1, xmm6, xmm2, 6
+ vpandd xmm16, xmm16, xmm6
+ vpcmpuw k1, xmm16, xmm2, 6
kortestb k1, k1
setne r10b
movzx r10, r10b

benchmarks.run_tiered.windows.x64.checked.mch

+0 (0.00%) : 32432.dasm - Algorithms.VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (Tier1-OSR)

@@ -214,8 +214,8 @@ G_M57953_IG09: ; bbWeight=64, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=000
vaddpd ymm5, ymm5, ymm16
vcmppd ymm5, ymm5, ymm9, 2
vpcmpq k1, ymm6, ymm10, 2
- vpmovm2q ymm3, k1
- vpternlogq ymm5, ymm3, ymm2, -128
+ vpmovm2q ymm16, k1
+ vpternlogq ymm5, ymm16, ymm2, -128
vmovaps ymm2, ymm5
vptest ymm2, ymm2
jne SHORT G_M57953_IG09

libraries.pmi.windows.x64.checked.mch

-16 (-5.93%) : 27601.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

@@ -47,18 +47,16 @@
; V36 cse1 [V36,T11] ( 3, 3 ) simd16 -> mm3 "CSE - aggressive"
; V37 cse2 [V37,T12] ( 3, 3 ) simd16 -> mm4 "CSE - aggressive"
; V38 cse3 [V38,T13] ( 3, 3 ) simd16 -> mm5 "CSE - aggressive"
-; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm6 "CSE - aggressive"
-; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
-; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
-; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm19 "CSE - aggressive"
+; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
+; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
+; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm18 "CSE - aggressive"
+; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm20 "CSE - aggressive"
;
-; Lcl frame size = 24
+; Lcl frame size = 0
G_M35004_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- sub rsp, 24
vzeroupper
- vmovaps xmmword ptr [rsp], xmm6
- ;; size=12 bbWeight=1 PerfScore 3.25
+ ;; size=3 bbWeight=1 PerfScore 1.00
G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r8 r9}, byref
; byrRegs +[rcx rdx r8-r9]
vmovups xmm0, xmmword ptr [r9]
@@ -77,15 +75,15 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpshufb xmm1, xmm4, xmm1
vmovups xmm5, xmmword ptr [reloc @RWD48]
vpand xmm2, xmm2, xmm5
- vmovups xmm6, xmmword ptr [reloc @RWD64]
- vpcmpub k1, xmm2, xmm6, 6
- vmovups xmm16, xmmword ptr [r8]
- vmovups xmm17, xmmword ptr [reloc @RWD80]
- vpsubb xmm18, xmm2, xmm17
- vpshufb xmm18, xmm16, xmm18
- vmovups xmm19, xmmword ptr [rdx]
- vpshufb xmm2, xmm19, xmm2
- vpblendmb xmm2 {k1}, xmm2, xmm18
+ vmovups xmm16, xmmword ptr [reloc @RWD64]
+ vpcmpub k1, xmm2, xmm16, 6
+ vmovups xmm17, xmmword ptr [r8]
+ vmovups xmm18, xmmword ptr [reloc @RWD80]
+ vpsubb xmm19, xmm2, xmm18
+ vpshufb xmm19, xmm17, xmm19
+ vmovups xmm20, xmmword ptr [rdx]
+ vpshufb xmm2, xmm20, xmm2
+ vpblendmb xmm2 {k1}, xmm2, xmm19
vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
@@ -95,10 +93,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpand xmm2, xmm2, xmm3
vpshufb xmm2, xmm4, xmm2
vpand xmm0, xmm0, xmm5
- vpcmpub k1, xmm0, xmm6, 6
- vpsubb xmm3, xmm0, xmm17
- vpshufb xmm3, xmm16, xmm3
- vpshufb xmm0, xmm19, xmm0
+ vpcmpub k1, xmm0, xmm16, 6
+ vpsubb xmm3, xmm0, xmm18
+ vpshufb xmm3, xmm17, xmm3
+ vpshufb xmm0, xmm20, xmm0
vpblendmb xmm0 {k1}, xmm0, xmm3
vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
@@ -109,12 +107,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 65.25
+ ;; size=250 bbWeight=1 PerfScore 65.25
G_M35004_IG03: ; bbWeight=1, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp]
- add rsp, 24
ret
- ;; size=10 bbWeight=1 PerfScore 5.25
+ ;; size=1 bbWeight=1 PerfScore 1.00
RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16 dq 0707070707070707h, 0707070707070707h
RWD32 dq 8040201008040201h, 8040201008040201h
@@ -123,7 +119,7 @@ RWD64 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh
RWD80 dq 1010101010101010h, 1010101010101010h
-Total bytes of code 270, prolog size 12, PerfScore 73.75, instruction count 53, allocated bytes for code 270 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
+Total bytes of code 254, prolog size 3, PerfScore 67.25, instruction count 49, allocated bytes for code 254 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
; ============================================================
Unwind Info:
@@ -131,11 +127,8 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x0C
- CountOfUnwindCodes: 3
+ SizeOfProlog : 0x00
+ CountOfUnwindCodes: 0
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x0C UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 0 * 16 = 0 = 0x00000
- CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18
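
The -16 byte result for `ContainsMask16Chars` can be reproduced from the `;; size=` annotations of the three instruction groups in the diff above. This is a small illustrative Python tally and the group labels are informal. On Windows x64, `xmm6` is callee-saved, while `xmm16` and above are volatile but require EVEX encoding. A plausible reading, not confirmed by the report, is that the prolog and epilog shrink because the `xmm6` save and restore disappear, while the body grows slightly because of the longer encoding.

```python
# (instruction group, base size, diff size) in bytes, taken from the
# ";; size=" lines of the ContainsMask16Chars diff.
groups = [
    ("IG01 (prolog)", 12, 3),
    ("IG02 (body)", 248, 250),
    ("IG03 (epilog)", 10, 1),
]

base_total = sum(base for _, base, _ in groups)
diff_total = sum(diff for _, _, diff in groups)
delta = diff_total - base_total
percent = round(100 * delta / base_total, 2)

for name, base, diff in groups:
    print(f"{name}: {base} -> {diff} ({diff - base:+d})")
print(f"total: {base_total} -> {diff_total} ({delta:+d}, {percent:+.2f}%)")
```

The totals agree with the "Total bytes of code" lines (270 and 254) and with the -5.93% in the entry header.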

libraries_tests.run.windows.x64.Release.mch

-16 (-5.93%) : 339303.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)

@@ -48,18 +48,16 @@
; V36 cse1 [V36,T11] ( 3, 3 ) simd16 -> mm3 "CSE - aggressive"
; V37 cse2 [V37,T12] ( 3, 3 ) simd16 -> mm4 "CSE - aggressive"
; V38 cse3 [V38,T13] ( 3, 3 ) simd16 -> mm5 "CSE - aggressive"
-; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm6 "CSE - aggressive"
-; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
-; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
-; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm19 "CSE - aggressive"
+; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
+; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
+; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm18 "CSE - aggressive"
+; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm20 "CSE - aggressive"
;
-; Lcl frame size = 24
+; Lcl frame size = 0
G_M35004_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- sub rsp, 24
vzeroupper
- vmovaps xmmword ptr [rsp], xmm6
- ;; size=12 bbWeight=1 PerfScore 3.25
+ ;; size=3 bbWeight=1 PerfScore 1.00
G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r8 r9}, byref
; byrRegs +[rcx rdx r8-r9]
vmovups xmm0, xmmword ptr [r9]
@@ -78,15 +76,15 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpshufb xmm1, xmm4, xmm1
vmovups xmm5, xmmword ptr [reloc @RWD48]
vpand xmm2, xmm2, xmm5
- vmovups xmm6, xmmword ptr [reloc @RWD64]
- vpcmpub k1, xmm2, xmm6, 6
- vmovups xmm16, xmmword ptr [r8]
- vmovups xmm17, xmmword ptr [reloc @RWD80]
- vpsubb xmm18, xmm2, xmm17
- vpshufb xmm18, xmm16, xmm18
- vmovups xmm19, xmmword ptr [rdx]
- vpshufb xmm2, xmm19, xmm2
- vpblendmb xmm2 {k1}, xmm2, xmm18
+ vmovups xmm16, xmmword ptr [reloc @RWD64]
+ vpcmpub k1, xmm2, xmm16, 6
+ vmovups xmm17, xmmword ptr [r8]
+ vmovups xmm18, xmmword ptr [reloc @RWD80]
+ vpsubb xmm19, xmm2, xmm18
+ vpshufb xmm19, xmm17, xmm19
+ vmovups xmm20, xmmword ptr [rdx]
+ vpshufb xmm2, xmm20, xmm2
+ vpblendmb xmm2 {k1}, xmm2, xmm19
vpand xmm1, xmm2, xmm1
vxorps xmm2, xmm2, xmm2
vpcmpeqb xmm1, xmm1, xmm2
@@ -96,10 +94,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vpand xmm2, xmm2, xmm3
vpshufb xmm2, xmm4, xmm2
vpand xmm0, xmm0, xmm5
- vpcmpub k1, xmm0, xmm6, 6
- vpsubb xmm3, xmm0, xmm17
- vpshufb xmm3, xmm16, xmm3
- vpshufb xmm0, xmm19, xmm0
+ vpcmpub k1, xmm0, xmm16, 6
+ vpsubb xmm3, xmm0, xmm18
+ vpshufb xmm3, xmm17, xmm3
+ vpshufb xmm0, xmm20, xmm0
vpblendmb xmm0 {k1}, xmm0, xmm3
vpand xmm0, xmm0, xmm2
vxorps xmm2, xmm2, xmm2
@@ -110,12 +108,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
; byrRegs +[rax]
- ;; size=248 bbWeight=1 PerfScore 65.25
+ ;; size=250 bbWeight=1 PerfScore 65.25
G_M35004_IG03: ; bbWeight=1, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp]
- add rsp, 24
ret
- ;; size=10 bbWeight=1 PerfScore 5.25
+ ;; size=1 bbWeight=1 PerfScore 1.00
RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16 dq 0707070707070707h, 0707070707070707h
RWD32 dq 8040201008040201h, 8040201008040201h
@@ -124,7 +120,7 @@ RWD64 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh
RWD80 dq 1010101010101010h, 1010101010101010h
-Total bytes of code 270, prolog size 12, PerfScore 73.75, instruction count 53, allocated bytes for code 270 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
+Total bytes of code 254, prolog size 3, PerfScore 67.25, instruction count 49, allocated bytes for code 254 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
; ============================================================
Unwind Info:
@@ -132,11 +128,8 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x0C
- CountOfUnwindCodes: 3
+ SizeOfProlog : 0x00
+ CountOfUnwindCodes: 0
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x0C UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 0 * 16 = 0 = 0x00000
- CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18

+2 (+0.10%) : 385984.dasm - System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)

@@ -248,21 +248,20 @@
; V236 tmp204 [V236,T07] ( 4, 7.03) long -> r8 "Cast away GC"
; V237 cse0 [V237,T12] ( 4, 3.51) long -> rbx "CSE - conservative"
;
-; Lcl frame size = 88
+; Lcl frame size = 72
G_M219_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rsi
push rbx
- sub rsp, 88
+ sub rsp, 72
vzeroupper
- vmovaps xmmword ptr [rsp+0x40], xmm6
- vmovaps xmmword ptr [rsp+0x30], xmm7
- vmovaps xmmword ptr [rsp+0x20], xmm8
+ vmovaps xmmword ptr [rsp+0x30], xmm6
+ vmovaps xmmword ptr [rsp+0x20], xmm7
vxorps xmm4, xmm4, xmm4
vmovdqu xmmword ptr [rsp+0x08], xmm4
xor eax, eax
mov qword ptr [rsp+0x18], rax
- ;; size=44 bbWeight=1 PerfScore 12.83
+ ;; size=38 bbWeight=1 PerfScore 10.83
G_M219_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r8}, byref
; byrRegs +[rcx rdx r8]
mov rax, r8
@@ -340,8 +339,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x20]
vmovups ymm1, ymmword ptr [r11+0x20]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -349,8 +348,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x40]
vmovups ymm1, ymmword ptr [r11+0x40]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -358,8 +357,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x60]
vmovups ymm1, ymmword ptr [r11+0x60]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -380,19 +379,19 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0xA0]
vmovups ymm1, ymmword ptr [r11+0xA0]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
vpternlogq ymm5, ymm16, ymm0, -54
vmovups ymm0, ymmword ptr [r10+0xC0]
- ;; size=363 bbWeight=3.54 PerfScore 428.52
+ ;; size=370 bbWeight=3.54 PerfScore 428.52
G_M219_IG07: ; bbWeight=3.54, extend
vmovups ymm1, ymmword ptr [r11+0xC0]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -400,8 +399,8 @@ G_M219_IG07: ; bbWeight=3.54, extend
vmovups ymm0, ymmword ptr [r10+0xE0]
vmovups ymm1, ymmword ptr [r11+0xE0]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -415,7 +414,7 @@ G_M219_IG07: ; bbWeight=3.54, extend
add rbx, 256
add r9, -32
jmp G_M219_IG05
- ;; size=174 bbWeight=3.54 PerfScore 148.74
+ ;; size=177 bbWeight=3.54 PerfScore 148.74
G_M219_IG08: ; bbWeight=0.88, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
mov rcx, r10
; byrRegs +[rcx]
@@ -542,15 +541,14 @@ G_M219_IG20: ; bbWeight=0.99, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymmword ptr [rax], ymm2
;; size=4 bbWeight=0.99 PerfScore 1.98
G_M219_IG21: ; bbWeight=0.99, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp+0x40]
- vmovaps xmm7, xmmword ptr [rsp+0x30]
- vmovaps xmm8, xmmword ptr [rsp+0x20]
+ vmovaps xmm6, xmmword ptr [rsp+0x30]
+ vmovaps xmm7, xmmword ptr [rsp+0x20]
vzeroupper
- add rsp, 88
+ add rsp, 72
pop rbx
pop rsi
ret
- ;; size=28 bbWeight=0.99 PerfScore 15.13
+ ;; size=22 bbWeight=0.99 PerfScore 11.17
G_M219_IG22: ; bbWeight=0, gcVars=00000000000000000000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, gcvars, byref
cmp r9, 32
jb G_M219_IG08
@@ -568,8 +566,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x20]
vmovups ymm1, ymmword ptr [r11+0x20]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -577,8 +575,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x40]
vmovups ymm1, ymmword ptr [r11+0x40]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -586,8 +584,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x60]
vmovups ymm1, ymmword ptr [r11+0x60]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -608,19 +606,19 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0xA0]
vmovups ymm1, ymmword ptr [r11+0xA0]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
vpternlogq ymm5, ymm16, ymm0, -54
vmovups ymm0, ymmword ptr [r10+0xC0]
- ;; size=363 bbWeight=0 PerfScore 0.00
+ ;; size=370 bbWeight=0 PerfScore 0.00
G_M219_IG24: ; bbWeight=0, extend
vmovups ymm1, ymmword ptr [r11+0xC0]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -628,8 +626,8 @@ G_M219_IG24: ; bbWeight=0, extend
vmovups ymm0, ymmword ptr [r10+0xE0]
vmovups ymm1, ymmword ptr [r11+0xE0]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -645,18 +643,17 @@ G_M219_IG24: ; bbWeight=0, extend
cmp r9, 32
jae G_M219_IG23
jmp G_M219_IG08
- ;; size=184 bbWeight=0 PerfScore 0.00
+ ;; size=187 bbWeight=0 PerfScore 0.00
G_M219_IG25: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
; byrRegs -[rax]
- vmovaps xmm6, xmmword ptr [rsp+0x40]
- vmovaps xmm7, xmmword ptr [rsp+0x30]
- vmovaps xmm8, xmmword ptr [rsp+0x20]
+ vmovaps xmm6, xmmword ptr [rsp+0x30]
+ vmovaps xmm7, xmmword ptr [rsp+0x20]
vzeroupper
- add rsp, 88
+ add rsp, 72
pop rbx
pop rsi
ret
- ;; size=28 bbWeight=0 PerfScore 0.00
+ ;; size=22 bbWeight=0 PerfScore 0.00
RWD00 dd G_M219_IG20 - G_M219_IG02
dd G_M219_IG19 - G_M219_IG02
dd G_M219_IG18 - G_M219_IG02
@@ -668,7 +665,7 @@ RWD00 dd G_M219_IG20 - G_M219_IG02
dd G_M219_IG12 - G_M219_IG02
-Total bytes of code 2004, prolog size 44, PerfScore 732.42, instruction count 346, allocated bytes for code 2004 (MethodHash=9888ff24) for method System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
+Total bytes of code 2006, prolog size 38, PerfScore 726.45, instruction count 343, allocated bytes for code 2006 (MethodHash=9888ff24) for method System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
; ============================================================
Unwind Info:
@@ -676,17 +673,15 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x1B
- CountOfUnwindCodes: 9
+ SizeOfProlog : 0x15
+ CountOfUnwindCodes: 7
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x1B UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM8 (8)
- Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x15 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM7 (7)
- Scaled Small Offset: 3 * 16 = 48 = 0x00030
+ Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x0F UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 4 * 16 = 64 = 0x00040
- CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 10 * 8 + 8 = 88 = 0x58
+ Scaled Small Offset: 3 * 16 = 48 = 0x00030
+ CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 8 * 8 + 8 = 72 = 0x48
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
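
The arithmetic in the unwind info above follows the Windows x64 unwind-code encoding: `UWOP_ALLOC_SMALL` allocates `OpInfo * 8 + 8` bytes, `UWOP_SAVE_XMM128` stores its offset scaled by 16, and each operation occupies one or two unwind-code slots. The Python sketch below recomputes the values shown in the diff. The helper names are hypothetical and are not part of any JIT or OS API.

```python
# Unwind-code slots occupied by each operation that appears in this diff.
SLOTS = {"UWOP_PUSH_NONVOL": 1, "UWOP_ALLOC_SMALL": 1, "UWOP_SAVE_XMM128": 2}

def alloc_small_bytes(op_info: int) -> int:
    """Stack bytes allocated by UWOP_ALLOC_SMALL (valid range 8..128)."""
    if not 0 <= op_info <= 15:
        raise ValueError("OpInfo is a 4-bit field")
    return op_info * 8 + 8

def save_xmm128_offset(scaled: int) -> int:
    """Byte offset from RSP at which UWOP_SAVE_XMM128 stores the register."""
    return scaled * 16

def unwind_slots(ops: list[str]) -> int:
    """Total slot count, comparable to CountOfUnwindCodes."""
    return sum(SLOTS[op] for op in ops)

# Base: xmm6, xmm7, xmm8 saved; diff: only xmm6 and xmm7.
base_ops = ["UWOP_SAVE_XMM128"] * 3 + ["UWOP_ALLOC_SMALL"] + ["UWOP_PUSH_NONVOL"] * 2
diff_ops = ["UWOP_SAVE_XMM128"] * 2 + ["UWOP_ALLOC_SMALL"] + ["UWOP_PUSH_NONVOL"] * 2

print(alloc_small_bytes(10), alloc_small_bytes(8))        # frame size: base, diff
print(unwind_slots(base_ops), unwind_slots(diff_ops))     # unwind code counts
print(hex(save_xmm128_offset(3)), hex(save_xmm128_offset(2)))
```

Dropping the `xmm8` save removes one two-slot unwind code and 16 bytes of frame, which matches the change from 88 to 72 bytes and from 9 to 7 unwind codes.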

+2 (+0.71%) : 393286.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)

@@ -85,11 +85,11 @@ G_M8683_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, -79
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne SHORT G_M8683_IG06
- ;; size=83 bbWeight=1 PerfScore 32.33
+ ;; size=85 bbWeight=1 PerfScore 32.33
G_M8683_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref
; byrRegs -[rcx]
vpternlogd xmm0, xmm0, xmm4, 85
@@ -130,7 +130,7 @@ G_M8683_IG07: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx
jmp G_M8683_IG03
;; size=53 bbWeight=0.16 PerfScore 1.49
-Total bytes of code 280, prolog size 18, PerfScore 74.19, instruction count 59, allocated bytes for code 284 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
+Total bytes of code 282, prolog size 18, PerfScore 74.19, instruction count 59, allocated bytes for code 286 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
; ============================================================
Unwind Info:

+4 (+0.88%) : 397867.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)

@@ -104,11 +104,11 @@ G_M33561_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vandnps xmm6, xmm4, xmm0
vcmpps xmm7, xmm5, xmm6, 1
vcmpps xmm5, xmm5, xmm6, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M33561_IG09
- ;; size=75 bbWeight=1 PerfScore 29.50
+ ;; size=77 bbWeight=1 PerfScore 29.50
G_M33561_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[rcx rdx]
vpternlogd xmm5, xmm5, xmm7, 85
@@ -121,11 +121,11 @@ G_M33561_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vandnps xmm6, xmm4, xmm0
vcmpps xmm7, xmm5, xmm6, 1
vcmpps xmm5, xmm5, xmm6, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M33561_IG08
- ;; size=75 bbWeight=1 PerfScore 17.00
+ ;; size=77 bbWeight=1 PerfScore 17.00
G_M33561_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
vpternlogd xmm5, xmm5, xmm7, 85
vblendvps xmm1 xmm1, xmm0, xmm5
@@ -197,7 +197,7 @@ G_M33561_IG09: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
RWD00 dq 8000000080000000h, 8000000080000000h
-Total bytes of code 457, prolog size 30, PerfScore 97.71, instruction count 91, allocated bytes for code 463 (MethodHash=5eda7ce6) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
+Total bytes of code 461, prolog size 30, PerfScore 97.71, instruction count 91, allocated bytes for code 467 (MethodHash=5eda7ce6) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
; ============================================================
Unwind Info:

+4 (+1.47%) : 395837.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)

@@ -68,11 +68,11 @@ G_M8683_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, 78
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
je SHORT G_M8683_IG04
- ;; size=45 bbWeight=1 PerfScore 21.33
+ ;; size=47 bbWeight=1 PerfScore 21.33
G_M8683_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, byref
vpcmpgtd xmm2, xmm3, xmm2
vxorps xmm6, xmm6, xmm6
@@ -99,11 +99,11 @@ G_M8683_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, -79
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
je SHORT G_M8683_IG06
- ;; size=83 bbWeight=1 PerfScore 32.33
+ ;; size=85 bbWeight=1 PerfScore 32.33
G_M8683_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref
; byrRegs -[rcx]
vpcmpgtd xmm6, xmm3, xmm2
@@ -129,7 +129,7 @@ G_M8683_IG07: ; bbWeight=1, epilog, nogc, extend
ret
;; size=16 bbWeight=1 PerfScore 9.25
-Total bytes of code 273, prolog size 18, PerfScore 78.42, instruction count 58, allocated bytes for code 277 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
+Total bytes of code 277, prolog size 18, PerfScore 78.42, instruction count 58, allocated bytes for code 281 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
; ============================================================
Unwind Info:

+6 (+1.47%) : 393288.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)

@@ -93,11 +93,11 @@ G_M46251_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vextractf128 xmm2, ymm2, 1
vcmpps xmm4, xmm1, xmm0, 14
vcmpps xmm5, xmm1, xmm0, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M46251_IG09
- ;; size=59 bbWeight=1 PerfScore 25.83
+ ;; size=61 bbWeight=1 PerfScore 25.83
G_M46251_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[rcx rdx]
vpternlogd xmm5, xmm5, xmm4, 85
@@ -108,11 +108,11 @@ G_M46251_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpshufd xmm2, xmm3, 78
vcmpps xmm4, xmm1, xmm0, 14
vcmpps xmm5, xmm1, xmm0, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M46251_IG08
- ;; size=67 bbWeight=1 PerfScore 16.33
+ ;; size=69 bbWeight=1 PerfScore 16.33
G_M46251_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
vpternlogd xmm5, xmm5, xmm4, 85
vblendvps xmm1 xmm1, xmm0, xmm5
@@ -122,11 +122,11 @@ G_M46251_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
vpshufd xmm3, xmm0, -79
vcmpps xmm4, xmm1, xmm2, 14
vcmpps xmm5, xmm1, xmm2, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne SHORT G_M46251_IG07
- ;; size=63 bbWeight=1 PerfScore 16.33
+ ;; size=65 bbWeight=1 PerfScore 16.33
G_M46251_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogd xmm1, xmm1, xmm4, 85
vblendvps xmm0 xmm0, xmm3, xmm1
@@ -179,7 +179,7 @@ G_M46251_IG09: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
jmp G_M46251_IG03
;; size=54 bbWeight=0.16 PerfScore 1.09
-Total bytes of code 409, prolog size 24, PerfScore 86.73, instruction count 82, allocated bytes for code 415 (MethodHash=f1994b54) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
+Total bytes of code 415, prolog size 24, PerfScore 86.73, instruction count 82, allocated bytes for code 421 (MethodHash=f1994b54) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
; ============================================================ Unwind Info:

realworld.run.windows.x64.checked.mch

+3 (+0.08%) : 1544.dasm - BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)

@@ -170,7 +170,7 @@
 ; V159 tmp109 [V159,T78] ( 6, 22 ) simd32 -> mm3 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
 ;* V160 tmp110 [V160 ] ( 0, 0 ) struct (96) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <BepuUtilities.Vector3Wide>
 ;* V161 tmp111 [V161 ] ( 0, 0 ) struct (96) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <BepuUtilities.Vector3Wide>
-; V162 tmp112 [V162,T148] ( 3, 10 ) simd32 -> mm7 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
+; V162 tmp112 [V162,T148] ( 3, 10 ) simd32 -> mm16 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
 ; V163 tmp113 [V163,T95] ( 4, 16 ) simd32 -> mm0 ld-addr-op "Inline stloc first use temp" <System.Numerics.Vector`1[float]>
 ;* V164 tmp114 [V164 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
 ;* V165 tmp115 [V165 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
@@ -324,7 +324,7 @@
 ; V313 cse6 [V313,T101] ( 4, 16 ) simd32 -> mm23 "CSE - conservative"
 ; V314 cse7 [V314,T65] ( 2, 5 ) long -> rdx hoist "CSE - conservative"
 ; V315 cse8 [V315,T64] ( 2, 5 ) byref -> [rbp+0x10] spill-single-def hoist "CSE - conservative"
-; V316 rat0 [V316,T77] ( 3, 24 ) simd32 -> mm8 "ReplaceWithLclVar is creating a new local variable"
+; V316 rat0 [V316,T77] ( 3, 24 ) simd32 -> mm7 "ReplaceWithLclVar is creating a new local variable"
 ;
 ; Lcl frame size = 1848
@@ -982,16 +982,16 @@ G_M11466_IG22: ; bbWeight=4, extend
  vaddps ymm16, ymm16, ymm21
  vmovups ymm21, ymmword ptr [rbp+0x408]
  vmulps ymm21, ymm21, ymmword ptr [rbp+0x408]
- vaddps ymm7, ymm16, ymm21
- vxorps ymm8, ymm8, ymm8
- vcmpps ymm8, ymm7, ymm8, 14
- vptest ymm8, ymm8
+ vaddps ymm16, ymm16, ymm21
+ vxorps ymm21, ymm21, ymm21
+ vcmpps ymm7, ymm16, ymm21, 14
+ vptest ymm7, ymm7
  je G_M11466_IG24
- ;; size=328 bbWeight=4 PerfScore 689.33
+ ;; size=329 bbWeight=4 PerfScore 689.33
 G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r12}, byref
- vmulps ymm16, ymm19, ymm19
+ vmulps ymm19, ymm19, ymm19
  vmulps ymm2, ymm2, ymm2
- vaddps ymm2, ymm16, ymm2
+ vaddps ymm2, ymm19, ymm2
  vmulps ymm0, ymm0, ymm0
  vaddps ymm0, ymm2, ymm0
  vsqrtps ymm0, ymm0
@@ -1000,14 +1000,14 @@ G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
  vmovups ymm14, ymmword ptr [rbp+0x60]
  vmulps ymm2, ymm14, ymm14
  vmovups ymm15, ymmword ptr [rbp+0x40]
- vmulps ymm16, ymm15, ymm15
- vaddps ymm2, ymm2, ymm16
- vmovups ymm8, ymmword ptr [rbp+0x20]
- vmulps ymm16, ymm8, ymm8
- vaddps ymm2, ymm2, ymm16
+ vmulps ymm19, ymm15, ymm15
+ vaddps ymm2, ymm2, ymm19
+ vmovups ymm7, ymmword ptr [rbp+0x20]
+ vmulps ymm19, ymm7, ymm7
+ vaddps ymm2, ymm2, ymm19
  vsqrtps ymm2, ymm2
  vaddps ymm0, ymm0, ymm2
- vsqrtps ymm2, ymm7
+ vsqrtps ymm2, ymm16
  vaddps ymm16, ymm0, ymmword ptr [rbp+0x380]
  vaddps ymm0, ymm0, ymmword ptr [rbp+0x360]
  vmulps ymm2, ymm2, ymm16
@@ -1031,7 +1031,7 @@ G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
  vaddps ymm4, ymm4, ymm0
  vaddps ymm5, ymm5, ymm0
  vaddps ymm1, ymm1, ymm0
- ;; size=240 bbWeight=2 PerfScore 348.00
+ ;; size=242 bbWeight=2 PerfScore 348.00
 G_M11466_IG24: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r12}, byref, isz
  vmovups ymm0, ymmword ptr [rbp+0x3A0]
  vminps ymm4, ymm0, ymm4
@@ -1062,14 +1062,14 @@ G_M11466_IG24: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
  vmovups ymm15, ymmword ptr [rbp+0x40]
  vaddps ymm0, ymm15, ymmword ptr [rbp+0x240]
  vmovups ymmword ptr [rbp+0x240], ymm0
- vmovups ymm8, ymmword ptr [rbp+0x20]
- vaddps ymm0, ymm8, ymmword ptr [rbp+0x260]
+ vmovups ymm7, ymmword ptr [rbp+0x20]
+ vaddps ymm0, ymm7, ymmword ptr [rbp+0x260]
  vmovups ymmword ptr [rbp+0x260], ymm0
  vaddps ymm0, ymm14, ymmword ptr [rbp+0x1C0]
  vmovups ymmword ptr [rbp+0x1C0], ymm0
  vaddps ymm0, ymm15, ymmword ptr [rbp+0x1E0]
  vmovups ymmword ptr [rbp+0x1E0], ymm0
- vaddps ymm0, ymm8, ymmword ptr [rbp+0x200]
+ vaddps ymm0, ymm7, ymmword ptr [rbp+0x200]
  vmovups ymmword ptr [rbp+0x200], ymm0
  xor ecx, ecx
  test r14d, r14d
@@ -1208,7 +1208,7 @@ RWD76 dd 3AB60B61h ; 0.00138889
 RWD80 dd C0000000h ; -2
-Total bytes of code 3990, prolog size 154, PerfScore 10975.17, instruction count 746, allocated bytes for code 3990 (MethodHash=0979d335) for method BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
+Total bytes of code 3993, prolog size 154, PerfScore 10975.17, instruction count 746, allocated bytes for code 3993 (MethodHash=0979d335) for method BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
 ; ============================================================
 Unwind Info:

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -16 | +0 |
| libraries_tests.run.windows.x64.Release.mch | 6 | 1 | 5 | 0 | -16 | +18 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.windows.x64.checked.mch | 2 | 1 | 1 | 0 | -140 | +3 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 12 | 3 | 6 | 3 | -172 | +21 |
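
The totals row and the per-collection net deltas reported further down by jit-analyze can be cross-checked from the rows above. The following is a minimal sketch, not part of the SuperPMI tooling: the dictionary keys are shortened collection names chosen here for readability, and the numbers are copied by hand from the table.

```python
# Per-collection rows copied from the table above:
# (contexts with diffs, improvements, regressions, same size,
#  improvement bytes, regression bytes)
# Keys are shortened collection names (an abbreviation used only in this sketch).
rows = {
    "benchmarks.run": (1, 0, 0, 1, 0, 0),
    "benchmarks.run_pgo": (1, 0, 0, 1, 0, 0),
    "benchmarks.run_tiered": (1, 0, 0, 1, 0, 0),
    "coreclr_tests.run": (0, 0, 0, 0, 0, 0),
    "libraries.crossgen2": (0, 0, 0, 0, 0, 0),
    "libraries.pmi": (1, 1, 0, 0, -16, 0),
    "libraries_tests.run": (6, 1, 5, 0, -16, 18),
    "librariestestsnotieredcompilation.run": (0, 0, 0, 0, 0, 0),
    "realworld.run": (2, 1, 1, 0, -140, 3),
    "smoke_tests.nativeaot": (0, 0, 0, 0, 0, 0),
}

# Column-wise sums should reproduce the unlabeled totals row.
totals = tuple(sum(col) for col in zip(*rows.values()))

# Net byte delta per collection = improvement bytes + regression bytes.
net_bytes = {name: r[4] + r[5] for name, r in rows.items()}

print(totals)                            # (12, 3, 6, 3, -172, 21)
print(net_bytes["libraries.pmi"])        # -16
print(net_bytes["libraries_tests.run"])  # 2
print(net_bytes["realworld.run"])        # -137
```

The three net values agree with the "Total bytes of delta" lines in the jit-analyze output for the same collections.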

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 27,913 | 4 | 27,909 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 102,631 | 50,161 | 52,470 | 19 (0.02%) | 19 (0.02%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 54,331 | 36,871 | 17,460 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 573,719 | 341,128 | 232,591 | 8 (0.00%) | 8 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 2,104 | 0 | 2,104 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 309,142 | 6 | 309,136 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 671,200 | 476,124 | 195,076 | 111 (0.02%) | 111 (0.02%) |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 320,485 | 21,924 | 298,561 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 36,840 | 3 | 36,837 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 67 | 0 | 67 | 0 (0.00%) | 0 (0.00%) |
| | 2,098,432 | 926,221 | 1,172,211 | 138 (0.01%) | 138 (0.01%) |

jit-analyze output

benchmarks.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 8730756 (overridden on cmd)
Total bytes of diff: 8730756 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


benchmarks.run_pgo.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 35773696 (overridden on cmd)
Total bytes of diff: 35773696 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


benchmarks.run_tiered.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 12546772 (overridden on cmd)
Total bytes of diff: 12546772 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


libraries.pmi.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 61645293 (overridden on cmd)
Total bytes of diff: 61645277 (overridden on cmd)
Total bytes of delta: -16 (-0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


libraries_tests.run.windows.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 278809463 (overridden on cmd)
Total bytes of diff: 278809465 (overridden on cmd)
Total bytes of delta: 2 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 6 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


realworld.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 13946185 (overridden on cmd)
Total bytes of diff: 13946048 (overridden on cmd)
Total bytes of delta: -137 (-0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 2 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).