Assembly Diffs

linux arm64

Diffs are based on 2,259,470 contexts (1,008,044 MinOpts, 1,251,426 FullOpts).

MISSED contexts: 159 (0.01%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| benchmarks.run.linux.arm64.checked.mch | 32,435 | 2,362 | 30,073 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.arm64.checked.mch | 152,737 | 60,751 | 91,986 | 14 (0.01%) | 14 (0.01%) |
| benchmarks.run_tiered.linux.arm64.checked.mch | 60,787 | 45,077 | 15,710 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.arm64.checked.mch | 626,684 | 383,548 | 243,136 | 12 (0.00%) | 12 (0.00%) |
| libraries.crossgen2.linux.arm64.checked.mch | 1,936 | 0 | 1,936 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 295,687 | 6 | 295,681 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.linux.arm64.Release.mch | 750,983 | 494,543 | 256,440 | 128 (0.02%) | 128 (0.02%) |
| librariestestsnotieredcompilation.run.linux.arm64.Release.mch | 304,826 | 21,600 | 283,226 | 2 (0.00%) | 2 (0.00%) |
| realworld.run.linux.arm64.checked.mch | 33,343 | 157 | 33,186 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 52 | 0 | 52 | 0 (0.00%) | 0 (0.00%) |
| | 2,259,470 | 1,008,044 | 1,251,426 | 159 (0.01%) | 159 (0.01%) |
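
As a quick sanity check on the linux arm64 context table, the per-collection rows can be summed and compared with the totals row. The following is a minimal Python sketch, not part of superpmi.py: the `ROWS` list is transcribed by hand from the table, `check_totals` is a hypothetical helper, and the percentage formula (missed / diffed contexts) is an assumption that happens to reproduce the reported 0.01%.

```python
# Rows transcribed by hand from the linux arm64 "Context information" table:
# (collection, diffed contexts, MinOpts, FullOpts, missed contexts).
ROWS = [
    ("benchmarks.run.linux.arm64.checked.mch", 32_435, 2_362, 30_073, 0),
    ("benchmarks.run_pgo.linux.arm64.checked.mch", 152_737, 60_751, 91_986, 14),
    ("benchmarks.run_tiered.linux.arm64.checked.mch", 60_787, 45_077, 15_710, 0),
    ("coreclr_tests.run.linux.arm64.checked.mch", 626_684, 383_548, 243_136, 12),
    ("libraries.crossgen2.linux.arm64.checked.mch", 1_936, 0, 1_936, 0),
    ("libraries.pmi.linux.arm64.checked.mch", 295_687, 6, 295_681, 3),
    ("libraries_tests.run.linux.arm64.Release.mch", 750_983, 494_543, 256_440, 128),
    ("librariestestsnotieredcompilation.run.linux.arm64.Release.mch", 304_826, 21_600, 283_226, 2),
    ("realworld.run.linux.arm64.checked.mch", 33_343, 157, 33_186, 0),
    ("smoke_tests.nativeaot.linux.arm64.checked.mch", 52, 0, 52, 0),
]


def check_totals(rows):
    """Check MinOpts + FullOpts == diffed for every row, then sum each column."""
    for name, diffed, minopts, fullopts, _missed in rows:
        assert minopts + fullopts == diffed, name
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


totals = check_totals(ROWS)
# Assumption: the reported percentage is missed / diffed, rounded to two decimals.
missed_pct = f"{100.0 * totals[3] / totals[0]:.2f}%"
print(totals, missed_pct)
```

The summed columns agree with the totals row and with the "Diffs are based on" line for linux arm64.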


linux x64

Diffs are based on 2,249,675 contexts (981,298 MinOpts, 1,268,377 FullOpts).

MISSED contexts: 134 (0.01%)

Overall (-6 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run_pgo.linux.x64.checked.mch | 69,142,945 | +0 |
| benchmarks.run_tiered.linux.x64.checked.mch | 15,896,118 | +0 |
| realworld.run.linux.x64.checked.mch | 13,051,281 | -6 |

FullOpts (-6 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| benchmarks.run_pgo.linux.x64.checked.mch | 47,800,900 | +0 |
| benchmarks.run_tiered.linux.x64.checked.mch | 3,637,734 | +0 |
| realworld.run.linux.x64.checked.mch | 12,662,399 | -6 |

Example diffs

benchmarks.run_pgo.linux.x64.checked.mch

+0 (0.00%) : 68681.dasm - Algorithms.VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (Tier1-OSR)

@@ -184,8 +184,8 @@ G_M57953_IG06: ; bbWeight=100, gcrefRegs=C008 {rbx r14 r15}, byrefRegs=00
        vmovups ymmword ptr [rbp+0x170], ymm3
        vmovups ymmword ptr [rbp+0x290], ymm6
        vpcmpq k1, ymm3, ymm6, 2
-       vpmovm2q ymm12, k1
-       vpternlogq ymm15, ymm12, ymm2, -128
+       vpmovm2q ymm16, k1
+       vpternlogq ymm15, ymm16, ymm2, -128
        vmovaps ymm2, ymm15
        vptest ymm2, ymm2
        vmovups ymm3, ymmword ptr [rbp+0x170]

benchmarks.run_tiered.linux.x64.checked.mch

+0 (0.00%) : 32863.dasm - Algorithms.VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (Tier1-OSR)

@@ -181,8 +181,8 @@ G_M57953_IG08: ; bbWeight=64, gcrefRegs=C008 {rbx r14 r15}, byrefRegs=000
        vmovups ymmword ptr [rbp+0x170], ymm3
        vmovups ymmword ptr [rbp+0x290], ymm7
        vpcmpq k1, ymm3, ymm7, 2
-       vpmovm2q ymm13, k1
-       vpternlogq ymm15, ymm13, ymm2, -128
+       vpmovm2q ymm16, k1
+       vpternlogq ymm15, ymm16, ymm2, -128
        vmovaps ymm2, ymm15
        vptest ymm2, ymm2
        vmovups ymm3, ymmword ptr [rbp+0x170]

realworld.run.linux.x64.checked.mch

-6 (-0.37%) : 1300.dasm - BepuPhysics.CollisionDetection.CollisionTasks.CapsulePairTester:Test(byref,byref,byref,byref,byref,byref,int,byref):this (FullOpts)

@@ -448,8 +448,7 @@ G_M61246_IG05: ; bbWeight=1, extend
        ;; size=302 bbWeight=1 PerfScore 198.25
 G_M61246_IG06: ; bbWeight=1, extend
        vaddps ymm0, ymm0, ymm2
-       vmovaps ymm2, ymm19
-       vcmpps ymm2, ymm2, ymm7, 1
+       vcmpps ymm2, ymm19, ymm7, 1
        vmulps ymm3, ymm1, ymm15
        vsubps ymm3, ymm4, ymm3
        vmovaps ymm4, ymm2
@@ -506,7 +505,7 @@ G_M61246_IG06: ; bbWeight=1, extend
        vcmpps ymm1, ymm1, ymm2, 14
        vpand ymm0, ymm0, ymm1
        vmovups ymmword ptr [rax+0x1C0], ymm0
-       ;; size=324 bbWeight=1 PerfScore 198.67
+       ;; size=318 bbWeight=1 PerfScore 198.42
 G_M61246_IG07: ; bbWeight=1, epilog, nogc, extend
        vzeroupper
        pop rbp
@@ -524,7 +523,7 @@ RWD200 dd 00000000h, 00000000h
 RWD208 dq 0000000100000001h, 0000000100000001h
-Total bytes of code 1614, prolog size 7, PerfScore 971.33, instruction count 307, allocated bytes for code 1614 (MethodHash=5c6b10c1) for method BepuPhysics.CollisionDetection.CollisionTasks.CapsulePairTester:Test(byref,byref,byref,byref,byref,byref,int,byref):this (FullOpts)
+Total bytes of code 1608, prolog size 7, PerfScore 971.08, instruction count 306, allocated bytes for code 1608 (MethodHash=5c6b10c1) for method BepuPhysics.CollisionDetection.CollisionTasks.CapsulePairTester:Test(byref,byref,byref,byref,byref,byref,int,byref):this (FullOpts)
 ; ============================================================
 Unwind Info:
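
The "-6 (-0.37%)" header for this method follows from the two "Total bytes of code" lines. The sketch below, in Python, shows that arithmetic; `relative_size_change` is a hypothetical helper written for illustration, not a function from superpmi.py or jit-analyze, and its output formatting is only intended to mirror the report for non-zero deltas.

```python
def relative_size_change(base_bytes: int, diff_bytes: int) -> str:
    """Format a code-size delta as 'delta (percent of base)'."""
    delta = diff_bytes - base_bytes
    pct = 100.0 * delta / base_bytes
    return f"{delta:+d} ({pct:+.2f}%)"


# Sizes taken from the "Total bytes of code" lines of the diff above.
print(relative_size_change(1614, 1608))
```

The same six bytes show up in the size line for G_M61246_IG06 (324 to 318), where the `vmovaps ymm2, ymm19` copy was folded into the `vcmpps`.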

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
| --- | --- | --- | --- | --- | --- | --- |
| benchmarks.run.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_pgo.linux.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_tiered.linux.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| coreclr_tests.run.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.crossgen2.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries_tests.run.linux.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.linux.x64.checked.mch | 1 | 1 | 0 | 0 | -6 | +0 |
| smoke_tests.nativeaot.linux.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 3 | 1 | 0 | 2 | -6 | +0 |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| benchmarks.run.linux.x64.checked.mch | 34,975 | 3,135 | 31,840 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.x64.checked.mch | 156,552 | 60,225 | 96,327 | 13 (0.01%) | 13 (0.01%) |
| benchmarks.run_tiered.linux.x64.checked.mch | 56,296 | 42,308 | 13,988 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.x64.checked.mch | 598,040 | 355,280 | 242,760 | 10 (0.00%) | 10 (0.00%) |
| libraries.crossgen2.linux.x64.checked.mch | 1,930 | 0 | 1,930 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.x64.checked.mch | 296,878 | 6 | 296,872 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.x64.Release.mch | 766,353 | 498,383 | 267,970 | 111 (0.01%) | 111 (0.01%) |
| librariestestsnotieredcompilation.run.linux.x64.Release.mch | 305,396 | 21,912 | 283,484 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.x64.checked.mch | 33,191 | 49 | 33,142 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.x64.checked.mch | 64 | 0 | 64 | 0 (0.00%) | 0 (0.00%) |
| | 2,249,675 | 981,298 | 1,268,377 | 134 (0.01%) | 134 (0.01%) |

jit-analyze output

benchmarks.run_pgo.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 69142945 (overridden on cmd)
Total bytes of diff: 69142945 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


benchmarks.run_tiered.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 15896118 (overridden on cmd)
Total bytes of diff: 15896118 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


realworld.run.linux.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os linux -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 13051281 (overridden on cmd)
Total bytes of diff: 13051275 (overridden on cmd)
Total bytes of delta: -6 (-0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).



osx arm64

Diffs are based on 2,029,386 contexts (927,368 MinOpts, 1,102,018 FullOpts).

MISSED contexts: 109 (0.01%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| benchmarks.run.osx.arm64.checked.mch | 24,861 | 5 | 24,856 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.osx.arm64.checked.mch | 84,163 | 48,254 | 35,909 | 13 (0.02%) | 13 (0.02%) |
| benchmarks.run_tiered.osx.arm64.checked.mch | 48,057 | 37,339 | 10,718 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.osx.arm64.checked.mch | 584,881 | 356,502 | 228,379 | 7 (0.00%) | 7 (0.00%) |
| libraries.crossgen2.osx.arm64.checked.mch | 1,881 | 0 | 1,881 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.osx.arm64.checked.mch | 316,291 | 18 | 316,273 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.osx.arm64.Release.mch | 634,566 | 463,650 | 170,916 | 83 (0.01%) | 83 (0.01%) |
| librariestestsnotieredcompilation.run.osx.arm64.Release.mch | 303,144 | 21,597 | 281,547 | 2 (0.00%) | 2 (0.00%) |
| realworld.run.osx.arm64.checked.mch | 31,542 | 3 | 31,539 | 1 (0.00%) | 1 (0.00%) |
| | 2,029,386 | 927,368 | 1,102,018 | 109 (0.01%) | 109 (0.01%) |


windows arm64

Diffs are based on 2,070,850 contexts (937,853 MinOpts, 1,132,997 FullOpts).

MISSED contexts: 139 (0.01%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| benchmarks.run.windows.arm64.checked.mch | 24,455 | 4 | 24,451 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.arm64.checked.mch | 97,527 | 48,627 | 48,900 | 13 (0.01%) | 13 (0.01%) |
| benchmarks.run_tiered.windows.arm64.checked.mch | 49,174 | 36,718 | 12,456 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.arm64.checked.mch | 595,172 | 362,437 | 232,735 | 11 (0.00%) | 11 (0.00%) |
| libraries.crossgen2.windows.arm64.checked.mch | 2,130 | 0 | 2,130 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.arm64.checked.mch | 305,519 | 6 | 305,513 | 3 (0.00%) | 3 (0.00%) |
| libraries_tests.run.windows.arm64.Release.mch | 646,533 | 468,460 | 178,073 | 107 (0.02%) | 107 (0.02%) |
| librariestestsnotieredcompilation.run.windows.arm64.Release.mch | 317,022 | 21,598 | 295,424 | 4 (0.00%) | 4 (0.00%) |
| realworld.run.windows.arm64.checked.mch | 33,241 | 3 | 33,238 | 1 (0.00%) | 1 (0.00%) |
| smoke_tests.nativeaot.windows.arm64.checked.mch | 77 | 0 | 77 | 0 (0.00%) | 0 (0.00%) |
| | 2,070,850 | 937,853 | 1,132,997 | 139 (0.01%) | 139 (0.01%) |


windows x64

Diffs are based on 2,227,722 contexts (987,923 MinOpts, 1,239,799 FullOpts).

MISSED contexts: 138 (0.01%)

Overall (-151 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| aspnet.run.windows.x64.checked.mch | 47,041,738 | +0 |
| benchmarks.run.windows.x64.checked.mch | 8,730,756 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 35,773,696 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 12,546,772 | +0 |
| libraries.pmi.windows.x64.checked.mch | 61,645,293 | -16 |
| libraries_tests.run.windows.x64.Release.mch | 278,809,463 | +2 |
| realworld.run.windows.x64.checked.mch | 13,946,185 | -137 |

FullOpts (-151 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
| --- | --- | --- |
| aspnet.run.windows.x64.checked.mch | 28,550,689 | +0 |
| benchmarks.run.windows.x64.checked.mch | 8,730,393 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 21,741,615 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 3,451,035 | +0 |
| libraries.pmi.windows.x64.checked.mch | 61,531,772 | -16 |
| libraries_tests.run.windows.x64.Release.mch | 106,634,847 | +2 |
| realworld.run.windows.x64.checked.mch | 13,559,576 | -137 |

Example diffs

aspnet.run.windows.x64.checked.mch

+0 (0.00%) : 101001.dasm - System.SpanHelpers:IndexOfAnyValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,short,short,short,short,int):int (Tier1)

@@ -57,10 +57,10 @@
 ;* V45 tmp8 [V45 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V46 tmp9 [V46 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V47 tmp10 [V47 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
-; V48 tmp11 [V48,T36] ( 4, 0 ) simd64 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V48 tmp11 [V48,T36] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V49 tmp12 [V49 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V50 tmp13 [V50 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V51 tmp14 [V51,T37] ( 4, 0 ) simd64 -> mm0 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V51 tmp14 [V51,T37] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V52 tmp15 [V52 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V53 tmp16 [V53 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
 ; V54 tmp17 [V54,T38] ( 4, 0 ) simd32 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector256`1[short]>
@@ -364,8 +364,8 @@ G_M50250_IG30: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm0, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm6, k1
-       vptestmw k1, zmm6, zmm6
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        jne G_M50250_IG31
        add rdx, 64
@@ -385,14 +385,14 @@ G_M50250_IG30: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm0, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm0, k1
-       vptestmw k1, zmm0, zmm0
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        je G_M50250_IG22
        sub r10, rcx
        ; byrRegs -[r10]
        shr r10, 1
-       vpmovw2m k1, zmm0
+       vpmovw2m k1, zmm16
        kmovd ecx, k1
        xor eax, eax
        tzcnt rax, rcx
@@ -406,7 +406,7 @@ G_M50250_IG31: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        sub rax, rcx
        ; byrRegs -[rax]
        shr rax, 1
-       vpmovw2m k1, zmm6
+       vpmovw2m k1, zmm16
        kmovd r11d, k1
        xor ecx, ecx
        ; byrRegs -[rcx]

+0 (0.00%) : 114045.dasm - System.SpanHelpers:IndexOfAnyValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,short,short,short,short,int):int (Tier1)

@@ -57,10 +57,10 @@
 ;* V45 tmp8 [V45 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V46 tmp9 [V46 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V47 tmp10 [V47 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
-; V48 tmp11 [V48,T37] ( 4, 0 ) simd64 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V48 tmp11 [V48,T37] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V49 tmp12 [V49 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V50 tmp13 [V50 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V51 tmp14 [V51,T38] ( 4, 0 ) simd64 -> mm0 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V51 tmp14 [V51,T38] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V52 tmp15 [V52 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V53 tmp16 [V53 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
 ; V54 tmp17 [V54,T39] ( 4, 0 ) simd32 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector256`1[short]>
@@ -399,8 +399,8 @@ G_M50250_IG33: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm4, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm6, k1
-       vptestmw k1, zmm6, zmm6
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        jne G_M50250_IG34
        add rdx, 64
@@ -420,14 +420,14 @@ G_M50250_IG33: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm4, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm0, k1
-       vptestmw k1, zmm0, zmm0
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        je G_M50250_IG25
        sub r10, rcx
        ; byrRegs -[r10]
        shr r10, 1
-       vpmovw2m k1, zmm0
+       vpmovw2m k1, zmm16
        kmovd ecx, k1
        xor eax, eax
        tzcnt rax, rcx
@@ -441,7 +441,7 @@ G_M50250_IG34: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        sub rax, rcx
        ; byrRegs -[rax]
        shr rax, 1
-       vpmovw2m k1, zmm6
+       vpmovw2m k1, zmm16
        kmovd r11d, k1
        xor ecx, ecx
        ; byrRegs -[rcx]

+0 (0.00%) : 118531.dasm - System.SpanHelpers:IndexOfAnyValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,short,short,short,short,int):int (Tier1)

@@ -57,10 +57,10 @@
 ;* V45 tmp8 [V45 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V46 tmp9 [V46 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V47 tmp10 [V47 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
-; V48 tmp11 [V48,T36] ( 4, 0 ) simd64 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V48 tmp11 [V48,T36] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V49 tmp12 [V49 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V50 tmp13 [V50 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V51 tmp14 [V51,T37] ( 4, 0 ) simd64 -> mm0 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V51 tmp14 [V51,T37] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V52 tmp15 [V52 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V53 tmp16 [V53 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
 ; V54 tmp17 [V54,T38] ( 4, 0 ) simd32 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector256`1[short]>
@@ -362,8 +362,8 @@ G_M50250_IG29: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm0, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm6, k1
-       vptestmw k1, zmm6, zmm6
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        jne G_M50250_IG30
        add rdx, 64
@@ -383,14 +383,14 @@ G_M50250_IG29: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm0, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm0, k1
-       vptestmw k1, zmm0, zmm0
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        je G_M50250_IG21
        sub r10, rcx
        ; byrRegs -[r10]
        shr r10, 1
-       vpmovw2m k1, zmm0
+       vpmovw2m k1, zmm16
        kmovd ecx, k1
        xor eax, eax
        tzcnt rax, rcx
@@ -404,7 +404,7 @@ G_M50250_IG30: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        sub rax, rcx
        ; byrRegs -[rax]
        shr rax, 1
-       vpmovw2m k1, zmm6
+       vpmovw2m k1, zmm16
        kmovd r11d, k1
        xor ecx, ecx
        ; byrRegs -[rcx]

+0 (0.00%) : 128815.dasm - System.SpanHelpers:IndexOfAnyValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,short,short,short,short,int):int (Tier1)

@@ -57,10 +57,10 @@
 ;* V45 tmp8 [V45 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V46 tmp9 [V46 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ;* V47 tmp10 [V47 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
-; V48 tmp11 [V48,T36] ( 4, 0 ) simd64 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V48 tmp11 [V48,T36] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V49 tmp12 [V49 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V50 tmp13 [V50 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-; V51 tmp14 [V51,T37] ( 4, 0 ) simd64 -> mm0 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
+; V51 tmp14 [V51,T37] ( 4, 0 ) simd64 -> mm16 "Inlining Arg" <System.Runtime.Intrinsics.Vector512`1[short]>
 ;* V52 tmp15 [V52 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
 ;* V53 tmp16 [V53 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
 ; V54 tmp17 [V54,T38] ( 4, 0 ) simd32 -> mm6 "Inlining Arg" <System.Runtime.Intrinsics.Vector256`1[short]>
@@ -366,8 +366,8 @@ G_M50250_IG31: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm0, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm6, k1
-       vptestmw k1, zmm6, zmm6
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        jne G_M50250_IG32
        add rdx, 64
@@ -387,14 +387,14 @@ G_M50250_IG31: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0406 {rcx rdx r
        kord k1, k1, k2
        vpcmpeqw k2, zmm0, zmm5
        kord k1, k1, k2
-       vpmovm2w zmm0, k1
-       vptestmw k1, zmm0, zmm0
+       vpmovm2w zmm16, k1
+       vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        je G_M50250_IG23
        sub r10, rcx
        ; byrRegs -[r10]
        shr r10, 1
-       vpmovw2m k1, zmm0
+       vpmovw2m k1, zmm16
        kmovd ecx, k1
        xor eax, eax
        tzcnt rax, rcx
@@ -408,7 +408,7 @@ G_M50250_IG32: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
        sub rax, rcx
        ; byrRegs -[rax]
        shr rax, 1
-       vpmovw2m k1, zmm6
+       vpmovw2m k1, zmm16
        kmovd r11d, k1
        xor ecx, ecx
        ; byrRegs -[rcx]

+0 (0.00%) : 67100.dasm - System.Text.Ascii:EqualsIgnoreCase[ushort,ushort,System.Text.Ascii+PlainLoader`1[ushort]](byref,byref,ulong):ubyte (Tier1)

@@ -186,9 +186,9 @@ G_M2558_IG10: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0407 {rax rcx rd
        kortestd k1, k1
        jne G_M2558_IG24
        vpcmpeqw k1, zmm4, zmm5
-       vpmovm2w zmm6, k1
-       vpternlogd zmm16, zmm16, zmm16, -1
-       vpxord zmm16, zmm6, zmm16
+       vpmovm2w zmm16, k1
+       vpternlogd zmm17, zmm17, zmm17, -1
+       vpxord zmm16, zmm16, zmm17
        vptestmw k1, zmm16, zmm16
        kortestd k1, k1
        je SHORT G_M2558_IG12
@@ -197,8 +197,8 @@ G_M2558_IG11: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0407 {rax rcx rd
        vpord zmm4, zmm4, zmm0
        vpord zmm5, zmm5, zmm0
        vpsubw zmm17, zmm4, zmm1
-       vpandd zmm6, zmm17, zmm16
-       vpcmpuw k1, zmm6, zmm2, 6
+       vpandd zmm16, zmm17, zmm16
+       vpcmpuw k1, zmm16, zmm2, 6
        kortestd k1, k1
        setne r9b
        movzx r9, r9b
@@ -275,8 +275,8 @@ G_M2558_IG16: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0407 {rax rcx rd
        vpor ymm4, ymm4, ymm0
        vpor ymm5, ymm5, ymm0
        vpsubw ymm16, ymm4, ymm1
-       vpandd ymm6, ymm16, ymm6
-       vpcmpuw k1, ymm6, ymm2, 6
+       vpandd ymm16, ymm16, ymm6
+       vpcmpuw k1, ymm16, ymm2, 6
        kortestw k1, k1
        setne r9b
        movzx r9, r9b
@@ -350,8 +350,8 @@ G_M2558_IG21: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0407 {rax rcx rd
        vpor xmm4, xmm4, xmm0
        vpor xmm5, xmm5, xmm0
        vpsubw xmm16, xmm4, xmm1
-       vpandd xmm6, xmm16, xmm6
-       vpcmpuw k1, xmm6, xmm2, 6
+       vpandd xmm16, xmm16, xmm6
+       vpcmpuw k1, xmm16, xmm2, 6
        kortestb k1, k1
        setne r9b
        movzx r9, r9b

benchmarks.run.windows.x64.checked.mch

+0 (0.00%) : 16504.dasm - Algorithms.VectorFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (FullOpts)

@@ -220,8 +220,8 @@ G_M3972_IG07: ; bbWeight=128, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=000
        vaddps ymm5, ymm5, ymm16
        vcmpps ymm5, ymm5, ymm10, 2
        vpcmpd k1, ymm6, ymm7, 2
-       vpmovm2d ymm9, k1
-       vpternlogd ymm5, ymm9, ymm4, -128
+       vpmovm2d ymm16, k1
+       vpternlogd ymm5, ymm16, ymm4, -128
        vmovaps ymm4, ymm5
        vptest ymm4, ymm4
        vmovups ymm1, ymmword ptr [rsp+0x20]

benchmarks.run_pgo.windows.x64.checked.mch

+0 (0.00%) : 31047.dasm - System.Text.Ascii:EqualsIgnoreCase[ushort,ushort,System.Text.Ascii+PlainLoader`1[ushort]](byref,byref,ulong):ubyte (Tier1)

@@ -160,8 +160,8 @@ G_M2558_IG04: ; bbWeight=0.95, gcrefRegs=0000 {}, byrefRegs=0107 {rax rcx
        vpor xmm4, xmm4, xmm0
        vpor xmm5, xmm5, xmm0
        vpsubw xmm16, xmm4, xmm1
-       vpandd xmm6, xmm16, xmm6
-       vpcmpuw k1, xmm6, xmm2, 6
+       vpandd xmm16, xmm16, xmm6
+       vpcmpuw k1, xmm16, xmm2, 6
        kortestb k1, k1
        setne r10b
        movzx r10, r10b

benchmarks.run_tiered.windows.x64.checked.mch

+0 (0.00%) : 32432.dasm - Algorithms.VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this (Tier1-OSR)

@@ -214,8 +214,8 @@ G_M57953_IG09: ; bbWeight=64, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=000
        vaddpd ymm5, ymm5, ymm16
        vcmppd ymm5, ymm5, ymm9, 2
        vpcmpq k1, ymm6, ymm10, 2
-       vpmovm2q ymm3, k1
-       vpternlogq ymm5, ymm3, ymm2, -128
+       vpmovm2q ymm16, k1
+       vpternlogq ymm5, ymm16, ymm2, -128
        vmovaps ymm2, ymm5
        vptest ymm2, ymm2
        jne SHORT G_M57953_IG09

libraries.pmi.windows.x64.checked.mch

-16 (-5.93%) : 27601.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

@@ -47,18 +47,16 @@
 ; V36 cse1 [V36,T11] ( 3, 3 ) simd16 -> mm3 "CSE - aggressive"
 ; V37 cse2 [V37,T12] ( 3, 3 ) simd16 -> mm4 "CSE - aggressive"
 ; V38 cse3 [V38,T13] ( 3, 3 ) simd16 -> mm5 "CSE - aggressive"
-; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm6 "CSE - aggressive"
-; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
-; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
-; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm19 "CSE - aggressive"
+; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
+; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
+; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm18 "CSE - aggressive"
+; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm20 "CSE - aggressive"
 ;
-; Lcl frame size = 24
+; Lcl frame size = 0
 G_M35004_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-       sub rsp, 24
        vzeroupper
-       vmovaps xmmword ptr [rsp], xmm6
-       ;; size=12 bbWeight=1 PerfScore 3.25
+       ;; size=3 bbWeight=1 PerfScore 1.00
 G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r8 r9}, byref
        ; byrRegs +[rcx rdx r8-r9]
        vmovups xmm0, xmmword ptr [r9]
@@ -77,15 +75,15 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vpshufb xmm1, xmm4, xmm1
        vmovups xmm5, xmmword ptr [reloc @RWD48]
        vpand xmm2, xmm2, xmm5
-       vmovups xmm6, xmmword ptr [reloc @RWD64]
-       vpcmpub k1, xmm2, xmm6, 6
-       vmovups xmm16, xmmword ptr [r8]
-       vmovups xmm17, xmmword ptr [reloc @RWD80]
-       vpsubb xmm18, xmm2, xmm17
-       vpshufb xmm18, xmm16, xmm18
-       vmovups xmm19, xmmword ptr [rdx]
-       vpshufb xmm2, xmm19, xmm2
-       vpblendmb xmm2 {k1}, xmm2, xmm18
+       vmovups xmm16, xmmword ptr [reloc @RWD64]
+       vpcmpub k1, xmm2, xmm16, 6
+       vmovups xmm17, xmmword ptr [r8]
+       vmovups xmm18, xmmword ptr [reloc @RWD80]
+       vpsubb xmm19, xmm2, xmm18
+       vpshufb xmm19, xmm17, xmm19
+       vmovups xmm20, xmmword ptr [rdx]
+       vpshufb xmm2, xmm20, xmm2
+       vpblendmb xmm2 {k1}, xmm2, xmm19
        vpand xmm1, xmm2, xmm1
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm1, xmm1, xmm2
@@ -95,10 +93,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vpand xmm2, xmm2, xmm3
        vpshufb xmm2, xmm4, xmm2
        vpand xmm0, xmm0, xmm5
-       vpcmpub k1, xmm0, xmm6, 6
-       vpsubb xmm3, xmm0, xmm17
-       vpshufb xmm3, xmm16, xmm3
-       vpshufb xmm0, xmm19, xmm0
+       vpcmpub k1, xmm0, xmm16, 6
+       vpsubb xmm3, xmm0, xmm18
+       vpshufb xmm3, xmm17, xmm3
+       vpshufb xmm0, xmm20, xmm0
        vpblendmb xmm0 {k1}, xmm0, xmm3
        vpand xmm0, xmm0, xmm2
        vxorps xmm2, xmm2, xmm2
@@ -109,12 +107,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vmovups xmmword ptr [rcx], xmm0
        mov rax, rcx
        ; byrRegs +[rax]
-       ;; size=248 bbWeight=1 PerfScore 65.25
+       ;; size=250 bbWeight=1 PerfScore 65.25
 G_M35004_IG03: ; bbWeight=1, epilog, nogc, extend
-       vmovaps xmm6, xmmword ptr [rsp]
-       add rsp, 24
        ret
-       ;; size=10 bbWeight=1 PerfScore 5.25
+       ;; size=1 bbWeight=1 PerfScore 1.00
 RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
 RWD16 dq 0707070707070707h, 0707070707070707h
 RWD32 dq 8040201008040201h, 8040201008040201h
@@ -123,7 +119,7 @@ RWD64 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh
 RWD80 dq 1010101010101010h, 1010101010101010h
-Total bytes of code 270, prolog size 12, PerfScore 73.75, instruction count 53, allocated bytes for code 270 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
+Total bytes of code 254, prolog size 3, PerfScore 67.25, instruction count 49, allocated bytes for code 254 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
 ; ============================================================
 Unwind Info:
@@ -131,11 +127,8 @@ Unwind Info:
   >> End offset : 0xd1ffab1e (not in unwind data)
   Version : 1
   Flags : 0x00
-  SizeOfProlog : 0x0C
-  CountOfUnwindCodes: 3
+  SizeOfProlog : 0x00
+  CountOfUnwindCodes: 0
   FrameRegister : none (0)
   FrameOffset : N/A (no FrameRegister) (Value=0)
   UnwindCodes :
-    CodeOffset: 0x0C UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
-      Scaled Small Offset: 0 * 16 = 0 = 0x00000
-    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18
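
The -16 bytes reported for this method can be accounted for by summing the per-group `;; size=` lines in the diff: removing the stack allocation and the `xmm6` save/restore shrinks the prolog and the epilog by 9 bytes each, while the body grows by 2 bytes. A minimal Python sketch of that arithmetic follows; the sizes are copied from the diff and the variable names are illustrative only.

```python
# Per-instruction-group sizes read from the ";; size=" lines of the diff above.
base_sizes = {"G_M35004_IG01": 12, "G_M35004_IG02": 248, "G_M35004_IG03": 10}
diff_sizes = {"G_M35004_IG01": 3, "G_M35004_IG02": 250, "G_M35004_IG03": 1}

# Size change per group, then the method-level totals.
deltas = {ig: diff_sizes[ig] - base_sizes[ig] for ig in base_sizes}
total_base = sum(base_sizes.values())
total_diff = sum(diff_sizes.values())
total_delta = total_diff - total_base

for ig, delta in deltas.items():
    print(f"{ig}: {delta:+d}")
print(f"total: {total_base} -> {total_diff} ({total_delta:+d})")
```

The group totals match the "Total bytes of code" lines (270 and 254). The report does not say why IG02 grows by 2 bytes; one plausible reading is a longer EVEX encoding once `xmm6` is replaced by `xmm16`, but that is an assumption.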

libraries_tests.run.windows.x64.Release.mch

-16 (-5.93%) : 339303.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)

@@ -48,18 +48,16 @@
 ; V36 cse1 [V36,T11] ( 3, 3 ) simd16 -> mm3 "CSE - aggressive"
 ; V37 cse2 [V37,T12] ( 3, 3 ) simd16 -> mm4 "CSE - aggressive"
 ; V38 cse3 [V38,T13] ( 3, 3 ) simd16 -> mm5 "CSE - aggressive"
-; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm6 "CSE - aggressive"
-; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
-; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
-; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm19 "CSE - aggressive"
+; V39 cse4 [V39,T14] ( 3, 3 ) simd16 -> mm16 "CSE - aggressive"
+; V40 cse5 [V40,T15] ( 3, 3 ) simd16 -> mm17 "CSE - aggressive"
+; V41 cse6 [V41,T16] ( 3, 3 ) simd16 -> mm18 "CSE - aggressive"
+; V42 cse7 [V42,T17] ( 3, 3 ) simd16 -> mm20 "CSE - aggressive"
 ;
-; Lcl frame size = 24
+; Lcl frame size = 0
 G_M35004_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-       sub rsp, 24
        vzeroupper
-       vmovaps xmmword ptr [rsp], xmm6
-       ;; size=12 bbWeight=1 PerfScore 3.25
+       ;; size=3 bbWeight=1 PerfScore 1.00
 G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r8 r9}, byref
        ; byrRegs +[rcx rdx r8-r9]
        vmovups xmm0, xmmword ptr [r9]
@@ -78,15 +76,15 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vpshufb xmm1, xmm4, xmm1
        vmovups xmm5, xmmword ptr [reloc @RWD48]
        vpand xmm2, xmm2, xmm5
-       vmovups xmm6, xmmword ptr [reloc @RWD64]
-       vpcmpub k1, xmm2, xmm6, 6
-       vmovups xmm16, xmmword ptr [r8]
-       vmovups xmm17, xmmword ptr [reloc @RWD80]
-       vpsubb xmm18, xmm2, xmm17
-       vpshufb xmm18, xmm16, xmm18
-       vmovups xmm19, xmmword ptr [rdx]
-       vpshufb xmm2, xmm19, xmm2
-       vpblendmb xmm2 {k1}, xmm2, xmm18
+       vmovups xmm16, xmmword ptr [reloc @RWD64]
+       vpcmpub k1, xmm2, xmm16, 6
+       vmovups xmm17, xmmword ptr [r8]
+       vmovups xmm18, xmmword ptr [reloc @RWD80]
+       vpsubb xmm19, xmm2, xmm18
+       vpshufb xmm19, xmm17, xmm19
+       vmovups xmm20, xmmword ptr [rdx]
+       vpshufb xmm2, xmm20, xmm2
+       vpblendmb xmm2 {k1}, xmm2, xmm19
        vpand xmm1, xmm2, xmm1
        vxorps xmm2, xmm2, xmm2
        vpcmpeqb xmm1, xmm1, xmm2
@@ -96,10 +94,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vpand xmm2, xmm2, xmm3
        vpshufb xmm2, xmm4, xmm2
        vpand xmm0, xmm0, xmm5
-       vpcmpub k1, xmm0, xmm6, 6
-       vpsubb xmm3, xmm0, xmm17
-       vpshufb xmm3, xmm16, xmm3
-       vpshufb xmm0, xmm19, xmm0
+       vpcmpub k1, xmm0, xmm16, 6
+       vpsubb xmm3, xmm0, xmm18
+       vpshufb xmm3, xmm17, xmm3
+       vpshufb xmm0, xmm20, xmm0
        vpblendmb xmm0 {k1}, xmm0, xmm3
        vpand xmm0, xmm0, xmm2
        vxorps xmm2, xmm2, xmm2
@@ -110,12 +108,10 @@ G_M35004_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0306 {rcx rdx r
        vmovups xmmword ptr [rcx], xmm0
        mov rax, rcx
        ; byrRegs +[rax]
-       ;; size=248 bbWeight=1 PerfScore 65.25
+       ;; size=250 bbWeight=1 PerfScore 65.25
 G_M35004_IG03: ; bbWeight=1, epilog, nogc, extend
-       vmovaps xmm6, xmmword ptr [rsp]
-       add rsp, 24
        ret
-       ;; size=10 bbWeight=1 PerfScore 5.25
+       ;; size=1 bbWeight=1 PerfScore 1.00
 RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
 RWD16 dq 0707070707070707h, 0707070707070707h
 RWD32 dq 8040201008040201h, 8040201008040201h
@@ -124,7 +120,7 @@ RWD64 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh
 RWD80 dq 1010101010101010h, 1010101010101010h
-Total bytes of code 270, prolog size 12, PerfScore 73.75, instruction count 53, allocated bytes for code 270 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
+Total bytes of code 254, prolog size 3, PerfScore 67.25, instruction count 49, allocated bytes for code 254 (MethodHash=a0077743) for method System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (Tier1)
 ; ============================================================
 Unwind Info:
@@ -132,11 +128,8 @@ Unwind Info:
   >> End offset : 0xd1ffab1e (not in unwind data)
   Version : 1
   Flags : 0x00
-  SizeOfProlog : 0x0C
-  CountOfUnwindCodes: 3
+  SizeOfProlog : 0x00
+  CountOfUnwindCodes: 0
   FrameRegister : none (0)
   FrameOffset : N/A (no FrameRegister) (Value=0)
   UnwindCodes :
-    CodeOffset: 0x0C UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
-      Scaled Small Offset: 0 * 16 = 0 = 0x00000
-    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18

+2 (+0.10%) : 385984.dasm - System.Numerics.Tensors.TensorPrimitives:g_Vectorized256|2272[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)

@@ -248,21 +248,20 @@
; V236 tmp204 [V236,T07] ( 4, 7.03) long -> r8 "Cast away GC"
; V237 cse0 [V237,T12] ( 4, 3.51) long -> rbx "CSE - conservative"
;
-; Lcl frame size = 88
+; Lcl frame size = 72
G_M219_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rsi
push rbx
- sub rsp, 88
+ sub rsp, 72
vzeroupper
- vmovaps xmmword ptr [rsp+0x40], xmm6
- vmovaps xmmword ptr [rsp+0x30], xmm7
- vmovaps xmmword ptr [rsp+0x20], xmm8
+ vmovaps xmmword ptr [rsp+0x30], xmm6
+ vmovaps xmmword ptr [rsp+0x20], xmm7
vxorps xmm4, xmm4, xmm4
vmovdqu xmmword ptr [rsp+0x08], xmm4
xor eax, eax
mov qword ptr [rsp+0x18], rax
- ;; size=44 bbWeight=1 PerfScore 12.83
+ ;; size=38 bbWeight=1 PerfScore 10.83
G_M219_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0106 {rcx rdx r8}, byref
; byrRegs +[rcx rdx r8]
mov rax, r8
@@ -340,8 +339,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x20]
vmovups ymm1, ymmword ptr [r11+0x20]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -349,8 +348,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x40]
vmovups ymm1, ymmword ptr [r11+0x40]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -358,8 +357,8 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0x60]
vmovups ymm1, ymmword ptr [r11+0x60]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -380,19 +379,19 @@ G_M219_IG06: ; bbWeight=3.54, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymm0, ymmword ptr [r10+0xA0]
vmovups ymm1, ymmword ptr [r11+0xA0]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
vpternlogq ymm5, ymm16, ymm0, -54
vmovups ymm0, ymmword ptr [r10+0xC0]
- ;; size=363 bbWeight=3.54 PerfScore 428.52
+ ;; size=370 bbWeight=3.54 PerfScore 428.52
G_M219_IG07: ; bbWeight=3.54, extend
vmovups ymm1, ymmword ptr [r11+0xC0]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -400,8 +399,8 @@ G_M219_IG07: ; bbWeight=3.54, extend
vmovups ymm0, ymmword ptr [r10+0xE0]
vmovups ymm1, ymmword ptr [r11+0xE0]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -415,7 +414,7 @@ G_M219_IG07: ; bbWeight=3.54, extend
add rbx, 256
add r9, -32
jmp G_M219_IG05
- ;; size=174 bbWeight=3.54 PerfScore 148.74
+ ;; size=177 bbWeight=3.54 PerfScore 148.74
G_M219_IG08: ; bbWeight=0.88, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
mov rcx, r10
; byrRegs +[rcx]
@@ -542,15 +541,14 @@ G_M219_IG20: ; bbWeight=0.99, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, by
vmovups ymmword ptr [rax], ymm2
;; size=4 bbWeight=0.99 PerfScore 1.98
G_M219_IG21: ; bbWeight=0.99, epilog, nogc, extend
- vmovaps xmm6, xmmword ptr [rsp+0x40]
- vmovaps xmm7, xmmword ptr [rsp+0x30]
- vmovaps xmm8, xmmword ptr [rsp+0x20]
+ vmovaps xmm6, xmmword ptr [rsp+0x30]
+ vmovaps xmm7, xmmword ptr [rsp+0x20]
vzeroupper
- add rsp, 88
+ add rsp, 72
pop rbx
pop rsi
ret
- ;; size=28 bbWeight=0.99 PerfScore 15.13
+ ;; size=22 bbWeight=0.99 PerfScore 11.17
G_M219_IG22: ; bbWeight=0, gcVars=00000000000000000000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, gcvars, byref
cmp r9, 32
jb G_M219_IG08
@@ -568,8 +566,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x20]
vmovups ymm1, ymmword ptr [r11+0x20]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -577,8 +575,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x40]
vmovups ymm1, ymmword ptr [r11+0x40]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -586,8 +584,8 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0x60]
vmovups ymm1, ymmword ptr [r11+0x60]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -608,19 +606,19 @@ G_M219_IG23: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0001 {rax}, byref
vmovups ymm0, ymmword ptr [r10+0xA0]
vmovups ymm1, ymmword ptr [r11+0xA0]
vpcmpeqq ymm5, ymm1, ymm0
- vxorps ymm6, ymm6, ymm6
- vpcmpuq k1, ymm1, ymm6, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
vpternlogq ymm5, ymm16, ymm0, -54
vmovups ymm0, ymmword ptr [r10+0xC0]
- ;; size=363 bbWeight=0 PerfScore 0.00
+ ;; size=370 bbWeight=0 PerfScore 0.00
G_M219_IG24: ; bbWeight=0, extend
vmovups ymm1, ymmword ptr [r11+0xC0]
vpcmpeqq ymm6, ymm1, ymm0
- vxorps ymm7, ymm7, ymm7
- vpcmpuq k1, ymm1, ymm7, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -628,8 +626,8 @@ G_M219_IG24: ; bbWeight=0, extend
vmovups ymm0, ymmword ptr [r10+0xE0]
vmovups ymm1, ymmword ptr [r11+0xE0]
vpcmpeqq ymm7, ymm1, ymm0
- vxorps ymm8, ymm8, ymm8
- vpcmpuq k1, ymm1, ymm8, 1
+ vxorps ymm16, ymm16, ymm16
+ vpcmpuq k1, ymm1, ymm16, 1
vpblendmq ymm16 {k1}, ymm0, ymm1
vpcmpuq k1, ymm1, ymm0, 1
vpblendmq ymm0 {k1}, ymm0, ymm1
@@ -645,18 +643,17 @@ G_M219_IG24: ; bbWeight=0, extend
cmp r9, 32
jae G_M219_IG23
jmp G_M219_IG08
- ;; size=184 bbWeight=0 PerfScore 0.00
+ ;; size=187 bbWeight=0 PerfScore 0.00
G_M219_IG25: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
; byrRegs -[rax]
- vmovaps xmm6, xmmword ptr [rsp+0x40]
- vmovaps xmm7, xmmword ptr [rsp+0x30]
- vmovaps xmm8, xmmword ptr [rsp+0x20]
+ vmovaps xmm6, xmmword ptr [rsp+0x30]
+ vmovaps xmm7, xmmword ptr [rsp+0x20]
vzeroupper
- add rsp, 88
+ add rsp, 72
pop rbx
pop rsi
ret
- ;; size=28 bbWeight=0 PerfScore 0.00
+ ;; size=22 bbWeight=0 PerfScore 0.00
RWD00 dd G_M219_IG20 - G_M219_IG02
dd G_M219_IG19 - G_M219_IG02
dd G_M219_IG18 - G_M219_IG02
@@ -668,7 +665,7 @@ RWD00 dd G_M219_IG20 - G_M219_IG02
dd G_M219_IG12 - G_M219_IG02
-Total bytes of code 2004, prolog size 44, PerfScore 732.42, instruction count 346, allocated bytes for code 2004 (MethodHash=9888ff24) for method System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
+Total bytes of code 2006, prolog size 38, PerfScore 726.45, instruction count 343, allocated bytes for code 2006 (MethodHash=9888ff24) for method System.Numerics.Tensors.TensorPrimitives:<InvokeSpanSpanIntoSpan>g__Vectorized256|227_2[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ulong]](byref,byref,byref,ulong) (Tier1)
; ============================================================
Unwind Info:
@@ -676,17 +673,15 @@ Unwind Info:
>> End offset : 0xd1ffab1e (not in unwind data)
Version : 1
Flags : 0x00
- SizeOfProlog : 0x1B
- CountOfUnwindCodes: 9
+ SizeOfProlog : 0x15
+ CountOfUnwindCodes: 7
FrameRegister : none (0)
FrameOffset : N/A (no FrameRegister) (Value=0)
UnwindCodes :
- CodeOffset: 0x1B UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM8 (8)
- Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x15 UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM7 (7)
- Scaled Small Offset: 3 * 16 = 48 = 0x00030
+ Scaled Small Offset: 2 * 16 = 32 = 0x00020
CodeOffset: 0x0F UnwindOp: UWOP_SAVE_XMM128 (8) OpInfo: XMM6 (6)
- Scaled Small Offset: 4 * 16 = 64 = 0x00040
- CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 10 * 8 + 8 = 88 = 0x58
+ Scaled Small Offset: 3 * 16 = 48 = 0x00030
+ CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 8 * 8 + 8 = 72 = 0x48
CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)

+2 (+0.71%) : 393286.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)

@@ -85,11 +85,11 @@ G_M8683_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, -79
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne SHORT G_M8683_IG06
- ;; size=83 bbWeight=1 PerfScore 32.33
+ ;; size=85 bbWeight=1 PerfScore 32.33
G_M8683_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref
; byrRegs -[rcx]
vpternlogd xmm0, xmm0, xmm4, 85
@@ -130,7 +130,7 @@ G_M8683_IG07: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx
jmp G_M8683_IG03
;; size=53 bbWeight=0.16 PerfScore 1.49
-Total bytes of code 280, prolog size 18, PerfScore 74.19, instruction count 59, allocated bytes for code 284 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
+Total bytes of code 282, prolog size 18, PerfScore 74.19, instruction count 59, allocated bytes for code 286 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
; ============================================================
Unwind Info:

+4 (+0.88%) : 397867.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)

@@ -104,11 +104,11 @@ G_M33561_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vandnps xmm6, xmm4, xmm0
vcmpps xmm7, xmm5, xmm6, 1
vcmpps xmm5, xmm5, xmm6, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M33561_IG09
- ;; size=75 bbWeight=1 PerfScore 29.50
+ ;; size=77 bbWeight=1 PerfScore 29.50
G_M33561_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[rcx rdx]
vpternlogd xmm5, xmm5, xmm7, 85
@@ -121,11 +121,11 @@ G_M33561_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vandnps xmm6, xmm4, xmm0
vcmpps xmm7, xmm5, xmm6, 1
vcmpps xmm5, xmm5, xmm6, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M33561_IG08
- ;; size=75 bbWeight=1 PerfScore 17.00
+ ;; size=77 bbWeight=1 PerfScore 17.00
G_M33561_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
vpternlogd xmm5, xmm5, xmm7, 85
vblendvps xmm1 xmm1, xmm0, xmm5
@@ -197,7 +197,7 @@ G_M33561_IG09: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
RWD00 dq 8000000080000000h, 8000000080000000h
-Total bytes of code 457, prolog size 30, PerfScore 97.71, instruction count 91, allocated bytes for code 463 (MethodHash=5eda7ce6) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
+Total bytes of code 461, prolog size 30, PerfScore 97.71, instruction count 91, allocated bytes for code 467 (MethodHash=5eda7ce6) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMinMagnitudeOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
; ============================================================
Unwind Info:

+4 (+1.47%) : 395837.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)

@@ -68,11 +68,11 @@ G_M8683_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, 78
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
je SHORT G_M8683_IG04
- ;; size=45 bbWeight=1 PerfScore 21.33
+ ;; size=47 bbWeight=1 PerfScore 21.33
G_M8683_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx}, byref
vpcmpgtd xmm2, xmm3, xmm2
vxorps xmm6, xmm6, xmm6
@@ -99,11 +99,11 @@ G_M8683_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vpshufd xmm3, xmm2, -79
vcmpps xmm4, xmm0, xmm1, 14
vcmpps xmm5, xmm0, xmm1, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
je SHORT G_M8683_IG06
- ;; size=83 bbWeight=1 PerfScore 32.33
+ ;; size=85 bbWeight=1 PerfScore 32.33
G_M8683_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref
; byrRegs -[rcx]
vpcmpgtd xmm6, xmm3, xmm2
@@ -129,7 +129,7 @@ G_M8683_IG07: ; bbWeight=1, epilog, nogc, extend
ret
;; size=16 bbWeight=1 PerfScore 9.25
-Total bytes of code 273, prolog size 18, PerfScore 78.42, instruction count 58, allocated bytes for code 277 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
+Total bytes of code 277, prolog size 18, PerfScore 78.42, instruction count 58, allocated bytes for code 281 (MethodHash=a7f2de14) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[float]):int (Tier1)
; ============================================================
Unwind Info:

+6 (+1.47%) : 393288.dasm - System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)

@@ -93,11 +93,11 @@ G_M46251_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0006 {rcx rdx},
vextractf128 xmm2, ymm2, 1
vcmpps xmm4, xmm1, xmm0, 14
vcmpps xmm5, xmm1, xmm0, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M46251_IG09
- ;; size=59 bbWeight=1 PerfScore 25.83
+ ;; size=61 bbWeight=1 PerfScore 25.83
G_M46251_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; byrRegs -[rcx rdx]
vpternlogd xmm5, xmm5, xmm4, 85
@@ -108,11 +108,11 @@ G_M46251_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpshufd xmm2, xmm3, 78
vcmpps xmm4, xmm1, xmm0, 14
vcmpps xmm5, xmm1, xmm0, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne G_M46251_IG08
- ;; size=67 bbWeight=1 PerfScore 16.33
+ ;; size=69 bbWeight=1 PerfScore 16.33
G_M46251_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
vpternlogd xmm5, xmm5, xmm4, 85
vblendvps xmm1 xmm1, xmm0, xmm5
@@ -122,11 +122,11 @@ G_M46251_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
vpshufd xmm3, xmm0, -79
vcmpps xmm4, xmm1, xmm2, 14
vcmpps xmm5, xmm1, xmm2, 0
- vxorps xmm6, xmm6, xmm6
- vcmpps k1, xmm5, xmm6, 4
+ vxorps xmm16, xmm16, xmm16
+ vcmpps k1, xmm5, xmm16, 4
kortestb k1, k1
jne SHORT G_M46251_IG07
- ;; size=63 bbWeight=1 PerfScore 16.33
+ ;; size=65 bbWeight=1 PerfScore 16.33
G_M46251_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
vpternlogd xmm1, xmm1, xmm4, 85
vblendvps xmm0 xmm0, xmm3, xmm1
@@ -179,7 +179,7 @@ G_M46251_IG09: ; bbWeight=0.16, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
jmp G_M46251_IG03
;; size=54 bbWeight=0.16 PerfScore 1.09
-Total bytes of code 409, prolog size 24, PerfScore 86.73, instruction count 82, allocated bytes for code 415 (MethodHash=f1994b54) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
+Total bytes of code 415, prolog size 24, PerfScore 86.73, instruction count 82, allocated bytes for code 421 (MethodHash=f1994b54) for method System.Numerics.Tensors.TensorPrimitives:IndexOfFinalAggregate[float,System.Numerics.Tensors.TensorPrimitives+IndexOfMaxOperator`1[float]](System.Runtime.Intrinsics.Vector256`1[float],System.Runtime.Intrinsics.Vector256`1[float]):int (Tier1)
; ============================================================
Unwind Info:

realworld.run.windows.x64.checked.mch

+3 (+0.08%) : 1544.dasm - BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)

@@ -170,7 +170,7 @@
; V159 tmp109 [V159,T78] ( 6, 22 ) simd32 -> mm3 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
;* V160 tmp110 [V160 ] ( 0, 0 ) struct (96) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <BepuUtilities.Vector3Wide>
;* V161 tmp111 [V161 ] ( 0, 0 ) struct (96) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <BepuUtilities.Vector3Wide>
-; V162 tmp112 [V162,T148] ( 3, 10 ) simd32 -> mm7 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
+; V162 tmp112 [V162,T148] ( 3, 10 ) simd32 -> mm16 ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
; V163 tmp113 [V163,T95] ( 4, 16 ) simd32 -> mm0 ld-addr-op "Inline stloc first use temp" <System.Numerics.Vector`1[float]>
;* V164 tmp114 [V164 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
;* V165 tmp115 [V165 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op "Inline ldloca(s) first use temp" <System.Numerics.Vector`1[float]>
@@ -324,7 +324,7 @@
; V313 cse6 [V313,T101] ( 4, 16 ) simd32 -> mm23 "CSE - conservative"
; V314 cse7 [V314,T65] ( 2, 5 ) long -> rdx hoist "CSE - conservative"
; V315 cse8 [V315,T64] ( 2, 5 ) byref -> [rbp+0x10] spill-single-def hoist "CSE - conservative"
-; V316 rat0 [V316,T77] ( 3, 24 ) simd32 -> mm8 "ReplaceWithLclVar is creating a new local variable"
+; V316 rat0 [V316,T77] ( 3, 24 ) simd32 -> mm7 "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 1848
@@ -982,16 +982,16 @@ G_M11466_IG22: ; bbWeight=4, extend
vaddps ymm16, ymm16, ymm21
vmovups ymm21, ymmword ptr [rbp+0x408]
vmulps ymm21, ymm21, ymmword ptr [rbp+0x408]
- vaddps ymm7, ymm16, ymm21
- vxorps ymm8, ymm8, ymm8
- vcmpps ymm8, ymm7, ymm8, 14
- vptest ymm8, ymm8
+ vaddps ymm16, ymm16, ymm21
+ vxorps ymm21, ymm21, ymm21
+ vcmpps ymm7, ymm16, ymm21, 14
+ vptest ymm7, ymm7
je G_M11466_IG24
- ;; size=328 bbWeight=4 PerfScore 689.33
+ ;; size=329 bbWeight=4 PerfScore 689.33
G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r12}, byref
- vmulps ymm16, ymm19, ymm19
+ vmulps ymm19, ymm19, ymm19
vmulps ymm2, ymm2, ymm2
- vaddps ymm2, ymm16, ymm2
+ vaddps ymm2, ymm19, ymm2
vmulps ymm0, ymm0, ymm0
vaddps ymm0, ymm2, ymm0
vsqrtps ymm0, ymm0
@@ -1000,14 +1000,14 @@ G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
vmovups ymm14, ymmword ptr [rbp+0x60]
vmulps ymm2, ymm14, ymm14
vmovups ymm15, ymmword ptr [rbp+0x40]
- vmulps ymm16, ymm15, ymm15
- vaddps ymm2, ymm2, ymm16
- vmovups ymm8, ymmword ptr [rbp+0x20]
- vmulps ymm16, ymm8, ymm8
- vaddps ymm2, ymm2, ymm16
+ vmulps ymm19, ymm15, ymm15
+ vaddps ymm2, ymm2, ymm19
+ vmovups ymm7, ymmword ptr [rbp+0x20]
+ vmulps ymm19, ymm7, ymm7
+ vaddps ymm2, ymm2, ymm19
vsqrtps ymm2, ymm2
vaddps ymm0, ymm0, ymm2
- vsqrtps ymm2, ymm7
+ vsqrtps ymm2, ymm16
vaddps ymm16, ymm0, ymmword ptr [rbp+0x380]
vaddps ymm0, ymm0, ymmword ptr [rbp+0x360]
vmulps ymm2, ymm2, ymm16
@@ -1031,7 +1031,7 @@ G_M11466_IG23: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
vaddps ymm4, ymm4, ymm0
vaddps ymm5, ymm5, ymm0
vaddps ymm1, ymm1, ymm0
- ;; size=240 bbWeight=2 PerfScore 348.00
+ ;; size=242 bbWeight=2 PerfScore 348.00
G_M11466_IG24: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r12}, byref, isz
vmovups ymm0, ymmword ptr [rbp+0x3A0]
vminps ymm4, ymm0, ymm4
@@ -1062,14 +1062,14 @@ G_M11466_IG24: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=1008 {rbx r1
vmovups ymm15, ymmword ptr [rbp+0x40]
vaddps ymm0, ymm15, ymmword ptr [rbp+0x240]
vmovups ymmword ptr [rbp+0x240], ymm0
- vmovups ymm8, ymmword ptr [rbp+0x20]
- vaddps ymm0, ymm8, ymmword ptr [rbp+0x260]
+ vmovups ymm7, ymmword ptr [rbp+0x20]
+ vaddps ymm0, ymm7, ymmword ptr [rbp+0x260]
vmovups ymmword ptr [rbp+0x260], ymm0
vaddps ymm0, ymm14, ymmword ptr [rbp+0x1C0]
vmovups ymmword ptr [rbp+0x1C0], ymm0
vaddps ymm0, ymm15, ymmword ptr [rbp+0x1E0]
vmovups ymmword ptr [rbp+0x1E0], ymm0
- vaddps ymm0, ymm8, ymmword ptr [rbp+0x200]
+ vaddps ymm0, ymm7, ymmword ptr [rbp+0x200]
vmovups ymmword ptr [rbp+0x200], ymm0
xor ecx, ecx
test r14d, r14d
@@ -1208,7 +1208,7 @@ RWD76 dd 3AB60B61h ; 0.00138889
RWD80 dd C0000000h ; -2
-Total bytes of code 3990, prolog size 154, PerfScore 10975.17, instruction count 746, allocated bytes for code 3990 (MethodHash=0979d335) for method BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
+Total bytes of code 3993, prolog size 154, PerfScore 10975.17, instruction count 746, allocated bytes for code 3993 (MethodHash=0979d335) for method BepuPhysics.CollisionDetection.CollisionTasks.CompoundPairOverlapFinder`2[BepuPhysics.Collidables.Compound,BepuPhysics.Collidables.Compound]:FindLocalOverlaps(byref,int,BepuUtilities.Memory.BufferPool,BepuPhysics.Collidables.Shapes,float,byref):this (FullOpts)
; ============================================================
Unwind Info:
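
Each example diff above ends with a pair of `Total bytes of code` footer lines, one for base (`-`) and one for diff (`+`). As a rough sketch for reading them programmatically, the snippet below extracts the numeric fields and computes the per-method delta. The helper names `parse_footer` and `footer_delta` are hypothetical and are not part of superpmi.py or jit-analyze; the regular expression assumes the footer layout shown in this report. The sample lines are the FindLocalOverlaps footers, truncated after the allocated-bytes field.

```python
import re

# Hypothetical helpers (not part of superpmi.py or jit-analyze).
# The pattern assumes the footer layout used in the example diffs above.
FOOTER = re.compile(
    r"Total bytes of code (?P<size>\d+), prolog size (?P<prolog>\d+), "
    r"PerfScore (?P<perf>[\d.]+), instruction count (?P<insts>\d+)"
)


def parse_footer(line: str) -> dict:
    """Extract code size, prolog size, PerfScore and instruction count."""
    m = FOOTER.search(line)
    if m is None:
        raise ValueError("not a 'Total bytes of code' footer")
    return {
        "size": int(m["size"]),
        "prolog": int(m["prolog"]),
        "perf": float(m["perf"]),
        "insts": int(m["insts"]),
    }


def footer_delta(base_line: str, diff_line: str) -> dict:
    """Return diff minus base for every parsed field."""
    base, diff = parse_footer(base_line), parse_footer(diff_line)
    return {key: diff[key] - base[key] for key in base}


# Footer pair copied from the FindLocalOverlaps example (truncated).
base = ("-Total bytes of code 3990, prolog size 154, PerfScore 10975.17, "
        "instruction count 746, allocated bytes for code 3990")
diff = ("+Total bytes of code 3993, prolog size 154, PerfScore 10975.17, "
        "instruction count 746, allocated bytes for code 3993")

delta = footer_delta(base, diff)
print(delta)
```

For this pair the size delta is +3 bytes with an unchanged instruction count, which agrees with the `+3 (+0.08%)` header of that example.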

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 5 | 0 | 0 | 5 | -0 | +0 |
| benchmarks.run.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_pgo.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| benchmarks.run_tiered.windows.x64.checked.mch | 1 | 0 | 0 | 1 | -0 | +0 |
| coreclr_tests.run.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.crossgen2.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x64.checked.mch | 1 | 1 | 0 | 0 | -16 | +0 |
| libraries_tests.run.windows.x64.Release.mch | 6 | 1 | 5 | 0 | -16 | +18 |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.windows.x64.checked.mch | 2 | 1 | 1 | 0 | -140 | +3 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 17 | 3 | 6 | 8 | -172 | +21 |
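
The last row of the table above is the column-wise total. As a quick sanity check, the sketch below re-adds the per-collection rows and compares them with that reported total; the numbers are copied from the table, while the shortened collection keys and variable names are illustrative only.

```python
# Rows copied from the "Improvements/regressions per collection" table.
# Columns: contexts with diffs, improvements, regressions, same size,
#          improvement bytes, regression bytes.
rows = {
    "aspnet.run": (5, 0, 0, 5, 0, 0),
    "benchmarks.run": (1, 0, 0, 1, 0, 0),
    "benchmarks.run_pgo": (1, 0, 0, 1, 0, 0),
    "benchmarks.run_tiered": (1, 0, 0, 1, 0, 0),
    "coreclr_tests.run": (0, 0, 0, 0, 0, 0),
    "libraries.crossgen2": (0, 0, 0, 0, 0, 0),
    "libraries.pmi": (1, 1, 0, 0, -16, 0),
    "libraries_tests.run": (6, 1, 5, 0, -16, 18),
    "librariestestsnotieredcompilation.run": (0, 0, 0, 0, 0, 0),
    "realworld.run": (2, 1, 1, 0, -140, 3),
    "smoke_tests.nativeaot": (0, 0, 0, 0, 0, 0),
}

# Totals row as reported in the table.
reported_total = (17, 3, 6, 8, -172, 21)

# Sum each column across all collections.
computed_total = tuple(sum(column) for column in zip(*rows.values()))
assert computed_total == reported_total

# Every context with diffs is an improvement, a regression, or same size.
for name, (contexts, improved, regressed, same, _, _) in rows.items():
    assert contexts == improved + regressed + same, name

# Net code-size change on windows x64: improvements plus regressions.
net_bytes = computed_total[4] + computed_total[5]
print(net_bytes)  # -151
```

The net of -151 bytes equals the sum of the non-zero `Total bytes of delta` values in the jit-analyze output for this platform (-16, +2 and -137).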

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 129,290 | 61,702 | 67,588 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run.windows.x64.checked.mch | 27,913 | 4 | 27,909 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 102,631 | 50,161 | 52,470 | 19 (0.02%) | 19 (0.02%) |
| benchmarks.run_tiered.windows.x64.checked.mch | 54,331 | 36,871 | 17,460 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x64.checked.mch | 573,719 | 341,128 | 232,591 | 8 (0.00%) | 8 (0.00%) |
| libraries.crossgen2.windows.x64.checked.mch | 2,104 | 0 | 2,104 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x64.checked.mch | 309,142 | 6 | 309,136 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.x64.Release.mch | 671,200 | 476,124 | 195,076 | 111 (0.02%) | 111 (0.02%) |
| librariestestsnotieredcompilation.run.windows.x64.Release.mch | 320,485 | 21,924 | 298,561 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x64.checked.mch | 36,840 | 3 | 36,837 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 67 | 0 | 67 | 0 (0.00%) | 0 (0.00%) |
| | 2,227,722 | 987,923 | 1,239,799 | 138 (0.01%) | 138 (0.01%) |

jit-analyze output

aspnet.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 47041738 (overridden on cmd)
Total bytes of diff: 47041738 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 5 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


benchmarks.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 8730756 (overridden on cmd)
Total bytes of diff: 8730756 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


benchmarks.run_pgo.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 35773696 (overridden on cmd)
Total bytes of diff: 35773696 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


benchmarks.run_tiered.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 12546772 (overridden on cmd)
Total bytes of diff: 12546772 (overridden on cmd)
Total bytes of delta: 0 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


libraries.pmi.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 61645293 (overridden on cmd)
Total bytes of diff: 61645277 (overridden on cmd)
Total bytes of delta: -16 (-0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


libraries_tests.run.windows.x64.Release.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 278809463 (overridden on cmd)
Total bytes of diff: 278809465 (overridden on cmd)
Total bytes of delta: 2 (0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 6 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).


realworld.run.windows.x64.checked.mch

To reproduce these diffs on Windows x64: superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 13946185 (overridden on cmd)
Total bytes of diff: 13946048 (overridden on cmd)
Total bytes of delta: -137 (-0.00 % of base)
    relative diff is a regression.

Detail diffs



0 total files with Code Size differences (0 improved, 0 regressed), 2 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed).