Assembly Diffs

linux arm

Diffs are based on 2,237,081 contexts (825,130 MinOpts, 1,411,951 FullOpts).

MISSED contexts: 70,976 (3.08%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.linux.arm.checked.mch | 46,198 | 5,279 | 40,919 | 1,202 (2.54%) | 1,202 (2.54%) |
| benchmarks.run_pgo.linux.arm.checked.mch | 159,584 | 58,093 | 101,491 | 3,243 (1.99%) | 3,243 (1.99%) |
| benchmarks.run_tiered.linux.arm.checked.mch | 71,534 | 38,077 | 33,457 | 945 (1.30%) | 945 (1.30%) |
| coreclr_tests.run.linux.arm.checked.mch | 471,885 | 259,093 | 212,792 | 7,156 (1.49%) | 7,156 (1.49%) |
| libraries.crossgen2.linux.arm.checked.mch | 195,441 | 14 | 195,427 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm.checked.mch | 271,663 | 6 | 271,657 | 7,766 (2.78%) | 7,766 (2.78%) |
| libraries_tests.run.linux.arm.Release.mch | 709,797 | 442,850 | 266,947 | 15,984 (2.20%) | 15,984 (2.20%) |
| librariestestsnotieredcompilation.run.linux.arm.Release.mch | 274,582 | 21,565 | 253,017 | 33,273 (10.81%) | 33,273 (10.81%) |
| realworld.run.linux.arm.checked.mch | 36,397 | 153 | 36,244 | 1,407 (3.72%) | 1,407 (3.72%) |
| | 2,237,081 | 825,130 | 1,411,951 | 70,976 (3.08%) | 70,976 (3.08%) |
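
The "Missed" percentages in this table appear to be taken against the number of attempted contexts (diffed plus missed) rather than against the diffed count alone. The Python sketch below checks a few rows under that assumption; `missed_percent` is a hypothetical helper written for this note, not part of the SuperPMI tooling.

```python
def missed_percent(diffed: int, missed: int) -> float:
    """Share of attempted contexts that were missed, in percent.

    Assumption (not stated in the report): attempted = diffed + missed.
    """
    attempted = diffed + missed
    return 100.0 * missed / attempted if attempted else 0.0


# (diffed contexts, missed contexts) copied from the linux arm table.
rows = {
    "benchmarks.run": (46_198, 1_202),
    "librariestestsnotieredcompilation.run": (274_582, 33_273),
    "total": (2_237_081, 70_976),
}

for name, (diffed, missed) in rows.items():
    print(f"{name}: {missed_percent(diffed, missed):.2f}%")
```

Dividing by the diffed count alone would give about 3.17% for the total row, which does not match the reported 3.08%.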


windows x86

Diffs are based on 2,299,121 contexts (840,463 MinOpts, 1,458,658 FullOpts).

MISSED contexts: 7 (0.00%)

Overall (-2,931 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.windows.x86.checked.mch | 7,123,696 | -113 |
| benchmarks.run_pgo.windows.x86.checked.mch | 45,854,626 | -122 |
| benchmarks.run_tiered.windows.x86.checked.mch | 9,444,502 | -113 |
| coreclr_tests.run.windows.x86.checked.mch | 309,424,823 | -576 |
| libraries.pmi.windows.x86.checked.mch | 49,148,609 | -498 |
| libraries_tests.run.windows.x86.Release.mch | 188,553,323 | -772 |
| librariestestsnotieredcompilation.run.windows.x86.Release.mch | 103,930,242 | -737 |

FullOpts (-2,931 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.windows.x86.checked.mch | 7,123,417 | -113 |
| benchmarks.run_pgo.windows.x86.checked.mch | 39,241,495 | -122 |
| benchmarks.run_tiered.windows.x86.checked.mch | 5,176,811 | -113 |
| coreclr_tests.run.windows.x86.checked.mch | 107,730,518 | -576 |
| libraries.pmi.windows.x86.checked.mch | 49,053,295 | -498 |
| libraries_tests.run.windows.x86.Release.mch | 90,396,173 | -772 |
| librariestestsnotieredcompilation.run.windows.x86.Release.mch | 95,260,534 | -737 |
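
The Overall and FullOpts tables list identical per-collection deltas, which suggests that every diff on windows x86 comes from FullOpts code. As a quick sanity check, the short Python sketch below sums the FullOpts rows and prints each delta relative to its base size; the dictionary keys are shortened collection names chosen for this note.

```python
# (base size, diff size) in bytes, copied from the FullOpts table.
fullopts = {
    "benchmarks.run": (7_123_417, -113),
    "benchmarks.run_pgo": (39_241_495, -122),
    "benchmarks.run_tiered": (5_176_811, -113),
    "coreclr_tests.run": (107_730_518, -576),
    "libraries.pmi": (49_053_295, -498),
    "libraries_tests.run": (90_396_173, -772),
    "librariestestsnotieredcompilation.run": (95_260_534, -737),
}

total_diff = sum(diff for _, diff in fullopts.values())
print(f"total: {total_diff:+,} bytes")

for name, (base, diff) in fullopts.items():
    # Relative change of the collection's FullOpts code size.
    print(f"{name}: {100.0 * diff / base:+.4f}%")
```

The sum matches the -2,931 bytes quoted in the section headings.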

Example diffs

benchmarks.run.windows.x86.checked.mch

-113 (-10.79%) : 22326.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

@@ -9,23 +9,23 @@ ; Final local variable assignments ; ; V00 arg0 [V00,T13] ( 4, 4 ) byref -> edi single-def
-; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0xB4] single-def
+; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0x74] single-def
; V02 arg2 [V02,T14] ( 3, 3 ) int -> [ebp+0x10] single-def ; V03 arg3 [V03,T15] ( 2, 2 ) struct ( 8) [ebp+0x08] do-not-enreg[S] single-def <System.ReadOnlySpan`1[ushort]>
-; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0xB8] spill-single-def
+; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0x78] spill-single-def
; V05 loc1 [V05,T00] ( 19, 93.50) byref -> ebx ; V06 loc2 [V06,T30] ( 5, 10 ) simd16 -> [ebp-0x1C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V07 loc3 [V07,T31] ( 5, 10 ) simd16 -> [ebp-0x2C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]>
-; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0xBC] single-def
+; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0x7C] single-def
; V09 loc5 [V09,T32] ( 3, 8.50) simd32 -> [ebp-0x4C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V10 loc6 [V10,T33] ( 3, 8.50) simd32 -> [ebp-0x6C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0xC0] spill-single-def
+; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0x80] spill-single-def
; V12 loc8 [V12,T20] ( 4, 14 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V13 loc9 [V13,T01] ( 5, 66 ) int -> esi ; V14 loc10 [V14,T07] ( 3, 32.50) byref -> edi ; V15 loc11 [V15,T21] ( 4, 14 ) simd16 -> mm2 <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V16 loc12 [V16,T02] ( 5, 66 ) int -> esi
-; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0xC4] spill-single-def
+; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0x84] spill-single-def
;* V18 tmp0 [V18 ] ( 0, 0 ) int -> zero-ref ;* V19 tmp1 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" ;* V20 tmp2 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" @@ -39,10 +39,10 @@ ; V28 tmp10 [V28,T23] ( 3, 12 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V29 tmp11 [V29,T24] ( 3, 12 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V30 tmp12 [V30,T25] ( 3, 12 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> [ebp-0x8C] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V32 tmp14 [V32,T35] ( 2, 8 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V33 tmp15 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> [ebp-0xAC] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V35 tmp17 [V35,T16] ( 4, 16 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V36 tmp18 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V37 tmp19 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -99,10 +99,10 @@ ;* V88 tmp70 [V88 ] ( 0, 0 ) int -> zero-ref "field V80._length (fldOffset=0x4)" P-INDEP ;* V89 tmp71 [V89 ] ( 0, 0 ) byref -> zero-ref "field V82._reference (fldOffset=0x0)" P-INDEP ;* V90 tmp72 [V90 ] ( 0, 0 ) int -> zero-ref "field V82._length (fldOffset=0x4)" P-INDEP
-; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0xC8] spill-single-def "V03.[000..004)"
-; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0xB0] spill-single-def "V03.[004..008)"
+; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0x88] spill-single-def "V03.[000..004)"
+; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0x70] spill-single-def "V03.[004..008)"
;
-; Lcl frame size = 188
+; Lcl frame size = 124
G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp @@ -110,25 +110,25 @@ G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} push edi push esi push ebx
- sub esp, 188
+ sub esp, 124
vzeroupper mov edi, ecx ; byrRegs +[edi] mov esi, edx ; byrRegs +[esi] mov ebx, dword ptr [ebp+0x10]
- ;; size=22 bbWeight=1 PerfScore 7.00
+ ;; size=19 bbWeight=1 PerfScore 7.00
G_M48875_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C0 {esi edi}, byref, isz mov eax, bword ptr [ebp+0x08] ; byrRegs +[eax]
- mov bword ptr [ebp-0xC8], eax
+ mov bword ptr [ebp-0x88], eax
; GC ptr vars +{V91} mov edx, dword ptr [ebp+0x0C]
- mov dword ptr [ebp-0xB0], edx
- mov eax, bword ptr [ebp-0xC8]
+ mov dword ptr [ebp-0x70], edx
+ mov eax, bword ptr [ebp-0x88]
cmp ebx, 16 jge SHORT G_M48875_IG04
- ;; size=29 bbWeight=1 PerfScore 6.25
+ ;; size=26 bbWeight=1 PerfScore 6.25
G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, gcvars, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -137,16 +137,16 @@ G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs= call [<unknown method>] ; gcrRegs -[ecx edx] ; byrRegs -[eax]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] ;; size=22 bbWeight=0.50 PerfScore 2.25 G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, byref mov dword ptr [ebp+0x10], ebx lea ecx, bword ptr [esi+2*ebx] ; byrRegs +[ecx]
- mov bword ptr [ebp-0xB8], ecx
+ mov bword ptr [ebp-0x78], ecx
; GC ptr vars +{V04}
- mov bword ptr [ebp-0xB4], esi
+ mov bword ptr [ebp-0x74], esi
; GC ptr vars +{V01} mov ebx, esi ; byrRegs +[ebx] @@ -156,7 +156,7 @@ G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {e vmovups xmmword ptr [ebp-0x2C], xmm1 cmp dword ptr [ebp+0x10], 32 jl G_M48875_IG17
- ;; size=49 bbWeight=1 PerfScore 16.75
+ ;; size=43 bbWeight=1 PerfScore 16.75
G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, gcvars, byref ; byrRegs -[esi edi] vmovaps ymm2, ymm0 @@ -167,9 +167,9 @@ G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc vmovups ymmword ptr [ebp-0x6C], ymm3 lea edi, bword ptr [ecx-0x40] ; byrRegs +[edi]
- mov bword ptr [ebp-0xC0], edi
+ mov bword ptr [ebp-0x80], edi
; GC ptr vars +{V11}
- ;; size=39 bbWeight=0.50 PerfScore 4.00
+ ;; size=36 bbWeight=0.50 PerfScore 4.00
G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref, isz ; byrRegs -[ecx edi] vmovups ymm4, ymmword ptr [ebx] @@ -184,41 +184,36 @@ G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, g vpand ymm5, ymm5, ymmword ptr [@RWD32] vmovups ymm7, ymmword ptr [@RWD64] vpshufb ymm5, ymm7, ymm5
- vmovups ymmword ptr [ebp-0xAC], ymm5
  vpand ymm6, ymm6, ymmword ptr [@RWD96]
  vpcmpub k1, ymm6, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm6, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm6, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm7, ymm5, ymm6, -54
- vpand ymm5, ymm7, ymmword ptr [ebp-0xAC]
+ vpblendmb ymm6 k1, ymm6, ymm7
+ vpand ymm5, ymm6, ymm5
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm5, ymm5, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
  vpxor ymm5, ymm5, ymm6
- vmovups ymmword ptr [ebp-0x8C], ymm5
  vpsrld ymm6, ymm4, 5
  vpand ymm6, ymm6, ymmword ptr [@RWD32]
  vmovups ymm7, ymmword ptr [@RWD64]
  vpshufb ymm6, ymm7, ymm6
  vpand ymm4, ymm4, ymmword ptr [@RWD96]
  vpcmpub k1, ymm4, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm4, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm4, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm5, ymm4, -54
- vpand ymm4, ymm7, ymm6
- vxorps ymm5, ymm5, ymm5
- vpcmpeqb ymm4, ymm4, ymm5
- vpcmpeqd ymm5, ymm5, ymm5
- vpxor ymm4, ymm4, ymm5
- vmovups ymm5, ymmword ptr [ebp-0x8C]
+ vpblendmb ymm4 k1, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
+ vxorps ymm6, ymm6, ymm6
+ vpcmpeqb ymm4, ymm4, ymm6
+ vpcmpeqd ymm6, ymm6, ymm6
+ vpxor ymm4, ymm4, ymm6
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
  je SHORT G_M48875_IG11
- ;; size=274 bbWeight=4 PerfScore 348.00
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref vpermq ymm4, ymm4, -40 vpmovmskb esi, ymm4 @@ -229,7 +224,7 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { lea edi, bword ptr [ebx+2*edi] ; byrRegs +[edi] movzx ecx, word ptr [edi]
- push dword ptr [ebp-0xB0]
+ push dword ptr [ebp-0x70]
movsx edx, cx mov ecx, eax ; byrRegs +[ecx] @@ -239,19 +234,19 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { jne SHORT G_M48875_IG12 blsr esi, esi jne SHORT G_M48875_IG10
- ;; size=40 bbWeight=16 PerfScore 192.00
+ ;; size=37 bbWeight=16 PerfScore 192.00
G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[edi] add ebx, 64
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
; byrRegs +[edi] cmp ebx, edi
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] vmovups ymm2, ymmword ptr [ebp-0x4C] vmovups ymm3, ymmword ptr [ebp-0x6C] jbe G_M48875_IG06
- mov ecx, bword ptr [ebp-0xB8]
+ mov ecx, bword ptr [ebp-0x78]
; byrRegs +[ecx] cmp ebx, ecx je SHORT G_M48875_IG14 @@ -261,13 +256,13 @@ G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {e ; byrRegs -[esi] cmp esi, 32 jle SHORT G_M48875_IG16
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
mov ebx, edi jmp G_M48875_IG06
- ;; size=65 bbWeight=4 PerfScore 75.00
+ ;; size=56 bbWeight=4 PerfScore 75.00
G_M48875_IG10: ; bbWeight=8, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[eax ecx edi]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] jmp SHORT G_M48875_IG08 ;; size=8 bbWeight=8 PerfScore 24.00 @@ -279,10 +274,10 @@ G_M48875_IG12: ; bbWeight=0.50, gcVars=0000000000001000 {V01}, gcrefRegs= ; GC ptr vars -{V04 V11 V91} mov eax, edi ; byrRegs +[eax]
- sub eax, dword ptr [ebp-0xB4]
+ sub eax, dword ptr [ebp-0x74]
; byrRegs -[eax] shr eax, 1
- ;; size=10 bbWeight=0.50 PerfScore 1.38
+ ;; size=7 bbWeight=0.50 PerfScore 1.38
G_M48875_IG13: ; bbWeight=0.50, epilog, nogc, extend vzeroupper lea esp, [ebp-0x0C] @@ -317,10 +312,10 @@ G_M48875_IG16: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc G_M48875_IG17: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, byref lea edi, bword ptr [ecx-0x20] ...
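
The main change in the hot loop (IG06) is that each `vpmovm2b` + `vpternlogd ..., -54` pair becomes a single masked `vpblendmb`. The immediate -54 is 0xCA as an unsigned byte, and in the ternary-logic truth table that value encodes the bitwise select `A ? B : C`. The Python sketch below models both sequences on a single byte lane, assuming the usual instruction semantics (`vpmovm2b` expands each mask bit to an all-ones or all-zeros byte; `vpblendmb` picks its second source where the mask bit is set). The function names are illustrative only and are not JIT or hardware APIs.

```python
from itertools import product

SELECT_IMM8 = -54 & 0xFF  # 0xCA, the immediate shown in the base listing


def ternlog(a: int, b: int, c: int, imm8: int) -> int:
    """Bitwise ternary logic on one byte lane.

    For every bit position, the three input bits form an index
    (a << 2) | (b << 1) | c into the 8-entry truth table imm8.
    """
    result = 0
    for bit in range(8):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


def select_via_ternlog(mask_bit: bool, if_set: int, if_clear: int) -> int:
    """Base sequence: expand the mask bit to a byte, then ternlog 0xCA."""
    lane_mask = 0xFF if mask_bit else 0x00  # models vpmovm2b
    return ternlog(lane_mask, if_set, if_clear, SELECT_IMM8)


def select_via_blend(mask_bit: bool, if_set: int, if_clear: int) -> int:
    """Diff sequence: the mask bit chooses the byte directly (models vpblendmb)."""
    return if_set if mask_bit else if_clear


SAMPLES = (0x00, 0x01, 0x5A, 0x80, 0xFF)
mismatches = [
    (m, x, y)
    for m, x, y in product((False, True), SAMPLES, SAMPLES)
    if select_via_ternlog(m, x, y) != select_via_blend(m, x, y)
]
print("0xCA behaves like a per-byte blend:", not mismatches)
```

Because the blend consumes `k1` directly, the vector copy of the mask is no longer needed, which is consistent with V31 and V34 staying in registers instead of being spilled in the diff.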

benchmarks.run_pgo.windows.x86.checked.mch

-122 (-11.36%) : 94556.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

@@ -9,23 +9,23 @@ ; Final local variable assignments ; ; V00 arg0 [V00,T13] ( 4, 4 ) byref -> edi single-def
-; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0xB4] single-def
+; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0x74] single-def
; V02 arg2 [V02,T14] ( 3, 3 ) int -> [ebp+0x10] single-def ; V03 arg3 [V03,T15] ( 2, 2 ) struct ( 8) [ebp+0x08] do-not-enreg[S] single-def <System.ReadOnlySpan`1[ushort]>
-; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0xB8] spill-single-def
+; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0x78] spill-single-def
; V05 loc1 [V05,T00] ( 19, 93.50) byref -> ebx ; V06 loc2 [V06,T30] ( 5, 10 ) simd16 -> [ebp-0x1C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V07 loc3 [V07,T31] ( 5, 10 ) simd16 -> [ebp-0x2C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]>
-; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0xBC] single-def
+; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0x7C] single-def
; V09 loc5 [V09,T32] ( 3, 8.50) simd32 -> [ebp-0x4C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V10 loc6 [V10,T33] ( 3, 8.50) simd32 -> [ebp-0x6C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0xC0] spill-single-def
+; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0x80] spill-single-def
; V12 loc8 [V12,T20] ( 4, 14 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V13 loc9 [V13,T01] ( 5, 66 ) int -> esi ; V14 loc10 [V14,T07] ( 3, 32.50) byref -> edi ; V15 loc11 [V15,T21] ( 4, 14 ) simd16 -> mm2 <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V16 loc12 [V16,T02] ( 5, 66 ) int -> esi
-; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0xC4] spill-single-def
+; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0x84] spill-single-def
;* V18 tmp0 [V18 ] ( 0, 0 ) int -> zero-ref ;* V19 tmp1 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" ;* V20 tmp2 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" @@ -39,10 +39,10 @@ ; V28 tmp10 [V28,T23] ( 3, 12 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V29 tmp11 [V29,T24] ( 3, 12 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V30 tmp12 [V30,T25] ( 3, 12 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> [ebp-0x8C] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V32 tmp14 [V32,T35] ( 2, 8 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V33 tmp15 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> [ebp-0xAC] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V35 tmp17 [V35,T16] ( 4, 16 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V36 tmp18 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V37 tmp19 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -99,10 +99,10 @@ ;* V88 tmp70 [V88 ] ( 0, 0 ) int -> zero-ref "field V80._length (fldOffset=0x4)" P-INDEP ;* V89 tmp71 [V89 ] ( 0, 0 ) byref -> zero-ref "field V82._reference (fldOffset=0x0)" P-INDEP ;* V90 tmp72 [V90 ] ( 0, 0 ) int -> zero-ref "field V82._length (fldOffset=0x4)" P-INDEP
-; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0xC8] spill-single-def "V03.[000..004)"
-; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0xB0] spill-single-def "V03.[004..008)"
+; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0x88] spill-single-def "V03.[000..004)"
+; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0x70] spill-single-def "V03.[004..008)"
;
-; Lcl frame size = 188
+; Lcl frame size = 124
G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp @@ -110,31 +110,31 @@ G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} push edi push esi push ebx
- sub esp, 188
+ sub esp, 124
vzeroupper mov edi, ecx ; byrRegs +[edi] mov esi, edx ; byrRegs +[esi] mov ebx, dword ptr [ebp+0x10]
- ;; size=22 bbWeight=1 PerfScore 7.00
+ ;; size=19 bbWeight=1 PerfScore 7.00
G_M48875_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C0 {esi edi}, byref mov eax, bword ptr [ebp+0x08] ; byrRegs +[eax]
- mov bword ptr [ebp-0xC8], eax
+ mov bword ptr [ebp-0x88], eax
; GC ptr vars +{V91} mov ecx, dword ptr [ebp+0x0C]
- mov dword ptr [ebp-0xB0], ecx
+ mov dword ptr [ebp-0x70], ecx
cmp ebx, 16 jl G_M48875_IG25
- ;; size=27 bbWeight=1 PerfScore 5.25
+ ;; size=24 bbWeight=1 PerfScore 5.25
G_M48875_IG03: ; bbWeight=1, gcVars=0000000000000020 {V91}, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, gcvars, byref mov dword ptr [ebp+0x10], ebx lea edx, bword ptr [esi+2*ebx] ; byrRegs +[edx]
- mov bword ptr [ebp-0xB8], edx
+ mov bword ptr [ebp-0x78], edx
; GC ptr vars +{V04}
- mov bword ptr [ebp-0xB4], esi
+ mov bword ptr [ebp-0x74], esi
; GC ptr vars +{V01} mov ebx, esi ; byrRegs +[ebx] @@ -144,7 +144,7 @@ G_M48875_IG03: ; bbWeight=1, gcVars=0000000000000020 {V91}, gcrefRegs=000 vmovups xmmword ptr [ebp-0x2C], xmm1 cmp dword ptr [ebp+0x10], 32 jl G_M48875_IG16
- ;; size=49 bbWeight=1 PerfScore 16.75
+ ;; size=43 bbWeight=1 PerfScore 16.75
G_M48875_IG04: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gcrefRegs=00000000 {}, byrefRegs=0000000D {eax edx ebx}, gcvars, byref ; byrRegs -[esi edi] vmovaps ymm2, ymm0 @@ -155,10 +155,10 @@ G_M48875_IG04: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc vmovups ymmword ptr [ebp-0x6C], ymm3 lea edi, bword ptr [edx-0x40] ; byrRegs +[edi]
- mov bword ptr [ebp-0xC0], edi
+ mov bword ptr [ebp-0x80], edi
; GC ptr vars +{V11}
- ;; size=39 bbWeight=0.50 PerfScore 4.00
-G_M48875_IG05: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref
+ ;; size=36 bbWeight=0.50 PerfScore 4.00
+G_M48875_IG05: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref, isz
; byrRegs -[edx edi] vmovups ymm4, ymmword ptr [ebx] vmovups ymm5, ymmword ptr [ebx+0x20] @@ -172,41 +172,36 @@ G_M48875_IG05: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, g vpand ymm5, ymm5, ymmword ptr [@RWD32] vmovups ymm7, ymmword ptr [@RWD64] vpshufb ymm5, ymm7, ymm5
- vmovups ymmword ptr [ebp-0xAC], ymm5
  vpand ymm6, ymm6, ymmword ptr [@RWD96]
  vpcmpub k1, ymm6, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm6, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm6, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm7, ymm5, ymm6, -54
- vpand ymm5, ymm7, ymmword ptr [ebp-0xAC]
+ vpblendmb ymm6 k1, ymm6, ymm7
+ vpand ymm5, ymm6, ymm5
  vxorps ymm6, ymm6, ymm6
  vpcmpeqb ymm5, ymm5, ymm6
  vpcmpeqd ymm6, ymm6, ymm6
  vpxor ymm5, ymm5, ymm6
- vmovups ymmword ptr [ebp-0x8C], ymm5
  vpsrld ymm6, ymm4, 5
  vpand ymm6, ymm6, ymmword ptr [@RWD32]
  vmovups ymm7, ymmword ptr [@RWD64]
  vpshufb ymm6, ymm7, ymm6
  vpand ymm4, ymm4, ymmword ptr [@RWD96]
  vpcmpub k1, ymm4, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm4, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm4, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
  vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm5, ymm4, -54
- vpand ymm4, ymm7, ymm6
- vxorps ymm5, ymm5, ymm5
- vpcmpeqb ymm4, ymm4, ymm5
- vpcmpeqd ymm5, ymm5, ymm5
- vpxor ymm4, ymm4, ymm5
- vmovups ymm5, ymmword ptr [ebp-0x8C]
+ vpblendmb ymm4 k1, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
+ vxorps ymm6, ymm6, ymm6
+ vpcmpeqb ymm4, ymm4, ymm6
+ vpcmpeqd ymm6, ymm6, ymm6
+ vpxor ymm4, ymm4, ymm6
  vpand ymm4, ymm5, ymm4
  vptest ymm4, ymm4
- je G_M48875_IG10
- ;; size=278 bbWeight=4 PerfScore 348.00
+ je SHORT G_M48875_IG10
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG06: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref vpermq ymm4, ymm4, -40 vpmovmskb esi, ymm4 @@ -231,16 +226,16 @@ G_M48875_IG07: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 { G_M48875_IG08: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[edi] add ebx, 64
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
; byrRegs +[edi] cmp ebx, edi
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax]
- mov ecx, dword ptr [ebp-0xB0]
+ mov ecx, dword ptr [ebp-0x70]
vmovups ymm2, ymmword ptr [ebp-0x4C] vmovups ymm3, ymmword ptr [ebp-0x6C] jbe G_M48875_IG05
- mov edx, bword ptr [ebp-0xB8]
+ mov edx, bword ptr [ebp-0x78]
; byrRegs +[edx] cmp ebx, edx je SHORT G_M48875_IG13 @@ -250,17 +245,17 @@ G_M48875_IG08: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {e ; byrRegs -[esi] cmp esi, 32 jle SHORT G_M48875_IG15
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
mov ebx, edi jmp G_M48875_IG05
- ;; size=71 bbWeight=4 PerfScore 79.00
+ ;; size=59 bbWeight=4 PerfScore 79.00
G_M48875_IG09: ; bbWeight=8, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz ; byrRegs -[eax edx edi]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax]
- mov ecx, dword ptr [ebp-0xB0]
+ mov ecx, dword ptr [ebp-0x70]
jmp SHORT G_M48875_IG07
- ;; size=14 bbWeight=8 PerfScore 32.00
+ ;; size=11 bbWeight=8 PerfScore 32.00
G_M48875_IG10: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref, isz jmp SHORT G_M48875_IG08 ;; size=2 bbWeight=2 PerfScore 4.00 @@ -269,10 +264,10 @@ G_M48875_IG11: ; bbWeight=0.50, gcVars=0000000000001000 {V01}, gcrefRegs= ; GC ptr vars -{V04 V11 V91} mov eax, edi ; byrRegs +[eax]
- sub eax, dword ptr [ebp-0xB4]
+ sub eax, dword ptr [ebp-0x74]
; byrRegs -[eax] shr eax, 1
- ;; size=10 bbWeight=0.50 PerfScore 1.38
+ ;; size=7 bbWeight=0.50 PerfScore 1.38
G_M48875_IG12: ; bbWeight=0.50, epilog, nogc, extend vzeroupper lea esp, [ebp-0x0C] @@ -307,9 +302,9 @@ G_M48875_IG15: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc G_M48875_IG16: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=0000000D {eax edx ebx}, byref lea edi, bword ptr [edx-0x20] ; byrRegs +[edi]
- mov bword ptr [ebp-0xBC], edi
+ mov bword ptr [ebp-0x7C], edi
; GC ptr vars +{V08}
- ;; size=9 bbWeight=0.50 PerfScore 0.75
+ ;; size=6 bbWeight=0.50 PerfScore 0.75
G_M48875_IG17: ; bbWeight=4, gcVars=0000000000001620 {V01 V04 V08 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref ; byrRegs -[edx edi] ; GC ptr vars -{V05 V09 V12} @@ -327,12 +322,11 @@ G_M48875_IG17: ; bbWeight=4, gcVars=0000000000001620 {V01 V04 V08 V91}, g vpshufb xmm3, xmm5, xmm3 vpand xmm4, xmm4, xmmword ptr [@RWD96] vpcmpub k1, xmm4, xmmword ptr [@RWD128], 6
- vpmovm2b xmm5, k1
- vpsubb xmm6, xmm4, xmmword ptr [@RWD160]
- vpshufb xmm6, xmm1, xmm6
...
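
Many of the smaller per-group size reductions follow from the frame shrinking from 188 to 124 bytes. That 64-byte difference matches the two 32-byte `simd32` temps (V31 and V34) that are no longer spilled, and it moves several stack slots to offsets that fit in a one-byte x86 displacement (-128..127) instead of a four-byte one. The Python sketch below reproduces that arithmetic from the offsets in the listing; `bytes_saved` is a hypothetical helper, and it only models the displacement field, not the full instruction encoding.

```python
def fits_disp8(disp: int) -> bool:
    """True if an [ebp+disp] operand can use a 1-byte displacement."""
    return -128 <= disp <= 127


def bytes_saved(old_disp: int, new_disp: int) -> int:
    """Encoding bytes saved per memory access when the displacement shrinks."""
    old_size = 1 if fits_disp8(old_disp) else 4
    new_size = 1 if fits_disp8(new_disp) else 4
    return old_size - new_size


# (base offset, diff offset) from ebp, copied from the local variable table.
slots = {
    "V01": (-0xB4, -0x74),
    "V04": (-0xB8, -0x78),
    "V08": (-0xBC, -0x7C),
    "V11": (-0xC0, -0x80),
    "V17": (-0xC4, -0x84),
    "V91": (-0xC8, -0x88),
    "V92": (-0xB0, -0x70),
}

frame_shrink = 188 - 124  # two 32-byte simd32 spill slots
print("frame shrink:", frame_shrink)

for name, (old, new) in slots.items():
    print(f"{name}: moved by {new - old}, saves {bytes_saved(old, new)} bytes per access")
```

Under this model V17 and V91 still need a four-byte displacement, so accesses to them do not get smaller. That agrees with groups such as IG02, where only the V92 store accounts for the 3-byte reduction.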

benchmarks.run_tiered.windows.x86.checked.mch

-113 (-10.79%) : 44440.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

@@ -9,23 +9,23 @@ ; Final local variable assignments ; ; V00 arg0 [V00,T13] ( 4, 4 ) byref -> edi single-def
-; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0xB4] single-def
+; V01 arg1 [V01,T12] ( 6, 5 ) byref -> [ebp-0x74] single-def
; V02 arg2 [V02,T14] ( 3, 3 ) int -> [ebp+0x10] single-def ; V03 arg3 [V03,T15] ( 2, 2 ) struct ( 8) [ebp+0x08] do-not-enreg[S] single-def <System.ReadOnlySpan`1[ushort]>
-; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0xB8] spill-single-def
+; V04 loc0 [V04,T09] ( 7, 14.50) byref -> [ebp-0x78] spill-single-def
; V05 loc1 [V05,T00] ( 19, 93.50) byref -> ebx ; V06 loc2 [V06,T30] ( 5, 10 ) simd16 -> [ebp-0x1C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V07 loc3 [V07,T31] ( 5, 10 ) simd16 -> [ebp-0x2C] spill-single-def <System.Runtime.Intrinsics.Vector128`1[ubyte]>
-; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0xBC] single-def
+; V08 loc4 [V08,T10] ( 3, 8.50) byref -> [ebp-0x7C] single-def
; V09 loc5 [V09,T32] ( 3, 8.50) simd32 -> [ebp-0x4C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V10 loc6 [V10,T33] ( 3, 8.50) simd32 -> [ebp-0x6C] spill-single-def <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0xC0] spill-single-def
+; V11 loc7 [V11,T11] ( 3, 8.50) byref -> [ebp-0x80] spill-single-def
; V12 loc8 [V12,T20] ( 4, 14 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V13 loc9 [V13,T01] ( 5, 66 ) int -> esi ; V14 loc10 [V14,T07] ( 3, 32.50) byref -> edi ; V15 loc11 [V15,T21] ( 4, 14 ) simd16 -> mm2 <System.Runtime.Intrinsics.Vector128`1[ubyte]> ; V16 loc12 [V16,T02] ( 5, 66 ) int -> esi
-; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0xC4] spill-single-def
+; V17 loc13 [V17,T08] ( 3, 32.50) byref -> [ebp-0x84] spill-single-def
;* V18 tmp0 [V18 ] ( 0, 0 ) int -> zero-ref ;* V19 tmp1 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" ;* V20 tmp2 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg" @@ -39,10 +39,10 @@ ; V28 tmp10 [V28,T23] ( 3, 12 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V29 tmp11 [V29,T24] ( 3, 12 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V30 tmp12 [V30,T25] ( 3, 12 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> [ebp-0x8C] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V31 tmp13 [V31,T34] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V32 tmp14 [V32,T35] ( 2, 8 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V33 tmp15 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> [ebp-0xAC] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V34 tmp16 [V34,T36] ( 2, 8 ) simd32 -> mm5 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V35 tmp17 [V35,T16] ( 4, 16 ) simd32 -> mm6 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V36 tmp18 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V37 tmp19 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -99,10 +99,10 @@ ;* V88 tmp70 [V88 ] ( 0, 0 ) int -> zero-ref "field V80._length (fldOffset=0x4)" P-INDEP ;* V89 tmp71 [V89 ] ( 0, 0 ) byref -> zero-ref "field V82._reference (fldOffset=0x0)" P-INDEP ;* V90 tmp72 [V90 ] ( 0, 0 ) int -> zero-ref "field V82._length (fldOffset=0x4)" P-INDEP
-; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0xC8] spill-single-def "V03.[000..004)"
-; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0xB0] spill-single-def "V03.[004..008)"
+; V91 tmp73 [V91,T05] ( 3, 33 ) byref -> [ebp-0x88] spill-single-def "V03.[000..004)"
+; V92 tmp74 [V92,T06] ( 3, 33 ) int -> [ebp-0x70] spill-single-def "V03.[004..008)"
;
-; Lcl frame size = 188
+; Lcl frame size = 124
G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp @@ -110,25 +110,25 @@ G_M48875_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} push edi push esi push ebx
- sub esp, 188
+ sub esp, 124
vzeroupper mov edi, ecx ; byrRegs +[edi] mov esi, edx ; byrRegs +[esi] mov ebx, dword ptr [ebp+0x10]
- ;; size=22 bbWeight=1 PerfScore 7.00
+ ;; size=19 bbWeight=1 PerfScore 7.00
G_M48875_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C0 {esi edi}, byref, isz mov eax, bword ptr [ebp+0x08] ; byrRegs +[eax]
- mov bword ptr [ebp-0xC8], eax
+ mov bword ptr [ebp-0x88], eax
; GC ptr vars +{V91} mov edx, dword ptr [ebp+0x0C]
- mov dword ptr [ebp-0xB0], edx
- mov eax, bword ptr [ebp-0xC8]
+ mov dword ptr [ebp-0x70], edx
+ mov eax, bword ptr [ebp-0x88]
cmp ebx, 16 jge SHORT G_M48875_IG04
- ;; size=29 bbWeight=1 PerfScore 6.25
+ ;; size=26 bbWeight=1 PerfScore 6.25
G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, gcvars, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -137,16 +137,16 @@ G_M48875_IG03: ; bbWeight=0.50, gcVars=0000000000000020 {V91}, gcrefRegs= call [<unknown method>] ; gcrRegs -[ecx edx] ; byrRegs -[eax]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax] ;; size=22 bbWeight=0.50 PerfScore 2.25 G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {eax esi edi}, byref mov dword ptr [ebp+0x10], ebx lea ecx, bword ptr [esi+2*ebx] ; byrRegs +[ecx]
- mov bword ptr [ebp-0xB8], ecx
+ mov bword ptr [ebp-0x78], ecx
; GC ptr vars +{V04}
- mov bword ptr [ebp-0xB4], esi
+ mov bword ptr [ebp-0x74], esi
; GC ptr vars +{V01} mov ebx, esi ; byrRegs +[ebx] @@ -156,7 +156,7 @@ G_M48875_IG04: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=000000C1 {e vmovups xmmword ptr [ebp-0x2C], xmm1 cmp dword ptr [ebp+0x10], 32 jl G_M48875_IG17
- ;; size=49 bbWeight=1 PerfScore 16.75
+ ;; size=43 bbWeight=1 PerfScore 16.75
G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, gcvars, byref ; byrRegs -[esi edi] vmovaps ymm2, ymm0 @@ -167,9 +167,9 @@ G_M48875_IG05: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc vmovups ymmword ptr [ebp-0x6C], ymm3 lea edi, bword ptr [ecx-0x40] ; byrRegs +[edi]
- mov bword ptr [ebp-0xC0], edi
+ mov bword ptr [ebp-0x80], edi
; GC ptr vars +{V11}
- ;; size=39 bbWeight=0.50 PerfScore 4.00
+ ;; size=36 bbWeight=0.50 PerfScore 4.00
G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, gcvars, byref, isz
; byrRegs -[ecx edi]
vmovups ymm4, ymmword ptr [ebx]
@@ -184,41 +184,36 @@ G_M48875_IG06: ; bbWeight=4, gcVars=0000000000001A20 {V01 V04 V11 V91}, g
vpand ymm5, ymm5, ymmword ptr [@RWD32]
vmovups ymm7, ymmword ptr [@RWD64]
vpshufb ymm5, ymm7, ymm5
- vmovups ymmword ptr [ebp-0xAC], ymm5
vpand ymm6, ymm6, ymmword ptr [@RWD96]
vpcmpub k1, ymm6, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm6, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm6, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm6, ymm2, ymm6
- vpternlogd ymm7, ymm5, ymm6, -54
- vpand ymm5, ymm7, ymmword ptr [ebp-0xAC]
+ vpblendmb ymm6 k1, ymm6, ymm7
+ vpand ymm5, ymm6, ymm5
vxorps ymm6, ymm6, ymm6
vpcmpeqb ymm5, ymm5, ymm6
vpcmpeqd ymm6, ymm6, ymm6
vpxor ymm5, ymm5, ymm6
- vmovups ymmword ptr [ebp-0x8C], ymm5
vpsrld ymm6, ymm4, 5
vpand ymm6, ymm6, ymmword ptr [@RWD32]
vmovups ymm7, ymmword ptr [@RWD64]
vpshufb ymm6, ymm7, ymm6
vpand ymm4, ymm4, ymmword ptr [@RWD96]
vpcmpub k1, ymm4, ymmword ptr [@RWD128], 6
- vpmovm2b ymm7, k1
- vpsubb ymm5, ymm4, ymmword ptr [@RWD160]
- vpshufb ymm5, ymm3, ymm5
+ vpsubb ymm7, ymm4, ymmword ptr [@RWD160]
+ vpshufb ymm7, ymm3, ymm7
vpshufb ymm4, ymm2, ymm4
- vpternlogd ymm7, ymm5, ymm4, -54
- vpand ymm4, ymm7, ymm6
- vxorps ymm5, ymm5, ymm5
- vpcmpeqb ymm4, ymm4, ymm5
- vpcmpeqd ymm5, ymm5, ymm5
- vpxor ymm4, ymm4, ymm5
- vmovups ymm5, ymmword ptr [ebp-0x8C]
+ vpblendmb ymm4 k1, ymm4, ymm7
+ vpand ymm4, ymm4, ymm6
+ vxorps ymm6, ymm6, ymm6
+ vpcmpeqb ymm4, ymm4, ymm6
+ vpcmpeqd ymm6, ymm6, ymm6
+ vpxor ymm4, ymm4, ymm6
vpand ymm4, ymm5, ymm4
vptest ymm4, ymm4
je SHORT G_M48875_IG11
- ;; size=274 bbWeight=4 PerfScore 348.00
+ ;; size=234 bbWeight=4 PerfScore 313.33
G_M48875_IG07: ; bbWeight=2, gcrefRegs=00000000 {}, byrefRegs=00000009 {eax ebx}, byref
vpermq ymm4, ymm4, -40
vpmovmskb esi, ymm4
@@ -229,7 +224,7 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 {
lea edi, bword ptr [ebx+2*edi]
; byrRegs +[edi]
movzx ecx, word ptr [edi]
- push dword ptr [ebp-0xB0]
+ push dword ptr [ebp-0x70]
movsx edx, cx
mov ecx, eax
; byrRegs +[ecx]
@@ -239,19 +234,19 @@ G_M48875_IG08: ; bbWeight=16, gcrefRegs=00000000 {}, byrefRegs=00000009 {
jne SHORT G_M48875_IG12
blsr esi, esi
jne SHORT G_M48875_IG10
- ;; size=40 bbWeight=16 PerfScore 192.00
+ ;; size=37 bbWeight=16 PerfScore 192.00
G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz
; byrRegs -[edi]
add ebx, 64
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
; byrRegs +[edi]
cmp ebx, edi
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax]
vmovups ymm2, ymmword ptr [ebp-0x4C]
vmovups ymm3, ymmword ptr [ebp-0x6C]
jbe G_M48875_IG06
- mov ecx, bword ptr [ebp-0xB8]
+ mov ecx, bword ptr [ebp-0x78]
; byrRegs +[ecx]
cmp ebx, ecx
je SHORT G_M48875_IG14
@@ -261,13 +256,13 @@ G_M48875_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000008 {e
; byrRegs -[esi]
cmp esi, 32
jle SHORT G_M48875_IG16
- mov edi, bword ptr [ebp-0xC0]
+ mov edi, bword ptr [ebp-0x80]
mov ebx, edi
jmp G_M48875_IG06
- ;; size=65 bbWeight=4 PerfScore 75.00
+ ;; size=56 bbWeight=4 PerfScore 75.00
G_M48875_IG10: ; bbWeight=8, gcrefRegs=00000000 {}, byrefRegs=00000008 {ebx}, byref, isz
; byrRegs -[eax ecx edi]
- mov eax, bword ptr [ebp-0xC8]
+ mov eax, bword ptr [ebp-0x88]
; byrRegs +[eax]
jmp SHORT G_M48875_IG08
;; size=8 bbWeight=8 PerfScore 24.00
@@ -279,10 +274,10 @@ G_M48875_IG12: ; bbWeight=0.50, gcVars=0000000000001000 {V01}, gcrefRegs=
; GC ptr vars -{V04 V11 V91}
mov eax, edi
; byrRegs +[eax]
- sub eax, dword ptr [ebp-0xB4]
+ sub eax, dword ptr [ebp-0x74]
; byrRegs -[eax]
shr eax, 1
- ;; size=10 bbWeight=0.50 PerfScore 1.38
+ ;; size=7 bbWeight=0.50 PerfScore 1.38
G_M48875_IG13: ; bbWeight=0.50, epilog, nogc, extend
vzeroupper
lea esp, [ebp-0x0C]
@@ -317,10 +312,10 @@ G_M48875_IG16: ; bbWeight=0.50, gcVars=0000000000001220 {V01 V04 V91}, gc
G_M48875_IG17: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=0000000B {eax ecx ebx}, byref
lea edi, bword ptr [ecx-0x20]
...

coreclr_tests.run.windows.x86.checked.mch

-48 (-3.16%) : 207946.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)

@@ -89,10 +89,9 @@ G_M5927_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, vpbroadcastd ymm2, ecx vmovups ymmword ptr [ebp-0x68], ymm2 vpcmpud k1, ymm2, ymm1, 1
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=60 bbWeight=1 PerfScore 14.00
+ ;; size=54 bbWeight=1 PerfScore 12.83
G_M5927_IG03: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -140,10 +139,9 @@ G_M5927_IG05: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG06: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpud k1, ymm1, ymm2, 1
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG07: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -191,10 +189,9 @@ G_M5927_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG10: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm2, ymm1, 6
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG11: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -242,10 +239,9 @@ G_M5927_IG13: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG14: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm1, ymm2, 6
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG15: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -293,10 +289,9 @@ G_M5927_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpud k1, ymm2, ymmword ptr [ebp-0x28], 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=25 bbWeight=1 PerfScore 9.58
G_M5927_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -344,10 +339,9 @@ G_M5927_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm2, ymm1, 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -395,10 +389,9 @@ G_M5927_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpud k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=25 bbWeight=1 PerfScore 9.58
G_M5927_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -446,10 +439,9 @@ G_M5927_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm1, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=25 bbWeight=1 PerfScore 9.58
G_M5927_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -604,6 +596,6 @@ G_M5927_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1519, prolog size 26, PerfScore 1148.08, instruction count 339, allocated bytes for code 1519 (MethodHash=3106e8d8) for method VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)
+; Total bytes of code 1471, prolog size 26, PerfScore 1138.75, instruction count 331, allocated bytes for code 1471 (MethodHash=3106e8d8) for method VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)
; ============================================================

-48 (-3.16%) : 469367.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)

@@ -89,10 +89,9 @@ G_M5927_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, vpbroadcastd ymm2, ecx vmovups ymmword ptr [ebp-0x68], ymm2 vpcmpud k1, ymm2, ymm1, 1
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=60 bbWeight=1 PerfScore 14.00
+ ;; size=54 bbWeight=1 PerfScore 12.83
G_M5927_IG03: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -140,10 +139,9 @@ G_M5927_IG05: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG06: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpud k1, ymm1, ymm2, 1
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG07: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -191,10 +189,9 @@ G_M5927_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG10: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm2, ymm1, 6
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG11: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -242,10 +239,9 @@ G_M5927_IG13: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG14: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm1, ymm2, 6
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG15: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -293,10 +289,9 @@ G_M5927_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpud k1, ymm2, ymmword ptr [ebp-0x28], 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=25 bbWeight=1 PerfScore 9.58
G_M5927_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -344,10 +339,9 @@ G_M5927_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm2, ymm1, 2
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 8.75
+ ;; size=21 bbWeight=1 PerfScore 7.58
G_M5927_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -395,10 +389,9 @@ G_M5927_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x68] vpcmpud k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=25 bbWeight=1 PerfScore 9.58
G_M5927_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -446,10 +439,9 @@ G_M5927_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, G_M5927_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm1, ymmword ptr [ebp-0x28], 5
- vpmovm2d ymm3, k1
- vpternlogd ymm3, ymm2, ymm1, -54
+ vpblendmd ymm3 k1, ymm1, ymm2
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 10.75
+ ;; size=25 bbWeight=1 PerfScore 9.58
G_M5927_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -604,6 +596,6 @@ G_M5927_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1519, prolog size 26, PerfScore 1148.08, instruction count 339, allocated bytes for code 1519 (MethodHash=3106e8d8) for method VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
+; Total bytes of code 1471, prolog size 26, PerfScore 1138.75, instruction count 331, allocated bytes for code 1471 (MethodHash=3106e8d8) for method VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
; ============================================================

-48 (-3.12%) : 207941.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)

@@ -91,10 +91,9 @@ G_M38654_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} vmovups ymmword ptr [ebp-0x48], ymm2 vmovups ymmword ptr [ebp-0x68], ymm0 vpcmpub k1, ymm0, ymm2, 1
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=60 bbWeight=1 PerfScore 13.50
+ ;; size=54 bbWeight=1 PerfScore 13.00
G_M38654_IG03: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -142,10 +141,9 @@ G_M38654_IG05: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG06: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpub k1, ymm2, ymm0, 1
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=21 bbWeight=1 PerfScore 9.25
G_M38654_IG07: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -193,10 +191,9 @@ G_M38654_IG09: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG10: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpub k1, ymm0, ymm2, 6
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=21 bbWeight=1 PerfScore 9.25
G_M38654_IG11: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -244,10 +241,9 @@ G_M38654_IG13: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG14: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpub k1, ymm2, ymm0, 6
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=21 bbWeight=1 PerfScore 9.25
G_M38654_IG15: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -295,10 +291,9 @@ G_M38654_IG17: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpub k1, ymm0, ymmword ptr [ebp-0x28], 2
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=25 bbWeight=1 PerfScore 11.25
G_M38654_IG19: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -346,10 +341,9 @@ G_M38654_IG21: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG22: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpub k1, ymm0, ymm2, 2
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=27 bbWeight=1 PerfScore 9.75
+ ;; size=21 bbWeight=1 PerfScore 9.25
G_M38654_IG23: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -397,10 +391,9 @@ G_M38654_IG25: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG26: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm0, ymmword ptr [ebp-0x68] vpcmpub k1, ymm0, ymmword ptr [ebp-0x28], 5
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=25 bbWeight=1 PerfScore 11.25
G_M38654_IG27: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -448,10 +441,9 @@ G_M38654_IG29: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M38654_IG30: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref vmovups ymm2, ymmword ptr [ebp-0x48] vpcmpub k1, ymm2, ymmword ptr [ebp-0x28], 5
- vpmovm2b ymm3, k1
- vpternlogd ymm3, ymm0, ymm2, -54
+ vpblendmb ymm3 k1, ymm2, ymm0
xor esi, esi
- ;; size=31 bbWeight=1 PerfScore 11.75
+ ;; size=25 bbWeight=1 PerfScore 11.25
G_M38654_IG31: ; bbWeight=4, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz mov ecx, esi vmovups ymmword ptr [ebp-0x88], ymm3 @@ -606,6 +598,6 @@ G_M38654_IG42: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} ret ;; size=10 bbWeight=1 PerfScore 4.00
-; Total bytes of code 1539, prolog size 26, PerfScore 1154.58, instruction count 340, allocated bytes for code 1539 (MethodHash=43726901) for method VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
+; Total bytes of code 1491, prolog size 26, PerfScore 1150.58, instruction count 332, allocated bytes for code 1491 (MethodHash=43726901) for method VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
; ============================================================

-48 (-1.06%) : 469368.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)

@@ -141,9 +141,8 @@ G_M5886_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, mov dword ptr [ebp-0x48], edx vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x44], 1
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -151,7 +150,7 @@ G_M5886_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor ecx, dword ptr [ebp-0x44] xor edx, dword ptr [ebp-0x40] or ecx, edx
- ;; size=217 bbWeight=1 PerfScore 74.75
+ ;; size=211 bbWeight=1 PerfScore 73.75
G_M5886_IG03: ; bbWeight=1, isz, extend je SHORT G_M5886_IG05 ;; size=2 bbWeight=1 PerfScore 1.00 @@ -283,9 +282,8 @@ G_M5886_IG07: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG08: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x24], 1
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -294,7 +292,7 @@ G_M5886_IG08: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M5886_IG10
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG09: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -423,9 +421,8 @@ G_M5886_IG12: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG13: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x44], 6
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -434,7 +431,7 @@ G_M5886_IG13: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M5886_IG15
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG14: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -563,9 +560,8 @@ G_M5886_IG17: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x24], 6
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -574,7 +570,7 @@ G_M5886_IG18: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M5886_IG20
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG19: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -703,9 +699,8 @@ G_M5886_IG22: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x64], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -714,7 +709,7 @@ G_M5886_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M5886_IG25
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG24: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -843,9 +838,8 @@ G_M5886_IG27: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x44], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -854,7 +848,7 @@ G_M5886_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M5886_IG30
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG29: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -983,9 +977,8 @@ G_M5886_IG32: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -994,7 +987,7 @@ G_M5886_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M5886_IG35
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG34: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1123,9 +1116,8 @@ G_M5886_IG37: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M5886_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpuq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -1134,7 +1126,7 @@ G_M5886_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M5886_IG40
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M5886_IG39: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1546,6 +1538,6 @@ G_M5886_IG53: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, ret 16 ;; size=11 bbWeight=1 PerfScore 4.50
-; Total bytes of code 4520, prolog size 32, PerfScore 925.83, instruction count 1041, allocated bytes for code 4520 (MethodHash=7af5e901) for method VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
+; Total bytes of code 4472, prolog size 32, PerfScore 917.83, instruction count 1033, allocated bytes for code 4472 (MethodHash=7af5e901) for method VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
; ============================================================

-24 (-0.54%) : 207938.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)

@@ -699,9 +699,8 @@ G_M59915_IG22: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -710,7 +709,7 @@ G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG25
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG24: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -839,9 +838,8 @@ G_M59915_IG27: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x44], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -850,7 +848,7 @@ G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG30
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG29: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -979,9 +977,8 @@ G_M59915_IG32: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -990,7 +987,7 @@ G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG35
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG34: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1119,9 +1116,8 @@ G_M59915_IG37: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -1130,7 +1126,7 @@ G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG40
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG39: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1542,6 +1538,6 @@ G_M59915_IG53: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} ret 16 ;; size=11 bbWeight=1 PerfScore 4.50
-; Total bytes of code 4476, prolog size 32, PerfScore 917.83, instruction count 1037, allocated bytes for code 4476 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)
+; Total bytes of code 4452, prolog size 32, PerfScore 913.83, instruction count 1033, allocated bytes for code 4452 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)
; ============================================================

-24 (-0.54%) : 469362.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)

@@ -699,9 +699,8 @@ G_M59915_IG22: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -710,7 +709,7 @@ G_M59915_IG23: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG25
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG24: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -839,9 +838,8 @@ G_M59915_IG27: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x44], 2
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -850,7 +848,7 @@ G_M59915_IG28: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG30
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG29: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -979,9 +977,8 @@ G_M59915_IG32: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x24] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -990,7 +987,7 @@ G_M59915_IG33: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x20] or ecx, edx je SHORT G_M59915_IG35
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG34: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1119,9 +1116,8 @@ G_M59915_IG37: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz vmovups ymm0, ymmword ptr [ebp-0x44] vpcmpq k1, ymm0, ymmword ptr [ebp-0x64], 5
- vpmovm2q ymm0, k1
- vmovups ymm1, ymmword ptr [ebp-0x24]
- vpternlogq ymm0, ymm1, ymmword ptr [ebp-0x44], -54
+ vmovups ymm0, ymmword ptr [ebp-0x24]
+ vpblendmq ymm0 k1, ymmword ptr [ebp-0x44], ymm0
vmovd ecx, xmm0 vmovups ymmword ptr [ebp-0x84], ymm0 vmovaps ymm1, ymm0 @@ -1130,7 +1126,7 @@ G_M59915_IG38: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} xor edx, dword ptr [ebp-0x40] or ecx, edx je SHORT G_M59915_IG40
- ;; size=70 bbWeight=1 PerfScore 27.50
+ ;; size=64 bbWeight=1 PerfScore 26.50
G_M59915_IG39: ; bbWeight=0.50, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref mov ecx, 0xD1FFAB1E ; gcrRegs +[ecx] @@ -1542,6 +1538,6 @@ G_M59915_IG53: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} ret 16 ;; size=11 bbWeight=1 PerfScore 4.50
-; Total bytes of code 4476, prolog size 32, PerfScore 917.83, instruction count 1037, allocated bytes for code 4476 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
+; Total bytes of code 4452, prolog size 32, PerfScore 913.83, instruction count 1033, allocated bytes for code 4452 (MethodHash=e2e315f4) for method VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
; ============================================================
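
The recurring change in these hunks replaces a `vpmovm2*` + `vpternlog*` pair (immediate -54, which is 0xCA as an unsigned byte) with a single mask-driven `vpblendm*`. The sketch below illustrates why the two forms should compute the same result. It assumes the usual ternary-logic truth-table encoding, where the immediate is indexed by the bits of the destination and the two sources, and it models lanes as plain Python integers. All function names are hypothetical; this is not a simulation of the real instructions or of the JIT's operand ordering.

```python
import random

# The diff prints the immediate as a signed byte; as an unsigned byte it is 0xCA.
IMM_SELECT = -54 & 0xFF


def ternlog(a, b, c, imm8, width):
    """Bitwise ternary logic: result bit i is imm8[(a_i << 2) | (b_i << 1) | c_i]."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out


def mask_to_vector(k, lanes, lane_bits):
    """Model of expanding a mask register: each mask bit becomes an all-ones or all-zeros lane."""
    ones = (1 << lane_bits) - 1
    return [ones if (k >> i) & 1 else 0 for i in range(lanes)]


def old_sequence(k, if_set, if_clear, lane_bits):
    """Expand the mask to a vector, then select bitwise with the 0xCA truth table."""
    mask_vec = mask_to_vector(k, len(if_set), lane_bits)
    return [ternlog(m, s, c, IMM_SELECT, lane_bits)
            for m, s, c in zip(mask_vec, if_set, if_clear)]


def new_sequence(k, if_set, if_clear):
    """Mask-driven blend: pick each lane directly from the corresponding mask bit."""
    return [s if (k >> i) & 1 else c
            for i, (s, c) in enumerate(zip(if_set, if_clear))]


if __name__ == "__main__":
    rng = random.Random(0)
    lanes, lane_bits = 4, 64  # e.g. four 64-bit lanes in a 256-bit vector
    for _ in range(1000):
        k = rng.getrandbits(lanes)
        x = [rng.getrandbits(lane_bits) for _ in range(lanes)]
        y = [rng.getrandbits(lane_bits) for _ in range(lanes)]
        assert old_sequence(k, x, y, lane_bits) == new_sequence(k, x, y)
    print(hex(IMM_SELECT))
```

Because 0xCA encodes "first operand ? second : third" for every bit, and the expanded mask is uniform within a lane, the bitwise select degenerates to a per-lane select. That is what allows the mask-to-vector expansion to be dropped.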

libraries.pmi.windows.x86.checked.mch

-18 (-17.65%) : 273965.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -16,7 +16,7 @@ ;* V05 loc2 [V05 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V09 tmp4 [V09 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; ; Lcl frame size = 0 @@ -31,23 +31,20 @@ G_M27576_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M27576_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {ecx}, byref ; byrRegs +[ecx] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm1, zmm0, -54
- vpcmpub k1, zmm0, zmm1, 6
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [ecx], zmm2
- ;; size=69 bbWeight=1 PerfScore 16.33
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 k2, zmm0, zmm1
+ vpcmpub k2, zmm0, zmm1, 6
+ vpblendmb zmm0 k2, zmm1, zmm0
+ vpblendmb zmm0 k1, zmm0, zmm2
+ vmovups zmmword ptr [ecx], zmm0
+ ;; size=51 bbWeight=1 PerfScore 14.83
G_M27576_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp ret 128 ;; size=7 bbWeight=1 PerfScore 3.50
-; Total bytes of code 102, prolog size 6, PerfScore 28.08, instruction count 19, allocated bytes for code 102 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 84, prolog size 6, PerfScore 26.58, instruction count 16, allocated bytes for code 84 (MethodHash=a5449447) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================

-18 (-17.65%) : 274022.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -16,7 +16,7 @@ ;* V05 loc2 [V05 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V06 tmp1 [V06 ] ( 0, 0 ) simd64 -> zero-ref single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 tmp2 [V07 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument"
-; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm2 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V08 tmp3 [V08,T03] ( 2, 2 ) simd64 -> mm0 single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V09 tmp4 [V09 ] ( 0, 0 ) simd64 -> zero-ref "Inline return value spill temp" <System.Runtime.Intrinsics.Vector512`1[ubyte]> ; ; Lcl frame size = 0 @@ -31,23 +31,20 @@ G_M10214_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M10214_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {ecx}, byref ; byrRegs +[ecx] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm0, zmm1, -54
- vpcmpub k1, zmm0, zmm1, 1
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [ecx], zmm2
- ;; size=69 bbWeight=1 PerfScore 16.33
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 k2, zmm1, zmm0
+ vpcmpub k2, zmm0, zmm1, 1
+ vpblendmb zmm0 k2, zmm1, zmm0
+ vpblendmb zmm0 k1, zmm0, zmm2
+ vmovups zmmword ptr [ecx], zmm0
+ ;; size=51 bbWeight=1 PerfScore 14.83
G_M10214_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp ret 128 ;; size=7 bbWeight=1 PerfScore 3.50
-; Total bytes of code 102, prolog size 6, PerfScore 28.08, instruction count 19, allocated bytes for code 102 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 84, prolog size 6, PerfScore 26.58, instruction count 16, allocated bytes for code 84 (MethodHash=6846d819) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================

-18 (-17.65%) : 273943.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)

@@ -13,7 +13,7 @@ ; V02 arg1 [V02,T02] ( 4, 4 ) simd64 -> mm1 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V03 loc0 [V03 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V04 loc1 [V04 ] ( 0, 0 ) simd64 -> zero-ref single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
-; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm2 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
+; V05 loc2 [V05,T03] ( 2, 2 ) simd64 -> mm0 single-def <System.Runtime.Intrinsics.Vector512`1[ubyte]>
;* V06 loc3 [V06 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V07 loc4 [V07 ] ( 0, 0 ) simd64 -> zero-ref <System.Runtime.Intrinsics.Vector512`1[ubyte]> ;* V08 loc5 [V08 ] ( 0, 0 ) simd64 -> zero-ref "spilled call-like call argument" @@ -31,23 +31,20 @@ G_M22834_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} G_M22834_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {ecx}, byref ; byrRegs +[ecx] vpcmpeqb k1, zmm0, zmm1
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm1, zmm0, -54
- vpcmpub k1, zmm0, zmm1, 6
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [ecx], zmm2
- ;; size=69 bbWeight=1 PerfScore 16.33
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 k2, zmm0, zmm1
+ vpcmpub k2, zmm0, zmm1, 6
+ vpblendmb zmm0 k2, zmm1, zmm0
+ vpblendmb zmm0 k1, zmm0, zmm2
+ vmovups zmmword ptr [ecx], zmm0
+ ;; size=51 bbWeight=1 PerfScore 14.83
G_M22834_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp ret 128 ;; size=7 bbWeight=1 PerfScore 3.50
-; Total bytes of code 102, prolog size 6, PerfScore 28.08, instruction count 19, allocated bytes for code 102 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
+; Total bytes of code 84, prolog size 6, PerfScore 26.58, instruction count 16, allocated bytes for code 84 (MethodHash=885fa6cd) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
; ============================================================

-6 (-4.69%) : 4816.dasm - System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -35,20 +35,19 @@ G_M53822_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e vpshufb ymm1, ymm2, ymm1 vpand ymm0, ymm0, ymmword ptr [@RWD64] vpcmpub k1, ymm0, ymmword ptr [@RWD96], 6
- vpmovm2b ymm2, k1
- vpsubb ymm3, ymm0, ymmword ptr [@RWD128]
- vmovups ymm4, ymmword ptr [ebp+0x28]
- vpshufb ymm3, ymm4, ymm3
- vmovups ymm4, ymmword ptr [ebp+0x48]
- vpshufb ymm0, ymm4, ymm0
- vpternlogd ymm2, ymm3, ymm0, -54
- vpand ymm0, ymm2, ymm1
+ vpsubb ymm2, ymm0, ymmword ptr [@RWD128]
+ vmovups ymm3, ymmword ptr [ebp+0x28]
+ vpshufb ymm2, ymm3, ymm2
+ vmovups ymm3, ymmword ptr [ebp+0x48]
+ vpshufb ymm0, ymm3, ymm0
+ vpblendmb ymm0 k1, ymm0, ymm2
+ vpand ymm0, ymm0, ymm1
vxorps ymm1, ymm1, ymm1 vpcmpeqb ymm0, ymm0, ymm1 vpcmpeqd ymm1, ymm1, ymm1 vpxor ymm0, ymm0, ymm1 vmovups ymmword ptr [ecx], ymm0
- ;; size=110 bbWeight=1 PerfScore 35.50
+ ;; size=104 bbWeight=1 PerfScore 35.00
G_M53822_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp @@ -61,6 +60,6 @@ RWD96 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD128 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 128, prolog size 6, PerfScore 45.25, instruction count 26, allocated bytes for code 128 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 122, prolog size 6, PerfScore 44.75, instruction count 25, allocated bytes for code 122 (MethodHash=47dc2dc1) for method System.Buffers.ProbabilisticMap:IsCharBitSetAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================

-13 (-4.51%) : 4815.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)

@@ -16,8 +16,8 @@ ; V05 loc1 [V05,T05] ( 3, 3 ) simd32 -> mm3 <System.Runtime.Intrinsics.Vector256`1[ushort]> ; V06 loc2 [V06,T06] ( 3, 3 ) simd32 -> mm4 <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V07 loc3 [V07,T07] ( 3, 3 ) simd32 -> mm2 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V08 loc4 [V08,T15] ( 2, 2 ) simd32 -> mm1 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V09 loc5 [V09,T16] ( 2, 2 ) simd32 -> mm0 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V08 loc4 [V08,T15] ( 2, 2 ) simd32 -> mm0 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V09 loc5 [V09,T16] ( 2, 2 ) simd32 -> mm1 <System.Runtime.Intrinsics.Vector256`1[ubyte]>
;* V10 loc6 [V10 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V11 tmp1 [V11,T17] ( 2, 2 ) simd32 -> [ebp-0x20] spill-single-def "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ; V12 tmp2 [V12,T02] ( 4, 4 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -26,7 +26,7 @@ ;* V15 tmp5 [V15 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V16 tmp6 [V16 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V17 tmp7 [V17 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
-; V18 tmp8 [V18,T18] ( 2, 2 ) simd32 -> mm3 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
+; V18 tmp8 [V18,T18] ( 2, 2 ) simd32 -> mm4 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]>
; V19 tmp9 [V19,T03] ( 4, 4 ) simd32 -> mm2 "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V20 tmp10 [V20 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> ;* V21 tmp11 [V21 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector256`1[ubyte]> @@ -36,16 +36,17 @@ ; V25 cse1 [V25,T09] ( 3, 3 ) simd32 -> mm5 "CSE - moderate" ; V26 cse2 [V26,T10] ( 3, 3 ) simd32 -> mm6 "CSE - moderate" ; V27 cse3 [V27,T11] ( 3, 3 ) simd32 -> mm7 "CSE - moderate"
-; V28 cse4 [V28,T12] ( 3, 3 ) simd32 -> [ebp-0x40] spill-single-def "CSE - moderate"
+; V28 cse4 [V28,T12] ( 3, 3 ) simd32 -> mm3 "CSE - moderate"
;
-; Lcl frame size = 64
+; Lcl frame size = 32
G_M59405_IG01: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG push ebp mov ebp, esp
- sub esp, 64
+ sub esp, 32
vzeroupper
- ;; size=9 bbWeight=1 PerfScore 2.50
+ vmovups ymm1, ymmword ptr [ebp+0x08]
+ ;; size=14 bbWeight=1 PerfScore 6.50
G_M59405_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000006 {ecx edx}, byref ; byrRegs +[ecx edx] vmovups ymm2, ymmword ptr [edx] @@ -67,40 +68,37 @@ G_M59405_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000006 {e vpand ymm4, ymm4, ymm6 vmovups ymm7, ymmword ptr [@RWD128] vpcmpub k1, ymm4, ymm7, 6
- vpmovm2b ymm3, k1
- vmovups ymm0, ymmword ptr [@RWD160]
- vmovups ymmword ptr [ebp-0x40], ymm0
- vpsubb ymm1, ymm4, ymm0
- vmovups ymm0, ymmword ptr [ebp+0x08]
- vpshufb ymm1, ymm0, ymm1
- vmovups ymm0, ymmword ptr [ebp+0x28]
- vpshufb ymm4, ymm0, ymm4
- vpternlogd ymm3, ymm1, ymm4, -54
- vpand ymm1, ymm3, ymmword ptr [ebp-0x20]
- vxorps ymm3, ymm3, ymm3
- vpcmpeqb ymm1, ymm1, ymm3
- vpcmpeqd ymm3, ymm3, ymm3
- vpxor ymm1, ymm1, ymm3
- vpsrld ymm3, ymm2, 5
- vpand ymm3, ymm3, ymm5
- vmovups ymm4, ymmword ptr [@RWD64]
- vpshufb ymm3, ymm4, ymm3
+ vmovups ymm3, ymmword ptr [@RWD160]
+ vpsubb ymm0, ymm4, ymm3
+ vmovups ymmword ptr [ebp+0x08], ymm1
+ vpshufb ymm0, ymm1, ymm0
+ vmovups ymm1, ymmword ptr [ebp+0x28]
+ vpshufb ymm4, ymm1, ymm4
+ vpblendmb ymm0 k1, ymm4, ymm0
+ vpand ymm0, ymm0, ymmword ptr [ebp-0x20]
+ vxorps ymm4, ymm4, ymm4
+ vpcmpeqb ymm0, ymm0, ymm4
+ vpcmpeqd ymm4, ymm4, ymm4
+ vpxor ymm0, ymm0, ymm4
+ vpsrld ymm4, ymm2, 5
+ vpand ymm4, ymm4, ymm5
+ vmovups ymm5, ymmword ptr [@RWD64]
+ vpshufb ymm4, ymm5, ymm4
vpand ymm2, ymm2, ymm6
vpcmpub k1, ymm2, ymm7, 6
- vpmovm2b ymm4, k1
- vpsubb ymm5, ymm2, ymmword ptr [ebp-0x40]
- vmovups ymm6, ymmword ptr [ebp+0x08]
- vpshufb ymm5, ymm6, ymm5
- vpshufb ymm0, ymm0, ymm2
- vpternlogd ymm4, ymm5, ymm0, -54
- vpand ymm0, ymm4, ymm3
+ vpsubb ymm3, ymm2, ymm3
+ vmovups ymm5, ymmword ptr [ebp+0x08]
+ vpshufb ymm3, ymm5, ymm3
+ vpshufb ymm1, ymm1, ymm2
+ vpblendmb ymm1 k1, ymm1, ymm3
+ vpand ymm1, ymm1, ymm4
vxorps ymm2, ymm2, ymm2
- vpcmpeqb ymm0, ymm0, ymm2
+ vpcmpeqb ymm1, ymm1, ymm2
vpcmpeqd ymm2, ymm2, ymm2
- vpxor ymm0, ymm0, ymm2
- vpand ymm0, ymm1, ymm0
+ vpxor ymm1, ymm1, ymm2
+ vpand ymm0, ymm0, ymm1
vmovups ymmword ptr [ecx], ymm0
- ;; size=270 bbWeight=1 PerfScore 95.33
+ ;; size=252 bbWeight=1 PerfScore 88.67
G_M59405_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper mov esp, ebp @@ -115,6 +113,6 @@ RWD128 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F RWD160 dq 1010101010101010h, 1010101010101010h, 1010101010101010h, 1010101010101010h
-; Total bytes of code 288, prolog size 9, PerfScore 101.58, instruction count 60, allocated bytes for code 288 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
+; Total bytes of code 275, prolog size 9, PerfScore 98.92, instruction count 58, allocated bytes for code 275 (MethodHash=e39717f2) for method System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
; ============================================================

-6 (-3.75%) : 273746.dasm - System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)

@@ -39,16 +39,15 @@ G_M58105_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e vpandd zmm3, zmm4, dword ptr [@RWD128] {1to16} vpord zmm4, zmm3, dword ptr [@RWD132] {1to16} vptestnmd k1, zmm2, zmm2
- vpmovm2d zmm2, k1
- vpslld zmm5, zmm4, 1
- vpternlogd zmm2, zmm4, zmm5, -54
+ vpslld zmm2, zmm4, 1
+ vpblendmd zmm2 k1, zmm2, zmm4
vpslld zmm0, zmm0, 13 vpandd zmm0, zmm0, dword ptr [@RWD136] {1to16} vpaddd zmm0, zmm0, zmm2 vsubps zmm0, zmm0, zmm3 vpord zmm0, zmm0, zmm1 vmovups zmmword ptr [ecx], zmm0
- ;; size=137 bbWeight=1 PerfScore 29.00
+ ;; size=131 bbWeight=1 PerfScore 28.00
G_M58105_IG03: ; bbWeight=1, epilog, nogc, extend vzeroupper pop ebp @@ -64,6 +63,6 @@ RWD132 dd 38000000h RWD136 dd 0FFFE000h
-; Total bytes of code 160, prolog size 6, PerfScore 37.75, instruction count 26, allocated bytes for code 160 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
+; Total bytes of code 154, prolog size 6, PerfScore 36.75, instruction count 25, allocated bytes for code 154 (MethodHash=e6ab1d06) for method System.Numerics.Tensors.TensorPrimitives:<ConvertToSingle>g__HalfAsWidenedUInt32ToSingle_Vector512|210_2(System.Runtime.Intrinsics.Vector512`1[uint]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
; ============================================================

libraries_tests.run.windows.x86.Release.mch

-12 (-15.19%) : 370873.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,19 +34,17 @@ G_M10273_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm0, xmm1, -54
+ vpblendmd xmm3 k1, xmm1, xmm0
vpcmpud k1, xmm0, xmm1, 1
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 k1, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [ecx], xmm2
- ;; size=59 bbWeight=1 PerfScore 12.33
+ ;; size=47 bbWeight=1 PerfScore 10.00
G_M10273_IG03: ; bbWeight=1, epilog, nogc, extend pop ebp ret 32 ;; size=4 bbWeight=1 PerfScore 2.50
-; Total bytes of code 79, prolog size 6, PerfScore 23.08, instruction count 17, allocated bytes for code 79 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 6, PerfScore 20.75, instruction count 15, allocated bytes for code 67 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================

-12 (-15.19%) : 366898.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,19 +34,17 @@ G_M23551_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm1, xmm0, -54
+ vpblendmd xmm3 k1, xmm0, xmm1
vpcmpud k1, xmm0, xmm1, 6
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 k1, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [ecx], xmm2
- ;; size=59 bbWeight=1 PerfScore 12.33
+ ;; size=47 bbWeight=1 PerfScore 10.00
G_M23551_IG03: ; bbWeight=1, epilog, nogc, extend pop ebp ret 32 ;; size=4 bbWeight=1 PerfScore 2.50
-; Total bytes of code 79, prolog size 6, PerfScore 23.08, instruction count 17, allocated bytes for code 79 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 6, PerfScore 20.75, instruction count 15, allocated bytes for code 67 (MethodHash=3243a400) for method System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================

-12 (-15.19%) : 366792.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

@@ -34,19 +34,17 @@ G_M10273_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000002 {e vpcmpeqd xmm2, xmm0, xmm1 vxorps xmm3, xmm3, xmm3 vpcmpud k1, xmm0, xmm3, 1
- vpmovm2d xmm3, k1
- vpternlogd xmm3, xmm0, xmm1, -54
+ vpblendmd xmm3 k1, xmm1, xmm0
vpcmpud k1, xmm0, xmm1, 1
- vpmovm2d xmm4, k1
- vpternlogd xmm4, xmm0, xmm1, -54
- vpternlogd xmm2, xmm3, xmm4, -54
+ vpblendmd xmm0 k1, xmm1, xmm0
+ vpternlogd xmm2, xmm3, xmm0, -54
vmovups xmmword ptr [ecx], xmm2
- ;; size=59 bbWeight=1 PerfScore 12.33
+ ;; size=47 bbWeight=1 PerfScore 10.00
G_M10273_IG03: ; bbWeight=1, epilog, nogc, extend pop ebp ret 32 ;; size=4 bbWeight=1 PerfScore 2.50
-; Total bytes of code 79, prolog size 6, PerfScore 23.08, instruction count 17, allocated bytes for code 79 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
+; Total bytes of code 67, prolog size 6, PerfScore 20.75, instruction count 15, allocated bytes for code 67 (MethodHash=7471d7de) for method System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
; ============================================================

librariestestsnotieredcompilation.run.windows.x86.Release.mch

-12 (-1.69%) : 167868.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

@@ -78,8 +78,7 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} vmovups ymm1, ymmword ptr [ecx+0x08] vmovups ymmword ptr [ebp-0x48], ymm1 vpcmpud k1, ymm0, ymm1, 6
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
+ vpblendmd ymm2 k1, ymm1, ymm0
vmovups ymmword ptr [ebp-0x68], ymm2 mov ecx, 0xD1FFAB1E ; System.Action`2[int,uint] ; gcrRegs -[ecx] @@ -129,9 +128,9 @@ G_M21446_IG02: ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {} vmovups ymm2, ymmword ptr [ebp-0xA8] vextracti128 xmm0, ymm2, 1 vmovd edx, xmm0
- ;; size=272 bbWeight=1 PerfScore 104.50
-G_M21446_IG03: ; bbWeight=1, extend
push edx
+ ;; size=267 bbWeight=1 PerfScore 104.33
+G_M21446_IG03: ; bbWeight=1, extend
mov edx, 4 mov ecx, gword ptr [edi+0x04] ; gcrRegs +[ecx] @@ -167,9 +166,8 @@ G_M21446_IG03: ; bbWeight=1, extend vmovups ymm0, ymmword ptr [ebp-0x28] vmovups ymm1, ymmword ptr [ebp-0x48] vpcmpud k1, ymm0, ymm1, 2
- vpmovm2d ymm2, k1
- vpternlogd ymm2, ymm0, ymm1, -54
- vmovups ymmword ptr [ebp-0x88], ymm2
+ vpblendmd ymm0 k1, ymm1, ymm0
+ vmovups ymmword ptr [ebp-0x88], ymm0
mov ecx, 0xD1FFAB1E ; System.Action`2[int,uint] call CORINFO_HELP_NEWSFAST ; gcrRegs +[eax] @@ -181,70 +179,70 @@ G_M21446_IG03: ; bbWeight=1, extend ; gcrRegs -[eax esi] ; byrRegs -[edx] mov dword ptr [edi+0x0C], 0xD1FFAB1E
- vmovups ymm2, ymmword ptr [ebp-0x88]
- vmovups ymmword ptr [ebp-0xC8], ymm2
- vmovd edx, xmm2
+ vmovups ymm0, ymmword ptr [ebp-0x88]
+ vmovups ymmword ptr [ebp-0xC8], ymm0
+ vmovd edx, xmm0
push edx
xor edx, edx
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovdqu xmm0, xmmword ptr [ebp-0xC8]
- vpextrd edx, xmm0, 1
+ vmovdqu xmm1, xmmword ptr [ebp-0xC8]
+ vpextrd edx, xmm1, 1
push edx
mov edx, 1
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovdqu xmm0, xmmword ptr [ebp-0xC8]
- vpextrd edx, xmm0, 2
+ vmovdqu xmm1, xmmword ptr [ebp-0xC8]
+ vpextrd edx, xmm1, 2
push edx
mov edx, 2
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovdqu xmm0, xmmword ptr [ebp-0xC8]
- vpextrd edx, xmm0, 3
+ vmovdqu xmm1, xmmword ptr [ebp-0xC8]
+ vpextrd edx, xmm1, 3
push edx
mov edx, 3
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovups ymm2, ymmword ptr [ebp-0xC8]
- vextracti128 xmm0, ymm2, 1
- vmovd edx, xmm0
+ vmovups ymm0, ymmword ptr [ebp-0xC8]
+ vextracti128 xmm1, ymm0, 1
+ vmovd edx, xmm1
push edx
mov edx, 4
- ;; size=304 bbWeight=1 PerfScore 128.75
-G_M21446_IG04: ; bbWeight=1, extend
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovups ymm2, ymmword ptr [ebp-0xC8]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 1
+ ;; size=303 bbWeight=1 PerfScore 131.58
+G_M21446_IG04: ; bbWeight=1, extend
+ vmovups ymm0, ymmword ptr [ebp-0xC8]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 1
push edx
mov edx, 5
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovups ymm2, ymmword ptr [ebp-0xC8]
- vextracti128 xmm0, ymm2, 1
- vpextrd edx, xmm0, 2
+ vmovups ymm0, ymmword ptr [ebp-0xC8]
+ vextracti128 xmm1, ymm0, 1
+ vpextrd edx, xmm1, 2
push edx
mov edx, 6
mov ecx, gword ptr [edi+0x04]
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx]
- vmovups ymm2, ymmword ptr [ebp-0xC8]
- vextracti128 xmm0, ymm2, 1
+ vmovups ymm0, ymmword ptr [ebp-0xC8]
+ vextracti128 xmm0, ymm0, 1
vpextrd edx, xmm0, 3
push edx
mov edx, 7
@@ -252,7 +250,7 @@ G_M21446_IG04: ; bbWeight=1, extend
; gcrRegs +[ecx]
call [edi+0x0C]<unknown method>
; gcrRegs -[ecx edi]
- ;; size=102 bbWeight=1 PerfScore 50.75
+ ;; size=96 bbWeight=1 PerfScore 45.75
G_M21446_IG05: ; bbWeight=1, epilog, nogc, extend vzeroupper lea esp, [ebp-0x08] @@ -266,6 +264,6 @@ G_M21446_IG06: ; bbWeight=0, gcVars=00000000 {}, gcrefRegs=00000000 {}, b int3 ;; size=7 bbWeight=0 PerfScore 0.00
-; Total bytes of code 709, prolog size 14, PerfScore 292.50, instruction count 167, allocated bytes for code 709 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
+; Total bytes of code 697, prolog size 14, PerfScore 290.17, instruction count 165, allocated bytes for code 697 (MethodHash=8544ac39) for method System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)
; ============================================================

Details

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.windows.x86.checked.mch | 1 | 1 | 0 | 0 | -113 | +0 |
| benchmarks.run_pgo.windows.x86.checked.mch | 1 | 1 | 0 | 0 | -122 | +0 |
| benchmarks.run_tiered.windows.x86.checked.mch | 1 | 1 | 0 | 0 | -113 | +0 |
| coreclr_tests.run.windows.x86.checked.mch | 16 | 16 | 0 | 0 | -576 | +0 |
| libraries.crossgen2.windows.x86.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries.pmi.windows.x86.checked.mch | 24 | 24 | 0 | 0 | -498 | +0 |
| libraries_tests.run.windows.x86.Release.mch | 10 | 10 | 0 | 0 | -772 | +0 |
| librariestestsnotieredcompilation.run.windows.x86.Release.mch | 7 | 7 | 0 | 0 | -737 | +0 |
| realworld.run.windows.x86.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 60 | 60 | 0 | 0 | -2,931 | +0 |
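
As a quick sanity check, the totals row can be recomputed from the per-collection rows. This is a minimal sketch with the numbers typed in by hand from the table above; the shortened collection names are only labels chosen for this example:

```python
# (contexts with diffs, improvement bytes) per collection, copied from the table above.
rows = {
    "benchmarks.run": (1, -113),
    "benchmarks.run_pgo": (1, -122),
    "benchmarks.run_tiered": (1, -113),
    "coreclr_tests.run": (16, -576),
    "libraries.crossgen2": (0, 0),
    "libraries.pmi": (24, -498),
    "libraries_tests.run": (10, -772),
    "librariestestsnotieredcompilation.run": (7, -737),
    "realworld.run": (0, 0),
}

total_contexts = sum(contexts for contexts, _ in rows.values())
total_bytes = sum(delta for _, delta in rows.values())

print(total_contexts, total_bytes)
```

Both sums match the totals row (60 contexts, -2,931 bytes), and since the table reports no regressions, the byte total is also the overall x86 delta.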

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.windows.x86.checked.mch | 24,486 | 4 | 24,482 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.windows.x86.checked.mch | 119,833 | 41,887 | 77,946 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_tiered.windows.x86.checked.mch | 47,980 | 28,727 | 19,253 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.windows.x86.checked.mch | 574,728 | 320,026 | 254,702 | 7 (0.00%) | 7 (0.00%) |
| libraries.crossgen2.windows.x86.checked.mch | 242,344 | 15 | 242,329 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.windows.x86.checked.mch | 305,049 | 6 | 305,043 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.windows.x86.Release.mch | 632,286 | 427,924 | 204,362 | 0 (0.00%) | 0 (0.00%) |
| librariestestsnotieredcompilation.run.windows.x86.Release.mch | 316,428 | 21,871 | 294,557 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.windows.x86.checked.mch | 35,987 | 3 | 35,984 | 0 (0.00%) | 0 (0.00%) |
| | 2,299,121 | 840,463 | 1,458,658 | 7 (0.00%) | 7 (0.00%) |
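
The context counts above can be cross-checked the same way. In the sketch below, the `missed_pct` formula (missed contexts divided by diffed plus missed) is an assumption inferred from the numbers in this report, not something the report states; it does reproduce the 2.54% shown for `benchmarks.run` in the linux arm table earlier in the report.

```python
# (diffed, MinOpts, FullOpts, missed) per collection, copied from the table above.
rows = [
    (24_486, 4, 24_482, 0),
    (119_833, 41_887, 77_946, 0),
    (47_980, 28_727, 19_253, 0),
    (574_728, 320_026, 254_702, 7),
    (242_344, 15, 242_329, 0),
    (305_049, 6, 305_043, 0),
    (632_286, 427_924, 204_362, 0),
    (316_428, 21_871, 294_557, 0),
    (35_987, 3, 35_984, 0),
]

# Each row's diffed count should be the sum of its MinOpts and FullOpts counts.
rows_consistent = all(diffed == mino + fullo for diffed, mino, fullo, _ in rows)

# Column-wise totals: (diffed, MinOpts, FullOpts, missed).
totals = tuple(sum(col) for col in zip(*rows))

def missed_pct(diffed, missed):
    # Assumed formula, inferred from the report's own numbers.
    return 100.0 * missed / (diffed + missed)

print(rows_consistent, totals, f"{missed_pct(totals[0], totals[3]):.2f}%")
```

With only 7 missed contexts out of roughly 2.3 million, the percentage rounds to 0.00% at two decimals, which is why the table shows a nonzero count next to a zero percentage.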

jit-analyze output

benchmarks.run.windows.x86.checked.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 7123696 (overridden on cmd)
Total bytes of diff: 7123583 (overridden on cmd)
Total bytes of delta: -113 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
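
The `-0.00 %` in the summary is a rounding effect rather than a zero change. A small sketch, using the base and diff totals printed above and assuming the percentage is simply the delta divided by the base size:

```python
base_bytes = 7_123_696   # "Total bytes of base"
diff_bytes = 7_123_583   # "Total bytes of diff"

delta = diff_bytes - base_bytes
pct = 100.0 * delta / base_bytes

two_decimals = f"{pct:.2f}"    # precision used in the summary line
more_digits = f"{pct:.4f}"     # extra digits to make the change visible

print(delta, two_decimals, more_digits)
```

At two decimals the collection-wide change disappears, so the per-method percentages further down are the more informative figures.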

Detail diffs



Top file improvements (bytes):
        -113 : 22326.dasm (-10.79 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -113 (-10.79 % of base) : 22326.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

Top method improvements (percentages):
        -113 (-10.79 % of base) : 22326.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).


benchmarks.run_pgo.windows.x86.checked.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 45854626 (overridden on cmd)
Total bytes of diff: 45854504 (overridden on cmd)
Total bytes of delta: -122 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -122 : 94556.dasm (-11.36 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -122 (-11.36 % of base) : 94556.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

Top method improvements (percentages):
        -122 (-11.36 % of base) : 94556.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).


benchmarks.run_tiered.windows.x86.checked.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 9444502 (overridden on cmd)
Total bytes of diff: 9444389 (overridden on cmd)
Total bytes of delta: -113 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -113 : 44440.dasm (-10.79 % of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -113 (-10.79 % of base) : 44440.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

Top method improvements (percentages):
        -113 (-10.79 % of base) : 44440.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

1 total methods with Code Size differences (1 improved, 0 regressed).


coreclr_tests.run.windows.x86.checked.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 309424823 (overridden on cmd)
Total bytes of diff: 309424247 (overridden on cmd)
Total bytes of delta: -576 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
         -48 : 207939.dasm (-3.12 % of base)
         -48 : 469368.dasm (-1.06 % of base)
         -48 : 207947.dasm (-1.06 % of base)
         -48 : 469364.dasm (-3.12 % of base)
         -48 : 469367.dasm (-3.16 % of base)
         -48 : 207941.dasm (-3.12 % of base)
         -48 : 207946.dasm (-3.16 % of base)
         -48 : 469363.dasm (-3.12 % of base)
         -24 : 207944.dasm (-1.60 % of base)
         -24 : 469362.dasm (-0.54 % of base)
         -24 : 207935.dasm (-1.62 % of base)
         -24 : 469361.dasm (-1.62 % of base)
         -24 : 469366.dasm (-1.60 % of base)
         -24 : 207938.dasm (-0.54 % of base)
         -24 : 207945.dasm (-1.60 % of base)
         -24 : 469365.dasm (-1.60 % of base)

16 total files with Code Size differences (16 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -48 (-3.12 % of base) : 469364.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -48 (-3.12 % of base) : 207941.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
         -48 (-3.16 % of base) : 469367.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -48 (-3.16 % of base) : 207946.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)
         -48 (-1.06 % of base) : 469368.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -48 (-1.06 % of base) : 207947.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Tier0-FullOpts)
         -48 (-3.12 % of base) : 469363.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -48 (-3.12 % of base) : 207939.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Tier0-FullOpts)
         -24 (-1.60 % of base) : 469366.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -24 (-1.60 % of base) : 207945.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)
         -24 (-1.62 % of base) : 469361.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -24 (-1.62 % of base) : 207935.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
         -24 (-0.54 % of base) : 469362.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -24 (-0.54 % of base) : 207938.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)
         -24 (-1.60 % of base) : 469365.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -24 (-1.60 % of base) : 207944.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)

Top method improvements (percentages):
         -48 (-3.16 % of base) : 469367.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (FullOpts)
         -48 (-3.16 % of base) : 207946.dasm - VectorTest+VectorRelopTest`1[uint]:VectorRelOp(uint,uint):int (Tier0-FullOpts)
         -48 (-3.12 % of base) : 469364.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -48 (-3.12 % of base) : 207941.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
         -48 (-3.12 % of base) : 469363.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (FullOpts)
         -48 (-3.12 % of base) : 207939.dasm - VectorTest+VectorRelopTest`1[ushort]:VectorRelOp(ushort,ushort):int (Tier0-FullOpts)
         -24 (-1.62 % of base) : 469361.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -24 (-1.62 % of base) : 207935.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
         -24 (-1.60 % of base) : 469366.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (FullOpts)
         -24 (-1.60 % of base) : 207945.dasm - VectorTest+VectorRelopTest`1[byte]:VectorRelOp(byte,byte):int (Tier0-FullOpts)
         -24 (-1.60 % of base) : 469365.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (FullOpts)
         -24 (-1.60 % of base) : 207944.dasm - VectorTest+VectorRelopTest`1[short]:VectorRelOp(short,short):int (Tier0-FullOpts)
         -48 (-1.06 % of base) : 469368.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (FullOpts)
         -48 (-1.06 % of base) : 207947.dasm - VectorTest+VectorRelopTest`1[ulong]:VectorRelOp(ulong,ulong):int (Tier0-FullOpts)
         -24 (-0.54 % of base) : 469362.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (FullOpts)
         -24 (-0.54 % of base) : 207938.dasm - VectorTest+VectorRelopTest`1[long]:VectorRelOp(long,long):int (Tier0-FullOpts)

16 total methods with Code Size differences (16 improved, 0 regressed).
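
The method lines above follow a regular shape, so they can be post-processed with a short script. The following is a sketch only: the `LINE` pattern and the grouping by tier are assumptions made for this example, based on the lines in this report, and four of the sixteen lines are used as sample input.

```python
import re
from collections import defaultdict

# Four method lines copied from the list above.
SAMPLE = """\
         -48 (-3.12 % of base) : 469364.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (FullOpts)
         -48 (-3.12 % of base) : 207941.dasm - VectorTest+VectorRelopTest`1[ubyte]:VectorRelOp(ubyte,ubyte):int (Tier0-FullOpts)
         -24 (-1.62 % of base) : 469361.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (FullOpts)
         -24 (-1.62 % of base) : 207935.dasm - VectorTest+VectorRelopTest`1[int]:VectorRelOp(int,int):int (Tier0-FullOpts)
"""

# Hypothetical pattern: byte delta, percentage, .dasm file, method name, tier.
LINE = re.compile(
    r"^\s*(?P<bytes>[-+]?\d+) \((?P<pct>[-+]?[\d.]+) % of base\) : "
    r"(?P<file>\d+\.dasm) - (?P<method>.+) \((?P<tier>[^()]+)\)$"
)

# Sum the byte deltas per compilation tier.
by_tier = defaultdict(int)
methods = []
for raw in SAMPLE.splitlines():
    m = LINE.match(raw)
    if m is None:
        continue  # skip anything that is not a method line
    by_tier[m.group("tier")] += int(m.group("bytes"))
    methods.append(m.group("method"))

print(dict(by_tier))
```

The trailing parenthesised group is the tier, so the greedy `method` group keeps the signature's own parentheses intact.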


libraries.pmi.windows.x86.checked.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 49148609 (overridden on cmd)
Total bytes of diff: 49148111 (overridden on cmd)
Total bytes of delta: -498 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -113 : 4822.dasm (-10.79 % of base)
         -48 : 274001.dasm (-17.45 % of base)
         -48 : 273944.dasm (-17.45 % of base)
         -24 : 274003.dasm (-14.12 % of base)
         -24 : 273946.dasm (-14.12 % of base)
         -18 : 273943.dasm (-17.65 % of base)
         -18 : 273965.dasm (-17.65 % of base)
         -18 : 274000.dasm (-17.65 % of base)
         -18 : 274022.dasm (-17.65 % of base)
         -18 : 4817.dasm (-6.36 % of base)
         -13 : 4815.dasm (-4.51 % of base)
         -12 : 273942.dasm (-14.63 % of base)
         -12 : 273963.dasm (-15.19 % of base)
         -12 : 273998.dasm (-15.19 % of base)
         -12 : 274020.dasm (-15.19 % of base)
         -12 : 273945.dasm (-12.37 % of base)
         -12 : 274021.dasm (-14.63 % of base)
         -12 : 273964.dasm (-14.63 % of base)
         -12 : 273999.dasm (-14.63 % of base)
         -12 : 274002.dasm (-12.37 % of base)

24 total files with Code Size differences (24 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -113 (-10.79 % of base) : 4822.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -48 (-17.45 % of base) : 273944.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -48 (-17.45 % of base) : 274001.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -24 (-14.12 % of base) : 273946.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -24 (-14.12 % of base) : 274003.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -18 (-6.36 % of base) : 4817.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 273943.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 273965.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 274000.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 274022.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -13 (-4.51 % of base) : 4815.dasm - System.Buffers.ProbabilisticMap:ContainsMask32CharsAvx2(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte],byref):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 273941.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-12.37 % of base) : 273945.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -12 (-14.63 % of base) : 273942.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 273963.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-14.63 % of base) : 273964.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 273998.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-12.37 % of base) : 274002.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -12 (-14.63 % of base) : 273999.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 274020.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

Top method improvements (percentages):
         -18 (-17.65 % of base) : 273943.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 273965.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 274000.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -18 (-17.65 % of base) : 274022.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte],System.Runtime.Intrinsics.Vector512`1[ubyte]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts)
         -48 (-17.45 % of base) : 273944.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -48 (-17.45 % of base) : 274001.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
         -12 (-15.19 % of base) : 273941.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 273963.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 273998.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-15.19 % of base) : 274020.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)
         -12 (-14.63 % of base) : 273942.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-14.63 % of base) : 273964.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-14.63 % of base) : 273999.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -12 (-14.63 % of base) : 274021.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts)
         -24 (-14.12 % of base) : 273946.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -24 (-14.12 % of base) : 274003.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector512`1[ubyte]):ubyte (FullOpts)
         -12 (-12.37 % of base) : 273945.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
         -12 (-12.37 % of base) : 274002.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]:Invoke(System.Runtime.Intrinsics.Vector256`1[ubyte]):ubyte (FullOpts)
        -113 (-10.79 % of base) : 4822.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -18 (-6.36 % of base) : 4817.dasm - System.Buffers.ProbabilisticMap:ContainsMask16Chars(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],byref):System.Runtime.Intrinsics.Vector128`1[ubyte] (FullOpts)

24 total methods with Code Size differences (24 improved, 0 regressed).


libraries_tests.run.windows.x86.Release.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 188553323 (overridden on cmd)
Total bytes of diff: 188552551 (overridden on cmd)
Total bytes of delta: -772 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -156 : 367511.dasm (-15.43 % of base)
        -156 : 369103.dasm (-15.43 % of base)
        -132 : 369537.dasm (-14.95 % of base)
         -84 : 363521.dasm (-12.67 % of base)
         -84 : 369394.dasm (-12.67 % of base)
         -72 : 370885.dasm (-8.61 % of base)
         -52 : 318312.dasm (-4.91 % of base)
         -12 : 366898.dasm (-15.19 % of base)
         -12 : 366792.dasm (-15.19 % of base)
         -12 : 370873.dasm (-15.19 % of base)

10 total files with Code Size differences (10 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -156 (-15.43 % of base) : 369103.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
        -156 (-15.43 % of base) : 367511.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
        -132 (-14.95 % of base) : 369537.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (Tier0-FullOpts)
         -84 (-12.67 % of base) : 363521.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -84 (-12.67 % of base) : 369394.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -72 (-8.61 % of base) : 370885.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](System.ReadOnlySpan`1[uint],System.ReadOnlySpan`1[uint],System.Span`1[uint]) (Tier1)
         -52 (-4.91 % of base) : 318312.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)
         -12 (-15.19 % of base) : 366898.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -12 (-15.19 % of base) : 366792.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -12 (-15.19 % of base) : 370873.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)

Top method improvements (percentages):
        -156 (-15.43 % of base) : 369103.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
        -156 (-15.43 % of base) : 367511.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (Tier0-FullOpts)
         -12 (-15.19 % of base) : 366898.dasm - System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -12 (-15.19 % of base) : 366792.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
         -12 (-15.19 % of base) : 370873.dasm - System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[uint]:Invoke(System.Runtime.Intrinsics.Vector128`1[uint],System.Runtime.Intrinsics.Vector128`1[uint]):System.Runtime.Intrinsics.Vector128`1[uint] (Tier1)
        -132 (-14.95 % of base) : 369537.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (Tier0-FullOpts)
         -84 (-12.67 % of base) : 363521.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -84 (-12.67 % of base) : 369394.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (Tier0-FullOpts)
         -72 (-8.61 % of base) : 370885.dasm - System.Numerics.Tensors.TensorPrimitives:InvokeSpanSpanIntoSpan[uint,System.Numerics.Tensors.TensorPrimitives+MinMagnitudePropagateNaNOperator`1[uint]](System.ReadOnlySpan`1[uint],System.ReadOnlySpan`1[uint],System.Span`1[uint]) (Tier1)
         -52 (-4.91 % of base) : 318312.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (Tier0-FullOpts)

10 total methods with Code Size differences (10 improved, 0 regressed).


librariestestsnotieredcompilation.run.windows.x86.Release.mch

To reproduce these diffs on Windows x86:
superpmi.py asmdiffs -target_os windows -target_arch x86 -arch x86


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 103930242 (overridden on cmd)
Total bytes of diff: 103929505 (overridden on cmd)
Total bytes of delta: -737 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.

Detail diffs



Top file improvements (bytes):
        -156 : 165267.dasm (-15.43 % of base)
        -156 : 167191.dasm (-15.43 % of base)
        -132 : 167215.dasm (-14.95 % of base)
        -113 : 149348.dasm (-10.79 % of base)
         -84 : 167075.dasm (-12.67 % of base)
         -84 : 166467.dasm (-12.67 % of base)
         -12 : 167868.dasm (-1.69 % of base)

7 total files with Code Size differences (7 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
        -156 (-15.43 % of base) : 167191.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -156 (-15.43 % of base) : 165267.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -132 (-14.95 % of base) : 167215.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
        -113 (-10.79 % of base) : 149348.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -84 (-12.67 % of base) : 167075.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -84 (-12.67 % of base) : 166467.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -12 (-1.69 % of base) : 167868.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

Top method improvements (percentages):
        -156 (-15.43 % of base) : 167191.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MaxMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -156 (-15.43 % of base) : 165267.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ubyte,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ubyte]](System.ReadOnlySpan`1[ubyte]):ubyte (FullOpts)
        -132 (-14.95 % of base) : 167215.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ushort,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ushort]](System.ReadOnlySpan`1[ushort]):ushort (FullOpts)
         -84 (-12.67 % of base) : 167075.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
         -84 (-12.67 % of base) : 166467.dasm - System.Numerics.Tensors.TensorPrimitives:MinMaxCore[ulong,System.Numerics.Tensors.TensorPrimitives+MinMagnitudeOperator`1[ulong]](System.ReadOnlySpan`1[ulong]):ulong (FullOpts)
        -113 (-10.79 % of base) : 149348.dasm - System.Buffers.ProbabilisticMap:IndexOfAnyVectorized(byref,byref,int,System.ReadOnlySpan`1[ushort]):int (FullOpts)
         -12 (-1.69 % of base) : 167868.dasm - System.Numerics.Tests.GenericVectorTests:TestConditionalSelect[uint]():this (FullOpts)

7 total methods with Code Size differences (7 improved, 0 regressed).
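
The "bytes" and "percentages" lists rank the same methods by different keys, which is why `IndexOfAnyVectorized` sits in a different position in each. A minimal sketch, with the seven entries above typed in by hand and keyed by their `.dasm` file name:

```python
# (file, byte delta, percent of base), copied from the lists above.
entries = [
    ("167191.dasm", -156, -15.43),
    ("165267.dasm", -156, -15.43),
    ("167215.dasm", -132, -14.95),
    ("149348.dasm", -113, -10.79),   # ProbabilisticMap:IndexOfAnyVectorized
    ("167075.dasm", -84, -12.67),
    ("166467.dasm", -84, -12.67),
    ("167868.dasm", -12, -1.69),
]

# Most negative first, i.e. largest improvement first.
by_bytes = [name for name, _, _ in sorted(entries, key=lambda e: e[1])]
by_pct = [name for name, _, _ in sorted(entries, key=lambda e: e[2])]

print(by_bytes.index("149348.dasm"), by_pct.index("149348.dasm"))
```

Ranked by bytes the method is fourth, but ranked by percentage it drops to sixth, because the two `ulong` `MinMaxCore` instantiations save fewer bytes yet a larger share of a smaller method.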