It is said that Brother Jiang Min conducted a sufficiently rigorous performance test on the Array.Sort and Enumerable.OrderBy methods a while ago. The conclusion seems to be inconsistent with the theory and expectations, but this conclusion was measured in a relatively rigorous environment, which also aroused the interest of experts. I also tested it on my machine. Of course, I sorted out some irrelevant codes first:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;
namespace Exam11
{
public static class CodeTimer
{
public static void Initialize()
{
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Time( "", 1, () => { } );
}
public static void Time( string name, int iteration, Action action )
{
if ( String.IsNullOrEmpty( name ) ) return;
// warm up
action();
// 1.
ConsoleColor currentForeColor = Console.ForegroundColor;
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine( name );
// 2.
GC.Collect( GC.MaxGeneration, GCCollectionMode.Forced );
int[] gcCounts = new int[GC.MaxGeneration + 1];
for ( int i = 0; i <= GC.MaxGeneration; i++ )
{
gcCounts[i] = GC.CollectionCount(i);
}
// 3.
Stopwatch watch = new Stopwatch();
watch.Start();
ulong cycleCount = GetCycleCount();
for ( int i = 0; i < iteration; i++ ) action();
ulong cpuCycles = GetCycleCount() - cycleCount;
watch.Stop();
// 4.
Console.ForegroundColor = currentForeColor;
Console.WriteLine( "tTime Elapsed:t" + watch.ElapsedMilliseconds.ToString( "N0" ) + "ms" );
Console.WriteLine( "tCPU Cycles:t" + cpuCycles.ToString( "N0" ) );
// 5.
for ( int i = 0; i <= GC.MaxGeneration; i++ )
{
int count = GC.CollectionCount(i) - gcCounts[i];
Console.WriteLine( "tGen " + i + ": tt" + count );
}
Console.WriteLine();
}
private static ulong GetCycleCount()
{
ulong cycleCount = 0;
QueryThreadCycleTime( GetCurrentThread(), ref cycleCount );
return cycleCount;
}
[DllImport( "kernel32.dll" )]
[return: MarshalAs(UnmanagedType.Bool)]
static extern bool QueryThreadCycleTime(IntPtr threadHandle, ref ulong cycleTime);
[DllImport( "kernel32.dll" )]
static extern IntPtr GetCurrentThread();
}
class Program
{
static void Main( string[] args )
{
var random = new Random(DateTime.Now.Millisecond);
var array = Enumerable.Repeat( 0, 50000 ).Select( _ => new Person { ID = random.Next() } ).ToArray();
//Lao Zhao program
CodeTimer.Initialize();
CodeTimer.Time( "SortWithCustomComparer", 500, () => SortWithCustomComparer( CloneArray( array ) ) );
CodeTimer.Time( "SortWithLinq", 500, () => SortWithLinq( CloneArray( array ) ) );
Console.ReadLine();
}
private static void SortWithCustomComparer( Person[] array )
{
Array.Sort( array, new PersonComparer() );
}
private static void SortWithLinq( Person[] array )
{
var sorted =
(from person in array
orderby person.ID
select person ).ToList();
}
private static T[] CloneArray<T>( T[] source )
{
var dest = new T[source.Length];
Array.Copy( source, dest, source.Length );
return dest;
}
}
public class Person
{
public string FirstName
{
get;
set;
}
public string LastName
{
get;
set;
}
public int ID
{
get;
set;
}
}
public class PersonComparer : IComparer<Person>
{
public int Compare( Person x, Person y )
{
return x.ID - y.ID;
}
}
}
The test results indeed show the obvious advantages of Enumerable.OrderBy, even in Release compilation mode.
In principle, this is impossible, so there must be some unfair competition somewhere.
I revisited the entire code and adjusted the position of the two test codes:
CodeTimer.Time( "SortWithLinq", 100, () => SortWithLinq( CloneArray( array ) ) );
CodeTimer.Time( "SortWithCustomComparer", 100, () => SortWithCustomComparer( CloneArray( array ) ) );
Of course, this won't have any effect. Secondly, I found that this comparer instance was created multiple times (actually only 100 times), so I optimized it as follows:
private static PersonComparer comparer = new PersonComparer();
private static void SortWithCustomComparer( Person[] array )
{
Array.Sort(array, comparer);
}
There is no improvement in the results.
So I can only focus on the implementation of comparer:
public class PersonComparer : IComparer<Person>
{
public int Compare( Person x, Person y )
{
return x.ID - y.ID;
}
}
This is not a fair approach, but should not cause performance problems. Change it to the following form:
public int Compare( Person x, Person y )
{
return x.ID.CompareTo( y.ID );
}
There is still no improvement in performance, but a slight decrease (this is normal).
Seems like we're at an impasse? In fact, we are very close to the truth. In fact, the title has already announced the answer. Because the problem lies in person.ID being an attribute, the method is actually called three times here.
So simply change the ID to a field, and the performance results will be close to expected.
So why does LINQ have an advantage here? Ah, if you understand the sorting principle of LINQ, it is not difficult to think that LINQ can save many calls to Person.get_ID. I believe this analysis will be elaborated in Daniel’s next article, so I won’t show off here.
So the conclusion can be drawn, but it is still a little surprising. LINQ does have performance advantages in certain environments. This theory is established. But what I want to say is that there are actually many factors that will affect this conclusion. We really cannot say in a general way that LINQ must have performance advantages or Array.Sort must have performance advantages.