Quantcast
Channel: Dan Byström's Bwain » .NET
Viewing all articles
Browse latest Browse all 10

Optimizing away II

$
0
0

Continued from Optimizing away. Ok, now I have worked up the courage.

Prepare yourself for a major disappointment. I really do not know how to tweak that C#-loop to run a nanosecond faster. But I can do the same calculation much faster. How? Just my old favorite party trick. It goes like this:

1. Add a new project to your solution

2. Chose Visual C++ / CLR / Class Library

3. Insert the following managed class:

	public ref class FastImageCompare
	{
	public:
		static double compare( void* p1, void* p2, int count )
		{
			return NativeCode::fastImageCompare( p1, p2, count );
		}
		static double compare( IntPtr p1, IntPtr p2, int count )
		{
			return NativeCode::fastImageCompare( p1.ToPointer(), p2.ToPointer(), count );
		}
	};

4. Insert the following function into an unmanaged class (which I happened to call NativeCode):

unsigned long long NativeCode::fastImageCompare( void* p1, void* p2, int count )
{
	int high32 = 0;

	_asm
	{
		push	ebx
		push	esi
		push	edi

		mov		esi, p1
		mov		edi, p2
		xor		eax, eax
again:
		dec		count
		js		done

		movzx	ebx, [esi]
		movzx	edx, [edi]
		sub		edx, ebx
		imul	edx, edx

		movzx	ebx, [esi+1]
		movzx	ecx, [edi+1]
		sub		ebx, ecx
		imul	ebx, ebx
		add		edx, ebx

		movzx	ebx, [esi+2]
		movzx	ecx, [edi+2]
		sub		ebx, ecx
		imul	ebx, ebx
		add		edx, ebx

		add		esi, 4
		add		edi, 4

		add		eax, edx
		jnc		again

		inc		high32
		jmp		again
done:
		mov		edx, high32

		pop		edi
		pop		esi
		pop		ebx
	}

}

Yeah. That’s it. Hand tuned assembly language within a .NET Assembly. UPDATE 2009-01-01: return type of the function changed from “unsigned long” to “unsigned long long”, see here.

I guess that’s almost cheating. And we will be locked inside the Intel platform. Most people won’t mind I guess, but other may have very strong feelings about it. If we really would like to exploit this kind of optimizations while still be portable (to Mono/Mac for example) one possibility would be to load the assembly with native code dynamically. If it fails we could fall back to an alternative version written in pure managed code.

(I know from experience that some people with lesser programming skills react to this with a “what? it must be a crappy compiler if you can write faster code by yourself”. Let me assure you that this is not the case. On the contrary: I’m amazed about the quality of the code emitted by the C# + .NET JIT compilers.)



Viewing all articles
Browse latest Browse all 10

Trending Articles