Jump to content
HWBOT Community Forums

What is considered a fair optimisation?


moog

Recommended Posts

Hello, let's suppose that we have a benchmark for endian swaps and popcounts.

Intro: Let us have any 32bit integer, a 386 CPU, 486 CPU and Intel Nehalem CPU.

Procedure:

Endian swap:

let x := 0xaabbccdd;

expected result : 0xddccbbaa;

Population count:

for any 32bit integer x, the expected result is number of bits set to 1 in given x

 

My query arises from the fact that each of the 3 specified example CPUs can achieve these results in different ways, one better than the other.

 

386:

00000000 <__bswap_32>:
  0:   55                      push   %ebp
  1:   89 e5                   mov    %esp,%ebp
  3:   8b 45 08                mov    0x8(%ebp),%eax
  6:   66 c1 c0 08             rol    $0x8,%ax
  a:   c1 c0 10                rol    $0x10,%eax
  d:   66 c1 c0 08             rol    $0x8,%ax
 11:   5d                      pop    %ebp
 12:   c3                      ret    

00000013 <_mm_popcnt_u32>:
 13:   55                      push   %ebp
 14:   89 e5                   mov    %esp,%ebp
 16:   83 ec 10                sub    $0x10,%esp
 19:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%ebp)
 20:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
 27:   eb 19                   jmp    42 <_mm_popcnt_u32+0x2f>
 29:   8b 45 fc                mov    -0x4(%ebp),%eax
 2c:   8b 55 08                mov    0x8(%ebp),%edx
 2f:   88 c1                   mov    %al,%cl
 31:   d3 fa                   sar    %cl,%edx
 33:   89 d0                   mov    %edx,%eax
 35:   83 e0 01                and    $0x1,%eax
 38:   85 c0                   test   %eax,%eax
 3a:   74 03                   je     3f <_mm_popcnt_u32+0x2c>
 3c:   ff 45 f8                incl   -0x8(%ebp)
 3f:   ff 45 fc                incl   -0x4(%ebp)
 42:   83 7d fc 1f             cmpl   $0x1f,-0x4(%ebp)
 46:   7e e1                   jle    29 <_mm_popcnt_u32+0x16>
 48:   8b 45 f8                mov    -0x8(%ebp),%eax
 4b:   c9                      leave  
 4c:   c3                      ret

00000055 <popcnt>:
 55:   55                      push   %ebp
 56:   89 e5                   mov    %esp,%ebp
 58:   ff 75 08                pushl  0x8(%ebp)
 5b:   e8 aa ff ff ff          call   a <_mm_popcnt_u32>
 60:   83 c4 04                add    $0x4,%esp
 63:   c9                      leave  
 64:   c3                      ret

486 has a dedicated instruction for byteswaps:

00000000 <__bswap_32>:
  0:   55                      push   %ebp
  1:   89 e5                   mov    %esp,%ebp
  3:   8b 45 08                mov    0x8(%ebp),%eax
  6:   0f c8                   bswap  %eax
  8:   5d                      pop    %ebp
  9:   c3                      ret

Nehalem has a dedicated instruction for popcounts:

0000001b <popcnt>:
 1b:   55                      push   %ebp
 1c:   89 e5                   mov    %esp,%ebp
 1e:   83 ec 10                sub    $0x10,%esp
 21:   8b 45 08                mov    0x8(%ebp),%eax
 24:   89 45 fc                mov    %eax,-0x4(%ebp)
 27:   f3 0f b8 45 fc          popcnt -0x4(%ebp),%eax
 2c:   90                      nop
 2d:   c9                      leave  
 2e:   c3                      ret

 

So, is a fair optimisation something that can exploit the additional instructions of the hardware? Is a benchmark fair if 2 CPUs with different capabilities run the same code to achieve the same results, or is the benchmark still fair if one CPU uses better code, because it actually can run it, to achieve the same result faster?

 

@update: Sorry, I think I asked this in the wrong section.

Edited by moog
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...