HWBOT Community Forums

moog

Members
  • Posts

    3

Posts posted by moog

  1. Link?

    [Tutorial] How to Install and run Linux on your PS4 - via WOLOLO

    How to Install and run Linux on your PS4 - Wololo.net

    https://www.reddit.com/r/linux/comments/3yt1dl/linux_on_the_ps4/

    I think that's the part where he said HWBOT Prime is broken on big-endian...

     

     

    @Massman I will try to create an entry...

     

    EDIT: Sony Playstation 3 @ HWBOT

    Big thanks! :D I've already added a result.

    Hard to tell if this is good or bad, but it's pretty awesome to see :)

     

    The code of HWBOT prime is open source, in case you want to add code to optimize for the PS3 https://github.com/frederikcolardyn/hwbotprime

    @Strunkenbold: do we have database support for this? :D

    I will definitely try to see if any optimizations can be made :) There are plenty of possibilities. Since this is Java, we're probably getting only PPE results. Off the top of my head I can think of 3 things that require attention: endianness, AltiVec utilization and SPE execution. As soon as I'm back home and can look at the code, I will check the most obvious disaster scenario: that an endianness mismatch causes the test to give false positives and false negatives.
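
    To show what an endianness mismatch looks like in general, here is a minimal C sketch. It is not taken from the HWBOT Prime source (which is Java), and the helper names are made up for illustration. The same four bytes decode to different integers depending on whether they are read in host order or in an explicit byte order.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host (e.g. the Cell PPE), 0 on a little-endian one. */
int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 0;
}

/* Decodes 4 bytes stored least-significant-first.
   Uses shifts only, so the result is the same on every host. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Reinterprets 4 bytes in host order.
   The result depends on the endianness of the machine running it. */
uint32_t load_native32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

    With the bytes dd cc bb aa, load_le32 gives 0xaabbccdd everywhere, while load_native32 gives 0xddccbbaa on a big-endian host. Code that assumes host order is the kind that would silently read numbers backwards.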

     

    I've also noticed that HWBOT has an OpenGL test. This can also be run on the PS3 under Linux, but it won't test the GPU until Linux has a DRM driver for the RSX and Mesa has a GL implementation for RSX. If you get an older version of Mesa (8-10) you can compile a very incomplete implementation of GL for SPE, which is still pretty slow and throws artifacts around like crazy. It got removed in newer releases because its development ceased. If you're going with stock Mesa, you're still testing just the PPE.

  2. Hello, let's suppose that we have a benchmark for endian swaps and popcounts.

    Intro: Take any 32-bit integer and three CPUs: a 386, a 486 and an Intel Nehalem.

    Procedure:

    Endian swap:

    let x := 0xaabbccdd;

    expected result : 0xddccbbaa;

    Population count:

    for any 32-bit integer x, the expected result is the number of bits set to 1 in x
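
    For reference, both operations can be written in plain C without any special instructions. This is only a minimal sketch: the names bswap32_portable and popcount32_portable are made up for illustration, and it is not necessarily the source the listings in this post were compiled from.

```c
#include <stdint.h>

/* Byte swap using only shifts and masks; works on any CPU. */
uint32_t bswap32_portable(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}

/* Population count by testing each of the 32 bit positions in turn. */
unsigned popcount32_portable(uint32_t x)
{
    unsigned count = 0;
    for (int i = 0; i < 32; i++) {
        if ((x >> i) & 1u)
            count++;
    }
    return count;
}
```

    For x = 0xaabbccdd the swap gives 0xddccbbaa and the population count is 20.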

     

    My question arises from the fact that each of the 3 example CPUs can achieve these results in a different way, some better than others.

     

    386:

    00000000 <__bswap_32>:
      0:   55                      push   %ebp
      1:   89 e5                   mov    %esp,%ebp
      3:   8b 45 08                mov    0x8(%ebp),%eax
      6:   66 c1 c0 08             rol    $0x8,%ax
      a:   c1 c0 10                rol    $0x10,%eax
      d:   66 c1 c0 08             rol    $0x8,%ax
     11:   5d                      pop    %ebp
     12:   c3                      ret    
    
    00000013 <_mm_popcnt_u32>:
     13:   55                      push   %ebp
     14:   89 e5                   mov    %esp,%ebp
     16:   83 ec 10                sub    $0x10,%esp
     19:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%ebp)
     20:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
     27:   eb 19                   jmp    42 <_mm_popcnt_u32+0x2f>
     29:   8b 45 fc                mov    -0x4(%ebp),%eax
     2c:   8b 55 08                mov    0x8(%ebp),%edx
     2f:   88 c1                   mov    %al,%cl
     31:   d3 fa                   sar    %cl,%edx
     33:   89 d0                   mov    %edx,%eax
     35:   83 e0 01                and    $0x1,%eax
     38:   85 c0                   test   %eax,%eax
     3a:   74 03                   je     3f <_mm_popcnt_u32+0x2c>
     3c:   ff 45 f8                incl   -0x8(%ebp)
     3f:   ff 45 fc                incl   -0x4(%ebp)
     42:   83 7d fc 1f             cmpl   $0x1f,-0x4(%ebp)
     46:   7e e1                   jle    29 <_mm_popcnt_u32+0x16>
     48:   8b 45 f8                mov    -0x8(%ebp),%eax
     4b:   c9                      leave  
     4c:   c3                      ret
    
    00000055 <popcnt>:
     55:   55                      push   %ebp
     56:   89 e5                   mov    %esp,%ebp
     58:   ff 75 08                pushl  0x8(%ebp)
     5b:   e8 aa ff ff ff          call   a <_mm_popcnt_u32>
     60:   83 c4 04                add    $0x4,%esp
     63:   c9                      leave  
     64:   c3                      ret

    486 has a dedicated instruction for byteswaps:

    00000000 <__bswap_32>:
      0:   55                      push   %ebp
      1:   89 e5                   mov    %esp,%ebp
      3:   8b 45 08                mov    0x8(%ebp),%eax
      6:   0f c8                   bswap  %eax
      8:   5d                      pop    %ebp
      9:   c3                      ret

    Nehalem has a dedicated instruction for popcounts:

    0000001b <popcnt>:
     1b:   55                      push   %ebp
     1c:   89 e5                   mov    %esp,%ebp
     1e:   83 ec 10                sub    $0x10,%esp
     21:   8b 45 08                mov    0x8(%ebp),%eax
     24:   89 45 fc                mov    %eax,-0x4(%ebp)
     27:   f3 0f b8 45 fc          popcnt -0x4(%ebp),%eax
     2c:   90                      nop
     2d:   c9                      leave  
     2e:   c3                      ret

     

    So, is it a fair optimisation to exploit the additional instructions of the hardware? Is a benchmark fair if 2 CPUs with different capabilities run the same code to achieve the same results, or is it still fair if one CPU uses better code, because it can actually run it, to achieve the same result faster?
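
    One way to picture the "same source, different code" case is to let the compiler choose. The sketch below assumes GCC or Clang, whose __builtin_bswap32 and __builtin_popcount builtins may be lowered to bswap or popcnt when the build targets a CPU that has them (for example with -mpopcnt), and to a software sequence otherwise. The wrapper names byteswap32 and popcount32 are hypothetical.

```c
#include <stdint.h>

/* Byte swap: compiler builtin where available, shifts and masks otherwise. */
uint32_t byteswap32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    return (x << 24) | ((x & 0x0000ff00u) << 8) |
           ((x >> 8) & 0x0000ff00u) | (x >> 24);
#endif
}

/* Population count: compiler builtin where available, bit-clearing loop otherwise. */
unsigned popcount32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    /* Clear the lowest set bit until none remain. */
    unsigned count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
#endif
}
```

    The results are identical on every target; only the instructions emitted differ, which is exactly the situation the fairness question is about.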

     

    @update: Sorry, I think I asked this in the wrong section.

  3. Hello, I've decided to run hwbotprime on a PS3 running Debian Linux. It went okay, but there are a couple of caveats.

     

    First, 0.8.3 is broken on big-endian. Not sure how this may affect results, but it's pretty serious if we're missing primes because numbers are read backwards.

    Secondly, it fails to detect the CPU clock and name.

    Thirdly, the results vary significantly between running the test in CLI (316.88ps) and GUI (240-280ps).

     

    mdec@Tycho:~/hwbot$ java -jar hwbotprime-0.8.3.jar 
    OpenJDK Zero VM warning: You have loaded library /tmp/libCpuId-32-0.8.3.so which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
    Failed to load native library CpuId-32-0.8.3 on OS linux: /tmp/libCpuId-32-0.8.3.so: /tmp/libCpuId-32-0.8.3.so: ELF file data encoding not big-endian (Possible cause: endianness mismatch)
    --------- HWBOT Prime 0.8.3 ----------
    
    Processor detected:
    
    Estimating speed... 2x n/aMHz
    211 MB memory
    Running benchmark using 2 threads.
    Starting benchmark...
    Warm up phase:   ..................................................................................................... done!
    Benchmark phase: ..................................................................................................... done!
    All done!
    Score: 316.88.
    Hit enter to compare online, enter a filename to save to file, or type q to quit.
    ps3-results
    q
    Saved file: ps3-results.hwbot
    Hit enter to compare online, enter a filename to save to file, or type q to quit.
    Bye!

    Here's what /proc/cpuinfo has to say about CPU detection:

    mdec@Tycho:~$ cat /proc/cpuinfo
    processor       : 0
    cpu             : Cell Broadband Engine, altivec supported
    clock           : 3192.000000MHz
    revision        : 5.1 (pvr 0070 0501)
    
    processor       : 1
    cpu             : Cell Broadband Engine, altivec supported
    clock           : 3192.000000MHz
    revision        : 5.1 (pvr 0070 0501)
    
    timebase        : 79800000
    platform        : PS3
    model           : SonyPS3
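
    The clock value is right there in the text, so a fallback could simply parse it. The C sketch below is only an illustration of reading the "clock" field in the format shown above. The function name cpuinfo_clock_mhz is hypothetical, and this is not how the CpuId library in HWBOT Prime works (I haven't checked that). It takes the file contents as a string so it can be tried without a PS3.

```c
#include <stdio.h>
#include <string.h>

/* Scans /proc/cpuinfo-style text for the first line of the form
   "clock : <number>MHz" and returns the number in MHz.
   Returns -1.0 if no such line exists (x86 kernels, for example,
   use a "cpu MHz" field instead). */
double cpuinfo_clock_mhz(const char *text)
{
    const char *line = text;
    while (line && *line) {
        double mhz;
        /* Whitespace in the format matches any run of tabs or spaces. */
        if (sscanf(line, "clock : %lfMHz", &mhz) == 1)
            return mhz;
        /* Advance to the start of the next line, if any. */
        const char *nl = strchr(line, '\n');
        line = nl ? nl + 1 : NULL;
    }
    return -1.0;
}
```

    Fed the output above, this returns 3192.0 for the first processor entry.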

    And here's a screenshot:

    z247EYf.png

    Additional info: kernel 3.15.10, IcedTea 2.6.4 (7u95-2.6.4-1~deb7u1)
