HWBOT Community Forums

moog

Members
  • Posts

    3

Converted

  • Location
    Poland

moog's Achievements

Newbie

Newbie (1/14)

10

Reputation

  1. [Tutorial] How to Install and run Linux on your PS4 - via WOLOLO

     How to Install and run Linux on your PS4 - Wololo.net
     https://www.reddit.com/r/linux/comments/3yt1dl/linux_on_the_ps4/

     Big thanks! I've already added a result. I will definitely try to see if any optimizations can be made.

     There are plenty of possibilities. Since this is Java, we're probably getting only PPE results. Off the top of my head, I can think of three things that need attention: endianness, AltiVec utilization and SPE execution. Once I'm back home and can look at the code, I will check the most obvious disaster scenario: that an endianness mismatch causes the test to give false positives and false negatives.

     I've also noticed that HWBOT has an OpenGL test. This can also be run on the PS3 under Linux, but it won't test the GPU until Linux has a DRM driver for the RSX and Mesa has a GL implementation for the RSX. With an older version of Mesa (8-10) you can compile a very incomplete GL implementation for the SPEs, which is still pretty slow and throws artifacts around like crazy. It was removed from newer releases because its development ceased. If you're going with stock Mesa, you're still testing just the PPE.
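To make the endianness concern above concrete, here is a minimal Python sketch. It is not taken from HWBOT Prime, whose code is not shown in this thread; the helper name `decode_u32` and the sample value 97 are made up for illustration. It only shows how the same four bytes decode to different integers depending on the byte order the reader assumes, which is the kind of mismatch that could turn a prime into a non-prime.

```python
import struct
import sys

def decode_u32(raw: bytes, byteorder: str) -> int:
    """Interpret 4 raw bytes as an unsigned 32-bit integer (hypothetical helper)."""
    return int.from_bytes(raw, byteorder)

# 97 is prime. Store it as a little-endian 32-bit value, as an x86 host would.
raw = struct.pack("<I", 97)

as_little = decode_u32(raw, "little")  # reader agrees with the writer
as_big = decode_u32(raw, "big")        # reader assumes the opposite byte order

print(f"host byte order:    {sys.byteorder}")
print(f"little-endian read: {as_little}")
print(f"big-endian read:    {as_big} (0x{as_big:08x})")
```

The big-endian read yields 0x61000000, an even number, so a primality check fed with that value would give a different answer than one fed with 97. Whether the benchmark actually exchanges raw bytes like this is an open question in the post above.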
  2. Hello, let's suppose that we have a benchmark for endian swaps and popcounts.

     Intro: take any 32-bit integer and three CPUs: a 386, a 486 and an Intel Nehalem.

     Procedure:
     Endian swap: let x := 0xaabbccdd; expected result: 0xddccbbaa.
     Population count: for any 32-bit integer x, the expected result is the number of bits set to 1 in x.

     My query arises from the fact that each of the three example CPUs can achieve these results in a different way, some better than others.

     386:

         00000000 <__bswap_32>:
            0:   55                     push   %ebp
            1:   89 e5                  mov    %esp,%ebp
            3:   8b 45 08               mov    0x8(%ebp),%eax
            6:   66 c1 c0 08            rol    $0x8,%ax
            a:   c1 c0 10               rol    $0x10,%eax
            d:   66 c1 c0 08            rol    $0x8,%ax
           11:   5d                     pop    %ebp
           12:   c3                     ret

         00000013 <_mm_popcnt_u32>:
           13:   55                     push   %ebp
           14:   89 e5                  mov    %esp,%ebp
           16:   83 ec 10               sub    $0x10,%esp
           19:   c7 45 f8 00 00 00 00   movl   $0x0,-0x8(%ebp)
           20:   c7 45 fc 00 00 00 00   movl   $0x0,-0x4(%ebp)
           27:   eb 19                  jmp    42 <_mm_popcnt_u32+0x2f>
           29:   8b 45 fc               mov    -0x4(%ebp),%eax
           2c:   8b 55 08               mov    0x8(%ebp),%edx
           2f:   88 c1                  mov    %al,%cl
           31:   d3 fa                  sar    %cl,%edx
           33:   89 d0                  mov    %edx,%eax
           35:   83 e0 01               and    $0x1,%eax
           38:   85 c0                  test   %eax,%eax
           3a:   74 03                  je     3f <_mm_popcnt_u32+0x2c>
           3c:   ff 45 f8               incl   -0x8(%ebp)
           3f:   ff 45 fc               incl   -0x4(%ebp)
           42:   83 7d fc 1f            cmpl   $0x1f,-0x4(%ebp)
           46:   7e e1                  jle    29 <_mm_popcnt_u32+0x16>
           48:   8b 45 f8               mov    -0x8(%ebp),%eax
           4b:   c9                     leave
           4c:   c3                     ret

         00000055 <popcnt>:
           55:   55                     push   %ebp
           56:   89 e5                  mov    %esp,%ebp
           58:   ff 75 08               pushl  0x8(%ebp)
           5b:   e8 aa ff ff ff         call   a <_mm_popcnt_u32>
           60:   83 c4 04               add    $0x4,%esp
           63:   c9                     leave
           64:   c3                     ret

     The 486 has a dedicated instruction for byte swaps:

         00000000 <__bswap_32>:
            0:   55                     push   %ebp
            1:   89 e5                  mov    %esp,%ebp
            3:   8b 45 08               mov    0x8(%ebp),%eax
            6:   0f c8                  bswap  %eax
            8:   5d                     pop    %ebp
            9:   c3                     ret

     Nehalem has a dedicated instruction for popcounts:

         0000001b <popcnt>:
           1b:   55                     push   %ebp
           1c:   89 e5                  mov    %esp,%ebp
           1e:   83 ec 10               sub    $0x10,%esp
           21:   8b 45 08               mov    0x8(%ebp),%eax
           24:   89 45 fc               mov    %eax,-0x4(%ebp)
           27:   f3 0f b8 45 fc         popcnt -0x4(%ebp),%eax
           2c:   90                     nop
           2d:   c9                     leave
           2e:   c3                     ret

     So, does an optimisation that exploits the hardware's additional instructions count as fair? Is a benchmark fair if two CPUs with different capabilities run the same code to achieve the same results, or is it still fair if one CPU uses better code, because it can actually run it, to achieve the same result faster?

     @update: Sorry, I think I asked this in the wrong section.
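The two operations in the post above can be sketched in portable Python to show that different code paths must still produce identical results, which is what makes them comparable in a benchmark at all. This is an illustrative sketch, not the code behind the disassembly; all function names here are made up. The shift-and-mask swap and the bit-by-bit loop play the role of the "no dedicated instruction" variants, and the second version of each stands in for a shorter path to the same answer.

```python
MASK32 = 0xFFFFFFFF

def bswap32_shifts(x: int) -> int:
    """Byte-swap a 32-bit value using only shifts and masks."""
    x &= MASK32
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))

def bswap32_bytes(x: int) -> int:
    """Same result via byte-order conversion: write little-endian, read big-endian."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def popcount32_loop(x: int) -> int:
    """Test each of the 32 bits in turn, similar in spirit to the 386 listing."""
    x &= MASK32
    count = 0
    for i in range(32):
        if (x >> i) & 1:
            count += 1
    return count

def popcount32_builtin(x: int) -> int:
    """Same result by counting '1' digits in the binary representation."""
    return bin(x & MASK32).count("1")

x = 0xAABBCCDD
print(hex(bswap32_shifts(x)), hex(bswap32_bytes(x)))
print(popcount32_loop(x), popcount32_builtin(x))
```

For x = 0xaabbccdd both swaps give 0xddccbbaa and both counts give 20. The variants differ only in how much work they do, which is the crux of the fairness question.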
  3. Hello, I've decided to run hwbotprime on a PS3 running Debian Linux. It went okay, but there are a couple of caveats. First, 0.8.3 is broken on big-endian. I'm not sure how this affects the results, but it's pretty serious if we're missing primes because numbers are read backwards. Second, it fails to detect the CPU clock and name. Third, the results vary significantly between running the test in the CLI (316.88ps) and in the GUI (240-280ps).

         mdec@Tycho:~/hwbot$ java -jar hwbotprime-0.8.3.jar
         OpenJDK Zero VM warning: You have loaded library /tmp/libCpuId-32-0.8.3.so which might have disabled stack guard. The VM will try to fix the stack guard now.
         It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
         Failed to load native library CpuId-32-0.8.3 on OS linux: /tmp/libCpuId-32-0.8.3.so: /tmp/libCpuId-32-0.8.3.so: ELF file data encoding not big-endian (Possible cause: endianness mismatch)
         --------- HWBOT Prime 0.8.3 ----------
         Processor detected:
         Estimating speed...
         2x n/aMHz
         211 MB memory
         Running benchmark using 2 threads.
         Starting benchmark...
         Warm up phase: ..................................................................................................... done!
         Benchmark phase: ..................................................................................................... done!
         All done! Score: 316.88.
         Hit enter to compare online, enter a filename to save to file, or type q to quit.
         ps3-results
         q
         Saved file: ps3-results.hwbot
         Hit enter to compare online, enter a filename to save to file, or type q to quit.
         Bye!

     Here's what /proc/cpuinfo has to say about CPU detection:

         mdec@Tycho:~$ cat /proc/cpuinfo
         processor : 0
         cpu : Cell Broadband Engine, altivec supported
         clock : 3192.000000MHz
         revision : 5.1 (pvr 0070 0501)

         processor : 1
         cpu : Cell Broadband Engine, altivec supported
         clock : 3192.000000MHz
         revision : 5.1 (pvr 0070 0501)

         timebase : 79800000
         platform : PS3
         model : SonyPS3

     And here's a screenshot: [screenshot]

     Additional info: kernel 3.15.10, IcedTea 2.6.4 (7u95-2.6.4-1~deb7u1)
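Since the native detection failed here, one possible fallback is to read the name and clock from the `/proc/cpuinfo` text quoted above. The following is a rough Python sketch under that assumption. The field names `cpu` and `clock` come from the output in the post; the function names are hypothetical, and this is not a description of how HWBOT Prime's CpuId library works. The sample text is embedded so the sketch runs on any machine.

```python
import re

# Sample taken from the /proc/cpuinfo output quoted in the post above.
SAMPLE = """\
processor : 0
cpu : Cell Broadband Engine, altivec supported
clock : 3192.000000MHz
revision : 5.1 (pvr 0070 0501)

processor : 1
cpu : Cell Broadband Engine, altivec supported
clock : 3192.000000MHz
revision : 5.1 (pvr 0070 0501)

timebase : 79800000
platform : PS3
model : SonyPS3
"""

def parse_cpuinfo(text):
    """Collect the 'cpu' and 'clock' fields of each 'processor' entry."""
    cpus = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "processor":
            cpus.append({})  # a new processor entry starts here
        elif cpus and key in ("cpu", "clock"):
            cpus[-1][key] = value
    return cpus

def clock_mhz(value):
    """Extract the number from a value such as '3192.000000MHz', or None."""
    match = re.match(r"([0-9.]+)\s*MHz", value)
    return float(match.group(1)) if match else None

cpus = parse_cpuinfo(SAMPLE)
name = cpus[0]["cpu"].split(",")[0]  # drop the ', altivec supported' suffix
mhz = clock_mhz(cpus[0]["clock"])
print(f"{len(cpus)}x {name} @ {mhz:.0f}MHz")
```

On a real system the text would come from reading `/proc/cpuinfo` instead of `SAMPLE`. Note that the field layout differs between architectures (x86 uses `model name` and `cpu MHz`, for example), so the key names would need adjusting per platform.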