HWBOT Community Forums

GPUPI - SuperPI on the GPU


_mat_


Did you install the newest drivers? Are you sure they can handle OpenCL? If a device doesn't support double precision but can be detected on the system, it will get listed as ignored when starting the benchmark. I'm not sure about your cards; the GTS 250 seems to have double support, I think.
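For anyone curious what that detection amounts to: in OpenCL, double precision is an optional feature that the driver advertises in the device's extension string. A minimal sketch of the check (the helper name is mine, not from GPUPI's source):

```cpp
#include <cassert>
#include <string>

// OpenCL drivers advertise optional features in the CL_DEVICE_EXTENSIONS
// string, queried via clGetDeviceInfo. Double precision shows up as
// "cl_khr_fp64"; a device whose extension string lacks it would end up
// on the "ignored" list the post describes.
bool supportsFp64(const std::string& extensions) {
    return extensions.find("cl_khr_fp64") != std::string::npos;
}
```

Older AMD hardware may instead report the vendor variant cl_amd_fp64, so a real check might look for both.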

 

Detection is mostly a driver issue and has not much to do with the benchmark itself.

 

I'm going to retry it with the GTS 250, but now with the 344.11 drivers instead of the 340.52. It's working on a 560 Ti 448 with the 344.11 drivers.


If you don't mind... please add a 2nd setting which takes a bit longer. What about 32B? :) Sure, most people prefer short benchmarks, but something heavy is also worth considering.
10B already takes about 15 minutes on my GTX 980. 20B would be possible without adapting the algorithm for higher precision. But I have to test.
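For context, and as an assumption on my part since the thread doesn't spell out the algorithm: digit-extraction benchmarks of this kind are typically built on the Bailey–Borwein–Plouffe formula, which yields hexadecimal digits of pi directly and splits into four independent sub-series that parallelize well:

```latex
\pi = \sum_{k=0}^{\infty} \frac{1}{16^k}
      \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)
```

Extracting digits deep into the expansion relies on modular exponentiation inside these sums, which is where a double-precision limit on how far the algorithm can reach would come from.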

 

I am currently working on a CUDA implementation. Just curious to see how good the OpenCL implementation of NVIDIA really is.


  • 2 weeks later...

[Screenshot: GPUPI version 1.3 benchmark with OpenCL and CUDA]

 

Guys, version 1.3 is here. The whole codebase was refactored and now supports multiple APIs, which are loaded when the system supports them. The new version also includes standalone builds for OpenCL and CUDA. The main reason is that the OpenCL version will run on Windows XP, while the CUDA version won't.

 

The CUDA implementation was pretty easy and also needs less code. Especially setting up the application and preparing the calculation was a piece of cake compared to OpenCL. That said, I tried to be as fair as possible and implemented both APIs with each of their advantages, while still relying on the same algorithms and the same basic optimizations. I've also adjusted the OpenCL code a little to bring them closer together, so the new version might differ by a few milliseconds from the results of 1.2. Please use the new version from now on.

 

As requested I added two more digit targets: 20B and 32B. Smaller graphics cards and CPUs will have to crunch on those for days. Karl would have loved it! :D

Have fun and let me know your results and what you think!

 

Download: GPUPI Beta 1.3


I've just added "GPUPI - 32B" and "GPUPI for CPU - 1B" to the benchmarks. Let's hope you use it. :)

 

Btw, I guess there should be a discussion about whether it's allowed to use CUDA on NVIDIA cards to compete in the rankings. Well, that's why I implemented CUDA and OpenCL so carefully and so close together. I think it would be fair, because any performance improvement is due to the vendor's implementation and optimization of the kernel, which is basically the same.


Great benchmark, love it!

 

I would vote for 2B and 32B to be accepted as 'retail' benchmarks once the benchmark passes HWBOT validation.

 

I would also lock the benchmark window in place when the "Pi Calculation is Done!" message box comes up. Just for the sake of nostalgia :)

 

Here's my workstation machine with FirePro W9000:


 

Wannabe W7000 (flashed 7870, slower DP):


Edited by tiborrr

Thanks Nico, very much appreciated! :)

 

I will change the message box, good idea. I really wanted to implement it as close to SuperPi as possible, but changed what I felt was outdated or not well done in the original benchmark. For example, the window can be moved after the message box for a successful calculation is shown. The screenshot weirdos will thank me for this - yeah, I am one of those. It was also important for me to have an options file that remembers what was set last time.

 

Regarding the default bench settings for the rankings, I'll let you guys decide. I will update the bench to show them as the default too.

Edited by _mat_

It would be an honour! I promise to support the benchmark actively in the foreseeable future.

 

Btw: Next stop is multi-GPU support. ASUS just supported me with a couple of GTX 980s for an overclocking show in Vienna today. I will use them wisely. :D


I fully concur with 1B and 32B. That way we're covered for a couple of years (y)

 

For the sake of nostalgia please:

- use the original font

- remove the cancel button on calculation start notification :)

 

Also, here's the scaling of my HD7870 (Pitcairn):

[Scaling chart: HD 7870]

Was a bit busy with an overclocking show the last few days. If anybody wants to see some pictures, have a look here. :)

 

First off, thanks for the nice scaling diagram. It clearly shows that GPUPI is not bandwidth limited in any way. The millions or even billions of values calculated in parallel always stay in graphics memory and never have to be copied to the host. That's because I've implemented not only the pure calculation on the GPU; a two-stage memory reduction is done right afterwards using shared memory inside the workgroups. Only two doubles have to be transferred to the host afterwards to accumulate the final sum of each series (there are four of them).
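A rough CPU-side model of that two-stage reduction (function name and group size are mine for illustration, not from GPUPI's source):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stage 1 collapses each "workgroup" chunk to a single partial sum, the
// way a kernel would via shared memory. Stage 2 reduces the partials, so
// only a couple of doubles ever need to cross over to the host.
double twoStageReduce(const std::vector<double>& values, std::size_t groupSize) {
    std::vector<double> partials;  // stage 1 output, stays "in device memory"
    for (std::size_t base = 0; base < values.size(); base += groupSize) {
        std::size_t end = std::min(base + groupSize, values.size());
        partials.push_back(std::accumulate(values.begin() + base,
                                           values.begin() + end, 0.0));
    }
    // Stage 2: on a GPU this would be a second, much smaller kernel launch.
    return std::accumulate(partials.begin(), partials.end(), 0.0);
}
```

Because the reduction shrinks the data on the device before any transfer, PCIe bandwidth never becomes the bottleneck, which matches the scaling shown in the chart.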

 

Regarding your suggestions: when writing the bench I initially wanted to use the original font. But it has a lot of kerning and I was not able to fit in all the information without resizing the window to strange, un-SuperPi-like proportions. As I don't like the original font that much anyway, I thought I'd better choose something more readable. I also implemented the text as an editable WIN32 control, because I wanted the text to be selectable for copying information, something I missed in SuperPI. That comes at the cost of control over the spacing of the text itself. SuperPI uses GDI draw calls, where you can specify pixel coordinates to place the text.

 

I've also put some thought into the cancel button before calculation; it's not a bug. ;)

I thought it was a good idea to be able to cancel. You know how it is benching with LN2: you open the bench, press Calculate and wait for the message box to start. Then focus is back on the temperature, pouring LN2 to push the component to its maximum. When ready, press OK. But sometimes things are not ready or you've forgotten something. Now you have the choice to go back, quit the application and take care of it.

 

But guys, if you want it to be more nostalgic, I can try. I just wanted to let you know that I put some thought into it and didn't change things for nothing. :)



Nice benchmark!

Please add an option to select which GPU device does the calculations. Then in multi-GPU systems we'll be able to select the best overclocker, like we play with affinity in Task Manager to pick the best overclocking core while benching SuperPi.


[Screenshot: GPUPI 1.4 preview]

 

First official release version is here! Just a few minor bugfixes, no changes to the GPU code or the results. I recommend using this version for benching though.

 

Changelog

 

  • Explicit device selection for SLI, Crossfire and systems with multiple sockets and CPUs, implemented for CUDA as well as OpenCL. Important: The sort order of the devices depends on the driver and therefore the vendor implementation. In my tests this is the same order in which GPU-Z sorts its devices. Some overclocking tools might order them their own way.
  • Bugfix: The previously selected CUDA graphics card was not correctly preselected in the settings dialog
  • The final message box after the calculation is now shown exactly as in SuperPI
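On the device-order caveat above, a tiny sketch of why the same index can mean different cards in different tools (everything here is a hypothetical model, not GPUPI's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// The settings dialog stores only a device index; that index is resolved
// against whatever order the driver happens to enumerate devices in.
// If another tool enumerates differently, its index 0 may be a
// different physical card.
struct Device { std::string name; };

const Device& resolveSelection(const std::vector<Device>& driverOrder,
                               std::size_t settingsIndex) {
    if (settingsIndex >= driverOrder.size())
        throw std::out_of_range("selected device not present");
    return driverOrder[settingsIndex];
}
```

So when picking a card for benching, it is worth confirming against a tool that uses the same enumeration order, such as GPU-Z in the tests mentioned above.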

 

Download: GPUPI 1.4 (723 KB)


How much does this benefit from FP64? Because if FP64 makes a big impact the WR will always belong to compute cards and that's just no fun for most people.

 

You mean just like many WRs belong to 2P/4P Xeons that cost more than a car? Get with it, we want to see the absolute record possible, not what down-to-earth Joe can afford too.

 

EDIT: @mat

 

Can you implement i18n support?


How much does this benefit from FP64? Because if FP64 makes a big impact the WR will always belong to compute cards and that's just no fun for most people.
Good double precision helps, but choosing the right weapon - currently the R9 290 - and overclocking it to the maximum is the key in this benchmark. So FirePro and Quadro won't have a chance to fetch the cups if they can't be overclocked.

 

Can you implement i18n support?
Some of it already uses WIN32's WCHAR, but not all of it. Why, have you seen any problems with the English yet? Anyway, I have no intention of translating it.
