HWBOT Community Forums

GPUPI - SuperPI on the GPU


_mat_

Recommended Posts

I took a screenshot which I will post when I can, but the Legacy version works with HWiNFO disabled.
I guess my C++ 2015 runtime got corrupted, since I could run 3.1 not long ago.
I got an app crash on file saving. I was happy :o the calculation ran fine, and faster (36 sec instead of 47), but no file was saved :D

 

Link to comment
Share on other sites

I have to agree with bigblock because he uses ASRock gear... wait, sorry, did I say that out loud? Ahem.

 

Yes, let's push forward! Make the bench run efficiently... why make it take 7 minutes when it could be 4? Always push forward. If rebenching is so hard, you don't have to... but you will be beaten. This happens every time a new generation of HW is released; it's not much different than that, IMO.

  • Like 2
  • Thanks 2
Link to comment
Share on other sites

Then let's push forward! I love it! :D

A few facts:

  • Due to the less complex calculation code, the comparison between different GPU/CPU architectures and GPGPU APIs (CUDA vs OpenCL) is fairer than it ever was. No extra code for compatibility, just what's necessary to make it run.
  • The improved handling of OpenCL makes better use of the devices (= CPUs and AMD GPUs). That results in fewer differences between the OpenCL drivers. Plus Batch Size and Reduction Size are not as important as they were before. They still matter if you want the golden cup, but beginners won't score as badly if they just click Calculate + Ok.
  • AMD OpenCL 1.2 now trumps everything that Intel has got. Even for 100M on Intel CPUs. The Intel OpenCL drivers suck.
  • AMD OpenCL 2.0 works much better now. It still has problems with bigger reduction sizes like 512 and 1024. It seems like 256 is the best currently.

I have just uploaded a version with some last minute fixes for the HWBOT submission (better wording, removed the screenshot canceling).

Please download GPUPI 3.3 from here: https://www.overclockers.at/news/gpupi-3-ist-final
It's now official! 

Link to comment
Share on other sites

On 4/5/2018 at 12:23 AM, _mat_ said:

Fair warning: GPUPI 3.3 is now officially available and will give a decent speedup on all calculations. I'd like to clarify the reasons behind this, because I know that means some rebenching might be necessary for the top ranks.

It's the first time since GPUPI 1.x that I changed code inside the calculation kernel. Yes, GPUPI had some speed increases before, but only because new hardware was officially supported/optimized or because new CUDA/OpenCL versions offered new features that GPUPI could use to stay on the edge of what's out there.

I am currently implementing a native path for CPUs that will take advantage of OpenMP and AVX/AVX2. OpenMP needs no extra installation (unlike OpenCL, it's compiled into the application) and will be far more efficient, highly optimizable and all in all faster than any OpenCL implementation for CPUs out there. Last but not least, OpenCL is treated badly by CPU vendors and rarely gets updates, optimizations or fast support for new hardware. With OpenMP I can decide all that for myself and optimize/support any CPU I can get my hands on. Plus GPUPI gets less complicated because no additional drivers will need to be installed. GPUs already get their GPGPU API with the graphics driver (with the exception of Intel's iGPUs) and CPUs won't need anything installed anymore. Just start GPUPI and you are good to go. :)
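As a rough idea of what that looks like, here is a minimal sketch (hypothetical code, not GPUPI's actual implementation): the parallel CPU path is just a pragma plus a compiler flag, so it ships inside the binary with nothing else to install.

```cpp
// Hypothetical sketch, not GPUPI's actual code: an OpenMP CPU path needs no
// separate runtime installation, just a compiler flag (e.g. -fopenmp -mavx2).
#include <cmath>
#include <cstdio>

// Naive BBP-type series for pi in double precision, purely to show the
// OpenMP pattern; real digit extraction works on integers, not doubles.
double bbp_pi(long long n_terms)
{
    double sum = 0.0;

    // The OpenMP runtime is part of the compiled binary, nothing to install.
    #pragma omp parallel for reduction(+:sum)
    for (long long k = 0; k < n_terms; ++k) {
        const double p = std::pow(16.0, static_cast<double>(k));
        sum += (4.0 / (8 * k + 1) - 2.0 / (8 * k + 4)
              - 1.0 / (8 * k + 5) - 1.0 / (8 * k + 6)) / p;
    }
    return sum;
}

int main()
{
    std::printf("%.15f\n", bbp_pi(12));  // ~3.141592653589793
}
```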

The bottom line is that I want to get rid of OpenCL for CPUs in the long run. To make that happen I needed to slim down the calculation part that handles 128-bit integers to improve vectorization (for AVX support), which ultimately led to the speedup; a minimal sketch of that idea follows after the list below. But why the hell did I decide to release the improved code already with GPUPI 3.3? Because:

  1. It resolves a number of compatibility issues that I've had since the first release of GPUPI. I had to manually tweak some kernel code for older devices, and that's a very time-consuming thing.
  2. I always want to release the best version of GPUPI that my current abilities allow me to. It's kind of my personal way of overclocking (with code).
  3. I had some feedback that GPUPI 3.1 is faster than 3.2 (which can be true for some GPU/CPU combinations), and so people are currently using 3.1, 3.1.1 or 3.2 for results. A fourth version of GPUPI (3.3) with similar speed wouldn't have made the situation any easier, either for maintaining or for benching. I think it's no fun at all to have to try different versions of the same bench to get the best result. The speedup of 3.3 resolves that, because now there is only one obvious choice.
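And the 128-bit slimming mentioned above, again as a hypothetical sketch rather than the real kernel: narrowing the intermediates is what gives the compiler a realistic shot at AVX auto-vectorization.

```cpp
// Hypothetical illustration of the idea, not GPUPI's kernel code. The real
// calculation does modular arithmetic; this only shows why slimming 128-bit
// intermediates down to 64 bits matters for vectorization.
#include <cstdint>
#include <vector>

// 128-bit products (GCC/Clang extension): correct for full 64-bit inputs,
// but compilers will not vectorize this loop.
void mul_128(const std::vector<std::uint64_t>& a,
             const std::vector<std::uint64_t>& b,
             std::vector<unsigned __int128>& out)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<unsigned __int128>(a[i]) * b[i];
}

// Slimmed variant: with 32-bit inputs each product fits into 64 bits, so the
// loop maps onto AVX2 widening multiplies and is auto-vectorized by current
// compilers.
void mul_64(const std::vector<std::uint32_t>& a,
            const std::vector<std::uint32_t>& b,
            std::vector<std::uint64_t>& out)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<std::uint64_t>(a[i]) * b[i];
}
```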

Without further ado, here is GPUPI 3.3: https://www.overclockers.at/news/gpupi-3-ist-final (it's in German for now but there are lots of images ;))

This is already screwing up the rankings, like this GPU on air and ~300 MHz less beating SS:
http://hwbot.org/submission/3826281_niuulh_gpupi___1b_geforce_gtx_970_24sec_0ms/

My suggestion would be to split this into two different rankings, like CB11.5 and 15. My last GPUPI for CPU 1B run (and I've done many!) took about $30 worth of LN2 and several hours. Let's not make people rebench that just because a new version came out. New hardware releases already do that often enough.

@richba5tard @Leeghoofd What are your thoughts? IMO GPUPI runs on enough different hardware that making two rankings wouldn't split it too much.

Link to comment
Share on other sites

3 hours ago, unityofsaints said:

This is already screwing up the rankings, like this GPU on air and ~300 MHz less beating SS:
http://hwbot.org/submission/3826281_niuulh_gpupi___1b_geforce_gtx_970_24sec_0ms/

My suggestion would be to split this into two different rankings, like CB11.5 and 15. My last GPUPI for CPU 1B run (and I've done many!) took about $30 worth of LN2 and several hours. Let's not make people rebench that just because a new version came out. New hardware releases already do that often enough.

@richba5tard @Leeghoofd What are your thoughts? IMO GPUPI runs on enough different hardware that making two rankings wouldn't split it too much.

Yeah, that's me. I spent 2 rough weeks working on getting that gold but am happy about the change. We NEED active developers like _mat_ in our community. GPUPI works for both CPU and GPU and does it well, doesn't cost much, has one of the best anti-cheat integrations, and isn't HWBOT Prime (aka "Slot Machine") or XTU (which favors slower hardware just because it's new, and Intel wants all your monies).

  • Like 2
Link to comment
Share on other sites

18 hours ago, _mat_ said:

That's not possible; the C++ 2015 runtime has been necessary since 2.x. It might just have been installed by another application's installer.

That's definitely not enough information to get anything fixed. Are you opening the normal version with your Q9550? That can result in an OpenCL error if you don't have an OpenCL 2.x runtime installed. GPUPI would complain about a missing "clCreateCommandQueueWithProperties" function. Old hardware should only use the Legacy version; it supports OpenCL 1.1.
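For context, a minimal sketch (assumed code, not GPUPI's) of why that function name shows up in the error: clCreateCommandQueueWithProperties only exists from OpenCL 2.0 on, so a Legacy-style path has to fall back to the old clCreateCommandQueue on 1.x runtimes.

```cpp
// Hypothetical sketch, not GPUPI's actual code. On a plain OpenCL 1.x runtime
// the 2.0 entry point simply does not exist, which is the error the normal
// (non-Legacy) build runs into.
#define CL_TARGET_OPENCL_VERSION 300
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS   // allow the 1.x fallback call
#include <CL/cl.h>
#include <cstring>

cl_command_queue create_queue(cl_context ctx, cl_device_id dev, cl_int* err)
{
    char version[64] = {};
    clGetDeviceInfo(dev, CL_DEVICE_VERSION, sizeof(version), version, nullptr);

    // The device reports e.g. "OpenCL 1.1 ..." or "OpenCL 2.0 ...".
    if (std::strncmp(version, "OpenCL 1.", 9) == 0)
        return clCreateCommandQueue(ctx, dev, 0, err);   // legacy 1.x path

    cl_queue_properties props[] = { 0 };                 // default properties
    return clCreateCommandQueueWithProperties(ctx, dev, props, err);
}
```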

 

The app crashes when you save a file to submit later, with the normal or Legacy version (I had it on an EP45-UD3P with a Q9550 and on a Maximus 8 Ranger with a 7600K), if you click to SKIP CPU detection.
For the Q9550, I did not do HWiNFO detection. For the 7600K I did, but it crashed in both cases.

If I do not skip CPU detection, the file is saved normally, no app crash.

Hope this is clear and may help.

Link to comment
Share on other sites

GPUPI 3.3.1

Not a day old and already a bugfix release. :P

Download here: https://www.overclockers.at/news/gpupi-3-is-now-official

Changelog:

  • Bugfix for kernel compilation on old AMD graphics cards
  • Bugfix for command line mode when using the "-a" parameter (optional API selection)
  • Improved error message, including a tip, when the calculation fails because the watchdog timer resets the graphics driver (only happens on old graphics cards when a kernel takes longer than 5 seconds)
  • Improved error message when an OpenCL device runs out of resources, including a tip on how to fix it (for example on old AMD graphics cards with reduction size 512)
  • Bugfix for Multi-GPU Mode: If one of the devices aborts the calculation due to an error, the benchmark run is now aborted.

 

Link to comment
Share on other sites

2 minutes ago, GeorgeStorm said:

Thought I'd have a quick go; it seemed to crash whilst saving the screenshot every time, with skipping and not skipping detection, and with HWiNFO full or off.

 

Specs:

W7, Q9550, GTX970 (what I was using to run it).

I tried only on the 7600K; it may also crash with the Q9550, with or without skipping CPU detection...

Link to comment
Share on other sites

20 minutes ago, GeorgeStorm said:

Thought I'd have a quick go; it seemed to crash whilst saving the screenshot every time, with skipping and not skipping detection, and with HWiNFO full or off.

 

Specs:

W7, Q9550, GTX970 (what I was using to run it).

Can you please do me a favor and post a screenshot with the Debug Log open (Menu: Tools => Debug Log)? Please open the log window before saving the result file. Is that possible, or is the application closed instantly?

Link to comment
Share on other sites

41 minutes ago, Jokot said:

I just tried with both a 290X and a 390 and it crashed right when the bench was supposed to start (GPUPI 3.3.1).

Thanks to the open Debug Log window I can narrow this down to the memory not being properly detected by HWiNFO.

Can you please test GPUPI 3.2 as well? No screenshot needed, just a confirmation that it's not working there either. Thanks!

Link to comment
Share on other sites

2 minutes ago, _mat_ said:

Thanks to the open Debug Log window I can narrow this down to the memory not being properly detected by HWiNFO.

Can you please test GPUPI 3.2 as well? No screenshot needed, just a confirmation that it's not working there either. Thanks!

3.2 worked no problem.

Link to comment
Share on other sites

The last bugfix release is already an hour old, so let's post a new one: GPUPI 3.3.2

Download here: https://www.overclockers.at/news/gpupi-3-is-now-official

Changelog:

  • Bugfix: Application crashed while saving a result file
  • Bugfix: Some Intel iGPs could not compile the OpenCL kernels due to an incompatibility
  • Bugfix: Application crashed on certain systems during benchmark run initialization (due to memory detection)
  • The hardware detection of the memory manufacturer is a delicate process and can crash the application, so it will be skipped when running HWiNFO in "Safe Mode"
  • Improved the error message when an OpenCL reduction kernel can't be initialized due to limited shared memory (only possible on weak iGPs)

All open bugs should be fixed now! Thanks to everybody that helped to improve GPUPI!

  • Like 2
  • Thanks 1
Link to comment
Share on other sites

I mean, originally I thought it was just supporting new instructions, which is the same as new XTU versions. But now it seems it's just more efficient code, which I'm not sure about. Either way, if we're moving in this direction, the sooner the change is made the better. I take the quick and painful approach to band-aids myself.

Link to comment
Share on other sites

Yes, it's cleaner code with improved comparability between different devices, and the OpenCL path is now implemented correctly, as it always should have been.

The different OpenCL drivers produce closer results (although AMD OpenCL 1.2 = AMD APP SDK 2.9-1 is now the best choice in all categories), Batch Size and Reduction Size are not as picky as they were before, and on NVIDIA cards the OpenCL implementation comes very close to the CUDA implementation, which indicates that everything is now done right.

The bottom line is GPUPI is now much better as a benchmark in general. I would have done this with GPUPI 1 already, if I could have. But I wasn't good enough at OpenCL coding and mathematics back then (OpenCL is a brutal beast though).

The good news is that something like this won't happen again. I will not touch the algorithm anymore, because it's pretty much maxed out the way I do it. The next step is an OpenMP path that gets rid of the OpenCL implementation, but that's many months away and will not overrule current results. The CPU path will be split into OpenCL and OpenMP (or Native, I don't know yet), so no rebenching is necessary. The new path will make use of AVX and whatever comes next to support the hell out of everything that comes my way. :)

I know the XTU coders and they don't seem to be interested in overclocking, let alone competitive OC. They just do their job, and as far as my experience with the XTU SDK goes, it's not entirely a good one (sorry guys). I really try to do things differently with GPUPI. I want it to be on the bleeding edge too, but I wouldn't have introduced the speedup with 3.3 if everybody in this thread had stood against it. As I already said: GPUPI should first and foremost be fun to bench.

Link to comment
Share on other sites
