HWBOT Community Forums

GPUPI - SuperPI on the GPU


_mat_


This is so strange, it's the only bench where I have ever had an issue that I couldn't eventually figure out. I guess I should try a different mobo and CPU.
Have you tried a memory benchmark yet? Try AIDA64 to test your bandwidth; it should be affected as well.

 

Otherwise it's a driver issue, but I doubt it. We have never had an efficiency problem with the memory reduction before. It is, by the way, a very common technique for summing up a lot of memory in parallel. The pi calculation itself depends on much more than that to be efficient.


I will give it a shot. The only memory benchmark I have run on this system is Geekbench, and when I compare my memory scores with others, they seem perfectly normal.

If my bandwidth were off, the memory part of that bench would probably be affected too, right? I will run AIDA. Also, isn't it strange that 1/64 and 4/512 are so far apart? 1/64 is very close to normal and 4/512 isn't.

 

Strong Island`s Geekbench3 - Multi Core score: 14551 points with a Core i3 6100


I don't know what Geekbench really does when benching memory. If it's mostly a bandwidth test, it should be affected as well.

 

The gap between 4/512 and 1/64 says a lot. The better the batch size matches the architecture, the faster the bench will be, because the workload is then aligned to the maximum work size that can run in parallel. 4M seems to be the best choice for the 6100 with 2 cores/4 threads.

 

Much the same is true for the reduction size. The bigger the better, because 512 means that up to 512 sums are added into one per step, until only one sum remains. Let's say we want to sum up 8192 single numbers; that would be:

 

step 1: sum of 512 numbers, sum of 512 numbers, ... (16 times)

step 2: sum of 16 numbers = result

 

Whereas a reduction size of 64 would produce:

 

step 1: sum of 64 numbers, sum of 64 numbers, sum of 64 numbers ... (128 times)

step 2: sum of 64 numbers, sum of 64 numbers

step 3: sum of 2 numbers = result

 

If you consider that GPUPI produces billions of partial results that need to be added up, then 512 also needs far fewer steps in general to sum up the batches after they are calculated. Additionally, the bigger the batch size, the fewer reduction passes have to be run for the calculation. So these two values mean a lot for the whole computation. :)
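
The step counts above can be reproduced with a small Python sketch. It is only an illustration of a chunked sum on the CPU, not GPUPI's actual kernel code; the function name `chunked_reduce` and its return shape are made up for this example.

```python
def chunked_reduce(values, reduction_size):
    """Sum `values` by repeatedly adding groups of up to `reduction_size` items.

    Returns the final sum and a list with the number of partial sums
    that remain after each pass (hypothetical helper, for illustration only).
    """
    if reduction_size < 2:
        raise ValueError("reduction_size must be at least 2")
    sums = list(values)
    sizes = []
    while len(sums) > 1:
        # One pass: every group of `reduction_size` neighbours becomes one sum.
        sums = [sum(sums[i:i + reduction_size])
                for i in range(0, len(sums), reduction_size)]
        sizes.append(len(sums))
    return (sums[0] if sums else 0), sizes


numbers = list(range(8192))
total_512, passes_512 = chunked_reduce(numbers, 512)
total_64, passes_64 = chunked_reduce(numbers, 64)
print(passes_512)  # [16, 1]
print(passes_64)   # [128, 2, 1]
```

With 8192 inputs, a reduction size of 512 finishes in two passes, while 64 needs three. Both give the same total; only the number of passes changes.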


Yeah, it's just strange, because the top dual-core scores are from 6100s and use 4/512. I'm sorry, I'm probably being annoying, but I'm just so confused.

I even switched mobos today, did a fresh install and didn't disable any services, and 1/64 and 4/512 are still about 7 sec apart, with 1/64 being faster. The bigger my batch size, the higher my reduction time.

With GPUPI 2.2, am I supposed to have OpenCL drivers in the GPUPI folder, like the CUDA drivers are? I don't have those in my folder when I download GPUPI.

Edited by Strong Island

No, it's not necessary. OpenCL works out of the box when installed on the system.

Guest PH Benchmarker
What exactly do you mean by crashing? A runtime assertion? Please specify the exact error message and have a look at GPUPI.log as well.

A mangled OpenCL driver is, by the way, a much more common cause of such an error. Please uninstall all drivers and reinstall them.

 

GPUPI Log:

LOG START at 2016-06-21 20:57:42 ----------------------
Timer frequency is only 3139160 Hz. HPET timer not found!
CUDA driver version is insufficient for CUDA runtime version
Timer frequency is only 3139160 Hz. HPET timer not found!
Starting run to calculate 1000000 digits with 1 batches
Batch Size: 1M
Maximum Reduction Size: 64
Message box: Press OK to start the calculation. (Start)
Timer frequency is only 3139160 Hz. HPET timer not found!

I'm running Windows 7 x64


Try enabling HPET (High Precision Event Timer) in the BIOS, and read the FAQ section about HPET at the link below. You'll also need to run this in Command Prompt (Admin):

 

bcdedit /set useplatformclock yes

 

https://www.overclockers.at/news/gpupi-international-support-thread

 

EDIT: Make sure you have the Visual Studio 2013 runtime installed; the download is named vcredist_x64.exe.
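
To check whether the log still reports a low timer frequency after enabling HPET, a small Python sketch like the following can scan the log text. The message format is taken from the log posted above. The 10 MHz threshold is an assumption based on the minimum counter frequency in the HPET specification; it is not necessarily the check GPUPI itself performs, and the function name is made up for this example.

```python
import re

# Assumption: 10 MHz is the minimum HPET counter frequency per the HPET spec.
# GPUPI's own threshold is not documented in this thread.
HPET_MIN_HZ = 10_000_000


def timer_frequencies(log_text):
    """Return every timer frequency (in Hz) reported in the given log text."""
    pattern = re.compile(r"Timer frequency is only (\d+) Hz")
    return [int(match.group(1)) for match in pattern.finditer(log_text)]


# Lines copied from the log posted in this thread.
sample_log = """LOG START at 2016-06-21 20:57:42 ----------------------
Timer frequency is only 3139160 Hz. HPET timer not found!
CUDA driver version is insufficient for CUDA runtime version
"""

freqs = timer_frequencies(sample_log)
for hz in freqs:
    status = "below" if hz < HPET_MIN_HZ else "at or above"
    print(f"{hz / 1e6:.3f} MHz is {status} the assumed 10 MHz HPET minimum")
```

If the warning no longer appears in a fresh GPUPI.log after the BIOS and bcdedit changes, the list comes back empty.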

Edited by GtiJason

It looks like I finally figured it out. Maybe I was installing the AMD SDK wrong. I used my HD 5450 and installed Catalyst 14.2 beta 1.3, and in my first couple of tests 4/512 was faster than 1/64, so it seemed normal again. Of course, right as I figured it out I had trouble with a cold bug on the MOCF, so I had to stop, but I think as long as the chip is OK, tonight should be awesome.

 

Just wanted to say thanks for everyone's help, especially mat and Jason.


Guest PH Benchmarker

Roger that!! The Visual Studio runtime has been installed, and I'm going to try enabling HPET.


Guest PH Benchmarker
Man, I don't think a score ever felt so good. I was trying for almost 2 months for this. New dual core #1.

 

Thank you so much _mat_ and gtijason for helping me so much

 

1080

 

Congratulations!!!!!

 

As for my problem with GPUPI, I solved it by installing SP1 for Windows 7.

 

Hugs.


  • 2 weeks later...

Just a quick heads up! Tomorrow I will release GPUPI 2.3 with multiple bugfixes and features. I am very happy with the new support for CUDA 8.0, plus several optimizations of the CUDA kernels that finally led to faster scores than the OpenCL implementation.

 

Have a look at this score with a GTX 1080, it's top 10 on air cooling - and my sample doesn't really clock well:

 

gpupi-2-3-score-gtx-1080_215326.jpg

 

So hold your horses for now if you are benching NVIDIA cards with GPUPI. :)


For the purpose of full exposure, I am posting this here as well. After lots of hours of bugfixing, version 2.3 is finally bulletproof. Please redownload the newest build before benching: GPUPI 2.3.4

 

Additionally, the following features were added in the last four minor versions:

 

  • Support for Tesla graphics cards
  • Support for more than 8 devices - theoretically thousands of devices could be used for calculation now!
  • Detection of AMD's RX 480 graphics cards
  • Important bugfixes for the Legacy version and GeForce 200 series cards
  • Source code cleanup

 

Download: https://www.overclockers.at/news/gpupi-international-support-thread

 

Many, many thanks to dhenzjhen again; because of his support, GPUPI is now better and more flexible than ever! If you haven't seen his score with 10 Tesla M40s, you had better do it now: dhenzjhen's GPUPI - 1B score: 2sec 621ms with a GeForce GTX 1080 (it's currently filed under GTX 1080 because the M40s are not in the database)


Good work, looking forward to testing the GTX 285 in the future. :)

But there is a bug of some kind affecting the legacy 2.3.4 executable when using WinXP. At the moment I'm trying to run the new GPUPI on a P4 Celeron + WinXP SP3 32-bit, and the only thing I get is this error:

... is not a valid win32 application.

GPUPI 2.2 works fine on this PC. Perhaps wrong compile settings are preventing GPUPI 2.3.4 from running on XP?


I have not tested the Legacy version yet. It was compiled with VS 2013 (instead of 2012) which introduces some major changes. I will have a look at it as soon as I have more time.

 

Btw: I don't know if I will continue to support the Legacy version. It's a lot of work and has very few downloads. This may be the last version.

Link to comment
Share on other sites

Does this build fix runs that use different cards? E.g., with a 980 and 970 or a 1080 and 970, it would error out after the 4th iteration even at stock, although 2.2 was fine even with both cards heavily overclocked.
