Strong Island Posted June 21, 2016

Quote: "Have you tried a memory benchmark yet? Try AIDA64 to test your bandwidth, it should be impacted as well. Otherwise it's a driver issue, but I doubt it. We never had an efficiency problem with the memory reduction before. By the way, it's a very common technique for summing up a lot of memory in parallel. The pi calculation itself depends on much more to be efficient."

I will give it a shot. The only memory benchmark I have run on this same system is Geekbench, and when I compare my memory scores with others they seem perfectly normal. If my bandwidth were off, the memory part of that bench would probably be affected as well, right? I will run AIDA64. Also, isn't it strange that 1/64 and 4/512 are so far apart? 1/64 is very close to normal and 4/512 isn't.

Strong Island's Geekbench3 - Multi Core score: 14551 points with a Core i3 6100
_mat_ (Author) Posted June 21, 2016

I don't know what Geekbench really does when benching memory. If it's mostly a bandwidth test, it should be affected as well.

The gap between 4/512 and 1/64 says a lot. The better the batch size is matched to the architecture itself, the faster the bench will be. That's because the workload is aligned to the maximum worksize that can be run in parallel. 4M seems to be the best choice for the 6100 with 2 cores/4 threads.

About the same is true for the reduction size: the bigger the better, because 512 means that 512 sums are added into one until only a single sum remains. Let's say we want to sum up 8192 single numbers. With a reduction size of 512 that would be:

step 1: sum of 512 numbers, sum of 512 numbers, ... (16 times)
step 2: sum of 16 numbers = result

whereas a reduction size of 64 would produce:

step 1: sum of 64 numbers, sum of 64 numbers, sum of 64 numbers, ... (128 times)
step 2: sum of 64 numbers, sum of 64 numbers
step 3: sum of 2 numbers = result

If you consider that GPUPI produces billions of partial results that need to be added up, then 512 also needs far fewer steps overall to sum up the batches after they are calculated. Additionally, the bigger the batch size, the fewer reduction passes have to be made for the calculation. So these two values mean a lot for the whole computation.
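To put concrete numbers on the pass counts described above, here is a small stand-alone C++ sketch (illustrative only, not GPUPI source) that counts how many reduction passes are needed to collapse a given number of partial results at a given reduction size:

```cpp
#include <cstdint>
#include <cstdio>

// Counts how many reduction passes are needed to collapse `count` partial
// results into one sum when each pass merges up to `reductionSize` values.
static int reductionPasses(uint64_t count, uint64_t reductionSize) {
    int passes = 0;
    while (count > 1) {
        count = (count + reductionSize - 1) / reductionSize; // ceiling division
        ++passes;
    }
    return passes;
}

int main() {
    // The 8192-number example from the post above:
    std::printf("reduction size 512: %d passes\n", reductionPasses(8192, 512)); // prints 2
    std::printf("reduction size  64: %d passes\n", reductionPasses(8192, 64));  // prints 3
    return 0;
}
```

For 8192 values this yields 2 passes at a reduction size of 512 and 3 passes at 64, matching the steps listed in the post.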
Strong Island Posted June 21, 2016 (edited)

(quoting _mat_'s explanation above)

Yeah, it's just strange, because the top dual-core scores are 6100s and they are using 4/512. I'm sorry, I'm probably being annoying, but I'm just so confused. I even switched motherboards today and did a fresh install without disabling any services, and 1/64 and 4/512 are still about 7 seconds apart, with 1/64 being faster. The bigger my batch size, the higher my reduction time is. With GPUPI 2.2, am I supposed to have OpenCL drivers in the GPUPI folder, like the CUDA drivers are? Because I don't have those in my folder when I download GPUPI.

Edited June 21, 2016 by Strong Island
_mat_ (Author) Posted June 21, 2016

(quoting Strong Island's question about OpenCL drivers in the GPUPI folder)

No, that's not necessary. OpenCL works out of the box once it is installed on the system.
Strong Island Posted June 21, 2016

(quoting _mat_'s reply above)

OK, thanks. Damn, I thought I had found something. I never had trouble with GPUPI before; I got some really nice scores with a GTX 980 and a 5960X. Only the non-K CPUs are giving me trouble.
Guest PH Benchmarker Posted June 21, 2016

Quote: "What exactly do you mean by crashing? A runtime assertion? Please specify the exact error message and have a look at GPUPI.log as well. By the way, a mangled OpenCL driver is much more likely to produce such an error. Please uninstall any driver and reinstall again."

GPUPI log:

LOG START at 2016-06-21 20:57:42
----------------------
Timer frequency is only 3139160 Hz. HPET timer not found!
CUDA driver version is insufficient for CUDA runtime version
Timer frequency is only 3139160 Hz. HPET timer not found!
Starting run to calculate 1000000 digits with 1 batches
Batch Size: 1M
Maximum Reduction Size: 64
Message box: Press OK to start the calculation. (Start)
Timer frequency is only 3139160 Hz. HPET timer not found!

I'm running Windows 7 x64.
GtiJason Posted June 22, 2016 (edited)

(quoting PH Benchmarker's log above)

Try enabling HPET (High Precision Event Timer) in the BIOS, and read the FAQ section about HPET at the link below. You'll also need to run Command Prompt as administrator and enter:

bcdedit /set useplatformclock yes

https://www.overclockers.at/news/gpupi-international-support-thread

EDIT: Make sure you have the Visual Studio 2013 runtime installed; the download is named vcredist_x64.exe.

Edited June 22, 2016 by GtiJason
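For reference, a minimal command sequence from an elevated Command Prompt might look like this (a sketch, not taken from the FAQ; the setting only takes effect after a reboot, and the last command reverts it if you later want the default timer back):

```bat
:: enable the HPET-based platform clock (takes effect after the next reboot)
bcdedit /set useplatformclock yes

:: confirm the value was written to the current boot entry
bcdedit /enum {current} | findstr /i useplatformclock

:: revert to the default timer later, if needed
bcdedit /deletevalue useplatformclock
```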
Strong Island Posted June 22, 2016

It looks like I finally figured it out. Maybe I was installing the AMD SDK wrong. I used my HD 5450 and installed Catalyst 14.2 Beta 1.3, and in my first couple of tests 4/512 was faster than 1/64, so it seems normal again. Of course, right as I figured it out I had trouble with the cold bug on the MOCF and had to stop, but as long as the chip is OK, tonight should be awesome. Just wanted to say thanks for everyone's help, especially _mat_ and Jason.
Guest PH Benchmarker Posted June 22, 2016

(quoting GtiJason's advice above)

Roger that!! The Visual Studio runtime has been installed, and I'm going to try enabling HPET now.
Strong Island Posted June 23, 2016

Man, I don't think a score has ever felt so good. I had been trying for almost two months for this. New dual-core #1. Thank you so much, _mat_ and GtiJason, for all your help.
Massman Posted June 24, 2016

Congrats @Strong Island, epic score!
Strong Island Posted June 24, 2016

(quoting Massman's congratulations)

Thanks a lot. I've been having so much fun with this CPU, and to think it only cost me $100.
_mat_ (Author) Posted June 24, 2016

Awesome work and congrats!
Guest PH Benchmarker Posted June 27, 2016

(quoting Strong Island's post about the new dual-core #1 score)

Congratulations!!!!! As for my problem with GPUPI, I solved it by installing SP1 for Windows 7. Hugs.
_mat_ (Author) Posted June 27, 2016

Nice, and thanks for reporting back! I've added both solutions to the FAQ.
_mat_ (Author) Posted July 9, 2016

Just a quick heads-up! Tomorrow I will release GPUPI 2.3 with multiple bugfixes and features. I am very happy with the new support for CUDA 8.0, plus several optimizations of the CUDA kernels that finally lead to faster scores than the OpenCL implementation. Have a look at this score with a GTX 1080; it's top 10 on air cooling, and my sample doesn't really clock well:

So hold your horses for now if you are benching NVIDIA cards with GPUPI.
_mat_ (Author) Posted July 16, 2016

For the sake of full disclosure, I am posting this here as well. After many hours of bugfixing, version 2.3 is finally bulletproof. Please redownload the newest build before benching: GPUPI 2.3.4.

Additionally, the following features were added over the last four minor versions:

- Support for Tesla graphics cards
- Support for more than 8 devices - theoretically thousands of devices can now be used for the calculation!
- Detection of AMD's RX 480 graphics cards
- Important bugfixes for the Legacy version and GeForce 200 series cards
- Source code cleanup

Download: https://www.overclockers.at/news/gpupi-international-support-thread

Many, many thanks to dhenzjhen again; because of his support, GPUPI is now better and more flexible than ever! If you haven't seen his score with 10 Tesla M40s, you'd better do it now:

dhenzjhen's GPUPI - 1B score: 2sec 621ms with a GeForce GTX 1080 (it's currently filed under GTX 1080 because the M40s are not in the database)
havli Posted July 16, 2016

Good work, looking forward to testing the GTX 285 in the future. But there is a bug of some kind affecting the Legacy 2.3.4 executable under Windows XP. At the moment I'm trying to run the new GPUPI on a P4 Celeron with Windows XP SP3 32-bit, and the only thing I get is this error: "... is not a valid Win32 application." GPUPI 2.2 works fine on this PC. Perhaps wrong compile settings prevent GPUPI 2.3.4 from running on XP?
_mat_ (Author) Posted July 17, 2016

I have not tested the Legacy version yet. It was compiled with VS 2013 (instead of 2012), which introduces some major changes. I will have a look at it as soon as I have more time.

By the way: I don't know if I will continue to support the Legacy version. It's a lot of work and has very few downloads. This may be the last version.
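A likely explanation, though not confirmed in the thread as the actual fix: by default, VS 2013 builds use the v120 platform toolset, which targets Windows Vista and newer, and Windows XP refuses to load such executables with exactly this "not a valid Win32 application" message. The usual remedy is the XP-compatible toolset, roughly like this hypothetical .vcxproj excerpt:

```xml
<!-- Hypothetical project excerpt: v120_xp keeps VS 2013 builds loadable on Windows XP -->
<PropertyGroup Label="Configuration">
  <PlatformToolset>v120_xp</PlatformToolset>
</PropertyGroup>
```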
_mat_ (Author) Posted July 17, 2016 (edited)

I fixed it, but I couldn't make it run in a virtual machine because I can't install OpenCL there. Please redownload this version and try again: https://www.overclockers.at/news/gpupi-2-3

Edited July 17, 2016 by _mat_
havli Posted July 17, 2016

This build works! Thank you.
_mat_ (Author) Posted July 17, 2016

Awesome! Thanks for your feedback.
Oj0 Posted July 17, 2016

Does this build fix runs done with different cards? For example, with a 980 and a 970, or a 1080 and a 970, it would error out after the 4th iteration even at stock, although 2.2 was fine even with both cards heavily overclocked.
_mat_ (Author) Posted July 17, 2016

Yes, use 2.3.4; it fixes all of the sync and validation issues with multiple cards. By the way, 2.2 had no runtime validation, it only validated the final result. That's why it works.
Oj0 Posted July 17, 2016

(quoting _mat_'s reply above)

I know, but 2.2 would still pass the end validation.