GPUPI - SuperPI on the GPU

niobium615 · February 1, 2019

On 1/29/2019 at 8:03 AM, Gunslinger said:

Anyone have any tips for getting 2080 Ti to go to 3D clocks for GPUPI?

Single card and SLI are both staying at low power clocks despite OS settings to high performance.

Have you checked to make sure the CUDA - Force P2 State flag is set to Off/False in the driver? Should be the default value on consumer Turing cards AFAIK, but it might have gotten changed somehow.

Edited February 1, 2019 by niobium615

Gunslinger · February 2, 2019

my settings match that screen grab.

So ThermSpy is working fine as far as forcing 3d clocks, but as soon as I open GPUPI, they revert to 2D mode

3.2, 3.3 and 3.2 legacy are all acting this way. I have no idea how to fix it.

Edited February 2, 2019 by Gunslinger

richiec77 · February 2, 2019

41 minutes ago, Gunslinger said:

my settings match that screen grab.

So ThermSpy is working fine as far as forcing 3d clocks, but as soon as I open GPUPI, they revert to 2D mode

3.2, 3.3 and 3.2 legacy are all acting this way. I have no idea how to fix it.

If they match the Screenshot, then you have CUDA P2 states enabled. You want them disabled.

P2 ON drops the clocks for better stability for critical calculations. For max Clocks, P2 should be OFF.

niobium615 · February 3, 2019

Exactly what richie said. Sorry about the screenshot settings not matching my post, just grabbed it from a bunch of screenshots that I took a while ago.

Gunslinger · February 3, 2019

I changed the P2 state setting and it's now working.
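For anyone who wants to confirm which performance state the card sits in while GPUPI is open, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and asks it for the `pstate`, `clocks.sm` and `clocks.mem` fields in CSV form. The helper names and the sample line in the usage note are made up for illustration; they are not taken from anyone's system in this thread.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY = [
    "nvidia-smi",
    "--query-gpu=pstate,clocks.sm,clocks.mem",
    "--format=csv,noheader",
]

def parse_pstate_line(line):
    """Parse one CSV line such as 'P2, 1350 MHz, 6800 MHz' (hypothetical values)."""
    pstate, sm, mem = [field.strip() for field in line.split(",")]
    return {
        "pstate": pstate,
        "sm_mhz": int(sm.split()[0]),
        "mem_mhz": int(mem.split()[0]),
    }

def read_gpu_states():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver installed)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_pstate_line(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpu_states()` before and after opening GPUPI should show whether the reported P-state or clocks change. Lower P-state numbers mean higher performance, with P0 being the maximum.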

laine · October 22, 2019

hello

I seek help about gpupi and invalid result. I am very new to modifying the clocks but since I am on a laptop, my goal is really undervolting in order to reduce the heat. I have Microsoft Windows [version 10.0.17763.805]

I have a thinkpad and the cpu is core i5 8265u, intel driver is 26.20.100.6999 for Intel(R) UHD Graphics 620 dated 27/06/2019. Currently nothing is undervolted and the invalidity is irrespective of the ''power scheme'' by windows, which is currently ''high performance''.

The driver in gpupi is ''opencl 2.1'' by intel

I run benchmate 0.9.2 to have all the benchmarks available easily and I run gpupi 3.3.3 or 3.2.

However when I select ''reduction size'' for pure GPU test, as 512 or 1024, I always get

Quote

Timer: HPET (24.00 MHz)

Intel OpenCL 2.1
- Intel UHD Graphics 620 (24 CUs, 1100 MHz, OpenCL 2.0)
   Compiling OpenCL kernels ... done.

Calculating 32.000.000th digit of PI. 4 iterations.

Allocated device memory : 536.89 MB
Batch Size              : 32M
Reduction Size          : 512 (Type: Default)

00h 00m 00.000s Batch 1 finished.

Error: Invalid partial result by a margin of 0.445571395913538

Calculation aborted due to an invalid partial result.

irrespective of the ''batch size''. I have no error when ''reduction size'' is 256 or lower. I have no error with CPU for any ''reduction size'' .

since I already have the latest drivers offered by lenovo, does it mean my chip is bad ?

Can i be sure that my future undervolting is safe when I stick to 256 as ''reduction size'' and the result is valid in gpupi ?

Edited October 22, 2019 by laine

laine · October 22, 2019

actually even on stock CPU, before any undervolting, I get the error at 0,5B, always on the 20th iteration, with 1024 as ''reduction size''. no error with ''reduction size'' at 128 or lower.

Quote

Timer: HPET (24.00 MHz)

Intel OpenCL 2.1
- Intel Core i5-8265U (8 CUs, 1600 MHz, OpenCL 2.0)
   Compiling OpenCL kernels ... done.

Calculating 500.000.000th digit of PI. 20 iterations.

Allocated device memory : 16.81 MB
Batch Size              : 1M
Reduction Size          : 1024 (Type: Default)

00h 00m 08.477s Batch 1 finished.
00h 00m 17.546s Batch 2 finished.
00h 00m 26.569s Batch 3 finished.
00h 00m 35.427s Batch 4 finished.
00h 00m 43.830s Batch 5 finished.
00h 00m 53.210s Batch 6 finished.
00h 01m 02.611s Batch 7 finished.
00h 01m 12.250s Batch 8 finished.
00h 01m 21.827s Batch 9 finished.
00h 01m 30.307s Batch 10 finished.
00h 01m 39.925s Batch 11 finished.
00h 01m 49.422s Batch 12 finished.
00h 01m 58.808s Batch 13 finished.
00h 02m 07.835s Batch 14 finished.
00h 02m 16.562s Batch 15 finished.
00h 02m 25.781s Batch 16 finished.
00h 02m 34.915s Batch 17 finished.
00h 02m 44.168s Batch 18 finished.
00h 02m 52.885s Batch 19 finished.
00h 03m 01.324s PI value output -> E62134265

Statistics:
Calculation + Reduction time: 166.699s + 14.470s

so what does it all mean that I do not get error when the reduction size is low, but i get errors on cpu and gpu when the reduction size is high ?

Edited October 22, 2019 by laine

_mat_ · October 23, 2019

@laine, thank you for reporting these problems.

32M on Intel iGPU: Have you tried smaller batch sizes? More than 500 MB of system memory might be more than Intel's OpenCL implementation can handle. It might be a driver problem as well, I will look into it. The Intel GPU OpenCL drivers are a bit nasty.

500M results: This is not caused by system instability, but rather a bug in GPUPI. The validation seems to be messed up for some reason. Please use 100M or 1B, they should work fine.

Edited October 23, 2019 by _mat_
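To judge how far the batch size has to drop to stay under a given memory budget, the logs posted in this thread suggest a rough rule of thumb: about 16 bytes per batch element, with "M" meaning 2^20 elements. That figure is inferred only from the "Allocated device memory" lines above; it is not taken from GPUPI's source, so treat the sketch below as an estimate. The function name is hypothetical.

```python
# Assumption inferred from the logs in this thread, not from GPUPI's source.
BYTES_PER_ELEMENT = 16

def estimate_device_memory_mb(batch_size_m):
    """Estimate the allocation in MB (10^6 bytes) for a batch size given in 'M' (2^20 elements)."""
    elements = batch_size_m * 2**20
    return elements * BYTES_PER_ELEMENT / 1e6

# Batch sizes that appear in the logs above.
for batch in (1, 32, 100):
    print(f"{batch}M -> ~{estimate_device_memory_mb(batch):.2f} MB")
```

The logs report 16.81 MB, 536.89 MB and 1677.73 MB for these three batch sizes, so the estimate lands within a few hundredths of a MB; the small remainder seems to vary with the reduction size.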

laine · October 23, 2019

yes I test my undervolting on 1b with 256 as ''reduction size'' for GPU and 128 for CPU to avoid the bug.

I noticed that my GPU passed 1b test with -120mV in throttlestop, but when I tested it on Cinebench 11.5.2.9, it crashed on the first car chase so I had to reduce my undervolting.

yosarianilives · October 23, 2019

Is there any plan/possibility to add non-OpenCL support for graphics chips that have hardware support for double precision but no OpenCL driver support? For example, the HD 3000 series supports double precision through ATI Stream but not OpenCL, and Ivy and Haswell IGP both have double precision and support it through driver calls, but there is no OpenCL driver for them.

_mat_ · October 24, 2019

18 hours ago, laine said:

yes I test my undervolting on 1b with 256 as ''reduction size'' for GPU and 128 for CPU to avoid the bug.

Just as I expected. The bigger reductions use lots of local shared memory that smaller GPUs normally can't handle. The kernel should actually fail but it seems like the Intel driver just returns nonsense. Maybe I can check if there is enough memory available beforehand to avoid the confusing error that GPUPI gives. That should be reserved for stability issues.

Thanks for your feedback!
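The pre-check mentioned above could look roughly like the sketch below. It assumes the reduction size maps directly to the work-group size and that each work-item keeps one double (8 bytes) in local memory; both are guesses about how the reduction kernel works, not confirmed by the source. The two limits correspond to OpenCL's `CL_DEVICE_MAX_WORK_GROUP_SIZE` and `CL_DEVICE_LOCAL_MEM_SIZE` device queries, and the numbers in the usage note are hypothetical.

```python
def reduction_fits(reduction_size, max_work_group_size, local_mem_bytes, bytes_per_item=8):
    """Return (ok, reason) for a planned reduction size against two device limits.

    Assumes one work-item per reduced element and `bytes_per_item` bytes of
    local memory per work-item (an illustrative assumption).
    """
    if reduction_size > max_work_group_size:
        return False, (
            f"reduction size {reduction_size} exceeds max work-group size "
            f"{max_work_group_size}"
        )
    needed = reduction_size * bytes_per_item
    if needed > local_mem_bytes:
        return False, (
            f"needs {needed} bytes of local memory, device has {local_mem_bytes}"
        )
    return True, "ok"
```

With hypothetical limits of 256 work-items and 65536 bytes of local memory, `reduction_fits(256, 256, 65536)` passes while `reduction_fits(512, 256, 65536)` is rejected up front, so the user would get a clear message instead of an invalid-result error.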

18 hours ago, yosarianilives said:

Is there any plan/possibility to add non-OpenCL support for graphics chips that have hardware support for double precision but no OpenCL driver support? For example, the HD 3000 series supports double precision through ATI Stream but not OpenCL, and Ivy and Haswell IGP both have double precision and support it through driver calls, but there is no OpenCL driver for them.

I won't be touching the old ATI Stream stuff with a stick. I worked with it many years ago and it's really, really buggy and also very slow. Maybe it was just me, but I thought that GPUPI wouldn't be possible at that time and gave up. As for DirectX and OpenGL compute shaders, that would be a possible way to enable support for these two iGPUs. I've put it on my list, right below Vulkan compute support, which is something I've wanted to look into for some time now.

yosarianilives · October 24, 2019

19 minutes ago, _mat_ said:

I won't be touching the old ATI Stream stuff with a stick. I worked with it many years ago and it's really, really buggy and also very slow. Maybe it was just me, but I thought that GPUPI wouldn't be possible at that time and gave up. As for DirectX and OpenGL compute shaders, that would be a possible way to enable support for these two iGPUs. I've put it on my list, right below Vulkan compute support, which is something I've wanted to look into for some time now.

Would the same trick also work for those ATI Stream gpus as for intel? I know they wouldn't be very fast but neither is intel igp. Even on lake igp it takes days to get a 32b score.

laine · October 25, 2019

today or tomorrow I will update my intel video driver from DriverVer=06/27/2019,26.20.100.6999 to 26.20.100.7260 but first i will try to install opencl_runtime_18.1_x64_setup.msi to see if the bug is removed. Currently it says there is already a video driver installed so opencl_runtime_18.1_x64_setup.msi refuses to install.

I will use DDU to remove the old drivers cleanly.

laine · October 25, 2019

I did:

-go to safe mode

-remove intel video driver (by Lenovo) with DDU

-reboot

-install opencl_runtime_18.1_x64_setup.msi [this time the exe does not say there is already a video driver installed]

-reboot

launch gpupi 3.3.3 with my CPU undervolted by throttlestop. The GPU is not seen by GPUPI, so I can only test the CPU.

the test is OK at 256 as reduction size now. But it still fails at 1024 on the last iteration.

this is a success

Timer: HPET (24.00 MHz)

Intel CPU Runtime for OpenCL(TM) Applications 2.1
- Intel Core i5-8265U (8 CUs, 1600 MHz, OpenCL 2.0)
Compiling OpenCL kernels ... done.

Calculating 500.000.000th digit of PI. 20 iterations.

Allocated device memory : 1677.73 MB
Batch Size : 100M
Reduction Size : 256 (Type: Default)

00h 00m 06.166s Batch 1 finished.
00h 00m 12.857s Batch 2 finished.
00h 00m 19.663s Batch 3 finished.
00h 00m 26.269s Batch 4 finished.
00h 00m 33.288s Batch 5 finished.
00h 00m 40.228s Batch 6 finished.
00h 00m 47.160s Batch 7 finished.
00h 00m 53.992s Batch 8 finished.
00h 01m 00.794s Batch 9 finished.
00h 01m 07.881s Batch 10 finished.
00h 01m 14.882s Batch 11 finished.
00h 01m 21.898s Batch 12 finished.
00h 01m 28.801s Batch 13 finished.
00h 01m 35.542s Batch 14 finished.
00h 01m 42.706s Batch 15 finished.
00h 01m 49.758s Batch 16 finished.
00h 01m 56.839s Batch 17 finished.
00h 02m 03.809s Batch 18 finished.
00h 02m 10.610s Batch 19 finished.
00h 02m 17.781s PI value output -> E62134264

Statistics:
Calculation + Reduction time: 133.083s + 4.670s

this is a failure

Timer: HPET (24.00 MHz)

Intel CPU Runtime for OpenCL(TM) Applications 2.1
- Intel Core i5-8265U (8 CUs, 1600 MHz, OpenCL 2.0)
Compiling OpenCL kernels ... done.

Calculating 500.000.000th digit of PI. 20 iterations.

Allocated device memory : 1677.74 MB
Batch Size : 100M
Reduction Size : 512 (Type: Default)

00h 00m 07.227s Batch 1 finished.
00h 00m 14.616s Batch 2 finished.
00h 00m 21.839s Batch 3 finished.
00h 00m 29.001s Batch 4 finished.
00h 00m 36.756s Batch 5 finished.
00h 00m 44.083s Batch 6 finished.
00h 00m 51.527s Batch 7 finished.
00h 00m 59.383s Batch 8 finished.
00h 01m 06.507s Batch 9 finished.
00h 01m 13.793s Batch 10 finished.
00h 01m 20.965s Batch 11 finished.
00h 01m 28.095s Batch 12 finished.
00h 01m 35.130s Batch 13 finished.
00h 01m 42.028s Batch 14 finished.
00h 01m 49.345s Batch 15 finished.
00h 01m 56.534s Batch 16 finished.
00h 02m 03.675s Batch 17 finished.
00h 02m 10.738s Batch 18 finished.
00h 02m 17.606s Batch 19 finished.
00h 02m 24.904s PI value output -> E62134265

Statistics:
Calculation + Reduction time: 140.396s + 4.485s

yosarianilives · November 10, 2019

Is there a hard limit to how many threads gpupi can scale to? I found on 7742 that it was way faster smt off than on, like 55s smt on, 33s smt off.

Also had crash on save in 3.2 vs 3.3 that was fine but I think that's a hwinfo thing cause benchmate crashes on hwinfo initialization despite hwinfo itself running fine.

jab383 · November 10, 2019

The crash on save in 3.2 happens when hwinfo has not been initialized. It needs to be initialized just once each time a GPUPI window is opened. After that once, hwinfo can be turned off for slightly faster runs.

That score difference with high thread count could be fun to try to figure out.

Noxinite · November 10, 2019

2 hours ago, jab383 said:

The crash on save in 3.2 happens when hwinfo has not been initialized. It needs to be initialized just once each time a GPUPI window is opened. After that once, hwinfo can be turned off for slightly faster runs.

That score difference with high thread count could be fun to try to figure out.

Weird. Will have to see if that fixes the issues I was having on L3014.

yosarianilives · November 10, 2019

2 hours ago, jab383 said:

The crash on save in 3.2 happens when hwinfo has not been initialized. It needs to be initialized just once each time a GPUPI window is opened. After that once, hwinfo can be turned off for slightly faster runs.

That score difference with high thread count could be fun to try to figure out.

If I run anything but hwinfo disabled it crashes before run. Similar to how benchmate crashes on hwinfo initialization during open. I think the problem is with hwinfo initialization in the two apps and I'm not sure why, hwinfo opens fine and I even tried dropping the dlls from working hwinfo into the folders and it didn't work

yosarianilives · March 8, 2020

Anyone have experience with this error? I get it on launch on both 3.2 and 3.3, only the legacy versions launch but obviously they suck for effi

yosarianilives · March 8, 2020

Using am3 cpus and opencl 1.2

cbjaust · March 8, 2020

OpenCL is not installed properly or doesn't support CPU.

Bring on GPUPI v4!

cbjaust · April 5, 2020

GPUPI 3.2 and GPUPI 3.2 (Legacy) both crash silently when trying to save the result file or send directly to hwbot.

Info:

ROG Strix X370-F Gaming | Athlon 3000G | Vega 3 Graphics (APU integrated)
Windows 10 1909 (OS Build 18363.720)
AMD Chipset Drivers 1.09.27.1033

Tried latest chipset drivers but still the same outcome. Also can't use Benchmate (10.5) because the GPU is not detected.

Is there something simple I am missing here?

Thanks.

_mat_ · April 5, 2020

4 hours ago, cbjaust said:

Tried latest chipset drivers but still the same outcome. Also can't use Benchmate (10.5) because the GPU is not detected.

This is already fixed and will be released with BenchMate 0.11. Intel and AMD iGPUs will be correctly measured.

4 hours ago, cbjaust said:

Is there something simple I am missing here?

The lesson here is that it's not a good idea to put any detection code or submission implementation into a specific benchmark version. There are too many moving pieces at work, and it will fail eventually. That's why BenchMate's new infrastructure is so important. A benchmark should focus on its workload.

GPUPI 4 will not use HWiNFO or provide any form of submission. Instead it natively integrates BenchMate.

This is the way.

Edited April 5, 2020 by _mat_

cbjaust · April 5, 2020

So no suggestions to fix GPUPI 3.2 crashing while saving the result. GPUPI 3.3.3 works fine but, you know, no hardware points.

_mat_ · April 5, 2020

Sadly unfixable, it's a hardware detection error.
