HWBOT Community Forums

GPUPI - SuperPI on the GPU


_mat_


On 1/29/2019 at 8:03 AM, Gunslinger said:

Anyone have any tips for getting 2080 Ti to go to 3D clocks for GPUPI?

Single card and SLI are both staying at low power clocks despite OS settings to high performance.

 

Have you checked to make sure the CUDA - Force P2 State flag is set to Off/False in the driver?  Should be the default value on consumer Turing cards AFAIK, but it might have gotten changed somehow.

[Attached screenshot: CudaP2.PNG]

Edited by niobium615

41 minutes ago, Gunslinger said:

my settings match that screen grab.

 

So ThermSpy is working fine as far as forcing 3d clocks, but as soon as I open GPUPI, they revert to 2D mode

 

3.2, 3.3 and 3.2 legacy are all acting this way.  I have no idea how to fix it.

 

 

If they match the screenshot, then you have CUDA P2 states enabled. You want them disabled.

With P2 on, the clocks drop for better stability during critical calculations. For maximum clocks, P2 should be off.
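
One way to confirm what the card is actually doing while GPUPI runs is to poll the performance state and clocks with nvidia-smi. The following is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names `parse_pstate_line` and `read_gpu_states` are made up for illustration, and the sample line in the docstring is invented, not measured on a 2080 Ti.

```python
import subprocess

# Fields understood by `nvidia-smi --query-gpu`:
# performance state, SM clock (MHz), memory clock (MHz).
QUERY_FIELDS = "pstate,clocks.sm,clocks.mem"


def parse_pstate_line(line):
    """Parse one CSV line such as 'P2, 1350, 6800' into a dict."""
    pstate, sm_mhz, mem_mhz = [part.strip() for part in line.split(",")]
    return {"pstate": pstate, "sm_mhz": int(sm_mhz), "mem_mhz": int(mem_mhz)}


def read_gpu_states():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_pstate_line(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpu_states()` in a loop while the benchmark runs shows whether the card sits in P0, drops to P2, or stays in an idle state such as P8.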


  • 8 months later...

hello

 

I am looking for help with an invalid result in GPUPI. I am very new to modifying clocks, but since I am on a laptop, my real goal is undervolting to reduce heat. I am running Microsoft Windows [version 10.0.17763.805].

I have a ThinkPad with a Core i5-8265U CPU. The Intel driver for the Intel(R) UHD Graphics 620 is 26.20.100.6999, dated 27/06/2019. Nothing is undervolted at the moment, and the invalid result appears regardless of the Windows power scheme, which is currently "High performance".

The driver shown in GPUPI is "OpenCL 2.1" by Intel.

I run BenchMate 0.9.2 to have all the benchmarks easily available, and I run GPUPI 3.3.3 or 3.2.

However, when I select a "reduction size" of 512 or 1024 for the pure GPU test, I always get

Quote

 

Timer: HPET (24.00 MHz)

Intel OpenCL 2.1
- Intel UHD Graphics 620 (24 CUs, 1100 MHz, OpenCL 2.0)
   Compiling OpenCL kernels ... done.

Calculating 32.000.000th digit of PI. 4 iterations.

 Allocated device memory : 536.89 MB
 Batch Size              : 32M
 Reduction Size          : 512 (Type: Default)

 00h 00m 00.000s Batch  1 finished.

Error: Invalid partial result by a margin of 0.445571395913538

Calculation aborted due to an invalid partial result.

 

 

 

 

This happens irrespective of the "batch size". I get no error when the "reduction size" is 256 or lower, and no error with the CPU for any "reduction size".

Since I already have the latest drivers offered by Lenovo, does this mean my chip is bad?

 

Can I be sure that my future undervolting is safe if I stick to 256 as the "reduction size" and the result is valid in GPUPI?

 

Edited by laine

Actually, even on the stock CPU, before any undervolting, I get the error at 0.5B, always on the 20th iteration, with 1024 as the "reduction size". There is no error with a "reduction size" of 128 or lower.


 

Quote

 

Timer: HPET (24.00 MHz)

Intel OpenCL 2.1
- Intel Core i5-8265U (8 CUs, 1600 MHz, OpenCL 2.0)
   Compiling OpenCL kernels ... done.

Calculating 500.000.000th digit of PI. 20 iterations.

 Allocated device memory : 16.81 MB
 Batch Size              : 1M
 Reduction Size          : 1024 (Type: Default)

 00h 00m 08.477s Batch  1 finished.
 00h 00m 17.546s Batch  2 finished.
 00h 00m 26.569s Batch  3 finished.
 00h 00m 35.427s Batch  4 finished.
 00h 00m 43.830s Batch  5 finished.
 00h 00m 53.210s Batch  6 finished.
 00h 01m 02.611s Batch  7 finished.
 00h 01m 12.250s Batch  8 finished.
 00h 01m 21.827s Batch  9 finished.
 00h 01m 30.307s Batch 10 finished.
 00h 01m 39.925s Batch 11 finished.
 00h 01m 49.422s Batch 12 finished.
 00h 01m 58.808s Batch 13 finished.
 00h 02m 07.835s Batch 14 finished.
 00h 02m 16.562s Batch 15 finished.
 00h 02m 25.781s Batch 16 finished.
 00h 02m 34.915s Batch 17 finished.
 00h 02m 44.168s Batch 18 finished.
 00h 02m 52.885s Batch 19 finished.
 00h 03m 01.324s PI value output -> E62134265

Statistics:
 Calculation + Reduction time: 166.699s + 14.470s

 

So what does it mean that I get no error when the reduction size is low, but I get errors on both the CPU and the GPU when the reduction size is high?

Edited by laine

@laine, thank you for reporting these problems.

32M on Intel iGPU: Have you tried smaller batch sizes? More than 500 MB of system memory might be more than Intel's OpenCL implementation can handle. It might be a driver problem as well, I will look into it. The Intel GPU OpenCL drivers are a bit nasty.

500M results: This is not caused by system instability, but rather by a bug in GPUPI. The validation seems to be messed up for some reason. Please use 100M or 1B; they should work fine.
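
For reference, the "Allocated device memory" values in the logs in this thread are consistent with roughly 16 bytes per batch element, if the "M" in the batch size is read as 2^20. This is inferred from the three logged values only, not from GPUPI's source; the function name and the 16-byte figure are assumptions.

```python
BYTES_PER_ELEMENT = 16  # assumption inferred from the logs, not from GPUPI's source
MEBI = 2 ** 20          # "M" in the batch size read as 2^20 (assumption)


def estimated_device_memory_mb(batch_size_m):
    """Estimate allocated device memory in decimal MB for a batch size given in 'M'."""
    return batch_size_m * MEBI * BYTES_PER_ELEMENT / 1e6


# Batch sizes and "Allocated device memory" values copied from the logs in this thread.
for batch_m, logged_mb in [(1, 16.81), (32, 536.89), (100, 1677.73)]:
    est = estimated_device_memory_mb(batch_m)
    print(f"{batch_m:>3}M: estimated {est:8.2f} MB, logged {logged_mb:8.2f} MB")
```

The estimates land within about 0.04 MB of the logged values, which suggests a small fixed overhead on top. Under that reading, a 32M batch asks the driver for a little over 500 MB in total, which is the figure mentioned above.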

Edited by _mat_

Yes, I test my undervolting on 1B with a "reduction size" of 256 for the GPU and 128 for the CPU to avoid the bug.

 

I noticed that my GPU passed the 1B test with -120 mV in ThrottleStop, but when I tested it in Cinebench 11.5.2.9, it crashed on the first car chase, so I had to reduce my undervolt.


Is there any plan or possibility to add non-OpenCL support for graphics chips that have hardware support for double precision but no OpenCL driver support? For example, the HD 3000 series supports double precision through ATI Stream but not OpenCL, and the Ivy Bridge and Haswell IGPs both have double precision and support it through driver calls, but there is no OpenCL driver for them.


18 hours ago, laine said:

Yes, I test my undervolting on 1B with a "reduction size" of 256 for the GPU and 128 for the CPU to avoid the bug.

Just as I expected. The bigger reductions use lots of local shared memory that smaller GPUs normally can't handle. The kernel should actually fail, but it seems the Intel driver just returns nonsense. Maybe I can check beforehand whether enough memory is available, to avoid the confusing error that GPUPI gives. That error should be reserved for stability issues.

Thanks for your feedback!
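
The pre-flight check mentioned above could look roughly like the following. It is a sketch under assumptions, not GPUPI's implementation: it assumes the reduction kernel needs one work-item and one 8-byte double of local memory per reduction element, and the device limits in the usage note are placeholder numbers. In OpenCL the real limits would come from clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE and CL_DEVICE_MAX_WORK_GROUP_SIZE.

```python
BYTES_PER_DOUBLE = 8


def reduction_fits(reduction_size, local_mem_bytes, max_work_group_size):
    """Return (ok, reason) for running a reduction of the given size on a device.

    Assumes one work-item and one double of local memory per reduction element
    (hypothetical; the real kernel's requirements may differ).
    """
    if reduction_size > max_work_group_size:
        return False, (f"work-group size {reduction_size} exceeds "
                       f"device limit {max_work_group_size}")
    needed = reduction_size * BYTES_PER_DOUBLE
    if needed > local_mem_bytes:
        return False, (f"needs {needed} bytes of local memory, "
                       f"device has {local_mem_bytes}")
    return True, "ok"


def largest_safe_reduction(candidates, local_mem_bytes, max_work_group_size):
    """Pick the largest candidate reduction size that passes the check, or None."""
    fitting = [c for c in candidates
               if reduction_fits(c, local_mem_bytes, max_work_group_size)[0]]
    return max(fitting) if fitting else None
```

With placeholder limits of 64 KiB of local memory and a maximum work-group size of 256, `largest_safe_reduction([64, 128, 256, 512, 1024], 65536, 256)` returns 256, so sizes that cannot run would be rejected up front instead of surfacing as an "invalid partial result".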

18 hours ago, yosarianilives said:

Is there any plan or possibility to add non-OpenCL support for graphics chips that have hardware support for double precision but no OpenCL driver support? For example, the HD 3000 series supports double precision through ATI Stream but not OpenCL, and the Ivy Bridge and Haswell IGPs both have double precision and support it through driver calls, but there is no OpenCL driver for them.

I won't be touching the old ATI Stream stuff with a stick. I worked with it many years ago, and it was really, really buggy and also very slow. Maybe it was just me, but I thought that GPUPI wouldn't be possible at that time and gave up. As for DirectX and OpenGL compute shaders, that would be a possible way to enable support for these two iGPUs. I've put it on my list, right below Vulkan compute support, which is something I have wanted to look into for some time now.


19 minutes ago, _mat_ said:

I won't be touching the old ATI Stream stuff with a stick. I worked with it many years ago, and it was really, really buggy and also very slow. Maybe it was just me, but I thought that GPUPI wouldn't be possible at that time and gave up. As for DirectX and OpenGL compute shaders, that would be a possible way to enable support for these two iGPUs. I've put it on my list, right below Vulkan compute support, which is something I have wanted to look into for some time now.

Would the same trick also work for those ATI Stream GPUs as for Intel? I know they wouldn't be very fast, but neither is the Intel IGP. Even on a Lake IGP it takes days to get a 32B score.


Today or tomorrow I will update my Intel video driver from DriverVer=06/27/2019,26.20.100.6999 to 26.20.100.7260, but first I will try to install opencl_runtime_18.1_x64_setup.msi to see if the bug goes away. Currently the installer says a video driver is already installed, so opencl_runtime_18.1_x64_setup.msi refuses to install.

I will use DDU to remove the old drivers cleanly.


I did the following:

-booted into safe mode

-removed the Intel video driver (by Lenovo) with DDU

-rebooted

-installed opencl_runtime_18.1_x64_setup.msi [this time the installer does not say a video driver is already installed]

-rebooted

Then I launched GPUPI 3.3.3 with my CPU undervolted by ThrottleStop. The GPU is not seen by GPUPI, so I can only test the CPU.

The test is OK with 256 as the reduction size now. But it still fails at 1024 on the last iteration.

 

 

this is a success

Timer: HPET (24.00 MHz)

Intel CPU Runtime for OpenCL(TM) Applications 2.1
- Intel Core i5-8265U (8 CUs, 1600 MHz, OpenCL 2.0)
   Compiling OpenCL kernels ... done.

Calculating 500.000.000th digit of PI. 20 iterations.

 Allocated device memory : 1677.73 MB
 Batch Size              : 100M
 Reduction Size          : 256 (Type: Default)

 00h 00m 06.166s Batch  1 finished.
 00h 00m 12.857s Batch  2 finished.
 00h 00m 19.663s Batch  3 finished.
 00h 00m 26.269s Batch  4 finished.
 00h 00m 33.288s Batch  5 finished.
 00h 00m 40.228s Batch  6 finished.
 00h 00m 47.160s Batch  7 finished.
 00h 00m 53.992s Batch  8 finished.
 00h 01m 00.794s Batch  9 finished.
 00h 01m 07.881s Batch 10 finished.
 00h 01m 14.882s Batch 11 finished.
 00h 01m 21.898s Batch 12 finished.
 00h 01m 28.801s Batch 13 finished.
 00h 01m 35.542s Batch 14 finished.
 00h 01m 42.706s Batch 15 finished.
 00h 01m 49.758s Batch 16 finished.
 00h 01m 56.839s Batch 17 finished.
 00h 02m 03.809s Batch 18 finished.
 00h 02m 10.610s Batch 19 finished.
 00h 02m 17.781s PI value output -> E62134264

Statistics:
 Calculation + Reduction time: 133.083s + 4.670s

 

 

this is a failure

 

Timer: HPET (24.00 MHz)

Intel CPU Runtime for OpenCL(TM) Applications 2.1
- Intel Core i5-8265U (8 CUs, 1600 MHz, OpenCL 2.0)
   Compiling OpenCL kernels ... done.

Calculating 500.000.000th digit of PI. 20 iterations.

 Allocated device memory : 1677.74 MB
 Batch Size              : 100M
 Reduction Size          : 512 (Type: Default)

 00h 00m 07.227s Batch  1 finished.
 00h 00m 14.616s Batch  2 finished.
 00h 00m 21.839s Batch  3 finished.
 00h 00m 29.001s Batch  4 finished.
 00h 00m 36.756s Batch  5 finished.
 00h 00m 44.083s Batch  6 finished.
 00h 00m 51.527s Batch  7 finished.
 00h 00m 59.383s Batch  8 finished.
 00h 01m 06.507s Batch  9 finished.
 00h 01m 13.793s Batch 10 finished.
 00h 01m 20.965s Batch 11 finished.
 00h 01m 28.095s Batch 12 finished.
 00h 01m 35.130s Batch 13 finished.
 00h 01m 42.028s Batch 14 finished.
 00h 01m 49.345s Batch 15 finished.
 00h 01m 56.534s Batch 16 finished.
 00h 02m 03.675s Batch 17 finished.
 00h 02m 10.738s Batch 18 finished.
 00h 02m 17.606s Batch 19 finished.
 00h 02m 24.904s PI value output -> E62134265

Statistics:
 Calculation + Reduction time: 140.396s + 4.485s


  • 3 weeks later...

The crash on save in 3.2 happens when hwinfo has not been initialized.  It needs to be initialized just once each time a GPUPI window is opened.  After that once, hwinfo can be turned off for slightly faster runs.

That score difference with high thread count could be fun to try to figure out.


2 hours ago, jab383 said:

The crash on save in 3.2 happens when hwinfo has not been initialized.  It needs to be initialized just once each time a GPUPI window is opened.  After that once, hwinfo can be turned off for slightly faster runs.

That score difference with high thread count could be fun to try to figure out.

Weird. Will have to see if that fixes the issues I was having on L3014.


2 hours ago, jab383 said:

The crash on save in 3.2 happens when hwinfo has not been initialized.  It needs to be initialized just once each time a GPUPI window is opened.  After that once, hwinfo can be turned off for slightly faster runs.

That score difference with high thread count could be fun to try to figure out.

If I run with anything but HWiNFO disabled, it crashes before the run, similar to how BenchMate crashes on HWiNFO initialization when it opens. I think the problem is with HWiNFO initialization in the two apps, and I'm not sure why. HWiNFO itself opens fine, and I even tried dropping the DLLs from the working HWiNFO into the folders, but that didn't work.


  • 3 months later...
  • 4 weeks later...

GPUPI 3.2 and GPUPI 3.2 (Legacy) both crash silently when trying to save the result file or send directly to hwbot.

Info:

ROG Strix X370-F Gaming | Athlon 3000G | Vega 3 Graphics (APU integrated)
Windows 10 1909 (OS Build 18363.720)
AMD Chipset Drivers 1.09.27.1033
 

I tried the latest chipset drivers but got the same outcome. I also can't use BenchMate (10.5) because the GPU is not detected.

Is there something simple I am missing here?

Thanks.


4 hours ago, cbjaust said:

I tried the latest chipset drivers but got the same outcome. I also can't use BenchMate (10.5) because the GPU is not detected.

This is already fixed and will be released with BenchMate 0.11. Intel and AMD iGPUs will be correctly measured.

4 hours ago, cbjaust said:

Is there something simple I am missing here?

The lesson here is that it's not a good idea to put any detection code or submission implementation into a specific benchmark version. There are too many moving pieces at work; it will fail eventually. That's why BenchMate's new infrastructure is so important. A benchmark should focus on its workload.

GPUPI 4 will not use HWiNFO or provide any form of submission. Instead it natively integrates BenchMate.

This is the way.

Edited by _mat_