GPUPI - SuperPI on the GPU - Page 19 - Benchmark software

July 17, 2016

Author

The end validation has not changed at all. But if you are using CUDA, the kernels have changed, so the benchmark will be harder on your GPUs. Try reducing the overclock or increasing voltage/cooling.

Quote

August 11, 2016

Hello, I'm having a bad day with GPUPI.

This message pops up: Error synchronizing device after kernel execution

Any hints for a solution? I've reinstalled the drivers etc., with no luck. The system is a 6700K on a Ranger board with a GT 710 PCIe card.

Maybe I'll just swap to a different system for the GPUPI bench.

Quote

August 11, 2016

Author

That's a CUDA error that happens directly after the calculation kernel, while waiting for the GPU to return the data. Something like this occurs, for example, when something went wrong with the memory (reads or writes in unallocated areas). Is the card heavily overclocked? Are you using high batch and reduction sizes? Try stock clocks and the lowest sizes and see if that's the problem.

Btw, you should have also gotten a detailed error message in square brackets right next to the error you posted. Please let me know what it is.

Quote

August 11, 2016


To run 1B on a GT 710 you need to set the batch size below 10M, or it will fail at loop 4.

Quote

November 2, 2016

Since I can't flag my own submission, I noticed something about GPU PI 2.3.4....

Any particular reasons why the data file puts my Phenom II X4 955 BE as a non-BE? I've gone in and edited the submission so it's showing as a Black Edition, but is this a limitation of GPU Pi in and of itself, or?

WhiteWulfe's GPUPI for CPU - 1B score: 20min 19sec 331ms with a Phenom II X4 955 for my submission in question.

EDIT: Actually, it won't let me edit it to a Black Edition, just winds up sitting at Apache Tomcat/7.0.59 - Error report and doesn't go back to my submission.

Quote

November 4, 2016

Administrators

Moved the post to the GPUPI thread; maybe _mat_ can explain or help.

Quote

November 4, 2016


It would be nice to know... I also didn't know about this thread

On the plus side, regarding the other bit about not being able to edit it: I was able to edit my 1B and 100M scores afterwards, so they now show the correct processor. It's still annoying that it registers as the incorrect processor, and that the server wouldn't let me manually correct it to a Black Edition at first.

Quote

November 4, 2016

Administrators


I checked the sub, and it is shown as Phenom II X4 955 BE for me... is this different for you? I don't think so. The edit worked; maybe it can be fixed so that no edit is needed in the future.

WhiteWulfe's GPUPI for CPU - 1B score: 20min 19sec 331ms with a Phenom II X4 955 BE

Quote

November 4, 2016


... I just said I was finally able to edit it. I wasn't able to when it was originally submitted, which is why I had created the thread in the first place. GPUPI is identifying my processor as a 955 non-Black Edition for some reason.

I can resubmit the datafile as a no-points submission if that helps.

Quote

November 4, 2016

Administrators

I am not sure this will help, because CPU-Z identifies the CPU as a non-BE as well, and this problem seems to be as old as the CPU itself. Stepping and other details are reported identically on BE and non-BE chips.

CPU-Z Validator 3.1

Quote

November 4, 2016

Author

The name of the device is retrieved via the OpenCL driver, which normally just takes the CPUID brand string as it is shown in CPU-Z, a hardcoded value inside the CPU. GPUPI removes various prefixes and postfixes to be able to submit the result to HWBOT.
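
As a rough illustration of the cleanup described above, here is a minimal Python sketch. The token lists and the example brand string are assumptions made for illustration, not GPUPI's actual rules. The point is only that if the brand string carries no "Black Edition" marker, trimming prefixes and postfixes cannot recover it.

```python
import re

# Hypothetical tokens to strip; GPUPI's real lists are not shown in this thread.
PREFIXES = ("AMD ",)
SUFFIXES = (" Processor",)
TRADEMARKS = re.compile(r"\((?:R|TM|tm)\)")


def clean_device_name(brand_string: str) -> str:
    """Reduce a CPUID-style brand string to a short model name."""
    # Drop trademark markers such as (tm) and collapse repeated whitespace.
    name = TRADEMARKS.sub("", brand_string)
    name = re.sub(r"\s+", " ", name).strip()
    for prefix in PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()


# Assumed example brand string for illustration only.
print(clean_device_name("AMD Phenom(tm) II X4 955 Processor"))
```

With this assumed input the result contains the model number but nothing that distinguishes a Black Edition, which matches what CPU-Z shows according to the posts above.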

Quote

June 25, 2017

I have a question, and before someone says "search" or that it was already discussed in this thread: I read from the last page all the way to page 30, fell asleep at my keyboard, just woke up, and decided to ask anyway at the risk of being flamed or moderated. So here goes:

My understanding is that as long as I have a Skylake/Kaby Lake processor AND I'm on Windows 10, I shouldn't be forced to change the HPET settings in the operating system or the BIOS, since my system isn't "bugged". But when I run GPUPI, there is no way to avoid setting it in the OS, or it complains about the HPET timer not being set.

How can I bypass this? If I can't because I'm totally misunderstanding the HPET bug thingy please feel free to re-educate me... right after I get a coffee. ;-)

Marco

Quote

June 25, 2017

You misunderstand.

HPET needs to be on no matter what OS is involved.

There is no HPET bug. There is an RTC bug.

Edited June 25, 2017 by Mr.Scott

Quote

June 25, 2017

OK, got it... so it's not actually a timer bug. I guess I'm still a little perplexed, because those of us on Skylake/Kaby Lake using Windows 10 aren't affected by this bug. So once again, my question is: why do I need to change anything for this particular benchmark?

- Marco

Quote

June 25, 2017

Because for the bench to run, it needs the High Precision Event Timer. That's the way the bench was made. If it's not on, the bench doesn't work. It's that simple.

Quote

June 25, 2017

Author

Windows 8 and above are affected by the RTC skewing bug when BCLK is changed in Windows. I don't think that Skylake and Kaby Lake are any exception to this rule, but I haven't tested it myself yet.

To circumvent HPET you have to use Windows 7.

Edit: The HWBOT rules allow the legacy benchmarks on SL and KL, so I guess it has been tested and it's not affecting the RTC timer.

Well, with the next version of GPUPI I will remove the HPET restriction on SL and KL.

Edited June 25, 2017 by _mat_
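
To make the effect of a skewed clock concrete, here is a small Python sketch of a simplified model. It assumes that an affected OS clock scales linearly with the current BCLK relative to the BCLK at boot; that linear relationship, the function name, and the numbers are assumptions for illustration, not measurements from this thread.

```python
def reported_seconds(true_seconds: float, boot_bclk_mhz: float, current_bclk_mhz: float) -> float:
    """Elapsed time an affected OS clock would report under the linear-skew assumption."""
    if boot_bclk_mhz <= 0 or current_bclk_mhz <= 0:
        raise ValueError("BCLK values must be positive")
    return true_seconds * current_bclk_mhz / boot_bclk_mhz


# Hypothetical case: a run that really takes 300 s, with BCLK lowered
# from 100 MHz to 94 MHz from within Windows after boot.
skewed = reported_seconds(300.0, 100.0, 94.0)
print(f"reported: {skewed:.1f} s, shortfall: {300.0 - skewed:.1f} s")
```

Under this model a lowered BCLK makes a run look faster than it was, which is why a benchmark would want a timer that does not follow BCLK.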

Quote

June 26, 2017

Wow, Mat - thanks so much for responding and I can't speak for everyone but thank you for all the work you do for the community and I look forward to the next version :-)

- Marco

Quote

June 26, 2017

Author

Thank you for your kind words, very much appreciated.

Quote

June 27, 2017


Intel introduced a separate BCLK clockgen on SKL, which circumvents the RTC bug.

Quote

June 28, 2017

Author

A little preview of some of the features of GPUPI 3.0 for my fellow overclockers. Command line version for Windows:

[Attachment]

Automatic selection of the compute platform, batch size, and reduction size by prebenching them for the user:

$ ./GPUPI_x64 -c -d 100M
GPUPI 3.0 (64 bit)

API: OpenCL GPU with 1 devices

API: OpenCL CPU with 2 devices

API: CUDA with 1 devices

Testing device: OpenCL CPU -> Intel® OpenCL -> Intel Core i7-6950X

=> 1M, 16: 2.294076 (Kernel: 2.250068, Reduction: 0.043280)

=> 1M, 32: 2.248263 (Kernel: 2.209528, Reduction: 0.037883)

=> 1M, 64: 2.270596 (Kernel: 2.231809, Reduction: 0.038067)

=> 1M, 128: 2.245034 (Kernel: 2.207602, Reduction: 0.036715)

=> 1M, 256: 2.279390 (Kernel: 2.229491, Reduction: 0.049113)

=> 1M, 512: 2.266061 (Kernel: 2.193988, Reduction: 0.071337)

=> 2M, 16: 2.315099 (Kernel: 2.236380, Reduction: 0.078018)

=> 2M, 32: 2.288076 (Kernel: 2.219284, Reduction: 0.068005)

=> 2M, 64: 2.287389 (Kernel: 2.226804, Reduction: 0.059873)

=> 2M, 128: 2.249376 (Kernel: 2.191177, Reduction: 0.057482)

=> 2M, 256: 2.283105 (Kernel: 2.215427, Reduction: 0.066962)

=> 2M, 512: 2.254495 (Kernel: 2.194912, Reduction: 0.058892)

=> 4M, 16: 2.307497 (Kernel: 2.218491, Reduction: 0.088419)

=> 4M, 32: 2.260795 (Kernel: 2.183106, Reduction: 0.077162)

=> 4M, 64: 2.304972 (Kernel: 2.238267, Reduction: 0.066159)

=> 4M, 128: 2.255765 (Kernel: 2.196260, Reduction: 0.058924)

=> 4M, 256: 2.277544 (Kernel: 2.209126, Reduction: 0.067898)

=> 4M, 512: 2.249683 (Kernel: 2.191406, Reduction: 0.057736)

=> 5M, 16: 2.304984 (Kernel: 2.217214, Reduction: 0.087279)

=> 5M, 32: 2.265134 (Kernel: 2.187128, Reduction: 0.077524)

=> 5M, 64: 2.279445 (Kernel: 2.212483, Reduction: 0.066463)

=> 5M, 128: 2.238783 (Kernel: 2.180829, Reduction: 0.057460)

=> 5M, 256: 2.299566 (Kernel: 2.231994, Reduction: 0.067090)

=> 5M, 512: 2.267714 (Kernel: 2.197324, Reduction: 0.069908)

=> 10M, 16: 2.311983 (Kernel: 2.226683, Reduction: 0.084900)

=> 10M, 32: 2.271478 (Kernel: 2.194653, Reduction: 0.076431)

=> 10M, 64: 2.261646 (Kernel: 2.190358, Reduction: 0.070862)

=> 10M, 128: 2.238901 (Kernel: 2.181579, Reduction: 0.056898)

=> 10M, 256: 2.278743 (Kernel: 2.215327, Reduction: 0.062982)

=> 10M, 512: 2.271698 (Kernel: 2.204813, Reduction: 0.066495)

=> 20M, 16: 2.316387 (Kernel: 2.224665, Reduction: 0.091349)

=> 20M, 32: 2.264053 (Kernel: 2.185630, Reduction: 0.078063)

=> 20M, 64: 2.302941 (Kernel: 2.229473, Reduction: 0.073087)

=> 20M, 128: 2.256671 (Kernel: 2.193457, Reduction: 0.062854)

=> 20M, 256: 2.256185 (Kernel: 2.194374, Reduction: 0.061453)

=> 20M, 512: 2.239121 (Kernel: 2.177762, Reduction: 0.061003)

=> 100M, 16: 2.351734 (Kernel: 2.219785, Reduction: 0.131625)

=> 100M, 32: 2.284867 (Kernel: 2.182241, Reduction: 0.102328)

=> 100M, 64: 2.331753 (Kernel: 2.241809, Reduction: 0.089634)

=> 100M, 128: 2.314707 (Kernel: 2.239817, Reduction: 0.074468)

=> 100M, 256: 2.272911 (Kernel: 2.204067, Reduction: 0.068538)

=> 100M, 512: 2.262099 (Kernel: 2.198868, Reduction: 0.062920)

Testing device: OpenCL CPU -> Experimental OpenCL 2.1 CPU Only Platform -> Intel Core i7-6950X

=> 1M, 16: 2.256272 (Kernel: 2.208813, Reduction: 0.046615)

=> 1M, 32: 2.259307 (Kernel: 2.213463, Reduction: 0.045005)

=> 1M, 64: 2.261296 (Kernel: 2.217321, Reduction: 0.043054)

=> 1M, 128: 2.255290 (Kernel: 2.210472, Reduction: 0.043979)

=> 1M, 256: 2.276476 (Kernel: 2.228651, Reduction: 0.046952)

=> 1M, 512: 2.290972 (Kernel: 2.212072, Reduction: 0.078073)

=> 2M, 16: 2.281812 (Kernel: 2.198045, Reduction: 0.082942)

=> 2M, 32: 2.257342 (Kernel: 2.186708, Reduction: 0.069812)

=> 2M, 64: 2.274022 (Kernel: 2.210600, Reduction: 0.062644)

=> 2M, 128: 2.242322 (Kernel: 2.182171, Reduction: 0.059333)

=> 2M, 256: 2.284035 (Kernel: 2.214316, Reduction: 0.068943)

=> 2M, 512: 2.245358 (Kernel: 2.182079, Reduction: 0.062468)

=> 4M, 16: 2.314692 (Kernel: 2.223573, Reduction: 0.090524)

=> 4M, 32: 2.287976 (Kernel: 2.207894, Reduction: 0.079501)

=> 4M, 64: 2.280830 (Kernel: 2.212408, Reduction: 0.067748)

=> 4M, 128: 2.246577 (Kernel: 2.185500, Reduction: 0.060417)

=> 4M, 256: 2.267059 (Kernel: 2.196838, Reduction: 0.069591)

=> 4M, 512: 2.245285 (Kernel: 2.185138, Reduction: 0.059543)

=> 5M, 16: 2.297634 (Kernel: 2.208198, Reduction: 0.088819)

=> 5M, 32: 2.252606 (Kernel: 2.173261, Reduction: 0.078811)

=> 5M, 64: 2.289753 (Kernel: 2.219699, Reduction: 0.069525)

=> 5M, 128: 2.241125 (Kernel: 2.181959, Reduction: 0.058589)

=> 5M, 256: 2.272509 (Kernel: 2.203694, Reduction: 0.068293)

=> 5M, 512: 2.255514 (Kernel: 2.184216, Reduction: 0.070740)

=> 10M, 16: 2.283480 (Kernel: 2.197331, Reduction: 0.085667)

=> 10M, 32: 2.259312 (Kernel: 2.181606, Reduction: 0.077262)

=> 10M, 64: 2.273700 (Kernel: 2.201239, Reduction: 0.071997)

=> 10M, 128: 2.239782 (Kernel: 2.180927, Reduction: 0.058395)

=> 10M, 256: 2.288214 (Kernel: 2.223593, Reduction: 0.064161)

=> 10M, 512: 2.275962 (Kernel: 2.210551, Reduction: 0.064933)

=> 20M, 16: 2.298107 (Kernel: 2.206268, Reduction: 0.091459)

=> 20M, 32: 2.282686 (Kernel: 2.204328, Reduction: 0.077989)

=> 20M, 64: 2.264337 (Kernel: 2.189890, Reduction: 0.074074)

=> 20M, 128: 2.261340 (Kernel: 2.197828, Reduction: 0.063139)

=> 20M, 256: 2.254334 (Kernel: 2.191051, Reduction: 0.062873)

=> 20M, 512: 2.261168 (Kernel: 2.199776, Reduction: 0.060975)

=> 100M, 16: 2.363773 (Kernel: 2.221893, Reduction: 0.141445)

=> 100M, 32: 2.325590 (Kernel: 2.220586, Reduction: 0.104667)

=> 100M, 64: 2.281907 (Kernel: 2.196272, Reduction: 0.085290)

=> 100M, 128: 2.290375 (Kernel: 2.214583, Reduction: 0.075452)

=> 100M, 256: 2.274559 (Kernel: 2.203661, Reduction: 0.070590)

=> 100M, 512: 2.287639 (Kernel: 2.223986, Reduction: 0.063316)

Best device found: OpenCL CPU -> Intel® OpenCL -> Intel Core i7-6950X with 5M, 128.

Timer: HPET (14.32 MHz)

Init HWiNFO: Ok

OpenCL CPU: Intel Core i7-6950X (20 CUs, 3000 MHz)

Compiling OpenCL kernels ... done.

Calculating 100.000.000th digit of PI. 20 iterations.

Allocated device memory : 83.89 MB

Batch Size : 5M

Reduction Size : 128

00h 00m 00.480s Batch 1 finished.

00h 00m 00.945s Batch 2 finished.

00h 00m 01.403s Batch 3 finished.

00h 00m 01.850s Batch 4 finished.

00h 00m 02.263s Batch 5 finished.

00h 00m 02.734s Batch 6 finished.

00h 00m 03.201s Batch 7 finished.

00h 00m 03.649s Batch 8 finished.

00h 00m 04.089s Batch 9 finished.

00h 00m 04.502s Batch 10 finished.

00h 00m 04.980s Batch 11 finished.

00h 00m 05.450s Batch 12 finished.

00h 00m 05.915s Batch 13 finished.

00h 00m 06.367s Batch 14 finished.

00h 00m 06.784s Batch 15 finished.

00h 00m 07.257s Batch 16 finished.

00h 00m 07.724s Batch 17 finished.

00h 00m 08.187s Batch 18 finished.

00h 00m 08.639s Batch 19 finished.

00h 00m 09.055s PI value output -> CB840E219

Highest clocks measured:

CPU: 3800.11 MHz

GPU: 202.50 MHz

GPU memory: 101.25 MHz

Statistics:

Calculation + Reduction time: 8.822s + 0.231s

PI calculation is done!
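
For readers who want to check the prebench result themselves, here is a minimal Python sketch that parses result lines in the format shown above and picks the fastest setting. The selection rule (lowest total time) is an assumption, although it is consistent with the log, where 5M, 128 has the lowest total of all listed lines. The function names are made up for this example, and only a handful of lines are copied in.

```python
import re

# A few lines copied from the prebench output above (Intel OpenCL platform).
LOG = """\
=> 1M, 128: 2.245034 (Kernel: 2.207602, Reduction: 0.036715)
=> 5M, 128: 2.238783 (Kernel: 2.180829, Reduction: 0.057460)
=> 10M, 128: 2.238901 (Kernel: 2.181579, Reduction: 0.056898)
=> 20M, 512: 2.239121 (Kernel: 2.177762, Reduction: 0.061003)
=> 100M, 16: 2.351734 (Kernel: 2.219785, Reduction: 0.131625)
"""

LINE = re.compile(
    r"=> (\d+M), (\d+): ([\d.]+) \(Kernel: ([\d.]+), Reduction: ([\d.]+)\)"
)


def parse_prebench(text):
    """Return (batch_size, reduction_size, total_seconds) for each result line."""
    results = []
    for match in LINE.finditer(text):
        batch = match.group(1)
        reduction = int(match.group(2))
        total = float(match.group(3))
        results.append((batch, reduction, total))
    return results


def best_setting(results):
    """Pick the entry with the lowest total time (assumed selection rule)."""
    return min(results, key=lambda entry: entry[2])


best = best_setting(parse_prebench(LOG))
print(best)
```

The differences between the top candidates are well under a millisecond per prebench run, so the chosen setting could plausibly vary from one run to the next.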

Quote

June 28, 2017

Can't wait!

Quote

July 2, 2017

Quick question: I'm having an issue with GPUPI. I'm doing 32B runs on a 1080 Ti; they complete just fine and the checksum appears to be right. If I save the validation file, I get the normal message saying that the file was saved successfully. However, if I try to immediately validate that same file, I get the following error: "The result file was successfully decrypted, but the data is invalid [invalid XML data]". I tried validating the file on another computer, to no avail. Any ideas?

Quote

July 2, 2017

Author

Check GPUPI.log; there should be an extended error description stating which XML node makes the file invalid. Please post it here, as it might be a bug in the validation.

Quote

July 2, 2017


Here's the relevant data from the log file.

Error while decrypting output: StreamTransformationFilter: invalid PKCS #7 block padding found
Error while decrypting output: StreamTransformationFilter: invalid PKCS #7 block padding found

XML validation: submission node not found!

Message box: The result file was successfully decrypted, but the data is invalid [invalid XML data] (Validation result)

Thanks for your help.

Quote

July 2, 2017

Author

The submission node is the root node of the HWBOT validation file. But the errors before that already make it very clear that the data could not be decrypted correctly. Sorry, that's bad news. It seems the file was damaged at some point.
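
To show why damaged data typically surfaces as a padding error, here is a minimal, pure-Python sketch of PKCS #7 padding and its validation. This is not GPUPI's code; the 16-byte block size and the placeholder plaintext are assumptions for illustration. The idea is that after decrypting corrupted data, the final block is effectively garbage, so the padding check at the end fails.

```python
BLOCK_SIZE = 16  # assumed block size in bytes; the cipher GPUPI uses is not stated here


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS #7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid PKCS #7 block padding: bad length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid PKCS #7 block padding: bad pad byte")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS #7 block padding: inconsistent bytes")
    return padded[:-pad_len]


# Placeholder plaintext standing in for the XML payload.
plaintext = b"<submission>...</submission>"
padded = pkcs7_pad(plaintext)
assert pkcs7_unpad(padded) == plaintext

# Simulate damage: flip the final byte, as corrupted data would look after decryption.
damaged = padded[:-1] + bytes([padded[-1] ^ 0xFF])
try:
    pkcs7_unpad(damaged)
except ValueError as err:
    print(err)
```

Because the padding check fails before any XML is parsed, the later "submission node not found" message is a consequence of the failed decryption rather than a separate problem.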

Quote
