HWBOT Community Forums

GPUPI - SuperPI on the GPU


_mat_


Too late for that, I'm already on the 3rd batch (8h 30-something minutes) ... and it would look suspicious if the time distances between batches varied, which is what would happen if I added another slowing-down program :)

 

But I have a better idea: the bench can run on notably slower hardware, causing it to choke a bit (or a lot?) ... like on another machine that has been running Aquamark for 24h now and has drawn only a little over 1/10 of the 5200 frames :D


Slow it down by running another benchmark in parallel. ;)

 

The problem is the loops will look irregular, and people will report them for nothing. I had 70 hours with just 7 loops: 3 loops with SPi32 running (about 15 hrs/loop) and 4 without it ("normal" 4-5 hours/loop). I decided to cancel it since everyone is sandbagging it anyway.


GPUPI 2.2 is out now! Thanks to the feedback from you guys, I was able to fix a lot of things and improve many others. It will be mandatory to use this version in the future, because there are some important changes that make the bench even more bulletproof. But we will wait until the currently running competitions have finished, including the Team Cup of course.

 

Last but not least, I would like to talk about our next plans for GPUPI. Thanks to HWBOT I can integrate CPU-Z into the next version, which will improve hardware detection a lot and allow several frequencies and even voltages to be automatically submitted to the database. Additionally, we are going to include support for HWBOT competitions directly in the online submission dialog. I already have a prototype working, but I didn't want to rush anything. :)

 

Full changelog and download here: https://www.overclockers.at/news/gpupi-2-2-english

 

Reporting that it actually solved the problem with the time going out of sync - well done :)


The problem is the loops will look irregular, and people will report them for nothing.

 

Well, then don't change the load during the run :D:P Right now, mine seems to be relatively consistent:

 

1 - 2:46 (166min)

2 - 5:32 (332min - exactly 166 per loop)

3 - 8:26 (506min - exactly 168.666 per loop)

4 - 11:47 (707min - exactly 176.75 per loop)

 

...it looks like the times are slightly increasing, but all in all it seems quite consistent - IMHO.
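The per-loop arithmetic above is easy to double-check; here is a quick sketch (cumulative minutes taken from the list above, variable names are mine):

```python
# Cumulative run time in minutes after each completed loop, as listed above.
cumulative = [166, 332, 506, 707]

# Average minutes per loop after n loops = cumulative[n-1] / n.
averages = [t / (i + 1) for i, t in enumerate(cumulative)]
print(averages)  # [166.0, 166.0, 168.666..., 176.75]
```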

 

But maybe you get the irregularity because you have not yet compiled the two Aquamark scores for me :o That sort of defeats the purpose of having set up another super-slow run... :(

 

(BTW, in Aquamark there is clearly an irregularity in the speed at which the frames increase between various tests. At first it ran well, but now it has turned into a real snail...)


LOL ... interesting. Why the fails, then? My load seems quite regular and no sudden jumps are visible... but maybe things are different on different HW? Who knows. Noctua just delivered the mounting support for S775/S1366 ... and one more socket I forget, so I will install a cooler on the ASRock 775Dual-VSTA, try to make GPUPI work on really low-end HW, and we will see what happens.


I spoke too soon.

Yesterday the test was at the end of batch 4, and today it has not moved a single bit...! Still at batch 5, so...

 

That means there will be a much longer loop (just as GENiEBEN said, suddenly 15h per loop), and all my "quite regular" loops claims are gone :eek:

(or, more precisely, they are valid only up to loop 4...)

 

WTF!

 

Question: will the result be accepted, or should I strike it down and stop right now? Because there would be little purpose in wasting time on a result that will be discarded, as there will be a sudden and serious jump in time between batches 4 and 5... (and possibly that will repeat somehow during the test)

And no, I did not change anything at all. Two instances of the new CPU-Z provide the load on both cores, plus GPUPI, and that is it.


That's normal behaviour. There are two different kinds of loops for 1B results. The loops that calculate the partial results up to 500M use less precision and are therefore faster. The loops between 500M and 1B have to use 128 bit integer algorithms, which is much slower. This behaviour is repeated 4 times, so it's like this:

 

fast loops

slow loops

fast loops

slow loops

fast loops

slow loops

fast loops

slow loops

=> result

 

The number of loops that are fast/slow and how slow/fast they actually are is determined by the batch size. But don't worry, the batch size only slows down or speeds up the whole calculation, depending on how the hardware can process the work load (too small and not all cores can be used at once, too big and the hardware is overwhelmed). It will influence the loops, but it's not like fiddling with the batch size will introduce more slow loops. As I tried to explain, the whole calculation time will just be split differently between those loops. That's because the loops that are shown are just a visual thing and do not really show these two different parts of the calculation.

 

Well, a difficult topic, but I hope I could shed some light on this.
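The schedule described above can be sketched as a toy model. The phase costs here are made-up placeholders to show the shape of a run, not GPUPI's real numbers:

```python
# Toy model of a 1B run as described above: four passes, each with a fast
# low-precision phase (partial results up to 500M) and a slow 128 bit
# integer phase (500M to 1B). The minute values are purely illustrative.
FAST_MINUTES = 170
SLOW_MINUTES = 820

def schedule(passes=4):
    phases = []
    for _ in range(passes):
        phases.append(("fast", FAST_MINUTES))
        phases.append(("slow", SLOW_MINUTES))
    return phases

total = sum(minutes for _, minutes in schedule())
print(total)  # 4 * (170 + 820) = 3960 minutes
```

With numbers like these, a jump from roughly 170 min per displayed batch to 800+ min between two batches is exactly what the fast/slow split predicts.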


[...] the whole calculation time will just be split differently between those loops. That's because the loops that are shown are just a visual thing and do not really show these two different parts of the calculation.

 

As a question to that part about the loops being just a visual thing: would it be possible to add a setting for how many batches are displayed during a run, if it doesn't affect the calculation anyway?

I guess it doesn't matter that much on new hardware, but it would be sweet for those older CPUs where the runs take an hour or more.

 

EDIT:

And I think there is some rounding error, at least in the smaller tests (I'm running 32M to test how different settings affect the speed), because all batches end with a time like XX.999s.
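For what it's worth, a timer whose result is truncated rather than rounded produces exactly this XX.999s pattern. A small illustration (the function name is mine, this is not GPUPI code):

```python
import math

# If the elapsed time comes from a coarse tick count and is truncated
# to milliseconds instead of rounded, values just under a whole second
# are displayed as XX.999 rather than the next whole second.
def truncate_to_ms(seconds):
    return math.floor(seconds * 1000) / 1000

print(truncate_to_ms(13.0 - 1e-6))  # 12.999
```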

Edited by lanbonden

Oh, that is good to know - so there is nothing to worry about:

 

GPUPI_longer_runs_now.jpg

 

...I was just considering stopping this, because the jumps are quite severe. I mean... from around 166-176 min per loop it suddenly jumped between loops 4 and 5 to 846 min per loop, which is roughly 4.8x more :o

 

But it is true, upon further inspection, that the slowest run yet also had a big jump between batch 4 (18h 36min) and batch 5 (26h 16min), so it is probably okay. I still have to get slower, because I managed only 25h 53min at batch 5, while he has 26h 16min...

GPUPI_84h_03min_to_beat.jpg

Edited by trodas

As a question to that bold part, would it be possible to add a setting for how many batches are displayed during a run [...]? And I think there is some rounding error [...] all batches end with a time like XX.999s.

It's possible for sure, but I want to keep a certain standard for the output, so results are easy to moderate. Custom loop sizes won't help with that and might get banned from HWBOT for that reason.

 

Sounds more like a bad resolution of your OS timer. Try using the HPET timer (see the FAQ on the download page of GPUPI), you will get more precise results.

 

Btw, testing batch sizes on 32M is not a good strategy, because only the fast loops will be measured. You should take the high precision loops of 1B into account as well.
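If you want to see how coarse your OS timer actually is, Python's standard library can report the resolution of its clocks. This only inspects the interpreter's own timers, not GPUPI's, but a coarse value here usually points to a coarse OS tick as well:

```python
import time

# Report the resolution (in seconds) and the backing OS implementation
# of several clocks; a large resolution means coarse timestamps.
for clock in ("time", "perf_counter", "monotonic"):
    info = time.get_clock_info(clock)
    print(clock, info.resolution, info.implementation)
```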


Sounds more like a bad resolution of your OS timer. Try using the HPET timer (see the FAQ on the download page of GPUPI) [...]

 

Same motherboard as before, when I had the time-out-of-sync problem, so no HPET available to fix it.

A few rounds later, I am now sure that it does not just round to whole seconds - it even rounds to 5-second intervals.

I doubt it affects very many motherboards, but it might be interesting for you to see anyway.

0756f1bbc1.png

 

And more importantly for now: would a submission with times looking like the above picture be accepted in the slow run for the Team Cup? Running 1B instead, of course, and with CPU-Z open showing mem and CPU.

Edited by lanbonden

Batch 6 finally finished - 48h? What is going on?

 

At this rate I will be lucky if this run finishes before the competition does... But why such a brutal slow-down? I did not touch anything...

As I said, depending on the batch size the slow and fast loops converge into each other. It seems that loop 6 uses a lot of 128 bit integer calculations, while loop 5 was still part of the 64 bit integer kernels.

 

I have a problem with this benchmark: I always get a crash at 4/512. I have tried a fresh OS install (W7, W8) and different GPUPI versions, still the same, but another system runs 4/512 normally. Does anyone know how to fix this?
Don't use a reduction size of 512. Seems like the drivers can't handle these reduction depths. Use 256 instead, it should work (or the system is not stable).
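For readers wondering what the reduction size controls: it is the width of the parallel reduction tree used to sum up partial results. A rough CPU-side analogy (names and structure are mine, not GPUPI's actual implementation):

```python
# Width-limited tree reduction: sum values in chunks of at most `width`
# per step, repeating until one value remains. On a GPU, a larger width
# means wider reduction kernels, which some drivers handle badly.
def reduce_sum(values, width=256):
    values = list(values)
    while len(values) > 1:
        values = [sum(values[i:i + width]) for i in range(0, len(values), width)]
    return values[0]

print(reduce_sum(range(1000), width=256))  # 499500, same as sum(range(1000))
```

Whatever the width, the final sum is identical; only the shape of the work changes, which is why dropping from 512 to 256 is a safe workaround.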

 

Was support for some older hardware disabled in the latest revision for the sake of running newer stuff?

This is an s423.

Any ideas why it says it's being ignored?

As the message states, these CPUs do not support double precision calculations with OpenCL. This has nothing to do with the version of GPUPI.
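For context: OpenCL advertises double precision through the cl_khr_fp64 extension, and GPUPI can only see what the driver reports. A minimal sketch of such a check (the extension strings below are invented; a real one comes from clGetDeviceInfo with CL_DEVICE_EXTENSIONS):

```python
# A device supports OpenCL double precision if "cl_khr_fp64" appears in
# its space-separated extensions string. The example strings are made up.
def supports_fp64(extensions):
    return "cl_khr_fp64" in extensions.split()

print(supports_fp64("cl_khr_byte_addressable_store cl_khr_fp64"))  # True
print(supports_fp64("cl_khr_icd"))  # False
```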

 

And after I clicked a few times in the working window, on the next batch I got this:

 

...I hope I did not screw up anything. I will stop clicking, for sure :(

You somehow deleted the output text buffer (which should not even be possible). If the whole output text buffer does not reappear on the next loop, I would restart. Otherwise the moderators could reject the result. :( Edited by _mat_

As the message states, these CPUs do not support double precision calculations with OpenCL. This has nothing to do with the version of GPUPI.

 

I beg to differ. He ran that same CPU on GPUPI version 1.4 in the Turrican challenge. Something changed.

http://hwbot.org/submission/2749253_mr.paco_gpupi_for_cpu___1b_pentium_4_1.5ghz_willamette_s423_9h_38min_40sec_284ms


I'm sorry to bring this up again... but despite all efforts Nvidia G200 still refuses to work with GPUPI 2.2 (legacy). Although the error message is different this time.

 

If you manage to fix this issue, I promise to bench all the G200 video cards I can find. :D

gpupi22_error4xud0.png

 

LOG START at 2015-08-16  01:03:07 ----------------------
Starting run to calculate 1000000 digits with 1 batches
Batch Size: 1M
Maximum Reduction Size: 64
Message box: Press OK to start the calculation. (Start)
Error while calculating series term!
Result digits: 000000000
Result time: 0.006000
Device statistics for NVIDIA GeForce GTX 285:
Calculated Batches: 1 of 4 (25.000000%)
Kernel time: 0.000000 seconds
Reduction time: 0.000000 seconds
Message box: Invalid result! (Error)


I beg to differ. He ran that same CPU on GPUPI version 1.4 in the Turrican challenge. Something changed.

http://hwbot.org/submission/2749253_mr.paco_gpupi_for_cpu___1b_pentium_4_1.5ghz_willamette_s423_9h_38min_40sec_284ms

The code in GPUPI itself cannot determine the capability of the hardware; that is done by querying OpenCL. So this is a driver issue. Have you tried the same drivers you used last time?
