trodas Posted August 13, 2015

Too late for that, I am already on the 3rd batch (8 hours, 30-something minutes)... and it would look suspicious if the time distances between batches varied because I added another slowing-down program. But I have a better idea: the bench can run on notably slower hardware, causing it to choke a bit (or a lot?)... like on another machine that has been running Aquamark for 24 hours now and has only drawn a little over 1/10 of the 5200 frames.
GENiEBEN Posted August 13, 2015

Slow it down by running another benchmark in parallel. Problem is the loops will look irregular and people will report it for nothing. I had 70 hours with just 7 loops: 3 loops with SPi32 (about 15 hrs/loop) and 4 without it ("normal" 4-5 hours/loop). Decided to cancel it since everyone is sandbagging it anyway.
lanbonden Posted August 13, 2015

GPUPI 2.2 is out now! Thanks to the feedback from you guys, I was able to fix a lot of things and improve many others. It will be mandatory to use this version in the future, because there are some important changes to make the bench even more bulletproof. But we will wait until the currently started competitions have finished, including the Team Cup of course. Last but not least, I would like to talk about our next plans for GPUPI. Thanks to HWBOT, I can integrate CPU-Z into the next version, which will improve hardware detection a lot and allows several frequencies and even voltages to be submitted to the database automatically. Additionally, we are going to include support for HWBOT competitions directly in the online submission dialog. I already have a prototype working, but I didn't want to rush anything. Full changelog and download here: https://www.overclockers.at/news/gpupi-2-2-english

Reporting back that it actually solved the problem with the time going out of sync. Well done.
trodas Posted August 13, 2015

Problem is the loops will look irregular and people will report it for nothing.

Well, I did not change the load during the run. Right now, mine seems to be relatively consistent:
1 - 2:46 (166 min)
2 - 5:32 (332 min - exactly 166 per loop)
3 - 8:26 (506 min - exactly 168.666 per loop)
4 - 11:47 (707 min - exactly 176.75 per loop)
...it looks like the times are slightly increasing, but all in all it seems very consistent, IMHO. But maybe you got the irregularity because you have not yet compiled the two Aquamark scores for me. That would sort of defeat the purpose of having set up another super-slow run... (BTW, in Aquamark there is clearly irregularity in the speed at which the frames increase between the various tests. At first it ran well, but now it has turned into a real snail...)
GENiEBEN Posted August 13, 2015

Well, it's not like I'll wake in the middle of the night to restart SPi32.
trodas Posted August 13, 2015

LOL... interesting. Why the failures, then? My load seems quite regular and no sudden jumps are seen... but maybe things are different on different HW? Who knows. Noctua just delivered the mounting support for S775/S1366... and one more socket I forget, so I will install the cooler on the ASRock 775Dual-VSTA, try to make GPUPI work with really low-end HW, and we will see what happens.
trodas Posted August 14, 2015

I spoke too soon. The test was at the end of batch 4 yesterday, and today it has not moved a single bit...! Still at batch 5, so... That means there will be a much longer loop (just as GENiEBEN said, suddenly 15 h per loop) and all my "quite regular" loop claims are gone (or more precisely, valid only up to loop 4...) WTF! Question: will the result be accepted, or should I strike it down and stop right now? Because there would be little purpose in wasting time on a result that will be discarded due to a sudden and serious jump in time between batches 4 and 5... (and possibly that will repeat somewhere during the test). And no, I did not change anything at all. Two instances of the new CPU-Z provide the load on both cores, plus GPUPI, and that is it.
_mat_ (Author) Posted August 14, 2015

That's normal behaviour. There are two different kinds of loops for 1B results. The loops that calculate the partial results up to 500M use less precision and are therefore faster. The loops between 500M and 1B have to use 128-bit integer algorithms, which are much slower. This behaviour is repeated 4 times, so it's like this:
fast loops
slow loops
fast loops
slow loops
fast loops
slow loops
fast loops
slow loops
=> result
The number of loops that are fast/slow, and how slow/fast they actually are, is determined by the batch size. But don't worry, the batch size only slows down or speeds up the whole calculation, depending on how the hardware can process the workload (too small and not all cores can be used at once, too big and the hardware is overwhelmed). It will influence the loops, but it's not like fiddling with the batch size will introduce more slow loops. As I tried to explain, the whole calculation time will just be split differently between those loops. That's because the loops that are shown are just a visual thing and do not really show these two different parts of the calculation. Well, a difficult topic, but I hope I could shed some light on this.
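[Editor's note: a minimal C++ sketch of the structure described above, not GPUPI's actual code. The batch size, term ranges and printout are purely illustrative; the point is that the calculation is four repetitions of a fast 64-bit phase followed by a slow 128-bit phase, while the printed "loop" boundaries only depend on the chosen batch size.]

// Illustrative sketch only, not GPUPI's actual code: four repetitions of a
// fast 64-bit phase (terms up to 500M) followed by a slow 128-bit phase
// (500M to 1B). The batch size here is a made-up number; it only decides
// where the printed "loop" boundaries fall, not how much slow work exists.
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t digits    = 1000000000ULL; // 1B run
    const std::uint64_t batchSize = 100000000ULL;  // hypothetical batch size

    for (int rep = 0; rep < 4; ++rep) {
        // Fast phase: partial results up to 500M fit into 64-bit arithmetic.
        for (std::uint64_t term = 0; term < digits / 2; term += batchSize)
            std::cout << "rep " << rep << " fast batch starting at " << term << "\n";

        // Slow phase: 500M..1B needs 128-bit integer arithmetic, so each of
        // these batches takes far longer even though it covers the same range.
        for (std::uint64_t term = digits / 2; term < digits; term += batchSize)
            std::cout << "rep " << rep << " slow batch starting at " << term << "\n";
    }
    return 0;
}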
Massman Posted August 14, 2015

I think everyone just wants to know how to speed up loops 20-24. Oh wait, that's another benchmark.
lanbonden Posted August 14, 2015 (edited)

That's normal behaviour. There are two different kinds of loops for 1B results. The loops that calculate the partial results up to 500M use less precision and are therefore faster. The loops between 500M and 1B have to use 128-bit integer algorithms, which are much slower. This behaviour is repeated 4 times, so it's like this:
fast loops
slow loops
fast loops
slow loops
fast loops
slow loops
fast loops
slow loops
=> result
The number of loops that are fast/slow, and how slow/fast they actually are, is determined by the batch size. But don't worry, the batch size only slows down or speeds up the whole calculation, depending on how the hardware can process the workload (too small and not all cores can be used at once, too big and the hardware is overwhelmed). It will influence the loops, but it's not like fiddling with the batch size will introduce more slow loops. As I tried to explain, the whole calculation time will just be split differently between those loops. That's because the loops that are shown are just a visual thing and do not really show these two different parts of the calculation. Well, a difficult topic, but I hope I could shed some light on this.

As a question to that bold part: would it be possible to add a setting for how many batches are displayed during a run, if it doesn't affect the calculation anyway? I guess it doesn't matter that much on new hardware, but it would be sweet for those older CPUs where the runs take an hour or more.

EDIT: And I think there is some rounding error, at least in the smaller tests (running 32M to test how different settings affect the speed), because all the batches end with a time like XX.999 s.

Edited August 14, 2015 by lanbonden
trodas Posted August 14, 2015 (edited)

Oh, that is good to know, that there is nothing to worry about: ...I was just considering stopping this, because the jump is quite severe. I mean... from around 166 to 176 min per loop it suddenly jumped between loops 4 and 5 to 846 min per loop, which is roughly 4.8x more. But it is true, upon further inspection, that the slowest run yet also had a big jump between batch 4 (18 h 36 min) and batch 5 (26 h 16 min), so it is probably okay. I still have to get slower, because I managed only 25 h 53 min at batch 5, while he has 26 h 16 min...

Edited August 14, 2015 by trodas
_mat_ (Author) Posted August 14, 2015

As a question to that bold part: would it be possible to add a setting for how many batches are displayed during a run, if it doesn't affect the calculation anyway? I guess it doesn't matter that much on new hardware, but it would be sweet for those older CPUs where the runs take an hour or more.
EDIT: And I think there is some rounding error, at least in the smaller tests (running 32M to test how different settings affect the speed), because all the batches end with a time like XX.999 s.

It's possible for sure, but I want to keep a certain standard for the output, so results are easy to moderate. Custom loop sizes won't help with that and might get banned from HWBOT for that reason.

Sounds more like a bad resolution of your OS timer. Try using the HPET timer (see the FAQ on the download page of GPUPI); you will get more precise results. Btw, testing batch sizes on 32M is not a good strategy, because only the fast loops will be measured. You should take the high-precision loops of 1B into account as well.
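[Editor's note: to illustrate how a coarse OS timer produces results that all snap to the same boundary (the XX.999 s readings mentioned above), here is a small standalone C++ probe of the clock granularity a program sees. It is only an illustration; which timer source GPUPI itself reads is not shown here.]

// Minimal probe of timer granularity. If the observed step is in the
// millisecond range or worse, short benchmark loops will appear rounded
// (e.g. xx.999 s) in the output.
#include <chrono>
#include <iostream>

int main() {
    using Clock = std::chrono::high_resolution_clock;

    // Nominal tick length promised by the clock's type.
    std::cout << "nominal tick: "
              << static_cast<double>(Clock::period::num) / Clock::period::den
              << " s\n";

    // Observed granularity: smallest nonzero step the clock actually reports.
    auto t0 = Clock::now();
    auto t1 = t0;
    while (t1 == t0) t1 = Clock::now();
    std::cout << "observed step: "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s\n";
    return 0;
}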
lanbonden Posted August 14, 2015 (edited)

It's possible for sure, but I want to keep a certain standard for the output, so results are easy to moderate. Custom loop sizes won't help with that and might get banned from HWBOT for that reason.
Sounds more like a bad resolution of your OS timer. Try using the HPET timer (see the FAQ on the download page of GPUPI); you will get more precise results. Btw, testing batch sizes on 32M is not a good strategy, because only the fast loops will be measured. You should take the high-precision loops of 1B into account as well.

Same motherboard as before, when I had the time-out-of-sync problem, so no HPET available to fix it. A few rounds later I am now sure that it does not just round to whole seconds, it rounds to 5-second intervals. I doubt it affects very many motherboards, but it might still be interesting for you to see. And more importantly for now: would a submission with times looking like the picture above be accepted in the slow run for the Team Cup? Running 1B instead, of course, and with CPU-Z open showing memory and CPU.

Edited August 14, 2015 by lanbonden
trodas Posted August 15, 2015

Batch 6 finally finished - 48 h? What is going on? At this rate I will be lucky if this run finishes before the competition is over... But why such a brutal slow-down? I did not touch anything...
Hazzan Posted August 15, 2015

I have a problem with this benchmark: it always crashes on 4/512. I have tried a fresh OS install (W7, W8) and different GPUPI versions, still the same, but it runs normally with another 4/512. Does anyone know how to fix this?
trodas Posted August 15, 2015

No crash here. Are you sure your PC is stable? Did you check with OCCT or at least the MultiCore Prime95 stress test, starting with FFT size 1792? ( http://postimg.org/image/xg7y7ytrt/ ) http://www.mediafire.com/?5clk6102wkw6ckz (backup: https://mega.co.nz/#!OM0HEDbL!8QgEdEsPgArz9IpqK5iksbEf3LYh1lEuJvjL656b_tI )
mr.paco Posted August 16, 2015

Was support for some older hardware disabled in the latest revision for the sake of running newer stuff? This is an s423. Any ideas why it says it is being ignored?
trodas Posted August 16, 2015

Oh, no double precision... That suxx... I was planning this move too, but with a Socket 775 Celeron.
lanbonden Posted August 16, 2015

Oh, no double precision... That suxx... I was planning this move too, but with a Socket 775 Celeron.

A 775 Celeron worked for sure with the last version; I haven't tested with 2.2, as I'm doing the run on another socket atm.
trodas Posted August 16, 2015

Hmmm, then why does it fail on the P4, as shown above? Did he use the wrong AMD OpenCL or what? And I clicked a few times into the working window, and on the next batch I got this: ...I hope I did not screw up anything. I will stop clicking, for sure.
_mat_ (Author) Posted August 16, 2015 (edited)

Batch 6 finally finished - 48 h? What is going on? At this rate I will be lucky if this run finishes before the competition is over... But why such a brutal slow-down? I did not touch anything...

As I said, depending on the batch size the slow and fast loops converge into each other. Seems like loop 6 uses a lot of 128-bit integer calculations, while loop 5 was still part of the 64-bit integer kernels.

I have a problem with this benchmark: it always crashes on 4/512. I have tried a fresh OS install (W7, W8) and different GPUPI versions, still the same, but it runs normally with another 4/512. Does anyone know how to fix this?

Don't use a reduction size of 512. Seems like the drivers can't handle these reduction depths. Use 256 instead, it should work (or the system is not stable).

Was support for some older hardware disabled in the latest revision for the sake of running newer stuff? This is an s423. Any ideas why it says it is being ignored?

As the message states, these CPUs do not support double precision calculations with OpenCL. This has nothing to do with the version of GPUPI.

And I clicked a few times into the working window, and on the next batch I got this: ...I hope I did not screw up anything. I will stop clicking, for sure.

You somehow deleted the output text buffer (which should not be possible at all). If the whole output text buffer does not reappear on the next loop, I would restart. Otherwise the moderators could reject the result.

Edited August 16, 2015 by _mat_
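[Editor's note: one plausible reason a reduction depth of 512 fails while 256 works is that 512 exceeds the device's maximum work-group size; this is an assumption, not something confirmed above. A minimal host-side sketch of the standard OpenCL query for that limit:]

// Hedged sketch: query the device's maximum work-group size via OpenCL.
// Whether GPUPI's reduction size maps 1:1 onto work-group size is an
// assumption; this only shows the query itself.
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, nullptr) != CL_SUCCESS) return 1;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, nullptr) != CL_SUCCESS)
        return 1;

    size_t maxWorkGroup = 0;
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(maxWorkGroup), &maxWorkGroup, nullptr);
    std::printf("CL_DEVICE_MAX_WORK_GROUP_SIZE: %zu\n", maxWorkGroup);
    // If this prints 256, a reduction depth of 512 would be a natural
    // candidate for the crash and 256 the highest safe setting.
    return 0;
}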
Mr.Scott Posted August 16, 2015

As the message states, these CPUs do not support double precision calculations with OpenCL. This has nothing to do with the version of GPUPI.

I beg to differ. He ran that same CPU on GPUPI version 1.4 in the Turrican challenge. Something changed.
http://hwbot.org/submission/2749253_mr.paco_gpupi_for_cpu___1b_pentium_4_1.5ghz_willamette_s423_9h_38min_40sec_284ms
havli Posted August 16, 2015

I'm sorry to bring this up again... but despite all efforts, Nvidia G200 still refuses to work with GPUPI 2.2 (legacy), although the error message is different this time. If you manage to fix this issue, I promise to bench all the G200 video cards I can find.

LOG START at 2015-08-16 01:03:07 ----------------------
Starting run to calculate 1000000 digits with 1 batches
Batch Size: 1M
Maximum Reduction Size: 64
Message box: Press OK to start the calculation. (Start)
Error while calculating series term!
Result digits: 000000000
Result time: 0.006000
Device statistics for NVIDIA GeForce GTX 285:
Calculated Batches: 1 of 4 (25.000000%)
Kernel time: 0.000000 seconds
Reduction time: 0.000000 seconds
Message box: Invalid result! (Error)
mr.paco Posted August 16, 2015

As the message states, these CPUs do not support double precision calculations with OpenCL. This has nothing to do with the version of GPUPI.

If you look here, ALL of them ran on s423 for the Turrican Memorial Comp...
_mat_ (Author) Posted August 16, 2015

I beg to differ. He ran that same CPU on GPUPI version 1.4 in the Turrican challenge. Something changed.
http://hwbot.org/submission/2749253_mr.paco_gpupi_for_cpu___1b_pentium_4_1.5ghz_willamette_s423_9h_38min_40sec_284ms

The code in GPUPI itself cannot determine the capability of the hardware; that is done by querying OpenCL. So this is a driver issue. Have you tried the same drivers you used last time?
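[Editor's note: to illustrate what "querying OpenCL" means here, the standard way for a host application to learn whether a device offers double precision is to look for the cl_khr_fp64 extension reported by the driver, which is why a different driver can change whether the same CPU is accepted. This is the generic query, not necessarily the exact check GPUPI performs.]

// Sketch of the standard cl_khr_fp64 capability query through the OpenCL
// driver. A device that does not report this extension would be skipped by
// any application requiring double precision.
#include <CL/cl.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, nullptr) != CL_SUCCESS) return 1;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, nullptr) != CL_SUCCESS)
        return 1;

    // Ask for the size of the extension string first, then fetch it.
    size_t extSize = 0;
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, nullptr, &extSize);
    std::vector<char> extensions(extSize, 0);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, extSize, extensions.data(), nullptr);

    if (std::strstr(extensions.data(), "cl_khr_fp64"))
        std::printf("Device reports double precision support.\n");
    else
        std::printf("Device is ignored: no cl_khr_fp64 reported by the driver.\n");
    return 0;
}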