GPUPI - SuperPI on the GPU

_mat_ · April 3, 2018

11 hours ago, bolc said:

I only save offline and submit the files later.

Then it's either a submission to the wrong category (like GPUPI 1B with a CPU score) or some problem with HWBOT and/or the network connection.

_mat_ · April 3, 2018

12 hours ago, GeorgeStorm said:

Ah, so you need to use the Legacy Version to get CUDA 8 to work? (Drivers with CUDA 9 seem to make my OS freeze/lag/crash when going below -60, so I can't use them.)

Yes, and I don't like it either. I don't know why NVIDIA decided to go that way, but I hope there's a good reason for it.

Of course, I could compile a CUDA 8 version of GPUPI 3.x, but then I would have to maintain three different versions in the future. I don't think that's a good solution at all, as people are already confused by the Legacy Version.

GeorgeStorm · April 3, 2018

12 minutes ago, _mat_ said:

Yes, and I don't like it either. I don't know why NVIDIA decided to go that way, but I hope there's a good reason for it.

Of course, I could compile a CUDA 8 version of GPUPI 3.x, but then I would have to maintain three different versions in the future. I don't think that's a good solution at all, as people are already confused by the Legacy Version.

That's fair enough, but it might be worth adding a note to the benchmark download page/FAQ section? I had a quick look before posting here but didn't see anything about it.

_mat_ · April 4, 2018

Good idea, thanks for the input!

_mat_ · April 5, 2018

Fair warning: GPUPI 3.3 is now officially available and will give a decent speedup on all calculations. I'd like to clarify the reasons behind this, because I know that means some rebenching might be necessary for the top ranks.

This is the first time since GPUPI 1.x that I have changed code inside the calculation kernel. Yes, GPUPI has had some speed increases before, but only because new hardware was officially supported/optimized or new CUDA/OpenCL versions brought features that GPUPI could use to stay on the cutting edge.

I am currently implementing a native path for CPUs, which will take advantage of OpenMP and AVX/AVX2. Unlike OpenCL, OpenMP needs no extra installation (it's compiled into the application), and it will be far more efficient, highly optimizable and all in all faster than any OpenCL implementation for CPUs out there. Last but not least, OpenCL is treated badly by CPU vendors and rarely gets updates, optimizations or fast support for new hardware. With OpenMP I can decide all that for myself and optimize/support any CPU I can get my hands on. Plus GPUPI gets less complicated, because no additional drivers will need to be installed. GPUs already get their GPGPU API with the graphics driver (with the exception of Intel's iGPUs), and CPUs won't need anything installed anymore. Just start GPUPI and you are good to go.
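For readers who haven't used OpenMP, here is a minimal sketch of what a compiled-in CPU path can look like. It is not GPUPI's kernel, which isn't part of this thread: as a stand-in, it sums the first terms of the Bailey-Borwein-Plouffe (BBP) series for pi in double precision, and the function name `bbp_pi_partial` is hypothetical. The single `#pragma omp parallel for reduction(+:sum)` line is all it takes to spread the loop across threads; a compiler without OpenMP enabled simply ignores the pragma and runs the loop serially with the same result.

```c
#include <stdint.h>

/* Hypothetical example, not GPUPI code: partial sum of the BBP series
 *   pi = sum_k 16^-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6))
 * in double precision. n is capped at 16 so that 16^k = 2^(4k) still
 * fits into a 64-bit shift. */
double bbp_pi_partial(int n)
{
    double sum = 0.0;
    if (n > 16)
        n = 16;

    /* Every iteration is independent, so OpenMP can hand different k to
     * different threads; reduction(+:sum) gives each thread a private
     * partial sum and adds them together when the loop ends. */
#pragma omp parallel for reduction(+:sum)
    for (int k = 0; k < n; k++) {
        double scale = 1.0 / (double)(UINT64_C(1) << (4 * k)); /* 16^-k */
        double k8 = 8.0 * k;
        sum += scale * (4.0 / (k8 + 1.0) - 2.0 / (k8 + 4.0)
                        - 1.0 / (k8 + 5.0) - 1.0 / (k8 + 6.0));
    }
    return sum;
}
```

Built with, for example, `gcc -O2 -fopenmp -mavx2`, the loop runs on all cores; whether the arithmetic inside each iteration is also vectorized with AVX2 is left to the compiler in this sketch.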

The bottom line is that I want to get rid of OpenCL for CPUs in the long run. To make that happen, I needed to slim down the calculation part that handles 128-bit integers to improve vectorization (for AVX support), which ultimately led to the speedup. But why the hell did I decide to release the improved code already with GPUPI 3.3? Because:
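The thread doesn't show what the slimmed-down 128-bit code looks like, so the following is only an illustration of the general idea, assuming a portable implementation: 128-bit values held as two 64-bit halves, with addition and a 64×64→128 multiply written as short, branch-free sequences of plain 64-bit operations, the kind of straight-line code compilers tend to vectorize more readily. The `u128` type and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned integer as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* 128-bit addition; the carry out of the low half is derived from an
 * unsigned comparison instead of a branch. */
u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* Full 64x64 -> 128-bit product built from four 32x32 -> 64-bit partial
 * products, so it only needs ordinary 64-bit multiplies. */
u128 mul_64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

Compilers such as GCC and Clang also offer `unsigned __int128`, but spelling the operations out like this keeps every step visible and identical across compilers and backends.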

It resolves a number of compatibility issues that I have had since the first release of GPUPI. I had to manually tweak some kernel code for older devices, and that's very time-consuming.
I always want to release the best version of GPUPI that my current abilities allow. It's kind of my personal way of overclocking (with code).
I had some feedback that GPUPI 3.1 is faster than 3.2 (which can be true for some GPU/CPU combinations), so people are currently using 3.1, 3.1.1 or 3.2 for results. A fourth version of GPUPI (3.3) with similar speed wouldn't have made the situation any easier, either for maintaining or for benching. I think it's no fun at all having to try different versions of the same bench to get the best result. The speedup of 3.3 resolves that, because now there is only one obvious choice.

Without further ado, here is GPUPI 3.3: https://www.overclockers.at/news/gpupi-3-ist-final (it's in German for now but there are lots of images ;))

Edited April 5, 2018 by _mat_

_mat_ · April 5, 2018

Please share your thoughts on speedups like this as well; any feedback is appreciated! I promise to keep updates like this to a minimum to avoid unnecessary rebenching.

GeorgeStorm · April 5, 2018

It's great that it makes things easier for you etc., but it does ruin any work people have already put into benching it.

bolc · April 5, 2018

+1 with George

Why not make a separate GPUPI 3.3 (OpenCL-less) scoring?

So one category up to GPUPI 3.2 and one as of 3.3?

bolc · April 5, 2018

I am for opening a new category:

keeping the current "GPUPI OpenCL" bench

and opening a "GPUPI OpenMP" bench

so nobody is hurt, and more software to bench is always nice to have.

Edited April 5, 2018 by bolc

_mat_ · April 5, 2018

The OpenMP path won't happen until the end of the year or maybe even 2019, so we can discuss that at a later point (maybe split the CPU category into GPUPI 3 and GPUPI 4, the way GeekBench 3 and 4 are handled).

This discussion should be about GPUPI 3.1+3.2 VS the faster 3.3 (in all categories).

Btw, GPUPI 3.3 is currently not enabled for HWBOT submission although I have released a few scores for testing.

_mat_ · April 5, 2018

To show you the current evolution of GPUPI 3.1.1 VS 3.2 VS 3.3, I have benched all versions on 2x GTX 1080 Ti @ 1987 MHz in 32B on CUDA.

GPUPI 3.1.1: 6m 59.194s (GPU frequency is shown too low, API information is not as detailed as in the other versions)
GPUPI 3.2: 6m 56.414s
GPUPI 3.3: 4m 37.242s (that's the speedup in question)
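Converting the three times above to seconds shows how large the jump is: on this setup, 3.3 finishes the same 32B run about 1.5× faster than 3.2, i.e. in roughly a third less time.

```latex
\begin{aligned}
t_{3.1.1} &= 6\,\text{min}\;59.194\,\text{s} = 419.194\,\text{s}\\
t_{3.2}   &= 6\,\text{min}\;56.414\,\text{s} = 416.414\,\text{s}\\
t_{3.3}   &= 4\,\text{min}\;37.242\,\text{s} = 277.242\,\text{s}\\
\frac{t_{3.2}}{t_{3.3}} &= \frac{416.414}{277.242} \approx 1.502\\
1 - \frac{t_{3.3}}{t_{3.2}} &= \frac{139.172}{416.414} \approx 0.334
\end{aligned}
```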

Edited April 5, 2018 by _mat_

GeorgeStorm · April 5, 2018

That would basically nullify any results made previously; the difference is so big, no?

_mat_ · April 5, 2018

I just recompiled GPUPI 3.3 with the old kernel code, and the difference is practically non-existent for CUDA 32B:

GPUPI 3.2: 6m 56.414s
GPUPI 3.3b: 6m 56.676s

A ~260 ms difference is next to nothing in a 32B run.
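Worked out in seconds from the two times above, the build of 3.3 with the old kernel code is slightly slower than 3.2, not faster, and the gap is a small fraction of the run:

```latex
t_{3.3b} - t_{3.2} = 416.676\,\text{s} - 416.414\,\text{s} = 0.262\,\text{s},
\qquad \frac{0.262}{416.414} \approx 0.06\,\%
```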

basco · April 5, 2018

Thanks, _mat_, for your work on your tool!

If you need a translation, please say so.

bigblock990 · April 5, 2018

I feel that guys shouldn't complain about the speedup. Just dust off your hardware and rebench if needed. This is much better than if the new version is slower, so you can't beat old records.

yosarianilives · April 5, 2018

This is no different from a new version of XTU, and for some reason those are allowed every year. Perhaps _mat_ should throw some money at HWBOT to make this complaining go away? Seems to work for Intel...

bolc · April 5, 2018

vcomp140.dll missing

C++ 2015 required?

OK, so with C++ 2015 installed: APPCRASH when loading, on Windows 7 64-bit Professional.

Edited April 5, 2018 by bolc

bolc · April 5, 2018

The Legacy edition starts but crashes when confirming the run.
EP45-UD3P / Q9550

_mat_ · April 5, 2018

6 minutes ago, bolc said:

vcomp140.dll missing

C++ 2015 required?

OK, so with C++ 2015 installed: APPCRASH when loading, on Windows 7 64-bit Professional.

The Visual Studio C++ 2015 Redistributable has been required by the normal version of GPUPI since 2.x. The download is here: https://www.microsoft.com/de-at/download/details.aspx?id=48145 (use the 64-bit version).

Btw, the Legacy Version needs the Visual Studio C++ 2013 Redistributable, because CUDA 6.5 cannot be compiled with newer versions of Visual Studio.

Just now, bolc said:

The Legacy edition starts but crashes when confirming the run.
EP45-UD3P / Q9550

Have you tried different settings for the hardware detection? You can try "Safe Mode" first and select "Off" if that doesn't help. I reckon that this is just a compatibility issue with HWiNFO.

GeorgeStorm · April 5, 2018

31 minutes ago, bigblock990 said:

I feel that guys shouldn't complain about the speedup. Just dust off your hardware and rebench if needed. This is much better than if the new version is slower, so you can't beat old records.

It doesn't really affect me, as I still have most of the hardware I've run GPUPI on, but it never feels great beating others just because the software has got better. That's nothing to do with you, but it also nullifies their hard work, no?

In the end I'll go along with whatever is decided, I'm just pointing out that this sounds like a problem to me. I was also unaware that you got crazy jumps with XTU; I thought it was purely that new generations do much better, not that the same hardware does much better.

Also, @_mat_, I'm confused: which version is being released, the one that's significantly faster or the one that's only 200ms faster?

Edited April 5, 2018 by GeorgeStorm

bolc · April 5, 2018

I think I could run 3.1 without C++ 2015.
On the Legacy Version I tried Safe Mode; I'll set it to Off next.

On the normal version: APPCRASH on opening.
Thanks

Edited April 5, 2018 by bolc

yosarianilives · April 5, 2018

6 minutes ago, GeorgeStorm said:

It doesn't really affect me, as I still have most of the hardware I've run GPUPI on, but it never feels great beating others just because the software has got better. That's nothing to do with you, but it also nullifies their hard work, no?

In the end I'll go along with whatever is decided, I'm just pointing out that this sounds like a problem to me. I was also unaware that you got crazy jumps with XTU; I thought it was purely that new generations do much better, not that the same hardware does much better.

Also, @_mat_, I'm confused: which version is being released, the one that's significantly faster or the one that's only 200ms faster?

New XTU versions support new instructions that favor new hardware and also score worse on old hardware. This is the same thing: the new version of the bench supports new instructions, so it will be faster if you can use them. As for the difference in speed on GPUs, yeah, that seems questionable.

bolc · April 5, 2018

5 minutes ago, yosarianilives said:

New XTU versions support new instructions that favor new hardware and also score worse on old hardware. This is the same thing: the new version of the bench supports new instructions, so it will be faster if you can use them. As for the difference in speed on GPUs, yeah, that seems questionable.

A new XTU version affects global points but not the hardware score, doesn't it?

_mat_ · April 5, 2018

8 minutes ago, GeorgeStorm said:

It doesn't really affect me, as I still have most of the hardware I've run GPUPI on, but it never feels great beating others just because the software has got better. That's nothing to do with you, but it also nullifies their hard work, no?

The way I see it as an overclocker, you beat others by being active and putting more effort into it. I remember Turrican redoing each of his affected GPU scores with every new CPU generation out there. There is a multitude of other factors that require rebenching as well:

Driver updates
Tweaks uncovered
New OS version
Bugfixes and cheat protections for benchmarks

I guess it's safe to say that rebenching is pretty normal if you want to be in the top ranks, especially in the race for Hardware Masters.

17 minutes ago, GeorgeStorm said:

Also, @_mat_, I'm confused: which version is being released, the one that's significantly faster or the one that's only 200ms faster?

That's what I am trying to find out together with all of you. I don't want to overrule things like that just because I can. I want to make a good decision that keeps overclockers happy and motivated to bench GPUPI. That's my only goal with the benchmark!

_mat_ · April 5, 2018

21 minutes ago, bolc said:

I think I could run 3.1 without C++ 2015.
On the Legacy Version I tried Safe Mode; I'll set it to Off next.

That's not possible; C++ 2015 has been required since 2.x. It might just have been installed by another application's installer.

22 minutes ago, bolc said:

On the normal version: APPCRASH on opening.
Thanks

That's definitely not enough information to get anything fixed. Are you opening the normal version with your Q9550? That can result in an OpenCL error if you don't have an OpenCL 2.x runtime installed: GPUPI would complain about a missing "clCreateCommandQueueWithProperties" function. Old hardware should only use the Legacy Version; it supports OpenCL 1.1.
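`clCreateCommandQueueWithProperties` was introduced with OpenCL 2.0, while OpenCL 1.x runtimes only provide `clCreateCommandQueue`. Software that wants to pick the right call per platform typically checks the reported version first. Here is a minimal sketch of that check, assuming the version-string format from the OpenCL specification ("OpenCL <major>.<minor> <vendor-specific information>", as returned for CL_PLATFORM_VERSION and CL_DEVICE_VERSION); `opencl_version_at_least` is a hypothetical helper, not part of GPUPI or the OpenCL API.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GPUPI or the OpenCL API.
 * Parses an OpenCL version string of the form
 *   "OpenCL <major>.<minor> <vendor-specific information>"
 * Returns 1 if the version is at least want_major.want_minor,
 * 0 if it is lower, and -1 if the string doesn't match the format. */
int opencl_version_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (sscanf(version, "OpenCL %d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

A caller would only use `clCreateCommandQueueWithProperties` when this returns 1 for version 2.0 and fall back to `clCreateCommandQueue` otherwise. This only decides which call to make; it doesn't help if the program is linked against a function the installed OpenCL runtime doesn't export at all.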
