_mat_

April 5, 2018

I just recompiled GPUPI 3.3 with the old kernel code, and the difference is practically non-existent for CUDA 32B:

GPUPI 3.2: 6m 56.414s
GPUPI 3.3b: 6m 56.676s

A difference of roughly 260 ms is negligible in a 32B run.

April 5, 2018

To show you the current evolution of GPUPI 3.1.1 VS 3.2 VS 3.3 I have benched all versions on 2x GTX 1080 Ti @ 1987 MHz in 32B on CUDA.

GPUPI 3.1.1: 6m 59.194s (GPU frequency is shown too low, API information is not as detailed as in the other versions)
GPUPI 3.2: 6m 56.414s
GPUPI 3.3: 4m 37.242s (that's the speedup in question)
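To put numbers on that, here is a minimal sketch that converts the run times listed above to seconds and compares them. The helper names `to_seconds` and `speedup` and the `results` dictionary are only illustrative; nothing here is part of GPUPI.

```python
def to_seconds(text):
    """Parse a time written like '6m 56.414s' into seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + float(seconds.rstrip("s"))


def speedup(old, new):
    """How many times faster the 'new' run is compared to the 'old' run."""
    return to_seconds(old) / to_seconds(new)


# Run times quoted in the post (2x GTX 1080 Ti, 32B, CUDA).
results = {
    "3.1.1": "6m 59.194s",
    "3.2": "6m 56.414s",
    "3.3": "4m 37.242s",
}

for version, time_text in results.items():
    factor = speedup(results["3.2"], time_text)
    print(f"GPUPI {version}: {to_seconds(time_text):.3f} s, {factor:.3f}x vs 3.2")
```

With these figures, 3.3 comes out at roughly 1.5x the speed of 3.2, while 3.1.1 and 3.2 are within about 1% of each other.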

April 5, 2018

The OpenMP path won't happen until the end of the year or maybe even 2019, so we can discuss that at a later point (maybe split the CPU category into GPUPI 3 and GPUPI 4, the way GeekBench 3 and 4 are handled).

This discussion should be about GPUPI 3.1+3.2 VS the faster 3.3 (in all categories).

Btw, GPUPI 3.3 is currently not enabled for HWBOT submission although I have released a few scores for testing.

April 5, 2018

Please share your thoughts on speedups like this as well, any feedback is appreciated! I promise to keep updates like this to a minimum to avoid unnecessary rebenching.

April 5, 2018

Fair warning: GPUPI 3.3 is now officially available and will give a decent speedup on all calculations. I'd like to clarify the reasons behind this, because I know that means some rebenching might be necessary for the top ranks.

It's the first time since GPUPI 1.x that I changed code inside the calculation kernel. Yes, GPUPI had some speed increases before but only because new hardware was officially supported/optimized or new CUDA/OpenCL versions got new features that GPUPI could use for improvements to stay on the edge of what's out there.

I am currently implementing a native path for CPUs that will take advantage of OpenMP and AVX/AVX2. OpenMP needs no extra installation like OpenCL (it's compiled into the application) and will be far more efficient, highly optimizable and all in all faster than any OpenCL implementation for CPUs out there. Last but not least, OpenCL is treated badly by CPU vendors and rarely gets updates, optimizations or fast support for new hardware. With OpenMP I can decide all that for myself and optimize/support any CPU I can get my hands on. Plus GPUPI gets less complicated because no additional drivers will need to be installed. GPUs already get their GPGPU API with the graphics driver (with the exception of Intel's iGPUs) and CPUs won't need anything installed anymore. Just start GPUPI and you are good to go.

The bottom line is that I want to get rid of OpenCL for CPUs in the long run. To make that happen I needed to slim down the calculation part that handles 128 bit integers to improve vectorization (for AVX support), which ultimately led to the speedup. But why the hell did I decide to release the improved code already with GPUPI 3.3? Because:

It resolves a number of compatibility issues that I had since the first release of GPUPI. I had to manually tweak some kernel code for older devices and that's a very time consuming thing.
I always want to release the best version of GPUPI that my current abilities allow me to. It's kind of my personal way of overclocking (with code).
I had some feedback that GPUPI 3.1 is faster than 3.2 (which can be true for some GPU/CPU combinations) and so people are currently using 3.1, 3.1.1 or 3.2 for results. A fourth version of GPUPI (3.3) with similar speed wouldn't have made the situation easier, neither for maintaining nor for benching. I think it's no fun at all to have to try different versions of the same bench to get the best result. The speedup of 3.3 resolves that, because now there is only one obvious choice.
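To give an idea of what "handling 128 bit integers" involves: vector units and GPUs typically work on 32- or 64-bit lanes, so a 128-bit product has to be assembled from narrower pieces. The sketch below is not GPUPI's kernel code. It only illustrates the general limb technique in Python, with fixed-width registers emulated by masks, and the function name `mul64x64_to_128` is made up for this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64x64_to_128(a, b):
    """Multiply two unsigned 64-bit values and return the 128-bit
    product as (hi, lo) 64-bit halves, using only 32-bit limbs."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Four 32x32 -> 64 bit partial products.
    p0 = a_lo * b_lo   # contributes to bits 0..63
    p1 = a_lo * b_hi   # contributes to bits 32..95
    p2 = a_hi * b_lo   # contributes to bits 32..95
    p3 = a_hi * b_hi   # contributes to bits 64..127

    # Sum everything that lands in bits 32..63 and keep the carry.
    mid = (p0 >> 32) + (p1 & MASK32) + (p2 & MASK32)

    lo = (p0 & MASK32) | ((mid & MASK32) << 32)
    hi = (p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32)) & MASK64
    return hi, lo


hi, lo = mul64x64_to_128(MASK64, MASK64)
print(hex(hi), hex(lo))
```

Every extra partial product and carry in a routine like this is work that has to be repeated per lane, which is why trimming such code tends to pay off once it is vectorized.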

Without further ado, here is GPUPI 3.3: https://www.overclockers.at/news/gpupi-3-ist-final (it's in German for now but there are lots of images ;))

April 4, 2018

What's your system hardware?

April 4, 2018

Good idea, thanks for the input!

April 3, 2018

12 hours ago, GeorgeStorm said:

Ah so you need to use the legacy version to get cuda 8 to work? (seemingly drivers with cuda 9 cause my os to freeze/lag/crash when going below -60 so can't use them)

Yes, and I don't like it either. I don't know why NVIDIA decided to go that way, but I hope that there's a good reason for it.

I could compile a CUDA 8 version of GPUPI 3.x of course, but then I have to maintain 3 different versions in the future. I don't think that's a good solution at all as people are already confused with the Legacy Version.

April 3, 2018

11 hours ago, bolc said:

I make only offline save and submit later files

Then it's either a submission to the wrong category (like GPUPI 1B with a CPU score) or some problem with HWBOT and/or the network connection.

April 3, 2018

You have to install the latest GeForce drivers.

You can verify this by opening the debug log (Menu: Tools => Debug Log). It will contain an error stating that your driver is not ready for CUDA 9.x.

As an alternative you can use the Legacy Version of GPUPI. It's compatible with older GeForce driver versions, but will most certainly perform worse.

April 3, 2018

On 29.3.2018 at 2:43 PM, bolc said:

Next time I will save a snapshot funny thing is that it give the checksum error, but when you retry 1 or twice, it is then ok. but in the same oc conditions, the 3.1 will not give the error, so I tend to use 3.1 currently but I will

Thanks, that would be very helpful. Please open the Debug Log (Menu: Tools => Debug Log) as well when you are taking the screenshot. There should be a technical description coming from the HWBOT server that provides further information about the submission error.

Just to clarify: There is no difference in the HWBOT submission code between GPUPI 3.1 and 3.2. I have heard of the issue and it's most likely a bad network connection or a hiccup of the HWBOT servers while submitting. I don't want to rule out an error on my part though.

April 3, 2018

On 26.3.2018 at 7:37 PM, DR4G00N said:

Unfortunately I can't provide that due to GPUPI crashing before the validation file is created.

I have rewritten the data file saving for GPUPI 3.3 and the skipping of CPU or GPU detections is done in a safer way now. The download links will be up soon, so please retest any time.

One more side note: David (mllrkllr88) from overclock.net is currently holding a competition with GTX 260s and GPUPI is part of it. I am working closely together with him to fix any bugs the users encounter. The GTX 260 is one of the oldest and slowest cards to run GPUPI, and GPGPU computing was at a very early stage when these cards were released, so it is a bit of a challenge.

Here is the link to the competition: http://www.overclock.net/forum/410-benchmarking-competitions/1675577-freezer-burn-overclocking-competition.html

March 29, 2018

On 26.3.2018 at 1:31 PM, bolc said:

... on 3.2 my biggest issue is that the checksum for saving the file is given as bad, while no pb at all on 3.1.

on 3.1.1 i get some crashes on hwinfo loading.

Hi bolc,

thanks for your kind words, much appreciated!

Regarding your problems: Can you provide a screenshot for the bad checksum problem please? I need to know the exact error message and where it happens.

Crashes caused by HWiNFO are hard to avoid completely. There is so much hardware out there that something is almost bound to go wrong when you test a lot, especially with old components. But every new version ships with the latest HWiNFO library, which always comes with a lot of fixes and improvements, so please use the latest version of GPUPI if possible.

And if you come across a hardware detection error, please select the "Debug Mode" for detection and post it here together with some information about the hardware that seems to cause the crash. I will forward it to Martin from HWiNFO, so it will get a chance to be fixed.

March 29, 2018

On 26.3.2018 at 7:37 PM, DR4G00N said:

Unfortunately I can't provide that due to GPUPI crashing before the validation file is created.

OK, now I understand what you meant. I thought it happened when you used the validation functionality on your already saved data file.

Have you tried to skip the detection of your graphics card when saving the data file?

March 26, 2018

Thank you for your bug report. Can you send me the validation file to matthias [at] hwbot.org? I will have a look at it.

March 21, 2018

Version 3.1+ is now mandatory. The rules are not up to date. Congrats OGS! Crazy 1080 Ti!

February 26, 2018

Seems like the RX 470 has very limited shared memory. Try to lower the reduction size.

February 20, 2018

Nice work, thank you!

February 15, 2018

Btw - since you have nice HW detection implemented, would it be too much to ask for automatic datafile name suggesting, like x265 is doing for example?
It would make benchmarking for HW masters a little easier I'm using following pattern:

amount_of_HW_x_HW_name_benchmark_type_score

2x_Opteron_2216_HWBOT_x265-1080p_2.18 fps.hwbot

Thank you for your kind words and thanks for the testing. Good to hear that it works for P4s.

I like the idea of the unified result file naming and will put it in the next minor release!
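The suggested pattern could be assembled roughly like this. This is only a sketch of the naming scheme from the quote, not how GPUPI or x265 actually builds file names. The function `suggest_datafile_name`, its parameters and the default `hwbot` extension (taken from the quoted example) are assumptions for illustration.

```python
import re


def suggest_datafile_name(count, hw_name, benchmark, score, unit="", ext="hwbot"):
    """Build '<count>x_<hardware>_<benchmark>_<score>.<ext>' with
    whitespace inside the hardware and benchmark names turned into '_'."""

    def slug(text):
        return re.sub(r"\s+", "_", text.strip())

    # The score keeps its unit separated by a space, as in the quoted example.
    score_part = f"{score} {unit}".strip()
    return f"{count}x_{slug(hw_name)}_{slug(benchmark)}_{score_part}.{ext}"


print(suggest_datafile_name(2, "Opteron 2216", "HWBOT x265-1080p", "2.18", "fps"))
```

Called with the values from the quote, this reproduces the example file name `2x_Opteron_2216_HWBOT_x265-1080p_2.18 fps.hwbot`.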

I am also testing some old graphics cards currently and will add some improvements as well, if possible.

February 15, 2018

Use GPUPI 3.2 and try other HWiNFO settings in the Settings dialog before the run. Try "Safe Mode" first and disable it if necessary. If the error already occurs when you start GPUPI, please post a screenshot.

The Legacy version will be slower on new hardware as it can't use that many features due to OpenCL 1.1 compatibility. But on old hardware that shouldn't matter, just use the newest OS and drivers possible for the device.

February 13, 2018

..., but watching at the statistics segment it seems that some small hickup with Titan V #2 lead to the loss of the WR. Slinky's values are way more even in comparison. Anyway, congratulations!

Actually, cards #1 and #3 are the problem; card #2 was the most efficient. You have to look at the percentage of batches calculated: #2 did 32% of the whole calculation in about the same time in which #1 did only 21%.
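As a quick back-of-the-envelope check of those two percentages, assuming both cards really ran for the same amount of time (the function name `relative_throughput` is just for illustration):

```python
def relative_throughput(share_a_percent, share_b_percent):
    """Ratio of work done by card A vs. card B, assuming equal run time."""
    return share_a_percent / share_b_percent


# Shares of the batches quoted above: card #2 = 32%, card #1 = 21%.
ratio = relative_throughput(32, 21)
print(f"Card #2 did about {ratio:.2f}x the work of card #1")
```

So card #2 got through roughly one and a half times as many batches as card #1 in that run.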

On the software part, there was only a minor upgrade of the CUDA toolkit between 3.0 and 3.1, so there shouldn't be much difference. I think this is a good case for hardware efficiency through better stability.

Well done, H2o! Try GPUPI 3.2 as well, it should have a little less overhead, because it doesn't use a physical log file on a disk anymore. That could maybe shave off another millisecond.

February 13, 2018

I'll leave this here for anybody who wants to try something hot off the compiler: https://clockers.at/p3880354

Big announcement soon! :cool:

February 12, 2018

Thanks guys, I will look into it!

February 9, 2018

Crazy good! 1 Minute barrier finally broken, awesome!

February 8, 2018

The solution for the GTX 980 Ti and similar problems is easy: Just install the newest GeForce drivers. Old ones are not compatible with the new CUDA toolkit.

You will very soon have an extensive debug log on your side to aid you with internal problems of device detection. It will come with GPUPI 3.2.

As for old Radeon cards: That depends heavily on your driver situation. If you have an OpenCL 1.2 or higher driver installed on your system, you can use the normal GPUPI version. Otherwise it will give you an error like "cl... function not found in OpenCL.dll".

If you have old OpenCL 1.1 drivers, you will need the Legacy version.
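The rule of thumb above can be written down as a small check. This is a sketch, not code from GPUPI. It assumes you already have the platform version string reported by the driver, which the OpenCL specification defines as starting with `OpenCL <major>.<minor>`. The function name `pick_gpupi_build` and the vendor suffixes in the example strings are made up for illustration.

```python
import re


def pick_gpupi_build(platform_version):
    """Map an OpenCL platform version string to the GPUPI build to try.

    Returns 'normal' for OpenCL 1.2 and higher, 'legacy' for OpenCL 1.1,
    and None for anything older, which the post above does not cover.
    """
    match = re.match(r"OpenCL (\d+)\.(\d+)", platform_version)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {platform_version!r}")
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (1, 2):
        return "normal"
    if version == (1, 1):
        return "legacy"
    return None


# Example strings; the part after the version number is vendor specific.
print(pick_gpupi_build("OpenCL 1.1 ExampleVendor"))  # legacy
print(pick_gpupi_build("OpenCL 2.0 ExampleVendor"))  # normal
```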
