skulstation Posted November 13, 2014 Posted November 13, 2014 Did you install the newest drivers? Are you sure they can handle OpenCL? If the device doesn't support double precision but can be detected on the system, it will get listed as ignored when starting the benchmark. I'm not sure about your cards, the GTS 250 seems to have double support, I think. Detection is mostly a driver issue and has not much to do with the benchmark itself. I'm going to retry it with the GTS 250, but this time with the 344.11 drivers instead of the 340.52. It's working on a 560 Ti 448 with the 344.11 drivers. Quote
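If anyone wants to see what their driver actually exposes before launching the benchmark, here is a minimal OpenCL host-side sketch (my own illustration, not GPUPI's detection code) that lists each device and whether it reports the cl_khr_fp64 extension, which is what double-precision support comes down to:

```cpp
// Minimal sketch: enumerate OpenCL devices and check for double-precision support.
// Link against the OpenCL library (e.g. -lOpenCL).
#include <CL/cl.h>
#include <cstdio>
#include <cstring>

int main() {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[16];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256] = {0};
            char extensions[8192] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, NULL);

            // A device without cl_khr_fp64 can be detected by the driver
            // but cannot run a double-precision Pi calculation.
            int has_fp64 = strstr(extensions, "cl_khr_fp64") != NULL;
            printf("%s: double precision %s\n", name, has_fp64 ? "supported" : "NOT supported");
        }
    }
    return 0;
}
```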
knopflerbruce Posted November 14, 2014 Posted November 14, 2014 If you don't mind... please do a 2nd setting which takes a bit longer. What about 32B? Sure, most people prefer short benchmarks, but something heavy is also worth considering. Quote
_mat_ Posted November 14, 2014 Author Posted November 14, 2014 If you don't mind... please do a 2nd setting which takes a bit longer. What about 32B? Sure, most people prefer short benchmarks, but something heavy is also worth considering. 10B already takes about 15 minutes on my GTX 980. 20B would be possible without adapting the algorithm for higher precision. But I have to test. I am currently working on a CUDA implementation. Just curious to see how good NVIDIA's OpenCL implementation really is. Quote
knopflerbruce Posted November 14, 2014 Posted November 14, 2014 What about 16B, then? 20B would be cool. It would be somewhat interesting in a few years - even wPrime 1024m is not what it used to be, because it's almost over before it really began. Quote
TaPaKaH Posted November 14, 2014 Posted November 14, 2014 +1 for really large problem instances. Computing power in GPUs grows a lot faster than "+5%" per year so if a test takes 20s now, in a couple of years it will be faster than 1M. Quote
Massman Posted November 15, 2014 Posted November 15, 2014 I concur, it would be nice to have a really demanding test. You know, like how 32M used to take an hour to run. Quote
Calathea Posted November 15, 2014 Posted November 15, 2014 I too would like a longer test. Just don't forget, guys, how powerful a 980 or 290 can be. I don't want to see 4h+ runs with something like a Radeon 270X. Quote
_mat_ Posted November 24, 2014 Author Posted November 24, 2014 I'm currently testing the CUDA implementation with 32B digits. Takes about 50 minutes on a GTX 980 with stock clocks. Long enough? I will release version 1.3 later today. Quote
_mat_ Posted November 24, 2014 Author Posted November 24, 2014 Guys, version 1.3 is here. The whole code base was refactored and now allows multiple APIs, which are loaded when the system supports them. The new version also includes standalone executables for OpenCL and CUDA. The main reason is that the OpenCL version will run on Windows XP, while the CUDA version won't. The CUDA implementation was pretty easy and also needs less code to work. Especially the part that sets up the application and prepares the calculation was a piece of cake compared to OpenCL. That said, I tried to be as fair as possible and implemented both APIs with each of their advantages, but they still rely on the same algorithms and the same basic optimizations. I've also adjusted the OpenCL code a little bit to get them closer together, so the new version's results might differ by a few milliseconds from those of 1.2. Please use the new version from now on. As requested I added two more digit counts to reach: 20B and 32B. Smaller graphics cards and CPUs will have to crunch those for days. Karl would have loved it! Have fun and let me know your results and what you think! Download: GPUPI Beta 1.3 Quote
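To give a feel for why the CUDA host setup is so much shorter than the OpenCL one, here is a hedged sketch with a dummy kernel (the kernel name and its contents are placeholders, not GPUPI's actual series code): picking a device, allocating memory and launching a kernel is all the boilerplate the CUDA runtime needs, whereas OpenCL additionally wants an explicit platform, context, command queue and a runtime-compiled program before the first launch.

```cpp
// Sketch only: a placeholder per-thread kernel, not GPUPI's real series term.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void series_term(double *out, long long offset) {
    long long i = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    out[i] = 1.0 / (double)(offset + i + 1);   // dummy double-precision work
}

int main() {
    // The CUDA runtime needs no explicit context, command queue or runtime
    // kernel compilation (clCreateContext, clCreateCommandQueue,
    // clCreateProgramWithSource, clBuildProgram on the OpenCL side).
    cudaSetDevice(0);

    const int n = 1 << 20;
    double *d_out = NULL;
    cudaMalloc((void **)&d_out, n * sizeof(double));

    series_term<<<n / 256, 256>>>(d_out, 0);   // exactly n threads
    cudaDeviceSynchronize();

    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_out);
    return 0;
}
```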
_mat_ Posted November 24, 2014 Author Posted November 24, 2014 I've just added "GPUPI - 32B" and "GPUPI for CPU - 1B" to the benchmarks. Let's hope you use them. Btw, I guess there should be a discussion about whether it's allowed to use CUDA for NVIDIA cards to compete in the rankings. Well, that's why I implemented CUDA and OpenCL so carefully and so close together. I think it would be fair, because any performance improvement is due to the vendor's implementation and optimization of the kernels, which are basically the same. Quote
tiborrr Posted December 2, 2014 Posted December 2, 2014 (edited) Great benchmark, love it! I would vote for 2B and 32B to be accepted as 'retail' benchmarks once the benchmark passes through the HWBOT validation. I would also lock down the benchmark window when the "Pi Calculation is Done!" splash screen comes up, just for the sake of nostalgia. Here's my workstation machine with a FirePro W9000: Wannabe W7000 (flashed 7870, slower DP): Edited December 2, 2014 by tiborrr Quote
_mat_ Posted December 2, 2014 Author Posted December 2, 2014 (edited) Thanks Nico, very much appreciated! I will change the message box, good idea. I really wanted to implement it as close to SuperPi as possible, but changed what I felt was outdated or not well done in the original benchmark. For example the window can be moved after the message box for a successful calculation is shown. The screenshot weirdos will thank me for this - yeah, I am one of those. It was also important for me to have an options file that remembers what was set last time. Regarding the default bench settings for the rankings, I'll let you guys decide. I will update the bench to show them as defaults too. Edited December 2, 2014 by _mat_ Quote
GENiEBEN Posted December 3, 2014 Posted December 3, 2014 1B and 32B, since the idea was to make it a Spi-Clone. You can add a poll to the thread if you want to. Quote
tiborrr Posted December 3, 2014 Posted December 3, 2014 I fully concur with 1B and 32B. This way we're cool for a couple of years (y) For the sake of nostalgia please: - use the original font - remove the cancel button on the calculation start notification Also, here's the scaling of my HD7870 (Pitcairn): Quote
Massman Posted December 4, 2014 Posted December 4, 2014 So cool to see this benchmark kick off so nicely. Let's fast-track it for points :D Quote
_mat_ Posted December 4, 2014 Author Posted December 4, 2014 It would be an honour! I promise to actively support the benchmark for the foreseeable future. Btw: Next stop is multi-GPU support. ASUS just supported me with a couple of GTX 980s for an overclocking show in Vienna today. I will use them wisely. Quote
_mat_ Posted December 6, 2014 Author Posted December 6, 2014 I fully concur with 1B and 32B. This way we're cool for a couple of years (y) For the sake of nostalgia please: - use the original font - remove the cancel button on the calculation start notification Also, here's the scaling of my HD7870 (Pitcairn): I was a bit busy with an overclocking show the last few days. If anybody wants to see some pictures, have a look here. First off, thanks for the nice scaling diagram. It clearly shows that GPUPI is not bandwidth limited in any way. The millions or even billions of values calculated in parallel always stay in graphics memory and never have to be copied to the host. That's because not only the pure calculation runs on the GPU; a two-stage memory reduction is done right afterwards using shared memory inside the workgroups. Only two doubles have to be transferred to the host afterwards to accumulate the final sum of each series (there are four of them). Regarding your suggestions: when writing the bench I initially wanted to use the original font. But it has a lot of kerning and I was not able to fit in all the information without resizing the window to strange, un-SuperPi-like proportions. As I don't like the original font that much anyway, I thought I'd better choose something more readable. Additionally I implemented the text as an editable WIN32 control, because I wanted the text to be selectable for copying information, something I also missed in SuperPI. That comes at the cost of control over the spacing of the text itself; SuperPI uses GDI draw calls, where you can specify pixel coordinates to place the text. I've also put some thought into the cancel button before the calculation, it's not a bug. I thought it was a good idea to be able to cancel. You know how it is benching with LN2: you open the bench, press Calculate and wait for the message box to start. Now the focus is back on the temperature, pouring LN2 to the maximum of the component. When ready, press OK. But sometimes things are not ready or you've forgotten something, and now you have the choice to go back, quit the application and sort it out. But guys, if you want it to be more nostalgic, I can try. I just wanted to let you know that I put some thought into it and didn't change things for nothing. Quote
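For anyone curious what such a two-stage reduction looks like, here is a hedged CUDA sketch of the general technique (assumed structure for illustration, not GPUPI's actual kernels): stage one sums the terms inside each block via shared memory, stage two reduces the per-block partials, so only the last few doubles ever cross the PCIe bus.

```cpp
// Generic block-wise sum reduction in shared memory (illustrative, not GPUPI's code).
#include <cuda_runtime.h>

__global__ void block_reduce(const double *in, double *partial, int n) {
    extern __shared__ double sdata[];   // size passed at launch: blockDim.x * sizeof(double)
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0;
    __syncthreads();

    // Tree reduction inside the block, entirely in shared memory
    // (assumes blockDim.x is a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];   // one partial sum per block
}

// Host side (stage two): launch block_reduce once over all series terms, then
// once more over the per-block partials, e.g. (assuming blocks <= threads):
//   block_reduce<<<blocks, threads, threads * sizeof(double)>>>(d_terms, d_partial, n);
//   block_reduce<<<1, threads, threads * sizeof(double)>>>(d_partial, d_result, blocks);
// Only the final handful of doubles is copied back with cudaMemcpy.
```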
OLDcomer Posted December 6, 2014 Posted December 6, 2014 It would be an honour! I promise to actively support the benchmark for the foreseeable future. Btw: Next stop is multi-GPU support. ASUS just supported me with a couple of GTX 980s for an overclocking show in Vienna today. I will use them wisely. Nice benchmark! Please add an option to select which GPU device will do the calculations. Then in multi-GPU systems we will be able to pick the best overclocker, just like we play with affinity in Task Manager to select the best overclocking core while benching SuperPi. Quote
Massman Posted December 7, 2014 Posted December 7, 2014 I'm not too sure 1B is a good choice. The current WR is already at 19 seconds, so it looks like we're going to hit the end of the benchmark very soon. Quote
_mat_ Posted December 9, 2014 Author Posted December 9, 2014 The first official release version is here! Just a few minor bugfixes, no changes to the GPU code or the results. I recommend using this version for benching though.
Changelog:
- Explicit device selection for SLI, Crossfire and systems with multiple sockets and CPUs, implemented for CUDA as well as OpenCL. Important: The sort order of the devices depends on the driver and therefore the vendor implementation. In my tests this is the same order in which GPU-Z sorts its devices. Some overclocking tools might order them in their own way.
- Bugfix: The previously selected CUDA graphics card was not correctly preselected in the settings dialog.
- The final message box after the calculation is now shown exactly as in SuperPI.
Download: GPUPI 1.4 (723 KB) Quote
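As a side note on how that device ordering comes about: the benchmark can only present the devices in the order the driver enumerates them. A minimal CUDA sketch of such enumeration and explicit selection (my illustration, not GPUPI's actual code) looks like this:

```cpp
// Illustrative sketch: list the CUDA devices the driver exposes and pick one by index.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // The index order here is decided by the driver, which is why it may
        // not match the numbering of every overclocking tool.
        printf("Device %d: %s (SM %d.%d)\n", i, prop.name, prop.major, prop.minor);
    }

    int selected = 0;            // index of the card you want to bench on
    cudaSetDevice(selected);     // subsequent CUDA calls in this thread target that GPU
    return 0;
}
```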
buildzoid Posted December 11, 2014 Posted December 11, 2014 How much does this benefit from FP64? Because if FP64 makes a big impact the WR will always belong to compute cards and that's just no fun for most people. Quote
GENiEBEN Posted December 11, 2014 Posted December 11, 2014 How much does this benefit from FP64? Because if FP64 makes a big impact the WR will always belong to compute cards and that's just no fun for most people. You mean just like many WR's belong to 2P/4P Xeons that cost more than a car? Get with it, we want to see the absolute record possible, not the down-to-earth-Joe-can-afford-it-too. EDIT: @mat Can you implement i18n support? Quote
_mat_ Posted December 11, 2014 Author Posted December 11, 2014 How much does this benefit from FP64? Because if FP64 makes a big impact the WR will always belong to compute cards and that's just no fun for most people. Good double precision helps, but choosing the right weapon - currently the R9 290 - and overclocking it to the maximum is key in this benchmark. So FirePro and Quadro won't have a chance to fetch the cups if they can't be overclocked. Can you implement i18n support? Some of the code is already using WIN32's WCHAR, but not all of it. Why, seen any problems with the English yet? Anyway, I have no intention of translating it. Quote
GENiEBEN Posted December 11, 2014 Posted December 11, 2014 ^ No, I was hoping to translate it, np. Btw, how exactly do I start it on the CPU? I ran it on a Mobile HD 4000 + i5 3320M and it says it can't run on the IGP (missing DP), but it should run without problems on the CPU. Quote