Strong Island Posted June 21, 2016

Quote: "Have you tried a memory benchmark yet? Try AIDA64 to test your bandwidth, it should be impacted as well. Otherwise it's a driver issue, but I doubt it. We never had an efficiency problem with the memory reduction before. By the way, it's a very common technique for summing up a lot of memory in parallel. The pi calculation itself depends on much more to be efficient."

I will give it a shot. The only memory benchmark I have run on this same system is Geekbench, and when I compare my memory scores with others they seem perfectly normal. If my bandwidth were off, the memory part of that bench would probably be affected as well, right? I will run AIDA64. Also, isn't it strange that 1/64 and 4/512 are so far apart? 1/64 is very close to normal and 4/512 isn't.

Strong Island's Geekbench3 - Multi Core score: 14551 points with a Core i3 6100
_mat_ (Author) Posted June 21, 2016

I don't know what Geekbench really does when benching memory. If it's mostly a bandwidth test, it should be affected as well.

The gap between 4/512 and 1/64 says a lot. The better the batch size is matched to the architecture itself, the faster the bench will be. That's because the workload is aligned to the maximum worksize that can be run in parallel. 4M seems to be the best choice for the 6100 with 2 cores/4 threads.

About the same is true for the reduction size: the bigger the better, because 512 means that 512 sums are added into one until only a single sum remains. Let's say we want to sum up 8192 single numbers. With a reduction size of 512 that would be:

step 1: sum of 512 numbers, sum of 512 numbers, ... (16 times)
step 2: sum of 16 numbers = result

whereas a reduction size of 64 would produce:

step 1: sum of 64 numbers, sum of 64 numbers, sum of 64 numbers, ... (128 times)
step 2: sum of 64 numbers, sum of 64 numbers
step 3: sum of 2 numbers = result

If you consider that GPUPI produces billions of partial results that need to be added up, then 512 also needs far fewer steps overall to sum up the batches after they are calculated. Additionally, the bigger the batch size, the fewer reduction passes have to be made for the calculation. So these two values mean a lot for the whole computation.
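To put concrete numbers on the pass counts described above, here is a small stand-alone C++ sketch (illustrative only, not GPUPI source) that counts how many reduction passes are needed to collapse a given number of partial results at a given reduction size:

```cpp
#include <cstdint>
#include <cstdio>

// Counts how many reduction passes are needed to collapse `count` partial
// results into one sum when each pass merges up to `reductionSize` values.
static int reductionPasses(uint64_t count, uint64_t reductionSize) {
    int passes = 0;
    while (count > 1) {
        count = (count + reductionSize - 1) / reductionSize; // ceiling division
        ++passes;
    }
    return passes;
}

int main() {
    // The 8192-number example from the post above:
    std::printf("reduction size 512: %d passes\n", reductionPasses(8192, 512)); // prints 2
    std::printf("reduction size  64: %d passes\n", reductionPasses(8192, 64));  // prints 3
    return 0;
}
```

For 8192 values this yields 2 passes at a reduction size of 512 and 3 passes at 64, matching the steps listed in the post.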
Strong Island Posted June 21, 2016 (edited)

(quoting _mat_'s explanation above)

Yeah, it's just strange, because the top dual-core scores are 6100s and they are using 4/512. I'm sorry, I'm probably being annoying, but I'm just so confused. I even switched motherboards today and did a fresh install without disabling any services, and 1/64 and 4/512 are still about 7 seconds apart, with 1/64 being faster. The bigger my batch size, the higher my reduction time is. With GPUPI 2.2, am I supposed to have OpenCL drivers in the GPUPI folder, like the CUDA drivers are? Because I don't have those in my folder when I download GPUPI.

Edited June 21, 2016 by Strong Island
_mat_ (Author) Posted June 21, 2016

(quoting Strong Island's question about OpenCL drivers in the GPUPI folder)

No, that's not necessary. OpenCL works out of the box once it is installed on the system.
Strong Island Posted June 21, 2016

(quoting _mat_'s reply above)

OK, thanks. Damn, I thought I had found something. I never had trouble with GPUPI before; I got some really nice scores with a GTX 980 and a 5960X. Only the non-K CPUs are giving me trouble.
Guest PH Benchmarker Posted June 21, 2016

Quote: "What exactly do you mean by crashing? A runtime assertion? Please specify the exact error message and have a look at GPUPI.log as well. By the way, a mangled OpenCL driver is much more likely to produce such an error. Please uninstall any driver and reinstall again."

GPUPI log:

LOG START at 2016-06-21 20:57:42
----------------------
Timer frequency is only 3139160 Hz. HPET timer not found!
CUDA driver version is insufficient for CUDA runtime version
Timer frequency is only 3139160 Hz. HPET timer not found!
Starting run to calculate 1000000 digits with 1 batches
Batch Size: 1M
Maximum Reduction Size: 64
Message box: Press OK to start the calculation. (Start)
Timer frequency is only 3139160 Hz. HPET timer not found!

I'm running Windows 7 x64.
GtiJason Posted June 22, 2016 (edited)

(quoting PH Benchmarker's log above)

Try enabling HPET (High Precision Event Timer) in the BIOS, and read the FAQ section about HPET at the link below. You'll also need to run Command Prompt as administrator and enter:

bcdedit /set useplatformclock yes

https://www.overclockers.at/news/gpupi-international-support-thread

EDIT: Make sure you have the Visual Studio 2013 runtime installed; the download is named vcredist_x64.exe.

Edited June 22, 2016 by GtiJason
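For reference, a minimal command sequence from an elevated Command Prompt might look like this (a sketch, not taken from the FAQ; the setting only takes effect after a reboot, and the last command reverts it if you later want the default timer back):

```bat
:: enable the HPET-based platform clock (takes effect after the next reboot)
bcdedit /set useplatformclock yes

:: confirm the value was written to the current boot entry
bcdedit /enum {current} | findstr /i useplatformclock

:: revert to the default timer later, if needed
bcdedit /deletevalue useplatformclock
```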
Strong Island Posted June 22, 2016

It looks like I finally figured it out. Maybe I was installing the AMD SDK wrong. I used my HD 5450 and installed Catalyst 14.2 Beta 1.3, and in my first couple of tests 4/512 was faster than 1/64, so it seems normal again. Of course, right as I figured it out I had trouble with the cold bug on the MOCF and had to stop, but as long as the chip is OK, tonight should be awesome. Just wanted to say thanks for everyone's help, especially _mat_ and Jason.
Guest PH Benchmarker Posted June 22, 2016

(quoting GtiJason's advice above)

Roger that!! The Visual Studio runtime has been installed, and I'm going to try enabling HPET now.
Strong Island Posted June 23, 2016

Man, I don't think a score has ever felt so good. I had been trying for almost two months for this. New dual-core #1. Thank you so much, _mat_ and GtiJason, for all your help.
Massman Posted June 24, 2016

Congrats @Strong Island, epic score!
Strong Island Posted June 24, 2016

(quoting Massman's congratulations)

Thanks a lot. I've been having so much fun with this CPU, and to think it only cost me $100.
_mat_ (Author) Posted June 24, 2016

Awesome work and congrats!
Guest PH Benchmarker Posted June 27, 2016

(quoting Strong Island's post about the new dual-core #1 score)

Congratulations!!!!! As for my problem with GPUPI, I solved it by installing SP1 for Windows 7. Hugs.
_mat_ (Author) Posted June 27, 2016

Nice, and thanks for reporting back! I've added both solutions to the FAQ.
_mat_ (Author) Posted July 9, 2016

Just a quick heads-up! Tomorrow I will release GPUPI 2.3 with multiple bugfixes and features. I am very happy with the new support for CUDA 8.0, plus several optimizations of the CUDA kernels that finally lead to faster scores than the OpenCL implementation. Have a look at this score with a GTX 1080; it's top 10 on air cooling, and my sample doesn't really clock well:

So hold your horses for now if you are benching NVIDIA cards with GPUPI.
_mat_ (Author) Posted July 16, 2016

For the sake of full disclosure, I am posting this here as well. After many hours of bugfixing, version 2.3 is finally bulletproof. Please redownload the newest build before benching: GPUPI 2.3.4.

Additionally, the following features were added over the last four minor versions:

- Support for Tesla graphics cards
- Support for more than 8 devices - theoretically thousands of devices can now be used for the calculation!
- Detection of AMD's RX 480 graphics cards
- Important bugfixes for the Legacy version and GeForce 200 series cards
- Source code cleanup

Download: https://www.overclockers.at/news/gpupi-international-support-thread

Many, many thanks to dhenzjhen again; because of his support, GPUPI is now better and more flexible than ever! If you haven't seen his score with 10 Tesla M40s, you'd better do it now:

dhenzjhen's GPUPI - 1B score: 2sec 621ms with a GeForce GTX 1080 (it's currently filed under GTX 1080 because the M40s are not in the database)
havli Posted July 16, 2016

Good work, looking forward to testing the GTX 285 in the future. But there is a bug of some kind affecting the Legacy 2.3.4 executable under Windows XP. At the moment I'm trying to run the new GPUPI on a P4 Celeron with Windows XP SP3 32-bit, and the only thing I get is this error: "... is not a valid Win32 application." GPUPI 2.2 works fine on this PC. Perhaps wrong compile settings prevent GPUPI 2.3.4 from running on XP?
_mat_ (Author) Posted July 17, 2016

I have not tested the Legacy version yet. It was compiled with VS 2013 (instead of 2012), which introduces some major changes. I will have a look at it as soon as I have more time.

By the way: I don't know if I will continue to support the Legacy version. It's a lot of work and has very few downloads. This may be the last version.
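A likely explanation, though not confirmed in the thread as the actual fix: by default, VS 2013 builds use the v120 platform toolset, which targets Windows Vista and newer, and Windows XP refuses to load such executables with exactly this "not a valid Win32 application" message. The usual remedy is the XP-compatible toolset, roughly like this hypothetical .vcxproj excerpt:

```xml
<!-- Hypothetical project excerpt: v120_xp keeps VS 2013 builds loadable on Windows XP -->
<PropertyGroup Label="Configuration">
  <PlatformToolset>v120_xp</PlatformToolset>
</PropertyGroup>
```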
_mat_ (Author) Posted July 17, 2016 (edited)

I fixed it, but I couldn't make it run in a virtual machine because I can't install OpenCL there. Please redownload this version and try again: https://www.overclockers.at/news/gpupi-2-3

Edited July 17, 2016 by _mat_
havli Posted July 17, 2016

This build works! Thank you.
_mat_ (Author) Posted July 17, 2016

Awesome! Thanks for your feedback.
Oj0 Posted July 17, 2016

Does this build fix runs done with different cards? For example, with a 980 and a 970, or a 1080 and a 970, it would error out after the 4th iteration even at stock, although 2.2 was fine even with both cards heavily overclocked.
_mat_ (Author) Posted July 17, 2016

Yes, use 2.3.4; it fixes all of the sync and validation issues with multiple cards. By the way, 2.2 had no runtime validation, it only validated the final result. That's why it works.
Oj0 Posted July 17, 2016

(quoting _mat_'s reply above)

I know, but 2.2 would still pass the end validation.