HWBOT Community Forums


Posts posted by _mat_

  1. Very weird error. It points to either faulty memory or something I fucked up in the code. ;)

     

     As the error seems to occur after the last loop (it's just not drawn yet), it has to do with either the accumulation of the partial results or the validation of the final sum. Both depend only on system memory, which is why the error happens on both GPU and CPU.
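
     To illustrate, here is a minimal sketch of such a final accumulation and validation step (made-up values, not GPUPI's actual code):

```cpp
#include <cstdio>
#include <vector>

// Minimal sketch of a final accumulation and validation step of this
// kind. Each loop's partial result is copied back to system memory, and
// both the accumulation and the validation run on the host. A flipped
// bit in RAM therefore corrupts the final sum no matter whether the
// partials were computed on the GPU or the CPU.
int main() {
    std::vector<double> partials = {0.125, 0.25, 0.0625}; // made-up values
    double sum = 0.0;
    for (double p : partials)
        sum += p;                      // accumulation in system memory

    const double expected = 0.4375;    // known-good reference value
    printf("sum = %.6f -> %s\n", sum,
           sum == expected ? "valid" : "VALIDATION FAILED");
    return 0;
}
```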

     

     The error is not reproducible on my side and has never been reported by anyone else, so I suspect your memory is faulty. Run Memtest86 and Prime95 with a focus on memory for an hour or so and let me know if your system is still stable.

  2. This answer from the FAQ section might help you:

     

    GPUPI says that MSVCP110.dll/MSVCP120.dll is missing?

     

     The benchmark executable is compiled with Visual Studio 2013 and therefore needs the Visual C++ Redistributable Packages for Visual Studio 2013. Download vcredist_x86.exe to run GPUPI.exe (32 bit) or vcredist_x64.exe to run GPUPI_x64.exe (64 bit).

     

    The legacy version of the benchmark needed to be built with Visual Studio 2012 to allow CUDA support on Windows XP. So you will have to install the Visual C++ Redistributable Packages for Visual Studio 2012 instead.

  3. havli, we will look into this. Thanks for your feedback.

     

     Strunkenbold, this is a very old submission made with GPUPI 1.3, submitted manually rather than via the HWBOT integration. The user made the mistake of entering the wrong calculation time. Please report it to a moderator; it should get fixed or removed.

  4. The validation file only prevents photoshopped screenshots and validates the result itself. It does not show irregularities in the loops, if you know what I mean. Only GPUPI 2.2 ensures that 100%. Well, let's see if the screen gets redrawn, but I won't bet on it. ;)

  5. All I want to know is why CPUs that ran on version 1.4 will no longer run on version 2.2.

    Are we sacrificing some older stuff to bench newer stuff for compatibility reasons? That's all I'm asking.

    You stated before that CPU support shouldn't have changed version to version but clearly it has.

    Sorry if you think I'm wasting your time. Just figured you wanted to know. I won't bother you any more.

    BTW, I know a little more than you credit me for... not that it mattered anyway.

    That's completely different from saying that the bench is broken, and I am happy to answer it.

     

     As I said before, CPU support has not changed at all. The hardware requirements are OpenCL support (which depends on SSE availability) and double precision support. So if a CPU is not listed as an available device, the installed drivers do not recognise it as a device that can run OpenCL code. If it is listed but ignored because of missing double precision support, it's because the drivers report that double precision is not available. In conclusion, and as answered a few posts above, the limitations depend heavily on the installed drivers, not on the version of GPUPI.
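
     To make the driver dependency concrete, here is a minimal sketch using the standard OpenCL host API to list devices and query double precision support (illustrative, not GPUPI's actual detection code):

```cpp
#include <CL/cl.h>
#include <cstdio>

// Enumerate OpenCL devices and check double precision support - roughly
// what any OpenCL benchmark's device listing has to rely on.
int main() {
    cl_platform_id platforms[8];
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(8, platforms, &numPlatforms);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_device_id devices[16];
        cl_uint numDevices = 0;
        // If the driver does not report the device here, the benchmark
        // cannot list it at all.
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16,
                           devices, &numDevices) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(name), name, nullptr);

            // CL_DEVICE_DOUBLE_FP_CONFIG is 0 if the driver reports no
            // double precision support; such devices must be ignored.
            cl_device_fp_config fp = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_DOUBLE_FP_CONFIG,
                            sizeof(fp), &fp, nullptr);
            printf("%s: double precision %s\n", name,
                   fp != 0 ? "available" : "NOT available");
        }
    }
    return 0;
}
```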

     

     I would recommend installing the AMD OpenCL drivers even for Intel CPUs, especially on older hardware, because they support CPUs from SSE2 onwards. The Intel drivers only offer support from SSE4.1, which excludes all Pentium 4 models. That may be the problem mr. paco posted here.

     

     Regarding GeForce 200 series cards, there have also been no deliberate changes to sacrifice compatibility. But there have been changes to the kernel calls to allow multiple GPUs, and since then many smaller kernel improvements (mostly performance enhancements) have been implemented. That's why the high precision kernel of GPUPI 2.2 exceeds the register and shared memory limits of the GeForce GTX 285. As soon as I get my hands on one of those cards, I will optimize the legacy version to slim the code down, maybe just for these cards.
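
     To make the limitation concrete, here is a sketch using the CUDA runtime API to compare a kernel's register needs against a card's per-block limits (the per-thread register count is an assumption for illustration, not GPUPI's real kernel requirement):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Query the per-block register and shared memory limits and compare
// them against what a kernel would need to launch.
int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("%s: %d registers/block, %zu bytes shared memory/block\n",
           prop.name, prop.regsPerBlock, prop.sharedMemPerBlock);

    // A GeForce GTX 285 (compute capability 1.3) offers 16384 registers
    // and 16 KB of shared memory per block. A kernel compiled to use,
    // say, 40 registers per thread with 512 threads per block would need
    // 20480 registers and simply fail to launch on such a card.
    const int regsPerThread = 40;     // assumed value for illustration
    const int threadsPerBlock = 512;
    if (regsPerThread * threadsPerBlock > prop.regsPerBlock)
        printf("kernel would exceed the register file, launch fails\n");
    return 0;
}
```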

     

     So the bench is far from broken, and it would be sad if it were after all the time I have invested. These are just driver bugs or limitations of old hardware that I would never have anticipated to work at all.

     

    Well, it works pretty well for me. I discovered (by my mistake) only one serious bug. The cosmetic bug is that when you click into the window during the bench, you only get the last line of the bench output on the screen now... but that's it.
    Can anybody reproduce this? I tried all my test systems and had no luck at all. If I had to guess, I would say that the output buffer is being deleted because of memory problems. The clicking just redraws the window, that's all. Let's see if the run turns out valid... keep me posted please. :)
  6. And as I said earlier, Mr Scott, you have little to no knowledge of this topic. Parallel processing is a relatively new technology, especially for these old cards. APIs, drivers and the GPUs themselves have bugs and limitations, which are constantly being improved with every generation. I am actually quite impressed that so much old hardware could be benched successfully.

     

     So, Mr Scott, stop with your over-simplifying one-liner bunnyextraction. It's not appreciated here.

  7. I researched some more, and the problem seems to be the register limitations of these old graphics cards. The calculation code of the high precision loops is too complex to be processed successfully. To fix this I would have to get myself a GTX 200 series card and restructure the code to use fewer registers. I might do that, but it will take a while.

     

    Btw, OpenCL on GTX 200 series should not be a problem. CatEye (Turrican's sister) benched a GTX 295 on Windows 7 64 bit with GeForce 340.52 drivers. Have you tried that?

     

    http://hwbot.org/submission/2776702_cateye_gpupi___1b_geforce_gtx_295_11min_46sec_73ms

     

    Please also be sure to only have cudart32_65.dll and cudart64_65.dll next to the GPUPI executable. If you still have cudart32_60.dll etc in there, delete these files.

  8. I'm sorry to bring this up again... but despite all efforts the Nvidia G200 still refuses to work with GPUPI 2.2 (legacy), although the error message is different this time.

     

    If you manage to fix this issue, I promise to bench all G200 videocards I can find. :D

    We certainly will get to the bottom of this. :)

     

    Have you tried more recent drivers? The drivers you are using only support CUDA 6.0.5, but the legacy version is now bundled with CUDA 6.5. Please use the newest drivers possible. The 340.52 should work for your combination: http://www.geforce.com/drivers/results/77225
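
     As a side note, such a mismatch can be checked programmatically; this sketch uses the standard CUDA runtime API (illustrative, not part of GPUPI):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative check of the mismatch described above: the bundled runtime
// (cudart*_65.dll = CUDA 6.5) needs a driver that supports at least that
// CUDA version. Versions are encoded as 1000*major + 10*minor.
int main() {
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);   // highest CUDA version the driver supports
    cudaRuntimeGetVersion(&runtimeVer); // CUDA version of the bundled cudart
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVer / 1000, (driverVer % 100) / 10,
           runtimeVer / 1000, (runtimeVer % 100) / 10);
    if (driverVer < runtimeVer)
        printf("driver is too old for this runtime -> update the driver\n");
    return 0;
}
```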

     

    Btw, try to use OpenCL as well. Maybe it works better.

  9. Batch 6 finally finished - 48h? What is going on?

     

    At this rate I will be lucky if this run finishes before the competition is over... But why such a brutal slow-down? I did not touch anything...

    As I said, depending on the batch size the slow and fast loops blend into each other. It seems like loop 6 uses a lot of 128 bit integer calculations, while loop 5 was still part of the 64 bit integer kernels.
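
     If you're wondering why the 128 bit loops cost so much more, here is a generic textbook sketch (not GPUPI's actual kernel code) of what a 64x64 -> 128 bit multiply looks like without a native 128 bit type:

```cpp
#include <cstdint>
#include <cstdio>

// A 64x64 -> 128 bit multiply pieced together from 64 bit operations:
// four multiplies plus carry propagation replace the single multiply of
// the 64 bit loops, which is a large part of the slowdown.
struct uint128 { uint64_t hi, lo; };

uint128 mul64x64(uint64_t a, uint64_t b) {
    uint64_t aLo = a & 0xFFFFFFFFull, aHi = a >> 32;
    uint64_t bLo = b & 0xFFFFFFFFull, bHi = b >> 32;

    uint64_t ll = aLo * bLo;
    uint64_t lh = aLo * bHi;
    uint64_t hl = aHi * bLo;
    uint64_t hh = aHi * bHi;

    // Sum the cross terms with carry propagation.
    uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFull) + (hl & 0xFFFFFFFFull);
    uint128 r;
    r.lo = (mid << 32) | (ll & 0xFFFFFFFFull);
    r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return r;
}

int main() {
    // (2^64 - 1) * 2 = 0x1FFFFFFFFFFFFFFFE
    uint128 r = mul64x64(0xFFFFFFFFFFFFFFFFull, 2);
    printf("hi=%llx lo=%llx\n",
           (unsigned long long)r.hi, (unsigned long long)r.lo);
    return 0;
}
```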

     

    I have a problem with this benchmark: I always get a crash at 4/512. I have tried fresh installs of Windows 7 and Windows 8 and different versions of GPUPI, still the same, but it runs normally with another 4/512. Does anyone know how to fix this?
    Don't use a reduction size of 512; it seems the drivers can't handle that reduction depth. Use 256 instead, it should work (otherwise the system is not stable).
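
     For the technically curious, here is a sketch of where such a limit could come from, under the assumption that the reduction size maps to the OpenCL work-group size (an assumption on my part, not confirmed GPUPI internals):

```cpp
#include <CL/cl.h>
#include <cstdio>

// Expects an already selected device and a compiled kernel.
void checkReductionLimits(cl_device_id device, cl_kernel kernel) {
    size_t deviceMax = 0, kernelMax = 0;
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(deviceMax), &deviceMax, nullptr);
    // The compiled kernel can have a lower limit than the device itself,
    // e.g. because of its register or local memory usage.
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernelMax), &kernelMax, nullptr);
    printf("device allows %zu work items per group, this kernel allows %zu\n",
           deviceMax, kernelMax);
    // If either value is below 512, enqueueing with a local size of 512
    // fails with CL_INVALID_WORK_GROUP_SIZE - a reduction size of 256 is
    // then the safe choice.
}
```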

     

    Was support for some older hardware disabled in the latest revision for the sake of running newer stuff?

    This is a s423.

    Any ideas why it says it's being ignored?

    As the message states, these CPUs do not support double precision calculations with OpenCL. This has nothing to do with the version of GPUPI.

     

    And I clicked a few times into the working window, and on the next batch I get this:

     

    ...hope I did not screw up anything. I will stop clicking for sure :(

    You somehow deleted the output text buffer (which should not be possible at all). If the whole output text buffer does not reappear on the next loop, I would restart. Otherwise the moderators could reject the result. :(
  10. As a question about that bold part: would it be possible to add a setting for how many batches are displayed during a run, if it doesn't affect the calculation anyway?

    I guess it doesn't matter that much on new hardware, but it would be sweet for those older CPUs where runs take an hour or more.

     

    EDIT:

    And I think there is some rounding error, at least in the smaller tests (running 32M to test how different settings affect the speed), because all batches end with a time like XX.999s.

    It's possible for sure, but I want to keep a certain standard for the output, so results are easy to moderate. Custom loop sizes won't help with that and might get banned from HWBOT for that reason.

     

     This sounds more like poor resolution of your OS timer. Try using the HPET timer (see the FAQ on the download page of GPUPI); you will get more precise results.
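
     For reference, here is a minimal sketch of high resolution timing on Windows with QueryPerformanceCounter, which is the kind of timer the HPET setting affects (illustrative code, not GPUPI's):

```cpp
#include <windows.h>
#include <cstdio>

// With HPET forced (e.g. via "bcdedit /set useplatformclock true"), the
// OS backs QueryPerformanceCounter with the HPET instead of a coarser
// source. Batch times always ending in XX.999s point to a coarse timer
// granularity rather than a rounding bug in the benchmark itself.
int main() {
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   // ticks per second of the backing timer
    QueryPerformanceCounter(&t0);
    Sleep(100);                         // stand-in for one benchmark batch
    QueryPerformanceCounter(&t1);
    double seconds = double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    printf("elapsed: %.6f s (timer resolution: %lld ticks/s)\n",
           seconds, (long long)freq.QuadPart);
    return 0;
}
```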

     

    Btw, testing batch sizes on 32M is not a good strategy, because only the fast loops will be measured. You should take the high precision loops of 1B into account as well.

  11. That's normal behaviour. There are two different kinds of loops for 1B results. The loops that calculate the partial results up to 500M use less precision and are therefore faster. The loops between 500M and 1B have to use 128 bit integer algorithms, which are much slower. This behaviour is repeated 4 times, so it looks like this:

     

    fast loops

    slow loops

    fast loops

    slow loops

    fast loops

    slow loops

    fast loops

    slow loops

    => result

     

     How many of the displayed loops are fast or slow, and how fast or slow they actually are, is determined by the batch size. But don't worry: the batch size only slows down or speeds up the whole calculation, depending on how well the hardware can process the work load (too small and not all cores can be used at once, too big and the hardware is overwhelmed). It will influence the loops, but fiddling with the batch size won't introduce more slow work. As I tried to explain, the whole calculation time will just be split differently between the displayed loops. That's because the loops that are shown are just a visual thing and do not map directly onto these two different parts of the calculation.
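
     A toy model might make this clearer (all numbers are made up; this is not GPUPI's actual schedule):

```cpp
#include <cstdio>
#include <initializer_list>

// The work alternates between fixed fast (64 bit) and slow (128 bit)
// phases. The displayed loops are equal slices of the *work*, so their
// individual times vary with the slicing, but the total stays the same.
int main() {
    // 8 work units: 4x alternating fast (1 s/unit) and slow (3 s/unit)
    const double costPerUnit[8] = {1, 3, 1, 3, 1, 3, 1, 3};

    for (int loops : {2, 4, 8}) {        // coarser or finer display slices
        int unitsPerLoop = 8 / loops;
        double total = 0;
        printf("%d loops:", loops);
        for (int l = 0; l < loops; ++l) {
            double t = 0;
            for (int u = 0; u < unitsPerLoop; ++u)
                t += costPerUnit[l * unitsPerLoop + u];
            total += t;
            printf(" %.0fs", t);
        }
        printf("  -> total %.0fs either way\n", total);
    }
    return 0;
}
```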

     

    Well, a difficult topic, but I hope I could shed some light on this.

  12. GPUPI 2.2 is out now! Thanks to the feedback from you guys, I was able to fix a lot of things and improve many others. It will be mandatory to use this version in the future, because there are some important changes that make the bench even more bulletproof. But we will wait until the currently running competitions have finished, including the Team Cup of course.

     

     Last but not least, I would like to talk about our next plans for GPUPI. Thanks to HWBOT I can integrate CPU-Z into the next version, which will improve hardware detection a lot and allow frequencies and even voltages to be submitted to the database automatically. Additionally, we are going to include support for HWBOT competitions directly in the online submission dialog. I already have a prototype working, but I didn't want to rush anything. :)

     

    Full changelog and download here: https://www.overclockers.at/news/gpupi-2-2-english

  13. Ah, thanks for the tips, _mat_! Has anyone reported that bug to Intel yet?

    Sadly, I don't have any AMD cards with OpenCL support, so I'm out of luck. Could you pretty please extract the OpenCL installer from the drivers for me?

    As GENIEBEN already posted, you should be able to install the AMD OpenCL drivers with the APP SDK 2.9.1.

     

    I will be at an Intel Technical Workshop in London soon, where one of the topics is parallel processing. Hopefully I can make some contacts to forward our findings to the Intel OpenCL driver team, so the bugs can get addressed properly.

  14. This was run on XP Pro, 32 bit with SP3.

     

     As for the quick solution, there is nothing called HPET mentioned in the BIOS or the manual, and from what I've read it was introduced around 2005, after this motherboard was launched (DFI LanParty UT nF3 250Gb). I was about to try disabling HPET in Windows to see if it worked, but I managed to corrupt the installation before I could do that.

     

     I can also say that the timer worked fine (no noticeable variation at least) when I tried running at the normal frequency (200 FSB). Unfortunately I didn't get a chance to test how a higher FSB affected the timer, as that's what corrupted the OS...

    I have looked into this issue now, and it seems like your mainboard and OS combination falls back to an invalid and likely bus dependent timer function. That's why the time is skewed.

     

     I have already implemented a more robust way to detect issues with the timer, so GPUPI can fall back to the RTC on XP accordingly. It will be published with the next version of GPUPI.
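
     For the curious, here is a sketch of one way such a plausibility check could work; this is my illustration of the idea, not GPUPI's actual implementation:

```cpp
#include <windows.h>
#include <cstdio>
#include <cmath>

// Compare an interval measured with QueryPerformanceCounter against the
// coarse but bus independent GetTickCount. If they diverge noticeably,
// QPC is backed by an unreliable (e.g. FSB dependent) source and a safer
// timer such as the RTC should be used instead.
bool performanceCounterLooksValid() {
    LARGE_INTEGER freq, q0, q1;
    QueryPerformanceFrequency(&freq);
    DWORD t0 = GetTickCount();
    QueryPerformanceCounter(&q0);
    Sleep(500);
    QueryPerformanceCounter(&q1);
    DWORD t1 = GetTickCount();

    double qpcMs  = 1000.0 * double(q1.QuadPart - q0.QuadPart)
                           / double(freq.QuadPart);
    double tickMs = double(t1 - t0);
    return fabs(qpcMs - tickMs) < 50.0;  // tolerance is illustrative
}

int main() {
    printf(performanceCounterLooksValid()
               ? "timer looks valid\n"
               : "timer skewed, fall back to a safer source\n");
    return 0;
}
```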
