HWBOT Community Forums

Mysticial

Members
  • Posts: 156
  • Days Won: 2

Everything posted by Mysticial

  1. RAM will have no effect on that benchmark. The benchmark is 100% CPU-bound. I was able to calculate your clock speed because the benchmark achieves very close to the theoretical peak FLOPS of the system. For the Core i9 7900X, assuming full-throughput AVX512: (2 FMAs/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 doubles/instruction for AVX512) * (10 cores) * (4.5 GHz) = 1440 GFlops. The benchmark is showing 1443.84 GFlops, which is actually slightly more than the theoretical limit because of timing variations.
  2. Wow! Over 1 TFlops for double-precision! CPU-Z doesn't seem accurate in that screenshot. But based on the numbers, it looks like you were clocked around 4.5 GHz? Possibly in a 100 x 45 configuration? And it didn't melt?
  3. That's good to see! Full output? Though I'm seeing rumors that the integer throughput will not be doubled. And I can see architecturally why that might be. Unfortunately I don't have a benchmark for that.
  4. Would you be able to try with the latest binaries? I updated them last night. As far as I can tell, I've removed the check. So it should get past that message and either run successfully or crash. Thanks for your time.
  5. I found a way to disable the compiler's check and I've updated the binaries. So if anyone is willing to try now, it should (hopefully) work regardless of whether RDSEED is enabled or not. Thanks.
  6. Thank you! This is interesting, though. The compiler seems to be trying to enforce that the computer has RDSEED instructions. But RDSEED has been available since Broadwell. I don't see why it would be missing from Skylake X unless it was explicitly disabled in the BIOS or something. This might be a problem moving forward, since the compiler forces these checks even though most programs won't use those instructions anyway. EDIT: Is virtualization disabled in the BIOS? I'm reading around and it seems that some machines have all the crypto instructions (AES-NI, RDRAND, and RDSEED) disabled, and it may be related to virtualization.
  7. Bump. NDAs lifting today. I'm most curious about the 7820X and the 7900X. EDIT: The reviews seem to indicate that the 6- and 8-core models will have half-throughput AVX512, and the 10-core model will have full-throughput. Source: "Microarchitecture Analysis: Adding in AVX-512 and Tweaks to Skylake-S - The Intel Skylake-X Review: Core i9 7900X, i7 7820X and i7 7800X Tested"
  8. Right now, there are conflicting reports that this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will not have full-throughput AVX512:
       • "Skylake-X not support AVX-512 instructions"
       • "Skylake-X i7-7900X Performance Leaked: 55% faster than i7-6950X @ 4.5GHz"
     If this is true, the current Skylake X processors will only be able to run AVX512 at half the speed of the server Xeons. IOW, no better than AVX2.
     I want to definitively answer this question, both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512. Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out whether Skylake X has full-throughput or half-throughput AVX512.
     So my request is for someone who has a Skylake X sample* to:
       • Run the "2017-SkylakePurley" binary here: https://github.com/Mysticial/Flops/tree/master/version3/binaries-windows**
       • Do it at a fixed CPU frequency (to avoid the effects of Turbo Boost).
       • Do it with HT enabled.
       • Don't use an extreme overclock. If the chip has full-throughput AVX512, then those AVX512 instructions may produce more heat than any other benchmark you've ever run.
       • Do it with a fully updated Windows 10, or a recent version of Linux (like Ubuntu 17.04). This is needed to ensure that the OS has support for AVX512.
     *I may be wrong, but I don't believe Skylake X benchmarks are under NDA anymore, since there are already a gazillion HWBOT submissions and you can get access to the server variants on Google Cloud.
     **The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler to build the AVX512 binaries for Windows.
     ----------------
     When you run the benchmark, I expect one of 3 things to happen:
       • The binary crashes: This means that Windows 10 does not have support for AVX512 and we'll need to wait for that support.
       • The numbers for 512-bit AVX are about the same as for 256-bit AVX: This means that the processor only supports half-throughput AVX512.
       • The numbers for 512-bit AVX are about 2x those for 256-bit AVX: This means that the processor supports full-throughput AVX512.
     Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:

     Running Skylake Purley tuned binary with 1 thread...
     Single-Precision - 128-bit AVX - Add/Sub               GFlops = 15.904    Result = 2.02376e+06
     Double-Precision - 128-bit AVX - Add/Sub               GFlops = 7.952     Result = 1.00995e+06
     Single-Precision - 128-bit AVX - Multiply              GFlops = 15.936    Result = 2.03498e+06
     Double-Precision - 128-bit AVX - Multiply              GFlops = 7.968     Result = 1.00712e+06
     Single-Precision - 128-bit AVX - Multiply + Add        GFlops = 15.936    Result = 1.69085e+06
     Double-Precision - 128-bit AVX - Multiply + Add        GFlops = 7.968     Result = 841756
     Single-Precision - 128-bit FMA3 - Fused Multiply Add   GFlops = 31.872    Result = 2.02868e+06
     Double-Precision - 128-bit FMA3 - Fused Multiply Add   GFlops = 15.936    Result = 1.01782e+06
     Single-Precision - 256-bit AVX - Add/Sub               GFlops = 31.808    Result = 4.06688e+06
     Double-Precision - 256-bit AVX - Add/Sub               GFlops = 15.936    Result = 2.02901e+06
     Single-Precision - 256-bit AVX - Multiply              GFlops = 31.872    Result = 4.06158e+06
     Double-Precision - 256-bit AVX - Multiply              GFlops = 15.936    Result = 2.02013e+06
     Single-Precision - 256-bit AVX - Multiply + Add        GFlops = 31.872    Result = 3.34696e+06
     Double-Precision - 256-bit AVX - Multiply + Add        GFlops = 15.936    Result = 1.70441e+06
     Single-Precision - 256-bit FMA3 - Fused Multiply Add   GFlops = 63.744    Result = 4.0399e+06
     Double-Precision - 256-bit FMA3 - Fused Multiply Add   GFlops = 31.872    Result = 2.00801e+06
     Single-Precision - 512-bit AVX512 - Add/Sub            GFlops = 63.744    Result = 8.11456e+06
     Double-Precision - 512-bit AVX512 - Add/Sub            GFlops = 31.872    Result = 4.03949e+06
     Single-Precision - 512-bit AVX512 - Multiply           GFlops = 63.36     Result = 8.0743e+06
     Double-Precision - 512-bit AVX512 - Multiply           GFlops = 31.872    Result = 4.05014e+06
     Single-Precision - 512-bit AVX512 - Multiply + Add     GFlops = 63.744    Result = 6.68723e+06
     Double-Precision - 512-bit AVX512 - Multiply + Add     GFlops = 31.872    Result = 3.3739e+06
     Single-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 127.488   Result = 8.22848e+06
     Double-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 63.744    Result = 4.03805e+06

     Running Skylake Purley tuned binary with 64 thread(s)...
     Single-Precision - 128-bit AVX - Add/Sub               GFlops = 683.36    Result = 8.68179e+07
     Double-Precision - 128-bit AVX - Add/Sub               GFlops = 263.568   Result = 3.35065e+07
     Single-Precision - 128-bit AVX - Multiply              GFlops = 527.616   Result = 6.69453e+07
     Double-Precision - 128-bit AVX - Multiply              GFlops = 263.88    Result = 3.34619e+07
     Single-Precision - 128-bit AVX - Multiply + Add        GFlops = 527.136   Result = 5.58561e+07
     Double-Precision - 128-bit AVX - Multiply + Add        GFlops = 263.64    Result = 2.79832e+07
     Single-Precision - 128-bit FMA3 - Fused Multiply Add   GFlops = 1056.77   Result = 6.71142e+07
     Double-Precision - 128-bit FMA3 - Fused Multiply Add   GFlops = 528.336   Result = 3.36188e+07
     Single-Precision - 256-bit AVX - Add/Sub               GFlops = 1054.14   Result = 1.34076e+08
     Double-Precision - 256-bit AVX - Add/Sub               GFlops = 527.52    Result = 6.68866e+07
     Single-Precision - 256-bit AVX - Multiply              GFlops = 1056.77   Result = 1.34416e+08
     Double-Precision - 256-bit AVX - Multiply              GFlops = 527.664   Result = 6.70251e+07
     Single-Precision - 256-bit AVX - Multiply + Add        GFlops = 1055.33   Result = 1.12018e+08
     Double-Precision - 256-bit AVX - Multiply + Add        GFlops = 527.52    Result = 5.59086e+07
     Single-Precision - 256-bit FMA3 - Fused Multiply Add   GFlops = 2110.08   Result = 1.34046e+08
     Double-Precision - 256-bit FMA3 - Fused Multiply Add   GFlops = 1055.33   Result = 6.69451e+07
     Single-Precision - 512-bit AVX512 - Add/Sub            GFlops = 2112.26   Result = 2.68216e+08
     Double-Precision - 512-bit AVX512 - Add/Sub            GFlops = 1056      Result = 1.34131e+08
     Single-Precision - 512-bit AVX512 - Multiply           GFlops = 2117.38   Result = 2.69031e+08
     Double-Precision - 512-bit AVX512 - Multiply           GFlops = 1059.26   Result = 1.34601e+08
     Single-Precision - 512-bit AVX512 - Multiply + Add     GFlops = 2118.14   Result = 2.24393e+08
     Double-Precision - 512-bit AVX512 - Multiply + Add     GFlops = 1058.5    Result = 1.12102e+08
     Single-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 4242.43   Result = 2.69409e+08
     Double-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 2115.07   Result = 1.34365e+08

     This Skylake Purley system has full-throughput AVX512.
  9. I can't reproduce the issue. Submissions work fine for me. Can you give me more information, such as a screenshot of the error?
  10. I'll investigate tonight. I pushed an update a few days ago, but it's certainly possible that I broke something.
  11. As an update, I'm now using the Gigabyte GA-AB350M. With BIOS version F2, the system crashes on the flops benchmark. With BIOS version F3c, it no longer crashes. So indeed, this does appear to be fixed.
  12. I did actually make it in time. Version 0.7.2 has been released. Happy Pi Day everyone!
      This new version is faster than the previous version on pretty much all processors that I've tested on. For internal reasons, you are now required to use the latest submitter version (v0.9.6).
      Here are some submissions from the Ryzen build that I used to tune for Zen:
        • Mysticial`s Y-Cruncher - Pi-25m score: 1sec 503ms with a Ryzen 7 1800X
        • Mysticial`s Y-Cruncher - Pi-1b score: 1min 36sec 626ms with a Ryzen 7 1800X
        • Mysticial`s Y-Cruncher - Pi-10b score: 22min 12sec 565ms with a Ryzen 7 1800X
  13. Before I go through the trouble of trying out Win7, have you guys tried running the benchmark? Did it crash? (I'm desperately working to get the Zen tuning parameters for y-cruncher v0.7.2 in time for March 14, so I don't really have that much time to keep debugging this.)
  14. The BIOS won't let me set an SOC voltage offset of more than 0.2 V. It puts a hard limit of 1.0 volts. Perhaps it can be forced higher via the LLC settings, but I'm hesitant to push settings up to the limit of what the BIOS allows. I'll play around with that a bit tomorrow. But if what you say is true (SOC should be 1.0 - 1.2 V), and the BIOS number is correct, then perhaps ASUS is simply setting it too low to begin with?
  15. Things I've tried:
        • One stick of memory. Crashes with both my Corsair and my G.Skill TridentZ.
        • Two different video cards.
        • Two different installations of Win10 on different devices (SSD + HD).
      The only parts I haven't changed are:
        • The CPU. (I only have one Ryzen CPU.)
        • The PSU. (I don't have any spare PSUs lying around and it's too much work to take apart my other builds.)
        • The motherboard. (I only have one AM4 motherboard.)
      Temperatures are always below 80C, so I doubt it's a cooling issue.
  16. Enabling/disabling HPET has no effect. Both instantly crash. I can't install Win7 because the installer doesn't have USB drivers and I don't have a PS/2 mouse/keyboard.
  17. There's no option to disable XFR or turbo in my BIOS. I don't trust CPU-Z's vcore reading since it is clearly too high and it conflicts with AI Suite. These are at stock settings, so it shouldn't be getting anywhere near 1.5 V anyway.
      When I use AI Suite to manually downclock, it seems to disable both XFR and turbo, and it holds the frequency steady at 2.2 GHz. The vcore seems to stay at a static 1.35 V (under load) according to AI Suite. Again, CPU-Z jumps all over the place, to as high as 1.550 V. But that's beside the point. It really shouldn't be crashing at stock settings, let alone when downclocked. That's why I'm looking for more people to test this on different motherboards and from different manufacturers.
      So far I have 3 positive confirmations (crash), and zero negative confirmations (did not crash). The crashes occurred on these setups, all running at stock and/or underclocked:
        • 1800X + Asus Prime B350M-A (BIOS 0502)
        • 1700 + Asus Prime B350M-A (BIOS ???)
        • 1700 + Asus Crosshair
      The questions I still want answered are:
        • Specific to my setup? No - confirmed by two other people.
        • Specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
        • Is this an issue with Windows?
        • Is this a CPU erratum? (I hope not - however unlikely it might be.)
  18. For me, yes. BIOS 0502 (February 28). The BIOS and AI Suite show a vcore of 1.350 V; CPU-Z shows it as 1.550 V. And it also happens when underclocked to 2.2 GHz. The Windows Event Log occasionally manages to record which core it crashes on. It's pretty random among all 16 logical cores; there's no single core that it always happens on. IOW, I don't see any signs of weakness in a specific core.
  19. Uh oh... This doesn't look good. I also have one other confirmation on a different forum. Other things to note: It doesn't always freeze instantly. I have a different Win10 installation that sometimes manages to survive the first FMA test only to crash on the second. The crash doesn't reproduce in Linux, but the code for Linux is slightly different since it uses a different compiler.
  20. One of my internal benchmark applications is insta-hard-freezing on Ryzen.
        • Ryzen 7 1800X
        • Asus Prime B350M-A (BIOS 0502)
        • 4 x 8GB Corsair CMK32GX4M4A2400C14 @ 2133 MHz
        • Nothing is overclocked. Everything is stock.
        • Windows 10 Anniversary Update
      When I run the Haswell binary from here: https://github.com/Mysticial/Flops/tree/master/version2/binaries-windows
      The entire system usually freezes when it gets to:
      Sometimes it will make it past that, but it usually ends up crashing/freezing later on in the test anyway.
      For those who don't trust the binary, the program is completely open-sourced in that GitHub repo. If you have Visual Studio installed: open the project, build the x64 Haswell binary, and run it.
      For me this always hard freezes the computer:
        • At all clock speeds.
        • When running single-threaded, it happens on any core that I pin it to.
      The questions that I want to answer are:
        • Is this specific to my setup? No - confirmed by multiple other people.
        • Is this specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
        • Is this an issue with Windows? The crash does not seem to happen in Linux, but that is with slightly different code due to differing compilers.
        • Is this a CPU erratum? (I hope not - however unlikely it might be.)
      ---------------------------
      Current Testing Status:
      All of these are running Windows, and are at stock settings or underclocked.
      Confirmed Crashes:
        • 1800X + Asus Prime B350M-A (BIOS 0502)
        • 1700 + Asus Prime B350M-A (BIOS ???)
        • 1700 + Asus Crosshair VI Hero
        • 1700 + Asus Crosshair VI Hero (BIOS 5803) (two sets of memory, G.Skill + Kingston; also fails with overvolted SOC)
        • 1800X + Asus Crosshair VI Hero (Windows 7) - One pass, mostly failures.
      Confirmed No-Crash: none yet
      For those interested in the technical details, I'm getting hard freezes for all types of FMAs (128-bit, 256-bit, single and double precision). But for some reason, it only affects this particular benchmark. Other programs (like prime95 and y-cruncher) aren't affected despite using FMAs.
      ---------------------------
      Update 3/16/2017: Although it was the outcome I least expected, this appears to have been confirmed as an erratum in the AMD Zen processor. In other words, the last bullet on my list (and the most serious). Fortunately, it's one that is fixable with a microcode update and will not result in something catastrophic like a recall or the disabling of features.
      To everyone pouring in from the various news sites: The important part is that a user-mode program should not be able to hard freeze the entire system. Because if it can (as is the case here), it makes it possible to perform DoS attacks. IOW, this erratum is a security issue.
      Don't be fooled by the "Haswell binary". The benchmark is 5 years old and I've largely neglected it for the last 3, so I haven't updated it for Zen yet. Any processor will be able to run any of the binaries if it supports the underlying instruction sets. If it doesn't, the program merely crashes with an "illegal instruction". Under no circumstances should a user-mode application be able to bring down an entire system.
  21. This whole DDR4 shortage is not playing well with the Zen demand. I did manage to get some 16GB sticks without paying an arm and a leg, but they sold out within hours after I ordered. At least I have a tracking # this time... So I think I'm safe this time. Why do smartphones need DDR4? The lower voltage?
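The peak-FLOPS arithmetic in post 1 (and the clock-speed estimate in post 2, which runs the same formula backwards) can be sketched in a few lines of Python. This is a minimal illustration, assuming every instruction in the timed loop is an FMA and both FMA units are kept busy every cycle; the function names are hypothetical and not part of the Flops benchmark.

```python
FLOPS_PER_FMA = 2  # a fused multiply-add counts as one multiply plus one add


def peak_gflops(fma_per_cycle: int, lanes: int, cores: int, ghz: float) -> float:
    """Theoretical peak GFlops when every core retires `fma_per_cycle` vector FMAs each cycle."""
    return fma_per_cycle * FLOPS_PER_FMA * lanes * cores * ghz


def implied_ghz(measured_gflops: float, fma_per_cycle: int, lanes: int, cores: int) -> float:
    """Run the same formula backwards to estimate the sustained clock from a measured result."""
    return measured_gflops / (fma_per_cycle * FLOPS_PER_FMA * lanes * cores)


# Core i9 7900X, full-throughput AVX512: 2 FMAs/cycle, 8 doubles per 512-bit vector, 10 cores.
print(peak_gflops(2, 8, 10, 4.5))                # 1440.0
print(round(implied_ghz(1443.84, 2, 8, 10), 3))  # 4.512
```

The inverse works on any FMA row: plugging in the single-threaded 512-bit double-precision FMA result from post 8, `implied_ghz(63.744, 2, 8, 1)` gives about 1.99 GHz, consistent with that system's stated 2.0 GHz clock.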
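The three expected outcomes listed in post 8 boil down to comparing the 512-bit and 256-bit FMA numbers at the same precision. A minimal sketch, assuming a crashed run is passed in as `None`; the 1.5x cut-off between "about the same" and "about 2x" is an arbitrary threshold chosen for illustration, and the function name is hypothetical.

```python
from typing import Optional


def classify_avx512(gflops_256: Optional[float], gflops_512: Optional[float]) -> str:
    """Map a pair of same-precision FMA results to one of the three expected outcomes."""
    if gflops_256 is None or gflops_512 is None:
        # The binary crashed before producing numbers.
        return "crashed: OS may lack AVX512 support"
    ratio = gflops_512 / gflops_256
    # ~1x: 512-bit buys nothing over 256-bit (half-throughput); ~2x: full-throughput.
    return "full-throughput" if ratio >= 1.5 else "half-throughput"


# Double-precision FMA numbers from the 64-thread Google Cloud run in post 8.
print(classify_avx512(1055.33, 2115.07))  # full-throughput
print(classify_avx512(None, None))        # crashed: OS may lack AVX512 support
```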
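Posts 5, 6 and 8 hinge on two separate questions: does the CPU report an instruction (RDSEED, AVX512F), and has the OS enabled the register state needed to use AVX512? Below is a minimal sketch of decoding both, assuming you already have the raw register values. Python cannot execute CPUID or XGETBV itself; in C/C++ these values would typically come from compiler intrinsics such as MSVC's `__cpuidex` and `_xgetbv`. The bit positions follow Intel's documented CPUID and XCR0 layouts; the example register values are made up for illustration.

```python
# CPUID.(EAX=1):ECX feature bits
CPUID1_ECX_AES = 1 << 25      # AES-NI
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_RDRAND = 1 << 30

# CPUID.(EAX=7, ECX=0):EBX feature bits
CPUID7_EBX_AVX512F = 1 << 16
CPUID7_EBX_RDSEED = 1 << 18

# XCR0 state the OS must enable before AVX512 code can run:
# SSE (bit 1), AVX (bit 2), opmask (bit 5), ZMM_Hi256 (bit 6), Hi16_ZMM (bit 7).
XCR0_AVX512_STATE = (1 << 1) | (1 << 2) | (1 << 5) | (1 << 6) | (1 << 7)  # 0xE6


def decode_features(leaf1_ecx: int, leaf7_ebx: int, xcr0: int) -> dict:
    """Report CPU-advertised features plus whether AVX512 is actually usable under this OS."""
    cpu_has_avx512f = bool(leaf7_ebx & CPUID7_EBX_AVX512F)
    os_enabled_avx512 = bool(leaf1_ecx & CPUID1_ECX_OSXSAVE) and (
        xcr0 & XCR0_AVX512_STATE
    ) == XCR0_AVX512_STATE
    return {
        "aes": bool(leaf1_ecx & CPUID1_ECX_AES),
        "rdrand": bool(leaf1_ecx & CPUID1_ECX_RDRAND),
        "rdseed": bool(leaf7_ebx & CPUID7_EBX_RDSEED),
        "avx512f": cpu_has_avx512f,
        "avx512_usable": cpu_has_avx512f and os_enabled_avx512,
    }


# Hypothetical values: a CPU that advertises AVX512F but not RDSEED (e.g. disabled in firmware),
# running on an OS that has only enabled x87/SSE/AVX state (XCR0 = 0x7).
leaf1 = CPUID1_ECX_AES | CPUID1_ECX_OSXSAVE | CPUID1_ECX_RDRAND
leaf7 = CPUID7_EBX_AVX512F
print(decode_features(leaf1, leaf7, xcr0=0x07))
# An OS with AVX512 support also sets XCR0 bits 5-7.
print(decode_features(leaf1, leaf7, xcr0=0xE7)["avx512_usable"])  # True
```

When `avx512f` is true but `avx512_usable` is false, executing an AVX512 instruction faults as an illegal instruction, which matches the "binary crashes" outcome in post 8.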
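Post 20's point that the "Haswell binary" is just a name can be illustrated with a dispatcher that picks the most capable build whose instruction-set requirements the CPU meets. A minimal sketch: the binary names come from the posts, but the feature sets attached to them are assumptions for illustration, not taken from the Flops repository.

```python
from typing import Optional, Set

# Ordered from most to least demanding. The required feature sets are assumptions.
BINARIES = [
    ("2017-SkylakePurley", {"sse2", "avx", "avx2", "fma3", "avx512f"}),
    ("Haswell", {"sse2", "avx", "avx2", "fma3"}),
]


def pick_binary(cpu_features: Set[str]) -> Optional[str]:
    """Return the first build whose required instruction sets are all present, else None."""
    for name, required in BINARIES:
        if required <= cpu_features:  # subset test: every required feature is supported
            return name
    return None  # any of these builds would fail with an illegal instruction


ryzen = {"sse2", "avx", "avx2", "fma3"}
print(pick_binary(ryzen))  # Haswell
```

Under these assumed requirements, a first-generation Ryzen (no AVX512) lands on the "Haswell" build simply because it supports the same instruction sets, not because the build was tuned for it.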