HWBOT Community Forums

Mysticial

Members
  • Posts: 158
  • Days Won: 2

Everything posted by Mysticial

  1. I'm doing some AVX512 testing right now and it seems that Intel found a very sneaky but ingenious way to do wattage throttling. More details on that later. And by "later", I mean probably a week from now since it's "quite complicated".
  2. Now THAT's interesting... They also show full-throughput AVX512. That's contrary to what all the articles out there are reporting. (2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 864 GFlops. The benchmark shows 872.832 GFlops. If they were only half-throughput, I'd have expected: (1 FMA/cycle for half-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 432 GFlops. Thanks for running all these benchmarks!
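The peak-throughput arithmetic in the post above can be sketched in a few lines of Python. This is only an illustration of the formula as written in the post; the function name `peak_gflops` and its parameter names are made up here and are not part of the Flops benchmark.

```python
def peak_gflops(fma_per_cycle, lanes, cores, ghz, flops_per_fma=2):
    """Theoretical peak GFlops for FMA-bound code.

    fma_per_cycle: FMA instructions issued per core per cycle
                   (2 = full-throughput AVX512, 1 = half-throughput)
    lanes:         values per vector (8 doubles in a 512-bit register)
    """
    return fma_per_cycle * flops_per_fma * lanes * cores * ghz


# The 6-core chip at 4.5 GHz from the post, double precision.
full = peak_gflops(fma_per_cycle=2, lanes=8, cores=6, ghz=4.5)
half = peak_gflops(fma_per_cycle=1, lanes=8, cores=6, ghz=4.5)
print(full, half)  # 864.0 432.0
```

The measured 872.832 GFlops sits right next to the full-throughput figure and is about double the half-throughput one, which is the whole argument of the post.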
  3. RAM will have no effect on that benchmark. The benchmark is 100% CPU. I was able to calculate your clock speed because the benchmark achieves very close to the theoretical FLOPs on the system. For the Core i9 7900X assuming full-throughput AVX512: (2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (10 cores) * (4.5 GHz) = 1440 GFlops. The benchmark is showing 1443.84 GFlops. It's actually slightly more than the theoretical limit because of timing variations.
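Going the other way (from a measured score back to a clock speed) is the same formula inverted. A rough sketch, assuming the benchmark really does run at the theoretical peak and that the chip has full-throughput AVX512; `implied_clock_ghz` is a hypothetical helper name, not something from the benchmark:

```python
def implied_clock_ghz(measured_gflops, cores, fma_per_cycle=2, lanes=8,
                      flops_per_fma=2):
    """Estimate the sustained clock from a double-precision FMA score.

    Assumes the score is at the theoretical peak, so any timing noise
    or throttling shows up directly in the estimate.
    """
    flops_per_cycle = fma_per_cycle * flops_per_fma * lanes * cores
    return measured_gflops / flops_per_cycle


# 10-core 7900X score quoted in the post.
clock = implied_clock_ghz(1443.84, cores=10)
print(round(clock, 3))  # 4.512
```

An estimate slightly above 4.5 GHz matches the "timing variations" remark: the formula has no slack in it, so any measurement noise lands in the clock figure.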
  4. Wow! Over 1 TFlops for double-precision! CPUz doesn't seem accurate in that screenshot. But based on the numbers it looks like you were clocked around 4.5 GHz? Possibly in 100 x 45 configuration? And it didn't melt?
  5. That's good to see! Full output? Though I'm seeing rumors that the integer throughput will not be doubled. And I can see architecturally why that might be. Unfortunately I don't have a benchmark for that.
  6. Would you be able to try with the latest binaries? I updated them last night. As far as I can tell, I've removed the check. So it should get past that message and either run successfully or crash. Thanks for your time.
  7. I found a way to disable the check that the compiler inserts and I've updated the binaries. So if anyone is willing to try now, it should (hopefully) work regardless of whether RDSEED is enabled or not. Thanks.
  8. Thank you! This is interesting though. The compiler seems to be trying to enforce that the computer has RDSEED instructions. But RDSEED was already available starting from Broadwell. I don't see why it would be missing from Skylake X unless it was explicitly disabled in the BIOS or something. This might be a problem moving forward since the compiler forces these checks even though most programs won't use them anyway. EDIT: Is virtualization disabled in the BIOS? I'm reading around and it seems that some machines have all the crypto instructions disabled (AES-NI, RDRAND, and RDSEED) and it may be related to virtualization.
  9. Bump. NDAs lifting today. I'm most curious about the 7820X and the 7900X. EDIT: The reviews seem to indicate that the 6 and 8-core models will have half-throughput, and the 10-core model will have full-throughput. Microarchitecture Analysis: Adding in AVX-512 and Tweaks to Skylake-S - The Intel Skylake-X Review: Core i9 7900X, i7 7820X and i7 7800X Tested
  10. Right now, there are conflicting reports that this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will not have full-throughput AVX512.

      Skylake-X not support AVX-512 instructions
      Skylake-X i7-7900X Performance Leaked: 55% faster than i7-6950X @ 4.5GHz

      If this is true, the current Skylake X processors will only be able to run AVX512 at half the speed of the server Xeons - IOW, no better than AVX2. I want to definitively answer this question - both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512.

      Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out if Skylake X has full-throughput or half-throughput AVX512.

      So my request is for someone who has a Skylake X sample* to:

        • Run the "2017-SkylakePurley" binary here: https://github.com/Mysticial/Flops/tree/master/version3/binaries-windows**
        • Do it at a fixed CPU frequency (to avoid the effects of Turbo Boost).
        • Do it with HT enabled.
        • Don't use an extreme overclock. If the chip has full-throughput AVX512, then those AVX512 instructions may produce more heat than any other benchmark you've ever run.
        • Do it with a fully updated Windows 10. Or a recent version of Linux (like Ubuntu 17.04). This is needed to ensure that the OS has support for AVX512.

      *I may be wrong, but I don't believe Skylake X benchmarks are under NDA anymore since there's already a gazillion HWBOT submissions and you can get access to the server variants on Google Cloud.

      **The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler if you want to build the AVX512 binaries for Windows.

      ----------------

      When you run the benchmark, I expect one of 3 things to happen:

        • The binary crashes: This means that Windows 10 does not have support for AVX512 and we'll need to wait for that support.
        • The numbers for the 512-bit AVX are about the same as the 256-bit AVX: This means that the processor only supports half-throughput AVX512.
        • The numbers for the 512-bit AVX are about 2x that of the 256-bit AVX: This means that the processor supports full-throughput AVX512.

      Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:

      Running Skylake Purley tuned binary with 1 thread...

        Single-Precision - 128-bit AVX - Add/Sub                     GFlops = 15.904    Result = 2.02376e+06
        Double-Precision - 128-bit AVX - Add/Sub                     GFlops = 7.952     Result = 1.00995e+06
        Single-Precision - 128-bit AVX - Multiply                    GFlops = 15.936    Result = 2.03498e+06
        Double-Precision - 128-bit AVX - Multiply                    GFlops = 7.968     Result = 1.00712e+06
        Single-Precision - 128-bit AVX - Multiply + Add              GFlops = 15.936    Result = 1.69085e+06
        Double-Precision - 128-bit AVX - Multiply + Add              GFlops = 7.968     Result = 841756
        Single-Precision - 128-bit FMA3 - Fused Multiply Add         GFlops = 31.872    Result = 2.02868e+06
        Double-Precision - 128-bit FMA3 - Fused Multiply Add         GFlops = 15.936    Result = 1.01782e+06
        Single-Precision - 256-bit AVX - Add/Sub                     GFlops = 31.808    Result = 4.06688e+06
        Double-Precision - 256-bit AVX - Add/Sub                     GFlops = 15.936    Result = 2.02901e+06
        Single-Precision - 256-bit AVX - Multiply                    GFlops = 31.872    Result = 4.06158e+06
        Double-Precision - 256-bit AVX - Multiply                    GFlops = 15.936    Result = 2.02013e+06
        Single-Precision - 256-bit AVX - Multiply + Add              GFlops = 31.872    Result = 3.34696e+06
        Double-Precision - 256-bit AVX - Multiply + Add              GFlops = 15.936    Result = 1.70441e+06
        Single-Precision - 256-bit FMA3 - Fused Multiply Add         GFlops = 63.744    Result = 4.0399e+06
        Double-Precision - 256-bit FMA3 - Fused Multiply Add         GFlops = 31.872    Result = 2.00801e+06
        Single-Precision - 512-bit AVX512 - Add/Sub                  GFlops = 63.744    Result = 8.11456e+06
        Double-Precision - 512-bit AVX512 - Add/Sub                  GFlops = 31.872    Result = 4.03949e+06
        Single-Precision - 512-bit AVX512 - Multiply                 GFlops = 63.36     Result = 8.0743e+06
        Double-Precision - 512-bit AVX512 - Multiply                 GFlops = 31.872    Result = 4.05014e+06
        Single-Precision - 512-bit AVX512 - Multiply + Add           GFlops = 63.744    Result = 6.68723e+06
        Double-Precision - 512-bit AVX512 - Multiply + Add           GFlops = 31.872    Result = 3.3739e+06
        Single-Precision - 512-bit AVX512 - Fused Multiply Add       GFlops = 127.488   Result = 8.22848e+06
        Double-Precision - 512-bit AVX512 - Fused Multiply Add       GFlops = 63.744    Result = 4.03805e+06

      Running Skylake Purley tuned binary with 64 thread(s)...

        Single-Precision - 128-bit AVX - Add/Sub                     GFlops = 683.36    Result = 8.68179e+07
        Double-Precision - 128-bit AVX - Add/Sub                     GFlops = 263.568   Result = 3.35065e+07
        Single-Precision - 128-bit AVX - Multiply                    GFlops = 527.616   Result = 6.69453e+07
        Double-Precision - 128-bit AVX - Multiply                    GFlops = 263.88    Result = 3.34619e+07
        Single-Precision - 128-bit AVX - Multiply + Add              GFlops = 527.136   Result = 5.58561e+07
        Double-Precision - 128-bit AVX - Multiply + Add              GFlops = 263.64    Result = 2.79832e+07
        Single-Precision - 128-bit FMA3 - Fused Multiply Add         GFlops = 1056.77   Result = 6.71142e+07
        Double-Precision - 128-bit FMA3 - Fused Multiply Add         GFlops = 528.336   Result = 3.36188e+07
        Single-Precision - 256-bit AVX - Add/Sub                     GFlops = 1054.14   Result = 1.34076e+08
        Double-Precision - 256-bit AVX - Add/Sub                     GFlops = 527.52    Result = 6.68866e+07
        Single-Precision - 256-bit AVX - Multiply                    GFlops = 1056.77   Result = 1.34416e+08
        Double-Precision - 256-bit AVX - Multiply                    GFlops = 527.664   Result = 6.70251e+07
        Single-Precision - 256-bit AVX - Multiply + Add              GFlops = 1055.33   Result = 1.12018e+08
        Double-Precision - 256-bit AVX - Multiply + Add              GFlops = 527.52    Result = 5.59086e+07
        Single-Precision - 256-bit FMA3 - Fused Multiply Add         GFlops = 2110.08   Result = 1.34046e+08
        Double-Precision - 256-bit FMA3 - Fused Multiply Add         GFlops = 1055.33   Result = 6.69451e+07
        Single-Precision - 512-bit AVX512 - Add/Sub                  GFlops = 2112.26   Result = 2.68216e+08
        Double-Precision - 512-bit AVX512 - Add/Sub                  GFlops = 1056      Result = 1.34131e+08
        Single-Precision - 512-bit AVX512 - Multiply                 GFlops = 2117.38   Result = 2.69031e+08
        Double-Precision - 512-bit AVX512 - Multiply                 GFlops = 1059.26   Result = 1.34601e+08
        Single-Precision - 512-bit AVX512 - Multiply + Add           GFlops = 2118.14   Result = 2.24393e+08
        Double-Precision - 512-bit AVX512 - Multiply + Add           GFlops = 1058.5    Result = 1.12102e+08
        Single-Precision - 512-bit AVX512 - Fused Multiply Add       GFlops = 4242.43   Result = 2.69409e+08
        Double-Precision - 512-bit AVX512 - Fused Multiply Add       GFlops = 2115.07   Result = 1.34365e+08

      This Skylake Purley system has full-throughput AVX512.
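The "same vs. 2x" rule of thumb from the post above can be written down as a small check. This is a sketch only: the function name `classify_avx512` and the 0.25 tolerance are assumptions picked for illustration, not anything the benchmark itself reports.

```python
def classify_avx512(gflops_256, gflops_512, tolerance=0.25):
    """Compare a 512-bit score against the matching 256-bit score.

    Both scores should come from the same test at the same clock,
    e.g. the double-precision FMA lines. The tolerance is an
    arbitrary illustrative choice.
    """
    ratio = gflops_512 / gflops_256
    if abs(ratio - 2.0) <= tolerance:
        return "full-throughput"
    if abs(ratio - 1.0) <= tolerance:
        return "half-throughput"
    return "inconclusive"


# Double-precision FMA lines from the Google Cloud run in the post:
# 256-bit FMA3 vs. 512-bit AVX512, 1 thread and 64 threads.
single = classify_avx512(31.872, 63.744)
multi = classify_avx512(1055.33, 2115.07)
print(single, multi)  # full-throughput full-throughput
```

Comparing within a single run is what makes the fixed-frequency request matter: if the clock moves between the 256-bit and 512-bit tests, the ratio no longer reflects the hardware width.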
  11. I can't reproduce the issue. Submissions work fine for me. Can you give me more information? Such as a screenshot of the error?
  12. I'll investigate tonight. I pushed an update a few days ago, but it's certainly possible that I broke something.
  13. Has anyone ever seen a machine with 6TB of memory?
  14. As an update, I'm now using the Gigabyte GA-AB350M. With BIOS version F2, the system crashes on the flops benchmark. With BIOS version F3c, it no longer crashes. So indeed, this does appear to be fixed.
  15. I never tested it with SMT off since my mobo doesn't have that option.
  16. I did actually make it in time. Version 0.7.2 has been released. Happy Pi Day everyone! This new version is faster than the previous version on pretty much all processors that I've tested on. For internal reasons, you are now required to use the latest submitter version (v0.9.6). Here are some submissions from the Ryzen build that I used to tune for Zen. Mysticial`s Y-Cruncher - Pi-25m score: 1sec 503ms with a Ryzen 7 1800X Mysticial`s Y-Cruncher - Pi-1b score: 1min 36sec 626ms with a Ryzen 7 1800X Mysticial`s Y-Cruncher - Pi-10b score: 22min 12sec 565ms with a Ryzen 7 1800X
  17. Wow... Did I really find a bug/errata in the Zen processor? Do I get anything shiny?
  18. Before I go through the trouble of trying out Win7, have you guys tried running the benchmark? Did it crash? (I'm desperately working to get the Zen tuning parameters for y-cruncher v0.7.2 in time for March 14. So I don't really have that much time to keep debugging this.)
  19. The BIOS won't let me set an SOC voltage offset of more than 0.2. It puts a hard limit of 1.0 volts. Perhaps it can be forced higher via the LLC settings. But I'm hesitant to put settings up to the limit of what the BIOS allows. I'll play around with that a bit tomorrow. But if what you say is true (SOC should be 1.0 - 1.2), and the BIOS number is correct, then perhaps ASUS is simply setting it too low to begin with?
  20. Things I've tried: One stick of memory. Crashes both with my Corsair and G.Skill TridentZ. Two different video cards. Two different installations of Win10 on different devices. (SSD + HD) The only parts I haven't changed are: The CPU. (I only have one Ryzen CPU.) The PSU. (I don't have any spare PSUs lying around and it's too much work to take apart my other builds.) The motherboard. (I only have one AM4 motherboard.) Temperatures are always below 80C. So I doubt it's a cooling issue.
  21. Enabling/disabling HPET has no effect. Both instantly crash. I can't install Win7 because the installer doesn't have USB drivers and I don't have a PS2 mouse/keyboard.
  22. There's no option to disable XFR or turbo in my BIOS. I don't trust CPUz's vcore reading since it is clearly too high and it conflicts with AI Suite. These are at stock settings, so it shouldn't be getting anywhere near 1.5 anyway. When I use AI Suite to manually downclock, it seems to disable both the XFR and the turbo and it holds the frequency steady at 2.2 GHz. The vcore seems to stay at a static 1.35 (under load) according to AI Suite. Again CPUz jumps all over the place to as high as 1.550. But that's beside the point. It really shouldn't be crashing at stock settings - let alone downclocked. Which is why I'm looking for more people to test this on different motherboards and from different manufacturers. So far I have 3 positive confirmations (crash), and zero negative confirmations (did not crash). The crashes have these setups - all running at stock and/or underclocked: 1800X + Asus Prime B350M-A (BIOS 0502); 1700 + Asus Prime B350M-A (BIOS ???); 1700 + Asus CrossHair. The unanswered questions that I want to know are: Specific to my setup? No - Confirmed by two other people. Specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS? Is this an issue with Windows? Is this a CPU errata? (I hope not - however unlikely it might be.)
  23. For me yes. BIOS 0502 (February 28). The BIOS and AI Suite show a vcore of 1.350. CPUz shows it as 1.550. And it also happens when underclocked to 2.2 GHz. The Windows Event Log occasionally manages to record which core it crashes on. It's pretty random among all 16 vcores. There's no single core that it always happens to. IOW, I don't see any signs of weakness in a specific core.
  24. Uh oh... This doesn't look good. I also have one other confirmation on a different forum. Other things to note: It doesn't always freeze instantly. I have a different Win10 installation that sometimes manages to survive the first FMA test only to crash on the second. The crash doesn't reproduce in Linux, but the code for Linux is slightly different since it uses a different compiler.