HWBOT Community Forums

Mysticial

Members
  • Posts: 156
  • Days Won: 2

Posts posted by Mysticial

  1. If you see a review use over 1.3 V for fully threaded benching and stress testing on an AIO without delidding, then the CPU is throttling for sure. A delidded chip with Conductonaut on both the die and the top of the IHS, with a 360x360mm rad and 9 fans, will hit the mid to high 80s at the same voltage.

     

    Stress testing is tricky on X299 since there is clipping and throttling. You can easily see it by running XTU or R15. You can get 3600 points at 4.9 GHz, and then 5 GHz will pass with a score of 1800 without showing any CPU frequency drop. Same with R15: around 2800 at 4.9 GHz, then 1300 at 5 GHz.

     

    Is this throttling only in the clock speeds? As in, the kind you can see happen by watching CPUz and seeing the frequency drop?

     

    I'm noticing on my Gigabyte AORUS 7 that there is a sort of "phantom AVX512 throttle" that disables half the AVX512 while maintaining the same clock speed. So while CPUz shows a constant 4 GHz, the performance (and temperatures) drop when the "AVX512 throttle" kicks in. I can partially get around the throttling by lowering the clock to 3.8 GHz and increasing the TDP limit to 400W. But I was never able to avoid the throttling at or above 4.0 GHz.

     

    I've spoken to Silicon Lottery about this, and he says all the Gigabyte boards for X299 have tons of background throttling that makes them hard to use. I'm not sure if he's referring to the "AVX512 throttle" or clock speed throttling in general.

  2. If anyone's wondering about AVX512: it's coming... No ETA yet since I've hit a number of unexpected snags.

     

    2017-7-1.jpg

     

    If you're wondering why the chip is underclocked, there's a reason for that. And it's a long story.

    Don't worry, it's much harder to fry your chip with AVX512 than I'm making it sound - unless you disable thermal protection...

  3. 6c results 4.5 (45x100) 1.2vcore:

     

    Imgur: The most awesome images on the Internet

     

    No 8c in office atm :(

     

    Now THAT's interesting... They also show full-throughput AVX512. That's contrary to what all the articles out there are reporting.

     

    (2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 864 GFlops

     

    Benchmark shows 872.832 GFlops.

     

    If they were only half-throughput, I'd have expected:

     

    (1 FMA/cycle for half-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 432 GFlops
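
    A minimal sketch of the arithmetic above, in Python. The function name `peak_gflops` and its parameters are made up for illustration and only restate the formula from this post; it assumes 8 double-precision lanes per 512-bit instruction and 2 flops per FMA.

```python
def peak_gflops(fma_per_cycle, cores, ghz, lanes=8, flops_per_fma=2):
    """Theoretical peak in GFlops: FMAs/cycle * flops/FMA * lanes * cores * GHz."""
    return fma_per_cycle * flops_per_fma * lanes * cores * ghz


full = peak_gflops(fma_per_cycle=2, cores=6, ghz=4.5)  # full-throughput case
half = peak_gflops(fma_per_cycle=1, cores=6, ghz=4.5)  # half-throughput case
measured = 872.832  # GFlops reported for the 6-core run in this post

# The measured value sits next to the full-throughput peak, far from the half one.
closer_to_full = abs(measured - full) < abs(measured - half)
print(full, half, closer_to_full)
```

    The measured score landing near the 2 FMA/cycle figure rather than the 1 FMA/cycle figure is what points to full-throughput AVX512 on that chip.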

     

    Thanks for running all these benchmarks!

  4. 4.5Ghz (45x100) on the dot, 1.2vcore with a Corsair H110i AIO.

     

    Will try with my good RAM on Monday (I forgot it at the house) and see if it scales at all.

     

    Mesh was at 3Ghz for the above screenie.

     

    Ram will have no effect on that benchmark. The benchmark is 100% CPU.

     

    I was able to calculate your clock speed because the benchmark achieves very close to the theoretical FLOPs on the system.

     

    For the Core i9 7900X assuming full-throughput AVX512:

     

    (2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (10 cores) * (4.5 GHz) = 1440 GFlops

     

    The benchmark is showing 1443.84 GFlops. It's actually slightly more than the theoretical limit because of timing variations.
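
    The clock speed estimate mentioned above is the same formula run backwards. A rough sketch, assuming the benchmark really does sustain 2 FMA/cycle per core; the function name `implied_clock_ghz` is hypothetical.

```python
def implied_clock_ghz(measured_gflops, cores, fma_per_cycle=2, lanes=8, flops_per_fma=2):
    """Estimate the sustained clock from a measured double-precision FMA score."""
    # Flops the whole chip can retire per clock cycle under the stated assumptions.
    flops_per_cycle = fma_per_cycle * flops_per_fma * lanes * cores
    return measured_gflops / flops_per_cycle


clock = implied_clock_ghz(1443.84, cores=10)
print(round(clock, 3))
```

    For the 7900X score quoted here this works out to about 4.51 GHz, which lines up with the 45x100 setting within timing noise.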

  5. //Pieter update on Elmor test//

     

    RDSEED not supported was on early 12c. Works on 7900X.

     

    Single Precision - Add/subtract

     

    AVX128 = 1.00x

    AVX256 = 1.84x

    AVX512 = 3.51x

     

    That's good to see! :D Full output?

     

    Though I'm seeing rumors that the integer throughput will not be doubled. And I can see architecturally why that might be. Unfortunately I don't have a benchmark for that.

  6. attachment.php?attachmentid=5756&stc=1&d=1498048762

     

    Windows 10 1703 with Intel C++ redists installed.

     

    Thank you!

     

    This is interesting though. The compiler seems to be trying to enforce that the computer has RDSEED instructions. But RDSEED was already available starting from Broadwell. I don't see why it would be missing from Skylake X unless it was explicitly disabled in the BIOS or something.

     

    This might be a problem moving forward since the compiler forces these checks even though most programs won't use them anyway.

     

    EDIT:

     

    Is virtualization disabled in the BIOS? I'm reading around and it seems that some machines have all the crypto instructions disabled (AES-NI, RDRAND, and RDSEED) and it may be related to virtualization.
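
    For anyone who wants to check this on their own machine: these features are reported as bits in CPUID output. Below is a small sketch that decodes them from raw register values. The bit positions are the ones I know from Intel's CPUID documentation (leaf 1 ECX bit 25 for AES-NI and bit 30 for RDRAND, leaf 7 subleaf 0 EBX bit 18 for RDSEED). Python cannot execute CPUID itself, so the register values have to come from some other tool, and the example values here are made up.

```python
# Feature bit positions as documented for the CPUID instruction.
LEAF1_ECX_BITS = {"AES-NI": 25, "RDRAND": 30}
LEAF7_EBX_BITS = {"RDSEED": 18}


def decode_crypto_features(leaf1_ecx, leaf7_ebx):
    """Map raw CPUID register values to feature-present flags."""
    features = {}
    for name, bit in LEAF1_ECX_BITS.items():
        features[name] = bool((leaf1_ecx >> bit) & 1)
    for name, bit in LEAF7_EBX_BITS.items():
        features[name] = bool((leaf7_ebx >> bit) & 1)
    return features


# Hypothetical register values for illustration, not a dump from real hardware:
# AES-NI and RDRAND reported, RDSEED not reported.
example = decode_crypto_features(leaf1_ecx=(1 << 25) | (1 << 30), leaf7_ebx=0)
print(example)
```

    If a chip that should have RDSEED reports the bit as clear, that would be consistent with the feature being masked by the BIOS or a hypervisor rather than missing from the silicon.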

  7. Right now, there are conflicting reports on whether this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will have full-throughput AVX512.

    If it does not, the current Skylake X processors will only be able to run AVX512 at half the speed of the server Xeons - IOW, no better than AVX2.

     

    I want to definitively answer this question - both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512.

    Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out if Skylake X has full-throughput, or half-throughput AVX512.

     

    So my request is for someone who has a Skylake X sample* to:

    1. Run the "2017-SkylakePurley" binary here: https://github.com/Mysticial/Flops/tree/master/version3/binaries-windows**
    2. Do it at a fixed CPU frequency (to avoid the effects of Turbo Boost).
    3. Do it with HT enabled.
    4. Don't use an extreme overclock. If the chip has full-throughput AVX512, then those AVX512 instructions may produce more heat than any other benchmark you've ever run.
    5. Do it with a fully updated Windows 10. Or a recent version of Linux (like Ubuntu 17.04). This is needed to ensure that the OS has support for AVX512.

    *I may be wrong, but I don't believe Skylake X benchmarks are under NDA anymore since there's already a gazillion HWBOT submissions and you can get access to the server variants on Google Cloud.

     

    **The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler if you want to build the AVX512 binaries for Windows.

     

    ----------------

     

    When you run the benchmark, I expect one of 3 things to happen:

    1. The binary crashes: This means that Windows 10 does not have support for AVX512 and we'll need to wait for that support.
    2. The numbers for 512-bit AVX are about the same as the 256-bit AVX: This means that the processor only supports half-throughput AVX512.
    3. The numbers for the 512-bit AVX are about 2x those of the 256-bit AVX: This means that the processor supports full-throughput AVX512.
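
    Outcomes 2 and 3 come down to the ratio between the 512-bit and 256-bit scores for the same test. A rough sketch of that check follows. The function name and the 0.25 tolerance are my own assumptions, not part of the benchmark, and the first pair of inputs is the single-thread double-precision FMA result from the Google Cloud run in this post.

```python
def classify_avx512(gflops_256, gflops_512, tolerance=0.25):
    """Label a result by how the 512-bit score compares to the 256-bit score."""
    ratio = gflops_512 / gflops_256
    if abs(ratio - 2.0) <= tolerance:
        return "full-throughput"
    if abs(ratio - 1.0) <= tolerance:
        return "half-throughput"
    # Anything else (e.g. throttling mid-run) needs a closer look.
    return "inconclusive"


# 256-bit FMA3 vs 512-bit AVX512 FMA, double precision, 1 thread.
print(classify_avx512(31.872, 63.744))

# Hypothetical numbers showing what a half-throughput chip would look like.
print(classify_avx512(100.0, 101.0))
```

    Compare like with like: the same precision, the same operation, and the same thread count on both sides of the ratio.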

     

    Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:

     

    Running Skylake Purley tuned binary with 1 thread...
    
    Single-Precision - 128-bit AVX - Add/Sub
       GFlops = 15.904
       Result = 2.02376e+06
    
    Double-Precision - 128-bit AVX - Add/Sub
       GFlops = 7.952
       Result = 1.00995e+06
    
    Single-Precision - 128-bit AVX - Multiply
       GFlops = 15.936
       Result = 2.03498e+06
    
    Double-Precision - 128-bit AVX - Multiply
       GFlops = 7.968
       Result = 1.00712e+06
    
    Single-Precision - 128-bit AVX - Multiply + Add
       GFlops = 15.936
       Result = 1.69085e+06
    
    Double-Precision - 128-bit AVX - Multiply + Add
       GFlops = 7.968
       Result = 841756
    
    Single-Precision - 128-bit FMA3 - Fused Multiply Add
       GFlops = 31.872
       Result = 2.02868e+06
    
    Double-Precision - 128-bit FMA3 - Fused Multiply Add
       GFlops = 15.936
       Result = 1.01782e+06
    
    Single-Precision - 256-bit AVX - Add/Sub
       GFlops = 31.808
       Result = 4.06688e+06
    
    Double-Precision - 256-bit AVX - Add/Sub
       GFlops = 15.936
       Result = 2.02901e+06
    
    Single-Precision - 256-bit AVX - Multiply
       GFlops = 31.872
       Result = 4.06158e+06
    
    Double-Precision - 256-bit AVX - Multiply
       GFlops = 15.936
       Result = 2.02013e+06
    
    Single-Precision - 256-bit AVX - Multiply + Add
       GFlops = 31.872
       Result = 3.34696e+06
    
    Double-Precision - 256-bit AVX - Multiply + Add
       GFlops = 15.936
       Result = 1.70441e+06
    
    Single-Precision - 256-bit FMA3 - Fused Multiply Add
       GFlops = 63.744
       Result = 4.0399e+06
    
    Double-Precision - 256-bit FMA3 - Fused Multiply Add
       GFlops = 31.872
       Result = 2.00801e+06
    
    Single-Precision - 512-bit AVX512 - Add/Sub
       GFlops = 63.744
       Result = 8.11456e+06
    
    Double-Precision - 512-bit AVX512 - Add/Sub
       GFlops = 31.872
       Result = 4.03949e+06
    
    Single-Precision - 512-bit AVX512 - Multiply
       GFlops = 63.36
       Result = 8.0743e+06
    
    Double-Precision - 512-bit AVX512 - Multiply
       GFlops = 31.872
       Result = 4.05014e+06
    
    Single-Precision - 512-bit AVX512 - Multiply + Add
       GFlops = 63.744
       Result = 6.68723e+06
    
    Double-Precision - 512-bit AVX512 - Multiply + Add
       GFlops = 31.872
       Result = 3.3739e+06
    
    Single-Precision - 512-bit AVX512 - Fused Multiply Add
       GFlops = 127.488
       Result = 8.22848e+06
    
    Double-Precision - 512-bit AVX512 - Fused Multiply Add
       GFlops = 63.744
       Result = 4.03805e+06
    
    
    Running Skylake Purley tuned binary with 64 thread(s)...
    
    Single-Precision - 128-bit AVX - Add/Sub
       GFlops = 683.36
       Result = 8.68179e+07
    
    Double-Precision - 128-bit AVX - Add/Sub
       GFlops = 263.568
       Result = 3.35065e+07
    
    Single-Precision - 128-bit AVX - Multiply
       GFlops = 527.616
       Result = 6.69453e+07
    
    Double-Precision - 128-bit AVX - Multiply
       GFlops = 263.88
       Result = 3.34619e+07
    
    Single-Precision - 128-bit AVX - Multiply + Add
       GFlops = 527.136
       Result = 5.58561e+07
    
    Double-Precision - 128-bit AVX - Multiply + Add
       GFlops = 263.64
       Result = 2.79832e+07
    
    Single-Precision - 128-bit FMA3 - Fused Multiply Add
       GFlops = 1056.77
       Result = 6.71142e+07
    
    Double-Precision - 128-bit FMA3 - Fused Multiply Add
       GFlops = 528.336
       Result = 3.36188e+07
    
    Single-Precision - 256-bit AVX - Add/Sub
       GFlops = 1054.14
       Result = 1.34076e+08
    
    Double-Precision - 256-bit AVX - Add/Sub
       GFlops = 527.52
       Result = 6.68866e+07
    
    Single-Precision - 256-bit AVX - Multiply
       GFlops = 1056.77
       Result = 1.34416e+08
    
    Double-Precision - 256-bit AVX - Multiply
       GFlops = 527.664
       Result = 6.70251e+07
    
    Single-Precision - 256-bit AVX - Multiply + Add
       GFlops = 1055.33
       Result = 1.12018e+08
    
    Double-Precision - 256-bit AVX - Multiply + Add
       GFlops = 527.52
       Result = 5.59086e+07
    
    Single-Precision - 256-bit FMA3 - Fused Multiply Add
       GFlops = 2110.08
       Result = 1.34046e+08
    
    Double-Precision - 256-bit FMA3 - Fused Multiply Add
       GFlops = 1055.33
       Result = 6.69451e+07
    
    Single-Precision - 512-bit AVX512 - Add/Sub
       GFlops = 2112.26
       Result = 2.68216e+08
    
    Double-Precision - 512-bit AVX512 - Add/Sub
       GFlops = 1056
       Result = 1.34131e+08
    
    Single-Precision - 512-bit AVX512 - Multiply
       GFlops = 2117.38
       Result = 2.69031e+08
    
    Double-Precision - 512-bit AVX512 - Multiply
       GFlops = 1059.26
       Result = 1.34601e+08
    
    Single-Precision - 512-bit AVX512 - Multiply + Add
       GFlops = 2118.14
       Result = 2.24393e+08
    
    Double-Precision - 512-bit AVX512 - Multiply + Add
       GFlops = 1058.5
       Result = 1.12102e+08
    
    Single-Precision - 512-bit AVX512 - Fused Multiply Add
       GFlops = 4242.43
       Result = 2.69409e+08
    
    Double-Precision - 512-bit AVX512 - Fused Multiply Add
       GFlops = 2115.07
       Result = 1.34365e+08

     

    This Skylake Purley system has full-throughput AVX512.
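
    If you want to compare runs without reading the numbers by eye, the output format above is simple enough to parse. This is a sketch that assumes the layout shown in this post (a test header line followed by a "GFlops = ..." line); the function name is hypothetical, and the sample text is copied from two entries of the 1-thread section.

```python
import re

# "<Single|Double>-Precision - <instruction set> - <operation>"
HEADER = re.compile(r"^(Single|Double)-Precision - (.+?) - (.+)$")
GFLOPS = re.compile(r"^GFlops = ([0-9.]+)$")


def parse_flops_output(text):
    """Return {(precision, instruction_set, operation): gflops} for one section."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.groups()
            continue
        value = GFLOPS.match(line)
        if value and current is not None:
            results[current] = float(value.group(1))
            current = None
    return results


sample = """
Double-Precision - 256-bit FMA3 - Fused Multiply Add
   GFlops = 31.872
   Result = 2.00801e+06

Double-Precision - 512-bit AVX512 - Fused Multiply Add
   GFlops = 63.744
   Result = 4.03805e+06
"""
parsed = parse_flops_output(sample)
print(parsed)
```

    Feed it one thread-count section at a time. If both the 1-thread and 64-thread sections go in together, the later values overwrite the earlier ones because the keys are identical.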

  8. I did actually make it in time. Version 0.7.2 has been released. Happy Pi Day everyone!

     

    This new version is faster than the previous version on pretty much all processors that I've tested on.

    For internal reasons, you are now required to use the latest submitter version (v0.9.6).

     

    Here are some submissions from the Ryzen build that I used to tune for Zen.

     

  9. OK, the SOC voltage is pretty damn low. It should be around 1.00 V minimum, and bumping it within the 1.0-1.20 V range helps gain stability.

     

    The bottom voltage Dram termination voltage should be equal to 50%

     

    The BIOS won't let me set an SOC voltage offset of more than 0.2. It puts a hard limit of 1.0 volts. Perhaps it can be forced higher via the LLC settings. But I'm hesitant to put settings up to the limit of what the BIOS allows.

     

    I'll play around with that a bit tomorrow. But if what you say is true (SOC should be 1.0 - 1.2), and the BIOS number is correct, then perhaps ASUS is simply setting it too low to begin with?

  10. pull 3 sticks and run

     

    Things I've tried:

     

    • One stick of memory. Crashes both with my Corsair and G.Skill TridentZ.
    • Two different video cards.
    • Two different installations of Win10 on different devices. (SSD + HD)

     

    The only parts I haven't changed are:

    • The CPU. (I only have one Ryzen CPU.)
    • The PSU. (I don't have any spare PSUs lying around and it's too much work to take apart my other builds.)
    • The motherboard. (I only have one AM4 motherboard.)

     

    Temperatures are always below 80C. So I doubt it's a cooling issue.

  11. So, try it differently: what do you see in HWiNFO for voltage under load? https://www.fosshub.com/HWiNFO.html/hw64_545_3090.zip

     

    It's bad for me, because I don't have the Ryzen test setup here yet (tomorrow, with a Crosshair).

    In my theory it could be:

    -overheating of the CPU or VRM because the vcore is fluctuating too high at auto settings

    -BIOS issue

    -Windows 10 issue (Win7 seems more ready for Ryzens as few guys at another forum wrote)

     

    PS: Is HPET enabled via cmd in Windows?

     

    Enabling/disabling HPET has no effect. Both instantly crash.

     

    Mystical can this program run in win 7? If so can you please try it for me?

     

    I can't install Win7 because the installer doesn't have USB drivers and I don't have a PS2 mouse/keyboard.

  12. It's clear, the voltage seems too high... It's possible this voltage is for XFR/turbo. You can try disabling turbo in the BIOS, then run the test again and watch your voltage/temps.

    The 1800X is a hot chip for a given voltage; the 1700 or 1700X has lower temps at the same voltage.

     

    There's no option to disable XFR or turbo in my BIOS. I don't trust CPUz's vcore reading since it is clearly too high and it conflicts with AI Suite. These are at stock settings, so it shouldn't be getting anywhere near 1.5 anyway.

     

    When I use AI Suite to manually downclock, it seems to disable both the XFR and the turbo and it holds the frequency steady at 2.2 GHz. The vcore seems to stay at a static 1.35 (under load) according to AI Suite. Again, CPUz jumps all over the place to as high as 1.550.

     

    But that's beside the point. It really shouldn't be crashing at stock settings - let alone when downclocked. Which is why I'm looking for more people to test this on different motherboards and from different manufacturers.

     

    So far I have 3 positive confirmations (crash), and zero negative confirmations (did not crash).

     

    The crashes have these setups - all running at stock and/or underclocked.

    1. 1800X + Asus Prime B350M-A (BIOS 0502)
    2. 1700 + Asus Prime B350M-A (BIOS ???)
    3. 1700 + Asus CrossHair

     

    The unanswered questions that I want to know are:

    1. Specific to my setup? No - Confirmed by two other people.
    2. Specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
    3. Is this an issue with Windows?
    4. Is this a CPU errata? (I hope not - however unlikely it might be.)
