Posts posted by Mysticial
-
-
If anyone's wondering about AVX512: it's coming... No ETA yet since I've hit a number of unexpected snags.
If you're wondering why the chip is underclocked, there's a reason for that. And it's a long story.
Don't worry, it's much harder to fry your chip with AVX512 than I'm making it sound - unless you disable thermal protection...
-
I'm doing some AVX512 testing right now and it seems that Intel found a very sneaky but ingenious way to do wattage throttling.
More details on that later. And by "later", I mean probably a week from now since it's "quite complicated".
-
6c results 4.5 (45x100) 1.2vcore:
Imgur: The most awesome images on the Internet
No 8c in office atm
Now THAT's interesting... They also show full-throughput AVX512. That's contrary to what all the articles out there are reporting.
(2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 864 GFlops
Benchmark shows 872.832 GFlops.
If they were only half-throughput, I'd have expected:
(1 FMA/cycle for half-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 432 GFlops
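For anyone who wants to redo this arithmetic for their own chip, here is a minimal Python sketch of the same formula. The function name `peak_gflops` and its parameter names are made up for illustration; the numbers are the ones from the 6-core run above.

```python
def peak_gflops(cores, ghz, fma_per_cycle, lanes=8, flops_per_fma=2):
    """Theoretical peak = (FMA/cycle) * (Flops/FMA) * (DP lanes) * cores * GHz.

    lanes=8 is the number of doubles in one 512-bit AVX512 register.
    """
    return fma_per_cycle * flops_per_fma * lanes * cores * ghz


full = peak_gflops(cores=6, ghz=4.5, fma_per_cycle=2)  # full-throughput case
half = peak_gflops(cores=6, ghz=4.5, fma_per_cycle=1)  # half-throughput case
measured = 872.832  # GFlops reported by the benchmark in this thread

print("full:", full, "half:", half)
if abs(measured - full) < abs(measured - half):
    print("measured result is closer to the full-throughput figure")
else:
    print("measured result is closer to the half-throughput figure")
```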
Thanks for running all these benchmarks!
-
4.5Ghz (45x100) on the dot, 1.2vcore with a Corsair H110i AIO.
Will try with my good RAM on Monday (I forgot it at the house) and see if it scales at all.
Mesh was at 3Ghz for the above screenie.
Ram will have no effect on that benchmark. The benchmark is 100% CPU.
I was able to calculate your clock speed because the benchmark achieves very close to the theoretical FLOPs on the system.
For the Core i9 7900X assuming full-throughput AVX512:
(2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (10 cores) * (4.5 GHz) = 1440 GFlops
The benchmark is showing 1443.84 GFlops. It's actually slightly more than the theoretical limit because of timing variations.
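The clock-speed estimate is just the same formula run backwards. A rough sketch in Python, assuming full-throughput AVX512 (2 FMA/cycle) and that the benchmark really does reach close to the theoretical peak; `implied_ghz` is a name invented here, not something from the benchmark:

```python
def implied_ghz(measured_gflops, cores, fma_per_cycle=2, lanes=8, flops_per_fma=2):
    """Back out the clock speed from a measured double-precision FMA result.

    Assumes the benchmark sustains the theoretical peak, so this is only
    an estimate and is sensitive to timing variations.
    """
    flops_per_cycle = fma_per_cycle * flops_per_fma * lanes * cores
    return measured_gflops / flops_per_cycle


# 10-core result quoted above
print(round(implied_ghz(1443.84, cores=10), 3), "GHz")
```

If the chip were only half-throughput, the same measurement would imply twice the clock speed, which is how an implausible answer gives the game away.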
-
Here is my testing, how many Gflops? All the Gflops
Imgur: The most awesome images on the Internet
Will test the 6c on Monday.
Wow! Over 1 TFlops for double-precision!
CPUz doesn't seem accurate in that screenshot. But based on the numbers it looks like you were clocked around 4.5 GHz? Possibly in 100 x 45 configuration?
And it didn't melt?
-
//Pieter update on Elmor test//
RDSEED not supported was on early 12c. Works on 7900X.
Single Precision - Add/subtract
AVX128 = 1.00x
AVX256 = 1.84x
AVX512 = 3.51x
That's good to see! Full output?
Though I'm seeing rumors that the integer throughput will not be doubled. And I can see architecturally why that might be. Unfortunately I don't have a benchmark for that.
-
It was, but still get the same after enabling it.
Would you be able to try with the latest binaries? I updated them last night.
As far as I can tell, I've removed the check. So it should get past that message and either run successfully or crash.
Thanks for your time.
-
I found a way to disable that check by the compiler and I've updated the binaries.
So if anyone is willing to try now, it should (hopefully) work regardless of whether RDSEED is enabled or not.
Thanks.
-
Windows 10 1703 with Intel C++ redists installed.
Thank you!
This is interesting though. The compiler seems to be trying to enforce that the computer has RDSEED instructions. But RDSEED was already available starting from Broadwell. I don't see why it would be missing from Skylake X unless it was explicitly disabled in the BIOS or something.
This might be a problem moving forward since the compiler forces these checks even though most programs won't use them anyway.
EDIT:
Is virtualization disabled in the BIOS? I'm reading around and it seems that some machines have all the crypto instructions disabled (AES-NI, RDRAND, and RDSEED) and it may be related to virtualization.
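For anyone on Linux who wants to check whether the OS actually sees these instructions, the feature flags show up in /proc/cpuinfo. Below is a small Python sketch that looks for them. It is Linux-only (the report above was on Windows 10, where this file does not exist), and the helper names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted=("aes", "rdrand", "rdseed", "avx512f")):
    """List which of the wanted flags are absent, in the order given."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("missing:", missing_features(f.read()))
    except OSError:
        print("no /proc/cpuinfo here (this sketch is Linux-only)")
```

An empty list means all four flags are exposed; if `aes`, `rdrand`, and `rdseed` are all missing together, that would be consistent with the BIOS or hypervisor hiding the crypto instructions.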
-
Bump. NDAs lifting today.
I'm most curious about the 7820X and the 7900X.
EDIT:
The reviews seem to indicate that the 6 and 8-core models will have half-throughput, and the 10-core model will have full-throughput. Microarchitecture Analysis: Adding in AVX-512 and Tweaks to Skylake-S - The Intel Skylake-X Review: Core i9 7900X, i7 7820X and i7 7800X Tested
-
Right now, there are conflicting reports that this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will not have full-throughput AVX512.
-
Skylake-X not support AVX-512 instructions
-
Skylake-X i7-7900X Performance Leaked: 55% faster than i7-6950X @ 4.5GHz
If this is true, the current Skylake X processors will only be able to run AVX512 at half the speed of the server Xeons - IOW, no better than AVX2.
I want to definitively answer this question - both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512.
Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out if Skylake X has full-throughput, or half-throughput AVX512.
So my request is for someone who has a Skylake X sample* to:
- Run the "2017-SkylakePurley" binary here: https://github.com/Mysticial/Flops/tree/master/version3/binaries-windows**
- Do it at a fixed CPU frequency (to avoid the effects of Turbo Boost).
- Do it with HT enabled.
- Don't use an extreme overclock. If the chip has full-throughput AVX512, then those AVX512 instructions may produce more heat than any other benchmark you've ever run.
- Do it with a fully updated Windows 10. Or a recent version of Linux (like Ubuntu 17.04). This is needed to ensure that the OS has support for AVX512.
*I may be wrong, but I don't believe Skylake X benchmarks are under NDA anymore since there's already a gazillion HWBOT submissions and you can get access to the server variants on Google Cloud.
**The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler if you want to build the AVX512 binaries for Windows.
----------------
When you run the benchmark, I expect one of 3 things to happen:
- The binary crashes: This means that Windows 10 does not have support for AVX512 and we'll need to wait for that support.
- The numbers for 512-bit AVX are about the same as the 256-bit AVX: This means that the processor only supports half-throughput AVX512.
- The numbers for the 512-bit AVX are about 2x that of the 256-bit AVX: This means that the processor supports full-throughput AVX512.
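The last two outcomes boil down to one ratio. A hedged Python sketch of that decision rule follows; the 0.25 tolerance is an arbitrary choice made here, not something the benchmark defines, and `classify` is a made-up name.

```python
def classify(gflops_256, gflops_512, tolerance=0.25):
    """Classify AVX512 throughput from matching 256-bit and 512-bit results.

    ratio near 2.0 -> full-throughput, ratio near 1.0 -> half-throughput.
    The tolerance is an assumption; anything in between is left undecided.
    """
    ratio = gflops_512 / gflops_256
    if abs(ratio - 2.0) <= tolerance:
        return "full-throughput"
    if abs(ratio - 1.0) <= tolerance:
        return "half-throughput"
    return "inconclusive"


# Made-up numbers purely to exercise the function:
print(classify(100.0, 200.0))
print(classify(100.0, 101.0))
print(classify(100.0, 150.0))
```

Compare like with like: the 256-bit FMA3 line against the 512-bit FMA line, at the same precision and thread count.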
Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:
Running Skylake Purley tuned binary with 1 thread...

Single-Precision - 128-bit AVX - Add/Sub GFlops = 15.904 Result = 2.02376e+06
Double-Precision - 128-bit AVX - Add/Sub GFlops = 7.952 Result = 1.00995e+06
Single-Precision - 128-bit AVX - Multiply GFlops = 15.936 Result = 2.03498e+06
Double-Precision - 128-bit AVX - Multiply GFlops = 7.968 Result = 1.00712e+06
Single-Precision - 128-bit AVX - Multiply + Add GFlops = 15.936 Result = 1.69085e+06
Double-Precision - 128-bit AVX - Multiply + Add GFlops = 7.968 Result = 841756
Single-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 31.872 Result = 2.02868e+06
Double-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 15.936 Result = 1.01782e+06
Single-Precision - 256-bit AVX - Add/Sub GFlops = 31.808 Result = 4.06688e+06
Double-Precision - 256-bit AVX - Add/Sub GFlops = 15.936 Result = 2.02901e+06
Single-Precision - 256-bit AVX - Multiply GFlops = 31.872 Result = 4.06158e+06
Double-Precision - 256-bit AVX - Multiply GFlops = 15.936 Result = 2.02013e+06
Single-Precision - 256-bit AVX - Multiply + Add GFlops = 31.872 Result = 3.34696e+06
Double-Precision - 256-bit AVX - Multiply + Add GFlops = 15.936 Result = 1.70441e+06
Single-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 63.744 Result = 4.0399e+06
Double-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 31.872 Result = 2.00801e+06
Single-Precision - 512-bit AVX512 - Add/Sub GFlops = 63.744 Result = 8.11456e+06
Double-Precision - 512-bit AVX512 - Add/Sub GFlops = 31.872 Result = 4.03949e+06
Single-Precision - 512-bit AVX512 - Multiply GFlops = 63.36 Result = 8.0743e+06
Double-Precision - 512-bit AVX512 - Multiply GFlops = 31.872 Result = 4.05014e+06
Single-Precision - 512-bit AVX512 - Multiply + Add GFlops = 63.744 Result = 6.68723e+06
Double-Precision - 512-bit AVX512 - Multiply + Add GFlops = 31.872 Result = 3.3739e+06
Single-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 127.488 Result = 8.22848e+06
Double-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 63.744 Result = 4.03805e+06

Running Skylake Purley tuned binary with 64 thread(s)...

Single-Precision - 128-bit AVX - Add/Sub GFlops = 683.36 Result = 8.68179e+07
Double-Precision - 128-bit AVX - Add/Sub GFlops = 263.568 Result = 3.35065e+07
Single-Precision - 128-bit AVX - Multiply GFlops = 527.616 Result = 6.69453e+07
Double-Precision - 128-bit AVX - Multiply GFlops = 263.88 Result = 3.34619e+07
Single-Precision - 128-bit AVX - Multiply + Add GFlops = 527.136 Result = 5.58561e+07
Double-Precision - 128-bit AVX - Multiply + Add GFlops = 263.64 Result = 2.79832e+07
Single-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 1056.77 Result = 6.71142e+07
Double-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 528.336 Result = 3.36188e+07
Single-Precision - 256-bit AVX - Add/Sub GFlops = 1054.14 Result = 1.34076e+08
Double-Precision - 256-bit AVX - Add/Sub GFlops = 527.52 Result = 6.68866e+07
Single-Precision - 256-bit AVX - Multiply GFlops = 1056.77 Result = 1.34416e+08
Double-Precision - 256-bit AVX - Multiply GFlops = 527.664 Result = 6.70251e+07
Single-Precision - 256-bit AVX - Multiply + Add GFlops = 1055.33 Result = 1.12018e+08
Double-Precision - 256-bit AVX - Multiply + Add GFlops = 527.52 Result = 5.59086e+07
Single-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 2110.08 Result = 1.34046e+08
Double-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 1055.33 Result = 6.69451e+07
Single-Precision - 512-bit AVX512 - Add/Sub GFlops = 2112.26 Result = 2.68216e+08
Double-Precision - 512-bit AVX512 - Add/Sub GFlops = 1056 Result = 1.34131e+08
Single-Precision - 512-bit AVX512 - Multiply GFlops = 2117.38 Result = 2.69031e+08
Double-Precision - 512-bit AVX512 - Multiply GFlops = 1059.26 Result = 1.34601e+08
Single-Precision - 512-bit AVX512 - Multiply + Add GFlops = 2118.14 Result = 2.24393e+08
Double-Precision - 512-bit AVX512 - Multiply + Add GFlops = 1058.5 Result = 1.12102e+08
Single-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 4242.43 Result = 2.69409e+08
Double-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 2115.07 Result = 1.34365e+08
This Skylake Purley system has full-throughput AVX512.
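To see where that conclusion comes from, here is a quick Python sketch that divides the 512-bit FMA results by the matching 256-bit FMA3 results from the run above. The GFlops values are copied from that output; the dictionary layout and labels are just for illustration.

```python
# (256-bit FMA3 GFlops, 512-bit FMA GFlops) taken from the Google Cloud run above
fma_results = {
    "1 thread, single-precision": (63.744, 127.488),
    "1 thread, double-precision": (31.872, 63.744),
    "64 threads, single-precision": (2110.08, 4242.43),
    "64 threads, double-precision": (1055.33, 2115.07),
}

ratios = {}
for label, (avx256, avx512) in fma_results.items():
    ratios[label] = avx512 / avx256
    print(f"{label}: 512-bit / 256-bit = {ratios[label]:.3f}")
```

Every ratio comes out at roughly 2x, which is the full-throughput signature. A half-throughput part would show ratios near 1x instead.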
-
No client version given, required version: 0.9.6 ?!
Using 0.9.6.114, submit through the interface.
I can't reproduce the issue. Submissions work fine for me. Can you give me more information? Such as a screenshot of the error?
-
No client version given, required version: 0.9.6 ?!
Using 0.9.6.114, submit through the interface.
I'll investigate tonight. I pushed an update a few days ago, but it's certainly possible that I broke something.
-
Has anyone ever seen a machine with 6TB of memory?
-
As an update, I'm now using the Gigabyte GA-AB350M.
- With BIOS version F2, the system crashes on the flops benchmark.
- With BIOS version F3c, it no longer crashes.
So indeed, this does appear to be fixed.
-
I never tested it with SMT off since my mobo doesn't have that option.
-
I did actually make it in time. Version 0.7.2 has been released. Happy Pi Day everyone!
This new version is faster than the previous version on pretty much all processors that I've tested on.
For internal reasons, you are now required to use the latest submitter version (v0.9.6).
Here are some submissions from the Ryzen build that I used to tune for Zen.
-
Was told this issue will be fixed in a new AGESA code. In other words: it was an AMD issue, not C6H issue.
Thanks for finding this bug @Mysticial!
Wow... Did I really find a bug/errata in the Zen processor? Do I get anything shiny?
-
Before I go through the trouble of trying out Win7, have you guys tried running the benchmark? Did it crash?
(I'm desperately working to get the Zen tuning parameters for y-cruncher v0.7.2 in time for March 14. So I don't really have that much time to keep debugging this.)
-
OK, the SOC voltage is pretty damn low. It should be around 1.00 V minimum; the range is 1.0-1.20, so bump it to gain stability.
The bottom voltage Dram termination voltage should be equal to 50%
The BIOS won't let me set an SOC voltage offset of more than 0.2. It puts a hard limit of 1.0 volts. Perhaps it can be forced higher via the LLC settings. But I'm hesitant to push settings up to the limit of what the BIOS allows.
I'll play around with that a bit tomorrow. But if what you say is true (SOC should be 1.0 - 1.2), and the BIOS number is correct, then perhaps ASUS is simply setting it too low to begin with?
-
can you take a picture of the bios voltages for me, namely dram dram termination and SOC
What's your hypothesis?
-
pull 3 sticks and run
Things I've tried:
- One stick of memory. Crashes both with my Corsair and G.Skill TridentZ.
- Two different video cards.
- Two different installations of Win10 on different devices. (SSD + HD)
The only parts I haven't changed are:
- The CPU. (I only have one Ryzen CPU.)
- The PSU. (I don't have any spare PSUs lying around and it's too much work to take apart my other builds.)
- The motherboard. (I only have one AM4 motherboard.)
Temperatures are always below 80C. So I doubt it's a cooling issue.
-
So, try it differently: what do you see in HWiNFO for the voltage under load? https://www.fosshub.com/HWiNFO.html/hw64_545_3090.zip
It's bad for me, because I don't have the Ryzen test setup here yet (tomorrow, with the Crosshair).
In my theory it could be:
- overheating of the CPU or VRM because the vcore fluctuates too high at auto settings
- BIOS issue
- Windows 10 issue (Win7 seems more ready for Ryzen, as a few guys at another forum wrote)
PS: Is HPET enabled via cmd in Windows?
Enabling/disabling HPET has no effect. Both instantly crash.
Mystical can this program run in win 7? If so can you please try it for me?
I can't install Win7 because the installer doesn't have USB drivers and I don't have a PS2 mouse/keyboard.
-
It's clear, the voltage seems too high... It's possible this voltage is for XFR/turbo. You can try disabling turbo in the BIOS, then run the test again and watch your voltage/temps.
The 1800X is a hot chip at that voltage; the 1700 or 1700X have lower temps at the same voltage.
There's no option to disable XFR or turbo in my BIOS. I don't trust CPUz's vcore reading since it is clearly too high and it conflicts with AI Suite. These are at stock settings, so it shouldn't be getting anywhere near 1.5 anyway.
When I use AI Suite to manually downclock, it seems to disable both the XFR and the turbo and it holds the frequency steady at 2.2 GHz. The vcore seems to stay at a static 1.35 (under load) according to AI Suite. Again CPUz jumps all over the place to as high as 1.550.
But that's beside the point. It really shouldn't be crashing at stock settings - let alone downclocked. Which is why I'm looking for more people to test this on different motherboards and from different manufacturers.
So far I have 3 positive confirmations (crash), and zero negative confirmations (did not crash).
The crashes have these setups - all running at stock and/or underclocked.
- 1800X + Asus Prime B350M-A (BIOS 0502)
- 1700 + Asus Prime B350M-A (BIOS ???)
- 1700 + Asus CrossHair
The unanswered questions that I want to know are:
- Specific to my setup? No - Confirmed by two other people.
- Specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
- Is this an issue with Windows?
- Is this a CPU errata? (I hope not - however unlikely it might be.)
X299 wattage throttling, is it caused by bad cooler?
in Intel CPU Overclocking
Is this throttling only in the clock speeds? As in you can see the throttle happen by watching CPUz and seeing the frequency drop.
I'm noticing on my Gigabyte AORUS 7 that there is a sort of "phantom AVX512 throttle" that disables half the AVX512 while maintaining the same clock speed. So while CPUz shows a constant 4 GHz, the performance (and temperatures) drop when the "AVX512 throttle" kicks in. I can partially get around the throttling by lowering the clock to 3.8 GHz and increasing the TDP limit to 400W. But never was I able to avoid the throttling at or above 4.0 GHz.
I've spoken to Silicon Lottery about this and he says all the Gigabyte boards for X299 have tons of background throttling that make them hard to use. I'm not sure if he's referring to the "AVX512 throttle" or clock speed throttling in general.
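One way to spot this kind of throttle when the reported clock doesn't move is to compare measured FLOPs against the theoretical peak at that clock. A rough Python sketch follows, assuming double-precision FMA on a chip with two 512-bit FMA units. The core count and GFlops figures below are invented for illustration, not measurements from my board.

```python
def avx512_utilization(measured_gflops, cores, reported_ghz):
    """Fraction of full-throughput AVX512 DP peak achieved at the reported clock.

    Peak assumes 2 FMA/cycle * 2 Flops/FMA * 8 doubles per 512-bit register.
    """
    peak = 2 * 2 * 8 * cores * reported_ghz
    return measured_gflops / peak


# Hypothetical 10-core chip that CPUz reports at a constant 4.0 GHz:
for gflops in (1270.0, 640.0):
    fraction = avx512_utilization(gflops, cores=10, reported_ghz=4.0)
    state = "throttled?" if fraction < 0.75 else "looks normal"
    print(f"{gflops} GFlops -> {fraction:.2f} of peak ({state})")
```

A result near 0.5 with an unchanged clock is what "half the AVX512 disabled" would look like; the 0.75 cutoff is an arbitrary threshold chosen for this sketch.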