Mysticial

July 3, 2017

If you see a review using over 1.3 V for full-thread benching and stress testing on an AIO without delidding, then the CPU is throttling for sure. A delidded chip with Conductonaut on both the die and the top of the IHS, under a 360x360mm rad with 9 fans, will hit the mid to high 80s at the same voltage.

Stress testing is tricky on X299 since there is clipping and throttling. You can easily see it by running XTU or R15. You can get 3600 points at 4.9 GHz, and then 5 GHz will pass with a score of 1800 without showing any CPU frequency drop. Same with R15: 2800ish at 4.9 GHz, then 1300 at 5 GHz.

Is this throttling only in the clock speeds? As in you can see the throttle happen by watching CPUz and seeing the frequency drop.

I'm noticing on my Gigabyte AORUS 7 that there is a sort of "phantom AVX512 throttle" that disables half the AVX512 while maintaining the same clock speed. So while CPUz shows a constant 4 GHz, the performance (and temperatures) drop when the "AVX512 throttle" kicks in. I can partially get around the throttling by lowering the clock to 3.8 GHz and increasing the TDP limit to 400W. But I was never able to avoid the throttling at or above 4.0 GHz.

I've spoken to Silicon Lottery about this, and he says all the Gigabyte boards for X299 have tons of background throttling that makes them hard to use. I'm not sure if he's referring to the "AVX512 throttle" or clock speed throttling in general.
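
One rough way to spot this kind of hidden throttling is to compare the score you get against the score that clock scaling alone would predict. The sketch below is only an illustration: the function name and the 0.9 cutoff are made up, and the inputs are the approximate R15 figures quoted above.

```python
def scaling_efficiency(base_clock_ghz, base_score, new_clock_ghz, new_score):
    """Observed score divided by the score expected from clock scaling alone."""
    expected = base_score * (new_clock_ghz / base_clock_ghz)
    return new_score / expected


# Approximate R15 figures from the quoted post: ~2800 at 4.9 GHz, ~1300 at 5.0 GHz.
eff = scaling_efficiency(4.9, 2800, 5.0, 1300)
print(f"scaling efficiency = {eff:.3f}")

# The 0.9 cutoff is an arbitrary illustration, not a measured threshold.
if eff < 0.9:
    print("Score fell well short of clock scaling: something is throttling.")
```

An efficiency near 0.5 with a steady reported clock would be consistent with half the throughput being switched off, but it does not prove which mechanism is responsible.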

July 2, 2017

If anyone's wondering about AVX512: it's coming. No ETA yet, since I've hit a number of unexpected snags.

If you're wondering why the chip is underclocked, there's a reason for that. And it's a long story.

Don't worry, it's much harder to fry your chip with AVX512 than I'm making it sound - unless you disable thermal protection...

July 1, 2017

I'm doing some AVX512 testing right now and it seems that Intel found a very sneaky but ingenious way to do wattage throttling.

More details on that later. And by "later", I mean probably a week from now since it's "quite complicated".

June 24, 2017

6c results 4.5 (45x100) 1.2vcore:

[Imgur screenshot of the benchmark results]

No 8c in office atm

Now THAT's interesting... They also show full-throughput AVX512. That's contrary to what all the articles out there are reporting.

(2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 864 GFlops

Benchmark shows 872.832 GFlops.

If they were only half-throughput, I'd have expected:

(1 FMA/cycle for half-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 432 GFlops
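
The two estimates above (2 FMA/cycle versus 1 FMA/cycle) are the same product with one factor changed. A minimal sketch of that arithmetic, with a helper name that is purely illustrative:

```python
def peak_gflops(fma_per_cycle, lanes, cores, clock_ghz):
    """Theoretical peak = FMA/cycle * 2 flops/FMA * lanes * cores * GHz."""
    flops_per_fma = 2  # one multiply plus one add
    return fma_per_cycle * flops_per_fma * lanes * cores * clock_ghz


# 6 cores at 4.5 GHz, 8 double-precision lanes per 512-bit instruction.
full = peak_gflops(fma_per_cycle=2, lanes=8, cores=6, clock_ghz=4.5)
half = peak_gflops(fma_per_cycle=1, lanes=8, cores=6, clock_ghz=4.5)
measured = 872.832  # figure reported in the post above

print(f"full-throughput peak: {full} GFlops")
print(f"half-throughput peak: {half} GFlops")

# Pick whichever theoretical value the measurement sits closer to.
closer = "full" if abs(measured - full) < abs(measured - half) else "half"
print(f"measured {measured} GFlops is closer to the {closer}-throughput peak")
```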

Thanks for running all these benchmarks!

June 24, 2017

4.5Ghz (45x100) on the dot, 1.2vcore with a Corsair H110i AIO.

Will try with my good RAM on Monday (I forgot it at the house) and see if it scales at all.

Mesh was at 3Ghz for the above screenie.

Ram will have no effect on that benchmark. The benchmark is 100% CPU.

I was able to calculate your clock speed because the benchmark achieves very close to the theoretical FLOPs on the system.

For the Core i9 7900X assuming full-throughput AVX512:

(2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (10 cores) * (4.5 GHz) = 1440 GFlops

The benchmark is showing 1443.84 GFlops. It's actually slightly more than the theoretical limit because of timing variations.
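
Running the same formula backwards gives the clock speed implied by a measured result. This is a sketch assuming full-throughput AVX512 (2 FMA/cycle) and double precision; the function name is illustrative.

```python
def implied_clock_ghz(measured_gflops, cores, fma_per_cycle=2, lanes=8):
    """Clock speed (GHz) implied by a measured GFlops figure."""
    flops_per_cycle = fma_per_cycle * 2 * lanes * cores  # 2 flops per FMA
    return measured_gflops / flops_per_cycle


# Core i9 7900X: 10 cores, benchmark reported 1443.84 GFlops.
clock = implied_clock_ghz(1443.84, cores=10)
print(f"implied clock = {clock:.3f} GHz")
```

The result lands a hair above 4.5 GHz, which matches the remark about timing variations pushing the measurement slightly over the theoretical limit.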

June 24, 2017

Here is my testing, how many Gflops? All the Gflops

[Imgur screenshot of the benchmark results]

Will test the 6c on Monday.

Wow! Over 1 TFlops for double-precision! :eek:

CPUz doesn't seem accurate in that screenshot. But based on the numbers it looks like you were clocked around 4.5 GHz? Possibly in 100 x 45 configuration?

And it didn't melt?

June 23, 2017

//Pieter update on Elmor test//

RDSEED not supported was on early 12c. Works on 7900X.

Single Precision - Add/subtract

AVX128 = 1.00x

AVX256 = 1.84x

AVX512 = 3.51x

That's good to see! Full output?

Though I'm seeing rumors that the integer throughput will not be doubled. And I can see architecturally why that might be. Unfortunately I don't have a benchmark for that.

June 22, 2017

It was, but still get the same after enabling it.

Would you be able to try with the latest binaries? I updated them last night.

As far as I can tell, I've removed the check. So it should get past that message and either run successfully or crash.

Thanks for your time.

June 22, 2017

I found a way to disable that check by the compiler and I've updated the binaries.

So if anyone is willing to try now, it should (hopefully) work regardless of whether RDSEED is enabled or not.

Thanks.

June 21, 2017

Windows 10 1703 with Intel C++ redists installed.

Thank you!

This is interesting though. The compiler seems to be trying to enforce that the computer has RDSEED instructions. But RDSEED was already available starting from Broadwell. I don't see why it would be missing from Skylake X unless it was explicitly disabled in the BIOS or something.

This might be a problem moving forward since the compiler forces these checks even though most programs won't use them anyway.

EDIT:

Is virtualization disabled in the BIOS? I'm reading around and it seems that some machines have all the crypto instructions disabled (AES-NI, RDRAND, and RDSEED) and it may be related to virtualization.
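
For anyone testing on Linux rather than Windows, one way to see whether these instructions are exposed is to look at the flags line in /proc/cpuinfo. The sketch below only parses text passed to it; the sample string is made up for illustration, and the helper names are not from any existing tool.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_features(flags, required=("rdseed", "rdrand", "aes", "avx512f")):
    """List the required flags that the CPU/OS does not report."""
    return [name for name in required if name not in flags]


# Made-up sample, NOT taken from a real Skylake X machine.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 aes rdrand avx2 avx512f\n"
print(missing_features(cpu_flags(SAMPLE)))  # prints ['rdseed']
```

On a real Linux system the input would be the contents of /proc/cpuinfo, for example `cpu_flags(open("/proc/cpuinfo").read())`.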

June 19, 2017

Bump. NDAs lifting today.

I'm most curious about the 7820X and the 7900X.

EDIT:

The reviews seem to indicate that the 6 and 8-core models will have half-throughput, and the 10-core model will have full-throughput. See "Microarchitecture Analysis: Adding in AVX-512 and Tweaks to Skylake-S" in "The Intel Skylake-X Review: Core i9 7900X, i7 7820X and i7 7800X Tested".

June 8, 2017

Right now, there are conflicting reports on whether this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will have full-throughput AVX512.

If it does not, the current Skylake X processors will only be able to run AVX512 at half the speed of the server Xeons - IOW, no better than AVX2.

I want to definitively answer this question - both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512.

Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out if Skylake X has full-throughput, or half-throughput AVX512.

So my request is for someone who has a Skylake X sample* to:

Run the "2017-SkylakePurley" binary here: https://github.com/Mysticial/Flops/tree/master/version3/binaries-windows**
Do it at a fixed CPU frequency (to avoid the effects of Turbo Boost).
Do it with HT enabled.
Don't use an extreme overclock. If the chip has full-throughput AVX512, then those AVX512 instructions may produce more heat than any other benchmark you've ever run.
Do it with a fully updated Windows 10. Or a recent version of Linux (like Ubuntu 17.04). This is needed to ensure that the OS has support for AVX512.

*I may be wrong, but I don't believe Skylake X benchmarks are under NDA anymore since there's already a gazillion HWBOT submissions and you can get access to the server variants on Google Cloud.

**The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler if you want to build the AVX512 binaries for Windows.

----------------

When you run the benchmark, I expect one of 3 things to happen:

The binary crashes: This means that Windows 10 does not have support for AVX512 and we'll need to wait for that support.
The numbers for 512-bit AVX are about the same as the 256-bit AVX: This means that the processor only supports half-throughput AVX512.
The numbers for the 512-bit AVX are about 2x that of the 256-bit AVX: This means that the processor supports full-throughput AVX512.
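
The last two outcomes boil down to the ratio of the 512-bit number to the 256-bit number for the same operation. A minimal sketch of that decision, where the function name, the tolerance, and the example inputs are all hypothetical:

```python
def classify_avx512(gflops_256, gflops_512, tolerance=0.25):
    """Classify AVX512 throughput from the 512-bit / 256-bit GFlops ratio."""
    ratio = gflops_512 / gflops_256
    if abs(ratio - 2.0) <= tolerance:
        return "full-throughput"
    if abs(ratio - 1.0) <= tolerance:
        return "half-throughput"
    return "inconclusive"  # e.g. throttling or a clock change mid-run


# Hypothetical inputs, chosen only to exercise each branch.
print(classify_avx512(500.0, 1000.0))  # prints full-throughput
print(classify_avx512(500.0, 510.0))   # prints half-throughput
print(classify_avx512(500.0, 750.0))   # prints inconclusive
```

The "inconclusive" branch is there because a ratio that is neither about 1x nor about 2x usually means the clock was not actually fixed during the run.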

Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:

Running Skylake Purley tuned binary with 1 thread...

Single-Precision - 128-bit AVX - Add/Sub
   GFlops = 15.904
   Result = 2.02376e+06

Double-Precision - 128-bit AVX - Add/Sub
   GFlops = 7.952
   Result = 1.00995e+06

Single-Precision - 128-bit AVX - Multiply
   GFlops = 15.936
   Result = 2.03498e+06

Double-Precision - 128-bit AVX - Multiply
   GFlops = 7.968
   Result = 1.00712e+06

Single-Precision - 128-bit AVX - Multiply + Add
   GFlops = 15.936
   Result = 1.69085e+06

Double-Precision - 128-bit AVX - Multiply + Add
   GFlops = 7.968
   Result = 841756

Single-Precision - 128-bit FMA3 - Fused Multiply Add
   GFlops = 31.872
   Result = 2.02868e+06

Double-Precision - 128-bit FMA3 - Fused Multiply Add
   GFlops = 15.936
   Result = 1.01782e+06

Single-Precision - 256-bit AVX - Add/Sub
   GFlops = 31.808
   Result = 4.06688e+06

Double-Precision - 256-bit AVX - Add/Sub
   GFlops = 15.936
   Result = 2.02901e+06

Single-Precision - 256-bit AVX - Multiply
   GFlops = 31.872
   Result = 4.06158e+06

Double-Precision - 256-bit AVX - Multiply
   GFlops = 15.936
   Result = 2.02013e+06

Single-Precision - 256-bit AVX - Multiply + Add
   GFlops = 31.872
   Result = 3.34696e+06

Double-Precision - 256-bit AVX - Multiply + Add
   GFlops = 15.936
   Result = 1.70441e+06

Single-Precision - 256-bit FMA3 - Fused Multiply Add
   GFlops = 63.744
   Result = 4.0399e+06

Double-Precision - 256-bit FMA3 - Fused Multiply Add
   GFlops = 31.872
   Result = 2.00801e+06

Single-Precision - 512-bit AVX512 - Add/Sub
   GFlops = 63.744
   Result = 8.11456e+06

Double-Precision - 512-bit AVX512 - Add/Sub
   GFlops = 31.872
   Result = 4.03949e+06

Single-Precision - 512-bit AVX512 - Multiply
   GFlops = 63.36
   Result = 8.0743e+06

Double-Precision - 512-bit AVX512 - Multiply
   GFlops = 31.872
   Result = 4.05014e+06

Single-Precision - 512-bit AVX512 - Multiply + Add
   GFlops = 63.744
   Result = 6.68723e+06

Double-Precision - 512-bit AVX512 - Multiply + Add
   GFlops = 31.872
   Result = 3.3739e+06

Single-Precision - 512-bit AVX512 - Fused Multiply Add
   GFlops = 127.488
   Result = 8.22848e+06

Double-Precision - 512-bit AVX512 - Fused Multiply Add
   GFlops = 63.744
   Result = 4.03805e+06


Running Skylake Purley tuned binary with 64 thread(s)...

Single-Precision - 128-bit AVX - Add/Sub
   GFlops = 683.36
   Result = 8.68179e+07

Double-Precision - 128-bit AVX - Add/Sub
   GFlops = 263.568
   Result = 3.35065e+07

Single-Precision - 128-bit AVX - Multiply
   GFlops = 527.616
   Result = 6.69453e+07

Double-Precision - 128-bit AVX - Multiply
   GFlops = 263.88
   Result = 3.34619e+07

Single-Precision - 128-bit AVX - Multiply + Add
   GFlops = 527.136
   Result = 5.58561e+07

Double-Precision - 128-bit AVX - Multiply + Add
   GFlops = 263.64
   Result = 2.79832e+07

Single-Precision - 128-bit FMA3 - Fused Multiply Add
   GFlops = 1056.77
   Result = 6.71142e+07

Double-Precision - 128-bit FMA3 - Fused Multiply Add
   GFlops = 528.336
   Result = 3.36188e+07

Single-Precision - 256-bit AVX - Add/Sub
   GFlops = 1054.14
   Result = 1.34076e+08

Double-Precision - 256-bit AVX - Add/Sub
   GFlops = 527.52
   Result = 6.68866e+07

Single-Precision - 256-bit AVX - Multiply
   GFlops = 1056.77
   Result = 1.34416e+08

Double-Precision - 256-bit AVX - Multiply
   GFlops = 527.664
   Result = 6.70251e+07

Single-Precision - 256-bit AVX - Multiply + Add
   GFlops = 1055.33
   Result = 1.12018e+08

Double-Precision - 256-bit AVX - Multiply + Add
   GFlops = 527.52
   Result = 5.59086e+07

Single-Precision - 256-bit FMA3 - Fused Multiply Add
   GFlops = 2110.08
   Result = 1.34046e+08

Double-Precision - 256-bit FMA3 - Fused Multiply Add
   GFlops = 1055.33
   Result = 6.69451e+07

Single-Precision - 512-bit AVX512 - Add/Sub
   GFlops = 2112.26
   Result = 2.68216e+08

Double-Precision - 512-bit AVX512 - Add/Sub
   GFlops = 1056
   Result = 1.34131e+08

Single-Precision - 512-bit AVX512 - Multiply
   GFlops = 2117.38
   Result = 2.69031e+08

Double-Precision - 512-bit AVX512 - Multiply
   GFlops = 1059.26
   Result = 1.34601e+08

Single-Precision - 512-bit AVX512 - Multiply + Add
   GFlops = 2118.14
   Result = 2.24393e+08

Double-Precision - 512-bit AVX512 - Multiply + Add
   GFlops = 1058.5
   Result = 1.12102e+08

Single-Precision - 512-bit AVX512 - Fused Multiply Add
   GFlops = 4242.43
   Result = 2.69409e+08

Double-Precision - 512-bit AVX512 - Fused Multiply Add
   GFlops = 2115.07
   Result = 1.34365e+08

This Skylake Purley system has full-throughput AVX512.
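
That conclusion can be checked mechanically from the output. The sketch below parses the benchmark's text format as it appears above (a test name followed by a "GFlops = ..." line) and compares the 512-bit and 256-bit FMA results. The sample contains two entries copied from the 64-thread run; the parser itself is an illustration, not part of the Flops project.

```python
import re

# Two entries copied from the 64-thread run above.
SAMPLE = """\
Double-Precision - 256-bit FMA3 - Fused Multiply Add
   GFlops = 1055.33
   Result = 6.69451e+07

Double-Precision - 512-bit AVX512 - Fused Multiply Add
   GFlops = 2115.07
   Result = 1.34365e+08
"""

# A test name on one line, followed by an indented "GFlops = <number>" line.
PATTERN = re.compile(r"^(\S[^\n]*)\n\s*GFlops = ([\d.]+)", re.MULTILINE)


def parse_gflops(text):
    """Map each test name to its GFlops value (later duplicates overwrite)."""
    return {m.group(1).strip(): float(m.group(2)) for m in PATTERN.finditer(text)}


results = parse_gflops(SAMPLE)
ratio = (results["Double-Precision - 512-bit AVX512 - Fused Multiply Add"]
         / results["Double-Precision - 256-bit FMA3 - Fused Multiply Add"])
print(f"512-bit / 256-bit FMA ratio = {ratio:.2f}")  # prints 2.00
```

Since the 1-thread and 64-thread sections use the same test names, a full log would need to be split per section before parsing.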

March 25, 2017

No client version given, required version: 0.9.6 ?!

Using 0.9.6.114, submit through the interface.

I can't reproduce the issue. Submissions work fine for me. Can you give me more information? Such as a screenshot of the error?

March 24, 2017

No client version given, required version: 0.9.6 ?!

Using 0.9.6.114, submit through the interface.

I'll investigate tonight. I pushed an update a few days ago, but it's certainly possible that I broke something.

March 23, 2017

Has anyone ever seen a machine with 6TB of memory?

March 22, 2017

As an update, I'm now using the Gigabyte GA-AB350M.

With BIOS version F2, the system crashes on the flops benchmark.
With BIOS version F3c, it no longer crashes.

So indeed, this does appear to be fixed.

March 16, 2017

I never tested it with SMT off since my mobo doesn't have that option.

March 14, 2017

I did actually make it in time. Version 0.7.2 has been released. Happy Pi Day everyone!

This new version is faster than the previous version on pretty much all processors that I've tested on.

For internal reasons, you are now required to use the latest submitter version (v0.9.6).

Here are some submissions from the Ryzen build that I used to tune for Zen.

March 13, 2017

Was told this issue will be fixed in a new AGESA code. In other words: it was an AMD issue, not a C6H issue.

Thanks for finding this bug @Mysticial!

Wow... Did I really find a bug/errata in the Zen processor? Do I get anything shiny?

March 8, 2017

Before I go through the trouble of trying out Win7: have you guys tried running the benchmark? Did it crash?

(I'm desperately working to get the Zen tuning parameters for y-cruncher v0.7.2 in time for March 14. So I don't really have that much time to keep debugging this.)

March 8, 2017

OK, the SOC voltage is pretty damn low. It should be around 1.00 minimum, and the range is 1.0-1.20; bump it to gain stability.

The bottom voltage Dram termination voltage should be equal to 50%

The BIOS won't let me set an SOC voltage offset of more than 0.2. It puts a hard limit of 1.0 volts. Perhaps it can be forced higher via the LLC settings. But I'm hesitant to put settings up to the limit of what the BIOS allows.

I'll play around with that a bit tomorrow. But if what you say is true (SOC should be 1.0 - 1.2), and the BIOS number is correct, then perhaps ASUS is simply setting it too low to begin with?

March 8, 2017

Can you take a picture of the BIOS voltages for me, namely DRAM, DRAM termination, and SOC?

What's your hypothesis?

March 8, 2017

pull 3 sticks and run

Things I've tried:

One stick of memory. Crashes both with my Corsair and G.Skill TridentZ.
Two different video cards.
Two different installations of Win10 on different devices. (SSD + HD)

The only parts I haven't changed are:

The CPU. (I only have one Ryzen CPU.)
The PSU. (I don't have any spare PSUs lying around and it's too much work to take apart my other builds.)
The motherboard. (I only have one AM4 motherboard.)

Temperatures are always below 80C. So I doubt it's a cooling issue.

March 8, 2017

So, try it differently: what do you see in HWiNFO for voltage under load? https://www.fosshub.com/HWiNFO.html/hw64_545_3090.zip

It's bad for me, because I don't have the Ryzen test setup here yet (tomorrow, with the Crosshair).

In my theory it could be:

-overheating of the CPU or VRM, because the vcore fluctuates too high at auto settings

-BIOS issue

-Windows 10 issue (Win7 seems more ready for Ryzen, as a few guys at another forum wrote)

PS: Is HPET enabled via cmd in Windows?

Enabling/disabling HPET has no effect. Both instantly crash.

Mystical can this program run in win 7? If so can you please try it for me?

I can't install Win7 because the installer doesn't have USB drivers and I don't have a PS2 mouse/keyboard.

March 6, 2017

It's clear: the voltage seems too high... It's possible this voltage is for XFR/turbo. You can try disabling turbo in the BIOS, then run the test again and watch your voltage/temps.
The 1800X is a hot chip at that voltage; the 1700 or 1700X have lower temps at the same voltage.

There's no option to disable XFR or turbo in my BIOS. I don't trust CPUz's vcore reading since it is clearly too high and it conflicts with AI Suite. These are at stock settings, so it shouldn't be getting anywhere near 1.5 anyway.

When I use AI Suite to manually downclock, it seems to disable both the XFR and the turbo and it holds the frequency steady at 2.2 GHz. The vcore seems to stay at a static 1.35 (under load) according to AI Suite. Again, CPUz jumps all over the place to as high as 1.550.

But that's beside the point. It really shouldn't be crashing at stock settings - let alone when downclocked. Which is why I'm looking for more people to test this on different motherboards and from different manufacturers.

So far I have 3 positive confirmations (crash), and zero negative confirmations (did not crash).

The crashes have these setups - all running at stock and/or underclocked.

1800X + Asus Prime B350M-A (BIOS 0502)
1700 + Asus Prime B350M-A (BIOS ???)
1700 + Asus CrossHair

The unanswered questions that I want to know are:

Specific to my setup? No - Confirmed by two other people.
Specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
Is this an issue with Windows?
Is this a CPU errata? (I hope not - however unlikely it might be.)

Posts posted by Mysticial

X299 wattage throttling, is it caused by bad cooler?

Math turns benchmark: y-cruncher meets HWBOT

A Favor to Ask: Skylake X and AVX512

Ryzen 1800X - Instant system crash when running sequence of FMA3 instructions. Request for verification.
