HWBOT Community Forums
Mysticial

Ryzen 1800X - Instant system crash when running sequence of FMA3 instructions. Request for verification.


One of my internal benchmark applications is insta-hard-freezing on Ryzen.

 

  • Ryzen 7 1800X
  • Asus Prime B350M-A (BIOS 0502)
  • 4 x 8GB Corsair CMK32GX4M4A2400C14 @ 2133 MHz
  • Nothing is overclocked. Everything is stock.
  • Windows 10 Anniversary Update

 

When I run the Haswell binary from here: https://github.com/Mysticial/Flops/tree/master/version2/binaries-windows

 

The entire system usually freezes when it gets to:

 

Single-Precision - 128-bit FMA3 - Fused Multiply Add:

 

Sometimes, it will make it past that, but it usually ends up crashing/freezing later on in the test anyway.

 

For those who don't trust the binary, the program is completely open-sourced in that GitHub repo. If you have Visual Studio installed: Open the project, build the x64 Haswell binary, and run.

 

For me this always hard freezes the computer:

  • At all clock speeds.
  • When running single-threaded, it happens to any core that I pin it to.
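Pinning a single-threaded run to one core at a time can be scripted. This is a minimal sketch, assuming Linux, where Python exposes `os.sched_setaffinity`; the thread itself is on Windows, where Task Manager or `start /affinity` serves the same purpose. The helper name `pin_to_core` is hypothetical and not part of the benchmark.

```python
import os


def pin_to_core(core):
    """Restrict the current process to one logical core; return the new affinity set.

    Returns None where the Linux-only affinity calls are unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)
    for core in sorted(original):
        print("pinned to", pin_to_core(core))
        # A real test would launch the workload here, once per core.
    os.sched_setaffinity(0, original)  # restore the starting affinity
```

Child processes inherit the affinity, so a benchmark launched from inside the loop would stay on the chosen core.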

 

The questions that I want to answer are:

  1. Is this specific to my setup? No - Confirmed by multiple other people.
  2. Is this specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
  3. Is this an issue with Windows? The crash does not seem to happen in Linux, but that is with slightly different code due to differing compilers.
  4. Is this a CPU erratum? (I hope not - however unlikely it might be.)

 

---------------------------

 

Current Testing Status:

 

All of these are running Windows, and are at stock settings or underclocked.

 

Confirmed Crashes:

  1. 1800X + Asus Prime B350M-A (BIOS 0502)
  2. 1700 + Asus Prime B350M-A (BIOS ???)
  3. 1700 + Asus Crosshair VI Hero
  4. 1700 + Asus Crosshair VI Hero (BIOS 5803) (two sets of memory G.Skill + Kingston - also fails with overvolted SOC)
  5. 1800X + Asus Crosshair VI Hero (Windows 7) - One pass, mostly failures.

 

Confirmed No-Crash:

  1. none yet

 

 

For those interested in the technical details, I'm getting hard freezes for all types of FMAs (128-bit, 256-bit, single and double precision). But for some reason, it only affects this particular benchmark. Other programs (like prime95 and y-cruncher) aren't affected despite using FMAs.
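For readers unfamiliar with the term: a fused multiply-add computes a*b + c with a single rounding at the end, instead of rounding the product first. The sketch below emulates that for scalar doubles using exact rational arithmetic. It only illustrates the arithmetic and is not the benchmark's code, which uses packed 128-bit and 256-bit hardware instructions. The function name `fused_multiply_add` and the input values are chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = a * b + c                  # the product is rounded before the add
fused = fused_multiply_add(a, b, c)  # single rounding at the end

print(unfused)
print(fused)
```

With these inputs the exact product is 1 - 2^-54, which rounds to 1.0 as a double, so the unfused result loses the small term that the fused result keeps.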

 

---------------------------

 

Update 3/16/2017:

 

As much as I had least expected this to be the case, this appears to have been confirmed as an erratum in the AMD Zen processor. In other words, the last item on my list (and the most serious). Fortunately, it's one that is fixable with a microcode update and will not result in something catastrophic like a recall or the disabling of features.

 

To everyone pouring in from the various news sites:

  • The important part is that a user mode program should not be able to hard freeze the entire system. Because if it can (as is the case here), it makes it possible to perform DoS (denial-of-service) attacks. IOW, this erratum is a security issue.
  • Don't be fooled by the "Haswell binary". The benchmark is 5 years old and I've largely neglected it for the last 3, so I haven't updated it for Zen yet. Any processor will be able to run any of the binaries if it supports the underlying instruction sets. If it doesn't, the program merely crashes with an "illegal instruction" error. Under no circumstances should a user-mode application be able to bring down an entire system.

Edited by Mysticial
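The point about instruction-set support can be checked before launching a binary. This is a minimal sketch, assuming a Linux-style `/proc/cpuinfo` `flags` line; the thread itself is on Windows, where CPU-Z or a similar tool reports the same information. The function name `missing_features`, the sample flags string and the list of required flags are all hypothetical, not taken from the benchmark.

```python
def missing_features(flags_line, required):
    """Return the required CPU feature flags absent from a cpuinfo-style flags line."""
    # Accept either the bare flag list or the full "flags : ..." line.
    _, _, tail = flags_line.partition(":")
    present = set((tail or flags_line).split())
    return sorted(set(required) - present)


# Hypothetical, abbreviated flags line for illustration only.
sample = "flags : fpu sse sse2 avx avx2 fma"
print(missing_features(sample, ["avx2", "fma"]))     # nothing missing
print(missing_features(sample, ["avx512f", "fma"]))  # avx512f is missing
```

An empty result means the binary's instructions are all available. A non-empty result predicts an "illegal instruction" crash of the program, never a system freeze.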


Confirmed at stock clocks on a Ryzen 1700 on a Crosshair motherboard. The system becomes unresponsive, with the DRAM LED flashing and 8 on the Q-code display.


Uh oh... This doesn't look good. I also have one other confirmation on a different forum.

 

Other things to note: It doesn't always freeze instantly. I have a different Win10 installation that sometimes manages to survive the first FMA test only to crash on the second.

 

The crash doesn't reproduce in Linux, but the code for Linux is slightly different since it uses a different compiler.


Guys, do you have the latest BIOSes for your motherboards? What's your voltage in BIOS or CPU-Z under load?


 

For me yes. BIOS 0502 (February 28)

 

The BIOS and AI Suite show a vcore of 1.350. CPUz shows it as 1.550. And it also happens when underclocked to 2.2 GHz.

 

The Windows Event Log occasionally manages to record which core it crashes on. It's pretty random among all 16 vcores. There's no single core that it always happens to. IOW, I don't see any signs of weakness to a specific core.


It's clear: the voltage seems too high... It's possible this voltage is for XFR/turbo. You can try disabling turbo in the BIOS, then run the test again and watch your voltage/temps.

The 1800X is a hot chip at that voltage; the 1700 or 1700X have lower temps at the same voltage.


 

There's no option to disable XFR or turbo in my BIOS. I don't trust CPUz's vcore reading since it is clearly too high and it conflicts with AI Suite. These are at stock settings, so it shouldn't be getting anywhere near 1.5 anyway.

 

When I use AI Suite to manually downclock, it seems to disable both the XFR and the turbo and it holds the frequency steady at 2.2 GHz. The vcore seems to stay at a static 1.35 (under load) according to AI Suite. Again, CPUz jumps all over the place to as high as 1.550.

 

But that's beside the point. It really shouldn't be crashing at stock settings - let alone downclocked. Which is why I'm looking for more people to test this on different motherboards and from different manufacturers.

 

So far I have 3 positive confirmations (crash), and zero negative confirmations (did not crash).

 

The crashes have these setups - all running at stock and/or underclocked.

  1. 1800X + Asus Prime B350M-A (BIOS 0502)
  2. 1700 + Asus Prime B350M-A (BIOS ???)
  3. 1700 + Asus CrossHair

 

The unanswered questions that I want to know are:

  1. Specific to my setup? No - Confirmed by two other people.
  2. Specific to Asus mobos or an immature BIOS? If so, can it be fixed with a later BIOS?
  3. Is this an issue with Windows?
  4. Is this a CPU erratum? (I hope not - however unlikely it might be.)

Edited by Mysticial


So, try it differently: what does HWiNFO show for voltage under load? https://www.fosshub.com/HWiNFO.html/hw64_545_3090.zip

 

Unfortunately, I don't have my Ryzen test setup here yet (tomorrow, with a Crosshair).

In my theory it could be:

- overheating of the CPU or VRM because the vcore is fluctuating too high at auto settings

- BIOS issue

- Windows 10 issue (Win7 seems more ready for Ryzen, as a few guys on another forum wrote)

 

PS: Is HPET enabled via cmd in Windows?

Edited by flanker


Mysticial, can this program run on Win 7? If so, can you please try it for me?

 

flanker, I have an idea that it's not related to that stuff, nor to SMT.

 

An erratum I doubt; possibly something is misinformed as to how much cache really exists, or the memory.

 

Before you go rip out a Windows install, put in one quick test. Use 1 stick, please run it, and tell me if the sticks are double sided and if it runs.

 

Also, what is the default SOC voltage on that board?

Edited by chew*


 

Enabling/disabling HPET has no effect. Both instantly crash.

 


I can't install Win7 because the installer doesn't have USB drivers and I don't have a PS2 mouse/keyboard.


 

Pull 3 sticks and run.


 

Things I've tried:

 

  • One stick of memory. Crashes both with my Corsair and G.Skill TridentZ.
  • Two different video cards.
  • Two different installations of Win10 on different devices. (SSD + HD)

 

The only parts I haven't changed are:

  • The CPU. (I only have one Ryzen CPU.)
  • The PSU. (I don't have any spare PSUs lying around and it's too much work to take apart my other builds.)
  • The motherboard. (I only have one AM4 motherboard.)

 

Temperatures are always below 80C. So I doubt it's a cooling issue.


 

Can you take a picture of the BIOS voltages for me, namely DRAM, DRAM termination, and SOC?


 

What's your hypothesis?


The reason I wanted you to test win 7 is.....

 

Logical Processor to Cache Map:

*--------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64

*--------------- Instruction Cache 0, Level 1, 64 KB, Assoc 4, LineSize 64

*--------------- Unified Cache 0, Level 2, 512 KB, Assoc 8, LineSize 64

*--------------- Unified Cache 1, Level 3, 16 MB, Assoc 16, LineSize 64

-*-------------- Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64

-*-------------- Instruction Cache 1, Level 1, 64 KB, Assoc 4, LineSize 64

-*-------------- Unified Cache 2, Level 2, 512 KB, Assoc 8, LineSize 64

-*-------------- Unified Cache 3, Level 3, 16 MB, Assoc 16, LineSize 64

--*------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64

--*------------- Instruction Cache 2, Level 1, 64 KB, Assoc 4, LineSize 64

--*------------- Unified Cache 4, Level 2, 512 KB, Assoc 8, LineSize 64

--*------------- Unified Cache 5, Level 3, 16 MB, Assoc 16, LineSize 64

---*------------ Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64

---*------------ Instruction Cache 3, Level 1, 64 KB, Assoc 4, LineSize 64

---*------------ Unified Cache 6, Level 2, 512 KB, Assoc 8, LineSize 64

---*------------ Unified Cache 7, Level 3, 16 MB, Assoc 16, LineSize 64

----*----------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64

----*----------- Instruction Cache 4, Level 1, 64 KB, Assoc 4, LineSize 64

----*----------- Unified Cache 8, Level 2, 512 KB, Assoc 8, LineSize 64

----*----------- Unified Cache 9, Level 3, 16 MB, Assoc 16, LineSize 64

-----*---------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64

-----*---------- Instruction Cache 5, Level 1, 64 KB, Assoc 4, LineSize 64

-----*---------- Unified Cache 10, Level 2, 512 KB, Assoc 8, LineSize 64

-----*---------- Unified Cache 11, Level 3, 16 MB, Assoc 16, LineSize 64

------*--------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64

------*--------- Instruction Cache 6, Level 1, 64 KB, Assoc 4, LineSize 64

------*--------- Unified Cache 12, Level 2, 512 KB, Assoc 8, LineSize 64

------*--------- Unified Cache 13, Level 3, 16 MB, Assoc 16, LineSize 64

-------*-------- Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64

-------*-------- Instruction Cache 7, Level 1, 64 KB, Assoc 4, LineSize 64

-------*-------- Unified Cache 14, Level 2, 512 KB, Assoc 8, LineSize 64

-------*-------- Unified Cache 15, Level 3, 16 MB, Assoc 16, LineSize 64

--------*------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64

--------*------- Instruction Cache 8, Level 1, 64 KB, Assoc 4, LineSize 64

--------*------- Unified Cache 16, Level 2, 512 KB, Assoc 8, LineSize 64

--------*------- Unified Cache 17, Level 3, 16 MB, Assoc 16, LineSize 64

---------*------ Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64

---------*------ Instruction Cache 9, Level 1, 64 KB, Assoc 4, LineSize 64

---------*------ Unified Cache 18, Level 2, 512 KB, Assoc 8, LineSize 64

---------*------ Unified Cache 19, Level 3, 16 MB, Assoc 16, LineSize 64

----------*----- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64

----------*----- Instruction Cache 10, Level 1, 64 KB, Assoc 4, LineSize 64

----------*----- Unified Cache 20, Level 2, 512 KB, Assoc 8, LineSize 64

----------*----- Unified Cache 21, Level 3, 16 MB, Assoc 16, LineSize 64

-----------*---- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64

-----------*---- Instruction Cache 11, Level 1, 64 KB, Assoc 4, LineSize 64

-----------*---- Unified Cache 22, Level 2, 512 KB, Assoc 8, LineSize 64

-----------*---- Unified Cache 23, Level 3, 16 MB, Assoc 16, LineSize 64

------------*--- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64

------------*--- Instruction Cache 12, Level 1, 64 KB, Assoc 4, LineSize 64

------------*--- Unified Cache 24, Level 2, 512 KB, Assoc 8, LineSize 64

------------*--- Unified Cache 25, Level 3, 16 MB, Assoc 16, LineSize 64

-------------*-- Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64

-------------*-- Instruction Cache 13, Level 1, 64 KB, Assoc 4, LineSize 64

-------------*-- Unified Cache 26, Level 2, 512 KB, Assoc 8, LineSize 64

-------------*-- Unified Cache 27, Level 3, 16 MB, Assoc 16, LineSize 64

--------------*- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64

--------------*- Instruction Cache 14, Level 1, 64 KB, Assoc 4, LineSize 64

--------------*- Unified Cache 28, Level 2, 512 KB, Assoc 8, LineSize 64

--------------*- Unified Cache 29, Level 3, 16 MB, Assoc 16, LineSize 64

---------------* Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64

---------------* Instruction Cache 15, Level 1, 64 KB, Assoc 4, LineSize 64

---------------* Unified Cache 30, Level 2, 512 KB, Assoc 8, LineSize 64

---------------* Unified Cache 31, Level 3, 16 MB, Assoc 16, LineSize 64

 

Each Zen thread is being registered as an individual core with its own L2 and L3 cache.

 

I have a weird feeling that this and some other gremlins some are experiencing could be related......
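The observation about the cache map can be tallied mechanically. This is a minimal sketch that parses lines in the format shown in the listing and counts how many caches of each level are reported, along with their combined size. The regular expression and the function name `summarize` are assumptions made for illustration, and the sample holds only the first two logical processors from the listing.

```python
import re
from collections import Counter

LINE = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>Data|Instruction|Unified) Cache\s+\d+, "
    r"Level (?P<level>\d+), (?P<size>\d+) (?P<unit>KB|MB)"
)


def summarize(cache_map):
    """Count reported caches and total reported size (in KB) per (level, kind)."""
    count, total_kb = Counter(), Counter()
    for line in cache_map.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrelated lines
        key = (int(m["level"]), m["kind"])
        count[key] += 1
        total_kb[key] += int(m["size"]) * (1024 if m["unit"] == "MB" else 1)
    return count, total_kb


sample = """\
*--------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*--------------- Instruction Cache 0, Level 1, 64 KB, Assoc 4, LineSize 64
*--------------- Unified Cache 0, Level 2, 512 KB, Assoc 8, LineSize 64
*--------------- Unified Cache 1, Level 3, 16 MB, Assoc 16, LineSize 64
-*-------------- Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*-------------- Instruction Cache 1, Level 1, 64 KB, Assoc 4, LineSize 64
-*-------------- Unified Cache 2, Level 2, 512 KB, Assoc 8, LineSize 64
-*-------------- Unified Cache 3, Level 3, 16 MB, Assoc 16, LineSize 64
"""

count, total_kb = summarize(sample)
for key in sorted(count):
    print(key, count[key], "caches,", total_kb[key], "KB reported")
```

Run over the full 16-processor listing, the same tally would show 16 separate L3 caches of 16 MB each, which is 256 MB of reported L3.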


OK, the SOC voltage is pretty damn low. It should be around 1.00 minimum; the range is 1.0-1.20, so bump it to gain stability.

 

The bottom voltage, DRAM termination voltage, should be equal to 50% of the DRAM voltage.


 

The BIOS won't let me set an SOC voltage offset of more than 0.2. It puts a hard limit of 1.0 volts. Perhaps it can be forced higher via the LLC settings. But I'm hesitant to push settings up to the limit of what the BIOS allows.

 

I'll play around with that a bit tomorrow. But if what you say is true (SOC should be 1.0 - 1.2), and the BIOS number is correct, then perhaps ASUS is simply setting it too low to begin with?


 

I'm running high LLC and 1.00 SOC; my board defaults to 1.1. I found that, in my case chasing the highest clocks, I can drop it by 0.100 to reduce heat.

 

With 4 DIMMs populated, yes, that would be a tad too low by default, IMO.

 

My biggest concern is not your board but the report of the Crosshair also exhibiting this issue. My understanding is that they are updating top-tier boards first and mainstream boards second... but if the C6H has this issue, it should already have the latest AGESA.

Edited by chew*


You can try installing Win7 from an ISO with USB3 drivers included (the download section of the Asus website has software to create a bootable Win7 USB drive with USB3 drivers).


Before I go through the trouble of trying out Win7. Have you guys tried running the benchmark? Did it crash?

 

(I'm desperately working to get the Zen tuning parameters for y-cruncher v0.7.2 in time for March 14. So I don't really have that much time to keep debugging this.)

Edited by Mysticial


Quickly tested here too, failed on both systems (changed memory). Also failed with higher SOC voltage.

 


The bug is repeatable on Win10. I heard that Win10 is unable to recognize Ryzen's cache correctly, but Win7 can. You should benchmark it on Win7.


I could reproduce it on Windows 7 x64 SP1 and everything on default in bios with 1800X.

The system shuts down (8 code on Hero). Tried with OC and manual settings, same thing. Had it pass once though (on 4.1GHz).


Was told this issue will be fixed in a new AGESA code. In other words: it was an AMD issue, not C6H issue.

 

Thanks for finding this bug @Mysticial!
