Ryzen 1800X - Instant system crash when running sequence of FMA3 instructions. Request for verification.

Mysticial · March 13, 2017

Was told this issue will be fixed in a new AGESA code. In other words: it was an AMD issue, not C6H issue.

Thanks for finding this bug @Mysticial!

Wow... Did I really find a bug/errata in the Zen processor? Do I get anything shiny?

flanker · March 13, 2017

you are star now

IanCutress · March 14, 2017

Win7 if you don't know how to DISM: How To Get Ryzen Working on Windows 7 x64

ancieque · March 15, 2017

I tried it on my Gigabyte AB350-Gaming 3-CF and my display went completely black. So this is definitely not an ASUS only thing.

The Stilt · March 16, 2017

The issue with Flops was found and fixed at the beginning of February.

The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation).

Flops is only affected when the SMT is enabled, so disabling the SMT can be used as a temporary work-around (until the actual fix arrives).

chew* · March 16, 2017

The issue with Flops was found and fixed at the beginning of February.
The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation).

Flops is only affected when the SMT is enabled, so disabling the SMT can be used as a temporary work-around (until the actual fix arrives).

Except for the fact that he already ran tests with SMT off and still had the issue. At least that's what he said.

Edited March 16, 2017 by chew*

The Stilt · March 16, 2017

Worked just fine for me, with SMT turned off?

Mysticial · March 16, 2017

I never tested it with SMT off since my mobo doesn't have that option.

The Stilt · March 16, 2017

Running Benchmarks for AMD Zen...

Single-Precision - 128-bit SSE - Add/Sub:
   Dependency Chains  = 8
   Result     = 345.6
   FP Ops     = 1024000000000
   seconds    = 5.10914
   GFlops     = 200.425

Single-Precision - 128-bit SSE - Multiply:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 7.50842
   GFlops     = 204.57

Single-Precision - 128-bit SSE - Multiply + Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 5.6472
   GFlops     = 271.993

Single-Precision - 128-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 3072000000000
   seconds    = 6.67126
   GFlops     = 460.483

Double-Precision - 128-bit SSE2 - Add/Sub:
   Dependency Chains  = 8
   Result     = 172.8
   FP Ops     = 512000000000
   seconds    = 4.48418
   GFlops     = 114.179

Double-Precision - 128-bit SSE2 - Multiply:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 768000000000
   seconds    = 7.00079
   GFlops     = 109.702

Double-Precision - 128-bit SSE2 - Multiply + Add:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 768000000000
   seconds    = 7.5535
   GFlops     = 101.675

Double-Precision - 128-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 1536000000000
   seconds    = 6.71436
   GFlops     = 228.763

Single-Precision - 256-bit AVX - Add/Sub:
   Dependency Chains  = 8
   Result     = 691.2
   FP Ops     = 2048000000000
   seconds    = 8.89565
   GFlops     = 230.225

Single-Precision - 256-bit AVX - Multiply:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 3072000000000
   seconds    = 13.3701
   GFlops     = 229.767

Single-Precision - 256-bit AVX - Multiply + Add:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 3072000000000
   seconds    = 8.31182
   GFlops     = 369.594

Single-Precision - 256-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 6144000000000
   seconds    = 13.3439
   GFlops     = 460.433

Double-Precision - 256-bit AVX - Add/Sub:
   Dependency Chains  = 8
   Result     = 345.6
   FP Ops     = 1024000000000
   seconds    = 8.89834
   GFlops     = 115.078

Double-Precision - 256-bit AVX - Multiply:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 13.6687
   GFlops     = 112.374

Double-Precision - 256-bit AVX - Multiply + Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 8.52216
   GFlops     = 180.236

Double-Precision - 256-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 3072000000000
   seconds    = 13.3443
   GFlops     = 230.211

Flops version 2, compiled with MSVC 2015 Update 3 using the standard project settings. Copied Haswell header (arch_2013_Haswell to arch_2017_Zen) and changed "Running Benchmarks for Intel Haswell..." to "Running Benchmarks for AMD Zen...".

No other changes.
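For anyone who wants to sanity-check the numbers above, here is a minimal C++ sketch of the arithmetic. It assumes each "GFlops" line is simply FP Ops divided by seconds, scaled by 1e9. The function name `gflops` is made up for illustration and is not taken from the Flops source.

```cpp
#include <cassert>

// Hypothetical helper (not from the Flops source): converts an operation
// count and a wall-clock time into GFlops, assuming
//   GFlops = FP Ops / seconds / 1e9.
double gflops(double fp_ops, double seconds) {
    assert(seconds > 0.0);  // a zero or negative time makes no sense here
    return fp_ops / seconds / 1e9;
}
```

For the 128-bit single-precision FMA3 section, `gflops(3072000000000.0, 6.67126)` comes out at about 460.48, which matches the reported 460.483.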

eachus · March 16, 2017

The issue with Flops was found and fixed at the beginning of February.
The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation).

Flops is only affected when the SMT is enabled, so disabling the SMT can be used as a temporary work-around (until the actual fix arrives).

Just to be clear, AMD supplies the CPU BIOS to the motherboard manufacturers, who build it into their motherboards. So the fix may be waiting on validation, but it is the validation at the mobo maker, and different mobo makers will send out their fix at different times.

However, don't worry that much about working around it. AFAIK no code exists that does real work and runs into this bug. It may be possible to come up with some computational fluid dynamics (CFD) code that runs into the problem. But linear algebra code (matrix multiplication, eigenvalues, inverses, etc.) that actually does real work writes its results to memory rather than overwriting them like Flops does. You can, in theory, have a long sequence of FMA3 instructions that only touch L1 cache, but in practice you will have cache misses.* Even if these are caught by L2, that should give the CPU a break.

Is it likely that code you write will hit this problem? Highly unlikely. You need two threads on the same CPU pounding away, or one instruction stream that contains FMA3 instructions 256 or 512 bits wide. Oh, and remember you need to get all that loop cruft into one clock cycle: two load instructions which increment their indices, the FMA3, a load that moves the result somewhere, and a conditional jump instruction. Do all that in one clock cycle? More to the point, get all those microOps through the front-end in one clock cycle? I can do it, with both AMD and Intel hardware, but it isn't easy, and with every new processor generation I have to check which version works right there, or whether I need something new. Ryzen can dispatch six integer (including index and move instructions) and four floating-point microOps in one clock, so it is not that hard. But notice that the four floating-point microOps can be taken up by a 256-bit FMA3 instruction. A 512-bit FMA3 takes two clock cycles, so there is lots of integer room to play with in this generation.

*Yes, I can write junk code which does run several hundred FMA3 instructions in a row. Real matrix multiplication code splits big matrices into small chunks and uses write-through move instructions to write results, to avoid cache pollution. You don't want final results, or partials that won't be used again for seconds, to stay in cache.
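To make the "junk code" pattern concrete, here is a rough C++ sketch of a loop that does nothing but fused multiply-adds on a handful of independent accumulators and never writes intermediate results to memory. This is only an illustration, not the Flops source. The names `run_fma_chains` and `kChains` are hypothetical, and the choice of 12 chains just mirrors the "Dependency Chains = 12" lines in the benchmark output earlier in the thread. It uses `std::fma` from `<cmath>`. Whether the compiler turns that into actual FMA3 instructions depends on the target and compiler flags, so this should not be read as a reproducer for the crash.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical illustration only: number of independent accumulator chains.
constexpr std::size_t kChains = 12;

// Runs `iterations` rounds of acc = acc * mul + add on kChains independent
// accumulators. The chains do not depend on each other, so a CPU can keep
// several fused multiply-adds in flight at once. Nothing is stored until
// the final sum, which is the opposite of what real linear algebra code does.
double run_fma_chains(std::size_t iterations, double mul, double add) {
    std::array<double, kChains> acc{};
    for (std::size_t c = 0; c < kChains; ++c) {
        acc[c] = 1.0 + 0.1 * static_cast<double>(c);  // distinct start values
    }
    for (std::size_t i = 0; i < iterations; ++i) {
        for (std::size_t c = 0; c < kChains; ++c) {
            acc[c] = std::fma(acc[c], mul, add);  // fused multiply-add
        }
    }
    // Sum the accumulators so the compiler cannot discard the work.
    double sum = 0.0;
    for (double v : acc) {
        sum += v;
    }
    return sum;
}
```

With `mul = 0.5` and `add = 0.5` every chain converges towards 1.0, so the values stay bounded no matter how long the loop runs. That is the kind of property a benchmark needs to avoid overflow.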

zeroprobe · March 17, 2017

I think I may have found my problem from this thread.

I've a stable 1700 @ 4 GHz overclock with Realbench, Folding (3+ hours) and Prime.

When I tried to export a video using Adobe Premiere CC my computer would crash on any overclock. I have to use 3.3 GHz or below for the export to work correctly.

Can anyone else test my Premiere CC project and try to export the video?

1700 @ 4 GHz, stable with all other apps.

Gigabyte Gaming 3 Motherboard

32GB Avexir 2400 MHz

Premiere test project here

https://mega.nz/#!XYNzyR6B!3-ibb1Vaapsm2ZPSUZsfAO9-Ixnycfe97_eB3sTOFl4

Just try to export the default settings to H264.

zeroprobe · March 17, 2017

I think this may be related to an issue I am having.

Does anyone have Premiere CC here they could help me test an issue?

I can't export video above 3.3 GHz on my 1700. My system is stable for all other applications at 3.4 GHz - 4 GHz. The crash is similar to the example in this thread.

I've made a sample Premiere project below that just has a sample video and a couple of demanding effects. Please try File > Export > Media with the default H264 settings and see if it processes OK on your overclock.

https://mega.nz/#!XYNzyR6B!3-ibb1Vaapsm2ZPSUZsfAO9-Ixnycfe97_eB3sTOFl4

If I know it is this common issue I can stop digging.

chew* · March 19, 2017

I never tested it with SMT off since my mobo doesn't have that option.

Ahh well that explains why it did not work.

March 21, 2017

1. 1700X + ASUS Prime X370-PRO (0504) - crash

2. 1700X + ASUS Prime B350M-A (0502) - crash

U180 · March 21, 2017

No problems on:

AMD R7 1700

Gigabyte AB350-GAMING 3 bios F6d

Mysticial · March 22, 2017

As an update, I'm now using the Gigabyte GA-AB350M.

With BIOS version F2, the system crashes on the flops benchmark.
With BIOS version F3c, it no longer crashes.

So indeed, this does appear to be fixed.

kakabubu · March 22, 2017

gigabyte ax370 gaming 5, bios F5c, still crashing

nesham · March 24, 2017

Ryzen 7 1800X@default + ASUS C6H without problems

Screenshot is from second run: https://1drv.ms/i/s!Am8R6osEOJLVkUJfI-RF4PCIQemp

nesham · March 24, 2017

Ryzen 7 1800x @Default + ASUS C6H with BIOS 1002

without crash

sirzooro · April 11, 2017

Hi all,

People crunching at TN-Grid have also experienced some crashes. They reported that a recent BIOS update solved the problem only partially. It would be good if someone from AMD could take a look at this. Here is the thread where this problem is discussed: FMA problems (Ryzen and others?)

kakabubu · April 13, 2017

gigabyte ax370 gaming 5, bios F5 - work like a charm
