HWBOT Community Forums

Ryzen 1800X - Instant system crash when running sequence of FMA3 instructions. Request for verification.


Mysticial


The issue with Flops was found and fixed at the beginning of February.

The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation).

Flops is only affected when SMT is enabled, so disabling SMT can be used as a temporary workaround (until the actual fix arrives).

Link to comment
Share on other sites

Except for the fact that he already ran tests with SMT off and still had the issue. At least that's what he said.

Edited by chew*
Link to comment
Share on other sites

Running Benchmarks for AMD Zen...

Single-Precision - 128-bit SSE - Add/Sub:
   Dependency Chains  = 8
   Result     = 345.6
   FP Ops     = 1024000000000
   seconds    = 5.10914
   GFlops     = 200.425

Single-Precision - 128-bit SSE - Multiply:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 7.50842
   GFlops     = 204.57

Single-Precision - 128-bit SSE - Multiply + Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 5.6472
   GFlops     = 271.993

Single-Precision - 128-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 3072000000000
   seconds    = 6.67126
   GFlops     = 460.483

Double-Precision - 128-bit SSE2 - Add/Sub:
   Dependency Chains  = 8
   Result     = 172.8
   FP Ops     = 512000000000
   seconds    = 4.48418
   GFlops     = 114.179

Double-Precision - 128-bit SSE2 - Multiply:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 768000000000
   seconds    = 7.00079
   GFlops     = 109.702

Double-Precision - 128-bit SSE2 - Multiply + Add:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 768000000000
   seconds    = 7.5535
   GFlops     = 101.675

Double-Precision - 128-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 1536000000000
   seconds    = 6.71436
   GFlops     = 228.763

Single-Precision - 256-bit AVX - Add/Sub:
   Dependency Chains  = 8
   Result     = 691.2
   FP Ops     = 2048000000000
   seconds    = 8.89565
   GFlops     = 230.225

Single-Precision - 256-bit AVX - Multiply:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 3072000000000
   seconds    = 13.3701
   GFlops     = 229.767

Single-Precision - 256-bit AVX - Multiply + Add:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 3072000000000
   seconds    = 8.31182
   GFlops     = 369.594

Single-Precision - 256-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 6144000000000
   seconds    = 13.3439
   GFlops     = 460.433

Double-Precision - 256-bit AVX - Add/Sub:
   Dependency Chains  = 8
   Result     = 345.6
   FP Ops     = 1024000000000
   seconds    = 8.89834
   GFlops     = 115.078

Double-Precision - 256-bit AVX - Multiply:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 13.6687
   GFlops     = 112.374

Double-Precision - 256-bit AVX - Multiply + Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 8.52216
   GFlops     = 180.236

Double-Precision - 256-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 3072000000000
   seconds    = 13.3443
   GFlops     = 230.211

 

Flops version 2, compiled with MSVC 2015 Update 3 using the standard project settings. I copied the Haswell header (arch_2013_Haswell to arch_2017_Zen) and changed "Running Benchmarks for Intel Haswell..." to "Running Benchmarks for AMD Zen...".

 

No other changes.
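
As a sanity check on the output above, each GFlops figure is simply FP Ops divided by the elapsed seconds, scaled by 1e9. A minimal Python sketch that recomputes a few of the reported numbers (the `gflops` helper and the labels are my own names, not part of Flops):

```python
def gflops(fp_ops: int, seconds: float) -> float:
    """Return billions of floating-point operations per second."""
    return fp_ops / seconds / 1e9


# (FP Ops, seconds) pairs copied from the benchmark output above.
runs = {
    "SP 128-bit SSE Add/Sub": (1024000000000, 5.10914),
    "SP 256-bit FMA3": (6144000000000, 13.3439),
    "DP 256-bit FMA3": (3072000000000, 13.3443),
}

for name, (ops, secs) in runs.items():
    print(f"{name}: {gflops(ops, secs):.3f} GFlops")
```

The recomputed values agree with the reported ones to within the rounding of the printed seconds. Note also that each FMA3 test reports exactly twice the FP Ops of the matching Multiply test, which is consistent with counting a fused multiply-add as two operations.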

Link to comment
Share on other sites

The issue with Flops was found and fixed at the beginning of February.

The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation).

Flops is only affected when SMT is enabled, so disabling SMT can be used as a temporary workaround (until the actual fix arrives).

 

Just to be clear, AMD supplies the CPU BIOS to the motherboard manufacturers, who build it into their motherboards. So the fix may be waiting on validation, but that is validation at the mobo maker, and different mobo makers will send out their fix at different times.

 

However, don't worry that much about working around it. AFAIK no code exists that does real work and runs into this bug. It may be possible to come up with some computational fluid dynamics (CFD) code that runs into the problem. But linear algebra code (matrix multiplication, eigenvalues, inverses, etc.) that actually does real work writes its results to memory rather than overwriting them like Flops does. You can, in theory, have a long sequence of FMA3 instructions that only touch L1 cache, but in practice you will have cache misses.* Even if these are caught by L2, that should give the CPU a break.

 

Is it likely that code you write will hit this problem? Highly unlikely. You need two threads on the same CPU pounding away, or one instruction stream that contains FMA3 instructions 256 or 512 bits wide. Oh, and remember you need to get all that loop cruft into one clock cycle: two load instructions which increment their indices, the FMA3, a load that moves the result somewhere, and a conditional jump instruction. Do all that in one clock cycle? More to the point, get all those microOps through the front end in one clock cycle? I can do it, with both AMD and Intel hardware, but it isn't easy, and with every new processor generation I have to check which version works right there, or whether I need something new. Ryzen can dispatch six integer (including index and move instructions) and four floating-point microOps in one clock, so it is not that hard. But notice that the four floating-point microOps can be taken up by a 256-bit FMA3 instruction. A 512-bit FMA3 takes two clock cycles, so there is lots of integer room to play with, at least in this generation.

 

*Yes, I can write junk code which does run several hundred FMA3 instructions in a row. Real matrix multiplication code splits big matrices into small chunks, and uses write-through move instructions to write results to avoid cache pollution. You don't want final results, or partials that won't be used again for seconds, to stay in cache.
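
To make the "small chunks" idea concrete, here is a minimal sketch of blocked (tiled) matrix multiplication in plain Python. It is illustrative only: the function name and the block size are my assumptions, not anything from a real BLAS library, and Python cannot express the cache-bypassing stores mentioned above, so only the chunking is shown.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Walk the output in block x block tiles so that each tile's
    # operands are reused while they are still "hot".
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        # Short run of multiply-adds over one tile,
                        # then the partial sum goes back to memory.
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b))
```

The point is that the multiply-add runs are short and every tile ends with a write to memory, unlike a benchmark loop that keeps overwriting the same registers.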

Link to comment
Share on other sites

I think I may have found my problem from this thread.

 

I have a 1700 overclocked to 4 GHz that is stable in Realbench, Folding (3+ hours) and Prime.

When I tried to export a video using Adobe Premiere CC, my computer would crash on any overclock. I have to use 3.3 GHz or below for the export to work correctly.

Can anyone else test my Premiere CC project and try to export the video?

1700 @ 4 GHz, stable with all other apps.

Gigabyte Gaming 3 Motherboard

32GB Avexir 2400 MHz

 

Premiere test project here

https://mega.nz/#!XYNzyR6B!3-ibb1Vaapsm2ZPSUZsfAO9-Ixnycfe97_eB3sTOFl4

 

Just try to export to H264 with the default settings.

Link to comment
Share on other sites

I think this may be related to an issue I am having.

 

Does anyone here have Premiere CC and could help me test an issue?

I can't export video above 3.3 GHz on my 1700. My system is stable in all other applications from 3.4 GHz to 4 GHz. The problem is similar to the example in this thread.

I've made a sample Premiere project (linked below) that just has a sample video and a couple of demanding effects. If you can, try File > Export > Media with the default H264 settings and see if it processes OK on your overclock.

https://mega.nz/#!XYNzyR6B!3-ibb1Vaapsm2ZPSUZsfAO9-Ixnycfe97_eB3sTOFl4

 

If I know it is this common issue I can stop digging.

Link to comment
Share on other sites
