eachus

Members

Posts
1
Joined
March 16, 2017
Last visited
March 19, 2017

About eachus

Birthday 01/11/1946

Location
Nashua, NH, USA

Interests
I have a degree in Operations Research, which means I like real messy mathematics.

Occupation
retired Software Engineer

realname
Robert Iredell Eachus

eachus's Achievements

Newbie (1/14)

Ryzen 1800X - Instant system crash when running sequence of FMA3 instructions. Request for verification.

eachus replied to Mysticial's topic in AMD CPU Overclocking

Just to be clear, AMD supplies the CPU BIOS to the motherboard manufacturers, who build it into their motherboards. So the fix may be waiting on validation, but it is validation at the mobo maker, and different mobo makers will send out their fix at different times.

However, don't worry that much about working around it. AFAIK no code exists that does real work and runs into this bug. It may be possible to come up with some computational fluid dynamics (CFD) code that runs into the problem. But linear algebra code (matrix multiplication, eigenvalues, inverses, etc.) that actually does real work writes its results to memory rather than overwriting them like FLOPS does. You can, in theory, have a long sequence of FMA3 instructions that only touch L1 cache, but in practice you will have cache misses.* Even if these are caught by L2, that should give the CPU a break.

Is it likely that code you write will hit this problem? Highly unlikely. You need two threads on the same CPU pounding away, or one instruction stream that contains FMA3 instructions 256 or 512 bits wide. Oh, and remember you need to get all that loop cruft into one clock cycle: two load instructions which increment their indices, the FMA3, a store that moves the result somewhere, and a conditional jump instruction. Do all that in one clock cycle? More to the point, get all those microOps through the front-end in one clock cycle? I can do it, with both AMD and Intel hardware, but it isn't easy, and with every new processor generation I have to check which version works right there, or whether I need something new.

Ryzen can dispatch six integer (including index and move instructions) and four floating-point microOps in one clock, so it is not that hard. But notice that the four floating-point microOps can be taken up by a 256-bit FMA3 instruction. A 512-bit FMA3 takes two clock cycles, so there is lots of integer room to play with in this generation.
*Yes, I can write junk code which does run several hundred FMA3 instructions in a row. Real matrix multiplication code splits big matrices into small chunks and uses write-through move instructions to write results, to avoid cache pollution. You don't want final results, or partials that won't be used again for seconds, to stay in cache.
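
For anyone who has not seen the "small chunks" idea, here is a minimal sketch of a blocked (tiled) matrix multiply. It is an illustration only, not the code from any real BLAS library: the function name `matmul_blocked` and the `block` parameter are made up for this example, and plain Python cannot express the write-through moves mentioned above. Each `+=` in the innermost loop is the multiply-add that an FMA3 instruction would perform, although Python rounds the multiply and the add separately rather than fusing them.

```python
def matmul_blocked(a, b, block=2):
    """Multiply two matrices (lists of rows) one tile at a time.

    `block` is a hypothetical tile size; real code would pick it so that
    the tiles being worked on fit in cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Walk over the tiles of C (i0, j0) and the matching tiles of A and B (k0).
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Inside a tile: a short run of multiply-adds, then move on.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for kk in range(k0, min(k0 + block, k)):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ik * row_b[j]  # the FMA-shaped step
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_blocked(a, b, block=2))
```

The point of the tiling is that every tile boundary brings in new data and pushes results out, so the run of back-to-back multiply-adds stays short, unlike a synthetic loop that hammers the same registers for hundreds of instructions.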
- March 16, 2017
- 46 replies
