The Stilt

March 16, 2017

Running Benchmarks for AMD Zen...

Single-Precision - 128-bit SSE - Add/Sub:
   Dependency Chains  = 8
   Result     = 345.6
   FP Ops     = 1024000000000
   seconds    = 5.10914
   GFlops     = 200.425

Single-Precision - 128-bit SSE - Multiply:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 7.50842
   GFlops     = 204.57

Single-Precision - 128-bit SSE - Multiply + Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 5.6472
   GFlops     = 271.993

Single-Precision - 128-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 3072000000000
   seconds    = 6.67126
   GFlops     = 460.483

Double-Precision - 128-bit SSE2 - Add/Sub:
   Dependency Chains  = 8
   Result     = 172.8
   FP Ops     = 512000000000
   seconds    = 4.48418
   GFlops     = 114.179

Double-Precision - 128-bit SSE2 - Multiply:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 768000000000
   seconds    = 7.00079
   GFlops     = 109.702

Double-Precision - 128-bit SSE2 - Multiply + Add:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 768000000000
   seconds    = 7.5535
   GFlops     = 101.675

Double-Precision - 128-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 297.6
   FP Ops     = 1536000000000
   seconds    = 6.71436
   GFlops     = 228.763

Single-Precision - 256-bit AVX - Add/Sub:
   Dependency Chains  = 8
   Result     = 691.2
   FP Ops     = 2048000000000
   seconds    = 8.89565
   GFlops     = 230.225

Single-Precision - 256-bit AVX - Multiply:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 3072000000000
   seconds    = 13.3701
   GFlops     = 229.767

Single-Precision - 256-bit AVX - Multiply + Add:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 3072000000000
   seconds    = 8.31182
   GFlops     = 369.594

Single-Precision - 256-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 1190.4
   FP Ops     = 6144000000000
   seconds    = 13.3439
   GFlops     = 460.433

Double-Precision - 256-bit AVX - Add/Sub:
   Dependency Chains  = 8
   Result     = 345.6
   FP Ops     = 1024000000000
   seconds    = 8.89834
   GFlops     = 115.078

Double-Precision - 256-bit AVX - Multiply:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 13.6687
   GFlops     = 112.374

Double-Precision - 256-bit AVX - Multiply + Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 1536000000000
   seconds    = 8.52216
   GFlops     = 180.236

Double-Precision - 256-bit FMA3 - Fused Multiply Add:
   Dependency Chains = 12
   Result     = 595.2
   FP Ops     = 3072000000000
   seconds    = 13.3443
   GFlops     = 230.211

Flops version 2, compiled with MSVC 2015 Update 3 using the standard project settings. Copied Haswell header (arch_2013_Haswell to arch_2017_Zen) and changed "Running Benchmarks for Intel Haswell..." to "Running Benchmarks for AMD Zen...".

No other changes.
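As a sanity check on the output above, the GFlops figure in each block appears to be simply FP Ops divided by elapsed seconds, scaled to 1e9. A minimal Python sketch, where the `gflops` helper name is my own and not part of Flops, recomputes a few of the reported values. Small differences in the last digit are expected because the printed seconds are rounded.

```python
def gflops(fp_ops: int, seconds: float) -> float:
    """Return throughput in GFlops (1e9 floating-point operations per second)."""
    return fp_ops / seconds / 1e9


# (label, FP Ops, seconds, reported GFlops) copied from the benchmark output above.
runs = [
    ("SP 128-bit SSE Add/Sub", 1_024_000_000_000, 5.10914, 200.425),
    ("SP 128-bit FMA3", 3_072_000_000_000, 6.67126, 460.483),
    ("SP 256-bit FMA3", 6_144_000_000_000, 13.3439, 460.433),
    ("DP 256-bit FMA3", 3_072_000_000_000, 13.3443, 230.211),
]

for label, ops, secs, reported in runs:
    # Print the recomputed value next to the one Flops reported.
    print(f"{label}: computed {gflops(ops, secs):.3f}, reported {reported}")
```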

March 16, 2017

Worked just fine for me, with SMT turned off?

March 16, 2017

The issue with Flops was found and fixed in the beginning of February.

The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation).

Flops is only affected when SMT is enabled, so disabling SMT can be used as a temporary workaround (until the actual fix arrives).

September 20, 2016

What was the actual voltage? 1.5V++ I would imagine?

That 1.325V should be the default voltage (VID) for P0 @ 3800MHz.

July 8, 2016

Try wiping the heatspreader with some < 10% hydrochloric acid for a couple of minutes. It should remove the gallium residue / oxidation. After wiping it with the acid, clean the acid residue first with water and then again with alcohol. Then send it back to Intel.

May 27, 2016

The 1.2V voltages are effectively PCI-E & DRAM PHY voltages. Shouldn't matter.

Of course one could put the contacts to good use and ask them to replace the A88X FCH with an A85X one. You'll lose the FCH USB3s and need to modify the bios for different AHCI roms, but that's ok IMO.

May 27, 2016

I wonder if it's easier to get high frequency in Windows 7. My chip can only do 120MHz Pifast 1/100 runs or so (usually stuck at 118 MHz).

@The Stilt: I'm quite confident that the super high Vcore is only to get the high BCLK stable. I can run 117x38 at 1.5v with ~ 10c 32M stable, but had it up at 1.7V for my final runs.

It's really hard to tell where the X4 845 would be if unlocked.

Thanks for the info!

I wonder if the training is happening properly, actually. @l0ud_sil3nc3 mentioned this already, but on my end I see no actual benefit from dialing in timings from the BIOS. In fact, apart from tCL and tRCD I see no change in performance or stability adjusting any of the other timings. Usually adjusting the timings makes things worse performance wise.

I guess I'm going to pack things up for now and wait for Bristol Ridge to appear.

But seriously, WTF AMD ...

The training and timings work perfectly on Excavator as long as AGESA receives the correct parameters to use (from the bios). If the timings are not working as expected, I would assume that's because of bios bugs.

The AGESA required by FM2+ Carrizos is a cluster *uck. The same code has to support five different chips at the same time (Trinity, Richland, Kaveri, Godavari, Carrizo)...

AGESA itself of course has different paths for all of these, however they are pretty hard to implement from the bios side. So I would assume that it is more an issue with the bios, rather than with anything else.

May 26, 2016

My chip got 4.6 GHz with 1.55V, and only at -50C on the pot (CB seems to be around -90C)

SuperPI 32M pass?

May 26, 2016

unlocked models AM4 could hit 5.5 GHz or so with LN2

If 4.6GHz requires 1.76V I find 5.5GHz pretty unlikely. Also AFAIK there won't be any unlocked models (BR).

May 26, 2016

@The Stilt: is it possible to change the DCT timings at run-time? I can execute read commands, but any write command results in 1) no change to timings and 2) about 5 seconds later a hard shutdown.

The same thing applies on Excavator too, since the DDR3 controllers on Steamroller and Excavator are identical. The only major difference is that on Excavator the PMU SRAM interface is actually working, which makes it possible to train and configure the memory parameters correctly, unlike on Steamroller. Sad stuff

The PMU communication should (not sure about public docs) be explained in the BKDG, but it is quite a complex procedure. Also with Excavator you need to take into account that it is purely a mobile chip. You need to write the parameters in the right context (i.e. with correct MemPS and NBPS targets). It is certainly possible on Excavator, but it is a bunnying nightmare to do.

Not worth doing, IMO. Do what you can from the bios.

These chips are not any kind of priority for me and I've been working on other stuff instead. Check your EDC reading in recent HWInfo beta versions, it could come in handy.

May 19, 2016

Is that voltage the actual requirement for those clocks? :eek:

May 18, 2016

D18F4x15C
[31:31] = 0

[4:2] = 2

D18F4x16C

[3:3] = 1 (TdpLimitDis)

There might be hope.

TdpLimitDis shouldn't do anything since the power management is an SMU feature. This bit should only affect Apm itself. Will writing 15Ch Bits 4:2 to 0 (from 2) stick?

May 18, 2016

No need to send anything over as these chips can be had for 60€ or so. I just haven't had any interest in these since I already have a Carrizo in a laptop which I have already tested thoroughly.

So let's say you start SuperPI: will the executing core (CU) jump to the 38x multiplier? If that's the case then Turbo is obviously active and working. Carrizo is the first AMD chip which can accurately monitor its operating parameters and adjust the frequency accordingly. At some point you will most likely be limited by the TDP limit, at least when multiple cores are used. Even the mobile Carrizos running at significantly lower voltages require around 50W TDP to maintain all cores at 3400MHz during Cinebench.

I don't expect that the "number of boosted states" can be changed on this CPU. You would need to change it to zero in order to constantly use the highest available multiplier (38x). However if you're already limited by the TDP, then it obviously won't help much. The only other way around would basically be cheating the power management. If AMD hasn't disabled the TDP control (through SMU), increasing the TDP limit to "sufficient" levels would make the chip run constantly at the maximum frequency under load. This works for mobile Carrizos at least.

Also the information displayed by MSRTweaker is wrong for the most part. The displayed voltages are wrong (SVI scale used instead of SVI2) and the other information is not displayed properly either. The multipliers are correct, but that's about it.

To know your original voltages:

Calculate the delta between the voltage displayed by MSRTweaker (each PState) and 1.55V. Divide the delta by two and add it to the displayed value. 1.40000V for "P0" (Pb0) is actually 1.47500V.
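The correction above is easy to put into a helper. This is only a sketch of the arithmetic as described in the post. The function and constant names are made up for illustration, and it assumes the 1.55V reference applies to every PState shown by MSRTweaker.

```python
REFERENCE_V = 1.55  # reference voltage given in the post


def actual_voltage(displayed_v: float) -> float:
    """Add half of the gap between the displayed value and 1.55 V."""
    delta = REFERENCE_V - displayed_v
    return displayed_v + delta / 2


# The post's own example: 1.40000 V displayed for "P0" (Pb0).
print(f"{actual_voltage(1.40):.5f} V")  # prints 1.47500 V
```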

I'll let you know if I find a good way to solve the pending issue.

Edit: Check D18F4x15C.

Bit 31:31 is 1, correct (BoostLock)?

Bits 4:2 is 2, correct (NumBoostStates, Pb0 38x & Pb1 37x on this CPU)?

If that's the case, then increasing the TDP is most likely the only way. Unless you can "tune" the engineer sandbox fuses...
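For anyone checking these fields by hand, the two bit ranges mentioned for D18F4x15C can be pulled out of a raw 32-bit readout with a generic helper. A minimal sketch follows. Only the field names and bit positions come from the posts. The raw value below is hypothetical, built just to show the decoding, and how you actually read the register (which tool or driver) is outside this example.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit range [hi:lo] from a register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode_d18f4x15c(raw: int) -> dict:
    """Decode only the two fields discussed in the thread."""
    return {
        "BoostLock": bits(raw, 31, 31),
        "NumBoostStates": bits(raw, 4, 2),
    }


# Hypothetical raw value: bit 31 set, bits 4:2 = 2, all other bits zero.
example_raw = (1 << 31) | (2 << 2)
print(decode_d18f4x15c(example_raw))  # {'BoostLock': 1, 'NumBoostStates': 2}
```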

May 17, 2016

Athlon X4 845 is a locked SKU so you cannot configure the maximum boosted multiplier for any other PState than Pb0. The programming conditions for each PState on Carrizo are <= the original FID and VID of the PState.

Carrizo is a PITA to work with anyway. I can think of several ways to get around this using SMU, but the implementation would be pretty complex and I have not been able to test it since I don't have any of these chips available.

Also you cannot force the PState to switch to any boosted PState, since PState 0 command points to P0 instead of Pb0. Boosted PStates are not visible to the PStateCMD register, unless you set the "number of boosted states" value to zero (which cannot be done with ease).

It seems that BR will be useless too as it appears that AMD won't be releasing any unlocked SKUs :rolleyes:

April 4, 2016

Does changing the MEMCLK from 1250, 1375 or 1500MHz to +1MHz (e.g. 1501MHz) degrade the performance? If not, then the workload is not latency intensive.

February 22, 2016

Michal, are you running CB at default voltages (i.e. voltage not adjusted or left on Auto)? Athlon X4 845 should have Pb0 (3800MHz) VID < 1.475V. Since the CB15 score is so poor, the chip most likely throttles due to the TDP (PPT) limit of 65W being exceeded. Carrizo is the first AMD CPU / APU which can measure its power consumption accurately, so lowering the voltage should address the throttling.

Athlon X4 845 should score > 300 in Cinebench R15 at default clocks (3.5GHz base). Even the FX-8800P scores 288pts at 35/42W TDP.

January 11, 2016

Let's put it this way. I have no truly in-depth technical information about 17h or the AM4 platform in general, but based on the information I have, the new stuff might be quite a hostile target for overclockers. Putting two completely differently targeted designs on the same infrastructure (AM4) is a huge compromise in itself.

Also when you see both Intel and nVidia implementing high performance targeted nodes for their flagship products while AMD is doing a low power targeted node all the way... I'm not saying the 14nm LPP is completely rubbish, I'm just questioning its suitability for a high performance CPU. If you look at the difference between the two 14nm Intel nodes (P1272 & P1273), the high performance node used on Skylake does significantly better than the efficiency / density optimized one used on Broadwell.

If the 17h happens to exceed my expectations and the other issues can be solved, then I have no issues in supporting the platform in the same way I have done in the past.

January 11, 2016

I don't think AOD even supports Kaveri. //ninja-edit: apparently the latest version actually does!

The Stilt used to support AMD with enthusiast-grade software, but I don't think AMD is too bothered. Haven't heard of any recent tool that supports the latest architectures.

The thing is that you cannot easily change the memory timings on anything newer than Richland. In Kaveri AMD introduced a completely overhauled and "vastly improved" (truth: FUBAR) memory controller. In order to change the timings on these controllers, you'll need to create an array which contains all the timings and some other parameters. Once you have created the array, you'll need to stop the PMU (PHY management unit) clock, write the "argument array" to a certain register, send an interrupt to the PMU, wait for it to ack, restart the PMU clock and hope the thing didn't hang :rolleyes:
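To make the ordering of that hand-off easier to follow, here is a rough sketch of the sequence exactly as listed above. Everything in it is hypothetical: the `PmuPort` interface, its method names, the timeout and the placeholder argument values are invented for illustration and do not correspond to any real register interface or AMD documentation. The fake port only records the call order.

```python
from typing import Protocol, Sequence


class PmuPort(Protocol):
    """Hypothetical hardware-access interface (names are not from AMD docs)."""

    def stop_clock(self) -> None: ...
    def write_arguments(self, args: Sequence[int]) -> None: ...
    def send_interrupt(self) -> None: ...
    def wait_for_ack(self, timeout_s: float) -> bool: ...
    def start_clock(self) -> None: ...


def apply_timings(port: PmuPort, args: Sequence[int], timeout_s: float = 1.0) -> bool:
    """Run the steps in the order the post lists them; True if the PMU acked."""
    port.stop_clock()
    port.write_arguments(args)
    port.send_interrupt()
    acked = port.wait_for_ack(timeout_s)
    # Assumption: the clock is restarted even if no ack arrived.
    port.start_clock()
    return acked


class FakePort:
    """Stand-in that records the call order instead of touching hardware."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def stop_clock(self) -> None:
        self.calls.append("stop_clock")

    def write_arguments(self, args: Sequence[int]) -> None:
        self.calls.append("write_arguments")

    def send_interrupt(self) -> None:
        self.calls.append("send_interrupt")

    def wait_for_ack(self, timeout_s: float) -> bool:
        self.calls.append("wait_for_ack")
        return True  # the fake always acknowledges

    def start_clock(self) -> None:
        self.calls.append("start_clock")


port = FakePort()
ok = apply_timings(port, [11, 13, 13, 35])  # placeholder values, not real timings
print(ok, port.calls)
```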

The timings "can" be changed through the PCI config space as usual, but changing them this way doesn't have the same effect. When done from the bios the timings are being programmed properly by AGESA.

Take a wild guess which method AOD uses

Hopefully AMD will use an in-house designed IMC in 17h. These outsourced ones are either complete rubbish or they are just badly implemented into the design. Neither the Steamroller nor the Excavator IMC can support > DDR-2400 without tampering with the BCLK...

March 26, 2015

Really? I've run 2.3v on a regular basis on air with richland and 2.4v with cold and no damage done

How have you evaluated the "no damage" done aspect?

Still working?

March 26, 2015

The maximum official VDDIO for AMD 28nm designs is 100mV less than for 32nm parts.

It is highly advised not to exceed 1.7V even temporarily or you will risk frying the IMC.

March 23, 2015

It's just a very minor improvement, around 2% in 32M.

The main issue was fixed in SR/XV design so further optimizations only yield a minor boost.

There might be some additional stuff coming to address the NB-DRAM FIFO latency issue, but I cannot guarantee I'll find the time to do it.

The fix is highly configuration dependent and therefore pretty damn time consuming to implement.

March 17, 2015

Regarding BDC, OP will surely deliver, let's just wait...

Finished some larger projects recently so I could not find enough time to do this earlier.

BDC R1.03B

- Added support for Steamroller; Kaveri (KV-A1) & Via Drago (VD-A1).

Steamroller only requires setting DSWS to enabled.

It will slightly improve the performance in SuperPI depending on the digits used.

Validated on Kaveri and Via Drago with the latest code base (Patch, SMU, PMU & ScS only).

Some of the lesser AVs will flag this SW as malware, but as long as you only use the original package you're safe.

https://www.virustotal.com/fi/file/a00004302efbf4779c358b86e9ec66b8b8ed53304797627e42e50caa26a4f3f6/analysis/1426595828/

The checksum of the original package is DA817FFBCAFCE8C42702E4052A69FC81 (MD5).

In case the checksum differs discard the archive and re-download from another source.
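If you don't have a checksum tool at hand, the MD5 comparison can be done with a few lines of Python. This is just a sketch. The helper names are my own, and the archive file name in the usage note is a placeholder, since the post does not give one.

```python
import hashlib

EXPECTED_MD5 = "DA817FFBCAFCE8C42702E4052A69FC81"  # checksum quoted in the post


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the upper-case hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 equals the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.upper()
```

Usage would be something like `matches_expected("downloaded_archive.zip")`, with the path replaced by wherever you saved the package.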

http://1drv.ms/1EsKd0c

January 21, 2015

All of the boards without an external Pll are limited to 136MHz really.

You can go higher, if you know how

A88X sucks at high BCLKs though.

December 20, 2014

While going through some newer stuff I noticed that Kaveri SuperPi performance can be improved too. The main issue in Piledriver was fixed in Kaveri; however, the fix is not complete.

The fix improves the performance by 2%, which is not much but still something.

I'll pop out a newer BDC 1.3 version when I find time to add the changes.

The "LSU DSS-SLP" fix applies to both SR & XV designs.

November 4, 2013

That will be the question... It's TSMC 28nm, hard to say... I heard some speculation about 28nm TSMC and FX: one of the reasons there is no new FX yet is the TSMC process technology. In theory 28nm TSMC is more power hungry than the current 32nm SOI HK at the same die size. So maybe some FX on a new process (20nm?)?

Only SoCs and the GPUs are made on TSMC 28nm node.

The APUs are made on GlobalFoundries' new 28nm node.

Posts posted by The Stilt

Ryzen 1800X - Instant system crash when running sequence of FMA3 instructions. Request for verification.

NAMEGT - A12-9800 @ 4798.9MHz - 4798.88 mhz CPU Frequency

Intel Customer Support refuse to replace my 5960x

Athlon X4 845 | Crossblade Ranger | And so it begins ...

Tweak GPUs for faster ETHEREUM Mining ( 7970 / 280x )

Excavator

Kaveri 3DMark social club :)

Kaveri and high VDIMM

The Stilt's AMD "Extreme" Tools Collection

A88XM-A looking for method to edit bios or for moded bios

Excavator 5G / Kaveri 5G / Kabini 2.5G SuperPI 32M Challenge

APU Fuse Interpreter (AFI)
