HWBOT Community Forums

The Stilt

Members
  • Posts

    101

Everything posted by The Stilt

  1. Running Benchmarks for AMD Zen...

     Single-Precision - 128-bit SSE - Add/Sub: Dependency Chains = 8, Result = 345.6, FP Ops = 1024000000000, seconds = 5.10914, GFlops = 200.425
     Single-Precision - 128-bit SSE - Multiply: Dependency Chains = 12, Result = 595.2, FP Ops = 1536000000000, seconds = 7.50842, GFlops = 204.57
     Single-Precision - 128-bit SSE - Multiply + Add: Dependency Chains = 12, Result = 595.2, FP Ops = 1536000000000, seconds = 5.6472, GFlops = 271.993
     Single-Precision - 128-bit FMA3 - Fused Multiply Add: Dependency Chains = 12, Result = 595.2, FP Ops = 3072000000000, seconds = 6.67126, GFlops = 460.483

     Double-Precision - 128-bit SSE2 - Add/Sub: Dependency Chains = 8, Result = 172.8, FP Ops = 512000000000, seconds = 4.48418, GFlops = 114.179
     Double-Precision - 128-bit SSE2 - Multiply: Dependency Chains = 12, Result = 297.6, FP Ops = 768000000000, seconds = 7.00079, GFlops = 109.702
     Double-Precision - 128-bit SSE2 - Multiply + Add: Dependency Chains = 12, Result = 297.6, FP Ops = 768000000000, seconds = 7.5535, GFlops = 101.675
     Double-Precision - 128-bit FMA3 - Fused Multiply Add: Dependency Chains = 12, Result = 297.6, FP Ops = 1536000000000, seconds = 6.71436, GFlops = 228.763

     Single-Precision - 256-bit AVX - Add/Sub: Dependency Chains = 8, Result = 691.2, FP Ops = 2048000000000, seconds = 8.89565, GFlops = 230.225
     Single-Precision - 256-bit AVX - Multiply: Dependency Chains = 12, Result = 1190.4, FP Ops = 3072000000000, seconds = 13.3701, GFlops = 229.767
     Single-Precision - 256-bit AVX - Multiply + Add: Dependency Chains = 12, Result = 1190.4, FP Ops = 3072000000000, seconds = 8.31182, GFlops = 369.594
     Single-Precision - 256-bit FMA3 - Fused Multiply Add: Dependency Chains = 12, Result = 1190.4, FP Ops = 6144000000000, seconds = 13.3439, GFlops = 460.433

     Double-Precision - 256-bit AVX - Add/Sub: Dependency Chains = 8, Result = 345.6, FP Ops = 1024000000000, seconds = 8.89834, GFlops = 115.078
     Double-Precision - 256-bit AVX - Multiply: Dependency Chains = 12, Result = 595.2, FP Ops = 1536000000000, seconds = 13.6687, GFlops = 112.374
     Double-Precision - 256-bit AVX - Multiply + Add: Dependency Chains = 12, Result = 595.2, FP Ops = 1536000000000, seconds = 8.52216, GFlops = 180.236
     Double-Precision - 256-bit FMA3 - Fused Multiply Add: Dependency Chains = 12, Result = 595.2, FP Ops = 3072000000000, seconds = 13.3443, GFlops = 230.211

     Flops version 2, compiled with MSVC 2015 Update 3 using the standard project settings. Copied the Haswell header (arch_2013_Haswell to arch_2017_Zen) and changed "Running Benchmarks for Intel Haswell..." to "Running Benchmarks for AMD Zen...". No other changes.
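     The GFlops figure in each result is the FP Ops count divided by the elapsed seconds, scaled by 1e9. A minimal Python sketch that cross-checks two of the rows; the function name `gflops` is only illustrative and is not part of Flops itself:

```python
def gflops(fp_ops: int, seconds: float) -> float:
    """Return throughput in GFLOP/s from an operation count and elapsed time."""
    return fp_ops / seconds / 1e9

# Values copied from the Zen results in this post.
sse_add = gflops(1_024_000_000_000, 5.10914)    # Single-Precision 128-bit SSE Add/Sub
fma3_256 = gflops(6_144_000_000_000, 13.3439)   # Single-Precision 256-bit FMA3

print(f"SSE Add/Sub:  {sse_add:.3f} GFlops")
print(f"256-bit FMA3: {fma3_256:.3f} GFlops")
```

     Small differences in the last digit against the posted numbers are expected, since the seconds values in the output are already rounded.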
  2. The issue with Flops was found and fixed in the beginning of February. The current µcode version dates to 01/27/2017, so the fix is obviously not included yet (due to the time required for validation). Flops is only affected when SMT is enabled, so disabling SMT can be used as a temporary workaround (until the actual fix arrives).
  3. What was the actual voltage? 1.5V++ I would imagine? That 1.325V should be the default voltage (VID) for P0 @ 3800MHz.
  4. Try wiping the heatspreader with some < 10% hydrochloric acid for a couple of minutes. It should remove the gallium residue / oxidation. After wiping it with the acid, clean the acid residue first with water and then again with alcohol. Then send it back to Intel.
  5. The 1.2V voltages are effectively PCI-E & DRAM PHY voltages. Shouldn't matter. Of course one could put the contacts to good use and ask them to replace the A88X FCH with an A85X one. You'll lose the FCH USB3s and need to modify the bios for different AHCI roms, but that's ok IMO.
  6. The training and timings work perfectly on Excavator as long as AGESA receives the correct parameters to use (from the bios). If the timings are not working as expected, I would assume that's because of bios bugs. The AGESA required by FM2+ Carrizos is a cluster *uck. The same code has to support five different chips at the same time (Trinity, Richland, Kaveri, Godavari, Carrizo)... AGESA itself of course has different paths for all of these, however they are pretty hard to implement from the bios side. So I would assume that it is more an issue with the bios, rather than with anything else.
  7. If 4.6GHz requires 1.76V I find 5.5GHz pretty unlikely. Also AFAIK there won't be any unlocked models (BR).
  8. The same thing applies to Excavator too, since the DDR3 controllers on Steamroller and Excavator are identical. The only major difference is that on Excavator the PMU SRAM interface is actually working, which makes it possible to train and configure the memory parameters correctly, unlike on Steamroller. Sad stuff. The PMU communication should (not sure about public docs) be explained in the BKDG, but it is quite a complex procedure. Also, with Excavator you need to take into account that it is purely a mobile chip. You need to write the parameters in the right context (i.e. with the correct MemPS and NBPS targets). It is certainly possible on Excavator, but it is a bunnying nightmare to do. Not worth doing, IMO. Do what you can from the bios. These chips are not any kind of priority for me and I've been working on other stuff instead. Check your EDC reading in recent HWInfo beta versions, it could come in handy.
  9. There might be hope. TdpLimitDis shouldn't do anything since the power management is a SMU feature. This bit should only affect Apm itself. Will writing 15Ch Bits 4:2 to 0 (from 2) stick?
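     As a rough illustration of what "writing bits 4:2 to 0 (from 2)" means at the bit level, here is a small Python sketch. It only does the arithmetic on a made-up register value; the value of `reg` is hypothetical, the helper names are mine, and no actual PCI config access is shown. Whether the hardware accepts the write is exactly the open question.

```python
def get_bits(value: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) from a 32-bit register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

def set_bits(value: int, hi: int, lo: int, field: int) -> int:
    """Return value with bits hi:lo replaced by field, other bits untouched."""
    width = hi - lo + 1
    mask = ((1 << width) - 1) << lo
    return (value & ~mask & 0xFFFFFFFF) | ((field << lo) & mask)

# Hypothetical register contents: bit 31 set, bits 4:2 = 2, everything else 0.
reg = (1 << 31) | (2 << 2)
new = set_bits(reg, 4, 2, 0)   # request bits 4:2 = 0

print(hex(reg), get_bits(reg, 4, 2), hex(new), get_bits(new, 4, 2))
```

     Reading the register back after the write and comparing bits 4:2 with `get_bits` is the simplest way to see whether the change stuck.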
  10. No need to send anything over as these chips can be had for 60€ or so. I just haven't had any interest in these since I already have a Carrizo in a laptop which I have already tested thoroughly. So let's say if you start SuperPI, the executing core (CU) will jump to the 38x multiplier? If that's the case then Turbo is obviously active and working. Carrizo is the first AMD chip which can accurately monitor its operating parameters and adjust the frequency accordingly. At some point you will most likely be limited by the TDP limit, at least when multiple cores are used. Even the mobile Carrizos running at significantly lower voltages require around 50W TDP to maintain all cores at 3400MHz during Cinebench. I don't expect that the "number of boosted states" can be changed on this CPU. You would need to change it to zero in order to constantly use the highest available multiplier (38x). However, if you're already limited by the TDP, then it obviously won't help much. The only other way around would be basically cheating the power management. If AMD hasn't disabled the TDP control (through SMU), increasing the TDP limit to "sufficient" levels would make the chip run constantly at the maximum frequency under load. This works for mobile Carrizos at least. Also, the information displayed by MSRTweaker is wrong for the most part. The displayed voltages are wrong (SVI scale used instead of SVI2) and the other information is not displayed properly either. The multipliers are correct, but that's about it. To know your original voltages: calculate the delta between the voltage displayed by MSRTweaker (for each PState) and 1.55V. Divide the delta by two and add it to the displayed value. 1.40000V for "P0" (Pb0) is actually 1.47500V. I'll let you know if I find a good way to solve the pending issue. Edit: Check D18F4x15C. Bit 31:31 is 1, correct (BoostLock)?
Bits 4:2 is 2, correct (NumBoostStates, Pb0 38x & Pb1 37x on this CPU)? If that's the case, then increasing the TDP is most likely the only way. Unless you can "tune" the engineer sandbox fuses...
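     The MSRTweaker voltage correction described in this post (take the delta to 1.55V, halve it, add it to the displayed value) can be written out as a short Python sketch. The function and constant names are only illustrative:

```python
REFERENCE_V = 1.55  # reference voltage named in the post

def actual_voltage(displayed_v: float) -> float:
    """Convert a voltage shown by MSRTweaker (decoded with the SVI scale)
    to the real value, following the rule given in the post."""
    delta = REFERENCE_V - displayed_v
    return displayed_v + delta / 2.0

# The example from the post: 1.40000V displayed for "P0" (Pb0).
print(f"{actual_voltage(1.40):.5f} V")
```

     For the 1.40000V example this reproduces the 1.47500V quoted above.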
  11. Athlon X4 845 is a locked SKU so you cannot configure the maximum boosted multiplier for any other PState than Pb0. The programming conditions for each PState on Carrizo are <= the original FID and VID of the PState. Carrizo is a PITA to work with anyway. I can think of several ways to get around this using SMU, but the implementation would be pretty complex and I have not been able to test it since I don't have any of these chips available. Also, you cannot force the PState to switch to any boosted PState, since the PState 0 command points to P0 instead of Pb0. Boosted PStates are not visible to the PStateCMD register, unless you set the "number of boosted states" value to zero (which cannot be done with ease). It seems that BR will be useless too, as it appears that AMD won't be releasing any unlocked SKUs.
  12. Does changing the MEMCLK from 1250, 1375 or 1500MHz to +1MHz (e.g. 1501MHz) degrade the performance? If not, then the workload is not latency intensive.
  13. Michal, are you running CB at default voltages (i.e. voltage not adjusted or left to Auto)? Athlon X4 845 should have Pb0 (3800MHz) VID < 1.475V. Since the CB15 score is so poor, the chip most likely throttles due to the TDP (PPT) limit of 65W being exceeded. Carrizo is the first AMD CPU / APU which can measure its power consumption accurately, so lowering the voltage should address the throttling. Athlon X4 845 should score > 300 in Cinebench R15 at default clocks (3.5GHz base). Even the FX-8800P scores 288pts at 35/42W TDP.
  14. Let's put it this way. I've got no truly in-depth technical information about 17h or the AM4 platform in general, but based on the information I have, the new stuff might be quite a hostile target for overclockers. Putting two completely differently targeted designs on the same infrastructure (AM4) is a huge compromise in itself. Also, when you see both Intel and nVidia implementing high performance targeted nodes for their flagship products while AMD is doing a low power targeted node all the way... I'm not saying the 14nm LPP is completely rubbish, I'm just questioning its suitability for a high performance CPU. If you look at the difference between the two 14nm Intel nodes (P1272 & P1273), the high performance node used on Skylake does significantly better than the efficiency / density optimized one used on Broadwell. If 17h happens to exceed my expectations and the other issues can be solved, then I have no issues in supporting the platform in the same way I have done in the past.
  15. The thing is that you cannot easily change the memory timings on anything newer than Richland. In Kaveri AMD introduced a completely overhauled and "vastly improved" (truth: FUBAR) memory controller. In order to change the timings on these controllers, you'll need to create an array which contains all the timings and some other parameters. Once you have created the array, you'll need to stop the PMU (PHY management unit) clock, write the "argument array" to a certain register, send an interrupt to the PMU, wait for it to ack, restart the PMU clock and hope the thing didn't hang. The timings "can" be changed through the PCI config space as usual, but changing them this way doesn't have the same effect. When done from the bios the timings are programmed properly by AGESA. Take a wild guess which method AOD uses. Hopefully AMD will use an in-house designed IMC in 17h. These outsourced ones are either complete rubbish or they are just badly implemented into the design. Neither Steamroller nor Excavator IMCs can support > DDR-2400 without tampering with the BCLK...
  16. How have you evaluated the "no damage done" aspect? Still working?
  17. The maximum official VDDIO for AMD 28nm designs is 100mV less than for 32nm parts. It is highly advised not to exceed 1.7V even temporarily or you will risk frying the IMC.
  18. It's just a very minor improvement, around 2% in 32M. The main issue was fixed in the SR/XV design so further optimizations only yield a minor boost. There might be some additional stuff coming to address the NB-DRAM FIFO latency issue, but I cannot guarantee I'll find the time to do it. The fix is highly configuration dependent and therefore pretty damn time consuming to implement.
  19. Regarding BDC, OP will surely deliver, let's just wait... I finished some larger projects recently, so I could not find enough time to do this earlier. BDC R1.03B - Added support for Steamroller; Kaveri (KV-A1) & Via Drago (VD-A1). Steamroller only requires setting DSWS to enabled. It will slightly improve the performance in SuperPI depending on the digits used. Validated on Kaveri and Via Drago with the latest code base (Patch, SMU, PMU & ScS only). Some of the lesser AV will flag this SW as malware, but as long as you only use the original package you're safe. https://www.virustotal.com/fi/file/a00004302efbf4779c358b86e9ec66b8b8ed53304797627e42e50caa26a4f3f6/analysis/1426595828/ The checksum of the original package is DA817FFBCAFCE8C42702E4052A69FC81 (MD5). In case the checksum differs, discard the archive and re-download from another source. http://1drv.ms/1EsKd0c
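     For anyone who wants to script the checksum comparison, a minimal Python sketch using the standard `hashlib` module follows. The expected MD5 is the one from this post; the function names and the archive file name in the usage note are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "DA817FFBCAFCE8C42702E4052A69FC81"  # checksum quoted in the post

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 of a file as an upper-case hex string, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()

def is_original_package(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True only if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.strip().upper()
```

     Usage would be something like `is_original_package("BDC_R103B.zip")` (file name made up here); if it returns False, discard the archive as described above.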
  20. All of the boards without an external PLL are limited to 136MHz really. You can go higher, if you know how. A88X sucks at high BCLKs though.
  21. While going through some newer stuff I noticed that Kaveri SuperPi performance can be improved too. The main issue in Piledriver was fixed in Kaveri, however the fix is not complete. The fix improves the performance by 2%, which is not much but still something. I'll pop out a newer BDC 1.3 version when I find time to add the changes. The "LSU DSS-SLP" fix applies to both SR & XV designs.
  22. Only the SoCs and the GPUs are made on the TSMC 28nm node. The APUs are made on GlobalFoundries' new 28nm node.