HWBOT Community Forums

The Stilt

Members
  • Posts: 101

Posts posted by The Stilt

  1. hgsHdP6.png

     

    Running Benchmarks for AMD Zen...
    
    Single-Precision - 128-bit SSE - Add/Sub:
       Dependency Chains  = 8
       Result     = 345.6
       FP Ops     = 1024000000000
       seconds    = 5.10914
       GFlops     = 200.425
    
    Single-Precision - 128-bit SSE - Multiply:
       Dependency Chains = 12
       Result     = 595.2
       FP Ops     = 1536000000000
       seconds    = 7.50842
       GFlops     = 204.57
    
    Single-Precision - 128-bit SSE - Multiply + Add:
       Dependency Chains = 12
       Result     = 595.2
       FP Ops     = 1536000000000
       seconds    = 5.6472
       GFlops     = 271.993
    
    Single-Precision - 128-bit FMA3 - Fused Multiply Add:
       Dependency Chains = 12
       Result     = 595.2
       FP Ops     = 3072000000000
       seconds    = 6.67126
       GFlops     = 460.483
    
    Double-Precision - 128-bit SSE2 - Add/Sub:
       Dependency Chains  = 8
       Result     = 172.8
       FP Ops     = 512000000000
       seconds    = 4.48418
       GFlops     = 114.179
    
    Double-Precision - 128-bit SSE2 - Multiply:
       Dependency Chains = 12
       Result     = 297.6
       FP Ops     = 768000000000
       seconds    = 7.00079
       GFlops     = 109.702
    
    Double-Precision - 128-bit SSE2 - Multiply + Add:
       Dependency Chains = 12
       Result     = 297.6
       FP Ops     = 768000000000
       seconds    = 7.5535
       GFlops     = 101.675
    
    Double-Precision - 128-bit FMA3 - Fused Multiply Add:
       Dependency Chains = 12
       Result     = 297.6
       FP Ops     = 1536000000000
       seconds    = 6.71436
       GFlops     = 228.763
    
    Single-Precision - 256-bit AVX - Add/Sub:
       Dependency Chains  = 8
       Result     = 691.2
       FP Ops     = 2048000000000
       seconds    = 8.89565
       GFlops     = 230.225
    
    Single-Precision - 256-bit AVX - Multiply:
       Dependency Chains = 12
       Result     = 1190.4
       FP Ops     = 3072000000000
       seconds    = 13.3701
       GFlops     = 229.767
    
    Single-Precision - 256-bit AVX - Multiply + Add:
       Dependency Chains = 12
       Result     = 1190.4
       FP Ops     = 3072000000000
       seconds    = 8.31182
       GFlops     = 369.594
    
    Single-Precision - 256-bit FMA3 - Fused Multiply Add:
       Dependency Chains = 12
       Result     = 1190.4
       FP Ops     = 6144000000000
       seconds    = 13.3439
       GFlops     = 460.433
    
    Double-Precision - 256-bit AVX - Add/Sub:
       Dependency Chains  = 8
       Result     = 345.6
       FP Ops     = 1024000000000
       seconds    = 8.89834
       GFlops     = 115.078
    
    Double-Precision - 256-bit AVX - Multiply:
       Dependency Chains = 12
       Result     = 595.2
       FP Ops     = 1536000000000
       seconds    = 13.6687
       GFlops     = 112.374
    
    Double-Precision - 256-bit AVX - Multiply + Add:
       Dependency Chains = 12
       Result     = 595.2
       FP Ops     = 1536000000000
       seconds    = 8.52216
       GFlops     = 180.236
    
    Double-Precision - 256-bit FMA3 - Fused Multiply Add:
       Dependency Chains = 12
       Result     = 595.2
       FP Ops     = 3072000000000
       seconds    = 13.3443
       GFlops     = 230.211
    

     

    Flops version 2, compiled with MSVC 2015 Update 3 using the standard project settings. I copied the Haswell header (arch_2013_Haswell to arch_2017_Zen) and changed "Running Benchmarks for Intel Haswell..." to "Running Benchmarks for AMD Zen...".

     

    No other changes.
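For anyone who wants to sanity-check the log: the GFlops column appears to be just FP Ops divided by seconds (and by 1e9). A minimal Python sketch follows, using three rows copied from the output above. The function name and the dictionary layout are my own, not part of the Flops tool.

```python
# Recompute GFlops from the "FP Ops" and "seconds" values printed in the log.
# Each entry: (FP Ops, seconds, GFlops as reported by the benchmark).
runs = {
    "SP 128-bit SSE Add/Sub": (1024000000000, 5.10914, 200.425),
    "SP 256-bit FMA3": (6144000000000, 13.3439, 460.433),
    "DP 256-bit FMA3": (3072000000000, 13.3443, 230.211),
}


def gflops(fp_ops, seconds):
    """Floating-point operations per second, expressed in units of 1e9."""
    return fp_ops / seconds / 1e9


for name, (ops, secs, reported) in runs.items():
    print(f"{name}: {gflops(ops, secs):.3f} computed vs {reported} reported")
```

Small differences in the last digit are expected, since the seconds value in the log is already rounded.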

  2. I wonder if it's easier to get high frequency in Windows 7. My chip can only do 120MHz Pifast 1/100 runs or so (usually stuck at 118 MHz).

     

    @The Stilt: I'm quite confident that the super high Vcore is only to get the high BCLK stable. I can run 117x38 at 1.5v with ~ 10c 32M stable, but had it up at 1.7V for my final runs.

     

    It's really hard to tell where the X4 845 would be if unlocked.

     

     

     

    Thanks for the info!

     

    I wonder if the training is happening properly, actually. @l0ud_sil3nc3 mentioned this already, but on my end I see no actual benefit from dialing in timings from the BIOS. In fact, apart from tCL and tRCD I see no change in performance or stability when adjusting any of the other timings. Usually adjusting the timings makes things worse performance-wise.

     

    I guess I'm going to pack things up for now and wait for Bristol Ridge to appear.

     

    But seriously, WTF AMD ...

     

    The training and timings work perfectly on Excavator as long as AGESA receives the correct parameters to use (from the BIOS). If the timings are not working as expected, I would assume that's because of BIOS bugs.

     

    The AGESA required by FM2+ Carrizos is a cluster *uck. The same code has to support five different chips at the same time (Trinity, Richland, Kaveri, Godavari, Carrizo)...

     

    AGESA itself of course has different paths for all of these, however they are pretty hard to implement from the BIOS side. So I would assume that it is more an issue with the BIOS than with anything else.

  3. @The Stilt: is it possible to change the DCT timings at run-time? I can execute read commands, but any write command results in 1) no change to timings and 2) about 5 seconds later a hard shutdown.

     

    The same thing applies on Excavator too, since the DDR3 controllers on Steamroller and Excavator are identical. The only major difference is that on Excavator the PMU SRAM interface is actually working, which makes it possible to train and configure the memory parameters correctly, unlike on Steamroller. Sad stuff :(

     

    The PMU communication should be explained in the BKDG (not sure about the public docs), but it is quite a complex procedure. Also, with Excavator you need to take into account that it is purely a mobile chip. You need to write the parameters in the right context (i.e. with the correct MemPS and NBPS targets). It is certainly possible on Excavator, but it is a bunnying nightmare to do.

     

    Not worth doing, IMO. Do what you can from the BIOS :(

     

    These chips are not any kind of priority for me and I've been working on other stuff instead. Check your EDC reading in recent HWInfo beta versions, it could come in handy ;)

  4. No need to send anything over as these chips can be had for 60€ or so. I just haven't had any interest in these since I already have a Carrizo in a laptop, which I have already tested thoroughly.

     

    So let's say you start SuperPI: does the executing core (CU) jump to the 38x multiplier? If that's the case then Turbo is obviously active and working. Carrizo is the first AMD chip which can accurately monitor its operating parameters and adjust the frequency accordingly. At some point you will most likely be limited by the TDP limit, at least when multiple cores are used. Even the mobile Carrizos running at significantly lower voltages require around 50W TDP to maintain all cores at 3400MHz during Cinebench.

     

    I don't expect that the "number of boosted states" can be changed on this CPU. You would need to change it to zero in order to constantly use the highest available multiplier (38x). However, if you're already limited by the TDP, then it obviously won't help much. The only other way around would basically be cheating the power management. If AMD hasn't disabled the TDP control (through SMU), increasing the TDP limit to "sufficient" levels would make the chip run constantly at the maximum frequency under load. This works for mobile Carrizos at least.

     

    Also, the information displayed by MSRTweaker is wrong for the most part. The displayed voltages are wrong (SVI scale used instead of SVI2) and the other information is not displayed properly either. The multipliers are correct, but that's about it.

     

    To know your original voltages:

     

    Calculate the delta between the voltage displayed by MSRTweaker (each PState) and 1.55V. Divide the delta by two and add it to the displayed value. 1.40000V for "P0" (Pb0) is actually 1.47500V.
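The correction described above is simple enough to put into a few lines of Python. This is only a sketch of the arithmetic as stated in the post; the function and constant names are made up for illustration.

```python
REFERENCE_V = 1.55  # the fixed voltage the post says to take the delta against


def corrected_vid(displayed_v):
    """Add half of the delta to 1.55V onto the value MSRTweaker displays."""
    delta = REFERENCE_V - displayed_v
    return displayed_v + delta / 2


# The example from the post: 1.40000V displayed for Pb0.
print(f"{corrected_vid(1.40):.5f}")  # prints 1.47500
```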

     

    I'll let you know if I find a good way to solve the pending issue.

     

    Edit: Check D18F4x15C.

     

    Bit 31:31 is 1, correct (BoostLock)?

    Bits 4:2 are 2, correct (NumBoostStates; Pb0 38x & Pb1 37x on this CPU)?

     

    If that's the case, then increasing the TDP is most likely the only way. Unless you can "tune" the engineer sandbox fuses... :D
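If you have a raw 32-bit dump of D18F4x15C, the two fields mentioned above can be pulled out with a couple of shifts and masks. A rough Python sketch follows. It only decodes the two fields named in the post (bit 31 and bits 4:2). How you read the register is up to your tool of choice, and the sample value below is invented for the example, not a dump from a real chip.

```python
def decode_boost_fields(raw):
    """Extract BoostLock (bit 31) and NumBoostStates (bits 4:2) from a 32-bit value."""
    return {
        "BoostLock": (raw >> 31) & 0x1,
        "NumBoostStates": (raw >> 2) & 0x7,
    }


# Made-up value with bit 31 set and bits 4:2 equal to 2 (binary 010).
sample = 0x80000008
print(decode_boost_fields(sample))
```

With the made-up sample this reports BoostLock = 1 and NumBoostStates = 2, i.e. the locked case asked about above.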

  5. Athlon X4 845 is a locked SKU so you cannot configure the maximum boosted multiplier for any other PState than Pb0. On Carrizo, each PState can only be programmed with a FID and VID <= the original FID and VID of that PState.

     

    Carrizo is a PITA to work with anyway. I can think of several ways to get around this using SMU, but the implementation would be pretty complex and I have not been able to test it since I don't have any of these chips available.

     

    Also, you cannot force a switch to any boosted PState, since the PState 0 command points to P0 instead of Pb0. Boosted PStates are not visible to the PStateCMD register unless you set the "number of boosted states" value to zero (which cannot be done with ease).

     

    It seems that BR will be useless too as it appears that AMD won't be releasing any unlocked SKUs :rolleyes:

  6. Michal, are you running CB at default voltages (i.e. voltage not adjusted or left on Auto)? Athlon X4 845 should have Pb0 (3800MHz) VID < 1.475V. Since the CB15 score is so poor, the chip most likely throttles due to the TDP (PPT) limit of 65W being exceeded. Carrizo is the first AMD CPU / APU which can measure its power consumption accurately, so lowering the voltage should address the throttling.

     

    Athlon X4 845 should score > 300 in Cinebench R15 at default clocks (3.5GHz base). Even the FX-8800P scores 288pts at 35/42W TDP.

  7. Let's put it this way. I have no truly in-depth technical information about 17h or the AM4 platform in general, but based on the information I have, the new stuff might be quite a hostile target for overclockers. Putting two completely differently targeted designs on the same infrastructure (AM4) is a huge compromise in itself.

     

    Also, when you see both Intel and nVidia implementing high performance targeted nodes for their flagship products while AMD is doing a low power targeted node all the way... I'm not saying the 14nm LPP is completely rubbish, I'm just questioning its suitability for a high performance CPU. If you look at the difference between the two 14nm Intel nodes (P1272 & P1273), the high performance node used on Skylake does significantly better than the efficiency / density optimized one used on Broadwell.

     

    If the 17h happens to exceed my expectations and the other issues can be solved, then I have no issues in supporting the platform in the same way I have done in the past.

  8. I don't think AOD even supports Kaveri. //ninja-edit: apparently the latest version actually does!

     

    The Stilt used to support AMD with enthusiast-grade software, but I don't think AMD is too bothered. Haven't heard of any recent tool that supports the latest architectures.

     

    The thing is that you cannot easily change the memory timings on anything newer than Richland. In Kaveri AMD introduced a completely overhauled and "vastly improved" (truth: FUBAR) memory controller. In order to change the timings on these controllers, you'll need to create an array which contains all the timings and some other parameters. Once you have created the array, you'll need to stop the PMU (PHY management unit) clock, write the "argument array" to a certain register, send an interrupt to the PMU, wait for it to ack, restart the PMU clock and hope the thing didn't hang :rolleyes:
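To make the ordering of those steps easier to follow, here is a toy Python sketch. It does not talk to any hardware: `FakePmu` is a made-up stand-in that only logs what it is asked to do. The method names and the timing values are invented for illustration and are not taken from the BKDG or from AGESA.

```python
class FakePmu:
    """Hypothetical stand-in for the PMU; it records calls instead of touching registers."""

    def __init__(self, acks=True):
        self.log = []
        self.acks = acks  # set to False to simulate a PMU that never answers

    def stop_clock(self):
        self.log.append("stop_clock")

    def write_arguments(self, args):
        self.log.append(f"write_arguments({len(args)} values)")

    def send_interrupt(self):
        self.log.append("send_interrupt")

    def wait_for_ack(self):
        self.log.append("wait_for_ack")
        return self.acks

    def start_clock(self):
        self.log.append("start_clock")


def apply_timings(pmu, timings):
    """Run the steps in the order described in the post."""
    args = list(timings.values())  # the "argument array"
    pmu.stop_clock()
    pmu.write_arguments(args)
    pmu.send_interrupt()
    if not pmu.wait_for_ack():
        # On real hardware this is the "hope the thing didn't hang" case.
        raise TimeoutError("PMU did not acknowledge the request")
    pmu.start_clock()


pmu = FakePmu()
apply_timings(pmu, {"tCL": 11, "tRCD": 13})  # illustrative values only
print(pmu.log)
```

The point is only the sequence: the clock is stopped before the array is written, and it is restarted only after the ack arrives.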

     

    The timings "can" be changed through the PCI config space as usual, but changing them this way doesn't have the same effect. When done from the BIOS, the timings are programmed properly by AGESA.

     

    Take a wild guess which method AOD uses ;)

     

    Hopefully AMD will use an in-house designed IMC in 17h. These outsourced ones are either complete rubbish or they are just badly implemented into the design. Neither Steamroller nor Excavator IMCs can support > DDR-2400 without tampering with the BCLK...

  9. It's just a very minor improvement, around 2% in 32M.

    The main issue was fixed in SR/XV design so further optimizations only yield a minor boost.

     

    There might be some additional stuff coming to address the NB-DRAM FIFO latency issue, but I cannot guarantee I'll find the time to do it.

    The fix is highly configuration dependent and therefore pretty damn time consuming to implement.

  10. Regarding BDC, OP will surely deliver, let's just wait...

    Finished some larger projects recently so I could not find enough time to do this earlier.

     

    BDC R1.03B

     

    - Added support for Steamroller; Kaveri (KV-A1) & Via Drago (VD-A1).

     

    Steamroller only requires setting DSWS to enabled.

    It will slightly improve the performance in SuperPI depending on the digits used.

     

    Validated on Kaveri and Via Drago with the latest code base (Patch, SMU, PMU & ScS only).

     

    Some of the lesser AV products will flag this software as malware, but as long as you only use the original package you're safe.

     

    https://www.virustotal.com/fi/file/a00004302efbf4779c358b86e9ec66b8b8ed53304797627e42e50caa26a4f3f6/analysis/1426595828/

     

    The checksum of the original package is DA817FFBCAFCE8C42702E4052A69FC81 (MD5).

    In case the checksum differs, discard the archive and re-download from another source.
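If you don't have a checksum tool at hand, a few lines of Python using the standard `hashlib` module will do. This is a minimal sketch: the helper names are my own, and the demo at the bottom hashes an in-memory placeholder rather than the real archive.

```python
import hashlib
import io

# MD5 of the original package, as given in the post.
EXPECTED_MD5 = "DA817FFBCAFCE8C42702E4052A69FC81"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the uppercase hex MD5 of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(stream, expected=EXPECTED_MD5):
    """True if the stream's MD5 equals the expected checksum (case-insensitive)."""
    return md5_of_stream(stream) == expected.upper()


# Demo on placeholder bytes; anything other than the original package should fail.
print(matches_expected(io.BytesIO(b"not the real archive")))
```

For the actual download, open the archive in binary mode and pass the file object in, e.g. `with open(path_to_archive, "rb") as f: matches_expected(f)`, where `path_to_archive` is wherever you saved it.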

     

    http://1drv.ms/1EsKd0c

  11. While going through some newer stuff I noticed that Kaveri SuperPi performance can be improved too. The main issue in Piledriver was fixed in Kaveri, however the fix is not complete.

     

    The fix improves the performance by 2%, which is not much but still something.

     

    I'll pop out a newer BDC 1.3 version when I find time to add the changes.

    The "LSU DSS-SLP" fix applies to both SR & XV designs.

  12. That is the question... It's TSMC 28nm, hard to say... I heard some speculation about 28nm TSMC and FX: one of the reasons there are no more FX chips yet is TSMC's process technology. In theory 28nm TSMC is more power hungry than the current 32nm SOI HK at the same die size. So maybe some FX on a new process (20nm?)?

     

    Only the SoCs and the GPUs are made on TSMC's 28nm node.

    The APUs are made on GlobalFoundries' new 28nm node.
