nnimrod Posted July 29, 2021

Context is a Z97 Gigabyte SOC Force motherboard with Micron D9KPT memory.

If I understand correctly, a value for tRTP of less than (tCAS + tBurst) - tRP should be meaningless, right? Since the page can't be closed until it's finished with tBurst, even if it's finished precharging.

In my example case, I have tCAS 5 and tRP 5, and I tested tRTP 3 and 4. tBurst is always 4 (since a 4 bit burst still takes 4 clock cycles).

Some SuperPi 32m and AIDA64 results (666 MHz memory is why a run takes 7 minutes ~46 seconds):

tRTP 3, SuperPi 32m

Run  Time (s)  Notes
1    465.828
2    466.015
3    466.110
4    466.140
5    466.203
6    466.250
7    466.343
8    466.344
9    466.344
10   466.375
11   466.453
12   467.891   NEIR 17

best half avg    466.091
best half range  0.422
best half stdev  0.152

tRTP 3, AIDA64

         read   write  copy
         20849  22328  19590
         20841  22336  19433
         20837  22328  19413
         20856  22324  19425
         20875  22335  19435
average  20852  22330  19459

tRTP 4, SuperPi 32m

Run  Time (s)  Notes
1    465.594
2    465.797
3    465.890
4    466.141
5    466.156
6    466.219
7    466.265
8    466.328   NEIR 19, 14, NCIS 7
9    466.453
10   466.484
11   466.860
12   467.015   NEIR 3

best half avg    465.966
best half range  0.625
best half stdev  0.246

tRTP 4, AIDA64

         read   write  copy
         20785  22343  19428
         20822  22316  19452
         20838  22310  19431
         20851  22331  19452
         20789  22336  19442
average  20817  22327  19441

In 32m, tRTP 4 is generally better, although less stable (4 fails vs 1 fail for tRTP 3). In AIDA, tRTP 3 has unambiguously better read performance and, by extension, a little better copy.

I don't know why tighter tRTP is slower in 32m.
I don't know why tighter tRTP is more stable in 32m.
I don't know why tighter tRTP has better performance in AIDA.

Tighter tRTP shouldn't matter at all because page closure is waiting on data transfer (tBurst), not tRP, right? Is this just random error throwing me for a loop? Do you think increasing sample size will make this inconsistency go away?
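The best-half figures in the post can be reproduced from the listed times. This is a minimal sketch, assuming "best half" means the six fastest of the twelve runs and that the stdev is the sample standard deviation; both are inferred from the numbers rather than stated in the post, and `best_half_stats` is a hypothetical helper name.

```python
from statistics import mean, stdev

# SuperPi 32m times in seconds, copied from the post.
TRTP3 = [465.828, 466.015, 466.110, 466.140, 466.203, 466.250,
         466.343, 466.344, 466.344, 466.375, 466.453, 467.891]
TRTP4 = [465.594, 465.797, 465.890, 466.141, 466.156, 466.219,
         466.265, 466.328, 466.453, 466.484, 466.860, 467.015]


def best_half_stats(times):
    """Return (average, range, sample stdev) of the fastest half of the runs."""
    best = sorted(times)[: len(times) // 2]
    return mean(best), best[-1] - best[0], stdev(best)


for label, times in (("tRTP 3", TRTP3), ("tRTP 4", TRTP4)):
    avg, rng, sd = best_half_stats(times)
    print(f"{label}: avg {avg:.3f}  range {rng:.3f}  stdev {sd:.3f}")
```

Taking only the best half discards the slow outliers, which is why the two settings look closer here than the worst runs alone would suggest.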
alatron978 Posted July 30, 2021

No, it does not have to be. First I'll explain the prefetch architecture, then a read-to-precharge scenario with a single burst.

Many DDR memory systems use prefetching to reduce the internal memory clock while still allowing high transfer rates. The prefetch architecture uses an internal memory bus that is wider than the I/O bus by the prefetch factor. DDR3 and DDR4 use an 8n prefetch architecture, which means the internal memory bus is 8 times wider than the external I/O bus. For reads, data is transferred from the internal core memory into the prefetch buffers; for writes, data is transferred from the prefetch buffers into the internal memory. It takes a single internal memory clock cycle to transfer this data either way, meaning that 4 I/O bus clock cycles after the read command is issued, the data will be in the prefetch buffers and ready to transfer. Because of this, a CAS latency below 4 is not possible on DDR3. It also means the DRAM is free to be precharged just 4 clock cycles after the read command is issued, even though the burst has not occurred yet.

Now I'll explain a single read burst followed by a precharge. This is a diagram of a read-to-precharge scenario I made that applies to both DDR4 and DDR3 memory systems. This is a hypothetical situation where CL = 16, tRCD = 16, tRP = 16 and tRTP = 4; these timings are all legal and viable.

In this scenario the memory is first activated, which opens the row that is going to be read from. Then, tRCD clock cycles later, the read command is issued, which selects the column to read from and starts the internal data transfer from the internal memory to the prefetch buffer. As DDR3 is 8n, the internal memory bus is 8 times wider, and as DDR3 is, well, DDR, the internal memory clock is 4 times slower than the physical I/O bus clock. This means that the data is transferred to the external prefetch buffer just 4 I/O clock cycles later. After this point, the memory can be precharged whenever.

From the moment the read command is issued, tRTP is the read-to-precharge delay, while CL is the delay to the start of the burst. Both delays start simultaneously and neither depends on the other. tRP is then the recovery time from the precharge until the memory can be activated or refreshed again. So the minimum value for tRTP is just 4, not (CL + BL) - tRP. CL and BL don't even need to be accounted for since they go down a different path, and tRP starts when tRTP has expired.

I hope this helps. I'm happy to answer any other questions you might have.
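The two independent paths described in the post can be laid out as a small timeline. This is a simplified sketch under stated assumptions: ACTIVATE is placed at clock 0, a BL8 burst occupies 4 I/O clocks, and other constraints such as tRAS and additive latency are ignored. `Timings` and `read_to_precharge` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    cl: int    # CAS latency: READ to first data on the bus
    trcd: int  # ACTIVATE to READ
    trp: int   # PRECHARGE to next ACTIVATE
    trtp: int  # READ to PRECHARGE
    burst_clocks: int = 4  # assumption: BL8 at double data rate takes 4 clocks


def read_to_precharge(t: Timings) -> dict:
    """Clock at which each event occurs, with ACTIVATE at clock 0."""
    read = t.trcd
    precharge = read + t.trtp  # precharge path: depends only on tRTP
    burst_start = read + t.cl  # data path: depends only on CL
    return {
        "read": read,
        "precharge": precharge,
        "burst_start": burst_start,
        "burst_end": burst_start + t.burst_clocks,
        "next_activate": precharge + t.trp,
    }


# The hypothetical timings from the post.
events = read_to_precharge(Timings(cl=16, trcd=16, trp=16, trtp=4))
for name, clock in events.items():
    print(f"{name:>13}: clock {clock}")
```

With the example timings, the precharge lands at clock 20 while the burst does not start until clock 32, which is the point of the post: in this model the precharge does not wait for the burst.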
nnimrod Posted July 30, 2021 Author

Thanks for making an account to reply! I'm not sure I understand correctly yet, though. Why do we have to wait tCAS clocks to start the burst if the data is already in the buffer 4 clocks after the read command was processed? And why does this motherboard allow me to set tRTP to 3? Setting it to 3 isn't a meaningless change, it changes performance, and it shows up in the Gigabyte software that shows memory timings in the OS. Is it setting tRTP to 3 only in cases where the burst has been chopped to 4 bits? Do you happen to work in the field?
TerraRaptor Posted July 30, 2021

5 hours ago, nnimrod said:
Setting it to 3 isn't a meaningless change, it changes performance

Data in your table displays just a normal variance imho. I think after 100 runs both datasets will average to the same values.
nnimrod Posted July 30, 2021 Author

6 hours ago, TerraRaptor said:
Data in your table displays just a normal variance imho. I think after 100 runs both datasets will average to the same values.

I fear you might be correct. But surely I'm not alone in my reluctance to run 32m 200 times to know for sure if one secondary timing is faster... Confidence in results would be better if I could reduce variance. And I had much better variance until I tightened to CAS 5. Previously, at CAS 6, I was down to about 0.2 or less variance across the best half of 12 runs. CAS 5 brought better best runs and worse worst runs. I hope that I can get the variance down again by going back over the secondaries/tertiaries. One of the important tertiaries was skewing tRDRD_dr/_dd to 5/6. tWR 9 was also very important.
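One rough way to check whether the two sets of twelve runs differ by more than run-to-run noise is Welch's t statistic, which divides the difference in means by its standard error. This is only a sketch: it assumes the runs are independent and roughly normally distributed, which the slow outliers in each set make questionable, and `welch_t` is a hypothetical helper name. The times are the SuperPi 32m results from the first post.

```python
from math import sqrt
from statistics import mean, variance

# SuperPi 32m times in seconds, copied from the first post.
TRTP3 = [465.828, 466.015, 466.110, 466.140, 466.203, 466.250,
         466.343, 466.344, 466.344, 466.375, 466.453, 467.891]
TRTP4 = [465.594, 465.797, 465.890, 466.141, 466.156, 466.219,
         466.265, 466.328, 466.453, 466.484, 466.860, 467.015]


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t = welch_t(TRTP3, TRTP4)
print(f"mean tRTP 3: {mean(TRTP3):.3f} s")
print(f"mean tRTP 4: {mean(TRTP4):.3f} s")
print(f"Welch t:     {t:.2f}")
```

As a rule of thumb, a |t| well below about 2 is usually read as no clear difference at this sample size, so a small value here would be consistent with the suggestion that the gap is ordinary variance.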