HWBOT Community Forums

Posted

Context: Z97, Gigabyte SOC Force motherboard, Micron D9KPT memory.

If I understand correctly, any tRTP value lower than (tCAS + tBurst) - tRP should be meaningless, right? The page can't be closed until the burst (tBurst) is finished, even if precharging has already finished.

In my example case, I have

tCAS 5

tRP 5

tRTP 3 and 4

tBurst is always 4 (since a 4-bit burst still takes 4 clock cycles)
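Here is the bound proposed above, worked through with this example's timings. This is a minimal Python sketch of the hypothesis as stated in this post, not of how a memory controller actually schedules precharges. `proposed_min_trtp` is a hypothetical name.

```python
def proposed_min_trtp(tcas: int, tburst: int, trp: int) -> int:
    """Hypothesis from this post: a precharge that completes before the burst
    ends gains nothing, so tRTP values below (tCAS + tBurst) - tRP are hidden."""
    return (tcas + tburst) - trp


# Timings from this example, in clocks: tCAS 5, tRP 5, tBurst 4
bound = proposed_min_trtp(tcas=5, tburst=4, trp=5)
for trtp in (3, 4):
    verdict = "below the proposed bound" if trtp < bound else "at or above it"
    print(f"tRTP {trtp}: {verdict} (bound = {bound})")
```

The bound works out to 4, so under this hypothesis tRTP 3 and tRTP 4 should perform identically here.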

Some SuperPi 32M and AIDA64 results (the memory runs at 666 MHz, which is why the 32M times are around 7 minutes 46 seconds):

tRTP 3

SuperPi 32M times (s), sorted:
 1  465.828
 2  466.015
 3  466.110
 4  466.140
 5  466.203
 6  466.250
 7  466.343
 8  466.344
 9  466.344
10  466.375
11  466.453
12  467.891

Best half: avg 466.091, range 0.422, stdev 0.152
Fails: NEIR 17

AIDA64 memory bandwidth (MB/s):
           Read   Write  Copy
Run 1     20849  22328  19590
Run 2     20841  22336  19433
Run 3     20837  22328  19413
Run 4     20856  22324  19425
Run 5     20875  22335  19435
Average   20852  22330  19459

tRTP 4

SuperPi 32M times (s), sorted:
 1  465.594
 2  465.797
 3  465.890
 4  466.141
 5  466.156
 6  466.219
 7  466.265
 8  466.328
 9  466.453
10  466.484
11  466.860
12  467.015

Best half: avg 465.966, range 0.625, stdev 0.246
Fails: NEIR 19, 14, NCIS 7; NEIR 3

AIDA64 memory bandwidth (MB/s):
           Read   Write  Copy
Run 1     20785  22343  19428
Run 2     20822  22316  19452
Run 3     20838  22310  19431
Run 4     20851  22331  19452
Run 5     20789  22336  19442
Average   20817  22327  19441

In 32M, tRTP 4 is generally faster, although less stable (4 fails vs 1 fail for tRTP 3). In AIDA, tRTP 3 has unambiguously better read performance and, by extension, slightly better copy performance.

  • I don't know why tighter tRTP is slower in 32M
  • I don't know why tighter tRTP is more stable in 32M
  • I don't know why tighter tRTP has better performance in AIDA
  • Tighter tRTP shouldn't matter at all because page closure is waiting on data transfer (tBurst), not tRP, right?
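The summary figures in the two tables can be recomputed from the listed runs. Below is a minimal standard-library sketch. The best-half stdev in the tables matches the sample (n-1) standard deviation, which is what `statistics.stdev` returns. The AIDA64 averages are rounded to the nearest MB/s. The helper names are hypothetical.

```python
from statistics import mean, stdev

# Completed SuperPi 32M times (s), copied from the tables above
times = {
    "tRTP 3": [465.828, 466.015, 466.110, 466.140, 466.203, 466.250,
               466.343, 466.344, 466.344, 466.375, 466.453, 467.891],
    "tRTP 4": [465.594, 465.797, 465.890, 466.141, 466.156, 466.219,
               466.265, 466.328, 466.453, 466.484, 466.860, 467.015],
}

# AIDA64 (read, write, copy) runs in MB/s, copied from the tables above
aida = {
    "tRTP 3": [(20849, 22328, 19590), (20841, 22336, 19433), (20837, 22328, 19413),
               (20856, 22324, 19425), (20875, 22335, 19435)],
    "tRTP 4": [(20785, 22343, 19428), (20822, 22316, 19452), (20838, 22310, 19431),
               (20851, 22331, 19452), (20789, 22336, 19442)],
}


def best_half_stats(runs):
    """Mean, range and sample standard deviation of the fastest half of the runs."""
    best = sorted(runs)[: len(runs) // 2]
    return round(mean(best), 3), round(max(best) - min(best), 3), round(stdev(best), 3)


def aida_average(rows):
    """Column-wise average of (read, write, copy) rows, rounded to whole MB/s."""
    return tuple(round(mean(column)) for column in zip(*rows))


for label in times:
    print(label, best_half_stats(times[label]), aida_average(aida[label]))
```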

Is this just random error throwing me for a loop? Do you think increasing sample size will make this inconsistency go away?

Posted

No, it doesn't have to be. First I'll explain the prefetch architecture, then a read-to-precharge scenario with a single burst.

Many DDR memory systems use prefetching to keep the internal memory clock low while still allowing high transfer rates. A prefetch architecture uses an internal memory bus that is wider than the I/O bus by the prefetch factor. DDR3 and DDR4 use an 8n prefetch architecture, which means the internal memory bus is 8 times wider than the external I/O bus.

For reads, the prefetch architecture transfers the stored data from the internal core memory into the prefetch buffers. For writes, it transfers the data from the prefetch buffers into the internal memory. A transfer in either direction takes a single internal memory clock cycle, so 4 I/O bus clock cycles after a read command is issued, the data is in the prefetch buffers and ready to transfer. Because of this, a CAS latency below 4 is not possible on DDR3.
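The clock and width arithmetic above can be sketched in Python. This is a minimal illustration. It treats the 666 MHz figure from the first post as the I/O clock and assumes an x8 DRAM device, since the thread doesn't give the device width. `prefetch_figures` is a hypothetical helper, not part of any tool.

```python
def prefetch_figures(io_clock_mhz: float, prefetch_n: int, io_width_bits: int):
    """Relate the I/O clock, prefetch depth and internal bus width of a DDR device.

    Each I/O pin moves 2 bits per I/O clock (double data rate), so one internal
    (core) clock cycle that fetches prefetch_n bits per pin lasts prefetch_n / 2
    I/O clocks.
    """
    io_clocks_per_core_clock = prefetch_n // 2
    core_clock_mhz = io_clock_mhz / io_clocks_per_core_clock
    internal_bus_bits = prefetch_n * io_width_bits
    return io_clocks_per_core_clock, core_clock_mhz, internal_bus_bits


# 8n prefetch (DDR3/DDR4), 666 MHz I/O clock, assumed x8 device
ratio, core_mhz, internal_bits = prefetch_figures(666, 8, 8)
print(f"1 core clock = {ratio} I/O clocks, core clock = {core_mhz} MHz, "
      f"internal bus = {internal_bits} bits")
```

The ratio of 4 is where the "4 I/O clock cycles after the read command" figure comes from. It depends only on the prefetch depth, not on the clock speed or device width.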

This means the DRAM is free to be precharged just 4 clock cycles after the read command is issued, even though the burst has not happened yet.

Now I'll walk through a single read burst followed by a precharge.

[Diagram: read-to-precharge timing with CL = 16, tRCD = 16, tRP = 16, tRTP = 4]

This is a diagram I made of a read-to-precharge scenario; it applies to both DDR3 and DDR4 memory systems. It is a hypothetical situation with CL = 16, tRCD = 16, tRP = 16 and tRTP = 4, and these timings are all legal and viable.

In this scenario the bank is first activated, which opens the row that is going to be read from. tRCD clock cycles later, the read command is issued. It selects the column to read from and starts the internal transfer from the core memory to the prefetch buffer. Because DDR3 is 8n, the internal memory bus is 8 times wider, and because DDR3 is, well, DDR, the internal memory clock is 4 times slower than the physical I/O bus clock. This means the data reaches the prefetch buffer just 4 I/O clock cycles after the read command, and from that point on the memory can be precharged whenever.

Starting from the read command, tRTP is the read-to-precharge delay, while CL is the delay to the start of the burst. Both start counting at the read command, and neither depends on the other.

tRP is then the recovery from the precharge to when the memory can be activated or refreshed again.

So the minimum value for tRTP is just 4, not (CL + BL) - tRP. CL and BL don't even need to be taken into account, since they run along a separate path, and tRP starts once tRTP has expired.
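The timeline described above can be sketched as a small calculation. It follows the model in this post only: tRTP and CL both count from the READ command, and tRP starts at PRECHARGE. It deliberately ignores other constraints such as tRAS (the minimum ACTIVATE-to-PRECHARGE time), so treat it as an illustration of the argument rather than a full DDR3/DDR4 timing checker. `Timings` and `read_to_precharge_timeline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    trcd: int
    cl: int
    trp: int
    trtp: int
    burst_clocks: int = 4  # an 8-beat burst on a DDR bus occupies 4 I/O clocks


def read_to_precharge_timeline(t: Timings) -> dict:
    """Event times in I/O clocks, with ACTIVATE at clock 0.

    tRTP and CL both count from READ independently; tRP counts from PRECHARGE.
    Other constraints (e.g. tRAS) are not modelled.
    """
    read = t.trcd
    burst_start = read + t.cl
    return {
        "READ": read,
        "burst start": burst_start,
        "burst end": burst_start + t.burst_clocks,
        "PRECHARGE": read + t.trtp,
        "next ACTIVATE": read + t.trtp + t.trp,
    }


timeline = read_to_precharge_timeline(Timings(trcd=16, cl=16, trp=16, trtp=4))
for event, clock in sorted(timeline.items(), key=lambda item: item[1]):
    print(f"{clock:3d}  {event}")
```

In this model the burst never enters the PRECHARGE calculation. With these timings, PRECHARGE goes out at clock 20, 12 clocks before the burst starts at clock 32.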

I hope this helps, I'm happy to answer any other questions you might have :)

Posted

Thanks for making an account to reply! I'm not sure I understand correctly yet, though.

  • Why do we have to wait tCAS clocks to start the burst if the data is already in the buffer 4 clocks after the read command was processed?
  • And why does this motherboard allow me to set tRTP to 3? Setting it to 3 isn't a meaningless change, it changes performance, and it shows up in the Gigabyte software that shows memory timings in the OS. Is it setting tRTP to 3 only in cases where the burst has been chopped to 4 bits?

Do you happen to work in the field?

Posted
5 hours ago, nnimrod said:

Setting it to 3 isn't a meaningless change, it changes performance

Data in your table displays just a normal variance imho. I think after 100 runs both datasets will average to the same values.
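One rough way to check whether the 32M gap stands out from run-to-run noise is Welch's t statistic on the 12 completed runs per setting. This is a minimal standard-library sketch. It assumes the runs are independent samples and leaves out the failed runs. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Completed SuperPi 32M times (s) from the tables in the first post
trtp3 = [465.828, 466.015, 466.110, 466.140, 466.203, 466.250,
         466.343, 466.344, 466.344, 466.375, 466.453, 467.891]
trtp4 = [465.594, 465.797, 465.890, 466.141, 466.156, 466.219,
         466.265, 466.328, 466.453, 466.484, 466.860, 467.015]


def welch_t(a, b):
    """Welch's t statistic: difference of means over its unpooled standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t = welch_t(trtp3, trtp4)
print(f"mean difference = {mean(trtp3) - mean(trtp4):.3f} s, t = {t:.2f}")
```

For these runs t comes out at roughly 0.5. That is far below the value of about 2 usually needed to call a difference significant at these sample sizes, which is consistent with the gap being normal variance.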

Posted
6 hours ago, TerraRaptor said:

Data in your table displays just a normal variance imho. I think after 100 runs both datasets will average to the same values.

I fear you might be correct. But surely I'm not alone in my reluctance to run 32M 200 times just to know for sure whether one secondary timing is faster...

Confidence in the results would be better if I could reduce the variance, and I had much lower variance until I tightened to CAS 5. Previously, at CAS 6, I was down to about 0.2 s or less of variation across the best half of 12 runs. CAS 5 brought better best runs and worse worst runs. I hope I can get the variance down again by going back over the secondaries/tertiaries.

One of the important tertiaries was skewing tRDRD_dr/tRDRD_dd to 5/6. tWR 9 was also very important.
