HWBOT Community Forums

mickulty

Members
  • Posts: 544
  • Days Won: 4

Posts posted by mickulty

  1. 10 hours ago, websmile said:

    The by far best way to limit stress and stop hardware from failing is to stop overclocking and of course benching. Great proposal, but I am not sure this is the correct forum to post it..

    Well, if you overclock you accept risk; that's what we tell people on day 1, isn't it?  But I don't want to tell people straight up "wrong hobby" when they start complaining about that risk.  If someone's going to be so concerned, it's better to work out a way to mitigate it, and people pick their own risk level.  You can't have it both ways, though: it doesn't make sense to go "my chip died, bench bad" and then "I don't wanna have limits, that's not the point".

    Like... yeah, share your experiences.  Warn people.  Get upset, I would.  It sucks when hardware dies.  I just think it comes a long way below relevance, security and ability to run on different hardware when deciding if a benchmark is a good benchmark.

    Off-topic but maybe relevant to some readers, apparently Luumi left his 9900K in a dewar for a couple of days and it's reanimated.

    • Like 3
  2. 23 hours ago, Splave said:

    So the answer is to put amperage limits on my cpu while doing xoc....that sounds quite moronic no? 

    It sounds counterintuitive that you can make your eyesight *better* by putting weird curvy bits of glass in front of your eyes as well, but you don't see me insisting to you that small books should just be got rid of and these "glasses" things are moronic.

    Yes, there is an amount of stress that causes parts to fail.  A great way to stop them from failing is to limit the stress to below that amount.

    • Like 1
  3. 8 hours ago, Splave said:

    Yes you are just supposed to arbitrarily stop at a mythical voltage limit which is the square root of cotton candy divided by unicorns, then you will be safe. How do we even know vcore is the issue and not system agent or io voltage etc? We don't know and will never know. 

    People run at hard memory settings at ambient and we don't hear of deaths from that, and LN2 should be safer at the same voltage, so there's no need for the mud of the memory controller to cloud the waters.

    You should be able to set current limits in BIOS per Intel's datasheets: 193A for 8-core Coffee Lake, 138A for 6-core (-K SKUs; some lower-power bins have slightly less tolerance) and 100A for 4-core.  These are specified on page 124 of the 8th gen Core family datasheet, volume 1.  Above those limits, chips are going to be damaged - that's the nature of extreme overclocking.

    As for temperature scaling, it's a long time since I've done this kind of 'pure' maths, but after filling two sides of A5 with rearrangements and simplifications of Black's equation, I think the "safe" current (i.e. the one giving the same MTTF as at the max specified by Intel)... does still ultimately depend on unknown factors.  I tried (I even thought I'd nailed down a proportional relationship to e^(1/T) before I checked my work and found a mistake lol; it's actually proportional to (e^(1/T))^x, where x is some unknown constant depending on the material properties, so fat lot of good that is).  But honestly, that's life.  The safe current will be higher when cold, but it might be only 0.00001A higher; it all depends on the actual material properties of Intel's process, specifically of whatever part turns out to be the weak point.
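    For anyone who wants to check the rearrangement, here is a sketch of it.  It assumes the usual textbook form of Black's equation, where J is current density, T is absolute temperature, k is Boltzmann's constant, and A, n and E_a are fitted constants that nobody outside Intel knows for this process.  J_0 and T_0 stand for the datasheet operating point and J_1, T_1 for the cold one; treating current and current density as interchangeable assumes the same conductor geometry.

```latex
\begin{align*}
\mathrm{MTTF} &= A\,J^{-n}\exp\!\left(\frac{E_a}{kT}\right) \\
A\,J_1^{-n}\exp\!\left(\frac{E_a}{kT_1}\right) &= A\,J_0^{-n}\exp\!\left(\frac{E_a}{kT_0}\right) \\
\left(\frac{J_1}{J_0}\right)^{n} &= \exp\!\left(\frac{E_a}{k}\left(\frac{1}{T_1}-\frac{1}{T_0}\right)\right) \\
J_1 &= J_0\,\exp\!\left(\frac{E_a}{nk}\left(\frac{1}{T_1}-\frac{1}{T_0}\right)\right)
\;\propto\; \left(e^{1/T_1}\right)^{x}, \qquad x = \frac{E_a}{nk}
\end{align*}
```

    The exponent x = E_a/(nk) is exactly the "unknown constant depending on the material properties", which is why the result can't be turned into a number without Intel's data.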

    Of course, this stuff about setting current limits is all assuming that electromigration is THE explanation.  It could, as bones said, be related to mechanical stress from differences in thermals in different parts of the chip or substrate.  That would have the same possibility of brief "reanimation" when refrozen.  It could even be a combination of both.

    What we do know for sure is it's not x265's fault if Intel chips get really weak at true 100% load, and you no longer have roll-over there to hold your hand.

  4. To bring this back to benchmarks rather than ridiculous ad hominems;

    For those who don't know, one of the ways chips fail is electromigration.  This is a phenomenon where the flow of electrons gradually displaces metal atoms in the conductor, damaging it over time.  Because this is a physical effect, it's entirely possible that a connection could be brought past the brink while cold but not actually physically disconnect until the slight expansion associated with warming back up happens.  This would appear to fit with what people have reported (I'd be really interested to hear the results of refreezing one of these 'walking dead' chips, in the event someone ends up with spare LN2).

    Electromigration can be predicted using Black's equation.  Several of the factors are unknown to us mere mortals (though probably estimated/studied internally at Intel etc), but it does tell us that reducing the temperature gives a longer time to failure (though not an indefinite extension with 'only' LN2), and that increased current gives a shorter time to failure.
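    To make those two trends concrete, here is a minimal sketch of Black's equation expressed as a ratio against a reference operating point, so the unknown prefactor cancels.  The activation energy (0.7 eV) and current exponent (2) are illustrative placeholder values only, not figures for any real Intel process, and `relative_mttf` is a hypothetical helper name.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_mttf(current_ratio, temp_k, ref_temp_k,
                  activation_ev=0.7, n=2.0):
    """Return MTTF relative to a reference point, per Black's equation.

    current_ratio is J / J_ref. activation_ev and n are ASSUMED,
    illustrative constants; the real ones are process-specific.
    """
    # Thermal term: exp(Ea/k * (1/T - 1/T_ref)); grows as T drops.
    thermal = math.exp(
        activation_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / ref_temp_k)
    )
    # Current term: J^-n, so more current always shortens life.
    return current_ratio ** (-n) * thermal


# Doubling current at the same temperature quarters the lifetime when n = 2.
print(relative_mttf(2.0, 350.0, 350.0))  # 0.25
# Going cold multiplies the lifetime by a large but finite factor.
print(math.isfinite(relative_mttf(1.0, 77.0, 350.0)))  # True
```

    With different assumed constants the numbers change a lot, but the direction of both effects does not.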

    If the weak link that's failing is in the power delivery within the chip (or the package, I suppose), then the current drawn by the entire core would be what matters.  That would mean something like y-cruncher, XTU or x265, which pretty much maxes it out by hitting all the execution units most of the time, is going to be a worst-case scenario for chip life, because the current going into the cores is way higher for the same voltage and clocks.  (AVX2 is a good way to load the core like this; I'd have to look up architectural info to know if it's possible without AVX2, but it certainly could be.)  It's also possible that x265 happens to hit different, weaker bits of Intel's chips than Cinebench does.

    However, you can just compensate for this with less voltage.  Technically semiconductors don't follow Ohm's law, but in practice if you reduce voltage, current reduces roughly in proportion (Buildzoid did a study demonstrating this).

    The conclusion is that you absolutely can run x265 and other super-heavy benchmarks safely, just use less voltage to do so unless it's a suicide run.  There's no problem with the software, it's just a heavier load than people are used to and maybe requires that you back off on voltage more than people realise.
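    As a rough illustration of "back off on voltage", the sketch below scales vcore down until an estimated current fits under a chosen limit.  It leans entirely on the approximation above that current falls roughly in proportion to voltage, which is not exact for real silicon.  The function name and the example readings are hypothetical; the limit would be whatever figure you trust for your chip.

```python
def voltage_for_current_limit(vcore, measured_current_a, limit_a):
    """Estimate a vcore that keeps current at or under limit_a.

    ASSUMPTION: current scales linearly with voltage at fixed clocks
    and load. Treat the result as a starting point, not a guarantee.
    """
    if vcore <= 0 or measured_current_a <= 0 or limit_a <= 0:
        raise ValueError("all inputs must be positive")
    if measured_current_a <= limit_a:
        return vcore  # already under the limit, nothing to change
    return vcore * limit_a / measured_current_a


# Hypothetical reading: 250 A at 1.5 V under a heavy load, 193 A limit.
print(round(voltage_for_current_limit(1.5, 250.0, 193.0), 3))  # 1.158
```

    Lower voltage usually means lower stable clocks too, so in practice this is a loop of adjusting and re-measuring under the heavy load itself.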

    Regarding this original comment;

    Quote

    I'm sorry to say this because the developer works hard on it but x265 4k is a cpu killer and should be disabled for globals. I'm not talking randomly either, if you run it on ln2 with a modern cpu there is a greater than 50% chance it won't work the next day

    You can absolutely run it on LN2, you just need to rethink the voltage.  Normal "safe" voltages may not apply.  Fundamentally, there is not some CPU instruction invoked by x265 that causes chip death only if you're under -100C.  It's just a heavier load, causing more stress than you're used to for the same voltage (probably very close to the maximum stress possible), whereas the "classic" benchmarks are clearly a long way from maximum stress.

    An interesting example of this that's actually documented, by the way, is The Stilt's "strictly technical" analysis of PBO limits on Pinnacle Ridge.  AMD program a much higher boost voltage for single-core loads, not only way way above what's programmed for multi-core but also way above what's safe for 24/7 all-core use even well within safe temperatures (this isn't just a theory, people have tested safe voltages - 1.425V@60C degrades a 2700X noticeably over ~3 months).  It's not about keeping a lid on thermals, because the boost algorithm controls for them separately.  It'll be some part of the chip that can withstand the current from one core at the higher voltage, but not from multiple.

    • Like 4
  5. 42 minutes ago, yosarianilives said:

    I think people's complaint isn't that it's a chip killer so much that you can't tell what voltage it kills the chip because it's seemingly random and you won't know until the next day when the cpu no longer works.

    That's an Intel issue, you can't magically do that with software.  Someone doesn't want to deal with that, they gonna call for a ban on Intel CPUs?

    Not knowing a voltage is a killer until it kills is the same on anything.  I know it's galling to think you got away with it and then turn out to be wrong, but look at it this way: at least they got the run first.

  6. hahahaha, wow, didn't realise how immature some people can be.  You kill a chip and blame the benchmark for putting it under load?  Seriously?  Can I get pifast demoted if I kill a modern CPU by giving it 3V to the core?

    Competitive OC has to have some tenuous relevance to normal people, that means real-world benchmarks that actually use the capabilities of modern hardware, not the same x87 crap from 1995.  So you have to change your voltage and maxmem settings, boo hoo.  Benchmarks don't kill chips, voltage kills chips.  You should all be thanking havli for making a proper modern benchmark with visual feedback and scalability with overkill to extreme core counts that can still be run on stuff right back to Coppermine, not insisting that because you can't bench it safely it's not a good benchmark.

    BTW there's no point in a non-AVX version.  It'd just be pandering to the kind of people who are never happy anyway.  Just let them entertain themselves with their meaningless zero-load frequency validations (which there is a place for, but other competitive OC has no obligation to emulate them).

    Oh, while I'm at it, y-cruncher should get points as well.  It needs revising for offline data file saving without TAGG's workaround, and I guess it needs its own bundled JRE/libraries like x265 has so it's not a PITA to set up, but ultimately it's a good, meaningful bench that showcases a different aspect of performance (very high memory size requirement) while again still working on very old hardware, unlike CB15.

    • Like 5
  7. Current leagues lead to a lot of meta-benching.  Ambient benchers who are enthusiastic to try better cooling shy away from ice, chillers and (if they're honest ones) putting rads outside, because they don't wanna bench against DICE.  Apprentice benchers shy away from trying LN2, or at least from posting the scores they get, because they don't wanna bench against people who own dewars and get regular LN2 deliveries.  If the leagues are brought together I think we'd see more creative cooling, and that has to be good.

    One thing: if the leagues are all merged, we might need to reset the achievements for league position.

  8. 3 hours ago, richba5tard said:

    Hmm, currently the team points = sum TPP + 1/10th of seasonal member points. Change it to 1/10th of career member points? Maybe makes more sense. I don't really care. :)

    Probably makes more sense, thinking about it.  Otherwise people who have got top scores in the past are at a huge disadvantage not because they're less active now, but because they can't get points for scores this year when they already got a better score last year (unless people sandbag...).

    This actually seems like it may be a problem for seasonal rankings in general.  Say me and nachtfalke both posted a fairly low-effort HD 4890 vantage score of 16K, probably not too difficult on modern physics platforms, I'd get 5 seasonal points but nachtfalke would get 0 because he already made an amazing score on cascade in 2016 - that doesn't seem fair.

    Some kind of dynamic ranking would be good though.  Maybe a "champion's ranking" based on globals + recent competitions?
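    The team formula quoted at the top of this post can be sketched as follows, which makes the seasonal-versus-career choice a matter of which member list you pass in.  This is only a reading of the one-line description in the quote, not HWBOT's actual code, and `team_points` and the sample numbers are made up for illustration.

```python
def team_points(tpp_scores, member_points, member_divisor=10):
    """Team points = sum of TPP + 1/10th of the members' points.

    member_points can be seasonal or career totals; which one to use
    is the question under discussion. Hypothetical helper only.
    """
    return sum(tpp_scores) + sum(member_points) / member_divisor


tpp = [100.0, 50.0]
seasonal = [5.0, 0.0]      # a past top scorer earns nothing new this season
career = [200.0, 300.0]

print(team_points(tpp, seasonal))  # 150.5
print(team_points(tpp, career))    # 200.0
```

    With seasonal points the second member contributes nothing despite holding the better score, which is the disadvantage described above.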

  9. 9900K and 8086K are the same socket.  Same for 6950X+5960X, 2990WX+1950X.

    Yos and unity are both trolling.  Understandably.  It's one thing to check for bad hardware combinations, but I don't think you can expect people to hold your hand and tell you that LGA2066 is also capable of very high BCLK in the same way 1151 is.

    • Thanks 1
  10. Rev.6 was taking forever to calculate

    Rev.7 is a buggy mess

    If Rev.8 will be easier to maintain, I'm all for it.

    I do wonder though, is it really a good idea in terms of code base and server load to bring in a seasonal ranking at the same time as trying to simplify things?  I'm neutral on the concept but maybe one thing at a time.  Mostly I just want hwbot to work and be reliable, we can worry about the meta later...

  11. 50 minutes ago, Leeghoofd said:

    yep normally if all goes well

    Sorry, quick follow-up/afterthought - there was some suggestion before that socket was by board and not CPU, is this still the case?  So for example (to pick the most annoying examples) a Crosshair IV with an FX-8150 would count as AM3, a Crosshair V with a sempron 145 would count as AM3+?  Sorry to be awkward :P

    • Like 1
  12. 22 minutes ago, GeorgeStorm said:

    I think that's kind of the point :P It can boost points for members of teams thanks to non actual team members.

    It'll probably recalculate and lose us points too, leaving us below what we'd have had if other team members had filled in instead.  It's not about esports points though, it's about the spirit of it and what we can do as a community without excluding people who also have their own projects.  I'm proud of the openness of /r/oc and really glad we didn't have to leave people in the cold just because they aren't always giving us team points.

    • Like 1
  13. 3 hours ago, Noxinite said:

    Thanks for organising the comp Leeghoofd! Was a lot of fun, even if I didn't manage to successfully volt mod anything for it. XD

    Shoutout to r/OC for coming 2nd and beating us in DDR3 and DDR4!

    Cheers, and grats on the well-deserved win.  We'll get you next year :P

    • Thanks 1