HWBOT Community Forums

allegratorial

Posts posted by allegratorial

  1. Here's the relevant part of his response.

    (I've re-worded a few things to fill in the context.)

     

    It's an interesting idea, but it deviates too far from the original goals as a research project:

     

    1. Break [a few world records] and provide some more numerical evidence on the irrationality of [Catalan's Constant] and [Euler's Constant].

     

    2. Show that high-precision arithmetic can be efficiently parallelized.

     

    3. Make a decent multi-threaded Pi-program.

     

    As for the memory problem... lol... I would say it'd be pretty bad. The longest thing I can fit comfortably under 6GB of RAM takes about 12 minutes on [dual X5482]...

     

    I can't expect everyone to have more than 6GB right now, since few people go overboard on RAM.

     

     

    So I guess that's a no.

  2. I like that idea. :) It makes it more like 3DMark.

     

    It probably isn't doable directly, but something like a binary search might work to find the largest computation that can be done in, say, xx minutes.

     

    I'll ask the author about this to see if he's interested.

     

    One problem I can foresee right off the bat is that faster computers might not have enough RAM to do it.

    If the benchmark is to find the largest computation that can be done in xx minutes, there will be a problem whenever the largest computation that fits into RAM finishes sooner than that.
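    The binary-search idea could be sketched roughly like this (a minimal sketch in Python; `run_computation` is a hypothetical stand-in for the actual pi run, and it assumes runtime grows monotonically with size):

```python
import time

def largest_size_within(time_limit_s, run_computation, lo=1, hi=1 << 30):
    """Binary-search the largest input size whose runtime stays under time_limit_s.

    Assumes runtime grows monotonically with size; run_computation(size)
    is a hypothetical hook for the actual pi computation.
    """
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        start = time.perf_counter()
        run_computation(mid)
        elapsed = time.perf_counter() - start
        if elapsed <= time_limit_s:
            best = mid       # mid fits in the budget; try larger sizes
            lo = mid + 1
        else:
            hi = mid - 1     # too slow; try smaller sizes
    return best
```

    A real benchmark would also have to cap `hi` at whatever fits in RAM, which is exactly the problem with faster machines that don't have much memory.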

  3. nearly linear scaling with cores/threads is not possible.

    i personally think that y-cruncher also uses binary splitting and relies

    heavily on gmp. so if this is true (or close to it), linear scaling is impossible.

    this is one of the reasons why MaxxPI² calculates a separate PI result on each core,

    to keep as much load as possible on the cores.

     

    MaxxPi uses GMP? :confused: So the Chudnovsky Pi implementation it uses is merely this?

     

    http://gmplib.org/pi-with-gmp.html

     

     

    Whatever y-cruncher uses, it gets more than 4x scaling on Core i7 (with HT) and 7x on his dual-harpers (according to his website) - which by "my" judgement is nearly linear.

     

     

    But anyways, I'll let this thread get back to Maxxpi. Sorry I interrupted. :o
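    For reference, the binary splitting being discussed isn't magic - here's a minimal integer-only Chudnovsky sketch in Python (my own illustration of the technique, not code from any of the programs mentioned):

```python
from math import isqrt

def _bs(a, b):
    """Binary splitting over terms [a, b) of the Chudnovsky series."""
    if b - a == 1:
        if a == 0:
            p = q = 1
        else:
            p = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
            q = a * a * a * 10939058860032000  # 640320^3 / 24
        t = p * (13591409 + 545140134 * a)
        return p, q, -t if a & 1 else t
    m = (a + b) // 2
    p1, q1, t1 = _bs(a, m)
    p2, q2, t2 = _bs(m, b)
    # the two halves are independent - this is where a multithreaded
    # implementation could fork work onto separate cores
    return p1 * p2, q1 * q2, t1 * q2 + p1 * t2

def chudnovsky_pi(digits):
    """Return pi * 10**digits as an integer (each term adds ~14.18 digits)."""
    terms = digits // 14 + 2
    _, q, t = _bs(0, terms)
    sqrt_10005 = isqrt(10005 * 10 ** (2 * digits))
    return (426880 * sqrt_10005 * q) // t
```

    The recursion is exactly what parallelizes: the left and right halves of each split don't depend on each other, so near-linear scaling is at least plausible in principle.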

  4.  

    again: maxxpi does *not claim to be the fastest PI application.

    it does not need to be.

     

    fast enough to bench without falling asleep, *and long enough to clearly show differences between different setups. speed doesn't matter at all. it's the comparison between pcs

    that makes a benchmark a benchmark.

     

    true, being more of a software benchmarker, I've never really thought about this.

     

     

    hmm, good question...!?! here's a screen from a very early alpha, but as i said,

    i don't think that this will be used in maxxpi (q6600 at 4ghz):

     

    Assuming clock speed scales roughly linearly, 36 seconds for 32M will beat both QuickPi and y-cruncher in single-threaded mode.

    It will also beat QuickPi in multi-threaded mode... But it scales too poorly to even compare with y-cruncher.

     

     

    Interesting question to ask:

     

    Of the 3 multithreaded pi programs that exist now:

     

    Why do MaxxPi's and QuickPi's implementations of Chudnovsky's formula scale so poorly across multiple cores, whereas y-cruncher achieves near-linear scaling?

     

    An interesting thing to notice is that the author of y-cruncher is merely a junior in college. His purpose for writing the program was to smash a few size records (and he did). (And, as you mentioned: for record breaking, speed matters.)

     

    Also, when I compared the speeds of the other constants that y-cruncher can compute against QuickPi (all of which y-cruncher currently holds the world record for), y-cruncher beats QuickPi hands down even in single-threaded mode. It's only with Pi that y-cruncher is slower than QuickPi - which leads me to think that because there's no "attainable" record at stake for Pi, this kid never even bothered to optimize his implementation for Pi.

     

    So from a software benchmarker's standpoint, this has me wondering what y-cruncher could turn into given that it's already a killer program in terms of pure-multithreaded speed. I also wonder what will happen when this kid gets older and becomes more experienced.

     

    Of course crunching pi itself is pretty useless, but it's the underlying arithmetic engine in y-cruncher that is valuable as it currently has the only multithreaded multiplication in the world that will beat even GMP - and it does so single-threaded and without assembly optimizations.

     

     

     

    i think you have to read this:

    http://en.wikipedia.org/wiki/Benchmark_(computing)

    to understand.

    if you're willing to get a world record by calculating PI

    with xxxxM then you are right: speed matters.

     

     

     

    well, as i said, *binary splitting* means multicore (multithreaded) for one calculation.

     

    chudnovsky *is the fastest formula at the moment,

    but it will not give as consistent a cpu load as gauss does.

     

     

     

    surely do, look at MaxxPI :-)

     

    but anyways, if your favorite is y-cruncher then use it!

    it's a piece of wonderful and incredibly fast software.

     

     

     

    that's the point!

     

    cu

     

    Yes I know what benchmarking is. I'm more of a software benchmarker than a hardware benchmarker.

     

    I know that Chudnovsky's formula is currently the fastest known algorithm, but all that binary splitting stuff is over my head. too much math for me. :o

     

     

    And by "fast", I mean something comparable to QuickPi at the least. If you can get him to release a GUI version of his Chudnovsky implementation, then it will satisfy both "fast" and "pretty". :)
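    The scaling numbers thrown around in this thread (4x on a quad with HT, ~7x on 8 cores) line up with Amdahl's law; a quick sketch, with illustrative parallel fractions of my own choosing:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Predicted speedup when only parallel_fraction of the runtime can use all cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# If ~98% of the work parallelizes, 8 cores give ~7x - close to the
# dual-Harpertown scaling reported for y-cruncher:
print(round(amdahl_speedup(0.98, 8), 2))  # → 7.02

# If only half the work parallelizes, 8 cores barely manage ~1.8x - the
# kind of poor scaling a mostly-serial implementation would show:
print(round(amdahl_speedup(0.50, 8), 2))  # → 1.78
```

    So "scales poorly" usually just means a large serial fraction is left in the implementation; it isn't a property of the formula itself.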

  5.  

    he also has an *incredibly fast chudnovsky

    algorithm (incl. binary splitting), but this one will not produce

    as clean a load on the cpu/memory as the gauss does.

    You can see that via the performance monitoring unit (PMU).

     

    so i don't think he will include this into maxxpi.

     

    True, it's pretty fast. Here's what the times (in seconds) look like at 32M on a friend's 2.66 GHz Harpertowns BSEL-modded to 3.2 GHz.

     

    MaxxPi 1.35 - 213.36

    PiFast 4.3 - 101.81

    QuickPi 4.5 (x64) - 44.51

    y-cruncher 0.3.2 (x64 SSE3) - 14.68

     

    Any idea where his Chudnovsky implementation stands? I'm sure the pi-community would like to see it. (since all they seem to care about is speed)

     

     

    Also, if it's single-threaded (and it clearly is), why would it matter which formula (gauss vs. chudnovsky) is used? Regardless of resource distribution, it would still be 100% CPU on one core either way.

     

     

    Doesn't look like fast and pretty will ever go together...

  6. According to his website, the author does have plans for a lightweight GUI. As for when it will come, it doesn't say.

     

    As far as consistency between versions goes, that is unlikely to happen, as the program was clearly written for speed. But if HWbot takes it up, it's more than likely that he will maintain a separate version that stays consistent - sort of like what wprime does by keeping version 1.55.

  7. the requirements outlined are those a bench needs in order to be used by HWBot with good success.

     

    The program appears to satisfy all but the last one in that list. (And maybe the first one too, but I'm not sure what you mean by easy launch button.)

     

    If you'd like, you can contact the author directly:

     

    http://www.numberworld.org/y-cruncher/

    (bottom of the page)

     

    But seeing as how HWbot isn't interested in another Pi program, this probably doesn't really matter.

     

     

    One bench that needs to be added is one that lasts 1h+ on current CPUs. We had one in the past, but not anymore :rolleyes:

     

    This is probably beside the point, but the program can compute other (slower) constants. The two slowest constants the program can do would take a skulltrail system more than 3 hours to bench 1 billion digits. Again, these can go well above 10 billion digits - limited only by memory. According to the author (on his website), computing these slower constants to >10 billion digits would take days. So the length of the benchmark isn't an issue.

     

    http://www.numberworld.org/y-cruncher/benchmarks/v0.2.1/eulergamma.html

  8. ic.

     

    So it wouldn't matter if the program is more suitable for and up-to-date with current hardware? (As far as I can tell, this program has support for 64-bit and SSE.)

     

    What's the whole idea of having 5 versions of 3DMark?

     

    Has richba5tard taken a look at this? Or are you speaking for him?

  9. chance is very slim this will be added; we already have PiFast, SuperPi and Wprime for 2D performance

     

    I don't see how you can compare PiFast and SuperPi with this y-cruncher program.

     

    PiFast and SuperPi are both single-threaded.

    y-cruncher is multithreaded.

     

    Also, this program is "more realistic" than wprime in terms of application performance. It isn't perfectly threaded like wprime, but it still threads very well.

     

    There have been quite a few forums that want to see this on hwbot. I say we give it a chance.

     

    The thing is, over the past few years a lot of people have been crying out for a multithreaded pi program. But most attempts so far have been either complete failures or obscenely slow. QuickPi is an interesting exception because it is very fast to begin with, but it doesn't scale very well.

     

    So we finally have the answer. This thing is fast even on one thread and is a monster on multiple threads...

     

    Also, the author is "still alive" and maintaining the program. So unlike SuperPi and PiFast, it can still be easily updated.

  10. If it's going to be added, then the fastest test should take about 2-3 min at least - since it's multithreaded (no need for another 2-second benchmark...), and the slowest one 1h+.

     

    There are too few benchmarks here that take some time to finish - even superpi 32m is over almost before you've started it :(

     

    That program has 9 different bench sizes ranging from 25 million to 10 billion digits. So you can take your pick. :)

     

    But for all practical purposes, the most that can be done with current hardware is 1 billion digits. Any higher and memory becomes prohibitive.

     

    The largest bench size at 10 billion digits took 2 hours on the author's dual 3.2GHz harpertowns. But nobody has been able to match that because it requires 46GB of ram.

     

    I highly doubt that the author expected anyone to have that much ram. So I'm guessing his reason for taking it so high is for future systems, or for stress-testing servers.

     

    Apparently, the program can go MUCH higher than 10 billion digits, but he only has enough ram to test it up to 10 billion.
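    The 46GB / 10-billion-digit data point works out to roughly 4.6 bytes per decimal digit, which makes quick back-of-envelope sizing easy (my own extrapolation from that one data point, not the author's formula):

```python
def estimate_ram_gb(digits, bytes_per_digit=4.6):
    """Rough RAM estimate extrapolated from the 46 GB / 10-billion-digit run."""
    return digits * bytes_per_digit / 1e9  # GB as 10^9 bytes, matching the thread's loose usage

print(estimate_ram_gb(10**10))  # the 10-billion-digit run: ~46 GB
print(estimate_ram_gb(10**9))   # 1 billion digits: ~4.6 GB
```

    That ~4.6 GB figure for 1 billion digits is why that's the practical ceiling on typical current machines, and why 10 billion needs server-class memory.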

  11. Has anyone seen this?

     

    http://www.xtremesystems.org/forums/showthread.php?t=221773

     

    25 million digits in 8.46 seconds on a 4GHz skulltrail!!!

    1 billion digits in 7 minutes and 52 seconds on 2 x 2.80GHz Gainestown!!! - That's almost as fast as the record 32m SuperPi time...

     

     

    Believe it or not, this IS a multithreaded program that will compute Pi.

     

    Single threaded, it destroys superpi and is as fast as PiFast.

    Multi-threaded, it destroys PiFast.

     

    The latest version even has anti-cheat protection and validation.

     

     

    So apparently, Pi CAN be multithreaded.
