MaxxPI² - Pi/CPU/System Benchmark

TheKarmakazi · May 4, 2009

I am going to be beta testing this program soon. I have told the author about HWBot (they were unaware of the site). It could be a great HWBot-integrated program (like wPrime), or you can use the online validation they will provide!

The programmer's nickname on i4memory.com is "alice" if you want to PM them, or use the contact form on the website.

http://www.maxxpi.net/

features:

MMX / SSE support

hardware-based time measurement, accurate to approx. 1-2 ms

calculation depth: 128M max.

memory usage approx. 1.2 GB RAM (with 128M)

multithreaded (not multicore; MaxxPI² will do multicore, the preview does not)

for example, a 4000 MHz Intel (+SSE2) with 6 MB cache will give:

128M = approx. 14 min

32M = approx. 3 min

1M = approx. 2.4 sec

description of the score, K/sec.:

it is given to make results easier to compare,

so instead of having to say 2min 32sec 343ms...

you now only need to give a single number, K/sec.

it is nothing other than the number of calculated decimal places per second, in K (1024)

for example:

a 1M result of 356.2 (K/sec.) means that the CPU was able to

calculate 364,748.8 (1024 x 356.2) decimal places per sec.
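
The K/sec. score is easy to reproduce by hand. A minimal sketch in Python, assuming that "1M" means 1024 x 1024 decimal places; the function names are made up for illustration and are not part of MaxxPI:

```python
K = 1024  # one "K" of decimal places, as defined in the post

def k_per_sec(decimal_places, seconds):
    """Score in K/sec.: decimal places computed per second, divided by 1024."""
    return decimal_places / K / seconds

def places_per_sec(score_k_per_sec):
    """Convert a K/sec. score back to plain decimal places per second."""
    return score_k_per_sec * K

print(places_per_sec(356.2))            # the example from the post: 364748.8
print(round(k_per_sec(K * K, 2.4), 1))  # 1M in 2.4 sec -> 426.7
```

The second line uses the "1M = approx. 2.4 sec" figure from the feature list, so a 4 GHz machine of that kind would land at roughly 427 K/sec. under this assumption.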

AI (Shortest path, Genetic Algorithm / Traveling Salesman Problem TSP):

Memory Reliability:

Image Processing (Blurring):

some statistics:

http://www.maxxpi.net/pages/result-browser/statistics.php

and here some comparisons for example:

memory: http://www.maxxpi.net/pages/result-browser/top10---memory.php

flops: http://www.maxxpi.net/pages/result-browser/top10---flops.php

Edited May 4, 2009 by TheKarmakazi

Dualist · May 4, 2009

Looks very interesting. Should be a good addition

allegratorial · May 4, 2009

Interesting.

Speed-wise it's slower than PiFast, QuickPI, and y-cruncher.

But it has a GUI

Edited May 8, 2009 by allegratorial
at the request of somebody...

allegratorial · May 4, 2009

huh? how is this multi-threaded?

TheKarmakazi · May 4, 2009

It may not be as fast as some, that's true. But it has a very nice, clean interface, online validation and error checking, and integration with CPU-Z. It is multi-core / multithread compatible. A variety of benchmarks and tests run from only one program. Compatible with XP 32/64, Vista 32/64, Win 7 32/64 etc.

Plus it's still in the development phase, so it may be easier to get features implemented!

jmke: I wasn't sure what the requirements were for HWBot integration, but I guess this one won't cut it. Still looks like a nice app though.

IanCutress · May 5, 2009

jmke: I wasn't sure what the requirements were for HWBot integration, but I guess this one won't cut it. Still looks like a nice app though.

Get it to have a 1-button test (like wPrime) that just runs a test. But the test should do something other than wPrime (the TSP looks interesting).

Then some form of checksum checking system for online submission, or direct HWBot submission like wPrime. Sorted.

The program looks interesting for sure

K404 · May 5, 2009

requirements for HWBotification:
- one/two settings accessed by easy launch button from the main menu

- a single score as result

- cheat proof or at least have some implementations to prevent it

- checksum available and correct, ability to verify checksum and score

- for best integration: online submission to HWBot's API

Technically that rules out SPi though?

As for speed... the slower the better. We need marathons as well as sprints for benching. SPi 1M is the first bench tried under LN2, let's be honest. We need bigger challenges, not smaller ones.

Tharamis · May 5, 2009

hi,

Interesting.

Speed-wise it's absolutely pathetic compared to:

http://www.numberworld.org/y-cruncher/

and it's also a lot slower than PiFast and QuickPi.

But it has a GUI. Apparently, the author of the y-cruncher program has plans for a GUI, but nothing fancy like this.

http://www.numberworld.org/y-cruncher/version_history.html#Future

it's pretty fast, fast enough for benching *and comparing.

it doesn't have to be the *fastest.

it's precise, uses CRC and the hardware-based clock (not the Windows clock) and provides one result (K/sec.)

which is very easy to compare. it does not need *any additional libraries / installation.

I know him, he specifically chose the Gauss algorithm because of

its high *and continuous (no fluctuation) CPU load.

there are no optimizations for any CPU manufacturer on board.

all of them will be benched with the same non-optimized code.

As he said, this will show the real world better.
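
For readers who have not seen it: the "gauss algo" is the Gauss-Legendre iteration (it is named as such later in this thread). Below is a minimal Python sketch using the standard `decimal` module. It is purely illustrative and certainly not the author's MMX/SSE code; the function name and the choice of 10 guard digits are assumptions.

```python
from decimal import Decimal, localcontext

def gauss_legendre_pi(digits):
    """Return pi as a string with `digits` decimal places (truncated)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10          # guard digits against rounding error
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        # Correct digits roughly double per pass, so about log2(digits) passes suffice.
        for _ in range(max(1, digits.bit_length())):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        pi = (a + b) ** 2 / (4 * t)
    return str(pi)[:digits + 2]         # "3." plus the requested places

print(gauss_legendre_pi(30))
```

Every pass performs the same handful of full-precision operations (one multiplication, one division, one square root, one squaring), which may be part of why the load stays so even.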

he also has an *incredibly fast Chudnovsky

algorithm (incl. binary splitting), but this one will not produce

as clean a load on the CPU/memory as the Gauss one does.

You can see that via the CPU's Performance Monitoring Unit (PMU).

so i don't think he will include this in MaxxPI.

His MaxxPI² is very professional,

i was one of the first beta-testers on board.

example:

is there a difference between dual/triple channel on X58?

search the web, you will find nothing. try MaxxPI² and you will see it

(memory <> overall-memory)

this is also very interesting:

and viewing/exporting own results (excel):

cu

Edited May 5, 2009 by Tharamis
forgotten something...

allegratorial · May 5, 2009

he also has an *incredibly fast Chudnovsky

algorithm (incl. binary splitting), but this one will not produce

as clean a load on the CPU/memory as the Gauss one does.

You can see that via the CPU's Performance Monitoring Unit (PMU).

so i don't think he will include this in MaxxPI.

True, it's pretty fast. Here's what the numbers look like at 32M on a friend's 2.66 GHz Harpertowns, BSEL-modded to 3.2 GHz.

MaxxPi 1.35 - 213.36

PiFast 4.3 - 101.81

QuickPi 4.5 (x64) - 44.51

y-cruncher 0.3.2 (x64 SSE3) - 14.68

Any idea where his Chudnovsky implementation stands? I'm sure the pi-community would like to see it. (since all they seem to care about is speed)

Also, if it's single-threaded (since it clearly is), why would it matter which formula (gauss vs. chudnovsky) is used? Regardless of resource distribution, it would still be 100% cpu over 1 core anyway.

Doesn't look like fast and pretty will ever go together...

Tharamis · May 5, 2009

hi,

True, it's pretty fast. Here's what the numbers look like at 32M on a friend's 2.66 GHz Harpertowns, BSEL-modded to 3.2 GHz.

MaxxPi 1.35 - 213.36

PiFast 4.3 - 101.81

QuickPi 4.5 (x64) - 44.51

y-cruncher 0.3.2 (x64 SSE3) - 14.68

great results! here with an i7 at 4 GHz:

PiFast 4.3: 65.6 sec

MaxxPI 1.35: 127.1 sec

SuperPi: 581.5 sec

pretty fast... :-)

again: MaxxPI does *not claim to be the fastest Pi application.

it does not need to be.

it's fast enough to bench without falling asleep *and long enough to clearly show differences between different setups. speed doesn't matter at all. it's the comparability between PCs

that makes a benchmark a benchmark.

Any idea where his Chudnovsky implementation stands? I'm sure the pi-community would like to see it.

hmm good question...!?!

(since all they seem to care about is speed)

i think you have to read this:

http://en.wikipedia.org/wiki/Benchmark_(computing)

to understand.

if you want to set a world record by calculating Pi

with xxxxM, then you are right: speed matters.

Also, if it's single-threaded (since it clearly is), why would it matter which formula (gauss vs. chudnovsky) is used? Regardless of resource distribution, it would still be 100% cpu over 1 core anyway.

well, as i said, *binary splitting*: that means multicore (multithread) for one calculation.

Chudnovsky *is the fastest formula at present,

but it will not give as consistent a CPU load as Gauss does.

Doesn't look like fast and pretty will ever go together...

they surely do, look at MaxxPI :-)

but anyway, if your favorite is y-cruncher then use it!

it's a piece of wonderful and incredibly fast software.

As for speed...the slower the better. We need marathons as well as sprints for benching. SPi 1M is the first bench tried under LN2, lets be honest. We need bigger challenges not smaller ones.

that's the point!

cu

Edited May 6, 2009 by Tharamis
sry, i must edit.

allegratorial · May 5, 2009

again: MaxxPI does *not claim to be the fastest Pi application.

it does not need to be.

it's fast enough to bench without falling asleep *and long enough to clearly show differences between different setups. speed doesn't matter at all. it's the comparability between PCs

that makes a benchmark a benchmark.

true, being more of a software benchmarker, I've never really thought about this.

hmm good question...!?! here is a screen from a very early alpha, but as i said

i don't think that this will be used in MaxxPI (Q6600 at 4 GHz):

Assuming clock speed scales roughly linearly, 36 seconds for 32M will beat both QuickPi and y-cruncher in single-threaded mode.

It will also beat quickpi in multi-threaded mode... But it scales too poorly to even compare with y-cruncher.

Interesting question to ask:

Of the 3 multithreaded pi programs that exist now:

Why do MaxxPi and QuickPi's implementations for Chudnovsky's formula scale so poorly with multiple cores? Whereas y-cruncher achieves near linear scaling.

An interesting thing to notice is that the author of y-cruncher is merely a junior in college. His purpose for writing the program was to smash a few size records (and he did). (And as you'd mentioned: for record breaking, speed matters.)

Also, when I compared the speeds of the other constants that y-cruncher can compute with QuickPi (all of which y-cruncher currently holds the world record for), y-cruncher beats QuickPi hands down even in single-threaded mode. It's only with Pi that y-cruncher is slower than QuickPi, which leads me to think that because there's no "attainable" record at stake for Pi, this kid never even bothered to optimize his implementation for Pi.

So from a software benchmarker's standpoint, this has me wondering what y-cruncher could turn into given that it's already a killer program in terms of pure-multithreaded speed. I also wonder what will happen when this kid gets older and becomes more experienced.

Of course crunching pi itself is pretty useless, but it's the underlying arithmetic engine in y-cruncher that is valuable as it currently has the only multithreaded multiplication in the world that will beat even GMP - and it does so single-threaded and without assembly optimizations.

i think you have to read this:

http://en.wikipedia.org/wiki/Benchmark_(computing)

to understand.

if you want to set a world record by calculating Pi

with xxxxM, then you are right: speed matters.

well, as i said, *binary splitting*: that means multicore (multithread) for one calculation.

Chudnovsky *is the fastest formula at present,

but it will not give as consistent a CPU load as Gauss does.

they surely do, look at MaxxPI :-)

but anyway, if your favorite is y-cruncher then use it!

it's a piece of wonderful and incredibly fast software.

that's the point!

cu

Yes I know what benchmarking is. I'm more of a software benchmarker than a hardware benchmarker.

I know that Chudnovsky's formula is currently the fastest known algorithm, but all that binary splitting stuff is over my head. too much math for me.

And by "fast", I mean something comparable to QuickPi at the least. If you can get him to release a GUI version of his Chudnovsky implementation, then it will satisfy both "fast" and "pretty".

Edited May 5, 2009 by allegratorial

Tharamis · May 5, 2009

hi,

Why do MaxxPi and QuickPi's implementations for Chudnovsky's formula scale so poorly with multiple cores? Whereas y-cruncher achieves near linear scaling.

nearly linear scaling with cores/threads is not possible.

i personally think that y-cruncher also uses binary splitting and makes heavy

use of GMP. so if this is true (or close to the truth), linear scaling is impossible.

this is one of the reasons why MaxxPI² will calculate Pi results in parallel, one for each core,

to keep as much load as possible on the cores.

Interesting thing to notice is that the author y-cruncher is merely a junior in college. His purpose for writing the program was to smash a few size records (and he did). (and as you'd mentioned: for record breaking, speed matters)

only one thing matters here: TIME. if you have time, everything is possible

and when i was in college... i had time. lots of time.

So from a software benchmarker's standpoint, this has me wondering what y-cruncher could turn into given that it's already a killer program in terms of pure-multithreaded speed. I also wonder what will happen when this kid gets older and becomes more experienced.

hmm... GMP... i think, but anyways i wish him luck!

And by "fast", I mean something comparable to QuickPi at the least. If you can get him to release a GUI version of his Chudnovsky implementation, then it will satisfy both "fast" and "pretty".

and again: MaxxPI² is only comparable (Pi and all other calculation benchmarks) to itself.

well, for me personally, MaxxPI is fast enough, and it provides reliable, consistent results.

this is the most important thing.

There is no need to fuss about 512M in 1 sec., that is useless

and the author of MaxxPI² shares this opinion with me (i strongly think).

and don't forget this thread was written for MaxxPI²,

not for y-cruncher and comparisons against it.

cu

allegratorial · May 5, 2009

nearly linear scaling with cores/threads is not possible.
i personally think that y-cruncher also uses binary splitting and makes heavy

use of GMP. so if this is true (or close to the truth), linear scaling is impossible.

this is one of the reasons why MaxxPI² will calculate Pi results in parallel, one for each core,

to keep as much load as possible on the cores.

Maxxpi uses GMP? :confused: So that Pi chudnovsky implementation that it uses is merely this?

http://gmplib.org/pi-with-gmp.html

Whatever y-cruncher uses, it gets more than 4x scaling on Core i7 (with HT) and 7x on his dual-harpers (according to his website) - which by "my" judgement is nearly linear.
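
Whether 4x or 7x counts as "nearly linear" depends on how many cores are behind the number. A quick Python sketch of the usual speedup/efficiency arithmetic follows; the total of 8 cores for the dual-Harpertown machine is an assumption (two quad-core CPUs), and the helper names are made up for illustration.

```python
def speedup(t_single, t_multi):
    """How many times faster the multi-threaded run is than the single-threaded one."""
    return t_single / t_multi

def efficiency(speedup_factor, cores):
    """Fraction of ideal linear scaling actually achieved (1.0 = perfectly linear)."""
    return speedup_factor / cores

# 7x on the dual-Harpertown box, assumed here to have 8 cores in total
print(f"{efficiency(7.0, 8):.1%}")   # 87.5%
```

The i7 figure is harder to judge this way, because Hyper-Threading makes the effective core count ambiguous (4 physical cores, 8 hardware threads).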

But anyways, I'll let this thread get back to Maxxpi. Sorry I interrupted.

Tharamis · May 6, 2009

hi,

Maxxpi uses GMP? So that Pi chudnovsky implementation that it uses is merely this?

no, i mean that y-cruncher's characteristics largely match those of GMP...

for MaxxPI i don't know at all, but i don't think so, because it uses the Gauss algorithm.

But anyways, I'll let this thread get back to Maxxpi. Sorry I interrupted.

no problem, fine

cu

Edited May 6, 2009 by Tharamis

Tharamis · May 26, 2009

hi all,

some small news, now i'm *authorised to post this:

1, MaxxPI² MultiCore (Pre Alpha):

screen with a Q6600 at 4100 MHz, first 1 core, below 4 cores (scaling)

it will support 2, 3, 4 and 8 cores (for now) and calculate up to 256M (for now)

Chudnovsky is used, incl. splitting.

it puts more than 78%! constant load (PMU)

on *all cores, so be careful. it has no CPU-specific optimizations for any

CPU manufacturer. it uses MMX/SSE

the main problems were load balancing (especially with Chudnovsky) and not

favoring any CPU manufacturer.

both of these slow down the calculation speed, but i think for CPU/PC benchmarking

this doesn't matter at all, because comparability is the key.

if it were optimized for one major CPU manufacturer, a performance

gain of +12% to +18% would be possible.

@allegratorial, i know it's important for you:

MaxxPI² MultiCore ( Pre Alpha all x86):

256M, with i7 at 4 GHz, with 4 cores/4 threads (not 8): 473 sec.

256M, same machine, with 4 cores/4 threads (not 8): QPI: 402 sec.

256M, same machine, with 4 cores/?? threads (not 8??): y-cruncher: 229 sec.
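
To put the three 256M timings side by side, here is a small Python sketch that expresses each one relative to the fastest entry. Only the numbers come from the post; reading "QPI" as QuickPi is an assumption, and the thread count for y-cruncher was unknown to the poster.

```python
# 256M timings in seconds, as posted (i7 at 4 GHz, 4 cores)
times = {"MaxxPI² MultiCore": 473, "QuickPi": 402, "y-cruncher": 229}

fastest = min(times.values())
for name, secs in sorted(times.items(), key=lambda kv: kv[1]):
    # a ratio above 1 means "this many times the fastest runtime"
    print(f"{name}: {secs} s, {secs / fastest:.2f}x the fastest time")
```

On these figures the pre-alpha takes about 2.07 times as long as y-cruncher and about 1.18 times as long as QuickPi.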

2, much more interesting, I think, is this:

MaxxMEM²

this will be released soon

cu

Edited May 26, 2009 by Tharamis

Tharamis · May 26, 2009

hi,

MaxxMem is very interesting

yes

do you think the author could come up with a total score for MaxxMem, something like MaxxMem-Total = (MemCopy + MemRead + MemWrite) * (1/MemLatency)?

this, maybe in combination with detection like CPU-Z's memory tab, would allow for auto-submit to HWBot.

for now, the *memory score is the arithmetic average of

"read" and "write", the same as in the big brother MaxxPI².

Memory copy is not part of the memory score, because the big MaxxPI²

doesn't use memory copy at all (no need for it).

Reached memory / latency scores will be comparable to MaxxPI².
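
The two scoring ideas in this post can be written out as follows. This is a minimal Python sketch: the function names and the sample values are made up, units are deliberately left open, and only the first formula reflects what MaxxMEM² is described as doing.

```python
def memory_score(read, write):
    """Memory score as described in this post: arithmetic mean of read and write."""
    return (read + write) / 2

def proposed_total(copy, read, write, latency):
    """The combined score suggested in the quoted question (a proposal only)."""
    return (copy + read + write) * (1 / latency)

# Made-up sample values, only to show the arithmetic
print(memory_score(9000.0, 8000.0))                   # 8500.0
print(proposed_total(8500.0, 9000.0, 8000.0, 64.0))   # 398.4375
```

Note that the proposed total mixes throughput and latency into one number, so its magnitude depends entirely on the units chosen for each input.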

For further suggestions concerning hwbot, you should contact him directly, via:

http://www.maxxpi.net/pages/contact.php

Regards

Tharamis

Edited May 26, 2009 by Tharamis

Tharamis · May 28, 2009

hi,

little update to v1.40:

* v1.40, application name change, to MaxxPI² - PreView - Single (28/05/2009)

* v1.40, change in OS name detection (28/05/2009)

* v1.40, added 256M calculation option for x64 (28/05/2009) NEW!

cu

Tharamis · June 3, 2009

hi,

now new!

as a part of MaxxPI²'s memory benchmark, as a preview-version:

MaxxMEM² (Memory/Latency, v1.05):

http://www.maxxpi.net

cu

Tharamis · June 11, 2009

hi,

Review/preview of:

upcoming "MaxxPI² - PreView - Multi" (unpublished until now)

Review done by Frank Hempel, at: http://www.radeon3d.org (review in German)

Direct link to review: http://www.radeon3d.org/artikel/sonstiges/maxxpi_multi

cu

Edited June 11, 2009 by Tharamis

Tharamis · June 12, 2009

hi,

here: http://www.maxxpi.net/pages/reviews/rev.php

Review/preview of: "MaxxPI² - PreView - Multi" (english translation)

it scales *very well, on any kind of cpu (cpu/type/manufacturer)

cu

Tharamis · June 13, 2009

hi,

we already have a few Pi programs, so not that interesting for the bot;

hmm

I think you partly misunderstood me, I was not asking for anything.

It was a pure declarative statement.

cu

Tharamis

u22 · June 13, 2009

now downloading for everybody

http://www.radeon3d.org/downloads/benchmarks/maxxpi/

Tharamis · June 14, 2009

hi,

now downloading for everybody

http://www.radeon3d.org/downloads/benchmarks/maxxpi/

That was the right decision!

cu

Tharamis · June 19, 2009

hi,

update MaxxPI² - PreView - Single to v1.41

• v1.41, Batchmode added (18/06/2009) NEW!

cu

Tharamis · July 16, 2009

hi,

MaxxPI² - PreView - Multi

This benchmark uses the Chudnovsky algorithm as its formula, unlike MaxxPI² - PreView - Single,

which uses the Gauss-Legendre algorithm.

The advantage of the Chudnovsky algorithm is that, in principle,

multi-core operation is possible. MaxxPI² - PreView - Multi makes use of this.

That means that all available CPU cores work together on a single calculation.
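
Why the Chudnovsky series lends itself to multiple cores is easiest to see in the binary splitting form mentioned earlier in the thread. The Python sketch below is the well-known textbook formulation on plain integers, not MaxxPI's code; the function names are illustrative, and the two halves are simply computed one after the other here.

```python
from math import isqrt

C = 640320
C3_OVER_24 = C ** 3 // 24

def bs(a, b):
    """Binary splitting: combine series terms a..b-1 into integers P, Q, T."""
    if b - a == 1:
        if a == 0:
            p = q = 1
        else:
            p = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
            q = a * a * a * C3_OVER_24
        t = p * (13591409 + 545140134 * a)
        if a % 2:
            t = -t                    # terms alternate in sign
        return p, q, t
    m = (a + b) // 2
    # The two halves share no data, so they could run on different cores.
    p1, q1, t1 = bs(a, m)
    p2, q2, t2 = bs(m, b)
    return p1 * p2, q1 * q2, q2 * t1 + p1 * t2

def chudnovsky_pi(digits):
    """Return pi scaled by 10**digits as an integer (last digit may be inexact)."""
    terms = digits // 14 + 1          # each term adds a bit more than 14 digits
    _, q, t = bs(0, terms)
    sqrt_c = isqrt(10005 * 10 ** (2 * digits))
    return (q * 426880 * sqrt_c) // t

print(chudnovsky_pi(40))
```

A multi-core version would hand the two recursive calls to separate workers. The final merge steps are large multiplications that are harder to split evenly, which is one plausible source of the load-balancing trouble described in the pre-alpha post.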

Technical:

MaxxPI² - PreView - Multi needs at least a dual-core processor and, in the current version 1.07, supports

CPUs with 2, 3, 4 and 8 cores.

Note:

An HT core counts as a real core (so 1 core + 1 HT core will be accepted).

Maximum depth of calculation:

268,435,456 decimal places

• v1.07, initial public release (16/07/2009) NEW!

cu
