HWBOT Community Forums

The official Y-Cruncher Beta Competition Discussion


Massman


This is what the introduction says:

 

 

 

Y-Cruncher is a program that computes Pi and other constants to billions/trillions of digits. It currently holds the world record for the most digits of Pi ever computed (13.3 trillion digits) as well as a bunch of other less popular constants.

 

Y-Cruncher is also the first Pi computing program that can a) use multiple threads for a worthwhile speedup, b) use (and stress) an unlimited amount of memory and c) utilize ISA extensions (SSE, AVX, etc.) for nearly all modern processors. Y-Cruncher owner Mysticial developed a GUI that can run and submit results to HWBOT. This competition is meant to beta-test the submitter. More information in the forum thread.

 

Does that help?

Link to comment
Share on other sites

Ahh there it is! Posting a link so other blind people might find it as well :)

http://oc-esports.io/#!/round/ycruncher_beta_contest

Link to comment
Share on other sites

Guest george.kokovinis

I have to ask again. Maybe I am stupid :)

 

Running on RVE with 5960X and Windows 10 with HPET enabled or disabled.

I get the message coefficient too high-Redundance too large to execute.

All three benchmarks.

Any help ?

 

Thanks in advance.

Link to comment
Share on other sites

  • Administrators
For now, the same rules apply to Y-Cruncher as to other benchmarks without HPET detection. Win8/10 allowed for Skylake, but not the others (in short).

 

George, please note that Pieter made a clarification on OS use for y-cruncher. It is beta and we still need to learn, so please use Win7 unless you use Skylake. On the coefficient question, do you have enough available memory and is it stable?

Link to comment
Share on other sites

You're the second person to hit this issue. So now I'm really curious because it's starting to point at a potential bug in the program.

 

The program has been around for 7 years, and this problem is only showing up now with competitive overclockers. What changes have you made to the OS that normally wouldn't be in place for an out-of-the-box installation?

 

You mentioned HPET. Anything else, special drivers? System configurations? BIOS?

 

 

George, please note that Pieter made a clarification on OS use for y-cruncher. It is beta and we still need to learn, so please use Win7 unless you use Skylake. On the coefficient question, do you have enough available memory and is it stable?

 

The coefficient thing is a redundancy check failure in one of the core algorithms. It should have nothing to do with the timings or the memory. Someone else on the other thread had the same problem and it was failing consistently even at stock.
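For anyone unfamiliar with the idea, here is a generic illustration of a redundancy check on a big multiplication. It is not y-cruncher's actual check, which isn't described in this thread. It only shows the principle that a cheap independent computation can expose a corrupted result. The modulus and the function names are arbitrary choices for the example.

```python
CHECK_PRIME = (1 << 61) - 1  # arbitrary Mersenne prime used as the check modulus


def multiply(a, b):
    """Stand-in for the expensive multiplication being verified."""
    return a * b


def passes_redundancy_check(a, b, product, modulus=CHECK_PRIME):
    """True if product is consistent with a * b modulo the check modulus."""
    return (a % modulus) * (b % modulus) % modulus == product % modulus


if __name__ == "__main__":
    a, b = 3 ** 200, 7 ** 150
    good = multiply(a, b)
    print(passes_redundancy_check(a, b, good))      # correct product
    print(passes_redundancy_check(a, b, good + 1))  # corrupted product
```

A single wrong digit from unstable hardware changes the product, so a check of this kind fails no matter what the timers or the memory size look like.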

 

The fact that this has never been seen before strongly points at something that only overclockers do. Either way, it's something that should be fixed regardless and I'd need more information to be able to do anything.

 

I have one theory that could cause it - and I suspect it is driver related. But I have no evidence to support it other than the fact that it seems to be only affecting that one algorithm (which is unique in one important aspect).

 

On the topic of the HPET, that's something that I'm looking at right now since it seems to be the solution to fixing the base clock exploit on Win8/10.

Edited by Mysticial
Link to comment
Share on other sites

Guest george.kokovinis

Gentlemen,

 

Websmile ( my friend Michael ) and Mysticial ( apologies sir, I do not know your name ).

 

Here are the details, with which I will try to help as much as I can.

Michael is well aware that I am pretty experienced with Intel architecture.

 

I am not going to hold my breath if it is Win 7 or win 10.

We are not discussing YET an official HWBOT benchmark, so actual rules do not apply.

 

To the juice.

Two platforms tested.

1) X99.

Test setup.

 

Asus Rampage V Extreme, BIOS 2001

Intel 5960X, overclocked to 4875 MHz core and 4250 MHz cache. Strap 125.

G.Skill Ripjaws 4, 3200, 4x4GB. 13-14-16-16-280-1T. Rock solid in any benchmark during the last 18 months.

4 x SanDisk SSD 256GB - RAID 0

4 x EVGA Titan X SC, heavily modded.

 

Official Windows 10 - 64.

Official drivers from Asus website ( latest )

Official NVidia drivers.

 

No tweaks of any kind to the OS or BIOS.

 

2) Z170

ASRock OCF

Skylake 6700K overclocked to 5 GHz.

G.Skill Trident Z - 3600 - 2x8GB and 4x8GB.

Irrelevant bits and bobs.

Windows 10 - 64.

Again official bios and drivers.

 

I managed to take only one measurement on the X99, when I detuned the system below 4.5 GHz.

Could not repeat it though.

 

That is the input I can offer so far.

If I can help more, I will be glad to.

 

 

Kind regards to all.

Link to comment
Share on other sites

 

I managed to take only one measurement on the X99, when I detuned the system below 4.5 GHz.

 

Interesting, so you got it to work once (and only once) after you downclocked? Other than that, I don't see anything in your configuration which stands out.

 

Btw, my real name is Alex. :)

Link to comment
Share on other sites

I just tested this on my 4770K:

  • Disable HPET via bcdedit.
  • Reboot the machine at 39 * 104 = 4056 MHz
  • Within the OS, set it to 41 * 99 = 4059 MHz

Sure enough, the clock skew kicks in and y-cruncher is magically 5% faster.
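A rough sketch of where the 5% comes from, under the assumption that the OS timer was calibrated against the 104 MHz base clock at boot and keeps ticking in proportion to the base clock afterwards. The function names are made up for illustration.

```python
def core_mhz(multiplier, bclk_mhz):
    """Core clock in MHz = multiplier x base clock."""
    return multiplier * bclk_mhz


def apparent_speedup(boot_bclk_mhz, run_bclk_mhz):
    """Factor by which a run appears faster than it really is.

    Assumption: the timer's tick rate was calibrated at boot_bclk_mhz and
    scales with the base clock, so at run_bclk_mhz it under-reports time.
    """
    return boot_bclk_mhz / run_bclk_mhz


if __name__ == "__main__":
    print(core_mhz(39, 104))  # 4056
    print(core_mhz(41, 99))   # 4059
    print(round((apparent_speedup(104, 99) - 1) * 100, 2))  # 5.05
```

The core clock is practically unchanged (4056 vs 4059 MHz), so nearly all of the apparent gain would come from the timer running slow rather than from the CPU running fast.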

 

Furthermore, I wrote a small C++ program that tests 7 different timers. Every single one of them is fooled by the clock skew. That explains why almost every benchmark is vulnerable.
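To give an idea of what such a test looks like, here is a minimal sketch in Python. It is a stand-in and not the C++ program mentioned above: it compares only three clocks from Python's standard library, and the helper names are hypothetical.

```python
import time

# Three clocks from Python's standard library. This is a stand-in: the
# C++ program mentioned above tested seven timers and is not reproduced here.
CLOCKS = {
    "perf_counter": time.perf_counter_ns,
    "monotonic": time.monotonic_ns,
    "time": time.time_ns,
}


def measure_clocks(interval_s=0.2):
    """Sleep for interval_s and return the elapsed seconds seen by each clock."""
    start = {name: read() for name, read in CLOCKS.items()}
    time.sleep(interval_s)
    return {name: (read() - start[name]) / 1e9 for name, read in CLOCKS.items()}


def max_disagreement(elapsed):
    """Largest relative difference between any clock and the median clock."""
    values = sorted(elapsed.values())
    median = values[len(values) // 2]
    return max(abs(v - median) / median for v in values)


if __name__ == "__main__":
    elapsed = measure_clocks()
    print(elapsed)
    print(max_disagreement(elapsed))
```

If every clock is derived from the same skewed source, they all drift together and the disagreement stays small. That would be consistent with all of the timers being fooled at once.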

 

This is pretty bad... :(

 

There doesn't seem to be a way to access the HPET without kernel level access. Nor is there a way to reliably detect if HPET is disabled.

 

I spent a couple hours playing around with the behavior of the various system and hardware timers and examining how they respond to the base clock changes. From that I could see a pattern which is strong enough for me to build a heuristic that can detect if someone is cheating with clock skew.

 

But I'm not 100% sure it'll be free of false positives. I won't disclose my findings publicly. But if any staff members are interested, please PM me. If I don't come up with anything better, this heuristic will go into y-cruncher v0.7.1.

Link to comment
Share on other sites

Regarding the "coefficient too large failure" - it seems to be a stability issue for stages 1 and 2 (I only have 32GB so could not test stage 3). I got the same error with HPET on or off. However, after lowering the core clock by one multi with all other settings the same, it passes. Load seems similar to p95 and current draw is very high on an 8 core. Is there something I'm missing?

 

Different question:

 

Stage 1 needs at least 195MB of ram, Stage 2 needs at least 4.8GB RAM... then stage 3 jumps to 46GB (so 64GB min with all channels populated the same on most platforms)? Bypassing the more common 32GB configuration? Very limiting for Stage 3 entries, no?
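The jump can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: identical DIMMs everywhere, the same number of DIMMs in every channel, and DIMM sizes of 4, 8 and 16 GB. The function name and its defaults are made up for the example.

```python
def smallest_matched_config(required_gb, channels=4, dimm_sizes_gb=(4, 8, 16),
                            dimms_per_channel=(1, 2)):
    """Smallest total capacity >= required_gb using identical DIMMs in every channel.

    Assumptions: all DIMMs are the same size and every channel gets the
    same number of them. Returns (total_gb, dimm_count, dimm_size_gb) or None.
    """
    options = [
        (size * channels * per_channel, channels * per_channel, size)
        for size in dimm_sizes_gb
        for per_channel in dimms_per_channel
        if size * channels * per_channel >= required_gb
    ]
    return min(options) if options else None


if __name__ == "__main__":
    for need_gb in (0.195, 4.8, 46):
        print(need_gb, smallest_matched_config(need_gb))
```

Under these assumptions the 46GB stage skips straight past 32GB to 64GB on both quad-channel and dual-channel boards, since no matched set lands in between.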

Link to comment
Share on other sites

Well can you implement a way to detect HPET like _Mat_ did with GPUPI ?

 

I can't find a way to detect the HPET directly. At least not without kernel mode drivers. But there is a way to do it heuristically from user space with normal access privileges. Unfortunately, it will be a while before I can fully test this. Both of my test machines have ASUS motherboards which don't have a BIOS option to disable the HPET and force Windows to use something else. And the only Skylake system I have is a gaming laptop (overclockable, but with limited options).

 

One thing that makes things more complicated is that Windows doesn't consistently use HPET even when it is available:

  • On my 4770K and 5960X, Win7 doesn't use it.
  • Win8 uses it by default on my 4770K.
  • Win10 doesn't use it on either my 4770K or my 6820HK.

(I haven't yet checked to see what happens on my FX-8350.)

 

If I do a blanket requirement that HPET is enabled, then all systems which are ok without it (Win7, Skylake) would be collateral damage and a major inconvenience to turn it on. From a usability perspective I can add a button that will create a script that runs the bcdedit command to turn on HPET. But come on... A program that creates scripts which run themselves as admin and modifies the BCD? Sounds totally harmless. That will have absolutely no issues with virus scanners. None at all... :rolleyes:
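For reference, the bcdedit setting normally used to force the platform clock is `useplatformclock`. The sketch below only builds the command line and runs nothing. The helper name is hypothetical, and this is not a description of how the submitter app does it.

```python
def hpet_bcdedit_command(enable):
    """Return the bcdedit argument list that forces or un-forces the platform clock.

    Building the command is kept separate from running it. Running it needs
    an elevated (administrator) prompt and takes effect after a reboot.
    """
    if enable:
        return ["bcdedit", "/set", "useplatformclock", "true"]
    return ["bcdedit", "/deletevalue", "useplatformclock"]


if __name__ == "__main__":
    # Print only; nothing is executed here.
    print(" ".join(hpet_bcdedit_command(True)))
    print(" ".join(hpet_bcdedit_command(False)))
```

Keeping the command as plain data makes it easy to show the user exactly what would be run before anything asks for admin rights.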

 

Right now v0.7.1 has OS detection. So I can whitelist all of Vista and Win7 and require a hardware reference clock for Windows 8/8.1 and 10. But there's no detection for Skylake. And then there's the question of Linux. (Which I'm not even gonna try to solve since it's open sourced and anyone can just modify and compile their own kernel.)

 

At this point we're well into the territory where anything that is built will probably fall apart on the next generation of hardware or software updates. So looking forward, I'll need to design this "reference clock sanity check" in a way that can be reconfigured on short notice.
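One way such a reconfigurable check could be laid out is as a plain table of rules that can be replaced without touching the logic. This is a sketch only, not the design used in y-cruncher. The rule names and the function are hypothetical, while the version numbers are the standard Windows ones.

```python
# Windows versions as (major, minor): Vista 6.0, 7 6.1, 8 6.2, 8.1 6.3, 10 10.0.
# The table is illustrative. It is plain data, so it can be swapped out
# without touching the checking logic.
DEFAULT_POLICY = {
    (6, 0): "allow",               # Vista
    (6, 1): "allow",               # Windows 7
    (6, 2): "require_ref_clock",   # Windows 8
    (6, 3): "require_ref_clock",   # Windows 8.1
    (10, 0): "require_ref_clock",  # Windows 10
}


def run_is_acceptable(os_version, has_reference_clock, policy=DEFAULT_POLICY):
    """Decide whether a result passes the reference clock sanity check.

    Unknown OS versions fall back to the strict rule.
    """
    rule = policy.get(os_version, "require_ref_clock")
    return rule == "allow" or has_reference_clock


if __name__ == "__main__":
    print(run_is_acceptable((6, 1), has_reference_clock=False))   # Win7
    print(run_is_acceptable((10, 0), has_reference_clock=False))  # Win10
```

Falling back to the strict rule for unknown versions means a new OS release is blocked until someone adds it to the table, rather than silently allowed.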

 

Whatever the case is, I'm not going to write a kernel mode driver. That's a big can of worms and it's not my area of expertise. Furthermore, it's too invasive. I'm already lifting the privilege elevation for v0.7.1 and I really don't want to revert on that.

Link to comment
Share on other sites

I guess that's starting to make sense now. I didn't realize that the gap between "stable" and "AVX-stable" is so large that you can be completely "stable", and yet fail immediately on any AVX.

 

About the stage 3 size. I asked Massman about the 48 GB requirement, and he said he was okay with it since it forces people to run a large and stable memory configuration. Technically, you can still run it without enough memory if you use the swap mode. That basically turns it into a hybrid CPU/disk benchmark.

 

(@Massman, if you're interested in this, my favorite swap mode test is 100b. Takes about 12 hours on my 4770K @ 4 GHz with 16 hard drives. ;) I run these almost every weekend for regression testing.)

 

It's just a beta competition for now. We can adjust the sizes later. There's a very good chance that 25m is going to go away at some point since it's too small.

Link to comment
Share on other sites

Yeah, it's really an 8-core thing. 8+ core E-class server processors will downclock when AVX calls are in the prefetch or stack due to the high current draw of the instruction set, so p95 stability with a 5960X, for example, will usually lower the OC by 100-200 MHz due to heat management issues in most daily-driver rigs. On Ivy 6-core or Haswell 6-core, p95 is not really a problem.

Hopefully BW-E 10-cores can (somehow) address the heat-transfer issue at its 10-core density. But frankly, p95 really does not put any higher logic stress on the architecture than many lower current stressors.

 

Nice benchmark you worked out - adds the need for high load-current stability to the CPU benchmark set. The Kelvin cooler class will be burning thru lots o' LN2. :-)

Link to comment
Share on other sites

No kernel mode driver necessary. Just avoid certain timing methods on some windows version to avoid clock skew. I introduced this with GPUPI 2.1 and wrote a small article including some tests: https://www.overclockers.at/articles/gpupi-2-1

 

Let me know if you need any kind of help. And keep up the good work!

Link to comment
Share on other sites

Hey! We haven't spoken in a while!

 

Yes, I actually did get it to work a few days ago: http://forum.hwbot.org/showpost.php?p=440475&postcount=50

 

And I was able to test it for the other clock as well:

2016_4_10_clock_detection.png

 

Unfortunately, I had to change y-cruncher itself to do this. So this won't be public for at least another month. And even if it was ready before that, I can't release it during the beta competition because the new version (v0.7.1) is faster and will break speed consistency.

 

The QC process for new releases of y-cruncher is very long and drawn out. It usually takes months since there is zero tolerance for bugs that may affect the correctness of a large computation. I also can't easily back-port this thing back to v0.6.9.

Edited by Mysticial
Link to comment
Share on other sites

Glad you got it working and nicely done as well! :)

 

You have to run the wrapper with administrator rights to call bcdedit, am I correct? That's why I decided to just give a link to my FAQ section.

 

If the submitter app is run without admin rights, the UAC will prompt you for it when you try to do anything that requires admin.

 

I had to solve the problem of admin anyway since y-cruncher v0.6.9 requires admin to run. I did some dirty hack with VB scripts to get it to work in a way that was aesthetically pleasing.

 

The bcdedit option won't be enabled until v0.7.1 launches. There's no point in doing it earlier since y-cruncher can't detect the clock yet. And it's probably just gonna encourage people to exploit it.

 

 

I'm not really benching anymore so haven't bothered to test, but from what I can see the 25M test is kinda pointless?

 

I wouldn't personally have that as one of the permanent benchmarks in the bot since it's already below a second on a 5960x.

 

Correct. The 25m is kinda pointless. It was the original alpha-testing size that I used to develop the submitter app in the first place. Other than that, it's really only useful to test if you are set up properly to submit to HWBOT.

 

It'll probably go away after the beta competition. If we decide to keep it, there shouldn't be any points awarded for it. (at least that's my recommendation)

 

-----

 

And with respect to the benchmarks becoming too short. Since these are fixed sized benchmarks, they will inevitably get shorter and shorter until it's pointless. So the small ones will get phased out and bigger ones added.

 

It may seem a bit far-fetched, but I imagine that the 1b time will probably go under 20 or even 10 seconds in the next couple years. (Especially if Skylake Purley with AVX512 lives up to expectations.)

 

Here's a screenshot that Shigeru Kondo sent me. (Shigeru Kondo is the one who ran the 5, 10, and 12.1 trillion digit Pi records.)

 

2016_4_10_kondo_1b.jpg

 

He doesn't have an account on HWBOT, nor can he submit it since benchmarks from y-cruncher v0.7.1 are currently not accepted. But it destroys the current 1b record set with a 5960X on LN2.

 

Can anyone beat this? :P

Link to comment
Share on other sites
