HWBOT Community Forums

Math turns benchmark: y-cruncher meets HWBOT


Mysticial


Can you please explain why I can submit with the HWBOT submitter, but the same file (I think) cannot be uploaded through the HWBOT form?

 

Originally, it was because the datafile wasn't encrypted, so you could just edit it and submit whatever you wanted. But that point is moot now since I got the encryption to work.

 

But one problem remains:

 

I haven't tested this, so I can't confirm it. But by the looks of it, HWBOT doesn't verify whether the datafile is actually for the right benchmark. So you could manually submit a 25m benchmark into the 1b category.

 

That's a big enough loophole for me to simply disallow manual submissions for now. I could fix that by using a different encryption key for each benchmark, but that complicates things, and it wasn't something I considered important enough to bother with.
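To illustrate the per-benchmark-key idea, here is a minimal sketch under assumptions: derive a separate key for each benchmark category from one master secret, so a datafile sealed as "25m" fails the check when submitted as "1b". It swaps in an HMAC-SHA256 integrity tag from Python's standard library rather than encryption, and `MASTER_SECRET`, `benchmark_key`, `seal` and `verify` are hypothetical names. This is not how HWBOT or the submitter actually handle datafiles.

```python
import hashlib
import hmac

# Hypothetical master secret; a real system would never hard-code this.
MASTER_SECRET = b"example-master-secret"


def benchmark_key(benchmark: str) -> bytes:
    """Derive a distinct 32-byte key per benchmark category (e.g. "25m", "1b")."""
    return hmac.new(MASTER_SECRET, b"benchmark:" + benchmark.encode(), hashlib.sha256).digest()


def seal(benchmark: str, payload: bytes) -> bytes:
    """Append an HMAC tag computed with the benchmark-specific key."""
    tag = hmac.new(benchmark_key(benchmark), payload, hashlib.sha256).digest()
    return payload + tag


def verify(benchmark: str, sealed: bytes) -> bool:
    """Check the tag with the key of the category the file is submitted to."""
    payload, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(benchmark_key(benchmark), payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


datafile = seal("25m", b"time=12.927s")
print(verify("25m", datafile))  # True: matching category
print(verify("1b", datafile))   # False: a 25m result can't pass as 1b
```

The same principle carries over if the derived keys are used for encryption instead: a file produced with the 25m key simply won't decrypt or verify under the 1b key.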

Edited by Mysticial

Win XP doesn't work

 

Support for Windows XP was dropped 4 years ago. Version 0.5.5 was the last version that could run on XP. Unfortunately, only v0.6.1 and later are supported for HWBOT.

 

In short, the latest versions can't run on XP because they make system calls which didn't exist prior to Vista.

 

As far as performance goes, you need at least Win7 SP1 to run the AVX binaries. So even if you could run on XP, you could expect a slowdown of at least 2x compared to Win7 SP1 and later (assuming you're running on Haswell).


I just pushed an update to the submitter app. The new features are:

  • Integrated support for screenshots.
  • An option to easily override the binary selection.

Along with this are some miscellaneous UI changes. The top menu bar is largely redundant for now. It's a placeholder for later when there isn't enough space to make a button for everything.

 

The screenshots are sent as part of the datafile. Due to the encoding that HWBOT uses, the datafiles would be really large, so I had to turn on compression. This means the older versions of the submitter will no longer work, and you will need to update to this one to make any new submissions.
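The post doesn't say which encoding HWBOT uses, so the following is only an illustration assuming a hex-style text encoding, which doubles the size of binary data. A PNG screenshot is already compressed, so its bytes look close to random; random bytes stand in for it here. The sketch shows how deflate (Python's `zlib`) recovers most of the overhead that the text encoding adds.

```python
import os
import zlib

# Stand-in for an already-compressed PNG screenshot: random bytes don't compress.
screenshot = os.urandom(512 * 1024)

# Assumption: the datafile stores binary data as hex text (2 characters per byte).
encoded = screenshot.hex().encode("ascii")

# Compressing the encoded text squeezes out the redundancy the encoding added.
compressed = zlib.compress(encoded, 9)

print(f"raw:        {len(screenshot):>9,} bytes")
print(f"hex text:   {len(encoded):>9,} bytes")
print(f"compressed: {len(compressed):>9,} bytes")

# The receiver reverses both steps to get the original screenshot back.
restored = bytes.fromhex(zlib.decompress(compressed).decode("ascii"))
```

With a real PNG the exact numbers will differ, but the shape should be similar: the compressed datafile ends up roughly the size of the raw image instead of double it.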

 

Hopefully this won't be too problematic for everyone. Please let me know if there are any issues with this new version. Quite a few things changed and it's possible I broke something.

 

Here's what the latest version looks like: http://hwbot.org/submission/3189606_

 

[Image: the latest version of the submitter]

Edited by Mysticial

For me, the new version is not 100% OK.

I can run it, but the submission button doesn't work.

And manual submission gives: Invalid data file: Unable to decrypt the datafile

 

Ugh...

 

Manual submissions have always been disabled for reasons I mentioned in the other thread.

 

In what way is the submission button not working? Does it do nothing when you click it? When you click it, do you at least get the status update in the bottom-right corner?

 

[Image: the status update in the bottom-right corner (2016_4_16_status-update.png)]

 

The behavior of that has changed since the last version. Previously, the entire app would freeze while it sent out the datafile. But with screenshots, the datafile is much larger, so depending on your internet connection, the app could freeze for a long time. So I changed the implementation so that it no longer freezes and instead displays an indicator that it's still sending.
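The submitter itself is a Java app and its code isn't shown here, so this is only a language-neutral sketch in Python of the pattern described above: run the slow upload on a worker thread so the UI thread keeps responding and can show a status line. `send_datafile` is a hypothetical stand-in that sleeps instead of doing network I/O.

```python
import threading
import time

status = "Idle"
status_lock = threading.Lock()


def set_status(text: str) -> None:
    """Update the status line shared between the worker and the UI."""
    global status
    with status_lock:
        status = text


def get_status() -> str:
    with status_lock:
        return status


def send_datafile(datafile: bytes) -> None:
    """Hypothetical upload; sleeps to simulate a slow connection."""
    set_status("Sending datafile. Please wait...")
    time.sleep(0.5)
    set_status("Done")


worker = threading.Thread(target=send_datafile, args=(b"datafile-bytes",), daemon=True)
worker.start()

# The "UI" loop stays free to redraw while the upload runs in the background.
seen = []
while worker.is_alive():
    seen.append(get_status())
    time.sleep(0.1)
worker.join()
print(get_status())  # Done
```

In a Swing app, `javax.swing.SwingWorker` is the usual tool for moving this kind of work off the event dispatch thread.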


Submission worked here: http://hwbot.org/submission/3189693_massman_y_cruncher_pi_25m_core_i7_4500u_12sec_927ms

 

I'm super impressed with the integration, @Mysticial. Brilliant approach to the integration problem. If all benchmarks had a submit function like this, taking part in OC would be so much easier!

 

That's a pretty low bar to be "impressed" by. :) If I tried to present this to the UI folks back when I was at Google, they would've blown me off and told me to go back to programming.

 

In all seriousness, I'm also a user of the app. So if something doesn't feel right or is inconvenient, I'll tweak it until it feels right. After a few iterations of that, this is what it converged to (at least for my workflow).

 

The screenshot feature was particularly tricky since I didn't want to overcomplicate the UI. Screenshots are very invasive and can capture sensitive stuff, so I wanted the user to know exactly what he/she is about to send before actually sending it. And then there was the horrific side effect of making the datafile really big, taking 20 seconds to send, and freezing the app the entire time. (A screenshot of my 2560 x 1440 monitor comes out to a 2 MB PNG.)


Submission works without the screenshot.

But if I first take a screenshot and then submit, nothing happens. There's no "Sending datafile, please wait" message.

 

So you do not see the message, "Sending datafile. Please wait..."?

 

It may take a few seconds before the message pops up. But if you have a slow internet connection or a really large screenshot, then the actual sending may take a while. It takes 20+ seconds for me if I try to upload a 1440p image. And my connection isn't the worst one out there.

 

It sounds like I need to make the send progress more explicit. That way even when it's not working, it's clearer where it gets stuck. Unfortunately, I can't do a progress bar since the Java network API doesn't seem to have any way to relay that information back to the caller.


Yes, I don't see the message, "Sending datafile. Please wait..."

I've already been waiting 8 minutes since the last try.


Edited by skulstation

 

Try this one and see how far it gets.

 

Version 0.9.3.95: http://www.numberworld.org/y-cruncher/HWBOT%20Submitter%20v0.9.3.95.jar

 

I did manage to hack in a progress counter for the submission.

 

When everything works properly, you should see things in this order:

  1. Building datafile. Please wait...
  2. Sending datafile. Please wait...
  3. Sending: XX.X MiB / XX.X MiB

That last one will refresh every second until the submission is complete, and then it disappears. If any errors occur, an error box should pop up. If something crashes, you probably won't see anything, and it will hang.
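Since the network API doesn't report upload progress, one possible workaround (a sketch, not necessarily what v0.9.3.95 does) is to write the request body yourself in fixed-size chunks and count the bytes as they go out, refreshing the status text at most once per second. Here `sink` is any object with a `write` method, standing in for the connection's output stream; `send_with_progress` and `format_progress` are hypothetical names.

```python
import io
import time

MIB = 1024 * 1024


def format_progress(sent: int, total: int) -> str:
    """Render the status line in the 'Sending: XX.X MiB / XX.X MiB' style."""
    return f"Sending: {sent / MIB:.1f} MiB / {total / MIB:.1f} MiB"


def send_with_progress(data: bytes, sink, report, chunk_size: int = 64 * 1024,
                       interval: float = 1.0) -> None:
    """Write data to sink in chunks, calling report() at most once per interval."""
    total = len(data)
    sent = 0
    last_report = float("-inf")
    while sent < total:
        chunk = data[sent:sent + chunk_size]
        sink.write(chunk)
        sent += len(chunk)
        now = time.monotonic()
        # Always report the final state so the counter ends at 100%.
        if now - last_report >= interval or sent == total:
            report(format_progress(sent, total))
            last_report = now


messages = []
buffer = io.BytesIO()
send_with_progress(b"\x00" * (3 * MIB), buffer, messages.append)
print(messages[-1])  # Sending: 3.0 MiB / 3.0 MiB
```

One caveat if this is done in Java: `HttpURLConnection` buffers the whole request body in memory by default, so counting bytes written to its output stream would only measure copying into that buffer. Calling `setFixedLengthStreamingMode` before writing makes the writes go out over the connection as they happen.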

 

Anyway, I need to get some sleep, so I probably won't be able to respond for quite a while.


I was going through the submissions and I noticed a number of multi-socket systems that have seemingly terrible performance. (Especially that 4-socket Magny-Cours Opteron.)

 

I'll go ahead and explain why this is the case. It will probably be obvious to those of you who are familiar with the topic.

 

-----

 

Why does y-cruncher (sometimes) suck on multi-socket systems?

 

This is due to memory access. Specifically, Non-Uniform Memory Access (NUMA).

 

y-cruncher can only run efficiently when the following assumption is true:

  • Every core/processor has fast access to all the memory.

This is true for all single-socket systems as well as some of the pre-Nehalem dual-socket servers, but not for modern multi-socket systems.

 

On multi-socket systems, each processor socket has its own set of memory banks. A processor has fast access to its own set of memory. But if it needs to access memory that's elsewhere (on a different socket), it needs to go over the interconnect to get it from the other processor. So it's a lot slower.

 

In other words, the assumption that is critical to y-cruncher's performance is no longer valid. Some memory is faster, and some memory is really slow - hence "Non-Uniform Memory Access". If you have two sockets, half the memory will be fast and the other half slow. If you have a lot of sockets, then the vast majority of the memory will be slow with respect to each individual processor.
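The arithmetic behind the last two sentences, assuming memory is split evenly across N sockets: each processor finds 1/N of the memory local and the remaining (N-1)/N remote. `remote_fraction` is just an illustrative helper.

```python
from fractions import Fraction


def remote_fraction(sockets: int) -> Fraction:
    """Share of memory that is remote from any one processor, assuming an even split."""
    if sockets < 1:
        raise ValueError("need at least one socket")
    return Fraction(sockets - 1, sockets)


for n in (1, 2, 4, 8):
    print(f"{n} socket(s): {remote_fraction(n)} of the memory is remote")
```

For two sockets that gives 1/2, and for the 4-socket Opteron box mentioned above it gives 3/4 of the memory being slow from any one processor's point of view.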

 

If you think that's bad, get ready for more bad news.

 

Operating systems are aware of NUMA, so they try to be smart about it. When a program allocates memory, the OS biases it toward the NUMA node of the core that asked for it. This maximizes locality so that memory accesses stay within the same NUMA node. While this sounds reasonable for most applications, it actually backfires for y-cruncher. Unlike most programs, y-cruncher wants to use the entire system.

 

Some of you might have noticed that y-cruncher's memory usage is static throughout the entire computation. What's happening is that y-cruncher allocates all the memory it needs upfront and reuses it for the rest of the computation. And that's where the problem is: that allocation is done by a single thread, so the OS will put all of it on one NUMA node.

 

During the computation, y-cruncher spawns threads that run on all the cores and all the sockets/NUMA nodes. Since all the memory is on one socket, the processors on every socket will hammer that one socket. Not only does this overload the memory bandwidth of that node, it also swamps the QPI going in and out of that socket. Meanwhile, the memory on the other nodes sits idle. In other words, it's a massive traffic jam: everybody tries to park in one garage while 3 others sit empty.
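A toy model of that traffic jam, not a measurement: pages are placed either all on the allocating thread's node (the single-threaded upfront allocation described above) or round-robin across nodes (what interleaving does), and threads on every node are assumed to touch every page equally often. The output is the share of all memory traffic each node's memory has to serve. `NODES` and `PAGES` are arbitrary illustration values.

```python
from collections import Counter

NODES = 4      # e.g. a 4-socket server
PAGES = 1024   # pages allocated upfront by the program


def place_single_node(pages: int, home: int = 0) -> list[int]:
    """Every page lands on the node of the one thread that allocated it."""
    return [home] * pages


def place_interleaved(pages: int, nodes: int) -> list[int]:
    """Pages are spread round-robin across all nodes."""
    return [page % nodes for page in range(pages)]


def traffic_per_node(placement: list[int], nodes: int) -> list[float]:
    """Share of all accesses served by each node, if every page is touched equally."""
    counts = Counter(placement)
    return [counts[node] / len(placement) for node in range(nodes)]


print(traffic_per_node(place_single_node(PAGES), NODES))         # [1.0, 0.0, 0.0, 0.0]
print(traffic_per_node(place_interleaved(PAGES, NODES), NODES))  # [0.25, 0.25, 0.25, 0.25]
```

Interleaving fixes the imbalance, but under the same assumptions 3 out of 4 accesses from any node still cross the interconnect, which is why it doesn't make the NUMA problem go away entirely.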

 

This is why the performance sucks on those quad-Opteron servers. It affects Intel machines as well, but to a lesser degree since they seem to have better interconnects.

 

What can you do about it?

 

The biggest problem is the traffic imbalance. If your BIOS has an option to disable NUMA, then use it. This doesn't actually disable NUMA, since NUMA is a physical property of the hardware, but it tricks the OS into thinking there's no NUMA, so the memory allocations end up scattered across all the nodes.

 

On Linux, you have a bit more control. The numactl package lets you run a program with interleaved memory, which also spreads the memory out across the nodes.
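For example, a small Python launcher could prefix the command with `numactl --interleave=all`, which asks the kernel to interleave the program's memory across all NUMA nodes. The `./y-cruncher` path is an assumption about where the binary lives, and the wrapper falls back to the plain command if `numactl` isn't installed.

```python
import shutil
import subprocess


def interleaved_command(program: list[str]) -> list[str]:
    """Prefix a command with numactl's interleave policy when numactl is available."""
    if shutil.which("numactl") is None:
        return list(program)  # no numactl: run unchanged with the default policy
    return ["numactl", "--interleave=all", *program]


def run_interleaved(program: list[str]) -> int:
    """Run the program and return its exit code."""
    return subprocess.run(interleaved_command(program)).returncode


if __name__ == "__main__":
    # Assumed location of the y-cruncher binary; adjust to your install.
    print(interleaved_command(["./y-cruncher"]))
```

Typing `numactl --interleave=all ./y-cruncher` directly in a shell does the same thing; the wrapper only adds the availability check.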

 

These tweaks will help y-cruncher run faster, but they don't completely solve the NUMA problem. There's still the latency problem, and even when the interconnect traffic is balanced out, the interconnect will still be a bottleneck.

 

Solving the NUMA problem can only be done by redesigning the program. That's obviously beyond the scope of benchmarking.

 

That said, it doesn't mean you should avoid multi-socket systems. A high-end dual-socket machine that is properly configured will still beat out all the single-socket setups - LN2 or not.

 

What makes y-cruncher different from programs like wPrime?

 

y-cruncher actually needs to use memory - and a lot of it. (Not that I needed to say that.)


When you have time after finalizing the y-cruncher launcher, it would be much appreciated if you could lend your talents to rebuilding the HWBOT Prime and Unigine launchers.

Open a Kickstarter/GoFundMe, we'll take care of you.

(Joke aside, if it happens one day, I would totally fund it!)


One nice feature for the UI would be adding swap mode, because for a total novice this will be very difficult to do via the command line.

But so far, very good.

 

I've thought about that. But swap mode is part of the custom compute menu, and that custom compute menu is f***ing complicated. So it will take a lot of work to mirror that menu in the UI.

 

Aside from that, there are some technical roadblocks. The custom compute menu is an interactive menu. Unlike the benchmark menu, you can't just fire a command at it and expect it to always work gracefully.

 

The custom compute menu will show memory calculations and warnings. It will also hide/disable options that aren't applicable or are incompatible with existing settings. And it will automatically adjust things in response to user-input.

 

But in order to do that in the UI, y-cruncher needs to be able to send information back to the UI. Unfortunately, that's not possible right now, and I don't know how to do that at the moment. That said, I can try to design something around this limitation, but no promises.



Thought so. Maybe via a batch file?
