HWBOT Community Forums

Mysticial

Members
  • Content Count

    126
  • Joined

  • Last visited

  • Days Won

    1

Everything posted by Mysticial

  1. Not sure when this started happening. But when I attempt to submit via the API, the server responds with: <html> <head><title>301 Moved Permanently</title></head> <body bgcolor="white"> <center><h1>301 Moved Permanently</h1></center> <hr><center>nginx/1.10.3 (Ubuntu)</center> </body> </html> What happened to the server? What's the alternative? And why weren't benchmark developers given notice prior to this change? If I missed some announcement, I apologize. I rarely check these forums nowadays. So unless I get an email or something, I won't know.
  2. Bingo. The URL I was sending to is: http://hwbot.org/submit/api?client=y-cruncher_-_Pi-25m&clientVersion=1.0.1 Changing it to https works. I'll push out an update for that.
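A minimal sketch of the fix described above, assuming the only change needed is the URL scheme. The helper name `force_https` is hypothetical and this is not the submitter's actual code (which is written in Java); it only illustrates rewriting an `http://` submission URL to `https://` before sending.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return the same URL with an http scheme upgraded to https.

    URLs that already use https (or any other scheme) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# The URL quoted in the post above.
OLD_URL = "http://hwbot.org/submit/api?client=y-cruncher_-_Pi-25m&clientVersion=1.0.1"
NEW_URL = force_https(OLD_URL)
```

Rewriting the scheme up front avoids relying on the client to follow the 301 redirect, which many HTTP clients will not do transparently for POST requests.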
  3. Mysticial

    The Official Team CUP 2018 DDR4 stage thread:

    Btw, I'll be adding a datafile-only button to the y-cruncher HWBOT Submitter app. So you won't need to do the ugly disconnect work-around anymore. Apologies for that inconvenience. I never anticipated that submissions would be done anywhere other than directly to HWBOT. And there were some technical reasons why I didn't want to expose the raw datafile. (mods feel free to PM me on the details) Fixing this will require a backwards incompatible change. So I'll be doing it in November after the competition ends. I don't really want to touch anything while that's ongoing.
  4. Mysticial

    GPUPI - SuperPI on the GPU

    I just want to highlight this statement by Mat in his justification of reworking the implementation: As a hobbyist software developer myself, I can completely relate to that. When you put that much time and effort into making something, there is a strong desire to make it the best it can be. And it becomes a personal challenge to make it better and better. And quite often, "better" means faster and more efficient. However, from the perspective of competitive benchmarking, you want everything to stay the same. Once the benchmark is released, it never changes. No speedups, no utilization of new processor features, it needs to be frozen in time - forever. In other words, there is a conflict of interest between competitive benchmarking and benchmark writers doing personal projects. So these sorts of breaking changes are bound to happen eventually. It's a normal part of software development. The only thing you cannot change is change itself. So I think it would probably be more fruitful to start coming up with better ways to cope with speed changes in benchmarks. Do you periodically introduce new benchmarks? If so then maybe consider phasing out old ones - even if they are still popular. How about making the speed changes part of the game? If the benchmarker wants to stay on top he/she must stay up-to-date with the latest versions of the program and the improvements that they bring. For what it's worth, the y-cruncher benchmark which I maintain has never kept speed consistency. It gets faster with almost every single release. But nobody on HWBOT notices since there aren't any points for it. Furthermore, the improvements are much more incremental and are spread out over many releases so you don't see the massive 50% speedups that we're seeing with GPUPI 3.3. In reality, y-cruncher is a scientific application first and a benchmark second. 
One* of the goals of the program is to compute Pi as fast as possible by any means necessary - on any hardware and with any software changes. This is why it can utilize stuff like AVX/AVX512, unlimited memory, etc... But fundamentally, this is incompatible with competitive benchmarking as it is today so I've never really been bothered that it never "caught on". *The "real" goal is of course to set size records. But that's outside the scope of HWBOT since the hardware needed to do this is typically on the order of 5-6 figures USD and requires months of computation.
  5. I just noticed that submissions to HWBOT via the API are now breaking because the API seems to have changed. In the past, the server responded with JSON. Now it responds with XML. Why did this change? And why wasn't there a notification to all the benchmark maintainers that depend on the API? ----- Unrelated note, the version whitelisting seems to be broken. I'm no longer able to specify multiple versions to whitelist. It's either no filtering, or one version only.
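One way a client could cope with this kind of change is to detect the response format before parsing it. The sketch below is an illustration only: the function name `detect_response_format` and the sample payloads are hypothetical, and the real HWBOT response schema is not shown in this thread.

```python
import json
import xml.etree.ElementTree as ET


def detect_response_format(body: str) -> str:
    """Classify a response body as "json", "xml", or "unknown".

    JSON is tried first; if that fails, the body is tried as well-formed XML.
    """
    text = body.strip()
    try:
        json.loads(text)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        return "unknown"


# Hypothetical payloads, not the actual HWBOT responses.
SAMPLE_JSON = '{"status": "ok"}'
SAMPLE_XML = "<response><status>ok</status></response>"
```

A client built this way can at least report "unexpected format" instead of failing inside the parser when the server default changes.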
  6. I'm also seeing them right now.
  7. I'm not sure how many of you are familiar with the y-cruncher Multi-Threaded Pi Program. It's been around for quite a while now (since 2009). In short, it's a program that computes Pi and other constants to billions/trillions of digits. It currently holds the world record for the most digits of Pi ever computed (13.3 trillion digits) as well as a bunch of other less popular constants.

    y-cruncher is also the first Pi computing program that can:

      • Use multiple threads for a worthwhile (sometimes linear) speedup.
      • Use (and stress) an unlimited amount of memory.
      • Utilize ISA extensions (SSE, AVX, etc...) for nearly all modern processors.

    There has been some hope that it could be a SuperPi/PiFast alternative. But that never really happened. Over the years, I've been asked numerous times why this program never became part of HWBOT. In fact, I've had many chats with Massman. But none of them really got anywhere until recently. Also, the fact that the program lacks a GUI didn't really help either. And to be fair, y-cruncher was designed as a math program with one purpose in mind - to break size records. Competitive benchmarking and user-friendliness were always secondary.

    Official XtremeSystems thread here. (Though it's been a while since I've updated it.)

    -----

    Anyways. It took about a week of work, but I've thrown together an app complete with a GUI that can run and submit y-cruncher benchmarks to HWBOT. This app is called the "y-cruncher HWBOT Submitter". It's written in Java and requires the Java 8 runtime to run. (y-cruncher itself has no requirements other than Windows Vista or later.)

    For now, I've only enabled 3 benchmarks for HWBOT:

      • Pi - 25 million digits
      • Pi - 1 billion digits
      • Pi - 10 billion digits

    The 25m benchmark will finish in under a few seconds on modern hardware. That's too fast, so we'll probably drop that at some point. The only reason it exists in the first place is because it's fast and easy to test.

    The 1b benchmark is the standard size. It requires about 5 GB of ram to run and will take a few minutes to run for most high-end systems.

    The 10b benchmark will require 48 GB of ram. That basically implies a minimum of Skylake, Haswell-E, or some server. If you don't have enough memory, it's possible to run it using swap mode. But that's more complicated to set up and will be slower than doing it all in memory.

    How does it work?

    Anyone who's familiar with y-cruncher will know that it outputs a validation file at the end of every computation. The submitter app is a runnable .jar file that you can put in the y-cruncher folder. When you run it, it automatically searches out all the validation files and verifies the checksums in them. The ones that are valid and match a supported HWBOT size are available for submission to HWBOT.

    The submitter app is a wrapper on top of y-cruncher itself. No changes to y-cruncher were needed for this to happen. And quite frankly, I designed it this way so that I could keep all the Java networking/GUI separate from the 300,000 lines of ugly C++ that is y-cruncher.

    Download:

      • Current y-cruncher version: 0.7.7.9495
      • Current HWBOT submitter version: 1.0.1.133
      • y-cruncher v0.7.7 with HWBOT Submitter

    Version Support:

    The submitter app:

      • Supports all validation files generated by y-cruncher v0.6.1 - v0.7.7. So you can retroactively submit old benchmarks if you still have the validation files for them.
      • Supports benchmark integration with y-cruncher v0.6.6 - v0.7.7.

    Despite being written in Java, the submitter app does not run in Linux (at least I couldn't get it to run). But at the very least, validation files generated in Linux can still be submitted to HWBOT if you transfer them to Windows and run the submitter app there. I have yet to figure out why it's broken in Linux, but it seems to involve the JavaFX library. In any case, even if someone does manage to get it to run, the benchmark integration will still be broken since it uses Windows-specific command-line parameters to launch y-cruncher.

    Version History:

    Main Page: http://www.numberworld.org/y-cruncher/version_history_ui.html
  8. Mysticial

    Math turns benchmark: y-cruncher meets HWBOT

    This is getting fixed on both sides. richba5tard says a server-side push tomorrow will change the default from XML back to JSON. At the same time, I've updated the HWBOT Submitter to explicitly request JSON. Here is the first submission of the new y-cruncher version that I released yesterday (and re-released just now with the fixed HWBOT Submitter): http://hwbot.org/submission/3766703_
  9. Thanks! I was going to say that I currently don't set the accept header. (I actually have no idea what that even is.) I'll figure out how to do it later for a future release. But it should start working again once the server-side change rolls out.
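For reference, the Accept header is a standard HTTP request header that tells the server which response formats the client can handle. Below is a minimal sketch of setting it, using Python's standard library rather than the submitter's Java code. The function name `build_submit_request` and the payload are hypothetical placeholders, and nothing is sent over the network here.

```python
import urllib.request


def build_submit_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request that explicitly asks for JSON."""
    return urllib.request.Request(
        url,
        data=payload,  # placeholder body; the real submission format is not shown here
        headers={"Accept": "application/json"},
        method="POST",
    )


# Assumed endpoint, taken from the URL quoted earlier in this thread.
REQUEST = build_submit_request("https://hwbot.org/submit/api", b"placeholder")
```

With the header set explicitly, the client no longer depends on whatever format the server happens to use as its default.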
  10. Thanks. I am aware of Massman's departure and the new revision. But I thought the revision was mainly points-related and didn't have anything to do with the submission APIs. While I *can* switch y-cruncher's HWBOT submitter to handle the new XML response, it will take time. And I figured that this could be breaking more than just y-cruncher.
  11. Mysticial

    Math turns benchmark: y-cruncher meets HWBOT

    Just a heads up, submissions are currently broken. The HWBOT API seems to have changed without any warning: http://forum.hwbot.org/showthread.php?t=175630 So until that gets resolved, no submissions will go through.
  12. Right now, there are conflicting reports that this first line of Skylake X processors (based on the 10-core Skylake Purley LCC die) will not have full-throughput AVX512.

      • Skylake-X not support AVX-512 instructions
      • Skylake-X i7-7900X Performance Leaked: 55% faster than i7-6950X @ 4.5GHz

    If this is true, the current Skylake X processors will only be able to run AVX512 at half the speed of the server Xeons - IOW, no better than AVX2.

    I want to definitively answer this question - both for myself and for anyone else looking to purchase a Skylake X processor for the purpose of AVX512. Using the same FLOPs benchmark that discovered the Ryzen FMA bug, we should be able to find out if Skylake X has full-throughput, or half-throughput AVX512.

    So my request is for someone who has a Skylake X sample* to:

      • Run the "2017-SkylakePurley" binary here: https://github.com/Mysticial/Flops/tree/master/version3/binaries-windows**
      • Do it at a fixed CPU frequency (to avoid the effects of Turbo Boost).
      • Do it with HT enabled.
      • Don't use an extreme overclock. If the chip has full-throughput AVX512, then those AVX512 instructions may produce more heat than any other benchmark you've ever run.
      • Do it with a fully updated Windows 10. Or a recent version of Linux (like Ubuntu 17.04). This is needed to ensure that the OS has support for AVX512.

    *I may be wrong, but I don't believe Skylake X benchmarks are under NDA anymore since there's already a gazillion HWBOT submissions and you can get access to the server variants on Google Cloud.

    **The source code is also in that GitHub repo if you want to build it yourself. But be aware that you need the Intel Compiler if you want to build the AVX512 binaries for Windows.

    ----------------

    When you run the benchmark, I expect one of 3 things to happen:

      • The binary crashes: This means that Windows 10 does not have support for AVX512 and we'll need to wait for that support.
      • The numbers for 512-bit AVX are about the same as the 256-bit AVX: This means that the processor only supports half-throughput AVX512.
      • The numbers for the 512-bit AVX are about 2x that of the 256-bit AVX: This means that the processor supports full-throughput AVX512.

    Here is what the benchmark looks like for a 32-core Skylake Purley system on Google Cloud running at 2.0 GHz with 2.5 GHz turbo:

        Running Skylake Purley tuned binary with 1 thread...

        Single-Precision - 128-bit AVX - Add/Sub GFlops = 15.904 Result = 2.02376e+06
        Double-Precision - 128-bit AVX - Add/Sub GFlops = 7.952 Result = 1.00995e+06
        Single-Precision - 128-bit AVX - Multiply GFlops = 15.936 Result = 2.03498e+06
        Double-Precision - 128-bit AVX - Multiply GFlops = 7.968 Result = 1.00712e+06
        Single-Precision - 128-bit AVX - Multiply + Add GFlops = 15.936 Result = 1.69085e+06
        Double-Precision - 128-bit AVX - Multiply + Add GFlops = 7.968 Result = 841756
        Single-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 31.872 Result = 2.02868e+06
        Double-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 15.936 Result = 1.01782e+06
        Single-Precision - 256-bit AVX - Add/Sub GFlops = 31.808 Result = 4.06688e+06
        Double-Precision - 256-bit AVX - Add/Sub GFlops = 15.936 Result = 2.02901e+06
        Single-Precision - 256-bit AVX - Multiply GFlops = 31.872 Result = 4.06158e+06
        Double-Precision - 256-bit AVX - Multiply GFlops = 15.936 Result = 2.02013e+06
        Single-Precision - 256-bit AVX - Multiply + Add GFlops = 31.872 Result = 3.34696e+06
        Double-Precision - 256-bit AVX - Multiply + Add GFlops = 15.936 Result = 1.70441e+06
        Single-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 63.744 Result = 4.0399e+06
        Double-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 31.872 Result = 2.00801e+06
        Single-Precision - 512-bit AVX512 - Add/Sub GFlops = 63.744 Result = 8.11456e+06
        Double-Precision - 512-bit AVX512 - Add/Sub GFlops = 31.872 Result = 4.03949e+06
        Single-Precision - 512-bit AVX512 - Multiply GFlops = 63.36 Result = 8.0743e+06
        Double-Precision - 512-bit AVX512 - Multiply GFlops = 31.872 Result = 4.05014e+06
        Single-Precision - 512-bit AVX512 - Multiply + Add GFlops = 63.744 Result = 6.68723e+06
        Double-Precision - 512-bit AVX512 - Multiply + Add GFlops = 31.872 Result = 3.3739e+06
        Single-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 127.488 Result = 8.22848e+06
        Double-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 63.744 Result = 4.03805e+06

        Running Skylake Purley tuned binary with 64 thread(s)...

        Single-Precision - 128-bit AVX - Add/Sub GFlops = 683.36 Result = 8.68179e+07
        Double-Precision - 128-bit AVX - Add/Sub GFlops = 263.568 Result = 3.35065e+07
        Single-Precision - 128-bit AVX - Multiply GFlops = 527.616 Result = 6.69453e+07
        Double-Precision - 128-bit AVX - Multiply GFlops = 263.88 Result = 3.34619e+07
        Single-Precision - 128-bit AVX - Multiply + Add GFlops = 527.136 Result = 5.58561e+07
        Double-Precision - 128-bit AVX - Multiply + Add GFlops = 263.64 Result = 2.79832e+07
        Single-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 1056.77 Result = 6.71142e+07
        Double-Precision - 128-bit FMA3 - Fused Multiply Add GFlops = 528.336 Result = 3.36188e+07
        Single-Precision - 256-bit AVX - Add/Sub GFlops = 1054.14 Result = 1.34076e+08
        Double-Precision - 256-bit AVX - Add/Sub GFlops = 527.52 Result = 6.68866e+07
        Single-Precision - 256-bit AVX - Multiply GFlops = 1056.77 Result = 1.34416e+08
        Double-Precision - 256-bit AVX - Multiply GFlops = 527.664 Result = 6.70251e+07
        Single-Precision - 256-bit AVX - Multiply + Add GFlops = 1055.33 Result = 1.12018e+08
        Double-Precision - 256-bit AVX - Multiply + Add GFlops = 527.52 Result = 5.59086e+07
        Single-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 2110.08 Result = 1.34046e+08
        Double-Precision - 256-bit FMA3 - Fused Multiply Add GFlops = 1055.33 Result = 6.69451e+07
        Single-Precision - 512-bit AVX512 - Add/Sub GFlops = 2112.26 Result = 2.68216e+08
        Double-Precision - 512-bit AVX512 - Add/Sub GFlops = 1056 Result = 1.34131e+08
        Single-Precision - 512-bit AVX512 - Multiply GFlops = 2117.38 Result = 2.69031e+08
        Double-Precision - 512-bit AVX512 - Multiply GFlops = 1059.26 Result = 1.34601e+08
        Single-Precision - 512-bit AVX512 - Multiply + Add GFlops = 2118.14 Result = 2.24393e+08
        Double-Precision - 512-bit AVX512 - Multiply + Add GFlops = 1058.5 Result = 1.12102e+08
        Single-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 4242.43 Result = 2.69409e+08
        Double-Precision - 512-bit AVX512 - Fused Multiply Add GFlops = 2115.07 Result = 1.34365e+08

    This Skylake Purley system has full-throughput AVX512.
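The "same as 256-bit" versus "about 2x the 256-bit" rule from the post can be written down as a small check. This is a sketch under assumptions: the function name `classify_avx512` and the 1.5x cutoff are hypothetical choices made for illustration, not part of the Flops benchmark. The sample numbers are the double-precision FMA results from the Google Cloud run quoted in the post.

```python
def classify_avx512(gflops_256: float, gflops_512: float, cutoff: float = 1.5) -> str:
    """Classify AVX512 throughput from two measurements taken at the same clock.

    A 512-bit result near 2x the 256-bit result suggests full throughput;
    a result near 1x suggests half throughput. The 1.5 cutoff is an
    arbitrary midpoint chosen for this sketch.
    """
    if gflops_256 <= 0:
        raise ValueError("gflops_256 must be positive")
    ratio = gflops_512 / gflops_256
    return "full-throughput" if ratio >= cutoff else "half-throughput"


# Double-precision FMA numbers from the run above.
SINGLE_THREAD = classify_avx512(31.872, 63.744)      # 256-bit FMA3 vs 512-bit AVX512
ALL_THREADS = classify_avx512(1055.33, 2115.07)      # same comparison with 64 threads
```

Comparing FMA against FMA at the same width pair keeps the ratio independent of core count and clock speed, as long as the clock really is fixed for both measurements.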
  13. Mysticial

    A Favor to Ask: Skylake X and AVX512

    WOW... AVX512 @ 4.5 GHz. How many watts did it pull? That also confirms full-throughput AVX512 (both FMAs enabled) for the 7980XE. *I love how Intel calls this a "1 teraflop" CPU when it's really 2 - 3 TFLOPs (stock), or 5 TFLOPs (here at 4.5 GHz).
  14. Well, that's a huge letdown... I guess Intel can't even do a proper knee-jerk to Threadripper...
  15. Mysticial

    Math turns benchmark: y-cruncher meets HWBOT

    And version 0.7.3 is out. And here are the first few submissions on a Core i9 7900X @ 4.0 GHz (all-core AVX512) and memory @ 3200 MHz: Mysticial`s Y-Cruncher - Pi-25m score: 0sec 739ms with a Core i9 7900X Mysticial`s Y-Cruncher - Pi-1b score: 38sec 522ms with a Core i9 7900X Mysticial`s Y-Cruncher - Pi-10b score: 8min 51sec 111ms with a Core i9 7900X The AVX512 didn't bring as much of a speedup as I'd hoped. But it's still enough to beat all the Haswell and Broadwell HEDTs and come within arm's length of the dual-socket servers.
  16. Is this throttling only in the clock speeds? As in you can see the throttle happen by watching CPUz and seeing the frequency drop. I'm noticing on my Gigabyte AORUS 7 that there is a sort of "phantom AVX512 throttle" that disables half the AVX512 while maintaining the same clock speed. So while CPUz shows a constant 4 GHz, the performance (and temperatures) drop when the "AVX512 throttle" kicks in. I can partially get around the throttling by lowering the clock to 3.8 GHz and increasing the TDP limit to 400W. But I was never able to avoid the throttling at or above 4.0 GHz. I've spoken to Silicon Lottery about this and he says all the Gigabyte boards for X299 have tons of background throttling that make them hard to use. I'm not sure if he's referring to the "AVX512 throttle" or clock speed throttling in general.
  17. Mysticial

    Math turns benchmark: y-cruncher meets HWBOT

    If anyone's wondering about AVX512. It's coming... No ETA yet since I've hit a number of unexpected snags. If you're wondering why the chip is underclocked. There's a reason for that. And it's a long story. Don't worry, it's much harder to fry your chip with AVX512 than I'm making it sound - unless you disable thermal protection...
  18. I'm doing some AVX512 testing right now and it seems that Intel found a very sneaky but ingenious way to do wattage throttling. More details on that later. And by "later", I mean probably a week from now since it's "quite complicated".
  19. Mysticial

    A Favor to Ask: Skylake X and AVX512

    Now THAT's interesting... They also show full-throughput AVX512. That's contrary to what all the articles out there are reporting.

    (2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 864 GFlops

    Benchmark shows 872.832 GFlops. If they were only half-throughput, I'd have expected:

    (1 FMA/cycle for half-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (6 cores) * (4.5 GHz) = 432 GFlops

    Thanks for running all these benchmarks!
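The arithmetic in this post can be reproduced with a few lines. This is only a sketch of the same multiplication; the function name `peak_gflops` and its parameter names are made up for illustration.

```python
def peak_gflops(fma_per_cycle: int, lanes: int, cores: int, ghz: float) -> float:
    """Theoretical peak GFlops for FMA-bound code.

    fma_per_cycle: FMA instructions issued per cycle per core (2 = full, 1 = half)
    lanes:         values per instruction (8 doubles in a 512-bit register)
    cores:         number of physical cores
    ghz:           sustained clock speed in GHz
    """
    flops_per_fma = 2  # one multiply plus one add
    return fma_per_cycle * flops_per_fma * lanes * cores * ghz


FULL = peak_gflops(fma_per_cycle=2, lanes=8, cores=6, ghz=4.5)  # 864.0
HALF = peak_gflops(fma_per_cycle=1, lanes=8, cores=6, ghz=4.5)  # 432.0
MEASURED = 872.832  # double-precision AVX512 FMA result quoted in the post
```

The measured value sits about 1% above the full-throughput figure and roughly 2x above the half-throughput figure, which is why the result points to two FMA units being active.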
  20. Mysticial

    A Favor to Ask: Skylake X and AVX512

    Ram will have no effect on that benchmark. The benchmark is 100% CPU. I was able to calculate your clock speed because the benchmark achieves very close to the theoretical FLOPs on the system. For the Core i9 7900X assuming full-throughput AVX512: (2 FMA/cycle for full-throughput AVX512) * (2 Flops/FMA) * (8 DP/instruction for AVX512) * (10 cores) * (4.5 GHz) = 1440 GFlops The benchmark is showing 1443.84 GFlops. It's actually slightly more than the theoretical limit because of timing variations.
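Running the same formula backwards gives the clock speed implied by a measured result, which is how the 4.5 GHz guess was made. A minimal sketch, assuming the benchmark really does reach the theoretical peak; the function name `implied_clock_ghz` is hypothetical.

```python
def implied_clock_ghz(measured_gflops: float, fma_per_cycle: int, lanes: int, cores: int) -> float:
    """Clock speed (GHz) implied by a measured FMA GFlops figure.

    Inverts: GFlops = fma_per_cycle * 2 flops/FMA * lanes * cores * GHz.
    """
    flops_per_cycle = fma_per_cycle * 2 * lanes * cores
    return measured_gflops / flops_per_cycle


# Core i9 7900X: 10 cores, full-throughput AVX512, 8 doubles per instruction.
CLOCK = implied_clock_ghz(1443.84, fma_per_cycle=2, lanes=8, cores=10)
```

For the 1443.84 GFlops result this works out to roughly 4.51 GHz, slightly above 4.5 GHz for the timing-variation reason given in the post.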
  21. Mysticial

    A Favor to Ask: Skylake X and AVX512

    Wow! Over 1 TFlops for double-precision! CPUz doesn't seem accurate in that screenshot. But based on the numbers it looks like you were clocked around 4.5 GHz? Possibly in 100 x 45 configuration? And it didn't melt?
  22. Mysticial

    A Favor to Ask: Skylake X and AVX512

    That's good to see! Full output? Though I'm seeing rumors that the integer throughput will not be doubled. And I can see architecturally why that might be. Unfortunately I don't have a benchmark for that.
  23. Mysticial

    A Favor to Ask: Skylake X and AVX512

    Would you be able to try with the latest binaries? I updated them last night. As far as I can tell, I've removed the check. So it should get past that message and either run successfully or crash. Thanks for your time.
  24. Mysticial

    A Favor to Ask: Skylake X and AVX512

    I found a way to disable that check by the compiler and I've updated the binaries. So if anyone is willing to try now, it should (hopefully) work regardless of whether RDSEED is enabled or not. Thanks.