BenchMate Integration

_mat_ · August 28, 2019

I'd like to invite all active benchmark developers to discuss integration into BenchMate. If you have any technical questions about BenchMate or the integration, please feel free to ask here.

In general there are two ways of integration:

1) External Integration

A DLL is injected into the benchmark's process and all of its child processes. Several WIN32 and CRT functions are hooked as needed to inject accurate timing and file integrity hashing as well as result capturing.

2) SDK integration

One of the next releases will bring native BenchMate integration. The SDK will be available for C/C++ first, as a static library. It will provide several classes to help you use BenchMate's features in your benchmark: reliable timer functions, secure string classes, installation verification, file integrity hashing and much more. Additionally, the SDK integration lets BenchMate know when a run is started, ended or canceled, and securely transfers results to the client for validation. No DLL injection is necessary here.

Let's get this going!

@Mysticial @havli

Edited August 28, 2019 by _mat_

Mysticial · August 28, 2019

Interesting. Both of these are a bit more intrusive than I thought! I'll also mention specifics for y-cruncher since I maintain that benchmark.

DLL Injection:

I don't know enough about DLL injection to fully comment on it. But I'm guessing this is aimed at all the "frozen" benchmarks that are no longer maintained?

The main problem with this approach is that this essentially subverts the timers for the benchmark. So if BenchMate can do it, then any other program (including a cheat tool) can do it as well. If the validation and HWBOT submission is still handled by the benchmark, then this approach is very vulnerable. You can just DLL inject your own timers and completely fool the program into sending a bad score to HWBOT.

(Translation: Everything we have now is completely broken. Yes we already knew that.)

Thus, it seems that the only source that can be trusted to submit a score to HWBOT is BenchMate itself. But BenchMate doesn't know when the benchmark started or ended unless it tries to parse program output.

For y-cruncher, this will work conceptually since it has a parse-able output that can be trusted (the validation file). But if not done correctly, it will conflict with y-cruncher's own internal protections. My guess is that if you run y-cruncher with BenchMate DLL injection and then change the base clock in a way that skews the timers, BenchMate will report the correct score, but y-cruncher itself will detect a clock skew and block the score.

This is probably fixable if we pull the right strings.

SDK Integration:

This is arguably the better approach, but will obviously require non-trivial modifications to the benchmark. So it automatically rules out all the "frozen" benchmarks.

For y-cruncher, this is tricky in a different way since it requires taking on an external dependency. Since y-cruncher isn't a dedicated benchmark, I generally don't allow taking dependencies. The exceptions I allow are for very self-contained ones like Cilk and TBB that don't require privs and are portable across both Windows and Linux.

So it would need to either be done in a wrapper, or some "official" DLL side-load into the main binaries that is officially supported by the program. Both approaches are messy. In either case, I would need to mock out the relevant timers.

For the wrapper solution, the "trusted" timers are only available in the wrapper (where BenchMate lives). Every important timer call will require a secure RPC over either SHM mapping or TCP. y-cruncher currently has a TCP stack, but it's for a different purpose and it's not secure.

Other Thoughts:

From a scalability perspective, it might be worth considering going one step further with BenchMate and put the HWBOT integration into it. Each benchmark then gives a score (and relevant metadata) to BenchMate and it will handle everything from there. This will also allow BenchMate to append any additional metadata that the benchmark doesn't track.

That way everybody doesn't have to reimplement the same thing. Likewise, everybody doesn't have to update it when the HWBOT API/protocols change. And validation bugs and vulnerabilities only need to be fixed in one place. (the flip side being that any vulnerability that is found will likely apply to all benchmarks)

Edited August 28, 2019 by Mysticial

_mat_ · September 3, 2019

@Mysticial

Sorry for the delay. I was very busy finishing BenchMate 0.9(.1).

On 8/28/2019 at 11:27 PM, Mysticial said:

I don't know enough about DLL injection to fully comment on it. But I'm guessing this is aimed at all the "frozen" benchmarks that are no longer maintained?

Yes, either frozen or currently "out of reach". It's still very early for BenchMate, so approaching well-known developers like Futuremark/UL or Geekbench with the new concept would be very risky (also for them).

On 8/28/2019 at 11:27 PM, Mysticial said:

The main problem with this approach is that this essentially subverts the timers for the benchmark. So if BenchMate can do it, then any other program (including a cheat tool) can do it as well. If the validation and HWBOT submission is still handled by the benchmark, then this approach is very vulnerable. You can just DLL inject your own timers and completely fool the program into sending a bad score to HWBOT.

(Translation: Everything we have now is completely broken. Yes we already knew that.)

Exactly! Everything we have can be easily broken, and I've done this during research for BenchMate. For example, all versions of 3DMark and PCMark can be hijacked to run with unnoticed settings like LOD, a different resolution and much more. I dug deep into Intel's XTU as well and have written a full analysis of its security vulnerabilities. For XTU they made some really horrible choices.

On 8/28/2019 at 11:27 PM, Mysticial said:

Thus, it seems that the only source that can be trusted to submit a score to HWBOT is BenchMate itself. But BenchMate doesn't know when the benchmark started or ended unless it tries to parse program output.

For y-cruncher, this will work conceptually since it has a parse-able output that can be trusted (the validation file). But if not done correctly, it will conflict with y-cruncher's own internal protections. My guess is that if you run y-cruncher with BenchMate DLL injection and then change the base clock in a way that skews the timers, BenchMate will report the correct score, but y-cruncher itself will detect a clock skew and block the score.

This is probably fixable if we pull the right strings.

BenchMate uses different low-level timer facilities like HPET, that are accessed via its own kernel driver (which does much more btw). These low-level timers are then either injected or tracked with BenchMate and correctly verified by multiple time sources (not API functions!). In the end BenchMate does not use any timers that are linked to the bclock/ref clock.

How do you detect clock skews in y-cruncher? Which timer functions are you using?

On 8/28/2019 at 11:27 PM, Mysticial said:

So it would need to either be done in a wrapper, or some "official" DLL side-load into the main binaries that is officially supported by the program. Both approaches are messy. In either case, I would need to mock out the relevant timers.

For the wrapper solution, the "trusted" timers are only available in the wrapper (where BenchMate lives). Every important timer call will require a secure RPC over either SHM mapping or TCP. y-cruncher currently has a TCP stack, but it's for a different purpose and it's not secure.

The wrapper is written in Java, so it's easily reversible and debuggable. I just had a quick look and although it's minified, it's still very clear what is happening. Sadly, it would be easy to get the encryption key as well. But not as easy as it was with HWBOT Prime and the x265 wrapper - that was a matter of minutes.

What I'm trying to say is that passing data through the wrapper isn't very secure. It would be best to grab it directly from the inner benchmark executable. This way it doesn't matter what language the wrapper is written in; it's just a nice way to configure y-cruncher and display insecure, but eye-catching results. To do this I suggest some kind of one-way protocol to transfer read-only data to BenchMate. That could be done with accesses to files that don't exist, but which will be caught by my hooks into the process. As for the timing functions, if you don't want to include BenchMate as a dependency to y-cruncher, they need to be "injected" in some way.

Well, maybe DLL injection isn't the worst way to do this and it wouldn't break anything on your side as long as these API calls are happening (any call to QPC for example, and the CreateFile/fopen invocations for beginrun, endrun and result).
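
A minimal sketch of what the benchmark side of such a one-way protocol might look like, written in Python for brevity. Everything here is an assumption made for illustration: the directory name, the path layout and the `signal` helper are hypothetical and not BenchMate's actual interface. The idea is only that the benchmark tries to open a file that does not exist, the open fails harmlessly, and an injected hook on the file-open call could read the event from the requested path.

```python
import os
import tempfile

# Hypothetical location; a real hook would agree on its own naming scheme.
SIGNAL_DIR = os.path.join(tempfile.gettempdir(), "benchmate-signal-sketch")
EVENTS = {"beginrun", "endrun", "result"}


def signal(event, value=""):
    """Try to open a non-existent file whose name encodes the event.

    Returns the path that was requested. The open is read-only, so nothing
    is created on disk; only the attempt itself carries information.
    """
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    name = f"{event}.{value}" if value else event
    path = os.path.join(SIGNAL_DIR, name)
    try:
        with open(path, "rb"):
            pass
    except OSError:
        pass  # expected when no such file exists
    return path
```

Without any hook installed, calls such as `signal("beginrun")` or `signal("result", "12.345")` simply fail to open anything, so the benchmark behaves the same whether or not it is being observed.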

On 8/28/2019 at 11:27 PM, Mysticial said:

Other Thoughts:

From a scalability perspective, it might be worth considering going one step further with BenchMate and put the HWBOT integration into it. Each benchmark then gives a score (and relevant metadata) to BenchMate and it will handle everything from there. This will also allow BenchMate to append any additional metadata that the benchmark doesn't track.

That way everybody doesn't have to reimplement the same thing. Likewise, everybody doesn't have to update it when the HWBOT API/protocols change. And validation bugs and vulnerabilities only need to be fixed in one place. (the flip side being that any vulnerability that is found will likely apply to all benchmarks)

Exactly! The wrapper is too transparent for this kind of task anyway, and I guess it's best if only one of us has to struggle with the peculiarities of the HWBOT submission API.

Edited September 3, 2019 by _mat_

havli · September 3, 2019

@_mat_

At this time I have no plans for future x265 development... other than perhaps re-uploading an updated package with a recent CPU-Z version once in a while, like I did this year after the Ryzen 3000 launch. The main executable is more than 1.5 years old, and considering I see barely any problems or complaints... it means either people don't care or everything works reasonably well.

If you wish to integrate x265 into BenchMate using the DLL injection method, I have no objections. However, doing a new version with native BenchMate integration seems like too much work, especially with the testing and validation (which I always did with each x265 version) on many different platforms, even the obscure ones, and on multiple OSes on each of them. That is a lot of effort and time which I don't want to spend this way.

_mat_ · September 3, 2019

Thanks for taking the time to reply here, havli.

I will look into it. If it can work in a reliable way, I will use the DLL injection method to hijack everything around the x265 executable.

A small question: Is the timing shown in the wrapper taken from the executable or does the wrapper do the timing itself (so the times differ)?

havli · September 3, 2019

You mean the elapsed and remaining time?

Elapsed time is measured directly by the wrapper, using Java's nanoTime() function. And remaining time is a prediction based on the number of frames finished and the actual framerate.

The x265 encoder executable also measures time (internally, but reports it as fps). However this is the "wall clock time"... so less accurate and vulnerable to system time manipulation. In earlier versions of the benchmark, the score was taken directly from the encoder. But after some cheated scores were discovered I changed the time measuring to use nanoTime() and calculate the fps myself.
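
The approach described here can be sketched as follows. The actual wrapper is written in Java and uses nanoTime(); this is a Python analogue using `time.monotonic_ns()`, and the function names and the `encode_frame` callback are hypothetical. A monotonic clock is not affected by changes to the system date and time, which is the manipulation the wall-clock measurement was vulnerable to.

```python
import time


def measure_fps(encode_frame, total_frames):
    """Run the workload and derive fps from a monotonic clock."""
    start_ns = time.monotonic_ns()  # not affected by system clock changes
    for _ in range(total_frames):
        encode_frame()
    elapsed_s = (time.monotonic_ns() - start_ns) / 1e9
    if elapsed_s <= 0:
        raise RuntimeError("run too short for the clock resolution")
    return total_frames / elapsed_s, elapsed_s


def remaining_seconds(frames_done, total_frames, elapsed_s):
    """Predict the remaining time from the framerate observed so far."""
    if frames_done == 0 or elapsed_s <= 0:
        return None  # no framerate to extrapolate from yet
    fps = frames_done / elapsed_s
    return (total_frames - frames_done) / fps
```

For example, with 50 of 100 frames done after 10 seconds, the observed rate is 5 fps and `remaining_seconds(50, 100, 10.0)` predicts another 10 seconds.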

Mysticial · September 3, 2019

Quote

How do you detect clock skews in y-cruncher? Which timer functions are you using?

Now that I think about it, it would only detect it if the clock skew happens mid-bench. It tries to reconcile rdtsc with multiple wall clocks and will flag any benchmark where they get out-of-sync. So if the wall clocks are coming from BenchMate, but rdtsc gets skewed mid-bench, it will block the score. While rdtsc can be skewed, it's not easy to fake unless the system is being virtualized (which it heuristically detects, but the information isn't used atm). While you can play with the rdtsc offsets (which also requires kernel mode), I can't think of a useful manner to exploit it without doing backwards jumps which is easy to detect from the benchmark. (though I currently don't try to detect it) In any case, there are bigger problems if the attacker has kernel-level access.

(Very) old versions of y-cruncher would try to reconcile the measured rdtsc frequency with what the OS measured at boot time. This blocked a number of hacks back in the Win7 days, but it also produced too many false positives. So I disabled that check.
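
A rough sketch of the reconciliation idea, assuming the clocks are sampled in pairs at the same moments during the run. This is not y-cruncher's code: Python cannot read rdtsc directly, so the two columns are just stand-ins for any two independent time sources, and the 1% tolerance is an arbitrary assumption.

```python
import time


def clocks_in_sync(samples, tolerance=0.01):
    """Check that two clocks agree on every interval between samples.

    samples is a list of (clock_a_seconds, clock_b_seconds) pairs taken at
    the same moments. Returns False if either clock stalls or jumps
    backwards, or if the two disagree on an interval by more than the given
    relative tolerance.
    """
    for (a0, b0), (a1, b1) in zip(samples, samples[1:]):
        da, db = a1 - a0, b1 - b0
        if da <= 0 or db <= 0:
            return False  # stalled clock or backwards jump
        if abs(da - db) / max(da, db) > tolerance:
            return False  # the clocks drifted apart on this interval
    return True


def sample_clocks():
    """Take one paired reading from two stand-in clock sources."""
    return (time.perf_counter(), time.monotonic())
```

Sampling several times during the run, rather than only at the start and end, is what allows a skew that happens mid-bench to be noticed.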

Quote

The wrapper is written in Java, so it's easily reversible and debuggable. I just had a quick look and although it's minified, it's still very clear what is happening. Sadly, it would be easy to get the encryption key as well. But not as easy as it was with HWBOT Prime and the x265 wrapper - that was a matter of minutes.

Yep, that wrapper was never meant to be fully secure. It's only in Java because I have no experience with GUIs in C++. The vision I have is a full C++ rewrite that would integrate fully with y-cruncher with Slave Mode and provide a fully interactive GUI. All the HWBOT/validation stuff would be in it thus leaving the raw y-cruncher binaries clean. But I'm never gonna have the time for that.

(continued in a PM as I don't want it public)

Quote

As for the timing functions, if you don't want to include BenchMate as a dependency to y-cruncher, they need to be "injected" in some way.

Is there a way to detect dependency injection and verify that it's trusted?

------

The timing aspect can be made fundamentally secure under the assumption that the benchmark binary hasn't been compromised. Do the timing server-side on HWBOT. Then use public-key encryption to match the start/end messages. Just make sure the benchmark runs long enough that network delay has negligible impact.

You could theoretically write a fundamentally secure crypto benchmark by means of a crypto time capsule. (https://en.wikipedia.org/wiki/LCS35) The server sends the problem parameters. The benchmark is to solve the problem and send it back to the server. The timing is done server side.
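
To illustrate the time capsule idea: LCS35 is built on repeated modular squaring, which has to be done sequentially unless you know the factors of the modulus. The sketch below uses toy-sized primes and hypothetical function names; a real puzzle would use a modulus of thousands of bits and a far larger number of squarings.

```python
def solve_puzzle(a, t, n):
    """Benchmark side: t sequential squarings, i.e. a^(2^t) mod n."""
    x = a % n
    for _ in range(t):
        x = (x * x) % n
    return x


def expected_answer(a, t, p, q):
    """Server side: shortcut using the secret factors p and q of n.

    Requires gcd(a, n) == 1 so that the exponent can be reduced mod phi(n).
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    e = pow(2, t, phi)  # reduce the huge exponent 2^t first
    return pow(a, e, n)


# Toy parameters: the server keeps p and q secret and only publishes n and t.
p, q = 104729, 1299709
n = p * q
t = 1000
assert solve_puzzle(2, t, n) == expected_answer(2, t, p, q)
```

The server would note the time when it sends `n` and `t`, and again when the correct answer arrives, so no timer on the client needs to be trusted.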

Edited September 3, 2019 by Mysticial

_mat_ · September 4, 2019

18 hours ago, havli said:

Elapsed time is measured directly by the wrapper, using Java nanoTime() function. And remaining time is prediction based on the amount of frames finished and actual framerate.

The x265 encoder executable also measures time (internally, but reports it as fps). However this is the "wall clock time"... so less accurate and vulnerable to system time manipulation. In earlier versions of the benchmark, the score was taken directly from the encoder. But after some cheated scores were discovered I changed the time measuring to use nanoTime() and calculate the fps myself.

That could pose a challenge, because I guess your fps calculation and the internal one of the x265 encoder can be different (even without skewing), right?

The BenchMate integration would fix the timing inside the x265 encoder, no doubt about that. So the value in the encoder would be the correct one afterwards. Any chance we can find a solution together to change that behaviour when the integration is active? Details about that might be better via private message.

_mat_ · September 4, 2019

17 hours ago, Mysticial said:

Now that I think about it, it would only detect it if the clock skew happens mid-bench. It tries to reconcile rdtsc with multiple wall clocks and will flag any benchmark where they get out-of-sync. So if the wall clocks are coming from BenchMate, but rdtsc gets skewed mid-bench, it will block the score. While rdtsc can be skewed, it's not easy to fake unless the system is being virtualized (which it heuristically detects, but the information isn't used atm). While you can play with the rdtsc offsets (which also requires kernel mode), I can't think of a useful manner to exploit it without doing backwards jumps which is easy to detect from the benchmark. (though I currently don't try to detect it) In any case, there are bigger problems if the attacker has kernel-level access.

(Very) old versions of y-cruncher would try to reconcile the measured rdtsc frequency with what the OS measured at boot time. This blocked a number of hacks back in the Win7 days, but it also produced too many false positives. So I disabled that check.

I have written a hypervisor during my research that does exactly that: catch RDTSC(P) instructions, count them and skew them as necessary. It is a lot of effort though, and with proper hypervisor detection inside the benchmark/BenchMate it gets much harder of course. That's why I also have hypervisor detection code in there.

There are two main problems with RDTSC as I see it, and that's why I'm not using it. First of all, it skews with the bclock/reference clock on pre-Skylake and AMD CPUs, so it's not reliable at all. Secondly, it's hard to get the correct TSC frequency on AMD. There is no CPUID register that reports the correct frequency as far as I'm aware - at least on Ryzen - (and even on Intel it's not that easy), so you have to get the frequency yourself. I've seen really bad implementations to do that out there and they are far from valid in my opinion. And even if they were correct, you have to assume that reference clock changes will occur and make RDTSC invalid.

I can't recommend comparing RDTSC against other timers on Windows 8/10 either, because nearly every timing method will depend on either RDTSC or the LAPIC timer, and both have the same time source, which will skew with the reference clock. The only exception is HPET, but that needs to be enabled for the whole system, which can have a severe performance impact on CPUs with high core counts and especially on Kaby Lake X and Skylake X due to a very slow HPET implementation. I've written thoroughly about this here if you are interested.

Most of the things I've written above are also valid for Linux and MacOS btw, but it's much harder there to change your bclock/reference clock in the OS and that's why it's not considered a thing there.

17 hours ago, Mysticial said:

Is there a way to detect dependency injection and verify that it's trusted?

There is no authentication currently. As the favored integration I outlined above is a one-way communication from the benchmark to signal certain kinds of actions (beginrun, endrun, result) with information that is publicly shown as well (the result for example), I don't think we need it for now. It's different if we want to transfer something secretly, but that would need a lot more effort. To be honest, I haven't found a way yet that can't be reverse engineered and faked easily. Shared memory file mapping is horrible in that respect, for example.

17 hours ago, Mysticial said:

The timing aspect can be made fundamentally secure under the assumption that the benchmark binary hasn't been compromised. Do the timing server-side on HWBOT. Then use public-key encryption to match the start/end messages. Just make sure the benchmark runs long enough that network delay has negligible impact.

You could theoretically write a fundamentally secure crypto benchmark by means of a crypto time capsule. (https://en.wikipedia.org/wiki/LCS35) The server sends the problem parameters. The benchmark is to solve the problem and send it back to the server. The timing is done server side.

It would be the ultimate solution to move as much code as possible to the server-side, but that would need a system to be always online. That's a compromise that might be a little too much for the time being. I think that BenchMate as it is brings a lot of changes with it and that can be difficult to adapt to for overclockers on here. I guess there will be a time for "always online" at some point in the future, but for now it's important to find some middle-ground to get this new standard for benchmarks going.

Mysticial · September 4, 2019

4 hours ago, _mat_ said:

There is no authentication currently. As the favored integration I outlined above is a one-way communication from the benchmark to signal certain kinds of actions (beginrun, endrun, result) with information that is publicly shown as well (the result for example), I don't think we need it for now. It's different if we want to transfer something secretly, but that would need a lot more effort. To be honest, I haven't found a way yet that can't be reverse engineered and faked easily. Shared memory file mapping is horrible in that respect, for example.

If we go by the underlying assumption that the binaries on both ends are sufficiently difficult to compromise/reverse-engineer, then public key encryption on all traffic going both directions?

You would need to hard code the public keys of each side in the other's binaries as you can't transfer that over the wire.

_mat_ · September 4, 2019

What data would you need to transfer that needs encryption?

Mysticial · September 4, 2019

30 minutes ago, _mat_ said:

What data would you need to transfer that needs encryption?

Oh, I was just answering the part about being able to transfer data between the two components in a way that cannot be read, cannot be altered, and cannot be faked.

But really, all traffic needs to be this way if the two binaries are not sharing the same address space. (and even if it is, it's still vulnerable to memory modification by means of a debugger if you don't obfuscate it)

Even the one-way messages need to be protected as you could otherwise fake them and send an "end" message early to fake a faster score. I'm imagining an attack where the attacker side-loads another DLL into the benchmark binary. This malicious DLL would then spin up a separate thread (to avoid interfering with the benchmark itself) which can then call the "signal end" function in BenchMate directly. That API would need to be documented anyway. So the attacker would know exactly what to call it with. And I imagine it wouldn't be difficult to find the address for the function as you could scan the binary and look for a similar footprint in the loaded application. (alternatively, do a symbol lookup if BenchMate is linked dynamically) If there are issues with calling "signal end" a 2nd time when the benchmark finishes for real, the DLL could probably fairly easily remap the memory of the function and point it somewhere else to trap the "real" call at the end of the benchmark.

Without being a real expert in this area, all of this would be possible without needing to reverse-engineer either the benchmark or BenchMate.
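
As a rough illustration of what "protected" one-way messages could look like, here is a sketch that tags each message with a MAC, a per-run ID and a sequence number so that forged or replayed messages are rejected. It uses HMAC with a shared key as a stand-in for the public-key scheme, because Python's standard library has no asymmetric crypto; the message layout and all names are hypothetical. It only covers messages in transit: it does nothing against code inside the benchmark process that calls the genuine signing function, which is the side-loading attack described above.

```python
import hashlib
import hmac


def sign_message(key, run_id, seq, event, payload=b""):
    """Build a message body and its HMAC-SHA256 tag."""
    body = b"|".join([run_id, str(seq).encode(), event.encode(), payload])
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body, tag


class Receiver:
    """Accepts messages only if authentic, for this run, and in order."""

    def __init__(self, key, run_id):
        self.key = key
        self.run_id = run_id
        self.next_seq = 0

    def accept(self, body, tag):
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False  # forged or altered message
        run_id, seq, _event, _payload = body.split(b"|", 3)
        if run_id != self.run_id or int(seq) != self.next_seq:
            return False  # wrong run, replayed or out-of-order message
        self.next_seq += 1
        return True
```

The run ID must not contain the `|` separator (a hex string works), and the key has to live somewhere in both binaries, which is exactly the part that is hard to keep secret.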

_mat_ · September 4, 2019

That's true, these API calls can be faked. But I can track that the call comes from inside the executable, and BenchMate additionally does various debugging checks and prevents access to the process's memory. And to check that the executable itself has not been changed, a file integrity check is done and the hash is part of the result dialog.
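
A file integrity check of this kind boils down to hashing the executable and comparing the digest against a known-good value. The sketch below assumes SHA-256 and hypothetical function names; the thread does not say which hash BenchMate actually uses.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, expected_hex):
    """Compare the file's digest against a known-good hex digest."""
    return file_sha256(path) == expected_hex.lower()
```

Showing the digest alongside the result, as described above, lets anyone reviewing a submission compare it against the hash of the official release.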

14 minutes ago, Mysticial said:

Oh, I was just answering the part about being able to transfer data between the two components in a way that cannot be read, cannot be altered, and cannot be faked.

An impossible task so to speak. But you are right, it can be hardened to make it very difficult to reverse and fake. From experience I try not to store any kind of key in the executable, it can always be found/reversed.

These are both viable ways. My first proposal will be easier for you, but harder for me to secure. The second way will take some time for us to figure out and is much more intrusive. Don't forget that we can't do that from the Java wrapper, it has to be inside y-cruncher.

cbjaust · September 5, 2019

Can the Geekbench browser distinguish between a vanilla run and a run using BenchMate integration? Also, re: wPrime, the author has released up to v2.10, dated 08-Jul-2013; would you include the newest version? The author of wPrime is also somewhat active on the OCAU forums, where he is user wwwww (not sure if an account is needed to see member profiles, probably).

Keep up the good work.

Edited September 5, 2019 by cbjaust

yosarianilives · September 5, 2019

I'm going to ask the question that I'm sure is on a few people's minds. If BenchMate no longer includes Geekbench, then how will this affect the Team Cup DDR4 stage for Geekbench 4 with BenchMate? @Leeghoofd

Edited September 5, 2019 by yosarianilives

Leeghoofd · September 5, 2019

we run and finalise the compo as is....

keeph8n · September 5, 2019

Afaik, BenchMate can still secure Geekbench, but it just can't be included with the BenchMate package. That's the way I understood it anyway.

_mat_ · September 5, 2019

Sadly he wants the whole integration to be gone. I would be okay with not including it in the Big Bundle and would still support it if bugs were to be found. But he doesn't want it at all, and that's why he threatened with legal action.

As my money is on the line here, I'm going to remove it of course. But I didn't want to do anything rash, so I'm postponing that for a little while and waiting until our community has discussed this properly.

Edited September 5, 2019 by _mat_
