HWBOT Community Forums

[RANT] How to make sure your awards/reviews are totally meaningless


Massman


  • 1 year later...
Necro-bump here... Hardware Heaven is complaining about other websites' reviews:

 

http://www.hardwareheaven.com/reviews/1699/pg1/why-many-gpu-reviews-are-not-fit-for-purpose-article.html

 

 

Our article aside, do you agree that sites which recycle results from past reviews, across various driver versions and systems, and use timedemos give an accurate picture of performance?

 

Regarding the two sites singled out here, HH and KG: the editor at KG was in fact the previous editor at DriverHeaven, which we rebranded as HardwareHeaven (we considered AwardHeaven, as suggested here, but it simply wasn't catchy enough), and he was responsible for creating our award model and presumably KG's. So it was inherited by us when he was removed from his position and we re-launched. We have made tweaks to the scoring system and have looked several times at removing the scores altogether, but there are people out there who do like to see overall scores. So a full overhaul is still in the works.

 

Any feedback is appreciated. One change we did make, which was highlighted above, is that we no longer give out more than one award. Originally, a 10/10 for value/performance/overall would each equal an award; now we choose the single most appropriate award, which is why you see one motherboard receive gold and performance awards, and another reviewed later receiving only gold for the same scores.

 

It's not a perfect system by far, and we're always looking for ways to improve it.


Stop giving out awards which promote the products and make them sound super special, when in fact, if you look at all the reviews done at HH, everyone gets an award just for participating. Also stop assigning arbitrary numbers in the range of 8-10 out of 10 for every product you review. If everything gets a score of 8-10, then your ratings effectively range from 1 to 3; they are just inflated. Yes, users like to see numbers that summarize the result, but your rating is a disservice because it artificially inflates the perception of every product. If you insist on providing numbers in the review, do it in the manner of JonnyGuru. It isn't perfect, but in my opinion it's the best-executed approach to number-based product ratings out there; they at least have a structure to how the number is developed.
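The 8-10 vs 1-3 point is simple arithmetic; a toy sketch (the scores here are hypothetical, not taken from any actual HH review):

```python
def deflate(score):
    """Map a rating from the inflated 8-10 band back onto a 1-3 scale."""
    return score - 7

# A site whose scores all land between 8 and 10 is really
# using a 3-point scale:
print([deflate(s) for s in (8, 9, 10)])
```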

 

I get the sense from comments I've seen from you and others there that you genuinely have the best intentions, but these two items mentioned above, as well as the especially aggressive advertising on your site, can yield an impression that you are pandering to the manufacturers.

 

Also, this latest article gets somewhat laughable in the part where it addresses what users can expect in real-world conditions. If you want to represent real-world conditions, you should be running a mainstream platform with a mainstream CPU. That means Z77 and a 3570K/3770K, not X79 with a 5GHz IB-E. I understand the point you offered about X79 removing bottlenecks, but a 5GHz IB-E isn't real-world for most of your audience, and it could effectively lead to some numbers being exaggerated compared to what users can actually expect. Potential bottlenecking can only be shown to be relevant with multi-GPU setups.

 

These are just some basic suggestions, offered without any filter, and to be fair to you I'm trying not to pass judgement. I hope you don't take offense. But as long as you continue doing what you are doing, it puts you on especially bad footing to write a piece like this most recent article; from the outside it looks ironic, considering how you do ratings and awards.

Edited by I.M.O.G.

Stop giving out awards which promote the products and make them sound super special, when in fact, if you look at all the reviews done at HH, everyone gets an award just for participating. Also stop assigning arbitrary numbers in the range of 8-10 out of 10 for every product you review. If everything gets a score of 8-10, then your ratings effectively range from 1 to 3; they are just inflated. Yes, users like to see numbers that summarize the result, but your rating is a disservice because it artificially inflates the perception of every product. If you insist on providing numbers in the review, do it in the manner of JonnyGuru. It isn't perfect, but in my opinion it's the best-executed approach to number-based product ratings out there; they at least have a structure to how the number is developed.

 

What areas/weightings would you assign to GPU reviews if using a method like JonnyGuru's? Would you keep the same Performance (40%), Value (20%), Functionality (20%) and Build Quality (20%)?

 

I get the sense from comments I've seen from you and others there that you genuinely have the best intentions, but these two items mentioned above, as well as the especially aggressive advertising on your site, can yield an impression that you are pandering to the manufacturers.

 

Which aggressive adverts? For comparison, Anand today has a site skin (Misco), a banner (OCZ, and that for a site known for its SSD reviews), 2x squares, and Microsoft and Supermicro ads for me. That's all above the fold.

 

Adverts are a necessary evil. We try to keep relevant adverts on the site, and sometimes there are clashes; for instance, it's no coincidence that AMD will launch a campaign on the day of an NV launch and vice-versa. Similarly, a company will want a campaign to run alongside their product launch. As for placement, our current layout isn't very forgiving and we are actively working to improve this.

 

Thank you for the feedback; we genuinely want to hear it. Today is the first time I've discovered this topic, and I wish I'd seen it two years ago. So I guess that's something I can be thankful to bassman for ;)


stop Giving Out Awards Which Promote The Products And Make Them Sound Super Special, When In Fact If You Look At All Reviews Done At Hh Everyone Gets A Reward/award Just For Participating. Also Stop Assigning Arbitrary Numbers, In The Range Of "8-10 Out Of 10" For Every Product You Review. If Everything Gets A Score Of 8-10, Then Your Ratings Should Range From 1 To 3 Instead Of Being Inflated. Yes, Users Like To See Numbers That Summarize The Result, But Your Rating Is A Disservice Because It Is Artificially Inflating The Perception Of Every Product. If You Insist On Providing Numbers On The Review, Do So In A Manner Such As That Done By Jonnyguru - It Isn't Perfect, But Its The Most Well Executed Approach Out There For Number Based Product Ratings In My Opinion. They At Least Have A Structure To How The Number Is Developed.

 

I Get The Sense From Comments I've Seen From You And Others There That You Genuinely Have The Best Intentions, But These Two Items Mentioned Above, As Well As The Especially Aggressive Advertising On Your Site, Can Yield An Impression That You Are Pandering To The Manufacturers.

 

Also, This Latest Article Gets Somewhat Laughable In The Part Where It Addresses What Users Can Expect In Real World Conditions. If You Want To Represent Real World Conditions, You Should Be Running A Main Stream Platform With A Main Stream Cpu. That Means Z77 And 3570/3770k, Not X79 With A 5ghz Ib-e. I Understand The Point You Offered About X79 Removing Bottlenecks, But A 5ghz Ib-e Isn't Real World For Most Your Audience, And Effectively Could Lead To Some Numbers Being Exaggerated Compared To What Users Can Actually Expect. Potential Bottlenecking Can Only Be Shown To Be Relevant With Multi Gpu Setups.

 

These Are Just Some Basic Suggestions, Without Any Filter, And I'm Trying Not To Pass Judgement To Be Fair To You. I Hope You Don't Take Offense. But As Long As You Continue Doing What You Are Doing, It Puts You On Especially Bad Footing To Write A Piece Like This Most Recent Article. It Looks Bad As Its Ironic Looking From The Outside, Considering How You Do Ratings And Awards.

 

Ib-e? ;)


What areas/weightings would you assign to GPU reviews if using a method like JonnyGuru's? Would you keep the same Performance (40%), Value (20%), Functionality (20%) and Build Quality (20%)?

 

I wouldn't use that method, so I can't say; that would be for someone to decide who thinks assigning arbitrary points is a good idea in the first place. If one were actually going to fix it, they might consider polling the audience to see where its values lie, and then weight scores accordingly. It's still pretty difficult, however, because video card and CPU scores are typically based on abstracted final performance, whereas PSU quality is determined by electrical characteristics that can be measured discretely, and component quality can be evaluated against reputation. Scoring by points is just a bad fit for CPUs/GPUs: too arbitrary, even if you score based on the priorities of your audience, which will likely change over time.

 

Which aggressive adverts? For comparison, Anand today has a site skin (Misco), a banner (OCZ, and that for a site known for its SSD reviews), 2x squares, and Microsoft and Supermicro ads for me. That's all above the fold.

 

HH looks far more aggressive to me. Your homepage has a large ad block in the content area almost the same size as the featured-content block, with bigger text than the featured content. With that, the leaderboard, the skyscraper, your own driver cleaner below (also in the content area), and the background takeover... combined, it is overwhelming. Part of what I see as a problem is that the styling of your site blends with the styling of the advertising: you can hardly tell where one stops and the other starts.

 

From beginning to end, separation of promotional content from editorial content is a problem; never the twain should meet. But with the aggressive advertising, the inflated product ratings, and the awards thrown around like crazy... it looks systematic more so than incidental.

 

My tastes aren't everyone's, however. But I know it's harder to get outside perspectives and input than it is to get input from your fans. Just my .02. ;)

 

 

Ib-e? ;)

 

Good catch. ;)


Hey IMOG! Where is the author of that thread now?

Massman should argue directly against Craig, even if I can admit his first posts say it all...

 

I'm reading the thread, but there's no need for me to discuss or argue. Everything I wanted to say/show is in the opening post. The following quote made me smile, though.

 

The editor at KG was in fact the previous editor at DriverHeaven, which we rebranded as HardwareHeaven (we considered AwardHeaven, as suggested here, but it simply wasn't catchy enough), and he was responsible for creating our award model and presumably KG's.

Even benchmark versions, BIOS versions, OS builds: everything has an impact. Drivers are a huge thing for VGA reviews, so you use the same driver for all reviews until you need a new one, and if you need a new driver then you re-bench the whole thing. You guys make so much from site advertising that you'd better have nothing better to do all day than re-bench GPUs. If you don't, then you are a pitiful disgrace. All review sites should be held to that standard; it isn't something special, it's something you should do by default. It should have been thought out before you accepted any sample.


Even benchmark versions, BIOS versions, OS builds: everything has an impact. Drivers are a huge thing for VGA reviews, so you use the same driver for all reviews until you need a new one, and if you need a new driver then you re-bench the whole thing. You guys make so much from site advertising that you'd better have nothing better to do all day than re-bench GPUs. If you don't, then you are a pitiful disgrace. All review sites should be held to that standard; it isn't something special, it's something you should do by default. It should have been thought out before you accepted any sample.

 

Good words there, Steve. I would also add that it depends on memory settings (I know for a fact one reviewer does not enable XMP, as they have used motherboards where XMP doesn't work, yet lists some ultra-high-end kit in the test bed), what software you have installed (via Steam, Origin or disc) and what motherboard software is also present.

 

For reference, I've been solely on Catalyst 12.3 (March 2012) and NVIDIA 296.10 drivers for the GPU portion of mobo reviews since the Ivy launch. I am upgrading now to 13.1 and 310.90, but I am going back through the last 4/5 Intel and AMD platforms in 1/2/3/4-GPU configurations to generate a f*ckton of back data for our readers. I'm currently three weeks in, alongside doing other things: many CPUs tested, many platforms tested, over 500 data points.

 

There's also statistical variation; I would hazard a bet that some reviewers only run the scene once. You really need to run a scene multiple times (at least four), then take an average, or remove the top and bottom scores and then average. If we were doing this with more professionalism, we would also report the statistical variance or standard deviation. A histogram would be even better.
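The run-several-times, trim-the-outliers approach above can be sketched in a few lines of Python (the FPS numbers are made up for illustration):

```python
import statistics

def summarize_runs(fps_runs, trim=True):
    """Summarize repeated benchmark runs of the same scene.

    With trim=True, the highest and lowest results are dropped before
    averaging (needs at least 4 runs, as suggested above).
    """
    runs = sorted(fps_runs)
    if trim and len(runs) >= 4:
        runs = runs[1:-1]  # drop top and bottom outliers
    return {
        "mean": statistics.mean(runs),
        "stdev": statistics.stdev(runs),
    }

# Example: four runs of the same timedemo, one of them an outlier
print(summarize_runs([61.2, 59.8, 60.5, 74.9]))
```

Trimming matters here: a naive average of those four runs would be dragged up by the one anomalous 74.9 FPS result.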

 

Having benchmarks with a timedemo is a godsend: it means I don't have to sit over my test bed 24/7 and can continue doing other reviews. (With every manufacturer wanting ten reviews done this week, maintaining high throughput while still covering 99% of the bases is important.) Personally, I find FRAPSing a portion of the game, even if it is the same portion, frustrating and not representative of final performance. If you're recording a regular dull portion of the game repeatedly, that's not fair on readers; if it's a really active game, then you're screwed because you can't keep things consistent. Minimum frame rates can vary wildly depending on the scene as well, or when XYZ software in the background decides to update itself or probe network activity. Yes, it's true that some timedemos can also fail to be fully representative, but the user can run a public timedemo themselves if they want to see where they stand.

 

There is one issue with timedemos I will touch upon. The Metro 2033 timedemo, for example, runs better the first time you run it: in about 50% of first runs, the result can be up to 3% better than normal. For this benchmark I typically take the average FPS of four runs, so because of this issue I do two sets of four and only take the second set (or a third set if there's a large variation in results).
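The discard-the-first-set protocol described above could be sketched like this, with `run_timedemo` standing in for whatever actually launches the timedemo and reports its average FPS (a hypothetical callable, not a real API):

```python
import statistics

def benchmark_average(run_timedemo, runs_per_set=4, sets=2):
    """Run several sets of a timedemo and average only the last set,
    discarding earlier sets as warm-up (the first run can come out
    measurably faster, as described above for Metro 2033).

    `run_timedemo` is any callable returning average FPS for one run.
    """
    last_set = []
    for _ in range(sets):
        # each full set overwrites the previous; only the final
        # set of runs contributes to the reported average
        last_set = [run_timedemo() for _ in range(runs_per_set)]
    return statistics.mean(last_set)
```

With `sets=2` this reproduces the "two sets of four, keep the second" routine; bumping `sets` to 3 mirrors the fallback when results vary too much.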

Edited by borandi

Yeah, Ian is right: running benchmarks really is tougher than it seems, especially GPU ones. What I hate most is FRAPS; it is a real pain trying to get consistency. Games like BFBC2, where you have to wait to replay a level over and over, are especially frustrating, and then you hit points in the game where you aren't really playing SP and it's a movie with insane FPS. I wish all games had built-in benchmarks...

 

What I wish for one day is to be able to write a script to run benchmarks for me; it is such a pain. IMO, the benchmarks take me the longest to do in a mobo review. They are sometimes also the least meaningful thing, because a simple BIOS update can toss everything out the window, especially when you are reviewing the board in its infancy. I have three test systems set up, and the only reason I need three instead of one is the time it takes to run benchmarks. It's only useful for benchers, and for flagging issues with the board's performance.

 

Most sites have a set of standards they have their reviewers use. I have seen them, and many are good, but whether a reviewer actually runs a benchmark more than once, or updates the versions of their benchmarks and then re-benches, is entirely up to the reviewer. So the best way to tell what is what is to compare many sites, especially for GPUs and CPUs, where benchmarks are everything.



My issue with HH is especially with the CPU reviews: the benchmark suite is adapted to make the sample shine. It must be one of the only websites that gave the FX-8150/8350 a silver/gold award. By all means, they are not bad CPUs, but there's far better stuff out there. At first glance it looks like HH writes what the PR boys want. Denying at that moment that you guys followed the reviewer's guide was the silliest thing to do, as I had it lying right in front of me.

 

For honesty's sake, I want you guys to do a Corsair H90 review; so far I know of only two websites that got the conclusion nailed correctly.

 

Secondly, I want to test that 2666C10 kit that runs 3000MHz 11-12-12-2T at 1.75 volts. Any sign of stability, or just a suicide screenshot? There is never any proof of stability... either ditch it or do it the right way.

 

Similar case with the Kingston HyperX Beast 8GB 2400C11 DIMMs doing 2800MHz? Darn, I got the bad samples again...

 

Oh, by the way, your forum admins are magnificent; kudos to them. Any decent input or criticism on the articles gets rapidly censored/hammered...

 

 

I must admit the site looks very nice, but the coverage always puts the tested product in too sunny a light. Looking at the HH rating system, there is close to no bad product out there...

 

Now bashing other sites' working methods and claiming you guys are doing it all right just put a big smile on my face! HH is a big marketing machine, and if it works for you guys, great. But please don't tell other people they are doing it wrong...

Edited by Leeghoofd

Most gaming benchmarks are worth nothing once you run the multiplayer part. While in SP your CPU might not have mattered much, in MP it's all that matters. Not to mention some games have separate engines for their SP and MP modes...

 

The issue then becomes how to consistently run MP, as the scenes always change and then you have to take the internet connection into consideration.


The issue then becomes how to consistently run MP, as the scenes always change and then you have to take the internet connection into consideration.

 

Of course, I'm not suggesting that someone should do that.

I'm just pointing out that people should not read too much into the numbers presented in GPU benchmarks, as they will usually be far from the truth.

 

Honestly, who would have finished Metro 2033 if we were to judge the expected performance by the numbers presented in its benchmark? :P

 

Anyway, back on the topic: awards are bad, even if they were awarded by a perfect algorithm.


  • 2 months later...

So I got this news in from TweakTown about their latest memory review. Turns out all you have to do to be considered a "must buy" is... make a product. Any product will suffice. So yeah, another site I will no longer read ;)

 

- 90% - MH - Corsair Vengeance Pro PC3-14900 16GB

- 95% - MH - Kingston HyperX Limited Edition PC3-19200 16GB

- 95% - MH - ADATA XPG Gaming Series V2.0 PC3-19200 16GB

- 95% - MH - Kingston HyperX Beast PC3-19200 16GB

- 98% - MH - GeIL EVO Leggera PC3-14900 16GB

- 95% - MH - GeIL EVO VELOCE PC3-17000 16GB

- 99% - MH - Corsair Dominator Platinum PC3-22400 16GB

- 98% - MH - Patriot Viper 3 PC3-15000 8GB

- 98% - MH - GSkill TridentX PC3-20800 8GB

- 96% - MH - Kingston HyperX Predator PC3-19200 8GB

- 85% - Kingston HyperX T1 PC3-22400 4GB (this kit must be really bunnyextraction ...)

- 99% - MH - Corsair Dominator Platinum PC3-17066 16GB

- 97% - MH - ADATA XPG Xtreme Series PC3-17000 8GB

- 98% - MH - Corsair Dominator Platinum PC3-21300

- 98% - MH - Patriot Viper 3 Intel Extreme Masters Memory Limited Edition PC3-15000 16GB

- ...

 

And so on. I could go down the review list even further, but the trend is very clear. It seems these awards are, again, super meaningless. But wait! They have a section called "What do TweakTown Awards and Ratings mean?". I am curious!

 

[tweakTown.jpg: screenshot of TweakTown's "What do TweakTown Awards and Ratings mean?" section]

 

Yeah ...
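For what it's worth, the scores quoted above are easy to tally:

```python
import statistics

# Scores exactly as listed in the post above (the lone 85% kit included)
scores = [90, 95, 95, 95, 98, 95, 99, 98, 98, 96, 85, 99, 97, 98, 98]

# Everything sits in a narrow band near the top of the scale
print(min(scores), max(scores), round(statistics.mean(scores), 1))
```

Fifteen reviews, all between 85 and 99, averaging roughly 96 out of 100: the "ratings" carry almost no information.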

Edited by Massman
