NGADM 2017 R1 Stats Analysis!

Even though at this point, Round 2 of the NGADM has concluded and is currently in its judging phase, I thought it might be cool to write up a newspost doing some analysis of the NGADM scores in Round 1!

------------------------------------------------------------------

First things first: Here's a list of the overall average scores of everyone from Round 1, sorted from highest to lowest. (Bolded and italicized names won their brackets.)

------------------------------------------------------------------

Congrats to Phonometrologist for that crazy crazy high score! Now, let's go deeper into the statistics rabbit hole. Because judging is super subjective, obviously all the judges will have different scales of judging. Perhaps, even though both judges think a song is equally good, SkyeWint's scale might say it's a 7.8, while TaintedLogic might say it's an 8.75. In order to adjust the scores based on these discrepancies, we must first figure out what the judges' scales are, which can be measured through their means (average goodness) and standard deviations (how far they're willing to go from the mean to measure goodness).

Judge | Mean | Standard Deviation

Bosa | 8.083 | 0.917
ChronoNomad | 8.202 | 0.909
EDM364 | 7.843 | 0.992
Samulis | 7.560 | 0.711
SkyeWint | 7.436 | 1.183
TaintedLogic | 8.316 | 0.958

You may wonder why this is significant: it helps us understand the different ways that different judges score songs. For example, Skye's standard deviation was high while Samulis' wasn't, which means that Skye's scores were much more spread out than Samulis'. For Samulis, a good song might be an 8.5 while a bad one is a 6.5, while Skye might see the same good song as an 8.9 and the bad one as a 6.2.
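For anyone who wants to reproduce a judge's scale themselves, here's a minimal sketch of how the mean and standard deviation could be computed. The judge names and scores below are made up for illustration (they are NOT the real NGADM data), and using the population standard deviation (`pstdev`) is an assumption; the sample version (`stdev`) would give slightly larger numbers.

```python
from statistics import mean, pstdev

# Hypothetical scores for illustration only: judge -> {song: score}
scores = {
    "JudgeA": {"song1": 8.5, "song2": 7.5, "song3": 6.5},
    "JudgeB": {"song1": 9.0, "song2": 7.0, "song3": 5.0},
}

def judge_scale(judge_scores):
    """Return (mean, standard deviation) of one judge's scores."""
    values = list(judge_scores.values())
    # Assumption: population standard deviation; swap in stdev() for the sample version.
    return mean(values), pstdev(values)

scales = {judge: judge_scale(s) for judge, s in scores.items()}

# Print in the same "Judge | Mean | Standard Deviation" layout as the table.
for judge, (mu, sigma) in scales.items():
    print(f"{judge} | {mu:.3f} | {sigma:.3f}")
```

In this toy data, JudgeB's scores are twice as spread out as JudgeA's, so JudgeB's standard deviation comes out twice as large.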

However, because the judges' scales differ so much, the most accurate way to compare the scores against each other is to normalize them. This is a process to make the scores more fair. To do this, I first took each score for each judge and calculated how many standard deviations it was from the judge's mean. (For example, if ChronoNomad were to judge a song as a 9.11, that would be 1 standard deviation above the mean, because if you add one of CN's standard deviations to his mean, you'd get 9.11.) Then, I rescaled all of the judges to have a mean of 8 and a standard deviation of 1. This way, it's a more even comparison of opinions, with factors like leniency or strictness mitigated.
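The two steps described above (count standard deviations from the judge's mean, then rescale to a mean of 8 and a standard deviation of 1) can be sketched roughly like this. The function name and parameter names are just illustrative; the ChronoNomad numbers come from the table above.

```python
def normalize(score, judge_mean, judge_sd, target_mean=8.0, target_sd=1.0):
    """Rescale one judge's score onto a shared scale."""
    # Step 1: how many of the judge's standard deviations is this score from their mean?
    z = (score - judge_mean) / judge_sd
    # Step 2: place that same distance on the shared scale (mean 8, standard deviation 1).
    return target_mean + z * target_sd

# ChronoNomad's mean is 8.202 and his standard deviation is 0.909, so a score of
# 9.111 sits one standard deviation above his mean.
cn_example = normalize(9.111, judge_mean=8.202, judge_sd=0.909)
print(round(cn_example, 3))  # 9.0
```

A score exactly at a judge's mean always lands on 8, no matter how strict or lenient that judge is.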

That said, here's what the scores would look like after normalization, as well as an analysis of which ranks changed as a result!

------------------------------------------------------------------

Rank. Score | Artist | Change in position

01. 9.829 | Phonometrologist | -
02. 9.525 | garlagan | -
03. 9.462 | McGorilla42 | +1
04. 9.409 | bassfiddlejones + joshgawaldomusic | -1
05. 9.355 | 1f1n1ty | -
06. 9.286 | etherealwinds | -
07. 9.185 | FinnMK | -
08. 8.913 | midimachine | -
09. 8.846 | johnfn + ethansight | -
10. 8.774 | EvilRaccoon | -
11. 8.742 | LucidShadowDreamer + SnowTeddy | -
12. 8.680 | SoundChris | -
13. 8.639 | Demon-Wolf | -
14. 8.620 | ConnorGrail | +4
15. 8.599 | CloakedSoup | -1
16. 8.561 | Xtrullor + HeliXiX | -
17. 8.533 | JacobCadmus | -2
18. 8.531 | Kor-Rune | +2
19. 8.498 | steelside | -
20. 8.461 | Jabicho + peachymaiden | -3
21. 8.423 | GhostLawyer | +2
22. 8.404 | F-777 | -
23. 8.355 | TolanMusic | -2
24. 8.260 | Papkee | +1
25. 8.252 | MrKoolTrix | -1
26. 8.193 | NahuPyrope | -
27. 8.121 | Noisysundae | +1
28. 8.108 | Adjeye | -1
29. 8.040 | larrynachos | -
30. 7.908 | Lethalix | +2
31. 7.898 | KaixoMusic | -1
32. 7.890 | InYourDreams | -1
33. 7.832 | ThaPredator | -
34. 7.800 | keepwalking | +2
35. 7.796 | JDawg00100 | -
36. 7.778 | Metallica1136 | +1
37. 7.764 | Ceevro | -3
38. 7.733 | Malifex | -
39. 7.694 | AzulJazz | +2
40. 7.671 | AceMantra | -1
41. 7.647 | Spadezer | -1
42. 7.610 | Eviladrianin | -
43. 7.588 | 5TanLey | -
44. 7.537 | Rahmemhotep | +1
45. 7.509 | Techmo-X | -1
46. 7.500 | LunacyEcho | +1
47. 7.487 | TSRBand | +1
48. 7.438 | larrylarrybb | -2
49. 7.368 | Jimmypig | -
50. 7.311 | PomicStone | -
51. 7.288 | ColinMuir | -
52. 7.163 | JeremyKingVA | -
52. 7.163 | DivoFST | -
54. 7.125 | Waidmann | -
55. 6.884 | SourJovis | -
56. 6.862 | RetromanOMG | -
57. 6.708 | Ectisity | +1
58. 6.657 | Veronina | -1
59. 6.584 | DwightFalcon | -
60. 6.250 | JaThu | -
61. 6.101 | DrFunkMonkey | -

------------------------------------------------------------------

Basically, lots of ranks did, in fact, change due to score normalization! Luckily, the matchups were such that there were only two where the outcome changed post-normalization—and funnily enough, they were the two tightest matchups of R1, having to go into the thousandths place to get a winner! In other words, commiserations to AzulJazz and Ectisity, who were beaten by Spadezer and Veronina respectively (but arguably shouldn't have).
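If you'd like to check rank changes and flipped matchups on your own numbers, here's a rough sketch of one way to do it. The artists, averages, and matchups below are invented for illustration (NOT the real NGADM numbers), and the helper names are hypothetical. Ties are not given a shared rank here, unlike the tie at rank 52 in the table above.

```python
# Hypothetical averages for illustration only.
raw = {"A": 8.90, "B": 8.50, "C": 8.412, "D": 8.410}
normalized = {"A": 9.10, "B": 8.70, "C": 8.600, "D": 8.605}
matchups = [("A", "B"), ("C", "D")]

def ranks(averages):
    """Map each artist to a 1-based rank, highest average first."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def rank_changes(raw, normalized):
    """Positive means the artist moved up after normalization."""
    before, after = ranks(raw), ranks(normalized)
    return {name: before[name] - after[name] for name in raw}

def flipped_matchups(matchups, raw, normalized):
    """Return the matchups whose winner differs between the two scorings."""
    def winner(pair, averages):
        return max(pair, key=averages.get)
    return [pair for pair in matchups
            if winner(pair, raw) != winner(pair, normalized)]

print(rank_changes(raw, normalized))
print(flipped_matchups(matchups, raw, normalized))
```

In this toy example only the C vs D matchup flips, which mirrors the situation above: it's the tightest pairings that are most at risk of changing.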

A couple other interesting statistical tidbits:

ConnorGrail was bumped up by a whopping four ranks post-normalization, which means the results would've changed had he faced off against CloakedSoup, Xtrullor + HeliXiX, JacobCadmus, or Jabicho + peachymaiden. Luck of the draw there, Connor.
The person whose scores would've changed the most thanks to normalization was garlagan, whose score would've risen by 0.234 points. After that, we have ConnorGrail (+0.218), Phonometrologist (+0.205), 1f1n1ty (+0.200), and McGorilla42 (+0.200).
Almost everyone's scores rose post-normalization (perhaps I should've chosen a lower normalized mean, lol). Only three people's scores actually went down, and not by much: JaThu (-0.001), Veronina (-0.026), and DwightFalcon (-0.053).
It's fun to see which judges are the best at predicting what the overall results will be. No judge actually accurately predicted the outcome of every single bracket, but our most accurate judge was SkyeWint, who was only off on 4 brackets. Bosa, on the opposite end, was off on 10 brackets, which is still a thankfully low number.
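The "how many brackets was each judge off on" count from the last tidbit could be automated along these lines. This is only a sketch: the judges, artists, and scores are made up, it assumes the overall winner is decided by the plain average of all judges' scores, and it doesn't treat ties specially.

```python
from statistics import mean

# Hypothetical data for illustration only: judge -> {artist: score}
scores = {
    "JudgeA": {"P": 9.0, "Q": 8.0, "R": 7.0, "S": 7.5},
    "JudgeB": {"P": 8.0, "Q": 8.5, "R": 6.0, "S": 8.0},
    "JudgeC": {"P": 9.5, "Q": 8.0, "R": 7.0, "S": 7.5},
}
matchups = [("P", "Q"), ("R", "S")]

def overall_average(artist):
    """Average of every judge's score for one artist."""
    return mean(judge_scores[artist] for judge_scores in scores.values())

def count_misses(judge):
    """Number of matchups where this judge's pick differs from the overall winner."""
    misses = 0
    for pair in matchups:
        overall_winner = max(pair, key=overall_average)
        judge_pick = max(pair, key=scores[judge].get)
        if judge_pick != overall_winner:
            misses += 1
    return misses

misses = {judge: count_misses(judge) for judge in scores}
print(misses)
```

Here JudgeB prefers Q over P while the overall average favors P, so JudgeB is off on one bracket and the other two judges are off on none.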

And that's my statistical analysis for the NGADM 2017 Round 1 scores! If you want a copy of the spreadsheet I used to play around with, I've uploaded it here.

If you've made it to the end of this insanely long newspost, thanks so much for reading! All my love to @ChronoNomad et al. for hosting this wonderful tournament, and I'm so excited to follow the NGADM for the next few months and experience all the amazing music I'm sure will be created by its excellent musicians.

Comments

McGorilla42 2017-08-02 05:25:29

Wow. That was really interesting! Thanks for all the work you put into it! Will you do it again for the rest of the rounds?

LunacyEcho 2017-08-02 05:25:29

Thanks, McGorilla! I'm planning on it, as long as I can find the time. :P (Although once it gets down to the semifinals and finals, there are so few data points that I may have to get creative...heh)

gadogry 2017-08-02 05:36:56

I totally approve this XD

It's actually pretty rare to have the winner change in TWO pairings after normalizing. IIRC I've only seen one pairing flip in my years of doing the stats.

Thanks for doing the hard (and hopefully fun!) work of putting this together!

LunacyEcho 2017-08-02 05:36:56

Haha I've learned a lot more about statistics in two years so now I actually know what I'm talking about! XD

Yeah I was surprised to see two winners—but then again, not *that* surprised considering the margins between the two pairs originally were 0.002 and 0.004 points (this is not a typo).

Oh it's totally fun! And thank you for paving the way for this post to exist! XD

FinnMK 2017-08-02 08:42:14

Neeeeeeeeeeeeerrrrrrrd.

Just kidding, this is awesome. I love statistics.

LunacyEcho 2017-08-02 08:42:14

my laveneeeeeeeeeeeerrrrrrrrrrrrrrd air balloon will come lift you off your feet

garlagan 2017-08-02 09:21:14

there's only hentai in that .zip; where's the spreadsheet

LunacyEcho 2017-08-02 09:21:14

ohhh wait the OTHER "NGADM '17 R1 Stats" zip oops

ADR3-N 2017-08-02 09:41:16

Neat. Would be interesting to see full stats on the pairings "predicted", but overall, good number fun.

LunacyEcho 2017-08-02 09:41:16

Ahhh I would've included them in the spreadsheet except I did them by hand! :P (Mostly because I couldn't think of an Excel-friendly way to calculate them haha)

IIRC though, the most controversial pairing was SoundChris vs EvilRaccoon, where 4/6 judges slightly preferred SoundChris' piece but the other two loved the raccoon and hated the Chris so much that they tipped the scale in EvilRaccoon's favor! Weeeird.

TaintedLogic 2017-08-02 12:02:02

Really cool work here, LunacyEcho! While I'm not convinced that normalization is a completely fair way of calculating scores, it's totally worth consideration as a fun statistical analysis. :) Hope to see more of these in future rounds!

LunacyEcho 2017-08-02 12:02:02

Thanks, TL! :D Tbh I would still support normalization for two main reasons—accuracy and fairness.

Accuracy | One judge's 8.0 may not be another's 8.0. What each judge is doing is comparing every song against each other, and the difference between judges' likings of different songs is not equal. Without a unified scale, the process of averaging un-normalized scores is ultimately an inaccurate representation of what they actually think.

Fairness | Each competitor should be given an equal shot, and similarly, each judge should be given the same weight. However, without normalization, the judges can actually have different weights on the scores, making some judges more *valuable* (for lack of a better word) than others in terms of deciding the overall scores. This upsets the inherent idea that of the six judges, each should have a fair and equal say in the scoring process.

However, you're totally right that normalization does have its drawbacks! imo one of the biggest reasons NOT to use normalization is that there's actually a score cap of 10. Because judges are only human, and because there will always be a better song (no matter *how* good a song is), the scores get more iffy the closer they get to a perfect 10. This means that the judges' scoring scales may not be linear, which is an assumption necessary for normalization to be accurate.

...wow, I ended up typing a lot more here than I thought I would. In the end, it's really up to the organizers. But no matter what the scoring process is, that doesn't change the fact that there is so muCH GOOD MUSIC PRODUCED in this competition and in the end, that's what really matters! :D :D

ChronoNomad 2017-08-02 13:45:19

You, sir, have done your homework! And that's putting it lightly...

I quite enjoy delving into all these statistics, so thank YOU for taking so much time to compile all of this data! I'm just stopping by for a moment before continuing on my merry track-scoring way, but I appreciate the heck out of all the effort you put into this.

I hereby dub thee Protégé of Gadogry!

LunacyEcho 2017-08-02 13:45:19

Haha thanks! Good luck scoring, and it's an honor to be dubbed thusly! :)

TaintedLogic 2017-08-02 22:25:53

Hmm...while I agree that normalization maximizes "fairness," I think that fitting the judges' scores into a fixed distribution is problematic because it distorts the extent to which they believe the participants are similar in skill level. Based on the non-normalized figures, SkyeWint may have more "weight" than Samulis, but that weight is a product of the judges' assessment of the similarity of everyone's ability, reflected in the standard deviation. Never mind the fact that each judge uses different scoring standards that are not easily comparable.

Not that our current system is perfect, but I question the value of "equal weight" in judging.

Sorry if I'm being too argumentative, though. I do appreciate all the work you've put into this! :)