Entry #5

NGADM '15 Round 1 "Analysis"

2015-08-27 06:30:44 by LunacyEcho
Updated

So the results for the first round of the Newgrounds Audio Death Match have come out! There are lots of numbers, and I thought it might be fun to look through them and see if anything interesting came up. Disclaimer: I am by no means an expert mathematician, and I might be completely wrong in my understanding of some of these concepts! But hopefully it'll be able to satisfy anyone who's curious to see the scores in a different light. :P

Round 1 Entries by Score

??) - N/A - frootza

01) - 9.38 - bassfiddlejones + joshgawaldomusic
02) - 9.25 - dem0lecule + LucidShadowDreamer
03) - 9.22 - SoundChris + JacobCadmus
04) - 9.10 - PirateCrab
05) - 8.96 - midimachine

06) - 8.87 - LunacyEcho
07) - 8.82 - Azhthar
08) - 8.81 - DSykMusic
09) - 8.75 - SkyeWint
10) - 8.73 - papkee
11) - 8.68 - aliaspharow
12) - 8.62 - IglicaV

13) - 8.59 - Metallica1136
14) - 8.53 - EliteFerrex
15) - 8.52 - Jabicho
16) - 8.46 - NyxTheShield
17) - 8.43 - BlazingDragon
17) - 8.43 - LiquidOoze
17) - 8.43 - lordedri
17) - 8.43 - SolarexMusic
21) - 8.42 - Lich
22) - 8.41 - LunyAlex
23) - 8.40 - Tyven
24) - 8.34 - JDawg00100
25) - 8.20 - Stadler
26) - 8.18 - Mawnz
27) - 8.16 - Analogstik
28) - 8.15 - Phonometrologist
28) - 8.15 - Yahtzei
30) - 8.14 - ChronoNomad
30) - 8.14 - Jimmypig
32) - 8.03 - AeronMusic
33) - 8.02 - Camoshark
33) - 8.02 - Ceevro
35) - 7.97 - nubbinownz
36) - 7.95 - InYourDreams
37) - 7.93 - TheMoebiusProject
38) - 7.89 - thebitterroost
39) - 7.87 - Sequenced
40) - 7.84 - Braiton
41) - 7.80 - Jacob
41) - 7.80 - Marterro
43) - 7.77 - DivoFST
44) - 7.75 - larrynachos
45) - 7.74 - adieuwinter
45) - 7.74 - JustinOfSuburbia
47) - 7.73 - Voltus
48) - 7.50 - Chemiqals
49) - 7.38 - DjAbbic
50) - 7.32 - SourJovis
51) - 7.31 - Nimble
52) - 7.18 - Birdinator99
52) - 7.18 - Crueldeity
54) - 6.85 - SilverPoyozo
55) - 6.67 - Pandasticality
56) - 6.65 - Athanatos

The score breakdowns are actually quite interesting! The scores in general seem to be quite high, with a median score of 8.14. There were also lots of ties this year, complete with a 4-way tie for 17th place! The groupings this year were also full of surprises. We had Group C, where no one submitted except frootza, which means he didn't have to be scored, since he moved on directly! We also had the insane Group P, with the dream team of dem0 and LSD taking home a victory over three other high-scorers. All four scores in Group P were in the top 17, and every member had a number of groups in which they would've advanced, had they been seeded into them!

Speaking of groups, here are the group averages and medians, so you can see how competitive the groups were!

Round 1 Groups by Average Score

Rank - Average - Median - Group - Winner

??) - N/A - N/A - Group C - frootza

01) - 8.81 - 8.77 - Group P - dem0lecule + LucidShadowDreamer
02) - 8.65 - 8.41 - Group L - bassfiddlejones + joshgawaldomusic
03) - 8.59 - 8.70 - Group D - SoundChris + JacobCadmus
04) - 8.53 - 8.43 - Group E - SkyeWint
05) - 8.36 - 8.45 - Group A - PirateCrab
06) - 8.29 - 8.18 - Group N - EliteFerrex
07) - 8.25 - 8.32 - Group K - IglicaV
08) - 8.24 - 8.09 - Group J - Azhthar
09) - 8.05 - 7.98 - Group I - NyxTheShield
10) - 8.02 - 7.97 - Group G - midimachine
11) - 7.95 - 8.00 - Group H - Lich
12) - 7.79 - 8.05 - Group O - JDawg00100
13) - 7.64 - 7.85 - Group F - thebitterroost
14) - 7.58 - 7.67 - Group M - ChronoNomad
15) - 7.43 - 7.20 - Group B - aliaspharow

So not only did the three team entries get the three highest scores, they topped the three most competitive group brackets as well! There are also some interesting relationships when you factor in the medians, because using them, the top three most competitive groups become P, D, and A. The groups with the largest difference between the average and median are groups O, L, and B, with differences of 0.26, 0.24, and 0.23 respectively.
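
For anyone who wants to double-check those gaps, here's a minimal Python sketch that recomputes them from the rounded averages and medians in the table above. The variable names are my own, and since it works from the rounded figures, it can only be as precise as the table is.

```python
# (average, median) for each group, copied from the rounded table above.
groups = {
    "P": (8.81, 8.77), "L": (8.65, 8.41), "D": (8.59, 8.70),
    "E": (8.53, 8.43), "A": (8.36, 8.45), "N": (8.29, 8.18),
    "K": (8.25, 8.32), "J": (8.24, 8.09), "I": (8.05, 7.98),
    "G": (8.02, 7.97), "H": (7.95, 8.00), "O": (7.79, 8.05),
    "F": (7.64, 7.85), "M": (7.58, 7.67), "B": (7.43, 7.20),
}

# Absolute gap between average and median, rounded to two decimals
# to hide floating-point noise.
gaps = {name: round(abs(avg - med), 2) for name, (avg, med) in groups.items()}

# Three groups with the largest average-vs-median gap.
largest_gaps = sorted(gaps.items(), key=lambda item: item[1], reverse=True)[:3]

# Three most competitive groups when ranked by median instead of average.
top_by_median = sorted(groups, key=lambda name: groups[name][1], reverse=True)[:3]

print(largest_gaps)
print(top_by_median)
```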

It's not just the contestants that make up the NGADM - it's the judges too. An NG user named gadogry used to do analyses of previous NGADMs where he would calculate the average score and standard deviation of each judge. Personally, I just learned what standard deviation was a few hours ago at the time of this writing, so I'm going to try something similar, and please correct me if you see any mistakes!

Round 1 Judges (Average Score/Standard Deviation)

StepW - 8.36 / 0.764
MetalRenard - 7.97 / 0.762
stunkel - 8.23 / 0.630
TaintedLogic - 7.99 / 1.107
Neon-Bard - 8.23 / 0.629
Geoplex - 8.14 / 0.826

Thanks to gadogry for clearing up what standard deviations are! Visit the comments section to learn more about this statistical stuff from gadogry himself or if you want to share your own thoughts on the matter! (And especially visit if you see any errors!) If you want to see the actual numbers I used to calculate this stuff (which is now outdated because of a few mistakes I made), you can download my spreadsheet here

Anyways, I hope I've been able to satisfy some curious minds about the first round of this year's NGADM! I'm looking forward to rounds to come, and will be following it closely. Thanks for reading, and I hope you enjoyed it!


Comments



Oakwood

2015-08-27 07:06:27

A shared 17th spot, I can live with that. Cool thing that my group was actually the highest scoring group!
Nice stats, they really show a different side of the scores.

LunacyEcho responds:

Hey man! Loved your pop loop :P Great job, too! May the RNG gods shine upon you next year!


LucidShadowDreamer

2015-08-27 08:54:37

Dude. This is like the most awesome newspost ever! o.O
Great job :D

LunacyEcho responds:

Thanks! Great job to YOU, dude. Good luck in the NGADM! :D


bassfiddlejones

2015-08-27 09:01:50

Nice man! Thanks for doing this! I always love seeing how the numbers stack up.

LunacyEcho responds:

Whoo! Congrats on a highly deserved first place, dude!


gadogry

2015-08-27 10:17:41

Great job with the stats! While I'm not participating in the contest, I'm a fan of many of the contestants, and thus it's a lot of fun seeing these numbers. Thank you so much for doing this :)

I took a glance at the spreadsheet -- everything looked good, except the standard deviations. They are actually very easy to do: for example, if the formula for the average of StepW's score is "=AVERAGE(B2:B57)", then the standard deviation of StepW's set of scores is "=STDEV(B2:B57)". This number indicates how "spread out" StepW's scores were.

Using this formula, the judges' SD are as follows:

StepW: 0.764
MetalRenard: 0.762
stunkel: 0.630
TaintedLogic: 1.107
Neon-Bard: 0.629
Geoplex: 0.826

TaintedLogic had the highest SD by a considerable margin. This means he/she used a wider spectrum of the scoring scale, and thus had a bigger pull on the overall results compared to other judges.
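
For readers working outside a spreadsheet, a rough Python equivalent of the =AVERAGE and =STDEV formulas might look like the sketch below. The two score lists are made up purely for illustration (they are not any judge's real scores). `statistics.stdev` computes the sample standard deviation, which is also what the spreadsheet STDEV function computes.

```python
import statistics

# Hypothetical scores from two imaginary judges (NOT real NGADM data).
narrow_judge = [8.0, 8.2, 8.4, 8.1, 8.3]  # scores bunched close together
wide_judge = [6.5, 9.5, 7.0, 9.0, 8.0]    # scores spread across the scale

for name, scores in [("narrow", narrow_judge), ("wide", wide_judge)]:
    average = statistics.mean(scores)   # like =AVERAGE(...)
    spread = statistics.stdev(scores)   # like =STDEV(...), sample SD
    print(f"{name}: average {average:.2f}, SD {spread:.3f}")
```

The two imaginary judges have similar averages, but the "wide" one ends up with a much larger SD, which is the same pattern TaintedLogic's scores show.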

Also, another nugget that puts your "highest scoring contestant who's not going through" misfortune into perspective: If you randomly place 64 contestants into 16 groups of 4, the contestant that scores the 6th highest overall advances 96.28% of the time. So... that's about "once in every 27 times" unlucky...

On a more positive note, I think it has happened before that some contestant drops out and announces it early, and gets replaced by the highest scoring person from the previous round who had not advanced. So you should be first in that line, if you are indeed interested in getting "revived"!

Again, thanks for doing this!

LunacyEcho responds:

=> standard deviations =>

Ooh, thanks! That sure teaches me a lot :D I think what I thought it was is actually quite different from what it is! I think instead of standard deviation, I was trying to figure out how far the judges' scores strayed from the actual average score. For example, hypothetically, if PirateCrab's average score was 9.1, an individual judge score of 9.2 would have a lower "point total" (for lack of a better word since I don't actually know what this is) than an individual judge score of 7.5, since 9.2 is closer to the actual average of 9.1.

I guess I was trying instead to find which judges had a better sense of what the average score actually was? So would that be something like judge reliability? Potentially, I think you might be able to use that to weight certain judges' scores, so you could have a less sporadic set of scores for each person. But I guess that isn't actually standard deviation. :P I'll be updating the newspost! Thanks! Is there by any chance a name for this, btw?

=> unlucky =>

So I get a free pass to the group of 16 for the next 26 NGADMs?! Yaas! :v

This newspost probably wouldn't exist if I hadn't been inspired by how interesting your previous newsposts are! So I really gotta thank you instead. Thank you! :D


gadogry

2015-08-27 12:03:17

> I guess I was trying instead to find which judges had a better sense of what the average score actually was? So would that be something like judge reliability?

One way I can think of doing that is finding the correlation between a judge's set of scores and the average scores of the tracks. The "correlation" between two data sets ranges from -1 to +1. +1 means the two sets completely agree, and -1 means they completely disagree. For instance, to compute how closely StepW's scores fit the average scores, you can just use the formula "=CORREL(B2:B57;H2:H57)" in the spreadsheet, and obtain 0.882, which is really high. This means that StepW's scores are really close to the average, and the results wouldn't be much different had he been the only judge!
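
As a rough sketch of what CORREL is doing under the hood, here is the Pearson correlation coefficient written out by hand in Python. The function name `pearson` and all of the scores are hypothetical, chosen only to show one judge who mostly agrees with the averages and one who ranks the tracks in exactly the opposite order.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two lists move together around their own means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data (NOT real NGADM scores).
track_averages = [7.0, 7.5, 8.0, 8.5, 9.0]
agreeing_judge = [6.8, 7.6, 8.1, 8.4, 9.2]  # roughly follows the averages
contrary_judge = [9.0, 8.5, 8.0, 7.5, 7.0]  # exactly reversed ranking

print(pearson(agreeing_judge, track_averages))  # close to +1
print(pearson(contrary_judge, track_averages))  # -1, up to rounding
```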

But even if a judge's correlation is lower with the average, it doesn't (IMO) necessarily mean that he/she did a bad job. Music is a very subjective thing, and people can have diverse opinions, and we do value that diversity --- this is pretty much the point of having multiple judges, right?

A better way I can think of to reweight the judges' scores (and I think this idea was casually thrown around last year) is to equalize all judges' standard deviations. For instance, we can take the raw scores from all judges, and rescale their scores to make each judge's SD exactly 1. This would mean bunching TaintedLogic's scores together a bit (since he/she is the only one currently with an SD of >1), and spreading others' scores out a little. This doesn't CHANGE each judge's opinion, but it equalizes the LOUDNESS of their opinions, so one judge doesn't have more pull on the results than the other.

While this is more fair in theory, I bet that people may not be comfortable with that level of sophistication. And it isn't like the current system ain't working :)

Sorry for the rant -- you now know what happens once you got me started on Math!

LunacyEcho responds:

=> correlation =>

That's exactly what I was looking for! And your explanation for what it is is much better than what I tried to say. XD

=> music is subjective =>

Ooh, I can see now why correlation isn't the most relevant statistical measure. That method about equalizing the judge's standard deviations sounds really interesting. I wonder how different the scores would be if that were to happen?

I actually think this would be a pretty good way to judge the NGADM! :P It might seem a little confusing, but at least people would stop complaining about the 'one judge who ruined their score'? (even though a lower score from a judge might mean that judge gave everyone else lower scores as well lol) I guess it could potentially start other complaints too, like the fact that weighted scores aren't "real scores". Then again, this is just speculation! I wonder what the reaction would be if NGADM just tried it for a test run someday? :D

No problem about the rant, dude! I love learning about this stuff! Thanks for spending so much time typing all this up and teaching me about it! :D


bassfiddlejones

2015-08-27 12:25:16

Thanks! Working hard to keep the scores up there... You should totally start on a track, so you are ready if someone drops out. It happens a lot!

LunacyEcho responds:

You definitely will! :D And I guess I should start on a track! Not just for potential NGADM, but there's stuff like samulis' GM thing, and I hear the NGMT might start up again soon... ;)


Agitat0r

2015-08-27 13:42:10

Still mind-boggling about frootza's group. All the heavy hitters didn't submit. How miraculous was that? Given his group had 2 NGADM winners, frootza would have had no chance of making it at all. It feels like there was an invisible force stopping everyone except frootza lol!

I love your data work. Data is beautiful.

LunacyEcho responds:

You never know! Maybe frootza would've pulled out something even greater if he'd been inspired by an early submission from Kor-Rune or johnfn? :P But that group was so insane. Being pitted against two past winners is one thing, but beating them by default?!

Now I really want to see frootza win by default in every single bracket he comes along to. With some luck, frootza may just become the first-ever NGADM winner who didn't actually have to face off against anyone! :D


SkyeWint

2015-08-27 18:08:47

If I may make one minor suggestion, could you also put the median score of each group to see how competitive they are? With so few scores, one outlying high score will fling that group way up, making it seem far more competitive. I don't know if it will change anything, but it might make more sense imo.

LunacyEcho responds:

Ooh, that's a pretty good idea. It fixes assumptions like the one you could make about aliaspharow in Group B - even though it was the lowest-scoring group on average because of two coincidentally low scores, aliaspharow definitely was not one of the lower-scoring winners!

How would the median score work, though? Since there are only about 4 submissions per group (±3), what about taking the average of the middle two scores instead? Would that be of any help? :P


SkyeWint

2015-08-28 14:27:58

The median is literally the average of the middle two scores. Like, when there is an even number of scores it is the average of the middle ones by definition IIRC.
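
To make that concrete, here is a tiny Python sketch with a made-up group of four scores (not a real NGADM group), one of which is an outlying high score. With an even number of scores, `statistics.median` averages the middle two, so the outlier barely moves it, while the plain average gets pulled up.

```python
import statistics

# Hypothetical group of four scores (NOT real NGADM data):
# one outlying high score and three that sit close together.
group_scores = [9.4, 7.6, 7.5, 7.3]

group_average = statistics.mean(group_scores)
# Even number of scores, so the median is the average of the middle two
# (7.5 and 7.6 here).
group_median = statistics.median(group_scores)

print(f"average {group_average:.2f}, median {group_median:.2f}")
```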

LunacyEcho responds:

That moment when you forget 4th grade math. :D Added the medians! Thanks for the advice! :)


SkyeWint

2015-08-29 08:51:50

See? It might not change the results a huge amount, especially on the high end, but there's definitely some significant difference there!

LunacyEcho responds:

It's pretty interesting how large the differences are, since there doesn't seem to be much pattern to the medians. Neither the higher- nor the lower-scoring end has any larger differences between average and median, nor does either end have any more tendency to have a higher/lower median than average. This is all really cool! Thanks again for your suggestion! :P


gadogry

2015-08-29 11:59:13

For your curiosity, I have modified your spreadsheet to add the adjusted scores:

https://dl.dropboxusercontent.com/u/45987304/NGADM%20Dissected%20Results_modified.xls

For the adjusted scores (columns K to P), every judge has an average score of 8 and an SD of exactly 1. This means that the adjusted scores can be interpreted as follows:

6 or less: Pretty pretty bad. Like, bottom 5% bad.
7: Mediocre, better than around one third of submissions (so worse than two thirds)
8: Exactly average among submissions this round.
9: Great. Top one third material.
10 or more: Beyond awesome. Among best in this round.

(Note that adjusted scores can go above 10... and in theory, below 0.)

For illustration, let's take your adjusted scores:

StepW: 8.841254435
MetalRenard: 10.0072406764
stunkel: 9.5484882524
TaintedLogic: 8.4636740402
Neon-Bard: 8.9027543906
Geoplex: 8.1902152723

So, as you see, all judges found your submission to be above average (since all scores are >8). However, Geoplex only found it slightly above average, while MetalRenard was left sockless. Also, notice that while StepW gave you a 9 and Neon-Bard only gave you an 8.8, Neon-Bard's adjusted score is actually (a tiny bit) higher than StepW's, which indicates that his "relative opinion" of your track is no lower than StepW's. They just use different grading scales. In a nutshell, if judge A scores your track 8 but everyone else's 7 and judge B scores your track 8 but everyone else's 9, the two 8's mean different things... and adjusting for average and SD reveals these trends.
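
A minimal sketch of that adjustment in Python, assuming it is a plain linear rescale (subtract the judge's average, divide by the judge's SD, then shift to an average of 8). The exact spreadsheet formula isn't quoted here, so treat the `adjust` function and its parameter names as my own guess at the method. The scores are the hypothetical "judge A" and "judge B" from the paragraph above, each judging four tracks, with your track listed first.

```python
import statistics

def adjust(scores, target_mean=8.0, target_sd=1.0):
    """Rescale one judge's scores to a chosen average and (sample) SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [target_mean + target_sd * (s - mean) / sd for s in scores]

# Hypothetical judges from the paragraph above; your track is first.
judge_a = [8.0, 7.0, 7.0, 7.0]  # your 8 is this judge's best score
judge_b = [8.0, 9.0, 9.0, 9.0]  # your 8 is this judge's worst score

adjusted_a = adjust(judge_a)
adjusted_b = adjust(judge_b)

print(adjusted_a[0])  # well above 8: judge A liked your track more than the rest
print(adjusted_b[0])  # well below 8: judge B liked your track less than the rest
```

Under this assumption the same raw 8 comes out around 9.5 for judge A and around 6.5 for judge B. As a sanity check against the real numbers, plugging the rounded figures from the newspost into the same formula for StepW's 9 (average 8.36, SD 0.764) gives about 8.84, in line with the 8.841 listed above.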

On the other hand, the actual results (i.e. who won the group) did not change at all if we used the adjusted scores instead of the raw scores. So one may argue that all this math is for naught :)

Sorry -- can't help myself again. Now I'll just be sitting here waiting for your restraining order to arrive...

LunacyEcho responds:

Wow, this is great! Maybe you should've done the newspost instead! XD Your explanations for concepts are also a lot better than what I can say, too. Are you by any chance an educator of some sort? :P

Kind of interesting how TaintedLogic, who had the highest standard deviation, also had no adjusted scores above 10. :P And while these results didn't change any end results, they're such a cool way of looking at the numbers.

Come to think of it, this idea of comparative scoring brings to mind the way the Eurovision contest is scored. I wonder what would have happened if the NGADM was scored like that instead? Time to go back to the spreadsheet! :D

Thanks so much again for this. It's so cool how you can interpret a set of data in so many ways! Can you please come replace my math teachers? :P