Dude. This is like the most awesome newspost ever! o.O
Great job :D
Thanks! Great job to YOU, dude. Good luck in the NGADM! :D
Nice man! Thanks for doing this! I always love seeing how the numbers stack up.
Whoo! Congrats on a highly deserved first place, dude!
Great job with the stats! While I'm not participating in the contest, I'm a fan of many of the contestants, and thus it's a lot of fun seeing these numbers. Thank you so much for doing this :)
I took a glance at the spreadsheet -- everything looked good, except the standard deviations. They're actually very easy to compute: for example, if the formula for the average of StepW's scores is "=AVERAGE(B2:B57)", then the standard deviation of StepW's set of scores is "=STDEV(B2:B57)". This number indicates how "spread out" StepW's scores were.
Using this formula, the judges' SDs are as follows:
StepW: 0.764
MetalRenard: 0.762
stunkel: 0.630
TaintedLogic: 1.107
Neon-Bard: 0.629
Geoplex: 0.826
TaintedLogic had the highest SD by a considerable margin. This means he/she used a wider spectrum of the scoring scale, and thus had a bigger pull on the overall results than the other judges did.
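(If anyone wants to check these outside the spreadsheet, here's a quick Python sketch of the same computation -- the scores below are made up for illustration; the real ones live in the judges' columns:)

```python
import statistics

# Made-up scores for one judge -- the real data is in column B of the spreadsheet.
scores = [8.5, 9.0, 7.5, 8.0, 9.5, 6.5, 8.5, 7.0]

mean = statistics.mean(scores)   # same as =AVERAGE(B2:B57)
sd = statistics.stdev(scores)    # same as =STDEV(B2:B57) -- the sample standard deviation

print(f"average = {mean:.3f}, SD = {sd:.3f}")
```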
Also, another nugget that puts your "highest-scoring contestant who's not going through" misfortune into perspective: if you randomly place 64 contestants into 16 groups of 4, the contestant who scores the 6th highest overall advances 96.28% of the time. So... that's about "once in every 27 times" unlucky...
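If anyone wants to poke at odds like this themselves, here's a rough Monte Carlo sketch. Note my assumption: only each group's top scorer advances, which may not be the exact rule behind that 96.28% figure, so don't be surprised if your estimate comes out different:

```python
import random

TRIALS = 100_000
RANK = 6           # track the contestant ranked 6th overall (rank 1 = best score)
advanced = 0

for _ in range(TRIALS):
    ranks = list(range(1, 65))     # 64 contestants, identified by overall rank
    random.shuffle(ranks)
    for i in range(0, 64, 4):      # 16 random groups of 4
        group = ranks[i:i + 4]
        if RANK in group and min(group) == RANK:
            advanced += 1          # nobody ranked higher landed in their group

print(f"rank {RANK} advances in ~{advanced / TRIALS:.2%} of shuffles")
```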
On a more positive note, I think it has happened before that a contestant drops out and announces it early, and gets replaced by the highest-scoring person from the previous round who hadn't advanced. So you should be first in that line, if you are indeed interested in getting "revived"!
Again, thanks for doing this!
=> standard deviations =>
Ooh, thanks! That sure teaches me a lot :D I think what I had in mind is actually quite different from what standard deviation really is! Instead of standard deviation, I was trying to figure out how far each judge's scores strayed from the actual average scores. For example, hypothetically, if PirateCrab's average score was 9.1, an individual judge score of 9.2 would have a lower "point total" (for lack of a better word, since I don't actually know what this is called) than an individual judge score of 7.5, since 9.2 is closer to the actual average of 9.1.
I guess I was trying instead to find which judges had a better sense of what the average score actually was? So would that be something like judge reliability? Potentially, I think you might be able to use that to weight certain judges' scores, so you could have a less sporadic set of scores for each person. But I guess that isn't actually standard deviation. :P I'll be updating the newspost! Thanks! Is there by any chance a name for this, btw?
=> unlucky =>
So I get a free pass to the group of 16 for the next 26 NGADMs?! Yaas! :v
This newspost probably wouldn't exist if I hadn't been inspired by how interesting your previous newsposts are! So I really gotta thank you instead. Thank you! :D
> I guess I was trying instead to find which judges had a better sense of what the average score actually was? So would that be something like judge reliability?
One way I can think of doing that is finding the correlation between a judge's set of scores and the average scores of the tracks. The "correlation" between two data sets ranges from -1 to +1: +1 means the two sets completely agree, and -1 means they completely disagree. For instance, to compute how closely StepW's scores fit the average scores, you can just use the formula "=CORREL(B2:B57;H2:H57)" in the spreadsheet, and obtain 0.882, which is really high. This means that StepW's scores are really close to the average, and the results wouldn't be much different had he been the only judge!
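Outside the spreadsheet, the same check could look like this in Python (toy numbers standing in for the real columns):

```python
import statistics   # statistics.correlation needs Python 3.10+

# Made-up numbers standing in for columns B (one judge) and H (track averages).
judge   = [8.5, 9.0, 7.5, 8.0, 9.5, 6.5]
average = [8.7, 8.9, 7.8, 8.2, 9.3, 7.0]

r = statistics.correlation(judge, average)   # Pearson's r, same idea as =CORREL(...)
print(f"correlation = {r:.3f}")
```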
But even if a judge's correlation is lower with the average, it doesn't (IMO) necessarily mean that he/she did a bad job. Music is a very subjective thing, and people can have diverse opinions, and we do value that diversity -- this is pretty much the point of having multiple judges, right?
A better way I can think of to reweight the judges' scores (and I think this idea was casually thrown around last year) is to equalize all judges' standard deviations. For instance, we can take the raw scores from all judges and rescale each judge's scores to make their SD exactly 1. This would mean bunching TaintedLogic's scores together a bit (since he/she is the only one currently with an SD of >1), and spreading the others' scores out a little. This doesn't CHANGE each judge's opinion, but it equalizes the LOUDNESS of their opinions, so one judge doesn't have more pull on the results than another.
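In code terms, the rescaling I have in mind could look something like this (toy scores, not the real data):

```python
import statistics

def equalize(scores, target_sd=1.0):
    """Squeeze/stretch one judge's scores around their own mean so their SD becomes target_sd."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [mean + (s - mean) * target_sd / sd for s in scores]

# A "loud" judge whose raw scores have an SD of 1.75:
loud_judge = [6.0, 7.5, 9.0, 10.0]
print(equalize(loud_judge))   # same mean, same ordering, SD now exactly 1
```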
While this is fairer in theory, I bet people wouldn't be comfortable with that level of sophistication. And it's not like the current system ain't working :)
Sorry for the rant -- now you know what happens once you get me started on Math!
=> correlation =>
That's exactly what I was looking for! And your explanation for what it is is much better than what I tried to say. XD
=> music is subjective =>
Ooh, I can see now why correlation isn't the most relevant statistical measure. That method of equalizing the judges' standard deviations sounds really interesting. I wonder how different the scores would be if that were to happen?
I actually think this would be a pretty good way to judge the NGADM! :P It might seem a little confusing, but at least people would stop complaining about the "one judge who ruined their score"? (even though a lower score from a judge might mean that judge gave everyone else lower scores as well lol) I guess it could potentially start other complaints too, like the fact that weighted scores aren't "real scores". Then again, this is just speculation! I wonder what the reaction would be if the NGADM just tried it for a test run someday? :D
No problem about the rant, dude! I love learning about this stuff! Thanks for spending so much time typing all this up and teaching me about it! :D
Thanks! Working hard to keep the scores up there... You should totally start on a track, so you are ready if someone drops out. It happens a lot!
You definitely will! :D And I guess I should start on a track! Not just for potential NGADM, but there's stuff like samulis' GM thing, and I hear the NGMT might start up again soon... ;)
If I may make one minor suggestion, could you also add the median score of each group to show how competitive they are? With so few scores, one outlying high score will fling that group way up, making it seem far more competitive than it is. I don't know if it will change anything, but it might make more sense imo.
Ooh, that's a pretty good idea. It fixes assumptions like the one you could make about aliaspharow in Group B -- even though it was the lowest-scoring group on average because of two coincidentally low scores, aliaspharow definitely was not one of the lower-scoring winners!
How would the median score work, though? Since there are only about 4 submissions per group (±3), what about taking the average of the middle two scores instead? Would that be of any help? :P
The median is literally the average of the middle two scores. Like, when there is an even number of scores it is the average of the middle ones by definition IIRC.
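For example, Python's built-in statistics.median does exactly that for an even-length list:

```python
import statistics

print(statistics.median([7.5, 8.0, 8.6, 9.1]))   # 8.3 -- the average of the middle two
```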
That moment when you forget 4th grade math. :D Added the medians! Thanks for the advice! :)
See? It might not change the results a huge amount, especially on the high end, but there's definitely some significant difference there!
It's pretty interesting how large the differences are, since there doesn't seem to be much of a pattern to the medians. Neither the higher- nor the lower-scoring end shows larger differences between average and median, and neither end tends to have a higher or lower median than average. This is all really cool! Thanks again for your suggestion! :P
For your curiosity, I have modified your spreadsheet to add the adjusted scores:
https://dl.dropboxusercontent.com/u/45987304/NGADM%20Dissected%20Results_modified.xls
For the adjusted scores (columns K to P), every judge has an average score of 8 and an SD of exactly 1. This means that the adjusted scores can be interpreted as follows:
6 or less: Pretty pretty bad. Like, bottom 5% bad.
7: Mediocre; better than around one third of submissions (so worse than two thirds)
8: Exactly average among submissions this round.
9: Great. Top one third material.
10 or more: Beyond awesome. Among the best in this round.
(Note that adjusted scores can go above 10... and in theory, below 0.)
For illustration, let's take your adjusted scores:
StepW: 8.841254435
MetalRenard: 10.0072406764
stunkel: 9.5484882524
TaintedLogic: 8.4636740402
Neon-Bard: 8.9027543906
Geoplex: 8.1902152723
So, as you can see, all judges found your submission to be above average (since all scores are >8). However, Geoplex only found it slightly above average, while MetalRenard was left socksless. Also, notice that while StepW gave you a 9 and Neon-Bard only gave you an 8.8, Neon-Bard's adjusted score is actually (a tiny bit) higher than StepW's, which indicates that his "relative opinion" of your track is no lower than StepW's -- they just use different grading scales. In a nutshell, if judge A scores your track 8 but everyone else's 7, and judge B scores your track 8 but everyone else's 9, the two 8's mean different things... and adjusting for average and SD reveals these trends.
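To make that nutshell concrete, here's the same adjustment in Python, using those two hypothetical judges A and B:

```python
import statistics

def adjust(scores, target_mean=8.0, target_sd=1.0):
    """Rescale one judge's scores to a shared mean and SD, like columns K to P."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [target_mean + (s - mean) * target_sd / sd for s in scores]

# Judge A: your track gets an 8, everyone else gets 7s.
# Judge B: your track gets an 8, everyone else gets 9s.
judge_a = [8.0, 7.0, 7.0, 7.0]
judge_b = [8.0, 9.0, 9.0, 9.0]

print(adjust(judge_a)[0])   # 9.5 -- well above average in A's eyes
print(adjust(judge_b)[0])   # 6.5 -- well below average in B's eyes
```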
On the other hand, the actual results (i.e. who won each group) did not change at all if we used the adjusted scores instead of the raw scores. So one may argue that all this Math is for naught :)
Sorry -- can't help myself again. Now I'll just be sitting here waiting for your restraining order to arrive...
Wow, this is great! Maybe you should've done the newspost instead! XD Your explanations of these concepts are a lot better than anything I could say, too. Are you by any chance an educator of some sort? :P
Kind of interesting how TaintedLogic, who had the highest standard deviation, also had no adjusted scores above 10. :P And while these results didn't change any end results, they're such a cool way of looking at the numbers.
Come to think of it, this idea of comparative scoring brings to mind the way the Eurovision contest is scored. I wonder what would have happened if the NGADM was scored like that instead? Time to go back to the spreadsheet! :D
Thanks so much again for this. It's so cool how you can interpret a set of data in so many ways! Can you please come replace my math teachers? :P
Oakwood
A shared 17th spot -- I can live with that. Cool that my group was actually the highest-scoring group!
Nice stats, they really show a different side of the scores.
LunacyEcho
Hey man! Loved your pop loop :P Great job, too! May the RNG gods shine upon you next year!