Four Grand Prix competitions down this season, two left to go before the Final. It's been a heck of a ride, and NHK Trophy this past weekend was no different. Between Yuzuru Hanyu suffering an ankle injury in practice and withdrawing, Sergei Voronov winning his first Grand Prix competition at age 30, Evgenia Medvedeva making two mistakes (!) in her free skate, Wenjing Sui/Cong Han's world record free skate, and Tessa Virtue/Scott Moir's continued undefeated streak, it was a very eventful competition. And one to absolutely have a look at if you haven't already!
Relive NHK Trophy
Short: MEN | LADIES | PAIRS | DANCE
Free: MEN | LADIES | PAIRS | DANCE
But as with last week, there's been an overarching theme bubbling under the surface for me that I wanted to bring up in one fell swoop. Last week, it was Grades of Execution; this week, I'm bringing component marks into the conversation. A few points of clarification ... and some numbers, up ahead.
What components are, and what components are not
For many who followed skating before the ISU Judging System was introduced, the component mark is the direct offshoot of the old Presentation mark (or, even older-school, Artistic Impression). And since the start of the IJS, the components have been separated into five dimensions with the intention of drawing distinctions between the different "components," if you will, of what a program should be judged on outside of the technical elements.
Of course, this distinction would, in theory, allow judges to give a skater who has, say, fantastic foundational skating but mediocre choreography, a high score in Skating Skills and a so-so score in Composition. Does that actually happen? I'll take a look at that later.
But first, a quick look at what components are, as stated by the ISU, and what they are not. A lot of skating observers talk about Program Components without ever having read what the definition of each component actually is. The graphic above gives you all five components and the verbatim definition from the ISU. Here's a quick cheat sheet of just some of what they are and what they are not:
- What Program Components are:
- They are a set of five separate, distinctive, and *often* correlated dimensions that make up the overall program
- They are a way to quantitatively measure qualitative parts of a program
- They are a way for errors outside of technical elements (e.g., trips or fluke falls) to be addressed (i.e., the "overall cleanness and sureness" of Skating Skills)
- Three of the components are meant to address what many think of as the "choreography" of a program, including how the program is composed (Composition), how it is performed (Performance and Execution), and how the music is interpreted (Interpretation)
- What Program Components are not:
- They are NOT a direct reflection of the technical elements of the program (e.g., a fall, in itself, is NOT taken off of any component mark - however, if the fall, say, leads to a skater not focusing on interpreting their music, then the Interpretation mark should be lowered accordingly)
- They are NOT just about how many crossovers a skater does - it is so important to remember that crossovers are transitions as well, and though they aren't difficult to learn, they are difficult to really master - high-quality crossovers should be rewarded as well
- They are NOT a way for skaters to pack a bunch of intricate steps in and automatically get high components; even within the Transitions mark, the definition calls out that intricate footwork, positions, and movements are to be "varied and purposeful"
- They are NOT written with any language about "well-balanced" programs in the way that the term is used in skating circles (e.g., backloading a program has no effect on the component mark, unless the backloading negatively affects the composition of a program); in fact, in the rulebook, a "well-balanced program" refers to the inclusion, not the distribution, of elements
And to be clear, what I wrote above is not necessarily what components *should* cover (that's a whole different kind of post); rather, it is what components are currently written to cover.
Where's the differentiation?
What I'm about to say is nothing new to skating fans who have been looking at judges' protocols for the past decade. But it is the statistical extent of it that surprised even me when I looked at it.
The prevailing hypothesis is that program component scores are ridiculously narrow. There are five components to mark, but judges basically settle on a mark and won't deviate from it all that much. For me, "all that much" always meant around one point. In fact, it's more like half a point.
Methodology: I took a look at the four competitions in the Grand Prix series so far this season, went into each discipline, randomly drew the short program or the free skate, and took all of the judges' component scores for the 1st place, 4th place, and last place skaters to try to sample the extremes and the averages. For each judge for each skater, I took the range of component marks from highest mark to lowest mark (e.g., if a judge rated a skater 8.00, 7.50, 7.75, 8.25, 8.25, the range would be 8.25-7.50=0.75). The result was a numerical analysis of 432 data points from over 100 judges over four competitions.
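To make the arithmetic concrete, here's a minimal sketch of that range calculation in Python. This is hypothetical scaffolding for illustration only - the component names and the example marks come from the text above, not from any actual protocol file:

```python
# A minimal sketch of the per-judge range calculation (hypothetical code,
# not the actual analysis). Each judge gives five component marks; the
# "range" is simply the highest mark minus the lowest.

COMPONENTS = ["Skating Skills", "Transitions", "Performance and Execution",
              "Composition", "Interpretation"]

def component_range(marks):
    """Spread between a single judge's highest and lowest component marks."""
    return max(marks.values()) - min(marks.values())

# The example from the text: 8.00, 7.50, 7.75, 8.25, 8.25
judge_marks = dict(zip(COMPONENTS, [8.00, 7.50, 7.75, 8.25, 8.25]))
print(component_range(judge_marks))  # 8.25 - 7.50 = 0.75
```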
Now, you could argue that the best of the best will absolutely be tops in all of the dimensions of components, and mere mortal skaters may be great at one thing but not at others. If that were the case, you would see a longer tail on these histograms (i.e., more ranges that are greater than 1.50 points). But when I isolated the 1st- and last-place skaters' marks, and then added in the 4th-place skaters' marks, the ranges barely deviated.
As it is, the only differentiation we really see is in the Transitions mark - and even then, the only differentiation is that judges score Transitions lower than the other components for the vast majority of programs. In fact, if you take out the Transitions mark, the ranges tighten up even more. Removing Transitions from this sample of judges' marks brings the percentage of ranges of 0.50 or lower from an already-high 72% to a significantly higher 84%. And it not only tightens the ranges, it also shifts the mode from a 0.50 range to a 0.25 range.
48% of the judges' ranges were zero or 0.25 when you took out the Transitions mark.
FORTY. EIGHT. PERCENT.
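If you wanted to reproduce that tally, the logic is a few lines on top of the range function above. Again, this is a hedged sketch - `all_marks` is a made-up stand-in for the transcribed protocol data, which isn't reproduced here:

```python
# Hypothetical sketch of the Transitions-excluded tally. `all_marks` is a
# list of per-judge mark dicts like `judge_marks` above; the two records
# below are invented placeholders, not real protocol data.

def share_of_narrow_ranges(all_marks, exclude=(), threshold=0.50):
    """Fraction of judge-skater ranges at or below `threshold`."""
    ranges = []
    for marks in all_marks:
        values = [v for c, v in marks.items() if c not in exclude]
        ranges.append(max(values) - min(values))
    return sum(1 for r in ranges if r <= threshold) / len(ranges)

all_marks = [
    {"Skating Skills": 8.00, "Transitions": 7.25,
     "Performance and Execution": 8.00, "Composition": 8.25,
     "Interpretation": 8.00},
    {"Skating Skills": 7.50, "Transitions": 7.25,
     "Performance and Execution": 7.50, "Composition": 7.50,
     "Interpretation": 7.75},
]

# On the actual sample: ~72% of ranges sit at 0.50 or below with all five
# components, ~84% once Transitions is excluded, and 48% at 0.25 or below.
print(share_of_narrow_ranges(all_marks))                          # all five
print(share_of_narrow_ranges(all_marks, exclude={"Transitions"})) # minus TR
print(share_of_narrow_ranges(all_marks, exclude={"Transitions"},
                             threshold=0.25))                     # 0.25 cutoff
```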
Which is to say: whatever separating Program Components into five dimensions was intended to accomplish, it is not what is actually happening.
You saw some of that this weekend at NHK Trophy. Perhaps most egregious to me was Adam Rippon's stunning free skate - the best-performed free skate, with the strongest musical interpretation, of anyone that day - yet his Performance and Execution was, on average, marked only third-best in the field.
Um ... so what next?
Can the marking of components be solved? Much of it comes down to human judgment: there is the availability bias of what is happening right in front of the judges (e.g., the home crowd goes wild after a program); there is the knowledge of what has happened in the past (e.g., the skater on the ice is a World Champion with a history of high marks); and there is the simple fact that judges already have plenty to do in the few minutes they have, marking technical elements AND components.
In an ideal world, every competition would have a set of judges who don't know who the skaters are and are judging them for the first time. But absent this utopian judging panel, acknowledging that judges need to score a skater component by component may help matters. And that may start with taking something off the judges' plates - a panel that is solely focused on watching the program and evaluating the components. And because each discipline has its own judging panel (though there is some reuse of judges across disciplines), that component-only panel could be staffed by judges who are already at the competition to judge other events.
It may also be necessary to audit judges' scores and give the feedback needed to drive continuous improvement in the use of components. Judges, for the most, most, most part, have good intentions in what they are putting out there. Sometimes, objective feedback is all you need to change behavior. You just don't know what you're doing until you see it.