
Is How You Feel a Lie?

A season of weekly drop-jump testing alongside subjective wellness scores across 50 athletes, read against the published athlete-monitoring literature. The finding I expected and the one I got were not the same.

Monitoring fatigue is one of those S&C problems that looks simpler than it is. The brief is straightforward: tell me whether the players are fresh enough for Saturday. The reality is that no single measure has ever consistently delivered on it, and the published literature on this is much more humbling than coaches usually realise.

This piece is about a season I spent at Kintetsu Liners testing one specific question: does a weekly drop-jump test predict the kind of fatigue we care about? I designed the test as carefully as I could, ran it across 50 athletes for a full season, and got a result that contradicted my hypothesis. It also contradicted the simple model most coaches carry around in their heads.

Before I get to my own data, I want to put it next to what the major reviews actually say. Reading my finding through that lens changes the meaning of it.

What we measure when we measure fatigue

Halson's 2014 review in Sports Medicine is the most cited summary of athlete monitoring practice. The list of tools the field has tried is long: heart rate, heart rate variability, blood lactate, creatine kinase, hormonal markers (testosterone, cortisol, the T:C ratio), sleep quality, perception of effort, training impulse calculations, GPS-derived workload, neuromuscular function tests, mood questionnaires, sleep diaries.

Halson's bottom-line conclusion deserves to be quoted plainly: very few of these markers have strong scientific evidence supporting their use, and there is yet to be a single, definitive marker described in the literature. Two decades of monitoring research and the field has no winner.

That should give any coach pause before claiming their dashboard tells them when a player is fatigued. Mine certainly could not.

The two broad categories of measure are objective (something you read off a device or a blood draw) and subjective (something the athlete tells you). Most working programmes use a mix of both. The intuition is usually that the objective measures are the "real" ones and the subjective measures are useful supporting context. Saw, Main, and Gastin (2016), in a systematic review in the British Journal of Sports Medicine, took that intuition apart.

The Saw 2016 finding every S&C coach should know

Saw and colleagues systematically reviewed the literature on objective and subjective measures of athlete well-being and asked a single question: which set of measures is more sensitive to acute and chronic training load?

Their conclusion was the opposite of what the field assumed. Subjective measures (mood, perceived stress, lower body soreness, sleep quality, self-rated wellness) reflected acute and chronic training loads with superior sensitivity and consistency to commonly used objective measures (blood markers, heart rate, oxygen consumption). They also found that subjective and objective measures often did not correlate with each other. They were not measuring the same thing.

This is an important finding, and it is not as widely operationalised in working S&C programmes as it should be. The implication is that asking the athlete how they feel, in a structured way, is more informative than running a blood panel. The athlete is the better instrument.

My own season's data does not contradict Saw 2016 so much as complicate it. What follows is the test I ran and what came out, alongside what the literature has to say about the result.

Figure 01 // The monitoring landscape · Saw 2016 // Halson 2014

Column A · Subjective
  • Mood · POMS / questionnaire
  • Sleep quality · self-reported
  • Lower body soreness · 1–10 daily score
  • General fatigue · 1–10 daily score
  • Perceived stress · self-reported
Saw, Main & Gastin · BJSM 2016: more sensitive to load than commonly used objective measures.

Column B · Objective
  • Heart rate / HRV · wearable
  • Blood lactate · finger-prick
  • Creatine kinase · venous draw
  • GPS workload · session external load
  • RSI · drop jump · tested here, weekly
  • CMJ force-time · force plate
Halson · Sports Medicine 2014: no single marker has reliably tracked fatigue across the literature.
Two literatures, one decision. The intuition that objective measures are the “real” ones is not supported by the systematic review evidence. The drop-jump RSI marked here is the test we ran across one Japan League One season. The next figure shows what came back.

What I tested and why

The hypothesis was simple. If a player performs poorly on a drop-jump test relative to his own baseline, he is fatigued and will perform poorly on the pitch.

The drop-jump test produces a Reactive Strength Index (RSI), a ratio of jump height to ground contact time. RSI is a neuromuscular measure that has been used as a fatigue marker in team sports for years. Cormack, Newton, McGuigan, and Cormie (2008) showed that countermovement jump variables, particularly force-time characteristics, were sensitive to fatigue across an Australian rules football season. Subsequent work by McLean, Coutts, Kelly, McGuigan, and Cormack (2010) found neuromuscular, endocrine, and perceptual fatigue responses varied across between-match microcycles in professional rugby league.
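For concreteness, the arithmetic behind the score is simple: jump height is usually estimated from flight time on a contact mat, and RSI is that height over the contact time. A minimal sketch, my own illustration rather than the software we tested with:

```python
# Minimal sketch of the RSI arithmetic (my illustration, not vendor software).
# Jump height is estimated from flight time: h = g * t_flight**2 / 8.
G = 9.81  # gravitational acceleration, m/s^2

def jump_height(t_flight: float) -> float:
    """Jump height in metres, estimated from flight time in seconds."""
    return G * t_flight ** 2 / 8

def reactive_strength_index(t_flight: float, t_contact: float) -> float:
    """RSI: jump height (m) divided by ground contact time (s)."""
    return jump_height(t_flight) / t_contact

# e.g. 0.55 s of flight over a 0.20 s contact: ~0.37 m up, RSI ~1.85
print(round(reactive_strength_index(0.55, 0.20), 2))
```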

My choice of the drop-jump RSI specifically (rather than the more common countermovement jump) was driven by three things: it is fast (a player can complete the test in under 30 seconds), it is non-fatiguing (you do three drops, you read the score, you move on), and the ratio is more sensitive to neuromuscular state than absolute jump height alone.

The test design was as follows. Every Thursday (game day minus three) for an entire competitive season, all squad members performed the drop-jump RSI test. The same morning, they also completed a subjective wellness questionnaire on our Athlete Management System covering lower body soreness, general fatigue, and sleep. We had GPS data for every training session. We had session-RPE (Foster, 1998) for every player for every session.
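For reference, Foster's session-RPE load is simply the athlete's CR-10 rating multiplied by session duration in minutes, and each athlete-week in this design reduces to a small record. A sketch with illustrative field names, not our AMS schema:

```python
from dataclasses import dataclass

@dataclass
class WeeklyRecord:
    """One athlete-week in the monitoring dataset. Field names are
    illustrative, not our AMS schema."""
    athlete_id: str
    week: int
    rsi: float          # Thursday drop-jump RSI
    soreness: int       # lower body soreness, 1-10
    fatigue: int        # general fatigue, 1-10
    sleep_quality: int  # 1-10
    gps_load: float     # summed external load for the week (a.u.)
    srpe_load: float    # summed session-RPE load for the week (a.u.)

def session_rpe_load(rpe_cr10: float, duration_min: float) -> float:
    """Foster (1998): internal load = RPE on the CR-10 scale x minutes."""
    return rpe_cr10 * duration_min

# e.g. a 75-minute session the player rates 6 contributes 450 a.u.
print(session_rpe_load(6, 75))
```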

Critically, I collected the RSI data without letting the result influence training load or intensity that week. The point was to see whether the test would predict on-field performance, not to manage by it. If the test was a useful predictor, I would operationalise it the following season.

Fifty athletes participated across the full season. The dataset is the largest I have ever sat with in one analysis, and it gave me a clear signal.

What the data showed

Three findings, all of them against the prior.

The first was that there was no correlation between training load (GPS external or session-RPE internal) and jump test performance. Athletes jumped high during high-load weeks. They also jumped low during high-load weeks. The hypothesis that load drives a measurable performance decrement on this test, in this population, in this design, was not supported.

The second was that there was no correlation between subjective general fatigue and jump test performance either. Players reporting tired did not jump differently from players reporting fresh. The literature suggested this should hold. It did not.

The third one is what made the season worth writing about. There was a positive relationship between lower body soreness and jump performance. Players reporting greater soreness performed better on the test, not worse. The worse they felt in their legs, the higher they jumped. The effect was weak in absolute terms but consistent across the 50-athlete sample.

Figure 02 // Three findings · against three priors · 50 athletes · 1 season

Finding 01 · Load × Jump · no correlation
External (GPS) and internal (session-RPE) load did not predict drop-jump RSI on this design. The weeks the squad worked hardest looked the same on the test as the weeks they worked easiest.

Finding 02 · Fatigue × Jump · no correlation
Self-reported general fatigue did not predict RSI either. Players reporting tired and players reporting fresh sat in the same distribution.

Finding 03 · Soreness × Jump · weak positive r
Lower-body soreness predicted higher relative jump performance, not lower. The weak positive slope is the result the season turned on. The simple model was wrong in direction, not size.
Three relationships, one season, fifty athletes. Two against the prior in being null. The third against the prior in being positive instead of negative. The next figure plots Finding 03 against the actual data.
Figure 03 // Soreness vs jump performance · weak positive r · n = 50
[Scatter: x-axis reported lower-body soreness (1–10); y-axis RSI / baseline (0.90–1.20, baseline at 1.00); dashed OLS best-fit line.]
Reported lower-body soreness against RSI normalised to each athlete’s baseline, weekly observations across one full Japan League One season, 50 athletes. Dashed line: OLS best-fit. Each marker: one weekly observation. The slope is positive, contradicting the simple “sore equals fatigued equals worse performance” model. See essay for two competing interpretations.

The simple model I had walked into the season with (sore equals fatigued equals worse performance) was wrong in the direction the data pointed. Soreness predicted better, not worse.
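The essay reports directions rather than test statistics, but the analysis itself is not exotic. A minimal sketch of one way to reproduce it, assuming the athlete-weeks are exported long-format with the columns from the earlier sketch (the filename is hypothetical):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical long-format export: one row per athlete-week, columns as in
# the WeeklyRecord sketch above.
df = pd.read_csv("weekly_monitoring.csv")

# Normalise each weekly RSI to the athlete's own season mean, so the
# comparison is within-athlete rather than between fit and less fit players.
df["rsi_rel"] = df["rsi"] / df.groupby("athlete_id")["rsi"].transform("mean")

# Findings 01 and 02: load and general fatigue against jump performance.
print(df[["gps_load", "srpe_load", "fatigue"]].corrwith(df["rsi_rel"]))

# Finding 03: OLS of relative RSI on soreness (the dashed line in Figure 03).
fit = sm.OLS(df["rsi_rel"], sm.add_constant(df["soreness"])).fit()
print(fit.params["soreness"], fit.pvalues["soreness"])
```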

Reading the result against the literature

The null on jump performance versus training load is consistent with Halson 2014. No single marker has reliably tracked fatigue across the literature, and the drop-jump RSI in this design just adds one more entry to that record.

The null on jump performance versus subjective general fatigue needs more care when read against Saw 2016. Saw found subjective measures track training load. My data found subjective general fatigue did not track jump performance. Those two findings are compatible: subjective measures can track load without that load expressing itself in a single neuromuscular test. The jump test and the wellness questionnaire are measuring different layers, and they don't have to move together.

The positive soreness-performance relationship is the result the literature has more to say about.

Why soreness might predict better, not worse

Two interpretations are worth taking seriously.

The first is the training-stimulus interpretation. Lower body soreness is often a marker of recent eccentric or unaccustomed loading. The same stimulus that produces delayed-onset muscle soreness also drives neural adaptation and protein-synthesis signalling. In a squad of well-conditioned athletes who recover well session to session, the players reporting more soreness may be the ones who have just had the highest-quality training. The soreness is the receipt. The performance bump on the next test is the early adaptation. This interpretation lines up with the muscle damage literature: the acute inflammatory response to training is not purely a recovery cost, it is part of the signalling that drives adaptation. Athletes who train hardest, recover well, and report soreness are different from athletes who report soreness because they aren't recovering. In an elite squad with monitored recovery, the first group is doing most of the reporting.

The second is the selection-effect interpretation. Athletes in a well-managed in-season programme are not randomly assigned to training loads. The athletes training hardest are usually the ones healthy and fit enough to handle the highest loads. They are also the ones reporting the most soreness, because they are doing the most work. They jump well because they are the fittest athletes in the squad to begin with. This is harder to disprove with what I had. It is a confound I would address in a re-run.

Both interpretations point in the same practical direction. In a well-managed elite squad, soreness is a poor proxy for "this athlete needs to back off." It may even be a positive signal that the programme is working.

What this changed in practice

A neuromuscular test does not, on its own, tell me whether to load an athlete on a given day. The drop-jump RSI in this design did not predict what I wanted it to predict. I still run it weekly because the aggregate is informative, but it doesn't drive an individual yes-or-no.

The subjective measures get more weight in my call now, in line with Saw 2016. A player reporting low across multiple dimensions of the wellness questionnaire, over multiple days, carries more weight than any single jump score.
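That weighting can be made explicit as a simple rule over each athlete's own norms. A hypothetical sketch of the kind of flag I mean, not the logic running in our AMS:

```python
import pandas as pd

# Illustrative wellness columns, assumed scored so higher = better;
# invert any raw dimension (e.g. soreness) that runs the other way first.
DIMS = ["fatigue", "soreness", "sleep_quality"]

def review_flags(daily: pd.DataFrame, z_cut: float = -1.0, min_dims: int = 2) -> pd.Series:
    """True where an athlete sits below z_cut SDs of his own season norm on
    at least min_dims dimensions two days running. A prompt for a
    conversation, not an automatic load decision."""
    daily = daily.sort_values(["athlete_id", "date"])
    grouped = daily.groupby("athlete_id")[DIMS]
    z = (daily[DIMS] - grouped.transform("mean")) / grouped.transform("std")
    low_today = (z < z_cut).sum(axis=1) >= min_dims
    low_yesterday = low_today.groupby(daily["athlete_id"]).shift(1, fill_value=False)
    return low_today & low_yesterday
```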

Soreness specifically gets treated with more scepticism as a "back off" signal, especially in well-conditioned senior players. Standalone soreness as a trigger for load reduction isn't backed by what the data showed.

I am also more careful about what I tell coaches I can and cannot predict. The honest answer: no single measure tells me with confidence whether a given player will perform on Saturday. The combination of imperfect signals, weighted by context, is the best available. Saw 2016 is right that subjective measures carry more of the load than coaches assume. Halson 2014 is right that no single marker delivers.

If I ran it again

The first change would be a within-athlete repeated-measures design rather than cross-sectional, comparing each player to his own historical baseline week to week. That gets closer to the question the test was asking and partially controls for the selection effect.
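The standard tool for that design is a mixed-effects model with a random intercept per athlete, so the soreness coefficient is estimated from week-to-week variation within players rather than differences between them. A sketch using statsmodels, under the same hypothetical export as above:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weekly_monitoring.csv")  # same hypothetical export as before
df["rsi_rel"] = df["rsi"] / df.groupby("athlete_id")["rsi"].transform("mean")

# Random intercept per athlete: the soreness coefficient is now estimated
# from within-athlete, week-to-week variation, which is closer to the
# question the test was asking and partially controls the selection effect.
model = smf.mixedlm("rsi_rel ~ soreness + fatigue + srpe_load",
                    data=df, groups=df["athlete_id"]).fit()
print(model.summary())
```

Person-mean-centring soreness would go one step further, splitting its within-athlete and between-athlete components explicitly.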

The second would be extending the subjective questionnaire to include the items Saw flagged as most sensitive: mood, perceived stress, sleep quality. What I used was lighter than the literature suggests is optimal.

The third would be running a parallel objective measure with stronger literature support as a load tracker. Session-RPE (Foster 1998) is almost free. External load via GPS is what we already had. Pairing those with the subjective set across a full season on the same 50-athlete sample would produce something a lot more useful than what I ended up with.

The lesson I keep coming back to generalises past this test. The relationships you assume exist in monitoring data often don't. The ones that do exist sometimes run in the opposite direction. The athletes are more variable, more individual, more interesting than any one instrument captures.

The honest version of athlete monitoring is that you watch a number of imperfect signals, you weight them with context, you talk to the player, and you make a call. The certainty implied by a green-amber-red dashboard is mostly an illusion. The dashboard helps. It doesn't decide.

References

Cormack, S. J., Newton, R. U., McGuigan, M. R., & Cormie, P. (2008). Neuromuscular and endocrine responses of elite players during an Australian rules football season. International Journal of Sports Physiology and Performance, 3(4), 439–453.

Foster, C. (1998). Monitoring training in athletes with reference to overtraining syndrome. Medicine & Science in Sports & Exercise, 30(7), 1164–1168.

Halson, S. L. (2014). Monitoring training load to understand fatigue in athletes. Sports Medicine, 44(Suppl 2), S139–S147.

McLean, B. D., Coutts, A. J., Kelly, V., McGuigan, M. R., & Cormack, S. J. (2010). Neuromuscular, endocrine, and perceptual fatigue responses during different length between-match microcycles in professional rugby league players. International Journal of Sports Physiology and Performance, 5(3), 367–383.

Saw, A. E., Main, L. C., & Gastin, P. B. (2016). Monitoring the athlete training response: subjective self-reported measures trump commonly used objective measures: a systematic review. British Journal of Sports Medicine, 50(5), 281–291.
