
Is How You Feel a Lie?

A season of weekly drop-jump testing alongside subjective wellness scores across 50 athletes, read against the published athlete-monitoring literature. The finding I expected and the one I got were not the same.

Monitoring fatigue is one of those S&C problems that looks simpler than it is. The brief is straightforward: tell me whether the players are fresh enough for Saturday. The reality is that no single measure has ever consistently delivered on it, and the published literature on this is much more humbling than coaches usually realise.

This piece is about a season I spent at Kintetsu Liners testing one specific question: does a weekly drop-jump test predict the kind of fatigue we care about? I designed the test as carefully as I could, ran it across 50 athletes for a full season, and got a result that contradicted my hypothesis. It also contradicted the simple model most coaches carry around in their heads.

Before I get to my own data, I want to put it next to what the major reviews actually say. Reading my finding through that lens changes the meaning of it.

What we measure when we measure fatigue

Halson's 2014 review in Sports Medicine is the most cited summary of athlete monitoring practice. The list of tools the field has tried is long: heart rate, heart rate variability, blood lactate, creatine kinase, hormonal markers (testosterone, cortisol, the T:C ratio), sleep quality, perception of effort, training impulse calculations, GPS-derived workload, neuromuscular function tests, mood questionnaires, sleep diaries.

Halson's bottom-line conclusion deserves to be quoted plainly: very few of these markers have strong scientific evidence supporting their use, and there is yet to be a single, definitive marker described in the literature. Two decades of monitoring research and the field has no winner.

That should give any coach pause before claiming their dashboard tells them when a player is fatigued. Mine certainly could not.

The two broad categories of measure are objective (something you read off a device or a blood draw) and subjective (something the athlete tells you). Most working programmes use a mix of both. The intuition is usually that the objective measures are the "real" ones and the subjective measures are useful supporting context. Saw, Main, and Gastin (2016), in a systematic review in the British Journal of Sports Medicine, took that intuition apart.

The Saw 2016 finding every S&C coach should know

Saw and colleagues systematically reviewed the literature on objective and subjective measures of athlete well-being and asked a single question: which set of measures is more sensitive to acute and chronic training load?

Their conclusion was the opposite of what the field assumed. Subjective measures (mood, perceived stress, lower body soreness, sleep quality, self-rated wellness) reflected acute and chronic training loads with superior sensitivity and consistency to commonly used objective measures (blood markers, heart rate, oxygen consumption). They also found that subjective and objective measures often did not correlate with each other. They were not measuring the same thing.

This is an important finding, and it is not as widely operationalised in working S&C programmes as it should be. The implication is that asking the athlete how they feel, in a structured way, is more informative than running a blood panel. The athlete is the better instrument.

My own season's data does not contradict Saw 2016 so much as complicate it. What follows is the test I ran and what came out, alongside what the literature has to say about the result.

Figure 01 // The monitoring landscape · Saw 2016 // Halson 2014

Column A · Subjective
  • Mood · POMS / questionnaire
  • Sleep quality · self-reported
  • Lower body soreness · 1–10 daily score
  • General fatigue · 1–10 daily score
  • Perceived stress · self-reported
Saw, Main & Gastin · BJSM 2016: more sensitive to load than commonly used objective measures.

Column B · Objective
  • Heart rate / HRV · wearable
  • Blood lactate · finger-prick
  • Creatine kinase · venous draw
  • GPS workload · session external load
  • RSI · drop jump · tested here, weekly
  • CMJ force-time · force plate
Halson · Sports Medicine 2014: no single marker has reliably tracked fatigue across the literature.
Two literatures, one decision. The intuition that objective measures are the “real” ones is not supported by the systematic review evidence. The drop-jump RSI marked here is the test we ran across one Japan League One season. The next figure shows what came back.

What I tested and why

The hypothesis was simple. If a player performs poorly on a drop-jump test relative to his own baseline, he is fatigued and will perform poorly on the pitch.

The drop-jump test produces a Reactive Strength Index (RSI), a ratio of jump height to ground contact time. RSI is a neuromuscular measure that has been used as a fatigue marker in team sports for years. Cormack, Newton, McGuigan, and Cormie (2008) showed that countermovement jump variables, particularly force-time characteristics, were sensitive to fatigue across an Australian rules football season. Subsequent work by McLean, Coutts, Kelly, McGuigan, and Cormack (2010) found neuromuscular, endocrine, and perceptual fatigue responses varied across between-match microcycles in professional rugby league.
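For concreteness, the arithmetic behind the score is simple: jump height is usually estimated from flight time on a contact mat, and RSI is that height over the contact time. A minimal sketch, my own illustration rather than the software we tested with:

```python
# Minimal sketch of the RSI arithmetic (my illustration, not vendor software).
# Jump height is estimated from flight time: h = g * t_flight**2 / 8.
G = 9.81  # gravitational acceleration, m/s^2

def jump_height(t_flight: float) -> float:
    """Jump height in metres, estimated from flight time in seconds."""
    return G * t_flight ** 2 / 8

def reactive_strength_index(t_flight: float, t_contact: float) -> float:
    """RSI: jump height (m) divided by ground contact time (s)."""
    return jump_height(t_flight) / t_contact

# e.g. 0.55 s of flight over a 0.20 s contact: ~0.37 m up, RSI ~1.85
print(round(reactive_strength_index(0.55, 0.20), 2))
```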

My choice of the drop-jump RSI specifically (rather than the more common countermovement jump) was driven by three things: it is fast (a player can complete the test in under 30 seconds), it is non-fatiguing (you do three drops, you read the score, you move on), and the ratio is more sensitive to neuromuscular state than absolute jump height alone.

The test design was as follows. Every Thursday (game day minus three) for an entire competitive season, all squad members performed the drop-jump RSI test. The same morning, they also completed a subjective wellness questionnaire on our Athlete Management System covering lower body soreness, general fatigue, and sleep. We had GPS data for every training session. We had session-RPE (Foster, 1998) for every player for every session.
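For reference, Foster's session-RPE load is simply the athlete's CR-10 rating multiplied by session duration in minutes, and each athlete-week in this design reduces to a small record. A sketch with illustrative field names, not our AMS schema:

```python
from dataclasses import dataclass

@dataclass
class WeeklyRecord:
    """One athlete-week in the monitoring dataset. Field names are
    illustrative, not our AMS schema."""
    athlete_id: str
    week: int
    rsi: float          # Thursday drop-jump RSI
    soreness: int       # lower body soreness, 1-10
    fatigue: int        # general fatigue, 1-10
    sleep_quality: int  # 1-10
    gps_load: float     # summed external load for the week (a.u.)
    srpe_load: float    # summed session-RPE load for the week (a.u.)

def session_rpe_load(rpe_cr10: float, duration_min: float) -> float:
    """Foster (1998): internal load = RPE on the CR-10 scale x minutes."""
    return rpe_cr10 * duration_min

# e.g. a 75-minute session the player rates 6 contributes 450 a.u.
print(session_rpe_load(6, 75))
```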

Critically, I collected the RSI data without letting the result influence training load or intensity that week. The point was to see whether the test would predict on-field performance, not to manage by it. If the test was a useful predictor, I would operationalise it the following season.

Fifty athletes participated across the full season. The dataset is the largest I have ever sat with in one analysis, and it gave me a clear signal.

What the data showed

Three findings, all of them against the prior.

The first was that there was no correlation between training load (GPS external or session-RPE internal) and jump test performance. Athletes jumped high during high-load weeks. They also jumped low during high-load weeks. The hypothesis that load drives a measurable performance decrement on this test, in this population, in this design, was not supported.

The second was that there was no correlation between subjective general fatigue and jump test performance either. Players reporting tired did not jump differently from players reporting fresh. The literature suggested this should hold. It did not.

The third one is what made the season worth writing about. There was a positive relationship between lower body soreness and jump performance. Players reporting greater soreness performed better on the test, not worse. The worse they felt in their legs, the higher they jumped. The effect was weak in absolute terms but consistent across the 50-athlete sample.

Figure 02 // Three findings · against three priors · 50 athletes · 1 season

Finding 01 · Load × Jump · no correlation
External (GPS) and internal (session-RPE) load did not predict drop-jump RSI on this design. The weeks the squad worked hardest looked the same on the test as the weeks they worked easiest.

Finding 02 · Fatigue × Jump · no correlation
Self-reported general fatigue did not predict RSI either. Players reporting tired and players reporting fresh sat in the same distribution.

Finding 03 · Soreness × Jump · weak positive r
Lower-body soreness predicted higher relative jump performance, not lower. The weak positive slope is the result the season turned on. The simple model was wrong in direction, not size.
Three relationships, one season, fifty athletes. Two against the prior in being null. The third against the prior in being positive instead of negative. The next figure plots Finding 03 against the actual data.
Figure 03 // Soreness vs jump performance · weak positive r · n = 50
[Scatter: x-axis reported lower-body soreness (1–10); y-axis RSI / baseline (0.90–1.20, baseline at 1.00); dashed OLS best-fit line.]
Reported lower-body soreness against RSI normalised to each athlete’s baseline, weekly observations across one full Japan League One season, 50 athletes. Dashed line: OLS best-fit. Each marker: one weekly observation. The slope is positive, contradicting the simple “sore equals fatigued equals worse performance” model. See essay for two competing interpretations.

The simple model I had walked into the season with (sore equals fatigued equals worse performance) was wrong in the direction the data pointed. Soreness predicted better, not worse.
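The essay reports directions rather than test statistics, but the analysis itself is not exotic. A minimal sketch of one way to reproduce it, assuming the athlete-weeks are exported long-format with the columns from the earlier sketch (the filename is hypothetical):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical long-format export: one row per athlete-week, columns as in
# the WeeklyRecord sketch above.
df = pd.read_csv("weekly_monitoring.csv")

# Normalise each weekly RSI to the athlete's own season mean, so the
# comparison is within-athlete rather than between fit and less fit players.
df["rsi_rel"] = df["rsi"] / df.groupby("athlete_id")["rsi"].transform("mean")

# Findings 01 and 02: load and general fatigue against jump performance.
print(df[["gps_load", "srpe_load", "fatigue"]].corrwith(df["rsi_rel"]))

# Finding 03: OLS of relative RSI on soreness (the dashed line in Figure 03).
fit = sm.OLS(df["rsi_rel"], sm.add_constant(df["soreness"])).fit()
print(fit.params["soreness"], fit.pvalues["soreness"])
```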

Reading the result against the literature

The null on jump performance versus training load is consistent with Halson 2014. No single marker has reliably tracked fatigue across the literature, and the drop-jump RSI in this design just adds one more entry to that record.

The null on jump performance versus subjective general fatigue needs more care when read against Saw 2016. Saw found subjective measures track training load. My data found subjective general fatigue did not track jump performance. Those two findings are compatible: subjective measures can track load without that load expressing itself in a single neuromuscular test. The jump test and the wellness questionnaire are measuring different layers, and they don't have to move together.

The positive soreness-performance relationship is the result the literature has more to say about.

Why soreness might predict better, not worse

Two interpretations are worth taking seriously.

The first is the training-stimulus interpretation. Lower body soreness is often a marker of recent eccentric or unaccustomed loading. The same stimulus that produces delayed-onset muscle soreness also drives neural adaptation and protein-synthesis signalling. In a squad of well-conditioned athletes who recover well session to session, the players reporting more soreness may be the ones who have just had the highest-quality training. The soreness is the receipt. The performance bump on the next test is the early adaptation. This interpretation lines up with the muscle damage literature: the acute inflammatory response to training is not purely a recovery cost, it is part of the signalling that drives adaptation. Athletes who train hardest, recover well, and report soreness are different from athletes who report soreness because they aren't recovering. In an elite squad with monitored recovery, the first group is doing most of the reporting.

The second is the selection-effect interpretation. Athletes in a well-managed in-season programme are not randomly assigned to training loads. The athletes training hardest are usually the ones healthy and fit enough to handle the highest loads. They are also the ones reporting the most soreness, because they are doing the most work. They jump well because they are the fittest athletes in the squad to begin with. This is harder to disprove with what I had. It is a confound I would address in a re-run.

Both interpretations point in the same practical direction. In a well-managed elite squad, soreness is a poor proxy for "this athlete needs to back off." It may even be a positive signal that the programme is working.

What this changed in practice

A neuromuscular test does not, on its own, tell me whether to load an athlete on a given day. The drop-jump RSI in this design did not predict what I wanted it to predict. I still run it weekly because the aggregate is informative, but it doesn't drive an individual yes-or-no.

The subjective measures get more weight in my call now, in line with Saw 2016. A player reporting low across multiple dimensions of the wellness questionnaire, over multiple days, carries more weight than any single jump score.
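That weighting can be made explicit as a simple rule over each athlete's own norms. A hypothetical sketch of the kind of flag I mean, not the logic running in our AMS:

```python
import pandas as pd

# Illustrative wellness columns, assumed scored so higher = better;
# invert any raw dimension (e.g. soreness) that runs the other way first.
DIMS = ["fatigue", "soreness", "sleep_quality"]

def review_flags(daily: pd.DataFrame, z_cut: float = -1.0, min_dims: int = 2) -> pd.Series:
    """True where an athlete sits below z_cut SDs of his own season norm on
    at least min_dims dimensions two days running. A prompt for a
    conversation, not an automatic load decision."""
    daily = daily.sort_values(["athlete_id", "date"])
    grouped = daily.groupby("athlete_id")[DIMS]
    z = (daily[DIMS] - grouped.transform("mean")) / grouped.transform("std")
    low_today = (z < z_cut).sum(axis=1) >= min_dims
    low_yesterday = low_today.groupby(daily["athlete_id"]).shift(1, fill_value=False)
    return low_today & low_yesterday
```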

Soreness specifically gets treated with more scepticism as a "back off" signal, especially in well-conditioned senior players. Standalone soreness as a trigger for load reduction isn't backed by what the data showed.

I am also more careful about what I tell coaches I can and cannot predict. The honest answer: no single measure tells me with confidence whether a given player will perform on Saturday. The combination of imperfect signals, weighted by context, is the best available. Saw 2016 is right that subjective measures carry more of the load than coaches assume. Halson 2014 is right that no single marker delivers.

If I ran it again

The first change would be a within-athlete repeated-measures design rather than cross-sectional, comparing each player to his own historical baseline week to week. That gets closer to the question the test was asking and partially controls for the selection effect.
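The standard tool for that design is a mixed-effects model with a random intercept per athlete, so the soreness coefficient is estimated from week-to-week variation within players rather than differences between them. A sketch using statsmodels, under the same hypothetical export as above:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weekly_monitoring.csv")  # same hypothetical export as before
df["rsi_rel"] = df["rsi"] / df.groupby("athlete_id")["rsi"].transform("mean")

# Random intercept per athlete: the soreness coefficient is now estimated
# from within-athlete, week-to-week variation, which is closer to the
# question the test was asking and partially controls the selection effect.
model = smf.mixedlm("rsi_rel ~ soreness + fatigue + srpe_load",
                    data=df, groups=df["athlete_id"]).fit()
print(model.summary())
```

Person-mean-centring soreness would go one step further, splitting its within-athlete and between-athlete components explicitly.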

The second would be extending the subjective questionnaire to include the items Saw flagged as most sensitive: mood, perceived stress, sleep quality. What I used was lighter than the literature suggests is optimal.

The third would be running a parallel objective measure with stronger literature support as a load tracker. Session-RPE (Foster 1998) is almost free. External load via GPS is what we already had. Pairing those with the subjective set across a full season on the same 50-athlete sample would produce something a lot more useful than what I ended up with.

The lesson I keep coming back to generalises past this test. The relationships you assume exist in monitoring data often don't. The ones that do exist sometimes run in the opposite direction. The athletes are more variable, more individual, more interesting than any one instrument captures.

The honest version of athlete monitoring is that you watch a number of imperfect signals, you weight them with context, you talk to the player, and you make a call. The certainty implied by a green-amber-red dashboard is mostly an illusion. The dashboard helps. It doesn't decide.

References

Cormack, S. J., Newton, R. U., McGuigan, M. R., & Cormie, P. (2008). Neuromuscular and endocrine responses of elite players during an Australian rules football season. International Journal of Sports Physiology and Performance, 3(4), 439–453.

Foster, C. (1998). Monitoring training in athletes with reference to overtraining syndrome. Medicine & Science in Sports & Exercise, 30(7), 1164–1168.

Halson, S. L. (2014). Monitoring training load to understand fatigue in athletes. Sports Medicine, 44(Suppl 2), S139–S147.

McLean, B. D., Coutts, A. J., Kelly, V., McGuigan, M. R., & Cormack, S. J. (2010). Neuromuscular, endocrine, and perceptual fatigue responses during different length between-match microcycles in professional rugby league players. International Journal of Sports Physiology and Performance, 5(3), 367–383.

Saw, A. E., Main, L. C., & Gastin, P. B. (2016). Monitoring the athlete training response: subjective self-reported measures trump commonly used objective measures: a systematic review. British Journal of Sports Medicine, 50(5), 281–291.
