Alan Aragon's Research Review, May 2014
Copyright May 1st, 2014 by Alan Aragon Home: www.alanaragon.com/researchreview Correspondence: [email protected]
2 Optimizing activity-based fat loss for aesthetic
athletes: Interval or steady-state training?
By Joel Minden, PhD, CSCS
5 How to manipulate research.
By James Heathers, PhD(c)
12 Changes in exercises are more effective than in loading schemes to improve muscle strength [reviewed by Brad Schoenfeld, PhD, CSCS, CSPS, FNSCA].
Fonseca RM, Roschel H, Tricoli V, de Souza EO, Wilson JM, Laurentino GC, Aihara AY, de Souza Leão AR,
Ugrinowitsch C. J Strength Cond Res. 2014 May 14. [Epub
ahead of print] [PubMed]
14 The effects of consuming a high protein diet (4.4 g/kg/d) on body composition in resistance-trained individuals.
Antonio J, Peacock CA, Ellerbroek A, Fromhoff B, Silver T. J Int Soc Sports Nutr. 2014 May 12;11:19. [PubMed]
16 An amino acid-electrolyte beverage may increase
cellular rehydration relative to carbohydrate-electrolyte and flavored water beverages.
Tai CY, Joy JM, Falcone PH, Carson LR, Mosman MM,
Straight JL, Oury SL, Mendez C, Loveridge NJ, Kim MP,
Moon JR. Nutr J 2014, 13:47 doi:10.1186/1475-2891-13-47
[PubMed]
17 Calorie shifting diet versus calorie restriction diet:
a comparative clinical trial study. Davoodi SH, Ajami M, Ayatollahi SA, Dowlatshahi K,
Javedan G, Pazoki-Toroudi HR. Int J Prev Med. 2014
Apr;5(4):447-56. [PubMed]
19 Processed foods - are they really that bad for you? By Chris & Eric Martinez
22 How can you get through to people who *think*
they understand the science behind a certain topic? By Alan Aragon
Optimizing activity-based fat loss for aesthetic athletes: Interval or steady-state training?
By Joel Minden
__________________________________________________
For aesthetic athletes, such as dancers, gymnasts, and
bodybuilders, managing body mass and composition is just as
important as sport-specific training. At a selected body weight,
fat mass should be minimized, and dietary strategies, such as
caloric restriction or macronutrient manipulation, are frequently
used to achieve this. For those who prefer to emphasize activity-
based methods to reduce body fat, the optimal strategy is
unclear. Although increasing activity to create a negative energy
balance should be the primary goal, there is considerable debate
concerning the differential effectiveness of interval versus
steady-state training. Perhaps the lack of consensus is due to the
fact that empirical research in this area is compromised by
methodological limitations and an inability to control, either
physically or statistically, for the numerous contextual variables
that cloud interpretability.
For example, research on acute metabolic responses to exercise
is sometimes criticized for the artificiality of the experimental
setting, limited time course, and uncertain relation of measured
variables (i.e., substrate utilization, gas exchange, plasma, and
biopsy data) to long-term changes in body composition.
Similarly, research on chronic responses to exercise has its own
set of limitations: individual differences in protocol compliance,
nonexercise activity, and dietary behavior; unknown accuracy of
subjects' record keeping; and questionable reliability of
instruments used to track changes in body composition. Finally,
both acute and chronic outcome data should be interpreted
within the context of participant variables, including
demographic characteristics and fitness levels, and dimensions
of training protocols, such as modality, intensity, duration, and
frequency of exercise. In light of these factors, it's no surprise
the efficacy debate continues.
Despite the many challenges to interpretability, consistencies in
the literature can be identified, and tentative conclusions can be
made by directly comparing the effects of multi-week interval
and steady-state training programs on body mass and
composition. Given the enthusiasm for interval training in both
scientific and popular media, it's somewhat surprising that these
direct comparisons are limited. In the following section, I'll
present the results of these studies. For ease of interpretation,
data on strength training or diet-only conditions will not be
reported, nor will metabolic or cardiovascular outcome data.
Studies that compared interval training to no-exercise controls or
those that combined interval with steady-state training will also
be excluded. In all studies, interval training sessions, unless
otherwise noted, included 4 to 15 work intervals performed for
15 to 240 seconds, with each repetition followed by low- to
moderate-intensity periods of active recovery for up to 4
minutes.
The Research
In perhaps the earliest direct comparison, Thomas et al1 assigned
recreationally active male and female college students to steady-
state or interval running programs matched for energy
expenditure, 500 kcal per session. Exercise bouts were
performed 3 times per week for 12 weeks. After statistically
controlling for pre-intervention differences in body composition
(assessed through hydrostatic weighing), the data revealed that
subjects in both conditions experienced a reduction in body fat
percentage. There were, however, no differences between the
exercise conditions.
Following the emergence of research by Tremblay et al,2 steady-
state endurance training as a fat loss strategy was dismissed by
many as inferior to intense but brief interval training. In this
classic study, adults with no previous exercise history completed
either a 20-week endurance training program or a 5-week
endurance training program followed by 15 weeks of interval
training bouts that varied in duration and intensity.
Heralded as a breakthrough study two decades ago, the results
appeared to demonstrate a paradoxical advantage of brief
interval work for fat loss despite an energy cost well below that
of endurance training. Although frequently noted for its finding
that subcutaneous fat loss was ninefold greater for those in the
interval condition, this estimate was made after statistically
correcting for the energy cost of each type of exercise. When
actual fat loss between the two conditions was compared, the
difference was nonsignificant. Other aspects of this heavily cited
study make firm conclusions about fat loss differences by
protocol difficult: the undetermined reliability of skinfold data,
the inclusion of an endurance training component (25 30-minute
sessions) to the interval training program, and no control for
dietary behavior.
Years after the release of this promising study, additional
evaluations of interval training began to emerge, the bulk of
which failed to demonstrate any reliable advantage of interval
training. For example, in Tjønna and colleagues' 16-week study
of metabolic syndrome patients,3 subjects exercised on inclined
treadmills, and work volume for the interval and endurance
conditions was equivalent. Both groups experienced reductions
in weight, BMI, and waist circumference, but no differences
between the groups were observed.
Trapp et al4 compared fat loss outcomes of a 20-minute interval
program and a 40-minute steady-state program, both performed
by young adult women on cycle ergometers 3 times per week for
15 weeks. Despite the difference in duration of exercise bouts,
estimated energy expenditure over the study period for the two
groups was equivalent. This was achieved by having subjects in
the interval condition perform 60 8-second intervals, followed
by 12-second recovery periods, in each session. The interval
training group, but not the steady-state group, experienced a
reduction in DEXA-measured fat mass (~2.5 kg) at the
completion of the study. This apparent intervention effect must,
however, be interpreted with caution due to pre-existing group
differences. At the beginning of the study, the mean fat mass for
the interval group was 3.8 kg greater than that of the steady-state
group, and follow-up analyses revealed that approximately of
the variance in fat loss was accounted for by level of body fat at
the beginning of the study.
Schjerve et al5 compared fat loss responses in obese adults to 12
weeks of interval or steady-state treadmill training performed 3
times per week. Conditions were equalized for energy
expenditure. Both groups experienced similarly small but
significant reductions in weight, BMI, and body fat percentage.
There were no differences between the conditions in these
outcomes.
Wallman et al6 examined the effects of 8 weeks of interval or
steady-state training performed by overweight and obese men
and women 4 times per week on a cycle ergometer. Energy
expenditure between the two conditions was equivalent. The
results yielded nonsignificant reductions in weight or fat mass
for both conditions.
Perhaps the greatest support for fat loss benefits of interval
training comes from MacPherson et al.7 In this study,
recreationally athletic college-aged men and women performed 3
weekly sessions of sprint interval training or steady-state
running for 6 weeks. Both groups experienced significant
reductions in body fat percentage and fat mass, as well as small
increases in lean mass. Although the interval group experienced
a larger total decrease in fat mass (1.7 kg vs. 0.8 kg), the
difference between the conditions was nonsignificant. In contrast
to the methods used in the aforementioned studies, MacPherson
et al. did not attempt to equalize work or energy expenditure,
which makes the difference in total exercise time across the
study period (13.5 and 0.75 hours for the steady-state and
interval conditions, respectively) noteworthy. Nevertheless,
subjects in the interval condition were encouraged to engage in
active rest on the treadmill for 4 minutes following each of
their maximal effort sprints, which resulted in a total activity
time commitment of 6.75 hours.
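The time figures reported for MacPherson et al. can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above (3 sessions per week for 6 weeks, 13.5 and 0.75 hours of exercise, 4 minutes of active rest per sprint, 6.75 hours total). The sprint count and sprint duration it derives are inferences from those totals, not values stated in this article, and they assume that every sprint was followed by a full 4-minute rest.

```python
# Figures quoted in the text above.
sessions = 3 * 6                  # 3 weekly sessions for 6 weeks
steady_hours = 13.5               # total steady-state running time
sprint_hours = 0.75               # total time spent sprinting
interval_total_hours = 6.75       # sprinting plus active rest
rest_minutes_per_sprint = 4

# Derived values (inferences, assuming a 4-minute rest after every sprint).
rest_hours = interval_total_hours - sprint_hours
n_sprints = rest_hours * 60 / rest_minutes_per_sprint
sprint_seconds = sprint_hours * 3600 / n_sprints
sprints_per_session = n_sprints / sessions
steady_minutes_per_session = steady_hours * 60 / sessions

print(f"sprints over the study: {n_sprints:.0f}")
print(f"implied sprint duration: {sprint_seconds:.0f} s")
print(f"average sprints per session: {sprints_per_session:.0f}")
print(f"steady-state minutes per session: {steady_minutes_per_session:.0f}")
```

Under these assumptions the interval group still committed half as much total time as the steady-state group (6.75 vs. 13.5 hours), which is a much smaller gap than the 18-fold difference in work time alone suggests.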
Recently, Keating et al8 compared fat loss outcomes for
overweight adults randomly assigned to either an interval or
steady-state cycle ergometer program. Both groups performed
exercise 3 days per week for 12 weeks. There was a significant
decrease in DEXA-measured body fat percentage for the steady-
state (-2.6%) but not the interval (-0.3%) group. The absence of
change for the interval group is somewhat unexpected, given
that the aforementioned studies found equivalent effects for the
two types of training. The authors indicated that this result may
be partially explained by the use of interval training bouts that,
to protect this clinical population, were less intense than those
used in previous studies. However, a comparison of protocols
shows the intensity of interval training for the Keating et al.
subjects (~120% of VO2peak and ~90% of maximal heart rate)
was consistent with those used in other studies (e.g., Schjerve,
Tjønna, Wallman and their colleagues) of overweight or obese
subjects.
An alternative explanation is that unmeasured subject variables
contribute to responsivity to exercise. Graphs from the Keating
et al. study show considerable within-group variability for both
exercise groups in body fat percentage change. In fact, some
subjects in both conditions actually gained body fat. This
highlights the importance of going beyond the aggregate data to
search for individual differences that distinguish responders
from non-responders.
Conclusion
Collectively, the data reveal that interval training offers no
reliable advantage over steady-state endurance training for fat
loss. In addition, the effectiveness of interval training is more
likely to be demonstrated when work or energy expenditure is
matched to that of steady-state protocols. This suggests that, in
spite of any acute metabolic or cardiovascular benefits of
interval training, intense but brief exercise is insufficient for
stimulating meaningful fat loss. This was indirectly highlighted
in Boutcher's recent review of research in this area.9 Of the 6
interval training studies in which fat loss outcomes were
identified, the two cited (Boudou et al,10 Mourier et al11) for
demonstrating the strongest effects included 2 days per week of
45-minute steady-state training bouts in a program with only one
interval-training day each week.
Regarding application, assuming energy intake is regulated,
activity-based fat loss programs should prioritize energy cost of
exercise and activity preference. For athletes already involved in
frequent and intense sport-specific training, activities that have a
negative impact on quality of practice and competitive
performance should be avoided. If interval training results in
poor program compliance, fatigue, overeating, and reduced daily
activity, alternative strategies should be explored. For aesthetic
athletes, a realistic fat-loss strategy might involve small dietary
changes combined with low- to moderate-intensity exercise,
such as uphill walking at a comfortable pace, performed for an
extended duration. In sum, although intense interval training has
value to the athlete, it may not be the best option for fat loss. In
the larger context of athletic training, a moderate, comfortable
approach offers the greatest chance for success.
____________________________________________________
Joel Minden, Ph.D., CSCS, is a lecturer in the psychology and
kinesiology departments at
California State University, Chico.
He writes about strength and
conditioning, nutrition, sport
psychology, and dance for his
website www.joelminden.com.
____________________________________________________
References
1. Thomas, T. R., Adeniran, S. B., & Etheridge, G. L. (1984). Effects of different running programs on VO2 max, percent fat, and plasma lipids. Canadian Journal of Applied Sport Sciences, 9(2), 55-62. [PubMed]
2. Tremblay, A., Simoneau, J. A., & Bouchard, C. (1994). Impact of exercise intensity on body fatness and skeletal muscle metabolism. Metabolism, 43(7), 814-818. [PubMed]
3. Tjønna, A. E., Lee, S. J., Rognmo, Ø., Stølen, T. O., Bye, A., Haram, P. M., Loennechen, J. P., Al-Share, Q. Y., Skogvoll, E., Slørdahl, S. A., Kemi, O. J., Najjar, S. M., & Wisløff, U. (2008). Aerobic interval training versus continuous moderate exercise as a treatment for the metabolic syndrome: a pilot study. Circulation, 118(4), 346-354. [PubMed]
4. Trapp, E. G., Chisholm, D. J., Freund, J., & Boutcher, S. H. (2008). The effects of high-intensity intermittent exercise training on fat loss and fasting insulin levels of young women. International Journal of Obesity, 32(4), 684-691. [PubMed]
5. Schjerve, I. E., Tyldum, G. A., Tjønna, A. E., Stølen, T., Loennechen, J. P., Hansen, H. E., Haram, P. M., Heinrich, G., Bye, A., Najjar, S. M., Smith, G. L., Slørdahl, S. A., Kemi, O. J., & Wisløff, U. (2008). Both aerobic endurance and strength training programmes improve cardiovascular health in obese adults. Clinical Science, 115(9), 283-293. [PubMed]
6. Wallman, K., Plant, L. A., Rakimov, B., & Maiorana, A. J. (2009). The effects of two modes of exercise on aerobic fitness and fat mass in an overweight population. Research in Sports Medicine, 17(3), 156-170. [PubMed]
7. Macpherson, R. E., Hazell, T. J., Olver, T. D., Paterson, D. H., & Lemon, P. W. (2011). Run sprint interval training improves aerobic performance but not maximal cardiac output. Medicine and Science in Sports & Exercise, 43(1), 115-122. [PubMed]
8. Keating, S. E., Machan, E. A., O'Connor, H. T., Gerofi, J. A., Sainsbury, A., Caterson, I. D., & Johnson, N. A. (2014). Continuous exercise but not high intensity interval training improves fat distribution in overweight adults. Journal of Obesity, 2014. [Journal of Obesity]
9. Boutcher, S. H. (2010). High-intensity intermittent exercise and fat loss. Journal of Obesity, 2011. [Journal of Obesity]
10. Boudou, P., Sobngwi, E., Mauvais-Jarvis, F., Vexiau, P., & Gautier, J. F. (2003). Absence of exercise-induced variations in adiponectin levels despite decreased abdominal adiposity and improved insulin sensitivity in type 2 diabetic men. European Journal of Endocrinology, 149(5), 421-424. [PubMed]
11. Mourier, A., Gautier, J. F., De Kerviler, E., Bigard, A. X., Villette, J. M., Garnier, J. P., Duvallet, A., Guezennec, C. Y., & Cathelineau, G. (1997). Mobilization of visceral adipose tissue related to the improvement in insulin sensitivity in response to physical training in NIDDM: effects of branched-chain amino acid supplements. Diabetes Care, 20(3), 385-391. [PubMed]
How to manipulate research.
By James Heathers
_________________________________________________________________
Most of the audience for this article probably pays attention to
the broader scientific literature in exercise and musculoskeletal
physiology, strength and conditioning, nutrition, dietetics and
sports medicine. From this, you take the available evidence and
you slot it somewhere into an available framework of what's
already known. This, everyone is familiar with.
What people generally don't know is how to cheat.
Yes, cheat. Let me outline why you would: the way academic
funding presently works is that, in general, output is rewarded
over insight; two papers are better than one. So, the more you
write, the better off you're going to be. There are many problems
with this, and with the environment it creates. One of those problems
is that it becomes very tempting to 'massage' results from
different research projects in order to achieve reportable
outcomes. I should mention here that the majority of the time
this isn't actually dishonesty; it's the fact that researchers have
convinced themselves that they've asked a good question, and
that if they just change a few key variables in the analysis and
reporting, suddenly they'll have the result that they know is
there. And when that result turns up, it was because the initial
analysis was wrong.
Unfortunately, science doesn't work like that. Much of my
academic work is in dealing with problems surrounding this
issue; I am a methodologist. This means I concentrate heavily on
how research should be conducted: essentially, research into
research. Methodologists develop new techniques in analysis,
and verify that old ones work in the manner we hope they do.
Think of the production of knowledge via academic outcomes as
a game of poker. Research, like poker, is an expensive,
stochastic process full of frustration, late nights, and alcohol,
but, also like poker, eventually if you're good enough, the
balance of probabilities favours you winning. Insight, like money,
is hard won.
This process comes to a scrunching halt when someone starts to
obscure the honest truth of what happened in a study, because
there is no skill or reason that can be applied. You literally can't
win, because the odds of something being supportable or
repeatable are being manipulated.
However, just like poker, there are 'tells': certain signs which
allow you to detect another process at work. This is a partial list
of those tells, illustrated with examples drawn liberally from the
medical and social sciences. I've tried to use exercise science and
nutrition studies where convenient, but the principles are the
same regardless; often I've simply chosen the most convenient
examples that have come to mind. Please bear in mind I don't
think these papers are guilty of any kind of conscious
dishonesty; they are merely convenient examples of the
principles involved.
This list is not comprehensive and is in no particular order: some of them work over time across different papers, some of
them are specific to individual papers. These errors are both
common and uncommon, both serious and trivial. They have
various degrees of culpability (likely intent to deceive),
significance (the ability to influence the outcome of the study
overall), and detectability (how easy it is to spot from the article
text). All have the potential to be dishonest.
1. Altered endpoints, timepoints or measurement criteria
Murphy et al1 investigated the effect of beetroot consumption on
running performance, on the basis of beetroot's nitrate content.
Eleven participants (n=11) received a supplement of either beetroot
(standardised to contain 500mg nitrates) or cranberry puree, in a
double-blind cross-over fashion. Their heart rate, perceived
exertion and time to completion of a 5km run were recorded. In the first mile,
participants rated their perceived exertion significantly higher in
the cranberry condition. In the final 1.8km, participants were
significantly faster in the beetroot condition.
Why was a 5km run broken into miles (i.e., 0-1.6km, 1.6-3.2km, 3.2-5km)?
There are an infinite number of ways to divide a time interval
into pieces. This analysis could have been performed as single
kilometre intervals, or using a simple statistical model which
predicts the overall effect of time through the race on exertion,
and the overall difference between the groups by time. There is
no reason to use a unit of measurement invented by the ancient
Romans and formally defined in 1593.
Researchers are well aware that trying different assortments of
time intervals can uncover differences between timepoints due
to random variation. Say we split the data into 100m intervals:
there are now 50 separate comparisons over the 5km where we
can analyse the difference between our Beetroot and Cranberry
groups. We are essentially making so many comparisons that
one is likely to come up significant simply due to the noise
present in the measurement.
(Of course, there are methods for statistically controlling
multiple comparisons2 but researchers don't report all the
comparisons they used... in this case, the reader doesn't know
that these multiple comparisons need to be controlled.)
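The 100m example can be put into numbers. The following is a minimal sketch, assuming the 50 comparisons are independent and each is tested at a 0.05 threshold (neither assumption comes from the Murphy et al. paper; neighbouring race segments are in reality correlated, so treat the result as a rough illustration). It computes the chance of at least one false positive and the per-test threshold a Bonferroni correction would require.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # 3 mile-based segments, 5 one-kilometre segments, 50 hundred-metre segments.
    for n in (3, 5, 50):
        print(f"{n:>2} comparisons: "
              f"P(at least one false positive) = {familywise_error_rate(n):.3f}, "
              f"Bonferroni threshold = {bonferroni_threshold(n):.4f}")
```

With 50 comparisons, a spurious "significant" segment is the expected outcome rather than the exception, which is why unreported comparisons matter so much.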
The other extreme is also a problem. Say we analyse the dataset
only over the whole 5km, but beetroot consumption improved
the finishing speed of the run. This would be a highly significant
finding, as we know that in middle/long distance races there is
already a pattern between laps or race phases (e.g., Tucker et
al3).
Culpability: low to medium
Significance: medium
Detectability: high
2. Conveniently one-sided significance testing
3. Methodological fiddles
(These are not always associated, but they've been so neatly combined in a paper from a few years ago that I've put them together here.)
Christian et al4 enrolled n=279 in a computer-support program
for weight loss at an American public hospital. All participants
were suffering from metabolic syndrome. Participants were
given a full health and blood screening 12 months apart, and
were assigned to either a computer-based tailored lifestyle
intervention or a standard package of information on weight
management. Participants were more likely to lose weight in the
intervention vs. control group (-3.3lbs vs 0.33lbs, p=0.002).
Participants who lost more than 10% BW had lower total
cholesterol (-14.9 vs -3.9, p=0.05) which appears to be driven by
the loss of LDL cholesterol (-14.0 vs. -4.1, p=0.04).
Why was the outcome of the program determined by a group of people who lost 5% of bodyweight which included BOTH members of the intervention and the control group?
This is the methodological fiddle: there were n=46 participants
who lost more than 10% BW, and n=11 of them were from the
control group (and thus n=35 from the intervention). These were
lumped together to create the impression that the program was
effective. This is hardly the most honest conclusion when about
a quarter of the people with significant weight loss were from the
control group; it would be more true to say here that people
with metabolic syndrome who lose weight improve their serum
lipids regardless of how they do it. This is hardly evidence in favour of the intervention.
The other fiddle is staring us in the face from the p-values
above.
Why was the above difference assessed with a one-sided t-test?
As we're all probably aware here, the p-value is the calculated
probability of getting a result at least as extreme as the one
observed if the null hypothesis is true. In this case, the null
hypothesis is that the standard intervention and the
computer-based intervention were identical. We accept that
when a result is sufficiently unlikely to have occurred on this
basis, the experimental hypothesis is true; in other words, that
our intervention has actually intervened.
One-tailed statistical tests assume that the effect will have a
direction (i.e. A will be higher than B). For a symmetric test such
as the t-test, and an effect in the predicted direction, the
one-tailed p-value is half the two-tailed one. This gives you twice
the statistical flexibility you would otherwise have in a two-tailed test.
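To see how much difference the choice of tails makes, here is a small sketch. It uses a normal (z) approximation rather than the t-distribution the paper's tests would have used, and the test statistic of 1.8 is a made-up value chosen for illustration; neither comes from Christian et al.

```python
from statistics import NormalDist


def p_values(z: float):
    """Return (one_sided, two_sided) p-values for a z statistic.

    The one-sided test here predicts a positive effect (A higher than B).
    """
    standard_normal = NormalDist()
    one_sided = 1.0 - standard_normal.cdf(z)
    two_sided = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    return one_sided, two_sided


if __name__ == "__main__":
    one_sided, two_sided = p_values(1.8)  # hypothetical test statistic
    print(f"one-sided p = {one_sided:.3f}")
    print(f"two-sided p = {two_sided:.3f}")
```

The same statistic falls below 0.05 one-sided and above it two-sided. By the same logic, one-sided values of p=0.05 and p=0.04 like those quoted above would correspond to roughly 0.10 and 0.08 under a two-sided test, assuming a symmetric test statistic.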
There are a few situations where one-tailed tests are necessary.
Firstly, when we have a strong directional hypothesis: good
evidence that our intervention should be better than the control
group. In this case, we do not: the researchers mention previous
work with the same intervention being only somewhat effective
in a diabetic sample. Secondly, when we are using very few
t-tests to compare different values. In this case, we do not: the
researchers have around twenty individual tests.
However, these are not hard and fast rules, and researchers often
have another rule of thumb which simply goes like this: one-tailed
tests are what you use when you're trying to get something
to achieve a criterion of significance when it hasn't quite made it.
They have traditionally found refuge in questionable results, and
as we've just discussed, they're being used here to assess the
difference between 'did lose weight' and 'didn't lose weight'
regardless of group. A classic fiddle, and one the reviewers
really should have spotted.
Conveniently one-sided tests.
Culpability: medium to high
Significance: low to medium
Detectability: high
Methodological fiddles
Culpability: medium
Significance: medium
Detectability: medium to high
4. Overly complicated or uninterpretable models
Another rather impressive looking technique is to take individual
measures which are quite complicated and roll them into a
model far more complicated than the average reader can
understand. Social scientists try this much more than exercise
physiologists, in my experience. But it does occur.
A recent paper5 studied the split times of 2 world-record
marathon runs, most recently Patrick Makau's Berlin Marathon (2011) which was a scarcely believable 2 hours, 3 mins and 38
seconds. It describes several different curve fits possible to these
runs, combines headwind and gradient data with individual
kilometre time splits, and tries to find an optimal model or
pacing strategy.
Towards the conclusion it states:
"Oscillations at the micro-level overlay low-frequency, macro-level oscillations or modes indicating that an athlete's resulting pacing trace represents a potentially complex amalgam of numerous signalling processes emanating from the brain, each with their own activation frequency."
Of course, concluding that the best ever marathon times are
employing highly sub-optimal pacing strategies seems wildly
implausible. Because of the extraordinary amount of
competition over such a long period of time, one might assume
that either a) the best times ever were, in fact, fairly well paced
by definition or b) that an optimal strategy doesn't exist due to
individual differences that are impossible to predict (a stubbed
toe, a very slightly tight hamstring, a bad night's sleep, a
micro-change in gradient, and so on). An optimal pacing strategy
can't be followed, of course, if it's highly impractical. That is
essentially stating "If X was possible, then it would be better" in
an environment where X can't practically be performed.
Culpability: low
Significance: low to medium
Detectability: high
5. Over-testing, a.k.a. random sifting
I've thought long and hard about how to get you an example of this, and I'm not sure I can. Here's how random sifting works:
We decide to measure the effect of a new training regime of volume squats on short-course track times.
We assign 30 experienced middle-distance runners equally to three groups: no extra training, 1 extra
training day of squats per fortnight, and 3 extra training
days per fortnight.
We take demographic variables to start with (age, gender, race), anthropometry (height, weight, BMI,
body composition), bloods (c-reactive protein, cortisol)
and training readiness (neurological assessment, heart
rate variability).
We take race variables (400m time, 3 times with 5 mins rest between races), and 1500m time (with lap split
times). Participants rate perceived effort and
pain/soreness after each race. Then we run the program
for 6 weeks, test all the above again (mid-line) and test
again at 12 weeks. Naturally, we record the poundages
moved in each session for the two training groups.
Not the worst design ever, right? Comprehensive, detailed?
Wrong. It's dreadful.
This is the most unholy octopus of impossible interlocking
variables you'll ever see. Any one of the above can be used to
control for, or combine with, any other. Variables you add to a
study are not additive: if I measure seven things at one
timepoint, I don't have seven potential comparisons in the data.
I have instead any combination of the presence or absence of
those variables, using a cut-off that I define (or choose from the
literature), or using the top or bottom standard deviation, or all
the values over the mean (or the median) to define groups.
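A quick count shows how fast the options multiply. This is only a sketch: the seven measure names are hypothetical picks from the design described above, and multiplying by four grouping rules (a chosen cut-off, top or bottom standard deviation, above the mean, above the median) is a crude way to count analysis paths rather than a formal result.

```python
from itertools import combinations

# Seven hypothetical measures taken at one timepoint.
measures = ["age", "weight", "BMI", "body_fat", "CRP", "cortisol", "HRV"]

# The four ways of defining groups mentioned in the text.
grouping_rules = ["chosen cut-off", "top/bottom SD", "above mean", "above median"]


def non_empty_subsets(items):
    """Every combination of one or more items."""
    return [combo
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


n_subsets = len(non_empty_subsets(measures))   # equals 2**7 - 1
n_paths = n_subsets * len(grouping_rules)

print(f"single-variable comparisons: {len(measures)}")
print(f"combinations of variables:   {n_subsets}")
print(f"with each grouping rule:     {n_paths}")
```

Seven measures yield 127 combinations before any grouping rule is applied, and that is at a single timepoint with no outcome variables or subgroups included yet.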
With full access to the above information in my hypothetical
study, there are so many ways you can find to combine the
outcomes that the answers you find border on meaningless,
unless the results are statistically very strong. Make no mistake:
if I had the above dataset, I am 100% confident that I could
produce a set of statistical analyses which conclusively showed
that our squat intervention was effective. Even if our squat
intervention did literally nothing, or even made performance worse.
The only trick is to hide all the analyses that didn't work, then
write up the one analysis which worked by pure chance as being
predicted by specific research questions that we started with.
This is formally called post-hoc reasoning and is very hard to
detect. After you test hundreds or thousands of pathways
through the above variables and find that, say, any squat
intervention (1 or 3 sessions per fortnight) is effective on split
times in 1500m but not total times, and reduces perceived effort
but only in men, you then come up with a reason which
specifically addresses why you might find this (and you choose
past literature to reference accordingly).
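Random sifting is easy to simulate. In the sketch below, every outcome is pure noise, so the "intervention" does nothing by construction. Each simulated study compares two groups of 10 runners on 20 outcome measures and keeps only the best p-value. The number of outcomes, the z-test with known variance, and the function names are all assumptions made for illustration, not details of any real study.

```python
import random
from statistics import NormalDist


def smallest_p(n_outcomes: int, n_per_group: int, rng: random.Random) -> float:
    """Best (smallest) two-sided p-value across outcomes that are pure noise."""
    standard_normal = NormalDist()
    standard_error = (2.0 / n_per_group) ** 0.5  # both groups have variance 1
    best = 1.0
    for _ in range(n_outcomes):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        squats = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = sum(squats) / n_per_group - sum(control) / n_per_group
        z = diff / standard_error
        p = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
        best = min(best, p)
    return best


def sifting_success_rate(n_studies: int = 2000, n_outcomes: int = 20,
                         n_per_group: int = 10, alpha: float = 0.05,
                         seed: int = 1) -> float:
    """Share of null studies in which at least one outcome looks significant."""
    rng = random.Random(seed)
    hits = sum(smallest_p(n_outcomes, n_per_group, rng) < alpha
               for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    rate = sifting_success_rate()
    print(f"null studies with a 'significant' finding: {rate:.1%}")
```

Under these assumptions, well over half of the studies of a useless intervention produce something reportable, and that is before subgroups, covariates, or alternative cut-offs are tried.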
The behavioural economist and statistical guru Uri Simonsohn
has a now-classic paper which "conclusively proves" that
listening to the song "When I'm Sixty-Four" actually makes you
younger.6 Obviously, this is a crazy conclusion because a song
can't modify your age, but it is borne out by the analysis that he
conducted simply by hiding all the analyses which didn't work.
He also has a great statement that he encourages reviewers to
send to every paper they peer-review which goes like this:
"I request that the authors add a statement to the paper confirming whether, for all experiments, they have reported all measures, conditions, data exclusions, and how they determined their sample sizes."
In other words, if the researchers have tested hundreds or
thousands of models trying to find a result, they need to report
the fact that they did so. This statement forces researchers to
either a) assent to the statement and upgrade their untrustworthy
analysis to outright fraud or b) admit that over-testing occurred.
The best way of controlling for this rather insidious and
hard-to-detect method is study pre-registration: this is where the
researchers write and publish a formal prediction of their study
outcome before they start the research. It's not a perfect
solution, but it's much better than the alternative.
Culpability: medium to high
Significance: medium to high
Detectability: low
6. The creeping over-extrapolation
This fiddle is a little different to the others, as it involves the
external perception of the study. It's also very common, so common that it took me about 45 seconds to find this example.
The science journalism site sciencealert.com.au ran this rather
bold headline a month or so ago.
"Depression can be detected with a blood test"
Interesting, right? Here's the subheadline, now that it has your attention.
"Doctors may soon be able to diagnose mental illness
with a simple blood test, new research suggests."
Sounds like a breakthrough, right? Not so fast. The title of the
article it's describing is:
"Platelet Serotonin Transporter Function Predicts Default-Mode Network Activity"7
Here's the glossy and rather tortured logic that connects them:
The serotonin transporter protein removes serotonin from
extracellular space. The main method of this is via the
transporter protein on blood platelets. There is also a good
relationship between this platelet uptake and the synaptosomal
uptake (the uptake by areas of the brain).
Separately, there is a relationship between depression and the
activity of the default-mode network in the brain, a coordinated system of activity which is active at rest and seems
to be implicated in receiving and processing information
which is self-referential. It is hypothesised that this network is
disrupted in depressed people, which is the hypothetical source
of intrusive thoughts and poor concentration in depression.
Finally, we know that serotonin is implicated in depression as
serotonin reuptake inhibitors are a frontline treatment for
depression. That is to say, like most psychotropic medication,
they work sometimes in some people. We also are well aware
that while they fairly straightforwardly increase free serotonin
levels this is probably NOT their primary method of action
(otherwise, why would these drugs which raise serotonin in 20
minutes take weeks to start improving mood in depressed
patients?)
But anyway: if we can measure the blood platelet serotonin
reuptake velocity (related to the same function in the brain), it
might be related to the metabolic activity of the brain by the
default-mode network (impaired in depression; serotonin
implicated in function).
So the researchers took a sample of healthy people and found a reasonable relationship between their blood platelet serotonin
uptake and the function of the default-mode network as
measured by blood-oxygen-level-dependent fMRI scan.
And finally, please recognise that the above is itself a simplification.
This is what gives us "depression detected with a blood test".
I understand, of course, that journalism sensationalises
complicated topics like neurobiology. But the obvious caveat to
that is that it really shouldn't simplify something so much that it isn't reasonably true anymore. And why would they do such a thing? Well, partly because it's their job, but also partly because the researchers put out a press release with exactly the same
headline, containing wonderfully compelling but detail-poor
sentences such as "serotonin transporter regulates neural depression networks".
This is creeping over-extrapolation. You start with a result
which, as far as I can tell, is a fairly solid piece of neurobiology
relating brain oxygen level uptake over certain cortical networks
to measured platelet serotonin uptake in the blood. Then you
write a paper discussion and abstract which extrapolates the
results somewhat, talking about what might be possible in future
(if several important caveats are true). About this, you write a
simplified press release which presents the results in a glowing
light and presents those extrapolations as the point of the paper.
Then you let a journalist with no formal science education write
about it.8
I'm including this as an error researchers make because it's the 21st
century, and researchers have an obligation to ensure that
their research is correctly reported. It is common for researchers
trying to justify the external impact of their work in grant applications to collect these lazy, overwhelmingly positive
stories and list them prominently on their CVs. Be cautious of
any academic who is proud of how many newspaper articles are
written about them.
Culpability: medium
Significance: medium to high
Detectability: high
7. Outlier forgetting / 8. Outlier remembering
Again, these are together because they are closely related.
Outlier remembering
Reger et al9 matched a controlled dose of medium-chain
triglyceride (MCT) oil in n=20 Alzheimer's Disease patients, to
see if the presence of blood ketones had an immediate effect on
cognition. I have reproduced the graph of the central result here
on the left: it shows that an increase in performance on a cognitive task was correlated with an increase in blood ketones. Or was it?
That point you can see on the left-hand side of the left-hand
graph with the big arrow represents a participant who performed
much more poorly on the cognitive task after MCT oil than after
placebo. This is the dead-set opposite of what was predicted: a
decrease in performance a few times bigger than the alleged increases in performance observed in other people. There's no good reason for this to happen, and it's both in the opposite of the predicted direction and dramatically in excess of everyone
else's change scores.
Now, there are several tests which determine whether or not a
value is an outlier. Some researchers simply do this by feel, but the more correct way is with a test which compares the value
to the rest of the sample. The most common version of this is
Grubbs' test,10 and this flags that value as being an outlier.
Why was the outlier left in?
When this value is removed, the p-value rises from p=0.02 to
p=0.08, and the r value (the correlation coefficient) falls
from 0.5 to 0.42. In other words, removing it
waters down the impact of the central finding. While it isn't actually a big difference, it does cast doubt on the central
result.11
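As a rough check on those numbers, the sketch below converts each correlation into a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares it with the standard two-sided 5% critical value from a t table. It assumes the second analysis used 19 participants (the original 20 minus the one flagged value), which the text implies but does not state. The function name is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (standard table values).
T_CRIT = {18: 2.101, 17: 2.110}

with_outlier = t_from_r(0.50, 20)     # all 20 participants, df = 18
without_outlier = t_from_r(0.42, 19)  # assumes exactly one point removed, df = 17

print(round(with_outlier, 2), with_outlier > T_CRIT[18])
print(round(without_outlier, 2), without_outlier > T_CRIT[17])
```

On these assumptions the full sample clears the 5% threshold and the trimmed sample falls just short of it, which agrees with the reported move from p=0.02 to p=0.08.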
As you can probably tell from this, outliers being included are
very easy to spot. Even when only the means and standard
deviations of numbers are reported, it's usually obvious when
something is off.
Outlier forgetting
It's hard to find an example of outlier forgetting (the removal of extreme values which disagree with the theory to improve the
central result) for the simple reason that they aren't there to find! There are some sophisticated methods you can try to determine
if there is enough variation in a sample, but until I'm writing for Alan Aragon's Statistical Review, we'll have to let these slide.
Suffice to say, this can be a real problem. If you selectively
remove values which ruin your result, it very quickly runs the
risk of becoming straightforwardly dishonest. This is why I don't
have an example of one; all I have is an example of where someone didn't do it.
You can see a good example of this in a recent paper. Kogan et al12
examined the relationship between heart rate variability (HRV, the same kind we use for athletic monitoring) and depression / social functioning. They found some values which were outliers,
and repeated the analysis with outliers both out and in, and then reported the separate models. This is definitely the honest way to do business: if you're removing values, the fact that you're doing it, what the values are, and what this changed about the
analysis should ALL be reported in the paper.
Remembering:
Culpability: medium
Significance: medium
Detectability: very high
Forgetting:
Culpability: high
Significance: high
Detectability: low
9. 'Cute' covariates
Arai et al13 looked at the inter-relationship between heart rate
variability (the same kind we use for athletic monitoring) and
QT-interval (another metric of health/autonomic outflow which
we get out of the electrocardiogram), with a sea of possible
covariates, in n=150 young participants.
If I criticised everything in this paper which I didn't like, I would bore you more than is strictly necessary and wear my
fingers down to stumps. So let's leave the criticisms like the incorrect use of the analysis of covariance to one side, and just
concentrate on what might be useful for you: how to spot
dodgy covariates.
There are several tells here. Firstly, the presence of a lot of covariates and models for a simple question. Here, 9 measures of
different heart rate indices are compared with seven possible
covariates. As before with our hypothetical squat study, a lot of
possible comparisons is a red flag.
Secondly, the use of covariates which are not statistically
independent. For instance, there are models in the paper which
use BMI in the same model as body fat percentage as measured by impedance. These numbers will obviously be related, and
inter-relationships between these variables complicate our ability
to understand the study outcomes dramatically.
Lastly, the use of broad appeals. The paper justifies adding
covariates of BMI and fat mass into the model because they were
relevant elsewhere (because obese and overweight people often
have impaired HRV, for instance). But the sample Arai et al use
is drawn from Japanese students at a school of medicine: the female sample has a mean BMI of 20.1 and a standard deviation
of 2.1. This means that of the 86 women, it is likely that only one participant, or even absolutely no participants at all, were even overweight (let alone obese). Their comparison paper was drawn
from a sample in Mexico which, as you might be aware, holds the dubious honour of being the world's most obese country.
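A quick way to check that claim, sketched in Python. It assumes BMI in this sample is roughly normally distributed (the paper's raw data are not reproduced here) and uses the conventional cut-offs of 25 for overweight and 30 for obese. The variable names are illustrative.

```python
from statistics import NormalDist

# Reported summary statistics for the female sample.
bmi = NormalDist(mu=20.1, sigma=2.1)
n_women = 86

# Conventional BMI cut-offs in kg/m^2 (an assumption; the paper may use others).
OVERWEIGHT, OBESE = 25.0, 30.0

p_overweight = 1 - bmi.cdf(OVERWEIGHT)   # share expected at BMI 25 or above
p_obese = 1 - bmi.cdf(OBESE)             # share expected at BMI 30 or above

expected_overweight = n_women * p_overweight
p_nobody_overweight = (1 - p_overweight) ** n_women

print(round(expected_overweight, 2))
print(round(p_nobody_overweight, 2))
print(p_obese)
```

On these assumptions the expected number of overweight women is a little under one, the chance that nobody in the sample is overweight is around 40%, and an obese participant is vanishingly unlikely.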
Regression is a complicated topic, and it's very easy to hide dodgy techniques behind a wall of metrics and numbers.
Researchers and reviewers fail to understand the implications of what they're doing with a concerning regularity.
Culpability: very high
Significance: very high
Detectability: low
10. Conflating statistical and practical effects
DeWall et al14 tested n=93 undergraduates on the Intimate
Partner Violence scale and Trait Physical Aggression scales in
two groups, who received either a placebo or an intranasal dose
of the hormone oxytocin and a priming condition where they underwent painful / stressful tasks. The paper strongly concluded
that oxytocin increased intimate partner violence inclinations in
participants who were high in trait physical aggression.
Now, this may be strictly true in the statistical sense: the results are probably calculated correctly. But does "X is mathematically different to Y" have any meaning in this context?
The Intimate Partner Violence scale is a series of charming items
where people are asked to score their likelihood of slapping,
shoving, hitting, kicking etc. their current romantic partner. It is
ranked from 1 ("not at all likely") through to 5 ("extremely likely"), and then averaged. The problem here is that the whole group in the
study had an average of 1.13 (SD = 0.39).
I tried to model this, and it's impossible to predict well (remember that no scores can be below 1). Probably two-thirds
of the entire sample in ALL groups put "not at all likely" for every single possible answer. The entire result could be driven by
some combination of a) the very few people who reported some
vague likelihood of violence, and b) the fact that some of the
groups have no mathematical variability AT ALL: everyone put the same answer. In psychology this is called a floor effect, and it has the potential to make analyses do awfully strange
things.
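One way to see how badly a bell curve fits these numbers, as a minimal sketch: assume (wrongly, which is the point) that the averaged scores were normally distributed with the reported mean and SD, and ask how much of that curve would fall below the minimum possible score of 1. The variable names are illustrative.

```python
from statistics import NormalDist

# Reported whole-sample mean and SD on the 1-to-5 scale.
scores = NormalDist(mu=1.13, sigma=0.39)
FLOOR = 1.0  # lowest possible averaged score

# Share of a normal curve with these parameters that sits below the floor.
impossible_share = scores.cdf(FLOOR)

print(round(impossible_share, 2))
```

Roughly 37% of such a curve lies below the floor, which cannot happen. The real scores must instead be piled up at 1 with a thin tail of higher values, and that is exactly the situation in which tests built on normality start to misbehave.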
As this is a social science example, let's cast the same scenario into a hypothetical exercise science study:
Say we have a new supplement which is designed to decrease
post-exercise pain. N=80 participants firstly take either our
supplement or a matched placebo, then all perform a high-
volume high-intensity deadlift program, doing sets of 85% 1RM
to concentric failure, and then 80%, 75%, etc. until total
concentric failure with 50% 1RM is reached. They then rate their
lower back and hamstring pain 48 hours after exercise. More or
less everyone writes 10 ("I am in the maximum amount of imaginable exercise discomfort"), mainly because this is an insane protocol which shouldn't be attempted. But a few people in our supplement group write 8 ("I am in a very, very large amount of pain").
Now, can we accept that this is a meaningful difference? Well,
with hundreds and hundreds of participants, maybe. But it is far
more likely to be semantic: we hurt our participants very badly and seem to only be fiddling at the margins of the value of
interest. What we were looking for was the absence of pain, and not the presence of very slightly less.
Our domestic violence questionnaire is the other way around: statistical significance or not, the change from, say, "extremely unlikely" to "quite unlikely" may not be particularly useful at telling us about actual aggressive tendencies.
Culpability: high
Significance: low to medium
Detectability: high
Bonus: Making up data
I have to include this, although it isn't really a manipulation in the way other things are: it's fraud! Shang and Hasenberg15 investigated the effect of exercise training subsequent to Roux-en-Y
gastric bypass (i.e. stomach surgery). N=60 morbidly
obese participants were randomised to receive either once or
twice-weekly exercise training. Significantly more body weight
and fat mass was lost in the multiple-exercise group, who also
showed significant improvement in co-morbidities.
The problem here is that none of this actually happened.
Someone from either the hospital or associated research group
noticed that in the location where the data was reported from
only n=21 patients had actually undergone any procedure at all
in the period the paper was written over: the data, as it stood, couldn't exist! On questioning, Dr. Shang couldn't produce any of the raw data and had no answer for where it had come from.
Naturally, this paper has been retracted.
Culpability: very high
Significance: very high
Detectability: very low
Conclusions:
Please keep in mind firstly that researchers aren't science-robots from an alternate dimension; they're people. They're people with children and mortgages, and research programs which have to work out so they can continue to be funded, in highly
competitive jobs, often competing against people who are
willing to bend publication requirements to look better.
Research isn't by any means a hotbed of fraud and deceit.
That being said, researchers (even from famous and venerable institutions) can also be stunningly ignorant of the sub-structure
of the research methodology they need to understand, can make
basic mistakes in analysis, can deceive themselves, and can
cheat, manipulate or defraud the process of producing scientific
knowledge.
The thing that we have in our favour in trying to ascertain the
presence of the above is that science is the pursuit of knowledge
on the public record. Anything that's fiddled, or dishonest, or under-handed, or incorrect, can only ever be hidden in plain
sight, and in general the ideas that everyone agrees are the most
important receive the most scrutiny. This might sound laudable,
but it is anything but straightforward. Progress lurches along
quite slowly. There are a few things that you, the interested
reader (or perhaps peer-reviewer) can do to help, and to satisfy
your own curiosity.
1. Contact the researchers. Ask for data.
Researchers, in general, like to talk about their work. Generally
the person who is on the paper as the corresponding author is the right person to ask about it. However, be aware when the last author in the list of authors is listed as corresponding: this generally means the most senior person on the project is also the
person you're contacting, who is also often the busiest.
(In situations like this, I generally Google the first author and
ask them if they can help...)
Researchers can be notoriously precious about sending their data
to other people. This isn't just because they're afraid of scrutiny or persecution (they often are). It's also because data files can be a complete mess after the completion of a study, in three
different files (with different versions) only comprehensible to a
co-author, and squirreled away on a university server with a
password known only to the research assistant who quit 9
months ago. What you're asking could represent a big investment of time on the part of the researchers. But you can
always ask.
2. Support efforts to put data in the public domain
This is a big component of what's called "open science": the trend towards publishing datasets with experiments, as well as
analytical tools etc. that are used. Remember that people who do
this are extending what until now has been a privilege, which is
the ability to look under the hood of how a study works. I feel strongly that researchers who publish data earn an extra degree
of trust.
3. Post on PubPeer or PubMed Commons
These are both websites where you can leave comments for the
public record on published research. If you want answers for
questions that you have, they are very useful. To get access, I
believe you need either an academic email address (i.e. one from
a tertiary institution) or an invitation from an existing user.
4. Start a conversation
A few years ago, I was very amused when Alan was arguing
with Dr. Robert Lustig of "sugar is evil" fame, and was told
rather huffily that academics do not have head-to-head
confrontations on blogs, social media, forums, etc. I was amused
because they damn well do, all the time, and at great volume. There are plenty of outlets for legitimate questions about
research which aren't the old, formal methods. If you know someone with a public blog, ask them to start a conversation for
you. Or start one yourself. Invite the researchers to comment.
Remember with all of the above to be courteous and show
interest, rather than trying to storm the ramparts. Everyone is
looking for answers, but some are looking better than others.
____________________________________________________
James is just about to finish a
PhD in cardiac electrophysiology.
In his spare time, he breaks
things for money. Everything else
you need to know is here:
jamesheathers.com
____________________________________________________
References:
1. Murphy, M., K. Eliot, et al. (2012). "Whole beetroot consumption acutely improves running performance." J Acad
Nutr Diet 112(4): 548-552. [PubMed]
2. http://en.wikipedia.org/wiki/Bonferroni_correction
3. Tucker, R., M. I. Lambert, et al. (2006). "An analysis of
pacing strategies during men's world-record performances in
track athletics." Int J Sports Physiol Perform 1(3): 233-245.
[PubMed]
4. Christian, J. G., T. E. Byers, et al. (2011). "A computer support program that helps clinicians provide patients with
metabolic syndrome tailored counseling to promote weight
loss." J Am Diet Assoc 111(1): 75-83. [PubMed]
5. Angus, S. D. (2014). "Did recent world record marathon runners employ optimal pacing strategies?" J Sports Sci
32(1): 31-45. [PubMed]
6. Simmons, J. P., L. D. Nelson, et al. (2011). "False-positive psychology: undisclosed flexibility in data collection and
analysis allows presenting anything as significant." Psychol
Sci 22(11): 1359-1366. [PubMed]
7. Scharinger, C., U. Rabl, et al. (2014). "Platelet serotonin transporter function predicts default-mode network activity."
PLoS One 9(3): e92543. [PubMed]
8. And then someone who doesn't even understand the journalism uses it in an argument on the internet!
9. Reger, M. A., S. T. Henderson, et al. (2004). "Effects of beta-hydroxybutyrate on cognition in memory-impaired adults."
Neurobiol Aging 25(3): 311-314. [PubMed]
10. http://en.wikipedia.org/wiki/Grubbs'_test_for_outliers
11. In statistical terminology, this is only an outlier on the x-axis and it's in the right place so technically it's a point of leverage not an outlier.
12. Kogan, A., J. Gruber, et al. (2013). "Too much of a good thing? Cardiac vagal tone's nonlinear relationship with well-
being." Emotion 13(4): 599-604. [PubMed]
13. Arai, K., Y. Nakagawa, et al. (2013). "Relationships between QT interval and heart rate variability at rest and the covariates
in healthy young adults." Auton Neurosci 173(1-2): 53-57.
[AN/BC]
14. DeWall, C.N., O. Gillath, et al. (2014). "When the Love Hormone Leads to Violence: Oxytocin Increases Intimate
Partner Violence Inclinations Among High Trait Aggressive
People." Soc Psych Pers Sci, Published online Feb 12th. [SPPS]
15. Shang, E. and T. Hasenberg (2010). "Aerobic endurance training improves weight loss, body composition, and co-
morbidities in patients after laparoscopic Roux-en-Y gastric
bypass." Surg Obes Relat Dis 6(3): 260-266. [PubMed]
Changes in exercises are more effective than in loading schemes to improve muscle strength [reviewed by Brad Schoenfeld, PhD, CSCS, CSPS, FNSCA].
Fonseca RM, Roschel H, Tricoli V, de Souza EO, Wilson JM,
Laurentino GC, Aihara AY, de Souza Leo AR, Ugrinowitsch C.
J Strength Cond Res. 2014 May 14. [Epub ahead of print]
[PubMed]
____________________________________________________
BACKGROUND/PURPOSE: This study investigated the
effects of varying strength exercises and/or loading scheme on
muscle cross-sectional area (CSA) and maximum strength after
four strength training loading schemes: constant intensity and
constant exercise (CICE), constant intensity and varied exercise
(CIVE), varied intensity and constant exercise (VICE), varied
intensity and varied exercise (VIVE). METHODS: Forty-nine
individuals were allocated into five groups: CICE, CIVE, VICE,
VIVE, and control group (C). Experimental groups underwent a
twice a week training for 12 weeks. Squat 1RM was assessed at
baseline and after the training period. Whole quadriceps muscle
and its heads' CSA were also obtained pre- and post-training.
RESULTS: The whole quadriceps CSA increased significantly
(p
Alan Aragons Research Review May 2014 [Back to Contents] Page 13
for sure that the same findings would be seen in a well-trained
population. Indeed, if the authors' hypothesis that changing the
rep range had a negative effect on neural drive is in fact correct,
it could alternatively be hypothesized that this detriment would
not occur in more experienced subjects since neural adaptations
would already be well-ingrained.
One issue that can be raised with the design is that the rep range
employed for the varied intensity groups (6-10 reps per set) was
fairly narrow. It would be difficult to imagine that changes in
muscle growth would have been significantly different using
such a narrow range over the course of a few months. What
would have been more interesting from a hypertrophy
standpoint, IMO, is if the rep range had encompassed a low
rep condition (i.e. 5 reps), moderate rep condition (10 reps) and
a high rep condition (15 reps). Based on the concept of the
strength-endurance continuum, comparing a constant intensity of
10 reps per set versus a varied intensity of 5-10-15 reps per set
would have made more sense to see if muscle hypertrophy
differs along this continuum.
Ultimately the study provides intriguing findings that have
practical implications for training. Most importantly, it
reinforces the need to vary exercise selection to maximize
muscular symmetry as well as strength. It also suggests that,
from a maximal strength standpoint, limiting variation in
intensity of load is beneficial during the early stages of training.
Ideally this study should be replicated, perhaps with wider
intervals in rep range, in well-trained subjects to provide better
generalizability for those with lifting experience.
____________________________________________________
Brad Schoenfeld, PhD, CSCS, CSPS, FNSCA, is a
lecturer in the exercise science department for
Lehman College and is the head of their
human performance laboratory. His primary
research interests focus on elucidating the
mechanisms of muscle hypertrophy and their
application to resistance training. He has
published over 40 peer-reviewed journal
articles and currently serves on the Board of
Directors for the NSCA. He is author of the
book, "The M.A.X. Muscle Plan" which is
available at all major bookstores and on
Amazon.com. He maintains an active blog on his website:
http://www.lookgreatnaked.com/
The effects of consuming a high protein diet (4.4 g/kg/d) on body composition in resistance-trained individuals.
Antonio J, Peacock CA, Ellerbroek A, Fromhoff B, Silver T. J
Int Soc Sports Nutr. 2014 May 12;11:19. [PubMed] [Full Text]
BACKGROUND: The consumption of dietary protein is important for resistance-trained individuals. It has been posited that intakes of 1.4 to 2.0 g/kg/day are needed for physically active individuals. Thus, the purpose of this investigation was to determine the effects of a very high protein diet (4.4 g/kg/d) on body composition in resistance-trained men and women. METHODS: Thirty healthy resistance-trained individuals participated in this study (mean ± SD; age: 24.1 ± 5.6 yr; height: 171.4 ± 8.8 cm; weight: 73.3 ± 11.5 kg). Subjects were randomly assigned to one of the following groups: Control (CON) or high protein (HP). The CON group was instructed to maintain the same training and dietary habits over the course of the 8 week study. The HP group was instructed to consume 4.4 grams of protein per kg body weight daily. They were also instructed to maintain the same training and dietary habits (e.g. maintain the same fat and carbohydrate intake). Body composition (Bod Pod), training volume (i.e. volume load), and food intake were determined at baseline and over the 8 week treatment period. RESULTS: The HP group consumed significantly more protein and calories pre vs post (p < 0.05). Furthermore, the HP group consumed significantly more protein and calories than the CON (p < 0.05). The HP group consumed on average 307 ± 69 grams of protein compared to 138 ± 42 in the CON. When expressed per unit body weight, the HP group consumed 4.4 ± 0.8 g/kg/d of protein versus 1.8 ± 0.4 g/kg/d in the CON. There were no changes in training volume for either group. Moreover, there were no significant changes over time or between groups for body weight, fat mass, fat free mass, or percent body fat. CONCLUSION: Consuming 5.5 times the recommended daily allowance of protein has no effect on body composition in resistance-trained individuals who otherwise maintain the same training regimen.
This is the first interventional study to demonstrate that consuming a hypercaloric high protein diet does not result in an increase in body fat. SPONSORSHIP: JA is the CEO of the International Society of Sports Nutrition. The protein powder was provided by MusclePharm and Adept Nutrition (Europa Sports Products brand); both are sponsors of the ISSN conferences.
Study strengths
A big strength of this study is the underlying concept, and the
interesting question investigated. It's one of the fun studies that pushes the "what if we tried this crazy idea" factor, examining a highly experimental and exploratory protocol. And, it happened to
yield some intriguing results. Overfeeding studies have thus far
focused on carbohydrate and/or fat,1-7 with a glaring scarcity of
studies on protein overfeeding.8 Furthermore, the majority of
overfeeding trials are short, ranging from a few days to less than
a month. Subjects were resistance-trained, which minimizes the
respond-strongly-to-anything tendency of novices.
Study limitations
Air displacement plethysmography (ADP, or Bod Pod) was used
to assess body composition. A comprehensive review by Fields
et al states:9 "In conclusion, the BOD POD is a reliable and
valid technique that can quickly and safely evaluate body composition in a wide range of subject types, including those who are often difficult to measure, such as the elderly, children, and obese individuals." However, it should be noted that the majority of studies on the Bod Pod have compared it to
hydrostatic weighing. Ball and Altena10 compared Bod Pod to
dual X-ray absorptiometry (DXA) in a large sample of men
(n=160) and found that although the results from the two
methods were highly correlated, the difference increased as
bodyfat increased. Quoting their conclusion (which I feel is
hugely important):10
"Practitioners should be aware that even with the use of technologically sophisticated methods (i.e., Bod Pod, DXA), differences between methods exist and the determination of body composition is at best, an estimation."
Another limitation is the questionable reliability of self-reported
dietary intake (and activity output). Research that immediately
comes to mind is Lichtman et al, who found that obese subjects
with a reported history of diet resistance under-reported food intake by an average of 47%, and over-reported physical activity
by 51%.11 In the case of the present study, there was a massive
amount of protein assigned to the experimental group (4.4 g/kg
or 307 g/day). The investigators were aware of the inherent
difficulty in carrying this out, hence their purposely uneven
randomization: 20 subjects were assigned to the high-protein
(HP) group, and 10 subjects to the control group. It's not out of the question that over-reporting occurred, since it's human nature to avoid admitting failure to fully follow the program.
Aside from the limitations inherent with self-reported intake,
there was no objective measure of energy expenditure. An
attempt to control for training volume was made via daily
journaling. There was thus a reliance upon the accuracy of the
subjects' records, instead of an objective measure of energy expenditure such as the doubly labeled water (DLW) technique.
The use of DLW has been called the gold standard of assessing energy expenditure, particularly in non-confined conditions.12
However, it's rare to see DLW used in sports nutrition studies (or most any type of research, for that matter). This is because
it's expensive and requires specifically trained personnel. Thus, we're left with open questions about how the experimental protein overfeeding affected non-exercise activity thermogenesis
(NEAT). One of the most memorable examples of DLW use
capturing the impressive extent of NEAT was in 1999 when
Levine et al13 found that the metabolic response to a 1000-kcal
surplus ranged from a 98 kcal decrease to a 692 kcal increase in
NEAT. The group's mean increase in NEAT was 336 kcal. The authors' summation is worth quoting directly:13
"Thus, activation of NEAT can explain the variability of fat gain with overeating. As humans overeat, those with effective activation of NEAT can dissipate excess energy so that it is not available for storage as fat, [...] The maximum increase in NEAT that we detected (692 kcal/day, volunteer 5) could be accounted for by an increased strolling-equivalent activity of 15 min/hour during waking hours."
Comment/application
The most salient finding was the lack of significant change in
body composition in either group over the 8-week period:
Surprisingly, the HP group's body composition showed no significant changes despite the assignment of an additional 800
kcal (in protein) above and beyond that assigned to the control
group. But, unlike Levine et al's overfeeding study, the present study based overfeeding on protein exclusively. The HP group's consumption of ~307 g protein versus the control group's ~138 g without a doubt had a higher thermic effect. As reported by
Jéquier,14 the thermic effect of protein (expressed as a
percentage of energy content) is 25-30%, carbohydrate is 6-8%,
and fat is 2-3%. However, not all of the literature is in precise
agreement. Halton and Hu reported greater variability, with the
thermic effect of protein being 20-35%, carbohydrate at 5-15%,
and fat being subject to debate since some investigators found a
lower thermic effect than carbohydrate while others found no
difference.15
Despite relative variations in carbohydrate and fat,
protein has consistently shown a markedly higher thermic effect
than either of them. In combination, the thermic effect of protein
combined with a liberal presumption of NEAT, the majority of
the dissipated protein energy is accounted for. The remainder is
plausibly attributable to reporting error.
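As a back-of-envelope illustration of that accounting, the sketch below applies the thermic effect ranges cited above to the 800 kcal protein surplus and then subtracts a presumed NEAT increase. Borrowing the NEAT figures from Levine et al (who overfed a mixed diet, not protein alone) is an assumption, and the function names are hypothetical. With Jéquier's 25-30% range, the thermic effect alone would dissipate roughly 200-240 kcal; adding Levine's mean NEAT increase of 336 kcal leaves roughly 224-264 kcal unaccounted for, while the maximum of 692 kcal would more than cover the rest.

```python
# Back-of-envelope accounting for the HP group's assigned protein surplus.
# Figures come from the review text; applying Levine et al's NEAT values
# (measured during mixed-diet overfeeding) to this study is an assumption.

EXTRA_PROTEIN_KCAL = 800   # additional kcal/day assigned to the HP group

# Thermic effect of protein as a fraction of its energy content (low, high)
TEF_PROTEIN = {
    "Jequier": (0.25, 0.30),
    "Halton and Hu": (0.20, 0.35),
}

NEAT_MEAN_KCAL = 336       # mean NEAT increase in Levine et al
NEAT_MAX_KCAL = 692        # largest individual NEAT increase in Levine et al


def tef_kcal(surplus_kcal, tef_fraction):
    """Energy dissipated as heat while processing the surplus protein."""
    return surplus_kcal * tef_fraction


def unaccounted_kcal(surplus_kcal, tef_fraction, neat_kcal):
    """Surplus left after subtracting thermic effect and a presumed NEAT increase.

    A negative result means the two together exceed the surplus.
    """
    return surplus_kcal - tef_kcal(surplus_kcal, tef_fraction) - neat_kcal


if __name__ == "__main__":
    for source, bounds in TEF_PROTEIN.items():
        for tef in bounds:
            for label, neat in (("mean", NEAT_MEAN_KCAL), ("max", NEAT_MAX_KCAL)):
                left = unaccounted_kcal(EXTRA_PROTEIN_KCAL, tef, neat)
                print(f"{source}: TEF {tef:.0%}, {label} NEAT -> "
                      f"{left:.0f} kcal/day unaccounted")
```

The spread of outputs shows how strongly the conclusion depends on the NEAT presumption, which was not measured in the present study.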
In a recent study that made waves for being the first of its kind,
Bray et al16 compared the overfeeding effects of a low-protein
(5%), normal-protein (15%), and high-protein (25%) diet.
Carbohydrate was kept the same across the treatments, with fat
filling in the remainder. Among this study's design strengths was the use of DLW to assess energy expenditure. A 40%
energy surplus (954 kcal) was imposed for 8 weeks. All groups
increased fat mass equally, but the low-protein group lost lean
mass while the normal- and high-protein groups gained lean mass,
with the latter gaining slightly more. The low-protein
group gained significantly less total bodyweight than the
higher-protein groups, but this was due to differences in lean
mass gain.
In the present study, no lean mass was gained despite an
increased protein intake in the HP group. This can be attributed
to the advanced resistance-trained status of the subjects (they
trained an average of 8.5 hours/week for the past 8.9 years) and
to their already high baseline protein intake (~1.9-2.3 g/kg). In
contrast, Bray et al's subjects were untrained, and their protein intake at baseline was 1.2 g/kg, which was raised to 1.8 g/kg in
the high-protein treatment, essentially crossing the threshold from sub-optimal to optimal. Another point made by the authors
of the present study was that the subjects were instructed to
maintain their habitual training program, thus precluding any
novel or greater training stimulus that might elicit further gains.
Taking the results at face value, it almost seems surplus calories
don't count since NEAT will save you as long as the surplus is from protein. However, it's worth reiterating that not all aspects of this study were tightly controlled, and reporting error could
have played a significant confounding role in the results.
On the other hand, there is still the possibility that relatively
advanced, resistance-trained subjects have a heightened
capability of dissipating surplus energy from dietary protein
through involuntary, non-exercise means. This possibility also
holds potentially important implications for meal planning of
dieting individuals as well as those who are striving to maintain
weight loss, but also need to control appetite.
On a practical note, the following detail should be taken into
consideration: "...every subject in the high protein group consumed protein powder in order to meet the requirements for the study. Otherwise, it would be have virtually impossible or highly unlikely that one could consume a 4.4 g/kg/d via food alone." Protein supplements (in this case, whey and casein powders) only contain trace amounts of fat and carbohydrate.
Those who want to experiment with higher protein intakes
should keep in mind that inadvertent addition of fat and
carbohydrate along with the extra protein (i.e., via mixed-macronutrient
dishes and/or fatty meats) would mimic neither the
protocol nor the effects seen in the present study.
An amino acid-electrolyte beverage may increase cellular rehydration relative to carbohydrate-electrolyte and flavored water beverages.
Tai CY, Joy JM, Falcone PH, Carson LR, Mosman MM,
Straight JL, Oury SL, Mendez C, Loveridge NJ, Kim MP,
Moon JR. Nutr J 2014, 13:47 doi:10.1186/1475-2891-13-47
[PubMed]
BACKGROUND: In cases of dehydration exceeding a 2% loss of body weight, athletic performance can be significantly compromised. Carbohydrate and/or electrolyte containing beverages have been effective for rehydration and recovery of performance, yet amino acid containing beverages remain unexamined. Therefore,
the purpose of this study is to compare the rehydration capabilities of an electrolyte-carbohydrate (EC), electrolyte-branched chain amino acid (EA), and flavored water (FW) beverages. METHODS: Twenty men (n = 10; 26.7 +/- 4.8 years; 174.3 +/- 6.4 cm; 74.2 +/- 10.9 kg) and women (n = 10; 27.1 +/- 4.7 years; 175.3 +/- 7.9 cm; 71.0 +/- 6.5 kg) participated in this crossover study. For each trial, subjects were dehydrated, provided one of three random beverages, and monitored for the following three hours. Measurements were
collected prior to and immediately after dehydration and 4 hours after dehydration (3 hours after rehydration) (AE = -2.5 +/- 0.55%; CE = -2.2 +/- 0.43%; FW = -2.5 +/- 0.62%). Measurements collected at each time point were urine volume, urine specific gravity, drink volume, and fluid retention. RESULTS: No significant differences (p > 0.05) existed between beverages for urine volume, drink volume, or fluid retention for any time-point.
Treatment x time interactions existed for urine specific gravity (USG) (p < 0.05). Post hoc analysis revealed differences occurred between the FW and EA beverages (p = 0.003) and between the EC and EA beverages (p = 0.007) at 4 hours after rehydration. Wherein, EA USG returned to baseline at 4 hours post-dehydration (mean difference from pre to 4 hours post-dehydration = -0.0002; p > 0.05) while both EC (-0.0067) and FW (-0.0051) continued to produce
dilute urine and failed to return to baseline at the same time-point (p < 0.05). CONCLUSION: Because no differences existed for fluid retention, urine or drink volume at any time point, yet USG returned to baseline during the EA trial, an EA supplement may enhance cellular rehydration rate compared to an EC or FW beverage in healthy men and women after acute dehydration of around 2% body mass loss. SPONSORSHIP: MusclePharm Corporation.
Study strengths
This study is innovative since it's the first to compare the hydrating effects of a BCAA-electrolyte (AE) beverage with those
of a carbohydrate-electrolyte (CE) beverage. Furthermore, the
protocol involved a more realistic fluid dose than the typically
massive fluid doses given in previous research examining
rehydration beverages. Subjects were required to have a
minimum of one year of endurance and resistance training
experience, which minimized the chance of confounding
"newbie" effects. This investigation is of relevance to trainees aiming to economize caloric intake, which is often hiked by the
carbohydrate content of conventional recovery beverages.
Study limitations
While this study may have relevance to those seeking to
economize carbohydrate intake, such a population would be very
sparse among trainees seeking to improve endurance-type
performance. This is because the inclusion of carbohydrate
would serve the dual purpose of driving better exercise
performance, as well as faster glycogen resynthesis post-exercise
(both of which can benefit competitive endurance sports, especially those with multiple glycogen-depleting events per
day). Missing from the comparison was a condition containing
amino acids, carbohydrate, and electrolytes. However, the
authors duly cite research by Lambert et al,17 who found that in
the absence of electrolytes, no significant differences in
rehydration were seen between beverages containing versus
omitting carbohydrate. This implicates electrolytes as the critical
factor in rehydration (rather than carbohydrate, whose function
would be limited to glycogen resynthesis). Still, potentially
interactive or synergistic effects of a combination of carbs,
electrolytes, and amino acids on hydration would have been a
worthy condition to investigate in the present study's comparison. For example, chocolate milk has demonstrated
effectiveness for rehydration, glycogen resynthesis, and muscle
recovery, and is more nutrient-dense than typical commercial
recovery drinks.18 A final limitation was that the treatments were
not equal in terms of potassium content (AE had the most).
Comment/application
The findings only partially agreed with the authors' hypothesis going into the experiment. They originally predicted that AE and
CE beverages would rehydrate similarly, yet to a greater extent
than the flavored water (FW) beverage. Interestingly, quoting
the authors: "The AE and CE beverages rehydrated about equally; however, they were also equal to the FW beverage." They go on to mention a subtle detail that separated the
AE beverage from the other two, rendering it superior. CE and
FW yielded more diluted urine than AE, as indicated by urine
specific gravity (USG), depicted below:
At 4 hours post-dehydration, USG in the CE and FW trials was
significantly lower than pre-testing, while AE's USG was the same as pre-testing at this time point. This suggests greater
diuresis and less cellular retention in CE & FW
compared to AE. It should be noted that the measured
differences in fluid retention between conditions were not
statistically significant (43.5% in AE, 40.8% in CE, and 42.2%
in FW). This raises the question of how clinically relevant these
small differences really are, and how necessary or beneficial this
special rehydration product is despite its inclusion of BCAA and absence of carbohydrate.
Calorie shifting diet versus calorie restriction diet: a comparative clinical trial study.
Davoodi SH, Ajami M, Ayatollahi SA, Dowlatshahi K, Javedan G, Pazoki-Toroudi HR. Int J Prev Med. 2014 Apr;5(4):447-56. [PubMed]
BACKGROUND: Finding new tolerable methods in weight loss
has largely been an issue of interest for specialists. Present study
compared a novel method of calorie shifting diet (CSD) with classic
calorie restriction (CR) on weight loss in overweight and obese
subjects. METHODS: Seventy-four subjects (body mass index
≥25; ≤37) were randomized to 4 weeks control diet, 6 weeks CSD or CR diets, and 4 weeks follow-up period. CSD consisted of three
phases each lasts for 2 weeks, 11 days calorie restriction which
included four meals every day, and 4 h fasting between meals
follow with 3 days self-selecting diet. CR subjects receive
determined low calorie diet. Anthropometric and metabolic
measures were assessed at different time points in the study.
RESULTS: Four weeks after treatment, significant weight, and fat
loss started (6.02 and 5.15 kg) and continued for 1 month of follow-
up (5.24 and 4.3 kg), which was correlated to the restricted energy
intake (P < 0.05). During three CSD phases, resting metabolic rate
tended to remain unchanged. The decrease in plasma glucose, total
cholesterol, and triacylglycerol were greater among subjects on the
CSD diet (P < 0.05). Feeling of hunger decreased and satisfaction
increased among those on the CSD diet after 4 weeks (P < 0.05).
CONCLUSIONS: The CSD diet was associated with a greater
improvement in some anthropometric measures, Adherence was
better among CSD subjects. Longer and larger studies are required
to determine the long-term safety and efficacy of CSD diet.
SPONSORSHIP: None listed.
Study strengths
This is the first study to compare the effects of this particular permutation of a calorie shifting diet (CSD: 11 days restricted, 3 days unrestricted) pattern with a linear calorie-restricted (CR) diet. The investigation is an important one given the generally unimpressive weight loss and weight loss maintenance from conventional caloric restriction.19-21 The sample size (n=74) was fairly large, especially for diet research, which is notorious for its small subject numbers. More subjects translate to greater statistical power and less likelihood of by-chance occurrences.
Study limitations
The design included an intervention period as well as a follow-up period, which is a good thing; it's just that both periods were short (6-week intervention, 4-week follow-up). This essentially gives us hypothesis-generating pilot data rather than long-term data that we can lean on with greater confidence. Another limitation is the diet construction: these are your typical, crappy research diets. Protein intake during the intervention phase was actually less than the subjects' habitual intakes at baseline in both groups. The deficits were rather severe, but strangely, they were not equated. CSD's reduction was set at 45% of baseline reported maintenance, and CR's was set at 55% of maintenance. The milder deficit in CSD may have imparted an advantage. Furthermore, the results of this study might be limited to the subject profile (obese, untrained). Bioelectrical impedance analysis (BIA) was used to assess body composition.
Comment/application
CSD outperformed CR on several parameters:
CSD yielded greater weight loss at the end of the follow-up period but not the intervention period (full body composition change details here).
CSD yielded greater fat loss at the end of the follow-up period but not at the end of the intervention period.
CSD's decrease in RMR was less than that of CR (which ended up being lower than baseline by the end of the study).
CSD yielded greater decreases in glucose, total cholesterol, and triacylglycerol by the end of the study.
CSD tended to yield lesser subjective feelings of hunger toward the end of the trial.
CSD had a much higher subject retention rate; 36.8% dropped out of CR, and 15.6% dropped out of CSD.
Overall, CSD trumped CR, especially by the end of the 4-week follow-up period. However, it's very important to view these results in the proper perspective. It can't be overemphasized that this was a short intervention (6 weeks plus a 4-week follow-up). I'll also reiterate that the diets imposed upon both groups were far from optimal in terms of protein intake. Baseline protein intake of CSD was ~1.1 g/kg, and this dropped to ~0.9 g/kg during the intervention. Baseline protein intake in CR was ~1.1 g/kg, and this dropped to ~0.8 g/kg during the intervention. These protein intakes are approximately half of what has repeatedly been shown to be a favorable and effective target for optimizing muscular adaptations to hypocaloric conditions.22-24 Nevertheless, perhaps a case can be made for CSD over CR under conditions of subpar protein intake.
Still, the biological plausibility of there being inherent advantages to the CSD pattern is questionable beyond its potential to bolster compliance, at least in the short-term. CSD had more rules and structuring, particularly during the 11-day cycles where 4 meals were strictly spaced 4 hours apart with no between-meal eating allowed. This may have raised subjects' awareness and focus on the protocol, keeping them more compliant. In contrast, the linear calorie-restricted group could have been lulled into a monotonous grind conducive to the loosening of adherence over time. The CSD group, meanwhile, essentially took a 3-day diet break (consuming maintenance-level calories) after every 11-day block of dieting.
The weight and fat loss benefits of CSD were not clearly apparent until the end of the follow-up period. It's thus easy to speculate that the 6 weeks of linear, aggressive caloric restriction may have been met with deprivation backlash during the follow-up period where the objective was to consume maintenance-level calories. Remember that in CR, 55% of baseline intake was subtracted, leaving subjects with 6 weeks of consuming 1186 kcal/day (down from 2432 kcal at baseline). An important indicator of CSD's effectiveness was the more than twofold higher dropout rate in CR. The more favorable biochemical changes in CSD can be attributed primarily to the greater weight and fat loss by the end of the follow-up. The relative success of the 11/3 CSD model points to the potential effectiveness of other more convenient and realistic non-linear models. For example, a 5/2 model, with 5 calorie-restricted days followed by 2 self-selected days, would mirror a weekdays/weekend cycle, which could potentially fit better into the common work schedule.
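For a rough sense of how these non-linear patterns compare with linear restriction, the sketch below averages the deficit over one full cycle. It assumes the self-selected days land at maintenance (0% deficit) and that a hypothetical 5/2 model would reuse CSD's 45% reduction on restricted days; neither assumption was tested in the study, and the function names are illustrative. Under those assumptions the 11/3 cycle averages roughly a 35% deficit and a 5/2 cycle roughly 32%, versus 55% every day in CR. The same sketch checks the reduction implied by CR's reported intakes and the ratio of the dropout rates.

```python
# Cycle-averaged deficit for non-linear dieting patterns.
# Assumptions (not tested in the study): self-selected days sit at maintenance,
# and a hypothetical 5/2 model uses the same 45% reduction as CSD's restricted days.

def cycle_average_deficit(restricted_days, free_days,
                          restricted_deficit, free_deficit=0.0):
    """Mean fractional deficit across one full cycle of restricted and free days."""
    total_days = restricted_days + free_days
    if total_days <= 0:
        raise ValueError("a cycle needs at least one day")
    weighted = restricted_days * restricted_deficit + free_days * free_deficit
    return weighted / total_days


def reduction_fraction(baseline_kcal, intake_kcal):
    """Fractional reduction implied by reported baseline and intervention intakes."""
    return (baseline_kcal - intake_kcal) / baseline_kcal


def dropout_ratio(rate_a, rate_b):
    """How many times higher dropout rate A is than dropout rate B."""
    return rate_a / rate_b


if __name__ == "__main__":
    print(f"CSD 11/3 average deficit: {cycle_average_deficit(11, 3, 0.45):.1%}")
    print(f"5/2 average deficit: {cycle_average_deficit(5, 2, 0.45):.1%}")
    print(f"CR linear deficit: {cycle_average_deficit(1, 0, 0.55):.1%}")
    print(f"CR reduction from reported intakes: {reduction_fraction(2432, 1186):.1%}")
    print(f"CR vs CSD dropout ratio: {dropout_ratio(36.8, 15.6):.2f}")
```

The reduction computed from CR's reported intakes comes out near 51%, slightly below the 55% prescription, which is unsurprising for self-reported intake.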
1. Lecoultre V, Egli L, Carrel G, Theytaz F, Kreis R, Schneiter P, Boss A, Zwygart K, Lê KA, Bortolotti M, Boesch C, Tappy L. Effects of fructose and glucose overfeeding on hepatic insulin sensitivity and intrahepatic lipids in healthy humans. Obesity (Silver Spring). 2013 Apr;21(4):782-5. [PubMed]
2. Sobrecases H, Lê KA, Bortolotti M, Schneiter P, Ith M, Kreis R, Boesch C, Tappy L. Effects of short-term overfeeding with fructose, fat and fructose plus fat on plasma