
Vitamin D sleep experiments

Self-experiment on vitamin D effects on sleep: harmful when taken at night, no effect or a beneficial effect when taken in the morning.

Vitamin D is a hormone endogenously created by exposure to sunlight; due to historically low levels of outdoor activity, it has become a popular supplement and I use it. Some anecdotes suggest that, because of its origin, vitamin D may have circadian and zeitgeber effects and be harmful to sleep when taken at night. I ran a blinded randomized self-experiment on taking vitamin D pills at bedtime. The vitamin D damaged my sleep and especially how rested I felt upon waking, suggesting vitamin D did have a stimulating effect which obstructed sleep. I conducted a followup blinded randomized self-experiment on the logical next question: if vitamin D is a daytime cue, then would vitamin D taken in the morning show some beneficial effects? The results were inconclusive (but slightly in favor of benefits). Given the asymmetry, I suggest that vitamin D supplements should be taken only in the morning.

Seth Roberts has speculated, based on some anecdotes (with 2 null results), that vitamin D, despite its myriad other benefits, may harm sleep when taken in the evening and help sleep when taken in the morning. The anecdotes are nearly worthless as sleep is pretty variable (look above or below, and you’ll see swings of over 20 ZQ points night to night), and just a little carelessness or selection bias will persuade one that there is a major effect where there is none—especially since they are not using Zeos or accelerometers or even giving basic quantities like ‘I felt bad in the morning 3⁄5 days’. But I began to wonder. Vitamin D is a chemical intimately involved in circadian rhythms (a ‘zeitgeber’), with some connections to systems involved in sleep (“The steroid hormone of sunlight soltriol (vitamin D) as a seasonal regulator of biological activities and photoperiodic rhythms”); given its links to the early day and sunlight, one would expect it to affect sleep for the worse when taken at night.

To see what existing research there was, if any, I checked the 49 hits in PubMed and the first 10 pages of Google Scholar for ‘“vitamin D” sleep’. For the most part, hits were completely irrelevant, and the most relevant ones like “Vitamins and Sleep: An Exploratory Study” did not cover any relationship between vitamin D and sleep, much less the timing of vitamin D consumption. There’s some speculation the elderly may sleep badly in part due to lack of vitamin D (“Some new food for thought: The role of vitamin D in the mental health of older adults”, Cherniack et al 2009), but the only hard results I found were weak or tangential: a correlation with daytime sleepiness in Taiwanese dialysis patients1, a correlation with later sleep in American women2, a correlation with earlier sleep in Japanese women3, a correlation with reduced sleep difficulties in Americans, and a correlation of blood levels with both better and worse sleep in Americans4. This reads like noise.

In June 2012, after I finished my 2 experiments, a preprint appeared in Medical Hypotheses: “The world epidemic of sleep disorders is linked to vitamin D deficiency”, Gominak & Stumpf 2012; the lead author, unfortunately, had little to tell me when I emailed her, indicating that the use of vitamin D was not systematic or recorded:

An observation of sleep improvement with vitamin D supplementation led to a 2 year uncontrolled trial of vitamin D supplementation in 1500 patients with neurologic complaints who also had evidence of abnormal sleep. Most patients had improvement in neurologic symptoms and sleep but only through maintaining a narrow range of 25(OH) vitamin D3 blood levels of 60–80 ng/ml. Comparisons of brain regions associated with sleep-wake regulation and vitamin D target neurons in the diencephalon and several brainstem nuclei suggest direct central effects of vitamin D on sleep…An uncontrolled trial of continuous positive airway pressure CPAP devices for patients with headache and obstructive sleep apnea was partially successful, but in the fall of 2009 two patients remarked that the serendipitous supplementation of vitamin D, in addition to the use of their CPAP devices had, over a period of weeks, allowed them to wake rested and without headaches. Because the majority of the daily headache sufferers also had vitamin D deficiency the same author went looking for a possible connection between vitamin D and paralysis during sleep. This led to the recognition that several nuclei in the hypothalamus and brainstem that are known to be involved in sleep have high concentrations of vitamin D receptors15,16,17. An uncontrolled clinical trial of vitamin D supplementation in 1500 patients over a 2 year period, maintaining a consistent vitamin D blood level in the range of 60–80 ng/ml over many months, produced normal sleep in most patients regardless of the type of sleep disorder, suggesting that multiple types of sleep disorders might share the same etiology…Like other steroid hormones, Vitamin D is thought to exert its effects in the nucleus of the cell, at the vitamin D receptor, promoting transcription of specific genes. 
There are also reports of actions unrelated to transcription, possibly mediated by surface membrane receptors, such as Ca++ channels, that produce cellular effects in minutes5. Surprisingly, doses of 20,000 IU/day promote normal sleep without being sedating, and the effect is apparent within the first day of dosing in patients who have had severe sleep disruption and very low 25(OH) vitamin D3 levels…Many of the ideas about normal sleep expressed here grew out of watching patients return to normal sleep cycles, over a period of months, with just the return of the 25(OH) vitamin D3 blood level to 60–80 ng/ml. A totally unexpected observation was that the sleep difficulties produced by vitamin D levels below 50 return, in the same form, as the level goes over 80 ng/ml suggesting a narrower range of “normal” vitamin D levels for sleep than those published for bone health. Also, Vitamin D2, ergocalciferol (widely recommended as an “equivalent” therapy for osteoporosis) prevented normal sleep in most patients, suggesting that D2 may be close enough in structure to act as a partial agonist at some locations, an antagonist at others.

Comments:

  • I don’t know about the overarching claims (I suspect most of the problem is lighting, and general demands on time), but the trial itself seems really important, especially since neither Roberts nor I had the slightest idea about it, yet we seem to have reached similar results.

  • the 2 patients suggested it, in an interesting example of the value of self-experimentation

  • the authors cover much more specific potential connections between vitamin D and sleep than just “circadian rhythms”

  • the methodology section is non-existent; how were these 1500 patients picked? how long did each use vitamin D? Unfortunately, neither Roberts nor I have taken vitamin D blood tests (as far as I know), so we cannot verify that we fell into the authors’ 60–80ng/ml range, but it’s plausible. How is sleep quality being measured? Are these results consistent or inconsistent with the one case of morning mood/restedness improvement but little else? Even if they were inconsistent, though, that could be explained by neither of us suffering from a sleep disorder and the effect being weaker in us.

In July 2012, preprints of Huang et al 2012 became available; it is a case series—the authors followed a group of veterans with chronic pain who received vitamin D supplements, finding improvements to pain but also reduction in sleep latency and increase in sleep duration. While I did not observe any effect on latency or duration in my following experiments, this would still be a promising datapoint, but unfortunately the sample had substantial dropout, and had no control group (hence no randomizing or blinding). This renders the study not very useful—the improvements being perhaps just regression toward the mean or a selection bias. Blogger Chris L looked back in August 2012 on ~1 year of Zeo data and a quasi-experiment in which he started with 4000IU of vitamin D supplementation, then 5000IU, then none; he took them at night, then switched to morning; the results were that the length of his deep sleep started high, dropped, and then recovered. He interprets this as evidence that too much vitamin D hurts sleep. In 2013, a review (McCarty et al 2013) came out arguing that “low vitamin D levels increase the risk for autoimmune disease, chronic rhinitis, tonsillar hypertrophy, cardiovascular disease, and diabetes. These conditions are mediated by altered immunomodulation, increased propensity to infection, and increased levels of inflammatory substances, including those that regulate sleep”; this might handle negative effects on sleep from chronically low vitamin D, but doesn’t seem relevant to acute effects varying by time of administration. 
A 2017 vitamin D sleep RCT (Majid et al 2017) in people diagnosed with sleep disorders found a large benefit in self-rated sleep quality after the equivalent of 3570IU daily. However, for compliance the vitamin D was administered as a single shot every two weeks, so while it provides evidence that vitamin D may help with sleep disorders, it doesn’t address the benefits in otherwise healthy people or the question of dose timing.

Vitamin D at Night Hurts?

Setup

I decided to run a small double-blind experiment much like the Adderall and other trials. My vitamin D supply is a bottle of 360 5000IU softgels by ‘Healthy Origins’, bought on iHerb.com. The gel-capsules contain cholecalciferol dissolved in olive oil, which made preparing placebo pills a little more difficult. I wound up puncturing the capsules, squeezing out the olive oil contents into a new capsule (they were too wide to push in) and then pushing in the empty shell; all 20 were topped off with ordinary white baking flour. (I used up the last of my creatine preparing the placebos for the Modalert day trial.) For the 20 placebo pills, I spooned some olive oil into each and topped them off with flour as well. Each set went into its own identical Tupperware container. The process was a little messier than I had hoped, but the pills seem like they will work.

The procedure at night will be: in the dark5 immediately before putting on the Zeo headband and going to bed, I will take my usual melatonin pill; then I will take the two containers blindly; mix them up; select a pill from one to take, and put the selected container on the shelf next to the Zeo. In the morning, I will see which one I took. (The Vitamin D olive oil was distinctly more yellow than the green placebo olive oil.) If I took placebo, I will take my usual daily dose of Vitamin D, and if active, I will skip it. This will blind me and keep constant my total Vitamin D intake. (This procedure may need to be amended with something more like the modafinil/Adderall procedure: a bag with replacement of the consumed placebos.) If I get a run of one kind of pills, I will re-balance the numbers.

Based on the first 10 days’ ZQs, I predict I’ll find in the final data set:

  1. increased sleep latency; probably at least another 10 minutes to fall asleep, as my mind seems to churn away with ideas of things to do

  2. increased awakenings; not that many, maybe 1 or 2 on average

  3. decreased ZQ; by around 5–10 points (a large effect, on par with melatonin)

    My best guess is that the ZQ hit is coming from reduced deep sleep, or maybe reduced deep & REM sleep. I don’t think the total amount of sleep has changed.

Roberts theorizes that besides vitamin D damaging sleep, it could actively improve your sleep if taken in the morning. As it happens, in this setup, on ‘placebo’ days I do take vitamin D in the morning—so wouldn’t one expect to see scores improve on the nights following a placebo night (a vitamin D morning), regardless of whether that night was vitamin D or placebo? A quick analysis of the first 24 nights showed the lagged nights to average a ZQ of 94.5. My monthly averages for October and November were 96, so there is no large improvement here.
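The lagged comparison above can be sketched as follows, assuming a list of nights in date order where each entry records the pill taken that night and the resulting ZQ (with `None` for a missing score). The function name is hypothetical and this is not the author's actual analysis; it also assumes consecutive list entries are consecutive nights, which would need checking against any skipped dates in the log.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_zq_after_placebo(pills: Sequence[str], zqs: Sequence[Optional[float]]) -> float:
    """Average ZQ of nights that follow a placebo night (i.e. a vitamin D morning).

    Assumes pills[i] and zqs[i] describe consecutive nights in date order,
    pills[i] is "active" or "placebo", and None marks a missing ZQ.
    """
    lagged = [
        zqs[i]
        for i in range(1, len(pills))
        if pills[i - 1] == "placebo" and zqs[i] is not None
    ]
    return mean(lagged)  # raises StatisticsError if no night qualifies


# Toy input, not the experiment's data:
print(mean_zq_after_placebo(["placebo", "active", "placebo", "placebo"], [90, 94, 80, 96]))  # prints 95
```

The resulting average would then be compared against a baseline such as the monthly ZQ average.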

One thing I suspect but cannot confirm—since I do not have a heart rate monitor—is that ~10 minutes after taking the vitamin D pills, my heart rate increases. Not to any uncomfortable or worrisome degree, but when one expects one’s heart rate to go down after going to bed, even a small increase in the opposite direction is noticeable. On the 12th, I finally got around to writing down this impression; then I searched online a bit and found that low vitamin D levels are associated with arrhythmia and other issues, but so are very high levels, and in the studies and anecdotes, increased heart rates are associated with higher vitamin D levels6. I’m not worried about the heart rate, but I am concerned that this is defeating the double-blinding: if all I have to do is notice my heart rate (and lying swaddled in bed in complete silence, it would be hard for me not to), then I’ve unblinded myself before falling asleep. Other stimulants like caffeine or sulbutiamine might similarly increase my heart rate, but they’d also interfere with sleep, so I can’t create any ‘active placebo’ even if I wanted to start over.

Vitamin D Data

The data (trimmed CSV), covering January–February 2012:

Vitamin D time-series by status, sleep quality, and blinding index

| Date | Pill | Quality (footnote 7) | ZQ | Guess |
|---|---|---|---|---|
| 31D–1J | active | bad | 84 | right 70% |
| 1–2 | placebo | better | 93 | right 65% |
| 2–3 | active | well | 94 | 50% |
| 3–4 | active | poor | 86 | right 60% |
| 4–5 | placebo | well | 98 | wrong 60% |
| 5–6 | active | mediocre | 86 | 50% |
| 6–7 | placebo | OK | ? (footnote 8) | right 65% |
| 7–8 | placebo | good | 90 | right 60% |
| 8–9 | active | poor | 84 | right 65% |
| 9–10 | placebo | good | 95 | right 65% |
| 10–11 | active | good | 100 | wrong 70% |
| 11–12 | active | mediocre | 92 | right 70% |
| 12–13 | active | mediocre | 88 | 50% |
| 13–14 | active | poor | 100 | right 60% |
| 14–15 | placebo | poor | 83 | wrong 60% |
| 15–16 | active | poor | 101 | right 55% |
| 16–17 | placebo | mediocre | 90 | 50% |
| 17–18 | placebo | mediocre | 88 | right 60% |
| 18–19 | placebo | good | 100 | 50% |
| 19–20 | active | poor | 86 | 50% |
| 20–21 | active | mediocre | 85 | 50% |
| 21–22 | placebo | OK | 91 | right 60% |
| 22–23 | placebo | OK | 106 | right 65% |
| 23–24 | active | poor | 91 | right 65% |
| 24–25 | active | 1 | 79 | right 75% |
| 25–26 | placebo | 3 | 85 | right 65% |
| 26–27 | active | 2 | ? (footnote 9) | right 55% |
| 28–29 | active | 3 | 85 | 50% |
| 29–30 | active | 3 | 93 | wrong 55% |
| 30–31 | placebo | 3 | 100 | right 60% |
| 31J–1 | active | 3 | 94 | 50% |
| 1F–2 | active | 2 | 89 | right 60% |
| 2–3 | active | 1 | 83 | right 70% |
| 3–4 | placebo | 2 | 81 | wrong 70% |
| 5–6 | placebo | 3 | 98 | right 65% |
| 6–7 | active | 2 | 88 | 50% |
| 7–8 | active | 2 | 94 | right 55% |
| 8–9 | active | 3 | 94 | wrong 75% |
| 9–10 | placebo | 3 | 92 | 50% |
| 10–11 | placebo | 3 | 95 | right 60% |
| 11–12 | placebo | 3 | 103 | right 75% |
| 12–13 | placebo | 3 | 84 | right 70% |

(Data input was for ‘Other Disruptions 3’; 0 = placebo, 1 = vitamin D.)

Vitamin D Analysis

From a quick look at the prediction confidences, I was usually correct but perhaps underconfident: my log score (a proper scoring rule) compared to a random guesser is 5.4 (footnote 10), which is even better than my guesses in my Adderall experiment.
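A minimal sketch of a log score relative to a random guesser, assuming each guess is stored as (was_right, confidence) in the ‘right 70%’ / ‘wrong 60%’ style of the Guess column. The log base and the exact convention behind the 5.4 figure are not given in this section, so this hypothetical helper is not guaranteed to reproduce that number.

```python
import math
from typing import Sequence, Tuple


def log_score_vs_random(guesses: Sequence[Tuple[bool, float]], base: float = 2.0) -> float:
    """Log score of calibrated guesses minus that of a coin-flip (50%) guesser.

    Each guess is (was_right, confidence): 'right 70%' -> (True, 0.70),
    'wrong 60%' -> (False, 0.60). A 50% guess scores the same as the baseline
    either way, so its correctness flag does not matter.
    """
    total = 0.0
    for was_right, confidence in guesses:
        p_truth = confidence if was_right else 1.0 - confidence  # probability put on what happened
        total += math.log(p_truth, base) - math.log(0.5, base)
    return total
```

A positive total means the stated confidences beat always saying 50%; a wrong guess made at 100% confidence would be scored as infinitely bad (here it raises a `ValueError`).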

Looking at the data averages in the Zeo website, it looked like ZQ & total & REM sleep fell, deep increased slightly, time awake & awakenings both increased, and morning feel decreased. The R analysis11:

The MANOVA is tantalizingly close to statistical-significance (p = 0.07); the variables:

| Variable | Effect | p-value | Coefficient’s sign is… |
|---|---|---|---|
| Total.Z | -19.73 | 0.084 | worse |
| Time.in.REM | -14.54 | 0.021 | worse |
| Time.in.Deep | 2.32 | 0.41 | better |
| Time.in.Wake | 2.50 | 0.63 | worse |
| Awakenings | 0.739 | 0.37 | worse |
| Morning.Feel | -0.524 | 0.0067 | worse |
| Time.to.Z | 3.47 | 0.46 | worse |

Morning.Feel jumps out as having a large effect (-0.5 on a 1–3 rating is huge) and, accordingly, a very low p-value, which survives multiple-comparison correction12. Apparently I was waking up feeling like crap on the vitamin D nights.
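To illustrate what surviving a multiple-comparison correction means here, a sketch applying a Holm–Bonferroni step-down correction to the seven p-values in the evening table. Which correction the author actually used is not stated in this section, so Holm is an assumption; with these numbers only Morning.Feel passes (0.0067 × 7 ≈ 0.047 < 0.05), and a plain Bonferroni correction gives the same verdict.

```python
from typing import Dict


def holm_bonferroni(pvalues: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm's step-down correction: True for each hypothesis rejected at family-wise level alpha."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in pvalues}
    for rank, (name, p) in enumerate(ordered):
        if p * (m - rank) <= alpha:   # smallest p is multiplied by m, the next by m - 1, ...
            rejected[name] = True
        else:
            break                     # stop at the first failure; larger p-values also fail
    return rejected


# p-values from the evening-experiment table
evening_p = {
    "Total.Z": 0.084, "Time.in.REM": 0.021, "Time.in.Deep": 0.41, "Time.in.Wake": 0.63,
    "Awakenings": 0.37, "Morning.Feel": 0.0067, "Time.to.Z": 0.46,
}
result = holm_bonferroni(evening_p)
```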

Going back to my predictions after the first 10 days, they’re sort of right:

  1. sleep latency increased, but not statistically-significantly, and only by ~3.5 minutes, less than half the predicted 10 minutes

  2. the increase in awakenings was less than 1 additional awakening (compared to the predicted 1–2) and didn’t reach statistical-significance

My conclusion?

Vitamin D hurts sleep when taken at night. I know of no reason that one would want to take vitamin D late at night, so I will definitely be avoiding it at that time in the future.

VoI

For background on “value of information” calculations, see the first calculation.

I had no opinion going into the first experiment. I actually did sometimes take vitamin D in the evening when I hadn’t gotten around to it earlier (I take it for its anti-cancer and SAD effects). There was no research background, and the anecdotal evidence was of very poor quality. Still, it was plausible since vitamin D is involved in circadian rhythms, so I gave it 50% and decided to run an experiment. What effect would perfect information that it did negatively affect my sleep have? Well, I’d definitely switch to taking it in the morning and would never take it in the evening again, which would change maybe 20% of my future doses. And how bad was the negative effect? It couldn’t be that bad or I would have noticed it already (like I noticed sulbutiamine made it hard to get to sleep). I’m not willing to change my routines very much to improve my sleep, so I would be lying if I estimated that the value of eliminating any vitamin D-related disturbance was more than, say, 10 cents per night; so the total value of affected nights would be 0.10 × 0.20 × 365.25 = $7.3 a year. On the plus side, my experiment design was high quality and ran for a fair number of days, so it would surely detect any sleep disturbance from the randomized vitamin D; say 90% quality of information. This gives (7.3 − 0) / ln 1.05 × 0.90 × 0.50 = $67.3, justifying <9.6 hours. Making the pills took perhaps an hour, recording used up some time, and the analysis took several hours to label & process all the data, play with it in R, and write it all up in a clean form for readers. Still, I don’t think it took almost 10 hours of work, so I think this experiment ran at a profit.
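The arithmetic above can be checked with a short sketch. It treats the yearly gain as a perpetuity discounted at 5% (dividing by ln 1.05), scaled by the 90% information quality and the 50% chance that the result changes my behavior. The $7/hour rate used to convert dollars into hours is a hypothetical figure inferred from 67.3 / 9.6 ≈ 7, not one stated in this section.

```python
import math


def value_of_information(annual_gain: float, discount_rate: float,
                         info_quality: float, prob_change: float) -> float:
    """Perpetuity value of a yearly gain, scaled by information quality and P(decision changes)."""
    present_value = annual_gain / math.log(1 + discount_rate)  # continuous-discounting perpetuity
    return present_value * info_quality * prob_change


annual_gain = round(0.10 * 0.20 * 365.25, 1)  # $0.10/night x 20% of nights x nights/year ~= $7.3
voi = value_of_information(annual_gain, 0.05, 0.90, 0.50)
hours = voi / 7.0  # hypothetical $7/hour valuation of time
print(f"VoI ~= ${voi:.1f}, worth ~{hours:.1f} hours")
```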

Vitamin D at Morn Helps?

Setup

The logical next thing to test is whether there is any benefit to sleep from taking vitamin D in the morning as compared to not taking vitamin D at all, since we have already established that evening is worse than morning. (Besides anecdotes, Seth Roberts reported—after I concluded my experiment—that his own non-blind varying of doses seemed to help his subjective restedness but didn’t influence anything else.) I would expect any benefits in the morning to be attenuated compared to the evening effect: the morning is simply many hours away from going to bed again in the evening, giving time for many events to affect the ultimate sleep. So rather than 40 days split 20/20, this experiment will run for 56 days split 28/28; per Roberts’s suggestion, I will randomize not individual days but 8 blocks of 7 days, arranged in 4 pairs. (Multiple days give any slow effects time to manifest, which seems eminently possible with a fat-soluble vitamin like vitamin D; 7 days, so we don’t ‘cycle around the week’ but instead have exactly the same number of e.g. active Sundays and placebo Sundays, since sleep often varies systematically over the week.)

I prepare 27 placebo pills & 27 actives as before, stored in separate baggies. To randomize the 7-day blocks, I will fill 2 opaque containers, one with 7 placebos and one with 7 actives (with a label on the inside of the active container), and pick a container at random to use for the next 7 days. I will take one each morning upon awakening, closing my eyes. On the 8th morning, the first container will be empty, so I set it aside and open the second; when the second is emptied, I will look inside it to see whether it has the label, which tells me the order, and record whether the 2 weeks were active/placebo or placebo/active. The 2 containers will be refilled as before, and blocks 3–4 will begin. I will do this 4 times, at which point I will analyze the data.
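The container procedure amounts to a paired block randomization. A minimal simulation of the schedule it produces, assuming 4 pairs of 7-day blocks where a fair coin decides which block in each pair is active; `block_schedule` is a hypothetical helper, and in the real experiment the coin flip is the blind choice of container.

```python
import random
from typing import List, Optional


def block_schedule(n_pairs: int = 4, block_days: int = 7,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Daily 'active'/'placebo' schedule: each pair of blocks holds one of each, order randomized."""
    rng = rng or random.Random()
    days: List[str] = []
    for _ in range(n_pairs):
        pair = ["active", "placebo"]
        rng.shuffle(pair)  # which container gets opened first in this pair
        for condition in pair:
            days.extend([condition] * block_days)
    return days


schedule = block_schedule(rng=random.Random(0))
```

Because every pair contains one active and one placebo week, each weekday ends up with exactly 4 active and 4 placebo days, which is the balance the 7-day block length is meant to guarantee.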

Analysis will be the same Zeo parameters as before, but this time augmented by a simple mood indicator: 1–5, with 3 being an ordinary mildly productive day and 1 being ‘my car caught on fire and was totaled’ day (real data-point), recorded at the end of each day just before bed. (I considered a more complex mood indicator, the BOMS, while setting up my lithium experiment, but rejected it as being too heavy-weight for long-term use, and subjectively, my mood doesn’t vary that much.)

Morning Data

  1. Blocks:

    • 17–25F: guess: placebo (last pill used morning 25; swapped jars and consumed pill from second jar the morning of 26); actual: placebo

    • 26F-8M: skipped multiple days for modafinil (omit March 1, 2); actual: active

  2. Blocks:

    • 9M-15M: guess: active actual: placebo

    • 16–25: active (omit March 21)

  3. Blocks:

    • 26M-1A: guess: placebo actual: placebo

    • 2A-8: active

  4. Blocks:

    • 9A-19: (omit April 11, 12) guess: placebo actual: placebo

    • 20–27: active (omit April 21, 22)

Placebo/active was coded as 0/1 in SSCF.1 (footnote 13) in the CSV export. Mood was coded numerically (allowing fractional ratings) in the Mood column.

Morning Analysis

As before, we fire up R and analyze the spreadsheet with the usual assumptions14 about independence of the daily observations. The interpreter session:

zeo <- read.csv("https://gwern.net/doc/zeo/2012-zeo-vitamind-morning.csv")

# an example of the many intercorrelations which make simple t-tests misleading
# and motivate the use of multivariate linear regression:
cor(zeo[c(2,3,5:11, 25)], use="complete.obs")
#               Vitamin.D     Mood  Total.Z Time.to.Z Time.in.Wake Time.in.REM Time.in.Light
# Vitamin.D      1.000000 -0.06210  0.01007 -0.004528     -0.14399     0.01844      -0.02043
# Mood          -0.062097  1.00000  0.03038 -0.229114      0.13365    -0.05137       0.06783
# Total.Z        0.010067  0.03038  1.00000 -0.388734     -0.05258     0.77338       0.82402
# Time.to.Z     -0.004528 -0.22911 -0.38873  1.000000      0.17821    -0.29690      -0.28948
# Time.in.Wake  -0.143987  0.13365 -0.05258  0.178211      1.00000    -0.12396       0.15893
# Time.in.REM    0.018437 -0.05137  0.77338 -0.296904     -0.12396     1.00000       0.35087
# Time.in.Light -0.020427  0.06783  0.82402 -0.289484      0.15893     0.35087       1.00000
# Time.in.Deep   0.054670  0.05648  0.57647 -0.299816     -0.35438     0.37922       0.24574
# Awakenings    -0.074435  0.09076  0.07645  0.142952      0.67797     0.04007       0.21834
# Morning.Feel   0.053450  0.11313  0.62368 -0.285966     -0.04032     0.56241       0.51081
#               Time.in.Deep Awakenings Morning.Feel
# Vitamin.D          0.05467   -0.07444      0.05345
# Mood               0.05648    0.09076      0.11313
# Total.Z            0.57647    0.07645      0.62368
# Time.to.Z         -0.29982    0.14295     -0.28597
# Time.in.Wake      -0.35438    0.67797     -0.04032
# Time.in.REM        0.37922    0.04007      0.56241
# Time.in.Light      0.24574    0.21834      0.51081
# Time.in.Deep       1.00000   -0.28355      0.22280
# Awakenings        -0.28355    1.00000      0.02151
# Morning.Feel       0.22280    0.02151      1.00000

l <- lm(cbind(Total.Z,Time.in.REM,Time.in.Deep,Time.in.Wake,Awakenings,Morning.Feel,Time.to.Z,Mood)
         ~ Vitamin.D, data=zeo)
summary(manova(l))
#           Df Pillai approx F num Df den Df Pr(>F)
# Vitamin.D  1 0.0363    0.213      9     51   0.99
summary(l)
# Response Total.Z :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   525.21      10.06   52.20   <2e-16
# Vitamin.D       1.07      13.89    0.08     0.94
#
# Response Time.in.REM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  162.172      4.711   34.42   <2e-16
# Vitamin.D      0.921      6.505    0.14     0.89
#
# Response Time.in.Deep :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)    65.34       2.53   25.85   <2e-16
# Vitamin.D       1.47       3.49    0.42     0.68
#
# Response Time.in.Wake :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)    27.76       3.10    8.94  1.4e-12
# Vitamin.D      -4.79       4.29   -1.12     0.27
#
# Response Awakenings :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)    8.000      0.592   13.51   <2e-16
# Vitamin.D     -0.469      0.818   -0.57     0.57
#
# Response Morning.Feel :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   2.8276     0.1386   20.40   <2e-16
# Vitamin.D     0.0787     0.1913    0.41     0.68
#
# Response Time.to.Z :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   25.448      2.827    9.00  1.1e-12
# Vitamin.D     -0.136      3.904   -0.03     0.97
#
# Response Mood :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   3.0931     0.1127   27.45   <2e-16
# Vitamin.D    -0.0744     0.1556   -0.48     0.63
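With a single 0/1 predictor like `Vitamin.D`, each response in `lm(cbind(...) ~ Vitamin.D)` gets its own least-squares fit, and those coefficients reduce to simple group means: the intercept is the placebo-day mean and the slope is the active-minus-placebo difference. A minimal Python sketch of that per-response fit, with a hypothetical function name, assuming each group has at least two observations:

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def dummy_regression(y: Sequence[float], x: Sequence[int]) -> Tuple[float, float, float]:
    """OLS of y on an intercept and a 0/1 indicator x; returns (intercept, slope, slope_se)."""
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    intercept = mean(y0)           # fitted value on placebo (x = 0) days
    slope = mean(y1) - intercept   # difference in group means
    # pooled residual variance with n - 2 degrees of freedom, as in a one-predictor lm()
    ss = (len(y0) - 1) * variance(y0) + (len(y1) - 1) * variance(y1)
    sigma2 = ss / (len(y) - 2)
    se = sqrt(sigma2 * (1 / len(y0) + 1 / len(y1)))
    return intercept, slope, se
```

Dividing the slope by its standard error gives the `t value` column in output like the above.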

The MANOVA suggests no statistically-significant difference between days (p = 0.99), and no variables seem to have changed much:

| Variable | Effect | p-value | Coefficient’s sign is… |
|---|---|---|---|
| Total.Z | 1.07 | 0.94 | better |
| Time.in.REM | 0.92 | 0.89 | better |
| Time.in.Deep | 1.47 | 0.68 | better |
| Time.in.Wake | -4.79 | 0.27 | better |
| Awakenings | -0.47 | 0.57 | better |
| Morning.Feel | 0.08 | 0.68 | better |
| Time.to.Z | -0.14 | 0.97 | better |
| Mood | -0.07 | 0.63 | worse |

All the changes are junk, including ones I was fairly sure would change, like ‘Time to Z’ or ‘Mood’. (An earlier version of this analysis found a statistically-significant effect increasing ‘Morning Feel’, but this turns out to be due to the t-tests’ assumption that variables were not correlated, and the multivariate linear regression reduces the effect to non-statistical-significance.) ‘Mood’ arguably was affected by an exogenous event—my car burning ruined that particular week. Graphing the raw data, I notice that when my car burned, my ‘Mood’ takes a clearly visible fall for a week, while my sleep looks like it was affected less—it seems that during that period, waking up was literally the best part of the day…

Day Mood graphed against date/experimental-status


Morning Feel over the experiment (colors indicate placebo or active)15

I conclude that the vitamin D in the morning did not damage any of the measured variables, unlike the vitamin D in the evening.

Control Quality Control

Like with melatonin, we might wonder: is taking vitamin D causing effects on the control days as well? With melatonin, the concern I often hear voiced is whether melatonin might in some way be ‘addictive’ or suppress normal melatonin secretion, in which case the observed difference between control and experimental days—which we interpreted as improvement—may actually be the opposite, a negative effect caused by a sort of ‘withdrawal’ (lowered melatonin secretion levels, since the body has not yet adapted to the absence of melatonin supplements and will not when supplementation resumes the next day).

In the case of vitamin D, I find the results (no effect on anything except ‘Morning Feel’) sufficiently surprising that I wonder whether this fat-soluble vitamin was causing effects over periods even longer than a week, and whether the true results were that both control and experimental weeks were better than unsupplemented weeks, with ‘Morning Feel’ the only variable which reacted fast enough to show up as a difference. The previously-mentioned August 2012 report of Chris L that an increase of 1k IU in his vitamin D supplementation reduced his deep sleep with month-long lags reinforces my suspicion: with such a long lag, any reduction in my deep sleep would go unnoticed. A completely “dry” multi-month long control group is necessary.

The simplest solution, although I don’t know if it’s statistically correct, is to drop the vitamin D or melatonin for a long enough period that any long-term effects should have disappeared, and then compare this abstention period to the supposed “control” weeks. If the abstention weeks are worse than the control weeks, then this supports the long-term interpretation; if the abstention weeks are similar to the control weeks, then we can eliminate the long-term interpretation; and if the abstention weeks are better than the control weeks, then we ought to be puzzled and start thinking about other possibilities. (Not enough data/power? Misinterpreted results? Or, the original morning experiment was in spring, while the abstention periods were summer/autumn—does sleep get worse in summer, perhaps due to heat?)

I won’t bother with blinding this one since it’s just a double-check of an unlikely possibility. (If one wanted to blind it, the procedure would be the same as before, but with big blocks: say, 2 blocks of 62 days, first pick randomized, or blocks of 31 days, with 4 blocks randomized in 2 pairs.) This ‘experiment’ is easy enough to run: simply stop taking vitamin D. To avoid the temptation to cheat on days I am feeling down, it’s easiest to just wait until I run out of vitamin D and procrastinate on ordering a fresh supply until a bunch of days have passed.

The vitamin D experiment terminated in April; the last day of vitamin D was 2012-07-02; and I resumed 2012-09-06 with the end of the dataset being 2012-10-31.

Analysis

The question is simple: does the ‘Morning Feel’ differ between the control days in the original vitamin D morning experiment and the vitamin-less days of the later, sustained abstention period? Was there something funky about the original control days? Was there some sort of vitamin D bleed-over, or maybe some sort of long-term effect which we could describe as ‘contamination’ or ‘dependency’?

The short answer is: no. When we compare the two groups of days, the ‘Morning Feel’ ratings have essentially identical means, as we expected.

A Bayesian MCMC analysis16 (using the BEST library) produces the following graphical summary, which shows the two groups almost completely overlapping on means, with the key graph in the lower-right corner: there is no visible effect size at all (centered on 0), much less an effect size of d ≥ 0.1 which we might take seriously as indicating a real difference:

Bayesian t-test of experimental vitamin-less control days and vitamin-less baseline on morning ratings.


More precisely, the summary statistics indicate that the difference in means & medians is usually -0.03 (negligibly small), the full range of effect size estimates is -0.4678744 to 0.4142259, and 44.4% of the possibilities were simply zero effect size.

(I did a non-parametric test as well: p = 0.710317.)
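The non-parametric test behind the p = 0.71 figure is not specified in this section. As a swapped-in alternative, not the author's method, here is a sketch of a two-sided permutation test on the difference in mean ‘Morning Feel’ between two groups of days; the group data would have to come from the Zeo export, which is not reproduced here.

```python
import random
from statistics import mean
from typing import Optional, Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 10_000, rng: Optional[random.Random] = None) -> float:
    """Two-sided permutation p-value for the difference in means between groups a and b."""
    rng = rng or random.Random()
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel days at random, keeping group sizes fixed
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)  # add-one so the p-value is never exactly 0
```

A large p-value here, as with the reported test, means the relabeled groups differ about as much as the real ones do.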

VoI

For background on “value of information” calculations, see the first calculation.

With the vitamin D theory partially vindicated by the previous experiment, I became fairly sure that vitamin D in the morning would benefit my sleep somehow: 70%. Benefit how? I had no idea, it might be large or small. I didn’t expect it to be a second melatonin, improving my sleep and trimming it by 50 minutes, but I hoped maybe it would help me get to sleep faster or wake up less. The actual experiment turned out to show, with very high confidence, no bad change (and a good change in my mood upon awakening in the morning).

What is the “value of information” for this experiment? Essentially—zero:

  1. If the experiment had shown any benefit, I would have continued taking it in the morning.

  2. If the experiment had shown no effect, I would have continued taking it in the morning to avoid incurring the evening penalty discovered in the previous experiment.

  3. If the experiment had shown the unthinkable (a negative effect), the effect would have had to be substantial to convince me to stop taking vitamin D altogether and forfeit its many other apparent health benefits, and it’s not worth analyzing an outcome to which I would have given a ≤5% chance.

So, since I supplemented vitamin D before the experiment, during it, and still do, why bother? But of course, I did it because it was cool and interesting! (Estimated time cost: perhaps half that of the evening experiment, since I had less data to record manually and already had the analysis worked out.)
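The argument above is the textbook condition for a zero expected value of perfect information (EVPI): the expected payoff of picking the best action after learning the outcome, minus the payoff of the best action picked on the prior alone. A minimal sketch in R; the 70% benefit and 5% harm figures follow the estimates above (with the remaining 25% assigned to no effect), but the utilities in `U` are hypothetical placeholders invented purely for illustration:

```r
# Expected value of perfect information (EVPI) for a small decision table.
# Rows are actions, columns are outcomes; entries are utilities.
evpi <- function(U, p) {
  with_info    <- sum(p * apply(U, 2, max))  # learn the outcome first, then pick the best action
  without_info <- max(U %*% p)               # pick the single best action on the prior alone
  with_info - without_info
}

# Prior: 70% benefit, 5% harm (per the text), remaining 25% no effect.
p <- c(benefit = 0.70, null = 0.25, harm = 0.05)

# HYPOTHETICAL utilities, invented for illustration only.
U <- rbind(morning = c(3, 2, 1.5),   # morning dosing stays best even under mild harm
           evening = c(1, 1, 1),
           stop    = c(0, 0, 0))
colnames(U) <- names(p)
evpi(U, p)   # zero (up to floating-point rounding): no outcome changes the decision

# Make harm severe enough that another action wins in that outcome:
U_severe <- U
U_severe["morning", "harm"] <- -5
evpi(U_severe, p)   # now positive: the experiment could change what I do
```

Whatever utilities are plugged in, EVPI is zero exactly when the same action is optimal in every outcome column, which is the situation the three cases above describe.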


  1. “Sleep Behavior Disorders in a Large Cohort of Chinese (Taiwanese) Patients Maintained by Long-Term Hemodialysis” (Chen et al 2006):

    …The increased odds of high PSQI score for greater hemoglobin level and for high ESS score for use of vitamin D analogues were unexpected results for which we cannot speculate about the cause or association and that may simply be spurious findings arising from statistical analysis.

    ↩︎
  2. “Relationships among dietary nutrients and subjective sleep, objective sleep, and napping in women” (Grandner et al 2010):

    This study found a [statistically-]significant relationship between circadian phase of sleep and dietary Vitamin D intake. Later sleep acrophase, an indicator of sleep timing, was associated with more dietary Vitamin D. For most people, most Vitamin D is obtained through sunlight(44), though dietary Vitamin D is usually obtained through supplementation, usually in pills or in dairy products(44). It is currently unknown why those who consumed more Vitamin D would demonstrate a sleep phase delay, especially since in this same subject group, those exposed to more light had earlier circadian acrophases(45).

    ↩︎
  3. “The midpoint of sleep is associated with dietary intake and dietary behavior among young Japanese women” (Sato-Mito et al 2011):

    Late midpoint of sleep was [statistically-]significantly negatively associated with the percentage of energy from protein and carbohydrates, and the energy-adjusted intake of cholesterol, potassium, calcium, magnesium, iron, zinc, vitamin A, vitamin D, thiamin, riboflavin, vitamin B(6), folate, rice, vegetables, pulses, eggs, and milk and milk products.

    ↩︎
  4. “Low vitamin D levels in adults with longer time to fall asleep: US NHANES, 2005–2006”, Shiue 2013:

    …Table 2 shows associations of serum 25(OH)D concentrations and sleep characteristics. After adjusting for age, sex, ethnicity, high blood pressure, body mass index, active smoking, depressive symptoms, and survey weighting, no association between serum 25(OH)D concentrations and sleeping hours was observed (beta 0.19, 95% CI −0.40 0.77, p = 0.51) while a statistically-significant inverse association was found between serum 25(OH)D concentrations and minutes to fall asleep (beta −3.13, 95% CI −5.62 to −0.64, p = 0.02). Moreover, people with higher vitamin D levels could be more likely to complain sleep problems (OR 1.60, 95% CI 1.20 to 2.14, p = 0.004)….It was observed that serum 25(OH)D concentrations were statistically-significantly associated with minutes to fall asleep, indicating that people with lower vitamin D levels tended to have longer time to fall asleep. On the other hand, it was also observed that people with higher vitamin D levels had more sleep complaints, although the reason is unclear.

    ↩︎
  5. The problem was the original vitamin D3 capsule: I couldn’t squeeze out all the oil, so I settled for squeezing out most, and then pushing the original capsule into the new capsule. So they contain everything they should, but they have a visible ‘bubble’ inside them (the original capsule). Hence, the need for literal blinding. Otherwise, they’re pretty good: identical shape and weight.↩︎

  6. See the general remarks in LiveStrong, “Vitamin D warning: Too much can harm your heart”, and the 2009 study “Relation of serum 25-hydroxyvitamin D to heart rate and cardiac work (from the National Health and Nutrition Examination Surveys)”.↩︎

  7. For ‘Quality’ & ‘ZQ’: higher = better↩︎

  8. Headband came loose at some point, data useless↩︎

  9. Headband came loose at some point, data useless↩︎

  10. The preponderance of True is because while recording the scores, I normalized them; in retrospect, I shouldn’t’ve bothered:

    logBinaryScore = sum . map (\(result,p) -> if result then 1 + logBase 2 p else 1 + logBase 2 (1-p))
    logBinaryScore [(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),
                    (True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.55),(True,0.55),(True,0.55),
                    (True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),
                    (True,0.60),(True,0.65),(True,0.65),(True,0.65),(True,0.65),(True,0.65),(True,0.65),
                    (True,0.65),(True,0.65),(True,0.70),(True,0.70),(True,0.70),(True,0.70),(True,0.75),
                    (True,0.75),(False,0.55),(False,0.6),(False,0.6),(False,0.7),(False,0.7),(False,0.75)]
    5.4
    ↩︎
  11. The usual session:

    zeo <- read.csv("https://gwern.net/doc/zeo/2012-zeo-vitamind.csv")
    colnames(zeo)[26] <- "Vitamin.D"
    l <- lm(cbind(Total.Z, Time.in.REM, Time.in.Deep, Time.in.Wake,
                  Awakenings, Morning.Feel, Time.to.Z)
              ~ Vitamin.D, data=zeo)
    summary(manova(l))
    #           Df Pillai approx F num Df den Df Pr(>F)
    # Vitamin.D  1   0.31     2.12      7     33   0.07
    # Residuals 39
    summary(l)
    # Response Total.Z :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)   533.37       8.16   65.37   <2e-16
    # Vitamin.D     -19.73      11.14   -1.77    0.084
    #
    # Response Time.in.REM :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)   175.63       4.44    39.5   <2e-16
    # Vitamin.D     -14.54       6.07    -2.4    0.021
    #
    # Response Time.in.Deep :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)    55.00       2.04   26.98   <2e-16
    # Vitamin.D       2.32       2.78    0.83     0.41
    #
    # Response Time.in.Wake :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)    26.32       3.83    6.88  3.2e-08
    # Vitamin.D       2.50       5.22    0.48     0.63
    #
    # Response Awakenings :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)    7.579      0.598    12.7  2.1e-15
    # Vitamin.D      0.739      0.817     0.9     0.37
    #
    # Response Morning.Feel :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)    2.842      0.134   21.21   <2e-16
    # Vitamin.D     -0.524      0.183   -2.86   0.0067
    #
    # Response Time.to.Z :
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)    17.58       3.43    5.12  8.6e-06
    # Vitamin.D       3.47       4.69    0.74     0.46
    ↩︎
  12. Correcting for multiple comparisons with Benjamini-Hochberg at q-value=0.05, of our 7 pessimistic p-values, 1 survives:

    p.adjust(c(0.084,0.021,0.41,0.63,0.37,0.0067,0.46), method="BH") < 0.05
    # [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

    Remarkable: the first time a p-value survived. (That was the Morning.Feel one.)↩︎

  13. I originally input the data as ‘Other Disruptions 4’ through the Zeo web interface, since I assumed that if ‘Other Disruptions 3’ was SSCF.12, that would put the data into SSCF.13—but it turns out that does not get exported in the CSV! Apparently the CSV is limited to 1–3. So I edited the exported CSV and just reused SSCF.1. Hopefully Zeo Inc. will fix the export functionality, since it’s very frustrating to be able to see the data used in the ‘Cause & Effect’ tool, for example, but not export it.↩︎

  14. Gustavo Lacerda wondered if the two-sample t-test (or linear regressions in general) were really justifiable to use: could days be correlated, in which case the p-values would be overstated and my results actually weaker than they look? He suggested testing my full Zeo dataset to see whether Morning Feel can be predicted from day to day by a (relatively) simple lag-1 autoregression, regressing each recorded day on the previous one:

    zeo <- read.csv("https://gwern.net/doc/zeo/gwern-zeodata.csv")
    ## Master Zeo export file is periodically updated; your results may not be identical
    n <- length(zeo$Morning.Feel); n
    # [1] 1050
    reg <- lm(Morning.Feel[2:n] ~ Morning.Feel[1:(n-1)], data=zeo)
    summary(reg)
    # Coefficients:
    #                         Estimate Std. Error t value Pr(>|t|)
    # (Intercept)               2.5727     0.0943    27.3   <2e-16
    # Morning.Feel[1:(n - 1)]   0.0689     0.0329     2.1    0.036
    #
    # Residual standard error: 0.771 on 918 degrees of freedom
    #   (129 observations deleted due to missingness)
    # Multiple R-squared:  0.00476,   Adjusted R-squared:  0.00368
    # F-statistic: 4.39 on 1 and 918 DF,  p-value: 0.0364
    
    ## Given that pretty much all the ratings are 2, 3, or 4, and the r^2 is <0.01
    ## with a residual error of 0.77, that doesn't seem very correlated,
    ## although the _p_ does indicate there's a real (but very small) correlation from
    ## day to day, so I guess the p-values may be a *little* overstated
    
    cor(zeo$Morning.Feel[2:n], zeo$Morning.Feel[1:(n-1)], use = "complete.obs")
    # [1] 0.069
    
    ## we can also graph the lags:
    acf(zeo$Morning.Feel, na.action=na.pass,
           main="Do days predict subsequent days at various temporal distances?")
    
    ## incidentally - 129 observations missing? What's going on?
    zeo$Morning.Feel
    #    [1] NA  2  3  3  4  3  3  2 NA NA  4  4 NA  3 NA  2  4  4 NA  4  3  3  3  4  2  3  2  3 NA  3 NA
    #   [32] NA  4 NA  4 NA NA NA NA NA NA NA NA NA NA NA NA NA  3  4 NA NA  4  4  3  4 NA NA NA NA NA NA
    #   [63] NA  4 NA  2  3  3 NA NA  3 NA  3  3 NA  2 NA NA NA NA  3 NA NA NA NA NA NA NA  3  4 NA  4  3
    #   [94]  3  3  4  4  3  3  3  2  3  3  2  3  3  3  2 NA  3  3  4  3 NA  3 NA  3 NA  3  3  3 NA  3  3
    #  [125] NA NA NA NA NA  2 NA NA  3  2  3 NA NA NA NA NA NA  3  2  3  2  2  2  2  2  3  3  3  3 NA  3
    #  [156]  3  2  2  3  3  2  3  2  3 NA  2 NA NA  4  3  3  3  2  3 NA  4  3  2  3  3  3  3  3  3  4  3
    #  [187]  4  3  3  3  3  3  2  3  2  3  3  3 NA  3  1  4 NA  3  2  4  4  2  2  3  3  3  3  3  3  3  3
    #  [218]  3  3  4  3  3  2  2  3  3  2  3  3  3  2  2  3  3  3  3  3  4  3  3  2  2  2  1  2  3  3 NA
    #  [249]  3  3  3  3  3  3  3  3  2  3  2  3  2  3  3  3  2  3  3  2  3  3  3  3  4  3  3  4  3  4  2
    #  [280]  3 NA  3  3  2  2  2  3  3  3  3  2  3  3  2  2  2  3  3  2  2  3  2  3  3  3  3  3  3  2  3
    #  [311]  3  2  1  3  4  3  2  3  3  2  2  3  3  3  1  2 NA  2  3  2  2  3  3  2  3  3 NA  3 NA  3  3
    #  [342]  2  3  2  2  3  3  3  3  1  3  3  3  2  1  3 NA  2  3  3  3  3  2  1  2  2  3  2  2  3  3  3
    #  [373]  3  3  4  3  2  3  3  3  2  2  3 NA  3  2  3  4  4  3  3  2  4  3  2  3  3  4  3  4  3  3 NA
    #  [404]  2  2  3  3  3  4  4  3  1  3  3  2  4  3  3  3  2  3  2  4  2  4  3  3  3  4 NA  2  3  3  3
    #  [435]  3  2  1  2  2  3  2  3  1  4  3  3  4  3  3  2  2  2  2  3  1  3  3  3  4  3  3  2  3  3  4
    #  [466]  4  2  2  3  3  2  2  4  3  3  3  2  3  2  2  3  2  3  2  3  2  3  2  3  2  3  3  3  2  3  3
    #  [497]  2  3  1  2  3  3  3  3  2  2  3  3  1  3  2  3  3  4  1  3  4  1  4  3  4  3  3  2  3  2 NA
    #  [528]  3  4  2  4  3  3  3  4  4  1  3  2  3  3  3  2  3  4  3  3  2  3  3  3  4  2  2  2  3  3  3
    #  [559]  4  4  1  3  3  3  4  3  4  3  3  1  1  2  3  2  3  3  4  3  3  3  2  2  3  4  4  1  4  4  3
    #  [590]  4  3  3  3  3  3  2  3  3  2  3  3  2  3  4  2  2  3  1  3  3  2  3  3  2  2  3  4  3  2  1
    #  [621]  3  3  3  3  2  4  2  3  3  3  3  4  3  3  3 NA  3 NA  4  3  2  2  2  2  3  3  3  4  3  2  3
    #  [652]  2  3  3  1  3  4  3  3  4  4  4  2  3  2  1  4  2  4  3  2  3  3  3  3  2  3  4  2  2  2  2
    #  [683]  3  4  3  4  2  2  3  4  2  3  3  3  2  2  2  3  2  2  2  4  3  3  3  2  2  1  2  4  3  3  3
    #  [714]  3  3  2  2  2  3  3  3  3  1  1  2  3  3  4  3  3  3  4  3  4  3  3  3  3  3  3  3  2  2  2
    #  [745]  2  3  2  3  3  2  1  3  3  2  3  3  3  3  2  3  4  4  2  3  3  4  4  2  4  4  4  3  3  3  1
    #  [776]  3  3  2  3  3  4  4  3  1  4  4  4  3  3  3  2  1  2  2  3  3  3  2  4  3  2  4  3  3  4  4
    #  [807]  1  2  3  2  3  4  2  3  4  2  4  2  3  3  2  3  2  3  3  3  2  3  2  2  3  4  2  0  3  2  2
    #  [838]  1  3  3  4  4  3  2  3  2  3  3  2  1  2  3  3  1  0  3  3  2  3  2  3  3  3  2  3  3  2  2
    #  [869]  3  2  3  2  3  3  3  0  2  3  2  2  2  2  2  3  3  3  2  3  2  3  3  2  2  3  4  3  3  3  2
    #  [900]  3  3  3  3  4  2  3  3  2  3  0  1  3  2  3  3  3  2  2  3  3  3  3  3  2  2  3  4  0  3  3
    #  [931]  3  2  3  4  2  3  3  3  3  3  4  2  3  3  2  3  2  3  4  4  3  3  1  3  4  3  0  3  4  3  3
    #  [962]  4  2  2  3  1  2  4  4  3  3  3  2  3  0  3  4  3  2  4  2  3  0  3  3  3  2  4  2  3  3  2
    #  [993]  3  3  3  3  3  3  4  3  4  3  3  3  4  3  3  3  2  3  3  3  2  2  3  3  4  3  4  2  3  3  3
    # [1024]  3  3  2  3  2  3  3  3  3  3  3  3  3  4  4  3  3  3  0  4  3  2  2  3  3  3  2
    ## ah, I just wasn't good about recording "Morning Feel" early on, and since then
    ## there have been occasional slips (literally, with the headband)

    Gustavo comments:

    And by the way, instead of regressing Morning.Feel[n] on Drug[n] (a discrete variable taking values in {0,1}), it would make more sense to regress on an Exponentially-Weighted Moving Average of Drug, such as Drug[n-1] + (1⁄2 × Drug[n-2]) + (1⁄4 × Drug[n-3]) + …, which is modeling how much drug is present on the body. In the above example, I’m assuming a half-life of 1 day, so lambda=1⁄2. You could arguably select the lambda that gives you the best fit; just be wary of multiple testing.

    (These days I would fit an ARIMA in brms for that.)↩︎

  15. Code written by Ben Wieland:

    library(ggplot2)
    sleep <- read.csv("https://gwern.net/doc/zeo/2012-zeo-vitamind-morning.csv")
    
    qplot(as.Date(Sleep.Date,format="%m/%d/%Y"), weight=Mood, data=sleep, geom="bar",
        binwidth=1, fill=factor(sleep$SSCF.1, labels=c("placebo","active")),
        ylab="Mood", xlab="Date")+scale_fill_discrete(name="treatment")
    
    # to save:
    ggsave("mood.png")
    
    qplot(as.Date(Sleep.Date,format="%m/%d/%Y"), weight=Morning.Feel, data=sleep,
        geom="bar", binwidth=1, fill=factor(sleep$SSCF.1, labels=c("placebo","active")),
        ylab="Morning Feel", xlab="Date")+scale_fill_discrete(name="treatment")
    
    ggsave("morning.feel.png")
    ↩︎
  16. The BEST analysis is powerful and provides much more information than a simple t-test would, but the various parameters in the table or the image are not self-explanatory; the curious should read “Bayesian estimation supersedes the t test” (Kruschke 2012).

    In the CSV, an SSCF.1 of 0 indicates membership in the original experiment, 1 indicates the dry period July-September, 2 indicates the vitamin D resumption post-original-experiment, and 3 indicates the vitamin D resumption post-September. So:

    # set up data
    mydata <- read.csv("https://gwern.net/doc/zeo/2012-zeo-vitamind-morning-control.csv")
    originalcontrol <- subset(mydata, SSCF.1==0)
    newcontrol <- subset(mydata, SSCF.1==1)
    # clean missing data
    originalcontrol <- originalcontrol$Morning.Feel[!is.na(originalcontrol$Morning.Feel)]
    newcontrol <- newcontrol$Morning.Feel[!is.na(newcontrol$Morning.Feel)]
    # run BEST MCMC group estimations
    source("BEST.R")
    mcmc = BESTmcmc(originalcontrol, newcontrol)
    BESTplot(originalcontrol, newcontrol, mcmc, TRUE, ROPEeff=c(-0.1,0.1))
    #            SUMMARY.INFO
    # PARAMETER          mean      median        mode     HDIlow     HDIhigh pcgtZero
    #   mu1        2.82199912  2.82184675  2.82109419  2.5425634   3.1008251       NA
    #   mu2        2.84712376  2.84744246  2.84233569  2.6205415   3.0777439       NA
    #   muDiff    -0.02512464 -0.02542602 -0.03361140 -0.3874754   0.3339228 44.43593
    #   sigma1     0.72900731  0.71760315  0.69447083  0.5330477   0.9474278       NA
    #   sigma2     0.88825472  0.88350888  0.87346099  0.7192899   1.0690516       NA
    #   sigmaDiff -0.15924742 -0.16410108 -0.17383105 -0.4269052   0.1171290 12.08159
    #   nu        41.98417254 33.62743916 17.74077514  3.2649758 104.0648983       NA
    #   nuLog10    1.51048794  1.52669380  1.57284008  0.8699835   2.1138309       NA
    #   effSz     -0.03198943 -0.03143175 -0.04438195 -0.4678744   0.4142259 44.43593
    ↩︎
  17. As usual:

    mydata <- read.csv("https://gwern.net/doc/zeo/2012-zeo-vitamind-morning-control.csv")
    originalcontrol <- subset(mydata, SSCF.1==0)
    newcontrol <- subset(mydata, SSCF.1==1)
    ## wilcox.test drops missing ratings automatically
    wilcox.test(originalcontrol$Morning.Feel, newcontrol$Morning.Feel)
    #
    #     Wilcoxon rank sum test with continuity correction
    #
    # data:  originalcontrol$Morning.Feel and newcontrol$Morning.Feel
    # W = 886, p-value = 0.7103
    ↩︎
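Gustavo Lacerda’s suggestion in footnote 14, regressing on an exponentially weighted moving average of past doses rather than on a same-day indicator, is easy to compute. A minimal sketch in R, assuming the dose column is a 0/1 vector with one entry per consecutive night and no gaps (missing nights would need filling in first); `ewma_dose` is a hypothetical helper, and the default `lambda = 0.5` corresponds to his assumed 1-day half-life:

```r
# Hypothetical helper: exponentially weighted "amount of drug still present".
# dose[n] = drug[n-1] + lambda * drug[n-2] + lambda^2 * drug[n-3] + ...
# Assumes `drug` is 0/1, one entry per consecutive night, with no NAs.
ewma_dose <- function(drug, lambda = 0.5) {
  dose  <- numeric(length(drug))
  carry <- 0                        # weighted sum of all doses up to the previous night
  for (i in seq_along(drug)) {
    dose[i] <- carry                # only earlier nights contribute to night i
    carry   <- drug[i] + lambda * carry
  }
  dose
}

ewma_dose(c(1, 0, 0, 1))  # [1] 0.00 1.00 0.50 0.25
```

With the footnote 11 data, one could then try something like `lm(Morning.Feel ~ ewma_dose(Vitamin.D), data = zeo)` for a few values of `lambda`, keeping his warning about multiple testing in mind.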