
How to Compute Realized Volatility in Python

Alphanume Team · June 4, 2026

Close-to-close and range-based estimators.

Volatility is the input every options model, position-sizer, and risk manager needs, yet the number almost everyone uses, a simple rolling standard deviation of returns, ignores most of the price information in an OHLC bar. This tutorial shows how to compute realized volatility in Python: it starts with the close-to-close estimator that most practitioners learn first, then moves to the range-based estimators (Parkinson, Garman-Klass, and Rogers-Satchell) that extract more signal from the same OHLC bar. All examples run on a standard pandas OHLC DataFrame. If you need a realized vol figure to feed an options pricing calculator, the annualized output of any estimator here drops in directly.

Setting up: the close-to-close baseline in Python

The canonical close-to-close estimator treats each daily log return as one observation and computes their rolling standard deviation. Log returns are the right choice — not percentage returns — because they are additive across time and their distribution is closer to Gaussian. Annualizing by multiplying by sqrt(252) converts the per-day figure to a per-year figure on the same scale used by options markets.

import numpy as np
import pandas as pd


def rolling_realized_vol(prices: pd.Series, window: int = 21) -> pd.Series:
    """Close-to-close realized volatility, annualized.

    Parameters
    ----------
    prices : pd.Series
        Daily closing prices, sorted oldest-first.
    window : int
        Rolling lookback in trading days (default 21 ~ one month).

    Returns
    -------
    pd.Series
        Annualized realized volatility as a decimal (0.20 == 20 %).
    """
    log_rets = np.log(prices / prices.shift(1))
    daily_std = log_rets.rolling(window).std(ddof=1)
    return daily_std * np.sqrt(252)

The ddof=1 argument applies Bessel's correction, which is also the default for pandas.Series.std, and is appropriate when your window is a sample of a longer process rather than the full population. For a 21-day window the correction changes the estimate by only about 2.5 %, but it is worth being explicit. The first `window` values of the result are NaN: one because the first log return is undefined, and the rest because the rolling window is not yet full. That is exactly what you want: no false precision during the burn-in period.
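As a quick check of the burn-in behaviour, the sketch below repeats the same steps inline on a made-up price series and counts the NaN values. The prices and the name `demo_prices` are illustrative assumptions, not data from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical price series: 30 business days of made-up closes.
rng = np.random.default_rng(0)
demo_prices = pd.Series(
    100 * np.exp(np.cumsum(0.01 * rng.standard_normal(30))),
    index=pd.bdate_range("2024-01-02", periods=30),
)

window = 21
# Same steps as rolling_realized_vol: log returns, rolling std, annualize.
log_rets = np.log(demo_prices / demo_prices.shift(1))
demo_vol = log_rets.rolling(window).std(ddof=1) * np.sqrt(252)

# One NaN comes from shift(1); the rolling window then needs `window`
# valid returns, so the first `window` entries are NaN in total.
n_burn_in = int(demo_vol.isna().sum())
print(f"NaN rows: {n_burn_in}, usable rows: {len(demo_vol) - n_burn_in}")
```

With 30 prices and a 21-day window, only the last 9 rows carry an estimate, which is worth remembering when sizing a backtest history.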

Why range-based estimators are more efficient

A daily OHLC bar contains four prices, but the close-to-close estimator uses only two of them — yesterday's close and today's close. The high and low encode information about the intraday path that the bar took, and ignoring them is wasteful. Range-based estimators exploit this structure. Under the assumption that prices follow a continuous diffusion, the range of a bar — the distance between high and low — is a much more precise proxy for volatility than the endpoint-to-endpoint distance alone.

The efficiency gain is substantial. Parkinson (1980) showed that, for a driftless diffusion, the high-low range estimator achieves roughly the same precision as the close-to-close estimator with about one fifth as many observations. Garman and Klass (1980) pushed further by combining the range with the open-to-close return, reporting an efficiency of roughly 7.4 under the same assumptions. Rogers and Satchell (1991) removed the assumption that the drift is zero, making their estimator valid when a stock is trending. Under those idealized assumptions all three are several times more efficient than close-to-close for the same window length, which means you can use a shorter, more responsive window without accepting more noise. Overnight gaps and discretely sampled prices in real data can narrow that advantage.
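A small Monte Carlo sketch can make the efficiency claim concrete. It assumes a driftless intraday random walk sampled at 390 steps per day (an arbitrary choice standing in for one-minute bars) and compares the sampling variance of two per-day variance estimates: the squared open-to-close return and the Parkinson range estimate. The exact ratio depends on the seed and the step count, so treat the printed numbers as indicative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, steps = 2000, 390          # assumed: 390 intraday steps per day
sigma_daily = 0.25 / np.sqrt(252)  # assumed true daily volatility

# Driftless intraday log-price paths, one row per day, each starting at 0.
incr = sigma_daily / np.sqrt(steps) * rng.standard_normal((n_days, steps))
paths = np.cumsum(incr, axis=1)

# Per-day variance estimates computed from the same simulated bars.
close_to_close = paths[:, -1] ** 2           # squared open-to-close return
hi = np.maximum(paths.max(axis=1), 0.0)      # include the open in the range
lo = np.minimum(paths.min(axis=1), 0.0)
parkinson = (hi - lo) ** 2 / (4.0 * np.log(2))

# Efficiency: how much smaller the range estimator's sampling variance is.
efficiency = close_to_close.var(ddof=1) / parkinson.var(ddof=1)
print(f"true daily variance:      {sigma_daily ** 2:.3e}")
print(f"mean close-to-close est.: {close_to_close.mean():.3e}")
print(f"mean Parkinson est.:      {parkinson.mean():.3e}")
print(f"variance ratio (c2c / Parkinson): {efficiency:.1f}")
```

The Parkinson mean should come out slightly below the true variance, because a discretely sampled path misses some of the true high and low.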

Parkinson, Garman-Klass, and Rogers-Satchell in numpy

Each estimator produces a per-bar variance estimate; average those over the window, annualize, and take the square root to get volatility. The formulas below are stated in terms of log prices: h = log(High), l = log(Low), o = log(Open), and c = log(Close). None of the three uses the previous close, which is why overnight gaps are invisible to them.

Parkinson variance per bar: (h - l)^2 / (4 * ln 2).

Garman-Klass variance per bar: 0.5*(h-l)^2 - (2*ln2-1)*(c-o)^2.

Rogers-Satchell variance per bar: (h-c)*(h-o) + (l-c)*(l-o).

def parkinson_vol(ohlc: pd.DataFrame, window: int = 21) -> pd.Series:
    """Parkinson (1980) range-based volatility estimator."""
    h = np.log(ohlc["High"])
    l = np.log(ohlc["Low"])
    factor = 1.0 / (4.0 * np.log(2))
    var = factor * (h - l) ** 2
    return np.sqrt(var.rolling(window).mean() * 252)


def garman_klass_vol(ohlc: pd.DataFrame, window: int = 21) -> pd.Series:
    """Garman-Klass (1980) OHLC volatility estimator."""
    h = np.log(ohlc["High"])
    l = np.log(ohlc["Low"])
    o = np.log(ohlc["Open"])
    c = np.log(ohlc["Close"])
    var = 0.5 * (h - l) ** 2 - (2.0 * np.log(2) - 1.0) * (c - o) ** 2
    return np.sqrt(var.rolling(window).mean() * 252)


def rogers_satchell_vol(ohlc: pd.DataFrame, window: int = 21) -> pd.Series:
    """Rogers-Satchell (1991) drift-robust volatility estimator."""
    h = np.log(ohlc["High"])
    l = np.log(ohlc["Low"])
    o = np.log(ohlc["Open"])
    c = np.log(ohlc["Close"])
    var = (h - c) * (h - o) + (l - c) * (l - o)
    return np.sqrt(var.rolling(window).mean() * 252)

Each function returns an annualized volatility Series aligned to the input DataFrame's index, with the same shape and dtype as the close-to-close output, so they are interchangeable downstream. Note the order of operations: the per-bar variances are averaged with equal weights over the window first and the square root is taken last, because the formulas target variance rather than volatility.

Annualization and window-choice trade-offs

The sqrt(252) factor assumes 252 trading days per year, the US equity convention. For crypto markets that trade every calendar day, use sqrt(365). FX trades around the clock but only on weekdays, so daily FX bars are usually annualized with roughly 260 periods rather than 365. For weekly bars, use sqrt(52). The principle is always the same: multiply the per-period standard deviation by the square root of the number of periods in a year.
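The scaling rule can be written as a one-line helper. This is a minimal sketch: the function name `annualize_vol`, the dictionary of conventions, and the 1 % per-period figure are illustrative assumptions, and the period counts should be matched to your own market calendar.

```python
import numpy as np

# Assumed periods-per-year conventions; adjust to your market's calendar.
PERIODS_PER_YEAR = {
    "us_equity_daily": 252,
    "crypto_daily": 365,
    "weekly": 52,
}


def annualize_vol(per_period_std: float, periods_per_year: int) -> float:
    """Scale a per-period standard deviation to an annual figure."""
    return per_period_std * np.sqrt(periods_per_year)


per_period_std = 0.01  # hypothetical 1 % standard deviation per bar
for label, periods in PERIODS_PER_YEAR.items():
    print(f"{label}: {annualize_vol(per_period_std, periods):.4f}")
```

The same 1 % per-bar figure maps to very different annual numbers depending on the bar frequency, which is why the convention should always travel with the estimate.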

Window length is the dial between responsiveness and noise. A 5-day window reacts within a week to a volatility regime change but is very noisy: a single large move can double the estimate. A 63-day window is much smoother but lags: after a shift to a higher-volatility regime in early January, the estimate keeps blending in calmer pre-shift days until roughly the end of March. Common choices:

5–10 days — short-term trading signals, event windows.
21 days — the standard "one-month" vol used in options markets.
63 days — one quarter; useful for risk budgeting and slower strategies.
252 days — one year; gives a structural baseline but is rarely used for position-sizing.

Range-based estimators earn their keep here: their higher statistical efficiency means a 10-day Parkinson window is roughly as precise as a 50-day close-to-close window. That is a material advantage in fast markets.
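To see the noise side of the trade-off in isolation, the sketch below simulates returns with a constant true volatility (an assumption chosen so that any movement in the estimate is pure estimation error) and measures how much the close-to-close estimate wanders at three window lengths.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
sigma_true = 0.25  # assumed constant annualized volatility
rets = pd.Series(sigma_true / np.sqrt(252) * rng.standard_normal(2520))

# Close-to-close vol at several lookbacks; dispersion around the true
# value is estimation noise because sigma never changes here.
noise = {}
for window in (5, 21, 63):
    vol = rets.rolling(window).std(ddof=1) * np.sqrt(252)
    noise[window] = float(vol.std())
    print(f"{window:>3}-day window: mean={vol.mean():.3f}, std={noise[window]:.3f}")
```

The standard deviation of the estimate should shrink as the window grows, roughly in proportion to one over the square root of the window length.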

Worked example: comparing estimators on the same series

Build a synthetic OHLC DataFrame, run all four estimators, and print the results side by side so the differences are visible.

np.random.seed(42)
n = 252

# Simulate a GBM price path with sigma ~ 25 %, sampled at intraday steps
# so that each bar's high and low come from the same path as its close
steps = 390                 # intraday steps per day (one per minute)
dt = 1 / (252 * steps)
sigma_true = 0.25
mu = 0.05

incr = (mu - 0.5 * sigma_true ** 2) * dt + sigma_true * np.sqrt(dt) * np.random.randn(n, steps)
log_path = np.log(100) + np.cumsum(incr.ravel()).reshape(n, steps)

close = np.exp(log_path[:, -1])
# No overnight gap: each bar opens at the previous bar's close
open_ = np.concatenate(([100.0], close[:-1]))
high = np.maximum(np.exp(log_path.max(axis=1)), open_)
low = np.minimum(np.exp(log_path.min(axis=1)), open_)

ohlc = pd.DataFrame(
    {"Open": open_, "High": high, "Low": low, "Close": close},
    index=pd.bdate_range("2024-01-02", periods=n),
)

window = 21
vols = pd.DataFrame({
    "close_to_close": rolling_realized_vol(ohlc["Close"], window),
    "parkinson":      parkinson_vol(ohlc, window),
    "garman_klass":   garman_klass_vol(ohlc, window),
    "rogers_satchell": rogers_satchell_vol(ohlc, window),
})
print(vols.dropna().tail(10).round(4))

Because the highs and lows come from a simulated intraday path with no overnight gap, the range-based estimates should sit near the true 25 % sigma and fluctuate less from window to window than the close-to-close series, although a single 252-day run is only one draw. Sampling the path at discrete steps misses some of the true extremes, so expect the range-based figures to read slightly low. Drawing high and low independently of the path, for example as random offsets around the close, would not give this result: the range estimators are only calibrated when the range comes from the same diffusion as the returns. With mu at 5 %, drift is small relative to volatility, so Rogers-Satchell should track the other range estimators closely; it diverges mainly when drift is large relative to volatility, which is exactly the regime it was designed for.

Data gotchas

Three problems routinely corrupt realized vol estimates in production data, and all three are avoidable with modest care.

Overnight gaps. The classic range-based estimators assume a continuous diffusion — they do not account for the gap between one day's close and the next day's open. When a stock opens sharply higher or lower after an earnings announcement, Parkinson and Garman-Klass will understate realized vol because the overnight move is invisible in the bar's high-low range. Yang and Zhang (2000) published an extension that adds an overnight-gap term; it is worth using if your universe contains heavily event-driven names.
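A minimal sketch of the Yang-Zhang estimator is shown below, assuming the same column names as the functions above and the 252-day annualization. It combines the variance of overnight returns, the variance of open-to-close returns, and the Rogers-Satchell term, weighted by the constant k given in the paper. The function name `yang_zhang_vol` and the toy bars are illustrative assumptions; check the details against the original paper before relying on it.

```python
import numpy as np
import pandas as pd


def yang_zhang_vol(ohlc: pd.DataFrame, window: int = 21) -> pd.Series:
    """Sketch of the Yang-Zhang (2000) estimator, annualized with 252."""
    o = np.log(ohlc["Open"])
    h = np.log(ohlc["High"])
    l = np.log(ohlc["Low"])
    c = np.log(ohlc["Close"])

    overnight = o - c.shift(1)   # previous close to today's open
    open_close = c - o           # today's open to today's close
    rs = (h - c) * (h - o) + (l - c) * (l - o)  # Rogers-Satchell term

    var_overnight = overnight.rolling(window).var(ddof=1)
    var_open_close = open_close.rolling(window).var(ddof=1)
    var_rs = rs.rolling(window).mean()

    # Weighting constant from the paper, a function of the window length.
    k = 0.34 / (1.34 + (window + 1) / (window - 1))
    var = var_overnight + k * var_open_close + (1.0 - k) * var_rs
    return np.sqrt(var * 252)


# Hypothetical bars with overnight gaps, purely for illustration.
rng = np.random.default_rng(2)
n = 60
gap = 0.01 * rng.standard_normal(n)        # overnight log move
intraday = 0.01 * rng.standard_normal(n)   # open-to-close log move
log_close = np.log(100) + np.cumsum(gap + intraday)
log_open = log_close - intraday
wiggle = 0.003 * np.abs(rng.standard_normal((2, n)))
demo = pd.DataFrame({
    "Open": np.exp(log_open),
    "High": np.exp(np.maximum(log_open, log_close) + wiggle[0]),
    "Low": np.exp(np.minimum(log_open, log_close) - wiggle[1]),
    "Close": np.exp(log_close),
})
yz = yang_zhang_vol(demo)
print(yz.dropna().tail(3).round(4))
```

Because the overnight term needs the previous close, the first `window` rows are NaN, one more than the purely intraday estimators produce.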

The intraday U-shape. Intraday volatility is not uniform: it spikes at the open and close and is lowest around midday. If you are computing realized vol from intraday bars rather than daily bars, the time of day each bar represents matters, and a naïve annualization that scales every 5-minute bar by the same factor ignores this structure entirely. The article "Resampling intraday data with pandas" covers the resampling mechanics; once you have consistent bars, compare like with like by using bars from the same session window rather than mixing high-vol and low-vol periods.
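One way to sidestep per-bar annualization is to aggregate to a daily realized variance first, summing squared intraday log returns within each session, and only then annualize. The sketch below assumes hypothetical 5-minute closes for a 09:30 to 16:00 session with flat intraday volatility, so it shows the mechanics only and does not reproduce the U-shape.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute closes for 10 sessions, 09:30 to 16:00 (79 stamps).
rng = np.random.default_rng(3)
days = pd.bdate_range("2024-01-02", periods=10)
stamps = pd.DatetimeIndex(
    [d + pd.Timedelta(minutes=570 + 5 * i) for d in days for i in range(79)]
)
sigma_bar = 0.25 / np.sqrt(252 * 78)  # assumed flat per-bar volatility
prices = pd.Series(
    100 * np.exp(np.cumsum(sigma_bar * rng.standard_normal(len(stamps)))),
    index=stamps,
)

# Intraday log returns, differenced within each session so the overnight
# move is excluded rather than mixed in with 5-minute returns.
session = prices.index.normalize()
log_rets = np.log(prices).groupby(session).diff().dropna()

# Daily realized variance = sum of squared intraday log returns.
daily_rv = (log_rets ** 2).groupby(log_rets.index.normalize()).sum()
annualized_vol = np.sqrt(daily_rv * 252)
print(annualized_vol.round(4))
```

This measures intraday volatility only; the overnight move has to be added back separately if you want a figure comparable to close-to-close.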

Split adjustment. A 2-for-1 split creates an artificial 50 % overnight drop in the raw price series that has nothing to do with volatility. Always work with split- and dividend-adjusted prices for the close-to-close estimator. For range-based estimators, verify that your data provider adjusts the Open, High, and Low columns as well as the Close. Some providers adjust only the close. Parkinson's log range is unaffected by a uniform scale factor, but Garman-Klass and Rogers-Satchell compare the close with the open, high, and low, so a close-only adjustment silently corrupts them on every bar before the split, not just on the split date.
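A cheap sanity check for adjustment consistency is to flag bars whose open or close falls outside the bar's own low-high range, which cannot happen in correctly scaled data. The sketch below is one possible check under that assumption; the function name `flag_inconsistent_bars`, the tolerance, and the sample bars are hypothetical.

```python
import pandas as pd


def flag_inconsistent_bars(ohlc: pd.DataFrame, tol: float = 1e-9) -> pd.Series:
    """Return True for bars whose Open or Close lies outside [Low, High]."""
    hi = ohlc["High"] * (1 + tol)
    lo = ohlc["Low"] * (1 - tol)
    return (
        (ohlc["Close"] > hi) | (ohlc["Close"] < lo)
        | (ohlc["Open"] > hi) | (ohlc["Open"] < lo)
    )


# Hypothetical bars around a 2-for-1 split before the third bar, where only
# Close was back-adjusted; earlier Open/High/Low are still at the old scale.
bars = pd.DataFrame({
    "Open":  [100.0, 102.0, 51.0, 52.0],
    "High":  [103.0, 104.0, 53.0, 53.5],
    "Low":   [99.0, 101.0, 50.5, 51.0],
    "Close": [51.0, 51.5, 52.0, 53.0],
})
flags = flag_inconsistent_bars(bars)
print(flags)
```

Here the two pre-split bars are flagged because their adjusted close sits far below the unadjusted low. A clean result does not prove the data is adjusted correctly, but any flagged bar is worth investigating before it reaches a range-based estimator.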

Keeping these three issues in mind — overnight gaps, session timing, and adjustment consistency — covers the vast majority of the production errors that appear when a volatility model that looked correct in a notebook starts producing outliers on live data.