When Random Walks Break: Testing Market Efficiency Across 500 Equities

Omega7 Capital Research Collective

May 27, 2026

The random walk hypothesis is among the most durable ideas in quantitative finance. In its simplest form it states that successive price changes are independent and identically distributed, so that past returns carry no usable information about future returns. If this holds, then technical prediction is futile and weak-form market efficiency is satisfied. The hypothesis is convenient, mathematically tractable, and, as a first approximation, frequently defensible.

It is also not exactly true, and the interesting question is not whether it fails but where, by how much, and for whom. In this study we apply a battery of statistical tests to daily returns from 500 large-capitalisation U.S. equities over a twelve-year window. We ask a narrow, measurable question: for each stock, can we reject the null hypothesis that its returns behave like a random walk?

Our headline finding is unsurprising in direction but useful in magnitude. Roughly seventy percent of the stocks we examined are statistically indistinguishable from a random walk at conventional significance levels. The remaining thirty percent are not, and the departures are not randomly distributed across the market. They cluster in identifiable sectors, they are entangled with volatility, and they are far weaker than they first appear once trading frictions are taken seriously. We try throughout to be honest about how thin the exploitable signal really is.

Hypothesis and motivation

Let the log price of an asset be denoted p(t). The random walk hypothesis with drift posits that p(t) = mu + p(t - 1) + e(t), where the innovations e(t) are independent draws with zero mean and finite variance. Under this model the daily return r(t) = p(t) - p(t - 1) is unpredictable from its own history: the best forecast of tomorrow's return, conditional on everything known today, is simply the long-run drift mu.

This is the weak form of the efficient market hypothesis, and it makes a sharp, testable prediction. If returns are serially uncorrelated, then the autocorrelation function of r(t) should be statistically zero at all non-zero lags, the variance of k-period returns should grow linearly in k, and runs of consecutive same-sign returns should appear with the frequency expected under independence. Each of these implications gives us a separate test, and each test probes a slightly different way the null can fail.
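The linear-variance implication is easy to check numerically. The sketch below is an illustration under stated assumptions, not the code used in the study: it simulates Gaussian returns (chosen only so the example is self-contained) and computes a plain variance ratio, VR(k) = Var(k-period return) / (k x Var(one-period return)), which should sit near one under the null. The function name `variance_ratio`, the AR(1) coefficient of 0.3, and the simulation parameters are hypothetical, and no small-sample or heteroskedasticity correction is applied.

```python
import random
import statistics

def variance_ratio(returns, k):
    """Plain variance ratio from overlapping k-period sums.

    VR(k) = Var(sum of k consecutive returns) / (k * Var(single return)).
    Values near 1 are consistent with serially uncorrelated returns.
    """
    var_1 = statistics.pvariance(returns)
    k_sums = [sum(returns[i:i + k]) for i in range(len(returns) - k + 1)]
    var_k = statistics.pvariance(k_sums)
    return var_k / (k * var_1)

rng = random.Random(0)

# Benchmark: independent Gaussian returns (assumed drift and volatility).
iid_returns = [rng.gauss(0.0004, 0.01) for _ in range(3000)]

# Contrast: positively autocorrelated returns built as an AR(1) process.
ar_returns = [iid_returns[0]]
for shock in iid_returns[1:]:
    ar_returns.append(0.3 * ar_returns[-1] + shock)

for k in (2, 5, 10):
    print(k,
          round(variance_ratio(iid_returns, k), 3),
          round(variance_ratio(ar_returns, k), 3))
```

For k = 2 the ratio is approximately one plus the first-lag autocorrelation, so the independent series should print values close to one while the AR(1) series should print values clearly above one.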

We emphasise at the outset what a rejection does and does not mean. Rejecting the random walk null tells us that returns contain some serial structure. It does not tell us that the structure is large, stable across time, or exploitable after costs. A great deal of the empirical literature on market efficiency founders precisely on this gap between statistical significance and economic significance, and we have tried to keep the distinction in view.

What we are testing. The null hypothesis is that daily log returns are serially uncorrelated. We are not testing whether markets are "rational," whether prices equal fundamental value, or whether any particular trading strategy is profitable. Those are different and harder questions.

Methodology

Data

Our sample consists of 500 large-capitalisation U.S. equities, selected to approximate the composition of a broad large-cap index as of the start of the study window. We use adjusted daily closing prices from January 2013 through December 2024, giving approximately 3,020 trading days per name. Prices are adjusted for splits and dividends, and we convert to continuously compounded daily log returns. Names with more than five percent missing observations over the window were excluded and replaced with the next-largest eligible stock, which introduces a mild survivorship bias that we flag in the discussion.

All returns are winsorised at the 0.1 and 99.9 percentiles to limit the influence of a handful of extreme observations, mostly associated with earnings surprises and the March 2020 dislocation. We checked that our qualitative conclusions are unchanged when winsorisation is removed; the main effect of winsorising is to stabilise the distribution tests rather than to alter the autocorrelation results.
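The two preparation steps, log returns and winsorisation, can be sketched as follows. This is a minimal illustration assuming a simple linearly interpolated percentile; the helper names (`log_returns`, `percentile`, `winsorise`) and the toy price series are hypothetical, and the study's own percentile convention is not stated in the text.

```python
import math

def log_returns(prices):
    """Continuously compounded returns: r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def winsorise(values, lower_q=0.1, upper_q=99.9):
    """Clamp values outside the [lower_q, upper_q] percentiles to those bounds.

    Unlike trimming, no observation is dropped: the series keeps its length.
    """
    ordered = sorted(values)
    lo = percentile(ordered, lower_q)
    hi = percentile(ordered, upper_q)
    return [min(max(v, lo), hi) for v in values]

# Hypothetical adjusted closes with one outsized jump.
prices = [100.0, 101.0, 99.5, 130.0, 129.0, 128.5]
r = log_returns(prices)

# Wide bounds are used here only so the clipping is visible on a toy series.
print(winsorise(r, lower_q=20, upper_q=80))
```

Because winsorising clamps values rather than discarding them, the time ordering of the series is preserved, which matters for the autocorrelation tests that follow.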

Tests

We apply four complementary tests to each stock's return series. The first is a Ljung-Box test on the first ten autocorrelation lags, which aggregates serial correlation into a single chi-squared statistic. The second is the Lo-MacKinlay variance ratio test, which compares the variance of multi-period returns to that implied by a random walk; we use its heteroskedasticity-robust form. The third is a runs test on the signs of returns, a non-parametric check that makes no distributional assumptions. The fourth is a set of distributional tests, the Jarque-Bera and Anderson-Darling statistics, which probe departures from normality rather than from independence.
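To make two of these tests concrete, the sketch below computes the Ljung-Box Q statistic and a Wald-Wolfowitz runs test on return signs using only the standard library. It is an illustration rather than the study's implementation: the function names and the simulated AR(1) series (coefficient 0.2) are hypothetical, and instead of a p-value the Q statistic is compared against 18.307, the upper five percent point of a chi-squared distribution with ten degrees of freedom.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(x, max_lag=10):
    """Ljung-Box Q = n(n+2) * sum over k=1..m of rho_k^2 / (n - k)."""
    n = len(x)
    total = sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total

def runs_test_z(x):
    """Wald-Wolfowitz runs test on signs; returns (z, two-sided p-value).

    Assumes the series contains both positive and negative values.
    """
    signs = [v > 0 for v in x if v != 0]  # exact zeros are dropped
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)) / (n ** 2 * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return z, p_value

CHI2_CRIT_10DF_5PCT = 18.307  # upper 5% point, chi-squared with 10 df

rng = random.Random(1)
noise = [rng.gauss(0.0, 0.01) for _ in range(3000)]

# Hypothetical momentum-like series: AR(1) with coefficient 0.2.
ar_returns = [noise[0]]
for shock in noise[1:]:
    ar_returns.append(0.2 * ar_returns[-1] + shock)

print("Q (iid):  ", round(ljung_box_q(noise), 2))
print("Q (AR 0.2):", round(ljung_box_q(ar_returns), 2))
print("runs z (AR 0.2):", round(runs_test_z(ar_returns)[0], 2))
```

Positive autocorrelation produces fewer sign changes than independence would, so the runs statistic for the AR(1) series should be strongly negative, while its Q statistic should sit far above the critical value.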

For each stock we combine the independence-oriented tests into a single composite measure we call the randomness score, scaled from zero to one hundred. A score near one hundred indicates returns that are statistically clean under all tests; a low score indicates that one or more tests reject the random walk null. The score is a convenience for visualisation and ranking, not a formal test statistic, and we report the underlying p-values wherever the distinction matters.

Approach and controls

Testing 500 stocks invites a multiple-comparisons problem: at a five percent significance level, any single test is expected to produce about twenty-five false rejections (500 x 0.05) by chance alone even if every stock were a perfect random walk. We therefore apply a Benjamini-Hochberg correction to control the false discovery rate, and we report both raw and corrected rejection counts. We also conduct all tests on the full sample and on three non-overlapping four-year subperiods, because a structure that appears only in one subperiod is a weaker and less interesting finding than one that persists.
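The Benjamini-Hochberg step-up procedure is short enough to sketch directly. The version below is a minimal illustration, assuming a target false discovery rate of five percent; the function name and the ten example p-values are hypothetical and are not drawn from the study.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which nulls are rejected under B-H.

    Sort the p-values, find the largest rank k (1-based) such that
    p_(k) <= (k / m) * fdr, and reject every hypothesis up to that rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # step-up: keep the largest qualifying rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for ten stocks.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

raw_rejections = sum(p < 0.05 for p in p_vals)
bh_rejections = sum(benjamini_hochberg(p_vals))
print(raw_rejections, bh_rejections)
```

In this toy example five p-values fall below 0.05 but only two survive the correction, the same direction of shrinkage as the gap between raw and corrected rejection counts reported in the results.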

[Figure 1 image: flow diagram of the methodology pipeline. Data (500 equities, 2013 to 2024), then four tests (Ljung-Box autocorrelation; Lo-MacKinlay variance ratio; runs test on the sign sequence; Jarque-Bera / Anderson-Darling distribution tests), then the randomness score (0 to 100), then the Benjamini-Hochberg FDR correction.]
Figure 1. The analysis pipeline. Each return series passes through four statistical tests probing serial independence and distributional shape. The independence tests are combined into a single randomness score for ranking and visualisation, and a Benjamini-Hochberg correction controls the false discovery rate across the 500 simultaneous comparisons.

Results

Most stocks look like random walks, but a sizeable minority do not

Across the full twelve-year sample, none of the independence tests rejected the random walk null for 348 of the 500 stocks (69.6 percent) after false-discovery correction. The remaining 152 stocks (30.4 percent) showed at least one statistically significant departure from randomness. Before correction the rejection count was higher, 184 stocks, which underscores how much of the naive signal is attributable to multiple testing rather than genuine structure.

The distribution of randomness scores is informative in its shape, not only its centre. As Figure 2 shows, the scores are concentrated in a broad mode between roughly sixty and ninety, with a thinner left tail of stocks that depart more clearly from random-walk behaviour. There is no clean bimodality that would let us speak of two distinct populations of "efficient" and "inefficient" stocks; the picture is closer to a continuum, with most names clustered near the efficient end.

[Figure 2 image: histogram of randomness scores across the 500 stocks. Horizontal axis: randomness score (0 to 100). Vertical axis: number of stocks (0 to 80). Bar labels, left to right: 5, 9, 20, 35, 60, 80, 90, 75, 50, 25. A marker indicates the median of 71.]
Figure 2. Distribution of randomness scores across the 500-stock universe. Higher scores indicate returns that are statistically closer to a random walk. The mass concentrates between 60 and 90 with a median of 71, and a left tail of names exhibiting more serial structure. The shape is a continuum rather than two separate populations.

Departures cluster by sector

When we group the 152 non-random stocks by sector, the rejections are clearly uneven. Sectors associated with smaller average market capitalisation, lower analyst coverage, and thinner liquidity, such as real estate, utilities, and parts of the materials sector, show higher rejection rates. Highly liquid, heavily traded sectors such as information technology and large-cap financials are closer to the random-walk benchmark. This pattern is consistent with the intuition that the limits to arbitrage, rather than the absence of inefficiency in principle, are what allow predictability to survive.

[Figure 3 image: horizontal bar chart of the share of stocks rejecting the random walk null, by sector. Real estate 44%; Utilities 40%; Materials 37%; Energy 32%; Consumer discretionary 30%; Industrials 26%; Health care 23%; Financials 21%; Information technology 18%.]
Figure 3. Share of stocks within each sector that reject the random walk null after false-discovery correction. Less liquid, less-covered sectors such as real estate and utilities show the highest rejection rates, while information technology and financials are closest to the random-walk benchmark. Sector sample sizes vary, so the smaller sectors carry wider uncertainty.

We caution that sector sample sizes differ considerably. Real estate and utilities are represented by relatively few names in a large-cap universe, so their rejection rates carry wider confidence intervals than the figure conveys. The ordering is suggestive and consistent with prior literature, but we would not place strong weight on the precise percentages for the smaller sectors.
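The effect of sector size on the uncertainty of a rejection rate can be illustrated with a Wilson score interval. The sector counts below (16 and 72 names) are hypothetical, chosen only to show how the interval widens for small groups; the article does not report the actual number of stocks per sector.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical small sector: 7 of 16 names reject (about 44%).
small_lo, small_hi = wilson_interval(7, 16)

# Hypothetical large sector: 13 of 72 names reject (about 18%).
large_lo, large_hi = wilson_interval(13, 72)

print(round(small_lo, 3), round(small_hi, 3))
print(round(large_lo, 3), round(large_hi, 3))
```

Under these assumed counts the small-sector interval runs from roughly 23 to 67 percent, wide enough to overlap the rates of most other sectors, while the large-sector interval is less than half as wide.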

Autocorrelation is weak, short-horizon, and inconsistent in sign

For the stocks that do reject, the serial correlation is concentrated at short lags, predominantly the first and second trading days, and decays quickly thereafter. The magnitudes are small. Among rejecting stocks the median first-lag autocorrelation is about 0.06 in absolute value, and even the most extreme names rarely exceed 0.12. The sign is not uniform: some names show mild positive autocorrelation consistent with short-horizon momentum, while others show negative first-lag autocorrelation consistent with mean reversion or bid-ask bounce.

[Figure 4 image: line chart of average autocorrelation by lag for three stock groups. Horizontal axis: lag in trading days (1 to 10). Vertical axis: autocorrelation (-0.08 to +0.08), with a shaded 95% band around zero. Series: momentum-type (n=64), mean-reverting (n=88), random-walk (n=348).]
Figure 4. Average autocorrelation by lag for the three groups of stocks. The shaded band marks the approximate 95 percent confidence region under the random walk null. Random-walk stocks remain inside the band at all lags. Momentum-type and mean-reverting stocks diverge only at lags one and two and converge back toward zero by lag four or five. Note the vertical scale: even the largest deviations are small in absolute terms.

The economic reading of Figure 4 deserves emphasis. A first-lag autocorrelation of 0.06 implies that knowing today's return explains well under one percent of the variance of tomorrow's return. This is a genuine statistical departure from the random walk, and with three thousand observations per stock our tests have ample power to detect it, but it is a small effect by any practical standard.
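The arithmetic behind this reading is short. The sketch below uses the figures quoted in the text (a first-lag autocorrelation of 0.06 and roughly 3,020 observations) and the standard large-sample approximation that a single sample autocorrelation has standard error 1/sqrt(n) under the null; the variable names are ours.

```python
import math

rho_1 = 0.06   # median absolute first-lag autocorrelation among rejecting stocks
n_obs = 3020   # approximate number of trading days per stock

# Share of next-day return variance explained by today's return.
r_squared = rho_1 ** 2

# Approximate 95% band for one stock's sample autocorrelation under the null.
band = 1.96 / math.sqrt(n_obs)

# Distance of 0.06 from zero, measured in standard errors.
z_score = rho_1 * math.sqrt(n_obs)

print(f"R^2 = {r_squared:.4f}")
print(f"95% band = +/-{band:.4f}")
print(f"z = {z_score:.2f}")
```

The explained variance is 0.36 percent, yet 0.06 sits more than three standard errors from zero for a single series of this length. That is the gap between statistical and economic significance in miniature.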

Volatility is highly predictable even where returns are not

The clearest and most robust departure from random-walk behaviour appears not in the returns themselves but in their magnitudes. Volatility clustering, the tendency of large moves to follow large moves, is present in essentially every stock in our sample regardless of whether its returns reject the independence null. The Ljung-Box test applied to squared returns rejects independence for more than ninety-five percent of the universe, with overwhelming statistical significance.

An important distinction. A return series can be serially uncorrelated and still be far from independent. The null we test, that returns are serially uncorrelated, constrains only the conditional mean of returns; it says nothing about the conditional variance. Volatility clustering lives entirely in the second moment, which is why most stocks pass the return-level tests while almost all of them fail the squared-return tests.

This is the standard signature of conditional heteroskedasticity and is well captured by GARCH-type models. It is worth stating plainly because it is easy to misread: the near-universal predictability of volatility is not evidence against weak-form efficiency. The level of returns can be unforecastable while their dispersion is highly forecastable, and that combination is exactly what most of our stocks display.
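A small simulation makes the level-versus-dispersion point tangible. The sketch below generates returns from a GARCH(1,1) process and compares the first-lag autocorrelation of returns with that of squared returns. The parameter values (omega = 1e-6, alpha = 0.10, beta = 0.85) and function names are assumptions chosen for illustration; they are not estimates from the study's data.

```python
import math
import random

def simulate_garch(n, omega=1e-6, alpha=0.10, beta=0.85, seed=0):
    """Simulate GARCH(1,1) returns.

    r_t = sigma_t * z_t, with z_t standard normal and
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2.
    """
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        var = omega + alpha * r * r + beta * var
    return returns

def lag1_autocorr(x):
    """Sample first-lag autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return num / denom

returns = simulate_garch(20000)
acf_returns = lag1_autocorr(returns)
acf_squared = lag1_autocorr([r * r for r in returns])

print("lag-1 autocorrelation of returns:        ", round(acf_returns, 3))
print("lag-1 autocorrelation of squared returns:", round(acf_squared, 3))
```

By construction the conditional mean of each return is zero, so the return autocorrelation should be close to zero, while the squared returns inherit the persistence of the variance recursion and should show clearly positive autocorrelation.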

Table 1. Rejection counts by test (full sample, 500 stocks).

Test (full sample)                   Stocks rejecting   Share
Ljung-Box on returns (lags 1-10)     131                26.2%
Variance ratio (Lo-MacKinlay)        118                23.6%
Runs test on return signs            97                 19.4%
Any independence test (post-FDR)     152                30.4%
Ljung-Box on squared returns         479                95.8%
Jarque-Bera normality                500                100.0%

Table 1 collects the rejection counts across tests. Two features stand out. The first is that the independence tests on returns agree only partially with one another, which is expected since each is sensitive to a different form of dependence. The second is that every single stock rejects normality on the Jarque-Bera test, a reminder that fat tails and excess kurtosis are pervasive and that any model assuming Gaussian returns is mis-specified from the start.

Departures are not stable across time

Perhaps the most sobering result concerns persistence. We re-ran the independence tests on three non-overlapping four-year subperiods and asked how often a stock that rejects in one subperiod also rejects in the next. The answer is: not very often. Of the stocks rejecting in the first subperiod, fewer than forty percent also rejected in the second, and the overlap across all three subperiods was smaller still. The identity of the "predictable" stocks shifts substantially over time.
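The persistence measure described here is a simple set overlap. The sketch below shows the calculation on made-up ticker sets; the names `persistence_rate`, `period_1`, `period_2`, and `period_3`, and the tickers themselves, are hypothetical placeholders rather than results.

```python
def persistence_rate(rejecting_first, rejecting_next):
    """Share of stocks rejecting in the first period that also reject in the next."""
    first = set(rejecting_first)
    if not first:
        return float("nan")
    return len(first & set(rejecting_next)) / len(first)

# Hypothetical sets of rejecting stocks in three non-overlapping subperiods.
period_1 = {"AAA", "BBB", "CCC", "DDD", "EEE"}
period_2 = {"BBB", "EEE", "FFF"}
period_3 = {"EEE", "GGG"}

rate_1_to_2 = persistence_rate(period_1, period_2)
in_all_three = period_1 & period_2 & period_3

print(rate_1_to_2, sorted(in_all_three))
```

Conditioning on the earlier period, as here, answers the question posed in the text: given that a stock looked predictable then, how often does it still look predictable later.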

This instability is itself consistent with a market that is broadly efficient. If a genuine, stable inefficiency existed in a particular name, we would expect it to be arbitraged away rather than to persist; and if a detected departure is largely a transient artefact of a particular regime, we would expect exactly the kind of churn we observe. Either reading cautions strongly against treating a single-period rejection as a durable property of a stock.

Discussion and implications

Taken together, our results support a nuanced rather than a binary picture. The random walk hypothesis is an excellent approximation for the majority of large-cap U.S. equities, and where it fails, it fails by small and unstable margins. We read this as broadly favourable to weak-form efficiency, with three important qualifications.

The first qualification is liquidity. The concentration of rejections in less liquid, less-covered sectors fits the limits-to-arbitrage view of efficiency: prices are efficient to the extent that it is profitable for arbitrageurs to make them so, and predictability survives precisely where the frictions of doing so are highest. Efficiency is better thought of as a property that holds up to transaction costs rather than as an absolute.

The second qualification is the level-versus-magnitude distinction. The near-universal predictability of volatility, combined with the near-universal failure of normality, means that "random walk" is at best a statement about the conditional mean. The full data-generating process is plainly more complex, with persistent conditional variance and heavy tails. For risk management, option pricing, and position sizing, the random-walk-in-returns result is almost beside the point; what matters there is the structure we find in the second moment, and that structure is strong.

The third qualification, and the one we want to stress most, is the gap between statistical and economic significance. With three thousand daily observations per stock, our tests can detect autocorrelations far too small to trade against profitably. A first-lag autocorrelation of 0.06 would be swamped by realistic bid-ask spreads, commissions, market impact, and the variance of the strategy's own returns. We did not implement a trading strategy and make no profitability claims; we simply note that statistical significance is a necessary but nowhere near sufficient condition for an exploitable signal.

Limitations

Several limitations bound our conclusions. Our universe is restricted to large-cap U.S. equities, and the random walk hypothesis is known to fare worse among small caps, in less developed markets, and at intraday frequencies that we did not examine. Our rule of replacing names that have more than five percent missing observations introduces a mild survivorship bias that likely tilts the sample toward more efficient, surviving stocks. Our tests assume a fixed data-generating process within each window, which the subperiod instability suggests is questionable. And the randomness score, while convenient, is a constructed composite without a clean sampling distribution; we lean on the underlying p-values for any formal claim.

We also acknowledge a more general caution. The literature on market efficiency has a long history of apparent anomalies that weakened or vanished once the discovery sample was extended, once costs were included, or once the multiple-comparisons problem was taken seriously. We have tried to guard against each of these, but we would treat any of our sector-level or single-stock findings as hypotheses to be tested out of sample rather than as established facts.

Conclusions

We tested 500 large-cap U.S. equities against the random walk hypothesis using four complementary statistical tests over a twelve-year window. About seventy percent of the stocks are statistically indistinguishable from a random walk after correcting for multiple comparisons. The thirty percent that reject do so by small margins, the departures are concentrated at short lags and in less liquid sectors, and they are unstable across time. Meanwhile volatility is predictable almost everywhere, and normality holds nowhere.

The honest summary is that weak-form efficiency is a good first approximation for this universe, that the exceptions are real but economically marginal once frictions are considered, and that the most reliable structure in equity returns lives in the second moment rather than the first. None of this is a claim that markets are perfectly efficient or that prediction is impossible; it is a claim, supported by our data, that the room for return-level prediction in large-cap equities is narrow, fragile, and easy to overstate.

Future work should extend the analysis to intraday data, to small-cap and international universes, and to explicit cost-aware backtests that close the gap between the statistical rejections we report here and any economic significance they might carry. We would be glad to share our test code and the simulated dataset underlying this article with members pursuing those questions.

Citation

Please cite this work as:

Omega7 Capital Research Collective, "When Random Walks Break: Testing Market Efficiency Across 500 Equities", Omega7 Capital: Research, May 2026.

@article{omega7_2026_randomwalks,
  author = {Omega7 Capital Research Collective},
  title = {When Random Walks Break: Testing Market Efficiency Across 500 Equities},
  journal = {Omega7 Capital: Research},
  year = {2026}
}

Note on data: the figures and statistics in this article are based on a simulated dataset constructed for illustrative and methodological purposes. They are designed to be internally consistent and realistic but should not be read as findings about specific named securities.