Longest path search

There are exponentially many vectors with small inner product

An interesting observation that Alex stumbled across while reading some of Anthropic's work on LLM interpretability is the following:

Although it's only possible to have $n$ orthogonal vectors in an $n$-dimensional space, it's possible to have $\exp(n)$ many "almost orthogonal" ($\varepsilon$ cosine similarity) vectors in high-dimensional spaces. See the Johnson–Lindenstrauss lemma.

While the JL lemma does indicate that this might be true (ish, in a certain sense), it is not obvious that this statement directly translates to the one given here. (I won't comment on the content of the rest of the linked post, some of which I find somewhat, uh, loose in construction, but I'll focus specifically on this point.)

Why should this not be true?

For example, a simple fact from linear algebra is that, given $m$ vectors $x_1, \dots, x_m \in \mathbf{R}^n$ in $n$-dimensional space, each of which has nonpositive inner product with every other vector, $x_i^Tx_j \le 0$ for $i \ne j$, we have that $m \le 2n$. In other words: no more than $2n$ vectors can have pairwise nonpositive inner products. (The fact that this is tight is very easy to show: pick the unit vectors and their negatives.) A very simple argument also shows that, if we restrict further to have $x_i^Tx_j \le -\varepsilon < 0$ (note the negative!) and normalize the $x_i$ to have $\|x_i\| = 1$, then:

$$m \le \frac{1}{\varepsilon} + 1.$$

(Of course, this bound is useless if $\varepsilon \le 1/(2n-1)$ since it tells us nothing more than what we already knew from the previous bound.¹)
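As a quick numerical aside (a sketch in numpy, just a sanity check and not part of the argument), the tight example of the unit vectors and their negatives is easy to poke at directly:

```python
import numpy as np

# The tight example for the first bound: in R^n, the 2n vectors
# {e_1, ..., e_n, -e_1, ..., -e_n} have pairwise nonpositive inner products.
n = 5
vectors = np.vstack([np.eye(n), -np.eye(n)])   # shape (2n, n)
gram = vectors @ vectors.T                     # all pairwise inner products
off_diagonal = gram[~np.eye(2 * n, dtype=bool)]
print(vectors.shape[0], off_diagonal.max())    # 10 vectors, max inner product 0.0
```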

On the other hand, if the original statement—that we can have exponentially many vectors with slightly positive inner product—is true, it would indicate a phase transition in the number of possible vectors we are allowed to have as the pairwise inner products go from $-\varepsilon$, to zero, to positive. There aren't that many things that undergo dramatic phase transitions like this, but, every once in a while, it does happen!

Idea and proof

Anyways, I'm sure the title of this post gave away the punchline, but indeed the following is true: for any $\varepsilon > 0$ there exists a list of $m$ normalized vectors $x_1, \dots, x_m \in \mathbf{R}^n$ in $n$ dimensions such that $x_i^Tx_j \le \varepsilon$ (for $i \ne j$) where $m$ satisfies

$$m \ge \exp\left(\frac{n\varepsilon^2}{4}\right).$$

A volumetric argument shows that this is tight in $n$ up to constants in the exponent, but that's less fun.

Basic proof sketch

As usual, we will provide a (very silly) randomized construction for the normalized vectors. Pick $m$ vectors $\tilde x_1, \dots, \tilde x_m \in \{\pm 1\}^n$ uniformly at random (with $m$ no larger than the bound we gave) and set $x_i = \tilde x_i/\sqrt{n}$.

Clearly the $x_i$ are normalized, by construction. The only thing left to show is that, with some nonzero probability, these vectors will have small inner product; i.e., $x_i^Tx_j \le \varepsilon$.

Of course, we know that sums of (bounded) independent random variables with mean zero have very strong concentration phenomena in the sense that their sum is on the order of $\sqrt{n}$ with very high probability. (Indeed, the sum really is around that size too.) This, in turn, implies that $(1/n)\tilde x_i^T\tilde x_j = x_i^Tx_j > \varepsilon$ with very low probability for any one pair $i \ne j$. Adding everything up then bounds the probability that any one pair fails to satisfy $x_i^Tx_j \le \varepsilon$, which gives the result.
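Before writing out the details, it's also easy to just sample this construction and eyeball it. A sketch in numpy (the parameters here are purely for illustration):

```python
import numpy as np

# Sample the randomized construction: m sign vectors in {±1}^n, scaled by
# 1/sqrt(n) so each row is a unit vector, then look at the largest
# off-diagonal inner product. With high probability it is at most eps.
rng = np.random.default_rng(0)
n, eps = 200, 0.3
m = int(np.exp(n * eps**2 / 4))     # ~90 vectors for these parameters
x = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(n)
gram = x @ x.T
off_diag = gram[~np.eye(m, dtype=bool)]
print(m, off_diag.max())            # the max is usually below eps
```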

Full(ish) proof

Ok, with that idea, the details are now mostly mechanical, but let's write them out anyways. Here are two simple observations about uniform $\pm 1$ random variables, also known as Rademacher random variables.

  1. The product of two independent Rademacher random variables is also Rademacher.
  2. Expectations of Rademacher variables and functions thereof are very simple to compute.
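Both observations are easy to convince yourself of empirically (again just a sketch, not needed for the proof; the choice of $\lambda = 0.7$ below is arbitrary):

```python
import numpy as np

# (1) The product of two independent sign variables is again a fair sign
# variable, so its empirical mean is near 0.  (2) Expectations are simple:
# e.g., E[exp(lam * Z)] = (e^lam + e^-lam)/2 = cosh(lam) for Rademacher Z.
rng = np.random.default_rng(1)
a = rng.choice([-1, 1], size=1_000_000)
b = rng.choice([-1, 1], size=1_000_000)
product = a * b
lam = 0.7
print(product.mean())                              # near 0
print(np.exp(lam * product).mean(), np.cosh(lam))  # both near cosh(0.7)
```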

Our goal will now be to show that the probability that any two vectors $i \ne j$ have inner product larger than $\varepsilon$ is small. Since $x_i = \tilde x_i/\sqrt{n}$, then

$$\Pr(x_i^Tx_j > \varepsilon) = \Pr(\tilde x_i^T\tilde x_j > n\varepsilon).$$

Let $Z_1, \dots, Z_n \sim \{\pm 1\}$ be uniform and independent. From our observations, note that

$$\Pr(\tilde x_i^T\tilde x_j > n\varepsilon) = \Pr(Z_1 + \dots + Z_n > n\varepsilon).$$

We can multiply by a nonnegative constant $\lambda \ge 0$ and take the exponential of both sides to get

$$\Pr\left(\exp(\lambda(Z_1 + \dots + Z_n)) > \exp(\lambda n\varepsilon)\right) \le \mathbf{E}[\exp(\lambda(Z_1 + \dots + Z_n))]\exp(-\lambda n\varepsilon).$$

(The right hand side inequality is Markov's inequality.) Since the $Z_i$ are independent and identically distributed, note that

$$\mathbf{E}[\exp(\lambda(Z_1 + \dots + Z_n))] = \left(\mathbf{E}[\exp(\lambda Z_1)]\right)^n = \left(\frac{e^\lambda + e^{-\lambda}}{2}\right)^n.$$

Finally, it is not hard to show an upper bound on the right hand side,

$$\frac{e^\lambda + e^{-\lambda}}{2} \le \exp\left(\frac{\lambda^2}{2}\right),$$

and using this upper bound, putting everything together gives that, for any $\lambda \ge 0$,

$$\Pr(x_i^Tx_j > \varepsilon) \le \exp\left(n\left(\frac{\lambda^2}{2} - \lambda\varepsilon\right)\right).$$
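(As an aside, the upper bound $(e^\lambda + e^{-\lambda})/2 \le \exp(\lambda^2/2)$ used here follows by comparing Taylor series term by term, since $(2k)! \ge 2^k k!$. A quick numerical check of it, which is of course not a proof:)

```python
import numpy as np

# Check (e^lam + e^-lam)/2 <= exp(lam^2/2) on a grid of lambda values.
lams = np.linspace(-5, 5, 1001)
lhs = np.cosh(lams)             # (e^lam + e^-lam)/2
rhs = np.exp(lams**2 / 2)
assert np.all(lhs <= rhs)
print(np.max(lhs / rhs))        # <= 1, with equality at lam = 0
```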

Setting $\lambda = \varepsilon$ gives the result

$$\Pr(x_i^Tx_j > \varepsilon) \le \exp\left(-\frac{n\varepsilon^2}{2}\right).$$
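A quick Monte Carlo sanity check of this tail bound (a sketch; the parameters are made up for illustration):

```python
import numpy as np

# Empirical frequency of {Z_1 + ... + Z_n > n*eps} versus the Chernoff
# bound exp(-n*eps^2/2); the bound is loose, but sits above the frequency.
rng = np.random.default_rng(2)
n, eps, trials = 100, 0.2, 50_000
sums = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
empirical = (sums > n * eps).mean()
bound = np.exp(-n * eps**2 / 2)
print(empirical, bound)
```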

Since there are $m(m-1)/2$ possible pairs of $i, j$ with $i \ne j$, we then have that the probability that any one of the pairs $i, j$ has inner product larger than $\varepsilon$ is bounded from above by

$$\frac{m(m-1)}{2}\exp\left(-\frac{n\varepsilon^2}{2}\right) < m^2\exp\left(-\frac{n\varepsilon^2}{2}\right) \le 1,$$

where the right hand side inequality holds for any $m \le \exp(n\varepsilon^2/4)$.

Equivalently, there is nonzero probability that a given sampled set of $m$ normalized vectors $x_1, \dots, x_m$ has all inner products no larger than $\varepsilon$ for any choice of $m \le \exp(n\varepsilon^2/4)$. So, choose $m = \exp(n\varepsilon^2/4)$ (or the largest integer no larger than this bound).

Since the resulting probability is nonzero, then there exists at least one such set of vectors for which the claim is true, which proves the desired result.

Twitter discussion after posting

Damek pointed out that Terry Tao has some notes on the case where we want the inner product to lie in a band between $-\varepsilon$ and $\varepsilon$; the bounds are essentially the same in this case, for the same reasons as above. (Terry points out slightly better packings using some neat algebraic geometric tricks.)

GaussianProcess points out that there is a slightly better construction in certain regimes of $\varepsilon$ using Reed–Solomon codes. One (simple) version of this is to note that there is a near-equivalence between our vectors $x_i$ and codes over binary fields, where the inner product is "almost" the Hamming distance between two binary vectors. (The linked paper mentions but does not use this construction directly, but it, too, would suffice!) 0xWave and others also point out that this is linked to codes, and yes, similar constructions are used as both possibility and impossibility results, including the Johnson and Plotkin bounds, etc.

Both Nick White and Nick Yoder point out that this might be (or is?) related to Johnson–Lindenstrauss, and, while I agree that it is related, I don't think these statements obviously map one-to-one. In particular, I see JL as a possibility result on the necessary number of dimensions needed to faithfully represent some number of vectors (that live in some higher dimensional space). This, on the other hand, is a possibility result that there exist some number of vectors in a low dimensional space that are, in some sense, maximally faithfully representable. I would love it if there is some mapping between the two statements, but this is not obvious to me! (Though they do result from the same "asymptotic"/high-dimensional behavior.)


  1. A proof of this is very easy by considering the nonnegative quantity $0 \le \left\|\sum_i x_i\right\|^2$.