There are exponentially many vectors with small inner product
An interesting observation that Alex stumbled across while reading some of Anthropic's work on LLM interpretability is the following:
> Although it's only possible to have $n$ orthogonal vectors in an $n$-dimensional space, it's possible to have $\exp(n)$ many "almost orthogonal" ($< \epsilon$ cosine similarity) vectors in high-dimensional spaces. See the Johnson–Lindenstrauss lemma.
While the JL lemma does indicate that this might be true (ish, in a certain sense), it is not obvious that this statement directly translates to the one given here. (I won't comment on the content of the rest of the linked post, some of which I find somewhat, uh, loose in construction, but I'll focus specifically on this point.)
Why should this not be true?
For example, a simple fact from linear algebra is that, given $m$ vectors $x_1, \dots, x_m$ in $n$-dimensional space, each of which has nonpositive inner product with any other vector, $x_i^T x_j \le 0$ for $i \ne j$, we have that $m \le 2n$. In other words: no more than $2n$ vectors can have pairwise nonpositive inner products with any other. (The fact that this is tight is very easy to show: pick the $n$ unit vectors $e_1, \dots, e_n$ and their negatives.) A very simple argument also shows that, if we restrict further to have $x_i^T x_j \le -\epsilon < 0$ (note the negative!) and normalize the $x_i$ to have $\|x_i\|_2 = 1$, then:

$$m \le 1 + \frac{1}{\epsilon}.$$
(Of course, this bound is useless if $\epsilon \le 1/(2n-1)$ since it tells us nothing more than what we already knew from the previous bound.[^1])
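Since the tightness is easy to check numerically, here is a quick NumPy sketch (my own addition, not part of the original argument) of the standard extremal example: the $m$ vertices of a regular simplex, whose pairwise inner products are all exactly $-1/(m-1)$, so that $m = 1 + 1/\epsilon$ on the nose.

```python
import numpy as np

# Vertices of a regular simplex: m unit vectors (spanning an (m-1)-dimensional
# subspace) whose pairwise inner products are all exactly -1/(m - 1). With
# eps = 1/(m - 1), this meets the m <= 1 + 1/eps bound exactly.
m = 8
V = np.eye(m) - np.ones((m, m)) / m            # center the unit vectors
V /= np.linalg.norm(V, axis=1, keepdims=True)  # normalize each row

G = V @ V.T                                    # Gram matrix of inner products
off_diag = G[~np.eye(m, dtype=bool)]
print(np.allclose(off_diag, -1 / (m - 1)))     # True
```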
On the other hand, if the original statement (that we can have exponentially many vectors with slightly positive inner product) is true, it would indicate a phase transition in the number of possible vectors we are allowed to have as the required inner product goes from negative, to zero, to positive. There aren't that many things that undergo dramatic phase transitions like this, but, every once in a while, it does happen!
Idea and proof
Anyways, I'm sure the title of this post gave away the punchline, but indeed the following is true: for any $0 < \epsilon < 1$ there exists a list of $m$ normalized vectors $x_1, \dots, x_m \in \mathbb{R}^n$ such that $|x_i^T x_j| \le \epsilon$ (for $i \ne j$) where $m$ satisfies

$$m \ge \exp(n\epsilon^2/4) - 1.$$
A volumetric argument shows that this is tight in $m$, up to constants in the exponent, but that's less fun.
Basic proof sketch
As usual, we will provide a (very silly) randomized construction for the normalized vectors. Pick $m$ vectors $z_1, \dots, z_m \in \{\pm 1\}^n$ uniformly at random (with $m$ no larger than the bound we gave) and set $x_i = z_i/\sqrt{n}$.
Clearly the $x_i$ are normalized, by construction. The only thing left to show is that, with some nonzero probability, these vectors will have small inner product; i.e., $|x_i^T x_j| \le \epsilon$ for all $i \ne j$.
Of course, we know that (bounded) independent random variables with mean zero have very strong concentration phenomena in the sense that their sum is $O(\sqrt{n})$ with very high probability. (Indeed, the sum really is around that size, too.) This, in turn, implies that $|x_i^T x_j| \ge \epsilon$ with very low probability for any one pair $i \ne j$. Adding everything up then bounds the probability that any one pair fails to satisfy $|x_i^T x_j| \le \epsilon$, which gives the result.
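To see this concretely before the formalities, here's a minimal NumPy simulation of exactly this construction (my own sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 1000  # illustrative sizes; note m is nearly 4n

# Random sign vectors, scaled so that each row is a unit vector.
Z = rng.choice([-1.0, 1.0], size=(m, n))
X = Z / np.sqrt(n)

G = X @ X.T                                   # all pairwise inner products
off_diag = np.abs(G[~np.eye(m, dtype=bool)])
print(off_diag.max())  # ~0.33 for these sizes: small, despite m >> n
```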
Full(ish) proof
Ok, with that idea, the details are now mostly mechanical, but let's write them out anyways. Here are two simple observations about uniform $\{\pm 1\}$ random variables, also known as Rademacher random variables.
- The product of two independent Rademacher random variables is also Rademacher
- Expectations of Rademacher variables and functions thereof are very simple to compute (for example, $E[e^{ty}] = \cosh(t)$ for Rademacher $y$; both observations are sanity-checked in the snippet below)
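Here's a throwaway NumPy snippet (mine, not the original post's) checking both observations numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
y1 = rng.choice([-1, 1], size=N)
y2 = rng.choice([-1, 1], size=N)

# Observation 1: the product of two independent Rademachers is Rademacher.
print(np.mean(y1 * y2 == 1))                # ~0.5, as expected

# Observation 2: expectations are simple, e.g., E[exp(t*y)] = cosh(t).
t = 0.7
print(np.mean(np.exp(t * y1)), np.cosh(t))  # nearly equal
```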
Our goal will now be to show that the probability that any two vectors have inner product larger than $\epsilon$ in magnitude is small. Since $x_i = z_i/\sqrt{n}$, then

$$x_i^T x_j = \frac{1}{n} \sum_{k=1}^n z_{ik} z_{jk}.$$
Let $y_1, \dots, y_n \in \{\pm 1\}$ be uniform and independent. From our observations, note that

$$P\left(x_i^T x_j \ge \epsilon\right) = P\left(\sum_{k=1}^n y_k \ge n\epsilon\right),$$

since each product $z_{ik} z_{jk}$ is itself Rademacher and independent across $k$.
We can multiply by a nonnegative constant $t \ge 0$ and take the exponential of both sides to get

$$P\left(\sum_{k=1}^n y_k \ge n\epsilon\right) = P\left(e^{t \sum_k y_k} \ge e^{t n \epsilon}\right) \le e^{-tn\epsilon}\, E\left[e^{t \sum_k y_k}\right].$$
(The right hand side inequality is Markov's inequality.) Since the $y_k$ are independent and identical, note that

$$E\left[e^{t \sum_k y_k}\right] = \prod_{k=1}^n E\left[e^{t y_k}\right] = \cosh(t)^n.$$
Finally, it is not hard to show an upper bound on the right hand side,

$$\cosh(t) = \sum_{k=0}^{\infty} \frac{t^{2k}}{(2k)!} \le \sum_{k=0}^{\infty} \frac{t^{2k}}{2^k k!} = e^{t^2/2},$$

(compare the Taylor series term by term, using $(2k)! \ge 2^k k!$)
and, using this upper bound and putting everything together, gives that, for any $t \ge 0$,

$$P\left(x_i^T x_j \ge \epsilon\right) \le e^{-tn\epsilon + nt^2/2}.$$
Setting $t = \epsilon$ gives the result

$$P\left(x_i^T x_j \ge \epsilon\right) \le e^{-n\epsilon^2/2}.$$

(By symmetry, the same bound holds for $P(x_i^T x_j \le -\epsilon)$, so $P(|x_i^T x_j| \ge \epsilon) \le 2e^{-n\epsilon^2/2}$.)
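As a quick numerical check on this tail bound (again my own snippet, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, trials = 100, 0.25, 100_000

# Empirically estimate P(sum of n Rademachers >= n * eps), which equals
# P(x^T x' >= eps) for two of our normalized random sign vectors.
sums = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
print(np.mean(sums >= n * eps))  # empirical tail, ~0.005 here
print(np.exp(-n * eps**2 / 2))   # Chernoff bound, ~0.044: holds comfortably
```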
Since there are $\binom{m}{2} \le m^2/2$ possible pairs of $i, j$ with $i \ne j$, we then have that the probability that any one of the pairs has inner product larger than $\epsilon$ in magnitude is bounded from above by

$$\binom{m}{2} \cdot 2e^{-n\epsilon^2/2} \le m^2 e^{-n\epsilon^2/2} < 1,$$

where the right hand side inequality holds for any $m < \exp(n\epsilon^2/4)$.
Equivalently, there is nonzero probability that a given sampled set of $m$ normalized vectors has all pairwise inner products no larger than $\epsilon$ in magnitude, for any choice of $m < \exp(n\epsilon^2/4)$. So, choose $m = \exp(n\epsilon^2/4)$ (or, rather, the largest integer below this bound).
Since the resulting probability is nonzero, there exists at least one such set of vectors for which the claim is true, which proves the desired result.
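For completeness, here is the whole construction end-to-end (an illustration of mine, not the proof; note the theorem only promises a nonzero success probability per draw, hence the retry loop):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 300, 0.3
m = int(np.exp(n * eps**2 / 4))  # = 854 here, far more than 2n = 600

# The union bound only makes the failure probability < 1, so a given
# draw may fail; resample until one succeeds.
while True:
    X = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(n)
    G = X @ X.T
    max_inner = np.abs(G[~np.eye(m, dtype=bool)]).max()
    if max_inner <= eps:
        break

print(m, max_inner)  # m unit vectors, all pairwise inner products within eps
```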
Twitter discussion after posting
Damek pointed out that Terry Tao has some notes on the case where we want the inner product to lie in a given band; the bounds are essentially the same in this case for the same reasons as above. (Terry points out slightly better packings using some neat algebraic geometry tricks.)
GaussianProcess points out that there is a slightly better construction in certain regimes of $\epsilon$ using Reed–Solomon codes. One (simple) version of this is to note that there is a near-equivalence between our vectors and codes over binary fields, where the inner product is "almost" the Hamming distance between two binary vectors. (The linked paper mentions but does not use this construction directly, but it, too, would suffice!) 0xWave and others also point out that this is linked to coding theory, and yes, similar constructions are used as both possibility and impossibility results, including in the Johnson and Plotkin bounds, etc.
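(To spell out the near-equivalence, since it's cute, and this is me filling in the arithmetic rather than the linked paper: for sign vectors $z, z' \in \{\pm 1\}^n$, agreeing coordinates contribute $+1$ to the inner product and disagreeing coordinates contribute $-1$, so

$$z^T z' = (n - d_H(z, z')) - d_H(z, z') = n - 2\,d_H(z, z'),$$

where $d_H$ is the Hamming distance; in particular, small normalized inner product means the Hamming distance is close to $n/2$.)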
Both Nick White and Nick Yoder point out that this might be (or is?) related to Johnson–Lindenstrauss, and, while I agree that it is related, I don't think these statements obviously map one-to-one. In particular, I see JL as a possibility result on the number of dimensions needed to faithfully represent some number of vectors (that live in some higher-dimensional space). This, on the other hand, is a possibility result that there exist some number of vectors in a low-dimensional space that are, in some sense, maximally faithfully representable. I would love it if there is some mapping between the two statements, but this is not obvious to me! (Though they do result from the same "asymptotic"/high-dimensional behavior.)
[^1]: A proof of this is very easy by considering the nonnegative quantity $\|x_1 + x_2 + \cdots + x_m\|_2^2$.