A simple proof of Shapley–Folkman
This post goes over a simple proof of the Shapley–Folkman lemma which I haven't seen published (though I'm sure everyone has a version or variation of this sitting around!). It is (roughly) a combination of Zhao's proof (which uses a simple result about conic combinations, but has a funky construction) and Cassel's proof, which, while nice, has unfortunately very annoying notation. Specifically, this post proves the "slightly more general" statement given in Cassel's proof which is a little nicer to handle, via a technique that is similar to Zhao's, but uses only linear independence.
Anyways, if you're reading this I probably don't have to convince you that this lemma is very useful, but, if you're unsure, I'd recommend just Googling "Shapley–Folkman [your field of choice]" and you'll probably get a hit or two that might be interesting.
Statement
The statement of Shapley–Folkman is a little funny if you're unfamiliar with it, but it's the following.
We have a collection of sets , which need not be convex. Let be a vector in the sum of the convex hulls of these sets; i.e., one which satisfies
where . Then can be written in the following way
where and, importantly, for at least indices , satisfies . In other words, given any point which is the sum of points lying in the convex hulls of the sets, , then can be written as the sum of points lying in the actual sets , for 'most' indices ; no more than indices will lie in but not .
One simple interpretation of this statement is that, given a lot of sets , relative to the number of dimensions (when ), then the resulting set, which is the (Minkowski) sum of sets, is 'very close' to a set that is convex.
Proof
The proof requires a little bit of extra set up, but not too much.
Statement variation
We will show the proof in an inductive way, with a slightly different set up than the one above. In our set up, there exist some sets and some point such that
and each . Using the definition of the convex hull, this is the same as saying that, for each set , there exist and weights with , such that
and the weights sum to :
for each . In this case, denotes the number of elements of whose convex combination results in , all with nonzero coefficients. (This is why we require that , otherwise, if for some index , we can remove this entry and reduce by 1.) We will show the following inequality can be made true:
This implies the original claim, since lies in if, and only if, , so the sum is an upper bound on the number of sets which have ; i.e., the largest number of indices with is , or, equivalently, at least indices satisfy and therefore have .
Proof
Given that set up, we will show that, if
then we can always set at least one of the weights to (potentially changing some other weights along the way) such that the left hand side decreases by at least 1. Applying this statement inductively gives us the result.
So, let's get to it!
The one trick in this proof is to note that we can choose a 'privileged' index , which, in our case, we will just choose to be the first index. We'll write , for each , splitting out the first entry:
We interpret the sum on the right as zero if . (Note that, by definition, cannot be zero! So this expression is always well-defined.)
We can then write as
Now, if , then there are at least vectors in the second sum and is the dimension of the vectors. So there exists some weights such that
where and . (Very importantly, note that the need not be nonnegative!) Because this is zero, we can multiply it by any constant, and add it to our expression for to get, for any choice of :
There exists at least one such that at least one of the terms
where and , is equal to zero. The smallest (in absolute value) such will ensure that all of the terms are nonnegative and at least one is zero. (Why?) For this , define
for and . From the definition of and the discussion above, we know that , with at least one entry zero, and satisfies
for each , which is easy to show from the definition. Removing the nonzero entries, we then reduce at least one by one, proving the claim.
Discussion
Interestingly, this procedure is essentially constructive: we only need the ability to solve a linear system to perform it, but the 'algorithm' provided will be very slow. (I say "essentially" here, since there's a hidden cost: we also need to be able to write an explicit convex combination of points in that yield ; it is not obvious how to do that in general if the set does not admit a simple polyhedral description, but is fairly 'easy' for many structured sets in practice.)