A simple proof of Shapley–Folkman

This post goes over a simple proof of the Shapley–Folkman lemma which I haven't seen published (though I'm sure everyone has a version or variation of this sitting around!). It is (roughly) a combination of Zhao's proof (which uses a simple result about conic combinations, but has a funky construction) and Cassel's proof, which, while nice, unfortunately has very annoying notation. Specifically, this post proves the "slightly more general" statement given in Cassel's proof, which is a little nicer to handle, via a technique that is similar to Zhao's, but uses only linear independence.

Anyways, if you're reading this I probably don't have to convince you that this lemma is very useful, but, if you're unsure, I'd recommend just Googling "Shapley–Folkman [your field of choice]" and you'll probably get a hit or two that might be interesting.

Statement

The statement of Shapley–Folkman is a little funny if you're unfamiliar with it, but it's the following.

We have a collection of sets $S_1, \dots, S_m \subseteq \mathbf{R}^n$, which need not be convex. Let $y \in \mathbf{R}^n$ be a vector in the sum of the convex hulls of these sets; i.e., one which satisfies

$$y = x_1 + x_2 + \cdots + x_m,$$

where $x_i \in \operatorname{conv}(S_i)$. Then $y$ can be written in the following way

$$y = \tilde{x}_1 + \tilde{x}_2 + \cdots + \tilde{x}_m,$$

where $\tilde{x}_i \in \operatorname{conv}(S_i)$ and, importantly, $\tilde{x}_i \in S_i$ for at least $m - n$ indices $i$. In other words, any point $y$ which is the sum of points lying in the convex hulls $\operatorname{conv}(S_i)$ can also be written as the sum of points lying in the actual sets $S_i$ for 'most' indices $i$; no more than $n$ of the summands will lie in $\operatorname{conv}(S_i)$ but not in $S_i$.

One simple interpretation of this statement is that, given a lot of sets $S_i$ relative to the number of dimensions (i.e., when $m \gg n$), the resulting set, which is the (Minkowski) sum of the sets, is 'very close' to a set that is convex.
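As a quick toy example to keep in mind: take $n = 1$, $m = 5$, and $S_1 = \dots = S_5 = \{0, 1\}$, so that $\operatorname{conv}(S_i) = [0, 1]$. The point $y = 5/2$ lies in the sum of the convex hulls (say, via $x_i = 1/2$ for every $i$), and the lemma says it can be rewritten as

$$y = 1 + 1 + \tfrac{1}{2} + 0 + 0,$$

where at least $m - n = 4$ of the summands lie in the original sets $\{0, 1\}$ and at most $n = 1$ does not.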

Proof

The proof requires a little bit of extra setup, but not too much.

Statement variation

We will prove the claim inductively, with a slightly different setup than the one above. In our setup, there exist some sets $S_1, \dots, S_m \subseteq \mathbf{R}^n$ and some point $y \in \mathbf{R}^n$ such that

$$y = x_1 + \cdots + x_m,$$

and each $x_i \in \operatorname{conv}(S_i)$. Using the definition of the convex hull, this is the same as saying that, for each set $i = 1, \dots, m$, there exist points $z_{ij} \in S_i$ and weights $\gamma_{ij} > 0$ with $j = 1, \dots, n_i$, such that

$$x_i = \sum_{j=1}^{n_i} \gamma_{ij} z_{ij},$$

and the weights sum to 1:

$$\sum_{j=1}^{n_i} \gamma_{ij} = 1,$$

for each $i = 1, \dots, m$. In this case, $n_i$ denotes the number of elements of $S_i$ whose convex combination results in $x_i$, all with nonzero coefficients. (This is why we require that $\gamma_{ij} > 0$: otherwise, if $\gamma_{ij} = 0$ for some index $j$, we can remove this entry and reduce $n_i$ by 1.) We will show that the following inequality can be made to hold:

$$\sum_{i=1}^{m} (n_i - 1) \le n.$$

This implies the original claim: since $x_i$ lies in $S_i$ whenever $n_i = 1$, and each term of the sum with $n_i > 1$ contributes at least 1, the sum is an upper bound on the number of sets with $n_i > 1$. In other words, the number of indices $i$ with $n_i > 1$ is at most $n$, or, equivalently, at least $m - n$ indices satisfy $n_i = 1$ and therefore have $x_i \in S_i$.
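In the toy example above, the decomposition $y = 1 + 1 + \tfrac{1}{2} + 0 + 0$ corresponds to $n_1 = n_2 = n_4 = n_5 = 1$ (each of those $x_i$ is a single point of $S_i$) and $n_3 = 2$ (since $\tfrac{1}{2} = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1$), so the left-hand side is $\sum_i (n_i - 1) = 1 \le n = 1$.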

Proof

Given that setup, we will show that, if

$$\sum_{i=1}^{m} (n_i - 1) > n,$$

then we can always set at least one of the weights $\gamma_{ij}$ to 0 (potentially changing some other weights along the way) such that the left-hand side decreases by at least 1. Applying this statement inductively gives us the result.

So, let's get to it!

The one trick in this proof is to note that we can choose a 'privileged' index $j$, which, in our case, we will just take to be the first index. We'll write $x_i$, for each $i$, splitting out the first term:

$$x_i = \gamma_{i1} z_{i1} + \sum_{j=2}^{n_i} \gamma_{ij} z_{ij}.$$

We interpret the sum on the right as zero if $n_i = 1$. (Note that, by definition, $n_i$ cannot be zero! So this expression is always well-defined.)

We can then write $y$ as

$$y = \sum_{i=1}^{m} \gamma_{i1} z_{i1} + \sum_{i=1}^{m} \sum_{j=2}^{n_i} \gamma_{ij} z_{ij}.$$

Now, if $\sum_{i=1}^{m} (n_i - 1) > n$, then there are at least $n + 1$ vectors in the second sum, while $n$ is the dimension of the vectors, so the differences $z_{ij} - z_{i1}$ must be linearly dependent. That is, there exist weights $\alpha_{ij} \in \mathbf{R}$, not all zero, such that

$$\sum_{i=1}^{m} \sum_{j=2}^{n_i} \alpha_{ij} (z_{ij} - z_{i1}) = 0,$$

where $i = 1, \dots, m$ and $j = 2, \dots, n_i$. (Very importantly, note that the $\alpha_{ij}$ need not be nonnegative!) Because this is zero, we can multiply it by any constant $\eta \in \mathbf{R}$ and add it to our expression for $y$ to get, for any choice of $\eta$:

$$y = \sum_{i=1}^{m} \left( \gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij} \right) z_{i1} + \sum_{i=1}^{m} \sum_{j=2}^{n_i} \left( \gamma_{ij} + \eta \alpha_{ij} \right) z_{ij}.$$
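(To see where the new coefficient of $z_{i1}$ comes from: each added term $\eta \alpha_{ij} (z_{ij} - z_{i1})$ contributes $\eta \alpha_{ij}$ to the coefficient of $z_{ij}$ and $-\eta \alpha_{ij}$ to the coefficient of $z_{i1}$.)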

There exists at least one $\eta$ such that at least one of the terms

$$\gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij}, \quad \text{or} \quad \gamma_{ij} + \eta \alpha_{ij},$$

where $i = 1, \dots, m$ and $j = 1, \dots, n_i$, is equal to zero. The smallest (in absolute value) such $\eta$ will ensure that all of the terms are nonnegative and at least one is zero. (Why?) For this $\eta$, define

$$\tilde{\gamma}_{i1} = \gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij}, \quad \text{and} \quad \tilde{\gamma}_{ij} = \gamma_{ij} + \eta \alpha_{ij},$$

for $i = 1, \dots, m$ and $j = 1, \dots, n_i$. From the definition of $\eta$ and the discussion above, we know that $\tilde{\gamma}_{ij} \ge 0$, with at least one entry zero, and that these weights satisfy

$$\sum_{j=1}^{n_i} \tilde{\gamma}_{ij} = \sum_{j=1}^{n_i} \gamma_{ij} = 1,$$

for each $i = 1, \dots, m$, which is easy to show from the definitions (see the quick check below). Removing the zero entries, we then reduce at least one $n_i$ by one, proving the claim.
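For completeness, the quick check is just

$$\sum_{j=1}^{n_i} \tilde{\gamma}_{ij} = \left( \gamma_{i1} - \eta \sum_{j=2}^{n_i} \alpha_{ij} \right) + \sum_{j=2}^{n_i} \left( \gamma_{ij} + \eta \alpha_{ij} \right) = \sum_{j=1}^{n_i} \gamma_{ij} = 1,$$

since the $\eta$ terms cancel.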

Discussion

Interestingly, this procedure is essentially constructive: we only need the ability to solve a linear system to perform it, though the 'algorithm' it provides will be very slow. (I say "essentially" here, since there's a hidden cost: we also need to be able to write an explicit convex combination of points in $S_i$ that yields $x_i$; it is not obvious how to do that in general if the set $S_i$ does not admit a simple polyhedral description, but it is fairly 'easy' for many structured sets in practice.)
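To make the constructive reading concrete, here is a rough NumPy sketch of the reduction step, under the assumption that each $S_i$ is already handed to us through an explicit convex combination (a list of points and positive weights summing to one). The function name `shapley_folkman_reduce` and the tolerance `tol` are my own inventions for this sketch, and the null vector $\alpha$ is found via the SVD rather than anything clever:

```python
import numpy as np

def shapley_folkman_reduce(zs, gammas, tol=1e-9):
    """Zero out weights until sum_i (n_i - 1) <= n, as in the proof above.

    zs[i] is an (n_i, n) array of points z_ij in S_i, and gammas[i] is a
    length-n_i array of positive weights summing to one.
    """
    n = zs[0].shape[1]  # ambient dimension
    while sum(z.shape[0] - 1 for z in zs) > n:
        # Columns are the differences z_ij - z_i1 for j >= 2, across all i.
        index = [(i, j) for i, z in enumerate(zs) for j in range(1, z.shape[0])]
        D = np.stack([zs[i][j] - zs[i][0] for i, j in index], axis=1)  # n x K, K > n
        # Since K > n, D has a nontrivial null space; take a (unit) null vector alpha.
        alpha = np.linalg.svd(D)[2][-1]  # D @ alpha ~ 0, alpha != 0
        # Per-set sums s_i = sum_{j >= 2} alpha_ij (these multiply eta on gamma_i1).
        s = np.zeros(len(zs))
        for k, (i, _) in enumerate(index):
            s[i] += alpha[k]
        # Candidate eta values at which some weight hits exactly zero.
        candidates = [-gammas[i][j] / alpha[k]
                      for k, (i, j) in enumerate(index) if abs(alpha[k]) > tol]
        candidates += [gammas[i][0] / s[i] for i in range(len(zs)) if abs(s[i]) > tol]
        eta = min(candidates, key=abs)  # smallest |eta| keeps every weight >= 0
        # Update the weights and drop any entries that became (numerically) zero.
        for i in range(len(zs)):
            new_gamma = gammas[i].astype(float)
            new_gamma[0] -= eta * s[i]
            for k, (ii, j) in enumerate(index):
                if ii == i:
                    new_gamma[j] += eta * alpha[k]
            keep = new_gamma > tol
            gammas[i], zs[i] = new_gamma[keep], zs[i][keep]
    return zs, gammas
```

As a sanity check, one can verify after each pass that $\sum_i \sum_j \gamma_{ij} z_{ij}$ is (numerically) unchanged and that the weights in each set still sum to one; when the loop exits, at most $n$ of the sets have more than one surviving weight, which is exactly the statement above.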