A simple proof of Shapley–Folkman

09 Jun, 2024

This post goes over a simple proof of the Shapley–Folkman lemma that I haven't seen published (though I'm sure everyone has a version or variation of this sitting around!). It is (roughly) a combination of Zhao's proof (which uses a simple result about conic combinations, but has a funky construction) and Cassel's proof, which, while nice, unfortunately has very annoying notation. Specifically, this post proves the "slightly more general" statement given in Cassel's proof, which is a little nicer to handle, via a technique that is similar to Zhao's, but uses only linear independence.

Anyways, if you're reading this I probably don't have to convince you that this lemma is very useful, but, if you're unsure, I'd recommend just Googling "Shapley–Folkman [your field of choice]" and you'll probably get a hit or two that might be interesting.

Statement

The statement of Shapley–Folkman is a little funny if you're unfamiliar with it, but it's the following.

We have a collection of sets $S_{1}, \dots, S_{m} \subseteq 𝐑^{n}$ , which need not be convex. Let $y \in 𝐑^{n}$ be a vector in the sum of the convex hulls of these sets; i.e., one which satisfies

y = x_{1} + x_{2} + \dots + x_{m},

where $x_{i} \in \operatorname{conv}(S_{i})$. Then $y$ can be written in the following way:

y = {\tilde{x}}_{1} + {\tilde{x}}_{2} + \dots + {\tilde{x}}_{m},

where ${\tilde{x}}_{i} \in \operatorname{conv}(S_{i})$ for every $i$ and, importantly, ${\tilde{x}}_{i} \in S_{i}$ for at least $m - n$ indices $i$. In other words, any point $y$ that is a sum of points lying in the convex hulls $\operatorname{conv}(S_{i})$ can be written as a sum of points that lie in the actual sets $S_{i}$ for 'most' indices $i$; for no more than $n$ indices will the corresponding point lie in $\operatorname{conv}(S_{i})$ but not in $S_{i}$.

One simple interpretation of this statement is that, when there are many sets $S_{i}$ relative to the number of dimensions (that is, when $m ≫ n$), the (Minkowski) sum of the sets is 'very close' to a convex set.
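
As a small sanity check of the statement (this example is mine, not taken from the proofs mentioned above), take $n = 1$ and $m = 3$ with $S_{1} = S_{2} = S_{3} = \{0, 1\} \subseteq 𝐑$, so that each convex hull is the interval $[0, 1]$. The point $y = 3/2$ can be written as

y = \frac{1}{2} + \frac{1}{2} + \frac{1}{2},

where no summand lies in its $S_{i}$, but it can also be written as

y = 1 + 0 + \frac{1}{2},

where $m - n = 2$ of the summands lie in $S_{i}$ and only one does not. In this example the sum of the sets is $\{0, 1, 2, 3\}$, while the sum of the convex hulls is $[0, 3]$, so every point of the latter is within distance $1/2$ of the former; the same bound of $1/2$ holds for any number $m$ of copies of $\{0, 1\}$.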

Proof

The proof requires a little bit of extra set up, but not too much.

Statement variation

We will show the proof in an inductive way, with a slightly different set up than the one above. In our set up, there exist some sets $S_{1}, \dots, S_{m} \subseteq 𝐑^{n}$ and some point $y \in 𝐑^{n}$ such that

y = x_{1} + \dots + x_{m},

and each $x_{i} \in \operatorname{conv}(S_{i})$. Using the definition of the convex hull, this is the same as saying that, for each set $i = 1, \dots, m$, there exist $z_{i j} \in S_{i}$ and weights $γ_{i j} > 0$ with $j = 1, \dots, n_{i}$, such that

x_{i} = \sum_{j = 1}^{n_{i}} γ_{i j} z_{i j},

and the weights sum to $1$ :

\sum_{j = 1}^{n_{i}} γ_{i j} = 1,

for each $i = 1, \dots, m$. In this case, $n_{i}$ denotes the number of elements of $S_{i}$ whose convex combination results in $x_{i}$, all with nonzero coefficients. (This is why we require that $γ_{i j} > 0$; otherwise, if $γ_{i j} = 0$ for some index $j$, we can remove this entry and reduce $n_{i}$ by 1.) We will show that the following inequality can be made true:

\sum_{i = 1}^{m} (n_{i} - 1) \leq n .

This implies the original claim, since $n_{i} = 1$ means that $x_{i}$ is itself a point of $S_{i}$, and every index with $n_{i} > 1$ contributes at least $1$ to the sum. The sum is therefore an upper bound on the number of indices with $n_{i} > 1$; i.e., at most $n$ indices $i$ have $n_{i} > 1$, or, equivalently, at least $m - n$ indices satisfy $n_{i} = 1$ and therefore have $x_{i} \in S_{i}$.

Proof

Given that set up, we will show that, if

\sum_{i = 1}^{m} (n_{i} - 1) > n,

then we can always set at least one of the weights $γ_{i j}$ to $0$ (potentially changing some other weights along the way) such that the left hand side decreases by at least 1. Applying this statement inductively gives us the result.

So, let's get to it!

The one trick in this proof is to note that we can choose a 'privileged' index $j$ , which, in our case, we will just choose to be the first index. We'll write $x_{i}$ , for each $i$ , splitting out the first entry:

x_{i} = γ_{i 1} z_{i 1} + \sum_{j = 2}^{n_{i}} γ_{i j} z_{i j} .

We interpret the sum on the right as zero if $n_{i} = 1$ . (Note that, by definition, $n_{i}$ cannot be zero! So this expression is always well-defined.)

We can then write $y$ as

y = \sum_{i = 1}^{m} γ_{i 1} z_{i 1} + \sum_{i = 1}^{m} \sum_{j = 2}^{n_{i}} γ_{i j} z_{i j} .

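Now, if $\sum_{i = 1}^{m} (n_{i} - 1) > n$, then there are at least $n + 1$ terms in the second sum, and hence at least $n + 1$ difference vectors $z_{i j} - z_{i 1}$, each of dimension $n$. More than $n$ vectors in $𝐑^{n}$ must be linearly dependent, so there exist weights $α_{i j} \in 𝐑$, not all zero, such that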

\sum_{i = 1}^{m} \sum_{j = 2}^{n_{i}} α_{i j} (z_{i j} - z_{i 1}) = 0,

where $i = 1, \dots, m$ and $j = 2, \dots, n_{i}$. (Very importantly, note that the $α_{i j}$ need not be nonnegative!) Because this sum is zero, we can multiply it by any constant $η \in 𝐑$ and add it to our expression for $y$ to get, for any choice of $η$:

y = \sum_{i = 1}^{m} (γ_{i 1} - η \sum_{j = 2}^{n_{i}} α_{i j}) z_{i 1} + \sum_{i = 1}^{m} \sum_{j = 2}^{n_{i}} (γ_{i j} + η α_{i j}) z_{i j} .

There exists at least one $η$ such that at least one of the terms

γ_{i 1} - η \sum_{j = 2}^{n_{i}} α_{i j} \quad \text{or} \quad γ_{i j} + η α_{i j},

where $i = 1, \dots, m$ and $j = 2, \dots, n_{i}$, is equal to zero. The smallest (in absolute value) such $η$ will ensure that all of the terms are nonnegative and at least one is zero. (Why?) For this $η$, define

{\tilde{γ}}_{i 1} = γ_{i 1} - η \sum_{j = 2}^{n_{i}} α_{i j} \quad \text{and} \quad {\tilde{γ}}_{i j} = γ_{i j} + η α_{i j},

for $i = 1, \dots, m$ and $j = 2, \dots, n_{i}$. From the definition of $η$ and the discussion above, we know that ${\tilde{γ}}_{i j} \geq 0$ for every $i$ and $j$, with at least one entry zero, and that the new weights satisfy

\sum_{j = 1}^{n_{i}} {\tilde{γ}}_{i j} = \sum_{j = 1}^{n_{i}} γ_{i j} = 1,

for each $i = 1, \dots, m$, which is easy to show from the definition. Removing the entries with zero weight, we then reduce at least one $n_{i}$ by one, proving the claim.
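
For completeness, here is one way to fill in the two facts used in this last step (my own spelling-out of the details, in the notation above). For the sums, fix $i$ and add up the new weights:

\sum_{j = 1}^{n_{i}} {\tilde{γ}}_{i j} = γ_{i 1} - η \sum_{j = 2}^{n_{i}} α_{i j} + \sum_{j = 2}^{n_{i}} (γ_{i j} + η α_{i j}) = \sum_{j = 1}^{n_{i}} γ_{i j} = 1,

since the two terms involving $η$ cancel. For the signs, note that every new weight is an affine function of $η$ that is strictly positive at $η = 0$. Such a function can only change sign at its unique root (if it has one), so no weight is negative for any $η$ between $0$ and the root of smallest absolute value, endpoints included, and at that root at least one weight is exactly zero. At least one root exists as long as the $α_{i j}$ are chosen to be not all zero, which linear dependence allows.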

Discussion

Interestingly, this procedure is essentially constructive: we only need the ability to solve a linear system to perform it, but the 'algorithm' provided will be very slow. (I say "essentially" here, since there's a hidden cost: we also need to be able to write an explicit convex combination of points in $S_{i}$ that yields $x_{i}$; it is not obvious how to do that in general if the set $S_{i}$ does not admit a simple polyhedral description, but it is fairly 'easy' for many structured sets in practice.)
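
To make the "constructive" part concrete, here is a minimal Python sketch of the procedure, assuming the explicit convex combinations are already in hand. The function names (`null_vector`, `reduce_once`, `shapley_folkman`) and the list-of-(weight, point) representation are my own choices for illustration. Exact rationals are used so that "equal to zero" is unambiguous, and no attempt is made to be fast.

```python
from fractions import Fraction


def null_vector(vectors):
    """Return coefficients c, not all zero, with sum_k c[k] * vectors[k] == 0.

    Plain Gauss-Jordan elimination over exact rationals. Returns None if the
    vectors are linearly independent.
    """
    k, n = len(vectors), len(vectors[0])
    # rows[r][c] holds component r of vector c, so the vectors are the columns.
    rows = [[Fraction(v[r]) for v in vectors] for r in range(n)]
    pivots = []  # pivots[r] is the pivot column of reduced row r
    for c in range(k):
        r = len(pivots)
        if r == n:
            break  # every row has a pivot; remaining columns are free
        p = next((i for i in range(r, n) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot available: column c is free
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
    free = next((c for c in range(k) if c not in pivots), None)
    if free is None:
        return None
    # Set one free coefficient to 1 and solve for the pivot coefficients.
    coeffs = [Fraction(0)] * k
    coeffs[free] = Fraction(1)
    for r, c in enumerate(pivots):
        coeffs[c] = -rows[r][free]
    return coeffs


def reduce_once(terms):
    """One step of the proof: zero out at least one weight, keeping y fixed.

    terms[i] is a list of (weight, point) pairs describing x_i. The weights
    are positive and sum to 1, and each point is a tuple of n numbers.
    """
    index, diffs = [], []
    for i, term in enumerate(terms):
        z_first = term[0][1]  # the 'privileged' point z_{i1}
        for j in range(1, len(term)):
            index.append((i, j))
            diffs.append([a - b for a, b in zip(term[j][1], z_first)])
    alpha = null_vector(diffs)
    if alpha is None:
        raise ValueError("difference vectors are linearly independent")
    # delta[i][j] is the coefficient of eta in the updated weight (i, j).
    delta = [[Fraction(0)] * len(term) for term in terms]
    for (i, j), a in zip(index, alpha):
        delta[i][j] += a
        delta[i][0] -= a
    # Each new weight is w + eta * d; take the root of smallest absolute value.
    roots = [-Fraction(w) / d
             for term, row in zip(terms, delta)
             for (w, _), d in zip(term, row) if d != 0]
    eta = min(roots, key=abs)
    new_terms = []
    for term, row in zip(terms, delta):
        updated = [(w + eta * d, z) for (w, z), d in zip(term, row)]
        new_terms.append([(w, z) for w, z in updated if w != 0])
    return new_terms


def shapley_folkman(terms, n):
    """Repeat the step until sum_i (n_i - 1) <= n, as in the proof."""
    while sum(len(term) - 1 for term in terms) > n:
        terms = reduce_once(terms)
    return terms


# Three copies of S_i = {0, 1} in one dimension, each with x_i = 1/2.
half = Fraction(1, 2)
terms = [[(half, (0,)), (half, (1,))] for _ in range(3)]
result = shapley_folkman(terms, n=1)
print([len(term) for term in result])  # prints [1, 1, 2]
```

Each pass solves one linear system and removes at least one point, so the number of passes is at most the initial value of $\sum_{i} (n_{i} - 1)$; the cost of each pass is dominated by the elimination, which is part of why this is slow as written.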