Posted by: yanzhang | December 9, 2012

## On Distances and Codes

I recently attended an interesting talk by Henry Cohn at the MIT combinatorics seminar on a certain class of (not necessarily linear) codes that include many of the classical creatures we know and love, such as the Hamming, Golay, and Reed-Solomon codes. This classification seeks to concretely capture the vague idea of “robustness” that people familiar with these codes would feel from working with them. A cute little discussion came up and I wish to record it here and possibly obtain some new ideas. Though the discussion was only tangentially related to the main ideas of Henry’s presentation, I find the question very natural.

The following construction came up: given a code of $k$ codewords in $F_q^n$, let $(A_0, A_1, \ldots)$ be the distance enumerator, (apologies to Henry for forgetting the actual name, but I think this is a natural name to use) vector of pairwise distances between ordered pairs of codewords, such that $A_i$ is the number of pairs of points with distance $i$ weighted by $1/k$ (this way $A_0 = 1$ and $\sum A_i = k$). This weaker invariant is then used to analyze the code — this is actually a strictly weaker invariant, since it can be seen that not all such vectors actually come from actual codes, and some inequalities (using some orthogonal polynomials) describe some necessary (but not sufficient) conditions for these vectors to come from actual codes. The advantage of having these invariants is that they already tell a lot of information (Henry thinks possibly too much information) about the codes when they actually come from codes. Henry was wondering if there is a way to formally capture the idea that these invariants capture so much information.

(Small digression: I personally find this concept similar to the weight enumerators in codes, which are generating functions that capture the weights (number of $1$‘s) in a binary code. Not all these generating functions actually come from codes, so the ones that do not are often called psuedocodes, but it is surprising how much information one can get from weight enumerators alone to tell stories and constraints about codes (see, for example, the MacWilliams identities). This concept is mirrored somewhat in Henry’s discussion.)

Anyway, I suggested thinking about the following natural question: consider the map that sends a code to its distance enumerator. Maybe you can show that the fibers of this map are fairly small, especially if you constrain yourself to, say, linear codes. If you can do this, it shows you don’t have a lot of wiggle room and it is clear that the distances themselves are going to capture most of the information.

Afterwards, Richard Stanley, being Richard, quickly figured out that this wouldn’t work. The following calculation suffices: there are ${q^n \choose k}$ codes. Distance enumerators sum to $k$, so an upper bound is something like $k + n -1 \choose k$. But it is clear that the fibers are way too large if we, say, let $n$ grow. What if we limit to linear codes? Well, we can count these with our favorite q-analogue, ${n \choose \log(k)}_q$. (I am taking base $2$ here, as a $t$-dimensional code has $2^t$ elements). We have many fewer of these, but since $q > 1$ these are still huge compared to the upper bound for the distance enumerators we gave, so it can’t be that limiting.

That said, the fact remains that it is unclear that there is a good algorithm to test if a given distance enumerator actually comes from a code (linear or not). This is definitely decidable since the problem is finite, but the naive algorithm is absolutely horrible, and I haven’t come up with any ideas. Anything is welcome! By the way, it is fun to try this in $R^n$ instead of $F_q^n$. Then the problem becomes: “given $n^2$ pairwise distances, do we know if they came from $n$ points?” This seems like a ridiculously natural question (maybe even applied-worthy) but I don’t know what is known about it. The closest is this: if we had a guess that matched up all the distances with all the pairs of points (this is $O(n^2!)$ time in the most naive case), we can look at the matrix and see if it is positive semi-definite! Kind of miraculously, such a matrix is a Gram matrix (comes from inner products) if and only if it is positive semi-definite. So this at least gives some idea of how to go on – maybe split this problem into assigning the ordered pairs and then seeing if the matrix comes from inner products? I’m not aware of a similar condition over finite fields for the second part, so maybe this approach is useless in that case.

Happy end of semester, everyone.

-Yan