The punchline, of course, is that the two studies come from the same data. Judea Pearl mentions it as a throwaway comment in a general talk, “The Art and Science of Cause and Effect,” which I read in his great work *Causality*. In particular, the paper referenced was Goldberger’s 1984 paper *Reverse Regression and Salary Discrimination*, with actual data demonstrating the effect (though with a different color palette). I find it surprising that Pearl would dedicate an entire chapter in *Causality* to Simpson’s Paradox but only one comment in the talk to this effect. While it is somewhat similar to Simpson’s Paradox (which I may talk about some other time), they are decidedly different in origin, and I find this phenomenon much more relevant, prominent, and chilling. Let me explain.

I’ve heard this comment several times already. The argument goes something like this: unless the data is really pathological, we can assume the data looks something like a curve for each of blue and green on the education vs. salary plot. Let’s say that education is the x-axis and salary the y-axis. The first report says that the blue curve is **above** the green curve at all vertical slices of the graph. But this means the blue curve must be **left** of the green curve at all horizontal slices of the graph, which contradicts the second report.

Like all other examples of good math done in the wrong situations, the problem lies not with the logic above (which is correct) but with the assumptions. Instead, let’s make another very commonplace generative model:

- **assume the employers were just and higher education corresponds to higher salary**, and there is no discrimination at all (equivalently, have the education vs. salary plot be centered on the x = y line);
- for whatever reason, **have the blue men be more educated than the green men and thus get more salary** (this could be, of course, due to prior discrimination in the history of the world, socioeconomic circumstances, etc. But the point is we don’t have that kind of information in our data. That’s why I stressed that the **employers** were just and not the **world**);
- important: **add some noise**, so we don’t have well-defined lines, but some variation in ability and occasional misplacement of salary to ability.

This gives you a graph that looks more like the following:

And now you see it — on vertical slices, blue tends to be higher; on horizontal slices, blue tends to be to the right, something we thought was very counter-intuitive under the “lines” model. As Boris Alexeev pointed out to me, the **noise** is actually doing all of the work in this model!
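In lieu of the original plot, here is a toy simulation of this generative model (the specific numbers and variable names are my own invention, not Goldberger’s data):

```python
# Toy simulation of the generative model above: a latent "ability" whose mean
# differs by group, with education and salary each equal to ability plus
# independent noise. The employers are "just": salary never looks at group.
import random

random.seed(0)

def population(group_shift, n=100_000):
    rows = []
    for _ in range(n):
        ability = random.gauss(group_shift, 1.0)
        education = ability + random.gauss(0.0, 1.0)  # noisy proxy for ability
        salary = ability + random.gauss(0.0, 1.0)     # just pay: ability + noise
        rows.append((education, salary))
    return rows

blue, green = population(1.0), population(0.0)
mean = lambda xs: sum(xs) / len(xs)

# Vertical slice: among people with education near 1, blue are paid more...
print(mean([s for e, s in blue if 0.9 < e < 1.1])
      > mean([s for e, s in green if 0.9 < e < 1.1]))  # True

# ...and horizontal slice: among people with salary near 1, blue are also
# more educated, so both "contradictory" reports hold at once.
print(mean([e for e, s in blue if 0.9 < s < 1.1])
      > mean([e for e, s in green if 0.9 < s < 1.1]))  # True
```

With ability shifted up for blue, conditioning on either variable leaves blue ahead in the other: exactly the reverse regression effect, with no unjust employer anywhere in the code.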

The watchful reader will notice that this model makes very few assumptions and is very natural, and surely he/she can think of real-world data that follows this pattern. This is one main difference between the reverse regression effect and Simpson’s Paradox — by some fairly natural definition of “random,” a random set of data gets into the Simpson’s Paradox situation only about 1/60 of the time (see Pearl). The RRE does not depend on such coincidences.

So what is really going on? Some people get very impatient at this point: “is there discrimination or is there not?” Well, whether you believe my simple generative model or not, what is objectively going on is that the data is telling a very simple story: A) **blue men are more educated** and B) **more educated people get paid more**. Sadly, by trying to torture the data to make it talk, we are overstepping the bounds of what we can extract from it. What the two activists are doing is trying to pull some complicated mechanism, like discrimination, out of very simple data in a contorted way.

An obvious question to ask at this point is: “**how do we tell when discrimination exists**?” Well, discrimination is a complicated object and it could come in different forms, including, say:

- discrimination-1: employers get equally qualified people, but then pay them less if they were green men
- discrimination-2: employers are just, but the blue people discriminated against green people earlier and green men were put in tough socioeconomic situations
- discrimination-3: …

The point is that finding different kinds of discrimination requires different work, but we have an overloaded (and very emotionally charged!) word “discrimination.” **To really find discrimination, you have to define each kind carefully and look for them separately.** Using Pearl’s language, I would outline the different types of discrimination as different *causal mechanisms* (formally, this means different graphical models in Pearl’s approach). I would then look at the world and see if those models make sense for the data. Qualitatively, it means something we knew intuitively all along: we need more history about this blue-green world and a sense of how it compares with other worlds: maybe in most other worlds the green men are equally educated as the blue men; maybe in this world the education of green men was repressed a century ago and education level has a hereditary effect. In our situation, a careful statement would be something like:

- discrimination-1: we don’t see discrimination-1, rather evidence against it;
- discrimination-2: we don’t see it, but we don’t see evidence against it either. We don’t have the right kind of data to find discrimination-2.
- discrimination-3: …

Yes, reductionism makes things more complicated. But it is necessary if we want to make precise judgments.

Why did I mention the Dark Arts? Can this information be maliciously used? **Hell yes. I’ve literally just given you a recipe to finagle the politically charged idea of discrimination, GOING BOTH WAYS, from a single set of data**. So if you feel like it, you can take basically any reasonable population story and create a perfectly-reasonable-sounding discrimination case against either side! I can even go a step further and see a use case where someone points to this very effect to claim that no discrimination has taken place in a world where it really did (for example, maybe discrimination actually caused the blue-green difference in education in the first place)! It is very important to stress that **the RRE does not show that discrimination does not exist, rather that data like this example alone does not show discrimination**. If this commonplace but nuanced situation is not the Dark Arts of mathematics (knowledge of which is a prerequisite to defending against it), I don’t know what is. The only thing I can do is use this knowledge responsibly; I hope it has similarly helped you.

Thanks to Boris Alexeev for first pointing me towards this effect and Pearl’s work on Causality. It was one of the best time investments I’ve ever made for mathematics.

-Yan

]]>

The following construction came up: given a code of codewords in , let be the *distance enumerator* (apologies to Henry for forgetting the actual name, but I think this is a natural name to use), the vector of pairwise distances between ordered pairs of codewords, such that is the number of pairs of points with distance weighted by (this way and ). This weaker invariant is then used to analyze the code — it is in fact strictly weaker, since not all such vectors come from actual codes, and some inequalities (using certain orthogonal polynomials) give necessary (but not sufficient) conditions for these vectors to come from actual codes. The advantage of having these invariants is that they already carry a lot of information (Henry thinks possibly too much information) about the codes when they do come from codes. Henry was wondering if there is a way to formally capture the idea that these invariants capture so much information.

(Small digression: I personally find this concept similar to the weight enumerators of codes, which are generating functions that capture the weights (number of 1s) in a binary code. Not all these generating functions actually come from codes, so the ones that do not are often called *pseudocodes*, but it is surprising how much information one can get from weight enumerators alone to tell stories and derive constraints about codes (see, for example, the MacWilliams identities). This concept is mirrored somewhat in Henry’s discussion.)

Anyway, I suggested thinking about the following natural question: consider the map that sends a code to its distance enumerator. Maybe you can show that the fibers of this map are fairly small, especially if you constrain yourself to, say, linear codes. If you can do this, it shows you don’t have a lot of wiggle room and it is clear that the distances themselves are going to capture most of the information.

Afterwards, Richard Stanley, being Richard, quickly figured out that this wouldn’t work. The following calculation suffices: there are codes. Distance enumerators sum to , so an upper bound is something like . But it is clear that the fibers are way too large if we, say, let grow. What if we limit to linear codes? Well, we can count these with our favorite q-analogue, . (I am taking base here, as a -dimensional code has elements). We have many fewer of these, but these are still huge compared to the upper bound we gave for the number of distance enumerators, so the restriction can’t be that limiting.

That said, the fact remains that **it is unclear whether there is a good algorithm to test if a given distance enumerator actually comes from a code (linear or not)**. This is definitely decidable since the problem is finite, but the naive algorithm is absolutely horrible, and I haven’t come up with any ideas. Anything is welcome! By the way, it is fun to try this in instead of . Then the problem becomes: “given pairwise distances, do we know if they came from points?” This seems like a ridiculously natural question (maybe even applied-worthy), but I don’t know what is known about it. The closest I have is this: if we had a guess that matched up all the distances with all the pairs of points (this is time in the most naive case), we can look at the matrix and see if it is positive semi-definite! Kind of miraculously, such a matrix is a Gram matrix (comes from inner products) if and only if it is positive semi-definite. So this at least gives some idea of how to go on – maybe split this problem into assigning the ordered pairs and then seeing if the matrix comes from inner products? I’m not aware of a similar condition over finite fields for the second part, so maybe this approach is useless in that case.
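Here is a sketch of that last idea in the Euclidean setting (function names and the toy examples are mine): fix a base point, convert a guessed assignment of squared distances into the would-be Gram matrix, and test positive semi-definiteness.

```python
# Given a guessed assignment of squared pairwise distances d2[i][j] among n
# points, the points embed in Euclidean space iff the Gram matrix
#   G[i][j] = (d2[0][i] + d2[0][j] - d2[i][j]) / 2   for i, j >= 1
# (inner products of the vectors from point 0) is positive semi-definite.

def gram_from_squared_distances(d2):
    n = len(d2)
    return [[(d2[0][i] + d2[0][j] - d2[i][j]) / 2 for j in range(1, n)]
            for i in range(1, n)]

def is_psd(g, tol=1e-9):
    """Cholesky-style elimination; succeeds exactly when g is PSD."""
    g = [row[:] for row in g]
    n = len(g)
    for k in range(n):
        if g[k][k] < -tol:
            return False
        if g[k][k] <= tol:
            # pivot ~ 0: the rest of its row must vanish too, else not PSD
            if any(abs(g[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            f = g[i][k] / g[k][k]
            for j in range(k + 1, n):
                g[i][j] -= f * g[k][j]
    return True

# Squared distances of a unit square in the plane: realizable.
sq = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(is_psd(gram_from_squared_distances(sq)))   # True

# Violates the triangle inequality badly: not realizable.
bad = [[0, 1, 9], [1, 0, 1], [9, 1, 0]]
print(is_psd(gram_from_squared_distances(bad)))  # False
```

The hard part, of course, is the assignment step that this sketch takes as given.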

Happy end of semester, everyone.

-Yan

]]>


Several contributors to this blog were mentors of PRIMES students: my students Ravi and Nihal presented some very nice work on generalizations of pattern avoidance in alternating permutations; this extends work in my thesis as well as work of Julian West and collaborators on shape-Wilf equivalence. Steven’s student Sheela presented work that’s a continuation of a project she began last year on the representation theory of Cherednik algebras. Yan’s student Aaron (who also is my coauthor in work based on his PRIMES project from 2011) presented his work studying the number of ways to put a graded poset structure on a given graph. As I understand it, this question comes from work of Yan on adinkras, and is both natural and apparently unstudied. The cutest result Aaron presented was the following: if *G* is a graph all of whose cycles are generated by its 4-cycles then the number of graded poset structures on G is , where is the chromatic polynomial of *G*.

]]>

I just wanted to share a very short and simple insight today that made me very happy, thanks to the Whitney sum formula: **taking total Stiefel-Whitney classes is like exponentiating bundles** (because the total Stiefel-Whitney class of a direct sum of two bundles is the product of the total Stiefel-Whitney classes of the individual bundles: w(E ⊕ F) = w(E)w(F)). This is “well duh” type of information to seasoned topologists, but to me this is exciting for several reasons:

- for storing information in my brain, which is one of the least well-suited-for-math brains in the department. For some reason, this aphorism-ish idea suddenly made it feel like I could manipulate these guys a lot better. Thankfully too, as a few months ago they were completely mysterious to me and I need them to do some computations.
- for computation, this allows us to divide-and-conquer. This is the obvious “better bombs and banks” reason for mathematicians. For me personally, the use is the folk idea that we can associate a real vector bundle to a representation of a finite group and then calculate the Stiefel-Whitney classes of that bundle, in which case the decomposition of representations into direct sums corresponds exactly to taking direct sums of our bundles!
- for crazy ideas, this means I can think of it as a kind of exponential generating function associated to my bundle, in the sense that we associate exponential generating functions to combinatorial structures in combinatorics. It may be interesting to think about what “inverse Stiefel-Whitney classes” may mean, or even proving combinatorial generating function formulae as “categorification” of playing with bundles! I haven’t quite seen any good examples of this, so I’d be happy to hear some, or maybe even make some.
- for the cool idea that math “forces itself to happen,” it is a good mental experiment to consider what classes can possibly have this exponential property, in the sense that you can think of defining the exponential function via a functional equation like f(x + y) = f(x)f(y). Of course we have to play with the constants a bit – in our analogue this is just making sure the classes take the value 1 on trivial bundles – but it ends up being quite a restrictive property. Segal and Stretch’s “Characteristic Classes for Permutation Representations” explores this kind of perspective.

Just a short breath to catch some air – it is a busy year where I have some more logistical duties and side projects. Back to the topological caves I go, though I really hope this kind of thinking would be helpful for at least one of you.

]]>

1. I’ve been working with Laurent Gruson and Jerzy Weyman on finding geometric interpretations for orbits in “Vinberg θ-representations”. I gave a talk on this at Princeton (notes) and Michigan (notes). The Princeton talk is more introductory in nature, and even though there is overlap, the two sets of notes should complement one another (for the record, both talks were approximately 1 hour, 45 minutes).

2. A separate project with Weyman involves trying to understand Koszul homology for certain classes of determinantal-like ideals. The motivation comes from trying to classify minimal free resolutions over quadric hypersurface rings and in trying to understand a certain result of Koike and Terada in combinatorial representation theory. I’m giving a talk on this tomorrow at Michigan (notes).

It’s a bit time-consuming, but I think preparing notes like this can be very useful, especially for projects which haven’t been finished yet (it helps me gain direction). I hope more people try it!

-Steven

]]>

As with any object as general as posets, we are mostly interested not in results about all posets, but rather in finding particular families of posets with interesting or unexpected properties. One such family of posets are the (**3**+**1**)-avoiding posets. These are the posets that do not contain four elements, say *a*, *b*, *c*, and *d*, such that *a* < *b* < *c* and *d* is incomparable to the other three. A short digression to explain the name “(**3**+**1**)-avoiding”: one natural class of posets are the *chains*, finite total orders like the first example in the previous paragraph. A natural name for the chain with *n* vertices is **n**, so the chain with three vertices is **3**; the poset **3**+**1** is then the disjoint union of this chain with the one-element chain **1**, which is exactly the four-element configuration described above.

It’s a common surprise in combinatorics to find that important objects are characterized by avoiding certain induced subobjects; subposet-avoidance is one particular example of this, and in fact (**3**+**1**)-avoiding posets show up in a number of unexpected places. The simplest and perhaps nicest appearance is in the characterization of *semiorders*. Semiorders are posets that arise in the following way: we have a set of data (real numbers generated by some experiment) such that each datapoint has error bars of the same size. Thus, if two values *a* and *b* are separated by at least a fixed distance then we know their true order; but if they are separated by less than this distance, we can’t be certain which value is truly larger. The relations we can be certain of are the relations of our poset. It’s not hard to see that this definition is equivalent to the following: a semiorder is a poset whose elements are unit intervals in the real numbers, with one element less than another if and only if it lies entirely to its left. (Aside: an easy exercise is to show that it doesn’t matter whether our intervals are open or closed.) The main result of interest is that semiorders are exactly those posets that avoid **3**+**1** and **2**+**2**. (If we drop the “unit” in “unit intervals”, we get just (**2**+**2**)-avoiding posets.) These posets have all sorts of nice properties; for example, the number of them with *n* unlabeled elements is exactly the *n*th Catalan number, so we immediately know we’re going to get nice combinatorics and lots of connections with other objects. See Wikipedia and the work of Peter Fishburn for much more information about them.
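As a quick sanity check of the interval description (my own toy example), one can build a poset from a few unit intervals and verify by brute force that it contains no induced **3**+**1** or **2**+**2**:

```python
# Build the semiorder of a few unit intervals [s, s+1] (one interval below
# another iff it lies entirely to its left), then search by brute force for
# an induced 3+1 or 2+2.
from itertools import permutations, combinations

starts = [0.0, 0.4, 0.9, 1.5, 3.0]             # left endpoints of unit intervals
less = lambda i, j: starts[i] + 1 < starts[j]  # entirely to the left

def contains(pattern_chains):
    """pattern_chains: list of chain lengths, e.g. [3, 1] for 3+1."""
    n, k = len(starts), sum(pattern_chains)
    for elems in permutations(range(n), k):
        # split the chosen elements into consecutive chains of the given lengths
        chains, idx = [], 0
        for length in pattern_chains:
            chains.append(elems[idx:idx + length]); idx += length
        ok = all(less(c[t], c[t + 1]) for c in chains for t in range(len(c) - 1))
        # elements in different chains must be pairwise incomparable
        ok = ok and all(not less(x, y) and not less(y, x)
                        for c1, c2 in combinations(chains, 2)
                        for x in c1 for y in c2)
        if ok:
            return True
    return False

print(contains([3, 1]), contains([2, 2]))  # False False: it really is a semiorder
```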

A second set of connections comes from the following simple observation: a poset *P* avoids **3**+**1** if and only if its *incomparability graph* (i.e., the graph *G* on the same vertex set such that *u* is connected to *v* in *G* if and only if *u* is incomparable to *v* in *P*) is *claw-free*, i.e., contains no four vertices *a*, *b*, *c*, *d* such that *d* is connected to *a*, *b* and *c*, none of which are connected to each other. Claw-free graphs are quite interesting; for example, they make an appearance in some recent work of Fadnavis, who proved the following pretty result:

Suppose that we wish to color a graph *G* with *q* colors, choosing the color for each vertex at random. If *G* is claw-free, then to maximize the chance that the resulting coloring is a proper coloring (i.e., no two adjacent vertices have the same color), we should choose colors uniformly at random (i.e., with equal probabilities 1/*q*).

(Quite surprisingly, this result is not true in general! To 2-color a star graph with 4 or more points, you’re better off with a more lop-sided distribution.) Actually this result is just one part of a bigger story; for example, it’s also related to the Stanley-Stembridge conjecture, which asserts that the symmetric chromatic polynomial of the incomparability graph of a (**3**+**1**)-avoiding poset is *e*-positive. (For definitions, check the Fadnavis paper, which is really excellent and has a lot of interesting material.)

As an enumerative combinatorialist, all these nice features of (**3**+**1**)-avoiding posets make me want to count them. And, in some sense one should expect this to be not too difficult: claw-free graphs and (**3**+**1**)-avoiding posets both have nice structural classifications (due to Chudnovsky & Seymour and to Skandera, respectively), and the related (**2**+**2**)-avoiding posets have been enumerated by Bousquet-Mélou, Claesson, Dukes & Kitaev. But, unfortunately, it seems like none of this is actually directly relevant. So at least for now, counting (**3**+**1**)-avoiding posets remains very much open.

Notes:

0: One of the fundamental objects of combinatorics is the *partially ordered set*, or *poset* for short.^{1} Posets are just what their name suggests: they are given by an order relation (usually denoted ≤) that is transitive (i.e., x ≤ y and y ≤ z means x ≤ z) and antisymmetric (i.e., we never have distinct x and y such that both x ≤ y and y ≤ x), but it is partial in the sense that not every two elements are necessarily comparable (so we might have x and y such that neither x ≤ y nor y ≤ x holds). Obviously posets are a very flexible family of objects, including things like the usual order on the integers (a partial order that happens to be a total order) or the containment order on the subsets of a finite set (the Boolean lattice).

1: Does anyone know the history of how this came to be? As far as I know, MacMahon didn’t do anything with posets, so my assumption is that one can trace it to Rota, but obviously this is not based on anything concrete at all.

]]>

Some of my mathematical hobbies include probability and machine learning. I have recently realized that many simple but effective ideas in these fields really all come from one thing: an independence assumption. It was not until I saw the same example in several different guises, however, that I really caught on. As something Occam would surely approve of, the extreme naiveté this approach embraces can actually go a long way. Our key player is simple: we say that A and B are *conditionally independent* given C if P(A | B, C) = P(A | C). This can be written in the more symmetric form P(A, B | C) = P(A | C) P(B | C). Now I will tell a few stories. Most of these should be old news for a specialist, but I hope I’ve included some remarks that even they may appreciate.

**Weighing Evidence**

(this is mostly ~~stolen from~~ inspired by Jaynes from *Probability Theory: the Logic of Science*) Suppose we have two hypotheses about the state of the world, say H and ¬H, and we know that exactly one of them is true. Now suppose we are getting consecutive pieces of data D_1, D_2, …, D_n. How does this data update our belief in H or its complement?

A standard use of Bayes’s Theorem gives that

P(H | D_1, …, D_n) = P(D_1, …, D_n | H) · P(H) / P(D_1, …, D_n).

Doing the same for ¬H and dividing gives

P(H | D_1, …, D_n) / P(¬H | D_1, …, D_n) = [P(H) / P(¬H)] · [P(D_1, …, D_n | H) / P(D_1, …, D_n | ¬H)].

Here, let’s make the naive assumption that the D_i’s are conditionally independent given H or ¬H. Not only are these two assumptions different, we in fact want the slightly stronger assumption that each D_i is conditionally independent of the intersection of the previous D_j’s given H or ¬H. If so, on the right we just get a product of the ratios P(D_i | H) / P(D_i | ¬H). We want to take logs here and rewrite our equation as

log O(H | D_1, …, D_n) = log O(H) + Σ_i log [P(D_i | H) / P(D_i | ¬H)],

where O(X) denotes the *odds* P(X) / (1 − P(X)). We now make the natural guess that we are in the H world exactly when this sum is positive (which corresponds to H having the higher probability).

The cute thing about this situation is that we are really “weighing” our evidence, as on a balance! Each new piece of data just contributes a number, and we mentally keep a tally and decide yes or no from the sign of the final sum. Our original “bias” is exactly the log odds of H given no other information, which exactly corresponds to the Bayesian information contained in our prior knowledge of H, so the entire prior comes into play as a “head start” bias in one direction, *as if it were a single instance of data with some weight*! This is a very clean way to make decisions, and really makes binary hypothesis testing very intuitive (it is fun/frustrating to try to generalize this to more than two hypotheses, where there are quite a few unexpected pitfalls). The key, however, was our conditional independence assumption.
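A minimal sketch of this balance in code (the probabilities are invented for illustration):

```python
# "Weighing" evidence: the prior enters once as a head-start bias, and each
# datum contributes log P(D|H) - log P(D|~H) to a running total; we decide H
# iff the total is positive.
import math

def log_odds(p):
    return math.log(p / (1.0 - p))

def weigh_evidence(prior_h, likelihoods):
    """likelihoods: list of pairs (P(D_i | H), P(D_i | not H))."""
    total = log_odds(prior_h)  # the prior acts like one more piece of data
    for p_h, p_not_h in likelihoods:
        total += math.log(p_h / p_not_h)  # the "weight" of this datum
    return total

evidence = [(0.8, 0.2), (0.7, 0.4), (0.3, 0.6)]  # two for H, one against
score = weigh_evidence(0.5, evidence)
print(score > 0)  # True: the balance tips toward H

# Under conditional independence this equals the exact posterior log odds:
exact = log_odds(0.5 * 0.8 * 0.7 * 0.3 /
                 (0.5 * 0.8 * 0.7 * 0.3 + 0.5 * 0.2 * 0.4 * 0.6))
print(abs(score - exact) < 1e-12)  # True
```

The final comparison checks that the running total really is the exact posterior log odds under the conditional independence assumption.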

**Naive Bayes**

A very naive approach to spam filtering is the following (generative) model. Let nature choose whether a message is spam (call this hypothesis H) with some probability, and then, for each word in the dictionary, pick whether that word is in the message (abuse notation and call this event W_i) with probability P(W_i | H) (or P(W_i | ¬H)). You then pick maximum likelihood over sample data to learn the probabilities, and on a new piece of data just read off the W_i’s and see whether H or ¬H is more likely.

So what was the naive assumption? It was that the events were conditionally independent given either hypothesis. Cutely, the final log odds is additive because of this (an exercise I leave to the reader) and extremely easy to calculate (and fast for computers, too!). The punchline is that this additivity is exactly the same additivity we had in the hypothesis testing. The expressions work out such that the inclusion/exclusion of every word adds some positive/negative number to a running total, biased in the beginning by some initial value determined by the prior alone, and we choose to classify a message as spam based on whether this running total is positive or not.
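Here is a minimal version of this spam filter (the corpus is invented, and I’ve added Laplace smoothing so unseen word counts don’t produce zero probabilities):

```python
# Minimal Naive Bayes spam filter: each word, present or absent, contributes
# additively to a running log-odds total, and the prior is the initial bias.
import math

def train(messages):
    """messages: list of (set_of_words, is_spam). Returns bias + log-weights."""
    spam = [w for w, s in messages if s]
    ham = [w for w, s in messages if not s]
    vocab = set().union(*(w for w, _ in messages))
    prior = math.log(len(spam) / len(ham))  # initial "head start" bias
    weights = {}
    for word in vocab:
        p_spam = (sum(word in m for m in spam) + 1) / (len(spam) + 2)  # smoothed
        p_ham = (sum(word in m for m in ham) + 1) / (len(ham) + 2)
        # one weight for the word being present, one for it being absent
        weights[word] = (math.log(p_spam / p_ham),
                         math.log((1 - p_spam) / (1 - p_ham)))
    return prior, weights

def is_spam(message, prior, weights):
    total = prior
    for word, (w_in, w_out) in weights.items():
        total += w_in if word in message else w_out
    return total > 0  # classify by the sign of the running total

corpus = [({"free", "winner"}, True), ({"free", "pills"}, True),
          ({"meeting", "notes"}, False), ({"lunch", "notes"}, False)]
prior, weights = train(corpus)
print(is_spam({"free", "winner"}, prior, weights))    # True
print(is_spam({"meeting", "lunch"}, prior, weights))  # False
```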

Before continuing, I want to put in e-print my long-running gripe that “Naive Bayes” is a misnomer, since the only thing naive I can possibly think about doing with Bayes’s Theorem is conditional independence, and so every such algorithm should be called “naive.” The name offers no information about the particular algorithm that is associated with spam filtering, and the graphical network it corresponds to is not even the unique simplest graph (you can reverse all the arrows, for instance, and get Noisy-Or or any other ICI, but I admit those have another layer of complexity that I am not going into here). Unsurprisingly, there are actually many flavors of Naive Bayes depending on where you want to insert your naivete — see Metsis, Androutsopoulos and Paliouras, *Spam Filtering with Naive Bayes – Which Naive Bayes?*

**The Unreasonable Effectiveness of Naive Bayes**

What’s the yoga here? Here’s my take: the conditional independence became a multiplicative condition and thus an additive condition, so the convenience of independence corresponds to the convenience of linearity. Thus, the hyperbolic punchline of this post is that “independence is linearity.”

I see a strange phenomenon (at least among pure mathematicians casually talking about applied math; I’m sure applied mathematicians have better intuition) that people are very comfortable accepting linear approximations but not as comfortable accepting independence, whereas at least in my very simple setup they are *exactly* the same. I will audaciously extend my analogy to say that this intuition is inconsistent, and I don’t know why people seem to be completely fine with logistic regression (which really just says the log-odds is additive and is thus a third story equivalent to the two stories I’ve told in this blog post!) while being careful to make disclaimers about Naive Bayes.

In fact, Naive Bayes, contrary to popular opinion, is actually also very good (and provably optimal, with certain definitions of optimal) when events are *very* dependent! It is only in the middle regions where it suffers, and it really doesn’t suffer by much. We also overestimate its problems because we like to think in terms of but errors in classifying are frequently done under zero-one loss (this is a really interesting nuance that I would love to talk about some other time, but this post has gotten long enough). For a more in-depth look, see Domingos and Pazzani, *On the Optimality of the Simple Bayesian Classifier under Zero-One Loss*.

**Appendix: Noisy-Or and Bayesian Networks**

When we talk about conditional independence, we really should take the setup of Bayesian networks, which gives a natural excuse to introduce Naive Bayes’ much less well-known sister, Noisy-Or (which often does better than Naive Bayes!). I spent some time in my talk going over the basics of d-separation, Markov blankets, etc. However, I realized that I had no real interesting observations, so I won’t talk too much about it in blog format, where the reader is very close to Google and to smarter people who know much more than I do. I did have one silly “original” contribution, however, so I share it here.

Here is an example that I thought was surprisingly clean and possibly helpful for someone interested in the basics of Bayesian networks: consider the events A (AC), B (Battery), and C (Computer), corresponding to whether the corresponding electronic gizmo is on or off (with the computer connected to both the AC and the battery). This corresponds to a Bayesian network with the two edges A → C and B → C.

It is obvious that A and B are independent until C is observed, which makes them conditionally dependent; if you know the Computer is on or off, then the other two power sources’ integrities are coupled. Otherwise, your blissful ignorance gives you no information. This is starkly different from *every other* orientation (3 possible) of the edges, where A and B are dependent but *conditionally independent* given C! This quirkiness makes the weird d-separation criterion necessary, and I thought this example very mnemonically convenient for marking the “bad” edge-orientation.
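This is easy to see in a quick simulation (my own toy numbers: fair independent power sources, and the computer is on iff at least one source works):

```python
# A and B are independent fair coins; C is on iff at least one power source
# works. Marginally A tells us nothing about B, but once C is observed they
# become coupled ("explaining away").
import random

random.seed(1)
rows = [(a, b, a or b)
        for a, b in ((random.random() < 0.5, random.random() < 0.5)
                     for _ in range(200_000))]

def prob(event, given=lambda r: True):
    kept = [r for r in rows if given(r)]
    return sum(1 for r in kept if event(r)) / len(kept)

p_a = prob(lambda r: r[0])
p_a_given_b = prob(lambda r: r[0], given=lambda r: r[1])
print(abs(p_a - p_a_given_b) < 0.01)  # True: independent before observing C

p_a_given_c = prob(lambda r: r[0], given=lambda r: r[2])
p_a_given_c_not_b = prob(lambda r: r[0], given=lambda r: r[2] and not r[1])
print(p_a_given_c_not_b > p_a_given_c)  # True: given C is on, learning the
                                        # battery is dead pins the AC as on
```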

-Yan

]]>

(Positive knots become negative knots if we switch either our nomenclature or the ambient orientation, so there are confusions lurking everywhere in this business. Also, note that although we needed an orientation on our knot to define the sign of a crossing, the sign is actually independent of this orientation, and only depends on the embedding of the knot.)

A rather amazing theorem is that every positive braid is fibered. What this means is that if we take our positive braid and close it up into a knot K in S³, then the “knot complement” S³ \ K, which we can think of as an open 3-manifold, actually fibers over the circle, with fiber a punctured surface. (Alternatively, if we remove a small tubular neighborhood of the knot, we can think of the complement as a compact 3-manifold with torus boundary). Here’s how to close the braid:

In other words, there is a punctured surface Σ and a map φ: Σ → Σ fixing the puncture, so that the knot complement is the mapping torus for φ, as pictured below. Such fibered 3-manifolds are very special.

Note that it is precisely because φ fixes the puncture that the line above closes up and becomes a knot. I should say, the proof that positive braids are fibered uses an even more amazing theorem of Stallings, which characterizes fibered knot complements in terms of a simple algebraic property of their fundamental group. This particular notion of positivity is the one that appears in Matt’s paper. I recently read some work of Étienne Ghys talking about a related notion, and I thought it was so cool that I had to post about it.

Here’s the theorem, which Ghys attributes to Fried, Schwartzman, and Sullivan. Let M be a compact manifold with a non-vanishing, smooth vector field X. First for the background: suppose there is a closed surface S ⊂ M which is transverse to the flow of X and meets the forward orbit of every point at least once (therefore, it meets every orbit infinitely many times). We get a first return map φ: S → S, simply by taking x ∈ S and flowing it forward until it hits S again, say at y, and defining φ(x) = y. Then it’s not hard to check that M must be the mapping torus for φ, as before, and that X is just the natural vector field pointing along the “time” direction of the mapping torus (up to scaling), as depicted below:

In this case, X is called the suspension of φ. The question addressed by the theorem is: given a non-vanishing vector field X on M, when is it the suspension of some map? Note that if we have such a suspension, and therefore a fibration M → S¹, we can pull back the form dθ on S¹ to get a closed, NON-VANISHING 1-form on M which is positive on X. It’s not so hard to figure out that having such a form is equivalent to being a suspension. The really cool theorem is an apparently much weaker condition which is also sufficient.

The key object is the set ℳ of probability measures on M which are invariant under the flow of X. Given any μ in ℳ and any 1-form α, we can get a number by integrating:

∫_M α(X) dμ.

This associates to any μ a 1-chain, i.e. something dual to a 1-form. If α is exact, the above integral can be shown to be 0, using the X-invariance of μ (use the invariance to rewrite the integrand, when α is an exact form, as a total derivative along the flow). Therefore, we obtain a map

ℳ → H₁(M; ℝ),

whose image is in fact compact and convex. Now, if X was a suspension, then this image actually lies entirely in some positive half-space of H₁(M; ℝ). Why? Well, remember that in this case we have a closed non-vanishing one-form which is positive on X, and by pairing with this form we get a linear map H₁(M; ℝ) → ℝ which is positive on the image of ℳ. Therefore the image lies in a positive half-space of H₁(M; ℝ). The rad theorem of FSS is that this is actually sufficient:

Theorem: X is a suspension if and only if the image of ℳ is contained in some open half-space of H₁(M; ℝ).

One remark about the measures in ℳ: note that if we have a closed periodic orbit γ for X, i.e. some closed loop which integrates the flow, then we get a natural set of measures which are concentrated near γ. In this sense, ℳ should be thought of as a set of generalized periodic orbits. The measures associated to actual periodic orbits just get sent by our map to the classes represented by these orbits in H₁(M; ℝ).

One way to think about the theorem, and this is how one of the proofs goes, is that just from this positive subset of homology, we can do some fancy functional analysis to create an actual dual form, not just a cohomology class, with the right non-vanishing and positivity properties. So somehow we’re free to work just in homology without losing information, which seems very appealing.

I’m not yet sure exactly what this theorem is useful for, but it’s so neat. In a soon-to-come follow-up, I will talk more about positivity on the knot side, and bleg about a concrete question that I’d love to get everyone’s opinion on.


One of the most common physical tricks, however, is not of this category. It is the curiously natural framework: “we have a consistent idea of units.” Here’s a perfectly sound argument to get something that is not entirely obvious:

Take the integral $\int_{-\infty}^{\infty} e^{-\lambda x^2}\, dx$. There is a way to get some information about it without doing the real integral:

Do the substitution $u = \sqrt{\lambda}\, x$. Then the integral becomes

$$\frac{1}{\sqrt{\lambda}} \int_{-\infty}^{\infty} e^{-u^2}\, du.$$

Using a slightly physical language: if we don’t care about the actual constant, just the “order” of $\lambda$ (though it is a similar concept, we’re not exactly doing the order of growth in $\lambda$), we can deduce that the answer is in the “units” of $\lambda^{-1/2}$ (the complete answer is $\sqrt{\pi/\lambda}$), by “isolating” the part of the integral with dependence on $\lambda$.
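As a quick numerical sanity check (taking the integral in question to be the Gaussian $\int_{-\infty}^{\infty} e^{-\lambda x^2}\, dx$; the function name and the trapezoid scheme below are just illustrative), the value times $\sqrt{\lambda}$ should be the same constant, namely $\sqrt{\pi}$, for every $\lambda$:

```python
import math

def gaussian_integral(lam, half_width=10.0, n=200_001):
    """Trapezoid-rule approximation of the integral of exp(-lam*x^2) over
    [-a, a], with a scaled by 1/sqrt(lam) so the neglected tails are always
    on the order of exp(-100), i.e. negligible."""
    a = half_width / math.sqrt(lam)
    h = 2 * a / (n - 1)
    total = 0.0
    for i in range(n):
        x = -a + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoid endpoint weights
        total += weight * math.exp(-lam * x * x)
    return total * h

# The "units" claim: I(lam) carries units of lam**(-1/2), so I(lam)*sqrt(lam)
# is a pure number -- the same one for every lam (namely sqrt(pi)).
for lam in (0.5, 1.0, 4.0):
    print(round(gaussian_integral(lam) * math.sqrt(lam), 6))  # 1.772454 each time
```

The dimensional argument fixes the answer up to that single dimensionless constant; the numerics only confirm the scaling.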

Even though this is already somewhat trick-sy, it does not go quite as far as what a physicist would do. They would (confirmed by experience!) look at this and say something like:

“Let $x$ have the units of length $L$. The exponent $\lambda x^2$ must be unit-less, otherwise the exponential doesn’t have a well-defined unit, which means $\lambda$ must have units of $L^{-2}$, and the integrand itself must then be unit-less. When we integrate, we then pick up a single unit of $L$ in terms of $dx$, so it must be $C \lambda^{-1/2}$ for some constant $C$.”

The problem is this makes perfect “sense” to me in a completely sound way (there is no approximation or heuristic here), yet I cannot argue it to my satisfaction in any mathematical manner. All I know is that most people with even elementary physics experience have picked up a very consistent language of “units” that we can use to make definite deductions, but I’m finding it hard to axiomatize them in a clear way. After trying for about half an hour, the only thing I’ve decided is that we really want some sort of valuation on a space of functions that is multiplicative, which I believe is enough to make the differentiation and integration instincts about units work, and that we limit all addition to be done with functions of homogeneous valuation. However, is that really it (for example, I don’t feel this is all that has gone into the logic above)? If so, what is the right way to formalize it? Also, I distinctly remember having seen usage of units to argue more sophisticated chains of logic than the example I’ve given here, though the exact examples don’t come to mind. If anyone has further insight and examples it would be really helpful.

**Update**: after an unnecessarily long discussion w/ Qiaochu (the source of the un-necessity being my muddled thinking about something irrelevant), I now agree the formalism is “easy” and can be done in several ways (though I still find the intuition to be a clearer way to think than the formalism). The method that seems most natural to me is to just think of all functions we care about as lying in a graded algebra with grades indexed by powers of units; Qiaochu prefers to think of the “physical” attributes as living in one-dimensional representations / weight spaces. Pick whatever you like. My request for more “interesting” examples of using units still holds.
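To make the graded-algebra picture concrete, here is a toy sketch (nothing standard — the class `Quantity` and its grading by powers of a single length unit $L$ are made up for illustration): multiplication adds grades, addition is only allowed between elements of homogeneous grade, and the bookkeeping for the Gaussian example falls out.

```python
from fractions import Fraction

class Quantity:
    """An element of a toy graded algebra: a number tagged with its grade,
    the exponent of a single base unit (think: length L)."""
    def __init__(self, value, grade=0):
        self.value = value
        self.grade = Fraction(grade)

    def __mul__(self, other):
        # Multiplicative valuation: grades add under multiplication.
        return Quantity(self.value * other.value, self.grade + other.grade)

    def __add__(self, other):
        # Addition only between homogeneous (same-grade) elements.
        if self.grade != other.grade:
            raise ValueError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.grade)

    def __pow__(self, k):
        return Quantity(self.value ** float(k), self.grade * Fraction(k))

# x has grade 1 (a length); for exp(-lam * x**2) to make sense, its argument
# must have grade 0, which forces lam to have grade -2.
x = Quantity(3.0, 1)
lam = Quantity(0.25, -2)
print((lam * x ** 2).grade)            # 0 -> safe to exponentiate

# Integrating against dx adds one grade of L, so the answer must have the
# grade of lam ** (-1/2):
print((lam ** Fraction(-1, 2)).grade)  # 1
```

This is exactly the two rules above — a multiplicative valuation plus homogeneous addition — and the deduction in the quoted physicist’s argument is just grade arithmetic.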

-Yan

(thanks to Allan, Yoni, and Josh for teaching me physics)
