Pivot Table

Keith A. Lewis

April 25, 2024

Abstract
The mathematical foundations of pivot tables.

Pivot tables summarize multidimensional data values using measures. A pivot table is a function t\colon D\to V where D are the dimensions and V are the values. If \Delta is a collection of subsets of D and \mu is a function from subsets of V to V we can define the pivot table t_{\Delta,\mu}\colon\Delta\to V by t_{\Delta,\mu}(\delta) = \mu(t(\delta)) for \delta\in\Delta where t(\delta) = \{t(d):d\in\delta\}\subseteq V. Any function u\colon D'\to D can be be composed with t to get pivot table data t\circ u\colon D'\to V.

For example, if D is the set of all datetimes we can partition D into days using the equivalence relation t_1\sim t_2 if and only if t_1 and t_2 belong to the same day. If V are prices then we can aggregate maximum or minimum prices over days to get daily high and low prices.

If D'\subseteq D are the market open times then the composition gives the open prices.

Pivot tables become more useful when dimensions and values have more structure.

Partition

A partition \Delta of a dimension D is a collection of subsets of D satisfying D = \cup \Delta (for every d\in D there exists \delta\in\Delta with d\in \delta) and for \delta,\epsilon\in\Delta either \delta\cap \epsilon = \emptyset or \delta = \epsilon. The elements of a partition are called atoms. If \Delta = \{\delta_1,\ldots,\delta_n\} is finite then it is a partition of D if and only if \cup_j \delta_j = D and \delta_j\cap\delta_k = \emptyset if j\not= k.

A function t\colon D\to V is measurable with respect to the partition \Delta if it is constant on atoms of \Delta. In this case t|_\Delta\colon\Delta\to V is a function on the atoms since t|_\Delta(\delta) = t(d) for any d\in\delta\in\Delta.

A partition \Delta' is a refinement of the partition \Delta if every atom of \Delta is a union of atoms of \Delta'. We say \Delta' is finer than \Delta and \Delta is coarser than \Delta'. The set of atoms of \Delta' contained in D\in\Delta is denoted \Delta'|D.

Exercise. Show the the coarsest partition of a set D is the single set \{D\} and finest partion of D is the collection of singletons \{\{D\}\} = \{\{d\}:d\in D\}.

Note D is in one-to-one correspondence with \{\{D\}\} by d\leftrightarrow\{d\} for d\in D. Any function t\colon D\to V corresponds to a function t\colon\{\{D\}\}\to V where \{\{d\}\}\mapsto t(d), d\in D. Above we called this function t|_{\{\{D\}\}}. Every function from t\colon D\to V is \{\{D\}\} measurable.

Measure

A measure on a set S is a set function \mu\colon\mathcal{P}(S)\to\mathbf{R} with \mu(\emptyset) = 0 and \mu(E\cup F) = \mu(E) + \mu(F) whenever E\cap F=\emptyset. We use the notation \mathcal{P}(V) = \{E\subseteq S\} for the collection of all subsets and call it the power set of S. Recall A^B = \{f\colon B\to A\} is the set of all functions from B to A. If 2 = \{0,1\} then the power set of S can be identified with 2^S where E\subseteq S correponds to the characteristic function 1_S.

Exercise. Show if \mu is a measure on S then \mu(E\cup F) = \mu(E) + \mu(F) - \mu(E\cap F) for any subsets E,F\subseteq S.

Hint: E is the disjoint union of E\setminus F and E\cap F. E\cup F is the disjoint union of E\setminus F, E\cap F, and F\setminus E.

Measures don’t count things twice. The canonical example of a measure is counting the number of elements in a set.

If \Delta' is a refinement of \Delta and \mu' is a measure on \Delta' we can define a measure \mu'|_\Delta on \Delta by (\mu'|_\Delta)(\delta) = \mu'(\Delta'|\delta) for \delta\in\Delta.

Exercise. If \mu' is a measure on a partion \Delta' show \mu'|_\Delta is a measure on any coarser partition \Delta.

Solution

With \mu = \mu'|_\Delta, we must show \mu(\emptyset) = 0 and \mu(E\cup F) = \mu(E) + \mu(F) when E\cap F = \emptyset for E,F\in\Delta. Since \Delta'|\emptyset = \emptyset we have \mu(\emptyset) = \mu'(\emptyset) = 0. If E,F\in\Delta are disjoint then so are \Delta'|E and \Delta'|F hence \mu(E\cup F) = \mu'(\Delta'|(E\cup F)) = \mu'((\Delta'|E)\cup(\Delta'|F)) = \mu'(\Delta'|E) + \mu'(\Delta'|F)) = \mu(E) + \mu(F).

If t''\colon\Delta''\to V, \Delta'' is finer than \Delta', and \Delta' is finer then \Delta then t''|_\Delta = (t''|\Delta')|_\Delta.

We can generalize measures by replacing \mathbf{R} with any abelian monoid V. Since \mathbf{R} is an abelian group under addition with identity 0 it is an abelian monoid. A monoid is a group that is not required to have inverses. We require \mu(\emptyset) to be the identity of the monoid and \mu(E\cup F) = \mu(E) \oplus \mu(F) whenever E\cap F = \emptyset where \oplus is the binary operation of the monoid.

The set [-\infty,\infty) with binary operation x\vee y = \max\{x,y\} is an abelian monoid with identity -\infty.

Exercise. Show x\vee(y\vee z) = (x\vee y)\vee z (associative), x\vee y = y\vee x (commutative), and x\vee -\infty = x (identity) for x,y,z\in [-\infty,\infty).

Similarly, (-\infty,\infty] with x\wedge y = \min\{x,y\} and identity \infty is an abelian monoid.

If S is finite every measure is determined by \{\mu(\{s\}): s\in S\}.

Exercise. If S is a monoid show \mu(\{s_1,\ldots,s_n\}) = \mu(\{s_1\})\oplus\cdots\oplus\mu(\{s_n\}) where s_j\in S, 1\le j\le n.

Solution

This follows from V being an abelian monoid. By associativity \mu(\{s_1\})\oplus\cdots\oplus\mu(\{s_n\}) is well-defined. Any permutation of (s_j) in the left-hand side set results in the same set. Commutativity shows the right hand side is unchanged by any permutation.

Cartesian Product

Pivot tables become more interesting when D and V are cartesian products.

DAX

Microsoft created the language Data Analysis eXpressions for manipulating pivot tables. A pivot table is just a database table where some columns are measures.

Notes

Fix D, V, \mu, and t\colon D\to V. Let \{\tau\colon Delta\to V: \Delta/D\} be the objects of a category. There is an arrow \tau'\to\tau if \Delta' is a refinement of \Delta and \tau(\delta) = \mu(t(\Delta'|\delta)) for \delta\in\Delta. We say \tau = \tau'|_\Delta if this holds.

The object \{\{t\}\}\colon\{\{D\}\}\to V is \{\{t\}\}(\{d\}) = t(d) for d\in D. It is the initial object of the category.

Exercise. Show \tau(\delta) = \mu(t(\delta)).

Exercise. Arrows are unique.

Since \mu(t(\Delta|\delta)) = \mu(t(\delta)) = \tau(\delta) the identity function \tau\to\tau is an arrow.

The composition of \tau' and \tau, \tau\tau'\colon\Delta\to V, is defined by $’() =

Exercise. Show the identity function is an identity of the category.