Monoid

If a, b, and c are elements of a monoid then (ab)c = a(bc) so we can unambiguously write abc. This is the basis of the map-reduce algorithm. To compute abcd we can compute e = (ab) and f = (cd) independently then ef to get the final result. Monoids are also used for pivot tables. The partial calculations e and f are aggregations of the of the entire data set a, b, c, and d. This becomes more interesting when many data values are involved.

A moniod is a set M with a binary operation m\colon M\times M\to M that is associative and has an identity element e. A binary operation m is associative if m(m(a,b),c) = m(a,m(b,c)), a,b,c\in M. Writing ab for m(a,b) this means (ab)c = a(bc), so writing abc is unambiguous. An identity element e satisfies ea = a = ae for a\in M.

A semigroup is a set with a binary associative operation. A semigroup can be turned into a monoid by adding an identity element. If S is a semigroup let M = S\cup\{e\} (where e\notin S) and define se = s = es for s\in S.

Examples

Let M be the set of real numbers \boldsymbol{R}. Addition and multiplication are commutative with identity 0 and 1 respectivly. Minimum and maximum are commutative with identity \infty and -\infty respectively.

Subsets of a set M are a commutative monoid under union and intersection with identity \emptyset and M respectively.

String concatenation forms a non-commutative monoid with identity element the empty string.

Categories with a single object form a monoid. A (small) category is a partial monoid with left and right identities. A category \mathbf{C} has a partial binary operation \circ\colon\mathbf{C}\times\mathbf{C}\rightharpoonup\mathbf{C} and for every f\in\mathbf{C} there exist right and left identities {}_f e\in\mathbf{C} with f\circ {}_f e = f and e_f\in\mathbf{C} with e_f\circ f = f. Using the usual category theory notation f\colon A\to B where A and B are objects with arrow f we can identify {_f}e with A = \operatorname{dom}f and e_f with B = \operatorname{cod}f. Objects are determined by arrows in category theory.

Aggregation

If M is a commutative monoid let M^+ be the collection of all finite subsets of M. Define a function m^+\colon M^+\to M by m^+(\{m_1,\ldots,m_n\}) = m_1 \cdots m_n when m_j are distinct and m^+(\emptyset) = e, the monoid identity.

Exercise. Show this is well-defined.

Hint: This requires M to be commutative.

We can define a binary operation on disjoint finite subsets of M to M by {m^+(A,B) = m^+(A\cup B)}.

Exercise. If A,B\subseteq M are finite disjoint subsets show {m^+(A,B) = m^+(A)m^+(B)}.

Exercise. If A,B,C\subseteq M are finite pairwise-disjoint subsets show m^+(m^+(A, B), C) = m^+(A, m^+(B,C)).

Since m^+(\emptyset, A) = A = m^+(A, \emptyset) this shows finite disjoint subsets of a monoid form a monoid with identity \emptyset.

The Kleene star of M is the union of all finite sequences of elements of M, M^* = \cup_{n\ge0} M^n where M^n = \{(m_1,\dots,m_n)\mid m_j\in M\}. Define m^*\colon M^*\to M by m^*((m_1,\ldots,m_n)) = m_1 \cdots m_n and m^*(()) = e, the monoid identity.

Exercise. Show M^* is a monoid under m^* with identity (()).

Equivalence Relation

Equivalence relations are a generalization of equality. Equality satisfied a = a, if a = b then b = a and if a = b and b = c then a = c.

An equivalence relation on a set S is a subset R\subseteq S\times S with (s,s)\in R, (s,t)\in R implies (t,s)\in R, and if (s,t)\in R and (t,u)\in R then (s,u)\in R. We write sRt for (s,t)\in R.

Exercise. Show aRa, aRb implies bRa, and if aRb and bRc then aRc if R is an equivalence relation.

Exercise. Show I = \{(s,s)\mid s\in S\} is an equivalence relation.

The equivalence class of s\in S is \overline{s} = \{s'\in S\mid sRs'\}.

Exercise. Show \overline{S} = \{\overline{s}\mid s\in S\} is a a partition of S.

Hint: A partition of S is a collection of pairwise disjoint sets whose union is S.

Exercise. If \overline{S} = \{S_i\}_{i\in I} is a partition of S show sRs' if and only if s,s'\in S_i for some i\in I is an equivalence relation.

This shows equivalence relations on S are in 1-1 correspondence with partitions of S.

Pivot Tables

Pivot tables use monoids to aggregate data. Suppose we have tick data for stock prices (t_j, s_j). We can partition the trading times \{t_j\} by years, months, days, etc. The high and low price apply max and min to all stock prices falling in a given partition. The open and close price apply min and max to the times in a given partition and return the corresponding stock price.

The first step in creating a pivot table is to specify a function. For the example above define S\colon T\to \boldsymbol{R} by S(t_j) = s_j. If \overline{T} is a partition of T and \mu is a commutative binary opreration on \boldsymbol{R} define \overline{S}\colon \overline{T}\to\boldsymbol{R} by \overline{S}(\overline{t}) = \mu^+(S(\overline{t})). This is how high and low are defined using max and min on \boldsymbol{R} respectively. We can also define \underline{S}\colon \overline{T}\to\boldsymbol{R} by \underline{S}(\overline{t}) = S(\mu^+(\overline{t})). This is how open and close are defined using min and max on T respectively.