Logical Entropy

Shannon entropy is defined for measures, not random variables. It is a common mistake to define the entropy of a discrete random variable X, where the probability X takes on the value x_j with probability p_j, as -\sum_j p_j \log p_j. Note this definition of entropy does not involve any x_j.

It is an even worse mistake to define the entropy of a random variable with density f as -\int_\mathbf{R}f(x)\log f(x)\,dx. It is true that every random variable X with cumulative distribution F(x) = P(X\le x) can be modelled as the identity function on the sample space of real numbers with probablility measure P((-\infty, x]) = \int_{-\infty}^x dF(x).

Logic

Boole and De Morgan layed down the foundation for computer science by showing predicate calculus can be reduced to arithmetic.

Frege introduced the notion of parameterizing a predicate with a variable.

Set

Frege advanced the predicate calculus of logic developed by Boole and De Morgan by

A set is a collection of elements that are members of the set.

S = \{a,...\}.

Relation

A relation R on sets A and B is a subset of their cartesian product R\subseteq A\times B. The domain of R is \operatorname{dom}R = A and the codomain of R is \operatorname{cod}R = B. We write {R\colon A\leftrightarrow B} and aRb when {(a,b)\in R}. The codset {aR = \{b\in B\mid aRb\}} and the domset {Rb = \{a\in A\mid aRb\}}. If S\subseteq B\times C is a relation on sets B and C, then the composition SR\colon A\leftrightarrow C is a relation on A\times C where a(SR)c if and only if there is a b\in B with aRb and bSc. This is used to define the join of relational database tables.

Function

A relation R\colon A\leftrightarrow B is a function if the codset aR has exactly one element for every a\in A. To indicate this we write R\colon A\to B and b = R(a) where b is the unique element of aR. A function is injective, or one-to-one, if Rb has at most one element for all b\in B.

Exercise. If f\colon A\to B is injective and g,h\colon C\to A are functions where fg = fh then g = h.

A function is surjective, or onto, if |Rb| > 0 for all b\in B. There is some a\in A with aRb.

Exercise. If f\colon A\to B is surjective and g,h\colon B\to C are functions where gf = hf then g = h.

A part of a set A is an injective function P\colon I\to A. It corresponds to the subset P(I) = \{P(i)\mid i\in I\}\subseteq A.

A partition of a set A is a surjective function \Pi\colon A\to I. Let {A_i = \{a\in A\mid \Pi(a) = i\}}.

Exercise. Show either A_i = A_j or A_i\cap A_j = \emptyset for i,j\in I.

Exercise. Show \cup_{i\in I}A_i = A.

Partitions are a mathematically rigorous way of representing partial information. Given a set A, full information is the partition of singletons {\{\{a\}\mid a\in A\}}, no information is \{A\}, partial information is knowing only which A_i in the partition that a\in A belongs to.