Logical Entropy

Keith A. Lewis

January 26, 2025

Abstract

An improvement over Shannon entropy

Shannon entropy is defined for measures, not random variables. It is a common mistake to define the entropy of a discrete random variable X, where the probability X takes on the value x_j with probability p_j, as -\sum_j p_j \log p_j. Note this definition of entropy does not involve any x_j.

It is an even worse mistake to define the entropy of a random variable with density f as -\int_\mathbf{R}f(x)\log f(x)\,dx. It is true that every random variable X with cumulative distribution F(x) = P(X\le x) can be modelled as the identity function on the sample space of real numbers with probablility measure P((-\infty, x]) = \int_{-\infty}^x dF(x).

Joint distributions. Negative entropy.

Logical entropy can incorporate values or random variables…

Logic

Boole and De Morgan layed down the foundation for computer science by showing predicate calculus can be reduced to arithmetic.

Frege introduced the notion of parameterizing a predicate with a variable.

Set

Frege advanced the predicate calculus of logic developed by Boole and De Morgan by

A set is a collection of elements that are members of the set.

S = \{a,...\}.

Relation

A relation R on sets A and B is a subset of their cartesian product R\subseteq A\times B. The domain of R is \operatorname{dom}R = A and the codomain of R is \operatorname{cod}R = B. We write {R\colon A\leftrightarrow B} and aRb when {(a,b)\in R}. The codset {aR = \{b\in B\mid aRb\}} and the domset {Rb = \{a\in A\mid aRb\}}. If S\subseteq B\times C is a relation on sets B and C, then the composition SR\colon A\leftrightarrow C is a relation on A\times C where a(SR)c if and only if there is a b\in B with aRb and bSc. This is used to define the join of relational database tables.

Function

A relation R\colon A\leftrightarrow B is a function if the codset aR has exactly one element for every a\in A. To indicate this we write R\colon A\to B and b = R(a) where b is the unique element of aR. A function is injective, or one-to-one, if Rb has at most one element for all b\in B.

Exercise. If f\colon A\to B is injective and g,h\colon C\to A are functions where fg = fh then g = h.

A function is surjective, or onto, if |Rb| > 0 for all b\in B. There is some a\in A with aRb.

Exercise. If f\colon A\to B is surjective and g,h\colon B\to C are functions where gf = hf then g = h.

A part of a set A is an injective function P\colon I\to A. It corresponds to the subset P(I) = \{P(i)\mid i\in I\}\subseteq A.

A partition of a set A is a surjective function \Pi\colon A\to I. Let {A_i = \{a\in A\mid \Pi(a) = i\}}.

Exercise. Show either A_i = A_j or A_i\cap A_j = \emptyset for i,j\in I.

Exercise. Show \cup_{i\in I}A_i = A.

Partitions are a mathematically rigorous way of representing partial information. Given a set A, full information is the partition of singletons {\{\{a\}\mid a\in A\}}, no information is \{A\}, partial information is knowing only which A_i in the partition that a\in A belongs to.