### Simple classical definition

Let $U$ be the space of all the possible states of a *classical* system and let $N_U$ be the number of elements of $U$. Here $U$ is just a set, rather than a vector space.

We can represent the elements of $U$ simply with the labels $$U = \left\{1,2,...,N_U\right\}$$ and we define a probability distribution, $P$, on $U$ to be a function of the form $$\begin{align*}P : & U \rightarrow [0, 1]\\& \sum_{i=1}^{N_U} P_i = 1\end{align*}$$

Suppose all we know about a system is that it is in the subset $A \subset U$, consisting of $N_A$ *equally likely* states. Then, the probability distribution must be $$P_i = \left\{\begin{align*}\tfrac{1}{N_A} &\text{ if } i \in A\\0 &\text{ otherwise }\end{align*}\right.$$ for all $i \in U$.

In this case, we are somewhat ignorant - since the system could be in any one of $N_A$ states - and we can say that the *degree of ignorance* is $N_A$.

- If $N_A = 1$ then we know everything - the system is in a definite state - and we say there is *no entropy*.
- If $N_A = N_U$ then we know nothing - the system could be in any state - and we say there is *maximum entropy*.

Under these particularly simple conditions, we define the **entropy** of the system to be $$S = \log N_A$$

Nb. This definition is related to the formula comparing the number, $N$, of states that a system can be in with the number, $n$, of bits needed to describe it, where $N = 2^n \Rightarrow n = \log_2 N$.
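The simple definition can be checked numerically. The following is a minimal sketch, assuming the natural logarithm for $S$ (the text does not fix the base); the names `uniform_entropy`, `bits_needed` and the choice $N_U = 8$ are illustrative only.

```python
import math

def uniform_entropy(n_a: int) -> float:
    """Entropy S = log N_A for N_A equally likely states (natural log assumed)."""
    if n_a < 1:
        raise ValueError("a system must have at least one possible state")
    return math.log(n_a)

def bits_needed(n_states: int) -> float:
    """Number of bits n such that N = 2**n, i.e. n = log2 N."""
    return math.log2(n_states)

N_U = 8  # illustrative size of the state space

s_none = uniform_entropy(1)    # definite state: no entropy
s_max = uniform_entropy(N_U)   # any state possible: maximum entropy
n_bits = bits_needed(N_U)      # bits needed to label one of N_U states

print(s_none, s_max, n_bits)
```

With a base-2 logarithm instead, the entropy of the fully ignorant case would equal the bit count exactly.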

### General classical definition

We would like a more general definition that allows an arbitrary probability distribution, in which some states may be more likely than others. The measure should not just count the states available to the system; it should also weight more probable states more heavily than less probable ones.

Let $U$ again be the space of all possible states of a system and for all $i \in U$, let $P_i$ be the probability that the system will be in the $i^\text{th}$ state. Then, a more general definition of the entropy of the system is given by $$S = -\sum_{i=1}^{N_U}P_i \log P_i$$

The minus-sign comes in due to the fact that $$\begin{align*}0 \leq P_i \leq 1 &\Rightarrow \log P_i \leq 0\\&\Rightarrow S \geq 0\end{align*}$$

Now, suppose that the system is in the subset $A$, where it is no longer necessarily true that each state is equally likely. We can still say that if the number of states in $A$ is $N_A = 1$ then the entropy is *zero*, since we would have, for some $j \in U$, $$P_i = \left\{\begin{align*}1 &\text{ if } i = j\\0&\text{ otherwise }\end{align*}\right.$$ and so, using the convention $0 \log 0 = 0$ (the limit of $p \log p$ as $p \rightarrow 0$), $$\begin{align*}S &= -\sum_{i=1}^{N_U}P_i \log P_i\\&= -P_j \log P_j - \sum_{i \neq j}P_i \log P_i\\&= -1 \log 1 - \sum_{i \neq j}0 \log 0\\&= 0\end{align*}$$

On the other hand, if $N_A = N_U$ and every state is equally likely, the entropy is at its maximum for the system, $$S = S_{\mathrm{max}} = \log N_U$$

We can recover the simpler definition by re-considering the case where $A$ is made up of $N_A$ *equally likely* states. Then, $$\begin{align*}S &= -\sum_{i=1}^{N_U}P_i \log P_i\\&= -\sum_{i \in A} \frac{1}{N_A} \log \frac{1}{N_A}\\&= \left(\sum_{i \in A} 1\right) \frac{1}{N_A} \left(-\log \frac{1}{N_A}\right)\\&= N_A \frac{1}{N_A} \log N_A\\&= \log N_A\end{align*}$$
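The general formula and its special cases can be sketched in a few lines. This is an illustration only, assuming the natural logarithm and the convention $0 \log 0 = 0$; the function name `entropy` and the three example distributions are hypothetical choices, not taken from the text.

```python
import math

def entropy(probs):
    """S = -sum_i P_i log P_i, treating 0 log 0 as 0 (natural log assumed)."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Terms with P_i = 0 are skipped, which implements 0 log 0 = 0.
    return sum(-p * math.log(p) for p in probs if p > 0)

definite = [0, 1, 0, 0]            # N_A = 1: a single certain state
uniform_on_A = [0.5, 0.5, 0, 0]    # N_A = 2 equally likely states
skewed = [0.7, 0.1, 0.1, 0.1]      # all four states possible, unequal weights

s_definite = entropy(definite)
s_uniform = entropy(uniform_on_A)
s_skewed = entropy(skewed)

print(s_definite, s_uniform, s_skewed)
```

The definite state gives zero, the uniform subset reproduces $\log N_A$, and the skewed distribution lies strictly between zero and $\log N_U$.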

### Trace of linear operators

The role of trace in quantum mechanics is to replace the summation in the classical formulae for the average and entropy. We need some general results about the trace of a linear operator to do this.

The **trace of a linear operator**, $M$, on some abstract vector space $\mathbb{V}$ is the sum of its diagonal elements. That is, given that $$\left\{\ket{i}: i=1,\cdots,N\right\}$$ is a basis for $\mathbb{V}$, the trace of $M$ is $$\mathrm{Tr}(M) = \sum_{i=1}^{N} \bra{i}M\ket{i}$$

#### The trace of a linear operator is invariant

That is, the trace does not depend upon the basis in which you choose to calculate it. To see why this is true, consider two different bases for $\mathbb{V}$, say $$\begin{matrix}\left\{\ket{e_i}: i=1,\cdots,N\right\}\\\left\{\ket{f_i}: i=1,\cdots,N\right\}\end{matrix}$$

We can find numbers $\alpha_{ij}, \beta_{ij}$ such that $$\begin{matrix}\bra{f_i} = \sum_{k=1}^{N} \bra{e_k}\alpha_{ki}\\\ket{f_j} = \sum_{l=1}^{N} \beta_{jl} \ket{e_l}\end{matrix}$$

Nb. We haven't used complex conjugates here since these results apply to general vector spaces and their dual spaces.

Then, using the definition of a basis, $$\begin{align*}\delta_{ij} &= \braket{f_i}{f_j}\\&= \left(\sum_{k=1}^{N} \bra{e_k}\alpha_{ki}\right)\left(\sum_{l=1}^{N} \beta_{jl} \ket{e_l} \right)\\&= \sum_{k,l=1}^{N} \alpha_{ki} \beta_{jl} \braket{e_k}{e_l}\\&= \sum_{k,l=1}^{N} \alpha_{ki} \beta_{jl} \delta_{kl}\\&= \sum_{k=1}^{N} \alpha_{ki} \beta_{jk}\end{align*}$$

This means that, if we write $A = [\alpha_{ij}], B = [\beta_{ij}]$, then $BA = \mathbf{I}$ where $\mathbf{I}$ is the identity matrix. Since $A$ and $B$ are square matrices, it's also true that $AB = \mathbf{I}$ or $$\delta_{ij} = \sum_{k=1}^{N} \alpha_{ik} \beta_{kj}$$

Now, let $\mathrm{Tr}_e(M)$ be the trace of a linear operator $M$ with respect to the basis $\left\{\ket{e_i}\right\}$ and $\mathrm{Tr}_f(M)$ be the trace of $M$ with respect to $\left\{\ket{f_i}\right\}$.

Then, $$\begin{align*}\mathrm{Tr}_f(M) &= \sum_{i=1}^{N} \bra{f_i}M\ket{f_i}\\&= \sum_{i=1}^{N} \left(\sum_{k=1}^{N} \bra{e_k}\alpha_{ki}\right) M \left(\sum_{l=1}^{N} \beta_{il} \ket{e_l} \right)\\&= \sum_{k,l=1}^{N} \bra{e_k}M\ket{e_l} \left(\sum_{i=1}^{N} \beta_{il} \alpha_{ki}\right)\\&= \sum_{k,l=1}^{N} \bra{e_k}M\ket{e_l} \delta_{kl} & \text{ using the above result}\\&= \sum_{k=1}^{N} \bra{e_k}M\ket{e_k}\\&= \mathrm{Tr}_e(M)\end{align*}$$

So, the trace is the same with respect to any basis and we can drop the subscripts $$\mathrm{Tr}_e(M) = \mathrm{Tr}_f(M) = \mathrm{Tr}(M)$$
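Basis invariance can be checked on a small example. The sketch below assumes that a change of basis acts on the matrix of $M$ as a similarity transformation $P^{-1} M P$; the matrices `M`, `P`, `P_inv` and the helpers `matmul` and `trace` are illustrative choices. The relation `P_inv P = I` plays the role of the inverse relation between $[\alpha_{ij}]$ and $[\beta_{ij}]$.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

M = [[2, 3],
     [5, 7]]          # matrix of the operator in the e-basis (illustrative)
P = [[1, 1],
     [0, 1]]          # invertible change-of-basis matrix (illustrative)
P_inv = [[1, -1],
         [0, 1]]      # its inverse, written out by hand

identity = matmul(P_inv, P)            # should be the identity matrix
M_f = matmul(P_inv, matmul(M, P))      # matrix of the operator in the f-basis

trace_e = trace(M)
trace_f = trace(M_f)
print(M_f, trace_e, trace_f)
```

The individual matrix entries change under the transformation, but the two traces agree.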

#### The trace of a matrix is the sum of its eigenvalues

We won't prove this in the general case, but we have already seen that a Hermitian operator can be represented with respect to a basis of its eigenvectors. Then, the operator becomes a diagonal matrix, with eigenvalues for entries, $$M \rightarrow \begin{bmatrix}\lambda_1 & 0 & \cdots & 0 \\0 & \lambda_2 & \cdots & 0 \\\vdots & \vdots & \ddots & \vdots \\0 & 0 & \cdots & \lambda_N\end{bmatrix}$$

Then, simply, $$\mathrm{Tr}(M) = \sum_{i=1}^{N} \lambda_i$$

The determinant of a matrix is also most easily expressed in terms of its eigenvalues, $$\mathrm{Det}(M) = \prod_{i=1}^{N} \lambda_i$$
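A small worked check of both formulas follows. It is a sketch using a hypothetical real symmetric (hence Hermitian) $2 \times 2$ matrix whose eigenpairs are written down by hand and verified directly; the names `matvec`, `M` and `eigenpairs` are illustrative.

```python
import math

def matvec(m, v):
    """Apply a square matrix (list of rows) to a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

M = [[2, 1],
     [1, 2]]  # real symmetric, so Hermitian (illustrative choice)

# Hand-written eigenpairs (eigenvalue, eigenvector), verified below.
eigenpairs = [(3, [1, 1]), (1, [1, -1])]
for lam, v in eigenpairs:
    # Check the defining relation M v = lambda v.
    assert matvec(M, v) == [lam * x for x in v]

trace_M = M[0][0] + M[1][1]
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]

eig_sum = sum(lam for lam, _ in eigenpairs)
eig_prod = math.prod(lam for lam, _ in eigenpairs)

print(trace_M, eig_sum, det_M, eig_prod)
```

The trace matches the sum of the eigenvalues and the determinant matches their product.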

Finally, it is straightforward to show that if $A,B$ are linear operators, then

$$\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$$ since if we write $$\begin{align*}A &\rightarrow \left[\alpha_{ij}\right]_{i,j=1}^{N}\\B &\rightarrow \left[\beta_{ij}\right]_{i,j=1}^{N}\end{align*}$$ then the product is $$AB \rightarrow \left[\sum_{j=1}^{N}\alpha_{ij}\beta_{jk}\right]_{i,k=1}^{N}$$

Thus the trace of the product is $$\begin{align*}\mathrm{Tr}(AB) &= \mathrm{Tr}\left[\sum_{j=1}^{N}\alpha_{ij}\beta_{jk}\right]_{i,k=1}^{N}\\&= \sum_{i,j=1}^{N}\alpha_{ij}\beta_{ji}\\&= \sum_{i,j=1}^{N}\beta_{ij}\alpha_{ji} & \text{ swapping the order and the dummy variables}\\&= \mathrm{Tr}\left[\sum_{j=1}^{N}\beta_{ij}\alpha_{jk}\right]_{i,k=1}^{N}\\&= \mathrm{Tr}(BA)\end{align*}$$
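The cyclic property can be illustrated with two matrices that do not commute. This is a minimal sketch; the matrices `A`, `B` and the helpers `matmul` and `trace` are arbitrary illustrative choices.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, 6]]

AB = matmul(A, B)
BA = matmul(B, A)

trace_AB = trace(AB)
trace_BA = trace(BA)
print(AB, BA, trace_AB, trace_BA)
```

Here `AB` and `BA` are different matrices, yet their traces coincide, as the index-swapping argument predicts.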