---
title: "swNoData"
author: "Matt Moores"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
bibliography: refs.bib
vignette: >
  %\VignetteIndexEntry{swNoData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteKeyword{Swendsen-Wang}
  %\VignetteKeyword{Gibbs sampling}
---

Gibbs sampling was originally designed by @Geman1984 for drawing updates from the Gibbs distribution, hence the name. However, single-site Gibbs sampling exhibits poor mixing due to the posterior correlation between the pixel labels. Thus it is very slow to converge when the correlation (controlled by the inverse temperature $\beta$) is high.

The algorithm of @Swendsen1987 addresses this problem by forming clusters of neighbouring pixels, then updating all of the labels within a cluster to the same value. When simulating from the prior, such as a Potts model without an external field, this algorithm is very efficient.

------
The `SW` function in the **PottsUtils** package is implemented in a combination of `R` and `C`. The `swNoData` function in **bayesImageS** is implemented using **RcppArmadillo**, which gives it a speed advantage. It is worth noting that the intention of **bayesImageS** is not to replace **PottsUtils**. Rather, an efficient Swendsen-Wang algorithm is used as a building block for implementations of ABC [@Grelaud2009], path sampling [@Gelman1998], and the exchange algorithm [@Murray2006]. These other algorithms will be covered in future posts.

There are two things that we want to keep track of in this simulation study: the speed of the algorithm and the distribution of the summary statistic. We will be using `system.time(..)` to measure both CPU and elapsed (wall clock) time taken for the same number of iterations, for a range of inverse temperatures:
```{r}
beta <- seq(0,2,by=0.1)
tmMx.PU <- tmMx.bIS <- matrix(nrow=length(beta),ncol=2)
rownames(tmMx.PU) <- rownames(tmMx.bIS) <- beta
colnames(tmMx.PU) <- colnames(tmMx.bIS) <- c("user","elapsed")
```

We will discard the first 100 iterations as burn-in and keep the remaining 500.
```{r}
iter <- 600
burn <- 100
samp.PU <- samp.bIS <- matrix(nrow=length(beta),ncol=iter-burn)
```

The distribution of pixel labels can be summarised by the sufficient statistic of the Potts model:

$$
S(z) = \sum_{i \sim \ell \in \mathscr{N}} \delta(z_i, z_\ell)
$$

where $i \sim \ell \in \mathscr{N}$ are all of the pairs of neighbours in the lattice (ie. the cliques) and $\delta(u,v)$ is 1 if $u = v$ and 0 otherwise (the Kronecker delta function). `swNoData` returns this automatically, but with `SW` we will need to use the function `sufficientStat` to calculate the sufficient statistic for the labels.

```{r}
library(bayesImageS)

mask <- matrix(1,50,50)
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))

n <- sum(mask)
k <- 2
bcrit <- log(1 + sqrt(k))
maxSS <- nrow(edges)

for (i in 1:length(beta)) {
  if (requireNamespace("PottsUtils", quietly = TRUE)) {
    tm <- system.time(result <- PottsUtils::SW(iter,n,k,edges,beta=beta[i]))
    tmMx.PU[i,"user"] <- tm["user.self"]
    tmMx.PU[i,"elapsed"] <- tm["elapsed"]
    res <- sufficientStat(result, neigh, block, k)
    samp.PU[i,] <- res$sum[(burn+1):iter]
    print(paste("PottsUtils::SW",beta[i],tm["elapsed"],median(samp.PU[i,])))
   } else {
      print("PottsUtils::SW unavailable on this platform.")
   }

  
  # bayesImageS
  tm <- system.time(result <- swNoData(beta[i],k,neigh,block,iter))
  tmMx.bIS[i,"user"] <- tm["user.self"]
  tmMx.bIS[i,"elapsed"] <- tm["elapsed"]
  samp.bIS[i,] <- result$sum[(burn+1):iter]
  print(paste("bayesImageS::swNoData",beta[i],tm["elapsed"],median(samp.bIS[i,])))
}
```

Here is the comparison of elapsed times between the two algorithms (in seconds):
```{r}
summary(tmMx.PU)
summary(tmMx.bIS)
boxplot(tmMx.PU[,"elapsed"],tmMx.bIS[,"elapsed"],ylab="seconds elapsed",names=c("SW","swNoData"))
```

On average, `swNoData` using **RcppArmadillo** [@Eddelbuettel2013] is seven times faster than `SW`.

```{r}
s_z <- c(samp.PU,samp.bIS)
s_x <- rep(beta,times=iter-burn)
s_a <- rep(1:2,each=length(beta)*(iter-burn))
s.frame <- data.frame(s_z,c(s_x,s_x),s_a)
names(s.frame) <- c("stat","beta","alg")
s.frame$alg <- factor(s_a,labels=c("SW","swNoData"))
  if (requireNamespace("lattice", quietly = TRUE)) {
    lattice::xyplot(stat ~ beta | alg, data=s.frame)
  }
plot(c(s_x,s_x),s_z,pch=s_a,xlab=expression(beta),ylab=expression(S(z)))
abline(v=bcrit,col="red")
```

The overlap between the two distributions is almost complete, although it is a bit tricky to verify that statistically. The relationship between $beta$ and $S(z)$ is nonlinear and heteroskedastic.

```{r}
rowMeans(samp.bIS) - rowMeans(samp.PU)
apply(samp.PU, 1, sd)
apply(samp.bIS, 1, sd)
s.frame$beta <- factor(c(s_x,s_x))
  if (requireNamespace("PottsUtils", quietly = TRUE)) {
    s.fit <- aov(stat ~ alg + beta, data=s.frame)
    summary(s.fit)
    TukeyHSD(s.fit,which="alg")
  }
```

### References