| Title: | Bayesian Modelling of Raman Spectroscopy |
|---|---|
| Description: | Sequential Monte Carlo (SMC) algorithms for fitting a generalised additive mixed model (GAMM) to surface-enhanced resonance Raman spectroscopy (SERRS), using the method of Moores et al. (2016) <arXiv:1604.07299>. Multivariate observations of SERRS are highly collinear and lend themselves to a reduced-rank representation. The GAMM separates the SERRS signal into three components: a sequence of Lorentzian, Gaussian, or pseudo-Voigt peaks; a smoothly-varying baseline; and additive white noise. The parameters of each component of the model are estimated iteratively using SMC. The posterior distributions of the parameters given the observed spectra are represented as a population of weighted particles. |
| Authors: | Matt Moores [aut, cre] (ORCID: <https://orcid.org/0000-0003-4531-3572>), Jake Carson [aut] (ORCID: <https://orcid.org/0000-0002-7896-0971>), Benjamin Moskowitz [ctb], Kirsten Gracie [dtc], Karen Faulds [dtc] (ORCID: <https://orcid.org/0000-0002-5567-7399>), Mark Girolami [aut], Engineering and Physical Sciences Research Council [fnd] (EPSRC programme grant ref: EP/L014165/1), University of Warwick [cph] |
| Maintainer: | Matt Moores <[email protected]> |
| License: | GPL (>= 2) | file LICENSE |
| Version: | 0.6-0 |
| Built: | 2026-06-04 08:19:25 UTC |
| Source: | https://github.com/mooresm/serrsbayes |
This is an internal function that is only exposed on the public API for unit testing purposes. It computes the log-likelihood of the spline and the noise, once the spectral signature has been subtracted from the observed data. Thus, it can be used with either Lorentzian, Gaussian, or pseudo-Voigt broadening functions.
computeLogLikelihood( obsi, lambda, prErrNu, prErrSS, basisMx, eigVal, precMx, xTx, aMx, ruMx )computeLogLikelihood( obsi, lambda, prErrNu, prErrSS, basisMx, eigVal, precMx, xTx, aMx, ruMx )
obsi |
Vector of residuals after the spectral signature has been subtracted. |
lambda |
smoothing parameter of the penalised B-spline. |
prErrNu |
hyperparameter of the additive noise |
prErrSS |
hyperparameter of the additive noise |
basisMx |
Matrix of B-spline basis functions |
eigVal |
eigenvalues of the Demmler-Reinsch factorisation |
precMx |
precision matrix for the spline |
xTx |
sparse matrix cross-product |
aMx |
orthoganal matrix A from the Demmler-Reinsch factorisation |
ruMx |
product of Ru from the Demmler-Reinsch factorisation |
The logarithm of the likelihood.
This is an internal function that is only exposed on the public API for unit testing purposes.
copyLogProposals(nPK, T_Prop_Theta)copyLogProposals(nPK, T_Prop_Theta)
nPK |
number of Raman peaks in the spectral signature |
T_Prop_Theta |
Vector of logarithms of the MH proposals |
Vector of proposals
The ESS is a "rule of thumb" for assessing the degeneracy of the importance distribution:
effectiveSampleSize(log_weights)effectiveSampleSize(log_weights)
log_weights |
logarithms of the importance weights of each particle. |
the effective sample size, a scalar between 0 and Q
Liu, JS (2001) "Monte Carlo Strategies in Scientific Computing." Springer, NY, pp. 34–36.
x <- runif(100) effectiveSampleSize(log(x))x <- runif(100) effectiveSampleSize(log(x))
Fit the model using Markov chain Monte Carlo.
fitSpectraMCMC(wl, spc, peakWL, lPriors, sd_mh, niter = 10000, nchains = 4)fitSpectraMCMC(wl, spc, peakWL, lPriors, sd_mh, niter = 10000, nchains = 4)
wl |
Vector of |
spc |
|
peakWL |
Vector of locations for each peak (cm^-1) |
lPriors |
List of hyperparameters for the prior distributions. |
sd_mh |
Vector of |
niter |
number of MCMC iterations per chain. |
nchains |
number of concurrent MCMC chains. |
a List containing MCMC samples for the model parameters:
amplitudeniter * nchains * npeaks Array of amplitudes.
scaleniter * nchains * npeaks Array of scale parameters.
sigmaniter * nchains Matrix of standard deviations.
n_accThe number of RWMH proposals that were accepted.
wavenumbers <- seq(200,600,by=10) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(300,500) peakAmplitude <- c(10000,4000) peakScale <- c(10, 15) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=20, amp.mu=5000, amp.sd=5000, noise.sd=200, noise.nu=4) rw_bw <- c(100, 100, 2, 2) result <- fitSpectraMCMC(wavenumbers, spectra, peakLocations, lPriors, rw_bw, 500) result$n_accwavenumbers <- seq(200,600,by=10) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(300,500) peakAmplitude <- c(10000,4000) peakScale <- c(10, 15) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=20, amp.mu=5000, amp.sd=5000, noise.sd=200, noise.nu=4) rw_bw <- c(100, 100, 2, 2) result <- fitSpectraMCMC(wavenumbers, spectra, peakLocations, lPriors, rw_bw, 500) result$n_acc
Fit the model using Sequential Monte Carlo (SMC).
fitSpectraSMC( wl, spc, peakWL, lPriors, conc = rep(1, nrow(spc)), npart = 10000, rate = 0.9, minESS = npart/2, destDir = NA )fitSpectraSMC( wl, spc, peakWL, lPriors, conc = rep(1, nrow(spc)), npart = 10000, rate = 0.9, minESS = npart/2, destDir = NA )
wl |
Vector of |
spc |
|
peakWL |
Vector of locations for each peak (cm^-1) |
lPriors |
List of hyperparameters for the prior distributions. |
conc |
Vector of |
npart |
number of SMC particles to use for the importance sampling distribution. |
rate |
the target rate of reduction in the effective sample size (ESS). |
minESS |
minimum effective sample size, below which the particles are resampled. |
destDir |
destination directory to save intermediate results (for long-running computations) |
a List containing weighted parameter values, known as particles:
weightsVector of importance weights for each particle.
betanpart * npeaks Matrix of regression coefficients for the amplitudes.
scalenpart * npeaks Matrix of scale parameters.
sigmaVector of npart standard deviations.
alphabl.knots * n_y * npart Array of spline coefficients for the baseline.
basisA dense nwl * bl.knots Matrix containing the values of the basis functions.
expFnnpart * nwl Matrix containing the spectral signature.
essVector containing the effective sample size (ESS) at each SMC iteration.
logEvidenceVector containing the logarithm of the model evidence (marginal likelihood).
acceptVector containing the Metropolis-Hastings acceptance rate at each SMC iteration.
sd.mhniter * 2npeaks Matrix of random walk MH bandwidths at each SMC iteration..
Chopin (2002) "A Sequential Particle Filter Method for Static Models," Biometrika 89(3): 539–551, doi:10.1093/biomet/89.3.539
wavenumbers <- seq(200,600,by=10) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(300,500) peakAmplitude <- c(10000,4000) peakScale <- c(10, 15) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=20, beta.mu=5000, beta.sd=5000, noise.sd=200, noise.nu=4) ## Not run: result <- fitSpectraSMC(wavenumbers, spectra, peakLocations, lPriors, npart=500) ## End(Not run)wavenumbers <- seq(200,600,by=10) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(300,500) peakAmplitude <- c(10000,4000) peakScale <- c(10, 15) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=20, beta.mu=5000, beta.sd=5000, noise.sd=200, noise.nu=4) ## Not run: result <- fitSpectraSMC(wavenumbers, spectra, peakLocations, lPriors, npart=500) ## End(Not run)
Fit the model with Voigt peaks using iterated batch importance sampling (IBIS).
fitVoigtIBIS( wl, spc, n, lResult, conc = rep(1, nrow(spc)), batch = rep(1, nrow(spc)), npart = 10000, rate = 0.9, mcAR = 0.234, mcSteps = 20, minESS = npart/2, minPart = npart, destDir = NA )fitVoigtIBIS( wl, spc, n, lResult, conc = rep(1, nrow(spc)), batch = rep(1, nrow(spc)), npart = 10000, rate = 0.9, mcAR = 0.234, mcSteps = 20, minESS = npart/2, minPart = npart, destDir = NA )
wl |
Vector of |
spc |
|
n |
index of the new observation |
lResult |
List of results from the previous call to “fitVoigtPeaksSMC“ or “fitVoigtIBIS“ |
conc |
Vector of |
batch |
identifies to which batch each observation belongs |
npart |
number of SMC particles to use for the importance sampling distribution. |
rate |
the target rate of reduction in the effective sample size (ESS). |
mcAR |
target acceptance rate for the MCMC kernel |
mcSteps |
number of iterations of the MCMC kernel |
minESS |
minimum effective sample size, below which the particles are resampled. |
minPart |
target number of unique particles for the MCMC iterations |
destDir |
destination directory to save intermediate results (for long-running computations) |
Chopin (2002) "A Sequential Particle Filter Method for Static Models," Biometrika 89(3): 539–551, doi:10.1093/biomet/89.3.539
Fit the model with Voigt peaks using Sequential Monte Carlo (SMC).
fitVoigtPeaksSMC( wl, spc, lPriors, conc = rep(1, nrow(spc)), npart = 10000, rate = 0.9, mcAR = 0.234, mcSteps = 20, minESS = npart/2, destDir = NA, minPart = npart )fitVoigtPeaksSMC( wl, spc, lPriors, conc = rep(1, nrow(spc)), npart = 10000, rate = 0.9, mcAR = 0.234, mcSteps = 20, minESS = npart/2, destDir = NA, minPart = npart )
wl |
Vector of |
spc |
|
lPriors |
List of hyperparameters for the prior distributions. |
conc |
Vector of |
npart |
number of SMC particles to use for the importance sampling distribution. |
rate |
the target rate of reduction in the effective sample size (ESS). |
mcAR |
target acceptance rate for the MCMC kernel |
mcSteps |
number of iterations of the MCMC kernel |
minESS |
minimum effective sample size, below which the particles are resampled. |
destDir |
destination directory to save intermediate results (for long-running computations) |
minPart |
target number of unique particles for the MCMC iterations |
wavenumbers <- seq(200,600,by=10) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(300,500) peakAmplitude <- c(10000,4000) peakScale <- c(10, 15) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) lPriors <- list(scaG.mu=log(11.6) - (0.4^2)/2, scaG.sd=0.4, scaL.mu=log(11.6) - (0.4^2)/2, scaL.sd=0.4, bl.smooth=5, bl.knots=20, loc.mu=peakLocations, loc.sd=c(5,5), beta.mu=c(5000,5000), beta.sd=c(5000,5000), noise.sd=200, noise.nu=4) ## Not run: result <- fitVoigtPeaksSMC(wavenumbers, spectra, lPriors, npart=50, mcSteps=1) ## End(Not run)wavenumbers <- seq(200,600,by=10) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(300,500) peakAmplitude <- c(10000,4000) peakScale <- c(10, 15) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) lPriors <- list(scaG.mu=log(11.6) - (0.4^2)/2, scaG.sd=0.4, scaL.mu=log(11.6) - (0.4^2)/2, scaL.sd=0.4, bl.smooth=5, bl.knots=20, loc.mu=peakLocations, loc.sd=c(5,5), beta.mu=c(5000,5000), beta.sd=c(5000,5000), noise.sd=200, noise.nu=4) ## Not run: result <- fitVoigtPeaksSMC(wavenumbers, spectra, lPriors, npart=50, mcSteps=1) ## End(Not run)
This function computes penalised cubic B-splines using the method proposed by Eilers & Marx (1996). The spline coefficients can be computed efficiently using sparse matrix algebra, as described in Sect. 2.3.3 of Green & Silverman (1994) and Appendix B of Ruppert, Wand & Carroll (2003).
getBsplineBasis(V, n.b, pen, prec = 1e-08)getBsplineBasis(V, n.b, pen, prec = 1e-08)
V |
a |
n.b |
the number of basis functions to use. |
pen |
the smoothing penalty hyperparameter. |
prec |
a constant scale factor. |
a list containing:
basisA dense nwl by n.b matrix containing the values of the basis functions.
precisionA sparse n.b by n.b dsCMatrix, the inverse of the prior covariance.
distanceThe distance between each knot .
knotsThe knot locations.
Eilers, PHC & Marx, BD (1996) "Flexible smoothing with B-splines and penalties," Statist. Sci. 11(2): 89–121, doi:10.1214/ss/1038425655
Green, PJ & Silverman, BW (1994) "Nonparametric Regression and Generalized Linear Models: a roughness penalty approach" Chapman & Hall, Boca Raton, FL, pp. 11–21.
Ruppert, D; Wand, MP & Carroll, RJ (2003) "Semiparametric Regression" CUP, Cambridge, UK, pp. 336–340.
Calculates the mixing parameter from the scales of the Gaussian/Lorentzian
components.
getVoigtParam(scale_G, scale_L)getVoigtParam(scale_G, scale_L)
scale_G |
Vector of standard deviations |
scale_L |
Vector of scale parameters |
First, calculate a polynomial average of the scale parameters according to the approximation of Thompson et al. (1987):
Then the Voigt mixing parameter is defined as:
The Voigt mixing weights for each peak, between 0 (Gaussian) and 1 (Lorentzian).
Thompson, Cox & Hastings (1987) "Rietveld refinement of Debye–Scherrer synchrotron X-ray data from ,"
J. Appl. Crystallogr. 20(2): 79–83, doi:10.1107/S0021889887087090
Surface-enhanced Raman spectram of tetramethylrhodamine+DNA (T20)
lsTamralsTamra
A list containing 2 variables:
a numeric Vector of 2401 wavenumbers (cm^-1)
a 1 * 2401 Matrix of intensity values (a.u.)
Updates all of the parameters using a single Metropolis-Hastings step, such that the
baseline cancels out in the MH ratio, using the marginalisation identity of Chib (1995).
If npart > 1, then multiple MCMC chains will be executed independently in parallel using OpenMP.
This means that all functions used for the proposal distributions and to evaluate the MH ratio
need to be thread-safe. Specifically, no calls to R::rnorm, R::dnorm, nor their
Rcpp equivalents, can be made from within the parallel portion of the code.
marginalMetropolisUpdate( spectra, n, conc, wavelengths, peakWL, betaMx, scaleMx, sigma, expMx, baselines, sd_mh, priors )marginalMetropolisUpdate( spectra, n, conc, wavelengths, peakWL, betaMx, scaleMx, sigma, expMx, baselines, sd_mh, priors )
spectra |
|
n |
number of observations to use in calculating the likelihood |
conc |
Vector of |
wavelengths |
Vector of |
peakWL |
Vector of locations for each peak (cm^-1) |
betaMx |
|
scaleMx |
|
sigma |
Vector of |
expMx |
|
baselines |
|
sd_mh |
Vector of |
priors |
List of hyperparameters for the prior distributions. |
The number of RWMH proposals that were accepted.
Chib (1995) "Marginal Likelihood from the Gibbs Output," JASA 90(432): 1313–1321, doi:10.1080/01621459.1995.10476635
Rosenthal (2000) "Parallel computing and Monte Carlo algorithms" Far East J. Theor. Stat. 4(2): 207–236, URL: https://www.pphmj.com/abstract/1961.htm
Raman spectrum of methanol (CH3OH)
methanolmethanol
A list containing 2 variables:
a numeric Vector of 331 wavenumbers (cm^-1)
a 1 * 331 Matrix of intensity values (a.u.)
Updates all of the parameters (location, amplitude, std. dev., and scale) using a single Metropolis-
Hastings step, such that the baseline cancels out in the MH ratio, using the marginalisation identity
of Chib (1995).
Note: if npart > 1, then multiple MCMC chains will be executed independently in parallel using
OpenMP. This means that all functions used for the proposal distributions and to evaluate the MH ratio
need to be thread-safe. Specifically, no calls to R::rnorm, R::dnorm, nor their
Rcpp equivalents, can be made from within the parallel portion of the code.
mhUpdateVoigt( spectra, n, kappa, conc, wavenum, thetaMx, logThetaMx, mhChol, priors )mhUpdateVoigt( spectra, n, kappa, conc, wavenum, thetaMx, logThetaMx, mhChol, priors )
spectra |
|
n |
number of observations to use in calculating the likelihood. |
kappa |
likelihood tempering parameter. |
conc |
Vector of |
wavenum |
Vector of |
thetaMx |
|
logThetaMx |
|
mhChol |
lower-triangular Cholesky factorisation of the covariance matrix for the random walk proposals. |
priors |
List of hyperparameters for the prior distributions. |
The number of RWMH proposals that were accepted.
Chib (1995) "Marginal Likelihood from the Gibbs Output," JASA 90(432): 1313–1321, doi:10.1080/01621459.1995.10476635
Rosenthal (2000) "Parallel computing and Monte Carlo algorithms" Far East J. Theor. Stat. 4(2): 207–236, URL: https://www.pphmj.com/abstract/1961.htm
Calculates the value of the pseudo-Voigt broadening function at the given wavenumbers, given the parameters of the peaks. This function is thread-safe.
mixedVoigt(location, scale_G, scale_L, amplitude, wavenum)mixedVoigt(location, scale_G, scale_L, amplitude, wavenum)
location |
Vector of location parameters of the peaks ( |
scale_G |
Vector of standard deviations |
scale_L |
Vector of scale parameters |
amplitude |
Vector of amplitudes of the peaks (a.u.) |
wavenum |
Vector of wavenumbers at which to compute the function. |
The value of the pseudo-Voigt function at the given wavenumbers.
Thompson, Cox & Hastings (1987) "Rietveld refinement of Debye–Scherrer synchrotron X-ray data from ,"
J. Appl. Crystallogr. 20(2): 79–83, DOI: doi:10.1107/S0021889887087090
Cal_V <- seq(300,400,by=5) loc <- c(320,350,375) scG <- c(10,5,1) scL <- c(3,20,7) amp <- c(100,500,200) mixedVoigt(loc,scG,scL,amp,Cal_V)Cal_V <- seq(300,400,by=5) loc <- c(320,350,375) scG <- c(10,5,1) scL <- c(3,20,7) amp <- c(100,500,200) mixedVoigt(loc,scG,scL,amp,Cal_V)
Resample in place to avoid expensive copying of data structures, using a permutation of the ancestry vector.
resampleParticles(log_weights, ampMx, scaleMx, peaks, baselines, n_y, nwl)resampleParticles(log_weights, ampMx, scaleMx, peaks, baselines, n_y, nwl)
log_weights |
logarithms of the importance weights of each particle |
ampMx |
|
scaleMx |
|
peaks |
|
baselines |
|
n_y |
number of observations |
nwl |
number of wavenumbers |
Vector of indices to the parents of the resampled particles.
Murray, L.M., Lee, A. & Jacob, P.E. (2015) "Parallel resampling in the particle filter" arXiv:1301.4019v3
Compute an ancestry vector for residual resampling of the SMC particles.
residualResampling(log_wt)residualResampling(log_wt)
log_wt |
logarithms of the importance weights of each particle. |
Vector of indices to the particles that will be propagated forward to the next generation (i.e. the parents)
Liu & Chen (1998) "Sequential Monte Carlo methods for dynamic systems," JASA 93(443): 1032-1044, doi:10.1080/01621459.1998.10473765
Douc, Cappe & Moulines (2005) "Comparison of resampling schemes for particle filtering" In Proc. 4th IEEE Int. Symp. ISPA, pp. 64-69, doi:10.1109/ISPA.2005.195385
Posterior distribution for pseudo-Voigt parameters, obtained by running 'fitVoigtPeaksSMC' on a spectrum from Gracie et al. (Anal. Chem., 2016). 1000 SMC particles with 32 peaks. For details, see the vignette.
resultresult
A list containing 15 variables:
normalised importance weights for each particle
location parameters of 32 peaks
amplitudes of 32 peaks
scale of the Gaussian (RBF) broadening
scale of the Lorentzian (Cauchy) broadening
standard deviation of the additive white noise
smoothing parameter of the cubic B-splines
List of informative priors
history of the effective sample size
history of the likelihood tempering
history of Metropolis-Hastings acceptance rates
history of Metropolis-Hastings steps
history of times for each SMC iteration
computation time taken by the SMC algorithm
Posterior distribution for pseudo-Voigt parameters, obtained by running 'fitVoigtPeaksSMC' on a Raman spectrum of methanol with 4 peaks. For details, refer to the vignette.
result2result2
A list containing 15 variables.
Update the importance weights of each particle.
reWeightParticles( spectra, peaks, baselines, i, start, sigma, old_weights, alpha, idx )reWeightParticles( spectra, peaks, baselines, i, start, sigma, old_weights, alpha, idx )
spectra |
|
peaks |
|
baselines |
|
i |
index of the current observation to use in calculating the likelihood |
start |
index of the next wavelength to use in calculating the likelihood, permuted by |
sigma |
Vector of |
old_weights |
logarithms of the importance weights of each particle. |
alpha |
the target learning rate for the reduction in effective sample size (ESS). |
idx |
permutation of the indices of the wavelengths. |
a List containing:
essThe effective sample size, after reweighting.
weightsVector of updated importance weights.
indexindex of the last wavelength used.
evidenceSMC estimate of the logarithm of the model evidence.
Pitt, dos Santos Silva, Giordani & Kohn (2012) "On some properties of Markov chain Monte Carlo simulation methods based on the particle filter" J. Econometrics 171(2): 134–151, DOI: doi:10.1016/j.jeconom.2012.06.004
Zhou, Johansen & Aston (2015) "Towards Automatic Model Comparison: An Adaptive Sequential Monte Carlo Approach" arXiv:1303.3123 [stat.ME]
This R package implements sequential Monte Carlo (SMC) algorithms for fitting a generalised additive mixed model (GAMM) to Raman spectra. These multivariate observations are highly collinear and lend themselves to a reduced-rank representation. The GAMM separates the hyperspectral signal into three components: a sequence of Lorentzian or Gaussian peaks; a smoothly-varying baseline; and zero-mean, additive white noise. The parameters of each component of the model are estimated iteratively using SMC. The posterior distributions of the parameters given the observed spectra are represented as a population of weighted particles.
Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. The pattern of peaks in a Raman spectrum corresponds to the vibrational modes of the molecule. The shift in wavenumber of the photons is proportional to the change in energy state, which is reflected in the locations of the peaks. Surface-enhanced Raman scattering (SERS) is a technique that amplifies the Raman signal using metallic substrates, such as nanoparticles. The laser can also be tuned to the resonant frequency of the molecule, which is known as surface-enhanced resonance Raman scattering (SERRS). Under controlled experimental conditions, the amplitudes of the peaks are linearly related to the concentration of the molecule, from the limit of detection (LOD) up to monolayer coverage of the nanoparticle surface.
The GAMM represents the peaks and baseline as continuous functions. The background fluorescence is modelled using a penalised cubic spline, while the peaks are an additive mixture of squared exponential (Gaussian) or Lorentzian (Cauchy) kernels:
where is a matrix of hyperspectral observations that have been discretised at wavenumbers ;
are the spline basis functions with coefficients ;
are the radial basis functions for each peak, with location , amplitude , and scale parameters.
is assumed to be zero mean, additive white noise with constant variance .
This model-based approach accounts for differences in resolution and experimental conditions, enabling comparison and alignment of heterogeneous spectra. The relationship between concentration and peak intensity can be quantified by fitting a Bayesian functional regression:
where is the nanomolar (nM) concentration of the molecule in the th spectrum,
. The regression model produces highest posterior density (HPD) intervals for the
limit of detection of each peak. A consistent, unbiased estimate of the model evidence (also known as the marginal likelihood)
is also computed. This can be used to evaluate whether Gaussian or Lorentzian peaks are a better fit to the data.
M. T. Moores, J. Carson & M. Girolami
Maintainer: Matt Moores <[email protected]>
Moores, Gracie, Carson, Faulds, Graham & Girolami "Bayesian modelling and quantification of Raman spectroscopy," arXiv preprint
# simulate some data with known parameter values wavenumbers <- seq(700,1400,by=2) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(840, 960, 1140, 1220, 1290) peakAmplitude <- c(11500, 2500, 4000, 3000, 2500) peakScale <- c(10, 15, 20, 10, 12) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity") lines(wavenumbers, baseline, col=2, lty=4) lines(wavenumbers, baseline + signature, col=4, lty=2) # fit the model using SMC lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=50, beta.mu=5000, beta.sd=5000, noise.sd=200, noise.nu=4) ## Not run: ## takes approx. 1 minute for 100 SMC iterations with 10,000 particles result <- fitSpectraSMC(wavenumbers, spectra, peakLocations, lPriors) plot.ts(result$ess, xlab="SMC iterations", ylab="ESS") # sample 200 particles from the posterior distribution samp.idx <- sample.int(length(result$weights), 200, prob=result$weights) plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity") for (pt in samp.idx) { bl.est <- result$basis %*% result$alpha[,1,pt] lines(wavenumbers, bl.est, col="#C3000009") lines(wavenumbers, bl.est + result$expFn[pt,], col="#0000C309") } ## End(Not run)# simulate some data with known parameter values wavenumbers <- seq(700,1400,by=2) spectra <- matrix(nrow=1, ncol=length(wavenumbers)) peakLocations <- c(840, 960, 1140, 1220, 1290) peakAmplitude <- c(11500, 2500, 4000, 3000, 2500) peakScale <- c(10, 15, 20, 10, 12) signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers) baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200) plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity") lines(wavenumbers, baseline, col=2, lty=4) lines(wavenumbers, baseline + signature, col=4, lty=2) # fit the model using SMC lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=50, beta.mu=5000, beta.sd=5000, noise.sd=200, noise.nu=4) ## Not run: ## takes approx. 1 minute for 100 SMC iterations with 10,000 particles result <- fitSpectraSMC(wavenumbers, spectra, peakLocations, lPriors) plot.ts(result$ess, xlab="SMC iterations", ylab="ESS") # sample 200 particles from the posterior distribution samp.idx <- sample.int(length(result$weights), 200, prob=result$weights) plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity") for (pt in samp.idx) { bl.est <- result$basis %*% result$alpha[,1,pt] lines(wavenumbers, bl.est, col="#C3000009") lines(wavenumbers, bl.est + result$expFn[pt,], col="#0000C309") } ## End(Not run)
This is an internal function that is only exposed on the public API for unit testing purposes.
sumDexp(x, rate)sumDexp(x, rate)
x |
Vector of i.i.d. exponential random varibles |
rate |
parameter of the exponential distribution |
The sum of the log-likelihoods (log of the product of the likelihoods) for independent, identically-distributed, exponential random variables. Note: this Rcpp function is thread-safe, unlike the equivalent alternatives.
log-likelihood of x
sum(dexp(x, rate, log=TRUE))
This is an internal function that is only exposed on the public API for unit testing purposes.
sumDlogNorm(x, meanlog, sdlog)sumDlogNorm(x, meanlog, sdlog)
x |
Vector of i.i.d. lognormal random varibles |
meanlog |
mean of the distribution on the log scale |
sdlog |
standard deviation on the log scale |
The sum of the log-likelihoods (log of the product of the likelihoods) for independent, identically-distributed, lognormal random variables. Note: this Rcpp function is thread-safe, unlike the equivalent alternatives.
log-likelihood of x
sum(dlnorm(x, meanlog, sdlog, log=TRUE))
x <- rlnorm(100) sumDlogNorm(x,0,1)x <- rlnorm(100) sumDlogNorm(x,0,1)
This is an internal function that is only exposed on the public API for unit testing purposes.
sumDnorm(x, mean, sd)sumDnorm(x, mean, sd)
x |
Vector of i.i.d. Gaussian random varibles |
mean |
Vector of means |
sd |
Vector of standard deviations |
The sum of the log-likelihoods (log of the product of the likelihoods) for independent, identically-distributed, Gaussian random variables. Note: this Rcpp function is thread-safe, unlike the equivalent alternatives.
log-likelihood of x
sum(dnorm(x, mean, sd, log=TRUE))
x <- rnorm(100) mu <- rep(0,length(x)) sd <- rep(1,length(x)) sumDnorm(x,mu,sd)x <- rnorm(100) mu <- rep(0,length(x)) sd <- rep(1,length(x)) sumDnorm(x,mu,sd)
Calculates the value of the squared exponential radial basis function at the given wavelengths, given the parameters of the peaks. This function is thread-safe.
weightedGaussian(location, scale, amplitude, wavelengths)weightedGaussian(location, scale, amplitude, wavelengths)
location |
Vector of location parameters of the peaks (mean). |
scale |
Vector of scale parameters of the peaks (standard deviation). |
amplitude |
Vector of amplitudes of the peaks. |
wavelengths |
Vector of wavenumbers at which to compute the function. |
The value of the Gaussian function at the given wavelengths.
Cal_V <- seq(300,400,by=5) loc <- c(320,350,375) sca <- c(10,5,18) amp <- c(1000,5000,2000) weightedGaussian(loc,sca,amp,Cal_V)Cal_V <- seq(300,400,by=5) loc <- c(320,350,375) sca <- c(10,5,18) amp <- c(1000,5000,2000) weightedGaussian(loc,sca,amp,Cal_V)
Calculates the value of the Lorentzian function at the given wavelengths, given the parameters of the peaks. This function is thread-safe.
weightedLorentzian(location, scale, amplitude, wavelengths)weightedLorentzian(location, scale, amplitude, wavelengths)
location |
Vector of location parameters of the peaks. |
scale |
Vector of scale parameters of the peaks. |
amplitude |
Vector of amplitudes of the peaks. |
wavelengths |
Vector of wavenumbers at which to compute the function. |
The value of the Lorentian function at the given wavelengths.
Cal_V <- seq(300,400,by=5) loc <- c(320,350,375) sca <- c(10,5,18) amp <- c(1000,5000,2000) weightedLorentzian(loc,sca,amp,Cal_V)Cal_V <- seq(300,400,by=5) loc <- c(320,350,375) sca <- c(10,5,18) amp <- c(1000,5000,2000) weightedLorentzian(loc,sca,amp,Cal_V)
This SMC estimate of the means can be used to centre independent Metropolis-Hastings proposals.
weightedMean(particles, log_weights)weightedMean(particles, log_weights)
particles |
|
log_weights |
logarithms of the importance weights of each particle. |
A vector of means, one for each row.
This SMC estimate of the variance can be used to scale the bandwidth of adaptive, Gaussian random walk Metropolis-Hastings proposals.
weightedVariance(particles, log_weights, mean)weightedVariance(particles, log_weights, mean)
particles |
|
log_weights |
logarithms of the importance weights of each particle. |
mean |
Vector of weighted means of each particle. |
A vector of variances, one for each row.