
Stokes I Simultaneous Image and Instrument Modeling

In this tutorial, we will create a preliminary reconstruction of the 2017 M87 data on April 6 by simultaneously creating an image and model for the instrument. By instrument model, we mean something akin to self-calibration in traditional VLBI imaging terminology. However, unlike traditional self-cal, we will solve for the gains each time we update the image self-consistently. This allows us to model the correlations between gains and the image.

To get started we load Comrade.

julia
using Comrade



using Pyehtim
using LinearAlgebra

For reproducibility, we use a stable random number generator.

julia
using StableRNGs
rng = StableRNG(12)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000019)

Load the Data

To download the data, visit https://doi.org/10.25739/g85n-f134. First we will load our data:

julia
obs = ehtim.obsdata.load_uvfits(joinpath(__DIR, "..", "..", "Data", "SR1_M87_2017_096_lo_hops_netcal_StokesI.uvfits"))
Python: <ehtim.obsdata.Obsdata object at 0x7f55d45036d0>

Now we do some minor preprocessing:

  • Scan average the data since the data have been preprocessed so that the gain phases are coherent.

  • Add 1% systematic noise to deal with calibration issues that cause 1% non-closing errors.

julia
obs = scan_average(obs).add_fractional_noise(0.01)
Python: <ehtim.obsdata.Obsdata object at 0x7f55d45031f0>

Now we extract our complex visibilities.

julia
dvis = extract_table(obs, Visibilities())
EHTObservationTable{Comrade.EHTVisibilityDatum{:I}}
  source:      M87
  mjd:         57849
  bandwidth:   1.856e9
  sites:       [:AA, :AP, :AZ, :JC, :LM, :PV, :SM]
  nsamples:    274

Building the Model/Posterior

Now we must build our intensity/visibility model, i.e., the model that takes in a named tuple of parameters and perhaps some metadata required to construct the model. For our image model we will use a raster, i.e., a ContinuousImage.

The model construction is very similar to Imaging a Black Hole using only Closure Quantities, except that we include a large-scale Gaussian since we want to model the zero-baseline flux. For more information about the image model, please read the closure-only example. The model is given below:

julia
function sky(θ, metadata)
    (;fg, c, σimg) = θ
    (;ftot, mimg) = metadata
    # Apply the GMRF fluctuations to the image
    rast = apply_fluctuations(CenteredLR(), mimg, σimg.*c.params)
    m = ContinuousImage((ftot*(1-fg)).*rast, BSplinePulse{3}())
    x0, y0 = centroid(m)
    # Add a large-scale gaussian to deal with the over-resolved mas flux
    g = modify(Gaussian(), Stretch(μas2rad(250.0), μas2rad(250.0)), Renormalize(ftot*fg))
    return shifted(m, -x0, -y0) + g
end
sky (generic function with 1 method)

Now, let's set up our image model. The EHT's nominal resolution is 20-25 μas. Additionally, the EHT is not very sensitive to a larger field of view. Typically 60-80 μas is enough to describe the compact flux of M87. Given this, we only need to use a small number of pixels to describe our image.

julia
npix = 32
fovx = μas2rad(150.0)
fovy = μas2rad(150.0)
7.27220521664304e-10
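
As a quick sanity check on these choices (simple arithmetic with the values just defined), the pixel size is about 4.7 μas, i.e. roughly 4-5 pixels across the 20-25 μas nominal beam.

julia
# fovx/npix is the pixel size in radians; dividing by μas2rad(1.0) converts it
# back to μas. 150/32 ≈ 4.7 μas per pixel.
(fovx/npix)/μas2rad(1.0)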

Now let's form our image grid. This specifies the field of view and pixelization of the image and is needed to numerically compute the visibilities.

julia
grid = imagepixels(fovx, fovy, npix, npix)
RectiGrid(
executor: Serial()
Dimensions: 
↓ X Sampled{Float64} LinRange{Float64}(-3.5224744018114725e-10, 3.5224744018114725e-10, 32) ForwardOrdered Regular Points,
→ Y Sampled{Float64} LinRange{Float64}(-3.5224744018114725e-10, 3.5224744018114725e-10, 32) ForwardOrdered Regular Points
)

Now we need to specify our image prior. For this work we will use a Gaussian Markov random field (GMRF) prior. Since we are using a GMRF prior, we first need to specify our mean image. This behaves somewhat similarly to an entropy regularizer in that it provides an initial guess for the image structure. For this tutorial we will use a symmetric Gaussian with a FWHM of 50 μas.

julia
using VLBIImagePriors
using Distributions, DistributionsAD
fwhmfac = 2*sqrt(2*log(2))
mpr  = modify(Gaussian(), Stretch(μas2rad(50.0)./fwhmfac))
mimg = intensitymap(mpr, grid)
╭───────────────────────────────╮
│ 32×32 IntensityMap{Float64,2} │
├───────────────────────────────┴──────────────────────────────────────── dims ┐
  ↓ X Sampled{Float64} LinRange{Float64}(-3.5224744018114725e-10, 3.5224744018114725e-10, 32) ForwardOrdered Regular Points,
  → Y Sampled{Float64} LinRange{Float64}(-3.5224744018114725e-10, 3.5224744018114725e-10, 32) ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  ComradeBase.NoHeader()
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →          -3.52247e-10  -3.29522e-10  …  3.29522e-10  3.52247e-10
 -3.52247e-10   6.37537e-8    1.32434e-7      1.32434e-7   6.37537e-8
 -3.29522e-10   1.32434e-7    2.751e-7        2.751e-7     1.32434e-7
 -3.06796e-10   2.62014e-7    5.44273e-7      5.44273e-7   2.62014e-7
  ⋮                                        ⋱  ⋮            
  3.06796e-10   2.62014e-7    5.44273e-7      5.44273e-7   2.62014e-7
  3.29522e-10   1.32434e-7    2.751e-7        2.751e-7     1.32434e-7
  3.52247e-10   6.37537e-8    1.32434e-7      1.32434e-7   6.37537e-8

Now we can form the metadata we need to fully define our model. We will also fix the total flux to the observed value of 1.1 Jy. This is because the total flux is degenerate with a global shift in the gain amplitudes, so leaving it free would make the problem degenerate; fixing it to the observed total flux breaks this degeneracy.

julia
skymeta = (;ftot = 1.1, mimg = mimg./flux(mimg))
(ftot = 1.1, mimg = [6.380486931986174e-8 1.3253982629359533e-7 … 1.3253982629359533e-7 6.380486931986174e-8; 1.3253982629359533e-7 2.7532076691313077e-7 … 2.7532076691313077e-7 1.3253982629359533e-7; … ; 1.3253982629359533e-7 2.7532076691313077e-7 … 2.7532076691313077e-7 1.3253982629359533e-7; 6.380486931986174e-8 1.3253982629359533e-7 … 1.3253982629359533e-7 6.380486931986174e-8])

In addition, we want a reasonable guess for the resolution of our image. For radio interferometry this is roughly the beam size set by the longest baseline in the observation. To put this into pixel units, we divide the beam size by the pixel size.

julia
beam = beamsize(dvis)
rat = (beam/(step(grid.X)))
5.326336637737519

To make the Gaussian Markov random field efficient, we first precompute a number of quantities that allow us to scale things linearly with the number of image pixels. The resulting object is a functional that accepts a single argument related to the correlation length of the field. The GMRF argument defines the underlying random field of the Markov process; here we are using a zero-mean, unit-variance Gaussian Markov random field. The order keyword argument specifies the order of the Gaussian field. Currently, we recommend using order

  • 1, which is identical to total squared variation (TSV) or L₂ regularization

  • 2, which is identical to a Matérn-1 process in 2D and is effectively the convolution of two order-1 processes

For this tutorial, we will use the first-order random field.

julia
crcache = ConditionalMarkov(GMRF, grid; order=1)
ConditionalMarkov(
Random Field: GaussMarkovRandomField
Graph: MarkovRandomFieldGraph{1}(
dims: (32, 32)
)
)

To demonstrate the prior, let's create a few random realizations.
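
Below is a hedged sketch of one way to do this: crcache acts as a function of the correlation length (in pixels) and returns a random field we can draw from, which we then push through the same transform the sky model uses. The exact return types may differ between versions.

julia
# Illustrative only: sample a zero-mean, unit-variance GMRF realization and
# apply it to the normalized mean image, mirroring the sky model above.
d = crcache(5.0)                 # field with a ~5 pixel correlation length
z = rand(rng, d)                 # one random realization on the 32×32 grid
draw = apply_fluctuations(CenteredLR(), mimg./flux(mimg), 0.1 .* z)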

Now we can finally form our image prior. For this we use a hierarchical prior where the correlation length of the random field is given by a truncated InverseGamma(1, -log(0.1)*rat) prior (recall that rat is the beam size in pixels), which places roughly 10% of its mass at correlation lengths below the beam. For the standard deviation of the random field we use a half-normal prior with standard deviation 0.1. The reason for these choices is to prefer "simple" structures. Gaussian Markov random fields are extremely flexible models, and to prevent overfitting it is common to use priors that penalize complexity. Therefore, we want to use priors that enforce similarity to our mean image. If the data wants more complexity then it will drive us away from the prior.

julia
cprior = HierarchicalPrior(crcache, truncated(InverseGamma(1.0, -log(0.1)*rat); upper=2*npix))
HierarchicalPrior(
	map: 
	ConditionalMarkov(
Random Field: GaussMarkovRandomField
Graph: MarkovRandomFieldGraph{1}(
dims: (32, 32)
)
)	hyper prior: 
	Truncated(Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=1.0, θ=0.0815371823902832)
θ: 12.264343342322437
)
; upper=64.0)

)
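
As a quick check on that scale choice (simple arithmetic with the Distributions.jl shape/scale parameterization used above): for InverseGamma(1, θ) the CDF is exp(-θ/x), so with θ = -log(0.1)*rat the untruncated prior puts exactly 10% of its mass below the beam size.

julia
# cdf of InverseGamma(1, θ) at x is exp(-θ/x); with θ = -log(0.1)*rat this
# evaluates to exp(log(0.1)) = 0.1 at x = rat, i.e. a ~10% prior probability
# that the correlation length is smaller than the beam (before truncation).
cdf(InverseGamma(1.0, -log(0.1)*rat), rat)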

We can now form our model parameter priors. For the image raster we use the GMRF prior constructed above, together with a uniform prior on the Gaussian flux fraction fg and a half-normal prior on the random field standard deviation σimg.

julia
prior = (
         c = cprior,
         fg = Uniform(0.0, 1.0),
         σimg = truncated(Normal(0.0, 0.1), lower=0.0)
        )

skym = SkyModel(sky, prior, grid; metadata=skymeta)
 SkyModel
  with map: sky
   on grid: RectiGrid

Unlike other imaging examples (e.g., Imaging a Black Hole using only Closure Quantities) we also need to include a model for the instrument, i.e., gains. The gains will be broken into two components

  • Gain amplitudes which are typically known to 10-20%, except for LMT, which has amplitudes closer to 50-100%.

  • Gain phases which are more difficult to constrain and can shift rapidly.

julia
G = SingleStokesGain() do x
    lg = x.lg          # log gain amplitude
    gp = x.gp          # gain phase
    return exp(lg + 1im*gp)
end

intpr = (
    # log-amplitude prior: ~20% per scan for every site, ~100% for LMT
    lg= ArrayPrior(IIDSitePrior(ScanSeg(), Normal(0.0, 0.2)); LM = IIDSitePrior(ScanSeg(), Normal(0.0, 1.0))),
    # phase prior: nearly uniform von Mises, with the phase of a reference station fixed each scan
    gp= ArrayPrior(IIDSitePrior(ScanSeg(), DiagonalVonMises(0.0, inv(π^2))); refant=SEFDReference(0.0), phase=true)
        )
intmodel = InstrumentModel(G, intpr)


post = VLBIPosterior(skym, intmodel, dvis)
VLBIPosterior
ObservedSkyModel
  with map: sky
   on grid: FourierDualDomainObservedInstrumentModel
  with Jones:  SingleStokesGain
  with reference basis: CirBasis()
Data Products: Comrade.EHTVisibilityDatum

To sample from our posterior, we need to move our model parameters from their constrained space to an unconstrained one (i.e., where every parameter has support on the whole real line). This is done using the asflat function.

julia
tpost = asflat(post)
ndim = dimension(tpost)
1355

This also tells us the dimension of our posterior, i.e., the number of parameters we are going to sample.

Warning

This can often be different from what you would expect. This is especially true when using angular variables where we often artificially increase the dimension of the parameter space to make sampling easier.

To initialize our sampler, we will first optimize the posterior using the Adam optimizer.

julia
using Optimization
using OptimizationOptimisers
using Zygote
xopt, sol = comrade_opt(post, Optimisers.Adam(), Optimization.AutoZygote(); initial_params=prior_sample(rng, post), maxiters=15_000, g_tol=1e-1)
((sky = (c = (params = [0.014811289624710029 0.030223800238355462 … 0.01455503513945859 0.007347731583564094; 0.02909012222272987 0.05933949485579535 … 0.02936064594078301 0.014837886088439995; … ; 0.06046009527188973 0.12019734035873142 … 0.10873127512396265 0.05521823107123587; 0.029988667359748512 0.05957204314217223 … 0.053441789414507224 0.02713700533218975], hyperparams = 18.98049634113745), fg = 0.20867867817697067, σimg = 1.0544509813564384), instrument = (lg = [0.04806001708040684, 0.04805677559437946, 0.011297028688204215, 0.024947533709809157, -0.3845887158721352, 0.16910573280703947, 0.006536329461425865, 0.028812249376416788, -0.22058382985335012, 0.12759619460558996  …  -0.21290214495272325, 0.017214861595208804, -0.9240272894928138, 0.015339698112741834, 0.009474338666006719, 0.026691175245743347, -0.27195856824085385, 0.01793317552456177, -0.9735800606438588, 0.014789236293079728], gp = [0.0, -2.6844804427999645, 0.0, -2.174336524727037, -0.10011415433661668, -2.895435424276745, 0.0, -2.2305628625377203, -0.03865373873881601, -3.131514521045342  …  -3.4309410939773524, -0.7604148352053133, -4.115904279026289, 2.40048501706032, 0.0, -1.8273746294572952, -3.549908488120835, -0.7140700971698195, -4.264522522454779, 2.449286092652984])), retcode: Default
u: [0.014811289624710029, 0.02909012222272987, 0.04232148325157313, 0.05403657695730048, 0.06374697792895838, 0.07109982202752693, 0.07597007533150797, 0.07862251510301949, 0.07963799841005703, 0.079673370013241  …  0.05881418524948227, 0.9375791943770803, -0.11149209359416919, 0.932739646345857, 0.04352083776501261, 0.9383950934572486, -0.13910434604388972, 0.9290830071509915, 0.04582531491017072, 0.9382770815490042]
Final objective value:     -1626.6153975108589
)

Warning

Fitting gains tends to be very difficult, meaning that optimization can take a lot longer. The upside is that we usually get nicer images.

First we will evaluate our fit by plotting the residuals

julia
using Plots
using DisplayAs
residual(post, xopt) |> DisplayAs.PNG |> DisplayAs.Text

These look reasonable, although there may be some minor overfitting. This could be improved in a few ways, but that is beyond the goal of this quick tutorial. Plotting the image, we see that we have a much cleaner version of the closure-only image from Imaging a Black Hole using only Closure Quantities.

julia
import CairoMakie as CM
g = imagepixels(fovx, fovy, 128, 128)
img = intensitymap(skymodel(post, xopt), g)
imageviz(img, size=(500, 400))|> DisplayAs.PNG |> DisplayAs.Text

Because we also fit the instrument model, we can inspect their parameters. To do this, Comrade provides a caltable function that converts the flattened gain parameters to a tabular format based on the time and its segmentation.

julia
gt = Comrade.caltable(xopt.instrument.gp)
plot(gt, layout=(3,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text

The gain phases are pretty random, although much of this is due to us picking a random reference site for each scan.

Moving on to the gain amplitudes, we see that most of the gain variation is within 10%, as expected, except for LMT, which has massive variations.

julia
gt = Comrade.caltable(exp.(xopt.instrument.lg))
plot(gt, layout=(3,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text

To sample from the posterior, we will use HMC, specifically the NUTS algorithm. For information about NUTS, see Michael Betancourt's notes. However, due to the large number of gain parameters we need to sample, exploring the posterior is rather time-consuming. Therefore, for this tutorial, we will only do a quick preliminary run.

julia
using AdvancedHMC
chain = sample(rng, post, NUTS(0.8), 700; n_adapts=500, progress=false, initial_params=xopt)
PosteriorSamples
  Samples size: (700,)
  sampler used: AHMC
Mean
┌───────────────────────────────────────────────────────────────────────────────
│ sky                                                                          ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, f ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [-0.00635113 -0.00239457 … 0.0339052 0.00358742; 0.0245023 0. ⋯
└───────────────────────────────────────────────────────────────────────────────
                                                               2 columns omitted
Std. Dev.
┌───────────────────────────────────────────────────────────────────────────────
│ sky                                                                          ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, f ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [0.528178 0.525163 … 0.600568 0.52221; 0.565343 0.623071 … 0. ⋯
└───────────────────────────────────────────────────────────────────────────────
                                                               2 columns omitted

Note

The above sampler will store the samples in memory, i.e., RAM. For large models this can lead to out-of-memory issues. To fix this, you can include the keyword argument saveto = DiskStore(), which periodically saves the samples to disk, limiting memory usage. You can then load the chain using load_samples(diskout), where diskout is the object returned from sample.
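
For reference, here is a hedged sketch of that disk-backed call; saveto = DiskStore() and load_samples are the names given in the note above, while the remaining arguments simply mirror the in-memory call.

julia
# Sketch: periodically flush samples to disk instead of holding them in RAM.
diskout = sample(rng, post, NUTS(0.8), 700; n_adapts=500, progress=false,
                 initial_params=xopt, saveto=DiskStore())
chain_disk = load_samples(diskout)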

Now we prune the adaptation phase

julia
chain = chain[501:end]
PosteriorSamples
  Samples size: (200,)
  sampler used: AHMC
Mean
┌───────────────────────────────────────────────────────────────────────────────
│ sky                                                                          ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, f ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [0.0404886 0.014068 … -0.0478675 -0.0759045; 0.0318741 0.0131 ⋯
└───────────────────────────────────────────────────────────────────────────────
                                                               2 columns omitted
Std. Dev.
┌───────────────────────────────────────────────────────────────────────────────
│ sky                                                                          ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, f ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [0.577861 0.55404 … 0.52606 0.48466; 0.594397 0.618869 … 0.64 ⋯
└───────────────────────────────────────────────────────────────────────────────
                                                               2 columns omitted

Warning

This chain should likely be run for an order of magnitude more steps to properly estimate expectations of the posterior.

Now that we have our posterior, we can put error bars on all of our plots above. Let's start by finding the mean and standard deviation of the gain phases

julia
mchain = Comrade.rmap(mean, chain)
schain = Comrade.rmap(std, chain)
(sky = (c = (params = [0.5778609619131376 0.5540399879504856 … 0.526060418550886 0.4846598344217562; 0.5943974070041592 0.6188685278923548 … 0.6472055011235213 0.5707300539029615; … ; 0.6026448515059188 0.6635411422037917 … 0.6175204509141885 0.5737396370597366; 0.5489429889048036 0.5602156698425422 … 0.5821926775617174 0.5365062327501327], hyperparams = 15.050370050101815), fg = 0.06499889102262585, σimg = 0.056966118090492665), instrument = (lg = [0.13247242760004135, 0.14095582045089447, 0.016920689304094367, 0.01816082353677125, 0.06066300770176316, 0.061226908393403605, 0.016194490290650367, 0.016756426410203473, 0.05367521056137245, 0.06005627021304517  …  0.03870879282906631, 0.012158609842690975, 0.04988255951512501, 0.011812793055212354, 0.014163361321941162, 0.016753915199378872, 0.04090289712679592, 0.013610387254015753, 0.05064028117827307, 0.01231812110227675], gp = [0.0, 0.29031592430762615, 0.0, 0.009503571021263695, 0.1264062099300669, 0.2946833825838371, 0.0, 0.00881645702158383, 0.1288069451941842, 0.297939733561392  …  0.15605556464654116, 0.25395402473787654, 0.13962806329200073, 0.25401514643250916, 0.0, 0.009805848548533933, 0.1509406252619615, 0.24641018935216244, 0.13517801750018515, 0.24702466341925827]))

Now we can use the Measurements package to automatically plot everything with error bars. First, we create a caltable in the same way as before, but now we make sure all of our variables have errors attached to them.

julia
using Measurements
gmeas_am = Measurements.measurement.(mchain.instrument.lg, schain.instrument.lg)
ctable_am = caltable(exp.(gmeas_am)) # caltable expects gmeas_am to be a Vector
gmeas_ph = Measurements.measurement.(mchain.instrument.gp, schain.instrument.gp)
ctable_ph = caltable(gmeas_ph)
───────────────────────────────────────────────────┬────────────────────────────
                                              time │      AA              AP   ⋯
───────────────────────────────────────────────────┼────────────────────────────
 IntegrationTime{Float64}(57849, 0.916667, 0.0002) │ 0.0±0.0         missing   ⋯
  IntegrationTime{Float64}(57849, 1.21667, 0.0002) │ 0.0±0.0  -2.1747±0.0095   ⋯
  IntegrationTime{Float64}(57849, 1.51667, 0.0002) │ 0.0±0.0  -2.2314±0.0088   ⋯
  IntegrationTime{Float64}(57849, 1.81667, 0.0002) │ 0.0±0.0     -2.27±0.011   ⋯
  IntegrationTime{Float64}(57849, 2.11667, 0.0002) │ 0.0±0.0     -2.32±0.011   ⋯
     IntegrationTime{Float64}(57849, 2.45, 0.0002) │ missing     -2.32±0.011   ⋯
     IntegrationTime{Float64}(57849, 2.75, 0.0002) │ 0.0±0.0  -2.3996±0.0096   ⋯
     IntegrationTime{Float64}(57849, 3.05, 0.0002) │ 0.0±0.0     -2.431±0.01   ⋯
     IntegrationTime{Float64}(57849, 3.35, 0.0002) │ 0.0±0.0    -2.454±0.011   ⋯
  IntegrationTime{Float64}(57849, 3.68333, 0.0002) │ 0.0±0.0  -2.4661±0.0098   ⋯
  IntegrationTime{Float64}(57849, 3.98333, 0.0002) │ 0.0±0.0     -2.436±0.01   ⋯
  IntegrationTime{Float64}(57849, 4.28333, 0.0002) │ 0.0±0.0  -2.4579±0.0098   ⋯
  IntegrationTime{Float64}(57849, 4.58333, 0.0002) │ 0.0±0.0  -2.3711±0.0093   ⋯
  IntegrationTime{Float64}(57849, 4.91667, 0.0002) │ 0.0±0.0  -2.4168±0.0095   ⋯
  IntegrationTime{Float64}(57849, 5.18333, 0.0002) │ 0.0±0.0     -2.379±0.01   ⋯
     IntegrationTime{Float64}(57849, 5.45, 0.0002) │ 0.0±0.0     -2.301±0.01   ⋯
                         ⋮                         │    ⋮           ⋮          ⋱
───────────────────────────────────────────────────┴────────────────────────────
                                                    5 columns and 9 rows omitted

Now let's plot the phase curves

julia
plot(ctable_ph, layout=(3,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text

and now the amplitude curves

julia
plot(ctable_am, layout=(3,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text

Finally let's construct some representative image reconstructions.

julia
samples = skymodel.(Ref(post), chain[begin:2:end])
imgs = intensitymap.(samples, Ref(g))

mimg = mean(imgs)
simg = std(imgs)
fig = CM.Figure(;resolution=(700, 700));
axs = [CM.Axis(fig[i, j], xreversed=true, aspect=1) for i in 1:2, j in 1:2]
CM.image!(axs[1,1], mimg, colormap=:afmhot); axs[1, 1].title="Mean"
CM.image!(axs[1,2], simg./(max.(mimg, 1e-8)), colorrange=(0.0, 2.0), colormap=:afmhot);axs[1,2].title = "Std"
CM.image!(axs[2,1], imgs[1],   colormap=:afmhot);
CM.image!(axs[2,2], imgs[end], colormap=:afmhot);
CM.hidedecorations!.(axs)
fig |> DisplayAs.PNG |> DisplayAs.Text

And voilà, you have just finished making a preliminary image and instrument model reconstruction. In reality, you should run the sample step for many more MCMC steps to get a reliable estimate for the reconstructed image and instrument model parameters.


This page was generated using Literate.jl.