Stokes I Simultaneous Image and Instrument Modeling
In this tutorial, we will create a preliminary reconstruction of the 2017 M87 data on April 6 by simultaneously creating an image and model for the instrument. By instrument model, we mean something akin to self-calibration in traditional VLBI imaging terminology. However, unlike traditional self-cal, we will solve for the gains each time we update the image self-consistently. This allows us to model the correlations between gains and the image.
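In equation form, the measurement model behind this approach can be sketched as follows. This is the standard single-polarization gain model from VLBI, written here for orientation rather than as a statement about Comrade's internals; the symbols $g_i$, $\tilde{I}$, and $\epsilon_{ij}$ are notation introduced for this sketch.

```latex
V_{ij}(t) = g_i(t)\, g_j^{*}(t)\, \tilde{I}(u_{ij}, v_{ij}) + \epsilon_{ij}(t)
```

Here $V_{ij}$ is the visibility measured on the baseline between sites $i$ and $j$, $g_i$ is the complex gain of site $i$, $\tilde{I}$ is the Fourier transform of the image, and $\epsilon_{ij}$ is thermal noise. Fitting the $g_i$ and the image together is what lets the posterior capture their correlations.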
To get started we load Comrade.
using Comrade
using Pyehtim
using LinearAlgebra
For reproducibility we use a stable random number generator
using StableRNGs
rng = StableRNG(12)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000019)
Load the Data
To download the data visit https://doi.org/10.25739/g85n-f134. First we will load our data:
obs = ehtim.obsdata.load_uvfits(joinpath(__DIR__, "..", "..", "Data", "SR1_M87_2017_096_lo_hops_netcal_StokesI.uvfits"))
Python: <ehtim.obsdata.Obsdata object at 0x7f58029aad10>
Now we do some minor preprocessing:
Scan average the data, since the data have been preprocessed so that the gain phases are coherent.
Add 2% systematic noise to deal with calibration issues that cause 2% non-closing errors.
obs = scan_average(obs).add_fractional_noise(0.02)
Python: <ehtim.obsdata.Obsdata object at 0x7f580273b7f0>
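As a rough illustration of what the fractional-noise step does to each error bar, here is a minimal sketch. It assumes the systematic term is added in quadrature to the thermal error; `inflate_noise` and the numbers are hypothetical and are not part of ehtim or Comrade.

```julia
# Hypothetical helper (not an ehtim or Comrade function): add a fractional
# systematic error in quadrature to the thermal error of one visibility.
inflate_noise(σ, amp, frac) = sqrt(σ^2 + (frac * amp)^2)

σ_thermal = 0.005   # illustrative thermal error
amp       = 0.5     # illustrative visibility amplitude
σ_total   = inflate_noise(σ_thermal, amp, 0.02)

println(σ_total)    # larger than σ_thermal; the 2% term dominates here
```

With these numbers the systematic term (0.01) is twice the thermal error, so high signal-to-noise points are the ones most affected.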
Now we extract our complex visibilities.
dvis = extract_table(obs, Visibilities())
EHTObservationTable{Comrade.EHTVisibilityDatum{:I}}
source: M87
mjd: 57849
bandwidth: 1.856e9
sites: [:AA, :AP, :AZ, :JC, :LM, :PV, :SM]
nsamples: 274
Building the Model/Posterior
Now, we must build our intensity/visibility model. That is, the model that takes in a named tuple of parameters and perhaps some metadata required to construct the model. For our model, we will use a raster or ContinuousImage for our image model. The model is given below:
The model construction is very similar to Imaging a Black Hole using only Closure Quantities, except we include a large-scale Gaussian since we want to model the zero baselines. For more information about the image model please read the closure-only example.
function sky(θ, metadata)
    (;fg, c, σimg) = θ
    (;ftot, mimg) = metadata
    # Apply the GMRF fluctuations to the image
    rast = apply_fluctuations(CenteredLR(), mimg, σimg.*c.params)
    pimg = parent(rast)
    @. pimg = (ftot*(1-fg))*pimg
    m = ContinuousImage(rast, BSplinePulse{3}())
    x0, y0 = centroid(m)
    # Add a large-scale gaussian to deal with the over-resolved mas flux
    g = modify(Gaussian(), Stretch(μas2rad(500.0), μas2rad(500.0)), Renormalize(ftot*fg))
    return shifted(m, -x0, -y0) + g
end
sky (generic function with 1 method)
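The flux bookkeeping inside sky can be checked with a small standalone sketch. It assumes the raster produced by the fluctuation step has unit total flux before it is rescaled; the random array below is only a stand-in for that raster, and the value of fg is illustrative.

```julia
ftot = 1.1      # total flux used in this tutorial
fg   = 0.2      # illustrative fraction assigned to the large-scale Gaussian

# Stand-in for the unit-flux raster: a positive array normalized to sum to 1
raster = rand(4, 4)
raster ./= sum(raster)

compact_flux  = sum(ftot * (1 - fg) .* raster)   # flux carried by the raster
extended_flux = ftot * fg                        # flux carried by the Gaussian

println(compact_flux + extended_flux)            # recovers ftot up to rounding
```

Splitting the flux this way means fg only moves flux between the compact and extended components; the total stays pinned to ftot.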
Now, let's set up our image model. The EHT's nominal resolution is 20-25 μas. Additionally, the EHT is not very sensitive to a larger field of view. Typically 60-80 μas is enough to describe the compact flux of M87. Given this, we only need to use a small number of pixels to describe our image.
npix = 32
fovx = μas2rad(200.0)
fovy = μas2rad(200.0)
9.69627362219072e-10
Now let's form our caches. First, we have our usual image cache, which is needed to numerically compute the visibilities.
grid = imagepixels(fovx, fovy, npix, npix)
RectiGrid(
executor: Serial()
Dimensions:
(↓ X Sampled{Float64} LinRange{Float64}(-4.69663253574863e-10, 4.69663253574863e-10, 32) ForwardOrdered Regular Points,
→ Y Sampled{Float64} LinRange{Float64}(-4.69663253574863e-10, 4.69663253574863e-10, 32) ForwardOrdered Regular Points)
)
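To see where the numbers in this grid come from, the following sketch redoes the arithmetic by hand. `muas_to_rad` is a local stand-in for μas2rad written out so the block is self-contained, and the pixel-center formula assumes a grid centered on the origin.

```julia
# Local stand-in for μas2rad: microarcseconds -> arcseconds -> degrees -> radians
muas_to_rad(x) = x * 1e-6 / 3600 * π / 180

npix = 32
fov  = muas_to_rad(200.0)
dx   = fov / npix                        # pixel size in radians

# Pixel centers, assuming the grid is centered on the origin
centers = range(-fov / 2 + dx / 2, fov / 2 - dx / 2; length = npix)

println(fov)                      # field of view in radians
println(dx / muas_to_rad(1.0))    # pixel size in μas
println(first(centers))           # center of the first pixel in radians
```

The pixel size works out to 6.25 μas, a few times finer than the 20-25 μas nominal resolution quoted above.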
Now we need to specify our image prior. For this work we will use a Gaussian Markov random field (GMRF) prior. Since we are using a GMRF prior, we first need to specify our mean image. This behaves somewhat similarly to an entropy regularizer in that it starts with an initial guess for the image structure. For this tutorial we will use a symmetric Gaussian with a FWHM of 50 μas.
using VLBIImagePriors
using Distributions
fwhmfac = 2*sqrt(2*log(2))
mpr = modify(Gaussian(), Stretch(μas2rad(50.0)./fwhmfac))
mimg = intensitymap(mpr, grid)
╭───────────────────────────────╮
│ 32×32 IntensityMap{Float64,2} │
├───────────────────────────────┴──────────────────────────────────────── dims ┐
↓ X Sampled{Float64} LinRange{Float64}(-4.69663253574863e-10, 4.69663253574863e-10, 32) ForwardOrdered Regular Points,
→ Y Sampled{Float64} LinRange{Float64}(-4.69663253574863e-10, 4.69663253574863e-10, 32) ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
ComradeBase.NoHeader()
└──────────────────────────────────────────────────────────────────────────────┘
↓ → -4.69663e-10 -4.39362e-10 … 4.39362e-10 4.69663e-10
-4.69663e-10 1.25675e-11 4.60978e-11 4.60978e-11 1.25675e-11
-4.39362e-10 4.60978e-11 1.69087e-10 1.69087e-10 4.60978e-11
-4.09062e-10 1.55054e-10 5.6874e-10 5.6874e-10 1.55054e-10
⋮ ⋱
4.09062e-10 1.55054e-10 5.6874e-10 … 5.6874e-10 1.55054e-10
4.39362e-10 4.60978e-11 1.69087e-10 1.69087e-10 4.60978e-11
4.69663e-10 1.25675e-11 4.60978e-11 4.60978e-11 1.25675e-11
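The factor fwhmfac converts a full width at half maximum into a Gaussian standard deviation. A quick standalone check of that conversion, using a peak-normalized one-dimensional profile defined only for this sketch:

```julia
fwhmfac = 2 * sqrt(2 * log(2))    # ≈ 2.355
fwhm    = 50.0                    # μas
σ       = fwhm / fwhmfac          # standard deviation in μas

profile(r) = exp(-r^2 / (2σ^2))   # peak-normalized Gaussian profile

println(profile(fwhm / 2))        # value at half the FWHM from the center
```

The profile drops to half of its peak at a distance of fwhm/2 from the center, which is exactly what a FWHM of 50 μas means.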
Now we can form the metadata we need to fully define our model. We will also fix the total flux to the observed value of 1.1. This is because the total flux is degenerate with a global shift in the gain amplitudes, so we break the degeneracy by using the observed total flux.
skymeta = (;ftot = 1.1, mimg = mimg./flux(mimg))
(ftot = 1.1, mimg = [1.2567547416996487e-11 4.60979671782088e-11 … 4.60979671782088e-11 1.2567547416996487e-11; 4.60979671782088e-11 1.690880891437348e-10 … 1.690880891437348e-10 4.60979671782088e-11; … ; 4.60979671782088e-11 1.690880891437348e-10 … 1.690880891437348e-10 4.60979671782088e-11; 1.2567547416996487e-11 4.60979671782088e-11 … 4.60979671782088e-11 1.2567547416996487e-11])
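The degeneracy between total flux and gain amplitudes can be demonstrated with a toy three-site array. The visibilities, gains, and the `corrupt` helper below are all illustrative and assume the usual model in which each visibility is multiplied by one gain and the conjugate of the other.

```julia
# Ideal visibilities on three baselines (illustrative numbers)
V = Dict((1, 2) => 0.6 + 0.1im, (1, 3) => 0.3 - 0.2im, (2, 3) => 0.4 + 0.0im)
g = [1.1 * cis(0.3), 0.9 * cis(-0.5), 1.2 * cis(1.0)]   # complex site gains

# Hypothetical helper: apply g_i * conj(g_j) to every baseline (i, j)
corrupt(V, g) = Dict(k => g[k[1]] * conj(g[k[2]]) * v for (k, v) in V)

a = 1.3
V_scaled = Dict(k => a^2 * v for (k, v) in V)   # source brighter by a^2
g_scaled = g ./ a                               # every gain amplitude lower by a

d1 = corrupt(V, g)
d2 = corrupt(V_scaled, g_scaled)
println(all(isapprox(d1[k], d2[k]) for k in keys(V)))
```

Both parameter sets predict identical data, so without an external constraint on the total flux the posterior would have a flat direction.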
To make the Gaussian Markov random field efficient, we first precompute a number of quantities that allow the cost to scale linearly with the number of image pixels. This returns a functional that accepts a single argument related to the correlation length of the field. The second argument defines the underlying random field of the Markov process. Here we are using a zero-mean, unit-variance Gaussian Markov random field. For this tutorial we will use the first-order random field.
cprior = corr_image_prior(grid, dvis)
HierarchicalPrior(
map:
ConditionalMarkov(
Random Field: GaussMarkovRandomField
Graph: MarkovRandomFieldGraph{1}(
dims: (32, 32)
)
) hyper prior:
Truncated(Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=1.0, θ=0.054358121593522137)
θ: 18.396515013483654
)
; lower=1.0, upper=64.0)
)
Putting everything together the total prior is then our image prior, a prior on the standard deviation of the MRF, and a prior on the fractional flux of the Gaussian component.
prior = (
    c = cprior,
    σimg = truncated(Normal(0.0, 0.5); lower=0.0),
    fg = Uniform(0.0, 1.0),
)
(c = HierarchicalPrior(
map:
ConditionalMarkov(
Random Field: GaussMarkovRandomField
Graph: MarkovRandomFieldGraph{1}(
dims: (32, 32)
)
) hyper prior:
Truncated(Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=1.0, θ=0.054358121593522137)
θ: 18.396515013483654
)
; lower=1.0, upper=64.0)
)
, σimg = Truncated(Distributions.Normal{Float64}(μ=0.0, σ=0.5); lower=0.0), fg = Distributions.Uniform{Float64}(a=0.0, b=1.0))
Now we can construct our sky model.
skym = SkyModel(sky, prior, grid; metadata=skymeta)
SkyModel
with map: sky
on grid: RectiGrid
Unlike other imaging examples (e.g., Imaging a Black Hole using only Closure Quantities) we also need to include a model for the instrument, i.e., gains. The gains will be broken into two components:
Gain amplitudes which are typically known to 10-20%, except for LMT, which has amplitudes closer to 50-100%.
Gain phases which are more difficult to constrain and can shift rapidly.
G = SingleStokesGain() do x
    lg = x.lgμ + x.lgσ*x.lgz
    gp = x.gp
    return exp(lg + 1im*gp)
end
intpr = (
    lgμ = ArrayPrior(IIDSitePrior(TrackSeg(), Normal(0.0, 0.2)); LM = IIDSitePrior(TrackSeg(), Normal(0.0, 1.0))),
    lgσ = ArrayPrior(IIDSitePrior(TrackSeg(), Exponential(0.1))),
    lgz = ArrayPrior(IIDSitePrior(ScanSeg(), Normal(0.0, 1.0))),
    gp = ArrayPrior(IIDSitePrior(ScanSeg(), DiagonalVonMises(0.0, inv(π^2))); refant=SEFDReference(0.0), phase=true)
)
intmodel = InstrumentModel(G, intpr)
InstrumentModel
with Jones: SingleStokesGain
with reference basis: CirBasis()
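The gain parameterization above can be unpacked with plain arrays. In this sketch one site has a single mean log-amplitude and scatter for the whole track, plus one standard-normal variable and one phase per scan; the numbers are illustrative and the variable names simply mirror the fields used in G.

```julia
lgμ, lgσ = 0.0, 0.1          # per-site mean and scatter of the log-gain amplitude
lgz = [-1.0, 0.0, 2.0]       # per-scan standard-normal variables
gp  = [0.3, -1.2, 2.5]       # per-scan gain phases in radians

lg = lgμ .+ lgσ .* lgz       # non-centered form of the log-amplitude
g  = exp.(lg .+ 1im .* gp)   # complex gain for each scan

println(abs.(g))             # amplitudes, equal to exp.(lg)
println(angle.(g))           # phases, equal to gp for values in (-π, π]
```

Writing the log-amplitude as a mean plus scatter times a unit-variance variable is a common non-centered parameterization; it tends to make hierarchical models like this one easier to sample.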
To form the posterior we just combine the sky model, the instrument model, and the data. Additionally, since we want to use gradients, we need to specify the AD mode. In essentially all cases we recommend using Enzyme.set_runtime_activity(Enzyme.Reverse). Eventually, as Comrade and Enzyme mature, we will no longer need set_runtime_activity.
using Enzyme
post = VLBIPosterior(skym, intmodel, dvis; admode=set_runtime_activity(Enzyme.Reverse))
VLBIPosterior
ObservedSkyModel
with map: sky
on grid: FourierDualDomainObservedInstrumentModel
with Jones: SingleStokesGain
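with reference basis: CirBasis()
Data Products: Comrade.EHTVisibilityDatum
To optimize and sample, it is convenient to first transform from the constrained parameter space to an unconstrained one. This is done using the asflat function.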
tpost = asflat(post)
ndim = dimension(tpost)
1369
We can now also find the dimension of our posterior or the number of parameters we are going to sample.
Warning
This can often be different from what you would expect. This is especially true when using angular variables where we often artificially increase the dimension of the parameter space to make sampling easier.
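One common way an angular variable ends up adding dimensions is to represent each angle by a point in the plane and recover the angle from its direction. The sketch below illustrates that idea only; whether Comrade's transform matches it in detail is an assumption, and `to_angle` is a hypothetical helper.

```julia
# Map an unconstrained point (x, y) ≠ (0, 0) to an angle in (-π, π]
to_angle(x, y) = atan(y, x)

println(to_angle(0.0, 1.0))   # a quarter turn
println(to_angle(2.0, 2.0))   # same direction as (0.5, 0.5), hence the same angle
println(to_angle(0.5, 0.5))
```

Each angle then costs two unconstrained coordinates instead of one, but the sampler never has to cross a hard boundary at ±π.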
To initialize our sampler we will first optimize the posterior using Adam
using Optimization
using OptimizationOptimisers
xopt, sol = comrade_opt(post, Optimisers.Adam(); initial_params=prior_sample(rng, post), maxiters=20_000, g_tol=1e-1)
((sky = (c = (params = [2.0076515236904918e-5 5.1690393548906376e-6 … -4.0974058163463525e-6 2.1983264529647484e-5; 2.153830896879591e-6 5.1018698769931056e-5 … 5.457923054809228e-5 -2.1704647852831583e-6; … ; 1.2400788460633849e-5 1.9465685549182595e-5 … 3.6096763509420226e-5 1.1800042740313767e-5; 1.9411964723872542e-7 1.2380609394774555e-5 … 1.263133134034929e-5 9.007003528022272e-6], hyperparams = 1.1548592798844655), σimg = 2.4377532105284474, fg = 0.19157807973147084), instrument = (lgμ = [0.015503527241854312, 0.024830226300738837, -0.25195755146660864, 0.015084620642853507, -0.8445479470451497, 0.1565260432147824, 0.015194088307542227], lgσ = [0.21205922098118046, 0.196360786742244, 0.14187210291354327, 0.11660606989112157, 0.5116638916264322, 0.2459838639835606, 0.10472240182981192], lgz = [-0.0012864501616474317, -0.0007345267808387717, -0.036675335474035814, 0.014288227734998639, 0.5668496440034954, 0.44729554232962543, -0.05323026393320384, 0.02577947608813775, 0.9323427483396638, 0.3057819099454429 … 0.05245969939271056, -0.0027004618322428765, -0.28057852231835356, -0.016733709669898282, -0.06545728644422705, 0.04126484651355116, -0.4020795787745577, 0.06681510883974712, -0.38966587342023873, -0.019493176090938622], gp = [0.0, -0.6387960180901588, 0.0, -2.1918389799153095, 0.6987476250837964, -0.8150609355110472, 0.0, -2.2413919025458613, 0.7355830737624022, -1.038684370842759 … -2.8053364311117193, -0.7896424068391976, 2.751695363652954, 2.3723782056444285, 0.0, -1.830549863778911, -2.9170144587396285, -0.768173068340471, 2.604899333599963, 2.390780727003531])), retcode: Default
u: [2.0076515236904918e-5, 2.153830896879591e-6, 3.978903893366981e-5, 3.2452264957551606e-5, 7.098397238123091e-5, 8.813715172317709e-5, 0.00012092049637605513, 0.00015447627301449932, 0.00018102941759448756, 0.00021832784948879182 … 0.05691465900519313, 0.9376935374616415, -0.10469594430661146, 0.9335795848574782, 0.02016739895860438, 0.9392138589366785, -0.13740858031984554, 0.9293177081828191, 0.01728670303528503, 0.9392599664570503]
Final objective value: -1525.8804326770107
)
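For readers unfamiliar with Adam, here is a minimal standalone version of its update rule applied to a toy quadratic. It is a sketch of the general algorithm with commonly used default hyperparameters, not the implementation in Optimisers.jl; the function name `adam` and the test problem are made up for illustration.

```julia
# Minimal Adam loop; grad returns the gradient of the objective at x
function adam(grad, x0; lr = 0.01, β1 = 0.9, β2 = 0.999, epsilon = 1e-8, iters = 5_000)
    x = copy(x0)
    m = zero(x)   # running mean of the gradient
    v = zero(x)   # running mean of the squared gradient
    for t in 1:iters
        g = grad(x)
        @. m = β1 * m + (1 - β1) * g
        @. v = β2 * v + (1 - β2) * g^2
        mhat = m ./ (1 - β1^t)   # bias-corrected first moment
        vhat = v ./ (1 - β2^t)   # bias-corrected second moment
        @. x -= lr * mhat / (sqrt(vhat) + epsilon)
    end
    return x
end

target = [1.0, -2.0]
grad(x) = 2 .* (x .- target)     # gradient of sum((x .- target).^2)
xmin = adam(grad, zeros(2))
println(xmin)                    # close to target
```

Because the step is normalized by the running gradient scale, each parameter moves by roughly lr per iteration early on, which is convenient when parameters such as pixel fluctuations and gains live on very different scales.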
Warning
Fitting gains tends to be very difficult, meaning that optimization can take a lot longer. The upside is that we usually get nicer images.
First we will evaluate our fit by plotting the residuals.
using Plots
using DisplayAs
residual(post, xopt) |> DisplayAs.PNG |> DisplayAs.Text
These look reasonable, although there may be some minor overfitting. This could be improved in a few ways, but that is beyond the goal of this quick tutorial. Plotting the image, we see that we have a much cleaner version of the closure-only image from Imaging a Black Hole using only Closure Quantities.
import CairoMakie as CM
g = imagepixels(fovx, fovy, 128, 128)
img = intensitymap(skymodel(post, xopt), g)
imageviz(img, size=(500, 400))|> DisplayAs.PNG |> DisplayAs.Text
Because we also fit the instrument model, we can inspect its parameters. To do this, Comrade provides a caltable function that converts the flattened gain parameters to a tabular format based on the time and its segmentation.
intopt = instrumentmodel(post, xopt)
gt = Comrade.caltable(angle.(intopt))
plot(gt, layout=(3,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text
The gain phases are pretty random, although much of this is due to us picking a random reference site for each scan.
Moving onto the gain amplitudes, we see that most of the gain variation is within 10% as expected except LMT, which has massive variations.
gt = Comrade.caltable(abs.(intopt))
plot(gt, layout=(3,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text
To sample from the posterior, we will use HMC, specifically the NUTS algorithm. For information about NUTS, see Michael Betancourt's notes. However, due to the need to sample a large number of gain parameters, constructing the posterior is rather time-consuming. Therefore, for this tutorial, we will only do a quick preliminary run.
using AdvancedHMC
chain = sample(rng, post, NUTS(0.8), 1_000; n_adapts=500, progress=false, initial_params=xopt)
PosteriorSamples
Samples size: (1000,)
sampler used: AHMC
Mean
┌───────────────────────────────────────────────────────────────────────────────
│ sky ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, σ ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [-0.00726376 0.0226817 … -0.0169236 -0.032161; 0.00759771 0.0 ⋯
└───────────────────────────────────────────────────────────────────────────────
2 columns omitted
Std. Dev.
┌───────────────────────────────────────────────────────────────────────────────
│ sky ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, σ ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [0.531322 0.627261 … 0.620804 0.545255; 0.604893 0.66599 … 0. ⋯
└───────────────────────────────────────────────────────────────────────────────
2 columns omitted
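HMC and NUTS both rely on leapfrog integration to propose distant points while approximately conserving a Hamiltonian. The sketch below shows one leapfrog trajectory for a one-dimensional standard normal target; it illustrates the integrator only and is not AdvancedHMC's implementation, and all names in it are chosen for this example.

```julia
# Potential energy U(q) = q^2 / 2 corresponds to a standard normal target
gradU(q) = q
H(q, p) = q^2 / 2 + p^2 / 2           # Hamiltonian with unit mass

function leapfrog(q, p, ϵ, nsteps)
    p -= ϵ / 2 * gradU(q)             # initial half step in momentum
    for i in 1:nsteps
        q += ϵ * p                    # full step in position
        # full momentum step, except a half step after the last position update
        p -= (i == nsteps ? ϵ / 2 : ϵ) * gradU(q)
    end
    return q, p
end

q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, 0.1, 20)
println(abs(H(q1, p1) - H(q0, p0)))   # small energy error
```

The small energy error is what gives such proposals a high acceptance probability, and the 0.8 passed to NUTS above is the target acceptance rate used when adapting the step size.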
Note
The above sampler will store the samples in memory, i.e. RAM. For large models this can lead to out-of-memory issues. To fix that, you can include the keyword argument saveto = DiskStore(), which periodically saves the samples to disk, limiting memory usage. You can load the chain using load_samples(diskout), where diskout is the object returned from sample.
Now we prune the adaptation phase.
chain = chain[501:end]
PosteriorSamples
Samples size: (500,)
sampler used: AHMC
Mean
┌───────────────────────────────────────────────────────────────────────────────
│ sky ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, σ ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [0.016837 0.0416006 … -0.0278728 -0.0152579; 0.0312651 0.0849 ⋯
└───────────────────────────────────────────────────────────────────────────────
2 columns omitted
Std. Dev.
┌───────────────────────────────────────────────────────────────────────────────
│ sky ⋯
│ @NamedTuple{c::@NamedTuple{params::Matrix{Float64}, hyperparams::Float64}, σ ⋯
├───────────────────────────────────────────────────────────────────────────────
│ (c = (params = [0.510519 0.631306 … 0.619614 0.567119; 0.613664 0.65624 … 0. ⋯
└───────────────────────────────────────────────────────────────────────────────
2 columns omitted
Warning
This should likely be run for an order of magnitude more steps to properly estimate expectations of the posterior.
Now that we have our posterior, we can put error bars on all of our plots above. Let's start by finding the mean and standard deviation of the gain phases.
mchain = Comrade.rmap(mean, chain)
schain = Comrade.rmap(std, chain)
(sky = (c = (params = [0.510519023047925 0.6313058268341921 … 0.6196142965256104 0.567118931457733; 0.6136644244252576 0.656240126282091 … 0.6881275513692006 0.5971345459473286; … ; 0.606848611625147 0.6196036068257442 … 0.6089879743606179 0.5914669830767639; 0.5100055693840528 0.5992250165912231 … 0.5717002184835416 0.5010882018389208], hyperparams = 15.466028930510648), σimg = 0.22202854429578175, fg = 0.1076275860323719), instrument = (lgμ = [0.0110922085331063, 0.008684341415643716, 0.06681997173301708, 0.006126888288366532, 0.07911176504080264, 0.10534648056400905, 0.006123890156418069], lgσ = [0.007636931397969585, 0.008775891074169477, 0.008905921687187712, 0.004074411204544212, 0.03833858479688895, 0.01878095964800656, 0.004057112323990045], lgz = [1.0080120589556114, 0.6625506864694438, 0.4553062145746777, 0.6014425450760678, 0.3167931718781745, 0.45982463271308593, 0.4269795887198531, 0.5487565568163734, 0.36021530559923937, 0.42548160854429945 … 0.8469153840883928, 0.9896393334881259, 0.21823073825384245, 0.9638933294912603, 0.41926010388572105, 0.6103054662586329, 0.9567224193376066, 0.954919087433182, 0.23372663280331907, 0.9014492910599783], gp = [0.0, 0.3897020216918826, 0.0, 0.018501528866066118, 0.2089089947943557, 0.3897986279729577, 0.0, 0.017939670182806317, 0.20940124996146156, 0.3918719142016837 … 0.2337967486421316, 0.3355323141620888, 2.9776800612774057, 2.8011526128207405, 0.0, 0.017033635052403878, 0.2310834940642554, 0.33301474951209825, 2.5444709093115114, 2.8345131928395517]))
Now we can use the Measurements package to automatically plot everything with error bars. First we create a caltable the same way, but making sure all of our variables have errors attached to them.
using Measurements
gmeas = instrumentmodel(post, (;instrument= map((x,y)->Measurements.measurement.(x,y), mchain.instrument, schain.instrument)))
ctable_am = caltable(abs.(gmeas))
ctable_ph = caltable(angle.(gmeas))
───────────────────────────────────────────────────┬────────────────────────────
time │ AA AP ⋯
───────────────────────────────────────────────────┼────────────────────────────
IntegrationTime{Float64}(57849, 0.916667, 0.0002) │ 0.0±0.0 missing ⋯
IntegrationTime{Float64}(57849, 1.21667, 0.0002) │ 0.0±0.0 -2.192±0.019 ⋯
IntegrationTime{Float64}(57849, 1.51667, 0.0002) │ 0.0±0.0 -2.241±0.018 ⋯
IntegrationTime{Float64}(57849, 1.81667, 0.0002) │ 0.0±0.0 -2.276±0.017 ⋯
IntegrationTime{Float64}(57849, 2.11667, 0.0002) │ 0.0±0.0 -2.329±0.018 ⋯
IntegrationTime{Float64}(57849, 2.45, 0.0002) │ missing -2.329±0.018 ⋯
IntegrationTime{Float64}(57849, 2.75, 0.0002) │ 0.0±0.0 -2.402±0.017 ⋯
IntegrationTime{Float64}(57849, 3.05, 0.0002) │ 0.0±0.0 -2.435±0.017 ⋯
IntegrationTime{Float64}(57849, 3.35, 0.0002) │ 0.0±0.0 -2.467±0.018 ⋯
IntegrationTime{Float64}(57849, 3.68333, 0.0002) │ 0.0±0.0 -2.468±0.018 0 ⋯
IntegrationTime{Float64}(57849, 3.98333, 0.0002) │ 0.0±0.0 -2.438±0.02 0 ⋯
IntegrationTime{Float64}(57849, 4.28333, 0.0002) │ 0.0±0.0 -2.456±0.021 0 ⋯
IntegrationTime{Float64}(57849, 4.58333, 0.0002) │ 0.0±0.0 -2.371±0.021 0 ⋯
IntegrationTime{Float64}(57849, 4.91667, 0.0002) │ 0.0±0.0 -2.425±0.018 0. ⋯
IntegrationTime{Float64}(57849, 5.18333, 0.0002) │ 0.0±0.0 -2.385±0.019 - ⋯
IntegrationTime{Float64}(57849, 5.45, 0.0002) │ 0.0±0.0 -2.303±0.021 -0 ⋯
⋮ │ ⋮ ⋮ ⋱
───────────────────────────────────────────────────┴────────────────────────────
5 columns and 9 rows omitted
Now let's plot the phase curves
plot(ctable_ph, layout=(4,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text
and now the amplitude curves
plot(ctable_am, layout=(4,3), size=(600,500)) |> DisplayAs.PNG |> DisplayAs.Text
Finally let's construct some representative image reconstructions.
samples = skymodel.(Ref(post), chain[begin:5:end])
imgs = intensitymap.(samples, Ref(g))
mimg = mean(imgs)
simg = std(imgs)
fig = CM.Figure(;resolution=(700, 700));
axs = [CM.Axis(fig[i, j], xreversed=true, aspect=1) for i in 1:2, j in 1:2]
CM.image!(axs[1,1], mimg, colormap=:afmhot); axs[1, 1].title="Mean"
CM.image!(axs[1,2], simg./(max.(mimg, 1e-8)), colorrange=(0.0, 2.0), colormap=:afmhot);axs[1,2].title = "Std"
CM.image!(axs[2,1], imgs[1], colormap=:afmhot);
CM.image!(axs[2,2], imgs[end], colormap=:afmhot);
CM.hidedecorations!.(axs)
fig |> DisplayAs.PNG |> DisplayAs.Text
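The per-pixel statistics shown in the figure can be reproduced on plain arrays. In this sketch random positive matrices stand in for the posterior image samples, and the names `pixel_mean`, `pixel_std`, and `frac` are chosen for illustration; the floor of 1e-8 mirrors the one used in the plotting code above.

```julia
using Statistics

# Stand-in for posterior image samples: 50 random positive 4×4 arrays
imgs = [rand(4, 4) .+ 0.5 for _ in 1:50]

cube = cat(imgs...; dims = 3)                            # 4×4×50 array of samples
pixel_mean = dropdims(mean(cube; dims = 3); dims = 3)    # per-pixel mean
pixel_std  = dropdims(std(cube; dims = 3); dims = 3)     # per-pixel standard deviation
frac = pixel_std ./ max.(pixel_mean, 1e-8)               # fractional uncertainty

println(size(frac))
```

Note that the panel titled "Std" in the figure shows this ratio of standard deviation to mean rather than the raw standard deviation, so values near 1 or above mark pixels that are poorly constrained.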
And voilà, you have just finished making a preliminary image and instrument model reconstruction. In reality, you should run the sample step for many more MCMC steps to get a reliable estimate for the reconstructed image and instrument model parameters.
This page was generated using Literate.jl.