Qubefit

This is an extension of the base class Qube. Besides the functions available in this class, it defines several additional functions needed to fit a model to the data and to analyze this model.

class qubefit.qubefit.QubeFit(modelname='ThinDisk', probmethod='ChiSq', intensityprofile=None, velocityprofile=None, dispersionprofile=None)

Initiate the Qubefit class.

Class that is used for fitting data cubes. This class is based on the Qube class and inherits all of its functions. The additional functions defined here are to define a model, run the fitting routine and analyze the results.

When it is initiated, the following data attributes are defined and can be explcitely set: modelname, probmethod, intensityprofile, velocity profile, and dispersionprofile.

Parameters:
  • modelname (STRING, optional) – The name of the model used for the fitting. The default is ‘ThinDisk’.

  • probmethod (STRING, optional) – The name of the probability function to use. The default is ‘ChiSq’.

  • intensityprofile (LIST, optional) – List of three strings with the names of the profiles to use for the intesity of the model. The three strings correspond to the three dimensions used to describe the model. For a cylindrical coordinate system this is r, phi and z in that order. The default is [‘Exponential’, None, ‘Exponential’].

  • velocityprofile (LIST, optional) – List of three strings with the names of the profiles to use for the velocity of the model. The three strings correspond to the three dimensions used to describe the model. For a cylindrical coordinate system this is r, phi and z in that order. The default is [‘Constant’, None, None].

  • dispersionprofile (TYPE, optional) – List of three strings with the names of the profiles to use for the disperdsion of the model. The three strings correspond to the three dimensions used to describe the model. For a cylindrical coordinate system this is r, phi and z in that order. The default is [‘Constant’, None, None].

calculate_chisquared(reduced=True, adjust_for_kernel=True)

Calcuate the (reduced) chi-squared statistic.

This method will calculate the (reduced) chi-squared statistic for the given model, data and pre-defined mask. The value can be adjusted for the oversampling of the beam using a simplified assumption, which is described in detail in the reference paper.

Parameters:
  • reduced (BOOLEAN, optional) – If set, this will calculate the reduced chi-square statistic, by dividing by the degrees of freedom and optionally by the adjustment factor to account for the oversampled beam. The default is True.

  • adjust_for_kernel (BOOLEAN, optional) – If set, it will apply an adjustement factor to the reduced chi-suared statistic to account for oversampling the beam. If the mask is only sparsely sampled, this adjustment factor should be set to False. The default is True.

Raises:

AttributeError – This method will return an AttributeError if no mask has yet been defined.

Returns:

chisq – The (reduced) chi-square statistic of the model within the mask.

Return type:

FLOAT

calculate_ksprobability()

Calculate the Kolmogorov-Smirnov probability of the model

create_gaussiankernel(channels=None, lsf_sigma=None, kernelsize=4)

Create a Gaussian kernel.

This method will generate a Gaussian kernel from the data stored in the Qube. It will look for the beam keyword and generate a gaussian kernel from the shape of the synthesized beam given here. Several different kernels can be returned. 1) A list of 3D kernels -one for each spectral channel- that takes into account both a varying LSF and a varying PSF (set this by selecting a range of channels and LSFSigma) 2) A list of 2D kernels -one for each spectral bin- that takes into account a varying PSF, but no LSF (set this by selecting a range of channels and LSFSigma=None) 3) A 3D kernel that has constant PSF and LSF Only the last option is currently implemented in the code and should therefore be used (i.e., specify a single channel not a list).

This method will populate the kernel and kernelarea data attributes of the qubefit instance.

Parameters:
  • channels (LIST, NUMPY.ARRAY or INT, optional) – The channels to use to create the beam. If this is not specified, then the center of the cube will be used. If multiple channels are specified, then a list of kernels will be given, one for each channel. This is useful when the kernel/PSF changes with the channel. This option, however, is currently NOT implemented in the code and therefore just a single channel should be specified. The default is None.

  • lsf_sigma (FLOAT or NUMPY.ARRAY, optional) – The width of the line spread function. Note that this is the sigma or square root of the variace, NOT the FWHM!. The unit for this value is pixels. If this is not specfied, then the LSF is assumed to be neglible, which is often ok for ALMA data, which has been averaged over many channels. The default is None.

  • kernelsize (FLOAT, optional) – The size of the kernel in terms of the sigma of the major axis in pixels (bsig). The actual size of the kernel will be a cube that has a size for the two spatial dimensions: 2 (n * bsig) + 1, where n is the kernelsize. The default is 4.

Raises:

AttributeError – Function will raise an attribute error if the beam has not been propertly defined.

Returns:

None

Return type:

NoneType

create_maskarray(sampling=2.0, bootstrapsamples=200, regular=None, sigma=None, nblobs=1, fmaskgrow=0.01)

Generate the mask for fitting.

This will generate the mask array. It should only be done once so that the mask does not change between runs of the MCMC chain, which could/will result in slightly different probabilities for the same parameters. The mask returned is either the same size as the data cube (filled with np.NaN and 1’s for data to ignore/include) or an array with size self.data.size by bootstrapsamples. The first is used for directly calculating the KS or ChiSquared value of the given data point, while the second is used for the bootstrap method of both functions.

Several mask options can be specified. It should be noted that these masks are muliplicative. This method will set the maskarrray data attribute.

Parameters:
  • sampling (FLOAT, optional) – The number of samples to choose per beam/kernel area. This keyword is only needed for the sparse sampling methods and for bootstrap arrays. The default is 2.

  • bootstrapsamples (INT, optional) – The number of bootstrap samples to generate. This keyword is only needed if the probability method is set to ‘BootKS’ or ‘BootChiSq’. The default is 200.

  • regular (TUPLE, optional) – If a two element tuple is given, then a regular grid will be generated that is specified by the number of samplings per kernelarea which will go through the given tuple. The default is None.

  • sigma (FLOAT, optional) – If given, a mask is created that will just consider values above the specified sigma (based on the calculated variance). Only the n largest blobs will be considered (specified optionally by nblob). It will grow this mask by convolving with the kernel to include also adjacent ‘zero’ values.

  • nblobs (INT, optional) – The number of blobs to consider in the sigma-mask method. The blobs are ordered in size, where the largest blob which is the background is ignored. If set to 1, this will select only the largest non-background blob. The default is 1.

  • fmaskgrow (FLOAT, optional) – The fraction of the peak value which marks the limit of where to cut of the growth factor. If set to 1, the mask is not grown. This value allows to include some adjacent zero values to be included.

Returns:

None

Return type:

NoneType

create_model(do_convolve=True)

Create the model cube with the given parameters and model.

This method will take the stored parameters (in self.par) and will generate the requested model. NOTE: currently no checking is done that the parameters are actually defined for the model that is requested. This will likely result in a rather benign AttributeError.

This method will set the model data attribute using the parameters stored in the par data attribute and the model defined elsewhere.

Parameters:

do_convolve (BOOLEAN, optional) – If set, the model will be convolved with the beam. The default is True.

Returns:

None

Return type:

NoneType

get_chainresults(filename=None, burnin=0.3, reload_model=True, load_best=False)

Get the results from the MCMC run either from file or memory.

Calling this method will generate a dictionary with the median values, and 1, 2, 3 sigma ranges of the parameters. These have been converted into their original units using the conversions in the the initial parameter attribute (initpar).

If not present, this method will populate the self.mcmcarray and self.mcmclnprob attributes with the values in the file. It will also populate the self.chainpar data attribute with a dictionary with the median parameters, uncertainty and unit conversions.

Parameters:
  • filename (STRING, optional) – The name of the HDF5 file to load. If the filename is set to None, the assumption is that the qubefit instance already has the MCMC chain and log probability loaded into their respective instances. If this is not the case, the code will exit with an AttributeError. The default is None.

  • burnin (FLOAT, optional) – The fraction of runs to discard at the start of the MCMC chain. This is necessary as the chain might not have converged in the very beginning. One should check the chain to make sure that this value is correct. The default is 0.3.

  • reload_model (BOOLEAN, optional) – reload the model cube with the updated parameters. The default parameters to use are the median values of each parameter. The default is True.

  • load_best (BOOLEAN, optional) – If set, then instead of the median parameters, the combination of parameters that yielded the highest probability will be chosen. The default is False.

Raises:

AttributeError – An attribute error will be raised if the needed mcmcarray and lnprobability keys are not set in the qubefit instance and no filename is given.

Returns:

None

Return type:

NoneType

load_initialparameters(parameters)

Load the intial parameter dictionary into the qubefit instance.

This method will load in the initial parameters from a dictionary. The dictionary is transfered into a data attribute, in addition several other attributes are created which are needed for the model and the MCMC fitting procedure.

This method will set the initpar, par, mcmcpar, mcmcmap, priordist and mcmcdim data attributes. The initpar attribute is a simple copy of the parameter dictionary. The par data attribute contains the value of the parameters converted into intrinsic units. mcmcpar and mcmcmap are the values and names of the parameters not held fixed during the fitting procedure and mcmcdim are the number of free parameters. Finally priordist is the prior distribution of each parameter that is not held fixed (see scipy.stats).

Parameters:

parameters (DICT) – The dictionary describes several important features for each parameter of the model. A detailed description is given in the online documentation. In general, it contains a ‘Value’ and ‘Unit’, key, which are in physically interesting units. A ‘Conversion’ to intrinsic units of the cube. A ‘Fixed’ key to allow a parameter to be kept fixed, and finally a ‘Dist’ keyword, which describes the priors of the parameters (using the keys: ‘Dloc and ‘Dscale’).

Returns:

None

Return type:

NoneType

run_mcmc(nwalkers=50, nsteps=100, nproc=None, init_frac=0.02, filename=None, return_sampler=False)

Run the MCMC process with emcee.

This method is the heart of QubeFit. It will run an MCMC process on a predefined model and will return the resulting chain of the posterior PDFs of each individual parameter that was varied. It saves the results in the mcmcarray and mcmclnprob data attributes, but it is HIGHLY RECOMMENDED to also save the outputs into a HDF5 file.

Parameters:
  • nwalkers (INTEGER, optional) – The number of walkers to use in the MCMC chain. The default is 50.

  • nsteps (INTEGER, optional) – The number of steps to make in the MCMC chain. The default is 100.

  • nproc (INTEGER, optional) – The number of parallel processes to use. If set to None, the code will determine the optimum number of processes to spawn. Set this number to limit to load of this code on your system. The default is None.

  • init_frac (FLOAT, optional) – The fraction to use to randomize the initial distribution of walker away from the chosen initial value. The default is 0.02.

  • filename (STRING, optional) – If set, the chain will be saved into a file with this file name. The file format is HDF5, and if not directly specified, this extension will be appended to the filename. The default is None.

  • return_sampler (BOOLEAN, optional) – If set, this will return the emcee ensemble sampler. The default is False.

Raises:

ValueError – A ValueError will be raised if the initial probability is infinity. This likely is an indication that the initial parameters fall outside the range defined by the priors.

Yields:

sampler (EMCEE.ENSEMBLESAMPLER) – The ensemble sampler returned by the emcee function call can be returned, if wanted.

update_parameters(parameters)

Update the parameters key with the given parameters.

This method is a simple wrapper to update one or multiple parameters in the par data attribute. These parameters should be given in intrinsic units.

Parameters:

parameters (DICT) – dictionary of parameters that will be read into the par keyword.

Returns:

None

Return type:

NoneType