# Running a Refinement

In order to refine potential parameters the following is required:

A

`Universe`

which contains one or more atoms. (Building a Universe)A

`Simulation`

which runs using the`Universe`

. (Running a Simulation)One or more

`Parameters`

to refine. (Selecting fitting Parameters)One or more experimental datasets.

For more information on how to produce the first three of these, please see the tutorials in the brackets.

## Experimental datasets

Whether or not an experimental dataset can be refined against depends on:

if the relevant experimental

`Observable`

(e.g. dynamic structure factor, S(Q,w)) has been implemented.if a

`Reader`

for the specific file format of the data has been implemented (e.g. the`LAMPSQw`

`Reader`

, which reads the LAMP output file format).

If both of these exist, an experimental experimental dataset can be used. If only the former has been implemented, then it would be necessary to add a subclass of the `ObservableReader`

class to use the experimental dataset; more information about how to do is included within the `ObservableReader`

documentation.

Each experimental dataset must be specified by a dictionary containing the following:

`file_name`

: A string with the file name of the experimental data`type`

: A string specifying the type of`Observable`

which describes the data (e.g. SQw if the data is the dynamic structure factor, PDF if the data is the pair distribution function).`reader`

: A string specifying the name of the`ObservableReader`

which will be used to read the file.`weight`

: A float which determines the relative importance weighting of this dataset to other datasets.`rescale_factor`

, optional : A float which rescales the experimental data linearly in order to match the absolute scale of the simulated data when calculating the Figure of Merit (FoM). Default is`1.`

, i.e. no change to the data.`auto_scale`

, optional : A boolean specifying whether to automatically set the aforementioned`rescale_factor`

at each step of the refinement to the value which minimizes the FoM. Default is`False`

.`use_FFT`

, optional : A boolean specifying whether to use the Fast Fourier Transform (FFT) in the calculation of variables from MD, which are faster but can impose restrictions on the uniformity of variables. Default is`True`

.`resolution`

: a one-line dict of format`{'file' : str}`

or`{key : float}`

. if`{'file' : str}`

, the`str`

should be a file in the same format as`file_name`

containing results of a vanadium sample which is used to determine instrument energy resolution for this dataset. If`{key : float}`

, the key should be the type of numeric resolution, with the key being the function type (e.g.`'gaussian'`

or`'lorentzian'`

), and the float the FWHM for that function.

```
[ ]:
```

```
# Dataset from: Johan Qvist et al, J. Chem. Phys. 134, 144508 (2011)
QENS = {'file_name':'data/263K05Awat_LAMP',
'type':'SQw',
'reader':'LAMPSQw',
'weight':1.,
'auto_scale':True,
'use_FFT':False,
'resolution':{'file': 'data/262p7K0A5van_LAMP'}}
```

As the dataset we are using has both negative energy values and non-uniform spacing, the default behaviour of `use_FFT=True`

would cause the data to be interpolated as part of the refinement process. Setting this to `False`

allows us to preserve the original energy values.

The default behaviour of the scaling (`rescale_factor=1.`

, `auto_scale=False`

) assumes that the dataset provided has been properly scaled and normalised for the refinement process. This is the preferred way of using MDMC, and arbitrary or automatic rescaling should be undertaken with care. For example, using `auto_scale`

to determine the scaling does not take into account any physical aspects of scaling the data, such as the presence or absence of background events from peaks outside its
range.

The specific manner in which the `weight`

applies to calculating the FoM depends on the particular FoM which is used for the refinement. By default this is a least-squares difference weighted by the experimental error (`StandardFoMCalculator`

):

\(FoM_{total} = \sum_{i} FoM_{i}\)

where the sum is over the number of experimental datasets and we calculate a weighted FoM for the \(i\)-th dataset as follows:

\(FoM_{i} = w_{i} \sum_{j} \left( \frac{D_{j}^{exp} - D_{j}^{sim}}{\sigma_{j}^{exp}} \right)^2\)

\(D_{j}\) is an observable, \(\sigma_{j}\) is the error associated with the observable, and \(exp\) and \(sim\) refer to the experimental and simulated data respectively. \(D_{j}\) and \(\sigma_{j}\) are N-dimensional arrays of all the data points for a dataset.

The `exp_datasets`

parameter which is passed to `Control`

is a list of all dataset dictionaries. In this instance there is only a single experimental dataset:

```
[ ]:
```

```
exp_datasets = [QENS]
```

If another dataset were to be included, for example the pair distribution function, this would require its own dictionary:

```
[ ]:
```

```
n_diffraction = {'file_name':'data/water_PDF',
'type':'PDF',
'reader':'ASCII',
'weight':1.,
'auto_scale':True,
'resolution': {'gaussian': 84}}
two_exp_datasets = [QENS, n_diffraction]
```

## Controlling the refinement

Continuing from the Running a Simulation tutorial, using a `Universe`

filled with SPCE water molecules. As this minimizes and equilibrates a small simulation, it may take ~90s to execute.

The `%%capture`

and `%run`

commands below simply executes the Running a Simulation notebook and captures the variables into this notebook. They are only valid if they are executed in the same folder as the Running a Simulation notebook. Otherwise, please start from the Running a Simulation to set the same state.

```
[ ]:
```

```
%%capture
# Run Running a Simulation notebook and hide output
%run "running-a-simulation.ipynb"
```

### Controlling the parameters

The `Parameter`

s of the `Universe`

can have various restrictions placed on them for the purposes of refinement.

```
[ ]:
```

```
universe = simulation.universe
fit_parameters = universe.parameters
fit_parameters.sort(key=lambda p : p.name)
print(fit_parameters)
```

The value of one `Parameter`

can be `tied`

to that of another if there is some physical reason why it should be dependent on it. For example, in a water molecule setting the charge of the oxygen to be -2 times that of the hydrogen:

```
[ ]:
```

```
fit_parameters[1].set_tie(fit_parameters[0], ' * - 2')
print(fit_parameters[1])
```

The value of a `Parameter`

can have `constraints`

set to limit it’s value to a certain range:

```
[ ]:
```

```
fit_parameters[2].constraints = (0.6, 0.7)
print(fit_parameters[2])
```

Finally, a `Parameter`

can be `fixed`

outright to whatever value it currently has.

```
[ ]:
```

```
fit_parameters[7].fixed = True
print(fit_parameters[7])
```

Note that `Parameter`

s that are `fixed`

, `tied`

, or equal to 0 cannot be refined and so will not be passed to the `Minimizer`

when the `Control`

object is created. If `constraints`

are set then refinement will occur however any proposed values that happen to lie outside the range of the `constraints`

will be clipped.

Restricting the value of a `Parameter`

in these ways applies to the Monte Carlo aspect of MDMC. Namely, it affects how the `Parameter`

is allowed to change between refinement steps. It should not be confused with the use of `constraint_algorithm`

s when creating a `Universe`

(see Building a Universe). The latter controls whether or not aspects of the `Universe`

such as bonds are treated as rigid or not during molecular dynamics. It is entirely possible
to have a rigid bond but allow the length of that bond to change between refinement steps, or conversely have a bond that is free to oscillate during MD but the equilibrium length is not altered as part of the refinement.

### Controlling the figure of merit

The figure of merit (FoM) used in the refinement process to assess the goodness of fit between MD and experimental data can be configured using a dictionary with the following key/value pairs: - `error`

: `{'exp', 'none'}`

Whether to weight the FoM using the experimental errors of the data or not (respectively). The former results in values with larger (relative) error contributing less to the overall FoM. - `norm`

: `{'data_points', 'dof', 'none'}`

Whether to normalise the FoM using the
total number of data points, the degrees of freedom (number of data points less the number of varying parameters), or not at all.

If not provided then the FoM defaults to using the first options, namely:

```
[ ]:
```

```
FoM_options = {'error':'exp', 'norm':'data_points'}
```

Currently, the only option for the overall form of the FoM is that of a Chi-square. Details of the implementation can be obtained by using `help`

, for example:

```
[ ]:
```

```
from MDMC.refinement.FoM import ChiSquaredExpError
help(ChiSquaredExpError)
```

### Running the refinement

The `Control`

object sets up and runs the refinement (using `Control.refine`

). It uses the experimental datasets to refine the fitting `Parameters`

for the specified `Simulation`

:

```
[ ]:
```

```
# Assuming a Universe called universe and a Simulation called simulation have been created
from MDMC.control import Control
control = Control(simulation=simulation,
exp_datasets=exp_datasets,
fit_parameters=fit_parameters,
MC_norm=1,
minimizer_type="MMC",
FoM_options = FoM_options,
MD_steps=424620,
equilibration_steps=1000)
```

To refine the specified `fit_parameters`

, pass the number of refinement steps to `Control.refine`

.

If 0 steps are passed, an MD simulation is performed, the simulation `Observable`

is calculated, and the FoM is calculated, however the `fit_parameters`

are not modified. This can be used to determine the agreement between a simulation with the current `fit_parameters`

and the experimental data.

During the refinement, in addition to running the MD that is used to calculate `Observable`

s, the `Universe`

can be equilibrated by providing a number of `equilibration_steps`

that will be run in between the refinement steps. In general what constitutes an appropriate amount of equilibration will depend on both the `Universe`

and the `Parameters`

being varied. However, provided the changes caused by the refinement process are small, less equilibration should be needed between steps
than would be when first creating the `Simulation`

object. `equilibration_steps`

is an optional argument, with a default value of 0 (no equilibration).

```
[ ]:
```

```
# So that the MD simulation size can be minimized, the Q min is increased and
# the Q resolution is reduced.
import numpy as np
exp_obs = control.observable_pairs[0].exp_obs
Q_slice = slice(6, len(exp_obs.Q), 2)
Q = exp_obs.Q[Q_slice]
E = exp_obs.E
# copy the updated Q values back to the control.observable
control.observable_pairs[0].exp_obs.independent_variables = {'E':E, 'Q':Q}
control.observable_pairs[0].MD_obs.independent_variables = {'E':E, 'Q':Q}
```

## Resolution Function

As the dataset included a vanadium sample, this will be used to determine the instrument resolution as a function of momentum. This function is then used in calculations involving that corresponding dataset. Note for numerically defined reoslutions, the resolution function is not accessible as an array. The resolution function can be accessed from the `Observable`

(note that it is transformed into the time domain to enable it to be applied multiplicatively to `FQt`

rather than convolving
with `SQw`

):

```
[ ]:
```

```
exp_obs = control.observable_pairs[0].exp_obs
resolution_function = exp_obs.resolution_functions['SQw']
# Note that for multidimensional resolution functions, the innermost independent variable should be passed first
t = np.linspace(0, 1e5)
resolution_array = resolution_function(t, Q)
```

The real and imaginary components of this can then be plotted as a 3D surface (note that for a Gaussian lineshape in the energy domain, one would expect no imaginary components and another Gaussian shape in the time domain):

```
[ ]:
```

```
%matplotlib notebook
from matplotlib import cm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def plot_SQw(Q, E, SQw):
fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y = np.meshgrid(E, Q)
surf = ax.plot_surface(X, Y, SQw, cmap=cm.viridis)
plt.show()
```

```
[ ]:
```

```
plot_SQw(Q, t, np.real(resolution_array))
plot_SQw(Q, t, np.imag(resolution_array))
```

Note that for values outside the original range of the vanadium sample, nearest neighbour extrapolation is used.

When applied to calculated data the resolution window is normalised to have a value of 1 at time zero for all momentum. As the zero time component is directly proportional to the integral in the energy domain, this is equivalent to enforcing the static structure factor of vanadium is constant for all momentum.

Once normalised in this way, this can be compared to the resolution from assuming a Gaussian lineshape with a FWHM of 84 ueV at a specific Q value:

```
[ ]:
```

```
from MDMC.common.constants import h
sigma_e = 2 * np.sqrt(2 * np.log(2)) * 0.084
# h is in units of eV s whereas system units are meV fs, so apply a factor of 1e3 * 1e15 to convert it
sigma_t = h * 1e18 / sigma_e
control.observable_pairs[0].exp_obs.t = t
time_resolution = exp_obs._calculate_resolution_window(resolution_function)
Q_index = 10
plt.figure()
plt.plot(t, np.real(time_resolution)[Q_index], label='Vanadium Resolution')
plt.plot(t, np.exp(-0.5 * (t / sigma_t) ** 2), label='Gaussian Resolution')
plt.legend()
```

## Refinement Output

As a refinement is running, it will print information from each step to the terminal (`stdout`

). The specific information depends on which minimizer is selected, however it will always include the current Figure of Merit and the values of the refined parameters. This information will also be saved to the comma separated variable file, results.csv. **It is important to note that this will overwrite any results.csv which exists in the directory**.

Upon the completion of the refinement, the final parameter values will be output to the terminal. If `auto_scale`

is set for any of the datasets, then the relevant scale factor(s) will also be printed. For stochastic minimization algorithms, such as MMC (Metropolis Monte Carlo), these values may not be the parameter values which resulted in the minimum Figure of Merit; however it is highly probable that these values will produce a Figure of Merit which is not significantly larger than the
minimum achieved.

```
[ ]:
```

```
control.refine(3)
```

## Refinement plotting

If MDMC is being run in a Jupyter notebook, and matplotlib and ipywidgets are installed, it is also possible to dynamically plot one or more variables while the refinement is running. For example, it might be desirable to see how the FoM and one or more potential parameters change over time:

```
[ ]:
```

```
# Install matplotlib and ipywidgets if they are not already installed
!pip install matplotlib
!pip install ipywidgets
from MDMC.utilities import plot_progress
contol = plot_progress(control, ['FoM', 'charge'])
control.refine(3)
```

If dynamic plotting is used, only the last 5 steps of the text will be shown, so that the plots are always visible. For the same reason, it is recommended that a maximum of 8 plots are displayed at any time.

## FoM and Parameter Plotting

After a refinement has completed, the results can be read in and plotted with pandas. Unlike dynamic plotting, this can be done whether or not the refinement has been run within a Jupyter notebook, but it still requires matplotlib to be installed:

```
[ ]:
```

```
import pandas as pd
results = pd.read_csv('results.csv')
# If only the y variable is provided (e.g. 'FoM'), it is plotted against the number of steps)
results.plot(y='FoM')
```

```
[ ]:
```

```
# The x axis can also be specified
results.plot(x='FoM', y='charge')
```

## Observable Plotting

Plotting software can be used to visualize the experimental and simulation `Observable`

objects. Here `matplotlib`

is used to define a function for plotting the dynamic structure factor, `SQw`

.

Each comparable experimental and simulation `Observable`

will be part of an `ObservablePair`

. These will have the same order as the `exp_datasets`

that were passed to `Control`

. So the QENS experimental data will be part of the first (zero index) `ObservablePair`

:

```
[ ]:
```

```
obs_pairs = control.observable_pairs
QENS_pair = obs_pairs[0]
```

Each `ObservablePair`

must have an `Observable`

from experimental data (`ObservablePair.exp_obs`

) and an `Observable`

from simulation (`ObservablePair.MD_obs`

). Note that as the `rescale_factor`

is only taken into account when calculating the FoM, `ObservablePair.exp_obs`

will not be scaled. In order to ensure the scales match, `ObservablePair.rescale_factor`

can be accessed directly. In the case of the QENS `ObservablePair`

(and other 2-D data), these can be plotted using the
`plot_SQw`

function defined above.

```
[ ]:
```

```
E = QENS_pair.exp_obs.E
Q = QENS_pair.exp_obs.Q
# Plot the experimental SQw
plot_SQw(Q, E, QENS_pair.exp_obs.SQw[0])
```

```
[ ]:
```

```
# Plot the simulation SQw
plot_SQw(Q, E, QENS_pair.MD_obs.SQw[0])
```

```
[ ]:
```

```
# Plot the experimental SQw with rescaling
plot_SQw(Q, E, QENS_pair.MD_obs.SQw * QENS_pair.rescale_factor)
```