Vector AutoRegression (VAR) Simulation
Introduction
Vector AutoRegression (VAR) is a statistical model that captures the linear interdependencies among multiple time series. VAR models treat all variables in the system as endogenous, which makes them well suited to analyzing and forecasting systems where variables mutually influence each other.
In economics and finance, VAR models are commonly applied to study the dynamic impact of a shock in one variable on the entire system of variables. MIMIC instead uses VAR models to simulate the dynamics of a microbiome system, where the abundances of different microbial species influence each other over time.
VAR(1) Process
The VARsim.py script simulates data from a VAR(1) process, the simplest form of VAR model, in which each variable’s current value is influenced by the immediately preceding values of all variables in the system. The general formula for a VAR(1) process with $ n $ variables is:
$$ X_t = A X_{t-1} + \epsilon_t $$
where:
$ X_t $ is the vector of variables at time $ t $,
$ A $ is the matrix of coefficients, capturing the influence of each variable’s previous value on the current value of all variables in the system,
$ \epsilon_t $ is the vector of error terms, assumed to be normally distributed with mean 0 and a specified standard deviation.
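As a concrete illustration, for $ n = 2 $ the recursion $ X_t = A X_{t-1} + \epsilon_t $ can be written out component by component. The entry labels $ a_{ij} $, $ x_{i,t} $ and $ \epsilon_{i,t} $ are notation introduced here for illustration and do not appear in the script:

```latex
\begin{aligned}
x_{1,t} &= a_{11}\, x_{1,t-1} + a_{12}\, x_{2,t-1} + \epsilon_{1,t} \\
x_{2,t} &= a_{21}\, x_{1,t-1} + a_{22}\, x_{2,t-1} + \epsilon_{2,t}
\end{aligned}
```

Under this convention, $ a_{ij} $ is the weight of variable $ j $’s previous value in variable $ i $’s current value, so the off-diagonal entries of $ A $ carry the cross-variable interactions.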
In the script, the generate_var1_data method is used to simulate data based on the VAR(1) model parameters:
n_obs: Number of observations to generate.
coefficients: Coefficient matrix for the VAR(1) process. It should be a square matrix of shape $ (n, n) $.
initial_values: Initial values for the process, a vector of shape $ (n,) $.
noise_stddev: Standard deviation of the noise term.
This method generates a time series data set following the specified VAR(1) process, allowing for the exploration of dynamic relationships among variables.
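The recursion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the MIMIC implementation: the function name simulate_var1 and the seed argument are hypothetical, and the sketch assumes the recursion $ X_t = A X_{t-1} + \epsilon_t $ with independent Gaussian noise and the initial values stored as the first observation.

```python
import numpy as np


def simulate_var1(n_obs, coefficients, initial_values, noise_stddev, seed=None):
    """Simulate X_t = A @ X_{t-1} + eps_t (hypothetical helper, not the MIMIC method)."""
    A = np.asarray(coefficients, dtype=float)
    x0 = np.asarray(initial_values, dtype=float).ravel()  # accept shape (n,) or (n, 1)
    n = A.shape[0]
    if A.shape != (n, n) or x0.shape != (n,):
        raise ValueError("coefficients must be (n, n) and initial_values must have n entries")
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")

    rng = np.random.default_rng(seed)
    data = np.zeros((n_obs, n))
    data[0] = x0  # assumption: the first observation is the initial state
    for t in range(1, n_obs):
        # Independent Gaussian noise with mean 0 for each variable
        noise = rng.normal(loc=0.0, scale=noise_stddev, size=n)
        data[t] = A @ data[t - 1] + noise
    return data


data = simulate_var1(
    n_obs=100,
    coefficients=[[0.8, -0.2], [0.3, 0.5]],
    initial_values=[[1], [2]],
    noise_stddev=1.0,
    seed=0,
)
print(data.shape)  # (100, 2)
```

Each row of the returned array is one time step and each column is one variable. Setting noise_stddev to 0 in this sketch gives the deterministic trajectory driven by the coefficient matrix alone.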
Visualization Methods
The script includes functions for visualizing the simulated data, making it easier to analyze the interdependencies and dynamics within the system.
Notebook Structure
This notebook shows how to simulate VAR models using the VARsim.py script, both by specifying parameters directly in the code and by importing them from a JSON file. We will also explore various methods for visualizing the simulated data to derive insights into the modeled system’s dynamics.
Objective
The goal is to simulate VAR model dynamics using the VARsim.py class, visualize the simulated data, and understand the interactions within the VAR model with an example use-case. We aim to gain insights into the system’s dynamics through simulation and visualization techniques.
Example Usage of VARsim.py VAR Model Simulation
In this section, we demonstrate the VARsim.py class through a worked example. The example goes step by step from setting up the simulation, to visualizing the resulting data, to interpreting the interactions between variables within the model.
Importing Libraries
[1]:
from mimic.model_simulate.sim_VAR import *
Creating a VAR simulation model
In this section, we create a VAR simulation model object and print the parameters it is initialized with.
[2]:
model = sim_VAR()
model.print_parameters()
We can see that the initial parameters are set to null. The set of parameters differs between model types, for example between a VAR model and a gMLV model. A VAR model has the following parameters:
n_obs: Number of observations to generate.
coefficients: Coefficient matrix for the VAR(1) process. It should be a square matrix of shape (n, n).
initial_values: Initial values for the process, a vector of shape (n,).
noise_stddev: Standard deviation of the noise term.
output: A string that specifies the output format. It can be either show, save or both. If it is show, the plot is displayed on the screen. If it is save, the plot is saved in the current working directory. If it is both, the plot is displayed on the screen and saved in the current working directory.
In the following section, we are going to specify the parameters we want to use in our simulation.
Running the Simulation with Specified Parameters
In this section, we’ll demonstrate how to run a VAR simulation using specified parameters directly in the code. This approach provides a quick and flexible way to test different configurations.
[3]:
# Specify the simulation parameters directly
# Number of observations to simulate. In this example we simulate 100 observations.
n_obs = 100
# Coefficients for the VAR model represented as a matrix. In this example, the VAR(1) model has 1 lag and 2 variables, so the matrix is 2x2.
coefficients = [[0.8, -0.2], [0.3, 0.5]]
# Initial values for the VAR model, represented as a matrix. In this example, the VAR model has 2 variables so the initial values are a 2x1 matrix.
initial_values = [[1], [2]]
# Standard deviation of the noise term in the VAR model. In this example, the noise term has a standard deviation of 1.
noise_stddev = 1.0
# Determines how the output plot should be handled. If 'show', the plot is displayed. If 'save', the plot is saved to a file. If 'none', the plot is neither displayed nor saved. If 'both', the plot is displayed and saved.
output = 'show'
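To make the role of the coefficient matrix concrete, the following sketch computes the noise-free part of the first update, $ A X_0 $, for the values above. It assumes the recursion $ X_t = A X_{t-1} + \epsilon_t $, uses plain Python, and does not call MIMIC; the names x0 and x1_mean are introduced here for illustration.

```python
coefficients = [[0.8, -0.2], [0.3, 0.5]]
initial_values = [[1], [2]]

# Flatten the 2x1 initial values into a plain list [1, 2]
x0 = [row[0] for row in initial_values]

# Noise-free update: each new value is a weighted sum of all previous values,
# with the weights taken from the matching row of the coefficient matrix
x1_mean = [sum(a * x for a, x in zip(row, x0)) for row in coefficients]
print(x1_mean)  # approximately [0.4, 1.3]
```

Under this assumption, the first simulated observation after the initial values equals this vector plus the random noise drawn at that step, so it changes from run to run.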
[4]:
# Now we can set the parameters for the VAR model created previously
model.set_parameters(n_obs=n_obs, coefficients=coefficients,
                     initial_values=initial_values, noise_stddev=noise_stddev, output=output)
[5]:
# Simulate the data by calling the simulate method of the sim_VAR object and passing the name of the simulation method as an argument. In this example, the simulation method is 'VARsim'.
model.simulate("VARsim")
[6]:
# The simulated data is stored in the sim_VAR object and can be accessed using the data attribute of the object.
model.data
[6]:
array([[ 1.00000000e+00, 2.00000000e+00],
[ 1.86575292e+00, 1.71471962e+00],
[ 9.36875781e-01, 2.08484648e-01],
[-1.60825963e+00, 2.25586896e-01],
[-7.80926557e-01, 4.57946392e-02],
[-1.91372818e+00, -9.16449928e-01],
[-1.22230452e+00, -8.62207154e-01],
[-1.21754644e+00, -2.90696003e+00],
[ 2.47342249e-01, -9.97933732e-01],
[ 1.66354232e+00, -2.49801030e-01],
[-1.70739040e-01, -3.32194323e-02],
[ 5.53857593e-01, 7.13726002e-01],
[ 5.42485682e-01, -5.10378692e-01],
[ 2.32470329e-02, 2.71940603e-01],
[-1.64461809e+00, 4.62760992e-01],
[-3.63527445e+00, -6.63373681e-01],
[-2.77905681e+00, -1.39733653e+00],
[-1.89862384e+00, -2.49523405e-01],
[-2.51426916e+00, -1.22225073e+00],
[-2.25268180e+00, -1.85264487e+00],
[-5.23518156e-01, -1.99703295e+00],
[-3.48098197e-01, -1.82586229e+00],
[ 1.25597309e+00, -4.99617076e-01],
[ 1.08136736e+00, -3.09479686e-01],
[ 1.59066994e+00, -9.97820827e-02],
[ 7.08138190e-01, -8.95485846e-01],
[-8.97082578e-01, -2.59997190e-01],
[-1.04178063e+00, -9.42000638e-01],
[-9.68814017e-01, -2.03962153e+00],
[ 1.05640718e-01, -3.82454395e-01],
[-6.64568444e-02, -1.23111825e+00],
[-6.27854396e-01, 1.13330674e+00],
[-1.04408591e+00, -2.76490804e-01],
[ 2.88074257e-01, -2.51719049e-01],
[-1.46026458e-01, -2.41267169e-01],
[-9.64122931e-01, -5.59617227e-02],
[-1.31504148e+00, -4.52628638e-01],
[-4.22874738e-01, -3.18593809e-01],
[-1.72684038e-01, -7.40474966e-01],
[-5.25570445e-01, 3.86730165e-01],
[-4.16713939e-01, 7.96070750e-01],
[-4.89408391e-01, 1.88444609e+00],
[-7.42597394e-01, 5.28853961e-01],
[-2.92342807e-01, -5.54073777e-01],
[-7.40296445e-01, 6.36273700e-01],
[-1.65887854e+00, -6.13591002e-02],
[-3.35178644e+00, -1.70325568e+00],
[-3.85093341e+00, -2.36879923e+00],
[-1.87198325e+00, -1.76896525e+00],
[-4.53905452e-01, -9.02601119e-01],
[-8.35820106e-01, -3.03338590e-01],
[-2.69861655e+00, -7.81159399e-01],
[-2.23642537e+00, -4.90695320e-01],
[-1.28544695e+00, -1.47325925e+00],
[-1.25850976e+00, -1.76521415e+00],
[ 4.41034364e-01, -1.95481795e+00],
[ 1.30764750e+00, -1.29795448e+00],
[ 1.99049960e+00, 4.18733704e-04],
[ 1.38373988e+00, 2.81628739e+00],
[ 1.34956803e+00, 1.23121854e+00],
[ 9.90619520e-01, -2.91994650e-02],
[ 8.84275944e-03, 2.29873559e-01],
[ 1.50351770e-01, 1.93473696e+00],
[-6.83032334e-02, 1.75661998e+00],
[-3.95462772e-01, 4.86593899e-01],
[-1.72946490e+00, -4.38255234e-01],
[-1.28938039e+00, -6.82630193e-01],
[-2.10019765e+00, -1.44975260e-01],
[-3.12681843e+00, -2.50584956e+00],
[-2.33474826e+00, -2.86668475e+00],
[-2.23940713e+00, -4.48226298e-01],
[-1.15341900e+00, -3.05021431e-01],
[-5.50828738e-01, -1.61341102e+00],
[-2.70346294e-01, -5.42518556e-01],
[-1.24311521e-01, -1.32158468e+00],
[-1.98245938e+00, 1.71592398e-01],
[ 3.40657735e-01, 1.65788736e+00],
[ 5.29529945e-01, 4.24943151e-01],
[-2.53534168e-01, 3.10227407e-01],
[-1.24135680e+00, -2.93475097e-01],
[-6.40683243e-01, -1.26351119e+00],
[-1.60138276e+00, -1.95864785e+00],
[-2.30779324e+00, -1.82802165e+00],
[-1.28835645e+00, -9.51417783e-02],
[-2.04603513e+00, -1.34833167e+00],
[-1.32666035e+00, -3.01802701e+00],
[-1.08920959e-01, -1.58653848e+00],
[ 1.23727123e+00, -6.35216636e-01],
[ 1.54914003e+00, -1.95296169e+00],
[ 2.68638392e+00, -2.06189332e+00],
[ 2.08552938e+00, 1.46145764e+00],
[ 1.72333870e+00, 1.87424792e+00],
[ 1.87610302e+00, 1.31112395e+00],
[ 1.52072736e+00, 1.65546481e+00],
[ 7.22468406e-01, 1.03547760e+00],
[-3.35388106e-01, 4.03683201e-01],
[ 6.40411815e-01, -6.06660422e-02],
[ 1.23132593e+00, 1.87311096e-01],
[ 1.31406498e+00, 1.65667113e+00],
[ 8.27079461e-01, 2.09094212e+00]])
[7]:
# The simulated data can be saved to a file by calling the save_data method of the sim_VAR object and passing the file path as an argument. In this example, the file is saved as 'simulated_data.csv'.
model.save_data(r'./simulated_data.csv')
[8]:
# You can also inspect the simulation parameters using the print_parameters method of the sim_VAR object.
model.print_parameters()
[9]:
# The parameters can also be saved to a JSON file using the save_parameters method.
model.save_parameters(r'./simulated_parameters.json')
[10]:
# If a parameter is not specified, its default value is used.
# In this example, the noise standard deviation and the coefficients are not specified.
simulator = sim_VAR()
simulator.set_parameters(n_obs=n_obs, initial_values=[[1], [2]], output='show')
simulator.print_parameters()
simulator.simulate("VARsim")
simulator.print_parameters()
Warning: Missing or None parameters for VAR simulation. Using default values for: ['coefficients', 'noise_stddev']
The simulator can also be run using parameters saved in a JSON file, which is useful for managing and reusing configurations, especially if the matrices are large. We’ll demonstrate this approach in the next section.
For example, you could specify the parameters in the following way:
{
"n_obs": 97,
"coefficients": [
[
0.8,
-0.2,
0.3
],
[
0.3,
0.5,
-1.0
],
[
0.2,
-0.1,
0.4
]
],
"initial_values": [
[
1
],
[
2
],
[
0
]
],
"noise_stddev": 1.2,
"output": "both"
}
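Before handing a parameter file to the simulator, it can help to check that the shapes agree. The following is a minimal sketch using only the standard library; check_var_parameters is a hypothetical helper and does not reproduce what read_parameters does internally. The JSON text is an inline copy of the example above so that the sketch runs on its own.

```python
import json

# Inline copy of the example parameter file, so the sketch is self-contained
raw = """
{
  "n_obs": 97,
  "coefficients": [[0.8, -0.2, 0.3], [0.3, 0.5, -1.0], [0.2, -0.1, 0.4]],
  "initial_values": [[1], [2], [0]],
  "noise_stddev": 1.2,
  "output": "both"
}
"""


def check_var_parameters(params):
    """Return the number of variables n, or raise ValueError (hypothetical helper)."""
    coefficients = params["coefficients"]
    n = len(coefficients)
    if any(len(row) != n for row in coefficients):
        raise ValueError("coefficients must be a square (n, n) matrix")
    if len(params["initial_values"]) != n:
        raise ValueError("initial_values must have one entry per variable")
    if params["noise_stddev"] < 0:
        raise ValueError("noise_stddev must be non-negative")
    return n


params = json.loads(raw)
n_variables = check_var_parameters(params)
print(n_variables)  # 3
```

In this example the coefficient matrix is 3x3, so the file describes a system with three variables and needs three initial values.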
[11]:
model.read_parameters(r'./parameters2.json')
model.print_parameters()
model.simulate("VARsim")
The script also contains methods for visualizing the simulated data, including plotting the time series of each variable in a separate plot.
[12]:
model.make_plot(model.data)
[13]:
model.make_plot_overlay(model.data)
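For readers who want to build similar figures outside MIMIC, the two styles of plot can be sketched with Matplotlib. This is an assumption-laden illustration: plot_separate and plot_overlay are hypothetical functions, not the internals of make_plot or make_plot_overlay, and the data below is a random placeholder with the same (n_obs, n) layout as model.data.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_separate(data):
    """One subplot per variable (hypothetical stand-in for a per-variable plot)."""
    data = np.asarray(data)
    n_vars = data.shape[1]
    fig, axes = plt.subplots(n_vars, 1, sharex=True, squeeze=False)
    for i, ax in enumerate(axes[:, 0]):
        ax.plot(data[:, i])
        ax.set_ylabel(f"X{i + 1}")
    axes[-1, 0].set_xlabel("time step")
    return fig


def plot_overlay(data):
    """All variables on one set of axes (hypothetical stand-in for an overlay plot)."""
    data = np.asarray(data)
    fig, ax = plt.subplots()
    for i in range(data.shape[1]):
        ax.plot(data[:, i], label=f"X{i + 1}")
    ax.set_xlabel("time step")
    ax.legend()
    return fig


# Placeholder data only: rows are time steps, columns are variables
rng = np.random.default_rng(0)
example = rng.normal(size=(100, 2)).cumsum(axis=0)
fig_separate = plot_separate(example)
fig_overlay = plot_overlay(example)
```

Separate subplots make each variable’s scale easy to read, while the overlay makes it easier to see whether the variables move together.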