A challenge for physiologists and neuroscientists is to map information transfer between the components of the systems they study, across different time scales, in order to derive knowledge about structure and function from the analysis of the recorded dynamics.

The components of physiological networks often interact in a nonlinear way and through mechanisms which are, in general, not completely known. It is therefore safer if the method of choice for analyzing these interactions does not rely on any model of, or assumption about, the nature of the data and their interactions.

Transfer entropy and Granger causality have emerged as powerful tools to quantify directed dynamical interactions.

With MuTE I would like to compare different approaches to evaluate transfer entropy, some of them already proposed, some novel, and implement them in a freeware MATLAB toolbox. Applications to simulated and real data will be presented.

Some Formalism

Before going into the details of the embedding approaches and the entropy estimators, we would like to introduce some formalism that will be used from now on.

Let us consider a composite physical system described by a set of M interacting dynamical (sub)systems and suppose that, within the composite system, we are interested in evaluating the information flow from the source system \mathcal{X} to the destination system \mathcal{Y}, collecting the remaining systems in the vector \mathbf{\mathcal{Z}} = \left\{Z^k\right\}_{k = 1,\ldots,M-2}. We develop our framework under the assumption of stationarity, which allows estimates to be performed by replacing ensemble averages with time averages (for non-stationary formulations see, e.g., Ledberg (2012), and references therein). Accordingly, we denote X, Y and \mathbf{Z} as the stationary stochastic processes describing the states visited by the systems \mathcal{X}, \mathcal{Y} and \mathbf{\mathcal{Z}} over time, and X_n, Y_n and \mathbf{Z}_n as the stochastic variables obtained by sampling the processes at the present time n. Moreover, we denote X_n^-=[X_{n-1}X_{n-2}\ldots], Y_n^-=[Y_{n-1}Y_{n-2}\ldots], and \textbf{Z}_n^-=[\textbf{Z}_{n-1}\textbf{Z}_{n-2}\ldots] as the infinite-dimensional vector variables representing the whole past of the processes X, Y and \mathbf{Z}. Then, the multivariate transfer entropy (TE) from X to Y conditioned on \mathbf{Z} is defined as:

(1)   \begin{equation*} TE_{X \rightarrow Y|\mathbf{Z}} = \sum p\left( Y_n, Y_n^-, X_n^-, \mathbf{Z}_n^- \right) \log \frac{p\left(Y_n | Y_n^-, X_n^-, \mathbf{Z}_n^- \right)}{p\left(Y_n | Y_n^-, \mathbf{Z}_n^- \right)} \end{equation*}

where the sum extends over all the phase-space points forming the trajectory of the composite system. p(\textbf{a}) is the probability associated with the vector variable \textbf{a}, while p(b|\textbf{a}) = p(\textbf{a},b)/p(\textbf{a}) is the probability of observing b given that the variables forming the vector \textbf{a} are known. The conditional probabilities used in (1) can be interpreted as transition probabilities, in the sense that they describe the dynamics of the transition of the destination system from its past states to its present state, accounting for the past of the other systems. Using transition probabilities makes the resulting measure able to quantify the extent to which the transition of the destination system \mathcal{Y} into its present state is affected by the past states visited by the source system \mathcal{X}. Specifically, the TE quantifies the information provided by the past of the process X about the present of the process Y that is not already provided by the past of Y or of any other process included in \mathbf{Z}.

The formulation presented in (1) is an extension of the original TE measure, proposed for pairwise systems in Schreiber (2000), to the case of multiple interacting processes. The conditional TE formulation, also denoted as partial TE, Vakorin (2010), Kugiumtzis (2013), rules out the information shared between X and Y that could possibly be triggered by their common interaction with \mathbf{Z}. Note that the TE can be seen as a difference of two conditional entropies (CE), or equivalently as a sum of four Shannon entropies:

(2)   \begin{equation*} \begin{aligned}   TE_{X \rightarrow Y|\mathbf{Z}} &= H(Y_n | Y_n^-, \textbf{Z}_n^-) - H(Y_n | Y_n^-, X_n^-, \textbf{Z}_n^-) \\   &= H(Y_n, Y_n^-, \textbf{Z}_n^-)-H(Y_n^-, \textbf{Z}_n^-)  \\   &\qquad {} - H(Y_n,Y_n^-, X_n^-, \textbf{Z}_n^-)+H(Y_n^-, X_n^-, \textbf{Z}_n^-)  \end{aligned}  \end{equation*}
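As a quick numerical check of the four-entropy decomposition in (2), the following sketch estimates the TE of a toy binary system with plug-in entropies. It is written in Python rather than MATLAB, is not MuTE code, and truncates the infinite pasts to a single lag; the 10% flip noise and the distractor process Z are made-up choices for illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
N = 200_000

# Toy binary system: Y copies the past of X with 10% flip noise,
# Z is an independent distractor process.
x = rng.integers(0, 2, N)
z = rng.integers(0, 2, N)
flip = rng.random(N - 1) < 0.1
y = np.concatenate(([0], np.where(flip, 1 - x[:-1], x[:-1])))

def H(*cols):
    """Plug-in Shannon entropy (bits) of the joint variable formed by cols."""
    counts = np.array(list(Counter(zip(*cols)).values()))
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# One-lag approximations of the past variables appearing in eq. (2)
Yn, Yp, Xp, Zp = y[1:], y[:-1], x[:-1], z[:-1]
te_xy = H(Yn, Yp, Zp) - H(Yp, Zp) - H(Yn, Yp, Xp, Zp) + H(Yp, Xp, Zp)

# Reverse direction: X is i.i.d., so the past of Y carries no extra information
Xn = x[1:]
te_yx = H(Xn, Xp, Zp) - H(Xp, Zp) - H(Xn, Xp, Yp, Zp) + H(Xp, Yp, Zp)
```

With a 10% flip probability the theoretical value is 1 - h(0.1) ≈ 0.53 bits in the X→Y direction and zero in the reverse direction.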

The TE has great potential for detecting information transfer because it does not assume any particular model for the interactions governing the system dynamics, it is able to detect purely nonlinear interactions, and it can deal with a range of interaction delays, Vicente (2011). Recent research has proven that TE is equivalent to Granger Causality (GC) for jointly Gaussian data, Barnett (2009), Hlavackova (2011). This establishes a convenient joint framework for both measures. Here we evaluate GC in the TE framework and compare a classical VAR model, implemented in both the uniform embedding (UE) and non-uniform embedding (NUE) versions, with two model-free approaches.
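Under the Gaussian assumption, GC can be obtained from the residual variances of two linear regressions, and the TE follows from the identity GC = 2·TE (in nats) of Barnett (2009). A minimal Python sketch of this idea (not MuTE code; the VAR coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50_000
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):
    # Y is driven linearly by its own past and by the past of X
    y[n] = 0.5 * y[n-1] + 0.4 * x[n-1] + rng.standard_normal()

def ols_resid_var(target, regressors):
    """Residual variance of an OLS regression of target on the regressors."""
    X = np.column_stack(regressors)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return (target - X @ beta).var()

yn, yp, xp = y[1:], y[:-1], x[:-1]
var_restricted = ols_resid_var(yn, [yp])        # Y regressed on its own past
var_full = ols_resid_var(yn, [yp, xp])          # ... and on the past of X

gc = np.log(var_restricted / var_full)          # Granger causality X -> Y
te = 0.5 * gc                                   # Gaussian TE (nats), Barnett (2009)
```

For these coefficients the theoretical value is gc = ln(1.16) ≈ 0.148, since the unexplained contribution of X has variance 0.4² = 0.16 on top of unit innovation variance.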

Embedding frameworks

Take a look at the two embedding schemes and their theoretical differences. A comparison between the two approaches is also provided.

Uniform Embedding

In uniform conditioned embedding schemes, the components to be included in the embedding vectors are selected a priori, arbitrarily and separately for each variable...

go to uniform embedding
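As an illustration, the uniform scheme amounts to building delay vectors with an embedding dimension and lag fixed a priori. A short Python sketch (a hypothetical helper, not part of the toolbox):

```python
import numpy as np

def uniform_embedding(series, dim, tau):
    """Return (targets, embeddings): for each time n, the target s_n and the
    uniformly spaced past vector [s_{n-1}, s_{n-1-tau}, ..., s_{n-1-(dim-1)tau}].
    dim and tau are fixed in advance, identically for every time point."""
    s = np.asarray(series, dtype=float)
    N = len(s)
    start = (dim - 1) * tau + 1          # first index with a complete past
    emb = np.column_stack([s[start - 1 - k*tau : N - 1 - k*tau]
                           for k in range(dim)])
    return s[start:], emb

target, emb = uniform_embedding(np.arange(10.0), dim=2, tau=2)
# first row of emb pairs the target s_3 with the past vector (s_2, s_0)
```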

Non-Uniform Embedding

According to the non-uniform embedding framework, only the past states that actually help the prediction are entered into the model, improving accuracy and avoiding the risk of overfitting...

go to non-uniform embedding
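A minimal Python sketch of the greedy idea behind non-uniform embedding, using a linear-Gaussian estimate of conditional mutual information for simplicity and an arbitrary stopping threshold (the actual selection criterion used in the toolbox may differ; the toy coefficients are made up):

```python
import numpy as np

def _h(*cols):
    """0.5 * log-determinant of the sample covariance (Gaussian entropy up to
    constants that cancel in mutual-information differences)."""
    if not cols:
        return 0.0
    S = np.atleast_2d(np.cov(np.column_stack(cols).T))
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * logdet

def nonuniform_embedding(target, candidates, threshold=0.01):
    """Greedy term selection: repeatedly add the candidate past term with the
    largest conditional mutual information with the target, given the terms
    already selected; stop when the gain falls below the threshold."""
    selected, names = [], []
    remaining = dict(candidates)               # name -> lagged series
    while remaining:
        gains = {name: _h(c, *selected) + _h(target, *selected)
                       - _h(*selected) - _h(target, c, *selected)
                 for name, c in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break
        selected.append(remaining.pop(best))
        names.append(best)
    return names

# Toy system: Y depends on its own lag 1 and on X at lag 1; Z is irrelevant.
rng = np.random.default_rng(2)
N = 20_000
x = rng.standard_normal(N)
z = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):
    y[n] = 0.6 * y[n-1] + 0.7 * x[n-1] + 0.3 * rng.standard_normal()

tgt = y[2:]
candidates = {
    "y[n-1]": y[1:-1], "y[n-2]": y[:-2],
    "x[n-1]": x[1:-1], "x[n-2]": x[:-2],
    "z[n-1]": z[1:-1],
}
picked = nonuniform_embedding(tgt, candidates)
```

On this toy system the procedure selects exactly the two informative terms, y[n-1] and x[n-1], and leaves out the irrelevant candidates.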


Select the estimator you are interested in, and look at the theoretical contents and how the method performs.

Linear Estimator

The linear estimator works under the assumption that the processes involved in the analysis have a joint Gaussian distribution. This assumption allows one to work with well-known closed-form expressions for the probability density functions...

go to linear estimator
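For jointly Gaussian variables the entropy has the closed form H = ½ ln((2πe)^d det Σ), so conditional entropies, and hence the TE, reduce to covariance computations. A Python sketch (illustrative only, not toolbox code; the correlated pair below is made up):

```python
import numpy as np

def gaussian_entropy(X):
    """Entropy (nats) of a jointly Gaussian sample X of shape (N, d):
    H = 0.5 * ln((2*pi*e)^d * det(Sigma))."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    d = X.shape[1]
    S = np.atleast_2d(np.cov(X.T))
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(3)
u = rng.standard_normal(100_000)
y = 0.8 * u + 0.6 * rng.standard_normal(100_000)   # unit variance, correlated with u

h_u = gaussian_entropy(u)                 # theory: 0.5*ln(2*pi*e) ~= 1.4189
# Conditional entropy as a difference of joint entropies: H(Y|U) = H(Y,U) - H(U)
h_y_given_u = gaussian_entropy(np.column_stack([y, u])) - gaussian_entropy(u)
# theory: 0.5*ln(2*pi*e*0.36) ~= 0.9081
```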

Binning Estimator

This approach is based on performing uniform quantization of the time series and then estimating the entropy approximating probabilities with the frequency of visitation of the quantized states...

go to binning estimator
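The two steps, uniform quantization followed by a plug-in (frequency-of-visitation) entropy estimate, can be sketched in Python as follows (illustrative only, with an arbitrary bin count; not toolbox code):

```python
import numpy as np

def quantize(series, n_bins):
    """Uniform quantization: map each sample to one of n_bins equal-width
    bins spanning the observed range (labels 0 .. n_bins-1)."""
    s = np.asarray(series, dtype=float)
    edges = np.linspace(s.min(), s.max(), n_bins + 1)
    return np.digitize(s, edges[1:-1])    # interior edges only

def binned_entropy(*qseries):
    """Plug-in entropy (nats): probabilities approximated by the frequency
    of visitation of the quantized (joint) states."""
    _, counts = np.unique(np.column_stack(qseries), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(5)
q = quantize(rng.random(100_000), n_bins=8)
h = binned_entropy(q)                     # uniform data: close to ln(8) ~= 2.079
```

Passing several quantized series to `binned_entropy` yields joint entropies, from which the entropy differences in (2) can be assembled.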

Nearest Neighbor Estimator

Since its first introduction in 1967, Cover (1967), the nearest neighbor method has been shown to be a powerful nonparametric technique for classification, density estimation, and regression estimation. This method can be used to estimate the entropy H(X) of a d-dimensional random variable X starting from a random sample...

go to nearest neighbor estimator
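One widely used variant is the Kozachenko-Leonenko estimator, which infers the local density from the distance of each point to its k-th nearest neighbour. A brute-force Python sketch of the max-norm version (illustrative only; a KD-tree would be used for large samples, and the sample below is made up):

```python
import numpy as np

GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def psi_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -GAMMA + np.sum(1.0 / np.arange(1, n))

def kl_entropy(X, k=4):
    """Kozachenko-Leonenko kNN entropy estimate (nats), max-norm version:
    H ~= -psi(k) + psi(N) + (d/N) * sum_i log(eps_i), where eps_i is twice
    the max-norm distance from point i to its k-th nearest neighbour
    (the unit-ball volume term is zero in the max norm)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    N, d = X.shape
    # Brute-force O(N^2) pairwise max-norm distances
    D = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    np.fill_diagonal(D, np.inf)          # exclude self-distances
    eps = 2.0 * np.sort(D, axis=1)[:, k - 1]
    return -psi_int(k) + psi_int(N) + (d / N) * np.sum(np.log(eps))

rng = np.random.default_rng(4)
h = kl_entropy(rng.standard_normal(2000), k=4)
# standard normal, theory: 0.5*ln(2*pi*e) ~= 1.419
```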

Neural Network Estimator

Relying on neural networks, the proposed approach to Granger causality is both non-parametric and regression-based, thus realizing the Granger paradigm in a non-parametric fashion...

go to neural network estimator

Want to share your code?

New methods and/or functions are very welcome

Are you interested in sharing your contribution? Do you want your method(s)/function(s) to be part of MuTE?...

find out how


  1. Ledberg, Anders, Chicharro, Daniel: Framework to study dynamic dependencies in networks of interacting processes. In: Phys Rev E Stat Nonlin Soft Matter Phys, 2012.
  2. Schreiber, Thomas: Measuring information transfer. In: Phys Rev Lett, 85 (2), pp. 461, 2000.
  3. Vakorin, Vasily A, Kovacevic, Natasa, McIntosh, Anthony R: Exploring transient transfer entropy based on a group-wise ICA decomposition of EEG data. In: Neuroimage, 49 (2), pp. 1593–1600, 2010.
  4. Kugiumtzis, D.: Direct-coupling information measure from nonuniform embedding. In: Phys Rev E, 87 , pp. 062918, 2013.
  5. Vicente, Raul, Wibral, Michael, Lindner, Michael, Pipa, Gordon: Transfer entropy a model-free measure of effective connectivity for the neurosciences. In: J Comput Neurosci, 30 (1), pp. 45–67, 2011.
  6. Barnett, L., Barrett, A.B., Seth, A.K.: Granger causality and transfer entropy are equivalent for Gaussian variables. In: Phys Rev Lett, 103 (23), pp. 238701, 2009.
  7. Hlaváčková-Schindler, Katerina: Equivalence of Granger causality and transfer entropy: A generalization. In: App Math Sci, 5 (73), pp. 3637–3648, 2011.
  8. Cover, Thomas, Hart, Peter: Nearest neighbor pattern classification. In: IEEE Trans Inf Theory, 13 (1), pp. 21–27, 1967.
