Linear Estimator

NB: for the formalism needed to understand what follows, please see the SOME FORMALISM section of the MuTE page.

The linear estimator works under the assumption that the overall process \{X, Y, \textbf{Z}\} has a joint Gaussian distribution, which makes it possible to work with well-known expressions for the probability density functions. Under this assumption, the two CE terms defining the TE in (equation (2)) are expressed by means of linear regressions involving the past states of the systems collected in the vector variables, Barrett (2010). When the UE is implemented, X_n^- is approximated by the vector of length p, V_n^X=[X_{n-1},\ldots,X_{n-p}], and likewise Y_n^- and Z_n^- are approximated by V_n^Y=[Y_{n-1},\ldots,Y_{n-p}] and V_n^\textbf{Z}=[\textbf{Z}_{n-1},\ldots,\textbf{Z}_{n-p}] (here m = 1, p = d). When the NUE is implemented, the embedding vectors contain only the components resulting from the selection procedure. Then, an unrestricted regression of Y_n on the full vector V^{(u)}=[V_n^X \, V_n^Y \, V_n^\textbf{Z}]^T, and a restricted regression of Y_n on the reduced vector V^{(r)}=[V_n^Y\, V_n^\textbf{Z}]^T, are performed as follows:

(1)   \begin{align*} Y_n & = A^{(u)}V^{(u)} + \varepsilon_n^{(u)}  \\ Y_n & = A^{(r)}V^{(r)} + \varepsilon_n^{(r)} \end{align*}

where A^{(u)} and A^{(r)} are vectors of linear regression coefficients. The terms \varepsilon_n^{(u)} and
\varepsilon_n^{(r)} are scalar white noise residuals with variances \sigma^{(u)} and \sigma^{(r)}.
Under the Gaussian assumption, it has been demonstrated, Barnett (2009), that the entropy of Y_n conditioned on
the unrestricted or restricted regression vectors is, respectively, H(Y_n|V^{(u)})=0.5\log(2\pi e\,\sigma^{(u)}) and H(Y_n|V^{(r)})=0.5\log(2\pi e\,\sigma^{(r)}), from which the TE estimate follows immediately as the difference H(Y_n|V^{(r)}) - H(Y_n|V^{(u)}):

(2)   \begin{equation*} \mbox{TE}_{X\rightarrow Y|Z} = \frac{1}{2} \log \frac{\sigma^{(r)}}{\sigma^{(u)}} \end{equation*}
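The two regressions in (1) and the estimate in (2) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the MuTE implementation: the function names (`lagged`, `residual_variance`, `linear_te`) are hypothetical, the conditioning process Z is taken to be a single scalar series, the order p is fixed by the caller (uniform embedding), and the series are mean-centred so that no intercept is needed.

```python
import numpy as np


def lagged(series, p):
    """Return the matrix whose rows are [s[n-1], ..., s[n-p]] for n = p .. len(s)-1."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def residual_variance(target, regressors):
    """Mean squared residual of the least-squares regression of target on regressors."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    residuals = target - regressors @ coef
    return np.mean(residuals ** 2)


def linear_te(x, y, z, p):
    """Sketch of TE from X to Y given Z under the Gaussian assumption (order p fixed)."""
    # Assumption of this sketch: scalar series, mean-centred, no intercept term.
    x, y, z = (np.asarray(s, dtype=float) for s in (x, y, z))
    x, y, z = x - x.mean(), y - y.mean(), z - z.mean()

    target = y[p:]                                    # Y_n
    v_x, v_y, v_z = lagged(x, p), lagged(y, p), lagged(z, p)

    # Unrestricted model: Y_n regressed on [V_n^X, V_n^Y, V_n^Z].
    sigma_u = residual_variance(target, np.hstack([v_x, v_y, v_z]))
    # Restricted model: Y_n regressed on [V_n^Y, V_n^Z] only.
    sigma_r = residual_variance(target, np.hstack([v_y, v_z]))

    return 0.5 * np.log(sigma_r / sigma_u)
```

Because the restricted model is nested in the unrestricted one, the in-sample residual variance cannot decrease when the past of X is removed, so this estimate is non-negative up to rounding; whether a small positive value is meaningful has to be decided by a significance test.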

The unrestricted and restricted regression models in (1) were estimated by the least-squares method. In the UE implementation, the order p of the regressions was selected by the Bayesian information criterion, Schwarz (1978); in the NUE implementation, the order resulted implicitly from the selection procedure. In NUE, maximization of the mutual information between the component \hat{W}_n selected at step k and the target variable Y_n (step (*) of the algorithm) was obtained in terms of minimization of the CE H(Y_n|\hat{W}_n,V_n^{k-1})=0.5\log(2\pi e\,\sigma^{(k)}), where \sigma^{(k)} denotes the variance of the residuals of the linear regression of Y_n on [\hat{W}_n,V_n^{k-1}]. Here, the randomization procedure applied to test candidate significance consisted of time-shifting the points of \hat{W}_n by a randomly selected lag (at least 20 samples, a minimum set to avoid autocorrelation effects), Quiroga (2002).
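The time-shift randomization used to test a candidate can be sketched as follows. Several details are assumptions of this sketch rather than statements about MuTE: the shift is circular, 100 surrogates are drawn, and the candidate is accepted when its CE lies below the alpha-quantile of the surrogate CE values. The names `time_shift_surrogate` and `candidate_is_significant` are hypothetical, and `v_prev` stands for the already selected components V_n^{k-1} as a 2-D array (possibly with zero columns).

```python
import numpy as np


def residual_variance(target, regressors):
    """Mean squared residual of the least-squares regression of target on regressors."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return np.mean((target - regressors @ coef) ** 2)


def time_shift_surrogate(w, rng, min_shift=20):
    """Circularly shift w by a random lag that is at least min_shift in both directions."""
    n = len(w)
    if n <= 2 * min_shift:
        raise ValueError("series too short for the requested minimum shift")
    shift = rng.integers(min_shift, n - min_shift + 1)  # upper bound is exclusive
    return np.roll(w, shift)


def candidate_is_significant(y, w, v_prev, n_surrogates=100, alpha=0.05, seed=0):
    """Compare the CE obtained with candidate w against time-shifted surrogates of w."""
    rng = np.random.default_rng(seed)

    def conditional_entropy(candidate):
        regressors = np.column_stack([candidate, v_prev])
        sigma_k = residual_variance(y, regressors)
        return 0.5 * np.log(2 * np.pi * np.e * sigma_k)

    observed = conditional_entropy(w)
    null = np.array(
        [conditional_entropy(time_shift_surrogate(w, rng)) for _ in range(n_surrogates)]
    )
    # Assumed decision rule: lower CE means more information about Y_n,
    # so the candidate must beat the alpha-quantile of the surrogate CEs.
    return bool(observed < np.quantile(null, alpha))
```

Shifting rather than shuffling keeps the autocorrelation of the candidate intact while breaking its temporal alignment with the target, which is the reason a minimum shift is imposed.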

The statistical significance of the TE estimated through the UE approach was assessed by the parametric F-test for the null hypothesis that the p coefficients of A^{(u)} which weight the lagged terms of the driving variable X (the components of V_n^X) are all zero, Brandt (2007). In our case, the test statistic is F=((RSS_r - RSS_u)/p)/(RSS_u/(N - Mp)), where RSS_r and RSS_u are the residual sums of squares of the restricted and the unrestricted model, N is the time series length, and M is the number of time series in \{X, Y, \textbf{Z}\}, so that Mp is the number of regressors in the unrestricted model. The TE is considered statistically significant if F is larger than the critical value of the Fisher distribution with (p, N - Mp) degrees of freedom at the significance level \alpha=0.05.
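As a worked illustration of the test statistic, the sketch below computes F from the two residual sums of squares, taking the denominator degrees of freedom N - Mp directly from the statistic given above. The function names and argument names are hypothetical, and the use of SciPy for the critical value is an assumption of this sketch, confined to the second helper.

```python
def f_statistic(rss_r, rss_u, p, n_samples, n_series):
    """Return (F, numerator dof, denominator dof) for the restricted vs unrestricted fit."""
    dof_num = p                          # coefficients set to zero under the null
    dof_den = n_samples - n_series * p   # N - Mp
    if dof_den <= 0:
        raise ValueError("not enough samples for the requested order")
    f_value = ((rss_r - rss_u) / dof_num) / (rss_u / dof_den)
    return f_value, dof_num, dof_den


def te_is_significant(rss_r, rss_u, p, n_samples, n_series, alpha=0.05):
    """Compare F with the critical value of the Fisher distribution at level alpha."""
    # Assumption of this sketch: SciPy is available; imported here so that
    # f_statistic can be used without it.
    from scipy.stats import f as f_dist

    f_value, dof_num, dof_den = f_statistic(rss_r, rss_u, p, n_samples, n_series)
    critical = f_dist.ppf(1.0 - alpha, dof_num, dof_den)
    return f_value > critical
```

For example, with RSS_r = 120, RSS_u = 100, p = 2, N = 206 and M = 3, the statistic is (20/2)/(100/200) = 20 with (2, 200) degrees of freedom.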


  1. Barrett, A.B., Barnett, L., Seth, A.K.: Multivariate Granger causality and generalized variance. In: Phys Rev E, 81 (4), pp. 041907, 2010.
  2. Barnett, L., Barrett, A.B., Seth, A.K.: Granger causality and transfer entropy are equivalent for Gaussian variables. In: Phys Rev Lett, 103 (23), pp. 238701, 2009.
  3. Schwarz, G.: Estimating the dimension of a model. In: Ann Stat, 6 (2), pp. 461–464, 1978.
  4. Quiroga, R.Q., Kraskov, A., Kreuz, T., Grassberger, P.: Performance of different synchronization measures in real data: a case study on electroencephalographic signals. In: Phys Rev E, 65 (4), pp. 041903, 2002.
  5. Brandt, P.T., Williams, J.T.: Multiple Time Series Models. SAGE Publications, 2007, ISBN: 9781412906562.
