## Linear Estimator

**NB**: for the formalism needed to follow this section, see the **SOME FORMALISM** section of the MuTE page.

The linear estimator works under the assumption that the overall process has a joint Gaussian distribution, which makes it possible to work with well-known expressions for the probability density functions. Under this assumption, the two CE terms defining the TE (equation (2) of the formalism) are expressed by means of linear regressions involving the past states of the systems collected in the vector variables, Barrett (2010). When the UE is implemented, $X_n^-$ is approximated with the vector of length $d$, $V_n^X = [X_{n-1}, \ldots, X_{n-d}]$, and the same holds for $Y_n^-$ and $Z_n^-$, which are approximated by $V_n^Y$ and $V_n^Z$ (here the embedding delay is $m = 1$). When the NUE is implemented, the embedding vectors contain only the components resulting from the selection procedure. Then, an unrestricted regression of $Y_n$ on the full vector $V = [V_n^X, V_n^Y, V_n^Z]$ and a restricted regression of $Y_n$ on the reduced vector $V' = [V_n^Y, V_n^Z]$ are performed as follows:

$$
Y_n = A V + U_n, \qquad Y_n = B V' + W_n \tag{1}
$$

where $A$ and $B$ are vectors of linear regression coefficients. The terms $U_n$ and $W_n$ are scalar white noise residuals with variances $\sigma_U^2$ and $\sigma_W^2$.

Under the Gaussian assumption, it has been demonstrated, Barnett (2009), that the entropy of $Y_n$ conditioned on the unrestricted or restricted regression vectors is, respectively, $H(Y_n \mid V) = \frac{1}{2}\ln(2\pi e\,\sigma_U^2)$ and $H(Y_n \mid V') = \frac{1}{2}\ln(2\pi e\,\sigma_W^2)$, from which the TE estimate follows immediately as the difference of the restricted and unrestricted terms:

$$
TE_{X \to Y \mid Z} = H(Y_n \mid V') - H(Y_n \mid V) = \frac{1}{2}\ln\frac{\sigma_W^2}{\sigma_U^2} \tag{2}
$$
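The two regressions in (1) and the estimate in (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the MuTE implementation: the helper names (`lagged`, `residual_variance`, `linear_te`) are hypothetical, uniform embedding with unit delay is assumed, an intercept column is added to the regressions, and the data are simulated so that `x` drives `y` while `z` is unrelated noise.

```python
import numpy as np

def lagged(series, d):
    """Return the (N-d, d) matrix whose row for time n is [s[n-1], ..., s[n-d]]."""
    n = len(series)
    return np.column_stack([series[d - k : n - k] for k in range(1, d + 1)])

def residual_variance(target, regressors):
    """Least-squares fit of target on regressors; returns the residual variance.
    An intercept column is added (an assumption of this sketch)."""
    design = np.column_stack([regressors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def linear_te(x, y, z, d):
    """TE from driver x to target y given z, uniform embedding of order d."""
    vx, vy, vz = lagged(x, d), lagged(y, d), lagged(z, d)
    yn = y[d:]  # present of the target, aligned with the lagged rows
    var_u = residual_variance(yn, np.column_stack([vx, vy, vz]))  # unrestricted
    var_w = residual_variance(yn, np.column_stack([vy, vz]))      # restricted
    return 0.5 * np.log(var_w / var_u)

# Simulated example: y is driven by the past of x; z is independent noise.
rng = np.random.default_rng(0)
N = 2000
x = rng.standard_normal(N)
z = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + rng.standard_normal()

te_xy = linear_te(x, y, z, d=2)  # expected to be clearly positive
te_yx = linear_te(y, x, z, d=2)  # expected to be close to zero
print(te_xy, te_yx)
```

Because the restricted model is nested in the unrestricted one, the in-sample estimate is never negative (up to rounding), which is why a significance test is still needed to decide whether a small positive value reflects real coupling.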

The unrestricted and restricted regression models in (1) were estimated by the least-squares method. In the UE implementation, the order $d$ of the regressions was selected by the Bayesian information criterion, Schwarz (1978); in the NUE implementation, the order resulted implicitly from the selection procedure. In NUE, maximization of the mutual information between the component selected at step $k$ and the target variable (step (*) of the algorithm) was obtained by minimizing the CE $H(Y_n \mid V_n^{(k)}) = \frac{1}{2}\ln(2\pi e\,\sigma^2)$, where $\sigma^2$ denotes the variance of the residuals of the linear regression of $Y_n$ on $V_n^{(k)}$ (the components selected up to step $k$). Here, the randomization procedure applied to test candidate significance consisted of time-shifting the points of the selected candidate by a randomly selected lag (of at least 20 lags, set to avoid autocorrelation effects), Quiroga (2002).
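The time-shift randomization for one candidate can be sketched as follows. This is a hedged illustration, not the toolbox code: the function names, the use of a circular shift (`np.roll`), the number of surrogates, and the quantile-based decision rule are all assumptions made for the example; only the minimum shift of 20 lags and the Gaussian CE formula come from the text.

```python
import numpy as np

def gaussian_ce(target, regressors):
    """CE of target given regressors under the Gaussian assumption:
    0.5 * ln(2*pi*e*sigma^2), sigma^2 = residual variance of a least-squares fit."""
    design = np.column_stack([regressors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    sigma2 = np.var(target - design @ coef)
    return 0.5 * np.log(2 * np.pi * np.e * sigma2)

def candidate_is_significant(target, selected, candidate,
                             n_surr=100, min_lag=20, alpha=0.05, seed=0):
    """Compare the CE obtained with the candidate against time-shifted surrogates.
    selected: (n, k) array of already selected components (k may be 0)."""
    rng = np.random.default_rng(seed)
    n = len(target)

    def ce_with(cand):
        return gaussian_ce(target, np.column_stack([selected, cand]))

    ce_orig = ce_with(candidate)
    ce_surr = np.empty(n_surr)
    for i in range(n_surr):
        # Circular shift of at least min_lag samples in either direction.
        lag = rng.integers(min_lag, n - min_lag + 1)
        ce_surr[i] = ce_with(np.roll(candidate, lag))
    # A lower CE means the candidate carries more information about the target.
    return bool(ce_orig < np.quantile(ce_surr, alpha))

# Simulated example: the target depends on the previous sample of x.
rng = np.random.default_rng(1)
N = 1000
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + rng.standard_normal()

target = y[1:]
candidate = x[:-1]                   # x one step in the past
selected = np.empty((len(target), 0))  # nothing selected yet
result = candidate_is_significant(target, selected, candidate)
print(result)
```

Shifting the candidate rather than shuffling it preserves its own autocorrelation while destroying its temporal alignment with the target, which is the reason for the minimum lag.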

The statistical significance of the TE estimated through the UE approach was assessed by the parametric F-test for the null hypothesis that the coefficients in $A$ (the unrestricted regression) which weigh the driving variable are all zero, Brandt (2007). In our case, the test statistic is $F = \frac{(RSS' - RSS)/d}{RSS/(N - q)}$, where $RSS'$ and $RSS$ are the residual sums of squares of the restricted and the unrestricted model, $d$ is the number of coefficients set to zero under the null hypothesis, $q$ is the total number of coefficients in the unrestricted model, and $N$ is the time series length. The TE is considered statistically significant if $F$ is larger than the critical value of the Fisher distribution with $(d, N - q)$ degrees of freedom at the significance level $\alpha$.
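As an illustration of the F-test on nested regressions, the sketch below computes the statistic and its degrees of freedom with NumPy. The function names are hypothetical, the example is bivariate (no conditioning variable) to keep it short, no intercept is fitted because the simulated data are zero-mean, and the number of usable rows after embedding is used in place of the series length.

```python
import numpy as np

def lagged(series, d):
    """Return the (N-d, d) matrix whose row for time n is [s[n-1], ..., s[n-d]]."""
    n = len(series)
    return np.column_stack([series[d - k : n - k] for k in range(1, d + 1)])

def rss(target, regressors):
    """Residual sum of squares of the least-squares fit of target on regressors."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    resid = target - regressors @ coef
    return float(resid @ resid)

def nested_f_statistic(target, full, reduced):
    """F statistic for H0: the coefficients present in `full` but not in `reduced` are zero."""
    n_obs, q = full.shape              # usable rows, coefficients in the full model
    dfn = q - reduced.shape[1]         # number of coefficients tested
    dfd = n_obs - q                    # residual degrees of freedom
    rss_u, rss_r = rss(target, full), rss(target, reduced)
    f_value = ((rss_r - rss_u) / dfn) / (rss_u / dfd)
    return f_value, dfn, dfd

# Simulated example: x drives y with a one-sample delay.
rng = np.random.default_rng(2)
N, d = 1000, 2
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + rng.standard_normal()

target = y[d:]
reduced = lagged(y, d)                           # past of the target only
full = np.column_stack([lagged(x, d), reduced])  # past of driver and target
F, dfn, dfd = nested_f_statistic(target, full, reduced)
print(F, dfn, dfd)
```

The resulting `F` would then be compared with the critical value of the Fisher distribution, which can be obtained for example from `scipy.stats.f.ppf(1 - alpha, dfn, dfd)` if SciPy is available.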

**Bibliography**

- Barrett AB, Barnett L, Seth AK (2010). Multivariate Granger causality and generalized variance. In: Phys Rev E, 81 (4), pp. 041907.
- Barnett L, Barrett AB, Seth AK (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. In: Phys Rev Lett, 103 (23), pp. 238701.
- Schwarz G (1978). Estimating the dimension of a model. In: Ann Stat, 6 (2), pp. 461–464.
- Quian Quiroga R, Kraskov A, Kreuz T, Grassberger P (2002). Performance of different synchronization measures in real data: a case study on electroencephalographic signals. In: Phys Rev E, 65 (4), pp. 041903.
- Brandt PT, Williams JT (2007). Multiple Time Series Models. SAGE Publications, ISBN: 9781412906562.