## Nearest Neighbor Estimator

**NB**: for the formalism needed to understand what follows, please go to the MuTE page and read the **SOME FORMALISM** section.

Since its first introduction in Cover (1967), the nearest neighbours method has been shown to be a powerful non-parametric technique for classification, density estimation, and regression estimation. This method can be used to estimate the entropy of a $d$-dimensional random variable $X$, $H(X)$, starting from a random sample of $N$ realizations of $X$. Following the reasoning in Kraskov (2004), we consider the probability distribution $P_k(\epsilon)$ for the distance $\epsilon$ between $x_i$ and

its $k$th nearest neighbour. The probability $P_k(\epsilon)\,d\epsilon$ is equal to the chance that there is one point within distance $[\epsilon/2, \epsilon/2 + d\epsilon/2]$ from $x_i$, that there are $k-1$ other points at smaller distances, and that $N-k-1$ points have larger distances from $x_i$. Let $p_i$ be the mass of the $\epsilon$-sphere centered at $x_i$, $p_i(\epsilon) = \int_{\|\xi - x_i\| < \epsilon/2} \mu(\xi)\, d\xi$. Then, the expectation value of $\log p_i(\epsilon)$ is

\begin{equation}\label{eq:elog}
\mathbb{E}\left[\log p_i\right] = \int_0^{\infty} P_k(\epsilon) \log p_i(\epsilon)\, d\epsilon = \psi(k) - \psi(N)
\end{equation}

where $P_k(\epsilon)$ is evaluated through the trinomial formula and $\psi$ is the digamma function. The expectation is taken here over the positions of all other $N-1$ points, with $x_i$ kept fixed. An estimator for $H(X)$ is then obtained by assuming that $\mu(x)$ is constant in the entire $\epsilon$-sphere. The latter gives

\begin{equation}\label{eq:p}
p_i(\epsilon) \approx c_d\, \epsilon^d\, \mu(x_i)
\end{equation}

where $d$ is the dimension of $x$ and $c_d$ is the volume of the $d$-dimensional unit sphere. For the maximum norm one has simply $c_d = 1$, while $c_d = \pi^{d/2} / \Gamma(1 + d/2) / 2^d$ for the Euclidean norm. From \eqref{eq:elog} and \eqref{eq:p} we can evaluate $\log \mu(x_i)$ and finally, denoting by $\epsilon(i)$ twice the distance from $x_i$ to its $k$th nearest neighbour:

\begin{equation}\label{eq:Hknn}
\hat{H}(X) = -\psi(k) + \psi(N) + \log c_d + \frac{d}{N} \sum_{i=1}^{N} \log \epsilon(i)
\end{equation}
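As a concrete illustration, the entropy estimator above can be sketched in a few lines of Python using scipy's `cKDTree`. This is a minimal sketch assuming the maximum norm (so $\log c_d = 0$); the function name `kl_entropy` and the default $k = 4$ are illustrative choices, not part of the original description:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(x, k=4):
    """Nearest-neighbor (Kozachenko-Leonenko) estimate of H(X) in nats,
    using the maximum norm so that c_d = 1."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # query k+1 neighbours: the first hit is the query point itself
    dist = tree.query(x, k=k + 1, p=np.inf)[0][:, -1]
    eps = 2.0 * dist  # eps(i) is twice the distance to the k-th neighbour
    return -digamma(k) + digamma(n) + d * np.mean(np.log(eps))
```

For a standard Gaussian sample the estimate should approach the analytical value $\tfrac{1}{2}\log(2\pi e) \approx 1.42$ nats as $N$ grows.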

The NN estimator deals with the bias arising in the estimation of multiple entropies for vector variables of different dimensions by computing entropy sums through a neighbor search in the space of higher dimension and range searches in the projected sub-spaces of lower dimensions, Kraskov (2004). This approach can be fruitfully exploited for the estimation of the TE, as previously done, e.g., in Wibral (2011) and Vicente (2011). To do this, we first rewrite the expression for TE given in the **SOME FORMALISM** section in terms of the components of the embedding vector $V_n = [V_n^X, V_n^Y, V_n^Z]$ spanning a space of dimension $d$:

\begin{equation}\label{eq:teNN}
TE_{X \to Y} = H(Y_n, V_n^Y, V_n^Z) - H(V_n^Y, V_n^Z) - H(Y_n, V_n) + H(V_n)
\end{equation}

The term $H(Y_n, V_n)$ is estimated through a neighbor search in the $(d+1)$-dimensional space, while the three other terms are estimated through range searches in the sub-spaces of dimension $d$, $d - d_X + 1$ and $d - d_X$ ($d_X$ being the dimension of $V_n^X$) spanned by $V_n$, $(Y_n, V_n^Y, V_n^Z)$ and $(V_n^Y, V_n^Z)$, respectively. Accordingly, adaptation of \eqref{eq:Hknn} to the four terms in \eqref{eq:teNN} yields the equation for TE based on the nearest neighbor estimator:

\begin{equation}
TE_{X \to Y} = \psi(k) + \left\langle \psi(N_{V_n^Y V_n^Z} + 1) - \psi(N_{V_n} + 1) - \psi(N_{Y_n V_n^Y V_n^Z} + 1) \right\rangle
\end{equation}

where $N_{V_n}$, $N_{Y_n V_n^Y V_n^Z}$ and $N_{V_n^Y V_n^Z}$ are the numbers of points whose distance from $V_n$, $(Y_n, V_n^Y, V_n^Z)$ and $(V_n^Y, V_n^Z)$, respectively, is strictly less than the distance from $(Y_n, V_n)$ to its $k$th neighbor.
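The neighbor-plus-range-search scheme just described can be sketched as follows. This is a minimal illustration assuming the maximum norm and column-stacked arrays; the function name `te_knn` and the argument layout are assumptions of this sketch, not the MuTE implementation:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def te_knn(y_fut, v_x, v_rest, k=4):
    """TE estimate via neighbor search in the full space and range searches
    in the projected sub-spaces. y_fut, v_x, v_rest are (N, dim) arrays
    holding the samples of Y_n, V^X and (V^Y, V^Z)."""
    full = np.hstack([y_fut, v_x, v_rest])  # (Y_n, V_n): highest-dimensional space
    # distance to the k-th neighbour in the full space (maximum norm)
    eps = cKDTree(full).query(full, k=k + 1, p=np.inf)[0][:, -1]
    radii = np.nextafter(eps, 0)            # shrink to count strictly-closer points

    def count(pts):
        tree = cKDTree(pts)
        return np.array([len(tree.query_ball_point(pt, r, p=np.inf)) - 1
                         for pt, r in zip(pts, radii)])

    n_v     = count(np.hstack([v_x, v_rest]))    # range search in the space of V_n
    n_yrest = count(np.hstack([y_fut, v_rest]))  # ... of (Y_n, V^Y, V^Z)
    n_rest  = count(v_rest)                      # ... of (V^Y, V^Z)
    return digamma(k) + np.mean(digamma(n_rest + 1) - digamma(n_v + 1)
                                - digamma(n_yrest + 1))
```

On a unidirectionally coupled pair (e.g. $y_n = 0.5\,x_{n-1} + \text{noise}$ with $x$ white), the estimate should be clearly positive in the driving direction and close to zero in the reverse one.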

In the NUE implementation of the NN estimator, maximization of the mutual information between the component $W$ selected at the $k$th step and the target variable $Y_n$ (step (*) of the algorithm) was obtained in terms of maximization of the conditional mutual information $I(Y_n; W \mid V^{(k-1)})$, which was computed with the approach described above of estimating the four relevant entropies through a neighbor search in the space of higher dimension and range searches in the projected sub-spaces of lower dimensions. Moreover, the randomization procedure applied to test candidate significance consisted in shuffling randomly and independently both the points of $Y_n$ and those of the selected component $W$. These techniques have recently been shown to be optimal for the selection of candidates in a non-uniform embedding approach using nearest neighbor entropy estimators, Kugiumtzis (2013).
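A sketch of this randomization step is given below. The helper name `candidate_significant` and its `cmi_func` argument are hypothetical; any estimator of the conditional mutual information can be supplied:

```python
import numpy as np

def candidate_significant(cmi_func, y_fut, cand, v_prev,
                          n_surr=100, alpha=0.05, seed=0):
    """Shuffle test for candidate significance: destroy the coupling by
    permuting the target and candidate points randomly and independently,
    then compare the original statistic with the surrogate distribution."""
    rng = np.random.default_rng(seed)
    cmi0 = cmi_func(y_fut, cand, v_prev)
    surr = np.array([cmi_func(rng.permutation(y_fut),
                              rng.permutation(cand), v_prev)
                     for _ in range(n_surr)])
    # the candidate is retained if the original value exceeds
    # the (1 - alpha) quantile of the surrogate distribution
    return cmi0 > np.quantile(surr, 1.0 - alpha)
```

For a strongly coupled pair the original statistic should exceed the surrogate quantile, so the candidate is retained; for independent data it should typically fall inside the surrogate distribution and be rejected.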

**Bibliography**

- Cover, T. M.; Hart, P. E.: Nearest neighbor pattern classification. In: IEEE Trans Inf Theory, 13 (1), pp. 21–27, 1967.
- Kraskov, A.; Stögbauer, H.; Grassberger, P.: Estimating mutual information. In: Phys Rev E, 69 (6), pp. 066138, 2004.
- Wibral, M.; Rahm, B.; Rieder, M.; Lindner, M.; Vicente, R.; Kaiser, J.: Transfer entropy in magnetoencephalographic data: Quantifying information flow in cortical and cerebellar networks. In: Prog Biophys Mol Biol, 105 (1), pp. 80–97, 2011.
- Vicente, R.; Wibral, M.; Lindner, M.; Pipa, G.: Transfer entropy – a model-free measure of effective connectivity for the neurosciences. In: J Comput Neurosci, 30 (1), pp. 45–67, 2011.
- Kugiumtzis, D.: Direct-coupling information measure from nonuniform embedding. In: Phys Rev E, 87, pp. 062918, 2013.