## Non-Uniform Embedding

NB: for the formalism needed to understand what follows, please see the "Some Formalism" section of the MuTE page.

Non-uniform embedding (NUE) can be a convenient alternative to uniform embedding (UE). The approach is based on the progressive selection, from a set of candidate variables including the past of the source $X$, the target $Y$, and the remaining processes $Z$ considered up to a maximum lag (the candidate set $\Omega$), of the lagged variables that are most informative about the target variable $Y_n$. At each step, selection maximizes the amount of information about $Y_n$ that can be explained by observing the variables chosen, each with its specific lag, up to the current step. This results in a criterion of maximum relevance and minimum redundancy for candidate selection, so that the resulting embedding vector $V = [V_n^X, V_n^Y, V_n^Z]$ includes only the components of the past of $X$, $Y$, and $Z$ that contribute most to the description of $Y_n$. Given the candidate set, the procedure consists of the main steps described in the following pseudocode:

1. Get the matrix with all the candidate terms $\Omega = [X_{n-1}, \ldots, X_{n-l_X}, Y_{n-1}, \ldots, Y_{n-l_Y}, Z_{n-1}, \ldots, Z_{n-l_Z}]$, with $l_X$, $l_Y$, $l_Z$ representing the maximum lag considered for the past variables of the observed processes;
2. Run the procedure to select the most informative past variables and the optimal embedding vector:
   1. Initialize an empty embedding vector $V_n^{(0)} = [\,\cdot\,]$
   2. At the $k$-th iteration, where $k$ runs up to the maximum number of candidates in $\Omega$, after having chosen $k-1$ candidates collected in the vector $V_n^{(k-1)}$: for $1 \le i \le$ number of current candidate terms
      - add the $i$-th term of $\Omega$, $W_n^{(i)}$, to a copy of $V_n^{(k-1)}$ to form the temporary storage variable $V' = [W_n^{(i)}, V_n^{(k-1)}]$
      - compute the mutual information between $Y_n$ and $V'$, estimating the probability density function according to the chosen estimator
   3. Among the tested $W_n^{(i)}$, select the term $W'_n$ which maximizes the mutual information
   4. If $W'_n$ fulfills a test for candidate significance, put it in the embedding vector, $V_n^{(k)} = [W'_n, V_n^{(k-1)}]$, delete it from $\Omega$ and increment $k$ by 1
   5. Else the procedure ends, returning $V = V_n^{(k-1)}$
3. Use $Y_n$ and the full embedding vector $V$ to evaluate the third and fourth entropy values of the TE expression (equation (2)) and, consequently, the lowest CE term (CE2)
4. Take the subset of $V$ without the past states belonging to the source process, to evaluate the first and the second term of the TE expression (equation (2)) and, consequently, the highest CE term (CE1)
5. Compute TE as the difference CE1 − CE2
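The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MuTE implementation: it assumes discrete-valued series and a plug-in (histogram) entropy estimator, and it replaces the significance test with a hypothetical fixed threshold `min_gain` on the information gain. All function names (`entropy`, `cond_entropy`, `build_candidates`, `nue_select`, `transfer_entropy`) and the toy data are assumptions introduced for the example.

```python
import math
import random
from collections import Counter

def entropy(columns):
    """Plug-in (histogram) joint entropy, in nats, of equally long discrete columns."""
    if not columns:
        return 0.0
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * math.log(c / n) for c in Counter(rows).values())

def cond_entropy(y, columns):
    """H(y | columns) = H(y, columns) - H(columns)."""
    return entropy([y] + columns) - entropy(columns)

def build_candidates(series, target, max_lag):
    """Step 1: align the target samples with every lagged copy of every process."""
    n = len(series[target])
    y = series[target][max_lag:]
    omega = {(name, lag): values[max_lag - lag:n - lag]
             for name, values in series.items()
             for lag in range(1, max_lag + 1)}
    return y, omega

def nue_select(y, omega, min_gain=0.01):
    """Step 2: greedy selection. `min_gain` is a placeholder stopping rule
    (an assumption of this sketch), standing in for the significance test."""
    remaining, selected = dict(omega), {}
    while remaining:
        v = list(selected.values())
        # Maximizing I(y; [w, V]) over w is the same as minimizing H(y | w, V).
        best = min(remaining, key=lambda k: cond_entropy(y, [remaining[k]] + v))
        gain = cond_entropy(y, v) - cond_entropy(y, [remaining[best]] + v)  # I(y; w | V)
        if gain < min_gain:
            break
        selected[best] = remaining.pop(best)
    return selected

def transfer_entropy(y, selected, source):
    """Steps 3-5: CE2 uses the full embedding, CE1 drops the source terms."""
    ce2 = cond_entropy(y, list(selected.values()))
    ce1 = cond_entropy(y, [col for (name, _), col in selected.items() if name != source])
    return ce1 - ce2

# Toy data (hypothetical): Y copies X with a delay of 2 samples, flipped 10% of the time.
rng = random.Random(0)
N = 5000
x = [rng.randint(0, 1) for _ in range(N)]
z = [rng.randint(0, 1) for _ in range(N)]
y = [x[t - 2] ^ (rng.random() < 0.1) if t >= 2 else rng.randint(0, 1) for t in range(N)]
series = {"X": x, "Y": y, "Z": z}

y_n, omega = build_candidates(series, "Y", max_lag=3)
selected = nue_select(y_n, omega)
te_x_to_y = transfer_entropy(y_n, selected, source="X")

x_n, omega_x = build_candidates(series, "X", max_lag=3)
selected_x = nue_select(x_n, omega_x)
te_y_to_x = transfer_entropy(x_n, selected_x, source="Y")
```

In this toy setting the only candidate that should survive is the source at lag 2, so the TE from X to Y is positive, while in the reverse direction no candidate is selected and CE1 and CE2 coincide.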

As described above, candidate selection is performed by maximizing the mutual information between the target variable and the vector formed by the candidates already selected together with the tested candidate. As we will see in the following sections, the practical implementation of this general criterion results in optimizing different quantities (i.e., the conditional entropy or the conditional mutual information, depending on the estimator chosen). This is because the use of quantities chosen ad hoc for each specific estimator has been shown to yield optimal performance in the reconstruction of the optimal embedding for an assigned target process (Kugiumtzis, 2013).
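The link between these quantities follows from the chain rule for mutual information. As a short worked derivation (the notation here is assumed for illustration: $Y_n$ is the target variable, $V$ the vector of candidates already selected, and $W$ the tested candidate):

$$
I(Y_n; [W, V]) = I(Y_n; V) + I(Y_n; W \mid V) = H(Y_n) - H(Y_n \mid W, V)
$$

Since $I(Y_n; V)$ and $H(Y_n)$ do not depend on $W$, maximizing the mutual information $I(Y_n; [W, V])$ over the candidates is equivalent to maximizing the conditional mutual information $I(Y_n; W \mid V)$ and to minimizing the conditional entropy $H(Y_n \mid W, V)$. The three criteria agree in theory; they may differ in practice only through the behavior of the estimator used.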

In sub-step 4 of step 2, the test for candidate significance at the $k$-th iteration compares the conditional mutual information between the target variable $Y_n$ and the selected candidate $W'_n$, given the candidates $V_n^{(k-1)}$ selected up to the $(k-1)$-th iteration, $I(Y_n; W'_n \mid V_n^{(k-1)})$, with its null distribution, built empirically by applying a suitable randomization procedure to the points of $W'_n$. The test is fulfilled if the original measure $I(Y_n; W'_n \mid V_n^{(k-1)})$ is above the $100(1-\alpha)$-th percentile of its null distribution, where $\alpha$ is the chosen significance level. To maximize detection accuracy, the adopted randomization procedure varies for each estimator and is therefore described in the relevant section.
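A minimal sketch of such a test is given below. It assumes discrete-valued data, a plug-in entropy estimator, and a plain random shuffle of the candidate samples as the randomization procedure; the actual procedure depends on the estimator, so the shuffle is only a stand-in. The names `cmi` and `candidate_is_significant`, and the parameters `alpha` and `n_surrogates`, are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(columns):
    """Plug-in (histogram) joint entropy, in nats, of equally long discrete columns."""
    if not columns:
        return 0.0
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * math.log(c / n) for c in Counter(rows).values())

def cmi(y, w, v_cols):
    """I(y; w | V) = H(y, V) - H(V) - H(y, w, V) + H(w, V)."""
    return (entropy([y] + v_cols) - entropy(v_cols)
            - entropy([y, w] + v_cols) + entropy([w] + v_cols))

def candidate_is_significant(y, w, v_cols, alpha=0.05, n_surrogates=200, rng=None):
    """Compare the observed CMI with a null distribution built from shuffled candidates.

    Shuffling w destroys its temporal relation with y (assumed randomization scheme).
    """
    rng = rng or random.Random()
    observed = cmi(y, w, v_cols)
    null = []
    for _ in range(n_surrogates):
        w_shuffled = w[:]
        rng.shuffle(w_shuffled)
        null.append(cmi(y, w_shuffled, v_cols))
    null.sort()
    # Index of the 100(1 - alpha)-th percentile in the sorted null values.
    idx = min(n_surrogates - 1, math.ceil((1 - alpha) * n_surrogates) - 1)
    return observed > null[idx]

# Toy data (hypothetical): y is a noisy copy of x, flipped 10% of the time.
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(1000)]
y = [xi ^ (rng.random() < 0.1) for xi in x]

observed_cmi = cmi(y, x, [])
significant = candidate_is_significant(y, x, [], rng=rng)
```

With a strongly coupled candidate such as `x` here, the observed value lies far above every surrogate value, so the test is fulfilled. For an unrelated candidate the test would be passed by chance in roughly a fraction `alpha` of cases.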

In summary, non-uniform embedding is a feature selection technique that chooses, among the available variables describing the past of the observed processes, those that are most significant for the target variable in the sense of predictive information. Moreover, because variables are included in the embedding vector only if they make a statistically significant contribution to the description of the target, the statistical significance of the TE estimated with the NUE approach follows simply from the selection of at least one lagged component of the source process. In other words, if at least one lagged component of the source process is selected by NUE, the estimated TE is strictly positive and can be assumed to be statistically significant. If this is not the case, the estimated TE is exactly zero and is assumed to be non-significant.
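This can be made explicit with a short derivation. The notation is assumed for illustration: $V^X$, $V^Y$, $V^Z$ denote the selected components belonging to the source, the target, and the remaining processes, and $Y_n$ is the target variable.

$$
\mathrm{TE} = \mathrm{CE1} - \mathrm{CE2} = H(Y_n \mid V^Y, V^Z) - H(Y_n \mid V^X, V^Y, V^Z) = I(Y_n; V^X \mid V^Y, V^Z)
$$

If $V^X$ is empty, the two conditional entropies are computed on the same variables and cancel exactly, so $\mathrm{TE} = 0$. If $V^X$ is not empty, each of its components entered the embedding only after passing the test for candidate significance, which is why the resulting TE is taken as significant without a further test.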

## Bibliography

1. Kugiumtzis, D.: Direct-coupling information measure from nonuniform embedding. Phys. Rev. E 87, 062918 (2013).