Non-Uniform Embedding

NB: for the formalism needed to understand what follows, please go to the MuTE page and read the SOME FORMALISM section.

Non-uniform embedding (NUE) can be a convenient alternative to UE. This approach is based on the progressive selection, from a set of candidate variables including the past of X, Y, and \mathbf{Z} considered up to a maximum lag (\textit{candidate set}), of the lagged variables that are most informative about the target variable Y_n. At each step, selection is performed by maximizing the amount of information that can be explained about Y by observing the variables, with their specific lags, considered up to the current step. This yields a criterion of maximum relevance and minimum redundancy for candidate selection, so that the resulting embedding vector V=[V_n^X\, V_n^Y\, V_n^Z] includes only the components of X_n^-, Y_n^- and \mathbf{Z}_n^- that contribute most to the description of Y_n. Given the candidate set, the procedure consists of the main steps described in the following pseudo code:

  1. Get the matrix with all the candidate terms, MC =[X_{n-1} \ldots X_{n-l_X}\, Y_{n-1} \ldots Y_{n-l_Y}\, \mathbf{Z}_{n-1} \ldots \mathbf{Z}_{n-l_Z}], where l_X, l_Y, l_Z are the maximum lags considered for the past variables of the observed processes;
  2. Run the procedure to select the most informative past variables and the optimal embedding vector:
    1. Initialize an empty embedding vector V_n^{(0)}
    2. At the k-th iteration, where k runs up to the maximum number of candidates in MC, after k-1 candidates have been chosen and collected in the vector V_n^{(k-1)}: for 1 \leq i \leq number of remaining candidate terms
      • add the i-th term of MC, W_n^{(i)}, to a copy of V_n^{(k-1)} to form the temporary vector V_n' = [W_n^{(i)}\, V_n^{(k-1)}]
      • compute the mutual information between Y_n and V_n', estimating the probability density function according to the chosen estimator
    3. Among the tested W_n^{(i)}, select the term \hat{W}_n which maximizes the mutual information
    4. If \hat{W}_n fulfills a test for candidate significance, put it in the embedding vector, V_n^{(k)}=[\hat{W}_n\, V_n^{(k-1)}], delete it from MC and increment k by 1 (^*)
    5. Otherwise, the procedure ends, returning V=V_n^{(k-1)}
  3. Use Y_n and the full embedding vector V=[V_n^X\, V_n^Y\, V_n^Z] to evaluate the third and fourth entropy terms of equation (2) and, consequently, the lower CE term (CE2)
  4. Take the subset of V without the past states belonging to the source process, [V_n^Y\, V_n^Z], to evaluate the first and second terms of equation (2) and, consequently, the higher CE term (CE1)
  5. Compute TE as the difference CE1 – CE2
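The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the MuTE implementation: it uses a linear-Gaussian estimator (conditional entropies computed from residual variances of linear regressions), considers only a bivariate X, Y pair, and replaces the randomization-based significance test with a simple information-gain threshold; the names `nue_te`, `resid_var`, `min_gain` and `max_terms` are all hypothetical.

```python
import numpy as np

def resid_var(y, V):
    """Residual variance of y after linear regression on the columns of V
    (for jointly Gaussian variables, H(y|V) = 0.5*log(2*pi*e*resid_var))."""
    A = np.column_stack([V, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.var(y - A @ beta)

def nue_te(x, y, max_lag=3, min_gain=1e-3, max_terms=5):
    """Greedy non-uniform embedding and TE from x to y.
    Returns (TE, list of selected (process, lag) terms)."""
    n = len(y) - max_lag
    target = y[max_lag:]
    # step 1: candidate matrix MC with all lagged terms of x and y
    cands = {(name, lag): s[max_lag - lag:len(s) - lag]
             for name, s in (('x', x), ('y', y)) for lag in range(1, max_lag + 1)}
    selected, V = [], np.empty((n, 0))
    # step 2: progressive selection of the most informative candidates
    while cands and len(selected) < max_terms:
        scores = {k: resid_var(target, np.column_stack([V, w])) for k, w in cands.items()}
        best = min(scores, key=scores.get)
        # information gain of the best candidate (Gaussian CMI); a fixed
        # threshold stands in here for the randomization-based significance test
        gain = 0.5 * np.log(resid_var(target, V) / scores[best])
        if gain < min_gain:
            break
        selected.append(best)
        V = np.column_stack([V, cands.pop(best)])
    # steps 3-5: TE = CE1 - CE2 = H(Y_n | V^Y, V^Z) - H(Y_n | V)
    V_nox = V[:, [i for i, k in enumerate(selected) if k[0] != 'x']]
    ce1 = 0.5 * np.log(2 * np.pi * np.e * resid_var(target, V_nox))
    ce2 = 0.5 * np.log(2 * np.pi * np.e * resid_var(target, V))
    return ce1 - ce2, selected
```

Note that when no component of x is selected, V_nox coincides with V and the estimated TE is exactly zero.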

As described above, candidate selection is performed by maximizing the mutual information between the target variable and the vector formed by the previously selected candidates together with the tested candidate. As we will see in the following sections, the practical implementation of this general criterion amounts to optimizing different quantities (i.e., the conditional entropy or the conditional mutual information, depending on the estimator chosen). This is because the use of quantities chosen ad hoc for each specific estimator has been shown to yield optimal performance in reconstructing the optimal embedding for an assigned target process, Kugiumtzis (2013).

At step (^*), the test for candidate significance performed at the k-th step compares the conditional mutual information between the target variable and the selected candidate given the candidates previously selected up to the (k-1)-th step, I(Y_n;\hat{W}_n|V_n^{(k-1)}), with its null distribution, built empirically by means of a suitable randomization procedure applied to the points of \hat{W}_n. The test for candidate significance is fulfilled if the original measure I(Y_n;\hat{W}_n|V_n^{(k-1)}) lies above the 100(1-\alpha)^{\mbox{th}} percentile of its null distribution. Since the randomization procedure that maximizes detection accuracy varies with the estimator, it is described in the relevant section for each.
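The test at (^*) can be sketched as follows, again with a linear-Gaussian estimator of the conditional mutual information and with random permutations of the candidate's points as an illustrative randomization scheme (the actual scheme is estimator-specific, as noted above); `candidate_is_significant` and its parameters are hypothetical names.

```python
import numpy as np

def resid_var(y, V):
    """Residual variance of y after linear regression on the columns of V."""
    A = np.column_stack([V, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.var(y - A @ beta)

def candidate_is_significant(y, V_prev, w, n_surr=100, alpha=0.05, seed=0):
    """Compare I(y; w | V_prev) with a null distribution built by
    randomly permuting the points of the candidate w."""
    rng = np.random.default_rng(seed)
    # Gaussian CMI from the drop in residual variance when w is added
    cmi = lambda col: 0.5 * np.log(resid_var(y, V_prev) /
                                   resid_var(y, np.column_stack([V_prev, col])))
    observed = cmi(w)
    null = np.array([cmi(rng.permutation(w)) for _ in range(n_surr)])
    # fulfilled if the original measure exceeds the 100(1-alpha)th percentile
    return observed > np.quantile(null, 1 - alpha)
```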

Summarizing, non-uniform embedding is a sort of feature selection technique that chooses, among the available variables describing the past of the observed processes, those that are most significant – in the sense of predictive information – for the target variable. Moreover, since variables are included in the embedding vector only if they make a statistically significant contribution to the description of the target, the statistical significance of the TE estimated with the NUE approach follows simply from the selection of at least one lagged component of the source process. In other words, if at least one component from X is selected by NUE, the estimated TE is strictly positive and can be assumed to be statistically significant. If this is not the case, the estimated TE is exactly zero and is assumed to be non-significant.


  1. Kugiumtzis, D.: Direct-coupling information measure from nonuniform embedding. Phys. Rev. E 87, 062918 (2013).
