[Project CETI] This paper proposes a methodology for discovering meaningful properties in data by exploring the latent space of unsupervised deep generative models [fiwGAN, an InfoGAN]. We combine manipulation of individual latent variables to extreme values with methods inspired by causal inference into an approach we call causal disentanglement with extreme values (CDEV) and show that this method yields insights for model interpretability. With this, we can test for what properties of unknown data the model encodes as meaningful, using it to glean insight into the communication system of sperm whales (Physeter macrocephalus), one of the most intriguing and understudied animal communication systems.
The network architecture used here has been shown to learn meaningful representations of speech; we employ it as a learning mechanism to decipher the properties of another vocal communication system, one for which we have no ground truth. The proposed methodology suggests that sperm whales encode information using the number of clicks in a sequence, the regularity of their timing, and audio properties such as the spectral mean and the acoustic regularity of the sequences. Some of these findings are consistent with existing hypotheses, while others are proposed for the first time.
We also argue that our models uncover rules that govern the structure of units in the communication system and apply them while generating innovative data not shown during training. This paper suggests that an interpretation of the outputs of deep neural networks with causal inference methodology can be a viable strategy for approaching data about which little is known and presents another case of how deep learning can limit the hypothesis space.
Finally, the proposed approach can be extended to other architectures and datasets.
Beguš (2020) proposes a technique to uncover individual latent variables with linguistic meaning by setting them to extreme values outside those seen in training and interpolating from there. In this work, we conversely treat the generator as an experiment in the vein of causal inference and test for observable properties of the data that have been (or can be) hypothesized to be meaningful.
When generating output samples, the incompressible noise X is sampled randomly, while the featural encoding t is set manually to a desired value. Because consistency of the output with respect to the encoding is only loosely enforced during training, the relationship often becomes readily apparent only when the numerical values are set outside the bounds seen in training, where the primary associated effect begins to dominate [2, 3, 5].
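The generation step can be sketched as follows. The generator here is a hypothetical stand-in (a fixed random linear map), not the actual trained fiwGAN; it only illustrates how the noise is resampled while one bit of the featural code is pushed far beyond its training range:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained generator: a fixed random linear
# map from (noise, code) to a 128-sample "waveform". Illustration only.
W = rng.standard_normal((105, 128))

def generate(z, c):
    return np.tanh(np.concatenate([z, c]) @ W)

# The incompressible noise z is sampled randomly for every output...
z = rng.standard_normal(100)

# ...while the 5-bit featural code is set manually. Training only ever
# sees binary values (0/1); at generation time one bit is pushed far
# outside that range so its associated effect comes to dominate.
c_in_range = np.array([0., 1., 0., 0., 0.])   # value seen in training
c_extreme  = np.array([0., 15., 0., 0., 0.])  # extreme-value setting

x_in_range = generate(z, c_in_range)
x_extreme  = generate(z, c_extreme)
```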
We then apply statistical estimators to the candidate-property measurements derived from the raw generated outputs to determine whether a statistically consistent relationship exists between the encoding and the outcomes.
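The estimation step can be illustrated with a minimal sketch. The measurements below are simulated stand-ins for properties extracted from generated audio (e.g., a click count), and all numbers are hypothetical; the point is only the shape of the test, regressing a candidate property on the manually set code value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: for each manually set value of one code bit
# (including extreme values outside the 0/1 training range), generate 50
# outputs and measure a candidate property, here a simulated click count.
code_values = np.repeat(np.arange(-5, 16, 5), 50)   # -5, 0, 5, 10, 15
clicks = 3 + 0.4 * code_values + rng.normal(0, 1, code_values.size)

# Fit a simple linear model (property ~ code value) and check whether the
# relationship is statistically consistent across the extreme range.
slope, intercept = np.polyfit(code_values, clicks, 1)
r = np.corrcoef(code_values, clicks)[0, 1]
print(f"slope={slope:.2f}, r={r:.2f}")
```

In the actual method, such an estimator would be applied per latent variable and per candidate property, with only consistently strong relationships interpreted as encodings.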
This procedure gives rise to a methodology we call causal disentanglement with extreme values (CDEV) (Figure 2b).
Figure 2: Model and approach overview.
The space available for the featural encodings is limited; hence, finding real-world attributes that map almost one-to-one onto the encodings suggests that the generator treats them as very important for producing convincing outputs, a task on which it is checked by the discriminator, which has access to real data. We limit our featural encoding space to 5 bits (2^5 = 32 classes) for the 5 coda types present in the data, allowing the model to capture compositionality. However, we demonstrate in Appendix A.1.1 that the method is robust to the number of bits chosen, as well as to model-training specifics. Similarly, on language data, the architecture uncovers meaningful properties even when there is a mismatch between the number of true meaningful classes and the size of the binary code (Beguš 2021). Therefore, any intuition about the desired size of the encoding acts only as a rough prior. Additionally, the uniqueness of the encoding matches is verified by an unrelated method, presented in Appendix A.7.
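For reference, the 5-bit code space can be enumerated directly; the compositional reading of individual bits in the comment is a hypothetical illustration, not a claim about which bits the model actually uses:

```python
from itertools import product

import numpy as np

# A 5-bit featural code yields 2**5 = 32 classes, more than the 5 coda
# types in the data, leaving room for compositional encodings in which
# individual bits map to individual properties.
codes = np.array(list(product([0, 1], repeat=5)))
print(codes.shape)  # (32, 5)

# Hypothetical compositional reading: if bit 0 encoded one property and
# bit 1 another, the class [1, 1, 0, 0, 0] would combine both.
combined = codes[np.all(codes[:, :2] == 1, axis=1)]
```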