Causality guides ML through change
Machine learning (ML), like statistics, is in essence a field concerned with estimating probability distributions from data. Within causal inference it serves as a tool for learning observed probability distributions, or properties thereof, in order to learn a directed acyclic graph (DAG; causal discovery) or to estimate an interventional distribution (cause-effect estimation). Since causal inference is, in effect, a language for describing the changes a system may undergo and for articulating how these changes affect the probability distribution of the system, there are many interesting links with subfields of machine learning that deal with learning from data generated in such changing environments.
Generalisability and robustness in ML deal with learning, from observed data, a probability distribution that remains valid for unobserved data. Insofar as changes in the data are caused by a changing underlying generating mechanism, causality provides the language to articulate these changes and to decide which parts, or factors, of the probability distribution need to be changed. With regard to changes resulting from differing sampling procedures, more recent developments extend this reasoning to account for selection effects introduced by the sampling mechanism itself.
Similarly, covariate shift and non-stationarity describe situations in which the distribution of subsets of observations generated at different times has changed. In the case of covariate shift, where the distribution of the inputs changes, knowing the underlying causal structure helps determine whether the predictive model should be modified in response. If the inputs correspond to causes and the outputs to effects, the modularity of the Functional Causal Model (FCM) dictates that the shift does not influence the conditional distribution of outputs given inputs. When the prediction task is anti-causal, in the sense that the inputs are effects and the outputs are causes, covariate shift does necessitate changing the prediction model.
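To make the distinction concrete, the following minimal sketch simulates both situations on synthetic linear-Gaussian data (the coefficients and shifts are invented for illustration): in the causal direction the fitted conditional survives the shift unchanged, while in the anti-causal direction it does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def fit_linear(x, y):
    """Return (slope, intercept) of the least-squares fit y ~ x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return round(slope, 2), round(intercept, 2)

# Causal direction: X -> Y. The mechanism P(Y | X) is a module of the FCM.
def sample_causal(mean_x):
    x = rng.normal(mean_x, 1.0, n)        # covariate shift: only P(X) changes
    y = 2.0 * x + rng.normal(0, 0.5, n)   # the mechanism itself is untouched
    return x, y

print("causal     :", fit_linear(*sample_causal(0.0)), fit_linear(*sample_causal(3.0)))
# -> identical fits (~(2.0, 0.0)): the predictor need not be changed.

# Anti-causal direction: Y -> X, yet we predict Y from X.
def sample_anticausal(mean_y, sd_y):
    y = rng.normal(mean_y, sd_y, n)
    x = 2.0 * y + rng.normal(0, 1.0, n)
    return x, y

print("anti-causal:", fit_linear(*sample_anticausal(0.0, 1.0)),
      fit_linear(*sample_anticausal(3.0, 0.5)))
# -> roughly (0.4, 0.0) vs (0.25, 1.5): the fitted E[Y | X] differs across
#    environments, so the prediction model must be updated.
```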
Something similar occurs in semi-supervised learning, where complementary information about the distribution of the inputs is used when learning the conditional distribution of outputs given inputs. This may be fruitful if the modelling is done in the anti-causal direction, since the marginal distribution of the inputs then carries information about the conditional being learned. However, if modularity is satisfied in the causal direction, the input distribution and the mechanism are independent modules, and the unlabelled data cannot help.
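A small sketch of why the anti-causal direction can benefit, assuming a binary cause Y with Gaussian class-conditionals for the effect X (the setup and the use of scikit-learn's GaussianMixture are illustrative choices): the marginal P(X) alone, estimated from unlabelled data, pins down the mixture components and with them the Bayes decision boundary for P(Y | X).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n = 5_000

# Anti-causal setup: the class label Y causes the feature X.
y = rng.integers(0, 2, n)                          # P(Y): fair coin
x = rng.normal(np.where(y == 0, -2.0, 2.0), 1.0)   # P(X | Y): two Gaussians

# Unlabelled inputs alone reveal the mixture structure of P(X).
gmm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))
print("component means from unlabelled X:", np.sort(gmm.means_.ravel()).round(2))
# -> roughly [-2, 2]; their midpoint 0 is the Bayes boundary for P(Y | X).
# In the causal direction (X -> Y), P(X) is a separate module of the FCM and
# carries no such information about the conditional P(Y | X).
```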
Transfer learning seeks to reuse what has been learned in one domain or task in another. From a causal standpoint, the modularity of FCMs indicates which mechanisms remain invariant across domains and may therefore be transferred, and which must be relearned.
Active learning concerns choosing which data points to label, or which experiments to perform, so as to learn most efficiently. Causal reasoning can guide these choices, for example by selecting the interventions that are most informative about the underlying DAG.
In reinforcement learning, an agent chooses actions in a stochastic environment to maximise its expected reward. Causal inference can guide the choice of actions so as to better explore (sample) the different interventional distributions implied by the FCM that governs the environment.
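As an illustrative sketch, consider a toy "causal bandit" (the environment, edge weights, and arm set below are invented for the example): an epsilon-greedy agent whose arms are do-interventions on different nodes of a small FCM effectively samples the corresponding interventional distributions and learns which intervention yields the highest reward.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy FCM: Z -> X -> R. Each action is a do-intervention on one variable.
def environment(action):
    """Draw one sample from the interventional distribution chosen by `action`."""
    z = rng.normal(0.0, 1.0)
    if action[0] == "do_z":
        z = action[1]                      # do(Z := v)
    x = 0.5 * z + rng.normal(0.0, 0.5)
    if action[0] == "do_x":
        x = action[1]                      # do(X := v) cuts the edge Z -> X
    return 2.0 * x + rng.normal(0.0, 0.5)  # reward R

# Epsilon-greedy exploration over a small set of interventions.
arms = [("do_x", -1.0), ("do_x", 1.0), ("do_z", -1.0), ("do_z", 1.0)]
counts, values = np.zeros(len(arms)), np.zeros(len(arms))
for t in range(5_000):
    a = rng.integers(len(arms)) if rng.random() < 0.1 else int(values.argmax())
    r = environment(arms[a])
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # running mean reward per arm

for arm, v in zip(arms, values):
    print(arm, round(v, 2))
# do(X := 1) wins (~2.0): intervening on Z is attenuated by the 0.5 edge
# weight, which the agent discovers by sampling the interventional
# distributions rather than the observational one.
```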
In ML, latent modelling techniques, such as variational autoencoders, posit unobserved variables that explain structure in the observed data. From a causal point of view, these latent variables may be read as unobserved causes in the FCM, and imposing causal structure on the latent space links such techniques to causal discovery in the presence of hidden confounders.
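A minimal numpy sketch of why latent variables matter causally, simulating an FCM with a hidden common cause (the variables and coefficients are invented for illustration): a latent confounder induces a strong observational association between two variables that have no causal link, which an intervention exposes.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# FCM with an unobserved common cause L of X and Y (and no edge X -> Y).
l = rng.normal(0, 1, n)            # latent: never observed by the learner
x = l + rng.normal(0, 0.5, n)
y = l + rng.normal(0, 0.5, n)

print("observed X-Y slope:", round(float(np.polyfit(x, y, 1)[0]), 2))  # ~0.8
# A strong observational association, although X has no causal effect on Y.

# Under do(X := 2), X stops listening to L, and the only path into Y is
# still through L, so the distribution of Y is unmoved:
y_do = l + rng.normal(0, 0.5, n)
print("mean of Y under do(X = 2):", round(float(y_do.mean()), 2))      # ~0.0
```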
Finally, hybrid and physically informed models attempt to incorporate a priori knowledge about the phenomenon at hand into ML algorithms that learn from data. From a causal point of view, this knowledge may be incorporated by keeping fixed, known subgraphs of the causal DAG. Alternatively, when mixed data from different environments are available, physics-style conservation restrictions may be introduced. In this case, several, possibly natural, interventions have occurred, and using the modularity assumption of FCMs we may enforce that the models corresponding to equations that were not intervened on remain invariant.
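A sketch of this invariance principle on synthetic data, in the spirit of invariant causal prediction (the two-environment setup below is invented for illustration): regressing on the true cause yields the same fit in every environment, because that mechanism was never intervened on, while regressing on an intervened effect does not.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_env(shift, n=50_000):
    """Two environments differ only by an intervention (`shift`) on X2.

    The mechanism Y := 1.5 * X1 + noise is never intervened on.
    """
    x1 = rng.normal(0, 1, n)
    y = 1.5 * x1 + rng.normal(0, 0.5, n)
    x2 = y + rng.normal(shift, 1.0, n)   # X2 is an effect of Y
    return x1, x2, y

for name, idx in [("X1 (cause)", 0), ("X2 (effect)", 1)]:
    fits = []
    for shift in (0.0, 3.0):             # observational vs. shifted environment
        data = sample_env(shift)
        slope, intercept = np.polyfit(data[idx], data[2], 1)
        fits.append((round(slope, 2), round(intercept, 2)))
    print(name, "->", fits)
# Y ~ X1 is identical across environments (modularity: an invariant module);
# Y ~ X2 changes with the intervention, so X2 is rejected as an invariant
# predictor. Enforcing such invariance keeps the non-intervened equations fixed.
```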