“Bayesian Workflow”, 2020-11-02:
The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using these models, along with many remaining challenges in computation. Using Bayesian inference to solve real-world problems requires not only statistical skills, subject matter knowledge, and programming, but also awareness of the decisions made in the process of data analysis. All of these aspects can be understood as part of a tangled workflow of applied Bayesian statistics. Beyond inference, the workflow also includes iterative model building, model checking, validation and troubleshooting of computational problems, model understanding, and model comparison. We review all these aspects of workflow in the context of several examples, keeping in mind that in practice we will be fitting many models for any given problem, even if only a subset of them will ultimately be relevant for our conclusions.
Introduction
From Bayesian inference to Bayesian workflow
Why do we need a Bayesian workflow?
“Workflow” and its relation to statistical theory and practice
Organizing the many aspects of Bayesian workflow
Aim and structure of this article
Before fitting a model
Choosing an initial model
Modular construction
Scaling and transforming the parameters
Prior predictive checking
Generative and partially generative models
Fitting a model
Initial values, adaptation, and warmup
How long to run an iterative algorithm
Approximate algorithms and approximate models
Fit fast, fail fast
Using constructed data to find and understand problems
Fake-data simulation
Simulation-based calibration
Experimentation using constructed data
Addressing computational problems
The folk theorem of statistical computing
Starting at simple and complex models and meeting in the middle
Getting a handle on models that take a long time to fit
Monitoring intermediate quantities
Stacking to reweight poorly mixing chains
Posterior distributions with multimodality and other difficult geometry
Reparameterization
Marginalization
Adding prior information
Adding data
Evaluating and using a fitted model
Posterior predictive checking
Cross validation and influence of individual data points and subsets of the data
Influence of prior information
Summarizing inference and propagating uncertainty
Modifying a model
Constructing a model for the data
Incorporating additional data
Working with prior distributions
A topology of models
Understanding and comparing multiple models
Visualizing models in relation to each other
Cross validation and model averaging
Comparing a large number of models
Modeling as software development
Version control smooths collaborations with others and with your past self
Testing as you go
Making it essentially reproducible
Making it readable and maintainable
Example of workflow involving model building and expansion: Golf putting
First model: logistic regression
Modeling from first principles
Testing the fitted model on new data
A new model accounting for how hard the ball is hit
Expanding the model by including a fudge factor
General lessons from the golf example
Example of workflow for a model with unexpected multimodality: Planetary motion
Mechanistic model of motion
Fitting a simplified model
Bad Markov chain, slow Markov chain?
Building up the model
General lessons from the planetary motion example
Discussion
Different perspectives on statistical modeling and prediction
Justification of iterative model building
Model selection and overfitting
Bigger datasets demand bigger models
Prediction, generalization, and poststratification
Going forward