In Denoising Diffusion Probabilistic Models (DDPMs), the term Markovian describes the structure of the forward (diffusion) process and the reverse (denoising) process. Both are defined as Markov chains: each step depends only on the state immediately before it, which keeps the model of how the system evolves simple to specify and analyze.
What is Markovian?
The Markov property states that the future state of a system depends only on the current state and not on the sequence of states that preceded it. Mathematically:
$$ P(x_{t+1} \mid x_t, x_{t-1}, \dots, x_0) = P(x_{t+1} \mid x_t) $$
In the case of DDPMs, this property is utilized to define how data points (e.g., images) are transformed into progressively noisier versions during the forward process and how they are reconstructed in the reverse process.
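As a toy illustration of the Markov property itself (not of a DDPM), the sketch below simulates a two-state chain and estimates the probability of the next state given the current one, split by what the state before that was. The transition probabilities and the function names are assumptions made up for the example. If the chain is Markovian, the two estimates should agree up to sampling noise.

```python
import random
from collections import Counter

# Hypothetical two-state chain: TRANSITION[i][j] = P(next = j | current = i).
TRANSITION = [[0.9, 0.1],
              [0.4, 0.6]]

def step(state, rng):
    """Sample the next state using only the current state."""
    return 0 if rng.random() < TRANSITION[state][0] else 1

def simulate(n_steps, seed=0):
    """Run the chain for n_steps transitions, starting from state 0."""
    rng = random.Random(seed)
    states = [0]
    for _ in range(n_steps):
        states.append(step(states[-1], rng))
    return states

def prob_next_is_one(states, current, previous):
    """Estimate P(x_{t+1} = 1 | x_t = current, x_{t-1} = previous)."""
    counts = Counter()
    for prev, cur, nxt in zip(states, states[1:], states[2:]):
        if cur == current and prev == previous:
            counts[nxt] += 1
    return counts[1] / (counts[0] + counts[1])

states = simulate(200_000)
# Same current state (0), different histories: both should be close to 0.1.
p_given_prev0 = prob_next_is_one(states, current=0, previous=0)
p_given_prev1 = prob_next_is_one(states, current=0, previous=1)
print(p_given_prev0, p_given_prev1)
```

Conditioning on the extra history does not change the estimate, which is what the equation above expresses.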
Markovian in DDPMs
- Forward Diffusion Process:
The forward process incrementally adds Gaussian noise to a data point \(x_0\) over \(T\) timesteps to produce a sequence \(x_1, x_2, \dots, x_T\). This process is Markovian because:
$$ q(x_t \mid x_{t-1}, x_{t-2}, \dots, x_0) = q(x_t \mid x_{t-1}) $$
Here, \(q\) denotes the distribution of the forward process, and the noise added at each step depends only on the immediately preceding state \(x_{t-1}\).
- Reverse Process:
The reverse process starts from \(x_T\) (a fully noisy version of \(x_0\)) and denoises it step by step back toward a clean sample \(x_0\). It is also modeled as a Markov chain, with learned parameters \(\theta\):
$$ p_\theta(x_{t-1} \mid x_t, x_{t+1}, \dots, x_T) = p_\theta(x_{t-1} \mid x_t) $$
The reverse process depends only on the current state \(x_t\) to predict the previous state \(x_{t-1}\).
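The forward transition can be sketched in a few lines. The text above does not give the form of \(q(x_t \mid x_{t-1})\); this sketch assumes the Gaussian kernel \(\mathcal{N}(\sqrt{1-\beta_t}\,x_{t-1}, \beta_t I)\) and a linear schedule from \(10^{-4}\) to \(0.02\), as commonly used for DDPMs. The function names are hypothetical. Note that `forward_step` receives only `x_prev`, never the earlier history.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule beta_1..beta_T."""
    if T == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_trajectory(x0, betas, seed=0):
    """Return [x_0, x_1, ..., x_T], each state built from the previous one only."""
    rng = random.Random(seed)
    xs = [list(x0)]
    for beta_t in betas:
        xs.append(forward_step(xs[-1], beta_t, rng))
    return xs

betas = linear_beta_schedule(T=1000)
trajectory = forward_trajectory([1.0, -0.5, 0.25, 2.0], betas)
print(len(trajectory))  # number of states, x_0 through x_T
```

A real implementation would operate on image tensors rather than Python lists; plain lists are used here only to keep the sketch dependency-free.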
Why is the Markov Property Important in DDPMs?
- Simplifies Modeling:
The Markov assumption allows the entire diffusion process (both forward and reverse) to be described as a sequence of simple transitions between consecutive states. This reduces the complexity of learning the reverse process.
- Tractable Probabilities:
By assuming a Markov process, the joint probability of the entire sequence can be factorized into a product of conditional probabilities:
$$ q(x_{0:T}) = q(x_0) \prod_{t=1}^T q(x_t \mid x_{t-1}) $$
and
$$ p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^T p_\theta(x_{t-1} \mid x_t) $$
This factorization is key to deriving the variational lower bound (VLB) and training the model.
- Supports Efficient Sampling:
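During inference, the Markov property means that each step of the reverse process depends only on the current state. Sampling is therefore a simple loop: start from \(x_T\) and apply \(p_\theta(x_{t-1} \mid x_t)\) once per timestep, keeping only the current state in memory. The loop still runs through all \(T\) steps in sequence, but each step is cheap to express and needs no access to earlier states.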
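The step-by-step sampling described above can be sketched as a loop that overwrites a single state variable. This is a minimal sketch under several assumptions not stated in the text: `predict_noise` is a hypothetical stand-in for a trained noise-prediction network, the update follows the commonly used DDPM ancestral sampling rule with \(\sigma_t^2 = \beta_t\), and the schedule is linear from \(10^{-4}\) to \(0.02\). With the placeholder network the output is not a meaningful sample; the point is the control flow.

```python
import math
import random

def predict_noise(x_t, t):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t)."""
    return [0.0 for _ in x_t]

def reverse_step(x_t, t, betas, alpha_bars, rng):
    """Sample x_{t-1} ~ p_theta(x_{t-1} | x_t); t is 1-indexed."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    eps = predict_noise(x_t, t)
    coef = beta_t / math.sqrt(1.0 - alpha_bars[t - 1])
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps)]
    if t == 1:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta_t)  # assumed choice: sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def sample(dim, betas, seed=0):
    """Run the reverse chain from x_T ~ N(0, I) down to x_0."""
    rng = random.Random(seed)
    alpha_bars, running = [], 1.0
    for b in betas:
        running *= 1.0 - b  # cumulative product of (1 - beta_s)
        alpha_bars.append(running)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        # Only the current state is passed in and kept.
        x = reverse_step(x, t, betas, alpha_bars, rng)
    return x

betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
x0_sample = sample(dim=4, betas=betas)
```

Because `reverse_step` takes only `x_t` and the timestep, memory use does not grow with the number of steps, which is the practical consequence of the Markov assumption at inference time.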
Summary
The term "Markovian" in DDPMs refers to the assumption that both the forward diffusion process and the reverse denoising process follow the Markov property. This assumption simplifies the modeling and training of DDPMs, enabling the generation of high-quality data samples in a mathematically tractable way.