ML paper misc

Accurate Uncertainty Estimation and Decomposition in Ensemble Learning (NeurIPS 2019)

  1. Aleatoric uncertainty: comes from the data itself (e.g., sensor measurement error) and cannot be reduced.

  2. Epistemic uncertainty: comes from a lack of knowledge about the data, and can be reduced by collecting more data.

  • Parametric uncertainty: uncertainty in the model parameters.
  • Structural uncertainty: whether the current model is expressive enough to describe the dataset.
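For an ensemble of probabilistic predictors, this decomposition is often computed via the law of total variance: averaging the members' predictive variances gives the aleatoric part, and the variance of the members' means gives the epistemic part. A minimal sketch with made-up numbers (not values from the paper):

```python
import numpy as np

# Each ensemble member i predicts a Gaussian N(mu_i, sigma_i^2) at one test point.
mus = np.array([1.0, 1.2, 0.8, 1.1])     # member means (made-up values)
sigmas = np.array([0.5, 0.4, 0.6, 0.5])  # member predictive std-devs (made-up)

# Law of total variance: total predictive variance = aleatoric + epistemic.
aleatoric = np.mean(sigmas ** 2)  # average data noise across members
epistemic = np.var(mus)           # disagreement between members
total = aleatoric + epistemic

print(f"aleatoric={aleatoric:.4f}  epistemic={epistemic:.4f}  total={total:.4f}")
```

The epistemic term shrinks as members agree more (e.g., with more training data), while the aleatoric term does not.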

Generalizing uncertainty decomposition theory in climate change impact assessments (HydrologyX 2019)

Uncertainty decomposition methods:

  1. Standard method

Uses the "range" or the "variance" as the uncertainty measure.

  2. Lee2016

    The standard method first takes the mean and then applies the uncertainty measure, whereas Lee's method first applies the uncertainty measure and then takes the mean.

  3. ANOVA

    Assume: [equation figure]

    x is a random variable with mean 0; its variance is interpreted as natural variability or internal variability.

    Minimize: [equation figure]

    The uncertainty measure for stage k is: [equation figure]
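The contrast between the standard ordering and Lee's ordering can be sketched with the range as the uncertainty measure, on a hypothetical ensemble over two stages (the "climate model" and "impact model" labels below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical projections: rows = 3 climate models, columns = 4 impact models.
proj = np.array([[1.0, 1.4, 1.1, 1.3],
                 [2.0, 2.6, 2.2, 2.4],
                 [0.9, 0.5, 0.6, 0.8]])

def spread(a, axis):  # "range" uncertainty measure: max minus min
    return a.max(axis=axis) - a.min(axis=axis)

# Standard: first average over climate models, then measure the spread
# across impact models.
standard = spread(proj.mean(axis=0), axis=0)

# Lee (2016): first measure the spread across impact models within each
# climate model, then average those spreads.
lee = spread(proj, axis=1).mean()
```

The two orderings generally disagree: averaging first can cancel out model-specific spread that Lee's ordering retains.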

Kernel Identification Through Transformers (NeurIPS 2021)

Intro

the chosen kernel determines both the inductive biases and prior support of functions under the GP prior.

Kernel selection is very important in Gaussian processes.

This paper uses a Transformer to select the kernel.

each kernel’s suitability is instead usually determined via a proxy for the model evidence, such as the Bayesian Information Criterion (BIC)

In this work, we utilise eight primitive kernels: the squared exponential; periodic; white noise; three variants of the Matérn kernel; the cosine kernel; and our novel variant of the linear kernel. These include both stationary and non-stationary kernels.

Operations between kernels include addition, multiplication, convolution, decomposition, and affine transformation.
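A minimal sketch of the two simplest operations (sum and product) on primitive kernels, assuming standard squared-exponential and exp-sine-squared periodic forms with made-up hyperparameters:

```python
import numpy as np

# Two primitive kernels on 1-D inputs (made-up hyperparameters).
def se(x1, x2, ell=1.0):  # squared exponential
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell ** 2)

def periodic(x1, x2, p=2.0, ell=1.0):  # exp-sine-squared periodic kernel
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / p) ** 2 / ell ** 2)

x = np.linspace(0.0, 3.0, 5)
K_sum = se(x, x) + periodic(x, x)   # additive structure
K_prod = se(x, x) * periodic(x, x)  # locally periodic structure

# Sums and products of valid kernels are valid: Gram matrices stay PSD.
assert np.all(np.linalg.eigvalsh(K_sum) > -1e-10)
assert np.all(np.linalg.eigvalsh(K_prod) > -1e-10)
```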


White noise + stationary kernel

White noise + non-stationary kernel (linear kernel)

The linear kernel uses an independent variance for each input dimension.

The product between the linear kernel and the noise kernel generates a form of noise whose variance changes linearly with respect to the inputs.
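A sketch of that product, assuming an offset-free linear kernel k(x, x') = x·x' (the offset-free form is an assumption here) and treating white noise as a diagonal kernel; the resulting covariance is diagonal with input-dependent variance:

```python
import numpy as np

x = np.linspace(0.0, 4.0, 5)

def k_linear(x1, x2, var=1.0):  # offset-free linear kernel (an assumption here)
    return var * np.outer(x1, x2)

def k_white(x1, x2, var=0.25):  # white noise: nonzero only where x1 == x2
    return var * (x1[:, None] == x2[None, :]).astype(float)

# The product is diagonal: independent noise whose variance at input x
# equals k_linear(x, x) times the noise variance, i.e. it grows with the input.
K = k_linear(x, x) * k_white(x, x)
print(np.diag(K))
```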


Spatio-Temporal Variational Gaussian Processes (NeurIPS 2021)

Intro

Background:

GPs admit parallel algorithms on GPUs (Exact Gaussian Processes on a Million Data Points), but the sparse GP approach is perhaps the most popular, and is typically combined with mini-batching to allow training on massive datasets.

Efficient inference: structured kernel interpolation (SKI, [52]), which requires only that the inducing points lie on a grid.

In the spatio-temporal case, sparsity has been used in the spatial dimension.

Spatio-temporal GPs rewrite the GP prior as a state-space model and use filtering to perform inference in $O(Nd^3)$, where $d$ is the dimensionality of the state space.
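As a concrete instance of this idea (a toy example, not the paper's algorithm), a Matérn-1/2 kernel corresponds to a one-dimensional Ornstein–Uhlenbeck state-space model, so GP regression reduces to a scalar Kalman filter that runs linearly in the number of time points:

```python
import numpy as np

# Toy linear-time GP regression for a Matern-1/2 (OU) kernel,
# k(t, t') = s2 * exp(-|t - t'| / ell), via its 1-D state-space form:
# an OU process observed with Gaussian noise of variance r2.
s2, ell, r2 = 1.0, 1.0, 0.1
t = np.linspace(0.0, 5.0, 50)
y = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

m, P = 0.0, s2  # stationary prior over the scalar state
means = []
for k in range(t.size):
    if k > 0:  # predict: exact OU transition between time points
        a = np.exp(-(t[k] - t[k - 1]) / ell)
        m, P = a * m, a * a * P + s2 * (1.0 - a * a)
    g = P / (P + r2)  # update: scalar Kalman gain
    m, P = m + g * (y[k] - m), (1.0 - g) * P
    means.append(m)
means = np.array(means)
```

Each step costs $O(d^3)$ for state dimension $d$ (here $d = 1$), giving the $O(Nd^3)$ total mentioned above.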

Sparse GPs and spatio-temporal GPs have been combined by constructing a Markovian system in which a set of spatial inducing points is tracked over time.

Limitations of existing work:

Existing methods for spatio-temporal GPs make approximations to the prior conditional model or do not exploit natural gradients, meaning they do not provide the same inference and learning results as state-of-the-art variational GPs in the presence of non-conjugate likelihoods or sparsity; this has hindered their widespread adoption.

This work

It can provide exactly the same results as standard variational GPs.

ST-VGP is derived using a natural-gradient variational inference approach based on filtering and smoothing; the authors also derive its sparse variant.

Definition and Model

Input data X: one temporal dimension and multiple spatial dimensions.

State Space Spatio-Temporal Gaussian Processes

The prior in Eq. (1) is reformulated as a state-space model, reducing the computational scaling to linear in the number of time points.

The assumption is that the kernel is both Markovian and separable between time and space: $κ(t, s, t^′, s^′) = κ_t(t, t^′)\, κ_s(s, s^′)$.
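On a full space-time grid, separability means the Gram matrix factors as a Kronecker product of the temporal and spatial Gram matrices, so the big matrix never needs to be formed directly; a small numpy check:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 4)  # Nt = 4 time points
s = np.linspace(0.0, 1.0, 3)  # Ns = 3 spatial points

Kt = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)  # temporal SE kernel
Ks = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2)  # spatial SE kernel

# Full Gram matrix over the (t, s) grid is the Kronecker product Kt ⊗ Ks.
K_full = np.kron(Kt, Ks)

# Spot-check separability: k((t1, s2), (t3, s0)) = Kt[1, 3] * Ks[2, 0].
i, j = 1 * s.size + 2, 3 * s.size + 0
assert np.isclose(K_full[i, j], Kt[1, 3] * Ks[2, 0])
```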

A Markovian kernel is one that can be rewritten in state-space form.

The GP prior can be expressed as a stochastic partial differential equation (SPDE):

$$\frac{\partial f(t, s)}{\partial t} = A_s f(t, s) + w(t, s)$$

where $w(t, s)$ is a (spatio-temporal) white-noise process and $A_s$ a suitable (pseudo-)differential operator.

SPDEs of this form can represent a large class of separable and non-separable GP models.