This work is based on the project proposed here:

CS180_final_project_proposal.pdf
https://drive.google.com/file/d/1ZtHOWv2ArBuwY3pvqUdpm__R1EWpbsma/view?usp=drive_link

Introduction

The basic idea was to fit an autoencoder to many unlabeled pulse-height (PH) images and experiment with ways to use the resulting low-dimensional latent space for anomaly detection. The motivation is that autoencoders learn compact representations of data that support applications such as anomaly detection and synthetic data generation.

Beta Variational Autoencoders

I decided to use a Beta Variational Autoencoder ($\beta$-VAE) implemented in PyTorch because this class of model can typically learn disentangled latent spaces and can also act as a generative model. The model optimizes a loss function composed of three objectives:

  1. MSE reconstruction error between input and output images,
  2. KL divergence between the latent distribution and a standard normal distribution, weighted by $\beta$, and
  3. an L1 penalty on the magnitudes of the model weights.

Objectives (1) and (3) encourage the model to learn a sparse, efficient representation of the dataset, while (2) pushes the latent space toward a standard normal distribution, which both regularizes the fit and enables synthetic data generation: sampling from a standard normal and feeding the sample through the model's decoder yields a new image. A sketch of this loss is given below.
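
To make the objective concrete, here is a minimal, self-contained PyTorch sketch. The architecture and names (`TinyVAE`, `beta_vae_loss`) and the hyperparameter values (`beta`, `l1_weight`) are illustrative assumptions rather than the project's actual model; only the three loss terms and the 32-dimensional latent space come from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32  # matches the 32-dimensional latent space used in this project

class TinyVAE(nn.Module):
    """Minimal fully-connected VAE for 16x16 images (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(256, 128)
        self.mu = nn.Linear(128, LATENT_DIM)
        self.logvar = nn.Linear(128, LATENT_DIM)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, 256)
        )

    def forward(self, x):
        h = F.relu(self.enc(x.flatten(1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.decoder(z).view_as(x), mu, logvar

def beta_vae_loss(x, recon, mu, logvar, model, beta=4.0, l1_weight=1e-5):
    # (1) MSE reconstruction error between input and output images
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    # (2) Closed-form KL divergence between q(z|x) = N(mu, sigma^2) and N(0, I)
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    # (3) L1 penalty on the model weights
    l1 = sum(p.abs().sum() for p in model.parameters())
    return recon_loss + beta * kl + l1_weight * l1

model = TinyVAE()
x = torch.rand(8, 1, 16, 16)  # dummy batch of 16x16 images
recon, mu, logvar = model(x)
loss = beta_vae_loss(x, recon, mu, logvar, model)

# Synthetic data generation: sample z ~ N(0, I) and feed it through the decoder
with torch.no_grad():
    synthetic = model.decoder(torch.randn(64, LATENT_DIM)).view(-1, 1, 16, 16)
```

The last two lines show the generative use: because the KL term pulls the latent distribution toward $N(0, I)$, decoding standard-normal samples produces plausible synthetic images.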

Methods

[Figure: normalized real pulse-height (PH) images]

I trained the beta-VAE on a dataset of 250k 16x16 PH images randomly sampled from a population of 9.75 million spanning 7 sci-obs observing runs that used the same pulse-height PE threshold of 11.5 and included modules from at least 2 domes. Note that, for simplicity, this dataset contains only pixel data and does not encode telescope geometry or temporal relations.
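
As a rough sketch of the sampling step (the file name, on-disk format, seed, and batch size below are assumptions; only the population size, subset size, and 16x16 image shape come from the text above):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Memory-map the full population and randomly subsample 250k images.
# "ph_images.npy" with shape (9_750_000, 16, 16) is a hypothetical layout.
all_imgs = np.load("ph_images.npy", mmap_mode="r")
rng = np.random.default_rng(seed=0)
idx = rng.choice(all_imgs.shape[0], size=250_000, replace=False)

# Copy only the sampled subset into memory as (N, 1, 16, 16) float tensors
subset = torch.from_numpy(np.array(all_imgs[idx], dtype=np.float32)).unsqueeze(1)
loader = DataLoader(TensorDataset(subset), batch_size=256, shuffle=True)
```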

After fitting three models with small, medium, and large Beta values, I used PCA to reduce the 32-dimensional latent space to 3D (for visualization), then applied Gaussian Mixture Model (GMM) clustering to look for patterns. (I tuned the number of GMM components by minimizing the Bayesian information criterion, as sketched below.)
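
A scikit-learn sketch of this analysis follows; `latents` stands in for the (N, 32) array of encoded training images, and the component search range is an assumption:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

latents = np.random.randn(10_000, 32)  # placeholder for the real encoded latents

# Reduce the 32-dimensional latent space to 3D for visualization
latents_3d = PCA(n_components=3).fit_transform(latents)

# Tune the number of GMM components by minimizing the BIC
candidates = [GaussianMixture(n_components=k, random_state=0).fit(latents_3d)
              for k in range(1, 16)]
best = min(candidates, key=lambda g: g.bic(latents_3d))
labels = best.predict(latents_3d)
```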

Model 1: Small Beta

With a small beta value, the model's latent space reveals detailed structure in the training dataset (right). However, this representation is specific to the training set and therefore not very useful for anomaly detection.