This work is based on the project proposed here:
https://drive.google.com/file/d/1ZtHOWv2ArBuwY3pvqUdpm__R1EWpbsma/view?usp=drive_link
The basic idea was to fit an autoencoder to many unlabeled pulse-height (PH) images and experiment with ways to use the resulting low-dimensional latent space for anomaly detection. The motivation is that autoencoders learn compact representations of data that can support downstream tasks like anomaly detection and synthetic data generation.

I decided to use a beta variational autoencoder ($\beta$-VAE) model with PyTorch because these models can typically learn disentangled latent spaces and can function as generative models. This type of model optimizes a loss function composed of three objectives: (1) minimizing the input reconstruction error, (2) minimizing the KL divergence between the latent posterior and a standard normal prior, and (3) scaling that KL term by the hyperparameter $\beta$.
Objectives (1) and (3) encourage the model to learn a sparse and efficient representation of the dataset while (2) pushes the latent space to look like a standard normal distribution, resulting in a more robust fit and enabling synthetic data generation. (Generation works by sampling from a standard normal and feeding this sample into the model's decoder.)
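As a concrete sketch of this loss (a hypothetical NumPy version, assuming a diagonal Gaussian posterior and squared-error reconstruction, rather than the project's actual PyTorch code):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta):
    """beta-VAE loss: reconstruction error plus a beta-weighted KL term.

    x, x_recon : flattened inputs and reconstructions, shape (batch, 256)
    mu, log_var: latent posterior parameters, shape (batch, 32)
    """
    # (1) reconstruction objective: squared error per sample
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # (2) KL divergence between N(mu, sigma^2) and the standard normal
    #     prior, in closed form for a diagonal Gaussian
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    # (3) beta weights the KL term; beta > 1 pushes toward a sparser,
    #     more disentangled latent code
    return np.mean(recon + beta * kl)
```

When the posterior already matches the prior (`mu = 0`, `log_var = 0`), the KL term vanishes and the loss reduces to the mean reconstruction error, which is why larger $\beta$ trades reconstruction fidelity for a better-behaved latent space.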

I trained the $\beta$-VAE on a dataset of 250k 16x16 PH images randomly selected from a population of 9.75 million across 7 sci-obs observing runs that used the same pulse-height PE threshold of 11.5 and modules from at least 2 domes. Note that, for simplicity, this dataset only contains pixel data and doesn't consider telescope geometry or temporal relations.
After fitting three models with small, medium, and large $\beta$ values, I used PCA to reduce the 32-dimensional latent space to 3D (for visualization), then applied Gaussian mixture model (GMM) clustering to look for patterns. (I tuned the number of GMM components by minimizing the Bayesian information criterion.)
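The reduction-and-clustering step can be sketched with scikit-learn; here `latents` is a random placeholder standing in for the encoder's actual 32-dimensional outputs, and the component range is an assumption for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# placeholder for the encoder's 32-dimensional latent vectors
latents = rng.normal(size=(2000, 32))

# project the latent space to 3D for visualization
latents_3d = PCA(n_components=3).fit_transform(latents)

# choose the number of GMM components by minimizing the
# Bayesian information criterion (BIC)
best_gmm, best_bic = None, np.inf
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(latents_3d)
    bic = gmm.bic(latents_3d)
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

# cluster assignment per image
labels = best_gmm.predict(latents_3d)
```

BIC penalizes model complexity, so minimizing it guards against the GMM inventing spurious clusters as the component count grows.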
With a small beta value, the model's latent space reveals the detailed structure of the training dataset (right). However, this latent space representation is specific to the training set and is not very useful for anomaly detection.