Sorry, the audio is not too good!
Joint work with Cynthia Kamikazi, Maureen Gatimu, and Scovia Achan.
Self-supervised pre-training has been shown to improve the performance of vision models on downstream tasks, especially where labelled data is scarce. It is therefore particularly relevant in the Earth observation and satellite imagery domain, where massive unlabelled satellite imagery datasets exist but labelled datasets are few and small. We apply the context autoencoder (CAE), a masked image modelling self-supervised pre-training scheme, to satellite imagery. Our experiments show that CAE performs comparably to the masked autoencoder (our baseline) on both the image reconstruction pretext task and land-use classification downstream tasks, while achieving slightly better performance on a flood mapping segmentation task. The code is available here.
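For readers unfamiliar with masked image modelling, here is a minimal PyTorch sketch of how a CAE-style objective can be wired up: patchify an image, hide a random subset of patches from the encoder, predict latents for the masked patches with a regressor, and decode those predicted latents back to pixels. Everything here (the `TinyMIM` name, patch size, mask ratio, layer counts) is an illustrative assumption, not our actual implementation; see the linked code for the real thing.

```python
# Illustrative sketch of CAE-style masked image modelling.
# Hyper-parameters and module shapes are assumptions for demonstration only.
import torch
import torch.nn as nn

PATCH = 16          # patch size (assumed)
MASK_RATIO = 0.5    # fraction of patches hidden from the encoder (assumed)
DIM = 256           # embedding width (assumed)

class TinyMIM(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        # Non-overlapping conv = patch embedding
        self.patchify = nn.Conv2d(in_ch, DIM, kernel_size=PATCH, stride=PATCH)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
            num_layers=2)
        # CAE's distinguishing piece: a latent regressor predicts
        # representations for the masked patches from the visible ones
        # (simplified here to self-attention over visible latents + queries),
        # then a light decoder maps predicted latents back to pixels.
        self.regressor = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
            num_layers=1)
        self.decoder = nn.Linear(DIM, PATCH * PATCH * in_ch)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))

    def forward(self, imgs):
        tokens = self.patchify(imgs).flatten(2).transpose(1, 2)  # (B, N, DIM)
        B, N, _ = tokens.shape
        n_vis = int(N * (1 - MASK_RATIO))
        # Random per-image permutation splits patches into visible / masked
        perm = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        vis_idx, msk_idx = perm[:, :n_vis], perm[:, n_vis:]
        gather = lambda t, i: t.gather(1, i.unsqueeze(-1).expand(-1, -1, DIM))
        latent = self.encoder(gather(tokens, vis_idx))      # encode visible only
        queries = self.mask_token.expand(B, N - n_vis, -1)  # masked placeholders
        pred = self.regressor(torch.cat([latent, queries], dim=1))[:, n_vis:]
        return self.decoder(pred), msk_idx  # pixel predictions for masked patches

imgs = torch.randn(2, 3, 224, 224)  # stand-in for satellite tiles
recon, masked_idx = TinyMIM()(imgs)
print(recon.shape)  # (2, n_masked, 16*16*3)
```

A reconstruction loss (e.g. MSE against the original masked patches) on `recon` gives the pretext objective; the full CAE additionally aligns the predicted latents with a target encoder's outputs, which this sketch omits for brevity.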
See the project manuscript here. (It was a hurried submission 🫣)