## Overview

See the associated study (link TBC) for full details on the dataset and analysis.

This repository contains a large dataset of raw soundscape recordings and human-in-the-loop, AI-generated sonotype detections (fish and invertebrate sounds) from these soundscapes. These data were used to conduct a global study investigating the recovery of coral reef ecosystem functions following active restoration.

Data were collected from 45 tropical reef sites across Australia, Indonesia, Kenya, Mexico, and the Maldives, totalling over 1 year of raw audio. Each site belongs to one of four habitat types: degraded, healthy, early-stage restored (<3 months), and mid-stage restored (32–53 months). All restored plots followed a standardised rubble-stabilisation and coral out-planting methodology (MARRS, buildingcoral.com). At each site, a HydroMoth acoustic recorder was deployed on a 1-in-4-minute duty cycle (except Indonesia: 1-in-2-minute) for approximately 1 month. Data were gathered between October 2021 and June 2023. The raw soundscape recordings total just under 1 TB of .WAV files once unzipped, sampled at 16 kHz.

Sonotype detections were generated using a human-in-the-loop agile modelling pipeline, keeping only high-confidence detections (logit score ≥ 1.0). A manual review found a false positive rate of 2.90% at this logit threshold. The pipeline combined SurfPerch (Williams et al., 2025) with agile modelling (Dumoulin et al., 2025).
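Each retained detection clip carries its logit score in the filename (see the file naming scheme below). As an illustrative sketch, assuming the dotted-decimal score format shown in the README's example filenames, the score can be parsed back out and used to re-filter clips to a threshold stricter than the released ≥ 1.0 cut-off (the 1.5 default here is arbitrary):

```python
import re

# Detection clips follow the pattern described under "File naming scheme":
#   logit-SCORE_start-XXs_country_site_YYYYMMDD_HHMMSS.wav
# This regex assumes SCORE is written with a decimal point, as in the
# README's example filename.
DETECTION_RE = re.compile(
    r"^logit-(?P<logit>\d+\.\d+)"
    r"_start-(?P<start>\d+)s"
    r"_(?P<country>[a-z]{3})"
    r"_(?P<site>[A-Z]\d+)"
    r"_(?P<date>\d{8})_(?P<time>\d{6})\.wav$"
)

def parse_detection(name: str) -> dict:
    """Split a detection clip filename into its labelled fields."""
    m = DETECTION_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected detection filename: {name}")
    fields = m.groupdict()
    fields["logit"] = float(fields["logit"])  # model confidence score
    fields["start"] = int(fields["start"])    # window offset in seconds
    return fields

def above_threshold(names, threshold=1.5):
    """Keep only clips whose embedded logit meets a stricter threshold."""
    return [n for n in names if parse_detection(n)["logit"] >= threshold]
```

For example, `parse_detection("logit-1.22_start-15s_aus_H3_20230227_085200.wav")` returns a dict with `logit=1.22`, `start=15`, `country="aus"`, and `site="H3"`.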
See the associated manuscript to this dataset for more details.

The resulting dataset was used to infer and compare four key ecosystem functions between habitat types:

- Fish diversity (acoustic diversity)
- Recruitment cuescape (night-time fish sounds)
- Herbivory (audible parrotfish grazing)
- Sediment bioturbation (snapping shrimp activity)

However, we anticipate it could be used for many other purposes.

## Contents

- **Manifest:** `manifest.csv` lists all zip files, file counts, total sizes, and SHA256 checksums to verify integrity after download.
- **Raw audio:** 45 `country_site.zip` files containing all 1-minute HydroMoth recordings. Each zip corresponds to one site, with ~12,000 files on average at 1.83 MB per .WAV (a tiny number are of reduced size, recorded as the recorder batteries reached end of life).
- **Sonotype detections:** `detections_country.zip` files containing 5-second audio clips of all sonotype detections. Each country zip contains subfolders by sonotype (e.g. growl, scrape).

## File naming scheme

### Site-level zips

- Pattern: `country_site.zip`
- Explanation: `country` = one of aus (Australia), ind (Indonesia), ken (Kenya), mal (Maldives), mex (Mexico); `site` = site code (H = healthy, D = degraded, R = mid-stage restored, N = early-stage restored) + site ID number. Site ID numbers count up sequentially within each country.
There is no relationship between ID numbers across habitat types, meaning H1 and D1 are not part of a 'paired design'.

- Example: `ind_H1.zip` → Indonesia, healthy site 1

### Raw audio files (within site-level zips)

- Pattern: `country_site_YYYYMMDD_HHMMSS.wav`
- Explanation: same country and site codes as the site-level zips; `YYYYMMDD_HHMMSS` = local timestamp (AudioMoth standard)
- Example: `aus_D1_20230207_120400.wav` → Australia, degraded site 1, 7 Feb 2023, 12:04 pm local time

### Sonotype detections (within detections_country.zip files)

- Pattern: `logit-SCORE_start-XXs_country_site_YYYYMMDD_HHMMSS.wav`
- Explanation: `SCORE` = model logit (≥1.0 retained; a higher score indicates higher model confidence in the detection); `start-XXs` = offset of the 5-second detection window within the original raw file; `country_site_YYYYMMDD_HHMMSS` = as per the raw audio files
- Example: `logit-1.22_start-15s_aus_H3_20230227_085200.wav` → logit score 1.22; the 5-second detection window runs from 15 to 20 s within the source file; Australia, healthy site 3; 27 Feb 2023, 08:52 local time

## Notes

- Downloading large files via a web browser may fail on slow or unreliable internet connections. We recommend using tools that support resumable downloads (e.g. rclone, `wget -c`, or `curl -C -`) with the Figshare API download links.
- A Google Earth project containing a map and pins for all 45 sites is available.
- All code used during analysis for the study can be found in a GitHub repository. Of most use will be:
  - `reefs_embed.py` → embed audio with SurfPerch
  - `reefs_agile.ipynb` → run agile modelling
  - `write_detections.py` → export detections from the raw audio using the detections.csv files
- However, we now recommend the better-supported Perch Hoplite repository for getting started with embeddings and agile modelling.

## Licence

This dataset is released under a Creative Commons Attribution (CC BY 4.0) licence to encourage reuse.
Please use it for any purpose, but credit the authors by citing our study.

## Full Author List

- Ben Williams, University College London & Zoological Society of London, UK
- Aya Naseem, Maldives Coral Institute, Maldives
- Gaby Nava, Oceanus A.C., Mexico
- Angus Roberts, The Ocean Trust, Kenya
- Freda Nicholson, Mars, Australia
- Mars Coral Restoration Project Monitoring Team, Mars, Indonesia
- Timothy Lamont, Lancaster University, UK
- Tries Razak, Institut Pertanian Bogor University, Indonesia
- Aimee du Luart, UCL, UK
- Dave Erasmus, Aqoustics, UK
- Ollie Stoole, Aqoustics, UK
- Alistair Whittick, Aqoustics, UK
- Rory Gibb, UCL, UK
- Steve Simpson, University of Bristol, UK
- David Curnick, Zoological Society of London, UK
- Kate E. Jones, University College London, UK