To generate data for training a generalizable sequence-to-expression model, we randomly sampled ~21 million 80 bp sequences and measured their activity as promoters in yeast by FACS, sorting cells into 18 expression bins.

Overall design: All libraries tested share the same sequences flanking the random 80 bp oligo. The distal region of the pTpA construct (pT) was GCTAGCAGGAATGATGCAAAAGGTTCCCGATTCGAACTGCATTTTTTTCACATC and the proximal region (pA) was GGTTACGGCTGTTTCTTAATTAAAAAAAGATAGAAAACATTAGGAGTGTAACACAAGACTTTCGGATCCTGAGCAGGCAAGATAAACGA (up to the theoretical TSS). The 80 random nucleotides (Ns) were inserted between the distal and proximal regions. Both experiments used yeast grown in SD-Ura medium (Sunrise Sciences); the designed-sequence library used an S288C strain with URA3 deleted, while the large-scale random promoter library used strain Y8205 (Boone Lab).
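The construct layout described above (distal pT flank, 80 N variable region, proximal pA flank) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: it assumes the flanks are written 5' to 3' with pT upstream of the Ns and pA running toward the TSS, it models the Ns as idealized uniform random bases (a synthesized oligo pool may carry composition biases), and the function names `random_variable_region`, `assemble_promoter`, and `extract_variable` are hypothetical.

```python
import random

# Flanking sequences copied from the dataset description.
# Assumption: both are written 5' -> 3', with pT upstream of the Ns and pA downstream (toward the TSS).
DISTAL_PT = "GCTAGCAGGAATGATGCAAAAGGTTCCCGATTCGAACTGCATTTTTTTCACATC"
PROXIMAL_PA = "GGTTACGGCTGTTTCTTAATTAAAAAAAGATAGAAAACATTAGGAGTGTAACACAAGACTTTCGGATCCTGAGCAGGCAAGATAAACGA"
VARIABLE_LEN = 80
BASES = "ACGT"


def random_variable_region(rng: random.Random, length: int = VARIABLE_LEN) -> str:
    """Draw a uniformly random DNA sequence (an idealized stand-in for the synthesized Ns)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def assemble_promoter(variable: str) -> str:
    """Place an 80 nt variable region between the distal (pT) and proximal (pA) flanks."""
    if len(variable) != VARIABLE_LEN or set(variable) - set(BASES):
        raise ValueError("variable region must be exactly 80 nt of A/C/G/T")
    return DISTAL_PT + variable + PROXIMAL_PA


def extract_variable(promoter: str) -> str:
    """Recover the variable region from a full construct after checking both flanks."""
    if not (promoter.startswith(DISTAL_PT) and promoter.endswith(PROXIMAL_PA)):
        raise ValueError("expected pT/pA flanking sequences not found")
    core = promoter[len(DISTAL_PT):len(promoter) - len(PROXIMAL_PA)]
    if len(core) != VARIABLE_LEN:
        raise ValueError(f"variable region has length {len(core)}, expected {VARIABLE_LEN}")
    return core


if __name__ == "__main__":
    rng = random.Random(0)
    variable = random_variable_region(rng)
    construct = assemble_promoter(variable)
    print(len(DISTAL_PT), len(variable), len(PROXIMAL_PA), len(construct))
    assert extract_variable(construct) == variable
```

Checking both flanks in `extract_variable` mirrors the fact that every library shares the same pT and pA sequences, so only the middle 80 nt should differ between constructs.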