Figure 1.
Accumulation of protein sequences of unknown function in genome databases. Open symbols indicate the total number of protein sequences encoded in prokaryotic (blue) and eukaryotic (red) genomes; filled symbols indicate the number of those annotated as “hypothetical” or “uncharacterized” proteins. The data are taken from the NCBI RefSeq database [68]; the numbers for 2010 are extrapolated from the first 4 months of that year.