New findings reveal the surprisingly complex protein-coding capacity of the human cytomegalovirus, or HCMV, and provide the first steps toward understanding how the virus manipulates human cells during infection. The HCMV genome was first sequenced more than 20 years ago, but researchers have now investigated the proteome (the complete set of expressed proteins) of this common pathogen as well.
HCMV is an incredibly successful virus that infects most humans on the planet. Birth defects and disease, however, are known to occur only in newborn infants and adults with compromised immune systems, respectively. The pathogen also has one of the largest viral genomes on record, with a massive 240,000 base pairs of DNA. (For comparison, the genome of the poliovirus contains only about 7,500 base pairs.)
Noam Stern-Ginossar from the University of California, San Francisco, along with colleagues from the United States and Germany, used a combination of techniques, including ribosome profiling and mass spectrometry, to study HCMV's proteome. The method could be used to investigate proteins produced by other viruses as well, they say.
The researchers' findings appear in the 23 November issue of the journal Science, which is published by AAAS, the nonprofit science society.
"The genome of a virus is just a starting point," explained Jonathan Weissman from the University of California, a co-author of the Science report. "Understanding what proteins are encoded by that genome allows us to start thinking about what the virus does and how we can interfere with it… Each of the proteins we've identified has the potential to tell us how this virus is manipulating its host cell."
Stern-Ginossar and the other researchers suspected that existing maps of HCMV's protein-coding potential, based largely on computational methods, were far from complete. So they began mapping the positions of ribosomes, the cellular machines that synthesize proteins, during an HCMV infection of human fibroblast cells. With the resulting map, Stern-Ginossar and her colleagues discovered templates for hundreds of previously unidentified proteins, encoded in corresponding segments of the viral genome known as open reading frames.
Surprisingly, the researchers found that many of these open reading frames encode exceptionally short protein sequences (fewer than 100 amino acids). And some of the newly identified open reading frames were even hiding inside other open reading frames, they say.
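To make the idea of one open reading frame hiding inside another more concrete, here is a toy Python sketch. It is not the researchers' method, which relied on ribosome profiling rather than on scanning sequence alone. The function names, the `min_codons` cutoff and the 21-letter DNA string are all made up for illustration. The sketch simply looks for every ATG start codon on one strand, reads forward three letters at a time until it hits a stop codon, and then reports which stretches sit wholly inside a longer one in a different reading frame.

```python
# Toy illustration only: a naive forward-strand scan, not ribosome profiling.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Invented sequence (an assumption, not HCMV data).
TOY_SEQUENCE = "ATGGCATGCCCTGACTTCTAA"


def find_orfs(seq, min_codons=2):
    """Return (start, end, n_codons) for each ATG-to-stop stretch.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and n_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Read codon by codon in the frame set by this ATG.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, n_codons))
                break
    return orfs


def nested_pairs(orfs):
    """Return (outer, inner) pairs where inner lies inside outer
    but is read in a different frame."""
    return [
        (outer, inner)
        for outer in orfs
        for inner in orfs
        if outer[0] < inner[0]
        and inner[1] <= outer[1]
        and (inner[0] - outer[0]) % 3 != 0
    ]


if __name__ == "__main__":
    found = find_orfs(TOY_SEQUENCE)
    print(found)                # [(0, 21, 6), (5, 14, 2)]
    print(nested_pairs(found))  # [((0, 21, 6), (5, 14, 2))]
```

In this invented example, a two-codon stretch beginning at position 5 sits entirely within the six-codon stretch beginning at position 0, shifted into a different reading frame. A sequence scan like this can only suggest candidates. Showing that such a stretch is actually translated takes experimental evidence of the kind the researchers gathered.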
"A key finding of our work is that each of these templates can encode more than one protein," said Annette Michalski from the Max Planck Institute of Biochemistry in Martinsried, Germany, another co-author of the Science report. "And these extremely short proteins might be more common than we expect."
The researchers then used mass spectrometry to confirm the presence of many previously unknown viral proteins that had been predicted by mapping the ribosome positions.
In the future, this coupling of ribosome profiling with mass spectrometry might be used to investigate the proteomes of other complex viruses. Eventually, such information could be used to understand how different viruses hijack their hosts' cells for their own purposes.