Wake Forest Center for Precision Medicine Omics workshop Next Generation DNA Sequencing, Dr. Greg Hawkins

Presentation on Omics presented by the Center for Precision Medicine at Wake Forest School of Medicine
High Throughput Sequencing (Illumina NextSeq 500). Genomics: whole genome sequencing, exome sequencing, and targeted sequencing. Transcriptomics: RNA-Seq, small RNA-Seq.

Video Transcription

So, proteomics and mass spectrometry. Biology is complicated. If you haven't figured this out yet after the first few talks, I'll make it a little more complicated. So, we've talked about the genome, and how we can sequence the genome and find all the genes and find sequence variants. There are about 25,000 genes in our genome. As you heard from Jeanie, as a lot of these genes get transcribed, there are alternative splice variants and different types of transcripts, so on average we make about four transcripts per gene in any given cell. That makes about 100,000 transcripts that you could find in an organism with a genome of this size. Now, to make this more complicated, every single one of these transcripts gets translated into a protein. That is a first approximation, since there are some noncoding RNAs, but all the mRNAs get translated into proteins, and they then get modified. We've all heard about these post-translational modifications that regulate protein activity. Some proteins get cleaved in order to be active, some of them get phosphorylated or regulated another way, so on average we think that, per transcript, we make 10 different forms of a particular protein, which means our organism, our body, probably makes about a million different proteins. So, genetics people have it easy, right? They only look at 25,000 genes; there are a million proteins.
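The estimate above is simple multiplication. A minimal sketch, using only the round numbers quoted in the talk (the variable names are made up for illustration):

```python
# Back-of-the-envelope estimate using the round numbers quoted in the talk.
genes = 25_000
transcripts_per_gene = 4            # average per gene, as stated in the talk
protein_forms_per_transcript = 10   # average per transcript, as stated in the talk

transcripts = genes * transcripts_per_gene
protein_forms = transcripts * protein_forms_per_transcript

print(f"transcripts:   {transcripts:,}")
print(f"protein forms: {protein_forms:,}")
```

These are order-of-magnitude averages, not measured counts.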

So, how do you actually study these? Well, we use an instrument called the mass spectrometer. A mass spectrometer is essentially a giant, expensive, and highly sensitive kitchen scale. It measures the molecular weight of molecules; specifically, it measures the molecular weight of ions. So if you are interested in a particular protein or any small molecule, as long as you can turn it into an ion, either with an electric current or in an electric field, this instrument will be able to determine the molecular weight of that particular ion. So it weighs it, and then, remember, in proteins, since every amino acid has a specific molecular weight, the mass of a peptide or a protein can be related back to the amino acid sequence. I'll come back to how we actually do this in practice, but let's look first at how a mass spectrometer is built.

So, the first part is the source, where you generate ions. I should know this, but I think about 15 years ago, two people won the Nobel Prize because they were the first to develop specific ionization methods for taking larger molecules, like proteins and peptides, and turning them into ions. The first is called electrospray ionization. This is basically like your little aerosol can, where you spray a liquid and then ionize the molecules that are in that sprayed liquid. The second is matrix-assisted laser desorption ionization, complicated words, we call it MALDI. This is, essentially, if you have a solid or something that's dried down and you want to see what's in there, you can ionize it with a laser directly from there. These two methodologies were developed probably about 20 or 25 years ago and have revolutionized how we can use mass spectrometers.

So, once you have ions, you separate these ions by their properties, that is, by their mass and by their charge. More highly charged ions behave differently from less charged ions, heavier ions behave differently from lighter ions, and that's basically what we're measuring here. Then you have a detector at the end that measures this to give you the molecular weight of a particular ion and its relative abundance.
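To make the "kitchen scale" idea concrete, here is a minimal sketch of how a peptide's mass follows from its amino acid sequence, and how charge changes the mass-to-charge value an instrument would report. The residue masses are standard monoisotopic reference values, not numbers from the talk, and the sketch assumes ions are formed by adding protons (positive mode). The function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard reference values, not from the talk).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N- and C-terminus)
PROTON = 1.00728   # mass of each added proton (assumption: positive-mode ionization)


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mass_to_charge(sequence: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    seq = "PEPTIDE"  # toy sequence chosen for illustration
    print(f"neutral mass: {peptide_mass(seq):.4f} Da")
    for z in (1, 2, 3):
        print(f"z={z}: m/z = {mass_to_charge(seq, z):.4f}")
```

The same peptide appears at a lower m/z as its charge goes up, which is why mass and charge both matter for the separation.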

Usually, we couple this with a chromatography system up front, because we don't always have a pure molecule that we shoot into the mass spectrometer, but usually a biological sample or a mixture. So we couple this to an HPLC system, a gas chromatography system, capillary electrophoresis, or any other type of system that allows us to separate molecules, macromolecules, up front by some property, and then we ionize them and can measure the molecular weight downstream.

So, how do we use this for proteomics? Well, despite these great ionization methods, most of them are still not sufficiently effective to ionize intact proteins as whole molecules, because these are still relatively large. So what we usually do is use naturally occurring enzymes, peptidases, that allow us to cut proteins into smaller peptide pieces. Most of these enzymes cut after specific amino acids, so we know exactly which defined pieces they would make of any given protein if we knew the amino acid sequence. These proteolytic peptides can now be separated by chromatography, like HPLC, and then get shot into the mass spectrometer. They all get ionized, and they all get weighed, so I get a spectrum like this, which is basically the molecular weight, or the mass, of every ion and the relative abundance, the peak height, of every ion that I see in that particular mixture.

Now, that still doesn't quite tell you what the protein is. What the mass spectrometer can now do is dynamically select any of these ions and collide it with an inert gas, usually we use nitrogen, or helium, or some other gas, and what that does to a peptide is break it into smaller and smaller pieces. This is where biology is really wonderful, again, as a chemist, because the most labile bond in a peptide is the peptide bond between any two amino acids. So what usually happens if you now hit this molecule in the gas phase against other gases is that it breaks exactly at that peptide bond. So you have, basically, the mass minus one amino acid, because that fell off, the mass minus two amino acids, because they fell off, and so on, and you get what we call a fragment spectrum.
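The two steps above, cutting a protein after specific amino acids and then losing one amino acid at a time, can be sketched in a few lines. The talk does not name the enzyme; this sketch assumes a trypsin-like rule (cut after K or R unless the next residue is P), and it tracks neutral masses only, whereas a real fragment spectrum contains charged fragment ions. The protein sequence and function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard reference values, not from the talk).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(protein: str, cut_after: str = "KR", blocked_by: str = "P") -> list[str]:
    """Cut after K/R unless followed by P (assumed trypsin-like rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if aa in cut_after and not is_last and protein[i + 1] not in blocked_by:
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # whatever remains after the last cut
    return peptides


def fragment_ladder(peptide: str) -> list[float]:
    """Full peptide mass, then the mass after each successive N-terminal residue is lost."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    ladder = [mass]
    for aa in peptide[:-1]:  # stop before removing the final residue
        mass -= RESIDUE_MASS[aa]
        ladder.append(mass)
    return ladder


if __name__ == "__main__":
    toy_protein = "AGKPLRMEKTW"  # made-up sequence for illustration
    for pep in digest(toy_protein):
        masses = ", ".join(f"{m:.3f}" for m in fragment_ladder(pep))
        print(f"{pep}: {masses}")
```

The gaps between neighboring values in each ladder equal individual residue masses, which is what lets the sequence be read back from a fragment spectrum.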

Now, in order to identify this, we can match it against a reference database, right, because all our genomes are sequenced, so I could in theory predict what the sequence of a particular protein should be. I could translate it into amino acids. I can now, in silico, use the same digestion, get these peptides, and generate a theoretical spectrum, and then match the two, and basically, that fragment spectrum and the mass allow me to identify this protein.

Now, what this methodology can do is identify proteins in complex samples, from any species, any tissue, at first approximation, and it can do that quantitatively, from cells, tissues, mitochondria, exosomes, microbiome, you name it, we can probably do it. There are methodologies for multiplexing multiple samples to allow effective direct relative quantification and comparison. It can also identify post-translational modifications, and the instrument we most commonly use for this, in case you ever stumble into this term, is called an Orbitrap mass spectrometer.
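The database matching step can be illustrated with a deliberately simple score: count how many theoretical fragment masses have an observed peak within a small tolerance. This is a sketch of the idea only; real search engines use more elaborate scoring, and the talk does not say which one is used. The candidate peptides, peak lists, tolerance, and function names are all illustrative assumptions.

```python
def shared_peaks(observed: list[float], theoretical: list[float], tol: float = 0.02) -> int:
    """Number of theoretical masses that have an observed peak within `tol` daltons."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_match(observed: list[float], candidates: dict[str, list[float]]) -> tuple[str, int]:
    """Return the candidate peptide with the highest shared-peak count, and that count."""
    scores = {pep: shared_peaks(observed, masses) for pep, masses in candidates.items()}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


if __name__ == "__main__":
    # Theoretical fragment masses for two toy peptides from an in-silico digest
    # (illustrative values for made-up sequences).
    candidates = {
        "MEK": [406.189, 275.148, 146.106],
        "TW": [305.138, 204.090],
    }
    # A made-up observed fragment spectrum, including one unrelated noise peak.
    observed = [406.19, 275.15, 146.11, 300.00]

    peptide, score = best_match(observed, candidates)
    print(f"best match: {peptide} ({score} shared peaks)")
```

In practice the candidate list would first be narrowed to peptides whose intact mass matches the selected ion, which is the "fragment spectrum and the mass" combination mentioned in the talk.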

This is the specific type of instrumentation that allows us to generate these data highly efficiently. On average, this instrument generates 20 mass spectra per second, and for any given complex sample we run it for anywhere between three and 12 hours of mass spec time, so that gives you an idea of how many spectra and how much data we generate. Just a quick example: this is a study where we looked at the mouse hypothalamus, which is really, really teeny tiny, so a lot of these analyses are possible with a few milligrams of tissue. These mice were kept 12 hours without water. For the control animals, the hypothalamus was then extracted. The second group got water after the 12 hours, and they were then killed and the hypothalamus was isolated within five minutes of them starting to drink. Then we did proteomic analysis of about five milligrams of tissue for all of these animals, and we identified, in this survey-type analysis, about 2,700 proteins and 40 phosphopeptides. So, this gives you a broad idea of how this methodology can be used.
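For a sense of scale, the acquisition rate and run times quoted above translate into spectra per sample as follows. This is a rough sketch that assumes the 20 spectra per second average holds for the entire run; the names are made up for illustration.

```python
SPECTRA_PER_SECOND = 20   # average rate quoted in the talk
SECONDS_PER_HOUR = 3600


def spectra_per_run(hours: float) -> int:
    """Approximate number of spectra acquired in a run of the given length."""
    return int(SPECTRA_PER_SECOND * SECONDS_PER_HOUR * hours)


for hours in (3, 12):
    print(f"{hours:>2} h run: about {spectra_per_run(hours):,} spectra")
```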

Obviously, there's no real limit to how deeply you can analyze a sample. If we fractionate this complex mixture of hypothalamus proteins, or if we pool 10 hypothalamus samples and analyze them at depth, we can probably identify 5,000 or 10,000 proteins. But we can also analyze less complex samples, like blood samples, HDL particles, exosomes, anything along those lines that may have only a few hundred proteins in it. We can also do this with plasma. Plasma proteins are a little more complicated because 99.9% of plasma protein is made up of just 20 proteins, the most abundant of which is albumin, and then there's IgG and other common contributors, and most of what we are interested in is probably in that 0.1%, the very low abundance proteins. So that gives you a broad overview of what we can do with mass spectrometry and how this methodology can be used for proteomics. Thank you.