What is CAGE (cap analysis of gene expression) technology?

How genes work is one of the major questions in biology. A question that has been raised during the last eight to ten years is whether certain regions of a genome are the key players in regulating a gene’s activity. We needed to develop a method that lets researchers learn about the transcription factors and regulatory elements that interact with genes and either activate or suppress them.

CAGE accurately marks where a gene’s transcription starts. It has the power of classical mRNA-seq techniques, so you can determine how the gene works, whether it’s active, inactive, or differently regulated, by mapping only the head (the 5′ end) of the RNA. The exact position of this so-called transcription start site gives you a lot of clues about how other regulatory proteins and parts of the genome or chromatin are interacting, and what kind of additional regulators are there. The resolution of these marks in the genome is extremely high, at a level of precision that was inaccessible to previous techniques.
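To make "mapping only the head of RNA" concrete, here is a minimal sketch of how aligned reads could be reduced to counts per transcription start position. It is an illustration only, not the actual CAGE processing pipeline: the read tuples, the BED-style 0-based half-open coordinates, and the function names are all assumptions made for this example.

```python
from collections import Counter

def five_prime_position(start, end, strand):
    """Return the 0-based position of a read's 5' end.

    Assumes BED-style coordinates: 0-based start, exclusive end.
    On the minus strand the 5' end is the last base of the interval.
    """
    return start if strand == "+" else end - 1

def count_tss_tags(reads):
    """Count how many reads begin at each (chrom, position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        pos = five_prime_position(start, end, strand)
        counts[(chrom, pos, strand)] += 1
    return counts

# Hypothetical aligned reads: (chrom, start, end, strand)
reads = [
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 125, "+"),
    ("chr1", 102, 129, "+"),
    ("chr1", 480, 507, "-"),
]

tss_counts = count_tss_tags(reads)
for key, n in sorted(tss_counts.items()):
    print(key, n)
```

Only the first base of each read is kept, so the result is a single-base-resolution profile of where transcription starts, which is the kind of signal described above.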

What are the key advantages of choosing CAGE over other methods?

A typical RNA-seq analysis uses the tail of the RNA as a kind of starting point. CAGE, on the other hand, uses the head. Therefore, it lets us see not only protein-coding genes, but also long non-coding RNAs and the recently discovered enhancer RNAs. Microarray and RNA-seq have evolved as techniques for expression analysis, but very often the same gene has several start sites, which are hard to detect by microarray or RNA-seq. Transcription can be initiated at different sites, which can give you different variants of a protein or involve different transcription factors, which again cannot be accessed by any other method. From the same input, CAGE gives two or three times higher output, both scientifically and technically.

What kinds of research have you done using CAGE analysis? Has it helped improve the results of your research by being faster and more in-depth than other methods?

One of the projects in which we are actively using CAGE technology is space biology. For many years, researchers have been asking whether there is any specific gene regulation in space. There was a recent Russian-Japanese program, AQH (Aquatic Habitat), where an aquarium with zebrafish was delivered to space. CAGE gives straightforward answers to the questions: Is there anything strange happening with gene regulation in space? Can we expect new, unknown transcription factors? Are there enhancers that react specifically to altered gravity? Can we see anything that signals damaging effects from space radiation? If you think about gene-based approaches, CAGE seems to be very much in line with these questions and objectives.

Another example I will mention is desiccation preservation. We are using the larvae of a type of bloodworm from Africa. You can take these larvae and desiccate them, take them to outer space, put them in a deep freezer, hit them with amazing doses of radiation, and they will come back to life. No other insects can do so, so it was fundamentally interesting to look for the seeds of desiccation preservation technologies hidden in the genome. Again, the question is: What are the key parts of the genome that signal to and control the genes involved in desiccation resistance? We need to know how those genes work and where exactly the as yet unknown regulatory parts of the genome are. CAGE seems to be an extremely high-powered weapon for assessing this, especially considering the requirements. CAGE is also quite cost-effective.

Microarrays, full-length RNA-seq, and CAGE sequencing all perform genome-wide gene expression analysis. How do researchers choose between these technologies when they need transcriptome analysis?

The microarray was the first to be developed. The advantage of the microarray is that you can see very fine changes in gene expression because the sensitivity is very high. At the same time, you need a platform. You need to know what genome you’re looking at. If you take some unknown creature, a microarray will not be much help.

RNA-seq was developed to fill this gap. Using this method, you can take unknown creatures or plants and try to reconstruct the genes. Microarrays focus on changes in gene expression, and they cannot discriminate between splicing variants or SNP variants. In that sense, RNA-seq is superior. But think about the next step, where you want to see a more complex picture of how the pieces of the genome interconnect. You will want to see how they are regulated and what kind of key enhancers are involved in the life of a given gene. The only answer so far is CAGE because it requires one-tenth of the total reads to reach the same level as other methods for mapping the activity of a gene. CAGE is probably the most obvious choice because it gives you five times more information.

We know that CAGE is a powerful technology for studying transcriptomes. Do you have any advice on using CAGE to study diseases or other areas of molecular biology?

The answer is yes. To some extent, both microarray and transcriptome analysis have reached a plateau. A lot of data has accumulated, and we now see what the classical microarray approach (which shows changes in gene expression level) and RNA-seq (which shows how genes are activated) can contribute. These methods cannot explain the larger part of the problem with the diseases we face. Even if a gene is okay, the problem can come from a necessary activator, and we cannot find the relevant site. Sometimes it is not the gene itself that is mutated but the enhancer that is damaged. Let’s take microarray data from a patient and from a healthy person. The microarray shows no difference at all. Does this mean we should forget about this gene? Not at all. We take a look at the transcriptome, and we see the gene stays stable, so the expression doesn’t change. Does this mean we should forget about this gene? No. There are many cases in which the same region of the genome contains several initiation sites for transcription, and a change in which site is used may not show up as a change in total expression.
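The point about several initiation sites can be shown with a small numeric sketch. The counts below are invented purely for illustration (they are not data from any study), and the names `TSS_A`, `TSS_B`, and `promoter_usage` are hypothetical. The example assumes a gene with two start sites whose total signal is identical in two samples while the share of each site differs.

```python
def promoter_usage(tss_counts):
    """Return each start site's share of the gene's total tag count."""
    total = sum(tss_counts.values())
    return {tss: n / total for tss, n in tss_counts.items()}

# Invented tag counts for one gene with two start sites
healthy = {"TSS_A": 90, "TSS_B": 10}
disease = {"TSS_A": 30, "TSS_B": 70}

# Gene-level totals, as a gene-level expression measurement would see them
gene_total_healthy = sum(healthy.values())
gene_total_disease = sum(disease.values())

# Start-site-level view, as a 5'-end-based method would see it
usage_healthy = promoter_usage(healthy)
usage_disease = promoter_usage(disease)
shift = {tss: usage_disease[tss] - usage_healthy[tss] for tss in healthy}

print("gene totals:", gene_total_healthy, gene_total_disease)
print("usage shift:", shift)
```

In this toy case the gene-level totals are equal, so a gene-level comparison reports no change, while the per-site shares show that most transcription has moved from one start site to the other.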

But the question is: Why start from those microarray and transcriptome approaches? If we study human genes, we want to use the most advantageous method. This is where CAGE comes in. The same question applies when looking for new biomarkers. Let’s say we have an HIV-infected person and we want to take their blood in the early stages, when the virus is not yet detectable. If you use microarrays, you’ll see changes in expression and in the transcriptome. But if you use CAGE, you’ll get many times more information, and it shows you the variants produced from the same gene. You have a much better chance of seeing what is different. It gives you more variety and more potential biomarkers. In that sense, CAGE gives you a lot of advantages over classical transcriptome analysis.

Are there any aspects of CAGE that need improvement?

Yes. Compared to classical transcriptome approaches, CAGE is still a bit too technically demanding. It gives you a lot of output and contributes a lot. It will probably be very beneficial once it is available as a premix or otherwise becomes easier for end users. Another thing is to reduce the price. Because CAGE involves many steps and careful selection, the price of one round of analysis differs between microarray, transcriptome analysis, and CAGE, with CAGE being the most expensive. Once pricing becomes comparable, people will probably forget about transcriptome analysis and switch to CAGE in many applications because it provides more information from the same set of data.