Help:

LncVar database
Data statistics in each species
Data processing flow chart
Data processing (take human as an example)
LncRNA conservation
Upload your data

1. LncVar database

      Long noncoding RNAs (lncRNAs) have been identified in various species and play essential roles in many molecular processes. Genetic variations, including single nucleotide polymorphisms (SNPs) and structural variations, are widely distributed across the genome. Variations in long noncoding gene loci may affect lncRNA transcripts at every level, from sequence and structure to expression level and biological function. We therefore constructed LncVar, a database of genetic variations associated with long noncoding genes in 6 species (human, mouse, zebrafish, worm, fruit fly, Arabidopsis). It contains:
      SNPs in transcription factor binding sites (TFBS) of lncRNA gene promoters (TFBS were obtained either from ChIP-seq data or by prediction).
      SNPs in TFBS that are spatially co-located with lncRNA promoters in the 3D organization of DNA (based on 5C, Hi-C and DNase Hi-C data).
      SNPs in m6A modification regions of lncRNAs (data from m6A-seq).
      SNPs in open reading frames (ORFs) of lncRNAs that could encode micropeptides (data from ribosome profiling).
      eQTLs of lncRNA genes (data from the literature).
      LncRNAs in copy number variation (CNV) regions as candidate prognostic biomarkers for various cancers (CNV, RNA-seq and clinical data are from TCGA).
      LncRNA genes involved in fusion events, predicted from RNA-seq data of 7 cell lines from the ENCODE project.
      LncRNAs are from NONCODEv4. Conservation of lncRNAs among the 6 species is calculated in LncVar by two methods (phastCons and liftOver).
      Some SNPs in LncVar have also been reported to be associated with diseases by genome-wide association studies (GWAS); we have added these associations to LncVar.

2. Data statistics in each species

3. Data processing flow chart


4. Data processing (take human as an example)

      We obtained lncRNAs from NONCODEv4 and took the 5 kb region upstream of each lncRNA gene's transcription start site (TSS) as its promoter. SNPs were obtained from the dbSNP database.
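
      The promoter definition above can be sketched as follows. This is a minimal illustration, not LncVar's code: it assumes 0-based, half-open (BED-style) coordinates, excludes the TSS base itself from the window, and uses a hypothetical SNP tuple layout (chrom, position, rsID). On the minus strand "upstream" means higher coordinates, so the window is placed accordingly; clipping at the chromosome end is not handled.

```python
# Minimal sketch, assuming 0-based half-open coordinates (BED style) and a
# hypothetical SNP tuple layout (chrom, position, rsID).

def promoter_region(tss, strand, upstream=5000):
    """Return the [start, end) interval covering `upstream` bp upstream of a TSS.

    `tss` is the 0-based coordinate of the first transcribed base. The TSS
    itself is excluded from the window (an assumption).
    """
    if strand == "+":
        return max(0, tss - upstream), tss
    if strand == "-":
        # On the minus strand, upstream lies at higher genomic coordinates.
        return tss + 1, tss + 1 + upstream
    raise ValueError(f"unknown strand: {strand!r}")


def snps_in_region(snps, chrom, start, end):
    """Return SNPs (chrom, pos, rsid) whose 0-based position lies in [start, end)."""
    return [s for s in snps if s[0] == chrom and start <= s[1] < end]


# Example: a plus-strand lncRNA whose TSS is at chr1:10000 (0-based).
start, end = promoter_region(10000, "+")
print(start, end)  # 5000 10000
print(snps_in_region([("chr1", 6000, "rs1"), ("chr2", 6000, "rs2")], "chr1", start, end))
```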

      We downloaded 690 ChIP-seq datasets from the ENCODE data portal at UCSC. After selecting high-quality datasets from untreated cell lines, 508 ChIP-seq datasets remained, covering 84 cell lines and 137 transcription factors. These data were generated by 9 laboratories (Broad, Harvard, HudsonAlpha, Stanford, UChicago, USC, UT-A, UW, Yale). We took the peak regions from the narrowPeak files (50 bp flanking each peak site) as TFBS and identified the TFBS located in the promoters of lncRNA genes. We also obtained position weight matrices for 127 transcription factors from the JASPAR database and used them to predict TFBS in lncRNA gene promoters. SNPs in these TFBS, which might affect transcription of lncRNA genes, were integrated into LncVar.
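
      The ChIP-seq part of this step can be sketched in Python. This is a minimal illustration under stated assumptions: it reads column 10 of a narrowPeak line (the summit offset from chromStart, -1 when no summit was called), takes 50 bp on each side of the summit as the TFBS, falls back to the peak midpoint when no summit exists (our assumption), and reports SNPs inside TFBS that overlap a promoter. Tuple layouts and IDs are hypothetical, and the nested loops are for readability only; genome-scale data would need an interval index or a tool such as bedtools.

```python
# Minimal sketch: derive TFBS from ENCODE narrowPeak lines and report SNPs in
# TFBS that overlap lncRNA promoters. Coordinates are 0-based, half-open.
# The promoter/SNP tuple layouts and the IDs below are hypothetical.

def tfbs_from_narrowpeak(line, flank=50):
    """Return (chrom, start, end) spanning `flank` bp on each side of the peak summit."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    peak_offset = int(fields[9])  # column 10: summit offset from chromStart, -1 if absent
    # Fall back to the peak midpoint when no summit was called (an assumption).
    summit = start + peak_offset if peak_offset >= 0 else (start + end) // 2
    return chrom, max(0, summit - flank), summit + flank + 1


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def snps_in_promoter_tfbs(peak_lines, promoters, snps, flank=50):
    """promoters: (chrom, start, end, lnc_id); snps: (chrom, pos, rsid).

    Returns (lnc_id, rsid) pairs for SNPs inside a TFBS that overlaps a promoter.
    """
    hits = []
    for line in peak_lines:
        chrom, t_start, t_end = tfbs_from_narrowpeak(line, flank)
        for p_chrom, p_start, p_end, lnc_id in promoters:
            if p_chrom != chrom or not overlaps(t_start, t_end, p_start, p_end):
                continue
            for s_chrom, pos, rsid in snps:
                if s_chrom == chrom and t_start <= pos < t_end:
                    hits.append((lnc_id, rsid))
    return hits


peak = "chr1\t1000\t1400\tpeak1\t0\t.\t5.0\t10.0\t8.0\t200"
print(tfbs_from_narrowpeak(peak))  # ('chr1', 1150, 1251)
print(snps_in_promoter_tfbs([peak], [("chr1", 1100, 1300, "lnc1")],
                            [("chr1", 1160, "rs1"), ("chr1", 1300, "rs2")]))
```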

      The spatial organization of genomes plays an essential role in the regulation of gene expression. Chromosome conformation capture technologies (such as 3C, 4C, 5C and Hi-C) are revealing this organization at unprecedented resolution. We obtained genome spatial contacts in five cell lines from the literature (PMID:22955621,25437436,24141950). By combining TFBS with these spatial contacts, we identified TFBS that are spatially close to the promoters of lncRNA genes. We then found SNPs in these TFBS and integrated them into LncVar. These SNPs might affect transcription of lncRNA genes through long-range looping interactions.
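
      Combining TFBS with spatial contacts can be sketched as follows. This is a minimal illustration, assuming each contact is an undirected pair of genomic anchors (for example, two interacting Hi-C bins) and using hypothetical tuple layouts; the function name distal_tfbs_for_promoters is ours. A TFBS is reported for a lncRNA when one anchor of a contact overlaps the lncRNA promoter and the other anchor overlaps the TFBS. SNPs in the reported TFBS would then be selected by a simple position-in-interval check.

```python
# Minimal sketch, assuming 0-based half-open intervals and hypothetical
# tuple layouts: anchors as (chrom, start, end), TFBS as (chrom, start, end, tf),
# promoters as (chrom, start, end, lnc_id).

def _overlaps(a, b):
    """True if two (chrom, start, end, ...) intervals intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def distal_tfbs_for_promoters(contacts, promoters, tfbs):
    """Return (lnc_id, tf, (chrom, start, end)) for TFBS in the anchor
    opposite a lncRNA promoter in any spatial contact."""
    results = []
    for anchor1, anchor2 in contacts:
        # Contacts are undirected, so test both orientations.
        for near, far in ((anchor1, anchor2), (anchor2, anchor1)):
            for prom in promoters:
                if not _overlaps(near, prom):
                    continue
                for site in tfbs:
                    if _overlaps(far, site):
                        results.append((prom[3], site[3], tuple(site[:3])))
    return results


contacts = [(("chr1", 0, 10000), ("chr1", 500000, 510000))]
promoters = [("chr1", 2000, 7000, "lnc1")]
tfbs = [("chr1", 505000, 505101, "CTCF"), ("chr1", 800000, 800101, "YY1")]
print(distal_tfbs_for_promoters(contacts, promoters, tfbs))
# [('lnc1', 'CTCF', ('chr1', 505000, 505101))]
```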

      m6A is the most abundant internal modification of eukaryotic mRNA, but its biological significance remains largely unknown. With the advent of high-throughput sequencing, m6A-seq, an immunocapture-based approach, has been developed for transcriptome-wide localization of m6A at high resolution. We obtained 32 m6A-seq datasets (PMID:22575960,22608085,24284625,24209618,25456834), mapped the reads to the hg19 genome and called peaks using MACS. Within the peak regions located in lncRNAs, we identified all possible m6A modification sites matching the consensus sequence RRACH (R = A/G, H = A/C/U), and then found SNPs in these sites. These SNPs might affect m6A modification of lncRNAs.
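
      The RRACH scan can be sketched with a regular expression. This is a minimal illustration, assuming the peak sequence is given on the plus strand with its 0-based genomic start as an offset; minus-strand lncRNAs would need the reverse complement and mirrored coordinates, which are not handled here. A zero-width lookahead is used so that overlapping motifs are all reported.

```python
import re

# RRACH in IUPAC notation: R = A/G, H = A/C/T (genomic T stands for U in RNA).
# The zero-width lookahead lets overlapping motifs all be reported.
RRACH = re.compile(r"(?=[AG][AG]AC[ACT])")


def rrach_sites(seq, offset=0):
    """Return 0-based, half-open (start, end) intervals of every RRACH motif.

    `seq` is assumed to be a plus-strand sequence and `offset` its 0-based
    genomic start coordinate, so returned intervals are genomic.
    """
    seq = seq.upper().replace("U", "T")
    return [(offset + m.start(), offset + m.start() + 5) for m in RRACH.finditer(seq)]


def snps_in_m6a_motifs(chrom, seq, offset, snps):
    """Return SNPs (chrom, pos, rsid) that fall inside any RRACH motif in `seq`."""
    sites = rrach_sites(seq, offset)
    return [s for s in snps
            if s[0] == chrom and any(start <= s[1] < end for start, end in sites)]


# Hypothetical peak sequence starting at chr1:100 (0-based).
print(rrach_sites("GGACTGAACA", offset=100))  # [(100, 105), (105, 110)]
```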

      Ribosome profiling is a recently developed technique that uses specialized messenger RNA sequencing to determine which mRNAs are being actively translated. Recent ribosome profiling studies reported that many lncRNAs are bound by ribosomes, raising the possibility that they are translated into proteins. Several peptides encoded by lncRNAs have been reported to play important roles in cellular regulation. We collected peptides that might be encoded by lncRNAs from the literature (PMID:22955977,23160002,24705786,24870543,25233276,25599403) and identified synonymous and nonsynonymous SNPs in the ORF regions of these lncRNAs. We integrated into LncVar the position of each SNP in the ORF, the position of the altered amino acid, and the sequence of the altered peptide (if the SNP is nonsynonymous).
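
      Classifying an ORF SNP as synonymous or nonsynonymous can be sketched as below. This is a minimal illustration, assuming the ORF is given in transcript orientation (DNA alphabet, starting at the start codon), the SNP offset is 0-based within the ORF, and the standard genetic code applies; the function name annotate_orf_snp and the example ORF are hypothetical. The altered peptide is translated up to the first stop codon, so a stop-gain SNP yields a truncated peptide.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf):
    """Translate an ORF (DNA alphabet) until the first stop codon."""
    peptide = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def annotate_orf_snp(orf, pos, alt):
    """Classify a SNP at 0-based offset `pos` of the ORF (transcript orientation).

    Returns the SNP position in the ORF, the 1-based position of the affected
    amino acid and, for nonsynonymous SNPs, the altered peptide.
    """
    codon_start = pos - pos % 3
    mutant = orf[:pos] + alt + orf[pos + 1:]
    ref_aa = CODON_TABLE[orf[codon_start:codon_start + 3]]
    alt_aa = CODON_TABLE[mutant[codon_start:codon_start + 3]]
    result = {"orf_position": pos, "aa_position": pos // 3 + 1}
    if ref_aa == alt_aa:
        result["effect"] = "synonymous"
    else:
        result.update(effect="nonsynonymous", ref_aa=ref_aa, alt_aa=alt_aa,
                      altered_peptide=translate(mutant))
    return result


orf = "ATGGCTTGA"  # hypothetical micropeptide ORF: Met-Ala-stop
print(annotate_orf_snp(orf, 5, "C")["effect"])           # synonymous (GCT -> GCC)
print(annotate_orf_snp(orf, 3, "C")["altered_peptide"])  # MP (GCT -> CCT)
```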

      Expression quantitative trait loci (eQTLs) have provided insights into the regulation of lncRNAs. By combining genotype data with lncRNA expression levels, two studies identified eQTLs for hundreds of lncRNAs in two populations (PMID:23341781,20220756). We also integrated these results into our database.
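
      The basic idea behind combining genotypes with expression levels can be sketched as a single-SNP additive test. This is only a generic illustration, not the models used in the two cited studies: it regresses expression on genotype dosage (0, 1 or 2 alternative alleles) by ordinary least squares and returns the slope and its t statistic. Real eQTL analyses typically add expression normalization, covariates and multiple-testing correction, none of which is shown here.

```python
import math

# Minimal sketch of a single-SNP additive eQTL test: ordinary least squares of
# expression on genotype dosage (0, 1 or 2 copies of the alternative allele).

def eqtl_test(genotypes, expression):
    """Return (slope, t statistic) of expression regressed on genotype dosage."""
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype does not vary in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        return slope, (0.0 if slope == 0 else math.copysign(math.inf, slope))
    return slope, slope / se


# Hypothetical data: expression rises by about one unit per alternative allele.
slope, t = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.1, 3.0, 3.1])
print(round(slope, 3), t > 10)  # 0.975 True
```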

      Copy number variation (CNV) is a form of structural variation. CNVs can result in deletion or amplification of large genomic regions, which might affect the expression levels of genes located in them. We obtained CNV regions of 30 cancers from the TCGA Copy Number Portal at the Broad Institute and found the lncRNA genes located in these regions. We then obtained lncRNA expression levels and patient clinical data from TCGA and performed survival analysis. LncRNAs significantly associated with survival (log-rank test, P < 0.05) were identified as candidate prognostic biomarkers, and Kaplan-Meier curves were plotted for these candidates.
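
      The survival analysis can be sketched with a Kaplan-Meier estimator and a two-group log-rank test. This is a minimal illustration in plain Python, assuming each patient has a follow-up time and an event flag (1 = death observed, 0 = censored) and that patients have already been divided into two groups by lncRNA expression; how LncVar defined the groups (for example, a median split) is not stated here and is an assumption. The P-value uses the chi-square distribution with 1 degree of freedom.

```python
import math

# Minimal sketch. `times` are follow-up times and `events` are 1 for death
# (event observed) and 0 for censoring. How patients are split into two
# expression groups (e.g. at the median) is an assumption, not shown here.

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value), 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for ti, _, g in pooled if g == 0 and ti >= t)  # at risk, group A
        n = sum(1 for ti, _, _ in pooled if ti >= t)               # at risk, total
        d_a = sum(1 for ti, e, g in pooled if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in pooled if ti == t and e)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical groups: high-expression patients die much earlier.
high = [1, 1, 2, 2, 3, 3, 4, 4]
low = [10, 11, 12, 13, 14, 15, 16, 17]
chi2, p = log_rank_test(high, [1] * 8, low, [1] * 8)
print(p < 0.05)  # True
```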

      Chromosomal translocations, interstitial deletions and inversions can result in the fusion of two previously separate genes. Fusion genes often act as oncogenes and play important roles in tumorigenesis. Several computational methods have been developed to discover fused transcripts from RNA-seq data. We downloaded RNA-seq data of 7 cell lines from the ENCODE project and predicted lncRNA genes involved in fusion events using deFuse and FusionMap. We integrated the predicted results into our database.

5. LncRNA conservation

Two methods were used to analyze the conservation of lncRNAs. The first is liftOver: we lifted over the coordinates of lncRNA genes from one species to 8 other species, and considered a lncRNA gene potentially conserved if the conversion succeeded with minMatch=0.5 and minBlocks=0. The second uses phastCons data downloaded from UCSC in wig format. We calculated the average phastCons score (ranging from 0 to 1) over the exons of each lncRNA gene. A gene with an average score above 0.1 is considered conserved, and one with an average score above 0.5 is considered highly conserved.
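
The phastCons method can be sketched as follows. This is a minimal illustration, assuming the wig file uses fixedStep sections (1-based start coordinates; the optional span parameter and variableStep sections are not handled), exons are given as 1-based inclusive (start, end) pairs, and exon bases without a score are skipped rather than counted as zero (an assumption). The thresholds follow the text: above 0.5 is highly conserved, above 0.1 is conserved. Loading a whole-genome wig into a dictionary is memory-hungry; this is for illustration only.

```python
# Minimal sketch: average phastCons over lncRNA exons and apply the
# 0.1 / 0.5 thresholds. Assumes fixedStep wig input with 1-based starts.

def parse_fixedstep_wig(lines):
    """Parse fixedStep wiggle lines into {chrom: {1-based position: score}}."""
    scores = {}
    chrom = pos = step = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(kv.split("=") for kv in line.split()[1:])
            chrom, pos, step = fields["chrom"], int(fields["start"]), int(fields.get("step", 1))
            scores.setdefault(chrom, {})
            continue
        scores[chrom][pos] = float(line)
        pos += step
    return scores


def mean_exon_phastcons(scores, chrom, exons):
    """Mean score over exon bases (1-based inclusive); unscored bases are skipped."""
    chrom_scores = scores.get(chrom, {})
    values = [chrom_scores[p] for s, e in exons for p in range(s, e + 1) if p in chrom_scores]
    return sum(values) / len(values) if values else None


def classify(mean_score):
    """Apply the conservation thresholds described in the text."""
    if mean_score is None:
        return "no data"
    if mean_score > 0.5:
        return "highly conserved"
    if mean_score > 0.1:
        return "conserved"
    return "not conserved"


wig = ["fixedStep chrom=chr1 start=101 step=1", "0.2", "0.4", "0.6", "0.8"]
scores = parse_fixedstep_wig(wig)
mean = mean_exon_phastcons(scores, "chr1", [(101, 102), (104, 104)])
print(round(mean, 3), classify(mean))  # 0.467 conserved
```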

6. Upload your data

If your data have been published, you can email the PubMed ID or a link to your data to chenxiaowei(at)moon.ibp.ac.cn. Please include a detailed description of your data. We will integrate your data into our database.