We obtained lncRNAs from NONCODEv4 and defined the 5 kb region upstream of each lncRNA gene's transcription start site as its promoter. SNPs were obtained from the dbSNP database.
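The promoter definition above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function name, the `Region` type and the 0-based half-open coordinate convention are assumptions, and the strand handling reflects the usual meaning of "upstream" (lower coordinates on the plus strand, higher coordinates on the minus strand).

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def promoter_region(chrom, tx_start, tx_end, strand, upstream=5000):
    """Return the window of `upstream` bp immediately upstream of the TSS.

    The gene is given in 0-based half-open coordinates (an assumption).
    """
    if strand == "+":
        # TSS is the first transcribed base; upstream lies at lower coordinates.
        return Region(chrom, max(0, tx_start - upstream), tx_start)
    if strand == "-":
        # TSS is the last base (tx_end - 1); upstream lies at higher coordinates.
        return Region(chrom, tx_end, tx_end + upstream)
    raise ValueError(f"unknown strand: {strand!r}")
```

For a plus-strand gene starting at position 10000, this yields the window 5000 to 10000. The window is clipped at the chromosome start but not at the chromosome end, because chromosome lengths are not passed in.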
We downloaded 690 ChIP-seq datasets from the ENCODE data portal at UCSC. After selecting datasets of good quality from cell lines that had not received any treatment, we retained 508 ChIP-seq datasets covering 84 cell lines and 137 transcription factors. These data came from 9 laboratories (Broad, Harvard, HudsonAlpha, Stanford, UChicago, USC, UT-A, UW, Yale). We took the peak regions (50 bp flanking the peak site) from the narrowPeak files as transcription factor binding sites (TFBS) and identified the TFBS located in the promoters of lncRNA genes. We also obtained position weight matrices for 127 transcription factors from the JASPAR database and used them to predict TFBS in the promoters of lncRNA genes. SNPs in these TFBS, which might affect the transcription of lncRNA genes, were integrated into LncVar.
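As a rough sketch of how a TFBS window could be derived from one narrowPeak record and tested against a promoter: in the narrowPeak format the tenth column holds the summit offset relative to the start of the peak, or -1 when no summit was called. Reading "flanking 50 bp" as 50 bp on each side of the summit is an assumption, as are the function names and the midpoint fallback.

```python
def tfbs_window(narrowpeak_line, flank=50):
    """Return (chrom, start, end) of a window around the peak summit.

    Coordinates are 0-based half-open. The window spans `flank` bp on each
    side of the summit (an assumed reading of the text).
    """
    fields = narrowpeak_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    summit_offset = int(fields[9])
    if summit_offset >= 0:
        summit = start + summit_offset
    else:
        # No summit called: fall back to the peak midpoint (assumption).
        summit = (start + end) // 2
    return chrom, max(0, summit - flank), summit + flank + 1


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
```

A TFBS would then be assigned to an lncRNA gene when `overlaps(tfbs, promoter)` is true, and a SNP at position `p` lies in the TFBS when `start <= p < end`.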
The spatial organization of genomes plays an essential role in the regulation of gene expression. With the newly developed chromosome conformation capture technologies (such as 3C, 4C, 5C and Hi-C), the spatial organization of genomes is being explored at unprecedented resolution. We obtained genome spatial contacts in five cell lines from the literature (PMID: 22955621, 25437436, 24141950). We combined TFBS with these spatial contacts to identify TFBS that are spatially close to the promoters of lncRNA genes. We then identified SNPs in these TFBS and integrated the data into LncVar. These SNPs might affect the transcription of lncRNA genes through long-range looping interactions.
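One way to combine TFBS with spatial contacts is sketched below. It assumes each contact is represented as a pair of interacting genomic intervals (anchors), which is a simplification: the cited studies report contacts in different formats and resolutions. A TFBS is reported when it falls in the anchor paired with the anchor that overlaps the promoter. All names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def distal_tfbs(contacts, promoter, tfbs_list):
    """Return TFBS lying in an anchor that contacts the promoter's anchor.

    contacts:  list of (anchor_a, anchor_b) interval pairs (assumed format)
    promoter:  (chrom, start, end)
    tfbs_list: list of (chrom, start, end)
    """
    hits = []
    for anchor_a, anchor_b in contacts:
        # A contact is symmetric, so test both orientations.
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if not overlaps(near, promoter):
                continue
            for tfbs in tfbs_list:
                if overlaps(tfbs, far) and tfbs not in hits:
                    hits.append(tfbs)
    return hits
```

The nested loops are quadratic and only suitable for illustration; genome-scale data would call for an interval index.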
m6A is the most common and abundant modification on RNA molecules, but its biological significance remains largely unknown. With the advent of high-throughput sequencing technology, a new immunocapturing approach, m6A-seq, has been developed for transcriptome-wide localization of m6A at high resolution. We obtained 32 m6A-seq datasets (PMID: 22575960, 22608085, 24284625, 24209618, 25456834), mapped the reads to the hg19 genome and called peaks using MACS. Within the peak regions located in lncRNAs, we identified all possible m6A modification regions matching the consensus sequence RRACH, and then identified SNPs in these regions. These SNPs might affect m6A modification of lncRNAs.
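The RRACH search can be illustrated with a regular expression, where R stands for A or G and H stands for A, C or U (T in DNA notation). The sketch assumes the input is the peak sequence on the transcript strand, so minus-strand lncRNAs would need reverse-complementing first. Function names are hypothetical.

```python
import re

# Lookahead so that overlapping motif occurrences are all reported.
RRACH = re.compile(r"(?=[AG][AG]AC[ACT])")


def rrach_sites(seq, offset=0):
    """Return (start, end) half-open spans of RRACH motifs in `seq`.

    `offset` shifts the spans, e.g. to the peak start within the transcript.
    """
    seq = seq.upper().replace("U", "T")
    return [(m.start() + offset, m.start() + offset + 5)
            for m in RRACH.finditer(seq)]


def snp_in_motif(snp_pos, sites):
    """True if a 0-based SNP position falls inside any motif span."""
    return any(start <= snp_pos < end for start, end in sites)
```

The methylated adenosine is the third base of each motif, at `start + 2`. A SNP anywhere in the five-base span could disrupt the consensus, which is why the whole span is tested.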
Ribosome profiling is a newly developed technique that uses specialized messenger RNA sequencing to determine which mRNAs are being actively translated. Recent ribosome profiling studies reported that many lncRNAs are bound by ribosomes, raising the possibility that they are translated into proteins. Several peptides encoded by lncRNAs have been reported to play important roles in cellular regulation. We collected peptides that might be encoded by lncRNAs from the literature (PMID: 22955977, 23160002, 24705786, 24870543, 25233276, 25599403) and identified synonymous and nonsynonymous SNPs in the ORF regions of lncRNAs. For each SNP, we integrated into LncVar its position in the ORF, the position of the altered amino acid and, if the SNP is nonsynonymous, the sequence of the altered peptide.
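A minimal sketch of how a SNP in an ORF could be classified as synonymous or nonsynonymous, using the standard genetic code. It assumes the ORF is given on the transcript strand in DNA letters, has a length divisible by three, and that the SNP position is 0-based within the ORF. The function and field names are hypothetical and do not reflect the LncVar schema.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf):
    """Translate an ORF codon by codon, ignoring a trailing partial codon."""
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


def snp_effect(orf, pos, alt):
    """Describe the effect of substituting `alt` at 0-based ORF position `pos`."""
    orf = orf.upper()
    mutated = orf[:pos] + alt.upper() + orf[pos + 1:]
    codon_start = 3 * (pos // 3)
    ref_aa = CODON_TABLE[orf[codon_start:codon_start + 3]]
    alt_aa = CODON_TABLE[mutated[codon_start:codon_start + 3]]
    synonymous = ref_aa == alt_aa
    return {
        "snp_position_in_orf": pos + 1,   # reported 1-based
        "aa_position": pos // 3 + 1,      # reported 1-based
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "synonymous": synonymous,
        # Altered peptide is only reported for nonsynonymous SNPs.
        "altered_peptide": None if synonymous else translate(mutated),
    }
```

The altered peptide keeps the `*` stop symbol, so a SNP that creates a premature stop codon is visible in the output rather than silently truncated.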
Expression quantitative trait loci (eQTLs) have brought insights into the regulation of lncRNAs. Two studies identified eQTLs for hundreds of lncRNAs in two populations by combining genotype data with lncRNA expression levels (PMID: 23341781, 20220756). We also integrated these results into our database.
Copy number variation (CNV) is a form of structural variation that might result in the deletion or amplification of large regions of the genome. The expression levels of genes located in CNV regions might be affected. We obtained CNV regions for 30 cancers from the TCGA Copy Number Portal at the Broad Institute and identified the lncRNA genes located in these regions. We then obtained lncRNA expression levels and patient clinical data from TCGA and performed survival analysis. We identified lncRNAs as prognostic biomarker candidates (P-value < 0.05, log-rank test) and plotted Kaplan-Meier curves for these candidates.
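For illustration, a two-group log-rank test can be written in plain Python as below. This is a sketch of the standard test, not the code used for LncVar. How patients were split into groups (for example, high versus low lncRNA expression) is not stated above, so the two groups here are left abstract. The P-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times
    events_*: 1/True if the event (death) was observed, 0/False if censored
    Returns (chi_square, p_value).
    """
    data = [(t, bool(e), 0) for t, e in zip(times_a, events_a)]
    data += [(t, bool(e), 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        at_risk_b = sum(1 for tt, _, g in data if tt >= t and g == 1)
        deaths_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        deaths_b = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        obs_minus_exp += deaths_a - d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

An lncRNA would be flagged as a candidate when the returned P-value is below 0.05. For real analyses an established survival package is preferable, since it also handles Kaplan-Meier estimation and plotting.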
Chromosome translocation, interstitial deletion and inversion might result in the fusion of two previously separate genes. Fusion genes are usually oncogenes and play important roles in tumorigenesis. Several computational methods have been developed to discover fused transcripts from RNA-seq data. We downloaded RNA-seq data for 7 cell lines from the ENCODE project and predicted lncRNA genes involved in fusion events using deFuse and FusionMap. We integrated the predicted results into our database.