README for *.PAS.txt
Last updated: August 27, 2018
E-mail: rw479@njms.rutgers.edu / btian@wistar.org
####################################################

*.PAS.txt is a tab-delimited file containing all annotated PASs, based on information from RefSeq, Ensembl, and FANTOM5 (human only).

Columns:
1) PAS_ID: ID of each PAS, shown as Chromosome:Position:Strand
2) Chromosome: Chromosome ID of the PAS
3) Position: Genomic position of the PAS
4) Strand: Strand of the PAS
5) Mean RPM: Average reads per million (RPM) of PAS reads across all samples
6) Intron/exon location: Location of the PAS relative to the gene's splicing structure: 5'-most exon, internal exon, 3'-most exon, or intron. If a gene has only one exon, the location is called single exon.
7) Ensembl Gene ID: Ensembl gene ID of the gene containing the PAS
8) RefSeq Gene ID: RefSeq gene ID of the gene containing the PAS
9) Gene Symbol: Gene symbol from RefSeq
10) Gene Name: Gene name from RefSeq
11) Extension: Whether the PAS is located in an extended 3' end region beyond RefSeq/Ensembl annotations, Yes/No
12) PAS type: Location of the PAS in annotated genes: 5'UTR, CDS, 3'UTR, or intergenic. PASs in 3'UTRs are further divided into First (F), Middle (M), and Last (L). If a 3'UTR has only one PAS, that PAS is called S.
13) PSE: Percentage of all samples in which expression of the PAS is detected
14) PAS Signal: PAS signal located within 40 nt upstream of the PAS: AAUAAA, AUUAAA, Other (AGTAAA, TATAAA, CATAAA, GATAAA, AATATA, AATACA, AATAGA, AAAAAG, or ACTAAA), A-rich (AAAAAA), or None
15) Conservation: Whether the PAS is conserved in at least two mammals (human, mouse, and rat)
16) FANTOM ID (human only): FANTOM lncRNA ID of the gene containing the human PAS
17) FANTOM_Category (human only): lncRNA category defined by FANTOM5
18) Intergenic_TE: Whether the intergenic PAS is located within the ±24 bp region of an annotated transposable element, Yes/No

References:
==========
1. Wang R, Zheng D, Yehia G, & Tian B. (2018). A compendium of conserved cleavage and polyadenylation events in mammals. Genome Research 28(10):1427-1441.
2. Wang R, Nambiar R, Zheng D, & Tian B. (2017). PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes. Nucleic Acids Research 46(D1):D315-D319.
3. Zheng D, Liu X, & Tian B. (2016). 3'READS+, a sensitive and accurate method for 3' end sequencing of polyadenylated RNA. RNA 22:1631-9.
4. Hoque M*, Ji Z*, Zheng D, Luo W, Li W, You B, Park JY, Yehia G, & Tian B. (2013). Analysis of alternative cleavage and polyadenylation by 3' region extraction and deep sequencing. Nature Methods 10:133-9.
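
Example: reading a *.PAS.txt file
==========
The column layout described in this README can be read with a few lines of Python. The sketch below is only an illustration, not part of the PolyA_DB distribution. It assumes that the file starts with one header line (set has_header=False if it does not) and that a human file carries all 18 columns in the order listed above. The field names in COLUMNS and the function names are hypothetical labels chosen for this example; they are not the headers used in the file itself.

```python
import csv
import sys
from collections import Counter

# Hypothetical field names for the 18 columns, in the order given in the README.
# These are labels for this example only, not the file's own header strings.
COLUMNS = [
    "pas_id", "chromosome", "position", "strand", "mean_rpm",
    "intron_exon_location", "ensembl_gene_id", "refseq_gene_id",
    "gene_symbol", "gene_name", "extension", "pas_type", "pse",
    "pas_signal", "conservation", "fantom_id", "fantom_category",
    "intergenic_te",
]


def parse_pas_id(pas_id):
    """Split 'Chromosome:Position:Strand' into (chromosome, position, strand)."""
    # Split from the right so a chromosome ID containing ':' stays intact.
    chromosome, position, strand = pas_id.rsplit(":", 2)
    return chromosome, int(position), strand


def read_pas_rows(handle, has_header=True):
    """Yield one dict per PAS from an open tab-delimited file handle."""
    # QUOTE_NONE keeps quote characters in gene names as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: the first line is a header
    for fields in reader:
        if fields:  # skip blank lines
            # Names are assigned by position; extra or missing columns
            # are silently ignored by zip().
            yield dict(zip(COLUMNS, fields))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_pas.py <path to a *.PAS.txt file>
    with open(sys.argv[1], newline="") as handle:
        counts = Counter(row.get("pas_signal", "") for row in read_pas_rows(handle))
    # Print the number of PASs for each PAS Signal value (column 14).
    for signal, count in counts.most_common():
        print(signal, count, sep="\t")
```

Because names are assigned by position, files that lack the human-only columns (16 and 17) may need a shorter or reordered COLUMNS list; check the actual column count before relying on the later fields.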