Title: | Isoform-Level Biological Interpretation of Transcriptomic Data |
---|---|
Description: | Isoform-level biological interpretation of transcriptomic data. |
Authors: | Lucio Rezende Queiroz [aut, cre, cph] , Izabela Mamede Conceicao [aut, ctb] , Luigi Marchionni [aut, ctb] , Gloria Franco [aut, ctb] |
Maintainer: | Lucio Rezende Queiroz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1.9003 |
Built: | 2024-12-29 08:00:48 UTC |
Source: | https://github.com/luciorq/isoformic |
Downloads reference annotation files from the GENCODE database for human or mouse genomes. Supports downloading GTF, GFF, and transcriptome FASTA files. The function handles directory creation and checks for existing files to avoid redundant downloads.
download_reference( version = "46", reference = "gencode", organism = c("human", "mouse"), file_type = c("gtf", "gff", "fasta"), output_path = "data-raw", timeout_limit = 3600, method = "auto" )
download_reference( version = "46", reference = "gencode", organism = c("human", "mouse"), file_type = c("gtf", "gff", "fasta"), output_path = "data-raw", timeout_limit = 3600, method = "auto" )
version |
A character string specifying the GENCODE release version.
For mouse references, include the letter 'M' in the version string (e.g., |
reference |
A character string specifying the source of the reference file.
Currently, only |
organism |
A character string specifying the organism.
Valid options are |
file_type |
A character string specifying the type of file to download.
Valid options are |
output_path |
A character string specifying the directory where the
downloaded file will be saved. Defaults to |
timeout_limit |
A numeric value specifying the maximum time in seconds for the
download to complete. This argument takes precedence over |
method |
A character string specifying the method used by
|
The function constructs the appropriate download URL based on the specified organism, version, and file type, and downloads the file to the specified output path. If the file already exists in the output directory, the function will not download it again and will return the existing file path. The function requires an internet connection and handles timeout settings to prevent download interruptions.
A character string with the full path to the downloaded file.
Currently, only "gencode"
reference files are supported.
The "mane"
reference is not implemented yet.
# Download human GTF file for GENCODE release 43 gtf_file <- download_reference( version = "43", organism = "human", file_type = "gtf", output_path = "data-raw" ) # Download mouse GTF file for GENCODE release M32 gtf_file_mouse <- download_reference( version = "M32", organism = "mouse", file_type = "gtf", output_path = "data-raw" ) # Download human transcriptome FASTA file for GENCODE release 43 fasta_file <- download_reference( version = "43", organism = "human", file_type = "fasta", output_path = "data-raw" )
# Download human GTF file for GENCODE release 43 gtf_file <- download_reference( version = "43", organism = "human", file_type = "gtf", output_path = "data-raw" ) # Download mouse GTF file for GENCODE release M32 gtf_file_mouse <- download_reference( version = "M32", organism = "mouse", file_type = "gtf", output_path = "data-raw" ) # Download human transcriptome FASTA file for GENCODE release 43 fasta_file <- download_reference( version = "43", organism = "human", file_type = "fasta", output_path = "data-raw" )
Adds a column to a transcript-level differential expression table indicating whether each transcript originates from a gene that is significantly differentially expressed.
is_deg_sig(DegsigVector, DET_table)
is_deg_sig(DegsigVector, DET_table)
DegsigVector |
A character vector containing the names of transcripts from significantly differentially expressed genes. |
DET_table |
A |
A tibble
with an additional column DEG_sig
indicating whether the transcript is from a significantly
differentially expressed gene ("YES"
or "NO"
).
# Sample data significant_transcripts <- c("transcript1", "transcript3") DET_table <- data.frame( transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"), log2FC = c(2.5, -1.2, 0.8, -0.5), pvalue = c(0.01, 0.2, 0.03, 0.6) ) # Annotate transcripts with DEG significance DET_table_annotated <- is_deg_sig(DegsigVector = significant_transcripts, DET_table = DET_table) # View the result print(DET_table_annotated)
# Sample data significant_transcripts <- c("transcript1", "transcript3") DET_table <- data.frame( transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"), log2FC = c(2.5, -1.2, 0.8, -0.5), pvalue = c(0.01, 0.2, 0.03, 0.6) ) # Annotate transcripts with DEG significance DET_table_annotated <- is_deg_sig(DegsigVector = significant_transcripts, DET_table = DET_table) # View the result print(DET_table_annotated)
Combines gene-level and transcript-level differential expression results into a single table, annotates the combined data with significance labels based on specified cutoffs, and filters transcripts based on their types.
join_DEG_DET(DEG_tab, DET_final_tab, logfc_cut, pval_cut)
join_DEG_DET(DEG_tab, DET_final_tab, logfc_cut, pval_cut)
DEG_tab |
A |
DET_final_tab |
A |
logfc_cut |
A numeric value specifying the absolute log2 fold-change cutoff for significance. |
pval_cut |
A numeric value specifying the p-value cutoff for significance. |
A tibble
combining gene and transcript differential expression results, with additional columns:
id
: gene or transcript ID.
name
: gene or transcript name.
transcript_type
: type of transcript or "gene"
for gene-level entries.
abs_log2FC
: absolute value of log2 fold-change.
significance
: "sig"
if significant based on cutoffs, "not_sig"
otherwise.
# Sample gene-level data DEG_tab <- data.frame( gene_id = c("gene1", "gene2"), gene_name = c("GeneA", "GeneB"), log2FC = c(1.5, -2.0), pvalue = c(0.01, 0.04) ) # Sample transcript-level data DET_final_tab <- data.frame( transcript_id = c("tx1", "tx2", "tx3"), transcript_name = c("Transcript1", "Transcript2", "Transcript3"), transcript_type = c("protein_coding", "lncRNA", "processed_transcript"), log2FC = c(1.2, -1.8, 0.5), pvalue = c(0.02, 0.03, 0.2) ) # Merge and annotate differential expression results DEGs_DETs_table <- join_DEG_DET( DEG_tab = DEG_tab, DET_final_tab = DET_final_tab, logfc_cut = 1, pval_cut = 0.05 ) # View the result print(DEGs_DETs_table)
# Sample gene-level data DEG_tab <- data.frame( gene_id = c("gene1", "gene2"), gene_name = c("GeneA", "GeneB"), log2FC = c(1.5, -2.0), pvalue = c(0.01, 0.04) ) # Sample transcript-level data DET_final_tab <- data.frame( transcript_id = c("tx1", "tx2", "tx3"), transcript_name = c("Transcript1", "Transcript2", "Transcript3"), transcript_type = c("protein_coding", "lncRNA", "processed_transcript"), log2FC = c(1.2, -1.8, 0.5), pvalue = c(0.02, 0.03, 0.2) ) # Merge and annotate differential expression results DEGs_DETs_table <- join_DEG_DET( DEG_tab = DEG_tab, DET_final_tab = DET_final_tab, logfc_cut = 1, pval_cut = 0.05 ) # View the result print(DEGs_DETs_table)
Extracts a transcript-to-gene mapping table from GENCODE annotation files, such as the transcriptome FASTA file. Currently, only FASTA files are supported.
make_tx_to_gene(file_path, file_type = c("fasta", "gff", "gtf"))
make_tx_to_gene(file_path, file_type = c("fasta", "gff", "gtf"))
file_path |
A character string specifying the path to the reference file (e.g., GENCODE FASTA file). |
file_type |
A character string specifying the type of the reference file. Currently, only |
The function reads the headers of the FASTA file and extracts relevant information to create a mapping table. For GTF or GFF3 files, support is not yet implemented.
A tibble
containing the transcript-to-gene mapping information, including transcript IDs, gene IDs,
transcript names, gene names, and transcript types.
# Assuming you have downloaded the GENCODE transcriptome FASTA file: fasta_file <- download_reference( version = "43", organism = "human", file_type = "fasta", output_path = "data-raw" ) # Create the transcript-to-gene mapping table tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta") # View the first few rows head(tx_to_gene)
# Assuming you have downloaded the GENCODE transcriptome FASTA file: fasta_file <- download_reference( version = "43", organism = "human", file_type = "fasta", output_path = "data-raw" ) # Create the transcript-to-gene mapping table tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta") # View the first few rows head(tx_to_gene)
Creates a bar plot of log2 fold-change values for transcripts of a selected gene, differentiating transcript types and significance levels.
plot_log2FC(DEG_DET_table, selected_gene, custom_colors = NULL)
plot_log2FC(DEG_DET_table, selected_gene, custom_colors = NULL)
DEG_DET_table |
A |
selected_gene |
A character string specifying the gene name to plot. |
custom_colors |
An optional named vector of colors for different transcript types. |
The function filters the input table for the selected gene and creates a bar plot of log2 fold-change values.
If all transcripts are significant, it plots without adjusting alpha transparency; otherwise, it adjusts alpha
based on significance. The function uses predefined colors for transcript types, which can be overridden
by providing custom_colors
.
A ggplot2
object representing the bar plot.
# Sample data DEGs_DETs_table <- data.frame( name = c("Transcript1", "Transcript2", "GeneA"), log2FC = c(1.5, -2.0, 0.8), transcript_type = c("protein_coding", "lncRNA", "gene"), significance = c("sig", "not_sig", "sig"), gene_name = c("GeneA", "GeneA", "GeneA") ) # Plot log2 fold-change for the selected gene plot_obj <- plot_log2FC( DEG_DET_table = DEGs_DETs_table, selected_gene = "GeneA" ) # Display the plot print(plot_obj)
# Sample data DEGs_DETs_table <- data.frame( name = c("Transcript1", "Transcript2", "GeneA"), log2FC = c(1.5, -2.0, 0.8), transcript_type = c("protein_coding", "lncRNA", "gene"), significance = c("sig", "not_sig", "sig"), gene_name = c("GeneA", "GeneA", "GeneA") ) # Plot log2 fold-change for the selected gene plot_obj <- plot_log2FC( DEG_DET_table = DEGs_DETs_table, selected_gene = "GeneA" ) # Display the plot print(plot_obj)
This function plots the genomic context of all transcripts of given genes.
plot_tx_context(exon_table, custom_colors = NULL)
plot_tx_context(exon_table, custom_colors = NULL)
exon_table |
a tibble with exon information.
Must contain columns |
custom_colors |
a vector of colors to use for each transcript. If not provided, the function will use the default colors. Actually, this argument is *NOT implemented yet. |
Plot Transcript per gene expression
plot_tx_expr(genes_to_plot, profile_data)
plot_tx_expr(genes_to_plot, profile_data)
genes_to_plot |
a character vector with gene names |
profile_data |
tibble output from |
a ggplot
object
Prepare annotation to be imported as rowRanges
and rowData
for both
Genes, Transcripts and Exons based Position Annotation Table.
From a GTF or GFF3 annotation file.
prepare_annotation(file_path, file_type = c("gtf", "gff"))
prepare_annotation(file_path, file_type = c("gtf", "gff"))
file_path |
Path to annotation file. |
file_type |
Character indicating the type of file to download.
One of |
Prepare Exon based Position Annotation Table
prepare_exon_annotation(gene_name, file_path, file_type = c("gff", "gtf"))
prepare_exon_annotation(gene_name, file_path, file_type = c("gff", "gtf"))
file_path |
Path to annotation file. |
file_type |
A character string specifying the type of file to download.
Valid options are |
gene_names |
String or vector of gene names to extract. |
This function processes gene and transcript-level expression data, along with differential expression results, to prepare a tidy data frame suitable for plotting expression profiles across different sample groups.
prepare_profile_data( txi_gene = NULL, txi_transcript, sample_metadata, tx_to_gene, de_result_gene, de_result_transcript, var, var_levels, gene_col = "gene_name", tx_col = "transcript_name", pvalue_cutoff = 0.05, lfc_cutoff = 1, use_fdr = TRUE )
prepare_profile_data( txi_gene = NULL, txi_transcript, sample_metadata, tx_to_gene, de_result_gene, de_result_transcript, var, var_levels, gene_col = "gene_name", tx_col = "transcript_name", pvalue_cutoff = 0.05, lfc_cutoff = 1, use_fdr = TRUE )
txi_gene |
A |
txi_transcript |
A |
sample_metadata |
A |
tx_to_gene |
A |
de_result_gene |
A |
de_result_transcript |
A |
var |
A string specifying the column name in |
var_levels |
A character vector specifying the levels of |
gene_col |
A string specifying the column name in |
tx_col |
A string specifying the column name in |
pvalue_cutoff |
A numeric value specifying the p-value cutoff for determining significant differential expression. Default is |
lfc_cutoff |
A numeric value specifying the log2 fold-change cutoff for determining significant differential expression. Default is |
use_fdr |
A logical value indicating whether to use the false discovery rate ( |
The function combines gene and transcript expression data with differential expression results to generate a tidy data frame. It filters significant genes and transcripts based on specified cutoffs and prepares the data for plotting expression profiles across specified sample groups.
A tibble
containing processed expression data and differential expression flags, ready for plotting.
# Assuming txi_gene, txi_transcript, sample_metadata, tx_to_gene, de_result_gene, # and de_result_transcript are pre-loaded data frames: # Prepare data for plotting expr_df <- prepare_profile_data( txi_gene = txi_gene, txi_transcript = txi_transcript, sample_metadata = sample_metadata, tx_to_gene = tx_to_gene, de_result_gene = de_result_gene, de_result_transcript = de_result_transcript, var = "condition", var_levels = c("control", "treatment"), gene_col = "gene_name", tx_col = "transcript_name", pvalue_cutoff = 0.05, lfc_cutoff = 1, use_fdr = TRUE ) # View the prepared data head(expr_df) # Plotting example (assuming ggplot2 is installed) library(ggplot2) ggplot(expr_df, aes(x = condition, y = mean_TPM, fill = DE)) + geom_bar(stat = "identity", position = position_dodge()) + facet_wrap(~ parent_gene + transcript_type)
# Assuming txi_gene, txi_transcript, sample_metadata, tx_to_gene, de_result_gene, # and de_result_transcript are pre-loaded data frames: # Prepare data for plotting expr_df <- prepare_profile_data( txi_gene = txi_gene, txi_transcript = txi_transcript, sample_metadata = sample_metadata, tx_to_gene = tx_to_gene, de_result_gene = de_result_gene, de_result_transcript = de_result_transcript, var = "condition", var_levels = c("control", "treatment"), gene_col = "gene_name", tx_col = "transcript_name", pvalue_cutoff = 0.05, lfc_cutoff = 1, use_fdr = TRUE ) # View the prepared data head(expr_df) # Plotting example (assuming ggplot2 is installed) library(ggplot2) ggplot(expr_df, aes(x = condition, y = mean_TPM, fill = DE)) + geom_bar(stat = "identity", position = position_dodge()) + facet_wrap(~ parent_gene + transcript_type)
Performs gene set enrichment analysis (GSEA) on differential expression results for various transcript types,
using the fgsea
package. The function iterates over specified transcript types, filters the data accordingly,
and runs GSEA for each type.
run_enrichment(det_df, genesets_list, pval_cutoff = 0.05, lfc_cutoff = 1)
run_enrichment(det_df, genesets_list, pval_cutoff = 0.05, lfc_cutoff = 1)
det_df |
A |
genesets_list |
A list of gene sets to be used in the enrichment analysis. |
pval_cutoff |
A numeric value specifying the p-value cutoff for the enrichment results. Default is |
lfc_cutoff |
A numeric value specifying the log2 fold-change cutoff for filtering transcripts. Default is |
The function defines a list of transcript types and their corresponding labels.
It then filters the input differential expression data for each transcript type, ranks the genes by log2 fold-change,
and performs GSEA using the fgsea
package.
A tibble
containing the enrichment analysis results for each transcript type, including pathway names,
p-values, adjusted p-values, and the transcript type (experiment).
# Sample differential expression data det_df <- data.frame( gene_name = c("GeneA", "GeneB", "GeneC", "GeneD"), transcript_type = c( "protein_coding", "retained_intron", "processed_transcript", "nonsense_mediated_decay" ), log2FC = c(1.5, -2.0, 0.8, -1.2) ) # Sample gene sets genesets_list <- list( Pathway1 = c("GeneA", "GeneC"), Pathway2 = c("GeneB", "GeneD") ) # Run enrichment analysis fgsea_results_df <- run_enrichment( det_df = det_df, genesets_list = genesets_list, pval_cutoff = 0.05, lfc_cutoff = 1 ) # View the results print(fgsea_results_df)
# Sample differential expression data det_df <- data.frame( gene_name = c("GeneA", "GeneB", "GeneC", "GeneD"), transcript_type = c( "protein_coding", "retained_intron", "processed_transcript", "nonsense_mediated_decay" ), log2FC = c(1.5, -2.0, 0.8, -1.2) ) # Sample gene sets genesets_list <- list( Pathway1 = c("GeneA", "GeneC"), Pathway2 = c("GeneB", "GeneD") ) # Run enrichment analysis fgsea_results_df <- run_enrichment( det_df = det_df, genesets_list = genesets_list, pval_cutoff = 0.05, lfc_cutoff = 1 ) # View the results print(fgsea_results_df)