Package 'isoformic'

Title: Isoform-Level Biological Interpretation of Transcriptomic Data
Description: Isoform-level biological interpretation of transcriptomic data.
Authors: Lucio Rezende Queiroz [aut, cre, cph] , Izabela Mamede Conceicao [aut, ctb] , Luigi Marchionni [aut, ctb] , Gloria Franco [aut, ctb]
Maintainer: Lucio Rezende Queiroz <[email protected]>
License: MIT + file LICENSE
Version: 0.1.1.9003
Built: 2024-12-29 08:00:48 UTC
Source: https://github.com/luciorq/isoformic

Help Index


Download Reference Files from GENCODE

Description

Downloads reference annotation files from the GENCODE database for human or mouse genomes. Supports downloading GTF, GFF, and transcriptome FASTA files. The function handles directory creation and checks for existing files to avoid redundant downloads.

Usage

download_reference(
  version = "46",
  reference = "gencode",
  organism = c("human", "mouse"),
  file_type = c("gtf", "gff", "fasta"),
  output_path = "data-raw",
  timeout_limit = 3600,
  method = "auto"
)

Arguments

version

A character string specifying the GENCODE release version. For mouse references, include the letter 'M' in the version string (e.g., "M32"). Default is "46".

reference

A character string specifying the source of the reference file. Currently, only "gencode" is supported. Default is "gencode".

organism

A character string specifying the organism. Valid options are "human" or "mouse".

file_type

A character string specifying the type of file to download. Valid options are "gtf", "gff", or "fasta". Defaults to "gtf". Note: "fasta" refers to the transcriptome FASTA file.

output_path

A character string specifying the directory where the downloaded file will be saved. Defaults to "data-raw".

timeout_limit

A numeric value specifying the maximum time in seconds for the download to complete. This argument takes precedence over options("timeout"). Defaults to 3600 seconds (1 hour).

method

A character string specifying the method used by utils::download.file(). Defaults to "auto".

Details

The function constructs the appropriate download URL based on the specified organism, version, and file type, and downloads the file to the specified output path. If the file already exists in the output directory, the function will not download it again and will return the existing file path. The function requires an internet connection and handles timeout settings to prevent download interruptions.

Value

A character string with the full path to the downloaded file.

Note

Currently, only "gencode" reference files are supported. The "mane" reference is not implemented yet.

Examples

# Download human GTF file for GENCODE release 43
gtf_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "gtf",
  output_path = "data-raw"
)

# Download mouse GTF file for GENCODE release M32
gtf_file_mouse <- download_reference(
  version = "M32",
  organism = "mouse",
  file_type = "gtf",
  output_path = "data-raw"
)

# Download human transcriptome FASTA file for GENCODE release 43
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)

Annotate Transcripts with Differential Gene Expression Significance

Description

Adds a column to a transcript-level differential expression table indicating whether each transcript originates from a gene that is significantly differentially expressed.

Usage

is_deg_sig(DegsigVector, DET_table)

Arguments

DegsigVector

A character vector containing the names of transcripts from significantly differentially expressed genes.

DET_table

A data.frame or tibble containing transcript-level differential expression results, including a transcript_name column.

Value

A tibble with an additional column DEG_sig indicating whether the transcript is from a significantly differentially expressed gene ("YES" or "NO").

Examples

# Sample data
significant_transcripts <- c("transcript1", "transcript3")
DET_table <- data.frame(
  transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"),
  log2FC = c(2.5, -1.2, 0.8, -0.5),
  pvalue = c(0.01, 0.2, 0.03, 0.6)
)

# Annotate transcripts with DEG significance
DET_table_annotated <- is_deg_sig(DegsigVector = significant_transcripts, DET_table = DET_table)

# View the result
print(DET_table_annotated)

Merge Gene and Transcript Level Differential Expression Tables

Description

Combines gene-level and transcript-level differential expression results into a single table, annotates the combined data with significance labels based on specified cutoffs, and filters transcripts based on their types.

Usage

join_DEG_DET(DEG_tab, DET_final_tab, logfc_cut, pval_cut)

Arguments

DEG_tab

A data.frame or tibble containing gene-level differential expression results, including gene_id, gene_name, log2FC, and pvalue columns.

DET_final_tab

A data.frame or tibble containing transcript-level differential expression results, including transcript_id, transcript_name, transcript_type, log2FC, and pvalue columns.

logfc_cut

A numeric value specifying the absolute log2 fold-change cutoff for significance.

pval_cut

A numeric value specifying the p-value cutoff for significance.

Value

A tibble combining gene and transcript differential expression results, with additional columns:

  • id: gene or transcript ID.

  • name: gene or transcript name.

  • transcript_type: type of transcript or "gene" for gene-level entries.

  • abs_log2FC: absolute value of log2 fold-change.

  • significance: "sig" if significant based on cutoffs, "not_sig" otherwise.

Examples

# Sample gene-level data
DEG_tab <- data.frame(
  gene_id = c("gene1", "gene2"),
  gene_name = c("GeneA", "GeneB"),
  log2FC = c(1.5, -2.0),
  pvalue = c(0.01, 0.04)
)

# Sample transcript-level data
DET_final_tab <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  transcript_name = c("Transcript1", "Transcript2", "Transcript3"),
  transcript_type = c("protein_coding", "lncRNA", "processed_transcript"),
  log2FC = c(1.2, -1.8, 0.5),
  pvalue = c(0.02, 0.03, 0.2)
)

# Merge and annotate differential expression results
DEGs_DETs_table <- join_DEG_DET(
  DEG_tab = DEG_tab,
  DET_final_tab = DET_final_tab,
  logfc_cut = 1,
  pval_cut = 0.05
)

# View the result
print(DEGs_DETs_table)

Create Transcript-to-Gene Relationship Table

Description

Extracts a transcript-to-gene mapping table from GENCODE annotation files, such as the transcriptome FASTA file. Currently, only FASTA files are supported.

Usage

make_tx_to_gene(file_path, file_type = c("fasta", "gff", "gtf"))

Arguments

file_path

A character string specifying the path to the reference file (e.g., GENCODE FASTA file).

file_type

A character string specifying the type of the reference file. Currently, only "fasta" is supported. Default is "fasta".

Details

The function reads the headers of the FASTA file and extracts relevant information to create a mapping table. For GTF or GFF3 files, support is not yet implemented.

Value

A tibble containing the transcript-to-gene mapping information, including transcript IDs, gene IDs, transcript names, gene names, and transcript types.

Examples

# Assuming you have downloaded the GENCODE transcriptome FASTA file:
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)

# Create the transcript-to-gene mapping table
tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta")

# View the first few rows
head(tx_to_gene)

Plot Log2 Fold-Change Results for Selected Genes

Description

Creates a bar plot of log2 fold-change values for transcripts of a selected gene, differentiating transcript types and significance levels.

Usage

plot_log2FC(DEG_DET_table, selected_gene, custom_colors = NULL)

Arguments

DEG_DET_table

A data.frame or tibble containing combined gene and transcript differential expression results, including name, log2FC, transcript_type, significance, and gene_name columns.

selected_gene

A character string specifying the gene name to plot.

custom_colors

An optional named vector of colors for different transcript types.

Details

The function filters the input table for the selected gene and creates a bar plot of log2 fold-change values. If all transcripts are significant, it plots without adjusting alpha transparency; otherwise, it adjusts alpha based on significance. The function uses predefined colors for transcript types, which can be overridden by providing custom_colors.

Value

A ggplot2 object representing the bar plot.

Examples

# Sample data
DEGs_DETs_table <- data.frame(
  name = c("Transcript1", "Transcript2", "GeneA"),
  log2FC = c(1.5, -2.0, 0.8),
  transcript_type = c("protein_coding", "lncRNA", "gene"),
  significance = c("sig", "not_sig", "sig"),
  gene_name = c("GeneA", "GeneA", "GeneA")
)

# Plot log2 fold-change for the selected gene
plot_obj <- plot_log2FC(
  DEG_DET_table = DEGs_DETs_table,
  selected_gene = "GeneA"
)

# Display the plot
print(plot_obj)

Plot Transcript Genomic Context

Description

This function plots the genomic context of all transcripts of given genes.

Usage

plot_tx_context(exon_table, custom_colors = NULL)

Arguments

exon_table

a tibble with exon information. Must contain columns tx_id, exon_left, and exon_right.

custom_colors

a vector of colors to use for each transcript. If not provided, the function will use the default colors. Actually, this argument is *NOT implemented yet.


Plot Transcript per gene expression

Description

Plot Transcript per gene expression

Usage

plot_tx_expr(genes_to_plot, profile_data)

Arguments

genes_to_plot

a character vector with gene names

profile_data

tibble output from prepare_profile_data

Value

a ggplot object


Prepare Annotation

Description

Prepare annotation to be imported as rowRanges and rowData for both Genes, Transcripts and Exons based Position Annotation Table. From a GTF or GFF3 annotation file.

Usage

prepare_annotation(file_path, file_type = c("gtf", "gff"))

Arguments

file_path

Path to annotation file.

file_type

Character indicating the type of file to download. One of "gtf" or "gff". Defaults to "gtf".


Prepare Exon based Position Annotation Table

Description

Prepare Exon based Position Annotation Table

Usage

prepare_exon_annotation(gene_name, file_path, file_type = c("gff", "gtf"))

Arguments

file_path

Path to annotation file.

file_type

A character string specifying the type of file to download. Valid options are "gtf", "gff", or "fasta". Defaults to "gtf". Note: "fasta" refers to the transcriptome FASTA file.

gene_names

String or vector of gene names to extract.


Prepare Data for Gene and Transcript Expression Profile Plot

Description

This function processes gene and transcript-level expression data, along with differential expression results, to prepare a tidy data frame suitable for plotting expression profiles across different sample groups.

Usage

prepare_profile_data(
  txi_gene = NULL,
  txi_transcript,
  sample_metadata,
  tx_to_gene,
  de_result_gene,
  de_result_transcript,
  var,
  var_levels,
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)

Arguments

txi_gene

A tibble or tximport output containing gene-level expression abundances. If NULL, gene-level abundances will be summarized from txi_transcript. Default is NULL.

txi_transcript

A tibble or tximport output containing transcript-level expression abundances.

sample_metadata

A data.frame or tibble containing sample metadata. The first column should contain sample names matching the column names in txi_gene and txi_transcript.

tx_to_gene

A data.frame or tibble containing transcript-to-gene mapping information. Must include columns specified by gene_col and tx_col.

de_result_gene

A data.frame or tibble containing differential expression results at the gene level. Must include gene_name, log2FC, and qvalue columns.

de_result_transcript

A data.frame or tibble containing differential expression results at the transcript level. Must include transcript_name, log2FC, and qvalue columns.

var

A string specifying the column name in sample_metadata that indicates the grouping variable (e.g., treatment, condition).

var_levels

A character vector specifying the levels of var to include in the contrasts.

gene_col

A string specifying the column name in tx_to_gene that contains gene names. Default is "gene_name".

tx_col

A string specifying the column name in tx_to_gene that contains transcript names. Default is "transcript_name".

pvalue_cutoff

A numeric value specifying the p-value cutoff for determining significant differential expression. Default is 0.05.

lfc_cutoff

A numeric value specifying the log2 fold-change cutoff for determining significant differential expression. Default is 1.

use_fdr

A logical value indicating whether to use the false discovery rate (qvalue) instead of p-value for significance cutoff. Default is TRUE.

Details

The function combines gene and transcript expression data with differential expression results to generate a tidy data frame. It filters significant genes and transcripts based on specified cutoffs and prepares the data for plotting expression profiles across specified sample groups.

Value

A tibble containing processed expression data and differential expression flags, ready for plotting.

Examples

# Assuming txi_gene, txi_transcript, sample_metadata, tx_to_gene, de_result_gene,
# and de_result_transcript are pre-loaded data frames:

# Prepare data for plotting
expr_df <- prepare_profile_data(
  txi_gene = txi_gene,
  txi_transcript = txi_transcript,
  sample_metadata = sample_metadata,
  tx_to_gene = tx_to_gene,
  de_result_gene = de_result_gene,
  de_result_transcript = de_result_transcript,
  var = "condition",
  var_levels = c("control", "treatment"),
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)

# View the prepared data
head(expr_df)

# Plotting example (assuming ggplot2 is installed)
library(ggplot2)
ggplot(expr_df, aes(x = condition, y = mean_TPM, fill = DE)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(~ parent_gene + transcript_type)

Run Gene Set Enrichment Analysis for Different Transcript Types

Description

Performs gene set enrichment analysis (GSEA) on differential expression results for various transcript types, using the fgsea package. The function iterates over specified transcript types, filters the data accordingly, and runs GSEA for each type.

Usage

run_enrichment(det_df, genesets_list, pval_cutoff = 0.05, lfc_cutoff = 1)

Arguments

det_df

A data.frame or tibble containing transcript-level differential expression results, including transcript_type, log2FC, and gene_name columns.

genesets_list

A list of gene sets to be used in the enrichment analysis.

pval_cutoff

A numeric value specifying the p-value cutoff for the enrichment results. Default is 0.05.

lfc_cutoff

A numeric value specifying the log2 fold-change cutoff for filtering transcripts. Default is 1.

Details

The function defines a list of transcript types and their corresponding labels. It then filters the input differential expression data for each transcript type, ranks the genes by log2 fold-change, and performs GSEA using the fgsea package.

Value

A tibble containing the enrichment analysis results for each transcript type, including pathway names, p-values, adjusted p-values, and the transcript type (experiment).

Examples

# Sample differential expression data
det_df <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC", "GeneD"),
  transcript_type = c(
    "protein_coding", "retained_intron",
    "processed_transcript", "nonsense_mediated_decay"
  ),
  log2FC = c(1.5, -2.0, 0.8, -1.2)
)

# Sample gene sets
genesets_list <- list(
  Pathway1 = c("GeneA", "GeneC"),
  Pathway2 = c("GeneB", "GeneD")
)

# Run enrichment analysis
fgsea_results_df <- run_enrichment(
  det_df = det_df,
  genesets_list = genesets_list,
  pval_cutoff = 0.05,
  lfc_cutoff = 1
)

# View the results
print(fgsea_results_df)