Package 'isoformic' reference manual

Title:	Isoform-Level Biological Interpretation of Transcriptomic Data
Description:	Isoform-level biological interpretation of transcriptomic data.
Authors:	Lucio Rezende Queiroz [aut, cre, cph], Izabela Mamede Conceicao [aut, ctb], Luigi Marchionni [aut, ctb], Gloria Franco [aut, ctb]
Maintainer:	Lucio Rezende Queiroz <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1.9003
Built:	2025-03-29 05:19:12 UTC
Source:	https://github.com/luciorq/isoformic

Download Reference Files from GENCODE

Description

Downloads reference annotation files from the GENCODE database for human or mouse genomes. Supports downloading GTF, GFF, and transcriptome FASTA files. The function handles directory creation and checks for existing files to avoid redundant downloads.

Usage

download_reference(
  version = "46",
  reference = "gencode",
  organism = c("human", "mouse"),
  file_type = c("gtf", "gff", "fasta"),
  output_path = "data-raw",
  timeout_limit = 3600,
  method = "auto"
)

Arguments

`version`	A character string specifying the GENCODE release version. For mouse references, include the letter 'M' in the version string (e.g., `"M32"`). Default is `"46"`.
`reference`	A character string specifying the source of the reference file. Currently, only `"gencode"` is supported. Default is `"gencode"`.
`organism`	A character string specifying the organism. Valid options are `"human"` or `"mouse"`. Defaults to `"human"`.
`file_type`	A character string specifying the type of file to download. Valid options are `"gtf"`, `"gff"`, or `"fasta"`. Defaults to `"gtf"`. Note: `"fasta"` refers to the transcriptome FASTA file.
`output_path`	A character string specifying the directory where the downloaded file will be saved. Defaults to `"data-raw"`.
`timeout_limit`	A numeric value specifying the maximum time in seconds for the download to complete. This argument takes precedence over `options("timeout")`. Defaults to `3600` seconds (1 hour).
`method`	A character string specifying the method used by `utils::download.file()`. Defaults to `"auto"`.

Details

The function constructs the appropriate download URL based on the specified organism, version, and file type, and downloads the file to the specified output path. If the file already exists in the output directory, the function will not download it again and will return the existing file path. The function requires an internet connection and handles timeout settings to prevent download interruptions.
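The URL construction and the existing-file check described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `build_gencode_url()` and `fetch_if_missing()` are hypothetical, and the base URL and file-naming pattern are assumptions based on the public GENCODE FTP layout.

```r
# Hypothetical helper (not part of isoformic): build a GENCODE download URL.
# The base URL and file names are assumptions about the GENCODE FTP layout.
build_gencode_url <- function(version = "46",
                              organism = c("human", "mouse"),
                              file_type = c("gtf", "gff", "fasta")) {
  organism <- match.arg(organism)
  file_type <- match.arg(file_type)
  base_url <- "https://ftp.ebi.ac.uk/pub/databases/gencode"
  file_name <- switch(file_type,
    gtf = paste0("gencode.v", version, ".annotation.gtf.gz"),
    gff = paste0("gencode.v", version, ".annotation.gff3.gz"),
    fasta = paste0("gencode.v", version, ".transcripts.fa.gz")
  )
  paste0(base_url, "/Gencode_", organism, "/release_", version, "/", file_name)
}

# Hypothetical helper: return the existing file path when the file is already
# present; otherwise download it with a temporarily raised timeout.
fetch_if_missing <- function(url, output_path = "data-raw", timeout_limit = 3600) {
  if (!dir.exists(output_path)) {
    dir.create(output_path, recursive = TRUE)
  }
  dest_file <- file.path(output_path, basename(url))
  if (file.exists(dest_file)) {
    return(dest_file)
  }
  old_opts <- options(timeout = timeout_limit)
  on.exit(options(old_opts), add = TRUE)
  utils::download.file(url, destfile = dest_file, method = "auto")
  dest_file
}

# Build URLs only; nothing is downloaded here.
url_human <- build_gencode_url("46", "human", "gtf")
url_mouse <- build_gencode_url("M32", "mouse", "fasta")
```

Restoring the previous `timeout` option with `on.exit()` keeps the longer limit from leaking into the rest of the session.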

Value

A character string with the full path to the downloaded file.

Note

Currently, only "gencode" reference files are supported. The "mane" reference is not implemented yet.

Examples

# Download human GTF file for GENCODE release 43
gtf_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "gtf",
  output_path = "data-raw"
)

# Download mouse GTF file for GENCODE release M32
gtf_file_mouse <- download_reference(
  version = "M32",
  organism = "mouse",
  file_type = "gtf",
  output_path = "data-raw"
)

# Download human transcriptome FASTA file for GENCODE release 43
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)

Annotate Transcripts with Differential Gene Expression Significance

Description

Adds a column to a transcript-level differential expression table indicating whether each transcript originates from a gene that is significantly differentially expressed.

Usage

is_deg_sig(DegsigVector, DET_table)

Arguments

`DegsigVector`	A character vector containing the names of transcripts from significantly differentially expressed genes.
`DET_table`	A `data.frame` or `tibble` containing transcript-level differential expression results, including a `transcript_name` column.

Value

A tibble with an additional column DEG_sig indicating whether the transcript is from a significantly differentially expressed gene ("YES" or "NO").
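As a rough illustration of the annotation described above, the same "YES"/"NO" labelling can be reproduced in base R with a membership test. This is a sketch of equivalent logic on made-up data, not the package's own code, and it returns a plain `data.frame` rather than a tibble.

```r
# Made-up transcript names, for illustration only
significant_transcripts <- c("transcript1", "transcript3")
DET_table <- data.frame(
  transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"),
  log2FC = c(2.5, -1.2, 0.8, -0.5)
)

# Label each transcript by membership in the significant vector
DET_table$DEG_sig <- ifelse(
  DET_table$transcript_name %in% significant_transcripts,
  "YES",
  "NO"
)

DET_table$DEG_sig
# "YES" "NO" "YES" "NO"
```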

Examples

# Sample data
significant_transcripts <- c("transcript1", "transcript3")
DET_table <- data.frame(
  transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"),
  log2FC = c(2.5, -1.2, 0.8, -0.5),
  pvalue = c(0.01, 0.2, 0.03, 0.6)
)

# Annotate transcripts with DEG significance
DET_table_annotated <- is_deg_sig(DegsigVector = significant_transcripts, DET_table = DET_table)

# View the result
print(DET_table_annotated)

Merge Gene and Transcript Level Differential Expression Tables

Description

Combines gene-level and transcript-level differential expression results into a single table, annotates the combined data with significance labels based on specified cutoffs, and filters transcripts based on their types.

Usage

join_DEG_DET(DEG_tab, DET_final_tab, logfc_cut, pval_cut)

Arguments

`DEG_tab`	A `data.frame` or `tibble` containing gene-level differential expression results, including `gene_id`, `gene_name`, `log2FC`, and `pvalue` columns.
`DET_final_tab`	A `data.frame` or `tibble` containing transcript-level differential expression results, including `transcript_id`, `transcript_name`, `transcript_type`, `log2FC`, and `pvalue` columns.
`logfc_cut`	A numeric value specifying the absolute log2 fold-change cutoff for significance.
`pval_cut`	A numeric value specifying the p-value cutoff for significance.

Value

A tibble combining gene and transcript differential expression results, with additional columns:

id: gene or transcript ID.
name: gene or transcript name.
transcript_type: type of transcript or "gene" for gene-level entries.
abs_log2FC: absolute value of log2 fold-change.
significance: "sig" if significant based on cutoffs, "not_sig" otherwise.
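
The `abs_log2FC` and `significance` columns can be illustrated with a short base R sketch on made-up data. Whether the package compares with inclusive (`>=`, `<=`) or strict (`>`, `<`) operators is not stated here, so the inclusive comparison below is an assumption.

```r
# Made-up transcript-level results, for illustration only
DET_final_tab <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  log2FC = c(1.2, -1.8, 0.5),
  pvalue = c(0.02, 0.03, 0.2)
)
logfc_cut <- 1
pval_cut <- 0.05

# Absolute fold-change makes the cutoff symmetric for up- and down-regulation
DET_final_tab$abs_log2FC <- abs(DET_final_tab$log2FC)

# Assumption: inclusive comparisons against both cutoffs
DET_final_tab$significance <- ifelse(
  DET_final_tab$abs_log2FC >= logfc_cut & DET_final_tab$pvalue <= pval_cut,
  "sig",
  "not_sig"
)

DET_final_tab$significance
# "sig" "sig" "not_sig"
```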

Examples

# Sample gene-level data
DEG_tab <- data.frame(
  gene_id = c("gene1", "gene2"),
  gene_name = c("GeneA", "GeneB"),
  log2FC = c(1.5, -2.0),
  pvalue = c(0.01, 0.04)
)

# Sample transcript-level data
DET_final_tab <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  transcript_name = c("Transcript1", "Transcript2", "Transcript3"),
  transcript_type = c("protein_coding", "lncRNA", "processed_transcript"),
  log2FC = c(1.2, -1.8, 0.5),
  pvalue = c(0.02, 0.03, 0.2)
)

# Merge and annotate differential expression results
DEGs_DETs_table <- join_DEG_DET(
  DEG_tab = DEG_tab,
  DET_final_tab = DET_final_tab,
  logfc_cut = 1,
  pval_cut = 0.05
)

# View the result
print(DEGs_DETs_table)

Create Transcript-to-Gene Relationship Table

Description

Extracts a transcript-to-gene mapping table from GENCODE annotation files, such as the transcriptome FASTA file. Currently, only FASTA files are supported.

Usage

make_tx_to_gene(file_path, file_type = c("fasta", "gff", "gtf"))

Arguments

`file_path`	A character string specifying the path to the reference file (e.g., GENCODE FASTA file).
`file_type`	A character string specifying the type of the reference file. Currently, only `"fasta"` is supported. Default is `"fasta"`.

Details

The function reads the headers of the FASTA file and extracts relevant information to create a mapping table. For GTF or GFF3 files, support is not yet implemented.
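
To make the header parsing concrete, here is a minimal base R sketch. It assumes the pipe-delimited GENCODE transcript FASTA header layout (transcript ID, gene ID, two HAVANA IDs, transcript name, gene name, length, transcript type); the two headers are made-up placeholders, and the code is not the package's implementation.

```r
# Made-up headers following the assumed GENCODE field order
headers <- c(
  ">ENST0000000001.1|ENSG0000000001.1|-|-|GENEA-201|GENEA|1500|protein_coding|",
  ">ENST0000000002.1|ENSG0000000002.1|-|-|GENEB-201|GENEB|900|lncRNA|"
)

# Drop the leading ">" and split on the literal pipe character
fields <- strsplit(sub("^>", "", headers), "|", fixed = TRUE)

# Keep the first eight fields of every header as one matrix row
field_matrix <- do.call(rbind, lapply(fields, `[`, 1:8))

tx_to_gene <- data.frame(
  transcript_id = field_matrix[, 1],
  gene_id = field_matrix[, 2],
  transcript_name = field_matrix[, 5],
  gene_name = field_matrix[, 6],
  transcript_type = field_matrix[, 8],
  stringsAsFactors = FALSE
)

tx_to_gene
```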

Value

A tibble containing the transcript-to-gene mapping information, including transcript IDs, gene IDs, transcript names, gene names, and transcript types.

Examples

# Assuming you have downloaded the GENCODE transcriptome FASTA file:
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)

# Create the transcript-to-gene mapping table
tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta")

# View the first few rows
head(tx_to_gene)

Plot Log2 Fold-Change Results for Selected Genes

Description

Creates a bar plot of log2 fold-change values for transcripts of a selected gene, differentiating transcript types and significance levels.

Usage

plot_log2FC(DEG_DET_table, selected_gene, custom_colors = NULL)

Arguments

`DEG_DET_table`	A `data.frame` or `tibble` containing combined gene and transcript differential expression results, including `name`, `log2FC`, `transcript_type`, `significance`, and `gene_name` columns.
`selected_gene`	A character string specifying the gene name to plot.
`custom_colors`	An optional named vector of colors for different transcript types.

Details

The function filters the input table for the selected gene and creates a bar plot of log2 fold-change values. If all transcripts are significant, it plots without adjusting alpha transparency; otherwise, it adjusts alpha based on significance. The function uses predefined colors for transcript types, which can be overridden by providing custom_colors.

Value

A ggplot2 object representing the bar plot.

Examples

# Sample data
DEGs_DETs_table <- data.frame(
  name = c("Transcript1", "Transcript2", "GeneA"),
  log2FC = c(1.5, -2.0, 0.8),
  transcript_type = c("protein_coding", "lncRNA", "gene"),
  significance = c("sig", "not_sig", "sig"),
  gene_name = c("GeneA", "GeneA", "GeneA")
)

# Plot log2 fold-change for the selected gene
plot_obj <- plot_log2FC(
  DEG_DET_table = DEGs_DETs_table,
  selected_gene = "GeneA"
)

# Display the plot
print(plot_obj)

Plot Transcript Genomic Context

Description

This function plots the genomic context of all transcripts of given genes.

Usage

plot_tx_context(exon_table, custom_colors = NULL)

Arguments

`exon_table`	A tibble with exon information. Must contain columns `tx_id`, `exon_left`, and `exon_right`.
`custom_colors`	A vector of colors to use for each transcript. If not provided, the function uses the default colors. Note: this argument is not implemented yet.

Plot Transcript per gene expression

Description

Plots transcript-level expression for the selected genes, using the output of `prepare_profile_data`.

Usage

plot_tx_expr(genes_to_plot, profile_data)

Arguments

`genes_to_plot`	A character vector with gene names.
`profile_data`	A tibble returned by `prepare_profile_data`.

Value

A ggplot2 object.

Prepare Annotation

Description

Prepares annotation from a GTF or GFF3 file so that it can be imported as `rowRanges` and `rowData` for gene-, transcript-, and exon-based position annotation tables.

Usage

prepare_annotation(file_path, file_type = c("gtf", "gff"))

Arguments

`file_path`	Path to annotation file.
`file_type`	A character string indicating the type of the annotation file. One of `"gtf"` or `"gff"`. Defaults to `"gtf"`.

Prepare Exon based Position Annotation Table

Description

Prepares an exon-based position annotation table for the selected genes from a GTF or GFF3 annotation file.

Usage

prepare_exon_annotation(gene_name, file_path, file_type = c("gff", "gtf"))

Arguments

`gene_name`	String or vector of gene names to extract.
`file_path`	Path to annotation file.
`file_type`	A character string specifying the type of the annotation file. Valid options are `"gff"` or `"gtf"`. Defaults to `"gff"`.

Prepare Data for Gene and Transcript Expression Profile Plot

Description

This function processes gene and transcript-level expression data, along with differential expression results, to prepare a tidy data frame suitable for plotting expression profiles across different sample groups.

Usage

prepare_profile_data(
  txi_gene = NULL,
  txi_transcript,
  sample_metadata,
  tx_to_gene,
  de_result_gene,
  de_result_transcript,
  var,
  var_levels,
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)

Arguments

`txi_gene`	A `tibble` or `tximport` output containing gene-level expression abundances. If `NULL`, gene-level abundances will be summarized from `txi_transcript`. Default is `NULL`.
`txi_transcript`	A `tibble` or `tximport` output containing transcript-level expression abundances.
`sample_metadata`	A `data.frame` or `tibble` containing sample metadata. The first column should contain sample names matching the column names in `txi_gene` and `txi_transcript`.
`tx_to_gene`	A `data.frame` or `tibble` containing transcript-to-gene mapping information. Must include columns specified by `gene_col` and `tx_col`.
`de_result_gene`	A `data.frame` or `tibble` containing differential expression results at the gene level. Must include `gene_name`, `log2FC`, and `qvalue` columns.
`de_result_transcript`	A `data.frame` or `tibble` containing differential expression results at the transcript level. Must include `transcript_name`, `log2FC`, and `qvalue` columns.
`var`	A string specifying the column name in `sample_metadata` that indicates the grouping variable (e.g., treatment, condition).
`var_levels`	A character vector specifying the levels of `var` to include in the contrasts.
`gene_col`	A string specifying the column name in `tx_to_gene` that contains gene names. Default is `"gene_name"`.
`tx_col`	A string specifying the column name in `tx_to_gene` that contains transcript names. Default is `"transcript_name"`.
`pvalue_cutoff`	A numeric value specifying the p-value cutoff for determining significant differential expression. Default is `0.05`.
`lfc_cutoff`	A numeric value specifying the log2 fold-change cutoff for determining significant differential expression. Default is `1`.
`use_fdr`	A logical value indicating whether to use the false discovery rate (`qvalue`) instead of p-value for significance cutoff. Default is `TRUE`.

Details

The function combines gene and transcript expression data with differential expression results to generate a tidy data frame. It filters significant genes and transcripts based on specified cutoffs and prepares the data for plotting expression profiles across specified sample groups.
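
The interaction between `use_fdr`, `pvalue_cutoff`, and `lfc_cutoff` can be illustrated with a small base R sketch. The helper `flag_de()` is hypothetical and the data are made up; the inclusive comparisons are an assumption, and the package's internal filtering may differ in detail.

```r
# Made-up transcript-level results, for illustration only
de_result_transcript <- data.frame(
  transcript_name = c("GeneA-201", "GeneA-202", "GeneB-201"),
  log2FC = c(1.8, -1.4, 2.1),
  pvalue = c(0.001, 0.01, 0.04),
  qvalue = c(0.01, 0.08, 0.20)
)

# Hypothetical helper: flag rows that pass both cutoffs.
# use_fdr = TRUE tests the cutoff against `qvalue`, otherwise against `pvalue`.
flag_de <- function(de_table, pvalue_cutoff = 0.05, lfc_cutoff = 1, use_fdr = TRUE) {
  stat_col <- if (use_fdr) "qvalue" else "pvalue"
  abs(de_table$log2FC) >= lfc_cutoff & de_table[[stat_col]] <= pvalue_cutoff
}

flag_fdr <- flag_de(de_result_transcript, use_fdr = TRUE)
flag_p <- flag_de(de_result_transcript, use_fdr = FALSE)

flag_fdr
# TRUE FALSE FALSE
flag_p
# TRUE TRUE TRUE
```

With these made-up values, switching to the FDR-adjusted column keeps only one of the three transcripts, which shows why the choice of `use_fdr` can change which profiles are marked as differentially expressed.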

Value

A tibble containing processed expression data and differential expression flags, ready for plotting.

Examples

# Assuming txi_gene, txi_transcript, sample_metadata, tx_to_gene, de_result_gene,
# and de_result_transcript are pre-loaded data frames:

# Prepare data for plotting
expr_df <- prepare_profile_data(
  txi_gene = txi_gene,
  txi_transcript = txi_transcript,
  sample_metadata = sample_metadata,
  tx_to_gene = tx_to_gene,
  de_result_gene = de_result_gene,
  de_result_transcript = de_result_transcript,
  var = "condition",
  var_levels = c("control", "treatment"),
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)

# View the prepared data
head(expr_df)

# Plotting example (assuming ggplot2 is installed)
library(ggplot2)
ggplot(expr_df, aes(x = condition, y = mean_TPM, fill = DE)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(~ parent_gene + transcript_type)

Run Gene Set Enrichment Analysis for Different Transcript Types

Description

Performs gene set enrichment analysis (GSEA) on differential expression results for various transcript types, using the fgsea package. The function iterates over specified transcript types, filters the data accordingly, and runs GSEA for each type.

Usage

run_enrichment(det_df, genesets_list, pval_cutoff = 0.05, lfc_cutoff = 1)

Arguments

`det_df`	A `data.frame` or `tibble` containing transcript-level differential expression results, including `transcript_type`, `log2FC`, and `gene_name` columns.
`genesets_list`	A list of gene sets to be used in the enrichment analysis.
`pval_cutoff`	A numeric value specifying the p-value cutoff for the enrichment results. Default is `0.05`.
`lfc_cutoff`	A numeric value specifying the log2 fold-change cutoff for filtering transcripts. Default is `1`.

Details

The function defines a list of transcript types and their corresponding labels. It then filters the input differential expression data for each transcript type, ranks the genes by log2 fold-change, and performs GSEA using the fgsea package.
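
The per-type filtering and ranking step can be sketched in base R as follows. The helper `rank_by_type()` is hypothetical and the data are made up. The sketch stops at the ranked, named vector, which is the kind of input that `fgsea::fgsea()` accepts as its `stats` argument, and does not handle genes with several transcripts of the same type.

```r
# Made-up transcript-level results, for illustration only
det_df <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  transcript_type = c(
    "protein_coding", "protein_coding", "retained_intron",
    "protein_coding", "retained_intron"
  ),
  log2FC = c(1.5, -2.0, 0.8, 0.3, -1.2)
)

# Hypothetical helper: keep one transcript type and rank genes by log2FC
rank_by_type <- function(det_df, tx_type) {
  subset_df <- det_df[det_df$transcript_type == tx_type, ]
  ranks <- stats::setNames(subset_df$log2FC, subset_df$gene_name)
  sort(ranks, decreasing = TRUE)
}

ranks_pc <- rank_by_type(det_df, "protein_coding")
ranks_ri <- rank_by_type(det_df, "retained_intron")

names(ranks_pc)
# "GeneA" "GeneD" "GeneB"
```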

Value

A tibble containing the enrichment analysis results for each transcript type, including pathway names, p-values, adjusted p-values, and the transcript type (experiment).

Examples

# Sample differential expression data
det_df <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC", "GeneD"),
  transcript_type = c(
    "protein_coding", "retained_intron",
    "processed_transcript", "nonsense_mediated_decay"
  ),
  log2FC = c(1.5, -2.0, 0.8, -1.2)
)

# Sample gene sets
genesets_list <- list(
  Pathway1 = c("GeneA", "GeneC"),
  Pathway2 = c("GeneB", "GeneD")
)

# Run enrichment analysis
fgsea_results_df <- run_enrichment(
  det_df = det_df,
  genesets_list = genesets_list,
  pval_cutoff = 0.05,
  lfc_cutoff = 1
)

# View the results
print(fgsea_results_df)
