| Title: | Isoform-Level Biological Interpretation of Transcriptomic Data |
|---|---|
| Description: | Isoform-level biological interpretation of transcriptomic data. |
| Authors: | Lucio Rezende Queiroz [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-6090-1834>), Lucio Rezende Queiroz [aut, ctb], Izabela Mamede Conceicao [aut, ctb] (ORCID: <https://orcid.org/0000-0002-0707-5588>), Luigi Marchionni [aut, ctb] (ORCID: <https://orcid.org/0000-0002-7336-8071>), Gloria Franco [aut, ctb] (ORCID: <https://orcid.org/0000-0001-5245-2365>) |
| Maintainer: | Lucio Rezende Queiroz <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2.9017 |
| Built: | 2026-05-11 08:50:01 UTC |
| Source: | https://github.com/luciorq/isoformic |
This function converts a SummarizedExperiment object to an IsoformicExperiment object.
It extracts the assays, row data, column data, and metadata from the input object
and uses them to create a new IsoformicExperiment object.
as_isoformic(se, annot_path, annot_type = c("gff", "annot_db"))
se |
A SummarizedExperiment object. |
annot_path |
Path to the annotation file. This can be a GFF file or the
path to a pre-built annotation database created with |
annot_type |
Type of the annotation file provided. Options are "gff" for GFF files and "annot_db" for pre-built annotation databases. |
This function instantiates a ContextData object containing information
about the genomic context of transcripts.
It requires a GFF file for gene annotation and constructs a TxDb
object from it. The function also prepares an annotation table and
updates the transcript names in the TxDb object.
The ContextData object can then be used in conjunction with
IsoformicExperiment for transcriptomic analyses.
create_context_data(
  gff_file,
  ...,
  organism,
  orgdb_package,
  bsgenome_package,
  tx_type_palette = NULL
)
gff_file |
Character string specifying the path to a GFF file containing the gene annotation. |
... |
These dots are for future extensions and must be empty. |
organism |
Character string specifying the organism name (e.g., "Homo sapiens"). |
orgdb_package |
Character string specifying the name of the organism database package (e.g., "org.Hs.eg.db"). |
bsgenome_package |
Character string specifying the name of the BSgenome package (e.g., "BSgenome.Hsapiens.UCSC.hg38"). |
tx_type_palette |
Named character vector specifying the color palette for transcript types. |
Downloads reference annotation files from the GENCODE database for human or mouse genomes. Supports downloading GFF, GTF, transcriptome FASTA, and genome FASTA files. The function handles directory creation and checks for existing files to avoid redundant downloads.
download_reference(
  version = "49",
  reference = "gencode",
  organism = c("human", "mouse"),
  file_type = c("gff", "gtf", "fasta", "genome_fasta"),
  output_path = ":cache:",
  timeout_limit = 3600,
  method = "auto"
)
version |
A character string specifying the GENCODE release version.
For mouse references, include the letter 'M' in the version string
(e.g., |
reference |
A character string specifying the source of the reference
file.
Currently, only "gencode" is supported. |
organism |
A character string specifying the organism.
Valid options are "human" and "mouse". |
file_type |
A character string specifying the type of file to download.
Valid options are "gff", "gtf", "fasta", and "genome_fasta". |
output_path |
A character string specifying the directory where the
downloaded file will be saved. Defaults to ":cache:". |
timeout_limit |
A numeric value specifying the maximum time in seconds
for the download to complete. This argument takes precedence over
|
method |
A character string specifying the method used by
|
The function constructs the download URL from the specified organism, version, and file type, and downloads the file to the specified output path, which defaults to a user cache directory. If the file already exists in the output directory, the function does not download it again and returns the existing file path. The function requires an internet connection and handles timeout settings to prevent download interruptions.
A character string with the full path to the downloaded file.
Currently, only "gencode" reference files are supported.
The "mane" reference is not implemented yet.
## Not run:
# Download human GFF file for GENCODE release 49
gff_file <- download_reference(
  version = "49",
  organism = "human",
  file_type = "gff",
  output_path = ":cache:"
)

# Download mouse GFF file for GENCODE release M38
gff_file_mouse <- download_reference(
  version = "M38",
  organism = "mouse",
  file_type = "gff",
  output_path = ":cache:"
)

# Download human transcriptome FASTA file for GENCODE release 49
fasta_file <- download_reference(
  version = "49",
  organism = "human",
  file_type = "fasta",
  output_path = ":cache:"
)
## End(Not run)
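The URL construction and the skip-if-present check described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper names `build_gencode_url()` and `fetch_if_missing()` are hypothetical, the URL layout is an assumption based on the public GENCODE FTP mirror, and only three of the four file types are covered.

```r
# Hypothetical helper (not part of isoformic): build a GENCODE download URL.
# The URL layout is an assumption based on the public GENCODE FTP mirror.
build_gencode_url <- function(version = "49",
                              organism = c("human", "mouse"),
                              file_type = c("gff", "gtf", "fasta")) {
  organism <- match.arg(organism)
  file_type <- match.arg(file_type)
  # Mouse releases carry an "M" prefix (e.g., "M38").
  if (organism == "mouse" && !startsWith(version, "M")) {
    stop("Mouse versions must include the letter 'M', e.g. \"M38\".")
  }
  file_name <- switch(file_type,
    gff = paste0("gencode.v", version, ".annotation.gff3.gz"),
    gtf = paste0("gencode.v", version, ".annotation.gtf.gz"),
    fasta = paste0("gencode.v", version, ".transcripts.fa.gz")
  )
  paste0(
    "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_", organism,
    "/release_", version, "/", file_name
  )
}

# Hypothetical helper: return the existing file if present, else download it.
fetch_if_missing <- function(url, output_dir, timeout_limit = 3600) {
  dest <- file.path(output_dir, basename(url))
  if (file.exists(dest)) {
    return(dest)
  }
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  # Raise the session timeout for the download and restore it afterwards.
  old_options <- options(timeout = timeout_limit)
  on.exit(options(old_options), add = TRUE)
  utils::download.file(url, destfile = dest, mode = "wb")
  dest
}

gff_url <- build_gencode_url(version = "49", organism = "human", file_type = "gff")
print(gff_url)
```

Calling `fetch_if_missing(gff_url, output_dir)` a second time with the same directory returns the stored path without contacting the server, which is the behavior the description attributes to download_reference().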
Determines the appropriate user cache directory for the isoformic package
based on the operating system.
On macOS, it avoids using paths with spaces and follows the XDG base
directory specification.
get_isoformic_cache(..., ext = NULL)
... |
Additional path components to append to the cache directory. |
ext |
An optional file extension (e.g., "rds", "csv") to append to the final path. |
This function uses tools::R_user_dir() to determine the
user cache directory.
A character string with the path to the user cache
directory for the isoformic package.
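A minimal sketch of how such a path could be assembled with tools::R_user_dir() follows. The helper name `cache_path_sketch()` is hypothetical, and the macOS-specific handling mentioned above is not reproduced here.

```r
# Hypothetical helper mirroring the documented arguments; not the package's code.
cache_path_sketch <- function(..., ext = NULL) {
  # tools::R_user_dir() is available in R >= 4.0.0.
  base_dir <- tools::R_user_dir("isoformic", which = "cache")
  # Append any additional path components supplied through the dots.
  path <- file.path(base_dir, ...)
  # Append the optional file extension to the final component.
  if (!is.null(ext)) {
    path <- paste0(path, ".", ext)
  }
  path
}

print(cache_path_sketch("annotation", "gencode_v49", ext = "rds"))
```

The function only builds the path string; it does not create the directory.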
Adds a column to a transcript-level differential expression table indicating whether each transcript originates from a gene that is significantly differentially expressed.
is_deg_sig(DegsigVector, DET_table)
DegsigVector |
A character vector containing the names of transcripts from significantly differentially expressed genes. |
DET_table |
A |
A tibble with an additional column DEG_sig indicating whether the transcript is from a significantly
differentially expressed gene ("YES" or "NO").
# Sample data
significant_transcripts <- c("transcript1", "transcript3")
DET_table <- data.frame(
  transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"),
  log2FC = c(2.5, -1.2, 0.8, -0.5),
  pvalue = c(0.01, 0.2, 0.03, 0.6)
)

# Annotate transcripts with DEG significance
DET_table_annotated <- is_deg_sig(DegsigVector = significant_transcripts, DET_table = DET_table)

# View the result
print(DET_table_annotated)
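For readers who want to see the annotation step in isolation, the following base R sketch reproduces the documented outcome (a DEG_sig column holding "YES" or "NO"). It assumes that matching is done on the transcript_name column, as in the example data; the package itself may match differently and returns a tibble rather than a data frame.

```r
# Same toy data as in the example above.
significant_transcripts <- c("transcript1", "transcript3")
DET_table <- data.frame(
  transcript_name = c("transcript1", "transcript2", "transcript3", "transcript4"),
  log2FC = c(2.5, -1.2, 0.8, -0.5),
  pvalue = c(0.01, 0.2, 0.03, 0.6)
)

# Assumption: membership is tested against the transcript_name column.
DET_table$DEG_sig <- ifelse(
  DET_table$transcript_name %in% significant_transcripts,
  "YES",
  "NO"
)

print(DET_table)
```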
The IsoformicExperiment class encapsulates the core data structure
for transcriptomic analyses in the isoformic package.
It holds the path to the dataset, sample metadata, and provides
access to transcript, gene, and exon annotations through properties.
The preferred way to construct an object of this class is through the
IsoformicExperiment() constructor.
IsoformicExperiment(
  experiment_name = NA_character_,
  data_path = NULL,
  annot_path = NULL,
  assay = NULL,
  col_data = NULL,
  annot_metadata = NULL,
  dea = NULL,
  gsea = NULL,
  tx_type_palette = character(0)
)
col_data(self, ...)
annot_data_transcripts(self, ...)
annot_data_genes(self, ...)
annot_data_exons(self, ...)
annot_data(self, ...)
annot_row_names(self, ...)
col_names(self, ...)
row_names(self, ...)
tx_to_gene(self, ...)
row_data(self, ...)
tx_annot(self, ...)
de_tx(self, ...)
de_gene(self, ...)
experiment_name |
Character string specifying the name of the experiment. This name is used for caching the assays experiment. If a name is not provided a random identifier is used. |
data_path |
Character string specifying the path to the data directory. |
annot_path |
Character string specifying the path to the annotation database directory. |
assay |
A list of matrices or data frames containing assay data, with transcript IDs as row names and sample IDs as column names. Each element of the list represents a different assay (e.g., TPM, counts). |
col_data |
A data frame containing sample metadata.
First column must be |
annot_metadata |
A list containing metadata about the annotation, such as source, version, and date. |
dea |
A list containing differential expression analysis results for transcripts and genes. |
gsea |
A list containing gene set enrichment analysis results. |
tx_type_palette |
A named character vector specifying the color palette for different transcript types. |
self |
An |
... |
Additional arguments passed to methods. |
annot_data_transcripts |
A property that retrieves transcript annotation data. |
annot_data_genes |
A property that retrieves gene annotation data. |
annot_data_exons |
A property that retrieves exon annotation data. |
annot_data |
A property that aggregates transcript, gene, and exon annotation data. |
Combines gene-level and transcript-level differential expression results into a single table, annotates the combined data with significance labels based on specified cutoffs, and filters transcripts based on their types.
join_DEG_DET(DEG_tab, DET_final_tab, logfc_cut, pval_cut)
DEG_tab |
A |
DET_final_tab |
A |
logfc_cut |
A numeric value specifying the absolute log2 fold-change cutoff for significance. |
pval_cut |
A numeric value specifying the p-value cutoff for significance. |
A tibble combining gene and transcript differential expression results, with additional columns:
id: gene or transcript ID.
name: gene or transcript name.
transcript_type: type of transcript or "gene" for gene-level entries.
abs_log2FC: absolute value of log2 fold-change.
significance: "sig" if significant based on cutoffs, "not_sig" otherwise.
# Sample gene-level data
DEG_tab <- data.frame(
  gene_id = c("gene1", "gene2"),
  gene_name = c("GeneA", "GeneB"),
  log2FC = c(1.5, -2.0),
  pvalue = c(0.01, 0.04)
)

# Sample transcript-level data
DET_final_tab <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  transcript_name = c("Transcript1", "Transcript2", "Transcript3"),
  transcript_type = c("protein_coding", "lncRNA", "processed_transcript"),
  log2FC = c(1.2, -1.8, 0.5),
  pvalue = c(0.02, 0.03, 0.2)
)

# Merge and annotate differential expression results
DEGs_DETs_table <- join_DEG_DET(
  DEG_tab = DEG_tab,
  DET_final_tab = DET_final_tab,
  logfc_cut = 1,
  pval_cut = 0.05
)

# View the result
print(DEGs_DETs_table)
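The combination and labeling described above can be approximated in base R as shown below. This is a sketch under stated assumptions, not the package's code: the fold-change cutoff is treated as inclusive and the p-value cutoff as strict, and the transcript-type filtering step is left out.

```r
DEG_tab <- data.frame(
  gene_id = c("gene1", "gene2"),
  gene_name = c("GeneA", "GeneB"),
  log2FC = c(1.5, -2.0),
  pvalue = c(0.01, 0.04)
)
DET_final_tab <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  transcript_name = c("Transcript1", "Transcript2", "Transcript3"),
  transcript_type = c("protein_coding", "lncRNA", "processed_transcript"),
  log2FC = c(1.2, -1.8, 0.5),
  pvalue = c(0.02, 0.03, 0.2)
)

# Bring both tables to the documented output columns; gene rows get type "gene".
genes <- data.frame(
  id = DEG_tab$gene_id,
  name = DEG_tab$gene_name,
  transcript_type = "gene",
  log2FC = DEG_tab$log2FC,
  pvalue = DEG_tab$pvalue
)
transcripts <- data.frame(
  id = DET_final_tab$transcript_id,
  name = DET_final_tab$transcript_name,
  transcript_type = DET_final_tab$transcript_type,
  log2FC = DET_final_tab$log2FC,
  pvalue = DET_final_tab$pvalue
)
combined <- rbind(genes, transcripts)

# Assumption: |log2FC| >= logfc_cut and pvalue < pval_cut marks a row as "sig".
logfc_cut <- 1
pval_cut <- 0.05
combined$abs_log2FC <- abs(combined$log2FC)
combined$significance <- ifelse(
  combined$abs_log2FC >= logfc_cut & combined$pvalue < pval_cut,
  "sig",
  "not_sig"
)

print(combined)
```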
Extracts a transcript-to-gene mapping table from GENCODE annotation files, such as the transcriptome FASTA file. Currently, only FASTA files are supported.
make_tx_to_gene(file_path, file_type = c("fasta", "gff", "gtf"))
file_path |
A character string specifying the path to the reference file (e.g., GENCODE FASTA file). |
file_type |
A character string specifying the type of the reference file. Currently, only "fasta" is supported. |
The function reads the headers of the FASTA file and extracts relevant information to create a mapping table. For GTF or GFF3 files, support is not yet implemented.
A tibble containing the transcript-to-gene mapping information, including transcript IDs, gene IDs,
transcript names, gene names, and transcript types.
## Not run:
# Assuming you have downloaded the GENCODE transcriptome FASTA file:
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)

# Create the transcript-to-gene mapping table
tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta")

# View the first few rows
utils::head(tx_to_gene)
## End(Not run)
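As a rough illustration of the header parsing described above, the sketch below splits pipe-delimited GENCODE FASTA headers in base R. The helper name `parse_gencode_headers()` is hypothetical, the field positions are an assumption based on the usual GENCODE header layout, and the two header lines are illustrative values rather than entries checked against a specific release.

```r
# Hypothetical helper: turn GENCODE-style FASTA headers into a mapping table.
parse_gencode_headers <- function(headers) {
  headers <- sub("^>", "", headers)
  fields <- strsplit(headers, "|", fixed = TRUE)
  pick <- function(i) vapply(fields, function(x) x[i], character(1))
  # Assumed field order: transcript ID, gene ID, two HAVANA IDs,
  # transcript name, gene name, length, transcript type.
  data.frame(
    transcript_id = pick(1),
    gene_id = pick(2),
    transcript_name = pick(5),
    gene_name = pick(6),
    transcript_type = pick(8),
    stringsAsFactors = FALSE
  )
}

# Illustrative header lines; in practice they could be collected with
# grep("^>", readLines(fasta_path), value = TRUE).
headers <- c(
  ">ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|",
  ">ENST00000641515.2|ENSG00000186092.7|OTTHUMG00000001094.4|OTTHUMT00000003223.4|OR4F5-201|OR4F5|2618|protein_coding|"
)

tx_map <- parse_gencode_headers(headers)
print(tx_map)
```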
Generate a genomic context plot for a specified gene, displaying its transcripts
along with their types and annotations.
The function utilizes the plotgardener package to create a detailed
visualization of the genomic context, including an ideogram, chromosome
highlight, and transcript structures.
It requires a ContextData object that contains the necessary genomic
information and annotations.
The plot can be customized with various parameters such as offsets,
label limits, and ideogram references.
plot_genomic_context(
  gene_name,
  context_data,
  limit_label = TRUE,
  show_guides = FALSE,
  y_offset = 0,
  height_offset = 0,
  downstream_offset = 0,
  upstream_offset = 0,
  ideogram_reference = c("hg38", "hg19", "mm11", "mm10", "none")
)
gene_name |
Character string specifying the name of the gene to plot. |
context_data |
A ContextData object. |
limit_label |
Logical indicating whether to limit the length of
transcript labels to avoid overlap (default is TRUE). |
show_guides |
Logical indicating whether to show guide lines on the plot
(default is FALSE). |
y_offset |
Numeric value to adjust the vertical position of the plot
(default is 0). |
height_offset |
Numeric value to adjust the height of the plot
(default is 0). |
downstream_offset |
Numeric value to extend the downstream region
beyond the gene's end position (default is 0). |
upstream_offset |
Numeric value to extend the upstream region
beyond the gene's start position (default is 0). |
ideogram_reference |
Character string specifying the reference genome for the ideogram. Options include "hg38", "hg19", "mm11", "mm10", or "none" (default is "hg38"). |
A plotgardener object representing the genomic context plot.
Creates a bar plot of log2 fold-change values for transcripts of a selected gene, differentiating transcript types and significance levels.
plot_log2FC(
  de_data,
  feature,
  feature_column = "gene_name",
  color_palette = NULL
)
plot_log2fc(self, ...)
de_data |
A |
feature |
A character string specifying the gene name to plot. |
feature_column |
A character string specifying the column name in |
color_palette |
A named character vector specifying colors for different transcript types.
If |
self |
Input object, either a |
... |
Additional arguments passed to the method. |
The function filters the input table for the selected gene and creates a bar plot of log2 fold-change values.
If all transcripts are significant, it plots without adjusting alpha transparency; otherwise, it adjusts alpha
based on significance. The function uses predefined colors for transcript types, which can be overridden
by providing color_palette.
A ggplot2 object representing the bar plot.
# Sample data
de_table_long <- data.frame(
  feature_name = c("Transcript1", "Transcript2", "Transcript3", "GeneA"),
  feature_id = c("TX1", "TX2", "TX3", "GENEA"),
  gene_name = c("GeneA", "GeneA", "GeneA", "GeneA"),
  log2FC = c(1.5, -0.5, -2.0, 0.8),
  feature_type = c("protein_coding", "lncRNA", "retained_intron", "gene"),
  is_de = c("yes", "no", "yes", "yes")
)

# Plot log2 fold-change for the selected gene
plot_obj <- plot_log2FC(
  de_data = de_table_long,
  feature = "GeneA",
  feature_column = "gene_name"
)

# Display the plot
print(plot_obj)
This function plots the genomic context of all transcripts of given genes.
plot_tx_context(exon_table, custom_colors = NULL)
exon_table |
a tibble with exon information.
Must contain columns |
custom_colors |
A vector of colors to use for each transcript. If not provided, the function uses the default colors. Note: this argument is not implemented yet. |
Plots transcript expression per gene.
plot_tx_expr(genes_to_plot, profile_data)
genes_to_plot |
a character vector with gene names |
profile_data |
tibble output from |
a ggplot object
Prepares annotation from a GTF or GFF3 file to be imported as rowRanges and
rowData for the gene-, transcript-, and exon-based position annotation tables.
prepare_annotation(file_path, file_type = c("gtf", "gff"))
file_path |
Path to annotation file. |
file_type |
Character string indicating the type of the annotation file.
One of "gtf" or "gff". |
This function reads a GFF file and writes its contents to a Parquet file using DuckDB.
prepare_annotation_db(input_path, output_path = NULL, file_type = c("gff"))
input_path |
Character string specifying the path to the input GFF file. |
output_path |
Character string specifying the path to the output
Parquet file. If |
file_type |
Character string specifying the type of the input file. Currently, only "gff" is supported (default is "gff"). |
Invisible path to the created Parquet file.
Prepares the exon-based position annotation table.
prepare_exon_annotation(gene_name, file_path, file_type = c("gff", "gtf"))
gene_name |
String or vector of gene names to extract. |
file_path |
Path to annotation file. |
file_type |
A character string specifying the type of the annotation file.
Valid options are "gff" and "gtf". |
This function reads an annotation file and parses the feature annotation into Parquet files, one for each required feature level (i.e., gene, transcript, and exon).
prepare_isoformic_annotation(
  input_path,
  output_path = NULL,
  file_type = c("gff")
)
input_path |
Character string specifying the path to the input GFF file. |
output_path |
Character string specifying the path to the output
directory where Parquet files are written. If |
file_type |
Character string specifying the type of the input file. Currently, only "gff" is supported (default is "gff"). |
Invisible path to the created Parquet file.
This function processes gene and transcript-level expression data, along with differential expression results, to prepare a tidy data frame suitable for plotting expression profiles across different sample groups.
prepare_profile_data(
  txi_gene = NULL,
  txi_transcript,
  sample_metadata,
  tx_to_gene,
  de_result_gene,
  de_result_transcript,
  var,
  var_levels,
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)
txi_gene |
A |
txi_transcript |
A |
sample_metadata |
A |
tx_to_gene |
A |
de_result_gene |
A |
de_result_transcript |
A |
var |
A string specifying the column name in |
var_levels |
A character vector specifying the levels of |
gene_col |
A string specifying the column name in |
tx_col |
A string specifying the column name in |
pvalue_cutoff |
A numeric value specifying the p-value cutoff for determining significant differential expression. Default is 0.05. |
lfc_cutoff |
A numeric value specifying the log2 fold-change cutoff for determining significant differential expression. Default is 1. |
use_fdr |
A logical value indicating whether to use the false discovery rate ( |
The function combines gene and transcript expression data with differential expression results to generate a tidy data frame. It filters significant genes and transcripts based on specified cutoffs and prepares the data for plotting expression profiles across specified sample groups.
A tibble containing processed expression data and differential expression flags, ready for plotting.
## Not run:
# Assuming txi_gene, txi_transcript, sample_metadata, tx_to_gene, de_result_gene,
# and de_result_transcript are pre-loaded data frames:

# Prepare data for plotting
if (FALSE) {
  expr_df <- prepare_profile_data(
    txi_gene = txi_gene,
    txi_transcript = txi_transcript,
    sample_metadata = sample_metadata,
    tx_to_gene = tx_to_gene,
    de_result_gene = de_result_gene,
    de_result_transcript = de_result_transcript,
    var = "condition",
    var_levels = c("control", "treatment"),
    gene_col = "gene_name",
    tx_col = "transcript_name",
    pvalue_cutoff = 0.05,
    lfc_cutoff = 1,
    use_fdr = TRUE
  )

  # View the prepared data
  utils::head(expr_df)

  # Plotting example (assuming ggplot2 is installed)
  library(ggplot2)
  ggplot(expr_df, aes(x = condition, y = mean_TPM, fill = DE)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    facet_wrap(~ parent_gene + transcript_type)
}
## End(Not run)
Performs gene set enrichment analysis (GSEA) on differential expression results for various transcript types,
using the fgsea package. The function iterates over specified transcript types, filters the data accordingly,
and runs GSEA for each type.
run_enrichment(
  det_df,
  genesets_list,
  tx_to_gene,
  pval_cutoff = 0.05,
  lfc_cutoff = 1
)
det_df |
A |
genesets_list |
A list of gene sets to be used in the enrichment analysis. |
tx_to_gene |
A |
pval_cutoff |
A numeric value specifying the p-value cutoff for the enrichment results. Default is 0.05. |
lfc_cutoff |
A numeric value specifying the log2 fold-change cutoff for filtering transcripts. Default is 1. |
The function defines a list of transcript types and their corresponding labels.
It then filters the input differential expression data for each transcript type, ranks the genes by log2 fold-change,
and performs GSEA using the fgsea package.
A tibble containing the enrichment analysis results for each transcript type, including pathway names,
p-values, adjusted p-values, and the transcript type (experiment).
# Sample differential expression data
det_df <- data.frame(
  gene_name = c(
    "GeneA", "GeneB", "GeneC", "GeneD", "GeneA", "GeneA",
    "GeneB", "GeneC", "GeneD", "GeneE", "GeneB", "GeneA"
  ),
  transcript_type = c(
    "protein_coding", "retained_intron", "protein_coding_CDS_not_defined",
    "processed_transcript", "protein_coding", "protein_coding",
    "retained_intron", "protein_coding_CDS_not_defined", "processed_transcript",
    "nonsense_mediated_decay", "protein_coding", "retained_intron"
  ),
  transcript_name = c(
    "Transcript1", "Transcript2", "Transcript3", "Transcript4",
    "Transcript5", "Transcript6", "Transcript7", "Transcript8",
    "Transcript9", "Transcript10", "Transcript11", "Transcript12"
  ),
  log2FC = c(
    1.5, -2.0, 0.8, -1.2, 2.3, -0.5,
    1.0, -1.5, 0.3, -2.5, 1.8, -0.7
  )
)

# Sample gene sets
genesets_list <- list(
  Pathway1 = c("GeneA", "GeneC", "GeneF"),
  Pathway2 = c("GeneB", "GeneD", "GeneE", "GeneX")
)

# Sample transcript to gene mapping
tx_to_gene <- data.frame(
  transcript_name = det_df$transcript_name,
  gene_name = det_df$gene_name
)

# Run enrichment analysis
fgsea_results_df <- run_enrichment(
  det_df = det_df,
  genesets_list = genesets_list,
  tx_to_gene = tx_to_gene,
  pval_cutoff = 0.05,
  lfc_cutoff = 1
)

# View the results
print(fgsea_results_df)
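The filter-and-rank step described above can be sketched in base R as follows. The helper name `rank_genes_for_type()` is hypothetical, and the rule for genes with several transcripts of the same type (keep the transcript with the largest absolute log2 fold-change) is an assumption made for this illustration; the package may resolve duplicates differently. The fgsea call itself is omitted.

```r
# Hypothetical helper: build a ranked, named log2FC vector for one transcript type.
rank_genes_for_type <- function(det_df, tx_type) {
  subset_df <- det_df[det_df$transcript_type == tx_type, , drop = FALSE]
  # Assumption: per gene, keep the transcript with the largest |log2FC|.
  subset_df <- subset_df[order(-abs(subset_df$log2FC)), , drop = FALSE]
  subset_df <- subset_df[!duplicated(subset_df$gene_name), , drop = FALSE]
  ranks <- stats::setNames(subset_df$log2FC, subset_df$gene_name)
  # Highest fold-change first.
  sort(ranks, decreasing = TRUE)
}

det_df <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneA", "GeneC"),
  transcript_type = c(
    "protein_coding", "protein_coding", "protein_coding", "retained_intron"
  ),
  log2FC = c(1.5, -2.0, 2.3, 0.8)
)

ranks <- rank_genes_for_type(det_df, "protein_coding")
print(ranks)
```

A named numeric vector of this shape is the kind of input that the `stats` argument of fgsea::fgsea() expects, with the gene set list passed as `pathways`.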
Create a Salmon index from a reference transcriptome FASTA file.
salmon_index(
  fasta_path,
  index_path = "salmon_index",
  kmer_len = 31,
  num_threads = 2,
  env_name = "salmon-env",
  is_gencode = FALSE,
  decoy_fasta = NULL,
  clip_poly_a = TRUE
)
fasta_path |
Path to the reference transcriptome FASTA file. |
index_path |
Directory path to save the Salmon index (default is "salmon_index"). |
kmer_len |
K-mer length for the index (default is 31). |
num_threads |
Number of threads to use (default is 2). |
env_name |
Name of the conda environment with Salmon installed (default is "salmon-env"). |
is_gencode |
Logical indicating if the FASTA is from GENCODE
(default is FALSE). |
decoy_fasta |
Optional path to a FASTA file containing decoy sequences
(default is NULL). |
clip_poly_a |
Logical indicating whether to clip poly-A tails
(default is TRUE). |
processx style output list.
Perform transcript quantification using Salmon's selective-alignment-based mode from raw RNA-seq reads.
salmon_quant(
  input_r1,
  input_r2 = NULL,
  index_path = "salmon_index",
  output_dir = "quant_output",
  num_threads = 8,
  num_gibbs = 100,
  min_score_fraction = "0.65",
  env_name = "salmon-env"
)
input_r1 |
Path to the FASTQ file for read 1 (or single-end reads). |
input_r2 |
Optional path to the FASTQ file for read 2 (paired-end reads). |
index_path |
Path to the Salmon index directory (default is "salmon_index"). |
output_dir |
Directory to save the quantification output (default is "quant_output"). |
num_threads |
Number of threads to use (default is 8). |
num_gibbs |
Number of Gibbs samples for uncertainty estimation (default is 100). |
min_score_fraction |
Minimum score fraction for alignments (default is "0.65"). |
env_name |
Name of the conda environment with Salmon installed (default is "salmon-env"). |
processx style output list.
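To make the relation between these arguments and the Salmon command line more concrete, the sketch below assembles an argument vector for `salmon quant`. The helper name `salmon_quant_args()` is hypothetical, and the mapping of arguments to flags, including automatic library type detection (`-l A`), is an assumption; the function's actual invocation inside the conda environment is not shown in this manual.

```r
# Hypothetical helper: assemble command-line arguments for `salmon quant`.
salmon_quant_args <- function(input_r1,
                              input_r2 = NULL,
                              index_path = "salmon_index",
                              output_dir = "quant_output",
                              num_threads = 8,
                              num_gibbs = 100,
                              min_score_fraction = "0.65") {
  # Single-end reads use -r; paired-end reads use -1 and -2.
  reads <- if (is.null(input_r2)) {
    c("-r", input_r1)
  } else {
    c("-1", input_r1, "-2", input_r2)
  }
  c(
    "quant",
    "-i", index_path,
    "-l", "A", # assumption: let Salmon infer the library type
    reads,
    "-p", as.character(num_threads),
    "--numGibbsSamples", as.character(num_gibbs),
    "--minScoreFraction", as.character(min_score_fraction),
    "-o", output_dir
  )
}

args <- salmon_quant_args("sample_R1.fastq.gz", "sample_R2.fastq.gz")
cat("salmon", args, "\n")
```

A vector like this could be handed to a process runner such as processx::run("salmon", args), which would be consistent with the processx-style return value noted above.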
Default color palette for transcript types.
tx_type_palette()