Title: | K-Mers Similarity Score Matrix and HeatMap |
---|---|
Description: | Similarity score matrix and heatmap for nucleic and amino acid k-mers. The similarity score is evaluated with Point Accepted Mutation (PAM) and BLOcks SUbstitution Matrix (BLOSUM) matrices. Versions 30, 40, 70, 120 and 250 are available for PAM, and versions 45, 50, 62, 80 and 100 for BLOSUM. Alignment is evaluated by local and global alignment. |
Authors: | Rafal Urniaz [aut, cre] |
Maintainer: | Rafal Urniaz <[email protected]> |
License: | GPL-3 |
Version: | 2.1.0 |
Built: | 2025-02-23 04:17:25 UTC |
Source: | https://github.com/urniaz/kmers |
kmeRs generate kmers
kmeRs_generate_kmers(k, bases)
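k |
number of times the bases are combined, i.e. the length of the generated k-mers |
bases |
bases to combine; follows the conventions of kmeRs_similarity_matrix() |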
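The idea of enumerating every k-mer over an alphabet can be sketched in base R. This is a minimal illustration, not the package's implementation: `all_kmers` is a hypothetical helper, and the DNA alphabet used as its default is an assumption.

```r
# Hypothetical helper (not part of kmeRs): enumerate every k-mer over an alphabet.
all_kmers <- function(k, bases = c("A", "C", "G", "T")) {
  # One copy of the alphabet per position; expand.grid builds every combination.
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  # Paste the columns of each row together into a single string.
  do.call(paste0, unname(grid))
}

dimers <- all_kmers(2)
length(dimers)            # number of DNA 2-mers: 4^2
length(all_kmers(3))      # number of DNA 3-mers: 4^3
```

The number of k-mers grows as `length(bases)^k`, so an amino acid alphabet of 20 letters becomes expensive much sooner than a DNA alphabet.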
The kmeRs_heatmap function generates a heatmap from a similarity score matrix.
kmeRs_heatmap( x, cexRow = NULL, cexCol = NULL, col = NULL, Colv = NA, Rowv = NA )
x |
matrix calculated by |
cexRow |
= NULL |
cexCol |
= NULL |
col |
color palette, when NULL the default palette is applied |
Colv |
when different from NA, the column dendrogram is shown |
Rowv |
when different from NA, the row dendrogram is shown |
heatmap from results
# Use RColorBrewer to generate a figure similar to the publication
library(RColorBrewer)
h.palette <- rev(brewer.pal(9, "YlGnBu"))
q0 <- c("GATTACA", "ACAGATT", "GAATTAC", "GAAATCT", "CTATAGA", "GTACATA", "AACGATT")
example <- kmeRs_similarity_matrix(q0, submat = "BLOSUM62")
kmeRs_heatmap(kmeRs_score(example), col = h.palette)
The kmeRs_score function sums the partial scores and sorts the data.frame to indicate the most 'different' k-mers.
kmeRs_score(x, decreasing = FALSE)
x |
the similarity matrix calculated by |
decreasing |
when TRUE, results are sorted in decreasing order |
the sorted similarity matrix, with a global.score column added, is returned as a data.frame
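The sum-and-sort step described above can be sketched in base R. This is a hedged illustration rather than the package source: `score_sketch` is a hypothetical function, the toy values are made up, and summing across each row is an assumption about what "partial scores" means.

```r
# Toy similarity matrix; the values are invented for illustration only.
sim <- data.frame(
  AA = c(4, 1, 0),
  AC = c(1, 5, 2),
  row.names = c("AA", "AC", "GG")
)

# Hypothetical re-creation of the described behaviour (not kmeRs code).
score_sketch <- function(x, decreasing = FALSE) {
  # Assumption: the partial scores are summed across each row.
  x$global.score <- rowSums(x)
  # Sort rows by the summed score; ascending order puts low scorers first.
  x[order(x$global.score, decreasing = decreasing), , drop = FALSE]
}

scored <- score_sketch(sim)
rownames(scored)   # "GG" "AA" "AC": the row with the lowest total comes first
```

With `decreasing = FALSE`, the rows with the lowest totals, that is the most 'different' k-mers, appear at the top.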
# Calculate the example BLOSUM62 matrix and score the result
example <- kmeRs_similarity_matrix(q = c("A", "T", "C", "G"), submat = "BLOSUM62")
kmeRs_score(example)
The kmeRs_show_alignment function aligns two sequences (DNA or amino acid, as selected by seq.type) and shows the calculated alignment.
kmeRs_show_alignment(
  kmer_A,
  kmer_B,
  seq.type = "AA",
  submat = ifelse(test = (match.arg(toupper(seq.type), c("DNA", "AA")) == "AA"), yes = "BLOSUM62", no = NA),
  na.match = ifelse(is.na(submat), yes = 2, no = NA),
  na.mismatch = ifelse(is.na(submat), yes = -3, no = NA),
  align.type = "global",
  verbose = TRUE,
  ...
)
kmer_A |
given k-mer A |
kmer_B |
given k-mer B |
seq.type |
type of sequence in question, either 'DNA' or 'AA' (default) |
submat |
substitution matrix version, defaults to 'BLOSUM62'; other
choices include 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100',
'PAM30', 'PAM40', 'PAM70', 'PAM120' and 'PAM250'; this parameter is ignored
if |
na.match |
for DNA sequences, what should the score for exact match be? |
na.mismatch |
for DNA sequences, what should the score for mismatches be? |
align.type |
"global" or "local" |
verbose |
= TRUE |
... |
other parameters, e.g. gap opening/extension penalties ( |
alignment is returned as a data frame
# Example DNA alignment with gap opening and extension penalties of 1 and 0
# with default base match/mismatch values
kmeRs_show_alignment(kmer_A = "AAATTTCCCGGG", kmer_B = "TCACCC", seq.type = "DNA", gapOpening = 1, gapExtension = 0)
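To show how a global alignment score arises from match and mismatch values, here is a minimal Needleman-Wunsch sketch in base R. It is not the package's method: `nw_score` is a hypothetical function, and it uses a single linear `gap` penalty (an assumption) rather than separate gap opening and extension penalties. The match and mismatch defaults of 2 and -3 mirror the DNA defaults shown in the usage above.

```r
# Hypothetical sketch (not kmeRs code): global alignment score by dynamic programming.
nw_score <- function(a, b, match = 2, mismatch = -3, gap = -1) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # S[i + 1, j + 1] holds the best score for aligning x[1..i] with y[1..j].
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  S[, 1] <- (0:n) * gap   # prefix of a aligned against gaps only
  S[1, ] <- (0:m) * gap   # prefix of b aligned against gaps only
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (x[i] == y[j]) match else mismatch
      S[i + 1, j + 1] <- max(
        S[i, j] + sub,       # align x[i] with y[j]
        S[i, j + 1] + gap,   # x[i] against a gap
        S[i + 1, j] + gap    # y[j] against a gap
      )
    }
  }
  S[n + 1, m + 1]
}

nw_score("GATTACA", "GATTACA")  # 7 matches at 2 each
nw_score("A", "T")              # two gaps (-2) beat one mismatch (-3)
```

A local alignment would instead clamp each cell at zero and take the maximum over the whole matrix, which is why local scores are never negative.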
The kmeRs_similarity_matrix function generates a pairwise similarity score
matrix for the given k-mers of length k vs. all possible k-mer combinations.
The pairwise similarity score is calculated using a PAM or BLOSUM substitution matrix;
versions 30, 40, 70, 120 and 250 are available for PAM, and versions 45, 50, 62, 80
and 100 for BLOSUM. The results are evaluated by global similarity score;
a higher similarity score indicates more similar sequences for BLOSUM and the opposite for
PAM matrices.
kmeRs_similarity_matrix(
  q = NULL,
  x = NULL,
  align.type = "global",
  k = 3,
  seq.type = "AA",
  submat = ifelse(test = (match.arg(toupper(seq.type), c("DNA", "AA")) == "AA"), yes = "BLOSUM62", no = NA),
  compare.all = FALSE,
  save_to_file = NULL,
  ...
)
q |
query vector with given k-mers |
x |
kmers to search the query vector against. If unspecified, |
align.type |
type of alignment, either |
k |
length of k-mers to calculate the similarity matrix for, defaults to 3; e.g. for DNA, N = 4^3 = 64 combinations if |
seq.type |
type of sequence in question, either 'DNA' or 'AA' (default);
this will also modify |
submat |
substitution matrix, defaults to 'BLOSUM62'; available choices are 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100', 'PAM30', 'PAM40', 'PAM70', 'PAM120' and 'PAM250' |
compare.all |
if |
save_to_file |
if specified, the results will be saved to the path in comma-separated format (.CSV) |
... |
other parameters, e.g. gap opening/extension penalties ( |
similarity matrix is returned as a data.frame
# Simple BLOSUM62 similarity matrix for all amino acid k-mers
kmeRs_similarity_matrix(submat = "BLOSUM62")
The kmeRs_statistics function calculates basic statistics and returns the similarity matrix
with the calculated results appended, or only a summary table of the statistics when margin.only
is set to TRUE.
kmeRs_statistics(x, margin.only = FALSE, digits = 2)
x |
Similarity matrix computed by |
margin.only |
Should only margin statistics be displayed? Defaults to |
digits |
rounding digits, defaults to 2 |
data.frame with results
# Simple BLOSUM62 similarity matrix for DNA nucleotides
# Sample heptamers
q0 <- c("GATTACA", "ACAGATT", "GAATTAC", "GAAATCT", "CTATAGA", "GTACATA", "AACGATT")
# Compute similarity matrix
example <- kmeRs_similarity_matrix(q0, submat = "BLOSUM62")
# Result as a full matrix
kmeRs_statistics(example)
# Result as a summary statistics table
kmeRs_statistics(example, margin.only = TRUE)
The kmeRs_transcript_to_complementary function transcribes the given DNA k-mers into complementary sequences.
kmeRs_transcript_to_complementary(kmers_given)
kmers_given |
vector containing the given k-mers |
vector containing the complementary sequences
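Base-wise complementation of DNA can be sketched with `chartr` from base R. This is an illustration under assumptions, not the package's code: `complement_sketch` is a hypothetical function, and it returns the plain complement without reversing the sequence (whether the package reverses the result is not stated here).

```r
# Hypothetical helper (not kmeRs code): base-wise DNA complement, no reversal.
complement_sketch <- function(kmers) {
  # chartr maps each character of "ACGT" to the character at the same
  # position in "TGCA"; it is vectorised over the input strings.
  chartr("ACGT", "TGCA", toupper(kmers))
}

complement_sketch("GATTACA")        # "CTAATGT"
complement_sketch(c("AAC", "ggt"))  # "TTG" "CCA"
```

Applying the function twice returns the original (upper-cased) sequence, which makes for a quick sanity check.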
# Returns complementary sequence to GATTACA
kmeRs_transcript_to_complementary('GATTACA')
Supporting function for kmeRs_show_alignment
kmeRs_twoSeqSim(
  kmer_A,
  kmer_B,
  seq.type = "AA",
  submat = ifelse(test = (match.arg(toupper(seq.type), c("DNA", "AA")) == "AA"), yes = "BLOSUM62", no = NA),
  na.match = ifelse(is.na(submat), yes = 2, no = NA),
  na.mismatch = ifelse(is.na(submat), yes = -3, no = NA),
  align.type = "global",
  verbose = TRUE,
  ...
)
kmer_A |
given k-mer A |
kmer_B |
given k-mer B |
seq.type |
type of sequence in question, either 'DNA' or 'AA' (default) |
submat |
substitution matrix version, defaults to 'BLOSUM62'; other
choices include 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100',
'PAM30', 'PAM40', 'PAM70', 'PAM120' and 'PAM250'; this parameter is ignored
if |
na.match |
for DNA sequences, what should the score for exact match be? |
na.mismatch |
for DNA sequences, what should the score for mismatches be? |
align.type |
"global" or "local" |
verbose |
= TRUE |
... |
other parameters, e.g. gap opening/extension penalties ( |