Exact Palindrome
We derived the longest exact palindromes for the three sets of RNA sequences. The results are shown in the following figure, where the x-axis denotes the length of the longest palindromes and the y-axis denotes their number.
The length distribution of the longest exact palindromes for each data set.
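As an illustration of how such a length distribution could be tabulated, the following is a minimal sketch and not the authors' implementation. It assumes that an RNA palindrome is a contiguous, even-length stretch whose mirrored bases can pair under the (A, U), (C, G) and (U, G) rules; the function names `longest_exact_palindrome` and `length_distribution` are hypothetical.

```python
from collections import Counter

# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def longest_exact_palindrome(seq):
    """Return the longest even-length substring whose mirrored bases all pair."""
    seq = seq.upper()
    best_start, best_len = 0, 0
    # A base cannot pair with itself, so every center lies between two bases.
    for center in range(len(seq) - 1):
        left, right = center, center + 1
        while left >= 0 and right < len(seq) and (seq[left], seq[right]) in PAIRS:
            left -= 1
            right += 1
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return seq[best_start:best_start + best_len]


def length_distribution(sequences):
    """Count how many sequences have a longest palindrome of each length."""
    return Counter(len(longest_exact_palindrome(s)) for s in sequences)
```

The expand-around-center scan is quadratic in the sequence length, which is adequate for a sketch; a linear-time method would be preferable for long lncRNA sequences.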
Approximate Palindrome
We computed the longest approximate palindromes with k=1 for the three sets of sequences. The results are shown in the following figure, where the x-axis and y-axis denote the length and frequency of the longest palindromes, respectively.
The distribution of the length of the longest approximate palindromes with k=1 for each data set.
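The approximate case can be sketched in the same spirit. The code below is an illustration under two assumptions that this section does not state: that k bounds the number of mirrored positions that fail to pair (mismatches only, no insertions or deletions), and that the same (A, U), (C, G) and (U, G) pairing rule applies. The name `longest_approx_palindrome` is hypothetical.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def longest_approx_palindrome(seq, k=1):
    """Longest even-length substring with at most k non-pairing mirrored positions.

    A candidate is recorded only when its outermost bases pair, so the
    result never ends on a mismatch.
    """
    seq = seq.upper()
    best_start, best_len = 0, 0
    for center in range(len(seq) - 1):
        left, right = center, center + 1
        mismatches = 0
        while left >= 0 and right < len(seq):
            if (seq[left], seq[right]) in PAIRS:
                length = right - left + 1
                if length > best_len:
                    best_start, best_len = left, length
            else:
                mismatches += 1
                if mismatches > k:
                    break
            left -= 1
            right += 1
    return seq[best_start:best_start + best_len]
```

With k=0 the function reduces to the exact case; with k=1 a single non-pairing position inside the stem is tolerated.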
KS Test
Given the frequency and length distributions of exact and approximate palindromes for the three types of RNA sequences, we are interested in whether these distributions are RNA type-specific. Therefore, the Kolmogorov-Smirnov (KS) test was employed to conduct a pairwise comparison study. The KS test examines the difference between two cumulative distributions. It rejects the null hypothesis of no difference between the two cumulative distributions if the p-value is less than 0.05. The KS test was performed using the MATLAB ‘kstest’ function, and the results are shown in the table. The H value represents the hypothesis test result: H = 1 indicates rejection of the null hypothesis at the significance level of 0.05, and H = 0 indicates a failure to reject the null hypothesis at the same significance level.
Test of homogeneity for the length and frequency distributions of exact and approximate palindromes for the three types of RNA sequences.
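To make the H value concrete, the following is a minimal pure-Python sketch of a two-sample KS test. It is not the MATLAB routine used in the study: the p-value comes from the standard asymptotic Kolmogorov series with a common small-sample correction, so it can differ slightly from other implementations. The function name `ks_two_sample` and its return order are assumptions made for illustration.

```python
import math


def ks_two_sample(x, y, alpha=0.05):
    """Return (H, p_value, D) for a two-sample Kolmogorov-Smirnov test."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the
    # two empirical cumulative distribution functions.
    while i < n and j < m:
        v = min(x[i], y[j])
        while i < n and x[i] == v:
            i += 1
        while j < m and y[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic p-value (an approximation, assumed adequate for a sketch).
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:
        p = 1.0  # the series converges poorly here and the p-value is ~1
    else:
        p = 2.0 * sum((-1) ** (t - 1) * math.exp(-2.0 * t * t * lam * lam)
                      for t in range(1, 101))
        p = min(max(p, 0.0), 1.0)
    h = 1 if p < alpha else 0
    return h, p, d
```

For a pairwise study, the function would be called once for each pair of RNA types, for example on the lists of longest-palindrome lengths of two data sets. In MATLAB, the two-sample form of the test is provided by `kstest2`.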
The A-U richness of RNA palindromes
We examined the assumption that palindromes in RNA sequences are also A-U rich. The probabilities of occurrence of the three types of base pairings, i.e. (A, U), (C, G) and (U, G), in the palindromes of the three types of RNA sequences are listed in Table 8. We found that the proportion of the (A, U) pair is consistently higher than the average (33.33%) in all three types of RNA sequences. Furthermore, to validate the A-U richness hypothesis, we recorded the number of sequences whose longest palindrome has an A-U pair ratio higher than the average. We found that 54.57%, 64.27% and 61.69% of them are A-U rich for fusion gene mRNA, miRNA and lncRNA, respectively. These two results support the A-U rich assumption, which is in line with the results of a previous study.
The average percentage of each base pair in the longest palindromes of the three types of RNA sequences.
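The two quantities reported above, the share of each pair type within a palindrome and the fraction of sequences whose longest palindrome is A-U rich, can be sketched as follows. This is an illustration only: it assumes the palindrome is read as mirrored base pairs from the outside in, and the names `pair_ratios`, `is_au_rich` and `fraction_au_rich` are hypothetical.

```python
from collections import Counter

# Unordered pair types considered in the text: (A, U), (C, G) and (U, G).
PAIR_TYPE = {frozenset("AU"): "AU", frozenset("CG"): "CG", frozenset("GU"): "UG"}


def pair_ratios(palindrome):
    """Return the share of each pair type among the mirrored pairs."""
    s = palindrome.upper()
    counts = Counter()
    for i in range(len(s) // 2):
        kind = PAIR_TYPE.get(frozenset((s[i], s[-1 - i])))
        if kind is not None:  # non-pairing positions are ignored
            counts[kind] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in ("AU", "CG", "UG")}


def is_au_rich(palindrome, threshold=1 / 3):
    """A palindrome counts as A-U rich if its A-U share exceeds the 33.33% average."""
    return pair_ratios(palindrome)["AU"] > threshold


def fraction_au_rich(palindromes):
    """Fraction of palindromes that are A-U rich."""
    palindromes = list(palindromes)
    if not palindromes:
        return 0.0
    return sum(is_au_rich(p) for p in palindromes) / len(palindromes)
```

Applied to the longest palindrome of every sequence in one data set, `fraction_au_rich` yields a percentage of the kind reported above for each RNA type.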