PSI-Sigma
Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes.
However, previous PSI-detection methods are limited to specific types of AS events. PSI-Sigma is using a new splicing index (PSIΣ) that is more flexible, can incorporate novel junctions, and can compute PSI values of individual exons in complex splicing events.
docker pull docker.io/woodydon/psi_sigma_pipeline:4.0
singularity pull docker://woodydon/psi_sigma_pipeline:4.0
singularity pull docker.io/woodydon/psi_sigma_pipeline:4.0
singularity exec --bind /mnt:/mnt docker://woodydon/psi_sigma_pipeline:4.0 perl /usr/local/bin/PSI-Sigma-2.3/dummyai.pl --gtf Homo_sapiens.GRCh38.100.sorted.gtf --nread 10 --name PSIsigma2d1 --type 1 --fmode 3 --threads 6
Kuan-Ting (Woody) Lin, klin@cshl.edu
For short-read RNA-seq data, please generate .bam, .bai and .SJ.out files by using STAR (https://github.com/alexdobin/STAR).
###This is an example for short-read RNA-seq###
STAR --runThreadN 6 \
--outSAMtype BAM SortedByCoordinate \
--outFilterIntronMotifs RemoveNoncanonical \
--genomeDir ~/index/starR100H38 \
--twopassMode Basic \
--readFilesIn R1.fastq R2.fastq \
--outFileNamePrefix <NAME>.
samtools index <NAME>.Aligned.sortedByCoord.out.bam
For long-read RNA-seq data, please use GMAP (http://research-pub.gene.com/gmap/src/gmap-gsnap-2017-11-15.tar.gz) or minimap2 (https://github.com/lh3/minimap2).
###This is an example for long-read RNA-seq###
#Example of using GMAP#
~/gmap-2017-11-15/bin/gmap -d GRCh38 -f samse --min-trimmed-coverage=0.5 --no-chimeras -B 5 -t 6 MinION_long_read.fastq > <NAME>.sam
#Example of using minimap2#
~/minimap2-2.17/minimap2 -ax splice:hq -uf H38.fa MinION_long_read.fastq > <NAME>.sam
samtools view -bS <NAME>.sam > <NAME>.bam
samtools sort <NAME>.bam -o <NAME>.Aligned.sortedByCoord.out.bam
samtools index <NAME>.Aligned.sortedByCoord.out.bam
Quick Start ====== Create links to the .bam, .bai, and .SJ.out files in the a folder (afolder). If you are using long-read RNA-seq data, .SJ.out files will be generated automatically since GMAP doesn’t produce the file.
mkdir afolder
cd afolder
ln -s bamfolder/*.bam* .
ln -s bamfolder/*.SJ.* .
Download a .gtf file and sort the coordinates. (NOTE: sorting .gtf file is necessary!)
wget ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens//Homo_sapiens.GRCh38.87.gtf.gz
gzip -d Homo_sapiens.GRCh38.87.gtf.gz
(grep "^#" Homo_sapiens.GRCh38.87.gtf; grep -v "^#" Homo_sapiens.GRCh38.87.gtf | sort -k1,1 -k4,4n) > Homo_sapiens.GRCh38.87.sorted.gtf
rm Homo_sapiens.GRCh38.87.gtf
Create two files: (1) groupa.txt and (2) groupb.txt. Please put the full name or the suffix of your .bam files in the groupa.txt or groupb.txt. For example, the suffix of a “Sequins_MixA.Aligned.sortedByCoord.out.bam” file is “Sequins_MixA”. Groupa.txt will be compared with groupb.txt. Below is an example:
#Note: one file name per line in groupa.txt and groupb.txt
echo Sequins_MixA.Aligned.sortedByCoord.out.bam >> groupa.txt
echo Sequins_MixB.Aligned.sortedByCoord.out.bam >> groupb.txt
#Alternatively, you can put only the suffix (WARNNING: only works when the .bam files are linked to the working directory)
echo Sequins_MixA > groupa.txt
echo Sequins_MixB > groupb.txt
Run dummyai.pl. After the .gtf file, please specify 1 for short-read RNA-seq and 2 for long-read RNA-seq. The last column is used to specify the minimum number of supporting reads for an AS event (10 is specified in the example below).
#For short-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/dummyai.pl --gtf Homo_sapiens.GRCh38.87.sorted.gtf --name PSIsigma --type 1 -nread 10
#For long-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/dummyai.pl --gtf Homo_sapiens.GRCh38.87.sorted.gtf --name PSIsigma --type 2 -nread 10
That’s it. Filtered results (p<0.01) will be listed in the PSIsigma_r10_ir3.sorted.txt.
conda create -n PSIsigma r-essentials r-base perl-app-cpanminus
conda activate PSIsigma
conda install python=3.9
conda install -c conda-forge gcc gsl
cpanm PDL::LiteF
cpanm PDL::GSL::CDF
cpanm PDL::Stats
cpanm Statistics::Multtest
cpanm Statistics::R
# 1-a. If you are a sudo user. Set up working directory for Perl library (Using Perl version 5.18 as an example)
export PERL5LIB=/usr/local/lib/perl/5.18
# 1-b. If you are a local user, you can do like this (https://stackoverflow.com/questions/2980297/how-can-i-use-cpan-as-a-non-root-user)
wget -O- http://cpanmin.us | perl - -l ~/perl5 App::cpanminus local::lib
eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`
echo 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' >> ~/.bashrc
echo 'export MANPATH=$HOME/perl5/man:$MANPATH' >> ~/.bashrc
# 2. Install GSL
apt-get install -y git make g++ gcc python wget libgsl0-dev
# 3. Install PDL::GSL
cpan App::cpanminus
cpanm PDL::LiteF
cpanm PDL::GSL::CDF
cpanm PDL::Stats
cpanm Statistics::Multtest
cpanm Statistics::R
PSI-Sigma on Windows OS =========== PSI-Sigma has been tested in Linux and Mac OS environment. You can install Linux bash shell on Windows to run PSI-Sigma.
To use the PSIsigma-longread-gene-expression.pl:
perl ~/PSIsigma/PSIsigma-longread-gene-expression.pl Homo_sapiens.GRCh38.87.sorted.gtf Experiment.Aligned.sortedByCoord.out.bam
The default setting is using 4 CPUs to calculate gene expression levels by matching constitutive exons in the gene annotation. An extra perl extension (threads) is needed.
https://www.ncbi.nlm.nih.gov/pubmed/31135034
https://pubmed.ncbi.nlm.nih.gov/29449409
https://pubmed.ncbi.nlm.nih.gov/31578525
https://pubmed.ncbi.nlm.nih.gov/32001512/
https://pubmed.ncbi.nlm.nih.gov/34921016/
https://www.nature.com/articles/s41586-023-05820-3