PSI-Sigma

Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes. However, previous PSI-detection methods are limited to specific types of AS events. PSI-Sigma is using a new splicing index (PSI_Σ) that is more flexible, can incorporate novel junctions, and can compute PSI values of individual exons in complex splicing events.

PSI-Sigma is now published: https://www.ncbi.nlm.nih.gov/pubmed/31135034

Updates

PSI-Sigma workshop:

Docker/Singularity version:

docker pull docker.io/woodydon/psi_sigma_pipeline:4.0

singularity pull docker://woodydon/psi_sigma_pipeline:4.0

singularity pull docker.io/woodydon/psi_sigma_pipeline:4.0

Run PSI-Sigma with singularity (example):

singularity exec --bind /mnt:/mnt docker://woodydon/psi_sigma_pipeline:4.0 perl /usr/local/bin/PSI-Sigma-2.3/dummyai.pl --gtf Homo_sapiens.GRCh38.100.sorted.gtf --nread 10 --name PSIsigma2d1 --type 1 --fmode 3 --threads 6

The latest release: https://github.com/wososa/PSI-Sigma/releases/tag/v2.3
Try the “–help” and “–threads” function.
Papers using PSI-Sigma: Nature 2019 Nature 2023
Alignment file for nanopore long-read PCR-cDNA-seq of human U87 cells: https://dropfiles.cshl.edu/link/mpkT92runIvNJ3QeBvZPzC

AUTHOR/SUPPORT

Kuan-Ting (Woody) Lin, klin@cshl.edu

Alignment files

For short-read RNA-seq data, please generate .bam, .bai and .SJ.out files by using STAR (https://github.com/alexdobin/STAR).

###This is an example for short-read RNA-seq###
STAR --runThreadN 6 \
	--outSAMtype BAM SortedByCoordinate \
	--outFilterIntronMotifs RemoveNoncanonical \
	--genomeDir ~/index/starR100H38 \
	--twopassMode Basic \
	--readFilesIn R1.fastq R2.fastq \
	--outFileNamePrefix <NAME>.
samtools index <NAME>.Aligned.sortedByCoord.out.bam

For long-read RNA-seq data, please use GMAP (http://research-pub.gene.com/gmap/src/gmap-gsnap-2017-11-15.tar.gz) or minimap2 (https://github.com/lh3/minimap2).

###This is an example for long-read RNA-seq###
#Example of using GMAP#
~/gmap-2017-11-15/bin/gmap -d GRCh38 -f samse --min-trimmed-coverage=0.5 --no-chimeras -B 5 -t 6 MinION_long_read.fastq > <NAME>.sam
#Example of using minimap2#
~/minimap2-2.17/minimap2 -ax splice:hq -uf H38.fa MinION_long_read.fastq > <NAME>.sam

samtools view -bS <NAME>.sam > <NAME>.bam
samtools sort <NAME>.bam -o <NAME>.Aligned.sortedByCoord.out.bam
samtools index <NAME>.Aligned.sortedByCoord.out.bam

Quick Start ====== Create links to the .bam, .bai, and .SJ.out files in the a folder (afolder). If you are using long-read RNA-seq data, .SJ.out files will be generated automatically since GMAP doesn’t produce the file.

mkdir afolder
cd afolder
ln -s bamfolder/*.bam* .
ln -s bamfolder/*.SJ.* .

Download a .gtf file and sort the coordinates. (NOTE: sorting .gtf file is necessary!)

wget ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens//Homo_sapiens.GRCh38.87.gtf.gz
gzip -d Homo_sapiens.GRCh38.87.gtf.gz
(grep "^#" Homo_sapiens.GRCh38.87.gtf; grep -v "^#" Homo_sapiens.GRCh38.87.gtf | sort -k1,1 -k4,4n) > Homo_sapiens.GRCh38.87.sorted.gtf
rm Homo_sapiens.GRCh38.87.gtf

Create two files: (1) groupa.txt and (2) groupb.txt. Please put the full name or the suffix of your .bam files in the groupa.txt or groupb.txt. For example, the suffix of a “Sequins_MixA.Aligned.sortedByCoord.out.bam” file is “Sequins_MixA”. Groupa.txt will be compared with groupb.txt. Below is an example:

#Note: one file name per line in groupa.txt and groupb.txt
echo Sequins_MixA.Aligned.sortedByCoord.out.bam >> groupa.txt
echo Sequins_MixB.Aligned.sortedByCoord.out.bam >> groupb.txt

#Alternatively, you can put only the suffix (WARNNING: only works when the .bam files are linked to the working directory)
echo Sequins_MixA > groupa.txt
echo Sequins_MixB > groupb.txt

Run dummyai.pl. After the .gtf file, please specify 1 for short-read RNA-seq and 2 for long-read RNA-seq. The last column is used to specify the minimum number of supporting reads for an AS event (10 is specified in the example below).

#For short-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/dummyai.pl --gtf Homo_sapiens.GRCh38.87.sorted.gtf --name PSIsigma --type 1 -nread 10
#For long-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/dummyai.pl --gtf Homo_sapiens.GRCh38.87.sorted.gtf --name PSIsigma --type 2 -nread 10

That’s it. Filtered results (p<0.01) will be listed in the PSIsigma_r10_ir3.sorted.txt.

Filtered Results (p<0.01): PSIsigma_r10_ir3.sorted.txt
Unfiltered Results: PSIsigma_r10_ir3.txt
Junction Read File: *.SJ.out.tab
Intronic Read File: *.IR.out.tab
Database File: *.db
BAM File: *.bam
GTF File: *.gtf (http://useast.ensembl.org/info/data/ftp/index.html/)

OUTPUT

Event Region: Genomic coordinates of the splicing event.
Gene Symbol: Gene symbol of the splicing event.
Target Exon: Genomic coordinates of the alternative exon.
Event Type: Category of the splicing event.
N: the number of valid samples in groupa.txt (influenced by the number of supporting reads).
T: the number of valid samples in groupb.txt (influenced by the number of supporting reads).
Exon Type: Whether the exon is a novel exon or an exon related to nonsense mediated decay (NMD).
Reference Transcript: The transcript ID in the gene annotation file.
ΔPSI (%): the average difference of PSI values in groupa.txt and groupb.txt.
T-test p-value: p-value derived from two-sample t-test.
FDR (BH): false discovery rate based on the p-values.
N Values: It shows all valid PSI values derived from the .SJ.out.tab files based on groupa.txt. (influenced by the number of supporting reads).
T Values: It shows all valid PSI values derived from the .SJ.out.tab files based on groupb.txt. (influenced by the number of supporting reads).
Database ID: It shows the accession number of the splicing event in the database of PSI-Sigma (e.g., PSIsigma.db).

DATABASE CREATION (for advanced users)

Wikipage: https://github.com/wososa/PSI-Sigma/wiki/Database-creation-manual

SOFTWARE REQUIREMENTS

Perl (https://www.perl.org/get.html)
Samtools (http://www.htslib.org)
R (https://www.r-project.org) (For version 1.9k and when –adjp 2 is used)

Perl EXTENSIONS

PDL::LiteF
PDL::Stats
PDL::GSL::CDF
Statistics::Multtest
Statistics::R (For version 1.9k and when –adjp 2 is used)

EXAMPLE of INSTALLING Perl EXTENSIONS (Anaconda)

conda create -n PSIsigma r-essentials r-base perl-app-cpanminus
conda activate PSIsigma
conda install python=3.9
conda install -c conda-forge gcc gsl
cpanm PDL::LiteF
cpanm PDL::GSL::CDF
cpanm PDL::Stats
cpanm Statistics::Multtest
cpanm Statistics::R

EXAMPLE of INSTALLING Perl EXTENSIONS (without Anaconda)

# 1-a. If you are a sudo user. Set up working directory for Perl library (Using Perl version 5.18 as an example)
export PERL5LIB=/usr/local/lib/perl/5.18

# 1-b. If you are a local user, you can do like this (https://stackoverflow.com/questions/2980297/how-can-i-use-cpan-as-a-non-root-user)
wget -O- http://cpanmin.us | perl - -l ~/perl5 App::cpanminus local::lib
eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`
echo 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' >> ~/.bashrc
echo 'export MANPATH=$HOME/perl5/man:$MANPATH' >> ~/.bashrc

# 2. Install GSL
apt-get install -y git make g++ gcc python wget libgsl0-dev

# 3. Install PDL::GSL
cpan App::cpanminus
cpanm PDL::LiteF
cpanm PDL::GSL::CDF
cpanm PDL::Stats
cpanm Statistics::Multtest
cpanm Statistics::R

PSI-Sigma on Windows OS =========== PSI-Sigma has been tested in Linux and Mac OS environment. You can install Linux bash shell on Windows to run PSI-Sigma.

Linux Bash Shell on Windows: https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/

Gene Expression Analysis for nanopore long-read RNA-seq

To use the PSIsigma-longread-gene-expression.pl:

perl ~/PSIsigma/PSIsigma-longread-gene-expression.pl Homo_sapiens.GRCh38.87.sorted.gtf Experiment.Aligned.sortedByCoord.out.bam

The default setting is using 4 CPUs to calculate gene expression levels by matching constitutive exons in the gene annotation. An extra perl extension (threads) is needed.

CITATION

https://www.ncbi.nlm.nih.gov/pubmed/31135034

Lin, K. T. & Krainer, A. R. PSI-Sigma: a comprehensive splicing-detection method for short-read and long-read RNA-seq analysis. Bioinformatics, doi:10.1093/bioinformatics/btz438 (2019).

PSI-Sigma PRESENTATION

Oxford Nanopore London Calling 2019: https://vimeo.com/339511487

PUBLICATIONS USING PSI-Sigma

https://pubmed.ncbi.nlm.nih.gov/29449409

K. T. Lin, W. K. Ma, J. Scharner et al., A human-specific switch of alternatively spliced AFMID isoforms contributes to TP53 mutations and tumor recurrence in hepatocellular carcinoma. Genome Res, (2018).

https://pubmed.ncbi.nlm.nih.gov/31578525

A. Yoshimi, K. T. Lin, D. H. Wiseman et al., Coordinated alterations in RNA splicing and epigenetic regulation drive leukaemogenesis. Nature 574, 273-277 (2019).

https://pubmed.ncbi.nlm.nih.gov/32001512/

M. A. Rahman, K. T. Lin, R. K. Bradley et al., Recurrent SRSF2 mutations in MDS affect both splicing and NMD. Genes Dev 34, 413-427 (2020).

https://pubmed.ncbi.nlm.nih.gov/34921016/

W. K. Ma, D. M. Voss, J. Scharner et al., ASO-based PKM splice-switching therapy inhibits hepatocellular carcinoma growth. Cancer Res, (2021).

https://www.nature.com/articles/s41586-023-05820-3

A. Jbara, K. T. Lin, C. Stossel et al., RBFOX2 modulates a metastatic signature of alternative splicing in pancreatic cancer. Nature, (2023).

Commercial Use

For licensing, please contact CSHL tech transfer office: narayan@cshl.edu