Setup

System requirements

| Hardware/Software | Requirement |
|---|---|
| Operating system | KGGSEE runs in a Java Virtual Machine, so it works on any operating system that has a suitable JRE. |
| Java Runtime Environment | A Java SE Runtime Environment of version 1.8 or higher is needed. |
| CPU | A CPU with four or more cores is recommended. |
| Memory | 16 GB of RAM or more is recommended. |
| Free space | KGGSEE and related datasets may take up to 10 GB. |
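
As a rough pre-flight check of the CPU and disk recommendations above, the following Python sketch reports the core count and the free space on the target drive. It is not part of KGGSEE: the function name check_host and its parameters are hypothetical, and RAM is left out because the Python standard library has no portable call for it.

```python
import os
import shutil


def check_host(path=".", min_cores=4, min_free_gb=10):
    """Compare this machine with the recommended core count and free disk space."""
    cores = os.cpu_count() or 0  # cpu_count() may return None on some platforms
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cores": cores,
        "cores_ok": cores >= min_cores,
        "free_gb": round(free_gb, 1),
        "space_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    # Check the drive that holds the current working directory.
    for key, value in check_host(".").items():
        print(f"{key}: {value}")
```

Because the thresholds are recommendations rather than hard limits, a False value here is a warning, not a blocker.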

Set up a Java Runtime Environment (JRE)

KGGSEE needs JRE 1.8 or higher. Both the Java(TM) SE JRE and the OpenJDK JRE are suitable.

After installing a JRE, verify it by entering java -version in a terminal on Linux/macOS, or in CMD/PowerShell on MS Windows. If the output includes a line such as Java(TM) SE Runtime Environment (build x) or OpenJDK Runtime Environment (build x), the JRE is set up. Otherwise, check that a JRE is installed and that java is in your PATH.
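
The same check can be scripted. The sketch below is a minimal Python example, not a KGGSEE utility: the function names parse_java_major and installed_java_major are hypothetical. It runs java -version and extracts the major version, treating the legacy 1.x numbering (1.8 means Java 8) and the newer numbering (11, 17, ...) alike.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8; newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version`; return the major version, or None if java cannot be run."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None
    # The version banner is normally written to stderr, so scan both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("java was not found or its version could not be parsed; check the installation and PATH")
    elif major < 8:
        print(f"Java {major} found, but KGGSEE needs JRE 1.8 (Java 8) or higher")
    else:
        print(f"Java {major} found; the JRE requirement is met")
```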

KGGSEE and its running resources

KGGSEE is written in Java and distributed as a Java Archive, kggsee.jar. To perform an analysis, the corresponding running resources are also needed. For example, reference genotypes and gene annotations are needed for gene-based association tests (GATES and ECS) and heritability estimation (EHE); in addition, eQTL summary statistics are needed for gene-expression causal-effect estimation (EMIC). Thus, kggsee.jar is always needed, whereas the resource files required depend on the analysis. We provide the following download links.

| File | Description | Size |
|---|---|---|
| kggsee.jar | The KGGSEE program | 46 MB |
| resources/ | A OneDrive folder containing all running resource files provided by us | |
| resources.zip | Running resource files except for reference genotypes and eQTL summary statistics | 362 MB |
| tutorials.zip | A tutorial dataset to run through the four types of analyses | 155 MB |

Set up an environment for the Quick tutorials

A quick and easy way to set up an environment for the Quick tutorials is:

  1. Download kggsee.jar, resources.zip and tutorials.zip

  2. Unzip resources.zip and tutorials.zip

  3. Put kggsee.jar, resources/ and tutorials/ under one directory

where resources.zip contains:

| File | Description |
|---|---|
| resources/{hg19,hg38}/kggseqv1.1_{hg19,hg38}_GEncode.txt.gz | GENCODE annotations |
| resources/{hg19,hg38}/kggseqv1.1_{hg19,hg38}_refGene.txt.gz | RefGene annotations |
| resources/HgncGene.txt.gz | HGNC gene ID |
| resources/ENSTGene.gz | Ensembl gene ID and transcript ID |
| resources/*.symbols.gmt.gz | MSigDB gene sets |
| resources/GTEx_v8_TMM_all.gene.meanSE.txt.gz | The gene-level expression profile of the GTEx v8 tissues |
| resources/GTEx_v8_TMM_all.transcript.meanSE.txt.gz | The transcript-level expression profile of the GTEx v8 tissues |

and tutorials.zip contains:

| File | Description |
|---|---|
| tutorials/scz_gwas_eur_chr1.tsv.gz | Chromosome 1 summary statistics of a schizophrenia GWAS with a European sample |
| tutorials/1kg_hg19_eur_chr1.vcf.gz | Chromosome 1 genotypes of the European panel of the 1000 Genomes Project |
| tutorials/GTEx_v8_gene_BrainBA9.eqtl.txt.gz | eQTL summary statistics calculated from the brain BA9 gene-level expression profile of GTEx v8 |
| tutorials/GTEx_v8_transcript_BrainBA9.eqtl.txt.gz | eQTL summary statistics calculated from the brain BA9 transcript-level expression profile of GTEx v8 |
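
The three setup steps for the Quick tutorials can be scripted once the downloads sit in one directory. The sketch below is a minimal Python example under one assumption: that each archive unpacks into a top-level resources/ or tutorials/ folder, as the file listings above suggest. The function names unpack_archives and missing_items are hypothetical and not part of KGGSEE.

```python
import zipfile
from pathlib import Path


def unpack_archives(workdir, archives=("resources.zip", "tutorials.zip")):
    """Extract each archive found in `workdir` into `workdir`; return the names extracted."""
    workdir = Path(workdir)
    extracted = []
    for name in archives:
        archive = workdir / name
        if archive.is_file():
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(workdir)
            extracted.append(name)
    return extracted


def missing_items(workdir, expected=("kggsee.jar", "resources", "tutorials")):
    """Return the expected entries that are not present directly under `workdir`."""
    workdir = Path(workdir)
    return [item for item in expected if not (workdir / item).exists()]


if __name__ == "__main__":
    here = Path.cwd()  # run this from the directory that holds the downloads
    unpack_archives(here)
    missing = missing_items(here)
    print("Layout is complete" if not missing else f"Still missing: {', '.join(missing)}")
```

If the archives turn out to unpack differently, move the extracted folders so that resources/ and tutorials/ sit next to kggsee.jar.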

Set up an environment for customized analyses

In addition to the files packaged in resources.zip, reference genotypes of five 1000 Genomes Project super populations and eQTL summary statistics of 49 GTEx v8 tissues are also available for downloading under resources/:

| File | Description |
|---|---|
| resources/hg19/gty/*.vcf.gz | VCF files of each super-population panel of the 1000 Genomes Project using hg19 coordinates. Each VCF file includes biallelic variants with MAF>0.01 of the super population. The VCF files include autosomes and chrX. |
| resources/hg38/gty/*.vcf.gz | VCF files of each super-population panel of the 1000 Genomes Project using hg38 coordinates. Each VCF file includes biallelic variants with MAF>0.01 of the super population. The VCF files include only autosomes. |
| resources/hg19/eqtl/*.eqtl.txt.gz | cis-eQTL summary statistics using hg19 coordinates calculated from the gene or transcript-level expression profile of the GTEx v8 dataset |
| resources/hg38/eqtl/*.eqtl.txt.gz | cis-eQTL summary statistics using hg38 coordinates calculated from the gene or transcript-level expression profile of the GTEx v8 dataset |

Then, a straightforward way to set up an environment for customized analyses is:

  1. Download kggsee.jar and resources.zip.

  2. Unzip resources.zip, and put kggsee.jar and resources/ under one directory.

  3. Download the reference genotypes (1kg_hg19 or 1kg_hg38) of the population that matches your GWAS.

  4. To run EMIC or eDESE, also download the eQTL summary statistics (eqtl_hg19 or eqtl_hg38) of phenotype-associated tissues.

  5. To prepare customized resource files, refer to the Detailed Document for descriptions of the file formats.
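
Before launching a customized analysis, it can help to confirm which reference genotype and eQTL files have been downloaded. The sketch below is a minimal Python example that assumes the downloads are stored locally under the same gty/ and eqtl/ paths listed in the table above; the function name find_resources is hypothetical and not part of KGGSEE.

```python
from pathlib import Path


def find_resources(resources_dir, build="hg19"):
    """List downloaded reference genotype and eQTL files for one genome build."""
    if build not in ("hg19", "hg38"):
        raise ValueError("build must be 'hg19' or 'hg38'")
    base = Path(resources_dir) / build

    def names(subdir, pattern):
        folder = base / subdir
        # A folder that has not been created yet simply yields an empty list.
        return sorted(p.name for p in folder.glob(pattern)) if folder.is_dir() else []

    return {
        "genotypes": names("gty", "*.vcf.gz"),
        "eqtl": names("eqtl", "*.eqtl.txt.gz"),
    }


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains resources/.
    for kind, files in find_resources("resources", "hg19").items():
        print(f"{kind}: {len(files)} file(s)")
```

An empty eqtl list is expected unless EMIC or eDESE will be run, since the other analyses do not need eQTL summary statistics.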