The following datasets were used in our benchmarks:

Name Species Description
H0 HG37.1 The dataset H0 is composed of 10 million reads of length 40 bp. H0 contains 49 reads with N. The quality file H0.fq associated with the read file H0.fa contains quality J for all reads and all positions.
H1 HG37.1 The dataset H1 is composed of 10 million reads of length 40 bp. H1 is built from H0 by adding exactly one mismatch to each read. H1 contains 49 reads with N. The quality file H1.fq associated with the read file H1.fa contains quality J for all reads and all positions.
H2 HG37.1 The dataset H2 is composed of 10 million reads of length 40 bp. H2 is built from H0 by adding exactly two mismatches to each read. H2 contains 49 reads with N. The quality file H2.fq associated with the read file H2.fa contains quality J for all reads and all positions.
H3 HG37.1 The dataset H3 is composed of 10 million single-end reads. Mutations were added with an in-house Perl script, using the H0 dataset as the reference.
Hl0 HG37.1 The dataset Hl0 is composed of 10 million reads of length 100 bp. Hl0 contains 113 reads with N. The quality file Hl0.fq associated with the read file Hl0.fa contains quality J for all reads and all positions.
Hl1 HG37.1 The dataset Hl1 is composed of 10 million reads of length 100 bp. Hl1 is built from Hl0 by adding exactly one mismatch to each read. Hl1 contains 112 reads with N. The quality file Hl1.fq associated with the read file Hl1.fa contains quality J for all reads and all positions.
Hl2 HG37.1 The dataset Hl2 is composed of 10 million reads of length 100 bp. Hl2 is built from Hl0 by adding exactly two mismatches to each read. Hl2 contains 112 reads with N. The quality file Hl2.fq associated with the read file Hl2.fa contains quality J for all reads and all positions.
Hl3 HG37.1 The dataset Hl3 is composed of 10 million reads of length 100 bp. Hl3 is built from Hl0 by adding exactly three mismatches to each read. Hl3 contains 111 reads with N. The quality file Hl3.fq associated with the read file Hl3.fa contains quality ! for all reads and all positions.
Hl3d HG37.1 The dataset Hl3d is composed of 10 million reads of length 97 bp. Hl3d is built from Hl0 by deleting 3 bp from each read. Hl3d contains 111 reads with N.
Hl3i HG37.1 The dataset Hl3i is composed of 10 million reads of length 103 bp. Hl3i is built from Hl0 by adding 3 bp to each read. Hl3i contains 112 reads with N.
Hp0 HG37.1 The dataset Hp0 is composed of 5 million pairs of reads of length 100 bp. Hp0 does not contain any read with N. The quality file Hp0.fq associated with the read file Hp0.fa contains quality I for all reads and all positions.
Hp1 HG37.1 The dataset Hp1 is composed of 5 million pairs of reads of length 100 bp. Hp1 does not contain any read with N. The quality file Hp1.fq associated with the read file Hp1.fa contains quality I for all reads and all positions.
Hp2 HG37.1 The dataset Hp2 is composed of 5 million pairs of reads of length 100 bp. Hp2 does not contain any read with N. The quality file Hp2.fq associated with the read file Hp2.fa contains quality I for all reads and all positions.
Hp3 HG37.1 The dataset Hp3 is composed of 5 million pairs of reads of length 100 bp. Hp3 does not contain any read with N. The quality file Hp3.fq associated with the read file Hp3.fa contains quality I for all reads and all positions.
Hp3d HG37.1 The dataset Hp3d is composed of 5 million pairs of reads of length 97 bp. Hp3d is built from Hp0 by deleting 3 bp from each read. Hp3d does not contain any read with N. The quality file Hp3d.fq associated with the read file Hp3d.fa contains quality I for all reads and all positions.
Hp3i HG37.1 The dataset Hp3i is composed of 5 million pairs of reads of length 103 bp. Hp3i is built from Hp0 by adding 3 bp to each read. Hp3i does not contain any read with N. The quality file Hp3i.fq associated with the read file Hp3i.fa contains quality I for all reads and all positions.
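
The derived datasets above are described as base reads modified by a fixed number of mismatches, deletions, or added bases, with a constant quality character at every position. The Python sketch below illustrates one way such reads could be produced. It is not the script that was actually used: the function names are hypothetical, and it assumes uppercase reads, edit positions chosen uniformly at random, no substitutions at N positions, and deleted or inserted bases that need not be contiguous.

```python
import random

BASES = "ACGT"

def add_mismatches(read, k, rng):
    """Return a copy of `read` with exactly k substituted positions."""
    # Assumption: only unambiguous bases are substituted, so N positions are kept
    # and every substitution is a true mismatch against the original read.
    candidates = [i for i, b in enumerate(read) if b in BASES]
    if k > len(candidates):
        raise ValueError("read has fewer than k substitutable positions")
    chars = list(read)
    for i in rng.sample(candidates, k):
        chars[i] = rng.choice([b for b in BASES if b != chars[i]])
    return "".join(chars)

def delete_bases(read, k, rng):
    """Return `read` with k positions removed (length shrinks by k)."""
    drop = set(rng.sample(range(len(read)), k))
    return "".join(b for i, b in enumerate(read) if i not in drop)

def insert_bases(read, k, rng):
    """Return `read` with k random bases inserted (length grows by k)."""
    chars = list(read)
    for _ in range(k):
        # randint is inclusive, so a base may also be appended at the end.
        chars.insert(rng.randint(0, len(chars)), rng.choice(BASES))
    return "".join(chars)

def fastq_record(name, read, quality_char="J"):
    """Format one FASTQ record with the same quality character at every position."""
    return f"@{name}\n{read}\n+\n{quality_char * len(read)}\n"
```

Under these assumptions, calling `add_mismatches(read, 1, random.Random(42))` on each H0 read would give reads related to H0 the way H1 is described, and `delete_bases(read, 3, rng)` turns a 100 bp read into a 97 bp read, matching the lengths listed for Hl3d and Hp3d.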