pgs-calc occasionally creates 2 lines for the same SNP #21

@jakewendt

Description

If I download all of the harmonized PGS files and create a collection matrix like so:

module load htslib
pgs-calc create-collection --out=hg19.0001-0999.txt.gz PGS000???.txt.gz

When I then look at the chromosome, position, and allele columns, there are occasional duplicates.

zcat hg19.0001-0999.txt.gz| cut -f1-4 | tail -n +6 | uniq -D
1	3329384	T	C
1	3329384	T	C
1	8481016	G	T
1	8481016	G	T
1	27138393	T	C
1	27138393	T	C
1	43926305	C	T
1	43926305	C	T
1	55496039	C	T
1	55496039	C	T
1	55505647	T	G
1	55505647	T	G
1	55518752	T	C
1	55518752	T	C
1	55638546	C	T
1	55638546	C	T
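
Note that `uniq -D` only reports adjacent repeats, so the check above depends on rows for the same variant sitting next to each other. A minimal sketch of an order-independent count follows. The function name `count_dup_keys` is hypothetical, and it assumes tab-separated rows whose first four columns are chromosome, position, and the two alleles, with the header lines already stripped (as `tail -n +6` does above).

```shell
# Hypothetical helper: count how often each (chrom, pos, allele, allele) key
# occurs on stdin and print only keys seen more than once.
# Output columns: count, chrom, pos, allele1, allele2 (tab-separated).
count_dup_keys() {
  awk -F'\t' '
    NF >= 4 { seen[$1 FS $2 FS $3 FS $4]++ }
    END {
      for (k in seen)
        if (seen[k] > 1) print seen[k] "\t" k
    }
  ' | sort -k2,2 -k3,3n   # awk iteration order is unspecified, so sort
}

# Possible usage (file name taken from the commands above):
#   zcat hg19.0001-0999.txt.gz | tail -n +6 | count_dup_keys
```

This holds one counter per distinct variant in memory, which may be significant for a large collection.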

These are not fully duplicated lines: the key columns match, but the score values differ between the two rows.

zgrep -m2 -P "^1\t8481016\t" hg19.0001-0999.txt.gz | cut -f1-400


1	8481016	G	T													3.0712999432580546E-6	4.069399892614456E-6		4.505800006882055E-6	1.2084999980288558E-5									1.1064999853260815E-5																	0.004041288048028946	0.003549806075170636	0.004646944813430309											-0.028402794152498245								2.8564030799316242E-5	7.387383084278554E-5	-3.2502509839105187E-6	-1.929324071170413E-6	2.081734919556766E-6	2.043977929133689E-6										4.1333998524351045E-5	-6.75490009598434E-5	9.596299787517637E-5	-9.73819987848401E-6	1.4049000128579792E-5	-2.9073000860080356E-6	-1.65940000442788E-5	-1.0808000297402032E-5	-3.0400999094126746E-5	-3.516599826980382E-5	-1.6987999629236583E-7	-2.8755999665008858E-5	-1.453099957871018E-5	2.471699917805381E-5	-5.8814999647438526E-5	-8.305600204039365E-5	-5.16420004714746E-6	1.023700024234131E-4	-7.3579999479989056E-6	-6.868899799883366E-5	3.448999996180646E-5	1.6570000298088416E-5	-7.298200216609985E-5	-1.868599938461557E-4	-1.2524000339908525E-5	-3.9826001739129424E-5	-2.909499926317949E-5	1.7221000234712847E-5	-8.147399785229936E-5	-6.11930008744821E-5	-2.6996000087819993E-5	-5.572300142375752E-5	-1.0186999861616641E-4	1.49670004248037E-5	-1.1559999984456226E-4	3.402599986657151E-6	-1.7652999667916447E-4	5.640900053549558E-5	-5.7856000239553396E-6	-8.356900252692867E-6	-2.219700036221184E-5	-2.670499998203013E-5	9.222499647876248E-5	-2.1779000235255808E-5	-3.127099989796989E-5	-1.8528000509832054E-4	-1.1295000149402767E-4	-1.238600020769809E-6	-3.0007000532350503E-5	2.6350999178248458E-5	-7.046200335025787E-5	-7.747799827484414E-5	-1.4356999599840492E-4	7.540699880337343E-5	2.1323000964912353E-6	-1.2088000221410766E-4	1.6364999737561448E-6	4.875100057688542E-5	-8.55269972817041E-5	-7.091499719535932E-5	-2.8275999284232967E-5	-2.132400004484225E-5	-1.0472000212757848E-5	4.7674998882030195E-7	-1.9876000806107186E-5	-3.560199911589734E-5	-4.531499871518463E-5	-2.5224999990314245E-4	-3.1480001780437306E-5	-9.217100341629703E-6	-4.5484001020668074E-5	-3.2185998861677945E-5	
3.655600085039623E-5	3.8901998777873814E-5	1.0823999900821946E-6	-9.233399759978056E-5	-8.030299795791507E-6	-8.331199933309108E-5	-2.3308000891120173E-5	3.6397000258148182E-6														-9.346699807792902E-4	0.0014767489628866315		2.2372399689629674E-4	8.428919827565551E-4							2.5899000775098102E-6	-6.570900040969718E-6	3.390500069144764E-6	-1.040600000123959E-5	1.7372000229443074E-6		-2.000286076508928E-5																			-1.5007520232757088E-5				-1.2961399988853373E-5	-6.78490505379159E-6				1.1935720067413058E-5		-5.928923928877339E-5	-2.3903310648165643E-5							2.8370230211294256E-5			5.388775025494397E-5		-6.038292212906526E-6		2.2552070731762797E-5			-4.0797400288283825E-5	3.044594905077247E-6		4.537521817837842E-6					6.44251995254308E-5	6.44251995254308E-5	6.753030902473256E-6		
1	8481016	G	T																0.0040430729277431965	0.004129755776375532

I think this is causing problems downstream, but I can't confirm that until I can correct the duplicates.
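
One possible interim workaround is to collapse adjacent rows that share the same key into a single row. The sketch below is not pgs-calc behaviour, and the function name `merge_adjacent_dups` is hypothetical. It assumes the duplicate rows are adjacent (as the `uniq -D` output suggests) and that each score column is filled in at most one of them. If two rows carry different non-empty values in the same column, it keeps the first value and prints a warning to stderr.

```shell
# Hypothetical helper: merge adjacent rows that share the first four columns.
# Empty fields in the first row are filled from later rows with the same key.
merge_adjacent_dups() {
  awk -F'\t' -v OFS='\t' '
    # Print the row currently being accumulated, if any.
    function emit(   i, line) {
      if (!have) return
      line = cur[1]
      for (i = 2; i <= width; i++) line = line OFS cur[i]
      print line
    }
    {
      key = $1 FS $2 FS $3 FS $4
      if (have && key == prev) {
        for (i = 1; i <= NF; i++) {
          if (cur[i] == "") cur[i] = $i
          else if ($i != "" && $i != cur[i])
            print "conflict at " key " column " i ": keeping " cur[i] > "/dev/stderr"
        }
        if (NF > width) width = NF
      } else {
        emit()
        split("", cur)                       # portable way to clear the array
        for (i = 1; i <= NF; i++) cur[i] = $i
        width = NF; prev = key; have = 1
      }
    }
    END { emit() }
  '
}

# Possible usage, assuming the 5 header lines implied by `tail -n +6` above:
#   { zcat in.txt.gz | head -n 5
#     zcat in.txt.gz | tail -n +6 | merge_adjacent_dups; } | bgzip > merged.txt.gz
```

Whether merging is the right fix depends on why the two rows are produced in the first place, so the result should be checked against the individual score files.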
