pgs-calc occasionally creates 2 lines for the same SNP
If I download all of the harmonized PGS files and create a collection matrix like so ...
module load htslib
pgs-calc create-collection --out=hg19.0001-0999.txt.gz PGS000???.txt.gz
If I then look at the chromosome, position and alleles, there is occasionally a duplicate.
zcat hg19.0001-0999.txt.gz | cut -f1-4 | tail -n +6 | uniq -D
1 3329384 T C
1 3329384 T C
1 8481016 G T
1 8481016 G T
1 27138393 T C
1 27138393 T C
1 43926305 C T
1 43926305 C T
1 55496039 C T
1 55496039 C T
1 55505647 T G
1 55505647 T G
1 55518752 T C
1 55518752 T C
1 55638546 C T
1 55638546 C T
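Since `uniq -D` only reports repeats that sit on adjacent lines, a count keyed on the first four columns can serve as a cross-check. This is a minimal sketch, not part of pgs-calc: `count_dup_keys` is a hypothetical helper name, and it assumes the collection file is tab-separated with chromosome, position and the two alleles in columns 1-4, and that the first 5 lines are header lines (mirroring `tail -n +6` above).

```shell
# Hypothetical helper: print every chr/pos/allele key that occurs more than
# once, preceded by its count. Assumes a tab-separated, gzip-compressed file.
#   $1: collection file (.txt.gz)
#   $2: number of leading header lines to skip (assumed default: 5)
# Usage: count_dup_keys hg19.0001-0999.txt.gz
count_dup_keys() {
  gzip -dc "$1" |
    awk -F'\t' -v skip="${2:-5}" '
      # Count occurrences of each key built from columns 1-4.
      NR > skip { n[$1 FS $2 FS $3 FS $4]++ }
      # Report only keys seen more than once (output order is unspecified).
      END { for (k in n) if (n[k] > 1) print n[k] "\t" k }
    '
}
```

Piping the result through `wc -l` should give the number of affected positions, whether or not the duplicated rows are adjacent in the file.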
The duplicated rows are not identical lines, as their score values are different.
zgrep -m2 "^1 8481016 " hg19.0001-0999.txt.gz | cut -f1-400
1 8481016 G T 3.0712999432580546E-6 4.069399892614456E-6 4.505800006882055E-6 1.2084999980288558E-5 1.1064999853260815E-5 0.004041288048028946 0.003549806075170636 0.004646944813430309 -0.028402794152498245 2.8564030799316242E-5 7.387383084278554E-5 -3.2502509839105187E-6 -1.929324071170413E-6 2.081734919556766E-6 2.043977929133689E-6 4.1333998524351045E-5 -6.75490009598434E-5 9.596299787517637E-5 -9.73819987848401E-6 1.4049000128579792E-5 -2.9073000860080356E-6 -1.65940000442788E-5 -1.0808000297402032E-5 -3.0400999094126746E-5 -3.516599826980382E-5 -1.6987999629236583E-7 -2.8755999665008858E-5 -1.453099957871018E-5 2.471699917805381E-5 -5.8814999647438526E-5 -8.305600204039365E-5 -5.16420004714746E-6 1.023700024234131E-4 -7.3579999479989056E-6 -6.868899799883366E-5 3.448999996180646E-5 1.6570000298088416E-5 -7.298200216609985E-5 -1.868599938461557E-4 -1.2524000339908525E-5 -3.9826001739129424E-5 -2.909499926317949E-5 1.7221000234712847E-5 -8.147399785229936E-5 -6.11930008744821E-5 -2.6996000087819993E-5 -5.572300142375752E-5 -1.0186999861616641E-4 1.49670004248037E-5 -1.1559999984456226E-4 3.402599986657151E-6 -1.7652999667916447E-4 5.640900053549558E-5 -5.7856000239553396E-6 -8.356900252692867E-6 -2.219700036221184E-5 -2.670499998203013E-5 9.222499647876248E-5 -2.1779000235255808E-5 -3.127099989796989E-5 -1.8528000509832054E-4 -1.1295000149402767E-4 -1.238600020769809E-6 -3.0007000532350503E-5 2.6350999178248458E-5 -7.046200335025787E-5 -7.747799827484414E-5 -1.4356999599840492E-4 7.540699880337343E-5 2.1323000964912353E-6 -1.2088000221410766E-4 1.6364999737561448E-6 4.875100057688542E-5 -8.55269972817041E-5 -7.091499719535932E-5 -2.8275999284232967E-5 -2.132400004484225E-5 -1.0472000212757848E-5 4.7674998882030195E-7 -1.9876000806107186E-5 -3.560199911589734E-5 -4.531499871518463E-5 -2.5224999990314245E-4 -3.1480001780437306E-5 -9.217100341629703E-6 -4.5484001020668074E-5 -3.2185998861677945E-5 3.655600085039623E-5 3.8901998777873814E-5 1.0823999900821946E-6 
-9.233399759978056E-5 -8.030299795791507E-6 -8.331199933309108E-5 -2.3308000891120173E-5 3.6397000258148182E-6 -9.346699807792902E-4 0.0014767489628866315 2.2372399689629674E-4 8.428919827565551E-4 2.5899000775098102E-6 -6.570900040969718E-6 3.390500069144764E-6 -1.040600000123959E-5 1.7372000229443074E-6 -2.000286076508928E-5 -1.5007520232757088E-5 -1.2961399988853373E-5 -6.78490505379159E-6 1.1935720067413058E-5 -5.928923928877339E-5 -2.3903310648165643E-5 2.8370230211294256E-5 5.388775025494397E-5 -6.038292212906526E-6 2.2552070731762797E-5 -4.0797400288283825E-5 3.044594905077247E-6 4.537521817837842E-6 6.44251995254308E-5 6.44251995254308E-5 6.753030902473256E-6
1 8481016 G T 0.0040430729277431965 0.004129755776375532
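To see how the values are distributed across the rows for one position, it may help to count the non-empty value columns on each matching row. The following is only a diagnostic sketch under stated assumptions: `count_values_at` is a hypothetical helper, the file is assumed to be tab-separated, and the score values are assumed to start at column 5.

```shell
# Hypothetical helper: for each row matching a chromosome and position, print
# the line number, both alleles, the number of non-empty value columns, and
# the total number of value columns (assumed to start at column 5).
#   $1: collection file (.txt.gz)   $2: chromosome   $3: position
# Usage: count_values_at hg19.0001-0999.txt.gz 1 8481016
count_values_at() {
  gzip -dc "$1" |
    awk -F'\t' -v chr="$2" -v pos="$3" '
      $1 == chr && $2 == pos {
        filled = 0
        for (i = 5; i <= NF; i++) if ($i != "") filled++
        print NR "\t" $3 "\t" $4 "\t" filled "\t" (NF - 4)
      }
    '
}
```

Comparing the last two output columns between the rows shows whether each row carries a full set of values or only some of them.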
I think this is causing problems downstream, but I can't confirm that until I can correct it.