The Queensland Brain Institute |
The missing data issue and the data resurrection miracle
[ElCierne ]
April 14, 2023
What is the missing data issue?
Critical run files are missing or corrupt after the run folder was transferred from the HiSeq storage to the cluster storage.
• Consequences
– config.xml might need to be corrected
– Missing *.bcl and *.stats files can be recreated
– Missing *.filter or *.pos.txt files cause the loss of a tile
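The consequences on this slide amount to a small decision rule, which can be sketched in Python. This is only an illustration: the function name and category labels are hypothetical, and matching both `.pos.txt` and `_pos.txt` is an assumption about how position files are named on disk.

```python
from pathlib import PurePath

# File types the slide lists as recreatable.
RECREATABLE = (".bcl", ".stats")
# File types whose loss costs the whole tile. Accepting "_pos.txt" as well as
# ".pos.txt" is an assumption about the naming of position files on disk.
TILE_LOST = (".filter", ".pos.txt", "_pos.txt")


def classify_missing(filename: str) -> str:
    """Return 'recreatable', 'tile_lost' or 'unknown' for a missing run file."""
    name = PurePath(filename).name.lower()
    if name.endswith(RECREATABLE):
        return "recreatable"
    if name.endswith(TILE_LOST):
        return "tile_lost"
    return "unknown"
```

Anything that comes back as `unknown` (for example config.xml) still needs a manual look.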
What causes the missing data issue?
• Files are not transferred correctly
– Millisecond hang-ups of the network, which are not recognized by Windows
• RTA did not generate the files in the first place
– HiSeq computer overload
– Mismanagement of parallel threads (two processes accessing the same file)
Why is it an issue?
• Usual workflow crashes: bclConverter does not proceed if there are missing files.
Solutions to recoverable missing data issues
1. Copy *.stats from the same tile of a different cycle
– PRO: fast
– CON: fudge, trusts RTA, requires a separate workflow for missing *.bcl files
2. Recalculate *.stats from *.dif, *.filter and *.bcl (Sanger)
– PRO: accurate and fast
– CON: requires a separate workflow for missing *.bcl files, trusts RTA
3. Calculate *.qseq from *.cif for the missing tile (QBI)
– PRO: handles missing *.stats and *.bcl
– CON: slow, trusts RTA
4. Calculate *.qseq from *.cif for all tiles
– PRO: handles missing *.stats and *.bcl; recalculates everything, so no potentially corrupt RTA bcl/stats files are used
– CON: slow (days)
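To show how lightweight (and how much of a fudge) option 1 is, here is a minimal Python sketch. It assumes a lane directory laid out as `C<cycle>.1/s_<lane>_<tile>.stats`, which is not stated on the slide. `borrow_stats` is a hypothetical helper, not part of any Illumina or Sanger tool.

```python
import shutil
from pathlib import Path


def borrow_stats(lane_dir: Path, lane: int, tile: int, cycle: int, n_cycles: int) -> Path:
    """Fill a missing .stats file with a copy from the nearest cycle of the same tile.

    Returns the path of the file that ended up being used (the existing
    target, or the donor that was copied).
    """
    def stats_path(c: int) -> Path:
        # Assumed layout: <lane_dir>/C<cycle>.1/s_<lane>_<tile>.stats
        return lane_dir / f"C{c}.1" / f"s_{lane}_{tile}.stats"

    target = stats_path(cycle)
    if target.exists():
        return target  # nothing to repair

    # Try the closest cycles first; ties go to the earlier cycle.
    others = (c for c in range(1, n_cycles + 1) if c != cycle)
    for c in sorted(others, key=lambda c: abs(c - cycle)):
        donor = stats_path(c)
        if donor.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(donor, target)
            return donor
    raise FileNotFoundError(f"no donor .stats file found for tile {tile}")
```

The copied statistics belong to a different cycle, so this only gets the converter past the gap; it does not restore the real values.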
New workflow with OLB
Identify the missing files, calculate *.qseq files for the affected tiles with OLB, and merge them with the *.qseq files from the normal workflow before proceeding.
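The "identify missing files" step could look roughly like the Python sketch below. It assumes the BaseCalls layout `L<lane, 3 digits>/C<cycle>.1/s_<lane>_<tile>.bcl` (and `.stats`) and treats empty files as missing; both are assumptions rather than facts from the slide. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def expected_files(lane: int, cycles: Iterable[int], tiles: Iterable[int]) -> Iterator[Path]:
    """Yield the relative paths of the per-cycle files expected for one lane."""
    tiles = list(tiles)
    for cycle in cycles:
        for tile in tiles:
            for ext in ("bcl", "stats"):
                # Assumed layout: L001/C1.1/s_1_1101.bcl
                yield Path(f"L{lane:03d}") / f"C{cycle}.1" / f"s_{lane}_{tile}.{ext}"


def find_missing(basecalls_dir: Path, lane: int, cycles: Iterable[int],
                 tiles: Iterable[int]) -> List[Path]:
    """Return expected files that are absent or empty (treated here as corrupt)."""
    missing = []
    for rel in expected_files(lane, cycles, tiles):
        full = basecalls_dir / rel
        if not full.is_file() or full.stat().st_size == 0:
            missing.append(rel)
    return missing


def tiles_needing_olb(missing: Iterable[Path]) -> List[Tuple[int, int]]:
    """Collapse missing files into the (lane, tile) pairs to recalculate."""
    pairs = set()
    for rel in missing:
        _, lane, tile = rel.stem.split("_")  # "s_1_1101" -> lane 1, tile 1101
        pairs.add((int(lane), int(tile)))
    return sorted(pairs)
```

A size check only catches truncated-to-zero files; partially written files would need a stricter test.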
Details: If *.stats or *.bcl was missing
1. Start offline base caller (OLB) for the missing tiles
2. Comment out the missing tile in config.xml and start bclConverter to convert the intact tiles (or use setupBclToQseq + bcl2qseq directly with --ignore-missing-bcl or --ignore-missing-stats)
3. Merge *.qseq generated from OLB and bclConverter in one directory (BaseCalls_<date>_<user>)
4. Start GERALD to convert to fastq (_sequence.txt)
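Step 3, the merge, is essentially a guarded copy. The Python sketch below assumes qseq files are named `s_<lane>_<read>_<tile>_qseq.txt` and that each tile is produced by only one of the two sources. `merge_qseq` is a hypothetical helper, and the merged directory name is chosen by the caller.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def merge_qseq(source_dirs: Iterable[Path], merged_dir: Path) -> List[str]:
    """Copy all *_qseq.txt files from the source directories into merged_dir.

    Raises FileExistsError if two sources provide the same file, because it
    is then unclear which base calls should be kept.
    """
    merged_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        for qseq in sorted(Path(src).glob("*_qseq.txt")):
            dest = merged_dir / qseq.name
            if dest.exists():
                raise FileExistsError(f"{qseq.name} is present in more than one source")
            shutil.copy2(qseq, dest)
            copied.append(dest.name)
    return copied
```

Refusing to overwrite is a deliberate choice: a name collision means the same tile was converted twice and should be resolved by hand.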
Solution requires .cifs to be saved
• Intensity files (*.cif) are not stored by default
– Remember to tick the save intensity box when starting a run
– Or make it the default: in c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config add
<add key="DeleteIntensityFiles" value="0" />
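Before a run, the setting above can be checked with a few lines of Python. This sketch assumes RTA.exe.config is an ordinary XML application config in which settings appear as `<add key="..." value="..."/>` elements. `intensity_files_kept` is a hypothetical helper that takes the file's text as a string.

```python
import xml.etree.ElementTree as ET


def intensity_files_kept(config_xml: str) -> bool:
    """Return True if DeleteIntensityFiles is explicitly set to "0".

    A missing key is reported as False, since intensity files are not
    stored by default.
    """
    root = ET.fromstring(config_xml)
    for add in root.iter("add"):
        if add.get("key") == "DeleteIntensityFiles":
            return add.get("value") == "0"
    return False
```

To use it, read the config file into a string and pass that string to the function.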
Acknowledgement
• Thanks to
– Dr. Steven Leonard, Informatics Division, The Sanger Institute
– Eugene, Illumina tech support