47
R-loop proximity proteomics identiヲes a role of DDX41 in transcription-associated genomic instability Thorsten Mosler Institute of Molecular Biology (IMB) Francesca Conte Institute of Molecular Biology (IMB) Ivan Mikicic Institute of Molecular Biology (IMB) https://orcid.org/0000-0003-3393-0688 Nastasja Kreim Institute of Molecular Biology Martin Möckel Institute of Molecular Biology (IMB) Johanna Flach Medical Faculty Mannheim of the Heidelberg University Brian Luke Johannes Gutenberg University of Mainz https://orcid.org/0000-0002-1648-5511 Petra Beli ( [email protected] ) Institute of Molecular Biology (IMB) https://orcid.org/0000-0001-9507-9820 Article Keywords: DDX41, genomic instability, proximity proteomics, R-loops, transcription Posted Date: April 26th, 2021 DOI: https://doi.org/10.21203/rs.3.rs-337351/v1 License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License Version of Record: A version of this preprint was published at Nature Communications on December 16th, 2021. See the published version at https://doi.org/10.1038/s41467-021-27530-y.

R-loop proximity proteomics identies a role of DDX41 in

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: R-loop proximity proteomics identies a role of DDX41 in

R-loop proximity proteomics identi�es a role ofDDX41 in transcription-associated genomicinstabilityThorsten Mosler 

Institute of Molecular Biology (IMB)Francesca Conte 

Institute of Molecular Biology (IMB)Ivan Mikicic 

Institute of Molecular Biology (IMB) https://orcid.org/0000-0003-3393-0688Nastasja Kreim 

Institute of Molecular BiologyMartin Möckel 

Institute of Molecular Biology (IMB)Johanna Flach 

Medical Faculty Mannheim of the Heidelberg UniversityBrian Luke 

Johannes Gutenberg University of Mainz https://orcid.org/0000-0002-1648-5511Petra Beli  ( [email protected] )

Institute of Molecular Biology (IMB) https://orcid.org/0000-0001-9507-9820

Article

Keywords: DDX41, genomic instability, proximity proteomics, R-loops, transcription

Posted Date: April 26th, 2021

DOI: https://doi.org/10.21203/rs.3.rs-337351/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.  Read Full License

Version of Record: A version of this preprint was published at Nature Communications on December 16th,2021. See the published version at https://doi.org/10.1038/s41467-021-27530-y.

Page 2: R-loop proximity proteomics identies a role of DDX41 in

1

R-loop proximity proteomics identifies a role of DDX41 in transcription-1

associated genomic instability 2

Thorsten Mosler1, Francesca Conte1, Ivan Mikicic1, Nastasja Kreim1, Martin M. Möckel1, 3

Johanna Flach2, Brian Luke1,3, Petra Beli1,3,# 4

5

1Institute of Molecular Biology (IMB), Mainz, Germany 6

2Department of Hematology and Oncology, Medical Faculty Mannheim of the Heidelberg 7

University, Mannheim, Germany 8

3Institute of Developmental Biology and Neurobiology (IDN), Johannes Gutenberg-Universität, 9

Mainz, Germany. 10

#Correspondence should be addressed to [email protected] 11

12

13

14

15

16

17

18

19

20

21

Keywords: DDX41, genomic instability, proximity proteomics, R-loops, transcription 22

23

Page 3: R-loop proximity proteomics identies a role of DDX41 in

2

Abstract 24

Transcription can pose a threat to genomic stability through the formation of R-loops that 25

obstruct the progression of replication forks. R-loops are three-stranded nucleic acid structures 26

formed by an RNA-DNA hybrid with a displaced non-template DNA strand. We developed 27

RDProx to identify proteins that regulate R-loops in human cells. RDProx relies on the 28

expression of the hybrid-binding domain (HBD) of Ribonuclease H1 (RNaseH1) fused to the 29

ascorbate peroxidase (APEX2) and permits mapping of the R-loop proximal proteome using 30

quantitative mass spectrometry. We implicate different cellular proteins in R-loop regulation 31

and identify a role of the tumor suppressor DEAD box protein 41 (DDX41) in opposing R-32

loop-dependent genomic instability. Depletion of DDX41 resulted in replication stress, double 33

strand breaks and inflammatory signaling. DDX41 opposes the accumulation of R-loops by 34

unwinding RNA-DNA hybrids at gene promoters and its loss leads to upregulation of TGFβ 35

and NOTCH signaling genes. Germline loss-of-function mutations in DDX41 lead to 36

predisposition to acute myeloid leukemia (AML) in adulthood. We propose that accumulation 37

of R-loops at CpG island promoters, altered TGFβ and NOTCH signaling, and inflammatory 38

response contribute to the development of familial AML with mutated DDX41. 39

Page 4: R-loop proximity proteomics identies a role of DDX41 in

3

Transcription by RNA polymerase II (RNAPII) is essential to all cellular processes and to the 40

response of cells to internal and external stimuli. Dysregulated transcription resulting in high 41

transcription rates and increased frequency of transcription-replication conflicts is observed in 42

tumors. Accordingly, targeting different mechanisms that enable tumor cells to cope with 43

transcription stress is being explored as a therapeutic strategy1. Co-transcriptional R-loops are 44

three-stranded nucleic acid structures formed by an RNA-DNA hybrid with a displaced non-45

template DNA strand. R-loops are most prevalent at gene promoters where they promote 46

transcription initiation2,3. R-loops formed at CpG islands (CGIs) that are present at 47

approximately 60% of gene promoters protect these regions from DNA methylation by 48

repelling DNA methyl-transferases (DNMTs)2,4. In addition to reducing the binding of 49

DNMTs, R-loop-dependent recruitment of GADD45A and active de-methylation by TET 50

enzymes has been proposed as mechanism for loss of CGIs methylation3. Furthermore, R-51

loops are present at the three prime end of genes where they facilitate transcription termination 52

by stalling RNAPII downstream of the polyadenylation sequence5,6. R-loops are prevalent in 53

highly transcribed genes and accumulate in repeats such as centromeres, telomeres and 54

retrotransposons7–11. Genome-wide approaches for mapping R-loops in human cells revealed 55

that R-loops occupy up to 5% of unique sequences12. Presence of G-quadruplexes (G4s) on 56

the displaced DNA contribute to the stabilization of R-loops13. Modification of RNA in the 57

hybrid with the N6-methyladenosine (m6a) provides an additional layer of regulation through 58

m6a reader proteins14–16. In addition to the regulatory functions of R-loops in transcription, 59

DNA repair, telomere maintenance and chromosome segregation, these non-B DNA structures 60

can be a driver of genomic instability17–22. The single stranded DNA in the R-loops is more 61

prone to DNA damage23,24. In cycling cells, R-loops form an obstacle for the replication 62

machinery potentially resulting in transcription-replication conflicts that can lead to fork 63

breakage and double strand breaks (DSBs)25–27. Furthermore, stalled transcription complexes 64

at R-loops can trigger their processing by nucleotide excision repair endonucleases XPG and 65

XPF into DSBs19. Different proteins regulate R-loop levels in human cells either by preventing 66

their formation or by assisting their resolution. RNA-binding proteins that bind to nascent 67

RNAs and are involved in the maturation or export of mRNA, such as the THO complex or 68

the nuclear exosome, oppose R-loop formation27,28. Furthermore, negative supercoiling of 69

DNA that is normally relaxed by Topoisomerase 1 (TOP1) favors R-loop formation25,29. Once 70

Page 5: R-loop proximity proteomics identies a role of DDX41 in

4

formed, R-loops can be removed by the action of Ribonuclease H1 (RNaseH1) and H2 71

(RNaseH2) that cleave the RNA in the hybrid 30. Different helicases including SETX, DDX5 72

and DDX39B have been implicated in the unwinding of RNA-DNA hybrids and resolution of 73

R-loops5,18,31. 74

Dead box helicase 41 (DDX41) is a tumor suppressor that is conserved in D.melanogaster, 75

C.elegans, D.rerio and plants, and is considered essential to cell growth and viability32–35. 76

Somatic and germline mutations of DDX41 are present in 0.5% to 4% of adult myelodysplastic 77

syndrome (MDS)/acute myeloid leukemia (AML) cohorts and are considered as oncogenic 78

drivers36. Pathogenic germline variants in DDX41 predominantly lead to frameshifts and 79

production of truncated protein forms, whereas somatic mutations are mostly located within 80

the DEAD box and helicase domain most likely resulting in protein with compromised helicase 81

activity37. DDX41 has been reported to interact with components of the spliceosome, and 82

DDX41 deletion or mutations led to splicing defects and faulty RNA processing38. The role of 83

DDX41 in RNA processing appears to be conserved: The C.elegans orthologue SACY-1 was 84

recently shown to associate with the spliceosome, and impacts the transcriptome through 85

splicing-dependent and -independent mechanisms33. Despite the relevance of DDX41 in 86

cancer, the cellular and molecular functions of DDX41 remain poorly understood. 87

We employed quantitative mass spectrometry (MS)-based proteomics to identify proteins that 88

regulate R-loops in human cells. To this end, we developed RDProx that enables mapping of 89

the R-loop-proximal proteome using the fusion protein of the hybrid-binding domain (HBD) 90

of RNaseH1 and an engineered variant of ascorbate peroxidase (APEX2). We implicated 91

proteins with different cellular functions in R-loop regulation and characterized the role of the 92

tumor suppressor DDX41 in opposing transcription-associated genomic instability. DDX41 93

binds and unwinds RNA-DNA hybrids to suppress the accumulation of R-loops at gene 94

promoters. Furthermore, knockdown of DDX41 leads to dependency on ATR signaling and 95

transcriptional changes including upregulation of TGFβ and NOTCH signaling genes. We 96

propose that accumulation of co-transcriptional R-loops, replication stress and associated 97

inflammatory response contribute to the development of familial AML and MDS with mutated 98

DDX41. 99

Page 6: R-loop proximity proteomics identies a role of DDX41 in

5

Results 100

RDProx allows mapping of the R-loop proximal proteome 101

Tight regulation of R-loop levels across the genome is essential for their function in promoting 102

chromatin-associated processes and for preventing R-loop-dependent genomic instability. To 103

gain insights into protein-based mechanisms that regulate R-loop homeostasis, we probed the 104

R-loop-proximal protein networks using RDProx. To this end, we fused the HBD of RNaseH1 105

to an engineered variant of soybean ascorbate peroxidase (APEX2)39. As a negative control, 106

we employed a construct harboring three point mutations in the HBD (WKK) that led to a loss 107

of affinity towards RNA-DNA hybrids (Supplementary Figure 1a)40. In accordance, only GFP-108

tagged HBD and not the WKK mutant associated with chromatin under pre-extraction 109

conditions (Supplementary Figure 1b). APEX2-HBD or HBD-WKK fusion proteins were 110

expressed in HEK293T cells and labeling of proximal proteins was induced in vivo (Figure 1a, 111

Supplementary Figure 1c). Biotinylated proteins were enriched using streptavidin and 112

analyzed by liquid chromatography (LC)-tandem mass spectrometry (MS/MS). Stable isotope 113

labeling with amino acids in cell culture (SILAC) was used to distinguish proteins that are 114

proximal to the wild type HBD compared to WKK mutant (Figure 1a). We performed three 115

replicate experiments that showed excellent reproducibility (r > 0.85) and identified 312 116

proteins enriched with high confidence by RDProx (log2 FC > 2; FDR < 0.01) (Figure 1b, 117

Supplementary Figure 1d, Supplementary Table 1). Among these proteins, we identified 118

previously known R-loop regulators such as TOP1, AQR, single stranded DNA binding 119

proteins RPA1/2 and components of the THO complex (THOC1/2/6, THOC6, ALYREF) and 120

nuclear exosome (EXOSC7, EXOSC10). R-loop proximal proteins were associated with 121

RNA-N6-metyladenosine dioxygenase activity, nucleosome binding, RNA binding, basal 122

transcription machinery binding, telomeric DNA binding, RNA polymerase II complex 123

binding and DNA replication origin binding (Figure 2a, b). Proteins in the R-loop proximal 124

proteome showed functional interactions as demonstrated by the identification of different 125

protein clusters involved in splicing, m6A regulation, mRNA 3’ end processing, RNAPII 126

elongation and DNA replication and repair (Figure 2a). These proteins were enriched with 127

domains typical for RNA and DNA-binding proteins including helicase, DEAD/DEAH, HMG 128

1/2 box, BRCT, ARID, SNF2 and Znf GATA (Figure 2c). 129

130

Page 7: R-loop proximity proteomics identies a role of DDX41 in

6

Depletion of DDX41 leads to replication stress and R-loop-dependent genomic instability 131

To assess the possible function of the identified DEAD box helicases in regulation of R-loops, 132

we monitored the intensity of histone variant H2AX phosphorylation on Ser140 (γH2AX) 133

upon depletion of DDX27, DDX41, DDX42, DHX37 and DDX39A. Only depletion of 134

DDX41 and AQR led to a notable increase in nuclear yH2AX intensity, suggesting that the 135

other helicases might regulate R-loops in specific genomic regions or under stress conditions 136

(Figure 3a). Increased levels of yH2AX were also confirmed by western blotting 137

(Supplementary Figure 2a). Overexpression of RNaseH1 under control of a doxycycline-138

inducible promoter partially rescued the effect of DDX41 knockdown on yH2AX pointing to 139

an R-loop-dependent genomic instability (Figure 3b). Overexpression of nuclear RNaseH1 140

also rescued the increased formation of DSBs in DDX41 knockdown cells measured by neutral 141

comet assay (Figure 3c). Increased formation of DSBs in DDX41 knockdown cells was also 142

confirmed by monitoring 53BP1 foci formation (Figure 3d). Knockdown of DDX41 resulted 143

in increased phosphorylation of RPA on Ser33 and significantly reduced DNA fiber length 144

similarly to mild replication stress induced by DNA polymerase inhibition with aphidicolin 145

(Figure 3d, e). Accordingly, increase in yH2AX and pRPA intensity was predominant in S 146

phase (Supplementary Figure 2b, c). Moreover, the phosphorylation of RPA was mediated by 147

ATR, since inhibition with VE-821 significantly reduced the increase in pRPA intensity after 148

DDX41 knockdown (Supplementary Figure 2d). To corroborate that these cells depend on 149

ATR activity to respond to replication stress, we treated U2OS cells after DDX41 knockdown 150

and OCI-AML3 cells expressing DDX41 disease variants (L237F/P238T and R525H) with 151

ATR inhibitors and monitored their viability. Both DDX41 knockdown and disease variants-152

expressing cells displayed sensitivity to ATR inhibition (Supplementary Figure 2e, f). 153

154

DDX41 depletion results in R-loop accumulation in hematopoietic stem and progenitor 155

cells 156

To confirm that DDX41 is proximal to R-loops in human cells using an alternative approach, 157

we performed proximity ligation assays with antibodies against endogenous DDX41 and GFP-158

tagged HBD or HBD-WKK (Figure 4a). The proximity to R-loops and the occurrence of 159

spontaneous DNA damage and replication stress in DDX41 knockdown cells, prompted us to 160

investigate whether DDX41 opposes the accumulation of R-loops. To test this, we performed 161

Page 8: R-loop proximity proteomics identies a role of DDX41 in

7

dot-blot analysis with the RNA-DNA hybrid antibody S9.6. Indeed, knockdown of DDX41 162

resulted in accumulation of RNA-DNA hybrids that were sensitive to RNaseH1 163

overexpression (Figure 4b). Using chromatin-bound GFP-tagged HBD as proxy for R-loops, 164

we could confirm increased levels of R-loops upon depletion of DDX41 and AQR as positive 165

control (Supplementary Figure 3a). Importantly, inhibition of transcription elongation by 166

treating cells with DRB for 3 hours partially rescued the effect of DDX41 knockdown on R-167

loop accumulation (Figure 4c, Supplementary Figure 3a). 168

To test whether DDX41 also regulates R-loop levels in hematopoietic stem and progenitor 169

cells (HSPCs) in which loss-of-function mutations of DDX41 result in AML, we depleted 170

DDX41 in human CD34+ HSPCs, and monitored R-loops and DSBs using S9.6 and 53BP1 171

foci, respectively. Importantly, we found that depletion of DDX41 using two different shRNAs 172

led to a significant increase of R-loops and spontaneous DSB formation (Figure 4e). 173

Expression of DDX41 variants found in AML patients in either the DEAD (L237F/P238T) or 174

helicase domain (R525H) in CD34+ cells also resulted in accumulation of R-loops and 53BP1 175

foci (Figure 4f, Supplementary Figure 3b, c). 176

177

DDX41 can bind and unwind RNA-DNA hybrids in vitro 178

Our experiments demonstrated that DDX41 opposes the accumulation of R-loops and R-loop-179

dependent genomic instability. To address the question whether DDX41 can directly bind and 180

unwind RNA-DNA hybrids in R-loops, we purified full length DDX41, DDX41 lacking the 181

helicase domain (153-410) and the AML-associated R525H variant in the C-terminus of the 182

RecA-like helicase core domain (Figure 5a, Supplementary Figure 4a). Full length DDX41 183

was incubated with five different fluorophore-conjugated oligonucleotide substrates to 184

determine the binding affinity. Binding of DDX41 to the substrates resulted in a change of 185

fluorescence polarization. We found that recombinant DDX41 possesses the strongest affinity 186

(Kd=2.5µM +/- 1.4 µM) to RNA-DNA hybrids in vitro compared to other nucleic acid 187

substrates (Figure 5b). Furthermore, we employed an ATPase assay to determine whether the 188

ATPase domain of DDX41 hydrolyzes ATP when encountering an RNA-DNA hybrid 189

substrate. We found that the intrinsic ATPase activity of DDX41 was stimulated by RNA-190

DNA hybrids with a single-stranded DNA overhang (Figure 5c, Supplementary Figure 4b). 191

To test whether the ATPase activity was accompanied by unwinding of this substrate, we 192

Page 9: R-loop proximity proteomics identies a role of DDX41 in

8

established a FRET-based displacement assay. Separation of a fluorophore-labeled DNA 193

strand and a quencher-conjugated RNA strand resulted in increased fluorescence intensity. 194

Importantly, DDX41 not only bound but it also unwound the RNA-DNA hybrids in vitro in a 195

concentration-dependent manner, whereas DDX41 lacking the helicase domain was not able 196

to separate the two strands (Figure 5d, e). Also, DDX41 harboring the disease-associated 197

R525H variant showed decreased efficiency in RNA-DNA hybrid unwinding (Figure 5e). 198

Taken together, recombinant full length DDX41 preferentially binds RNA-DNA hybrids 199

compared to other nucleic acids and can unwind RNA-DNA hybrids in vitro. 200

201

DDX41 depletion is associated with R-loop accumulation at CGI promoters and 202

upregulation of TGFβ and NOTCH signaling genes 203

To quantify R-loops genome-wide in wild type cells and upon knockdown of DDX41, we 204

performed MapR that uses a catalytically-dead RNaseH1 to target micrococcal nuclease to R-205

loops, which are subsequently cleaved, released, and identified by sequencing41,42. The 206

majority of R-loops were identified in promoters and the levels of R-loops at promoters 207

positively correlated with gene expression measured by RNA-sequencing (Figure 6a, 208

Supplementary Figure 5a, b, c, d, Supplementary Table 2, 3). Inhibition of transcription by 209

Actinomycin D resulted in a dramatic decrease of R-loops at promoters (Figure 6a, 210

Supplementary Figure 5c, d), whereas knockdown of DDX41 led to a significant increase of 211

R-loops in promoter regions (Figure 6a, Supplementary Figure 5c, d). Differential binding 212

analysis revealed significant changes (FDR < 0.05) of the MapR signal in 2,793 genomic 213

regions in the absence of DDX41 (Figure 6b). We observed 1,539 regions with significant 214

increase in MapR signal (i.e. R-loop accumulation) after DDX41 knockdown, of which 93.5% 215

overlapped with CGI promoters (Figure 6b, c). In contrast, this was true for only 0.1% of the 216

1,254 regions with decreased MapR signal. Moreover, regions with decreased MapR signal 217

mostly overlapped with introns and distal intergenic regions (Supplementary Figure 5e). 218

Pathway analysis of genes displaying R-loop accumulation in the CGI promoters revealed 219

TGFβ and NOTCH signaling, SUMOylation, regulation of TP53 and PTEN and chromatin 220

organization (Figure 6d). RNA-sequencing after DDX41 knockdown demonstrated that 221

transcript levels of 34% of genes displaying R-loop accumulation were significantly 222

upregulated, whereas transcript levels of 39% of genes displaying R-loop accumulation were 223

Page 10: R-loop proximity proteomics identies a role of DDX41 in

9

not affected (Figure 6e). Notably, TGFβ and NOTCH signaling genes exhibited accumulation 224

of R-loops at CGI promoters and upregulated transcripts. “Chromatin organization” term was 225

enriched among the genes that displayed R-loop accumulation without significant changes in 226

transcript expression (Figure 6e). Interestingly, RNA-sequencing revealed upregulation of 227

inflammatory signaling in DDX41 knockdown cells, in particular NF-kB signaling, which was 228

confirmed also by increased nuclear localization of NF-kB subunit p65 in these cells 229

(Supplementary Figure 5f, g). Inflammatory signaling genes did not display an accumulation 230

of R-loops, suggesting that these genes are not directly regulated by DDX41, but might be 231

regulated as consequence of DDX41-induced DSBs and genomic instability. 232

Page 11: R-loop proximity proteomics identies a role of DDX41 in

10

Discussion 233

R-loop levels across the genome need to be balanced to ensure the regulation of chromatin-234

associated processes without inflicting DNA damage and genomic instability. Here, we have 235

developed RDProx that provides a snapshot of the R-loop-proximal proteome in human cells. 236

We identified 600 R-loop-proximal proteins and divided them in two categories (Tier 1 and 237

Tier 2) depending on the probability of their presence at R-loops, providing a rich resource for 238

further functional investigations. Previous studies have employed the RNA-DNA hybrid 239

antibody S9.6 for co-immunoprecipitation of proteins that associate with R-loops or used an 240

in vitro-generated RNA-DNA hybrid to pull down interacting proteins from cell extracts43,44. 241

The advantages of RDProx are manifold: (1) Labeling of R-loop-proximal proteins is 242

performed in vivo, which ensures that R-loops, chromatin and cellular compartments remain 243

intact; (2) Low abundant chromatin-associated proteins are amenable to the analysis; (3) Low 244

affinity and transient interactions are detected. RDProx is easily applicable for mapping R-245

loop-proximal proteins in different cell lines and species as well as upon different cellular 246

perturbations. Thereby, it provides a methodological framework to answer outstanding 247

questions, including how R-loop regulation differs between cell cycle stages or in response to 248

stress that impacts on transcription or co-transcriptional processes. 249

Recent studies have shown that the RNA moiety in the hybrid can be modified with N6-250

methyladenosine (m6a)14–16. We now provide evidence that components of the m6A RNA 251

machinery including the m6A writers (m6A–METTL-associated complex: VIRMA, ZC3H13 252

and RBM15), readers (hnRNPA2B1 and hnRNPC) and erasers (ALKBH5 and FTO) are 253

indeed proximal to R-loops. This finding suggests that dynamic m6A deposition at RNA-DNA 254

hybrids modulates the stability of R-loops in a context-dependent manner and through an 255

interplay with other pre-mRNA processing factors. METTL3-dependent m6A deposition on 256

RNA-DNA hybrids was shown to favor R-loop turnover during mitosis by recruiting the m6A 257

reader YTHDF216. A recent study identified a role for m6A RNA modification in stabilizing 258

co-transcriptional R-loops forming at transcription termination sites, thereby ensuring faithful 259

transcription termination and avoiding RNA polymerase II read-through15. 260

Another large group of proteins identified by RDProx were components of the DNA 261

replication machinery such as the MCM complex, WDHD1, RFC1, MSH6 and CHAF1B. 262

Regulation of R-loop levels by RNaseH enzymes is known to be necessary for unperturbed 263

Page 12: R-loop proximity proteomics identies a role of DDX41 in

11

replication45,46. Conversely, replication can influence R-loop formation during transcription-264

replication collisions depending on the mutual orientation of the transcription and replication 265

machineries47. The MCM complex was demonstrated to possess the RNA-DNA in vitro and 266

is therefore potentially involved in the removal of R-loops in S phase48. Timely unloading of 267

PCNA as well as the recruitment of DEAD/DExH-box helicases to the replication fork were 268

shown to prevent replication-associated R-loop accumulation49. The identification of 269

additional replication-associated factors by RDProx implies unexplored details of the crosstalk 270

between DNA replication and R-loops. 271

Identification of SMARCA4, ARID1A, SMARCC1 and SMARCE1 proximal to R-loops and 272

their known association with active transcription sites marked by H3K27 acetylation suggests 273

a role of the SWI/SNF chromatin-remodeling complex in balancing R-loop levels50,51. The 274

yeast and human FACT complex have been reported to resolve R-loop-mediated transcription-275

replication conflicts by reshaping the chromatin environment52. It has recently been 276

demonstrated that ARID1A-containing BAF complexes recruit TOP2A to R-loop-associated 277

chromatin, thereby preventing excessive R-loop formation and replication stress53. 278

RDProx identified the role of the evolutionary conserved Dead box helicase 41 (DDX41) in 279

opposing R-loop accumulation at CGI promoters to regulate transcription. Different studies 280

have suggested a regulatory role of R-loops at CGI promoters by suppressing methylation and 281

thereby increasing gene expression2,3. Our results suggests that accumulation of R-loops after 282

loss of DDX41 might repel DNA methyl-transferases or promote the recruitment of TET 283

enzymes, thereby introducing epigenetic changes that lead to alterations in the expression of 284

TGFβ and NOTCH signaling genes. DDX41 can directly bind and unwind RNA-DNA hybrids 285

and thereby oppose R-loop-dependent replication stress and genomic instability. We also 286

identified the DEAD box helicase DDX39B/UAP56 that was described to participate in 287

nuclear mRNA export as part of the TREX complex27,54,55. A recent study revealed a role of 288

DDX39B in resolving R-loops: DDX39B/UAP56 associates with active transcription 289

complexes in order to resolve R-loops throughout the gene body until the transcription 290

termination site, thereby ensuring faithful transcriptional elongation and transcript release31. 291

Depletion of DDX39B did not lead to the accumulation of R-loops around the transcription 292

start sites, but rather downstream throughout the gene body suggesting that DDX41 and 293

Page 13: R-loop proximity proteomics identies a role of DDX41 in

12

DDX39B have complementary roles in opposing R-loop accumulation at different genomic 294

regions31. 295

Pathogenic variants in DDX41 cause familial MDS/AML36,38,56–58. A recent study performed 296

in zebrafish suggested that R-loop accumulation caused by Ddx41 deficiency leads to 297

upregulated inflammatory signaling and aberrant expansion of the HSPCs32. Also, 298

accumulation of R-loops was recently proposed as feature of myelodysplastic syndrome 299

(MDS) harboring splicing mutations59–61. In this work, we demonstrate in human HSPCs that 300

pathogenic variants in DDX41 lead to accumulation of R-loops, replication stress and genomic 301

instability. Furthermore, we demonstrate that knockdown of DDX41 leads to dependency on 302

ATR signaling and transcriptional changes including upregulation of TGFβ and NOTCH 303

signaling genes (Figure 7a). These results suggest that pathogenic DDX41 variants also in 304

human familial MDS/AML contribute to disease development through accumulation of R-305

loops and provide incentives to explore ATR inhibition as therapeutic strategy in this patients. 306

307

Data Availability 308

RNA-Seq and MapR data are available on GEO under the reference GSE168173. The mass 309

spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via 310

the PRIDE partner repository62 with the dataset identifier PXD024517. 311

Acknowledgements 312

We thank the members of the Beli group for helpful discussions. Funded by the Deutsche 313

Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 393547839 – 314

SFB 1361 and Project-ID BE 5342/2-1 – FOR 2800. Support by the IMB Genomics Core 315

Facility and the use of its NextSeq500 (INST 247/870-1 FUGG) is gratefully acknowledged. 316

We thank Jimmy Chen and Amitkumar Fulzele for assistance with mass spectrometry analysis, 317

and Andrea Voigt and Katharina Mayr for technical support. VisiScope Spinning Disc 318

Confocal microscope is funded by the DFG (Project number 402386039). We thank Joan 319

Barau for help with MapR and Ronald Wong with DNA fiber analysis. Support by the IMB 320

microscopy, flow cytometry, protein production, genomics and bioinformatics core facility is 321

gratefully acknowledged. We thank Vassilis Roukos for comments on the manuscript. 322

Page 14: R-loop proximity proteomics identies a role of DDX41 in

13

Author Contributions 323

PB, TM and BL designed the research. TM and FC conducted experiments and analyzed the 324

data. IM performed DNA fiber assays and analyzed the data. NK performed the analysis of 325

MapR and RNA-sequencing. MM purified all recombinant proteins and helped with 326

fluorescence polarization assays. JF conducted the experiments in CD34+ HSPCs. PB and TM 327

wrote the manuscript. All authors read and commented on the manuscript. 328

329

Declaration of Interests 330

The authors declare no competing interests. 331

Page 15: R-loop proximity proteomics identies a role of DDX41 in

14

Online Methods 332

Cell culture 333

U2OS and HEK293T cells were obtained from ATCC and cultured in D-MEM medium 334

supplemented with 10% fetal bovine serum, L-glutamine, penicillin and streptomycin. OCI-335

AML3 cells were purchased from Leibniz InstituteCan DSMZ GmbH and cultured in a-MEM 336

medium (PAN-Biotech) containing 20% FBS, L-glutamine, penicillin and streptomycin. Cells 337

were routinely tested for mycoplasma infection with a PCR-based method. For SILAC 338

labeling, cells were cultured in media containing either L-arginine and L-lysine, L-arginine 339

[13C6] and L-lysine [2H4] or L-arginine [13C615N4] and L-lysine [13C6-15N2] (Cambridge 340

Isotope Laboratories) as described previously63. All cells were cultured at 37°C in a humidified 341

incubator containing 5% CO2. 342

RDProx 343

SILAC-labeled cells were transiently transfected with a construct expressing APEX2-tagged 344

HBD and WKK. After 48 hours, cells were pre-treated with 500 µM biotin phenol (Iris 345

Biochem) for 2 hours at 37°C, followed by a 2 minute incubation with 1 mM H2O2 (Sigma-346

Aldrich) at room temperature. Cells were washed twice with quenching solution (10 mM 347

sodium azide, 10 mM sodium ascorbate, 5 mM Trolox (all from Sigma-Aldrich)) and twice 348

with PBS. Cells were lysed on ice using RIPA buffer (50 mM Tris, 150 mM NaCl, 0.1% SDS, 349

0.5% sodium deoxycholate, 1% Triton X-100). To release chromatin-bound proteins, cell 350

lysates were sonicated using Bioruptor (Diagenode). For affinity purification of biotinylated 351

proteins, equal amounts of differentially SILAC-labeled cell extracts, originating from either 352

the HBD or the WKK condition, were combined prior to the pull down and incubated with 353

pre-equilibrated NeutrAvidin agarose beads (Thermo Scientific) for 2 hours at 4°C on a 354

rotation wheel. Beads were washed once with RIPA buffer, thrice with 8 M Urea (Sigma) in 355

1% SDS and once with 1% SDS in PBS. Bound proteins were eluted in NuPAGE LDS Sample 356

Buffer (Life Technologies) supplemented with 1 mM DTT and boiled at 95°C for 15 min. The 357

eluates, after cooling down to room temperature, were alkylated by incubating with 5.5 mM 358

chloroacetamide for 30 min in the dark and then loaded onto 4-12% gradient SDS-PAGE gels. 359

Proteins were stained using the Colloidal Blue Staining Kit (Life Technologies) and digested 360

in-gel using trypsin. Peptides were extracted from the gel and desalted on reversed phase C18 361

StageTips. 362

Page 16: R-loop proximity proteomics identies a role of DDX41 in

15

MS analysis 363

Peptide fractions were analyzed on a quadrupole Orbitrap mass spectrometer (Q Exactive or 364

Q Exactive Plus, Thermo Scientific) equipped with a UHPLC system (EASY-nLC 1000, 365

Thermo Scientific) as described64,65. Peptide samples were loaded onto C18 reversed phase 366

columns (15 cm length, 75 µm inner diameter, 1.9 µm bead size) and eluted with a linear 367

gradient from 8 to 40% acetonitrile containing 0.1% formic acid in 2 h. The mass spectrometer 368

was operated in data-dependent mode, automatically switching between MS and MS2 369

acquisition. Survey full scan MS spectra (m/z 300 – 1700) were acquired in the Orbitrap. The 370

10 most intense ions were sequentially isolated and fragmented by higher-energy C-trap 371

dissociation (HCD)66. An ion selection threshold of 5,000 was used. Peptides with unassigned 372

charge states, as well as with charge states less than +2 were excluded from fragmentation. 373

Fragment spectra were acquired in the Orbitrap mass analyzer. 374

Peptide identification 375

Raw data files were analyzed using MaxQuant (development version 1.5.2.8)67. Parent ion and 376

MS2 spectra were searched against a database containing 98,566 human protein sequences 377

obtained from the UniProtKB released in 04/2018 using Andromeda search engine68. Spectra 378

were searched with a mass tolerance of 6 ppm in MS mode, 20 ppm in HCD MS2 mode, strict 379

trypsin specificity and allowing up to 3 miscleavages. Cysteine carbamidomethylation was 380

searched as a fixed modification, whereas protein N-terminal acetylation and methionine 381

oxidation were searched as variable modifications. The dataset was filtered based on posterior 382

error probability (PEP) to arrive at a false discovery rate of below 1% estimated using a target-383

decoy approach69. 384

RDProx network analysis 385

Functional protein interaction network analysis was performed using interaction data from the 386

STRING database70. Only interactions with a score > 0.7 are represented in the networks. 387

Cytoscape (version 3.2.1) was used for visualization of protein interaction networks71. Clusters 388

within the proximal protein network were generated using the ClusterViz (version 1.0.2) 389

MCODE algorithm72. Clusters with at least 4 nodes were subjected to Reactome Pathway 390

analysis, or GO-term analysis using PANTHER73. PFAM domain enrichment analysis was 391

performed using EnrichR74. The respective terms with the lowest FDR based on the Fisher’s 392

exact test and correction for multiple comparison are highlighted next to each cluster. 393

Page 17: R-loop proximity proteomics identies a role of DDX41 in

16

SDS-PAGE and western blotting 394

Proteins were resolved on 4-12% gradient SDS-PAGE gels (NuPAGE® Bis-Tris Precast Gels, 395

Life Technologies) and transferred onto nitrocellulose membranes. Membranes were blocked 396

using 10% skimmed milk solution in PBS supplemented with 0.1% Tween-20. The list of 397

antibodies used in this study and conditions can be found in Table 1. Secondary antibodies 398

coupled to horseradish peroxidase (Jackson ImmunoResearch Laboratories) were used for 399

immunodetection. The detection was performed with SuperSignal West Pico 400

Chemiluminescent Substrate (Thermo Scientific). 401

Neutral comet assay 402

Neutral comet assay was performed according to the manufacturer’s protocol (Trevigen). 403

Briefly, cells were embedded in low melting agarose at 37°C on Comet Slides (Trevigen). 404

Overnight cell lysis at 4°C was followed by equilibration in 1× Neutral Electrophoresis Buffer 405

for 30 min at room temperature. Single cell electrophoresis was performed at 4°C in 1× Neutral 406

Electrophoresis buffer for 45 min with constant 21V. After DNA precipitation with 1× DNA 407

Precipitation Buffer, Comet Slides were dried with 70% EtOH at room temperature. In order 408

to completely dry the samples, Comet Slides were transferred to 37°C for 15 min. DNA was 409

stained with SYBR Gold solution for 30 min at room temperature. Images were taken with a 410

Leica AF7000 microscope using a 20× 0.8NA air objective and a filter cube 480/40 nm, 505nm 411

and 527/30 for excitation, dichroic and emission wavelengths respectively. Tail moments of 412

the comets were quantified using the CometScore (TriTek Corp.) software. At least 50 comets 413

were quantified per condition. 414

Quantification of DNA-RNA hybrids using dot blot 415

Cells were cultured until 80-90% confluence. Genomic DNA was extracted using the DNeasy 416

mini kit (Qiagen). The isolated gDNA was treated with 1.2 U RNase III (produced in-house) 417

for 2 h at 37°C. After enzyme deactivation at 65°C for 20 min, samples were split in half to 418

digest control samples with 10 U RNaseH1 (NEB) overnight at 37°C. Enzyme deactivation 419

was followed by spotting DNA in a serial dilution on a nitrocellulose membrane (NeoLab 420

Migge GmbH) using a dotblot apparatus (BioRad). DNA was crosslinked to the membrane by 421

UV light and afterwards blocked with 10% skimmed milk solution in PBS supplemented with 422

0.1% Tween-20. The membrane was incubated over night at 4°C with the S9.6 antibody 423

(produced in-house). After incubation of secondary antibodies conjugated to horseradish 424

Page 18: R-loop proximity proteomics identies a role of DDX41 in

17

peroxidase (Jackson ImmunoResearch Laboratories) signal was detected using SuperSignal 425

West Pico Chemiluminescent Substrate (Thermo Scientific). An antibody against dsDNA was 426

probed as a loading control after stripping the membrane with β-mercaptoethanol (Sigma) and 427

0.1% SDS in PBS. The detected signal was quantified using Fiji/ImageJ (v1.51) and ratios 428

between the signal resulting from S9.6 and dsDNA staining were calculated to quantify global 429

R-loop levels75. 430

Proximity ligation assay 431

Proximity Ligation Assay was performed according to the manufacturer´s protocol (Duolink®, 432

Sigma-Aldrich). Cells were fixed with 4% paraformaldehyde in PBS and permeabilized with 433

0.25% Triton X-100. Samples were blocked with Duolink® Blocking Solution for 1 hour at 434

37°C in a humidity chamber. After removal of the blocking solution, primary antibodies 435

diluted in Duolink® Antibody Diluent were added on the coverslips for 2 hours at room 436

temperature in a humidity chamber. Coverslips were washed 2× with Washing Buffer A. PLA 437

Plus and Minus probes were put on in a 1:5 dilution in Duolink® Antibody Diluent for 1 hour 438

at 37°C in a humidity chamber. Two washes with Washing Buffer A were followed by Ligase 439

treatment in 1× Ligation Buffer for 30 min at 37°C in a humidity chamber. Ligation buffer 440

was tapped off and coverslips were washed twice with Washing Buffer A. Amplification was 441

achieved by adding the Polymerase in 1× Amplification buffer for 100 min at 37°C in a 442

humidity chamber. After washing the samples 2× with 1× Washing Buffer B and 1× with 0.01× 443

Washing Buffer B, coverslips were stained with 1 µg/ml Hoechst33342 and mounted using 444

Dako mounting medium. Images were taken with a Leica SPE microscope using a 63× 1.4NA 445

oil objective. The number of PLA spots per nucleus was quantified using Fiji/ImageJ (v1.51)75. 446

ATPase assay 447

The ADP-Glo Assay was performed according to the manufacturer´s protocol (Promega). In 448

brief, an ATP/ADP standard curve was prepared before each experiment in order to interpolate 449

the measured values. Purified full length DDX41 was incubated in a serial dilution together 450

with 100 nM of RNA-DNA substrate with an ssDNA overhang and 5 µM ATP. After 451

incubating the mix at 37°C for 60 min, the reaction was stopped by depleting unconsumed 452

ATP with the ADP-Glo Reagent. The Kinase Detection Buffer was added to convert ADP to 453

ATP and to add luciferase and luciferin to detect ATP. Resulting luminescence was measured 454

with a Spark M200 (Tecan). The measured values were interpolated based on the values 455

Page 19: R-loop proximity proteomics identies a role of DDX41 in

18

obtained by the ATP/ADP standard curve using GraphPad PRISM (v7.04, Graphpad Software, 456

Inc.). 457

Fluorescence polarization assay 458

DsDNA, dsRNA and RNA-DNA hybrid 12-mer substrates were generated by heating the 459

respective 6-FAM-conjugated and unlabeled oligonucleotide pairs to 95°C and gradually 460

cooling them down to 4°C. Single-stranded and double-stranded substrates were diluted to a 461

final concentration of 20 nM in FP assay buffer (20mM HEPES, pH 7.0, 100mM NaCl, 5% 462

glycerol). Purified full-length DDX41 protein was added to the individual substrates in a serial 463

dilution. The change of fluorescence polarization caused by binding to the substrates was 464

measured on a Spark M20 (Tecan). Measured values were fit using the Michaelis-Menten 465

equation. Resulting Kd values were calculated using GraphPad Prism (v7.04, Graphpad 466

Software, Inc.). 467

FRET-based unwinding assay 468

RNA-DNA hybrid substrates with a single-stranded DNA overhang were generated by mixing 469

an IBFQ-conjugated 38-mer DNA oligo (IDT) and a 6-FAM-conjugated 13-mer RNA oligo 470

(IDT) and heating them to 95°C and gradually cooling them down to 4°C. Annealed substrates 471

were incubated together with 5 µM ATP and either full length DDX41 or mutant proteins. 472

Increased fluorescence intensity upon addition of DDX41 after displacement of the quencher 473

during unwinding was measured on a Spark M20 (Tecan) plate reader. 474

Cell viability assay 475

Cell viability assay was performed using the Cell Titer-Blue Cell Viability Assay (Promega) 476

according to the manufacturer’s instructions. 477

Colony formation assay 478

Cells were plated in low density 48h after siRNA transfection and challenged either with 479

DMSO or VE-821 for 12 days. Colonies were stained with 0.4% Coomassie Brilliant Blue in 480

20% ethanol solution and colonies containing at least 10 cells were counted. The number of 481

colonies formed in the presence of VE-821 was normalized to cells treated with DMSO. 482

MapR 483

MapR was performed according to the before published protocol with minor 484

modifications41,42. Cells were either treated with indicated siRNAs or with 4 µM Actinomycin 485

D (Cell Signaling Technology) for 6 h. Concanavalin A-coated beads (Polysciences Europe 486

Page 20: R-loop proximity proteomics identies a role of DDX41 in

19

GmbH) were activated in Binding Buffer (20 mM Hepes-KOH pH 7.9, 10 mM KCl, 1mM 487

CaCl2, 1 mM MnCl2). 5*105 U2OS cells were washed twice with Wash Buffer (Hepes-NaOH 488

pH7.5, 150 mM NaCl, 0.5 mM Spermidine, 1 mM protease inhibitor) at room temperature and 489

afterwards immobilized on the activated beads in 50 µl Wash Buffer containing 0.05% 490

Digitonin. Either pA-MNase or RHΔ-MNase were added to the cells overnight at 4°C on a 491

rotating wheel. After three washes with Wash Buffer containing 0.05% Digitonin (Millipore), 492

samples resuspended in 100 µl Dig-Wash-Buffer were equilibrated on ice. Activity of the 493

MNase was triggered by adding 2mM CaCl2 to the samples for 30 minutes. 2x Stop Buffer (68 494

µl 5M NaCl, 40 µl 0.5 M EDTA, 20 µl 0.2 M EGTA, 10 µl 5% digitonin, 5 µl 10mg/ml 495

RNaseA, 20 pg/ml spike-in Drosophila DNA (Active Motif)) was mixed with the samples to 496

stop the reaction. Chromatin fragments were released by incubating the samples for 20 minutes 497

at 37°C and centrifugation at 16.000×g for 10 minutes at 4°C. Supernatants were incubated at 498

70°C in the presence of 0.1% SDS and 5 µg proteinase K. Before library preparation, the DNA 499

was recovered by phenol-chloroform extraction. NGS library preparation was performed using 500

NuGEN´s Ovation Ultralow System V2 (M01379 v5). Libraries were prepared with a starting 501

amount of 1 ng of DNA and were amplified in 12 PCR cycles. Libraries were profiled in a 502

High Sensitivity DNA on a 2100 Bioanalyzer (Agilent technologies) and quantified using the 503

Qubit dsDNA HS Assay Kit, in a Qubit 2.0 Fluorometer (Life technologies). All 18 samples 504

were pooled in equimolar ratio and sequenced on one NextSeq 500 Highoutput Flowcell, PE 505

for 2x 42 cycles plus 8 cycles for the index read. 506

MapR analysis 507

Libraries were de-multiplexed using blc2fastq (v2.19). Samples were mapped against dm6 and 508

h38 using bowtie2 (v2.3.4)76. The spike-in reads mapping against dm6 were used as reference 509

to normalize the bam files based on the spike in reads. The normalization was done according 510

to the recommendations made by Active Motif. In short normalization factors were calculated 511

by taking the lowest read count mapped to the dm6 reference in the sample set divided by the 512

read count of the current sample. BAM files were down-sampled based on this normalization 513

factors. Peak calling was performed using MACS2 (v2.1.2) with the parameters * –keep-dup 514

auto–broad –broad-cutoff 0.1 –bw 100 –min-length 100 –format BAMPE –g hs *77. 515

Differentially bound regions were retrieved using Diffbind (v.3.0.5) based on +- 500 bp 516

regions around the summit of consensus 1 peaks78. The testing was done using EdgeR and 517

Page 21: R-loop proximity proteomics identies a role of DDX41 in

20

regions were deemed to be differentially bound with an FDR below 5%. GO term and pathway 518

analysis on the diff. bound regions /called peaks was performed using ClusterProfiler (v3.18) 519

and other Bioconductor packages notably rtracklayer and 520

GenomicFeatures/GenomicRanges79–81. Coverage tracks were created using Deeptools 521

(v3.4.1)82. Metagene / Enrichment around the TSS profiles were created using DeepTools 522

(v3.4.1) and further processed using custom R scripts82. Plots were generated using ggplot2 523

and further packages from the tidyverse83,84. Regions were assigned to features using 524

ChIPSeeker (v. 1.26) 85. 525

DNA fiber spreading assay 526

U2OS cells were labeled with 5-Chloro-2′-deoxyuridine (CldU, 30 µM) for 30 min, washed 527

once with warm PBS, then labeled for 30 min with 5-Iodo-2′-deoxyuridine (IdU, 340 µM). 528

Cells were either transfected with siDDX41/siCTRL for 24 h or treated with 0.1 µM 529

aphidicolin (APH) for 1.5 h. After labeling, cells were washed once with warm and 3× with 530

cold PBS, then trypsinized and spun down (300×g, 5 min). They were resuspended in cold 531

PBS, counted and diluted to 5×105/mL. Labeled cells were diluted with twice the number of 532

unlabeled cells. 4 μL of the cell suspension were mixed with 7.5 μL of the lysis buffer 533

(200mM Tris-HCl pH 7.4, 50 mM EDTA, 0.5% SDS) directly on the SuperFrost Plus 534

microscopy slide (Thermo Scientific) and incubated horizontally for 9 min. The slides were 535

then tilted at 30°−45°, allowing DNA fibers to spread to the bottom of the slide. DNA spreads 536

were air-dried and fixed with 3:1 methanol:acetic acid overnight at 4°C. The fibers were 537

rehydrated 3×3 min in PBS, dipped once in Milli-Q water and denatured in 2.5 M HCl for 1.5 h 538

at RT, then washed 5×3 min in PBS. The slides were blocked for 40 min in the blocking 539

solution (2% BSA, 0.1% Tween 20 in PBS) and incubated with primary antibodies (mouse 540

anti-BrdU, 1:100, BD Bioscience and rat anti-BrdU, 1:500, Abcam) at RT for 2.5 h. After 3×5 541

min washes with PBS-T, the slides were incubated with secondary antibodies (goat anti-mouse 542

Cy3.5, Abcam and goat anti-rat Cy5, Abcam) at RT in the dark for 1 h. The spreads were 543

washed 3×x5 min with PBS-T, dipped in Milli-Q water and air-dried completely in the dark. 544

The slides were mounted using Prolong Gold AntiFade mountant (Thermo Scientific), imaged 545

with Visiscope 5-Elements Spinning Disc Confocal microscope (Visitron Systems, Germany) 546

(magnification: 60× Water immersion objective with 2x extra magnification; laser lines and 547

Page 22: R-loop proximity proteomics identies a role of DDX41 in

21

corresponding emission filters: 640nm, 692/40 and 561nm 623/32) and quantified using the 548

Fiji/ImageJ software75. 549

Immunofluorescence 550

Cells were washed 2× with PBS, incubated with 0.4% NP-40 for 20 or 40 min on ice and 551

washed 2× with PBS-T (0.1%). Cells were fixed with 4% paraformaldehyde in PBS for 15 552

min at room temperature, washed 2× with PBS-T (0.1%) and permeabilized with Triton-X-553

100 (0.3%) for 5 min at room temperature, followed by 2× washes with PBS. Cells were 554

blocked for 1 hour with 5% fetal bovine serum albumin in PBS-T (0.1%) containing penicillin 555

and streptomycin. Incubation with primary antibodies diluted in blocking buffer was 556

performed overnight at 4°C and followed by 3× washes with PBS-T and 1 hour incubation 557

with Alexa Fluor-coupled secondary antibodies in a dark chamber at room temperature. Nuclei 558

were counterstained with 1µg/ml Hoechst33342 in PBS either simultaneously with secondary 559

antibody incubation or for 30 minutes. For chromatin retention assay permeabilization, 560

blocking and antibody incubations steps were omitted. Cells were washed 2× with PBS-T and 561

kept at 4°C in PBS until imaging. Imaging was performed with an Opera Phenix (PerkinElmer) 562

microscope using a 40× 1.1NA water objective. Image analysis was performed by using 563

Harmony High-Content Imaging and Analysis Software (version 4.4, PerkinElmer). Standard 564

building blocks allowed for nuclei segmentation based on the Hoechst signal and cells on the 565

edges of the field were excluded. Mean intensity measurements were performed for maximum 566

projections and spot detection was calculated by using algorithm B. 567

RNA-sequencing 568

RNA was extracted using the RNeasy Plus Mini Kit (Qiagen). In brief, cells were lysed and 569

genomic DNA was depleted. Samples were treated with DNase to remove residual DNA. After 570

purification using spin columns, RNA was eluted in RNase-free water and stored at -80°C 571

until library preparation. NGS library prep was performed with Illumina's TruSeq stranded 572

mRNA LT Sample Prep Kit following TruSeq Stranded mRNA Reference Guide (Oct.2017) 573

(Document # 1000000040498v00). Libraries were prepared with a starting amount of 1000 ng 574

and amplified in 10 PCR cycles. Libraries were profiled in a High Sensitivity DNA on a 2100 575

Bioanalyzer (Agilent Technologies) and quantified using the Qubit dsDNA HS Assay Kit, in 576

a Qubit 2.0 Fluorometer (Life Technologies). All 15 samples were pooled in equimolar ratio 577

Page 23: R-loop proximity proteomics identies a role of DDX41 in

22

and sequenced on a NextSeq 500 Highoutput FC, SR for 1×84 cycles plus 7 cycles for the 578

index read. 579

RNA-sequencing analysis 580

Libraries were de-multiplexed using blc2fastq (v2.19). Samples were mapped using STAR 581

(v2.7) against hg38 with the Gencode annotation (v25)86,87. Reads per gene were counted using 582

featureCounts (v.1.6) 88. The differential expression analysis was performed using 583

Bioconductor (v2.46)/DESeq2 (v1.26)89,90. Genes were deemed sigificantly diff. regulated 584

with an FDR below 1%. Coverage tracks were normalised and created using deepTools 585

(v3.4.1)82. Genes were deemed expressed within the analysis if they were tested for diff. 586

expression in the DESeq2 analysis. 587

Hematopoietic stem and progenitor cell experiments 588

For DDX41 knockdown experiments primary human CD34+ cells (from cord blood, purchased 589

from Lonza) were transfected with two different shRNA constructs (TL305064C; 590

GCTATGCAGACCAAGCAGGTCAGCAACAT; TL305064D; 591

GCGTGCGGAAGAAATACCACATCCTGGTG) (Origene) using Amaxa® Human CD34+ 592

Cell Nucleofector® Kit according to the manufacturer’s recommendations (Lonza). Similarly, 593

for ectopic expression of DDX41 WT, DDX41 L237F P238T, and DDX41 R525H, 594

respectively, primary human CD34+ cells (from cord blood, purchased from Lonza) were 595

transfected using Amaxa® Human CD34+ Cell Nucleofector® Kit according to the 596

manufacturer’s recommendations (Lonza). Cells were cultured in StemSpan SFEM II 597

supplemented with myeloid expansion supplement containing SCF, TPO, G-CSF and GM-598

CSF (Stemcell Technologies) for 24 h before isolation of GFP+ cells with a BD FACSMelody 599

cell sorter, using double sorting to ensure maximum purity (BD Biosciences). For 600

immunofluorescence experiments, cells were seeded onto glass slides, fixed with 4% PFA for 601

10 min, permeabilized using 0.15% Triton X-100 for 2 min and blocked in 1% BSA/PBS. R-602

loops were detected using antibody S9.6 (Kerafast) followed by Alexa Fluor® 488-conjugated 603

goat anti-mouse antibody (Thermo Fisher Scientific) and 53BP1 was detected using anti-604

53BP1 (nb100-904; Novus Biologicals) followed by Alexa Fluor® 594-conjugated goat anti-605

rabbit antibody (Thermo Fisher Scientific). Slides were mounted in VectaShield containing 1 606

µg/ml DAPI (Vector Laboratories). Images were acquired on a DMi8 Leica inverted 607

microscope (100x objective) and processed using LasX software (Leica). Quantification of 608

Page 24: R-loop proximity proteomics identies a role of DDX41 in

23

mean fluorescence intensity (MFI) was performed using the following formula: mean 609

fluorescence of selected cell – (area of selected cell × mean fluorescence of background 610

readings). Values are displayed as arbitrary units (A.U.). In order to assess DDX41 knockdown 611

efficiency, about 5000 eGFP+ cells were sorted in RLT buffer (Qiagen) for RNA extraction 612

followed by cDNA synthesis. DDX41 expression was quantified using qRT-PCR (DDX41-613

fw-qPCR, DDX41-rev-qPCR). 614

DDX41 DEAD box purification 615

His6-DDX41 (153-410) was expressed from a pET28 vector in E.coli (BL21 DE3 codon+). 616

Cells were lysed in lysis buffer (30 mM Tris-Cl, 500 mM NaCl, 15 mM imidazole, 1 mM 617

DTT, cOmplete protease inhibitors, 1 mM MgCl2, 150 U/ml Benzonase, pH 8.0) using 618

sonication (Branson Sonifier 450, 9 mm tip, 20% duty cycle, output 7, 6 minutes total). The 619

lysate was cleared by centrifugation (40000×g, 4°C, 30 min) and loaded onto a HisTrap ff 5 620

ml column (Cytiva) using a NGC Quest Plus FPLC system (Biorad). The column was washed 621

with wash buffer (30 mM Tris-Cl, 500 mM NaCl, 15 mM imidazole, pH 8.0), followed by a 622

second wash with wash buffer containing 40 mM imidazole. His6-DDX41 (153-410) was 623

eluted by applying a linear gradient of 40-500 mM imidazole in wash buffer. Peak elution 624

fractions were pooled and diluted 1:10 in Heparin wash buffer (30 mM Na-Hepes, 25 mM 625

NaCl, 5% glycerol). Diluted His-DDX41 (153-410) was loaded onto a HiTrap Heparin 5 ml 626

column. After the column was washed with Heparin wash buffer, His-DDX41 (153-410) was 627

eluted using a linear gradient of 25-1000 mM NaCl in Heparin wash buffer. Peak fractions 628

were pooled and concentrated using an Amicon 15 ml spin concentrator with 10 kDa cut off 629

(Merck Millipore). Concentrated His6-DDX41 (153-410) was applied onto a gel filtration 630

column (Superdex 200 16/60 pg, Cytiva in 30 mM Na-Hepes, 150 mM NaCl, 1 mM DTT, 631

10% Glycerol, pH 7.5). Peak fractions containing His-DDX41 (153-410) were pooled and 632

protein concentration was determined by using absorbance spectroscopy and the respective 633

extinction coefficient at 280 nm. Aliquots of the protein were snap-frozen in liquid nitrogen 634

and stored at -80C. 635

DDX41 full length purification 636

His6-3C-DDX41 WT and R525H were expressed in SF9 insect cells using the Bac-to-Bac 637

system (Thermo Fisher). SF9 cells grown in SF900 III media (Thermo Fisher) were infected 638

at a cell density of 1 million cells/ml using 10 ml of the respective DDX41 baculovirus. Cells 639

Page 25: R-loop proximity proteomics identies a role of DDX41 in

24

were harvested for purification 2 days after infection. Cells were lysed in 50 ml lysis buffer 640

(30 mM Tris-Cl, 500 mM NaCl, 15 mM imidazole, 1 mM MgCl2, 1 mM DTT, 5 % glycerol, 641

0.5 % Triton X-100, 100 U/ml Benzonase, EDTA-free cOmplete protease inhibitor cocktail, 642

pH 8.0) and lysed using sonication (9 mm tip on a Branson Sonifier 450, output 3, 20% duty 643

cycle, 3 minutes total). The cell lysate was cleared by centrifugation (40000×g, 30 min at 4°C) 644

and loaded onto a HisTrap FF crude 1 ml column (Cytiva) using a NGC Quest Plus FPLC 645

system (Biorad). The column was washed with 25 ml wash buffer (30 mM Tris-Cl, 500 mM 646

NaCl, 15 mM imidazole, 5% glycerol, pH 8.0), followed by a 10 ml wash with wash buffer 647

containing 40 mM imidazole. DDX41 variants were eluted from the column by applying a 648

linear gradient from 40 to 500 mM imidazole in wash buffer over 20 ml total volume. Peak 649

elution fractions were pooled and concentrated to a volume of 500 µl using an Amicon Ultra-650

4 spin concentrator with 10 kDa cut off (Merck Millipore). Concentrated His6-3C-DDX41 651

versions were subjected to gelfiltration (Superdex 200 10/300 increase, Cytiva, in 30 mM Na-652

Hepes, 150 mM NaCl, 1 mM DTT, 10% Glycerol, pH 7.5). Peak fractions containing DDX41 653

were pooled and subjected to 3C-mediated cleavage of the His6-tag overnight. Tagless DDX41 654

was separated from 3C protease using another round of gel filtration. Peak fractions containing 655

tagless DDX41 were pooled and protein concentration was determined by using absorbance 656

spectroscopy and the respective extinction coefficient at 280 nm. Aliquots of the protein were 657

snap-frozen in liquid nitrogen and stored at -80°C. 658

Flow cytometry 659

For the HBD-GFP retention assay, HBD-GFP expression in U2OS cells was induced with 660

1µg/ml doxycycline (Sigma-Aldrich) for 48 h. Cells were collected with trypsin and pre-661

extracted with 0.05% Triton X-100 in PBS for 3 min at room temperature. After fixation with 662

4% PFA and subsequent PBS washes, cells were analyzed on an LSRFortessa SORP (Becton 663

Dickinson). 30,000 cells were measured. The 488 nm laser and 530/30 band pass filter were 664

used for analyzing GFP fluorescence. DNA content was analyzed using 1µg/ml Hoechst33342 665

(Excitation 355 nm, Emission 450/50 BP). Downstream data analysis was performed with 666

FlowJo (v10.5.3, Becton Dickinson). After gating for singlets (FSC-A/FSC-H, then Hoechst-667

A/Hoechst-H), the proportion of GFP-positive cells was quantified by setting a threshold 668

above fluorescence of DMSO-treated control cells. 669

670

Page 26: R-loop proximity proteomics identies a role of DDX41 in

25

Fluorescence activated cell sorting 671

Expression of N-terminally GFP-tagged DDX41 WT, L237F+P238T or R525H in OCI-AML3 672

cells was induced with 3 µg/ml doxycycline (Sigma-Aldrich). 72 h after induction, cells were 673

spun down and washed twice with PBS. After re-suspending cells in PBS, they were sorted by 674

FACS using a 100 µM nozzle on a BD FACSAria III SORP (Becton Dickinson) in purity 675

precision mode with FACSDiva software version 8.0.2. Cells of interest were identified via 676

FSC-A/SSC-A. Subsequently, doublets were excluded via FSC-A/FSC-H and dead cells were 677

excluded by DAPI staining (0.5 µg/ml final concentration) using a 405 nm laser and 450/50 678

BP. GFP cutoff was set according to non-expressing cells. Sorting based on GFP was achieved 679

using the 488 nm laser and 530/30 band pass filter. Roughly 500.000 cells were sorted directly 680

into fresh a-MEM containing 20% FBS and further cultured until subsequent experiments. 681

682

Table 1: List of antibodies used in this study 683

Protein name Product number Origin Dilution (WB/IF)

γH2AX A300-081A-M Bethyl (1:1000/1:500)

53BP1 MAB3802 Millipore (-/1:200)

pRPA (S33) A300-246A-M Bethyl (-/1:200)

DDX41 15076 Cell Signaling 1:1000

GFP sc-9996 Santa Cruz 1:1000

S9.6 ENH001 Kerafast 1:10000

dsDNA ab27156 Abcam 1:1000

AQR A302-547A Bethyl 1:2000

DDX42 SAB1407136 Sigma 1:1000

DDX39A SAB2700315 Sigma 1:1000

FLAG M2 F1804 Sigma 1:2000

Β-Actin A2228 Sigma 1:10000

DHX37 A300-856A-M Bethyl 1:1000

BrdU (mouse) 347580 BD Bioscience 1:100

BrdU(rat) ab6326 Abcam 1:500

Anti-mouse Cy3.5 Ab6946 Abcam 1:100

Anti-rat Cy5 Ab6565 Abcam 1:100

P65 sc-372 Santa Cruz (-/1:200)

Page 27: R-loop proximity proteomics identies a role of DDX41 in

26

Table 2: List of siRNA used in this study 684

Gene name Sequence 5’-3’ or origin

AQR CUGAAUAUGGCGGUGUAGU

DDX41 L-010394-00-0005 – Horizon Discovery

DDX42 L-012393-01-0005 – Horizon Discovery

DHX37 L-019073-00-0005– Horizon Discovery

DDX39A L-004920-01-0005 – Horizon Discovery

DDX27 L-013635-01-0005 – Horizon Discovery

Si control pool D-001820-10 – Horizon Discovery

Table 3: List of oligos used in this study 685

Construct Sequence 5’-3’

DDX41-fw-qPCR GTCCGTGAAAGAGCAGATGGAG

DDX41-rev-qPCR GTAGCGACAGATGTCTAGGCTG

HBD-backbone-fw cctcctcacggcatagaacatggtggagcctgcttttttgtacaaagttgg

HBD-backbone-rev cctttgtcaggaaatctgcaagcgacccagctttcttgtacaaagt

HBD-insert-fw ccaactttgtacaaaaaagcaggctccaccatgttctatgccgtgaggagg

HBD-insert rev actttgtacaagaaagctgggtcgcttgcagatttcctgacaaagg

HBD-W43A-fw ctttctgaccgcgaatgagtgcagagcacaggtggaccg

HBD-W43A-rev accccggtcttgcggccc

HBD-K59A-K60A-fw tgccagatttgcggcgtttgccacagaggatgaggc

HBD-K59A-K60A-rev gcaggaaaccggtccacc

DDX41-R525H-fw CGCACCGGGCACTCGGGAAAC

DDX41-R525H_rev GCCAATCCGGTGTACATAGTTCTC

DDX41-L237F-P238T-fw TGTTCACGTTTACCGTCATCATG

DDX41-L237FP238T-rev CCAGTGTCTTGCCTGAAC

DDX41-3C-rev ACCGGGCCCCTGGAACAGAAC

DDX41-L153-fw TTCTGTTCCAGGGGCCCGGTCTGAGCATGTCTGAAGAGC

DDX41-Q410-rev gtgctcgagtgcggccgctcactggatgacatccaggctg

DNA-12mer-fw GACACCTGATTC-6-FAM

DNA-12mer-rev GAATCAGGTGTC

RNA-12mer-fw GACACCTGATTC-6-FAM

RNA-12mer-rev GAATCAGGTGTC

DNA-38mer-IBFQ TAAAACAAAACAAAACAAAACAAAATCTTTACGGTGCT-IBFQ

RNA-13mer-6-FAM 6-FAM-AGCACCGUAAAGA

DNA-38mer TAAAACAAAACAAAACAAAACAAAATCTTTACGGTGCT

686

Page 28: R-loop proximity proteomics identies a role of DDX41 in

27

References 687

1. Bradner, J. E., Hnisz, D. & Young, R. A. Transcriptional Addiction in Cancer. Cell 168, 688

629–643 (2017). 689

2. Chédin, F., Ginno, P. A., Lott, P. L., Christensen, H. C. & Korf, I. R-Loop Formation 690

Is a Distinctive Characteristic of Unmethylated Human CpG Island Promoters. Mol. 691

Cell 45, 814–825 (2012). 692

3. Arab, K. et al. GADD45A binds R-loops and recruits TET1 to CpG island promoters. 693

Nat. Genet. 51, 217–223 (2019). 694

4. Grunseich, C. et al. Senataxin Mutation Reveals How R-Loops Promote Transcription 695

by Blocking DNA Methylation at Gene Promoters. Mol. Cell 69, 426-437.e7 (2018). 696

5. Skourti-Stathaki, K., Proudfoot, N. J. & Gromak, N. Human Senataxin Resolves 697

RNA/DNA Hybrids Formed at Transcriptional Pause Sites to Promote Xrn2-Dependent 698

Termination. Mol. Cell 42, 794–805 (2011). 699

6. Crossley, M. P., Bocek, M. & Cimprich, K. A. R-Loops as Cellular Regulators and 700

Genomic Threats. Mol. Cell 73, 398–411 (2019). 701

7. Graf, M. et al. Telomere Length Determines TERRA and R-Loop Regulation through 702

the Cell Cycle. Cell 170, 72-85.e14 (2017). 703

8. El Hage, A., Webb, S., Kerr, A. & Tollervey, D. Genome-Wide Distribution of RNA-704

DNA Hybrids Identifies RNase H Targets in tRNA Genes, Retrotransposons and 705

Mitochondria. PLoS Genet. 10, (2014). 706

9. Niehrs, C. & Luke, B. Regulatory R-loops as effectors of gene expression and genome 707

stability. Nat Rev Mol Cell Biol. 21, 167–178 (2021). 708

10. Balk, B. et al. Telomeric RNA-DNA hybrids affect telomere-length dynamics and 709

senescence. Nat. Struct. &Amp; Mol. Biol. 20, 1199 (2013). 710

11. Kabeche, L., Nguyen, H. D., Buisson, R. & Zou, L. A mitosis-specific and R loop-711

driven ATR pathway promotes faithful chromosome segregation. Science (80-. ). 712

(2018). doi:10.1126/science.aan6490 713

12. Sanz, L. A. et al. Prevalent, Dynamic, and Conserved R-Loop Structures Associate with 714

Specific Epigenomic Signatures in Mammals. Mol. Cell 63, 167–178 (2016). 715

13. De Magis, A. et al. DNA damage and genome instability by G-quadruplex ligands are 716

mediated by R loops in human cancer cells. Proc. Natl. Acad. Sci. U. S. A. 116, 816–717

Page 29: R-loop proximity proteomics identies a role of DDX41 in

28

825 (2019). 718

14. Zhang, C. et al. METTL3 and N6-Methyladenosine Promote Homologous 719

Recombination-Mediated Repair of DSBs by Modulating DNA-RNA Hybrid 720

Accumulation. Mol. Cell 79, 425-442.e7 (2020). 721

15. Yang, X. et al. m6A promotes R-loop formation to facilitate transcription termination. 722

Cell Res. 29, 1035–1038 (2019). 723

16. Abakir, A. et al. N 6-methyladenosine regulates the stability of RNA:DNA hybrids in 724

human cells. Nat. Genet. 52, 48–55 (2020). 725

17. Guan, A. et al. A Genome-wide siRNA Screen Reveals Diverse Cellular Processes and 726

Pathways that Mediate Genome Stability. Mol. Cell 35, 228–239 (2009). 727

18. Mersaoui, S. Y. et al. Arginine methylation of the DDX 5 helicase RGG / RG motif by 728

PRMT 5 regulates resolution of RNA:DNA hybrids . EMBO J. 38, 1–20 (2019). 729

19. Sollier, J. et al. Transcription-Coupled Nucleotide Excision Repair Factors Promote R-730

Loop-Induced Genome Instability. Mol. Cell 56, 777–785 (2014). 731

20. Aguilera, A. & Gómez-González, B. DNA-RNA hybrids: The risks of DNA breakage 732

during transcription. Nat. Struct. Mol. Biol. 24, 439–443 (2017). 733

21. Aguilera, A. The connection between transcription and genomic instability. EMBO J. 734

21, 195 LP – 201 (2002). 735

22. Okamoto, Y. et al. FANCD2 protects genome stability by recruiting RNA processing 736

enzymes to resolve R-loops during mild replication stress. FEBS J. 286, 139–150 737

(2019). 738

23. Cristini, A. et al. Dual Processing of R-Loops and Topoisomerase I Induces 739

Transcription-Dependent DNA Double-Strand Breaks. Cell Rep. 28, 3167-3181.e6 740

(2019). 741

24. Belen Gomez-Gonzalez and Andres Aguilera. Activation-induced cytidine deaminase 742

action is. Proc. Natl. Acad. Sci. 104, 8409–8414 (2007). 743

25. Tuduri, S. et al. Topoisomerase I suppresses genomic instability by preventing 744

interference between replication and transcription. Nat. Cell Biol. 11, 1315–1324 745

(2009). 746

26. Zeman, M. K. & Cimprich, K. A. Causes and consequences of replication stress. Nat. 747

Cell Biol. 16, 2–9 (2013). 748

Page 30: R-loop proximity proteomics identies a role of DDX41 in

29

27. Domínguez-Sánchez, M. S., Barroso, S., Gómez-González, B., Luna, R. & Aguilera, A. 749

Genome instability and transcription elongation impairment in human cells depleted of 750

THO/TREX. PLoS Genet. 7, 19–22 (2011). 751

28. Pefanis, E. et al. RNA exosome-regulated long non-coding RNA transcription controls 752

super-enhancer activity. Cell 161, 774–789 (2015). 753

29. Massé, E. & Drolet, M. R-loop-dependent hypernegative supercoiling in Escherichia 754

coli topA mutants preferentially occurs at low temperatures and correlates with growth 755

inhibition. J. Mol. Biol. 294, 321–332 (1999). 756

30. Lockhart, A. et al. RNase H1 and H2 Are Differentially Regulated to Process RNA-757

DNA Hybrids. Cell Rep. 29, 2890-2900.e5 (2019). 758

31. Pérez-Calero, C. et al. UAP56/DDX39B is a major cotranscriptional RNA–DNA 759

helicase that unwinds harmful R loops genome-wide. Genes Dev. 34, 1–15 (2020). 760

32. Weinreb, J. T. et al. Excessive R-loops trigger an inflammatory cascade leading to 761

increased HSPC production Article Excessive R-loops trigger an inflammatory cascade 762

leading to increased HSPC production. Dev. Cell 1–14 (2021). 763

doi:10.1016/j.devcel.2021.02.006 764

33. Tsukamoto, T. et al. Insights into the Involvement of Spliceosomal Mutations in 765

Myelodysplastic Disorders from Analysis of SACY-1/DDX41 in Caenorhabditis 766

elegans . Genetics 214, 869–893 (2020). 767

34. Jiang, Y., Zhu, Y., Liu, Z. J. & Ouyang, S. The emerging roles of the DDX41 protein 768

in immunity and diseases. Protein Cell 8, 83–89 (2017). 769

35. Liu, Y. & Imai, R. Function of plant DExD/H-Box RNA helicases associated with 770

ribosomal RNA biogenesis. Front. Plant Sci. 9, 1–7 (2018). 771

36. Maciejewski, J. P., Padgett, R. A., Brown, A. L. & Müller-Tidow, C. DDX41-related 772

myeloid neoplasia. Semin. Hematol. 54, 94–97 (2017). 773

37. Kadono, M. et al. Biological implications of somatic DDX41 p . R525H mutation in 774

acute myeloid leukemia. Exp. Hematol. 44, 745-754.e4 (2016). 775

38. Polprasert, C. et al. Inherited and Somatic Defects in DDX41 in Myeloid Neoplasms. 776

Cancer Cell 27, 658–670 (2015). 777

39. Lam, S. S. et al. Directed evolution of APEX2 for electron microscopy and proteomics. 778

Nat Methods 12, 51–54 (2015). 779

Page 31: R-loop proximity proteomics identies a role of DDX41 in

30

40. Nowotny, M. et al. Specific recognition of RNA/DNA hybrid and enhancement of 780

human RNase H1 activity by HBD. EMBO J. 27, 1172–1181 (2008). 781

41. Yan, Q., Shields, E. J., Bonasio, R. & Sarma, K. Mapping Native R-Loops Genome-782

wide Using a Targeted Nuclease Approach. Cell Rep. 29, 1369-1380.e5 (2019). 783

42. Yan, Q. & Sarma, K. MapR: A Method for Identifying Native R-Loops Genome Wide. 784

Curr. Protoc. Mol. Biol. 130, 1–12 (2020). 785

43. Cristini, A., Groh, M., Kristiansen, M. S. & Gromak, N. RNA/DNA Hybrid Interactome 786

Identifies DXH9 as a Molecular Player in Transcriptional Termination and R-Loop-787

Associated DNA Damage. Cell Rep. 23, 1891–1905 (2018). 788

44. Wang, I. X. et al. Human proteins that interact with RNA/DNA hybrids. Genome Res. 789

28, 1405–1414 (2018). 790

45. Parajuli, S. et al. Human ribonuclease H1 resolves R-loops and thereby enables 791

progression of the DNA replication fork. J. Biol. Chem. 292, 15216–15224 (2017). 792

46. Zhao, H., Zhu, M., Limbo, O. & Russell, P. RNase H eliminates R‐loops that disrupt 793

DNA replication but is nonessential for efficient DSB repair. EMBO Rep. 19, 1–10 794

(2018). 795

47. Hamperl, S., Bocek, M. J., Saldivar, J. C., Swigut, T. & Cimprich, K. A. Transcription-796

Replication Conflict Orientation Modulates R-Loop Levels and Activates Distinct DNA 797

Damage Responses. Cell 170, 774-786.e19 (2017). 798

48. Shin, J.-H. & Kelman, Z. The replicative helicases of bacteria, archaea, and eukarya can 799

unwind RNA-DNA hybrid substrates. J. Biol. Chem. 281, 26914–26921 (2006). 800

49. Kim, S. et al. ATAD5 restricts R-loop formation through PCNA unloading and RNA 801

helicase maintenance at the replication fork. Nucleic Acids Res. 48, 7218–7238 (2020). 802

50. Nakayama, R. T. et al. SMARCB1 is required for widespread BAF complex-mediated 803

activation of enhancers and bivalent promoters. Nat. Genet. 49, 1613–1623 (2017). 804

51. Wang, X. et al. SMARCB1-mediated SWI/SNF complex function is essential for 805

enhancer regulation. Nat. Genet. 49, 289–295 (2017). 806

52. Barroso, S. The yeast and human FACT chromatin- reorganizing complexes solve R-807

loop- mediated transcription – replication conflicts. Genes Dev. 28, 735–748 (2014). 808

53. Tsai, S. et al. ARID1A regulates R-loop associated DNA replication stress. bioRxiv 809

2020.11.16.384446 (2020). doi:10.1101/2020.11.16.384446 810

Page 32: R-loop proximity proteomics identies a role of DDX41 in

31

54. Pühringer, T. et al. Structure of the human core transcription-export complex reveals a 811

hub for multivalent interactions. Elife 9, 1–65 (2020). 812

55. Stäßer, K. et al. TREX is a conserved complex coupling transcription with messenger 813

RNA export. Nature 417, 304–308 (2002). 814

56. Cheah, J. J. C., Hahn, C. N., Hiwase, D. K., Scott, H. S. & Brown, A. L. Myeloid 815

neoplasms with germline DDX41 mutation. Int. J. Hematol. 106, 163–174 (2017). 816

57. Lewinsohn, M. et al. Novel germ line DDX41 mutations define families with a lower 817

age of MDS/AML onset and lymphoid malignancies. Blood 127, 1017–1023 (2016). 818

58. Qu, S. et al. Molecular and clinical features of myeloid neoplasms with somatic DDX41 819

mutations. Br. J. Haematol. 1–5 (2020). doi:10.1111/bjh.16668 820

59. Chen, L. et al. The Augmented R-Loop Is a Unifying Mechanism for Myelodysplastic 821

Syndromes Induced by High-Risk Splicing Factor Mutations. Mol. Cell 69, 412-425.e6 822

(2018). 823

60. Nguyen, H. D. et al. Spliceosome mutations induce R loop-associated sensitivity to 824

ATR inhibition in myelodysplastic syndromes. Cancer Res. 78, 5363–5374 (2018). 825

61. Flach, J. et al. Replication stress signaling is a therapeutic target in myelodysplastic 826

syndromes with splicing factor mutations. Haematologica Online ahe, (2020). 827

62. Vizcaino, J. A. et al. The PRoteomics IDEntifications (PRIDE) database and associated 828

tools: status in 2013. Nucleic Acids Res 41, D1063-9 (2013). 829

63. Ong, S.-E. et al. Stable isotope labeling by amino acids in cell culture, SILAC, as a 830

simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–831

386 (2002). 832

64. Michalski, A. et al. Mass spectrometry-based proteomics using Q Exactive, a high-833

performance benchtop quadrupole Orbitrap mass spectrometer. Mol Cell Proteomics 834

10, M111 011015 (2011). 835

65. Kelstrup, C. D., Young, C., Lavallee, R., Nielsen, M. L. & Olsen, J. V. Optimized fast 836

and sensitive acquisition methods for shotgun proteomics on a quadrupole orbitrap mass 837

spectrometer. J Proteome Res 11, 3487–3497 (2012). 838

66. Olsen, J. V et al. Higher-energy C-trap dissociation for peptide modification analysis. 839

Nat Methods 4, 709–712 (2007). 840

67. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized 841

Page 33: R-loop proximity proteomics identies a role of DDX41 in

32

p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. 842

Biotechnol. 26, 1367–1372 (2008). 843

68. Cox, J. et al. Andromeda: A peptide search engine integrated into the MaxQuant 844

environment. J. Proteome Res. 10, 1794–1805 (2011). 845

69. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-846

scale protein identifications by mass spectrometry. Nat Methods 4, 207–214 (2007). 847

70. Franceschini, A. et al. STRING v9.1: Protein-protein interaction networks, with 848

increased coverage and integration. Nucleic Acids Res. 41, D808-15 (2013). 849

71. Saito, R. et al. A travel guide to Cytoscape plugins. Nat Methods 9, 1069–1076 (2012). 850

72. Wang, J. et al. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological 851

Network. IEEE/ACM Trans. Comput. Biol. Bioinforma. 12, 815–822 (2015). 852

73. Mi, H. et al. PANTHER version 16: a revised family classification, tree-based 853

classification tool, enhancer regions and extensive API. Nucleic Acids Res. 49, D394–854

D403 (2021). 855

74. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web 856

server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016). 857

75. Schindelin, J. et al. Fiji - an Open platform for biological image analysis. Nat. Methods 858

9, (2009). 859

76. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. 860

Methods 9, 357–359 (2012). 861

77. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, (2008). 862

78. Stark, R. & Brown, G. DiffBind differential binding analysis of ChIP-Seq peak data. R 863

Packag. version 100, (2011). 864

79. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing 865

biological themes among gene clusters. OMICS 16, 284–287 (2012). 866

80. Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing 867

with genome browsers. Bioinformatics 25, 1841–1842 (2009). 868

81. Lawrence, M. et al. Software for Computing and Annotating Genomic Ranges. PLOS 869

Comput. Biol. 9, e1003118 (2013). 870

82. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data 871

analysis. Nucleic Acids Res. 44, W160-5 (2016). 872

Page 34: R-loop proximity proteomics identies a role of DDX41 in

33

83. Wickham, H. ggplot2: Elegant Graphics for Data Analysis Title. (Springer-Verlag New 873

York, 2016). 874

84. Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019). 875

85. Yu, G., Wang, L. G. & He, Q. Y. ChIP seeker: An R/Bioconductor package for ChIP 876

peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015). 877

86. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. 878

Nucleic Acids Res. 47, D766–D773 (2019). 879

87. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 880

(2013). 881

88. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program 882

for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014). 883

89. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and 884

dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 1–21 (2014). 885

90. Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. 886

Nat. Methods 12, 115–121 (2015). 887

91. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-888

sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015). 889

890

Page 35: R-loop proximity proteomics identies a role of DDX41 in

Figures

Figure 1

RDProx-Mapping R-loop-proximal proteome on native chromatin. a. Schematic representation of theRDProx work�ow for identi�cation of R-loop-proximal proteins. HBD or HBD-WKK fused N-terminally toAPEX2 were transiently expressed in SILAC-labeled HEK293T cells. Biotinylation was induced upon

Page 36: R-loop proximity proteomics identies a role of DDX41 in

addition of 500 µM biotin-phenol for 2 h at 37°C and 1 mM H2O2 for 2 min at room temperature.Samples were pooled after cell lysis and biotinylated proteins puri�ed using NeutrAvidin beads (FisherScienti�c). Denatured proteins were separated by SDS-PAGE and in-gel digested before LC-MS/MSanalysis. b. Volcano plot of protein groups identi�ed by RDProx in three biological replicates. Mean log2ratios of all replicates between HBD and HBD-WKK are plotted against the –log10 FDR. The FDR andenrichment were calculated using Limma 91. Signi�cantly enriched proteins are highlighted in blue (FDR< 0.01). Light blue indicates proteins in Tier2 (300 proteins) above 2- fold change of the mean ratio anddark blue indicates proteins in Tier1 (312 proteins) with a 4-fold change or higher.

Page 37: R-loop proximity proteomics identies a role of DDX41 in

Figure 2

Properties of R-loop-proximal proteins. a. Functional interaction network of proteins identi�ed by RDProx.Clusters were generated based on an MCODE algorithm. Semantically simpli�ed Reactome pathwayswith the lowest adjusted p-value (Fisher´s exact test, Bonferroni correction) are displayed alongside eachcluster. b. GO terms corresponding to each MCODE cluster are displayed. Terms with the lowest adjustedp- value (Fisher´s exact test, Bonferroni correction) for each GO term are plotted individually. Adjusted p-

Page 38: R-loop proximity proteomics identies a role of DDX41 in

values are shown on a –log10 scale. The color code indicates the relation between the cluster and GOterm. c. Bar plot displaying the results from a PFAM domain enrichment analysis of proteins associatedwith the individual clusters. The two PFAM domains with the highest –log10 adjusted p-value (Fisher´sexact test, Bonferroni correction) are plotted.

Figure 3

Page 39: R-loop proximity proteomics identies a role of DDX41 in

DDX41 depletion leads to replication stress and double strand breaks. a. U2OS cells were transfected withcontrol siRNA or with siRNAs against AQR, DDX27, DDX41, DDX42, DHX37 or DDX39A. Representativeimmuno�uorescence images for control siRNA, siAQR and siDDX41 of yH2AX (red) staining andHoechst33342 (blue) after 48 h of knockdown (right). Quanti�cation of the mean yH2AX intensity pernucleus (left). Center of boxplots indicate the median and whiskers the 10th-90th percentile. ****P-value <0.0001, One-way ANOVA. b. Immuno�uorescence analysis of yH2AX in U2OS cells after doxycyclineinducible expression of GFP-tagged M27-RNaseH1. Staining was performed 48 h after knockdown andRNaseH1 expression was induced 12 h post-knockdown with 1µg/ml doxycycline. Boxplot displaying thequanti�cation of the mean yH2AX intensity per nucleus for cells with medium GFP intensity (mediumM27-RNaseH1 expression). ****P-value < 0.0001, One-way ANOVA. c. Single cell electrophoresis of U2OScells after doxycycline inducible expression of HA-tagged M27-RNaseH1. After 48 h of knockdown and 36h of RNaseH1 induction, cells were embedded in low melting agarose and run in an electric �eld and totalDNA was visualized with SYBR Gold. DNA fragmentation was apparent by Comet-tail formation behindthe cell. Representative images of Comet-tails of the individual conditions are displayed (right).Quanti�cation of the tail moment of each cell with the CometScore software is shown on the left. Tailmoments for each measured cell are represented by individual dots. The black line indicates the medianof the population. **P-value < 0.01, ***p-value < 0.01, One-way ANOVA. d. Immuno�uorescence analysisof 53BP1 foci 48 h after transfection of U2OS cells with control siRNA or DDX41 siRNA. Number of fociper cell were counted automated using the Harmony software 4.8 (Perkin Elmer). Whiskers of the box plotrepresent the 10th-90th percentile. The black line indicates the median of the population. ****P-value <0.0001, unpaired student´s T-test. e. DNA �ber spreading assay of U2OS cells after 48 h knockdown ofDDX41. Additional controls were either treated with DMSO or 100 nM Aphidicolin for 1.5 h. Cells werelabeled with 30 µM CIdU for 30 min and afterwards with 340 µM IdU for 30 min before spreading the DNAand staining with the respective antibodies. Representative images of DNA �bers, white line indicates 10µm scale (right). Quanti�cation of �ber tract length in µm using the Fiji software (left). Each individualvalue is displayed and the black line indicates the median of the population. ****P-value < 0.0001, Mann-Whitney test. f. Immuno�uorescence analysis of pRPA (Ser33) in U2OS cells after 48 h knockdown ofAQR, DDX27, DDX41, DDX42, DHX37 and DDX39A. Cells were stained with Hoechst33342 (blue) and anantibody against pRPA (Ser33) (green). Representative images of cells transfected with control siRNA ora siRNA against AQR or DDX41 (right). Quanti�cation of the mean pRPA intensity per nucleus (left).Center of boxplots indicate the median and whiskers the 10th-90th percentile. ****P-value < 0.0001, One-way ANOVA.

Page 40: R-loop proximity proteomics identies a role of DDX41 in

Figure 4

DDX41 opposes R-loop accumulation in HSPCs. a. Proximity ligation assay between endogenous DDX41and GFP-tagged HBD or GFP-tagged WKK-HBD. Either GFP-tagged HBD or the WKK mutant weretransiently expressed in U2OS cells. 48 h after transfection, proximity ligation assay was performed usingprimary antibodies against DDX41 and GFP. Individual antibodies were left-out as a control to test thespeci�city. Proximal events were visualized by a rolling circle ampli�cation and resulted in PLA spots

Page 41: R-loop proximity proteomics identies a role of DDX41 in

(pink). The DNA was stained with Hoechst33342 (blue). Representative images for HBD-GFP and DDX41with the corresponding antibody leave-out controls (left). Quanti�cation of PLA spots in the nucleusbetween GFP and DDX41, comparing HBD to the HBD-WKK binding de�cient mutant (right). Each value isdisplayed individually and the black line indicates the median. ***P-value < 0.001, unpaired student´s T-test. b. Dot blot analysis in U2OS cells expressing GFP-tagged M27-RNaseH1 upon doxycycline addition.Genomic DNA was collected 48 h post-knockdown of AQR or DDX41. M27-RnaseH1 expression wasinduced with 1 µg/ml doxycycline 6 h after transfection. Half of the sample was treated with RNaseH1 tocontrol for the signal speci�city. The gDNA was spotted in different concentrations and the membranewas probed with the S9.6 antibody before being stripped and consequently probed with dsDNA antibodyas loading control. Representative images of membranes probed with S9.6 and dsDNA antibodies (left).Quanti�cation of the mean intensity per dot measured with Fiji (right). Ratios between S9.6 signal anddsDNA signal are displayed. *P-value < 0.05, one-way ANOVA. c. HBD retention assay in U2OS cells thatexpress GFP-tagged HBD upon 24 h doxycycline addition. Cells were pre-extracted with Triton X-100 tovisualize chromatin-bound HBD after 48 h knockdown of AQR or DDX41. Non-induced cells serve as acontrol to determine auto �uorescence of the cells. Control cells were treated with 10 µM DRB for 3 h toinhibit transcriptional elongation. The mean GFP intensity per nucleus was quanti�ed and displayed in abox plot. Black line indicates the median of the population and whiskers the 10th-90th percentiles. ****P-value < 0.0001, One-way ANOVA. d. Immuno�uorescence analysis of HSPCs after 24 h knockdown ofDDX41. Cells were nucleofected with plasmids encoding GFP and scrambled or DDX41 shRNA #C or #D.GFP-positive cells were sorted via FACS and seeded on coverslips. The S9.6 antibody (green) was used tovisualize RNA-DNA hybrids and DNA was counterstained with DAPI (blue). Representative images of S9.6staining in HSPCs (left). Quanti�cation of nuclear S9.6 intensity as a measure for RNA-DNA hybrid levels(right). Each dot represents a single measured value. The black line indicates the median. ****P-value <0.0001, One-way ANOVA. e. Representative images of 53BP1 (red) staining in HSPCs (left). DNA wascounterstained with DAPI (blue). Quanti�cation of nuclear 53BP1 intensity (right). Each dot represents asingle measured value. The black line indicates the median. ****P-value < 0.0001, One-way ANOVA. f.Immuno�uorescence analysis of HSPCs after expression of DDX41 WT, L237F+P238T or R525H mutantstagged with GFP. GFP-positive cells were sorted by FACS and used for the analysis. Quanti�cation of S9.6(left) or 53BP1 (right) intensity. Single values are individually displayed. The median is indicated by theblack line. n.s. = non-signi�cant, a.u. = arbitrary unit. * P-value<0.05, ****P-value < 0.0001, One-wayANOVA.

Page 42: R-loop proximity proteomics identies a role of DDX41 in

Figure 5

DDX41 binds and unwinds RNA-DNA hybrids in vitro. a. Schematic representation of DDX41 domainorganization. Location of L237F/P238T and R525H mutations identi�ed in AML/MDS patients aredepicted. Numbers represent corresponding amino acid positions. b. Fluorescence polarization assay offull length DDX41 puri�ed protein and indicated nucleic acid substrates coupled to a 6-FAM �uorophorein three technical replicates with individually thawed protein aliquots. Titrated DDX41 was incubated

Page 43: R-loop proximity proteomics identies a role of DDX41 in

together with 20 nM of the respective 6- FAM oligonucleotide substrate and the change in �uorescencepolarization was measured upon binding. The protein concentration in µM on a log2-scale is plottedagainst the �uorescence polarization in mP (milipolarization unit). Dots represent mean values of allreplicates and whiskers indicate the standard deviation from the mean. Individual lines indicate theresults obtained from the Michaelis-Menten �t. c. ADP-Glo assay measuring ATP hydrolysis of full lengthDDX41 in 5 technical replicates using individually thawed protein aliquots. DDX41 was titrated from 5 µMto 0 µM and incubated together with 100 nM RNA-DNA hybrid substrate with a ssDNA overhang and 5 µMATP for 60 minutes at 37°C. ATP to ADP conversion was measured by luciferase activity. Results aredisplayed as percentage of consumed ATP compared to total ATP, based on an interpolation against astandard curve. Individual measurements are plotted and mean of all replicates is indicated by the blackdashed line. Standard deviation from the mean is represented by error bars. d. FRET-based RNA-DNAhybrid displacement assay. Full length DDX41 protein in indicated concentrations is incubated togetherwith 100 nM RNA-DNA hybrid substrate with a ssDNA overhang and 5 µM ATP. Displacement of the IBFQ-conjugated 38-mer DNA oligo from the 6- FAM-conjugated 13-mer RNA oligo was measured by thechange in �uorescence intensity. The obtained intensities are normalized on the intensity from the single-stranded 6-FAM-conjugated RNA oligo after background subtraction and expressed as percentageunwound. e. FRET-based RNA:DNA hybrid displacement assay. Different concentrations of full lengthDDX41, R525H or 153-410 mutant were incubated together with 100 nM RNA:DNA hybrid substrate with assDNA overhang and 5 µM ATP. Displacement of the IBFQ-conjugated 38-mer DNA oligo from the 6-FAM-conjugated 13-mer RNA oligo was measured by the change in �uorescence intensity. The obtainedintensities are normalized on the intensity from the single-stranded 6-FAM-conjugated RNA oligo afterbackground subtraction and expressed as percentage unwound. *P-value < 0.05, **P-value < 0.01, ****P-value < 0.0001, One-way ANOVA.

Page 44: R-loop proximity proteomics identies a role of DDX41 in

Figure 6

DDX41 depletion results in R-loop accumulation at CGI promoters. a. MapR in three biological replicatesfrom U2OS cells after 48 h knockdown with control siRNA (grey), DDX41 siRNA (blue) or treatment with 4µM Actinomycin D for 6 hours (orange). Metagene pro�les of MapR signal from -2kb/+2kb around thetranscription start site of all expressed genes within the RNA-sequencing analysis of U2OS cells, dividedin 4 quantiles. Shadows indicate the standard error of the median. b. MA plot of the MapR differential

Page 45: R-loop proximity proteomics identies a role of DDX41 in

peak analysis using Diffbind and statistical testing by EdgeR. The region +/- 500 bp around the summitof the consensus peaks was used for the Diffbind analysis. Consensus peaks were created as peakswhich were overlapping in at least 2 samples. The mean log2 fold change between siCtrl and siDDX41 isplotted against the log2 average counts per million representing the coverage. Genomic regions that aredifferentially bound (<5% FDR) are highlighted in red (up) or in blue (down). Peaks >5% FDR are indicatedin grey. The black line indicates a Loess �t to indicate the relationship between fold change and coverage.c. Bar plot displaying the number of differentially-bound regions with annotated CpG islands. Theproportion of genomic regions overlapping with CGIs (red) or not-overlapping regions (green) aredepicted. Left bar represents down-regulated MapR regions and right bar up-regulated MapR regions afterDDX41 knockdown compared to the control knockdown. d. Network of the Reactome pathwayenrichment analysis of the up-regulated genomic sites (with higher binding) after DDX41 knockdowncompared to control knockdown. All expressed genes based on RNA-seq were used as background. Peakswhich annotation was not associated with genes (distal intergenic) were excluded from the analysis. Thesize of the dots indicates the number of genes contributing to the displayed term. Gradual coloringrepresents the adjusted p- values (Fisher´s exact test with Bonferroni-Holm correction). e. Overlapbetween genes with higher R-loop levels based on MapR after DDX41 knockdown and their changes ingene expression obtained by RNA-seq. The circle represents the proportion of genes that are higher boundin MapR and either up- (turquois), down- (blue) or not signi�cantly regulated (grey) in RNA-seq. Reactomepathways enriched below 5% adjusted p-value for each segment are depicted. Genes contributing to theindividual term are displayed in a String network with 0.7 con�dent score and colored based on theira�liation.

Page 46: R-loop proximity proteomics identies a role of DDX41 in

Figure 7

Model for DDX41 function in R-loop homeostasis. a. Wild type DDX41 associates with R-loops andmaintains R-loop homeostasis at CGI promoters by unwinding RNA-DNA hybrids. Pathogenic DDX41variants found in AML display impaired RNA-DNA hybrid unwinding activity, leading to the accumulationof R-loops at CGI promoters and altered expression of TGFβ and NOTCH signaling genes. Accumulationof R-loops at CGI promoters results also in increased replication stress, DSBs and in�ammatory signaling,which renders DDX41 mutated AML cells vulnerable to ATR inhibitors.

Supplementary Files

This is a list of supplementary �les associated with this preprint. Click to download.

TableS1.xlsx

TableS2.xlsx

TableS3.xlsx