
Regional Workshop on Development of Radon Maps and the Definition of Radon-Prone Areas

Javier Elio (javiereliomedina@gmail.com)

Vilnius, 9 - 11 July 2019

Software: R (and QGIS)

Reasons for Geoscientists and Engineers to Learn Coding (Michael Pyrcz, University of Texas at Austin - @GeostatGuy)

Caveats:

1. Any type of coding, scripting, or workflow matched to your working environment is great. We don’t need to be C++ experts (me: actually, I know nothing about C++!).

2. We need the experience component of geoscience and engineering expertise. This goes beyond coding and is essential to workflow logic development, best use of data, etc.

3. Some expert judgement will remain subjective and not completely reproducible. I’m not advocating for the geoscientist or engineer being replaced by a computer.

Transparency – no compiler accepts hand waving! Coding forces your logic to be uncovered for any other scientist or engineer to review.

Reproducibility – run it, get an answer, hand it over, run it, get the same answer. This is a main principle of the scientific method.

Quantification – programs need numbers. Feed the program and discover new ways to look at the world.

Open-source – leverage a world of brilliance. Check out packages and snippets, and be amazed by what great minds have freely shared.

Break Down Barriers – don’t throw it over the fence. Sit at the table with the developers and share more of your subject matter expertise for a better product.

Deployment – share it with others and multiply the impact. Performance metrics or altruism, your good work benefits many others.

Efficiency – minimize the boring parts of the job. Build a suite of scripts to automate common tasks and spend more time doing science and engineering!

Always Time to Do it Again! – how many times did you only do it once? It probably takes 2-4 times as long to script and automate a workflow. Usually worth it.

Be Like Us – it will change you. Users feel limited, programmers truly harness the power of their application and hardware.

R software

R-CRAN: https://cran.r-project.org/

R Studio

R Studio: https://www.rstudio.com/

Introduction

https://r4ds.had.co.nz/

https://bookdown.org/rdpeng/rprogdatascience/

Index

1. Exploratory analysis
   • Outliers, histograms, Q-Q plots, etc.
   • Non-detected values
   • ANOVA analysis
   • Spatial distribution of indoor radon measurements (2D kernel density plots)

2. Summary statistics by small areas (e.g. grids, municipalities, …)
   • Map summary statistics (e.g. N, AM)
   • Map probability of having an indoor radon concentration higher than a reference level

3. Interpolation techniques
   • Inverse distance weighted (IDW)
   • Ordinary kriging (OK)

4. Dose maps

5. Interactive maps


Lognormal distribution?

Box-Cox transformation:

$$ Y(\lambda) = \begin{cases} \dfrac{X^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log(X), & \lambda = 0 \end{cases} $$

[Figure: four panels. Histogram Indoor Radon (x: Rn [Bq m−3], y: Density); Q-Q plot (InRn) (x: InRn$Rn, y: norm quantiles); Box-Cox Transformation (x: λ, y: log-likelihood, with the 95% interval marked); Q-Q plot (log InRn) (x: InRn$LogRn, y: norm quantiles).]
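The Box-Cox check for log-normality can be sketched in base R. This is a minimal illustration, not the workshop script: the data are simulated, and the names `box_cox`, `box_cox_loglik` and `rn` are hypothetical. It profiles the log-likelihood of λ for an intercept-only normal model; a λ close to 0 supports the log transform.

```r
# Box-Cox transform; lambda = 0 is the log transform
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (intercept-only normal model, constants dropped)
box_cox_loglik <- function(x, lambda) {
  y <- box_cox(x, lambda)
  n <- length(x)
  sigma2 <- mean((y - mean(y))^2)   # maximum-likelihood variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

set.seed(1)
# Simulated indoor radon values in Bq m-3 (hypothetical parameters)
rn <- rlnorm(1000, meanlog = log(25), sdlog = log(3.8))

lambdas <- seq(-1, 1, by = 0.01)
ll <- sapply(lambdas, function(l) box_cox_loglik(rn, l))
lambda_hat <- lambdas[which.max(ll)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
ci <- range(lambdas[ll > max(ll) - qchisq(0.95, df = 1) / 2])
```

If the interval `ci` contains 0, working with `log(rn)` is a reasonable choice. The `boxcox` function in the MASS package produces a comparable profile plot.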

How to treat non-detected values?

What can we do with values below the detection limit (DL)?

1. Assign a value equal to the DL. This overestimates the mean.

2. Report as “zero”. This is not realistic for indoor radon and probably underestimates the mean.

3. Report as half of the DL (or another fraction). This may be adequate if the percentage of non-detected values is low (< 10-15%). However, it alters the variance of the data, which may lead to inaccurate results.

4. Assign a concentration based on a statistical estimation. This is effective for datasets with a high proportion of detects (> 50%).

5. If the percentage of non-detected values is large (> 50%), we may only consider whether radon was detected (or not) above some level.

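[Diagram: measurement ranges along the concentration axis. Zero (no element); not detected, up to the detection limit; detected but NOT quantified, up to the quantification limit; detected AND quantified, up to the upper limit (saturation of the detector).]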

Imputation methods

The final method depends on multiple factors (e.g. the number of data, the proportion of non-detected values, the objectives of the survey). Here, I present one method for replacing non-detected values, ROS (Regression on Order Statistics), but if you think censored data are an issue in your analysis, you should consult a statistician.

Summary statistics before and after imputation:

Statistic        Original data   After ROS
N                1000            1000
AM               64.76           64.73
SD               133.75          133.77
GM               24.67           23.18
GSD              3.77            4.28
Prob[Rn > 200]   5.73%           6.91%

[Figure: normal Q-Q plots. Original data (x: InRn$LogRn, y: norm quantiles); After ROS (x: InRn_DL$LogRn, y: norm quantiles).]
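The idea behind ROS can be sketched in base R for the simplest case of a single detection limit. This is a simplified illustration, not the implementation in the NADA package: the function name `ros_single_dl`, the detection limit of 10 Bq m-3 and the simulated data are all assumptions. Detected values are regressed (on the log scale) on their normal scores, and the fitted line is used to fill in the censored values.

```r
# Simplified ROS for one detection limit (sketch, not the NADA implementation)
ros_single_dl <- function(x, censored) {
  n <- length(x)
  n_cen <- sum(censored)
  n_det <- n - n_cen
  p_below <- n_cen / n   # estimated probability of being below the DL

  # Plotting positions: censored values share (0, p_below), detects share (p_below, 1)
  pp_cen <- p_below * seq_len(n_cen) / (n_cen + 1)
  pp_det <- p_below + (1 - p_below) * seq_len(n_det) / (n_det + 1)

  # Regress log(detected values) on their normal scores
  det_sorted <- sort(x[!censored])
  fit <- lm(log(det_sorted) ~ qnorm(pp_det))
  b <- unname(coef(fit))

  # Impute censored values from the fitted line; detects are kept unchanged
  out <- x
  out[censored] <- exp(b[1] + b[2] * qnorm(pp_cen))
  out
}

set.seed(42)
true_rn <- rlnorm(1000, meanlog = log(25), sdlog = log(3.8))  # simulated data
dl <- 10                                 # hypothetical detection limit (Bq m-3)
censored <- true_rn < dl
obs <- ifelse(censored, dl, true_rn)     # what the laboratory would report

rn_ros <- ros_single_dl(obs, censored)
gm  <- exp(mean(log(rn_ros)))
gsd <- exp(sd(log(rn_ros)))
```

The imputed values are only meaningful as a group (for summary statistics such as GM and GSD); which censored record receives which value is arbitrary.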

Histogram – Boxplot – QQ Plot

[Figure: three panels for the log-transformed data. Histogram (x: LogRn [Bq m−3], y: Density); Boxplot (LogRn [Bq m−3], lognormal transformation); Normal Q-Q plot (x: Observed Value, y: Expected Normal Value).]

Plot spatial distribution of the data

[Figure: two maps of the indoor radon measurements (simulated), 55°N to 56°N, 23°E to 24°E. First map: points classed by concentration (Bq/m3): [0,50), [50,100), [100,200), [200,300), [300,500), [500,1.43e+03]. Second map: points classed as < 200 or >= 200 Bq/m3.]

2D Kernel density plots

InRn vs. Geology (ANOVA)

[Figure: geology map 1:1M, 55°N to 56°N, 23°E to 24°E (legend AgeName: Early Cretaceous, Early Triassic, Late Devonian, Late Jurassic, Late Permian, Middle Jurassic), and boxplots of LogRn [Bq m−3] by Geology (AgeName).]

anova(lm_BG)

Analysis of Variance Table

Response: LogRn
           Df  Sum Sq Mean Sq F value    Pr(>F)
AgeName     5   46.23  9.2450  4.4495 0.0005156 ***
Residuals 994 2065.32  2.0778
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
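A table of this kind can be produced with a one-way linear model. The sketch below assumes a data frame `InRn` with columns `LogRn` and `AgeName` (names taken from the plot labels); the values and group effects are simulated and purely hypothetical, so the numbers will differ from those shown.

```r
set.seed(123)
ages <- c("Early Cretaceous", "Early Triassic", "Late Devonian",
          "Late Jurassic", "Late Permian", "Middle Jurassic")

# Simulated data: geological unit of each dwelling and log radon concentration
InRn <- data.frame(
  AgeName = factor(sample(ages, 1000, replace = TRUE), levels = ages)
)
effect <- c(0, 0.2, -0.1, 0.4, 0.1, -0.3)   # hypothetical unit effects (log scale)
InRn$LogRn <- 3.2 + effect[as.integer(InRn$AgeName)] + rnorm(1000, sd = 1.4)

# One-way ANOVA: does mean LogRn differ between geological units?
lm_BG <- lm(LogRn ~ AgeName, data = InRn)
tab <- anova(lm_BG)
print(tab)
```

With six geological units and 1000 dwellings, the degrees of freedom are 5 and 994, as in the table above. A small `Pr(>F)` indicates that at least one unit has a different mean log concentration.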


Summary statistics

Number of dwellings sampled (N) and arithmetic mean (AM) by grid cells of 10 km x 10 km.

[Figure: two maps, 55°N to 56°N, 23°E to 24°E. Number of data (N, colour scale 0 to 20); Arithmetic mean (Bq/m3; classes [0,25), [25,50), [50,75), [75,100), [100,200), [200,274], NA).]
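Counting dwellings and averaging by grid cell can be sketched in base R as follows. The sketch assumes the measurements already have projected coordinates in kilometres; the column names `x_km` and `y_km`, the extent and the simulated values are hypothetical.

```r
set.seed(7)
InRn <- data.frame(
  x_km = runif(1000, 0, 60),     # hypothetical projected easting (km)
  y_km = runif(1000, 0, 110),    # hypothetical projected northing (km)
  Rn   = rlnorm(1000, meanlog = log(25), sdlog = log(3.8))
)

# Assign each measurement to a 10 km x 10 km cell
cell_size <- 10
InRn$cell <- paste(floor(InRn$x_km / cell_size),
                   floor(InRn$y_km / cell_size), sep = "_")

# Number of dwellings (N) and arithmetic mean (AM) per cell
N  <- tapply(InRn$Rn, InRn$cell, length)
AM <- tapply(InRn$Rn, InRn$cell, mean)
grid_stats <- data.frame(cell = names(N), N = as.vector(N), AM = as.vector(AM))
head(grid_stats)
```

Cells without measurements do not appear in `grid_stats`; they correspond to the NA class in the map legend.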

Probabilistic map

• Objective:
  • Estimate the probability of having an indoor radon concentration higher than a reference level (e.g. 200 Bq m-3)

• Method:
  • Select the radon measurements in each grid cell (or municipality, small area, district, etc.)
  • Calculate the GM and the GSD in each grid cell
  • Assuming a log-normal distribution, estimate the probability of having a value above the reference level (RL)
  • In grid cells with few data (e.g. N ≤ 5), estimates are based on neighbouring grid cells (interpolation, IDW)

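[Figure: histogram of InRn (x: InRn, 0 to 500; y: Frequency) with GM = 57, GSD = 2.4 and RL = 200 marked, next to a standard normal curve (µ = 0, σ = 1) on which the GM and the standardized reference level are marked.]

$$ z = \frac{\ln(RL) - \ln(GM)}{\ln(GSD)} $$

[Figure: map of Prob[InRn > 200 Bq m-3] (%), 55°N to 56°N, 23°E to 24°E; classes [0,1), [1,5), [5,10), [10,20), [20,30), [30,44.5].]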
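Under the log-normal assumption, the probability of exceeding the reference level is the upper tail of the standard normal distribution at z = (ln RL − ln GM) / ln GSD. A minimal base R sketch follows; the function names `prob_above` and `cell_prob` and the `min_n` threshold argument are hypothetical.

```r
# Probability that a log-normal variable with the given GM and GSD exceeds rl
prob_above <- function(gm, gsd, rl = 200) {
  z <- (log(rl) - log(gm)) / log(gsd)
  pnorm(z, lower.tail = FALSE)
}

# Per-cell estimate; cells with fewer than min_n measurements return NA
# so that they can be filled in later from neighbouring cells
cell_prob <- function(rn, rl = 200, min_n = 6) {
  if (length(rn) < min_n) return(NA_real_)
  prob_above(exp(mean(log(rn))), exp(sd(log(rn))), rl)
}

prob_above(57, 2.4)        # example values from the slide: GM = 57, GSD = 2.4
prob_above(24.67, 3.77)    # GM and GSD reported earlier for the whole data set
```

The second call gives a value close to the 5.73% reported with the summary statistics of the original data, which is a useful consistency check on the formula.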


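Points for interpolation

Degrees vs. distance (not projected data: great-circle distance)

Degrees   Distance N/S or E/W (approx.)
1.0       111.32 km
0.1       11.132 km
0.01      1.1132 km

The E/W distances hold at the equator; at latitude φ they shrink by a factor of cos φ (about 0.57 at 55°N).

[Figure: map of the points used for interpolation, 55°N to 56°N, 23°E to 24°E.]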
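For unprojected coordinates, a great-circle distance can be computed with the haversine formula. The sketch below is a base R illustration with a hypothetical function name `haversine_km`; it assumes a spherical Earth with the WGS84 equatorial radius (6378.137 km), which is the value that reproduces 111.32 km per degree.

```r
# Great-circle distance (km) between points given in decimal degrees
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6378.137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

d_ns <- haversine_km(23, 55, 23, 56)   # 1 degree N/S
d_ew <- haversine_km(23, 55, 24, 55)   # 1 degree E/W at 55°N
```

At the latitude of the study area, one degree of longitude is much shorter than one degree of latitude, so distances should not be computed directly on degrees.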

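Inverse Distance Weighted (IDW)

$$ \hat{Z}(s_0) = \frac{\sum_{i=1}^{n} \dfrac{1}{d_i^{\,\mathrm{idp}}}\, Z(s_i)}{\sum_{i=1}^{n} \dfrac{1}{d_i^{\,\mathrm{idp}}}} $$

where $d_i$ is the distance between the prediction location $s_0$ and the observation at $s_i$, and idp is the inverse distance power.

[Figure: RMSE from 10-fold cross-validation (y axis, about 118 to 130) against idp (x axis, 1.0 to 4.0).]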
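Inverse distance weighting and the choice of idp by 10-fold cross-validation can be sketched in base R. This is an illustration on simulated data with projected coordinates in km; the function names `idw_predict` and `idw_cv_rmse` are hypothetical, and the `idw` function in the gstat package would normally be used instead.

```r
# IDW prediction at (x0, y0) from observations z at (x, y)
idw_predict <- function(x, y, z, x0, y0, idp = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(mean(z[d == 0]))   # exact at observed locations
  w <- 1 / d^idp
  sum(w * z) / sum(w)
}

# k-fold cross-validation RMSE for one value of idp
idw_cv_rmse <- function(x, y, z, idp, k = 10) {
  fold <- sample(rep(seq_len(k), length.out = length(z)))
  err <- numeric(length(z))
  for (f in seq_len(k)) {
    test <- fold == f
    for (i in which(test)) {
      err[i] <- z[i] - idw_predict(x[!test], y[!test], z[!test], x[i], y[i], idp)
    }
  }
  sqrt(mean(err^2))
}

set.seed(10)
x <- runif(300, 0, 60)                 # hypothetical easting (km)
y <- runif(300, 0, 110)                # hypothetical northing (km)
z <- rlnorm(300, meanlog = log(25), sdlog = log(3.8))

idps <- seq(1, 4, by = 0.5)
# Reset the seed so that every idp is evaluated on the same folds
rmse <- sapply(idps, function(p) { set.seed(11); idw_cv_rmse(x, y, z, p) })
best_idp <- idps[which.min(rmse)]
```

The idp with the lowest cross-validated RMSE is then used for the final predictions.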

[Figure: two prediction maps (Bq/m3), 55°N to 56°N, 23°E to 24°E. IDW - Predictions: classes [0,50), [50,100), [100,200), [200,300), [300,500), [500,648]. OK - Predictions: classes [0,50), [50,100), [100,200), [200,300), [300,500), [500,529].]

Ordinary kriging (OK)

Fitted variogram model (var1):

Model   psill   range
Nug     0.54    0.00
Exp     1.91    11.47

[Figure: Variogram. Sample variogram and fitted model (x: dist, 0 to 40; y: gamma, 0 to 2.5).]
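The fitted variogram (nugget 0.54, exponential partial sill 1.91, range parameter 11.47) can be written as a small base R function. The sketch assumes the gstat parameterisation of the exponential model, gamma(h) = nugget + psill · (1 − exp(−h / range)); the function name `vgm_exp` is hypothetical.

```r
# Nugget plus exponential variogram model (gstat "Exp" parameterisation assumed)
vgm_exp <- function(h, nugget = 0.54, psill = 1.91, range = 11.47) {
  ifelse(h == 0, 0, nugget + psill * (1 - exp(-h / range)))
}

h <- c(0, 5, 11.47, 40)
gamma_h <- vgm_exp(h)

# Total sill (nugget + partial sill) reached at large distances
total_sill <- 0.54 + 1.91
```

With this parameterisation the model reaches about 95% of the partial sill at roughly three times the range parameter, which is consistent with a curve that levels off near 2.45 within the plotted distances.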

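Trans-Gaussian kriging using Box-Cox transforms:

Predictions are made on the transformed data and then back-transformed without bias to the original scale using the Lagrange multiplier (function krigeTg in R, packages “gstat” and “MASS”).

$$ \hat{Z}(s_0) = \phi\big(\hat{Y}_{OK}(s_0)\big) + \phi''(\hat{\mu})\left(\frac{\sigma^2_{OK}(s_0)}{2} - m\right) $$

$$ \phi^{-1}(x) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log x, & \lambda = 0 \end{cases} $$

$$ \phi(x) = \begin{cases} (x \lambda + 1)^{1/\lambda}, & \lambda \neq 0 \\ e^{x}, & \lambda = 0 \end{cases} $$

$$ \phi''(x) = \begin{cases} (1 - \lambda)(x \lambda + 1)^{1/\lambda - 2}, & \lambda \neq 0 \\ e^{x}, & \lambda = 0 \end{cases} $$

Here $\hat{Y}_{OK}(s_0)$ and $\sigma^2_{OK}(s_0)$ are the ordinary kriging prediction and variance on the transformed scale, $\hat{\mu}$ is the estimated mean of the transformed data, and $m$ is the Lagrange multiplier.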
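The back-transformation step of trans-Gaussian kriging can be sketched in base R. This only illustrates the correction term, with hypothetical function names and input values; it does not perform the kriging itself, which `krigeTg` handles.

```r
# Box-Cox transform (phi_inv), its inverse (phi) and second derivative (phi_dd)
phi_inv <- function(z, lambda) {
  if (lambda == 0) log(z) else (z^lambda - 1) / lambda
}
phi <- function(y, lambda) {
  if (lambda == 0) exp(y) else (lambda * y + 1)^(1 / lambda)
}
phi_dd <- function(y, lambda) {
  if (lambda == 0) exp(y) else (1 - lambda) * (lambda * y + 1)^(1 / lambda - 2)
}

# Back-transform an ordinary kriging prediction made on the transformed scale
# y_ok: OK prediction, var_ok: OK variance, mu_hat: estimated mean,
# m: Lagrange multiplier (all on the transformed scale)
back_transform <- function(y_ok, var_ok, mu_hat, m, lambda) {
  phi(y_ok, lambda) + phi_dd(mu_hat, lambda) * (var_ok / 2 - m)
}

# Hypothetical values for one prediction location, log transform (lambda = 0)
naive     <- phi(3.5, lambda = 0)
corrected <- back_transform(y_ok = 3.5, var_ok = 1.2, mu_hat = 3.2,
                            m = 0.05, lambda = 0)
```

The naive back-transform `phi(y_ok)` ignores the kriging variance; the correction term accounts for it, so `corrected` differs from `naive` whenever the variance term is not offset by the Lagrange multiplier.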

Radon concentration (AM) by small areas

In our case, grid cells of 10 km x 10 km, but we could choose other geometries (e.g. municipalities, districts, grid cells of 1 km x 1 km, etc.).

[Figure: three maps of indoor radon by grid cell (Bq/m3), 55°N to 56°N, 23°E to 24°E. Arithmetic mean: classes [0,25), [25,50), [50,75), [75,100), [100,200), [200,274], NA. IDW - Predictions: classes [0,25), [25,50), [50,75), [75,100), [100,195), [195,200], NA. OK - Predictions: classes [0,25), [25,50), [50,75), [75,100), [100,200), [200,253], NA.]


Dose map

Assumption:

$$ \mathrm{Dose}\ (\mathrm{mSv/y}) = C_{Rn} \cdot F_E \cdot T \cdot F_O \cdot F_D $$

Challenges (to be further investigated):

1. Annual indoor radon concentration (CRn):
   a) Predictions over small areas (grid cells of 10 km x 10 km, 1 km x 1 km, municipalities, districts, etc.). Can we improve predictions? New statistical models (e.g. ML)? Other secondary variables (e.g. soil permeability)?
   b) Floor correction model

2. Equilibrium factor (FE): converts CRn to the Equivalent Equilibrium Concentration (EEC) of radon daughters. Take default values (e.g. 0.4, UNSCEAR)? Regional trends? Dependence on usage?

3. Time spent indoors (occupancy factor, FO)? UNSCEAR recommends a value of 0.8. Can we use the same value for the whole country (e.g. differences between rural and urban areas)? The same value for different nationalities?

4. Time spent at home vs. at workplaces? Rn characteristics at workplaces are in general different from those in dwellings. Can we model this?

5. Commuting patterns. In most cases the workplace is not at the same location as the home, and it is sometimes quite far away (people commute 100 km). How to model such an effect?

6. Dose conversion factor (FD): dose coefficient applied to the EEC. International recommendation: 9·10−6 mSv per Bq m−3 h (under discussion).
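The dose formula is a simple product and can be sketched in base R with the default values quoted above (FE = 0.4, FO = 0.8, FD = 9·10−6 mSv per Bq m−3 h) and T = 8760 h/y. The function name `annual_dose` and the example concentrations are hypothetical.

```r
# Annual dose (mSv/y) from the annual indoor radon concentration (Bq m-3)
annual_dose <- function(c_rn, f_e = 0.4, f_o = 0.8, f_d = 9e-6, t_h = 8760) {
  c_rn * f_e * t_h * f_o * f_d
}

# Doses for three example concentrations
doses <- annual_dose(c(50, 100, 200))
```

With these defaults the dose is proportional to the concentration: 100 Bq m-3 corresponds to about 2.5 mSv/y.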

Indoor radon (AM - SD)

Select the best method according to our data

[Figure: three maps of indoor radon by grid cell (Bq/m3), 55°N to 56°N, 23°E to 24°E: Arithmetic mean (up to [200,274]), IDW - Predictions (up to [195,200]) and OK - Predictions (up to [200,253]); cells without data are shown as NA.]

Dose map

Assumption:

$$ \mathrm{Dose}\ (\mathrm{mSv/y}) = C_{Rn} \cdot F_E \cdot T \cdot F_O \cdot F_D $$

Initial try (to be adjusted):

• Indoor radon concentration at ground floor level and standard dose conversion factors.

• Uncertainty analysis by Monte Carlo simulation (Elío et al., Environment International 114: 69–76, 2018):

  Nsim = 100
  CRn ~ N(AM, SD) [truncated, InRn > 0]
  FE ~ LN(0.4, 1.15)
  FO ~ N(0.8, 0.03)
  FD ~ N(9·10−6, 1.5·10−6)
  T = 8760 h/y

• Map the AM and SD of the simulated values.

[Figure: two maps, 55°N to 56°N, 23°E to 24°E. Radiation dose - AM (mSv/y, colour scale 2 to 6); Radiation dose - SD (mSv/y, colour scale 1 to 3).]
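The Monte Carlo step for a single grid cell can be sketched in base R. Two points are assumptions rather than facts from the slides: LN(0.4, 1.15) is read as a log-normal distribution with GM = 0.4 and GSD = 1.15, and the cell values AM = 100 and SD = 30 Bq m-3 are invented for the example. Function names are hypothetical.

```r
# Normal distribution truncated to positive values, by rejection sampling
rnorm_pos <- function(n, mean, sd) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rnorm(n, mean, sd)
    out <- c(out, draw[draw > 0])
  }
  out[seq_len(n)]
}

# Simulate the annual dose (mSv/y) for one grid cell
simulate_dose <- function(am, sd_rn, n_sim = 100) {
  c_rn <- rnorm_pos(n_sim, am, sd_rn)
  f_e  <- rlnorm(n_sim, meanlog = log(0.4), sdlog = log(1.15))  # assumed GM, GSD
  f_o  <- rnorm(n_sim, 0.8, 0.03)
  f_d  <- rnorm(n_sim, 9e-6, 1.5e-6)
  c_rn * f_e * 8760 * f_o * f_d
}

set.seed(2019)
dose <- simulate_dose(am = 100, sd_rn = 30)   # hypothetical grid cell
dose_am <- mean(dose)
dose_sd <- sd(dose)
```

Repeating this for every grid cell gives the two quantities that are mapped: the mean and the standard deviation of the simulated doses.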

Interactive maps (leaflet)

It is possible to save the map as a web page (HTML) and upload it to the web:

File:///C:/Users/elioj/Documents/JAVIER_Trabajo/IAEA/2019_07_Lithuany/R-Lithuania/Rresults/Interactive_OK_Pred_Map.html
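A minimal sketch of building and saving an interactive map follows, assuming the leaflet and htmlwidgets packages are installed. The point locations, values and the output file name `radon_map.html` are hypothetical.

```r
pts <- data.frame(
  lon = c(23.2, 23.5, 23.8),   # hypothetical locations (decimal degrees)
  lat = c(55.2, 55.6, 55.9),
  Rn  = c(40, 150, 320)        # hypothetical values (Bq/m3)
)

if (requireNamespace("leaflet", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  # Base map with one clickable marker per point
  m <- leaflet::leaflet(pts)
  m <- leaflet::addTiles(m)
  m <- leaflet::addCircleMarkers(m, lng = ~lon, lat = ~lat,
                                 popup = ~paste(Rn, "Bq/m3"), radius = 6)
  # Save as an HTML page that can be opened in a browser or uploaded
  htmlwidgets::saveWidget(m, file = "radon_map.html", selfcontained = FALSE)
}
```

With `selfcontained = FALSE` the page is written together with a folder of supporting files, which must be uploaded alongside it.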

Geographic Information System

https://qgis.org/en/site/
