John Walker
http://www.fourmilab.ch/
An Attempt to Replicate the Shnoll et al. Effect
with Algorithmic Classification of Histogram Similarity
Principal Goal
Development of an easily-replicated stochastic source and an accompanying computer-based toolkit for exploring time-dependence in histogram structure and automated techniques for histogram similarity ranking.
Stochastic Source
Oxford Nuclear 5.0 µCi 137Cs 661.6 keV gamma source (US$40)
Aware Electronics RM-80 Geiger-Müller detector with serial port interface (US$319)
Generic PC with MS-DOS and a serial port
Modified HotBits generator software (public domain)
Event rate 200,000 counts/min
Background 60 counts/min
Generator Software
MS-DOS (not Windows) 16-bit program
Direct port access from assembly language
Interval timing from PC ROM BIOS clock
Time of day synchronised with Network Time Protocol
Design Goals:
Small footprint: “retired” PCs suitable as generators
Consistent hardware-based interval timing
Accurate detection and accumulation of counts
Measurements precisely labeled with date and time
Raw Data Format
One Measurement Record per minute, beginning at the start of the minute
100 consecutive Count Windows per minute, each consisting of nine ticks of the 18.2 Hz PC hardware clock
Mean counts per Count Window: approximately 1900
CSV output record written in “housekeeping time” between the end of the 100th Count Window and the start of the next minute: Unix time() at start of minute followed by 100 comma-separated count values
File size: 735 KB/day
964310400,1890,1964,1898,1902,1840,1901,1842,1916,1886,1901,1838,1932,1880,1985,1910,1883,1919,1903,1895,1913,1899,1902,1870,1914,1897,1858,1854,1855,1893,1860,1948,1837,1887,1865,1888,1882,1914,1914,1905,1903,1898,1930,1892,1883,1926,1903,1861,1899,1951,1900,1856,1877,1861,1861,1865,1882,1850,1882,1910,1874,1870,1893,1926,1923,1880,1889,1911,1885,1913,1863,1883,1918,1910,1933,1945,1891,1873,1910,1861,1850,1888,1948,1902,1881,1939,1948,1861,1870,1897,1938,1895,1896,1889,1912,1919,1867,1847,1899,1937,1890
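A minimal sketch of reading one such record, assuming the timestamp is interpreted as UTC. The function name `parse_record` and the constant `WINDOWS_PER_RECORD` are illustrative and not taken from the original toolkit.

```python
from datetime import datetime, timezone

WINDOWS_PER_RECORD = 100  # Count Windows per one-minute Measurement Record

def parse_record(line):
    """Split one raw CSV record into (start_of_minute, counts)."""
    fields = line.strip().split(",")
    if len(fields) != 1 + WINDOWS_PER_RECORD:
        raise ValueError(
            f"expected {1 + WINDOWS_PER_RECORD} fields, got {len(fields)}")
    # Assumption: the Unix time() value is treated as UTC.
    start = datetime.fromtimestamp(int(fields[0]), tz=timezone.utc)
    counts = [int(f) for f in fields[1:]]
    return start, counts

# Synthetic record for illustration: every window set to 1900 counts.
sample = "964310400," + ",".join(["1900"] * WINDOWS_PER_RECORD)
start, counts = parse_record(sample)
print(start.isoformat(), len(counts), sum(counts) / len(counts))
```

Rejecting records that do not carry exactly 100 count values makes truncated lines visible early, before they can distort a histogram.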
Analysis Software
Reads one or more days’ Raw Data CSV files
Assembles count histograms into Experiments of 10 minutes each
Processing stages: Raw Data → Histogram Compilation → Transformation Modules → Histogram Pair Assembly → Matching Modules → Closeness Sort → Ranking Table
Histogram Compilation
Arranges raw data into 10-minute Experiments, each beginning at a round 10-minute interval. (Intervals with missing data are discarded.)
Builds in-memory raw histograms (number of occurrences of a given count in the interval)
Computes exponentially smoothed moving average (P = 0.2) of the histogram, symmetrically from the mean
Creates histogram CSV files (raw and smoothed) for each experiment for subsequent analysis
Plots each experiment’s histogram as a GIF file
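The compilation steps might be sketched as follows. The slide does not spell out the smoothing recurrence, so this is one possible reading, assumed here: start at the bin holding the mean and apply s[i] = P·h[i] + (1 − P)·s[previous] outward in both directions. All function names are hypothetical.

```python
from collections import Counter, defaultdict

EXPERIMENT_SECONDS = 600      # ten-minute Experiments
RECORDS_PER_EXPERIMENT = 10   # one-minute records per Experiment

def group_experiments(records):
    """Group (unix_time, counts) records by round 10-minute boundary,
    discarding Experiments that lack any of their ten records."""
    groups = defaultdict(list)
    for t, counts in records:
        groups[t - t % EXPERIMENT_SECONDS].append(counts)
    return {start: [c for rec in recs for c in rec]
            for start, recs in sorted(groups.items())
            if len(recs) == RECORDS_PER_EXPERIMENT}

def raw_histogram(counts):
    """Return (lo, bins); bins[i] is how often the count lo + i occurred."""
    tally = Counter(counts)
    lo, hi = min(tally), max(tally)
    return lo, [tally[v] for v in range(lo, hi + 1)]

def smooth_from_mean(bins, centre, p=0.2):
    """Exponential smoothing outward from index `centre` (assumed reading)."""
    out = [float(b) for b in bins]
    for i in range(centre + 1, len(bins)):      # sweep right of the mean
        out[i] = p * bins[i] + (1 - p) * out[i - 1]
    for i in range(centre - 1, -1, -1):         # sweep left of the mean
        out[i] = p * bins[i] + (1 - p) * out[i + 1]
    return out
```

For a histogram returned as `(lo, bins)`, the centre index would be `round(mean_count) - lo`.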
Transformation Modules
Open-ended plug-in modules transform experiment histograms in place:
NORMALISE: Scales histogram values so that the maximum value is 1
FOURIER: Replaces histogram with its Fourier transform
WAVELET: Replaces histogram with its discrete wavelet transform using the Daubechies 4-coefficient filter coefficients
Multiple transforms can be enabled; new transforms can be added.
Transformed histograms and their inverses can be plotted for debugging.
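As an illustration of the plug-in idea, the sketch below registers two transforms in a dictionary and applies them in sequence. The registry layout and function names are assumptions. The wavelet function performs only a single level of the Daubechies 4-coefficient transform with periodic wrap-around, which may differ from what the original module does.

```python
import math

def normalise(h):
    """Scale histogram values so that the maximum value is 1."""
    peak = max(h)
    return [v / peak for v in h] if peak else list(h)

_S3, _D = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
# Standard Daubechies 4-coefficient low-pass and high-pass filters.
D4_LOW = [(1 + _S3) / _D, (3 + _S3) / _D, (3 - _S3) / _D, (1 - _S3) / _D]
D4_HIGH = [D4_LOW[3], -D4_LOW[2], D4_LOW[1], -D4_LOW[0]]

def d4_step(h):
    """One transform level: smooth coefficients followed by detail ones."""
    n = len(h)
    if n < 4 or n % 2:
        raise ValueError("length must be even and at least 4")
    smooth, detail = [], []
    for i in range(0, n, 2):
        window = [h[(i + k) % n] for k in range(4)]  # periodic wrap-around
        smooth.append(sum(c * v for c, v in zip(D4_LOW, window)))
        detail.append(sum(c * v for c, v in zip(D4_HIGH, window)))
    return smooth + detail

# Hypothetical registry: new transforms are added by inserting an entry.
TRANSFORMS = {"NORMALISE": normalise, "WAVELET": d4_step}

def apply_transforms(h, names):
    """Apply the enabled transforms in the order given."""
    for name in names:
        h = TRANSFORMS[name](h)
    return h
```

A call such as `apply_transforms(bins, ["NORMALISE", "WAVELET"])` would chain both steps; a constant input yields zero detail coefficients, which is a convenient sanity check.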
Histogram Pair Assembly
All pairs of histograms are tabulated in memory
Assumes matching algorithm is commutative (but this can be changed)
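Pair assembly can be sketched with the standard library. The `commutative` flag below is a hypothetical parameter illustrating how the assumption could be switched off, in which case each ordered pair is tabulated separately.

```python
from itertools import combinations, permutations

def assemble_pairs(experiment_ids, commutative=True):
    """List every pair of distinct experiments, excluding self/self."""
    ids = list(experiment_ids)
    if commutative:
        return list(combinations(ids, 2))   # (a, b) only, never (b, a)
    return list(permutations(ids, 2))       # both orderings

print(assemble_pairs(["A", "B", "C"]))
```

With a commutative metric, n experiments produce n(n − 1)/2 pairs; dropping the assumption doubles that figure.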
Matching Modules (1)
Plug-in modules which, given a pair of experiment histograms, return a floating-point metric of how “close” they are in morphology.
MEAN-ALIGNED χ²: Histograms are shifted so their mean values align, then the χ² distance between the curves is computed.
SLIDING, MIRRORED χ²: Histograms are initially aligned at their mean value, then the histogram pair and the pair with one histogram mirrored about the mean are shifted along the X axis and the minimum χ² distance is reported.
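The sliding, mirrored comparison could look roughly like the sketch below. Several points are assumptions rather than the original method: the distance is the symmetric form Σ (a − b)² / (a + b), both histograms are taken to be equal-length lists already centred on their means (so mirroring is a plain reversal), and `max_shift` is a hypothetical parameter.

```python
def chi2_distance(a, b):
    """Symmetric chi-squared distance (assumed form), skipping empty bins."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def shift(h, k):
    """Shift a histogram k bins to the right (left if k < 0), zero-padded."""
    n = len(h)
    if k >= 0:
        return [0.0] * min(k, n) + list(h[:max(n - k, 0)])
    return list(h[-k:]) + [0.0] * min(-k, n)

def sliding_mirrored(a, b, max_shift=10):
    """Return (distance, mirror, shift) for the best alignment of b to a.

    mirror is 1 for the plain pair and -1 when b is reversed about its
    centre; shift is the offset in bins at which the minimum was found.
    """
    best = None
    for mirror, candidate in ((1, list(b)), (-1, list(b)[::-1])):
        for k in range(-max_shift, max_shift + 1):
            d = chi2_distance(a, shift(candidate, k))
            if best is None or d < best[0]:
                best = (d, mirror, k)
    return best

print(sliding_mirrored([0, 1, 2, 5, 0], [0, 5, 2, 1, 0], max_shift=2))
```

Calling `chi2_distance` directly on two centred histograms corresponds to the mean-aligned variant. Returning the mirror flag and shift alongside the distance keeps the free matching parameters available to later stages.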
Matching Modules (2)
SLIDING, MIRRORED, STRETCHED χ²: Histograms are initially aligned at their mean value; then the histogram pair, the pair with one histogram mirrored about the mean, and versions scaled along the X axis within a defined range are shifted along the X axis, and the minimum χ² distance is reported. (Work in progress.)
HUMAN-DIRECTED: The ranking table could instead be built from similarity judgements made by human judges.
Closeness Sort
Sorts histogram pairs by closeness as determined by the Matching Module.
Produces aligned plots of best and worst matches to evaluate effectiveness of Matching Module.
Ranking Table
CSV format file lists histogram pairs in descending order of closeness evaluated by the Matching Module.
Closeness metric and free matching parameters included for downstream analysis programs.
0.5603,964031400,2000-07-19-18-30,964112400,2000-07-20-17-00,1,2
0.589,964181400,2000-07-21-12-10,964343400,2000-07-23-09-10,-1,-83
0.5926,964311000,2000-07-23-00-10,964402200,2000-07-24-01-30,1,0
0.5943,964224000,2000-07-22-00-00,964413000,2000-07-24-04-30,-1,-85
0.5943,963837000,2000-07-17-12-30,963957000,2000-07-18-21-50,1,1
. . .
31.45,963907800,2000-07-18-08-10,964073400,2000-07-20-06-10,1,4
31.8,964226400,2000-07-22-00-40,964245000,2000-07-22-05-50,1,-2
31.92,963907800,2000-07-18-08-10,964158600,2000-07-21-05-50,1,4
31.92,964226400,2000-07-22-00-40,964270800,2000-07-22-13-00,-1,-84
33.9,963907800,2000-07-18-08-10,963950400,2000-07-18-20-00,1,2
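A sketch of how matcher results might be sorted and written in this layout. The column interpretation is inferred from the sample rows and is an assumption: closeness metric, then Unix time and a UTC label for each histogram, then a mirror flag and a shift. Function names are illustrative.

```python
import csv
import io
from datetime import datetime, timezone

def label(t):
    """Format a Unix time as YYYY-MM-DD-HH-MM (UTC assumed)."""
    return datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d-%H-%M")

def ranking_rows(results):
    """Yield rows closest first from (metric, t1, t2, mirror, shift) tuples."""
    for metric, t1, t2, mirror, shift in sorted(results):
        yield [metric, t1, label(t1), t2, label(t2), mirror, shift]

def write_ranking(results, out):
    """Write the ranking table as CSV to a file-like object."""
    csv.writer(out, lineterminator="\n").writerows(ranking_rows(results))

results = [(31.45, 963907800, 964073400, 1, 4),
           (0.5603, 964031400, 964112400, 1, 2)]
buffer = io.StringIO()
write_ranking(results, buffer)
print(buffer.getvalue(), end="")
```

Because a smaller distance means a closer match, an ascending sort on the metric places the closest pairs at the top of the table.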
Time Binning
Reads Ranking Table and bins into deciles by closeness metric, creating a histogram of time difference between histograms for each decile.
Creates expectation value table for null hypothesis.
Normalises decile histograms vs. null hypothesis and plots results by decile.
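The binning and normalisation might be sketched as below. The slides do not say how the expectation table is derived; this sketch assumes gap-free, equally spaced experiments, for which a time difference of k steps occurs in exactly n − k of the pairs. Names and the ratio-based normalisation are assumptions.

```python
from collections import Counter

STEP = 600  # seconds between the starts of consecutive Experiments

def decile_lag_histograms(ranked):
    """ranked: (metric, t1, t2) tuples, closest first.
    Returns ten Counters mapping time difference (in steps) to pair count."""
    n = len(ranked)
    hists = [Counter() for _ in range(10)]
    for i, (_, t1, t2) in enumerate(ranked):
        hists[i * 10 // n][abs(t2 - t1) // STEP] += 1
    return hists

def null_expectation(n_experiments):
    """Pairs expected at each lag if closeness is unrelated to time."""
    return {k: n_experiments - k for k in range(1, n_experiments)}

def relative_to_null(hist, expected):
    """Ratio of observed to expected share at each lag; 1.0 matches null."""
    observed_total = sum(hist.values())
    expected_total = sum(expected.values())
    return {k: (hist[k] / observed_total) / (e / expected_total)
            for k, e in expected.items()}
```

Under this reading, a decile whose ratios stay near 1.0 at every lag shows no time dependence, while a peak at a particular lag would indicate that similar histograms recur at that interval.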
Ranking Table Randomiser
Shuffles lines in the ranking table produced by the Closeness Sort.
Time binning the randomised ranking provides a null hypothesis control for closeness matching.
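A randomiser of this kind can be very small. The sketch below assumes the ranking table is held as a list of text lines; the optional `seed` parameter is an addition for repeatable control runs and is not described in the slides.

```python
import random

def randomise_ranking(lines, seed=None):
    """Return the ranking table lines in random order, input left intact."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled

table = ["0.5603,a,b", "0.589,c,d", "31.45,e,f", "33.9,g,h"]
control = randomise_ranking(table, seed=1)
print(len(control), sorted(control) == sorted(table))
```

Since shuffling destroys any link between rank and time difference, the deciles of the shuffled table should all resemble the null expectation.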
Pilot Experiment
Data collected continuously from 2000-07-16 through 2000-07-24; no gaps in data set.
Data set contains:
12,960 one-minute measurement records
1,296,000 equal-duration count windows
1,296 ten-minute experiments
839,160 histogram pairs, excluding self/self and assuming commutative comparison
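These figures follow directly from the collection dates, as the short check below shows. It assumes both end dates are inclusive and that every minute of the nine days produced a record.

```python
from datetime import date
from math import comb

days = (date(2000, 7, 24) - date(2000, 7, 16)).days + 1  # inclusive range
records = days * 24 * 60          # one Measurement Record per minute
windows = records * 100           # 100 Count Windows per record
experiments = records // 10       # ten records per Experiment
pairs = comb(experiments, 2)      # unordered pairs, excluding self/self

print(days, records, windows, experiments, pairs)
```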
Complete Data Set Histogram
µ = 1889.35, σ = 26.4
[Figure: histogram of Occurrences (axis 0 to 20000) against Counts (axis 1769 to 2009), with a Normal Distribution curve overlaid.]
[Figure: Representative Experiment Histograms]
[Figures, four slides: Closely Matching Histograms]
[Figures, two slides: Poorly Matching Histograms]
[Figure: Null Hypothesis Time Distribution Expectation]
[Figure: Closeness Ranking: Closest 2000]
[Figures, four slides: Closeness Ranking: Decile 1, Decile 2, Decile 9, Decile 10]
[Figures, four slides: Control Ranking: Decile 1, Decile 2, Decile 9, Decile 10]
Conclusions From Pilot Experiment
No evidence found for time dependence in fine structure of smoothed histograms.
Not a refutation due to very small data set, single generator at one location, limitations in automated histogram similarity scoring, and inability to correlate automated scoring vs. human judging reported by Shnoll et al.
Toolkit Availability
All software developed for this project is in the public domain and all ancillary software is free software included in a standard Linux distribution.
Hardware cost for the stochastic generator is less than US$500, plus a generic MS-DOS PC.
Analysis source code and pilot experiment data set available to all investigators.
Open framework for exploring automated histogram similarity ranking.
References
Shnoll, S. et al., “Realization of discrete states during fluctuations in macroscopic processes”, Physics–Uspekhi 41 (10) 1025–1035 (1998).
Shnoll, S. et al., “Regular variation of the fine structure of statistical distributions as a consequence of cosmophysical agents”, Physics–Uspekhi 43 (2) 205–209 (2000).