Research Article
BD5: an open HDF5-based data format to represent quantitative biological dynamics data
Koji Kyoda1,2, Kenneth H. L. Ho1,2, Yukako Tohsato1,2,3, Hiroya Itoga1 and Shuichi Onami1,2,*
1Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, Kobe 650-0047, Japan.
2Laboratory for Developmental Dynamics, RIKEN Quantitative Biology Center, Kobe 650-0047, Japan.
3Department of Information Science and Engineering, Ritsumeikan University, Shiga 525-8577, Japan.
*To whom correspondence should be addressed

Short title: BD5 data format for representing quantitative biological dynamics data

bioRxiv preprint doi: https://doi.org/10.1101/2020.04.26.062976; this version posted April 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY 4.0 International license.
Abstract
BD5 is a new binary data format based on HDF5 (hierarchical data format version 5). It can be used for representing quantitative biological dynamics data obtained from bioimage informatics techniques and mechanobiological simulations. Biological Dynamics Markup Language (BDML) is an XML (Extensible Markup Language)-based open format that is also used to represent such data; however, it becomes difficult to access quantitative data in BDML files when the file size is large because parsing XML-based files requires large computational resources to first read the whole file sequentially into computer memory. BD5 enables fast random (i.e., direct) access to quantitative data on disk without parsing the entire file. Therefore, it allows practical reuse of data for understanding biological mechanisms underlying the dynamics.
Introduction
Recent advances in bioimage informatics and mechanobiological simulation techniques have led to the production of a large amount of quantitative data of spatiotemporal dynamics of biological objects ranging from molecules to organisms [1]. A wide variety of such data can be described in an open unified data format, Biological Dynamics Markup Language (BDML), an Extensible Markup Language (XML)-based format [2]. BDML enables efficient development and evaluation of software tools for a wide range of applications.
The XML-based BDML format has the advantages of machine and human readability and extensibility. However, accessing and retrieving data is often problematic when the BDML file becomes too large (e.g., our programs cannot load a BDML file over 20 GB on a standard workstation). This problem arises because parsing an XML-based file often requires large computational resources to first read the whole file sequentially into computer memory. In fact, many sets of quantitative data stored in the SSBD:database (Systems Science of Biological Dynamics database) [1] were divided into a series of BDML files for each time point to allow software to read them efficiently. One solution to this problem is to use another approach such as the eXtensible Data Model and Format [3] or FieldML [4]. In these formats, the data itself is described in HDF5 binary format and meta-information about the data is described in XML format. HDF5 is a hierarchical data format for storing large scientific data sets (http://www.hdfgroup.org/HDF5/). It is widely used for describing various kinds of large-scale biological data [4-9].
Here, we describe the development of the BD5 data format, based on HDF5, for representing quantitative biological dynamics data in a manner that enables quick access and retrieval.
Materials and Methods
Design and implementation
Here, we extended BDML to support HDF5-based storage of quantitative biological dynamics data. In contrast to XML documents, the HDF5 format allows random (i.e., direct) access to parts of the file without parsing the entire contents. Therefore, HDF5 is a more efficient file format for accessing and retrieving the contents of the file.
We developed the BD5 data format based on HDF5 for representing quantitative data. A BD5 file is organized into two primary structures, datasets and groups. Datasets are array-like objects that store numerical data, whereas groups are hierarchical containers that store datasets and other groups. Detailed information on BD5 is available at http://ssbd.qbic.riken.jp/bdml/. Here, we summarize the major BD5 datasets and groups. The BD5 format has one container named data (Fig. 1). It includes
● a scaleUnit dataset for the definition of spatial and time scales and units,
● an objectDef dataset for the definition of biological objects,
● a featureDef dataset for features of interest,
● numbered groups (0, 1, … , n) corresponding to an index number of a time-ordered sequence,
● a trackInfo dataset for the information of tracking of one object to another.
Each of the numbered groups corresponds to an index of a time-ordered sequence and has object and feature groups. For a fixed time interval, the index will correspond to each sequential time point. For example, if the time interval is 2 minutes, group 0 will have t = 0 and group 1 will have t = 1, while tScale is 2 and tUnit is minute (Fig. 1). For irregular time intervals, the index allows a time-ordered sequence to be saved and read in the correct order. If the first time is 0 minutes, the second time is 2 minutes, and the third time is 7 minutes, then group 0 will have t = 0, group 1 will have t = 2, and group 2 will have t = 7. The tUnit is still minute, but the tScale in this case will be 1.
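The relationship between the stored index t, tScale, and tUnit can be sketched in a few lines. This is a minimal illustration, assuming that physical time is obtained as t multiplied by tScale; the function name physical_time is hypothetical and is not part of any BD5 tool.

```python
def physical_time(t, t_scale):
    """Convert a stored time value t to physical time, assumed to be t * tScale."""
    return t * t_scale

# Fixed 2-minute interval: groups 0 and 1 store t = 0 and t = 1, with tScale = 2.
fixed = [physical_time(t, 2.0) for t in (0, 1)]

# Irregular intervals: groups 0, 1, 2 store t = 0, 2, 7, with tScale = 1.
irregular = [physical_time(t, 1.0) for t in (0, 2, 7)]

print(fixed, irregular)
```

In both cases the group index only fixes the reading order; the value of t together with tScale and tUnit carries the actual time.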
Each object group has numbered dataset(s) corresponding to the reference number of the biological object(s) predefined under the objectDef dataset. Each row of the numbered dataset includes an identifier of the object and its spatiotemporal information such as time point and xyz-coordinates (Fig. 2). To represent biological objects such as line and face entities that have an arbitrary number of xyz-coordinates in BD5 format, a tabular dataset is used (Fig. 3). The multiple xyz-coordinates are represented by using a sequential ID (sID) that allows us to connect the xyz-coordinates together to form a line or a face within a biological object.
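As an illustration of how the sID ties rows together, the following sketch groups the rows of a tabular dataset by object identifier and sID. The rows are modeled on the face example in Fig. 3b and are held as plain Python tuples rather than read from a BD5 file; the function name group_by_sid is hypothetical.

```python
# Rows modeled on Fig. 3b: (ID, t, entity, sID, x, y, z, label).
rows = [
    ("000100", 0, "face", 0, 1, 1, 1, "F"),
    ("000100", 0, "face", 0, 2, 3, 0, "F"),
    ("000100", 0, "face", 0, 3, 1, 2, "F"),
    ("000100", 0, "face", 1, 2, 3, 0, "F"),
    ("000100", 0, "face", 1, 4, 5, 4, "F"),
    ("000100", 0, "face", 1, 3, 1, 2, "F"),
]

def group_by_sid(rows):
    """Collect xyz-coordinates per (object ID, sID), preserving row order."""
    entities = {}
    for obj_id, _t, _entity, sid, x, y, z, _label in rows:
        entities.setdefault((obj_id, sid), []).append((x, y, z))
    return entities

faces = group_by_sid(rows)
for key, coords in faces.items():
    print(key, coords)
```

Each key then identifies one face of object 000100, and the coordinate list gives its vertices in the order in which they are to be connected.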
Each feature group has numbered dataset(s) corresponding to the reference number of the object(s) predefined in the objectDef dataset. Each row of the numbered dataset includes an identifier of the object, an identifier of the feature (fID) predefined in featureDef, and the value of the feature (Fig. 4). This format allows objects that do not possess all the features defined in featureDef to be recorded, because not all the features can necessarily be measured in practical biological experiments. For example, in the experiment in Fig. 4, an object may have information for fID = 1 (name: center-of-mass GFP signal) but not fID = 0 (name: average GFP signal).
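The row-per-feature layout can be sketched as follows. The featureDef entries and the two rows for object 000002 follow Fig. 4; the row for object 000003 and its value are hypothetical and are included only to show an object that lacks one of the defined features. The function name features_of is also hypothetical.

```python
# featureDef as in Fig. 4: fID -> (name, unit).
feature_def = {
    0: ("average GFP signal", "a.u."),
    1: ("center of mass GFP signal", "a.u."),
}

# Feature rows: (object ID, fID, value). The 000003 row is a made-up example.
feature_rows = [
    ("000002", 0, 153.0),
    ("000002", 1, 214.0),
    ("000003", 1, 180.0),
]

def features_of(obj_id):
    """Return {feature name: value} for one object, using only the rows present."""
    return {
        feature_def[fid][0]: value
        for row_id, fid, value in feature_rows
        if row_id == obj_id
    }

print(features_of("000002"))
print(features_of("000003"))
```

Because each row names its own fID, an object with a missing measurement simply has fewer rows; no placeholder value is needed.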
The trackInfo dataset enables information on the objects to be linked between different time points or time frames (Fig. 2). For example, when a cell at t = 0 divides into two daughter cells at t = 1, the dataset holds links from the parent cell to each of the daughter cells. The trackInfo dataset can be used to represent not only phenomena such as cell division but also those such as cell fusion.
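A short sketch shows how division and fusion events could be read off such from/to links. The pairs are taken from the trackInfo table in Fig. 2; the function name classify_links and the rule used here (more than one target means division, more than one source means fusion) are assumptions made for illustration.

```python
from collections import defaultdict

# (from, to) pairs as in the trackInfo table of Fig. 2.
track_info = [
    ("000002", "001002"),
    ("000003", "001003"),
    ("000003", "001004"),
]

def classify_links(pairs):
    """Split links into divisions (one source, several targets) and fusions."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for src, dst in pairs:
        children[src].append(dst)
        parents[dst].append(src)
    divisions = {src: dsts for src, dsts in children.items() if len(dsts) > 1}
    fusions = {dst: srcs for dst, srcs in parents.items() if len(srcs) > 1}
    return divisions, fusions

divisions, fusions = classify_links(track_info)
print(divisions, fusions)
```

With the Fig. 2 data, object 000003 is reported as dividing into 001003 and 001004, and no fusion is found.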
To allow the use of BD5 to describe quantitative data, we needed to update the BDML format so that it could be used to describe the corresponding meta-information. The latest version of BDML (version 3.0) can handle an external file by using the extFile element (Fig. 5). The bd5File element that we introduced within the extFile element can be used to point to an external BD5 file. In addition, this update allows the designation of multiple contact persons and the use of a unique persistent digital identifier, ORCID (https://orcid.org), in its format.
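The way a BDML file points to its BD5 file can be sketched with the Python standard library. The XML string is a reduced, hypothetical BDML document whose element names follow the skeleton in Fig. 5; the search compares local element names only, so it does not depend on the namespace or on the exact spelling of the enclosing element. The function name find_bd5_files is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced BDML document; element names follow the skeleton in Fig. 5.
BDML_TEXT = """<bdml version="3.0" xmlns="http://ssbd.qbic.riken.jp/bdml">
  <extfile>
    <bd5File>wt_N2_030116_02_bd5.h5</bd5File>
  </extfile>
</bdml>"""

def find_bd5_files(bdml_text):
    """Return the text of every bd5File element, ignoring XML namespaces."""
    root = ET.fromstring(bdml_text)
    names = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip "{namespace}" prefix
        if local_name == "bd5File" and element.text:
            names.append(element.text.strip())
    return names

print(find_bd5_files(BDML_TEXT))
```

The meta-information stays in a small XML file that is cheap to parse, while the bulk data is opened separately from the referenced BD5 file.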
Results
Validation
To evaluate the performance of the BD5 format, we first compared the time for accessing the file between XML- and HDF5-based files (i.e., between pairs of BDML and BD5 files containing equivalent data). We measured the time for accessing coordinate data at a randomly selected time point in the BDML and BD5 files (334 pairs of files) by using a Python-based program (Fig. 6a). The results indicate that the access times of HDF5-based files were consistently shorter than those of the corresponding XML-based files. Therefore, BD5, the new HDF5-based format, enables practical access to quantitative data for further analysis.
File size can be a critical benchmark for a data format because the transfer of large files often fails. Therefore, we next compared disk space requirements between the XML- and HDF5-based files by comparing the size of BDML and BD5 files (450 pairs of files) (Fig. 6b). The BD5 format reduced the file size by ~85% compared with the BDML format when the data is large. When the data is small (< 300 MB), the size of the BD5 file is close to, but still less than, that of the corresponding BDML file. Because the size of HDF5-based files for large data is much less than that of the equivalent XML-based files, the BD5 format enables, in theory, fast transfer of large quantitative data to and from computers on the network and on the internet.
In addition, we determined the relationships between access time and file size for BDML and BD5 files (Fig. 7). In BD5, access to the coordinate data was fast even when the file size was large. This fast data access in BD5 originated from its random access to data. In BDML, the access time increased linearly with file size. This result suggests that parsing of XML was the main bottleneck of data access. Quantitative biological dynamics data tends to be large due to the advances in live-cell imaging techniques and imaging equipment. We anticipate that BD5 will play a key role in fast access to such large data sets.
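The idea behind random access can be illustrated with a deliberately simple stand-in. The sketch below is not HDF5 and does not reproduce how HDF5 locates data internally; it only shows that, when the position of a block can be computed or looked up, reading one time point touches a fixed number of bytes regardless of how many time points the file holds. The record layout, the constant POINTS_PER_TIME, and the function names are all hypothetical.

```python
import io
import struct

RECORD = struct.Struct("<3d")  # one xyz-coordinate as three little-endian doubles
POINTS_PER_TIME = 4            # hypothetical fixed number of objects per time point

def write_series(n_times):
    """Write n_times blocks of POINTS_PER_TIME coordinates to an in-memory file."""
    buf = io.BytesIO()
    for t in range(n_times):
        for i in range(POINTS_PER_TIME):
            buf.write(RECORD.pack(float(t), float(i), 0.0))
    return buf

def read_time_point(buf, t):
    """Seek directly to the block of time point t and read only that block."""
    block_size = RECORD.size * POINTS_PER_TIME
    buf.seek(t * block_size)
    data = buf.read(block_size)
    return [RECORD.unpack_from(data, k * RECORD.size) for k in range(POINTS_PER_TIME)]

series = write_series(1000)
coords = read_time_point(series, 742)
print(coords[0])
```

A sequential XML parser, by contrast, has to read everything that precedes the requested time point, which is consistent with the linear growth of BDML access time with file size.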
Software tools and usage related to BD5
So that BD5-based tools can be used for data stored in older BDML files, we provide a C++-based software tool named BDML2BD5. By using this tool, BDML files can be converted into BD5 files. To compile the tool, the HDF5 library is required for HDF5 data writing, and CodeSynthesis XSD (http://www.codesynthesis.com/products/xsd/) is required as the data binding compiler that maps the BDML schema to C++. All source code and the executable file of BDML2BD5 are available at https://github.com/openssbd/BDML2BD5/.
We also provide a program, bd5lint, for detecting bugs and inconsistencies in BD5 files. The program checks the structure of BD5 files, and checks that the ordered numbered datasets in object and feature groups correspond to the reference numbers of the objects and features predefined in the objectDef and featureDef datasets. It also checks the consistency between the dimensions declared and the actual dimensions used within the datasets. It provides type checking of the data and issues error warnings if the data do not conform to the BD5 specification. The Python source code is available at https://github.com/openssbd/bd5lint/.
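One of the checks described above, that numbered datasets and fIDs refer to predefined entries, can be sketched as follows. This is not the bd5lint implementation; it is a minimal illustration on nested Python dictionaries that stand in for the groups and datasets of a BD5 file, and the function name check_references and the error messages are hypothetical. The second time group contains two deliberately invalid references.

```python
def check_references(object_ids, feature_ids, time_groups):
    """Return a list of problems; an empty list means all references resolve."""
    problems = []
    for index in sorted(time_groups):
        group = time_groups[index]
        for name in sorted(group.get("object", {})):
            if name not in object_ids:
                problems.append(f"group {index}: object dataset {name} is not defined in objectDef")
        for name in sorted(group.get("feature", {})):
            if name not in object_ids:
                problems.append(f"group {index}: feature dataset {name} is not defined in objectDef")
            for _obj_id, fid, _value in group["feature"][name]:
                if fid not in feature_ids:
                    problems.append(f"group {index}: fID {fid} is not defined in featureDef")
    return problems

object_ids = {0}       # oIDs listed in objectDef
feature_ids = {0, 1}   # fIDs listed in featureDef
time_groups = {
    0: {"object": {0: [("000002", 0, "sphere")]},
        "feature": {0: [("000002", 0, 153.0), ("000002", 1, 214.0)]}},
    1: {"object": {1: []},                        # invalid: no oID 1 in objectDef
        "feature": {0: [("001002", 2, 99.0)]}},   # invalid: no fID 2 in featureDef
}

problems = check_references(object_ids, feature_ids, time_groups)
for message in problems:
    print(message)
```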
We also provide several Python-based programs for data analysis using BD5 files. These programs are available as Jupyter Notebook files at https://github.com/openssbd/BDML-BD5/. An example is a program that counts the number of biological objects in each numbered group of the time-ordered sequence. By using this program, we can obtain the proliferation curve of Caenorhabditis elegans embryogenesis. The program can be modified to obtain similar information for other organisms such as Danio rerio and Drosophila melanogaster.
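The counting step can be sketched as below. This is not the code of the Jupyter Notebook files; the object identifiers are taken from Fig. 2 and are held in a plain dictionary keyed by the index of the time-ordered sequence, and the function name proliferation_curve is hypothetical.

```python
# Object IDs per numbered group, modeled on Fig. 2.
time_groups = {
    0: ["000002", "000003"],
    1: ["001002", "001003", "001004"],
}

def proliferation_curve(groups):
    """Return (time index, number of objects) pairs in time order."""
    return [(index, len(groups[index])) for index in sorted(groups)]

curve = proliferation_curve(time_groups)
print(curve)
```

Plotting the second element of each pair against the first gives the number of nuclei over time.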
Discussion
In this study, we developed a new BD5 data format based on HDF5 for representing quantitative biological dynamics data. Compared with BDML, which is based on XML, the BD5 format has two advantages: (a) fast access to and retrieval of quantitative data because of random access to the HDF5-based file, and (b) fast transfer of files containing large quantitative data because the file size is dramatically reduced. A drawback of BD5 is that human readability is low when compared with the BDML format. BD5 files cannot be opened by text editors because the file is binary formatted. However, the HDF Group provides a software tool named HDFView that enables the user to open and read all HDF5-based files (https://www.hdfgroup.org/downloads/hdfview/). This tool can compensate for the lack of human readability.
The BD5 format has already been used in the latest version of SSBD:database (http://ssbd.qbic.riken.jp), which is one of the major databases for sharing bioimage data and quantitative biological dynamics data [10]. Over 687 files, which include a wide variety of quantitative biological dynamics data from molecules to cells to organisms, are available. This demonstrates that the BD5 format has high functionality and flexibility for representing quantitative biological dynamics data. SSBD:database also provides a RESTful API (i.e., an API (application programming interface) that allows applications to access data and interact with external software tools) through the use of the web service h5serv (https://github.com/HDFGroup/h5serv). This enables SSBD:database to provide a web service for users to access quantitative data stored in BD5 files (http://ssbd.qbic.riken.jp/restfulapi/). Because HDF5 and XML are supported by many software platforms, BD5 is a promising data format for storing quantitative biological dynamics data.
Like BDML, BD5 can represent quantitative biological dynamics data that is associated with, but independent of, microscopy images. Such data has often been represented as regions of interest (ROIs) on the corresponding microscopy images; for example, the ROIs in the OME data model (https://docs.openmicroscopy.org/ome-model/) and segmentation channels in Cell Feature Explorer (https://cfe.allencell.org). However, not all data can be represented as an ROI on a microscopy image. For example, in an automated cell lineage tracing study of Caenorhabditis elegans, each nucleus was represented as a sphere with center and radius, independently of the z-stack images [11]. Such flexible representation of BD5 (and also BDML) enables us to represent quantitative biological dynamics data obtained not only from bioimage informatics but also from mechanobiological simulation techniques.
Funding
This work was supported in part by the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST); Core Research for Evolutionary Science and Technology (CREST) Grant Number JPMJCR1511, JST; JSPS KAKENHI Grant Number JP18H05412; the Strategic Programs for R&D (President’s Discretionary Fund) of RIKEN, Japan; and Open Life Science Platform, RIKEN, Japan.
Acknowledgements
We are grateful to the members of the Onami laboratory, RIKEN Center for Biosystems Dynamics Research, Japan for feedback and discussions.

REFERENCES
1. Tohsato Y, Ho KH, Kyoda K, Onami S. SSBD: a database of quantitative data of spatiotemporal dynamics of biological phenomena. Bioinformatics. 2016;32:3471-3479.
2. Kyoda K, Tohsato Y, Ho KH, Onami S. Biological Dynamics Markup Language (BDML): an open format for representing quantitative biological dynamics data. Bioinformatics. 2015;31:1044-1052.
3. Clarke JA, Mark ER. Enhancements to the eXtensible Data Model and Format (XDMF). Proceedings of the 2007 DoD High Performance Computing Modernization Program Users Group Conference; 2007 Jun; Washington DC, USA. IEEE Computer Society. pp. 322-327.
4. Britten RD, Christie GR, Little C, Miller AK, Bradley C, Wu A, et al. FieldML, a proposed open standard for the Physiome project for mathematical model representation. Med Biol Eng Comput. 2013;51:1191-1207.
5. Baker M. Quantitative data: learning to share. Nature Meth. 2012;9:39-41.
6. Dougherty MT, Folk MJ, Zadok E, Bernstein HJ, Bernstein FC, Eliceiri KW, et al. Unifying biological image formats with HDF5. Commun ACM. 2009;52:42-47.
7. Hoffman MM, Buske OJ, Noble WS. The Genomedata format for storing large-scale functional genomics data. Bioinformatics. 2010;26:1458-1459.
8. Millard BL, Niepel M, Menden MP, Muhlich JL, Sorger PK. Adaptive informatics for multifactorial and high-content biological data. Nature Meth. 2011;8:487-492.
9. Wilhelm M, Kirchner M, Steen JA, Steen H. mz5: space- and time-efficient storage of mass spectrometry data sets. Mol Cell Proteomics. 2012;11:O111.011379.
10. Dance A. Find a home for every imaging data set. Nature. 2020;579:162-163.
11. Bao Z, Murray JI, Boyle T, Ooi SL, Sandel MJ, Waterston RH. Automated cell lineage tracing in Caenorhabditis elegans. Proc Natl Acad Sci USA. 2006;103:2707-2712.
Figure 1. Outline of the BD5 data format. The data group includes scaleUnit, objectDef, featureDef, and trackInfo datasets; each data group is numbered to correspond to the index number of the time-ordered sequence. Each numbered group has spatial information about biological objects and numerical information about features related to the objects. Solid and dashed boxes represent the required and optional elements, respectively.
Figure 2. An example of the description of the spatiotemporal information of biological objects and their tracking information. The dataset name in the object group corresponds to the identifier (ID) of the biological object (red). Each row in the dataset must have a unique ID and its spatiotemporal information. A label can optionally be attached for each object. The tracking information, including object divisions and fusions, can be stored in the trackInfo dataset.
Figure 3. An example of the description of the spatiotemporal information based on line (a) and face entities (b). The sequential identifier (sID) represents a set of coordinates that can be connected beginning at the top to describe an entity within one biological object.
Figure 4. An example of the description of the feature information related to biological objects. The example object is a nucleus expressing green fluorescent protein (GFP) at t = 0 in a time series. The dataset name in the feature group corresponds to the identifier (ID) of the biological object. Each row in the dataset has object ID, feature fID (blue), and the feature value. In this example, fID is 0 or 1 depending on whether the data is the average or the center-of-mass GFP signal, respectively, as listed in featureDef. Object ID is 0 if the object is a nucleus. Feature value is the fluorescence intensity expressed in a.u. (arbitrary units).
Figure 5. A skeleton of a BDML version 3.0 file for describing meta-information. This version allows the use of an external file for describing the data itself, designation of multiple contact persons, and the use of ORCID, a unique persistent digital identifier of the research scientist.
Figure 6. Comparison between BD5 and BDML data formats. a) Access times of the BDML and BD5 files. Access time was measured as the time for accessing and displaying xyz-coordinate data at a randomly selected time point stored in the BDML and BD5 files. The time was measured on an Intel Xeon CPU 2.8 GHz processor with 32 GB of main memory. Each dot represents a biological quantitative data set. We used 334 biological quantitative data sets, each of which has coordinate data and is stored in SSBD:database as a single BDML file. BD5 files were generated from the BDML files by using the BDML2BD5 program. b) Size of the BDML and BD5 files. Each dot represents a biological quantitative data set. In this comparison, we used 450 biological quantitative data sets stored in SSBD:database as BDML files. As above, the BD5 files were generated from the BDML files by using the BDML2BD5 program. The dashed line represents the linear regression line for all dots. The data within the small rectangle near the origin in the large graph is plotted on expanded axes in the inset.
Figure 7. Relationship between access time and file size for BDML and BD5 files. The time for accessing and displaying coordinate data at a randomly selected time point is plotted against file size. Each cross represents a BDML file; each dot represents a BD5 file. In the comparison, we used BDML and BD5 files of the 334 biological quantitative data sets described in Fig. 6a.
Figure 1: Kyoda et al.
data
    scaleUnit
    objectDef
    featureDef
    0, 1, 2, … (time-ordered sequence)
        object
            0
        feature
            0
    trackInfo

scaleUnit
dimension  xScale  yScale  zScale  sUnit       tScale  tUnit
3D+T       0.09    0.09    1.0     micrometer  2.0     minute

objectDef
oID  name
0    nucleus

featureDef
fID  name                       fUnit
0    average GFP signal         a.u.
1    center of mass GFP signal  a.u.

Figure 2: Kyoda et al.
objectDef
oID  name
0    nucleus

data / 0 / object / 0 (t = 0)
ID      t  entity  x    y    z     radius  label
000002  0  sphere  380  366  16.1  3.6     A
000003  0  sphere  387  153  16.6  3.87    B

data / 1 / object / 0 (t = 1)
ID      t  entity  x    y    z     radius  label
001002  1  sphere  380  366  16.1  3.6     A
001003  1  sphere  387  153  19.6  3.87    C
001004  1  sphere  386  151  13.1  3.87    D

trackInfo
from    to
000002  001002
000003  001003
000003  001004

Diagram: at t = 0, objects A (000002) and B (000003); at t = 1, objects A (001002), C (001003), and D (001004).

Figure 3: Kyoda et al.
a) Line entity (label E)
ID      t  entity  sID  x  y   z  label
000001  0  line    0    2  1   0  E
000001  0  line    0    2  10  0  E
000001  0  line    0    9  13  0  E
000001  0  line    0    8  4   0  E
000001  0  line    0    2  1   0  E
Vertex labels in panel a: (2, 1, 0), (2, 10, 0), (9, 13, 0), (8, 4, 1)

b) Face entity (label F)
ID      t  entity  sID  x  y  z  label
000100  0  face    0    1  1  1  F
000100  0  face    0    2  3  0  F
000100  0  face    0    3  1  2  F
000100  0  face    1    2  3  0  F
000100  0  face    1    4  5  4  F
000100  0  face    1    3  1  2  F
…       …  …       …    …  …  …  …
Vertex labels in panel b: (1, 1, 1), (2, 3, 0), (3, 1, 2), (4, 5, 4)

Figure 4: Kyoda et al.
data
    featureDef
    0
        object
            0
        feature
            0
    1
    …

object dataset 0 (t = 0, nucleus A with GFP signal)
ID      t  entity  x    y    z     radius  label
000002  0  sphere  380  366  16.1  3.6     A

feature dataset 0
ID      fID  value
000002  0    153
000002  1    214

featureDef
fID  name                       fUnit
0    average GFP signal         a.u.
1    center of mass GFP signal  a.u.

Figure 5: Kyoda et al.
<bdml version="3.0" xmlns="http://ssbd.qbic.riken.jp/bdml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ssbd.qbic.riken.jp/bdml http://ssbd.qbic.riken.jp/bdml/bdml3.0.xsd">
  <info>
    ...
  </info>
  <summary>
    ...
  </summary>
  <contact>
    <person>
      <first-name>Shuichi</first-name>
      <last-name>Onami</last-name>
      <ORCID>0000-0002-8255-1724</ORCID>
      <affiliation>
        ...
      </affiliation>
    </person>
  </contact>
  <methods>
    ...
  </methods>
  <extfile>
    <bd5File>wt_N2_030116_02_bd5.h5</bd5File>
  </extfile>
</bdml>

Figure 6: Kyoda et al.
Panel a: access time for BD5 file (s) plotted against access time for BDML file (s); both axes range from 0 to 120.
Panel b: BD5 file size (bytes) plotted against BDML file size (bytes); both axes range from 0 to 3E+10.

Figure 7: Kyoda et al.
Access time (s), from 0 to 120, plotted against file size (bytes), from 0 to 3E+08. Legend: BDML files, BD5 files.