Data format for event-driven particle trajectories

On my site, I host data on particle motion, including the motion of molecules in gases and liquids, as well as the motion of Lagrangian particles in turbulence. Most of these systems, like turbulence, exhibit intrinsically smooth time dynamics and can be simulated using conventional time-stepping algorithms such as direct numerical simulation (DNS). However, I also host data on the dynamics of hard-sphere fluids — statistical systems with intrinsically discrete, collision-based dynamics modeled using event-driven algorithms. Such data requires as different storage format, compared to what I used for the smooth particle dynamics.

The data for hard-sphere systems is obtained using the DynamO particle simulator, see respective page for the details. DynamO outputs data in XML format and is designed to provide processed data. I, on the other hand, offer raw trajectory data for users to process themselves. I consider HDF5 to be a more convenient format for this purpose. Below, I describe how I store DynamO output in HDF5.

HDF5 structure for hard-sphere simulation data

Each HDF5 file contains particle trajectories from a simulation of a hard-sphere fluid at thermal equilibrium and includes the following objects:

Object name	Type	Description
`t_start`	Float64	Start time of the simulation (typically 0).
`t_end`	Float64	End time of the simulation.
`ρ`	Float64	Reduced density $\rho^*$ of the system.
`N`	Int64	Number of particles $N$ in the system.
`L`	Float64	Side length $L^$ of the periodic cubic simulation domain, ${L^}^3 = N / \rho^*$ .
`README`	String	General information and reference to this website.
`t`	Group	Collision times for each particle.
`x`	Group	Positions of each particle at the collision times.
`collision_partner`	Group	Indexes of the particles with which each given particle collides.

All the data is stored in reduced units, with the diameter of a hard sphere $\sigma$ being the unit of length, and $\sigma \sqrt{m / (k_\text{B} T)}$ being the unit of time. Here $m$ is the mass of a hard sphere, $T$ is the temperature and $k_\mathrm{B}$ is the Boltzmann constant. The unit of velocity is then $\sqrt{k_\text{B} T / m}$ .

Particle trajectories

The trajectories of the particles are stored in two groups, t and x. Each of them contain $N$ arrays, one for each particle, named with the particle’s index, from 000000001 to 00...0N (names have fixed-length of 9 characters).

Consider the particle with index 42. In t/000000042 you will find an array of timestamps (1D Float64), starting with the simulation start time $t_\mathrm{start}$ and ending with the simulation end time $t_\mathrm{end}$ , and in between containing the times the 42nd particle experienced a collision:

[t_\mathrm{start},\ t_\mathrm{coll.\ 1},\ t_\mathrm{coll.\ 2},\ \ldots,\ t_{\mathrm{coll.\ }n},\ t_\mathrm{end}].

In x/000000042 you will find the position $\bm x(t)$ of the 42nd particle sampled at these timestamps:

[\bm x(t_\mathrm{start}),\ \bm x(t_\mathrm{coll.\ 1}),\ \bm x(t_\mathrm{coll.\ 2}),\ \ldots,\ \bm x(t_{\mathrm{coll.\ }n}),\ \bm x(t_\mathrm{end})].

In the HDF5 file, x/000000042 is stored not as a $n+2$ array of 3D vectors, but as a 2D Float64 array of size $(n+2) \times 3$ . Note that reading it in Julia will give you an array of size $3 \times (n+2)$ .

Particle velocities and periodic boundary conditions

The velocity $\bm u(t)$ of a particle is a piecewise constant function of time; between two consecutive collisions, say $3$ and $2$ , it can be computed as

\bm u(t) = \frac{\bm x(t_\mathrm{coll.\ 3}) - \bm x(t_\mathrm{coll.\ 2})}{t_\mathrm{coll.\ 3} - t_\mathrm{coll.\ 2}}, \qquad t \in (t_\mathrm{coll.\ 2},\ t_\mathrm{coll.\ 3}).

Note that the data were obtained from numerical simulations with periodic boundary conditions. That means that the actual position $\bm x_\mathrm{true}(t)$ of a particle in a simulation is a discontinuous function of time and the above formula for the particle’s velocity would not work. Therefore, instead of $\bm x_\mathrm{true}(t)$ , I store the time integral of the particle’s velocity $\bm u(t)$ , i.e.

\bm x(t) = \bm x(t_\text{start}) + \int_{t_\text{start}}^{t_\text{end}} \!\! \bm u(s) \, \mathrm{d} s.

The actual coordinate can be obtained by applying the periodic boundary conditions to $\bm x(t)$ as

\bm x_\mathrm{true}(t) = \bm x(t) - L \operatorname{round}(\bm x(t) / L).

At least I think so, I never tried to compute $\bm x_\mathrm{true}(t)$ .

Collision information

Finally, for completeness I store the collision data in the group collision_partner. For our 42nd particle collision_partner/000000042 contains a 1D Int64 array of particle indexes, these are the particles the 42nd particle collided with at the corresponding collision times t/000000042.

Reading data in Julia

Below is the Julia code to read the file hsf_1e-2.0.h5 into a dictionary hsf_data. Note that the t group of the HDF5 file is read into hsf_data["t"] as an array of $N$ arrays. Similarly, x is read into hsf_data["x"] as an array of $N$ arrays, each containing static 3-element vectors representing the particle positions.

using HDF5
using Statistics
using ProgressMeter
using StaticArrays

function read_data(file_name)
  d = Dict()
  i_to_str(i) = lpad(i, 9, '0')
  
  h5open(file_name, "r") do f
    for dataset in ["t_start", "t_end", "ρ", "N", "L", "README"]
      d[dataset] = read(f, dataset)
    end

    println("Reading t data...")
    d["t"] = Array{Array{Float64, 1}, 1}(undef, d["N"])
    @showprogress for i in eachindex(d["t"])
      d["t"][i] = read(f, "t/" * i_to_str(i))
    end

    println("Reading x data...")
    d["x"] = Array{Array{SVector{3, Float64}, 1}, 1}(undef, d["N"])
    @showprogress for i in eachindex(d["x"])
      d["x"][i] = reinterpret(SVector{3,Float64}, vec(read(f, "x/" * i_to_str(i))))
    end
  end
  
  return d
end

hsf_data = read_data("hsf_1e-2.0.h5");