Data format for event-driven particle trajectories

On my site, I host data on particle motion, including the motion of molecules in gases and liquids, as well as the motion of Lagrangian particles in turbulence. Most of these systems, like turbulence, exhibit intrinsically smooth time dynamics and can be simulated using conventional time-stepping algorithms such as direct numerical simulation (DNS). However, I also host data on the dynamics of hard-sphere fluids — statistical systems with intrinsically discrete, collision-based dynamics modeled using event-driven algorithms. Such data requires as different storage format, compared to what I used for the smooth particle dynamics.

The data for hard-sphere systems is obtained using the DynamO particle simulator, see respective page for the details. DynamO outputs data in XML format and is designed to provide processed data. I, on the other hand, offer raw trajectory data for users to process themselves. I consider HDF5 to be a more convenient format for this purpose. Below, I describe how I store DynamO output in HDF5.

HDF5 structure for hard-sphere simulation data

Each HDF5 file contains particle trajectories from a simulation of a hard-sphere fluid at thermal equilibrium and includes the following objects:

Object name Type Description
t_start Float64 Start time of the simulation (typically 0).
t_end Float64 End time of the simulation.
ρ Float64 Reduced density ρ\rho^* of the system.
N Int64 Number of particles NN in the system.
L Float64 Side length LL^* of the periodic cubic simulation domain, L3=N/ρ{L^*}^3 = N / \rho^*.
README String General information and reference to this website.
t Group Collision times for each particle.
x Group Positions of each particle at the collision times.
collision_partner Group Indexes of the particles with which each given particle collides.

All the data is stored in reduced units, with the diameter of a hard sphere σ\sigma being the unit of length, and σm/(kBT)\sigma \sqrt{m / (k_\text{B} T)} being the unit of time. Here mm is the mass of a hard sphere, TT is the temperature and kBk_\mathrm{B} is the Boltzmann constant. The unit of velocity is then kBT/m\sqrt{k_\text{B} T / m}.

Particle trajectories

The trajectories of the particles are stored in two groups, t and x. Each of them contain NN arrays, one for each particle, named with the particle’s index, from 000000001 to 00...0N (names have fixed-length of 9 characters).

Consider the particle with index 42. In t/000000042 you will find an array of timestamps (1D Float64), starting with the simulation start time tstartt_\mathrm{start} and ending with the simulation end time tendt_\mathrm{end}, and in between containing the times the 42nd particle experienced a collision:

[tstart, tcoll. 1, tcoll. 2, , tcoll. n, tend].[t_\mathrm{start},\ t_\mathrm{coll.\ 1},\ t_\mathrm{coll.\ 2},\ \ldots,\ t_{\mathrm{coll.\ }n},\ t_\mathrm{end}].

In x/000000042 you will find the position x(t)\bm x(t) of the 42nd particle sampled at these timestamps:

[x(tstart), x(tcoll. 1), x(tcoll. 2), , x(tcoll. n), x(tend)].[\bm x(t_\mathrm{start}),\ \bm x(t_\mathrm{coll.\ 1}),\ \bm x(t_\mathrm{coll.\ 2}),\ \ldots,\ \bm x(t_{\mathrm{coll.\ }n}),\ \bm x(t_\mathrm{end})].

In the HDF5 file, x/000000042 is stored not as a n+2n+2 array of 3D vectors, but as a 2D Float64 array of size (n+2)×3(n+2) \times 3. Note that reading it in Julia will give you an array of size 3×(n+2)3 \times (n+2).

Particle velocities and periodic boundary conditions

The velocity u(t)\bm u(t) of a particle is a piecewise constant function of time; between two consecutive collisions, say 33 and 22, it can be computed as

u(t)=x(tcoll. 3)x(tcoll. 2)tcoll. 3tcoll. 2,t(tcoll. 2, tcoll. 3).\bm u(t) = \frac{\bm x(t_\mathrm{coll.\ 3}) - \bm x(t_\mathrm{coll.\ 2})}{t_\mathrm{coll.\ 3} - t_\mathrm{coll.\ 2}}, \qquad t \in (t_\mathrm{coll.\ 2},\ t_\mathrm{coll.\ 3}).

Note that the data were obtained from numerical simulations with periodic boundary conditions. That means that the actual position xtrue(t)\bm x_\mathrm{true}(t) of a particle in a simulation is a discontinuous function of time and the above formula for the particle’s velocity would not work. Therefore, instead of xtrue(t)\bm x_\mathrm{true}(t), I store the time integral of the particle’s velocity u(t)\bm u(t), i.e.

x(t)=x(tstart)+tstarttend ⁣ ⁣u(s)ds.\bm x(t) = \bm x(t_\text{start}) + \int_{t_\text{start}}^{t_\text{end}} \!\! \bm u(s) \, \mathrm{d} s.

The actual coordinate can be obtained by applying the periodic boundary conditions to x(t)\bm x(t) as

xtrue(t)=x(t)Lround(x(t)/L).\bm x_\mathrm{true}(t) = \bm x(t) - L \operatorname{round}(\bm x(t) / L).

At least I think so, I never tried to compute xtrue(t)\bm x_\mathrm{true}(t).

Collision information

Finally, for completeness I store the collision data in the group collision_partner. For our 42nd particle collision_partner/000000042 contains a 1D Int64 array of particle indexes, these are the particles the 42nd particle collided with at the corresponding collision times t/000000042.

Reading data in Julia

Below is the Julia code to read the file hsf_1e-2.0.h5 into a dictionary hsf_data. Note that the t group of the HDF5 file is read into hsf_data["t"] as an array of NN arrays. Similarly, x is read into hsf_data["x"] as an array of NN arrays, each containing static 3-element vectors representing the particle positions.

using HDF5
using Statistics
using ProgressMeter
using StaticArrays

function read_data(file_name)
  d = Dict()
  i_to_str(i) = lpad(i, 9, '0')
  
  h5open(file_name, "r") do f
    for dataset in ["t_start", "t_end", "ρ", "N", "L", "README"]
      d[dataset] = read(f, dataset)
    end

    println("Reading t data...")
    d["t"] = Array{Array{Float64, 1}, 1}(undef, d["N"])
    @showprogress for i in eachindex(d["t"])
      d["t"][i] = read(f, "t/" * i_to_str(i))
    end

    println("Reading x data...")
    d["x"] = Array{Array{SVector{3, Float64}, 1}, 1}(undef, d["N"])
    @showprogress for i in eachindex(d["x"])
      d["x"][i] = reinterpret(SVector{3,Float64}, vec(read(f, "x/" * i_to_str(i))))
    end
  end
  
  return d
end

hsf_data = read_data("hsf_1e-2.0.h5");