As part of the statistical quality
analysis I
did of the of the xoroshiro128aox
PRNG, I looked at interleaved parallel
generators (where a single generator is created by round-robin interleaving the
output $n$ identical generators with different seeds) as a way to test its
suitability for parallel processing. Against my intuition, I found that simple
seeding schemes produce poor interleaved generators, and even when the
subsequences are disjoint. These findings equally apply to xoroshiro128+
as
we will see.
The xoroshiro128+
creators recommend using a jump
function to seed parallel generators to
deterministically move to disjoint parts of the sequence. However, computing
jumps is expensive to do in hardware because it involves 128-bit arithmetic and
so it is preferable to compute seed values based on a simpler function of a
machine’s state, such as an integer identifier for a process/thread. Since the
probability of any two randomly-chosen sequences overlap is very small even
with a large number of sequences, it seems reasonable to assume that a simple
seed generator will perform okay in practice. Note also that the creators
separately recommend “that initialization must be performed with a generator
radically different in nature from the one initialized to avoid correlation on
similar seeds”, based on research from
2007.
To investigate the use of simple seeding schemes, I ran tests against interleaved generators with three representative options (the source code is available on GitHub):
With equidistant intervals from 1 in the natural number sequence [Scheme A], defined in Python notation as:
for i in range(NUM_SEEDS):
seed[i] = int(1 + i * ((2**128) // NUM_SEEDS))
With equidistant intervals starting from a random offset [Scheme B]:
for i in range(NUM_SEEDS):
seed[i] = int(rand(0, (2**128) // NUM_SEEDS) + i * ((2**128) // NUM_SEEDS))
By adding a fixed offset to a initial state of balanced 0s and 1s [Scheme C]:
for i in range(NUM_SEEDS):
seed[i] = 0x55555555555555555555555555555555 + i
And, as baselines:
- Using the
xoroshiro128
jump function, jumping $2^{64}$ steps [Scheme D]. - Using a minimum-size jump for the number of generators to pass PractRand [Scheme E]
(for example, the distance for 1000 generators is 4398046511 =
(32 * 1024**4) / (8 * 1000)
). - Using a non-linear PRNG to choose seeds (PCG64) [Scheme F].
To test these seeding schemes, I ran each generator against the standard PractRand test battery. PractRand is a good choice for these tests since it reports results at intermediate points and consumes much more output than Big Crush or Gjrand: 32 TB by default. A pass is achieved if no overtly suspicious $p$-values are flagged.
The results are summarised in the following table:
Seeding scheme | Number of generators | Output seen | Failing tests |
---|---|---|---|
Scheme A | 10 | 256 MB | DC6 , FPF |
Scheme A | 100 | 512 MB | DC6 , FPF |
Scheme A | 1000 | 16 GB | BCFN |
Scheme B | 10 | 256 MB | FPF |
Scheme B | 100 | 2 GB | DC6 |
Scheme B | 1000 | 32 GB | BCFN |
Scheme C | 10 | 256 MB | DC6 , Gap , FPF , mod3 |
Scheme C | 100 | 256 MB | BCFN , DC6 , Gap , FPF , mod3n |
Scheme C | 1000 | 256 MB | BCFN , DC6 , Gap , Brank , FPF , mod3n |
Scheme D | 10,100,1000 | 32 TB | Pass |
Scheme E | 10,100,1000 | 32 TB | Pass |
Scheme F | 10,100,1000 | 32 TB | Pass |
Note that DC6
and BCFN
are both tests for linearity. For the failing
generators (schemes A-C), I checked there are no duplicate values between the
different generators to establish that no two sequences overlap (using the
analyse
mode). This means that the above failures are due to correlations
between disjoint sequences.
For reference, below is sample output for running the Scheme A generator with 10 parallel streams against PractRand. There are comprehensive failures within the first 256 MB of output.
$ ./build/xoroshiro128plus_il_equia_10 stdout std64 0 0 | ./install/PractRand-pre0.95/RNG_test stdin64 -a
RNG_test using PractRand version 0.95
RNG = RNG_stdin64, seed = unknown
test set = core, folding = standard (64 bit)
rng=RNG_stdin64, seed=unknown
length= 256 megabytes (2^28 bytes), time= 2.3 seconds
Test Name Raw Processed Evaluation
BCFN(2+0,13-2,T) R= -0.0 p = 0.494 normal
BCFN(2+1,13-2,T) R= +4.1 p = 0.050 normal
BCFN(2+2,13-3,T) R= +0.0 p = 0.484 normal
BCFN(2+3,13-3,T) R= +1.3 p = 0.292 normal
...
[Low16/64]BCFN(2+12,13-9,T) R= -2.1 p = 0.849 normal
[Low16/64]DC6-9x1Bytes-1 R= +1.6 p = 0.332 normal
[Low16/64]Gap-16:A R= +0.5 p = 0.523 normal
[Low16/64]Gap-16:B R= +4.3 p = 1.3e-3 normalish
[Low16/64]FPF-14+6/16:(0,14-0) R= +0.2 p = 0.434 normal
[Low16/64]FPF-14+6/16:(1,14-0) R= +2.2 p = 0.061 normal
...
[Low4/64]BCFN(2+9,13-9,T) R= +0.6 p = 0.310 normal
[Low4/64]BCFN(2+10,13-9,T) R= +0.2 p = 0.372 normal
[Low4/64]DC6-9x1Bytes-1 R= +88.5 p = 5.9e-51 FAIL !!!!
[Low4/64]Gap-16:A R= +4.4 p = 4.6e-3 normalish
[Low4/64]Gap-16:B R= -0.6 p = 0.656 normal
[Low4/64]FPF-14+6/16:(0,14-1) R= -1.0 p = 0.759 normal
[Low4/64]FPF-14+6/16:(1,14-2) R= -3.7 p =1-4.2e-3 normal
[Low4/64]FPF-14+6/16:(2,14-2) R= -0.9 p = 0.738 normal
[Low4/64]FPF-14+6/16:(3,14-3) R= +1.3 p = 0.189 normal
[Low4/64]FPF-14+6/16:(4,14-4) R= +1.6 p = 0.134 normal
[Low4/64]FPF-14+6/16:(5,14-5) R= +1.2 p = 0.206 normal
[Low4/64]FPF-14+6/16:(6,14-5) R= +1.4 p = 0.166 normal
[Low4/64]FPF-14+6/16:(7,14-6) R= -1.6 p = 0.877 normal
[Low4/64]FPF-14+6/16:(8,14-7) R= -0.7 p = 0.681 normal
[Low4/64]FPF-14+6/16:(9,14-8) R= +1.2 p = 0.187 normal
[Low4/64]FPF-14+6/16:(10,14-8) R= +19.4 p = 5.5e-14 FAIL
[Low4/64]FPF-14+6/16:(11,14-9) R= +12.4 p = 4.0e-8 suspicious
[Low4/64]FPF-14+6/16:(12,14-10) R= +12.5 p = 3.0e-7 mildly suspicious
[Low4/64]FPF-14+6/16:(13,14-11) R= +5.1 p = 4.8e-3 normal
[Low4/64]FPF-14+6/16:(14,14-11) R= +8.8 p = 1.1e-4 normal
[Low4/64]FPF-14+6/16:all R= +0.3 p = 0.435 normal
[Low4/64]FPF-14+6/16:cross R= +2.1 p = 0.032 normal
[Low4/64]BRank(12):128(4) R= -0.8 p~= 0.670 normal
[Low4/64]BRank(12):256(2) R= +1.6 p~= 0.168 normal
[Low4/64]BRank(12):384(1) R= +1.8 p~= 0.146 normal
[Low4/64]BRank(12):512(2) R= -0.2 p~= 0.554 normal
[Low4/64]BRank(12):768(1) R= -0.7 p~= 0.689 normal
[Low4/64]mod3n(5):(0,9-6) R= +14.0 p = 1.3e-5 mildly suspicious
[Low4/64]mod3n(5):(1,9-6) R= -0.0 p = 0.451 normal
[Low4/64]mod3n(5):(2,9-6) R= -0.2 p = 0.492 normal
[Low4/64]mod3n(5):(3,9-6) R= -1.5 p = 0.781 normal
[Low4/64]mod3n(5):(4,9-6) R= +0.0 p = 0.433 normal
[Low4/64]mod3n(5):(5,9-6) R= +0.4 p = 0.353 normal
[Low4/64]mod3n(5):(6,9-6) R= -1.3 p = 0.742 normal
[Low4/64]mod3n(5):(7,9-6) R= +1.9 p = 0.152 normal
[Low4/64]mod3n(5):(8,9-6) R= -1.3 p = 0.731 normal
[Low4/64]mod3n(5):(9,9-6) R= -0.4 p = 0.540 normal
[Low4/64]mod3n(5):(10,9-6) R= -0.6 p = 0.585 normal
[Low4/64]mod3n(5):(11,9-6) R= +0.6 p = 0.322 normal
[Low1/64]BCFN(2+0,13-6,T) R= +7.3 p = 6.5e-3 normalish
[Low1/64]BCFN(2+1,13-6,T) R= -3.2 p = 0.926 normal
[Low1/64]BCFN(2+2,13-6,T) R= -1.5 p = 0.716 normal
[Low1/64]BCFN(2+3,13-6,T) R= +1.6 p = 0.238 normal
[Low1/64]BCFN(2+4,13-7,T) R= +1.7 p = 0.212 normal
[Low1/64]BCFN(2+5,13-8,T) R= +0.8 p = 0.304 normal
[Low1/64]BCFN(2+6,13-8,T) R= +0.2 p = 0.403 normal
[Low1/64]BCFN(2+7,13-9,T) R= +1.0 p = 0.257 normal
[Low1/64]BCFN(2+8,13-9,T) R= -1.8 p = 0.790 normal
[Low1/64]DC6-9x1Bytes-1 R= +1379 p = 3.6e-777 FAIL !!!!!!!
[Low1/64]Gap-16:A R= +4863 p = 5e-3914 FAIL !!!!!!!!
[Low1/64]Gap-16:B R=+11222 p = 6e-8467 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(0,14-2) R=+10120 p = 5e-8851 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(1,14-3) R= +7164 p = 2e-6279 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(2,14-4) R= +5071 p = 9e-4144 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(3,14-5) R= +3567 p = 1e-2956 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(4,14-5) R= +2180 p = 7e-1807 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(5,14-6) R= +1535 p = 1e-1174 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:(6,14-7) R=+350.5 p = 9.0e-279 FAIL !!!!!!
[Low1/64]FPF-14+6/16:(7,14-8) R=+255.2 p = 1.1e-183 FAIL !!!!!!
[Low1/64]FPF-14+6/16:(8,14-8) R=+235.5 p = 1.7e-169 FAIL !!!!!
[Low1/64]FPF-14+6/16:(9,14-9) R=+152.7 p = 2.0e-96 FAIL !!!!!
[Low1/64]FPF-14+6/16:(10,14-10) R= +87.1 p = 6.7e-47 FAIL !!!
[Low1/64]FPF-14+6/16:(11,14-11) R= +58.4 p = 2.6e-26 FAIL !!
[Low1/64]FPF-14+6/16:(12,14-11) R= +58.0 p = 3.9e-26 FAIL !!
[Low1/64]FPF-14+6/16:all R=+14029 p = 0 FAIL !!!!!!!!
[Low1/64]FPF-14+6/16:cross R= +3119 p = 3e-2681 FAIL !!!!!!!!
[Low1/64]BRank(12):128(2) R= -0.2 p~= 0.554 normal
[Low1/64]BRank(12):256(2) R= -1.0 p~= 0.744 normal
[Low1/64]BRank(12):384(1) R= -0.7 p~= 0.689 normal
[Low1/64]BRank(12):512(1) R= +0.4 p~= 0.366 normal
[Low1/64]mod3n(5):(0,9-6) R=+322.9 p = 4.1e-111 FAIL !!!!!
[Low1/64]mod3n(5):(1,9-6) R= -0.5 p = 0.563 normal
[Low1/64]mod3n(5):(2,9-6) R= +2.1 p = 0.129 normal
[Low1/64]mod3n(5):(3,9-6) R= +5.0 p = 0.016 normal
[Low1/64]mod3n(5):(4,9-6) R= +2.5 p = 0.099 normal
[Low1/64]mod3n(5):(5,9-6) R= -0.8 p = 0.614 normal
[Low1/64]mod3n(5):(6,9-6) R= -1.4 p = 0.757 normal
[Low1/64]mod3n(5):(7,9-6) R= +1.7 p = 0.167 normal
[Low1/64]mod3n(5):(8,9-6) R= +2.3 p = 0.111 normal
[Low1/64]mod3n(5):(9,9-6) R= +2.2 p = 0.125 normal
Conclusion
When choosing seeds for PRNGs operating in parallel, it might seem sufficient
to ensure that the sequences they range over are disjoint, provably or with
high probability. However, correlations between disjoint subsequences do exist
in PRNGs based on the xoroshiro128
linear engine. It is known that these
correlations can manifest when sequences are chosen by a linear generator.
Based on the findings in this note, these correlations also occur when seeds
are created using non-linear operations such as addition. The safest course of
action to take is to follow the guidance of the xoroshiro128
authors and
either use the jump function to traverse the state space or use a high-quality
non-linear PRNG to generate random seeds.
References
- My PRNG testing source code.
- xoshiro / xoroshiro generators and the PRNG shootout.
- PractRand.
- Makoto Matsumoto, Isaku Wada, Ai Kuramoto, and Hyo Ashihara. 2007. Common defects in initialization of pseudorandom number generators. ACM Transactions on Modeling and Computer Simulation.