Random Numbers

This page links to files of random numbers and evaluates their quality with the Diehard utility. Three small files of random numbers are given first, and then larger files of random numbers are provided.

The first file is called random_01.aes and holds 8080 bytes of random bits. The random numbers were made with three variable oscillators and encrypted with a key of all zeros. The encryption software was Perfect File Encryption, using AES. This software failed validation tests, but the file is presented as an artifact from software written in the year 2000.

The second file has random numbers from keyboard input. It is named ran-2.aes and has 8080 bytes. The key and IV are all zeros for AES-128 in CBC (Cipher Block Chaining) mode. The encryption software is OpenSSL.

Random numbers can come from nature or from computer complications. Several simple random-seeming functions have been developed in software. The quality of random numbers can be checked with statistical tests that were programmed by George Marsaglia. The suite of tests is called Diehard. The Diehard tests will be done on the three files of random numbers linked above. Two are encrypted random numbers and one is a file of computer-generated random numbers (pseudo-random numbers) that is not encrypted. They should all get high grades from the Diehard statistical tests.

July 17, 2010: Beginning Diehard tests. Diehard needs a binary file with 10 to 11 megabytes of random bits. The three files given above have only about 8 kilobytes each, so new 10-megabyte files will be prepared.

ran_04.dat is a text file with copies of "Alien Art For Sale".
ran_05.dat is that text file encrypted with OpenSSL AES-128-cbc.
ran_06.dat is unencrypted random numbers from OpenSSL.
_____________________________________________________

The results from Diehard:
Three result files came from Diehard:
ran_04.txt
ran_05.txt
ran_06.txt
Those correspond to the three .dat files listed above.


Diehard tested the text file ran_04.dat, and the file does not look random. The results from Diehard are in ran_04.txt at Toyon Jungle Technology.

ran_04.dat has 10,346,041 bytes,
so it ends in a fractional AES block: 9 bytes are left over after the last full 16-byte block, which is .5625 of a block.
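
The block arithmetic can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the only outside fact assumed is that AES uses 16-byte blocks.

```python
# AES works on 16-byte (128-bit) blocks regardless of key size.
AES_BLOCK_BYTES = 16
file_bytes = 10_346_041  # size of ran_04.dat as stated above

# Split the file size into whole blocks and leftover bytes.
full_blocks, leftover_bytes = divmod(file_bytes, AES_BLOCK_BYTES)
fraction_of_block = leftover_bytes / AES_BLOCK_BYTES

print(full_blocks, leftover_bytes, fraction_of_block)
```

If the encryption step pads the final block, which is the default behaviour of OpenSSL's enc command for block ciphers, the ciphertext would occupy one more whole block than the number of full blocks counted here.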

Excerpts from Diehard results for ran_04.dat (renamed from g.txt)

Birthday test
For a sample of size 500:         mean
g.txt using bits 6 to 29       224.988
duplicate     number     number
spacings    observed   expected
0               500.     67.668
1                 0.    135.335
2                 0.    135.335
3                 0.     90.224
4                 0.     45.112
5                 0.     18.045
6 to INF          0.      8.282
Chisquare with 6 d.o.f. = 3194.53 p-value= 1.000000


OPERM5 test for file g.txt
For a sample of 1,000,000 consecutive 5-tuples,
chisquare for 99 degrees of freedom=*******; p-value=1.000000


Binary rank test for g.txt
Rank test for 31x31 binary matrices:
rows from leftmost 31 bits of each 32-bit integer
rank  observed  expected   (o-e)^2/e  sum
28       35228     211.4  **********  *********
29        3627    5134.0  442.359800  *********
30        1030   23103.0  **********  *********
31         115   11551.5  **********  *********
chisquare=****** for 3 d. of f.; p-value=1.000000
________________________________________________

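A p-value of 1.000000 means the file seems non-random to that test. For random data Diehard expects p-values spread evenly between 0 and 1, so a p-value of 0 or 1 to six places is a clear failure. The documentation with Diehard explains this. The rows of asterisks in the excerpts appear to be numbers that were too large for the printed field.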
__________________________________________


The Diehard results from ran_05.txt:

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: THE BITSTREAM TEST ::
:: The file under test is viewed as a stream of bits. Call them ::
:: b1,b2,... . Consider an alphabet with two "letters", 0 and 1 ::
:: and think of the stream of bits as a succession of 20-letter ::
:: "words", overlapping. Thus the first word is b1b2...b20, the ::
:: second is b2b3...b21, and so on. The bitstream test counts ::
:: the number of missing 20-letter (20-bit) words in a string of ::
:: 2^21 overlapping 20-letter words. There are 2^20 possible 20 ::
:: letter words. For a truly random string of 2^21+19 bits, the ::
:: number of missing words j should be (very close to) normally ::
:: distributed with mean 141,909 and sigma 428. Thus ::
:: (j-141909)/428 should be a standard normal variate (z score) ::
:: that leads to a uniform [0,1) p value. The test is repeated ::
:: twenty times. ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
THE OVERLAPPING 20-tuples BITSTREAM TEST, 20 BITS PER WORD, N words
This test uses N=2^21 and samples the bitstream 20 times.
No. missing words should average 141909. with sigma=428.
---------------------------------------------------------
tst no 1: 141454 missing words, -1.06 sigmas from mean, p-value= .14370
tst no 2: 141816 missing words, -.22 sigmas from mean, p-value= .41369
tst no 3: 142364 missing words, 1.06 sigmas from mean, p-value= .85595
tst no 4: 141669 missing words, -.56 sigmas from mean, p-value= .28722
tst no 5: 142302 missing words, .92 sigmas from mean, p-value= .82055
tst no 6: 141718 missing words, -.45 sigmas from mean, p-value= .32743
tst no 7: 141826 missing words, -.19 sigmas from mean, p-value= .42282
tst no 8: 141701 missing words, -.49 sigmas from mean, p-value= .31322
tst no 9: 142109 missing words, .47 sigmas from mean, p-value= .67958
tst no 10: 142054 missing words, .34 sigmas from mean, p-value= .63233
tst no 11: 141461 missing words, -1.05 sigmas from mean, p-value= .14744
tst no 12: 141885 missing words, -.06 sigmas from mean, p-value= .47734
tst no 13: 141640 missing words, -.63 sigmas from mean, p-value= .26459
tst no 14: 142022 missing words, .26 sigmas from mean, p-value= .60382
tst no 15: 142098 missing words, .44 sigmas from mean, p-value= .67033
tst no 16: 142193 missing words, .66 sigmas from mean, p-value= .74627
tst no 17: 141872 missing words, -.09 sigmas from mean, p-value= .46525
tst no 18: 141940 missing words, .07 sigmas from mean, p-value= .52857
tst no 19: 141417 missing words, -1.15 sigmas from mean, p-value= .12501
tst no 20: 142281 missing words, .87 sigmas from mean, p-value= .80741
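
The z score and p-value in each row can be reproduced from the missing-word count. The sketch below assumes that the p-value is the standard normal cumulative probability of the z score. That assumption agrees with the table to about three decimal places. The function names are illustrative and are not part of Diehard.

```python
import math

MEAN_MISSING = 141909  # expected number of missing 20-bit words
SIGMA_MISSING = 428    # standard deviation quoted in the test description


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bitstream_z_and_p(missing_words):
    """Return (z, p) for one run, taking p as the normal CDF of z."""
    z = (missing_words - MEAN_MISSING) / SIGMA_MISSING
    return z, normal_cdf(z)


# Missing-word counts from tst no 1, 3 and 12 in the table above.
for missing in (141454, 142364, 141885):
    z, p = bitstream_z_and_p(missing)
    print(f"{missing} missing words: {z:+.2f} sigmas, p about {p:.4f}")
```

The small differences from the printed p-values in the fourth decimal place probably come from the normal-distribution approximation used inside Diehard.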
__________________________________________________

Here is an excerpt from ran_06.txt :


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: This is the COUNT-THE-1's TEST for specific bytes. ::
:: Consider the file under test as a stream of 32-bit integers. ::
:: From each integer, a specific byte is chosen , say the left- ::
:: most:: bits 1 to 8. Each byte can contain from 0 to 8 1's, ::
:: with probabilitie 1,8,28,56,70,56,28,8,1 over 256. Now let ::
:: the specified bytes from successive integers provide a string ::
:: of (overlapping) 5-letter words, each "letter" taking values ::
:: A,B,C,D,E. The letters are determined by the number of 1's, ::
:: in that byte:: 0,1,or 2 ---> A, 3 ---> B, 4 ---> C, 5 ---> D,::
:: and 6,7 or 8 ---> E. Thus we have a monkey at a typewriter ::
:: hitting five keys with with various probabilities:: 37,56,70,::
:: 56,37 over 256. There are 5^5 possible 5-letter words, and ::
:: from a string of 256,000 (overlapping) 5-letter words, counts ::
:: are made on the frequencies for each word. The quadratic form ::
:: in the weak inverse of the covariance matrix of the cell ::
:: counts provides a chisquare test:: Q5-Q4, the difference of ::
:: the naive Pearson sums of (OBS-EXP)^2/EXP on counts for 5- ::
:: and 4-letter cell counts. ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Chi-square with 5^5-5^4=2500 d.of f. for sample size: 256000
chisquare equiv normal p value
Results for COUNT-THE-1's in specified bytes:
bits 1 to 8 2571.60 1.013 .844362
bits 2 to 9 2531.62 .447 .672648
bits 3 to 10 2473.61 -.373 .354510
bits 4 to 11 2516.43 .232 .591855
bits 5 to 12 2471.80 -.399 .345004
bits 6 to 13 2389.82 -1.558 .059594
bits 7 to 14 2491.06 -.126 .449688
bits 8 to 15 2411.62 -1.250 .105660
bits 9 to 16 2460.97 -.552 .290495
bits 10 to 17 2459.58 -.572 .283796
bits 11 to 18 2486.62 -.189 .424977
bits 12 to 19 2440.71 -.839 .200872
bits 13 to 20 2506.74 .095 .537967
bits 14 to 21 2383.18 -1.652 .049261
bits 15 to 22 2561.02 .863 .805936
bits 16 to 23 2576.98 1.089 .861840
bits 17 to 24 2496.84 -.045 .482175
bits 18 to 25 2449.56 -.713 .237812
bits 19 to 26 2372.72 -1.800 .035927
bits 20 to 27 2507.49 .106 .542157
bits 21 to 28 2582.16 1.162 .877359
bits 22 to 29 2422.61 -1.095 .136866
bits 23 to 30 2646.38 2.070 .980776
bits 24 to 31 2493.85 -.087 .465361
bits 25 to 32 2509.92 .140 .555770
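
Two pieces of the description above are easy to check by hand: the letter probabilities 37, 56, 70, 56, 37 over 256, and the "equiv normal" column. The sketch below assumes the normal equivalent is (chisquare - d.o.f.) / sqrt(2 * d.o.f.). That formula is not quoted from the Diehard source, but it reproduces the rows checked here. All names are illustrative.

```python
import math

# Number of byte values (out of 256) containing k one-bits, for k = 0..8.
ones_counts = [math.comb(8, k) for k in range(9)]


def letter_for(byte):
    """Map a byte to a letter A..E by how many 1 bits it contains."""
    ones = bin(byte).count("1")
    if ones <= 2:
        return "A"
    if ones >= 6:
        return "E"
    return "BCD"[ones - 3]  # 3 -> B, 4 -> C, 5 -> D


# Count how many of the 256 byte values map to each letter.
letter_weights = {letter: 0 for letter in "ABCDE"}
for byte in range(256):
    letter_weights[letter_for(byte)] += 1


def equivalent_normal(chisquare, dof=5**5 - 5**4):
    """Normal approximation to a chi-square value with many d.o.f. (assumed form)."""
    return (chisquare - dof) / math.sqrt(2 * dof)


print(ones_counts)
print(letter_weights)
print(round(equivalent_normal(2571.60), 3))  # row "bits 1 to 8"
```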

____________________________________________________

Discussion of the Diehard results for files numbered 4, 5, and 6.


The text file of a book (ran_04.dat) had p values of 1.000 and 0.000, so the randomness tests easily identify the non-randomness.

The encrypted text file (ran_05.dat) has p-values typically like 0.28331, so it looks random to Diehard.

The unencrypted random numbers file (ran_06.dat) generated by OpenSSL has p-values like .10640, so it may be less random than the encrypted file.

It is interesting to compare the results of Diehard for files 5 and 6, the encrypted file and the random number file. For the birthday test, here are two excerpts:

ran_05.dat using bits 1 to 24    1.944
duplicate     number     number
spacings    observed   expected
0                69.     67.668
1               139.    135.335
2               130.    135.335
3                96.     90.224
4                49.     45.112
5                13.     18.045
6 to INF          4.      8.282
Chisquare with 6 d.o.f. = 4.66 p-value= .412542

ran_06.dat using bits 1 to 24    2.032
duplicate     number     number
spacings    observed   expected
0                71.     67.668
1               132.    135.335
2               122.    135.335
3                92.     90.224
4                59.     45.112
5                17.     18.045
6 to INF          7.      8.282
Chisquare with 6 d.o.f. = 6.13 p-value= .591194
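
The chi-square values and p-values in the birthday excerpts can be recomputed from the observed and expected columns. The sketch below assumes that Diehard reports the cumulative chi-square probability as its p-value, which would explain why a badly failing file shows 1.000000. The closed-form CDF used here is valid only for 6 degrees of freedom. All names are illustrative.

```python
import math

# Expected counts printed by Diehard for 0, 1, ..., 5 and "6 to INF" duplicate spacings.
EXPECTED = [67.668, 135.335, 135.335, 90.224, 45.112, 18.045, 8.282]

# Observed counts from the three birthday excerpts on this page.
OBSERVED = {
    "ran_04.dat": [500, 0, 0, 0, 0, 0, 0],
    "ran_05.dat": [69, 139, 130, 96, 49, 13, 4],
    "ran_06.dat": [71, 132, 122, 92, 59, 17, 7],
}


def pearson_chisquare(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chisquare_cdf_6dof(x):
    """CDF of the chi-square distribution with exactly 6 degrees of freedom."""
    h = x / 2.0
    return 1.0 - math.exp(-h) * (1.0 + h + h * h / 2.0)


for name, observed in OBSERVED.items():
    chi2 = pearson_chisquare(observed, EXPECTED)
    print(f"{name}: chisquare = {chi2:.2f}, p-value = {chisquare_cdf_6dof(chi2):.6f}")
```

The recomputed values should agree with the printed ones up to rounding of the expected counts.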
_________________________________________

July 18, 2010

A significant difference seems to exist in the Diehard DNA test between the encrypted file and the random file (ran_05.dat versus ran_06.dat). The p-values were tabulated into the following histogram:

p ran05 ran06
.0 1 5
.1 1 2
.2 1 2
.3 2 2
.4 3 2
.5 8 4
.6 3 1
.7 3 0
.8 5 6
.9 1 7

The first column is the approximate p-value, truncated to its first decimal digit.
The second and third columns give the number of times a p-value in that range occurs in the Diehard DNA table of results for ran_05.dat and ran_06.dat. Notice how the right column has many occurrences near p=0.9 and near p=0.0. That trend toward p-values of 0.0 and 1.0 makes the OpenSSL random numbers seem to have worse randomness qualities than the AES-128-cbc ciphertext.
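
A tabulation of this kind can be sketched as follows. The example bins the twenty bitstream p-values listed earlier for ran_05.dat by their first decimal digit. The function name is illustrative, and this is not the program that was actually used to build the tables.

```python
def histogram_by_tenths(p_values):
    """Count p-values in ten bins: [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]."""
    counts = [0] * 10
    for p in p_values:
        # int() truncates, so 0.28722 lands in bin 2; p = 1.0 is folded into the last bin.
        counts[min(int(p * 10), 9)] += 1
    return counts


# The twenty bitstream p-values reported for ran_05.dat earlier on this page.
bitstream_ran05 = [
    .14370, .41369, .85595, .28722, .82055, .32743, .42282, .31322, .67958, .63233,
    .14744, .47734, .26459, .60382, .67033, .74627, .46525, .52857, .12501, .80741,
]

print(histogram_by_tenths(bitstream_ran05))
```

The counts produced this way match the ran_05.dat row of the bitstream test in the July 19 summaries.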

________________________________________________________

July 19, 2010

Diehard summaries 7/19/2010

BIRTHDAY SPACINGS TEST
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9 approx. p-value
ran_05.dat 2 1 2 0 1 0 0 1 1 1 occurrences
ran_06.dat 1 1 0 1 0 2 0 2 1 1 occurrences


OVERLAPPING 5-PERMUTATION TEST
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 0 0 1 0 0 0 0 1 0 0
ran_06.dat 0 1 1 0 0 0 0 0 0 0


BINARY RANK TEST for 31x31 matrices
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 0 0 0 0 0 0 0 1 0 0
ran_06.dat 0 0 0 0 1 0 0 0 0 0

BINARY RANK TEST for 6x8 matrices.
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 2 2 1 7 2 4 1 2 2 1
ran_06.dat 3 3 6 1 4 0 3 2 3 0

OVERLAPPING 20-tuples BITSTREAM TEST
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 0 3 2 2 4 1 4 1 3 0
ran_06.dat 1 2 0 0 3 3 2 1 3 2

OPSO test
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 1 1 1 2 2 4 3 5 2 1
ran_06.dat 2 4 0 2 3 2 4 1 3 1

OQSO test
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 2 2 6 2 2 1 3 3 3 2
ran_06.dat 4 3 4 3 6 1 2 1 0 3

COUNT-THE-1's in specified bytes
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 2 4 4 3 1 2 0 3 3 3
ran_06.dat 3 2 4 2 4 4 1 0 4 1

CDPARK
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 3 2 1 0 0 0 0 1 2 1
ran_06.dat 0 2 1 2 1 0 2 0 1 1

3DSPHERES test
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 1 2 2 3 1 3 1 2 1 4
ran_06.dat 2 0 2 4 1 4 1 1 1 4

OVERLAPPING SUMS test
p-values-- 0.0 .1 .2 .3 .4 .5 .6 .7 .8 .9
ran_05.dat 1 0 1 0 1 1 3 1 2 0
ran_06.dat 3 1 1 0 0 1 0 1 1 1

DNA test
p-value--- .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 p-value
ran_05.txt 1 1 1 3 3 8 3 5 5 1 occurrences
ran_06.txt 5 2 2 2 2 4 1 0 6 7 occurrences

Summary of Summaries of p-values

ran_05.dat less than 0.1 has 15 occurrences
ran_05.dat more than 0.9 has 14 occurrences

ran_06.dat less than 0.1 has 24 occurrences
ran_06.dat more than 0.9 has 21 occurrences
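
The four totals above can be re-added from the July 19 tables. The sketch below copies the bin counts by hand from those tables, so any transcription slip there carries over. The names are illustrative.

```python
# Bin counts copied from the July 19 summary tables; bins are 0.0, 0.1, ..., 0.9.
SUMMARY = {
    "ran_05.dat": [
        [2, 1, 2, 0, 1, 0, 0, 1, 1, 1],  # birthday spacings
        [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],  # overlapping 5-permutation
        [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # binary rank 31x31
        [2, 2, 1, 7, 2, 4, 1, 2, 2, 1],  # binary rank 6x8
        [0, 3, 2, 2, 4, 1, 4, 1, 3, 0],  # bitstream
        [1, 1, 1, 2, 2, 4, 3, 5, 2, 1],  # OPSO
        [2, 2, 6, 2, 2, 1, 3, 3, 3, 2],  # OQSO
        [2, 4, 4, 3, 1, 2, 0, 3, 3, 3],  # count-the-1's, specified bytes
        [3, 2, 1, 0, 0, 0, 0, 1, 2, 1],  # CDPARK
        [1, 2, 2, 3, 1, 3, 1, 2, 1, 4],  # 3DSPHERES
        [1, 0, 1, 0, 1, 1, 3, 1, 2, 0],  # overlapping sums
        [1, 1, 1, 3, 3, 8, 3, 5, 5, 1],  # DNA
    ],
    "ran_06.dat": [
        [1, 1, 0, 1, 0, 2, 0, 2, 1, 1],  # birthday spacings
        [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # overlapping 5-permutation
        [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # binary rank 31x31
        [3, 3, 6, 1, 4, 0, 3, 2, 3, 0],  # binary rank 6x8
        [1, 2, 0, 0, 3, 3, 2, 1, 3, 2],  # bitstream
        [2, 4, 0, 2, 3, 2, 4, 1, 3, 1],  # OPSO
        [4, 3, 4, 3, 6, 1, 2, 1, 0, 3],  # OQSO
        [3, 2, 4, 2, 4, 4, 1, 0, 4, 1],  # count-the-1's, specified bytes
        [0, 2, 1, 2, 1, 0, 2, 0, 1, 1],  # CDPARK
        [2, 0, 2, 4, 1, 4, 1, 1, 1, 4],  # 3DSPHERES
        [3, 1, 1, 0, 0, 1, 0, 1, 1, 1],  # overlapping sums
        [5, 2, 2, 2, 2, 4, 1, 0, 6, 7],  # DNA
    ],
}


def tail_counts(rows):
    """Return (count below 0.1, count at or above 0.9, total p-values)."""
    low = sum(row[0] for row in rows)
    high = sum(row[-1] for row in rows)
    total = sum(sum(row) for row in rows)
    return low, high, total


for name, rows in SUMMARY.items():
    low, high, total = tail_counts(rows)
    print(f"{name}: {low} below 0.1, {high} above 0.9, out of {total} p-values")
```

As a rough guide only, if the p-values were independent and uniform, each of the ten bins would hold about a tenth of the total for a file.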

It seems that the OpenSSL random number generator is more biased than the OpenSSL AES-128-cbc ciphertext. (OpenSSL Rev. 0.9.8o)
_________________________________________

Conclusion, August 8, 2010

Mistakes were made when I (the author and publisher) added up the numbers. I was wrong about seeing a difference in quality between the random numbers from two sources: ciphertext and a random number generator. This second look at the random numbers was more careful than the first look. The result is a retraction of the guess that one file was less random than the other.

A Perl program was used to make summaries of Diehard reports. It is called merit_47.pl and it is free for you to download. Merit_47.pl provides two output files. One is a bare table of p-values for 17 Diehard statistical tests, suitable for import into a spreadsheet. The other is a commented table of p-values together with histograms of the number of occurrences of p-values for each test. A batch file called eight.bat is also online so that sixteen files of Diehard results can be processed automatically by the merit_47.pl program.
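
For readers without Perl, the idea of pulling p-values out of a Diehard report into a spreadsheet-friendly table can be sketched in Python. This is not merit_47.pl and does not reproduce its output format. The pattern, function names and column headings are assumptions for illustration, and the pattern only catches values printed with a "p-value=" label, as in the excerpts on this page.

```python
import csv
import io
import re

# Matches "p-value= .14370" and "p-value=1.000000" as printed in the excerpts above.
P_VALUE_PATTERN = re.compile(r"p-value\s*=\s*([01]?\.\d+)")


def extract_p_values(report_text):
    """Return every labelled p-value found in the report text, in order."""
    return [float(match) for match in P_VALUE_PATTERN.findall(report_text)]


def to_csv(test_name, p_values):
    """Build a two-column CSV table (hypothetical layout) as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["test", "p_value"])
    for p in p_values:
        writer.writerow([test_name, p])
    return buffer.getvalue()


# Three lines taken from the excerpts on this page.
sample_report = """\
tst no 1: 141454 missing words, -1.06 sigmas from mean, p-value= .14370
tst no 2: 141816 missing words, -.22 sigmas from mean, p-value= .41369
Chisquare with 6 d.o.f. = 3194.53 p-value= 1.000000
"""

p_values = extract_p_values(sample_report)
print(to_csv("sample", p_values))
```

Tables such as the count-the-1's results, which list p-values in a column without the label, would need a separate rule.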