Random Number Test Suite

Comprehensive statistical testing suite for evaluating random number generator quality

Overview

The Random Number Test Suite provides a battery of statistical tests (chi-square, Kolmogorov-Smirnov, runs test, gap test, poker test, and serial correlation analysis) to evaluate random number generator quality. Run standardized tests to assess the uniformity and independence of number sequences with configurable sample sizes and significance levels. Compare different RNG algorithms, view test score plots and p-value distributions, and use clear pass/fail criteria to validate implementations and choose the right algorithm for your needs.

Tips and Tricks

Getting Started

  1. Select an RNG algorithm to test
  2. Choose sample size - start with 5,000 for quick tests, use 10,000+ for thorough validation
  3. Set significance level - use 0.05 (standard) or 0.01 (strict)
  4. Click “Run All Tests” to execute the full battery
  5. Review results - check which tests pass or fail

Understanding the Tests

Chi-Square Test
  - Tests uniformity
  - Checks if values are evenly distributed across bins
  - Pass: Values appear uniformly random
  - Fail: Bias toward certain values detected
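As a rough illustration of the uniformity check, the sketch below bins values from [0, 1) and computes the chi-square statistic. The function name, the choice of 10 bins, and the hard-coded critical value are assumptions made for this example, not a description of the suite's internals.

```javascript
// Chi-square statistic for values in [0, 1) sorted into equal-width bins.
// Hypothetical helper for illustration; 10 bins is an assumed default.
function chiSquareStatistic(values, binCount = 10) {
  const observed = new Array(binCount).fill(0);
  for (const v of values) {
    // Clamp so that a value of exactly 1 cannot index past the last bin.
    const bin = Math.min(binCount - 1, Math.floor(v * binCount));
    observed[bin] += 1;
  }
  const expected = values.length / binCount;
  return observed.reduce((sum, o) => sum + (o - expected) ** 2 / expected, 0);
}

// Critical value of the chi-square distribution with 9 degrees of freedom
// (10 bins minus 1) at alpha = 0.05.
const CRITICAL_DF9_ALPHA_05 = 16.919;

// A sample that puts every value in a single bin is far above the threshold.
const biasedSample = new Array(1000).fill(0.5);
console.log(chiSquareStatistic(biasedSample) > CRITICAL_DF9_ALPHA_05); // true
```

A statistic above the critical value corresponds to a Fail at that significance level; a different bin count needs a different critical value.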

Kolmogorov-Smirnov (K-S) Test
  - Tests distribution shape
  - Compares empirical distribution to theoretical uniform distribution
  - Often better suited to small samples than chi-square, since it needs no binning
  - Pass: Distribution matches uniform
  - Fail: Distribution differs from uniform
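The comparison between the empirical and the uniform distribution can be sketched as follows. The function names are hypothetical, and the critical value uses the common large-sample approximation 1.36/√n for α = 0.05, which is an assumption about how a threshold might be chosen rather than the suite's documented behavior.

```javascript
// One-sample Kolmogorov-Smirnov statistic D against the uniform distribution on [0, 1).
function ksStatistic(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  let d = 0;
  for (let i = 0; i < n; i++) {
    // The empirical CDF jumps from i/n to (i + 1)/n at sorted[i];
    // the uniform CDF at that point is sorted[i] itself.
    const above = (i + 1) / n - sorted[i];
    const below = sorted[i] - i / n;
    d = Math.max(d, above, below);
  }
  return d;
}

// Large-sample approximation of the critical value at alpha = 0.05.
function ksCritical05(n) {
  return 1.36 / Math.sqrt(n);
}
```

A value of D above the critical value would count as a Fail; for small samples an exact table is more appropriate than the approximation.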

Runs Test
  - Tests independence
  - Analyzes sequences above/below median
  - Too many runs: Values alternate (negative correlation)
  - Too few runs: Values cluster (positive correlation)
  - Pass: Number of runs appears random
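A minimal sketch of the runs count, assuming the Wald-Wolfowitz form of the test with values equal to the median dropped. The function name and return shape are illustrative only.

```javascript
// Runs test above/below the median (Wald-Wolfowitz form).
function runsTest(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // Classify each value as above (true) or below (false) the median;
  // values equal to the median are dropped.
  const signs = values.filter((v) => v !== median).map((v) => v > median);
  const n1 = signs.filter(Boolean).length;
  const n2 = signs.length - n1;

  // A new run starts whenever the classification changes.
  let runs = signs.length > 0 ? 1 : 0;
  for (let i = 1; i < signs.length; i++) {
    if (signs[i] !== signs[i - 1]) runs += 1;
  }

  // Mean and variance of the run count under independence.
  const expectedRuns = (2 * n1 * n2) / (n1 + n2) + 1;
  const variance =
    (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) /
    ((n1 + n2) ** 2 * (n1 + n2 - 1));
  const z = (runs - expectedRuns) / Math.sqrt(variance);
  return { runs, expectedRuns, z };
}
```

A large positive z means too many runs (alternation), a large negative z means too few (clustering). With a two-sided normal approximation, |z| > 1.96 would be a Fail at α = 0.05.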

Gap Test
  - Tests spacing patterns
  - Examines distances between values in a range
  - Pass: Gaps follow expected geometric distribution
  - Fail: Periodicities or spacing patterns detected
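The gap measurement can be sketched like this. The range [0, 0.5) and both function names are assumptions chosen for the example; the suite may use a different range.

```javascript
// Collect the gaps between successive values that fall inside [lo, hi).
// A gap is the number of values outside the range between two hits.
function gapLengths(values, lo = 0, hi = 0.5) {
  const gaps = [];
  let sinceLastHit = null; // null until the first hit is seen
  for (const v of values) {
    const hit = v >= lo && v < hi;
    if (hit) {
      if (sinceLastHit !== null) gaps.push(sinceLastHit);
      sinceLastHit = 0;
    } else if (sinceLastHit !== null) {
      sinceLastHit += 1;
    }
  }
  return gaps;
}

// Under independence, P(gap = k) = p * (1 - p)^k with p = hi - lo.
function expectedGapProbability(k, lo = 0, hi = 0.5) {
  const p = hi - lo;
  return p * (1 - p) ** k;
}

console.log(gapLengths([0.1, 0.9, 0.9, 0.2, 0.3, 0.8, 0.1])); // [ 2, 0, 1 ]
```

The observed gap frequencies would then be compared against the geometric probabilities, for example with a chi-square statistic.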

Poker Test
  - Tests consecutive value dependencies
  - Groups consecutive values and checks frequencies
  - Inspired by poker hand probabilities
  - Pass: Frequencies match expected probabilities
  - Fail: Dependencies between consecutive values
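One common simplified form of the poker test classifies each hand by how many distinct digits it contains. The sketch below assumes hands of 5 values mapped to decimal digits; the function name, hand size, and digit count are illustrative and may not match the variant this suite runs.

```javascript
// Classify each hand of consecutive values by its number of distinct digits.
// The returned array is indexed by that number (index 0 is always 0).
function pokerHandCounts(values, handSize = 5, digits = 10) {
  const counts = new Array(handSize + 1).fill(0);
  // Step through complete hands only; leftover values are ignored.
  for (let start = 0; start + handSize <= values.length; start += handSize) {
    const hand = values
      .slice(start, start + handSize)
      .map((v) => Math.min(digits - 1, Math.floor(v * digits)));
    counts[new Set(hand).size] += 1;
  }
  return counts;
}
```

For independent digits, the expected share of each class is known in advance. For example, a hand of 5 with all digits different has probability (10 × 9 × 8 × 7 × 6) / 10^5 = 0.3024, and the observed counts would be compared against such values.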

Serial Correlation Test
  - Tests linear relationships
  - Measures correlation at different time lags
  - Pass: Correlation near zero for all lags
  - Fail: Linear relationships detected
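The lag correlation can be sketched with the usual sample autocorrelation formula. The function name is hypothetical, and which lags the suite checks is not specified here.

```javascript
// Sample autocorrelation of a sequence at a given lag.
function serialCorrelation(values, lag = 1) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < n; i++) {
    const deviation = values[i] - mean;
    denominator += deviation * deviation;
    // Pair each value with the one `lag` positions later, where it exists.
    if (i + lag < n) numerator += deviation * (values[i + lag] - mean);
  }
  return numerator / denominator;
}

// A strictly alternating sequence is strongly negatively correlated at lag 1.
console.log(serialCorrelation([0, 1, 0, 1, 0, 1], 1) < -0.8); // true
```

For a long independent sequence of n values, correlations are expected to stay within roughly ±1.96/√n at α = 0.05, which is one plausible way to set the pass threshold.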

Interpreting Results

P-Values Explained:
  - p > α (for example, 0.05): PASS - appears random (fail to reject null hypothesis)
  - p ≤ α: FAIL - evidence of non-randomness
  - p > 0.95: Suspicious - might indicate artificial uniformity
  - p-values should vary: For a good RNG they spread roughly evenly between 0 and 1 across runs, so if all p-values are similar, something’s wrong
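The rules above can be written as a small decision function. The function name and the returned labels are hypothetical; the 0.95 threshold for a suspicious result is taken from the list above.

```javascript
// Turn a p-value into a verdict at significance level alpha.
function interpretPValue(p, alpha = 0.05) {
  if (p <= alpha) return "FAIL"; // evidence of non-randomness
  if (p > 0.95) return "SUSPICIOUS"; // fit is almost too good
  return "PASS"; // fail to reject the null hypothesis
}

console.log(interpretPValue(0.03)); // "FAIL"
console.log(interpretPValue(0.03, 0.01)); // "PASS"
```

The second call shows that the same p-value can pass at a stricter significance level, which is why the chosen α should be reported along with the results.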

Overall Quality Assessment:
  - Excellent: Pass all 6 tests consistently
  - Good: Pass 5-6 tests (an isolated failure can occur by chance at α=0.05)
  - Marginal: Pass 3-4 tests
  - Poor: Fail most tests - do not use this RNG

What Each Algorithm Should Show:
  - Mersenne Twister: Pass all or nearly all tests
  - XorShift: Pass all or nearly all tests
  - Mulberry32: Pass most tests
  - LCG: Expect multiple failures (educational example)
  - Math.random(): Typically pass all tests (browser-dependent)

Practical Tips

Sample Size Selection:
  - 1,000-5,000: Quick screening, less reliable
  - 5,000-10,000: Good balance (recommended starting point)
  - 10,000-50,000: Rigorous validation, more reliable
  - Larger samples: Better power to detect subtle flaws

Significance Level (α):
  - α = 0.05 (standard): Accepts a 5% false positive rate per test
  - α = 0.01 (strict): Fewer false alarms, may miss subtle issues
  - α = 0.10 (lenient): More sensitive, more false positives

Testing Best Practices:
  1. Run multiple times with different seeds
  2. Try different sample sizes to check consistency
  3. Don’t cherry-pick - report all results
  4. Document failures - which tests fail and why
  5. Consider your use case - not all applications need perfect RNGs
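Running one test across several seeds and reporting every outcome might look like the sketch below. It uses a widely circulated Mulberry32 implementation and a chi-square check with 10 bins; all function names, the seed list, and the critical value 16.919 (9 degrees of freedom, α = 0.05) are assumptions for this example.

```javascript
// Mulberry32: a small seedable 32-bit generator returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Chi-square statistic computed directly from a generator function.
function chiSquareFromGenerator(next, sampleSize, binCount = 10) {
  const observed = new Array(binCount).fill(0);
  for (let i = 0; i < sampleSize; i++) {
    observed[Math.min(binCount - 1, Math.floor(next() * binCount))] += 1;
  }
  const expected = sampleSize / binCount;
  return observed.reduce((sum, o) => sum + (o - expected) ** 2 / expected, 0);
}

// Count how many seeds produce a failing chi-square result.
function countFailures(makeGenerator, seeds, sampleSize = 5000) {
  const critical = 16.919; // chi-square, 9 degrees of freedom, alpha = 0.05
  let failures = 0;
  for (const seed of seeds) {
    const next = makeGenerator(seed);
    if (chiSquareFromGenerator(next, sampleSize) > critical) failures += 1;
  }
  return failures;
}

const demoSeeds = Array.from({ length: 20 }, (_, i) => i + 1);
console.log(
  `Mulberry32: ${countFailures(mulberry32, demoSeeds)} of ${demoSeeds.length} runs failed`
);
```

Reporting the failure count over all seeds, rather than a single favorable run, is the point of the exercise. At α = 0.05, about one failure in 20 runs is expected even from a sound generator.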

Common Failure Patterns:

Failure Pattern            Likely Cause
All tests fail             Algorithm is fundamentally broken
Chi-square fails           Non-uniform distribution
Runs test fails            Serial correlation present
Gap test fails             Periodicity in sequence
Poker test fails           Dependencies between values
Serial correlation fails   Linear relationships at lags

Choosing the Right Algorithm

For Scientific Computing:
  - Mersenne Twister (gold standard)
  - Should pass all tests consistently, apart from occasional chance failures
  - Long period and excellent statistical properties

For Game Development:
  - XorShift (fast and good quality)
  - Should pass most tests
  - Speed matters for real-time applications

For General Web Apps:
  - Math.random() (convenient)
  - Or Mulberry32 (if you need seeding)
  - Moderate quality requirements

Never Use:
  - Simple LCG (unless teaching what not to do)
  - Any algorithm that consistently fails multiple tests
  - Home-grown algorithms without thorough testing
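To see one reason a simple LCG is risky, consider a 32-bit LCG with a power-of-two modulus. The sketch below uses the Numerical Recipes constants as an assumption (the LCG in this suite may use different ones) and prints the lowest bit of successive states. Because both the multiplier and the increment are odd, that bit simply alternates.

```javascript
// A 32-bit LCG: state = (a * state + c) mod 2^32.
// Constants are the Numerical Recipes pair, assumed here for illustration.
function makeLcg(seed) {
  let state = seed >>> 0;
  return function nextState() {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state; // raw 32-bit state; divide by 2^32 for a float in [0, 1)
  };
}

// Collect the lowest bit of the first `count` states.
function lcgLowBits(seed, count) {
  const next = makeLcg(seed);
  return Array.from({ length: count }, () => next() & 1);
}

console.log(lcgLowBits(12345, 8).join("")); // "01010101"
```

Any application that relies on the low bits, for example `state % 2` for a coin flip, gets a perfectly predictable sequence, which is the kind of dependency the runs and serial correlation tests are designed to expose.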

Advanced Usage

Comparing Algorithms:
  1. Run the same test suite on multiple algorithms
  2. Record pass/fail rates for each
  3. Note p-value distributions
  4. Consider both quality and performance

Custom Validation:
  - Test your own RNG implementation
  - Use large sample sizes (50,000+) for final validation
  - Run tests multiple times with different seeds
  - Combine with visual testing (use the Randomness Visualizer)

Debugging Failed Tests:
  - Start with the failed test - understand what it detects
  - Use visualizations to see the problem
  - Check the algorithm implementation
  - Verify parameters are correct
  - Test with different seeds

Important Limitations

These tests DO NOT guarantee:
  - Cryptographic security - use crypto.getRandomValues() for that
  - Suitability for all applications
  - Absence of all possible patterns
  - Quality beyond the tested sample size

Statistical Testing Caveats:
  - Even a good RNG fails each test about 5% of the time by chance (at α=0.05)
  - Passing tests doesn’t prove perfect randomness
  - Some patterns might not be detected
  - Longer sequences might reveal hidden flaws
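The chance-failure caveat adds up across a battery. As a back-of-the-envelope sketch, assuming the six tests are independent (they are not strictly so when run on the same sample), the probability that a flawless generator fails at least one of them is:

```javascript
// Probability that a flawless generator fails at least one of `testCount`
// independent tests, each run at significance level alpha.
function chanceOfAnyFailure(testCount, alpha = 0.05) {
  return 1 - (1 - alpha) ** testCount;
}

console.log(chanceOfAnyFailure(6).toFixed(3)); // "0.265"
console.log(chanceOfAnyFailure(6, 0.01).toFixed(3)); // "0.059"
```

Under that assumption, roughly one full run in four shows at least one failure at α = 0.05, so a single isolated failure is weak evidence on its own. Repeated failures of the same test across seeds are the stronger signal.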

When to Use This Tool

Quality Assurance:
  - Validate RNG implementations before deployment
  - Test after code changes to detect regressions
  - Verify third-party RNG libraries

Algorithm Selection:
  - Compare candidates for your application
  - Understand trade-offs between algorithms
  - Make evidence-based decisions

Education:
  - Learn statistical hypothesis testing
  - Understand what makes a good RNG
  - See real-world application of statistics

Research:
  - Evaluate new RNG designs
  - Study randomness properties
  - Publish validation results

Quick Reference Guide

Fast Check (2 minutes):
  - Sample size: 5,000
  - Significance: 0.05
  - Look for: Major failures only

Standard Validation (5 minutes):
  - Sample size: 10,000
  - Significance: 0.05
  - Look for: Consistent pass rate

Rigorous Testing (15+ minutes):
  - Sample size: 50,000
  - Significance: 0.01
  - Multiple runs with different seeds
  - Document all results