Random Number Test Suite
Overview
The Random Number Test Suite provides a battery of statistical tests (chi-square, Kolmogorov-Smirnov, runs, gap, poker, and serial correlation) for evaluating the quality of random number generators (RNGs). Run standardized tests with configurable sample sizes and significance levels to assess the uniformity and independence of number sequences. Compare RNG algorithms, visualize test scores and p-value distributions, and apply clear pass/fail criteria to validate implementations and choose the right algorithm for your needs.
Tips and Tricks
Getting Started
- Select an RNG algorithm to test
- Choose sample size - start with 5,000 for quick tests, use 10,000+ for thorough validation
- Set significance level - use 0.05 (standard) or 0.01 (strict)
- Click “Run All Tests” to execute the full battery
- Review results - check which tests pass or fail
Understanding the Tests
Chi-Square Test
- Tests uniformity: checks whether values are evenly distributed across bins
- Pass: Values appear uniformly distributed
- Fail: Bias toward certain values detected
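The chi-square check can be made concrete with a minimal TypeScript sketch. It assumes samples in [0, 1) and 10 equal-width bins. The helper names are hypothetical. The p-value uses the Wilson-Hilferty normal approximation rather than an exact chi-square CDF, so the tool's own numbers may differ slightly.

```typescript
// Standard normal CDF built on the Abramowitz-Stegun 7.1.26 approximation of erf (|error| < 1.5e-7).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

interface ChiSquareResult {
  statistic: number;
  df: number;
  pValue: number;
}

// Chi-square goodness-of-fit test against the uniform distribution on [0, 1).
function chiSquareUniform(samples: number[], bins = 10): ChiSquareResult {
  const counts = new Array<number>(bins).fill(0);
  for (const u of samples) {
    // Clamp so a stray value of exactly 1 still lands in the last bin.
    counts[Math.min(bins - 1, Math.floor(u * bins))]++;
  }
  const expected = samples.length / bins;
  let statistic = 0;
  for (const observed of counts) statistic += (observed - expected) ** 2 / expected;
  const df = bins - 1;
  // Wilson-Hilferty: (X / df)^(1/3) is roughly normal with mean 1 - 2/(9 df) and variance 2/(9 df).
  const v = 2 / (9 * df);
  const z = (Math.cbrt(statistic / df) - (1 - v)) / Math.sqrt(v);
  return { statistic, df, pValue: 1 - normalCdf(z) }; // upper-tail p-value
}

// Example: 10,000 values from Math.random().
const sample = Array.from({ length: 10000 }, () => Math.random());
console.log(chiSquareUniform(sample));
```

With n samples and k bins, each bin expects n/k values. The statistic has k - 1 degrees of freedom and grows quickly when some bins are over- or under-filled.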
Kolmogorov-Smirnov (K-S) Test
- Tests distribution shape: compares the empirical cumulative distribution function (CDF) of the sample with the theoretical uniform CDF
- Needs no binning and is often more powerful than chi-square for small samples
- Pass: Distribution matches uniform
- Fail: Distribution differs from uniform
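The K-S statistic D is the largest vertical distance between the empirical CDF and the uniform CDF F(x) = x. Below is a minimal sketch with hypothetical function names. The pass/fail rule uses the large-sample approximation D_crit ≈ 1.358/√n at α = 0.05. That rule is our assumption, not the tool's documented method, and it is rough for very small n.

```typescript
// One-sample Kolmogorov-Smirnov statistic D against the uniform CDF F(x) = x on [0, 1].
function ksStatisticUniform(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  let d = 0;
  sorted.forEach((x, i) => {
    const dPlus = (i + 1) / n - x; // how far the empirical CDF at x exceeds F(x)
    const dMinus = x - i / n;      // how far F(x) exceeds the empirical CDF just below x
    d = Math.max(d, dPlus, dMinus);
  });
  return d;
}

// Pass/fail at alpha = 0.05 using the asymptotic critical value 1.358 / sqrt(n).
function ksPassesAt05(samples: number[]): boolean {
  return ksStatisticUniform(samples) <= 1.358 / Math.sqrt(samples.length);
}

console.log(ksPassesAt05(Array.from({ length: 5000 }, () => Math.random())));
```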
Runs Test
- Tests independence: counts runs of consecutive values above or below the median
- Too many runs: Values alternate (negative correlation)
- Too few runs: Values cluster (positive correlation)
- Pass: Number of runs is close to the count expected for a random sequence
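The runs test compares the observed number of runs with its expected value under independence. This sketch follows the Wald-Wolfowitz formulas. As an assumption, it uses the theoretical median 0.5 of a uniform [0, 1) sample as the cutoff. `runsTestZ` is a hypothetical name.

```typescript
// Wald-Wolfowitz runs test on values above/below a cutoff; returns a z-score.
// Assumption: samples are uniform on [0, 1), so the theoretical median 0.5 is the cutoff.
function runsTestZ(samples: number[], median = 0.5): number {
  // true = above the median; values exactly equal to the median are dropped.
  const above = samples.filter((x) => x !== median).map((x) => x > median);
  const n1 = above.filter((a) => a).length;
  const n2 = above.length - n1;
  if (n1 === 0 || n2 === 0) return NaN; // undefined when every value is on one side

  let runs = 1;
  for (let i = 1; i < above.length; i++) if (above[i] !== above[i - 1]) runs++;

  const n = n1 + n2;
  const mean = (2 * n1 * n2) / n + 1;
  const variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n * n * (n - 1));
  return (runs - mean) / Math.sqrt(variance);
}

console.log(runsTestZ(Array.from({ length: 10000 }, () => Math.random())));
```

A z-score beyond about ±1.96 fails at α = 0.05 (two-sided). Large positive values mean too many runs (alternation), and large negative values mean too few (clustering).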
Gap Test
- Tests spacing patterns: measures the gaps (number of intervening values) between successive values that fall in a chosen range
- Pass: Gap lengths follow the expected geometric distribution
- Fail: Periodicities or spacing patterns detected
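The gap test's expected distribution follows from independence. If each value lands in the range [a, b) with probability p = b - a, the number of misses before the next hit is geometric. The following is a minimal sketch. The range [0, 0.5), the tail bucket starting at 10, and the function name are illustrative assumptions, not the tool's settings.

```typescript
// Gap test: with p = b - a, P(gap = r) = p * (1 - p)^r for r = 0, 1, 2, ...
interface GapResult {
  observed: number[];
  expected: number[];
  chiSquare: number;
}

function gapTest(samples: number[], a = 0, b = 0.5, maxGap = 10): GapResult {
  const p = b - a;
  // observed[r] counts gaps of length r; the last bucket collects all gaps >= maxGap.
  const observed = new Array<number>(maxGap + 1).fill(0);
  let gap = 0;
  for (const u of samples) {
    if (u >= a && u < b) {
      observed[Math.min(gap, maxGap)]++;
      gap = 0;
    } else {
      gap++; // an unfinished gap at the end of the sample is discarded
    }
  }
  const total = observed.reduce((s, c) => s + c, 0);
  const expected = observed.map((_, r) =>
    r < maxGap ? total * p * (1 - p) ** r : total * (1 - p) ** maxGap,
  );
  let chiSquare = 0;
  observed.forEach((o, r) => {
    if (expected[r] > 0) chiSquare += (o - expected[r]) ** 2 / expected[r];
  });
  return { observed, expected, chiSquare };
}

console.log(gapTest(Array.from({ length: 10000 }, () => Math.random())).chiSquare);
```

The resulting statistic has maxGap degrees of freedom (one fewer than the number of buckets). Buckets with very small expected counts are usually merged before testing.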
Poker Test
- Tests dependencies among consecutive values: groups consecutive values into "hands" and checks how often each pattern (pairs, three of a kind, and so on) occurs
- Inspired by poker hand probabilities
- Pass: Pattern frequencies match expected probabilities
- Fail: Dependencies between consecutive values
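Expected poker-test frequencies can be derived exactly. This sketch uses the count-distinct-values form of the poker test described in Knuth's The Art of Computer Programming, Vol. 2, with hands of 5 digits. The hand size, digit count, and function names are assumptions for illustration. The tool may classify hands differently (for example, by pairs and three of a kind).

```typescript
// P(r distinct symbols in a hand of k, each symbol one of d equally likely values)
//   = S(k, r) * d * (d - 1) * ... * (d - r + 1) / d^k, with S = Stirling numbers of the second kind.
function pokerProbabilities(k = 5, d = 10): number[] {
  // Stirling numbers via S(n, r) = r * S(n - 1, r) + S(n - 1, r - 1).
  const S: number[][] = Array.from({ length: k + 1 }, () => new Array<number>(k + 1).fill(0));
  S[0][0] = 1;
  for (let n = 1; n <= k; n++) {
    for (let r = 1; r <= n; r++) S[n][r] = r * S[n - 1][r] + S[n - 1][r - 1];
  }
  const probs: number[] = [];
  for (let r = 1; r <= k; r++) {
    let falling = 1;
    for (let j = 0; j < r; j++) falling *= d - j;
    probs.push((S[k][r] * falling) / d ** k);
  }
  return probs; // probs[r - 1] = P(exactly r distinct symbols)
}

// Observed counts: each value in [0, 1) becomes the digit floor(u * d); incomplete hands are ignored.
function pokerCounts(samples: number[], k = 5, d = 10): number[] {
  const counts = new Array<number>(k).fill(0);
  for (let i = 0; i + k <= samples.length; i += k) {
    const hand = samples.slice(i, i + k).map((u) => Math.floor(u * d));
    counts[new Set(hand).size - 1]++;
  }
  return counts;
}

console.log(pokerProbabilities());
```

For 5 decimal digits, the probabilities of 1 to 5 distinct values are 0.0001, 0.0135, 0.18, 0.504, and 0.3024. Comparing these with the observed counts via chi-square gives the test result, usually after merging the two rarest categories.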
Serial Correlation Test
- Tests linear relationships: measures the correlation between values at different lags (offsets within the sequence)
- Pass: Correlation near zero for all lags
- Fail: Linear relationships detected
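Serial correlation at lag k is the sample autocorrelation between x_i and x_(i+k). Below is a minimal sketch with hypothetical function names. The ±1.96/√n threshold is a standard large-sample approximation for an independent sequence, not necessarily the tool's exact criterion.

```typescript
// Lag-k sample autocorrelation: sum of (x_i - mean)(x_{i+k} - mean) over sum of (x_i - mean)^2.
function autocorrelation(samples: number[], lag: number): number {
  const n = samples.length;
  const mean = samples.reduce((s, x) => s + x, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    const dx = samples[i] - mean;
    den += dx * dx;
    if (i + lag < n) num += dx * (samples[i + lag] - mean);
  }
  return num / den;
}

// For independent data, r_k is roughly normal with standard deviation about 1/sqrt(n),
// so |r_k| > 1.96 / sqrt(n) flags a lag at roughly alpha = 0.05.
function flaggedLags(samples: number[], maxLag = 10): number[] {
  const limit = 1.96 / Math.sqrt(samples.length);
  const flagged: number[] = [];
  for (let lag = 1; lag <= maxLag; lag++) {
    if (Math.abs(autocorrelation(samples, lag)) > limit) flagged.push(lag);
  }
  return flagged;
}

console.log(flaggedLags(Array.from({ length: 10000 }, () => Math.random())));
```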
Interpreting Results
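P-Values Explained:
- p > α (e.g., 0.05): PASS - appears random (fail to reject the null hypothesis)
- p ≤ α: FAIL - evidence of non-randomness
- p > 0.95: Still a pass, but if very high p-values recur across runs, the sequence may be suspiciously uniform (artificial uniformity)
- p-values should vary: across repeated runs, a good RNG produces p-values spread roughly evenly between 0 and 1; if all p-values are similar, something’s wrong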
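Under the null hypothesis, p-values from a good RNG are (approximately) uniform on [0, 1], so their spread across repeated runs is itself informative. The hypothetical helper below counts p-values per tenth of the range.

```typescript
// Bin p-values from repeated runs into 10 equal-width bins over [0, 1].
// With a good RNG each bin should hold roughly 10% of the values. A pile-up near 0
// suggests real failures, a pile-up near 1 suggests "too good" uniformity, and
// everything landing in one bin suggests something is wrong (e.g., the same seed reused).
function pValueHistogram(pValues: number[], bins = 10): number[] {
  const counts = new Array<number>(bins).fill(0);
  for (const p of pValues) counts[Math.min(bins - 1, Math.floor(p * bins))]++;
  return counts;
}
```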
Overall Quality Assessment:
- Excellent: Pass all tests consistently across repeated runs
- Good: Pass 5-6 tests per run (an occasional single failure is expected by chance at α = 0.05)
- Marginal: Pass 3-4 tests
- Poor: Fail most tests - do not use this RNG
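Binomial arithmetic shows how many chance failures to expect. This small sketch assumes the six tests act as independent trials, each with false-failure probability α. Tests run on the same sample are not strictly independent, so treat the results as approximate.

```typescript
// Probability of exactly k false failures among m independent tests at level alpha.
function binomialPmf(k: number, m: number, alpha: number): number {
  let choose = 1; // C(m, k) built up as a running product
  for (let i = 1; i <= k; i++) choose = (choose * (m - k + i)) / i;
  return choose * alpha ** k * (1 - alpha) ** (m - k);
}

for (const alpha of [0.01, 0.05, 0.1]) {
  const p0 = binomialPmf(0, 6, alpha);
  const p1 = binomialPmf(1, 6, alpha);
  console.log(
    `alpha=${alpha}: P(0)=${p0.toFixed(3)}, P(1)=${p1.toFixed(3)}, P(2+)=${(1 - p0 - p1).toFixed(3)}`,
  );
}
```

At α = 0.05 this gives about 0.735, 0.232, and 0.033. A single chance failure shows up in nearly one run in four, while two or more chance failures occur in only about 3% of runs.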
What Each Algorithm Should Show:
- Mersenne Twister: Pass all or nearly all tests
- XorShift: Pass all or nearly all tests
- Mulberry32: Pass most tests
- LCG: Expect multiple failures (educational example)
- Math.random(): Typically pass all tests (browser-dependent)
Practical Tips
Sample Size Selection:
- 1,000-5,000: Quick screening, less reliable
- 5,000-10,000: Good balance (recommended starting point)
- 10,000-50,000: Rigorous validation, more reliable
- Larger samples: Better power to detect subtle flaws
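Significance Level (α):
- α = 0.05 (standard): Accepts a 5% false positive rate per test
- α = 0.01 (strict): Requires stronger evidence before declaring a failure; fewer false alarms, but may miss subtle issues
- α = 0.10 (lenient): Flags failures more readily; more sensitive, but about 10% of tests on a good RNG will fail by chance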
Testing Best Practices:
1. Run multiple times with different seeds
2. Try different sample sizes to check consistency
3. Don’t cherry-pick - report all results
4. Document failures - which tests fail and why
5. Consider your use case - not all applications need perfect RNGs
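Practices 1 to 3 are easy to script with a hypothetical harness. `makeRng` stands for any seeded generator factory and `runTest` for any test that returns a p-value. Neither is an API of this tool. Every seed's result is kept, so nothing can be cherry-picked.

```typescript
type Rng = () => number;

interface SeedReport {
  seed: number;
  pValue: number;
  passed: boolean;
}

// Run one test on a fresh sample for each seed and keep every result.
function runAcrossSeeds(
  makeRng: (seed: number) => Rng,
  runTest: (samples: number[]) => number,
  seeds: number[],
  sampleSize: number,
  alpha = 0.05,
): SeedReport[] {
  return seeds.map((seed) => {
    const rng = makeRng(seed);
    const samples = Array.from({ length: sampleSize }, () => rng());
    const pValue = runTest(samples);
    return { seed, pValue, passed: pValue > alpha }; // p > alpha counts as a pass
  });
}
```

In practice, `makeRng` could wrap a seedable generator such as Mulberry32, and `runTest` could return the chi-square p-value. Repeating the call with several sample sizes covers practice 2.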
Common Failure Patterns:
| Failure Pattern | Likely Cause |
|---|---|
| All tests fail | Algorithm is fundamentally broken |
| Chi-square fails | Non-uniform distribution |
| Runs test fails | Serial correlation present |
| Gap test fails | Periodicity in sequence |
| Poker test fails | Dependencies between values |
| Serial correlation fails | Linear relationships at lags |
Choosing the Right Algorithm
For Scientific Computing:
- Mersenne Twister (widely used standard)
- Must pass all tests consistently
- Very long period (2^19937 − 1) and excellent statistical properties
For Game Development:
- XorShift (fast and good quality)
- Should pass most tests
- Speed matters for real-time applications
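For reference, Marsaglia's 32-bit xorshift with the shift triple (13, 17, 5) takes only a few lines of TypeScript. The tool's "XorShift" may be a different variant (for example, one with a 128-bit state), so treat this as an illustrative sketch. `xorshift32` is our name.

```typescript
// Marsaglia's xorshift32 (shifts 13, 17, 5). The state must be a non-zero 32-bit integer;
// output is mapped to [0, 1) by dividing the unsigned state by 2^32.
function xorshift32(seed: number): () => number {
  let x = seed | 0;
  if (x === 0) throw new Error("xorshift32 seed must be non-zero");
  return () => {
    x ^= x << 13;
    x ^= x >>> 17; // logical shift so the state is treated as unsigned
    x ^= x << 5;
    return (x >>> 0) / 4294967296;
  };
}

const nextXs = xorshift32(2024);
console.log(nextXs(), nextXs());
```

The seed must be non-zero because the all-zero state maps to itself forever.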
For General Web Apps:
- Math.random() (convenient, but cannot be seeded)
- Or Mulberry32 (if you need seeding)
- Moderate quality requirements
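Math.random() cannot be seeded, which is why a seedable generator such as Mulberry32 is useful for reproducible test runs. Below is the widely circulated public-domain JavaScript version of Mulberry32, ported to TypeScript. The tool's own implementation may differ in details such as seed handling.

```typescript
// Mulberry32: the 32-bit state advances by a fixed odd constant and is then
// scrambled with multiply/xorshift steps. Output is in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0; // Weyl-sequence step, kept within 32 bits
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed, same sequence: useful for reproducing a failing test run.
const rand = mulberry32(12345);
console.log(rand(), rand(), rand());
```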
Never Use:
- Simple LCG (unless teaching what not to do)
- Any algorithm that fails multiple tests
- Home-grown algorithms without thorough testing
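One classic weakness of simple LCGs is that, with a power-of-two modulus, the low-order bits have very short periods. The sketch below uses a = 1664525, c = 1013904223, and modulus 2^32 (constants popularized by Numerical Recipes) purely as an example. The parameters of the tool's LCG are not stated here.

```typescript
// x_{n+1} = (a * x_n + c) mod 2^32. Math.imul keeps the low 32 bits of the product,
// and ">>> 0" reduces the sum back to an unsigned 32-bit value.
function lcg32(seed: number): () => number {
  let x = seed >>> 0;
  return () => {
    x = (Math.imul(1664525, x) + 1013904223) >>> 0;
    return x;
  };
}

// Because a and c are both odd, the lowest bit simply alternates.
const nextLcg = lcg32(42);
const lowBits = Array.from({ length: 8 }, () => nextLcg() & 1);
console.log(lowBits.join("")); // prints "10101010"
```

Tests that look only at the values as fractions in [0, 1) may not catch this. However, any use of the low bits (for example, `x % 2` for a coin flip) inherits the pattern.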
Advanced Usage
Comparing Algorithms:
1. Run the same test suite on multiple algorithms
2. Record pass/fail rates for each
3. Note p-value distributions
4. Consider both quality and performance
Custom Validation:
- Test your own RNG implementation
- Use large sample sizes (50,000+) for final validation
- Run tests multiple times with different seeds
- Combine with visual testing (use the Randomness Visualizer)
Debugging Failed Tests:
- Start with the failed test - understand what it detects
- Use visualizations to see the problem
- Check the algorithm implementation
- Verify parameters are correct
- Test with different seeds
Important Limitations
These tests DO NOT guarantee:
- Cryptographic security - use crypto.getRandomValues() for that
- Suitability for all applications
- Absence of all possible patterns
- Quality beyond the tested sample size
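crypto.getRandomValues() is part of the Web Crypto API. The minimal sketch below converts its output to floats in [0, 1). It assumes a browser or a runtime that exposes a global `crypto` object (recent Node.js versions do). `secureUniforms` is a hypothetical name. Each call is limited to 65,536 bytes, hence the chunked fill.

```typescript
// Fill a Uint32Array with cryptographically secure random bits and scale each value to [0, 1).
function secureUniforms(count: number): number[] {
  const out: number[] = [];
  // 16,384 uint32 values = 65,536 bytes, the per-call maximum for getRandomValues.
  const chunk = new Uint32Array(16384);
  while (out.length < count) {
    crypto.getRandomValues(chunk);
    for (let i = 0; i < chunk.length && out.length < count; i++) {
      out.push(chunk[i] / 4294967296);
    }
  }
  return out;
}

console.log(secureUniforms(5));
```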
Statistical Testing Caveats:
- About 5% of tests on a good RNG will fail by chance at α = 0.05 (in general, a fraction α)
- Passing tests doesn’t prove perfect randomness
- Some patterns might not be detected
- Longer sequences might reveal hidden flaws
When to Use This Tool
Quality Assurance:
- Validate RNG implementations before deployment
- Test after code changes to detect regressions
- Verify third-party RNG libraries
Algorithm Selection:
- Compare candidates for your application
- Understand trade-offs between algorithms
- Make evidence-based decisions
Education:
- Learn statistical hypothesis testing
- Understand what makes a good RNG
- See real-world application of statistics
Research:
- Evaluate new RNG designs
- Study randomness properties
- Publish validation results
Quick Reference Guide
Fast Check (2 minutes):
- Sample size: 5,000
- Significance: 0.05
- Look for: Major failures only
Standard Validation (5 minutes):
- Sample size: 10,000
- Significance: 0.05
- Look for: Consistent pass rate
Rigorous Testing (15+ minutes):
- Sample size: 50,000
- Significance: 0.01
- Multiple runs with different seeds
- Document all results