Data Profiler

Comprehensive dataset overview with automatic quality assessment

Help

What is Data Profiling?

Data profiling is the process of examining and analyzing data to understand its structure, content, and quality. It helps identify:

  • Data types and formats
  • Missing values and null patterns
  • Unique values and cardinality
  • Statistical distributions
  • Potential quality issues

Profile Metrics
  • Column Count: Number of columns/features
  • Row Count: Number of observations
  • Data Types: Automatic detection (numeric, string, date, boolean)
  • Missing %: Percentage of null/empty values per column
  • Cardinality: Number of unique values per column
  • Memory Usage: Estimated size in memory
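As a rough illustration of how the metrics above could be computed, here is a minimal sketch in plain Python. It is not the profiler's actual implementation: the names `is_missing`, `infer_type`, and `profile_dataset` are hypothetical, rows are assumed to be dictionaries of strings (as `csv.DictReader` would produce), and the type-detection rules and the memory estimate are simplifying assumptions.

```python
import sys
from datetime import datetime


def is_missing(value):
    """Treat None and blank strings as missing (an assumed rule)."""
    return value is None or str(value).strip() == ""


def _parses(parser, text):
    """Return True if parser(text) succeeds without a ValueError."""
    try:
        parser(text)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess boolean, numeric, date, or string from the non-missing values."""
    present = [str(v).strip() for v in values if not is_missing(v)]
    if not present:
        return "string"
    if all(v.lower() in {"true", "false"} for v in present):
        return "boolean"
    if all(_parses(float, v) for v in present):
        return "numeric"
    if all(_parses(datetime.fromisoformat, v) for v in present):
        return "date"
    return "string"


def profile_dataset(rows):
    """Compute dataset-level metrics for a list of row dictionaries."""
    columns = list(rows[0].keys()) if rows else []
    per_column = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if not is_missing(v)]
        missing_pct = 100.0 * (len(values) - len(present)) / len(values)
        per_column[col] = {
            "type": infer_type(values),
            "missing_pct": round(missing_pct, 1),
            "cardinality": len(set(present)),
        }
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        # Crude estimate: sums the size of each cell object only.
        "memory_bytes": sum(sys.getsizeof(v) for row in rows for v in row.values()),
        "columns": per_column,
    }


rows = [
    {"id": "1", "signup": "2024-01-05", "active": "true", "city": "Oslo"},
    {"id": "2", "signup": "2024-02-11", "active": "false", "city": ""},
    {"id": "3", "signup": "", "active": "true", "city": "Oslo"},
    {"id": "4", "signup": "2024-03-02", "active": "true", "city": "Bergen"},
]
profile = profile_dataset(rows)
print(profile["row_count"], profile["column_count"])
print(profile["columns"]["city"])
```

In this sample, the `city` column has one blank value out of four rows, so its missing percentage is 25.0 and its cardinality is 2.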

Data Quality Indicators

The quality score is based on:

  • Completeness: Low percentage of missing data
  • Uniqueness: Cardinality appropriate for the data type
  • Consistency: Uniform data types within columns
  • Validity: Values match expected formats

Score ranges:

  • 90-100%: Excellent quality
  • 70-89%: Good quality
  • 50-69%: Fair quality - review recommended
  • <50%: Poor quality - cleaning required
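The score ranges above can be sketched as a simple lookup. The page does not say how the four indicators are combined into one score, so the equal-weight average in `quality_score` is purely an assumption for illustration. Both function names are hypothetical, and fractional scores are assigned to a band by its lower bound (so 89.5 counts as "Good").

```python
def quality_score(completeness, uniqueness, consistency, validity):
    """Combine four component scores (each 0-100) into one score.

    Assumption: equal weights. The real weighting is not documented here.
    """
    parts = [completeness, uniqueness, consistency, validity]
    for part in parts:
        if not 0 <= part <= 100:
            raise ValueError("component scores must be between 0 and 100")
    return sum(parts) / len(parts)


def quality_label(score):
    """Map a 0-100 score to the quality bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent quality"
    if score >= 70:
        return "Good quality"
    if score >= 50:
        return "Fair quality - review recommended"
    return "Poor quality - cleaning required"


score = quality_score(100, 80, 90, 70)
print(score, quality_label(score))
```

With the sample component scores, the average is 85.0, which falls in the "Good quality" band.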

Column Analysis

For each column, the profiler provides:

  • Name: Column identifier
  • Type: Detected data type
  • Non-Null: Count of non-empty values
  • Unique: Count of distinct values
  • Missing %: Percentage of missing values
  • Sample Values: First few unique values
  • Stats: Min, max, mean (for numeric columns)
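The per-column fields above could be assembled along these lines. This is a sketch under stated assumptions, not the profiler's code: `profile_column` and its `sample_size` parameter are hypothetical, values are assumed to arrive as strings, blank strings count as missing, and type detection is reduced to numeric versus string to keep the example short.

```python
from statistics import mean


def is_missing(value):
    """Treat None and blank strings as missing (an assumed rule)."""
    return value is None or str(value).strip() == ""


def to_number(value):
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def profile_column(name, values, sample_size=3):
    """Build the per-column summary: counts, samples, and numeric stats."""
    present = [v for v in values if not is_missing(v)]
    # dict.fromkeys keeps first-seen order, so samples are the first unique values.
    unique = list(dict.fromkeys(present))
    numbers = [to_number(v) for v in present]
    is_numeric = bool(present) and all(n is not None for n in numbers)
    missing = len(values) - len(present)
    return {
        "name": name,
        "type": "numeric" if is_numeric else "string",
        "non_null": len(present),
        "unique": len(unique),
        "missing_pct": round(100.0 * missing / len(values), 1) if values else 0.0,
        "sample_values": unique[:sample_size],
        # Stats are only reported for numeric columns.
        "stats": (
            {"min": min(numbers), "max": max(numbers), "mean": mean(numbers)}
            if is_numeric
            else None
        ),
    }


age = profile_column("age", ["34", "", "29", "41", "29"])
print(age)
```

For the sample `age` column, one of five values is blank, so the summary reports 4 non-null values, 3 unique values, and 20.0% missing.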

Use Cases
  • Initial EDA: First step in exploratory data analysis
  • Data Quality Check: Assess a dataset before analysis
  • Documentation: Generate a data dictionary
  • ETL Validation: Verify data pipeline outputs
  • Schema Discovery: Understand unknown datasets
  • Cleaning Planning: Identify which columns need cleaning