Because big data systems handle very large volumes of data, bad data and data quality issues are likely at every stage of the process. Functional testing should therefore be performed at each of the four phases of big data processing so that errors are caught where they are introduced.
Data Flow validation
- Validating data acquisition against business use cases, and validating data movement across the different layers.
- Testing the data aggregation and data filtering mechanisms.
- End-to-end validation of data and transformation logic against business rules.
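One way to check data movement between two layers is to compare a row count and an order-independent checksum on each side. The sketch below is only an illustration: the `layer_fingerprint` helper, the field names and the sample rows are assumptions, and a real pipeline would read both sides from its actual source and staging stores.

```python
import hashlib


def layer_fingerprint(records, fields):
    """Return (row_count, order-independent checksum) for a list of dict records."""
    checksum = 0
    for rec in records:
        # Canonical text form of the fields expected to survive the hop unchanged.
        canonical = "|".join(str(rec[f]) for f in fields)
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing modulo 2**64 makes the checksum independent of row order.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % (1 << 64)
    return len(records), checksum


# Hypothetical sample data standing in for two layers of a pipeline.
FIELDS = ["id", "amount"]
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]
staged = [{"id": 3, "amount": 5}, {"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
lossy = staged[:2]  # simulates a row dropped in transit

print(layer_fingerprint(source, FIELDS) == layer_fingerprint(staged, FIELDS))  # True
print(layer_fingerprint(source, FIELDS) == layer_fingerprint(lossy, FIELDS))   # False
```

A matching fingerprint only shows that the selected fields arrived intact; transformation logic still needs its own rule-by-rule checks.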
Data Integrity
- Validating data completeness with referential integrity checks, data constraint checks and duplication checks, including error conditions.
- Testing boundary values to identify schema limitations at each layer.
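The integrity checks listed above can be sketched on a small in-memory dataset. Everything here is hypothetical: the `orders` and `customers` tables, their field names, and the rule that `amount` must not be negative are assumptions chosen for illustration.

```python
from collections import Counter


def integrity_report(orders, customers, required=("order_id", "customer_id", "amount")):
    """Run completeness, referential integrity, duplication and constraint checks."""
    known_customers = {c["customer_id"] for c in customers}

    # Completeness: positions of rows with a missing required field.
    incomplete_rows = [
        i for i, o in enumerate(orders) if any(o.get(f) is None for f in required)
    ]
    # Referential integrity: orders that point at a customer that does not exist.
    orphan_orders = [
        o.get("order_id")
        for o in orders
        if o.get("customer_id") is not None and o["customer_id"] not in known_customers
    ]
    # Duplication: order ids that occur more than once.
    counts = Counter(o["order_id"] for o in orders if o.get("order_id") is not None)
    duplicate_ids = sorted(k for k, n in counts.items() if n > 1)
    # Constraint (assumed rule): amounts must not be negative.
    constraint_violations = [
        o.get("order_id")
        for o in orders
        if isinstance(o.get("amount"), (int, float)) and o["amount"] < 0
    ]
    return {
        "incomplete_rows": incomplete_rows,
        "orphan_orders": orphan_orders,
        "duplicate_ids": duplicate_ids,
        "constraint_violations": constraint_violations,
    }


customers = [{"customer_id": "c1"}, {"customer_id": "c2"}]
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 10.0},
    {"order_id": 2, "customer_id": "c9", "amount": 5.0},   # unknown customer
    {"order_id": 2, "customer_id": "c2", "amount": 5.0},   # duplicate order id
    {"order_id": 3, "customer_id": "c2", "amount": None},  # missing amount
    {"order_id": 4, "customer_id": "c1", "amount": -1.0},  # negative amount
]
report = integrity_report(orders, customers)
print(report)
```

At big data scale the same checks would normally run as distributed queries, but the assertions they make are the same.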
Data Ingestion Layer
- Ability to connect to different data models and to replay data through messaging systems, with monitoring for data loss.
- Fault tolerance, continuous availability and connection to different data streams.
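Data loss during ingestion or replay can be monitored by comparing what was produced with what was consumed. The sketch below assumes each message carries a unique id and a contiguous per-partition sequence number; that assumption does not hold for every messaging system, and both helper functions are hypothetical.

```python
def missing_offsets(received):
    """Return sequence numbers absent between the lowest and highest one received."""
    seen = set(received)
    if not seen:
        return []
    return [o for o in range(min(seen), max(seen) + 1) if o not in seen]


def replay_diff(produced_ids, consumed_ids):
    """Compare message ids sent into the pipeline with ids seen after a replay."""
    produced, consumed = set(produced_ids), set(consumed_ids)
    return {
        "lost": sorted(produced - consumed),        # sent but never received
        "unexpected": sorted(consumed - produced),  # received but never sent
    }


print(missing_offsets([0, 1, 2, 4, 5, 7]))           # [3, 6]
print(replay_diff(["a", "b", "c"], ["a", "c", "c"]))  # {'lost': ['b'], 'unexpected': []}
```

Because the comparison uses sets, a message delivered twice is not flagged; duplicate delivery would need a separate count-based check.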
Data Processing Layer
- Business rules validation and validation of the MapReduce process.
- Data Integrity, Data Transformation, Data Aggregation and Consolidation
- Data Processing Performance & Exception Handling.
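A common way to validate an aggregation job is to run it on a small, known input and compare the result with an independently computed expectation. The following is a minimal in-memory sketch of the map, shuffle and reduce steps, not a Hadoop job; the record layout (`region`, `amount`) and the rule that malformed records are rejected rather than failing the run are assumptions.

```python
from collections import defaultdict


def map_phase(record):
    # Emit (key, value) pairs; raises KeyError if a field is missing.
    yield record["region"], record["amount"]


def reduce_phase(key, values):
    # Aggregate all values collected for one key.
    return sum(values)


def run_map_reduce(records):
    """Return (aggregated result, rejected records) for a list of dict records."""
    shuffled = defaultdict(list)
    rejected = []
    for rec in records:
        try:
            pairs = list(map_phase(rec))
        except KeyError:
            rejected.append(rec)  # exception handling: keep bad input for inspection
            continue
        for key, value in pairs:
            shuffled[key].append(value)
    result = {key: reduce_phase(key, values) for key, values in shuffled.items()}
    return result, rejected


records = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 20},
    {"region": "north"},  # malformed: no amount
]
result, rejected = run_map_reduce(records)

# Independent reference computation for the same business rule.
expected = {}
for rec in records:
    if "amount" in rec:
        expected[rec["region"]] = expected.get(rec["region"], 0) + rec["amount"]

print(result == expected, rejected)
```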
Data Storage Layer
- Read & Write Timeouts, Continuous Availability, Load balancing and Query Performance Analysis
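Read and write timeouts can be exercised by running a storage call under a deadline and recording whether it finished in time. This sketch uses only the Python standard library; `fast_read` and `slow_read` are hypothetical stand-ins for real storage operations, and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def completes_within(operation, timeout_s):
    """Return True if operation() finishes within timeout_s seconds, else False.

    Any other exception raised by the operation is propagated to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        future.result(timeout=timeout_s)
        return True
    except FutureTimeout:
        return False
    finally:
        # Do not block on a slow operation; its thread ends when the call returns.
        pool.shutdown(wait=False)


def fast_read():
    return "row"


def slow_read():
    time.sleep(0.3)  # simulates an overloaded or unreachable storage node
    return "row"


print(completes_within(fast_read, 1.0))   # True
print(completes_within(slow_read, 0.05))  # False
```

The helper only observes the deadline; it does not cancel the underlying call, so a real test would also rely on the storage client's own timeout settings.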
Reports Testing
- Validation of measures and dimensions, real-time custom reporting, drill-up and drill-down mechanisms, business reports and charts.
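Drill-up and drill-down behaviour can be checked by confirming that each parent-level measure equals the sum of its child-level values. In this sketch the yearly and quarterly figures are hypothetical numbers standing in for values read from a report, and `inconsistent_parents` is an illustrative helper name.

```python
from collections import defaultdict


def inconsistent_parents(parent_totals, child_totals, tol=1e-9):
    """Return parent keys whose reported total differs from the sum of their children.

    parent_totals: {parent_key: value}
    child_totals:  {(parent_key, child_key): value}
    """
    resummed = defaultdict(float)
    for (parent, _child), value in child_totals.items():
        resummed[parent] += value
    keys = set(parent_totals) | set(resummed)
    return sorted(
        k for k in keys
        if abs(parent_totals.get(k, 0.0) - resummed.get(k, 0.0)) > tol
    )


# Hypothetical report figures: revenue by year, and by (year, quarter).
yearly = {2023: 100.0, 2024: 80.0}
quarterly = {
    (2023, "Q1"): 40.0,
    (2023, "Q2"): 60.0,
    (2024, "Q1"): 50.0,
    (2024, "Q2"): 20.0,  # 2024 quarters sum to 70.0, not the reported 80.0
}
print(inconsistent_parents(yearly, quarterly))  # [2024]
```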