AELTL Data Coherence Rules
This document defines the coherence rules AELTL checks during schema detection, validation, cleansing, and denormalization preflight.
It is intended as the shared reference for product behavior (UI warnings/errors), API outputs (report JSON), and testing expectations.
Rule Contract
Canonical rule IDs live in eltl/lib/contract.js.
- Blocking error types: type_mismatch, api_error, no_such_property
- Non-blocking warning types: ambiguous_date, locale_ambiguous_number, id_like_numeric_risk, email_like, url_like, phone_like, primary_key_missing, primary_key_duplicate, join_key_missing, join_key_type_mismatch, join_coverage_low, join_fanout_unexpected, many_to_many_risk, nested_array_explosion_risk, scd_asof_key_missing, scd_asof_ambiguous_match, join_key_normalization_mismatch
- Rule-warning types: invalid_regex, unsafe_regex, invalid_predicate, invalid_date_parse_rule, date_parse_failed, unsupported_timezone, invalid_number_parse_rule, number_parse_failed, invalid_enum_map_rule, invalid_email_normalize_rule, invalid_url_normalize_rule, invalid_phone_normalize_rule
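As an illustration of how the three categories above could be consumed, here is a minimal sketch of a contract lookup. The names RULE_CONTRACT, categoryOf, and isBlocking are hypothetical; the actual exports of eltl/lib/contract.js are not shown in this document and may look different.

```javascript
// Hypothetical shape for a rule contract; the real eltl/lib/contract.js may differ.
const RULE_CONTRACT = Object.freeze({
  blocking: ["type_mismatch", "api_error", "no_such_property"],
  warning: [
    "ambiguous_date", "locale_ambiguous_number", "id_like_numeric_risk",
    "email_like", "url_like", "phone_like",
    "primary_key_missing", "primary_key_duplicate",
    "join_key_missing", "join_key_type_mismatch", "join_coverage_low",
    "join_fanout_unexpected", "many_to_many_risk", "nested_array_explosion_risk",
    "scd_asof_key_missing", "scd_asof_ambiguous_match",
    "join_key_normalization_mismatch",
  ],
  ruleWarning: [
    "invalid_regex", "unsafe_regex", "invalid_predicate",
    "invalid_date_parse_rule", "date_parse_failed", "unsupported_timezone",
    "invalid_number_parse_rule", "number_parse_failed", "invalid_enum_map_rule",
    "invalid_email_normalize_rule", "invalid_url_normalize_rule",
    "invalid_phone_normalize_rule",
  ],
});

// Build a reverse index (rule ID -> category) once so lookups are constant time.
const CATEGORY_BY_ID = new Map();
for (const [category, ids] of Object.entries(RULE_CONTRACT)) {
  for (const id of ids) CATEGORY_BY_ID.set(id, category);
}

// Returns "blocking", "warning", "ruleWarning", or null for an unknown ID.
function categoryOf(ruleId) {
  return CATEGORY_BY_ID.has(ruleId) ? CATEGORY_BY_ID.get(ruleId) : null;
}

function isBlocking(ruleId) {
  return categoryOf(ruleId) === "blocking";
}
```

Keeping the IDs in one frozen structure means UI, API, and tests can all derive severity from the same source instead of repeating string lists.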
Cohesion and Coherence Checks
1) Validation Errors (blocking)
Source: eltl/lib/batch-loader.js
- type_mismatch: field value does not match declared schema type.
- no_such_property: input has fields not present in target schema.
- api_error: load/publish pipeline returned API-level failure.
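The two row-level blocking errors can be sketched as follows. This is not the code in eltl/lib/batch-loader.js; validateRow and the schema shape (field name mapped to a typeof string) are assumptions made for illustration, as is the choice to let null pass type checks.

```javascript
// Hypothetical validator. `schema` maps field name -> "string" | "number" | "boolean".
function validateRow(row, schema) {
  const errors = [];
  for (const [field, value] of Object.entries(row)) {
    // Field exists in the input but not in the target schema.
    if (!Object.prototype.hasOwnProperty.call(schema, field)) {
      errors.push({ type: "no_such_property", field });
      continue;
    }
    // Assumption: null is treated as "missing" and is not a type mismatch.
    if (value !== null && typeof value !== schema[field]) {
      errors.push({
        type: "type_mismatch",
        field,
        expected: schema[field],
        actual: typeof value,
      });
    }
  }
  return errors;
}
```

A row such as `{ id: "7", nickname: "x" }` checked against `{ id: "number" }` would yield one type_mismatch for id and one no_such_property for nickname. api_error is not modeled here because it originates from the load/publish call rather than from row contents.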
2) Cleansing Warnings (non-blocking)
Source: eltl/lib/cleanse.js
Sampling behavior:
- Warning detection samples the first 2,000 rows by default.
- Values are normalized conservatively first (trim, null-like handling, zero-width/control cleanup).
Rules:
- ambiguous_date: slash dates are ambiguous in date columns. Threshold: ambiguous ratio >= 20%.
- locale_ambiguous_number: mixed US/EU formats in numeric columns. Threshold: both styles present and at least 3 locale-formatted values.
- id_like_numeric_risk: numeric columns appear identifier-like. Threshold: leading-zero ratio >= 20% or 16+ digit values.
- email_like, url_like, phone_like: string columns strongly resemble these classes. Threshold: at least 3 matches and >= 70% sampled match rate.
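The thresholds above can be sketched for two of the rules. This is a simplified illustration, not the logic in eltl/lib/cleanse.js. Assumptions: a slash date counts as ambiguous when both of its first two parts fall in 1..12, ratios are computed over non-empty sampled strings, and the email pattern is a deliberately loose check. All function names are hypothetical.

```javascript
const SAMPLE_LIMIT = 2000; // default sample size stated in this document

// Take the first SAMPLE_LIMIT rows, trim strings, and drop empty/non-string values.
function sampleStrings(values) {
  return values
    .slice(0, SAMPLE_LIMIT)
    .map((v) => (typeof v === "string" ? v.trim() : v))
    .filter((v) => typeof v === "string" && v !== "");
}

// Assumption: ambiguous means day and month cannot be told apart (both parts <= 12).
function isAmbiguousSlashDate(value) {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{2,4})$/.exec(value);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return a >= 1 && a <= 12 && b >= 1 && b <= 12;
}

function detectAmbiguousDate(values) {
  const sample = sampleStrings(values);
  if (sample.length === 0) return null;
  const ratio = sample.filter(isAmbiguousSlashDate).length / sample.length;
  return ratio >= 0.2 ? { type: "ambiguous_date", ratio } : null;
}

// Loose email shape check: something@something.something with no whitespace.
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function detectEmailLike(values) {
  const sample = sampleStrings(values);
  if (sample.length === 0) return null;
  const matches = sample.filter((v) => EMAIL_SHAPE.test(v)).length;
  const rate = matches / sample.length;
  // Both conditions must hold: at least 3 matches and >= 70% match rate.
  return matches >= 3 && rate >= 0.7 ? { type: "email_like", matches, rate } : null;
}
```

Requiring a minimum match count alongside the rate keeps very small columns (for example, two rows that both happen to look like emails) from triggering a warning.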
3) Denormalization Preflight Coherence Warnings
Source: eltl/lib/denorm-preflight-report.js
Primary key integrity:
- primary_key_missing
- primary_key_duplicate
Join-key and relationship integrity:
- join_key_missing
- join_key_type_mismatch
- join_coverage_low (inner join default minimum hit-rate 95%, left join default minimum hit-rate 50%)
- join_fanout_unexpected (declared 1:1 has multi-match)
- many_to_many_risk (duplicates on both sides)
- nested_array_explosion_risk (default thresholds: p99 fanout > 200 or max fanout > 2000)
- scd_asof_key_missing
- scd_asof_ambiguous_match
- join_key_normalization_mismatch (triggered when key normalization materially alters join behavior; defaults: key-changed rate >= 1% or normalized-vs-raw hit-rate delta >= 5 points)
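To make the coverage and fanout defaults concrete, here is a minimal sketch of a preflight check over two key arrays. It is not the implementation in eltl/lib/denorm-preflight-report.js. Assumptions: hit-rate is the fraction of left rows with at least one right-side match, p99 uses the nearest-rank method, and the function name and option names are hypothetical.

```javascript
function joinPreflight(leftKeys, rightKeys, options = {}) {
  const joinType = options.joinType || "inner";
  // Defaults from this document: 95% for inner joins, 50% for left joins.
  const defaultMin = joinType === "left" ? 0.5 : 0.95;
  const minHitRate = options.minHitRate === undefined ? defaultMin : options.minHitRate;

  // Count how many right-side rows share each key.
  const rightCounts = new Map();
  for (const k of rightKeys) rightCounts.set(k, (rightCounts.get(k) || 0) + 1);

  // Fanout of a left row = number of right rows it would join to.
  const fanouts = leftKeys.map((k) => rightCounts.get(k) || 0);
  const hits = fanouts.filter((n) => n > 0).length;
  const hitRate = leftKeys.length === 0 ? 1 : hits / leftKeys.length;

  // Nearest-rank percentile over the sorted fanout values.
  const sorted = [...fanouts].sort((a, b) => a - b);
  const rank = Math.min(sorted.length - 1, Math.ceil(0.99 * sorted.length) - 1);
  const p99 = sorted.length ? sorted[rank] : 0;
  const max = sorted.length ? sorted[sorted.length - 1] : 0;

  const warnings = [];
  if (hitRate < minHitRate) warnings.push("join_coverage_low");
  if (p99 > 200 || max > 2000) warnings.push("nested_array_explosion_risk");
  return { hitRate, p99, max, warnings };
}
```

With these defaults, a join in which half of the left keys find a match raises join_coverage_low for an inner join but not for a left join, since 50% meets the left-join minimum.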
4) Rule Safety and Parse-Failure Warnings
Source: eltl/lib/batch-loader.js
These warnings prevent unsafe or low-confidence auto-fixes from silently mutating data.
- Date parse safeguards: invalid format/timezone and unparseable values generate warnings and skip unsafe rules.
- Number parse safeguards: invalid locale (US/EU), parse failures, non-finite values, and integer violations generate warnings.
- Regex safeguards: invalid regex and unsafe regex (ReDoS guardrail) generate warnings and skip unsafe rules.
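The regex and number safeguards above can be sketched like this. Both functions are hypothetical and are not the code in eltl/lib/batch-loader.js. The nested-quantifier test is only a rough heuristic for ReDoS-prone patterns, and the mapping of each failure to a specific warning ID (for example, integer violations reported as number_parse_failed) is an assumption.

```javascript
// Heuristic only: flags a quantified group that itself contains a quantifier, e.g. (a+)+
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function checkRegexRule(pattern) {
  let regex;
  try {
    regex = new RegExp(pattern); // throws SyntaxError on an invalid pattern
  } catch (err) {
    return { ok: false, warning: "invalid_regex", detail: err.message };
  }
  if (NESTED_QUANTIFIER.test(pattern)) {
    return { ok: false, warning: "unsafe_regex" };
  }
  return { ok: true, regex };
}

function parseNumberRule(text, rule) {
  if (rule.locale !== "US" && rule.locale !== "EU") {
    return { ok: false, warning: "invalid_number_parse_rule" };
  }
  // US: "," groups thousands, "." is decimal. EU: "." groups thousands, "," is decimal.
  const cleaned =
    rule.locale === "US"
      ? text.trim().replace(/,/g, "")
      : text.trim().replace(/\./g, "").replace(",", ".");
  if (!/^[+-]?\d+(\.\d+)?$/.test(cleaned)) {
    return { ok: false, warning: "number_parse_failed" };
  }
  const value = Number(cleaned);
  // Reject non-finite results and, when requested, non-integer results.
  if (!Number.isFinite(value) || (rule.integer && !Number.isInteger(value))) {
    return { ok: false, warning: "number_parse_failed" };
  }
  return { ok: true, value };
}
```

In both cases the caller gets a warning instead of a mutated value, so a rule that cannot be applied safely is skipped rather than partially applied.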
Expected User Experience
- Blocking errors are surfaced in Work Queue and prevent affected rows from loading.
- Non-blocking warnings are surfaced as guidance and suggested transforms.
- Denorm preflight warnings are shown before materialization and can emit suggested spec/rule edits.
- Rule warnings are shown when a user-provided transform cannot be safely applied.
Remediation Priority
- Resolve schema blockers (no_such_property, hard type_mismatch categories).
- Resolve locale/date ambiguity before broad casts (date_parse, number_parse).
- Resolve join integrity issues before denorm materialization.
- Apply normalization improvements (email/url/phone) for long-term consistency.
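One way to apply this ordering in a report or UI is to rank findings by tier before display. The sketch below is illustrative only: the assignment of specific rule IDs to each tier is an assumption based on the list above, and PRIORITY_TIERS, remediationRank, and sortFindings are hypothetical names.

```javascript
// Assumed mapping of rule IDs to the four remediation tiers, highest priority first.
const PRIORITY_TIERS = [
  ["no_such_property", "type_mismatch"],
  ["ambiguous_date", "locale_ambiguous_number", "date_parse_failed", "number_parse_failed"],
  [
    "primary_key_missing", "primary_key_duplicate", "join_key_missing",
    "join_key_type_mismatch", "join_coverage_low", "join_fanout_unexpected",
    "many_to_many_risk", "nested_array_explosion_risk",
    "join_key_normalization_mismatch",
  ],
  ["email_like", "url_like", "phone_like"],
];

// Unlisted types sort after every known tier.
function remediationRank(type) {
  const index = PRIORITY_TIERS.findIndex((tier) => tier.includes(type));
  return index === -1 ? PRIORITY_TIERS.length : index;
}

// Returns a new array; the input is not mutated.
function sortFindings(findings) {
  return [...findings].sort((a, b) => remediationRank(a.type) - remediationRank(b.type));
}
```

Because Array.prototype.sort is stable in current JavaScript engines, findings within the same tier keep their original relative order.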
Change Process
- Update IDs in eltl/lib/contract.js.
- Implement detector/validator logic in the relevant module.
- Add/adjust tests in eltl/tests.
- Update this document so product, engineering, and QA stay aligned.