Enterprise Data Governance with PostgreSQL
Building GDPR-Compliant Systems That Actually Work
After working on data governance at Veeva Systems, I've learned that most companies approach GDPR compliance backwards. They start with the legal requirements and try to retrofit their systems.
That's like building a house and then trying to add a foundation.
The Foundation: Data Classification
The foundation of any governance framework is understanding what data you have and where it lives. But here's the thing - most organizations think they know their data better than they actually do.
// Example data classification system
const dataClassification = {
  personalData: {
    identifiers: ['email', 'phone', 'ssn'],
    sensitive: ['health', 'financial', 'biometric'],
    categories: ['customer', 'employee', 'vendor']
  },
  businessData: {
    confidential: ['trade_secrets', 'financials'],
    internal: ['processes', 'policies'],
    public: ['marketing', 'press_releases']
  }
};

The key insight: Start simple, but be comprehensive. You'll discover data you didn't know existed.
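To make "start simple, but be comprehensive" concrete, here is a minimal sketch of a lookup that tags field names against a taxonomy shaped like the object above. The names `taxonomy`, `classifyField`, and `inventory` are hypothetical, and the `needs_review` fallback is an assumption: the idea is that unknown fields get surfaced for a human to look at instead of being treated as safe.

```javascript
// Hypothetical taxonomy mirroring the shape of the classification object above.
// The subject categories (customer, employee, vendor) are left out because
// they describe people, not field names.
const taxonomy = {
  personalData: {
    identifiers: ['email', 'phone', 'ssn'],
    sensitive: ['health', 'financial', 'biometric'],
  },
  businessData: {
    confidential: ['trade_secrets', 'financials'],
    internal: ['processes', 'policies'],
    public: ['marketing', 'press_releases'],
  },
};

// Return the domain and tier for a field name, matching case-insensitively.
function classifyField(fieldName, rules = taxonomy) {
  const needle = fieldName.toLowerCase();
  for (const [domain, tiers] of Object.entries(rules)) {
    for (const [tier, names] of Object.entries(tiers)) {
      if (names.includes(needle)) {
        return { domain, tier };
      }
    }
  }
  // Assumption: anything unrecognized is flagged for review, not ignored.
  return { domain: 'unclassified', tier: 'needs_review' };
}

// Classify a whole list of field names, e.g. the columns of one table.
function inventory(fieldNames, rules = taxonomy) {
  return fieldNames.map((field) => ({ field, ...classifyField(field, rules) }));
}
```

Running `inventory` over the column names of each table gives a first-pass data map, and the `needs_review` entries are where the data you didn't know existed tends to show up.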
Access Control: The Real Challenge
Role-based access control (RBAC) sounds simple until you try to implement it at enterprise scale. Here's what I learned:
-- Example access control implementation
CREATE TABLE data_access_permissions (
    id SERIAL PRIMARY KEY,
    user_id UUID REFERENCES users(id),       -- who holds the grant
    resource_type VARCHAR(50) NOT NULL,
    resource_id UUID NOT NULL,
    permission_level VARCHAR(20) NOT NULL    -- read, write, delete
        CHECK (permission_level IN ('read', 'write', 'delete')),
    granted_by UUID REFERENCES users(id),    -- who approved the grant
    granted_at TIMESTAMPTZ DEFAULT NOW(),    -- TIMESTAMPTZ avoids time zone ambiguity
    expires_at TIMESTAMPTZ,
    reason TEXT,
    success BOOLEAN
);

The reality: People will find ways to access data they shouldn't. Your audit trail is your lifeline when things go wrong.
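As a rough illustration of how rows from a table like this might be evaluated, the sketch below checks a request against a list of grants. It assumes the rows have already been loaded into plain objects with the same column names, that `read < write < delete` forms an ordering (the post only lists the three levels, so the ranking is my assumption), and that a missing `expires_at` means the grant does not expire. `LEVEL_RANK` and `hasPermission` are hypothetical names.

```javascript
// Assumption: higher levels imply lower ones (delete > write > read).
const LEVEL_RANK = { read: 1, write: 2, delete: 3 };

// Return true if any unexpired grant covers the requested level on the resource.
function hasPermission(grants, request, now = new Date()) {
  const needed = LEVEL_RANK[request.level];
  if (needed === undefined) {
    return false; // unknown level: deny by default
  }
  return grants.some((grant) =>
    grant.user_id === request.userId &&
    grant.resource_type === request.resourceType &&
    grant.resource_id === request.resourceId &&
    LEVEL_RANK[grant.permission_level] >= needed &&
    // Assumption: null/undefined expires_at means the grant has no expiry.
    (grant.expires_at == null || grant.expires_at > now)
  );
}
```

Denying on anything unrecognized keeps the failure mode conservative. Whatever the check returns, the decision itself is worth writing to the audit trail.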
Compliance Monitoring: Automation is Key
Manual compliance checking doesn't scale. You need automation, but it has to be smart automation:
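// Example compliance monitoring system
class ComplianceMonitor {
  constructor(database) {
    this.db = database;
    this.rules = this.loadComplianceRules();
  }

  // Placeholder: a real system would load rules from config or a rules table
  loadComplianceRules() {
    return [];
  }

  async checkDataRetention() {
    // Find records past their retention date that are still awaiting deletion
    const expiredData = await this.db.query(`
      SELECT * FROM personal_data
      WHERE retention_date < NOW()
      AND deletion_status = 'pending'
    `);
    return this.processExpiredData(expiredData);
  }

  // Placeholder: delete or anonymize the expired records and log the outcome
  async processExpiredData(expiredData) {
    return expiredData;
  }

  async auditAccess() {
    // Check for suspicious access patterns:
    // any user with more than 100 accesses in the last hour
    const suspiciousAccess = await this.db.query(`
      SELECT user_id, COUNT(*) as access_count
      FROM access_logs
      WHERE created_at > NOW() - INTERVAL '1 hour'
      GROUP BY user_id
      HAVING COUNT(*) > 100
    `);
    return suspiciousAccess;
  }
}

The key insight: Automation catches things humans miss, but humans catch things automation can't understand. You need both.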
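One way to keep a rule like the access-count check reviewable is to mirror it in plain code that can be unit tested without a database. The sketch below is an in-memory equivalent of the `auditAccess` query, assuming log entries are objects with `user_id` and `created_at`. The function name and the `options` parameters are hypothetical, with defaults copied from the query (one hour, more than 100 accesses).

```javascript
// In-memory equivalent of the access-count rule, for testing thresholds.
function findSuspiciousAccess(accessLogs, options = {}) {
  const { windowMs = 60 * 60 * 1000, threshold = 100, now = Date.now() } = options;
  const cutoff = now - windowMs;

  // Count accesses per user inside the time window.
  const counts = new Map();
  for (const entry of accessLogs) {
    const timestamp = new Date(entry.created_at).getTime();
    if (timestamp > cutoff) {
      counts.set(entry.user_id, (counts.get(entry.user_id) || 0) + 1);
    }
  }

  // Keep only users strictly above the threshold, like HAVING COUNT(*) > 100.
  return [...counts.entries()]
    .filter(([, count]) => count > threshold)
    .map(([user_id, access_count]) => ({ user_id, access_count }));
}
```

The output is a list for a person to look at, not a list of accounts to lock automatically. A batch job and an attacker can produce the same count, and telling them apart is the part automation can't do.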
PostgreSQL-Specific Implementation
PostgreSQL provides excellent tools for implementing data governance, but you have to use them right:
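-- Row-level security for data isolation
-- Note: the table owner bypasses RLS unless FORCE ROW LEVEL SECURITY is also set
ALTER TABLE patient_data ENABLE ROW LEVEL SECURITY;

CREATE POLICY patient_data_policy ON patient_data
    FOR ALL TO authenticated_users
    USING (
        -- The second argument (missing_ok) makes an unset setting return NULL
        -- instead of raising an error, so no rows match
        user_id = current_setting('app.current_user_id', true)::uuid
        OR
        current_setting('app.user_role', true) = 'admin'
    );

-- Audit trigger
CREATE OR REPLACE FUNCTION audit_trigger()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO audit_log (
        table_name,
        operation,
        old_data,
        new_data,
        user_id,
        timestamp
    ) VALUES (
        TG_TABLE_NAME,
        TG_OP,
        row_to_json(OLD),
        row_to_json(NEW),
        current_setting('app.current_user_id'),
        NOW()
    );
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Attach the function to the table (trigger name is illustrative;
-- EXECUTE FUNCTION is PostgreSQL 11+ syntax, older versions use EXECUTE PROCEDURE)
CREATE TRIGGER patient_data_audit
    AFTER INSERT OR UPDATE OR DELETE ON patient_data
    FOR EACH ROW EXECUTE FUNCTION audit_trigger();

Why this works: Database-level controls are harder to bypass than application-level controls. Start here.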
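The policy and the trigger both read `app.current_user_id` and `app.user_role`, so something has to set them on every connection before any query runs. Here is a minimal sketch of one way to do that from the application, assuming a client object with a `query(text, params)` method in the style of node-postgres. `withUserContext` is a hypothetical helper. The only PostgreSQL call relied on is `set_config(name, value, is_local)`, where `is_local = true` scopes the value to the current transaction.

```javascript
// Run `work` inside a transaction that carries the caller's identity.
// Assumes `client` exposes query(text, params) and returns a promise.
async function withUserContext(client, user, work) {
  await client.query('BEGIN');
  try {
    // is_local = true: the settings vanish at COMMIT or ROLLBACK, so they
    // cannot leak to the next request that reuses a pooled connection.
    await client.query("SELECT set_config('app.current_user_id', $1, true)", [user.id]);
    await client.query("SELECT set_config('app.user_role', $1, true)", [user.role]);
    const result = await work(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

Transaction-scoped settings matter most with connection pooling: a session-level `SET` would stay on the connection and could hand one user's identity to the next request.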
Application-Level Controls
But you also need application-level controls for flexibility:
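// Example TypeScript implementation
// Placeholder types so the example compiles; replace with real definitions
type Role = unknown;
type Permission = unknown;
type Policy = unknown;
type RetentionPolicy = unknown;
type ConsentRule = unknown;
type AuditRule = unknown;
type DataClassification = unknown;
type AuditAction = unknown;

interface DataGovernanceConfig {
  dataClassification: {
    personalData: string[];
    sensitiveData: string[];
    businessCritical: string[];
  };
  accessControls: {
    roles: Role[];
    permissions: Permission[];
    policies: Policy[];
  };
  complianceRules: {
    retentionPolicies: RetentionPolicy[];
    consentManagement: ConsentRule[];
    auditRequirements: AuditRule[];
  };
}

class DataGovernanceService {
  // Dependencies are injected so each piece can be swapped or mocked in tests
  constructor(
    private readonly classifier: { classify(data: unknown): Promise<DataClassification> },
    private readonly accessControl: { checkPermission(userId: string, resourceId: string): Promise<boolean> },
    private readonly auditLogger: { log(action: AuditAction): Promise<void> },
  ) {}

  async classifyData(data: unknown): Promise<DataClassification> {
    // Implement classification logic
    return this.classifier.classify(data);
  }

  async checkAccess(userId: string, resourceId: string): Promise<boolean> {
    // Implement access control logic
    return this.accessControl.checkPermission(userId, resourceId);
  }

  async auditAction(action: AuditAction): Promise<void> {
    // Implement audit logging
    await this.auditLogger.log(action);
  }
}

What I Learned (The Hard Way)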
- Data discovery is harder than you think: You'll find data in places you never expected
- User behavior changes: People adapt to restrictions in unexpected ways
- Performance matters: Governance controls can kill performance if not designed carefully
- Documentation is everything: Compliance auditors need to understand your system
The Bottom Line
Data governance isn't just about compliance - it's about building systems that can scale safely. PostgreSQL gives you the tools, but you need the right architecture and processes.
Start with classification, implement proper access controls, and automate compliance monitoring. Your future self (and your auditors) will thank you.
Lessons Learned
- Data classification is the foundation of everything
- Database-level controls are harder to bypass than application-level controls
- Automation is essential but human oversight is critical
- Performance testing is crucial for governance systems
- Documentation saves you during audits