Manage Data Masking with Terraform
This tutorial is part of the Manage Bytebase with Terraform series:
- Part 1: Manage Databases with Terraform - Set up instances and environments
- Part 2: Manage Projects with Terraform - Organize databases into projects
- Part 3: Manage Bytebase Settings with Terraform - Configure workspace settings, environment policies, approval flows, and risk management
- Part 4: Manage SQL Review Rules with Terraform - Set up SQL review policies
- Part 5: Manage Database Access Control with Terraform - Set up access controls and permissions
- Part 6: Manage Data Masking with Terraform (This one) - Configure data masking policies
📚 Complete tutorial Terraform files on GitHub
File Structure: This tutorial series uses separate Terraform files for better organization. Files are numbered by tutorial part (e.g., 1-instances.tf for Part 1, 2-projects.tf for Part 2, etc.). Terraform automatically handles dependencies between files.
Learn how to protect sensitive data with masking policies using Terraform and the Terraform Bytebase Provider.
The Bytebase Terraform Provider handles control plane configuration such as settings, policies, and access controls. It does not handle data plane operations such as database creation, schema migrations, DML execution, or queries.
What You’ll Learn
- Define semantic types with various masking algorithms
- Configure data classification levels and categories
- Create global masking policies that apply workspace-wide
- Set up database-specific column masking
- Grant masking exceptions for specific users
Prerequisites
Before starting this tutorial, ensure you have:
- Completed Part 5: Manage Database Access Control with Terraform
- Bytebase running with ngrok and service account configured
- Your Terraform files from the previous tutorials
Setup
From the previous tutorials, you should have:
- Database instances and projects configured
- Users and access controls set up
- Production database hr_prod with employee data
Define Masking Methods
You can define masking methods using semantic types, or classification levels with semantic types. These definitions determine how data will be masked when policies are applied.
Important Relationship: Classifications define sensitivity levels, but require mapping to semantic types for actual masking. Semantic types define the masking algorithms that perform the actual data protection.
Option A - Define Semantic Types
Create 6-1-semantic-types.tf with semantic types that define how data should be masked:
Apply and Verify Semantic Types
Verify in Bytebase: Click Data Access > Semantic Types on the left sidebar. You should see three masking types configured.
Option B - Set Up Data Classification
Create 6-2-classification.tf with a classification hierarchy for sensitive data:
Apply and Verify Classification
Verify in Bytebase: Click Data Access > Data Classification on the left sidebar. You should see the classification hierarchy with two levels. Note that Level 2 is marked as more sensitive.
Apply Masking Policies
Once you’ve defined your masking methods (semantic types and/or classification), you can apply them using global policies, column-specific configuration, or both.
Important: Classification levels must be mapped to semantic types to perform actual masking. Classification defines the sensitivity level, while semantic types define the masking algorithm.
Option 1 - Apply Global Masking Policy
Create 6-3-global-data-masking.tf with workspace-wide masking rules that automatically apply based on column names or classification levels. Notice how classification levels are mapped to semantic types:
Apply and Verify Global Policy
Verify in Bytebase:
Click Data Access > Global Masking. You should see the global policy with three conditions with corresponding semantic types.
Log in as Developer 1 (dev1@example.com), then go to SQL Editor to access hr_prod. Double-click the employee table on the left. birth_date has the Date year mask semantic type, and last_name has the Name first letter only semantic type.
Option 2 - Apply Column-Specific Masking
Create 6-4-database-masking.tf to apply semantic types or classifications directly to specific database columns:
- Column from_date is assigned the semantic type date-year-mask
- Column amount is assigned the classification 2-1 (Employment info)
Apply and Verify Column-Specific Masking
Verify in Bytebase:
- Go into Project Two, then click Database > Databases and click hr_prod.
- Scroll down to find the salary table and click it. You should see that amount is assigned the Employment info (Level 2) classification and from_date is assigned the date-year-mask semantic type.
- Log in as Developer 1 (dev1@example.com), then go to SQL Editor to access hr_prod. Double-click the salary table on the left. from_date has the Date year mask semantic type, and amount has the L2 classification, which leads to the Full masking semantic type.
Grant Masking Exceptions (Optional)
Create 6-5-masking-exception.tf to allow specific users to bypass masking for certain operations:
- Workspace Admin (admin@example.com) has Masking Exemptions for birth_date in table employee for Query
- Workspace Admin (admin@example.com) has Masking Exemptions for last_name in table employee for Export
Apply Masking Exceptions
Verify in Bytebase:
- Log in as Workspace Admin (admin@example.com), then go to SQL Editor to access hr_prod. Double-click the employee table on the left. Notice that birth_date is no longer masked.
- Click Export, then open the file. Notice that birth_date is still masked while last_name is no longer masked.
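The per-action behavior verified above can be pictured as a lookup keyed on user, table, column, and action. The following is only a conceptual sketch in Python, not Bytebase's implementation: the `Exemption` class, the `EXEMPTIONS` set, the `is_masked` helper, and the `"QUERY"`/`"EXPORT"` labels are all hypothetical names, and the sketch assumes every listed column is already masked by a policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemption:
    """Hypothetical record: one user may see one column unmasked for one action."""
    member: str
    table: str
    column: str
    action: str  # "QUERY" or "EXPORT" (labels assumed for illustration)


# The two exemptions described in this tutorial.
EXEMPTIONS = {
    Exemption("admin@example.com", "employee", "birth_date", "QUERY"),
    Exemption("admin@example.com", "employee", "last_name", "EXPORT"),
}


def is_masked(member: str, table: str, column: str, action: str) -> bool:
    """Return True when the (already masked) column stays masked for this user and action."""
    return Exemption(member, table, column, action) not in EXEMPTIONS


if __name__ == "__main__":
    for column in ("birth_date", "last_name"):
        for action in ("QUERY", "EXPORT"):
            masked = is_masked("admin@example.com", "employee", column, action)
            print(f"{column:<10} {action:<6} masked={masked}")
```

Because each exemption names a single action, an exemption for Query does not carry over to Export, which matches the export file still showing birth_date masked.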
Explanation of Used Masking Algorithms
1. Full Mask
- Replaces entire value with substitution string
- Example: “John Doe” → “***”
2. Range Mask
- Masks specific character ranges
- Example: “2024-03-15” → “****-03-15”
3. Inner/Outer Mask
- Preserves prefix/suffix while masking the middle
- Example: “Johnson” → “J******” (the first letter is kept and no suffix is preserved)
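To make the three algorithms concrete, here is a minimal Python sketch that reproduces the examples above. It is an illustration only, not Bytebase's implementation: the function names and parameters (`start`, `end`, `prefix_len`, `suffix_len`, `substitution`) are assumptions, and the sketch assumes one substitution character per masked character for the range and inner masks.

```python
def full_mask(value: str, substitution: str = "***") -> str:
    """Replace the entire value with a fixed substitution string."""
    return substitution


def range_mask(value: str, start: int, end: int, substitution: str = "*") -> str:
    """Mask characters in the half-open range [start, end), one substitution per character."""
    start = max(0, min(start, len(value)))
    end = max(start, min(end, len(value)))
    return value[:start] + substitution * (end - start) + value[end:]


def inner_mask(value: str, prefix_len: int, suffix_len: int, substitution: str = "*") -> str:
    """Keep prefix_len leading and suffix_len trailing characters; mask everything between."""
    n = len(value)
    prefix_len = max(0, min(prefix_len, n))
    suffix_len = max(0, min(suffix_len, n - prefix_len))
    masked_count = n - prefix_len - suffix_len
    return value[:prefix_len] + substitution * masked_count + value[n - suffix_len:]


if __name__ == "__main__":
    print(full_mask("John Doe"))             # whole value replaced
    print(range_mask("2024-03-15", 0, 4))    # year digits masked
    print(inner_mask("Johnson", 1, 0))       # first letter kept
```

The range and prefix/suffix lengths are clamped to the value's length so that short inputs never raise an error or expose extra characters.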
Key Points
🔑 Critical Relationship - Classifications MUST Map to Semantic Types:
- Classifications define sensitivity levels (Level 1, Level 2, etc.) but cannot mask data by themselves
- Semantic Types define the actual masking algorithms (full-mask, range-mask, etc.)
- You must map classifications to semantic types for masking to occur (e.g., Level 2 → full-mask)
- Direct semantic type assignment can bypass classification entirely
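The relationship in the points above can be sketched as a small resolution function. This is a conceptual Python illustration under stated assumptions, not Bytebase's actual logic: `LEVEL_TO_SEMANTIC_TYPE` and `effective_semantic_type` are hypothetical names, the mapping holds only the Level 2 → full-mask example from this tutorial, and giving a directly assigned semantic type precedence over a classification is an assumption made for the sketch.

```python
from typing import Optional

# Hypothetical mapping from classification level to semantic type ID.
# Only Level 2 is mapped here, mirroring the "Level 2 -> full-mask" example.
LEVEL_TO_SEMANTIC_TYPE = {2: "full-mask"}


def effective_semantic_type(
    column_semantic_type: Optional[str],
    classification_level: Optional[int],
) -> Optional[str]:
    """Return the semantic type that would mask a column, or None if nothing masks it."""
    # A semantic type assigned directly to the column needs no classification.
    if column_semantic_type:
        return column_semantic_type
    # A classification masks data only if its level is mapped to a semantic type.
    if classification_level is not None:
        return LEVEL_TO_SEMANTIC_TYPE.get(classification_level)
    return None


if __name__ == "__main__":
    print(effective_semantic_type("date-year-mask", None))  # direct assignment
    print(effective_semantic_type(None, 2))                 # mapped classification
    print(effective_semantic_type(None, 1))                 # unmapped level: no masking
```

An unmapped level returns None, which reflects the point that a classification on its own does not mask anything.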
Define Phase (choose one or both):
- Semantic Types: Define reusable masking algorithms
- Classification: Organize data by sensitivity levels (must be mapped to semantic types for masking)
Apply Phase (choose one or both):
- Global Policies: Apply masking rules workspace-wide based on conditions
- Column-Level Masking: Apply semantic types or classifications to specific columns
Additional Control:
- Exceptions: Grant bypass permissions for specific users and actions
Next Steps
Congratulations! You’ve completed the Bytebase Terraform tutorial series. You now have a fully configured Bytebase instance with:
- Database instances and environments
- Organized projects
- Approval workflows and risk policies
- SQL review rules for schema standards
- User access controls
- Data masking for sensitive information
Resources: