Analytics API Security
The Analytics API (`/v2/analytics.getVerifications`) allows workspace users to query their verification data using SQL. This powerful feature requires multiple layers of security to prevent abuse and ensure data isolation.
Security Model
Multi-Layer Defense
We implement security at three levels:
- API Level: Query parsing, validation, and rewriting
- RBAC Level: Permission-based access control
- ClickHouse Level: Per-workspace users with quotas and resource limits
This defense-in-depth approach ensures that even if one layer is bypassed, the others still protect the system.
API Level Security
Query Parser (`pkg/clickhouse/query-parser`)
The query parser is responsible for validating, rewriting, and securing user-submitted SQL queries before they reach ClickHouse.
What it does:
- Parse SQL: Uses `github.com/AfterShip/clickhouse-sql-parser` to parse the query into an AST
- Validate query type: Only SELECT queries are allowed (no INSERT, UPDATE, DELETE, DROP, etc.)
- Enforce workspace isolation: Automatically injects `WHERE workspace_id = 'ws_xxx'` into every query
- Validate table access: Only allows queries against pre-approved tables
- Enforce limits: Adds a `LIMIT` clause if not present, capped at a configured maximum
- Validate functions: Blocks dangerous or expensive functions
Example transformation:
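A sketch of the rewrite the parser performs before a query reaches ClickHouse (table name and workspace ID illustrative):

```sql
-- Submitted by the user:
SELECT outcome, count(*) FROM key_verifications_per_hour_v1 GROUP BY outcome;

-- Rewritten by the parser: workspace isolation and a row cap are injected.
SELECT outcome, count(*)
FROM key_verifications_per_hour_v1
WHERE workspace_id = 'ws_xxx'
GROUP BY outcome
LIMIT 10000;
```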
Direct Column Access
Users query ClickHouse tables directly using the actual column names. The schema exposes:
- `key_space_id`: The API's KeyAuth ID (e.g., `ks_1234`)
- `identity_id`: Internal identity identifier
- `external_id`: User-provided external identifier
- `key_id`: Individual key identifier
Users can find their key_space_id in the API settings in the dashboard.
No ID transformation is performed - users query with the actual IDs stored in ClickHouse, and results contain those same IDs.
Table Aliases
Users query against friendly table names that map to actual ClickHouse tables:
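The actual allow-list lives in the parser; as a sketch of the idea (the alias and underlying table name below are illustrative, not the real mapping):

```sql
-- User queries a friendly alias:
SELECT count(*) FROM verifications;

-- The parser resolves the alias to the underlying ClickHouse table:
SELECT count(*) FROM default.key_verifications_raw_v1
WHERE workspace_id = 'ws_xxx' LIMIT 10000;
```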
Limits Enforcement
Multiple limits protect against resource exhaustion:
- Query result rows: at most 10,000 rows returned
- Memory usage: a configured maximum amount of memory per query
- Execution time: a configured maximum number of seconds per query
These are enforced both at the parser level and at the ClickHouse user level.
RBAC Level Security
Permission Model
Access to analytics requires one of these permissions:
- `analytics.read`: Workspace-level access to all analytics
- `api.*.read_analytics`: Wildcard access to analytics for all APIs
- `api.<api_id>.read_analytics`: Per-API analytics access. The system translates `api_id` to `key_space_id` internally.
Permission checking logic (handler.go:170-212):
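The handler itself isn't reproduced here; a minimal Go sketch of the check it performs (names and types assumed, not the actual implementation):

```go
package analytics

import "fmt"

// hasAnalyticsAccess reports whether a root key's permissions allow the
// query. queriedAPIs maps api_id -> key_space_id after translation.
// Illustrative only; the real logic lives in handler.go:170-212.
func hasAnalyticsAccess(perms map[string]bool, queriedAPIs map[string]string) error {
	// Workspace-level or wildcard grants cover every API.
	if perms["analytics.read"] || perms["api.*.read_analytics"] {
		return nil
	}
	// Otherwise every key space the query touches needs a per-API grant.
	for apiID, keySpaceID := range queriedAPIs {
		if !perms["api."+apiID+".read_analytics"] {
			return fmt.Errorf("missing api.%s.read_analytics for key space %s", apiID, keySpaceID)
		}
	}
	return nil
}
```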
This ensures users with per-API permissions cannot access data they shouldn't see.
ClickHouse Level Security
Per-Workspace Database Users
Each workspace gets its own ClickHouse user, created by the `create-clickhouse-user` CLI command.
User configuration:
- Username: `workspace_<workspaceID>_user`
- Password: Random 32-character string, encrypted with Vault
- Database access: Only the `default` database
- Table grants: `SELECT` only on approved tables
Creation command:
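The exact CLI flags are deployment-specific, so they aren't reproduced here; conceptually the command issues ClickHouse DDL along these lines (identifiers and the granted table are illustrative):

```sql
-- Create the per-workspace user; the password is generated and stored encrypted via Vault.
CREATE USER workspace_ws_1234_user IDENTIFIED BY '<random-32-char-password>';

-- Grant read-only access to approved tables in the default database only.
GRANT SELECT ON default.key_verifications_per_hour_v1 TO workspace_ws_1234_user;
```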
ClickHouse QUOTA
Quotas limit query volume over time windows:
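A sketch of such a quota (name, window, and limit illustrative):

```sql
-- Cap how many queries the workspace user may run per hour.
CREATE QUOTA workspace_ws_1234_quota
FOR INTERVAL 1 hour MAX queries = 1000
TO workspace_ws_1234_user;
```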
This prevents runaway query volume even if API-level rate limits are bypassed.
ClickHouse SETTINGS PROFILE
Settings profiles enforce resource limits per query:
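A sketch of such a profile (values illustrative; the `MAX` constraints stop the user from raising a setting above its ceiling):

```sql
CREATE SETTINGS PROFILE workspace_ws_1234_profile SETTINGS
    readonly = 2,                                  -- read-only data access, settings still adjustable
    max_execution_time = 30 MAX 30,                -- seconds per query
    max_memory_usage = 1000000000 MAX 1000000000,  -- bytes per query
    max_result_rows = 10000 MAX 10000              -- rows returned per query
TO workspace_ws_1234_user;
```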
Why `readonly = 2`?
- `readonly = 0`: Full access (not suitable for users)
- `readonly = 1`: Read-only, cannot set any settings (breaks the ClickHouse driver)
- `readonly = 2`: Read-only for data; query-level settings can still be set within profile limits
The ClickHouse HTTP driver needs to set query execution parameters, so we use `readonly = 2`, which lets the driver set settings while the SETTINGS PROFILE enforces maximum values.
Connection Management
The ConnectionManager (`internal/services/analytics/connection_manager.go`) handles per-workspace connections:
Features:
- Two-layer caching:
  - Workspace settings cache (24hr) with SWR for database lookups
  - Connection cache (24hr) with health checks
- Vault integration for password decryption
- DSN template-based connection building
- Automatic connection health verification (10% sampling)
- Graceful connection cleanup on shutdown
DSN Template:
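The template itself is configuration; a minimal sketch of its shape (host, port, and placeholders illustrative; ClickHouse's HTTP interface listens on 8123, or 8443 with TLS):

```
https://{username}:{password}@{clickhouse-host}:8443/default
```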
The API uses the HTTP protocol instead of native TCP because:
- Simpler connection model (stateless requests)
- No persistent connection pool overhead per workspace
- Easier to debug and monitor
- Works well with ClickHouse Cloud
Connection lifecycle:
- Request comes in with workspace ID
- Check connection cache for existing connection
- If cache miss or failed health check:
  - Fetch workspace settings from cache (SWR)
  - Decrypt password using Vault
  - Build DSN from template
  - Create new ClickHouse connection
  - Store in cache
- Execute query using workspace-specific connection
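Putting the lifecycle together, a condensed Go sketch (the collaborator fields, plain-map cache, and method signature are stand-ins, not the real ConnectionManager API):

```go
package analytics

import (
	"context"
	"database/sql"
	"fmt"
	"math/rand"
)

// ConnectionManager sketch; the real implementation lives in
// internal/services/analytics/connection_manager.go and differs in detail.
type ConnectionManager struct {
	cache   map[string]*sql.DB                                                      // stand-in for the 24hr connection cache
	lookup  func(ctx context.Context, wsID string) (user, encPw string, err error) // SWR-cached settings lookup
	decrypt func(ctx context.Context, ciphertext string) (string, error)           // Vault decryption
	dsnTmpl string                                                                  // e.g. "http://%s:%s@clickhouse:8123/default"
}

// GetConnection returns a cached healthy connection or builds a fresh one.
func (m *ConnectionManager) GetConnection(ctx context.Context, wsID string) (*sql.DB, error) {
	if db, ok := m.cache[wsID]; ok {
		// Health-check roughly 10% of cache hits; evict on failure.
		if rand.Float64() >= 0.1 || db.PingContext(ctx) == nil {
			return db, nil
		}
		delete(m.cache, wsID)
	}
	user, encPw, err := m.lookup(ctx, wsID) // fetch workspace settings (SWR)
	if err != nil {
		return nil, err
	}
	pw, err := m.decrypt(ctx, encPw) // decrypt password via Vault
	if err != nil {
		return nil, err
	}
	// Build DSN from template and open a connection (requires a registered
	// "clickhouse" database/sql driver).
	db, err := sql.Open("clickhouse", fmt.Sprintf(m.dsnTmpl, user, pw))
	if err != nil {
		return nil, err
	}
	m.cache[wsID] = db
	return db, nil
}
```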
Error Handling
Error Codes
Analytics-specific error codes:
- `analytics_not_configured` (404): Workspace doesn't have analytics enabled
- `analytics_connection_failed` (503): Cannot connect to the workspace's ClickHouse user
- `invalid_analytics_query` (400): SQL syntax error
- `invalid_table` (400): Table not in the allowed list
- `invalid_function` (400): Function not allowed
- `query_not_supported` (400): Non-SELECT query attempted
- `query_execution_timeout` (400): Query exceeded the time limit
- `query_memory_limit_exceeded` (400): Query exceeded the memory limit
- `query_rows_limit_exceeded` (400): Query exceeded the rows-to-read limit
These are mapped in pkg/zen/middleware_errors.go to appropriate HTTP status codes.
Monitoring and Debugging
Query Logging
All analytics queries are logged with:
- Request ID
- Workspace ID
- Original user query
- Rewritten safe query
- Execution time
- Error details (if any)
ClickHouse System Tables
Monitor analytics usage:
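For example, two starting points using ClickHouse's system tables (filters illustrative; adjust to your deployment):

```sql
-- Recent queries issued by per-workspace users.
SELECT user, event_time, query_duration_ms, memory_usage, query
FROM system.query_log
WHERE user LIKE 'workspace_%' AND type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 100;

-- Current quota consumption across users.
SELECT * FROM system.quotas_usage;
```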
Connection Health
The connection manager performs periodic health checks:
- 10% of requests trigger a `PING` before query execution
- Failed pings remove the connection from the cache
- Next request will create a fresh connection
- Prevents using stale or dead connections
Common Issues and Solutions
"Cannot modify setting in readonly mode"
Cause: User has `readonly = 1` instead of `readonly = 2`
Solution: Re-run `create-clickhouse-user` with the correct settings profile (already fixed to use `readonly = 2`)
"No KEK found for key ID"
Cause: API's Vault service doesn't have access to the KEK used to encrypt the password
Solution: Ensure the API and `create-clickhouse-user` use the same Vault configuration (S3 bucket, master keys)
Query timeout errors
Cause: Query is too complex or scanning too many rows
Solutions:
- Query aggregated tables (`per_hour`, `per_day`) instead of raw tables (see the example after this list)
- Add more specific WHERE filters to reduce the data scanned
- Increase the workspace's `max_execution_time` setting
- Use indexed columns in WHERE clauses
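For example, a query of this shape (table and column names illustrative) scans far less data than a raw-table scan:

```sql
SELECT time, sum(count) AS verifications
FROM key_verifications_per_day_v1
WHERE time >= now() - INTERVAL 30 DAY
  AND key_space_id = 'ks_1234'
GROUP BY time
ORDER BY time;
```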
Permission denied errors
Cause: User's root key doesn't have required permissions
Solutions:
- Grant `analytics.read` for workspace-level access
- Grant `api.*.read_analytics` for all APIs
- Grant specific `api.<api_id>.read_analytics` permissions. The system translates `api_id` to `key_space_id` for the permission check.
Best Practices
For Query Performance
- Use aggregated tables when possible (`per_hour`, `per_day`, `per_month`)
- Filter by `workspace_id` first (automatic, but good to know)
- Use indexed columns in WHERE clauses (`time`, `workspace_id`, `key_space_id`)
- Limit result size to what you actually need
- Avoid expensive functions like complex string operations on large datasets
For Security
- Never bypass the query parser - always use the safe, rewritten query
- Verify permissions before query execution - check after the `api_id` to `key_space_id` translation
- Use workspace-specific connections - never share connections between workspaces
- Encrypt passwords at rest - use Vault for all credential storage
- Monitor quota usage - alert when workspaces approach limits
For Development
- Test queries locally using Docker Compose ClickHouse instance
- Validate parser changes with comprehensive test cases
- Check query plans with `EXPLAIN` for performance
- Monitor error rates in production query logs
- Keep parser and ClickHouse settings in sync - both should enforce same limits
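For instance, ClickHouse's `EXPLAIN` can confirm whether a query uses the table's indexes (query and table name illustrative):

```sql
EXPLAIN indexes = 1
SELECT count(*)
FROM key_verifications_per_hour_v1
WHERE time >= now() - INTERVAL 7 DAY;
```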