CloudQuery — Sync Cloud Infrastructure to SQL for Security and Compliance
CloudQuery is an open-source ELT framework that extracts configuration data from cloud APIs, SaaS platforms, and databases into PostgreSQL or data lakes for security, compliance, and asset visibility.
What it is
CloudQuery is an open-source ELT framework that extracts configuration data from cloud APIs (AWS, GCP, Azure), SaaS platforms, and databases into PostgreSQL or data lakes. Once your infrastructure data is in SQL, you can run security audits, compliance checks, and asset inventory queries using standard SQL.
CloudQuery is for security engineers, compliance teams, and platform engineers who need visibility into their cloud infrastructure across multiple providers and accounts.
The project is actively maintained, and its documentation covers common use cases. Because it is open source, you can inspect the code, contribute fixes, and adapt the tool to your requirements.
How it saves time or tokens
Querying cloud infrastructure natively means a different CLI, API, and authentication method for each provider. CloudQuery normalizes the data into SQL tables in one database, so a compliance check that would otherwise take dozens of API calls across AWS, GCP, and Azure becomes a single SQL query. Scheduled syncs keep the data current without manual polling.
How to use
- Install the CloudQuery CLI via brew or download the binary.
- Create a configuration file specifying source plugins (AWS, GCP) and a destination (PostgreSQL).
- Run `cloudquery sync` to extract and load data.
Example
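```yaml
# cloudquery.yml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: 'vX.Y.Z' # required: pin to a released version of the AWS plugin
  tables: ['aws_s3_buckets', 'aws_ec2_instances']
  destinations: ['postgresql']
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: 'vX.Y.Z' # required: pin to a released version of the PostgreSQL plugin
  spec:
    connection_string: 'postgresql://user:pass@localhost:5432/cloudquery'
```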
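```shell
# Install CloudQuery
brew install cloudquery/tap/cloudquery
# Sync infrastructure data
cloudquery sync cloudquery.yml
# Query: list S3 buckets whose public access block does not block public ACLs
# (IS NOT TRUE also matches NULL values)
psql "postgresql://user:pass@localhost:5432/cloudquery" -c "SELECT name, region FROM aws_s3_buckets WHERE block_public_acls IS NOT TRUE;"
```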
Common pitfalls
- CloudQuery requires valid cloud credentials with read permissions. Missing IAM permissions cause partial syncs: tables the credentials cannot read end up incomplete while the sync as a whole still completes, so review the sync log for access-denied errors.
- Large AWS accounts with thousands of resources can take 30+ minutes to sync. Use table filtering to sync only the resources you need for your queries.
- Upgrading CloudQuery plugins can change the table schemas in PostgreSQL. Run migrations before the first sync after a plugin update to avoid schema mismatch errors.
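For the AWS source, one common way to grant read permissions is the AWS-managed ReadOnlyAccess policy. The following is a minimal sketch using the AWS CLI. It assumes CloudQuery runs as an existing IAM role. The role name cloudquery-reader and the function name are hypothetical. ReadOnlyAccess is broad, and some teams prefer a narrower custom policy.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical role name; replace with the IAM role CloudQuery runs as.
CQ_ROLE_NAME="${CQ_ROLE_NAME:-cloudquery-reader}"

# Attach the AWS-managed ReadOnlyAccess policy so the AWS source plugin
# can read the resources it syncs.
grant_readonly() {
  aws iam attach-role-policy \
    --role-name "$CQ_ROLE_NAME" \
    --policy-arn "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```

Call grant_readonly once with administrator credentials, then run CloudQuery under the role.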
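To keep syncs of large accounts short, the tables list in a source spec accepts glob patterns. The following is a minimal sketch that writes a source-only spec. The file names aws.yml and postgresql.yml and the function name are illustrative. The version string is a placeholder that you must pin to a real release.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a source spec that syncs only the tables the queries need.
# "aws_s3_*" is a glob matching every table whose name starts with aws_s3_.
write_aws_source() {
  local out="${1:-aws.yml}"
  cat > "$out" <<'EOF'
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "vX.Y.Z"
  tables: ["aws_s3_*", "aws_ec2_instances"]
  destinations: ["postgresql"]
EOF
}

# Usage (postgresql.yml is a hypothetical file holding the destination spec):
#   write_aws_source aws.yml
#   cloudquery sync aws.yml postgresql.yml
```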
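When a plugin upgrade changes table schemas, one approach is to run the CLI's migrate command against the same configuration before syncing. The following is a minimal sketch. It assumes the configuration lives in cloudquery.yml, and the wrapper function name is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Apply the new table schemas to the destination, then sync.
# With set -e, a failed migration stops the script before the sync runs.
migrate_then_sync() {
  local config="${1:-cloudquery.yml}"
  cloudquery migrate "$config"
  cloudquery sync "$config"
}
```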
Before adopting this tool, evaluate whether it fits your team's existing workflow. Read the official documentation thoroughly, and start with a small proof-of-concept rather than a full migration. Community forums, GitHub issues, and Stack Overflow are valuable resources when you encounter edge cases not covered in the documentation.
Frequently Asked Questions
Which cloud providers and data sources does CloudQuery support?
CloudQuery supports AWS, GCP, Azure, DigitalOcean, Oracle Cloud, and many more via source plugins. It also supports SaaS APIs like GitHub, Okta, and Datadog. The plugin ecosystem covers over 100 data sources.
Which destinations can CloudQuery load data into?
CloudQuery supports PostgreSQL, BigQuery, Snowflake, and file-based destinations like Parquet and CSV. PostgreSQL is the most common choice for interactive querying. BigQuery and Snowflake are used for large-scale analytics.
How does CloudQuery differ from AWS Config?
AWS Config is AWS-only and stores its data in AWS. CloudQuery is multi-cloud, writes data to a database you control, and lets you join data across providers. For example, you can query AWS instances alongside GCP VMs in a single SQL statement.
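The following is a sketch of such a cross-provider query, run through psql against the PostgreSQL destination from the example. It assumes both the AWS and GCP source plugins have synced into the same database. The column names (account_id, state, project_id, zone, status) are assumptions based on the plugins' table docs, so verify them against the plugin versions you run. The function name is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder connection string for the destination database.
CQ_DB_URL="${CQ_DB_URL:-postgresql://user:pass@localhost:5432/cloudquery}"

# List AWS EC2 instances and GCP Compute Engine VMs in one result set.
# state is assumed to be a JSON column in aws_ec2_instances, hence ->>'Name'.
list_all_vms() {
  psql "$CQ_DB_URL" -c "
    SELECT 'aws' AS cloud, account_id AS account, region AS location,
           instance_id AS vm, state->>'Name' AS status
    FROM aws_ec2_instances
    UNION ALL
    SELECT 'gcp', project_id, zone, name, status
    FROM gcp_compute_instances
    ORDER BY cloud, account;"
}
```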
How do I keep the synced data up to date?
The CloudQuery CLI runs as a one-shot process: each sync exits when it finishes. Schedule it with cron, Kubernetes CronJobs, or any task scheduler. For near-continuous data, run it on a short interval (for example, every hour) to keep your database current.
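The following is a minimal cron sketch for the hourly schedule. The binary, config, and log paths are placeholders. Cron runs with a minimal PATH, so absolute paths are safer. The function name is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths; adjust to where cloudquery and the config actually live.
CQ_BIN="${CQ_BIN:-/usr/local/bin/cloudquery}"
CQ_CONFIG="${CQ_CONFIG:-$HOME/cloudquery/cloudquery.yml}"

# Install (or replace) an hourly sync entry in the current user's crontab,
# keeping any unrelated entries that are already there.
install_hourly_sync() {
  local entry="0 * * * * $CQ_BIN sync $CQ_CONFIG >> $HOME/cloudquery/sync.log 2>&1"
  { crontab -l 2>/dev/null | grep -vF "$CQ_BIN sync" || true; echo "$entry"; } | crontab -
}
```

Filtering out the previous entry before appending makes repeated runs replace the schedule instead of duplicating it.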
Can CloudQuery run compliance checks?
Yes. CloudQuery provides policy packs for CIS benchmarks, SOC 2, HIPAA, and PCI DSS. These are SQL queries that check your infrastructure data against compliance requirements and produce pass/fail reports.
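The official policy packs are the reference for these checks. To illustrate the pass/fail pattern, here is a hand-written query in the same spirit, not copied from any pack. It marks an S3 bucket as passing only when all four bucket-level Block Public Access settings are enabled. The column names are assumptions based on the AWS plugin's aws_s3_buckets table. Account-level Block Public Access is not considered. The function name is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder connection string for the destination database.
CQ_DB_URL="${CQ_DB_URL:-postgresql://user:pass@localhost:5432/cloudquery}"

# Hand-written check: NULL (setting not configured) counts as a failure.
# Sorting by status puts 'fail' rows before 'pass' rows.
check_s3_public_access() {
  psql "$CQ_DB_URL" -c "
    SELECT name, region,
           CASE WHEN coalesce(block_public_acls, false)
                 AND coalesce(block_public_policy, false)
                 AND coalesce(ignore_public_acls, false)
                 AND coalesce(restrict_public_buckets, false)
                THEN 'pass' ELSE 'fail' END AS status
    FROM aws_s3_buckets
    ORDER BY status, name;"
}
```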