Presto
What it is: Open-source distributed SQL query engine originally developed at Facebook (now Meta). Queries data in place across multiple sources without moving it. ANSI SQL on everything.
What It Does Best
Data federation. JOIN data across S3, MySQL, PostgreSQL, Cassandra in one query. No ETL required.
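Interactive speed. In-memory distributed execution. Built for interactive queries over very large datasets, with response times ranging from sub-second to minutes depending on data volume and cluster size.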
Standard SQL. ANSI-compatible SQL, including joins, subqueries, and window functions. Analysts use familiar syntax across all data sources.
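As a rough illustration of the federation point above, the sketch below builds one SQL statement that joins a data lake table with a MySQL table. Presto addresses tables as catalog.schema.table, so a single query can span connectors. The catalog, schema, table, and column names (hive.sales.orders, mysql.crm.customers, and so on) are hypothetical placeholders, not names from any real deployment, and the helper functions exist only for this example.

```python
# Minimal sketch: compose a federated Presto query as a string.
# All catalog, schema, table, and column names are hypothetical.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Join a data lake table with a MySQL table in one statement."""
    orders = qualified("hive", "sales", "orders")        # e.g. files on S3
    customers = qualified("mysql", "crm", "customers")   # e.g. a MySQL table
    return (
        "SELECT c.region, SUM(o.amount) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region\n"
        "ORDER BY revenue DESC"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting string can be pasted into the Presto CLI or sent through any client library. Which catalogs are available depends on how the cluster's connectors are configured, so the names here would need to match your own setup.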
Pricing
Free: Open source, Apache 2.0. Self-managed clusters incur only the compute costs of wherever they run (for example AWS or GCP). Pricing for managed options varies by vendor.
When to Use It
- Data lake analytics (S3, HDFS)
- Querying across multiple data sources
- Ad-hoc exploratory analytics
- Too much data to move into a warehouse
When NOT to Use It
- Operational (transactional) workloads: Presto is built for analytics only
- Small datasets: the cluster overhead is not worth it
- Need for data persistence: Presto is a compute-only layer with no storage of its own
Bottom line: Query engine for distributed analytics. Don't move data; query it where it lives. Great for data lakes. Note: the Presto project split into PrestoDB and Trino (formerly PrestoSQL). Choose based on community preference.