Spark SQL vs Presto

November 5, 2021 ~ planet_goodwill

In this article, we lay out a comparison of Spark SQL and Presto. The two engines overlap in several ways, but there are some differences to be aware of when choosing between them.

Commonality:

Both are open source “big data” software frameworks
Both are distributed, parallel and in-memory
BI tools connect to both using JDBC/ODBC
Both have been tested and deployed at petabyte-scale companies
Both can be run on premises or in the cloud, and both can be containerized
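
To make the JDBC point concrete, here is a minimal sketch of the connection URLs a BI tool would typically be given. It assumes the commonly documented defaults: Presto's JDBC driver uses a jdbc:presto:// URL with a catalog and schema, while Spark SQL is reached through the Spark Thrift JDBC/ODBC server, which speaks the HiveServer2 protocol (jdbc:hive2://). The host names, the "hive" catalog and the helper function names are hypothetical and only serve the illustration.

```python
def presto_jdbc_url(host: str, port: int = 8080,
                    catalog: str = "hive", schema: str = "default") -> str:
    """Build a Presto JDBC URL of the form jdbc:presto://host:port/catalog/schema.

    Port 8080 and the 'hive' catalog are assumptions; use your cluster's values.
    """
    return f"jdbc:presto://{host}:{port}/{catalog}/{schema}"


def spark_thrift_jdbc_url(host: str, port: int = 10000,
                          database: str = "default") -> str:
    """Build a JDBC URL for the Spark Thrift server (HiveServer2 protocol).

    Port 10000 is the usual default for the Thrift server; adjust as needed.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical hosts, shown only to illustrate the URL shapes.
    print(presto_jdbc_url("presto.example.com"))
    print(spark_thrift_jdbc_url("spark.example.com"))
```

In both cases the BI tool also needs the matching JDBC driver on its classpath: the Presto JDBC driver for the first URL and a Hive JDBC driver for the second.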

Differences:

Presto	Spark SQL
Presto is an ANSI SQL:2003 query engine for accessing and unifying data from many different data sources. It is typically deployed as a middle layer for federation	Spark is a general-purpose cluster-computing framework. Spark Core itself does not provide SQL; SQL support comes from the Spark SQL module, which adds structured data processing capabilities. Spark SQL is also ANSI SQL:2003 compliant (since Spark 2.0)
Presto is most commonly used for interactive SQL queries. These queries are usually analytical, although Presto can also run SQL-based ETL	Spark is more general in its applications and is often used for data transformation and machine learning workloads
Presto can query data in object stores such as S3 by default and has many connectors available. It also works really well with data in Parquet and ORC formats	Spark accesses S3 through the Hadoop file system APIs (or through paid Databricks features). Spark has a more limited set of connectors for data sources
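
The differences in the table can be sketched in code. The example below is only an illustration under stated assumptions, not a reference setup: the catalog, schema, table and column names ("hive.sales.orders", "mysql.crm.customers", "customer_id" and so on), the helper function names and the S3 endpoint are all hypothetical. On the Presto side, a table is addressed as catalog.schema.table, which is what lets one statement join across data sources. On the Spark side, S3 is read through the Hadoop S3A file system, configured via spark.hadoop.fs.s3a.* settings, and the data is then exposed to Spark SQL as a temporary view. The Spark function imports pyspark lazily, so the rest of the sketch runs without it.

```python
def presto_federated_query(orders_table: str, customers_table: str) -> str:
    """Compose one Presto statement that joins tables from two catalogs.

    Both arguments are fully qualified names (catalog.schema.table).
    The column names are hypothetical.
    """
    return (
        "SELECT c.country, count(*) AS orders\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.country"
    )


def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Spark settings that are forwarded to the Hadoop S3A file system.

    Keys prefixed with 'spark.hadoop.' are passed to the Hadoop configuration.
    In practice, prefer a credentials provider over keys in plain config.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def orders_per_customer_with_spark(parquet_path: str, conf: dict):
    """Read Parquet from an s3a:// path and query it with Spark SQL.

    Requires pyspark and the hadoop-aws module on the classpath (assumed).
    """
    from pyspark.sql import SparkSession  # lazy import: only needed here

    builder = SparkSession.builder.appName("spark-sql-sketch")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Register the Parquet data as a view so it can be queried with SQL.
    spark.read.parquet(parquet_path).createOrReplaceTempView("orders")
    return spark.sql(
        "SELECT customer_id, count(*) AS orders FROM orders GROUP BY customer_id"
    )


if __name__ == "__main__":
    print(presto_federated_query("hive.sales.orders", "mysql.crm.customers"))
```

The Presto statement would be submitted as-is through a client or the JDBC connection, with each catalog backed by a configured connector. The Spark function would be called with a path such as "s3a://some-bucket/orders/" (a made-up bucket) and the dictionary returned by s3a_spark_conf.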