Spark SQL vs Presto

This article compares Spark SQL and Presto. The two engines have a lot in common, but there are some differences to be aware of when choosing between them.

Commonalities:

  • Both are open source, “big data” software frameworks
  • Both are distributed, parallel, and in-memory
  • BI tools connect to both using JDBC/ODBC
  • Both have been tested and deployed at petabyte-scale companies
  • Both can run on premises or in the cloud, and both can be containerized
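To make the JDBC point above concrete, here is a minimal sketch of the connection strings a BI tool would typically be given for each engine. It assumes Presto's coordinator is listening on its usual HTTP port (8080) and that Spark SQL is exposed through the Spark Thrift JDBC/ODBC server, which speaks the HiveServer2 protocol (hence the `jdbc:hive2://` scheme, port 10000 by default). The function names and host names are hypothetical and used only for illustration.

```python
def presto_jdbc_url(host, port=8080, catalog=None, schema=None):
    """Build a Presto JDBC URL: jdbc:presto://host:port[/catalog[/schema]]."""
    if schema and not catalog:
        # In a Presto URL the schema is nested under a catalog.
        raise ValueError("a schema can only be given together with a catalog")
    url = f"jdbc:presto://{host}:{port}"
    if catalog:
        url += f"/{catalog}"
    if schema:
        url += f"/{schema}"
    return url


def spark_thrift_jdbc_url(host, port=10000, database="default"):
    """Build a JDBC URL for the Spark Thrift server (HiveServer2 protocol)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own coordinator / Thrift server.
    print(presto_jdbc_url("presto.example.com", catalog="hive", schema="sales"))
    print(spark_thrift_jdbc_url("spark.example.com", database="sales"))
```

Note the structural difference: the Presto URL selects a catalog (a configured data source) and then a schema, while the Spark URL selects a single database.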

Differences:

Presto | Spark SQL
--- | ---
Presto is an ANSI SQL:2003 query engine for accessing and unifying data from many different data sources. It’s deployed as a middle layer for federation. | Spark is a general-purpose cluster-computing framework. Core Spark does not support SQL; for SQL support you install the Spark SQL module, which adds structured data processing capabilities. Spark SQL is also ANSI SQL:2003 compliant (since Spark 2.0).
Presto is more commonly used to support interactive SQL queries. Queries are usually analytical, but Presto can also perform SQL-based ETL. | Spark is more general in its applications and is often used for data transformation and machine learning workloads.
Presto supports querying data in object stores like S3 by default, and has many connectors available. It also works really well with data in Parquet and ORC formats. | Spark must use Hadoop file APIs to access S3 (or pay for Databricks features). Spark has limited connectors for data sources.
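The S3 difference can be sketched in configuration terms. The snippet below is only an illustration under stated assumptions: on the Spark side it assumes the Hadoop S3A connector (the `hadoop-aws` module) is on the classpath, so S3 is reached through `s3a://` paths and `spark.hadoop.fs.s3a.*` settings; on the Presto side it assumes a PrestoDB Hive connector catalog file. The helper names, bucket, endpoint, and metastore address are hypothetical.

```python
def spark_s3a_conf(endpoint=None):
    """Spark settings that route S3 access through Hadoop's S3A file system."""
    conf = {"spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"}
    if endpoint:
        conf["spark.hadoop.fs.s3a.endpoint"] = endpoint
    return conf


def to_spark_submit_args(conf):
    """Render a config dict as repeated `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def s3a_path(bucket, key):
    """Spark addresses S3 objects through the Hadoop `s3a://` scheme."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def presto_hive_catalog(metastore_uri, s3_endpoint=None):
    """Contents of a Presto catalog file such as etc/catalog/hive.properties."""
    lines = ["connector.name=hive-hadoop2", f"hive.metastore.uri={metastore_uri}"]
    if s3_endpoint:
        lines.append(f"hive.s3.endpoint={s3_endpoint}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders, not real infrastructure.
    endpoint = "s3.us-east-1.amazonaws.com"
    print(" ".join(to_spark_submit_args(spark_s3a_conf(endpoint))))
    print(s3a_path("example-bucket", "/warehouse/orders/"))
    print(presto_hive_catalog("thrift://metastore.example.com:9083", endpoint))
```

In this sketch the Spark job carries its own S3 settings and file paths, whereas Presto keeps the storage details in a catalog definition and exposes the data to queries as ordinary tables under that catalog.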