In the previous blog post, we walked through some basic CRUD operations on Delta Lake. However, if you’re a Java application developer, you might just want to focus on the SQL query logic without having to worry about the implementation details of a Spark job.

“Can I access Delta Lake with JDBC?”

“Yes. Use Spark Thrift Server (STS).”

Spark Thrift Server (STS)

STS is essentially an Apache HiveServer2 that enables JDBC/ODBC clients to execute queries remotely and retrieve the results. The difference between STS and HiveServer2 is that, instead of submitting SQL queries as Hive MapReduce jobs, STS uses the Spark SQL engine. With STS, you can leverage the full capabilities of Spark to run your queries.

Start/Stop Server

Run the following commands from the Spark distribution folder (SPARK_HOME).

Start the server in Spark local mode:

sbin/start-thriftserver.sh \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --packages 'io.delta:delta-core_2.12:1.0.0'

Start the server on an existing Spark cluster:

sbin/start-thriftserver.sh \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --packages 'io.delta:delta-core_2.12:1.0.0' \
  --master spark://<spark host>:<spark port>

Start the server with S3 access credentials:

sbin/start-thriftserver.sh \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.hadoop.fs.s3a.access.key=<s3 access key> \
  --conf spark.hadoop.fs.s3a.secret.key=<s3 secret key> \
  --packages 'io.delta:delta-core_2.12:1.0.0,org.apache.hadoop:hadoop-aws:3.3.1' \
  --master spark://<spark host>:<spark port>

Stop server:

sbin/stop-thriftserver.sh
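
The start script launches the server as a background daemon and returns right away, so the JDBC port may not be open yet when the command finishes. The following is a minimal sketch of a wait loop, assuming bash (it relies on bash's /dev/tcp redirection) and the default Thrift port 10000. The name wait_for_port is a hypothetical helper, not something Spark ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a TCP port once per second until it accepts
# a connection or the timeout (in seconds, default 60) runs out.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60}
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Opening /dev/tcp/<host>/<port> in a subshell succeeds only if the port is listening.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Example call, after running sbin/start-thriftserver.sh:
# wait_for_port localhost 10000 120 && echo "STS is accepting connections"
```

If the port never opens, the server log under the Spark logs folder is the first place to look.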

Connect with JDBC

To test the JDBC connection, we will use the Beeline tool that is packaged in Spark’s bin folder.

bin/beeline

Connect to STS:

beeline> !connect jdbc:hive2://localhost:10000

Note: Beeline will prompt for a username and password. In non-secure mode, simply enter your username on the machine and a blank password. For secure mode, please follow the instructions given in the Beeline documentation.

Once connected, you can simply issue SQL queries.
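
Beeline can also run statements without the interactive prompt, which is handy for scripts. The sketch below assumes a non-secure STS on localhost:10000 and that Beeline lives under SPARK_HOME. The function names sts_url and run_sts_query, and the table name events, are made up for illustration.

```shell
#!/usr/bin/env bash
# Build the HiveServer2-style JDBC URL for STS (defaults: localhost, port 10000).
sts_url() {
  printf 'jdbc:hive2://%s:%s' "${1:-localhost}" "${2:-10000}"
}

# Run one SQL statement through Beeline non-interactively.
# -u: JDBC URL, -n: username, -e: statement to execute.
run_sts_query() {
  "${SPARK_HOME:-.}/bin/beeline" -u "$(sts_url)" -n "${USER:-$(id -un)}" -e "$1"
}

# Example calls against a hypothetical Delta table named "events":
# run_sts_query "CREATE TABLE IF NOT EXISTS events (id BIGINT, name STRING) USING DELTA"
# run_sts_query "INSERT INTO events VALUES (1, 'first')"
# run_sts_query "SELECT * FROM events"
```

A Java application would use the same jdbc:hive2:// URL with the Hive JDBC driver on its classpath.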

Conclusion

In this blog post, we looked at using Spark Thrift Server to query Delta Lake over JDBC.

Reference

Distributed SQL Engine - link
Thrift JDBC/ODBC Server — Spark Thrift Server (STS) - link
HiveServer2 Clients - link
