Chen Riang's Blog

Live every day like it's your last

Redis Server Side Scripting with Lua

Performance and a light footprint are common reasons why people choose Redis. If the ordinary use of Redis does not fulfil your requirements, server-side scripting with Lua might come to the rescue. R...

Generating Load with Locust

Locust is a performance testing tool that generates load, just like JMeter, but it is written in Python. The main feature of Locust is its extensibility: you can write the behavior of whole perform...

Profile Java (WSL) using YourKit (Windows)

I have been using WSL (Windows Subsystem for Linux) for development for nearly a year now. If you ask me how the experience has been: I love it. It makes my life as a developer so much easier, e.g. soft...

Druid Not Enough Capacity for Primary Segment

The error below is thrown by the Historical node when the Druid cluster is loading a data segment. 2021-10-01T01:51:13,538 WARN [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.Load...

Druid Timeout Exception

org.apache.druid.query.QueryTimeoutException

I hit a timeout exception from Druid when performing a big query that requires a long time to process. org.apache.druid.query.QueryTimeoutException: url [http://xx.xx.xx.xx:8088/druid/v2/] timed o...

Druid Not Enough Buffer Capacity

When running a groupBy query with a huge aggregation followed by a subselect query in SQL, I bumped into the following exception: Not enough capacity for even one row! Need[xx] but have[0] So...

Running Apache Druid in Kubernetes

Apache Druid is an open source database purpose-built for business intelligence (OLAP) queries on large event datasets. It can provide very low latency data ingestion, flexible data explorati...

Connect Delta Lake with JDBC

In the previous blog post, we walked through some basic CRUD operations on Delta Lake. However, if you’re a Java application developer, you might just want to focus on the SQL query logic without having ...

Delta Lake - The Next Gen Data Lake

When we talk about data lakes in the enterprise, the first term that normally pops up is Apache Hadoop. Hadoop seems to be the no-brainer solution for most companies that are thinking to maint...

Spark Infer Timestamp data type from JSON

Apache Spark provides a feature to infer the data schema from the incoming data. However, in Spark version 3.1.2, it wrongly interprets a Timestamp field as a String data type...