When processing SQL data in Spark, sometimes single rows are not the right unit — your calculation may need a collection or group of records as input.
This post shows how to group SQL data for use in Spark. Continue reading
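The underlying idea — collect all rows that share a key, then run the calculation on each whole group rather than on single rows — can be sketched in plain Python (no Spark required). The row data here is hypothetical, purely for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, as they might come back from a SQL query
rows = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]

# Group rows by customer so each calculation sees a collection of
# records, not one row. itertools.groupby requires sorted input.
rows.sort(key=itemgetter("customer"))
totals = {
    key: sum(r["amount"] for r in group)
    for key, group in groupby(rows, key=itemgetter("customer"))
}
print(totals)  # {'a': 17, 'b': 5}
```

In Spark the same grouping step would typically be expressed with a `groupBy` on a DataFrame, with the per-group calculation applied afterwards.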
Covering indexes are a crucial performance technique for RDBMS optimization, and one of the most effective tools in the tuning toolbox. When large join queries perform poorly, here are some tips for tackling the situation. Continue reading
Users often enter data approximately or inaccurately. But sometimes we need to search or match this inaccurate data anyway!
For example, users should match existing customer records rather than creating unwanted duplicates.
There are standard algorithms for measuring string distances, but we’ll need a few extra steps to make this work efficiently against a database. Continue reading
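As a minimal sketch of such a string-distance match, Python’s standard library offers `difflib.SequenceMatcher` (one of several standard similarity measures, alongside e.g. Levenshtein distance). The customer names below are hypothetical; matching efficiently against a real database table needs the extra steps the post describes:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical existing customer records
customers = ["Jon Smith", "Jane Doe", "Johann Schmidt"]

# A user types an approximate name; pick the closest existing record
# instead of creating an unwanted duplicate.
query = "John Smith"
best = max(customers, key=lambda c: similarity(query, c))
print(best)  # Jon Smith
```

In practice you would also set a minimum score threshold, so that a query with no close match creates a new record rather than matching a distant one.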