“It’s a false assumption that you need to choose between a [data lake] and [data warehouse],” Grab analytics lead data scientist Zulfikar Lazuardi said.
Speaking as part of the virtual 2021 Databricks Data and AI World Tour APAC event, Lazuardi said the Singapore-based tech giant Grab believes it has uncovered the sweet spot between a data lake and data warehouse by building a centralised data platform.
Dubbed Grab’s one central data, or OCD for short, the platform was built on Databricks’ Delta Lake and designed to bring together the flexibility and reliability of a data lake and the BI capabilities of a data warehouse.
The solution was introduced after the company initially started operating with a data lake but came across difficulties around standardisation and BI use-cases. It then decided to introduce a data warehouse as a band-aid solution to the issues that arose, but this only resulted in creating data siloes, making it more difficult to serve its 25 million monthly transacting users and over nine million registered partners.
“The one central data … has been built to have all the capabilities of data lake and data warehouse. By using one central data, we can fully support all use cases for analytics, for data science, and even for BI,” Lazuardi said.
“For the data science persona, we can build the models with the full benefit of Spark and Delta Engine, and for the BI persona, it can have the familiarity and performance of data warehouse.”
Making up the OCD are two main components: OCD central and OCD federated. Lazuardi said the OCD central has been designed to “act as a single source of truth for multiple personas”, while OCD federated has been designed to give more than 50 data teams within the company a data “marketplace” to produce datasets.
At the same time, Lazuardi said the OCD has helped the data team speed up extract, transform, load (ETL) processes while removing any manual work associated with integrated datasets.
Some more specific use cases that the OCD has supported include predicting customer lifetime value, assisting marketing teams in evaluation, and enhancing customer interactions through personalised marketing.
Additionally, Grab has since implemented a sandbox solution to give data teams the freedom and flexibility to produce machine learning models without compromising the central data platform.
“We have roughly 50-plus sandboxes,” Lazuardi said.
Separately, the Grab app was down for hours on Tuesday. A Grab spokesperson told ZDNet the disruption was due to “an issue with a planned upgrade to one of our systems, which caused some services to be degraded”.
“Our core services have been up and running since late [Tuesday] morning, and our users and partners have been able to use Grab services per normal,” the spokesperson added.
However, intermittent issues continued to affect some users on Wednesday.
“A small segment may still experience minor issues as we work on completing the upgrade,” the Grab spokesperson said.
“We’re sorry for any inconvenience caused, and are communicating with our partners who have been affected to ensure full support to them.”