After a nearly ten-month-long preview and years of limping along with the underpowered Azure Data Catalog service, Microsoft is finally entering the data governance prime time today, with general availability (GA) of Azure Purview. Consisting of both an underlying data management/governance platform and a new data catalog application that runs on it, Purview aims to serve enterprise organizations’ twin needs of keeping their data discoverable and managing its use in compliance with data protection regulations in multiple jurisdictions throughout the world.
ZDNet spoke with Mike Flasko, Microsoft’s General Manager, Data Governance & Privacy Platform, to understand Azure Purview’s capabilities, technological underpinnings, pricing structure and roadmap more precisely. Flasko provided comprehensive details on all of these facets; he also described Azure Purview’s architecture and how it affects the service in practical terms.
Purview’s not parochial
Purview integrates a host of Microsoft products and services, both in the cloud and on-premises. These include Azure Synapse Analytics, Azure SQL, Azure Data Factory, Power BI, SQL Server and even Microsoft Information Protection. But Purview also sports connectors for non-Microsoft properties, like Amazon Web Services’ S3 storage service, Snowflake and Oracle Database.
Purview had 36 connectors when ZDNet spoke with Microsoft, and Flasko says new connectors will be released each month. Today, along with the Azure Purview GA itself, Microsoft is also announcing the GA of the above-mentioned AWS S3 support, along with the public preview of data scanning for Erwin, IBM DB2, Salesforce, Google BigQuery, Looker, and Cassandra.
Unlike some data catalog platforms which tightly couple capabilities like a business glossary and data set annotation with connecting to and scanning data sources, Azure Purview takes a more modular approach. The Purview service can scan data sources, collect their metadata, detect lineage information and classify sensitive data on an automated basis, populating what Microsoft calls a “data map” (and what other data catalog vendors might call a knowledge graph). The data map can be built, accessed and maintained through a user interface or via the application programming interface (API) defined by the open source Apache Atlas project. Azure Purview also provides extension points for the creation of new data source connectors and new data classifiers.
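To make the Atlas API point a little more concrete, here is a minimal Python sketch of what registering a data asset through an Atlas-style REST interface might look like. It only builds the request and never sends it. The `/api/atlas/v2/entity` path and the `DataSet` type come from the open source Apache Atlas project; the base URL, the token placeholder, the qualified name and the helper name `build_entity_request` are all hypothetical, and the exact endpoint and authentication scheme a given Purview account expects are assumptions that should be checked against Microsoft's documentation.

```python
import json
import urllib.request


def build_entity_request(base_url, token, type_name, qualified_name, name):
    """Build (but do not send) an Atlas v2 create-or-update entity request.

    The helper and its parameters are illustrative, not part of any SDK.
    """
    payload = {
        "entity": {
            "typeName": type_name,
            "attributes": {"qualifiedName": qualified_name, "name": name},
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer-token auth is an assumption for this sketch.
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_entity_request(
    base_url="https://example.invalid/catalog",  # hypothetical endpoint
    token="<access-token>",                      # placeholder, not a real token
    type_name="DataSet",                         # built-in Apache Atlas type
    qualified_name="sales://orders/2021",        # made-up identifier
    name="orders_2021",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before pointing it at a real account.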
Microsoft can also write applications that run on the platform, which is exactly how Microsoft has implemented the Azure Purview Data Catalog. Other applications and capabilities will be forthcoming from Microsoft to implement other data management capabilities (data quality assessment, implemented as an optional scanning process, is one specific example Flasko mentioned). Third-party independent software vendors can integrate with the Purview service in a similar manner. As an example of this, Microsoft has partnered with Alpharetta, GA-based Profisee to provide master data management (MDM) capabilities on the Azure Purview platform.
This is a fitting partnership, as various members of Profisee’s leadership team (including its CEO, Ian Ahern) formerly ran Stratature, a company Microsoft acquired in 2007, the core technology of which became SQL Server Master Data Services. It would seem, just as Azure Data Factory has become the modern-day, cloud-native successor to SQL Server Integration Services (SSIS), that Azure Purview, either directly or indirectly, will serve as a platform for modern successors to SQL Server Master Data Services (MDS) and Data Quality Services (DQS).
Pricing and availability
Because of Purview’s modular, multi-headed architecture, those planning their spend around the platform will have some calculating to do. According to Microsoft’s Azure Purview pricing page, the total essentially boils down to the cost of the Data Map + cost of Scanning + cost of Resource Set.
The Resource Set is, according to the pricing page, “a built-in feature of the Data Map used to optimize the storage and search of data assets associated with partitioned files in data lakes.” Pricing for Resource Set and Scanning operations is based on vCore hours used, though the price per vCore hour and the number of vCores involved in Resource Set and Scanning operations differ.
The actual computation involved in running the Data Map is billed by “capacity units” used, where one such unit serves in an unlimited capacity over 2GB of data. These capacity units are billed on an always-on basis (rather than by usage), and billing persists until and unless the data map is torn down. Use of the Azure Purview Catalog application, meanwhile, is free.
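For readers who want to rough out a budget, the three-part formula can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: the rates below are placeholder numbers, not Microsoft's actual prices, and the sketch assumes capacity units are charged per unit per hour for every hour the Data Map exists, while Scanning and Resource Set are charged per vCore hour consumed. The names `PurviewRates` and `estimate_monthly_cost` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class PurviewRates:
    """Hypothetical hourly rates; look up real figures on the pricing page."""
    capacity_unit_hour: float       # per Data Map capacity unit, per hour
    scanning_vcore_hour: float      # per vCore hour of Scanning
    resource_set_vcore_hour: float  # per vCore hour of Resource Set processing


def estimate_monthly_cost(rates, capacity_units, hours_in_month,
                          scanning_vcore_hours, resource_set_vcore_hours):
    """Return the three cost components and their total."""
    # Always-on: billed for every hour in the month, regardless of usage.
    data_map = capacity_units * hours_in_month * rates.capacity_unit_hour
    # Usage-based: billed only for the vCore hours actually consumed.
    scanning = scanning_vcore_hours * rates.scanning_vcore_hour
    resource_set = resource_set_vcore_hours * rates.resource_set_vcore_hour
    return {
        "data_map": data_map,
        "scanning": scanning,
        "resource_set": resource_set,
        "total": data_map + scanning + resource_set,
    }


# Placeholder rates chosen for easy arithmetic, not taken from any price list.
example_rates = PurviewRates(
    capacity_unit_hour=0.5,
    scanning_vcore_hour=0.25,
    resource_set_vcore_hour=0.125,
)
estimate = estimate_monthly_cost(
    example_rates,
    capacity_units=1,
    hours_in_month=720,
    scanning_vcore_hours=40,
    resource_set_vcore_hours=8,
)
for component, cost in estimate.items():
    print(f"{component}: {cost:.2f}")
```

With any realistic rates, the always-on Data Map term will typically set the floor for the bill, since it accrues whether or not any scans run.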
Azure Purview is generally available effective immediately, with availability in 14 Azure regions. This includes three new regions – West US 2, West Central US and North Europe – that were not part of the preview. For those wanting to learn about Purview, including its latest features, Microsoft will be publishing weekly blog posts for a limited time, starting on October 6th, 2021, on the Azure Purview TechCommunity site.