Databases

Relational Databases

Azure SQL

Azure SQL is the collective name for a family of relational database solutions based on the Microsoft SQL Server database engine.

Azure SQL Database

  • a fully managed platform-as-a-service (PaaS) database hosted in Azure.

  • available as a Single Database or an Elastic Pool.

Single Database

This option enables you to quickly set up and run a single SQL Server database. You create and run a database server in the cloud, and you access your database through this server. Microsoft manages the server, so all you have to do is configure the database, create your tables, and populate them with your data. You can scale the database if you need more storage space, memory, or processing power. By default, resources are pre-allocated, and you're charged per hour for the resources you've requested. You can also specify a serverless configuration. In this configuration, Microsoft creates its own server, which might be shared by databases belonging to other Azure subscribers. Microsoft ensures the privacy of your database. Your database automatically scales and resources are allocated or deallocated as required.

Elastic Pool

This option is similar to Single Database, except that by default multiple databases can share the same resources, such as memory, data storage space, and processing power through multiple-tenancy. The resources are referred to as a pool. You create the pool, and only your databases can use the pool. This model is useful if you have databases with resource requirements that vary over time, and can help you to reduce costs. For example, your payroll database might require plenty of CPU power at the end of each month as you handle payroll processing, but at other times the database might become much less active. You might have another database that is used for running reports. This database might become active for several days in the middle of the month as management reports are generated, but with a lighter load at other times. Elastic Pool enables you to use the resources available in the pool, and then release the resources once processing has completed.

Azure SQL Managed Instance

  • platform-as-a-service (PaaS)

  • a hosted instance of SQL Server with automated maintenance, which allows more flexible configuration than Azure SQL DB but with more administrative responsibility for the owner.

  • depends on other Azure services such as

    • Azure Storage for backups,

    • Azure Event Hubs for telemetry,

    • Azure Active Directory for authentication,

    • Azure Key Vault for Transparent Data Encryption (TDE)

  • Azure SQL Managed Instance provides features not available in Azure SQL Database. If your system uses features such as linked servers, Service Broker (a message processing system that can be used to distribute work across servers), or Database Mail (which enables your database to send email messages to users), then you should use managed instance.

  • To check compatibility with an existing on-premises system, Data Migration Assistant (DMA) tool helps to analyzes databases on SQL Server and reports any issues that could block migration to a managed instance.

SQL Server on Azure Virtual Machines

  • A virtual machine running in Azure with an installation of SQL Server.

  • infrastructure-as-a-service (IaaS)

  • lift-and-shift ready

Azure Database Migration Service

  • Service that migrates databases to the cloud with no application code changes.

Azure Database for Open-Source Relational Databases

Azure Database for MySQL

  • PaaS

  • based on the MySQL Community Edition.

Azure Database for MariaDB

  • PaaS

  • based on the MariaDB Community Edition.

Features of Azure Database for MySQL & MariaDB

  • Built-in high availability with no additional cost.

  • Predictable performance, using inclusive pay-as-you-go pricing.

  • Easy scaling that responds quickly to demand.

  • Secure data, both at rest and in motion.

  • Automatic backups and point-in-time restore for the last 35 days.

  • Enterprise-level security and compliance with legislation.

Azure Database for PostgreSQL

  • PaaS

  • Deployment Options: Single-server and Hyperscale.

Single Server

  • similar to Azure Database for MySQL

Hyperscale (Citus)

  • a deployment option that scales queries across multiple server nodes to support large database loads.

  • Your database is split across nodes. Data is split into chunks based on the value of a partition key or sharding key.

  • query engine parallelizes incoming SQL queries across servers for faster responses on large datasets.

  • it serves when workload is near 100 gigs of data.

Non-Relational Database

Azure Storage

Azure Storage is a core Azure service that enables you to store data in:

Blob containers - scalable, cost-effective storage for binary files.

  • supports

    • Block blobs - contains up to 50,000 blocks, each of size up to 100 MB, giving a max total of ~4.7 TB.

    • Page blobs - a collection of fixed size 512-byte pages, can hold up to 8TB of data.

    • Append blobs - updating and deletion is not supported and can have a size of 195 GB.

File shares - network file shares such as you typically find in corporate networks.

  • Azure Files enables you to share up to 100 TB of data in a single storage account.

  • The maximum size of a single file is 1 TB, but can be configured to set quotas to limit the size of each share below this figure.

  • Azure File Storage supports up to 2000 concurrent connections per shared file.

  • supports

    • Server Message Block (SMB) file sharing is commonly used across multiple operating systems (Windows, Linux, macOS).

    • Network File System (NFS) shares are used by some Linux and macOS versions. To create an NFS share, you must use a premium tier storage account and create and configure a virtual network through which access to the share can be controlled.

Tables - key-value storage for applications that need to read and write data values quickly.

  • An Azure Table enables you to store semi-structured data.

  • Azure Table Storage splits a table into partitions. Partitioning is a mechanism for grouping related rows, based on a common property or partition key.

  • Partitions are independent from each other, and can grow or shrink as rows are added to, or removed from, a partition. A table can contain any number of partitions.

  • When you search for data, you can include the partition key in the search criteria. This helps to narrow down the volume of data to be examined, and improves performance by reducing the amount of I/O (input and output operations, or reads and writes) needed to locate the data.

Azure Data Lake Storage Gen2

  • Azure Data Lake Store (Gen1) is a separate service for hierarchical data storage for analytical data lakes, often used by so-called big data analytical solutions that work with structured, semi-structured, and unstructured data stored in files.

  • Systems like Hadoop in Azure HDInsight, Azure Databricks, and Azure Synapse Analytics can mount a distributed file system hosted in Azure Data Lake Store Gen2 and use it to process huge volumes of data.

Azure Stream Analytics

Azure Stream Analytics is a real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.

Azure Synapse Analytics

Azure Synapse Analytics is a comprehensive, unified data analytics solution that provides a single service interface for multiple analytical capabilities, including:

  • Pipelines - based on the same technology as Azure Data Factory.

  • SQL - a highly scalable SQL database engine, optimized for data warehouse workloads.

  • Apache Spark - an open-source distributed data processing system that supports multiple programming languages and APIs, including Java, Scala, Python, and SQL.

  • Azure Synapse Data Explorer - a high-performance data analytics solution that is optimized for real-time querying of log and telemetry data using Kusto Query Language (KQL).

Fully managed data warehouse with integral security at every level of scale at no extra cost.

Azure Databricks

Azure Databricks is an Azure-integrated version of the popular Databricks platform, which combines the Apache Spark data processing platform with SQL database semantics and an integrated management interface to enable large-scale data analytics.

  • insights from data and build AI solutions

  • supports Python, Scala, R, Java, and SQL

  • supports datascience framework and libraries including Tensorflow, Pytorch and Scikit-learn.

Azure HDInsight

Azure HDInsight is an Azure service that provides Azure-hosted clusters for popular Apache open-source big data processing technologies, including:

  • Apache Spark - a distributed data processing system that supports multiple programming languages and APIs, including Java, Scala, Python, and SQL.

  • Apache Hadoop - a distributed system that uses MapReduce jobs to process large volumes of data efficiently across multiple cluster nodes. MapReduce jobs can be written in Java or abstracted by interfaces such as Apache Hive - a SQL-based API that runs on Hadoop.

  • Apache HBase - an open-source system for large-scale NoSQL data storage and querying.

  • Apache Kafka - a message broker for data stream processing.

  • Apache Storm - an open-source system for real-time data processing through a topology of spouts and bolts.

Azure Data Factory

  • Azure Data Factory is an Azure service that enables you to define and schedule data pipelines to transfer and transform data.

  • You can integrate your pipelines with other Azure services, enabling you to ingest data from cloud data stores, process the data using cloud-based compute, and persist the results in another data store.

Last updated