Azure Data Lake Storage, or Data Lake for short, is a system or repository that holds huge quantities of structured or unstructured data in its natural format (i.e., object blobs or files).
Data lakes handle the three Vs of big data: Volume, Velocity, and Variety.
Let’s dig deeper into the topics related to Azure Data Lake and the pros and cons of ADLS Gen1 vs Gen2.
What is Azure Data Lake Storage?
It is part of Microsoft Azure, a public cloud platform that supports big data storage and analytics of any kind and size.
Data Lake Storage is a storage solution designed specifically for big data analytics.
The way it works is simple. Underneath, every Data Lake storage account organizes its data into containers.
A container is also called a file system, and just like any file system it holds folders and files. It is fully scalable and secure, and it supports HDFS semantics, so it works with the Apache Hadoop ecosystem.
Each data lake account can have multiple containers, that is, multiple file systems, each containing whatever structure of folders and files you want.
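To make the container-as-file-system idea concrete, the following minimal Python sketch builds the URI that Hadoop-compatible tools use to address a path inside an ADLS Gen2 container through the ABFS driver. It assumes the documented `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>` layout; the function name `abfs_uri` and the account, container, and path names are hypothetical examples.

```python
def abfs_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a folder or file inside an ADLS Gen2 container.

    The container (file system) sits before the '@', the account's DFS
    endpoint after it, and the folder/file path follows, as in HDFS.
    """
    scheme = "abfss" if secure else "abfs"  # "abfss" = ABFS over TLS
    clean_path = path.strip("/")            # avoid doubled or trailing slashes
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{clean_path}"


# One hypothetical account ("mydatalake") holding two containers (file systems).
print(abfs_uri("mydatalake", "raw", "sales/2023/orders.csv"))
print(abfs_uri("mydatalake", "curated", "/sales/summary/"))
print(abfs_uri("mydatalake", "raw"))  # root of the "raw" container
```

Each container gets its own URI namespace, so several file systems can sit side by side in one account.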
What are the features of Azure Data Lake?
Some notable features of Azure Data Lake Storage are:
- Virtually unlimited amounts of data can be stored in a single repository.
- Both structured and unstructured data can be stored in their natural formats.
- It offers high availability, durability, and reliability.
There are two generations of Azure Data Lake Storage: ADLS Gen1 and ADLS Gen2. Let’s look at each of them in detail.
Azure Data Lake Storage (ADLS) Gen1 vs Gen2
- Azure Data Lake Gen1 was widely used earlier, but Gen2 is now the common choice. Microsoft has announced that Gen1 will be retired on February 29, 2024, so anyone using Azure Data Lake Gen1 has to migrate to Azure Data Lake Gen2 by that date.
- ADLS Gen1 can be accessed from Hadoop using WebHDFS-compatible REST APIs.
- It offers enterprise-grade capabilities such as security, manageability, scalability, reliability, and availability.
- ADLS Gen2 is designed for big data analytics. Hadoop tools access it through the Azure Blob File System (ABFS) driver, whose abfss:// scheme encrypts data in transit with TLS.
- This driver is compatible with many existing solutions on the market, so they can connect without hassle.
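For Gen1, the WebHDFS-compatible REST API mentioned in the list addresses files with URLs of the form `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>`. The short Python sketch below only builds such a URL; actually calling it also requires an Azure AD OAuth bearer token in the `Authorization` header, which is not shown, and it only applies while a Gen1 account still exists. The function name `gen1_webhdfs_url` and the account name are hypothetical.

```python
from urllib.parse import quote, urlencode


def gen1_webhdfs_url(account: str, path: str, op: str) -> str:
    """Build a WebHDFS-style REST URL for an ADLS Gen1 account.

    `op` is a WebHDFS operation name such as LISTSTATUS, GETFILESTATUS or OPEN.
    """
    encoded_path = quote(path.lstrip("/"))  # keeps "/" but escapes spaces etc.
    query = urlencode({"op": op})
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{encoded_path}?{query}"


# List the contents of a folder in a hypothetical Gen1 account.
print(gen1_webhdfs_url("mygen1store", "/sales/2023", "LISTSTATUS"))
```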
In practice, most systems connect to the data lake with almost no issues: out of the box, a few lines of configuration code are usually all you need.
Another feature worth highlighting is multi-protocol access. Besides the Azure Blob File System, you can also reach the same data through the Blob APIs, including the Windows Azure Storage Blob (WASB) driver.
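Multi-protocol access means one file has two addresses. The minimal Python sketch below, assuming the standard `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>` and `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` URI formats, rewrites an ABFS URI into the WASB URI for the same file. The function `abfs_to_wasb` and the sample names are hypothetical, and it only rewrites strings: whether a given tool can use the WASB path still depends on the account and driver configuration.

```python
from urllib.parse import urlsplit


def abfs_to_wasb(uri: str) -> str:
    """Map an abfs(s):// URI to the wasb(s):// URI for the same file."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    container = parts.username      # the part before '@' is the container
    host = parts.hostname or ""
    suffix = ".dfs.core.windows.net"
    if not container or not host.endswith(suffix):
        raise ValueError(f"unexpected ABFS URI: {uri}")
    account = host[: -len(suffix)]
    # Keep the TLS choice: abfss pairs with wasbs, abfs with wasb.
    scheme = "wasbs" if parts.scheme == "abfss" else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net{parts.path}"


print(abfs_to_wasb("abfss://raw@mydatalake.dfs.core.windows.net/sales/2023/orders.csv"))
```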
Azure Data Lake is a data lake service provided by Microsoft Azure. It has two components: the first is Data Lake Analytics and the second is Data Lake Store.
Azure Data Lake Gen1 vs Azure Data Lake Gen2
| Azure Data Lake Storage Gen1 | Azure Data Lake Storage Gen2 |
|---|---|
| You can store any amount of data, of any size, for any length of time, and run analytics jobs on it with Data Lake Analytics. | This advanced version offers both storage options: file system storage and object storage. |
| Data is stored as files in a hierarchical file system, split into blocks, which gives good performance and security. | Gen2 combines the security and smooth performance of a file system with the scalability of object storage. |
| Does not support access tiers (such as Hot and Cool) or configurable redundancy options. | Supports access tiers (such as Hot and Cool) and configurable redundancy options. |
How to create Azure Data Lake Storage Gen2?
An ADLS Gen2 account is an Azure Storage account (general-purpose v2) with the hierarchical namespace option enabled. Once the account exists, the next question is how to access it, for example from Azure Databricks:
- All you need is an access key for the ADLS account.
- You can get the access key from the storage account in the Azure portal.
- In your Databricks workspace, create a secret scope and store the access key in it as a secret.
- Then connect the two: your notebook reads the access key from the secret scope instead of hard-coding it.
- With the secret scope and access key in place, Spark can authenticate to the storage account and you can work with the files in your Data Lake.
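On Azure Databricks, the steps above usually come down to one Spark setting: the key `fs.azure.account.key.<account>.dfs.core.windows.net`, whose value is read from the secret scope with `dbutils.secrets.get(scope=..., key=...)`. The sketch below builds that setting in plain Python so it runs outside Databricks. The function `account_key_config`, the scope name `adls-scope`, and the secret name `adls-access-key` are hypothetical, and the secret lookup is passed in as a function standing in for `dbutils.secrets.get`.

```python
from typing import Callable, Dict


def account_key_config(account: str, scope: str, secret_name: str,
                       get_secret: Callable[[str, str], str]) -> Dict[str, str]:
    """Return the Spark setting that lets the ABFS driver use an account key.

    get_secret(scope, secret_name) stands in for dbutils.secrets.get, so the
    access key comes from the secret scope instead of being hard-coded.
    """
    conf_name = f"fs.azure.account.key.{account}.dfs.core.windows.net"
    return {conf_name: get_secret(scope, secret_name)}


# Outside Databricks, fake the secret scope with a dict (placeholder values).
fake_scope = {("adls-scope", "adls-access-key"): "<access-key-from-portal>"}
conf = account_key_config(
    "mydatalake", "adls-scope", "adls-access-key",
    lambda scope, name: fake_scope[(scope, name)],
)
print(conf)

# Inside a Databricks notebook the equivalent would be (not run here):
# for name, value in account_key_config(
#         "mydatalake", "adls-scope", "adls-access-key",
#         lambda s, k: dbutils.secrets.get(scope=s, key=k)).items():
#     spark.conf.set(name, value)
# df = spark.read.csv("abfss://raw@mydatalake.dfs.core.windows.net/sales/")
```

Injecting the secret lookup keeps the key out of notebook source, which is the point of the secret scope.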
Azure Data Lake Storage Gen2 vs Blob storage
Azure Data Lake Storage and Azure Blob storage differ in several ways. Here are the main differences:
- ADLS Gen2 allows more granular access control than Blob storage because it supports POSIX-style access control lists (ACLs) on directories and files.
- ADLS Gen2 has a hierarchical namespace, so it behaves like a real file system with directories. Blob storage has a flat namespace, in which a "folder" is just a prefix in blob names. This hurts Blob storage's performance for big data analytics, because operations on a folder, such as renaming or deleting it, must touch every blob under that prefix.
- ADLS Gen2 is Hadoop-friendly: Hadoop workloads can use the data it stores directly through the ABFS driver, whereas plain Blob storage is less suited to these workloads.
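The performance point about namespaces is easiest to see with a directory rename. The Python sketch below is a deliberately simplified in-memory model, not how Azure Storage is implemented: in the flat model a "folder" is just a name prefix, so renaming it touches every blob under it, while in the hierarchical model renaming a directory re-links a single entry. All names (`rename_flat`, `Directory`, `rename_hierarchical`, the sample paths) are hypothetical.

```python
from typing import Dict, Union


def rename_flat(blobs: Dict[str, bytes], old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: rename a 'folder' by moving every blob under its prefix.

    Returns the number of blob operations (one copy plus one delete per blob).
    """
    ops = 0
    for name in [n for n in blobs if n.startswith(old_prefix)]:
        blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
        ops += 2
    return ops


class Directory:
    """Hierarchical namespace: a directory is a real node holding children."""

    def __init__(self) -> None:
        self.children: Dict[str, Union["Directory", bytes]] = {}


def rename_hierarchical(parent: Directory, old: str, new: str) -> int:
    """Rename a child directory by re-linking one entry; contents are untouched."""
    parent.children[new] = parent.children.pop(old)
    return 1


# The same 1,000 files, stored both ways.
flat = {f"sales/2023/part-{i:04d}.parquet": b"" for i in range(1000)}
print("flat namespace ops:", rename_flat(flat, "sales/2023/", "sales/archive-2023/"))

sales = Directory()
year = Directory()
sales.children["2023"] = year
for i in range(1000):
    year.children[f"part-{i:04d}.parquet"] = b""
print("hierarchical namespace ops:", rename_hierarchical(sales, "2023", "archive-2023"))
```

The flat model's work grows with the number of files, while the hierarchical rename stays a single step; in ADLS Gen2 such a directory rename is also atomic.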
Azure Data Lake Storage and Power BI
To connect Power BI to ADLS Gen2, first launch Power BI Desktop. On the Home tab, select Get data, then More, and go through the following steps.
Select Azure, then Azure Data Lake Storage Gen2, then Connect.
Enter the URL of the account or container you want to read (of the form https://<accountname>.dfs.core.windows.net/<container>), and Power BI connects to your Data Lake Gen2 storage.
Here you again have two options to choose from: the File System View or the CDM (Common Data Model) Folder View. The choice is up to you.
Azure Data Lake Storage architecture
Azure Data Lake is built on Apache YARN, the resource-management layer of Apache Hadoop. Because it is a cloud service built on Hadoop technology, you do not need to install or maintain any hardware of your own.
ADLS is designed to keep data secure while delivering low latency.
Microsoft has also released a Python client library for ADLS Gen2 (the azure-storage-file-datalake package). It exposes the hierarchical namespace on top of Blob storage capabilities and supports atomic operations such as directory renames.
Azure Data Lake Store pricing
Pricing is based on the amount of data you store, with tiered per-GB rates:
- First 100 TB: Rs.2.8098 per GB.
- From 100 TB to 1,000 TB: Rs.2.7378 per GB.
- From 1,000 TB to 5,000 TB: Rs.2.6657 per GB.
If you find this costly, you can also choose monthly packages, which offer a discount of almost 33%.
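To turn the tiers into a bill estimate, the Python sketch below assumes graduated pricing (each rate applies only to the gigabytes inside its band) and 1 TB = 1,024 GB; both are assumptions, and the rates are simply the ones listed in this article, which may differ by region and change over time. The names `TIERS` and `storage_cost` are hypothetical, and the monthly-package discount is not modeled.

```python
# (upper bound of the band in TB, price in Rs. per GB), as listed in this article.
TIERS = [(100, 2.8098), (1000, 2.7378), (5000, 2.6657)]
GB_PER_TB = 1024  # assumption: binary terabytes


def storage_cost(gb: float) -> float:
    """Estimate the storage charge in Rs. for `gb` gigabytes under graduated tiers."""
    if gb < 0:
        raise ValueError("storage amount cannot be negative")
    if gb > TIERS[-1][0] * GB_PER_TB:
        raise ValueError("amount exceeds the tiers listed in this article")
    cost = 0.0
    lower = 0.0
    for upper_tb, price_per_gb in TIERS:
        upper = upper_tb * GB_PER_TB
        in_band = min(gb, upper) - lower  # GB that fall inside this band
        if in_band <= 0:
            break
        cost += in_band * price_per_gb
        lower = upper
    return cost


print(round(storage_cost(500), 2))              # 500 GB, all in the first band
print(round(storage_cost(200 * GB_PER_TB), 2))  # 100 TB in each of the first two bands
```

Under these assumptions, 200 TB costs 102,400 GB × Rs.2.8098 plus 102,400 GB × Rs.2.7378, about Rs.568,074.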
As for limits, ADLS Gen2 storage capacity is effectively unlimited, which makes it a very productive place to store large volumes of data.
Azure Data Lake Storage Gen2 disaster recovery
Disaster recovery for ADLS Gen2 can take some planning and time. Failover has certain limitations that can cause it to fail in some scenarios, so review Microsoft's disaster recovery guidance for ADLS Gen2 on its documentation site to avoid these failures.
ADLS Gen2 is monitored through Azure Monitor, which provides metrics for the storage account that hosts the data lake.
Having covered this much about Azure Data Lake Storage, we can conclude that it is undoubtedly one of the best places to store your data. It offers large-scale storage at low cost, along with many features worth taking advantage of.
Also, if you are using or planning to use Azure Data Lake Gen1, I strongly suggest moving to or choosing Gen2 instead, as Gen1 is going to retire in 2024.
When is Microsoft Retiring ADLS Gen1?
Microsoft has announced that Azure Data Lake Gen1 will be retired on February 29, 2024, so anyone using Gen1 has to migrate to Azure Data Lake Gen2 by that date.
How much does Azure Data Lake Gen1 cost?
For the first 100 TB, you pay Rs.2.8098 per GB; from 100 TB to 1,000 TB, Rs.2.7378 per GB; and from 1,000 TB to 5,000 TB, Rs.2.6657 per GB.