```python
# Code to list the contents
dbutils.fs.ls("/")
dbutils.fs.ls("/mnt/")
dbutils.fs.ls("/mnt/destination/")
```
```python
# Code to mount the storage account and view a file.

# Input values
storage_account_name = "jbadbstorageaccount"
container_name = "destination"
file_name = "DF.csv"
mount_point = "/mnt/destination"
access_key = "{Access_key}"

# Unmount if already mounted
if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)

# Mount the storage account
dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point=mount_point,
    extra_configs={f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": access_key}
)

# Read the CSV file into a Spark DataFrame
df = spark.read.csv(f"{mount_point}/{file_name}", header=True, inferSchema=True)

# Display the DataFrame
display(df)
```
In this part of our Azure Databricks Series, we're focusing on a crucial step in data processing: accessing external data stored in Azure Blob Storage. You might have structured or unstructured data stored in Azure, and now you want to bring it into your Databricks environment for analysis. We'll show you exactly how to:
Mount your Azure Storage account to easily access your data.
List the files and folders inside the mounted container.
Read a CSV file from your Azure Storage account and analyze it directly in Databricks.
Whether you're a data engineer, data scientist, or anyone working with big data, this video will streamline your access to cloud-based data and help you kickstart your analytics workflows with ease. 🎉
2. Why Mount an Azure Storage Account in Databricks? 🤔
When you're working in Databricks, you often need access to external data sources, and Azure Blob Storage is one of the most common cloud storage services for structured, semi-structured, and unstructured data. Instead of downloading and manually uploading files, mounting the Azure Storage account allows you to seamlessly access your files as if they were local to your Databricks workspace. 🔗
Advantages of Mounting:
Seamless access to files without having to upload/download them.
Reduced latency when accessing and processing large datasets.
Centralized data storage, especially useful in data pipelines and ETL processes.
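For comparison, you can reach Blob Storage without a mount by registering the account key on the Spark session and spelling out the full wasbs:// URL on every read. Here's a minimal sketch of both styles, with the account, container, and key names as placeholders:

```python
# Direct access without a mount: set the account key for this Spark session
# ({YourStorageAccountName}, {YourStorageAccountKey}, {YourContainerName} are placeholders)
spark.conf.set(
    "fs.azure.account.key.{YourStorageAccountName}.blob.core.windows.net",
    "{YourStorageAccountKey}"
)

# Every read must then spell out the full wasbs:// URL...
df = spark.read.csv(
    "wasbs://{YourContainerName}@{YourStorageAccountName}.blob.core.windows.net/DF.csv",
    header=True
)

# ...whereas a mounted container is addressable like a local path:
df = spark.read.csv("/mnt/{YourContainerName}/DF.csv", header=True)
```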
So, let's get started with the mounting process! 🚀
3. Prerequisites for This Tutorial ✔️
Before we jump into the technical steps, here’s what you’ll need to follow along:
Azure Databricks Workspace: Ensure you have an active workspace set up. If not, check out our earlier videos on how to create one. 🏢
Azure Storage Account: Create a storage account in Azure and add a container if you haven’t done so already. 🏗️
Access Keys for the Storage Account: You’ll need the access key for the storage account to authenticate and mount it in Databricks. 🔑
Databricks Cluster: Ensure your Databricks cluster is up and running. ☁️
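With a notebook attached to a running cluster, a quick sanity check is to list the mounts that already exist in your workspace (a few Databricks defaults typically show up out of the box):

```python
# Show every mount point currently configured in the workspace
display(dbutils.fs.mounts())
```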
4. Step 1: Setting Up Your Azure Storage Account 🏗️
In this section, we’ll quickly cover how to set up an Azure Storage account and get the necessary access keys to mount it in Databricks.
Log in to the Azure Portal (portal.azure.com). 🔐
Navigate to Storage Accounts and click Create.
Fill in the required fields (Subscription, Resource Group, Storage Account Name, Region, and Performance tier).
Click Review + Create to deploy your storage account. ✅
Now that your storage account is created, go to the Access keys section and copy the Key1 value. You'll need this to authenticate Databricks. 🔑
Pro Tip: Store your keys securely using Azure Key Vault for better security! 🔒
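For example, with a Key Vault-backed secret scope configured in your workspace, the key never needs to appear in the notebook at all. A minimal sketch, where the scope and secret names are hypothetical:

```python
# Fetch the access key from a secret scope instead of hard-coding it
# (scope "azure-kv" and secret "storage-account-key" are placeholder names)
storage_account_key = dbutils.secrets.get(scope="azure-kv", key="storage-account-key")
```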
5. Step 2: Mounting Azure Storage in Databricks 🔗
Now that we have the storage account ready, let’s mount it in Databricks using the following steps:
Launch your Databricks workspace and create a new notebook. 📝
Start by running the following Python code to configure your storage account:
```python
# Define the storage account details
storage_account_name = "{YourStorageAccountName}"
storage_account_key = "{YourStorageAccountKey}"
container_name = "{YourContainerName}"

# Configure the mount point in Databricks
dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point=f"/mnt/{container_name}",
    extra_configs={f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_key}
)
```
This code will mount your Azure Storage container in Databricks at the specified mount point.
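Note that dbutils.fs.mount raises an exception if the mount point is already in use, so re-running the cell fails on the second pass. One defensive pattern, mirroring the unmount check from the full script at the top of this post, is:

```python
# Make the mount idempotent: unmount first if the mount point is already taken
mount_point = f"/mnt/{container_name}"
if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)

dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point=mount_point,
    extra_configs={f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_key}
)
```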
Verify the mount by listing the contents of the mounted storage:
```python
# List the contents of the mounted storage
display(dbutils.fs.ls(f"/mnt/{container_name}"))
```
If everything works well, you should see a list of files and folders inside your storage container. 🎉
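When you no longer need the mount, or before rotating your access keys, you can detach the container again; a quick sketch using standard dbutils calls:

```python
# Detach the container from the workspace file system
dbutils.fs.unmount(f"/mnt/{container_name}")

# Ask running clusters to refresh their view of the mount table
dbutils.fs.refreshMounts()
```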