from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.getOrCreate()

# Define the storage account and container details
storage_account_name = 'jbadbstoracc'
# Read the account key from a Databricks secret scope instead of hard-coding it
# ("your_scope" and "your_key" are placeholders)
storage_account_key = dbutils.secrets.get(scope="your_scope", key="your_key")
container_name = 'source'

# Define the configuration for the Azure Storage account
spark.conf.set("fs.azure.account.key." + storage_account_name + ".dfs.core.windows.net", storage_account_key)

# Define the path to the CSV file
file_path = "abfss://" + container_name + "@" + storage_account_name + ".dfs.core.windows.net/" + "HDFCBANK.csv"

# Read the CSV file into a DataFrame
df = spark.read.format('csv').option('header', 'true').load(file_path)

# Display the DataFrame
df.show()
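The string concatenation in the snippet above is easy to get subtly wrong (a missing "@" or a wrong endpoint suffix). As a minimal sketch, the two strings can be built by small helpers; `account_key_conf` and `abfss_path` are hypothetical names introduced here for illustration, not part of any Databricks or Azure API.

```python
def account_key_conf(storage_account_name: str) -> str:
    """Spark config name that holds the account key for the ABFS (dfs) endpoint."""
    return f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net"


def abfss_path(container_name: str, storage_account_name: str, relative_path: str = "") -> str:
    """Build an abfss:// URI for a file or folder inside a container."""
    return (
        f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


# Usage in a notebook (spark and storage_account_key come from the snippet above):
# spark.conf.set(account_key_conf("jbadbstoracc"), storage_account_key)
# df = spark.read.format("csv").option("header", "true").load(
#     abfss_path("source", "jbadbstoracc", "HDFCBANK.csv")
# )
```

Keeping the account name in one place means the config key and the file path cannot drift apart.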
🔑 Why Security Matters for Databricks and Storage Accounts
Before we jump into the step-by-step guide, let's talk about why security is such a critical factor when connecting Databricks to Azure Storage. In most scenarios, organizations store sensitive data such as customer information, proprietary data, or intellectual property in cloud-based storage solutions like Azure Blob Storage or Azure Data Lake Storage. 📊
Leaving this storage account publicly accessible, even by mistake, can result in severe data breaches, non-compliance with regulations like GDPR, and financial losses. 💸📉 That’s where Private and Service Endpoints come into play! These endpoints let Databricks reach storage accounts over the Azure backbone network instead of the public internet. 🌐🚫 Let’s break down both methods.
🎯 Method 1: Accessing Storage via Private Endpoint
🔍 What is a Private Endpoint?
A Private Endpoint in Azure enables secure connectivity between Azure Databricks and your Azure Storage Account through a private IP address. 🏠 This IP address is allocated to your storage account from the Azure Virtual Network (VNet), meaning your traffic stays within the Azure backbone network and never traverses the public internet. 🌍
This ensures that:
Your storage account can be made inaccessible from the public internet once public network access is disabled ❌🌐
Data transfers between your Databricks workspace and storage are isolated and secure 🔒
Enhanced security posture with VNet integration 🛡️
👨‍🏫 Why Use a Private Endpoint?
Private Endpoints are ideal if:
Security compliance is a top priority.
You need to enforce network isolation for regulatory or organizational policies.
You want to prevent data exfiltration risks that come with publicly exposed endpoints.
💻 How to Set Up a Private Endpoint
Let’s get into the steps:
Create a Private Endpoint for the Storage Account:
Go to your Azure Portal and navigate to your Storage Account.
Under Networking, choose Private Endpoints.
Click on Add Private Endpoint, and follow the wizard to select your VNet and subnet.
Configure DNS Settings:
Ensure that your Azure DNS or Custom DNS resolves the storage account's FQDN to the private endpoint's private IP address.
Access from Azure Databricks:
In your Databricks notebook, use the following code to mount the storage account:
storage_account_name = "your_storage_account"
container_name = "your_container"
mount_point = "/mnt/your_mount"

dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point=mount_point,
    extra_configs={
        # Build the config key from the variable so it always matches the account in `source`
        f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net":
            dbutils.secrets.get(scope="your_scope", key="your_key")
    },
)
Validate the Connection:
Once mounted, list the files in the storage account to verify the connection:
display(dbutils.fs.ls("/mnt/your_mount"))
With these steps complete, your storage access is isolated through a Private Endpoint, offering a much higher level of security than public access. 🚀🔐
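The DNS step above can be sanity-checked from a notebook cell before mounting anything. The sketch below resolves a hostname and reports whether every returned address is private; `resolve_ips`, `is_private_ip`, and `check_private_resolution` are hypothetical helper names, and the hostname in the usage comment is the placeholder from the mount example. It only assumes that a correctly configured private endpoint makes the storage FQDN resolve to a private address from inside the VNet.

```python
import ipaddress
import socket


def resolve_ips(hostname: str) -> list:
    """Return the unique IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private_ip(ip: str) -> bool:
    """True for private ranges such as 10.0.0.0/8 (also loopback and link-local)."""
    return ipaddress.ip_address(ip).is_private


def check_private_resolution(hostname: str) -> bool:
    """True if the hostname resolves and every resolved address is private."""
    ips = resolve_ips(hostname)
    return bool(ips) and all(is_private_ip(ip) for ip in ips)


# Usage in a notebook attached to a cluster in the VNet:
# check_private_resolution("your_storage_account.blob.core.windows.net")
```

If this returns False from the cluster, the name is likely still resolving to the public endpoint, which points at the DNS configuration rather than at the mount code.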
🎯 Method 2: Accessing Storage via Service Endpoint
🔍 What is a Service Endpoint?
A Service Endpoint allows you to extend your VNet identity to Azure services, like Azure Storage, over Azure’s backbone network. 📡 However, unlike Private Endpoints, Service Endpoints do not give the storage account a private IP: it keeps its public endpoint, and you restrict access by allowing only selected VNet subnets in the storage firewall, while the traffic flows through the secured Azure backbone network. 🔄
While this may seem like a small distinction, the key benefit here is simplicity. Service Endpoints are easier to configure and manage, especially if you're handling multiple services and need a quick but secure way to connect to Azure Storage. ⚡
👨‍🏫 Why Use a Service Endpoint?
Service Endpoints are ideal for scenarios where:
You need less stringent network isolation but still want to ensure that data traverses securely via the Azure backbone.
You want to simplify the configuration process.
You need high-performance connectivity without managing private DNS configurations.
💻 How to Set Up a Service Endpoint
Let’s go through the steps to set up a Service Endpoint:
Py4JJavaError: An error occurred while calling o616.load.
: Operation failed: "This request is not authorized to perform this operation.", 403
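A 403 like the one above is what a notebook read returns when the storage account rejects the request. The source does not say what caused it; a common assumption is that the storage firewall does not allow the cluster's subnet, or that the credential is missing or wrong. As a hedged sketch, the read can be wrapped so that this case surfaces with a clearer message; `is_authorization_failure` and `read_csv_or_explain` are hypothetical helpers, and the string matching is a heuristic based only on the error text shown above.

```python
def is_authorization_failure(error: Exception) -> bool:
    """Heuristic: does the error text look like a storage 403 authorization failure?"""
    text = str(error)
    return "403" in text and "not authorized" in text.lower()


def read_csv_or_explain(spark, path: str):
    """Read a CSV with a header row; re-raise storage 403s with a clearer hint."""
    try:
        return spark.read.format("csv").option("header", "true").load(path)
    except Exception as error:
        if is_authorization_failure(error):
            # Assumed causes, not confirmed by the source: network rules or credentials.
            raise PermissionError(
                f"Storage returned 403 for {path}. Check the storage account's "
                "network rules (allowed subnets, private endpoint) and the credential."
            ) from error
        raise
```

Any other exception is re-raised unchanged, so unrelated failures such as a missing file are not masked.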