Azure SQL Data Warehouse

Data: https://rbdevdwhstorage.blob.core.windows.net/rbdevdwhtestdata/FlightData.csv

In my third example, I explored SQL Data Warehouse on Azure.

The idea was to take a CSV with half a million rows, process it using SQL Data Warehouse, and see how it performs.

I provisioned an Azure Storage account where I saved my test data, and an Azure SQL Data Warehouse at 100 DWUs (the smallest size available for data warehouses).

Snapshot of the data stored in Azure Storage:

w1

Azure SQL Data Warehouse in the portal. Copy the server name from here and open SQL Server Management Studio.

w2.jpg

Using SQL Server Management Studio, connect to this data warehouse just as you would to any SQL Server database.

w3.jpg

Once you are connected, open a new query window.

The first step is to set up a secure connection to the source data, in this case a CSV with half a million rows.

CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCreds
WITH IDENTITY = '[identityName]'
, SECRET = '[azureStorageAccountKey]'
;

CREATE EXTERNAL DATA SOURCE azure_storage
WITH
(
TYPE = HADOOP
, LOCATION = 'wasbs://[containername]@[accountname].blob.core.windows.net/[path]'
, CREDENTIAL = AzureStorageCreds
);

Next, define the format of the file we will be reading:

CREATE EXTERNAL FILE FORMAT text_file_format
WITH
(
FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = TRUE
)
);

The next step is to create an external table, which acts like a view over our source data:

CREATE EXTERNAL TABLE FlightData
(
[Year] varchar(255),
[Quarter] varchar(255),
[Month] varchar(255),
[DayofMonth] varchar(255),
[DayOfWeek] varchar(255),
FlightDate varchar(255),
UniqueCarrier varchar(255),
AirlineID varchar(255),
Carrier varchar(255),
TailNum varchar(255),
FlightNum varchar(255),
OriginAirportID varchar(255),
OriginAirportSeqID varchar(255),
OriginCityMarketID varchar(255),
Origin varchar(255),
OriginCityName varchar(255),
OriginState varchar(255),
OriginStateFips varchar(255),
OriginStateName varchar(255),
OriginWac varchar(255),
DestAirportID varchar(255),
DestAirportSeqID varchar(255),
DestCityMarketID varchar(255),
Dest varchar(255),
DestCityName varchar(255),
DestState varchar(255),
DestStateFips varchar(255),
DestStateName varchar(255),
DestWac varchar(255),
CRSDepTime varchar(128),
DepTime varchar(128),
DepDelay varchar(128),
DepDelayMinutes varchar(128),
DepDel15 varchar(255),
DepartureDelayGroups varchar(255),
DepTimeBlk varchar(255),
TaxiOut varchar(255),
WheelsOff varchar(255),
WheelsOn varchar(255),
TaxiIn varchar(255),
CRSArrTime varchar(128),
ArrTime varchar(128),
ArrDelay varchar(255),
ArrDelayMinutes varchar(255),
ArrDel15 varchar(255),
ArrivalDelayGroups varchar(255),
ArrTimeBlk varchar(255),
Cancelled varchar(255),
CancellationCode varchar(255),
Diverted varchar(255),
CRSElapsedTime varchar(255),
ActualElapsedTime varchar(255),
AirTime varchar(255),
Flights varchar(255),
Distance varchar(255),
DistanceGroup varchar(255),
CarrierDelay varchar(255),
WeatherDelay varchar(255),
NASDelay varchar(255),
SecurityDelay varchar(255),
LateAircraftDelay varchar(255),
FirstDepTime varchar(255),
TotalAddGTime varchar(255),
LongestAddGTime varchar(255),
DivAirportLandings varchar(255),
DivReachedDest varchar(255),
DivActualElapsedTime varchar(255),
DivArrDelay varchar(255),
DivDistance varchar(255),
Div1Airport varchar(255),
Div1AirportID varchar(255),
Div1AirportSeqID varchar(255),
Div1WheelsOn varchar(255),
Div1TotalGTime varchar(255),
Div1LongestGTime varchar(255),
Div1WheelsOff varchar(255),
Div1TailNum varchar(255),
Div2Airport varchar(255),
Div2AirportID varchar(255),
Div2AirportSeqID varchar(255),
Div2WheelsOn varchar(255),
Div2TotalGTime varchar(255),
Div2LongestGTime varchar(255),
Div2WheelsOff varchar(255),
Div2TailNum varchar(255),
Div3Airport varchar(255),
Div3AirportID varchar(255),
Div3AirportSeqID varchar(255),
Div3WheelsOn varchar(255),
Div3TotalGTime varchar(255),
Div3LongestGTime varchar(255),
Div3WheelsOff varchar(255),
Div3TailNum varchar(255),
Div4Airport varchar(255),
Div4AirportID varchar(255),
Div4AirportSeqID varchar(255),
Div4WheelsOn varchar(255),
Div4TotalGTime varchar(255),
Div4LongestGTime varchar(255),
Div4WheelsOff varchar(255),
Div4TailNum varchar(255),
Div5Airport varchar(255),
Div5AirportID varchar(255),
Div5AirportSeqID varchar(255),
Div5WheelsOn varchar(255),
Div5TotalGTime varchar(255),
Div5LongestGTime varchar(255),
Div5WheelsOff varchar(255),
Div5TailNum varchar(255)
)
WITH
(
LOCATION = '/',
DATA_SOURCE = azure_storage,
FILE_FORMAT = text_file_format,
REJECT_TYPE = VALUE,
REJECT_VALUE = 100000
);
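
At this point the external table can be queried directly; each query reads the CSV from blob storage on the fly. A quick sanity check before loading:

SELECT TOP 10 * FROM FlightData;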

Once this is done, we run one final command to pull the data into the data warehouse:

CREATE TABLE FlightDataStaging
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM FlightData;
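
ROUND_ROBIN spreads the rows evenly across all distributions, which is a safe default for a staging table. If you know which column later joins or aggregations will key on, a HASH distribution can reduce data movement at query time. A minimal sketch, assuming AirlineID is such a column (the table name here is hypothetical):

CREATE TABLE FlightDataByAirline
WITH (DISTRIBUTION = HASH(AirlineID))
AS
SELECT * FROM FlightData;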

Once this is done, let's run a GROUP BY query over all the data and see how long it takes.
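
The exact query I ran is in the screenshot below; a representative aggregation would look something like this (TRY_CAST is used because every column was loaded as varchar):

SELECT Carrier,
COUNT(*) AS TotalFlights,
AVG(TRY_CAST(DepDelayMinutes AS float)) AS AvgDepDelayMinutes
FROM FlightDataStaging
GROUP BY Carrier
ORDER BY TotalFlights DESC;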

w4.jpg

You will notice it took barely two seconds.

If we want, we can also connect the data warehouse to Power BI and display the results.

I ran a similar report to the one above, and it came back almost instantly.

w5.jpg

Azure SQL Backup to Azure Storage

Use the following script to back up an Azure SQL database to Azure Storage.
Replace the database and storage details in the script below.
You will also need to import your Azure publish settings file to run this script;
refer to this article on how to download and import the publish settings file:

https://devopsandcloud.wordpress.com/2017/01/21/download-and-import-publish-settings-and-subscription-information/

# Check if Windows Azure PowerShell is available
try {
Import-Module Azure -ErrorAction Stop
} catch {
throw "Windows Azure PowerShell not found! Please make sure to install it from http://www.windowsazure.com/en-us/downloads/#cmd-line-tools"
}

Import-AzurePublishSettingsFile "C:\jenkinsjobs\Pay-As-You-Go.publishsettings" # replace with your publish settings file path

$DatabaseServerName = "azureservername.database.windows.net"
$DatabaseName = "DBName"
$DatabasePassword = "azure sql password"
$DatabaseUsername = "azure sql user"
$StorageName = "storage name"
$StorageKey = "storage key"
$StorageContainerName = "containername"
$dateTime = Get-Date -Format u
$blobName = "$DatabaseName.$dateTime.bacpac"
Write-Host "Using blobName: $blobName"

# Create Database Connection
$securedPassword = ConvertTo-SecureString -String $DatabasePassword -asPlainText -Force
$serverCredential = new-object System.Management.Automation.PSCredential($DatabaseUsername, $securedPassword)
$databaseContext = New-AzureSqlDatabaseServerContext -FullyQualifiedServerName $DatabaseServerName -Credential $serverCredential

# Create Storage Connection
$storageContext = New-AzureStorageContext -StorageAccountName $StorageName -StorageAccountKey $StorageKey

# Initiate the Export
$operationStatus = Start-AzureSqlDatabaseExport -StorageContext $storageContext -SqlConnectionContext $databaseContext -BlobName $blobName -DatabaseName $DatabaseName -StorageContainerName $StorageContainerName

# Wait for the operation to finish
do {
if ($operationStatus)
{
$status = Get-AzureSqlDatabaseImportExportStatus -Request $operationStatus
if ($status) {
Start-Sleep -s 3
$progress = $status.Status.ToString()
Write-Host "Waiting for database export completion. Operation status: $progress"
}
else
{
Write-Host "Null status. Awaiting updates."
}
}
} until ($status.Status -eq "Completed")
Write-Host "Database export is complete"

Unpack bacpac Files and Ease Migrations

There were days when I spent hours watching my bacpac files slowly restore and then suddenly crash. :/ I have often seen a table with a large amount of data fail partway through a restore, and I had no way forward because I didn't know what to do with bacpac files.

But after some reading I found that the bacpac file was just a renamed .zip file and could be extracted and perhaps even edited. This gave me a way out. I extracted the bacpac file to see what I could find.

Here, nicely laid out, is the package that makes up a bacpac file. Opening the Data\ folder shows the data targeted for each table, arranged by table name.

bacpac

This made-up set of tables shows the layout. Simply pick the tables holding expendable data, such as log entries, error logs, traces, etc., open their folders, and delete the .BCP files they contain.

Now just re-compress those files, rename the resulting .zip file to .bacpac, and you are good to go. One note: when you re-zip, do so from the root of the extracted bacpac folder, not from the parent folder you extracted into. If you compress the whole folder, the file nesting will be off and you will get an error on import.

External Connection to a Windows Azure VM SQL Server

1. Within Windows Azure, navigate to 'ENDPOINTS' and select 'ADD' at the bottom of the page.

2. You will need to open a port in order to allow access to the SQL Server instance. I'm going to stick with the default port of 1433 and create a new endpoint here. Make sure both checkboxes (circled below) are NOT ticked.

3. Next up we need to configure the firewall on the server itself. Open an RDP session and log into the Azure VM. Open Control Panel > Windows Firewall and configure a new inbound rule in the Windows firewall for port 1433:

4. Select a rule type of 'Port'. The default port for SQL Server is 1433, but it can differ depending on how the SQL Server properties have been set up. Searching for 'TCP settings for SQL Server' will tell you where to find or change the port number.

After clicking Next, the options are as follows:

· Select TCP and specify port 1433

· Next, select 'Allow the connection'

· Then, select when you want the rule applied; typically 'Public', which enables external connections to the port

· Lastly, give the rule a name (e.g., SQL Port)

5. The next step is to verify that SQL Server is enabled to use the TCP/IP protocol. Ensure that 'Named Pipes' is set to DISABLED. You can change all of these settings using SQL Server Configuration Manager.
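
Once you can connect, one way to confirm from T-SQL that your session is actually coming in over TCP is to query a standard system DMV:

SELECT net_transport
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;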

6. Check that SQL Server is configured to use mixed mode authentication. Right-click the server (within SSMS), go to the 'Security' page, and verify that the 'SQL Server and Windows Authentication mode' radio button is selected.
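
You can also verify the authentication mode with a quick query; SERVERPROPERTY returns 0 for mixed mode and 1 for Windows authentication only:

SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS WindowsAuthOnly;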

7. Create a test login, for example as sketched below.
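
A minimal sketch (the login name and password here are placeholders; use your own):

-- run CREATE LOGIN in master, CREATE USER in the target database
CREATE LOGIN testlogin WITH PASSWORD = 'Str0ngP@ssword!';
CREATE USER testuser FOR LOGIN testlogin;
-- grant read access for testing
EXEC sp_addrolemember 'db_datareader', 'testuser';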

8. Restart SQL Server. It is important that the service is restarted; otherwise, any changes to the TCP or Named Pipes settings will not have been committed.

9. The final step is to log in from your local SSMS. Specify the full server name (taken from Azure) and the credentials of the new user you created.