Azure data factory – MYSQL to Azure SQL

The following steps describe how to move data from on premise MYSQL server to MSSQL on Azure.

Every data factory job has 4 key components –

Gateway, Linked services, Source and Pipeline.

Gateway here is what provides access to your MYSQL server. Usually when setting up data factory on azure portal, you will get a link to download and install gateway on the server. I did the other way around. Went to this link to install the gateway https://www.microsoft.com/en-au/download/details.aspx?id=39717

then copied the key from portal for the data factory and in few seconds i could see on the portal that data factory shows 1 gateway online.

I have one VM with a static IP that can connect to all the test DBs and gateway installed. I just keep changing the key and keep connecting to different data factory.

Next are your linked service. These are like your connection strings to the various servers. I have below added the source and destination exmaples for MYSQL source and MSSQL azure target. But you can always change them. You get more details on how to change JSON here based on source and target here : https://msdn.microsoft.com/en-us/library/azure/dn835050.aspx

{
“name”: “SOURCE”,
“properties”: {
“description”: “”,
“server”: “127.0.0.1”,
“database”: “DBName”,
“schema”: “”,
“authenticationType”: “Basic”,
“username”: “user_id”,
“password”: “**********”,
“gatewayName”: “GatewayName-CheckPortal”,
“encryptedCredential”: null
“type”: “OnPremisesMySqlLinkedService”
}
}

 

{
“name”: “TARGET”,
“properties”: {
“description”: “”,
“connectionString”: “Data Source=tcp:azuresql.database.windows.net,1433;Initial Catalog=DBName;User ID=User_Id;Password=**********;Encrypt=True;TrustServerCertificate=False;Application Name=\”Azure Data Factory Linked Service\””,
“type”: “AzureSqlLinkedService”
}
}

 

Next are your datasets. Here you are defining the location of data within your source and target. the main property here is location which gives details on dataset type and name, and then is tied back to the service. So here your are defining your source of data within the servers defined above. Again visit this link incase you want to change your endpoints : https://msdn.microsoft.com/en-us/library/azure/dn835050.aspx

{
“name”: “Source”,
“properties”: {
“published”: false,
“location”: {
“type”: “RelationalTableLocation”,
“tableName”: “tableName”,
“linkedServiceName”: “SOURCE”
},
“availability”: {
“frequency”: “Hour”,
“interval”: 1,
“waitOnExternal”: {}
}
}
}

{
“name”: “Target”,
“properties”: {
“published”: false,
“location”: {
“type”: “AzureSqlTableLocation”,
“tableName”: “TableName”,
“linkedServiceName”: “TARGET”
},
“availability”: {
“frequency”: “Hour”,
“interval”: 1
}
}
}

What connects this in the end is the pipeline. The type here is ‘CopyActivity’ since we are moving copying data, source query has the data we want to move and then target has procedure name we are calling to move data to azure SQL. You can keep this simple by just giving the table name on azure SQL. Example : https://msdn.microsoft.com/en-us/library/azure/34d563cf-1163-47e5-96b8-9c7aec5f37d2#TableSink

{
“name”: “Pipeline_MySQL_To_AzureSQL”,
“properties”: {
“activities”: [
{
“type”: “CopyActivity”,
“transformation”: {
“source”: {
“type”: “RelationalSource”,
“query”: “select * from tableName limit 1000;”
},
“sink”: {
“type”: “SqlSink”,
“sqlWriterStoredProcedureName”: “spOverwriteSomeName”,
“sqlWriterTableType”: “SomeTableType”,
“writeBatchSize”: 0,
“writeBatchTimeout”: “00:00:00”
}
},
“inputs”: [
{
“name”: “Source”
}
],
“outputs”: [
{
“name”: “Target”
}
],
“policy”: {
“timeout”: “01:00:00”,
“concurrency”: 1,
“executionPriorityOrder”: “NewestFirst”,
“retry”: 2
},
“name”: “MySQLToBlobCopyActivity”
}
],
“start”: “2015-07-12T13:00:00Z”,
“end”: “2015-07-12T16:00:00Z”,
“isPaused”: false
}
}

 

few notes from my experience :

make sure to add waitforexternal in your linked service, else you will see pending execution or pending validation on the portal. If you dont know what to add there, just keep it blank and defaults will be picked up.

make sure the gateway has mysql connector 6.6.5 installed if you are trying to connect to mysql server. See this https://msdn.microsoft.com/en-us/library/mt171579.aspx and then get this http://dev.mysql.com/downloads/file.php?id=412152

i also had the target tables ready with proper clustered index so that no error is thrown in case source didnt have a clustered index.