I have been meaning to explore Azure's PaaS offerings for data warehousing and analytics for a while, and I finally decided to work through two or three examples that will help me cover most of them.
In the first example, I use Event Hubs and try some real-time analytics with Azure Stream Analytics jobs. I imagined a big mall where thermometers frequently send temperature readings to Azure Event Hubs. I then use this data to check the temperature every 2 minutes and see whether there is any drop or rise that indicates a problem with the cooling system or heaters.
I started by provisioning an event hub with 4 partitions in Azure.
Then I wrote some C# code that sends seed data to this event hub in the same pattern a real thermometer or thermostat would.
The data was sent in JSON format. Here is an example of the data sent.
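As a rough illustration of what one such JSON reading could look like, here is a minimal Python sketch (the sender in this post was written in C#). The field names `deviceId`, `temperature`, and `eventTime`, the device name, and the 20 to 25 degree range are all assumptions made for the example, not the actual payload used here.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def make_reading(device_id, when, rng):
    """Build one simulated thermometer reading as a JSON string."""
    reading = {
        "deviceId": device_id,  # hypothetical field name
        "temperature": round(rng.uniform(20.0, 25.0), 1),  # assumed range, degrees Celsius
        "eventTime": when.isoformat(),  # hypothetical field name
    }
    return json.dumps(reading)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the seed data is repeatable
    start = datetime(2024, 1, 1, 9, 0, 0, tzinfo=timezone.utc)
    # Emit three readings, ten seconds apart, for one made-up device.
    for i in range(3):
        print(make_reading("thermometer-1", start + timedelta(seconds=10 * i), rng))
```

Each printed line is one message body of the kind a sender could push to the event hub.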
Once I was sure the data was being received by Event Hubs, I wanted to run some real-time stream analytics and get updates every few minutes for each device.
For stream analytics in this example, I went with the Azure PaaS offering, Stream Analytics jobs.
Every Stream Analytics job has three important parts: an input, where the real-time data is picked up (here, an Event Hub); an output, where the results of the analysis are saved (here, Power BI); and a query or function that transforms the input data.
Here is how I set up these three in the Stream Analytics job.
On the Stream Analytics job dashboard in the portal, click Inputs and add these details to attach an event hub as the input.
Click Test to make sure the input endpoint is valid. Then go back to the Stream Analytics job page and click Outputs, and add Power BI as the output.
In this case we will set up a query that computes the minimum, maximum, and average temperature every 2 minutes and publishes the results to Power BI, so that the end user can check whether the temperature stats are OK.
See below for the query:
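To make the logic of a 2-minute aggregation concrete, here is a minimal Python sketch of the same idea: group readings per device into fixed, non-overlapping 2-minute windows and compute the minimum, maximum, and average for each. This is not the Stream Analytics query itself, only a simulation of what such a query computes. The tuple layout and the output key names are assumptions, and the way readings exactly on a window boundary are assigned may differ from how the service handles them.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 120  # two-minute window, matching the interval used in this post


def tumbling_stats(readings, window_seconds=WINDOW_SECONDS):
    """Aggregate (device_id, event_time, temperature) tuples per device per window.

    Windows are fixed and non-overlapping, aligned to the Unix epoch.
    A reading exactly on a boundary goes to the window that starts there
    (a simplification chosen for this sketch).
    """
    buckets = defaultdict(list)
    for device_id, event_time, temperature in readings:
        epoch = int(event_time.timestamp())
        window_start = epoch - epoch % window_seconds
        buckets[(device_id, window_start)].append(temperature)

    results = []
    for (device_id, window_start), temps in sorted(buckets.items()):
        results.append({
            "deviceId": device_id,  # hypothetical output column names
            "windowEnd": datetime.fromtimestamp(window_start + window_seconds, tz=timezone.utc),
            "minTemp": min(temps),
            "maxTemp": max(temps),
            "avgTemp": sum(temps) / len(temps),
        })
    return results


if __name__ == "__main__":
    def at(minute, second):
        return datetime(2024, 1, 1, 9, minute, second, tzinfo=timezone.utc)

    sample = [
        ("thermometer-1", at(0, 10), 20.0),
        ("thermometer-1", at(1, 50), 24.0),
        ("thermometer-1", at(2, 5), 30.0),  # falls into the next window
    ]
    for row in tumbling_stats(sample):
        print(row)
```

The first two sample readings land in the same window and the third starts a new one, which is the behaviour that gives one result row per device every 2 minutes.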
Save the query and then go back to the Stream Analytics job page. Start the Stream Analytics job and check the graph to make sure the input is being processed and output is being generated.
Once output is generated, go to Power BI and check that a dataset has been created for this exercise and that it is being refreshed.
You should see a new dataset.
We can now use this dataset to create any report. I have made a simple table to show the data produced every 2 minutes.
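The original goal was to spot a drop or rise that points to a cooling or heating problem. One simple way to apply that check to rows like the ones in this table is sketched below in Python. The thresholds of 18 and 26 degrees, the row layout, and the key names `minTemp` and `maxTemp` are assumptions for illustration only.

```python
def flag_windows(rows, low=18.0, high=26.0):
    """Return the rows whose minimum or maximum temperature falls outside [low, high]."""
    flagged = []
    for row in rows:
        if row["minTemp"] < low or row["maxTemp"] > high:
            flagged.append(row)
    return flagged


if __name__ == "__main__":
    # Made-up window results, shaped like one table row per device per 2 minutes.
    rows = [
        {"deviceId": "thermometer-1", "minTemp": 21.0, "maxTemp": 23.5, "avgTemp": 22.1},
        {"deviceId": "thermometer-2", "minTemp": 15.2, "maxTemp": 22.0, "avgTemp": 19.0},
        {"deviceId": "thermometer-3", "minTemp": 22.0, "maxTemp": 29.4, "avgTemp": 25.3},
    ]
    for row in flag_windows(rows):
        print(row["deviceId"], "needs a look")
```

With these made-up rows, the second device is flagged for a low reading and the third for a high one, while the first stays within range.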