Cortana and Power BI integration

Thursday found me in Manchester with the Power BI user group to discuss Cortana integration with Power BI and the Cortana Intelligence Suite. The group can be found here if you want to attend any of their future events. It is well worth it.

You can watch a 10-minute video here, or you can read through the setup below.

What is Cortana?

One of the most common questions that we are asked is what is the difference between Cortana and Cortana Intelligence. Cortana is the personal digital assistant, or as my children tell me, she is my other friend! Cortana will help you find things on your PC, manage your calendar, track packages, find files, chat with you and even tell some funny jokes. The more you use Cortana, the more personalised your experience will be. I have to say that I rely on Cortana heavily in both my personal and professional life on a daily basis and she helps me get my work done on time.

What is Cortana Intelligence?

[Image: CortanaPowerBI]

Cortana Intelligence is different. Cortana Intelligence is a suite of technologies that provides a platform to deliver data science and/or artificial intelligence solutions. Using this in conjunction with the Team Data Science Process, you have both the platform and the process to deliver successful solutions.

So how does Cortana get involved with this?

Cortana is still the personal digital assistant that is found in the intelligence section of the Cortana Intelligence Suite. You will also notice from the graphic above that Power BI provides the dashboards and visualizations that surface the data that is orchestrated and stored in the big data stores.

It is here that Cortana can be integrated with Power BI, so that you can call reports from your Power BI service from either your Windows 10 desktop or your Windows 10 phone.

So how can we make this happen?

First of all, you need the right version of Windows. The minimum version required is 1511, and you can run the command winver in Windows to confirm you have it. If you do not have the right version, you will have to upgrade to use this feature.

[Image: winver.png]
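
If you prefer the command line, the following is a minimal PowerShell sketch that reads the Windows release ID from the registry so you can confirm you are on 1511 or later; the registry path is standard, and the output values shown in the comment are only illustrative.

# Read the Windows 10 release ID and build number from the registry
$ver = Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
"Release: $($ver.ReleaseId)  Build: $($ver.CurrentBuild)"   # e.g. Release: 1511  Build: 10586

# Note: the ReleaseId value only exists on version 1511 and later,
# so if it is missing you will need to upgrade anyway.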

The next step is to set up a work account for both the Windows machine and Power BI. From the Windows machine's perspective, the account you log in with should match the account used to log in to Power BI. In my case that would be chris.testaoneill@microsoft.com.

However, if this is not the case, you can add additional accounts within Windows under Settings, then Access work or school. From here you can click Connect and provide the credentials of a secondary account. Be mindful, however, that there could be domain administrator restrictions on this operation, so it is worth checking and asking for permission first.

[Image: AddWorkAccount.png]

But there’s got to be more to it than that, right?

Yes. The steps above are essentially the provisioning work that is required to get the ball rolling. They are one-off operations to get the integration started. There is still work you have to do within Power BI for both datasets and reports.

Within the dataset, you have to allow Cortana access to the dataset. There is a specific checkbox for this setting in the data refresh properties of the dataset itself. You can also augment Cortana's querying by adding your own custom questions to the Featured Q&A questions section.

[Image: AllowCortana]

But there is also work to be done within the individual reports. Within Power BI Desktop, in the report properties, you must modify the Page Size property in the Format section and set the Type drop-down to "Cortana". You can also use the Page Information section to add alternate names for the report.

[Image: PageSizes.png]

After these settings have been defined, you can publish the report to your Power BI workspace, and after a few minutes you can use Cortana on either your laptop or your mobile to ask questions. You're all set!

If you want to try this yourself, here are a couple of resources that will get you started:

  1. Enable Power BI Integration with Cortana: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-cortana-enable/
  2. Use Power BI to create a custom Answer Page for Cortana: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-cortana-desktop-entity-cards/

ADW1: Data Warehousing in the Cloud.

For a few months now, we have had a SQL Server edition in the cloud known as Azure SQL Data Warehouse. This service enables you to provision a data warehouse instance in just 3 to 5 minutes. Its main benefit is that it allows you to scale your compute in seconds to keep up with your organisation's data demands. So, if you are performing heavy data loads, you can maximise the amount of compute for the duration of a data load, and then scale the compute back down once the load is complete. Furthermore, if you don't require access to your data, you can pause the compute so that you keep control of your costs while still retaining the data.
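
To give a feel for what that looks like in practice, here is a minimal PowerShell sketch using the AzureRM cmdlets; the resource group, server and database names are simply the example ones used later in this post.

# Scale the warehouse up for a heavy data load, then back down afterwards
Set-AzureRmSqlDatabase -ResourceGroupName "cto_ads_prep_rg" -ServerName "ctomsftadssqlsrv" -DatabaseName "ContosoRetailDW" -RequestedServiceObjectiveName "DW400"

# Pause the compute when it is not needed; you stop paying for compute but keep the data
Suspend-AzureRmSqlDatabase -ResourceGroupName "cto_ads_prep_rg" -ServerName "ctomsftadssqlsrv" -DatabaseName "ContosoRetailDW"

# Resume the compute when the warehouse is needed again
Resume-AzureRmSqlDatabase -ResourceGroupName "cto_ads_prep_rg" -ServerName "ctomsftadssqlsrv" -DatabaseName "ContosoRetailDW"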

These are some of the business benefits of Azure SQL Data Warehouse, but how does it work?

SQL Data Warehouse is a massively parallel processing (MPP) distributed database system. Behind the scenes, SQL Data Warehouse spreads your data across many shared-nothing storage and processing units. The data is stored in a Premium locally redundant storage layer on top of which dynamically linked Compute nodes execute distributed queries. SQL Data Warehouse takes a “divide and conquer” approach to running loads and complex queries. Requests are received by a Control node, optimized for distribution, and then passed to Compute nodes to do their work in parallel as shown in the following graphic.

[Image: SQLDW]
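
You can observe this divide-and-conquer behaviour through the warehouse's dynamic management views. The sketch below is illustrative only: it assumes the server, database and credentials created later in this post, and uses Invoke-Sqlcmd (from the SqlServer/SQLPS module) to list recent requests received by the Control node and the distributed steps generated for one of them.

# Recent requests received by the Control node (server and credentials are illustrative)
$server = "ctomsftadssqlsrv.database.windows.net"
Invoke-Sqlcmd -ServerInstance $server -Database "ContosoRetailDW" -Username "ctesta-oneill" -Password "P@ssw0rd" -Query "SELECT TOP 10 request_id, status, command, total_elapsed_time FROM sys.dm_pdw_exec_requests ORDER BY submit_time DESC;"

# The distributed steps the MPP engine generated for a single request (replace the request_id with one returned above)
Invoke-Sqlcmd -ServerInstance $server -Database "ContosoRetailDW" -Username "ctesta-oneill" -Password "P@ssw0rd" -Query "SELECT step_index, operation_type, distribution_type, status FROM sys.dm_pdw_request_steps WHERE request_id = 'QID1234';"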

Azure SQL Data Warehouse is ideal for analytical workloads, whether it is a small workload of gigabytes or a large workload of petabytes. It can also interface with unstructured data stored in Azure Blob storage. If transactional consistency and high concurrency are your requirements, then Azure SQL Data Warehouse is not the service to use; SQL Database would be a more appropriate choice.

It only takes minutes to get this up and running, and you can do this either within the Azure Portal or by using PowerShell.

At this point I will give you a choice: you can either watch the following 10-minute video that demonstrates how to set up an instance of Azure SQL Data Warehouse, or you can continue to read. If you're hardcore, why not do both! But I understand your time is precious.

OK, so there is information that you need to have to hand before creating an Azure SQL Data Warehouse:

[Image: DWUI]

Armed with this information, you can then go ahead and create the SQL Data Warehouse Instance.

Alternatively, you can use the same information to create a PowerShell script to sign in to an Azure subscription, create a resource group and then create a SQL Server instance, and optionally a database. The following PowerShell code creates a resource group named cto_ads_prep_rg located in North Europe using the New-AzureRmResourceGroup cmdlet. The script then creates a SQL Server instance named ctomsftadssqlsrv with an admin account named ctesta-oneill using the New-AzureRmSqlServer cmdlet.

######################################################################
##             PART I: Creating the Azure SQL Server               ##
######################################################################

# Sign in to Azure and set the Azure subscription to work with
$SubscriptionId = "XXXXXXXX-xxXX-XXxx-XXXX-xxxxxxxxxxxx"
Add-AzureRmAccount
Set-AzureRmContext -SubscriptionId $SubscriptionId

# CREATE A RESOURCE GROUP
$resourceGroupName = "cto_ads_prep_rg"
$rglocation = "North Europe"
New-AzureRmResourceGroup -Name $resourceGroupName -Location $rglocation

# CREATE A SERVER
$serverName = "ctomsftadssqlsrv"
$serverVersion = "12.0"
$serverLocation = "North Europe"

$serverAdmin = "ctesta-oneill"
$serverPassword = "P@ssw0rd"
$securePassword = ConvertTo-SecureString -String $serverPassword -AsPlainText -Force
$serverCreds = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $serverAdmin, $securePassword
$sqlDbServer = New-AzureRmSqlServer -ResourceGroupName $resourceGroupName -ServerName $serverName -Location $serverLocation -ServerVersion $serverVersion -SqlAdministratorCredentials $serverCreds

You can also use PowerShell to configure firewall settings on the SQL Server instance using the New-AzureRmSqlServerFirewallRule cmdlet. This can be performed in the Azure Portal as well.

 

# CREATE A SERVER FIREWALL RULE
$ip = (Test-Connection -ComputerName $env:COMPUTERNAME -Count 1 -Verbose).IPV4Address.IPAddressToString
$firewallRuleName = 'Client IP address'
$firewallStartIp = $ip
$firewallEndIp = $ip
$fireWallRule = New-AzureRmSqlServerFirewallRule -ResourceGroupName $resourceGroupName -ServerName $serverName -FirewallRuleName $firewallRuleName -StartIpAddress $firewallStartIp -EndIpAddress $firewallEndIp

With the firewall rules defined, you will then be able to access the Azure SQL Server using tools such as Visual Studio and SQL Server Management Studio, where you can run T-SQL scripts to create and manage databases. This can also be done in PowerShell using the New-AzureRmSqlDatabase cmdlet. The following code creates a data warehouse named ContosoRetailDW.

 

# CREATE A SQL DATABASE
$databaseName = "ContosoRetailDW"
$databaseEdition = "DataWarehouse"
$RequestedServiceObjectiveName = "DW400"
$sqlDatabase = New-AzureRmSqlDatabase -ResourceGroupName $resourceGroupName -RequestedServiceObjectiveName $RequestedServiceObjectiveName -ServerName $serverName -DatabaseName $databaseName -Edition $databaseEdition

Once the database has been created, the next step is to understand how to scale and pause a database in Azure SQL Data Warehouse, but that will be covered in the next blog.

The learnanalytics.microsoft.com portal

So where can we learn about all the features and capabilities of the Cortana Intelligence Suite? Well, here at Microsoft we are looking to address this through our new portal:

www.learnanalytics.microsoft.com

This site is evolving all the time as we add webinars and videos. What's really interesting is the wide range of in-person training events that are available all around the world. Not only is this training delivered by Microsoft, but we also partner with some of the most talented speakers and professionals in the world.

Why not learn machine learning with Rafal Lukawiecki from Project Botticelli, or from Presciient? Or look into big data technologies such as Apache Spark with Databricks. There is a huge list available and it keeps growing, so take a look at our training page for the latest courses.

 

Why use the Cortana Intelligence Suite?

“We want a real time data warehouse” said the IT Director.

“OK, so at the moment you're doing a daily refresh of the warehouse and the cubes, right?” I replied.

“That’s right, but we have always wanted real time” was the reply.

“Well, this is possible, but there would have to be a lot of investment made. First you would have to upgrade the disks from the 10K RPM drives that you have to a combination of at least 15K RPM for the warm data and SSDs for the hot data. Also, I think the throughput on the backplane of the SAN is too slow; we need to get them…..”

“Hold on a minute!” interrupted the IT Director “I was hoping we can make use of the kit we have.”

“OK, but to get real time, we need to have the supporting hardware to meet that need, and if you're not willing to make the investments, I can't see how it could be achieved. Let's start again. What does real time really mean for your business….”

And after a couple of hours…

“OK, so you agree that at best, with the infrastructure we have in place, we are looking at a half day refresh of the cubes?” I conclude.

“Yes, but if you can sprinkle some of your magic and get that down, that would be great!”

Sound familiar?

Let’s consider another conversation

“We have a range of data sources, namely relational, but we do have some spreadsheets and XML files that we want to incorporate within the solution,” said the BI developer. “Oh, and Delia in procurement has an Access database with information that we use as well.”

“Sure, that won't be a problem. We can extract those files and place them in the staging area using SSIS.”

“We have video in the stores that records activity near the tills. We have always wanted to be able to store this data. At the moment the recordings are on tape and they are overwritten every week. We have had instances where we have had to review videos that are over a month old. Could we accommodate that?” states the Solution Architect.

“It's possible, but the size and nature of the data would mean that there would be a performance impact on the solution. Is it a mandatory requirement….”

Again. Does this sound familiar?

In many of the Business Intelligence projects I have undertaken, there is usually an area of compromise. The most common is real-time or near real-time data warehousing. The desire for this outcome is there, and that is no surprise, because the velocity with which a business receives its information affects the operational reports that help it make decisions. But once there is a realisation of the capital and operational costs required to implement a solution that is even near real time, the plans are abandoned due to complexity and cost, or the business decides to pull “reports” directly from the OLTP system to meet its needs, which we all know places contention on the OLTP system.

The same decisions can also occur when it comes to the types of data being sourced for a solution. Many projects can handle the data that is required for a BI project, but increasingly there is a thirst for a greater variety of data to become part of the overall solution. Traditional BI solutions can struggle to make use of media files, and even at a simple level, other file types may be abandoned. And it's not just the unstructured types of data that go unused; the sheer volume of the data can be overwhelming for an organisation's infrastructure.

This is where the Cortana Intelligence Suite can help. Notice that the key variables for compromise in the scenarios above involve speed, data types and size. Put another way, some potential blockers to solution adoption exist because the infrastructure cannot handle the velocity, variety and/or volume of the data. These are the three common tenets that define the characteristics of a Big Data solution, and the Cortana Intelligence Suite has been engineered specifically to accommodate these scenarios, whilst still providing the ability to work with relational data stores.

So which technologies can help in these scenarios?

Velocity

A couple of options spring to mind. You could make use of the Azure Stream Analytics service to deal with real-time analytical scenarios. Stream Analytics processes ingested events in real time, comparing multiple streams or comparing streams with historical values and models. It detects anomalies, transforms incoming data, triggers an alert when a specific error or condition appears in the stream, and displays this real-time data in your dashboard. Stream Analytics is integrated out of the box with Azure Event Hubs to ingest millions of events per second. Event-driven data could include sensor data from IoT devices.
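
Provisioning a job is also straightforward. The sketch below is a hypothetical example using the AzureRM Stream Analytics cmdlets: the resource group, job name and StreamingJob.json file are placeholders, and the JSON definition would describe the Event Hub input, the query and the output (for example a Power BI dashboard).

# Create a Stream Analytics job from a JSON job definition (names and file path are placeholders)
New-AzureRmStreamAnalyticsJob -ResourceGroupName "cto_streaming_rg" -Name "TillSensorAlerts" -File "C:\ASA\StreamingJob.json"

# Start the job once the inputs, query and outputs defined in the JSON are in place
Start-AzureRmStreamAnalyticsJob -ResourceGroupName "cto_streaming_rg" -Name "TillSensorAlerts"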

A second option is to use a Storm cluster to process real-time data. Storm is a distributed real-time computation system for processing large volumes of high-velocity data. It is extremely fast, with the ability to process over a million records per second per node on a cluster of modest size.

Volume

A range of technologies can be used to handle volume. Azure SQL Data Warehouse can provide storage for terabytes of relational data, and it can integrate with unstructured data using PolyBase.

HDInsight is an Apache Hadoop implementation that is globally distributed. It's a service that allows you to easily build a Hadoop cluster in minutes when you need it, and tear it down after you have run your MapReduce jobs to process the data. It can handle huge volumes of semi-structured and unstructured data. There are a variety of different types of Hadoop clusters; we have seen two examples here with Storm and HDInsight, and I will write about the differences in another post.
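
As an illustration of that "build it when you need it" model, here is a hedged PowerShell sketch using the AzureRM HDInsight cmdlets. The resource group, cluster name, storage account and credentials are placeholders, and it assumes the storage account and container already exist.

# Credentials for the cluster gateway (HTTP) and SSH access
$httpCreds = Get-Credential -Message "Cluster login (admin) credentials"
$sshCreds  = Get-Credential -Message "SSH user credentials"

# Build a four-node Linux Hadoop cluster (storage account name and key are placeholders)
New-AzureRmHDInsightCluster -ResourceGroupName "cto_bigdata_rg" `
    -ClusterName "ctohdicluster" `
    -Location "North Europe" `
    -ClusterType Hadoop `
    -OSType Linux `
    -ClusterSizeInNodes 4 `
    -HttpCredential $httpCreds `
    -SshCredential $sshCreds `
    -DefaultStorageAccountName "ctohdistorage.blob.core.windows.net" `
    -DefaultStorageAccountKey "<storage account key>" `
    -DefaultStorageContainer "ctohdicluster"

# ... run the MapReduce jobs ...

# Tear the cluster down when the processing is done; the data in the storage account is retained
Remove-AzureRmHDInsightCluster -ResourceGroupName "cto_bigdata_rg" -ClusterName "ctohdicluster"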

Variety

A number of the CIS technologies can handle a wide range of data types. For example, you can use a CTAS (CREATE TABLE AS SELECT) statement to point at a semi-structured file so it can be decomposed before being stored in Azure SQL Data Warehouse. Azure Data Factory can connect to a wide range of data types to extract them, and Azure Data Lake can act as a repository for storing the data.
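
To give a flavour of that CTAS pattern, here is a hedged sketch. It assumes a PolyBase external table named ext.SalesStaging has already been created over a semi-structured file in blob storage (the table and column names are illustrative), and it runs the statement from PowerShell with Invoke-Sqlcmd against the warehouse created earlier.

# CTAS: materialise the external (PolyBase) table into a hash-distributed columnstore table
$ctas = @"
CREATE TABLE dbo.SalesFact
WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT ProductKey, StoreKey, SaleDate, SaleAmount
FROM ext.SalesStaging;
"@

Invoke-Sqlcmd -ServerInstance "ctomsftadssqlsrv.database.windows.net" -Database "ContosoRetailDW" -Username "ctesta-oneill" -Password "P@ssw0rd" -Query $ctas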

The Cortana Intelligence Suite provides a wide range of technologies that can deal with the barriers that have held back complete BI solutions in the past. As the infrastructure is hosted in Azure, these blockers can now be removed, with consideration given primarily to the operational expenditure required to implement a solution, and less to the capital expenditure. So the IT Director may be more inclined to look into the potential of a real-time solution.

So next time you are around a table talking about your latest BI project, remember that the Cortana Intelligence Suite has the capabilities to deal with velocity, volume and variety. Take a look at this interview I did in Holland for a summary.

It’s not all about Data Science!

I hear you. The Cortana Intelligence Suite (CIS) has an immense depth of technologies that can be used across many industries and business scenarios. So over the last few weeks I have been receiving feedback from partners and members of the community who have attended my sessions as to which technology they thought was the easiest entry point into the CIS. The overwhelming response was Azure SQL Data Warehouse, which represents one of the Big Data Stores that we have available in our technology suite.

I appreciate that my sample may have been biased given the industry demographics I reach out to. Anyway, rather than blog about it, I thought I would record some videos for you to watch to see how easy it is to get started with your own data warehouse.

Part one goes through an introduction to Azure SQL Data Warehouse, the concepts of Massively Parallel Processing (MPP) and the range of tools that you can use to interact with the data warehouse.

Part two goes into depth on the importance of table geometries and how they work with the MPP engine in Azure SQL Data Warehouse.

Part three explores how you can load data into Azure SQL Data Warehouse with a range of tools, demonstrating the use of PolyBase.

I hope you find these videos of use.

And now, for our “Features” presentation.

In my last blog, I discussed how members of my team helped me to understand how the techniques that I used in data migrations and data warehousing could contribute immensely to the world of Data Science. Namely through the discipline of Features Engineering. Before you read this blog, it would first be well worth watching Buck Woody’s short video on an Introduction to the Cortana Intelligence Suite (CIS) to give you some context. I will see you back here in 20 minutes……

…. OK, so in Buck's presentation he talks about the CRISP-DM process and how it maps to the Cortana Intelligence Suite of technologies. In the Data Preparation part of the process, he explains how we can generate Features and Labels, and how these can be consumed within Machine Learning models, R or Python to perform predictive or pre-emptive analytics.

Feature selection refers to the process of selecting the input data for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data.

Hold on a minute! Don't we do this already as SQL Server professionals? Well, we do. If you have worked on a BI or data warehouse project, or perhaps have done a data migration, you will know that one of the most important aspects of these projects is to identify the source data that we need to migrate to the new system or to a data warehouse. During the load we are highly likely to cleanse and transform that data to make it useful for our destination.

Within Data Science, it is widely regarded that Features Engineering is an important aspect of the field. The data scientists I work with agree that it is just as important as, if not more important than, choosing the right model to use in your machine learning experiments.

It seems to be a subject that is talked about very little in the Data Science world, but it is a discipline that can utilise many of the skills that a SQL Server Professional possesses. Don’t get me wrong, there are still plenty of new skills to be learned in this process and in using the Cortana Intelligence Suite, but it is worth knowing that the skills that we currently have will be useful in this arena.

So now I’m really starting to feel like I am part of a Data Science team. It’s a bit like a team that performs cardiovascular surgery. OK, so I may not be the surgeon, or the anaesthetist, but as the scrub nurse it’s important that I pass on the right tools for the job and that those tools are clean. And my role is seen as just as important as the surgeon who ties the knots. In fact, Ryan Swanstrom has written a great post on Data Science and the Perfect Team.

The first task in the data preparation phase is to extract the features from the source data. At this point you may cleanse it, transform it or indeed augment it with additional data. In the past, SQL Server Integration Services (SSIS) would have been the tool of choice to do this, and you can still use SSIS; however, you also have the ability to extract and perform data manipulations with the following technologies in the Cortana Intelligence Suite.

  • Azure Data Factory – a service that orchestrates data movement and cleansing in Azure
  • R – an open source language that can perform data manipulation and statistical analysis
  • Azure Stream Analytics – a service that simplifies complex event processing, using standard Transact-SQL statements to manipulate datasets.

I will go through these tools in future blogs, and I know R isn’t strictly in the CIS, but as you become more familiar with the language you will find that it can be integrated with many of the CIS technologies that are available, to great effect.
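
As a taster, here is a hedged sketch of getting started with Azure Data Factory from PowerShell. The factory name and the pipeline JSON file are hypothetical, and deploying the pipeline assumes the linked services and datasets it references have already been defined.

# Create the data factory itself (names are placeholders)
New-AzureRmDataFactory -ResourceGroupName "cto_ads_prep_rg" -Name "ctoFeatureFactory" -Location "North Europe"

# Deploy a pipeline that copies and cleanses the source data, defined in a JSON file
New-AzureRmDataFactoryPipeline -ResourceGroupName "cto_ads_prep_rg" -DataFactoryName "ctoFeatureFactory" -File "C:\ADF\CopyAndCleanseSales.json"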

And for completeness: Labels are the attributes that you are trying to predict in a model. So Features Engineering plays a particularly important role in ensuring that your features are primed as well as they can be, given the business understanding of the data. But remember, as Buck states in his video, you may not even do any modelling with the features that you select; you might just push the features to a Power BI dashboard for descriptive analytics.

So what we have established in this blog is that, as SQL Server professionals, we already have some of the skillsets that can contribute to data science through the field of Features Engineering. It's time to get the popcorn and look into this a bit more.

Welcome Aboard!

“With respect, I have to disagree with you” was one of my responses in a recent interview that led me to my current role. I was on my seventh interview, and I was explaining to one of my future team mates why I didn't regard myself as a data scientist.

At the end of the day, I had spent well over 16 years working with SQL Server, specialising in Business Intelligence and descriptive analytics. If I was to call myself a data scientist, I would be regarded as a pretty bad one. The rhetoric coming out of the industry meant that I knew I had to look into this field, but that I would only be an apprentice data scientist at best.

That was my thought anyway, but it turns out that there are existing skills that a SQL Server data professional possesses that are extremely useful to data science, particularly in the realms of “Features Engineering” – I will talk at length about this in another post. Unbeknown to me, during numerous interviews I described techniques and technologies I had used in the world of data migrations, data warehousing and cubes that resonated with the resident data scientists. I received the call soon after saying “You're hired.”

The last 18 months have seen me immersed in the world of cloud migrations and I have been around many conferences presenting content on the Cortana Intelligence Suite (CIS).  At each conference the same themes keep recurring as I am questioned about the CIS sessions I run.  Questions such as “How can I keep pace with all these changes?” and “Are there any links to information that can help me understand this?”.

So if you are feeling “Where do I start?!”, then the good news is you are not alone, and this blog, along with the other resources that we are providing, is intended to give you the information to help you augment your existing skills with the skills and technologies required to implement Cortana solutions.

The first thing you should do is browse our team's portal, which has a list of training events around the globe as well as video materials; content will be added to the site fairly regularly in the coming months. I would also recommend that you visit Buck Woody's blog, Backyard Data Science, where the focus is on the processes, the statistics and the technology that fit around the data science space. Ryan Swanstrom's blog, Data Science 101, also provides useful insights into the world of data science. Let me know in the comments below what areas you would like to look at and I will get them on the blog as soon as I can.

Welcome aboard! I hope you will find future articles useful as we both learn how to navigate this new and exciting world!