Francisco Candeias

Data Analytics Engineer

BI Framework: why you should have one for your data pipelines

5-SECOND SUMMARY:
  • There are some steps to getting a framework to work properly;
  • Ingestion, Processing, Azure DevOps, Monitor.

Nowadays, every company has data needs. Treating and governing that data involves many processes, which can become overwhelming. When working with company data, your employees, especially developers, can find themselves developing the same processes or data pipelines again and again for different projects. This is why an organised structure like a framework is so important.

With it, you can standardise your processes in order to better control and monitor your data pipelines.

This approach helps establish best practices and keeps your data lake from turning into a data swamp. At the same time, the specific business logic of each data pipeline stays independent, meaning that you can still handle each data set individually and get the most information out of it. There are some steps to getting a framework to work properly, so let’s see them.

Ingestion

Data ingestion is a crucial step in any data processing pipeline, and one that must be replicated many times to gather all the necessary data. A well-designed framework ensures that data is efficiently extracted from various sources and loaded into the target data storage. With it, you always know how many rows have been imported and where the data landed, and you have a concise, robust process with a single point of change to cope with future requirements.

Normally, the data ingestion framework extracts data from different sources, like SQL Server databases, .csv files, etc., and loads it into cloud storage, such as Azure Data Lake Storage (ADLS) Gen 2. This step is where the connections to the data sources are made and where the data is prepared to be ingested into storage. In addition, it’s during this step that the load logic is configured as either a full or an incremental load. In other words, you can retrieve all the data every time your process runs (full load) or retrieve only new data (incremental load).
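To make the difference between the two load types concrete, here is a minimal sketch in Python. It is an illustration only and not how any Azure service implements loading: the `select_rows` function, the `modified_at` column and the idea of storing a "watermark" (the latest change timestamp already ingested) are assumptions made for the example.

```python
from datetime import datetime


def select_rows(source_rows, load_type, last_watermark=None):
    """Return the rows to ingest and the watermark to store for the next run.

    Hypothetical helper: assumes every source row carries a `modified_at` timestamp.
    """
    if load_type == "full":
        # Full load: take everything, every time the process runs.
        rows = list(source_rows)
    elif load_type == "incremental":
        # Incremental load: take only rows changed after the last stored watermark.
        rows = [
            r for r in source_rows
            if last_watermark is None or r["modified_at"] > last_watermark
        ]
    else:
        raise ValueError(f"Unknown load type: {load_type!r}")

    # Keep the previous watermark when nothing new arrived.
    new_watermark = max((r["modified_at"] for r in rows), default=last_watermark)
    return rows, new_watermark


source = [
    {"id": 1, "modified_at": datetime(2023, 1, 1)},
    {"id": 2, "modified_at": datetime(2023, 1, 2)},
    {"id": 3, "modified_at": datetime(2023, 1, 3)},
]
rows, watermark = select_rows(source, "incremental", last_watermark=datetime(2023, 1, 2))
print(len(rows), watermark)
```

Because the load type is just a parameter, the same ingestion code can serve every source, and the number of rows returned is available for logging on each run.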

Keeping the raw data in storage means that all of it is available to the next step, processing, even if it is not used immediately. Later, you can take advantage of fields that were initially unused and achieve comprehensive data tracking.

Processing

Transforming data into structured tables is a critical step in the analytical process. Once ingestion is complete, the next step is to process and transform the data into a structure that supports effective analysis.

These data pipelines clean the data, maintain its quality and perform the calculations you need, which are normally KPIs. While doing this, they raise alerts if something goes wrong during a run. While most of the transformations will be specific to the business model being processed, there are logical recurrent tasks, such as monitoring data quality and execution or even deleting data for reprocessing. These recurrent tasks can be standardised, allowing data engineers to easily fit any specific business logic into the process.
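The split between standardised recurrent tasks and pipeline-specific business logic can be sketched as follows. This is a simplified illustration in plain Python, not Synapse code: the `run_with_checks` wrapper, the field names and the shape of the log entry are all hypothetical.

```python
def run_with_checks(name, transform, rows, required_fields):
    """Run one pipeline's business logic inside the framework's standard steps.

    Hypothetical wrapper: the quality check and the log format are assumptions.
    """
    rows = list(rows)

    # Standard task 1: basic data quality check (drop rows missing required fields).
    valid = [r for r in rows if all(r.get(f) is not None for f in required_fields)]

    # Pipeline-specific part: the business logic is passed in as a function.
    try:
        output = [transform(r) for r in valid]
        status, error = "succeeded", None
    except Exception as exc:  # a real framework would also raise an alert here
        output, status, error = [], "failed", str(exc)

    # Standard task 2: execution statistics, recorded the same way for every pipeline.
    log = {
        "pipeline": name,
        "rows_in": len(rows),
        "rows_rejected": len(rows) - len(valid),
        "rows_out": len(output),
        "status": status,
        "error": error,
    }
    return output, log


def add_margin(row):
    # Example business logic: a simple KPI computed per row.
    return {**row, "margin": row["revenue"] - row["cost"]}


sales = [
    {"revenue": 100, "cost": 60},
    {"revenue": 80, "cost": None},
    {"revenue": 50, "cost": 20},
]
result, log = run_with_checks("fact_sales", add_margin, sales, ["revenue", "cost"])
print(log)
```

Only `add_margin` changes from one pipeline to the next; the checks and the logging stay identical, which is what makes the later monitoring step consistent across pipelines.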

Let’s use Azure Synapse as an example. After ingestion, the data is accessed through external tables that read the ADLS Gen 2 directory where the files for a specific table are stored. To process and transform data, two types of processing pipelines can be developed, one to create dimensions and the other to create fact tables. Both processes make use of Synapse’s Data Flow tool, a cloud-based data transformation tool that lets users create and manage large-scale data transformation operations.

When the data flow is finished, the result is a SQL table that is saved in the Synapse dedicated SQL pool. Those tables can be used by analytics solutions like Power BI dashboards, which include a connector for Synapse dedicated pools that allows users to import tables or query them directly.

Figure: Example of simple fact table processing using a data flow

Azure DevOps

Azure DevOps is a suite of services that offers a complete solution for agile development and DevOps. It contains numerous tools that can assist developers in managing the whole development process, including version control, continuous integration and continuous delivery. Regardless of the development approach, using several environments is an industry-recognised best practice. This ensures that anything that is built is thoroughly evaluated, both technically and by business users, before it is made public or applied in a production context.

Normally, those environments are Development, Testing (or Quality Assurance, QA) and Production. In the Development environment, data engineers develop and build pipelines. When these pipelines are in a working, stable state, they are promoted to the QA environment, where the outputs can be tested by the end users and/or developers. If no further development is needed, the pipelines can be deployed to Production.

While data pipelines are developed differently from code, that doesn’t mean you should leave DevOps out. In an Azure Synapse Analytics workspace, Continuous Integration/Continuous Delivery (CI/CD) moves all entities from one environment (development, QA, production) to another. The workspace contents are exported as templates (Azure Resource Manager templates), and these templates are then used to deploy your features in the QA and Production environments.

This approach provides several advantages. Firstly, it minimises the impact of changes made by other teams/developers. Secondly, downtime and risk are minimised if development and testing are done on dedicated environments. Finally, security and permissions can be restricted to each environment to reduce the risk of human error and data loss and protect sensitive data.

Monitor

Monitoring is another crucial part of the Synapse framework, and the Azure Monitor tool is there to assist in this area. Using the Kusto Query Language (KQL), it’s possible to query the logs in near real time and set up email alerts that the Synapse workspace admin receives every time a pipeline run fails.

The framework can also store the logs from Azure Monitor in an ADLS Gen 2 container to keep the historical data, and consume them in Power BI to create a report about those logs. Logs include execution statistics and messages. The ingestion and processing phases of the framework gather functional execution statistics and store them in the logs too.

Power BI is then used for ad-hoc analysis to understand, for instance, which pipelines take longest, whether there was any deviation in the rows processed or in other functional statistics, which pipelines fail often, and when the dedicated pool is hitting maximum usage. You can even create combined measures, such as the number of rows processed divided by the run time, to understand whether a process took longer than usual.
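As a rough illustration of that last measure, the sketch below computes rows processed per second for each run and flags runs that are much slower than is typical for the same pipeline. It is written in plain Python rather than Power BI or KQL, and the log field names, the median baseline and the 50% tolerance are assumptions chosen for the example.

```python
from statistics import median


def rows_per_second(run):
    # Throughput of a single run; assumes duration_seconds is greater than zero.
    return run["rows_processed"] / run["duration_seconds"]


def flag_slow_runs(runs, tolerance=0.5):
    """Return runs whose throughput is below `tolerance` times their pipeline's median."""
    by_pipeline = {}
    for run in runs:
        by_pipeline.setdefault(run["pipeline"], []).append(run)

    flagged = []
    for history in by_pipeline.values():
        baseline = median(rows_per_second(r) for r in history)
        flagged.extend(r for r in history if rows_per_second(r) < tolerance * baseline)
    return flagged


logs = [
    {"pipeline": "load_sales", "run_id": 1, "rows_processed": 1000, "duration_seconds": 10},
    {"pipeline": "load_sales", "run_id": 2, "rows_processed": 1000, "duration_seconds": 10},
    {"pipeline": "load_sales", "run_id": 3, "rows_processed": 1000, "duration_seconds": 40},
]
for run in flag_slow_runs(logs):
    print(run["pipeline"], run["run_id"], rows_per_second(run))
```

Dividing rows by time separates a run that was slow because it handled more data from one that was slow for the same amount of data, which is the case worth investigating.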

Implementing Azure Monitor helps to minimise risks, improve the quality of the processes and ensure that our pipelines are performing as expected.

Final thoughts

Having a framework for your data pipelines is a real advantage, since it provides standardisation and scalability. A framework that runs all your data pipelines makes the process simpler and more efficient: your developers don’t have to waste time defining properties for each individual pipeline and can simply change their working environment, for example, from development to QA. Having everything running in one place gives you more control over the quality of the process and more power to monitor everything properly, so that when an issue arises, your developers are alerted by the framework and can check the necessary logs to understand what happened and solve errors faster.

Besides this, imagine having to scale all your data pipelines one by one. With the framework, you can scale all of them in one go just by changing the properties of the framework instead of each pipeline individually.


Why you should integrate Power BI and Customer Insights

5-SECOND SUMMARY:
  • Integrating Power BI and Customer Insights will allow you to gain a competitive advantage for your company.
  • You can segment customers, gain market insights and quickly and effectively assess whether your products are relevant in the market.

Have you ever thought about how to better leverage your customer analysis? Since 2016, Microsoft has been offering its customers Dynamics 365, a cloud-based software platform with innumerable capabilities to fulfil their needs. Dynamics 365 has a range of apps covering areas from finance to sales, marketing, supply chain management, project management, human resources and much more, and these can be combined to help every company become more powerful and gain a competitive edge. There are many topics to discuss regarding Dynamics 365 and all its functionalities, but today we’re going to turn our attention to Customer Insights and the ability to build reports based on its data using Power BI. So, without further ado, let’s start by explaining what Customer Insights is:

1. A 360° view of your customer

The more you know about your customers, the more competitive you can be, because you are able to offer them solutions better fitted to their wants and needs. So yes, just like the title says, Customer Insights (CI) helps you enhance your customer analysis.

With it, you can first gather all your customer information and then aggregate it using Machine Learning (ML) and Artificial Intelligence (AI) methods. The idea of this app is to give you the ability to unify all the customer data collected by different company departments, creating a unique profile to prevent information duplication and data silos. This empowers all your employees who interact with customers in different departments to personalise the customer journey while achieving better engagement and increasing the potential to cross-sell or up-sell products, maximising customer lifetime value.

Customer Insights is very flexible and capable of importing data from almost any source, which lets you gather all your customer data in the same place, unify it and eliminate data silos. Besides this, with Customer Insights, you can define customer profiles that let you follow every customer journey.

2. Why do I need Customer Insights?

There are many reasons why you should try Customer Insights.

1. First of all, it’s really intuitive and easy to use, giving you the ability to quickly start asking questions about the data you gather.

2. Secondly, you can segment your customers using demographic, transactional or behavioural conditions. With these segments, you can personalise targets for promotions, sales events or any other type of action that uses segmentation. Beyond that, you can export those segments as data sets to use them in other applications.

3. Thirdly, the capability to see and identify customer trends or forecast changes in the market helps you get ahead of your rivals and make smarter decisions around offering the best solution to your customers.

4. Fourthly, with Customer Insights, you can obtain market insights to help you define or improve your products and services. This helps you anticipate market demands and combine resources to satisfy your customers’ changing needs. Such adaptability can take your company to an elevated level of organisation and efficiency.

5. Last, and maybe most importantly, Customer Insights lets you gather feedback from your customer profiles about any of your products and compare it with product data in Power BI, as we’re going to see next. This is vitally important because you can quickly evaluate how your customer base rates your products and how they would like you to improve them. Maybe you can’t satisfy everyone, but you can get close to it. The best way of finding out what your customers want is by listening, and when customers feel listened to, they stick around.

3. Smash analyses with Power BI

Power BI, as we know, is a tool where a vast number of data sources can find each other to fill extensive dashboards with knowledge from multiple departments, specially prepared with the correct information for decision-making. Customer Insights is one of those sources. Using CI alone, you can focus minutely on your customers, but combined with Power BI, you can take essential information from CI and cross it with other insights. You can have a dashboard in which you analyse customer demand for a specific product (using data from Customer Insights) and, still in this same dashboard, see what the marketing budget is or familiarise yourself with the strategy for that product (using data from the Marketing or Finance department).

The level of enrichment is great, and if you’re wondering, “Why use Power BI to analyse my customer data when I have CI?” think of this basic example. Imagine bringing information together from anywhere you want and personalising your analyses just as you need. That’s the power of Power BI.

You import your customer data from CI into Power BI using a built-in Microsoft connector. After this, you simply create reports and dashboards as you normally would with Power BI.

Final Thoughts

It’s important to know that Customer Insights and Power BI are two separate tools. While the insights you get from CI can enhance the work of your sales team, marketing department and everyone else who deals with your customers, Power BI empowers the work of your leaders, helping them make decisions and implement strategies. When combined, these two tools become very powerful, and the value you get from them grows with use: the more you scale your analyses, the more you’ll see the benefits.

So, what are you waiting for? Gain the competitive edge by implementing CI for your company and building some cool reports by channelling the data it generates through Power BI. We want you to leverage your customer insights to be even better prepared for new and ever-changing market trends.
