João Varela

jpsv

Data Scientist - Xpand IT

Machine Learning for fraud detection

5-SECOND SUMMARY:
  • Improving fraud detection systems allows organizations to protect their data more effectively;
  • Artificial Intelligence and Machine Learning allow you to identify patterns and trends in large amounts of data and help decision making in fraud detection;
  • Azure Stream Analytics is the tool that will help you.

Technological advancement is expanding faster than ever, but it also introduces us to new threats. New technology allows criminals to gain access to more sophisticated methods that are growing harder to detect.

This is why it is so important to upgrade our fraud detection systems in order to fight these new threats. Machines are far quicker than humans at analysing and processing large quantities of data, which drastically improves the reaction time to any attack or cybersecurity anomaly. A faster response reduces the damage that can be caused during an attack. Such detection systems can be built with Artificial Intelligence and Machine Learning, which use advanced mathematical techniques to identify patterns and trends in large quantities of data in order to automate processes and extract insights that help decision-making.

Machine Learning models use information extracted from data such as identities, orders, payment methods, locations or network data. Mathematical functions are then applied to this information in order to identify atypical events that can hint at a threat. Some of the threats that can be identified with these models are spam/phishing emails or chats, payment fraud and identity theft. These processes make it possible to process large quantities of security data and then pinpoint the most critical points, where attentive human intervention is required. This frees up security teams, allowing them to spend more time solving the problems that truly matter.
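To make the idea of "atypical events" concrete, here is a minimal sketch that flags unusual payment amounts with a robust statistic (the modified z-score, based on the median). The function name, the threshold of 3.5 and the sample payments are our own assumptions for illustration; a real fraud model would combine many more signals than a single amount.

```python
from statistics import median

def flag_atypical(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # all values (nearly) identical: nothing stands out
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > threshold]

# Hypothetical payment amounts; one of them is far outside the usual range.
payments = [12.5, 9.9, 14.0, 11.2, 10.4, 13.1, 950.0, 12.0]
suspicious = flag_atypical(payments)
print(suspicious)
```

Only the flagged indices would be passed on to the security team, which is the "human intervention where it matters" step described above.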

Microsoft Azure and fraud detection

When we talk about security, the response time to potential threats is a key factor. Azure offers tools that help with the development and productisation of these processes. Azure Stream Analytics is an analytics service for near-real-time data streams, developed for processing large amounts of data coming from different sources at the same time. This service facilitates integration in ML pipelines and is also able to trigger alerts, deliver data to a reporting tool or dashboard, or retain transformed data for future use.
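The kind of rule a streaming job evaluates can be sketched in a few lines of plain Python. This is not Azure Stream Analytics code (that service has its own query language); it is only an illustration of a sliding-window alert, and the class name, window size and event list are assumptions made up for the example.

```python
from collections import defaultdict, deque

class VelocityAlert:
    """Flags a card that makes more than `max_events` payments within `window_s` seconds."""

    def __init__(self, window_s=60, max_events=3):
        self.window_s = window_s
        self.max_events = max_events
        self.recent = defaultdict(deque)  # card id -> timestamps inside the window

    def observe(self, card_id, timestamp):
        times = self.recent[card_id]
        times.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window_s:
            times.popleft()
        return len(times) > self.max_events

detector = VelocityAlert(window_s=60, max_events=3)
# Hypothetical stream of (card id, time in seconds) events, in arrival order.
events = [("A", 0), ("B", 5), ("A", 10), ("A", 20), ("A", 30), ("A", 200)]
alerts = [(card, t) for card, t in events if detector.observe(card, t)]
print(alerts)
```

Each event is evaluated as it arrives, so an alert can be raised within moments of the suspicious payment rather than in a nightly batch.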

Watch our video!

Conclusion

Technologies such as Machine Learning and Data Science are state-of-the-art solutions for security problems, and the models they produce are robust, in the sense that they can adapt quickly to new types of threats. These solutions can be integrated into services in order to detect fraud in near real time (NRT), reducing the time needed to detect fraud, a key element in upgrading the security of any system. At Xpand IT we are fully capable of solving this and many other Data Science problems. Get in touch with us to start a conversation.

Personalisation in Data Science (and recommendation systems)

THIS ARTICLE IN 5 SECONDS:
  • Data science techniques are useful for identifying every customer and tailoring offers for them, using, for example, recommendation systems;
  • Technologies such as artificial intelligence and machine learning have changed the way in which customers interact with brands;
  • Data science is great for implementing dynamic user interfaces in apps (interfaces that adapt to how each person uses them).

Personalisation is increasingly a key factor in the modern paradigm of the digital market. The theory is simple: every person is unique, has their own taste and lifestyle and this is why everyone should be given a unique experience, suited to their individuality as much as possible.

In practice, Data Science techniques facilitate the analysis of every customer’s behaviour in order to personalise products and make customer interactions with them as natural and satisfying as possible.

Why is personalisation so useful to organisations?

If we think back a few years, we can see that online markets were using the same sales techniques as traditional markets. The product website looked the same to every customer, and the way they interacted with the product was also the same for all of them. The most recent innovations in data science, associated with big data technologies, have arrived to simplify the mass personalisation of digital products.

Customers are more demanding than ever, and it is becoming harder and harder to create an innovative product that brings value to the customer. That is why customising a product in such a way that the customer feels it was designed exclusively for them is so important. Currently, in the IT field, technologies such as artificial intelligence and machine learning are changing the way in which customers interact with brands, improving the user experience, raising conversion rates and reducing the churn rate.

Recommendation systems:

All of us have already experienced that terrifying sensation of receiving an advertisement for a product we were thinking of. No, your phone is not (yet) able to read your mind, but this situation is generated by recommendation systems.

Data science techniques allow us to take advantage of your history of interactions with various products to build a picture of your tastes and interests, making it possible to suggest new content that is different and unique for every customer. These systems are used for product recommendations in online stores, music and film recommendations on streaming platforms, the advertisements we see on the Internet and even new friend suggestions on social media platforms.
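One common way to turn an interaction history into suggestions is item-based collaborative filtering. The sketch below is a toy version under our own assumptions: the users, films and ratings are invented, and production systems use far larger data and more elaborate models.

```python
from math import sqrt

# Hypothetical interaction history: user -> {item: rating}
ratings = {
    "ana":   {"film_a": 5, "film_b": 4, "film_c": 1},
    "bruno": {"film_a": 4, "film_b": 5},
    "carla": {"film_c": 5, "film_d": 4},
    "diogo": {"film_a": 5, "film_d": 1},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user, top_n=2):
    seen = ratings[user]
    items = {i for r in ratings.values() for i in r}
    scores = {}
    for candidate in items - set(seen):
        cand_vec = item_vector(candidate)
        # Score = similarity to each item the user rated, weighted by that rating.
        scores[candidate] = sum(cosine(cand_vec, item_vector(i)) * r for i, r in seen.items())
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

print(recommend("bruno"))
```

Items the user has already rated are excluded, so the output contains only new content ranked by how closely it resembles what that user liked.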

Dynamic user interface:

Nowadays, apps display many more tools and features. More features imply a more complex user interface, with more buttons. However, each user is unique and, therefore, their behaviour when using an app will also differ from person to person.

Data Science facilitates the extraction of these users’ behavioural patterns, making it possible to reorganise the app interface for each user, therefore resulting in a unique, tailored user experience.
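In its simplest form, reorganising an interface per user can be as small as reordering a menu by how often each entry is used. The menu entries and click log below are hypothetical, and counting clicks stands in for the richer behavioural models a real product would use.

```python
from collections import Counter

DEFAULT_MENU = ["home", "search", "payments", "settings", "help"]

def personalised_menu(click_log, menu=DEFAULT_MENU):
    """Order menu entries by how often this user opened them; ties keep the default order."""
    usage = Counter(c for c in click_log if c in menu)
    return sorted(menu, key=lambda entry: (-usage[entry], menu.index(entry)))

# Hypothetical click history for one user.
clicks = ["payments", "search", "payments", "help", "payments", "search"]
print(personalised_menu(clicks))
```

A user with no history simply sees the default order, so the interface degrades gracefully for new customers.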

Microsoft Azure services:

Azure has various services that can speed up the development and productisation of this type of product. One of those services is Azure Cognitive Search, a search service in the cloud, with integrated AI abilities.

Cognitive Search facilitates an intelligent search of structured and unstructured data, based on the intention of the customer, in contrast to more traditional systems that rely on techniques such as keyword search. It enables a personalised search across different types of text documents, on the basis of pertinent information such as name, location, language, and more.

Watch our video!

Conclusion

We hope this article was able to convince you of the importance of personalising digital products, a key factor in today’s market that gives every user a unique experience and improves customer satisfaction and retention.

Some examples of using Data Science in this area are recommendation systems, frequently used in online shopping and streaming platforms, and the development of dynamic interfaces in apps, where changes are carried out based on user behaviour in order to allow a more comfortable interaction.

Data Science and the Demand Forecasting: how to predict sales

We live in an increasingly intelligent environment, where every decision has a greater impact on the world around us. It is therefore important that every company makes considered and informed decisions that lead not only to increased profit but also to the well-being and satisfaction of its customers.

One context in which it is crucial to make informed decisions is inventory management and optimisation. How many products should the company have in stock? How often should they be replenished? These are some of the extremely relevant questions in inventory management, and the answers can make all the difference to the company’s success.

What is demand forecasting all about?

This is where the concept of demand forecasting arises, which consists of using the historical sales records of a given product to estimate future demand. Having an estimate of how many products will be sold allows for better financial management, and also the calculation of profit margins, cash flows, the allocation of resources, and the optimisation of the production and storage of products.

Incorrect inventory management can lead to two types of problems:

  • The customer wants to purchase a product, but there is no stock available. If this happens, you have not only lost a vital sales opportunity, leading to a decrease in profits, but also generated customer dissatisfaction.
  • Too many products have been made and remain unsold. This problem is especially relevant when the product’s shelf life is short, as with baked goods. In this case, production and storage costs will never be recouped, i.e. there is inevitably a loss.

How can we help?

Since demand forecasting takes advantage of historical data to make an estimate for the future, we are facing a Data Science problem. Using Machine Learning, we can build mathematical models that simulate market behaviour in as much detail as possible, reducing the difference between the estimated number of purchases and the number actually made.
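The "difference between the estimate and the real value" is usually measured with an error metric. As a minimal sketch, the code below uses a moving average as a stand-in for the forecasting model and the mean absolute error (MAE) as the metric; both choices, and the sales figures, are assumptions for illustration rather than the approach of any particular project.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def mean_absolute_error(actual, predicted):
    """Average size of the forecast error, in the same unit as the sales."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical units sold per day.
daily_sales = [20, 22, 21, 25, 24, 26, 30, 28]

# Walk forward through the series: forecast each day from the days before it.
start = 3
forecasts = [moving_average_forecast(daily_sales[:t]) for t in range(start, len(daily_sales))]
actuals = daily_sales[start:]
mae = mean_absolute_error(actuals, forecasts)
print(forecasts, round(mae, 2))
```

Evaluating each forecast only on data that came before it mirrors how the model would be used in practice, and gives a baseline that any Machine Learning model should beat.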

The biggest challenge for this kind of analysis is the high number of external factors directly affecting the number of purchases, which are not always easy to take into account. Seasonality, weather, events near the site, competitive analyses and promotions are just some of these factors, and it is essential to enrich historical sales data with this kind of information in order to find more significant sales patterns.

A major advantage of using Machine Learning models to forecast market demand is their explainability. From these models, it is possible to extract what factors are contributing positively or negatively to sales figures, and the decision-making process can take this into account in order to minimise negative factors in future wherever possible.
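How factors can be read from a model depends on the model. As a sketch, assume the simplest interpretable case, a linear regression fitted with ordinary least squares, where each coefficient is the estimated effect of one factor on sales. The data, the two factors (promotion and rain) and the helper names are invented for this example.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical days: (promotion active, rainy day) -> units sold
features = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0)]
sales = [100, 120, 90, 110, 100, 120]

intercept, promo_effect, rain_effect = fit_linear(features, sales)
print(round(intercept), round(promo_effect), round(rain_effect))
```

A positive coefficient marks a factor that pushes sales up and a negative one a factor that pulls them down, which is the kind of information the decision-making process can act on.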

Azure Machine Learning

The Azure Machine Learning platform streamlines the entire Data Science process, from the feasibility analysis with interactive Notebooks to the creation and production of models, facilitating their registration and the continuous delivery of new models. The platform also makes it possible to trigger monitoring resources using Azure Application Insights, automatically recording information about the model in production; that is, what values were received and returned by the model (enabling the identification of data deviations or drifts that indicate the need to retrain the model), as well as response times, the number of requests and identification of any errors during the stock forecast.
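The drift check that such monitoring enables can be sketched independently of any platform. The code below does not use Azure Application Insights; it only illustrates one simple test, comparing the mean of recent model inputs with the training data, and the function names, threshold and values are assumptions.

```python
from statistics import mean, stdev

def drift_score(training_values, production_values):
    """How many training standard deviations the production mean has moved."""
    spread = stdev(training_values)
    if spread == 0:
        return 0.0
    return abs(mean(production_values) - mean(training_values)) / spread

def needs_retraining(training_values, production_values, threshold=2.0):
    """True when the input feature has shifted by more than `threshold` deviations."""
    return drift_score(training_values, production_values) > threshold

# Hypothetical values of one input feature (a product price).
train_prices = [9.5, 10.0, 10.5, 10.0, 9.8, 10.2]
stable_week = [10.1, 9.9, 10.3, 9.7]
shifted_week = [14.8, 15.2, 15.0, 14.9]

print(needs_retraining(train_prices, stable_week), needs_retraining(train_prices, shifted_week))
```

When the check fires, the recorded production inputs become candidates for the next training set.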

Watch our video and discover how to predict sales!

Final thoughts

From the discussion above, we conclude that inventory management is a very important factor in optimising company profits, and that demand forecasting is the technique that allows us to estimate the likely number of future purchases, which is fundamental data for more accurate, thoughtful management. We also conclude that Machine Learning models are a viable solution for calculating these estimates and that it is extremely important to enrich sales records with external factors that add value and allow the creation of more accurate models.

At Xpand IT we are prepared to solve this, or any other kind of Data Science problem. Please contact us.

Data Science and preventive maintenance: prevention is key!

Mechanical problems can stop production lines for hours, leading to a decrease in production and consequently a decrease in profit for the whole factory.

With the emergence of Industry 4.0 and Internet of Things (IoT) technologies, new solutions and challenges have been born. More and more machines are connected to the cloud, allowing the collection of sensor data regarding their state over time. It is this collection of data from the machines’ sensors that allows the use of Machine Learning algorithms. These algorithms can identify the state of degradation of the machine, in order to optimise its maintenance process.

Thus, the concept of preventive maintenance arises, consisting of more intelligent maintenance with the main objective of optimising the maintenance periods of the machines. But how do machines collect this data? We explain.

How does data acquisition work?

The first phase of this type of project is data collection, or the creation of a dataset. Ideally, the values of each sensor will be collected from the machines over time, forming a time-series dataset, i.e. a discrete sampling of data that forms a sequence of values, each with an associated timestamp. This dataset will later be used to train the model, so it is important that the data represents the operation of the system as well as possible. Two factors that must be taken into account during data acquisition are the sampling frequency, which must be in accordance with the system’s operating frequency, and the reduction of external factors that may introduce noise into the collected signal.
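A time series of this kind can be represented as a list of (timestamp, value) pairs. The sketch below brings irregular readings onto a fixed sampling period by averaging each bucket, which also smooths some noise. The sensor values, the 10-second period and the function name are assumptions chosen for the example.

```python
from collections import defaultdict

def resample_mean(readings, period_s):
    """Average (timestamp_s, value) readings into fixed buckets of `period_s` seconds."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        bucket_start = (timestamp // period_s) * period_s
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Hypothetical temperature sensor sampled at irregular moments (seconds, degrees C)
raw = [(0, 70.0), (4, 71.0), (11, 72.0), (19, 74.0), (21, 90.0)]
series = resample_mean(raw, period_s=10)
print(series)
```

Choosing the period is the sampling-frequency decision mentioned above: too long and short-lived events are averaged away, too short and the dataset fills with noise.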

Modelling approaches in preventive maintenance

In any Data Science project, it is important to define the question we want our model to answer. Doing so lets us not only choose the most appropriate modelling approach but also set concrete accuracy goals for it. We can identify three types of questions that preventive maintenance projects set out to answer:

Will the machine break down within a specific time period?

This question leads us to a classification problem, whose possible answers are finite and well defined. In this case, given a set of the most recent data on machine behaviour, we want to identify whether or not there will be a problem during a time window: in the next 24 hours, 7 days or month, for example. For more complex cases, it may also be possible to identify what type of malfunction or which part may be contributing to greater wear and tear on the machine.
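Before a classifier can be trained, each point in the history needs a label that answers this question. A minimal sketch of that labelling step, with a hypothetical breakdown log and a 24-hour window:

```python
def label_failure_within(timestamps, failure_times, horizon):
    """Label each timestamp 1 if a failure occurs in (t, t + horizon], else 0."""
    return [int(any(t < f <= t + horizon for f in failure_times)) for t in timestamps]

hours = [0, 12, 24, 36, 48, 60]
failures = [50]  # hypothetical breakdown recorded at hour 50
labels = label_failure_within(hours, failures, horizon=24)
print(labels)
```

The sensor readings at each timestamp become the model inputs and these 0/1 labels become the target it learns to predict.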

How long do we have until the next breakdown?

The answer to this type of question is not so clear-cut, since it consists of a number that can vary continuously; that is, we are facing a regression problem. This approach allows us to identify the expected time window for a system failure in a more detailed way. This metric is also known as Remaining Useful Life (RUL), and it makes it possible to perform maintenance on the machine before it reaches its critical point.
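For the regression framing, the training target is the time left until the next recorded failure. The sketch below derives that target from a hypothetical breakdown log; the function name and the hours used are assumptions.

```python
def remaining_useful_life(timestamps, failure_times):
    """Time until the next recorded failure for each timestamp (None if no later failure)."""
    failure_times = sorted(failure_times)
    targets = []
    for t in timestamps:
        upcoming = [f for f in failure_times if f >= t]
        targets.append(upcoming[0] - t if upcoming else None)
    return targets

hours = [0, 12, 24, 36, 48, 60]
rul = remaining_useful_life(hours, failure_times=[50])
print(rul)
```

Timestamps with no later failure have no known target, so they are typically left out of training rather than given a made-up value.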

Is the machine working as expected?

This approach is particularly useful in situations where we do not have access to historical machine breakdown records. In this case, our model will be predicting the expected behaviour of the machine based on its last moments of operation. If there is a large difference between the predicted values and the values collected from the machine, we are facing a case where the machine is not working as expected, i.e., there is the possibility of a problem occurring and the consequent need for maintenance.
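This comparison between predicted and observed values can be sketched with a deliberately naive behaviour model: expect the next reading to match the recent average. The window, tolerance and vibration readings below are assumptions, and a real project would replace the average with a trained model.

```python
def expected_next(history, window=3):
    """Naive behaviour model: expect the next reading to equal the recent average."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def unexpected_readings(series, window=3, tolerance=5.0):
    """Return indices where the reading differs from the expected value by more than `tolerance`."""
    flagged = []
    for i in range(window, len(series)):
        if abs(series[i] - expected_next(series[:i], window)) > tolerance:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings with one sudden spike.
vibration = [50.0, 51.0, 49.5, 50.5, 50.0, 62.0, 50.2]
print(unexpected_readings(vibration))
```

No breakdown records are needed here: the model only has to learn what normal operation looks like.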

Do you know Azure Machine Learning?

Azure Machine Learning is an Azure service that was developed with the goal of facilitating and speeding up the entire lifecycle of an ML model, from data analysis to model training and subsequent model production. Some tools that help accelerate this lifecycle are:

  • Collaborative notebooks: These allow the creation of identical development environments for the whole team, as well as the collaboration of several members in the same file. This helps the knowledge sharing process at an earlier stage of data preprocessing, analysis and visualisation.
  • AutoML: This tool gives us the ability to build models automatically and much faster, allowing us to get to market earlier. AutoML tries to optimise the iterative modelling phase by automatically choosing the best features, models that best fit the specific data type and the best parameters for those models.
  • Drag and drop: For rapid prototyping, or for those who are not as comfortable with writing code, Azure provides a block-based visual platform that allows you to quickly create pipelines for data transformation, model training and productisation.

This Azure service also contains a very useful feature for model registration, making it easier to version and continuously deliver new models. This functionality can also provide a REST API that serves as a communication interface with the model created, which makes its integration in any environment easy.  
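From the client side, talking to a model behind a REST API comes down to sending a JSON request. The sketch below only builds such a request with the Python standard library and does not send it. The URL, the key and the `{"data": ...}` payload shape are assumptions: the real input format is defined by whoever deploys the model.

```python
import json
import urllib.request

def build_scoring_request(url, api_key, rows):
    """Build (but do not send) a JSON POST request for a model scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Hypothetical endpoint, key and sensor readings, for illustration only.
request = build_scoring_request(
    "https://example.invalid/score", "<api-key>", [[70.5, 0.8], [73.0, 0.9]]
)
# Sending it would look like: urllib.request.urlopen(request).read()
print(request.get_method(), request.data.decode("utf-8"))
```

Because the interface is plain HTTP and JSON, any system that can make a web request can consume the model's predictions.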

Watch our video and discover everything!

Final thoughts

In short, the objective of preventive maintenance is to optimise the costs associated with the operation of machines or other types of systems that need maintenance, and there are several different valid approaches. The choice of the correct approach depends on the problem under study, as well as on the requests and requirements of each client.

At Xpand IT we are prepared to solve this, or any other kind of data science problem. Please contact us.

Data Science is the future (what is our definition?)

Data Science is the future, and the future is here.

We’re in 2021, and the future is already here. We aren’t seeing cars flying everywhere yet, and Elon Musk hasn’t sent people to Mars, but we can already buy a robot to clean our house, we all have a supercomputer in our pocket, and, according to The Economist, the world’s most important resource is no longer oil, but data.

Data-driven companies use large amounts of data behind the scenes to improve decision making, predict market trends, and improve the overall customer experience by personalising their products accordingly. The increasing use of smartphones and the rise of the Internet of Things generate inconceivable amounts of data every day, and recent studies predict that the world will store 200 zettabytes of data by 2025. Data science is capable of extracting knowledge and providing deeper insight into customer preferences from these large datasets, through statistics and in-depth analysis.

Data science can be extended to multiple fields, as it uses mathematical techniques and theories as well as computer science processes and algorithms to understand and extract meaningful information from different types of data (tabular, text, time-series, images, and much more).  Data science covers numerous areas such as statistics, machine learning, programming, analytics and data visualisation. Data scientists should master these areas and develop statistical models that detect patterns and trends in large volumes of data. They can be considered storytellers who present data insights to decision-makers in an understandable way.

How does it work? What types of problem can it solve?

Historians say that it is important to study the past in order not to repeat the same mistakes in the future. In most cases, data science tries to apply that idea: it makes use of historical data to predict or analyse similar future outcomes using probabilities and statistics. Complex algorithms take advantage of accumulated data to find meaningful patterns and behaviours, which can later be applied to predict values or events.

We can find data science applications in pretty much every industry, from grocery shop stock management to competitive sports analysis. Although there are many more data science applications, most use cases fall into one of the common problem types listed below:

  • Classification is used when we need to predict a given data point’s category or label. Social media is increasingly used to spread fake news, and classification algorithms are being developed to detect these posts as soon as possible. They can also be used to automatically detect spam messages or to analyse customer sentiment based on product reviews, placing each opinion in a category such as good, bad, angry or happy. Automating processes like these makes the acquisition of useful information faster, ultimately reducing time to market.

  • Anomaly detection, as the name suggests, has the goal of identifying values outside the ordinary. One of the biggest challenges in the finance industry is the fight against fraud. Fraudulent transactions, phishing scams and anomalous transactions are some of the irregularities that can be detected. We’re witnessing the fourth industrial revolution, the rise of digital industrial technology. One application of data science in Industry 4.0 is early anomaly detection in manufacturing machines, which can have a huge impact on detecting deterioration, preventing major part failures and decreasing unplanned downtime costs.

  • Forecast approaches can tell us when or how an event is likely to occur. Data Science is being applied to healthcare and it is helping professionals save lives during pandemic times, either by forecasting the next spike or predicting the length of patient hospitalisation. Some football teams are using data science to win games; they try to predict player performances and market values. Liverpool Football Club data scientists analysed thousands of games to predict which areas of the pitch are best to use at any time. Another area where forecasts can have a huge impact is the evolution of the client-company relationship. A good example here is the customer churn rate prediction that our last blog post discusses in greater detail.

  • “You may also like” sections in online shops or movie services are built using recommendation systems. Netflix has a large dataset containing user interactions such as what time of the day customers watch, for how long, and on what device, as well as film trends, most-watched actors and much more. Using this data, they estimate the films or shows of greatest interest to every user, making their product highly personalised. These recommendation systems are being used in social media too, finding the users or pages each person is most likely to connect with.

  • Technology is evolving rapidly: every day, more and more expensive computations become feasible, and with this comes the ability to develop larger and more complex models and algorithms. Recognition is a futuristic application of data science that we already have around us, with techniques for extracting meaningful information from images or sounds. We can unlock our phones using the camera or generate a transcript of a chat. Let’s be honest, for the laziest of us, telling Alexa to turn on the lights always feels stylish.

What is the Xpand IT Data Science framework?

At Xpand IT we have defined a data science process that aims to mitigate the natural uncertainty of data science projects with a structured approach based on agile methodologies (specifically Scrum and our own development framework, XPAGILE). This way we can approach these projects in an agile way without losing focus on what matters the most: delivering quality results on time.

As data scientists ourselves, we developed this process so we would not skip any of the steps we deem fundamental, while being able to execute and improve your vision. Some of the advantages the framework ensures for your solutions:

  • Risk reduction: expert knowledge and best practices are applied from day one and, thanks to the phase-based approach, we go as far as possible.
  • Maximising obtained value: we identify the problem from day one, guarantee the relevance of the goal, and adjust it on every iteration.
  • Viable solution: we check that the final solution is viable; productising the project ensures that it is not just another experiment but the solution to the problem.
  • Project specificity: we assure each and every project’s quality without falling into the trap of “one solution fits all”.

Final remarks

Data Science can be applied to a wide range of businesses, and it is not easy to define the main advantages for all use cases. Since every case is unique, it is vital to evaluate the specific problem and then identify the best scenario opportunities. The main goal of this post is to show the reader examples of real data science applications and demystify its usage.

Our Data Science unit is ready to help you with any unique use case, and our goal is to deliver value throughout the lifecycle of the project, while focusing on understanding your business and helping you to create and deploy the required technology.
