João Varela

Data Science and Demand Forecasting: how to predict sales

3 min

We live in an increasingly intelligent environment, where every decision has a greater impact on the world around us. It is therefore important that every company makes considered and informed decisions that lead not only to increased profit but also to the well-being and satisfaction of its customers.

One context in which it is crucial to make informed decisions is inventory management and optimisation. How many products should the company have in stock? How often should they be replenished? These are some of the extremely relevant questions in inventory management, and the answers can make all the difference to the company’s success.

What is demand forecasting all about?

This is where the concept of demand forecasting arises, which consists of using the historical sales records of a given product to estimate future demand. Having an estimate of how many products will be sold allows for better financial management, and also the calculation of profit margins, cash flows, the allocation of resources, and the optimisation of the production and storage of products.

Incorrect inventory management can lead to two types of problems:

👉 The customer wants to purchase a product, but there is no stock available. If this happens, not only have you lost a vital sales opportunity, leading to a decrease in profits, but you have also generated customer dissatisfaction.

👉 Too many products have been made and remain unsold. This problem is especially relevant when the product’s shelf life is short, as with baked goods. In this case, production and storage costs will never be recouped, i.e. there is inevitably a loss.

How can we help?

Since demand forecasting takes advantage of historical data to make an estimate of the future, we are facing a Data Science problem. We can build mathematical models, using Machine Learning, that simulate market behaviour with the highest level of detail possible, reducing the difference between the estimated number of purchases and the real number of purchases made.

The biggest challenge for this kind of analysis is the high number of external factors directly affecting the number of purchases, which are not always easy to take into account. Seasonality, weather, events near the site, competitive analyses and promotions are just some of these factors, and it is essential to enrich historical sales data with this kind of information in order to find more significant sales patterns.
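As an illustration, enriching a sales history with external factors can be as simple as joining tables on the date and deriving calendar features. The sketch below uses invented values and hypothetical column names (`units_sold`, `temperature`, `promotion`) purely for demonstration:

```python
import pandas as pd

# Hypothetical daily sales records for one product.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
    "units_sold": [120, 95, 140],
})

# Hypothetical external factors gathered separately.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
    "temperature": [14.0, 9.5, 17.2],
    "promotion": [0, 0, 1],
})

# Join the external features onto the sales history and derive
# calendar features that help capture seasonality.
enriched = sales.merge(weather, on="date", how="left")
enriched["day_of_week"] = enriched["date"].dt.dayofweek
enriched["month"] = enriched["date"].dt.month
print(enriched.columns.tolist())
```

The richer the feature set, the more sales patterns the model can pick up.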

A major advantage of using Machine Learning models to forecast market demand is their explainability. From these models, it is possible to extract what factors are contributing positively or negatively to sales figures, and the decision-making process can take this into account in order to minimise negative factors in future wherever possible.
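For example, tree-based models expose feature importances directly. The toy sketch below (synthetic data and hypothetical feature names, not a real sales dataset) shows how a factor such as promotions can be identified as the main driver of sales:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic feature matrix: hypothetical factors affecting sales.
feature_names = ["temperature", "promotion", "day_of_week"]
X = rng.random((200, 3))
# In this toy data, sales are driven mostly by promotions (column 1).
y = 50 + 100 * X[:, 1] + 5 * X[:, 0] + rng.normal(0, 1, 200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```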

Azure Machine Learning

The Azure Machine Learning platform streamlines the entire Data Science process, from feasibility analysis with interactive Notebooks to the creation and productisation of models, facilitating their registration and the continuous delivery of new models. The platform also makes it possible to enable monitoring through Azure Application Insights, automatically recording information about the model in production: what values were received and returned by the model (enabling the identification of data deviations or drifts that indicate the need to retrain it), as well as response times, the number of requests and any errors during stock forecasting.

Watch our video and discover how to predict sales!

Final thoughts

From the discussion above, we can conclude that inventory management is a very important factor in optimising company profits, and that demand forecasting is the technique that allows us to estimate the likely number of future purchases, which is fundamental data for more accurate, considered management. We also conclude that Machine Learning models are a viable solution for calculating these estimates, and that it is extremely important to enrich sales records with external factors that add value and allow the creation of more accurate models.

At Xpand IT we are prepared to solve this, or any other kind of Data Science problem. Please contact us.


Data Science and preventive maintenance: prevention is key!

4 min

Mechanical problems can stop production lines for hours, leading to a decrease in production and consequently a decrease in profit for the whole factory.

With the emergence of Industry 4.0 and Internet of Things (IoT) technologies, new solutions and challenges have been born. More and more machines are connected to the cloud, allowing the collection of sensor data about their state over time. It is this data from the machines’ sensors that makes it possible to apply Machine Learning algorithms, which can identify the machine’s state of degradation in order to optimise its maintenance process.

Thus the concept of preventive maintenance arises: more intelligent maintenance whose main objective is to optimise the maintenance periods of the machines. But how do machines collect this data? We explain.

How does data acquisition work?

The first phase of this type of project is data collection: the creation of a dataset. Ideally, the values of each of the machine’s sensors are collected over time, forming a time-series dataset, i.e. a discrete sampling of data that forms a sequence of values, each with an associated timestamp. This dataset will later be used to train the model, so it is important that it represents the operation of the system as faithfully as possible. Two factors that must be taken into account during acquisition are the sampling frequency, which must match the system’s operating frequency, and the reduction of external factors that may introduce noise into the collected signal.
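As a minimal sketch of this step, the snippet below builds a hypothetical time series of vibration readings (all sensor names and values are invented for illustration) and resamples it to a frequency suited to the system, smoothing out high-frequency noise:

```python
import numpy as np
import pandas as pd

# Hypothetical vibration sensor sampled once per second for ten minutes.
timestamps = pd.date_range("2021-01-01", periods=600, freq="s")
rng = np.random.default_rng(0)
readings = pd.DataFrame({
    "timestamp": timestamps,
    "vibration": np.sin(np.linspace(0, 20, 600)) + rng.normal(0, 0.05, 600),
})

# Resample to one-minute means: a sampling rate chosen to match the
# system's operating frequency while smoothing high-frequency noise.
per_minute = readings.set_index("timestamp").resample("1min").mean()
print(per_minute.shape)
```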

Modelling approaches in preventive maintenance

In any Data Science project, it is important to define the question we want our model to answer. This not only determines the most appropriate modelling approach but also lets us set concrete accuracy goals for that approach. We can identify three types of question that address preventive maintenance problems:

Will the machine break down within a specific time period?

This question leads us to a classification problem, whose possible answers are finite and well defined. In this case, given a set of the most recent data on machine behaviour, we want to identify whether or not there will be a problem during a time window: in the next 24 hours, 7 days or month, for example. For more complex cases, it may also be possible to identify what type of malfunction it is, or which part may be contributing to greater wear and tear on the machine.
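One way such a classification target can be built, sketched here with a hypothetical hourly log and an invented failure timestamp: each observation is labelled 1 if the failure falls within the next 24 hours.

```python
import pandas as pd

# Hypothetical hourly machine log and a known (invented) failure time.
log = pd.DataFrame({"timestamp": pd.date_range("2021-01-01", periods=72, freq="h")})
failure_time = pd.Timestamp("2021-01-03 12:00")

# Binary classification target: does the machine fail in the next 24 hours?
horizon = pd.Timedelta(hours=24)
log["fails_within_24h"] = (
    (failure_time > log["timestamp"])
    & (failure_time <= log["timestamp"] + horizon)
).astype(int)
print(log["fails_within_24h"].sum())
```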


How long do we have until the next breakdown?

The answer to this type of question is not so clear-cut, since it consists of a number that can vary continuously; that is, we are facing a regression problem. This approach allows us to identify the expected time window for a system failure in a more detailed way. The metric is also known as Remaining Useful Life (RUL), and estimating it makes it possible to perform maintenance on the machine before it reaches its critical point.
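In a run-to-failure history, the RUL label for a regression model can be derived by counting backwards from the failure point. A minimal sketch with a hypothetical ten-cycle history:

```python
import pandas as pd

# Hypothetical run-to-failure history: the machine fails at cycle 10.
history = pd.DataFrame({"cycle": range(1, 11)})
failure_cycle = 10

# Remaining Useful Life: how many cycles are left before the failure.
history["rul"] = failure_cycle - history["cycle"]
print(history["rul"].tolist())
```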


Is the machine working as expected?

This approach is particularly useful in situations where we do not have access to historical machine breakdown records. In this case, our model will be predicting the expected behaviour of the machine based on its last moments of operation. If there is a large difference between the predicted values and the values collected from the machine, we are facing a case where the machine is not working as expected, i.e., there is the possibility of a problem occurring and the consequent need for maintenance.
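A simple way to operationalise this idea is to compare the model’s predicted “normal” signal with the observed readings and flag large residuals. The sketch below injects an artificial deviation into a synthetic signal (no real machine data involved):

```python
import numpy as np

# Hypothetical signals: the model's predicted "normal" behaviour and the
# readings actually collected from the machine.
predicted = np.sin(np.linspace(0, 6, 100))
observed = predicted.copy()
observed[70:75] += 2.0  # artificially injected deviation

# Flag samples whose residual exceeds a tolerance threshold.
residual = np.abs(observed - predicted)
threshold = 0.5
anomalies = np.flatnonzero(residual > threshold)
print(anomalies.tolist())
```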

Do you know Azure Machine Learning?

Azure Machine Learning is an Azure service developed to facilitate and speed up the entire lifecycle of an ML model, from data analysis to training and subsequent productisation. Some tools that help accelerate this lifecycle are:

  • Collaborative notebooks: These allow the creation of identical development environments for the whole team, as well as the collaboration of several members in the same file. This helps the knowledge sharing process at an earlier stage of data preprocessing, analysis and visualisation.
  • AutoML: This tool gives us the ability to build models automatically and much faster, allowing us to get to market earlier. AutoML tries to optimise the iterative modelling phase by automatically choosing the best features, models that best fit the specific data type and the best parameters for those models.
  • Drag and drop: For rapid prototyping, or for those who are not as comfortable writing code, Azure provides a visual, block-based designer that allows you to quickly create pipelines for data transformation, model training and productisation.

This Azure service also contains a very useful feature for model registration, making it easier to version and continuously deliver new models. It can also expose a REST API that serves as a communication interface with the created model, which makes it easy to integrate in any environment.
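Calling such a REST endpoint typically amounts to posting a JSON payload with the model’s input features. The sketch below only builds the request body; the endpoint URL, key and feature order are placeholders, not real values:

```python
import json

# Hypothetical scoring request for a model served behind a REST API.
# Feature order is illustrative: temperature, promotion flag, day of week.
payload = {"data": [[14.0, 1, 2]]}
body = json.dumps(payload)

# An actual call would post the body to the service (placeholders, not
# real values):
#   import requests
#   response = requests.post(scoring_uri, data=body,
#                            headers={"Authorization": f"Bearer {key}",
#                                     "Content-Type": "application/json"})
print(body)
```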

Watch our video and discover everything!

Final thoughts

In short, the objective of preventive maintenance is to optimise the costs associated with the operation of machines or other types of systems that need maintenance, and there are several different valid approaches. The choice of the correct approach depends on the problem under study, as well as on the requests and requirements of each client.

At Xpand IT we are prepared to solve this, or any other kind of data science problem. Please contact us.


Data Science is the future (what is our definition?)

4 min

Data Science is the future, and the future is here.

We’re in 2021, and the future is already here. We aren’t seeing cars flying everywhere yet, and Elon Musk hasn’t sent people to Mars, but we can already buy a robot to clean our house, we all have a supercomputer in our pocket, and, according to The Economist, the world’s most important resource is no longer oil, but data.

Data-driven companies use large amounts of data behind the scenes to improve decision-making, predict market trends, and enhance the overall customer experience by personalising their products accordingly. The increasing use of smartphones and the rise of the Internet of Things generate inconceivable amounts of data every day, and recent studies predict that the world will store 200 zettabytes of data by 2025. Data science is capable of extracting knowledge and providing deeper insight into customer preferences from these large datasets, through statistics and in-depth analysis.

Data science can be extended to multiple fields, as it uses mathematical techniques and theories as well as computer science processes and algorithms to understand and extract meaningful information from different types of data (tabular, text, time-series, images, and much more). Data science covers numerous areas such as statistics, machine learning, programming, analytics and data visualisation. Data scientists should master these areas and develop statistical models that detect patterns and trends in large volumes of data. They can be considered storytellers who present data insights to decision-makers in an understandable way.

How does it work? What types of problem can it solve?

Historians say that it is important to study the past in order not to repeat the same mistakes in the future. In most cases, data science applies that idea: it makes use of historical data to predict or analyse similar future outcomes using probabilities and statistics. Complex algorithms take advantage of accumulated data to find meaningful patterns and behaviours, which can later be applied to predict values or events.

We can find data science applications in pretty much every industry, from grocery shop stock management to competitive sports analyses. Although there are many more, most use cases fall into one of the common problem types listed below:

  • Classification is used when we need to predict a given data point’s category or label. Social media is increasingly used to spread fake news, and classification algorithms are being developed to detect these posts as soon as possible. They can also be used to automatically detect spam messages or analyse customer sentiment based on product reviews, classifying each opinion as good, bad, angry, happy, etc. Automating processes like these makes the acquisition of useful information faster, ultimately reducing time to market.


  • Anomaly detection, as the name suggests, has the goal of identifying values outside the ordinary. One of the biggest challenges in the finance industry is the fight against fraud: fraudulent transactions, phishing scams and anomalous transactions are some of the irregularities that can be detected. We’re also witnessing the fourth industrial revolution, the rise of digital industrial technology. One application of data science in industry 4.0 is early anomaly detection in manufacturing machines, which can have a huge impact on detecting deterioration, preventing major part failures and decreasing unplanned downtime costs.


  • Forecast approaches can tell us when or how an event is likely to occur. Data Science is being applied to healthcare and it is helping professionals save lives during pandemic times, either by forecasting the next spike or predicting the length of patient hospitalisation. Some football teams are using data science to win games; they try to predict player performances and market values. Liverpool Football Club data scientists analysed thousands of games to predict which areas of the pitch are best to use at any time. Another area where forecasts can have a huge impact is the evolution of the client-company relationship. A good example here is the customer churn rate prediction that our last blog post discusses in greater detail.


  • ‘You may also like’ sections in online shops or movie services are built using recommendation systems. Netflix has a large dataset containing user interactions such as what time of day customers watch, for how long, and on what device, as well as film trends, most-watched actors and much more. Using this data, they estimate the films or shows of greatest interest to every user, making their product highly personalised. These recommendation systems are also used in social media, suggesting the users or pages someone is most likely to connect with.


  • Technology is evolving rapidly: every day more computationally expensive workloads become feasible, and with this comes the ability to develop larger and more complex models and algorithms. Recognition is a futuristic application of data science that is already all around us, with techniques for extracting meaningful information from images or sounds. We can unlock our phones with the camera or generate a transcript of a chat. Let’s be honest: for the laziest of us, telling Alexa to turn on the lights always feels stylish.
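To make one of these problem types concrete, the hedged sketch below applies a generic anomaly detector (scikit-learn’s IsolationForest, fed with invented transaction amounts) to flag outliers such as the fraudulent transactions mentioned above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Invented transaction amounts: mostly routine, plus two extreme outliers.
normal = rng.normal(50, 10, size=(200, 1))
fraud = np.array([[500.0], [700.0]])
amounts = np.vstack([normal, fraud])

# Fit a generic anomaly detector; predict() returns -1 for anomalies.
detector = IsolationForest(contamination=0.01, random_state=0).fit(amounts)
labels = detector.predict(amounts)
print(labels[-2:])
```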


What is the Xpand IT Data Science framework?

At Xpand IT we have defined our own data science process, which aims to mitigate the natural uncertainty of data science projects with a structured approach based on agile methodologies (specifically Scrum and our own development framework, XPAGILE). This way we can approach these projects in an agile manner without losing focus on what matters most: delivering quality results on time.

As data scientists ourselves, we developed this process so we would not miss any of the steps we deem fundamental, while remaining able to execute and improve your vision. Some of the advantages the framework brings to your solutions:

  • Risk reduction: expert knowledge and best practices are applied from day one, and the phase-based approach lets us advance only as far as the results justify.
  • Maximising the value obtained: we identify the problem from day one, guarantee the goal’s relevance, and adjust it on every iteration.
  • Viable solution: we check that the final solution is viable; productising the project ensures that it wasn’t just another experiment but a real solution to the problem.
  • Project specificity: we ensure each and every project’s quality without falling into the trap of “one solution fits all”.

Final remarks

Data Science can be applied to a wide range of businesses, and it is not easy to define the main advantages for all use cases. Since every case is unique, it is vital to evaluate the specific problem and then identify the best opportunities for it. The main goal of this post is to show the reader examples of real data science applications and to demystify their usage.

Our Data Science unit is ready to help you with any unique use case, and our goal is to deliver value throughout the project’s lifecycle, while focusing on understanding your business and helping you create and deploy the required technology.
