Big Data: the state of the art

At Xpand IT, we cannot describe the state of the art of Big Data without reflecting on the sharp year-on-year increase in the adoption of Big Data technologies, among which we highlight the Confluent and Cloudera platforms.

Our history with Big Data began in 2013, “our” year zero in adopting the Hadoop ecosystem. The following year, we became convinced that Big Data could indeed be an extremely important trend and, at the risk of making the wrong decision, we formally created a Big Data unit at Xpand IT. From early on, we set out to adopt the avant-garde, innovative and disruptive technologies that would allow us to deliver on the promise of Big Data. For example, we adopted Spark, Kafka and Kudu as soon as they became available. Today, these technologies, especially Spark and Kafka, probably account for around 80% of our software development effort.

The Portuguese market has proven to be fertile ground for Big Data projects, and Xpand IT is present in almost every industry: retail, utilities, telecommunications, health, banking, mobility and transportation, among others.

Driven by the Digital Transformation paradigm, we have seen use cases suited to Big Data technology spread, both analytical and operational in nature. Pressure on time-to-data has also increased, which substantially reduces the time available to process data and extract information: this is the age of real-time analytics.

There are three big trends associated with Big Data: Data Science, Fast Data and Cloud.

1. Data Science

Concepts such as Customer Segmentation, Anomaly Detection and Predictive Analytics are not new. Fortunately, although there is still a long way to go, many companies are already investing in a data-driven strategy. However, the growing volume of data demands a different approach. Customer segmentation is a good example: until now, customers have been divided into a limited number of buckets, with segments defined by predetermined, rationalised criteria. Now it is more effective to use data science techniques to segment customer databases according to the factors known to influence and shape each customer's pattern of buying or adopting a service. Ultimately, our main objective is the segment of one: offering the right product to the right customer.
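To make the idea of data-driven segmentation more concrete, here is a minimal sketch of one common technique, k-means clustering, which groups customers by behavioural features instead of predefined rules. It is only an illustration under stated assumptions: the features (annual spend and purchases per month), the customer names and all values are hypothetical, and the naive "first k points" initialisation is chosen purely so the example is deterministic.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, iterations=20):
    """Minimal k-means: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    # Naive deterministic initialisation: use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(fmean(axis) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend in thousands of EUR, purchases per month).
customers = {
    "ana": (0.12, 1.0),
    "carla": (2.4, 9.0),
    "bruno": (0.15, 1.5),
    "diogo": (0.13, 1.2),
    "eva": (2.6, 10.0),
    "filipe": (2.5, 8.5),
}
names = list(customers)
labels, _ = kmeans([customers[n] for n in names], k=2)

# Group customer names by the segment (cluster label) they were assigned to.
segments = {}
for name, label in zip(names, labels):
    segments.setdefault(label, []).append(name)
print(segments)  # {0: ['ana', 'bruno', 'diogo'], 1: ['carla', 'eva', 'filipe']}
```

The features here were picked on roughly comparable scales; with real data, features are usually standardised first, because k-means relies on Euclidean distance, and the number of segments k is itself a modelling decision.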

2. Fast Data

To demystify the term: when we talk about fast data, we are not really talking about data being fast or slow. We are talking about the need to process a considerable number of small data points very quickly; that is the goal. Within the Hadoop platform specifically, we find fast data components such as Spark, Kafka and Kudu, which support the development of applications geared towards low-latency processing and data availability, while also scaling significantly and efficiently.
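The "many small data points, processed quickly" idea can be illustrated without any framework: instead of storing events and periodically recomputing over the whole batch, the result is updated incrementally as each event arrives. The sketch below keeps a per-key event count over a sliding time window. The class name, the 60-second window and the card identifiers are hypothetical; in practice this kind of stateful windowed aggregation would typically be delegated to a stream processor such as Spark consuming from Kafka rather than written by hand.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Hypothetical helper: counts events per key over the last
    `window_seconds`, updating the result incrementally on every event."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()           # (timestamp, key) in arrival order
        self._counts = defaultdict(int)  # key -> events currently in the window

    def add(self, timestamp, key):
        """Record one event and return the key's count in the current window.
        Assumes timestamps arrive in non-decreasing order."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._evict(now=timestamp)
        return self._counts[key]

    def _evict(self, now):
        # The window is (now - window_seconds, now]; drop anything older.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def counts(self):
        return dict(self._counts)


# Usage with made-up events: (seconds since start, card identifier).
window = SlidingWindowCounter(window_seconds=60)
stream = [(0, "card-A"), (10, "card-B"), (20, "card-A"), (65, "card-A"), (70, "card-A")]
latest = [window.add(ts, key) for ts, key in stream]
print(latest)           # [1, 1, 2, 2, 3]
print(window.counts())  # {'card-A': 3}
```

Each event costs a constant amount of amortised work, so the up-to-date count is available immediately after every event, regardless of how much history has already passed through.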

3. Cloud

One of the most discussed benefits of the Cloud is the cost reduction from eliminating data centres. However, the true advantage lies in the elasticity the Cloud provides: it allows the infrastructure (both processing and storage) to be adapted to the needs of each moment. Nowadays, even with Hadoop technology, it is possible to configure clusters that separate compute from storage. This makes it easier to scale each of them up or down independently, something that was previously not possible, even in the Cloud.

In summary, the process of Digital Transformation rests on three key points: there is no Digital Transformation without data; Data Science is necessary to make sense of data and extract real value from it; and the Cloud is extremely important because it provides the agility needed to adopt the technology and principles of Big Data.

Nuno Barreto

Partner & Big Data Lead, Xpand IT
