Latest news

Power Platform World Tour: Our experience

In the last week of August, Xpand IT travelled once again to London, where the first European stop of the 2019 Power Platform World Tour took place on the 28th and 29th. We departed Lisbon with some expectations we hoped would be fulfilled: we wanted to understand this platform even better – a platform which is experiencing interesting growth – and also get a glimpse into its future.

For those unfamiliar with the Power Platform: it brings together three Microsoft products – PowerApps, Flow and Power BI – into a single platform that promises to streamline and promote the digital transformation of organizations. These tools enable the digitalization and automation of internal processes and have enormous potential to transform the way companies manage their processes and make their decisions. With them, companies can make informed decisions with agility and with technology-based processes, and take advantage of the benefits that come from that.

The Event

Getting back to London, though… the event offered us two full days of interesting content where it was possible to meet the growing and enthusiastic Power Platform community, to explore the challenges that different industries are tackling with PowerApps and, no less important, to get a dose of inspiration from the showcased solutions and from how various companies are already taking advantage of these technologies. With The Shard as a backdrop, the event was a community get-together and a genuine sharing of experiences… in fact, one of Microsoft’s most powerful messages is the Power Platform’s simplicity of use. When they say that everyone can build an app using PowerApps and Flow, it’s true. With these products, both developers and business users have the right tools and are empowered to get better business results by building apps. This is not a tool that can be used to solve every problem, but it is undoubtedly possible to use these powerful technologies to address some of the challenges companies face nowadays.

One of the highlights of the event was being able to hear first-hand what Microsoft has to say about these products’ evolution and what the future holds, especially with regard to enhancements and new features that will be available to all users from October 1st onward. The AI Builder is an example of the new features we can count on: capabilities such as binary classification, object detection and form processing make it easier to include Microsoft’s cognitive services in enterprise applications and provide them with a layer of intelligence that until now wasn’t within the reach of PowerApps applications. The platform includes a whole set of new features – more than 400 in the last 6 months, according to Microsoft – that will allow more and more citizen developers to emerge.

Another highlight of the event was how these initiatives should be managed within the company in partnership with the IT department. Even though there are many advantages in putting the power of app creation in the hands of any user – in fact, these users are already using Excel or Access to solve many problems – the company needs to guarantee that the theme of enterprise management is properly addressed. More importantly, we need to look at these initiatives in a more programmatic way: their adoption will have to be promoted continuously so that they aren’t regarded as one-shot projects.

We also confirmed our suspicions about the unprecedented growth of the platform: 700% growth in production apps and more than 2.5 million monthly active developers on the Power Platform. These are surprising numbers that show us that the low-code market is growing: Gartner and Forrester have named Microsoft PowerApps a market leader. It’s safe to say that the future is looking bright for PowerApps and the rest of the Power Platform.

In Conclusion

In short, you can expect more news about PowerApps very soon. The event was an excellent opportunity to witness how companies are innovating internally and to learn from the many experiences of the community. We have returned to Lisbon with the certainty that the PowerApps value proposition for internal empowerment scenarios is very interesting and, in this sense, can complement our mobile development offer whether in cross-platform (Xamarin) or native development.

Strategically speaking, our vision for customer facing apps doesn’t include low-code tools. However, we see potential in low-code tools when we focus on internal and Employee Empowerment scenarios. More news coming soon!

Filipa Moreno

Middleware as code with Ballerina

Let’s assume that we need to facilitate the integration of several systems. What options do we have?

There are quite a few options for performing our integration, such as an Enterprise Service Bus (ESB) or frameworks like Spring and NodeJS.

Existing approaches

Enterprise Service Bus

Let’s start by looking at the ESB, one of the ways of integrating systems. An ESB provides a range of services using a standard method of communication, such as REST or SOAP.

It also helps monitor all the messages passing through it and ensures they are delivered to the right place by controlling the routing of each message.

You can use code to create the service logic, but service configuration (routing, users and passwords) doesn’t need to be hard coded, because the ESB provides ways to configure services outside the code. It also helps control deployments and versioning of services.

You need to take into account that this approach introduces a single point of failure and requires significant configuration and maintenance.

Some approaches put ESBs in containers in order to be more ‘cloud-native’, but they still require a lot of configuration and are not very agile.

Spring and NodeJS frameworks

To achieve a more agile approach, we can use frameworks such as NodeJS or Spring to create our integration. However, to work with communication (endpoints, messages and payload data), they require libraries, plugins and a lot of boilerplate code to deal with the payload type and to make simple calls to services.

There is a new language dancing onto the stage that tries to mitigate these problems with both ESBs and frameworks like Spring and NodeJS. Let’s take a look at Ballerina.

Ballerina

Ballerina is a new language that is being created with the communication paradigm in mind. Ballerina allows concurrent work and has transactional functions, where either all the data is successfully altered or none of it is.

It has both a graphical syntax and a textual one. So, if you write the code with the textual syntax, a graphical representation is generated for you afterwards. This graphical representation can also be used as documentation, because it represents the flow of messages and the participants in the service. You can compare the two syntaxes in the following image (the textual syntax on the left and the graphical syntax on the right):

Since Ballerina is constructed with communication in mind, you can count on network data types to be fully supported. This way you don’t need to add libraries to manipulate json or xml.

Ballerina is built upon integration patterns, providing quality of service (QoS): communication is resilient, transactional, secure and observable.

To get a better idea of how Ballerina works you can check the following page https://ballerina.io/philosophy/

Service and proxy example

If you want to try the following example on your machine, please follow the guide on how to install Ballerina on your computer at the following link: https://ballerina.io/learn/getting-started/#download-the-ballerina-distribution.

In this example we will see how to create two services. One will work as a resource provider, and the other as a proxy that converts the resources from xml to json.

Let’s start with the first service and call it service1.

Service 1

To create the service, first we create a file named service1.bal and write the following code:

import ballerina/http;
import ballerina/log;
service service1 on new http:Listener(9090) {
    resource function getCars(http:Caller caller, http:Request req) {
        var page = xml `<response>
        <cars>
            <car>
                <model>Leaf</model>
                <plate-number>AB-12-CD</plate-number>
                <plate-date>2019-01-01</plate-date>
                <serial-number>AS6F4GR8154E5G841DF548R4G1WW</serial-number>
            </car>
            <car>
                <model>Yaris</model>
                <plate-number>AB-13-CD</plate-number>
                <plate-date>2019-04-03</plate-date>
                <serial-number>ESD5GFN2RG5H451SEWFDBGR3544D</serial-number>
            </car>
        </cars>
        </response>`;
        http:Response res = new;
        res.setPayload(page);
        var result = caller->respond(res);
        if (result is error) {
            log:printError("Error sending response", err = result);
        }
    }
}

In this service we have the function getCars with our logic: first we create the response as an xml and set it as the response payload. Eventually we respond to the caller, checking whether there is an error when responding.

To run the service we use the following command:

ballerina run service1.bal

 

This will create the service in port 9090 with the function getCars. We can check the response of the function in the address http://localhost:9090/service1/getCars in the browser or using curl.
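
For example, with curl:

curl http://localhost:9090/service1/getCars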

Proxy

Now let’s create the proxy!

First we create another file with the name proxy.bal and then we write the following code:

import ballerina/http;
import ballerina/io;
import ballerina/log;
http:Client clientEndpoint = new("http://localhost:9090/service1");
service proxy on new http:Listener(9091) {
    resource function getCars(http:Caller caller, http:Request req) {
        //Get the xml response from getCars
        var responseGetCars = clientEndpoint->get("/getCars");
        
        if (responseGetCars is http:Response) {
            var msg = responseGetCars.getXmlPayload();
            if (msg is xml) {
                var responseJson = msg.toJSON({});
                //Create the response for this service
                http:Response res = new;
                res.setPayload(untaint responseJson);
                
                var result = caller->respond(res);
                if (result is error) {
                    log:printError("Error sending response", err = result);
                }
            } else {
                io:println("Invalid payload received: ", msg.reason());
            }
        } else {
            io:println("Error when calling the backend: ", responseGetCars.reason());
        }
    }
}

In the code you can see that we’ve created a service named proxy on port 9091 with a function called getCars. We have an endpoint for service1 (no plugins or libraries needed), and in the function getCars the first thing we do is call the function getCars from service1. Afterwards, we check whether the payload from service1’s response is an xml; if it is, we convert it to json using just one function! In the end we respond to the caller using the json from the conversion.

To test the proxy we can use curl or, to see the json visually, use your browser and the following address: http://localhost:9091/proxy/getCars

You can compare the output from service1 and the proxy and check that the information is the same in both.

Conclusion

In conclusion, we can see that Ballerina is shaping up to be a promising language for the integration of services, with an agile approach and less boilerplate code required for handling communication between services.

But do proceed with caution, because it hasn’t yet achieved a stable release, so syntax and semantics are still subject to change. The first stable release is expected at the end of this year.

Daniel Amado

Single-page applications

These days, web applications are taking over from old desktop applications, bringing with them advantages such as decoupling from any particular device and convenience of use. The demand for rich, complex and yet user-friendly web applications is growing every day. Along with this demand, single-page applications are also gaining more and more popularity in web development.

A single-page application (SPA) is a web application that interacts with the user by dynamically rewriting the current page rather than loading an entire new page from a server. This results in a more comfortable experience for the user, and one that is not continually interrupted with successive page navigations.

Background – traditional multi-page applications

Multi-page applications (MPA) are the ‘traditional’ web applications that reload and render an entire new page as the result of an interaction between the user and the web app. Every user interaction – like clicking a link or changing the URL – and every data exchange from and to the server will make another request for the new page to be rendered. This process takes time and can have a not-so-positive effect on user experience if you’re aiming for an interactive, responsive application.

This default behaviour from MPAs can be worked around by taking advantage of AJAX, which allows refreshing just part of a page. However, we have to be aware of the complexity added to the development process by this solution.

An MPA will most likely use JavaScript (JS) at the front end to add some interactivity to the application, but does not depend on it for the rendering and delivery of the page content. This makes MPA an architecture that is well suited to supporting legacy browsers that usually offer more limited JS functionality.

MPA’s big advantage lies in search engine optimisation (SEO). When a request is made to the server to render a new page, the response is the final content for that page. Search engine crawlers will be able to see exactly what the user sees, so the application will perform well in search engines. This is one of the big reasons why some major websites, like Amazon and The New York Times, are still using this architecture.

On the other hand, applications built using this architecture tend to be bigger and slower, constantly loading pages from the server, which affects the user experience negatively. From the development perspective, the process tends to be more complex and will result in a coupled back and front end.

The rise of single-page applications

As the name says, an SPA has only a single page. All the code necessary to render the application is retrieved in a single page load. After this initial load, no page reload is triggered and no new html file is fetched from the server. Instead, the application re-renders parts of the page as a result of any navigation in the browser. All subsequent communication between the application and the server is aimed at retrieving or posting data and occurs behind the scenes, using well-defined APIs from the back-end services.
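
As a minimal illustration of that idea (the /api endpoint, the data shape and the element id "app" below are hypothetical, and a real SPA would normally rely on a framework’s router instead), partial re-rendering can be sketched in TypeScript like this:

// Sketch of SPA-style navigation: fetch data from an API and re-render
// only part of the already-loaded page, instead of requesting a new HTML page.
async function navigateTo(route: string): Promise<void> {
  // Update the address bar without triggering a full page load
  history.pushState({}, "", route);

  // Fetch only the data needed for this view from a back-end API (hypothetical endpoint)
  const response = await fetch(`/api${route}`);
  const data: { title: string; body: string } = await response.json();

  // Re-render just the content area of the current page
  const view = document.getElementById("app")!;
  view.innerHTML = `<h1>${data.title}</h1><p>${data.body}</p>`;
}

// Intercept internal link clicks so the browser never performs a full reload
document.addEventListener("click", (event) => {
  const link = (event.target as HTMLElement).closest("a[data-internal]");
  if (link) {
    event.preventDefault();
    navigateTo(link.getAttribute("href")!);
  }
});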

SPAs rely heavily on JS to be able to listen to events and re-render parts of the page. Everything happens through JS; this kind of architecture depends on it, and there is no way around it. Because of this, SPAs favour modern browsers that offer vast, more up-to-date JS support.

This behaviour makes an SPA a super-fast, responsive application, offering the user an interactive experience resembling that of a mobile or desktop app. From the development perspective, we achieve a decoupled back and front end. The back end will no longer be responsible for rendering the view, and the communication between the two modules will consist only of data exchanges. We also simplify the deployment process greatly.

The problem with SPAs resides in the challenge posed by making the application SEO friendly. Given that most of the page content is loaded asynchronously, search engine crawlers have no way of knowing that more data is coming to the page. There is no single standard solution to handle this drawback, but there are some tools that can be used to create an SEO-friendly SPA. It is also probable that in time SPA frameworks will evolve to make it easier for search engines to crawl and index application content.

Are single-page applications the future of the web?

This type of application has been around for years, but it is only now becoming widespread in the developer world. This is mainly due to the appearance and increasing popularity of web frameworks and libraries that allow developing SPAs out of the box quickly and efficiently, such as Angular and React. If we compare the trends, we can see that the popularity of SPAs, Angular and React has evolved proportionately over time.

SPAs have been getting more and more popular and it looks as if they are not going anywhere in the near future. The technical and functional benefits of SEO-friendly SPAs cannot be ignored, and it is expected that this type of application will appear more frequently, especially with the evolution of the technologies involved and, hopefully, the resolution of some of the SPA pitfalls. However, we need to acknowledge that right now an SPA may not be the correct solution for every project.

Some MPA characteristics make this approach best suited to applications that serve a lot of content in different categories, and where search engine performance is highly important, such as online stores or marketplaces. SPAs are a good fit for dynamic platforms, possibly with a mobile component, where a complex interface and a satisfying, reactive user experience are key factors, such as social networks or closed communities. A third possibility exists for those who like SPAs and their characteristics but cannot fit the application onto a single page: by considering a hybrid application you can make the best of both approaches.

No architecture is absolutely right or wrong; you just need to know your requirements and choose the best solution for you and your application.

Patrícia Pereira

Tableau 2019.3 Beta is out; let’s take a quick look!

Tableau is software that helps people see and understand data, transforming the way it’s used to solve their problems. It makes analysing data fast and easy, beautiful and useful, to ensure that data makes an impact.

This is Tableau’s goal: translating data into business value with a positive impact.

There’s a new version being launched, Tableau 2019.3 Beta, and by installing it we can see an interesting set of new capabilities that help us work towards these goals. Below we’ve highlighted the features we liked the most:

  • Explain Data – a new feature to help you understand the ‘why’ behind unexpected values in your data;
  • Tableau Catalog – a new capability of the Data Management Add-on to ensure you are using the right data in the right way.

Explain Data

Explain Data provides explanations, using Bayesian statistical methods, for unexpected values in data. With this feature it is possible to identify causes and see new relationships in the data, and it is enabled on all existing workbooks for Creators and Explorers. No data prep or setup is required.

It’s very simple: select a mark and learn more about it.

The figure below presents a possible example of Explain Data for a selected mark.

In this example, we are analysing a visualisation of products and their average profits. We can see that the product Copiers has a much higher profit than the others. With Explain Data, we learn that this happened because the product’s records include a very high value that inflates this measure. This feature also displays a few visualisations related to this explanation, such as the first table, which shows the record with this higher value.

The panel displayed by this feature presents the following components:

  1. Selected Mark Information – indicates what mark is being described and analysed;
  2. Measure Selection – shows the measures available to select the one in use for explanation;
  3. Expected Range Summary – describes whether the value is unexpected or not given the other marks in the visualisation;
  4. Explanation List – displays a list of the possible explanations for the value in the selected mark. Selecting an explanation in the list will display more details in the Explanation Pane on the right;
  5. Explanation Pane – displays the selected explanation using a combination of text and visualisations.

Tableau Catalog

This new feature aims to help organisations manage their data better, because we are facing a time when it is very hard for users to find data and to trust that they’re using the right data in the right way. This feature will be available for Tableau Server and Tableau Online.

With Tableau Catalog it’s easy to get a complete view of all of the data being used in Tableau, and how it’s connected to the analytics. Data owners can automatically track information about the data, including user permissions, usage metrics and lineage, as shown in the figure below.

In this example, this feature gives us a view as though we’re looking into a catalog of the data in this database. We can see the warnings that appear when there are data quality errors (a), such as missing fields, and we can see the lineage of the data (b), such as which tables are related and which workbooks and sheets the data is being used in.

Tableau Catalog also helps to build trust in the data across an organisation, creating a panel with data details (shown in the figure below):

  • Data Quality Warnings is where users can quickly see when there’s an issue with data being used in a dashboard – such as a missing field or maintenance interruption.
  • Definitions and additional metadata can be added in order for users to have a better understanding of the data itself.

These data details are included alongside the dashboard, enabling users and viewers to understand the source and lineage of data from within a visualisation.

In conclusion, with these new features, Tableau aims to:

  • Eliminate duplicate content and wasted time, and prevent analysis based on bad data, with Tableau Catalog. With data quality warnings, you become more aware when there is something wrong with the data values and can resolve it. One of the biggest changes is being able to see all the data sources that are being used, helping to avoid publishing duplicate data;
  • Provide faster explanations for unexpected values in data with Explain Data. This feature provides more detail about the data, especially outliers, and surfaces scenarios that can be investigated further, saving data exploration time, especially with large data sets.

With these new features, Tableau is getting stronger in the market, bringing unique characteristics to bear against its competitors. This is an advantage because many competing solutions are catching up, and it is necessary to keep making a difference.

For more information and further details on the new features of Tableau 2019.3, click on the following link.

Carina Martins

A new strategic market: we’ve arrived in Sweden!

Xpand IT is a Portuguese company supported by Portuguese investment, and it is extraordinary how quickly we have expanded within Portugal. At the end of 2018, the company achieved growth of 45% and revenue of around 15 million euros, which led Xpand IT to be distinguished in the Financial Times’ 2019 ranking (FT1000: Europe’s Fastest Growing Companies). Xpand IT was one of just three Portuguese technology companies to be featured in this ranking.

However, Xpand IT always seeks to grow further. We want to share our expertise with all four corners of the world and deliver a little bit of our culture to all our customers. It is true that Xpand IT’s international involvement has been increasing substantially, with 46.5% of our revenue coming from international customers at the end of last year.

This growth has been supported by two main focal points: exploring strategic markets such as Germany and the United Kingdom (where we now have a branch and an office), and strong leverage of the products we have built. Xray and Xporter, both associated with the Atlassian ecosystem, are used by more than 5 thousand customers in more than 90 countries! And new products are expected this year, in both artificial intelligence (Digital Xperience) and business intelligence.

This year, Xpand IT’s internationalisation strategy is to invest in new strategic markets in Europe, namely the Nordic countries. Sweden will be the first country we focus on, but the goal is to expand our initiatives to the rest of them: Norway, Denmark and Finland.

There are already various commercial initiatives in this market, and we can count on support from partners such as Microsoft, Hitachi Vantara and Cloudera, all of them already well established in countries like Sweden. Moreover, cultural barriers and different time zones do not have a significant impact, which makes this strategy an attractive investment prospect for 2019.

In the words of Paulo Lopes, CEO & Senior Partner at Xpand IT: “We are extremely proud of the growth the company has experienced in recent years and expect this success to keep on going. Xpand IT has been undergoing its internationalisation process for a few years now. However, we are presently entering a 2nd phase, where we will actively invest in new markets where we know that our technological expertise paired with a unique team and unique culture can definitely make a difference. We believe that Sweden makes the right starting point for investment in the Nordic market. Soon we will be able to give you even more good news about this project!…”

Ana Lamelas

Zwoox – Simplify your Data Ingestion

Zwoox is a data ingestion tool, developed by Xpand IT, that facilitates data imports and structuring into a Hadoop cluster.

This tool is highly scalable, thanks to its complete integration with Cloudera Enterprise Data Hub, and takes full advantage of several different Hadoop technologies, such as Spark, HBase and Kafka. Zwoox eliminates the need to code data pipelines ‘by hand’, regardless of the data source.

One of Zwoox’s biggest advantages is its capability to accelerate data ingestions, offering numerous options for data import and allowing real-time RDBMS DML replications for Hadoop data structures.

Despite the number of different tools that allow data import for Hadoop clusters, only Zwoox is capable of executing the import in an accessible, efficient and highly scalable manner, maintaining data in HDFS (with Hive tables) or Kudu.

Some of the possibilities offered by Zwoox:

  • Automation and partitioning in HDFS;
  • Translation of data types;
  • Full or delta upload;
  • Audit charts (with full history) without impacting performance;
  • Derivation of new columns with pre-defined functions or “pluggable” code;
  • Operational integration with Cloudera Manager.

This tool is available on Cloudera Solutions Center and will be available soon on Xpand IT’s website. Meanwhile, you can access our informative document. If you’d like to learn more about Zwoox or data ingestion, please contact us.

Ana Lamelas

Biometric technology for recognition

Nowadays it is more important than ever to ensure that users feel safe when using a service or a mobile app, and when registering on a website. The user’s priority is to know that their data is properly protected. Consequently, biometric technology for recognition plays an increasingly crucial role as one of the safest and most efficient ways to authenticate user access to mobile devices, personal email accounts and even online bank accounts.

Biometrics has become one of the fastest, safest and most efficient ways to protect individuals, not only because it is used to authenticate each person as a citizen of a country – fingerprints are among the data collected and stored for legal purposes and documents – but also because it is the most convenient (and reliable) way to protect our phones. The advantages of using biometric technology for recognition are efficiency, precision, convenience and scalability.

In IT, biometrics is primarily connected to identity verification using a person’s physical or behavioural features – fingerprints, facial recognition, voice recognition and even retina/iris recognition. We are referring to technologies that measure and analyse features of the human body as a way to allow or deny access.

But how does this identification work in the back end? The software recognises specific points in the presented data as reference points. These points are then processed and sent to a database which, in turn, uses an algorithm that converts the information into a numeric value. It is this value that is compared to the user’s registered biometric entry when the scanner detects a sample, and the user’s authentication is approved or denied depending on whether or not there is a match.

The process of recognition can be carried out in two ways: comparing one value to many others, or comparing one value to a single other. Recognition of one value against many happens when a user’s sample is submitted to a system and compared to samples of other individuals, while authentication of one value against another works with only one user, comparing the provided data to previously submitted data – as with our mobile devices.
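
As a simplified, purely illustrative sketch (real biometric systems use far more sophisticated matching), the two modes can be thought of as follows, assuming a scan has already been reduced to a numeric feature vector:

// Illustrative only: templates are numeric feature vectors extracted from a scan
type Template = number[];

// Similarity between two feature vectors (cosine similarity)
function similarity(a: Template, b: Template): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: Template) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// One-to-one (verification): compare the live sample against a single enrolled template
function verify(sample: Template, enrolled: Template, threshold = 0.95): boolean {
  return similarity(sample, enrolled) >= threshold;
}

// One-to-many (identification): search a whole database for the best-matching user
function identify(sample: Template, database: Map<string, Template>, threshold = 0.95): string | null {
  let bestUser: string | null = null;
  let bestScore = threshold;
  for (const [user, template] of database) {
    const score = similarity(sample, template);
    if (score >= bestScore) {
      bestScore = score;
      bestUser = user;
    }
  }
  return bestUser;
}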

There are countless types of biometric readings; these are some of the most common:

  1. Fingerprinting (one of the most used and most economical biometric technologies for recognition, since it has a significant degree of accuracy. In this type of verification, various points of a finger are analysed, such as ridge endings and unique arches). Examples: apps from Médis, MBWay or Revolut;
  2. Facial recognition (uses a facial image of the user, composed of various identification points on the face, which can define, for example, the distance between the eyes and the nose, as well as the bone structure and the lines of each facial feature. This reading has some percentage of failure, depending on whether the user has a beard or sunglasses). Example: Apple’s Face ID;
  3. Voice recognition (recognition is carried out from an analysis of the vocal patterns of an individual, combining physical and behavioural factors. However, it is not the most reliable method of recognition). Examples: Siri, from Apple, or Alexa, from Amazon;
  4. Retina/iris recognition (the least used; it works by storing lines and geometric patterns – in the case of the iris – or the blood vessels of the eye – in the case of the retina. Reliability is very high, but so are the costs, which makes this method of recognition less often used). Read this article on identity recognition in the banking industry;
  5. Writing style (behavioural biometrics based on writing: lastly, a way to authenticate a user through their writing – for example, a signature – since the pressure on the paper, the speed of the writing and the movements in the air are very difficult to copy. This is one of the oldest authentication tools, used mainly in the banking industry). Read the article on the Read API, Microsoft Azure.
Ana Lamelas

Using Salesforce with Pentaho Data Integration

Pentaho Data Integration (PDI) is the tool of the trade for moving data between systems, and it doesn’t have to be just part of a business intelligence process: we can actually use it as an agile tool for point-to-point integration between systems. PDI has its own Salesforce Input step, which makes it a good candidate for this integration.

What is Salesforce?

Salesforce is a cloud solution for customer relationship management (CRM). As a next-generation multi-tenant Platform as a Service (PaaS), its unique infrastructure enables you to focus your efforts where they are most essential: creating microservices that can be leveraged in innovative applications, really speeding up the CRM development process.

Salesforce is the right platform to give you a complete 360º vision of your customers and their interactions with your brand, whether these happen via your email campaigns, call centres, social networks or a simple phone call. Marketing automation, for example, is just one of the many great things Salesforce brings you in an all-in-one platform.

How do we use PDI to connect to Salesforce?

For this access we need all our Salesforce connection details: the username, the password and the SOAP web service URL. The PDI version has to be compatible with the SOAP API version that you use. For example:

  PDI version   SOAP API version number
  2.0   1.0
  3.8   20.0
  4.2   21.0
  6.0   24.0
  7.0   37.0
  8.2   40.0

 

Nevertheless, even if Salesforce gives us a new version of the API, we can still use the old API perfectly well. Just be careful, because if you’ve created new modules inside the platform, the new API won’t have these customisations, and so you’ll need to use the Salesforce Object Query Language (SOQL) to get the data. But don’t worry – we’ll explain it all in the next section.

How do we write SOQL queries?

The SOQL syntax is quite similar to SQL syntax, but with a few differences:

  1. SOQL does not recognise special characters (such as * or ;), so we have to list all the fields that we want to get from Salesforce, and we cannot add a ; at the end of the query.
  2. We cannot use comments in a query; SOQL does not recognise these either.
  3. To create joins we need to know a few things (illustrated by the queries after this list):
    • For the native modules that we need to link to (direct relationship), we add an ‘s’ to the end of the relationship name. For example: get all Orders, with and without Products (OrderItem module).
    • For the customised modules from which we need to get data in another module (direct relationship), we add ‘__r’ to the end of the name. For example: filter OrderItems by the Product_Skins__c field inside the Product2 module.
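
As a purely illustrative sketch (the custom field names are hypothetical and must be adapted to your own Salesforce schema), the first query below uses the pluralised child relationship name OrderItems to fetch each Order together with its products, and the second reaches a related record through a custom lookup field by replacing its ‘__c’ suffix with ‘__r’:

SELECT Id, OrderNumber, (SELECT Quantity, UnitPrice FROM OrderItems) FROM Order

SELECT Id, Quantity FROM OrderItem WHERE Product_Skins__r.Name = 'Example Skin'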

How do we extract data from Salesforce with PDI?

We can use the Salesforce Input step inside PDI to get data from Salesforce using SOQL; just be aware that you can only use up to 20,000 characters in the query.

  • Connection parameters:
    • Salesforce web service URL: <url of the Salesforce platform>/services/Soap/u/<SOAP API version number>
    • Username: the username used to access the platform (e.g. myname@pentaho.com)
    • Password: the password plus the security token (the company provides the token, which we append to the password, for example in kettle.properties), i.e. PASSWORDTOKEN

Settings parameters:

  • Specify query: when this option is not active (as we can see in the image below), we only need to choose the module (the table containing the records that we need to access).

For the next tab (Content) we have the following parameter options:

  • If we want to get all records from Salesforce (that is, deleted records as well as inserted ones), we need to tick Query All Records and choose one of the following options:
    • All – gets new/updated records and deleted records;
    • Update – gets only inserted and updated records;
    • Delete – gets only deleted records.
  • If we untick Query All Records, we only get inserted/updated records.

How does PDI know whether records are new, updated or deleted?

Salesforce has native fields that are very useful for controlling the process. However, we cannot see these fields in the layout or in the schema builder in SF. We can only see the data associated with these specific fields if we use SOQL or PDI to access them.

  • CreatedById and CreatedDate show the user and the date/time when each record was created.
  • LastModifiedDate and LastModifiedById show the date/time and the user who last modified the record. We can use these fields to get updated data from SF, as in the example query below.
  • Id (the Salesforce Id), present in the URL as an 18-character string, identifies the record.
  • There is one more native field, IsDeleted, with data type Boolean, that shows whether the record was removed (IsDeleted = true) or not (IsDeleted = false).
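
For example, a delta extraction can filter on LastModifiedDate in the SOQL query (illustrative object, field list and date):

SELECT Id, Name, LastModifiedDate FROM Account WHERE LastModifiedDate > 2019-01-01T00:00:00Z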

In the Additional fields option we have three settings:

  • Time out is useful in asynchronous systems, because we can configure the timeout interval in milliseconds before the step times out;
  • Use Compression can improve the performance of the process: when you tick it, calls to the API are compressed (gzip);
  • Limit configures the maximum number of records to retrieve from the query.

Inside the last tab, we can see all the fields returned by the query in the first tab. Without SOQL we get all the module’s fields; with SOQL we get all the fields in the SELECT clause.

And in these cases, we need to make the changes manually.
Some more details:

Base64 fields hold images or PDFs present in SF.

If we need to send images (.jpeg) or PDFs (.pdf) directly to SF, we load these types of fields by using Java to convert the binary files to base64 – for example, to send a PDF file to SF.

How do we load data into Salesforce with PDI?

With the Salesforce Insert step we can send data to Salesforce from other databases or from Salesforce itself.

The connection options are the same as described for the Salesforce Input step.
In the Settings options we have new parameters:

  • Rollback all changes on error – if we get any error, nothing will be integrated into SF;
  • Batch Size – we can send a fixed number of records and integrate them simultaneously (in the same batch) into SF;
  • In the Output Fields Label we need to add the name of the field that will receive the Salesforce ID of each integrated record.

In the Fields option, we need to set up the field mapping.

  • For Module Field, we need to put the API name of the field in SF that will receive the new data;
  • In the Stream Field, we need to put the name of the incoming field that will be integrated into the respective field in SF;
  • Use External id = N for all fields updated inside the respective module;
  • Use External id = Y for all records that we need to update which are not present in the current module, but in another module.

Delete records inside Salesforce

We delete records from Salesforce with the Salesforce Delete step. We need to specify the key field from the Table Input that references the key in Salesforce (the Salesforce Id).

Update Salesforce records

If we only want to update records in SF, we need to use the Salesforce Update step.
Inside the Fields (Key included) option we need to add the key of the records for the specific module.

Upsert data to Salesforce

If we want to insert and update records in the same batch in SF, we need to use the Salesforce Upsert step.
The Upsert Comparison Field parameter helps match the data in SF.

Fátima Miranda

Meetup Data Science Hands-on by Lisbon Kaggle: hot topics

Data Science Hands-on: “Predicting movies’ worldwide revenue”

On May 4th, a day known worldwide as Star Wars Day (“May the fourth”), approximately 40 data science fans seized the occasion to learn more about this subject by practising and sharing at yet another Lisbon Kaggle meetup. The “Data Science Hands-on” meetup took place at Instituto Superior Técnico (IST campus) and was, fittingly, dedicated to cinema:

  • the problem addressed consisted of predicting movies’ revenue before their premiere!

This event was also sponsored by Xpand IT, in collaboration with Hackerschool Lisboa, a group of IST students interested in technology who also evangelise the practice of learning by doing.

The event started with a presentation by Xpand IT’s own Ricardo Pires, who introduced the company and its units focused on data treatment and exploration. Participants got a sense of how these problems fit into a real-world context. Shortly after, professor Rui Henriques, who teaches data science at IST, explained his perspective on how to approach a data science problem, providing some tips related to the meetup’s challenge.

The data from this challenge enhances learning and gives an idea of a potentially real problem, as it is semi-structured and demands a great amount of effort to process.

An estimated 80% of Data Scientists’ daily work revolves around data treatment.

(Source: Forbes)

After the two presentations, participants started to unravel the mysteries hidden within the data. They verified, for example, a generalised increase in revenue over the years. They also noticed that American movies had higher revenue than all the rest.

Tackling the challenge

In the first part, participants modelled the problem using the simpler, structured columns, such as:

  • budget
  • popularity
  • runtime
  • date

By doing so, they tried to obtain the first predictions for the movies’ revenue. In the image below, which represents Spearman’s rank correlation coefficient, we can see that the budget and popularity columns are the most correlated with revenue.

During the second phase, contestants tackled the semi-structured columns, applying the one-hot encoding technique to columns such as:

  • director
  • cast

Through this deeper analysis of the data, the teams found out which movies generated the most revenue (see table below).

Another relevant aspect to consider is that popularity is not always directly related to revenue, as is the case with “Transformers: Dark of the Moon”, which is represented as less popular but nonetheless has high revenue.

It is also interesting to observe the actors who generated the most revenue on average:

Conclusions

At the end of the meetup, participants shared their implemented solutions:

  • The group with the best results applied Logistic Regression. Despite being a simple model, it can provide adequate results when the focus is data treatment.
  • Data treatment involved several techniques, such as the detection of outliers in movies with a very discrepant budget, replacing these values with the median.
  • The budget and revenue columns were transformed into their respective logarithms, in order to approximate them to a Gaussian distribution.
  • One of the advantages of using a simpler model is that it is also easier to explain to a business stakeholder.

The fourth of May was spent learning alongside the most wonderful people, and was enlightening in every way. If you’re interested in data science, join the community and show up at our monthly events.

More information on the “Data Science Hands-on” Meetup.

Joana Pinto

Data Scientist, Xpand IT

Alexandre Gomes

Data Scientist, Xpand IT

Sara Godinho

Bootstrap: Introduction to the world’s most popular CSS library

Bootstrap is the most popular HTML, CSS and JavaScript based framework for developing responsive, mobile-first websites.

With the continued growth of mobile devices around the world, it is becoming clearer that having a responsive website is a must, and by taking a mobile-first approach this framework has proven to be an indispensable tool, becoming more popular year after year, mostly because of its feature-rich nature and ease of use. One of the most essential aspects of this framework, and the foundation on which to build an organised, structured layout, is its grid. Bootstrap is built on a powerful 12-column grid system, which allows developers to arrange and align content in a fully customisable, responsive grid. The grid adjusts according to the device resolution or viewport size, making the website content usable and pleasant for both mobile and desktop users.
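
As a small illustration of the grid (the column widths here are chosen arbitrarily), a two-column layout can be described with the row and col-* classes; the columns sit side by side from the md breakpoint up and stack on smaller screens:

<div class="container">
  <div class="row">
    <div class="col-md-8">Main content</div>
    <div class="col-md-4">Sidebar</div>
  </div>
</div>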

Beyond this, Bootstrap offers a base style for most HTML elements, making the website look more polished, as well as an extensive list of pre-built, fully-responsive components that are easy to integrate and customise. In terms of customisation, Bootstrap lets you change the base style, including fonts, colours and sizes, as well as modifying the existing breakpoints used in grid layout by overriding the existing CSS rules with custom ones according to the project design.

Bootstrap can also offer great benefits to those who prefer to build a responsive website from scratch, without the assistance of any third-party libraries – whether they reuse ready-made CSS code and components from previous projects to achieve this, or simply tend to take a more conservative approach towards adopting framework features.

So, what are these benefits of Bootstrap?

Well, when you have a project with a tight schedule and multiple developers involved, Bootstrap offers consistency between projects and people (it is a commonly known technology) as well as speed of development, thanks to its pre-styled classes, which require much less effort and time than creating everything from scratch. It’s important to mention that Bootstrap has good cross-browser compatibility, being currently compatible with all the latest major browsers (Chrome, Firefox, Safari, Microsoft Edge and Internet Explorer 10+), and excellent support, thanks to the huge community behind it. And, most importantly, it’s completely free and open source. Before looking at some examples, let’s see how easy it is to get started with Bootstrap.

Diogo Cardante