What is Apache Superset?
Superset is a modern business intelligence (BI) application with a simple interface and a rich set of visualisations, which allows users to create and share dashboards.
It is simple to use, doesn’t require programming, and lets users explore, filter and organise data. The best part is… it’s Open Source!
What does Apache Superset provide?
What is truly appealing about Apache Superset is that you can explore each dashboard in depth: Superset lets you focus on each graph or metric and easily filter and organise the data.
Another attractive feature is the SQL IDE (SQL-Lab), which supports interactive querying.
Concerning security, Superset lets you define users and roles (groups of users with a default set of permissions) and view user statistics, giving you full control over access. You can establish baseline permissions and grant access to specific views or menus. The app also keeps an action log.
Visually, Superset has a minimalist, well-organised interface. Even though it is not as easy to use as Tableau, Superset can be an alternative for creating dashboards, especially for people with some knowledge of SQL.
Superset supports most SQL databases through SQLAlchemy, a Python SQL toolkit and ORM, which gives access to MySQL, Postgres, Oracle, MS SQL Server, MariaDB, Sybase, Redshift and others (the Superset documentation lists the full set).
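Each database is registered in Superset through a SQLAlchemy connection URI of the form `dialect+driver://username:password@host:port/database`. The helper below is a minimal sketch, not part of Superset, showing how such a URI is put together; the host names, credentials and the `psycopg2` driver are illustrative assumptions. Special characters in the password must be URL-escaped, which `quote_plus` takes care of.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database, driver=None):
    """Hypothetical helper: assemble dialect+driver://user:password@host:port/database."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    # Escape characters such as '@' or '/' so they don't break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"{scheme}://{credentials}@{host}:{port}/{database}"


# Illustrative hosts and credentials only.
print(build_sqlalchemy_uri("postgresql", "superset", "p@ss", "db.example.com", 5432, "sales", driver="psycopg2"))
print(build_sqlalchemy_uri("mysql", "analyst", "secret", "mysql.example.com", 3306, "crm"))
```

The first call yields `postgresql+psycopg2://superset:p%40ss@db.example.com:5432/sales`: the `@` in the password is escaped as `%40`, so it is not confused with the separator before the host.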
Superset also works with Druid (for example, Airbnb uses Superset with Druid 0.8.x), although not all of Superset’s advanced features are available with Druid.
SQL-Lab is definitely a plus. It allows you to select a database, schema and table (previously registered), run interactive queries, preview the data and save the query history.
A semantic layer allows you to define fields and metrics (for example, ratios or anything that can be expressed in SQL).
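To illustrate the idea, the sketch below treats a semantic layer as a mapping from metric names to SQL expressions and expands them into a `GROUP BY` query against an in-memory SQLite table. The `traffic` table, the `METRICS` dictionary and `metric_query` are hypothetical; Superset keeps metric definitions with each table it knows about, but the principle of reusable, named SQL expressions is the same.

```python
import sqlite3

# Hypothetical semantic layer: metric names mapped to SQL expressions.
METRICS = {
    "total_bookings": "SUM(bookings)",
    "total_visits": "SUM(visits)",
    # A ratio metric; NULLIF avoids division by zero.
    "conversion_rate": "1.0 * SUM(bookings) / NULLIF(SUM(visits), 0)",
}


def metric_query(table, groupby, metric_names):
    """Expand metric names into a grouped SELECT against `table`."""
    selects = ", ".join(f"{METRICS[m]} AS {m}" for m in metric_names)
    return f"SELECT {groupby}, {selects} FROM {table} GROUP BY {groupby} ORDER BY {groupby}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (country TEXT, visits INTEGER, bookings INTEGER)")
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?)",
    [("PT", 100, 5), ("PT", 300, 15), ("UK", 200, 20)],
)

sql = metric_query("traffic", "country", ["total_visits", "conversion_rate"])
print(sql)
rows = conn.execute(sql).fetchall()
print(rows)
```

Users pick `conversion_rate` by name; the SQL behind it is written once and reused by every chart.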
Inside SQL, you can also use Jinja templating, which exposes some Python modules and predefined macros.
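Superset’s templating runs before the query is sent to the database: macros inside `{{ … }}` are replaced with plain values. The snippet below is a toy stand-in for Jinja, not Superset’s implementation, that mimics this expansion for two macros named after ones in Superset’s SQL templating documentation, `current_username()` and `url_param()`; the values they return here are hard-coded assumptions.

```python
import re

# Toy macro table: hard-coded stand-ins for values Superset would supply.
MACROS = {
    "current_username": lambda: "alice",  # hypothetical logged-in user
    "url_param": lambda name: {"country": "PT"}.get(name, ""),  # hypothetical URL parameters
}

# Matches {{ name() }} or {{ name('arg') }}.
PATTERN = re.compile(r"\{\{\s*(\w+)\((?:'([^']*)')?\)\s*\}\}")


def render(template):
    """Replace each macro call with its value (toy version; real Jinja does far more)."""
    def expand(match):
        name, arg = match.group(1), match.group(2)
        func = MACROS[name]
        return func(arg) if arg is not None else func()
    # Note: values are inserted verbatim, with no SQL escaping.
    return PATTERN.sub(expand, template)


sql = render(
    "SELECT * FROM orders "
    "WHERE owner = '{{ current_username() }}' AND country = '{{ url_param('country') }}'"
)
print(sql)
```

The database only ever sees the rendered text, `SELECT * FROM orders WHERE owner = 'alice' AND country = 'PT'`.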
The least positive side of this is that you cannot add or query multiple tables at the same time. The workaround is to create a view, which works as a logical layer that hides the underlying SQL query and therefore acts as a virtual table. The downside is that every query then runs on top of the view’s own query, which can lead to performance issues.
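The view workaround can be reproduced with SQLite: two hypothetical tables are joined once inside a view, and a chart-style aggregation then queries the view as if it were a single table. The schema and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'PT'), (2, 'UK');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);

    -- The view performs the join; consumers see a single "virtual table".
    CREATE VIEW orders_by_customer AS
    SELECT o.id AS order_id, c.country, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id;
    """
)

# The chart query targets the view; the database evaluates the view's own
# query underneath it, which is where the extra cost can come from.
rows = conn.execute(
    "SELECT country, SUM(amount) FROM orders_by_customer GROUP BY country ORDER BY country"
).fetchall()
print(rows)
```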
How to create a dashboard
To create a dashboard, Superset works as follows: there are sources, where you find databases and tables; slices, which hold individual graphs; and, lastly, dashboards, which are composed of groups of slices. A slice can be added to several dashboards, and each dashboard can contain many slices.
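The relationship between sources, slices and dashboards can be sketched with a few Python dataclasses. These are hypothetical simplifications, not Superset’s own models: each slice points to one datasource, while slices and dashboards form a many-to-many relationship. The `viz_type` strings are illustrative labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datasource:
    """Where the data lives: a table in a registered database."""
    database: str
    table: str


@dataclass
class Slice:
    """A single graph, built on one datasource."""
    name: str
    viz_type: str
    datasource: Datasource


@dataclass
class Dashboard:
    """A named group of slices."""
    title: str
    slices: List[Slice] = field(default_factory=list)


def dashboards_using(chart: Slice, dashboards: List[Dashboard]) -> List[str]:
    """Titles of the dashboards that include the given slice."""
    return [d.title for d in dashboards if chart in d.slices]


sales = Datasource("warehouse", "sales")
by_month = Slice("Sales by month", "line", sales)
by_region = Slice("Sales by region", "histogram", sales)

# The same slice can appear on several dashboards.
exec_dash = Dashboard("Executive", [by_month, by_region])
ops_dash = Dashboard("Operations", [by_month])

print(dashboards_using(by_month, [exec_dash, ops_dash]))
```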
Views offer different types of graphs, such as histograms, box plots, heatmaps and line charts.
It is simple to edit graphs: the available features for each view are on the left-hand side, and you just have to change them and press “Run Query”.
Although flexible in most areas, Superset imposes some standardisation, for example in its colour schemes.
Each view can be filtered using wildcards.
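Wildcard filters typically end up as SQL `LIKE` predicates, where `%` matches any sequence of characters. The SQLite sketch below shows the effect on a hypothetical `products` table; how Superset builds the predicate internally may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("Superset Pro",), ("Super Saver",), ("Tableau Kit",)],
)


def like_filter(pattern):
    """Return names matching a LIKE pattern ('%' = any characters, '_' = one character)."""
    cursor = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in cursor]


# SQLite's LIKE is case-insensitive for ASCII letters by default.
print(like_filter("Super%"))
print(like_filter("%Kit"))
```

Passing the pattern as a bound parameter (`?`) keeps user input out of the SQL text itself.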
Superset also allows you to share the view, export data to .json and .csv, and see the exact query performed behind each view.
Superset integrates with the main authentication backends (database, OpenID, LDAP, OAuth, REMOTE_USER, …).
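Authentication is configured in `superset_config.py` through Flask-AppBuilder settings. The fragment below is a minimal sketch assuming an LDAP backend; the server address is hypothetical, and other backends use the corresponding constants (`AUTH_DB`, `AUTH_OID`, `AUTH_OAUTH`, `AUTH_REMOTE_USER`) plus their own settings. Check the setting names against the Flask-AppBuilder version bundled with your Superset.

```python
# superset_config.py (fragment): log users in against an LDAP directory.
from flask_appbuilder.security.manager import AUTH_LDAP

AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://ldap.example.com"  # hypothetical server
AUTH_LDAP_USE_TLS = False

# Create a Superset account on first login and give it a default role.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"
```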
Concerning privileges, as stated above, Superset provides default roles such as Admin (full access), Alpha, Gamma, sql_lab and Public.
It is possible to establish permissions for each user, restricting access to a subset of data sources, menus, views, specific metrics and other criteria. Hence, it is relatively easy to define which type of permission and/or access to data is granted to each person.
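Conceptually, a user’s access is the union of the permissions granted by their roles. The toy model below is a hypothetical sketch: permission names such as `datasource_access` and `menu_access` are borrowed from Superset’s permission scheme, but the role contents and the `has_permission` helper are illustrative, not Superset’s API.

```python
# Hypothetical role -> set of (permission, view) pairs; contents are illustrative.
ROLES = {
    "Gamma": {("can_read", "Dashboard")},
    "sql_lab": {("menu_access", "SQL Lab")},
    "sales_analyst": {("datasource_access", "[warehouse].[sales]")},
}

# Hypothetical users and the roles assigned to them.
USERS = {
    "alice": ["Gamma", "sales_analyst"],
    "bob": ["Gamma"],
}


def has_permission(user, permission, view):
    """A user holds a permission if any of their roles grants it."""
    return any((permission, view) in ROLES[role] for role in USERS.get(user, []))


print(has_permission("alice", "datasource_access", "[warehouse].[sales]"))  # True
print(has_permission("bob", "datasource_access", "[warehouse].[sales]"))    # False
```

Restricting someone to a subset of data sources then amounts to giving them a role that carries only the matching `datasource_access` entries.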
People using Superset
According to GitHub, Superset is currently being used by Airbnb, Twitter, GfK Data Lab, Yahoo!, Udemy and others.
It is important to note that “Superset was tested in large environments with hundreds of users. The production environment of Airbnb runs with Kubernetes and more than 600 active users who see more than 100 thousand graphs per day”.
Superset vs Tableau
- Joining tables: Tableau can join tables within the same database or across different databases. Superset cannot query or join multiple tables; this is only possible view by view, which means running multiple queries and can affect performance.
- Customisation: Tableau allows detailed customisation of dashboards, with legends, filters, tags, etc. Superset offers limited customisation per type of view, although CSS templates can be created.
- Learning curve: Tableau is easy for beginners and doesn’t require users to know SQL; since the platform allows more complex and flexible tasks, there is a second learning curve for users who want to make the best use of it. Superset is easy and smooth to learn, but requires SQL knowledge from users.
Superset’s main advantages
Besides all the advantages already stated, one of the main features of Superset is… it’s Open Source Business Intelligence!
- Provides BI without needing code (easy to use for those who are not programmers: you only need to know basic SQL);
- Easy and quick setup;
- Provides “SQL-Lab” that allows interactive querying;
- A semantic layer that enriches dashboards with ratios and other SQL-based metrics;
- Easy, attractive interactive views that support data exploration;
- Meets most companies’ needs for simple data analysis.
Superset’s main disadvantages
- The app still doesn’t support NoSQL databases;
- Even though the number of users is growing, there is still little or no official support;
- SQL-Lab sometimes freezes on queries over large amounts of data;
- There is still a considerable number of unresolved issues.
Data Scientist, Xpand IT