This page explains how to extract data from Microsoft SQL Server and analyze it in Superset. (If the mechanics of extracting data from Microsoft SQL Server seem too complex or difficult to maintain, check out Stitch, which can do all the heavy lifting for you in just a few clicks.)
What is Microsoft SQL Server?
Microsoft SQL Server is a relational database management system that supports applications on a single machine, on a local area network, or across the web. SQL Server supports Microsoft's .NET framework out of the box, and integrates nicely into the Microsoft ecosystem.
What is Superset?
Apache Superset is a cloud-native data exploration and visualization platform that businesses can use to create business intelligence reports and dashboards. It includes a state-of-the-art SQL IDE, and it's open source software, free of cost. The platform was originally developed at Airbnb and donated to the Apache Software Foundation.
Getting data out of SQL Server
The most common way to get data out of a database is to query it. With SELECT statements you can filter rows (WHERE), sort them (ORDER BY), and limit how many are returned (TOP in SQL Server). If you need to export data in bulk, you can use Microsoft SQL Server Management Studio, which lets you export entire tables and databases as text or CSV files, or as SQL scripts that recreate the database when run.
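As a rough illustration of query-based extraction, the sketch below runs a filtered, sorted, limited SELECT through a Python DB-API connection and writes the result as CSV. Everything named here is an assumption made for the example: the `orders` table, its columns, and the helper `export_query_to_csv` are hypothetical, and an in-memory SQLite database stands in for SQL Server so the code runs without a server. Against SQL Server you would open the connection with a driver such as pyodbc and write the limit as `SELECT TOP (2) ...` rather than a trailing `LIMIT`.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, params, out):
    """Run a SELECT through a DB-API connection and write the rows as CSV.

    Returns the number of data rows written (the header is not counted).
    """
    cur = conn.cursor()
    cur.execute(query, params)
    writer = csv.writer(out)
    # cursor.description holds one tuple per result column; item 0 is its name.
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


def demo():
    # Assumption: in-memory SQLite stands in for SQL Server in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("acme", 120.0), ("globex", 35.5), ("initech", 410.25)],
    )
    out = io.StringIO()
    # Filter (WHERE), sort (ORDER BY), and limit the result set.
    written = export_query_to_csv(
        conn,
        "SELECT id, customer, total FROM orders "
        "WHERE total > ? ORDER BY total DESC LIMIT 2",
        (100,),
        out,
    )
    conn.close()
    return written, out.getvalue()
```

Passing the filter value as a query parameter instead of formatting it into the SQL string keeps the extraction script safe from quoting mistakes and SQL injection.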
Loading data into Superset
To report on SQL Server data in Superset using the approach described here, first replicate it to a data warehouse. Superset can connect to almost 30 databases and data warehouses. Once you choose a data source, you specify a host name and port, a database name, and a username and password to get access to the data. You then specify the database schema or tables you want to work with.
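Superset accepts these connection details in the form of a SQLAlchemy URI. The sketch below shows one way to assemble such a URI from the host name, port, database name, username, and password. The helper name `build_sqlalchemy_uri` and all of the example values (user, password, host, database) are made up for illustration, and the exact dialect and driver prefix depends on the warehouse you chose, so check it against the documentation for your database.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, username, password, host, port, database):
    """Assemble a URI of the form dialect://user:password@host:port/database."""
    # Percent-encode the credentials so characters such as "@" or "/"
    # are not mistaken for URI delimiters.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"{dialect}://{user}:{pwd}@{host}:{port}/{database}"


# Hypothetical values for a PostgreSQL warehouse.
uri = build_sqlalchemy_uri(
    "postgresql",
    "report_user",
    "p@ss/word",
    "warehouse.example.com",
    5432,
    "analytics",
)
print(uri)
```

A read-only database user is usually enough for reporting, since Superset only needs to run queries against the tables you expose.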
Keeping SQL Server data up to date
Suppose you've written a script to move data from SQL Server into your data warehouse. Data freshness is one of the most important aspects of any analysis, so what happens when new data arrives that you need to add?
You could load the entire SQL Server database again, but a full reload is slow, puts extra load on the source server, and gets slower as the database grows.
A better approach is to build your script to recognize new and updated records in the source database. An auto-incrementing key works well for new records: the script stores the highest key it has loaded and uses it like a bookmark to resume where it left off. An incrementing key alone does not reveal updates to existing rows, so to catch those you also need a change marker such as a last-modified timestamp or a rowversion column. When you've built in this functionality, you can run your script as a cron job or in a continuous loop to pick up new data as it appears in SQL Server.
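The bookmark idea can be sketched in a few lines. This is a minimal illustration, not a production loader: the `orders` table and the helper `fetch_new_rows` are hypothetical, an in-memory SQLite database stands in for SQL Server, and the bookmark lives in a local variable where a real job would persist it between runs. It covers only the auto-incrementing key case, so it picks up newly inserted rows but not changes to rows already copied.

```python
import sqlite3


def fetch_new_rows(conn, last_id, batch_size=1000):
    """Return rows whose id is greater than the bookmark, plus the new bookmark."""
    cur = conn.execute(
        "SELECT id, customer, total FROM orders "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    rows = cur.fetchall()
    # Advance the bookmark to the highest id seen; keep it unchanged if
    # nothing new arrived.
    new_bookmark = rows[-1][0] if rows else last_id
    return rows, new_bookmark


def demo():
    # Assumption: in-memory SQLite stands in for the SQL Server source.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT, total REAL)"
    )
    source.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("acme", 120.0), ("globex", 35.5)],
    )

    # First run: start from bookmark 0 and copy everything present.
    first, bookmark = fetch_new_rows(source, 0)

    # A new record appears in the source between runs.
    source.execute(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        ("initech", 410.25),
    )

    # Second run resumes from the bookmark and sees only the new row.
    second, bookmark = fetch_new_rows(source, bookmark)

    # Third run finds nothing new and leaves the bookmark where it was.
    third, bookmark = fetch_new_rows(source, bookmark)
    source.close()
    return first, second, third, bookmark
```

Ordering by the key and capping each run with a batch size keeps every run bounded, so a scheduled job that falls behind catches up over several runs instead of issuing one very large query.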
From Microsoft SQL Server to your data warehouse: An easier solution
As mentioned earlier, the best practice for analyzing Microsoft SQL Server data in Superset is to store that data inside a data warehousing platform alongside data from your other databases and third-party sources. You can find instructions for doing these extractions for leading warehouses on our sister sites Microsoft SQL Server to Redshift, Microsoft SQL Server to BigQuery, Microsoft SQL Server to Azure Synapse Analytics, Microsoft SQL Server to PostgreSQL, Microsoft SQL Server to Panoply, and Microsoft SQL Server to Snowflake.
Easier yet, however, is using a solution that does all that work for you. Products like Stitch were built to move data automatically, making it easy to integrate Microsoft SQL Server with Superset. With just a few clicks, Stitch starts extracting your Microsoft SQL Server data, structuring it in a way that's optimized for analysis, and inserting that data into a data warehouse that can be easily accessed and analyzed by Superset.