Superset Documentation

Apache Superset Dev

Dec 05, 2019



CONTENTS

1 Superset Resources

2 Apache Software Foundation Resources

3 Overview
  3.1 Features
  3.2 Databases
  3.3 Screenshots
  3.4 Contents
  3.5 Indices and tables


Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.

Important: Disclaimer: Apache Superset is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Note: Apache Superset, Superset, Apache, the Apache feather logo, and the Apache Superset project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.




CHAPTER

TWO

APACHE SOFTWARE FOUNDATION RESOURCES

• The Apache Software Foundation Website

• Current Events

• License

• Thanks to the ASF’s sponsors

• Sponsor Apache!


CHAPTER

THREE

OVERVIEW

3.1 Features

• A rich set of data visualizations

• An easy-to-use interface for exploring and visualizing data

• Create and share dashboards

• Enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask AppBuilder)

• An extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and the dataset

• A simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function metrics are made available to the user

• Integration with most SQL-speaking RDBMS through SQLAlchemy

• Deep integration with Druid.io

3.2 Databases

The following RDBMS are currently supported:

• Amazon Athena

• Amazon Redshift

• Apache Drill

• Apache Druid

• Apache Hive

• Apache Impala

• Apache Kylin

• Apache Pinot

• Apache Spark SQL

• BigQuery

• ClickHouse


• Google Sheets

• Greenplum

• IBM Db2

• MySQL

• Oracle

• PostgreSQL

• Presto

• Snowflake

• SQLite

• SQL Server

• Teradata

• Vertica

Other database engines with a proper DB-API driver and SQLAlchemy dialect should be supported as well.


3.3 Screenshots


3.4 Contents

3.4.1 Installation & Configuration

Getting Started

Superset has deprecated support for Python 2.* and supports only ~=3.6 to take advantage of the newer Python features and reduce the burden of supporting previous versions. We run our test suite against 3.6, but running on 3.7 should work as well.

Cloud-native!

Superset is designed to be highly available. It is “cloud-native” as it has been designed to scale out in large, distributed environments, and works well inside containers. While you can easily test drive Superset on a modest setup or simply on your laptop, there’s virtually no limit around scaling out the platform. Superset is also cloud-native in the sense that it is flexible and lets you choose your web server (Gunicorn, Nginx, Apache), your metadata database engine (MySQL, Postgres, MariaDB, ...), your message queue (Redis, RabbitMQ, SQS, ...), your results backend (S3, Redis, Memcached, ...), your caching layer (Memcached, Redis, ...), works well with services like NewRelic, StatsD and DataDog, and has the ability to run analytic workloads against most popular database technologies.

Superset is battle tested in large environments with hundreds of concurrent users. Airbnb’s production environment runs inside Kubernetes and serves 600+ daily active users viewing over 100K charts a day.

The Superset web server and the Superset Celery workers (optional) are stateless, so you can scale out by running on as many servers as needed.

Start with Docker

Note: The Docker-related files and documentation have been community-contributed and are not actively maintained and managed by the core committers working on the project. Some issues have been reported as of 2019-01. Help and contributions around Docker are welcomed!

If you know Docker, you’re in luck: there is a shortcut for initializing a development environment:

git clone https://github.com/apache/incubator-superset/
cd incubator-superset/contrib/docker
# prefix with SUPERSET_LOAD_EXAMPLES=yes to load examples:
docker-compose run --rm superset ./docker-init.sh
# you can run this command every time you need to start superset now:
docker-compose up

After several minutes, when superset initialization has finished, you can open a browser and view http://localhost:8088 to start your journey.

From there, the container server will reload on modification of the superset python and javascript source code. Don’t forget to reload the page to take the new frontend into account though.

See also CONTRIBUTING.md#building for an alternative way of serving the frontend.

It is also possible to run Superset in non-development mode: in the docker-compose.yml file, remove the volumes needed for development and change the variable SUPERSET_ENV to production.

If you are attempting to build on a Mac and it exits with 137, you need to increase your Docker resources. OS X instructions: https://docs.docker.com/docker-for-mac/#advanced (search for memory)


Or if you’re curious and want to install Superset from the bottom up, then read on.

See also contrib/docker/README.md

OS dependencies

Superset stores database connection information in its metadata database. For that purpose, we use the cryptography Python library to encrypt connection passwords. Unfortunately, this library has OS-level dependencies.

You may want to attempt the next step (“Superset installation and initialization”) and come back to this step if you encounter an error.

Here’s how to install them:

For Debian and Ubuntu, the following command will ensure that the required dependencies are installed:

sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

Ubuntu 18.04: If you have python3.6 installed alongside python2.7, as is default on Ubuntu 18.04 LTS, run this command also:

sudo apt-get install build-essential libssl-dev libffi-dev python3.6-dev python-pip libsasl2-dev libldap2-dev

otherwise the build for cryptography fails.

For Fedora and RHEL-derivatives, the following command will ensure that the required dependencies are installed:

sudo yum upgrade python-setuptools
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel

Mac OS X: If possible, you should upgrade to the latest version of OS X as issues are more likely to be resolved for that version. You will likely need the latest version of XCode available for your installed version of OS X. You should also install the XCode command line tools:

xcode-select --install

System python is not recommended. Homebrew’s python also ships with pip:

brew install pkg-config libffi openssl python
env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==2.4.2

Windows isn’t officially supported at this point, but if you want to attempt it, download get-pip.py and run python get-pip.py, which may need admin access. Then run the following:

C:\> pip install cryptography

# You may also have to create C:\Temp
C:\> md C:\Temp


Python virtualenv

It is recommended to install Superset inside a virtualenv. Python 3 already ships with virtualenv. But if it’s not installed in your environment for some reason, you can install it via the package manager for your operating system, or from pip:

pip install virtualenv

You can create and activate a virtualenv by:

# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
# See https://docs.python.org/3.6/library/venv.html
python3 -m venv venv
. venv/bin/activate

On Windows the syntax for activating it is a bit different:

venv\Scripts\activate

Once you have activated your virtualenv, everything you do is confined inside it. To exit a virtualenv, just type deactivate.

Python’s setup tools and pip

Put all the chances on your side by getting the very latest pip and setuptools libraries:

pip install --upgrade setuptools pip

Superset installation and initialization

Follow these few simple steps to install Superset:

# Install superset
pip install superset

# Initialize the database
superset db upgrade

# Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
flask fab create-admin

# Load some data to play with
superset load_examples

# Create default roles and permissions
superset init

# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger

After installation, you should be able to point your browser to the right hostname:port http://localhost:8088, login using the credentials you entered while creating the admin account, and navigate to Menu -> Admin -> Refresh Metadata. This action should bring in all of your datasources for Superset to be aware of, and they should show up in Menu -> Datasources, from where you can start playing with your data!


A proper WSGI HTTP Server

While you can setup Superset to run on Nginx or Apache, many use Gunicorn, preferably in async mode, which allows for impressive concurrency and is fairly easy to install and configure. Please refer to the documentation of your preferred technology to set up this Flask WSGI application in a way that works well in your environment. Here’s an async setup known to work well in production:

gunicorn \
    -w 10 \
    -k gevent \
    --timeout 120 \
    -b 0.0.0.0:6666 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    --statsd-host localhost:8125 \
    superset:app

Refer to the Gunicorn documentation for more information.

Note that the development web server (superset run or flask run) is not intended for production use.

If not using gunicorn, you may want to disable the use of flask-compress by setting ENABLE_FLASK_COMPRESS = False in your superset_config.py

Flask-AppBuilder Permissions

By default, every time the Flask-AppBuilder (FAB) app is initialized, the permissions and views are added automatically to the backend and associated with the ‘Admin’ role. The issue, however, is that when you are running multiple concurrent workers, this creates a lot of contention and race conditions when defining permissions and views.

To alleviate this issue, the automatic updating of permissions can be disabled by setting FAB_UPDATE_PERMS = False (defaults to True).

In a production environment initialization could take on the following form:

superset init
gunicorn -w 10 ... superset:app

Configuration behind a load balancer

If you are running superset behind a load balancer or reverse proxy (e.g. NGINX or ELB on AWS), you may need to utilise a healthcheck endpoint so that your load balancer knows if your superset instance is running. This is provided at /health, which will return a 200 response containing “OK” if the webserver is running.

If the load balancer is inserting X-Forwarded-For/X-Forwarded-Proto headers, you should set ENABLE_PROXY_FIX = True in the superset config file to extract and use the headers.

In case the reverse proxy is used for providing SSL encryption, an explicit definition of the X-Forwarded-Proto header may be required. For the Apache webserver this can be set as follows:

RequestHeader set X-Forwarded-Proto "https"

Configuration

To configure your application, you need to create a file (module) superset_config.py and make sure it is in your PYTHONPATH. Here are some of the parameters you can copy / paste in that configuration module:


#---------------------------------------------------------
# Superset specific config
#---------------------------------------------------------
ROW_LIMIT = 5000

SUPERSET_WEBSERVER_PORT = 8088
#---------------------------------------------------------

#---------------------------------------------------------
# Flask App Builder configuration
#---------------------------------------------------------
# Your App secret key
SECRET_KEY = '\2\1thisismyscretkey\1\2\e\y\y\h'

# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db'

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''

All the parameters and default values defined in https://github.com/apache/incubator-superset/blob/master/superset/config.py can be altered in your local superset_config.py. Administrators will want to read through the file to understand what can be configured locally as well as the default values in place.

Since superset_config.py acts as a Flask configuration module, it can be used to alter the settings of Flask itself, as well as Flask extensions like flask-wtf, flask-cache, flask-migrate, and flask-appbuilder. Flask App Builder, the web framework used by Superset, offers many configuration settings. Please consult the Flask App Builder Documentation for more information on how to configure it.

Make sure to change:

• SQLALCHEMY_DATABASE_URI, by default it is stored at ~/.superset/superset.db

• SECRET_KEY, to a long random string

In case you need to exempt endpoints from CSRF, e.g. you are running a custom auth postback endpoint, you can add them to WTF_CSRF_EXEMPT_LIST:

WTF_CSRF_EXEMPT_LIST = ['']

Database dependencies

Superset does not ship bundled with connectivity to databases, except for Sqlite, which is part of the Python standard library. You’ll need to install the required packages for the database you want to use as your metadata database as well as the packages needed to connect to the databases you want to access through Superset.

Here’s a list of some of the recommended packages.


database          pypi package                            SQLAlchemy URI prefix
----------------  --------------------------------------  ---------------------------------------------
Amazon Athena     pip install "PyAthenaJDBC>1.0.9"        awsathena+jdbc://
Amazon Athena     pip install "PyAthena>1.2.0"            awsathena+rest://
Amazon Redshift   pip install sqlalchemy-redshift         redshift+psycopg2://
Apache Drill      pip install sqlalchemy-drill            drill+sadrill:// (REST), drill+jdbc:// (JDBC)
Apache Druid      pip install pydruid                     druid://
Apache Hive       pip install pyhive                      hive://
Apache Impala     pip install impyla                      impala://
Apache Kylin      pip install kylinpy                     kylin://
Apache Pinot      pip install pinotdb                     pinot+http://CONTROLLER:5436/query?server=http://CONTROLLER:5983/
Apache Spark SQL  pip install pyhive                      jdbc+hive://
BigQuery          pip install pybigquery                  bigquery://
ClickHouse        pip install sqlalchemy-clickhouse
Google Sheets     pip install gsheetsdb                   gsheets://
IBM Db2           pip install ibm_db_sa                   db2+ibm_db://
MySQL             pip install mysqlclient                 mysql://
Oracle            pip install cx_Oracle                   oracle://
PostgreSQL        pip install psycopg2                    postgresql+psycopg2://
Presto            pip install pyhive                      presto://
Snowflake         pip install snowflake-sqlalchemy        snowflake://
SQLite            (Python standard library)               sqlite://
SQL Server        pip install pymssql                     mssql://
Teradata          pip install sqlalchemy-teradata         teradata://
Vertica           pip install sqlalchemy-vertica-python   vertica+vertica_python://

Note that many other databases are supported, the main criteria being the existence of a functional SqlAlchemy dialect and Python driver. Googling the keyword sqlalchemy in addition to a keyword that describes the database you want to connect to should get you to the right place.

(AWS) Athena

The connection string for Athena looks like this

awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...


Where you need to escape/encode at least the s3_staging_dir, i.e.,

s3://... -> s3%3A//...
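As a sketch using only the Python standard library (the bucket and path below are hypothetical), the encoding can be done like this:

```python
from urllib.parse import quote_plus

# Hypothetical staging directory; substitute your own bucket/path.
s3_staging_dir = "s3://my-bucket/athena-results/"

# safe="" forces every reserved character, including ":" and "/", to be encoded
encoded = quote_plus(s3_staging_dir, safe="")
print(encoded)  # s3%3A%2F%2Fmy-bucket%2Fathena-results%2F
```

The encoded value is what goes into the s3_staging_dir query parameter of the connection string.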

You can also use the PyAthena library (no Java required) like this

awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

See PyAthena.

(Google) BigQuery

The connection string for BigQuery looks like this

bigquery://{project_id}

To be able to upload data, e.g. sample data, the python library pandas_gbq is required.

Snowflake

The connection string for Snowflake looks like this

snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}

The schema is not necessary in the connection string, as it is defined per table/query. The role and warehouse can be omitted if defaults are defined for the user, i.e.

snowflake://{user}:{password}@{account}.{region}/{database}

Make sure the user has privileges to access and use all required databases/schemas/tables/views/warehouses, as the Snowflake SQLAlchemy engine does not test for user rights during engine creation.

See Snowflake SQLAlchemy.
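As a sketch, assembling such a connection string from its parts might look like the following (the helper name and all credential values are hypothetical):

```python
# Hypothetical helper; role and warehouse are optional, matching the
# behavior described above when the user has defaults defined.
def snowflake_uri(user, password, account, region, database,
                  role=None, warehouse=None):
    uri = f"snowflake://{user}:{password}@{account}.{region}/{database}"
    params = []
    if role:
        params.append(f"role={role}")
    if warehouse:
        params.append(f"warehouse={warehouse}")
    return uri + ("?" + "&".join(params) if params else "")

print(snowflake_uri("alice", "s3cret", "ab12345", "us-east-1", "analytics",
                    role="SYSADMIN", warehouse="COMPUTE_WH"))
# snowflake://alice:s3cret@ab12345.us-east-1/analytics?role=SYSADMIN&warehouse=COMPUTE_WH
```

Omitting role and warehouse yields the shorter form shown above.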

Teradata

The connection string for Teradata looks like this

teradata://{user}:{password}@{host}

Note: It’s required to have the Teradata ODBC drivers installed and environment variables configured for the SQLAlchemy dialect to work properly. Teradata ODBC drivers are available here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux

Required environment variables:

export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini

See Teradata SQLAlchemy.


Apache Drill

At the time of writing, the SQLAlchemy dialect is not available on pypi and must be downloaded here: SQLAlchemy Drill

Alternatively, you can install it completely from the command line as follows:

git clone https://github.com/JohnOmernik/sqlalchemy-drill
cd sqlalchemy-drill
python3 setup.py install

Once that is done, you can connect to Drill in two ways, either via the REST interface or by JDBC. If you are connecting via JDBC, you must have the Drill JDBC Driver installed.

The basic connection string for Drill looks like this

drill+sadrill://{username}:{password}@{host}:{port}/{storage_plugin}?use_ssl=True

If you are using JDBC to connect to Drill, the connection string looks like this:

drill+jdbc://{username}:{password}@{host}:{port}/{storage_plugin}

For a complete tutorial about how to use Apache Drill with Superset, see this tutorial: Visualize Anything with Superset and Drill

Caching

Superset uses Flask-Cache for caching purposes. Configuring your caching backend is as easy as providing a CACHE_CONFIG constant in your superset_config.py that complies with the Flask-Cache specifications.

Flask-Cache supports multiple caching backends (Redis, Memcached, SimpleCache (in-memory), or the local filesystem). If you are going to use Memcached, please use the pylibmc client library, as python-memcached does not handle storing binary data correctly. If you use Redis, please install the redis Python package:

pip install redis

For setting your timeouts, this is done in the Superset metadata and goes up the “timeout searchpath”, from your slice configuration, to your data source’s configuration, to your database’s, and ultimately falls back to your global default defined in CACHE_CONFIG.

CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

It is also possible to pass a custom cache initialization function in the config to handle additional caching use cases. The function must return an object that is compatible with the Flask-Cache API.

from custom_caching import CustomCache

def init_cache(app):
    """Takes an app instance and returns a custom cache backend"""
    config = {
        'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
        'CACHE_KEY_PREFIX': 'superset_results',
    }
    return CustomCache(app, config)

CACHE_CONFIG = init_cache

Superset has a Celery task that will periodically warm up the cache based on different strategies. To use it, add the following to the CELERYBEAT_SCHEDULE section in config.py:

CELERYBEAT_SCHEDULE = {
    'cache-warmup-hourly': {
        'task': 'cache-warmup',
        'schedule': crontab(minute=0, hour='*'), # hourly
        'kwargs': {
            'strategy_name': 'top_n_dashboards',
            'top_n': 5,
            'since': '7 days ago',
        },
    },
}

This will cache all the charts in the top 5 most popular dashboards every hour. For other strategies, check the superset/tasks/cache.py file.

Deeper SQLAlchemy integration

It is possible to tweak the database connection information using the parameters exposed by SQLAlchemy. In theDatabase edit view, you will find an extra field as a JSON blob.

This JSON string contains extra configuration elements. The engine_params object gets unpacked into the sqlalchemy.create_engine call, while the metadata_params get unpacked into the sqlalchemy.MetaData call. Refer to the SQLAlchemy docs for more information.
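For instance, a plausible Extra blob that tunes the connection pool could be parsed and applied roughly as follows (the pool settings are illustrative, not required):

```python
import json

# Illustrative contents of the "Extra" field from the Database edit view.
extra = '''
{
    "metadata_params": {},
    "engine_params": {"pool_size": 10, "pool_recycle": 3600}
}
'''

params = json.loads(extra)

# Superset applies them roughly like:
#   sqlalchemy.create_engine(uri, **params["engine_params"])
#   sqlalchemy.MetaData(**params["metadata_params"])
print(params["engine_params"])  # {'pool_size': 10, 'pool_recycle': 3600}
```

pool_size and pool_recycle are standard sqlalchemy.create_engine keyword arguments; any other keyword accepted by create_engine could appear under engine_params.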

Schemas (Postgres & Redshift)

Postgres and Redshift, as well as other databases, use the concept of schema as a logical entity on top of the database. For Superset to connect to a specific schema, there’s a schema parameter you can set in the table form.


External Password store for SQLAlchemy connections

It is possible to use an external store for your database passwords. This is useful if you are running a custom secret distribution framework and do not wish to store secrets in Superset’s meta database.

Example: Write a function that takes a single argument of type sqla.engine.url and returns the password for the given connection string. Then set SQLALCHEMY_CUSTOM_PASSWORD_STORE in your config file to point to that function.

def example_lookup_password(url):
    secret = <<get password from external framework>>
    return 'secret'

SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_lookup_password

A common pattern is to use environment variables to make secrets available. SQLALCHEMY_CUSTOM_PASSWORD_STORE can also be used for that purpose.

def example_password_as_env_var(url):
    # assuming the uri looks like
    # mysql://localhost?superset_user:{SUPERSET_PASSWORD}
    return url.password.format(os.environ)

SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_password_as_env_var

SSL Access to databases

This example worked with a MySQL database that requires SSL. The configuration may differ with other backends. This is what was put in the extra parameter:

{
    "metadata_params": {},
    "engine_params": {
        "connect_args": {
            "sslmode": "require",
            "sslrootcert": "/path/to/my/pem"
        }
    }
}

Druid

• From the UI, enter the information about your clusters in the Sources -> Druid Clusters menu by hitting the + sign.

• Once the Druid cluster connection information is entered, hit the Sources -> Refresh Druid Metadata menu item to populate your datasources

• Navigate to your datasources

Note that you can run the superset refresh_druid command to refresh the metadata from your Druid cluster(s)


Presto

By default Superset assumes the most recent version of Presto is being used when querying the datasource. If you’re using an older version of Presto, you can configure it in the extra parameter:

{
    "version": "0.123"
}

CORS

The extra CORS Dependency must be installed:

superset[cors]

The following keys in superset_config.py can be specified to configure CORS:

• ENABLE_CORS: Must be set to True in order to enable CORS

• CORS_OPTIONS: options passed to Flask-CORS (see the documentation: https://flask-cors.corydolphin.com/en/latest/api.html#extension)

DOMAIN SHARDING

Chrome allows up to 6 open connections per domain at a time. When there are more than 6 slices in a dashboard, a lot of fetch requests are queued up waiting for the next available socket. PR 5039 adds domain sharding to Superset, and this feature will be enabled by configuration only (by default Superset doesn’t allow cross-domain requests).

• SUPERSET_WEBSERVER_DOMAINS: list of allowed hostnames for the domain sharding feature. Default: None.
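A superset_config.py entry enabling the feature might look like the following sketch (the hostnames are hypothetical; each one must resolve to your Superset web server for sharded requests to succeed):

```python
# superset_config.py -- hypothetical shard hostnames for illustration
SUPERSET_WEBSERVER_DOMAINS = [
    'superset-1.example.com',
    'superset-2.example.com',
    'superset-3.example.com',
]
```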

MIDDLEWARE

Superset allows you to add your own middleware. To add your own middleware, update the ADDITIONAL_MIDDLEWARE key in your superset_config.py. ADDITIONAL_MIDDLEWARE should be a list of your additional middleware classes.

For example, to use AUTH_REMOTE_USER from behind a proxy server like nginx, you have to add a simple middleware class to add the value of HTTP_X_PROXY_REMOTE_USER (or any other custom header from the proxy) to Gunicorn’s REMOTE_USER environment variable:

class RemoteUserMiddleware(object):
    def __init__(self, app):
        self.app = app
    def __call__(self, environ, start_response):
        user = environ.pop('HTTP_X_PROXY_REMOTE_USER', None)
        environ['REMOTE_USER'] = user
        return self.app(environ, start_response)

ADDITIONAL_MIDDLEWARE = [RemoteUserMiddleware, ]

Adapted from http://flask.pocoo.org/snippets/69/


Event Logging

Superset by default logs special action events in its database. These logs can be accessed in the UI by navigating to “Security” -> “Action Log”. You can freely customize these logs by implementing your own event log class.

Example of a simple JSON to Stdout class:

class JSONStdOutEventLogger(AbstractEventLogger):

    def log(self, user_id, action, *args, **kwargs):
        records = kwargs.get('records', list())
        dashboard_id = kwargs.get('dashboard_id')
        slice_id = kwargs.get('slice_id')
        duration_ms = kwargs.get('duration_ms')
        referrer = kwargs.get('referrer')

        for record in records:
            log = dict(
                action=action,
                json=record,
                dashboard_id=dashboard_id,
                slice_id=slice_id,
                duration_ms=duration_ms,
                referrer=referrer,
                user_id=user_id,
            )
            print(json.dumps(log))

Then in Superset’s config, pass an instance of the logger type you want to use.

EVENT_LOGGER = JSONStdOutEventLogger()

Upgrading

Upgrading should be as straightforward as running:

pip install superset --upgrade
superset db upgrade
superset init

We recommend following standard best practices when upgrading Superset, such as taking a database backup prior to the upgrade, upgrading a staging environment prior to upgrading production, and upgrading production while fewer users are active on the platform.

Note: Some upgrades may contain backward-incompatible changes or require scheduling downtime. When that is the case, contributors attach notes in UPDATING.md in the repository. It’s recommended to review this file prior to running an upgrade.

Celery Tasks

On large analytic databases, it’s common to run queries that execute for minutes or hours. To enable support for long running queries that execute beyond the typical web request’s timeout (30-60 seconds), it is necessary to configure an asynchronous backend for Superset, which consists of:


• one or many Superset workers (implemented as Celery workers), which can be started with the celery worker command; run celery worker --help to view the related options.

• a celery broker (message queue) for which we recommend using Redis or RabbitMQ

• a results backend that defines where the worker will persist the query results

Configuring Celery requires defining a CELERY_CONFIG in your superset_config.py. Both the worker and web server processes should have the same configuration.

class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute=1, hour='*'),
        },
    }

CELERY_CONFIG = CeleryConfig

• To start a Celery worker to leverage the configuration, run:

celery worker --app=superset.tasks.celery_app:app --pool=prefork -Ofair -c 4

• To start a job which schedules periodic background jobs, run

celery beat --app=superset.tasks.celery_app:app

To setup a result backend, you need to pass an instance of a derivative of werkzeug.contrib.cache.BaseCache to the RESULTS_BACKEND configuration key in your superset_config.py. It’s possible to use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache), memory or the file system (in a single server-type setup or for testing), or to write your own caching interface. Your superset_config.py may look something like:

# On S3
from s3cache.s3cache import S3Cache
S3_CACHE_BUCKET = 'foobar-superset'
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)

# On Redis
from werkzeug.contrib.cache import RedisCache
RESULTS_BACKEND = RedisCache(
    host='localhost', port=6379, key_prefix='superset_results')

Important notes

• It is important that all the worker nodes and web servers in the Superset cluster share a common metadata database. This means that SQLite will not work in this context since it has limited support for concurrency and typically lives on the local file system.

• There should only be one instance of celery beat running in your entire setup. If not, background jobs can get scheduled multiple times, resulting in weird behaviors like duplicate delivery of reports, higher than expected load / traffic, etc.

Email Reports

Email reports allow users to schedule email delivery of

• slice and dashboard visualizations (attachment or inline)

• slice data (CSV attachment or inline table)

Schedules are defined in crontab format and each schedule can have a list of recipients (all of them can receive a single mail, or separate mails). For audit purposes, all outgoing mails can have a mandatory bcc.

Requirements

• A selenium compatible driver & headless browser

– geckodriver and Firefox are preferred

– chromedriver is a good option too

• Run celery worker and celery beat as follows

celery worker --app=superset.tasks.celery_app:app --pool=prefork -Ofair -c 4
celery beat --app=superset.tasks.celery_app:app

Important notes

• Be mindful of the concurrency setting for celery (using -c 4). Selenium/webdriver instances can consume a lot of CPU / memory on your servers.

• In some cases, if you notice a lot of leaked geckodriver processes, try running your celery processes with

celery worker --pool=prefork --max-tasks-per-child=128 ...

• It is recommended to run separate workers for sql_lab and email_reports tasks. This can be done by using the queue field in CELERY_ANNOTATIONS.
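The routing could be sketched like this: a hypothetical fragment for superset_config.py that pins each task to its own queue via CELERY_ANNOTATIONS (the queue names 'sql_lab' and 'email_reports' are illustrative, not mandated by Superset), with one dedicated worker started per queue using Celery's -Q flag:

```python
# Hypothetical sketch: route tasks to separate queues via CELERY_ANNOTATIONS
class CeleryConfig(object):
    # ...broker / result backend settings as shown earlier...
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {'queue': 'sql_lab'},
        'email_reports.send': {'queue': 'email_reports'},
    }

# Then, on the command line, start one worker per queue:
#   celery worker --app=superset.tasks.celery_app:app -Q sql_lab
#   celery worker --app=superset.tasks.celery_app:app -Q email_reports
```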

SQL Lab

SQL Lab is a powerful SQL IDE that works with all SQLAlchemy compatible databases. By default, queries are executed in the scope of a web request, so they may eventually time out as they exceed the maximum duration of a web request in your environment, whether that limit is enforced by a reverse proxy or by the Superset server itself. In such cases, it is preferred to use Celery to run the queries in the background. Please follow the examples/notes mentioned above to get your Celery setup working.

Also note that SQL Lab supports Jinja templating in queries and that it's possible to overload the default Jinja context in your environment by defining the JINJA_CONTEXT_ADDONS in your superset configuration. Objects referenced in this dictionary are made available for users to use in their SQL.

JINJA_CONTEXT_ADDONS = {
    'my_crazy_macro': lambda x: x*2,
}

SQL Lab also includes a live query validation feature with pluggable backends. You can configure which validation implementation is used with which database engine by adding a block like the following to your config.py:

FEATURE_FLAGS = {
    'SQL_VALIDATORS_BY_ENGINE': {
        'presto': 'PrestoDBSQLValidator',
    }
}

The available validators and names can be found in sql_validators/.

Scheduling queries

You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding extra metadata to saved queries, which are then picked up by an external scheduler (like [Apache Airflow](https://airflow.apache.org/)).

To allow scheduled queries, add the following to your config.py:

FEATURE_FLAGS = {
    # Configuration for scheduling queries from SQL Lab. This information is
    # collected when the user clicks "Schedule query", and saved into the `extra`
    # field of saved queries.
    # See: https://github.com/mozilla-services/react-jsonschema-form
    'SCHEDULED_QUERIES': {
        'JSONSCHEMA': {
            'title': 'Schedule',
            'description': (
                'In order to schedule a query, you need to specify when it '
                'should start running, when it should stop running, and how '
                'often it should run. You can also optionally specify '
                'dependencies that should be met before the query is '
                'executed. Please read the documentation for best practices '
                'and more information on how to specify dependencies.'
            ),
            'type': 'object',
            'properties': {
                'output_table': {
                    'type': 'string',
                    'title': 'Output table name',
                },
                'start_date': {
                    'type': 'string',
                    'title': 'Start date',
                    # date-time is parsed using the chrono library, see
                    # https://www.npmjs.com/package/chrono-node#usage
                    'format': 'date-time',
                    'default': 'tomorrow at 9am',
                },
                'end_date': {
                    'type': 'string',
                    'title': 'End date',
                    # date-time is parsed using the chrono library, see
                    # https://www.npmjs.com/package/chrono-node#usage
                    'format': 'date-time',
                    'default': '9am in 30 days',
                },
                'schedule_interval': {
                    'type': 'string',
                    'title': 'Schedule interval',
                },
                'dependencies': {
                    'type': 'array',
                    'title': 'Dependencies',
                    'items': {
                        'type': 'string',
                    },
                },
            },
        },
        'UISCHEMA': {
            'schedule_interval': {
                'ui:placeholder': '@daily, @weekly, etc.',
            },
            'dependencies': {
                'ui:help': (
                    'Check the documentation for the correct format when '
                    'defining dependencies.'
                ),
            },
        },
        'VALIDATION': [
            # ensure that start_date <= end_date
            {
                'name': 'less_equal',
                'arguments': ['start_date', 'end_date'],
                'message': 'End date cannot be before start date',
                # this is where the error message is shown
                'container': 'end_date',
            },
        ],
        # link to the scheduler; this example links to an Airflow pipeline
        # that uses the query id and the output table as its name
        'linkback': (
            'https://airflow.example.com/admin/airflow/tree?'
            'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
        ),
    },
}

This feature flag is based on [react-jsonschema-form](https://github.com/mozilla-services/react-jsonschema-form), and will add a button called "Schedule Query" to SQL Lab. When the button is clicked, a modal will show up where the user can add the metadata required for scheduling the query.

This information can then be retrieved from the endpoint /savedqueryviewapi/api/read and used to schedule the queries that have scheduled_queries in their JSON metadata. For schedulers other than Airflow, additional fields can easily be added to the configuration file above.
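As a sketch of what such a scheduler-side consumer might do (this is not Superset or Airflow code; the schedule_info key mirrors the extra_json.schedule_info reference in the linkback URL above and should be treated as an assumption to verify against your deployment):

```python
import json

def scheduled_queries(saved_queries):
    """Filter saved-query records down to those carrying scheduling metadata.

    Each record is assumed to be a dict with an `extra_json` string field,
    as returned by /savedqueryviewapi/api/read.
    """
    results = []
    for query in saved_queries:
        extra = json.loads(query.get('extra_json') or '{}')
        info = extra.get('schedule_info')
        if info:
            results.append((query, info))
    return results
```

A scheduler could then iterate over the returned pairs and create one pipeline per query.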

Celery Flower

Flower is a web-based tool for monitoring the Celery cluster. You can install it from pip:

pip install flower

and run via:

celery flower --app=superset.tasks.celery_app:app

Building from source

More advanced users may want to build Superset from source. That would be the case if you fork the project to add features specific to your environment. See CONTRIBUTING.md#setup-local-environment-for-development.

Blueprints

Blueprints are Flask’s reusable apps. Superset allows you to specify an array of Blueprints in your superset_config module. Here’s an example of how this can work with a simple Blueprint. By doing so, you can expect Superset to serve a page that says “OK” at the /simple_page url. This can allow you to run other things such as custom data visualization applications alongside Superset, on the same server.

from flask import Blueprint
simple_page = Blueprint('simple_page', __name__,
                        template_folder='templates')

@simple_page.route('/', defaults={'page': 'index'})
@simple_page.route('/<page>')
def show(page):
    return "Ok"

BLUEPRINTS = [simple_page]

StatsD logging

Superset is instrumented to log events to StatsD if desired. Most endpoint hits are logged, as well as key events like query start and end in SQL Lab.

To set up StatsD logging, it’s a matter of configuring the logger in your superset_config.py.

from superset.stats_logger import StatsdStatsLogger
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')

Note that it’s also possible to implement your own logger by deriving superset.stats_logger.BaseStatsLogger.
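For illustration, here is a minimal in-memory logger sketch. It does not import Superset; it simply mirrors the method names (incr, decr, gauge, timing) assumed from the superset.stats_logger.BaseStatsLogger interface, so verify them against your Superset version before using this pattern:

```python
from collections import Counter

class InMemoryStatsLogger:
    """Collects stats in process memory; handy for tests or local debugging."""

    def __init__(self, prefix='superset'):
        self.prefix = prefix
        self.counters = Counter()
        self.gauges = {}
        self.timings = {}

    def _key(self, key):
        return '{}.{}'.format(self.prefix, key)

    def incr(self, key):
        self.counters[self._key(key)] += 1

    def decr(self, key):
        self.counters[self._key(key)] -= 1

    def gauge(self, key, value):
        self.gauges[self._key(key)] = value

    def timing(self, key, value):
        self.timings[self._key(key)] = value

# In superset_config.py you would then point STATS_LOGGER at your logger,
# e.g. STATS_LOGGER = InMemoryStatsLogger()
```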

Install Superset with helm in Kubernetes

You can install Superset into Kubernetes with Helm (https://helm.sh/). The chart is located in install/helm.

To install Superset into your Kubernetes cluster:


helm upgrade --install superset ./install/helm/superset

Note that the above command will install Superset into the default namespace of your Kubernetes cluster.

Custom OAuth2 configuration

Beyond the FAB-supported providers (github, twitter, linkedin, google, azure), it’s easy to connect Superset with other OAuth2 Authorization Server implementations that support “code” authorization.

The first step: Configure authorization in Superset superset_config.py.

AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        'name': 'egaSSO',
        'token_key': 'access_token',  # Name of the token in the response of access_token_url
        'icon': 'fa-address-card',  # Icon for the provider
        'remote_app': {
            'consumer_key': 'myClientId',  # Client Id (Identify Superset application)
            'consumer_secret': 'MySecret',  # Secret for this Client Id (Identify Superset application)
            'request_token_params': {
                'scope': 'read'  # Scope for the Authorization
            },
            'access_token_method': 'POST',  # HTTP Method to call access_token_url
            'access_token_params': {  # Additional parameters for calls to access_token_url
                'client_id': 'myClientId'
            },
            'access_token_headers': {  # Additional headers for calls to access_token_url
                'Authorization': 'Basic Base64EncodedClientIdAndSecret'
            },
            'base_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/',
            'access_token_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/token',
            'authorize_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/authorize'
        }
    }
]

# Will allow user self registration, allowing to create Flask users from Authorized User
AUTH_USER_REGISTRATION = True

# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Public"

Second step: Create a CustomSsoSecurityManager that extends SupersetSecurityManager and overrides oauth_user_info:

import logging

from superset.security import SupersetSecurityManager

class CustomSsoSecurityManager(SupersetSecurityManager):

    def oauth_user_info(self, provider, response=None):
        logging.debug("Oauth2 provider: {0}.".format(provider))
        if provider == 'egaSSO':
            # As an example, this line requests a GET to base_url + '/' + 'userDetails'
            # with Bearer Authentication, and expects that the authorization server
            # checks the token and responds with the user details
            me = self.appbuilder.sm.oauth_remotes[provider].get('userDetails').data
            logging.debug("user_data: {0}".format(me))
            return {
                'name': me['name'],
                'email': me['email'],
                'id': me['user_name'],
                'username': me['user_name'],
                'first_name': '',
                'last_name': '',
            }
        ...

This file must be located in the same directory as superset_config.py and named custom_sso_security_manager.py.

Then we can add these two lines to superset_config.py:

from custom_sso_security_manager import CustomSsoSecurityManager
CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager

Feature Flags

Because Superset serves a wide variety of users, some features are not enabled by default. For example, some deployments have stronger security restrictions while others do not. Superset therefore allows users to enable or disable some features via configuration. Feature owners can add optional functionality to Superset that only affects a subset of users.

You can enable or disable features by setting flags in superset_config.py:

DEFAULT_FEATURE_FLAGS = {
    'CLIENT_CACHE': False,
    'ENABLE_EXPLORE_JSON_CSRF_PROTECTION': False,
}

Here is a list of flags and descriptions:

• ENABLE_EXPLORE_JSON_CSRF_PROTECTION

– For security reasons, you may need to enforce CSRF protection on all query requests to the explore_json endpoint. Superset uses flask-csrf to add CSRF protection for all POST requests, but this protection doesn’t apply to the GET method.

– When ENABLE_EXPLORE_JSON_CSRF_PROTECTION is set to true, your users cannot make GET requests to explore_json. The default value for this flag is False (the current behavior): explore_json accepts both GET and POST requests. See PR 7935 for more details.

3.4.2 Tutorial - Creating your first dashboard

This tutorial targets someone who wants to create charts and dashboards in Superset. We’ll show you how to connect Superset to a new database and configure a table in that database for analysis. You’ll also explore the data you’ve exposed and add a visualization to a dashboard so that you get a feel for the end-to-end user experience.


Connecting to a new database

We assume you already have a database configured and can connect to it from the instance on which you’re running Superset. If you’re just testing Superset and want to explore sample data, you can load some sample PostgreSQL datasets into a fresh DB, or configure the example weather data we use here.

Under the Sources menu, select the Databases option:

On the resulting page, click on the green plus sign, near the top right:

You can configure a number of advanced options on this page, but for this walkthrough, you’ll only need to do two things:

1. Name your database connection:

2. Provide the SQLAlchemy Connection URI and test the connection:

This example shows the connection string for our test weather database. As noted in the text below the URI, you should refer to the SQLAlchemy documentation on creating new connection URIs for your target database.
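For reference, connection URIs follow SQLAlchemy's dialect://username:password@host:port/database shape. The entries below are placeholders for illustration, not credentials for any real database:

```python
# Illustrative SQLAlchemy URIs (hosts, users and database names are made up)
EXAMPLE_URIS = [
    'postgresql://user:password@localhost:5432/weather',
    'mysql://user:password@localhost:3306/weather',
    'sqlite:////absolute/path/to/weather.db',
]
```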

Click the Test Connection button to confirm things work end to end. Once Superset can successfully connect and authenticate, you should see a popup like this:


Moreover, you should also see the list of tables Superset can read from the schema you’re connected to, at the bottom of the page:

If the connection looks good, save the configuration by clicking the Save button at the bottom of the page:

Adding a new table

Now that you’ve configured a database, you’ll need to add specific tables to Superset that you’d like to query.

Under the Sources menu, select the Tables option:

On the resulting page, click on the green plus sign, near the top left:

You only need a few pieces of information to add a new table to Superset:


• The name of the table

• The target database from the Database drop-down menu (i.e. the one you just added above)

• Optionally, the database schema. If the table exists in the “default” schema (e.g. the public schema in PostgreSQL or Redshift), you can leave the schema field blank.

Click on the Save button to save the configuration:

When redirected back to the list of tables, you should see a message indicating that your table was created:

This message also directs you to edit the table configuration. We’ll edit a limited portion of the configuration now - just to get you started - and leave the rest for a more advanced tutorial.

Click on the edit button next to the table you’ve created:

On the resulting page, click on the List Table Column tab. Here, you’ll define the way you can use specific columns of your table when exploring your data. We’ll run through these options to describe their purpose:

• If you want users to group metrics by a specific field, mark it as Groupable.

• If you need to filter on a specific field, mark it as Filterable.

• Is this field something you’d like to get the distinct count of? Check the Count Distinct box.

• Is this a metric you want to sum, or get basic summary statistics for? The Sum, Min, and Max columns will help.


• The is temporal field should be checked for any date or time fields. We’ll cover how this manifests itself in analyses in a moment.

Here’s how we’ve configured fields for the weather data. Even for measures like the weather measurements (precipitation, snowfall, etc.), it’s ideal to group and filter by these values:

As with the configurations above, click the Save button to save these settings.

Exploring your data

To start exploring your data, simply click on the table name you just created in the list of available tables:

By default, you’ll be presented with a Table View:

Let’s walk through a basic query to get the count of all records in our table. First, we’ll need to change the Since filter to capture the range of our data. You can use simple phrases to apply these filters, like “3 years ago”:


The upper limit for time, the Until filter, defaults to “now”, which may or may not be what you want.

Look for the Metrics section under the GROUP BY header, and start typing “Count” - you’ll see a list of metrics matching what you type:

Select the COUNT(*) metric, then click the green Query button near the top of the explore:

You’ll see your results in the table:

Let’s group this by the weather_description field to get the count of records by the type of weather recorded, by adding it to the Group by section:

and run the query:


Let’s find a more useful data point: the top 10 times and places that recorded the highest temperature in 2015.

We replace weather_description with latitude, longitude and measurement_date in the Group by section:

And replace COUNT(*) with max__measurement_flag:

The max__measurement_flag metric was created when we checked the box under Max and next to the measurement_flag field, indicating that this field was numeric and that we wanted to find its maximum value when grouped by specific fields.

In our case, measurement_flag is the value of the measurement taken, which clearly depends on the type of measurement (the researchers recorded different values for precipitation and temperature). Therefore, we must filter our query only on records where the weather_description is equal to “Maximum temperature”, which we do in the Filters section at the bottom of the explore:

Finally, since we only care about the top 10 measurements, we limit our results to 10 records using the Row limit option under the Options header:

We click Query and get the following results:


In this dataset, the maximum temperature is recorded in tenths of a degree Celsius. The top value of 1370, measured in the middle of Nevada, is equal to 137 C, or roughly 278 degrees F. It’s unlikely this value was correctly recorded. We’ve already been able to investigate some outliers with Superset, but this just scratches the surface of what we can do.

You may want to do a couple more things with this measure:

• The default formatting shows values like 1.37k, which may be difficult for some users to read. It’s likely you may want to see the full, comma-separated value. You can change the formatting of any measure by editing its config (Edit Table Config > List Sql Metric > Edit Metric > D3Format)

• Moreover, you may want to see the temperature measurements in plain degrees C, not tenths of a degree. Or you may want to convert the temperature to degrees Fahrenheit. You can change the SQL that gets executed against the database, baking the logic into the measure itself (Edit Table Config > List Sql Metric > Edit Metric > SQL Expression)

For now, though, let’s create a better visualization of these data and add it to a dashboard.

We change the Chart Type to “Distribution - Bar Chart”:

Our filter on Maximum temperature measurements was retained, but the query and formatting options are dependent on the chart type, so you’ll have to set the values again:


You should note the extensive formatting options for this chart: the ability to set axis labels, margins, ticks, etc. To make the data presentable to a broad audience, you’ll want to apply many of these to slices that end up in dashboards. For now, though, we run our query and get the following chart:

Creating a slice and dashboard

This view might be interesting to researchers, so let’s save it. In Superset, a saved query is called a Slice.

To create a slice, click the Save as button near the top-left of the explore:


A popup should appear, asking you to name the slice, and optionally add it to a dashboard. Since we haven’t yet created any dashboards, we can create one and immediately add our slice to it. Let’s do it:

Click Save, which will direct you back to your original query. We see that our slice and dashboard were successfully created:

Let’s check out our new dashboard. We click on the Dashboards menu:

and find the dashboard we just created:


Things seem to have worked - our slice is here!

But it’s a bit smaller than we might like. Luckily, you can adjust the size of slices in a dashboard by clicking, holding and dragging the bottom-right corner to your desired dimensions:

After adjusting the size, you’ll be asked to click on the icon near the top-right of the dashboard to save the new configuration.

Congrats! You’ve successfully linked, analyzed, and visualized data in Superset. There are a wealth of other table configuration and visualization options, so please start exploring and creating slices and dashboards of your own.

3.4.3 Security

Security in Superset is handled by Flask AppBuilder (FAB). FAB is a “Simple and rapid application development framework, built on top of Flask.” It provides authentication, user management, permissions and roles. Please read its Security documentation.


Provided Roles

Superset ships with a set of roles that are handled by Superset itself. You can assume that these roles will stay up-to-date as Superset evolves. Even though it’s possible for Admin users to do so, it is not recommended that you alter these roles in any way by removing or adding permissions to them, as these roles will be re-synchronized to their original values as you run your next superset init command.

Since it’s not recommended to alter the roles described here, it’s right to assume that your security strategy should be to compose user access based on these base roles and roles that you create. For instance, you could create a role Financial Analyst that would be made of a set of permissions to a set of data sources (tables) and/or databases. Users would then be granted Gamma, Financial Analyst, and perhaps sql_lab.

Admin

Admins have all possible rights, including granting or revoking rights from other users and altering other people’s slices and dashboards.

Alpha

Alpha users have access to all data sources, but they cannot grant or revoke access from other users. They are also limited to altering the objects that they own. Alpha users can add and alter data sources.

Gamma

Gamma users have limited access. They can only consume data coming from data sources they have been given access to through another complementary role. They only have access to view the slices and dashboards made from data sources that they have access to. Currently, Gamma users are not able to alter or add data sources. We assume that they are mostly content consumers, though they can create slices and dashboards.

Also note that when Gamma users look at the dashboards and slices list view, they will only see the objects that they have access to.

sql_lab

The sql_lab role grants access to SQL Lab. Note that while Admin users have access to all databases by default, both Alpha and Gamma users need to be given access on a per-database basis.

Public

It’s possible to allow logged-out users to access some Superset features.

By setting PUBLIC_ROLE_LIKE_GAMMA = True in your superset_config.py, you grant the public role the same set of permissions as the Gamma role. This is useful if one wants to enable anonymous users to view dashboards. An explicit grant on specific datasets is still required, meaning that you need to edit the Public role and add the public data sources to the role manually.


Managing Gamma per data source access

Here’s how to provide users access to only specific datasets. First, make sure the users with limited access have only the Gamma role assigned to them. Second, create a new role (Menu -> Security -> List Roles) and click the + sign.

This new window allows you to give this new role a name, attribute it to users and select the tables in the Permissions dropdown. To select the data sources you want to associate with this role, simply click on the dropdown and use the typeahead to search for your table names.

You can then confirm with your Gamma users that they see the objects (dashboards and slices) associated with the tables related to their roles.

Customizing

The permissions exposed by FAB are very granular and allow for a great level of customization. FAB creates many permissions automagically for each model that is created (can_add, can_delete, can_show, can_edit, . . . ) as well as for each view. On top of that, Superset can expose more granular permissions like all_datasource_access.

We do not recommend altering the 3 base roles, as there is a set of assumptions that Superset is built upon. It is possible, though, for you to create your own roles and union them with existing ones.

Permissions

Roles are composed of a set of permissions, and Superset has many categories of permissions. Here are the different categories of permissions:

• Model & action: models are entities like Dashboard, Slice, or User. Each model has a fixed set of permissions, like can_edit, can_show, can_delete, can_list, can_add, and so on. By adding can_delete on Dashboard to a role, and granting that role to a user, this user will be able to delete dashboards.

• Views: views are individual web pages, like the explore view or the SQL Lab view. When granted to a user, they will see that view in the menu and be able to load that page.

• Data source: for each data source, a permission is created. If the user does not have the all_datasource_access permission granted, the user will only be able to see slices or explore the data sources that are granted to them.

• Database: granting access to a database allows the user to access all data sources within that database, and will enable the user to query that database in SQL Lab, provided that the SQL Lab specific permissions have been granted to the user.


Restricting access to a subset of data sources

The best way to go is probably to give user Gamma plus one or many other roles that would add access to specific data sources. We recommend that you create individual roles for each access profile. Say people in your finance department might have access to a set of databases and data sources; these permissions can be consolidated in a single role. Users with this profile then need to be attributed Gamma as a foundation to the models and views they can access, and that Finance role that is a collection of permissions to data objects.

One user can have many roles, so a finance executive could be granted Gamma, Finance, and perhaps another Executive role that gathers a set of data sources that power dashboards only made available to executives. When looking at their dashboard list, this user will only see the dashboards they have access to, based on the roles and permissions that were attributed.

Restricting the access to some metrics

Sometimes some metrics are relatively sensitive (e.g. revenue). We may want to restrict those metrics to only a few roles. For example, assume there is a metric [cluster1].[datasource1].[revenue] and only Admin users are allowed to see it. Here’s how to restrict the access.

1. Edit the datasource (Menu -> Source -> Druid datasources -> edit the record "datasource1") and go to the tab List Druid Metric. Check the checkbox Is Restricted in the row of the metric revenue.

2. Edit the role (Menu -> Security -> List Roles -> edit the record “Admin”); in the permissions field, type-and-search the permission metric access on [cluster1].[datasource1].[revenue] (id: 1), then click the Save button at the bottom of the page.

Any users without the permission will see the error message Access to the metrics denied: revenue (Status: 500) in the slices. It also happens when the user wants to access a post-aggregation metric that is dependent on revenue.

3.4.4 SQL Lab

SQL Lab is a modern, feature-rich SQL IDE written in React.


Feature Overview

• Connects to just about any database backend

• A multi-tab environment to work on multiple queries at a time

• A smooth flow to visualize your query results using Superset’s rich visualization capabilities

• Browse database metadata: tables, columns, indexes, partitions

• Support for long-running queries

– uses the Celery distributed queue to dispatch query handling to workers

– supports defining a “results backend” to persist query results

• A search engine to find queries executed in the past

• Supports templating using the Jinja templating language which allows for using macros in your SQL code

Extra features

• Hit alt + enter as a keyboard shortcut to run your query

Templating with Jinja

SELECT *
FROM some_table
WHERE partition_key = '{{ presto.first_latest_partition('some_table') }}'

Templating unleashes the power and capabilities of a programming language within your SQL code.

Templates can also be used to write generic queries that are parameterized so they can be re-used easily.
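As a conceptual sketch of such a parameterized query, rendered here with the jinja2 library directly (inside SQL Lab, Superset performs the rendering, and the available context differs; the ds and country parameters below are hypothetical):

```python
from jinja2 import Template

# A generic, parameterized query; parameters are supplied at render time
template = Template(
    "SELECT * FROM logs WHERE ds = '{{ ds }}' AND country = '{{ country }}'"
)
print(template.render(ds='2019-01-01', country='BE'))
```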


Available macros

We expose certain modules from Python’s standard library in Superset’s Jinja context:

• time: time

• datetime: datetime.datetime

• uuid: uuid

• random: random

• relativedelta: dateutil.relativedelta.relativedelta

Jinja’s builtin filters can also be applied where needed.

Extending macros

As mentioned in the Installation & Configuration documentation, it’s possible for administrators to expose more macros in their environment using the configuration variable JINJA_CONTEXT_ADDONS. All objects referenced in this dictionary will become available for users to integrate in their queries in SQL Lab.

3.4.5 Visualizations Gallery


3.4.6 Druid

Superset has a native connector to Druid and a majority of Druid’s features are accessible through Superset.

Note: Druid now supports SQL and can be accessed through Superset’s SQLAlchemy connector. The long-term vision is to deprecate the Druid native REST connector and query Druid exclusively through the SQL interface.

Aggregations

Common aggregations or Druid metrics can be defined and used in Superset. The first and simplest use case is to use the checkbox matrix exposed in your datasource’s edit view (Sources -> Druid Datasources -> [your datasource] -> Edit -> [tab] List Druid Column). Clicking the GroupBy and Filterable checkboxes will make the column appear in the related dropdowns while in explore view. Checking Count Distinct, Min, Max or Sum will result in creating new metrics that will appear in the List Druid Metric tab upon saving the datasource. By editing these metrics, you’ll notice that their json element corresponds to a Druid aggregation definition. You can create your own aggregations manually from the List Druid Metric tab following the Druid documentation.


Post-Aggregations

Druid supports post-aggregations, and these work in Superset. All you have to do is create a metric, much like you would create an aggregation manually, but specify postagg as the Metric Type. You then have to provide a valid JSON post-aggregation definition (as specified in the Druid docs) in the Json field.
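For instance, a post-aggregation that divides one metric by another might look like the following sketch (metric names are illustrative; see the Druid post-aggregations documentation for the exact schema):

```json
{
    "type": "arithmetic",
    "name": "avg_price",
    "fn": "/",
    "fields": [
        {"type": "fieldAccess", "fieldName": "total_price"},
        {"type": "fieldAccess", "fieldName": "num_rows"}
    ]
}
```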

Unsupported Features

Note: Unclear at this point, this section of the documentation could use some input.

3.4.7 Misc

Visualization Tools

Data is visualized in slices. Slices are visual components built with D3.js. Some components accept optional or required inputs.

Country Map Tools

This tool is used in slices to visualize a number or string by region, province, or department of a country. To use it, you need the ISO 3166-2 codes of the regions, provinces, or departments.


ISO 3166-2 is part of the ISO 3166 standard published by the International Organization for Standardization (ISO), and defines codes for identifying the principal subdivisions (e.g., provinces or states) of all countries coded in ISO 3166-1.

The purpose of ISO 3166-2 is to establish an international standard of short and unique alphanumeric codes to represent the relevant administrative divisions and dependent territories of all countries in a more convenient and less ambiguous form than their full names. Each complete ISO 3166-2 code consists of two parts, separated by a hyphen:

The first part is the ISO 3166-1 alpha-2 code of the country; the second part is a string of up to three alphanumeric characters, which is usually obtained from national sources and stems from coding systems already in use in the country concerned, but may also be developed by the ISO itself.

List of Countries

• Belgium

ISO     Name of region
BE-BRU  Bruxelles
BE-VAN  Antwerpen
BE-VLI  Limburg
BE-VOV  Oost-Vlaanderen
BE-VBR  Vlaams Brabant
BE-VWV  West-Vlaanderen
BE-WBR  Brabant Wallon
BE-WHT  Hainaut
BE-WLG  Liège
BE-WLX  Luxembourg
BE-WNA  Namur

• Brazil


ISO    Name of region
BR-AC  Acre
BR-AL  Alagoas
BR-AP  Amapá
BR-AM  Amazonas
BR-BA  Bahia
BR-CE  Ceará
BR-DF  Distrito Federal
BR-ES  Espírito Santo
BR-GO  Goiás
BR-MA  Maranhão
BR-MS  Mato Grosso do Sul
BR-MT  Mato Grosso
BR-MG  Minas Gerais
BR-PA  Pará
BR-PB  Paraíba
BR-PR  Paraná
BR-PE  Pernambuco
BR-PI  Piauí
BR-RJ  Rio de Janeiro
BR-RN  Rio Grande do Norte
BR-RS  Rio Grande do Sul
BR-RO  Rondônia
BR-RR  Roraima
BR-SP  São Paulo
BR-SC  Santa Catarina
BR-SE  Sergipe
BR-TO  Tocantins

• China

ISO    Name of region
CN-34  Anhui
CN-11  Beijing
CN-50  Chongqing
CN-35  Fujian
CN-62  Gansu
CN-44  Guangdong
CN-45  Guangxi
CN-52  Guizhou
CN-46  Hainan
CN-13  Hebei
CN-23  Heilongjiang
CN-41  Henan
CN-42  Hubei
CN-43  Hunan
CN-32  Jiangsu
CN-36  Jiangxi
CN-22  Jilin
CN-21  Liaoning
CN-15  Nei Mongol
CN-64  Ningxia Hui
CN-63  Qinghai
CN-61  Shaanxi
CN-37  Shandong
CN-31  Shanghai
CN-14  Shanxi
CN-51  Sichuan
CN-12  Tianjin
CN-65  Xinjiang Uygur
CN-54  Xizang
CN-53  Yunnan
CN-33  Zhejiang
CN-71  Taiwan
CN-91  Hong Kong
CN-92  Macao

• Egypt

ISO     Name of region
EG-DK   Ad Daqahliyah
EG-BA   Al Bahr al Ahmar
EG-BH   Al Buhayrah
EG-FYM  Al Fayyum
EG-GH   Al Gharbiyah
EG-ALX  Al Iskandariyah
EG-IS   Al Isma iliyah
EG-GZ   Al Jizah
EG-MNF  Al Minufiyah
EG-MN   Al Minya
EG-C    Al Qahirah
EG-KB   Al Qalyubiyah
EG-LX   Al Uqsur
EG-WAD  Al Wadi al Jadid
EG-SUZ  As Suways
EG-SHR  Ash Sharqiyah
EG-ASN  Aswan
EG-AST  Asyut
EG-BNS  Bani Suwayf
EG-PTS  Bur Sa id
EG-DT   Dumyat
EG-JS   Janub Sina’
EG-KFS  Kafr ash Shaykh
EG-MT   Matrouh
EG-KN   Qina
EG-SIN  Shamal Sina’
EG-SHG  Suhaj

• France


ISO    Name of region
FR-67  Bas-Rhin
FR-68  Haut-Rhin
FR-24  Dordogne
FR-33  Gironde
FR-40  Landes
FR-47  Lot-et-Garonne
FR-64  Pyrénées-Atlantiques
FR-03  Allier
FR-15  Cantal
FR-43  Haute-Loire
FR-63  Puy-de-Dôme
FR-91  Essonne
FR-92  Hauts-de-Seine
FR-75  Paris
FR-77  Seine-et-Marne
FR-93  Seine-Saint-Denis
FR-95  Val-d’Oise
FR-94  Val-de-Marne
FR-78  Yvelines
FR-14  Calvados
FR-50  Manche
FR-61  Orne
FR-21  Côte-d’Or
FR-58  Nièvre
FR-71  Saône-et-Loire
FR-89  Yonne
FR-22  Côtes-d’Armor
FR-29  Finistère
FR-35  Ille-et-Vilaine
FR-56  Morbihan
FR-18  Cher
FR-28  Eure-et-Loir
FR-37  Indre-et-Loire
FR-36  Indre
FR-41  Loir-et-Cher
FR-45  Loiret
FR-08  Ardennes
FR-10  Aube
FR-52  Haute-Marne
FR-51  Marne
FR-2A  Corse-du-Sud
FR-2B  Haute-Corse
FR-25  Doubs
FR-70  Haute-Saône
FR-39  Jura
FR-90  Territoire de Belfort
FR-27  Eure
FR-76  Seine-Maritime
FR-11  Aude
FR-30  Gard
FR-34  Hérault
FR-48  Lozère
FR-66  Pyrénées-Orientales
FR-19  Corrèze
FR-23  Creuse
FR-87  Haute-Vienne
FR-54  Meurthe-et-Moselle
FR-55  Meuse
FR-57  Moselle
FR-88  Vosges
FR-09  Ariège
FR-12  Aveyron
FR-32  Gers
FR-31  Haute-Garonne
FR-65  Hautes-Pyrénées
FR-46  Lot
FR-82  Tarn-et-Garonne
FR-81  Tarn
FR-59  Nord
FR-62  Pas-de-Calais
FR-44  Loire-Atlantique
FR-49  Maine-et-Loire
FR-53  Mayenne
FR-72  Sarthe
FR-85  Vendée
FR-02  Aisne
FR-60  Oise
FR-80  Somme
FR-17  Charente-Maritime
FR-16  Charente
FR-79  Deux-Sèvres
FR-86  Vienne
FR-04  Alpes-de-Haute-Provence
FR-06  Alpes-Maritimes
FR-13  Bouches-du-Rhône
FR-05  Hautes-Alpes
FR-83  Var
FR-84  Vaucluse
FR-01  Ain
FR-07  Ardèche
FR-26  Drôme
FR-74  Haute-Savoie
FR-38  Isère
FR-42  Loire
FR-69  Rhône
FR-73  Savoie

• Germany


ISO    Name of region
DE-BW  Baden-Württemberg
DE-BY  Bayern
DE-BE  Berlin
DE-BB  Brandenburg
DE-HB  Bremen
DE-HH  Hamburg
DE-HE  Hessen
DE-MV  Mecklenburg-Vorpommern
DE-NI  Niedersachsen
DE-NW  Nordrhein-Westfalen
DE-RP  Rheinland-Pfalz
DE-SL  Saarland
DE-ST  Sachsen-Anhalt
DE-SN  Sachsen
DE-SH  Schleswig-Holstein
DE-TH  Thüringen

• Italy

ISO    Name of region
IT-CH  Chieti
IT-AQ  L’Aquila
IT-PE  Pescara
IT-TE  Teramo
IT-BA  Bari
IT-BT  Barletta-Andria-Trani
IT-BR  Brindisi
IT-FG  Foggia
IT-LE  Lecce
IT-TA  Taranto
IT-MT  Matera
IT-PZ  Potenza
IT-CZ  Catanzaro
IT-CS  Cosenza
IT-KR  Crotone
IT-RC  Reggio Di Calabria
IT-VV  Vibo Valentia
IT-AV  Avellino
IT-BN  Benevento
IT-CE  Caserta
IT-NA  Napoli
IT-SA  Salerno
IT-BO  Bologna
IT-FE  Ferrara
IT-FC  Forli’ - Cesena
IT-MO  Modena
IT-PR  Parma
IT-PC  Piacenza
IT-RA  Ravenna
IT-RE  Reggio Nell’Emilia
IT-RN  Rimini
IT-GO  Gorizia
IT-PN  Pordenone
IT-TS  Trieste
IT-UD  Udine
IT-FR  Frosinone
IT-LT  Latina
IT-RI  Rieti
IT-RM  Roma
IT-VT  Viterbo
IT-GE  Genova
IT-IM  Imperia
IT-SP  La Spezia
IT-SV  Savona
IT-BG  Bergamo
IT-BS  Brescia
IT-CO  Como
IT-CR  Cremona
IT-LC  Lecco
IT-LO  Lodi
IT-MN  Mantua
IT-MI  Milano
IT-MB  Monza and Brianza
IT-PV  Pavia
IT-SO  Sondrio
IT-VA  Varese
IT-AN  Ancona
IT-AP  Ascoli Piceno
IT-FM  Fermo
IT-MC  Macerata
IT-PU  Pesaro E Urbino
IT-CB  Campobasso
IT-IS  Isernia
IT-AL  Alessandria
IT-AT  Asti
IT-BI  Biella
IT-CN  Cuneo
IT-NO  Novara
IT-TO  Torino
IT-VB  Verbano-Cusio-Ossola
IT-VC  Vercelli
IT-CA  Cagliari
IT-CI  Carbonia-Iglesias
IT-VS  Medio Campidano
IT-NU  Nuoro
IT-OG  Ogliastra
IT-OT  Olbia-Tempio
IT-OR  Oristano
IT-SS  Sassari
IT-AG  Agrigento
IT-CL  Caltanissetta
IT-CT  Catania
IT-EN  Enna
IT-ME  Messina
IT-PA  Palermo
IT-RG  Ragusa
IT-SR  Syracuse
IT-TP  Trapani
IT-AR  Arezzo
IT-FI  Florence
IT-GR  Grosseto
IT-LI  Livorno
IT-LU  Lucca
IT-MS  Massa Carrara
IT-PI  Pisa
IT-PT  Pistoia
IT-PO  Prato
IT-SI  Siena
IT-BZ  Bolzano
IT-TN  Trento
IT-PG  Perugia
IT-TR  Terni
IT-AO  Aosta
IT-BL  Belluno
IT-PD  Padua
IT-RO  Rovigo
IT-TV  Treviso
IT-VE  Venezia
IT-VR  Verona
IT-VI  Vicenza

• Japan

ISO    Name of region
JP-01  Hokkaido
JP-02  Aomori
JP-03  Iwate
JP-04  Miyagi
JP-05  Akita
JP-06  Yamagata
JP-07  Fukushima
JP-08  Ibaraki
JP-09  Tochigi
JP-10  Gunma
JP-11  Saitama
JP-12  Chiba
JP-13  Tokyo
JP-14  Kanagawa
JP-15  Niigata
JP-16  Toyama
JP-17  Ishikawa
JP-18  Fukui
JP-19  Yamanashi
JP-20  Nagano
JP-21  Gifu
JP-22  Shizuoka
JP-23  Aichi
JP-24  Mie
JP-25  Shiga
JP-26  Kyoto
JP-27  Osaka
JP-28  Hyogo
JP-29  Nara
JP-30  Wakayama
JP-31  Tottori
JP-32  Shimane
JP-33  Okayama
JP-34  Hiroshima
JP-35  Yamaguchi
JP-36  Tokushima
JP-37  Kagawa
JP-38  Ehime
JP-39  Kochi
JP-40  Fukuoka
JP-41  Saga
JP-42  Nagasaki
JP-43  Kumamoto
JP-44  Oita
JP-45  Miyazaki
JP-46  Kagoshima
JP-47  Okinawa

• Morocco

ISO     Name of region
MA-BES  Ben Slimane
MA-KHO  Khouribga
MA-SET  Settat
MA-JDI  El Jadida
MA-SAF  Safi
MA-BOM  Boulemane
MA-FES  Fès
MA-SEF  Sefrou
MA-MOU  Zouagha-Moulay Yacoub
MA-KEN  Kénitra
MA-SIK  Sidi Kacem
MA-CAS  Casablanca
MA-MOH  Mohammedia
MA-ASZ  Assa-Zag
MA-GUE  Guelmim
MA-TNT  Tan-Tan
MA-TAT  Tata
MA-LAA  Laâyoune
MA-HAO  Al Haouz
MA-CHI  Chichaoua
MA-KES  El Kelaâ des Sraghna
MA-ESI  Essaouira
MA-MMD  Marrakech
MA-HAJ  El Hajeb
MA-ERR  Errachidia
MA-IFR  Ifrane
MA-KHN  Khénifra
MA-MEK  Meknès
MA-BER  Berkane Taourirt
MA-FIG  Figuig
MA-JRA  Jerada
MA-NAD  Nador
MA-OUJ  Oujda Angad
MA-KHE  Khémisset
MA-RAB  Rabat
MA-SAL  Salé
MA-SKH  Skhirate-Témara
MA-AGD  Agadir-Ida ou Tanane
MA-CHT  Chtouka-Aït Baha
MA-INE  Inezgane-Aït Melloul
MA-OUA  Ouarzazate
MA-TAR  Taroudannt
MA-TIZ  Tiznit
MA-ZAG  Zagora
MA-AZI  Azilal
MA-BEM  Béni Mellal
MA-CHE  Chefchaouen
MA-FAH  Fahs Anjra
MA-LAR  Larache
MA-TET  Tétouan
MA-TNG  Tanger-Assilah
MA-HOC  Al Hoceïma
MA-TAO  Taounate
MA-TAZ  Taza

• Netherlands


ISO    Name of region
NL-DR  Drenthe
NL-FL  Flevoland
NL-FR  Friesland
NL-GE  Gelderland
NL-GR  Groningen
NL-YS  IJsselmeer
NL-LI  Limburg
NL-NB  Noord-Brabant
NL-NH  Noord-Holland
NL-OV  Overijssel
NL-UT  Utrecht
NL-ZE  Zeeland
NL-ZM  Zeeuwse meren
NL-ZH  Zuid-Holland

• Russia

ISO     Name of region
RU-AD   Adygey
RU-ALT  Altay
RU-AMU  Amur
RU-ARK  Arkhangel’sk
RU-AST  Astrakhan’
RU-BA   Bashkortostan
RU-BEL  Belgorod
RU-BRY  Bryansk
RU-BU   Buryat
RU-CE   Chechnya
RU-CHE  Chelyabinsk
RU-CHU  Chukot
RU-CU   Chuvash
RU-SPE  City of St. Petersburg
RU-DA   Dagestan
RU-AL   Gorno-Altay
RU-IN   Ingush
RU-IRK  Irkutsk
RU-IVA  Ivanovo
RU-KB   Kabardin-Balkar
RU-KGD  Kaliningrad
RU-KL   Kalmyk
RU-KLU  Kaluga
RU-KAM  Kamchatka
RU-KC   Karachay-Cherkess
RU-KR   Karelia
RU-KEM  Kemerovo
RU-KHA  Khabarovsk
RU-KK   Khakass
RU-KHM  Khanty-Mansiy
RU-KIR  Kirov
RU-KO   Komi
RU-KOS  Kostroma
RU-KDA  Krasnodar
RU-KYA  Krasnoyarsk
RU-KGN  Kurgan
RU-KRS  Kursk
RU-LEN  Leningrad
RU-LIP  Lipetsk
RU-MAG  Maga Buryatdan
RU-ME   Mariy-El
RU-MO   Mordovia
RU-MOW  Moscow City
RU-MOS  Moskva
RU-MUR  Murmansk
RU-NEN  Nenets
RU-NIZ  Nizhegorod
RU-SE   North Ossetia
RU-NGR  Novgorod
RU-NVS  Novosibirsk
RU-OMS  Omsk
RU-ORL  Orel
RU-ORE  Orenburg
RU-PNZ  Penza
RU-PER  Perm’
RU-PRI  Primor’ye
RU-PSK  Pskov
RU-ROS  Rostov
RU-RYA  Ryazan’
RU-SAK  Sakhalin
RU-SA   Sakha
RU-SAM  Samara
RU-SAR  Saratov
RU-SMO  Smolensk
RU-STA  Stavropol’
RU-SVE  Sverdlovsk
RU-TAM  Tambov
RU-TA   Tatarstan
RU-TOM  Tomsk
RU-TUL  Tula
RU-TY   Tuva
RU-TVE  Tver’
RU-TYU  Tyumen’
RU-UD   Udmurt
RU-ULY  Ul’yanovsk
RU-VLA  Vladimir
RU-VGG  Volgograd
RU-VLG  Vologda
RU-VOR  Voronezh
RU-YAN  Yamal-Nenets
RU-YAR  Yaroslavl’
RU-YEV  Yevrey
RU-ZAB  Zabaykal’ye

• Singapore

Id   Name of region
205  Singapore

• Spain

ISO     Name of region
ES-AL   Almería
ES-CA   Cádiz
ES-CO   Córdoba
ES-GR   Granada
ES-H    Huelva
ES-J    Jaén
ES-MA   Málaga
ES-SE   Sevilla
ES-HU   Huesca
ES-TE   Teruel
ES-Z    Zaragoza
ES-S3   Cantabria
ES-AB   Albacete
ES-CR   Ciudad Real
ES-CU   Cuenca
ES-GU   Guadalajara
ES-TO   Toledo
ES-AV   Ávila
ES-BU   Burgos
ES-LE   León
ES-P    Palencia
ES-SA   Salamanca
ES-SG   Segovia
ES-SO   Soria
ES-VA   Valladolid
ES-ZA   Zamora
ES-B    Barcelona
ES-GI   Girona
ES-L    Lleida
ES-T    Tarragona
ES-CE   Ceuta
ES-ML   Melilla
ES-M5   Madrid
ES-NA7  Navarra
ES-A    Alicante
ES-CS   Castellón
ES-V    Valencia
ES-BA   Badajoz
ES-CC   Cáceres
ES-C    A Coruña
ES-LU   Lugo
ES-OR   Ourense
ES-PO   Pontevedra
ES-PM   Baleares
ES-GC   Las Palmas
ES-TF   Santa Cruz de Tenerife
ES-LO4  La Rioja
ES-VI   Álava
ES-SS   Guipúzcoa
ES-BI   Vizcaya
ES-O2   Asturias
ES-MU6  Murcia

• Uk

ISO     Name of region
GB-BDG  Barking and Dagenham
GB-BAS  Bath and North East Somerset
GB-BDF  Bedfordshire
GB-WBK  Berkshire
GB-BEX  Bexley
GB-BBD  Blackburn with Darwen
GB-BMH  Bournemouth
GB-BEN  Brent
GB-BNH  Brighton and Hove
GB-BST  Bristol
GB-BRY  Bromley
GB-BKM  Buckinghamshire
GB-CAM  Cambridgeshire
GB-CMD  Camden
GB-CHS  Cheshire
GB-CON  Cornwall
GB-CRY  Croydon
GB-CMA  Cumbria
GB-DAL  Darlington
GB-DBY  Derbyshire
GB-DER  Derby
GB-DEV  Devon
GB-DOR  Dorset
GB-DUR  Durham
GB-EAL  Ealing
GB-ERY  East Riding of Yorkshire
GB-ESX  East Sussex
GB-ENF  Enfield
GB-ESS  Essex
GB-GLS  Gloucestershire
GB-GRE  Greenwich
GB-HCK  Hackney
GB-HAL  Halton
GB-HMF  Hammersmith and Fulham
GB-HAM  Hampshire
GB-HRY  Haringey
GB-HRW  Harrow
GB-HPL  Hartlepool
GB-HAV  Havering
GB-HRT  Herefordshire
GB-HEF  Hertfordshire
GB-HIL  Hillingdon
GB-HNS  Hounslow
GB-IOW  Isle of Wight
GB-ISL  Islington
GB-KEC  Kensington and Chelsea
GB-KEN  Kent
GB-KHL  Kingston upon Hull
GB-KTT  Kingston upon Thames
GB-LBH  Lambeth
GB-LAN  Lancashire
GB-LEC  Leicestershire
GB-LCE  Leicester
GB-LEW  Lewisham
GB-LIN  Lincolnshire
GB-LND  London
GB-LUT  Luton
GB-MAN  Manchester
GB-MDW  Medway
GB-MER  Merseyside
GB-MRT  Merton
GB-MDB  Middlesbrough
GB-MIK  Milton Keynes
GB-NWM  Newham
GB-NFK  Norfolk
GB-NEL  North East Lincolnshire
GB-NLN  North Lincolnshire
GB-NSM  North Somerset
GB-NYK  North Yorkshire
GB-NTH  Northamptonshire
GB-NBL  Northumberland
GB-NTT  Nottinghamshire
GB-NGM  Nottingham
GB-OXF  Oxfordshire
GB-PTE  Peterborough
GB-PLY  Plymouth
GB-POL  Poole
GB-POR  Portsmouth
GB-RDB  Redbridge
GB-RCC  Redcar and Cleveland
GB-RIC  Richmond upon Thames
GB-RUT  Rutland
GB-SHR  Shropshire
GB-SOM  Somerset
GB-SGC  South Gloucestershire
GB-SY   South Yorkshire
GB-STH  Southampton
GB-SOS  Southend-on-Sea
GB-SWK  Southwark
GB-STS  Staffordshire
GB-STT  Stockton-on-Tees
GB-STE  Stoke-on-Trent
GB-SFK  Suffolk
GB-SRY  Surrey
GB-STN  Sutton
GB-SWD  Swindon
GB-TFW  Telford and Wrekin
GB-THR  Thurrock
GB-TOB  Torbay
GB-TWH  Tower Hamlets
GB-TAW  Tyne and Wear
GB-WFT  Waltham Forest
GB-WND  Wandsworth
GB-WRT  Warrington
GB-WAR  Warwickshire
GB-WM   West Midlands
GB-WSX  West Sussex
GB-WY   West Yorkshire
GB-WSM  Westminster
GB-WIL  Wiltshire
GB-WOR  Worcestershire
GB-YOR  York
GB-ANT  Antrim
GB-ARD  Ards
GB-ARM  Armagh
GB-BLA  Ballymena
GB-BLY  Ballymoney
GB-BNB  Banbridge
GB-BFS  Belfast
GB-CKF  Carrickfergus
GB-CSR  Castlereagh
GB-CLR  Coleraine
GB-CKT  Cookstown
GB-CGV  Craigavon
GB-DRY  Derry
GB-DOW  Down
GB-DGN  Dungannon
GB-FER  Fermanagh
GB-LRN  Larne
GB-LMV  Limavady
GB-LSB  Lisburn
GB-MFT  Magherafelt
GB-MYL  Moyle
GB-NYM  Newry and Mourne
GB-NTA  Newtownabbey
GB-NDN  North Down
GB-OMH  Omagh
GB-STB  Strabane
GB-ABD  Aberdeenshire
GB-ABE  Aberdeen
GB-ANS  Angus
GB-AGB  Argyll and Bute
GB-CLK  Clackmannanshire
GB-DGY  Dumfries and Galloway
GB-DND  Dundee
GB-EAY  East Ayrshire
GB-EDU  East Dunbartonshire
GB-ELN  East Lothian
GB-ERW  East Renfrewshire
GB-EDH  Edinburgh
GB-ELS  Eilean Siar
GB-FAL  Falkirk
GB-FIF  Fife
GB-GLG  Glasgow
GB-HLD  Highland
GB-IVC  Inverclyde
GB-MLN  Midlothian
GB-MRY  Moray
GB-NAY  North Ayrshire
GB-NLK  North Lanarkshire
GB-ORK  Orkney Islands
GB-PKN  Perthshire and Kinross
GB-RFW  Renfrewshire
GB-SCB  Scottish Borders
GB-ZET  Shetland Islands
GB-SAY  South Ayrshire
GB-SLK  South Lanarkshire
GB-STG  Stirling
GB-WDU  West Dunbartonshire
GB-WLN  West Lothian
GB-AGY  Anglesey
GB-BGW  Blaenau Gwent
GB-BGE  Bridgend
GB-CAY  Caerphilly
GB-CRF  Cardiff
GB-CMN  Carmarthenshire
GB-CGN  Ceredigion
GB-CWY  Conwy
GB-DEN  Denbighshire
GB-FLN  Flintshire
GB-GWN  Gwynedd
GB-MTY  Merthyr Tydfil
GB-MON  Monmouthshire
GB-NTL  Neath Port Talbot
GB-NWP  Newport
GB-PEM  Pembrokeshire
GB-POW  Powys
GB-RCT  Rhondda
GB-SWA  Swansea
GB-TOF  Torfaen
GB-VGL  Vale of Glamorgan
GB-WRX  Wrexham

• Ukraine

ISO    Name of region
UA-71  Cherkasy
UA-74  Chernihiv
UA-77  Chernivtsi
UA-43  Crimea
UA-12  Dnipropetrovs’k
UA-14  Donets’k
UA-26  Ivano-Frankivs’k
UA-63  Kharkiv
UA-65  Kherson
UA-68  Khmel’nyts’kyy
UA-30  Kiev City
UA-32  Kiev
UA-35  Kirovohrad
UA-46  L’viv
UA-09  Luhans’k
UA-48  Mykolayiv
UA-51  Odessa
UA-53  Poltava
UA-56  Rivne
UA-40  Sevastopol’
UA-59  Sumy
UA-61  Ternopil’
UA-21  Transcarpathia
UA-05  Vinnytsya
UA-07  Volyn
UA-23  Zaporizhzhya
UA-18  Zhytomyr

• Usa

ISO    Name of region
US-AL  Alabama
US-AK  Alaska
US-AZ  Arizona
US-AR  Arkansas
US-CA  California
US-CO  Colorado
US-CT  Connecticut
US-DE  Delaware
US-DC  District of Columbia
US-FL  Florida
US-GA  Georgia
US-HI  Hawaii
US-ID  Idaho
US-IL  Illinois
US-IN  Indiana
US-IA  Iowa
US-KS  Kansas
US-KY  Kentucky
US-LA  Louisiana
US-ME  Maine
US-MD  Maryland
US-MA  Massachusetts
US-MI  Michigan
US-MN  Minnesota
US-MS  Mississippi
US-MO  Missouri
US-MT  Montana
US-NE  Nebraska
US-NV  Nevada
US-NH  New Hampshire
US-NJ  New Jersey
US-NM  New Mexico
US-NY  New York
US-NC  North Carolina
US-ND  North Dakota
US-OH  Ohio
US-OK  Oklahoma
US-OR  Oregon
US-PA  Pennsylvania
US-RI  Rhode Island
US-SC  South Carolina
US-SD  South Dakota
US-TN  Tennessee
US-TX  Texas
US-UT  Utah
US-VT  Vermont
US-VA  Virginia
US-WA  Washington
US-WV  West Virginia
US-WI  Wisconsin
US-WY  Wyoming

Need to add a new Country?

To add a new country to the Country Map tool, follow these steps:

1. You need shapefiles containing the data for your map. You can get these files from this site: https://www.diva-gis.org/gdata

2. You need to add the ISO 3166-2 code in a column named ISO for every record in your file. This is important because it is the key used to map your data onto the geojson file.

3. You need to convert the shapefile to a geojson file. This can be done with the ogr2ogr tool: https://www.gdal.org/ogr2ogr.html

4. Put your geojson file in the folder superset/assets/src/visualizations/CountryMap/countries with the name nameofyourcountries.geojson

5. You can reduce the size of the geojson file on this site: https://mapshaper.org/

6. Open the file superset/assets/src/explore/controls.jsx

7. Add your country to the select_country component. Example:

select_country: {
    type: 'SelectControl',
    label: 'Country Name Type',
    default: 'France',
    choices: [
        'Belgium',
        'Brazil',
        'China',
        'Egypt',
        'France',
        'Germany',
        'Italy',
        'Japan',
        'Morocco',
        'Netherlands',
        'Russia',
        'Singapore',
        'Spain',
        'Uk',
        'Usa',
    ].map(s => [s, s]),
    description: 'The name of country that Superset should display',
},
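Steps 3 and 4 above can be sketched on the command line as follows (file names are illustrative):

```shell
# Convert the shapefile to GeoJSON, reprojecting to WGS84 (EPSG:4326)
ogr2ogr -f GeoJSON -t_srs EPSG:4326 mycountry.geojson mycountry.shp

# Copy the result where the Country Map visualization expects it
cp mycountry.geojson superset/assets/src/visualizations/CountryMap/countries/
```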

Videos

Note: This section of the documentation has yet to be filled in.


Importing and Exporting Datasources

The superset CLI allows you to import and export datasources from and to YAML. Datasources include both databases and druid clusters. The data is expected to be organized in the following hierarchy:

databases
| database_1
| | table_1
| | | columns
| | | | column_1
| | | | column_2
| | | | ... (more columns)
| | | metrics
| | | metric_1
| | | metric_2
| | | ... (more metrics)
| | ... (more tables)
| ... (more databases)
druid_clusters
| cluster_1
| | datasource_1
| | | columns
| | | | column_1
| | | | column_2
| | | | ... (more columns)
| | | metrics
| | | metric_1
| | | metric_2
| | | ... (more metrics)
| | ... (more datasources)
| ... (more clusters)

Exporting Datasources to YAML

You can print your current datasources to stdout by running:

superset export_datasources

To save your datasources to a file run:

superset export_datasources -f <filename>

By default, default (null) values will be omitted. Use the -d flag to include them. If you want back references to be included (e.g. a column to include the table id it belongs to), use the -b flag.

Alternatively, you can export datasources using the UI:

1. Open Sources -> Databases to export all tables associated with a single or multiple databases. (Tables for one or more tables, Druid Clusters for clusters, Druid Datasources for datasources)

2. Select the items you would like to export

3. Click Actions -> Export to YAML

4. If you want to import an item that you exported through the UI, you will need to nest it inside its parent element, e.g. a database needs to be nested under databases, and a table needs to be nested inside a database element.


Exporting the complete supported YAML schema

In order to obtain an exhaustive list of all fields you can import using the YAML import, run:

superset export_datasource_schema

Again, you can use the -b flag to include back references.

Importing Datasources from YAML

In order to import datasources from one or more YAML files, run:

superset import_datasources -p <path or filename>

If you supply a path, all files ending with *.yaml or *.yml will be parsed. You can apply additional flags, e.g.:

superset import_datasources -p <path> -r

Will search the supplied path recursively.

The sync flag -s takes parameters in order to sync the supplied elements with your file. Be careful, this can delete the contents of your meta database. Example:

superset import_datasources -p <path / filename> -s columns,metrics

This will sync all metrics and columns for all datasources found in the <path / filename> in the Superset meta database. This means columns and metrics not specified in YAML will be deleted. If you added tables to columns,metrics, those would be synchronised as well.

If you don’t supply the sync flag (-s), importing will only add and update (override) fields. E.g. you can add a verbose_name to the column ds in the table random_time_series from the example datasets by saving the following YAML to a file and then running the import_datasources command.

databases:
- database_name: main
  tables:
  - table_name: random_time_series
    columns:
    - column_name: ds
      verbose_name: datetime

3.4.8 FAQ

Can I query/join multiple tables at one time?

Not directly no. A Superset SQLAlchemy datasource can only be a single table or a view.

When working with tables, the solution would be to materialize a table that contains all the fields needed for your analysis, most likely through some scheduled batch process.

A view is a simple logical layer that abstracts an arbitrary SQL query as a virtual table. This can allow you to join and union multiple tables, and to apply some transformations using arbitrary SQL expressions. The limitation there is your database performance, as Superset effectively will run a query on top of your query (view). A good practice may be to limit yourself to joining your main large table to one or many small tables only, and to avoid using GROUP BY where possible, as Superset will do its own GROUP BY and doing the work twice might slow down performance.
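To illustrate the view approach with a self-contained sketch (table and column names here are made up, and SQLite is used only for demonstration — Superset would simply point at the view as its datasource):

```python
import sqlite3

# Self-contained sketch: a view joins two tables into one "virtual table",
# which is the shape of datasource Superset expects (one table or view).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (id INTEGER, region TEXT);
    INSERT INTO sales VALUES (1, 10, 99.5), (2, 11, 12.0);
    INSERT INTO customers VALUES (10, 'EMEA'), (11, 'APAC');
    CREATE VIEW sales_enriched AS
        SELECT s.id, s.amount, c.region
        FROM sales s JOIN customers c ON s.customer_id = c.id;
""")
# Superset would query the view like any single table:
rows = conn.execute("SELECT region, amount FROM sales_enriched ORDER BY id").fetchall()
print(rows)  # → [('EMEA', 99.5), ('APAC', 12.0)]
```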


Whether you use a table or a view, the important factor is whether your database is fast enough to serve it in an interactive fashion to provide a good user experience in Superset.

How BIG can my data source be?

It can be gigantic! As mentioned above, the main criterion is whether your database can execute queries and return results in a time frame that is acceptable to your users. Many distributed databases out there can execute queries that scan through terabytes in an interactive fashion.

How do I create my own visualization?

We are planning on making it easier to add new visualizations to the framework; in the meantime, we've tagged a few pull requests as examples of how to contribute new visualizations.

https://github.com/airbnb/superset/issues?q=label%3Aexample+is%3Aclosed

Can I upload and visualize csv data?

Yes, using the Upload a CSV button under the Sources menu item. This brings up a form that allows you to specify required information. After creating the table from CSV, it can then be loaded like any other on the Sources -> Tables page.

Why are my queries timing out?

There are many reasons why a long query may time out.

• For a long-running query in SQL Lab, by default Superset allows it to run for up to 6 hours before it is killed by celery. If you want to allow queries to run longer, you can specify the timeout in the configuration. For example:

SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6

• Superset runs on the gunicorn web server, which may time out web requests. If you want to increase the default (50 seconds), you can specify the timeout when starting the web server with the -t flag, which is expressed in seconds.

superset runserver -t 300

• If you are seeing timeouts (504 Gateway Time-out) when loading a dashboard or exploring a slice, you are probably behind a gateway or proxy server (such as Nginx). If it does not receive a timely response from the Superset server (which is processing long queries), these web servers will send a 504 status code to clients directly. Superset has a client-side timeout limit to address this issue. If a query doesn't come back within the client-side timeout (60 seconds by default), Superset will display a warning message to avoid a gateway timeout message. If you have a longer gateway timeout limit, you can change the timeout settings in superset_config.py:

SUPERSET_WEBSERVER_TIMEOUT = 60

Why is the map not visible in the mapbox visualization?

You need to register at mapbox.com, get an API key, and configure it as MAPBOX_API_KEY in superset_config.py.
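For reference, this is a single line in superset_config.py (the key value below is a placeholder, not a real token):

```python
# superset_config.py
MAPBOX_API_KEY = "pk.your-token-here"  # placeholder; use the key from your mapbox.com account
```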


How to add dynamic filters to a dashboard?

It’s easy: use the Filter Box widget, build a slice, and add it to your dashboard.

The Filter Box widget allows you to define a query to populate dropdowns that can be used for filtering. To build the list of distinct values, we run a query and sort the result by the metric you provide, sorting descending.

The widget also has a checkbox Date Filter, which enables time filtering capabilities for your dashboard. After checking the box and refreshing, you'll see a from and a to dropdown show up.

By default, the filtering will be applied to all the slices that are built on top of a datasource that shares the column name that the filter is based on. It's also a requirement for that column to be checked as "filterable" in the column tab of the table editor.

But what if you don't want certain widgets to get filtered on your dashboard? You can do that by editing your dashboard, and in the form, edit the JSON Metadata field, more specifically the filter_immune_slices key, which receives an array of sliceIds that should never be affected by any dashboard-level filtering.

{
    "filter_immune_slices": [324, 65, 92],
    "expanded_slices": {},
    "filter_immune_slice_fields": {
        "177": ["country_name", "__time_range"],
        "32": ["__time_range"]
    },
    "timed_refresh_immune_slices": [324]
}

In the JSON blob above, slices 324, 65 and 92 won't be affected by any dashboard-level filtering.

Now note the filter_immune_slice_fields key. This one allows you to be more specific and define, for a specific slice_id, which filter fields should be disregarded.

Note the use of the __time_range keyword, which is reserved for dealing with the time boundary filtering mentioned above.

But what happens with filtering when dealing with slices coming from different tables or databases? If the column name is shared, the filter will be applied; it's as simple as that.

How to limit the timed refresh on a dashboard?

By default, the dashboard timed refresh feature allows you to automatically re-query every slice on a dashboard according to a set schedule. Sometimes, however, you won't want all of the slices to be refreshed, especially if some data is slow-moving or some slices run heavy queries. To exclude specific slices from the timed refresh process, add the timed_refresh_immune_slices key to the dashboard JSON Metadata field:

{
    "filter_immune_slices": [],
    "expanded_slices": {},
    "filter_immune_slice_fields": {},
    "timed_refresh_immune_slices": [324]
}

In the example above, if a timed refresh is set for the dashboard, then every slice except 324 will be automatically re-queried on schedule.

Slice refresh will also be staggered over the specified period. You can turn off this staggering by setting stagger_refresh to false, and you can modify the stagger period by setting stagger_time to a value in milliseconds in the JSON Metadata field:


{
    "stagger_refresh": false,
    "stagger_time": 2500
}

Here, the entire dashboard will refresh at once if periodic refresh is on. The stagger time of 2.5 seconds is ignored.

Why does ‘flask fab’ or superset freeze/hang/stop responding when started (my home directory is NFS mounted)?

By default, Superset creates and uses an sqlite database at ~/.superset/superset.db. SQLite is known not to work well when used on NFS due to a broken file locking implementation on NFS.

You can override this path using the SUPERSET_HOME environment variable.

Another workaround is to change where Superset stores the sqlite database by adding SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db' in superset_config.py (create the file if needed), then adding the directory where superset_config.py lives to the PYTHONPATH environment variable (e.g. export PYTHONPATH=/opt/logs/sandbox/airbnb/).

What if the table schema changed?

Table schemas evolve, and Superset needs to reflect that. It's pretty common in the life cycle of a dashboard to want to add a new dimension or metric. To get Superset to discover your new columns, all you have to do is go to Menu -> Sources -> Tables, click the edit icon next to the table whose schema has changed, and hit Save from the Detail tab. Behind the scenes, the new columns will get merged in. Following this, you may want to re-edit the table to configure the Column tab, check the appropriate boxes, and save again.

How do I go about developing a new visualization type?

Here’s an example as a GitHub PR with comments that describe what the different sections of the code do: https://github.com/airbnb/superset/pull/3013

What database engine can I use as a backend for Superset?

To clarify, the database backend is an OLTP database used by Superset to store its internal information like your list of users, slices and dashboard definitions.

Superset is tested using MySQL, PostgreSQL and SQLite for its backend. It's recommended you install Superset on one of these database servers for production.

Using column-store, non-OLTP databases like Vertica, Redshift or Presto as a database backend simply won't work, as these databases are not designed for this type of workload. Installation on Oracle, Microsoft SQL Server, or other OLTP databases may work but isn't tested.

Please note that pretty much any database that has a SQLAlchemy integration should work perfectly fine as a datasource for Superset, just not as the OLTP backend.
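As a sketch, the backend database is selected via SQLALCHEMY_DATABASE_URI in superset_config.py (the hostnames and credentials below are placeholders):

```python
# superset_config.py — pick ONE backend URI; values are illustrative
SQLALCHEMY_DATABASE_URI = "postgresql://superset:secret@localhost:5432/superset"
# MySQL alternative:
# SQLALCHEMY_DATABASE_URI = "mysql://superset:secret@localhost:3306/superset"
```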

How can I configure OAuth authentication and authorization?

You can take a look at this Flask-AppBuilder configuration example.


How can I set a default filter on my dashboard?

Easy. Simply apply the filter and save the dashboard while the filter is active.

How do I get Superset to refresh the schema of my table?

When adding columns to a table, you can have Superset detect and merge the new columns in by using the "Refresh Metadata" action in the Source -> Tables page. Simply check the box next to the tables you want the schema refreshed for, and click Actions -> Refresh Metadata.

Is there a way to force the use of specific colors?

It is possible on a per-dashboard basis by providing a mapping of labels to colors in the JSON Metadata attribute using the label_colors key.

{
    "label_colors": {
        "Girls": "#FF69B4",
        "Boys": "#ADD8E6"
    }
}
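The JSON Metadata field must contain valid JSON for the mapping to take effect. A quick standard-library check of a mapping like the one above:

```python
import json

# The dashboard's JSON Metadata field, as a string (same mapping as above).
metadata = '{"label_colors": {"Girls": "#FF69B4", "Boys": "#ADD8E6"}}'

# json.loads raises ValueError on malformed input, which is a cheap way
# to validate the field before saving it on the dashboard.
label_colors = json.loads(metadata)["label_colors"]
print(label_colors["Girls"])  # -> #FF69B4
```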

Does Superset work with [insert database engine here]?

The community over time has curated a list of databases that work well with Superset in the Database dependencies section of the docs. Database engines not listed on this page may work too. We rely on the community to contribute to this knowledge base.

For a database engine to be supported in Superset through the SQLAlchemy connector, it requires having a Python-compliant SQLAlchemy dialect as well as a DBAPI driver defined. Databases that have limited SQL support may work as well. For instance, it’s possible to connect to Druid through the SQLAlchemy connector even though Druid does not support joins and subqueries. Another key element for a database to be supported is the Superset Database Engine Specification interface. This interface allows for defining database-specific configurations and logic that go beyond the SQLAlchemy and DBAPI scope. This includes features like:

• date-related SQL functions that allow Superset to fetch different time granularities when running time-series queries

• whether the engine supports subqueries. If false, Superset may run 2-phase queries to compensate for the limitation

• methods around processing logs and inferring the percentage of completion of a query

• technicalities as to how to handle cursors and connections if the driver is not standard DBAPI

• and more; read the code for details
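To illustrate the kind of logic a database engine specification carries, here is a self-contained sketch loosely modeled on the classes in Superset’s db_engine_specs package. The class name, engine name and grain expressions are hypothetical, not Superset’s actual API:

```python
# Illustrative sketch only; real specs subclass BaseEngineSpec in
# Superset's db_engine_specs package. Names here are hypothetical.
class ExampleEngineSpec:
    engine = "exampledb"  # hypothetical driver name

    # Map ISO 8601 time-grain durations to engine-specific date-truncation SQL.
    time_grain_expressions = {
        None: "{col}",
        "PT1M": "DATE_TRUNC('minute', {col})",
        "PT1H": "DATE_TRUNC('hour', {col})",
        "P1D": "DATE_TRUNC('day', {col})",
    }

    # If False, Superset could fall back to 2-phase queries.
    allows_subqueries = True

    @classmethod
    def get_timestamp_expr(cls, time_grain, col):
        """Return the SQL snippet that buckets `col` at `time_grain`."""
        template = cls.time_grain_expressions.get(time_grain, "{col}")
        return template.format(col=col)


print(ExampleEngineSpec.get_timestamp_expr("P1D", "order_ts"))
# -> DATE_TRUNC('day', order_ts)
```

When Superset builds a time-series query, a lookup of this shape is what turns the user’s chosen granularity into engine-specific SQL.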

Beyond the SQLAlchemy connector, it’s also possible, though much more involved, to extend Superset and write your own connector. The only example of this at the moment is the Druid connector, which is getting superseded by Druid’s growing SQL support and the recent availability of a DBAPI and SQLAlchemy driver. If the database you are considering integrating has any kind of SQL support, it’s probably preferable to go the SQLAlchemy route. Note that for a native connector to be possible, the database needs to have support for running OLAP-type queries and should be able to do things that are typical in basic SQL:

• aggregate data

• apply filters (==, !=, >, <, >=, <=, IN, . . . )


• apply HAVING-type filters

• be schema-aware, expose columns and types

3.5 Indices and tables

• genindex

• modindex

• search
