Skip to main content
Introduction to Data Modeller

How to manage the data behind your Kubit environment

Updated this week

The Data Modeller intends to support self-service data model changes in your Kubit environment. Want to add handling for a new column in your fact table? Create a derived property, add a new dimension table, or even a whole new schema? This article will provide an introduction to the Data Modeller and will help you understand the main concepts behind it and what you can do in it.

Prerequisites

  • Access to the Data Modeller feature is exclusively available to Direct Connect customers. Customers utilizing Data Share and Managed Service have read-only access and are unable to modify or create data models within the platform.

  • You have to be a Kubit Admin to access Data Modeller.

Getting Started

You can find the Data Modeller icon in the navigation sidebar on the left of your screen:

Once you click on it you will see:

  • a New Schema button you can use to create a new schema

  • a list of existing Kubit Schemas on the left

  • the default Schema configuration which you can observe or edit by navigating through the tabs:

    • Configuration

    • Connection

    • Tables

    • Fields

    • View Model

Configuration

In this view you can see:

General schema settings:

  • the Schema Name (editable)

  • what Timezone is being used for this Schema

  • when doing reports with a Time Unit: Weekly reports - what day of the week is the Week Start set to

This is where you will see all of these parameters in action when you are building reports:

Subjects

  • a read-only list of all Subjects configured for this schema

Sampling

Sampling can be used to speed up reports and to save costs, especially when dealing with very large datasets. To understand more about Sampling in Kubit you can read this detailed article. What you can control from here is the default sampling rates for:

  • Default: applies to all Query/Funnel/Path/Retention reports

  • Cohort: applies to all Cohort Filters and Segments

Additional Funnel and Path Partition Fields

The default Partition options in Kubit are (more about them - here):

  • None

  • Day

  • Conversion Window

In a nutshell, the Partition options control the time window within a conversion that happened within a Funnel or a Path. But in some cases, you might be interested in using a non-temporal field for the partitioning such as Session. This then allows you to configure the Funnel report to measure only the conversions that happened within a single session.

Status

The Enabled toggle controls whether a Schema is visible for the Kubit users who build Reports or not. Once disabled, users can no longer select this schema from any Schema dropdown.

Set the status to Disabled when you're working on a new Schema and don't want your teammates to start using it just yet.

Connection

In order to consume data from your data warehouse you need to set up the connection to it. Kubit relies on a JDBC connection to execute Reports. The first step is to choose the warehouse type from a list of supported warehouses:

  • BigQuery

  • ClickHouse

  • Databricks

  • MySQL

  • Presto

  • Redshift

  • Snowflake

Then, you have to fill out the rest of the parameters which are as follows (with the exception of BigQuery):

  • Username: which should be used to login to your warehouse (we recommend creating a dedicated user for Kubit)

  • Password: for the username (we recommend a length of 60 characters, a randomly generated mix of uppercase, lowercase, numbers, and special characters)

  • Timeout: on the JDBC connection, make sure to check what the timeout configured on your warehouse and set it to the same value or less

  • JDBC URL: to establish the JDBC connection

  • End Date Type: what should the date picker default to as the end date - today, yesterday, a static date, or something else? The recommendation is to keep the default (CURRENT).


Tables

There are 3 things you can do in this view:

  • add more facts or tables

  • edit table (mapping)

  • create Derived Columns

Add Table

To add a table you start by clicking on the Add Table button:

Then you see what we call the Schema Browser where you can navigate between schemas in the database configured in the Connection tab. In each schema you will see a full list of:

  • tables

  • views

  • materialized views

Once you find the table you want to add you select it, choose whether it's a fact or a dimension table and then hit Confirm:

When you add a new table you need to set some required fields before you can save the changes to your Schema. You will see a table view of all the columns and their types which you can map to Kubit Fields:

and below that you will find the required fields:

  • Event: this is the event name

  • Timestamp: the event timestamp is used for finding the sequence of events in the Funnel or Path

  • Date: the event date is used for all the date ranges in Kubit

  • Subjects: you need to add at least one Subject per fact table

In case you don't have an event date column in your table you can create a Derived Column and use it!

Edit Table (Mapping)

Which columns from a table should be used in Kubit? Here's a short demo of how you select the required fields and map a couple of columns:

A few things to call out:

  • once you set the Required Fields for a Table you cannot change them

  • you can apply type overrides only when you are adding a new table (e.g. a numeric value is stored as a string but you want to do sums with it in Kubit)

In this view, you can also set the Field name a column should map to and its description.

Create Derived Column

Derived Columns come to the rescue when you don't have all of the required columns already in place or you want to slightly modify the values in a column. A Derived Column, in a nutshell, allows you to create a new logical column based on other columns that already exist in the table. To start, click on Add Derived Column:

Then you have 2 ways to define the column:

  1. Raw SQL: most common use cases are when you need to cast a type, concatenate strings, or pick a property from an object (e.g. JSON or Struct)

  2. Mapped String: useful when you need to map an enum, sometimes the database would utilize a numeric type to store some data and you need the human-readable names in your analytics tool - this is when you will use Mapped String

The SQL option supports only single-row operations. If you'd like to use Window Functions, please check out Derived Fields.

Here are a couple of examples, one is using SQL to create a date column from a timestamp column:

And the other is mapping integers to human-readable names:

Fields

This is where you can:

  • Enable Fields for Filter and Breakdown

  • Disable a Field (make it unavailable for selection in any new Reports)

  • Change the Field Name and Description

  • Create Derived Fields

Enable Fields for Filter and Breakdown

This is as simple as it gets if you want to tweak whether a Field is eligible for Filter or Breakdown or not all you have to do is (un)check a checkbox and hit Confirm.

Disable a Field

In some cases, you might want to Disable a Field. Maybe you're no longer populating a column with data or you don't trust it - whatever the reason, it's again just a click on top of a checkbox and then Confirm:

Change Field Name and Description

Maybe a Field Name is a bit confusing or ambiguous - nothing to worry about, you can always change it or add a description to make it easier to understand. Here's how:

Derived Fields

There are 2 main scenarios in which you should use Derived Field instead of a Derived Column:

  1. you want to apply SQL Window Functions, e.g. to find out the duration of a session by calculating the time difference between the first and last event in a session

  2. you have multiple fact tables which contain the same Field so instead of defining Derived Column in each table, you can define it as a Derived Field only once

To go back to our session duration example, let's check the Window Functions checkbox and input this sample SQL:

timestampdiff(second, min(@event_timestamp) over (partition by @session_id), max(@event_timestamp) over (partition by @session_id))

Remember to use @ when referencing a column!

When Window Functions aren't being used the behaviour is equivalent to Derived Column, it just may be applied over more than 1 column in case you have multiple fact tables in a union.

View Model

This is where you can see all the tables and how they are related to each other and which columns are mapped to which Fields.

Did this answer your question?