Metadata

Metadata are created and stored in GDBase - a relational database, part of Algolytics analytical platform.

Tables with metadata are created with following SQL code:

CREATE TABLE clients(id INTEGER PRIMARY KEY /*, ...*/)
CREATE TABLE external_data(id INTEGER PRIMARY KEY, client_id INTEGER /*, ...*/)
CREATE TABLE variables(id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, event_id INTEGER NOT NULL, definition TEXT NOT NULL, type INTEGER NOT NULL CHECK(type = 1 OR type = 2), default_value TEXT)
CREATE INDEX ON variables(client_id)
CREATE TABLE aggregates(id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, variable_id INTEGER, aggregate_type INTEGER, window_type INTEGER, window_size/*count|time*/ INTEGER, window_shift/*count|time*/ INTEGER, definition TEXT, return_type INTEGER CHECK((definition IS NULL AND return_type IS NULL) OR (definition IS NOT NULL AND return_type IS NOT NULL AND (return_type = 1 OR return_type = 2))), dictionary_id INTEGER, external_data_id INTEGER, external_data_name TEXT, name TEXT NOT NULL CHECK(name REGEXP '[a-zA-Z_$][a-zA-Z_$0-9]*'))
CREATE UNIQUE INDEX ON aggregates(client_id, name)
CREATE TABLE derived_aggregates(derived_aggregate_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL)
CREATE INDEX ON derived_aggregates(derived_aggregate_id)
CREATE TABLE trigger_aggregates(trigger_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL)
CREATE INDEX ON trigger_aggregates(trigger_id, aggregate_id)
CREATE TABLE triggers(id INTEGER PRIMARY KEY, definition TEXT NOT NULL, needs_change INTEGER/*boolean*/ NOT NULL CHECK(needs_change = 0 OR needs_change = 1), `group` INTEGER NOT NULL)
CREATE TABLE model_triggers(model_id INTEGER NOT NULL, trigger_id INTEGER NOT NULL)
CREATE INDEX ON model_triggers(model_id, trigger_id)
CREATE TABLE model_aggregates(model_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL, used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR used = 1))
CREATE INDEX ON model_aggregates(model_id, aggregate_id)
CREATE TABLE models(id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR used = 1), target_aggregate_id INTEGER NOT NULL, target_start INTEGER NOT NULL CHECK(target_start >= 0), target_length INTEGER NOT NULL CHECK(target_length >= 0), target_length_exact INTEGER/*boolean*/ NOT NULL CHECK(target_length_exact = 0 OR target_length_exact = 1), scoring_code TEXT CHECK(NOT used OR scoring_code IS NOT NULL), saved_state_time INTEGER CHECK(NOT used OR saved_state_time IS NOT NULL))
CREATE INDEX ON models(client_id)
CREATE TABLE dictionary(id INTEGER, categorical INTEGER/*boolean*/ NOT NULL CHECK(categorical = 0 OR categorical = 1), value TEXT CHECK(categorical OR value IS NULL), start FLOAT CHECK(NOT categorical OR start IS NULL), start_inclusive FLOAT CHECK((start IS NULL AND start_inclusive IS NULL) OR (start IS NOT NULL AND start_inclusive IS NOT NULL AND (start_inclusive = 0 OR start_inclusive = 1))), end FLOAT CHECK((NOT categorical OR end IS NULL) AND (start IS NULL OR end IS NULL OR end >= start)), end_inclusive FLOAT CHECK((end IS NULL AND end_inclusive IS NULL) OR (end IS NOT NULL AND end_inclusive IS NOT NULL AND (end_inclusive = 0 OR end_inclusive = 1))), mapped_value TEXT NOT NULL)
CREATE INDEX ON dictionary(id)

Tables structure

  • clients
    • id - id, a primary key
    • authorization data
    • other data about a client
  • external data
    • id - id, a primary key
    • client_id
    • data pointing at external source of data
  • variables (stores definitions of variables obtained from JSON)
    • id INTEGER PRIMARY KEY
    • event_id INTEGER- corresponds with the eventId field in JSON
    • definition_type - a type of definition: JSON_PATH or TRANSFORMED
    • definition TEXT - definition of changing event into variable (JSONPath)
      • if definition_type = JSON_PATH then definition contains an expression JsonPath. It should returns numerical or text value. Syntax of expressions is described in JsonPath_README.md. For efficiency reasons it is the best to use $.x.y.z expressions types. Moreover, optimalization for [?(@.a == 'x' && @.b == 3 && @.c == @.d && ...)] filters is added.
      • if definition_type = TRANSFORMED then definition contains full definition of java class, which must inherit from DoubleTransformation or StringTransformation (without the import of this class). In defined class should be no specified package.
    • input_id - for TRANSFORMED variable indicates id of transferred to the class, definied in definition_type input variable. Id has to be JSON_PATH variable.
    • category - if not null, definition should return a list, if this list contains category, variable will be created.
    • type (numerical, categorical) - type of the variable definied in DBConstants (VARIABLE_...).
    • default_value - default value of the variable, used if the definition returns null value
  • aggregates
    • id - id, a primary key
    • variable_id - indicates a variables table. If the aggregate is not created from a variables table, the value is null
    • aggregate_type - type of the aggregate defined in DBConstants (AGGREGATE_...) (see Aggregate types section)
    • window_type - type of the window defined in DBConstants (WINDOW_...) (see Window types section). Each variable used by target must have window type set to WINDOW_TARGET
    • window_size (count/time) - window size, defined as number of occurrences or time in miliseconds
    • window_shift (count/time) - window shift (for windows with shift), defined as number of occurrences or time in miliseconds
    • definition - a Java expression defining aggregate as a derivative of other aggregates. Every aggregate used in the definition must be listed in derived_aggregates table. If the aggregate is not a derivative aggregate, the value is null
    • return_type - type of aggregate defined by definition column (DBConstants.VARIABLE_...)
    • dictionary_id - a dictionary from dictionary table. If null then the aggregate do not use a dictionary
    • external_data_id - id of external data from external data table
    • external_data_name - name of variable in external data
    • name - variable name, used in Java expressions, unique for client_id
  • derived_aggregates
    • derived_aggregate_id - id of derived aggregate from aggregates table
    • aggregate_id - ids of aggregates used in creating the derived aggregate (may be multiple ids for given derived_aggregate_id)
  • triggers (definition of scoring moment, i.e. definition of a moment of creating a training line with target)
    • id - id, a primary key
    • definition - definition of given trigger variable, which is an expression in Java using input variables to calculate value of trigger variable. The expression returns true or false, it may be also a definition of user's affiliance to a model. Input variables from the Java expression must be listed in the trigger_aggregates table.
    • needs_change - scoring occurs only, if during the last launch of trigger the definition returned false
    • group - a group to which the trigger belongs. If the model defines triggers from different groups, and each group contain multiple triggers, scoring occurs only, if at least 1 trigger in every group returns, that scoring must occur
  • models
    • id INTEGER PRIMARY KEY
    • active - if false, model is inactive, cannot be used for modeling or scoring. If there is new model added, but it is without code yet, active shuold have true value.
    • used - if false, the model is not used for scoring (it does not affect counting tables for moedling).
    • target_aggregate_id - indicates target, used while building a model
    • target_type:
      • if has GENERAL value, target_start, target\length, target_length_exact (described below) will be included
      • if has CLICK value:
        • trigger_validator (described below) will be included
        • at the offline start, giving negative and positive target value will be needed
        • row of last positive trigger before positive target or last target in data if there is no positive target value is recorded to a table
        • every annoucement after trigger is included in counting aggregates of target and validator. For every new trigger (if there was no positive target before) aggregates are counted again (aggregates counted before are forgotten)
      • if has CLICK_AD value, usage is special for clickad, inter alia:
        • at the offline start, giving negative and positive target value is needed.
        • every annoucement, that responds to the display of advertisement to a user (for each user many rows can be shown) is recorded to a table.
        • fitting target and trigger is based on specific fields in JSON. For trigger, despite userId is null, target will be found basing on different fields.
    • target_start - time in miliseconds. The target window starts after this time.
    • target_length - length of the target window in miliseconds.
    • target_length_exact - negative value means, that target window is counted till the end of the data (cannot be smaller than target_length).
    • trigger_validator - expression in Java, which, after trigger, has to return true. Thereby row can be recorded to a learning table. If it has null value, there is no additional restriction on recording rows. validator can use aggregates with any type of window (WINDOW_TARGET and WINDOW_GLOBAL are treated the same).
    • saved_state_time - to what time the state of users was calculated after building a model. Value is set by application.
    • value - value connected with model, for example CPC (cost per click) in RTB case. If in response for request there is score returned (not value), then value should be 1.
    • client_value - value connected with model, for example CPC of a client in RTB case, if not needed then null.
    • positive_target_cnt - in RTB case: ordered tergets' quantity, which should be generated in a campaign(ex. clicks); if not needed, then null.
    • weight - weight; in RTB case bidding price is value\*scroe\_from\_model\*weight
    • positive_target_ratio - ratio of number of rows with positive target to the total number of rowsin table used for modeling. Until it is not counted, set value to 0.
    • category_id - campaign category
    • use_category - 1 - model building per campaign category; 0 - model building per campaign.
    • end_date - time in miliseconds to the end of campaign (in RTB case: time to the end of which ordered number of positive targets (positive_target_cnt) should be made.
  • model_aggregates (defining models' arguments, every model can use various aggregates, and every aggregate can be used in various models)
    • model_id - id of the model from models table
    • aggregate_id - id of the variable from aggregates table (may by many for given model_id)
    • used - if positive, the variable is passed to scoring code, if negative, the variable is used in model building
  • trigger_aggregates (defining triggers' arguments, every trigger can use various aggregates, and every aggregate can be used in various triggers)
    • trigger_id - id of the trigger from trigger table
    • aggregate_id - id of the variable from aggregates table (may by many for given trigger_id)
  • model_triggers (defining models' triggers, every model can use various triggers, and every trigger can be used in various models)
    • model_id - id of the model from models table
    • trigger_id - id of the trigger from triggers table (may by many for given model_id)
  • dictionary (a dictionary mapping values from JSON to variables)
    • id - id of group or interval of values
    • categorical - if true, then value variable from this table is used as an input. If false, then start, start_inclusive, end, end_inclusive are used as an input
    • value - a specific value from JSON
    • start - start of an interval
    • start_inclusive - defines, whether the start of the interval is open or closed
    • end - end of an interval
    • end_inclusive - defines, whether the end of the interval is open or closed
    • mapped_value - value passed to an aggregate
  • default_model (default parameters used when adding new models from api level automatically)
    • client_id - id of a client
    • target_definition - definition of derivative aggregate being target (at uploading and creating new model string ${model_id} will be modified to model_id of new model).
    • target_aggregates - aggregates needed to count the target (names of aggregates separated by comas, ex.:'agg1,agg2,agg3').
    • trigger_definition - definition of trigger (at uploading and creating new model string ${model_id} will be modified to model_id of new model).
    • trigger_aggregates - aggregates needed to count the trigger (names of aggregates separated by comas, ex.:'agg1,agg2,agg3').
    • trigger_validator_definition - definition of trigger validator (at uploading and creating new model string ${model_id} will be modified to model_id of new model).
    • trigger_validator_aggregates - aggregates needed to count the trigger validator (names of aggregates separated by comas, ex.:'agg1,agg2,agg3').
    • target_type - corresponds to target_type variable in models table.
    • target_start - corresponds to target_start variable in models table.
    • target_length - corresponds to target_length variable in models table.
    • target_length_exact - corresponds to target_length_exact variable in models table.
    • target_aggregate_name - name of the aggregate being a target (at uploading and creating new model string ${model_id} will be modified to model_id of new model).
    • weight - corresponds to weight variable in models table.
  • model_urls (url adresses that should be included/excluded when building a model)
    • model_id - id of a model
    • client_id - id of a client
    • url - url adress of www website (domena)
    • included - if 1, website should be included in modeling, if 0 website should be excluded

Window types

public final static int WINDOW_TARGET       = 0;
public final static int WINDOW_GLOBAL       = 1;
public final static int WINDOW_TIME         = 2;
public final static int WINDOW_TIME_SLIDE   = 3;
public final static int WINDOW_COUNT        = 4;
public final static int WINDOW_COUNT_SLIDE  = 5;

Aggregate types

public final static int AGGREGATE_COUNT     = 1;
public final static int AGGREGATE_SUM       = 2;
public final static int AGGREGATE_LASTVALUE = 3;
public final static int AGGREGATE_EXISTS    = 4;
public final static int AGGREGATE_MIN       = 5;
public final static int AGGREGATE_MAX       = 6;
public final static int AGGREGATE_CURRENT_VALUE     = 7;