Data Center Connector

The Data Center connector allows querying a remote openLooKeng data center. It can be used to join data across different openLooKeng clusters from the local openLooKeng environment.

Local DC Connector Configuration

To configure the Data Center connector, create a catalog properties file in etc/catalog named, for example, <dc-name>.properties, to mount the Data Center connector as the <dc-name> catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

Basic Configuration

connector.name=dc
connection-url=http://example.net:8080
connection-user=<The User Name of Remote openLooKeng>
connection-password=<The Password of remote openLooKeng>
| Property Name | Description | Required | Default Value |
|---------------|-------------|----------|---------------|
| connection-url | URL of the openLooKeng cluster to connect to | Yes | |
| connection-user | User name to use when connecting to the openLooKeng cluster | No | |
| connection-password | Password to use when connecting to the openLooKeng cluster | No | |

Security Configuration

When the remote openLooKeng cluster has security authentication or TLS/SSL enabled, the corresponding security configuration must be added to <dc-name>.properties.

Kerberos Authentication Mode

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.kerberos.config.path | Kerberos configuration file | |
| dc.kerberos.credential.cachepath | Kerberos credential cache | |
| dc.kerberos.keytab.path | Kerberos keytab file | |
| dc.kerberos.principal | The principal to use when authenticating to the openLooKeng coordinator | |
| dc.kerberos.remote.service.name | openLooKeng coordinator Kerberos service name. This parameter is required for Kerberos authentication | |
| dc.kerberos.service.principal.pattern | openLooKeng coordinator Kerberos service principal pattern. ${SERVICE} is replaced with the value of dc.kerberos.remote.service.name, and ${HOST} is replaced with the host name of the coordinator (after canonicalization if enabled) | ${SERVICE}@${HOST} |
| dc.kerberos.use.canonical.hostname | Use the canonical host name of the openLooKeng coordinator for the Kerberos service principal, by first resolving the host name to an IP address and then doing a reverse DNS lookup for that IP address | false |
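
For example, a <dc-name>.properties fragment for a Kerberos-secured remote cluster might look like the following sketch; the URL, paths, service name, and principal are placeholders, not prescribed values:

connector.name=dc
connection-url=https://remote-coordinator.example.com:8443
dc.kerberos.remote.service.name=HTTP
dc.kerberos.config.path=/etc/krb5.conf
dc.kerberos.keytab.path=/etc/security/openlookeng.keytab
dc.kerberos.principal=openlookeng@EXAMPLE.COM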

Token Authentication Mode

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.accesstoken | Access token for token based authentication | |
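
For example, in <dc-name>.properties (the token value is a placeholder):

dc.accesstoken=<access-token>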

External Certificate Authentication Mode

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.extra.credentials | Extra credentials for connecting to external services, given as a list of key-value pairs. Example: foo:bar;abc:xyz creates the credentials abc=xyz and foo=bar | |
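
For example, to create the two credentials from the description above:

dc.extra.credentials=foo:bar;abc:xyz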

SSL/TLS Configuration

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.ssl | Use HTTPS for connections | false |
| dc.ssl.keystore.password | The keystore password | |
| dc.ssl.keystore.path | The location of the Java keystore file that contains the certificate and private key to use for authentication | |
| dc.ssl.truststore.password | The truststore password | |
| dc.ssl.truststore.path | The location of the Java truststore file that will be used to validate HTTPS server certificates | |
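
For example, to validate the remote coordinator's certificate against a local truststore; the path and password are placeholders:

dc.ssl=true
dc.ssl.truststore.path=/etc/openlookeng/truststore.jks
dc.ssl.truststore.password=<truststore-password>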

Proxy Configuration

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.socksproxy | SOCKS proxy host and port. Example: localhost:1080 | |
| dc.httpproxy | HTTP proxy host and port. Example: localhost:8888 | |
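
For example, to route connections through a local SOCKS proxy (host and port are illustrative):

dc.socksproxy=localhost:1080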

Performance Optimization Configuration

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.metadata.cache.enabled | Whether metadata caching is enabled | true |
| dc.metadata.cache.maximum.size | Maximum size of the metadata cache | 10000 |
| dc.metadata.cache.ttl | TTL of the metadata cache | 1.00s |
| dc.query.pushdown.enabled | Enable sub-query push down to this data center. If this property is not set, sub-queries are pushed down by default | true |
| dc.query.pushdown.module | FULL_PUSHDOWN: push down the entire query. BASE_PUSHDOWN: partial push down, meaning that filter, aggregation, limit, topN and project can be pushed down | FULL_PUSHDOWN |
| dc.http-compression | Whether to use zstd to compress the response body | false |
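
For example, to lengthen the metadata cache TTL and restrict push down to the partial mode (values are illustrative):

dc.metadata.cache.ttl=10.00s
dc.query.pushdown.enabled=true
dc.query.pushdown.module=BASE_PUSHDOWN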

Other Properties

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| dc.http-request-connectTimeout | HTTP request connect timeout | 30.00s |
| dc.http-request-readTimeout | HTTP request read timeout | 30.00s |
| dc.httpclient.maximum.idle.connections | Maximum idle connections to be kept open in the HTTP client | 20 |
| dc.http-client-timeout | Time for which the client keeps retrying to fetch the data | 10.00m |
| dc.max.anticipated.delay | Maximum anticipated delay between two requests for a query in the cluster. If the remote openLooKeng does not receive a request for more than this delay, it may cancel the query | 10.00m |
| dc.application.name.prefix | Prefix appended to any specified ApplicationName client info property, which is used to set the source name for the openLooKeng query. If neither this property nor ApplicationName is set, the source for the query will be hetu-dc | hetu-dc |
| dc.remote-http-server.max-request-header-size | This property should be equivalent to the value of http-server.max-request-header-size on the remote server | |
| dc.remote.cluster.id | A unique ID for the remote cluster | |
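
For example, to raise the HTTP timeouts and assign the remote cluster an identifier (values are illustrative):

dc.http-request-connectTimeout=60.00s
dc.http-request-readTimeout=60.00s
dc.remote.cluster.id=dc-east-1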

Remote openLooKeng Configuration

openLooKeng Configuration

You can set the following properties in etc/config.properties of the remote cluster:

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| hetu.data.center.split.count | Maximum number of splits allowed per query | 5 |
| hetu.data.center.consumer.timeout | Maximum time that query results wait to be consumed after the query produces them | 10min |
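
For example, in etc/config.properties on the remote cluster (values are illustrative):

hetu.data.center.split.count=8
hetu.data.center.consumer.timeout=10min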

Nginx Configuration

When HA is enabled on the remote cluster and Nginx is used as the proxy, the Nginx configuration needs to be modified as follows:

http {
    upstream for_aa {
        ip_hash;
        server 192.168.0.101:8090;   #coordinator-1;
        server 192.168.0.102:8090;   #coordinator-2;
        check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    }

    upstream for_cross_region {
        hash $hashKey consistent;
        server 192.168.0.101:8090;   #coordinator-1;
        server 192.168.0.102:8090;   #coordinator-2;
        check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    }
    
    server {
        listen nginx_ip:8888; # nginx port
        
        location / {
            proxy_pass http://for_aa;
            proxy_redirect off;
            proxy_set_header Host $host:$server_port;
        }
        
        # Regex locations: the more specific statement path must come first,
        # otherwise /v1/dc/(.*)/(.*) would also match statement URLs.
        location ~ ^/v1/dc/statement/(.*)/(.*)/(.*) {
            set $hashKey $3;
            proxy_redirect off;
            proxy_pass http://for_cross_region;
            proxy_set_header Host $host:$server_port;
        }

        location ~ ^/v1/dc/(.*)/(.*) {
            set $hashKey $2;
            proxy_redirect off;
            proxy_pass http://for_cross_region;
            proxy_set_header Host $host:$server_port;
        }
    }
}

Multiple openLooKeng Clusters

You can have as many catalogs as you need, so if you have additional data centers, simply add another properties file to etc/catalog with a different name (making sure it ends in .properties). For example, if you name the property file sales.properties, openLooKeng will create a catalog named sales using the configured connector.
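
For example, a sales.properties file pointing at a second data center might contain the following; the URL is a placeholder:

connector.name=dc
connection-url=http://sales.example.net:8080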

Global Dynamic Filter

When global dynamic filtering is enabled and a cross-data-center openLooKeng query is executed, the filter is generated locally and sent to the remote openLooKeng to filter data there, reducing the amount of data pulled from the remote openLooKeng. The state store must be enabled in the openLooKeng environment (please refer to the state store configuration document for the relevant configuration). There are two ways to enable global dynamic filtering:

Method 1: Set the following properties in etc/config.properties:

| Property Name | Description | Default Value |
|---------------|-------------|---------------|
| enable-dynamic-filtering | Whether the dynamic filtering feature is enabled | false |
| dynamic-filtering-max-per-driver-row-count | If the number of rows processed per driver exceeds this value, dynamic filtering for the query is automatically cancelled | 100 |
| dynamic-filtering-max-per-driver-size | If the amount of data processed per driver exceeds this value, dynamic filtering for the query is automatically cancelled | 10KB |
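
For example, the following etc/config.properties entries enable dynamic filtering with larger per-driver limits (values are illustrative):

enable-dynamic-filtering=true
dynamic-filtering-max-per-driver-row-count=10000
dynamic-filtering-max-per-driver-size=1MB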

Method 2: Set session properties

  • By openLooKeng CLI:

    java -jar hetu-cli-*-execute.jar --server ip:port --session enable-dynamic-filter=true --session dynamic-filtering-max-per-driver-row-count=10000 --session dynamic-filtering-max-per-driver-size=1MB
    
  • By openLooKeng JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    Properties properties = new Properties();
    properties.setProperty("enable-dynamic-filter", "true");
    properties.setProperty("dynamic-filtering-max-per-driver-row-count", "10000");
    properties.setProperty("dynamic-filtering-max-per-driver-size", "1MB");

    String url = "jdbc:lk://127.0.0.1:8090/hive/default";
    Connection connection = DriverManager.getConnection(url, properties);
    

Querying Remote Data Center

For every catalog in the remote data center, the Data Center connector exposes a local catalog prefixed with the properties file name. Treat each prefixed remote catalog as a separate catalog in the local cluster. You can see the available remote catalogs by running SHOW CATALOGS:

SHOW CATALOGS;

If you have a catalog named mysql in the remote data center, you can view the schemas in this remote catalog by running SHOW SCHEMAS:

SHOW SCHEMAS FROM dc.mysql;

If you have a schema named web in the remote catalog mysql, you can view the tables in that schema by running SHOW TABLES:

SHOW TABLES FROM dc.mysql.web;

You can see a list of the columns in the clicks table in the web schema using either of the following:

DESCRIBE dc.mysql.web.clicks;
SHOW COLUMNS FROM dc.mysql.web.clicks;

Finally, you can access the clicks table in the web schema:

SELECT * FROM dc.mysql.web.clicks;

If you used a different name for your catalog properties file, use that catalog name instead of dc in the above examples.
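
Because each remote catalog behaves like a local catalog, remote tables can also be joined with local data. The following sketch assumes a local hive catalog containing an orders table; both names are hypothetical:

SELECT c.user_id, o.total
FROM dc.mysql.web.clicks c
JOIN hive.default.orders o ON c.user_id = o.user_id;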

Data Center Connector Limitations

The Data Center connector is read-only. The following SQL statements are not yet supported:

ALTER SCHEMA, ALTER TABLE, ANALYZE, CACHE TABLE, COMMENT, CREATE SCHEMA, CREATE TABLE, CREATE TABLE AS, CREATE VIEW, DELETE, DROP CACHE, DROP SCHEMA, DROP TABLE, DROP VIEW, GRANT, INSERT, INSERT OVERWRITE, REVOKE, SHOW CACHE, SHOW CREATE VIEW, SHOW GRANTS, SHOW ROLES, SHOW ROLE GRANTS, UPDATE, VACUUM