TIBCO Scribe® Online Connector For Apache HBase
The TIBCO Scribe® Online Connector for Apache HBase is based on the CData driver for Apache HBase. This Connector allows you to integrate Apache HBase with CRM, accounting, eCommerce, and marketing systems. If you use Apache HBase to store data for an application you have developed, you can integrate that application in TIBCO Scribe® Online without building a custom Connector. The Connector is built on the Scribe.Connector.AdoNet library and the CData ApacheHBase ADO.NET provider.
Possible use cases include:
- Integrate with any application across your business that uses Apache HBase as a back end
- Move any Apache HBase data to other systems
Connector Specifications
Supported | |
---|---|
Agent Types | |
On Premise | X |
Cloud | X |
Replication Services | |
Source | |
Target | |
Integration Services | |
Source | X |
Target | X |
Migration Services | |
Source | |
Target | |
Maps | |
Integration | X |
Request-Reply | X |
Message | |
This Connector is available from the TIBCO Scribe® Online Marketplace. See Marketplace TIBCO Scribe® Certified Connectors for more information.
Supported Entities
Apache HBase database tables and views are exposed as entities.
Special Operations
Supports the Execute Block to execute stored procedures, where each stored procedure is represented as an entity and each input parameter is represented as a field within that entity. See Execute Block and the CData Apache HBase documentation for additional information.
Setup Considerations
Requires Apache HBase 2017 or higher.
Selecting An Agent Type For Apache HBase
Refer to TIBCO Scribe® Online Agents for information on available Agent types and how to select the best Agent for your Solution.
Connecting To Apache HBase
Note: Best practice is to create Connections with credentials that limit permissions in the target system, following the principle of least privilege. Using Administrator level credentials in a Connection provides Administrator level access to the target system for TIBCO Scribe® Online users. Depending on the entities supported, a TIBCO Scribe® Online user could alter user accounts in the target system.
- Select More > Connections from the menu.
- From the Connections page, select Add to open the Add a New Connection dialog.
- Select the Connector from the drop-down list in the Connection Type field, and then enter the following information for this Connection:
- Name — This can be any meaningful name, up to 25 characters.
- Alias — An alias for this Connection name. The alias is generated from the Connection name, and can be up to 25 characters. The Connection alias can include letters, numbers, and underscores. Spaces and special characters are not accepted. You can change the alias. For more information, see Connection Alias.
- Server — Enter the IPv4 address or URL of the Apache HBase server instance.
- Port — Optional. Enter the port number for the Apache HBase server. If not set, the default value of 5432 is used. Must be in the 1023 to 65535 range.
- AuthScheme — Enter one of the following authentication methods:
- NONE — No authentication.
- BASIC — Basic authentication.
- NEGOTIATE — Use Kerberos authentication.
See the Auth Scheme section in the CData documentation for Apache HBase for more information.
- User — Name of the database user with access to connect to Apache HBase.
- Password — Password for the database user. Not required if AuthScheme is set to NONE.
- Additional Parameters — Optional field where you can specify one or more connection string parameters. See the Connection String Options section of the CData documentation for a list of parameters that can be used and their default values. Note that in some cases the CData Apache HBase ADO.NET Provider does not fully support all of the possible parameters.
Syntax for the Additional Parameters field is as follows:
- All blank characters, except those within a value or within quotation marks, are ignored
- Preceding and trailing spaces are ignored unless enclosed in single or double quotes, such as Keyword=" value"
- Semicolons (;) within a value must be delimited by quotation marks
- Use a single quote (') if the value begins with a double quote (")
- Use a double quote (") if the value begins with a single quote (')
- Parameters are case-insensitive
- If a KEYWORD=VALUE pair occurs more than once in the connection string, the value associated with the last occurrence is used
- If a keyword contains an equal sign (=), it must be preceded by an additional equal sign to indicate that the equal sign is part of the keyword
- Parameters that are handled by other fields or default settings in the Connection dialog are ignored if used in the Additional Parameters field, including:
- Server
- Port
- AuthScheme
- User
- Password
- Logfile — This parameter is not visible in the Connection dialog, but is set by the Connector. The maximum log file size is 10MB. Any CData log files generated by this setting are stored in the default TIBCO Scribe® Online Agent Logs directory, C:\Program Files (x86)\Scribe Software\TIBCO Scribe® Online Agent\logs\. The format for CData log file names is as follows: <ConnectorName><GUID of the Connection><DateTimeStamp>.log
Note: For information on setting log file verbosity, see Verbosity in the CData Help.
- MaxLogFileSize — This parameter is set by the Connector to a maximum of 10MB.
- Other
- RTK
- Select Test to ensure that the Agent can connect to your database. Be sure to test the Connection against all Agents that use this Connection. See Testing Connections.
- Select OK/Save to save the Connection.
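The Additional Parameters syntax rules listed for the Connection dialog can be approximated with a small parser. The following Python sketch is illustrative only: it is not the CData provider's parser, the function names are hypothetical, it lowercases keywords to model case-insensitivity, and it does not handle escaped quotes inside a quoted value. The ignored-keyword set simply mirrors the list of parameters that other fields or default settings already handle.

```python
# Illustrative sketch only: approximates the Additional Parameters syntax
# rules. It is NOT the CData provider's actual parser.

# Keywords handled elsewhere in the Connection dialog (mirrors the list of
# ignored parameters); they are dropped if supplied in Additional Parameters.
IGNORED_KEYWORDS = {
    "server", "port", "authscheme", "user", "password",
    "logfile", "maxlogfilesize", "other", "rtk",
}


def parse_additional_parameters(text: str) -> dict:
    """Parse semicolon-separated KEYWORD=VALUE pairs."""
    params = {}
    i, n = 0, len(text)
    while i < n:
        # Skip blanks and separators between pairs.
        while i < n and (text[i].isspace() or text[i] == ";"):
            i += 1
        if i >= n:
            break
        # Read the keyword; "==" stands for a literal "=" inside the keyword.
        key_chars = []
        while i < n and text[i] != ";":
            if text[i] == "=":
                if i + 1 < n and text[i + 1] == "=":
                    key_chars.append("=")
                    i += 2
                    continue
                break
            key_chars.append(text[i])
            i += 1
        key = "".join(key_chars).strip().lower()  # keywords are case-insensitive
        if i < n and text[i] == "=":
            i += 1
        while i < n and text[i].isspace():  # preceding blanks are ignored
            i += 1
        if i < n and text[i] in ("'", '"'):
            # Quoted value: spaces and semicolons inside are kept verbatim.
            quote = text[i]
            end = text.find(quote, i + 1)
            if end == -1:
                raise ValueError(f"unterminated quote in value for {key!r}")
            value = text[i + 1:end]
            i = end + 1
            while i < n and text[i] != ";":  # skip to the next separator
                i += 1
        else:
            end = text.find(";", i)
            end = n if end == -1 else end
            value = text[i:end].strip()  # trailing blanks are ignored
            i = end
        if key:
            params[key] = value  # the last occurrence of a keyword wins
    return params


def effective_additional_parameters(text: str) -> dict:
    """Drop keywords that the Connection dialog already controls."""
    return {k: v for k, v in parse_additional_parameters(text).items()
            if k not in IGNORED_KEYWORDS}
```

For example, `effective_additional_parameters("Server=h; RowScanDepth=5")` keeps only the RowScanDepth entry, because Server is set through its own field.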
Metadata Notes
Tables and Views from the Apache HBase database are exposed as entities. HBase supports a bytes-in/bytes-out interface using Put and Result. Anything that can be converted to an array of bytes can be stored as a value. Because this database is untyped, the selection of a suitable type of data occurs when the metadata is returned. Metadata parser behavior can be controlled using the following parameters:
Naming
Connection metadata must have unique entity, relationship, and field names. If your Connection metadata has duplicate names, review the source system to determine if the duplicates can be renamed.
TypeDetectionScheme
This parameter controls how the data type of the field is determined. See the Type Detection Scheme Connection String option in the CData documentation. Possible values include:
- RowScan — Scans rows to determine the data type. See Metadata Parser Example.
- None — Returns all columns as strings. Default string length is 2000 unless DefaultColumnSize is specified in the Other Connection String. None is the default value for TypeDetectionScheme.
RowScanDepth
The number of rows to scan to determine columns and their data types for the table. Default value is 10000 rows. See the Row Scan Depth Connection String option in the CData documentation.
WARNING: For large tables, if this value is too low, data types may not be as accurate as they could be.
Metadata Parser Example
Assume there is a table of employees with one column family, labeled docs, with columns Age, Salary, Name and Birthday, and some corresponding values inside.
If TypeDetectionScheme is set to RowScan, the data is interpreted and displayed in TIBCO Scribe® Online as shown in the following table.
Field | Data Type | Value |
---|---|---|
docsAge | Integer32 | 56 |
docsSalary | Integer32 | 80000 |
docsName | String(2000) | John |
docsBirthday | DateTime | 2018-10-09T21:00:00Z |
RowKey | String(255) | 123 |
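To make the RowScan and RowScanDepth behavior more concrete, here is a rough Python sketch that infers column types from stored string values such as those in the example above. It is an approximation under stated assumptions, not CData's detection algorithm: the helper names are hypothetical, it recognizes only Integer32, ISO 8601 DateTime strings, and String(2000), and it does not model the RowKey column. The scan-depth parameter shows why a low RowScanDepth can yield a less accurate type.

```python
from datetime import datetime

STRING_TYPE = "String(2000)"  # default string length when no column size is set


def _is_integer32(value: str) -> bool:
    try:
        number = int(value)
    except ValueError:
        return False
    return -2**31 <= number <= 2**31 - 1


def _is_datetime(value: str) -> bool:
    try:
        # Older Python versions reject a trailing "Z", so map it to +00:00.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def detect_type(values, scheme="None", row_scan_depth=10000):
    """Guess a column type from its stored strings (hypothetical helper)."""
    if scheme == "None":
        return STRING_TYPE  # with None, every column is exposed as a string
    # Only the first row_scan_depth rows are inspected.
    sample = [v for v in values[:row_scan_depth] if v is not None]
    if not sample:
        return STRING_TYPE
    if all(_is_integer32(v) for v in sample):
        return "Integer32"
    if all(_is_datetime(v) for v in sample):
        return "DateTime"
    return STRING_TYPE
```

With `row_scan_depth=2`, the values `["1", "2", "abc"]` are typed as Integer32 because the third row is never scanned; a depth of 3 falls back to a string type.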
Relationships
- Hierarchical relationships, such as grandparent, parent, grandchild relationships are not supported. See Hierarchical Data for examples.
- Parent/Child relationships are not supported.
Apache HBase Connector As IS Source
Consider the following when using the Apache HBase Connector as an Integration Services source.
Filtering
The CData Provider supports two types of filtering:
- Native HBase filtering based on string comparison where HBase compares strings for an exact match. The comparison is case sensitive. This is the default setting and provides the best performance.
- Client filtering provides additional functionality for typing columns and returning them. This option decreases performance and is not recommended when using a Cloud Service. Client filtering is enabled when the Other Connection String parameter UseSQLFiltering is set to true.
Other="UseSQLFiltering=true;"
WARNING: Using HBase native filtering together with dynamic column typing is not recommended, because it can lead to unexpected filter results. Although column types are defined on the client, the server still filters on the strings provided to it and requires an exact match, and the local culture of the machine influences how these strings are generated. The string representing a value may differ from the original, as shown in the following list of dates:
- 6/15/2009 (en-US)
- 15/06/2009 (fr-FR)
- 2009/06/15 (ja-JP)
- 2009-06-15 (ru-RU)
- When using a GUID as a filter value, you may see the following error:
[500] Could not execute the specified command: Cannot compare data type of System.Guid to System.String [Impossible to cast type of System.String to System.Guid for value '00000000-0000-0000-0000-000000000000']
Use a TOSTRING function to convert the GUID to a string first. For example:
TOSTRING("00000000-0000-0000-0000-000000000000")
- Some other systems, such as the HUE HBase browser, store some values wrapped in double quotes, for example values written in Chinese characters. To find a value that contains quotes, you must escape the quotes. For example, to find the value "幫助" (including the quotes), use the value "\"幫助\"".
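The culture problem described in the WARNING can be reproduced in a few lines. This Python sketch is hypothetical: the per-culture patterns are copied from the list of dates above rather than taken from real culture data, and `native_filter` only models the exact, case-sensitive string comparison that native HBase filtering performs. A value written by an en-US machine is not found by a filter string generated on an fr-FR machine, even though both represent the same date.

```python
from datetime import date

# Short-date patterns copied from the list of dates in the WARNING
# (illustrative only; real values come from the machine's regional settings).
CULTURE_DATE_PATTERNS = {
    "en-US": "{d.month}/{d.day}/{d.year}",
    "fr-FR": "{d.day:02d}/{d.month:02d}/{d.year}",
    "ja-JP": "{d.year}/{d.month:02d}/{d.day:02d}",
    "ru-RU": "{d.year}-{d.month:02d}-{d.day:02d}",
}


def render_date(d: date, culture: str) -> str:
    """Render a date the way the given culture is assumed to."""
    return CULTURE_DATE_PATTERNS[culture].format(d=d)


def native_filter(stored_values, filter_value):
    """Model native HBase filtering: exact, case-sensitive string equality."""
    return [v for v in stored_values if v == filter_value]


june_15 = date(2009, 6, 15)
stored = [render_date(june_15, "en-US")]  # written by an en-US machine
# Filtering from an fr-FR machine builds "15/06/2009", which matches nothing.
matches = native_filter(stored, render_date(june_15, "fr-FR"))
```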
Net Change
Net Change is supported for date and timestamp values if metadata discovery is enabled by setting the TypeDetectionScheme Connection String to RowScan.
When a datetime is configured on the Query Block on the Block Properties Net Change Tab to query for new and updated records, that configuration is treated as an additional filter. The Net Change datetime filter is applied as an AND after any other filters specified on the Block Properties Filter Tab. TIBCO Scribe® Online builds a query combining both the Net Change filter and the filters on the Filter tab. See Net Change And Filters for an example.
Some Connectors for TIBCO Scribe® Online only support one filter. For those Connectors you can use either Net Change or one filter on the Filter tab, not both.
Note: The Net Change date is ignored when previewing data on the Preview tab. Filters on the Block Properties Filters tab are used to filter the data on the Preview tab.
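The way the Net Change condition is ANDed after the Filter tab conditions can be sketched as follows. TIBCO Scribe® Online builds this query internally, so the SQL shape, the hypothetical column name docs:Modified, the `>` comparison, and the UTC timestamp format are all assumptions made for illustration. Calling the function without a Net Change field models the Preview tab, where the Net Change date is ignored.

```python
from datetime import datetime, timezone


def build_where_clause(filters, net_change_field=None, last_run=None):
    """Combine Filter tab conditions with an optional Net Change condition.

    Hypothetical helper: the Net Change condition is appended with AND
    after all other filters.
    """
    clauses = [f"({condition})" for condition in filters]
    if net_change_field and last_run is not None:
        cutoff = last_run.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        clauses.append(f"[{net_change_field}] > '{cutoff}'")
    return " AND ".join(clauses)


last_run = datetime(2018, 10, 9, 21, 0, tzinfo=timezone.utc)
query_filter = build_where_clause(["[docs:Name] = 'John'"], "docs:Modified", last_run)
preview_filter = build_where_clause(["[docs:Name] = 'John'"])  # Net Change ignored
```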
DateTime Properties
Apache HBase stores all data as strings. Data type conversions are handled by the CData provider. When the RowScan option is specified for the TypeDetectionScheme Connection String option, additional Connection String options can be used to modify the behavior of the DateTime conversion.
- DateTimeFormat —
- Format used when inserting datetime values into the database.
- This format is important if exposing all data as strings, because native HBase filters require an exact string match.
- Default format is yyyy-MM-dd'T'HH:mm:ss.fffzzz
- ConvertDateTimeToGMT (set through the Other parameter) —
- Indicates whether to convert datetime values to GMT, instead of the local machine time.
- true — Default format is: 2017-02-02T20:12:12.000Z
- false — Format is: 2017-02-02T23:12:12.000+03:00
- DateTime values can be saved as the converted time in GMT or in the local timezone.
Note: If a DateTime column contains only a time value, such as 12:32:55, it is treated as today's date with the time value appended.
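The two ConvertDateTimeToGMT examples above describe the same instant: 20:12:12 GMT equals 23:12:12 at UTC+03:00. The Python sketch below reproduces both strings and the time-only rule. It is an approximation: the function names are hypothetical, and Python's `isoformat` is used as a stand-in for the .NET format yyyy-MM-dd'T'HH:mm:ss.fffzzz rather than the provider's own formatter.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional


def format_datetime(value: datetime, convert_to_gmt: bool) -> str:
    """Format a timezone-aware datetime with millisecond precision."""
    if convert_to_gmt:
        utc = value.astimezone(timezone.utc)
        # ISO 8601 with a literal "Z" suffix for GMT/UTC.
        return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
    # Keep the local wall-clock time and append the UTC offset, e.g. +03:00.
    return value.isoformat(timespec="milliseconds")


def complete_time_only(value: time, today: Optional[date] = None) -> datetime:
    """Treat a time-only value as that time on today's date."""
    return datetime.combine(today or date.today(), value)


local = datetime(2017, 2, 2, 23, 12, 12, tzinfo=timezone(timedelta(hours=3)))
gmt_text = format_datetime(local, convert_to_gmt=True)
local_text = format_datetime(local, convert_to_gmt=False)
```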
Double Properties
The CData Provider uses the local machine's culture to convert and read double values. Depending on the regional settings, the number formatting may differ. For example, depending on the culture, each string in the following list is read as the same value:
- "123123123.123"
- "123123123,123"
- "123,123,123.123"
- "123 123 123.123"
- "123.123.123,123"
Note: Regardless of how these numbers are represented by the CData Provider, they are stored in the database as strings with different delimiters. This affects filtering when the TypeDetectionScheme Connection String is set to None.
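The effect of regional settings on double values can be illustrated by parsing each string above with an explicit decimal separator and group separator. This is a hypothetical helper, not the CData Provider's conversion routine; the separator pairs passed in the usage lines are assumptions standing in for real culture settings. The last comparison shows the point of the Note: the parsed numbers agree, but the stored strings do not, so string-based filtering can miss matches.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str = ".", group_sep: str = ",") -> Decimal:
    """Read a number string using the given culture separators."""
    if group_sep:
        text = text.replace(group_sep, "")  # remove group separators first
    if decimal_sep != ".":
        text = text.replace(decimal_sep, ".")
    return Decimal(text.strip())


expected = Decimal("123123123.123")
parsed = [
    parse_number("123123123.123"),
    parse_number("123123123,123", decimal_sep=",", group_sep="."),
    parse_number("123,123,123.123"),
    parse_number("123 123 123.123", group_sep=" "),
    parse_number("123.123.123,123", decimal_sep=",", group_sep="."),
]
same_value = all(p == expected for p in parsed)
same_string = "123123123.123" == "123.123.123,123"
```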
Native Query
The Apache HBase Connector supports SQL queries in Native Query Blocks to create customized queries for Apache HBase. The query can be as simple or complex as you need it to be; however, it should return a single result set. The native query text is sent to Apache HBase exactly as it is entered without any modifications.
You can use SELECT, UPDATE, INSERT, and DELETE statements. If support for Enhanced SQL is enabled, you can also use joins and aggregate functions. For additional details, see the SQL Compliance section of the CData documentation.
After entering the SQL query, you must select Test to validate the query. Invalid queries are not accepted by the Connector. See Native Query Block and Creating Native Queries For Microsoft SQL Server for additional information.
When entering a query for Apache HBase in the Native Query Block, note the following:
- Every column name must be inside either double-quotes or square brackets because in some cases there are quotes in the column names.
- The NULLS FIRST/LAST clause, which controls the position of NULLs when sorting, is supported. By default, FIRST is used and NULLs are placed at the beginning.
Example:
SELECT * FROM Demo ORDER BY [data:int] DESC NULLS LAST
- You can add columns, but not column families, using an INSERT statement.
Example:
Assume you have a table named "USER" with column family "data" and three columns in that family labeled "name", "age", and "address". To add a new column labeled "phone", use the following command:
INSERT INTO USER (rowkey, [data:name], [data:age], [data:address], [data:phone]) values ('USER_1', 'Alex', 34, 'Washington DC', '36842834682')
After executing this command, reset metadata and you can use the new column in any Map Block.
Note: To create a new column family, use the UpdateTable stored procedure in the Native Query Block or use the Execute Block. For more information on stored procedures, see EXECUTE Statements in the CData documentation.
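Two of the Native Query notes above lend themselves to a short sketch: bracket-quoting family:column names when generating an INSERT such as the USER example, and the ordering that NULLS FIRST/LAST produces. The helpers are hypothetical and only show the statement shape and sort semantics. Doubling single quotes inside string literals is an assumption based on standard SQL, and string-built SQL is used here only for illustration, not as a recommendation for untrusted input.

```python
def quote_column(family: str, column: str) -> str:
    """Wrap a family:column name in square brackets."""
    return f"[{family}:{column}]"


def sql_literal(value) -> str:
    """Render a string or number as a SQL literal (strings single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_insert(table: str, row_key: str, family: str, values: dict) -> str:
    """Build an INSERT for one row; columns not yet in the family are created."""
    columns = ["rowkey"] + [quote_column(family, c) for c in values]
    literals = [sql_literal(row_key)] + [sql_literal(v) for v in values.values()]
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({', '.join(literals)})"


def sort_with_nulls(values, descending=False, nulls_last=False):
    """Model ORDER BY ... [DESC] NULLS FIRST|LAST; NULLS FIRST is the default."""
    present = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return present + nulls if nulls_last else nulls + present


statement = build_insert("USER", "USER_1", "data", {
    "name": "Alex", "age": 34, "address": "Washington DC", "phone": "36842834682",
})
```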
When testing a Native Query in a Map, if the source datastore does not return any data, TIBCO Scribe® Online cannot build the schema for the underlying metadata and the Map cannot be saved. To allow TIBCO Scribe® Online to build the schema, do the following:
- Create a single temporary record in the source datastore that matches the Native Query.
- Test the Native Query and ensure that it is successful.
- Save the Map.
- Remove the temporary record from the source datastore.
Apache HBase Connector As IS Target
Consider the following when using the Apache HBase Connector as an Integration Solution target.
Update And Insert
- Due to the HBase architecture, attempting to update or insert data into a column that does not exist inside an existing column family creates a new column in the selected column family. For example, if you have a table with the column name "test-family:test" and you execute an update or insert operation in a Native Query with the column name "test-family:tost", a new column is created.
- Column names are case-sensitive. Using different cases generates new columns with a numerical suffix.
Column Names In HBase | Column Display Names |
---|---|
family:EXAMPLE | family:EXAMPLE |
family:example | family:example_1 |
family:ExaMPLE | family:ExaMPLE_2 |
In this situation, to update the lowercase example column, you must use "family:example_1". Display names are not case-sensitive in queries, so the following statements are equivalent:
UPDATE Test set "family:example_1" = 5
UPDATE Test set "FAMILY:EXAMPLE_1" = 5
- If the column data type is DateTime but only a time is entered, the current date is automatically added to the time.
- Columns cannot be named "column", for example, [family:column]. If column is used as a name, the following error is generated when updating or deleting:
System.Data.CData.ApacheHBase.ApacheHBaseException: '[500] Could not execute the specified command: The row [existing_row_key] does not exist.'
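The case-variant naming in the table above can be modeled as follows. This is a hypothetical reconstruction of the documented behavior, not the provider's code. It assumes each case variant receives a suffix equal to the number of earlier case-insensitive matches, and it does not handle a real column whose name already ends in such a suffix. Lookups by display name are case-insensitive, which matches the two equivalent UPDATE statements.

```python
def assign_display_names(column_names):
    """Map each HBase column name to a display name (hypothetical model)."""
    seen = {}
    display = {}
    for name in column_names:
        key = name.lower()
        count = seen.get(key, 0)
        # The first spelling keeps its name; later case variants get _1, _2, ...
        display[name] = name if count == 0 else f"{name}_{count}"
        seen[key] = count + 1
    return display


def resolve_display_name(display_name, display):
    """Find the HBase column for a display name, ignoring case."""
    wanted = display_name.lower()
    for original, shown in display.items():
        if shown.lower() == wanted:
            return original
    raise KeyError(display_name)


names = assign_display_names(["family:EXAMPLE", "family:example", "family:ExaMPLE"])
```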
Batch Processing
Batch processing is not supported.
TIBCO Scribe® Online API Considerations
To create connections with the TIBCO Scribe® Online API, the Apache HBase Connector requires the following information:
Connector Name | Apache HBase |
---|---|
Connector ID | F96010A2-5783-45D8-A248-38F3DC736B25 |
TIBCO Scribe® Online Connection Properties
In addition, this Connector uses the Connection properties shown in the following table.
Note: Connection property names are case-sensitive.
Name | Data Type | Required | Secured | Usage |
---|---|---|---|---|
Server | string | Yes | No | |
Port | string | No | No | Integer |
AuthScheme | string | Yes | No | Supported values: NONE, BASIC, NEGOTIATE |
User | string | No | No | User can be empty in Apache HBase |
Password | string | No | Yes | Password can be empty in Apache HBase |
ConnectionString | string | No | No | |
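Before sending a Connection to the API, it can help to validate the property values against the table above. The following Python sketch is hypothetical: it checks the required properties, the supported AuthScheme values, and the 1023 to 65535 Port range given in the Connection dialog description, and it returns property names spelled exactly as in the table because they are case-sensitive. The actual request body format of the TIBCO Scribe® Online API is not shown in this section, so the sketch stops at building a name-to-value dictionary.

```python
# Value from the API Considerations table; how it is sent depends on the
# API request format, which this sketch does not model.
CONNECTOR_ID = "F96010A2-5783-45D8-A248-38F3DC736B25"
AUTH_SCHEMES = {"NONE", "BASIC", "NEGOTIATE"}


def build_connection_properties(server, auth_scheme, port=None, user=None,
                                password=None, connection_string=None):
    """Validate inputs and return a property-name -> string-value dictionary.

    Hypothetical helper. Property names are case-sensitive, so they are
    spelled exactly as in the table.
    """
    if not server:
        raise ValueError("Server is required")
    if auth_scheme not in AUTH_SCHEMES:
        raise ValueError(f"AuthScheme must be one of {sorted(AUTH_SCHEMES)}")
    props = {"Server": server, "AuthScheme": auth_scheme}
    if port is not None:
        port_number = int(port)  # Port is a string that must hold an integer
        if not 1023 <= port_number <= 65535:
            raise ValueError("Port must be in the 1023 to 65535 range")
        props["Port"] = str(port_number)
    optional = {"User": user, "Password": password, "ConnectionString": connection_string}
    props.update({name: value for name, value in optional.items() if value is not None})
    return props
```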
License Agreement
The TIBCO Scribe® Online End User License Agreement for the Apache HBase Connector describes the legal obligations and requirements of both TIBCO and you. TIBCO suggests that you read the End User License Agreement.
More Information
For additional information on this Connector, refer to the Knowledge Base and Discussions in the TIBCO Community.