createPartitionedTable
Syntax
createPartitionedTable(dbHandle, table, tableName, [partitionColumns], [compressMethods])
Arguments
dbHandle is a DFS database handle returned by function database. The database can be either in the local file system, or in the distributed file system. If the dbHandle is empty or unspecified, the table is an in-memory partitioned table.
table is a table or a tuple of tables. The table schema will be used to construct the new partitioned table.
tableName is a string indicating the name of the partitioned table.
partitionColumns is a STRING scalar/vector indicating the partitioning column(s). For a composite partition, partitionColumns is a string vector.
compressMethods is a dictionary indicating which compression methods are used for specified columns. The keys are columns name and the values are compression methods ("lz4" or "delta"). If unspecified, use LZ4 compression method. Note:
-
The delta compression method can be used for SHORT, INT, LONG or temporal data types.
-
Save strings as SYMBOL type to enable compression of strings.
Details
Create an empty partitioned table with the same schema as the specified table.
- To create a DFS table or a table on disk, parameter table must be a table. This function is used with append! or tableInsert to generate a partitioned table. It cannot be used to create a partitioned table with sequential domain.
- To create an in-memory partitioned table, parameter table can be a table or a tuple of tables. The number of tables given by the parameter table must be the same as the number of partitions in the database.
Note:
-
Only the schema of table is used. None of the rows in table is imported to the newly created partitioned table.
-
For a DFS database, the maximum number of handles (including temporary handles*) to partitioned tables is 8,192 per node.
*temporary handles: If no handle is specified when you create a partitioned table in
a DFS database with createPartitionedTable
, each database creates a
temporary handle to hold the return value. If you create multiple tables under the
same database, the temporary handle for the database is overwritten each time.
Examples
Example 1. Create a DFS table
n=10000
t=table(2020.01.01T00:00:00 + 0..(n-1) as timestamp, rand(`IBM`MS`APPL`AMZN,n) as symbol, rand(10.0, n) as value)
db = database("dfs://rangedb_tradedata", RANGE, `A`F`M`S`ZZZZ)
Trades = db.createPartitionedTable(table=t, tableName="Trades", partitionColumns="symbol", compressMethods={timestamp:"delta"});
At this point, the table Trades is empty. The schema of Trades is the same as the schema of table t. Then we append table t to table Trades.
Trades.append!(t);
Now the contents of table Trades have been updated on disk. In the DFS system, the system doesn't dynamically refresh the contents of tables. We need to load the table into memory before we can work with it interactively.
Trades=loadTable(db,`Trades);
select min(value) from Trades;
// output
0
After appending data to a DFS table, we don't need to use function loadTable to load the table before querying
the table, as the system automatically refreshes the table after appending
operations. After system restarts, however, we need to use
loadTable
to load a DFS table before querying the table.
Example 2. Create in-memory partitioned tables
Example 2.1. Create a partitioned in-memory table
n = 20000
colNames = `time`sym`qty`price
colTypes = [TIME,SYMBOL,INT,DOUBLE]
t = table(n:0, colNames, colTypes)
db = database(, RANGE, `A`D`F)
pt = db.createPartitionedTable(t, `pt, `sym)
insert into pt values(09:30:00.001,`AAPL,100,56.5)
insert into pt values(09:30:01.001,`DELL,100,15.5)
Example 2.2. Create a partitioned keyed table
n = 20000
colNames = `time`sym`qty`price
colTypes = [TIME,SYMBOL,INT,DOUBLE]
t = keyedTable(`time`sym, n:0, colNames, colTypes)
db = database(, RANGE, `A`D`F)
pt = db.createPartitionedTable(t, `pt, `sym)
insert into pt values(09:30:00.001,`AAPL,100,56.5)
insert into pt values(09:30:01.001,`DELL,100,15.5)
Example 2.3. Create a partitioned stream table
When creating a partitioned stream table, the second parameter of
createPartitionedTable
must be a tuple of tables, and its
length must be equal to the number of partitions. Each table in the tuple represents
a partition. In the following example, trades_stream1 and trades_stream2 form a
partitioned stream table trades. We cannot directly write data to trades. Instead,
we need to write to trades_stream1 and trades_stream2.
n=200000
colNames = `time`sym`qty`price
colTypes = [TIME,SYMBOL,INT,DOUBLE]
trades_stream1 = streamTable(n:0, colNames, colTypes)
trades_stream2 = streamTable(n:0, colNames, colTypes)
db=database(, RANGE, `A`D`F)
trades = createPartitionedTable(db,[trades_stream1, trades_stream2], "", `sym)
insert into trades_stream1 values(09:30:00.001,`AAPL,100,56.5)
insert into trades_stream2 values(09:30:01.001,`DELL,100,15.5)
select * from trades;
time | sym | qty | price |
---|---|---|---|
09:30:00.001 | AAPL | 100 | 56.5 |
09:30:01.001 | DELL | 100 | 15.5 |
Example 2.4. Create a partitioned MVCC table
Similar to a partitioned stream table, to create a partitioned MVCC table, the second
parameter of createPartitionedTable
must be a tuple of tables, and
its length must be equal to the number of partitions. Each table in the tuple
represents a partition. In the following example, trades_mvcc1 and trades_mvcc2 form
a partitioned MVCC table trades. We cannot directly write data to trades. Instead,
we need to write to trades_mvcc1 and trades_mvcc2.
n=200000
colNames = `time`sym`qty`price
colTypes = [TIME,SYMBOL,INT,DOUBLE]
trades_mvcc1 = mvccTable(n:0, colNames, colTypes)
trades_mvcc2 = mvccTable(n:0, colNames, colTypes)
db=database(, RANGE, `A`D`F)
trades = createPartitionedTable(db,[trades_mvcc1, trades_mvcc2], "", `sym)
insert into trades_mvcc1 values(09:30:00.001,`AAPL,100,56.5)
insert into trades_mvcc2 values(09:30:01.001,`DELL,100,15.5)
select * from trades;
time | sym | qty | price |
---|---|---|---|
09:30:00.001 | AAPL | 100 | 56.5 |
09:30:01.001 | DELL | 100 | 15.5 |