Migrating from MergeTree to ReplicatedMergeTree in ClickHouse: A Step-by-Step Guide

April 11, 2024 | by techdbzone.com

ClickHouse is a robust columnar database management system renowned for its capabilities in handling analytics at scale. It offers a variety of table engines, among which MergeTree and ReplicatedMergeTree are particularly popular. In this blog post, we delve into the process of migrating a non-replicated MergeTree table to a ReplicatedMergeTree table, using the example of an inventory table.

Why Migrate to ReplicatedMergeTree?

Migrating to a ReplicatedMergeTree offers several benefits, particularly in terms of enhanced data redundancy, fault tolerance, and scalability across multiple nodes in a cluster. This migration ensures that data is continuously replicated to other nodes, safeguarding against hardware failures and ensuring data availability for read and write operations even if one or more nodes fail.

Step 1: Assess the Current Table

The first step in the migration process is to evaluate the existing table that you plan to migrate. In our case, the inventory table is sized at 30.55 GiB with a total disk size used of 160.46 GiB.

Step 2: Create the Replicated Table

Creating the replicated version of the table involves defining a schema that mirrors the existing table but changes the engine to ReplicatedMergeTree. It is crucial to specify a proper ZooKeeper path, which is vital for managing the state of the replicas in the cluster. Here is an example schema for the inventory_store_replicated table:

CREATE TABLE inventory_store_replicated ON CLUSTER sng_clickhouse_cluster
(
    `eventTimestamp` Int64,
    `warehouse_id` LowCardinality(String),
    `inventory_offset` Int64 CODEC(T64),
    `productIdentifier` LowCardinality(String),
    `InventoryEvent` LowCardinality(String),
    `InventoryStatus` LowCardinality(String),
    `EventType` LowCardinality(String) DEFAULT 'UPDATE',
    `UpdateTimestamp` Nullable(Int64),
    `ProductCategoryID` Int64,
    `ProductID` Int64,
    `Quantity` Float64,
    `SupplyType` LowCardinality(String),
    `TransactionType` LowCardinality(String),
    `TransactionPrice` Float64,
    `RetailPrice` Float64,
    `TransactionID` String,
    `OrderID` Int64,
    `SupplierID` LowCardinality(String),
    `ProductDescription` String,
    `ManagerID` UInt32,
    `ExternalReferenceID` String,
    `CustomerID` String,
    `TransactionHash` Int32,
    `order_size` Nullable(Float64),
    `order_currency_isoCode` LowCardinality(String),
    `product_departmentCode` LowCardinality(String),
    `sku` LowCardinality(String),
    `order_priceMultiplier` Nullable(Float64),
    `conversion_rate_to_eur` Nullable(Float64),
    `currency_conversion_inverse` UInt8,
    `recordedRetailPrice` Nullable(Float64),
    `estimatedWholesalePrice` Nullable(Float64),
    `estimatedCostPrice` Nullable(Float64),
    `profitMargin` Nullable(Float64),
    `wholesaleDelayMs` Array(Int64) DEFAULT if(recordedRetailPrice IS NULL, [], [0]),
    `markout_estimatedCostPrice` Array(Nullable(Float64)) DEFAULT if(recordedRetailPrice IS NULL, [],
[estimatedCostPrice]),
    `storeIdentifier` LowCardinality(Nullable(String)),
    `cancel_eventTimestamp` Nullable(Int64),
    `LatestPrice` Decimal(18, 8),
    `SupplySide` String,
    `SupplierProductID` Int64,
    `ProductSKU` LowCardinality(String),
    `UnitSize` UInt64,
    `PriceChangeDirection` Int8,
    `PriceTicks` Decimal(38, 8),
    `PriceChangeBps` Decimal(38, 8),
    `MarketType` LowCardinality(String),
    `DataSourceType` LowCardinality(String)
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/inventory_store', '{replica}')
PARTITION BY toDate(fromUnixTimestamp(eventTimestamp))
PRIMARY KEY eventTimestamp
ORDER BY (eventTimestamp, warehouse_id, ProductID, TransactionID)
TTL toDate(fromUnixTimestamp(eventTimestamp)) + toIntervalYear(1)
SETTINGS index_granularity = 8192;

Note – I have used `inventory_store` in Zookeeper path as I am going to rename tables after migration , so path name will be same as table name for easy identification.

Step 3: Attach Existing Partitions

Before transferring data, it’s necessary to attach existing partitions from the non-replicated table to the replicated table. This can be automated using SQL scripts that generate the required ALTER TABLE commands:

-- Using the concat function
SELECT DISTINCT concat('ALTER TABLE inventory_store_replicated ATTACH PARTITION ID \'', partition_id, '\' FROM inventory_store;')
FROM system.parts
WHERE (table = 'inventory_store') AND active;

-- Using manual concatenation
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221102' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221103' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221104' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221201' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221202' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221205' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221206' FROM inventory_store;
ALTER TABLE inventory_store_replicated ATTACH PARTITION ID '20221207' FROM inventory_store;

Alternatively, if you know the partition IDs, you can manually construct and execute the ALTER TABLE commands:

Step 4: Execute the Migration Queries

After preparing the ALTER TABLE commands, execute them to move the partitions. This can be done in batch or individually, depending on your preference and the system’s performance during the operation.

Step 5: Verify the Migration

Once the data transfer is complete, it’s essential to verify the integrity and completeness of the migration:

These queries ensure that the record counts in the source and destination tables match, confirming a successful migration.

-- Check count for the source table
SELECT count(1) FROM inventory_store;

-- Check count for the destination table
SELECT count(1) FROM inventory_store_replicated;

Step 6: Rename and Redirect Traffic

After confirming the data integrity, the next step is to rename the old table and redirect application traffic to the new replicated table:

RENAME TABLE inventory_store TO inventory_store_old, inventory_store_replicated TO inventory_store;

Step 7: Monitor and Cleanup

Monitor the new system closely for 1-2 days to ensure it operates as expected. If no issues arise, it’s safe to drop the old table:

DROP TABLE inventory_store_old;

Conclusion

Migrating from a MergeTree to a ReplicatedMergeTree engine in ClickHouse can significantly improve the resilience and scalability of your database infrastructure. By following the detailed steps outlined in this guide, database administrators can ensure a smooth transition with minimal downtime. Always ensure to test the migration process in a staging environment before applying it in production to avoid any data discrepancies or interruptions.

View all

TechDBzone

Migrating from MergeTree to ReplicatedMergeTree in ClickHouse: A Step-by-Step Guide

Why Migrate to ReplicatedMergeTree?

Step 1: Assess the Current Table

Step 2: Create the Replicated Table

Step 3: Attach Existing Partitions

Step 4: Execute the Migration Queries

Step 5: Verify the Migration

Step 6: Rename and Redirect Traffic

Step 7: Monitor and Cleanup

Conclusion

RELATED POSTS

Migrating from MergeTree to ReplicatedMergeTree in ClickHouse: A Step-by-Step Guide

Why Migrate to ReplicatedMergeTree?

Step 1: Assess the Current Table

Step 2: Create the Replicated Table

Step 3: Attach Existing Partitions

Step 4: Execute the Migration Queries

Step 5: Verify the Migration

Step 6: Rename and Redirect Traffic

Step 7: Monitor and Cleanup

Conclusion

RELATED POSTS

Exploring Indexing in MySQL and ClickHouse

How to Add a New Node to an Already Existing ClickHouse Cluster