{question}
How do we load data from a self-managed (on-premises) MongoDB deployment into SingleStore Self-Managed using Change Data Capture (CDC-IN)?
{question}
{answer}
In this article, we will discuss how to load data from an on-premises MongoDB to an on-premises SingleStore using the CDC-IN feature.
- Deploy 3 EC2 instances to create a 3-node replica set for MongoDB.
- Install MongoDB on all three servers by following the MongoDB installation documentation (see "Install MongoDB on a Linux machine" under Related Links).
- After MongoDB is installed, start the 'mongod' service, enable it at boot, and check its status.
sudo systemctl start mongod     # start the mongod service
sudo systemctl enable mongod    # start mongod automatically at boot
sudo systemctl status mongod    # check the status of the mongod service
- Open '/etc/mongod.conf' for editing on all three instances.
- Set net.bindIp to 0.0.0.0 so that mongod listens on all IPv4 interfaces on each instance.
- Add a replication section with the same replSetName on every instance so that the three nodes form one replica set.
# mongod.conf
# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  wiredTiger:
# how the process runs
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.
#security:
#operationProfiling:
replication:
  replSetName: myReplicaSet
#sharding:
## Enterprise-Only Options
#auditLog:
#snmp:
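Because the same two settings must be present on all three instances, a quick check can catch a missed edit. The Python sketch below is illustrative only: read_setting is a hypothetical helper, and it does a simple line-based lookup that ignores YAML nesting rather than parsing the file properly. The sample text stands in for the contents of '/etc/mongod.conf'.

```python
import re

def read_setting(conf_text, key):
    """Return the value of the first uncommented 'key: value' line, or None.

    Simplification: this ignores YAML nesting and only looks at the key name.
    """
    for line in conf_text.splitlines():
        active = line.split("#", 1)[0].strip()  # drop comments and indentation
        match = re.match(rf"{re.escape(key)}\s*:\s*(\S+)", active)
        if match:
            return match.group(1)
    return None

# Sample standing in for the edited mongod.conf (an assumption for illustration).
sample_conf = """
net:
  port: 27017
  bindIp: 0.0.0.0  # bind to all IPv4 addresses
replication:
  replSetName: myReplicaSet
#sharding:
"""

bind_ip = read_setting(sample_conf, "bindIp")
repl_set = read_setting(sample_conf, "replSetName")
print("bindIp:", bind_ip)
print("replSetName:", repl_set)
```

To check a real node, read the file with open("/etc/mongod.conf").read() and compare the replSetName value across the three instances.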
- Restart the 'mongod' service on every instance so that the new configuration takes effect:
sudo systemctl restart mongod   # restart the mongod service
sudo systemctl status mongod    # check the status of the mongod service
- Open a 'mongosh' session on one of the instances and run the following commands:
// Use the rs.initiate() command to initialize the replica set
rs.initiate(
  {
    _id: "myReplicaSet",
    members: [
      { _id: 0, host: "server1IPaddress:27017" },
      { _id: 1, host: "server2IPaddress:27017" },
      { _id: 2, host: "server3IPaddress:27017" }
    ]
  }
)
// Verify the replica set with rs.status() to ensure that replication is working correctly.
// The response also lists the hostnames and ports of the primary and secondary nodes.
rs.status();
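The document passed to rs.initiate() can also be generated instead of typed by hand, which helps avoid duplicate _id values or a mistyped port. This is a minimal Python sketch, not part of the original procedure: build_replset_config is a hypothetical helper, and the host names are the same placeholders used above. The printed JSON is a valid object literal that can be pasted into rs.initiate(...) in mongosh.

```python
import json

def build_replset_config(name, hosts, port=27017):
    """Return the configuration document that rs.initiate() expects."""
    members = [
        {"_id": index, "host": f"{host}:{port}"}
        for index, host in enumerate(hosts)
    ]
    return {"_id": name, "members": members}

# Placeholder addresses; replace them with the addresses of your three servers.
config = build_replset_config(
    "myReplicaSet",
    ["server1IPaddress", "server2IPaddress", "server3IPaddress"],
)
print(json.dumps(config, indent=2))
```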
- Create a user with the root role in the admin database. SingleStore uses these credentials to authenticate when the link and pipelines are created.
// Switch to the admin database
use admin
// Create an admin user and password with root privileges
db.createUser( { user: "adminUser", pwd: "adminPassword", roles: [ { role: "root", db: "admin" } ] } )
- Because a fresh deployment contains no data, create a test database and a collection with a few dummy documents.
// Switch to the testdb database
use testdb
// Create a collection named dummyCollection
db.createCollection("dummyCollection")
// Insert a few dummy documents into the collection
db.dummyCollection.insertMany([
  { name: "John", age: 30 },
  { name: "Jane", age: 25 },
  { name: "Doe", age: 35 }
])
// Check that the data was inserted
db.dummyCollection.find()
- Log in to your SingleStore cluster and check whether Java is installed on the master node.
- If Java is not installed, install it using the commands below.
# Ubuntu/Debian
sudo apt update
sudo apt install default-jdk
# CentOS/RHEL (Java 11, to match the java_pipelines_java11_path variable set in the next step)
sudo yum update
sudo yum install java-11-openjdk
# Verify the installation
java -version
- Find the path of the Java binary on the node, then log in to SingleStore Studio and set the Java path with the SQL statement below, or use the sdb-admin command instead.
which java      # print the path where java is installed
OR
whereis java    # alternative way to locate the java binary
-- SQL command to be run in SingleStore Studio (replace javafilepath with the path returned above)
SET GLOBAL java_pipelines_java11_path = 'javafilepath';
OR
sdb-admin update-config --all --set-global --key "java_pipelines_java11_path" --value "/usr/bin/java"
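The lookup and the SET GLOBAL statement can be combined so that the detected path is never retyped. The Python sketch below is an illustration under assumptions: java_path_statement is a hypothetical helper, and the script is assumed to run on the node where Java was installed. shutil.which performs the same PATH lookup as the which command.

```python
import shutil

def java_path_statement(java_path):
    """Build the SET GLOBAL statement for the given java binary path."""
    escaped = java_path.replace("'", "''")  # double any single quotes for SQL
    return f"SET GLOBAL java_pipelines_java11_path = '{escaped}';"

# shutil.which returns None when java is not on PATH.
detected = shutil.which("java")
if detected is None:
    print("java was not found on PATH; install it before setting the variable")
else:
    print(java_path_statement(detected))
```

The printed statement can then be run in SingleStore Studio.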
- Next, create a link to MongoDB using one of the two statements below.
If MongoDB is reached through a connection string (for example, a MongoDB Atlas cluster), the shell connection command looks like this:
Shell - mongosh "mongodb+srv://cluster0.rrb2vgy.mongodb.net/" --apiVersion 1 --username <db_username>
Use the "mongodb+srv://cluster0.rrb2vgy.mongodb.net/" part as the "mongodb.connection.string" value in the CONFIG clause of the SingleStore statement so that it can locate your Atlas cluster.
-- SQL command to be run in SingleStore Studio
CREATE LINK linkname AS MONGODB
CONFIG '{"mongodb.connection.string": "mongodb+srv://freecluster.vnizelt.mongodb.net/",
"collection.include.list": "databasename.collectionname",
"mongodb.ssl.enabled":"true",
"mongodb.authsource":"admin",
"mongodb.members.auto.discover": "false"}'
CREDENTIALS '{"mongodb.user":"admin",
"mongodb.password":"admin"}';
OR
CREATE LINK linkname AS MONGODB
CONFIG '{"mongodb.hosts":"primaryhostname:port,secondaryhostname:port,secondaryhostname:port",
"collection.include.list": "databasename.collectionname",
"mongodb.ssl.enabled":"false",
"mongodb.authsource":"admin",
"mongodb.members.auto.discover": "true"}'
CREDENTIALS '{"mongodb.user":"username",
"mongodb.password":"password"}';
- After creating the link, run the statement below to start loading data from the self-managed MongoDB deployment into the SingleStore on-premises cluster.
-- SQL command to be run in SingleStore Studio
-- Infers the tables from the source, then creates the tables, pipelines, and stored procedures in SingleStore based on the source collections.
-- linkname is the link created in the previous step.
CREATE TABLES AS INFER PIPELINE AS LOAD DATA LINK linkname '*' FORMAT AVRO;
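The CONFIG and CREDENTIALS clauses are JSON embedded in SQL string literals, so a misplaced brace or quote is easy to introduce, and the link name must match between the two statements. The Python sketch below generates both statements from one set of inputs. It is an illustration only: the helper names are hypothetical, the host, collection, and credential values are the placeholders from this article, and the keys are the ones shown in the replica-set variant of CREATE LINK.

```python
import json

def sql_json(obj):
    """Serialize obj to JSON that is safe to place inside a single-quoted SQL string."""
    text = json.dumps(obj)
    if "'" in text or "\\" in text:
        raise ValueError("value contains a quote or backslash; escape it manually")
    return text

def create_link_sql(link_name, hosts, collection, user, password):
    """Build a CREATE LINK statement for a replica set listed by host:port."""
    config = {
        "mongodb.hosts": ",".join(hosts),
        "collection.include.list": collection,
        "mongodb.ssl.enabled": "false",
        "mongodb.authsource": "admin",
        "mongodb.members.auto.discover": "true",
    }
    credentials = {"mongodb.user": user, "mongodb.password": password}
    return (
        f"CREATE LINK {link_name} AS MONGODB\n"
        f"CONFIG '{sql_json(config)}'\n"
        f"CREDENTIALS '{sql_json(credentials)}';"
    )

def infer_pipeline_sql(link_name):
    """Build the statement that creates tables and pipelines from the link."""
    return f"CREATE TABLES AS INFER PIPELINE AS LOAD DATA LINK {link_name} '*' FORMAT AVRO;"

link = "linkname"
hosts = ["primaryhostname:27017", "secondaryhostname1:27017", "secondaryhostname2:27017"]
link_sql = create_link_sql(link, hosts, "testdb.dummyCollection", "username", "password")
pipeline_sql = infer_pipeline_sql(link)
print(link_sql)
print(pipeline_sql)
```

Generating both statements from the same link variable keeps the name in CREATE LINK and LOAD DATA LINK identical.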
- Verify that the data is present in the tables. If it is not, perform the following troubleshooting steps:
-- SQL commands to be run in SingleStore Studio
-- Check the status of the pipelines created above
SHOW PIPELINES;
-- If a pipeline is shown as stopped, start it
START PIPELINE PIPELINENAME;
OR
START ALL PIPELINES;
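When only some pipelines are stopped, the START PIPELINE statements can be derived from the SHOW PIPELINES output. The Python sketch below is a hedged illustration: start_statements is a hypothetical helper, the rows are made-up (name, state) pairs copied by hand from the output, and the state label "Stopped" is assumed to match what your SingleStore version reports.

```python
def start_statements(rows):
    """Return a START PIPELINE statement for every pipeline reported as stopped.

    rows: iterable of (pipeline_name, state) pairs taken from SHOW PIPELINES.
    """
    return [
        f"START PIPELINE `{name}`;"
        for name, state in rows
        if state.strip().lower() == "stopped"
    ]

# Hypothetical output rows; replace them with the rows from your cluster.
rows = [("dummyCollection", "Stopped"), ("otherCollection", "Running")]
statements = start_statements(rows)
for statement in statements:
    print(statement)
```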
- To view the ingested data, run the following SQL statement:
SELECT _id :> JSON , _more :> JSON FROM <table_name>;
- Starting with SingleStore 8.7, sharded MongoDB clusters are supported as well.
Related Links:
Install MongoDB on a Linux machine
What is Replication & Replica Set
What is Sharding & Sharded Cluster
Load Data from MongoDB to SingleStore Self-Managed
{answer}