Wednesday, August 31, 2022

Backing Up Databases In Cloudera

Cloudera recommends that you schedule regular backups of the databases that Cloudera Manager uses to store configuration, monitoring, and reporting data, as well as the databases required by managed services.


Cloudera Manager Server - Contains all the information about the services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important one to back up.



To back up a PostgreSQL database, use the same procedure whether the database is embedded or external:


Step 1: Start


Step 2: Log in to the host where the Cloudera Manager Server is installed


Step 3: Get the name, user, and password properties for the Cloudera Manager database from /etc/cloudera-scm-server/db.properties:


com.cloudera.cmf.db.name=scm

com.cloudera.cmf.db.user=scm

com.cloudera.cmf.db.password=NnYfWIjlbk


Step 4: Run the following command as root, using the parameters from the preceding step (replace hostname with the host where the database is running):

pg_dump -h hostname -p 7432 -U scm > /tmp/scm_server_db_backup.$(date +%Y%m%d)


Step 5: When prompted, enter the password from the com.cloudera.cmf.db.password property obtained in step 3. (A non-interactive variant is sketched after these steps.)


Step 6: Stop
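
The steps above prompt interactively for the password. As a rough, non-interactive sketch (assuming the scm database name, port 7432, and the db.properties values shown above), the password can be read from db.properties and passed to pg_dump through the PGPASSWORD environment variable; psql can replay the dump to restore it:

# Minimal sketch: non-interactive backup using the values in db.properties (PostgreSQL only)
DB_NAME=$(grep '^com.cloudera.cmf.db.name=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)
DB_USER=$(grep '^com.cloudera.cmf.db.user=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)
DB_PASS=$(grep '^com.cloudera.cmf.db.password=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)

PGPASSWORD="$DB_PASS" pg_dump -h hostname -p 7432 -U "$DB_USER" "$DB_NAME" \
  > /tmp/scm_server_db_backup.$(date +%Y%m%d)

# To restore into an empty scm database, replay the dump with psql:
# PGPASSWORD="$DB_PASS" psql -h hostname -p 7432 -U "$DB_USER" "$DB_NAME" < /tmp/scm_server_db_backup.YYYYMMDD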


Tuesday, August 30, 2022

Linux Filesystems

By default, the ext3 and ext4 filesystems reserve 5% of their space for use by the root user. This reserved space counts as Non DFS Used in HDFS.


To view the reserved space, first identify the device with lsblk, then use the tune2fs command:

$ sudo lsblk
$ sudo tune2fs -l /dev/nvme0n1p1 | egrep "Reserved block count|Block size"

Output:
Reserved block count:     36628312
Block size:               4096

The Reserved block count is the number of ext3/ext4 filesystem blocks that are reserved for root, and the Block size is the size of each block in bytes, so multiplying the two gives the reserved space in bytes.
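
As a quick sketch (assuming the device path and the numbers shown above), the reserved space can be computed from those two values; with 36628312 blocks of 4096 bytes, that is roughly 150 GB reserved for root:

# Minimal sketch: compute reserved space in GiB from tune2fs output (device path is an example)
DEV=/dev/nvme0n1p1
RESERVED=$(sudo tune2fs -l "$DEV" | awk -F: '/^Reserved block count/ {gsub(/ /, "", $2); print $2}')
BLKSIZE=$(sudo tune2fs -l "$DEV" | awk -F: '/^Block size/ {gsub(/ /, "", $2); print $2}')
echo "Reserved: $(( RESERVED * BLKSIZE / 1024 / 1024 / 1024 )) GiB"
# With the values above: 36628312 * 4096 bytes, which is about 150 GB (roughly 139 GiB)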

Cloudera recommends reducing the root user block reservation from 5% to 1% for the DataNode volumes.

To set the reserved space to 1% with the tune2fs command:

$ sudo tune2fs -m 1 /dev/nvme0n1p1

Run the same command again to verify the change:

$ sudo tune2fs -l /dev/nvme0n1p1 | egrep "Reserved block count|Block size"
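
Since a DataNode typically has several data volumes, the same change usually has to be applied to each of them. A rough sketch follows, assuming hypothetical data mounts such as /data/1, /data/2, and so on (adjust the paths to your own layout):

# Minimal sketch: set a 1% root reservation on every device backing a /data/N mount
# (the /data/N mount points are an assumption; substitute your DataNode data directories)
for mnt in /data/*; do
  dev=$(findmnt -n -o SOURCE "$mnt")    # resolve each mount point to its backing device
  [ -n "$dev" ] && sudo tune2fs -m 1 "$dev"
done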

Monday, August 29, 2022

Oozie

Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. It is a workflow scheduler system that manages Apache Hadoop jobs, combining multiple jobs sequentially into one logical unit of work.


It is integrated with the Hadoop stack, with YARN as its architectural center, and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop.


Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).


Oozie is a scalable, reliable and extensible system.


There are two basic types of Oozie jobs, both of which are submitted with the oozie command-line client (see the sketch after this list):

1) Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) specifying a sequence of actions to execute.

2) Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability.
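
As a minimal, hedged sketch of how a Workflow job is typically launched (the Oozie server URL, HDFS application path, property values, and job ID below are illustrative assumptions, not values from this post):

# job.properties (illustrative values - adjust to your cluster)
# nameNode=hdfs://namenode-host:8020
# oozie.wf.application.path=${nameNode}/user/myuser/my-workflow

# Submit and start the workflow, then check its status
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
oozie job -oozie http://oozie-host:11000/oozie -info 0000001-220829000000000-oozie-oozi-W

A Coordinator job is submitted the same way, with the job.properties pointing at a coordinator application path instead of a workflow application path.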