Wednesday, August 31, 2022

Backing Up Databases In Cloudera

Cloudera recommends that you schedule regular backups of the databases that Cloudera Manager uses to store configuration, monitoring, and reporting data, as well as the databases required by managed services.


Cloudera Manager Server - Contains all the information about the services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important one to back up.



To back up a PostgreSQL database, use the same procedure whether the database is embedded or external:


Step 1: Start


Step 2: Log in to the host where the Cloudera Manager Server is installed


Step 3: Get the name, user, and password properties for the Cloudera Manager database from /etc/cloudera-scm-server/db.properties:


com.cloudera.cmf.db.name=scm

com.cloudera.cmf.db.user=scm

com.cloudera.cmf.db.password=NnYfWIjlbk


Step 4: Run the following command as root, using the parameters from the preceding step (replace hostname with the host where the database is running):

pg_dump -h hostname -p 7432 -U scm > /tmp/scm_server_db_backup.$(date +%Y%m%d)


Step 5: When prompted, enter the password from the com.cloudera.cmf.db.password property obtained in step 3. (A non-interactive variant is sketched after these steps.)


Step 6: Stop
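
The steps above prompt interactively for the password. As a rough, non-interactive sketch (assuming the scm database name, port 7432, and the db.properties values shown above), the password can be read from db.properties and passed to pg_dump through the PGPASSWORD environment variable; psql can replay the dump to restore it:

# Minimal sketch: non-interactive backup using the values in db.properties (PostgreSQL only)
DB_NAME=$(grep '^com.cloudera.cmf.db.name=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)
DB_USER=$(grep '^com.cloudera.cmf.db.user=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)
DB_PASS=$(grep '^com.cloudera.cmf.db.password=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)

PGPASSWORD="$DB_PASS" pg_dump -h hostname -p 7432 -U "$DB_USER" "$DB_NAME" \
  > /tmp/scm_server_db_backup.$(date +%Y%m%d)

# To restore into an empty scm database, replay the dump with psql:
# PGPASSWORD="$DB_PASS" psql -h hostname -p 7432 -U "$DB_USER" "$DB_NAME" < /tmp/scm_server_db_backup.YYYYMMDD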


Tuesday, August 30, 2022

Linux Filesystems

By default, the ext3 and ext4 filesystems reserve 5% of their space for use by the root user. This reserved space counts as Non DFS Used in HDFS.


To view the reserved space, first identify the device with lsblk, then use the tune2fs command:

$ sudo lsblk
$ sudo tune2fs -l /dev/nvme0n1p1 | egrep "Reserved block count|Block size"

Output:
Reserved block count:     36628312
Block size:               4096

The Reserved block count is the number of ext3/ext4 filesystem blocks that are reserved for root, and the Block size is the size of each block in bytes, so multiplying the two gives the reserved space in bytes.
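
As a quick sketch (assuming the device path and the numbers shown above), the reserved space can be computed from those two values; with 36628312 blocks of 4096 bytes, that is roughly 150 GB reserved for root:

# Minimal sketch: compute reserved space in GiB from tune2fs output (device path is an example)
DEV=/dev/nvme0n1p1
RESERVED=$(sudo tune2fs -l "$DEV" | awk -F: '/^Reserved block count/ {gsub(/ /, "", $2); print $2}')
BLKSIZE=$(sudo tune2fs -l "$DEV" | awk -F: '/^Block size/ {gsub(/ /, "", $2); print $2}')
echo "Reserved: $(( RESERVED * BLKSIZE / 1024 / 1024 / 1024 )) GiB"
# With the values above: 36628312 * 4096 bytes, which is about 150 GB (roughly 139 GiB)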

Cloudera recommends reducing the root user block reservation from 5% to 1% for the DataNode volumes.

To set the reserved space to 1% with the tune2fs command:

$ sudo tune2fs -m 1 /dev/nvme0n1p1

Run the same command again to verify the change:

$ sudo tune2fs -l /dev/nvme0n1p1 | egrep "Reserved block count|Block size"
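
Since a DataNode typically has several data volumes, the same change usually has to be applied to each of them. A rough sketch follows, assuming hypothetical data mounts such as /data/1, /data/2, and so on (adjust the paths to your own layout):

# Minimal sketch: set a 1% root reservation on every device backing a /data/N mount
# (the /data/N mount points are an assumption; substitute your DataNode data directories)
for mnt in /data/*; do
  dev=$(findmnt -n -o SOURCE "$mnt")    # resolve each mount point to its backing device
  [ -n "$dev" ] && sudo tune2fs -m 1 "$dev"
done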

Monday, August 29, 2022

Oozie

Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. It is a workflow scheduler system that manages Apache Hadoop jobs, combining multiple jobs sequentially into one logical unit of work.


It is integrated with the Hadoop stack, with YARN as its architectural center, and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop.


Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).


Oozie is a scalable, reliable and extensible system.


There are two basic types of Oozie jobs, both of which are submitted with the oozie command-line client (see the sketch after this list):

1) Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) specifying a sequence of actions to execute.

2) Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability.
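
As a minimal, hedged sketch of how a Workflow job is typically launched (the Oozie server URL, HDFS application path, property values, and job ID below are illustrative assumptions, not values from this post):

# job.properties (illustrative values - adjust to your cluster)
# nameNode=hdfs://namenode-host:8020
# oozie.wf.application.path=${nameNode}/user/myuser/my-workflow

# Submit and start the workflow, then check its status
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
oozie job -oozie http://oozie-host:11000/oozie -info 0000001-220829000000000-oozie-oozi-W

A Coordinator job is submitted the same way, with the job.properties pointing at a coordinator application path instead of a workflow application path.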