DBCollect/Overview: Difference between revisions

From Dirty Cache Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
 
Line 1: Line 1:
[[Category:DBCollect]]
[[Category:DBCollect]]
{{PrevNext|DBC:Errors/DBC-E002|DBCollect/Considerations}}__NOTOC__
{{PrevNext|DBCollect|DBCollect/Considerations}}__NOTOC__
= Overview =
= Overview =



Latest revision as of 15:39, 10 October 2024



Overview

What is DBCollect

DBCollect is a performance and metadata collection tool for servers running Oracle databases, providing workload and config data from database hosts

It is NOT (yet?) an official DELL tool - but written by Bart Sjerps as an independent Open-Source project

It collects database and host information from Oracle database servers in a standardized way, so it can easily be processed by workload analysis tools. It supports Enterprise Linux (RHEL, OEL, CentOS) as well as AIX and Solaris (SPARC).

DBcollect on Github

How does it work

  1. System or Database Administrator runs DBCollect on a database host
    • Can be a VM or bare metal host
    • Should be run on all database nodes in scope of the project
    • Supported are Linux (EL6 or higher, runs out of the box), AIX, HP-UX or Solaris (may require some steps to make it work)
  2. DBCollect detects running Oracle databases (instances)
    • Generates capacity and performance reports
    • Also picks up OS config and performance data (SAR/Sysstat)
  3. Output is created as a ZIP file in TEMP dir
    • /tmp/dbcollect-<hostname>.zip
  4. Customer sends ZIP files for further processing
    • Secure FTP, or direct upload to LoadMaster website (via provided secure UUID url)
  5. Additional info to be provided, such as
    • Licensing details
    • Grouping (which systems are prod, test/dev, DR etc), VM-to-Hypervisor mapping
    • Any current issues (performance, other)
    • Physical locations
    • SLA requirements (RPO/RTO, yearly growth, platform requirements/preferences i.e. hypervisors, storage/servers, etc)

Why DBcollect

In the past, I frequently got workload data from customers in all kinds of strange formats (unusable AWR formats, reports in text format, with far too large AWR intervals, etc). Unix info was usually either not available at all or a random dump of output of some commands. By standardizing the reporting in a single tool, we get consistent, reliable database and host information.

This is needed for creating workload assessments where the data must be machine readable in a standardized format.

In most cases it automatically detects all running Oracle instances on the host.

Problem

  • Customers find all kinds of creative ways to provide system data
    • Excel sheets, Word docs, strange RAR or TAR files, AWR reports with insanely long or short intervals (6 weeks ) or weird language settings, etc.
    • Not easily processed by automated tools (Try to load Excel into a database table)
    • Still missing capacity and system data (how big is the database? What processor is being used? Compression? Encryption? Backups?)
    • Usually only for one or few databases (customer may have hundreds)
    • Often unreliable (outdated, non-deterministic metrics, and so on)

Solution

  • Provide a standard way of getting OS and Database performance/config data
    • Oracle AWR, Linux/UNIX SAR, Database configuration, OS configuration
    • Retrieves historic data (which is already stored on the OS and in the databases)
    • All (running) Oracle instances, at least 10 days (Oracle) or 1 month (OS) performance
  • Be as secure and safe as possible
    • Cannot change anything on the databases
    • Only reads/writes files in the /tmp directory
    • Does not pick up security sensitive data (passwords, etc)
    • Honor customer licenses (in particular, Diagnostics Pack is required to generate AWRs)
  • Make it as easy as possible for the administrator to run it
    • Tool consists of one file, requires running one command only, simple installation
    • Generates only one output (ZIP) file in the /tmp directory
  • Open-Source (GitHub, GPLv3 licensed)
    • Users can review the code (Python, SQL*Plus scripts and commands)
    • Users are encouraged to help testing, writing improvements (via Git pull requests or just by email)

DBcollect architecture

DBcollect is written in Python. It supports Python 2 and 3 (mainly because a lot of legacy systems only have Python 2 installed). It also contains a few SQL scripts that are executed by Oracle SQL*Plus. It is distributed as a ZipApp package. This means it is not binary compiled, but all Python and SQL files are bundled in a ZIP file in such a way that it can be executed as a single command but users can inspect the contents by simply running unzip:

unzip -v /usr/local/bin/dbcollect

Collected data

  • Host CPU, memory, network interface and disk info
  • Running processes, file systems, installed packages, os version and release etc.
  • Hardware vendor, product etc where possible
  • SAR reports
  • AWR or Statspack reports for the given period (default 10 days)
  • Additional database information (such as capacity of tablespaces and disks, settings, compression analysis, backup, ...)

For detailed info on what data is collected, most commands and files are in the config section: DBcollect config

Removing SQL code

DBcollect generates HTML formatted Oracle AWR reports. These reports usually contain snippets of SQL queries. These snippets may sometimes help us in finding the root cause of slow queries, but if it is prohibited to send SQL queries outside of the organization, the SQL sections can be completely removed from the DBcollect ZIP files:

dbcollect --strip

Note that this only works for HTML-formatted AWR reports, it does NOT work for Statspack.

Safety

DBcollect is designed to run on production systems and great effort has been spent on making it robust. The SECURITY section on the github page explains in great detail how this works. Note that dbcollect has an option to remove SQL code from AWR reports, in case this is condidered a security or confidentiality issue.

root user

DBcollect does NOT run as root (ever). If you execute it as root, it will re-launch itself as another user, usually 'oracle' or 'nobody' on systems without Oracle. You can force another user:

dbcollect --user oraprod

Example

# Install dbcollect via the quick install metohd
curl https://raw.githubusercontent.com/outrunnl/dbcollect/master/scripts/download | python
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2154  100  2154    0     0   4018      0 --:--:-- --:--:-- --:--:--  4011
Github dbcollect downloader - download the latest dbcollect release
-> Retrieving GitHub metadata from https://api.github.com/repos/outrunnl/dbcollect/releases/latest
-> Downloading binary version 1.16.3 from https://github.com/outrunnl/dbcollect/releases/download/v1.16.3/dbcollect
-> md5 hash: 6d080140269bf9fe14039cfbfbd6806e
-> Saving binary as /usr/local/bin/dbcollect
-> Setting permissions to mode 755

# Test version
[root@db08 ~](-) # dbcollect --version
dbcollect 1.16.3 - collect Oracle AWR/Statspack, database and system info
Author:    Bart Sjerps <info@dirty-cache.com>
Copyright: Copyright 2023, Bart Sjerps
License:   GPLv3+, https://www.gnu.org/licenses/gpl-3.0.html
Version:   1.16.3
Builddate: 2024-07-01 15:04
Buildhash: 1e08f98a7ca3b546d166c964300be221f952c7a0

Verify dbcollect source code

# Verify dbcollect contents
# As dbcollect is actually a ZIP file, you could unzip it and verify the code
unzip -v /usr/local/bin/dbcollect 
Archive:  /usr/local/bin/dbcollect
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       0  Stored        0   0% 07-01-2024 15:01 00000000  lib/
       0  Stored        0   0% 07-01-2024 15:01 00000000  modules/
       0  Stored        0   0% 07-01-2024 15:01 00000000  sql/
    7816  Stored     7816   0% 06-10-2024 13:06 267803f6  dbcollect.py
       0  Stored        0   0% 10-31-2023 19:14 00000000  lib/__init__.py
     257  Stored      257   0% 07-01-2024 15:04 34d07d31  lib/buildinfo.py
    2767  Stored     2767   0% 05-30-2024 15:33 d95147e0  lib/user.py
    1984  Stored     1984   0% 05-30-2024 15:34 5272c204  lib/archive.py
    2322  Stored     2322   0% 05-30-2024 15:34 fdd6efa2  lib/functions.py
--- snip ---
     209  Stored      209   0% 02-05-2024 08:38 446743e2  sql/splunk/readme.txt
      79  Stored       79   0% 02-05-2024 08:38 3b1af7bd  sql/splunk/splunk_header.sql
      58  Stored       58   0% 07-01-2024 15:04 8fed2ee0  __main__.py
--------          -------  ---                            -------
  166163           166163   0%                            81 files

Run dbcollect

# Run dbcollect (default 10 days of AWR, user 'oracle'
dbcollect 1.6.0 - collect Oracle AWR/Statspack, database and system info
INFO     : dbcollect 1.6.0 - database and system info collector
INFO     : Current user is oracle
INFO     : Zip file is /tmp/dbcollect-db01.lan.zip
INFO     : Collecting OS info (Linux)
INFO     : Collecting Linux SAR files
INFO     : Collecting Oracle info
INFO     : Found instance DEMO on /u01/app/oracle/product/12.1.0/dbhome_1
WARNING  : Skipped DEMO (DOWN)
INFO     : Found instance DEMO2 on /u01/app/oracle/product/12.1.0/dbhome_1
WARNING  : Skipped DEMO2 (DOWN)
INFO     : Found instance demo on /u01/app/oracle/product/12.1.0/dbhome_1
WARNING  : Skipped demo (DOWN)
INFO     : Found instance DB01 on /u01/app/oracle/product/12.1.0/dbhome_1
INFO     : DB01: AWR usage detected, generating reports 
INFO     : Generating Oracle AWR/Statspack reports, with 1 workers
INFO     : processing instance DB01
report 15 of 230 (6.5% done), elapsed: 0:00:34, remaining: 0:08:18
INFO     : Zip file /tmp/dbcollect-db01.lan.zip is created succesfully.
INFO     : Finished

Review collected data

# Inspect ZIP file contents
[root@db01 ~](-) # unzip -v /tmp/dbcollect-db01.lan.zip 
Archive:  /tmp/dbcollect-db01.lan.zip
dbcollect version=1.6.0 hostname=db01.lan
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     541  Defl:N      272  50% 07-06-2021 13:07 0fea7e9c  db01.lan/meta.json
    1247  Defl:N      663  47% 07-06-2021 13:07 176503d7  db01.lan/cmd/lscpu
    1133  Defl:N      145  87% 07-06-2021 13:07 a248c133  db01.lan/cmd/lsscsi
      40  Defl:N       27  33% 07-06-2021 13:07 4cbb2c4b  db01.lan/cmd/sestatus
    2476  Defl:N      865  65% 07-06-2021 13:07 94874f95  db01.lan/cmd/lsmod
  139387  Defl:N    25763  82% 07-06-2021 13:07 7d28e51b  db01.lan/cmd/dmesg
   20691  Defl:N     3664  82% 07-06-2021 13:07 912b04bc  db01.lan/cmd/psef
   26229  Defl:N     4734  82% 07-06-2021 13:07 0bac6129  db01.lan/cmd/psfaux
    5050  Defl:N      889  82% 07-06-2021 13:07 aafb6b4a  db01.lan/cmd/lsblk
    1076  Defl:N      406  62% 07-06-2021 13:07 a73ecda5  db01.lan/cmd/dfpt
     326  Defl:N      196  40% 07-06-2021 13:07 85199d6c  db01.lan/cmd/iplink
     433  Defl:N      181  58% 07-06-2021 13:07 5e9c5f56  db01.lan/cmd/ipaddr
   45599  Defl:N    17256  62% 07-06-2021 13:07 1e259717  db01.lan/cmd/rpmqf
   28046  Defl:N     6136  78% 07-06-2021 13:07 7d437fa2  db01.lan/cmd/sysctla
    2020  Defl:N      648  68% 07-06-2021 13:07 b7ca76b3  db01.lan/proc/cpuinfo
    1310  Defl:N      521  60% 07-06-2021 13:07 128cc245  db01.lan/proc/meminfo
     378  Defl:N      151  60% 07-06-2021 13:07 83f093db  db01.lan/proc/filesystems
     979  Defl:N      296  70% 07-06-2021 13:07 562875d5  db01.lan/proc/partitions
     536  Defl:N      299  44% 07-06-2021 13:07 696196ae  db01.lan/proc/devices
    3142  Defl:N      764  76% 07-06-2021 13:07 32da3154  db01.lan/proc/mounts
      40  Defl:N       42  -5% 07-06-2021 13:07 f842b1fd  db01.lan/proc/mdstat
     215  Defl:N      157  27% 07-06-2021 13:07 2091f2f6  db01.lan/proc/misc
      22  Defl:N       24  -9% 07-06-2021 13:07 c1ca737d  db01.lan/proc/uptime
     393  Defl:N      226  43% 11-23-2020 16:08 358d6782  db01.lan/etc/os-release
      37  Defl:N       39  -5% 11-23-2020 16:08 36030dbe  db01.lan/etc/system-release
     103  Defl:N      102   1% 01-03-2021 15:34 02058559  db01.lan/etc/issue
      87  Defl:N       86   1% 01-03-2021 15:34 ad1f98c9  db01.lan/etc/motd
    1019  Defl:N      520  49% 05-31-2021 21:05 547a8978  db01.lan/etc/oratab
      13  Defl:N       15 -15% 06-20-2021 00:16 875d0c1f  db01.lan/sys/class/dmi/id/sys_vendor
      24  Defl:N       26  -8% 06-21-2021 11:59 a4480e0b  db01.lan/sys/class/dmi/id/product_name
      33  Defl:N       35  -6% 06-21-2021 11:59 fbe92f48  db01.lan/sys/class/dmi/id/board_name
      18  Defl:N       20 -11% 06-20-2021 00:16 5deaba97  db01.lan/sys/class/dmi/id/board_vendor
...
  605964  Defl:N    75815  88% 06-27-2021 23:50 fbb4552b  db01.lan/var/log/sa/sa27
  605964  Defl:N    76052  87% 06-28-2021 23:50 5704a62d  db01.lan/var/log/sa/sa28
  605964  Defl:N    76113  87% 06-29-2021 23:50 8267a285  db01.lan/var/log/sa/sa29
  605964  Defl:N    75803  88% 06-30-2021 23:50 d7478b20  db01.lan/var/log/sa/sa30
  574442  Defl:N    54492  91% 07-06-2021 13:07 1f290b08  db01.lan/oracle/DB01/DB01_awrrpt_1_24717_24718_20210627_0000.html
  627738  Defl:N    64554  90% 07-06-2021 13:07 554dcc81  db01.lan/oracle/DB01/DB01_awrrpt_1_24718_24719_20210627_0100.html
  583958  Defl:N    57024  90% 07-06-2021 13:07 3c7b8d7e  db01.lan/oracle/DB01/DB01_awrrpt_1_24719_24720_20210627_0201.html
...
   24290  Defl:N     4504  82% 07-06-2021 13:07 49383845  db01.lan/oracle/DB01/dbinfo.txt
    4011  Defl:N      636  84% 07-06-2021 13:07 c0304c39  db01.lan/oracle/DB01/pdbinfo.txt
  127795  Defl:N     3911  97% 07-06-2021 13:07 551eaccc  db01.lan/oracle/DB01/reports.sql
   41510  Defl:N     3589  91% 07-06-2021 13:15 b742ecf9  db01.lan/dbcollect.log
--------          -------  ---                            -------
157350796         16088556  90%                            403 files