DBCollect/FAQ

From Dirty Cache Wiki
Revision as of 10:51, 9 December 2024 by Bart (talk | contribs) (→‎Frequently Asked Questions)


Frequently Asked Questions

  • Why can't we just send some AWR reports? AWR reports are great but have a few problems and limitations for our purpose:
    • AWR only provides performance and limited configuration metrics. There is no database size/config information such as sizes of tablespaces, redo logs, temp files, segments, ASM disks/diskgroups, archive/flashback/bct files
    • No OS configuration or hardware information (such as CPU type & model)
    • No disk/network configuration
    • No UNIX SAR/sysstat performance data
    • No compression, backup, archiving details
    • AWRs are sometimes generated using non-English locale (cannot be parsed)
    • AWRs are sometimes generated in txt format instead of html (hard to parse, error-prone)
    • AWRs are sometimes provided as RAC versions (completely different layout, hard to parse)
    • Usually only a few AWRs are provided, sometimes with a very large interval (many hours or even days) which is not detailed enough to do accurate sizings or performance analysis
    • No way to know if there are other instances on the same system for which we need to know details
  • Is dbcollect safe to run on production systems?
dbcollect is designed to run as a non-root user, but it has to be the oracle user, a user with sysdba privileges, or any other user using a credentials file. The SQL scripts only contain SELECT statements so they cannot modify database data. The Python tools cannot delete or overwrite any file except in `/tmp` or the output ZIP file if specified otherwise in the arguments. External commands are not executed as root and are verified to only gather system info, not modify anything (some commands may be executed if using 'sudoers' access). CPU consumption is limited by default to either 50% or a maximum of 8 CPUs. These restrictions should make one confident that dbcollect is safe to run on production systems. For additional safety, consider using a credentials file.
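As an illustration of the default CPU limit, here is a minimal sketch. It assumes the limit means "half of the available CPUs, but never more than 8", which is one reading of the sentence above and not taken from the dbcollect source. The function name `default_max_workers` and its parameters are hypothetical.

```python
import multiprocessing


def default_max_workers(cpu_count=None, fraction=0.5, hard_cap=8):
    """Hypothetical helper: cap parallel workers at a fraction of the CPUs.

    Assumption: "50% or a maximum of 8 CPUs" means whichever is lower.
    At least one worker is always allowed.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return max(1, min(int(cpu_count * fraction), hard_cap))


# A 4-CPU host would get 2 workers, a 64-CPU host would be capped at 8.
print(default_max_workers(4), default_max_workers(64))
```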
  • Why is dbcollect written in Python 2? This is no longer supported!
Python 3 is not available by default on many older systems, e.g. Linux (RHEL/OEL/CentOS) and Solaris. On EL6 I even had to backport support for Python 2.6.
Update: dbcollect now works on both Python 2 and Python 3, and Python 3 is the preferred version.
  • How long will it take to run _dbcollect_ ?
This mostly depends on how many AWR/Statspack reports need to be generated and how many CPUs are available. Collecting the OS information usually only takes a few seconds. For normal environments, an AWR report (HTML) takes about 1-2 seconds, Statspack even less. For a single instance environment with a 10 day collect period and a 1 hour interval, the number of reports is about 240 (10 days × 24 reports per day), so _dbcollect_ will run for under 10 minutes. There are some known Oracle issues with AWR generation resulting in much longer times. The latest version of _dbcollect_ predicts the remaining time so you have an idea.

As of version 1.11, dbcollect runs AWR reports in parallel on each instance, making it much faster.
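The arithmetic behind the run time estimate can be sketched as follows. This is only a back-of-the-envelope calculation, not how dbcollect predicts the remaining time. The function name and the `parallel_jobs` parameter are hypothetical, and perfect scaling across parallel jobs is assumed.

```python
def estimate_runtime_seconds(days, interval_hours, seconds_per_report,
                             instances=1, parallel_jobs=1):
    """Rough estimate of the time spent generating AWR/Statspack reports.

    Assumes one report per snapshot interval per instance and that
    parallel jobs scale perfectly (an optimistic simplification).
    """
    reports = int(days * 24 / interval_hours) * instances
    return reports, reports * seconds_per_report / parallel_jobs


# Single instance, 10 days, 1 hour interval, 2 seconds per report (upper bound).
reports, worst_case = estimate_runtime_seconds(10, 1, 2.0)
print(reports, worst_case / 60)  # 240 reports, 8.0 minutes
```

OS information collection adds a few seconds on top of this figure.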

  • Does dbcollect gather confidential data?
dbcollect only retrieves system configuration files, SAR/AWR/Statspack etc. In AWR and Statspack however, a number of SQL queries (statements) can be visible. For AWR, dbcollect can remove sections containing SQL statements to prevent collecting pieces of potentially confidential data. The values of bind parameters are not visible. See the --strip option. Passwords or user credentials are never collected.
  • dbcollect appears to be a binary package. How do I know what it is doing?
dbcollect is actually a Python ZipApp package. You can list its contents and extract the Python code and SQL scripts using standard zip/unzip tools.
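If no unzip tool is at hand, the package can also be inspected with Python's standard `zipfile` module, which reads ZipApp files directly. The helper names below are made up for this sketch, and the path you pass in is wherever you saved your download.

```python
import zipfile


def list_zipapp(path):
    """Return the names of all files stored inside a ZipApp archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_member(path, member):
    """Return the text of one file (Python or SQL source) from the archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(member).decode("utf-8", errors="replace")
```

For example, `list_zipapp("dbcollect")` returns the file names, and `read_member("dbcollect", name)` shows the source of any of them, so the code can be reviewed before running it.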
  • How do I know my download has not been tampered with?
If you downloaded dbcollect from GitHub releases using HTTPS, you should be good. If you want to make sure, get the MD5 hash and I can check for you if it is the correct one: `md5sum dbcollect`
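On systems without the `md5sum` command, the same hash can be computed with Python's standard `hashlib` module. This is a minimal sketch; the function name `md5_of_file` is made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("dbcollect")` should give the same hex string as `md5sum dbcollect`.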
  • I want to check what information dbcollect has gathered
Inspect the zip file /tmp/dbcollect-<hostname>.zip and check its contents.