DBCollect/Detection
Instance detection
This page describes how dbcollect detects (running) Oracle instances and their correct ORACLE_HOMEs.
Running instances
- If a logons file is provided (using
--logons
), dbcollect will NOT automatically detect instances, but only connect to the services listed in the logons file. - If no logons file is specified, dbcollect runs the command
ps -eo pid,user,group,args
to find all running processes on the OS. - Running instances are matched if the args match
ora_pmon_<oracle_sid>
- If the
--included
option is given, only instances that are explicitly included will be processed (a warning message is given for non-included instances) - If the
--excluded
option is given, only instances that are not explicitly excluded will be processed (a warning message is given for excluded instances)
The logons file can be used to connect remote databases (running on another host, where dbcollect may not be supported, i.e., Windows or other non-supported environments. Be aware that no OS-specific data will be collected in that case.
ORACLE_HOME
dbcollect uses SQL*Plus to connect to databases. As of Oracle 19, there is no clean, consistent way to find the correct ORACLE_HOME for a running instance. Older versions would have a hc_<sid>.dat
file where the most recent timestamp on these files was used to figure out which ORACLE_HOME is the correct one. In Oracle 19, the DBS directory is moved from $ORACLE_HOME/dbs
to $ORACLE_BASE/dbs
which is shared by multiple ORACLE_HOMEs and therefore can no longer be used to identify the correct directory.
Therefore, the instance detection part has been rewritten in version 1.18.0 as follows:
Finding candidates
- If the
--orahome
option is used, try the ORACLE_HOMEs given by this parameter first - Next, try to locate the instance in
oratab
. The ORACLE_HOME given for the instance will be tried - Finally, try the ORACLE_HOMEs found via processing the Oracle Inventory (
inventory.xml
)
The Oratab and Inventory parts can be skipped (for troubleshooting purpuses) with the --no-oratab
and --no-orainv
options.
Validating ORACLE_HOME
- Each ORACLE_HOME directory must have an executable
sqlplus
binary in$ORACLE_HOME/bin/sqlplus
- Each ORACLE_HOME directory must NOT have an executable
crsctl
binary in$ORACLE_HOME/bin/sqlplus
(this means it is a GRID_HOME and not used for database connections)
Trying SQL connection
For each valid ORACLE_HOME candidate, dbcollect will try to connect to the database, either as SYSDBA (if no logons file is used) or using the provided logon definition.
- No logons:
timeout 10 $ORACLE_HOME/bin/sqlplus -S -L / as sysdba
- Logons:
timeout 10 $ORACLE_HOME/bin/sqlplus -S -L <logon>
The connection attempt may fail for all kinds of reasons. If the reason is ORA-01017 (logon denied)
or ORA-01034 (Oracle not available)
then a warning is issued and dbcollect will try to connect using the next ORACLE_HOME candidate, until all candidates are attempted and all of them failed, resulting in DBC-E027 (no valid connection).
It is also possible that the sqlplus command times out (if the database hangs or something else is wrong). This results in a timeout warning and again, the next candidate will be attempted.
Any other errors will result in dbcollect to abort with the DBC-E027 message.
Specifically for ORA-01017
, dbcollect will try to read $ORACLE_HOME/rdbms/lib/config.c
to check for the correct OSDBA group membership, and issue warning DBC-W011 with a message about the correct DBA group.
Multiple ORACLE_HOMEs using multiple Oracle users
For security reasons, dbcollect never runs as root and it will switch to either the first oracle user found (using the ps command) or the user given by --user
.
When not using a logons file, SQL*Plus will connect as SYSDBA, which requires the user to be member of the OSDBA
group which is hardcoded in the Oracle binaries (and defined in the config.c
file.
If the user is not member of the correct OSDBA group, SQLPlus will fail with ORA-01017
. dbcollect cannot switch to another user once it is running - which results in problems with this multiple Oracle user scenario.
Possible solutions:
- Add the user that dbcollect uses to the OSDBA user listed in the
DBC-W011
message. For example:gpasswd -a oraprod dbatest
. This may be removed after running dbcollect (gpasswd -d ...
- Create a special user for dbcollect that is member of all required OSDBA groups (as well as the group owning ASM block devices, usually
asmdba
orasmadmin
. The user should also be member of theoinstall
group, or at least have read access tooratab
and/or the Oracle inventory and the SQL*Plus executable. - Use a logons file for each instance (this will cause sqlplus not to connect as sysdba)
- Run dbcollect multiple times, one for each Oracle user, and use
--include|--exclude
to specify the instances belonging to each user. Optionally use the--filename
option to separate the dbcollect ZIP files.