DBCollect/Troubleshooting: Difference between revisions

From Dirty Cache Wiki
Latest revision as of 10:54, 9 December 2024


DBCollect Troubleshooting

Tips

  • Debug
If something fails, run dbcollect with the -D (debug) option. This will show the content of exception debug messages and print the full dbcollect.log logfile when finished.
  • Logfile
dbcollect writes log messages to /tmp/dbcollect.log. This file is moved into the dbcollect ZIP file when dbcollect finishes. View the contents of /tmp/dbcollect.log, or dump the log from the ZIP file (where hostname is the name of your host): unzip -qc /tmp/dbcollect-hostname.zip hostname/dbcollect.log
  • Sending failed results
If something goes wrong, you can send the ZIP file anyway. This allows me to inspect the debug messages and find out what went wrong.
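
On a system without the unzip command, the log can be read from the ZIP file with a few lines of Python. This is only a minimal sketch: the function name read_dbcollect_log is made up for illustration, and it assumes the log is stored as hostname/dbcollect.log inside the ZIP file, as in the unzip command above.

```python
import zipfile


def read_dbcollect_log(zip_path, hostname):
    """Return the text of <hostname>/dbcollect.log stored in a dbcollect ZIP file.

    Hypothetical helper, roughly equivalent to:
        unzip -qc /tmp/dbcollect-hostname.zip hostname/dbcollect.log
    """
    member = f"{hostname}/dbcollect.log"
    with zipfile.ZipFile(zip_path) as archive:
        # archive.read() raises KeyError if the member is not in the archive
        return archive.read(member).decode("utf-8", errors="replace")
```

For example, print(read_dbcollect_log("/tmp/dbcollect-hostname.zip", "hostname")) prints the whole log.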

Problems with executing DBCollect

dbcollect itself will not run for some reason.

See DBC:Errors/Executing

DBCollect Error and warning messages

dbcollect runs but spits out error or warning messages.

Almost all error and warning messages have a message ID. For errors, this is E followed by a 3-digit number; for warnings, it is W followed by a 3-digit number. The list of messages can be found here:
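
To see which message IDs occur in a log before looking them up, a short script along these lines may help. It is a sketch under assumptions: it only relies on the ID pattern described above (E or W followed by exactly three digits) and knows nothing about the actual layout of lines in dbcollect.log, so it may also match unrelated text that happens to look like an ID. The name count_message_ids is made up for illustration.

```python
import re
from collections import Counter

# E or W followed by exactly three digits, as a separate word
MESSAGE_ID = re.compile(r"\b([EW])(\d{3})\b")


def count_message_ids(log_text):
    """Count how often each error (Ennn) or warning (Wnnn) ID appears in log_text."""
    counts = Counter()
    for kind, number in MESSAGE_ID.findall(log_text):
        counts[kind + number] += 1
    return counts
```

Feed it the contents of /tmp/dbcollect.log and look up each reported ID in the list of messages.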

List of error and warning messages

DBCollect runs for a very long time

This is most likely due to performance issues in Oracle with the generation of AWR reports, especially if you run dbcollect on Oracle RAC. A number of known issues cause report generation to be very slow. I have observed up to 20 seconds per AWR report (usually about 0.5 seconds), making dbcollect run for an hour or more for a typical 10-day, 1-hour interval, single database cluster. Check these Support notes for fixes and workarounds: 2404906.1, 2565465.1, 2318124.1, 29932310.8, 2148489.1, 29470291.8. Or be very patient.
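
The arithmetic behind these numbers can be sketched as follows. This is a rough estimate, not how dbcollect itself measures anything: it counts only AWR report generation, assumes one report per snapshot interval per instance, and models parallel tasks as ideal rounds of equal length. The function and parameter names are made up for illustration.

```python
import math


def estimate_awr_seconds(days, interval_hours, seconds_per_report,
                         instances=1, tasks=1):
    """Rough estimate of the time spent generating AWR reports, in seconds."""
    # One report per snapshot interval, per instance (assumption)
    reports = int(days * 24 / interval_hours) * instances
    # Idealized model: with N parallel tasks, reports finish in ceil(reports / N) rounds
    rounds = math.ceil(reports / tasks)
    return rounds * seconds_per_report


# 10 days of 1-hour snapshots on one instance is 240 reports
slow = estimate_awr_seconds(10, 1, 20)     # 240 * 20 s  = 4800 s (80 minutes)
normal = estimate_awr_seconds(10, 1, 0.5)  # 240 * 0.5 s = 120 s (2 minutes)
```

Under these assumptions a single instance at 20 seconds per report already takes 80 minutes, and every additional RAC instance adds another 240 reports.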

Update: As of version 1.11, dbcollect runs multiple AWR reports in parallel for each instance, making it much faster. By default it uses 50% of the CPUs, with a maximum of 8.

To use all available CPUs, use --tasks 0. Be careful: CPU load will likely go to 100%.