Pythian Group

Love Your Data

Weighing the Pros and Cons of Oracle Autonomous Database

Tue, 2020-11-10 07:00

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 20, 2019.

The next generation of managed database services is upon us, offering organizations of all sizes the ability to scale database workloads quickly without forking out huge investments in either hardware or specialized expertise. One of these services is Oracle Autonomous Database (ADB), marketed by the company as the “world’s first self-driving database” that uses machine learning to replace previously manual tasks.

Sounds pretty cool – but is ADB a good choice for you and your organization?

What exactly is the Oracle Autonomous Database?

If you’re in any way used to working in an on-premises Oracle system, you’ll be at home in ADB, as it’s built on the same underlying (and well-known) technology: Oracle Exadata hardware combined with software such as Pluggable Database (PDB), which isolates each client to its own data neighborhood by preventing it from seeing outside its own PDB. These all run on Real Application Clusters (RAC) in a container database in one of Oracle’s cloud data centers, with Oracle managing the whole thing on either shared or dedicated underlying hardware.

It’s also worth pointing out that ADB is a relational database. That’s a clear differentiator between it and, say, Google BigQuery, a fantastic big data solution with the ability to scale to petabytes. ADB only scales to 128TB, but it comes with relational database management system (RDBMS) functionality such as referential integrity (primary-key / foreign-key relationships) and server-side code objects such as procedures, functions and triggers.

One important thing to keep in mind, however, is that ADB isn’t a standalone product or service. It’s actually an umbrella term for the combination of two underlying Oracle services: Autonomous Data Warehouse (largely tuned for data warehouse workloads with data stored in columnar format) and the Autonomous Transaction Processing database (tuned for Online Transaction Processing-type workloads using the more traditional row storage format). 

Makes sense. But what else is so great about it?

Think of it as essentially a fully-fledged Oracle database that’s far easier and less costly to procure and scale up and down than an on-premises counterpart. And instead of your IT team managing various services, ADB automatically deals with things such as:

  • Monitoring
  • Addressing underlying component errors
  • Fine-tuning and query stabilization
  • Backups and recoveries

It also provides many features users have come to expect from one of the most powerful relational database systems in the world. These include everything from simple tables to more advanced features such as server-side code (procedures, functions, triggers, types) and complex table and index types, along with the ability to monitor query performance using tools to which users have grown accustomed.

Sounds impressive. But what are the drawbacks?

We all know the saying “you’ve got to give to get,” and even though new features are constantly being added, it’s just as applicable to ADB as to many other things in life. Any managed database is great for scaling up and improving efficiency, but in exchange ADB sacrifices certain benefits and features typically found in an Oracle on-premises system. For example:

  • Database Vault, OLAP, Text, Multimedia, Workspace Manager and other features are not available.
  • There’s no control over database initialization parameters or tablespaces.
  • There’s no ability to control database instance memory sizes or configurations.
  • You can’t monitor via Oracle Enterprise Manager.

PDB lockdown profiles also restrict certain commands such as ALTER TABLESPACE.
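For context, these restrictions come from Oracle’s standard multitenant lockdown profile mechanism. Below is a minimal sketch of how such a profile works on a self-managed multitenant database, loosely following the documented syntax; the profile name is made up, and the profiles ADB actually applies are Oracle-managed and far more extensive:

SQL> create lockdown profile adb_like_profile;
SQL> alter lockdown profile adb_like_profile disable statement = ('ALTER SYSTEM');
SQL> alter lockdown profile adb_like_profile enable statement = ('ALTER SYSTEM') clause = ('FLUSH SHARED_POOL');
SQL> alter system set pdb_lockdown = adb_like_profile;

Once such a profile is active, the disabled statements fail inside the PDB with an “insufficient privileges”-style error, which is the behavior ADB users see for commands such as ALTER TABLESPACE.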

Interested in learning more about Oracle Autonomous Database? This post only scratches the surface of the deep dive that is our new white paper on ADB and whether it might be a good fit at your organization. 

Download the Oracle Autonomous Database White Paper to continue your evaluation of one of the most powerful cloud databases in the world.

Categories: DBA Blogs

Issues With Database Service Names Starting With Pluggable Database (PDB) Name

Thu, 2020-11-05 08:34

In this post I’ll describe a problem which occurs when creating and using a database service name that starts with the pluggable database (PDB) name followed by a dot (“.”): “<pdb_name>.<optional_suffix>”. I noticed this issue while trying to reset an expired user’s password by connecting to that service.

I’ll also describe other anomalies, for example, being able to stop the service, but not start it, as well as the service remaining active even if the PDB gets dropped. I also acknowledge there may be other side effects in addition to those I describe here. The workaround is relatively simple — don’t create services with the format “<pdb_name>.<optional_suffix>.” Or, said differently, don’t start the service name with “<pdb_name>.”

It’s worth noting that the problem reproduces on 18c and 19c, but not on 12.1 and 12.2.

As of now, the development team has created a predefect — “Bug 32009641: ORA-01017 WHEN CONNECTING TO A PDB USING A SERVICE NAME WHICH STARTS WITH THE PDB” — though at the time of writing this post, it’s not published yet, as the team has to confirm whether it is a bug, and whether it can be published.

While it’s unlikely that many users will run into the issue described in this post, it’s good to be aware of it, and to avoid creating service names starting with “<pdb_name>.” until this issue is fixed. Mine is only a simple use case, and there could be other side effects not discussed here.

Setting up the test environment to reproduce the problem

I performed the steps below on a 19.8.0 DB; no ASM or GI managed resources are present. I start with a new PDB named PDB1, and open it:

SQL> create pluggable database pdb1 admin user pdb_adm identified by oracle create_file_dest='/u02/oradata/CDB1';

Pluggable database created.

The default CDB services are started, along with PDB1’s default service “pdb1,” which is associated with CON_ID=3. This identifies the initial current container for the service:

CDB$ROOT:

SQL> select con_id, name from v$services order by con_id, name;

    CON_ID NAME
---------- ---------------
         1 CDB2
         1 CDB2XDB
         1 SYS$BACKGROUND
         1 SYS$USERS
         3 pdb1

SQL> select con_id, name from cdb_services order by con_id, name;

    CON_ID NAME
---------- -----------------
         1 CDB2
         1 CDB2.localdomain
         1 CDB2XDB
         1 SYS$BACKGROUND
         1 SYS$USERS
         3 PDB1

6 rows selected.

PDB1:

SQL> select con_id, name from v$services order by con_id, name;

    CON_ID NAME
---------- -----
         3 pdb1

SQL> select con_id, name from cdb_services order by con_id, name;

    CON_ID NAME
---------- -----
         3 PDB1

Next, I create and start services in PDB1, choosing names that cover various combinations of “pdb1” with and without a trailing dot:

SQL> alter session set container=pdb1;

exec dbms_service.create_service(service_name=>'pdb1.test1',network_name=>'pdb1.test1');
exec dbms_service.create_service(service_name=>'test2.pdb1',network_name=>'test2.pdb1');
exec dbms_service.create_service(service_name=>'test3.pdb1.app',network_name=>'test3.pdb1.app');
exec dbms_service.create_service(service_name=>'pdb1_A',network_name=>'pdb1_A');
exec dbms_service.create_service(service_name=>'pdb1_B.app',network_name=>'pdb1_B.app');
exec dbms_service.create_service(service_name=>'custom_service.app',network_name=>'custom_service.app');

exec dbms_service.start_service(service_name=>'pdb1.test1');
exec dbms_service.start_service(service_name=>'test2.pdb1');
exec dbms_service.start_service(service_name=>'test3.pdb1.app');
exec dbms_service.start_service(service_name=>'pdb1_A');
exec dbms_service.start_service(service_name=>'pdb1_B.app');
exec dbms_service.start_service(service_name=>'custom_service.app');

I’ll be using the below script to display the services’ status:

SQL> !cat s.sql
set lin 500 pagesize 9999
col name for a50

show con_name
prompt
prompt ** Output from v$active_services: **
select con_id, name from v$active_services where lower(name) like '%pdb%' or lower(name) like '%app%' order by con_id, name;

prompt ** Output from cdb_services: **
select con_id, name from cdb_services where lower(name) like '%pdb%' or lower(name) like '%app%' order by con_id, name;

Let’s check the status of the above-created services:

CDB$ROOT:

** Output from v$active_services: **

    CON_ID NAME
---------- ------------------
         1 pdb1.test1        <<--
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- ------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1        <<--
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

PDB1:

** Output from v$active_services: **

    CON_ID NAME
---------- ------------------
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

6 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- ------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

As you can see, the “pdb1.test1” service is apparently associated with the root container (v$active_services.con_id=1), and not with PDB1 as configured (cdb_services.con_id=3). All the other created services are reported as running in PDB1 (v$active_services.con_id=3).
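As an aside, if you want to audit an existing environment for services that could trigger this behavior, a simple sanity check (a sketch, using only the views already queried above) is to compare running service names against PDB names:

SQL> -- flag running services whose name begins with a PDB name followed by a dot
SQL> select s.name service_name, p.name pdb_name, s.con_id
  2  from v$active_services s, v$pdbs p
  3  where lower(s.name) like lower(p.name) || '.%';

In this setup, the query would flag pdb1.test1 (note that underscores in a PDB name would act as LIKE wildcards here, so treat matches as candidates to review).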

Why is this a problem?

Let’s create two local users in PDB1 — u1 and u2 — and set u1’s password to expired:

SQL> grant create session to u1 identified by oracle;
SQL> grant create session to u2 identified by oracle;
SQL> alter user u1 password expire;

If I try to connect as u2, everything works without problems:

$ sqlplus u2/oracle@localhost/pdb1.test1

SQL> select sys_context('userenv','cdb_name') cdb_name, sys_context('userenv','con_name') con_name, sys_context('userenv','service_name') service_name from dual;

CDB_NAME             CON_NAME             SERVICE_NAME
-------------------- -------------------- --------------------
CDB2                 PDB1                 pdb1.test1

However, if I try to connect to u1 and change its password, this is what happens:

$ sqlplus u1/oracle@localhost/pdb1.test1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 11:57:56 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

ERROR:
ORA-28001: the password has expired


Changing password for u1
New password:
Retype new password:
ERROR:
ORA-01017: invalid username/password; logon denied


Password unchanged
Enter user-name:

Analysis

Tracing the session with oradebug (events 10046 and 10079) before entering the new password shows that the session gets established via the pdb1.test1 service, but is traced under “CDB$ROOT(1)”:

*** 2020-10-26T12:01:31.706897+01:00 (CDB$ROOT(1))                    <<-- traced at "CDB$ROOT(1)" level
*** SESSION ID:(269.54910) 2020-10-26T12:01:31.706934+01:00
*** SERVICE NAME:(pdb1.test1) 2020-10-26T12:01:31.706937+01:00        <<-- service pdb1.test1 was used when connecting
*** MODULE NAME:(sqlplus@hol.localdomain (TNS V1-V3)) 2020-10-26T12:01:31.706940+01:00
*** ACTION NAME:() 2020-10-26T12:01:31.706943+01:00
*** CONTAINER ID:(1) 2020-10-26T12:01:31.706945+01:00
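For reference, a trace like this can be enabled from a second SYSDBA session with oradebug; a rough sketch, where the OS process ID and trace levels are illustrative:

SQL> oradebug setospid 12345
SQL> oradebug event 10046 trace name context forever, level 12
SQL> oradebug event 10079 trace name context forever, level 2
SQL> oradebug tracefile_name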

Without going into the details of the trace, we can get a clue about what’s happening by checking the audit log.

From PDB1:

SQL> select * from (
select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text
from sys.aud$
where userid = 'U1'
  and returncode <> 0
order by ntimestamp# desc
) where rownum = 1;

NTIMESTAMP#                    USERID     USERHOST             TERMINAL          ACTION# RETURNCODE CURRENT_USER    COMMENT$TEXT
------------------------------ ---------- -------------------- --------------- --------- ---------- --------------- ----------------------------------------------------------------------
26-OCT-20 10.57.56.687258 AM   U1         hol.localdomain      pts/6                 100      28001 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client address:
                                                                                                                     (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=58922))

From CDB$ROOT:

SQL> select * from (
select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text
from sys.aud$
where userid = 'U1'
  and returncode <> 0
order by ntimestamp# desc
) where rownum = 1;

NTIMESTAMP#                    USERID   USERHOST         TERMINAL   ACTION# RETURNCODE CURRENT_USER    COMMENT$TEXT
------------------------------ -------- ---------------- ---------- ------- ---------- --------------- -----------------------------------------------------------------
26-OCT-20 11.01.32.725820 AM   U1       hol.localdomain  pts/6          100       1017 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add
                                                                                                       ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=58922))

As you can see, when entering the password for the u1 user, we get “ORA-28001: the password has expired,” which is recorded in the PDB’s audit log. Then, after changing the password, an attempt is made to connect to the root container as u1. Since u1 is not a common user, the login fails, and we get the “ORA-01017: invalid username/password; logon denied” error.
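A quick way to confirm that u1 is a local rather than a common user (a sketch; a common user would also appear under CON_ID=1) is:

SQL> select con_id, username, common from cdb_users where username = 'U1';

Since u1 exists only in PDB1 (CON_ID=3, COMMON=NO), authentication against CDB$ROOT has no matching account, hence the ORA-01017.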

What happens if I try to repeat the same steps using the “test2.pdb1” service name, which, according to v$active_services, maps to CON_ID=3, and thus to PDB1?

This time the password change succeeds:

$ sqlplus u1/oracle@localhost/test2.pdb1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 12:38:04 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

ERROR:
ORA-28001: the password has expired


Changing password for u1
New password:
Retype new password:
Password changed

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.8.0.0.0

SQL>

Enabling tracing on the session before the initial password is provided shows that, this time, the trace gets generated in the context of “PDB1(3).” Notice the difference from the previous trace header, where “CDB$ROOT(1)” was reported:

*** 2020-10-26T12:39:18.045405+01:00 (PDB1(3))                     <<-- traced at "PDB1(3)" level
*** SESSION ID:(269.12918) 2020-10-26T12:39:18.045547+01:00
*** SERVICE NAME:(test2.pdb1) 2020-10-26T12:39:18.045557+01:00     <<-- service test2.pdb1 was used when connecting
*** MODULE NAME:(sqlplus@hol.localdomain (TNS V1-V3)) 2020-10-26T12:39:18.045564+01:00
*** ACTION NAME:() 2020-10-26T12:39:18.045570+01:00
*** CONTAINER ID:(3) 2020-10-26T12:39:18.045574+01:00

Checking PDB1’s audit log contents, there’s an initial ORA-28001, followed by RETURNCODE=0, thus a successful login:

SQL> select * from (
select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text
from sys.aud$
where userid = 'U1'
order by ntimestamp# desc
) where rownum <=2;

NTIMESTAMP#                    USERID   USERHOST         TERMINAL   ACTION# RETURNCODE CURRENT_USER    COMMENT$TEXT
------------------------------ -------- ---------------- ---------- ------- ---------- --------------- -----------------------------------------------------------------
26-OCT-20 11.39.18.069634 AM   U1       hol.localdomain  pts/6          100          0 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add
                                                                                                       ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59244))

26-OCT-20 11.38.04.992001 AM   U1       hol.localdomain  pts/6          100      28001 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add
                                                                                                       ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59244))

There are no new entries for the u1 user in CDB$ROOT’s audit log.

Other anomalies

Start / stop service

There are other anomalies, too. For example, the “pdb1.test1” service can be stopped from PDB1, but then it cannot be started again:

From PDB1:

SQL> exec dbms_service.stop_service(service_name=>'pdb1.test1');

PL/SQL procedure successfully completed.

Querying v$active_services from PDB1 or CDB$ROOT, the service is not displayed. However, if I try to start it:

SQL> exec dbms_service.start_service(service_name=>'pdb1.test1');
BEGIN dbms_service.start_service(service_name=>'pdb1.test1'); END;

*
ERROR at line 1:
ORA-44773: Cannot perform requested service operation.
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 63
ORA-06512: at "SYS.DBMS_SERVICE", line 486
ORA-06512: at line 1


SQL>
Closing and / or dropping PDB1

Going back to the initial setup, where all of the services I defined, except “pdb1.test1,” are mapped to PDB1:

SQL> @s

CON_NAME
------------------------------
CDB$ROOT

** Output from v$active_services: **

    CON_ID NAME
---------- -------------------
         1 pdb1.test1
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- -------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

Let’s close PDB1, and check the services status:

SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT

SQL> alter pluggable database pdb1 close immediate;

Pluggable database altered.

SQL> @s

CON_NAME
---------
CDB$ROOT

** Output from v$active_services: **

    CON_ID NAME
---------- -----------
         1 pdb1.test1
         3 pdb1

** Output from cdb_services: **

no rows selected

SQL>

All of the user-created services were stopped, except for pdb1.test1. If I try to connect to PDB1 as user u2, I get “ORA-01109: database not open.”

Let’s drop the PDB:

SQL> drop pluggable database pdb1 including datafiles;

Pluggable database dropped.

SQL> @s

CON_NAME
---------
CDB$ROOT

** Output from v$active_services: **

    CON_ID NAME
---------- -----------
         1 pdb1.test1

** Output from cdb_services: **

no rows selected

SQL>

Connecting to PDB1 as user u2 now returns “ORA-01017: invalid username/password; logon denied,” which indicates that an authentication attempt is still being performed. Since only CDB$ROOT is open, let’s see if I can connect as system:

$ sqlplus system/oracle@localhost/pdb1.test1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 13:05:59 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Last Successful login time: Mon Oct 26 2020 11:38:01 +01:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.8.0.0.0

SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL>
Conclusion

The above are relatively simple use cases, and until this issue is fixed, it’s advisable to avoid creating service names starting with “<pdb_name>.” as there could be other implications not discussed here. You may also want to keep an eye on the status of the predefect tracked as Bug 32009641.

Categories: DBA Blogs

Reducing Contention on Hot Cursor Objects (Cursor: Pin S)

Tue, 2020-10-20 11:37

First, let me offer a little explanation about the wait event “cursor: pin S.”

Oracle states: “A session waits on this event when it wants to update a shared mutex pin and another session is currently in the process of updating a shared mutex pin for the same cursor object.” In other words, two or more sessions are trying to concurrently run the same statement (the same cursor in library cache), which forces them to compete to update a shared mutex pin for the same cursor object.

This wait event provides very useful information for identifying why sessions are competing to update a shared mutex pin: the P1 column of v$session holds the hash value of the statement whose cursor object the mutex protects, which is the cursor the sessions are fighting over.

Here’s how a mutex works:

If a session wants to use a cursor, that cursor must not disappear from the library cache while it’s in use. The session uses a mutex to ensure the cursor cannot be changed or deleted; to this end, it records that there is an interested session by incrementing the mutex usage by one. This is called taking a shared lock.

The process for taking a shared lock:
  1. A session wants to run a cursor and so checks the owning cursor pin mutex to see if there is a session waiting to change the mutex (e.g. performing a hard-parse). It does this by checking the high-order bits to see if they are zero or have a session ID.
  2. If the high-order bits are zero, then it locks and increments by one (this is an atomic action). Waiting to lock and increment causes the “cursor: pin S” wait event. This increment is done on the low-order bits of the mutex.
  3. If the lock and increment fails, then some other session must be updating the mutex, so it’s necessary to sleep and try again. The “cursor: pin S” wait will last longer, and it can cause extra CPU load on the server as the session spins attempting to update the mutex.
  4. If the high-order bits are not zero then there is a session waiting to change the mutex. The current interested session waits on the event “cursor: pin S wait on X.” If this is the case then it sleeps and tries again.
  5. Once the cursor is closed and finished, the shared lock on the mutex must be released by performing a lock and decrementing by one. Once again, if there is a failure to lock and decrement the next step is to sleep and try again.

If a session wants to perform a hard parse on a cursor already existing in the library cache it must acquire the mutex in exclusive mode.

The process for taking an exclusive lock:
  1. A session wants to perform a hard parse on a statement so it checks the cursor pin mutex to see if it’s in use.
  2. It checks the high-order bits and, if zero, updates the high-order bits to the current session ID (this compare-and-swap routine is a CPU atomic action).
  3. If the high-order bits are already set, the process has to wait on the event “cursor: pin X.” The session then sleeps and tries again.
  4. Once the high-order bits are set to the current session ID, it checks the low-order bits to see if the cursor is currently in use.
  5. If the low-order bits are not zero, it must wait for the counter to decrement to zero (Note: the counter cannot be incremented once the high-order bits are set to the session ID).
  6. Once the low-order bits are set to zero then the hard parse can proceed.
  7. The session removes the exclusive mutex lock by resetting the high-order bits to zero.
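Both flows explain why mutex sleeps surface as wait events. To see which mutexes sessions have actually been sleeping on, one option (sketched here) is the v$mutex_sleep view, where rows with MUTEX_TYPE = 'Cursor Pin' correspond to the waits discussed in this post:

SQL> select mutex_type, location, sleeps, wait_time
  2  from v$mutex_sleep
  3  order by sleeps desc;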
Real Production Environment

Oracle states that the wait event should be very rare, since updating the shared mutex pin is extremely fast.

This is true in most cases, but we recently ran into a scenario where a vast number of sessions were waiting on “cursor: pin S” while running different statements (v$session showed different sql_ids). As Oracle states, taking the shared lock was very fast, but it still had a big impact on performance. As the load from the application grew (in this case exponentially, as this was an online retailer with increased traffic due to COVID-19), the impact became more and more evident.

In this scenario, we found a number of statements that shared the same characteristic: they all invoked the same function which, among other tasks, ran a simple select statement. As they were all trying to run the select statement inside the function, they were all competing for access to the shared mutex pin for that specific select statement.

The best approach to reduce the contention on the shared mutex for the statement is to mark it as hot. This allows Oracle to split the executions of this statement among several “SQL-copies.” Once the statement is marked as hot, Oracle will stop running the original statement and will, instead, split the executions among the different “SQL-copies” of the cursor object. Since each “SQL-copy” has its own shared mutex, load on the single mutex will be split among the copies generated by this mechanism.

Even though it doesn’t eliminate the waits on “cursor: pin S,” this technique proved very effective, dramatically reducing waits and improving performance in a real production environment with around 9,000 executions of the same SQL statement per second.

Below is a step-by-step guide to create a scenario very similar to the issue we encountered, along with the solution implemented to overcome its impact on performance. I also cover some important topics to keep in mind when you reboot the database, and when dealing with RAC environments.

Creating a test environment to reproduce waits on “cursor: pin S”

In order to create the test environment, I created a dummy table, a function, a shell script and five SQL scripts.

1. Create a dummy table (code_table) and populate it with some simple values using a for loop.

SQL> create table code_table (code_name char(1), low_value number, high_value number);

Table created.

SQL> declare
  2    letters char(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  3    v_num number := 1;
  4  begin
  5    for i in 1..26 LOOP
  6      insert into code_table values (substr(letters,i,1), i*v_num, i*(v_num+1000));
  7      v_num := v_num + 1001;
  8    end loop;
  9    commit;
 10  end; 
 11  /

PL/SQL procedure successfully completed.

2. Create a function (fx_num) that contains the select statement that will force the sessions to wait on “cursor: pin S”.

SQL> create or replace function fx_num (v_name varchar) return number is
  2    v_low number;
  3    v_high number;
  4  begin
  5    select low_value, high_value into v_low, v_high from code_table where code_name=v_name;
  6    return(DBMS_RANDOM.value(low => v_low, high => v_high));
  7 end;
  8 /

Function created.

3. Create a shell script (launch_test.sh) that will spawn multiple concurrent sessions to mimic high concurrency on the cursor object.

::::::::::::::::
launch_test.sh
::::::::::::::::

#!/bin/bash

export ORACLE_SID=proddb2
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/db
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH

for i in {1..50}
do
nohup sqlplus -s user/pass @launch_sql1.sql >> launch_sql1.log 2>&1 &
nohup sqlplus -s user/pass @launch_sql2.sql >> launch_sql2.log 2>&1 &
nohup sqlplus -s user/pass @launch_sql3.sql >> launch_sql3.log 2>&1 &
nohup sqlplus -s user/pass @launch_sql4.sql >> launch_sql4.log 2>&1 &
done

exit 0

4. Create four SQL scripts (launch_sql[1-4].sql), each containing a simple query that calls the function created previously. Each of these scripts will be executed 50 times, thus simulating 200 concurrent sessions calling the function (fx_num).

::::::::::::::
launch_sql1.sql
::::::::::::::

select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual;
exit
::::::::::::::
launch_sql2.sql
::::::::::::::

select object_name from user_objects where object_id < fx_num(substr(object_id,1,1));
exit
::::::::::::::
launch_sql3.sql
::::::::::::::

select name, count(*) from user_source where line < fx_num(substr(name,1,1)) group by name;
exit
::::::::::::::
launch_sql4.sql
::::::::::::::

select trunc(last_analyzed), sample_size, count(*) from user_tab_cols where column_id < fx_num(substr(column_name,1,1)) group by trunc(last_analyzed), sample_size;
exit

5. Finally, create a SQL script (check_waits.sql) to list the sessions waiting on “cursor: pin S.” As stated before, the P1 column of the gv$session view shows the hash value of the statement protected by the shared mutex; displaying this value will help identify the root cause of the waits on “cursor: pin S.”

:::::::::::::::::
check_waits.sql
:::::::::::::::::

set lines 200 pages 200
prompt - Waits on cursor pin S:
select inst_id, substr(event,1,30) event, p1, sql_id, count(*)
from gv$session
where event = 'cursor: pin S'
group by inst_id, substr(event,1,30), p1, sql_id;
exit
Running the test scenario

First, let’s generate the waits on “cursor: pin S”:

1. In order to check for session waiting on the wait event “cursor: pin S,” I run the SQL script check_waits.sql on an endless while loop as follows:

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @check_waits.sql; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

no rows selected

2. On a second terminal, I execute the shell script launch_test.sh to generate 200 sessions by executing the four SQL scripts 50 times each. This step forces the database to receive 200 executions of different select statements that invoke the function called fx_num.

[oracle@oradb02 ddml]$ ./launch_test.sh

3. On the first terminal, we can now see a lot of sessions waiting on “cursor: pin S.” As you can see in the following output, instance two registers up to 157 sessions waiting on this event at one point in time.

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @check_waits; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

INST_ID EVENT         P1        SQL_ID        COUNT(*)
------- ------------- --------- ------------- -------- 
      2 cursor: pin S 356306711 2ndpau148y2by      14
      2 cursor: pin S 356306711 7tr4jwnamtmsr      55
      2 cursor: pin S 356306711                    23
      2 cursor: pin S 356306711 a3xkbsayc47kq      13

- Waits on cursor pin S:

INST_ID EVENT         P1        SQL_ID        COUNT(*) 
------- ------------- --------- ------------- -------- 
      2 cursor: pin S 356306711 5n3qfbb42gfdr        1
      2 cursor: pin S 356306711 2ndpau148y2by       11
      2 cursor: pin S 356306711 7tr4jwnamtmsr       84
      2 cursor: pin S 356306711                     42
      2 cursor: pin S 356306711 a3xkbsayc47kq       19

- Waits on cursor pin S:

INST_ID EVENT         P1        SQL_ID        COUNT(*)
------- ------------- --------- ------------- --------
      2 cursor: pin S 356306711 2ndpau148y2by        1
      2 cursor: pin S 356306711 7tr4jwnamtmsr        4
      2 cursor: pin S 356306711 a3xkbsayc47kq        6

4. The view gv$sql shows the statements these sessions were executing while waiting on “cursor: pin S.” You can see both the actual problematic SQL statement (SQL_ID 7tr4jwnamtmsr) and the calling statements from the SQL scripts (launch_sqlX.sql).

SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql
  2  where hash_value=356306711
  3     or sql_id in ('2ndpau148y2by','7tr4jwnamtmsr','a3xkbsayc47kq','5n3qfbb42gfdr')
  4  group by sql_id, hash_value, sql_text order by 1;

SQL_ID        HASH_VALUE EXECUTIONS SQL_TEXT
------------- ---------- ---------- -----------------------------------------------------------------------------------------------------
2ndpau148y2by 1217333630        800 select job_name, count(*) from job_masters where job_id > fx_num(substr(owner,1,1)) group by job_name
5n3qfbb42gfdr 3358046647        800 select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual
7tr4jwnamtmsr 356306711     3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
a3xkbsayc47kq 3166838358        800 select trunc(timestamp), completion_status, count(*) from job_masters where job_id <  
                                    fx_num(substr(job_name,1,1)) group by trunc(timestamp), completion_status

5. If you take a look at column P1, you’ll notice that even though the sessions are running different statements (column SQL_ID), they are all trying to access the same shared mutex. All the sessions point to the same hash_value (356306711): the select statement in the function fx_num.

6. In addition, we can see the number of executions of the select statement in the function (sql_id 7tr4jwnamtmsr) is much higher than those of the select statements in the SQL scripts (launch_sqlX.sql). This makes total sense, since the statements in the SQL scripts call the function fx_num for each row. This extremely high number of concurrent executions of the function is the root cause of the high waits on “cursor: pin S”: the sessions are struggling to access the shared mutex that protects the select statement in the function.
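Putting points 5 and 6 together, the mapping from each waiter’s P1 value to the statement the mutex protects can also be resolved in a single query; a sketch (expect one row per child cursor):

SQL> select s.inst_id, s.sql_id waiting_sql, s.p1 mutex_hash, q.sql_id protected_sql
  2  from gv$session s, gv$sql q
  3  where s.event = 'cursor: pin S'
  4    and q.inst_id = s.inst_id
  5    and q.hash_value = s.p1;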

Implementing the recommended fix

The best approach to fixing this performance issue is to mark the select statement in the function as hot. As stated earlier, this tells Oracle to split the executions of this statement among a number of “SQL-copies,” which also splits the load among different cursor objects protected by different shared mutexes.

1. The first step is to set the hidden parameter _kgl_hot_object_copies.

This parameter governs the number of “SQL-copies” Oracle will create for each statement marked as hot. The recommendation is to set it to half the number of CPUs in the server. Keep in mind you will need to restart the database for this change to take effect, so plan accordingly.

SQL> alter system set '_kgl_hot_object_copies'=8 scope=spfile sid='*';

System altered

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORACLE instance started.
(...)
Database mounted.
Database opened.
SQL> select ksppinm, ksppstvl from x$ksppi a, x$ksppsv b, v$instance
  2  where a.indx=b.indx AND substr(ksppinm,1,1) = '_' AND ksppinm = '_kgl_hot_object_copies';

KSPPINM                    KSPPSTVL
-------------------------- ----------
_kgl_hot_object_copies     8

2. Once you set the parameter _kgl_hot_object_copies and restart the database, you have to mark the SQL suffering the waits on “cursor: pin S” as hot. Oracle provides the procedure dbms_shared_pool.markhot for this purpose.

3. You need to confirm that the cursor you’re planning to mark as hot is already in the library cache. You can use the following query to confirm this, and also to get the full_hash_value, which you must provide to mark the statement as hot.

SQL> select kglnahsh, kglnahsv from v$sql, x$kglob where kglhdadr=address and sql_id = '7tr4jwnamtmsr';

KGLNAHSH   KGLNAHSV
---------- --------------------------------
356306711  eb4cdceda1c495cd7cdc91e5153ccf17

4. The following statement marks the SQL as hot.

SQL> begin
  2    dbms_shared_pool.markhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /

PL/SQL procedure successfully completed.

5. You can query the view gv$db_object_cache to see on which instance the statement has been marked as hot via the PROPERTY column.

SQL> select inst_id, hash_value, namespace, child_latch, property, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value = 356306711
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH PROPERTY STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------
      2 356306711  SQL AREA            0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

6. Keep in mind the procedure will only mark the statement as hot if it finds it in the library cache. Because of this, gv$db_object_cache shows the statement marked as hot only on instance two, even though I executed the procedure dbms_shared_pool.markhot with the global argument set to true.

Running the test scenario one more time

1. With the statement set to hot, let’s try the same test one more time. Again, I run the check_waits.sql script on a while loop.

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @chk; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

no rows selected

2. Same as before, let’s launch the test on a separate terminal.

[oracle@oradb02 ddml]$ ./launch_test.sh

3. Instance two now shows just a few sessions waiting on “cursor: pin S.” The total number of concurrent waits dropped to just eight sessions.

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @chk; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

INST_ID EVENT         P1         SQL_ID        COUNT(*)
------- ------------- ---------- ------------- --------
      2 cursor: pin S 1835995903 ddttvdtqqy4rz        1
      2 cursor: pin S 939937426  2ndpau148y2by        2
      2 cursor: pin S 4244290431 2ndpau148y2by        1
      2 cursor: pin S 939937426                       1
      2 cursor: pin S 939937426 dg9n6z0w0cmnk         2
      2 cursor: pin S 4244290431 61m38g7ygpfvz        1

6 rows selected.

- Waits on cursor pin S: 

no rows selected

- Waits on cursor pin S: 

no rows selected

4. While the view gv$sql shows an increase in the number of executions of the select statements in the SQL scripts (launch_sqlX.sql), the number of executions for the SQL in the function fx_num (sql_id 7tr4jwnamtmsr) remains unchanged (still 3854606).

SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql
  2  where hash_value=356306711
  3     or sql_id in ('2ndpau148y2by','7tr4jwnamtmsr','a3xkbsayc47kq','5n3qfbb42gfdr')
  4  group by sql_id, hash_value, sql_text order by 1;

SQL_ID        HASH_VALUE EXECUTIONS SQL_TEXT 
------------- ---------- ---------- ------------------------------------------------------------------------------------------------------
2ndpau148y2by 1217333630       1600 select job_name, count(*) from job_masters where job_id > fx_num(substr(owner,1,1)) group by job_name
5n3qfbb42gfdr 3358046647       1600 select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual
7tr4jwnamtmsr 356306711     3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
a3xkbsayc47kq 3166838358       1600 select trunc(timestamp), completion_status, count(*) from job_masters where job_id < 
                                    fx_num(substr(job_name, 1,1)) group by trunc(timestamp), completion_status

5. If we filter by the SQL signature of the statement in the function fx_num, you can see there is now a group of new statements with a high number of executions; these were executed as part of the second test. As a side note, the SQL signatures of all these sql_ids match because they all share the same SQL text.

SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql
  2  where force_matching_signature = (select force_matching_signature from gv$sql where hash_value=356306711)
  3  group by sql_id, hash_value, sql_text order by 1;

SQL_ID        HASH_VALUE EXECUTIONS SQL_TEXT
------------- ---------- ---------- ------------------------------------------------------------------------------------------------------
0u9hxt3azayv5 3589634917     125688 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
1far8p5csfrmx 1502043773     142500 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
5bacfy8mwthrj 667730673       90651 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
61m38g7ygpfvz 4244290431     140775 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
7tr4jwnamtmsr 356306711     3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
9pyrc45h3tgg9 1614593513     148272 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
arg5wx08suqm0 294476384      117839 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
ddttvdtqqy4rz 1835995903     104122 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
dg9n6z0w0cmnk 939937426      102875 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

9 rows selected.

6. The view gv$db_object_cache shows these statements have been identified as hot copies (HOTCOPY1 through HOTCOPY8) of the statement in the function fx_num. We have exactly eight hot copies, as defined by the hidden parameter _kgl_hot_object_copies.

SQL> select inst_id, hash_value, namespace, child_latch, property, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH PROPERTY STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      2 294476384  SQL AREA            0 HOTCOPY4 VALID      117082 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 HOTCOPY4 VALID      117159 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 HOTCOPY2 VALID       90188 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 HOTCOPY2 VALID       90218 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 HOTCOPY8 VALID      102067 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 HOTCOPY8 VALID      101647 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 HOTCOPY5 VALID      141347 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 HOTCOPY5 VALID      141440 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 HOTCOPY6 VALID      147205 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 HOTCOPY6 VALID      147329 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 HOTCOPY3 VALID      103453 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 HOTCOPY3 VALID      103484 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 HOTCOPY1 VALID      124746 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 HOTCOPY1 VALID      125095 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 HOTCOPY7 VALID      139691 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 HOTCOPY7 VALID      139803 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

18 rows selected.

7. If we keep running this same test over and over, we can confirm that the number of executions of the original sql_id (marked as hot) remains unchanged, while executions of the “SQL-copies” continue to grow. In other words, the “SQL-copies” now perform all executions of this statement.

[oracle@oradb02 ddml]$ ./launch_test.sh

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache 
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ---------------------------------------------------------------------- 
      2 294476384  SQL AREA             0 HOTCOPY4 VALID 240474     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA         88672 HOTCOPY4 VALID 240582     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA             0 HOT      VALID 3657733    SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA         53015 HOT      VALID 3576660    SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA             0 HOTCOPY2 VALID 206588     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA         49905 HOTCOPY2 VALID 206662     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA             0 HOTCOPY8 VALID 228203     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA         20114 HOTCOPY8 VALID 228118     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA             0 HOTCOPY5 VALID 275321     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA         89725 HOTCOPY5 VALID 275482     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA             0 HOTCOPY6 VALID 280358     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA         48617 HOTCOPY6 VALID 279606     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA             0 HOTCOPY3 VALID 201829     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA         70399 HOTCOPY3 VALID 201773     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA             0 HOTCOPY1 VALID 244295     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA         97125 HOTCOPY1 VALID 244809     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA             0 HOTCOPY7 VALID 253317     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA         47999 HOTCOPY7 VALID 253521     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

18 rows selected.
Keeping statements as hot on a RAC database

1. As mentioned earlier, statements cannot be marked as hot if they are not first loaded into the library cache. For this reason, the statement in our example was not marked as hot on instance one.

2. So now let’s run the load test on instance one to see what happens.

[oracle@oradb01 ddml]$ ./launch_test.sh

3. Below is an example of what you’d see if the statement is used on both instances but only marked as hot on one of the instances (inst_id=2).

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache 
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1 356306711  SQL AREA            0          VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA        53015          VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA            0 HOTCOPY4 VALID      240474 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 HOTCOPY4 VALID      240582 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 HOTCOPY2 VALID      206588 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 HOTCOPY2 VALID      206662 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 HOTCOPY8 VALID      228203 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 HOTCOPY8 VALID      228118 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 HOTCOPY5 VALID      275321 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 HOTCOPY5 VALID      275482 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 HOTCOPY6 VALID      280358 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 HOTCOPY6 VALID      279606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 HOTCOPY3 VALID      201829 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 HOTCOPY3 VALID      201773 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 HOTCOPY1 VALID      244295 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 HOTCOPY1 VALID      244809 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 HOTCOPY7 VALID      253317 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 HOTCOPY7 VALID      253521 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

20 rows selected.

4. After running the procedure dbms_shared_pool.markhot one more time, we can see that the fix is now working correctly on both instances of the RAC.

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache 
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1 294476384  SQL AREA            0 HOTCOPY4 VALID      235617 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 294476384  SQL AREA        88672 HOTCOPY4 VALID      235190 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA            0 HOT      VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA        53015 HOT      VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA            0 HOTCOPY2 VALID      243111 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA        49905 HOTCOPY2 VALID      243114 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA            0 HOTCOPY8 VALID      272199 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA        20114 HOTCOPY8 VALID      272379 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA            0 HOTCOPY5 VALID      223078 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA        89725 HOTCOPY5 VALID      223230 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA            0 HOTCOPY6 VALID      254701 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA        48617 HOTCOPY6 VALID      254807 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA            0 HOTCOPY3 VALID      238938 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA        70399 HOTCOPY3 VALID      238361 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA            0 HOTCOPY1 VALID      231836 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA        97125 HOTCOPY1 VALID      230782 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA            0 HOTCOPY7 VALID      225743 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA        47999 HOTCOPY7 VALID      225810 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA            0 HOTCOPY4 VALID      129780 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 HOTCOPY4 VALID      129861 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0 HOT      VALID     4565713 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     4469646 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 HOTCOPY2 VALID      120900 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 HOTCOPY2 VALID      121150 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 HOTCOPY8 VALID      120642 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 HOTCOPY8 VALID      120730 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 HOTCOPY5 VALID      117835 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 HOTCOPY5 VALID      117268 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 HOTCOPY6 VALID      140124 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 HOTCOPY6 VALID      140374 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 HOTCOPY3 VALID      118871 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 HOTCOPY3 VALID      118520 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 HOTCOPY1 VALID       98116 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 HOTCOPY1 VALID       98091 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 HOTCOPY7 VALID      119193 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 HOTCOPY7 VALID      119279 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

36 rows selected.
Restarting the database

1. It is extremely important to note that a statement’s hot status will not survive a database or instance restart. This means you need to mark it as hot again after every restart.

2. Here are the steps to follow to make sure this setup remains available after an instance / database reboot:

a. Make sure you still have the hidden parameter _kgl_hot_object_copies set to the number of CPUs.

b. Place the cursor in the library cache on each instance. This means you need to run the statement with exactly the same text on each instance, to ensure it resolves to the same full_hash_value.

c. Mark the statement as hot using the procedure dbms_shared_pool.markhot, as shown in the sketch below. As a side note, you can set the global argument to true if you want to mark the statement as hot on all instances but, as mentioned before, it will only be marked as hot on those instances where the statement is present in the library cache.
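
For reference, the markhot call mirrors the unmarkhot call shown further below. Here is a minimal sketch using the same full_hash_value from this example:

SQL> begin
  2    dbms_shared_pool.markhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /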

Unmarking hot statements

1. Finally, you can unmark the hot statement using the procedure dbms_shared_pool.unmarkhot.

SQL> begin
  2    dbms_shared_pool.unmarkhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /

PL/SQL procedure successfully completed.

2. You can confirm that you’ve unmarked the statement by querying the view gv$db_object_cache.

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ---------------------------------------------------------------------- 
      1 294476384  SQL AREA            0 OBSCOPY4 VALID      235617 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 294476384  SQL AREA        88672 OBSCOPY4 VALID      235190 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA            0          VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA        53015          VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA            0 OBSCOPY2 VALID      243111 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA        49905 OBSCOPY2 VALID      243114 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA            0 OBSCOPY8 VALID      272199 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA        20114 OBSCOPY8 VALID      272379 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA            0 OBSCOPY5 VALID      223078 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA        89725 OBSCOPY5 VALID      223230 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA            0 OBSCOPY6 VALID      254701 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA        48617 OBSCOPY6 VALID      254807 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA            0 OBSCOPY3 VALID      238938 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA        70399 OBSCOPY3 VALID      238361 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA            0 OBSCOPY1 VALID      231836 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA        97125 OBSCOPY1 VALID      230782 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA            0 OBSCOPY7 VALID      225743 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA        47999 OBSCOPY7 VALID      225810 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA            0 OBSCOPY4 VALID      385506 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 OBSCOPY4 VALID      385414 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0          VALID     4565713 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015          VALID     4469646 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 OBSCOPY2 VALID      359549 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 OBSCOPY2 VALID      359951 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 OBSCOPY8 VALID      376019 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 OBSCOPY8 VALID      376026 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 OBSCOPY5 VALID      420406 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 OBSCOPY5 VALID      420031 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 OBSCOPY6 VALID      455769 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 OBSCOPY6 VALID      455326 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 OBSCOPY3 VALID      340070 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 OBSCOPY3 VALID      339658 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 OBSCOPY1 VALID      364612 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 OBSCOPY1 VALID      365127 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 OBSCOPY7 VALID      395629 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 OBSCOPY7 VALID      395981 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

36 rows selected.

3. You will notice that all hot copies are now flagged as obsolete (the OBSCOPYn values in the HOT_FLAG column), meaning they will no longer be used since the original statement is no longer hot.

I hope this post gives you an idea of how to work around the bottlenecks that can be caused by the “cursor: pin S” wait event. Please let me know if you have any questions, or tips of your own.

Categories: DBA Blogs

OPatchAuto Bug on 19.3 GI/RDBMS When Patching to 19.6

Thu, 2020-10-15 14:48

The other day I was doing a fresh install of Oracle 19.3. I was using the binaries from eDelivery, doing some testing on that version, and then patching to 19.6. The installation and the tests went fine. Next, I proceeded to analyze patch 30501910, and everything looked OK.

## Note.- I trimmed the log for readability
[root@node2 ~]$ . oraenv
ORACLE_SID = [root] ? DB193H12
The Oracle base has been set to /u01/app/oracle
[root@node2 ~]$ export PATH=$ORACLE_HOME/OPatch:$PATH
[root@node2 ~]$ opatchauto apply /u01/patches/30501910 -analyze
OPatchauto session is initiated at Tue Oct  6 13:23:11 2020
...
Host:node2
CRS Home:/u01/app/19.3.0.0/grid
Version:19.0.0.0.0

==Following patches were SUCCESSFULLY analyzed to be applied:

Patch: /u01/patches/30501910/30489227
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

Patch: /u01/patches/30501910/30489632
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

Patch: /u01/patches/30501910/30655595
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

Patch: /u01/patches/30501910/30557433
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

OPatchauto session completed at Tue Oct  6 11:42:46 2020

Time taken to complete the session 2 minutes, 41 seconds


I proceeded with the patch with the command below:

opatchauto apply /u01/patches/30501910

Everything went well on node 1, with the patching completed successfully on both GI and RDBMS, but the patch on node 2 failed on the RDBMS with the error below:

## Note.- I trimmed the log for readability
[root@node2 ~]$ opatchauto apply /u01/patches/30501910
...
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/oracle/product/19.3.0.0/dbhome_1, host: node2.
Command failed:  /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/opatchauto  apply /u01/patches/30501910 -oh /u01/app/oracle/product/19.3.0.0/dbhome_1 -target_type rac_database -binary -invPtrLoc /u01/app/oracle/product/19.3.0.0/dbhome_1/oraInst.loc -jre /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/jre -persistresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_node2_rac_2.ser -analyzedresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_analyze_node2_rac_2.ser
Command failure output: 
==Following patches FAILED in apply:

Patch: /u01/patches/30501910/30489227
Log: /u01/app/oracle/product/19.3.0.0/dbhome_1/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-05_17-25-22PM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: ApplySession failed in system modification phase... 'ApplySession::apply failed: java.io.IOException: oracle.sysman.oui.patch.PatchException: java.io.FileNotFoundException: /u01/app/oraInventory/ContentsXML/oui-patch.xml (Permission denied)' 

After fixing the cause of failure Run opatchauto resume
...

After looking for information, I found MOS (My Oracle Support) note 2582139.1, which directed me to copy the oui-patch.xml from node 1 to node 2. 

[grid@node1 ~]$ cd /u01/app/oraInventory/ContentsXML/
[grid@node1 ContentsXML]$ scp oui-patch.xml node2:/u01/app/oraInventory/ContentsXML/
oui-patch.xml   
 
[root@node2 ContentsXML]# ls -ltr
total 20
-rw-rw----. 1 grid   oinstall 565 Oct  6 11:13 inventory.xml
-rw-r--r--. 1 oracle oinstall 174 Oct  6 12:43 oui-patch.xml.back
-rw-rw----. 1 grid   oinstall 300 Oct  6 12:43 comps.xml
-rw-rw----. 1 grid   oinstall 292 Oct  6 12:43 libs.xml
-rw-rw----. 1 grid   oinstall 174 Oct  6 14:02 oui-patch.xml                                                                        

After doing this, I reran opatchauto. This time I faced the error “Unable to create patchObject.”

## Note.- I trimmed the log for readability
[root@node2 ~]$ opatchauto apply /u01/patches/30501910
...
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/oracle/product/19.3.0.0/dbhome_1, host: node2.
Command failed:  /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/opatchauto  apply /u01/patches/30501910 -oh /u01/app/oracle/product/19.3.0.0/dbhome_1 -target_type rac_database -binary -invPtrLoc /u01/app/oracle/product/19.3.0.0/dbhome_1/oraInst.loc -jre /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/jre -persistresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_node2_rac_2.ser -analyzedresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_analyze_node2_rac_2.ser
Command failure output: 
==Following patches FAILED in apply:

Patch: /u01/patches/30501910/30489227
Log: 
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Unable to create patchObject
Possible causes are:
   ORACLE_HOME/inventory/oneoffs/30489227 is corrupted. PatchObject constructor: Input file "/u01/app/oracle/product/19.3.0.0/dbhome_1/inventory/oneoffs/30489227/etc/config/actions" or "/u01/app/oracle/product/19.3.0.0/dbhome_1/inventory/oneoffs/30489227/etc/config/inventory" does not exist. 

After fixing the cause of failure Run opatchauto resume
...

To address this error, I found another note — 2286025.1 — which advises taking any of the following actions:

  1. Run opatch lsinventory.
  2. Rerun the opatchauto analyze step.
  3. Restore the database home backup on the affected node (node 2 in my case).

After trying options 1 and 2, I kept getting the same error, so I proceeded to patch the GI on node 2.

## Note.- I trimmed the log for readability
[root@node2 ~]$ . oraenv
ORACLE_SID = [root] ? +ASM2
The Oracle base has been set to /u01/app/grid
[root@node2 ~]$ export PATH=$ORACLE_HOME/OPatch:$PATH
[root@node2 ~]$ opatchauto apply /u01/patches/30501910 -oh $ORACLE_HOME
OPatchauto session is initiated at Tue Oct  6 12:28:25 2020

...

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:node2
CRS Home:/u01/app/19.3.0.0/grid
Version:19.0.0.0.0
Summary:

==Following patches were SUCCESSFULLY applied:

Patch: /u01/patches/30501910/30489227
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

Patch: /u01/patches/30501910/30489632
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

Patch: /u01/patches/30501910/30557433
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

Patch: /u01/patches/30501910/30655595
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

OPatchauto session completed at Tue Oct  6 12:49:02 2020
Time taken to complete the session 20 minutes, 38 seconds

For the RDBMS, I followed my own blog post (How to Clone an Oracle Home in 19c) and cloned the Oracle home from node 1, as it was already patched correctly with 30501910.

[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_1
[oracle@node1 ~]$ $ORACLE_HOME/runInstaller -silent  -createGoldImage -destinationLocation /u01/patches
Launching Oracle Database Setup Wizard...

Successfully Setup Software.
Gold Image location: /u01/patches/db_home_2020-10-06_02-11-59PM.zip

[oracle@node1 ~]$ unset ORACLE_HOME
[oracle@node1 ~]$ unset ORACLE_SID
[oracle@node1 ~]$ unset ORACLE_RSID
[oracle@node1 ~]$ unset ORACLE_UNQNAME
[oracle@node1 ~]$ unset ORACLE_BASE

[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ export ORACLE_HOSTNAME=node1
[oracle@node1 ~]$ export ORA_INVENTORY=/u01/app/oraInventory
[oracle@node1 ~]$ export NODE1_HOSTNAME=node1
[oracle@node1 ~]$ export NODE2_HOSTNAME=node2
[oracle@node1 ~]$ export ORACLE_BASE=/u01/app/oracle

[oracle@node1 ~]$ unzip -q /u01/patches/db_home_2020-10-06_02-11-59PM.zip -d $ORACLE_HOME
 
[oracle@node1 ~]$ env | egrep "ORA|NODE"
ORACLE_BASE=/u01/app/oracle
NODE2_HOSTNAME=node2
ORACLE_HOSTNAME=node1
NODE1_HOSTNAME=node1
ORA_INVENTORY=/u01/app/oraInventory
ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_2

[oracle@node1 ~]$ cd $ORACLE_HOME
[oracle@node1 dbhome_2]$ ${ORACLE_HOME}/runInstaller -ignorePrereq -waitforcompletion -silent \
> -responseFile ${ORACLE_HOME}/install/response/db_install.rsp \
> oracle.install.option=INSTALL_DB_SWONLY \
> ORACLE_HOSTNAME=${ORACLE_HOSTNAME} \
> UNIX_GROUP_NAME=oinstall \
> INVENTORY_LOCATION=${ORA_INVENTORY} \
> SELECTED_LANGUAGES=en \
> ORACLE_HOME=${ORACLE_HOME} \
> ORACLE_BASE=${ORACLE_BASE} \
> oracle.install.db.InstallEdition=EE \
> oracle.install.db.OSDBA_GROUP=dba \
> oracle.install.db.OSOPER_GROUP=dba \
> oracle.install.db.OSBACKUPDBA_GROUP=dba \
> oracle.install.db.OSDGDBA_GROUP=dba \
> oracle.install.db.OSKMDBA_GROUP=dba \
> oracle.install.db.OSRACDBA_GROUP=dba \
> oracle.install.db.CLUSTER_NODES=${NODE1_HOSTNAME},${NODE2_HOSTNAME} \
> oracle.install.db.isRACOneInstall=false \
> oracle.install.db.rac.serverpoolCardinality=0 \
> oracle.install.db.config.starterdb.type=GENERAL_PURPOSE \
> oracle.install.db.ConfigureAsContainerDB=false \
> SECURITY_UPDATES_VIA_MYORACLESUPPORT=false \
> DECLINE_SECURITY_UPDATES=true
Launching Oracle Database Setup Wizard...
...
The response file for this session can be found at:
 /u01/app/oracle/product/19.3.0.0/dbhome_2/install/response/db_2020-10-06_03-06-10PM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/InstallActions2020-10-06_03-06-10PM/installActions2020-10-06_03-06-10PM.log

As a root user, execute the following script(s):
	1. /u01/app/oracle/product/19.3.0.0/dbhome_2/root.sh

Execute /u01/app/oracle/product/19.3.0.0/dbhome_2/root.sh on the following nodes: 
[node1, node2]

After doing this, the only task remaining was to verify that the Oracle home was patched correctly, and switch the database to the new Oracle home.

[oracle@node1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)

OPatch succeeded.

[oracle@node2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)

OPatch succeeded.

[oracle@node1 ~]$ srvctl modify database -db test -oraclehome /u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ srvctl config database -db test -a
Database unique name: test
Database name: test
Oracle home: /u01/app/oracle/product/19.3.0.0/dbhome_2
Oracle user: oracle
Spfile: +DATA/test/PARAMETERFILE/spfile.270.1053084685
Password file: +DATA/test/PASSWORD/pwdtest.258.1053083711
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: RECO,DATA
Mount point paths: 
Services: 
Type: RAC
Start concurrency: 
Stop concurrency: 
OSDBA group: dba
OSOPER group: oper
Database instances: test1,test2
Configured nodes: node1,node2
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

I plan to retry this exercise by copying oui-patch.xml to node 2 before patching, to see if that prevents the error (see the sketch below). In the meantime, I hope this blog post helps you in case you face the above-mentioned bugs.
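
For anyone who wants to try that preventive step, it would simply be the same copy as earlier, done before launching opatchauto; untested on my side so far:

## Note.- Untested idea: pre-stage oui-patch.xml on node 2 before patching
[grid@node1 ~]$ scp /u01/app/oraInventory/ContentsXML/oui-patch.xml node2:/u01/app/oraInventory/ContentsXML/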

Categories: DBA Blogs

How to Enable RAM Cache in Oracle Exadata

Thu, 2020-10-08 08:23
A brief background on RAM cache or in-memory OLTP acceleration

In this post I will present a feature — to simplify, let’s call it RAM cache — introduced in Oracle Exadata Storage Server image 18.1. You’ve probably heard a lot about the new Oracle Exadata X8M and its Intel Optane DC Persistent Memory (PMem). This combination of feature and architecture allows the database processes running on the database servers to remotely read and write, through a protocol called Remote Direct Memory Access (RDMA), from / to the PMem cards in the storage servers. You can find the detailed Oracle Exadata X8M deployment process here.

It’s true that RDMA has existed in the Oracle Exadata architecture from the beginning, as Oracle points out in their blog post titled, Introducing Exadata X8M: In-Memory Performance with All the Benefits of Shared Storage for both OLTP and Analytics:

RDMA was introduced to Exadata with InfiniBand and is a foundational part of Exadata’s high-performance architecture.

What you may not know is that there’s a feature called in-memory OLTP acceleration (or simply RAM cache), introduced in Oracle Exadata Storage Server image 18.1.0.0.0 when Oracle Exadata X7 was released. This feature allows read access to the storage server RAM on any Oracle Exadata system (X6 or higher) running that image or above. Although this is not the same as PMem, since RAM is not persistent, it is still very cool because it allows you to take advantage of the RAM available in the storage servers.

In-memory OLTP acceleration (or simply RAM cache)

Modern generations of Exadata storage servers come with a lot of RAM. For comparison, X7 and X8 come with 192GB of RAM by default, as opposed to the 128GB that came with X6.

Unfortunately, the RAM cache feature is only available on storage servers X6 or higher and these are the requirements:

  • Oracle Exadata System Software 18c (18.1.0).
  • Oracle Exadata Storage Server X6, X7 or X8.
  • Oracle Database version 12.2.0.1 April 2018 DBRU, or 18.1 or higher.

That large amount of RAM is rarely fully utilized by the Oracle Exadata storage servers. This RAM cache feature allows you to use all or part of the available RAM in the storage servers. Doing this extends your database buffer cache to the storage server’s RAM for read operations.

In the new Oracle Exadata X8M the I/O latency is under 19µs for read operations. This is due to the PMem cache combined with the RoCE (RDMA over converged ethernet) network. In the Oracle Exadata X7/X8 the I/O latency for reads with RAM cache using RDMA over InfiniBand is around 100µs. Without RAM cache the number goes up to 250µs reading directly from the flash cache. The following information is from the Oracle Exadata Database Machine X8-2 data sheet:

For OLTP workloads Exadata uniquely implements In-Memory OLTP Acceleration. This feature utilizes the memory installed in Exadata Storage Servers as an extension of the memory cache (buffer cache) on database servers. Specialized algorithms transfer data between the cache on database servers and in-memory cache on storage servers. This reduces the IO latency to 100 us for all IOs served from in-memory cache. Exadata’s (sic) uniquely keeps only one in-memory copy of data across database and storage servers, avoiding memory wastage from caching the same block multiple times. This greatly improves both efficiency and capacity and is only possible because of Exadata’s unique end-to-end integration.

How I set up RAM cache in the Exadata storage servers

As I mentioned previously, recent generations of Oracle Exadata storage servers come with a lot of RAM, which is normally not used to its fullest by the cellsrv services and features. Having said that, I take the amount of free memory (RAM) in the storage servers into consideration. First, I pick the storage server using the most RAM and do the math: freemem*0.7 = RAM cache value. In other words, I set the RAM cache to 70 percent of the free memory on the storage server using more RAM than the others. Note: I avoid using all the free memory for the RAM cache in case the storage server requires more memory for storage indexes or other needs in the future.

Let’s say my busiest storage server has 73GB of free memory. Applying the formula, we get: 73*0.7=51.1GB.
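
If you’d rather let the shell do the math, something like this on the busiest cell produces the same figure (a small sketch; the "free" column is the fourth field of the free -g output):

[root@exaceladm01 ~]# free -g | awk '/^Mem:/ {printf "ramCacheMaxSize candidate: %.1fG\n", $4 * 0.7}'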

Oracle Exadata architecture was built to spread the workload evenly across the entire storage grid, so you’ll notice that the storage servers use pretty much the same amount of memory (RAM).

Here comes the action and fun. First, we must check how much memory is available in our storage servers by running this via dcli (make sure your cell_group file is up to date):

[root@exadbadm01 ~]# dcli -l root -g cell_group free -g

In my case, cel01 is the storage server using the most memory. Let’s check some details of this storage server:

[root@exaceladm01 ~]# cellcli
CellCLI: Release 19.2.7.0.0 - Production on Thu Aug 06 07:44:59 CDT 2020
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.
CellCLI> LIST CELL DETAIL
name: exaceladm01
accessLevelPerm: remoteLoginEnabled
bbuStatus: normal
cellVersion: OSS_19.2.7.0.0_LINUX.X64_191012
cpuCount: 24/24
diagHistoryDays: 7
fanCount: 8/8
fanStatus: normal
flashCacheMode: WriteBack
flashCacheCompress: FALSE
httpsAccess: ALL
id: 1446NM508U
interconnectCount: 2
interconnect1: bondib0
iormBoost: 0.0
ipaddress1: 192.168.10.13/22
kernelVersion: 4.1.12-124.30.1.el7uek.x86_64
locatorLEDStatus: off
makeModel: Oracle Corporation SUN SERVER X7-2L High Capacity
memoryGB: 94
metricHistoryDays: 7
notificationMethod: mail,snmp
notificationPolicy: critical,warning,clear
offloadGroupEvents:
powerCount: 2/2
powerStatus: normal
ramCacheMaxSize: 0
ramCacheMode: Auto
ramCacheSize: 0
releaseImageStatus: success
releaseVersion: 19.2.7.0.0.191012
rpmVersion: cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
releaseTrackingBug: 30393131
rollbackVersion: 19.2.2.0.0.190513.2
smtpFrom: "exadb Exadata"
smtpFromAddr: exaadmin@loredata.com.br
smtpPort: 25
smtpServer: mail.loredata.com.br
smtpToAddr: support@loredata.com.br
smtpUseSSL: FALSE
snmpSubscriber: host=10.200.55.182,port=162,community=public,type=asr,asrmPort=16161
status: online
temperatureReading: 23.0
temperatureStatus: normal
upTime: 264 days, 8:48
usbStatus: normal
cellsrvStatus: running
msStatus: running
rsStatus: running

From the output above we can see that the parameter ramCacheMode is set to auto while ramCacheMaxSize and ramCacheSize are 0. These are the default values and mean the RAM cache feature is not enabled.

This storage server has ~73GB of free / available memory (RAM):

[root@exaceladm01 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          96177       15521       72027        4796        8628       75326
Swap:          2047           0        2047

Now we can enable the RAM cache feature by changing the parameter ramCacheMode to “On”:

CellCLI> ALTER CELL ramCacheMode=on
Cell exaceladm01 successfully altered

Immediately after the change, we check the free / available memory (RAM) in the storage server operating system:

[root@exaceladm01 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          96177       15525       72059        4796        8592       75322
Swap:          2047           0        2047

Not much has changed, because enabling the feature doesn’t make the storage server allocate this memory right away; the memory simply remains available for the storage server to use as RAM cache, and cellsrv populates it gradually.

We can see that only about 10GB was defined in the ramCacheMaxSize and ramCacheSize parameters:

CellCLI> LIST CELL DETAIL
name: exaceladm01
accessLevelPerm: remoteLoginEnabled
bbuStatus: normal
cellVersion: OSS_19.2.7.0.0_LINUX.X64_191012
cpuCount: 24/24
diagHistoryDays: 7
fanCount: 8/8
fanStatus: normal
flashCacheMode: WriteBack
flashCacheCompress: FALSE
httpsAccess: ALL
id: 1446NM508U
interconnectCount: 2
interconnect1: bondib0
iormBoost: 0.0
ipaddress1: 192.168.10.13/22
kernelVersion: 4.1.12-124.30.1.el7uek.x86_64
locatorLEDStatus: off
makeModel: Oracle Corporation SUN SERVER X7-2L High Capacity
memoryGB: 94
metricHistoryDays: 7
notificationMethod: mail,snmp
notificationPolicy: critical,warning,clear
offloadGroupEvents:
powerCount: 2/2
powerStatus: normal
ramCacheMaxSize: 10.1015625G
ramCacheMode: On
ramCacheSize: 10.09375G
releaseImageStatus: success
releaseVersion: 19.2.7.0.0.191012
rpmVersion: cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
releaseTrackingBug: 30393131
rollbackVersion: 19.2.2.0.0.190513.2
smtpFrom: "exadb Exadata"
smtpFromAddr: exaadmin@loredata.com.br
smtpPort: 25
smtpServer: mail.loredata.com.br
smtpToAddr: support@loredata.com.br
smtpUseSSL: FALSE
snmpSubscriber: host=10.200.55.182,port=162,community=public,type=asr,asrmPort=16161
status: online
temperatureReading: 23.0
temperatureStatus: normal
upTime: 264 days, 8:49
usbStatus: normal
cellsrvStatus: running
msStatus: running
rsStatus: running

To confirm, we can list just these attributes from CellCLI:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
10.1015625G On 10.09375G

To reduce the memory used by the RAM cache feature we can simply change the ramCacheMaxSize parameter:

CellCLI> ALTER CELL ramCacheMaxSize=5G;
Cell exaceladm01 successfully altered

If we check the values of the RAM cache parameters we will see this:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
5G On 0

As soon as the database blocks start being copied to the RAM cache we will see the ramCacheSize value increasing:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
5G On 3.9250G

Increasing a bit more:

CellCLI> ALTER CELL ramCacheMaxSize=15G;
Cell exaceladm01 successfully altered

When checking, you’ll notice it takes a while for the cellsrv to populate the RAM cache with blocks copied from the flash cache:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
15G On 0
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
15G On 11.8125G
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
15G On 15G

Resetting ramCacheMode to auto clears everything again:

CellCLI> ALTER CELL ramCacheMode=Auto
Cell exaceladm01 successfully altered
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
0 Auto 0

Now we adjust to the value we got from our calculation of 70 percent of the free memory:

CellCLI> ALTER CELL ramCacheMode=On
Cell exaceladm01 successfully altered
CellCLI> ALTER CELL ramCacheMaxSize=51G
Cell exaceladm01 successfully altered
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
51G On 32.8125G
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
51G On 35.2500G
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
51G On 51G

With that configuration in place, if we want to be notified when a storage server is running out of memory, we can quickly create a threshold on the cell memory utilization (CL_MEMUT) metric that fires when memory utilization goes beyond 95 percent:

CellCLI> CREATE THRESHOLD CL_MEMUT.interactive comparison=">", critical=95
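
To confirm the threshold was created, we can list it from CellCLI:

CellCLI> LIST THRESHOLD DETAIL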

Conclusion

To sum up, RAM cache (aka in-memory OLTP acceleration) is a feature available only on Oracle Exadata Database Machine X6 or higher with at least the 18.1 image. In addition, it requires Oracle Database 12.2.0.1 with the April 2018 DBRU, or 18.1 or higher. This feature helps extend the database buffer cache to the free RAM in the storage servers, but only for read operations, since RAM is not persistent. For persistent memory, Oracle introduced the Persistent Memory Cache with Oracle Exadata Database Machine X8M.

It’s worth mentioning that a database will only leverage the RAM cache when there is pressure on the database buffer cache. The data blocks present in the RAM cache are persistently stored in the storage server’s flash cache. When a server process on the database side requests a block that is no longer in the database buffer cache but is in the RAM cache, cellsrv sends that block from the RAM cache to the buffer cache for the server process to read. Reading from the RAM cache is faster than reading from the flash cache or disk.

While the in-memory OLTP acceleration feature is not a magic solution, it is a plus for our Exadata systems. Since we almost always see free memory in the storage servers, this is a way of optimizing resources we’ve already paid for. The feature is included in the Exadata licenses, so there is no extra-cost option to buy, and it is not related to the Database In-Memory option. Having Exadata is all you need.

Happy caching! See you next time!

Franky

Categories: DBA Blogs

Resolving Error ORA-28353: Failed to Open Wallet

Tue, 2020-10-06 08:15

Full disclosure: this is a post I’ve had in draft mode for almost one and a half years.

I noticed the original error after applying the October 2018 bundle patch (BP) for 11.2.0.4. While I realize most clients are no longer on 11.2.0.4, this information remains valid for anyone upgrading from 11.2 to 12c, 18c or 19c.

I had been doing several tests on my Spanish RAC (Real Application Clusters) Attack environment for 12.2. The goal was to patch my client to the October 2018 PSU, obtaining enough security coverage to avoid further patching of their database before their DB (database) upgrade to 18c. I created RAC VMs to enable testing. I also set up my environment to match the client’s, which had TDE with FIPS 140 enabled (I will provide more details on this later in the post).

While the patching itself was successful, a problem arose afterward: I was unable to open the database, despite having the correct password for the encryption key.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-28365: wallet is not open

SQL> alter system set encryption key identified by “xxx”;
alter system set encryption key identified by “xxxx”
*
ERROR at line 1:
ORA-28353: failed to open wallet

SQL> select * from v$encryption_wallet;

INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
------------------------
STATUS
------------------
1 file
/etc/oracle/wallets/$ORACLE_UNQNAME
CLOSED

SQL> ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY *****;
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY *****
*
ERROR at line 1:
ORA-28353: failed to open wallet

When I tried to open the database, this is what appeared in the alert.log:

Tue Feb 16 11:21:56 2018
QMNC started with pid=81, OS id=86184
kcbztek_get_tbskey: decrypting encrypted key for tablespace 16 without opening the wallet

I rolled back the patch and, as soon as I did, the database opened:

SQL> select name,open_mode ,database_role from v$database;

NAME OPEN_MODE DATABASE_ROLE
--------- -------------------- ----------------
TEST READ WRITE PRIMARY

After many days of looking for information to address the error, I noticed that FIPS 140-2 was enabled. FIPS 140-2 (Federal Information Processing Standard 140-2) is a US government standard that defines security requirements for cryptographic modules.

You can see it’s enabled for SSL in the following file:

[oracle@collabn1 /u01/app/oracle/product/11.2.0.4/dbhome_1/ldap/admin ]$ cat fips.ora
#
# file to be reside in $ORACLE_HOME/ldap/admin
#
# sec-220
#
sslfips_140=true

I was able to find a document called After Applying October 2018 CPU/PSU, Auto-Login Wallet Stops Working For TDE With FIPS Mode Enabled (Doc ID 2474806.1).

This helped me discover the solution: patch the DB with the October 2018 PSU and, after patching the binaries, recreate the auto-login file cwallet.sso with version 12 compatibility. After you have done this, you will be able to open your DB normally.

[oracle@collabn1 ~]$ cd /etc/oracle/wallets/test
[oracle@collabn1 /etc/oracle/wallets/test]$ mv cwallet.sso cwallet.sso.original

[oracle@collabn1 /etc/oracle/wallets/test]$ orapki wallet create -wallet /etc/oracle/wallets/test -auto_login -compat_v12
Oracle PKI Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
Enter wallet password: ******

[oracle@collabn1 ~]$ srvctl status database -d test
Instance test1 is running on node collabn1
Instance test2 is running on node collabn2

SQL> select name,open_mode ,database_role from v$database;

NAME OPEN_MODE DATABASE_ROLE
--------- -------------------- ----------------
TEST READ WRITE PRIMARY

SQL> select * from v$encryption_wallet;

INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
------------------------
STATUS
------------------
1 file
/etc/oracle/wallets/$ORACLE_UNQNAME
OPEN

IMPORTANT: DO NOT recreate the ewallet.p12 file! This will likely cause data loss, as you will lose the master key required to decrypt your encrypted data.

Below is an example of what you DO NOT WANT TO DO:

[oracle@collabn1 ~]$ cd /etc/oracle/wallets/test
[oracle@collabn1 /etc/oracle/wallets/test]$ mv ewallet.p12 ewallet.p12.old

[oracle@collabn1 /etc/oracle/wallets/test]$ sqlplus / as sysdba
SQL> ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY Welcome1 WITH BACKUP;

keystore altered

It’s important to note that the above also applies to the January 2019 Database BP, and to any upgrade from 11.2.0.4 to 12c, 18c or 19c. This means you will face this issue with anything after October 2018 if you are using TDE and SSL with FIPS.
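
As a quick check for this configuration, you can look for the sslfips_140 setting shown earlier:

[oracle@collabn1 ~]$ grep -i sslfips $ORACLE_HOME/ldap/admin/fips.ora
sslfips_140=true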

Note: This was originally posted on rene-ace.com.

Categories: DBA Blogs

Overcoming ORA-01722 Error While Doing an Upgrade With Different DST Versions

Tue, 2020-09-29 14:59

This will be a short entry but, hopefully, it will help if you face the error “ORA-01722: invalid number” while doing an upgrade.

Today I was doing an Oracle RDBMS upgrade from 12.2 with DST v34 to 19.7 with DST v35. The upgrade failed with the error below:

Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed

-------------------------------------------------
Errors in database [TEST]
Stage     [DBUPGRADE]
Operation [STOPPED]
Status    [ERROR]
Info    [
Error: UPG-1400
UPGRADE FAILED [TEST]
Cause: This indicates that the database upgrade failed with errors.
For further details, see the log file located at /u01_diag/working/pythian/upgrade/upg_logs/TEST/102/autoupgrade_20200921_user.log]

-------------------------------------------------
Logs: [/u01_diag/working/pythian/upgrade/upg_logs/TEST/102/autoupgrade_20200921_user.log]
-------------------------------------------------Job 102 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [0]
Jobs failed                    [1]
Jobs pending                   [0]
-------------------- JOBS FAILED ---------------------
Job 102 for TEST

Exiting


-- From the Upgrade Log


17:12:53 SQL> Rem Check if time zone file version used by the database exists in new home
17:12:53 SQL> SELECT TO_NUMBER('MUST_PATCH_TIMEZONE_FILE_VERSION_ON_NEW_ORACLE_HOME')
17:12:53   2     FROM sys.props$
17:12:53   3     WHERE
17:12:53   4       (
17:12:53   5        (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > &C_LTZ_CONTENT_VER)
17:12:53   6        AND
17:12:53   7        (0 = (select count(*) from v$timezone_file))
17:12:53   8       );
old   5:       (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > &C_LTZ_CONTENT_VER)
new   5:       (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > 32)
SELECT TO_NUMBER('MUST_PATCH_TIMEZONE_FILE_VERSION_ON_NEW_ORACLE_HOME')
                 *
ERROR at line 1:
ORA-01722: invalid number

While searching for the error, the first thing that came to mind was that the 19.7 home had been patched incorrectly, but that wasn’t the case. As you can see, the 19.7 home was patched with a higher version of the DST patch than the 12.2 Oracle home:

([SID:TEST][oracle@hostname:/u01/app/oracle/product/12.2.0.1/dbhome_1/OPatch ] )
oracle $ ./opatch lspatches
29997937;RDBMS - DSTV34 UPDATE - TZDATA2019B
30133625;OJVM RELEASE UPDATE: 12.2.0.1.191015 (30133625)
30138470;Database Oct 2019 Release Update : 12.2.0.1.191015 (30138470)
30122814;OCW OCT 2019 RELEASE UPDATE 12.2.0.1.191015 (30122814)

OPatch succeeded.

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/OPatch ] )
oracle $ ./opatch lspatches
30565805;RDBMS 19C  REGRESSION  ORA-01843  NOT A VALID MONTH ERROR WHEN EXECUTING TO_DATE(20191120,RRMMDD)
31335037;RDBMS - DSTV35 UPDATE - TZDATA2020A
30894985;OCW RELEASE UPDATE 19.7.0.0.0 (30894985)
30869156;Database Release Update : 19.7.0.0.200414 (30869156)

OPatch succeeded.

My first move was to search for the following files in the 19.7 OH:

  • timezlrg_34.dat
  • timezone_34.dat

They weren’t there, and because DST patches are cumulative, the DST v34 files shouldn’t be missing from a 19.7 Oracle home patched with DST v35. Accordingly, I would classify this as a bug in the DST v35 patch.

([SID:TEST][oracle@hostname:home/oracle/working/antunez ] )
oracle $ cd $ORACLE_HOME/oracore/zoneinfo

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ ls -ltr | egrep "timezlrg_34.dat|timezone_34.dat" | wc -l
0

I fixed this by copying the files from the 12.2 $ORACLE_HOME/oracore/zoneinfo to the 19.7 $ORACLE_HOME/oracore/zoneinfo:

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/oracore/zoneinfo/timezone_34.dat ./

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/oracore/zoneinfo/timezlrg_34.dat ./

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ ls -ltr | egrep "timezlrg_34.dat|timezone_34.dat" | wc -l
2

Once I did this, I relaunched the upgrade. This time, it finished successfully:

([SID:TEST][oracle@hostname:/home/oracle/working/antunez/TEST ] )
oracle $ $ORACLE_HOME19c/jdk/bin/java -jar $ORACLE_HOME19c/rdbms/admin/autoupgrade.jar -config TEST_config.cfg -mode DEPLOY -noconsole
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed

Job 102 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 102 for TEST

---- Drop GRP at your convenience once you consider it is no longer needed ----
Drop GRP from TEST: drop restore point AUTOUPGRADE_221145114461854_TEST

Hopefully, this small post will help if you’re faced with this error, as it seems the DST v35 patch does not include the necessary DST v34 files.

Note: This was originally posted on rene-ace.com.

Categories: DBA Blogs

Using SQLCODE Function

Tue, 2020-09-08 18:13

The idea of this post is to show how to handle errors using the SQLCODE function, which returns the error code of the most recently raised exception when running PL/SQL code.
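
As a quick warm-up, here is a minimal sketch showing what SQLCODE (and its companion function SQLERRM) return inside an exception handler; NO_DATA_FOUND maps to code 100:

set serveroutput on
BEGIN
  RAISE NO_DATA_FOUND;
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('SQLCODE: ' || SQLCODE); -- prints 100
    DBMS_OUTPUT.PUT_LINE('SQLERRM: ' || SQLERRM); -- prints ORA-01403: no data found
END;
/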

Here is a small example of code that raises an exception when two or more consecutive ORA-02292 errors occur.

--Create a Table to capture how many times ORA-02292 errors occurred.
CREATE TABLE tcount (x int); -- table to count the number of 2292 errors

--Parent Table
create table orders (order_id int, total_sales int);
CREATE UNIQUE INDEX orders_pk on orders (order_id);
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id) ENABLE;

--Child Table
CREATE TABLE products
( product_id numeric(10) not null,
supplier_id numeric(10) not null,
order_id int,
CONSTRAINT fk_orders
FOREIGN KEY (order_id)
REFERENCES orders(order_id)
);

--Add data in Parent and Child Tables
insert into orders values (1,1);
insert into products values (1,1,1);
commit;

--Verify the data on those Tables
select * from orders;
select * from products;

--Create the procedure to Delete data from Parent table and force the error ORA-02292
CREATE OR REPLACE PROCEDURE delete_order
(order_id_in IN NUMBER)
IS
vtest2292 int := 0;
BEGIN
  delete from orders where order_id = order_id_in;
  commit;

  -- if the delete instruction runs fine the tcount table is truncated
  EXECUTE IMMEDIATE ('truncate table tcount');

EXCEPTION
  WHEN OTHERS THEN
  DECLARE
  error_code NUMBER := SQLCODE;
  BEGIN
    IF error_code = -2292 THEN
      DBMS_OUTPUT.PUT_LINE('ERROR 2292!!!!!!!!!!!!!'); -- error found
      insert into tcount values (1);
      commit;

      select count(1) into vtest2292 from tcount;

      IF vtest2292 >= 2 then
        DBMS_OUTPUT.PUT_LINE('ERROR 2292 >x2!!!!!!!!!!!!!!'); -- two or more consecutive errors found
        raise_application_error (-20001,'Two or more consecutive ORA-02292 errors occurred deleting an order.');
      END IF;
    ELSE
      raise_application_error (-20002,'An ERROR has occurred deleting an order.');
    END IF;
  END;
END;
/

--Job to run the procedure delete_order; the error will be raised (and written to the alert log) only after the second consecutive failed execution
BEGIN
  DBMS_SCHEDULER.create_job (
  job_name => 'job_delete_order',
  job_type => 'PLSQL_BLOCK',
  job_action => 'begin delete_order(1); end;',
  start_date => SYSTIMESTAMP,
  repeat_interval => 'FREQ=HOURLY;BYMINUTE=0; interval=1;',
  enabled => TRUE);
END;
/

--Run the job job_delete_order to force the error; monitor the database alert log to confirm that only after two consecutive executions will the error be visible there
set serveroutput on
select count(1) from tcount;
exec dbms_scheduler.run_job('job_delete_order',FALSE);
select count(1) from tcount;

tail -f alert_<database_name>.log

The text associated with the ORA-02292 error is “integrity constraint (%s.%s) violated – child record found.” This issue can be fixed by deleting the child rows first, and afterward deleting the parent row.

Another option is to modify the foreign key constraint to delete child rows automatically when a parent row is deleted, by setting up the “Foreign Key with Delete Cascade” option (see the sketch below). But with the SQLCODE function, we can capture the error and handle it as needed.
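
For completeness, here is what that cascade alternative would look like for the tables above; a sketch only, and note that with ON DELETE CASCADE the child rows in PRODUCTS are removed silently:

--Recreate the foreign key with the delete cascade option (sketch)
ALTER TABLE products DROP CONSTRAINT fk_orders;
ALTER TABLE products ADD CONSTRAINT fk_orders
  FOREIGN KEY (order_id) REFERENCES orders(order_id)
  ON DELETE CASCADE;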

Categories: DBA Blogs

How To Get The Most Out Of Your Next Virtual Learning Event

Wed, 2020-09-02 10:18

The COVID-19 pandemic has disrupted every aspect of our lives this year. But in July, in spite of all the upheaval, the All India Oracle Users Group (AIOUG) still managed to hold its annual Oracle Groundbreakers Yatra conference. By moving the event online, the organizers were able to present more than 100 sessions over 14 days, offering over 125 hours of learning, plus networking with 100+ speakers from more than two dozen countries.

COVID-19 has made virtual learning the new normal. I know not everyone is happy about this but, to be honest, I love virtual webinars. They offer benefits you just can’t get from live events, including:

  • The very best subject matter experts. Many people who deserve the title of “expert” can’t afford the time it takes to travel to a live event. But when organizers are running a virtual conference, they can invite experts from anywhere in the world, and make them available to the audience at almost any time.
  • Savings in time and money. Online conferences give me all the learning of a live event without the cost of travel, food and accommodations. Plus, attending virtually means my time commitment is measured in hours instead of days.
  • Learning without intimidation. Not everyone is comfortable asking questions in front of a large crowd. Online learning is a great equalizer, and anyone can raise a virtual hand to ask a question without feeling shy. These events have made it easy for me to connect with speakers and grow my network through LinkedIn and Twitter.

But we should also talk about the things that make virtual events less than ideal. During the OG Yatra conference, Oracle’s Jim Grisanzio listed some major drawbacks of the online experience:

  • There’s no human contact. When your speaker is present only virtually, it’s easy to get distracted and end up learning nothing at all.
  • It’s boring. Virtual presentations lack eye contact, real-time exchanges, and audience reactions that help to keep us engaged with the material.
  • It’s stressful. As countless teachers have discovered during the pandemic, the attention span for online learning is much shorter than it is for the live classroom experience.

There’s no doubt that virtual learning presents real challenges. But as someone who has grown to actually prefer online events, I’m happy to offer some practical tips to help you make the most of virtual webinars and learning:

  • Don’t be greedy. Don’t sign up for every session. You’ll end up exhausted and unable to learn much of anything. Instead, be selective. Choose only those sessions that really interest you and that are worthy of your complete focus.
  • Do your homework. Before you join a webinar, do a bit of research on the topic. The more basic knowledge you have, the more value you’ll get from the session.
  • Learn about the speakers. Review their LinkedIn profiles so that you know something about their background and career path. This way, you’ll be better able to form realistic expectations for each session.
  • Focus completely on learning. Don’t try to do your job at the same time you’re attending a webinar. You’ll learn nothing, and your work will suffer. If you’re carrying a pager or have an on-call shift, hand off the pager and swap shifts with a colleague.
  • Take notes. Before you log on, make sure you have a notebook and pen at the ready. Research shows that old-school note-taking leads to better retention, because writing notes by hand forces our brains to process what we’re putting down on paper in a way that typing on a computer doesn’t. People who take screenshots of slides typically never look at them again. When it comes to learning, screenshots are pretty much useless.

I hope this post will help you make the most of virtual webinars and learning. Sure, online events have their drawbacks, but they are undeniably the way of the future — especially in light of COVID-19. Today, virtual learning offers us a great way to acquire knowledge safely and affordably from just about anywhere. In the future, I believe technology will create solutions to some of the difficulties of the online experience. But for now, organizations like the All India Oracle Users Group deserve our thanks for creating our industry’s newest and best opportunities to learn.

Categories: DBA Blogs

CreateGoldImage Followup – Is CreateGoldImage Required for Cloning Oracle Home?

Tue, 2020-08-25 15:38

On October 31, 2019, I blogged about How To Use CreateGoldImage For Cloning In 19c.

CreateGoldImage seems like a nice new feature; however, it’s also very buggy.

Here are a few bugs:

  • 2641469.1
  • 27432355.8
  • 27432338.8

I received a suggestion to use tar to clone Grid Home; however, I was uncertain if it would work.

Why would Oracle create a new process for CreateGoldImage when traditional tar works?

Out of curiosity, I decided to test this by creating a tarball of the GRID_HOME from a 19.8 RAC installation.

I then used the tarball to upgrade Oracle Restart 12.2 to 19.8.

==============================
Source: 19c Oracle RAC is located at /u01/app/19.0.0/grid
==============================

### Create tarball for 19.8 GRID_HOME from RAC instance:

[root@ol7-19-lax1 grid]# pwd
/u01/app/19.0.0/grid
[root@ol7-19-lax1 grid]# time tar -cvzpf /tmp/grid19.8_clone.tar.gz . > /tmp/clone_grid19.8.log ; echo $?

### Transfer tarball to target.

==============================
TARGET: 19c Oracle Restart is located at /u01/app/oracle/19.8.0.0/grid
==============================

I know what you are thinking: why is the GRID_HOME at two different locations?

The two Vagrant boxes I used to create the test cases were built by two different individuals.

### Review current patch for 12.2 Oracle Restart:

[grid@db-asm-1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
28163235;ACFS JUL 2018 RELEASE UPDATE 12.2.0.1.180717 (28163235)
28163190;OCW JUL 2018 RELEASE UPDATE 12.2.0.1.180717 (28163190)
28163133;Database Jul 2018 Release Update : 12.2.0.1.180717 (28163133)
OPatch succeeded.

### Copy tarball to /u01/app/oracle/19.8.0.0.

[grid@db-asm-1 19.8.0.0]$ time cp -fv /media/swrepo/grid19.8_clone.tar.gz . ; echo $?
‘/media/swrepo/grid19.8_clone.tar.gz’ -> ‘./grid19.8_clone.tar.gz’
[grid@db-asm-1 19.8.0.0]$ ls -l
total 7015760
-rwxr-xr-x 1 grid oinstall 7184131622 Aug 15 22:30 grid19.8_clone.tar.gz

### Unzip tarball to /u01/app/oracle/19.8.0.0

[grid@db-asm-1 19.8.0.0]$ time gunzip grid19.8_clone.tar.gz
[grid@db-asm-1 19.8.0.0]$ ls -l
total 13542216
-rwxr-xr-x 1 grid oinstall 13867223040 Aug 15 22:30 grid19.8_clone.tar

### Create directory for GRID_HOME – /u01/app/oracle/19.8.0.0/grid

[grid@db-asm-1 19.8.0.0]$ mkdir grid

### Untar to GRID_HOME – /u01/app/oracle/19.8.0.0/grid

[grid@db-asm-1 19.8.0.0]$ cd grid
[grid@db-asm-1 grid]$ time tar xf ../grid19.8_clone.tar . ; echo $?

### Upgrade 12.2 Oracle HAS to 19.8 using 19.8 tarball (GUI Method):

[grid@db-asm-1 grid]$ ./gridSetup.sh

### Verify Oracle Restart is 19.8:

[grid@db-asm-1 grid]$ $ORACLE_HOME/OPatch/opatch lspatches
31335188;TOMCAT RELEASE UPDATE 19.0.0.0.0 (31335188)
31305087;OCW RELEASE UPDATE 19.8.0.0.0 (31305087)
31304218;ACFS RELEASE UPDATE 19.8.0.0.0 (31304218)
31281355;Database Release Update : 19.8.0.0.200714 (31281355)
OPatch succeeded.
[grid@db-asm-1 grid]$

In conclusion, using a tarball is an option if CreateGoldImage does not work, thereby avoiding the need to create a ticket with Oracle Support.

Please note: I don’t know whether cloning via tarball is supported; it would be best to check with Oracle Support.

Categories: DBA Blogs

Mining Statspack and AWR Repositories for Historical Trends

Fri, 2020-08-07 16:43

There are various ways we can approach the analysis of a past database performance problem. The initial steps usually differ based on its scope. Is it limited to a certain SQL/process/task, or is it perceived as a database-wide slowdown? Additionally, is it occurring at the moment, or is it an event that occurred in the past?

In case the starting scope of analysis is database-wide, mining the Automatic Workload Repository (AWR) is one of the methods we can use to obtain historical performance trends. However, not all customers have access to it, either because it requires the Diagnostics Pack license on Enterprise Edition, or because they are running Oracle Standard Edition, where it’s not present.

In such cases, we can still use Statspack as a free alternative to AWR, even though it’s not as sophisticated. One of Statspack’s shortcomings is that it doesn’t store Active Session History data, which we could otherwise use to drill down into the activity of particular sessions over time. With Statspack, we’re missing this session-level granularity.

In this post, I’m going to present a script I use to get an overview of the workload dynamics of a database by querying the Statspack repository. There’s also an AWR counterpart, as I’ll mention later in the post.

statspack_load_trends.sql

1. Script’s Properties

Let’s summarize the script’s main properties:

  1. It queries the Statspack repository directly and doesn’t create any (temporary) objects, nor does it rely on any pre-created Statspack reports.
  2. We can use it to analyze a Statspack repository imported from another DB, containing data with a different DBID than the DB hosting the repository.
  3. We can analyze a period spanning instance restart(s). The script considers only adjacent snapshot pairs having the same startup_time value. Here, “adjacent” denotes two snapshots which belong to the same [DBID, instance number] pair, and whose SNAP_IDs are closest to each other when ordered; a simplified sketch of this logic follows below. That’s just to emphasize that the difference between two consecutive SNAP_IDs is not always one (think RAC with cached sequence values, an instance restart, or pressure on the shared pool).
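
To illustrate the adjacent-pair logic from point 3, here is a simplified sketch (assuming the standard stats$snapshot columns DBID, INSTANCE_NUMBER, STARTUP_TIME and SNAP_ID); the actual script is more elaborate:

select dbid, instance_number, startup_time, snap_id,
       lag(snap_id) over (partition by dbid, instance_number, startup_time
                          order by snap_id) prev_snap_id
  from stats$snapshot
 order by dbid, instance_number, snap_id;
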
2. Purpose

The script provides a quick high-level overview of the DB workload dynamics. It reports a combination of historical OS utilization statistics (stats$osstat), system time model statistics (stats$sys_time_model), and aggregated instance-level statistics (stats$sysstat) for a given period. Currently, it doesn’t query stats$system_event for wait event details. Several time-based metrics are presented in the form of Average Active Sessions (AAS), calculated by simply dividing the observed time-based metric by the elapsed time in the observed period.

You can download the script here, and its AWR counterpart is available here. Before running the AWR version, make sure the database has the Diagnostics Pack license. The following explanation applies to both scripts.

Note: I got the idea for this script from various AWR mining scripts in Chapter 5 (Sizing Exadata) from the “Oracle Exadata Recipes: A Problem-Solution Approach” book. Additionally, the idea to calculate and display CPU core-normalized values for some of the CPU usage statistics originates from John Beresniewicz’s AWR1page project.

3. Calculating CPU Capacity

The script considers the number of CPU cores, not threads (in case hyperthreading is enabled), to calculate the number of CPU seconds available between two Statspack snapshots. Various publications explain the reasons for this approach, but to summarize: depending on the workload type, hyperthreading can provide up to approximately 30% higher CPU throughput compared to non-hyperthreaded mode.
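
For example, with the 24-core server from the sample output below and a 60-minute snapshot interval:

24 cores * 3,600 s = 86,400 CPU seconds available

which matches the "Tot CPU Time Avail [Cores] (s)" column in the first output section.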

When the number of processes running on CPU approaches the number of CPU cores, the system can become CPU saturated / over-subscribed. At that point, its response time becomes unpredictable, and additional workload decreases the amount of useful work performed by the system.

4. Usage

The script produces a very wide output, so I recommend spooling it out to a file for easier viewing. Because Statspack tables have public synonyms, any user that has permission to select from the repository tables can run it.

Note: I’ve seen the script fail with errors like “ORA-00936: missing expression,” or “ORA-01722: invalid number” when used on databases running with cursor_sharing=FORCE. To avoid the error, I included the /*+ cursor_sharing_exact*/ hint in the script’s SELECT statement. Setting cursor_sharing=EXACT at the session-level is also a valid alternative.
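
For example, as a session-level alternative to editing the script, you can simply run the following before invoking it:

SQL> ALTER SESSION SET cursor_sharing = EXACT;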

SQL> spool load_trends.txt
SQL> @statspack_load_trends.sql

First, we provide the DBID and instance number we want to analyze. If we don’t provide an instance number, all of the instances for the provided DBID are considered:

Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id   |Inst Num|DB Name     |Instance    |Host
-----------|--------|------------|------------|-------------
 1558102526|       1|ORCL        |orcl1       |ol7-122-rac1
 1558102526|       2|ORCL        |orcl2       |ol7-122-rac2


Enter DBID to analyze - default "1558102526" :   /* enter DBID to analyze */
Enter instance number or "all" to analyze all instances for DBID = 1558102526 - default "all" : /* report data for a specific RAC instance or all of them */

Next, we provide a time range to analyze.

5. Sample Output

Let's check what the output looks like. Due to its width, and to fit the blog format, it's segmented into several sections.

  • “CPU #Cores;#Threads”: The number of CPU cores and threads (in case of enabled hyperthreading) reported by the OS.
  • “Tot CPU Time Avail [Cores] (s)”: The number of CPU seconds available between the two analyzed snapshots based on the number of CPU cores.
Instance|Snap ID        |Begin Snap     |End Snap       |Elapsed|            CPU|     Tot CPU Time|
  Number|start-end      |Time           |Time           |   Mins|#Cores;#Threads|Avail [Cores] (s)|
--------|---------------|---------------|---------------|-------|---------------|-----------------|
       1|19195-19196    |16-APR-20 17:00|16-APR-20 18:00|  59.98|         24; 48|        86,376.00|
        |19196-19197    |16-APR-20 18:00|16-APR-20 19:00|  60.00|         24; 48|        86,400.00|
        |19197-19198    |16-APR-20 19:00|16-APR-20 20:00|  60.00|         24; 48|        86,400.00|
        |19198-19199    |16-APR-20 20:00|16-APR-20 21:00|  60.02|         24; 48|        86,424.00|
        |19199-19200    |16-APR-20 21:00|16-APR-20 22:00|  59.98|         24; 48|        86,376.00|

Note: One hour between snapshots is probably excessive, but that’s beyond the scope of this post.

Time Model Statistics: stats$sys_time_model

The next section reports the time spent in the database in the form of Average Active Sessions (AAS). For completeness, and to better understand what the figures represent, here's how the various statistics from stats$sys_time_model are related:

                DB time = CPU + Wait time spent by foreground sessions
background elapsed time = CPU + Wait time spent by background sessions
                 DB CPU = CPU time spent by foreground sessions
    background cpu time = CPU time spent by background sessions

Considering the above, we can calculate the AAS figures as follows:

       Total AAS = ("DB time" + "background elapsed time")/elapsed_time
Total AAS on CPU = ("DB CPU" + "background cpu time")/elapsed_time

FG AAS = "DB time" / elapsed_time
BG AAS = "background elapsed time" / elapsed_time

FG AAS on CPU = "DB CPU" / elapsed_time
BG AAS on CPU = "background cpu time" / elapsed_time

Total AAS in wait = ("Total AAS" - "Total AAS on CPU") / elapsed_time

FG AAS in wait = ("DB time" - "DB CPU") / elapsed_time
BG AAS in wait = ("background elapsed time" - "background cpu time") / elapsed_time
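To make the calculation concrete, here's a minimal sketch of the "FG AAS" figure for a single snapshot pair. It assumes the repository tables stats$sys_time_model (cumulative values, in microseconds) and stats$time_model_statname, and a repository holding a single DBID:

select (e.value - b.value) / 1e6 /
       ((es.snap_time - bs.snap_time) * 86400) as fg_aas   -- snap_time is a DATE
  from stats$snapshot bs,
       stats$snapshot es,
       stats$sys_time_model b,
       stats$sys_time_model e,
       stats$time_model_statname n
 where bs.snap_id = :begin_snap_id
   and es.snap_id = :end_snap_id
   and es.dbid = bs.dbid
   and es.instance_number = bs.instance_number
   and b.snap_id = bs.snap_id and b.dbid = bs.dbid and b.instance_number = bs.instance_number
   and e.snap_id = es.snap_id and e.dbid = es.dbid and e.instance_number = es.instance_number
   and n.stat_name = 'DB time'
   and b.stat_id = n.stat_id
   and e.stat_id = n.stat_id;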

Columns reporting CPU-related figures display two values: the "usual" AAS value, and the "core-normalized Average Active Sessions" value, labeled with the acronym "NPC." If the core-normalized value approaches (or even crosses) the value of "1," the system could potentially be CPU saturated:

  • “AAS [FG+BG]”: The number of AAS considering foreground and background sessions.
  • “AAS [FG]”: The number of AAS considering foreground sessions only.
  • “AAS [BG]”: The number of AAS considering background sessions only.

Continuing with the CPU component of the three metrics mentioned above:

  • “AAS on CPU [FG+BG]”: The number of AAS on CPU considering foreground and background sessions, followed by core-normalized AAS on CPU (NPC).
  • “AAS on CPU [FG]”: Same as above, though only considering foreground sessions.
  • “AAS on CPU [BG]”: Same as above, though only considering background sessions.

The “AAS wait” figures follow the same principle as the above-mentioned columns, and “AAS RMAN CPU” represents AAS performing “RMAN CPU time (backup/restore)” activities.

Returning to our sample output:

      AAS|      AAS|      AAS|AAS on CPU| AAS on CPU|AAS on  |AAS on CPU|AAS on  |AAS on CPU| AAS wait| AAS wait| AAS wait|             |
  [FG+BG]|     [FG]|     [BG]|[FG+BG]   |[FG+BG] NPC|CPU [FG]|  [FG] NPC|CPU [BG]|[BG] NPC  |  [FG+BG]|     [FG]|     [BG]|AAS RMAN CPU |
---------|---------|---------|----------|-----------|--------|----------|--------|----------|---------|---------|---------|-------------|
     96.4|     94.3|      2.0|       8.5|        0.4|     8.3|       0.3|     0.2|       0.0|     87.9|     86.0|      1.9|          0.0|
     32.9|     31.6|      1.3|      10.3|        0.4|    10.1|       0.4|     0.2|       0.0|     22.5|     21.4|      1.1|          0.0|
     59.4|     58.9|      0.6|      23.3|        1.0|    23.2|       1.0|     0.1|       0.0|     36.2|     35.7|      0.5|          0.0|
     13.3|     12.9|      0.5|       5.8|        0.2|     5.7|       0.2|     0.1|       0.0|      7.5|      7.1|      0.4|          0.0|
     23.0|     22.2|      0.8|       6.0|        0.3|     5.9|       0.2|     0.1|       0.0|     17.0|     16.3|      0.7|          0.0|

The first line reports 94.3 foreground AAS, of which only 8.3 were on CPU and 86 were in various waits. Looking at the third line, the situation changes: out of 58.9 foreground AAS, 23.2 were on CPU and 35.7 in waits. Checking the per-CPU-core-normalized value, we see it reports 1, which means the machine might be approaching, or has already crossed, CPU saturation. We also see that there was no RMAN activity during that time. Background processes also spent most of their time in waits, rather than on CPU.

For convenience, the script also displays the number of seconds consumed by foreground sessions, breaking them down into CPU and wait components and reporting the relative percentages. This is basically the same information we saw in the previous section, just expressed as time instead of AAS:

  DB Time (s)            DB CPU (s)                      |
[FG CPU+WAIT]  =           [FG CPU] +          [FG WAIT] |
---------------------------------------------------------|
    339,555.62 =      29,924.51  9% +     309,631.11 91% |
    113,683.70 =      36,469.52 32% +      77,214.18 68% |
    211,880.46 =      83,404.47 39% +     128,475.99 61% |
     46,325.13 =      20,692.78 45% +      25,632.35 55% |
     79,966.07 =      21,274.38 27% +      58,691.69 73% |
OS Statistics: stats$osstat

Next, OS statistics from stats$osstat are displayed. "Tot OS Load@end_snap" is the OS load recorded at the time of the end snapshot's creation. The other four columns represent Average Active Processes (AAP), which is simply the measured time of each named statistic divided by the elapsed time in the observed period. As above, the per-core normalized value is also reported for the BUSY statistic (the sum of USER+SYS), and the meaning is the same: if the value approaches 1, the system might be CPU saturated.
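(A note on units: the underlying stats$osstat values such as BUSY_TIME, USER_TIME, and SYS_TIME are, as in V$OSSTAT, expressed in hundredths of a second, so AAP OS BUSY is essentially the BUSY_TIME delta divided by 100 and then by the elapsed seconds.)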

In our sample report, the third line shows 23.9 processes on CPU, or 1 per CPU core (that's considering all OS processes, not only Oracle's). That also correlates with the "AAS on CPU [FG+BG]" figure in the third line of the snippet above. Because in this particular case the machine is dedicated to one Oracle instance, it used all of the available CPU:

       Tot OS|           |AAP OS  |   AAP OS|   AAP OS|   AAP OS|
Load@end_snap|AAP OS BUSY|BUSY NPC|     USER|      SYS|   IOWAIT|
-------------|-----------|--------|---------|---------|---------|
         37.0|        9.2|     0.4|      8.2|      0.9|     12.1|
         76.8|       10.8|     0.4|      9.8|      0.9|      5.3|
          9.4|       23.9|     1.0|     22.9|      0.9|      2.3|
          4.3|        6.2|     0.3|      5.7|      0.5|      1.4|
          4.8|        6.4|     0.3|      5.6|      0.7|      4.7|
System Statistics: stats$sysstat

Finally, stats$sysstat reports various system statistics. I won't describe their meaning, because that's beyond the scope of this post. It's worth noting that apart from "Logons Current," almost all other statistics are expressed in units of work per second. The only exceptions are statistics related to parallel operations. Because their usage usually pertains to "heavy-duty" DDL/DML tasks, we don't expect many such operations per second; thus, the whole snapshot interval seems a more appropriate time frame for reporting the number of occurrences of such events.

    Logons|        |      User|            |SQL*Net roundtrips|SQL*Net roundtrips|   Bytes received via|     Bytes sent via|   Bytes received via|
   Current|Logons/s|   calls/s|  Executes/s|  to/from client/s|  to/from dblink/s|SQL*Net from client/s|SQL*Net to client/s|SQL*Net from dblink/s|
----------|--------|----------|------------|------------------|------------------|---------------------|-------------------|---------------------|
     556.0|     9.5|     872.9|       692.2|             723.0|           4,575.6|          1,846,238.6|        6,305,967.2|          1,177,004.3|
     527.0|    16.2|   1,008.0|       639.4|             828.5|           5,773.2|          2,462,067.1|        7,760,807.5|          1,453,024.0|
     607.0|    18.5|     738.8|       588.3|             556.1|           5,618.1|          1,986,647.1|        3,644,026.9|          1,448,627.4|
     427.0|     9.2|     873.3|       910.0|             716.4|           5,972.3|          2,691,244.6|        4,067,039.1|          1,532,389.7|
     418.0|     7.4|     719.9|       627.8|             588.5|           7,471.6|          2,564,916.7|        3,773,344.1|          1,852,806.9|

   
     Bytes sent via|Cluster wait|Session logical|      DB block|    Consistent| Consistent reads |  Physical|  Physical|Physical read|Physical write|
SQL*Net to dblink/s|      time/s|        reads/s|     changes/s|   changes/sec|undo rec applied/s|   reads/s|  writes/s|IO requests/s| IO requests/s|
-------------------|------------|---------------|--------------|--------------|------------------|----------|----------|-------------|--------------|
          576,510.8|         0.0|      339,009.8|      31,353.8|       3,062.0|           4,002.4|  47,349.4|   1,879.1|      2,621.5|         448.5|
          726,935.7|         0.0|      487,469.9|      48,874.4|         487.3|             563.9|  31,277.7|   2,127.6|      5,021.6|         526.8|
          707,648.6|         0.0|      343,665.8|      38,862.0|         379.4|             362.9|  37,057.7|     777.7|      1,949.2|         265.2|
          751,698.7|         0.0|      288,724.2|      26,163.7|         618.0|             435.8|  14,001.6|     823.3|        828.5|         274.1|
          940,096.3|         0.0|      335,631.4|      24,500.0|         198.5|             211.9|  53,625.8|     638.2|      2,451.6|         227.5|
     


    Parses|      Hard|     Parse|     Parse|      User|       User|     Redo size|          Redo|      Rollback changes|
   total/s|  parses/s|describe/s|failures/s| commits/s|rollbacks/s|       bytes/s|      writes/s|undo records applied/s|
----------|----------|----------|----------|----------|-----------|--------------|--------------|----------------------|
     142.0|     103.9|       0.0|       0.0|     158.2|        0.8|  46,951,095.1|         137.8|                   0.1|
     143.4|     100.1|       0.0|       0.3|     155.0|        0.8|  49,017,168.1|         170.3|                   1.2|
     135.9|      89.2|       0.0|       0.1|     143.3|        0.8|  11,513,858.2|         149.7|                   0.1|
     141.1|     109.8|       0.0|       0.0|     284.4|        0.8|   9,513,089.1|         226.2|                   0.1|
     123.0|      93.4|       0.0|       0.0|     175.5|        0.9|   7,462,206.6|         169.3|                   0.3|
     

     
           Queries|    DML statements|     PX oper not|PX oper downgraded|PX oper downgraded|PX oper downgraded|PX oper downgraded|PX oper downgraded	   
parallelized/Total|parallelized/Total|downgraded/Total|   to serial/Total|75 to 99 pct/Total|50 to 75 pct/Total|25 to 50 pct/Total| 1 to 25 pct/Total	   
------------------|------------------|----------------|------------------|------------------|------------------|------------------|------------------	   
           1,912.0|               3.0|         1,989.0|              38.0|               0.0|               1.0|               4.0|               1.0	   
           2,450.0|               6.0|         2,551.0|              10.0|               0.0|               0.0|               0.0|               0.0	   
           2,477.0|              13.0|         2,584.0|               9.0|               0.0|               0.0|               0.0|               0.0	   
           1,553.0|               3.0|         1,646.0|               9.0|               0.0|               0.0|               0.0|               0.0	   
           1,390.0|               2.0|         1,487.0|               8.0|               0.0|               0.0|               0.0|               0.0	   
6. Visualizing Results

When comparing a period where there was an issue to one where the database was running fine, or when simply checking for trends, it's more convenient to plot the results. That's an easy way to get an overview of how certain metrics changed over time, or how they compare across nodes in a RAC database.

To ease that task, the two scripts contain two blocks of column formatting commands: one for plotting/charting purposes, and one displaying column headings on two lines for a more user-friendly format (the format used in the descriptions above). Depending on your needs, uncomment the appropriate block in the script. You can plot results using a third-party utility, such as Tableau, which was used to produce the graphs in the following sections.
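For illustration, the difference between the two blocks boils down to SQL*Plus COLUMN commands along these lines (the column name here is hypothetical, not the script's actual one):

-- plotting-friendly: single-line heading, easier for charting tools to parse
column aas_fg_bg heading 'AAS_FG_BG'

-- user-friendly: the "|" wraps the heading onto two lines
column aas_fg_bg heading 'AAS|[FG+BG]'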

CPU Usage Distribution and Limits Across Nodes

The graph below represents the average number of OS processes on CPU, normalized to the CPU core count, for each of the three nodes of a RAC system. As noted above, when the normalized value per core crosses "1," the host might be oversubscribed on CPU. Nodes 2 and 3 are usually below 1; however, the spikes in usage on node 1 might require further investigation. There also seems to be an imbalance in CPU usage across the nodes:

Number of OS processes on CPU, normalized to the CPU core count

Breakdown of Foreground and Background Sessions on CPU and Wait Components, Expressed as AAS

When investigating a problematic time period, we can quickly get a high-level overview of the relation between CPU and waits experienced by foreground and background sessions:

Average Active Sessions breakdown to foreground and background CPU and wait components

Observing Waits After Applying a Fix to Reduce Them

After applying "a fix" on April 16th, the time spent waiting by foreground sessions decreased substantially. CPU demand also decreased:

Foreground CPU vs waits

 

Comparing the Number of AAS on CPU Normalized to CPU Core Count at the OS and DB Level on a Three-Node RAC DB

The observed DB isn't using all, or even most, of the CPU available on the nodes; "something else" is using it at the OS level. That's visible on the graph for node 1 on May 6th between 7 and 9 PM, where CPU usage at the OS level increased, but the same wasn't true for DB sessions. Additionally, because the normalized per-CPU-core value is displayed, we can see that node 1 crosses the value of 1 quite often:

Comparison of Average Active Sessions between OS and DB level

Logons per Second and Number of Logged-in Sessions per Node

RAC nodes 1 and 2 already have a relatively high number of logons per second at ~5, whereas node 3 is even higher at ~23. Additionally, there's a large imbalance in the number of established sessions on node 3 compared to nodes 1 and 2. Because each node has 8 physical cores (not visible from the graphs below), the ~2,500 established sessions represent a potential risk should too many of them try to become active at the same time. Overall, it seems a connection pooling review is in order for this system:

Session trends

A final example, from a different server but still related to connection management: detecting a large drop in the number of established sessions and correlating it with the number of logins per second:

Session trends

7. Other Mining Tools

Some time ago, Maris Elsins published a post describing a set of handy scripts to mine the AWR repository. Make sure to check it out!

To conclude, here's a list of some free mining utilities. These were mostly created for AWR, but some cover Statspack. Some of them parse a pre-created set of AWR/Statspack reports; others connect directly to the database and extract/analyze data from there. The script presented in this post might not offer the same functionality as those utilities; however, for some of my use cases, it complemented them by providing a customized set of pre-calculated, workload-related figures.

Categories: DBA Blogs

Query Statspack’s “SQL ordered by” sections over a time period

Fri, 2020-07-31 09:26

In my previous blog post <link>, I presented the statspack_load_trends.sql script, which provides a high-level overview of workload dynamics on DB instance(s) over time. In this post, I’ll present the statspack_top_sqls.sql script, which returns SQL performance statistics by mining the Statspack repository. You can download it <here>.

The script provides the same SQLs, with the same performance statistics, as Statspack's various "SQL ordered by" sections. However, it reports the figures from all categories on a single line and, more importantly, does so over a time range, considering each available consecutive pair of snapshots. Thus, it provides a way to quickly identify the SQLs contributing most to a particular "SQL ordered by" category over a time period. Alternatively, we can check the behavior of specific SQL(s) over time.

Its core logic is based on Oracle's $ORACLE_HOME/rdbms/admin/sprepins.sql (StatsPack Report Instance) script, but it doesn't rely on its existence to run. As with statspack_load_trends.sql, it queries the Statspack repository directly. It doesn't create or use any (temporary) objects, not even the global temporary table that sprepins.sql uses when producing the report. We can also use it to analyze a repository imported from another DB, and it handles periods spanning instance restart(s).
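The underlying delta logic can be sketched as follows; a minimal example, assuming the cumulative elapsed_time column of stats$sql_summary (in microseconds), not the script's actual query:

select snap_id,
       old_hash_value,
       elapsed_time - lag(elapsed_time) over
         (partition by dbid, instance_number, old_hash_value
          order by snap_id) as elapsed_time_delta
  from stats$sql_summary;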

Important note in case you want to compare the results with Statspack reports

While testing the script by comparing its output to regular Statspack reports (created by running $ORACLE_HOME/rdbms/admin/spreport.sql or sprepins.sql), I noticed that sometimes the numbers in the "SQL ordered by" sections didn't match between the two. Examples include SQLs reported by my script but not by Statspack reports, or Statspack reports listing the same SQL (same hash value) multiple times in the same "SQL ordered by" section.
The root cause of those anomalies is described in the MOS note "Statspack Reports Show Different Results In "SQL ordered by …" When Taken At Different Times with Same Snapid (Doc ID 2258762.1)": "When more than one statspack reports are taken repeatedly in the same session by non-perfstat user, some data might get mixed up in the temporary table used by the reporting script, and the result may get corrupted."

In my case, the problem was caused by not connecting as the owner of the Statspack repository (usually PERFSTAT) when generating multiple consecutive reports in a loop over a snapshot range. The same was true when creating a single Statspack report.
The takeaway is to always connect as the Statspack repository owner when running spreport.sql, especially if you use any helper scripts that generate Statspack reports for a series of snapshots.

Usage

Starting the script

Let's see the script in action analyzing a sample Swingbench run on a two-node RAC database. The output is wide, so I suggest spooling it to a file for easier viewing/plotting:

SQL> spool top_sqls.txt
SQL> @statspack_top_sqls.sql

List SQL by [elapsed_time | cpu_time | buffer_gets | disk_reads | executions | parse_calls | max_sharable_mem | max_version_count | cluster_wait_time]:

Enter a value - default "elapsed_time" :

First, we specify the category by which we want the SQLs ordered. We can choose one of the above-listed possibilities, which are the same categories displayed in Statspack's "SQL ordered by" sections. The script reports the same SQLs, in the same order, as they appear in the selected Statspack report category.

Suppose we want to order SQLs by "cpu_time," and the corresponding Statspack report lists 10 SQLs in the "SQL ordered by CPU" section. The script lists the same ones. However, the added benefit of the script is that it reports values the Statspack report doesn't display. For example, the "SQL ordered by CPU" section doesn't display the "Physical Reads" statistic; the "SQL ordered by Elapsed time" section does. If a SQL doesn't qualify for the "SQL ordered by Elapsed time" section, we won't get those values from the Statspack report.

Next, we provide the DBID and instance number we want analyzed. If we don't provide an instance number, the script considers all instances present in the repository:

Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id    Inst Num DB Name      Instance     Host
----------- -------- ------------ ------------ --------------------------------
 1558102526        1 ORCL         orcl1        ol7-122-rac1.localdomain
 1558102526        2 ORCL         orcl2        ol7-122-rac2.localdomain


Enter DBID to analyze - default "1558102526" :
Enter instance number or "all" to analyze all instances for DBID = 1558102526 - default "all" :

Finally, we specify the time range we’d like to analyze:

Enter begin time for report [DD-MON-YYYY HH24:MI] - default "30-APR-2020 10:54" : 15-FEB-2020 12:30
Enter end time for report [DD-MON-YYYY HH24:MI] - default "30-APR-2020 22:54" : 15-FEB-2020 13:00
Script output

Let's check what the script output looks like. Because of its width, I've broken it up into several sections. The whole spool file, and the respective Statspack reports for comparison, can be obtained here.

Since the original spool file contains 280 lines, I'm showing an abbreviated version of the first 8 columns. The full output lists the SQLs between each snapshot pair. Since we didn't specify an instance number, both instances are considered. Additionally, it's normal and expected that consecutive snapshots don't have consecutive numbers: snapshots 4 and 21 are two consecutive snapshots on instance 1, as we can confirm by checking their snap times. The other columns are self-explanatory:

INSTANCE_NUMBER  B_SNAP_ID  E_SNAP_ID B_SNAP_TIME        E_SNAP_TIME        INTERVAL_MIN  DBTIMEMIN        AAS
--------------- ---------- ---------- ------------------ ------------------ ------------ ---------- ----------
              1          1          2 15-FEB-20 12:30:00 15-FEB-20 12:40:00           10       2.30       0.23
<removed 16 lines listing SQLs>
                         2          3 15-FEB-20 12:40:00 15-FEB-20 12:50:00           10       1.60       0.16
<removed 25 lines listing SQLs>
                         3          4 15-FEB-20 12:50:00 15-FEB-20 13:00:00           10       1.55       0.15
<etc>
                         4         21 15-FEB-20 13:00:00 15-FEB-20 13:10:00           10       1.66       0.17
                        21         22 15-FEB-20 13:10:00 15-FEB-20 13:20:00           10       1.30       0.13
                        22         23 15-FEB-20 13:20:00 15-FEB-20 13:30:00           10       1.18       0.12
              2         11         12 15-FEB-20 12:30:00 15-FEB-20 12:40:00           10       3.81       0.38
                        12         13 15-FEB-20 12:40:00 15-FEB-20 12:50:00           10       2.70       0.27
                        13         14 15-FEB-20 12:50:00 15-FEB-20 13:00:00           10       2.50       0.25
                        14         15 15-FEB-20 13:00:00 15-FEB-20 13:10:00           10       2.94       0.29
                        15         16 15-FEB-20 13:10:00 15-FEB-20 13:20:00           10       2.18       0.22
                        16         17 15-FEB-20 13:20:00 15-FEB-20 13:30:00           10       1.98       0.20

 

Let's check an excerpt of the output for snapshots 1-2 and 2-3. Apart from the "HV" column (the SQL old hash value), the columns are self-explanatory. For brevity, I'm showing only the first 10 SQLs per snapshot pair.

B_SNAP_ID  E_SNAP_ID         HV ELAPSED_TIME_SEC EXECUTIONS ELAPSED_PER_EXEC_SEC PERCENT_OF_DBTIME_USED CPU_TIME_SEC CPU_TIME_MS_PER_EXEC
--------- ---------- ---------- ---------------- ---------- -------------------- ---------------------- ------------ --------------------
        1          2 3565022785            80,55        483                  ,17                  58,47        14,63                 30,3
                     2319948924             55,8       5931                  ,01                  40,51         7,69                  1,3
                     1852190137            14,22       1024                  ,01                  10,32         7,75                 7,57
                     1113394757             8,17      12332                    0                   5,93         2,97                  ,24
                     4194254847              6,4        483                  ,01                   4,64          ,84                 1,73
                     1283549268             4,55        169                  ,03                    3,3          ,89                 5,28
                     2588369535             4,21         24                  ,18                   3,06         1,12                46,55
                     4212635381             4,18         24                  ,17                   3,04         1,09                45,39
                     4219272024             3,97       1396                    0                   2,88          ,86                  ,62
                     2835506982             3,74        173                  ,02                   2,71          ,57                 3,32
                     (..)                                                                                                                                         
        2          3 3565022785            46,93        956                  ,05                  48,79        18,87                19,73
                     2319948924            22,85      11550                    0                  23,75         7,52                  ,65
                     1852190137            15,35       2158                  ,01                  15,95        11,98                 5,55
                     1283549268             6,36        380                  ,02                   6,61         1,65                 4,33
                     2835506982                6        377                  ,02                   6,24         1,03                 2,72
                     1822227481             5,32       7742                    0                   5,53         1,26                  ,16
                     4194254847             4,69        957                    0                   4,87         1,22                 1,28
                     3463613875             4,61        380                  ,01                   4,79          ,62                 1,62
                     1113394757             4,07      25794                    0                   4,23         3,28                  ,13
                     4219272024             3,89       2945                    0                   4,04         1,43                  ,49
                     (..)
B_SNAP_ID  E_SNAP_ID         HV PHYSICAL_READS PHYSICAL_READS_PER_EXECUTION BUFFER_GETS GETS_PER_EXECUTION ROWS_PROCESSED ROWS_PROCESSED_PER_EXECUTION PARSE_CALLS
--------- ---------- ---------- -------------- ---------------------------- ----------- ------------------ -------------- ---------------------------- -----------
        1          2 3565022785           5860                        12,13      261329             541,05            483                            1         483
                     2319948924           4614                          ,78      205925              34,72          26467                         4,46           1
                     1852190137            394                          ,38      131100             128,03           1024                            1        1025
                     1113394757            336                          ,03      124291              10,08         172648                           14           1
                     4194254847            396                          ,82        7760              16,07            483                            1           1
                     1283549268            262                         1,55        6188              36,62            169                            1         169
                     2588369535             76                         3,17       13104                546             24                            1          24
                     4212635381             76                         3,17       13104                546            737                        30,71           1
                     4219272024            167                          ,12       19979              14,31           1396                            1           1
                     2835506982            255                         1,47        3579              20,69            173                            1         173
                                                                                                                                                                  
        2          3 3565022785           1138                         1,19      511742             535,29            956                            1         957
                     2319948924            487                          ,04      402425              34,84          51879                         4,49           0
                     1852190137            164                          ,08      274493              127,2           2158                            1        2157
                     1283549268            303                           ,8       13726              36,12            380                            1         380
                     2835506982            448                         1,19        7218              19,15            377                            1         377
                     1822227481            259                          ,03       23226                  3           7742                            1           0
                     4194254847            150                          ,16       14371              15,02            957                            1           0
                     3463613875            298                          ,78        5844              15,38            380                            1           0
                     1113394757              0                            0      260084              10,08         361116                           14           0
                     4219272024             76                          ,03       42277              14,36           2945                            1           0
B_SNAP_ID  E_SNAP_ID         HV MAX_SHARABLE_MEM_KB LAST_SHARABLE_MEM_KB MAX_VERSION_COUNT LAST_VERSION_COUNT DELTA_VERSION_COUNT CLUSTER_WAIT_TIME_SEC
--------- ---------- ---------- ------------------- -------------------- ----------------- ------------------ ------------------- ---------------------
        1          2 3565022785               55,39                55,39                 1                  1                   1                 13,67
                     2319948924               47,44                47,44                 1                  1                   1                  9,38
                     1852190137               55,38                55,38                 1                  1                   1                  2,65
                     1113394757               43,41                43,41                 1                  1                   1                  2,17
                     4194254847               47,64                47,64                 1                  1                   1                  1,26
                     1283549268               59,46                59,46                 1                  1                   1                   ,83
                     2588369535               55,38                55,38                 1                  1                   1                  2,21
                     4212635381               35,51                35,51                 1                  1                   1                  2,21
                     4219272024               27,42                27,42                 1                  1                   1                  1,15
                     2835506982               55,38                55,38                 1                  1                   1                   ,39
                                                                                                                                                       
        2          3 3565022785              113,27               113,27                 2                  2                   1                 15,91
                     2319948924               47,44                47,44                 1                  1                   0                     9
                     1852190137              113,25               113,25                 2                  2                   1                  1,32
                     1283549268              121,42               121,42                 2                  2                   1                  1,42
                     2835506982              113,27               113,27                 2                  2                   1                   ,72
                     1822227481                31,5                 31,5                 1                  1                   0                   1,5
                     4194254847               47,64                47,64                 1                  1                   0                  1,76
                     3463613875               59,63                59,63                 1                  1                   0                   ,81
                     1113394757               43,41                43,41                 1                  1                   0                   ,41
                     4219272024               27,42                27,42                 1                  1                   0                  1,77
B_SNAP_ID  E_SNAP_ID         HV CWT_PERCENT_OF_ELAPSED_TIME AVG_HARD_PARSE_TIME_MS MODULE                      SQL_TEXT
--------- ---------- ---------- --------------------------- ---------------------- --------------------------- ----------------------------------------------------------------
        1          2 3565022785                       16,97                        JDBC Thin Client            BEGIN :1 := orderentry.neworder(:2 ,:3 ,:4 ); END;
                     2319948924                        16,8                        New Order                   SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     1852190137                       18,61                                                    BEGIN :1 := orderentry.browseproducts(:2 ,:3 ,:4 ); END;
                     1113394757                       26,55                        Browse Products             SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     4194254847                       19,72                        New Order                   INSERT INTO ORDERS ( ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_
                     1283549268                       18,16                                                    BEGIN :1 := orderentry.newcustomer(:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:
                     2588369535                       52,52                                                    BEGIN :1 := orderentry.SalesRepsQuery(:2 ,:3 ,:4 ); END;
                     4212635381                        52,9                        Sales Rep Query             SELECT TT.ORDER_TOTAL, TT.SALES_REP_ID, TT.ORDER_DATE, CUSTOMERS
                     4219272024                       28,85                        New Order                   INSERT INTO ORDER_ITEMS ( ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UN
                     2835506982                       10,53                                                    BEGIN :1 := orderentry.browseandupdateorders(:2 ,:3 ,:4 ); END;
                                                                                                              
        2          3 3565022785                       33,89                        JDBC Thin Client            BEGIN :1 := orderentry.neworder(:2 ,:3 ,:4 ); END;
                     2319948924                       39,39                        New Order                   SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     1852190137                        8,63                                                    BEGIN :1 := orderentry.browseproducts(:2 ,:3 ,:4 ); END;
                     1283549268                       22,26                                                    BEGIN :1 := orderentry.newcustomer(:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:
                     2835506982                       12,06                                                    BEGIN :1 := orderentry.browseandupdateorders(:2 ,:3 ,:4 ); END;
                     1822227481                       28,14                        New Order                   SELECT CUSTOMER_ID, CUST_FIRST_NAME, CUST_LAST_NAME, NLS_LANGUAG
                     4194254847                       37,54                        New Order                   INSERT INTO ORDERS ( ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_
                     3463613875                       17,49                        Update Customer Details     INSERT INTO CUSTOMERS ( CUSTOMER_ID , CUST_FIRST_NAME , CUST_LAS
                     1113394757                        9,98                        Browse Products             SELECT PRODUCTS.PRODUCT_ID, PRODUCT_NAME, PRODUCT_DESCRIPTION, C
                     4219272024                       45,53                        New Order                   INSERT INTO ORDER_ITEMS ( ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UN

 

Final note

Oracle's sprepins.sql script has a /*+ first_rows */ hint in the inline view containing the analytic functions used to compute the current and previous row values from the stats$sql_summary table. The hint is present, but not enabled, in statspack_top_sqls.sql. If for some reason you need to re-enable it, just search for it in the script and enable it (or use its alternative, first_rows(0)).

Categories: DBA Blogs

How to Manage the ORA-02292 Error Through the SQLCODE Function

Sat, 2020-07-25 19:16

ORA-02292 is a well-known error caused by existing foreign key constraints: it's raised when a record is deleted from a parent table while a child table still has records referencing it.

02292, 00000,"integrity constraint (%s.%s) violated - child record found"
// *Cause: attempted to delete a parent key value that had a foreign
//      key dependency.
// *Action: delete dependencies first then parent or disable constraint.

Once you remove the child row(s), the issue is fixed and you can successfully remove the parent row.
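In other words, the usual fix follows the error's suggested action: delete the child rows first, then the parent. A minimal sketch, using the orders/products tables created in the example below:

delete from products where order_id = 1;  -- child rows first
delete from orders   where order_id = 1;  -- then the parent row
commit;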

One of our clients is experiencing this error when trying to run a procedure through a scheduled job. Intermittently, the first run of the scheduled job fails with ORA-02292: “Integrity constraint (%s.%s) violated – child record found.” However, subsequent job execution succeeds.

There are two ways to fix this issue. The first is to modify the procedure code to remove the child row(s) and then remove the parent row. The second is to suppress the alert written to the alert log when the job fails only the first time. You can do this with the SQLCODE function.

The SQLCODE function returns the code of the most recent exception and, based on that code, the procedure can handle the error and avoid unnecessary alerts. If the same job fails twice or more in a row, the error is recorded in the alert log and an alert is generated (through the monitoring tool).

The code below shows you how to handle the ORA-02292 error.

  1. Create parent and child tables with data:
--Parent table
create table orders (order_id int, total_sales int);
CREATE UNIQUE INDEX orders_pk on orders (order_id);
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id) ENABLE;

--Child table
CREATE TABLE products
( product_id numeric(10) not null,
supplier_id numeric(10) not null,
order_id int,
CONSTRAINT fk_orders
FOREIGN KEY (order_id)
REFERENCES orders(order_id)
);

insert into orders values (1,1);
insert into products values (1,1,1);
commit;
select * from orders;
select * from products;

2. Create a table to capture how many times ORA-02292 errors occurred:

CREATE TABLE tcount (x int);

3. Create the procedure that deletes rows from the parent table and ignores the first occurrence of ORA-02292:

CREATE OR REPLACE PROCEDURE delete_order
  (order_id_in IN NUMBER)
IS
  vtest2292 int := 0;
BEGIN
  -- delete the requested parent row
  delete from orders where order_id = order_id_in;
  commit;
  -- if the delete runs fine, reset the consecutive-failure counter
  EXECUTE IMMEDIATE ('truncate table tcount');

EXCEPTION
  WHEN OTHERS THEN
    DECLARE
      error_code NUMBER := SQLCODE;
    BEGIN
      IF error_code = -2292 THEN
        DBMS_OUTPUT.PUT_LINE('ERROR 2292!!!!!!!!!!!!!'); -- error found
        -- record this failure and count the consecutive occurrences
        insert into tcount values (1);
        commit;
        select count(1) into vtest2292 from tcount;

        -- two or more consecutive errors found: re-raise so the failure surfaces
        IF vtest2292 >= 2 THEN
          DBMS_OUTPUT.PUT_LINE('ERROR 2292 >= X2!!!!!!!!!!!!!!');
          raise_application_error (-20001,'Attempted to delete a parent key value that had a foreign key dependency');
        END IF;
      ELSE
        raise_application_error (-20002,'An ERROR has occurred deleting an order.');
      END IF;
    END;
END;
/

4. Create a scheduled job to run the procedure; if the error occurs twice or more consecutively, an alert appears in the alert log:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'job_delete_order',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin delete_order(1); end;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=HOURLY;INTERVAL=1;BYMINUTE=0',
    enabled         => TRUE);
END;
/

5. By running the job job_delete_order, you can confirm that no errors are written to the Oracle alert log after the first execution:

set serveroutput on
exec delete_order(1);                                    -- first failure: ORA-02292 is caught and swallowed
select count(1) from tcount;                             -- returns 1
exec dbms_scheduler.run_job('job_delete_order',FALSE);   -- second consecutive failure: ORA-20001 is raised
select count(1) from tcount;                             -- returns 2 once the job has run
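Note how the counter resets: whenever a delete succeeds, the procedure truncates tcount, so the count of consecutive failures starts over. Only two or more ORA-02292 occurrences in a row raise ORA-20001 and surface in the alert log and monitoring tool; a single, intermittent failure stays silent.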
Categories: DBA Blogs

The Brand New Exadata X8M Deployment Process Revealed

Sun, 2020-07-05 09:41

Here we will see how the deployment process of the new Exadata X8M works.

 

RoCE issues from the factory

Exadata X8M servers come from the factory with the RoCE private network disabled. If the Field Engineer assigned to the physical setup of the Exadata didn't enable the RoCE network, it's your job to do so.

The RoCE network must be enabled on all compute nodes and on all storage servers.

In Exadata X8M the private network is not on InfiniBand switches anymore, but on RoCE (RDMA over Converged Ethernet) Fabric switches. The interface cards we see in the operating system are re0 and re1.

Checking the active interface cards, we cannot see re0 and re1:

[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54309  bytes 3744342 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279171  bytes 18019054 (17.1 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9553  bytes 1693920 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff
 
eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 31847  bytes 2396622 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136405  bytes 6139347 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136405  bytes 6139347 (5.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Most of the InfiniBand-related commands/tools no longer work, but ibstat still does, so we can use it to check the state of the private network:

[root@ex03db01 ~]# ibstat | grep -i 'state\|rate'
                State: Down
                Physical state: Disabled
                Rate: 100
                State: Down
                Physical state: Disabled
                Rate: 100

Checking the config of RoCE interface cards:

[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re0
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re0
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no
 
[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re1
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re1
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no

Bringing RoCE interface cards up:

[root@ex03db01 ~]# ifup re0
/sbin/ifup-local: /sbin/ifup-local re0:
/sbin/ifup-local:  + RoCE configuration...
/sbin/ifup-local:  + Matched (wildcard) interface re0.
/sbin/ifup-local:  + RoCE Configuration: /bin/roce_config -i re0...
 
NETDEV=re0; IBDEV=mlx5_0; PORT=1
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
 
Finished configuring "re0" ã½(â¢â¿â¢)ã
 
/sbin/ifup-local:  + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re0.
 
 
[root@ex03db01 ~]# ifup re1
/sbin/ifup-local: /sbin/ifup-local re1:
/sbin/ifup-local:  + RoCE configuration...
/sbin/ifup-local:  + Matched (wildcard) interface re1.
/sbin/ifup-local:  + RoCE Configuration: /bin/roce_config -i re1...
 
NETDEV=re1; IBDEV=mlx5_0; PORT=2
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
 
Finished configuring "re1" ã½(â¢â¿â¢)ã
 
/sbin/ifup-local:  + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re1.

Now we can see that the interfaces re0 and re1 are up, but with no IPs assigned:

[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54533  bytes 3767354 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279584  bytes 18051211 (17.2 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9727  bytes 1720009 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff
 
eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 32071  bytes 2419634 (2.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136804  bytes 6157123 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136804  bytes 6157123 (5.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
re0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 0c:42:a1:3b:45:12  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 0c:42:a1:3b:45:13  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

We can use ibstat again to confirm the interfaces are enabled:

[root@ex03db01 ~]# ibstat | grep -i 'state\|rate'
                State: Active
                Physical state: LinkUp
                Rate: 100
                State: Active
                Physical state: LinkUp
                Rate: 100
OEDA specifics

To start any Exadata deployment you need the OEDA configuration files, a set of files generated by the OEDA (Oracle Exadata Deployment Assistant) tool. OEDA is currently a web-based tool where the client fills in all the IP addresses and hostnames to be assigned to the new Exadata. Normally the client completes this step with the support of their network team.

Configuration files needed:

  • Clientname-clustername.xml
  • Clientname-clustername-InstallationTemplate.html
  • Clientname-clustername-preconf.csv

The OEDA tool for Linux is also needed; it can be downloaded as Patch ID 30640393. It's recommended to go with the latest version available, but if the configuration files were generated with a different/older version, go with that version to avoid warnings during the execution of the onecommand.

Stage the OEDA for Linux in /u01/onecommand/ and unzip it:

[root@ex03db01 ~]# mkdir -p /u01/onecommand/
[root@ex03db01 ~]# unzip -q p30640393_193800_Linux-x86-64.zip -d /u01/onecommand/
[root@ex03db01 ~]# cd /u01/onecommand/linux-x64

Once in the correct directory, run onecommand to list the steps, just to make sure it's working:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -l
 Initializing
 
1. Validate Configuration File
2. Setup Required Files
3. Create Users
4. Setup Cell Connectivity
5. Verify Infiniband
6. Calibrate Cells
7. Create Cell Disks
8. Create Grid Disks
9. Install Cluster Software
10. Initialize Cluster Software
11. Install Database Software
12. Relink Database with RDS
13. Create ASM Diskgroups
14. Create Databases
15. Apply Security Fixes
16. Install Autonomous Health Framework
17. Create Installation Summary
18. Resecure Machine
applyElasticConfig.sh preparation and execution

Technical background

applyElasticConfig.sh is a script, provided by Oracle within the OEDA, which performs the initial setup of the compute nodes and storage servers. By default, the script works with the factory IP range and hostnames, but we found a way to trick it into working even when the client has already changed the IP addresses and hostnames. The initial setup basically defines the network configuration: IP addresses, hostnames, DNS, and NTP. The script looks for nodes in the factory 172.x.x.x IP range, so if the client has already changed the IPs and hostnames, it won't find anything. It's worth mentioning that there's no documentation about this anywhere on docs.oracle.com. The closest references are:

  • Configuring Oracle Exadata Database Machine
  • ApplyElasticConfig failed during the execution of elasticConfig.sh (Doc ID 2175587.1)
  • Bug 23064772 OEDA: applyelasticconfig.sh fails with error unable to locate rack item with ulocation

Even though these documents briefly mention the applyElasticConfig.sh script, they don't explain how to overcome the issue when the IPs and hostnames were already changed.

Preparation

To make the script look for the servers after their hostnames and IPs were changed, you have to edit the es.properties file located under /u01/onecommand/linux-x64/properties. Change only the parameters related to the IPs, subnets, and hostnames; the variables we care about are ROCEELASTICNODEIPRANGE, ROCEELASTICILOMIPRANGE, ELASTICSUBNETS, and SKIPHOSTNAMECHECK. Set them to the ranges found in the Clientname-clustername-InstallationTemplate.html for each network:

  • ROCEELASTICNODEIPRANGE expects the range of IPs in the management network.
  • ROCEELASTICILOMIPRANGE expects the range of IPs of the ILOM of the servers.
  • ELASTICSUBNETS expects the subnet of the management network.
  • SKIPHOSTNAMECHECK defaults to false, so if the hostnames were also changed you want to set this to true.

Find some examples below:

[root@ex03db01 linux-x64]# cat properties/es.properties|grep ELASTIC
#ROCEELASTICNODEIPRANGE=192.168.1.1:192.168.1.99
ROCEELASTICNODEIPRANGE=10.201.84.190:10.201.84.206
ROCEELASTICILOMIPRANGE=10.201.84.196:10.201.84.201
ELASTICCONFIGMARKERFILE=/.elasticConfig
ELASTICRACKNAMES=x5,x6,sl6,x7,x8
QINQELASTICCONFIGMINVERION=20.1.0.0.0.200323
#ELASTICSUBNETS=172.16.2:172.16.3:172.16.4:172.16.5:172.16.6:172.16.7
ELASTICSUBNETS=10.201.84
 
[root@ex03db01 linux-x64]# grep SKIPHOST properties/es.properties
#SKIPHOSTNAMECHECK=false
SKIPHOSTNAMECHECK=true
Execution

Now that the es.properties ELASTIC* parameters match your infrastructure configuration, you're ready to execute the applyElasticConfig.sh script. To run it, just call the script, passing the Clientname-clustername.xml configuration file:

[root@ex03db01 linux-x64]# ./applyElasticConfig.sh -cf /root/config/Client-ex03.xml
 Applying Elastic Config...
 Discovering pingable nodes in IP Range of 10.201.84.190 - 10.201.84.206.....
 Found 6 pingable hosts..[10.201.84.193, 10.201.84.194, 10.201.84.195, 10.201.84.190, 10.201.84.191, 10.201.84.192]
 Validating Hostnames..
 Discovering ILOM IP Addresses..
 Getting uLocations...
 Getting Mac Addressess..
 Getting uLocations...
 Mapping Machines with local hostnames..
 Mapping Machines with uLocations..
 Checking if Marker file exists..
 Updating machines with Mac Address for 6 valid machines.
 Creating preconf..
 Writing host-specific preconf files..
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel02_preconf.csv for ex03cel02 ....
 Preconf file copied to ex03cel02 as /var/log/exadatatmp/firstconf/ex03cel02_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db01_preconf.csv for ex03db01 ....
 Preconf file copied to ex03db01 as /var/log/exadatatmp/firstconf/ex03db01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db03_preconf.csv for ex03db03 ....
 Preconf file copied to ex03db03 as /var/log/exadatatmp/firstconf/ex03db03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel03_preconf.csv for ex03cel03 ....
 Preconf file copied to ex03cel03 as /var/log/exadatatmp/firstconf/ex03cel03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel01_preconf.csv for ex03cel01 ....
 Preconf file copied to ex03cel01 as /var/log/exadatatmp/firstconf/ex03cel01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db02_preconf.csv for ex03db02 ....
 Preconf file copied to ex03db02 as /var/log/exadatatmp/firstconf/ex03db02_preconf.csv
 Running Elastic Configuration on ex03cel02.client.com
 Running Elastic Configuration on ex03db01.client.com
 Running Elastic Configuration on ex03db03.client.com
 Running Elastic Configuration on ex03cel03.client.com
 Running Elastic Configuration on ex03cel01.client.com
 Running Elastic Configuration on ex03db02.client.com

OEDA onecommand preparation and execution

Technical background

OEDA is a set of scripts, files, and a form we use to plan and deploy an Exadata. Sometimes we refer to it as the onecommand utility: with just one command, the install.sh script, we can deploy everything.

Preparation

To be able to run the install.sh script, a few things must be prepared in the environment first. The prerequisites are:

  • The switches must have been already set up by the Field Engineer responsible for the physical installation of the hardware.
  • The applyElasticConfig.sh script must have been run and completed successfully.
  • The files listed in the “Appendix B” of the Clientname-clustername-InstallationTemplate.html must be staged to /u01/onecommand/linux-x64/WorkDir.

Once staged, the WorkDir contents should look similar to this:

[root@ex03db01 ~]# ls -lh /u01/onecommand/linux-x64/WorkDir
total X.9G
-rwxr-xr-x 1 root root 355M Jun  9 12:34 ahf_setup
-rw-r--r-- 1 root root 2.9G Jun  9 12:54 V982063-01.zip
-rw-r--r-- 1 root root 2.7G Jun  9 12:57 V982068-01.zip
-rw-r--r-- 1 root root 2.4G Jun  9 12:57 p30805684_190000_Linux-x86-64.zip
-rw-r--r-- 1 root root 600M Jun  9 12:57 p6880880_180000_Linux-x86-64.zip
-rw-r--r-- 1 root root 1.3G Jun  9 12:57 p30899722_190000_Linux-x86-64.zip

After all of this is done, you can run step 1 to validate the configuration file against the environment:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 1
 Initializing
 Executing Validate Configuration File
 Validating cluster: ex03-clu1
  Locating machines...
 Validating platinum...
 Checking Disk Tests Status....
 Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 Validating nodes for database readiness...
 Completed validation...
 
 SUCCESS: Ip address: 10.201.84.190 is configured correctly
 SUCCESS: Ip address: 10.201.80.54 is configured correctly
 SUCCESS: Ip address: 10.201.84.191 is configured correctly
 SUCCESS: Ip address: 10.201.80.55 is configured correctly
 SUCCESS: Ip address: 10.201.84.192 is configured correctly
 SUCCESS: Ip address: 10.201.80.56 is configured correctly
 SUCCESS: Ip address: 10.201.80.60 is configured correctly
 SUCCESS: Ip address: 10.201.80.62 is configured correctly
 SUCCESS: Ip address: 10.201.80.61 is configured correctly
 SUCCESS: Ip address: 10.201.80.58 is configured correctly
 SUCCESS: Ip address: 10.201.80.59 is configured correctly
 SUCCESS: Ip address: 10.201.80.57 is configured correctly
 SUCCESS: Validated NTP server 10.248.1.1
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/ahf_setup exists...
 SUCCESS: Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db01
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db02
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db01
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db02
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: DatabaseHome Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: Disk size 14000GB on cell ex03cel01.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel02.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel03.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel01.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel02.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel03.client.com matches the value specified in OEDA configuration file
 Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed = 85395 mS [1.0 minutes] Tue Jun 09 22:51:44 PDT 2020]]

If it finishes successfully, you are good to move forward.

Execution

Now we just need to execute the remaining steps. You can run them one by one or all in a row. I normally run steps 1 and 2 separately from the others, simply because they tend to fail more often. Running all of the steps in a row does no harm, since execution stops immediately once any step fails. So it is up to you how you would like to run them.
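
For example, once the first two steps are done, the remaining steps (skipping step 18, as discussed below) could be run in one shot; the range here is illustrative:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 3-17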

In case you need to undo any of the steps, use the -u option with the step you would like to undo. Run install.sh -h for the full usage:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -h
 Warning: Invalid input(s) for {-h=null}
 **********************************
 
  install.sh -cf <config.xml> -l [options]
  install.sh -cf <config.xml> -s <step #=''> | -r <num-num>
  install.sh
  ARGUMENTS:
   -l                 List all the steps that exist
   -cf                Use to specify the full path for the config file
   -s <step #=''>        Run only the specified step
   -r <num-num>       Run the steps one after the other as long as no errors
                      are encountered
   -u <num-num> | <step#> Undo a range of steps or a particular step
                      For a range of steps, specify the steps in reverse order
   -h                 Print usage information
   -override          Force to run undo steps related to celldisk and grid disk
   -force             Delete binaries under grid home and database home when
                      uninstalling clusterware and database software
   -delete            Delete staging area/directories
   -nocalibratecell   Create the installation summary file without running the calibrate cell command
   -noinfinicheck     Create the installation summary file without running InfiniBand verification
   -p                 Prompts for root password for each or all the nodes. This option allows
                      deployments in Exadata environments with non-default and/or different
                       root passwords on each of the nodes in the rack
   -usesu             Use SU with root account to run commands for grid/oracle users
   -sshkeys           Run deployment with root SSH Keys that are setup by setuprootssh.sh or oedacli. Must be used with "-usesu"
   -customstep        Run custom actions. Actions can be:
                           updatecellroute:  generate cellroute.ora in domUs
   -clustername       Specify the cluster name, or All. Only used with -customstep to specify
                       the cluster on which to run the custom action
   -upgradeNetworkFirmware  X7 Broadcom network card Firmware upgrade
  Version : 200519
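
As listed in the usage above, the -l flag prints all available steps with their numbers, which is handy before deciding what to run or undo:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -l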

To undo a single step, simply pass it to -u. For example, to undo step 2:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2

Or to undo from step 2 to step 4:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2-4

Here is the execution of step 2:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 2
 Initializing
 Executing Setup Required Files
 Copying and extracting required files...
 Required files are:
 /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/V982068-01.zip
 /u01/onecommand/linux-x64/WorkDir/V982063-01.zip
 Copying required files...
 Checking status of remote files...
 Checking status of existing files on remote nodes...
 Getting status of local files...
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip at /u01/app/oracle/Oeda/Software/V982063-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip at /u01/app/oracle/Oeda/Software/V982068-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30805684_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30899722_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/Patches/p6880880_180000_Linux-x86-64.zip
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db03.client.com
 Completed copying files...
 Extracting required files...
 Copying resourcecontrol and other required files
 No config Keys in the configuration file..
 Creating databasemachine.xml for EM discovery
 Done Creating databasemachine.xml for EM discovery
 Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 325110 mS [5.0 minutes] Wed Jun 10 12:16:46 CDT 2020]]

Here is the execution of steps from 3 to 8:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 3-8
 Initializing
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Users
 Creating users...
 Creating users in cluster ex03-clu1
 Validating existing users and groups...
 Creating required directories on nodes in cluster ex03-clu1
 Updating /etc/hosts on nodes in cluster ex03-clu1
 Setting up ssh for users in cluster ex03-clu1
 Creating cell diag collection user CELLDIAG on cell servers..
 Completed creating all users...
 Successfully completed execution of step Create Users [elapsed Time [Elapsed = 77818 mS [1.0 minutes] Wed Jun 10 12:20:31 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Setup Cell Connectivity
 Creating cellip.ora and cellinit.ora  ...
 Creating cellip.ora for cluster ex03-clu1
 Creating cellinit.ora for cluster ex03-clu1
 Done creating cellip.ora and cellinit.ora...
 Successfully completed execution of step Setup Cell Connectivity [elapsed Time [Elapsed = 14675 mS [0.0 minutes] Wed Jun 10 12:20:52 CDT 2020]]
 Executing Verify Infiniband
 Validating infiniband network with rds-ping...
 Check Admin network connectivity...
 Running infinicheck to verify infiniband fabric for cluster ex03-clu1...
 Running verify topology to verify infiniband network...
 No Infiniband link errors found...
 SUCCESS: Verify topology does not report any errors on node ex03db01.client.com...
 ****************ex03db01*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db02.client.com...
 ****************ex03db02*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db03.client.com...
 ****************ex03db03*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 Successfully completed execution of step Verify Infiniband [elapsed Time [Elapsed = 280227 mS [4.0 minutes] Wed Jun 10 12:25:37 CDT 2020]]
 Executing Calibrate Cells
 Calibrating cells...
 Successfully completed execution of step Calibrate Cells [elapsed Time [Elapsed = 461064 mS [7.0 minutes] Wed Jun 10 12:33:18 CDT 2020]]
 Executing Create Cell Disks
 Validating Self-Signed Certificates on cell servers...
 Fixing Cell Certificates on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Reconfiguring WLS...
 Cell name attribute does not match hostnames
 Cell ex03cel03 has cell name ru06, cell name attribute will be reset to ex03cel03
 Cell ex03cel01 has cell name ru02, cell name attribute will be reset to ex03cel01
 Cell ex03cel02 has cell name ru04, cell name attribute will be reset to ex03cel02
 Checking physical disks for errors before creating celldisks
 Creating cell disks...
 Dropping Flash Cache before enabling WriteBack on cells [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Enable FlashCache mode to WriteBack in [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Creating flashcache on cells...
 Successfully completed execution of step Create Cell Disks [elapsed Time [Elapsed = 218067 mS [3.0 minutes] Wed Jun 10 12:36:56 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Grid Disks
 Creating grid disks for cluster ex03-clu1
 Checking Cell Disk status...
 Successfully completed execution of step Create Grid Disks [elapsed Time [Elapsed = 123858 mS [2.0 minutes] Wed Jun 10 12:39:04 CDT 2020]]
[root@ex03db01 linux-x64]#

Here is the execution of steps from 9 to 16:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 9-16
 Initializing
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Cluster Software
 Installing cluster ex03-clu1
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running clusterware installer...
 Setting up Opatch for cluster ex03-clu1
 Patching cluster ex03-clu1...
 Successfully completed execution of step Install Cluster Software [elapsed Time [Elapsed = 667497 mS [11.0 minutes] Wed Jun 10 12:51:15 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Initialize Cluster Software
 Initializing cluster ex03-clu1
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running root.sh on node ex03db01.client.com
 Checking file root_ex03db01.client.com_2020-06-10_12-54-03-631071286.log on node ex03db01.client.com
 Running root.sh on node ex03db02.client.com
 Checking file root_ex03db02.client.com_2020-06-10_13-02-42-916817198.log on node ex03db02.client.com
 Running root.sh on node ex03db03.client.com
 Checking file root_ex03db03.client.com_2020-06-10_13-05-42-659221162.log on node ex03db03.client.com
 Generating response file for Configuration Tools...
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running Configuration Assistants on ex03db01.client.com
 Checking status of cluster...
 Cluster Verification completed successfully
 Successfully completed execution of step Initialize Cluster Software [elapsed Time [Elapsed = 1184567 mS [19.0 minutes] Wed Jun 10 13:11:06 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Database Software
 Installing database software ...
 Validating nodes for database readiness...
 Installing database software with database home name DbHome1
 Installing database software ...
 Extracting Database Software file /u01/app/oracle/Oeda/Software/V982063-01.zip into /u01/app/oracle/product/19.0.0.0/dbhome_1
 Running database installer on node ex03db01.client.com ... Please wait...
 After running database installer...
 Patching Database Home /u01/app/oracle/product/19.0.0.0/dbhome_1
 Successfully completed execution of step Install Database Software [elapsed Time [Elapsed = 717961 mS [11.0 minutes] Wed Jun 10 13:23:11 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Relink Database with RDS
 Successfully completed execution of step Relink Database with RDS [elapsed Time [Elapsed = 36009 mS [0.0 minutes] Wed Jun 10 13:23:54 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create ASM Diskgroups
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Validating ASM Diskgroups..
 Successfully completed execution of step Create ASM Diskgroups [elapsed Time [Elapsed = 138147 mS [2.0 minutes] Wed Jun 10 13:26:20 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Databases
 Setting up Huge Pages for Database..[test]
 Creating database [test]...
 Patch 30805684 requires specific post-installation steps. Databases will be restarted ...
 Running datapatch on database [test]
 Recompiling Invalid Objects (if any) on database [test]
 Successfully completed execution of step Create Databases [elapsed Time [Elapsed = 1252604 mS [20.0 minutes] Wed Jun 10 13:47:19 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Apply Security Fixes
 Setting up Huge Pages for ASM Instance..
 Bouncing clusterware to set required parameters...
 Checking and enabling turbo mode if required...
 ex03db03.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db03.client.com
 ex03db02.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db02.client.com
 ex03db01.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db01.client.com
 Copying over /root/config/client-ex03.xml to all nodes under /etc/exadata/config
 Successfully completed execution of step Apply Security Fixes [elapsed Time [Elapsed = 436720 mS [7.0 minutes] Wed Jun 10 13:54:43 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Autonomous Health Framework
 Copying over AHF to all nodes in the Cluster..[ex03db01, ex03db02, ex03db03]
 Configuring Autonomous Health Framework(AHF) on all computes nodes..
 AHF has been installed on all compute nodes at: /opt/oracle.ahf . EXAchk can be run by invoking ./exachk
 Generating an EXAchk report...
 EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_13567.zip
 Generating the EXAchk Infrastructure Report...
 EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_141143_infrastructure.zip
 Successfully completed execution of step Install Autonomous Health Framework [elapsed Time [Elapsed = 2234216 mS [37.0 minutes] Wed Jun 10 14:32:04 CDT 2020]]
[root@ex03db01 linux-x64]#

Here is the execution of step 17:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 17
 Initializing
 Executing Create Installation Summary
 Getting system details...
 Generating Installation Summary report: /u01/onecommand2/linux-x64/ExadataConfigurations/client-Development-InstallationReport.xml...
 Creating Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html...
 Created Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html
 All deployment reports are stored in /u01/onecommand2/linux-x64/ExadataConfigurations/client-AK00625423-deploymentfiles.zip
 Generating Platinum CSV file and copying it over to /opt/oracle.SupportTools on all compute nodes
 Writing platinum file  : /u01/onecommand2/linux-x64/WorkDir/client_null-platinum.csv
 Successfully completed execution of step Create Installation Summary [elapsed Time [Elapsed = 53311 mS [0.0 minutes] Wed Jun 10 14:36:07 CDT 2020]]

Here is a rac-status.sh run to check how the cluster was set up (learn more about rac-status.sh here):

[root@ex03db01 ~]# ./pythian/rac-status.sh -a
 
                Cluster ex03-clu1 is a X8M-2 Elastic Rack HC 14TB
 
        Type      |      Name      |   db01   |   db02   |   db03   |
  ------------------------------------------------------------------
   asm            | asm            |  Online  |  Online  |  Online  |
   asmnetwork     | asmnet1        |  Online  |  Online  |  Online  |
   chad           | chad           |  Online  |  Online  |  Online  |
   cvu            | cvu            |  Online  |     -    |     -    |
   dg             | DATA           |  Online  |  Online  |  Online  |
   dg             | RECO           |  Online  |  Online  |  Online  |
   dg             | SPARSE         |  Online  |  Online  |  Online  |
   network        | net1           |  Online  |  Online  |  Online  |
   ons            | ons            |  Online  |  Online  |  Online  |
   proxy_advm     | proxy_advm     | Offline x| Offline x| Offline x|
   qosmserver     | qosmserver     |  Online  |     -    |     -    |
   vip            | db01           |  Online  |     -    |     -    |
   vip            | db02           |     -    |  Online  |     -    |
   vip            | db03           |     -    |     -    |  Online  |
   vip            | scan1          |     -    |  Online  |     -    |
   vip            | scan2          |     -    |     -    |  Online  |
   vip            | scan3          |  Online  |     -    |     -    |
  ------------------------------------------------------------------
    x  : Resource is disabled
       : Has been restarted less than 24 hours ago
 
      Listener    |      Port      |   db01   |   db02   |   db03   |     Type     |
  ---------------------------------------------------------------------------------
   ASMNET1LSNR_ASM| TCP:1525       |  Online  |  Online  |  Online  |   Listener   |
   LISTENER       | TCP:1521       |  Online  |  Online  |  Online  |   Listener   |
   LISTENER_SCAN1 | TCP:1864       |     -    |  Online  |     -    |     SCAN     |
   LISTENER_SCAN2 | TCP:1864       |     -    |     -    |  Online  |     SCAN     |
   LISTENER_SCAN3 | TCP:1864       |  Online  |     -    |     -    |     SCAN     |
  ---------------------------------------------------------------------------------
       : Has been restarted less than 24 hours ago
 
         DB       |     Version    |   db01   |   db02   |   db03   |    DB Type   |
  ---------------------------------------------------------------------------------
   test           | 19.0.0.0   (1) |   Open   |   Open   |   Open   |    RAC (P)   |
  ---------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column
 
         1 : /u01/app/oracle/product/19.0.0.0/dbhome_1  oracle oinstall
 
       : Has been restarted less than 24 hours ago
 
 
[root@ex03db01 ~]# ps -ef|grep pmon
root     362094  50259  0 14:40 pts/1    00:00:00 grep --color=auto pmon
oracle   364290      1  0 13:52 ?        00:00:00 asm_pmon_+ASM1
oracle   367756      1  0 13:53 ?        00:00:00 ora_pmon_test1
[root@ex03db01 ~]#

That’s it. The deployment is finished. Now you just need to patch the compute nodes, storage servers, RoCE switches, GI, and databases up to whatever version you want to run.

You might be thinking, “what about step 18?” Well, step 18, “Resecure the machine,” hardens the servers by dropping SSH keys, enforcing password complexity, expiring current passwords, implementing password aging, and so on. Those changes can make administration a bit harder, and you might prefer to implement your own security policies instead. So we normally skip this step, but again, it is up to you.

See you next time, sincerely,

Franky Faust

 

Categories: DBA Blogs

Oracle Database and Two-Factor Authentication (2FA)

Tue, 2020-06-16 16:36
Background

A common question from Oracle customers is whether they can configure the Oracle Database to use:

  • Two-Factor Authentication (2FA)
  • Multi-Factor Authentication (MFA)
  • A “Time-based One-time Password” (TOTP) code, which is usually a six-digit code generated from a hardware or software application.

The short answer is: yes! (Though not natively.) You can implement it through other directory service technologies, namely RADIUS.

Previous blog posts discussed the setup and testing of “Centrally Managed Users” (CMU), a feature introduced in Oracle 18c. CMU allows you to manage Oracle database users through Microsoft Active Directory (AD), effectively offloading user management to AD. Oracle RADIUS authentication isn’t part of CMU; it’s a similar but slightly different implementation, and each has its own benefits and limitations.

However, by leveraging Oracle’s compatibility with the RADIUS protocol and an external directory service, you can achieve true 2FA capabilities for the Oracle database (and on a per-user basis, meaning that application and/or service accounts remain unaffected).

This post describes how to implement Oracle database 2FA using FreeRADIUS. The subsequent post extends the setup to use the commercially available Cisco Duo platform instead.

RADIUS Quick Summary

The RADIUS (Remote Authentication Dial-In User Service) protocol is based on “AAA:” Authentication, Authorization, and Accounting. This post deals mainly with the Authentication part. Furthermore, RADIUS can operate in two modes: synchronous and challenge-response (asynchronous). Oracle Database is capable of both (and has been since at least Oracle 8i). However, the most practical and applicable form is synchronous mode, which is therefore the focus of this post.

RADIUS can authenticate against numerous sources including its own “users” flat file, LDAP directory services, Microsoft Active Directory, and others. RADIUS refers to “clients” and “NASs” (Network Access Servers, which broker the credentials). When authenticating against a RADIUS server, the Oracle Database acts as the “RADIUS client” and “NAS.”

From the Oracle Database “Database Licensing Information User Manual” (part number E94254-18 dated April 2020):

Network encryption (native network encryption, network data integrity, and SSL/TLS) and strong authentication services (Kerberos, PKI, and RADIUS) are no longer part of Oracle Advanced Security and are available in all licensed editions of all supported releases of Oracle Database.

Source: https://docs.oracle.com/en/database/oracle/oracle-database/19/dblic/database-licensing-information-user-manual.pdf

However, to use RADIUS authentication, the Oracle client software must include the Advanced Security option. Therefore, a full client installation is required and the InstantClient is not sufficient. Verify Oracle client software compatibility using the adapters command.
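
A quick sketch of that check from a full client or database home (the grep simply filters the relevant lines of the adapters output):

$ORACLE_HOME/bin/adapters | grep -i radius

If RADIUS does not appear among the installed authentication adapters, that client installation cannot be used for this setup.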

Authentication Process

The process flow is relatively simple, yet important to understand:

  1. The Oracle client attempts to connect to the database and provides credentials (username and password) along with a 2FA TOTP code.
  2. Oracle Database receives this information, finds the user in the local catalog, and determines that the authentication type is EXTERNAL.
  3. Oracle Database then scans the SQLNET.ORA file to determine where to authenticate EXTERNAL users.
  4. Using the RADIUS connectivity information from the SQLNET.ORA file, the Oracle Database passes the credential details onto the RADIUS server.
  5. The RADIUS server first authenticates the username/password with a directory service which could be a local file, Active Directory, an LDAP service, etc. This is “Primary Authentication.”
  6. If validated, the RADIUS server then authenticates the TOTP with the multi-factor authentication service (for example, the Google Authenticator PAM module, Cisco Duo, etc). This is “Secondary Authentication.”
  7. If also validated, the RADIUS server passes back the “Access-Accept” response to the Oracle Database which then accepts and completes the connection.


Prerequisite Setup

To use 2FA with a TOTP, a “RADIUS server” is required. Common options include FreeRADIUS, Cisco Duo, and Okta.

This post focuses on setting up and configuring Oracle Database 2FA through FreeRADIUS. FreeRADIUS is a pretty simple software suite for DBAs to set up for testing and experimentation purposes without worrying about licensing costs or third-party vendors.

The first requirement is to find a location to run FreeRADIUS. This can be a local server or a Virtual Machine (for example, created within VirtualBox or a cloud VM). For examples on how to create a cloud VM in OCI for DBA testing and experimentation, go to https://blog.pythian.com/part-1-creating-an-oracle-18c-centrally-managed-users-testbed-using-oracle-cloud-infrastructure/.

VM requirements are minimal for testing/experimentation purposes. One or two vCPUs with 2 GB of memory should be sufficient.

For a base OS, many versions of Linux are supported including CentOS 7 and Oracle Linux 7.

Oracle Database and RADIUS

For an explanation about how Oracle Database interacts with RADIUS (and specifically FreeRADIUS), see Jared Still’s blog post: https://blog.pythian.com/using-freeradius-to-authorize-oracle-connections/

A key takeaway from that post: FreeRADIUS users need to be in UPPERCASE for Oracle Database compatibility.

Installing FreeRADIUS

This section summarizes the installation steps for FreeRADIUS on CentOS 7 or Oracle Linux 7. Perform all steps as the “root” user, unless otherwise indicated:

If you are installing on Oracle Linux 7, add the necessary YUM repo (not required for CentOS 7):

cat << EOF > /etc/yum.repos.d/networkradius.repo
[networkradius]
name=NetworkRADIUS-7
baseurl=http://packages.networkradius.com/centos/7/repo/
gpgcheck=0
EOF

Perform other Linux prerequisites:

sed -i 's/=enforcing/=permissive/g' /etc/selinux/config
setenforce Permissive

yum -y update

Install the core FreeRADIUS package and the utilities package. The latter adds testing tools:

yum -y install freeradius freeradius-utils

Adjust the Linux firewall (if required):

firewall-cmd --permanent --zone=public --add-port=1812/udp
firewall-cmd --permanent --zone=public --add-port=1813/udp
firewall-cmd --reload

Create a service for automatic-start (but don’t start it yet):

systemctl enable radiusd.service

Initial RADIUS Testing

To perform a basic test of RADIUS functionality, a test user is required. FreeRADIUS is configured through several files, including an “authorize” file (formerly called “users”; “users” is now just a symbolic link to “authorize”).

IMPORTANT: Directives in the “authorize” file are processed sequentially. Therefore, when testing, it’s recommended you put the entries near or at the top to ensure they aren’t skipped due to some preceding and/or superseding directive.

Add a test user called “testuser” with a password of “Passw0rd” to the file:

sed -i '1itestuser Cleartext-Password := \"Passw0rd\"' /etc/raddb/mods-config/files/authorize
head -4 /etc/raddb/mods-config/files/authorize

To test, it’s recommended to manually start the FreeRADIUS software (instead of through the Linux service) in debug mode:

radiusd -X

From a different window/session, test the connection using the command:

radtest testuser Passw0rd localhost 0 testing123

IMPORTANT: The “testing123” argument is a “RADIUS secret” configured in the default “RADIUS client.” This secret is used for secure communication between RADIUS clients and the server. See Jared Still’s blog post for more information.
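
If you’re curious where that default secret is defined, it lives in the packaged client configuration; a quick way to inspect it (path as installed by the FreeRADIUS packages used above):

grep -n 'secret' /etc/raddb/clients.conf | head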

The result should be “Received Access-Accept …”

FreeRADIUS Linux-PAM Setup

The next step is to set up FreeRADIUS to use Linux-PAM (Pluggable Authentication Modules).

IMPORTANT: For initial proof of concept experimentation and testing, local Linux users are authenticated using RADIUS and 2FA. Therefore, in this test scenario, the RADIUS server runs with root permissions so it can read all 2FA configuration files. In a production or hardened environment (including those using external directory services) you should configure FreeRADIUS to run in a “lower-permission” environment.

Change both user and group to root (in the test environment):

grep 'user\|group' /etc/raddb/radiusd.conf | grep = | grep -v ^[[:space:]]\#
sed -i '/user =/  s/\(\#\)\{0,1\}[[:space:]]user = radius.*/\t user = root/'   /etc/raddb/radiusd.conf
sed -i '/group =/ s/\(\#\)\{0,1\}[[:space:]]group = radius.*/\t group = root/' /etc/raddb/radiusd.conf
grep 'user\|group' /etc/raddb/radiusd.conf | grep = | grep -v ^[[:space:]]\#

Then, enable Pluggable Authentication Modules (PAM), including adding the required symbolic link:

grep pam /etc/raddb/sites-enabled/default
sed -i '/^\#[[:space:]]pam/ s/^\#//' /etc/raddb/sites-enabled/default
grep pam /etc/raddb/sites-enabled/default

ln -s /etc/raddb/mods-available/pam /etc/raddb/mods-enabled/pam
ls -l /etc/raddb/mods-enabled/pam
ls -l /etc/raddb/mods-available/pam

Change auth-type to PAM in the authorize file. Remember that file directives are processed sequentially, so place it near the top:

sed -i '2iDEFAULT Auth-Type := PAM' /etc/raddb/mods-config/files/authorize
head -5 /etc/raddb/mods-config/files/authorize

Finally, add a new Linux user on the same system for testing (remember that this user must be in uppercase for Oracle Database usage):

useradd RADUSER
passwd RADUSER
# Assume the set password is Passw0rd
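
If you’re scripting the test setup, CentOS 7 also allows setting the password non-interactively (a test-only shortcut; don’t do this in production):

echo "Passw0rd" | passwd --stdin RADUSER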

Test FreeRADIUS and Linux-PAM authorization by running the FreeRADIUS server in debugging mode again:

radiusd -X

From another session/window, test a connection:

radtest RADUSER Passw0rd localhost 0 testing123

The result should again be “Received Access-Accept …”

Google Authenticator PAM Module Setup

Google provides an example PAM module for 2FA through https://github.com/google/google-authenticator-libpam (Cisco Duo or Okta provide alternative options).

Install the Google Authenticator PAM module:

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install google-authenticator

Perform a basic configuration for the RADUSER local Linux user:

sudo -u RADUSER google-authenticator \
   --time-based \
   --allow-reuse \
   --no-rate-limit \
   --window-size=3 \
   --force

IMPORTANT: The command above uses configuration options optimized for testing. In an actual production environment, I recommend reviewing all options and deploying in a more secure manner. For example, using “--disallow-reuse” and similar.


You can generate the TOTP codes using a mobile phone application such as “Google Authenticator” or “Authy.” Scan the generated QR code with the phone app or manually enter the “secret key” into the app.

However, when testing, constantly generating codes on a phone can become cumbersome. Generating the codes using a Linux command might be more efficient.

To generate codes from the Linux bash shell, install the oathtool package:

yum -y install oathtool

Then, you can generate codes using the “secret key” from the user’s ~/.google_authenticator file. For example:

oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`"

Finally, you must adjust the Linux-PAM configuration to reference the new Google Authenticator PAM Module:

cat << EOF > /etc/pam.d/radiusd
#%PAM-1.0
auth       requisite    pam_google_authenticator.so forward_pass
auth       required     pam_sepermit.so
auth       substack     password-auth
auth       include      postlogin
account    required     pam_nologin.so
account    include      password-auth
session    include      password-auth
EOF

Testing Using the Local User and One-time Password

With the FreeRADIUS and Google Authenticator PAM Module implementation of 2FA, the password must be a single string composed of the actual user password immediately followed by the passcode. So, the format is: <password><TOTP>.

Other implementations (such as Cisco Duo) might allow for a comma between the two, or push notification to a mobile device. However, for FreeRADIUS and the Google Authenticator PAM Module, the format is one continuous string with no additional characters.

So, to test:

MY_PASSWD=Passw0rd
MY_OTP=$(oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`")
radtest RADUSER ${MY_PASSWD}${MY_OTP} localhost 0 testing123

The result should again be “Received Access-Accept …”

Testing from the Oracle Database

The previously referenced blog post by Jared Still covers Oracle setup for FreeRADIUS. Review this post for additional information about the Oracle setup steps in this section. Repeating the steps in a simplified format:

As “root” on the FreeRADIUS server, add a FreeRADIUS client that the Oracle Databases will use to connect:

cat << EOF >> /etc/raddb/clients.conf

client oracle-db {
    ipaddr = 192.168.1.0/24
    secret = secretoracle
    shortname = oracle
    nastype = other
}
EOF

Restart the FreeRADIUS service to ensure all changes take effect:

systemctl restart radiusd.service
systemctl status radiusd.service

As the “oracle” user (or Oracle software owner) on the Oracle Database server, save the RADIUS secret to a file:

mkdir -p ${ORACLE_HOME}/network/security
echo "secretoracle" > ${ORACLE_HOME}/network/security/radius.key
chmod 600 ${ORACLE_HOME}/network/security/radius.key

Add the required entries to the SQLNET.ORA file and verify the file (add the appropriate IP address for the FreeRADIUS server):

cat <<EOF >> ${ORACLE_HOME}/network/admin/sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES=(BEQ,TCPS,RADIUS)
SQLNET.RADIUS_AUTHENTICATION=192.168.1.212
SQLNET.RADIUS_AUTHENTICATION_PORT=1812
SQLNET.RADIUS_SECRET=${ORACLE_HOME}/network/security/radius.key
EOF

If you want, you can include other parameters such as the RADIUS timeout and retries. Alternatively, for higher availability and redundancy, use the SQLNET.RADIUS_ALTERNATE parameter to specify a backup RADIUS server to use if the primary becomes unavailable.
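
As a sketch, those optional entries could look like the following; the parameter names are from the Oracle documentation, while the alternate server IP is hypothetical:

cat <<EOF >> ${ORACLE_HOME}/network/admin/sqlnet.ora
SQLNET.RADIUS_ALTERNATE=192.168.1.213
SQLNET.RADIUS_ALTERNATE_PORT=1812
SQLNET.RADIUS_AUTHENTICATION_TIMEOUT=10
SQLNET.RADIUS_AUTHENTICATION_RETRIES=3
EOF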

IMPORTANT: The SQLNET.ORA configuration above is for the database server only (RDBMS home). For Oracle clients, all that’s required is that the SQLNET.AUTHENTICATION_SERVICES parameter includes RADIUS. The other SQLNET.ORA parameters are not required/used in Oracle client homes.

Make sure that the database initialization parameters are set properly:

sqlplus -s / as sysdba << EOF
alter system set os_authent_prefix='' scope=spfile sid='*';
alter system set remote_os_authent=false scope = spfile sid='*';
shutdown immediate
startup
alter pluggable database all open;
EOF

Add the test user to the database:

sqlplus -s / as sysdba << EOF
alter session set container=PDB1;
create user raduser identified externally;
grant create session to raduser;
grant select on v_\$database to raduser;
EOF

Testing is as simple as providing the TOTP in the password field. Because the user was added to the database using “IDENTIFIED EXTERNALLY,” the password portion is the Linux user’s password.

Using a connection string of the form <username>/<password><passcode>, here’s an example of a full connection using SQL*Plus:
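
A minimal, hypothetical sketch (the hostname and service name are illustrative; the six TOTP digits are appended directly to the Linux password):

sqlplus RADUSER/Passw0rd123456@//dbhost:1521/PDB1

If testing from the FreeRADIUS host itself, the code can also be generated inline with oathtool, as in the radtest example:

sqlplus RADUSER/Passw0rd$(oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`")@//dbhost:1521/PDB1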

Success! An Oracle Database connection to a database user using an external credential and 2FA!

If the password or the TOTP code is incorrect, then the standard “ORA-01017: invalid username/password; logon denied” error occurs.

Next Steps: Microsoft Active Directory

For many implementations, the next logical step is to use Microsoft Active Directory (AD) users instead of local Linux users. Switching this test setup to authenticate against AD is relatively easy. First, update the Linux-PAM configuration so that primary authentication goes through SSSD (the pam_sss module) instead of the local password stack:

cat << EOF > /etc/pam.d/radiusd
#%PAM-1.0
auth       requisite    pam_google_authenticator.so forward_pass
auth       required     pam_sss.so use_first_pass
account    required     pam_nologin.so
account    include      password-auth
session    include      password-auth
EOF

Then, complete the switch by:

  • Logging in to the FreeRADIUS server as the AD domain user and generating the ~/.google_authenticator configuration file using the previously described process.
  • Creating the domain user in the database and testing.

Conclusion

Configuring the Oracle Database to use 2FA is something that many inquire about but few actually do. However, doing so isn’t overly complex and allows you to leverage open source software such as FreeRADIUS. All that’s required is a small VM to act as a RADIUS server.

In a real production environment, this server could become a single point of failure and therefore might require you to set it up with reliable infrastructure and/or in a redundant configuration (by including the SQLNET.RADIUS_ALTERNATE parameter in the SQLNET.ORA file).

This environment is simple and flexible for DBA testing. However, most enterprise users will instead want to leverage a more robust and flexible solution such as the commercial offerings from Cisco Duo or Okta. Configuring Oracle Database authentication using Cisco Duo is covered in the next post in this series.

Categories: DBA Blogs

AWS RDS: 5 Must-Know Actions for Oracle DBAs

Wed, 2020-06-03 00:00

Managing Oracle on AWS has some twists. Here are five daily DBA activities that have changed on AWS:

Kill Sessions:

begin
  rdsadmin.rdsadmin_util.kill(
    sid    => &sid,
    serial => &serial,
    method => 'IMMEDIATE');
end;
/
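
To find values to substitute for &sid and &serial, query v$session first; a minimal sketch run as the RDS master user (the connect string and session username are illustrative):

sqlplus -s admin@mydb <<EOF
select sid, serial#, username, status from v\$session where username = 'PYTHIAN';
EOF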

 

Flush shared_pool or buffer_cache:

exec rdsadmin.rdsadmin_util.flush_shared_pool;
exec rdsadmin.rdsadmin_util.flush_buffer_cache;

 

Perform RMAN Operations:

BEGIN
 rdsadmin.rdsadmin_rman_util.validate_database(
 p_validation_type => 'PHYSICAL+LOGICAL',
 p_parallel => 4,
 p_section_size_mb => 10,
 p_rman_to_dbms_output => FALSE);
END;
/

 

Grant Privileges to SYS Objects:

# Grant

begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name  => 'V_$SESSION',
        p_grantee   => 'PYTHIAN',
        p_privilege => 'SELECT');
end;
/

# Grant with Grant Option

begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name     => 'V_$SESSION',
        p_grantee      => 'PYTHIAN',
        p_privilege    => 'SELECT',
        p_grant_option => true);
end;
/

# Revoke

begin
    rdsadmin.rdsadmin_util.revoke_sys_object(
        p_obj_name  => 'V_$SESSION',
        p_revokee   => 'PYTHIAN',
        p_privilege => 'SELECT');
end;
/

 

Create Custom Functions to Verify Passwords:

begin
    rdsadmin.rdsadmin_password_verify.create_verify_function(
        p_verify_function_name => 'CUSTOM_PASSWORD_FUNCTION', 
        p_min_length           => 12, 
        p_min_uppercase        => 2, 
        p_min_digits           => 1, 
        p_min_special          => 1,
        p_disallow_at_sign     => true);
end;
/
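
Creating the function alone doesn’t enforce anything; it still has to be attached to a profile. A minimal sketch, assuming the DEFAULT profile and an illustrative connect string:

sqlplus -s admin@mydb <<EOF
alter profile DEFAULT limit PASSWORD_VERIFY_FUNCTION CUSTOM_PASSWORD_FUNCTION;
EOF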

If you want to double-check the generated code, here’s a simple trick: query DBA_SOURCE:

col text format a150
select TEXT  from DBA_SOURCE 
where OWNER = 'SYS' and NAME = 'CUSTOM_PASSWORD_FUNCTION' order by LINE;

I hope this helps!

Categories: DBA Blogs

OATUG Forum Online (or, the 2020 Version of Collaborate Las Vegas)

Fri, 2020-05-29 08:58

Hi everyone!

It’s been a very busy year for me, so I haven’t had much time to blog. But, here we go!

This isn’t a technical post, but more of an opinion piece. I’ve split this post into two parts. Part one was written before I spoke at the OATUG (Oracle Applications and Technology Users Group) Forum online. Part two was written after I spoke.

Part 1: Before…

The global pandemic has disrupted virtually everything in our world. The same goes for the world of IT, including one of the best things about our industry: Conferences and live events.

I was supposed to attend two major conferences in the first quarter of 2020. One was cancelled and the other became an online event. Aborting major events like these is no trivial undertaking. Not only for the organizers themselves but also for all the sponsors and attendees (customers and vendors included).

Huge events like Oracle Open World, Google Next, or the one that will be the focus of most of this post, Collaborate, have a long-lasting impact that extends throughout the year. At these events, a casual meeting or a chat during a coffee break can kickstart a relationship. It might develop into a large services agreement, a partnership, or a whole new successful startup.

You have to agree that these events are important, if not crucial, for our industry.

Collaborate

As for Collaborate, well, I was very excited to not only attend but also speak at the event. After attending DOAG, BIWA, and UKOUG the previous years, Collaborate is the next big Oracle-related event in which I’d be participating. Then, of course, came COVID-19.

However, Collaborate isn’t cancelled. Instead, it’s now a fully virtual event. Here I’d like to give a massive KUDOS to the event organizers for their commitment to the event and the attendees. It certainly couldn’t have been easy to make the initial decision to proceed with Collaborate. Nor could it have been easy to completely undo and transform months and months of planning and logistics into a fully virtual conference in a matter of weeks.

I was very pleased to still be selected to speak at the (now) online conference. And, a big thanks to Pythian for their speaker program. It enables me to attend these gatherings, virtual or otherwise.

So, here I am, a few hours prior to my first big online presentation to an unknown number of people, and crossing my fingers that everything works as expected. This is basically how I feel before any “typical” presentation. The main difference? Well, the presentation won’t be quite the same. Interaction with the listening public is close to zero, and there is little practical opportunity for people to reach out to you with questions afterwards.

My Thoughts…

I know, I know… there’s always social media, and so on. However, it’s not the same, and you know it. There’s no immediate feedback from the attendees, and you can’t really tell if they find the content interesting or boring. Nor is it easy to tell whether you should slow down, or speed up. It definitely feels more impersonal.

This is my opinion from a “presenter” standpoint. As an “attendee,” there are two major issues for me. One is that since it’s no longer a live in-person event, it’s awkward to ask your manager for time to attend it. The second is the time zone: I’m in the Eastern time zone and the event is based in Las Vegas, so sessions can run all the way from 4:30 PM to 10:30 PM EDT. This, on top of my regular eight-hour workday, is simply too much.

So, I might be able to attend a session or two and watch saved sessions at a later time. This takes me back to my previous point about lost face-to-face interactions with people.

Anyway, I’m really interested to see how this plays out. I really hope the event is a big success because, unfortunately, our current “situation” might last a long time. We need to prepare for that possibility.

Part 2: After…

And… the event is over. Unfortunately, it wasn’t very satisfactory, which makes me a little sad. You see, Collaborate is one of the biggest Oracle events that I know of with a great many attendees, but only four showed up to my live presentation. Yes, the feedback was very good, and surely those people who attended were very interested in the topic I was speaking about. I thank each and every one of them.

A few days later, I checked my presentation statistics and there were a few views and downloads. But overall, the number is far below the 20 to 50 people that attended this same presentation at other on-site conferences.

Of course, I’m not a renowned speaker and the subject matter might not be of broader interest. However, the online experience is just “not the same,” and that takes a toll on the number of attendees.

Regardless, it was a good overall experience. I give my sincere congratulations to the organizers for the huge effort they clearly put into the entire event.

Hopefully, we can all meet in person soon and enjoy a truly magnificent Collaborate.

Categories: DBA Blogs

Datascape Podcast Episode 41 – Choosing a Public Cloud for Oracle Database

Thu, 2020-04-23 00:00

You can run Oracle anywhere. It’s very flexible, which means anywhere you can get a VM, you can run it, including a virtual box on your laptop. There’s even a free edition (XE) if you don’t have large data needs. In this episode, we’ll discuss why you would choose to host your Oracle Database anywhere other than its native cloud. We look at the feasibility of doing so, along with some of the benefits and drawbacks that you’ll likely encounter moving to a public cloud.

Simon Pane, a friend of the show and Principal Consultant at Pythian, is here to walk us through this topic. As usual, he provides excellent insights. We start by discussing Oracle’s cloud (OCI) and the reasons to stay with it, which include its flexibility and very good support. However, the platform is in its relative infancy, which means there will likely be some problems.

We then move on to the big three: Azure, AWS, and Google Cloud. Moving to a public cloud provider might be driven by an organizational “bigger picture” or other non-functional reasons. With Azure and AWS, clients are likely hoping to take advantage of the other features, such as Azure’s Office 365 and AWS’s RDS. Despite being licensed for Oracle, Azure is not optimized for it because there are no special shapes or options. AWS does have managed services, but still has drawbacks, which Simon explores in-depth.

We then move on to Google Cloud, the newest kid on the block. There are certainly great features (like BigQuery) that come along with it, but there is a major discrepancy: Licensing. The Oracle licensing policy does not include GC, so Simon offers some potential workarounds for this problem.

Ultimately, there is no slam-dunk solution for where to host Oracle databases. If you put your blinders on and focus on one area, it might seem that one cloud resolves all of your needs. However, it really comes down to making trade-offs and deciding which need is most important. Be sure to tune in today!

Key Points from this Episode
  • Discover reasons someone might move an Oracle on-premises database to a public cloud.
  • How Oracle-supported versions work and why this might be a reason for using public clouds.
  • Insights into Oracle cloud (OCI), its bad name, and the two reasons people don’t use it.
  • The motivators for utilizing OCI: It’s best at home, supportability, and unique needs.
  • Learn why you would use Azure as a home for your Oracle Database.
  • The pain points you might encounter hosting an Oracle Database on Azure.
  • Developments between Microsoft and Oracle. Two former enemies now building bridges.
  • Reasons to go with AWS: Brand recognition, first to market, and managed service.
  • Oracle licenses are incredibly expensive and drive the cost up in cloud and on-premises.
  • Discover some of the shortcomings of hosting an Oracle Database on AWS.
  • An overview of the special Oracle licensing considerations to remember.
  • The benefits and drawbacks of using Google Cloud for Oracle Databases.
  • Cloud SQL: Another example of Oracle’s terrible product names!
  • A universal cloud challenge: None of them are great with upgrades.
  • Why having cloud as backup or DR can be a good way to navigate the cloud decision matrix.
  • What public clouds have in store for Oracle going into the future.

Categories: DBA Blogs

101 Series of Oracle in Google Cloud – Part I : Building ASM and Database

Wed, 2020-04-08 10:57

About a year ago, I worked on a project with some amazing teammates (Simon Pane and Karun Dutt) on a POC to move a client’s Oracle environment into Google Cloud. This series reflects some of the things I learned while working on that project, and provides a guide for testing Oracle in Google Cloud.

Note that I am not a licensing expert, so before you do this, make sure you consult with your Google/Oracle sales representative to be clear about what’s allowed and the costs involved.

Because this is a “101” series, let’s start by logging into Google Cloud. By now you should already have an account on cloud.google.com and have installed the Google Cloud SDK. Because I use a Mac, I downloaded it from this location. Then, I logged into my Google Cloud account:

Renes-MacBook-Pro-2:~ rene$ gcloud auth login **********@*****.com

Because I’m using Google Free Tier, there are limitations on disk sizes, instance type, quotas, etc. These are the specifications I used:

  • Google Cloud instance name: rene-ace-test-inst1
  • n1-standard-8 VM which has 8 vCPU and 30GB of memory
  • Centos 7
  • 1 150GB SSD disk for ASM, disk group name: DATA
  • 1 100GB HD disk for the boot disk
  • Default subnet network
  • No quotas
  • Oracle 19.3 for OHAS and RDBMS
  • Oracle DB instance name: DBTEST
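
Since the gcloud commands below repeat the project and zone, you can optionally set them as defaults first; a small sketch using the values above:

gcloud config set project oracle-migration
gcloud config set compute/zone us-central1-c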

First, create the disk to use for ASM. In this case, I used a 150GB SSD disk:

Renes-MacBook-Pro-2:~ rene$ gcloud compute disks create rene-ace-disk-asm1 \
>    --project=oracle-migration \
>    --type=pd-ssd \
>    --size=150GB \
>    --labels=item=rene-ace \
>    --zone=us-central1-c
Created [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/disks/rene-ace-disk-asm1].
NAME                ZONE           SIZE_GB  TYPE    STATUS
rene-ace-disk-asm1  us-central1-c  150      pd-ssd  READY

Create a 100GB standard disk for the boot disk with CentOS 7:

Renes-MacBook-Pro-2:~ rene$ gcloud compute disks create rene-ace-inst1-boot-disk \
> --project=oracle-migration \
> --type=pd-standard \
> --size=100GB \
> --zone=us-central1-c \
> --image=centos-7-v20200309 \
> --image-project=centos-cloud 
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/disks/rene-ace-inst1-boot-disk].
WARNING: Some requests generated warnings:
 - Disk size: '100 GB' is larger than image size: '10 GB'. You might need to resize the root repartition manually if the operating system does not support automatic resizing. See https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd for details.

NAME                      ZONE           SIZE_GB  TYPE         STATUS
rene-ace-inst1-boot-disk  us-central1-c  100      pd-standard  READY

With both disks created, create the VM using the boot disk:

Renes-MacBook-Pro-2:~ rene$ gcloud compute instances create rene-ace-test-inst1 \
>    --project=oracle-migration \
>    --zone=us-central1-c \
>    --machine-type=n1-standard-8 \
>    --subnet=default \
>    --network-tier=PREMIUM \
>    --no-restart-on-failure \
>    --maintenance-policy=TERMINATE \
>    --no-service-account \
>    --no-scopes \
>    --disk=name=rene-ace-inst1-boot-disk,device-name=rene-ace-inst1-boot-disk,mode=rw,boot=yes,auto-delete=yes \
>    --tags=allow-ssh,egress-nat-gce \
>    --labels=item=rene-ace
Created [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/instances/rene-ace-test-inst1].
NAME                 ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
rene-ace-test-inst1  us-central1-c  n1-standard-8               10.***.***.***   34.***.***.***  RUNNING

Attach the ASM disk to the instance, then connect via SSH:

Renes-MacBook-Pro-2:~ rene$ gcloud compute instances attach-disk rene-ace-test-inst1 \
>    --disk=rene-ace-disk-asm1 \
>    --device-name=rene-ace-disk-asm1 \
>    --mode=rw \
>    --zone=us-central1-c
Updated [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/instances/rene-ace-test-inst1].
Renes-MacBook-Pro-2:~ rene$ gcloud compute ssh rene-ace-test-inst1 --zone=us-central1-c
Updating project ssh metadata...Updated [https://www.googleapis.com/compute/v1/projects/oracle-migration].
Updating project ssh metadata...done.                                                                                                                                                  
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5519028484974783249' (ECDSA) to the list of known hosts.
/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[rene@rene-ace-test-inst1 ~]$ 

Unless otherwise mentioned, perform all steps as the root user. Now, install the 19c pre-install RPM:

[rene@rene-ace-test-inst1 ~]$ sudo su -
[root@rene-ace-test-inst1 ~]# curl -o ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm \
>    https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/getPackage/oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18204  100 18204    0     0  55201      0 --:--:-- --:--:-- --:--:-- 55331
[root@rene-ace-test-inst1 ~]# yum -y localinstall ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm
Loaded plugins: fastestmirror
Examining ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm: oracle-database-preinstall-19c-1.0-1.el7.x86_64
Marking ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package oracle-database-preinstall-19c.x86_64 0:1.0-1.el7 will be installed
--> Processing Dependency: bc for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
Determining fastest mirrors
epel/x86_64/metalink                                                                                                                                             |  18 kB  00:00:00     
 * base: us.mirror.nsec.pt
 * epel: mirror.steadfastnet.com
 * extras: mirrors.gigenet.com
 * updates: bay.uchicago.edu
base                                                                                                                                                             | 3.6 kB  00:00:00     
epel                                                                                                                                                             | 4.7 kB  00:00:00     
extras                                                                                                                                                           | 2.9 kB  00:00:00     
google-cloud-sdk/signature                                                                                                                                       |  454 B  00:00:00     
google-cloud-sdk/signature                                                                                                                                       | 1.4 kB  00:00:00 !!! 
google-compute-engine/signature                                                                                                                                  |  454 B  00:00:00     
google-compute-engine/signature                                                                                                                                  | 1.4 kB  00:00:00 !!! 
updates                                                                                                                                                          | 2.9 kB  00:00:00     
(1/9): epel/x86_64/group_gz                                                                                                                                      |  95 kB  00:00:00     
(2/9): epel/x86_64/updateinfo                                                                                                                                    | 1.0 MB  00:00:00     
(3/9): base/7/x86_64/group_gz                                                                                                                                    | 165 kB  00:00:00     
(4/9): extras/7/x86_64/primary_db                                                                                                                                | 164 kB  00:00:00     
(5/9): epel/x86_64/primary_db                                                                                                                                    | 6.7 MB  00:00:00     
(6/9): base/7/x86_64/primary_db                                                                                                                                  | 6.0 MB  00:00:00     
(7/9): google-cloud-sdk/primary                                                                                                                                  | 168 kB  00:00:00     
(8/9): google-compute-engine/primary                                                                                                                             | 3.9 kB  00:00:00     
(9/9): updates/7/x86_64/primary_db                                                                                                                               | 7.6 MB  00:00:00     
google-cloud-sdk                                                                                                                                                              1224/1224
google-compute-engine                                                                                                                                                             11/11
--> Processing Dependency: bind-utils for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: compat-libcap1 for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: compat-libstdc++-33 for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: glibc-devel for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: ksh for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: libaio for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: libaio-devel for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: libstdc++-devel for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: nfs-utils for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: psmisc for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: smartmontools for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: sysstat for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: unzip for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: xorg-x11-utils for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: xorg-x11-xauth for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Running transaction check
---> Package bc.x86_64 0:1.06.95-13.el7 will be installed
---> Package bind-utils.x86_64 32:9.11.4-9.P2.el7 will be installed
--> Processing Dependency: bind-libs-lite(x86-64) = 32:9.11.4-9.P2.el7 for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: bind-libs(x86-64) = 32:9.11.4-9.P2.el7 for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: liblwres.so.160()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libisccfg.so.160()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libisc.so.169()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libirs.so.160()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libdns.so.1102()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
...
Installed:
  oracle-database-preinstall-19c.x86_64 0:1.0-1.el7                                                                                                                                     

Dependency Installed:
  GeoIP.x86_64 0:1.5.0-14.el7                  bc.x86_64 0:1.06.95-13.el7           bind-libs.x86_64 32:9.11.4-9.P2.el7                     bind-libs-lite.x86_64 32:9.11.4-9.P2.el7 
  bind-license.noarch 32:9.11.4-9.P2.el7       bind-utils.x86_64 32:9.11.4-9.P2.el7 compat-libcap1.x86_64 0:1.10-7.el7                      compat-libstdc++-33.x86_64 0:3.2.3-72.el7
  geoipupdate.x86_64 0:2.5.0-1.el7             glibc-devel.x86_64 0:2.17-292.el7    glibc-headers.x86_64 0:2.17-292.el7                     gssproxy.x86_64 0:0.7.0-26.el7           
  kernel-headers.x86_64 0:3.10.0-1062.18.1.el7 keyutils.x86_64 0:1.5.8-3.el7        ksh.x86_64 0:20120801-140.el7_7                         libICE.x86_64 0:1.0.9-9.el7              
  libSM.x86_64 0:1.2.2-2.el7                   libX11.x86_64 0:1.6.7-2.el7          libX11-common.noarch 0:1.6.7-2.el7                      libXau.x86_64 0:1.0.8-2.1.el7            
  libXext.x86_64 0:1.3.3-3.el7                 libXi.x86_64 0:1.7.9-1.el7           libXinerama.x86_64 0:1.1.3-2.1.el7                      libXmu.x86_64 0:1.1.2-2.el7              
  libXrandr.x86_64 0:1.5.1-2.el7               libXrender.x86_64 0:0.9.10-1.el7     libXt.x86_64 0:1.1.5-3.el7                              libXtst.x86_64 0:1.2.3-1.el7             
  libXv.x86_64 0:1.0.11-1.el7                  libXxf86dga.x86_64 0:1.1.4-2.1.el7   libXxf86misc.x86_64 0:1.0.3-7.1.el7                     libXxf86vm.x86_64 0:1.1.4-1.el7          
  libaio.x86_64 0:0.3.109-13.el7               libaio-devel.x86_64 0:0.3.109-13.el7 libbasicobjects.x86_64 0:0.1.1-32.el7                   libcollection.x86_64 0:0.7.0-32.el7      
  libdmx.x86_64 0:1.1.3-3.el7                  libevent.x86_64 0:2.0.21-4.el7       libini_config.x86_64 0:1.3.1-32.el7                     libnfsidmap.x86_64 0:0.25-19.el7         
  libpath_utils.x86_64 0:0.2.1-32.el7          libref_array.x86_64 0:0.1.5-32.el7   libstdc++-devel.x86_64 0:4.8.5-39.el7                   libtirpc.x86_64 0:0.2.4-0.16.el7         
  libverto-libevent.x86_64 0:0.2.5-4.el7       libxcb.x86_64 0:1.13-1.el7           lm_sensors-libs.x86_64 0:3.4.0-8.20160601gitf9185e5.el7 mailx.x86_64 0:12.5-19.el7               
  nfs-utils.x86_64 1:1.3.0-0.65.el7            psmisc.x86_64 0:22.20-16.el7         quota.x86_64 1:4.01-19.el7                              quota-nls.noarch 1:4.01-19.el7           
  rpcbind.x86_64 0:0.2.0-48.el7                smartmontools.x86_64 1:7.0-1.el7_7.1 sysstat.x86_64 0:10.1.5-18.el7_7.1                      tcp_wrappers.x86_64 0:7.6-77.el7         
  unzip.x86_64 0:6.0-20.el7                    xorg-x11-utils.x86_64 0:7.5-23.el7   xorg-x11-xauth.x86_64 1:1.0.9-1.el7                    

Complete!

Next, install a few additional packages and update the ones already installed:

[root@rene-ace-test-inst1 ~]# yum -y install wget java-11-openjdk gcc gcc-c++
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: us.mirror.nsec.pt
...
Installed:
  gcc.x86_64 0:4.8.5-39.el7              gcc-c++.x86_64 0:4.8.5-39.el7              java-11-openjdk.x86_64 1:11.0.6.10-1.el7_7              wget.x86_64 0:1.14-18.el7_6.1             

Dependency Installed:
  avahi-libs.x86_64 0:0.6.31-19.el7        copy-jdk-configs.noarch 0:3.3-10.el7_5               cpp.x86_64 0:4.8.5-39.el7                  cups-libs.x86_64 1:1.6.3-40.el7             
  dejavu-fonts-common.noarch 0:2.33-6.el7  dejavu-sans-fonts.noarch 0:2.33-6.el7                fontconfig.x86_64 0:2.13.0-4.3.el7         fontpackages-filesystem.noarch 0:1.44-8.el7 
  giflib.x86_64 0:4.1.6-9.el7              java-11-openjdk-headless.x86_64 1:11.0.6.10-1.el7_7  javapackages-tools.noarch 0:3.4.1-11.el7   libfontenc.x86_64 0:1.1.3-3.el7             
  libjpeg-turbo.x86_64 0:1.2.90-8.el7      libmpc.x86_64 0:1.0.1-3.el7                          libxslt.x86_64 0:1.1.28-5.el7              lksctp-tools.x86_64 0:1.0.17-2.el7          
  mpfr.x86_64 0:3.1.1-4.el7                pcsc-lite-libs.x86_64 0:1.8.8-8.el7                  python-javapackages.noarch 0:3.4.1-11.el7  python-lxml.x86_64 0:3.2.1-4.el7            
  ttmkfdir.x86_64 0:3.0.9-42.el7           tzdata-java.noarch 0:2019c-1.el7                     xorg-x11-font-utils.x86_64 1:7.5-21.el7    xorg-x11-fonts-Type1.noarch 0:7.5-9.el7     

Complete!
[root@rene-ace-test-inst1 ~]# yum -y update
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: us.mirror.nsec.pt
 * epel: mirror.steadfastnet.com
 * extras: mirrors.gigenet.com
 * updates: bay.uchicago.edu
Resolving Dependencies
...
Installed:
  kernel.x86_64 0:3.10.0-1062.18.1.el7                                                                                                                                                  

Updated:
  binutils.x86_64 0:2.27-41.base.el7_7.3      firewalld.noarch 0:0.6.3-2.el7_7.4               firewalld-filesystem.noarch 0:0.6.3-2.el7_7.4  google-cloud-sdk.noarch 0:286.0.0-1    
  kernel-tools.x86_64 0:3.10.0-1062.18.1.el7  kernel-tools-libs.x86_64 0:3.10.0-1062.18.1.el7  kmod.x86_64 0:20-25.el7_7.1                    kmod-libs.x86_64 0:20-25.el7_7.1       
  libicu.x86_64 0:50.2-4.el7_7                python-firewall.noarch 0:0.6.3-2.el7_7.4         python-perf.x86_64 0:3.10.0-1062.18.1.el7      python-requests.noarch 0:2.6.0-9.el7_7 
  rsyslog.x86_64 0:8.24.0-41.el7_7.4          systemd.x86_64 0:219-67.el7_7.4                  systemd-libs.x86_64 0:219-67.el7_7.4           systemd-sysv.x86_64 0:219-67.el7_7.4   

Complete!

[root@rene-ace-test-inst1 ~]# yum clean all
Loaded plugins: fastestmirror
Cleaning repos: base epel extras google-cloud-sdk google-compute-engine updates
Cleaning up list of fastest mirrors

Make sure there is at least 16GB of swap space, then disable transparent hugepages via the kernel boot line:

[root@rene-ace-test-inst1 ~]# dd if=/dev/zero of=/swapfile bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 80.5556 s, 213 MB/s
[root@rene-ace-test-inst1 ~]# mkswap /swapfile
Setting up swapspace version 1, size = 16777212 KiB
no label, UUID=50136cfb-9ab7-4d5a-8ed7-726da89a7ede
[root@rene-ace-test-inst1 ~]# chmod 0600 /swapfile
[root@rene-ace-test-inst1 ~]# swapon /swapfile
[root@rene-ace-test-inst1 ~]# echo "/swapfile          swap            swap    defaults        0 0" >> /etc/fstab
[root@rene-ace-test-inst1 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root@rene-ace-test-inst1 ~]# echo "transparent_hugepage=never" >> /etc/default/grub
[root@rene-ace-test-inst1 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1062.18.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.18.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-1062.12.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-1062.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-2a464dc72a01028f0cd989f1d687b484
Found initrd image: /boot/initramfs-0-rescue-2a464dc72a01028f0cd989f1d687b484.img
done
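
Before moving on, a quick sanity check that the swap file is active (standard Linux tools; output omitted):

swapon -s    # the /swapfile entry should be listed
free -g      # the Swap line should show roughly 16GB total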

Disable the firewall and change SELINUX to permissive:

[root@rene-ace-test-inst1 ~]# systemctl stop firewalld
[root@rene-ace-test-inst1 ~]# systemctl disable firewalld
[root@rene-ace-test-inst1 ~]# sed -i.bak '/^SELINUX=/ s/SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
[root@rene-ace-test-inst1 ~]# cat /etc/selinux/config | grep SELINUX
# SELINUX= can take one of these three values:
SELINUX=permissive
# SELINUXTYPE= can take one of three values:
SELINUXTYPE=targeted 

Next, enable hugepages by changing the value in /etc/sysctl.conf and restarting the VM:

[root@rene-ace-test-inst1 ~]# echo "vm.nr_hugepages=4096" >> /etc/sysctl.conf
[root@rene-ace-test-inst1 ~]# shutdown -r now
Connection to 34.66.112.87 closed by remote host.
Connection to 34.66.112.87 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Renes-MacBook-Pro-2:~ rene$ gcloud compute ssh rene-ace-test-inst1 --zone=us-central1-c
Last login: Tue Mar 31 01:31:10 2020 from 189.216.207.2
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[rene@rene-ace-test-inst1 ~]$ sudo su -
Last login: Tue Mar 31 01:31:21 UTC 2020 on pts/0
[root@rene-ace-test-inst1 ~]# grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Create the directories for the binaries, the additional OS groups, and update the bash profile:

[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/19.0.0.0/grid
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oracle/product/19.0.0.0/dbhome_1
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oraInventory
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oracle/fast_recovery_area
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oracle_software
[root@rene-ace-test-inst1 ~]# mkdir -p /home/oracle/working
[root@rene-ace-test-inst1 ~]# chown -R oracle:oinstall /u01
[root@rene-ace-test-inst1 ~]# chown -R oracle:oinstall /home/oracle/working
[root@rene-ace-test-inst1 ~]# chmod -R 775 /u01

[root@rene-ace-test-inst1 ~]# 
[root@rene-ace-test-inst1 ~]# 
[root@rene-ace-test-inst1 ~]# cat <<EOT >> /home/oracle/.bash_profile
>     
> #  Oracle RDBMS Settings:
> export ORACLE_SID=ORCL
> export ORACLE_BASE=/u01/app/oracle
> export ORACLE_HOME=\${ORACLE_BASE}/product/19.0.0.0/dbhome_1
> export GRID_HOME=/u01/app/19.0.0.0/grid
> export PATH=\${ORACLE_HOME}/bin:\${PATH}
> export SWLIB=/u01/app/oracle_software
> EOT
[root@rene-ace-test-inst1 ~]# groupadd -g 54327 asmdba
[root@rene-ace-test-inst1 ~]# groupadd -g 54328 asmoper
[root@rene-ace-test-inst1 ~]# groupadd -g 54329 asmadmin
[root@rene-ace-test-inst1 ~]# usermod -u 54321 -g oinstall -G dba,asmadmin,asmdba,asmoper oracle
[root@rene-ace-test-inst1 ~]# 
[root@rene-ace-test-inst1 ~]# echo "umask 022" >> /home/oracle/.bashrc

Log in as the oracle user and download the latest OPatch using Maris Elsins’ getMOSPatch:

[rene@rene-ace-test-inst1 ~]$ sudo su - oracle
Last login: Tue Mar 31 01:09:04 UTC 2020 on pts/0
[oracle@rene-ace-test-inst1 ~]$ cd ${SWLIB}
[oracle@rene-ace-test-inst1 oracle_software]$ wget https://github.com/MarisElsins/getMOSPatch/raw/master/getMOSPatch.jar

...
(raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8932 (8.7K) [application/octet-stream]
Saving to: ‘getMOSPatch.jar’

100%[==============================================================================================================================================>] 8,932       --.-K/s   in 0s      

2020-03-31 01:46:46 (80.6 MB/s) - ‘getMOSPatch.jar’ saved [8932/8932]

[oracle@rene-ace-test-inst1 oracle_software]$ java -jar getMOSPatch.jar patch=6880880 platform=226P regexp=.*190.* download=all
Enter your MOS username: ********@*********.com
Enter your MOS password: 
|
Enter Comma separated platforms to list: 226P

We're going to download patches for the following Platforms/Languages:
 226P - Linux x86-64

Processing patch 6880880 for Linux x86-64 and applying regexp .*190.* to the filenames:
 1 - p6880880_190000_Linux-x86-64.zip
 Enter Comma separated files to download: all
 All files will be downloadad because download=all was specified.

Downloading all selected files:
 Downloading p6880880_190000_Linux-x86-64.zip: 110MB at average speed of 16744KB/s - DONE!  

Next, download the Oracle base software from edelivery.oracle.com: files V982063-01.zip (Oracle RDBMS) and V982068-01.zip (Oracle GI). I used wget, but the exact commands aren’t reproduced here because edelivery generates signed, session-specific download URLs; copy yours from the download page.
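A purely hypothetical sketch (the URLs below are placeholders, not real links):

cd ${SWLIB}
# Placeholder URLs -- paste the signed links from your own edelivery session:
wget -O V982063-01.zip "https://edelivery.oracle.com/<signed-session-url>"
wget -O V982068-01.zip "https://edelivery.oracle.com/<signed-session-url>"

Once the downloads finish, you should have the following files: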

[oracle@rene-ace-test-inst1 oracle_software]$ ls -ltr
total 5931508
-rw-r--r--. 1 oracle oinstall 3059705302 Apr 23  2019 V982063-01.zip
-rw-r--r--. 1 oracle oinstall 2889184573 Apr 23  2019 V982068-01.zip
-rw-r--r--. 1 oracle oinstall  115653541 Mar 31 01:49 p6880880_190000_Linux-x86-64.zip

Unzip the files to the corresponding location and update the OPatch:

[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/V982063-01.zip -d ${ORACLE_HOME}
[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/V982068-01.zip -d ${GRID_HOME}
[oracle@rene-ace-test-inst1 oracle_software]$ rm -rf ${ORACLE_HOME}/OPatch
[oracle@rene-ace-test-inst1 oracle_software]$ rm -rf ${GRID_HOME}/OPatch
[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/p6880880_190000_Linux-x86-64.zip -d ${ORACLE_HOME}
[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/p6880880_190000_Linux-x86-64.zip -d ${GRID_HOME}

As the root user, install the cvuqdisk RPM:

[rene@rene-ace-test-inst1 ~]$ sudo su -
Last login: Tue Mar 31 01:35:45 UTC 2020 on pts/0
[root@rene-ace-test-inst1 ~]# cd /u01/app/19.0.0.0/grid/cv/rpm
[root@rene-ace-test-inst1 rpm]# rpm -qi cvuqdisk
package cvuqdisk is not installed
[root@rene-ace-test-inst1 rpm]# CVUQDISK_GRP=oinstall; export CVUQDISK_GRP
[root@rene-ace-test-inst1 rpm]# rpm -iv cvuqdisk-*.rpm
Preparing packages...
cvuqdisk-1.0.10-1.x86_64

As the oracle user, run cluvfy stage -pre crsinst to make sure nothing is missing, so that you can successfully install OHAS and ASM. Because this is not a RAC environment, you only need ASM.

[oracle@rene-ace-test-inst1 ~]$ cd ${GRID_HOME}
[oracle@rene-ace-test-inst1 grid]$ ./runcluvfy.sh stage -pre crsinst -n `hostname -s`

ERROR: 
PRVG-10467 : The default Oracle Inventory group could not be determined.

Verifying Physical Memory ...PASSED
Verifying Available Physical Memory ...PASSED
Verifying Swap Size ...PASSED
Verifying Free Space: rene-ace-test-inst1:/usr,rene-ace-test-inst1:/var,rene-ace-test-inst1:/etc,rene-ace-test-inst1:/sbin,rene-ace-test-inst1:/tmp ...PASSED
Verifying User Existence: oracle ...
  Verifying Users With Same UID: 54321 ...PASSED
Verifying User Existence: oracle ...PASSED
...
CVU operation performed:      stage -pre crsinst
Date:                         Apr 4, 2020 2:09:38 PM
CVU home:                     /u01/app/19.0.0.0/grid/
User:                         oracle

Modify the response file so that you can use the GI software as a CRS_SWONLY installation:

[oracle@rene-ace-test-inst1 grid]$ cp ${GRID_HOME}/inventory/response/grid_install.rsp ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.option/ s~oracle.install.option=$~oracle.install.option=CRS_SWONLY~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_HOSTNAME/ s~ORACLE_HOSTNAME=$~ORACLE_HOSTNAME=`hostname -A`~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^INVENTORY_LOCATION/ s~INVENTORY_LOCATION=$~INVENTORY_LOCATION=/u01/app/oraInventory~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_BASE/ s~ORACLE_BASE=$~ORACLE_BASE='${ORACLE_BASE}'~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.asm.OSDBA/ s~oracle.install.asm.OSDBA=$~oracle.install.asm.OSDBA=asmdba~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.asm.OSOPER/ s~oracle.install.asm.OSOPER=$~oracle.install.asm.OSOPER=asmoper~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.asm.OSASM/ s~oracle.install.asm.OSASM=$~oracle.install.asm.OSASM=asmadmin~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ diff ${GRID_HOME}/inventory/response/grid_install.rsp ${SWLIB}/grid_install.rsp
60c60
< INVENTORY_LOCATION=
---
> INVENTORY_LOCATION=/u01/app/oraInventory
75c75
< oracle.install.option=
---
> oracle.install.option=CRS_SWONLY
80c80
< ORACLE_BASE=
---
> ORACLE_BASE=/u01/app/oracle
98c98
< oracle.install.asm.OSDBA=
---
> oracle.install.asm.OSDBA=asmdba
105c105
< oracle.install.asm.OSOPER=
---
> oracle.install.asm.OSOPER=asmoper
111c111
< oracle.install.asm.OSASM=
---
> oracle.install.asm.OSASM=asmadmin

Launch the software installation:

[oracle@rene-ace-test-inst1 grid]$ ${GRID_HOME}/gridSetup.sh -silent -responseFile ${SWLIB}/grid_install.rsp
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:
 /u01/app/19.0.0.0/grid/install/response/grid_2020-04-04_02-12-20PM.rsp

You can find the log of this install session at:
 /tmp/GridSetupActions2020-04-04_02-12-20PM/gridSetupActions2020-04-04_02-12-20PM.log

As a root user, execute the following script(s):
	1. /u01/app/oraInventory/orainstRoot.sh
	2. /u01/app/19.0.0.0/grid/root.sh

Execute /u01/app/oraInventory/orainstRoot.sh on the following nodes: 
[rene-ace-test-inst1]
Execute /u01/app/19.0.0.0/grid/root.sh on the following nodes: 
[rene-ace-test-inst1]

Successfully Setup Software.
Moved the install session logs to:
 /u01/app/oraInventory/logs/GridSetupActions2020-04-04_02-12-20PM

When the software installation is complete, run orainstRoot.sh and root.sh as the root user. Also, run roothas.pl to set up the HAS stack:

[rene@rene-ace-test-inst1 ~]$ sudo su -
Last login: Sat Apr  4 14:06:06 UTC 2020 on pts/0
[root@rene-ace-test-inst1 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@rene-ace-test-inst1 ~]# /u01/app/19.0.0.0/grid/root.sh
Check /u01/app/19.0.0.0/grid/install/root_rene-ace-test-inst1_2020-04-04_14-15-37-955415144.log for the output of root script

[root@rene-ace-test-inst1 ~]# export GRID_HOME=/u01/app/19.0.0.0/grid
[root@rene-ace-test-inst1 ~]# ${GRID_HOME}/perl/bin/perl -I ${GRID_HOME}/perl/lib -I ${GRID_HOME}/crs/install ${GRID_HOME}/crs/install/roothas.pl
Using configuration parameter file: /u01/app/19.0.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/rene-ace-test-inst1/crsconfig/roothas_2020-04-04_02-16-37PM.log
LOCAL ADD MODE 
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE 
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node rene-ace-test-inst1 successfully pinned.
2020/04/04 14:16:47 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

rene-ace-test-inst1     2020/04/04 14:17:24     /u01/app/oracle/crsdata/rene-ace-test-inst1/olr/backup_20200404_141724.olr     724960844     
2020/04/04 14:17:24 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
[root@rene-ace-test-inst1 ~]# sleep 10

[root@rene-ace-test-inst1 ~]# ${GRID_HOME}/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
               OFFLINE OFFLINE      rene-ace-test-inst1      STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       rene-ace-test-inst1      STABLE
--------------------------------------------------------------------------------

As the root user, prepare the disks for ASM using UDEV rules:

[root@rene-ace-test-inst1 ~]# if [ ! -e /dev/sdb1 ]; then
>   echo -e "n\np\n1\n\n\nw" | fdisk /dev/sdb
> fi
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0xae81dcbe.

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): Partition number (1-4, default 1): First sector (2048-314572799, default 2048): Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-314572799, default 314572799): Using default value 314572799
Partition 1 of type Linux and of size 150 GiB is set

Command (m for help): The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@rene-ace-test-inst1 ~]#  
[root@rene-ace-test-inst1 ~]# ASM_DISK1=`/usr/lib/udev/scsi_id -g -u -d /dev/sdb`
[root@rene-ace-test-inst1 ~]#  
[root@rene-ace-test-inst1 ~]# cat > /etc/udev/rules.d/99-oracle-asmdevices.rules <<EOF
> KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/\$parent", RESULT=="${ASM_DISK1}", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
> EOF
[root@rene-ace-test-inst1 ~]# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0Google_PersistentDisk_rene-ace-disk-asm1", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
[root@rene-ace-test-inst1 ~]# /sbin/partprobe /dev/sdb1
[root@rene-ace-test-inst1 ~]# sleep 10

[root@rene-ace-test-inst1 ~]# /sbin/udevadm control --reload-rules
[root@rene-ace-test-inst1 ~]# sleep 10
[root@rene-ace-test-inst1 ~]# /sbin/partprobe /dev/sdb1
[root@rene-ace-test-inst1 ~]# sleep 10
[root@rene-ace-test-inst1 ~]# /sbin/udevadm control --reload-rules
[root@rene-ace-test-inst1 ~]# sleep 10
[root@rene-ace-test-inst1 ~]# ls -al /dev/oracleasm/*
lrwxrwxrwx. 1 root root 7 Apr  4 14:20 /dev/oracleasm/asm-disk1 -> ../sdb1

As the oracle user, prepare the RDBMS install software response file:

[oracle@rene-ace-test-inst1 grid]$ cp ${ORACLE_HOME}/install/response/db_install.rsp ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.option/ s~oracle.install.option=$~oracle.install.option=INSTALL_DB_SWONLY~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.InstallEdition/ s~oracle.install.db.InstallEdition=$~oracle.install.db.InstallEdition=EE~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_HOSTNAME/ s~ORACLE_HOSTNAME=$~ORACLE_HOSTNAME=`hostname -A`~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^UNIX_GROUP_NAME/ s~UNIX_GROUP_NAME=$~UNIX_GROUP_NAME=oinstall~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^INVENTORY_LOCATION/ s~INVENTORY_LOCATION=$~INVENTORY_LOCATION=/u01/app/oraInventory~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_HOME/ s~ORACLE_HOME=$~ORACLE_HOME='${ORACLE_HOME}'~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_BASE/ s~ORACLE_BASE=$~ORACLE_BASE='${ORACLE_BASE}'~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.DBA_GROUP/ s~oracle.install.db.DBA_GROUP=$~oracle.install.db.DBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OPER_GROUP/ s~oracle.install.db.OPER_GROUP=$~oracle.install.db.OPER_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.BACKUPDBA_GROUP/ s~oracle.install.db.BACKUPDBA_GROUP=$~oracle.install.db.BACKUPDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.DGDBA_GROUP/ s~oracle.install.db.DGDBA_GROUP=$~oracle.install.db.DGDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.KMDBA_GROUP/ s~oracle.install.db.KMDBA_GROUP=$~oracle.install.db.KMDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSDBA_GROUP/ s~oracle.install.db.OSDBA_GROUP=$~oracle.install.db.OSDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSOPER_GROUP/ s~oracle.install.db.OSOPER_GROUP=$~oracle.install.db.OSOPER_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSBACKUPDBA_GROUP/ s~oracle.install.db.OSBACKUPDBA_GROUP=$~oracle.install.db.OSBACKUPDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSDGDBA_GROUP/ s~oracle.install.db.OSDGDBA_GROUP=$~oracle.install.db.OSDGDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSKMDBA_GROUP/ s~oracle.install.db.OSKMDBA_GROUP=$~oracle.install.db.OSKMDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSRACDBA_GROUP/ s~oracle.install.db.OSRACDBA_GROUP=$~oracle.install.db.OSRACDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^SECURITY_UPDATES_VIA_MYORACLESUPPORT/ s~SECURITY_UPDATES_VIA_MYORACLESUPPORT=$~SECURITY_UPDATES_VIA_MYORACLESUPPORT=FALSE~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^DECLINE_SECURITY_UPDATES/ s~DECLINE_SECURITY_UPDATES=$~DECLINE_SECURITY_UPDATES=TRUE~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ diff ${ORACLE_HOME}/install/response/db_install.rsp ${SWLIB}/db_install.rsp
29c29
< oracle.install.option=
---
> oracle.install.option=INSTALL_DB_SWONLY
34c34
< UNIX_GROUP_NAME=
---
> UNIX_GROUP_NAME=oinstall
41c41
< INVENTORY_LOCATION=
---
> INVENTORY_LOCATION=/u01/app/oraInventory
45c45
< ORACLE_HOME=
---
> ORACLE_HOME=/u01/app/oracle/product/19.0.0.0/dbhome_1
50c50
< ORACLE_BASE=
---
> ORACLE_BASE=/u01/app/oracle
62c62
< oracle.install.db.InstallEdition=
---
> oracle.install.db.InstallEdition=EE
79c79
< oracle.install.db.OSDBA_GROUP=
---
> oracle.install.db.OSDBA_GROUP=dba
85c85
< oracle.install.db.OSOPER_GROUP=
---
> oracle.install.db.OSOPER_GROUP=dba
90c90
< oracle.install.db.OSBACKUPDBA_GROUP=
---
> oracle.install.db.OSBACKUPDBA_GROUP=dba
95c95
< oracle.install.db.OSDGDBA_GROUP=
---
> oracle.install.db.OSDGDBA_GROUP=dba
100c100
< oracle.install.db.OSKMDBA_GROUP=
---
> oracle.install.db.OSKMDBA_GROUP=dba
105c105
< oracle.install.db.OSRACDBA_GROUP=
---
> oracle.install.db.OSRACDBA_GROUP=dba

Next, install the RDBMS software:

[oracle@rene-ace-test-inst1 ~]$ ${ORACLE_HOME}/runInstaller -silent -waitforcompletion -responseFile ${SWLIB}/db_install.rsp
Launching Oracle Database Setup Wizard...

The response file for this session can be found at:
 /u01/app/oracle/product/19.0.0.0/dbhome_1/install/response/db_2020-04-04_02-22-21PM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/InstallActions2020-04-04_02-22-21PM/installActions2020-04-04_02-22-21PM.log

As a root user, execute the following script(s):
	1. /u01/app/oracle/product/19.0.0.0/dbhome_1/root.sh

Execute /u01/app/oracle/product/19.0.0.0/dbhome_1/root.sh on the following nodes: 
[rene-ace-test-inst1]

Successfully Setup Software.

As the root user, run RDBMS root.sh:

[root@rene-ace-test-inst1 ~]#  /u01/app/oracle/product/19.0.0.0/dbhome_1/root.sh
Check /u01/app/oracle/product/19.0.0.0/dbhome_1/install/root_rene-ace-test-inst1_2020-04-04_20-07-58-636555961.log for the output of root script

After the RDBMS software installation completes, set up the listener as the oracle user. Note that from this point forward, all tasks are performed as the oracle user:

[oracle@rene-ace-test-inst1 ~]$ lsnrctl start

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 04-APR-2020 14:39:21

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Starting /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Log messages written to /u01/app/oracle/diag/tnslsnr/rene-ace-test-inst1/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rene-ace-test-inst1.us-central1-c.c.oracle-migration.internal)(PORT=1521)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                04-APR-2020 14:39:22
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /u01/app/oracle/diag/tnslsnr/rene-ace-test-inst1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rene-ace-test-inst1.us-central1-c.c.oracle-migration.internal)(PORT=1521)))
The listener supports no services
The command completed successfully
[oracle@rene-ace-test-inst1 ~]$ lsnrctl stop

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 04-APR-2020 14:39:29

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
The command completed successfully
[oracle@rene-ace-test-inst1 ~]$  
[oracle@rene-ace-test-inst1 ~]$ srvctl add listener
[oracle@rene-ace-test-inst1 ~]$ srvctl start listener

Create and start the ASM Instance:

[oracle@rene-ace-test-inst1 ~]$ export ORACLE_SID=+ASM
[oracle@rene-ace-test-inst1 ~]$  
[oracle@rene-ace-test-inst1 ~]$ cat <<EOT > ${GRID_HOME}/dbs/init+ASM.ora
> instance_type=ASM
> asm_diskstring='/dev/oracleasm/*'
> large_pool_size=12M
> remote_login_passwordfile='EXCLUSIVE'
> memory_target=0
> sga_target=3G
> pga_aggregate_target=400M
> processes=1024
> EOT
[oracle@rene-ace-test-inst1 ~]$   
[oracle@rene-ace-test-inst1 ~]$ srvctl add asm -d '/dev/oracleasm/*'
[oracle@rene-ace-test-inst1 ~]$ srvctl start asm
[oracle@rene-ace-test-inst1 ~]$  
[oracle@rene-ace-test-inst1 ~]$ export ORAENV_ASK=NO
[oracle@rene-ace-test-inst1 ~]$ . oraenv
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@rene-ace-test-inst1 ~]$   
[oracle@rene-ace-test-inst1 ~]$ echo "CREATE spfile FROM pfile;" | sqlplus -s / as sysasm

File created.

[oracle@rene-ace-test-inst1 ~]$ srvctl stop asm
[oracle@rene-ace-test-inst1 ~]$ srvctl start asm

Create the ASM diskgroup (DATA):

[oracle@rene-ace-test-inst1 ~]$ echo "
> SELECT header_status,path FROM v\$asm_disk;
>   
> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
>    DISK '/dev/oracleasm/asm-disk1'
>    ATTRIBUTE
>       'compatible.asm'   = '19.0.0.0.0',
>       'compatible.rdbms' = '19.0.0.0.0';
>   
> SELECT header_status,path FROM v\$asm_disk;
> " | sqlplus -s / as sysasm

HEADER_STATU
------------
PATH
--------------------------------------------------------------------------------
CANDIDATE
/dev/oracleasm/asm-disk1

Diskgroup created.

HEADER_STATU
------------
PATH
--------------------------------------------------------------------------------
MEMBER
/dev/oracleasm/asm-disk1

Create the database (DBTEST):

[oracle@rene-ace-test-inst1 ~]$ mkdir -p /u01/app/oracle/fast_recovery_area/DBTEST
[oracle@rene-ace-test-inst1 ~]$ 
[oracle@rene-ace-test-inst1 ~]$ export ORACLE_SID=DBTEST
[oracle@rene-ace-test-inst1 ~]$ dbca -silent \
>    -createDatabase \
>    -templateName General_Purpose.dbc \
>    -gdbName ${ORACLE_SID} \
>    -sid ${ORACLE_SID} \
>    -createAsContainerDatabase FALSE \
>    -sysPassword welcome1 \
>    -systemPassword welcome1 \
>    -emConfiguration NONE \
>    -datafileDestination '+DATA' \
>    -recoveryAreaDestination '/u01/app/oracle/fast_recovery_area' \
>    -redoLogFileSize 128 \
>    -storageType ASM \
>    -diskGroupName DATA \
>    -characterSet AL32UTF8 \
>    -nationalCharacterSet AL16UTF16 \
>    -registerWithDirService FALSE \
>    -automaticMemoryManagement FALSE \
>    -initparams \
>    db_create_file_dest='+DATA', \
>    db_create_online_log_dest_1='+DATA', \
>    filesystemio_options='SETALL', \
>    log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST', \
>    log_archive_format='DBTEST_%s%t%r.arc', \
>    pga_aggregate_target=200, \
>    processes=150, \
>    sga_target=1200, \
>    streams_pool_size=64M, \
>    undo_retention=7200
[WARNING] [DBT-06208] The 'SYS' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
[WARNING] [DBT-06208] The 'SYSTEM' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
Prepare for db operation
10% complete
Registering database with Oracle Restart
14% complete
Copying database files
43% complete
Creating and starting Oracle instance
45% complete
49% complete
53% complete
56% complete
62% complete
Completing Database Creation
68% complete
70% complete
71% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
 /u01/app/oracle/cfgtoollogs/dbca/DBTEST.
Database Information:
Global Database Name:DBTEST
System Identifier(SID):DBTEST
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/DBTEST/DBTEST.log" for further details.

Adjust the redo log configuration by adding new 128MB redo log groups:

[oracle@rene-ace-test-inst1 ~]$ sqlplus

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Apr 4 15:04:31 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter user-name: /as sysdba

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select name from v$database;

NAME
---------
DBTEST

SQL> SET serverout on
   
DECLARE
   file_bytes           NUMBER;
   max_group            INT;
   num_groups           INT;
   num_threads          INT;
   created              INT;
   str                  VARCHAR2(128);
BEGIN
   --SELECT MAX(bytes) INTO file_bytes FROM v$log;
   SELECT 128*1024*1024 INTO file_bytes FROM dual;
   SELECT COUNT(DISTINCT(thread#)) INTO num_threads FROM v$log;
   SELECT MAX(group#), COUNT(DISTINCT(group#)) INTO max_group, num_groups FROM v$logfile WHERE type = 'ONLINE';
   SELECT COUNT(*) INTO created FROM v$standby_log;
   IF (created = 0) THEN
      FOR i IN 1..num_threads LOOP
         --FOR j IN i..max_group+i LOOP
         FOR j IN 1..5 LOOP
            str := 'ALTER DATABASE ADD LOGFILE THREAD '||i||' GROUP '||TO_CHAR(j+num_groups*i)||' SIZE '||file_bytes;
            DBMS_OUTPUT.put_line(str);
            EXECUTE IMMEDIATE str;
         END LOOP;
      END LOOP;
   END IF;
END;
/
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 4 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 7 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 8 SIZE 134217728

PL/SQL procedure successfully completed.

Adjust the database to use large pages only, minimize SGA allocations, and adjust file system IO options:

[oracle@rene-ace-test-inst1 ~]$ echo "
>    alter system set USE_LARGE_PAGES=ONLY scope=spfile;
>    alter system set FILESYSTEMIO_OPTIONS=SETALL scope=spfile;
>    alter system set SGA_TARGET=650M scope=spfile;
>    alter system set SGA_MAX_SIZE=650M scope=spfile;
>    alter system set SHARED_POOL_SIZE=256M scope=spfile;
>    alter system set PGA_AGGREGATE_TARGET=50M scope=spfile;
>    alter system set DB_CACHE_SIZE=64M scope=spfile;
>    alter system set PARALLEL_MAX_SERVERS=0 scope=spfile;
>    alter system set RECYCLEBIN=OFF scope=spfile;
> " | sqlplus -s / as sysdba

System altered.


System altered.


System altered.


System altered.


System altered.


System altered.


System altered.


System altered.


System altered.

Stop and restart the database so that the new parameters take effect. Then download and run the crs_status.sh helper script to verify the status of the OHAS environment:

[oracle@rene-ace-test-inst1 ~]$ srvctl stop database -d ${ORACLE_SID}
[oracle@rene-ace-test-inst1 ~]$ srvctl start database -d ${ORACLE_SID}

[oracle@rene-ace-test-inst1 ~]$ wget https://raw.githubusercontent.com/rene-ace/oracle_bash_scripts/master/crs_status.sh --output-document=${HOME}/working/crs_status.sh
...
(raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1320 (1.3K) [text/plain]
Saving to: ‘/home/oracle/working/crs_status.sh’

100%[==============================================================================================================================================>] 1,320       --.-K/s   in 0s      

2020-04-04 20:44:09 (224 MB/s) - ‘/home/oracle/working/crs_status.sh’ saved [1320/1320]

[oracle@rene-ace-test-inst1 working]$ chmod 750 ${HOME}/working/crs_status.sh
[oracle@rene-ace-test-inst1 working]$ sed -i '/^CRS_HOME/ s~CRS_HOME=$~CRS_HOME='${GRID_HOME}'~' ${HOME}/working/crs_status.sh
[oracle@rene-ace-test-inst1 working]$ ${HOME}/working/crs_status.sh
Name                                          Target     State           Server                    State details
------------------------------------------------------------------------------------------------------------------------
ora.DATA.dg                                   ONLINE     ONLINE          rene-ace-test-inst1       STABLE 
ora.LISTENER.lsnr                             ONLINE     ONLINE          rene-ace-test-inst1       STABLE 
ora.asm                                       ONLINE     ONLINE          rene-ace-test-inst1       Started,STABLE 
ora.ons                                       OFFLINE    OFFLINE         rene-ace-test-inst1       STABLE 
ora.cssd                                      ONLINE     ONLINE          rene-ace-test-inst1       STABLE 
ora.dbtest.db                                 ONLINE     ONLINE          rene-ace-test-inst1       Open,HOME=/u01/app/o 
ora.diskmon                                   OFFLINE    OFFLINE         STABLE                     
ora.evmd                                      ONLINE     ONLINE          rene-ace-test-inst1       STABLE 

Now that we’ve created the Google VM, installed OHAS, and created the Oracle database, we’re set to perform several tasks that I’ll describe in future posts in this series. I hope this helps expand your understanding of Google Cloud and how to install a database in it.

Note: This was originally published at rene-ace.

Categories: DBA Blogs

World Backup Day – Protect Your Oracle Databases

Tue, 2020-03-31 08:00
In recognition of World Backup Day, I will discuss backups as they pertain to Oracle databases.

There are a number of options for backing up, testing backups, and restoring them. In fact, the number of possibilities can be a little overwhelming. Backing up does not need to be overly complex, though. It is best to start off simple, and then modify the backup procedures as needed.

This article is not a tutorial by any means, but just an overview of some of the methods for backing up your database, and testing those backups. For some readers, this may be a refresher, and for those new to Oracle, it is a brief introduction to some features of the Oracle RMAN utility.

Making a backup can be as simple as the RMAN command backup database (see the sketch below). Most backups, however, are more complex than that. Rather than spending much time on backups, though, I actually want to talk about testing backups.
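
For reference, a minimal sketch of that simplest case, assuming a local SYSDBA connection and default channel and destination settings:

$ rman target /
RMAN> backup database;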

Backups that do not work when a restore and recovery attempt is made are the things that nightmares are made of. Making sure the backups are actually useful is something that cannot be overstated.

My interest in backups

First a little background on why I tend to be rather particular about making backups and being certain that a backup can be used to successfully restore a database. The two incidents below are from long ago; pre-Pythian and pre-Oracle (at least for me). The technologies used will make this obvious.

Inventory System

Long ago I was responsible for an inventory system. The fact that this inventory software ran on 8-inch floppy disks will provide clues to a couple of things:

  • it was long ago
  • it could not have been a large inventory (thankfully, as you will see)

One day something happened to the inventory disk. I do not recall just what happened, but I do remember that the entire inventory had to be re-entered.

By hand.

By me.

You would think I would have learned my lesson and started making regular backups. But no. If you have never made backups on 8-inch floppy disks, you do not really understand the meaning of the word ‘tedium’.

But of course, there was another crash, the disk was trash, and I had to re-enter all of the inventory.

Again.

By hand.

That was a powerful lesson. But wait, there’s more!

The Orders System

At a somewhat later stage, I was at a different job, working onsite at a client that ran our custom software. My job that day was to upgrade the version of Xenix (Google it!) on the client’s machine. And of course, before doing that, I made a backup. This backup was to tape, and so fairly simple to make.

The backup was made via the Unix (Linux had not yet been invented) tar command, directly to the tape device. Pretty standard stuff for the time. When the backup was complete, I ran ‘tar tvf /dev/tape’ to get a directory listing of the backup as a means of verification. The command was successful; all the correct files were there.

Next steps:

  • format the system drive
  • install later version of Xenix
  • reload backup software and data files

The first two steps were fine. That final, crucial step, however, was not so good. It turns out a colleague had upgraded the RAM in the machine the previous day, and one step in that process was missed. Everything appeared to be OK afterward, but the memory used by the tape driver was not being handled correctly.

Every file on that backup was nothing but garbage, except for the headers: the headers were fine, but the contents were not.

This was not a good day, week, or month.

Should I ever seem rather persnickety and nitpicky about backups and restores, there are good reasons for that.

Backups are not that important

I say that only in jest; of course backups are important. But are there some things more important than backups?

Restores: restores are very important. Even more important is the recovery, as the database cannot be opened until the recovery step completes.

How is recovery different from restore? Recovery is the process that brings the restored files to a point where the database can be opened. Often this point is a particular time or SCN (System Change Number) that is specified at the time the database is recovered. This is referred to as an ‘incomplete’ recovery.

A ‘complete recovery’ is one where ALL redo from the archive logs is applied to the datafiles, applying all changes to the database.
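
As a concrete sketch of an incomplete recovery (the SCN below is a placeholder; SET UNTIL TIME works the same way for a point in time), run against a mounted database:

$ rman target /
RMAN> startup mount;
RMAN> run {
  set until scn 80321860;  # placeholder -- substitute your own target SCN
  restore database;
  recover database;
}
RMAN> alter database open resetlogs;

Note that an incomplete recovery like this must be finished with OPEN RESETLOGS, which starts a new incarnation of the database.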

Sure, you need backups to do a restore. But if you never test your backups, you may as well just not make them. A backup that cannot be used when a restore is necessary can make life very difficult indeed. You want to do everything you can to protect your data.

There is much more to know about Oracle backups, but for the purpose of discussing the validation of backups, the backup will be kept simple.

Testing Oracle Backups

The best way to test a backup is to restore it.

While this is the ideal, it is not always practical, and sometimes not even possible due to the size of the database. Very few companies are willing to spend the money and time needed to have a server available just for testing a restore of a 152-terabyte database.
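
When a scratch server is available, though, the rehearsal need not be elaborate. A sketch, assuming the backup pieces are visible to the test host and that you know the name of a control file backup piece (the piece name below is a placeholder):

$ rman target /
RMAN> startup nomount;
RMAN> restore controlfile from '/mnt/oracle-backups/rac19c01/<controlfile-piece>';
RMAN> alter database mount;
RMAN> restore database;
RMAN> recover database;
RMAN> alter database open resetlogs;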

The next best thing is to validate the restores. Here, Oracle has you covered with RMAN methods to validate that backups can indeed be restored.

Backup Validate

First on the list is backup validate. I know, I said this would be about restoring databases. However, the backup validate command can help you determine the condition of your database by spotting physical corruption, and, if you choose, it can also locate and record logical corruption.

The command can be as simple as backup validate check logical database.

This will take some time, depending on the size of the database, as Oracle will read all required database files. This operation can be performed online and is non-destructive, as no backup files are created, and no files are altered.
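
A sketch of a minimal session; any corrupt blocks found are recorded in V$DATABASE_BLOCK_CORRUPTION, which can be queried afterwards from SQL*Plus:

$ rman target /
RMAN> backup validate check logical database;

$ sqlplus / as sysdba
SQL> SELECT * FROM v$database_block_corruption;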

Read more here: Backup Validate

Make a Backup

The following is the RMAN script used to create the backup referenced by the commands in the sections that follow.

run {
  allocate channel ch1 device type DISK;
  allocate channel ch2 device type DISK;
  backup incremental level=0 tag "RMAN-TEST" format '/mnt/oracle-backups/rac19c01/%d_T%T_db_s%s_p%p_t%t' database filesperset 4 plus archivelog tag "RMAN-TEST";
  alter system switch logfile;
  backup format '/mnt/oracle-backups/rac19c01/%d_T%T_arch_s%s_p%p_t%t' archivelog all filesperset 4 tag "RMAN-TEST";
  backup format '/mnt/oracle-backups/rac19c01/%d_T%T_cf%s_p%p_t%t' current controlfile tag "RMAN-TEST";
}
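
Saved to a file, a script like this runs non-interactively; for example (the file and log names here are arbitrary):

$ rman target / cmdfile=rman_backup_test.rman log=rman_backup_test.log
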
Restore Preview

Restore preview is fast, very fast. It is fast because it really doesn’t do much: the preview simply reads the metadata from the backup catalogue, without reading any backup files. This is useful to ensure that the catalogue has recorded the files necessary for a restore.

RMAN> restore preview database;

Starting restore at 2020-03-02 17:30:50
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=300 instance=cdb1 device type=DISK


List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
24      Incr 0  840.32M    DISK        00:00:20     2020-03-02 17:14:57
        BP Key: 24   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s33_p1_t1034010877
  List of Datafiles in backup set 24
  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name
  ---- -- ---- ---------- ------------------- ----------- ------ ----
  1    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/system.259.1021126503
  4    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/undotbs1.261.1021126575
  9    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/undotbs2.274.1021127203

...


List of Archived Log Copies for database with db_unique_name CDB
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
810     1    457     A 2020-03-02 17:14:19
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_457.346.1034010983

812     1    458     A 2020-03-02 17:16:23
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_458.342.1034010991

813     1    459     A 2020-03-02 17:16:31
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_459.340.1034010995

811     2    359     A 2020-03-02 17:14:19
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_359.344.1034010985

814     2    360     A 2020-03-02 17:16:25
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_360.339.1034010997

recovery will be done up to SCN 80321491
Media recovery start SCN is 80321491
Recovery must be done beyond SCN 80321782 to clear datafile fuzziness
Finished restore at 2020-03-02 17:30:52

At the end of the listing, you will find that media recovery must be done beyond SCN 80321782 for a successful recovery.

However, the archived log section of the preview shows only thread and sequence numbers and times, not SCN ranges, so it doesn’t tell us directly whether the logs reach that SCN. The information is in the catalogue; we just need to look at the archive log backups explicitly:

RMAN> list backup of archivelog all;

List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
3       325.40M    DISK        00:00:06     2020-03-02 17:05:26
        BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s11_p1_t1034010320

  List of Archived Logs in backup set 3
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    451     80282070   2020-03-02 15:00:03 80302686   2020-03-02 16:22:34
  1    452     80302686   2020-03-02 16:22:34 80318226   2020-03-02 17:05:18
  2    355     80302690   2020-03-02 16:22:36 80318223   2020-03-02 17:05:17

...

  List of Archived Logs in backup set 43
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    457     80321365   2020-03-02 17:14:19 80321856   2020-03-02 17:16:23
  1    458     80321856   2020-03-02 17:16:23 80321897   2020-03-02 17:16:31
  1    459     80321897   2020-03-02 17:16:31 80321912   2020-03-02 17:16:34
  2    360     80321860   2020-03-02 17:16:25 80321989   2020-03-02 17:16:37

...

We see SCNs up to 80321989, well past 80321782, so we can be confident that at least the required files have been recorded in the backup catalogue.
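Incidentally, if the full preview listing is more detail than you need, restore preview also has a summary form that condenses the output to roughly one line per backup set:

RMAN> restore database preview summary;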

Note: By default, RMAN displays dates using the session’s NLS_DATE_FORMAT, which is typically the rather terse ‘DD-MON-RR’.

The more readable timestamps shown here come from setting the NLS_DATE_FORMAT environment variable before starting RMAN:

$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'

$ rman target /
...

Restore Validate

Does the knowledge that your backup files are recorded in the backup catalogue give you a warm, fuzzy feeling of contentment?

It shouldn’t. A database cannot be restored with metadata alone. Not knowing that backups have been tested is the kind of thing that should keep a DBA awake at night.

Knowing that you need proper rest, let’s look at another method of testing backups: restore validate.

RMAN can validate the files used for a restore. This is not just a test to see if the files are available, but a read of all blocks in the backup files to ensure they are valid.

The following is an example, using the backup that was made previously:

RMAN> restore validate database;

Starting restore at 2020-03-02 17:52:37
using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s32_p1_t1034010877
channel ORA_DISK_1: piece handle=/mnt/oracle-backups/rac19c01/CDB_T20200302_db_s32_p1_t1034010877 tag=RMAN-TEST
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01

...

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s43_p1_t1034010975
channel ORA_DISK_1: piece handle=/mnt/oracle-backups/rac19c01/CDB_T20200302_db_s43_p1_t1034010975 tag=RMAN-TEST
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:07
Finished restore at 2020-03-02 17:54:11

This is a small test database; the validation required only about two minutes.
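For a larger database, the validation can be parallelized just like a backup by allocating multiple channels. A minimal sketch (the channel count here is arbitrary):

run {
  # additional channels let RMAN read multiple backup pieces concurrently
  allocate channel v1 device type DISK;
  allocate channel v2 device type DISK;
  restore validate database;
}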

It’s okay to get some warm and fuzzies now, but don’t get too cozy. There are other types of files that must be restored for a recovery to take place.

Let’s check on the archive log files. We know that all the required archive logs were created after 17:00 today, as the backup was made at about 17:14 and the alert log shows several log switches between 17:00 and 17:14:

2020-03-02T17:07:24.248403-08:00
Thread 1 advanced to log sequence 455 (LGWR switch)
  Current log# 1 seq# 455 mem# 0: +DATA/CDB/ONLINELOG/group_1.265.1021126651
  Current log# 1 seq# 455 mem# 1: +DATA/CDB/ONLINELOG/group_1.268.1021126653
2020-03-02T17:07:24.320618-08:00
ARC2 (PID:30093): Archived Log entry 805 added for T-1.S-454 ID 0x7f6be175 LAD:1

So now we can validate that archive logs can be read:

RMAN>  restore validate archivelog  from time "to_date('2020-03-02 17:00:00','yyyy-mm-dd hh24:mi:ss')";

Starting restore at 2020-03-02 18:09:58
using channel ORA_DISK_1

channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_452.341.1034010319
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_453.349.1034010441
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_454.352.1034010445
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_455.351.1034010445
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_456.347.1034010859
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_457.346.1034010983
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_458.342.1034010991
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_459.340.1034010995
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_355.353.1034010317
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_356.319.1034010441
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_357.350.1034010447
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_358.348.1034010859
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_359.344.1034010985
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_360.339.1034010997
Finished restore at 2020-03-02 18:10:12

Let’s also see if the controlfile is there:

RMAN>  restore validate controlfile;

Starting restore at 2020-03-02 18:13:09
using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece +DATA/CDB/AUTOBACKUP/2020_03_02/s_1034011019.338.1034011019
channel ORA_DISK_1: piece handle=+DATA/CDB/AUTOBACKUP/2020_03_02/s_1034011019.338.1034011019 tag=TAG20200302T171659
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
Finished restore at 2020-03-02 18:13:11
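If you rely on controlfile autobackups, the spfile is included in the same autobackup piece and can be checked the same way, following the syntax pattern used above:

RMAN> restore validate spfile;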

Finally, a list of the backup by tag (you do use tags, don’t you?) can be used to show that all files are present:

RMAN> list backupset tag "RMAN-TEST";


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
3       325.40M    DISK        00:00:06     2020-03-02 17:05:26
        BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s11_p1_t1034010320

...

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
44      4.32M      DISK        00:00:00     2020-03-02 17:16:54
        BP Key: 44   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s53_p1_t1034011014

  List of Archived Logs in backup set 44
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    453     80318226   2020-03-02 17:05:18 80318656   2020-03-02 17:07:20
  1    454     80318656   2020-03-02 17:07:20 80318683   2020-03-02 17:07:24
  2    356     80318223   2020-03-02 17:05:17 80318660   2020-03-02 17:07:21
  2    357     80318660   2020-03-02 17:07:21 80318695   2020-03-02 17:07:27

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
45      Full    19.08M     DISK        00:00:01     2020-03-02 17:16:58
        BP Key: 45   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_cf54_p1_t1034011017
  Control File Included: Ckp SCN: 80322102     Ckp time: 2020-03-02 17:16:57

Now you can rest easy, as the datafiles, archive logs and controlfile are all present in the backup made earlier.

The only way to be more sure of the backup is to use it to restore and recover the database.
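When you do have somewhere to run that ultimate test, a point-in-time test restore into a scratch location might look like the following sketch. The SCN and path are purely illustrative, and the instance would first need to be started with a restored controlfile and mounted:

run {
  # stop recovery at an illustrative SCN past the fuzziness threshold
  set until scn 80321990;
  # restore datafiles into a scratch area rather than their original locations
  set newname for database to '/u01/resttest/%b';
  restore database;
  # update the controlfile to point at the newly restored copies
  switch datafile all;
  recover database;
}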

Docs here: RESTORE

Recover Validate Header

While recover validate may sound like restore validate on steroids, it really isn’t.

According to the documentation, this command is nearly the same as restore preview. The difference is that when recover validate is used, the file headers are read to ensure they match what is found in the backup catalogue.

This would provide a little more assurance than the restore preview, but not as much as the restore validate.

The problem here is that in this 19.3 database, recover database validate header is examining only the archive log backups:

RMAN> recover database validate header;

Starting recover at 2020-03-02 18:43:11
using channel ORA_DISK_1
using channel ORA_DISK_1


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
40      173.87M    DISK        00:00:04     2020-03-02 17:16:49
        BP Key: 40   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s49_p1_t1034011005

  List of Archived Logs in backup set 40
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    449     80240390   2020-03-02 12:18:51 80261546   2020-03-02 13:39:51

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
38      199.81M    DISK        00:00:05     2020-03-02 17:16:43
        BP Key: 38   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s47_p1_t1034010998

  List of Archived Logs in backup set 38
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    354     80240394   2020-03-02 12:18:51 80302690   2020-03-02 16:22:36
  1    450     80261546   2020-03-02 13:39:51 80282070   2020-03-02 15:00:03

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
39      174.02M    DISK        00:00:05     2020-03-02 17:16:43
        BP Key: 39   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s48_p1_t1034010998

  List of Archived Logs in backup set 39
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    451     80282070   2020-03-02 15:00:03 80302686   2020-03-02 16:22:34

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
41      151.39M    DISK        00:00:05     2020-03-02 17:16:50
        BP Key: 41   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s50_p1_t1034011005

  List of Archived Logs in backup set 41
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    452     80302686   2020-03-02 16:22:34 80318226   2020-03-02 17:05:18
  2    355     80302690   2020-03-02 16:22:36 80318223   2020-03-02 17:05:17

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
44      4.32M      DISK        00:00:00     2020-03-02 17:16:54
        BP Key: 44   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s53_p1_t1034011014

  List of Archived Logs in backup set 44
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    356     80318223   2020-03-02 17:05:17 80318660   2020-03-02 17:07:21
  1    453     80318226   2020-03-02 17:05:18 80318656   2020-03-02 17:07:20
  1    454     80318656   2020-03-02 17:07:20 80318683   2020-03-02 17:07:24
  2    357     80318660   2020-03-02 17:07:21 80318695   2020-03-02 17:07:27

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
42      15.46M     DISK        00:00:00     2020-03-02 17:16:53
        BP Key: 42   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s51_p1_t1034011013

  List of Archived Logs in backup set 42
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    455     80318683   2020-03-02 17:07:24 80318691   2020-03-02 17:07:24
  1    456     80318691   2020-03-02 17:07:24 80321365   2020-03-02 17:14:19
  2    358     80318695   2020-03-02 17:07:27 80321362   2020-03-02 17:14:19
  2    359     80321362   2020-03-02 17:14:19 80321860   2020-03-02 17:16:25

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
43      5.46M      DISK        00:00:00     2020-03-02 17:16:53
        BP Key: 43   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s52_p1_t1034011013

  List of Archived Logs in backup set 43
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    457     80321365   2020-03-02 17:14:19 80321856   2020-03-02 17:16:23
  1    458     80321856   2020-03-02 17:16:23 80321897   2020-03-02 17:16:31
  2    360     80321860   2020-03-02 17:16:25 80321989   2020-03-02 17:16:37
  1    459     80321897   2020-03-02 17:16:31 80321912   2020-03-02 17:16:34
validation succeeded for backup piece
recovery will be done up to SCN 2163919
Media recovery start SCN is 2163919
Recovery must be done beyond SCN 80321989 to clear datafile fuzziness
Finished recover at 2020-03-02 18:43:13

At this time, I do not know whether this is due to a bug in the code or a bug in the documentation.

Either way, restore validate gives me the kind of assurance I need that my backups can be restored.

Learn more about Pythian’s Oracle Services.
