Data Guard for Dummies

Written by Charles Kim, Oracle ACE and an Oracle Certified DBA

Data Guard provides a simple solution to protect your data.  Data Guard maintains standby databases as transactionally consistent copies of the production database. These standby databases can be located at remote disaster recovery sites thousands of miles away from the production data center, or they may be located in the same city, same campus, or even in the same building. If the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, thus minimizing the downtime associated with the outage, and preventing any data loss.

You can have up to nine standby databases, logical or physical.  While Data Guard is traditionally considered a disaster recovery tool, it can also alleviate traffic and improve performance by offloading queries and reports from your production server.  In this article, we will look at the architecture behind Data Guard, discuss factors you might consider in your implementation, and then lead you through a Data Guard setup.

There is a case study on Oracle's website on using RAC, ASM, and Data Guard to provide the Maximum Availability Architecture (MAA).  It is a how-to case study from working with Fidelity National Financial.  Also, you can consult www.dataguardbook.com, which is written by the authors of this article and has lots of scripts and downloads related to Data Guard.

Data Guard Protection Modes

Maximum protection. This protection mode ensures that no data loss will occur if the primary database fails. To provide this level of protection, the redo data needed to recover each transaction must be written to both the local online redo log and to the standby redo log on at least one standby database before the transaction commits. To guarantee no data loss, the primary database shuts down rather than continue processing transactions if it cannot write its redo stream to at least one synchronized standby database.

Maximum availability. This protection mode provides the highest level of data protection that is possible without compromising the availability of the primary database. Like maximum protection mode, a transaction will not commit until the redo needed to recover that transaction is written to the local online redo log and to the standby redo log of at least one transactionally consistent standby database. Unlike maximum protection mode, the primary database does not shut down if a fault prevents it from writing its redo stream to a remote standby redo log. Instead, the primary database operates in maximum performance mode until the fault is corrected and all gaps in redo log files are resolved. When all gaps are resolved, the primary database automatically resumes operating in maximum availability mode.

Maximum performance. This protection mode (the default) provides the highest level of data protection that is possible without affecting the performance of the primary database. This is accomplished by allowing a transaction to commit as soon as the redo data needed to recover that transaction is written to the local online redo log.

MYTHBUSTER: LGWR Transmits Redo to Standby Databases, and Redo Only Gets Sent at Log Switch Time

Redo Transport Services

At the same time that LGWR is writing redo, a separate Data Guard process called Log Network Server (LNS) is transmitting redo from the redo buffer in SGA to the standby database.

On the standby database, redo records are received by another Data Guard process called Remote File Server (RFS).  RFS receives the redo and writes it sequentially to a standby redo log file (SRL).  Data Guard supports two redo methods using LNS: synchronous (LGWR SYNC) or asynchronous (LGWR ASYNC).  With 11g, the LGWR prefix is dropped and the methods are simply SYNC and ASYNC.
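
To see these processes at work on the standby, query the V$MANAGED_STANDBY view (a quick sanity check; you should see rows for RFS and, once apply is started, MRP0):

SQL> select process, status, thread#, sequence#, block# from v$managed_standby;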

SYNC guarantees protection for each transaction by requiring an acknowledgement to be returned from the standby database before the commit completes.  When RFS sees that the write is completed on the standby, it sends an acknowledgement back to LGWR through LNS on the primary database.
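
As a minimal sketch, a SYNC destination on the primary might look like the following (the service and unique names are taken from the parameter listings later in this article; the AFFIRM attribute additionally requires the standby disk write to complete before the acknowledgement is sent):

LOG_ARCHIVE_DEST_2='SERVICE=EDOCPOC_LTC LGWR SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE) DB_UNIQUE_NAME=EDOCTLR'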

MYTHBUSTER: Any zero data loss Data Guard configuration will result in production downtime if the standby database is not reachable.

ASYNC doesn’t require the primary to wait for an acknowledgement from the standby databases.  If LNS falls too far behind, it automatically switches to reading from the online redo logs.  Once it has caught up, it switches back to reading from the redo buffer.  Because LGWR doesn’t have to wait for acknowledgements, ASYNC has limited impact on performance compared with the SYNC method.

MYTHBUSTER: You Must Recreate the Original Primary Database After Failover

Automatic Gap Resolution

If connectivity is lost between the primary and one or more standby databases, redo data being generated on the primary database cannot be sent to those standby databases. Once a connection is reestablished, the missing archived redo log files (referred to as a gap) are automatically detected by Data Guard, which then automatically transmits the missing archived redo log files to the standby databases. The standby databases are synchronized with the primary database, without manual intervention by the DBA.

The Fetch Archive Log (FAL) process provides a client-server mechanism for transmitting archived logs to the standby following a communication loss between the primary and standby(s), for automatic gap resolution and resynchronization.  FAL_SERVER specifies the net service name of the database from which the standby fetches archived logs (typically the primary), while FAL_CLIENT specifies the net service name of the requesting standby, which the FAL server uses to ship the logs back.

Compression

New in 11g, redo transport compression resolves gaps up to three times faster, improving data protection.  Compression has a negligible impact on response time or throughput, provided there is sufficient CPU.  It requires the Oracle Advanced Compression option for 11g.  To enable compression, set the hidden parameter _REDO_TRANSPORT_COMPRESS_ALL or set the COMPRESSION attribute of the redo transport destination.
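
A sketch of the destination-attribute approach, reusing the destination names from the primary parameter listing later in this article:

SQL> alter system set log_archive_dest_2 = 'SERVICE=EDOCPOC_LTC LGWR ASYNC COMPRESSION=ENABLE VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE) DB_UNIQUE_NAME=EDOCTLR' scope=both;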

Network compression might be particularly useful with ASYNC where redo volume exceeds network bandwidth.  Consider a test case with a 100Mbps network (12.5MB per second) where the redo rate is 22MB per second.  Without compression, transport lag increased linearly over time.  With compression enabled, transport lag ranged from 4-10 seconds with a compression ratio of 60%.

Log Apply Services

Data Guard provides two methods to apply this redo data to the standby database and keep it transactionally consistent:

  • Redo Apply, used for physical standby databases
  • SQL Apply, used for logical standby databases

Redo Apply

A physical standby database is a block-for-block copy of the primary database that uses Redo Apply to apply changes.  The Managed Recovery Process (MRP) reads redo from the standby redo logs (or archived redo logs) and applies it to the physical standby.  With 11g Active Data Guard, the standby can be open in read-only mode for reporting and queries while apply remains active.  It can also be used for fast backups, further offloading the production database.

To open the physical standby read-only in 10g, recovery needs to be canceled on the standby using the following command:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

and then the database can be opened read-only:

ALTER DATABASE OPEN;

To change the standby database back to being a physical standby database performing Redo Apply, terminate active user sessions and restart MRP with the ALTER DATABASE RECOVER MANAGED STANDBY DATABASE command.
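
For example, after terminating the user sessions, redo apply can be resumed in a background session:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;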

MYTHBUSTER: A physical standby database can’t receive primary redo while open read-write.

SQL Apply

A logical standby database is an open, independent, active database that contains the same logical information (rows) as the production database.  However, the physical organization and structure can be very different.  It uses the Logical Standby Process (LSP) instead of the MRP.  With SQL Apply, the logical standby can host additional schemas, which can be queried for reports while redo data is being applied via SQL.  You can also create additional indexes and materialized views for better query performance.

By default on 10g, the logical standby waits for each full archived redo log file before applying it.  To start real-time apply on the logical standby database, issue the following statement:

SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

The following example shows how to stop SQL Apply, bypass the database guard,
execute SQL statements on the logical standby database, and then reenable the guard:

SQL> ALTER DATABASE STOP LOGICAL STANDBY APPLY;
Database altered.
SQL> ALTER SESSION DISABLE GUARD;
PL/SQL procedure successfully completed.
SQL> ALTER TABLE SCOTT.EMP ADD CONSTRAINT EMPID UNIQUE (EMPNO);
Table altered.
SQL> ALTER SESSION ENABLE GUARD;
PL/SQL procedure successfully completed.
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY;
Database altered.

MYTHBUSTER: The Data Guard Broker is your single point of failure.

The Data Guard broker automates the creation, maintenance, and monitoring of Data Guard configurations.  You can use it to simplify switchovers and failovers, invoking them with a single command or click, or enable fast-start failover to fail over automatically.

If the Data Guard Broker is enabled, Data Guard also uses the Data Guard Broker Monitor (DMON) process to manage and monitor the primary and standby databases as a unified configuration.

The Data Guard command-line interface (DGMGRL) enables you to control and monitor a Data Guard configuration from the DGMGRL prompt or within scripts. You can perform most of the activities required to manage and monitor the databases in the configuration using DGMGRL.
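
A minimal DGMGRL session sketch, assuming a broker configuration containing the EDOCPOC and EDOCTLR databases used in this article's examples:

$ dgmgrl sys/password@EDOCPOC
DGMGRL> show configuration;
DGMGRL> show database verbose 'EDOCTLR';
DGMGRL> switchover to 'EDOCTLR';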

In addition to these, DG Menu is a comprehensive script to diagnose, monitor, and maintain both physical and logical Data Guard configurations.  It is available through the Data Guard book download section.  DG Menu is continually improved and modified as we implement the menu system from one client to the next.

Switchover and Failover

A switchover is a planned role reversal, while a failover accommodates an unplanned failure of the primary database.  You can execute these role transitions via SQL or a simple GUI interface, or automate failover using Data Guard fast-start failover.  Fast-start failover provides the ability to automatically fail over to a designated, synchronized standby database in the event of loss of the primary database.  After a fast-start failover occurs, the old primary database is automatically reconfigured as a new standby database upon reconnection.

In 11g, there is a new broker property, FastStartFailoverLagLimit.  It establishes an acceptable limit, in seconds, that the standby is allowed to fall behind the primary in terms of redo applied, beyond which a fast-start failover will not be allowed.  The lowest possible value is 10 seconds, and it defaults to 30 seconds.
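
For example, to lower the limit to its 10-second minimum from the DGMGRL prompt:

DGMGRL> edit configuration set property FastStartFailoverLagLimit = 10;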

New in 11g, DBAs can configure an immediate automatic failover to be triggered by any of the following conditions.

  • Datafile Offline
  • Corrupted Controlfile
  • Corrupted Dictionary
  • Inaccessible Logfile
  • Stuck Archiver
  • Any explicit ORA-xyz error

ENABLE FAST_START FAILOVER [CONDITION <value>]
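
For example, to trigger fast-start failover on a corrupted controlfile (the condition names are quoted exactly as listed above):

DGMGRL> enable fast_start failover condition "Corrupted Controlfile";
DGMGRL> show fast_start failover;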

Also in 11g, applications can request fast-start failover using the following API call.
DBMS_DG.INITIATE_FS_FAILOVER
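
A minimal PL/SQL sketch of such a request; the condition string is a hypothetical application message, and the function returns a status code:

DECLARE
  status BINARY_INTEGER;
BEGIN
  status := DBMS_DG.INITIATE_FS_FAILOVER('Application detected fatal error');
  DBMS_OUTPUT.PUT_LINE('Status = ' || status);
END;
/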

Data Guard Setup Requirements

There are several key Data Guard setup considerations. First, DBAs should be cognizant of time zone differences between the primary and DR database servers. As a recommendation, DBAs should push to have both environments in the same time zone, since applications that are sensitive to time will be impacted by this decision.  In addition, since ASM enforces OMF naming standards, the databases in a Data Guard configuration should be consistent in their configuration.

The DB_UNIQUE_NAME initialization parameter plays a key role in Data Guard: each database in the configuration must have a unique DB_UNIQUE_NAME. In our example, the standard name for DB_UNIQUE_NAME on the DR site was ${ORACLE_SID}DR. Once DB_UNIQUE_NAME is set, the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT initialization parameters are set using the ${ORACLE_SID}DR naming convention.  Finally, the DB_UNIQUE_NAME is also used to register the database in Oracle’s cluster registry with srvctl.

Password File

A password file is a requirement for Data Guard. The password file should be created on both the primary and the physical standby database (if it doesn’t already exist). Command syntax:

> orapwd file=[password file name] password=[sys password] entries=25

As a general rule, when the orapwd file is updated on the primary database, it is also pushed to the DR site using sftp or rsync.
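
For example (a sketch; the password is hypothetical, iltcdb001 is the standby host used in the gap resolution example later in this article, and identical ORACLE_HOME paths are assumed on both servers):

> orapwd file=$ORACLE_HOME/dbs/orapwEDOCPOC password=MySysPassw0rd entries=25
> rsync -av $ORACLE_HOME/dbs/orapwEDOCPOC oracle@iltcdb001:$ORACLE_HOME/dbs/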

Enable Flashback Recovery

Below is a simple script to enable Flashback Database. The recovery area parameters can be set while the database is open, but the database must be in the mounted state to turn Flashback Database on.

ALTER SYSTEM SET db_recovery_file_dest_size = '300G' SCOPE=BOTH SID='*';

Then set both DB_RECOVERY_FILE_DEST and DB_FLASHBACK_RETENTION_TARGET:

ALTER SYSTEM SET db_recovery_file_dest = '+DG_EDOC_PF501' SCOPE=BOTH SID='*';
ALTER SYSTEM SET db_flashback_retention_target = 2880;

SQL> show parameter recovery_file

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      +DG_EDOC_PF501
db_recovery_file_dest_size           big integer 300G
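
With the recovery area parameters in place, Flashback Database itself is turned on from the mounted state; a minimal sketch for the primary (on the standby, stop redo apply first and leave the database mounted):

SQL> shutdown immediate
SQL> startup mount
SQL> alter database flashback on;
SQL> alter database open;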

Currently, the only things stored in the FRA are the Flashback logs.  Flashback should be enabled for both the primary and DR databases. Here, the DBAs chose to implement the Flashback Database option as an alternative to setting a time delay on the application of archived redo log files. In the event of data corruption, the DR site can use Flashback Database to rewind the standby to a point in time before the corrupted or erroneous data was applied.

Primary Database Initialization Parameters

Below are the key Oracle initialization parameters on the primary database server to enable Data Guard:

DB_NAME   EDOCPOC
DB_UNIQUE_NAME   EDOCPOC
SERVICE_NAMES   EDOCPOC
LOG_ARCHIVE_CONFIG   'DG_CONFIG=(EDOCPOC, EDOCTLR)'
LOG_ARCHIVE_DEST_STATE_1   enable
LOG_ARCHIVE_DEST_1   'LOCATION=+DBA_DD501
VALID_FOR=(ALL_LOGFILES, ALL_ROLES)
DB_UNIQUE_NAME=EDOCPOC'
LOG_ARCHIVE_DEST_STATE_2   enable
LOG_ARCHIVE_DEST_2   'SERVICE=EDOCPOC_LTC LGWR ASYNC
DELAY=minutes (default 30 minutes)
VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE)
DB_UNIQUE_NAME=EDOCTLR'
FAL_CLIENT   EDOCPOC_CTC
FAL_SERVER   EDOCPOC_LTC
DB_CREATE_FILE_DEST   +DBA_DD501
DB_FILE_NAME_CONVERT   '+DG_EDOC_PD501','+DBA_DD501'
LOG_FILE_NAME_CONVERT   '+DG_EDOC_PD501','+DBA_DD501'
STANDBY_FILE_MANAGEMENT   AUTO

The LGWR ASYNC attributes on LOG_ARCHIVE_DEST_2 specify that Data Guard will use asynchronous redo transport services to stream redo data to the standby location.

Physical Standby Configuration Parameters

Below are the key Oracle initialization parameters on the failover database server that impact Data Guard:

DB_NAME   EDOCPOC
DB_UNIQUE_NAME   EDOCTLR
SERVICE_NAMES   EDOCTLR
LOG_ARCHIVE_CONFIG   'DG_CONFIG=(EDOCTLR, EDOCPOC)'
LOG_ARCHIVE_DEST_STATE_1   enable
LOG_ARCHIVE_DEST_1   'LOCATION=+DG_EDOC_PD501
VALID_FOR=(ALL_LOGFILES, ALL_ROLES)
DB_UNIQUE_NAME=EDOCTLR'
LOG_ARCHIVE_DEST_STATE_2   enable
LOG_ARCHIVE_DEST_2   'SERVICE=EDOCPOC_CTC LGWR ASYNC
DELAY=minutes (default 30 minutes)
VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE)
DB_UNIQUE_NAME=EDOCPOC'
FAL_CLIENT   EDOCPOC_LTC
# net service name of this standby, which the FAL server uses to ship logs back
FAL_SERVER   EDOCPOC_CTC
# net service name from which to fetch archived logs (the primary)
DB_CREATE_FILE_DEST   +DG_EDOC_PD501
DB_FILE_NAME_CONVERT   '+DBA_DD501','+DG_EDOC_PD501'
LOG_FILE_NAME_CONVERT   '+DBA_DD501','+DG_EDOC_PD501'
STANDBY_FILE_MANAGEMENT   AUTO

Create Standby Redo Logs

According to Oracle, Data Guard can recover and apply more redo data from a standby redo log than from archived redo log files alone. With this in mind, and for increased availability, we decided to implement standby redo logs. In addition, standby redo logs were multiplexed in the same way online redo logs are multiplexed.
We followed several guidelines when implementing standby redo logs:

  • When the DBA adds a redo log group on the primary database, the DBA must add an equivalent standby redo log group. Using consecutive group numbers simplifies maintenance of standby redo log file groups; do not skip log file group numbers, as skipped numbers consume additional space in the standby database control file.
  • It is recommended to assign each standby redo log group to a thread; the thread is optional for the standby database. If a thread is not assigned at creation time, Oracle will implicitly assign the group to a thread on the standby.
  • Ensure that log file sizes are identical on the primary and standby databases.
  • Determine the appropriate number of standby redo log file groups. The following formula is recommended: ( maximum number of logfiles for each thread + 1 ) * maximum number of threads. For example, with three online log groups per thread and two threads, (3 + 1) * 2 = 8 standby redo log groups.
  • Create equivalent standby redo log file groups on the primary database that mirror the Data Guard Standby database. This will reduce the time required to switchover to the standby role and reduce the amount of time needed for manual DBA tasks.

SQL> alter database add standby logfile group 4 ('+DG_EDOC_PD501','+DG_EDOC_PD501') size 150M;
Database altered.
SQL> alter database add standby logfile group 24 ('+DG_EDOC_PD501','+DG_EDOC_PD501') size 150M;
Database altered.
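
To verify the standby redo log groups just created, query V$STANDBY_LOG:

SQL> select group#, thread#, sequence#, bytes/1024/1024 mb, status from v$standby_log;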

Furthermore, DBAs need to verify that current database settings such as MAXLOGFILES and MAXLOGMEMBERS are adequate for creating the additional standby log groups and log files. One way to do this is by looking at the trace file created by the ALTER DATABASE BACKUP CONTROLFILE TO TRACE command.
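
A quick sketch of generating and locating that trace file (in 10g, trace files land in the directory pointed to by USER_DUMP_DEST):

SQL> alter database backup controlfile to trace;
SQL> show parameter user_dump_dest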

Archive Logging Requirement

With any Data Guard implementation, forced database logging is a requirement.

SQL> alter database force logging;

If the database is already in force logging mode, you will see the ORA-12920 message (database is already in force logging mode) displayed.

Summarized materialized view data did not need to be involved in the redo transport process, since all the relevant data existed in the underlying base tables. The key materialized views would have to be rebuilt at the DR site as a manual step after failover.

Although both the primary and DR Data Guard environments are multi-node RAC databases, only one node is active to receive and apply redo/archive logs. Instead of having all instances in the RAC participate in the Data Guard redo apply process, the first instance was targeted for this task. The second instance was left shut down until a failover or switchover occurred.

Start / Stop Redo Apply Process

To start the redo apply process, enter the following command:

> alter database recover managed standby database disconnect from session;

The “DISCONNECT FROM SESSION” option causes the redo apply to run in a background session. The command to stop redo apply is:

> alter database recover managed standby database cancel;

To start real-time apply, include the USING CURRENT LOGFILE clause on the SQL statement:

> alter database recover managed standby database using current logfile disconnect from session;

If the standby database is already set for delayed application of archive logs, this can be disabled by:

> alter database recover managed standby database nodelay;

The above two options enable real-time apply, with no delay, for the physical standby implementation.

Transport: LGWR, ASYNC
Redo Apply: real-time apply overrides the delay option
Delay: no delay

Protection Mode Upgrade / Downgrade

Situations will arise when protection modes may need to be upgraded or downgraded. Below are the requirements for each of the protection modes offered by Data Guard:

Max Protection: LGWR, SYNC, AFFIRM, requires standby redo logs
Max Availability: LGWR, SYNC, AFFIRM, requires standby redo logs
Max Performance: LGWR, ASYNC/SYNC, AFFIRM/NOAFFIRM, does not require standby redo logs

An ARCH, SYNC, AFFIRM/NOAFFIRM configuration does not require standby redo logs on the Data Guard site, though the recommendation is to use them.

On the primary system (shut down and mount the database before changing the protection mode):

  1. alter database set standby database to maximize [protection|availability|performance];
  2. alter system set log_archive_dest_2 = 'SERVICE=EDOCPOC_LTC OPTIONAL
    lgwr sync affirm DELAY=minutes (default is 30 minutes)
    VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE) DB_UNIQUE_NAME=EDOCTLR';

Repeat step 1 on the standby system.
Repeat step 2 on the standby after a switchover.

Gap Resolution

Typical archive log gap resolution occurs automatically. In rare situations, automatic gap recovery might not take place (such as when the required archive log is no longer on disk at the primary database), and DBAs may need to perform gap recovery manually.

To manually resolve gaps:

a). Determine gap on standby db

select thread#, low_sequence#, high_sequence# from v$archive_gap;

b). Based on the output above, locate archived logs on primary:

select name from v$archived_log
where thread# = ##
and dest_id = 1          -- the local destination (dest_1)
and sequence# between <low_sequence#> and <high_sequence#>;

c). Copy these files over to the standby server (RMAN restore if ASM):

RMAN> copy archivelog '+DBA_DD501/EDOCPOC/ARCHIVELOG/2006_07_07/thread_2_seq_60.1227.595183'
to '/tmp/dba/t2s60.dbf';

scp /tmp/dba/t2s60.dbf oracle@iltcdb001:/tmp/dba/EDOCPOC_t2s60.arc

d). Register these files on standby database:

SQL> alter database register physical logfile '/tmp/dba/EDOCPOC_t2s60.arc';
Database altered.

e). Restart redo apply:

alter database recover managed standby database disconnect from session;

f). Repeat from a). until there are no more gaps.

The v$archive_gap fixed view only returns the next gap that is currently blocking Redo Apply from continuing. After resolving the gap and restarting the redo apply process, query v$archive_gap again on the physical standby to determine the next gap, if any.

FAL_SERVER / CLIENT

The FAL mechanism handles the following types of archive gaps and problems:

  • When creating a physical or logical standby database, the FAL mechanism can automatically retrieve any archived redo log files generated during a hot backup of the primary database.
  • When there are problems with archived redo log files that have already been received on the standby database, the FAL mechanism can automatically retrieve archived redo log files to resolve any of the following situations:
1. When the archived redo log file is deleted from disk before it is applied to the standby database.
2. When the archived redo log file cannot be applied because of a corruption.
3. When the archived redo log file is accidentally replaced by another file (for example, a text file) that is not an archived redo log file before the redo data has been applied to the standby database.
  • When you have multiple physical standby databases, the FAL mechanism can automatically retrieve missing archived redo log files from another physical standby database.

The FAL client and server are configured using the FAL_CLIENT and FAL_SERVER initialization parameters that are set on the standby database.
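
For example, on the physical standby in this article's configuration (values from the parameter listing above):

SQL> alter system set fal_server = 'EDOCPOC_CTC' scope=both sid='*';
SQL> alter system set fal_client = 'EDOCPOC_LTC' scope=both sid='*';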

Operations that Affect Standby Database

Changes made to initialization parameters on the primary site must be manually applied to the standby database(s).

If the DBA renames a datafile in the primary database, the DBA must manually rename the same datafile on the standby, even though the STANDBY_FILE_MANAGEMENT parameter is set to AUTO.  Below are the steps required to rename an OMF datafile on the physical standby:

1. Check for gaps in logs

select thread#, low_sequence#, high_sequence# from v$archive_gap;
select max(r.sequence#) last_seq_recd, max(l.sequence#) last_seq_sent
from v$archived_log r, v$log l
where r.dest_id = 2 and l.archived = 'YES';

2. Stop RedoApply – Stop DG

alter database recover managed standby database cancel;

3. Shutdown Immediate

4. RMAN copy datafile 'old_file' to '+DG'

5. Startup mount

6. Alter database rename file 'old_file' to 'new_file';

7. Start DG

alter database recover managed standby database using current logfile disconnect;

When adding, dropping, or resizing online redo logs on the primary database, the corresponding action for the online and standby redo logs needs to be performed on the physical standby database.

Stop RedoApply – Stop DG

alter database recover managed standby database cancel;
alter system set standby_file_management = 'MANUAL';
alter database add logfile thread # group # size 100m;
alter database drop logfile group #;
alter system set standby_file_management = 'AUTO';
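
Once STANDBY_FILE_MANAGEMENT is set back to AUTO, restart redo apply with the same command used earlier:

alter database recover managed standby database using current logfile disconnect;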

Charles Kim is an Oracle ACE and an Oracle Certified DBA. Charles is a freelance consultant and works predominantly in the Maximum Availability Architecture (MAA) space (RAC, ASM, Data Guard, and other HA solutions).

Charles released his first book, titled Oracle Database 11g New Features for DBA and Developers, in November 2007. Charles also co-authored Linux Recipes for Oracle DBAs with Apress, published in November 2008. Charles is also the author of the MAA case study on Oracle’s website.

He holds certifications in Oracle, Red Hat Linux, and Microsoft; has more than 18 years of IT experience; and has worked with Oracle since 1991. Charles blogs regularly on his DBAExpert.com website and provides technical solutions to Oracle DBAs and developers.