Channel: Ramblings of a SQL DBA mind… by Nitin Garg

Transaction log file full because of log_reuse_wait_desc = REPLICATION (replication, CDC, GoldenGate)


I came across a weird situation: I received an alert that a SQL Server transaction log file was full, and sys.databases showed log_reuse_wait_desc = REPLICATION.

 

Before describing the troubleshooting steps, note that the setup of the database in question is a bit complex:

  • The database is part of SQL Server replication, where the publisher is on the vendor side and we don’t have access to the publisher database :(
  • Oracle GoldenGate replication is configured from this database to an Oracle database
  • The database is in the SIMPLE recovery model
  • Change Data Capture (CDC) was found to be enabled

 

To start the resolution, the following steps were taken to troubleshoot and fix the problem:

 

  • First, I ran a query to check log_reuse_wait_desc and is_cdc_enabled:

    SELECT name, log_reuse_wait_desc, is_cdc_enabled, *
    FROM sys.databases;

    The output showed log_reuse_wait_desc = “REPLICATION” and is_cdc_enabled = 1, which means CDC is enabled for this database.
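To narrow this check to the one database in question and see the replication-related flags side by side, a query like the following can help. It is a sketch: N'dbname' is a placeholder for your database name, and the flag columns are standard sys.databases columns.

```sql
-- Log reuse wait plus replication/CDC flags for a single database.
-- N'dbname' is a placeholder; replace it with the database in question.
SELECT name,
       recovery_model_desc,
       log_reuse_wait_desc,
       is_published,        -- 1 = transactional/snapshot publication database
       is_merge_published,  -- 1 = merge publication database
       is_distributor,      -- 1 = distribution database
       is_cdc_enabled       -- 1 = Change Data Capture enabled
FROM sys.databases
WHERE name = N'dbname';
```

Seeing is_cdc_enabled = 1 next to a REPLICATION wait on a database that is not a publisher is already a hint that CDC, not transactional replication, may be holding the log.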

    

 

  • Then, because I don’t have access to the publisher, I ran the following query to see which subscriptions are present on the database:

    Note: run this query in the context of the subscription database, not master.

    SELECT publisher, publisher_db, publication, [time], distribution_agent, transaction_timestamp
    FROM dbo.MSreplication_subscriptions;

 

The output showed two subscriptions for this database. Based on the time column (the last update by the Distribution Agent), one was running fine, while the other was 3 years old, which seemed strange.
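To make a stale subscription stand out instead of eyeballing timestamps, the same table can be filtered by age. This is a sketch: the 7-day threshold is an arbitrary example, not a rule, and should match how often your subscriptions normally sync.

```sql
-- Run in the subscription database (not master).
-- Lists subscriptions the Distribution Agent has not updated recently.
DECLARE @stale_days int = 7;  -- hypothetical threshold; adjust to your sync schedule

SELECT publisher,
       publisher_db,
       publication,
       distribution_agent,
       [time]                           AS last_updated,
       DATEDIFF(DAY, [time], GETDATE()) AS days_since_update
FROM dbo.MSreplication_subscriptions
WHERE DATEDIFF(DAY, [time], GETDATE()) > @stale_days
ORDER BY [time];
```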

 

  • I worked with the vendor, who confirmed that the active subscription is the one they have on the publisher side, while the other one does not exist at their end. After I gave them the distribution_agent name, a check in SQL Server Agent confirmed that the job no longer exists, so the subscription is not valid.
  • So I went ahead and deleted the inactive subscription on the subscriber (to which I do have access) from the Replication > Local Subscriptions folder in the SSMS GUI.

     

    Important to note: we could not execute sp_repldone or sp_removedbreplication, because either may disrupt the active subscription as well. Reconfiguring the publication was also not possible, because I don’t have access to the publisher and involving the vendor is a bit difficult. So I went ahead with deleting just the inactive subscription, which went absolutely fine.

     

    Note: optional commands to remove all replication settings from a database. They were not useful in my case but may help someone reading this article:

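    -- Mark all pending replicated transactions in the log as distributed
    EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1;

    -- Remove all replication objects from the database
    EXEC sp_removedbreplication 'dbname';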
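If you prefer T-SQL over the SSMS GUI for removing only the orphaned subscription, the subscriber-side procedures below are the usual candidates. Unlike sp_removedbreplication, they target a single publication, so the active subscription is left alone. Which one applies depends on whether the orphaned entry was a pull or a push subscription; that detail is an assumption here, since it was not determined above. The publisher, database, and publication names are hypothetical placeholders; take the real values from the MSreplication_subscriptions query.

```sql
-- Run at the subscriber, in the subscription database.
-- Use ONE of the two calls, depending on the subscription type.
-- All names are hypothetical placeholders.

-- Pull subscription: drops the subscription at the subscriber.
EXEC sp_droppullsubscription
     @publisher    = N'VendorPublisher',
     @publisher_db = N'VendorDB',
     @publication  = N'OldPublication';

-- Push subscription the publisher no longer knows about:
-- removes the leftover subscription metadata at the subscriber.
EXEC sp_subscription_cleanup
     @publisher    = N'VendorPublisher',
     @publisher_db = N'VendorDB',
     @publication  = N'OldPublication';
```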

     

  • After deleting the inactive subscription, the transaction log space should ideally have been released, but NO: log_reuse_wait_desc was still “REPLICATION”.
  • So I ran DBCC OPENTRAN, which gave the following output:

    Replicated Transaction Information:

    Oldest distributed LSN : (0:0:0)

    Oldest non-distributed LSN : (10217:892:1)

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
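The replicated-transaction lines are the part to watch here: a non-zero “Oldest non-distributed LSN” means log records marked for replication (or CDC) have not been processed yet, which is what keeps log_reuse_wait_desc at REPLICATION. To capture these values in a table, for example to compare them across runs, DBCC OPENTRAN supports WITH TABLERESULTS. A minimal sketch: the temp table and its column names are my own (INSERT ... EXEC maps columns by position), and the REPL_... row names are what I would expect, so check the unfiltered output on your version first.

```sql
-- Run in the database being investigated.
IF OBJECT_ID('tempdb..#opentran') IS NOT NULL
    DROP TABLE #opentran;

CREATE TABLE #opentran
(
    property_name  varchar(25),  -- e.g. REPL_DIST_OLD_LSN, REPL_NONDIST_OLD_LSN
    property_value sql_variant
);

INSERT INTO #opentran (property_name, property_value)
EXEC ('DBCC OPENTRAN WITH TABLERESULTS, NO_INFOMSGS');

-- Replication-related rows only; re-run later and compare the values.
SELECT property_name, property_value
FROM #opentran
WHERE property_name LIKE 'REPL%';
```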

     

  • You might have guessed by now that CDC (Change Data Capture) could be behind this. To clarify, though, the CDC cleanup job was running absolutely fine, and re-running it to clean up anything pending did NOT help.
  • After some more research, I found that CDC should have two jobs, one for capture and one for cleanup, but on this SQL Server only the cleanup job was present (job name: cdc.dbname_cleanup). CDC reads changes from the transaction log using the same log reader mechanism as transactional replication, so without a capture job the log records stay unprocessed and SQL Server reports the wait as REPLICATION.
  • Disabling and re-enabling CDC was out of scope, because GoldenGate might be using CDC; touching it could impact GoldenGate and might require setting it up again, which is quite a task in itself.
  • So after some more research I decided to CREATE the CDC capture job and see whether it helped, because this seemed less intrusive than the other available options:

     

    -- Command to CREATE the capture job (run in the respective database context)
    EXEC sys.sp_cdc_add_job @job_type = N'capture';
    GO

    -- The job is created with the message: Job 'cdc.dbname_capture' started successfully.

    -- Command to DROP the capture job (run in the respective database context)
    EXEC sys.sp_cdc_drop_job @job_type = N'capture';
    GO

    -- Just an FYI: the commands below are for anyone who finds CDC enabled but neither job present, and needs the cleanup job too.

    -- Command to CREATE the cleanup job (run in the respective database context)
    EXEC sys.sp_cdc_add_job @job_type = N'cleanup';
    GO

    -- Command to DROP the cleanup job (run in the respective database context)
    EXEC sys.sp_cdc_drop_job @job_type = N'cleanup';
    GO
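To confirm which CDC jobs actually exist for the database (here, a cleanup job with no capture job), CDC’s own job metadata can be queried instead of browsing SQL Server Agent. A sketch, run in the CDC-enabled database; the LEFT JOIN is my choice so that a metadata row whose Agent job was deleted still shows up.

```sql
-- Run in the CDC-enabled database.
-- Expect one 'capture' row and one 'cleanup' row for a healthy setup.
EXEC sys.sp_cdc_help_jobs;

-- The same metadata from msdb, with the SQL Server Agent job names.
-- A NULL agent_job_name means CDC metadata exists but the Agent job is gone.
SELECT c.job_type,
       c.continuous,
       c.pollinginterval,
       c.retention,
       j.name AS agent_job_name
FROM msdb.dbo.cdc_jobs AS c
LEFT JOIN msdb.dbo.sysjobs AS j
       ON j.job_id = c.job_id
WHERE c.database_id = DB_ID();
```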

 

  • After the job ‘cdc.dbname_capture’ was created, it started automatically and kept running; continuous mode (@continuous = 1) is the default for the capture job created by sys.sp_cdc_add_job.
  • Because the log was 50+ GB, I monitored progress with DBCC OPENTRAN and found that the Replicated Transaction Information was now advancing, whereas it had been relatively static in the earlier DBCC OPENTRAN run:

     

    Oldest active transaction:

    SPID (server process ID): 67

    UID (user ID) : -1

    Name : user_transaction

    LSN : (12779:42734:1)

    Start time : Jul 15 2016 1:08:23:143PM

    SID : 0x010500000000000515000000efaec1579fa8a196f757550fcb251500

     

    Replicated Transaction Information:

    Oldest distributed LSN : (0:0:0)

    Oldest non-distributed LSN : (10458:4850:1)

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
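While the capture job works through the backlog, the CDC dynamic management views give a second view of its progress alongside DBCC OPENTRAN. A minimal monitoring sketch; which columns to show is my choice.

```sql
-- Run in the CDC-enabled database.
-- Recent CDC log scan sessions: while capture works through the backlog,
-- new sessions keep appearing and end_lsn keeps moving forward.
SELECT TOP (10)
       session_id,
       start_time,
       end_time,
       duration,
       scan_phase,
       error_count,
       start_lsn,
       end_lsn,
       tran_count,
       command_count,
       latency
FROM sys.dm_cdc_log_scan_sessions
ORDER BY start_time DESC;

-- Errors raised during recent scan sessions, if any.
SELECT session_id, entry_time, error_number, error_severity, error_message
FROM sys.dm_cdc_errors
ORDER BY entry_time DESC;
```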

 

  • After the job had been running for about 10 minutes, the free space inside the transaction log file started increasing, which is a good sign.
  • After about 30-45 minutes, the transaction log was 99% free, and sys.databases showed log_reuse_wait_desc = “NOTHING” :) good news!

     

  • So I shrank the log file to release the space, and ran the DROP capture job command (listed with the CREATE command above) to remove the job, as I’m not sure whether it is needed. I’ll monitor in the future to see whether it actually needs to run continuously; it is an ONLINE operation anyway and can be done anytime.
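For the final clean-up, here is a sketch of checking log usage and shrinking the log file. The logical file name N'dbname_log' and the 1024 MB target are hypothetical placeholders; look up the real logical name in sys.database_files first and pick a target that fits the normal workload, so the log does not immediately have to grow again.

```sql
-- Run in the database whose log was full.

-- 1. Confirm nothing is holding the log anymore (expect NOTHING).
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE database_id = DB_ID();

-- 2. Current log size and usage (SQL Server 2012 and later).
SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- 3. Logical name and size of the log file (size is in 8 KB pages).
SELECT name AS logical_name, size / 128 AS size_mb
FROM sys.database_files
WHERE type_desc = N'LOG';

-- 4. Shrink the log to a target size in MB (placeholders, see above).
DBCC SHRINKFILE (N'dbname_log', 1024);
```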

 

The steps above resolved the issue in my environment; some or all of them may help readers of this blog solve the issue in theirs. Feel free to leave comments, and see the reference links below, which I used for my case; some of them also help explain the concepts.

 

Adios for now!

 

 

Useful References:

http://www.sqlskills.com/blogs/paul/replication-preventing-log-reuse-but-no-replication-configured/

http://www.sqlservercentral.com/Forums/Topic695034-357-1.aspx

https://subhrosaha.wordpress.com/2011/12/17/sql-server-log_reuse_wait_desc-set-to-replication/

https://msdn.microsoft.com/en-us/library/cc645937(v=sql.105).aspx

https://msdn.microsoft.com/en-us/library/cc627396(v=sql.105).aspx

http://www.sqlservercentral.com/Forums/Topic1142599-391-1.aspx

https://www.brentozar.com/archive/2012/08/scary-sql-surprises-crouching-tiger-hidden-replication/

https://blogs.msdn.microsoft.com/repltalk/2010/11/17/how-to-cleanup-replication-bits/
