Transactional replication – The subscription status of the object could not be changed. (Source: MSSQLServer, Error number: 14068)

Recently I’ve faced a general error on SQL Server transactional replication – The subscription status of the object could not be changed. This was popped up on no matter what changes trying to implement – add a new subscriber, drop a subscriber, add or remove any article, etc. In other words – the replication became something like read-only.

The changes I tried to implement using SSMS as well as system procedures, like “sp_dropsubscription”, but with no luck: The subscription status of the object could not be changed. (Source: MSSQLServer, Error number: 14068). Other errors by trying other procedures I’ve got:

Msg 14068, Level 16, State 1, Procedure sp_MSrepl_changesubstatus, Line 1386 [Batch Start Line 0] The subscription status of the object could not be changed.

Replication Monitor could not reinitialize one or more subscriptions that support automatic initialization. Additional Information: An exception occurred while executing a transact sql statement or batch. (Microsoft.SQLServer.ConnectionInfo)

The subscription status of the object could not be changed. Could not update the distribution database subscription table. The subscription status could not be changed. Changed database context to “TestDatabase”. (Microsoft SQL Server: Error 14068) 

Even a snapshot agent was failing without saying anything useful… Then it looked like I need to re-setup a replication from scratch. This wouldn’t be something much if only one publication on a server with few articles, etc. But in my case – there are over 10 publications with more than 100 subscriptions. All the servers are in production and to re-setup replication – would cost a lot…

Fortunately, I’ve found a way to fix the replication without re-setup. Here what I found.

First, I’ve checked is there any orphaned articles in the replication? This might be done by running the script:

use [database]
go

select object_name(objid), name, artid
from dbo.sysschemaarticles

Success! I’ve found one article exists as a replicated object (in my case udf function), but already dropped out of the database:

Please note, you can’t simply re-create a dropped article since it will get different object_id and will not fix the issue (just now though, instead of deleting the entries as described below, you may try to update article id, but at the moment I was fixing, didn’t thought about this option).

The next step – clean this up. It’s as easy as to delete a record out of a table. For sure, you need to know from what tables. Here you go, four tables in total: syssubscriptions and sysschemaarticles in a publisher database, and MSsubscriptions and MSArticles in distribution database. In my case, from the query above I’ve identified that artid = 2 is the candidate to drop. Before deleting, I’ve copied the entries to a backup table in case I’ll screw up something else and need to restore the change. Remember, all the things I was doing are on production environment…

use [database]
go

delete
from syssubscriptions 
where artid = 2

delete
from sysschemaarticles 
where artid = 2

delete
from distribution.dbo.MSsubscriptions 
where article_id = 2

delete
from distribution.dbo.MSArticles  
where article_id = 2

Success again! After the orphaned article has been deleted out of replication, now I can do all the changes again – add a new subscriber, add a new article, etc. All back to normal!

Thanks for reading!

How to transfer 3 million of rows or 1.5GB of data per one minute

Recently I had a huge task – improve ETL process between MS SQL servers – to transfer near billion of rows every day in a real time. This means, once a master SQL server receives new rows, the rows must be immediately transferred to 8 satellite MS SQL servers.

The back-end. Every MS SQL server running on one Intel Xeon E5-2690 2.9GHz 8 cores CPU, has 384GB of memory and enterprise SSD storage: separate RAID10 for data, index and log files. TempDB running on RAM drive (in memory). The main database table are separated in file-groups per day.

Before the improvement, I found around up to 800.000 rows per minute through-output. Since the amount of data were increasing every single day, the performance were too slow. By the way, the method to move the rows to satellite servers were to use stored procedure and to transfer the rows in chunks up to 1000000 of rows .

To improve the through-output, the decision were made to use SSIS package instead of USP, since SSIS is probably the best place for ETL processes. And of course to multi-thread the process. So after a lot of tests the best performance we reached using the data flow bellow:

dataflow
SSIS package Data Flow

The most important part is Balance Data Distributor. Using the component, the SSIS package were able to write in 4 separate threads into one table. This helped a lot and every additional thread gave additional 500000 of rows per minute through-output (using single thread without the component, SSIS were able to reach up to 1000000 rows per minute). So the total through-output are up to 3 million of rows or 1.5GB of data per minute in the real time from the master MS SQL server to 8 satellite MS SQL servers.

Hope this will help someone to save a lot of time and provides ability to have really fast solution to move huge amount of data between servers, etc. prepared in a short time.

If you need any help on details, leave a comment or write me an email!

MSSQL automated security sync on AlwaysOn environment

As all of you probably know AlwaysOn is not only good feature, but also more and complex work for a DBA. In this article I’ll share with you how to automate security (logins) sync between AlwaysOn Availability Group (AG) replicas. Why we need this? Because if you or anyone else will create/alter/edit any user on SQL server primary replica, the user will not be amended on a secondary replica and if you forget to do the same change, sooner or later you may have a big problems due to security. Please note that you don’t need to care about database level security, because this will be synced by AG itself, you need to care about server level security and server roles.

So, to automate security sync I have decided to create a trigger on the SQL server. The trigger is built of few parts: the first is server logins, the second – server roles and the last – prevent any security changes on secondary replica.

SQL server logins

First thing you should know is that all changes performed on every secondary AG replica fired up by the trigger from primary replica will be executed under SQL server service account. To ensure minimal rights to the service account you need to grant view server state and view any definition (many people grant sysadmin role to the service account, but I strongly disagree with it):

GRANT VIEW SERVER STATE TO [Domain\SQL_Service_Account]
GO
GRANT VIEW ANY DEFINITION TO [Domain\SQL_Service_Account]
GO

The next thing trigger will do is to capture all CREATE_LOGIN, ALTER_LOGIN and DROP_LOGIN events, extract other require data such us affected login, command text from the captured event and create T-SQL command which will be executed on every secondary replica.

SQL server roles

Previously I have mentioned that I strongly disagree to grant sysadmin role to SQL server service account. So how we can add login to lets say sysadmin role if the service account does not have such permission? I decided, to keep secure environment is better than to have fully automated changes on server roles. Regarding my decision, in case of ADD_SERVER_ROLE_MEMBER or DROP_SERVER_ROLE_MEMBER event the trigger will send alert email when any server role change, with details like this:

Subject: Server role changed on [ServerName1]
Message: Need to perform the same server role change on [Servername2, ServerName3, etc.] secondary replica as performed on [ServerName1] primary replica: ALTER SERVER ROLE [diskadmin] DROP MEMBER [test]

This will allow you to know if anybody else made any server role change or at least remember you to make the same change on other replicas.

Prevent any security changes on secondary replica

And finally, what if anybody will make a security change on any secondary replica? I prefer to rollback any change like this. So if you will try to amend any user on secondary replica, you will get an error message:

Unable to add/alter/drop users on SECONDARY replica. Please perform the action on PRIMARY replica.

The source code

Here is the source code of the trigger, feel free to use it and amend it to fit your needs. If you have any thoughts, feel three to contact me or write any comment.

IF EXISTS (
       SELECT *
       FROM   sys.server_triggers
       WHERE  NAME = 'Trg_TrackLoginManagement'
   )
    DROP TRIGGER [Trg_TrackLoginManagement] ON ALL SERVER 
GO


CREATE TRIGGER [Trg_TrackLoginManagement]
ON ALL SERVER
FOR DDL_LOGIN_EVENTS, ADD_SERVER_ROLE_MEMBER, DROP_SERVER_ROLE_MEMBER
AS
  SET NOCOUNT ON
  
  DECLARE @recipients VARCHAR(200)
  DECLARE @AGDatabase VARCHAR(200)
  DECLARE @AGGroupName VARCHAR(200)
  DECLARE @SQLServiceAccount VARCHAR(200)
  
  
  SELECT @recipients = 'alert@email.com'			--Change to email you will receive alert
  SET @AGDatabase = 'DatabaseName'				--Change a database name which belongs to AG
  SET @AGGroupName = 'AGGroupName'				--Change to your AlwaysON AG name
  SET @SQLServiceAccount = 'Domain\SQLServiceAccount'	--Change to SQL service account SQL server is running on
  
  IF (
         SELECT ars.role_desc
         FROM   sys.dm_hadr_availability_replica_states ars
                INNER JOIN sys.availability_databases_cluster ag
                     ON  ars.group_id = ag.group_id
         WHERE  ag.database_name = @AGDatabase
                AND ars.is_local = 1
     ) = 'PRIMARY'
  BEGIN
      DECLARE @data                  XML,
              @EventType             VARCHAR(100),
              @EventTime             DATETIME,
              @ServerName            VARCHAR(100),
              @AffectedLoginName     VARCHAR(100),
              @WhoDidIt              VARCHAR(100),
              @EmailSubject          VARCHAR(500),
              @EmailBody             VARCHAR(800),
              @EmailRecipients       VARCHAR(300),
              @text                  VARCHAR(1000),
              @sql                   VARCHAR(1000),
              @executesql            VARCHAR(1000)
      
      SET @data = EVENTDATA()
      SET @EventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(100)')
      SET @AffectedLoginName = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'varchar(100)')
      SET @text = REPLACE(
              REPLACE(
                  @data.value(
                      '(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]',
                      'varchar(max)'
                  ),
                  CHAR(10),
                  ''
              ),
              CHAR(13),
              ''
          )
      
      SELECT @sql = CASE 
                         WHEN @EventType = 'CREATE_LOGIN' THEN (
                                  SELECT 'create login [' + p.name + '] ' 
                                         + CASE 
                                                WHEN p.type IN ('U', 'G') THEN 'from windows '
                                                ELSE ''
                                           END 
                                         + 'with ' 
                                         + CASE 
                                                WHEN p.type = 'S' THEN 'password = ' 
                                                     + MASTER.sys.fn_varbintohexstr(l.password_hash) 
                                                     + ' hashed, ' 
                                                     + 'sid = ' 
                                                     + MASTER.sys.fn_varbintohexstr(l.sid) 
                                                     + ', check_expiration = ' 
                                                     + CASE 
                                                            WHEN l.is_expiration_checked > 0 THEN 'ON, '
                                                            ELSE 'OFF, '
                                                       END 
                                                     + 'check_policy = ' 
                                                     + CASE 
                                                            WHEN l.is_policy_checked > 0 THEN 'ON, '
                                                            ELSE 'OFF, '
                                                       END
                                                ELSE ''
                                           END 
                                         + 'default_database = [' 
                                         + p.default_database_name 
                                         + CASE 
                                                WHEN LEN(p.default_language_name) > 0 THEN '], default_language = [' 
                                                     + p.default_language_name 
                                                     + ']'
                                                ELSE ']'
                                           END
                                  FROM   sys.server_principals p
                                         LEFT JOIN sys.sql_logins l
                                              ON  p.principal_id = l.principal_id
                                  WHERE  p.name = @AffectedLoginName
                              )
                         WHEN @EventType = 'ALTER_LOGIN' THEN @text
                         WHEN @EventType = 'DROP_LOGIN' THEN REPLACE(@text, CHAR(13), '')
                         WHEN @EventType = 'ADD_SERVER_ROLE_MEMBER' THEN @text
                         WHEN @EventType = 'DROP_SERVER_ROLE_MEMBER' THEN @text
                         ELSE ''
                    END
      
      
      SELECT replica_server_name
      INTO   #secondarys
      FROM   sys.dm_hadr_availability_replica_cluster_nodes
      WHERE  group_name = @AGGroupName
             AND replica_server_name <> @@SERVERNAME
      
      WHILE EXISTS (
                SELECT *
                FROM   #secondarys
            )
      BEGIN
          SELECT TOP 1 @ServerName = replica_server_name
          FROM   #secondarys
          
          IF @EventType IN ('CREATE_LOGIN', 'ALTER_LOGIN', 'DROP_LOGIN')
          BEGIN
              SET @executesql = 'sqlcmd -S ' + @ServerName + ' -e -Q "' + @sql + '"'
              EXEC xp_cmdshell @executesql
          END
          
          DELETE 
          FROM   #secondarys
          WHERE  replica_server_name = @ServerName
      END
      
      PRINT @Eventtype + ' activity successfully transferred to secondary server.'
      
      DROP TABLE #secondarys
      
      IF @EventType IN ('ADD_SERVER_ROLE_MEMBER', 'DROP_SERVER_ROLE_MEMBER')
      BEGIN
          
          DECLARE @EmailMain VARCHAR(MAX)
          DECLARE @Secondaries VARCHAR(500)
          
          SELECT @Secondaries = COALESCE(@Secondaries + '; ', '') + replica_server_name
          FROM   sys.dm_hadr_availability_replica_cluster_nodes
          WHERE  group_name = @AGGroupName
                 AND replica_server_name <> @@SERVERNAME
          
          SET @EmailSubject = 'Server role changed on ' + @@SERVERNAME
          SET @EmailMain = 'Need to perform the same server role change on ' + @Secondaries + ' secondary replica '
          SET @EmailMain = @EmailMain + 'as performed on ' + @@SERVERNAME + ' primary replica:<BR>'
          SET @EmailMain = @EmailMain + @sql
          
          EXEC msdb.dbo.sp_send_dbmail 
               @recipients = @recipients,
               @subject = @EmailSubject,
               @body = @EmailMain,
               @body_format = 'HTML'
      END
  END
  ELSE
  BEGIN
      IF (
             SELECT SYSTEM_USER
         ) <> @SQLServiceAccount
         AND @EventType IN ('CREATE_LOGIN', 'ALTER_LOGIN', 'DROP_LOGIN')
      BEGIN
          PRINT 'Unable to add/alter/drop users on SECONDARY replica. Please perform the action on PRIMARY replica.'
          
          ROLLBACK
      END
  END
GO

 

Advanced SQL server monitoring techniques

The idea to write the article were inspired me by our very first KIT club conference. In the conference I’d talked about advanced SQL server monitoring techniques and after a lot of great feedback I decided to write what I talked about in more detailed way and to share a source code. If you will find any issues to use the source, you know where to find me.

In this article I’ll go through the very important places for advanced SQL server monitoring. This will not cover all SQL server monitoring, but will show you some methods I have not found that any monitoring software are already using. I will cover storage, performance counters, backups, indexes, waits and SQL server configuration changes monitoring. So, lets start from storage monitoring first.

Storage monitoring

The most popular way to monitor SQL server storage is to get an alert email with drives free space thresholds. This might be configured to receive daily emails with current details on drives, or to send an alert when a disk free space getting to low. It’s not perfect and it’s not enough to be one step ahead. Let me give you same examples. One day I’ve received an email with details:

StorageAlert1
Storage alert email with normal thresholds

The thresholds on the email looks perfect, I don’t need to take any actions, everything looks fine. But the other day I’ve received another email with different details:

StorageAlert2
Storage alert email with critical thresholds

In the alert email I found that the E: drive is almost full. Now, I’m in trouble. My server is in trouble. I need to take an action immediately. But wait, how to avoid situation like this? It would be better to know, when it might happen and to take an action before production database goes unresponsive. For this I have created SSRS report to monitor storage and I’m able to plan capacity:

DatabaseCapacityPlanning
Disk space capacity planning

As you can see from the report, in the first section on the left side we have details for each server drive: last and current months free space in percent, total capacity, how many free space in megabytes have been before two months and last month, and how many free space there are now. In the next two sections we can find disk space growth since baseline disk space check performed last 60 and 30 days. In the next two sections “Grow since baseline disk space check performed – Last 60 days” and last 30 days are information about how many megabytes disk grew up in total (column MB) and per day (column MB/Day) since baseline. But the most important is to know how many days/weeks/months (columns Days/Weeks/Months to zero) we have to zero. Having this information, we are able to plan future steps and avoid disk space issues on a server. The last two columns predict how many GB’s a disk will growth in the next 6 and 12 months with the current growth. Even more, we are able to drill down by clicking on any server drive to check disk growth in last three or more months in more details:

DriveCapacityGraph
Drive capacity growth details per day

By having those very simple SSRS reports we are able to do huge work and more important – avoid troubles and mistakes.

 

Performance counters

Unlike storage monitoring, there are number of different methods to monitor SQL server performance counters. But I didn’t saw the tool to clearly compare and to have a very simple view of your server. It’s very popular to draw a curve for each counter but on the other hand, it’s not so easy to read the values.

The most important things we should know about SQL server are – how heavy a SQL server is, how fast a SQL server is running and how hard a SQL server is running. It’s very easy to answer how heavy a SQL server is – everything you need to know is how many megs or gigs your databases are in total. This very important when you need to think about – maintenance (indexes, statistics); backups configuration; etc. How fast a SQL server is running we will find from the batch request per second performance counter and how hard a SQL server is running – from wait time per second. The funny thing is that a SQL server can wait for 3 seconds in 1 second. This is because at the same time SQL server can wait for number of different things – storage, CPU, IO, network etc. I have created a SSRS report to monitor performance counters in very easy way, so we can see when a SQL server running very fast:

PerformanceCounters1
Fast running SQL server

Or, when a SQL server running very hard:

PerformanceCounters2
Hard running SQL server

 

Backup monitoring

When we are using 3rd party tools to backup databases, the tools usually providing a report about successfully and unsuccessfully backups. That’s sounds cool and looks like we are done here. Unfortunately not. I had some situations when backup tool were green and no alerts, but in true the backups was not performed. For this I have found a way to check and to be alerted if I have a database without Full/Diff or Log (for databases in full recovery mode) backups:

BackupsMonitoring
Missing backups alert email

If I receive the email like above, I know, that I should go and check if backup process were failed or maybe I have missed to schedule a log backup for database in full recovery mode. This ensures at least that I will not dismiss to configure backup for any database (new or after recovery mode change, etc) and informs if something went wrong.

 

Index monitoring

We should not only care about index maintenance, but to monitor missing and unused indexes too. Every index must be reviewed if it’s not missed or is unused. This does not mean that we should create every single missed index or drop every unused one. This need a lot of hard work and knowledge about your database.

I prefer at least on monthly basis to get a report about index usage:

IndexMonitoring
Missing and unused indexes

In the green square we are able to find filtered (more than 10k compiles/seeks/scans and more than 80% impact) missing indexes. In the red square – unused indexes. By having this report it’s very easy to start looking on an index needs.

 

Waits monitoring

To have a good picture of a SQL server bottlenecks, config changes, etc., the wait statistics report is what we need. It’s simple, powerful way to identify why SQL server running slower than previously, to know if after the config changes SQL running faster or maybe slower, to forecast changes needs to SQL server. Let me show you how easy to check things:

SQLWaitStats1
SQL server wait per time frame

From the report we can very easy identify, that some CPU related changes (probably max degree of parallelism or cost threshold for parallelism) has been made two times:

SQLWaitStats2
SQL server wait per time frame

 

SQL server configuration changes

It is extremely important to know if any key SQL server configuration change has been made or a trace flag were turned on by someone especially without meeting change control process:

SQLServerChanges
SQL server changes alert

 From the alert we may identify who and when made a change, what the old value been and what the new value are.

 

That’s all I got for this time. Hope you will find useful points to improve your monitoring solution! As I mentioned at the beginning, if you have any question, please ask me. I’ll do my best to help you.

All the source code you can download here.