Technology Corner

Category Archives: Database

Comparison: SQL Server in Azure VM and Azure SQL Database

Azure SQL Database is native to Azure and is offered as Platform as a Service (PaaS). The objective of this offering is to minimize the overall cost of provisioning and managing many databases. It comes with built-in high availability, disaster recovery, and upgrades for the database. Databases run on the latest version of SQL Server Enterprise Edition. A single IT resource can manage many databases, which reduces the overall cost of administration.

SQL Server running on Azure VMs is categorized as Infrastructure as a Service (IaaS). This setup is best for migrating existing applications to Azure or extending existing on-premises applications to the cloud in hybrid deployments.

You can use a VM image with SQL Server preinstalled or install your own licensed version of SQL Server. This setup is a good fit when you want to migrate existing applications to the cloud quickly with minimal changes. You have full administrative rights over a dedicated SQL Server instance and a cloud-based VM.

Comparison Table

Change Tracking example - SQL Server

If you frequently need to get incremental or changed data from a database without putting a heavy load on database objects, the Change Tracking mechanism of SQL Server can be an out-of-the-box solution. Without it, developers normally have to build a custom implementation to track changes, using triggers, timestamp columns, or additional tables.

Following are step-by-step instructions to enable and use the change tracking feature in SQL Server.

Step 1: Check that the database compatibility level is set to 90 or greater. If it is lower than 90, change tracking will not work.

SELECT compatibility_level
FROM sys.databases WHERE name = '<database_name>';

Step 2: Enable snapshot isolation on the database. This helps ensure that change tracking information is consistent.

ALTER DATABASE <database_name> SET ALLOW_SNAPSHOT_ISOLATION ON

Step 3: Set Change tracking on a database.

ALTER DATABASE <database_name> SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

CHANGE_RETENTION: Specifies the time period for which change tracking information is kept.
AUTO_CLEANUP: Enables or disables the cleanup task that removes old change tracking information.
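
To confirm that the settings from Step 3 took effect, the catalog view sys.change_tracking_databases can be queried. This is a minimal sketch; the column aliases are only illustrative.

```sql
-- List every database on the instance that has change tracking enabled,
-- together with its cleanup and retention settings.
SELECT  DB_NAME(database_id)        AS DatabaseName,
        is_auto_cleanup_on,          -- 1 when AUTO_CLEANUP = ON
        retention_period,            -- e.g. 2
        retention_period_units_desc  -- e.g. DAYS
FROM    sys.change_tracking_databases;
```

A database that does not appear in the result does not have change tracking enabled.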

Step 4: Enable change tracking on a table.

ALTER TABLE <table_name>
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF)

TRACK_COLUMNS_UPDATED: Setting the value to ON makes the SQL Server engine store extra information about which tracked columns were updated. OFF is the default, which avoids the overhead of maintaining this extra column information.
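
As a rough illustration of what TRACK_COLUMNS_UPDATED = ON buys you, the sketch below checks whether a particular column was touched by an update. It assumes the dbo.TestChange table from the test script later in this post, enabled with TRACK_COLUMNS_UPDATED = ON rather than OFF; the @lastVersion value is a hypothetical stand-in for a version saved by the client.

```sql
-- Assumption: dbo.TestChange has change tracking enabled
-- WITH (TRACK_COLUMNS_UPDATED = ON).
DECLARE @lastVersion BIGINT = 0;  -- hypothetical: version saved from the previous sync

SELECT  ct.Id,
        ct.SYS_CHANGE_OPERATION,
        -- Returns 1 when the NAME column is part of the recorded update mask
        CHANGE_TRACKING_IS_COLUMN_IN_MASK(
            COLUMNPROPERTY(OBJECT_ID('dbo.TestChange'), 'NAME', 'ColumnId'),
            ct.SYS_CHANGE_COLUMNS) AS NameWasUpdated
FROM    CHANGETABLE(CHANGES dbo.TestChange, @lastVersion) AS ct
WHERE   ct.SYS_CHANGE_OPERATION = 'U';  -- column masks only apply to updates
```

With the option left OFF, SYS_CHANGE_COLUMNS is NULL and this kind of per-column check is not available.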

Step 5: Example to get changed data.

The following stored procedure returns only the changed data from a table. The application passes @lastVersion = 0 the first time; after that it keeps the returned version in its cache and passes that stored version on each subsequent call.


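CREATE PROCEDURE [dbo].[GetIncrementalChanges]
    @lastVersion BIGINT = 0 OUTPUT
AS
BEGIN
    -- Capture the current version up front so it can be returned to the caller
    DECLARE @curVersion BIGINT = CHANGE_TRACKING_CURRENT_VERSION()

    IF @lastVersion = 0
    BEGIN
        -- First call: return the whole table
        SELECT a.*
        FROM <table_name> a
    END
    ELSE
    BEGIN
        -- Later calls: return only rows changed since @lastVersion
        SELECT a.*
        FROM <table_name> a
        INNER JOIN CHANGETABLE(CHANGES <table_name>, @lastVersion) ct ON a.Id = ct.Id
    END

    -- Hand the new version back to the caller
    SET @lastVersion = @curVersion
END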
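
One limitation of the procedure above is that the INNER JOIN drops deleted rows, because a deleted row no longer exists in the base table. The following is a sketch of one possible variation, not part of the original procedure: it assumes the dbo.TestChange table from the test script below, reads the change table directly, and first checks that the saved version is still within the retained history. The @lastVersion value is a hypothetical stand-in for the client's saved version.

```sql
DECLARE @lastVersion BIGINT = 0;  -- hypothetical: version saved by the client

-- If cleanup has already removed history newer than the saved version,
-- incremental results would be incomplete and a full reload is needed.
IF @lastVersion < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.TestChange'))
    PRINT 'Saved version is too old; reload the full table.';
ELSE
    SELECT  ct.Id,
            ct.SYS_CHANGE_OPERATION,  -- I = insert, U = update, D = delete
            ct.SYS_CHANGE_VERSION,
            t.NAME                    -- NULL when the row has been deleted
    FROM    CHANGETABLE(CHANGES dbo.TestChange, @lastVersion) AS ct
            LEFT JOIN dbo.TestChange AS t ON t.Id = ct.Id;
```

The LEFT JOIN keeps the deleted keys in the result, so the caller can remove them from its own copy of the data.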

Disable Change Tracking

1. Before disabling change tracking on a database, all tables should have change tracking disabled.
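
A sketch of that order of operations, assuming the testDb database and dbo.TestChange table used in the test script below: first list the tables that still have change tracking enabled, disable each of them, and only then disable it on the database.

```sql
-- Tables in the current database that still have change tracking enabled
SELECT  OBJECT_SCHEMA_NAME(object_id) AS SchemaName,
        OBJECT_NAME(object_id)        AS TableName
FROM    sys.change_tracking_tables;

-- Disable on each listed table first ...
ALTER TABLE dbo.TestChange DISABLE CHANGE_TRACKING;

-- ... then on the database
ALTER DATABASE testDb SET CHANGE_TRACKING = OFF;
```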

Testing SQL statements

You can find a working example in the attached SQL file (changetracking) or in the code below:


SET NOCOUNT ON
go
PRINT 'Creating test database'
Go
CREATE DATABASE testDb
GO
USE testDb
go
PRINT 'Get compatibility level of db'
GO

SELECT compatibility_level
FROM sys.databases WHERE name = 'testDb';

GO
PRINT 'Setting db isolation level'
ALTER DATABASE testDb SET ALLOW_SNAPSHOT_ISOLATION ON;

GO
PRINT 'Creating table testchange'
GO
CREATE TABLE dbo.TestChange
(
Id INT NOT NULL ,
NAME VARCHAR(20) NOT NULL ,
CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED ( [Id] ASC )
);

GO
PRINT 'Inserting initial values'
GO

INSERT INTO dbo.TestChange
( Id, NAME )
VALUES ( 1, -- Id - int
'ABC' -- NAME - varchar(20)
),
( 2, 'XXX' );
GO

PRINT 'See current change tracking version before Change tracking enabled';

SELECT [change tracking version before Enabling] = CHANGE_TRACKING_CURRENT_VERSION();
GO
PRINT 'Enable Change Tracking on database';

ALTER DATABASE testDb SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS,AUTO_CLEANUP = ON)

GO
PRINT 'Enable Change Tracking on testchange table';
GO
ALTER TABLE dbo.TestChange
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF);

GO

SELECT [change tracking version after Enabling] = CHANGE_TRACKING_CURRENT_VERSION();

GO
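CREATE PROCEDURE [dbo].[GetIncrementalChanges]
    @lastVersion BIGINT = 0 OUTPUT
AS
BEGIN
    -- Capture the current version up front so it can be returned to the caller
    DECLARE @curVersion BIGINT = CHANGE_TRACKING_CURRENT_VERSION()

    IF @lastVersion = 0
    BEGIN
        -- First call: return the whole table
        SELECT a.*
        FROM dbo.TestChange a
    END
    ELSE
    BEGIN
        -- Later calls: return only rows changed since @lastVersion
        SELECT a.*
        FROM dbo.TestChange a
        INNER JOIN CHANGETABLE(CHANGES dbo.TestChange, @lastVersion) ct ON a.Id = ct.Id
    END

    -- Hand the new version back to the caller
    SET @lastVersion = @curVersion
END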
GO

DECLARE @lastVersion1 BIGINT =0

EXECUTE dbo.GetIncrementalChanges @lastVersion = @lastVersion1 OUTPUT -- bigint

PRINT 'Get Last Version'
SELECT [Last Version] = @lastVersion1

PRINT 'insert new rows in table'

INSERT INTO dbo.TestChange
( Id, NAME )
VALUES ( 3, -- Id - int
'YYYY' -- NAME - varchar(20)
),
( 4, -- Id - int
'ZZZ' -- NAME - varchar(20)
)

EXECUTE dbo.GetIncrementalChanges @lastVersion = @lastVersion1 OUTPUT -- bigint

PRINT 'Get latest Version'
SELECT @lastVersion1

INSERT INTO dbo.TestChange
( Id, NAME )
VALUES ( 5, -- Id - int
'KKKK' -- NAME - varchar(20)
),
( 6, -- Id - int
'LLLL' -- NAME - varchar(20)
)

EXECUTE dbo.GetIncrementalChanges @lastVersion = @lastVersion1 OUTPUT -- bigint

PRINT 'Get latest Version'
SELECT @lastVersion1

GO
PRINT 'Disable Change Tracking on table'
ALTER TABLE dbo.TestChange
DISABLE CHANGE_TRACKING
GO
PRINT 'Current change tracking version after disabling';
SELECT [change tracking version after disabling] = CHANGE_TRACKING_CURRENT_VERSION()
GO
PRINT 'Disable Change Tracking on Database'

ALTER DATABASE testDb SET CHANGE_TRACKING = OFF

GO

PRINT 'test complete, dropping database'
USE master
Go
DROP DATABASE testDb

NoSQL (It’s “Not only SQL”, not “No to SQL”)

This is my first post on NoSQL database technologies. Database technologies have changed drastically over the last few years. Growing numbers of user requests, the need for highly available applications, and real-time performance requirements have forced us to think about different database technologies. Traditional RDBMS, in-memory, and NoSQL databases are all available on the market to meet particular business needs. Here I’ll cover some key aspects of NoSQL databases: what NoSQL is, why we need it, and its advantages and disadvantages.

What is the NoSQL movement?

It’s a different way of thinking about database technologies. It is unlike a relational database management system, where we have tables, procedures, functions, and normalization concepts. NoSQL databases are not built primarily on tables and generally don’t use SQL for manipulating or querying data.

NoSQL databases are built for specific purposes, which means a NoSQL database might not support all the features found in relational databases.

NoSQL databases are designed around the CAP theorem, which states that a distributed system can guarantee at most two of the following three properties at the same time.

  • Consistency: Most applications and services attempt to provide strongly consistent data. Interactions with applications/services are expected to behave in a transactional manner, i.e. an operation should be atomic (succeed or fail entirely), uncommitted transactions should be isolated from each other, and a transaction once committed should be permanent.
  • Availability: Load on services/applications keeps increasing, so services should be highly available to users. Every request should succeed.
  • Partition tolerance: Your services should provide some amount of fault tolerance in case of a crash, failure, or heavy load. Under these circumstances your services should still perform as expected, for example by serving requests from multiple nodes. Partition tolerance is one of the desirable properties of a service.

Why NoSQL?

NoSQL databases are used for specific purposes, normally for huge volumes of data where performance matters. Relational database systems are hard to scale out for write operations. We can load balance database servers by replicating data across multiple servers; in that case read operations can be load balanced, but write operations need consistency across all the servers. Writes can be scaled only by partitioning the data. This affects reads, as distributed joins are usually slow and hard to implement. We can support an increase in the number of users or requests by scaling up relational databases, but that means more hardware, more licensing, and higher costs.

Relational databases are not a good option under heavy load with simultaneous read and write operations, as seen at Facebook, Google, Amazon, Twitter, etc.

A NoSQL implementation, on the other hand, can scale out, i.e. distribute the database load across more servers.

[Figure omitted] Source: Couchbase.com

Common characteristics of NoSQL databases

· Aggregating (supported by column databases): Aggregation is used to calculate aggregated values like Count, Max, Avg, Min, etc. Some NoSQL databases provide an aggregation framework with built-in aggregation of values. The approach in column databases is to store values in columns instead of rows (de-normalized data). This kind of data is mainly used in data analytics and business intelligence. Google’s BigTable and Apache’s Cassandra support some features of column databases.

· Relationships (supported by graph databases): A graph database uses graph structures with nodes, edges, and properties. Every element contains a direct pointer to its adjacent elements, so it doesn’t need index lookups or a scan of the whole data set. Graph databases are mostly used for relational or social data where elements are connected. E.g. Neo4j, BigData, OrientDB.

 

[Figure omitted] Source: Wikipedia

 

· Document based: Document databases are considered by many to be the next logical step from simple key-value stores to slightly more complex and meaningful data structures, as they at least allow encapsulating key-value pairs in documents. E.g. CouchDB, MongoDB.

Mapping of document-based databases to relational databases

Document-based databases    Relational databases
Collection                  Table
Document                    Row

 

· Key-Value Store: Values are stored simply as key-value pairs. Values are stored like blob objects, and the store doesn’t care about the data content. E.g. DynamoDB, LevelDB, RaptorDB.

· Databases Scale out: When load increased on databases, database administrators used to scale up traditional databases by adding hardware or buying bigger servers, instead of scaling out, i.e. distributing databases across multiple nodes/servers to balance the load. With rising transaction rates and availability requirements, and with databases now available on cloud or virtual machines, scaling out is no longer an economic pain.

NoSQL databases can scale out by distributing data across multiple servers. They typically use clusters of cheap commodity servers to manage exploding data and transaction volumes. The result is that the cost per gigabyte or transaction/second for NoSQL can be many times less than the cost for RDBMS, allowing you to store and process more data at a much lower price.

Now the question is why scaling out an RDBMS is hard to implement. Traditional databases support ACID properties, which guarantee that database transactions are processed reliably. A transaction can write to multiple records, so keeping consistency across multiple nodes is a slow and complex process: multiple servers need to communicate back and forth to keep data integrity and synchronize transactions while preventing deadlocks. NoSQL databases, on the other hand, typically support single-record transactions, and data is partitioned across multiple nodes so that transactions are processed fast.

· Auto Sharding (Elasticity): NoSQL databases support automatic data sharding (horizontal partitioning of data), where the database is broken down into smaller chunks (called shards) that can be spread across distributed servers or a cluster. This feature provides faster responses to transactions and data requests.
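
To make the idea of horizontal partitioning concrete, here is a small sketch in T-SQL (used only to stay in one language across this page; a NoSQL database does this routing internally). It assigns each key to one of a fixed number of shards with a simple modulo rule. The key values, the shard count, and the modulo rule itself are assumptions for illustration; real products use their own schemes, such as consistent hashing or range-based chunks.

```sql
-- Hypothetical setup: 4 shards and a handful of customer keys
DECLARE @ShardCount INT = 4;

SELECT  k.CustomerId,
        k.CustomerId % @ShardCount AS ShardNo  -- shard that would own this key
FROM    (VALUES (101), (102), (103), (104), (105)) AS k (CustomerId);
```

Every read or write for a given key goes to the shard computed this way, so each server handles only its own slice of the data.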

 

· Data Replication: Like relational databases, most NoSQL databases support data replication to provide the same data availability across distributed servers.

 

· No schema required (Flexible data model): Data can be inserted in a NoSQL DB without first defining a rigid database schema. The format of the data being inserted can be changed at any time, without application disruption. This provides greater application flexibility, which ultimately delivers significant business flexibility.

 

· Caching: Most NoSQL databases support integrated caching to provide low latency and high throughput. This is in contrast with traditional database management systems, where caching needs separate configuration or development.

Challenges of NoSQL

So far we have seen significant advantages of NoSQL over RDBMS; however, there are many challenges in implementing NoSQL.

Maturity: Most NoSQL databases are open source or still in a pre-production stage, so adopting them at enterprise level can be a risk. For a small business or a small use case they may be worth considering. RDBMS products, on the other hand, are mature, provide many features, and have good documentation and resources.

Support: Most RDBMS products are not open source, which means they come with commitment and assurance in case of failure. They are reliable and properly tested products. Most NoSQL databases are open source and not widely adopted by organizations. It is very hard to get effective support for open source databases. Some NoSQL databases were created by small startups for specific needs, not for global reach.

Tools: RDBMS products have a lot of tools for monitoring databases, query analysis, optimization, performance profiling, analytics, and Business Intelligence. The objective of NoSQL databases is to minimize the use of admin tools, which has not been fully achieved yet; there are still things that need skills and tools to monitor database activity.

When to consider NoSQL

Following are some indicators you can consider when choosing a NoSQL database for your application:

· Your application needs a high-performance database.

· You need little or no database administration.

· You want a flexible data model. Minor or major changes should not impact the whole system.

· Your application needs only simple transactions.

· You need high availability.

· Business Intelligence and analytics are a minor consideration or not a consideration at all.

References:

· http://nosql-database.org/

· http://www.couchbase.com

· www.mongodb.org

· http://en.wikipedia.org/wiki/Nosql

When to avoid the BETWEEN operator in a SQL query

I was playing with SQL queries for performance improvement and found that using the BETWEEN operator in a query against a table with huge data degraded performance, because it scans all the records within the range.

Let’s take a scenario where TableA has columns like id, executiondate, name, etc. Suppose this table has millions of records.

If we run a query like this:

SELECT * FROM TABLEA WHERE executiondate BETWEEN '20120101' AND '20120205'

This statement scans all the dates in the table and can take a long time to execute.

The ‘=’ operator is faster than BETWEEN, so to avoid the BETWEEN keyword I created a table of dates and put an INNER JOIN on the dates in TableA.

First I created a function GetDates which returns a table of dates.

Function: GetDates

CREATE FUNCTION [dbo].[GetDates]
    (
      @StartDate [datetime] ,
      @EndDate [datetime]
    )
RETURNS @datesTable TABLE ( [Date] DATETIME NULL )
    WITH EXECUTE AS CALLER
AS 
    BEGIN

        WHILE @StartDate <= @EndDate 
            BEGIN
                INSERT  @datesTable
                VALUES  ( @StartDate )
                SET @StartDate = DATEADD(day, 1, @StartDate)
            END
        RETURN
    END

Query

SELECT * FROM TableA a INNER JOIN GetDates('20120101','20120205') d
ON d.Date = a.executiondate

SQL Profiler stats for 631203 records

Query Type         CPU       Reads     Duration
Between            247792    213318    263397
Without Between    20436     7617      39106
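
One thing worth checking before swapping BETWEEN for the join: the two queries return the same rows only if executiondate holds whole dates with no time portion, because GetDates generates one midnight value per day and the join matches on equality. A quick sanity check, sketched under that assumption and using the TableA and GetDates names from above, is to compare the row counts:

```sql
-- Rows matched by the range predicate
SELECT COUNT(*) AS BetweenCount
FROM   TableA
WHERE  executiondate BETWEEN '20120101' AND '20120205';

-- Rows matched by the join on generated dates
SELECT COUNT(*) AS JoinCount
FROM   TableA a
       INNER JOIN dbo.GetDates('20120101', '20120205') d
           ON d.[Date] = a.executiondate;
```

If the two counts differ, some rows carry a time component and the join version is silently dropping them.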

Improving Performance with SQL Server 2008 Indexed Views

It’s a nice article on the performance of indexed views.

http://msdn.microsoft.com/en-us/library/dd171921%28v=sql.100%29.aspx

Common tips to increase performance of SQL queries

Below are some very common tips to increase SQL query performance:

  • Every index increases the time it takes to perform INSERTs, UPDATEs, and DELETEs, so the number of indexes should not be too high. Try to use a maximum of 4-5 indexes on one table. If you have a read-only table, the number of indexes may be increased.
  • Keep your indexes as narrow as possible. This reduces the size of the index and the number of reads required to read it.
  • Try to create indexes on columns that have integer values rather than character values.
  • If you create a composite (multi-column) index, the order of the columns in the key is very important. Order the columns to enhance selectivity, with the most selective columns leftmost in the key.
  • If you want to join several tables, try to create surrogate integer keys for this purpose and create indexes on those columns.
  • Create a surrogate integer primary key (identity, for example) if your table will not have many insert operations.
  • Clustered indexes are preferable to non-clustered indexes if you need to select by a range of values or sort the result set with GROUP BY or ORDER BY.
  • If your application will be performing the same query over and over on the same table, consider creating a covering index on the table.
  • You can use the SQL Server Profiler Create Trace Wizard with the "Identify Scans of Large Tables" trace to determine which tables in your database may need indexes. This trace shows which tables are being scanned by queries instead of using an index.
  • Avoid indexing small tables.
  • Index the ORDER BY / GROUP BY / DISTINCT columns for better response time.
  • Try to restrict the use of outer joins.
  • Use parameterized queries, because the query is compiled only once and the plan is reused.
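
Two of the tips above, the covering index and the parameterized query, can be sketched together. The dbo.Orders table, its columns, the index name, and the customer id value are all hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, used only for this example
CREATE TABLE dbo.Orders
(
    OrderId    INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,  -- surrogate integer key
    CustomerId INT            NOT NULL,
    OrderDate  DATETIME       NOT NULL,
    Amount     DECIMAL(10, 2) NOT NULL
);
GO

-- Narrow key on the filter and sort columns; INCLUDE adds Amount at the leaf
-- level so the query below can be answered from the index alone (covering index).
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
    ON dbo.Orders (CustomerId, OrderDate)
    INCLUDE (Amount);
GO

-- Parameterized query: the statement text stays the same for every customer,
-- so the cached plan can be reused instead of compiling one per literal value.
EXEC sp_executesql
    N'SELECT OrderDate, Amount
      FROM dbo.Orders
      WHERE CustomerId = @CustomerId
      ORDER BY OrderDate;',
    N'@CustomerId INT',
    @CustomerId = 42;
```

Calling sp_executesql again with a different @CustomerId value reuses the same statement text, which is what allows plan reuse.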