Search

sqldose

Daily dose to keep your SQL fit and fine.

Category

Uncategorized

Starting with MS SQL Server

In one of my previous blog (https://sqldose.wordpress.com/2015/10/20/a-basic-introduction-of-database-tables-and-sql/ ) I gave a basic introduction about what is A Database and what is SQL.

Now let us start with basic of Microsoft SQL Server and how to work with it. I am a fan of MS SQL Server. It is easy to use, Developer Friendly, Best performance and much more.
Ideally the first thing should be installation of MS SQL Server. You will find a lot already about it online I will be skipping this. But still, if you find any issue with installation drop your comments.
Connecting the Database Engine through SQL Server Management Studio.

SQL Server DB Engine

Database Engine, Analysis (SSAS), Integration (SSIS) and Reporting (SSRS). These all run as service. (We will focus on Database Engine only).
You can manage these service through SQL Server Configuration Manager. If the service is stopped, you will not be able to connect. See an example below.

SQL Server Service stopped.jpg

 

SQL Server Service stopped error

Note : Any time you see an error shown above, first check the service. If you do not have SQL Server Configuration Manager installed. You can also check the service under ‘Services’(Go To start and type Service). However, Microsoft suggest to do this through configuration manager as it will also take care of any dependent service.

Let us move ahead and connect to our Database Engine, where I can see all my DB’s and other SQL Objects.

SQL Server DB.jpg

 

In my next continuation I will show how to create a DB and few important properties of DB that can help in maintanenace and performance of DB’s.

SET ANSI_NULLS, NOCOUNT and QUOTED_IDENTIFIER.

Hi Guys,

When you generate a script to create a Stored Procedure, below three statements are generate by default.

SET ANSI_NULLS on

SET NOCOUNT on

SET QUOTED_IDENTIFIER on

Why do we need these. Why Management Studio thinks we would require these options. Let us understand each one of these one by one.

  1. SET ANSI NULL ON :

I am sure you know what is SET and ON. Lets us understand ANSI and NULL.

ANSI is  “American National Standard Institute” which looks after the standards for products, services, processes, systems, and personnel. In easy way I should say that, they are the one which make rules about how a particular should work or use (RULEs can be broken sometimes ;)).

Now, What is meant by NULL?

NULL simply means unknown. Does unknown has any value?

NO!

NULL cannot be ZERO. It is simply NULL which is NOT DEFINED.

So ANSI has set some very basic standards for NULL. Any comparison or calculation performed with NULL is NULL. for example NULL + 1 = NULL.

If (NULL = ‘NULL’) => FALSE.

NULL is neither a string nor any number.

So, When I do SET ANSI_NULLS ON; I am simply telling SQL SERVER to follow the standard of ANSI. Let us see with below example.

2015-11-20_22h54_41

When ANSI_NULLS is set to ON. ‘=’ operator is not able to identify NULL. This is the standard of ANSI that any comparison with NULL is always NULL. So you will never ever get a result for this. To find NULL in a column we use Keyword ‘IS NULL’.

Now let us check the same query by making ANSI_NULLS OFF.

 

2015-11-20_22h55_01

So, here it is. Now it is no more following the ANSI standard.

Below is one more example snapshot. which makes it more clear.

In ideal situation, we should always follow the ANSI guidelines in our code.2015-11-20_22h56_47

 

 

2. SET NOCOUNT ON :

As the name suggest we are informing SQL Server to ‘Do Not Count’. But what exactly SQL Server should not count. Let us check below snapshot.

2015-11-20_22h59_45

When I SET NO COUNT OFF, SSMS gave me details of how many rows are affected. At below snapshot there is not such details because we have SET NOCOUNT ON.

2015-11-20_22h59_30

This can come very handy while improving the performance of Stored Procedure.

 

3. SET Quoted_Identifier ON

(Ref. –  https://msdn.microsoft.com/en-us/library/ms174393.aspx)

Consider a situation where you want your column name be ‘Identity’. But Identity is a keyword for SQL Server. How it will differentiate between a user referred Identitiy Or T-SQL referred Identity. This is where Quoted Identifier help us out. Below is in example taken from MSDN representing how we can use already present Keywords.

 

Quoted Identifier 1

 

 

Quoted Identifier 2

 

Above two Snapshots shows how with the help of Quoted Identifier I was able to use reserved Keyword of SQL Server.

 

MSDN has also explained how to use quoted identifier for Single and Double Quotation Marks. Please refer that for details.

 

 

 

 

Consequences of Timeout/Error on a Delete statement

Hello Guys,

Lately, one of my friend asked me a Question, “If Delete is in progress and query get erred out or Timeout, will already deleted data will be rolled back, considering we do not have any transaction? “

My previous Answer was NO, After I executed the below statements.

I will create two tables,

PkTable – Having only one primary column.

FkTable – Have one column acting as Foreign key.

CreatingTable_With_Pkey_FKey

CASE 1 – If Query Get erred out

So the idea is, If I try to delete an id from PkTable which is also present in FkTable. Then SQL Statement will erred out. I will check if SQL Deletes all the rows other than the row having Foreign key relationship.

Let us first insert some data in my table.

Insert Records in Pkey Table

Now only one id in my FKTable, to have at-least one Foreign key relationship.

Insert Records in Fkey Table

We are ready to execute delete statement.

Delete Records in Pkey Table

From the above snapshot it is clear that our delete statement was completed with error.(Which was FKey constraint error). So let us check what we have remaining in our PKTable.

Select Records in Pkey Table After Delete

From above image, it is clear that DELETE will remove all the other records except the one for which SQL throws an exception.

CASE 2 – If Query Get Time out

What will happen if query does not get error out but got Time out due to huge number of records or some other reason.

Before i continue, I have to make some configurations. So that i can deliberately make my query to get timeout.

Choose QueryOptions as shown below.

QueryOptions

Configure the highlighted option. I have changed it to 5 seconds.

QueryOptions_ChangeTimeout

Let’s make a quick check if our configuration is working correct or not.

Check For Query Timeout

Great!.

I have repopulated my table. Now, let us do a delete.

Delete Records in Pkey Table With Timeout

Select Records in Pkey Table After Delete with timeout

As expected, all the records before timeout has been deleted.

But here is the Catch. The above Statements may seems to be correct but does not hold true. If you notice I am not deleting a batch of rows. It is just one row at a single time. It was a blunder at my side that I missed to explain (Thanks to Vikas and Chris for pointing this out).

So what I did wrong?

While statement actually making the each DELETE statement as an individual statement. So it is actually a 10000 Delete rather than one single DELETE.

Now, if we run the Delete as a batch it actually Rollback all the deleted rows if an error is encountered. Please see below snapshot.

Total number of rows before delete.

New_Select

Deleting in a batch.

New1

Let us check if we have actually deleted something?

New_Select

No change. This implies Deletion does not happened if it encounters any error.

(For other T-SQL, XACT_ABORT should also be considered. Check below link for details  https://msdn.microsoft.com/en-us/library/ms188792.aspx)

A Basic Introduction of Database, Tables and SQL.

It seems to be the most basic thing but this post is specifically to those who have just started or want to start their career in Database Domain. When you are a novice. You have only idea about what is a Database. But in practical life how companies are actually managing it?

Let us start with the basic.

What is a Database?

Let me take you to the old days of humanity. When there was no Computer. How companies are managing their data back then. Forget about the company ask your Grand Father, How he use to manage all the accounts. The answer is simply writing it in text book. He comes in every night working hard, and wrote all his spending and earning in a notebook. He must have fully covered 10, 20 or much more notebooks in his entire life. This whole collection of his notebooks is nothing but a database. GrandPa database. Database is a repository of essential information. It can be in any format.

Now your grandpa must have wrote his work in different formats. Like in para. “I have to pay Mr. Mark an amount of $10.” Or he might write the same thing as “Mark – Pay $10”. But what should be the most appropriate way to store your data. It would be.

Name MoneyToPAy MoneyToReceive
Mark $10 0

This is a Table. For me it is much easier to read

This give us a basic idea of what is a Database and what is a table?

Now we will enter to the age where Computers. All of your Grandpa Document can be stored in a computer. He will have its own database where he will write all his daily account in details. But how? What is an interface for him to make database, to store data in it.

This is where huge companies provided their product. ORACLE, IBM, MICROSOFT. Tools which can help you create a database, create table. Store data in it. See the result when required. One of such tool is MS SQL developed by Microsoft. Oracle owns Oracle Database. They all have their own methods to work on databases but ultimately they provide you with one solution which is letting you manage you huge database in table format. Conceptually, MS SQL is no different from ORACLE or MYSQL (etc). But each have their own benefits.

What is SQL?

Structured Query Language. Now, Companies have provided us a method to store your data. But how a user is going to interact with that data. How to read or write into a table. For that purpose Structured Query Language is used(SQL). Because Microsoft named their product as MS SQL, Most people think that that SQL is related to Microsoft only. But that is not correct, even Oracle and other platforms uses SQL to work with database (Yes, a little syntax can vary but ultimately the meaning of keywords remain same for all).

So, SQL is a language which can be used to work with your Databases.

Stay Tuned, In my next post I will explain about Microsoft SQL Server.

Next Continuation(https://sqldose.wordpress.com/2016/04/10/starting-with-ms-sql-server/)

Non-Clustered Index

In my previous blog I explained about clustered index (https://sqldose.wordpress.com/2015/09/20/clustered-index-and-related-interview-questions/ ).

Considering that we are versed with the concept of heap, binary and clustered. Lets move on to non clustered index(NCI).

NCI for me is the most mysterious object in SQL Server. The way SQL Server work with it, sometimes can be very unpredictable. But hey do not worry, that is why we are for. Share the most with you all.

Have you used information at the end of book. Which predict all the page number where a specific topic is referred.

NonClusteredIndex

Above image shows a topic Member Function, Which can be find at the corresponding pages.

Similarly, NCI is a structure ( binary or heap) which holds the pointer to the data rows of table at it leaves.

Confused???

Even i am.  😛

Ok! let me explain this in another way. As soon as I create an NCI on a column, SQL Server will create a separate structure and fill it with data(just one value) and PageID where that data can be find.

So if I have to explain NCI in a plain simple language for the MemberFunction shown in above image. It will be, The Value “MemberFunction” can be find at PageID = 101,108…

NOTE : I mentioned NCI structure as btree or heap. Why? NCI will be a heap if my table on which NCI is created does not have a CI(i.e. A HEAP). NCI will be a btree if my table has a CI. So structure of NCI depends on table on which NCI is created.

The most important point to be noted about NCI is they are stored separately. So every time when you perform any DML on your table, the same statement will be applied to NCI as well.

Now, How about if you can actually see what my NCI is filled with. What values it actually holds.

Lets get started.

Here, I am using the SalesOrderDetail (I call it SOD) table of AdventureWorks Database. (You can download this database for your practice.)

SalesOrderDetail

SOD already have two NCI. We will look for only one NCI i.e. IX_SalesOrderDetailProductID.

SalesOrderDetail_NCI

Let us first check how this Non-Clustered Index looks like physically.

NonClusterIndex_Physically1

First query in above snapshot provide me the index id for my NCI. I then used the  DBCC IND command to find the pages allocated to this table under my NCI. From the result of DBCC IND I will look for maximum value in IndexLevel column. This column help me understand whether this page is part of ROOT node or INTERMEDIATE NODE or LEAF NODE. Highest Value represent Root Node. Lowest represent Leaf and others are intermediate nodes.

In our case we have only two type of values i.e. 0 and 1. It means 1 represent the Root Node, 0 as leaf and we do not have any intermediate nodes.

  • PagePID=24336 is a root node for my NCI.

Now check this page for data. We will use DBCC PAGE command for this.

NonClusterIndex Physically2

Let us understand the above result.

ProductID is my NCI column, while SalesOrderID and SalesOrderDetailID are my CI. These will act as pointer. If  I remove the clustered index these two column will be replaced by RID which is physical location address. See below snapshot.

NonClusterIndex Physically3

(As I have deleted the  CI, Pages have been shuffled so new Root page for my index is 24920).

Now back to NCI ProductID. The above snapshots represent that ProductID=707 can be found at ChildPageID 24857,24858…so on. And the way to reach there is HEAP RID/ClusteredIndex column.

To summarize the above. Value 707 lies at page 24857(and many more) and you can use RID or CI as pointer.

I hope this clears what NCI is. If still in doubt drop me your questions.

Now few questions about NCI.

  1. Which column should be the best candidate for non clustered index?

Ans. Column which is most queried with operator = should be the one.

  1. Can we have more than one nci?

 Ans. Yes, you can. But having many nci on table can hit your performance adversely. With Every INSERT or UPDATE, SQL server will have to perform changes in NCI as well. So if you have many nci. SQL server will actually consume more time in updating indexes rather than just working with your data in table.

In some rare scenario, NCI can even cause deadlocks. So we need to be extra care full with those.

  1. Can nci be added to a column having duplicate or null values?

Ans yes.

  1. Can nci be added to a column having unique value?

Ans  yes

  1. Will SQL Server uses NCI every time for a where clause on NCI column?

Ans. NO, It totally depend on the percentage of data which is being retrieved.

If productID=707 have 90% of rows out of total records. Then SQL Server will prefer to use CI rather then taking a long route of NCI.

Clustered Index and related Interview Questions

Since the day I wrote my first blog. My friends are eagerly waiting for just one topic. The Indexing. I was trying to avoid this one, as I believe there is lot already written about it. But I cant anymore, after all my life is at stake. :p
Indexing is the most important thing to look after in a database. It is very unpredictable but with correct knowledge of your data and the how user is interacting with data can make wonders for you.
Before starting straight upto indexing. It is very important to know a basic about two very important data structure. Heap and B-tree.
Heap : A heap in most simplest way I can define as untidy or unordered way of storing data. It follow just one rule. Either parent will be greater (Known as max heap) or it will be smaller (min heap). Below tree is a max heap. Closely notice that root level is the largest of all. Also every parent is greater than the child.

Heap Structure

Let’s say if i want to add a value 6. There is no specific rule of adding a 6. I can place it anywhere by just making sure that parent should be larger. There is two appropriate place for 6.

Heap Structure_Node6

B-Tree – Now think about the above heap. And apply one simple rule. Larger value will go at right and smaller values will go at left. So your btree will look like something below. (This is a very rough example of a B-Tree of order 2, If require read more about B-Tree on Web. This example is just for a rough idea). Also B-Tree have one rule, All Leaf should lie at same level.

B-Tree_Structure

Now, let me ask you one question. You have to search 7. Which one is easier to traverse. A heap or btree. Obviously its a btree for me. As I know I need to search at specific direction only. Well if in case you thought heap and if you can prove it then boy you have my respect.
I hope we have at least basic idea about heap and binary tree. In short time you will come to know why it they are important in term of indexing.

What is indexing?
I won’t go with the text book definition. In my language, Indexing is a structure which can help you identify the exact position of your data. For example if I am searching for customer name by using customer id. Index should know where exactly the data for this customer id lies and it should take minimal amount of time to fetch me the result.

Types of indexing in SQL?
Broadly we have two classifications clustered and non clustered. There are more sub categories like ColumnStore, XML Indexes, Spatial etc. But we will be discussing about the major category as of now.

Cluster index :
Let’s create two table, one with Clustered Index(I will call it CI in further references) and other the usual table without any indexing.

CreatingClusteredIndex

Now insert these both tables with unordered id data. See below snapshot.

InsertInto_Heap_Btree

Now let us check how data is stored in the table.

Select_Heap_Btree

Did you noticed. Table having no CI, stored the data exactly in an unordered way. While, Table with CI Sorted it automatically. Does not my above phrase seems similar to the definition of heap. Yes, A table without any indexes is a HEAP. As soon as you apply CI on your table it becomes a BTree.

Now few questions about CI
1. Which column should be the best candidate for clustered index?
Column which is most queried with operator < or > should be the one. For ex. Date, Salary etc.

What! Do not believe it? Because you have been doing it on your id column. And you always search an id with =.
So it either make me wrong or you guys have been doing it wrong all the time.
Let take this discussion forward. And let me prove what my theory is.

Consider my CI is on Salary and I am searching salary>50000. SQL will traverse till 50000 and take all the nodes which are at right.
On the other hand, if my CI was on EmployeeId. There was no use of it. Because ultimately sql will still have to scan your complete row one by one for each record and then check if that EmployeeID satisfies Salary>50000.

I hope it clears a bit. (If you are thinking about Non CI on Salary, Rethink about how it store your data)

Then, why on earth someone will take CI on id. Hmmm… Relax! They are not doing it wrong but there can be multiple reason for it.

First, Ordering your table on basis of ID make it much more comfortable for users.

Second, Having a CI on column having Duplicate Values can be a performance hit in case of OLTP environment, Where there is lot of Insert/Delete is happening. Actually, for very duplicate value in a CI, SQL server will provide a Unique Identifier for each value. To make them unique internally. It will be an extra effort for SQL Server if lot of values are getting inserted and Deleted.
Other most important reason is, there are certain limitation for which you need to have an unique clustered index on a table. Like in case if you are using XML index etc. in such cases, you cannot apply a Unique CI on columns such as salary.

Conclusion : CI will make your table in a proper order. And it totally depend on situation and circumstances where you want to put your clustered index. One which is more searched about or the one which you want to put in order.

2. Can we have more than one CI in one table?
No. You cannot. See the below snapshot where i tried to make another CI on column [Name] and SQL Server throws an exception.

CreatingTwoClusteredIndex

However you can have more than one column under one CI. In such cases. Table will be sorted on the basis of first column if CI and then second and so on( like dictionary). See below Snapshot.

CI_On_TwoColumns

3. Can CI be added to a column having duplicate or null values?
YES. As I explained above in question 1. yes you can. Giving the condition it is not a unique clustered index. Check below snapshots.

Duplicate and Nulls in CI

4. If no clustered index is define how table is sorted?
If you have defined a primary key. Then by default it will act as a CI. If there is no pkey as well then it will act as heap. It is impossible to predict the order of a heap.

5. Can we have a Primary Key and CI on separate columns?
Yes. P.key is to enforce a constraint whereas CI is way to enhance the searching mechanism.

6. Does inserting, updating and deleting records affect the performance in case of ci?
I believe in case of update it should not because we are not manipulating the btree structure. (Unless we are updating the same column on which CI is applied.)
But for insert and delete it should take some performance hit. Reason, B-Tree need to keep its leaves at same. So whenever any insert or delete affect this order. SQL server will take some extra effort to make restructure the B-Tree.

*NOTE – This will not happen with each insert or delete.

This is it from my side on CI. Now its your turn to share your views on CI.

We will discuss about NCI in my next post.

The most dangerous command in sql

GUYS,

I was just surfing internet to read more about GAM and SGAM. Accidently I read an awesome article about the most dangerous sql command that you can execute. Just read and enjoy. Do not implement.

http://www.sqlskills.com/blogs/paul/dbcc-writepage/

How SQL store your data?

What happens when you create a table?

Have you ever give it a thought, how sql server actually managing your table? Where is your data actually got stored?

If yes, then you are equally passionate about sql server. Which means you are a competitor and we might be fighting for same job some other day. :p

So here how it actually goes.

SQL server store your data in a page. A single page is of 8kb size, 96 byte of the page is used as header to store certain information like which table it belongs too. Each row of your table is stored into these pages. Data Row will not span pages, however if your data row is too large like in case of nvarchar(max) etc. SQL server automatically moves your certain data to ROW_OVERFLOW_DATA allocation unit and keep the pointer on the original page.

So When you create a table. At that time SQL server will not assign you any pages. WHAT!!!. Yes, that’s true. It might be that you are just playing with the create statement, SQL server cannot let anyone play with its important resources. 🙂

2015-08-30_23h56_39

It is untill when you execute your first INSERT,  this is when SQL actually assign you a set of pages.

2015-08-30_23h58_20

Did I just said set of pages!

Yeaah!

Sql server will assign you an Extent. An extent is a set of 8 pages. Before proceeding let me explain about extents.

As I said, An Extent is Group of 8 Pages. Which make its total size of 64KB. SQL Server works with two type of extents.

  1. Mixed Extent : Pages present in a mixed extent can belong to different table. For ex. Page 1 can belong to Customer Table. Page 2 can belong to Employee. Page 3 Can belong to Manager. Page 4 again Customer and so on. There is not actual sequence and it will be very random.
  2. Uniform Extent : This type of Extent belongs to one and only one table.

Now the question arises, How SQL server will decide when to assign any table a Mixed Extent or Uniform Extent?

It actually depends on the amount of data. Whenever the size of your table grows upto 8 pages(or 64KB). SQL Server will assign your table its own dedicated Extent. Before that SQL Server will allocate pages from a Mixed Extent.

NOTE – A Page belongs to one and only one table. Pages are never shared among tables.

So when you execute your first insert statement, SQL Server will assign you a page from a Mixed Extent. As time proceeds and your data grows. At a certain point SQL will realise that now this table needs its own extent.

I would also like to mention here that not only your table but your indexes as well are stored in pages.

Below is the link to give you more detailed knowledge of Pages and extents.

https://msdn.microsoft.com/en-us/library/ms190969.aspx

Leave your comments. Share your knowledge. Or shoot me with your questions.

Create a free website or blog at WordPress.com.

Up ↑