SQL server

Index density and selectivity.

Two terms that used to define an index’s weight/quality are: selectivity and density. I have come across these terms many times and assumed their meaning which was mostly right. To end those assumptions and have a proper explanation, see the example below:

In a company with 1,000 employees, an index on Date of Birth would likely be highly selective—meaning that a query for employees born on a given day should, statistically, never return more than an average of 3 or 4 employees per day. In SQL Server terminology, this index would have a density of .003% (or 3 out of 1,000), which in turn, would translate to an index selectivity of .997 (index density and selectivity are inversely related or proportional.) Essentially, this index would be beneficial in any query used against the birth date of employees, as the data within this index is ‘selective’ or capable of discriminating against different types of results. Within SQL Server, the more selective an index, the greater the chance that it will get used, and the more efficient it will be at returning results in a performant manner.

Full credit for example goes to an excerpt from a whitepaper by Michael K. Campbell

TOP 10 SQL PERFORMANCE BOOSTERS
INCREASE SQL SERVER PERFORMANCE
WITH THE HARDWARE YOU ALREADY OWN
BY MICHAEL K. CAMPBELL
Advertisements
Standard
SQL server

Creating foreign keys in SQL server.

I am going back to the basics here. I realized that in my quest for learning advanced topics, I tend to get a bit rusty with the basics. Hence I am creating a new a new basics tag for my posts and will be creating very basic posts with this tag.

Foreign key (FK) constraints allow you to create relationships between two tables. One of the tables will be treated as a parent table and the other table will be treated as the child. The FK constraint will connect one column from the parent with one column of the child.

-- Creating the parent table and adding constraints.
CREATE table parent
(
id int identity,
value int
)
alter table parent alter column id int not null
alter table parent add constraint pk_parent_id primary key (id)
-- Creating the child table and adding constraints.
CREATE table child
(
id int identity,
parent_id int,
value int
)
alter table child alter column id int not null
alter table child add constraint pk_child_id primary key (id)
alter table child add constraint fk_child_parent_parent_id foreign key(parent_id) references parent(id)

You cannot delete a row from the parent table, if there is a related row in the child table. You can delete rows from the child table without issues.

-- Inserting a row into the parent table and a related row into the child table.
insert into parent VALUES (20)
INSERT INTO child(parent_id,value) VALUES (1,10)
-- Trying to delete the previously inserted row from the parent will give an error.
delete FROM parent where id=1

Msg 547, Level 16, State 0, Line 1
The DELETE statement conflicted with the REFERENCE constraint "fk_child_parent_parent_id". The conflict occurred in database "dba", table "dbo.child", column 'parent_id'.
The statement has been terminated.

You cannot insert a row into the child table’s column (that is restricted by the foreign key) if that value does not exist in the parent table. You can insert rows into the parent table without issues.

-- If there is no value in parent.id = 111, the row below will error out.
INSERT INTO child(parent_id,value) VALUES (111,10)

Msg 547, Level 16, State 0, Line 1The INSERT statement conflicted with the FOREIGN KEY constraint “fk_child_parent_parent_id”. The conflict occurred in database “dba”, table “dbo.parent”, column ‘id’.The statement has been terminated.

To delete rows from the parent table, you can use a cascade on delete. Another manual method is to delete all the referenced child rows and then to delete the parent row. The similar restrictions apply for updating rows to the related tables.

Standard
SQL server, SQLServerPedia Syndication

Implicit and explicit transactions.

There are two ways a query can be committed in SQL. Implicitly and explicitly. Queries that have use the begin tran/commit statements are performing explicit transactions. Queries without the begin tran/commit statements are implicitly committed. There are some performance implications when not explicitly commiting a transaction. Example below:

— Create the table
CREATE table testins
(
id int constraint cix_id_testins primary key clustered identity(1,1),
some_data varchar(25) null
)

— implicit transactions, tran1
SET NOCOUNT on
DECLARE @counter int
SET @counter =1
WHILE (@counter < 100000)
begin
    insert into testins VALUES (‘Test Data’)
    set @counter=@counter+1
end

— explicit transactions, tran2
SET NOCOUNT on
DECLARE @counter int
SET @counter =1
BEGIN tran
    WHILE (@counter < 100000)
    begin
        insert into testins VALUES (‘Test Data’)
        set @counter=@counter+1
    end
commit

The implicit transaction (tran1) took 49 seconds to complete, whereas the explicit transaction (tran2) completed in 1 second. The diffrence in this scenario has to do with Log buffer flush. In the implicit trasactions, the log
blocks in memory are flushed to disk (log file) more often than when the explicit transactions was used. In the explicit transaction, the logs are flushed to the log file only when the log blocks in memory are full. To learn more about SQL Server transaction log files check out these links: log file logical architecture, log file physical architecture.

Standard
SQL server, SQLServerPedia Syndication

Finding the size of a data type.

When doing capacity planning, it is important to know the space that will be required for a row of data. This will also help with performance because if you can change your schema to pull in more rows into a buffer page (without degrading other aspects), then you will have better performance.

One function you can use is the DATALENGTH function which will show you the number of bytes used to represent any expression. Consider the data/expression: ‘20130101‘, which is a date. To see the effects on size when using different data types, see below.

SELECT DATALENGTH(CAST(‘20130101’ as char))  — 30 bytes
SELECT DATALENGTH(CAST(‘20130101’ as binary))  — 30 bytes
SELECT DATALENGTH(CAST(‘20130101’ as nvarchar))  — 16 bytes

SELECT DATALENGTH(CAST(‘20130101’ as char(8)))  — 8 bytes
SELECT DATALENGTH(CAST(‘20130101’ as varbinary))  — 8 bytes
SELECT DATALENGTH(CAST(‘20130101’ as text))  — 8 bytes
SELECT DATALENGTH(CAST(‘20130101’ as varchar))  — 8 bytes
SELECT DATALENGTH(CAST(‘20130101’ as image))  — 8 bytes

SELECT DATALENGTH(CAST(‘20130101’ as int))  — 4 bytes
SELECT DATALENGTH(CAST(‘20130101’ as date))  — 3 bytes

You can see how 10 times the data could be stored if the column was a date type as compared with a char type.

Standard
SQL server, SQLServerPedia Syndication

Breaking down dates using DATEPART.

You may have written ad-hoc queries to pull data for a certain date range. As this report gets popular over time and the date range expands, you may be asked to do a break down of the report. For example: per month or querter.

The magical function that helps with this purpose is the DATEPART function. With the DATEPART function, you can extract (otherwise tricky) calendar references like quarter, month, week, weekday, etc without having to guess the correct condition in your where clause.

Below is an example of how this function is used to break down a yearly report into quarters.

SELECT subq1.QuarterPart, sum(subq1.OrderPrice) from
(
SELECT
CASE DATEPART(quarter, OrderCompletedDate)
    when 1 then ‘Q1’
    when 2 then ‘Q2’
    when 3 then ‘Q3’
    when 4 then ‘Q4’
end as QuarterPart
,OrderPrice
from Orders where OrderCompletedDate BETWEEN ‘2012-01-01’ AND ‘2013-01-01’
)subq1
GROUP by subq1.QuarterPart
order by subq1.QuarterPart asc

Compare this with having to figure out and write where clauses like: ” where OrderCompletedDate >= ‘2012-01-01’ and OrderCompletedDate < ‘2012-04-01’ “. Now imagine doing this for a weekly breakdown.

Standard
SQL server, SQLServerPedia Syndication

AppDomain 2 (dbname.dbo[runtime].745) created message in error log.

This message is related to the CLRs created in the DB. The message will usually be: AppDomain some_number (dbname.dbo[runtime].some_number) created.

To see the AppDomains currently online in the DB, run the command: select * from sys.dm_clr_appdomains. This will show you the DB and user id under which these CLRs are running. Other related DMVs are: sys.dm_clr_loaded_assemblies, sys.dm_clr_tasks.

From BOL: Application domain (AppDomain) is a construct in the Microsoft .NET Framework common language runtime (CLR) that is the unit of isolation for an application.

Basically, the AppDomain is used to isolate the individual processes for security and manageability reasons. For performance reasons the AppDomain is cached after the routine has finished running. Based on the cost/value column in sys.dm_clr_appdomains the AppDomain will be removed from memory when under pressure.

 

Standard
SQL server, SQLServerPedia Syndication

CTE (Common Table Expressions) and sub queries.

Common Table Expressions or CTEs can be used to create temporary result sets based on another query. The CTE is not saved as a DB object and has the same lifetime of the query. CTEs come in handy when a recursive query is required, or when you need to aggregate on top of another select statement.

The CTE is populated by using a Select clause, similarly to how views are defined . Below I show an example of a CTE.

— With CTE
with uo_cte(id,CustomerID,ShoppingListstatus,ShoppingListdate)
as
(
SELECT  top 100 Id,CustomerID,ShoppingListStatus,ShoppingListDate FROM UserShoppingList where ShoppingListDate<‘2013-02-10’
)
SELECT ShoppingListstatus,COUNT(id) from uo_cte
group by ShoppingListstatus

I usually use subqueries to aggregate on top of another select clause. In the example below, I show how the CTE above can be redefined as a subquery.

— With Subquery
SELECT subq1.ShoppingListStatus,COUNT(subq1.Id) from
(
SELECT  top 100 Id,CustomerID,ShoppingListStatus,ShoppingListDate FROM UserShoppingList where ShoppingListDate<‘2013-02-10’
)subq1
GROUP by subq1.ShoppingListStatus

The above examples have pretty much the same performance and function. CTE’s have an advantage over using a subquery in that you can use recursion in a CTE. To read more about a recursive CTE, check out this link.

Standard