SQL Server supports unicode and non-unicode characters and hence supports multiple languages. Varchar,Char,Text datatypes support non-unicode data and Nvarchar,Nchar,Ntext datatypes support unicode data. Let’s see a simple unicode and non-unicode data demo:
北京 – stands for Beijing in Chinese. The database collation is ‘SQL_Latin1_General_CP1_CI_AS’, so, we know Chinese characters are not directly supported.We can set collation on the table but I choose not to for the demo so as to see how special characters error out when stored in non-unicode column.
create table foobarDemo1(scity varchar(20))
create table foobarDemo2(scity nvarchar(20))
Insert into foobarDemo1(scity)
select '北京'
Insert into foobarDemo2(scity)
select N'北京'
--Data not stored Correctly.
select * from foobarDemo1
--Data stored correctly because of unicode column datatype
select * from foobarDemo2
Unicode data takes twice the storage as non-unicode data, considering this, it is important to define right datatype for the application. This disadvantage on the storage side can take performance hit as more data pages have to be read to fetch the data. Below demo shows how unicode and non-unicode data is stored on the data page.
Create Database TestDb
Go
use TestDb
go
Create Table foobarDemo1(sname varchar(8000))
Create Table foobarDemo2(sname nvarchar(4000))
GO
Insert into foobarDemo1(sname)
Select Replicate('Clippers',1000)
Insert into foobarDemo2(sname)
Select Replicate('Clippers',500)
--Enable trace flag to read the data
DBCC Traceon(3604)
--query to find which data page has the data row(works in sql2012 and above)
Select object_name(object_id) as TableName,allocated_page_page_id
from sys.dm_db_database_page_allocations
(DB_ID('TestDB'),Object_ID('foobarDemo1'),NULL,NULL,'Limited')
where is_iam_page=0
--shows how the non-unicode data is stored in the page
Dbcc page('testdb',1,282,2)
--query to find which data page has the unicode data(works in sql2012 and above)
Select object_name(object_id) as TableName,allocated_page_page_id
from sys.dm_db_database_page_allocations
(DB_ID('TestDB'),Object_ID('foobarDemo2'),NULL,NULL,'Limited')
where is_iam_page=0
--shows how the unicode data is stored in the page;you can see placeholder for special character
Dbcc page('testdb',1,284,2)
Starting from SQL 2008R2, SQL Server has Unicode Compression which basically compresses non-unicode data stored in unicode column as non-unicode. More on this in this MSDN Link.