Thursday, November 24, 2011

SQL Server - Bulk import from a text file into a single table column

I am working on a terminology management system. Its data is stored in a SQL Server (2008 R2, Express Edition) database. Recently I needed a large set of words to test my web application with. I copied a term list from a website and pasted the list into a text file. Somehow the list had to be imported in a Terms table, where each record contained a term in the Term column. The other columns would be updated later, therefore I allowed NULL values for them.

The following BULK INSERT SQL statement inserts the values from our text file into the Terms table:
USE TermBase
FROM 'terms.txt'
WITH (FORMATFILE = 'terms.fmt')

A format file must be used because the number of columns in the text file (1) is smaller than the number of columns in the Terms table. Format files are typically created by using the bcp utility and modified with a text editor as needed. The terms.fmt file has been created with the following command:

bcp TermBase.dbo.Terms format nul -c -f terms.fmt -SLOCALHOST\SQLEXPRESS -Umyusername -Pmypassword

and its contents are:
1     SQLCHAR   0   12      "\t"     1     TermId            ""
2     SQLCHAR   0   100     "\t"     2     Term              Latin1_General_CI_AS
3     SQLCHAR   0   12      "\t"     3     LanguageId        ""
4     SQLCHAR   0   12      "\t"     4     StatusId          ""
5     SQLCHAR   0   12      "\t"     5     VisibilityId      ""
6     SQLCHAR   0   12      "\t"     6     DictionaryId      ""
7     SQLCHAR   0   12      "\t"     7     CategoryId        ""
8     SQLCHAR   0   12      "\t"     8     ApprovalScore     ""
9     SQLCHAR   0   8000    "\t"     9     Description       Latin1_General_CI_AS
10    SQLCHAR   0   200     "\t"     10    Source            Latin1_General_CI_AS
11    SQLCHAR   0   24      "\t"     11    InsertDate        ""
12    SQLCHAR   0   24      "\r\n"   12    UpdateDate        ""
Our terms.txt file that contains the word list to import has only one column. Therefore we modify the format file until it looks like:
1     SQLCHAR   0   100     "\r\n"   2     Term              Latin1_General_CI_AS

I will not describe the structure of the format file in detail, but notice that I have changed the number of columns from 12 to 1 and that I have removed all subsequent lines except for line number 2. That line describes the word list in our text file, that must be imported into the Terms column. Each word terminates with an end of line (\r\n).

The format file must end with and empty line (containing only a line feed), otherwise SQL Server Management Studio will cancel the import with this error message:
Msg 4862, Level 16, State 1, Line 1
Cannot bulk load because the file "terms.fmt" could not be read. Operating system error code (null).

No comments:

Post a Comment