Summary: Although partitioning tables and indexes has always been a design tactic chosen to improve performance and manageability in larger databases, Microsoft SQL Server 2005 has new features that simplify the design. This whitepaper describes the logical progression from manually partitioning data by creating your own tables to the preliminary features, which enabled partitioning through views in SQL Server 7.0 and SQL Server 2000, to the true partitioned table features available in SQL Server 2005. In SQL Server 2005, the new table-based partitioning features significantly simplify design and administration of partitioned tables while continuing to improve performance. The paper's primary focus is to detail and document partitioning within SQL Server 2005 – giving you an understanding of why, when and how to use partitioned tables for the greatest benefit in your VLDB (Very Large Database). Although primarily a VLDB design strategy, not all databases start out large. SQL Server 2005 provides flexibility and performance while significantly simplifying the creation and maintenance of partitioned tables. Review this document to get detailed information about why you should consider partitioned tables, what they can offer and finally how to design, implement, and maintain partitioned tables.

Scripts from this Whitepaper:

The scripts and examples used in the code samples for this whitepaper can be found in the SQLServer2005PartitionedTables.zip file.

The History of Partitioning. 2

Partitioning Objects manually in releases before SQL Server 7.0. 2

Partitioned Views in SQL Server 7.0. 3

Partitioned Views in SQL Server 2000. 4

Partitioned Tables in SQL Server 2005. 4

Definitions and Terminology. 10

Range Partitions 10

Defining the Partitioning Key. 11

Index Partitioning. 12

Special Conditions for Partitions – Split, Merge and Switch12

Steps for Creating Partitioned Tables 16

Determine IF Object should be partitioned. 16

Determine Partitioning Key and Number of Partitions 17

Determine IF Multiple Filegroups should be used. 17

Create filegroups 17

CREATE PARTITION FUNCTION for a Range Partition. 18

CREATE PARTITION SCHEME. 20

Create the partitioned table. 21

Create Indexes: Partitioned or Not? 22

Putting it all Together: Case Studies 22

Range Partitioning - Sales Data. 22

Joining Partitioned Tables 31

Sliding Window Scenario. 35

List Partitioning – Regional Data. 41

Summary. 43

Why do you need Partitioning?

Before one can talk about how to implement partitioning and the features of partitioning one must first understand the need; what are partitions and why might someone consider using them? When you create tables, you design those tables to store information about an entity – i.e. customers or sales. Each table should have attributes that describe only that entity and for customers and sales the historical premise is that all of your customers and all of your sales go into their respective tables. While a single table for each entity is the easiest to design and understand, it may not be the best for performance, scalability and manageability especially as the table grows large. Partitioning can provide benefits for both large tables (and/or their indexes) and tables which have varying access patterns. More specifically, through partitioning practices large tables have better scalability and manageability and the use of tables that have changing data is simplified when adding or deleting large "chunks" (or ranges) of data.

So what constitutes a large table? The idea of VLDB (Very Large Database) is that the total size of the database is measured in hundreds of gigabytes or even terabytes but the term does not necessarily specify individual table sizes. A large table is one that does not perform as desired or one where the maintenance costs have gone beyond pre-defined maintenance periods. Furthermore, a table can be considered large if one user’s activities significantly affect another or if maintenance operations affect other user’s abilities. In effect, this even limits availability. Even though the server is available, how can you consider your database available when the sales table’s performance is severely degraded or even inaccessible during maintenance for 2 hours per day, per week, or even per month? In some cases, periodic downtime is acceptable yet it is often possible to avoid or minimize downtime through better design.

A table whose access patterns vary may also be considered large when sets (or ranges) of rows have very different usage patterns. Although usages patterns may not always vary (and this is not a requirement for partitioning), when usage patterns do vary there can be additional gains. Again, thinking in terms of sales, the current month's data is read-write while the previous month's data (and often the larger part of the table) is read-only. Large tables where the data usage varies or large tables where the maintenance overhead is overwhelming can limit the table's ability to respond to varied user requests, in turn limiting both availability and scalability. Moreover, especially when large sets of data are being used in different ways maintenance operations can end up routinely maintaining static data. Performing maintenance operations on data which does not truly need it – is costly. The costs can be seen in performance problems, blocking problems, backups (space, time and operational costs) as well as negatively impacting the overall scalability of the server.

Furthermore, if a large table exists on a system with multiple CPUs, partitioning the table can lead to better performance through parallel operations. Large-scale operations across extremely large data sets – typically many million rows – can benefit by performing multiple operations against individual subsets in parallel. A simple example of performance gains over partitions can be seen in previous releases with aggregations. For example, instead of aggregating a single large table, SQL Server can work on partitions independently and then aggregate the aggregates. In SQL Server 2005, joins can benefit directly from partitioning; SQL Server 2000 supported parallel join operations on subsets yet needed to create the subsets on the fly. In SQL Server 2005, related tables (i.e. Order and OrderDetails) that are partitioned to the same partitioning key and the same partitioning function are said to be aligned. When the optimizer detects that two partitioned and aligned tables are joined, SQL Server 2005 can choose to join the data which resides on the same partitions first and then combine the results. This allows SQL Server 2005 to more effectively use multiple-CPU machines.

So how can partitioning help? Where tables and indexes become very large, partitioning can help by splitting large amounts of data into smaller more manageable chunks (i.e. partitions). The type of partitioning described in this paper is termed horizontal partitioning. With horizontal partitioning, large chunks of rows will be stored in multiple separate partitions. The definition of the partitioned set is customized, defined, and managed – by your needs. Partitioning in SQL Server 2005 allows you to partition your tables based on specific data usage patterns using defined ranges. Finally, SQL Server 2005 offers numerous options for the long-term management of partitioned tables and indexes by adding complementary features designed around the new table and index structure.

The History of Partitioning

The concept of partitioning is not new to SQL Server. In fact, forms of partitioning have been possible in every release. However, without features to aid in creating and maintaining your partitioning scheme, partitioning has often been cumbersome and underutilized. Typically, the design is misunderstood by users and developers and the benefits are diminished. However, because of the significant performance gains inherent in the concept, SQL Server 7.0 began improving the features enabling forms of partitioning through partitioned views and SQL Server 2005 now offers the largest advances through partitioned tables.

Partitioning Objects manually in releases before SQL Server 7.0

In SQL Server 6.5 and earlier, partitioning had to be part of your design as well as built into all of your data access coding and querying practices. By creating multiple tables and then managing access to the correct tables through stored procedures, views or client applications you could often improve performance for some operations but at the cost of complexity of design. Each user and developer needed to be aware of and properly reference the correct tables. Each partition was created and managed separately and views were used to simplify access; however, this solution yielded few performance gains. When a UNIONed view existed to simplify user/application access, the query processor had to access every underlying table in order to determine if data was needed for the result-set. If only a limited subset of those underlying tables was needed, then each user and developer needed to know the design in order to reference only the appropriate table(s).

Partitioned Views in SQL Server 7.0

The challenges faced by manually creating partitions in releases prior to SQL Server 7.0, were primarily related to performance. While views simplified application design, user access and query writing, they did not offer performance gains. With the release of SQL Server 7.0, views were combined with constraints to allow the query optimizer to remove irrelevant tables from the query plan (i.e. partition elimination) and significantly reduce overall plan cost when a UNIONed view accessed multiple tables.

In Figure 1, examine the YearlySales view. Instead of having all sales within one single, large table, you could define twelve individual tables (SalesJanuary2003, SalesFebruary2003, etc¼) and then views for each quarter as well as a View for the entire year – YearlySales.

SQL Server 2005
Partitioned Tables and Indexes

Table of Contents

Why do you need Partitioning?

The History of Partitioning

Partitioning Objects manually in releases before SQL Server 7.0

Partitioned Views in SQL Server 7.0

SQL Server 2005Partitioned Tables and Indexes

Table of Contents

Why do you need Partitioning?

The History of Partitioning

Partitioning Objects manually in releases before SQL Server 7.0

Partitioned Views in SQL Server 7.0

SQL Server 2005
Partitioned Tables and Indexes