Monday, December 15, 2008

Recommended mailbox size and Exchange Databases

Below are the classic questions I have been asked at least once at every consulting site I have walked into so far. I do understand why, from the customer's standpoint, these are the questions to ask an Exchange SME.

In my experience, most of the contracts I have dealt with did not follow the MS best practices. At the ones that did, there was very little work to do, or there was no reason for me to be there.

Here is a great tip for those who wish to implement MS best practices: use the magic way of locating this information and "Google it". Most of the time, Google will take you to the right article or TechNet page faster than any other method I know of. I have also included some basic foundation information that I think every Exchange administrator should know in order to understand the whole concept better.

Optimizing Database Access

Exchange databases should not be larger than 50 to 100 GB, according to the best practices explained in the MS article quoted below. The link to the original article is also included.

  • For servers supporting large information stores (50 to 100 GB), it is especially important to follow these guidelines:
  • Place transaction log files and database files on different disks.
  • Dedicate a high performance spindle to the transaction logs.
  • Use a dedicated partition for the databases. Experience shows that as servers get bigger, the database partition starts to use a lot of I/O. This is especially true for RAID 5 partitions because of the added overhead. As a result, it's a good idea to only put database files on the database partition.
  • Put the MTA database and tracking logs on the system disk (if you don't have a spare spindle), not the database partition.

    The reality behind the size of the database is that the bigger the database is, it gets harder for application to handle. The same goes for the relationship between I/O, CPU and memory. The guidelines listed above are meant to provide the best performance. The RAID level also determines how much redundancy a given configuration provides. The key to deciding on a RAID level is understanding what type of operations the application performs. For instance, transaction logs are written sequentially and are write intensive, so the RAID level for the log disks has to be chosen with those writes in mind. Database files see random reads and writes, and the same reasoning applies to their RAID level. So keep in mind that the fastest and most redundant setup comes from considering these factors and implementing accordingly.

The following sample disk configuration is recommended for typical large servers.

  • Mirror set 1

    System disk. Includes binaries, swap file, MTA database.

  • Mirror set 2

    Transaction log files only.

  • RAID 5 partition

    Exchange information store and directory databases only.
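The sample layout above can be written down and sanity-checked with a small script. This is only a sketch in Python: the `Volume` type, the role names, and the `check_layout` helper are hypothetical names made up for illustration and are not part of any Exchange tooling. It checks three of the guidelines quoted above: logs and databases on different disks, a dedicated disk for the logs, and nothing else on the database partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str          # label for the disk set, e.g. "Mirror set 1"
    raid: str          # "mirror" or "raid5" in this sketch
    roles: frozenset   # what is stored on the volume


# Role names are made up for this sketch.
LOGS = "transaction_logs"
DATABASES = "databases"


def check_layout(volumes):
    """Return a list of guideline violations (an empty list means none found)."""
    problems = []
    for vol in volumes:
        if LOGS in vol.roles and DATABASES in vol.roles:
            problems.append(f"{vol.name}: logs and databases share a disk")
        if DATABASES in vol.roles and len(vol.roles) > 1:
            problems.append(f"{vol.name}: database partition holds other files")
        if LOGS in vol.roles and len(vol.roles) > 1:
            problems.append(f"{vol.name}: log spindle is not dedicated")
    return problems


recommended = [
    Volume("Mirror set 1", "mirror", frozenset({"binaries", "swap_file", "mta_database"})),
    Volume("Mirror set 2", "mirror", frozenset({LOGS})),
    Volume("RAID 5 partition", "raid5", frozenset({DATABASES})),
]

single_disk = [Volume("Disk 0", "mirror", frozenset({"binaries", LOGS, DATABASES}))]

print(check_layout(recommended))   # no violations expected
print(check_layout(single_disk))   # all three checks fire
```

A check like this says nothing about performance by itself; it only catches a layout that ignores the separation the article asks for.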

What is the recommended mailbox size per user, and what industry best practices are people already using?

  • What is the recommended size for Exchange databases?

    50 to 100 GB per database.

  • How long does an offline Exchange defragmentation take?

    On average, it takes about one hour to defragment 5 to 10 GB.

  • Why perform an Exchange offline defragmentation?

    If there is a lot of whitespace (space inside the database file that is free but has not been returned to the disk) in the Exchange databases, the administrator might consider performing an offline defragmentation. Remember that offline means "outage", so it has to be planned with business owners and end users.

    There is no need to perform an offline defrag if you are running the Enterprise version of Exchange. Simply create a new database, move the users into it, and delete the old database.

  • Is your mailbox big and causing performance issues on the Exchange server?

    It's not the size of the mailbox that impacts performance - it is the number of items in the folder or folders that are being accessed on the server.

PS: Please remember these numbers are estimates; the actual CPU, disks, and amount of memory in the system will affect them.

The information below is taken from the same article. If you pay attention to the next few lines, you will have a good foundation for understanding how Exchange writes data to the database.

When Exchange is running, technically the databases are inconsistent.

While Exchange is running normally, the databases are technically inconsistent. Why is that so?

The Exchange database engine caches the disk in memory by swapping 4 KB chunks of data, called pages, in and out of memory. It updates the pages in memory and takes care of writing new or updated pages back to the disk. This means that when requests come into the system, the database engine can buffer data in memory so it doesn't have to constantly go to disk. This makes the system more efficient because writing to memory is "cheaper" (or faster) than writing to disk. When users make requests, the database engine starts loading the requests into memory and marks the pages as "dirty" (a dirty page is a page in memory that has been written with data). These dirty pages are then later written to the information store databases on disk.

Although caching data in memory is the fastest and most efficient way to process data, it means that while Exchange is running, the information on disk is never completely up-to-date. The latest version of the database is in memory, and since many changes in memory haven't made it onto disk yet, the database and memory are out of sync.
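To make the dirty-page idea concrete, here is a toy model in Python. It is a sketch under my own assumptions, not how the Exchange database engine is implemented: the `PageCache` class and its method names are hypothetical, and a plain dictionary stands in for the database file on disk.

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages, as described above


class PageCache:
    """Toy model of a database page cache; not the real engine."""

    def __init__(self, disk):
        self.disk = disk      # page number -> bytes, stands in for the database file
        self.memory = {}      # pages currently cached in memory
        self.dirty = set()    # cached pages changed since they were last written

    def read(self, page_no):
        if page_no not in self.memory:   # cache miss: load the page from disk
            self.memory[page_no] = self.disk.get(page_no, bytes(PAGE_SIZE))
        return self.memory[page_no]

    def write(self, page_no, data):
        # Pad or trim the data to exactly one page, then mark the page dirty.
        self.memory[page_no] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]
        self.dirty.add(page_no)          # the copy on disk is now stale

    def flush(self):
        for page_no in sorted(self.dirty):   # write dirty pages back to disk
            self.disk[page_no] = self.memory[page_no]
        self.dirty.clear()

    def in_sync(self):
        return not self.dirty


disk = {}
cache = PageCache(disk)
cache.write(7, b"new mail item")
print(cache.in_sync())   # False: the change exists only in memory
cache.flush()
print(cache.in_sync())   # True: the disk has caught up
```

Between `write` and `flush` the "disk" is behind memory, which is the inconsistency the article describes.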

Why log files are very important (many people think databases are the most important)

  • Most people naturally think that the database files are the most important aspect of data recovery. But, transaction log files are actually more important because they reflect what will happen with the data, not what has happened.
  • Transaction log files are a sequence of files whose purpose is to keep a secure copy on disk of volatile data in memory, so the system can recover in the event of a failure.
  • When a change is made to the database, the database engine updates the data in memory and synchronously writes a record of the transaction to the log file that tells it how it could redo the transaction in case the system fails.
  • Logically you can think of the data as moving from memory to the log file to the database on disk, but what actually happens is that data moves from memory to the database on disk.
  • To keep track of the data that hasn't yet been written to the database file on disk, the database engine maintains a checkpoint file called Edb.chk for every log file sequence. The checkpoint file is a pointer in the log sequence that maintains the status between memory and the database file on disk. It indicates the point in the log file where the information store needs to start the recovery from if there's been a failure. In fact, the checkpoint file is essential for efficient recovery because if it didn't exist, the information store would have to attempt recovery by starting from the beginning of the oldest log file it found on disk and then check every page in every log file to determine whether it had already been written to the database.
  • Circular logging: don't use it, and here is why

    It eliminates your ability to recover all changes since your last backup if your information store is corrupted due to a hardware failure. Remember that the logs duplicate the real data being written to the database; they are Exchange's insurance if the information in memory vanishes. When you turn on circular logging, Exchange discards log files once their transactions have been written to the database, so you can no longer replay the changes made since your last backup. You would be left with only your last full backup to get back to business. After changing the circular logging setting, it is advised to take a full backup immediately.
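The relationship between memory, the log, the database, and the checkpoint described in the points above can be sketched in a few lines of Python. This is a toy model under my own assumptions: `TinyStore` and its methods are hypothetical names, a list stands in for the transaction log files, and an integer stands in for Edb.chk. It is not the real on-disk format.

```python
class TinyStore:
    """Toy write-ahead log with a checkpoint; not the real on-disk format."""

    def __init__(self):
        self.log = []          # stands in for the transaction log files on disk
        self.database = {}     # stands in for the database file on disk
        self.checkpoint = 0    # stands in for Edb.chk: earlier log records are in the database
        self.memory = {}       # volatile cache, lost on a crash

    def update(self, key, value):
        self.log.append((key, value))   # 1. record the change in the log first
        self.memory[key] = value        # 2. then apply it in memory

    def flush(self):
        self.database.update(self.memory)   # write cached changes to the database
        self.checkpoint = len(self.log)     # advance the checkpoint

    def crash(self):
        self.memory = {}                    # everything in memory vanishes

    def recover(self):
        # Replay only the log records written after the checkpoint.
        replayed = self.log[self.checkpoint:]
        for key, value in replayed:
            self.database[key] = value
        self.checkpoint = len(self.log)
        return len(replayed)


store = TinyStore()
store.update("inbox/1", "hello")
store.flush()
store.update("inbox/2", "world")   # logged, but not yet in the database
store.crash()
print(store.recover())             # 1 record replayed
print(store.database)
```

Because `recover` starts at the checkpoint, it does not have to re-read the whole log, which is the efficiency point made about Edb.chk.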

When offline defrag runs, it creates a new database file and then copies all the data in the old file to the new file.

This can take a lot of time. On average, it takes about one hour to defragment 5 to 10 GB.
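When planning the outage, the rule of thumb above turns into a simple range. The helper below is a minimal Python sketch; `defrag_hours` is a hypothetical name, and the 5 and 10 GB per hour figures are just the averages quoted here, so real hardware may land outside the range.

```python
def defrag_hours(database_gb, gb_per_hour_low=5, gb_per_hour_high=10):
    """Return (best_case_hours, worst_case_hours) for an offline defrag."""
    if database_gb < 0:
        raise ValueError("database size cannot be negative")
    return database_gb / gb_per_hour_high, database_gb / gb_per_hour_low


for size_gb in (50, 100):
    best, worst = defrag_hours(size_gb)
    print(f"{size_gb} GB: roughly {best:.0f} to {worst:.0f} hours of outage")
```

At these rates a 100 GB database means roughly 10 to 20 hours offline, which is one reason to plan the window carefully or to move the mailboxes to a new database instead.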

Recommended Mailbox Size Limits


Oz Ozugurlu, MVP (Exchange)


MCSE (M+, S+), MCDST, Security+, Server+, Project+



Mike Crowley said...

"The reality behind the size of the database is that the bigger the database is, it gets harder for application to handle."
This is not true. The only reason Microsoft recommends the 100GB limit on Exchange 2007 databases is because of the cost and technology available for backup/restore. They have said if cost and technology for backup were to improve they will adjust their size recommendation. This is also why they recommend up to 200GB for CCR - because you can back up off of the passive node, which can usually run during the day. Microsoft says that Exchange 2007 runs fine with multi-terabyte databases, but it's not recommended for the above-mentioned reasons.

Oz Ozugurlu said...

Mike, as always, thanks for the comments.

The MS numbers are only guidelines taken from some of their testing. The reality behind these numbers will change based on various factors.
The logic behind my statement is clear: if you have a 50 GB database versus a 100 GB one, which takes less processing, memory, CPU, and I/O?
You will realize the smaller database is easier for an application to handle, and this is true for Exchange as well. Some of the MVPs I know are happily running databases over 200 GB after all. These databases can also be restored very quickly (snapshots).
Having the Enterprise version of Exchange and making one 200 GB database instead of creating multiple small databases is a BAD idea. If you are running the Standard version, you have no choice anyway (- :


Anonymous said...

Another reason not to go past 100gig is if you were to run into database corruption issues and had to perform eseutil /p /d and isinteg on a 300gig database. It would take FOREVER! Don't go past 100gig. lol