In my previous article, I have described the basics of MDF and parsed each page headers in MDF file. This time I will focus on a record structure in a data page.
Sample database (again)
In the series I prepared 4n6ist_sample.zip, which holds "pictures" table. Here is a schema.
In concrete terms, each column is represented as follows:
- id - Fixed-size integer (4 bytes)
- date - Fixed-size ASCII string (8 bytes)
- category - Fixed-size Unicode string (32 bytes)
- filename - Variable-size Unicode string (up to 510 bytes)
- data - Variable-size data (no specific limits)
In pictures table, there are 2 allocated records, and 1 deleted records which we are not able to see on usual tool like SQL Server Management Studio.
Data page
We have already confirmed with DBCC IND query that page 368 is data page (page type=1) in the database as follows.
Also we have checked with DBCC PAGE query that pminlen of the page is 48.
Record structure
Now let's dig into records in the page. I found a whitepaper "Forensic Analysis of a SQLServer 2005 Database Server" by Kevvie Fowler. He mentioned and explained about record structure. I have checked with 4n6ist_sample.mdf and created a table of my understanding.
Offset | Bytes | Description |
---|---|---|
0 | 1 | StatusBits |
1 | 1 | Unused |
2 | 2 | Offset to a location indicating number of columns |
4 | pminlen - 4 | Data of fixed-length columns |
pminlen | 2 | Total number of columns |
pminlen + 2 | 1 + x/8 | Null bitmap |
pminlen + 3 + x/8 | 2 | Number of variable-length columns |
pminlen + 3 + x/8 + 2 | 2 | End offset of first variable-length column |
... | ... | ... |
pminlen + 3 + x/8 + 2*y | 2 | End offset of last variable-length column |
pminlen + 4 + x/8 + 2*y | variable-length | Data of variable-length columns |
(x: Total number of columns, y: Number of variable-length columns)
Null bitmap consumes 1 bit for a column, therefore the size is 1 byte if the total number of columns is less than 8. "pminlen" is one of important values and 48 at this time.
I apply the table to records of page 368 in 4n6ist_sample.mdf, the page is located from offset 0x2E0000 to 0x2E2000 because of this:
Page No. 368 (0x170) * Page Size 8,192 (0x2000) = 3,014,656 (0x2E0000)
The size of a page header is 96 (0x60) bytes, it means between 0x2E0000 and 0x2E005F. Record structure starts at offset 0x2E0060. A region of first record are highlighted at the following figure.
First record (allocated)
The result of dissection is as follows. (Following offset means relative offset at 0x2E0060)
Offset | Bytes | Description | Data |
---|---|---|---|
0 | 1 | StatusBits | 0x30 |
1 | 1 | Unused | 0x00 |
2 | 2 | Offset to a location indicating number of columns | 48 (0x0030) |
4 | 44 | Data of fixed-length columns | 1 (id: 4 bytes) 20120815 (date: 8 bytes) Castles (category: 32 bytes) |
48 (0x30) | 2 | Total number of columns | 5 (0x0005) |
50 (0x32) | 1 | Null bitmap | 0x00 |
51 (0x33) | 2 | Number of variable-length columns | 2 (0x0002) |
53 (0x35) | 2 | End offset of first variable-length column | 85 (0x0055) |
55 (0x37) | 2 | End offset of second variable-length column | 101 (0x8065) * Most significant bit(MSB) will be ignored |
57 (0x39) | - | Data of variable-length columns | PIXNIO-194113 (filename: 28 bytes) Slot data for LOB (image: 16 bytes) |
We can see values of each column except image column on first record. I will cover the treatment of such large data next time or later. Continue the same procedure for the following records.
Second record (allocated)
The result of dissection is as follows. (Following offset means relative offset at 0x2E00C5)
Offset | Bytes | Description | Data |
---|---|---|---|
0 | 1 | StatusBits | 0x30 |
1 | 1 | Unused | 0x00 |
2 | 2 | Offset to a location indicating number of columns | 48 (0x0030) |
4 | 44 | Data of fixed-length columns | 2 (id: 4 bytes) 20191230 (date: 8 bytes) Fire Flames (category: 32 bytes) |
48 (0x30) | 2 | Total number of columns | 5 (0x0005) |
50 (0x32) | 1 | Null bitmap | 0x00 |
51 (0x33) | 2 | Number of variable-length columns | 2 (0x0002) |
53 (0x35) | 2 | End offset of first variable-length column | 85 (0x0055) |
55 (0x37) | 2 | End offset of second variable-length column | 101 (0x8065) * MSB will be ignored |
57 (0x39) | - | Data of variable-length columns | PIXNIO-2321940 (filename: 28 bytes) Slot data for LOB (image: 16 bytes) |
Third record (deleted)
The result of dissection is as follows. (Following offset means relative offset at 0x2E012A)
Offset | Bytes | Description | Data |
---|---|---|---|
0 | 1 | StatusBits | 0x30 |
1 | 1 | Unused | 0x00 |
2 | 2 | Offset to a location indicating number of columns | 48 (0x0030) |
4 | 44 | Data of fixed-length columns | 3 (id: 4 bytes) 20191215 (date: 8 bytes) Horses (category: 32 bytes) |
48 (0x30) | 2 | Total number of columns | 5 (0x0005) |
50 (0x32) | 1 | Null bitmap | 0x00 |
51 (0x33) | 2 | Number of variable-length columns | 2 (0x0002) |
53 (0x35) | 2 | End offset of first variable-length column | 85 (0x0055) |
55 (0x37) | 2 | End offset of second variable-length column | 101 (0x8065) * MSB will be ignored |
57 (0x39) | - | Data of variable-length columns | PIXNIO-2315661 (filename: 28 bytes) Slot data for LOB (image: 16 bytes) |
We can easily see the data of third record which has been deleted on the database. It's a simple case, but at least it implies that deletion process doesn't clear data or truncate a record immediately.
Pminlen
At page 368, a value of pminlen is 48. We have confirmed the value with DBCC IND query, which parses the page header and record at specific page. In fact, we can also calculate the value from table schema. The total bytes of the size of fixed-length columns in pictures table is 44:
- 4 (id) + 8(date) + 32(category) = 44
pminlen is 48 because the value includes first 4 bytes of record structure:
- pminlen = 1(StatusBits) + 1(Unused) + 2(Offset to a location indicating number of columns) + 44(Total bytes of the size of fixed-length columns) = 48
When we analyze MDF file and try to recover records from specific table, we have to identify which page is important for us (i.e. pages containing data records). One of hints is page type because data page holds data records (i.e. m_type=1). Secondly, it is pminlen which can be identified from table schema.
With these two value we can expect the page number on which we should focus on. From previous article, here is the output of mdf_parse_pageheader.py, which parses each page header in MDF.
When we filter by "m_type=1 & pminlen=48", the result is as follows:
We can narrow down total 1024 pages to 32 candidate pages and it includes number 368. This way is not smart yet, we may be able to narrow down candidates using other value in page header.
I will cover slot array and deleted record next time.