MSSQL forensics (4) - LOB data structure

In my previous article, I have described how to identify and read deleted records in data page. However, we haven't seen binary data(JPG) in the page. This time I have dived into tracking such large data.

Now we have 2 sample database as follows:

  1. 4n6ist_sample.zip (4n6ist_sample.mdf): inserted 3 records then deleted 1 record
  2. 4n6ist_sample2.zip (4n6ist_sample2.mdf): inserted 3 records, deleted 1 record then inserted 1 record

In this article, I will use 4n6ist_sample.mdf mainly and cover 4n6ist_sample2.mdf at last section.

DBCC PAGE & Textpointer

In MSSQL, relatively large data is represented as LOB (Large Object).

Pictures table in sample database has data column whose data type is image. I inserted sample JPG contents into the column. At least, these contents are not stored at data page (page 368).

We can utilize DBCC PAGE to understand structures of data column.

Paul Randal has mentioned the syntax for DATA PAGE:

dbcc page ( {‘dbname’ | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])
...
The printopt parameter has the following meanings:
 - 0 – print just the page header
 - 1 – page header plus per-row hex dumps and a dump of the page slot array 
(unless its a page that doesn’t have one, like allocation bitmaps)
 - 2 – page header plus whole page hex dump
 - 3 – page header plus detailed per-row interpretation

"filenum" parameter is always 1 when a database consists of only one file. "printopt=3" is useful to know values of each column. Here is output of first record at page 368.

DATA_PAGE_368_3

From column 1 to 4, I have already mentioned in previous articles. This time I focus on column 5. This data column has been interpreted as [Textpointer], and shown 2 values "TextTimeStamp" and "RowId". I tried to decode the value of TextTimeStamp, but I think the value doesn't indicate date/time information, it looks like unique id.

I looked up the means of RowId on the Internet and found a page. It mentions that Row ID specifies the physical location. The location is usually specified as FileNum:PageNum:SlotNum. In MSSQL, "slot" and "record" has the same meaning.

Text Page (Type 3 & 4)

Then, RowId (1:264:1) indicates second record of page 264 in 4n6ist_sample.mdf because SlotNum starts from 0. Page header shows that page type is 3 as follows.

Page264_Header

Page type 3 and 4 are categorized as text page. Let's see second record of the page with DBCC PAGE.

Page264_Header

The output shows that the record is Type: 5 (LARGE_ROOT_YUKON). This is not page type, but record type. It seems that record type 5 manages pointers of child nodes.

In this case, next pointer is first record (Slot 0) of page 265 and data length is 215,265. Page header shows that page type of page 265 is 4.

Page265_Header

The following shows first record of page 265 with DBCC PAGE.

Page265_Record1

The output shows that the record type is Type: 2 (INTERNAL). It refers multiple pointers of child notes.

In this case, next pointer is first record (Slot 0) of page 280 and data length is 8,040. The following pointer is first record of page 281 and data length is 8,040. You can see the following pointers continue as well. Page header shows that these page type are 3.

Page280_Header

The following shows first record of page 280 with DBCC PAGE.

Page280_Record1

The output shows that the record type is Type: 3 (DATA). It contains binary data. In this case, we can confirm the data starts with JPG header (0xFFD8FF...).

LOB Structure

From RowId in data page to actual data in text page, the linkage is illustrated as follows:

LOB_Chain

I tried to reveal these structures with test data. Here are tables.

RowId(TextPointer) in data page
Offsets Bytes Description
0 4 TextTimeStamp (BlobId)
4 4 Unused
8 4 RowId (PageNum)
12 2 RowId (FileNum)
14 2 RowId (SlotNum)
Record header in text page
Offsets Bytes Description
0 1 StatusBit
1 1 Unused
2 2 Record length
4 8 BlobId (TextTimeStamp)
12 2 Record type
Record type 5 (LARGE_ROOT_YUKON) - Header
Offsets Bytes Description
0 2 MaxLinks
2 2 CurLinks
4 2 Level
6 4 Unused
Record type 5 (LARGE_ROOT_YUKON) - Body
Offsets Bytes Description
0 4 Size (Offset)
4 4 Page
6 2 FileID
8 2 Slot
Record type 2 (INTERNAL) - Header
Offsets Bytes Description
0 2 MaxLinks
2 2 CurLinks
4 2 Level
Record type 2 (INTERNAL) - Body
Offsets Bytes Description
0 4 Size (Offset)
4 4 Unused
8 4 Page
12 2 FileID
14 2 Slot
Record type 3 (DATA)
Offsets Bytes Description
0 14 Record header
14 Record length - 14 DATA

LOB Extraction

I have written a python script based on above tables to extract LOB data. Here is output when I provide RowId information (i.e. PageNum: 264, SlotNum: 1) of data column of first record at page 368.

Extraction_LOB_PIXNIO-1941113

Output file PIXNIO-1941113.jpg is valid JPG file within data column of first record.

PIXNIO-1941113.jpg

LOB recovery from deleted record

Finally, it's time to try to recover LOB from deleted record. In sample database, third record has been deleted and can be parsed with my script.

Page368_Deleted

The script for LOB extraction can extract when we provide appropriate page & slot number with option (i.e. -p 264 -s 3).

Page264_Deleted

Recovered Image (Horse)

How about 4n6ist_sample2.mdf? I try the scripts as well.

Page368_Deleted2

Output information is same with 4n6ist_sample.mdf.

Page264_Deleted2

Overwritten with new picture

The picture is last inserted one. I have checked relevant pages and found as follows:

  • Record in data page and are record type 5 in text page have been intact
  • Record type 2 in text page has been overwritten with new record
  • Record type 3 in text page has been partially overwritten with new record

The following figure shows the state and linkage in 4n6ist_sample2.mdf

LOB_Chain_Overwritten

It seems that recovery depends on page allocation algorithm with MSSQL. We have a chance to recover data partially even if many record inserted after deletion.

Conclusion

In this series, I have confirmed that deleted information can be recoverable in MSSQL. I have focused on MDF file and several scripts are available at github. I haven't covered yet but probably LDF file also be valuable in the perspective of forensics. LDF forensics will be next step.

I think Digital Forensic Challenge 2020 will open soon. I recommend that you participate if you would like to grow your digital forensic skills.