MSSQL forensics (1) - MDF fundamentals

Last year I participated in the Digital Forensics Challenge 2019 (DFC2019) and enjoyed it a lot.

The organizers prepared a lot of exciting questions. In particular, I spent a fair amount of time on the challenges around Microsoft SQL Server (MSSQL). During the challenge I looked for useful tools and articles on MSSQL forensics, but there seems to be little information on the Internet.

That's why I am writing a series on what I have learned about MSSQL from a forensic perspective.

Scope

MSSQL has a lot of components. This series covers the following:

  • MDF file structure
  • Page header
  • Data page
  • Recover deleted records
  • Recover large objects (LOB)

Sample

Unfortunately, the DFC2019 questions and datasets are no longer available because the challenge has ended, so I created a sample MSSQL database.

Download

The sample database was created as follows.

Software

  • Windows Server 2016 Standard Edition
  • SQL Server 2017
  • SQL Server Management Studio v18.2 (SSMS)

Database

  • Database name: 4n6ist_sample
    [Figure: creating the new database]
  • Table name: pictures
  • Schema:
    [Figure: schema of the pictures table]
  • Records: inserted 3 records, then deleted 1 record (id=3) with the following query:
    [Figure: sample query]

After executing the query, we can see 2 allocated records.

[Figure: SELECT result from the pictures table]

For reference, the binary values in the data column of these two records contain the following JPG pictures:

[Figure: the two pictures stored in the allocated records]
(I got these pictures from PIXNIO, which provides public domain images.)

Now my goal is to recover the deleted record (i.e. id=3) as far as possible.

The sample database consists of two files, "4n6ist_sample.mdf" and "4n6ist_sample_log.ldf". Here I focus only on "4n6ist_sample.mdf".

MDF File Structure

Paul Randal has already covered the MSSQL page structure in this article. Here is the big picture.

[Figure: big picture of the MDF structure]

In summary, what we should understand is:

  • An MDF file consists of multiple pages
  • The size of a page is 8 KB (8,192 bytes)
  • A page consists of a header, records and a slot array
    • Header: the first 96 bytes of the page; describes the page type, the number of records, the free space in the page, and so on.
    • Records: vary depending on the page type, but generally data records hold the actual data of a table.
    • Slot array: manages the position of each record. Each 2-byte entry holds the offset of one record on the page; the array starts at the end of the page and grows backwards.
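To make the slot array concrete, here is a minimal Python sketch (my own illustration, not part of any SQL Server tooling) that reads record offsets out of a raw 8 KB page. It assumes the slot count is already known (in practice it comes from the header's m_slotCnt) and that each entry is a little-endian unsigned 16-bit value; the function name slot_offsets is hypothetical.

```python
import struct
from typing import List

PAGE_SIZE = 8192  # every MDF page is 8 KB

def slot_offsets(page: bytes, slot_count: int) -> List[int]:
    """Return the record offset stored in each slot-array entry.

    Slot 0 is the last 2 bytes of the page, slot 1 the 2 bytes before it,
    and so on (the array grows backwards from the end of the page).
    Entries are assumed to be little-endian unsigned 16-bit integers.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full 8192-byte page")
    offsets = []
    for slot in range(slot_count):
        entry_pos = PAGE_SIZE - 2 * (slot + 1)  # where this slot's entry lives
        (record_offset,) = struct.unpack_from("<H", page, entry_pos)
        offsets.append(record_offset)
    return offsets


# Synthetic page: slot 0 -> offset 96 (right after the header), slot 1 -> 130.
demo = bytearray(PAGE_SIZE)
struct.pack_into("<H", demo, PAGE_SIZE - 2, 96)
struct.pack_into("<H", demo, PAGE_SIZE - 4, 130)
print(slot_offsets(bytes(demo), 2))  # [96, 130]
```

Reading from the end of the page backwards mirrors how the array grows, so slot N is always at a fixed position regardless of how many records the page holds.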

Page header

We can get information about the page header with the DBCC IND and DBCC PAGE commands. Here is example output in SSMS.

DBCC IND shows a summary of all pages associated with the specified table.

DBCC IND('database name', 'table name', -1)
[Figure: DBCC IND output]

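DBCC PAGE shows detailed information about the specified page. Its arguments are the database name, the file ID, the page number and the print option.

DBCC PAGE('4n6ist_sample', 1, 368, 0)
[Figure: DBCC PAGE output]
If we set the fourth parameter (the print option) to 1, as in "DBCC PAGE('4n6ist_sample', 1, 368, 1)", we can also see the record area as a hex dump.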
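Because every page is exactly 8,192 bytes, the page number given to DBCC PAGE also tells us where that page sits in the data file: page N starts at byte N * 8192. The sketch below reads a page directly from a copy of the MDF file (SQL Server keeps the file open while the database is online), assuming the page belongs to file ID 1, the primary data file we are examining. The helper names page_offset and read_page are hypothetical.

```python
PAGE_SIZE = 8192  # every MDF page is 8 KB

def page_offset(page_id: int) -> int:
    """Byte offset of a page inside its data file (page numbers start at 0)."""
    return page_id * PAGE_SIZE

def read_page(path: str, page_id: int) -> bytes:
    """Read one raw page from a copy of the MDF file, no SQL Server needed."""
    with open(path, "rb") as f:
        f.seek(page_offset(page_id))
        page = f.read(PAGE_SIZE)
    if len(page) != PAGE_SIZE:
        raise ValueError(f"page {page_id} lies beyond the end of {path}")
    return page

# The page examined with DBCC PAGE('4n6ist_sample', 1, 368, ...) starts here:
print(page_offset(368))  # 3014656
```

With this, read_page("4n6ist_sample.mdf", 368) should return the same page that DBCC PAGE displays, which makes it easy to compare the raw bytes with the SSMS output.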

The page header fields we are interested in are:

  • m_type: Page type, e.g. 1 (data page), 3 (text mix page) and 4 (text tree page)
  • pminlen: Size of the fixed-length portion of the record
  • m_slotCnt: Number of records (slots) on the page
  • m_freeCnt: Number of free bytes on the page
  • m_freeData: Offset to the first byte after the end of the last record
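A minimal sketch of pulling these fields out of a raw page follows. The byte offsets (1 for m_type, 14 for pminlen, 22 for m_slotCnt, 28 for m_freeCnt, 30 for m_freeData) are my reading of Mark S. Rasmussen's description of the header layout, so treat them as assumptions and cross-check them against DBCC PAGE output. parse_header and PAGE_TYPES are hypothetical names.

```python
import struct
from typing import Dict

HEADER_SIZE = 96
# Only the page types discussed in this post.
PAGE_TYPES = {1: "data page", 3: "text mix page", 4: "text tree page"}

def parse_header(page: bytes) -> Dict[str, int]:
    """Extract selected header fields from a raw page.

    Offsets are assumptions based on Mark S. Rasmussen's description of
    the 96-byte header; multi-byte values are read as little-endian.
    """
    if len(page) < HEADER_SIZE:
        raise ValueError("need at least the 96-byte page header")

    def u16(offset: int) -> int:
        # Unsigned 16-bit little-endian integer at the given header offset.
        return struct.unpack_from("<H", page, offset)[0]

    return {
        "m_type": page[1],       # single byte at offset 1
        "pminlen": u16(14),
        "m_slotCnt": u16(22),
        "m_freeCnt": u16(28),
        "m_freeData": u16(30),
    }
```

For a real page, looking up parse_header(page)["m_type"] in PAGE_TYPES should give the same page type that DBCC PAGE reports; if the numbers disagree, the assumed offsets are the first thing to check.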

As I understand them, m_freeCnt and m_freeData can be illustrated as follows:

[Figure: m_freeData and m_freeCnt]
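The relationship can also be written as simple arithmetic. The gap between m_freeData and the start of the slot array is 8192 - 2 * m_slotCnt - m_freeData bytes. Assuming m_freeCnt counts all free bytes on the page, any amount above that gap would be free space scattered elsewhere on the page (for example, space left behind by deleted records and not yet compacted), which is where remnants of deleted data might be found. This is a hedged sketch; the function names are hypothetical.

```python
PAGE_SIZE = 8192
SLOT_ENTRY_SIZE = 2  # each slot-array entry is 2 bytes

def contiguous_free(m_free_data: int, m_slot_cnt: int) -> int:
    """Bytes between the end of the last record and the start of the slot array."""
    return PAGE_SIZE - SLOT_ENTRY_SIZE * m_slot_cnt - m_free_data

def scattered_free(m_free_cnt: int, m_free_data: int, m_slot_cnt: int) -> int:
    """Free bytes outside the contiguous gap (assumes m_freeCnt counts all free space)."""
    return m_free_cnt - contiguous_free(m_free_data, m_slot_cnt)

# An empty page: m_freeData points right after the 96-byte header, no slots yet.
print(contiguous_free(96, 0))  # 8096
```

A non-zero scattered_free value for a page is only a hint, not proof, that something was removed from it; the record and slot contents still have to be examined.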

In addition to Paul's article, Mark S. Rasmussen has described the page header structure in detail. I have written a Python script that parses MDF page headers, so an MDF file can be analyzed without a SQL Server environment.
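The following is not that script, but a much smaller, independent sketch of the same idea: walk an MDF file page by page and list the pages whose m_type is 1, 3 or 4. It assumes m_type is the byte at offset 1 and m_slotCnt the little-endian 16-bit value at offset 22 of each page header (offsets taken from Mark S. Rasmussen's description; verify them against DBCC PAGE). iter_pages and find_pages are hypothetical names.

```python
import struct
from typing import BinaryIO, Iterator, List, Tuple

PAGE_SIZE = 8192

def iter_pages(f: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (page_id, raw_page) for every complete 8 KB page in the stream."""
    page_id = 0
    while True:
        page = f.read(PAGE_SIZE)
        if len(page) < PAGE_SIZE:  # end of file (or a truncated last page)
            return
        yield page_id, page
        page_id += 1

def find_pages(f: BinaryIO, wanted_types=(1, 3, 4)) -> List[Tuple[int, int, int]]:
    """List (page_id, m_type, m_slotCnt) for pages whose m_type is of interest."""
    hits = []
    for page_id, page in iter_pages(f):
        m_type = page[1]  # assumed offset of m_type
        if m_type in wanted_types:
            slot_cnt = struct.unpack_from("<H", page, 22)[0]  # assumed m_slotCnt offset
            hits.append((page_id, m_type, slot_cnt))
    return hits
```

To use it, open a copy of 4n6ist_sample.mdf with open(path, "rb") and pass the file object to find_pages. The data pages that DBCC IND lists for the pictures table should appear among the results, although pages of other tables and system objects will show up too, since this sketch does not filter by object.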

mdf_parse_pageheader

Next time, I will cover how to interpret this output, as well as the data page structure.