Notes on NTFS

The NTFS file system can be used with Windows NT and later versions of Windows. It offers some special features that also affect TreeSize.

The following sections describe some of these features and their impact on TreeSize.

Access Control Lists

Access to files and folders can be restricted. Users or groups can be granted or denied certain rights such as reading, writing, executing, or deleting. In this way, even administrators can be denied access to files and folders.

If an administrator tries to open a folder in Windows Explorer for which the owner has denied all other users read access, an “Access Denied” error message is displayed.

However, TreeSize is able to scan such folders if you are logged in as an administrator or as a user who has the right to perform backups. (This right can be changed under “Control Panel > Administrative Tools > Local Security Policy” and with the Windows user editor.)

File Based Compression

NTFS supports compression on an individual file basis. Files that are compressed on an NTFS volume can be read and written without first being decompressed by another program. Decompression happens automatically and transparently during the reading of the file. The file is compressed again when it is saved.

The space occupied by a compressed file is usually much smaller than its normal size. As a consequence, for folders that are partially or completely compressed, the allocated space reported by TreeSize may be smaller than the size reported for this folder.
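As a rough illustration of the difference between size and allocated space, the following Python sketch queries both values for a single file. It is not TreeSize's implementation: the helper names allocated_size and saved_fraction are made up for this example, and the ratio formula is an assumption. On Windows the sketch calls the Win32 function GetCompressedFileSizeW, which reports the space actually occupied by compressed and sparse files; on other systems it falls back to the block count from os.stat.

```python
import ctypes
import os
import sys


def allocated_size(path):
    """Return the number of bytes the file occupies on disk (hypothetical helper)."""
    if sys.platform == "win32":
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCompressedFileSizeW.argtypes = [
            wintypes.LPCWSTR,
            ctypes.POINTER(wintypes.DWORD),
        ]
        kernel32.GetCompressedFileSizeW.restype = wintypes.DWORD

        high = wintypes.DWORD(0)
        ctypes.set_last_error(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        # 0xFFFFFFFF signals an error only if the last error code is set too.
        if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return (high.value << 32) | low

    # Non-Windows fallback: st_blocks counts 512-byte units.
    return os.stat(path).st_blocks * 512


def saved_fraction(size, allocated):
    """Fraction of the logical size that is not allocated (assumed formula)."""
    if size == 0:
        return 0.0
    return 1.0 - allocated / size
```

For a file with a size of 1000 bytes that occupies only 250 bytes on disk, saved_fraction(1000, 250) yields 0.75. For small uncompressed files the allocated space can exceed the size because of cluster rounding, in which case the result is negative.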

TreeSize is able to show the compression ratio in an extra column on the “Details” tab. Additionally, it can show compressed files and folders in a different color. These features can be turned on or off in the Options dialog.

TreeSize is able to compress and decompress entire file system branches using the context menu.

In Windows 10, Microsoft introduced new transparent compression features in NTFS, designed to compact the files of the operating system, mainly DLL and EXE files. In contrast to the older file-based compression, these files are not flagged as compressed in their file attributes.

Sparse Files

Files which are large but only partially used are called sparse files.

Because the operating system does not allocate disk space for the unused parts of a sparse file, the file occupies less disk space than its actual size.

TreeSize treats sparse files like compressed files and also calculates the compression ratio for them.
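Whether a file is sparse or compressed is recorded in its file attributes. The Python sketch below decodes these two flags from an attribute bit mask. The numeric values are the documented Win32 FILE_ATTRIBUTE_* constants; the function name describe_attributes is hypothetical, and nothing here is meant to describe how TreeSize itself detects such files.

```python
# Documented Win32 file attribute flags (values from the Windows SDK).
FILE_ATTRIBUTE_SPARSE_FILE = 0x200
FILE_ATTRIBUTE_COMPRESSED = 0x800


def describe_attributes(attributes):
    """Return labels for the sparse/compressed flags set in the bit mask."""
    labels = []
    if attributes & FILE_ATTRIBUTE_SPARSE_FILE:
        labels.append("sparse")
    if attributes & FILE_ATTRIBUTE_COMPRESSED:
        labels.append("compressed")
    return labels
```

On Windows, the bit mask of a file is available in Python as os.stat(path).st_file_attributes. Files compacted by the newer Windows 10 compression would not be labeled as compressed by this check, since they do not carry the attribute flag.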

Alternate Data Streams (ADS)

In NTFS, a file consists of different data streams. One stream holds the security information (access rights and the like), another one holds the “real data” you expect to be in a file.

There may also be alternate data streams, which hold data in the same way the standard data stream does. These alternate data streams are hidden. That means that you can have a file with 1 byte in the official main data stream and several hundred MB in one or more alternate data streams.

The dir command, file managers, or Windows Explorer will show 1 byte as the size of this file, but it actually occupies much more space on your hard drive.

Figure: TreeSize main window with alternate data streams (_images/TreeSize-MainWindow_AlternateDataStreams.png)

TreeSize can detect alternate data streams and add their sizes to the allocated file size.
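The effect of alternate data streams on the reported size can be illustrated with a small Python sketch. It assumes that the streams of a file have already been enumerated, for instance on Windows with the Win32 functions FindFirstStreamW and FindNextStreamW, which report names in the form ":name:$DATA" and "::$DATA" for the unnamed main stream. The helper names and the sample data are hypothetical, and the sketch adds up stream sizes rather than allocated clusters.

```python
def split_stream_name(raw_name):
    """Split ':name:$DATA' into (name, type); the main stream has an empty name."""
    _, name, stream_type = raw_name.split(":", 2)
    return name, stream_type


def summarize_streams(streams):
    """Return (main stream size, total size of all streams) in bytes.

    `streams` is a list of (raw_name, size) tuples as a stream
    enumeration might deliver them.
    """
    main_size = sum(
        size for raw_name, size in streams if split_stream_name(raw_name)[0] == ""
    )
    total_size = sum(size for _, size in streams)
    return main_size, total_size


# Hypothetical file: 1 byte of visible data, 300 MB in a hidden stream.
example = [("::$DATA", 1), (":payload:$DATA", 300 * 1024 * 1024)]
main_size, total_size = summarize_streams(example)
```

Here main_size is the 1 byte that the dir command would show, while total_size also includes the 300 MB stored in the alternate data stream.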

Note

ADS may store information in the same cluster as the main data stream, so if a file has one or more ADS, this file does not necessarily allocate more disk space.

To get a more accurate allocated space for directory branches, you can enable the detection of alternate data streams in the TreeSize Options dialog.

This option is deactivated by default because querying the ADS takes some time and increases the overall time needed for a scan. You can search for files containing alternate data streams using the Custom File Search of TreeSize.

Automatic Data Deduplication

Windows Server 2012 and later offer a data deduplication feature. It splits files with partially identical content into so-called “chunks”, which are moved into the subfolder “System Volume Information\Dedup\ChunkStore” located on the corresponding NTFS partition.

After Windows has applied the deduplication, the original data is replaced by a pointer to the corresponding chunk in the ChunkStore directory. Once they have been deduplicated, two identical files require only half of the disk space they occupied before.

Since the original files now contain only a small pointer instead of the data, Windows reports a much smaller allocated disk space than before (for two identical files, the occupied disk space would be reported as “0 Byte”).
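The principle can be sketched in a few lines of Python. This is a simplified model, not the Windows implementation: the fixed chunk size, the use of SHA-256 as chunk identifier, and the names deduplicate and restore are assumptions chosen for illustration.

```python
import hashlib


def deduplicate(files, chunk_size=4):
    """Split file contents into chunks and store each distinct chunk once.

    `files` maps file names to bytes. Returns (store, pointers), where
    `store` maps chunk digests to chunk data and `pointers` maps each
    file name to the ordered list of digests that make up the file.
    """
    store = {}
    pointers = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # identical chunks are stored once
            digests.append(digest)
        pointers[name] = digests
    return store, pointers


def restore(name, store, pointers):
    """Rebuild the original content of a file from its chunk pointers."""
    return b"".join(store[digest] for digest in pointers[name])


# Two identical 8-byte files end up sharing the same two chunks.
files = {"a.bin": b"ABCDEFGH", "b.bin": b"ABCDEFGH"}
store, pointers = deduplicate(files)
stored_bytes = sum(len(chunk) for chunk in store.values())
```

In this model the two files together hold 16 bytes of content, but the chunk store holds only 8 bytes. The files themselves consist only of pointers, which corresponds to the very small allocated space reported for deduplicated files.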

To make TreeSize show the original file and folder sizes, simply switch the view mode from “Allocated Space” to “Size”. The “Allocated Space” shown in TreeSize is the disk space you would obtain by deleting the corresponding file.

Offline Files

Windows Server and some third-party tools and appliances offer a feature called “offline files”: files that have not been used for a long time are automatically moved to cheaper and slower storage, and a small stub file remains at each original location.

Usually TreeSize reports the allocated space of such a stub file correctly, which is often only the size of one file system cluster.

There is, however, one situation in which the allocated space for stub files may not be reported correctly. If TreeSize runs into Access Denied errors, it uses Windows API functions intended for backup software so that it can still scan those parts of the file system and provide values for their size and allocated space.

We have seen some appliances that, in this case, reported the full file size as the allocated space of the stub files, most likely because this is the size the files would occupy in a backup.

To avoid this, ensure that the user who runs the scans has full read access to the scanned file system.