The mystery of the disappearing file

 

Symptoms:

Imagine a case where one or more files on a network share keep disappearing. Usually it happens in one specific folder. If the disappearing files or subfolders are moved somewhere else within the same network share, they stay intact.

Some older files and subfolders on this location are not affected.

Real case:

A user contacted me with a very strange problem: he had opened an Excel file that he had just moved from another folder to the affected location and, after editing a few cells, wanted to save and close it. The error "Element could not be found" indicated that the file no longer existed. Indeed, a second look in Windows Explorer showed that the Excel file was missing.

 


The initial idea was that, because the file contained a macro, the antivirus software on either the client or the server kept deleting it. A check of the logs and quarantine records ruled this out.

I was still convinced that some other process was deleting the file, so I enabled object access auditing for all domain users, presuming that a process on some third user's computer was responsible (offline files sync?). However, the audit logs showed only actions from the affected user, and he had not initiated any DELETE access request.

A third possible cause didn't fit the picture well: a failed disk sector or block cannot repeatedly end up assigned to the same file when that file is created at different times. We performed a disk check anyway. It found no problems, as you can imagine.

It was only the fourth line of investigation that led me to the cause. As you recall, I had enabled auditing, but only for domain users. What if the process was running with System credentials? I configured more verbose auditing and found the culprit almost immediately.

 

Cause:

The Security logs revealed an Event ID 4663 with the following information:

 

Process Information:

    Process ID:        0x6f8

    Process Name:        C:\Windows\System32\dfsrs.exe

 

Access Request Information:

    Accesses:        DELETE

 

Bingo!
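If the Security log has been exported as text, the same check can be scripted. Below is a minimal Python sketch that looks for the pattern shown above: an event message whose process is dfsrs.exe and whose accesses include DELETE. The function names are my own invention, and the sketch assumes the message layout matches the excerpt in this article (one "Field: value" pair per line). Real 4663 messages can list several accesses on separate lines, and only the first line is read here.

```python
import re

# Fields of interest, as they appear in the excerpt above.
FIELDS = ("Process ID", "Process Name", "Accesses")

# The excerpt from this article, used as sample input.
SAMPLE_MESSAGE = r"""
Process Information:
    Process ID:        0x6f8
    Process Name:        C:\Windows\System32\dfsrs.exe

Access Request Information:
    Accesses:        DELETE
"""


def parse_4663_message(message):
    """Return a dict with the 'Field: value' pairs found in the message text.

    Assumption: each field sits on its own line. Only the first line of a
    multi-line 'Accesses' value is captured.
    """
    found = {}
    for name in FIELDS:
        pattern = rf"^\s*{re.escape(name)}:\s*(.+?)\s*$"
        match = re.search(pattern, message, re.MULTILINE)
        if match:
            found[name] = match.group(1)
    return found


def is_dfsr_delete(message):
    """True if the message records a DELETE access made by dfsrs.exe."""
    fields = parse_4663_message(message)
    process = fields.get("Process Name", "").lower()
    accesses = fields.get("Accesses", "")
    return process.endswith("\\dfsrs.exe") and "DELETE" in accesses
```

Calling is_dfsr_delete(SAMPLE_MESSAGE) flags the event above. The same message with a different process name, for example excel.exe, would not be flagged.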

A further search in the DFS Replication logs showed an Event ID 4412 at the same time with the following relevant information:

 

The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.
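Matching the two logs "at the same time" can also be done mechanically once the timestamps of both event types have been collected. The following is only a sketch of that idea in Python: the function name and the five-second tolerance are assumptions of mine, not values taken from the logs, and you have to extract the timestamps from your own Security and DFS Replication logs.

```python
from datetime import timedelta


def correlate_events(delete_times, conflict_times, window=timedelta(seconds=5)):
    """Pair 4663 DELETE timestamps with 4412 conflict timestamps.

    Both arguments are lists of datetime objects. A pair is reported when
    the two events lie within `window` of each other (the default of five
    seconds is an arbitrary assumption).
    """
    pairs = []
    for deleted_at in delete_times:
        for conflict_at in conflict_times:
            if abs(deleted_at - conflict_at) <= window:
                pairs.append((deleted_at, conflict_at))
    return pairs
```

A non-empty result suggests, but does not prove, that the deletion and the conflict resolution belong to the same incident. The file name in both events should still be compared by hand.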

 

There was only one small problem left: the server that logged the DFS Replication event was not accessible from client computers. It was our backup repository, configured to replicate all network shares and store them on tape.

 

Resolution:

A final online search turned up a known issue in which DFS Replication on Server 2003, 2008 and 2008 R2 can break under certain conditions. See the reference below for details. In short, if DFS replication was stopped or got stuck for some reason and folders were moved or renamed during that time, those folders are not properly replicated later. From then on, the upstream (source) server deletes any future changes, because the downstream (destination) server marks them as conflicts.

If you are in this situation, download the hotfix from the URL below and install it on the destination server or, even better, on all DFSR member servers. The hotfix requires a reboot.

One last step is needed afterwards. On the source server, move (or copy and delete) the affected folder to a location that is not replicated. Wait for DFS Replication to delete the folder on all other member servers. Then move or copy the affected folder back to its original location on the source server. Wait for it to replicate once again and try to reproduce the initial problem. It should not reappear.
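For those who prefer to script the move, here is a rough Python sketch of the sequence just described. Everything in it is illustrative: the function name is made up, the paths are whatever you pass in, and the waiting is delegated to a callback you supply, because the sketch has no way of knowing when replication has finished. Note also that shutil.move copies the data when the two locations are on different volumes, and such a copy may not carry over NTFS permissions, so test it on an unimportant folder first.

```python
import shutil
from pathlib import Path


def cycle_folder(affected, staging, wait_for_replication):
    """Move `affected` into `staging`, wait, move it back, wait again.

    affected: replicated folder that keeps losing files.
    staging: existing folder OUTSIDE any replicated path (assumption: you
        have verified this yourself).
    wait_for_replication: callable taking a short description; it must
        return only once you have confirmed that replication has caught up.
    """
    affected = Path(affected)
    parked = Path(staging) / affected.name
    if parked.exists():
        raise FileExistsError(f"{parked} already exists, refusing to overwrite")

    # Step 1: take the folder out of the replicated tree.
    shutil.move(str(affected), str(parked))
    wait_for_replication("folder removed on all other member servers")

    # Step 2: put it back in its original place.
    shutil.move(str(parked), str(affected))
    wait_for_replication("folder replicated again")
```

The simplest callback is one that prints the description and waits for you to press Enter, for example lambda text: input(text + " - press Enter when done").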

 

References:

https://support.microsoft.com/en-us/kb/2450944