BSOD on Startup along with error message : STOP: 0x00000116 and an issue with the atikmpag.sys file

PROBLEM:

On startup, after the Windows logo appears, it fails to go any further and I end up with the message:

"Attempt to reset display driver and recover from timeout failed"
along with error message : STOP: 0x00000116 and an issue with the atikmpag.sys file
On starting in Safe Mode, it gives me:

Problem Details:
Problem Event Name: BlueScreen
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 2057

BCCode: 116
BCP1: FFFFA800B0E24E0
BCP2: FFFFF88004FB2078
BCP3: 0000000000000000
BCP4: 0000000000000002
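
For anyone who wants to read the parameters above, here is a minimal Python sketch that parses the pasted "Problem Details" text and labels the four values. The parameter descriptions are paraphrased from Microsoft's bug check 0x116 reference as I understand it, and the helper name `parse_problem_details` is my own invention, not part of any Windows tool.

```python
# Hypothetical helper: parse the "Problem Details" text and label the
# four bugcheck parameters reported for a 0x116 crash.

# Meanings paraphrased from Microsoft's bug check 0x116 reference
# (assumption: they apply unchanged to this Windows 7 system).
PARAM_MEANING_0X116 = {
    "BCP1": "pointer to the internal TDR recovery context",
    "BCP2": "pointer into the responsible device driver module",
    "BCP3": "error code (NTSTATUS) of the last failed operation",
    "BCP4": "internal context-dependent data",
}


def parse_problem_details(text):
    """Return {field: int} for the BCCode/BCPn lines, read as hexadecimal."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and (key == "BCCode" or key.startswith("BCP")):
            fields[key] = int(value.strip(), 16)
    return fields


# Values copied from the report above.
details = """BCCode: 116
BCP1: FFFFA800B0E24E0
BCP2: FFFFF88004FB2078
BCP3: 0000000000000000
BCP4: 0000000000000002"""

parsed = parse_problem_details(details)
print(f"Bugcheck 0x{parsed['BCCode']:X}")
for name, meaning in PARAM_MEANING_0X116.items():
    print(f"{name} = 0x{parsed[name]:016X}  ({meaning})")
```

If the descriptions hold, BCP2 would be an address inside the driver named on the blue screen (atikmpag.sys here), which is why the dump files are needed to confirm it.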

System Specs:
HP Pavilion_dv6 6052ea
Windows 7 x64 bit
intel(R) Core(TM) i7-2630QM CPU@ 2.00GHz

I had the thermal module, fan and cooling pads replaced about 2 weeks ago, and it had all been working fine until yesterday. Is there any chance that this might be causing it? I've read online that the atikmpag.sys file is associated with the GPU rather than the cooling system, though. I'm a bit of an amateur, so any and all help/advice would be massively appreciated.

Original title: BSOD on Startup



WHAT TO DO TO GET ANALYSED DATA:

In order to assist, we will need the DMP files to analyze what exactly occurred at the time of the crash, etc.

If you don't know where DMP files are located, here's how to get to them:
1. Navigate to the %systemroot%\Minidump folder.
2. Copy any and all DMP files in the Minidump folder to your Desktop and then zip up these files.
3. Upload the zip containing the DMP files to SkyDrive or a hosting site of your choice and paste the link in your reply.
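
If you would rather script steps 1 and 2, the following is a minimal Python sketch, assuming Python is installed on the affected machine. The function name `zip_minidumps`, the output file name and the Desktop location are my own choices for illustration, and reading the Minidump folder may require an elevated (administrator) prompt.

```python
import os
import zipfile
from pathlib import Path


def zip_minidumps(minidump_dir, zip_path):
    """Zip every .dmp file found in minidump_dir; return the file names added."""
    dumps = sorted(
        p for p in Path(minidump_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".dmp"
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dump in dumps:
            # Store only the file name, not the full Windows path.
            archive.write(dump, arcname=dump.name)
    return [dump.name for dump in dumps]


# Only attempt the real folder on Windows; elsewhere the function can still
# be called directly with any two paths.
if os.name == "nt":
    source = Path(os.environ["SystemRoot"]) / "Minidump"
    target = Path.home() / "Desktop" / "minidumps.zip"  # assumed location
    if source.is_dir():
        print(zip_minidumps(source, target))
    else:
        print("No Minidump folder found at", source)
```

The resulting zip is what you would then upload in step 3.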

If you are going to use SkyDrive but don't know how to upload to it, please visit the following:
http://www.wikihow.com/Use-SkyDrive

Please note that any "cleaner" programs such as TuneUp Utilities, CCleaner, etc, by default will delete DMP files upon use.

If your computer is not generating DMP files, please do the following:
1. Start > type %systemroot% which should show the Windows folder, click on it. Once inside that folder, ensure there is a Minidump folder created. If not, CTRL-SHIFT-N to make a New Folder and name it Minidump.

2. Windows key + Pause key. This should bring up System. Click Advanced System Settings on the left > Advanced > Performance > Settings > Advanced > Virtual memory > Change > ensure there's a check mark for 'Automatically manage paging file size for all drives'.

3. Windows key + Pause key. This should bring up System. Click Advanced System Settings on the left > Advanced > Startup and Recovery > Settings > System Failure > ensure there is a check mark next to 'Write an event to the system log'.

Ensure Small Memory Dump is selected and ensure the path is %systemroot%\Minidump.

4. Double check that the Windows Error Reporting Service (WERS) is enabled:

Start > Search > type services.msc > In the Name column, find Windows Error Reporting Service > If the status of the service is not Started, right-click it and select Start. Also ensure that its Startup Type is set to Automatic rather than Manual. You can do this by right-clicking it, selecting Properties, setting Startup type to 'Automatic' on the General tab, and then clicking Apply.

If you cannot get into normal mode to do any of this, please do this via Safe Mode.
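
As a cross-check on the dump settings from step 3, here is a small Python sketch that reads them from the registry. It assumes the settings live under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl in the values CrashDumpEnabled and MinidumpDir, which is my understanding of where Windows stores them. The function names are hypothetical, and the registry read only runs on Windows.

```python
import sys

# CrashDumpEnabled values as I understand them (assumption; verify against
# Microsoft's documentation before relying on this table).
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",
}


def describe_dump_setting(crash_dump_enabled, minidump_dir):
    """Summarise the dump type and say whether it matches step 3 above."""
    kind = DUMP_TYPES.get(crash_dump_enabled, "Unknown")
    note = "OK" if crash_dump_enabled == 3 else "change to Small memory dump"
    return f"{kind} -> {minidump_dir} [{note}]"


def read_crash_control():
    """Read the two values from the registry (Windows only)."""
    import winreg  # standard library, but only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        enabled, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        folder, _ = winreg.QueryValueEx(key, "MinidumpDir")
    return enabled, folder


if sys.platform == "win32":
    print(describe_dump_setting(*read_crash_control()))
```

This only reports the settings; changing them is still best done through the dialogs described above.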



SOLUTION:
All of the attached DMP files are of the VIDEO_TDR_FAILURE (116) bugcheck.

Ensure you have the latest video card drivers. If you are already on the latest video card drivers, uninstall and install a version or a few versions behind the latest to ensure it's not a latest driver only issue. If you have already experimented with the latest video card driver and many previous versions, please give the beta driver for your card a try.

>>The basic definition of a 0x116 bugcheck is:

There may be a bug in the video driver or video hardware.

So, let me now explain what VIDEO_TDR_ERROR means. First off, TDR is an acronym for 'Timeout Detection and Recovery'. Timeout Detection and Recovery was introduced in Vista and carried over to Windows 7. Rather than describing what Timeout Detection and Recovery does in my own words, I'll quote the MSDN article directly:


>>Timeout detection:
The GPU scheduler, which is part of the DirectX graphics kernel subsystem (Dxgkrnl.sys), detects that the GPU is taking more than the permitted amount of time to execute a particular task. The GPU scheduler then tries to preempt this particular task. The preempt operation has a "wait" timeout, which is the actual TDR timeout. This step is thus the timeout detection phase of the process. The default timeout period in Windows Vista and later operating systems is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout period, the operating system diagnoses that the GPU is frozen.

To prevent timeout detection from occurring, hardware vendors should ensure that graphics operations (that is, DMA buffer completion) take no more than 2 seconds in end-user scenarios such as productivity and game play.



>>Preparation for recovery:
The operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to inform the driver that the operating system detected a timeout. The driver must then reinitialize itself and reset the GPU. In addition, the driver must stop accessing memory and should not access hardware. The operating system and the driver collect hardware and other state information that could be useful for post-mortem diagnosis.



>>Desktop recovery:
The operating system resets the appropriate state of the graphics stack. The video memory manager, which is also part of Dxgkrnl.sys, purges all allocations from video memory. The display miniport driver resets the GPU hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As previously mentioned, some legacy DirectX applications might render just black at the end of this recovery, which requires the end user to restart these applications. Well-written DirectX 9Ex and DirectX 10 and later applications that handle Device Remove technology continue to work correctly. An application must release and then recreate its Direct3D device and all of the device's objects. For more information about how DirectX applications recover, see the Windows SDK.

Article here.



With this being said, if Timeout Detection and Recovery fails to recover the display driver, Windows will then raise the 0x116 bugcheck. There are many different things that can cause a 0x116, which I will explain below:
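
To make the three quoted phases concrete, here is a toy Python model of the decision flow. It is only a sketch of the behaviour described in the quoted article, not how Dxgkrnl.sys is implemented; the function name and both of its inputs are made up for illustration.

```python
TDR_TIMEOUT_SECONDS = 2.0  # default quoted above for Windows Vista and later


def tdr_outcome(seconds_to_complete_or_preempt, reset_succeeds):
    """Toy model of timeout detection, recovery, and the 0x116 fallback.

    seconds_to_complete_or_preempt: how long the GPU needs to finish or
        yield the task once the scheduler asks (float("inf") for a hang).
    reset_succeeds: whether the driver reset brings the GPU back.
    """
    # Timeout detection: the GPU answered within the TDR timeout.
    if seconds_to_complete_or_preempt <= TDR_TIMEOUT_SECONDS:
        return "no timeout"
    # Preparation for recovery and desktop recovery both worked.
    if reset_succeeds:
        return "display driver recovered"
    # Recovery failed: this is the case the original poster is hitting.
    return "bugcheck 0x116"


print(tdr_outcome(0.5, reset_succeeds=True))
print(tdr_outcome(float("inf"), reset_succeeds=True))
print(tdr_outcome(float("inf"), reset_succeeds=False))
```

The last call corresponds to the "Attempt to reset display driver and recover from timeout failed" message: the timeout fired and the reset did not bring the GPU back.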


>>The following hardware issues can cause a TDR event:


1. Unstable overclock (CPU, GPU, etc). Revert all and any overclocks to stock settings. 


2. Faulty memory resulting in corrupt data being communicated between the GPU and the system (video memory, otherwise known as VRAM, or physical memory, otherwise known as RAM).

GPU testing: Furmark, run for ~15 minutes and watch temperatures to ensure there's no overheating and watch for artifacts.

RAM testing: Memtest - Refer to the below:

# Memtest:

Memtest86+:

Download Memtest86+ here:

http://www.memtest.org/

Which should I download?
You can either download the pre-compiled ISO, which you burn to a CD and then boot from, or the auto-installer for a USB key. The auto-installer will format your USB drive, make it bootable, and then install the necessary files. Both do the same job; which you choose just depends on what you have available (CD or USB).

# How Memtest works:
Memtest86 writes a series of test patterns to most memory addresses, reads back the data written, and compares it for errors.

The default pass does 9 different tests, varying in access patterns and test data. A tenth test, bit fade, is selectable from the menu. It writes all memory with zeroes, then sleeps for 90 minutes before checking to see if bits have changed (perhaps because of refresh problems). This is repeated with all ones for a total time of 3 hours per pass.
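
The write, read back and compare idea can be illustrated with a toy Python simulation. This is not Memtest86's actual algorithm or pattern set: the `FaultyMemory` class, the single stuck bit and the four patterns are all assumptions chosen to keep the sketch short.

```python
class FaultyMemory:
    """Simulated RAM in which one bit of one cell may be stuck at 0."""

    def __init__(self, size, bad_address=None, stuck_bit=0):
        self.cells = bytearray(size)
        self.bad_address = bad_address
        self.stuck_mask = ~(1 << stuck_bit) & 0xFF

    def write(self, address, value):
        if address == self.bad_address:
            value &= self.stuck_mask  # the faulty bit can never hold a 1
        self.cells[address] = value

    def read(self, address):
        return self.cells[address]

    def __len__(self):
        return len(self.cells)


def pattern_test(memory, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern everywhere, read it back, and collect mismatches."""
    errors = []
    for pattern in patterns:
        for address in range(len(memory)):
            memory.write(address, pattern)
        for address in range(len(memory)):
            got = memory.read(address)
            if got != pattern:
                errors.append((address, pattern, got))
    return errors


print(pattern_test(FaultyMemory(16)))                 # healthy memory
print(pattern_test(FaultyMemory(16, bad_address=5)))  # bit 0 of cell 5 stuck
```

In this simulation the fault only shows up for patterns that try to set the stuck bit (0xFF and 0x55 here), which is one reason a real tester uses several patterns. The bit fade timing quoted above is simple arithmetic: 90 minutes with all zeroes plus 90 minutes with all ones gives 3 hours per pass.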

Many chipsets can report RAM speeds and timings via SPD (Serial Presence Detect) or EPP (Enhanced Performance Profiles), and some even support changing the expected memory speed. If the expected memory speed is overclocked, Memtest86 can test that memory performance is error-free with these faster settings.

Some hardware is able to report the "PAT status" (PAT: enabled or PAT: disabled). This is a reference to Intel Performance acceleration technology; there may be BIOS settings which affect this aspect of memory timing.

This information, if available to the program, can be displayed via a menu option.

Any other questions can most likely be answered by reading this guide:

http://forum.canardpc.com/threads/28864-FAQ-please-read-before-posting


3. Corrupt hard drive or Windows install / OS install resulting in corruption to the registry or page file.

# HDD diagnostics: SeaTools - Refer to the below:

http://www.seagate.com/support/downloads/seatools/

You can run it via Windows or DOS. The only difference is the environment you're running it in. If you believe device driver issues in Windows may cause conflicts or false positives, it may be wise to choose the most minimal testing environment (DOS).

Run all tests EXCEPT: Fix All, Long Generic, and anything Advanced.

To reset your page file, follow the instructions below:
a ) Go to Start...Run...and type in "sysdm.cpl" (without the quotes) and press Enter.

- Then click on the Advanced tab,
- Then on the Performance Settings Button,
- Then on the next Advanced tab,
- Then on the Virtual Memory Change button.

b ) In this window, note down the current settings for your pagefile (so you can restore them later on).

- Then click on the "No paging file" radio button, and

- Then on the "Set" button. If you have multiple hard drives, be sure that the paging file is set to 0 on all of them.

- Click OK to exit the dialogs.

c ) Reboot (this will remove the pagefile from your system)

d ) Then go back in following the directions in step a ) and re-enter the settings that you wrote down in step b ). Follow the steps all the way through (and including) the reboot.

e ) Once you've rebooted this second time, go back in and check to make sure that the settings are as they're supposed to be.


# Run System File Checker:
SFC.EXE /SCANNOW

Go to Start and type in "cmd.exe" (without the quotes)

At the top of the search results, right-click cmd.exe and select "Run as administrator".

In the black window that opens, type "SFC.EXE /SCANNOW" (without the quotes) and press Enter.

Let the program run and post back what it says when it's done.

4. Overheating of the CPU, GPU and/or other components can cause 0x116 bugchecks. Monitor your temperatures and ensure the system is cooled adequately.

5. GPU failure, whether from heat, a power issue (PSU issue), failing VRAM, etc.


# The following software issues can cause a TDR event:

- Incompatible drivers of any sort

- Messy / corrupt registry

- Corrupt DirectX - http://support.microsoft.com/kb/179113

- Corrupt system files (run System File Checker as advised above)

- Buggy and/or corrupt 3rd party drivers. If you suspect that a 3rd party driver is the issue, enable Driver Verifier:


# Driver Verifier:
What is Driver Verifier?

Driver Verifier is included in Windows 8, 7, Windows Server 2008 R2, Windows Vista, Windows Server 2008, Windows 2000, Windows XP, and Windows Server 2003 to promote stability and reliability; you can use this tool to troubleshoot driver issues. Windows kernel-mode components can cause system corruption or system failures as a result of an improperly written driver, such as an earlier version of a Windows Driver Model (WDM) driver.

Essentially, if there's a 3rd party driver believed to be at issue, enabling Driver Verifier will help flush out the rogue driver if it detects a violation.

Before enabling Driver Verifier, it is recommended to create a System Restore Point:

Vista - START | type rstrui - create a restore point
Windows 7 - START | type create | select "Create a Restore Point"
Windows 8 - http://www.eightforums.com/tutorials/4690-restore-point-create-windows-8-a.html


How to enable Driver Verifier:
Start > type "verifier" without the quotes > Select the following options -

1. Select - "Create custom settings (for code developers)"
2. Select - "Select individual settings from a full list"
3. Check the following boxes -
     - Special Pool
     - Pool Tracking
     - Force IRQL Checking
     - Deadlock Detection
     - Security Checks (Windows 7)
     - Concurrency Stress Test (Windows 8)
     - DDI compliance checking (Windows 8)
     - Miscellaneous Checks
4. Select - "Select driver names from a list"
5. Click on the "Provider" tab. This will sort all of the drivers by the provider.
6. Check EVERY box that is NOT provided by Microsoft / Microsoft Corporation.
7. Click on Finish.
8. Restart.
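
The same selection can, as far as I know, be made from an elevated command prompt with `verifier /flags ... /driver ...`. The Python sketch below only builds that command line from the Windows 7 options listed in step 3. The bit values are as I recall them from Microsoft's Driver Verifier documentation, so treat them as assumptions and verify them before use; the Windows 8 only options are left out because I am not certain of their values, and the second driver name is a placeholder.

```python
# Bit values for "verifier /flags" (assumption: recalled from Microsoft's
# Driver Verifier documentation; verify before running the command).
FLAG_BITS = {
    "Special Pool": 0x001,
    "Force IRQL Checking": 0x002,
    "Pool Tracking": 0x008,
    "Deadlock Detection": 0x020,
    "Security Checks": 0x100,
    "Miscellaneous Checks": 0x800,
}


def verifier_command(options, drivers):
    """Combine the chosen options into one flags value and build the command."""
    flags = 0
    for option in options:
        flags |= FLAG_BITS[option]
    return f"verifier /flags 0x{flags:X} /driver " + " ".join(drivers)


# atikmpag.sys is the driver named in this thread; the second name is a
# hypothetical stand-in for any other non-Microsoft driver.
command = verifier_command(FLAG_BITS, ["atikmpag.sys", "example3rdparty.sys"])
print(command)
```

The script does not run anything itself. You would paste the printed command into an elevated command prompt and then restart, exactly as in steps 7 and 8.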

Important information regarding Driver Verifier:
- If Driver Verifier finds a violation, the system will BSOD.

- After enabling Driver Verifier and restarting, you may not be able to get back into normal Windows: if the offending driver loads at start-up, Driver Verifier will flag it and, as stated above, force a BSOD.

If this happens, do not panic, do the following:

- Boot into Safe Mode by repeatedly tapping the F8 key during boot-up.

- Once in Safe Mode - Start > type "system restore" without the quotes.

- Choose the restore point you created earlier.

If you did not set up a restore point, do not worry, you can still disable Driver Verifier to get back into normal Windows:

- Start > Search > type "cmd" without the quotes.

- To turn off Driver Verifier, type "verifier /reset" (without the quotes) in the cmd window and press Enter.

- Restart and boot into normal Windows.

How long should I keep Driver Verifier enabled for?
It varies; many experts and analysts have different recommendations. Personally, I recommend keeping it enabled for at least 24 hours. If you don't BSOD by then, disable Driver Verifier.

My system BSOD'd, where can I find the crash dumps?

They will be located in %systemroot%\Minidump

Any other questions can most likely be answered by this article:
http://support.microsoft.com/kb/244617
