From: Aristeu Rozanski <aris@redhat.com>
Date: Tue, 23 Sep 2008 16:18:57 -0400
Subject: [x86_64] NMI wd: clear perf counter registers on P4
Message-id: 20080923201856.GU16840@redhat.com
O-Subject: [RHEL5.3 PATCH] NMI watchdog: clear performance counter registers on P4
Bugzilla: 461671
RH-Acked-by: Prarit Bhargava <prarit@redhat.com>
RH-Acked-by: Dave Anderson <anderson@redhat.com>

https://bugzilla.redhat.com/show_bug.cgi?id=461671

P4 processors have a quirk in their performance counter registers: a counter configuration control register (CCCR) whose overflow bit (P4_CCCR_OVF) is set keeps raising interrupts (NMIs, in the NMI watchdog case and others) until that bit is cleared. When a kdump kernel boots and enables NMI delivery for performance monitoring interrupts (PMIs), any performance counters that were in use in the previous kernel may still have that bit set. They will then keep generating NMIs forever, since no code handles those other registers. This didn't happen before the NMI watchdog work in 5.2 because both logical CPUs used the same CCCR.

This problem causes a crash on Dave Anderson's box. The regular kernel runs the NMI watchdog on both logical CPUs, using two different sets of performance counters of the same core. When the kdump kernel boots, the second CCCR still has the overflow bit set, and since the kdump kernel boots with only one CPU by default, the second logical CPU is never brought up and the second CCCR is never initialized or reset. As soon as the first logical CPU is initialized and sets up the NMI watchdog, it enables delivery of PMIs as NMIs on the local APIC. Because the second CCCR belongs to the same physical CPU, an NMI is triggered immediately, and the machine crashes due to an unlikely race in the NMI watchdog code.

To fix the problem, two patches were submitted: one to clear the other performance counter registers on P4 when booting with reset_devices, and another to fix the race itself. Both patches (this one and the one fixing the NMI watchdog race) were submitted upstream and accepted by Ingo for inclusion in 2.6.27 and 2.6.28 respectively.
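To make the failure mode concrete, here is a toy C model of the kdump scenario described above; it is not kernel code. It assumes the CCCR bit values used in perfctr-watchdog.c (P4_CCCR_OVF as bit 31, P4_CCCR_OVF_PMI0/1 as bits 26/27, P4_CCCR_ENABLE as bit 12) and reduces the hardware to one simplified rule: a CCCR with OVF set and a PMI routing bit set requests an NMI on every tick. The helpers cccr_wants_pmi() and unhandled_nmis() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CCCR bits as used in perfctr-watchdog.c (values assumed here;
 * check them against your kernel tree). */
#define P4_CCCR_OVF       (1u << 31)
#define P4_CCCR_OVF_PMI0  (1u << 26)
#define P4_CCCR_OVF_PMI1  (1u << 27)
#define P4_CCCR_ENABLE    (1u << 12)

/* Toy rule (hypothetical): a CCCR keeps requesting a PMI for as long as
 * its OVF bit is set and it routes overflow interrupts to a logical CPU. */
bool cccr_wants_pmi(uint32_t cccr)
{
	return (cccr & P4_CCCR_OVF) &&
	       (cccr & (P4_CCCR_OVF_PMI0 | P4_CCCR_OVF_PMI1));
}

/* Delivers `ticks` rounds of NMIs in a kdump-like setup: cccr[0] belongs
 * to the running watchdog, which acks its own OVF bit; cccr[1] was left
 * behind by the previous kernel and no code ever touches it.
 * Returns the number of NMIs raised by the unhandled register. */
int unhandled_nmis(uint32_t cccr[2], int ticks)
{
	int unhandled = 0;

	for (int t = 0; t < ticks; t++) {
		if (cccr_wants_pmi(cccr[0]))
			cccr[0] &= ~P4_CCCR_OVF;	/* watchdog acks its own CCCR */
		if (cccr_wants_pmi(cccr[1]))
			unhandled++;			/* stale OVF is never cleared */
	}
	return unhandled;
}
```

In this model, a sibling CCCR that still carries P4_CCCR_OVF from the previous kernel produces one unhandled NMI per tick, while the same setup with OVF cleared produces none.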
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commit;h=28b166a700899a0f88b1cc283c449fb5bf72a635
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commit;h=b3e15bdef689641e7f1bb03efbe56112c3ee82e2

The second patch is not as critical as this one and can wait for 5.4.

Tested successfully on Dave's box and on other P4 boxes.

diff --git a/arch/x86_64/kernel/perfctr-watchdog.c b/arch/x86_64/kernel/perfctr-watchdog.c
index 96eead0..d5a65ea 100644
--- a/arch/x86_64/kernel/perfctr-watchdog.c
+++ b/arch/x86_64/kernel/perfctr-watchdog.c
@@ -219,6 +219,27 @@ void enable_lapic_nmi_watchdog(void)
 	touch_nmi_watchdog();
 }
 
+#define P4_CONTROLS 18
+static unsigned int p4_controls[18] = {
+	MSR_P4_BPU_CCCR0,
+	MSR_P4_BPU_CCCR1,
+	MSR_P4_BPU_CCCR2,
+	MSR_P4_BPU_CCCR3,
+	MSR_P4_MS_CCCR0,
+	MSR_P4_MS_CCCR1,
+	MSR_P4_MS_CCCR2,
+	MSR_P4_MS_CCCR3,
+	MSR_P4_FLAME_CCCR0,
+	MSR_P4_FLAME_CCCR1,
+	MSR_P4_FLAME_CCCR2,
+	MSR_P4_FLAME_CCCR3,
+	MSR_P4_IQ_CCCR0,
+	MSR_P4_IQ_CCCR1,
+	MSR_P4_IQ_CCCR2,
+	MSR_P4_IQ_CCCR3,
+	MSR_P4_IQ_CCCR4,
+	MSR_P4_IQ_CCCR5,
+};
 /*
  * Activate the NMI watchdog via the local APIC.
  */
@@ -468,6 +489,26 @@ static int setup_p4_watchdog(unsigned nmi_hz)
 		evntsel_msr = MSR_P4_CRU_ESCR0;
 		cccr_msr = MSR_P4_IQ_CCCR0;
 		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+
+		/*
+		 * If we're on the kdump kernel or other situation, we may
+		 * still have other performance counter registers set to
+		 * interrupt and they'll keep interrupting forever because
+		 * of the P4_CCCR_OVF quirk. So we need to ACK all the
+		 * pending interrupts and disable all the registers here,
+		 * before reenabling the NMI delivery. Refer to p4_rearm()
+		 * about the P4_CCCR_OVF quirk.
+		 */
+		if (reset_devices) {
+			unsigned int low, high;
+			int i;
+
+			for (i = 0; i < P4_CONTROLS; i++) {
+				rdmsr(p4_controls[i], low, high);
+				low &= ~(P4_CCCR_ENABLE | P4_CCCR_OVF);
+				wrmsr(p4_controls[i], low, high);
+			}
+		}
 	} else {
 		/* logical cpu 1 */
 		perfctr_msr = MSR_P4_IQ_PERFCTR1;
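The reset_devices loop in setup_p4_watchdog() is a read-modify-write over every P4 CCCR. The user-space sketch below mirrors it against a hypothetical in-memory array: fake_msr, fake_rdmsr() and fake_wrmsr() are stand-ins, since the real rdmsr/wrmsr only work in ring 0, and the bit values are assumed to match perfctr-watchdog.c. It shows that only ENABLE and OVF are cleared in the low half, while the high half and all other configuration bits are preserved.

```c
#include <stddef.h>
#include <stdint.h>

#define P4_CCCR_OVF     (1u << 31)	/* assumed value, see perfctr-watchdog.c */
#define P4_CCCR_ENABLE  (1u << 12)	/* assumed value, see perfctr-watchdog.c */
#define NUM_CCCRS       18		/* same count as P4_CONTROLS in the patch */

/* Hypothetical stand-in for the 18 64-bit CCCR MSRs; index i plays the
 * role of p4_controls[i]. */
static uint64_t fake_msr[NUM_CCCRS];

/* Split a 64-bit register into low/high halves, like rdmsr(msr, lo, hi). */
static void fake_rdmsr(size_t idx, uint32_t *low, uint32_t *high)
{
	*low = (uint32_t)fake_msr[idx];
	*high = (uint32_t)(fake_msr[idx] >> 32);
}

/* Recombine low/high halves, like wrmsr(msr, lo, hi). */
static void fake_wrmsr(size_t idx, uint32_t low, uint32_t high)
{
	fake_msr[idx] = ((uint64_t)high << 32) | low;
}

/* Same read-modify-write as the patch: ack any pending overflow (OVF)
 * and disable the counter (ENABLE), leaving every other bit untouched. */
void reset_p4_cccrs(void)
{
	for (size_t i = 0; i < NUM_CCCRS; i++) {
		uint32_t low, high;

		fake_rdmsr(i, &low, &high);
		low &= ~(P4_CCCR_ENABLE | P4_CCCR_OVF);
		fake_wrmsr(i, low, high);
	}
}
```

As a design note, the patch spells out the count twice (the array length 18 and P4_CONTROLS); deriving it with the kernel's ARRAY_SIZE(p4_controls) would keep the two from drifting apart.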