From: Shyam Iyer <shiyer@redhat.com>
Date: Mon, 30 Aug 2010 18:12:49 -0400
Subject: [pci] fix pci_mmcfg_init making some memory uncacheable
Message-id: <1283191969.9419.1129.camel@shyamiyer>
Patchwork-id: 27949
O-Subject: [RHEL5.6] [BZ581933] [RESEND_V4] pci_mmcfg_init() making some of main memory uncacheable
Bugzilla: 581933
RH-Acked-by: Prarit Bhargava <prarit@redhat.com>
RH-Acked-by: Don Dutile <ddutile@redhat.com>

Fixes all review comments.

Problem:
Description and RootCause from Stuart_Hayes@dell.com

"A customer complained of seeing lower disk performance using bonnie++
on Dell's Precision T3500 than on the older T3400. Looking into this
problem, I found that it didn't happen with 2GB of memory, but it did
happen with 4GB. I narrowed down the problem to a big chunk of memory
just above the 4GB boundary being marked as uncacheable.

This is happening because the MCFG ACPI table on this system has one
segment with a base address at 0xF8000000, start bus of 0, and an end
bus of 63. The window needed for PCI mm config is 1MB per bus, but the
RHEL5.5 kernel (2.6.18-194.el5), in pci_mmcfg_init(), is ignoring the
start and end busses and calling ioremap_nocache() at the base address
of 0xF8000000 and a size of 256MB. The system has main memory at 4G
(physical address 0x100000000), so 128MB of memory from 0x100000000 to
0x108000000 is being made uncacheable, which really hurts performance,
especially since the mem_map is put just above the 4G boundary by
sparse_early_mem_map_alloc().

Using "pci=nommconf" works around this issue."

Redhat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=581933

Upstream:
RHEL5 specific based on RHKL comments. Preserves default behaviour in
RHEL5.

Kabi:
No symbols were harmed.
Builds on all architectures.
https://brewweb.devel.redhat.com/taskinfo?taskID=2716486

Testing:
Tested by Dell and customer.
Does not affect default behaviour in RHEL5
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=172905

-Shyam Iyer
Dell Onsite Engineer

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index 6098bfb..c767b0f 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -137,9 +137,12 @@ static struct pci_raw_ops pci_mmcfg = {
 #ifdef CONFIG_XEN
 /*
  * 1=default for xen kernel,
- * 0=force use of MMCONFIG_APER_MAX
+ * 0=force use of MMCONFIG_APER_MAX (default for non-Xen kernels)
  */
 static int use_acpi_mcfg_max_pci_bus_num = 1;
+#else
+static int use_acpi_mcfg_max_pci_bus_num = 0;
+#endif
 
 /*
  * on == use acpi table value
@@ -158,38 +161,49 @@ int __init acpi_mcfg_max_pci_bus_num_setup(char *str)
 }
 
 __setup("acpi_mcfg_max_pci_bus_num=", acpi_mcfg_max_pci_bus_num_setup);
-#endif
 
 /*
  * RHEL5 doesn't trust acpi for max pci bus num in acpi table;
  * but could map past/over valid PCI mmconf space if blindly
  * use MMCONFIG_APER_MAX; e.g., xen dom0's may fail.
  * so check if system requires acpi table value,
+ * or sysadmin forced use of acpi table value,
  * or sysadmin has forced use of MMCONFIG_APER_MAX on kernel cmd line
  */
 static unsigned long get_mmcfg_aper(struct acpi_table_mcfg_config *cfg)
 {
-	unsigned long mmcfg_aper = MMCONFIG_APER_MAX;
+	unsigned long mmcfg_aper = MMCONFIG_APER_MAX, tmp_mmcfg_aper;
+
+	tmp_mmcfg_aper = cfg->end_bus_number - cfg->start_bus_number + 1;
+	/* 32 slots, 8 fcns/slot, 4096 pci-cfg bytes/fcn */
+	tmp_mmcfg_aper *= 32 * 8 * 4096;
 
 	/* xen kernel && pci pass-through only */
 #ifdef CONFIG_XEN
 	extern int pci_pt_e820_access_enabled;
 
 	if (use_acpi_mcfg_max_pci_bus_num && pci_pt_e820_access_enabled) {
+#else
+	if (use_acpi_mcfg_max_pci_bus_num) {
+#endif
 		/* trust acpi values for end & start bus number */
-		mmcfg_aper =
-			cfg->end_bus_number - cfg->start_bus_number + 1;
 		printk(KERN_INFO
 		"PCI: Using acpi max pci bus value of 0x%lx \n",
-			mmcfg_aper);
-		/* 32 slots, 8 fcns/slot, 4096 pci-cfg bytes/fcn */
-		mmcfg_aper *= 32 * 8 * 4096;
+			cfg->end_bus_number - cfg->start_bus_number + 1);
+		mmcfg_aper = tmp_mmcfg_aper;
 		if (mmcfg_aper < MMCONFIG_APER_MIN)
 			mmcfg_aper = MMCONFIG_APER_MIN;
 		if (mmcfg_aper > MMCONFIG_APER_MAX)
 			mmcfg_aper = MMCONFIG_APER_MAX;
+	} else {
+		if (tmp_mmcfg_aper < MMCONFIG_APER_MAX) {
+			printk(KERN_ERR "Warning: pci_mmcfg_init marking %dMB "
+			"space uncacheable.\nMCFG table requires %dMB "
+			"uncacheable only. Try booting with "
+			"acpi_mcfg_max_pci_bus_num=on\n",
+			MMCONFIG_APER_MAX >> 20, tmp_mmcfg_aper >> 20);
+		}
 	}
-#endif
 
 	return mmcfg_aper;
 }
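
Note on the arithmetic: the figures in the problem description (1MB per
bus, a 64-bus window at 0xF8000000, 128MB of RAM above 4G made
uncacheable) can be checked with a small userspace sketch. This is not
part of the patch. The helper names below are hypothetical, and
SKETCH_APER_MAX is assumed to be 256MB only because that is the size
the description says was being mapped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* From the comment in the patch: 32 slots, 8 fcns/slot, 4096 pci-cfg bytes/fcn. */
#define BYTES_PER_BUS	(32ULL * 8ULL * 4096ULL)	/* 1MB per bus */
/* Assumption: 256MB, the mapping size quoted in the problem description. */
#define SKETCH_APER_MAX	(256ULL << 20)
#define FOUR_GB		0x100000000ULL

/* Hypothetical helper: window size implied by an MCFG bus range. */
static uint64_t mcfg_window_bytes(unsigned int start_bus, unsigned int end_bus)
{
	assert(end_bus >= start_bus);
	return (uint64_t)(end_bus - start_bus + 1) * BYTES_PER_BUS;
}

/* Hypothetical helper: bytes of [base, base + size) at or above boundary. */
static uint64_t bytes_above(uint64_t base, uint64_t size, uint64_t boundary)
{
	uint64_t end = base + size;

	if (end <= boundary)
		return 0;
	return end - (base > boundary ? base : boundary);
}

/* Print how much of a mapping starting at base spills past 4GB. */
static void report_spill(uint64_t base, uint64_t size)
{
	printf("map 0x%llx + %lluMB: %lluMB above 4GB\n",
	       (unsigned long long)base,
	       (unsigned long long)(size >> 20),
	       (unsigned long long)(bytes_above(base, size, FOUR_GB) >> 20));
}
```

Calling report_spill(0xF8000000ULL, SKETCH_APER_MAX) and then
report_spill(0xF8000000ULL, mcfg_window_bytes(0, 63)) compares the two
cases: the fixed 256MB mapping reaches 128MB past the 4G boundary, while
the 64MB window taken from the MCFG bus range ends below it.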