Discussion:
arm: don't unnecessarily call pmap_extract()
Patrick Wildt
2016-01-24 00:02:49 UTC
Permalink
Hi,

there are two code points in the v7 pmap where we need the physical
address (to flush secondary cache) and use pmap_extract() instead
of just reading it from the vm page. This diff removes those.

Patrick

diff --git sys/arch/arm/arm/pmap7.c sys/arch/arm/arm/pmap7.c
index 5bcc67b..78969bd 100644
--- sys/arch/arm/arm/pmap7.c
+++ sys/arch/arm/arm/pmap7.c
@@ -1105,11 +1105,9 @@ pmap_clean_page(struct vm_page *pg, int isync)
* now while we still have a valid mapping for it.
*/
if (!wb) {
- paddr_t pa;
cpu_dcache_wb_range(pv->pv_va, PAGE_SIZE);
- if (pmap_extract(pm, (vaddr_t)pv->pv_va, &pa))
- cpu_sdcache_wb_range(pv->pv_va, pa,
- PAGE_SIZE);
+ cpu_sdcache_wb_range(pv->pv_va,
+ VM_PAGE_TO_PHYS(pg), PAGE_SIZE);
wb = TRUE;
}
}
@@ -1126,10 +1124,8 @@ pmap_clean_page(struct vm_page *pg, int isync)
PTE_SYNC(cwb_pte);
cpu_tlb_flushD_SE(cwbp);
cpu_cpwait();
- paddr_t pa;
cpu_dcache_wb_range(cwbp, PAGE_SIZE);
- if (pmap_extract(pmap_kernel(), (vaddr_t)cwbp, &pa))
- cpu_sdcache_wb_range(cwbp, pa, PAGE_SIZE);
+ cpu_sdcache_wb_range(cwbp, VM_PAGE_TO_PHYS(pg), PAGE_SIZE);
}
}
Jonathan Gray
2016-01-26 08:32:40 UTC
Permalink
Post by Patrick Wildt
Hi,
there are two code points in the v7 pmap where we need the physical
address (to flush secondary cache) and use pmap_extract() instead
of just reading it from the vm page. This diff removes those.
Patrick
When running with this diff on a imx6 cubox two out of three times xenocara
builds have failed with sh dumping core due to bus errors. Once in
Mesa, and once in libXmu. I don't remember that happening before,
but I'll try and so some more builds on it without the diff.
Post by Patrick Wildt
diff --git sys/arch/arm/arm/pmap7.c sys/arch/arm/arm/pmap7.c
index 5bcc67b..78969bd 100644
--- sys/arch/arm/arm/pmap7.c
+++ sys/arch/arm/arm/pmap7.c
@@ -1105,11 +1105,9 @@ pmap_clean_page(struct vm_page *pg, int isync)
* now while we still have a valid mapping for it.
*/
if (!wb) {
- paddr_t pa;
cpu_dcache_wb_range(pv->pv_va, PAGE_SIZE);
- if (pmap_extract(pm, (vaddr_t)pv->pv_va, &pa))
- cpu_sdcache_wb_range(pv->pv_va, pa,
- PAGE_SIZE);
+ cpu_sdcache_wb_range(pv->pv_va,
+ VM_PAGE_TO_PHYS(pg), PAGE_SIZE);
wb = TRUE;
}
}
@@ -1126,10 +1124,8 @@ pmap_clean_page(struct vm_page *pg, int isync)
PTE_SYNC(cwb_pte);
cpu_tlb_flushD_SE(cwbp);
cpu_cpwait();
- paddr_t pa;
cpu_dcache_wb_range(cwbp, PAGE_SIZE);
- if (pmap_extract(pmap_kernel(), (vaddr_t)cwbp, &pa))
- cpu_sdcache_wb_range(cwbp, pa, PAGE_SIZE);
+ cpu_sdcache_wb_range(cwbp, VM_PAGE_TO_PHYS(pg), PAGE_SIZE);
}
}
Patrick Wildt
2016-01-30 19:01:00 UTC
Permalink
Post by Jonathan Gray
Post by Patrick Wildt
Hi,
there are two code points in the v7 pmap where we need the physical
address (to flush secondary cache) and use pmap_extract() instead
of just reading it from the vm page. This diff removes those.
Patrick
When running with this diff on a imx6 cubox two out of three times xenocara
builds have failed with sh dumping core due to bus errors. Once in
Mesa, and once in libXmu. I don't remember that happening before,
but I'll try and so some more builds on it without the diff.
Did you encounter any more issues running that diff? Have you tried
compiling xenocara without after seeing those issues?

So far I have only run into unrelated build issues, like a missing
define or a python script being started without it having executable
permissions.
Post by Jonathan Gray
Post by Patrick Wildt
diff --git sys/arch/arm/arm/pmap7.c sys/arch/arm/arm/pmap7.c
index 5bcc67b..78969bd 100644
--- sys/arch/arm/arm/pmap7.c
+++ sys/arch/arm/arm/pmap7.c
@@ -1105,11 +1105,9 @@ pmap_clean_page(struct vm_page *pg, int isync)
* now while we still have a valid mapping for it.
*/
if (!wb) {
- paddr_t pa;
cpu_dcache_wb_range(pv->pv_va, PAGE_SIZE);
- if (pmap_extract(pm, (vaddr_t)pv->pv_va, &pa))
- cpu_sdcache_wb_range(pv->pv_va, pa,
- PAGE_SIZE);
+ cpu_sdcache_wb_range(pv->pv_va,
+ VM_PAGE_TO_PHYS(pg), PAGE_SIZE);
wb = TRUE;
}
}
@@ -1126,10 +1124,8 @@ pmap_clean_page(struct vm_page *pg, int isync)
PTE_SYNC(cwb_pte);
cpu_tlb_flushD_SE(cwbp);
cpu_cpwait();
- paddr_t pa;
cpu_dcache_wb_range(cwbp, PAGE_SIZE);
- if (pmap_extract(pmap_kernel(), (vaddr_t)cwbp, &pa))
- cpu_sdcache_wb_range(cwbp, pa, PAGE_SIZE);
+ cpu_sdcache_wb_range(cwbp, VM_PAGE_TO_PHYS(pg), PAGE_SIZE);
}
}
Jonathan Gray
2016-01-31 03:53:57 UTC
Permalink
Post by Patrick Wildt
Post by Jonathan Gray
Post by Patrick Wildt
Hi,
there are two code points in the v7 pmap where we need the physical
address (to flush secondary cache) and use pmap_extract() instead
of just reading it from the vm page. This diff removes those.
Patrick
When running with this diff on a imx6 cubox two out of three times xenocara
builds have failed with sh dumping core due to bus errors. Once in
Mesa, and once in libXmu. I don't remember that happening before,
but I'll try and so some more builds on it without the diff.
Did you encounter any more issues running that diff? Have you tried
compiling xenocara without after seeing those issues?
So far I have only run into unrelated build issues, like a missing
define or a python script being started without it having executable
permissions.
I wonder if the following solidrun u-boot commit is related
https://github.com/SolidRun/u-boot-imx6/commit/408544d61f230060f18ffe2e06565deadbcf3451

commit 408544d61f230060f18ffe2e06565deadbcf3451
Author: Jon Nettleton <***@gmail.com>
Date: Tue Oct 13 13:56:13 2015 +0200

mx6: solidrun: update the pl310 initialization settings

Besides enabling prefetch we also need to enable the shared
attribute override bit in the pl310's aux control register.

Not having this bit set makes the platform non-compliant with the
ARMv7 ARM specified behaviour of conflicting memory aliases.
Without this bit set the L2 controller will turn userspace bufferable
reads into "cachable, no allocate" which can lead to userspace hitting
stale, non-evicted cache lines.

Reference kernel commit eeedcea69e927857d32aaf089725eddd2c79dd0a

Going to try some builds running on a system bootstraped with an
updated u-boot.
Patrick Wildt
2016-01-31 04:55:30 UTC
Permalink
Post by Jonathan Gray
Post by Patrick Wildt
Post by Jonathan Gray
Post by Patrick Wildt
Hi,
there are two code points in the v7 pmap where we need the physical
address (to flush secondary cache) and use pmap_extract() instead
of just reading it from the vm page. This diff removes those.
Patrick
When running with this diff on a imx6 cubox two out of three times xenocara
builds have failed with sh dumping core due to bus errors. Once in
Mesa, and once in libXmu. I don't remember that happening before,
but I'll try and so some more builds on it without the diff.
Did you encounter any more issues running that diff? Have you tried
compiling xenocara without after seeing those issues?
So far I have only run into unrelated build issues, like a missing
define or a python script being started without it having executable
permissions.
I wonder if the following solidrun u-boot commit is related
https://github.com/SolidRun/u-boot-imx6/commit/408544d61f230060f18ffe2e06565deadbcf3451
commit 408544d61f230060f18ffe2e06565deadbcf3451
Date: Tue Oct 13 13:56:13 2015 +0200
mx6: solidrun: update the pl310 initialization settings
Besides enabling prefetch we also need to enable the shared
attribute override bit in the pl310's aux control register.
Not having this bit set makes the platform non-compliant with the
ARMv7 ARM specified behaviour of conflicting memory aliases.
Without this bit set the L2 controller will turn userspace bufferable
reads into "cachable, no allocate" which can lead to userspace hitting
stale, non-evicted cache lines.
Reference kernel commit eeedcea69e927857d32aaf089725eddd2c79dd0a
Going to try some builds running on a system bootstraped with an
updated u-boot.
Ouch, that does not sound good. Sounds like a good idea to apply
and test that fix.

I just remembered changes I did in a local tree nearly two years ago.
It enables prefetch on Cortex-A9 for L1 and L2. L2 is then set up to
not prefetch instructions, only data. Unfortunately I don't remember
if it fixed a specific issue I encountered or was more of a try to
speed the machine it up.

Also there's a little diff to enable the SMP bit for the Cortex-A9.
Without that bit ldrex/strex cannot be used. Would be nice to have
an actual SMP-friendly mutex implementation using ldrex/strex.
Jonathan Gray
2016-01-31 07:01:44 UTC
Permalink
Post by Patrick Wildt
Post by Jonathan Gray
Post by Patrick Wildt
Post by Jonathan Gray
Post by Patrick Wildt
Hi,
there are two code points in the v7 pmap where we need the physical
address (to flush secondary cache) and use pmap_extract() instead
of just reading it from the vm page. This diff removes those.
Patrick
When running with this diff on a imx6 cubox two out of three times xenocara
builds have failed with sh dumping core due to bus errors. Once in
Mesa, and once in libXmu. I don't remember that happening before,
but I'll try and so some more builds on it without the diff.
Did you encounter any more issues running that diff? Have you tried
compiling xenocara without after seeing those issues?
So far I have only run into unrelated build issues, like a missing
define or a python script being started without it having executable
permissions.
I wonder if the following solidrun u-boot commit is related
https://github.com/SolidRun/u-boot-imx6/commit/408544d61f230060f18ffe2e06565deadbcf3451
commit 408544d61f230060f18ffe2e06565deadbcf3451
Date: Tue Oct 13 13:56:13 2015 +0200
mx6: solidrun: update the pl310 initialization settings
Besides enabling prefetch we also need to enable the shared
attribute override bit in the pl310's aux control register.
Not having this bit set makes the platform non-compliant with the
ARMv7 ARM specified behaviour of conflicting memory aliases.
Without this bit set the L2 controller will turn userspace bufferable
reads into "cachable, no allocate" which can lead to userspace hitting
stale, non-evicted cache lines.
Reference kernel commit eeedcea69e927857d32aaf089725eddd2c79dd0a
Going to try some builds running on a system bootstraped with an
updated u-boot.
Ouch, that does not sound good. Sounds like a good idea to apply
and test that fix.
I just remembered changes I did in a local tree nearly two years ago.
It enables prefetch on Cortex-A9 for L1 and L2. L2 is then set up to
not prefetch instructions, only data. Unfortunately I don't remember
if it fixed a specific issue I encountered or was more of a try to
speed the machine it up.
Also there's a little diff to enable the SMP bit for the Cortex-A9.
Without that bit ldrex/strex cannot be used. Would be nice to have
an actual SMP-friendly mutex implementation using ldrex/strex.
Does anything change with regards to memory coherency and types of
barriers that are needed if the bit is set?

It isn't clear to me how much we'd have to care about local/global
monitor, snoop control, changes to shareability domain configuration etc
when just running SP.

Loading...