Discussion:
Firefox, malloc(3) and threads
Mark Kettenis
2016-01-22 21:46:39 UTC
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of a spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox, you want to try it too, to make sure it
doesn't break things.

Enjoy,

Mark


Index: rthread.h
===================================================================
RCS file: /cvs/src/lib/librthread/rthread.h,v
retrieving revision 1.54
diff -u -p -r1.54 rthread.h
--- rthread.h 10 Nov 2015 04:30:59 -0000 1.54
+++ rthread.h 22 Jan 2016 21:08:11 -0000
@@ -223,6 +223,7 @@ void _rthread_debug_init(void);
#ifndef NO_PIC
void _rthread_dl_lock(int what);
#endif
+void _thread_malloc_reinit(void);

/* rthread_cancel.c */
void _enter_cancel(pthread_t);
Index: rthread_fork.c
===================================================================
RCS file: /cvs/src/lib/librthread/rthread_fork.c,v
retrieving revision 1.14
diff -u -p -r1.14 rthread_fork.c
--- rthread_fork.c 18 Oct 2015 08:02:58 -0000 1.14
+++ rthread_fork.c 22 Jan 2016 21:08:11 -0000
@@ -82,7 +82,10 @@ _dofork(int is_vfork)
newid = sys_fork();

_thread_arc4_unlock();
- _thread_malloc_unlock();
+ if (newid == 0)
+ _thread_malloc_reinit();
+ else
+ _thread_malloc_unlock();
_thread_atexit_unlock();

if (newid == 0) {
Index: rthread_libc.c
===================================================================
RCS file: /cvs/src/lib/librthread/rthread_libc.c,v
retrieving revision 1.12
diff -u -p -r1.12 rthread_libc.c
--- rthread_libc.c 7 Apr 2015 01:27:07 -0000 1.12
+++ rthread_libc.c 22 Jan 2016 21:08:11 -0000
@@ -152,18 +152,35 @@ _thread_mutex_destroy(void **mutex)
/*
* the malloc lock
*/
-static struct _spinlock malloc_lock = _SPINLOCK_UNLOCKED;
+static struct pthread_mutex malloc_lock = {
+ _SPINLOCK_UNLOCKED,
+ TAILQ_HEAD_INITIALIZER(malloc_lock.lockers),
+ PTHREAD_MUTEX_DEFAULT,
+ NULL,
+ 0,
+ -1
+};
+static pthread_mutex_t malloc_mutex = &malloc_lock;

void
_thread_malloc_lock(void)
{
- _spinlock(&malloc_lock);
+ pthread_mutex_lock(&malloc_mutex);
}

void
_thread_malloc_unlock(void)
{
- _spinunlock(&malloc_lock);
+ pthread_mutex_unlock(&malloc_mutex);
+}
+
+void
+_thread_malloc_reinit(void)
+{
+ malloc_lock.lock = _SPINLOCK_UNLOCKED_ASSIGN;
+ TAILQ_INIT(&malloc_lock.lockers);
+ malloc_lock.owner = NULL;
+ malloc_lock.count = 0;
}

/*
Martin Natano
2016-01-23 14:53:32 UTC
Yes! This absolutely makes Youtube videos watchable for me (on a
Thinkpad T520). There is still occasional stuttering, but _far_ less
disruptive than before. Another use case where I see improvements is
reloading a resource-heavy web page while switching tabs. Before
applying the patch, this caused the browser to hang for several seconds.
Now it doesn't.

The patch reads fine too, although I'm not an rthread expert. It doesn't
seem to break anything on my system either.

Thanks,
natano
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
Daniel Bolgheroni
2016-01-25 23:00:27 UTC
Post by Martin Natano
Yes! This absolutely makes Youtube videos watchable for me (on a
Thinkpad T520). There still is occassional stuttering, but _far_ less
disruptive than before. Another usecase where I see improvements is
reloading a resource-heavy web page while switching tabs. Before
applying the patch, this caused the browser to hang for several seconds.
Now it doesn't.
The same here on a ThinkPad T420.

dmesg:
OpenBSD 5.9-beta (GENERIC.MP) #0: Mon Jan 25 19:14:50 BRST 2016
***@iron.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8451125248 (8059MB)
avail mem = 8190803968 (7811MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (65 entries)
bios0: vendor LENOVO version "83ET70WW (1.40 )" date 06/12/2012
bios0: LENOVO 4180DL4
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA SSDT SSDT DMAR UEFI UEFI UEFI
acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EHC1(S3) EHC2(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.32 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.92 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 5 (EXP4)
acpiprt5 at acpi0: bus 13 (EXP5)
acpicpu0 at acpi0: C3(***@104 ***@0x415), C1(***@1 halt), PSS
acpicpu1 at acpi0: C3(***@104 ***@0x415), C1(***@1 halt), PSS
acpicpu2 at acpi0: C3(***@104 ***@0x415), C1(***@1 halt), PSS
acpicpu3 at acpi0: C3(***@104 ***@0x415), C1(***@1 halt), PSS
acpipwrres0 at acpi0: PUBS, resource for EHC1, EHC2
acpitz0 at acpi0: critical temperature is 98 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpibat0 at acpi0: BAT0 model "42T4710" serial 1694 type LION oem "SANYO"
acpibat1 at acpi0: BAT1 not present
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
cpu0: Enhanced SpeedStep 2492 MHz: speeds: 2501, 2500, 2200, 2000, 1800, 1600, 1400, 1200, 1000, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 3000" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1600x900
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 6 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
puc0 at pci0 dev 22 function 3 "Intel 6 Series KT" rev 0x04: ports: 1 com
com4 at puc0 port 0 apic 2 int 19: ns16550a, 16 byte fifo
com4: probed fifo depth: 0 bytes
em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 00:21:cc:ba:e3:5d
ehci0 at pci0 dev 26 function 0 "Intel 6 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 6 Series HD Audio" rev 0x04: msi
azalia0: codecs: Conexant CX20590, Conexant/0x2c06, Intel/0x2805, using Conexant CX20590
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 6 Series PCIE" rev 0xb4: msi
pci1 at ppb0 bus 2
ppb1 at pci0 dev 28 function 1 "Intel 6 Series PCIE" rev 0xb4: msi
pci2 at ppb1 bus 3
iwn0 at pci2 dev 0 function 0 "Intel Centrino Advanced-N 6205" rev 0x34: msi, MIMO 2T2R, MoW, address 10:0b:a9:96:72:30
ppb2 at pci0 dev 28 function 3 "Intel 6 Series PCIE" rev 0xb4: msi
pci3 at ppb2 bus 5
ppb3 at pci0 dev 28 function 4 "Intel 6 Series PCIE" rev 0xb4: msi
pci4 at ppb3 bus 13
sdhc0 at pci4 dev 0 function 0 "Ricoh 5U822 SD/MMC" rev 0x08: apic 2 int 16
sdmmc0 at sdhc0
ehci1 at pci0 dev 29 function 0 "Intel 6 Series USB" rev 0x04: apic 2 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel QM67 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 6 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 0: 3.0Gb/s
ahci0: port 1: 1.5Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, HITACHI HTS72323, EC2Z> SCSI3 0/direct fixed naa.5000cca6d4cc4334
sd0: 305245MB, 512 bytes/sector, 625142448 sectors
cd0 at scsibus1 targ 1 lun 0: <HL-DT-ST, DVDRAM GT33N, LT20> ATAPI 5/cdrom removable
ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x04: apic 2 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-10600 SO-DIMM
spdmem1 at iic0 addr 0x51: 4GB DDR3 SDRAM PC3-10600 SO-DIMM
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics touchpad, firmware 7.2
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
aps0 at isa0 port 0x1600/31
uhub2 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub3 at uhub2 port 2 "Apple, Inc. Keyboard Hub" rev 2.00/96.15 addr 3
ugen0 at uhub2 port 4 "Broadcom Corp Broadcom Bluetooth Device" rev 2.00/7.48 addr 4
uvideo0 at uhub2 port 6 configuration 1 interface 0 "Chicony Electronics Co., Ltd. Integrated Camera" rev 2.00/7.52 addr 5
video0 at uvideo0
uhub4 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (67871fe00f1ba2f6.a) swap on sd0b dump on sd0b
uhub3 detached
uhub3 at uhub2 port 2 "Apple, Inc. Keyboard Hub" rev 2.00/96.15 addr 3
uhidev0 at uhub3 port 1 configuration 1 interface 0 "Logitech USB Optical Mouse" rev 2.00/63.00 addr 6
uhidev0: iclass 3/1
ums0 at uhidev0: 3 buttons, Z dir
wsmouse2 at ums0 mux 0
uhidev1 at uhub3 port 2 configuration 1 interface 0 "Apple, Inc Apple Keyboard" rev 2.00/0.71 addr 7
uhidev1: iclass 3/1
ukbd0 at uhidev1: 8 variable keys, 5 key codes, country code 33
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev2 at uhub3 port 2 configuration 1 interface 1 "Apple, Inc Apple Keyboard" rev 2.00/0.71 addr 7
uhidev2: iclass 3/0
uhid0 at uhidev2: input=1, output=0, feature=0
wsmouse2 detached
ums0 detached
uhidev0 detached
wskbd1: disconnecting from wsdisplay0
wskbd1 detached
ukbd0 detached
uhidev1 detached
uhid0 detached
uhidev2 detached
uhub3 detached
ugen0 detached
video0 detached
uvideo0 detached
uhub2 detached
uhub0 detached
uhub4 detached
uhub1 detached
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
uhub2 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub3 at uhub2 port 2 "Apple, Inc. Keyboard Hub" rev 2.00/96.15 addr 3
uhidev0 at uhub3 port 1 configuration 1 interface 0 "Logitech USB Optical Mouse" rev 2.00/63.00 addr 4
uhidev0: iclass 3/1
ums0 at uhidev0: 3 buttons, Z dir
wsmouse2 at ums0 mux 0
uhidev1 at uhub3 port 2 configuration 1 interface 0 "Apple, Inc Apple Keyboard" rev 2.00/0.71 addr 5
uhidev1: iclass 3/1
ukbd0 at uhidev1: 8 variable keys, 5 key codes, country code 33
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev2 at uhub3 port 2 configuration 1 interface 1 "Apple, Inc Apple Keyboard" rev 2.00/0.71 addr 5
uhidev2: iclass 3/0
uhid0 at uhidev2: input=1, output=0, feature=0
ugen0 at uhub2 port 4 "Broadcom Corp Broadcom Bluetooth Device" rev 2.00/7.48 addr 6
uvideo0 at uhub2 port 6 configuration 1 interface 0 "Chicony Electronics Co., Ltd. Integrated Camera" rev 2.00/7.52 addr 7
video0 at uvideo0
uhub4 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
--
db
Jaime Tarrant
2016-01-24 01:34:13 UTC
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
[snip]

Hi Mark,

I have applied your patch and noticed a big improvement with YouTube
videos, and, if I am not mistaken, content-heavy websites like news
sites seem to load faster and more smoothly too.

This machine is a 2009 Macbook Pro running -Current. I will patch my
-Current server as well and let you know if I notice anything good or
bad.

Awesome! Thanks!!
Adam Wolk
2016-01-24 18:47:38 UTC
On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
Applied to Jan 15th snapshot sources. YouTube is not fully 'watchable'
in Firefox, but it feels significantly better. I can also now watch
full-screen YouTube videos in Chromium at 1920x1080 with no stutter
(Lenovo G50-70).

Generally GNOME 3 feels a bit snappier, especially on first load:
bringing up the menu and searching for 'terminal' renders the results
faster. This might just be my imagination.

On a more measurable front, I ran the Octane benchmark in Firefox before
and after the patch. The score improved slightly, from 12486 to
12826 [1].

cpu0: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.93 MHz
cpu1: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
cpu2: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
cpu3: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 1895.62 MHz
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x0b
running Intel Haswell Mobile for the gfx card.

Regards,
Adam

[1] - https://twitter.com/mulander/status/691327370985345024
Ville Valkonen
2016-01-24 20:05:23 UTC
Post by Adam Wolk
On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
Applied to a Jan 15h snapshot sources. Youtube is not fully 'watchable'
on firefox but feels significantly better. I can also now watch full
screen youtube videos on chromium 1920x1080 with no stutter (lenovo
g50-70).
Generally gnome 3 feels a bit snappier especially on first load,
bringing up the menu searching for 'terminal' leads to a faster
rendering of the results. This might be just 'imagined' by me.
On a more measurable front. I ran the octane benchmark against firefox
post and before the patch. It resulted in a slight improvement from
12486 to 12826 score [1].
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x0b
running Intel Haswell Mobile for the gfx card.
Regards,
Adam
[1] - https://twitter.com/mulander/status/691327370985345024
Hi,

Pretty much the same results here, though I'm running a Lenovo X250
with an i7-5600U.

Thank you, Mark, nice find.

--
Regards,
Ville Valkonen
David Coppa
2016-01-25 09:06:22 UTC
Post by Adam Wolk
On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
Applied to a Jan 15h snapshot sources. Youtube is not fully 'watchable'
on firefox but feels significantly better. I can also now watch full
screen youtube videos on chromium 1920x1080 with no stutter (lenovo
g50-70).
Generally gnome 3 feels a bit snappier especially on first load,
bringing up the menu searching for 'terminal' leads to a faster
rendering of the results. This might be just 'imagined' by me.
On a more measurable front. I ran the octane benchmark against firefox
post and before the patch. It resulted in a slight improvement from
12486 to 12826 score [1].
Besides performance-related issues, the problem we saw in the past was
Firefox using a huge amount of CPU resources for no apparent
reason...
So please also test whether you still see this erratic behavior with
Mark's patch applied.

ciao,
David
Juan Francisco Cantero Hurtado
2016-01-25 18:37:33 UTC
Post by David Coppa
Post by Adam Wolk
On Fri, 22 Jan 2016 22:46:39 +0100 (CET)
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
Applied to a Jan 15h snapshot sources. Youtube is not fully 'watchable'
on firefox but feels significantly better. I can also now watch full
screen youtube videos on chromium 1920x1080 with no stutter (lenovo
g50-70).
Generally gnome 3 feels a bit snappier especially on first load,
bringing up the menu searching for 'terminal' leads to a faster
rendering of the results. This might be just 'imagined' by me.
On a more measurable front. I ran the octane benchmark against firefox
post and before the patch. It resulted in a slight improvement from
12486 to 12826 score [1].
Besides performance related issues, the problem we saw in the past was
firefox using a huge amount of CPU resources with no apparent
reasons...
I've seen the same behavior on Linux. Probably not 100% related to the
OS.
--
Juan Francisco Cantero Hurtado http://juanfra.info
Peter N. M. Hansteen
2016-01-24 22:10:41 UTC
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
I've been running this since early Saturday, and Firefox is definitely
more responsive than before.

I haven't tried running other resource hogs such as LibreOffice with
several large documents, but I guess I could try that too if it's a
relevant scenario.

- P
--
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
Mark Kettenis
2016-01-25 07:48:21 UTC
Date: Sun, 24 Jan 2016 23:10:41 +0100
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Running this since early Saturday, Firefox is definitely more responsive
than earlier.
I haven't tried running other resource hogs such as LibreOffice with
several large documents, but I guess I could try that too if it's a
relevant scenario.
Please do!
Landry Breuil
2016-01-25 08:57:37 UTC
Post by Mark Kettenis
Date: Sun, 24 Jan 2016 23:10:41 +0100
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Running this since early Saturday, Firefox is definitely more responsive
than earlier.
I haven't tried running other resource hogs such as LibreOffice with
several large documents, but I guess I could try that too if it's a
relevant scenario.
Please do!
Albeit small, x11/xfce4/thunar makes heavy use of threads (in general,
and even more when talking to gvfs mounts). It now feels 200% snappier.

Landry
Landry Breuil
2016-01-26 21:51:03 UTC
Post by Landry Breuil
Post by Mark Kettenis
Date: Sun, 24 Jan 2016 23:10:41 +0100
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Running this since early Saturday, Firefox is definitely more responsive
than earlier.
I haven't tried running other resource hogs such as LibreOffice with
several large documents, but I guess I could try that too if it's a
relevant scenario.
Please do!
Albeit small, x11/xfce4/thunar makes a heavy use of threads (in general,
and even more when talking to gvfs mounts). It feels now 200% snappier.
Another successful test on i386, where Firefox had become totally
unusable (Atom N270, 1GB RAM): with the latest snapshot (including the
diff) it's sort of usable (Google Maps, Google News...). Yay!

Landry

l***@ggp2.com
2016-01-25 13:00:34 UTC
I haven't tried anything too scientific yet, but pages seem to load
quicker and Firefox seems more responsive under load for me.
Before this patch, loading a complex page tended to lock up the
browser for a few seconds.

Nothing seems to have broken, so I'll try harder.
Edd Barrett
2016-01-25 16:03:26 UTC
Hi Mark,
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
I tried your diff. Nothing bad happened.

I don't notice much difference in firefox using a highly unscientific
"gut-feeling" before and after test. Youtube videos still stutter -- too
much to watch. During this time firefox uses ~170% CPU.

I also tried Iridium, my everyday browser, and didn't notice a difference
there either. YouTube video performance remains the same: much better
than Firefox, but still skipping frequently.

My system is a thinkpad x240t tablet. Dmesg follows (sorry about the
suspend in there: I have to perform a zzz and wake before the HDMI2
output shows up in my docking station, so it's always the first thing I
do after booting fresh -- keep meaning to look into this):

OpenBSD 5.9-beta (GENERIC.MP) #17: Mon Jan 25 14:31:46 GMT 2016
***@wilfred.dlink.com:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16844521472 (16064MB)
avail mem = 16329822208 (15573MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdae9c000 (68 entries)
bios0: vendor LENOVO version "GCETA2WW (2.62 )" date 04/09/2015
bios0: LENOVO 3437CTO
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC TCPA SSDT SSDT SSDT HPET APIC MCFG ECDT FPDT ASF! UEFI UEFI POAT SSDT SSDT DMAR UEFI DBG2
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP3(S4) XHCI(S3) EHC1(S3) EHC2(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.59 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.11 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 4 (EXP3)
acpicpu0 at acpi0: C2(***@80 ***@0x20), C1(***@1 mwait.1), PSS
acpicpu1 at acpi0: C2(***@80 ***@0x20), C1(***@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2
acpitz0 at acpi0: critical temperature is 103 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpibat0 at acpi0: BAT0 model "45N1077" serial 14278 type LION oem "SANYO"
acpibat1 at acpi0: BAT1 not present
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
acpidock0 at acpi0: GDCK docked (15)
cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2500, 2400, 2300, 2200, 2100, 2000, 1900, 1800, 1700, 1600, 1500, 1400, 1300, 1200 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 3G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4000" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1366x768
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 7 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel 82579LM" rev 0x04: msi, address 3c:97:0e:a5:02:69
ehci0 at pci0 dev 26 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 7 Series HD Audio" rev 0x04: msi
azalia0: codecs: Realtek ALC269, Intel/0x2806, using Realtek ALC269
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 7 Series PCIE" rev 0xc4: msi
pci1 at ppb0 bus 2
sdhc0 at pci1 dev 0 function 0 "Ricoh 5U822 SD/MMC" rev 0x07: apic 2 int 16
sdmmc0 at sdhc0
ppb1 at pci0 dev 28 function 1 "Intel 7 Series PCIE" rev 0xc4: msi
pci2 at ppb1 bus 3
iwn0 at pci2 dev 0 function 0 "Intel Centrino Wireless-N 2200" rev 0xc4: msi, MIMO 2T2R, BGN, address 9c:4e:36:b8:f8:f8
ppb2 at pci0 dev 28 function 2 "Intel 7 Series PCIE" rev 0xc4: msi
pci3 at ppb2 bus 4
ehci1 at pci0 dev 29 function 0 "Intel 7 Series USB" rev 0x04: apic 2 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel QM77 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 7 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
ahci0: port 1: 1.5Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Crucial_CT240M50, MU02> SCSI3 0/direct fixed naa.500a0751093c44b5
sd0: 228936MB, 512 bytes/sector, 468862128 sectors, thin
cd0 at scsibus1 targ 1 lun 0: <MATSHITA, DVD-RAM UJ8C2, SB01> ATAPI 5/cdrom removable
ichiic0 at pci0 dev 31 function 3 "Intel 7 Series SMBus" rev 0x04: apic 2 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800 SO-DIMM
spdmem1 at iic0 addr 0x51: 8GB DDR3 SDRAM PC3-12800 SO-DIMM
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
uhub2 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub2 port 2 configuration 1 interface 0 "Logitech USB Receiver" rev 2.00/12.01 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub2 port 2 configuration 1 interface 1 "Logitech USB Receiver" rev 2.00/12.01 addr 3
uhidev1: iclass 3/1, 8 report ids
ums0 at uhidev1 reportid 2: 16 buttons, Z and W dir
wsmouse1 at ums0 mux 0
uhid0 at uhidev1 reportid 3: input=4, output=0, feature=0
uhid1 at uhidev1 reportid 4: input=1, output=0, feature=0
uhid2 at uhidev1 reportid 8: input=1, output=0, feature=0
uhidev2 at uhub2 port 2 configuration 1 interface 2 "Logitech USB Receiver" rev 2.00/12.01 addr 3
uhidev2: iclass 3/0, 33 report ids
uhid3 at uhidev2 reportid 16: input=6, output=6, feature=0
uhid4 at uhidev2 reportid 17: input=19, output=19, feature=0
uhid5 at uhidev2 reportid 32: input=14, output=14, feature=0
uhid6 at uhidev2 reportid 33: input=31, output=31, feature=0
ugen0 at uhub2 port 4 "Broadcom Corp BCM20702A0" rev 2.00/1.12 addr 4
uvideo0 at uhub2 port 6 configuration 1 interface 0 "Chicony Electronics Co., Ltd. Integrated Camera" rev 2.00/5.20 addr 5
video0 at uvideo0
uhub3 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub4 at uhub3 port 3 "Standard Microsystems product 0x2514" rev 2.00/0.00 addr 3
uhidev3 at uhub3 port 5 configuration 1 interface 0 "Tablet ISD-V4" rev 1.10/6.11 addr 4
uhidev3: iclass 3/1, 2 report ids
ums1 at uhidev3 reportid 1: 2 buttons
wsmouse2 at ums1 mux 0
ums2 at uhidev3 reportid 2: 3 buttons, tip, barrel, eraser
wsmouse3 at ums2 mux 0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (a81068940d057f4c.a) swap on sd0b dump on sd0b
uhidev4 at uhub4 port 4 configuration 1 interface 0 "SEJIN SEJIN USB joint Keyboard" rev 1.10/1.30 addr 5
uhidev4: iclass 3/1
ukbd1 at uhidev4: 8 variable keys, 6 key codes
wskbd2 at ukbd1 mux 1
wskbd2: connecting to wsdisplay0
wskbd1: disconnecting from wsdisplay0
wskbd1 detached
ukbd0 detached
uhidev0 detached
wsmouse1 detached
ums0 detached
uhid0 detached
uhid1 detached
uhid2 detached
uhidev1 detached
uhid3 detached
uhid4 detached
uhid5 detached
uhid6 detached
uhidev2 detached
ugen0 detached
video0 detached
uvideo0 detached
uhub2 detached
uhub0 detached
wskbd2: disconnecting from wsdisplay0
wskbd2 detached
ukbd1 detached
uhidev4 detached
uhub4 detached
wsmouse2 detached
ums1 detached
wsmouse3 detached
ums2 detached
uhidev3 detached
uhub3 detached
uhub1 detached
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
uhub2 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub2 port 2 configuration 1 interface 0 "Logitech USB Receiver" rev 2.00/12.01 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub2 port 2 configuration 1 interface 1 "Logitech USB Receiver" rev 2.00/12.01 addr 3
uhidev1: iclass 3/1, 8 report ids
ums0 at uhidev1 reportid 2: 16 buttons, Z and W dir
wsmouse1 at ums0 mux 0
uhid0 at uhidev1 reportid 3: input=4, output=0, feature=0
uhid1 at uhidev1 reportid 4: input=1, output=0, feature=0
uhid2 at uhidev1 reportid 8: input=1, output=0, feature=0
uhidev2 at uhub2 port 2 configuration 1 interface 2 "Logitech USB Receiver" rev 2.00/12.01 addr 3
uhidev2: iclass 3/0, 33 report ids
uhid3 at uhidev2 reportid 16: input=6, output=6, feature=0
uhid4 at uhidev2 reportid 17: input=19, output=19, feature=0
uhid5 at uhidev2 reportid 32: input=14, output=14, feature=0
uhid6 at uhidev2 reportid 33: input=31, output=31, feature=0
ugen0 at uhub2 port 4 "Broadcom Corp BCM20702A0" rev 2.00/1.12 addr 4
uvideo0 at uhub2 port 6 configuration 1 interface 0 "Chicony Electronics Co., Ltd. Integrated Camera" rev 2.00/5.20 addr 5
video0 at uvideo0
uhub3 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub4 at uhub3 port 3 "Standard Microsystems product 0x2514" rev 2.00/0.00 addr 3
uhidev3 at uhub3 port 5 configuration 1 interface 0 "Tablet ISD-V4" rev 1.10/6.11 addr 4
uhidev3: iclass 3/1, 2 report ids
ums1 at uhidev3 reportid 1: 2 buttons
wsmouse2 at ums1 mux 0
ums2 at uhidev3 reportid 2: 3 buttons, tip, barrel, eraser
wsmouse3 at ums2 mux 0
uhidev4 at uhub4 port 4 configuration 1 interface 0 "SEJIN SEJIN USB joint Keyboard" rev 1.10/1.30 addr 5
uhidev4: iclass 3/1
ukbd1 at uhidev4: 8 variable keys, 6 key codes
wskbd2 at ukbd1 mux 1
wskbd2: connecting to wsdisplay0
--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk
Stefan Wollny
2016-01-25 20:20:39 UTC
Hi Mark,

even with 16GB of RAM I needed to install smtube to get decent video
playback before your patches. I patched last night, but only tonight
was I able to do some testing:

At present I have open:
- LibreOffice Writer with one doc
- LibreOffice Calc with one doc
- gimp with one picture
- Pidgin-OTR
- smplayer (nothing playing)
- Thunderbird (two mail boxes)
- Firefox with 10 tabs open, one of them being YT (Theo talking about
pledge at Hackfest 2015)

Even though YT hangs every now and then, it is now perfectly possible
to watch, listen to and follow the presentation, although I have just a
modest line. CPU usage (as shown by 'top') peaked at around 160%, but
the average seems to be around 100%.

I didn't notice any drawbacks from your patches. Every program is
responsive; only Thunderbird had some delays while I was typing this
post (listening to Theo meanwhile).

While this is not a "serious" test in academic terms (it is not 100%
repeatable), I can report that I didn't come across any failures.
Instead, the system "feels" highly responsive with every task I tried.

To summarize: THANK YOU!

Best,
STEFAN



OpenBSD 5.9-beta (GENERIC.MP) #1863: Sun Jan 24 21:35:42 MST 2016
***@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17082359808 (16291MB)
avail mem = 16560455680 (15793MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb500 (35 entries)
bios0: vendor American Megatrends Inc. version "1.05.01" date 08/05/2015
bios0: Notebook W65_67SZ
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT ASF! SSDT SSDT SSDT MCFG HPET SSDT SSDT SSDT DMAR
acpi0: wakeup devices PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) RLAN(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) RP07(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3093.23 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz, 3092.84 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (RP01)
acpiprt2 at acpi0: bus 3 (RP03)
acpiprt3 at acpi0: bus 4 (RP04)
acpiprt4 at acpi0: bus 1 (P0P2)
acpiprt5 at acpi0: bus -1 (P0PA)
acpiprt6 at acpi0: bus -1 (P0PB)
acpiprt7 at acpi0: bus 1 (PEG0)
acpiec0 at acpi0
acpicpu0 at acpi0: C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpicpu1 at acpi0: C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpicpu2 at acpi0: C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpicpu3 at acpi0: C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpitz0 at acpi0: critical temperature is 120 degC
acpibtn0 at acpi0: PWRB
acpibtn1 at acpi0: SLPB
acpibtn2 at acpi0: LID0
acpiac0 at acpi0: AC unit online
acpibat0 at acpi0: BAT0 model "BAT" serial 0001 type LION oem "Notebook"
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: LCD0
cpu0: Enhanced SpeedStep 3093 MHz: speeds: 2601, 2600, 2500, 2300, 2200, 2100, 2000, 1800, 1700, 1600, 1400, 1300, 1200, 1100, 900, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
ppb0 at pci0 dev 1 function 0 "Intel Core 4G PCIE" rev 0x06: msi
pci1 at ppb0 bus 1
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1920x1080
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi
xhci0 at pci0 dev 20 function 0 "Intel 8 Series xHCI" rev 0x05: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x05: apic 2 int 16
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia1 at pci0 dev 27 function 0 "Intel 8 Series HD Audio" rev 0x05: msi
azalia1: codecs: VIA/0x8446
audio0 at azalia1
ppb1 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd5
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 2 "Intel 8 Series PCIE" rev 0xd5: msi
pci3 at ppb2 bus 3
iwm0 at pci3 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0xbb, msi
ppb3 at pci0 dev 28 function 3 "Intel 8 Series PCIE" rev 0xd5: msi
pci4 at ppb3 bus 4
rtsx0 at pci4 dev 0 function 0 "Realtek RTL8411 Card Reader" rev 0x01: msi
sdmmc0 at rtsx0
re0 at pci4 dev 0 function 2 "Realtek 8168" rev 0x0a: RTL8411 (0x4880), msi, address 80:fa:5b:13:a0:ad
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x05: apic 2 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel HM86 LPC" rev 0x05
ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x05: msi, AHCI 1.3
ahci0: port 0: 1.5Gb/s
ahci0: port 4: 6.0Gb/s
ahci0: port 5: 6.0Gb/s
scsibus1 at ahci0: 32 targets
cd0 at scsibus1 targ 0 lun 0: <TSSTcorp, CDDVDW SN-208FB, SB00> ATAPI 5/cdrom removable
sd0 at scsibus1 targ 4 lun 0: <ATA, Samsung SSD 850, EMT4> SCSI3 0/direct fixed naa.5002538d402ece0c
sd0: 114473MB, 512 bytes/sector, 234441648 sectors, thin
sd1 at scsibus1 targ 5 lun 0: <ATA, Samsung SSD 850, EXM0> SCSI3 0/direct fixed naa.500253887007d4c5
sd1: 976762MB, 512 bytes/sector, 2000409264 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x05: apic 2 int 18
iic0 at ichiic0
iic0: addr 0x18 00=00 01=00 02=00 03=00 04=00 05=c2 06=1b 07=0a 08=00 09=00 0a=00 0b=00 0c=00 0d=00 0e=00 0f=00 words 00=007f 01=0000 02=0000 03=0000 04=0000 05=c2ac 06=1b09 07=0a01
spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-14200 SO-DIMM with thermal sensor
spdmem1 at iic0 addr 0x52: 8GB DDR3 SDRAM PC3-14200 SO-DIMM with thermal sensor
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
error: [drm:pid0:intel_uncore_check_errors] *ERROR* Unclaimed register before interrupt
uhub3 at uhub0 port 6 "Genesys Logic USB2.0 Hub" rev 2.00/9.01 addr 2
uhidev0 at uhub3 port 2 configuration 1 interface 0 "Logitech Trackball" rev 1.10/2.20 addr 3
uhidev0: iclass 3/1
ums0 at uhidev0: 3 buttons, Z dir
wsmouse1 at ums0 mux 0
umass0 at uhub3 port 3 configuration 1 interface 0 "General UDisk" rev 2.00/1.00 addr 4
umass0: using SCSI over Bulk-Only
scsibus2 at umass0: 2 targets, initiator 0
sd2 at scsibus2 targ 1 lun 0: <General, UDisk, 5.00> SCSI2 0/direct removable serial.abcd1234245144257801
sd2: 7681MB, 512 bytes/sector, 15730688 sectors
uhidev1 at uhub3 port 4 configuration 1 interface 0 "Logitech USB Receiver" rev 2.00/12.01 addr 5
uhidev1: iclass 3/1
ukbd0 at uhidev1: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev2 at uhub3 port 4 configuration 1 interface 1 "Logitech USB Receiver" rev 2.00/12.01 addr 5
uhidev2: iclass 3/1, 8 report ids
ums1 at uhidev2 reportid 2: 16 buttons, Z and W dir
wsmouse2 at ums1 mux 0
uhid0 at uhidev2 reportid 3: input=4, output=0, feature=0
uhid1 at uhidev2 reportid 4: input=1, output=0, feature=0
uhid2 at uhidev2 reportid 8: input=1, output=0, feature=0
uhidev3 at uhub3 port 4 configuration 1 interface 2 "Logitech USB Receiver" rev 2.00/12.01 addr 5
uhidev3: iclass 3/0, 33 report ids
uhid3 at uhidev3 reportid 16: input=6, output=6, feature=0
uhid4 at uhidev3 reportid 17: input=19, output=19, feature=0
uhid5 at uhidev3 reportid 32: input=14, output=14, feature=0
uhid6 at uhidev3 reportid 33: input=31, output=31, feature=0
ugen0 at uhub0 port 7 "Intel product 0x07dc" rev 2.00/0.01 addr 6
uhub4 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
uhub5 at uhub2 port 1 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
sd3 at scsibus4 targ 1 lun 0: <OPENBSD, SR CRYPTO, 005> SCSI2 0/direct fixed
sd3: 976756MB, 512 bytes/sector, 2000397143 sectors
root on sd3a (27a349ce45a3b091.a) swap on sd3b dump on sd3b
iwm0: hw rev 0x140, fw ver 25.228 (API ver 9), address 7c:5c:f8:3b:80:99
Matthew Via
2016-01-25 23:34:22 UTC
I've had the patch applied for two days now and have not seen any ill
effects. This is a ThinkPad T410 running snapshots.

Before, YouTube was unwatchable. Sound would continue normally while
video would freeze for long stretches, often over 10 seconds. It's not
perfect now, but it's very nearly so when not fullscreen.

It does seem that Firefox's CPU usage is also significantly reduced,
and it is generally snappier.

Thank you!
-via
Post by Mark Kettenis
Firefox makes a lot of concurrent malloc(3) calls. The locking to
make malloc(3) thread-safe is a bit...suboptimal. This diff makes
things better by using a mutex instead of spinlock. If you're running
Firefox you want to try it; it makes video watchable on some machines.
If you're not running Firefox you want to try it; to make sure it
doesn't break things.
Enjoy,
Mark
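For readers less familiar with the change, here is a minimal sketch in C of the idea Mark describes: one lock serialising an allocator, taken as a pthread mutex so that contending threads can block instead of busy-spinning, plus fork(2) handling. It is an illustration under stated assumptions, not librthread's code: malloc_lock, locked_malloc, locked_free, locked_malloc_init and the atfork handlers are hypothetical names, and the sketch simply wraps the system malloc(3) to show where the lock sits. The posted diff reinitialises the lock in the child (_thread_malloc_reinit); this sketch instead follows the pattern POSIX suggests for pthread_atfork, releasing the lock in both parent and child.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical lock and wrapper names; not OpenBSD's internals. */
static pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;

static void
malloc_lock_acquire(void)
{
    int rc = pthread_mutex_lock(&malloc_lock);
    assert(rc == 0);
    (void)rc;   /* silence unused warning when NDEBUG is set */
}

static void
malloc_lock_release(void)
{
    int rc = pthread_mutex_unlock(&malloc_lock);
    assert(rc == 0);
    (void)rc;
}

/* Under contention a mutex lets waiters block rather than burn CPU
 * spinning, which is the effect the thread is discussing. */
void *
locked_malloc(size_t n)
{
    void *p;

    malloc_lock_acquire();
    p = malloc(n);
    malloc_lock_release();
    return p;
}

void
locked_free(void *p)
{
    malloc_lock_acquire();
    free(p);
    malloc_lock_release();
}

/* fork(2): hold the lock across fork so no other thread is mid-update,
 * then release it on both sides. In the child only the forking thread
 * survives, and it is the one that holds the lock. */
static void malloc_atfork_prepare(void) { malloc_lock_acquire(); }
static void malloc_atfork_parent(void)  { malloc_lock_release(); }
static void malloc_atfork_child(void)   { malloc_lock_release(); }

/* Register the fork handlers once, before creating threads. */
int
locked_malloc_init(void)
{
    return pthread_atfork(malloc_atfork_prepare, malloc_atfork_parent,
        malloc_atfork_child);
}
```

Call locked_malloc_init() once at startup. The spin-versus-block difference only matters under contention, i.e. the many-threads-allocating pattern Mark attributes to Firefox.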
Nayden Markatchev
2016-01-26 15:43:36 UTC
FYI: This diff has been in the snapshots since Sunday.