Closed Bug 750988 Opened 12 years ago Closed 12 years ago

ASAN: Several xpcshell tests in url-classifier triggers error

Categories

(Core :: XPConnect, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 795281

People

(Reporter: decoder, Unassigned)

Details

(Whiteboard: [asan][asan-test-failure])

Attachments

(2 files)

Attached file AddressSanitizer Log
This xpcshell test at toolkit/components/url-classifier/tests/unit/test_partial.js (and also test_cleankeycache.js and some others) fail with AddressSanitizer on mozilla-central revision 0a796d07499a. The problem looks like being inter-thread where one thread allocates something and the other uses it after it has been freed. I have seen this failure only on the try slaves and was not able to reproduce locally (this could be due to thread scheduling).

However, I was able to extract a symbolized ASAN trace from the try run, the log is attached.

Not sure which component this belongs to exactly, please move as appropriate :) Marking this s-s until triaged and confirmed to be safe.
Component: General → XPConnect
QA Contact: general → xpconnect
Gian-Carlo, can you take a look at this since I see you've updated this test the last five times it was touched. Perhaps you know who would own this area since the bug is hidden?
Is this related to bug 727947? Those traces mean very little to me.

I remember while rewriting this code (unfortunately the rewrite was recently backed out) that I found a case where we're freeing something on the wrong thread. Scourging through my patches, adding the following to nsUrlClassifierDBService.cpp may fixed it:

nsUrlClassifierLookupCallback::~nsUrlClassifierLookupCallback()
{
  nsCOMPtr<nsIThread> thread;
  (void)NS_GetMainThread(getter_AddRefs(thread));

  if (mCallback) {
    (void)NS_ProxyRelease(thread, mCallback, false);
  }
}

Can you confirm this fixes your issue?

Nobody worked in this code for years (and the Google guys who wrote it work on Chrome now). dcamp is the closest thing to a component owner and I've been recently working on (trying to) rewrite it.
Christian, can you build with this change and see if it fixes the issue?
gcp: The build I sent to try failed: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/decoder@own-hero.net-62a4c9de5c19/try-linux64/try-linux64-bm14-try1-build5068.txt.gz

Do you have a proper patch that I can submit to try?

And yes, this could be the same issue as in bug 727947. If you need help understanding the trace just ping me on IRC.
I started two try runs on linux64 opt + xpcshell tests, one with the patch, the other without the patch:

Without: https://tbpl.mozilla.org/?tree=Try&rev=4419705275c3
With: https://tbpl.mozilla.org/?tree=Try&rev=d8d256a55db2

Unfortunately I still see the failures on both runs (exactly the same tests failing).
Whiteboard: [asan] → [asan][asan-test-failure]
Unhiding this so more people can take a look. Even if this is exploitable, it would likely be highly complex and unreliable to do so.
Group: core-security
Whiteboard: [asan][asan-test-failure] → [asan][asan-test-failure][orange]
I'm still seeing this on today's push to try ( https://tbpl.mozilla.org/?tree=Try&rev=46f423c574d3 ), opt-builds only.

TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/url-classifier/tests/unit/test_addsub.js | test failed (with xpcshell return code: 1), see following log:
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/url-classifier/tests/unit/test_partial.js | test failed (with xpcshell return code: 1), see following log:
Is ASAN using a custom gcc compiler? The code in UrlClassifier is known to produce a compiler bug in the Linux-gcc we use. (The gcc on the build machines has patches).
(In reply to Gian-Carlo Pascutto (:gcp) from comment #9)
> Is ASAN using a custom gcc compiler? The code in UrlClassifier is known to
> produce a compiler bug in the Linux-gcc we use. (The gcc on the build
> machines has patches).

No, ASan is part of LLVM, and we're using Clang to produce these builds.
I finally managed to reproduce this locally with the build using:

taskset -c 0 make -C objdir/toolkit/components/url-classifier/tests/ xpcshell-tests

The taskset command limits the xpcshell to a single CPU core which seems to make it more likely to hit the thread scheduling problem we're seeing here.
Here's a backtrace from GDB, when the use-after-free occurs. The GDB trace has more information than the ASan one, so this one might give the necessary hints to figure out what's going on:

Breakpoint 1, 0x0000000000439e50 in __asan_report_error ()
(gdb) bt
#0  0x0000000000439e50 in __asan_report_error ()
#1  0x000000000043a257 in __asan_report_load4 ()
#2  0x00002aaaad46be8c in operator unsigned int (this=<optimized out>) at ../../../dist/include/nsISupportsImpl.h:249
#3  nsXPCWrappedJS::Release (this=<optimized out>) at js/xpconnect/src/XPCWrappedJS.cpp:183
#4  0x00002aaaad98dbb9 in ~nsCOMPtr_base (this=<optimized out>) at ../../../dist/include/nsCOMPtr.h:408
#5  ~nsCOMPtr (this=<optimized out>) at ../../../dist/include/nsCOMPtr.h:447
#6  ~nsCOMPtr (this=<optimized out>) at ../../../dist/include/nsCOMPtr.h:447
#7  ~UrlClassifierCallbackProxy (this=0x2aaac0aeb780) at toolkit/components/url-classifier/nsUrlClassifierProxies.h:191
#8  operator= (aValue=<optimized out>, ptr=<optimized out>, aValue=<optimized out>, this=<optimized out>)
    at toolkit/components/url-classifier/nsUrlClassifierProxies.h:191
#9  UrlClassifierCallbackProxy::Release (this=0x2aaac0aeb780) at toolkit/components/url-classifier/nsUrlClassifierProxies.cpp:200
#10 0x00002aaaad98efcf in ~nsCOMPtr_base (this=<optimized out>) at ../../../dist/include/nsCOMPtr.h:408
#11 ~nsCOMPtr (this=<optimized out>) at ../../../dist/include/nsCOMPtr.h:447
#12 ~nsCOMPtr (this=<optimized out>) at ../../../dist/include/nsCOMPtr.h:447
#13 ~GetTablesRunnable (this=<optimized out>, this=<optimized out>) at toolkit/components/url-classifier/nsUrlClassifierProxies.h:51
#14 ~GetTablesRunnable (this=0x2aaac0aeb880, ptr=<optimized out>) at toolkit/components/url-classifier/nsUrlClassifierProxies.h:51
#15 UrlClassifierDBServiceWorkerProxy::GetTablesRunnable::~GetTablesRunnable (this=0x2aaac0aeb880)
    at toolkit/components/url-classifier/nsUrlClassifierProxies.h:51
#16 0x00002aaaae2f3afb in nsRunnable::Release (this=0x2aaaad46be8c) at nsThreadUtils.cpp:30
#17 0x00002aaaae39a607 in ~nsCOMPtr_base (this=<optimized out>) at ../../dist/include/nsCOMPtr.h:408
#18 ~nsCOMPtr (this=<optimized out>) at ../../dist/include/nsCOMPtr.h:447
#19 ~nsCOMPtr (this=<optimized out>) at ../../dist/include/nsCOMPtr.h:447
#20 nsThread::ProcessNextEvent (this=0x2aaabaa73280, mayWait=true, result=<optimized out>) at xpcom/threads/nsThread.cpp:630
#21 0x00002aaaae2f4271 in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=true) at nsThreadUtils.cpp:220
#22 0x00002aaaae3983be in nsThread::ThreadFunc (arg=0x2aaabaa73280) at xpcom/threads/nsThread.cpp:257
#23 0x00002aaab17443af in _pt_root (arg=0x2aaabaa73a80) at nsprpub/pr/src/pthreads/ptthread.c:156
#24 0x000000000043b6ab in __asan::AsanThread::ThreadStart() ()
#25 0x00002aaaaacd5efc in start_thread (arg=0x2aaac0727700) at pthread_create.c:304
#26 0x00002aaab221759d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#27 0x0000000000000000 in ?? ()
Forgot to mention, the trace is from an opt-build, revision c64a9f342156.
Blocks: 438871
No longer blocks: 438871
Whiteboard: [asan][asan-test-failure][orange] → [asan][asan-test-failure]
I'm not able to reproduce this. (Including with the taskset trick).

Does still happen on try?
Yes, see today's ASan try push with tests at https://tbpl.mozilla.org/?tree=Try&rev=ad6bc2ddffe3

Does the GDB trace in comment 12 help in any way maybe? It seems more accurate than the ASan trace.
Not really. There are similar bugs 727947, bug 795281. I was hoping that one of them being reproducible would allow me to get a failure with full logging in nsUrlClassifierDB/nsUrlClassifierStreamUpdater.
If you have a logging patch, then I could add that, rebuild and see if I can still reproduce locally :)
export NSPR_LOG_MODULES="UrlClassifierDbService:5,UrlClassifierPrefixSet:5,UrlClassifierStreamUpdater:5"

with --enable-logging

I have no idea where to look here, maybe a log accompanying a failing test can give some insight where the timing is critical.

Also, if this is reasonably reproducible for you, can you try https://bugzilla.mozilla.org/show_bug.cgi?id=795281#c16 and tell me if that resolves this issue for you?
Bug 795281 is now on inbound and IIRC your try run with that patch passed xpcshell tests. Can we close this bug?
Flags: needinfo?(choller)
Yes, marking as duplicate :) I'll re-open if any problems appear.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(choller)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: