1) The first time we need an io_context for a task, we get it allocated with refcount 1 and cached in task->io_context; early in do_exit() that reference is dropped and current->io_context is reset to NULL. However, we do exit_mm() and exit_files() _after_ that. And both can generate IO on behalf of our process, leading to a new allocation of io_context, leaving a reference to it in ->io_context. This time there'll be nothing to drop it, AFAICS. I.e. we get a leak of io_context and a leak of the structures dangling off it (the list of cfq_io_context, for instance). A userland model of this ordering bug is sketched after (6) below.

2) When a queue gets cfq set up as its elevator, we get a cfq_data allocated for it. We have cfqd->queue set to our queue and pinned down; it's never modified until cfqd dies, and the queue remains pinned down until then. At the same time queue->elevator->elevator_data is set to cfqd and pins it down. It's never modified and remains pinned down until we get to elevator_exit(). Which happens only when the last reference to the queue goes away or when we explicitly switch elevators. IOW, we get a leak.

3) When we feed a request to cfq, we try to find a cfq_io_context attached to current->io_context with cic->key == the cfq_data of the queue. If it doesn't exist, we allocate it, set its ->key to the cfq_data of the queue and pin the cfq_data down. That pointer is never modified until the cic gets freed. It's _NEVER_ dropped - there is no matching decrement of the refcount on cfq_data. Another leak.

4) We destroy these cfq_io_context when the io_context dies. They are never removed until that point. And they retain a reference to cfq_data in ->cfqd *and* to the queue - in ->cfqd->queue. That queue is not freed, all right - the leak in (2) takes care of that. If the driver decides that the queue should be killed (e.g. on rmmod) it will do blk_cleanup_queue(), which will do nothing since we still have references to it. *HOWEVER*, queue->queue_lock is a different story. It will get freed. Normally that wouldn't be a big deal (there's no IO left on the queue), but... at do_exit() time we call exit_io_context(), which triggers cfq_exit_io_context(), which triggers cfq_exit_single_io_context() for each cfq_io_context we've got on it. And that's where the shit hits the fan:

static void cfq_exit_single_io_context(struct cfq_io_context *cic)
{
        struct cfq_data *cfqd = cic->cfqq->cfqd;
        request_queue_t *q = cfqd->queue;

        WARN_ON(!irqs_disabled());

        spin_lock(q->queue_lock);

        if (unlikely(cic->cfqq == cfqd->active_queue)) {
                __cfq_slice_expired(cfqd, cic->cfqq, 0);
                cfq_schedule_dispatch(cfqd);
        }

        cfq_put_queue(cic->cfqq);
        cic->cfqq = NULL;

        spin_unlock(q->queue_lock);
}

and we do spin_lock() on a spinlock that might have been freed days ago. Remember that a cfq_io_context stays around until the process exits; if some IO on a device that has since gone away was done on our behalf a week ago, it will still be there. To make things even funnier, we have interrupts disabled here.

5) sysfs allows holding a reference to queue and elevator without affecting the queue refcount and lifetime. Set the default iosched to anything other than cfq, so that the leak in (2) won't prevent freeing the queue. Then do the following:

        exec 42<.../queue/nr_requests
        have the device removed (rmmod, whatever)
        exec 42<&-

and the final close decrements a refcount in freed memory and reads from it => reading from freed memory.

6) Switching iosched doesn't prevent somebody else from asking for another switch while this one is going on. Breaks in all sorts of fun ways...
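FWIW, the ordering bug in (1) is easy to model outside the kernel. Here is a minimal userland sketch - all names are invented for illustration, this is not the actual kernel code; the obvious cure is to drop the context only after the last thing that can generate IO:

#include <stdio.h>
#include <stdlib.h>

struct io_context {
        int refcount;
};

static struct io_context *task_ioc;     /* models task->io_context */

/* models get_io_context(): allocate lazily, cache in the task */
static struct io_context *get_io_context(void)
{
        if (!task_ioc) {
                task_ioc = calloc(1, sizeof(*task_ioc));
                task_ioc->refcount = 1;
        }
        return task_ioc;
}

static void put_io_context(struct io_context *ioc)
{
        if (--ioc->refcount == 0)
                free(ioc);
}

/* models generating IO on behalf of the process */
static void do_io(void)
{
        get_io_context();
}

int main(void)
{
        do_io();                        /* first IO: context allocated, cached */

        /* early in do_exit(): drop the cached reference */
        put_io_context(task_ioc);
        task_ioc = NULL;

        do_io();                        /* exit_mm()/exit_files() still generate
                                         * IO: fresh io_context with refcount 1,
                                         * and nothing is left to drop it => leak */
        printf("leaked io_context, refcount %d\n", task_ioc->refcount);
        return 0;
}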
7) ioprio_set() can race with cfq_get_queue(). It's not only possible to miss a new cfq_queue and have it left with the old ioprio, it's possible to get list_for_each() called while another CPU does list_add(). Which is considerably nastier, albeit harder to hit... The reason, of course, is that while cfq_set_request() on its own doesn't need any locking of the cic list (it's process-synchronous and works only with one's own io_context), ioprio_set() is done to other tasks.

8) elv_unregister() doesn't bother with task_lock(); it can race with exit_io_context() freeing task->io_context under it...

9) We have at most one cfq_io_context for a given process and a given queue. We bother with cfq_get_queue() once per cfq_io_context; after we'd set ->cfqq we won't call it again. So if the first operation from our process on a given queue is a write done while the process doesn't have PF_SYNCWRITE set, we'll get the cfq_queue for (that queue, CFQ_KEY_ASYNC, task->ioprio). It will be stored in ->cfqq of the created cfq_io_context and that's it - after that _everything_ (reads, sync writes) for that queue will go to the same cfq_queue. Looks very odd...

10) There's an unpleasant problem with the async queue. Suppose we have 69 processes, originally with the same ->ioprio. All do async writes. All end up with their cfq_io_context pointing to the same cfq_queue; so far so good. Now think what'll happen when we do ioprio_set(2) in one of them. It will get to that queue and happily change its ->ioprio and ->ioprio_class. Oops - we've just bumped the ioprio for async writes of all the other processes...

11) OK, sometimes we boost a cfq_queue's ioprio. Somebody does a hash lookup while the ioprio of an async queue is elevated. What, are they going to be stuck with the lowered ioprio when we go back?

12) Suppose a process has talked both to as-iosched and cfq-iosched queues. We have killed the latter (or switched it to a different iosched). Now all cfq_data, cfq_queue and cfq_request are freed; all remaining cfq_io_context are dummies and hold no pointers (->key and ->cfqq are NULL). The process in question has called exit(); there are some pending requests in the bowels of as-iosched, but the io_context is already detached from the task and is just waiting for the IO to finish - it will be freed at that point. And that's when somebody tries to rmmod cfq. elv_unregister() walks through all tasks and knocks their ->cic out. Except that this io_context is not there anymore - it's detached, and the only references to it are held by the as-iosched requests in flight. So elv_unregister() happily completes and the module is unloaded. Eventually as-iosched is done with it and we get to as_put_io_context(arq) ---> put_io_context(arq->io_context) ---> the last reference goes away and we call ioc->cic->dtor(ioc->cic) - i.e. cfq's destructor, which used to be in the module we'd just removed.

13) There's a narrower race between cfq_exit_io_context() and cfq_exit() - the former can get called in the middle of the latter _and_ last until past the end of rmmod.

14) On top of that, we have rmmod as-iosched knocking out ->io_context->cic of processes that are using cfq right now. And vice versa, of course...

15) More of the same: elv_unregister() leaves task->io_context->set_ioprio as-is... FWIW, the whole idea of ->set_ioprio looks bogus - it points to an iosched method, and the only reason it works at all is that only cfq has it non-NULL.

16) And the fun just keeps coming: the failure exit of blk_init_queue_node() has blk_cleanup_queue(q); followed by freeing q. The thing is, we hold the only reference to q at that point, so it's a double-free. (Modeled in the sketch right below.)
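To make (16) concrete, here's the same pattern in miniature - invented names again, and running this trips the allocator's double-free detection, which is the point:

#include <stdlib.h>

struct queue {
        int refcount;
};

/* models blk_put_queue(): free on the last reference */
static void queue_put(struct queue *q)
{
        if (--q->refcount == 0)
                free(q);
}

/* models blk_cleanup_queue(): all it does is drop a reference */
static void cleanup_queue(struct queue *q)
{
        queue_put(q);
}

int main(void)
{
        struct queue *q = calloc(1, sizeof(*q));
        q->refcount = 1;                /* we hold the only reference */

        /* the failure exit in question: */
        cleanup_queue(q);               /* refcount 1 -> 0, q is freed here */
        free(q);                        /* ...and freed again: double-free */
        return 0;
}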
17) Double elevator_put() if elevator_switch() fails in elv_register_queue(): we do elevator_exit(), followed by an explicit elevator_put(). The former does elevator_put() itself... (and yes, that's the final batch of refcounting fixes - the elevator_t ones).

18) Same pile: use-after-free at the very end of elevator_switch(): we print elevator_type->name after having done elevator_put(). (Modeled after (19) below.)

19) One more: the lack of a proper ->owner on elevator attributes means that we could cause interesting problems with rmmod.
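And the read-after-free in (18), again as a minimal userland sketch (invented struct layout; elevator_put() here only models the real one - ASan or a poisoning allocator will flag the final printf):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct elevator_type {
        int refcount;
        char name[16];
};

/* models elevator_put(): free on the last reference */
static void elevator_put(struct elevator_type *e)
{
        if (--e->refcount == 0)
                free(e);
}

int main(void)
{
        struct elevator_type *e = calloc(1, sizeof(*e));
        e->refcount = 1;
        strcpy(e->name, "cfq");

        elevator_put(e);                        /* may free e right here */
        printf("switched to %s\n", e->name);    /* use after free */
        return 0;
}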