Amazonfail! An amazing illustration of How To Do It Wrong!

Let me see if I can sum up here.

Timeline and posts collection!

On the 10th, people discovered that two prominent gay romance novels had no sales ranking. On the 11th, the sales ranking disappeared from hundreds of gay and lesbian books. Mark R. Probst wrote in and got a response: adult material was being excluded from some searches; the mechanism of the searches depended on the sales ranking. So, to exclude the books from those searches, the sales rank was being pulled. One problem: not all of the books that were pulled actually had adult content. Furthermore, books with identifiable adult content are still nicely visible.

An incomplete listing of books affected, and their content.

Amazon's sales rank is shown in the Product Details section, helpfully labeled Sales Rank.

The affected items are not removed from sales at Amazon. One can evidently even narrow one's search to Books and get the uncensored content. (Anyone got further discussion on that/links? I forgot where I saw that one.) However, from the main whole-site search, if you type in the exact title and author's name, you will not get the book. Look, I can see the reasoning for excluding, say, something with a title like "Love One's Brother: Fisting and You: A Guide for the Gay Man" from searches for "Brother". However, if you're searching for the whole title? Um. No. (Disclaimer: I just made that title up. I doubt it actually exists, especially because Google wanted to know if I wanted the lyrics for "Love the One You're With" instead. Sorry.)
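For the programmers in the audience, the distinction I'm drawing between a broad search and a whole-title search can be sketched in a few lines of Python. This is purely illustrative: the `Book` class, the `adult` flag, and the `search` function are names I'm making up, and none of it reflects how Amazon's search actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    adult: bool = False  # hypothetical flag, not a real Amazon field


def search(catalog, query):
    """Return books whose title contains the query.

    Books flagged as adult are hidden from partial matches,
    but still returned when the query is the whole title.
    """
    q = query.strip().lower()
    results = []
    for book in catalog:
        title = book.title.lower()
        if q not in title:
            continue
        exact = (q == title)
        if book.adult and not exact:
            continue  # keep it out of the casual searcher's way
        results.append(book)
    return results
```

With this, a search for "brother" skips the flagged book, while typing in its full title still finds it.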

All of this is screamingly problematic for a number of reasons. One, GLBT content is not inherently an adult concept, so pulling books covering arbitrary queer* concepts under that excuse is Bad. If you're going to target the queer community that way, at least cop to it. Two, if their aim is to get adult concepts out of range of the casual searcher, they're failing miserably because of the stuff they haven't pulled. Three, they're shooting themselves in the foot by making books that people want to buy harder for those people to find. Four, Amazon is in a position of power as a major online bookseller; silently hiding relevant resources from people who need them is tantamount to censorship. Five, it's implemented shockingly badly: they're stating that it's necessary to not display the sales ranking on books to make the whole crazy thing run right, there's no way for people with over-curious seven-year-olds and sensitive grandmothers to turn it on for the whole site, and there's no way for sex-positive folk to turn it off. Six, they've just upset a whole bunch of authors, including Neil Gaiman (dudes, you do not want to do that!), as well as a very large portion of the internet.

*Queer: I'm one of those bisexuals who likes the word, although I understand that some people don't care for it.

If this is indeed a legitimate attempt to protect the public, it's failing very badly.

Now, as most of my regular readers may know, I hang out with software developers, and have a bit of training in that department myself. I am also an LJ volunteer, and my social crowd includes parts of LiveJournal's Schools team. I am also fresh from reading this post on the complexity of implementation (coincidentally by someone who's programmed for Amazon). Why do I mention this? Because this gives me a little slice of insight as to how things probably went down.

I am guessing, here, that for business reasons beyond my ability to know (but possibly a horde of letter-writing angstmuffins who think that adult content on the internet should be abolished, or perhaps some CEO's Aged Parent threw a fit), someone at Amazon made the decision that they should have the ability to remove certain items and/or classes of content from appearing to the casual searcher.

Any decision like that generally has to be run past one's programmers. If Amazon's programmers made any changes (I don't even know if they did!), those changes were very poorly done. If this had been done well, it would have been done without leaving any obvious traces like the Amazon Sales Rank being pulled from the book's sales page. That says to me that this is either something that was done without involving the programmers, or it was of a complexity and urgency to be shoved through as a fucking kludge. It seems like it ought to be simple to make something non-obvious. I don't know the guts of Amazon's system, but if hiding a book involves pulling its sales rank, evidently it is non-trivial as it is coded now.

If I were going to write the specifications for a system to filter the results served to visitors and yet still sell the things I was protecting the extra-sensitive visitors from, I would first make the fact that I was filtering things public. Then I would give the visitors a toggle to turn it off or keep it on. If I were freaked out about someone's over-inquisitive small child noticing it and turning off the filtering, I would make it one of those bloody Amazon options that make you enter the password before touching.
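Here is a minimal sketch of that spec in Python, under a pile of assumptions: the `Account` class, the `set_filter` method, and the `visible` helper are all hypothetical names of mine, the filter defaulting to "on" is my guess at a sensible default, and the password handling is only there to show the "enter your password before touching this" idea, not to describe Amazon's account system.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


def _hash(password: str, salt: bytes) -> bytes:
    # Standard-library key derivation; the iteration count is an arbitrary choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class Account:
    salt: bytes
    password_hash: bytes
    filter_adult: bool = True  # assumption: filtering is on until the user opts out

    @classmethod
    def create(cls, password: str) -> "Account":
        salt = os.urandom(16)
        return cls(salt, _hash(password, salt))

    def set_filter(self, enabled: bool, password: str) -> bool:
        """Change the filter setting only if the password checks out."""
        if not hmac.compare_digest(_hash(password, self.salt), self.password_hash):
            return False  # wrong password: setting is left untouched
        self.filter_adult = enabled
        return True


def visible(items, account):
    """items is an iterable of (title, is_adult) pairs."""
    return [title for title, is_adult in items
            if not (account.filter_adult and is_adult)]
```

The point of the design is that the filter is a visible, per-account preference rather than something done silently to the catalog: the over-inquisitive small child can find the toggle but can't flip it, and the sex-positive adult can.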

And even with the means to pull the books, they're pulling the wrong ones and not pulling the right ones. It's fairly apparent at this point that they're using keywords related to sex and LGBT issues in order to do the pulling, without regard to the actual content of the book. (See: "Heather Has Two Mommies".)

But even if they could be assumed to be operating in good faith, they're still doing it badly, by using keywords as they are. And they very much must be using some form of keyword search, for practical reasons -- the sheer volume of books in their store is such that it would not be practical, or perhaps possible, to get an employee to read each one under question and pass judgment based on some sane standard like graphic depiction of sex or violence. It might not even be feasible to have someone read the product page for each book under question. Let's consider LiveJournal's Schools directory for a moment. It is kind of large, but I don't think it's even close to the size of Amazon's catalog; I'd guess it's smaller by at least an order of magnitude. Even so, it takes a team of dedicated volunteers to review and then confirm or deny schools listings. The informal, internal name for the Schools volunteer team is "Team Leaky Canoe". It has taken years for the current schools directory to grow to its current size. I don't know what method Amazon has been using -- perhaps it is whole subcategories, or perhaps it is keywords in the summary, perhaps it is some other means of quick search -- but it's getting painfully obvious that they don't care whether or not something is actually adult.

It's probably not possible to find a keyword, or even a cluster of keywords, that accomplishes what Amazon claims to want to be doing. Anything of the like will run into the breast cancer problem, or "overblocking" problem: a filter that blocks on the word "breast" also hides the resources about breast cancer. If I were attempting to design a practical system to locate problematic books, I would compile a list of keywords, like the one that powers Google SafeSearch. Then I would run this list of keywords against the full text search that Amazon has for many of its books. Then I would take the results of that search, and possibly do a little more automated crunching to see if I couldn't isolate the likeliest candidates and save some human time. Then I would take whatever that spat out, and display the title and author(s), the category or categories the book is found in, the highlights that were hit, and a paragraph or so of context for the highlights. Then I would get some human, ideally one speaking the language and reasonably versed in the context that the book is coming from, to look at that and make the call: adult or not adult. Then I would have another human run the same check on the same material. In case they disagreed, I'd have a tiebreaker round with a third human. I note that Amazon wouldn't even have to pay highly trained workers a living wage to sit at a desk for eight hours doing this. Amazon could farm it out to Mechanical Turk, which they own.
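The pipeline I just described can be roughed out in Python. Treat this as a sketch under stated assumptions, not anybody's real implementation: `find_hits`, `review`, and `classify` are hypothetical names, the catalog is just a dictionary of title to full text, and the "humans" are stand-in functions that look at the title and the highlighted context and answer adult or not adult.

```python
import re


def find_hits(text, keywords, context=40):
    """Return (keyword, surrounding text) pairs for each whole-word match."""
    hits = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            hits.append((kw, text[start:end]))
    return hits


def review(title, hits, reviewers):
    """Two independent human calls; a third is consulted only on disagreement.

    reviewers is a tuple of three callables (title, hits) -> bool,
    where True means "adult".
    """
    first, second, tiebreaker = reviewers
    a = first(title, hits)
    b = second(title, hits)
    if a == b:
        return a
    return tiebreaker(title, hits)


def classify(catalog, keywords, reviewers):
    """catalog maps title -> full text; returns title -> adult (True/False)."""
    decisions = {}
    for title, text in catalog.items():
        hits = find_hits(text, keywords)
        if not hits:
            decisions[title] = False  # no keyword hit, so no human time spent
            continue
        decisions[title] = review(title, hits, reviewers)
    return decisions
```

The keyword list only decides what a human gets to look at; it never makes the adult/not-adult call itself. That is the difference between this and pulling everything that matches "breast".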

Whether this is a genuine glitch, as they're now claiming, or active malice towards the LGBT community, they're still doing it wrong. LiveJournal had Strikethrough. Amazon has #amazonfail.


