In 2001, I got an itch to write a book. Like many people, I naïvely thought, "I have a book or two in me," as if writing a book is as easy as putting pen to paper. It turns out to be very time consuming, and that's after you've spent countless hours learning and researching and organizing your topic of choice. But I marched on and wrote or co-wrote 10 books in a five-year period. I'm a glutton for punishment.
My day job during that time was programming. I've been programming for 16 years. My whole career I've focused on automating the un-automatable, essentially making computers do things people never thought they could do. By the time I started on my 10th book, I got another kind of itch: I wanted to automate my writing career. I was getting bored with the tedium of writing books, and the money wasn't that good.
But that's absurd, right? How can a computer possibly write something coherent and informative, much less entertaining? The "How can a computer possibly do X?" questions are the ones I've spent my career trying to answer. So, I set out on a quest to create software that could write. It took more effort than writing 10 books put together, but after building a team of 12 people, we were able to use our software to generate more than 100,000 sports-related stories in a nine-month period.
Before I get into specifics with what our software produces, I think it's worth highlighting some of the attributes that make software a great candidate to be a writer:
Software isn't a panacea, though. Not all content can be easily automated (yet). The type of content my company, Automated Insights, has automated is quantitatively oriented. That's the trick. We've automated content by applying meaning to numbers, to data. Sports was the first category we tackled. Sports by their nature are very data heavy. By our internal estimates, 70% of all sports-related articles analyze numbers in one form or another.
Our technology combines a large database of structured data, a real-time feed of stats, a large database of phrases, and algorithms that tie it all together to produce articles from two to eight paragraphs in length. The algorithms look for interesting patterns in the data to determine what to write about.
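To make the idea concrete, here is a minimal sketch of that kind of pipeline in Python. It is not our production system: the phrase table, the `detect_pattern` and `write_recap` names, the margin thresholds, and the sample game are all assumptions made up for illustration. It only shows the general shape: find a pattern in the numbers, then pick a phrase that gives the numbers meaning.

```python
import random

# Hypothetical phrase database: each detected pattern maps to candidate templates.
PHRASES = {
    "blowout": [
        "{winner} rolled past {loser}, {w_pts}-{l_pts}.",
        "{winner} had no trouble with {loser}, winning {w_pts}-{l_pts}.",
    ],
    "close_game": [
        "{winner} edged {loser}, {w_pts}-{l_pts}.",
        "{winner} survived a scare from {loser}, {w_pts}-{l_pts}.",
    ],
    "default": [
        "{winner} beat {loser}, {w_pts}-{l_pts}.",
    ],
}


def detect_pattern(game):
    """Classify a game by final margin (thresholds are illustrative assumptions)."""
    margin = abs(game["home_pts"] - game["away_pts"])
    if margin >= 20:
        return "blowout"
    if margin <= 3:
        return "close_game"
    return "default"


def write_recap(game, rng=None):
    """Turn one structured game record into a one-sentence recap."""
    rng = rng or random.Random()
    home_won = game["home_pts"] > game["away_pts"]
    winner, loser = (
        (game["home"], game["away"]) if home_won else (game["away"], game["home"])
    )
    w_pts = max(game["home_pts"], game["away_pts"])
    l_pts = min(game["home_pts"], game["away_pts"])
    # Choose among the phrasings for this pattern so recaps do not all read alike.
    template = rng.choice(PHRASES[detect_pattern(game)])
    return template.format(winner=winner, loser=loser, w_pts=w_pts, l_pts=l_pts)


# Invented sample record, standing in for a row from a stats feed.
sample_game = {"home": "UNC", "away": "Duke", "home_pts": 85, "away_pts": 60}
print(write_recap(sample_game))
```

A real article would chain many such sentences and weigh many more patterns (streaks, rankings, individual performances), but the pattern-then-phrase structure is the core of it.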
In November of 2010, we launched the StatSheet Network, a collection of 345 websites (one for every Division-I NCAA basketball team) that were fully automated. Check out my favorite team: UNC Tar Heels.
From: radar.oreilly.com