Talk to a Chicagoan about their hometown and the topic will quickly move to weather: “You can get all four seasons in the same day.”  “Nobody moves here for the weather.”  “It’s always changing, so it’s a good conversation piece.”  I simply say that Chicago’s weather is predictably unpredictable.

These quips wouldn’t be quite so funny if they weren’t true.  Chicago weather is a particular gamble during the changeover seasons of autumn and spring.  (The occasional April or October snow says it all.)  You never quite know what to wear until the day is halfway over.

Thankfully, that doesn’t stop Chicago from planning outdoor activities.  Ask anyone who has run the Chicago Marathon: race-day temperatures are often in question till the morning of.

To celebrate the most recent Chicago Marathon, held just this past Monday, we took a quick peek at race-day temperatures over the years.  Here’s what we saw:

No, your eyes do not deceive you: the high temperatures go as low as 40 degrees, while the low temperatures go as high as 70.  Be sure to pack mittens and light clothing!

After our brief knit interlude (“knitterlude?”) we’re back on the restaurant inspection beat.

Restaurant owners have plenty of reasons to want to pass an inspection. For one, if an inspector from the city’s Department of Health and Mental Hygiene (DOHMH) finds enough problems, the restaurant may be subject to a follow-up inspection or even a temporary closure. For another, restaurants will soon have to display letter grades based on their inspection scores. That means potential customers won’t just see a pass/fail mark; they’ll see how well a restaurant passed. And who wants the reputation of being a C-student eatery?

According to a recent New York Times article, proactive restaurants are bringing in consultants to help them stay inspection-ready. The consultants, some of them former DOHMH inspectors themselves, essentially perform a mock inspection to spot potential health code violations before a DOHMH team drops in.  That gives restaurant owners a chance to remedy the problems before they are subject to an official inspection.  It’s like getting an advance look at the big midterm exam, except no one will get expelled for it.

We think this is a great idea, as it never hurts to have experienced guidance. We also think it’s helpful to improve one’s understanding of the situation before calling in expert help. To that end, we explored the restaurant data in search of the top eleven most common critical inspection violations. (Why top eleven, instead of top ten? Well, because it’s one more.) Specifically, we reviewed the inspection data from 02 January 2009 through 27 March 2010. This data covered roughly 55,000 inspections across 23,000 restaurants. If a restaurant could tackle these issues on its own, ahead of time, then its inspection consultants could focus on other issues.
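For the curious, the tally itself is simple. Here’s a minimal sketch in R, assuming the inspection records sit in a data frame with one row per recorded violation; the file name and column names are our own placeholders, not the official DOHMH schema:

```
# Hypothetical file and column names; adjust to match the actual DOHMH export.
inspections <- read.csv("dohmh_inspections.csv", stringsAsFactors = FALSE)

# Keep only the critical violations, then count occurrences of each code.
critical <- subset(inspections, critical_flag == "Critical")
counts   <- sort(table(critical$violation_code), decreasing = TRUE)

# The eleven most common critical violation codes.
head(counts, 11)
```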

Number | Violation Code | Description
11 | 04O | Evidence of, or flying insects in, facility’s food and/or non-food areas.
10 | 05D | Hand-washing facility not provided in or near food preparation area and toilet room. Hot and cold running water at adequate pressure not provided at facility. Soap and an acceptable hand-drying device not provided.
9 | 04N | Evidence of, or live roaches in, facility’s food and/or non-food areas.
8 | 04A | Food Protection Certificate not held by supervisor of food operations.
7 | 06E | Sanitized equipment or utensil, including in-use food-dispensing utensil, improperly used or stored.
6 | 04I | Raw, cooked, or prepared food is adulterated, contaminated, cross-contaminated, and/or not discarded in accordance with the HACCP plan.
5 | 06D | Food-contact surface not properly maintained, or not washed, rinsed, and sanitized after each use and following any activity when contamination may have occurred.
4 | 06C | Food not protected from potential source of contamination during storage, preparation, transportation, display, or service.
3 | 02B | Hot PHF not held at or above 140 degrees Fahrenheit.
2 | 02G | Smoked fish and/or ROP-processed food held above 38 degrees Fahrenheit; other PHF held above 41 degrees Fahrenheit, except during necessary preparation.
1 | 04M | Evidence of, or live mice in, facility’s food and/or non-food areas.
  • PHF = potentially hazardous foods, e.g., food that must be cooked to a certain temperature in order to be safely eaten.
  • ROP = reduced oxygen packaging, which is a fancy term for vacuum-sealing or any other method that draws the oxygen out of a product to help it stay shelf-stable.


We interrupt our coverage of New York City restaurant inspections to preview an upcoming post.

The knitting warriors are at it again.

You may recall our coverage of Sockwars IV, the knitting assassin game, some time ago.  (In case you missed it: Sockwars is a take on the old assassin game, in which you “assassinate” your victim by knitting and mailing them a pair of socks.  The organizer describes it as “the largest, bloodiest, extreme knitting tournament in the world.”)  Sockwars V is now underway and there has been some serious yarn carnage.  While the competition isn’t quite over, we couldn’t help but take a preliminary peek at the body count.  Call it morbid curiosity.

Here we see the number of assassinations per week, since the start of the competition:

Sockwars V: kills by week

The bloodletting peaked in the second week and has dropped off since then.  Reviewing this by day, we get a slightly different picture:

Sockwars V: kills by date

Here we see that the assassinations come in waves.  They’re closer together toward the start of the competition but become more spread out over time.  This makes sense: in such a large competition, I would expect a lot of people to get picked off early on, leaving the victors of those early battles to slug it out with one another.
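If you’re following along in R, the weekly tally is a short recipe, assuming a data frame with one row per assassination and a kill_date column (both names are our own stand-ins, not the Sockwars site’s):

```
# Hypothetical data frame and column; one row per recorded kill.
kills$kill_date <- as.Date(kills$kill_date)

# Bucket the kill dates into calendar weeks and count each bucket.
weekly <- table(cut(kills$kill_date, breaks = "week"))
barplot(weekly, xlab = "week", ylab = "kills")
```

Swap breaks = "week" for breaks = "day" to reproduce the kills-by-date view.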

This next chart shows just how mean these knitters can be:

Sockwars V: kills by weekday

See that?  Friday has been the biggest day for assassinations, by far.  We always figured Monday was the week’s designated low point; now these crazy knitters go ruining Friday!  Is nothing sacred?

That’s all for now.  We’ll bring you more in-depth analysis after the dust clears and one killer knitter has declared victory.


In our previous post, we noted that all 23,000 New York City eating establishments are subject to surprise inspections by the Department of Health and Mental Hygiene (DOHMH).  Before, we used the restaurant inspection data to learn a little more about New York itself.  This time let’s assume the role of a dodgy restaurant owner: what tricks can we tease out of the data to help us shirk our due diligence in food handling and kitchen cleanliness?

(Just as a side note, how do the inspections begin?  Do the DOHMH inspectors swing open the doors, Wild West-style?  Do they pose as everyday patrons, then pull off their coats to reveal gleaming DOHMH badges?  Do the inspectors even get badges?  If not, Mayor Bloomberg, we’d like a moment of your time…)

We first asked whether any particular time of year was light on inspections.  The charts hinted that some months may be more favorable than others, but after some numerical digging we learned that the variations could very well be due to chance.  We saw similar results in the week-by-week data.

Looking at the number of inspections for each day, we saw something a little different:

number of restaurant inspections by day

This chart shows what could be a pattern of peaks and valleys.  Those spots that appear blank?  They’re really just small values, close to zero.  Closer review reveals that those dips occur on Sundays.  What happens if we group the inspection counts by day of the week?

number of restaurant inspections, by day of week

Wednesdays and Thursdays look like pretty good days for DOHMH inspectors to drop in, yes.  What really stands out is the Sunday value.  There were just 116 total Sunday inspections, compared to thousands of inspections for the other days.  Hmm…

Using a slightly different chart, we can get a better idea of the distribution of inspections across each day of the week:

restaurant inspections by day of week, box plot

If it’s been a while since your last statistics class, this is a box-and-whisker plot, or simply a box plot.  For our purposes it’s a tad more useful than a standard bar plot or histogram.  The box plot reflects the same general shape as our bar chart, but it also shows the spreads (the highs and lows) of the data as well as the median values (the lines in the middle of each box).  Not only were there few Sunday restaurant inspections, but the number of Sunday inspections varied little from week to week.
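If you’d like to build a chart like this yourself, here’s a minimal sketch in R, assuming the inspections sit in a data frame with one row per inspection and an inspection_date column (our own placeholder names, not the official DOHMH schema):

```
# Hypothetical data frame and column; adjust to the actual DOHMH export.
inspections$inspection_date <- as.Date(inspections$inspection_date)

# Count inspections per calendar day, then label each day with its weekday.
daily <- as.data.frame(table(inspections$inspection_date))
names(daily) <- c("date", "count")
daily$weekday <- weekdays(as.Date(daily$date))

# One box per weekday, showing the spread of daily inspection counts.
boxplot(count ~ weekday, data = daily)
```

(By default R orders the weekdays alphabetically; reorder the factor if you want Monday through Sunday.)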

So far we’ve been looking at the totals across the entire city.  Will we see a different pattern if we break apart the dataset by borough?  For example, we expect certain parts of Manhattan will be very quiet on the weekends.   A picture tells the story of how the boroughs stand on their own:

portion of restaurant inspections by day, per borough

Aside from a dip in Staten Island on Fridays, the pattern is similar across all five boroughs.  While we can hardly say this is a definite trend, we may be on to something.  So for all you dodgy restaurant owners out there, save that Saturday night kitchen cleanup for Sunday night!  Who will notice?  Except for the customers, of course …

In all seriousness, we hope that restaurant owners don’t take these findings to heart.  Please keep your kitchens in order all seven days of the week.  The LocalMaxima crew likes to dine out.  A lot. Food-borne illnesses aren’t on our list of take-out favorites.

Finally, the city requests those of us who use the DOHMH data to include the following disclaimer:

“The City of New York can not vouch for the accuracy or completeness of data provided by this web site or application or for the usefulness or integrity of the web site or application.  This site provides applications using data that has been modified for use from its original source, NYC.gov, the official web site of the City of New York.”

Have some interesting data you’d like us to check out? Need our help making sense of your company’s data? Please drop us a line. Thanks for reading.


A big city is a haven for analysis because there’s so much going on in a relatively small space: people, public transit, real estate, and anything else, they all represent data points. Since New York is formally divided into boroughs, city data analysis gets an additional categorical dimension: we get to see how the five closely-related parts compare to the whole. Whereas neighborhood boundaries can get fuzzy, and zip codes are too narrow, boroughs are distinct subsets of the overall city that have developed their own flavors and, some would argue, odors.

On the topic of flavors and odors, this time around we’re looking at restaurants. The city’s Department of Health and Mental Hygiene (DOHMH) pays surprise visits to restaurants to test how well they manage basics such as food handling and cleanliness.  Here, the term “restaurant” is a wide net that includes everything from greasy spoons to fancy white-linen tablecloth numbers to corner coffee shops. If they serve anything to eat or drink, DOHMH will check it out.

DOHMH makes the raw inspection data available via the NYC Data Mine for anyone to review. We recently asked the data what it could tell us about New York City and its residents.

(Our numbers are based on census projections and the last twelve months’ DOHMH inspection data, from September 2008 through August 2009. We hope to give an update once DOHMH releases the rest of the 2009 data.)

The first thing we noticed is just how many restaurants are out there. We counted about 23,000. This is based on the number of unique restaurants in the data set and the (hopefully reasonable!) assumption that every restaurant in the city gets inspected at least once a year. While some places have no doubt closed and new ones have opened, those fluctuations should be mild compared to the totals.
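That count is a quick check in R, assuming each inspection row carries a unique restaurant identifier; the column name below is our own guess, not the official one:

```
# Hypothetical identifier column; the real export may use a license or permit number.
length(unique(inspections$restaurant_id))
```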

How are these restaurants distributed throughout the city? A pretty picture shows us the counts by borough:

NYC: restaurants per borough

I expect few people will be shocked that Manhattan and Staten Island hold the extremes. That aside, the standalone numbers don’t tell us a whole lot. A restaurant owner, consultant, or patron may be curious to know how these raw counts relate to other information, such as population.

Based on census estimates, about 8.2 million people live in New York City. That means 360 people for every restaurant, and that’s not counting the tourists! Once again, a chart will show us how the boroughs compare:

NYC: population per restaurant

Despite having the greatest number of restaurants, Manhattan has the smallest ratio of population to restaurant count. At only 200 people per establishment, I don’t understand why I have so much trouble getting a table. Maybe it’s me?

Whether you want to run a restaurant or just eat in one, you may also want to know about the competition: how many other restaurants are there within a given space? That is, what is the city’s restaurant density?

New York City is about 300 square miles in size, so on the whole we have about 75 restaurants per square mile. The borough breakdown tells a different story:

If you’re hungry and on foot, Manhattan is the place to be! At almost 400 eating establishments per square mile, you’ll practically trip over restaurants. (Of those, 8 per square mile are Starbucks. That’s a lot of caffeine.) By comparison, the other boroughs may require that you know where to look.

To help compare the boroughs, and to produce the obligatory pretty color chart, here we see how they break down in terms of their percentage of New York City’s restaurants, area, and population:

NYC: percentage of area, population, restaurants

In future posts, we’ll explore what the inspection data means for restaurant owners and consumers.

Have some interesting data you’d like us to check out? Need our help making sense of your company’s data? Please drop us a line. Thanks for reading.


LocalMaxima has roots in New York City so we’re always on the hunt for information about it. Thanks to the NYC Data Mine, the city’s catalog of public data, we should have plenty of material. (Think of the Data Mine as a local cousin to the Federal government’s Data.gov project.)

In future posts we’ll take some peeks at Data Mine data, both on its own and mixed with other sources. As always, we’ll share our findings with you here on our website.

One topic we won’t cover, though, is the correlation between emergency-response times and graffiti. That’s not because of a lack of interest. Quite the contrary. It was our first idea when we browsed the Data Mine’s catalog, and was slated for today’s post. While hunting around for some additional information on that topic, though, we stumbled onto some folks at NYU who had published their findings. (Kudos, by the by.) So please, give them a read, and come back to us next time for another New York topic.

Have some New York City data you’d like us to explore? Please tell us about it. As always, thanks for reading.


What’s your reaction to a bad movie? Do you mock it, MST3K-style?  Perhaps you storm out of the cinema and attempt a refund?  If you’re part of one particularly prickly crew, you post your thoughts on the “Mr. Cranky Rates the Movies!” website.  (Warning: the reviews’ language will likely trip workplace internet filters.  You were warned.)   Providing a fistful of internet vigilante justice, the Mr. Cranky site is home to more than two thousand scathing movie reviews written by just a handful of people.

For fun, we took a look at the Mr. Cranky reviews and decided to share our initial results with you, our faithful readers.

As always, we began with some simple charts to get a feel for the data set.  We first broke down the list of reviews based on the films’ release dates:

Mr Cranky: Ratings by release date

We see here that most of the reviews cover releases from the past fifteen years.  There are a few outliers though, including a small crop of films released in the early 1970s.  There’s also a curious gap in 2007.  Perhaps Team Cranky needs to rest up before they take on more pain?

We removed the outliers on either end and focused on the 1995-2006 region.  This still comprises about 96% of the data set (about two thousand movies).
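In R terms, that trim is a one-liner, assuming a data frame of reviews with a release_year column (our placeholder name, not Mr. Cranky’s):

```
# Keep the 1995-2006 core and drop the outliers on either end.
core <- subset(reviews, release_year >= 1995 & release_year <= 2006)
nrow(core) / nrow(reviews)  # roughly 0.96
```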

If a movie is bad enough, we may say that it “bombed.”  Instead of the standard one- to four-star ratings, Mr. Cranky’s system is based on explosives: from one bomb (hated the least) to four bombs, then dynamite, and finally, the nuke (hated the most).

Charting the review ratings, broken down by year, we see that Mr. Cranky has been judicious in dishing out the pain.  Each year’s reviews form something of a bell shape, with most movies taking the middle rating of a three-bomb score (green in the chart below).

Mr Cranky: Ratings by release date

In fact, if we plot these years as lines instead of separate breakdowns, we see plenty of overlap.  The familiar bell shape is rather consistent over time:

Mr Cranky: Movies reviewed by release date

Given such consistency in the distribution, one would expect the Mr. Cranky crew to have some strict criteria for how to assign the ratings.  The data we have here yield little insight into that process; all we can see is that there’s probably no bias in terms of a film’s release date.  Perhaps it’s based on the number of child sidekicks, or car chases, or even child sidekicks chasing cars?

With more data, we would ideally be able to model the Mr. Cranky system and predict a movie’s rating.  This makes for a fun party game, yes; but for movie executives such a formula could be useful.  We wager Hollywood studios already have a screening process for scripts, but clearly some clunkers still make it through.   What if the studios could employ a predictive model to enhance their accept/reject process?  A successful model would permit them to divert funds toward sure-fire blockbusters and pull the plug on failures before they go too far.

Hmm…  Hollywood, if you’re listening, drop us a line.  We have an idea.

We’re not done here, not by a long shot.  We’ll revisit the Mr. Cranky data to get a deeper look into the review process.  Stay tuned.


Have some interesting data you’d like us to check out? Need our help making sense of your company’s data? Please drop us a line. Thanks for reading.


Over time, our online pursuits generate a rich set of data points.  The websites we visit, the articles we read, and the things we buy, they all reflect our personality.  It can feel creepy when someone else analyzes this information, but when we explore it on our own we get to know ourselves a little better.

Me, I recently uncovered some trends in my e-mail habits.  It turns out my mail client stores some message metadata — recipients, dates, and so on — in a local database.  By mixing a little SQL and some quality time with R’s charting tools, I learned just how much nonsense I had sent out over the past four years.
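Here’s a minimal sketch of that setup, assuming the client keeps its metadata in an SQLite file; the path, table, and column names below are stand-ins for whatever your client actually uses:

```
library(DBI)

# Hypothetical path and schema; many mail clients bury an SQLite file somewhere similar.
con  <- dbConnect(RSQLite::SQLite(), "~/mail/metadata.db")
sent <- dbGetQuery(con, "SELECT date_sent FROM messages WHERE folder = 'Sent'")
dbDisconnect(con)

# Convert epoch seconds to timestamps, then bucket by hour of day.
hours <- format(as.POSIXct(sent$date_sent, origin = "1970-01-01"), "%H")
barplot(table(hours), xlab = "hour of day", ylab = "e-mails sent")
```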

Let’s start with the basics.  This chart shows the total e-mails I sent out in that period, based on the time of day:

total e-mails sent, per hour

Here we see a ramp-up in the morning, followed by a brief dip around 6PM, then another quick peak around 8PM before it tails off for the night.  (Those rare e-mails sent during the wee hours, I chalk those up to my travels through different time zones.)  The lack of a midday dip hints at a person who typically works through lunch.  Yes, that seems to fit me very well.

But we’re dealing with a lot of information here, so these are broad statements at best.  It may help to slice the data by day to get a clearer picture of my habits:

e-mails sent per hour, broken down by day

The weekdays exhibit the same pattern of one large hump followed by a smaller, late-day peak.  We also see some new details that were hidden in the other chart:

The thick magenta line represents Thursdays, and the dip around noon indicates this was my day to step away from my desk for a proper meal.  While I took it easy on the weekends, I sent a relatively large number of messages after Sunday dinner.  (Check out the peak at 8PM.)  Was I getting a head start on the work week, or raving about some new restaurant I had tried that night?  Hmm.

Slicing the data yet another way reveals even more details.  The real eye-opener was the number of messages I sent per month:

total e-mails sent each month

See that spike there, between the two red lines?  The one between December 2005 and January 2006?

That marks when I acquired my first Blackberry.

Fine, it’s time to come clean: I’m hooked.  I’m a connectivity addict, and I now see why they’re affectionately called “crackberry” phones.

Granted, this is just a quick skim over a lot of data.  I might see other trends if I were to separate professional and personal communication, and look at conversation (mail thread) counts instead of raw message counts.  Additionally, the charts alone don’t tell us the whole story behind that spike in January 2006.  (I may have seen the need ahead of time and bought the crackberry to keep up.  At least that’s what I’ll say until proven otherwise.)  Still, were I a sleuth, these charts would give me an idea of where to dig for more details.

Have some interesting data you’d like us to check out? Need our help making sense of your company’s data? Please drop us a line. Thanks for reading.


Who knew knitters were such a violent bunch?

Me, I always figured they were pretty mellow. It takes a lot of time and patience to knit something, right? Little did I know, hidden behind the handmade scarves and sweaters were killer instincts and nerves of steel. Sock Wars has forever changed my views of knitters.

Sock Wars?

Yes, Sock Wars.

Sock Wars is a take on the old Assassin game. In Assassin, the players are contract killers, each with an assigned target. You rack up points by killing your target, then killing their target, and so on. The victor is the last one standing. The catch? You are someone else’s target, so you have to act quickly before you’re taken out yourself.

I’ve seen Assassin games that use water guns, touch-tag, and camera phones to “kill” players. In Sock Wars you kill your target by … knitting them a pair of socks. Once your target receives the socks in the mail, they are considered dead and they send you whatever socks they had in progress, along with their target’s details.

(You have to admit, if you’re going to get whacked, a fresh pair of handmade socks is a pretty nice consolation prize.)

When I first heard about Sock Wars, I marveled over killer yarn for just a moment. Then I wondered aloud, “I’d love to see the data on that.” As it turns out, the kill data is available on the Sock Wars website. The data includes players’ locations (country or US state), so I knew I’d have a chance to play with R’s mapping toolsets in addition to the standard number-crunching.

Digging In

When exploring a new data set, it helps to run some basic tests to get a feel for what’s going on. It’s like scanning a room before deciding who would make for an interesting chat. To that end, I churned the data into a usable form and fired up R to generate some pretty charts. I mean, descriptive statistics.

(The data shows that most of the active participants — 85% — are from the USA. So our analysis will focus on those members in the US.)

Killer States

Collectively, how deadly are each state’s killers? We can see that California killers did the most damage overall, taking out more than thirty knitters.

Sock Wars IV: Kills by State

That puts California head and shoulders above the rest of the states. Should I fear a west-coast knitter, then? Maybe, maybe not. The data show that California also had the greatest enrollment. Taking the ratio of kills to enrollment gives us a different view:

Sock Wars IV: Ratio of Kills to Enrollment

From this angle, California looks a lot less tough: its assassins took out roughly one knitter each. Texas knitters took out about two people each, while Maine and Missouri tie for the top spot at three kills per assassin.
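That second view is a quick transform in R, assuming a per-state data frame with kills and enrollment columns (the names are ours, not the Sock Wars site’s):

```
# Hypothetical columns: state, kills, enrolled.
states$kill_ratio <- states$kills / states$enrolled

# States ranked by kills per enrolled knitter.
states[order(-states$kill_ratio), c("state", "kill_ratio")]
```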

Top Killers

Knitters from Maine, Missouri, and Texas have all proven deadly in a collective sense. What about the individual killers?

Here we see Texas had thirteen kills among six enrolled knitters … but a single person was responsible for half of that body count: the needles of Bustapurl sent many a knitter to their maker.

Sock Wars IV: Top Ten Killers

And this, kids, is why you don’t mess with Texas.

Now what?

So far we’ve focused on descriptive measures: counts, averages, and anything else that summarizes a lump of data at face value. Sometimes, though, people want inferential measures from their data: expected trends (forecasting), non-obvious connections, and anything else that will give an extra edge to their decision process:

  • If I join the next Sock Wars tournament, should I tremble in fear if someone from Missouri gets my name?
  • If I run a yarn store, should I stock up when the next Sock Wars tournament begins? And if I run a web-based yarn distributor, will an offer of free shipping send me into the poorhouse?
  • If I’m assembling an all-star team of knitting assassins (“assassiknitters?” “knit-men?”), should I search Texas for a heavy-hitter? Would I be better off with a large team of Californians?

It’s tempting to draw conclusions from the existing data, isn’t it? The charts hint that we should be worried about solo killers in Texas and Seattle, and groups in California. But let’s face it, this is only a single data set. We can’t tell whether it represents future trends or whether there are a bunch of freak incidents lurking among the numbers.

Other than waiting to collect several years’ worth of data, what can we do? We could look for other data, for which we have plenty of history and which correlates with the knitting kill stats. It’s still a long shot, but that may just give us deeper insight into future Sock Wars competitions.

This being a game of killers, I compared the Sock Wars data against murder rates for 2005, 2006, and 2007. (Those are the most recent years for which the data was available. The data seems roughly consistent from year to year, so we should be able to use this as a rough estimate for 2008 and 2009.) More specifically, I took the number of murders in each state, along with the Sock Wars kills by each state, and scaled the numbers so they would all fit nice and pretty on the same chart.

The results?

Shocking.

Sock Wars IV: Knitting Kills vs Crime Stats

The thin lines represent the (scaled) crime data, while the thick red line is the (scaled) Sock Wars data.   For the most part, a (relatively) greater number of murders in one state corresponds to a greater number of Sock Wars kills by assassins from that state. To put this in more technical terms, the correlation coefficient ranges from 0.76 to 0.80.  The maximum value for a correlation coefficient is 1.0, so we have a reasonably strong match.
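For the curious, the comparison boils down to a couple of lines of R, assuming per-state vectors of murder counts and Sock Wars kills in matching state order (the variable names are ours):

```
# Hypothetical vectors, one entry per state, in the same order.
scaled_murders <- scale(murders_2005)  # standardized, so both series fit one chart
scaled_kills   <- scale(sockwars_kills)

# The correlation itself is unaffected by the scaling.
cor(murders_2005, sockwars_kills)      # roughly 0.76 to 0.80 across 2005-2007
```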

Now before you get too excited, notice that the crime stats correlate with the collective (aggregate) headcount for each state’s killers. From this we can possibly infer that, for example, if we need to take out a lot of knitters, we should hire a lot of assassins from California (and hope they come cheap). Along those same lines, we can say that the Sock Wars knitters from Maine and Missouri should, collectively, be able to inflict a lot of damage.

That said, if I’m shopping around for one superstar knitting assassin, the crime data isn’t quite as helpful. We’d be better off trying to correlate against several other data sources as well as multiple views of the data — say, the ratio of killers to enrollment. Were I a betting man and this were all the information I had, though, I’d take it.

So, who wants to start a Sock Wars betting pool? Fantasy Knitting Leagues, anyone?

With That In Mind…

So the next time you see someone knitting, ask yourself: is this person a mild-mannered citizen, just passing the time? or are they a cold-blooded killer? Whatever you do, try to avoid eye contact and sudden moves.

Have some interesting data you’d like us to check out? Need our help making sense of your company’s data? Please drop us a line.


It’s all about the data, when you really think about it.

Sure, data can be numbers in spreadsheets and college classes you’d rather forget. It can be dry. But it can be more than that.

“Collecting data” is just a formal way of saying, “I noticed something.” You probably collect data without even realizing: The number of passengers on your (very crowded) flight. The time it takes you to get to work, based on which route you take. The number of times you tell your kids “no” before it sinks in.

Formally or informally, consciously or subconsciously, we analyze this data to assess what’s happened or to make an educated guess as to what the future holds.

People, businesses, government agencies, we all use the numbers as our guide. Analyzing data may yield that inside scoop that’s invisible to the naked eye, that little extra that helps you get ahead or have a private laugh.

Here at LocalMaxima, we’re all about the data, too.

Welcome to our website. This is our platform to share our adventures in data analysis. We’ll make some pretty charts, draw some conclusions, and occasionally take an irreverent look at the data around us. We’re essentially thinking out loud but you’re welcome to listen in. Feel free to pay us a visit now and then, or subscribe in your preferred RSS reader.

Thanks for stopping by, and please come back soon.