This is “Where Does Data Come From?”, section 11.3 from the book Getting the Most Out of Information Systems (v. 1.2).
This book is licensed under a Creative Commons by-nc-sa 3.0 license. See the license for more details, but that basically means you can share this book as long as you credit the author (but see below), don't make money from it, and do make it available to everyone else under the same terms.
This content was accessible as of December 29, 2012, and it was downloaded then by Andy Schmitz in an effort to preserve the availability of this book.
Normally, the author and publisher would be credited here. However, the publisher has asked for the customary Creative Commons attribution to the original publisher, authors, title, and book URI to be removed. Additionally, per the publisher's request, their name has been removed in some passages. More information is available on this project's attribution page.
Organizations can pull together data from a variety of sources. While the examples that follow aren’t meant to be an encyclopedic listing of possibilities, they will give you a sense of the diversity of options available for data gathering.
For most organizations that sell directly to their customers, transaction processing systems (TPS), which record a transaction (some form of business-related exchange) such as a cash register sale, ATM withdrawal, or product return, represent a fountain of potentially insightful data. Every time a consumer uses a point-of-sale system, an ATM, or a service desk, there’s a transaction (some kind of business exchange) occurring, representing an event that’s likely worth tracking.
The cash register is the data generation workhorse of most physical retailers, and the primary source that feeds data to the TPS. But while TPS can generate a lot of bits, it’s sometimes tough to match this data with a specific customer. For example, if you pay a retailer in cash, you’re likely to remain a mystery to your merchant because your name isn’t attached to your money. Grocers and retailers can tie you to cash transactions if they can convince you to use a loyalty card. Loyalty cards are systems that provide rewards and usage incentives, typically in exchange for a method that provides more detailed tracking and recording of customer activity; in addition to enhancing data collection, they can represent a significant switching cost. Use one of these cards and you’re in effect giving up information about yourself in exchange for some kind of financial incentive. The explosion in retailer cards is directly related to each firm’s desire to learn more about you and to turn you into a more loyal and satisfied customer.
Some cards provide an instant discount (e.g., the CVS Pharmacy ExtraCare card), while others allow you to build up points over time (Best Buy’s Reward Zone). The latter has the additional benefit of acting as a switching cost. A customer may think “I could get the same thing at Target, but at Best Buy, it’ll increase my existing points balance and soon I’ll get a cash back coupon.”
UK grocery giant Tesco, the planet’s third-largest retailer, is envied worldwide for what analysts say is the firm’s unrivaled ability to collect vast amounts of retail data and translate this into sales (K. Capell, “Tesco: ‘Wal-Mart’s Worst Nightmare,’” BusinessWeek, December 29, 2008).
Tesco’s data collection relies heavily on its ClubCard loyalty program, an effort pioneered back in 1995. But Tesco isn’t just a physical retailer. As the world’s largest Internet grocer, the firm gains additional data from Web site visits, too. Remove products from your virtual shopping cart? Tesco can track this. Visited a product comparison page? Tesco watches which product you’ve chosen to go with and which you’ve passed over. Done your research online, then traveled to a store to make a purchase? Tesco sees this, too.
Tesco then mines all this data to understand how consumers respond to factors such as product mix, pricing, marketing campaigns, store layout, and Web design. Consumer-level targeting allows the firm to tailor its marketing messages to specific subgroups, promoting the right offer through the right channel at the right time and the right price. To get a sense of Tesco’s laser-focused targeting possibilities, consider that the firm sends out close to ten million different, targeted offers each quarter (T. Davenport and J. Harris, “Competing with Multichannel Marketing Analytics,” Advertising Age, April 2, 2007). Offer redemption rates are the best in the industry, with some coupons scoring an astronomical 90 percent usage (M. Lowenstein, “Tesco: A Retail Customer Divisibility Champion,” CustomerThink, October 20, 2002)!
The firm’s data-driven management is clearly delivering results. Even while operating in the teeth of a global recession, Tesco repeatedly posted record corporate profits and the highest earnings ever for a British retailer (K. Capell, “Tesco Hits Record Profit, but Lags in U.S.,” BusinessWeek, April 21, 2009; A. Hawkes, “Tesco Reports Record Profits of £3.8bn,” Guardian, April 19, 2011).
Firms increasingly set up systems to gather additional data beyond conventional purchase transactions or Web site monitoring. CRM or customer relationship management systems are often used to empower employees to track and record data at nearly every point of customer contact. Someone calls for a quote? Brings a return back to a store? Writes a complaint e-mail? A well-designed CRM system can capture all these events for subsequent analysis or for triggering follow-up events.
Enterprise software includes not just CRM systems but also categories that touch every aspect of the value chain, including supply chain management (SCM) and enterprise resource planning (ERP) systems. More importantly, enterprise software tends to be more integrated and standardized than the prior era of proprietary systems that many firms developed themselves. This integration helps in combining data across business units and functions, and in getting that data into a form where it can be turned into information (for more on enterprise systems, see Chapter 9 "Understanding Software: A Primer for Managers").
Sometimes firms supplement operational data with additional input from surveys and focus groups. Oftentimes, direct surveys can tell you what your cash register can’t. Zara store managers informally survey customers in order to help shape designs and product mix. Online grocer FreshDirect (see Chapter 2 "Strategy and Technology: Concepts and Frameworks for Understanding What Separates Winners from Losers") surveys customers weekly and has used this feedback to drive initiatives from reducing packaging size to including star ratings on produce (R. Braddock, “Lessons of Internet Marketing from FreshDirect,” Wall Street Journal, May 11, 2009). Many CRM products also have survey capabilities that allow for additional data gathering at all points of customer contact.
The U.S. health care system is broken. It’s costly, inefficient, and problems seem to be getting worse. Estimates suggest that health care spending makes up a whopping 18 percent of U.S. gross domestic product (J. Zhang, “Recession Likely to Boost Government Outlays on Health Care,” Wall Street Journal, February 24, 2009). U.S. automakers spend more on health care than they do on steel (S. Milligan, “Business Warms to Democratic Leaders,” Boston Globe, May 28, 2009). Even more disturbing, it’s believed that medical errors cause as many as ninety-eight thousand unnecessary deaths in the United States each year, more than motor vehicle accidents, breast cancer, or AIDS (R. Appleton, “Less Independent Doctors Could Mean More Medical Mistakes,” InjuryBoard.com, June 14, 2009; and B. Obama, President’s Speech to the American Medical Association, Chicago, IL, June 15, 2009, http://www.whitehouse.gov/the_press_office/Remarks-by-the-President-to-the-Annual-Conference-of-the-American-Medical-Association).
For years it’s been claimed that technology has the potential to reduce errors, improve health care quality, and save costs. Now pioneering hospital networks and technology companies are partnering to help tackle cost and quality issues. For a look at possibilities for leveraging data throughout the doctor-patient value chain, consider the “event-driven medicine” system built by Dr. John Halamka and his team at Boston’s Beth Israel Deaconess Medical Center (part of the Harvard Medical School network).
When docs using Halamka’s system encounter a patient with a chronic disease, they generate a decision support “screening sheet.” Each event in the system, such as an office visit or a lab results report (think the medical equivalent of transactions and customer interactions), updates the patient database. Combine that electronic medical record information with artificial intelligence (computer software that seeks to reproduce or mimic, perhaps with improvements, human thought, decision making, or brain functions) on best practice, and the system can offer recommendations for care, such as “Patient is past due for an eye exam” or “Patient should receive pneumovax [a vaccine against infection] this season” (J. Halamka, “IT Spending: When Less Is More,” BusinessWeek, March 2, 2009). The systems don’t replace decision making by doctors and nurses, but they do help to ensure that key issues are on a provider’s radar.
More efficiencies and error checks show up when prescribing drugs. Docs are presented with a list of medications covered by that patient’s insurance, allowing them to choose quality options while controlling costs. Safety issues, guidelines, and best practices are also displayed. When the correct, safe medication in the right dose is selected, the electronic prescription is routed to the patient’s pharmacy of choice. As Halamka puts it, the prescription goes from “doctor’s brain to patient’s vein” without any of that messy physician handwriting, all while squeezing out layers where errors from human interpretation or data entry might occur.
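To make the event-driven idea concrete, here is a minimal sketch in Python. It is not Halamka’s system: the `PatientRecord` class, the `RULES` table, and the one-year interval are hypothetical, and simple hand-written rules stand in for the best-practice logic the text describes. Each recorded event updates the patient record, and the screening sheet is rebuilt from whatever the record currently holds.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientRecord:
    """Hypothetical patient record: chronic conditions plus the latest date of each event type."""
    name: str
    conditions: set[str]
    last_event: dict[str, date] = field(default_factory=dict)

    def record_event(self, kind: str, when: date) -> None:
        # Every event (office visit, lab report, exam) updates the record,
        # keeping only the most recent date for each kind of event.
        previous = self.last_event.get(kind)
        if previous is None or when > previous:
            self.last_event[kind] = when


# Illustrative rules only, not clinical guidance:
# (condition, event kind, maximum days between events, reminder text)
RULES = [
    ("diabetes", "eye_exam", 365, "Patient is past due for an eye exam"),
]


def screening_sheet(patient: PatientRecord, today: date) -> list[str]:
    """Return the reminders that apply to this patient as of `today`."""
    reminders = []
    for condition, kind, max_days, message in RULES:
        if condition not in patient.conditions:
            continue
        last = patient.last_event.get(kind)
        if last is None or (today - last).days > max_days:
            reminders.append(message)
    return reminders


patient = PatientRecord("Example Patient", {"diabetes"})
patient.record_event("eye_exam", date(2008, 1, 10))
print(screening_sheet(patient, date(2009, 3, 2)))
```

The reminder only informs the provider. As in the system described above, the decision about care stays with the doctor or nurse.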
President Obama believes technology initiatives can save health care as much as $120 billion a year, or roughly $2,500 per family (D. McCullagh, “Q&A: Electronic Health Records and You,” CNET/CBSNews.com, May 19, 2009). An aggressive number, to be sure. But with such a large target to aim at, it’s no wonder that nearly every major technology company now has a health solutions group. Microsoft and Google even offer competing systems for electronically storing and managing patient health records. If systems like Halamka’s and others realize their promise, big benefits may be just around the corner.
Sometimes it makes sense to combine a firm’s data with bits brought in from the outside. Many firms, for example, don’t sell directly to consumers (this includes most drug companies and packaged goods firms). If your firm has partners that sell products for you, then you’ll likely rely heavily on data collected by others.
Data bought from sources available to all might not yield competitive advantage on its own, but it can provide key operational insight for increased efficiency and cost savings. And when combined with a firm’s unique data assets, it may give firms a high-impact edge.
Consider restaurant chain Brinker, a firm that runs seventeen hundred eateries in twenty-seven countries under the Chili’s, On The Border, and Maggiano’s brands. Brinker (whose ticker symbol is EAT) supplements its own data with external feeds on weather, employment statistics, gas prices, and other factors, and uses this in predictive models that help the firm in everything from determining staffing levels to switching around menu items (R. King, “Intelligence Software for Business,” BusinessWeek podcast, February 27, 2009).
In another example, Carnival Cruise Lines combines its own customer data with third-party information tracking household income and other key measures. This data plays a key role in a recession, since it helps the firm target limited marketing dollars on those past customers who are more likely to be able to afford to go on a cruise. So far it’s been a winning approach. For three years in a row, the firm has experienced double-digit increases in bookings by repeat customers (R. King, “Intelligence Software for Business,” BusinessWeek podcast, February 27, 2009).
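The mechanics of combining a firm’s own records with purchased data can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names, the income threshold, and the ranking rule are hypothetical and are not drawn from Carnival’s actual systems.

```python
def target_list(customers, income_by_household, min_income, max_offers):
    """Join in-house customer rows to third-party income data and pick who gets an offer.

    customers: list of dicts with 'customer_id', 'household_id', 'cruises_taken'
    income_by_household: dict mapping household_id to estimated annual income
    """
    enriched = []
    for customer in customers:
        income = income_by_household.get(customer["household_id"])
        # Skip households the outside source doesn't cover or that fall below the threshold.
        if income is None or income < min_income:
            continue
        enriched.append({**customer, "household_income": income})
    # Favor the most frequent past cruisers, breaking ties by income.
    enriched.sort(
        key=lambda c: (c["cruises_taken"], c["household_income"]), reverse=True
    )
    return enriched[:max_offers]


# Made-up sample data: the firm's own records and a purchased income feed.
past_customers = [
    {"customer_id": 1, "household_id": "H1", "cruises_taken": 3},
    {"customer_id": 2, "household_id": "H2", "cruises_taken": 5},
    {"customer_id": 3, "household_id": "H3", "cruises_taken": 1},
]
purchased_income = {"H1": 120_000, "H2": 45_000, "H3": 95_000}

for row in target_list(past_customers, purchased_income, min_income=80_000, max_offers=2):
    print(row["customer_id"], row["household_income"])
```

Neither data set does the job alone. The firm’s records say who has cruised before, and the outside feed suggests who can afford to cruise again; the join is what lets a limited budget go to the overlap.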
There’s a thriving industry collecting data about you. Buy from a catalog, fill out a warranty card, or have a baby, and there’s a very good chance that this event will be recorded in a database somewhere, added to a growing digital dossier that’s made available for sale to others. If you’ve ever gotten catalogs, coupons, or special offers from firms you’ve never dealt with before, this was almost certainly a direct result of a behind-the-scenes trafficking in the “digital you.”
Firms that trawl for data and package them up for resale are known as data aggregators (firms that collect and resell data). They include Acxiom, a $1.3 billion a year business that combines public source data on real estate, criminal records, and census reports, with private information from credit card applications, warranty card surveys, and magazine subscriptions. The firm holds data profiling some two hundred million Americans (A. Gefter and T. Simonite, “What the Data Miners Are Digging Up about You,” CNET, December 1, 2008).
Or maybe you’ve heard of Lexis-Nexis. Many large universities subscribe to the firm’s electronic newspaper, journal, and magazine databases. But the firm’s parent, Reed Elsevier, is a data sales giant, with divisions packaging criminal records, housing information, and additional data used to uncover corporate fraud and other risks. In February 2008, the firm got even more data rich, acquiring Acxiom competitor ChoicePoint for $4.1 billion. With that kind of money involved, it’s clear that data aggregation is very big business (A. Greenberg, “Companies That Profit from Your Data,” Forbes, May 14, 2008).
The Internet also allows for easy access to data that had been public but otherwise difficult to access. For one example, consider home sale prices and home value assessments. While technically in the public record, someone wanting this information previously had to traipse down to their Town Hall and speak to a clerk, who would hand over a printed log book. Not exactly a Google-speed query. Contrast this with a visit to Zillow.com. The free site lets you pull up a map of your town and instantly peek at how much your neighbors paid for their homes. And it lets them see how much you paid for yours, too.
Computerworld’s Robert Mitchell uncovered a more disturbing issue when public record information is made available online. His New Hampshire municipality had digitized and made available some of his old public documents without obscuring that holy grail for identity thieves, his Social Security number (R. Mitchell, “Why You Should Be Worried about Your Privacy on the Web,” Computerworld, May 11, 2009).
Then there are accuracy concerns. A record incorrectly identifying you as a cat lover is one thing, but being incorrectly named to the terrorist watch list is quite another. During a five-week period, airline agents tried to block a particularly high-profile U.S. citizen from boarding airplanes on five separate occasions because his name resembled an alias used by a suspected terrorist. That citizen? The late Ted Kennedy, who at the time was the senior U.S. senator from Massachusetts (R. Swarns, “Senator? Terrorist? A Watch List Stops Kennedy at Airport,” New York Times, August 20, 2004).
For the data trade to continue, firms will have to treat customer data as the sacred asset it is. Step over that “creep-out” line, and customers will push back, increasingly pressing for tighter privacy laws. Data aggregator Intelius used to track cell phone customers, but backed off in the face of customer outrage and threatened legislation.
Another concern: sometimes data aggregators are just plain sloppy, committing errors that can be costly for the firm and potentially devastating for victimized users. For example, in 2005, ChoicePoint accidentally sold records on 145,000 individuals to a cybercrime identity theft ring. The ChoicePoint case resulted in a $15 million fine from the Federal Trade Commission (A. Greenberg, “Companies That Profit from Your Data,” Forbes, May 14, 2008). In 2011, hackers stole at least 60 million e-mail addresses from marketing firm Epsilon, prompting firms as diverse as Best Buy, Citi, Hilton, and the College Board to go through the time-consuming, costly, and potentially brand-damaging process of warning customers of the breach. Epsilon faces liability charges of almost a quarter of a billion dollars, but some estimate that the total price tag for the breach could top $4 billion (F. Rashid, “Epsilon Data Breach to Cost Billions in Worst-Case Scenario,” eWeek, May 3, 2011). Just because you can gather data and traffic in bits doesn’t mean that you should. Any data-centric effort should involve input not only from business and technical staff, but from the firm’s legal team as well (for more, see the box “Note 11.32 "Privacy Regulation: A Moving Target"”).
New methods for tracking and gathering user information appear daily, testing user comfort levels. For example, the firm Umbria uses software to analyze millions of blog and forum posts every day, using sentence structure, word choice, and quirks in punctuation to determine a blogger’s gender, age, interests, and opinions. While Google refused to include facial recognition as an image search product (“too creepy,” said its chairman; M. Warman, “Google Warns against Facial Recognition Database,” Telegraph, May 16, 2011), Facebook, with great controversy, turned on facial recognition by default (N. Bilton, “Facebook Changes Privacy Settings to Enable Facial Recognition,” New York Times, June 7, 2011). It’s quite possible that in the future, someone will be able to upload a photo to a service and direct it to find all the accessible photos and video on the Internet that match that person’s features. And while targeting is getting easier, a Carnegie Mellon study showed that it doesn’t take much data to find someone. Simply by knowing gender, birth date, and postal zip code, 87 percent of people in the United States could be pinpointed by name (A. Gefter and T. Simonite, “What the Data Miners Are Digging Up about You,” CNET, December 1, 2008). Another study showed that publicly available data on state and date of birth could be used to predict U.S. Social Security numbers, a potential gateway to identity theft (E. Mills, “Report: Social Security Numbers Can Be Predicted,” CNET, July 6, 2009, http://news.cnet.com/8301-1009_3-10280614-83.html).
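Why do three ordinary fields pinpoint so many people? Because very few people share the same combination of all three. The Python sketch below shows one way that could be measured on any data set. It is not the Carnegie Mellon study’s method, and the `unique_fraction` function, the field names, and the four sample records are all made up for illustration.

```python
from collections import Counter


def unique_fraction(records, keys=("gender", "birth_date", "zip")):
    """Share of records whose combination of `keys` appears exactly once in the data."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    singled_out = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return singled_out / len(records)


# Made-up records; the first two share all three fields, so neither is unique.
people = [
    {"gender": "F", "birth_date": "1970-01-01", "zip": "02134"},
    {"gender": "F", "birth_date": "1970-01-01", "zip": "02134"},
    {"gender": "M", "birth_date": "1970-01-01", "zip": "02134"},
    {"gender": "F", "birth_date": "1982-06-15", "zip": "02134"},
]

print(unique_fraction(people))                 # all three fields together
print(unique_fraction(people, keys=("zip",)))  # zip code alone
```

In this toy data, zip code alone singles out no one, while the three fields together single out half the records. A record that is unique on fields an outsider can easily learn can be matched to a name in any other data set that carries the same fields.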
Some feel that Moore’s Law, the falling cost of storage, and the increasing reach of the Internet have us on the cusp of a privacy train wreck. And that may inevitably lead to more legislation that restricts data-use possibilities. Noting this, strategists and technologists need to be fully aware of the legal environment their systems face (see Chapter 14 "Google in Three Parts: Search, Online Advertising, and Beyond" for examples and discussion) and consider how such environments may change in the future. Many industries have strict guidelines on what kind of information can be collected and shared.
For example, HIPAA (the U.S. Health Insurance Portability and Accountability Act) includes provisions governing data use and privacy among health care providers, insurers, and employers. The financial industry has strict requirements for recording and sharing communications between firm and client (among many other restrictions). There are laws limiting the kinds of information that can be gathered on younger Web surfers. And there are several laws operating at the state level as well.
International laws also differ from those in the United States. Europe, in particular, has a strict European Privacy Directive. The directive includes provisions that limit data collection, require notice and approval of many types of data collection, and require firms to make data available to customers with mechanisms for stopping collection efforts and correcting inaccuracies at customer request. Data-dependent efforts plotted for one region may not fully translate to another if the law limits key components of technology use. The constantly changing legal landscape also means that what works today might not be allowed in the future.
Firms beware—the public will almost certainly demand tighter controls if the industry is perceived as behaving recklessly or inappropriately with customer data.