“Data is a precious thing and will last longer than the systems themselves.” – Tim Berners-Lee, inventor of the World Wide Web.

In the past several years, we’ve heard an immense amount about data, big data, data analytics and every possible topic related to data. We know that 90% of all currently available data has been generated in the past two (2) years![1] We also know that every business publication has had articles on data (Business Week, May 2013; Harvard Business Review, December 2013; Forbes, February 2014, to name just a few), and that every business consultancy, such as Accenture, Deloitte and Gartner, has a practice or advisory in this area.

Closer to home, many large healthcare organizations are developing analytic systems that use very large amounts of data to provide diagnostic, treatment planning and operational guidance. Examples include the point-of-care recommendation systems currently used by Kaiser Permanente and the Mayo Clinic, among others, that provide near-real-time diagnosis and treatment planning guidance to providers at a patient’s bedside. Dr. Watson (IBM) is another well-known example.[2] These systems use millions of patient records, often recorded over long periods of time, as well as thousands (or more) of journal articles and physicians’ notes to provide their analysis and recommendations. Not many CHCs have this amount of patient data available, so what are the implications of analytics for health centers, and how can they take advantage of analytics to make better clinical and operational decisions?

First, let’s define what we mean by data and analytics. Data, in this sense, is a set of qualitative or quantitative values. Simply restated, pieces of data are individual pieces of information[3]. They may be numeric (quantitative), words or sets of words (qualitative), or even hybrids such as addresses (77 Massachusetts Avenue, E40-248). Analytics, in general, is the discovery and communication of meaningful patterns in data[4]. Contemporary analytics has taken on a more specific meaning, especially in contrast to statistical analysis of data (the application of statistical hypothesis testing methods to data). Analytics today is a set of methods for data organization and analysis that are applied when data have (some of) the following characteristics:

  • Volume: multiple petabytes of data;
  • Velocity: data values that are changing rapidly (e.g. NASA’s launch sensor net of >1M sensors of various types sampled 3x/second);
  • Variety: many different types of data in different formats and from different sources.

In healthcare, data variety is most often the issue. Data that arrives in many types, formats and sources is very difficult to organize and analyze in a conventional sense.

Conventional analysis may also be deployed to understand the vast amount of data managed by our health care system. So what are the differences between analytics and conventional analysis? They can be summarized as follows:

  • Contemporary analytics is the empirical characterization of data and information. An example would be: A physician at Kaiser is using their point-of-care recommendation system to confirm a diagnosis and develop an optimal treatment plan. The physician is entering patient parameters while doing a bedside examination. The point-of-care recommendation system evaluates 4 PB of patient data against the set of parameters entered for this specific patient, and it finds 9,372 cases similar enough to use for comparison with the patient. That is not a statistical prediction of similarity, but an exact empirical characterization. In the same sense, if that system classifies the treatment plans of those 9,372 cases according to outcome, that is not a statistical prediction of outcome, but an exact characterization of the outcomes present in the data. This changes how we think about results in that we are looking at exact characterizations, not predictions with associated probabilities. This is true even of smaller sets of data; analytics finds patterns and relationships in the available data.
  • In general, hypotheses and informational relationships are informed by the analysis, not by a priori assumptions. This means that empirical characterization is carried out by performing inquiry developed by consensus of the health center’s staff (or the CEO’s designees) and aligned with strategy. Hypotheses are then formed (and relationships defined) based on the empirical results, and analysis may continue.
  • Contemporary analytics does not require extensive data transformation and normalization. Analytic systems such as Hadoop-based analytic stacks aggregate data in many different forms (alphanumeric, text, image, and other media) and perform analysis across all of these types (e.g. cost/service/location/provider or #encounters vs. macro-demographic and population trends). These data originate from many different sources (EHR, financial systems, practice management, public health systems, other private and public data sources). Analytics does, however, require an understanding of the normalized definitions of common terms (encounters, providers etc.), especially if cross-organizational comparisons are to be made.
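
To make the idea of empirical characterization concrete, here is a minimal sketch in Python. It is not how any vendor’s point-of-care system works; the record fields, the similarity rule (same diagnosis, age within a window) and the sample records are all invented for illustration. The point is only that the result is a count of what is actually in the data, not a probability estimate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One historical patient record (hypothetical, simplified fields)."""
    age: int
    diagnosis: str
    treatment: str
    outcome: str


def similar_cases(cases, diagnosis, age, age_window=5):
    """Return every stored case with the same diagnosis and a nearby age.

    This similarity rule is an assumption made for the sketch; a real
    system would compare many more patient parameters.
    """
    return [
        c for c in cases
        if c.diagnosis == diagnosis and abs(c.age - age) <= age_window
    ]


def outcomes_by_treatment(cases):
    """Count the outcomes actually recorded for each treatment plan."""
    tally = {}
    for c in cases:
        tally.setdefault(c.treatment, Counter())[c.outcome] += 1
    return tally


# Invented records, for illustration only.
records = [
    Case(62, "condition X", "plan A", "improved"),
    Case(65, "condition X", "plan A", "improved"),
    Case(60, "condition X", "plan A", "unchanged"),
    Case(64, "condition X", "plan B", "improved"),
    Case(35, "condition X", "plan A", "improved"),   # too young to match
    Case(63, "condition Y", "plan C", "improved"),   # different diagnosis
]

matches = similar_cases(records, diagnosis="condition X", age=63)
summary = outcomes_by_treatment(matches)

print(len(matches), "similar cases found")
for treatment, counts in summary.items():
    print(treatment, dict(counts))
```

With these sample records the query finds four similar cases, and the tally per treatment plan is an exact description of those four records. The same logic applies whether the data set holds six records or several petabytes; only the machinery needed to run it changes.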

OK – so we know something about data and analytics, but what does this actually mean for health centers? As a technologist, I have to say that as interesting as the technology of analytics is, it’s not the point. The point is a way of thinking about data and analysis. I use the phrase “data as an asset” as shorthand for this way of thinking. Thinking of data as an asset means that you and your team look at data within a broader context, beyond the day-to-day requirements of clinical practice and/or operations. Data is instead considered in relation to the strategy of your organization and to the kinds of strategic decisions that are required to keep your organization healthy. Thinking of data just as facts is no longer enough to create the largest amount of value from that data; you must think of data strategically. This requires having an awareness of external data as well as internal organization information, including data from city, county, state and federal programs, and data from other organizations – in short, as much relevant data as you can discover and access.

Once you start thinking about data as an asset, the process of utilizing data strategically begins. It comprises the following steps:

  • First is to review (or develop) your health center’s strategy and identify what decisions are embedded in it. For instance, if part of the strategy is a focus on growth of a dental or behavioral health practice, then many decisions will have to be made with respect to facilities, staffing etc.
  • Next is to identify what data you have access to that is relevant to those decisions. This may, in fact, not be entirely straightforward. You may include data that is not immediately apparent as relevant. Remember, one of the characteristics of analytics is that the relationships in the data are defined empirically by inquiry, not a priori.
  • Third is to convene heterogeneous groups of stakeholders to develop areas of inquiry to be addressed by analysis. These can be quite general (e.g. the relationship of the provision of specific enabling services to outcome or cost), but they must be related to the health center’s strategy and to the decisions that need to be made to carry out that strategy.
  • Fourth, detailed analytic queries are developed to address the areas of inquiry and carried out.
  • Finally, results are interpreted & presented in support of data-driven decision-making. Queries can also be redesigned, modified or enhanced at this point and rerun.

Recent conversations with community health center staff at conferences and as part of the Foundation’s health center-based pilot project have focused on several areas of inquiry that are strategic to the continued growth and success of these organizations. These areas have included:

  • Classifying patients according to risk and cost: This requires defining a set of classes (such as healthy patients, patients with chronic conditions, patients with multiple chronic conditions, patients with chronic conditions and behavioral health issues, etc.) and then analyzing the patient population with respect to these classes. Further analysis might then be done to determine the cost of care for each patient and each class. This allows categories of patients, such as the top 1%, top 5% and bottom 5%, to be identified with respect to cost, and may lead to interventions once causes and similarities in these classes are also analyzed.
  • Determining the cost of providing specific services, including enabling services (where data is available): This can be done along various axes such as per location, per time period and per provider, all of which may provide insight into costs and, with additional analysis, into the relationship of services to outcomes.
  • Analyzing population trends utilizing both internal clinical and demographic data and publicly available data (such as state-provided population trend data per location, time period etc.): This can provide insight into encounter trends as well as revenue trends.
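
As a rough illustration of the first area of inquiry, the sketch below classifies a handful of patients and summarizes cost per class. Everything in it is an assumption made for the example: the field names, the class rules, the dollar figures and the eight sample patients are invented, and a real health center would draw these values from its EHR and financial systems and agree on the class definitions with its stakeholders.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical, simplified patient summary for one year."""
    patient_id: str
    chronic_conditions: int
    behavioral_health: bool
    annual_cost: float


def risk_class(p):
    """Assign one illustrative class per patient (rules are assumptions)."""
    if p.chronic_conditions == 0:
        return "healthy"
    if p.behavioral_health:
        return "chronic + behavioral health"
    if p.chronic_conditions == 1:
        return "one chronic condition"
    return "multiple chronic conditions"


def cost_by_class(patients):
    """Return {class name: (patient count, mean annual cost)}."""
    groups = defaultdict(list)
    for p in patients:
        groups[risk_class(p)].append(p.annual_cost)
    return {name: (len(costs), sum(costs) / len(costs))
            for name, costs in groups.items()}


def top_cost_patients(patients, fraction):
    """Return the highest-cost `fraction` of patients (at least one)."""
    n = max(1, math.ceil(len(patients) * fraction))
    ranked = sorted(patients, key=lambda p: p.annual_cost, reverse=True)
    return ranked[:n]


# Invented sample population, for illustration only.
patients = [
    Patient("P1", 0, False, 400.0),
    Patient("P2", 0, False, 600.0),
    Patient("P3", 1, False, 2000.0),
    Patient("P4", 1, False, 3000.0),
    Patient("P5", 3, False, 9000.0),
    Patient("P6", 2, False, 7000.0),
    Patient("P7", 2, True, 20000.0),
    Patient("P8", 1, True, 12000.0),
]

class_costs = cost_by_class(patients)
top_quarter = top_cost_patients(patients, 0.25)

for name, (count, mean_cost) in class_costs.items():
    print(f"{name}: {count} patients, mean annual cost {mean_cost:.2f}")
print("Highest-cost 25%:", [p.patient_id for p in top_quarter])
```

The sample uses the top 25% only because eight patients are too few for a top 1% or 5% to be meaningful; with a real patient population the same function could be called with 0.01 or 0.05. The same grouping approach could be pointed at location, time period or provider to address the cost-of-service question.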

Many other areas of inquiry are possible, but they need to be aligned with the organization’s strategy in order to be productive and to enable data-driven decision-making.

The technology of contemporary analytics is also interesting, and it will be covered in my next column.

David Hartzband, D.Sc. is Director for Technology Research at the RCHN Community Health Foundation.



[1] http://www.sciencedaily.com/releases/2013/05/130522085217.htm

[2] http://www-03.ibm.com/innovation/ca/en/watson/watson_in_healthcare.shtml

[3] http://en.wikipedia.org/wiki/Data

[4] http://en.wikipedia.org/wiki/Analytics