Developing the Right Data Strategy for Your Organization

- By Randy Bean

When it comes to making actionable use of data, there is no single playbook or set of common practices that apply universally to all businesses, CIO Journal Columnist Randy Bean says. “Organizations would be well served to break from accepted dogma and apply fresh thinking as they consider how best to align their resources, capabilities, and people to make wise use of their data,” he writes.

FastWorks Friday: Getting smarter about testable hypotheses and experiments

Do you struggle with coming up with good Leap-of-Faith Assumptions (LOFAs) when applying FastWorks? That is, do you have a hard time articulating a testable hypothesis (a statement proposing a relationship between two or more variables that can be tested)?

The best article I’ve seen on the topic helped me considerably some time back. It’s from the go-to thinker on applying analytics in business contexts, Prof. Tom Davenport. In this classic 2009 Harvard Business Review article, “How to Design Smart Business Experiments”, Prof. Davenport outlines one of the most difficult things to wrap one’s head around: what makes a smart business experiment. Here’s a key quotation from the article: “Formalized testing can provide a level of understanding about what really works that puts more intuitive approaches to shame.” For me, the article clarified what could be testable, how to test it quickly, and how to learn from the results.
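
To make that concrete, here is a minimal sketch of what a testable hypothesis can look like once it reaches code: a hypothetical free-shipping pilot, where the hypothesis is “offering free shipping increases the order rate,” checked with a standard two-proportion z-test. The scenario and every number below are invented for illustration.

    # Hypothesis (hypothetical pilot): offering free shipping (treatment)
    # raises the order rate relative to the current checkout (control).
    from math import sqrt, erf

    def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
        """Two-sided z-test for a difference between two conversion rates."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal approximation
        return z, p_value

    # Invented counts: 120/2000 orders in control vs. 164/2000 with free shipping.
    z, p = two_proportion_ztest(120, 2000, 164, 2000)
    print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value argues against "no difference"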

One additional and important point on the same theme: make sure the experiments you run are shared, so that they encourage a culture of testing hypotheses and learning from the results.

The article is incredibly practical, offering some good rules of thumb, especially for those of us looking to apply FastWorks as managers. Of course, FastWorks is a lot more than hypotheses and experiments. But the first step is understanding how to move quickly: testing the assumptions that would have the biggest impact on success if they turned out to be wrong, and that are quick to test.

Is there anything you’ve done to get to testable hypotheses more quickly?

Clearly Defining Data Virtualization, Data Federation, and Data Integration

The terms data virtualization, data federation, and data integration are used more and more often. Unfortunately, these terms have never been properly defined. Let’s see if, together, we can come up with generally accepted definitions.

Data Virtualization

Data virtualization is the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.

Data virtualization provides an abstraction layer that data consumers can use to access data in a consistent manner. A data consumer can be any application retrieving or manipulating data, such as a reporting or data entry application. This abstraction layer hides all the technical aspects of data storage: the applications don’t have to know where the data is physically stored, where the database servers run, what the APIs and database languages are, and so on.
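
As a minimal sketch of that abstraction idea (this is not any particular product’s API; the class and method names are assumptions for illustration), a data consumer can be written against a single interface while adapters hide whether the data actually lives in a SQL database or behind a REST service:

    # One uniform interface for consumers; where and how the data is stored is hidden.
    from abc import ABC, abstractmethod

    class CustomerSource(ABC):
        @abstractmethod
        def get_customers(self) -> list[dict]:
            """Return customers as plain dicts, whatever the backing store."""

    class SqlCustomerSource(CustomerSource):
        def __init__(self, connection):
            self._conn = connection                 # e.g., a sqlite3 connection
        def get_customers(self) -> list[dict]:
            rows = self._conn.execute("SELECT id, name FROM customers").fetchall()
            return [{"id": r[0], "name": r[1]} for r in rows]

    class RestCustomerSource(CustomerSource):
        def __init__(self, fetch):
            self._fetch = fetch                     # callable wrapping an HTTP API
        def get_customers(self) -> list[dict]:
            return self._fetch("/customers")        # API and location hidden here

    def count_customers(source: CustomerSource) -> int:
        # The consumer never learns which storage technology answered.
        return len(source.get_customers())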

Technically, data virtualization can be implemented in many different ways. Here are a few examples:

  • With a federation server, multiple data stores can be made to look like one.
  • An enterprise service bus (ESB) can be used to develop a layer of services that allow access to data.
  • Placing data stores in the cloud is also a form of data virtualization.
  • In a way, building up a virtual database in memory with data loaded from physical databases can also be regarded as data virtualization (a sketch of this approach follows the list).
  • Organizations could also develop their own software-based abstraction layer that hides where and how the data is stored.
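
To illustrate the fourth option, here is a minimal sketch (table and column names are invented) in which rows fetched from two physical stores, simulated below as plain lists, are loaded into an in-memory SQLite database that consumers then query as a single store:

    import sqlite3

    # Stand-ins for rows already fetched from two separate physical databases.
    orders_eu = [(1, "EU", 120.0), (2, "EU", 75.5)]
    orders_us = [(3, "US", 200.0)]

    virtual = sqlite3.connect(":memory:")           # the in-memory virtual database
    virtual.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    virtual.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders_eu + orders_us)

    # Consumers see one integrated table, not two source systems.
    for region, total in virtual.execute(
            "SELECT region, SUM(amount) FROM orders GROUP BY region"):
        print(region, total)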

Data Federation

Data federation is a form of data virtualization where the data stored in a heterogeneous set of autonomous data stores is made accessible to data consumers as one integrated data store by using on-demand data integration.

This definition is based on the following concepts:

  • Data virtualization: Data federation is a form of data virtualization. Note that not all forms of data virtualization imply data federation.
  • Heterogeneous set of data stores: Data federation should make it possible to bring data together from data stores using different storage structures, different access languages, and different APIs.
  • Autonomous data stores: Data stores accessed by data federation are able to operate independently; in other words, they can be used outside the scope of data federation.
  • One integrated data store: Regardless of how and where data is stored, it should be presented as one integrated data set. This implies that data federation involves transformation, cleansing, and possibly even enrichment of data.
  • On-demand integration: The data from the heterogeneous data stores is integrated at the moment a data consumer requests it, rather than in advance.
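
A minimal sketch may help make the on-demand aspect concrete. Two autonomous, heterogeneous stores (a SQL table and a JSON-style in-memory source standing in for a web API) are integrated only at the moment a consumer calls the function; all names and data are invented for illustration:

    import sqlite3

    crm = sqlite3.connect(":memory:")               # autonomous store 1: SQL
    crm.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace')")

    web_signups = [{"customer_id": 2, "name": "alan turing "}]  # autonomous store 2

    def federated_customers() -> list[dict]:
        """Integrate both stores on demand; nothing is copied ahead of time."""
        result = [{"id": i, "name": n}
                  for i, n in crm.execute("SELECT id, full_name FROM customers")]
        for row in web_signups:                     # transform and cleanse on the fly
            result.append({"id": row["customer_id"],
                           "name": row["name"].strip().title()})
        return result

    print(federated_customers())  # one integrated view over two source schemas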

Data Integration

Data integration is the process of combining data from a heterogeneous set of data stores to create one unified view of all that data.

Data integration involves joining data, transforming data values, enriching data, and cleansing data values. This is the approach taken when using ETL tools.
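
By way of contrast with the federation sketch above, here is a minimal ETL-style sketch (the source rows and table names are invented): the data is extracted, transformed and cleansed, and loaded into a target store ahead of time, and consumers then query only that target:

    import sqlite3

    source_rows = [("1", " Ada Lovelace"), ("2", "alan turing")]  # extracted earlier

    warehouse = sqlite3.connect(":memory:")         # the integration target
    warehouse.execute("CREATE TABLE dim_customer (id INTEGER, name TEXT)")

    def etl_load():
        for raw_id, raw_name in source_rows:
            cleaned = (int(raw_id), raw_name.strip().title())   # transform/cleanse
            warehouse.execute("INSERT INTO dim_customer VALUES (?, ?)", cleaned)

    etl_load()  # in practice this runs on a schedule, before consumers query
    print(warehouse.execute("SELECT * FROM dim_customer").fetchall())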

Data virtualization might not require data integration; if only one data source is accessed, there is nothing to integrate. Data federation, by definition, always requires data integration. And from the data integration perspective, data federation is just one style of integrating data; ETL is another.