All, here is another success story, this time from Pfizer. This post is for people who have asked how other companies leverage DV to their benefit. Pfizer has traditionally used ETL or data replication to move data into data marts and data warehouses. This has put an increasing burden on IT, which must maintain the quality and security of the replicated data in addition to the original source data. Typical ETL data integration (DI) work takes months to complete, and the resulting data is often out of date. The data sheet here was shared by Pfizer; I would not hold too tightly to those numbers. The point is that a DV layer can help your BI agility.
Pfizer’s Research Scientists Workbench gains the following new capabilities and benefits from the DV approach, which fits into its SOA strategy that emphasizes creating data objects for reuse:
- Automated data-level development, freeing developers to work on application-level development while cutting total development time in half
- Drag-and-drop development environment, built-in security and automated generation of Web services, requiring fewer specialized skills
- SOA-compliant Web Services Description Language (WSDL, used for describing the functionality offered by a web service) data services providing data in the form needed by portal developers
- Loosely coupled data services that are easier to maintain than ETL and data delivery scripts when changes are made to the underlying data sources or the portal
- Reusable data service assets
According to Gartner Hype Cycle for Information Infrastructure, 2012, “the Logical Data Warehouse (LDW) is a new data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategy. The LDW will form a new best practices by the end of 2015.” It has seven major components:
- Repository Management
- Data Virtualization
- Distributed Processes
- Auditing Statistics and Performance Evaluation Services
- SLA Management
- Taxonomy / Ontology Resolution
- Metadata Management
How does DV enable the logical data warehouse?
- Repository Management – Data virtualization supports a broad range of data warehouse extensions.
- Data Virtualization – Data virtualization virtually integrates data within the enterprise and beyond.
- Distributed Processes – Data virtualization integrates big data sources such as Hadoop and enables integration with distributed processes performed in the cloud.
- Auditing Statistics and Performance Evaluation Services – Data virtualization provides the data governance, auditability and lineage required.
- SLA Management – Data virtualization’s scalable query optimizers and caching deliver the flexibility needed to ensure SLA performance.
- Taxonomy / Ontology Resolution – Data virtualization also provides an abstracted, semantic layer view of enterprise data across repository-based, virtualized and distributed sources.
Obviously the DV vendors are not completely there yet. But when you evaluate them, check whether these components are on their roadmaps … I recommend Composite: not only did the other components meet our expectations, but most importantly it scores high on #5 (SLA Management), which is important for data warehouse technology.
Real-time BI has been a hot topic for the past 10 years. As a result of the big data phenomenon, data volumes and the diversity of new data sources are exploding. Now the emerging Operational Intelligence (OI) supports real-time BI on big data. Although visibility into streaming big data is extremely valuable to the business by itself, its value can be further enhanced by combining it with data that already exists in structured databases and data warehouses. For example, HR churn trending data in a conventional DWH can be combined with advanced analytics data that might be hosted in a big data store. The combination can be used to identify red flags on hidden risks …
The question is whether you need to move the structured data close to the big data and perform the join there, or leave both where they are and join them across the network.
In certain cases we do not want to move large chunks of data over the wire, so OI vendors have started to provide add-ons such as SQL or DB connect. But we need to keep an eye on performance and check whether the add-on has caching capability. If not, there is a possibility that your queries will impact database performance.
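To make the caching concern concrete, here is a minimal sketch of a read-through cache sitting between a stream of events and a relational source. Everything in it is my own assumption for illustration: the `CachedLookup` class, the `employees` table and the sample events are hypothetical, and an in-memory SQLite database stands in for the real database. It is not how any particular OI add-on works.

```python
import sqlite3
import time


class CachedLookup:
    """Hypothetical read-through cache in front of a relational source."""

    def __init__(self, conn, ttl_seconds=60.0):
        self.conn = conn
        self.ttl = ttl_seconds
        self.cache = {}    # employee_id -> (expires_at, department)
        self.db_hits = 0   # number of queries that actually reached the database

    def department(self, employee_id):
        now = time.monotonic()
        entry = self.cache.get(employee_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # served from cache, no database access
        self.db_hits += 1
        row = self.conn.execute(
            "SELECT department FROM employees WHERE id = ?", (employee_id,)
        ).fetchone()
        value = row[0] if row else None
        self.cache[employee_id] = (now + self.ttl, value)
        return value


# In-memory SQLite stands in for the structured database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)", [(1, "Research"), (2, "Finance")]
)

# Made-up streaming events that need enrichment from the database.
events = [
    {"employee_id": 1, "action": "login"},
    {"employee_id": 1, "action": "export"},
    {"employee_id": 2, "action": "login"},
    {"employee_id": 1, "action": "logout"},
]

lookup = CachedLookup(conn)
enriched = [dict(e, department=lookup.department(e["employee_id"])) for e in events]
print(lookup.db_hits, "database queries for", len(events), "events")
```

Without the cache, every event would trigger its own query against the database; with it, only the first lookup per key within the time-to-live window reaches the source.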
Data Virtualization servers have started to offer access to Hadoop, and with that they have entered the market of SQL-on-Hadoop engines. Current DV servers have runtime query engines that offer data federation capability for non-SQL data sources, data lineage and impact analysis features, caching capabilities to minimize access to the data source, distributed join optimization techniques and data-level security. I think they will raise the bar more in this market …
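As one illustration of what a distributed join optimization can look like, below is a rough sketch of a semi-join: filter at one source, then ship only the matching keys to the other source instead of moving a whole table. This is my own simplified example, not the method of any specific DV product. The `churn` and `scores` tables, the `semi_join` function and the data are all hypothetical, and two in-memory SQLite databases stand in for the data warehouse and the big data store.

```python
import sqlite3

# Two separate in-memory databases stand in for two sources (assumption).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE churn (employee_id INTEGER, risk TEXT)")
warehouse.executemany(
    "INSERT INTO churn VALUES (?, ?)",
    [(i, "high" if i % 100 == 0 else "low") for i in range(1, 1001)],
)

big_data_store = sqlite3.connect(":memory:")
big_data_store.execute("CREATE TABLE scores (employee_id INTEGER, score REAL)")
big_data_store.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(i, i / 1000) for i in range(1, 1001)],
)


def semi_join(left, right):
    """Join across sources while sending only the needed keys over the wire."""
    # Step 1: apply the filter at the first source so only matching keys leave it.
    keys = [
        row[0]
        for row in left.execute(
            "SELECT employee_id FROM churn WHERE risk = 'high' ORDER BY employee_id"
        )
    ]
    # Step 2: send just those keys to the second source and fetch matching rows.
    placeholders = ",".join("?" for _ in keys)
    rows = right.execute(
        "SELECT employee_id, score FROM scores "
        f"WHERE employee_id IN ({placeholders}) ORDER BY employee_id",
        keys,
    ).fetchall()
    return keys, rows


keys, rows = semi_join(warehouse, big_data_store)
print(len(keys), "keys shipped instead of 1000 rows")
```

The trade-off is the one raised earlier: the smaller the filtered key set, the less data crosses the network, at the cost of an extra round trip to the first source.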