
Hadoop expands data infrastructure, boosts business intelligence

The big data that companies successfully transform into usable business intelligence (BI) is just the tip of a massive data iceberg, according to Jonathan Seidman, solutions architect at Cloudera.

At Big Data TechCon 2014, Seidman hosted a session called “Extending your data infrastructure with Hadoop,” in which he explained how Hadoop can help the enterprise tap into the potential business intelligence below the waterline. “That data that’s getting thrown away can have a lot of value, but it can be very difficult to fit that data into your data warehouse,” Seidman explained.

The problem with big data is that there’s so much of it. Data centers simply don’t have the capacity to store it all. “Would you put a petabyte of data in your warehouse?” Seidman asked the audience. “It’s a good way to get fired,” a member shot back. For this reason, enterprises focus their energy on the data points that give a high return-on-byte, to use Seidman’s term. That is, they capture and analyze the data that provides the most insight for the least amount of storage space. For example, a retailer would analyze its transactional data set, focusing its attention on actual purchases. But Seidman pointed out that valuable data gets left out: in the retail example, behavioral, non-transactional data. “What if you don’t just want to know what the customer bought, but what they did on the site?” Seidman asked.

Enter Apache Hadoop, an open source framework designed to store and process large data sets. Seidman described the technology as “scalable, fault tolerant and distributed.” With this framework, enterprises can load raw data first and impose a schema on it afterward. “This makes it easy for iterative, agile types of development,” Seidman said. He added that it made a good sandbox for more exploratory types of analysis.
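Seidman’s slides aren’t reproduced here, but as a minimal sketch of the schema-on-read idea he describes, the hedged Python (PySpark) example below loads raw logs with no upfront schema and imposes a structure only at query time. The paths, field layout, and use of PySpark are illustrative assumptions, not part of the talk.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

# Hypothetical example: raw, untyped clickstream logs sitting in HDFS.
spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Load the raw text without declaring any schema up front.
raw = spark.read.text("hdfs:///data/raw/clickstream/2014/*.log")

# Impose a structure only at read time: split each line into fields.
parts = split(col("value"), "\t")
events = raw.select(
    parts.getItem(0).alias("user_id"),
    parts.getItem(1).alias("page"),
    parts.getItem(2).cast("timestamp").alias("event_time"),
)

# Exploratory, iterative analysis: which pages do visitors touch most?
events.groupBy("page").count().orderBy("count", ascending=False).show(10)
```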


The Tools That Power Business Intelligence

Ever-evolving analytic software can greatly improve financial institutions’ decision-making.

Business intelligence technology has come a long way from the decision support systems of the 1960s. Today, it can do much more than just mine, analyze and report on data — it can cross-analyze different data sets, forecast future behavior and greatly improve decision-making.

Tools continue to expand their capabilities, providing more value every year. The types of analysis they can perform today go well beyond what was possible even five years ago.

Technology Advances

The financial industry analyzes its vast store of data in several ways, and evolving BI tools aid in those tasks. Some of the capabilities that executives seek include:

Content analytics: Unstructured data (such as the content found in machine logs, sensor data, audio, video, call center logs, RSS feeds, social media posts and PowerPoint files) is growing more rapidly than any other type of data. Content analytics applies BI to this unstructured data.

By understanding more about the content and how it’s being used, enterprises can determine whether it’s valuable to the business. The content that is deemed valuable can be linked to other data to extrapolate additional insight, such as understanding the cause behind trends and events.

Context analytics: Effective decisions can’t be made without understanding the context of data, and that’s where context analytics comes in. It focuses on surrounding each data point with a historical context about people, places and things, and how each data point relates to other data points.

Business analytics: While traditional BI platforms include executive dashboards that provide key performance metrics, newer tools go further. Business analytics provides a deeper level of statistical and quantitative analysis, allowing financial services organizations to dive deeper to discover trends, relationships, patterns, behaviors and opportunities that are particularly difficult to discern.

Predictive analytics: Predictive analytics is a must-have for many financial services organizations, and for good reason. The process uses a variety of techniques, including statistical analysis, regression analysis, correlation analysis and cluster analysis, along with text mining, data mining and social media analytics, to learn from historical experience what to expect in a given area. Financial services firms can use the resulting models and patterns along with real-time data to develop proactive actions in areas such as loan approval determination and product development (a brief illustrative sketch appears after this list).

Cognitive analytics: This type of analytics employs artificial intelligence and machine learning algorithms that learn and build knowledge through experience in their domain, including its terminology, processes and preferred methods of interaction. These systems process natural language and unstructured data and can help experts make better decisions.

Text analytics: This process transforms unstructured content such as email, text messages, web pages, social media posts, survey responses and charts into machine-readable text. With the information in that form, BI systems can better use the data to discover patterns, relationships and root causes.

Social media analytics: From Twitter and Facebook to LinkedIn, YouTube and blogs, it’s clear that social media is an information channel that can’t be ignored. Social media analytics gathers and analyzes data from sites like these in near real time, giving decision-makers access to extremely valuable information that provides insight into customer sentiment.

It also provides a way for financial services companies to quantify market perceptions, track the success of marketing campaigns and product launches, discover insights and trends in customer preferences, and react more quickly.
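To ground the predictive analytics capability referenced in the list above, here is a minimal, hedged Python sketch using scikit-learn: it fits a simple model to invented historical loan data and then scores a new application, the kind of proactive loan-approval use the article mentions. The features, data, and model choice are all illustrative assumptions rather than any institution’s actual practice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical historical loan data: [income_k, debt_ratio, years_employed]
rng = np.random.default_rng(42)
X = rng.normal(loc=[60, 0.35, 5], scale=[20, 0.15, 3], size=(500, 3))
# Invented label: repaid (1) or defaulted (0), loosely tied to the features.
y = (X[:, 0] * 0.02 - X[:, 1] * 4 + X[:, 2] * 0.1 + rng.normal(0, 1, 500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn from "historical experience" what to expect.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Hold-out accuracy:", model.score(X_test, y_test))

# Score a new, incoming application in near real time.
new_application = [[72, 0.28, 4]]  # income $72k, 28% debt ratio, 4 yrs employed
print("Approval probability:", model.predict_proba(new_application)[0, 1])
```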

Microsoft is set to release a patch for a “zero-day” vulnerability

In mid-April 2014, Microsoft is set to release a patch for a “zero-day” vulnerability. When I asked some of my friends if they had heard the term “zero day,” a few of them said they had. When I asked them what the term referred to, they thought it meant “the number of days until you’re hacked.” Close, but not quite. It actually refers to lead time in an arms race.

One way that hackers can compromise your computer is by exploiting bugs in the programs you use every day. These bugs are called vulnerabilities by security geeks. Software companies are constantly testing their products looking for these bugs, and when they find one, two things happen. First, the company starts working on a fix in the form of a software patch. Second, hackers start making malware and viruses to take advantage of the bug. In other words, once a vulnerability is found, an arms race starts. How much time does the company have to patch the hole? Can the company issue a fix before the hackers use the bug to attack you? Most companies don’t even reveal the weakness until they have the patch ready, but sometimes the sneaky bad guys find out.

Frequently, it is not the software company that finds the bug, but the hacker himself. Some hackers do nothing all day but look for vulnerabilities in popular software. When they find one, they can secretly start working right away on malicious code to take advantage of the bug. How can a software company create a patch if it doesn’t even know the vulnerability exists? In this case, how many days of lead time does a company have to create a fix? Zero! How much time does a user have to patch their system before being exposed to the malware? Zero!

Sometimes it seems like there are too many things to consider when thinking about the security of your home computer(s), but a relatively easy way to greatly improve your odds is simply keeping the applications you use up-to-date. Many programs provide automated tools to do this. For those that don’t, tune in next week for another tip.

Merck Optimizes Manufacturing With Big Data Analytics

Pharmaceutical firm uses Hadoop to crunch huge amounts of data so it can develop vaccines faster. One of eight profiles of InformationWeek Elite 100 Business Innovation Award winners.

Producing pharmaceuticals of any kind is an expensive, highly regulated endeavor, but producing vaccines is particularly challenging.

Vaccines often contain attenuated viruses, meaning they’re altered so they give you immunity but not the actual disease, and thus they have to be handled under precise conditions during every step in the manufacturing process. Components might have to be stored at exactly -8 degrees for a year or more, and with even a slight variance from regulator-approved manufacturing processes, the materials have to be discarded.

“It might take three parts to get one part, and what we drop or discard amounts to hundreds of millions of dollars in lost revenue,” says George Llado, VP of information technology at Merck & Co.

In the summer of 2012, Llado was seeing higher-than-usual discard rates on certain vaccines. Llado’s team was looking into the causes of the low vaccine yield rates, but the usual investigative approach involved time-consuming spreadsheet-based analyses of data collected throughout the manufacturing process. Sources include process-historian systems on the shop floor that tag and track each batch. Maintenance systems detail plant equipment service dates and calibration settings. Building-management systems capture air pressure, temperature, and other readings in multiple locations at each plant, sampling by the minute.

Aligning all this data from disparate systems and spotting abnormalities took months using the spreadsheet-based approach, and storage and memory limits meant researchers could only look at a batch or two at a time. Jerry Megaro, Merck’s director of manufacturing advanced analytics and innovation, was determined to find a better way.

By early 2013, a Merck team was experimenting with a massively scalable distributed relational database. But when Llado and Megaro learned that Merck Research Laboratories (MRL) could provide their team with cloud-based Hadoop compute, they decided to change course.

Built on a Hortonworks Hadoop distribution running on Amazon Web Services, MRL’s Merck Data Science Platform turned out to be a better fit for the analysis because Hadoop supports a schema-on-read approach. As a result, data from 16 disparate sources could be used in analysis without having to be transformed with time-consuming and expensive ETL processes to conform to a rigid, predefined relational database schema.

“We took all of our data on one vaccine, whether from the labs or the process historians or the environmental systems, and just dropped it into a data lake,” says Llado.

Megaro’s team was then able to come up with conclusive answers about production yield variance within just three months. In the first month, July 2013, the team loaded the data onto a partition of the cloud-based platform, and it used MapReduce, Hive, and advanced dynamic time-warping techniques to aggregate and align the data sets around common metadata dimensions such as batch IDs, plant equipment IDs, and time stamps.
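The article doesn’t show the team’s actual Hive or MapReduce code. Purely as an illustration of the alignment idea it describes, the hedged Python/pandas sketch below joins two hypothetical source extracts on a shared batch ID after resampling their time stamps onto a common per-minute grid; the file names and columns are invented.

```python
import pandas as pd

# Hypothetical extracts from two of the 16 source systems.
historian = pd.read_csv("process_historian.csv", parse_dates=["event_time"])   # batch_id, event_time, fermenter_temp
building = pd.read_csv("building_mgmt.csv", parse_dates=["reading_time"])      # batch_id, reading_time, room_pressure

# Put both feeds on a common per-minute time grid so they can be compared.
historian = (historian.set_index("event_time")
                      .groupby("batch_id")
                      .resample("1min").mean()
                      .reset_index())
building = (building.set_index("reading_time")
                    .groupby("batch_id")
                    .resample("1min").mean()
                    .reset_index()
                    .rename(columns={"reading_time": "event_time"}))

# Align the two systems around the shared metadata dimensions.
aligned = pd.merge(historian, building, on=["batch_id", "event_time"], how="inner")
print(aligned.head())
```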

In the second month, analysts used R-based analytics to chart and cluster every batch of the vaccine ever made on a heat map. Spotting notable patterns, the team then used R to produce investigative histograms and scatter plots, and it drilled down with Hive to explore hypotheses about the factors tied to low-yield production runs. Using an Agile development approach, the team set up daily data-exploration goals, but it could change course by that afternoon if it failed to find solid data backing up a particular hypothesis. In the third month, the team developed models, testing against the trove of historical data to prove and disprove leading theories about yield factors.
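The team’s charting and clustering were done in R. As a loose stand-in only, the hedged Python sketch below (seaborn, with invented per-batch metrics, since the real process traits are not disclosed) clusters batches and renders them as a heat map of the kind described.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Invented per-batch summary metrics; the actual traits are confidential.
rng = np.random.default_rng(7)
batches = pd.DataFrame(
    rng.normal(size=(40, 3)),
    columns=["fermentation_ph", "fermentation_duration", "purification_yield"],
    index=[f"batch_{i:03d}" for i in range(40)],
)

# Cluster batches and draw a heat map, roughly analogous to the R charting step.
grid = sns.clustermap(batches, z_score=1, cmap="vlag")
grid.savefig("batch_heatmap.png")
```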

Through 15 billion calculations and more than 5.5 million batch-to-batch comparisons, Merck discovered that certain characteristics in the fermentation phase of vaccine production were closely tied to yield in a final purification step. “That was pretty powerful, and we came up with a model that demonstrated, quantifiably, that specific fermentation performance traits are very important to yield,” says Megaro.

The good news is that these fermentation traits can be controlled, but Merck has to prove that in a test lab before IT can introduce any changes to its production environment. And if any process changes are deemed material, Merck will have to refile the vaccine’s manufacturing process with regulatory agencies.

With the case all but solved for one vaccine, Merck is applying the lessons learned to a variant of that product that is expected to be approved for sale as soon as this year. And drawing on both the manufacturing insights and the new big data analysis approach, Merck intends to optimize the production of other vaccines now in development. They’re all potentially lifesaving products, according to Merck, and it’s clear that the new data analysis approach marks a huge advance in ensuring efficient manufacturing and a more plentiful supply.

Magic Quadrant for Business Intelligence and Analytics Platforms, a Gartner report (2014)

Market Definition/Description

The BI and analytics platform market is in the middle of an accelerated transformation from BI systems used primarily for measurement and reporting to those that also support analysis, prediction, forecasting and optimization. Because of the growing importance of advanced analytics for descriptive, prescriptive and predictive modeling, forecasting, simulation and optimization (see “Extend Your Portfolio of Analytics Capabilities”) in the BI and information management applications and infrastructure that companies are building — often with different buyers driving purchasing and different vendors offering solutions — this year Gartner has also published a Magic Quadrant exclusively on predictive and prescriptive analytics platforms (see Note 1). Vendors offering both sets of capabilities are featured in both Magic Quadrants.

The BI platform market is forecast to have grown into a $14.1 billion market in 2013, largely through companies investing in IT-led consolidation projects to standardize on IT-centric BI platforms for large-scale systems-of-record reporting (see “Forecast: Enterprise Software Markets, Worldwide, 2010-2017, 3Q13 Update”). These have tended to be highly governed and centralized, where IT production reports were pushed out to inform a broad array of information consumers and analysts. While analytical capabilities were deployed, such as parameterized reports, online analytical processing (OLAP) and ad hoc query, they were never fully embraced by the majority of business users, managers and analysts, primarily because most considered these too difficult to use for many analytical use cases. As a result, and continuing a five-year trend, these installed platforms are routinely being complemented, and in 2013 were increasingly displaced, in new sales situations by new investments, and requirements were more skewed toward business-user-driven data discovery techniques to make analytics beyond traditional reporting more accessible and pervasive to a broader range of users and use cases.

Also in support of wider adoption, companies and independent software vendors are increasingly embedding both traditional reporting, dashboards and interactive analysis, in addition to more advanced and prescriptive analytics built from statistical functions and algorithms available within the BI platform into business processes or applications. The intent is to expand the use of analytics to a broad range of consumers and nontraditional BI users, increasingly on mobile devices. Moreover, companies are increasingly building analytics applications, leveraging new data types and new types of analysis, such as location intelligence and analytics on multistructured data stored in NoSQL data repositories.

Oil Company Uses Spotfire to Drill Down to Real Results

When planning the course of your business, strategic decisions require more than talent and intuition. These days, titans of industry require solid data to keep the fires stoked and the wheels turning. Calculated decisions are the real means to success, and the faster you make them, the better. But with today’s massive information volumes and unrelenting speed, organizations need a modern way of extracting valuable insight from their data—that’s where TIBCO Spotfire comes in.

Fast Insights Drive Results

Forest Oil, a premier drilling and exploration company, understands this all too well. In the oil and gas industry, where speed and foresight can mean the difference between boom and bust, keeping costs down and production high is critical. Now, with Spotfire, Forest Oil can monitor and analyze every nugget of data originating from its employees and wells, all the way down to the predicted vs. actual productivity of specific oil fields—all at high speed.

Remote Collaboration

Spotfire also delivers innovative mobile capabilities to keep remote employees connected. Let’s face it: oil and natural gas deposits are rarely found near the office water cooler at corporate headquarters. Fast access to the data you need, in the palm of your hand, wherever you may be, is essential to gaining and sustaining competitive advantage. Employees are alerted to issues in real time; when an oil well’s forecast and actual production differ by 10 percent, the right employees receive an alert, permitting them to act before it’s too late.
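Spotfire’s actual alerting configuration isn’t published in this piece. As a rough sketch of the rule described, a 10 percent gap between forecast and actual production triggering a notification, here is a hedged Python example; the well data and the notification function are invented placeholders.

```python
# Hypothetical well readings: forecast vs. actual daily production (barrels).
wells = [
    {"well_id": "FO-112", "forecast_bbl": 850, "actual_bbl": 905},
    {"well_id": "FO-347", "forecast_bbl": 1200, "actual_bbl": 1020},
]

ALERT_THRESHOLD = 0.10  # 10 percent deviation, as described in the article


def notify(well_id: str, deviation: float) -> None:
    # Placeholder: a real system would page or email the responsible engineer.
    print(f"ALERT: well {well_id} is off forecast by {deviation:.1%}")


for well in wells:
    deviation = abs(well["actual_bbl"] - well["forecast_bbl"]) / well["forecast_bbl"]
    if deviation >= ALERT_THRESHOLD:
        notify(well["well_id"], deviation)
```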

Simplicity is Key

Did I mention Spotfire’s stunning analytic visualizations? Well, they’re included too and make it simple to quickly develop insights by presenting data in a logical, intuitive manner. Contrary to popular belief, you don’t have to be the reincarnation of Einstein to effectively use cutting-edge BI analytic tools, though the tangible results may indicate otherwise. By connecting personnel to the information they need to successfully do their jobs, analytics not only supplies the power of real-time speed, but the means to make smarter decisions on the spot, wherever that might be.

Browse Info Solutions will play a strategic role

Browse Info Solutions will play a strategic role in working with human resource groups to conduct pre-employment background screening on individuals to whom an offer of employment is to be tendered. We help ensure that new hires are of the highest integrity in order to maintain a safe work environment.

http://northforkvue.com/press-releases/93581/browse-info-solutions-announces-launch-of-integrated-resource-management-application-presto-hr/

Data Virtualization for Business Intelligence and Data Solutions

In a nutshell, when data virtualization is enabled, an abstraction layer hides from applications most of the technical aspects of how and where data is stored. Applications do not need to know where the data is physically stored, how it should be integrated, where the data store server runs, what the required APIs are, or which language to use to access the data.
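Data virtualization products differ widely; as a loose, product-agnostic illustration of the abstraction layer described above, the Python sketch below exposes a single query interface while hiding whether the answer comes from a relational store or an external feed. All names, sources, and data are invented.

```python
import sqlite3
from typing import Iterable


class CustomerDataService:
    """A toy data-virtualization facade: callers ask for customers by state
    without knowing which physical source answers the request."""

    def __init__(self) -> None:
        # Source 1: a relational store (stand-in for a warehouse table).
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE customers (name TEXT, state TEXT)")
        self._db.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [("Acme Retail", "MO"), ("Globex", "MI")],
        )
        # Source 2: an external feed (stand-in for third-party or NoSQL data).
        self._external_feed = [{"name": "Initech", "state": "MO"}]

    def customers_in_state(self, state: str) -> Iterable[dict]:
        rows = self._db.execute(
            "SELECT name, state FROM customers WHERE state = ?", (state,)
        ).fetchall()
        combined = [{"name": n, "state": s} for n, s in rows]
        combined += [c for c in self._external_feed if c["state"] == state]
        return combined


service = CustomerDataService()
print(service.customers_in_state("MO"))  # the caller never sees SQL or the feed
```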

Advantages and Disadvantages

Advantages:

- Users can work with more timely data
- Less need to create derived data stores
- Shorter time to market for new reports

Disadvantages:

- Transformations are executed repeatedly
- Complex transformations can take too long
- The production system overwrites old data when new data is entered

Before implementing, identify a good test project: one data source with anywhere from several rows up to millions of rows, several to 100 columns, and a low volume of concurrent users.

In a traditional process, we use ETL to move data into an application-specific database and then use that data to drive the application or build reports. In some cases, by the time the data has been moved, the report requirements have already changed. A DV layer instead lets applications access shared enterprise data services without physically replicating data into their own schemas: the data stays in the source system, and any application can use it without copying it over. Going beyond structured, internal data, you can also use DV to connect to unstructured data (Facebook, Twitter) and external, third-party-owned data without hosting it in your own infrastructure. That said, DV is a complementary tool that you will want to have, not a replacement for what you already have in your technology stack.

Data will drive the next wave of widespread Functional Programming adoption

“Big data” is a popular term, usually associated with data sets too big to manage with traditional databases. The parallel development has been the NoSQL era, which is good for handling unstructured data, scaling out, and so on. IT shops have realized that NoSQL is useful, but people are still deeply interested in SQL, and it is making a comeback. You can see it in Hadoop, in SQL-like APIs for some “NoSQL” databases (e.g., Cassandra’s CQL and MongoDB’s JavaScript-based query language), as well as in NewSQL databases.

A drawback of SQL is that it doesn’t provide first-class functions, so (depending on the system) you are limited to the functions that are built in or to user-defined functions (UDFs) that you can write and register. A functional programming language makes this easy.
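To make the contrast concrete, here is a hedged Python sketch using PySpark (an assumption, since the article names no specific engine): a pricing rule is just a first-class function that can be created and passed around, whereas a SQL engine needs it registered as a UDF before it can apply it. The data and column names are invented.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("first-class-functions-sketch").getOrCreate()
orders = spark.createDataFrame(
    [("store_a", 120.0), ("store_b", 80.0)], ["store", "amount"]
)

# In a functional language, the discount rule is just a value we can pass around.
def with_discount(rate):
    return lambda amount: amount * (1 - rate)

holiday_pricing = with_discount(0.15)

# In SQL, we would have to register this as a UDF before the engine could use it.
discount_udf = F.udf(holiday_pricing, DoubleType())
orders.withColumn("discounted", discount_udf(F.col("amount"))).show()
```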

Even today, most developers get by without understanding concurrency. Many will just use an actor or reactive model to solve their problems. I think more devs will have to learn how to work with data at scale and that fact will drive them to FP.

We have seen a lot of issues with MapReduce. Alternatives like Spark for general use and Storm for event-stream processing are already gaining traction. FP is such a natural fit for these problems that any attempt to build big data systems without it will be handicapped and will probably fail.

Browse Info Solutions Releases 2013 Financial Results

Canton, Mich., February 25, 2014 – Browse Info Solutions Inc., a provider of IT services and solutions, today announced its financial results for the year ending December 31, 2013. The company grew by 90 percent compared with the previous year. It opened a new operations center in Canton, MI, and is planning to expand its employee count by 100 percent.

Note that as a private company, Browse Info Solutions does not release dollar values for its revenue and profitability figures.

“Last year Browse Info Solutions spent a lot of time and resources building a stronger team,” said Vinod, Director, Browse Info Solutions Inc. “Our retail customers in Missouri are experiencing excellent ROI with the implementation of our new product, FAP. With a strong team in place, in 2014 we are aggressively targeting diverse markets to expand the reach of our product.”

Browse Info Solutions is constantly hiring for both consulting and management positions.

Contact sales@browseinfosolutions.com for details.

About Browse Info Solutions Inc.

Browse Info Solutions Inc., founded in 2011 and based in Cary, NC, with new operations in Canton, Mich., is a privately held technology company that develops and delivers professional services and solutions in support of IT applications. Our enterprise architecture frameworks allow us to develop innovative IT solutions, which are brainstormed by our uniquely qualified internal team. More information is available at www.browseinfosolutions.com.