IBM Corp. today said it’s teaming up with Hortonworks Inc. in a partnership that will combine IBM’s data science offering with Hortonworks’ software platform for managing huge amounts of data.

The two companies said the integration was all about helping customers better analyze and manage their data, but it also has some big consequences for IBM, as it means it is effectively giving up on developing its own version of Apache Hadoop, the open-source-based software used for storing, processing and analyzing big data.

IBM Data Science Experience, a collection of some 250 curated data sets along with collaborative tools, enables developers to create machine learning and analytics models quickly and easily. Hortonworks Data Platform is the big-data company’s official Apache Hadoop distribution.

The partnership will see Hortonworks resell IBM’s Data Science Experience solution alongside HDP. Hortonworks said it will also adopt IBM DSE as its strategic data science platform, which means developers using HDP can now access new capabilities such as machine learning, advanced analytics and statistics.

In return, IBM will adopt HDP as its official Hadoop distribution, and will fully integrate it with IBM Big SQL, which is the company’s SQL database engine for Hadoop. The integration means that IBM is giving up on BigInsights, which combines its enterprise capabilities and industry-standard Hadoop components into a single platform, and will migrate users to HDP.

In an interview with SiliconANGLE, Rob Thomas, general manager of IBM Analytics, explained that the partnership would help to expand IBM’s latest data science strategy that began when it pulled technology from its Watson and IBM DSE offerings and made it available on IBM mainframes earlier this year.

“This is the next step in that strategy, making this available on big data, specifically Hadoop,” Thomas said. “Machine learning works best when it’s brought to the data rather than the other way around.”

More important, perhaps, is that the deal allows both IBM and Hortonworks to focus on what they do best, Forrester Research Inc. Vice President and Principal Analyst Brian Hopkins told SiliconANGLE. He explained that the partnership frees up IBM to focus on developing its advanced analytics solutions. Meanwhile, Hortonworks gets to add more value to its HDP platform.

“IBM’s big-data distribution was having trouble keeping up, but now they have HDP, which is the leading edge,” Hopkins said. “It’s a big deal for Hortonworks too, because it is another enterprise vendor-turned-cloud player that will be running their infrastructure and reselling their product. It puts pressure on Cloudera.”

Still, said George Gilbert, big data analyst at Wikibon, owned by the same company as SiliconANGLE, Hortonworks will continue to face its own pressures from the big public cloud computing providers.

“Machine learning and data science tools really only get much more productive when they come with pre-trained models,” he said. “Google, Microsoft and Amazon are best positioned to offer this capability.”

In a blog post today, Hopkins added that IBM’s main problem with Hadoop was a lack of talent. Although IBM insists it has managed to increase the number of open-source developers on its books by almost seven times in the past two years, Hopkins contended that the company has trouble attracting the kind of staff with the innovative skills necessary to work on open-source projects. He pointed out that most younger developers tend to prefer to work for upstart companies such as Hortonworks that are challenging the “big boys.” He said it’s also a problem for companies such as Microsoft Corp. and Intel Corp., because open source is where all the new growth is, but they cannot compete with the likes of Hortonworks.

“The future of these software giants is tied to how well they can navigate the disruption to their legacy business model which is on premise, scale up, and enterprise-deal centric,” Hopkins wrote.

The companies also announced Hortonworks DataFlow for IBM Power Systems, a new platform for running big data and cognitive workloads that should provide users with faster access to data. Hortonworks DataFlow, or HDF, is a stream processing and streaming analytics platform that complements HDP by accelerating the flow of data into it. IBM Power Systems is a line of computer servers built on the company’s Power Architecture and designed to run big data workloads.

In addition, IBM and Hortonworks said they will step up their collaborative efforts in the Apache Software Foundation’s open-source ecosystem. This includes advancing the development of IBM’s Unified Governance solutions on Apache Atlas, which is a scalable governance platform for Hadoop. The idea is to elevate Atlas from its “incubator” status at the ASF to a full, top-level project, so it can be released for open development and deployment.

The companies also said they’re planning to expand their work on the Apache Spark data processing engine, and will work with the open-source community to standardize security, metadata and data governance around Atlas and Apache Ranger, which is a framework for managing data security on Hadoop.

“The world of data science and machine learning and this new world of enterprise big data needs to be open source,” Thomas said. “The reality is the other options for Hadoop players in the market are fully proprietary. It’s bad for clients and bad for innovation.”

Still, IBM’s transition to Hortonworks’ Hadoop distribution won’t be easy for customers. “The only group that faces tough choices is IBM’s Hadoop installed base,” said Gilbert. “IBM has to help them move to the new distribution. While both vendors support the ODPi standard for Hadoop distros, that is only a small core of each vendor’s entire distro so migration won’t be easy.”

By Mike Wheatley

Source: SiliconANGLE