On January 19, MetiStream hosted our meetup, The Washington DC Apache Spark Interactive, and held a State of the Union Panel Discussion on Apache Spark Big Data Innovation. It was a fitting topic given the transition of power in Washington that happened the next day. We firmly believe that a consistent focus on big data powers innovation and keeps America strong and distinctive. Our technology meetups shepherd this innovation by bringing the community together to learn and share knowledge, network, and find opportunities for our members. Yes, we may be a roomful of geeks…but this is what we do. This is fun for us!
Intel, the South Big Data Hubs, and WeWork sponsored the meetup. Much credit goes to them for a fabulous evening with great food and interesting discussions! Our panelists included some truly brilliant and accomplished members: Melvin Greer, director of data science and analytics at Intel and a South Big Data Innovation Hub steering committee member; Fen Zhao, PhD, program officer for the National Science Foundation Big Data Hubs & Spokes program; Idriss Mekrez, PhD, CTO of MarkLogic for the public sector; and Richard Garris, principal solutions architect at Databricks. You can find their full bios on our meetup page.
We started the evening by breaking up into small groups to give our members the opportunity for a more intimate meet and greet with our panelists. Each panelist participated in a breakout session and then debriefed the larger audience on what they learned in that session. Across the groups, we identified some general themes: skills development, the fast pace of change in open source, the implications of artificial intelligence, concerns about what the new administration will fund, and exciting opportunities surrounding the big data/Spark community.
Our panelists discussed the state of the big data/Spark market in 2016 and pondered the question “Is the market slowing down?” Greer started the discussion by stating that every business is now in the business of data or cyber. Every business is incentivized to be data-driven and security-focused or risk being beaten by its competition, he said. We also looked at things more broadly and concluded that the growth potential for our field seems unlimited and that Spark continues to spearhead much of the technical vision and opportunity.
Furthermore, Greer stated that “big data” is a poor and limiting term. What’s most important, he said, is finding the significance in data regardless of its size. He also cautioned that we must be careful with our approach and that technologists must bear some responsibility for shaping a better future. He suggested the audience read Cathy O’Neil’s book Weapons of Math Destruction to learn more.
The most interesting question we covered was “If you can go back to your younger self, what advice would you give related to the Big Data/Spark market that would be of the biggest help to the audience?” We heard a broad range of advice and thoughts, from investing in Google and Facebook stock “back in the day” to quantum computing being ahead of its time. A recurring theme was artificial intelligence (AI). The panelists strongly encouraged us to read the recently published National AI Research and Development Strategic Plan. Mekrez, who holds a doctorate in cognitive science with a specialization in AI, told the audience to continue to push the envelope around AI innovation, as we have yet to fully realize its potential. Will Spark tackle AI concepts in the coming year? It seems the industry is already well down that path. The upcoming Spark Summit East in Boston offers a clue in the form of one AI talk.
As for cool projects or initiatives we should follow in 2017, Garris suggested keeping an eye on Databricks’ efforts around Structured Streaming. If it lives up to its objectives, Structured Streaming will make it easy to build continuous applications. For more insights into this project, check out the Databricks blog. Zhao suggested following the Northeast Spoke project “A Licensing Model and Ecosystem for Data Sharing.” That effort aims to develop a prototype software platform that enables seamless data sharing across organizations and disciplines while enforcing constraints on the use of the data. Separately, Mekrez was asked about MarkLogic’s projects with Spark, and he confirmed that they are doing Spark work. You can check out more details on the MarkLogic blog.
Overall, the evening was a stimulating way to kick off the new year and to ponder what the future holds in 2017 and beyond. Our community agrees that the state of the Apache Spark/big data market remains strong, and we all agreed that the new administration should continue to invest in innovation in our field.
Author: Donna-M Fernandez, co-founder & COO, MetiStream, Inc.