Strata 2013: Sasha Issenberg, "The Victory Lab"
http://strataconf.com/ Nerds crash the gates of a venerable American institution, shoving aside its so-called wise men and replacing them with a radical new data-driven order. We've seen it in sports, and now in The Victory Lab, which Politico has described as "Moneyball for politics," Sasha Issenberg tells the hidden story of the analytical revolution upending the way political campaigns are run in the 21st century. The Victory Lab follows the renegade academics and maverick operatives rocking the war room and re-engineering a high-stakes industry previously run on little more than gut instinct and outdated assumptions. Armed with insights from behavioral psychology and randomized experiments that treat voters as unwitting guinea pigs, and reams of new individual-level data fed into microtargeting algorithms, the smartest campaigns now believe they know who you will vote for even before you do. The Victory Lab presents a secret history of modern American politics, pulling back the curtain on the tactics and strategies used by some of the era's most important figures, including Barack Obama and Mitt Romney, with iconoclastic insights into human decision-making, marketing and how analytics can put any business on the road to victory. http://strataconf.com/strata2013/public/schedule/detail/28555
Brian Bulkowski | Strata Data Conference 2013
Brian Bulkowski is the founder and CTO of Aerospike. Bulkowski joins Wikibon's Dave Vellante and SiliconAngle's John Furrier inside theCUBE at Strata Conference 2013. One of the more interesting interviews from last month's Strata event, for me personally, was when Brian Bulkowski, CTO of Aerospike, joined theCube hosts John Furrier and Dave Vellante to talk flash. In-memory databases are here to stay, even growing into a market of their own. In the interview, Bulkowski discusses how software-led infrastructure is the next stage of maturation for a lot of cloud technologies. The flash layer of in-memory is booming right now. The biggest changes in flash and SSD that Aerospike has seen with its customers in the last 6 months are price and density. On the density side, its largest customers are doubling, sometimes quadrupling, the size of each and every node. Nodes with 500 GB of flash are now becoming 1-4 terabyte architectures, and clusters are going from 40 nodes to 4 nodes. Big advertising use cases are pushing Big Real Time Data (BRTD). And price? $1-$1.50 per gigabyte of flash compared to $30 per gigabyte of RAM. That's up to a 30x price difference. Additionally exciting for flash is that it doesn't have to be slower anymore. Before, at 10,000 transactions per second per server, flash was the bottleneck. In the last few months, flash has blown past that. Now, at 100,000-200,000 transactions per second per server, the network is back to being the bottleneck. The outcome? If you have in-memory and SSD, you might as well use flash. The network is going to be the bottleneck in either case. Why not save yourself money to the tune of a factor of 30? SQL shows promise for real-time. Aerospike is a firm believer in real-time NoSQL, too. Bulkowski said that they have a lot of use cases for real-time. Threat detection, insights, patterns, and signatures are all front-side. In the past, you had to have extraordinarily fast key-value stores. Nowadays, with NoSQL you can map-reduce over more data.
This is huge for transactional workloads, where network threat detection gets faster, and equally big for gaming, where opponents are matched in-game. 500 milliseconds doesn't leave a lot of time to run analytics and match algorithms in the 'right now'. Real-time is replacing batch jobs in these instances. Software is helping push the real-time ball down the field, and software-led infrastructures and architectures are all about flexibility. Bulkowski sees it as the next step in the maturation of a lot of cloud technologies. What does that mean? "Database company like us flexibly and elastically at capacity," he says; nimble is the name of the game for Aerospike. Mixing and matching of software and technologies is only a good thing for the customer too. Database as a service opportunity, anyone? For customers it means a more real-time response to different business conditions. Software-led means new and improved productivity with a definite impact on the organization. Aerospike believes that flash will continue to play a vital role in those improvements. Look out for user-defined functions (customizing your database) from Aerospike in 2013. With its gaming and online matchmaking clients, Aerospike aims to bring user-defined functions into the world of flash and SSD. Oh yeah, and metadata is no longer under lock and key.
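The flash-versus-RAM price figures quoted in the interview are easy to sanity-check. The sketch below is only back-of-the-envelope arithmetic in Python: the per-gigabyte prices come from the interview, while the 40-node, 500 GB-per-node cluster size and the function names are assumptions made here for illustration. It counts raw media cost only and ignores servers, power, and replication.

```python
# Prices quoted in the interview (USD per gigabyte).
RAM_PRICE_PER_GB = 30.0
FLASH_PRICE_LOW_PER_GB = 1.0
FLASH_PRICE_HIGH_PER_GB = 1.5

# Hypothetical data set: 40 nodes holding 500 GB each (illustrative only).
CLUSTER_GB = 40 * 500


def storage_cost(capacity_gb: float, price_per_gb: float) -> float:
    """Raw media cost for a capacity; ignores servers, power, replication."""
    return capacity_gb * price_per_gb


def price_ratio(ram_price: float, flash_price: float) -> float:
    """How many times more expensive RAM is than flash per gigabyte."""
    return ram_price / flash_price


if __name__ == "__main__":
    for flash_price in (FLASH_PRICE_LOW_PER_GB, FLASH_PRICE_HIGH_PER_GB):
        ratio = price_ratio(RAM_PRICE_PER_GB, flash_price)
        print(f"flash at ${flash_price:.2f}/GB: RAM is {ratio:.0f}x the price")
    print(f"RAM cost:   ${storage_cost(CLUSTER_GB, RAM_PRICE_PER_GB):,.0f}")
    print(f"flash cost: ${storage_cost(CLUSTER_GB, FLASH_PRICE_LOW_PER_GB):,.0f}")
```

The 30x figure corresponds to the low end of the quoted flash price range; at $1.50 per gigabyte the ratio works out to 20x.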
Peter Wang | Strata Data Conference 2013
Co-Founder and President of Continuum Analytics, Peter Wang, joined SiliconAngle's John Furrier and Wikibon's Dave Vellante inside theCUBE at Strata Conference 2013. The days when you can view data as a static thing are over. Kaput. No mas. Peter Wang, Co-Founder & President of Continuum Analytics, was kind enough to join John Furrier and Dave Vellante on theCube last week at Strata to discuss predictive analytics, Big Data, scientific computing, and the moving of more and more analytical code to where the data is. Continuum Analytics, the premier provider of Python-based data analytics solutions and services, is a player we would bet big on in the Big Data space (full video below). As Wang sees it, there has been a fundamental disruption in the storage and ETL end of the Big Data (business analytics) space. It is a push up market that has caused a push-back down market as all of the players jostle for position. "The Big Data wave that's coming is exceeding the disciplines for doing business analytics that most companies are used to," Wang says. Transformation (read: metadata) is turning Data Warehousing on its head. Announced just prior to Strata 2013 was Continuum's latest version of Anaconda, its premium collection of libraries for Python that includes NumbaPro, IOPro, and wiseRF all in one package. Anaconda enables big data management, analysis, and cross-platform visualization for business intelligence, scientific analysis, engineering, machine learning, and more. Here is a brief part of that press release: Available on Windows, Mac OS X and Linux, Anaconda includes more than 80 of the most popular numerical and scientific Python libraries used by scientists, engineers and data analysts, with a single integrated and flexible installer. It also allows for the mixing and matching of different versions of Python (including Python 3.3 on a 64-bit Linux installation), NumPy, SciPy, etc., and the ability to easily switch between environments.
Improvements to the latest version of Anaconda include: the ability to build your own packages using conda; new versions of wiseRF Pine and NumbaPro; new, faster data adapters for the Mongo database in IOPro; new versions of currently included packages, notably cython v0.17.4, pandas v0.10.1, and llvm 3.2; and new packages: cubes, ply, pyparsing, mpi4py (OSX), googlecl, gdata, and biopython. Wang believes, and I would agree, that data at this point is a first-class concern. "Data has hit mass now. When you have enough data, you can't just willy-nilly move it around," he says. "You have to think about where did it come from, how am I going to view it, how do I want to transform it into those most useful views and do it in a way that doesn't incur more data movement." The dilemma is very peculiar: with data movement as a first-class concern, how do you best analyze in-memory? Fluidity has found its way to Big Data. Strike that. We've found that there is fluidity in Big Data, all data. "The days you can view data as a static thing are over," said Wang. He mentioned a quote he once heard, that there is no such thing as raw data. Which, by definition, is and always will be true: there is a sensor somewhere that collected the data in the first place. So what does that mean? How well do the worlds of Data Warehousing and Data Analyzing need to merge? Is proprietary the new 'last-year' and open source the new 'black'? Transformation, Co-Transformations ... what is the first big step in Big Data? I'd love to hear your thoughts in the comments. But mark this day in your calendar: the mobile revolution has centralized all of the data around our activities. Men lie, women lie, numbers don't. Good luck to all of the men and women tackling the numbers.
Ben Werther | Strata Data Conference 2013
Ben Werther is the Founder and CEO of Platfora, an interactive in-memory BI for Hadoop. At the 2013 Strata Conference Werther joined SiliconAngle's John Furrier and Wikibon Analyst Dave Vellante inside theCUBE. Performance implications. That's the name of the game in the eyes of Ben Werther, who says the old way of using data for business is on its way out, and the new way improves the Business Intelligence environment whole-heartedly. Werther, the Founder & CEO of Platfora, an interactive in-memory BI for Hadoop, stopped by theCube towards the end of last month's Strata conference to discuss where BI and Big Data are headed. Performance implications are what led Werther to start Platfora, and why he thinks the likes of EMC Greenplum don't quite have it figured out yet. Platfora can best be described in three layers. First, it drives whatever interfaces a BI client has in order to optimally and automatically aggregate, distill, and pull metadata, then builds out from that data into a scale-out in-memory layer; the design is for consistently sub-second interaction. Second, it is an 'intelligent cache' driving Hadoop, pulling out the things that are relevant and evolving them as things change. Third, it puts all of this into a completely web-based exploratory BI environment. Platfora's model is all about aggregating -- the data is distilled automatically, changing based on what is interesting and relevant to the business user. The old way takes months and requires dedicated developers, while Platfora's way will have things up and running in the same afternoon. The feedback Werther and his team are getting from their beta users supports the fanfare. Werther said beta users have told them that Platfora is the first product where they can land data in Hadoop and ask questions against that data set the next day, based on the business users' patterns. Design and usability. Two priceless factors Platfora has taken into account in the company's bottom-up build strategy.
That means making tools easy and intuitive enough that adoption of the product does not fall short. Further proof of this commitment is that Platfora is the first BI product in production that uses HTML5 canvas-based technology. The vision is grand, and Platfora has driven its stake in the ground to be qualitatively different in the Big Data movement. When asked to extract the signal from the noise on Big Data and in-memory analysis, Werther laid out a very conclusive plan that he saw playing out. The idea is that the inflexible EDW is replaced by a fluid, agile reservoir of data. You're able to "pour" data into Hadoop (you don't have to make decisions ahead of time). The new stack built on top of that is designed to be exploratory. This is tangible for people. This is the shift in Big Data.
Bill Schmarzo | Strata Data Conference 2013
EMC Greenplum announced at Strata 2013 a new distribution of Apache Hadoop: Pivotal HD. Greenplum’s announcement has definitely ruffled some feathers in the Hadoop space (see our interview with Charles Zedlewski of Cloudera). However, when Bill Schmarzo stopped by theCube to talk with show hosts John Furrier and Dave Vellante, the conversation stayed more on the high-level topic of optimizing Big Data. In the data-driven world of today, data warehouses exist because they allow organizations to make better decisions, to ask questions they couldn’t ask before, and to gain insights into the business, said Schmarzo. To date, Business Intelligence (BI) vendors have largely stayed away from Big Data. BI tools have been optimized almost solely around SQL. Now, however, they are going to be forced to explore the Hadoop platform. There was a great question asked by Vellante in the middle of the interview: Describe how a business person traditionally would interact with that corpus of data, the data warehouse and the BI professionals, and describe your vision for how that would change with Hadoop and Big Data? “In today’s world the BI community is dominated by reports. … the business users have been dying for the BI tools to leverage both real time data and the predictive analytics to uncover insights in the data. To provide recommendations for what they should do,” Schmarzo says. The businesses that are going to be the most successful will use business monitoring to identify insights to improve predictive analytics and optimization. There are trends Schmarzo is seeing that support this up-level move in BI. Data is starting to be treated as an asset, and analytics is actually intellectual property. Because of both, businesses will be able to trust data to make more decisions, more quickly, with a higher level of confidence. Analytics as IP, while very forward-thinking in the Big Data realm, begs the question: is that sustainable? Or is this just a race to the mean?
“I do think it’s a race, but I don’t think it’s a race to the mean. I think it’s a race for organizations to continuously look for new ways to look at their business,” said Schmarzo. A line that followed, that I absolutely love, is: humans are revenue optimization machines. Big Data is soon to follow in those footsteps. Companies will constantly look for ways to identify those variables that might be better predictors of performance; Big Data will become more agile and deep analysis more like second nature. The smart play is platforms, and smarter companies will realize that they are manufacturing these platforms organically. Platforms will move up the ‘value chain,’ according to Schmarzo, and I tend to agree. He gave a great example: Ford’s new cars have more than 1,200 sensors. Driving behaviors, car performance, customized radio, help services (OnStar), mapping devices, smart phone integration, social cues: the Big Data grab is endless in one product alone, the car. By creating an intelligent product, the car in my Ford example, companies will in essence create the platform for other people to provide value, and valuable services and products, on top of it. Big Data will become woven into the fabric of the platforms of intelligent products and the by-products that support them. Most companies aren’t there yet; they don’t understand what data can do for them. But the writing is definitely on the wall.
Strata 2013: Rajat Taneja, "Video Games: The Biggest Big Data Challenge"
http://strataconf.com/ Some of the most complex challenges in data management exist where you may least suspect: inside video games. In the past decade the audience for games has exploded from 200M active users to 1.5B gamers worldwide. And these gamers are playing on multiple platforms -- high definition consoles, PCs, social media, mobile, online -- and they expect a seamless experience that connects them all. Consumers are becoming accustomed to features that allow them to play against their friends, track their progress, and even be able to turn off their console game and pick up where they left off on another platform like a smartphone. Much like other forms of media, playing a video game used to be linear -- you insert the disc and play the game. Now, new content can be purchased and downloaded to augment the experience, consumers can connect and play against friends online and you can track your progress and scores live over time. But these rich experiences mean an explosion in the amount of data that can really be a double-edged sword. To keep at the pace of consumer demand for online connected gaming experiences, EA's data scientists are building a new technology infrastructure that will improve the consumer experience and help the company analyze the hundreds of terabytes of consumer data that flow through the system each day. In this talk, EA CTO Rajat Taneja will dive in to the challenges and complexities facing the gaming industry, how to harness the power of data and share examples of how technologies like machine learning and predictive analytics have been put in place to improve the customer experience. http://strataconf.com/strata2013/public/schedule/detail/27603
Scott Howser | Strata Data Conference 2013
Scott Howser of Hadapt at Strata 2013 with John Furrier and Dave Vellante. theCube invited Scott Howser, VP of Marketing at Hadapt, to discuss the startup's progress thus far and its opinion on the big "Hadoop War" conversation going on right now, started by Greenplum's aggressive announcement on Pivotal HD. In a live interview at Strata 2013, Howser stated that Hadapt believes "the worlds of SQL and Hadoop will converge" sometime in the future. Therefore, Hadapt's efforts to bring SQL to Hadoop have given the company a privileged position in the market. Asked to comment on Greenplum's statements that it has a more mature database, developed over the past 11 years, while Hadapt is the new kid on the block, Howser pointed out that Hadapt has "a lot of experience in relational database technology." The company's differentiator is that SQL runs inside of Hadoop, while Greenplum uses a connector-based methodology. The connector approach can lead to operational challenges and to performance and availability constraints. All other platforms currently available have the same "two platform approach," two separate platforms that need to be unified through some connector technology. As Hadapt pioneered SQL in the Hadoop market, Scott Howser commented on Greenplum's strategy and approach to the Big Data market. "They want to provide the entire stack," he said. Greenplum is trying to create a market where it is the sole option for clients that want to do Big Data. However, "there is beauty in the community" and in what happens in the Hadoop open source community, as it can produce new tools faster than any one company. Greenplum's approach is restrictive, and its clients will not be able to use these open source tools and integrate them into their ecosystems. Hadapt is not proprietary, he explained: "we support any distribution that's out there."
The company offers the SQL interface, but can integrate any other tools and products and offer them on top of its solution. Talking about Hadapt's evolution and client adoption, Howser mentioned that the company is currently shipping products and has paying customers. There is a lot of uptake in financial services and customer behavior analysis. In the latter field, the Hadapt platform helps clients analyze customer behavior, bringing all data into one platform and getting the insights needed to influence that behavior. All Hadapt competitors favor a two-platform approach thus far. This means a lack of interactivity, which Hadapt is working to provide by unifying SQL and Hadoop.
Ken Cukier | Strata Data Conference 2013
Ken Cukier is the Data Editor for The Economist and author of Big Data -- A Revolution That Will Transform How We Will Work & Think. Kenneth Neil Cukier, Data Editor at The Economist and author of Big Data -- A Revolution That Will Transform How We Will Work & Think, stopped by theCube at last month's Strata event to talk Big Data with show host John Furrier (full video below). What's happening with Big Data and why it matters is all about the ability to dig deeper. Cukier gave three specifics. More data: not only are we able to use more data, sometimes we're able to get ALL of the data, reaching granularities we never could before and learning new things. Messy data: gone are the days of being able to analyze only the buttoned-up data. Correlations: things where we don't have to answer why...just what. With Big Data there is a new approach to business. You can take data not for its primary use, but for secondary uses, and reuse it and extract new forms of value. Big Data allows you to find more than just one needle in the haystack, sometimes a whole needle factory. "Data is becoming the new form of corporate literacy. The new form of literacy, a numeracy, called Big Data," says Cukier. Equally interesting in listening to Cukier talk was his take on what exactly Big Data means to our generation. "Our generation's great infrastructure project -- like we saw with the modern techie great generation of academics that then created the Internet we're having a new generation of Math whizzes and statistics people, machine learning, AI people creating this generation's way to optimize the enterprise through Big Data." Cukier believes that what we see as deficiencies now, like the skills gap, will naturally be fixed. "We're going to remedy this -- not a problem." What seems to be scarce now will be plentiful soon, he says. I think that his "huge land grab" analogy on how companies are collecting the data, any and all of it, furthers his point. Data, that's where the value lies.
Towards the end of the interview, Furrier steers the questioning toward a topic I found so important that I re-watched it a couple of times. On the impact of global data, Cukier admitted we don't know how to measure market size in data. And who actually owns the data? You might be of the camp that "I" own the data because "I" created it. I thought I was in that camp too, but he said something I found to be very interesting: "here's the case for why not -- it took the company the cost and the effort to actually analyze it. Just the fact I'm giving off the data isn't the most germane after all." He hopes that with his book we're able to move the ball up the pitch on the topic of Big Data and its implications for our world. And those implications are awfully ominous. Remember the Minority Report comment I made at the beginning? One of the benefits of Big Data is that, where we have often treated people as a group, we are now able to get rid of such profiling. However, a much bigger problem may be looming ... propensity. In Big Data, we're having algorithms predict what our actions will be before we actually take them. In theory, you'll be penalized based on your propensity to do something before you've actually acted. "Thought crime, more likely it's actually pre-crime like Minority Report," said Cukier. "We've never had an environment where the judicial system or any administrative proceeding against us penalize us before we committed the crime or infraction." Big Data and prediction is exactly the world we're walking into. Say you're running late and hurrying to get downtown to work: predictive analytics could penalize you before you've actually been late.
Jim Kelly | Strata Data Conference 2013
Quantcast's Jim Kelly joins Wikibon's Dave Vellante and SiliconAngle's John Furrier inside theCUBE at Strata Conference 2013. Quantcast has had Big Data in its sights for years. Jim Kelly is the Vice President of Research and Development at Quantcast. Quantcast, by its own admission, has been dealing in Big Data since 2006, before it was cool. Kelly stopped by theCube during Strata last month to give some background on the Quantcast File System (QFS). As an alternative to HDFS that is free to the open source community, QFS is meant to deliver better cost efficiencies at large scale to anyone who adopts it. QFS started 5-6 years ago when Quantcast began innovating a lot of technologies internally to handle the volume it was getting. Released in September 2012, it is a direct alternative to HDFS. A problem QFS is trying to fix is that Big Data sets tend to grow and have high operating costs. Power and computing can quickly become a six- to seven-figure monthly operating expense. So with QFS, a goal was to build a more efficient file system that makes better use of space. QFS effectively doubles the storage capacity of a Hadoop cluster compared to stock HDFS. The #1 challenge in designing a distributed file system is fault tolerance. Software needs to tolerate bits of your data going missing. HDFS makes 3 copies. QFS uses Reed-Solomon encoding (the same used in CDs and DVDs). That gives big space savings: the equivalent of 1.5 copies, so relative to HDFS it's half. By default QFS writes data slices and parity slices (six data slices and three parity slices) to nine separate places. If QFS can read any six, it can reconstruct the data, so it can lose three. With HDFS you can only lose two copies, thus QFS has better fault tolerance too. Here are some interesting factoids about Quantcast's numbers that show host Dave Vellante got Kelly to confirm during the interview: 50 terabytes of data in the door per day; over 20 petabytes processed on an average day; 1,000 machines (reasonably modest commodity hardware).
While he remained vague, Kelly said that Quantcast would measure success by the number of high quality collaborators that help extend the product together. File systems are an especially critical piece of the infrastructure puzzle. QFS stands to benefit from the scrutiny of open source, and Hadoop will benefit from having a file system that runs its framework. The giveback of QFS to open source is a win-win for all.
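The space and fault-tolerance comparison between triple replication and the six-plus-three Reed-Solomon layout comes down to simple arithmetic, sketched below in Python. This is not an implementation of Reed-Solomon coding or of QFS itself; the `Scheme` class, its field names, and the 900 TB raw capacity are assumptions made here for illustration. Only the 3-copy and 6+3 figures come from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy layout: pieces needed to read a block vs. pieces stored."""
    name: str
    data_pieces: int    # pieces required to reconstruct the data
    total_pieces: int   # pieces actually written to separate places

    @property
    def storage_overhead(self) -> float:
        """Bytes stored per byte of user data."""
        return self.total_pieces / self.data_pieces

    @property
    def tolerated_losses(self) -> int:
        """How many pieces can go missing before data is unrecoverable."""
        return self.total_pieces - self.data_pieces


# Figures from the interview: HDFS keeps 3 full copies; QFS writes 6 data
# slices plus 3 parity slices and can rebuild from any 6 of the 9.
HDFS_REPLICATION = Scheme("3x replication", data_pieces=1, total_pieces=3)
QFS_REED_SOLOMON = Scheme("Reed-Solomon 6+3", data_pieces=6, total_pieces=9)


def usable_capacity(raw_tb: float, scheme: Scheme) -> float:
    """User data that fits in a given amount of raw disk."""
    return raw_tb / scheme.storage_overhead


if __name__ == "__main__":
    raw_tb = 900.0  # hypothetical cluster size, for illustration only
    for scheme in (HDFS_REPLICATION, QFS_REED_SOLOMON):
        print(f"{scheme.name}: {scheme.storage_overhead}x overhead, "
              f"survives {scheme.tolerated_losses} losses, "
              f"{usable_capacity(raw_tb, scheme):.0f} TB usable of {raw_tb:.0f} TB")
```

With the same raw disk, the 1.5x layout holds twice the user data of the 3x layout, which is where the "effectively doubles storage capacity" claim comes from.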
Kevin Hanson, Google | Strata Data Conference 2013
Kevin Hanson joins SiliconAngle's John Furrier and Wikibon's Dave Vellante inside theCUBE at Strata Conference 2013. Hanson is a Solutions Architect at 10gen. Easy to start. Easy to build. MongoDB ended 2012 on a scorching hot streak that doesn't show any signs of flaming out. Kevin Hanson, Solutions Architect at 10gen, stopped by theCube during Strata last month to talk NoSQL with hosts John Furrier and Dave Vellante. 10gen is the company behind the leading NoSQL database. And its client list? A who's who of the tech world and beyond. Ever had a car salesman show you the Carfax? Its 11-billion-record database is now powered by MongoDB. Big implementations: that is the future Hanson sees for 10gen and MongoDB in 2013. MongoDB 2.4 is set to release sometime this spring, and is further strengthening MongoDB as the leader in the NoSQL world. Announced with the release: an on-premises version of 10gen's Mongo monitoring service, which customers can run on their own servers, and cluster management to spin clusters up and down and upgrade machines (things you need the command line for right now). Included in the 2013 plans for 10gen's NoSQL database service is the continued effort to enable its partners to perform cloud deployments of its service. He gave a couple of examples: Red Hat OpenShift -- run MongoDB on platform as a service; Object Rocket -- MongoDB as a service (SoftLayer acquired); MongoHQ, and MongoLab. Healthier than ever, 10gen looks to further support its developer community and continue to strengthen the service offering of MongoDB as the leader in NoSQL. Interests and Distinctions for 10gen. When asked by Furrier what top pain points 10gen is hearing from clients, Hanson spoke about something that is going to play right into the heart of Big Data: Social Media and Social Data. He discussed how they are seeing specific use cases around user data management.
"Specifically, social media accounts, variety of activity...tracking all of that in a relational database can be difficult. MongoDB has flexible schema, bring that data in real-time, don't think about it on the front end...then they can figure out how they want to query it." MongoDB is one of the biggest NoSQL databases, and continues to tackle Big Data as an open source platform. Being open source remains a big appeal from a developer's perspective. In addition to being open source, Hanson shared three reasons why MongoDB is better than other NoSQL services: it gives you the features of NoSQL -- linear scalability, incremental growth, the ability to add more nodes -- and you don't have to give up the query ability of a relational database. Hanson was sharp, and MongoDB is firing on all cylinders. I look forward to seeing the MongoDB 2.4 release, and how 10gen continues to improve its NoSQL open source database.
Josh Klahr | Strata Data Conference 2013
Josh Klahr is the Vice President of Product Management for Greenplum. Klahr sits down inside theCUBE with SiliconAngle's John Furrier and Wikibon's Dave Vellante at Strata Conference 2013. Josh Klahr, VP of Product Management, discussed the recent EMC Greenplum announcement on Pivotal HD inside theCube, live at Strata 2013. While it is true that Greenplum delayed the Pivotal HD announcement, which consisted of bringing SQL to Hadoop, Klahr explained that the new release is based on a year and a half of technology development to bring parallel database technology to Hadoop. "It takes a long time to build scalable query processing," Klahr said, this being an area where Greenplum excelled and which it turned into a differentiator. Commenting on the aggressive stance against industry rivals like Cloudera's Impala, and Hive, the Greenplum VP stated that "the intent was not to be aggressive, it was to really look at where SQL query is on Hadoop today." Through Pivotal HD, Greenplum does not intend to replace Hive, which can be a good solution for some problems, but rather to compensate for its failure to perform when it comes to interactive analysis. Asked if Hadapt was not a player to consider when it comes to interactive analysis, Josh Klahr stated that Hadapt's architecture is probably the closest in the market to Greenplum's approach. "I just think we are further down the line in solving problems when you have to move data back and forth," he explained, as the Greenplum database is more mature. As far as Greenplum's approach to Big Data, Klahr said the company's aim is to solve business problems that have not been solved yet. The target for its newly launched product is those struggling with data access on Hadoop, with the aim of helping them solve their issues. BI vendors recognize the rapid adoption of Hadoop, yet they still struggle because BI is by nature interactive; that means interactive queries need to be brought into the Hadoop offering, said Klahr.
Greenplum's strategy to get a bigger chunk of the market is to expand the number of use cases that can be done on top of Hadoop through Pivotal HD. Big Hadoop clusters are currently 10-20% utilized. The ability to bring computational services on top of Hadoop was missing, thus making it a focus point for Greenplum. "Our approach is to try and solve customers' problems, we listen to what they say and we try to bring software" into client infrastructures that helps solve them. Greenplum focuses on offering full packages to its customers, and intends to expand its SQL and Hadoop offering. Asked which software development approach will win in the future, Klahr explained that neither proprietary nor open source is inherently good or bad for the customer. Yet everybody in the tech space who is profit-oriented tries to create proprietary software.
Shaun Connolly | Strata Data Conference 2013
Shaun Connolly is the Chief Strategy Officer of Hortonworks. He is in attendance at the 2013 Strata Conference and stops by theCUBE to enjoy a live interview with SiliconAngle's John Furrier and Jeff Kelly. Hortonworks, while having been spun off from the original Hadoop initiative at Yahoo that's driven so many of Big Data's recent victories, is relatively new as a standalone operation. And as several vendors and small players alike race to launch their own Hadoop distro, the competition is getting fierce. To best position itself in the Hadoop market, Hortonworks is taking a horizontal approach, teaming with the best in the business, including Microsoft. Hortonworks Chief Strategy Officer, Shaun Connolly stopped by theCube at Strata this week to talk Big Data, open source, and Hortonworks' unique relationship with Microsoft, as well as implementing Hadoop, with show hosts John Furrier and Jeff Kelly (full video below). Hortonworks' big news at the Strata conference is its 100-percent open source Hortonworks Data Platform. The platform is reportedly the industry's first and only Apache Hadoop distribution for both Windows and Linux, enabling organizations to run Hadoop-based solutions natively on Windows. This singular solution will provide identical user experiences and interoperability across both operating systems. With the ability to complete portability of its Hadoop applications between on-premise and cloud deployments via HDP for Windows and HDInsight Service, the announcement is being extremely well received. According to IDC, Windows Server owned 73 percent of the market in 2012, so it's not a bad dance partner to have. And Connolly believes that Microsoft has shown significant effort to play and play well in the open source space. 
When asked to elaborate on Hortonworks' partnership with Microsoft, Connolly points to "actions -- they've put a lot of code into Apache, contributing high value in the community...really earning their stripes...offering enterprise thought leadership." Jeff Kelly of Wikibon chimed in to further comment on Microsoft's commitment to open source, "They killed their own internal Big Data to pick up Hadoop." Vendors will continue to evolve. The platforms they support and create will evolve as well. Connolly sees there being three horsemen: mobile, cloud, and big data. He suggested that customers step back and ask what is really important to them as a business. When analyzing new solutions, focus on three key areas: Is it enterprise ready? Is the deployment approachable, meaning transition-ready? Is it economical, meaning the price makes sense and saves money? Big Data is walking in the front door. The noise of service offerings to analyze and house Big Data is loud. Vendors use two approaches to attack the market: horizontally, building out a unified platform (the best-of-breed approach); or fracturing the market and getting a chunk of what was fractured. Hortonworks practices the horizontal approach, which has proved to be its strong suit. When probed about EMC Greenplum's aggressive Hadoop launch at Strata, Connolly replied, "I think their approach is not being as respectful to the open source community as possible." Making friends in a competitive marketplace isn't easy. But Hortonworks has continued to align itself with the "best in breed," while making a strong case for its solutions.
Mark Madsen | Strata Data Conference 2013
SiliconAngle's John Furrier and Wikibon's Dave Vellante invite Mark Madsen to take a seat inside theCUBE at Strata Conference 2013 to reflect on the Big Data space. Sometimes taking a step back and reviewing the collective is needed. In all honesty, we should probably do it more in all of the areas that affect our business and personal lives...separately of course. Such a global view reflecting on the Big Data space is exactly what happened when Mark Madsen, President & CEO of Third Nature, stopped by theCube during Strata last month to chat with show hosts John Furrier and Dave Vellante (full video below). As markets shift and new markets are defined, there is a lot of hectic course correction and innovation. Technology is searching for a solution. It's a lot more push from the incumbents than pull from the buyers. In the case of data, Big Data as an application is sneaking into other parts of an organization. This is causing massive disruption in IT departments and for IT professionals. Because of the hype of Big Data, it's also causing a lot of confusion in the market. Madsen consults to companies on both sides of the table, and he sees two keys: data platforms and data processing. In processing, we're seeing the evolution of a third piece of architecture. Part 1 - Databases are for storing and retrieving data. Part 2 - OLTP is recording transactions and storing for the execution of tasks. Part 3 - The processing of data at low or high latencies, large or small scale, in real-time or in batch, something that offers you new capabilities. How exactly the return of data processing as a discipline will play out is the million dollar question. What will this new architecture mean? There are some key companies (Intel, Oracle, IBM, etc.) that are controlling the chessboard right now through the acquire-and-implement model, which the panel agreed leaves a lot to be desired. "Do you feel as though those whales can continue to control the chessboard?" asked Vellante of Madsen.
His response: There's some point where you can't just be solving things by buying product. You have to have rethinking of architectures. And those vendors are not, because they have lines of business that are drawn up specifically around things. Data architecture is another buzzword for 2013. "I think trying to say that an architecture is something a vendor can sell you is a mistake. I don't think vendors sell architecture; they sell products and products fit into larger architectures." As Madsen alludes, vendors sell products and products fit into architecture schematically. Where do you draw the boundary lines between platform (architecture) and application (product)? Madsen said it best: we've had several orders of magnitude of tech change, and capabilities and software architectures haven't changed to match. Think cart before the horse. This has created an unstable market that, much like in 1992, opens the opportunity for great innovation. Multiple theaters playing show times around the clock. One thing is clear though, platform is the only show on the screen.
Lawrence Schwartz | Strata Data Conference 2013
Lawrence Schwartz is the Vice President of Marketing at Tokutek. The VP joins Jeff Kelly at Strata Conference 2013 for an interview inside theCUBE. To kick off the interview, Kelly asked Schwartz about recent developments with Tokutek's primary product: TokuDB, a system that allows administrators to scale MySQL and MariaDB while also improving insert and query speed, compression, replication performance, and online schema flexibility. Schwartz explained that data has become more random, and many companies are looking for ways to improve indexing. TokuDB offers fractal tree indexing for MySQL and MariaDB, which can help alleviate the strain that increased memory usage puts on a database system, while also increasing performance. Beyond MySQL, Schwartz said that the company intends to branch out to other database systems, such as MongoDB, an open source NoSQL database that is scalable and high-performance. With TokuDB, users can achieve better insertion performance and better compression with the use of fractal tree technology. As projects grow and progress, Schwartz said, they may exceed their memory capacity and start writing to disk, which can greatly reduce performance, create query latency, and cause relay issues. Some may look to flash caching or even a completely different database system as a solution. What Tokutek offers is the ability to keep your current database but simply get better performance out of it. Finally, Schwartz highlighted some interesting use cases for TokuDB, such as analytics for online advertising, social graphs, social search, and machine data, such as the type that NASA must process in large quantities. Kelly then asked Schwartz where he expected the company to be and how it would grow over the next 6 to 12 months.
Todd Papaioannou | Strata Data Conference 2013
Todd Papaioannou, CEO of Continuuity, joined SiliconAngle's John Furrier and Wikibon's Dave Vellante at Strata Conference 2013 inside theCUBE. Big money players are entering the Big Data Hadoop marketplace. Todd Papaioannou, CEO of Continuuity, was all smiles when he stated, “it’s good for us.” A long-time friend to theCube, he stopped in to discuss with hosts John Furrier and Dave Vellante the roadblocks he sees from developers around Hadoop, and what Continuuity is doing to serve those needs. Yesterday Continuuity announced the public beta for the company’s developer suite and application sandbox enabling PaaS Big Data. As we covered on DevopsANGLE, there are two highlights of yesterday’s release: the release of Continuuity’s runtime (called AppFabric) along with the development framework to use it, and the release of a developer sandbox that permits same-as-production testing of Big Data applications on the AppFabric. “With the combination of our Developer Suite and AppFabric editions, we accelerate time to business value, slash provisioning times, and make building Big Data apps easy and fast for any developer,” said Papaioannou. Distribution is the biggest roadblock with Hadoop. Developers and businesses alike want to drive business insight much more quickly: get to production or, as Papaioannou put it, “productionize the application.” Build apps and deploy. Hadoop is only in its second year as an ecosystem, but Continuuity sees the execution slowly but surely taking place to explode the technology and Big Data. A quick side note: Papaioannou said that the flavor of the conference has been “I have SQL for Hadoop.” There is one big takeaway for Strata from his interview on theCube: the big move is to slash time to business value and business insight. Big Data is truly transformational when you can expedite both of those business processes.
“In the Hadoop ecosystem specifically, we have a lot of infrastructure, but are missing those apps to take off and really build out the platform.” Clearly there was a mad dash to get in the race, but Papaioannou and Continuuity are seeing the tracks start to fall into place for Hadoop.
...and the new name is, Strata Data Conference.
Bruno Aziza | Strata Data Conference 2013
Bruno Aziza is the Vice President of World Marketing for SiSense. Aziza joins SiliconAngle's founder John Furrier at Strata Conference 2013 for a live interview inside theCUBE. Big Data firm SiSense's VP of World Marketing, Bruno Aziza, sat down with SiliconAngle founder John Furrier on theCube at Strata this week. He starts by mentioning that SiSense has gained over 400 customers in its 18 months of existence. The company has been credited with the ability to analyze 10 terabytes of data in 10 seconds on a $10,000 machine, highlighting its efficiency and low cost. Aziza points out that SiSense is a complete solution that covers everything from database to visualization. The 10 terabytes in that data set comprise both structured and unstructured data, so the product can handle a wide variety of data sources. It starts with Elasticube, a high-performance analytical database. Once the data comes into the database, it automatically detects the relationships between tables and makes the linking of data seamless for the end user. The next step is creating the dashboards that provide the visualization of the queries that are created. The problem most customers have when they come seeking solutions is that they do not know how fast their data is going to grow, and they are not sure how to manage it all. Aziza says that the best approach is for customers not to be concerned with the growth: the solution is implemented to manage the current data, and the technology the software uses will be able to handle any additional load that comes in the future. Currently SiSense's software, Prism, is being used by Target to do theft detection. A company called Wix is using Prism to do behavior analytics on its data so that it can make its software product better for consumers.
Aziza also mentions that WeFi, a Wi-Fi network company, uses Prism to obtain network quality data to take back to the telecoms it works with. The interview closes with a brief discussion on the cloud. Aziza feels that the cloud will be mainly driven by companies that put their data in the cloud from the beginning, because porting on-premises data to the cloud is not only complicated but costly. He points out that a company keeps data on premises because the data is business critical, and for that reason it has no desire to put that data in the cloud. Overall he feels that infrastructure in the cloud has more potential than data in the cloud by itself.
Rishi Yadav | Strata Data Conference 2013
Rishi Yadav of InfoObjects sits down with SiliconAngle founder John Furrier and Wikibon Analyst Dave Vellante inside theCUBE at Strata Conference 2013. There are few better individuals to talk to about open source than Rishi Yadav. As a self-proclaimed open source purist, his company InfoObjects works with only open source software, never proprietary. Yadav sat down with SiliconAngle founder John Furrier and Wikibon co-founder Dave Vellante on our live broadcast program, theCube, to discuss his opinion on a lot of the key players in the Hadoop marketplace. Throughout the interview, Yadav explained that he thinks there is fragmentation in the marketplace, and that it is slowing down innovation and, more importantly, purchases. He likened it to, "if everyone has their own taste, the marketplace is fragmented." One of my favorite takeaways from the interview was that "Data has gravity." He went on to explain what that means: moving it takes time. One project Yadav responded to specifically was Hive. Once bullish on Hive, Yadav said that it once solved a big problem, but now people look at latency issues and rethink things. InfoObjects is another open source initiative that serves a breadth of companies, but Yadav says there is a common thread among all of its clients. "Any company I talk to, they have data and they don't know if there are any KPIs." Hadoop allows InfoObjects to activate and understand all of the Big Data, Big Insights, Big Analytics a client has. InfoObjects and Yadav believe that the Hadoop market is going to explode, not expand. Being a services company and an open source purist, InfoObjects and Yadav are betting big on Hadoop. What exactly is an open source purist? What is the business model for open source, and is it evolving? "It is interesting for a company like ours, we have 3 things: 1) training: works as a catalyst; 2) skill gap: resources, people resources, programming; 3) implementation and support."
Tim Moreton | Strata Data Conference 2013
Tim Moreton is the Chief Technology Officer of Acunu Analytics. Moreton was in attendance at the Strata Conference 2013 and joined SiliconAngle's John Furrier to talk about Acunu Analytics, his startup company. Tim Moreton, the chief technology officer of Acunu Analytics, hopped into theCube yesterday afternoon at this year's first Strata event to talk about his startup, and explain the value of real-time analytics for the enterprise (full video below). Acunu created a real-time data crunching engine for Cassandra, an increasingly popular distributed database available under the Apache license. The offering, which will be announced next week, provides advanced data structuring and querying capabilities that client organizations can't develop on their own. Moreton claims that the market for this solution is huge: most Fortune 500 companies have deployed Cassandra in their production environments, he says, "they're just not talking about it." The CTO goes on to explain his company's business model. He says that Acunu offers a freely downloadable distribution of the database, and makes money from selling its analytics engine as a separate, proprietary solution. The offering is compatible with both Acunu's version of Cassandra and the official Apache distribution. Why is Cassandra such a big deal? Moreton points out that it has several advantages over Hadoop, not the least of which is speed. "Real-time insights is number one," the executive says. "Doing what you have traditionally have done in a transactional relational data or perhaps on a Hadoop system and actually moving that into a real-time framework where we're talking about data coming in and being available...in seconds rather than in minutes or in hours." Acunu leverages this capability to embed insights in business processes, rather than to discover trends that may or may not be worth the effort put into identifying them.
The startup's product comes with a set of customizable dashboards that display relevant data in a specific context. Moreton concludes the interview with a prediction: the collision of traditional enterprise warehousing with Hadoop and the database layer is inevitable, and there's no telling how it will affect customers.
Secrets of Fire Truck Society - Strata Ignite 2013
Mick Thompson's Ignite talk, "Secrets of Fire Truck Society", at the 2013 Strata Conference in Santa Clara, California.
Korean Pop and Big Data - Strata Ignite 2013
Joyce Kim's Ignite talk, "Korean Pop and Big Data", at the 2013 Strata Conference in Santa Clara, California.
John Kreisa and Herain Oberoi - Strata 2013 - theCUBE
John Kreisa joins SiliconAngle Founder John Furrier and Wikibon Analyst Dave Vellante, at Strata Conference 2013. Inside theCUBE the trio discuss the recent Hortonworks/Microsoft collaboration of a new Hadoop Platform. John Kreisa is the Vice President of Marketing at Hortonworks. Kreisa talks about Hortonworks' decision to partner with Microsoft, and their reasoning behind it. Their goal is to make the new Hadoop Platform as broadly and quickly available as possible. The new platform makes it possible to run Hadoop on Windows servers, ultimately cornering the market by being the only data platform compatible with both Windows and Linux.
Open Date...ah - Strata Ignite 2013
Mick Thompson's Ignite talk, "Open Date...ah", at the 2013 Strata Conference in Santa Clara, California.
Self Tracker - Strata Ignite 2013
Stephen Cartwright's Ignite talk, "Self Tracker", at the 2013 Strata Conference in Santa Clara, California.
Collect Behavior Data in Real Spaces - Strata Ignite 2013
Meghan Athavale's Ignite talk, "Collect Behavior data in Real Spaces", at the 2013 Strata Conference in Santa Clara, California.
Strata Rx 2013: Tom Davenport, "Health Care Analytics: The Key Is Integration"
For more information, visit: http://strataconf.com/rx
Strata Rx Conference 2013
http://strataconf.com/rx2013 Strata Rx focuses on using big data to drive innovations in analytics for healthcare—including advances in personalized and predictive medicine; significant cost savings; and research that points to entirely new products and markets.
Philanthropy - Strata Ignite 2013
Jason Payne's Ignite talk, "Philanthropy", at the 2013 Strata Conference in Santa Clara, California.
Ron Bodkin - Strata 2013 - theCUBE
Ron Bodkin joins SiliconAngle's John Furrier and Wikibon's Dave Vellante, inside theCUBE at Strata Conference 2013. Ron Bodkin is the Founder and CEO of Think Big Analytics. Bodkin began by revealing that Think Big had announced a $3 million seed round, aimed at accelerating the company's growth. He explained that big data is moving quickly in enterprise, and many organizations are realizing how much value they can create from it. There is a tremendous amount of conversation around security and the increased interest in investment in Hadoop and big data technology. As such, Think Big has been consulting with customers and providing data science and engineering services to help them assemble analytic applications. According to Bodkin, there are three stages of investment in big data: scalability and cost containment, agile analytics, and business optimization. The ultimate goal, he explained, is to create new value, giving one's business a competitive advantage. Think Big helps with its imagined services, helping customers foster ideas that are based on actual use cases. To accomplish this, they focus on "test and learn". They create, get feedback, and then adjust accordingly. They work with businesses and tech teams and operate within short cycles. Bodkin said the methodology of imagine services involves pulling together cross-functional teams (i.e. business executives, tactical leaders, and experts from Think Big) all coming together to brainstorm, talk about opportunities, assess impact, study best practices, and learn from other industries. Toward the end of the interview, Bodkin spoke briefly about the metric-driven approach that creates measurable value for big data. He explained that there are different metrics for different applications and that they focus on gathering support data for devices out in the field and looking to solve problems.
Before closing, Furrier asked Bodkin to share his impressions of Strata and also discuss his vision for the market over the next 12 months.
Data Problems and Psychotherapy - Strata Ignite 2013
Kris Hammond's Ignite talk, "Data Problems and Psychotherapy", at the 2013 Strata Conference in Santa Clara, California.
Jack Norris - Strata 2013 - theCUBE
Jack Norris is MapR's analytics CMO. He stops by theCUBE at Strata Conference 2013 to talk with Wikibon's Dave Vellante and SiliconAngle's John Furrier about Hadoop's technology deployment platform. MapR is a leader in the use of Hadoop technology. Norris implies that many customers are moving to the cloud and need assistance in the transition. MapR is the answer they are looking for in making such a transition.
Joseph Turian - Strata 2013 - theCUBE
Joseph Turian joins Wikibon's Dave Vellante and SiliconAngle's John Furrier in theCUBE at Strata Conference 2013, where Turian talked about, you guessed it, big data, along with a few other things. The embodiment of software we can interact with. An exciting trend, when software begins helping us in our physical reality. Always entertaining and insightful, Joseph Turian, Entrepreneur, discussed a hodgepodge of Big Data, machine learning and AI with theCube host Dave Vellante during Strata last month. First topic up, deep learning. A branch of machine learning, this technology is ready for prime time, according to Turian. He had a couple of great examples: "Google had a very public paper where they watched YouTube videos using deep learning, and it just started [picking] up cats without being trained on anything. Microsoft is using deep learning too," says Turian. "Microsoft had a demo. Someone was speaking and it took his speech and translated it to Chinese and then spoke it back as if he was speaking." Deep Learning and what can be extracted from the Big Data, especially in the financial and medical worlds, is fascinating to think about. Food for thought: when we realize the things we've missed or under-optimized, soon to be uncovered by deep learning technology, will the numbers tell us it's too late? It is clear that machine learning, once simply a theory for academic play, is now in the production phase. Visual recognition, speech recognition, translation: these all carry very large implications for advancing and simplifying human interaction with machines and software. Even more exciting is that consumer products with such seamless interfacing are absolutely in the pipeline. Self-driving cars? Google Glass? iWatch? Very exciting times on the product side are in our near future. The topic of artificial intelligence was next up, and is not without its detractors. "We've been burned several times by hype around artificial intelligence," said Turian.
"And people have been very leery of even the term artificial intelligence. If you were a serious scientist you would not say artificial intelligence," he says. As Vellante alluded to though, artificial intelligence is making a resurgence. It will be an interesting topic to follow over the next 14-18 months, so check back regularly to the SiliconAngle Network for more exclusive interviews and contextualization of AI, Big Data and analytics, the tech trends truly changing our world. Turian's Take on Cool Tech: Collaborative consumption (Uber, Lyft, Sidecar) -- solving real problems with logistics. Internet of Things -- sensors everywhere. Google Glass -- collaborative consumption and the Internet of Things. eBay Now, Postmates -- the idea that every single physical thing might be optimized and made much more efficient using software. Turian talks up an exciting life, but there are still several steps in interfacing evolution we have to take. And turning every object into a connected device also raises countless questions on privacy, social interactions and how our global economy functions. At its core, what Turian is talking about is a new way to interface with life. We'll have to carefully consider the consequences of inserting software into just about everything, though we're moving into an era where data holds inherent value, democratized by accessibility. We'll have to always ask ourselves, "Is this really a better 'interface' for life?" With enthusiasts like Turian around, we hope the answer is yes.
Ken Rudin "Big Impact from Big Data"
http://strataconf.com/stratany2013/public/schedule/detail/31903 In most companies, data and analytics have historically been considered a service. However, analysts are now taking a more proactive role in driving businesses, and the more recent introduction of big data has accelerated this trend. This new world comes with a new set of best practices for leveraging big data and driving even bigger results. In this talk, Ken will discuss several of these best practices focused on getting the biggest impact from big data and driving a proactive, data-driven culture.
Edd Dumbill - Strata 2013 - theCUBE
Edd Dumbill joins SiliconAngle's John Furrier and Wikibon Analyst Dave Vellante inside theCUBE at Strata Conference 2013. Dumbill is the Principal Analyst for O'Reilly Radar and Program Chair for the O'Reilly Strata Conference. Dumbill has hosted Strata in previous years, so he is no stranger to the event. In the 2013 interview, he shares his view on Big Data and where he sees it going. Dumbill makes mention of Facebook's cold servers (servers whose sole purpose is to store old data, namely photographs, that will never be used or viewed again) when discussing the topic of data storage and data management. This prompted John Furrier to mention Intel's Hadoop offering, which will store that type of "useless" data down to the chip level, leaving room for "performance and security."
O'Reilly Webcast: Strata Online Conference Santa Clara Preview 2013
At the end of February, Strata returns to Santa Clara for the third year. In addition to our focus on the burgeoning field of Big Data, this year Strata is diving deep into the role of design and our inexorable march towards a connected world of ubiquitous computing. In this free online conference, we'll be showcasing some of the hot topics and thought-provoking speakers who will be joining us for the event. It's your chance to see what we're covering and to find those can't-miss tracks and sessions. The Business Singularity By: Alistair Croll In this session, Alistair Croll looks at how organizations that favour cycle time over scale are edging towards a singularity, and what that means for incumbents across a wide swath of industry. Alistair Croll has been an entrepreneur, author, and public speaker for nearly 20 years. In 2001, he co-founded web performance startup Coradiant, and since that time has also launched Rednod, CloudOps, Bitcurrent, Year One Labs, the Bitnorth conference, the International Startup Festival and several other early-stage companies. Alistair is the author of three books on web performance, analytics, and IT operations, and is currently working on a forthcoming book about data-driven startups. Find him at Solve For Interesting. Zombie Diaries and Walking Vampires By: John Feland Now we know who won the presidential election and the battle between red and blue states, but a more important battle is raging throughout the world. Will the blue blooded zombies beat out the red hot blooded vampires in the war for our souls and wallets? Hear Argus Insights CEO John Feland preview his Strata talk, understand the true nature of the threats haunting our homes, and learn what you can do to prepare for the coming Zompire Apocalypse. How Operational Research Meets Artificial Intelligence By: Elisabeth Crawford Every month, Birchbox fills boxes with a mixture of beauty and lifestyle product samples. It then sends these products to its subscribers.
But who gets what? In addition to physical constraints (i.e. size) and historical requirements (nobody should get the same thing twice), the company needs to maximize the reward and happiness of each box recipient. Birchbox CTO Elisabeth Crawford joins Strata chair Alistair Croll for a discussion of the challenges in product discovery, and how operational research meets artificial intelligence. Liz is the CTO at Birchbox. The Recursive Approach to Visualization By: Vadim Ogievetsky This webcast session will discuss the manifestation of the split-apply-combine principle in both data visualization and data stores. Vadim Ogievetsky will discuss the possibility of unifying the two within the same declarative language in the hopes that such an approach will produce an efficient way to explore huge datasets. Wake Up and Smell the Data By: Mark Madsen Big data is a big part of the disruption hitting this market, but not in the way most people think. It's not replacing the data warehouse, but it is changing the technology stack. It doesn't eliminate data management, but it does redefine enterprise data architecture. Big data is and isn't many things. It's important to understand which information uses are well supported and which have yet to be addressed. Otherwise you risk replacing one set of problems with another. Come to this session to hear some observations on what big data is, isn't and aspires to be. SQL and the Future of Big Data By: Tim O'Brien In this webcast presentation, Tim O'Brien explores some of the projects and products that are helping people scale without having to move entire applications to novel databases. Databases like Translattice, NuoDB, and Akiban, along with Google's high-profile internal database Spanner, point towards a larger trend of scaling data without throwing away the standard features of a database.
The provocative question: is NoSQL simply a temporary way point, a momentary break, as the relational database adapts and underlying concepts like consensus algorithms adapt to new realities? A Model Strategy for Data Journalism in a Country Without Open Data By: Sandra Crucianelli and Angélica Peralta Ramos In a country where there is neither open data nor a law like FOIA, there is a Data Team that is creating tools to help reporters and also citizens to analyze material and investigate important stories such as the use of public money. At La Nación, one of Argentina's leading daily newspapers, we have created a data model of journalism, based on teamwork, involving reporters and editors, to take an interest in writing stories based on data. But this is not the whole story. Produced by: Yasmina Greco
Strata 2013 Startup Showcase: Gadi Bashvitz, OLSET
http://strataconf.com/ Gadi Bashvitz, Co-founder and CEO of OLSET. Strata 2013 Startup Showcase - Sponsored by Google Cloud Platform
Ely Kahn - Strata 2013 - theCUBE
Ely Kahn is Vice President of Business Development and Marketing at Sqrrl. Kahn joins Dave Vellante, Wikibon Analyst, inside theCUBE at Strata Conference 2013. During the O'Reilly Strata Conference, Dave Vellante led a discussion on security with the Vice President of Business Development and Marketing at Sqrrl, Ely Kahn, and the Chief Technology Officer at Wikibon, David Floyer. While both guests stopped by theCube to discuss security, Kahn also discussed Sqrrl's growth and progress with the Accumulo project over the last year. Dave opened the discussion by reminding us that the focus of the young company Sqrrl is the security, scalability and performance of NoSQL. Kahn stated that since their introduction on theCube a year ago, Sqrrl has released the first version of its product and has begun installing it at both government and big commercial organizations. Kahn states that Sqrrl is really bringing something new to the table, explaining that Accumulo had been developed by the NSA in 2008 and had remained a classified project until becoming open-source in 2011. Therefore, Accumulo is still a relatively unknown face in the security market, in addition to being the only NoSQL database option where security was built in from the ground up. On current affairs, Floyer discusses his take on Intel's recent announcement of its plans to help add cell-level security to HBase. He states that Intel's serious commitment to security is a very good thing, but they have a long way to go to even put in the first levels of security, and cell-level security is, in actuality, a long-term goal. He describes how their first big step will be to really understand what is required at the chip level and be able to bring that to the application level and the operating systems. In any case, both Kahn's and Floyer's discussions assure us there are big upcoming developments in the world of security.
Strata Rx Conference Boston 2013: Complete Video Compilation
http://oreilly.com/go/strata-rx13-video This complete video compilation provides an up-close view of every keynote, session, and workshop at O'Reilly's Strata Rx 2013 Conference in Boston. You'll learn the latest research, best practices, analytic approaches, and emerging tools and technologies for dealing with large amounts of data in healthcare.
O'Reilly Strata Conference & Hadoop World NYC 2013: Day 2
Syncsort's live coverage of the conference, featuring interviews with top industry leaders. Learn more at http://www.syncsort.com
#67707 youtube 00:03:56
Duncan Ross Director of Data Science at Teradata interviewed at Strata Conference London 2013
http://www.teradata.com/ Duncan has been a data miner since the mid-1990s. He was Director of Advanced Analytics at Teradata until 2010, leaving to become Data Director of Experian UK.
#65859 youtube 00:09:58
Strata 2013 - How to Interview a Data Scientist
By Daniel Tunkelang (LinkedIn) Presented at 2013 O'Reilly Strata Conference http://strataconf.com/strata2013/public/schedule/detail/27320 Slides and summary ...
#59752 youtube 00:34:45
Jeff Denworth - Strata 2013 - TheCUBE
SiliconAngle's John Furrier and Wikibon Analyst Dave Vellante invite DataDirect Networks VP of Marketing Jeff Denworth into theCUBE at Strata Conference 20...
#58000 youtube 00:15:02
Boyd Davis - Strata 2013 - theCUBE
Boyd Davis, VP of Marketing at Intel, at Strata 2013 with John Furrier and Dave Vellante. Intel announced its Apache Hadoop distribution and the discussion ...
#55471 youtube 00:25:45
John Rauser keynote: "Statistics Without the Agonizing Pain" -- Strata + Hadoop 2014
From the 2014 Strata Conference + Hadoop World in New York City. There are two essential skills for the data scientist: engineering and statistics. A great many data scientists are very strong...
#49921 youtube 00:11:48
Distributed Environmental Data: On the Ground at the Data Sensing Lab
http://oreil.ly/DistNetData Sensors are the future of distributed data. General-purpose computing is dissipating out into the environment and becoming increa...
#47219 youtube 00:10:26
O'Reilly Strata Conference & Hadoop World NYC 2013: Day 3
Syncsort's live coverage of the conference, featuring interviews with top industry leaders. Learn more at http://www.syncsort.com.
#41439 youtube 00:02:41
POS Explorer with SAP HANA demo at Strata Conference + Hadoop World 2013
Ashish Sahu, Director Database & Technology Product Marketing, SAP, demonstrates how the POS Explorer uses SAP HANA to crunch retail customer data.
#36797 youtube 00:02:59
Intel Hadoop & SAP HANA Integration at Strata Conference + Hadoop World 2013
John Schitka, Solution Marketing Manager, Big Data at SAP, and Bala Subramanian, Big Data Chief Architect at Intel Corporation, talk about Intel Hadoop & SAP...
#36796 youtube 00:01:11
Strata Conference + Hadoop World 2013: Running On-premise Hadoop as a Business
Cloud-based architectures of Hadoop have made it attractive for public cloud service providers to offer hosted Hadoop services and charge customers on a pay-...
#36795 youtube 00:47:01
O'Reilly Strata Conference & Hadoop World NYC 2013: Day 1
Syncsort's live coverage of the conference, featuring interviews with top industry leaders. Learn more at http://www.syncsort.com.
#36794 youtube 00:01:59
Mobile sculpture "Kinetic Mesh" by Stephen Cartwright at the Strata Conference 2013
Every hour since noon on June 21, 1999 Stephen Cartwright has recorded the exact latitude, longitude and elevation of his position on the earth with a handhe...
#36793 youtube 00:01:01
Soccer Predictive Analytics on SAP HANA -- Strata Conference + Hadoop World 2013
Ashish Sahu, director of database & technology product marketing at SAP, shares a demo on soccer analytics powered by SAP HANA. This demo showcases how SAP H...
#36792 youtube 00:01:44
Tim O'Reilly - Strata 2013 - theCUBE
Tim O'Reilly, O'Reilly Media, at Strata 2013 with John Furrier and Dave Vellante. Tim O'Reilly, founder of O'Reilly Media and the organizer of the Strata con...
#36791 youtube 00:19:51
John Santaferraro of Actian Interviews Hortonworks at the 2013 Strata Conference
#25090 youtube 00:06:00
Drawn CEO Bradford Stephens | Strata Conference 2013
Alex Williams talks with CEO of Drawn to Scale, Bradford Stephens at the 2013 Strata Conference. Subscribe to TechCrunch TV: http://goo.gl/eg167.
#25089 youtube 00:06:45
John Santaferraro of Actian shares how Strata Conference 2013 is different than the year before
#25088 youtube 00:00:57
Strata 2013: Eric Colson, "Committing to Recommendation Algorithms"
http://strataconf.com/ Recommendation algorithms have long been a valuable component of ecommerce. They drive incremental revenue by helping customers find w...
#25087 youtube 00:08:48
Strata Conference 2014 - Making Data Work
http://strataconf.com/sc @strataconf Join us at O'Reilly's Strata Conference in Santa Clara to see the future of big data—as well as the analytics, architect...
#25086 youtube 00:02:17
LA NACION DATA en Strata Conference 2013
#25085 youtube 00:03:22
SiSense' Bruno Aziza | Strata Conference 2013
Alex Williams talks with VP of Marketing for SiSense, Bruno Aziza, at the 2013 Strata Conference. Subscribe to TechCrunch TV: http://goo.gl/eg167.
#25084 youtube 00:03:32
Strata Conference in London 2013: Gavin Starks Keynote
http://strataconf.com/strataeu2013/public/schedule/speaker/2504 With a unique background in business, technology, science and media, Gavin has broad and deep...
#25083 youtube 00:16:48
Keynote, Day 2 - Strata 2013 - theCUBE
The Strata Conference 2013 will run February 26-28, 2013, at the Santa Clara Convention Center in California. Given our love for Big Data, Strata is ...
#25082 youtube 01:31:02
Strata Conference in London 2013: Felienne Hermans
"Spreadsheets: The Ununderstood Dark Matter Of IT" http://strataconf.com/strataeu2013/public/schedule/detail/31755 Spreadsheets are used extensively in indus...
#25081 youtube 00:14:17
Getting Real With Hadoop
At this year's Strata Conference/Hadoop World 2013 event, SAS VP of Big Data Paul Kent presented several sessions about modernizing and deploying advanced data analytics infrastructures based on Hadoop. In this video, he talks about the state of Hadoop adoption among enterprises today and looks out to the big data-driven applications of the future.
#24505 vimeo 00:01:57
Hadoop's Place in the Analytics Ecosystem
At the Strata Conference / Hadoop World 2013, Samuel Kommu, technical marketing engineer at Cisco Systems, shares some of the benefits that Hadoop brings to analytics platforms that leverage next-generation hardware. Kommu looks at big data operations that required 3,500 nodes in 2009, 2,000 in 2011, and now require only 64 nodes.
#24504 vimeo 00:02:05
Big Data's Global Reach
At this year's Strata Conference/Hadoop World 2013, SAS big data vice president Paul Kent presented a session on setting up Hadoop clusters for advanced analytics. We caught up with several audience members and recorded their impressions of the presentation. In hearing directly from a doctorate-level Hadoop specialist, a healthcare data analyst, and a marketing executive, it's clear that big data analytics is a burgeoning field that cutting-edge companies are eager to explore.
#24503 vimeo 00:02:20
Strata Conference in London 2013: Tim Kelsey "Demonstrating The Actual Economic Value of Data"
http://strataconf.com/strataeu2013/public/schedule/detail/32752 National Health Service Tim Kelsey, National Director for Patients and Information joined NHS...
#24502 youtube 00:22:33
Strata 2013: Kate Crawford, "Algorithmic Illusions: Hidden Biases of Big Data"
http://strataconf.com/ Big data gives us a powerful new way to see patterns in information -- but what can't we see? When does big data not tell us the whole...
#24501 youtube 00:17:26
Strata Conference in London 2013: Francine Bennett "Data Nerding in Public Health"
http://strataconf.com/strataeu2013/public/schedule/detail/31197 There is a huge amount of data available from the NHS and other sources about public health i...
#24500 youtube 00:12:42
Strata Conference in London 2013: Doug Cutting "The Future of Data"
http://strataconf.com/strataeu2013/public/schedule/detail/32994 As technology further pervades enterprises, each generates more data. Once harnessed, this da...
#24499 youtube 00:15:00
Strata Conference in London 2013: Duncan Ross "The Analytical Imperative"
http://strataconf.com/strataeu2013/public/schedule/detail/33000 Big data has proved its worth in a number of industries, but it's not the size or the storag...
#24498 youtube 00:11:53
Strata Conference in London 2013: Mark Madsen "Perception is Key: Telescopes, Microscopes and Data"
http://strataconf.com/strataeu2013/public/schedule/detail/32351 We hear stories of how big data is unprecedented and about the latest disruptive products to ...
#24497 youtube 00:25:43
Google's Julia Ferraioli | Strata Conference 2013
Alex Williams talks with Julia Ferraioli of Google at the Strata Conference. Subscribe to TechCrunch TV: http://goo.gl/eg167.
#24496 youtube 00:05:27
Strata Conference 2013 -- The Great Debate: Design vs. Math
The Great Debate series returns to Strata: Design vs. Math. Skytree was selected to team up with LinkedIn on the side of Math, going up against O'Reilly Media...
#24495 youtube 00:46:16
Strata Conference in London 2013: Julie Steele "Storytelling in the Age of Big Data"
http://strataconf.com/strataeu2013/public/schedule/detail/32998 Julie Steele is the Content Editor for Strata at O'Reilly Media. She is co-author of Beautifu...
#24494 youtube 00:13:53
Strata Conference in London 2013: Max Ogden "Introducing Dat: If Git Were Designed For Big Data"
http://strataconf.com/strataeu2013/public/schedule/detail/32390 Cloning datasets locally, munging them into the format that you need and then indexing and qu...
#24493 youtube 00:10:47
Strata Conference Santa Clara 2013: Complete Video Compilation
Didn't make it to Strata Santa Clara 2013? No problem. This complete video compilation puts you front and center at every keynote, session, and tutorial from...
#24492 youtube 00:01:52
Strata 2013: Scott Yara, "Hadoop: The Foundation for Change"
http://strataconf.com/ Hadoop is the engine powering the Big Data era, an unstoppable force boasting massive investments and a rich ecosystem. But this is on...
#20163 youtube 00:14:11
Strata Conference 2013 -- Real-World Machine Learning on Big Data: Which Methods Should You Use?
Skytree's CTO and Co-Founder, Alexander Gray, PhD, was selected to present on the Data Science track at O'Reilly Strata Conference—a leading industry confere...
#17541 youtube 00:43:15
Strata Conference + Hadoop World 2013
http://strataconf.com/stratany2013 Strata + Hadoop World is where big data's most influential decision makers, architects, developers, and analysts gather to...
#14917 youtube 00:01:39
Strata 2013: Joydeep Das, "Grafting Hadoop and SAP HANA Together"
http://strataconf.com/ Hadoop and SAP HANA are taking the world by storm. SAP HANA is the fastest growing commercial database in the market, being adopted by...
#12460 youtube 00:07:42
O'Reilly Strata Conference 2013
strataconf.com O'Reilly Strata: Making Data Work. The O'Reilly Strata Conference in Santa Clara sells out every year because we bring together the best minds in data to explore the complex issues shaping big data, and the exciting ways that big data, data science, and pervasive computing will change the way we do business and the way we live. The call for speaker proposals has closed. Thanks to all who submitted proposals to present at Strata Conference in Santa Clara 2013. We will be notifying proposers by late October. Conference Topics: Real world big data and data science case studies; Data science: the profession and practice; Hadoop: best practice, and what's coming next; Data engineering, infrastructure and databases; Analytics, predictive modeling and machine learning; Real time and interactive analytics; Location: geodata, mapping, mobile and location-based services; Data driven business: using data and technology to tackle business problems; Visualization, communication and story-telling; Bringing BI into a big data world; Policy, ethics and privacy; Internet of things, ubiquitous computing and augmented reality. Important Dates: Call for Proposals ends October 7, 2012; Proposers notified by late October 2012; Registration opens October 2012.
#6782 youtube 00:03:19
Ben Goldacre keynote Strata Conference London 2012
strataconf.com Data is great. Data is powerful. But when some data is missing, bias can be introduced, distorting the overall picture. Randomised controlled trials are the best tool we have in medicine for finding out if a treatment works or not, and lots of trials are done. But unfortunately, the results of these trials can go missing in action after they are completed, and trials with "negative results" are more likely to go missing. This means we have a biased sample, overestimating the benefits of treatments. To prevent all this happening, various regulations have been passed around the world. They have not been enforced, and the problem has persisted. I'll describe a small project trying to document and prevent this problem. Ben is a best-selling author, broadcaster, medical doctor and academic who specialises in unpicking dodgy scientific claims from drug companies, newspapers, government reports, PR people and quacks. Unpicking bad science is the best way to explain good science. Bad Science (4th Estate) has sold over 400,000 copies, is published in 18 countries, and reached #1 in the UK paperback non-fiction charts. His book exposing bad behaviour in the pharmaceutical industry will be published in 2012 by 4th Estate. Ben has written the weekly Bad Science Column in the Guardian since 2003. It's archived on this site along with blogposts, columns for the British Medical Journal, and other writing. There are lots of clips of Ben on telly here, and a talk at ...
#2717 youtube 00:19:41
Kim Rees keynote Strata Conference London 2012 "The Dirty Truth about Data Literacy"
Kim Rees is a founding partner of Periscopic (www.periscopic.com), an award-winning information visualization firm. Their work has been featured in the MoMA as well as several online and print publications, including CommArts' Interactive Annual, The Information Design Sourcebook, Print magazine, and numerous websites, blogs, and regional media outlets. Periscopic's body of work was nominated for the Cooper-Hewitt National Design Awards. Kim is a prominent individual in the information visualization community. She has published papers in Parsons Journal of Information Mapping, was an award winner in the VAST 2010 Challenge, and is an advisor to the Congressional Budget Office. Kim has presented at several industry events including Strata, Wolfram Data Summit, Eyeo, VisWeek, and various data visualization groups, among others. Recently she has also been an advisor on an upcoming documentary film and was the Technical Editor for Visualize This by Nathan Yau. Kim received her BA in Computer Science from New York University.
#2716 youtube 00:15:58