Showing posts from August, 2018
How-to: Run a Simple Apache Spark App in CDH 5
Apache Hive on Apache Spark: Motivations and Design Principles
How-to: Build Advanced Time-Series Pipelines in Apache Crunch
Bayesian Machine Learning on Apache Spark
Building Lambda Architecture with Spark Streaming
Getting Started with Big Data Architecture
Apache Kafka for Beginners
Calculating CVA with Apache Spark
How-to: Translate from MapReduce to Apache Spark (Part 2)
Working with Apache Spark: Or, How I Learned to Stop Worrying and Love the Shuffle
Deploying Apache Kafka: A Practical FAQ
Ibis on Impala: Python at Scale for Data Science
How Apache Spark, Scala, and Functional Programming Made Hard Problems Easy at Barclays
Interactive Analytics on Dynamic Big Data in Python using Kudu, Impala, and Ibis
Time Series for Spark: 0.2.0 Released
Progress Report: Bringing Erasure Coding to Apache Hadoop
How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and Cloudera Search
Making Python on Apache Hadoop Easier with Anaconda and CDH
Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard
Time Series for Spark Joins Cloudera Labs
Building a Data Science Portfolio: Storytelling with Data (Part 2: Data Exploration)
Securing Apache Spark Shuffle using Apache Commons Crypto
Apache Kudu and Apache Impala (Incubating): The Integration Roadmap
Introducing sparklyr, an R Interface for Apache Spark
Resource Management for Apache Impala (incubating)
How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 1
How-to: Fuzzy Name Indexing in Apache Hadoop with Rosette and Cloudera Search
How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 2
How to secure ‘Internet exposed’ Apache Hadoop
Hardening Apache ZooKeeper Security: SASL Quorum Peer Mutual Authentication and Authorization
Up and running with Apache Spark on Apache Kudu
Working with UDFs in Apache Spark
Analyzing US flight data on Amazon S3 with sparklyr and Apache Spark 2.0
How-to: Log Analytics with Solr, Spark, OpenTSDB and Grafana
Blacklisting in Apache Spark
Deep Learning Frameworks on CDH and Cloudera Data Science Workbench
Apache Impala Leads Traditional Analytic Database