<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Data Engineering on Data Science Blog</title>
    <link>https://www.pavanpkulkarni.org/categories/data-engineering/</link>
    <description>Recent content in Data Engineering on Data Science Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 07 Jul 2018 00:00:00 +0000</lastBuildDate>
    
	<atom:link href="https://www.pavanpkulkarni.org/categories/data-engineering/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Introduction to Kafka</title>
      <link>https://www.pavanpkulkarni.org/blog/21-intro-to-kafka/</link>
      <pubDate>Sat, 07 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/21-intro-to-kafka/</guid>
      <description>&lt;p&gt;Apache Kafka is a distributed message publishing framework. It scales horizontally and offers high fault tolerance.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Spark Structured Streaming - File-to-File Real-time Streaming (3/3)</title>
      <link>https://www.pavanpkulkarni.org/blog/20-structured-streaming-file-to-file-processing/</link>
      <pubDate>Thu, 28 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/20-structured-streaming-file-to-file-processing/</guid>
      <description>&lt;p&gt;In this post we will see how to build a simple application for file-to-file real-time processing.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Spark Structured Streaming - Socket Word Count (2/3)</title>
      <link>https://www.pavanpkulkarni.org/blog/19-structured-streaming-socket-word-count/</link>
      <pubDate>Wed, 20 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/19-structured-streaming-socket-word-count/</guid>
      <description>&lt;p&gt;Structured Streaming is a new way of looking at real-time streaming. In this post we will see how to build our very first Structured Streaming app to perform a word count over a network.
&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Spark Structured Streaming - Introduction (1/3)</title>
      <link>https://www.pavanpkulkarni.org/blog/18-spark-structured-streaming-intro/</link>
      <pubDate>Thu, 14 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/18-spark-structured-streaming-intro/</guid>
      <description>&lt;p&gt;Structured Streaming is a new way of looking at real-time streaming. Built on the DataFrame and Dataset abstractions, Structured Streaming provides an alternative to the well-known Spark Streaming. Structured Streaming is built on top of the Spark SQL engine. Some of the main features of Structured Streaming are -&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>MongoDB Data Processing (Python)</title>
      <link>https://www.pavanpkulkarni.org/blog/17-mongo-data-processing-python/</link>
      <pubDate>Mon, 21 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/17-mongo-data-processing-python/</guid>
      <description>&lt;p&gt;This post gives an insight into processing data from MongoDB in Python.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Mongo Shell Commands - Mongo Document Queries</title>
      <link>https://www.pavanpkulkarni.org/blog/15-mongo-shell-query/</link>
      <pubDate>Wed, 16 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/15-mongo-shell-query/</guid>
      <description>&lt;p&gt;This post introduces the mongo shell and the basic query operations that can be performed in it, with examples.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Spark - MongoDB Data Processing (Scala)</title>
      <link>https://www.pavanpkulkarni.org/blog/16-spark-mongo-data-processing/</link>
      <pubDate>Wed, 16 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/16-spark-mongo-data-processing/</guid>
      <description>&lt;p&gt;We will look into basic details of how to process data from MongoDB using Apache Spark.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Install MongoDB</title>
      <link>https://www.pavanpkulkarni.org/blog/14-install-mongodb/</link>
      <pubDate>Tue, 15 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/14-install-mongodb/</guid>
      <description>&lt;p&gt;This post is a step-by-step guide to installing MongoDB on a Mac.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Detailed Guide to Setting up Scalable Apache Spark Infrastructure on Docker - Standalone Cluster With History Server</title>
      <link>https://www.pavanpkulkarni.org/blog/13-spark-on-docker/</link>
      <pubDate>Fri, 11 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/13-spark-on-docker/</guid>
      <description>&lt;p&gt;This post is a complete guide to building a scalable Apache Spark infrastructure using Docker. We will see how to enable the History Server for log persistence. Being able to scale up and down is one of the key requirements of today&amp;rsquo;s distributed infrastructure. By the end of this guide, you should have a fair understanding of setting up Apache Spark on Docker, and we will see how to run a sample program.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>How to Setup PyCharm to Run PySpark Jobs</title>
      <link>https://www.pavanpkulkarni.org/blog/12-pyspark-in-pycharm/</link>
      <pubDate>Fri, 27 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/12-pyspark-in-pycharm/</guid>
      <description>&lt;p&gt;This post gives a walkthrough of how to set up your local system to test PySpark jobs, followed by a demo of running the same code using the &lt;code&gt;spark-submit&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Spark - Cassandra Data Processing (Scala)</title>
      <link>https://www.pavanpkulkarni.org/blog/11-spark-cassandra-data-processing/</link>
      <pubDate>Thu, 26 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/11-spark-cassandra-data-processing/</guid>
      <description>&lt;p&gt;We will look into the basic details of how to process data from Cassandra using Apache Spark. Processing data from a NoSQL database is efficient when we use a distributed processing system like Spark with Scala.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>How to Setup Eclipse to Run Spark Programs Using Gradle as Build Tool</title>
      <link>https://www.pavanpkulkarni.org/blog/10-spark-in-eclipse-gradle/</link>
      <pubDate>Thu, 19 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/10-spark-in-eclipse-gradle/</guid>
      <description>&lt;p&gt;This post gives a walkthrough of how to set up your local system to test Spark programs. We will use Gradle as the build tool. Additionally, we will see how to run the same code using the &lt;code&gt;spark-submit&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Install Apache Cassandra</title>
      <link>https://www.pavanpkulkarni.org/blog/8-install-cassandra/</link>
      <pubDate>Thu, 19 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/8-install-cassandra/</guid>
      <description>&lt;p&gt;This post will guide you through the installation of Apache Cassandra.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Introduction to Cassandra Query Language (CQL)</title>
      <link>https://www.pavanpkulkarni.org/blog/9-basic-cql-statements/</link>
      <pubDate>Thu, 19 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/9-basic-cql-statements/</guid>
      <description>&lt;p&gt;This post gives a quick introduction to Cassandra using CQL.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Install Apache Spark 2.3</title>
      <link>https://www.pavanpkulkarni.org/blog/7-install-spark/</link>
      <pubDate>Mon, 09 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/7-install-spark/</guid>
      <description>&lt;p&gt;This post will guide you through the installation of Apache Spark 2.3.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Install Gradle</title>
      <link>https://www.pavanpkulkarni.org/blog/6-install-gradle/</link>
      <pubDate>Sat, 07 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/6-install-gradle/</guid>
      <description>&lt;p&gt;Install Gradle on a Mac easily and quickly. This post gives a complete walkthrough of installing Gradle, both with &lt;code&gt;brew&lt;/code&gt; and manually.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Install Maven</title>
      <link>https://www.pavanpkulkarni.org/blog/5-install-maven/</link>
      <pubDate>Sat, 07 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/5-install-maven/</guid>
      <description>&lt;p&gt;Install Maven on a Mac easily and quickly. This post gives a complete walkthrough of installing Maven, both with &lt;code&gt;brew&lt;/code&gt; and manually.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Quick Introduction to Build Tools</title>
      <link>https://www.pavanpkulkarni.org/blog/4-intro-to-build-tools/</link>
      <pubDate>Fri, 06 Apr 2018 00:00:00 +0000</pubDate>
      
      <guid>https://www.pavanpkulkarni.org/blog/4-intro-to-build-tools/</guid>
      <description>&lt;p&gt;In this post we will look at a basic introduction to build tools and a few distinguishing factors between widely used build tools.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>