Knoldus Bags the Prestigious Huawei Partner of the Year Award


Knoldus was humbled to receive the prestigious Partner of the Year award from Huawei at a recent ceremony in Bangalore, India.


It means a lot to us and validates the quality and focus that we bring to the Scala and Spark ecosystem. Huawei recognized Knoldus for its expertise in Scala and Spark, along with its excellent software development practices under the Knolway™ umbrella. Knolway™ is the Knoldus way of developing software, which we have curated and refined over the past 6 years of developing Reactive and Big Data products.

Our heartfelt thanks to Mr. V. Gupta, Mr. Vadiraj, and Mr. Raghunandan for this honor.


About Huawei

Huawei is a leading global information and communications technology (ICT) solutions provider. Driven by responsible operations, ongoing innovation, and open collaboration, we have established a competitive ICT portfolio of end-to-end solutions in telecom and enterprise networks, devices, and cloud computing…

View original post 170 more words

Transaction Management in Cassandra


Most of us come from a SQL background, and SQL has ruled the market for ages, so transactions are something we are fond of.
While Cassandra does not support full ACID transaction properties, it does give you the ‘AID’ part.
That is, writes to Cassandra are atomic, isolated, and durable in nature. The “C” of ACID, consistency, does not apply to Cassandra, as there is no concept of referential integrity or foreign keys.

Cassandra lets you tune the consistency level to your needs. You can have either partial or full consistency: you might want a particular request to complete as soon as one node responds, or you might want to wait until all nodes respond.
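To make the tuning trade-off concrete, here is a minimal sketch in plain Scala of how many replicas a request waits for under three common consistency levels. The type and function names are hypothetical and no Cassandra driver is involved; the only fact relied on is that a quorum is a majority of the replicas.

```scala
// Hypothetical model of three common Cassandra consistency levels.
sealed trait ConsistencyLevel
case object One extends ConsistencyLevel
case object Quorum extends ConsistencyLevel
case object All extends ConsistencyLevel

object ConsistencySketch {
  /** Number of replica acknowledgements a request waits for. */
  def requiredAcks(level: ConsistencyLevel, replicationFactor: Int): Int = {
    require(replicationFactor >= 1, "replication factor must be positive")
    level match {
      case One    => 1
      case Quorum => replicationFactor / 2 + 1 // majority of the replicas
      case All    => replicationFactor
    }
  }

  /** True when every read replica set must overlap every write replica set. */
  def readsSeeLatestWrite(read: ConsistencyLevel, write: ConsistencyLevel, replicationFactor: Int): Boolean =
    requiredAcks(read, replicationFactor) + requiredAcks(write, replicationFactor) > replicationFactor
}
```

With a replication factor of 3, quorum reads combined with quorum writes overlap on at least one replica, while reading and writing at level ONE does not guarantee that.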

Here we will talk about the ways we can implement the so-called transaction concept in Cassandra.

Lightweight Transactions
They are also known as compare-and-set transactions.
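As a rough illustration of compare-and-set semantics, the sketch below uses an in-memory concurrent map from the Scala standard library. It is only an analogy: the object and method names are hypothetical, and it does not show how Cassandra coordinates such writes across nodes. The two operations mirror the conditional forms that CQL offers, INSERT ... IF NOT EXISTS and UPDATE ... IF column = value.

```scala
import scala.collection.concurrent.TrieMap

object CompareAndSetSketch {
  // Hypothetical table: username -> email.
  private val users = TrieMap.empty[String, String]

  /** Succeeds only if no row exists yet, like INSERT ... IF NOT EXISTS. */
  def insertIfNotExists(username: String, email: String): Boolean =
    users.putIfAbsent(username, email).isEmpty

  /** Succeeds only if the current value equals `expected`, like UPDATE ... IF email = expected. */
  def updateIfEquals(username: String, expected: String, updated: String): Boolean =
    users.replace(username, expected, updated)

  def lookup(username: String): Option[String] = users.get(username)
}
```

The return value plays the role of the "applied" flag: the caller learns whether the condition held and the write took effect.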

View original post 777 more words

Twitter’s tweets analysis using Lambda Architecture

Hello Folks,

In this blog I will explain the analysis of Twitter’s tweets with Lambda Architecture. First we need to understand what Lambda Architecture is, along with its components and usage.

According to Wikipedia, Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.

Now let us look at the components of Lambda Architecture in detail.


This architecture is divided into three layers:

Batch Layer: The batch layer precomputes batch views from the master dataset (the core component of Lambda Architecture, which contains all the data) so that queries can be resolved with low latency.

Speed Layer: The speed layer basically does two things: it stores the realtime views, and it processes the incoming data stream to update those views. It fills the delta gap left by the batch layer. This means that combining the speed layer view and the batch view gives us the capability to fire any ad hoc query over all the data, that is, query = function(over all data).

Serving Layer: It provides low-latency access to the results of calculations performed on the master dataset. It combines the batch view and the realtime view to answer any ad hoc query over all the data in real time.

So, in short, we can say Lambda Architecture is query = function(over all data).
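The idea of query = function(over all data) can be sketched in a few lines of plain Scala. This is a toy model under stated assumptions, not the project code: the Mention type, the cutoff timestamp, and the hashtag count are all hypothetical. The batch view covers everything up to the cutoff, the realtime view covers what arrived afterwards, and the serving layer merges the two.

```scala
object LambdaQuerySketch {
  // Hypothetical event: one hashtag mention at a given time (epoch millis).
  final case class Mention(hashtag: String, timestamp: Long)

  /** Batch layer: precompute counts for everything up to the batch cutoff. */
  def batchView(master: Seq[Mention], cutoff: Long): Map[String, Long] =
    countByTag(master.filter(_.timestamp <= cutoff))

  /** Speed layer: count only the events that arrived after the cutoff. */
  def realtimeView(master: Seq[Mention], cutoff: Long): Map[String, Long] =
    countByTag(master.filter(_.timestamp > cutoff))

  /** Serving layer: merge both views so the answer covers all the data. */
  def query(batch: Map[String, Long], realtime: Map[String, Long]): Map[String, Long] =
    (batch.keySet ++ realtime.keySet).map { tag =>
      tag -> (batch.getOrElse(tag, 0L) + realtime.getOrElse(tag, 0L))
    }.toMap

  private def countByTag(events: Seq[Mention]): Map[String, Long] =
    events.groupBy(_.hashtag).map { case (tag, es) => tag -> es.size.toLong }
}
```

Merging the two views gives the same counts as running the function over the whole dataset, which is the property the three layers are meant to preserve.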

Now I am going to describe Twitter’s tweet analysis with the help of Lambda Architecture. This project uses the twitter4j streaming API and Apache Kafka to get and store Twitter’s realtime data. I have used Apache Cassandra for storing the master dataset, batch view, and realtime view.

Batch Layer of project: To process data in batch we have used Apache Spark (a fast and general engine for large-scale data processing), and to store the batch view we have used Cassandra. To do this we have created BatchProcessingUnit, which creates all batch views on the master dataset.

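import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

class BatchProcessingUnit {

  val sparkConf = new SparkConf()
    .set("spark.cassandra.connection.host", "127.0.0.1") // assumed: key and value were lost from this listing
    .set("spark.cassandra.auth.username", "cassandra")

  val sc = new SparkContext(sparkConf)

  def start(): Unit = {
    // Read the master dataset and keep only users with more than 500 friends.
    val rdd = sc.cassandraTable("master_dataset", "tweets")
    val result = rdd.select("userid", "createdat", "friendscount").where("friendscount > ?", 500)
    // Assumed step: persist the batch view into the table that the serving layer queries.
    result.saveToCassandra("batch_view", "friendcountview", SomeColumns("userid", "createdat", "friendscount"))
  }
}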

We have used the Akka scheduler to run the batch process at a specified interval.

Speed Layer of project: In the speed layer we have used Spark Streaming to process realtime tweets and store their view in Cassandra.

To do this we have created SparkStreamingKafkaConsumer, which reads data from the Kafka “tweets” topic and sends it to the view handler of the speed layer to generate all views.

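import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreamingKafkaConsumer extends App {
  val brokers = "localhost:9092"
  val sparkConf = new SparkConf().setAppName("KafkaDirectStreaming").setMaster("local[2]")
    .set("spark.cassandra.connection.host", "127.0.0.1") // assumed: key and value were lost from this listing
    .set("spark.cassandra.auth.username", "cassandra")
  val ssc = new StreamingContext(sparkConf, Seconds(10))
  val topicsSet = Set("tweets")
  // Assumed keys (lost from this listing): broker list and consumer group.
  val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers, "group.id" -> "spark_streaming")
  val messages: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
  val tweets: DStream[String] = messages.map { case (_, message) => message }
  ViewHandler.createAllView(ssc.sparkContext, tweets)
  ssc.start()            // start consuming the stream
  ssc.awaitTermination() // block until the job is stopped
}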

Serving Layer of project: In the serving layer we have combined the batch view data and the realtime view data to satisfy ad hoc query requirements. Here is an example in which we try to find all Twitter users who match the specified hashtag and have a follower count greater than 500.

import java.util.Date
import scala.collection.JavaConverters._

def findTwitterUsers(minute: Long, second: Long, tableName: String = "tweets"): Response = {
  val now = System.currentTimeMillis()
  val batchInterval = now - minute * 60 * 1000
  val realTimeInterval = now - second * 1000
  val batchViewResult = cassandraConn.execute(s"select * from batch_view.friendcountview where createdat >= ${batchInterval} allow filtering;").all().asScala.toList
  val realTimeViewResult = cassandraConn.execute(s"select * from realtime_view.friendcountview where createdat >= ${realTimeInterval} allow filtering;").all().asScala.toList
  // Merge the rows from the batch view and the realtime view into one list of users.
  val twitterUsers = (batchViewResult ++ realTimeViewResult).map { row =>
    TwitterUser(row.getLong("userid"), new Date(row.getLong("createdat")), row.getLong("friendscount"))
  }
  Response(twitterUsers.length, twitterUsers)
}

Finally, this project uses Akka HTTP to build the REST API for firing ad hoc queries.

I hope it will be helpful for you in building Big Data applications using Lambda Architecture.

You can get source code here




Handling HTTPS requests with Akka-HTTPS Server


Hi guys,

In my last blog I explained how to create a self-signed certificate and a KeyStore in PKCS12 format. You can go through the previous blog, as we’ll need the certificate and keystore for handling HTTPS requests.


Akka-HTTP provides both Server-Side and Client-Side HTTPS support.

In this blog I’ll be covering the Server-Side HTTPS support.

Let’s start with “why do we need server-side HTTPS support?”

If we want the communication between the browser and the server to be encrypted, we need to handle HTTPS requests. HTTPS is often used to protect highly confidential online transactions like online banking and online shopping order forms.

Akka-HTTP supports TLS (Transport Layer Security).

For handling HTTPS requests we need the SSL certificate and the KeyStore. Once you have generated both, you can go through the example.
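As background for the example, here is a minimal sketch of the JDK side of the setup: turning a PKCS12 keystore into an SSLContext using only standard Java classes. The object and method names are hypothetical, and the step that hands the context to the Akka-HTTP server is deliberately left out. Passing None as the stream loads an empty keystore, which is only useful for trying the code without a real certificate.

```scala
import java.io.InputStream
import java.security.{KeyStore, SecureRandom}
import javax.net.ssl.{KeyManagerFactory, SSLContext}

object TlsContextSketch {
  /** Builds a TLS context from a PKCS12 keystore stream (None means an empty keystore). */
  def sslContextFrom(stream: Option[InputStream], password: Array[Char]): SSLContext = {
    // Load the keystore that holds the server certificate and private key.
    val keyStore = KeyStore.getInstance("PKCS12")
    keyStore.load(stream.orNull, password)

    // The key managers present the certificate during the TLS handshake.
    val keyManagerFactory = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm)
    keyManagerFactory.init(keyStore, password)

    // Default trust managers are enough for a server that does not verify clients.
    val context = SSLContext.getInstance("TLS")
    context.init(keyManagerFactory.getKeyManagers, null, new SecureRandom())
    context
  }
}
```

In a real server the stream would come from the keystore file generated earlier, for instance by opening it with java.io.FileInputStream, and the password would be read from configuration rather than hard-coded.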

In this example, you will see how easily you can handle

View original post 219 more words

AWS | Cleaning up your Amazon ECS resources


In my previous blog posts on AWS (Introduction to Amazon ECS | Launch Amazon ECS cluster | Scaling with Amazon ECS | Deploy updated Task definition/Docker image), I gave an overview of Amazon ECS, with a walk-through on how to launch it, deploy a sample app by creating a task definition, scheduling tasks, and configuring a cluster, and scale it in and out. We also learned how to create a new revision of an existing task definition to deploy the latest Docker image.

In this post we will look at cleaning up the Amazon ECS resources that we have created so far. Once you have launched the Amazon ECS cluster and try to terminate container instances in order to clean up the resources then you won’t be able to…

View original post 281 more words

AWS | Amazon ECS – Deploy Updated Task Definition/Docker Images


In my previous blog posts on AWS (Launch Amazon ECS cluster and scaling with Amazon ECS), I explained how to deploy a sample app by creating a task definition, scheduling tasks, and configuring a cluster on Amazon ECS, and how to scale it in and out.

In this post I will show you how to create a new revision of the existing task definition that uses the latest Docker image, and then run the service with the new Docker image and the updated revision.

You can update the running service simply by changing the task definition revision. When a deployment is triggered by updating the task definition of a service, the service scheduler uses the deployment configuration parameters, minimumHealthyPercent and maximumPercent, to determine the deployment strategy.

If the minimumHealthyPercent is below 100%, the scheduler can ignore the desired count temporarily during a deployment. For example, if your…
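To make the two parameters concrete, the sketch below computes the range of running tasks the scheduler may pass through during a deployment. It is a plain Scala illustration with hypothetical names, assuming the documented rounding rules: the lower limit is rounded up and the upper limit is rounded down.

```scala
object DeploymentBoundsSketch {
  final case class Bounds(minRunning: Int, maxRunning: Int)

  /** Task-count limits during a deployment for a given desired count. */
  def bounds(desiredCount: Int, minimumHealthyPercent: Int, maximumPercent: Int): Bounds = {
    require(desiredCount >= 0, "desired count must not be negative")
    require(minimumHealthyPercent >= 0 && maximumPercent >= 0, "percentages must not be negative")
    // Lower limit: percentage of the desired count, rounded up.
    val minRunning = (desiredCount * minimumHealthyPercent + 99) / 100
    // Upper limit: percentage of the desired count, rounded down.
    val maxRunning = desiredCount * maximumPercent / 100
    Bounds(minRunning, maxRunning)
  }
}
```

For a service with a desired count of 4, a minimumHealthyPercent of 50 and a maximumPercent of 200 allow the scheduler to drop to 2 running tasks or grow to 8 while it replaces the old revision.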

View original post 387 more words

AWS | Scaling with Amazon ECS


In my last post on AWS, I explained how to launch an Amazon ECS cluster through the Amazon ECS First Run Wizard, including CloudFormation, VPC and subnet creation, ELB and ECS security group creation, the auto scaling group, the launch configuration, and elastic load balancer creation, using a sample app for which we created a task definition, scheduled tasks, and configured a cluster.

In this blog, I will talk about auto scaling, that is, how to

  • Scale in / scale out EC2 instances in a cluster.
  • Scale in / scale out containers (tasks) for a particular service.

Auto scaling is very helpful, as configuring EC2 instances in an auto scaling group, or deploying and managing different containers of the same microservice manually, is quite complicated. It can take a lot of time and effort to do that, but Amazon EC2 Container Service makes it easier, as it provides one click auto scaling…

View original post 269 more words