MachineX: The second dimensionality reduction method

Knoldus Blogs

In the previous blog we went through how more data, or to be precise, more dimensions in the data, creates problems such as overfitting in classification and regression algorithms. This is known as the "curse of dimensionality". We then went through the solution to this problem, i.e. dimensionality reduction, focusing mainly on one dimensionality reduction method called feature selection. In this blog, we will go through the second dimensionality reduction method that we were discussing.

Unlike feature selection, feature extraction doesn't create subsets and then find the best one among them; instead, it tries to create a whole new feature set from the existing features. For example, if we have the data set X = {x1, x2, x3, ..., xn}, then after feature extraction it would be something like Y = {y1, y2, y3, ..., ym}, where m is the new number of dimensions extracted. If…

View original post 960 more words


Introduction about the Trade Life Cycle

Hello folks, in this blog we will learn about the trade life cycle. Before going into detail, one genuine question comes to mind: what is the trade life cycle?

In the financial market, “trade” means to buy and/or sell securities/financial products. To explain it further, a trade is the conversion of an order placed on the exchange which results in pay-in and pay-out of funds and securities. The trade ends with the settlement of the order placed. All the steps involved in a trade, from the point of order receipt (where relevant) and trade execution through to settlement of the trade, are commonly referred to as the ‘trade lifecycle’.

The trade life cycle is mainly divided into two parts:

  1. Trading Activity
  2. Operational Activity

Trading Activity: This activity covers all the processes and procedures needed to capture a trade from the client via the front office and enrich that trade so that it can be sent on for operational activity. This activity has two parts.

  • Trade Execution
  • Trade Capture (Front office)

Trade Execution: In this logical step we determine the trading business channel where sellers and buyers execute trades. On the basis of the business channel, trading markets are classified into two categories.

  1. Quote-driven Markets: In markets where market makers publicise (quote) prices at which they are prepared to buy and sell, with the intention of attracting a counterparty, the market is said to be 'quote-driven'. Examples of quote-driven markets where quoted prices are displayed on computer screens are Nasdaq (US), SEAQ (UK) and the Eurobond market; trade execution typically occurs by telephone or electronically.
  2. Order-driven Markets: In markets in which orders from sellers are compared and matched with buyers’ orders electronically, the market is said to be ‘order-driven’.

    Examples of order-driven markets are: SEATS (Australia), Xetra (Germany), SETS (UK).
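To make the order-driven model concrete, here is a minimal, hypothetical sketch of price-time matching of an incoming buy order against resting sell orders. All names and the structure are illustrative only; real matching engines are far more involved than this:

```scala
// Illustrative sketch of order-driven matching (best price first).
// Hypothetical types; not any exchange's actual matching engine.
case class Order(id: Int, isBuy: Boolean, price: BigDecimal, qty: Int)

object OrderBook {
  /** Match an incoming buy order against resting sells, cheapest first.
    * Returns (fills as (sellOrderId, tradedQty), remaining resting sells). */
  def matchBuy(incoming: Order, restingSells: List[Order]): (List[(Int, Int)], List[Order]) = {
    val (fills, remaining, _) =
      restingSells.sortBy(_.price).foldLeft((List.empty[(Int, Int)], List.empty[Order], incoming.qty)) {
        case ((done, rest, qtyLeft), sell) =>
          if (qtyLeft > 0 && sell.price <= incoming.price) {
            // prices cross: trade as much as both sides allow
            val traded  = math.min(qtyLeft, sell.qty)
            val leftover = if (sell.qty > traded) rest :+ sell.copy(qty = sell.qty - traded) else rest
            (done :+ ( -> traded), leftover, qtyLeft - traded)
          } else (done, rest :+ sell, qtyLeft) // no match: sell order stays on the book
      }
    (fills, remaining)
  }
}
```

The essential point this illustrates is that no quotes are involved: the engine simply compares buyers' and sellers' orders and trades wherever the prices cross.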

Trade Capture (Front office): The successful capture of a trade within a trading system should result in the trade details being sent to the back office immediately, via an interface, for operational processing. Where an STO (Securities Trading Organisation) has no trading system (nowadays a rare circumstance), the trade detail is usually recorded manually, by the trader or market maker, onto a 'dealing slip'; this will require collection by, or delivery to, the middle office or settlement department for operational processing. Under these circumstances, the trader or market maker will need to maintain their trading position manually, keeping it updated with any new trades.

The pictorial representation of the trade life cycle is:


In the next blog, we will learn the role of operational activity in the trade life cycle.

Introduction to FIX protocol for Trading

The Financial Information eXchange (FIX) protocol is an electronic communications protocol initiated in 1992 for the international real-time exchange of information related to securities transactions and markets. It was initiated by a group of institutions interested in streamlining their trading processes. These firms believed that they, and the industry as a whole, could benefit from efficiencies derived through the creation of a standard for the electronic communication of indications, orders and executions. The result is FIX, an open message standard controlled by no single entity, which can be structured to match the requirements of each firm.

FIX has become the language of the global financial markets used extensively by buy and sell-side firms, trading platforms and even regulators to communicate trade information. This non-proprietary, free and open standard is constantly being developed to support evolving business and regulatory needs and is used by thousands of firms every day to complete millions of transactions.

FIX is the way the world trades, and it is becoming an essential ingredient in minimizing trade costs, maximizing efficiencies and achieving increased transparency. FIX offers significant benefits to firms keen to explore new investment opportunities: it reduces the cost of market entry, with participants able to quickly communicate both domestically and internationally, in addition to significantly reducing switching costs.

What FIX does for us:

  • Provides an industry-standard means to electronically communicate the information you wish to share. This could be:
    • Indications of Interest
    • Orders and Order Acknowledgment
    • Fills
    • Account Allocations
    • News, E-Mail, Program Trading Lists
    • Administrative Messages
  • Connects us with an expanding list of buy and sell-side institutions
  • Delivers information in real-time

The following types of securities can be processed via FIX:

  • US Equities
  • International Equities (all markets)
  • ADRs
  • Convertible Bonds
  • Futures
  • Foreign Exchange trading
  • Fixed Income

Pictorial representation of FIX system:

The given diagram shows how FIX messaging looks between a buy-side/customer and a sell-side/supplier.


FIX message format layout:

A FIX message is a combination of a header, a body, and a trailer; that is, header + body + trailer = FIX content.
Up to FIX.4.4, the header contained three mandatory fields: tag 8 (BeginString), tag 9 (BodyLength), and tag 35 (MsgType).

Example of a FIX message: Execution Report (the pipe character '|' is used here to represent the SOH delimiter)

8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 | 11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=128 |
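As a quick illustration of the tag=value layout, here is a minimal Scala sketch (the object and method names are mine, not part of any FIX library) that splits such a pipe-delimited example into a tag-to-value map:

```scala
// Sketch: split a FIX message into a tag -> value map.
// '|' stands in for the real SOH delimiter (ASCII 0x01).
object FixParser {
  def parse(msg: String, delimiter: Char = '|'): Map[Int, String] =
    msg.split(delimiter)
      .map(_.trim)                 // drop the spaces around the pipes in the example
      .filter(_.contains('='))     // ignore empty trailing fragments
      .map { field =>
        val Array(tag, value) = field.split("=", 2) // split only on the first '='
        tag.trim.toInt -> value
      }
      .toMap
}
```

For the execution report above, `FixParser.parse(...)` would give entries such as tag 35 -> "8" (ExecutionReport) and tag 55 -> "MSFT".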

Body Length:
Body length is the character count starting at tag 35 (inclusive) up to, but not including, tag 10, which holds the checksum. SOH delimiters do count towards the body length. Tag 9 carries the body length; in the above example, the body length of the message is 178.

The checksum algorithm of FIX consists of summing the decimal ASCII values of all the bytes up to, but not including, the checksum field (which is last) and taking the result modulo 256. Tag 10 carries the checksum value; in the above example the checksum is 128.
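Both integrity fields can be computed mechanically. The sketch below uses hypothetical helper names and assumes the standard layout, i.e. that tag 35 immediately follows the tag 9 field and that tag 10 is the final field:

```scala
// Sketch: compute FIX BodyLength (tag 9) and CheckSum (tag 10) for a raw
// message whose fields are delimited by SOH (ASCII 1). Helper names are mine.
object FixIntegrity {
  val Soh = '\u0001'

  /** Character count from the start of the tag-35 field up to (not including) "10=". */
  def bodyLength(msg: String): Int = {
    val start = msg.indexOf(s"${Soh}35=") + 1 // body starts at tag 35
    val end   = msg.indexOf(s"${Soh}10=") + 1 // ...and ends just before tag 10
    end - start
  }

  /** Sum of all byte values before the "10=" field (including the SOH
    * that precedes it), modulo 256. */
  def checkSum(msg: String): Int = {
    val end = msg.indexOf(s"${Soh}10=") + 1
    msg.take(end).map(_.toInt).sum % 256
  }
}
```

For instance, for the toy message `8=FIX.4.2<SOH>9=10<SOH>35=0<SOH>34=1<SOH>10=000<SOH>` the body is `35=0<SOH>34=1<SOH>`, giving a body length of 10.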

In a nutshell, we can say that the FIX standard is a unified stack for all trading-related messaging. We can use FIX to send orders, allocations, allocation instructions and execution-related messages. It also supports confirmation, acceptance and rejection acknowledgement messages to report back to the client on the execution status of an order.

To get more information about FIX, you can visit the official site.


Knoldus Bags the Prestigious Huawei Partner of the Year Award

Knoldus Blogs

Knoldus was humbled to receive the prestigious partner of the year award from Huawei at a recently held ceremony in Bangalore, India.


It means a lot to us and is a validation of the quality and focus that we put into the Scala and Spark ecosystem. Huawei recognized Knoldus for its expertise in Scala and Spark along with the excellent software development process practices under the Knolway™ umbrella. Knolway™ is the Knoldus Way of developing software, which we have curated and refined over the past 6 years of developing Reactive and Big Data products.

Our heartiest thanks to Mr. V.Gupta, Mr. Vadiraj and Mr. Raghunandan for this honor.


About Huawei

Huawei is a leading global information and communications technology (ICT) solutions provider. Driven by responsible operations, ongoing innovation, and open collaboration, we have established a competitive ICT portfolio of end-to-end solutions in telecom and enterprise networks, devices, and cloud computing…

View original post 170 more words

Transaction Management in Cassandra

Knoldus Blogs

As we are all from a SQL background, and SQL has ruled the market for ages, transactions are something of a favourite for us.
While Cassandra does not support the full ACID (transaction) properties, it does give you the 'AID' among them:
writes to Cassandra are atomic, isolated, and durable in nature. The "C" of ACID, consistency, does not apply to Cassandra, as there is no concept of referential integrity or foreign keys.

Cassandra lets you tune your consistency level as per your needs. You can have either partial or full consistency; that is, you might want a particular request to complete if just one node responds, or you might want to wait until all nodes respond.

Here we will talk about the ways we can implement the so-called transaction concept in Cassandra.

Lightweight Transactions
They are also known as compare-and-set transactions.

View original post 777 more words

Twitter’s tweets analysis using Lambda Architecture

Hello Folks,

In this blog I will explain the analysis of Twitter's tweets using the lambda architecture. First we need to understand what the lambda architecture is, along with its components and usage.

According to Wikipedia, Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.

Now let us see the lambda architecture's components in detail.


This architecture is classified into three layers:

Batch Layer: The batch layer precomputes the master dataset (the core component of the lambda architecture, containing all the data) into batch views so that queries can be resolved with low latency.

Speed Layer: In the speed layer we basically do two things: storing the real-time views, and processing the incoming data stream so as to update those views. It fills the delta gap left by the batch layer; that means combining the speed-layer views with the batch views gives us the ability to fire any ad-hoc query over all the data, i.e. query = function(all data).

Serving Layer: It provides low-latency access to the results of calculations performed on the master dataset. It combines the batch view and the real-time view to answer any ad-hoc query over all the data in real time.

So, in short, we can say the lambda architecture is query = function(all data).
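That merging step can be sketched abstractly: a query combines the precomputed batch view (everything up to the last batch run) with the fresher real-time view. The types and names below are illustrative only, not the project's actual code:

```scala
// Abstract sketch of query = function(all data): answer a query by merging a
// precomputed batch view with a fresher real-time view. Illustrative types.
case class ViewRow(userId: Long, createdAt: Long, friendsCount: Long)

object LambdaQuery {
  /** Batch rows cover everything up to the cutoff of the last batch run;
    * the real-time view fills the delta gap after that cutoff. */
  def query(batchView: Seq[ViewRow], realtimeView: Seq[ViewRow], batchCutoff: Long): Seq[ViewRow] =
    batchView.filter(_.createdAt <= batchCutoff) ++
      realtimeView.filter(_.createdAt > batchCutoff)
}
```

Splitting on the cutoff is what prevents double-counting rows that appear in both views.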

Now I am going to describe Twitter's tweet analysis with the help of the lambda architecture. This project uses the twitter4j streaming API and Apache Kafka to fetch and store Twitter's real-time data. I have used Apache Cassandra for storing the master dataset, the batch views and the real-time views.

Batch Layer of the project: To process data in batches we have used the Apache Spark engine (a fast and general engine for large-scale data processing), and to store the batch views we have used Cassandra. To do this we have created a BatchProcessingUnit that builds all the batch views on the master dataset.

class BatchProcessingUnit {

  val sparkConf = new SparkConf()
    .setAppName("BatchProcessingUnit") // an app name is required to create a SparkContext
    .set("", "") // host value lost in the original post
    .set("spark.cassandra.auth.username", "cassandra")

  val sc = new SparkContext(sparkConf)

  def start: Unit = {
    val rdd = sc.cassandraTable("master_dataset", "tweets")
    val result ="userid", "createdat", "friendscount")
      .where("friendscount > ?", 500)
    // store the batch view under the names the serving layer queries
    result.saveToCassandra("batch_view", "friendcountview")
  }
}

We have used the Akka scheduler to run the batch process at a specified interval.
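As a dependency-free illustration of such interval scheduling (the project itself uses the Akka scheduler; the helper below is a hypothetical stand-in built on the JDK's ScheduledExecutorService):

```scala
// Hypothetical stand-in for the Akka scheduler used in the project:
// run a task repeatedly at a fixed interval using only the JDK.
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

object BatchScheduler {
  /** Run `task` immediately and then every `intervalMillis` milliseconds. */
  def schedule(intervalMillis: Long)(task: () => Unit): ScheduledExecutorService = {
    val executor = Executors.newSingleThreadScheduledExecutor()
    executor.scheduleAtFixedRate(() => task(), 0, intervalMillis, TimeUnit.MILLISECONDS)
    executor // caller shuts this down when the batch layer stops
  }
}
```

In the actual project the scheduled task would call something like the batch unit's start method, so the batch views are rebuilt on every tick.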

Speed Layer of the project: In the speed layer we have used Spark Streaming to process real-time tweets and store their views in Cassandra.

To do this we have created a SparkStreamingKafkaConsumer which reads data from the "tweets" topic of the Kafka queue and sends it to the view handler of the speed layer to generate all the views.

object SparkStreamingKafkaConsumer extends App {
  val brokers = "localhost:9092"
  val sparkConf = new SparkConf().setAppName("KafkaDirectStreaming").setMaster("local[2]")
    .set("", "") // host value lost in the original post
    .set("spark.cassandra.auth.username", "cassandra")
  val ssc = new StreamingContext(sparkConf, Seconds(10))
  val topicsSet = Set("tweets")
  // the two config keys were lost in the original post; for the Kafka 0.8 direct
  // stream the broker-list key is "metadata.broker.list"
  val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers, "" -> "spark_streaming")
  val messages: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
  val tweets: DStream[String] = { case (_, message) => message }
  ViewHandler.createAllView(ssc.sparkContext, tweets)
  ssc.start()
  ssc.awaitTermination()
}

Serving Layer of the project: In the serving layer we basically combine the batch-view data and the real-time-view data to satisfy ad-hoc query requirements. Here is an example in which we try to find all Twitter users who match the specified hashtag and have a follower count greater than 500.

def findTwitterUsers(minute: Long, second: Long, tableName: String = "tweets"): Response = {
  val batchInterval = System.currentTimeMillis() - minute * 60 * 1000
  val realTimeInterval = System.currentTimeMillis() - second * 1000
  val batchViewResult = cassandraConn.execute(s"select * from batch_view.friendcountview where createdat >= ${batchInterval} allow filtering;").all().toList
  val realTimeViewResult = cassandraConn.execute(s"select * from realtime_view.friendcountview where createdat >= ${realTimeInterval} allow filtering;").all().toList
  val twitterUsers: ListBuffer[TwitterUser] = ListBuffer()
  // collect rows from the batch view, then from the real-time view
  batchViewResult.foreach { row =>
    twitterUsers += TwitterUser(row.getLong("userid"), new Date(row.getLong("createdat")), row.getLong("friendscount"))
  }
  realTimeViewResult.foreach { row =>
    twitterUsers += TwitterUser(row.getLong("userid"), new Date(row.getLong("createdat")), row.getLong("friendscount"))
  }
  Response(twitterUsers.length, twitterUsers.toList)
}

Finally, this project uses Akka HTTP to build a REST API for firing ad-hoc queries.

I hope this will be helpful for you in building Big Data applications using the lambda architecture.

You can get the source code here




Handling HTTPS requests with Akka-HTTPS Server

Knoldus Blogs

Hi guys,

In my last blog I explained how one can create a self-signed certificate and a KeyStore in PKCS12 format. You can go through that previous blog, as we'll be needing the certificate and keystore for handling HTTPS requests.


Akka-HTTP provides both Server-Side and Client-Side HTTPS support.

In this blog I’ll be covering the Server-Side HTTPS support.

Let’s start with “why do we need server-side HTTPS support?”

If we want the communication between the browser and the server to be encrypted, we need to handle HTTPS requests. HTTPS is often used to protect highly confidential online transactions like online banking and online shopping order forms.

Akka-HTTP supports TLS (Transport Layer Security).

For handling HTTPS requests we need to have the SSL certificate and the KeyStore. Once you have generated both, you can go through the example.

In this example, you will see how easily you can handle

View original post 219 more words