Our Solution

AAUM has a deep understanding of all types of analytics and, more importantly, grounds its research in customers. AAUM offers an array of analytics solutions across industries and platforms. Strong research insights underpin the frameworks and methodologies described below.

Big Data

Big Data Storage System

Big data has emerged as a key buzzword in business over the past year or two, and the technology in this field has advanced tremendously. Big data practitioners such as Google, Facebook and Amazon keep introducing more innovative and advanced technology to tackle big data problems. But big data mining and analytics are not restricted to these web giants. All sorts of organisations, and not necessarily huge ones, can benefit – from finance houses interested in analysing stock market behaviour, to police departments aiming to analyse and predict crime trends. In the following illustrations we showcase various big data storage systems that are widely used in business. Here we demonstrate the case of processing information from a customer database of more than 5 lakh (500,000) records through various structures.

Efficient Handling

Here we use a 12 GB data set of airline data that was used in the 2009 American Statistical Association Data Expo. This file contains information on US domestic flights between 1987 and 2008 and has some nice properties that make it useful for different kinds of analyses. It has over 123 million rows (observations) and 30 columns containing variables of different data types, including factors with lots of levels. The variables include year, month, day of week, departure time, air time, arrival delay, departure delay, etc.
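
As a rough illustration of handling such a file without loading it fully into memory, the sketch below streams the CSV in chunks and aggregates mean arrival delay by year; the file name and column names are assumptions about how the data is laid out.

```python
# A minimal sketch: process a very large flight-delay CSV in chunks with pandas.
# "airline_1987_2008.csv" and the column names are assumed for illustration.
import pandas as pd

chunks = pd.read_csv("airline_1987_2008.csv",
                     usecols=["Year", "Month", "DayOfWeek", "ArrDelay", "DepDelay"],
                     chunksize=1_000_000)

totals, counts = {}, {}
for chunk in chunks:
    grouped = chunk.dropna(subset=["ArrDelay"]).groupby("Year")["ArrDelay"]
    for year, row in grouped.agg(["sum", "count"]).iterrows():
        totals[year] = totals.get(year, 0) + row["sum"]
        counts[year] = counts.get(year, 0) + row["count"]

# Mean arrival delay per year, computed without holding all 123M rows in memory.
mean_delay_by_year = {y: totals[y] / counts[y] for y in totals}
print(mean_delay_by_year)
```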

Parallel Computing

Parallelisation as a concept has gained wide prominence in the past few years. Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. These instructions are executed on a central processing unit on one computer, one instruction after the other. Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above. Parallelism works best for algorithms that can be decomposed into independent parts; inherently sequential algorithms gain little from it.
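
A minimal sketch of this idea in Python, assuming an embarrassingly parallel workload; the simulate function is a stand-in for any independent unit of work.

```python
# Run many independent tasks across worker processes with multiprocessing.Pool.
from multiprocessing import Pool
import random

def simulate(seed):
    """One independent unit of work, e.g. a single bootstrap replicate."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100_000))

if __name__ == "__main__":
    with Pool(processes=4) as pool:               # four worker processes
        results = pool.map(simulate, range(100))  # 100 tasks executed in parallel
    print(sum(results) / len(results))
```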

E-commerce

A/B Testing

A/B testing is a term commonly used in web development, online marketing, and other forms of advertising to describe simple randomized experiments with two variants, A and B, which are the control and treatment in the controlled experiment. It is very commonly used in advertising campaigns that involve two different versions of a web page, to see which is more effective. It serves as a method to validate that any new design or change to an element on your web page improves your conversion rate before you make that change to your site code. A/B splits have been used for mailing campaigns in the past, but have also been successfully adapted to interactive media, for testing the effectiveness of e-mail blasts and banner advertisements. It is important to determine the right sample size to run your A/B test. The required sample size depends on the baseline conversion rate and the desired sensitivity, and can be derived from these parameters.
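
As a rough illustration, the sketch below derives a per-arm sample size from an assumed baseline conversion rate, minimum detectable effect, significance level and power, using the standard two-proportion approximation.

```python
# Approximate per-arm sample size for a two-sided A/B test on conversion rates.
from scipy.stats import norm

def ab_sample_size(p_base, mde, alpha=0.05, power=0.8):
    p_var = p_base + mde                      # expected rate in the variant
    z_a = norm.ppf(1 - alpha / 2)             # two-sided significance
    z_b = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    n = (z_a + z_b) ** 2 * variance / mde ** 2
    return int(round(n))                      # per-arm sample size

# e.g. detect a lift from 10% to 12% conversion
print(ab_sample_size(p_base=0.10, mde=0.02))
```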

Attribution

Attribution modeling is a widely used tool that helps marketers understand the impact that various marketing channels have on the ROI of their business. The insights from this exercise allow marketers to track and analyze the multiple touch points in a sale and their impact on the conversion value. This in turn helps marketers allocate credit, and their marketing budget, effectively across channels. Attribution modeling is especially popular in the e-commerce domain, since online media is highly measurable and it is possible to arrive at near-accurate figures when measuring the ROI that channels contribute to sales. Marketers evolve their strategy based on these insights instead of following a blind credit-allocation strategy and hoping for the best.
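
The toy sketch below contrasts two common attribution rules, last-touch and linear (equal credit), over a handful of invented customer journeys.

```python
# Compare last-touch vs. linear attribution; journeys and revenues are made up.
from collections import defaultdict

journeys = [
    (["Display", "Email", "Paid Search"], 120.0),
    (["Organic", "Paid Search"], 80.0),
    (["Email", "Email", "Display"], 60.0),
]

last_touch = defaultdict(float)
linear = defaultdict(float)

for touchpoints, revenue in journeys:
    last_touch[touchpoints[-1]] += revenue          # all credit to the final channel
    for channel in touchpoints:                     # equal credit to every touch
        linear[channel] += revenue / len(touchpoints)

print(dict(last_touch))
print(dict(linear))
```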

Campaign Performance

Campaign analysis helps you track, measure, analyze and optimize the performance of your campaigns, and assess the effectiveness of campaigns and sites through systematic metrics. If you are running campaigns on multiple sites, campaign-site analysis lets you compare the same campaign across those sites; if a campaign's productivity is low on a particular site, you can try another site or redistribute the campaign spend to more profitable sources. A campaign category watch classifies campaigns into categories, which helps you review, adjust or discontinue campaigns and check how a particular campaign performs relative to its category. On the whole, campaign analysis keeps a watch on campaign conversions and helps you meet your business goals.
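
A minimal pandas sketch of the campaign-site comparison described above, with invented figures and assumed column names.

```python
# Compare one campaign's CTR and conversion rate across sites.
import pandas as pd

df = pd.DataFrame({
    "campaign":    ["Spring Sale"] * 3,
    "site":        ["Site A", "Site B", "Site C"],
    "impressions": [120_000, 95_000, 40_000],
    "clicks":      [1_800, 950, 900],
    "conversions": [150, 60, 85],
})

df["ctr"] = df["clicks"] / df["impressions"]
df["conv_rate"] = df["conversions"] / df["clicks"]
# Rank sites so low-yield placements can be cut or re-budgeted.
print(df.sort_values("conv_rate", ascending=False))
```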

Dynamic Pricing

Dynamic Pricing, also known as time-based pricing, is a form of price discrimination in which a company changes the price of a product or service depending on some set of factors. It is common among industries — such as the tourism and transportation industries — whose business increases or decreases greatly under predictable sets of circumstances. Dynamic pricing allows a business to maximize its profits because it is better able to assign prices that take into account shifting levels of demand and willingness to pay. Dynamic pricing is most effective when an industry is able to accurately predict consistent changes in demand for a product or service. Dynamic pricing can be very difficult to implement when these changes are less predictable, or if it is easy for a consumer to change his or her habits to take advantage of the product or service when the price is lower. For instance, it is difficult for a retail store to successfully implement dynamic pricing, because it would be easy for consumers to adjust their shopping habits to avoid the higher costs.

Heat Map

Businesses that use digital marketing analytics to improve customer acquisition, increase brand loyalty, raise the ROI of their marketing channels, and so on, have a competitive advantage over businesses that have not yet ventured into this space. The internet search giant Google Inc. has also made a massive impact on the field of e-commerce analytics through its Google Analytics platform, which provides a variety of KPIs and statistics to help digital businesses track and measure their online sales and marketing. AdWords, e-commerce reporting and real-time analytics are some of the popular services in this platform. The following case study familiarizes the user with various KPIs and methodologies that help marketers in the e-commerce industry analyze and track sales and the returns their marketing channels generate. We begin by analyzing the traffic data of a popular online retail website through heatmaps; click-stream data is typically used for the entire analysis.
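
As a rough illustration, the sketch below builds a weekday-by-hour traffic heatmap of the kind used in such an analysis; the visit counts are randomly generated rather than taken from real click-stream data.

```python
# Heatmap of visits by weekday and hour, built from simulated clickstream counts.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
traffic = rng.poisson(lam=50, size=(7, 24))   # 7 weekdays x 24 hours of visit counts

plt.figure(figsize=(10, 3))
plt.imshow(traffic, aspect="auto", cmap="YlOrRd")
plt.colorbar(label="Visits")
plt.yticks(range(7), ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
plt.xlabel("Hour of day")
plt.title("Website traffic heatmap")
plt.show()
```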

Identifying Persuadables

Uplift modeling is one of the popular techniques for measuring the ROI of marketing campaigns. Uplift is a method of predicting the change in individual behavior caused by a targeted marketing activity; in simple terms, it measures the true incremental impact of that activity. Since uplift modeling focuses only on incremental responses, it provides very strong return-on-investment cases when applied to traditional demand generation and retention activities. Fundamentally, the customers exposed to a marketing campaign can be segmented into the following groups:

  • The Persuadables: customers who only respond to the marketing action because they were targeted
  • The Sure Things: customers who would have responded whether they were targeted or not
  • The Lost Causes: customers who will not respond irrespective of whether or not they are targeted
  • The Sleeping Dogs: customers who are less likely to respond because they were targeted
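
One common way to estimate uplift is the "two-model" approach sketched below: fit one response model on treated customers and one on controls, and score the difference. The column names (the feature list, 'treated' and 'converted') are assumptions about how the campaign data would be laid out.

```python
# Two-model uplift sketch: P(convert | treated) minus P(convert | control).
import pandas as pd
from sklearn.linear_model import LogisticRegression

def uplift_scores(df, features):
    treated = df[df["treated"] == 1]
    control = df[df["treated"] == 0]

    m_t = LogisticRegression(max_iter=1000).fit(treated[features], treated["converted"])
    m_c = LogisticRegression(max_iter=1000).fit(control[features], control["converted"])

    # High scores suggest "persuadables"; strongly negative ones "sleeping dogs".
    return (m_t.predict_proba(df[features])[:, 1]
            - m_c.predict_proba(df[features])[:, 1])
```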

Market Mix Model

Businesses these days use digital marketing analytics to improve customer acquisition, increase brand loyalty, raise the ROI of their marketing channels, and so on, and they gain a competitive advantage over businesses that have not yet ventured into this space. While a lot of e-commerce analytics tools are available online to help businesses track their online sales and marketing through a variety of KPIs and statistics, there is often a need to delve much deeper into past data, with a sound understanding and use of advanced analytical techniques, to get better insights. One such area of digital marketing analytics that relies heavily on econometric models for decision making is Market Mix Modeling.
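
As a bare-bones illustration of the econometric idea, the sketch below regresses simulated weekly sales on channel spend; real market mix models would add adstock (carry-over) and saturation terms.

```python
# Minimal market-mix regression: weekly sales explained by channel spend.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 104                                           # two years of weekly data
tv, search, social = rng.gamma(2, 50, n), rng.gamma(2, 30, n), rng.gamma(2, 20, n)
sales = 500 + 2.0 * tv + 3.5 * search + 1.2 * social + rng.normal(0, 50, n)

X = sm.add_constant(np.column_stack([tv, search, social]))
model = sm.OLS(sales, X).fit()
print(model.params)       # base sales plus estimated contribution per unit spend
```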

Optimization

Optimization is one of the important aspects of any business, and it matters a great deal in campaign analysis. If you are running campaigns on multiple sites, you need to allocate ads optimally among those sites so that they yield high returns. Below is a case of optimally allocating impressions for a particular campaign. First, some of the terms widely used in this industry:

  • Impression: each time an ad is served on a site, it counts as an impression.
  • Sold impression: the give rate of the client.
  • Click-through rate (CTR): a KPI widely used to judge the success or failure of a campaign; CTR = clicks / impressions.
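
An illustrative formulation of the allocation problem as a small linear program is sketched below; the CTRs, site capacities and budget are invented inputs.

```python
# Allocate a fixed impression budget across sites to maximise expected clicks.
from scipy.optimize import linprog

ctr = [0.012, 0.008, 0.015]            # expected CTR per site
capacity = [400_000, 600_000, 250_000] # max impressions each site can serve
budget = 800_000                       # total impressions to allocate

# linprog minimises, so negate CTR to maximise expected clicks.
res = linprog(
    c=[-r for r in ctr],
    A_ub=[[1, 1, 1]], b_ub=[budget],                 # total impressions constraint
    bounds=[(0, cap) for cap in capacity],
    method="highs",
)
print(res.x)                 # impressions per site
print(-res.fun)              # expected clicks under this allocation
```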

Recommender System

Recommender Systems are used to predict the best products to offer to customers. These systems have become extremely popular in virtually every single industry, helping customers find products they'll like. Most people are familiar with the idea, and nearly everyone is exposed to several forms of personalized offers and recommendations each day.
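
A toy item-based collaborative filtering sketch, using cosine similarity between items on a made-up ratings matrix.

```python
# Recommend the unseen item most similar to what the user already rated highly.
import numpy as np

# rows = users, columns = items; 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

norms = np.linalg.norm(ratings, axis=0)
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)   # item-item cosine similarity

user = 0
scores = ratings[user] @ item_sim                           # weight items by similarity
scores[ratings[user] > 0] = -np.inf                         # drop items already rated
print("recommend item", int(np.argmax(scores)))
```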

Social Media

Social media analytics is the process of extracting relevant information from the hordes of conversations available on the Internet, to better understand people's sentiments and preferences and to leverage them for social purposes, political movements and business solutions. It is a powerful tool for uncovering customer sentiment dispersed across countless online sources, allowing marketers to identify sentiment and spot trends in order to serve the customer better.
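
A deliberately simple lexicon-based sentiment scorer is sketched below to illustrate the idea; the word lists are tiny placeholders, not a production lexicon.

```python
# Score social posts by counting positive vs. negative words from a small lexicon.
POSITIVE = {"love", "great", "excellent", "happy", "good"}
NEGATIVE = {"hate", "terrible", "awful", "bad", "disappointed"}

def sentiment(post: str) -> float:
    words = [w.strip(".,!?") for w in post.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / max(len(words), 1)          # normalise by post length

posts = ["Love the new app, great update!", "Terrible service, very disappointed."]
for p in posts:
    print(round(sentiment(p), 2), p)
```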

Finance

Catastrophe Modeling

Catastrophe modeling (also known as cat modeling) is the process of estimating the losses that could be sustained due to a catastrophic event such as a hurricane or earthquake. Cat modeling is especially applicable to risk analysis in the insurance industry and actuarial science. There are three components in building a catastrophe model:

  • EVENTS (or hazard)
  • DAMAGE (or vulnerability)
  • LOSS CALCULATION via financial model

The cat model is the analysis of the vulnerability of exposures to catastrophic risk. The exposure data can be categorized into three basic groups:

  • Information on the site locations, referred to as geocoding data (street address, postal code)
  • Information on the physical characteristics of the exposures (construction, occupation/occupancy, year built, number of stories, number of employees, etc.)
  • Information on the financial terms of the insurance coverage (limit, deductible)

The output of the catastrophe model is an estimate of the losses predicted for a certain set of events. From the loss distribution, the probable maximum losses (PMLs) and average annual losses (AALs) are calculated. Here the modelling is done for a catastrophic windstorm over the United Kingdom.
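
A stylised sketch of the loss-calculation step is shown below: simulated event damage ratios are applied to an exposure value, the financial terms (deductible and limit) are applied, and the AAL and a 1-in-200-year PML are read off. All inputs are invented for illustration.

```python
# Toy cat-model loss calculation: exposure value x damage ratio, net of terms.
import numpy as np

rng = np.random.default_rng(7)
n_events = 10_000                       # simulated events in the event set
exposure_value = 50_000_000             # total insured value at the modelled sites
damage_ratio = rng.beta(0.5, 20, n_events)          # fraction of value destroyed
deductible, limit = 250_000, 10_000_000

gross = exposure_value * damage_ratio
net = np.clip(gross - deductible, 0, limit)          # apply financial terms

aal = net.mean()                                     # average annual loss
pml_200 = np.quantile(net, 1 - 1 / 200)              # 1-in-200-year PML
print(f"AAL: {aal:,.0f}  PML(200yr): {pml_200:,.0f}")
```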

Credit Scoring

Credit Scoring is a numerical expression to represent a person or an entity's creditworthiness.

Credit assessment is based on a host of information pertaining to the individual's credit: personal information, banking statements, credit history, and so on. Sound statistical analysis of a consumer's past payment record helps in making qualified decisions on future credit requirements, such as whether to grant or deny a loan. Companies use credit scores to identify the risk of delinquencies and losses, which enables them to better allocate costs and make decisions much faster. Devising a scoring framework requires a clear objective along with the right approach to modeling, rule formulation and the selection of statistical tools.

There are various approaches that one can adopt for credit scoring. The suitability of a model depends on data availability, data characteristics and the organization's comfort. Techniques like LDA and logistic regression are very simple and easy to estimate; these parametric models work well with categorical data and have thus gained huge popularity. There are also non-parametric models like decision trees and random forests, which are intuitive and easy to explain to management; these methodologies employ binary trees for classification. In our credit scoring framework we have adopted an integrated approach that combines a parametric and a non-parametric method, namely logistic regression and random forests.

The data set considered here contains 1000 observations and 20 attributes, both continuous and categorical in nature.
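
A hedged sketch of the integrated approach on a dataset of this shape is shown below; the file name and the 'default' label column are assumptions.

```python
# Blend a logistic regression and a random forest into one probability of default.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("german_credit.csv")                # assumed file
y = df["default"]                                    # assumed label: 1 = bad, 0 = good
X = pd.get_dummies(df.drop(columns=["default"]))     # one-hot encode categoricals

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)

# Average the two probability estimates into a single score (higher = lower risk).
p_default = 0.5 * logit.predict_proba(X_te)[:, 1] + 0.5 * forest.predict_proba(X_te)[:, 1]
credit_score = (1 - p_default) * 1000
print(credit_score[:5])
```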

Customer Segmentation

Relationship marketing is one of the most important concepts in a sensitive industry such as retail banking. Relationship marketing is, however, more interested in enhancing existing customer relationships, and this creates a need for a better understanding of the existing customer base. The customer base needs to be analyzed and segmented accordingly to make the most of relationship marketing. The following case demonstration performs customer segmentation on a retail banking database to understand and analyze it better for effective relationship marketing. The framework here is based on only two parameters, relationship volume and relationship revenue; segmentation can also be based on other parameters such as customer profitability, default rate or other psychological attributes, depending on the business requirement.
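
As one possible implementation, the sketch below clusters a simulated customer base on the two stated dimensions with k-means.

```python
# Segment customers on relationship volume and relationship revenue (simulated).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
volume = rng.lognormal(mean=10, sigma=1, size=500)    # e.g. total balances
revenue = rng.lognormal(mean=6, sigma=1, size=500)    # e.g. fee + interest income

X = StandardScaler().fit_transform(np.column_stack([volume, revenue]))
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

for s in range(4):
    mask = segments == s
    print(f"segment {s}: {mask.sum()} customers, "
          f"median volume {np.median(volume[mask]):,.0f}, "
          f"median revenue {np.median(revenue[mask]):,.0f}")
```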

EVT

Extreme events matter tremendously in the world of finance. An extreme market move might represent a significant risk to the portfolio of an investor. Extreme value theory (EVT) yields methods for quantifying such extreme events and their consequences in a statistical way. Thus EVT provides the probability of events that are more extreme than any previously observed. It is widely used in disciplines such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering.

Example: extreme value theory may be used to estimate the huge losses faced by an insurance company as a result of disasters like fire accidents.
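
A minimal peaks-over-threshold sketch: fit a generalised Pareto distribution to loss exceedances above a high threshold and read off a tail quantile. The losses here are simulated.

```python
# Peaks-over-threshold EVT with a generalised Pareto fit on simulated losses.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(11)
losses = rng.pareto(a=3, size=5000) * 1_000          # heavy-tailed loss sample

threshold = np.quantile(losses, 0.95)
excess = losses[losses > threshold] - threshold

shape, loc, scale = genpareto.fit(excess, floc=0)    # fix location at 0 for POT
# e.g. a far-tail quantile of losses above the threshold
print(threshold + genpareto.ppf(0.999, shape, loc=0, scale=scale))
```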

Customer Attrition

Churn rate refers to the number of customers moving out of the business over a specific period of time. It determines the steady state level of customers in any business. It involves identifying those consumers who are most likely to discontinue using your service or product.

When a customer leaves, the company loses not only the future revenue from that customer but also the resources spent on acquiring the customer. Hence it is vital to make sure that the churn rate in any business does not exceed the growth rate of new customers. Churn is closely related to the concept of average customer lifetime.

For example, if 1 out of every 20 clients of a financial institution does not subscribe to a term deposit each year, the churn rate for that financial institution would be 5%. Here we are going to determine the churn rate of one such financial institution using logistic regression.

The data set considered contains 17 variables worth of information about customers, along with an indication of whether or not each customer churned (did not subscribe to a term deposit).
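
A minimal sketch of such a churn model is shown below; the file name and the 'churned' label values are assumptions about the dataset described above.

```python
# Logistic-regression churn model on a bank-marketing-style dataset.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("bank_marketing.csv")                 # assumed file
y = (df["churned"] == "yes").astype(int)               # 1 = did not subscribe (assumed)
X = pd.get_dummies(df.drop(columns=["churned"]))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
print("predicted churn rate:", model.predict(X_te).mean())
```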

Finance Networking

Financial markets are known for their interconnectedness, so the collapse of one institution naturally has a spill-over effect on other linked institutions. However, significant changes may occur in times of crisis: the number and the market share of active players change dramatically, and the roles of the players may also change.

Financial networks, even in their limited state, constitute an extremely rich and informative data set that can give us insight into the detailed microstructure of the market under investigation. Here the given dataset is used to understand the basic repercussions of the global slowdown and to discover the significant structural changes caused by the Lehman Brothers crisis.

As financial networks continue to become more and more interconnected, identifying the important institutions (whose failure could induce the entire market to collapse) becomes more crucial for both regulators and investors. This financial network analysis helps us sort out the list of important institutions whose collapse could trigger instability across the entire market.
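
One simple way to rank institutions by systemic importance is a network centrality measure, sketched below on a toy interbank exposure network.

```python
# Rank institutions by eigenvector centrality on a made-up exposure network.
import networkx as nx

exposures = [
    ("Bank A", "Bank B", 120), ("Bank B", "Bank C", 80),
    ("Bank C", "Bank A", 60),  ("Bank D", "Bank B", 200),
    ("Bank B", "Bank E", 40),  ("Bank E", "Bank D", 90),
]

G = nx.DiGraph()
G.add_weighted_edges_from(exposures)

centrality = nx.eigenvector_centrality_numpy(G, weight="weight")
for bank, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{bank}: {score:.3f}")     # higher = more systemically important
```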

Portfolio Optimize

Portfolio optimization is the process of choosing the proportions of various assets to be held in a portfolio so as to balance the expected rate of return against the risk of that return. We have used the 'efficient frontier' to optimize a portfolio.

Efficient Frontier:
The efficient frontier is a concept in 'modern portfolio theory'. A combination of assets, i.e. a portfolio, is referred to as "efficient" if it offers the highest expected return for a defined level of risk or the lowest risk for a given level of expected return.
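
A Monte-Carlo sketch of this idea: sample random long-only weight vectors, compute each portfolio's return and risk, and pick the best risk-adjusted one. The expected returns and covariance matrix are made-up inputs.

```python
# Trace out candidate portfolios by random sampling and pick the max-Sharpe one.
import numpy as np

mu = np.array([0.08, 0.12, 0.15])                    # expected annual returns
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])                 # annual covariance matrix

rng = np.random.default_rng(5)
w = rng.dirichlet(np.ones(3), size=20_000)           # random long-only weights
ret = w @ mu
risk = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w))  # portfolio standard deviations

sharpe = ret / risk
best = np.argmax(sharpe)
print("max-Sharpe weights:", np.round(w[best], 3), "return:", round(ret[best], 3))
```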

Unstructured Data

The world is characterized by data explosion with companies reporting an average of 40% increase in in-house data in the recent past. Much of the data that is available is in an unstructured format making it difficult to format and evaluate. It becomes very important to tap the wealth of information that is hidden in these text formats to reveal meaningful insights from it. Text mining is the technique used to process and extract the hidden information from unstructured text.
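
A small text-mining sketch: turn raw documents into TF-IDF features and list each document's most distinctive terms (the documents here are illustrative).

```python
# TF-IDF features and top terms per document.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Delivery was late and the package arrived damaged",
    "Great product quality, fast delivery and friendly support",
    "Refund process was slow but support resolved the issue",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

for i in range(len(docs)):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[-3:][::-1]                  # three highest-weighted terms
    print(i, [terms[j] for j in top])
```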

Wealth Index

The banking sector is one of the sectors most vulnerable to fraudulent activity. Fraudulent loan applications are a very common phenomenon these days: they range from individuals who use false information to hide a credit history filled with financial problems and unpaid loans, to corporations that use accounting fraud to overstate profits and income in order to make a risky loan appear to be a sound investment for the bank.

Pharma

Cohort Study

We live in a data-driven world, with data coming in as streams of numbers, text, images and voice. With the help of analytics this data can provide meaningful alerts, decision support and process improvements, all of which have the potential to dramatically impact the success of a healthcare organization.

Epidemic Modelling

Epidemic modelling can broadly be described as the mathematical modeling of the spread of infectious disease. In this application we study the mathematical modeling of the spread of seasonal influenza.

Studying the basic dynamics of disease spread, we come across the SIR model. The SIR model stands for three compartments: S = number susceptible, I = number infectious, and R = number recovered (immune). Considering a finite population, some infected people are first introduced into the susceptible population, where they interact with its members in an unbiased (homogeneous) manner. On average, each of these infected members infects X other healthy members from the susceptible population, and so on. In epidemiology, X is referred to as the basic reproduction number (denoted R0). A scenario thus develops where the infectious disease spreads exponentially. Three mutually exclusive cases can be derived thereafter:

  • If R0 > 1 then the disease spreads in the population.
  • If R0 = 1 then each infected person infects, on average, exactly one other, and the disease neither grows nor dies out.
  • If R0 < 1 then the disease does not spread in the population.

Different strains of infectious disease have different values of R0, transmission rate and recovery rate.

On average, the time spent by a person in the infected class is 1/k days, and this is usually estimated from observational studies of infected people. For flu, 1/k is around 3 days.
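
A minimal SIR simulation is sketched below; beta and the 3-day infectious period follow the flu-like numbers mentioned above, and the population is normalised to fractions.

```python
# SIR compartment model integrated with scipy; R0 = beta / k for this toy run.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, k):
    S, I, R = y
    dS = -beta * S * I
    dI = beta * S * I - k * I
    dR = k * I
    return [dS, dI, dR]

beta, k = 0.5, 1 / 3                      # R0 = 0.5 / (1/3) = 1.5
y0 = [0.999, 0.001, 0.0]                  # fractions: susceptible, infected, recovered
t = np.linspace(0, 120, 121)              # days

S, I, R = odeint(sir, y0, t, args=(beta, k)).T
print("peak infected fraction:", round(I.max(), 3), "on day", int(t[I.argmax()]))
```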

Infection Spread Analysis

Influenza Virus or common flu is one of the most widespread diseases, hence an analysis of viral diseases on a cell-to-cell basis would enable a better understanding of this disease. This analysis can be conducted for any infectious disease with the same dynamic model.

Therapeutic Drug Analysis

Pharmacokinetics is a branch of pharmacology that studies the time course of drug absorption, distribution, metabolism and excretion, known as the ADME process. The primary goal of pharmacokinetics is to follow a given dose of drug from its administration and determine its concentration in the blood plasma at different time points. Pharmacokinetics helps clinical trials and Therapeutic Drug Monitoring (TDM) arrive at effective drug dose regimens for the population.
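
As a rough illustration, the sketch below evaluates the standard one-compartment oral-dose model (first-order absorption and elimination); the dose, volume of distribution and rate constants are illustrative values, not clinical parameters.

```python
# One-compartment oral-dose pharmacokinetics: plasma concentration over time.
import numpy as np

def concentration(t, dose=500, V=40.0, ka=1.0, ke=0.2):
    """C(t) in mg/L for dose in mg, V in L, ka and ke in 1/h (illustrative values)."""
    return (dose * ka) / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0, 24, 25)                 # hourly time points over one day
C = concentration(t)
print("Cmax:", round(C.max(), 2), "mg/L at t =", int(t[C.argmax()]), "h")
```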

Retail

Cross Selling

Using association rules, our analysts have built an analytical engine to produce insights for your retail organization.

Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, it can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for the purposes of cross-selling and up-selling, and to inform sales promotions, loyalty programs, store design, and discount plans.

Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in profit, while a promotion involving just one of the items would likely drive sales of the other.
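
A hand-rolled association-rule calculation (support, confidence, lift) over a few invented baskets illustrates this shampoo-and-conditioner style of insight.

```python
# Compute support, confidence and lift for the most common item pairs.
from itertools import combinations
from collections import Counter

baskets = [
    {"shampoo", "conditioner"}, {"shampoo", "conditioner", "soap"},
    {"shampoo"}, {"conditioner", "soap"}, {"shampoo", "conditioner"},
    {"bread", "milk"},
]
n = len(baskets)
item_count = Counter(i for b in baskets for i in b)
pair_count = Counter(frozenset(p) for b in baskets for p in combinations(sorted(b), 2))

for pair, c in pair_count.most_common(3):
    a, b = tuple(pair)
    support = c / n
    confidence = c / item_count[a]                       # P(b | a)
    lift = confidence / (item_count[b] / n)
    print(f"{a} -> {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```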

Customer Attrition

Churn rate refers to the number of customers moving out of the business over a specific period of time. It determines the steady state level of customers in any business. It involves identifying those consumers who are most likely to discontinue using your service or product.

When a customer leaves, the company loses not only the future revenue from this customer but also the resources spent on customer acquisition. Hence it is vital to make sure that the churn rate in any business would not exceed the growth rate of new customers. Churn is closely related to the concept of average customer lifetime.

For example, if 1 out of every 20 subscribers to a telecom connection discontinues his or her connection every year, the churn rate for that telecom provider would be 5%. Here we are going to determine the churn rate of one such telecom provider using Logistic Regression.

The data set considered contains 13 variables worth of information about 3333 customers, along with an indication of whether or not that customer churned (left the company).

Customer Engagement

A Cohort is a group of people who share a common characteristic over a period of time. Cohort analysis helps us to measure engagement over time.

This kind of analysis tells us whether user engagement is actually getting better over time or only appears to improve because of the high growth in new users. It basically helps us separate growth metrics from engagement metrics. In a user engagement cohort analysis, we group people based on their registration date. We then investigate how each cohort stays engaged over time, comparing the cohorts against each other.
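
A pandas sketch of a registration-cohort retention table is shown below; the event-log file and its columns (user_id, signup_date, activity_date) are assumptions.

```python
# Build a cohort retention table: share of each sign-up cohort active per month.
import pandas as pd

events = pd.read_csv("activity_log.csv", parse_dates=["signup_date", "activity_date"])

events["cohort"] = events["signup_date"].dt.to_period("M")
events["age"] = ((events["activity_date"].dt.year - events["signup_date"].dt.year) * 12
                 + (events["activity_date"].dt.month - events["signup_date"].dt.month))

cohorts = (events.groupby(["cohort", "age"])["user_id"]
                 .nunique()
                 .unstack(fill_value=0))
retention = cohorts.div(cohorts[0], axis=0)   # normalise by cohort size at month 0
print(retention.round(2))
```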

Customer Segmentation

RFM (recency, frequency, monetary) analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often he purchases (frequency), and how much he spends (monetary). It is effectively used as a technique to segment customers.

The fundamental principle underlying RFM analysis is that customers who have purchased recently, have made more purchases and have made larger purchases are more likely to respond to an offering than other customers who have purchased less recently, less often and in smaller amounts.
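
A compact RFM scoring sketch on a transaction table is shown below; the file and column names are assumptions, and the quintile cut-offs are one of several reasonable scoring choices.

```python
# Quintile-based RFM scores from a transaction log.
import pandas as pd

tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])   # assumed columns
snapshot = tx["order_date"].max()

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# 5 = best: most recent, most frequent, highest spend
rfm["R"] = pd.qcut(rfm["recency"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm.head())
```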

Forecasting

Businesses rely on forecasts of sales to plan production, justify marketing decisions, and guide research. An efficient method of forecasting one variable is to find a related variable that leads it by one or more time intervals. The closer the relationship and the longer the lead time, the better this strategy becomes. The trick is to find a suitable lead variable.

Two common time series models and their assumptions:

  • ARIMA: residuals have zero mean and constant variance; observations are correlated with one another and the correlations do not change with time.
  • GARCH: residuals have zero mean but a variance that changes over time (conditional heteroskedasticity); observations are correlated with one another.

ARIMA Models
The ARIMA model has two components:

  • AR component: y_t = a_1 y_(t-1) + e_t, where y_t = Y_t − Ȳ (the mean-adjusted series), y_(t-1) is the series in the previous period, a_1 is the lag-1 autoregressive coefficient, and e_t is the noise or residual, assumed to be random in time and normally distributed.
  • MA component: y_t = e_t + c_1 e_(t-1), where c_1 is the first-order moving average coefficient.

An ARIMA(p,d,q) model includes AR as well as MA parameters and explicitly includes differencing. In general, ARIMA models are used when the data series is non-stationary (i.e., its mean and variance change over time).

The following example forecasts the likely demand over the next few months for all sectors of the fizzy-drink industry. A variation on the strategy of seeking a leading variable is to find a variable that is associated with the variable we need to forecast and is easier to predict.
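
A hedged forecasting sketch with statsmodels: fit an ARIMA(1,1,1) to a simulated monthly sales series and project the next six months. In practice the order would be chosen from ACF/PACF plots or information criteria.

```python
# Fit an ARIMA model and forecast the next six months of a monthly series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
trend = np.linspace(100, 160, 60)
sales = pd.Series(trend + rng.normal(0, 5, 60),
                  index=pd.date_range("2019-01", periods=60, freq="MS"))

model = ARIMA(sales, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))            # demand forecast for the next six months
```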

Fraud Detection

Fraud detection is an important area of potential application for data mining techniques, given the economic and social consequences that are usually associated with these illegal activities. From the perspective of data analysis, frauds are usually associated with unusual activities – deviations from the norm.
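
In that "deviation from the norm" spirit, the sketch below flags anomalous transactions with an Isolation Forest; the transaction features are simulated.

```python
# Flag transactions far from typical behaviour with an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(9)
normal = rng.normal(loc=[50, 3], scale=[15, 1], size=(2000, 2))   # amount, item count
fraud = rng.normal(loc=[900, 1], scale=[100, 0.5], size=(10, 2))  # unusually large
X = np.vstack([normal, fraud])

clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = clf.predict(X)                      # -1 = anomaly, 1 = normal
print("flagged transactions:", int((flags == -1).sum()))
```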

Identifying Persuadables

Uplift modeling is one of the popular techniques for measuring the ROI of marketing campaigns. Uplift is a method of predicting the change in individual behavior caused by a targeted marketing activity; in simple terms, it measures the true incremental impact of that activity. Since uplift modeling focuses only on incremental responses, it provides very strong return-on-investment cases when applied to traditional demand generation and retention activities.

Fundamentally, the customers exposed to a marketing campaign can be segmented into the following groups:
The Persuadables: customers who only respond to the marketing action because they were targeted
The Sure Things: customers who would have responded whether they were targeted or not
The Lost Causes: customers who will not respond irrespective of whether or not they are targeted
The Sleeping Dogs: customers who are less likely to respond because they were targeted

Promotion Effectiveness

Propensity literally means 'a natural tendency to behave in a particular way'. Propensity modelling is a statistical matching technique that attempts to estimate the effect of a treatment on an outcome such as ROI under some predefined conditions. The outcome difference is obtained by comparing the treated and untreated under the same predefined conditions.

Suppose we send a 'loyalty coupon' to some customers to see whether it impacts conversions. Of the 10k customers who received a coupon, 7000 purchased (70%), and of the 22k customers who didn't receive a coupon, 8800 purchased (40%). This naive comparison suggests that the treatment (loyalty coupons) has a positive effect, increasing conversions by 30 percentage points.
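
Because such raw comparisons can be confounded by who was chosen to receive the coupon, the outcome difference is usually computed within comparable groups; a propensity-score sketch is shown below, with assumed column names (got_coupon, purchased) and feature list.

```python
# Stratify on an estimated propensity score, then compare conversion within strata.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def adjusted_lift(df, features):
    ps_model = LogisticRegression(max_iter=1000).fit(df[features], df["got_coupon"])
    df = df.assign(ps=ps_model.predict_proba(df[features])[:, 1])
    df["bucket"] = pd.qcut(df["ps"], 5, labels=False)      # five propensity strata

    lifts = []
    for _, g in df.groupby("bucket"):
        treated = g.loc[g["got_coupon"] == 1, "purchased"].mean()
        control = g.loc[g["got_coupon"] == 0, "purchased"].mean()
        lifts.append(treated - control)
    return float(np.mean(lifts))           # average conversion lift across strata
```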

Recommendation

Recommender Systems are used to predict the best products to offer to customers. These systems have become extremely popular in virtually every single industry, helping customers to find products they'll like. Most people are familiar with the idea, and nearly everyone is exposed to several forms of personalized offers and recommendations each day.

Unstructured Data

The world is characterized by data explosion with companies reporting an average of 40% increase in in-house data in the recent past. Much of the data that is available is in an unstructured format making it difficult to format and evaluate. It becomes very important to tap the wealth of information that is hidden in these text formats to reveal meaningful insights from it. Text mining is the technique used to process and extract the hidden information from unstructured text.

Travel

Customer Engagement

A Cohort is a group of people who share a common characteristic over a period of time. Cohort analysis helps us to measure engagement over time.

This kind of analysis tells us whether user engagement is actually getting better over time or only appears to improve because of the high growth in new users. It basically helps us separate growth metrics from engagement metrics. In a user engagement cohort analysis, we group people based on their registration date. We then investigate how each cohort stays engaged over time, comparing the cohorts against each other.

Customer Lifetime Value

Most firms tend to focus on either cost management or revenue growth. When a firm adopts one of these approaches it loses out on the other. What is needed is an approach which balances the two, creating market-based growth while evaluating profitability and ROI. Optimal allocation of resources and efforts across profitable customers and cost effective and customer specific communication channels (marketing contacts) is the key to the success of such an approach. This calls for assessing the value of individual customers and employing customer level strategies based on customers’ worth to the firm. But what is the value of a customer? Can customers be evaluated based only on their past contribution to the firm?

Customers’ value has to be based on their contribution to the firm across the duration of their relationship with the firm. In simple terms, the value of a customer is the value the customer brings to the firm over his/her lifetime.
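
A simple contractual-setting formula for this idea is sketched below: lifetime value as the sum of discounted annual margins weighted by the retention probability. All inputs are illustrative.

```python
# Expected customer lifetime value from annual margin, retention and discount rate.
def customer_lifetime_value(annual_margin, retention, discount, horizon_years=10):
    value = 0.0
    for year in range(horizon_years):
        # margin in `year`, weighted by the chance the customer is still active
        # and discounted back to today
        value += annual_margin * (retention ** year) / ((1 + discount) ** year)
    return value

print(round(customer_lifetime_value(annual_margin=120, retention=0.8, discount=0.1), 2))
```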

Customer Segmentation

RFM (recency, frequency, monetary) analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary). It is effectively used as a technique to segment customers.

The fundamental premise underlying RFM analysis is that customers who have purchased recently, have made more purchases and have made larger purchases are more likely to respond to your offering than other customers who have purchased less recently, less often and in smaller amounts.

Dynamic Pricing

Dynamic Pricing, also known as time-based pricing, is a form of price discrimination in which a company changes the price of a product or service depending on some set of factors.

It is common among industries — such as the tourism and transportation industries — whose business increases or decreases greatly under predictable sets of circumstances. Dynamic pricing allows a business to maximize its profits because it is better able to assign prices that take into account shifting levels of demand and willingness to pay.

Dynamic pricing is most effective when an industry is able to accurately predict consistent changes in demand for a product or service. Dynamic pricing can be very difficult to implement when these changes are less predictable, or if it is easy for a consumer to change his or her habits to take advantage of the product or service when the price is lower. For instance, it is difficult for a retail store to successfully implement dynamic pricing, because it would be easy for consumers to adjust their shopping habits to avoid the higher costs.

Margin Analysis

Calculating and tracking various profit margins reflects how efficiently a firm uses its resources. The term "margin" can apply to the absolute number for a given profit level and/or the number as a percentage of net sales/revenues. Margins allow investors to judge, over time, management’s ability to manage costs and expenses and to generate profits. Management’s success or failure determines the company’s profitability. Strong sales growth is meaningless if management allows costs and expenses to grow disproportionately. Margin Analysis helps in understanding the percentage of margin various stake holders are getting through the business.
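
A tiny illustration of the margin calculations, with invented income-statement figures.

```python
# Gross, operating and net margins as a percentage of revenue.
revenue, cogs, opex, interest_and_tax = 1_000_000, 620_000, 230_000, 60_000

gross_margin = (revenue - cogs) / revenue
operating_margin = (revenue - cogs - opex) / revenue
net_margin = (revenue - cogs - opex - interest_and_tax) / revenue

for name, m in [("gross", gross_margin), ("operating", operating_margin), ("net", net_margin)]:
    print(f"{name} margin: {m:.1%}")
```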

BI

BI

Mobile-Social - Reporting application developed for Mobile-social project

Travel - Reporting solutions for travel industry

Retail - Reporting solutions for Retail industry

Textiles - Reporting solutions for the Textile industry, done for one of our Geni clients.

BHEL - Reporting application interface developed for BHEL (Government organisation)

NREGA - Reporting application developed for the Tamil Nadu Government for the NREGA project