Gaining Infinite Insight with KXEN
by Nigel Magson & Andy Hindson of Adroit Data & Insight.
19 June 2012: We were delighted to be given the opportunity to review KXEN. In analyst terms it’s a bit like being given the chance to drive a Ferrari, so it was not an offer we were going to turn down. Andy has been using some of KXEN’s modules for a number of years, whilst the rest of the team are used to analytic tools such as SAS, SPSS, FastStats or Smartmodeller. Read on for our overview of one of their key products, InfiniteInsight, and forgive us if we sound too much like Jeremy Clarkson.
KXEN (originally short for Knowledge Extraction Engines) is a market-leading predictive analytics and data mining software business. It has sales offices around the world and is headquartered in San Francisco, with Research and Development based in Paris. It is recognised by the key industry analysts, including Forrester and Gartner.
The main markets they service are Financial Services, Telecoms and Retail, where customer and data volumes are high and, consequently, the opportunity to add value through predictive analytics is greatest. These are also the organisations which tend to have large analytical teams and similarly sized budgets to support them and the tools they require. Consequently, KXEN’s product development direction has been in support of these key markets, including the development of the (Modelling) Factory: the deployment and configuration of automated models that run in real time on an organisation’s systems infrastructure. They readily admit they’re not the tool for small datasets.
KXEN positions its tools as key enablers for maximum productivity by data-mining specialists and business analysts alike. More recently they have developed InfiniteInsight Genius which they claim puts data mining capability into the hands of Marketers. It achieves this by simplifying the modelling process through the use of a GUI that guides the Marketer through the modelling process.
KXEN also provides an API for their predictive analytical engine, which has been widely adopted by Marketing Services Providers (MSPs) including Alterian, Experian and Neolane, and by the Database and Business Intelligence community including Oracle, Teradata and Sybase.
InfiniteInsight Product Set
InfiniteInsight is the name that describes and prefixes all the KXEN family of modelling products. The version available (version 6.0.0) for this review had the following modules;
The KXEN modules that will not be reviewed here are;
The InfiniteInsight Scorer module is self-explanatory: it enables the KXEN User to apply the model(s) they have built in a number of ways, including scoring directly onto the database or enterprise-class scoring, which is the integration of the score back into an Organisation’s operational systems, e.g. Call Centre or Web Site.
KXEN has invested in this area and supports all the major databases (Teradata, Sybase, Netezza, SQL Server, IBM DB2, etc) and the main statistical modelling packages (SAS and SPSS).
InfiniteInsight Factory is primarily aimed at the high end of the predictive analytics market and enables the full automation or industrialisation of modelling processes which are configured and deployed to run 24/7.
KXEN customers deploy InfiniteInsight for a variety of predictive analytics tasks, including optimisation of the customer lifecycle: acquisition, cross-sell and up-sell campaigns, and customer retention. Additionally, in Financial Services it can play a key role in reducing risk and fraud, and within Telecoms the focus is churn management, social network analysis and cross-sell of further services.
Look and Feel
We felt that the InfiniteInsight GUI looks dated and would benefit from an overhaul, with due consideration given to incorporating a sleeker modelling workflow. Navigation isn’t always straightforward, as the menu naming isn’t obvious. That said, as with most software, once you know what you’re doing you forget about this. On the plus side, InfiniteInsight does provide a rich set of features and options that can be tailored and tweaked to address the nuances presented by different data scenarios.
Start InfiniteInsight and you’re presented with the Modelling Assistant pictured above.
KXEN has recently beefed up the Explorer module, which enables the Analyst to create the analysis or modelling dataset. It includes some perplexing function names (probably a legacy of translation from the French?), such as Data Manipulation, which is the functionality to merge datasets together, and Perform an Event Log Aggregation, which is actually the process of aggregating child data to the parent table, e.g. summing transaction values for customers into a new table. The latter does offer some powerful options for creating date-based aggregations across time periods that may be of interest for modelling: years, quarters, months and days. Again, this is fine once you’ve got used to it.
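To make the idea of aggregating child data to the parent table concrete, here is a minimal Python sketch that sums transaction values per customer and per quarter. It is only an illustration of the concept, not KXEN's implementation: the sample rows and the names `quarter_label` and `aggregate_by_period` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical child table: one row per transaction
# (customer_id, transaction_date, value).
transactions = [
    ("C001", date(2012, 1, 15), 20.0),
    ("C001", date(2012, 2, 3), 35.5),
    ("C001", date(2012, 5, 20), 10.0),
    ("C002", date(2012, 3, 9), 99.0),
]


def quarter_label(d):
    """Map a date to a label such as '2012-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def aggregate_by_period(rows, period_of):
    """Sum transaction values per customer and per time period."""
    totals = defaultdict(lambda: defaultdict(float))
    for customer_id, txn_date, value in rows:
        totals[customer_id][period_of(txn_date)] += value
    # Convert to plain dicts: one entry per customer (the parent level).
    return {cust: dict(periods) for cust, periods in totals.items()}


quarterly_spend = aggregate_by_period(transactions, quarter_label)
print(quarterly_spend)
```

Swapping `quarter_label` for a function that returns the year, month or day gives the other date-based aggregations mentioned above.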
It should be noted that data manipulation creates SQL code that has to be executed on the source database, so creation of the analysis dataset is external to KXEN InfiniteInsight, although the actual modelling process is internal.
There are some useful features available within Explorer for defining multiple modelling variables to build concurrently through the use of wildcards but I suspect that many Analysts will opt for preparing their modelling dataset using different tools.
The naming of the Social component is slightly misleading, as it has nothing to do with social media such as LinkedIn, Facebook or Twitter. It provides functionality to create links and map relationships within transactional data, and to display these networks of influence. This is very powerful and is primarily aimed at the Telecoms market, where the nature and volume of the data supports its use, but we could see that it would have applications in areas such as social media if the supporting data was available.
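As a rough illustration of what creating links from transactional data means, the Python sketch below turns a list of call records into weighted links and counts how many distinct contacts each subscriber has. The call records and the function names `build_links` and `contact_counts` are hypothetical, and this says nothing about how the Social module itself works internally.

```python
from collections import Counter, defaultdict

# Hypothetical call detail records: (caller, callee).
calls = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"), ("D", "A")]


def build_links(records):
    """Count interactions per unordered pair of parties."""
    weights = Counter()
    for caller, callee in records:
        if caller == callee:
            continue  # ignore self-calls
        weights[tuple(sorted((caller, callee)))] += 1
    return weights


def contact_counts(weights):
    """Number of distinct contacts per party (the node's degree)."""
    neighbours = defaultdict(set)
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {party: len(others) for party, others in neighbours.items()}


link_weights = build_links(calls)
degree = contact_counts(link_weights)
print(link_weights)
print(degree)
```

Measures such as the number of contacts can then be fed into a model as additional explanatory variables.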
The Toolkit allows the User to review and visualise existing datasets (Open the Data Viewer), transfer a data source to another location or format (Perform a Data Transfer), or export a list of distinct values (List Distinct Values in Data Set). The final option is to generate statistics on the variables in the data set (Get Descriptive Statistics for a Data Set). Again, apart from perhaps the Descriptive Statistics option, I suspect that the Analyst will be using different tools for the basic data processing tasks available.
Modeler is the key module for defining and building the different types of modelling scenarios, and is the focus of the rest of the review. The first task is to define the dataset you wish to build a model upon. A previously defined modelling dataset can be selected, a new dataset can be chosen, or Explorer can be used to define the data. Each variable in the data set is assigned a type (nominal, ordinal or continuous), and this type dictates how the variable will be treated and encoded during modelling.
With the modelling dataset defined, the Analyst selects which type of model they want to create: Classification/(Ridge) Regression, Clustering, Time Series or Association Rules (Next Best Offer). I’ll use the regression model to illustrate how Modeler works: a target variable is selected along with a set of explanatory variables.
At the heart of Modeler is a set of algorithms built on Structured Risk Minimisation (SRM). SRM saves the Analyst time, as they do not need to worry about:
- The number of explanatory variables they present to the model
- Concerns about multicollinearity
- Making any assumptions about the distribution of the explanatory variables
- Concerns regarding missing values
As a consequence the modelling process is faster and more efficient, as all the above are time consuming activities if checked and validated from first principles.
The question is therefore: how does InfiniteInsight Modeler achieve this? The answer is that much of the modelling grunt work is automated. In a traditional approach to modelling, preparing data would account for 40-60% of the time; this is drastically reduced, as Modeler automatically encodes data as it is loaded. For instance, continuous variables are assigned to 20 “bins”, each containing 5% of the data. This is configurable but is generally left as is.
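The binning of continuous variables described above is a form of equal-frequency (quantile) binning. The short Python sketch below shows the general idea only; it is not KXEN's encoding routine, and the function name `equal_frequency_bins` and the sample values are hypothetical.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins=20):
    """Assign each value a bin index in 0..n_bins-1 so that every bin
    holds roughly the same share of the rows (5% each for 20 bins).

    Simplification: tied values may land in adjacent bins, which a
    production encoder would normally avoid.
    """
    n = len(values)
    # Positions of the rows, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, row in enumerate(order):
        bins[row] = rank * n_bins // n
    return bins


# Hypothetical continuous variable with 100 distinct values.
values = [float(v) for v in range(100, 0, -1)]
bins = equal_frequency_bins(values)
bin_sizes = Counter(bins)
print(sorted(bin_sizes.items()))
```

With 100 rows and 20 bins, each bin receives 5 rows, i.e. 5% of the data.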
The modelling dataset is also automatically split, based upon the cutting strategy, into Estimation, Validation and Test sets. Various cutting strategies are available and the help provides guidance as to which may be most appropriate for your particular modelling scenario. There is also the option to configure your own specific cutting strategy.
Estimation generates the different models; Validation selects the best model among those generated, retaining only those explanatory variables which make a significant contribution; and Test verifies the performance of the selected model on unseen data. This is the “hold-out” data, and it enables the calculation of the model’s robustness.
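A simple random cutting strategy of this kind can be sketched in Python as follows. The 60/20/20 percentages, the fixed seed and the function name `random_cut` are assumptions made for illustration; the review does not state the proportions KXEN uses by default.

```python
import random


def random_cut(rows, percentages=(60, 20, 20), seed=42):
    """Randomly split rows into estimation, validation and test sets.

    percentages are whole numbers summing to 100 (assumed values).
    """
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_est = n * percentages[0] // 100
    n_val = n * percentages[1] // 100
    return {
        "estimation": shuffled[:n_est],
        "validation": shuffled[n_est:n_est + n_val],
        "test": shuffled[n_est + n_val:],  # hold-out data
    }


parts = random_cut(range(100))
print({name: len(rows) for name, rows in parts.items()})
```

Every row lands in exactly one of the three sets, so the test rows are never seen while models are being estimated or selected.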
A model diagnostics report is created which is easy to interpret with experience, the two key measures being Ki (Model Quality) and Kr (Model Robustness). Additional information is on hand, including contributions by explanatory variables so it is simple to see which variable is contributing most to the model and, of course, the obligatory gains curve.
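For readers unfamiliar with the gains curve, the Python sketch below computes a cumulative gains curve from model scores and actual outcomes. The scores, the outcomes and the function name `cumulative_gains` are hypothetical. It does not attempt to reproduce Ki or Kr, whose definitions are not covered in this review.

```python
def cumulative_gains(scores, targets, n_points=5):
    """Return (fraction of customers contacted, fraction of responders
    captured) pairs, contacting the highest-scored customers first."""
    ranked = [t for _, t in sorted(zip(scores, targets),
                                   key=lambda pair: pair[0],
                                   reverse=True)]
    total_positives = sum(ranked)
    if total_positives == 0:
        raise ValueError("targets contain no positive outcomes")
    curve = []
    for k in range(1, n_points + 1):
        cutoff = len(ranked) * k // n_points
        captured = sum(ranked[:cutoff]) / total_positives
        curve.append((k / n_points, captured))
    return curve


# Hypothetical model scores and actual outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, targets)
for contacted, captured in gains:
    print(f"top {contacted:.0%} of customers -> {captured:.0%} of responders")
```

In this made-up example, contacting the top 40% of customers by score captures 75% of the responders, whereas random selection would be expected to capture about 40%.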
We didn’t pursue the analyst’s track test to see whether we could build a better model in a conventional tool. The point is really irrelevant. KXEN will build great models given the right data. It will do this faster than an analyst using conventional tools ever could. It can update and redeploy faster, and it can do so across multiple models. Case closed. If you only ever build a couple of models and can spend plenty of time over them, don’t bother with KXEN.
Picking up our car analogy. As one of the supercars of the analytics world of course we want one, despite and because of its occasional quirks. KXEN software is an excellent addition to the customer insight team’s toolbox where it would comfortably sit alongside traditional analytical software to explore features discovered in InfiniteInsight and tools to build and engineer modelling datasets. As an overall component of your CRM architecture, it will increase the speed and efficiency of the Modelling / Analysis team enabling them to quickly understand the feasibility of whether a particular scenario can be modelled.
KXEN commercials place it at the high end of analytical engines, but if you are considering this, a bit like a supercar, price won’t be your only interest. By recognising the commercial benefits of delivering powerful models fast, from speed and reliability of updating, through to ease of deployment, a convincing business case could be constructed that would counter the clearly higher price tag of such a solution. The purchase decision for InfiniteInsight won’t be made on the GUI or its data preparation functionality. What it will be judged on is its ability to build quality models that deliver efficiencies and demonstrable ROI. Modeler delivers this through the algorithms at the heart of the product which have harnessed Structured Risk Minimisation (SRM). With KXEN software installed and operating, the business might find itself in a position to decide whether it wants more productivity from the existing modelling team or achieve the same but with a smaller team. Hopefully for the analysts out there the former!
KXEN’s product positioning claims that it is aimed at marketers, but our experience is that marketers are not the users of such tools. We see this as still very much the domain of the marketing analyst or statistician, who will be tasked with the modelling work, as marketers do not generally have the data wherewithal or statistical background needed to execute the full modelling lifecycle. The software, however good, still requires the modelling scenario to be properly framed, after which an appropriate modelling universe needs to be defined along with the target variable and potential explanatory variables. All these actions require ‘hands-on’ data work to engineer a modelling dataset that is ‘fit for purpose’.
Of course, data is the key ingredient for any modelling work, and the quality of that data is paramount to the success of the model, as is the breadth and volume of data available within the organisation in an accessible form. If these criteria are met then KXEN InfiniteInsight will undoubtedly provide the quickest answers and deliver quality models.
Addendum - Structured Risk Minimization
The primary challenge for statisticians has been to build highly accurate models that are also reliable. This is particularly challenging with the advent of Big Data where there are high volumes of potential variables to use. Traditional statistics generally only produce an accurate model with a few variables, so an expert is needed to reduce the number of variables before building a model. The more variables there are, the more difficult it can be to build a reliable model. Only the expertise of the statistician or competent analyst guarantees the reliability of the model.
SRM was a breakthrough in mathematics and statistics made by the Russian mathematicians Vladimir Vapnik and Alexey Chervonenkis, which for the first time makes it possible to automatically build reliable and accurate models. In contrast to traditional statistical models, SRM models become more accurate and are still reliable as the number of variables is increased. Model Accuracy and Reliability are determined by the data, not by the expert. Certainly worth a further read if you are interested in how the engine works.
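For those who want a starting point for that further read, the trade-off behind SRM is usually summarised by the Vapnik-Chervonenkis generalisation bound. The textbook form for classification is shown below as background only; the review does not describe how KXEN applies it internally. Here R(f) is the true error of a model f, R_emp(f) is its error on the n training examples, h is the VC dimension (a measure of the capacity of the family of models f is drawn from), and the bound holds with probability at least 1 − η.

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

SRM chooses the model that minimises the right-hand side as a whole, rather than the training error alone, which is why accuracy (the first term) and reliability (the second term) are controlled together.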