REVIEW: Welcome to the adaptive cleansing engine...

by James Lawson.

Never a glamorous subject, address matching is nevertheless a critical part of customer data management and processing. Extra accuracy can translate directly into greater income and reduced costs, not to mention multiple other benefits. Addressing experts Postcode Anywhere have taken a fresh approach to the matching challenge, and this month we get a sneak preview of their new application, Cleanse+.

Replacing keys


There have been various approaches to address matching over the years. Key-based matching is probably the most common: this involves generating a set of “keys” (character strings) that represent the addresses in question. These keys can then be compared using a variety of techniques in order to decide whether there is a match between them or not.
These techniques are usually governed by complex rule sets, built up manually over many years from the experience gained in real-world processing. Each technique has its own advantages and disadvantages, the arguments about which would easily fill this magazine and a few others besides.
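To make the idea of a match key concrete, here is a minimal sketch in Python. It is not taken from any vendor's product: the key layout (postcode, premise number, vowel-stripped street word), the tiny abbreviation table and the function name `match_key` are all assumptions made for illustration, and a real rule set would be vastly larger.

```python
import re

# Hypothetical abbreviation table; production rule sets run to many thousands of entries.
ABBREVIATIONS = {"road": "rd", "street": "st", "avenue": "ave"}


def match_key(address: str, postcode: str) -> str:
    """Build an illustrative match key: POSTCODE|premise number|street skeleton."""
    # Lower-case the address and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    # Normalise common words to one spelling so variants produce the same key.
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # First token starting with a digit is treated as the premise number.
    number = next((t for t in tokens if t[0].isdigit()), "")
    # First purely alphabetic token is treated as the street name.
    words = [t for t in tokens if t.isalpha()]
    street = words[0] if words else ""
    # Keep the first letter, drop later vowels, and truncate to four characters.
    skeleton = (street[:1] + re.sub(r"[aeiou]", "", street[1:]))[:4].upper()
    compact_postcode = re.sub(r"\s+", "", postcode).upper()
    return f"{compact_postcode}|{number}|{skeleton}"


if __name__ == "__main__":
    # Two differently formatted versions of the same (made-up) address.
    print(match_key("12 High Street", "AB1 2CD"))
    print(match_key("12, HIGH ST.", "ab1 2cd"))
```

Two records are then treated as candidates for a match when their keys agree. The weakness is visible even here: every new formatting quirk needs another hand-written rule.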
The vendor has chosen to consign match keys, fuzzy matching and so forth to the dustbin of history and has taken a new tack by employing what it calls an “adaptive cleansing engine”. This involves using a complex self-learning mathematical model that looks at address structure and the patterns within addresses, then builds its own rules as it goes rather than relying on manually coded rules to govern it. The software can carry multiple possible versions of a final record right through processing and only make the choice right at the end.
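The vendor has not published how its engine works, so the following Python sketch only illustrates the general idea of carrying several candidate versions of a record through processing and deciding at the end. The stage functions, the weights, the `KNOWN_ADDRESSES` set and the `beam_width` parameter are all hypothetical.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]           # (address text, running score)
Stage = Callable[[str], List[Candidate]]  # returns (variant, weight) pairs

# Hypothetical stand-in for a reference file such as PAF.
KNOWN_ADDRESSES = {"12 high street"}


def expand_trailing_st(text: str) -> List[Candidate]:
    """Keep both readings of a trailing 'st' rather than committing to one."""
    if text.endswith(" st"):
        return [(text[:-3] + " street", 0.7), (text, 0.3)]
    return [(text, 1.0)]


def check_reference(text: str) -> List[Candidate]:
    """Reward candidates that appear in the reference set."""
    return [(text, 1.0 if text in KNOWN_ADDRESSES else 0.1)]


def run_pipeline(raw: str, stages: List[Stage], beam_width: int = 3) -> List[Candidate]:
    """Push every surviving candidate through each stage, keeping the best few."""
    candidates: List[Candidate] = [(raw, 1.0)]
    for stage in stages:
        expanded = [
            (variant, score * weight)
            for text, score in candidates
            for variant, weight in stage(text)
        ]
        expanded.sort(key=lambda c: c[1], reverse=True)
        candidates = expanded[:beam_width]
    return candidates


def choose(candidates: List[Candidate]) -> str:
    """Make the final decision only once all stages have run."""
    return max(candidates, key=lambda c: c[1])[0]


if __name__ == "__main__":
    survivors = run_pipeline("12 high st", [expand_trailing_st, check_reference])
    print(choose(survivors))
```

The point of the design is that no stage has to make an irreversible decision: an interpretation that looks unlikely early on can still win if later evidence supports it.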
This method obviates the need to painstakingly write thousands of lines of rules based on years of experience; the vendor claims that rule-based address cleansing software needs well over 100,000 rules coded into it before it starts to be effective. It also dispenses with the inflexibility of rule-based systems: any change to address formats requires rule changes, and these systems can typically only cope with addresses in one or a handful of countries due to the variation in (and lack of) global addressing standards.
One example of this inflexibility is the traditional need to re-format records prior to PAF (Royal Mail’s Postcode Address File) and other matching in order to improve the chances of a successful match. A self-learning solution should be format-agnostic, as long as you initially train it on the sort of data you will subsequently want to process.
The downside is that the learning process is entirely “black box”. It also hoovers up vast amounts of processing power, though the vendor relies on the massively scalable capabilities of the “cloud” to take care of that. However, though users have no influence over the learning process, the software is certainly not completely hands-off.
It’s possible to create custom look-up lists to be used by the software on certain jobs, while users can also inspect the rules the software builds up to check they make sense. Over time, the software has the capability to build custom lists itself and can also add to or change those built for it using its learning capability.
The matching process is not the only difference either. Rather than extracting and then reloading address data, the vendor has gone for the “in-database” approach whereby data is cleaned in-situ by connecting directly to a database table. Aside from mailing list processing, this is very much the modern standard as it avoids complex refreshes and also suits 24/7 systems where there is simply no time available for offline servicing.
It’s on-demand too. The vendor originally made its name through innovation in its use of web services to deliver address lookup and many other customer data-related information services. This new tool is no different, with the software and associated reference files running centrally.
Users send input files down the line and verified files come back. Given today’s high bandwidths, remote processing is unlikely to be a limitation for most users but the company is also planning an installed version that will run within its Enterprise Server application and so deal more effectively with the largest batch jobs.
Though the version tested employed a straightforward browser-based interface, the release version of Cleanse+ will use a small desktop client instead. This will allow the software to interact with local data sources and databases as desired using standards like ODBC, and should also enhance security compared to browser-based applications.
The demo version of the interface was clear and clean, with the minimum of options. These include whether to retain vanity addresses, the number of output address lines and the fields required in the output file.

Extreme simplicity


In contrast to the complexity under the bonnet, using the software in its current incarnation is an exercise in extreme simplicity. It offers auto-field mapping – also based on self-learning – which worked flawlessly in the demo version. There’s an option to preview the input records, then it goes straight into a batch PAFing routine, matching test records against a standard, non-optimised PAF reference file and displaying the results.
Unlike other PAF matching modules, there are very few configuration options indeed. As with its current addressing products, the vendor says this is to avoid confusion and complication. It also plans to use its ability to log which addresses have been captured or processed in order to alert clients when updates are needed. This will be an integral part of the Plus suite, with the future goal being fully automatic updating with no client intervention.
As well as the PAF file, the software has already had many thousands of addresses run through it to aid the learning process and the test matches were spot on. However, without running full test datasets, it’s impossible to estimate the matching performance of this or any other application. As ever, it’s up to buyers to decide whether it will achieve the results they want using their own test data.
This application is not yet a full competitor to all existing solutions. That is partly because it runs remotely over the web and so is necessarily limited in the volume of data it can batch process, but mostly because it still needs further tools beyond basic PAF matching to handle the various other matching applications: deduplication, multi-file merge/purge and so forth. According to the vendor, name checking against CACI’s Ocean and suppression matching will arrive shortly, along with other tools in the Plus suite that will handle data capture, updating and so on.

Different approach


It’s worth noting that, though innovative, this isn’t the first time that probabilistic techniques have replaced deterministic (or rule-based) techniques in record matching. There’s a history of research in this area in the computer science fields of speech recognition and natural language processing, as well as specifically in name and address record matching.
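For readers unfamiliar with the probabilistic approach, the classic textbook formulation in record linkage is the Fellegi-Sunter model, in which each field contributes a weight depending on whether it agrees. The Python sketch below illustrates that general technique only; it is not a description of Cleanse+, and the field names and probabilities are made-up values.

```python
import math

# Illustrative (made-up) probabilities per field:
#   m = P(field agrees | the two records truly match)
#   u = P(field agrees | the two records do not match)
FIELD_PROBS = {
    "postcode": (0.95, 0.01),
    "street": (0.90, 0.05),
    "number": (0.85, 0.10),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum log-likelihood-ratio weights over all fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) == b.get(field):
            # Agreement is evidence for a match: positive weight.
            total += math.log2(m / u)
        else:
            # Disagreement is evidence against a match: negative weight.
            total += math.log2((1 - m) / (1 - u))
    return total


if __name__ == "__main__":
    rec_a = {"postcode": "AB12CD", "street": "high street", "number": "12"}
    rec_b = {"postcode": "AB12CD", "street": "high street", "number": "14"}
    print(match_weight(rec_a, rec_a))  # all fields agree
    print(match_weight(rec_a, rec_b))  # premise number disagrees
```

A pair is accepted, rejected or sent for review according to where its total weight falls against chosen thresholds. In a learning system the probabilities would be estimated from data rather than typed in by hand.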
The self-learning approach seems to be gaining supporters. US matching specialist Netrics, acquired by MDM vendor Tibco early last year, also uses a mathematical model “that mimics human perception of similarity, identifying hidden relationships in the data.” Its matching engine also includes a self-learning capability that continually improves its matching over time by evaluating other manual matches made by business users. Maybe this is the way we’ll all be matching records in a few years?
Certainly this is an impressive start and Postcode Anywhere’s innovation is to be applauded, and not just for the techniques it applies. It brings a welcome simplicity to the matching process that is well suited to online processing services in particular. How this approach will suit Marketing Service Providers is less easy to predict. Once again, the company is adopting the latest technology, and rattling the cages of established vendors in the process. It should be encouraged for its investment in advancing the addressing industry.

Online fire retailer makes "vast improvement" with address validation

17 May 2012: Online fire retailer GasFire.co.uk has reported a ‘vast improvement’ in its ordering process after installing address validation from Postcode Anywhere on its Magento ecommerce website.

Barnet Council fined £70k for losing sensitive data in burglary

16 Jun 2012: The London Borough of Barnet has been fined £70,000 for losing paper records containing highly sensitive and confidential information, including the names, addresses, dates of birth and details of the sexual activities of 15 vulnerable children or young people.

Kvarby joins Next Performance as COO

14 May 2012: Next Performance, the real-time advertising marketing platform specialising in next generation retargeting services, has announced that Bjorn Kvarby has joined the company’s management team as Chief Operating Officer.

Celebration of life and work of Derek Holder set for July

11 May 2012: A tribute event to celebrate the life and achievements of the late Professor Derek Holder F IDM will take place at London’s Royal Geographical Society on Friday 6 July 2012 from 2.30pm.

Hortonworks strikes Hadoop deal with Kognitio

8 May 2012: Hortonworks, a leading commercial vendor promoting the innovation, development and support of Apache Hadoop, has partnered with in-memory data analytics pioneer Kognitio.

NICE launches new analytics-driven real time customer interaction solution

3 May 2012: Global intent-based solutions provider NICE has introduced an integrated customer interaction management solution that it says impacts on every stage of the interaction lifecycle.

Semphonic and iJento announce global partnership

3 May 2012: Multichannel customer intelligence specialist iJento and web analytics consultancy Semphonic have announced a new global partnership to collaboratively help organisations track and understand both digital and multichannel customer journeys.

What is the value of a name?

30 Apr 2012: Why do some businesses generate far more from their databases than others? It often comes down to lack of measurement and proper ROI metrics, says Mark Patron.

Teradata to acquire eCircle

1 May 2012: Teradata, the global analytic data solutions company, has signed a definitive agreement to acquire Munich-based eCircle, the European leader in cloud-based digital marketing.

Could it be Magiq?

27 Apr 2012: The complexity of managing behavioural targeting and real-time web personalisation has meant that very few practical solutions exist for marketers, but all that could be about to change with the launch of LifecycleMAGIQ, discovers James Lawson.

Callcredit powers through tough year with 11% hike in profits

27 Apr 2012: Callcredit Information Group has posted its annual results for 2011, showing a hike of 11.2% in profits from operations from a 50% increase in revenues.

Energy supplier First Utility selects StrongMail

26 Apr 2012: UK energy supplier First Utility has selected StrongMail On-Demand to drive its lifecycle email marketing campaigns.

Over-contact, poor data management hits charities

25 Apr 2012: More than 50% of UK adults would stop donating to a charity if it contacted them too frequently, according to new research examining consumer perceptions of how charities market their services and the way they use and manage supporter data.

Google Analytics boss joins Acxiom

23 Apr 2012: Acxiom, the global marketing services and technology business, has announced that former Google Analytics Product Manager Dr Phil Mui has joined the company as Chief Product and Engineering Officer, a newly created position at Acxiom.

Communicator Corp appoints new MD

23 Apr 2012: Global enterprise email management company Communicator Corp has promoted Chief Operating Officer James Bunting to Managing Director.

REaD Group redefines suppression with Qinetic file

23 Apr 2012: The REaD Group launches an audacious unified file that combines deceased, goneaway, relocated and latest occupier records all within a single file.

Callcredit launch set to Define the data market

20 Apr 2012: Callcredit Marketing Solutions and its specialist data division The Trading Floor have launched what they believe is "the most granular, most accurate and most up to date consumer database of its kind".

Anonymous prospects contactable with new Neolane functionality

20 Apr 2012: Conversational marketing technology provider Neolane has added new features to its Interaction application which will now allow marketers to interact with anonymous prospects online.

MySQL creator secures £2.5m of funding

18 Apr 2012: SkySQL, the creator of MySQL, has announced that it has raised $4m in Series A funding from a number of investors.

Toshiba falls foul of Data Protection Act

17 Apr 2012: Toshiba Information Systems (UK) is the latest organisation to breach the Data Protection Act (DPA) after the personal details of 20 competition entrants were compromised by a security flaw on its website.

Watson Phillips Norman picks up charity DM account

17 Apr 2012: DM agency Watson Phillips Norman has been appointed by international animal welfare charity the Brooke to work on its acquisition direct marketing programme and new product development.

Judging panel confirmed for inaugural IoF SIG Awards

16 Apr 2012: A strong and experienced panel of judges has been announced for the Insight in Fundraising Special Interest Group’s first awards scheme.

Bulk Mail: what it means to direct marketers

13 Apr 2012: Bob Carter of BBS offers an in-depth guide to the implications for marketers of Royal Mail's recent overhaul of its Mailsort service.

Apteco names top partners for 2011: D&B, Celerity-IS & Callcredit

13 Apr 2012: Apteco has recognised its top three performing FastStats reseller partners for 2011 from its network of over 50 partners in the UK, Europe, North America and Australia.

Poor data costs UK firms £1 for every £6 spent

12 Apr 2012: Around £1 in every £6 of departmental budget is wasted on average by UK companies because of poor data quality, according to new Experian QAS research.

Hopewiser celebrates 30 years at the forefront of addressing

12 Apr 2012: Hopewiser is celebrating three decades as a leading provider of addressing software and address management expertise this year.