ETL and Real-time Data Warehousing

By Nena Marin, Sr. Solutions Architect, Pervasive Software

Real-time data warehousing is at the core of enterprise intelligence initiatives.  Forecasting, trending and what to do next require freshest knowledge of what is happening now.  

Pervasive Data Integrator (DI) provides workflow based ETL for real-time concurrent warehouse loading. As an example, the following integration process implements a Data Warehouse load using Pervasive DI.

Process Designer 

In this dimensional model, the data warehouse is comprised of dimension and fact tables.  The fact tables contain measures (like $ and units) and relationship to every dimension or attributes via foreign keys.  The dimensional model provides a framework that is scalable yet standard across the enterprise. 

Pervasive DI process designer provides a canvas to create workflows for data warehouse loads.  Each map/transformation node in the workflow loads a specific table; dimensions and facts.  At load level 1, dimension tables without dependencies can be loaded concurrently.  At load level 2, dimensions with downstream dependencies can be loaded concurrently. In this example, the Product_Dimension includes a foreign key to (or is downstream dependent on) the Product_Class_Dimension.  Since fact tables contain foreign keys or dependencies to all dimensions, fact tables are loaded last in the workflow process. 

Pervasive DI workflow offers seamless concurrency, native connectivity and SQL querying to join data from multiple sources. The pervasive ETL process designer also provides handy features like decision steps, email notification and a scripting language for custom source to target transformations.  There are built in options to profile the ETL engine performance during your load processes. 

Finally, processes are scheduled as tasks for automation. 

del.icio.us Slashdot Digg Technorati StumbleUpon

So easy, a salesman can do it…or, integration tool ease-of-use

In keeping a blog on “Integration for Business,” I sometimes feel I have to walk a fine line between keeping neutral and writing an objective commentary on integration issues, and wanting to promote the fact that I work for an integration tools company that sells really cool integration products.  In this post I’m going to cross that line in a major way and talk about the products I sell, so please bear with me.

As I go about my daily job of trying to convince companies to purchase Pervasive’s integration products vs. doing integration in-house or buying a competitors’, I often rely on the knee-jerk claim of the “ease-of-use” of our sofware.  I also get sick and tired of hearing potential customers tell me that some of my competitors’ products are apparently easier to use than my product, even though sometimes they forget that some of those products come bundled with 80 hours of free implementation services (paid for by the high-cost of their product), which of course makes it easier by default.

I got so sick of my competitors’, as well as my own rhetoric, that I decided to try out the product I sell for myself. I wanted to see how easy it actually was to use my product to load leads (I am a sales person) into a test ”developers’” account in Salesforce.com.

I had a handy flat file, ASCII Delimited (tab separated) with names of systems integrators on the West Coast that I had downloaded from Hoovers. I opened the Map Designer tool of Pervasive’s Data Integrator, and went to the purple “Source Connection” tab…

 

…where I chose ASCII (delimited) as the connector, out of the list of 160+ connectors, in order to connect to the file.

I then chose the file I was going to connect to using the “Source/File” dialog box, and because I had seen my Systems Engineer do demos several times, I knew I had to choose “True” for header in order to make the very first row, the one that has “Account, First Name, Last Name,” etc. become the column header row.

Then I went into the green “Target Connection” tab and from the drop-down list of 160+ connectors chose Salesforce.com v. 10 as my target connector, entered my username, password, and security token (now that Salesforce.com has enforced more enhanced security, you either have to approve the particular browser on the particular machine you’re using to be used to log-in, or you need to add a security token after your password).  Then I hit connect!

What happened?  Nothing….which was a good thing.  If I had hit “connect,” and I had gotten an error, then it would have been bad news, wouldn’t it?  Well, my next step was to choose which table I was going to write to within the “Table” dialog box, still within the green “Target Connector” tab.  I chose the “Leads” table from within the list of all the Salesforce.com tables.

Oh, by the way, not only can you view standard Salesforce.com tables within the Multimode Table Selector dialog box within Pervasive Data Integrator’s Map Designer, you can also view custom tables as well!

Finally, I went to the yellow “Map Fields” tab, where I saw the source fields on the left in purple (or are they blue?), and the target fields in green on the right. 

 

However, I didn’t see the standard Salesforce.com Lead fields, so I was confused.  Then I spied something on the upper right of the screen, a little R1. I noticed it was a drop-down menu, and sure enough, the one drop-down choice was “Leads,” the table I had chosen in the previous screen.  I chose that and was presented with all the “Leads” fields standard to Salesforce.com.

Pervasive’s Map Designer allows you to use cool functions such as “Match by name,” so that if all the field names in the source as well as the target are exactly the same, you can save a lot of time and map all the fields at once.  There’s also a “Match by position” function, assuming the fields are in the same order.  Finally, you can just drag and drop all the fields at once, transferring en masse the source fields to the target. 

None of those will work in this case, since Salesforce.com has specific field names and positions, so each field has to be mapped individudally.  That’s not such a hassle, especially if you don’t really want to map every single field over.  All I did was individually drag and drop the fields I wanted mapped over from the Source side to the corresponding field in the Target side (First Name to FirstName, Line of Business to Industry, etc.).  I had done my mapping!

The fields I had mapped over appeared as bold in my source fields.  Then I decided to verify that the map was valid, so I pressed the little check-mark on the upper right-hand side of the tool. Since there were some fields I was not mapping to it prompted me to remove those from the schema, which I did. Then I got a “Map is Valid” dialog box.

Now, for the integration.  Pervasive’s Map Designer allows you to execute a transformation from the tool itself, or to save your map as an XML file to deploy it to a run-time engine for automation purposes.  Since I’m a sales guy I don’t know how to do any of that stuff, and all I need to do is upload a few leads anyway that I was going to call the next day, so I’m going to execute the map from the design tool. I would do that by clicking the litttle arrow on the upper left side of the tool, right next to the check-mark.

Before that, however, I needed to make sure these leads weren’t already in my Salesforce.com. Ok, the only leads there were what came with my sample “developer’s version” of Salesforce.com. 

 

Now I hit the execute button, the little arrow on the upper left.  After a log file appears, and a dialog box with a progress bar, I went back to the “Leads” tab in Salesforce.com, refreshed the browser, and voila!  There are my new leads updated 4/22/2008!

So easy, a salesman can do it!

del.icio.us Slashdot Digg Technorati StumbleUpon

MDM, Data Integration, and Cat Food

I’m going to be attending the Spring 2008 MDM Summit in San Francisco March 31-April 1, so I thought it would be appropriate to brush up on the latest and greatest in Master Data Management.  In the process, I ran into three articles today that do a perfect job of tying it all together and making it real.

In Loraine Lawson’s blog Mergers and Integrations, there is a great posting on the petty power plays between the owners of different silos of corporate data (mostly customer data), which leads to the conclusion: the Master Data Management challenge is not so much a question of which data integration technology to use, but of resolving the corporate culture clash between those who are willing to share data vs. those who want to protect their turf.

While this is all good and well, my question is: who cares about Master Data Management anyway? Maybe the description of Master Data Management from an official source can explain why MDM is relevant.  According to Jill Dyché, partner of Baseline Consulting, she cites a definition from her book “Customer Data Integration: Reaching a Single Version of the Truth,” in an interview with James Powell in Enterprise Systems:

…we define MDM as “the set of disciplines and methods to ensure the currency, meaning, quality, and deployment of a company’s reference data within and across subject areas.”

What? Did anybody get that? To borrow from a blog posting on another topic, Business Intelligence, Ann All says:

IT folks who throw out terms like ETL and OLAP, blather about “single versions of the truth” and get bogged down in discussions over how many terabytes of data their business intelligence tools can handle are missing out on a rare opportunity to wow business users.

Ann’s whole blog post is about telling the customer stories about how Business Intelligence has helped actual companies. 

Well, providence supplied me with my story via one of my Google Alerts emails. A fantastic article in Internet Retailer by Don Davis entitled Do You Know Me? brings it all together.  The article makes absolutely no mention of jargon such as MDM or Single Version of the Truth, but talks about how Petco, REI and Build-A-Bear are able to more effectively target their customers with personalized offers and messages based on their offline and online purchasing history, and web browsing trends.

Price interviews, among others, John Lazarchic, director of e-commerce at Petco.  Here’s the money quote from the article:

It’s all aimed at making offers relevant. “If a customer buys 40-pound bags of dog food in the store because he doesn’t want to pay shipping charges, I want to keep marketing messages for store stuff store-specific,” Lazarchic says. “But if he’s buying three and a half pound bags of cat food online, I’ll send him online cat offers. I want to keep it specific by channel and pet type.”

So what is MDM?  It’s all about cat food, or rather consolidating data on your customers from various sources so you can market the right cat food using the right delivery method to the right customer.  How did Petco get this data?  They built a data warehouse, merging customer data stored in information silos gleaned from their e-commerce site, their online analytics engine, and their PALS loyalty system for on-site retail purchases.  Price does not mention the term Master Data Management ANYWHERE in the article, but that’s essentially what it is.

So why is MDM relevant?  Why care if you use this or that data integration technology, or if the retail marketing team or the online marketing team hordes or shares their data on the same exact customers who shop online and in their stores?  Just look to what REI, Petco, and Build-A-Bear are doing to provide a relevant and satisfying shopping experience, or rather, relationship, with their customers.

Even though I used a retail example from a retail-oriented online publication, MDM can apply to any other business scenario where different versions of information on the same physical entity exists. But these are the type of stories that make MDM real.

del.icio.us Slashdot Digg Technorati StumbleUpon

Iaas (Integration-as-a-Service) is here!

Pervasive Software on Tuesday announced the launch of Pervasive DataCloud, a hosted, on-demand integration offering that allows integration to be delivered as a service. For full disclosure, Pervasive software is my employer, but I can’t help get excited  by the fact that this is something that I think everbody can get behind!  Now companies that use a strictly Saas model for all their applications, such as Netsuite for their ERP & CRM, Salesforce.com for their CRM, Google Apps for corporate email and productivity tools, and a host of other Saas applications, can also deploy integration between all these apps “in the clouds,” so-to-speak. They’ll have nothing on-premise but a bunch of wireless laptops!

It will also, of course, be useful for those who are slowly migrating to a Saas computing model for their corporate apps, but still run traditional on-premise applications that they need to connect to their Saas applications.

 This story has been picked up my many sources, such as eWeekData Monitor, and Curt Monash, to name a few, and is generating lots of buzz.

del.icio.us Slashdot Digg Technorati StumbleUpon

IT & Business: Alignment or Integration?

Jason Hiner, Executive Editor at  TechRepublic.com, in his Sanity Check blog, has published a very useful article in two parts on the Alignment between IT and Business.  The first one, a summary of a Wall Street Journal article on the topic (Hiner has had problems with the WSJ’s take on IT issues in the past), points out the divisions between IT departments and a company’s busines goals, a division which can be debilitating to company growth.  Hiner sums up the article thus:

The general thrust of the article can be summed up by this line: “Success in the digital economy of the 21st century demands a strategic role for IT. And for that to happen, the glass wall between IT and the rest of a company has to be shattered.”

He cites analysts’ estimation of hundreds of billion of wasted dollars on failed IT projects, CIOs who come from technology backgrounds and don’t know how to integrate IT with business goals, and business leaders who look down on IT personnel as ‘nerds.’  Although he largely agrees with this particular WSJ article because they relied on IT industry experts Dr. Amit Basu and Professor Chip Jarnagin instead of internal writers, his own prescription simplifies the articles’ own prescription:

  1. Hire a CIO who has business savvy but can also gain the respect of the techies in the IT department
  2. Improve IT awareness/training among executives and team leaders throughout the business
  3. Improve business awareness/training among the company’s IT managers

The second article in the series is a summary of an MIT Sloan School of Management paper on avoiding the IT Alignment Trap.  Apparently, it’s not enough to just align IT with business goals.  Companies that do that have to also make their IT processes more efficient; if not then they actually spend more on IT than the average company, and the company’s sales goals fall short of the average. 

The key is not only to align IT with business goals, or rather, integrate IT with the business, but to simplify IT department processes:

  1. Reducing complexity by establishing standards and getting rid of redundancy
  2. Rightsource the job, by making strategic decisions as to what to keep in-house, what to outsource, and what to hande with packaged applications, and
  3. Creating end-to-end accountability.

Hiner generally agrees with the MIT Sloan paper, except for it’s prescription for centralized IT management.  He advocates a certain amount of decentralized decision-making, especially as it relates to alignment with the goals of separate business units or departments.  He also disagrees with the “alignment” word, citing “integration” (my favorite word; Martin Luther King Jr. would be proud), as better because IT should always be subordinate to the business.

This article series I believe supports an argument I’ve been making throughout this blog, that IT projects should be considered from a purely cost-benefit point-of-view.  Relating this to integration, the purpose of this blog, Loraine Lawson cites an interview with Philip Russom with the Data Warehousing Institute where he states the benefits of “rightsourcing,” in this case using a packaged application:

Data integration requires a senior-level programmer typically, he said, which means you’ll be paying at least one programmer six figures to spend months coding data integration from scratch. By comparison, you could build a comparable solution in two to three months with a vendor tool for less money, he said.

In my experience, the time can be considerably less than two to three months.

Integration’s role in a company where IT and Business are aligned or integrated should be to get out of the way as quickly as possible, not be a bottleneck, stay simple, repeatable and in the background so IT can properly support business goals.

del.icio.us Slashdot Digg Technorati StumbleUpon

Saas-to-Anything embedded integration gaining traction.

QuickArrow, a SaaS services automation vendor out of Austin, TX recently announced an embedded integration solution called QuickConnectIT.  Powered by Pervasive’s Data Integrator V9, QuickConnectIT enables companies to connect their QuickArrow hosted professional services automation application with other SaaS or on-premise applications, such as CRM, ERP, and accounting applications.  QuickArrow is one of many SaaS vendors who are embedding integration tools into their SaaS applications.  Why are these software-as-a-service companies embedding integration?  As elucidated by QuickArrow’s press release,

“The partnership (with Pervasive)…enables QuickArrow to better focus on it’s core competency, Professional Services Automation (PSA).”

They’d prefer to do what they do best instead of getting bogged down on custom-coding one-off integrations everytime they implement a solution at a new customer site. 

del.icio.us Slashdot Digg Technorati StumbleUpon

Legacy is here to stay…

Mainframe

Fantastic article in The New York Times online’s Techology section on Saturday, “Why Old Technologies Are Still Kicking.” by Steve Lohr.

The two most important implications about this article for this blog, because it speaks to the heart of the existence of this blog in the first place:

1. Legacy technologies are not going way, meaning integration between disparate systems is a perpetual need for corporations, and

2. Strategic technology decisions are business decisions, pure and simple.

The gist of the article is that the mainframe is not only here to stay, but IBM has invested huge sums to retool the mainframe and modernize it because of the demand within key sectors of the economy, such as the financial industry, for it’s mission critical capabilities.

“I.B.M.’s most recent model, the z10, represents an investment of $1.5 billion and the work of 5,000 technical professionals,” says Lohr in the article.

This despite the prediction in 1991 by Stewart Alsop, cites Lohr, that mainframe technology would disappear by 1996. Why is this? Here’s the money paragraph of the article:

The unfulfilled predictions of demise, experts say, tend to overestimate the importance of pure technical innovation and underestimate the role of business judgment. “The rise and fall of technologies is mainly about business and not technological determinism,” said Richard S. Tedlow, a business historian at the Harvard Business School.

So for all those who say they don’t need to worry about integrating legacy technologies or invest in integration skills or technologies because “all applications are now web services enabled,” think again. The mainframe, and hence legacy technology, are alive and well, and getting stronger!

del.icio.us Slashdot Digg Technorati StumbleUpon

How to ensure a successful integration project

Many companies who have never embarked on an integration project are now plunging head first into their first ever integration projects; at the same time, many departments within larger corporations are also tackling their first integration projects. There seems to be various reasons why companies and departments are starting to look at integration now, such as CRM implementations, Business Intelligence, and mergers and acquisitions. I won’t get into that here. The important thing is more and more IT departments are starting some form of data or application integration project, and are looking for ways to get started.

The goal of this post is to list the top five things IT directors need to ensure a successful integration project.

1. Determine the goal of your integration project. An integration project should be pursued from a pure cost/benefit standpoint. How will it advance your business goals? Are you trying to build an executive dashboard so the CEO and CFO can graphically see marketshare, profitability, and sales trends? Then you’re going to need to build a data warehouse with data from various sources, including ERP, CRM, text files and your website, from which your executive reporting tool can make pretty business pictures. Do you want to enable your sales executives to know everything that is going on with your customers, so they can avoid accepting a purchase order from an account that is on credit hold? That would require synchronization between your CRM system and your ERP or accounting system. Are you hoping to make it easier for your suppliers to provide you with shipping information and invoices so they just automatically show up in your wholesale or retail management system? Then you need to EDI-enable these systems.

2. Determine what kind of integration you’re pursuing. Is it a migration, an ETL project, an application integration or B2B integration? Here’s a nice little diagram that can help you with this:

Integration ScenariosIntegration Scenarios

If you’re trying to synchronize data between your CRM system and your ERP or Accounting System (often the ERP/Accounting system is the “system of record,” meaning that’s where the customer master data is stored), then it’s an interface integration. This requires real-time, or near real-time movement of bits and pieces of data back and forth between both systems at a business logic level. On the other hand, if you’re trying to extract data from operational systems, such as your ERP or CRM system and dump them into a repository in order to slice and dice the data for more accurate reporting on your business, then you’re looking at ETL. The requirements here are usually for nightly, weekly or even monthly batch loads from your operational systems, usually late at night or on weekends when these systems that run your business won’t take too big of a performance hit.

3. Determine the sources and destinations of your data. The best way to break down an integration project into easily understandable steps, and to calculate the time and effort it will take, is to determine where the data is coming from and where it’s going to. I’m not just talking about what applications you’re moving data to and from, but also what tables or data objects, and how many. So, for example, if your goal is for your sales people to close a sale in your CRM system so it will kick-off a sales order in your ERP system, this would involve:

  • The Accounts or Company object in your CRM system
  • The Contact object in your CRM system
  • The Opportunity object in your CRM System
  • The Product object in your CRM system

That’s four objects in your CRM system. You also have to determine object name and count in your ERP system, as well as determine how data from your CRM system will change, combine or interact for it to make sense to your ERP system and to successfully create a sales order.

4. Determine your resources. Many times an integration project is so easy that it can be done in-house with minimal effort expended. Sometimes what you thought might be very easy turns out to be a very complicated project that drags on for eight months with the current manpower at your disposal. Knowing who you have available and what his or her skills are is crucial. Some integration projects require just one business analyst. These could be simple migrations such as exporting data into a flat file from one system and importing that same flat file. Sometimes Microsoft Excel is all you need to do this. However, most integrations are not as simple. Interface type integrations require lots of heavy programming. You would need somebody familiar with java, C++, and web services programming skills, and they don’t come cheaply! These issues are largely mitigated by commercially available integration tools, which are typically designed for use by a business analyst. If your company has no technical resources (rare), or if they are all allotted to other projects and not available for your project, then it might make sense to hire a consultant for a time to do the integration for you.

5. Choose your approach: Build vs. Buy. This is largely determined by number 4 above. It doesn’t make sense to invest in a $100,000 ETL tool if all you’re doing is loading data from a mailing list into your CRM system. It can also be the death of a project if you decide to use in-house resources, and it ends up taking up to 6-9 months, or requires a highly-paid java programmer to update your custom code every time you want to add a field to your CRM to ERP synchronization piece. It’s up to you. It might not be too much of a headache to do it in-house if it’s a fairly straightforward integration with few to any changes in business logic, and if data structure and field names are the same. If, however, you, have to transform the data in some way, or you’re integrating between two completely different data sources or data types (which covers the majority of integrations), then you should opt for a data integration tool. Most of the time your urgent integration project will not be your last. Because people come and people go, and high-valued technical resources are constantly being poached by other companies, an integration tool that is easy to learn and use will enable you to tackle present and future integrations without having to rely on the knowledge locked away in the brain of your top developer.

del.icio.us Slashdot Digg Technorati StumbleUpon

Update: Integration and The Recession part 2

As an add-on to my March 12th posting about Integration and The Recession, I neglected one of the most important reasons to acquire integration: Business Intelligence. In a recession it is important to be ruthless about measuring every aspect of your business, from profits, sales, marketing effectiveness, all the way to your web marketing campaigns. By the way, in the aforementioned article, ThePPCGuru.com mentions web marketing as the most cost effective, as well as the most fully measureable marketing method out there, ideal for the recession.

Business Intelligence requires a functional reporting tool, from industry stalwarts Business Objects and Cognos, to up-and-coming value business intelligence company Bitam out of Mexico. It also, (surprise surprise) requires a data integration tool, or ETL tool (extract, transform and load), in order to bring data from various sources such as back-end business applications, websites, web analytics applications, etc. into a data warehouse or data mart on which the business intelligence tool works.

del.icio.us Slashdot Digg Technorati StumbleUpon

Connecting A to B: Integration a function of semantics?

A great little article by Sean McGrath from from ITworld.com appeared yesterday which should serve sales people as well as data integration project managers equally. Essentially, to summarize, if an executive says she wants to connect A to B, you need to dig deeper to see what that really means.  Is it connecting A to B to “advance some process that B is involved in”? Or is it to connect A with information from B in order to enable a report from a reporting module in B.

In reality, McGrath says, A to B in the first scenario is just a simple application integration scenario, where B needs data from A in order to effect a business process.  The second scenario is an ETL scenario, whereby you need to actually take data from A and B and put it into a third repository, C, in order to run a report on this.

I think McGrath does a great job of simplifying the message that a seemingly straightforward integration request is not as straightforward as it seems.  You need to ask the right questions, dig deeper, and find out what the end-goal is.

del.icio.us Slashdot Digg Technorati StumbleUpon