Archive for the ‘Data Warehousing’ Category.

ETL and Real-time Data Warehousing

By Nena Marin, Sr. Solutions Architect, Pervasive Software

Real-time data warehousing is at the core of enterprise intelligence initiatives.  Forecasting, trending and what to do next require freshest knowledge of what is happening now.  

Pervasive Data Integrator (DI) provides workflow based ETL for real-time concurrent warehouse loading. As an example, the following integration process implements a Data Warehouse load using Pervasive DI.

Process Designer 

In this dimensional model, the data warehouse is comprised of dimension and fact tables.  The fact tables contain measures (like $ and units) and relationship to every dimension or attributes via foreign keys.  The dimensional model provides a framework that is scalable yet standard across the enterprise. 

Pervasive DI process designer provides a canvas to create workflows for data warehouse loads.  Each map/transformation node in the workflow loads a specific table; dimensions and facts.  At load level 1, dimension tables without dependencies can be loaded concurrently.  At load level 2, dimensions with downstream dependencies can be loaded concurrently. In this example, the Product_Dimension includes a foreign key to (or is downstream dependent on) the Product_Class_Dimension.  Since fact tables contain foreign keys or dependencies to all dimensions, fact tables are loaded last in the workflow process. 

Pervasive DI workflow offers seamless concurrency, native connectivity and SQL querying to join data from multiple sources. The pervasive ETL process designer also provides handy features like decision steps, email notification and a scripting language for custom source to target transformations.  There are built in options to profile the ETL engine performance during your load processes. 

Finally, processes are scheduled as tasks for automation. 

del.icio.us Slashdot Digg Technorati StumbleUpon

How to ensure a successful integration project

Many companies who have never embarked on an integration project are now plunging head first into their first ever integration projects; at the same time, many departments within larger corporations are also tackling their first integration projects. There seems to be various reasons why companies and departments are starting to look at integration now, such as CRM implementations, Business Intelligence, and mergers and acquisitions. I won’t get into that here. The important thing is more and more IT departments are starting some form of data or application integration project, and are looking for ways to get started.

The goal of this post is to list the top five things IT directors need to ensure a successful integration project.

1. Determine the goal of your integration project. An integration project should be pursued from a pure cost/benefit standpoint. How will it advance your business goals? Are you trying to build an executive dashboard so the CEO and CFO can graphically see marketshare, profitability, and sales trends? Then you’re going to need to build a data warehouse with data from various sources, including ERP, CRM, text files and your website, from which your executive reporting tool can make pretty business pictures. Do you want to enable your sales executives to know everything that is going on with your customers, so they can avoid accepting a purchase order from an account that is on credit hold? That would require synchronization between your CRM system and your ERP or accounting system. Are you hoping to make it easier for your suppliers to provide you with shipping information and invoices so they just automatically show up in your wholesale or retail management system? Then you need to EDI-enable these systems.

2. Determine what kind of integration you’re pursuing. Is it a migration, an ETL project, an application integration or B2B integration? Here’s a nice little diagram that can help you with this:

Integration ScenariosIntegration Scenarios

If you’re trying to synchronize data between your CRM system and your ERP or Accounting System (often the ERP/Accounting system is the “system of record,” meaning that’s where the customer master data is stored), then it’s an interface integration. This requires real-time, or near real-time movement of bits and pieces of data back and forth between both systems at a business logic level. On the other hand, if you’re trying to extract data from operational systems, such as your ERP or CRM system and dump them into a repository in order to slice and dice the data for more accurate reporting on your business, then you’re looking at ETL. The requirements here are usually for nightly, weekly or even monthly batch loads from your operational systems, usually late at night or on weekends when these systems that run your business won’t take too big of a performance hit.

3. Determine the sources and destinations of your data. The best way to break down an integration project into easily understandable steps, and to calculate the time and effort it will take, is to determine where the data is coming from and where it’s going to. I’m not just talking about what applications you’re moving data to and from, but also what tables or data objects, and how many. So, for example, if your goal is for your sales people to close a sale in your CRM system so it will kick-off a sales order in your ERP system, this would involve:

  • The Accounts or Company object in your CRM system
  • The Contact object in your CRM system
  • The Opportunity object in your CRM System
  • The Product object in your CRM system

That’s four objects in your CRM system. You also have to determine object name and count in your ERP system, as well as determine how data from your CRM system will change, combine or interact for it to make sense to your ERP system and to successfully create a sales order.

4. Determine your resources. Many times an integration project is so easy that it can be done in-house with minimal effort expended. Sometimes what you thought might be very easy turns out to be a very complicated project that drags on for eight months with the current manpower at your disposal. Knowing who you have available and what his or her skills are is crucial. Some integration projects require just one business analyst. These could be simple migrations such as exporting data into a flat file from one system and importing that same flat file. Sometimes Microsoft Excel is all you need to do this. However, most integrations are not as simple. Interface type integrations require lots of heavy programming. You would need somebody familiar with java, C++, and web services programming skills, and they don’t come cheaply! These issues are largely mitigated by commercially available integration tools, which are typically designed for use by a business analyst. If your company has no technical resources (rare), or if they are all allotted to other projects and not available for your project, then it might make sense to hire a consultant for a time to do the integration for you.

5. Choose your approach: Build vs. Buy. This is largely determined by number 4 above. It doesn’t make sense to invest in a $100,000 ETL tool if all you’re doing is loading data from a mailing list into your CRM system. It can also be the death of a project if you decide to use in-house resources, and it ends up taking up to 6-9 months, or requires a highly-paid java programmer to update your custom code every time you want to add a field to your CRM to ERP synchronization piece. It’s up to you. It might not be too much of a headache to do it in-house if it’s a fairly straightforward integration with few to any changes in business logic, and if data structure and field names are the same. If, however, you, have to transform the data in some way, or you’re integrating between two completely different data sources or data types (which covers the majority of integrations), then you should opt for a data integration tool. Most of the time your urgent integration project will not be your last. Because people come and people go, and high-valued technical resources are constantly being poached by other companies, an integration tool that is easy to learn and use will enable you to tackle present and future integrations without having to rely on the knowledge locked away in the brain of your top developer.

del.icio.us Slashdot Digg Technorati StumbleUpon

Update: Integration and The Recession part 2

As an add-on to my March 12th posting about Integration and The Recession, I neglected one of the most important reasons to acquire integration: Business Intelligence. In a recession it is important to be ruthless about measuring every aspect of your business, from profits, sales, marketing effectiveness, all the way to your web marketing campaigns. By the way, in the aforementioned article, ThePPCGuru.com mentions web marketing as the most cost effective, as well as the most fully measureable marketing method out there, ideal for the recession.

Business Intelligence requires a functional reporting tool, from industry stalwarts Business Objects and Cognos, to up-and-coming value business intelligence company Bitam out of Mexico. It also, (surprise surprise) requires a data integration tool, or ETL tool (extract, transform and load), in order to bring data from various sources such as back-end business applications, websites, web analytics applications, etc. into a data warehouse or data mart on which the business intelligence tool works.

del.icio.us Slashdot Digg Technorati StumbleUpon

Connecting A to B: Integration a function of semantics?

A great little article by Sean McGrath from from ITworld.com appeared yesterday which should serve sales people as well as data integration project managers equally. Essentially, to summarize, if an executive says she wants to connect A to B, you need to dig deeper to see what that really means.  Is it connecting A to B to “advance some process that B is involved in”? Or is it to connect A with information from B in order to enable a report from a reporting module in B.

In reality, McGrath says, A to B in the first scenario is just a simple application integration scenario, where B needs data from A in order to effect a business process.  The second scenario is an ETL scenario, whereby you need to actually take data from A and B and put it into a third repository, C, in order to run a report on this.

I think McGrath does a great job of simplifying the message that a seemingly straightforward integration request is not as straightforward as it seems.  You need to ask the right questions, dig deeper, and find out what the end-goal is.

del.icio.us Slashdot Digg Technorati StumbleUpon

Integration and The Recession

Fernando Labastida

By Fernando Labastida

So, there’s a recession on the way, or already here. Some accounts say it’s going to get pretty bad, and others are not so pessimistic. All I know is that the job losses have been worse than expected, and despite yessterday’s upswing in the stock market due to the Fed’s offer to lend banks $200 billion in exchange for risky mortgage debt, others are not so sanguine about the move.

What are corporate IT departments to do in the midst of such uncertain economic times? Is this the time to retrench and not invest in software? Not on your life. Most companies are probably in the middle of one integration project or another, such as building a SOA infrastructure, building a data warehouse for business intelligence purposes, migrating old data from legacy applications to a new SaaS application, or connecting their CRM, accounting, ERP, project management, inventory management or manufacturing operations in order to make their business more efficient.

These projects are all oriented towards one thing: decreasing costs, and increasing revenue. This is no longer the late 90s / early 2000s. Corporations no longer purchase software to acquire “cool technology.” Software projects are now purely utilitarian, with CIOs focusing on bottom line as well as top line revenues only.

What corporate IT departments need to do is stop messing around and get going with these projects.

If projects are being delayed because of the use of custom code, companies need to acquire a tool to cut down development time. This is not the time to “do-it-yourself.”

Here are the reasons for acquiring a tool, such as (surprise) an integration tool, rather than relying on labor-intensive manual processes, during a recession:

  • Low total cost of ownership
  • Faster time to market
  • Flexible, scalable implementations
  • Higher level of integration with third-party technology
  • Integrated, cross-functional processes
  • Automated, standardized design processes
  • Optimization of development resources
  • High reliability through proven performance
  • Self-documenting

This is no time to use your valuable development resources on the mundane task of integrating applications. They need to be put on tasks designed to optimize efficiency in order to weather the hard times. This is the moment you’ve been waiting for, the moment to acquire labor-saving software tools.

Fernando A. Labastida

del.icio.us Slashdot Digg Technorati StumbleUpon

What is integration?

Integration Scenarios

The word “integration” is bandied about so frequently and in such a cavalier manner, it is assumed that everybody knows what integration is.  However, integration means a lot of things to a lot of people.

 The Merriam-Webster online dictionary definition of integration says:

1: the act or process or an instance of integrating: as a: incorporation as equals into society or an organization of individuals of different groups (as races) b: coordination of mental processes into a normal effective personality or with the individual’s environment2 a: the operation of finding a function whose differential is known b: the operation of solving a differential equation

The first definition describes one of the great socio-political accomplishments of the latter part of the twentieth century, and the latter definition refers to a complex mathematical equation.  However, this doesn’t really work for our purposes.  A more useful definition from TechDictionary.com:

Putting diverse hardware and/or software components together to work as a system.

That’s a little bit better.  However, when we narrow the term down to data integration, and we get my favorite definition, one provided by Georgetown University on a glossary page on Data Warehousing:

Data Integration The movement of data between two co-existing systems. The interfacing of this data may occur once every hour, once a day, etc.

That’s it, it’s as simple as moving data between two co-existing systems. This can be done in four different ways:

Migration: moving data from a legacy application, with data stored in a legacy format such as Cobol or Isam, to a modern CRM or ERP application. This is usually a one-time movement of massive amounts of historical data.

ETL: Extract, Transform and Load of data. Extracting data from operational applications, such as CRM, ERP, accounting, manufacturing, or other systems into a database used for the sole purpose of manipulating the data and reporting on the data, usually for business analysis purposes. This is usally the movement of large amounts of data in a batch process, durng certain intervals such as hourly, nightly, weekly, etc.

Application Integration: The movement of data in small chunks between two business systems, such as from a CRM application to an ERP system, or from a manufacturing system to an accounting system.  This is usually done at an API level, as opposed to through the database or text interface as is more typical of the first two scenarios, and is usually event-based and real-time, or near real-time.

B2B: Integration of data between trading partners, such as wholesalers, manufacturers, retailers, transportation companies, etc.  This also is usually real-time or near real-time, and event based, and entails transforming EDI documents or documents in any other format, transported over the internet, ftp, EDI VANS, etc.

There are many tools out there that do one or the other of the above scenarios, and some even do all of them.  The best integration tools for a company usually depends on what their primary focus is, how much data is being moved, the required frequency and speed of data integration, among others. Integration tools are usually not a one-size fits all solution; however, there are some tools that are very flexible, or rather agile, that can be used for almost all data integration scenarios. Most companies, if they’ve been around for any length of time, usually have diverse integration needs, and so sometimes it makes more sense for them to acquire a tool that can span the breadth of integration scenarios so as to capitalize on a one-time investment in software as well as knowledge.

del.icio.us Slashdot Digg Technorati StumbleUpon