Content Migration: Being Prepared


When an IT professional is tasked with putting old content to work in a new system, the trepidation is often palpable. If the source and destination systems are from different eras, as they often are, the task can be even more cumbersome. Much like negotiating a treaty between two countries that do not share a common language, someone will be faced with the task of translating. If that translator is not properly prepared, the outcome might create more problems than it solves.

Preparation is the key to success for digital content migrations. Here are a few things we’ve learned to prioritize when preparing.

Infrastructure

In short, beef it up. A temporary boost to your source and destination machines can be the difference between weeks and days of processing. This could mean scaling your systems up (vertically) or out (horizontally). Scaling up could include building a larger cache for read sources, adding memory to JVM-dependent platforms, or adding CPUs to handle in-flight transformations. Scaling out, especially on the destination, can prove very useful. Most ECMs support some form of clustering or workload distribution. If you have the option to set up a clustered node that is used solely for migration input, you will have more uncontested IO for writing new content. Migrations sometimes require pulling structured and unstructured data from more than one system. Be sure to include these ancillary systems when scaling your migration infrastructure.

When migrating live data, you may need to put a cap on the bandwidth. This is very important on systems where users are actively consuming and producing content. A great way to mitigate end-user performance issues while migrating is to set up a migration schedule. Limit the rate of the migration during active business hours and raise that limit during off hours. This will keep your content moving 24/7 without slowing down your business.
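A migration schedule like this can be as simple as a function that picks a rate limit from the clock. The sketch below is only an illustration: the limits, the 8:00 to 18:00 weekday window, and the `rate_limit_for` name are all assumptions, not settings from any particular migration tool.

```python
from datetime import datetime, time

# Hypothetical limits, in documents per minute. Tune these for your systems.
BUSINESS_HOURS_LIMIT = 100
OFF_HOURS_LIMIT = 1000

# Assumed business hours: 08:00 to 18:00, Monday through Friday.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def rate_limit_for(now: datetime) -> int:
    """Return the allowed migration rate for the given moment."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_business_hours = BUSINESS_START <= now.time() < BUSINESS_END
    if is_weekday and in_business_hours:
        return BUSINESS_HOURS_LIMIT
    return OFF_HOURS_LIMIT
```

A migration worker would call `rate_limit_for(datetime.now())` before each batch and pace itself accordingly, so the limit rises on its own once the business day ends.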

Content ownership

Content ownership should not be overlooked or put off as a post-migration task. A miss on content permissions could have far-reaching security implications. Items to consider when mapping permissions:

  • Do users map one-to-one? If not, there may be more work to do.
  • How does permission inheritance work in the new system?
  • Can users be created on the fly, or do they need to exist in the destination system before the migration happens?

If the content is not being consumed directly from the target system, you may only need a single “system” user.
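To make the mapping questions concrete, here is a minimal sketch of resolving an owner during migration. It assumes users must already exist in the destination and that unmapped owners fall back to a single "system" user. The `map_owner` function and its arguments are hypothetical names for illustration, not part of any ECM API.

```python
def map_owner(source_user, user_map, destination_users, fallback="system"):
    """Resolve a source owner to a destination user.

    user_map holds the exceptions to a one-to-one mapping, keyed by
    source username. Returns (destination_user, mapped), where mapped is
    False when the fallback user had to be used.
    """
    # Users missing from user_map are assumed to map one-to-one.
    candidate = user_map.get(source_user, source_user)
    if candidate in destination_users:
        return candidate, True
    # The owner does not exist in the destination: use the fallback and
    # flag it so the item can be reviewed after the run.
    return fallback, False
```

Logging every item where the second value is `False` gives you a list of ownership gaps to review before go-live rather than after.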

Schema (data model) and metadata

Beyond the obvious mapping of metadata from one system to another, it’s important to understand the limitations of both the source and destination repositories. A few things that should be considered before mapping:

  • Do we need to adjust the encoding (decode/encode) before migrating data?
  • What are the limitations of various data types in the destination system?
    • Text field length
    • Special characters
  • Do we need to handle link aggregation?
  • Are any of the destination properties write-protected?
    • This could mean additional work to bypass system restrictions for certain fields
  • Do you need to explicitly define the data model? This is typically something that will need to be done pre-migration.
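Several of these checks can be run against the source data before anything is written. The sketch below assumes a 255-character text limit, a made-up set of disallowed characters, and a Windows-1252 source encoding. All three are placeholders, so substitute the real limits of your destination system.

```python
import re

# Hypothetical destination limits. Look up the real ones for your target system.
MAX_TEXT_LENGTH = 255
DISALLOWED = re.compile(r'[<>:"/\\|?*]')


def to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes from the (assumed) source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def check_text_field(name, value):
    """Return a list of problems found in one text property (empty if none)."""
    problems = []
    if len(value) > MAX_TEXT_LENGTH:
        problems.append(
            f"{name}: {len(value)} characters exceeds limit of {MAX_TEXT_LENGTH}"
        )
    bad = sorted(set(DISALLOWED.findall(value)))
    if bad:
        problems.append(f"{name}: disallowed characters {bad}")
    return problems
```

Running a check like this over the full source inventory ahead of time turns surprises during the migration into a to-do list before it.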

Cleaning house

Much like moving from one house to another, a content migration can be a very cleansing experience. 

  • Find and remove duplicates. Migrations can be a great opportunity to find and remove duplicated content in your system. Duplicate removal is great for reducing your digital footprint, ergo reducing infrastructure and license spending. Depending on your approach, duplicate detection and removal can add time to your migration by slowing down the processing of individual documents.
    • Filename duplicate removal is the least processing-intensive method. Identify duplicates based on the filename. When a filename shows up more than once, you have a potential duplicate.
    • Hashing is a bit more expensive but is more effective for duplicate removal. As you process content, generate a hash and store it. When a second document comes along with the same hash, you’ve found a duplicate.
    • How you handle found duplicates will vary based on your end goal. For example, you may want to remove the version of the duplicate with the oldest modified date.
    • Don’t mistake versions for duplicates! If your goal is to maintain versions and version history, plan accordingly!
  • Clean up messy data. Fix filenames, reformat fields to follow standards, or remove superfluous information.
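The hashing approach described above can be sketched in a few lines. This example assumes content is readable from the local filesystem and uses SHA-256. The function names are ours, and a real migration would likely keep the hashes in a database rather than in memory.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large documents do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (duplicate, first_seen) pairs for files with identical content."""
    seen = {}  # digest -> first path seen with that content
    duplicates = []
    for path in paths:
        digest = file_digest(path)
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates
```

Note that this only reports the pairs. Deciding which copy to keep, and whether a match is really a version rather than a duplicate, is still up to your migration rules.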

Dry run

It makes sense to have a dry run with a small subset of data. It is important to be diverse when selecting content for a dry run. Your subset should include a sampling of various source repositories, content types, and content sizes. Other things to consider when preparing for a dry run:

  • What does success look like?
    • This is a basic principle that should be applied to the whole migration, but consider what a successful dry run looks like before embarking. Think:
      • Performance
      • Data accuracy
  • UAT – Do you need to engage a user group for user acceptance testing?
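One way to get a diverse dry-run subset is to group the inventory by source repository, content type, and size, then sample from every group. The sketch below assumes each item is a dict with `repository`, `content_type`, and `size` (in bytes) keys. Those field names and the size thresholds are illustrative assumptions.

```python
import random
from collections import defaultdict


def size_bucket(size_bytes):
    """Classify an item by size. The thresholds are arbitrary examples."""
    if size_bytes < 1_000_000:
        return "small"
    if size_bytes < 100_000_000:
        return "medium"
    return "large"


def pick_dry_run_sample(items, per_group=5, seed=0):
    """Pick up to per_group items from every repository/type/size combination."""
    groups = defaultdict(list)
    for item in items:
        key = (item["repository"], item["content_type"], size_bucket(item["size"]))
        groups[key].append(item)

    rng = random.Random(seed)  # fixed seed so the dry run is repeatable
    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample
```

Sampling per group rather than across the whole inventory keeps rare combinations, such as a handful of very large files in one repository, from being left out of the dry run.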

Moving digital assets between content repositories takes a concerted effort. Having a plan in place beforehand can save time and several headaches.

Fika Technologies has helped several companies negotiate the perils of moving millions (and billions) of business critical documents between repositories. To find out how we can help you, visit our contact page and send us a message.
