We design, build and operate cloud-based enterprise data warehouse solutions
Cloud-based business intelligence solutions have significant advantages over traditional approaches. We have extensive experience in delivering cost-effective, high-performance solutions at massive scale (>100TB) on the AWS platform, in particular using Redshift and EMR (Spark). We specialize in enabling our clients to get value out of cloud-based business intelligence solutions and can help with all aspects of this, including evaluation, planning, design, development, testing, implementation, training and operations.
When acquiring data for the data warehouse from source systems, it can be useful to make a clear distinction between the time at which an event occurred and the time at which the event was recorded by the source system. In the simplest case, the source system records the event at the time it occurs, and the anomalies described below do not happen. But where there is a delay between the actual time of the event and the time the record of the event is received by the source system, there is a trap that needs to be avoided.
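As a minimal sketch of the trap described above (the record layout and timestamps here are hypothetical, not from any specific source system): if a record carries both an event time and a recorded time, aggregating by the wrong one silently shifts late-arriving events into the wrong period.

```python
from datetime import datetime

# Hypothetical event records carrying both timestamps:
# 'event_time' is when the event actually occurred;
# 'recorded_time' is when the source system received the record.
events = [
    {"id": 1, "event_time": datetime(2017, 3, 1, 23, 50),
     "recorded_time": datetime(2017, 3, 1, 23, 50)},   # recorded immediately
    {"id": 2, "event_time": datetime(2017, 3, 1, 23, 55),
     "recorded_time": datetime(2017, 3, 2, 0, 10)},    # late-arriving: recorded the next day
]

# Daily event counts, keyed two different ways.
by_event_day = {}
by_recorded_day = {}
for e in events:
    d1 = e["event_time"].date()
    by_event_day[d1] = by_event_day.get(d1, 0) + 1
    d2 = e["recorded_time"].date()
    by_recorded_day[d2] = by_recorded_day.get(d2, 0) + 1

print(by_event_day)     # both events fall on 2017-03-01
print(by_recorded_day)  # the late event is shifted to 2017-03-02: the trap
```

Keeping both timestamps in the warehouse lets each report choose the appropriate axis, and makes it possible to reprocess periods when late records arrive.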
On Tuesday I participated in an online panel on the subject of Continuous Improvement, as part of Continuous Discussions (#c9d9), a series of community panels about Agile, Continuous Delivery and DevOps.
Is there a way to get the cost savings of a transient EMR cluster while keeping the convenience of a long-running one?
This article about the recent S3 slowdown and recovery notes that AWS originally pursued the wrong root cause. There's always a risk of this happening. Here we discuss the benefits of being able to revert changes.
We found that one particular type of data warehouse ELT logic test provides especially high benefits for very limited effort.
We created a mechanism that we called "The Federator" for making data processed on one Redshift cluster available on other Redshift clusters. This post follows on from the introduction in part 1 and describes how we solved the challenge of dealing with large data volumes.
We created a mechanism that we called "The Federator" for making data processed on one Redshift cluster available on other Redshift clusters. This post introduces what we did.