We design, build and operate cloud-based enterprise data warehouse solutions
Cloud-based business intelligence solutions have significant advantages over traditional approaches. We have extensive experience in delivering cost-effective, high-performance solutions on a massive scale (>100TB) on the AWS platform, in particular using Redshift and EMR (Spark). We specialize in enabling our clients to get value out of cloud-based business intelligence solutions and can help with all aspects of this, including evaluation, planning, design, development, testing, implementation, training and operation.
Following Episode 71, which was about DevOps for Big Data, Episode 72 focused on databases, and it was great to be invited to take part in this one as well.
It was great to be invited back onto the Continuous Discussions podcast for an episode about big data.
Cloud BI's Pete Grant presented with The Economist's Bobby Gill and Looker's Sebastien Fabri at Big Data London on November 3rd.
When acquiring data for the data warehouse from source systems, it can be useful to distinguish clearly between the time at which an event occurred and the time at which the source system recorded it. In the simplest case, the source system records the event at the time it occurs, and the anomalies described below do not happen. But where there is a delay between the actual time of the event and the time the source system receives the record of it, there is a trap that needs to be avoided.
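As an illustration only, here is one way a trap of this kind can show up, assuming daily incremental loads that select rows by a time window. The rows, column names (`event_time`, `recorded_time`) and the `extract` helper are all hypothetical and are not taken from any particular source system. The sketch compares windowing on the event time with windowing on the recorded time when one record arrives late.

```python
from datetime import datetime

# Hypothetical source rows. Row 2 happened late on Jan 1 but was only
# recorded by the source system early on Jan 2 (a late-arriving record).
ROWS = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 9, 0),
     "recorded_time": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "event_time": datetime(2024, 1, 1, 23, 50),
     "recorded_time": datetime(2024, 1, 2, 0, 10)},
    {"id": 3, "event_time": datetime(2024, 1, 2, 8, 0),
     "recorded_time": datetime(2024, 1, 2, 8, 0)},
]


def extract(rows, column, start, end, run_at):
    """Return ids of rows that exist in the source at run_at and whose
    `column` value falls in the half-open window [start, end)."""
    return [
        r["id"]
        for r in rows
        if r["recorded_time"] <= run_at and start <= r[column] < end
    ]


# Two daily loads, each run at midnight for the day that just ended.
DAY1 = (datetime(2024, 1, 1), datetime(2024, 1, 2))
DAY2 = (datetime(2024, 1, 2), datetime(2024, 1, 3))


def run_loads(column):
    """Simulate both daily loads, windowing on the given column."""
    loaded = []
    for start, end in (DAY1, DAY2):
        loaded += extract(ROWS, column, start, end, run_at=end)
    return loaded


by_event = run_loads("event_time")        # row 2 is never picked up
by_recorded = run_loads("recorded_time")  # row 2 arrives with the Jan 2 load

print("windowed on event_time:   ", by_event)
print("windowed on recorded_time:", by_recorded)
```

In this toy setup, the load windowed on event time silently skips the late record, because it did not exist yet when its window was extracted and its event time falls outside every later window. Windowing the extract on the recorded time, while still keeping the event time as an attribute for analysis, avoids that gap under these assumptions.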
Large table rebuilds need to be handled by the build.
Is there a way to make use of the cost savings of a transient EMR cluster and still have the convenience of a long-running version?
We found that one type of data warehouse ELT logic test provides particularly high benefits for very limited effort.