I have been sidetracked by a lot of projects since the last post.
But let me share a project with you that builds on the knowledge from the last post.

I have been working on reworking / redesigning our current in-store music streaming system.
While it works well, it's not really easy to scale.
It's based on a Windows server containing our application services & MySQL, plus standard Linux web hosts for streaming the actual music.

The issues here are:
  • MySQL is a liability - a single point of failure ( unless you build a large and complicated deployment ), a bottleneck ( it doesn't scale well with writes ), and data loss is likely ( we had a few DB failures & corrupt tables over the years ).
  • Standard web hosting required us to manually upload the files by FTP to each server once they were tagged and indexed.

The goals were:
  • High reliability - no single point of failure
  • Automatic failover in case a server dies
  • No administration
  • Easily scalable

Those who read Part 1 of this series will know that any machine that has to be always on should not be placed on Azure / Amazon AWS - you simply get too little for your money.
And since the bare-metal servers we already have are so powerful, they have lots of capacity to spare - so the decision was made to run on our existing hardware from Hetzner and Leaseweb ( giving geo redundancy ), which were really just bored with their current jobs.

The Search For a MySQL Replacement
After trying out pretty much all the free DBs available, I found just ONE(!) that actually works out of the box on Windows, can deliver high availability, doesn't require the servers to be on a LAN, and doesn't involve a lot of manual settings like names & IP addresses that would need to be updated each time a server is decommissioned / instanced.
It's also the only one with a fully working web interface that didn't crumble during my testing abuse :)
The last candidate standing was: Couchbase
( although I did find a memory leak in Couchbase if you abuse it like I did - they are looking into it )

So now we have one cluster of two quad-core / 32 GB RAM servers as a testing base, with a second cluster for backup & replication.
So what does that give us?
  • There is very little management - all needed actions & the current status are available at any time from a good web interface.
  • Low cost - you can use consumer hardware, and the nodes don't need any special connections to work as a cluster.
    In this case I'm just using the RAM & CPU cycles that were free on the servers to begin with.

  • Couchbase automatically rebalances & replicates data between nodes; there is no master, so no single point of failure.
  • Easily scalable - queries are load balanced between servers ( adding more servers -> quicker response times for ALL queries ).
All in all a good match - we have the database covered. ( The issues I had with not being able to rely on SQL are another matter in itself - be ready to rethink how to query / structure the data. )
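To make the "rethink how to query / structure the data" point concrete, here is a minimal Python sketch of the key/value mindset: instead of normalized SQL rows joined at query time, each track becomes one denormalized JSON document under a predictable key, so a lookup is a single get by key. A plain dict stands in for the Couchbase bucket here; the key scheme and field names are hypothetical, not our actual schema.

```python
import json

bucket = {}  # stand-in for a Couchbase bucket (real client calls omitted)

def save_track(track_id, title, artist, stores):
    # Embed everything a query will need; there is no JOIN to lean on.
    doc = {"type": "track", "title": title, "artist": artist,
           "stores": stores}
    bucket[f"track::{track_id}"] = json.dumps(doc)

def get_track(track_id):
    # One key lookup replaces the old SELECT ... JOIN.
    raw = bucket.get(f"track::{track_id}")
    return json.loads(raw) if raw else None

save_track(42, "Song A", "Artist B", ["store-1", "store-7"])
print(get_track(42)["stores"])  # → ['store-1', 'store-7']
```

The design cost is duplication: if a store list changes, every document embedding it must be rewritten - which is exactly the trade-off to plan for up front.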

How to handle streaming and administration of files
To have a no-administration system you need a reliable backend.. we already have Couchbase - how about using that as a file store?
Sadly that won't do - Couchbase has a 20 MB document size limit, so that's a no go.
The second option would be to use Amazon S3 or Azure storage, and just let them handle the sharding and balancing.
While it works perfectly even in high-stress situations, the high bandwidth pricing ( it costs more to move a file once than to store it for a full month ) makes it a really bad deal.
So what I did was create a hybrid approach:
The program that reads the music tags from the files automatically splits, obfuscates and uploads them to Azure storage ( might just as well be S3 ).
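The split-and-obfuscate step could look something like the following sketch. This is not our actual tool - the chunk size and the single-byte XOR are purely illustrative stand-ins ( XOR is trivially reversible and offers no real protection, just opacity against casual inspection ), but the shape is the same: cut the file into fixed-size parts and transform each part before upload.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MB parts
XOR_KEY = 0x5A                # illustrative only - not real protection

def obfuscate(data: bytes) -> bytes:
    # XOR with a fixed byte; applying it twice restores the original.
    return bytes(b ^ XOR_KEY for b in data)

def split_for_upload(data: bytes):
    # Fixed-size chunks, each obfuscated, ready to push to blob storage.
    return [obfuscate(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)]

def reassemble(parts):
    # XOR is its own inverse, so the same transform restores the file.
    return b"".join(obfuscate(p) for p in parts)

song = b"ID3" + bytes(1000)          # fake mp3 payload
assert reassemble(split_for_upload(song)) == song
```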
Then I created a lightweight cache web server that really only does two things: when a file is requested, it looks in the local cache.. if the file is not there, it requests it from Azure and stores it locally.. once that is done, it starts streaming.
With a 4 MB mp3 that's usually over in 1-2 seconds. ( While these servers carry a high upload load, the download bandwidth is free to fetch new cache files even in peak hours. )
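The core decision the cache server makes is small enough to sketch in a few lines. Here the Azure fetch is stubbed out as a callable ( `fetch_from_origin` is a hypothetical hook, and a real server would stream rather than read the whole file ), but the miss-then-fill logic is the point:

```python
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # stand-in for the server's cache folder

def serve_file(name, fetch_from_origin):
    path = os.path.join(CACHE_DIR, name)
    if not os.path.exists(path):          # cache miss
        data = fetch_from_origin(name)    # pull from Azure / S3 once
        with open(path, "wb") as f:       # keep it for every later request
            f.write(data)
    with open(path, "rb") as f:           # serve from local disk
        return f.read()

origin_hits = []
def fake_azure(name):
    origin_hits.append(name)
    return b"mp3-bytes"

serve_file("song.mp3", fake_azure)
serve_file("song.mp3", fake_azure)
print(len(origin_hits))  # → 1: the second request never touched Azure
```

The second request is the whole economic argument: the paid origin bandwidth is spent once per file per server, and everything after that is free local disk.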

So what does that give us?
  • There is no management - KISS
  • In case a server dies, all the others can still handle any request - no single point of failure
  • Easily scalable - just rent some new servers and copy-paste a single exe.. it's ready for use.
  • As Azure only functions as a backup & replication service, we can easily set up new servers & new files without having to worry about how many replicas are available - and we don't pay the high bandwidth expenses for our normal usage.
    This can easily be extended so servers try to get the files from each other before asking Azure, reducing the costs of instancing new servers even further.
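That peer-first extension is a one-function change to the fetch path. A hypothetical sketch, with peers and Azure both modeled as callables ( the names are made up; a real version would hit the other cache servers over HTTP ):

```python
def fetch(name, peers, from_azure):
    # Try sibling cache servers first - server-to-server traffic is free.
    for peer in peers:
        data = peer(name)
        if data is not None:
            return data
    # Only fall back to Azure's paid bandwidth as a last resort.
    return from_azure(name)

azure_hits = []
def azure(name):
    azure_hits.append(name)
    return b"from-azure"

peer_without = lambda name: None         # peer that hasn't cached it
peer_with = lambda name: b"from-peer"    # peer that has

print(fetch("a.mp3", [peer_without, peer_with], azure))  # → b'from-peer'
print(azure_hits)                                        # → []
```

A new server instance can then warm its cache almost entirely from its siblings, touching Azure only for files no peer holds yet.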