Wednesday, 19 October 2011

Moving the home of Glimmer-CISM

So BerliOS is shutting down and we need to find a new home for Glimmer-CISM. We only recently moved from NeSCForge to BerliOS and changed from cvs to svn. We recently discussed whether we want to change to git. So this might be a good opportunity.

Anyway, the new home for Glimmer-CISM should
  • provide a code repository
  • have a web front end to the repository
  • have mailing lists
  • have a bug tracker
  • have forums
  • have a wiki
  • have web space
  • make it easy to set up and administer developers
  • allow us to take backups of at least the repository and, given the recent event, the other data as well
  • be free?

One of the advantages of using git is that private branches would no longer be required, since anyone interested in a private repo could use their own git branch instead.

The demise of BerliOS is very sad indeed. We should be careful when choosing a new repo so that we don't have to move again in the near future. There are instructions for how to move data out of BerliOS.

Wikipedia has a comparison of software hosting sites. A number of possibilities are
  • launchpad is the project hosting system run by canonical, the company behind ubuntu. They offer a friendly welcome to BerliOS projects looking for a new home. The only source repository they offer is bazaar, which is a distributed version control system similar to git. They offer the usual stuff such as mailing lists, bug tracking and forums.
  • sourceforge is one of the original code repositories. It is huge: it contains over 300,000 projects. It provides subversion, git, hg, bazaar, mailing lists, forums and news. Their hosted apps include trac.
  • google code is the google project hosting service. It provides git, hg and subversion with 2GB of storage and 2GB of download storage space, issue trackers and a wiki. Mailing lists are provided by google groups. It integrates with google sites.
  • github is a commercial offering although open source projects with public repos can use the site for free.
  • gitorious offers git repos and project wikis

Out of the list above, I would consider the following three as particularly interesting:
  • Google code offers a very minimalistic environment but everything we want. One advantage of google code is that most of us probably already have google accounts, so we would not need another account. If you prefer, you do not need to have a google account to use google code. google code is unlikely to go away, although they might decide to advertise on it. google code offers an API to import/export data from the issue tracker.
  • Sourceforge is huge and also unlikely to go away. I like the extra apps, which include trac and MediaWiki. Data from the hosted apps can be backed up locally, which would allow us to move elsewhere. Mailing lists, forums, etc. can also export data.
  • github looks very nice. It offers nice project management tools to organise developer teams, code review and a graphical representation of the project branches. This graph feature is very cool; have a look at the redis network graph as an example. Their wiki offering sits on top of git. Downsides of github are that we would need to find a different place for our mailing list (google?) and we would need to use git.

In the end, I think github is rather funky, albeit proprietary, whilst google code is very minimalistic but sufficient.

Wednesday, 1 June 2011

Parallel Frameworks

I have been looking into workflow systems that could be used to tie together tasks. The system needs to
  • be able to handle dependencies
  • work on a workstation/laptop and a cluster using some middleware (I guess Oracle Grid Engine)
  • remember the tasks it has already computed
  • allow workflows/scripts to be nested and multiple instances to be run
My initial thought was to use meandre and its scripting language zigzag. I am no longer convinced that this is the way forward. In particular, I think it fails in that it does not remember which tasks it has already computed. I am also not sure how easy it is to set up on a stand-alone machine.
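
To make the first and third requirements concrete, here is a minimal sketch of what "handle dependencies" and "remember the tasks it has already computed" could look like. It is not taken from meandre or any other framework mentioned here; the `Workflow` class, the JSON state file and the task names are all made up for illustration, and results are assumed to be JSON-serialisable.

```python
import json
import os
import tempfile


class Workflow:
    """Toy workflow: named tasks, dependencies, results remembered on disk."""

    def __init__(self, state_file):
        self.state_file = state_file
        self.tasks = {}  # name -> (function, list of dependency names)
        self.done = {}   # name -> stored result
        if os.path.exists(state_file):
            with open(state_file) as fh:
                self.done = json.load(fh)

    def add(self, name, func, deps=()):
        self.tasks[name] = (func, list(deps))

    def run(self, name, _active=None):
        """Run `name` after its dependencies, skipping anything already stored."""
        if name in self.done:
            return self.done[name]
        active = _active if _active is not None else set()
        if name in active:
            raise ValueError("dependency cycle at " + name)
        active.add(name)
        func, deps = self.tasks[name]
        args = [self.run(dep, active) for dep in deps]
        result = func(*args)
        active.discard(name)
        self.done[name] = result
        with open(self.state_file, "w") as fh:
            json.dump(self.done, fh)  # persist after every finished task
        return result


calls = []  # records which task functions were actually executed


def load():
    calls.append("load")
    return [1, 2, 3]


def total(values):
    calls.append("total")
    return sum(values)


def build(state_file):
    wf = Workflow(state_file)
    wf.add("load", load)
    wf.add("total", total, deps=["load"])
    return wf


def demo():
    calls.clear()
    with tempfile.TemporaryDirectory() as tmp:
        state = os.path.join(tmp, "state.json")
        first = build(state).run("total")
        second = build(state).run("total")  # new object, same state file
    return first, second, list(calls)
```

Calling `demo()` runs the workflow twice against the same state file; the second invocation finds both results on disk and executes nothing. The sketch never invalidates stored results, so a changed input would not trigger a recomputation, which a real system would have to deal with.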

Anyway, Mike suggested looking at ganga, which is a job creation framework written in python. It supports different backends such as local hosts and SGE. It maintains state between invocations. Interestingly, it supports job trees, which would map nicely onto the problem at hand. So, I think ganga needs to be seriously considered. It is licensed under the GPL, which might be problematic.

Another project which looks interesting is jug, another python framework for tying together collections of tasks. Tasks are coordinated via files in a particular directory. This works over NFS and can therefore be used by SGE. Workers can be added dynamically. I wonder if they can also be removed.
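
The idea of coordinating workers through files in a shared directory can be sketched in a few lines. This is only an illustration of the general technique, not jug's actual implementation: the function names and the `.lock` naming are invented, and it assumes the file system creates files atomically when asked for exclusive creation.

```python
import os
import tempfile


def try_claim(lock_dir, task_name):
    """Return True if this worker created the lock file, i.e. owns the task."""
    path = os.path.join(lock_dir, task_name + ".lock")
    try:
        # O_EXCL makes the call fail if the file already exists
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # note who holds the task
    return True


def work(lock_dir, task_names, limit=None):
    """Claim unclaimed tasks in order; return the names this worker took."""
    mine = []
    for name in task_names:
        if limit is not None and len(mine) >= limit:
            break
        if try_claim(lock_dir, name):
            mine.append(name)  # a real worker would compute the task here
    return mine


def demo():
    tasks = ["frame-%02d" % i for i in range(4)]
    with tempfile.TemporaryDirectory() as lock_dir:
        first = work(lock_dir, tasks, limit=3)  # first worker stops after three
        second = work(lock_dir, tasks)          # a worker that joins later
    return first, second
```

In `demo()` the second worker is added after the first has started and simply picks up whatever is still unclaimed. In this sketch a worker that disappears mid-task leaves its lock file behind, so removing workers cleanly would need some extra mechanism such as stale-lock detection.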

Finally, I also came across the wonderful GNU parallel program. It works similarly to xargs but will execute commands in parallel depending on the number of available cores (it also works with remote machines). This is brilliant for generating animations.
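
For comparison, the core idea (run one command per input, with as many running at once as there are cores) looks roughly like this in python. This is a rough analogue rather than a description of how GNU parallel works internally; `run_parallel` is a made-up helper, and only the `{}` placeholder is borrowed from GNU parallel's replacement string.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(command, inputs, jobs=None):
    """Run `command` once per input, replacing "{}" with the input.

    Results come back in input order as (input, return code, stdout).
    """
    jobs = jobs or os.cpu_count() or 1  # default: one job per core

    def run_one(item):
        argv = [part.replace("{}", item) for part in command]
        proc = subprocess.run(argv, capture_output=True, text=True)
        return item, proc.returncode, proc.stdout.strip()

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, inputs))


def demo():
    # Stand-in for a real per-frame command: upper-case the argument.
    command = [sys.executable, "-c",
               "import sys; print(sys.argv[1].upper())", "{}"]
    return run_parallel(command, ["a", "b", "c"])
```

For an animation, the inputs would be the frame numbers or file names and the command whatever renders a single frame.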