Draconis Software Blog

Backing up a Subversion repository

We maintain a lot of projects in version control, both internal ones (like our monitoring/management app RSP, and our website) and client projects (some of these clients have direct access). Originally we used CVS to handle versioning, but over time we migrated to Subversion, a great replacement. Since moving to Subversion, however, we’ve had to rethink how backups are done, as one of the key differences with Subversion is how it stores information on disk. Instead of storing flat files with versioning information inside each file, it uses either Berkeley DB or a custom file system (called FSFS, a filesystem within the host’s filesystem). This gives the application a lot more flexibility (directories are properly versioned as well as files), but can make backups a bit tricky. Let’s take a look at how to back up a Subversion repository (it’s not too difficult).

The trick here is dealing with a live repository: since users could be making changes to this repository at any given time, doing a recursive filesystem copy isn’t ideal. If a user were making a change while this copy was taking place, the backup could be missing data, or could simply be corrupt, which defeats the purpose of making the backup in the first place! A better way is to either freeze the Subversion server temporarily, until the backup completes, or to do a hot-copy.
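
As a rough illustration of the hot-copy route, here is a minimal Ruby sketch built around `svnadmin hotcopy`, the Subversion command for copying a live repository. The helper names and the paths in the usage note are assumptions made up for this example, not part of our actual setup.

```ruby
# Hypothetical helpers for backing up a live repository with `svnadmin hotcopy`.

# Build a dated destination path such as /backups/myrepo-20070305.
def backup_destination(repo_path, backup_root, now = Time.now)
  name = File.basename(repo_path)
  File.join(backup_root, "#{name}-#{now.strftime('%Y%m%d')}")
end

# Build the argument list for the hot copy.
def hotcopy_command(repo_path, destination)
  ["svnadmin", "hotcopy", repo_path, destination]
end

# Run the hot copy; returns true only when svnadmin exits successfully.
def backup_repository(repo_path, backup_root)
  destination = backup_destination(repo_path, backup_root)
  system(*hotcopy_command(repo_path, destination))
end
```

A nightly job could then call something like `backup_repository("/var/svn/myrepo", "/backups")` (both paths are placeholders) and raise an alert when it returns anything other than true.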

(Read the article)

(Further) cutting down on the spam

As I mentioned recently, this blog is just inundated with spam on a regular basis, and keeping it from making it into comments is quite a chore. The first step in fixing the problem was to install a CAPTCHA system (in our case, we used a simple math question rather than a hard-to-read image), but it doesn’t solve all the problems. For instance, the other source of huge levels of spam was fake trackbacks. Solving this, however, was so simple I should have done it from day one.

The solution was to install a simple WordPress plugin that checks each trackback for a legitimate link to our blog: it loads the referenced page and searches it for a link. If there isn’t one, the trackback is marked as spam. Simple, and it’s reduced the number of spam comments that reach our moderation queue to near-zero.
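
The check itself is easy to picture. The following Ruby sketch is not the plugin's code (the plugin is PHP, and I haven't reproduced its logic); it only illustrates the idea of searching a fetched page for a link back to the blog. The method names and the example URL are hypothetical.

```ruby
# Hypothetical helper: does the HTML of a trackback's source page link to our blog?
# A simple regular expression is used here; a real checker would more likely
# parse the HTML properly.
def links_to_blog?(html, blog_url)
  pattern = /<a\s[^>]*href\s*=\s*["']?#{Regexp.escape(blog_url)}/i
  !!(html =~ pattern)
end

# Classify a trackback given the already-downloaded page body.
def trackback_status(html, blog_url)
  links_to_blog?(html, blog_url) ? :ok : :spam
end
```

The page body could be fetched with `Net::HTTP.get(URI(source_url))` from Ruby's standard library before being passed to `trackback_status`.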

Between the trackback checker and the CAPTCHA, blog spam is much more manageable.

System Administration with Ruby

Ruby gets a lot of (well deserved) press because of Ruby on Rails, but recently I’ve found it to also be an excellent choice for scripting tasks, jobs that I otherwise would have used Perl for.

The first benefit of using Ruby is that its built-in code blocks allow for simpler, shorter code. A Ruby block is basically an anonymous function that can be passed to another function as a parameter. For example, here’s how you could take an array of numbers and square each one:

RUBY:
  nums = [1, 2, 3, 4, 5]
  nums.collect { |x| x * x }    # produces [1, 4, 9, 16, 25]

Blocks are used for all sorts of purposes, like looping, iterating, sorting and mapping. Once you get used to them, they become quite intuitive and save a lot of coding time.
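
To make that list concrete, here is a small sketch showing a block used for looping, filtering, sorting and mapping. The `summarize` method and its sample data are invented for illustration.

```ruby
# Hypothetical demo method: each step passes a block to a built-in Array method.
def summarize(nums)
  # Looping: the block runs once per element.
  total = 0
  nums.each { |n| total += n }

  # Filtering: keep only the elements for which the block returns true.
  evens = nums.select { |n| n.even? }

  # Sorting: the block supplies the sort key (negated for descending order).
  descending = nums.sort_by { |n| -n }

  # Mapping: the block's return value becomes the new element.
  labels = nums.map { |n| "item-#{n}" }

  { :total => total, :evens => evens, :descending => descending, :labels => labels }
end
```

Calling `summarize([5, 3, 8, 1])` gives a total of 17, the even numbers `[8]`, the descending list `[8, 5, 3, 1]`, and one label per number.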

In particular, I’ve found code blocks to be very useful with database programming. Here’s an example that uses the mysql Ruby library to do a query:

RUBY:
  require 'mysql'

  db = Mysql.new("localhost", "root", "password")
  db.select_db("a_database")

  result = db.query "select id,name from users"
  result.each do |row|
    # each row is an array of column values, in select order
    id, name = row
    puts "ID: #{id}, Name: #{name}"
  end

Note that “do/end” is a multi-line alternative to curly braces.

In Ruby, everything is an object, even primitive types like numbers and strings. The language makes it quite easy to create your own classes as well, and because of this I find myself creating classes more often than I normally would while writing scripts. In Perl, OO programming is still rather awkward, so it’s often simpler just to use scalars, arrays and hashes in various combinations (for example, “a hash where the keys are names, and the values are arrays where the first element is a number”, etc). The problem is that this tends to produce brittle code in which bugs are common.

Ruby classes can be written inline with procedural statements, and are usually very short:

RUBY:
  # A simple class to represent a person
  class Person
    attr_accessor :first_name, :last_name, :age

    def initialize(first_name, last_name, age)
      @first_name = first_name
      @last_name = last_name
      @age = age
    end
  end

  joe = Person.new("Joe", "Schmo", 25)

The “attr_accessor” line is a useful declaration that creates public reader and writer methods for each name, backed by an instance variable of the same name.
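
As a quick sketch of what that declaration buys you, the trimmed-down class below (two attributes only, chosen for brevity) is read and updated through the generated methods rather than through hand-written getters and setters.

```ruby
# A cut-down Person, for illustration only.
class Person
  attr_accessor :first_name, :age

  def initialize(first_name, age)
    @first_name = first_name
    @age = age
  end
end

joe = Person.new("Joe", 25)
joe.age += 1    # calls the reader (age), then the writer (age=)
puts joe.age    # prints 26
```

If only the reader is wanted, `attr_reader` can be used instead, which leaves the attribute read-only from outside the class.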

Two common tasks of scripts are working with operating system calls, and string parsing/manipulation. In these aspects, Ruby works much like Perl. You can use backticks (the ` character) to run a command and return its output. For working with strings, Ruby regular expressions work much like Perl’s. Here’s an example that uses both:

RUBY:
  # Look for a resolve error
  address = "example.com"    # hostname to check (placeholder)
  resolve = `resolveip #{address}`
  puts "Host not found" if resolve =~ /host not found/
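
Regular expressions can also pull values out of command output, not just test it. The sketch below assumes `uptime`-style text; the exact format differs between platforms, so treat the pattern and the method name as illustrative rather than definitive.

```ruby
# Extract the three load averages from `uptime`-style output.
# Returns an array of floats, or nil when the text does not match.
def parse_load_averages(text)
  match = text.match(/load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/)
  match && match.captures.map { |value| value.to_f }
end

# Usage on a live system:
#   loads = parse_load_averages(`uptime`)
#   puts "High load" if loads && loads.first > 4.0
```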

All of this being said, there are still many places where I feel that Perl is a more appropriate choice for sysadmins. While Ruby has a growing set of libraries, nothing can beat the breadth of Perl modules available through CPAN. For a very specific task (say, interfacing with the /proc filesystem) Perl is often a better choice simply because of the amount of publicly available code out there.

The next time you have to do some administrative scripting, consider giving Ruby a try. Even if you decide to stick with Perl (or whatever your language of choice), it’s a great way to try out Ruby if you’re not familiar with it.

Project Management with activeCollab

We've been looking for a good way to manage the many projects we've been working on lately, with efforts spanning several clients, different developers, and all sorts of other complexity, and recently gave activeCollab a spin. If you haven't seen it yet, it's a great open source project management tool without a lot of the bloat (plus, it doesn't have any of the restrictions found in Basecamp, a tool we were also considering). I've been quite happy with it so far, though we've only just begun using it.

The idea is to allow access for many of our clients to the activeCollab portal throughout the relationship, making it easier for all of us to communicate progress. Of course, nothing will replace those good ol' regular status reports, phone conferences, etc, but this gives clients a better understanding (and a more direct line of communication to developers) while working on their project.

My biggest concern going forward is keeping things fresh. We've tried using SugarCRM in the past to manage clients, contacts, projects, and other data, but it just wasn't used as much as it could have been (people just didn't keep it updated or use it on a regular basis, myself included). Part of this could be due to it being a change in a regular routine, but I think there was something else. After all, after trying Sugar we rolled out a wiki based on MediaWiki, and that was a great success. So I believe the main problem with Sugar was that it was not quite what we needed. Incidentally, Sugar has an interesting project management module for the Sugar Enterprise product, though we’re not ready to make a purchase for a tool like this yet. Hopefully activeCollab will be a success.

How to Deal with Log Overload

Even a relatively inactive server can produce a large volume of logs, and a high-use machine serving multiple functions will probably have so much log activity that it becomes impossible to discover anomalies, resource problems, or potential attacks. Luckily, this copious amount of data can be made manageable without too much trouble.

Consolidate logs across the network

If you have more than one machine to keep track of (and what sysadmin doesn't?), the log problem only multiplies. One thing that can help in this situation is remote logging. For example, syslog makes it easy to send logs over the network to a single destination. In large networks this may be a dedicated logging machine. In any case, this at least somewhat reduces the complexity of the situation, although it does not help with the sheer amount of information produced.
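
Remote logging is normally switched on in the syslog daemon's own configuration, but it can help to see what actually travels over the wire. The Ruby sketch below formats a message in the classic BSD syslog layout (RFC 3164) and sends it over UDP port 514. It assumes the central machine's syslog daemon is configured to accept network messages, and the method names and the host name in the usage note are hypothetical.

```ruby
require "socket"

# Numeric codes from the BSD syslog protocol (RFC 3164).
FACILITY_USER   = 1
SEVERITY_NOTICE = 5

# Format a message in the classic BSD syslog layout:
#   <PRI>Mmm dd hh:mm:ss hostname tag: message
# where PRI is facility * 8 + severity.
def syslog_packet(hostname, tag, message, now = Time.now,
                  facility = FACILITY_USER, severity = SEVERITY_NOTICE)
  priority = facility * 8 + severity
  "<#{priority}>#{now.strftime('%b %e %H:%M:%S')} #{hostname} #{tag}: #{message}"
end

# Send one packet to the central log host over UDP port 514.
def send_to_log_host(log_host, packet)
  socket = UDPSocket.new
  socket.send(packet, 0, log_host, 514)
ensure
  socket.close if socket
end
```

A script on a web server might then call `send_to_log_host("loghost", syslog_packet("web1", "backup", "nightly backup finished"))`, where `loghost` stands in for whatever name the central logging machine has.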

I'll usually leave a window open with a connection to the central logging machine with a running output of logs (this can be done with `tail -f`). This helps me catch some issues as they occur, but it's easy to miss entries if I'm not at the computer or not paying attention to the running log.

(Read the article)

Web 2.0 & Death of the Network Engineer

GigaOM is running a great article today about the changing environment faced by network engineers. As high-performance, well-optimized Internet providers become ubiquitous, and access to the Internet approaches commodity status, what role does a network engineer play in today's new economy? The article raises the question of a network engineer's place: is it primarily with the Internet service provider, ensuring service is available and customers have access (think of a lineman for the telephone company), or is there still a place for an experienced network engineer supporting a company's customer-facing operations? As the article says, service-oriented Internet companies, providing services to millions of users, may no longer need network engineers on their staff to support these operations.

To this CTO, knowing the details of his network and server infrastructure was like knowing the details of the local utility electricity grid – not required. Is this a bad thing, or proof that networking technologies have succeeded?

The question posed is this: do companies building Internet-oriented products, Web 2.0 service companies for instance, need network engineers to keep their systems running? Or does it make more sense to outsource these kinds of operations to a third party (for instance, hosting everything via a virtual server or other hosting provider)?

(Read the article)

Cutting Down on the Spam

I've been getting annoyed lately with the deluge of spam this blog receives. For a blog without particularly regular content (and without a very large audience, either), we seem to be inundated with spam. So, I went looking for a solution.

When it comes to fighting blog spam, there are really two routes: setting up a comment filtering system to weed out comments that match a set of filters, or adding a CAPTCHA component. One of my biggest gripes with most CAPTCHA systems is the ugliness of the solution: the images are made as difficult as possible to read, which also makes it as difficult as possible for a human to post a comment. Well, I found a different solution that I am much happier with: a simple math question, asking users to solve an equation before allowing the post to go through.
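
The math-question idea is simple enough to sketch in a few lines of Ruby. This is not the code of the plugin we installed, just a minimal illustration of the technique; the `MathQuestion` name, the single-digit addition, and the helper method are all assumptions.

```ruby
# Hypothetical math-question CAPTCHA: ask for the sum of two small numbers.
MathQuestion = Struct.new(:left, :right) do
  # Text shown next to the comment form.
  def prompt
    "What is #{left} + #{right}?"
  end

  # Accept the submitted form value only if it is exactly the right number.
  def correct?(submitted)
    answer = submitted.to_s.strip
    answer.match?(/\A\d+\z/) && answer.to_i == left + right
  end
end

# Pick two random digits between 1 and 9 for a new question.
def new_math_question(rng = Random.new)
  MathQuestion.new(rng.rand(1..9), rng.rand(1..9))
end
```

In a real comment form the server would also need to remember which question it asked (for example in the visitor's session), so that the answer cannot simply be replayed.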

(Read the article)

Complete, free Mac backup

There's a nice article on Lifehacker about backing up your Mac using several free tools. As I've said time and again, backups are key! The article gives a good introduction to the various tools available for Mac backups.

As pointed out in the article, the next iteration of Mac OS X (Leopard) will see a new feature, Time Machine, which will provide built-in revisioning of documents (very cool), as well as the ability to back up to a different hard drive (including a remote-mounted disk). I'm looking forward to this (and the Spaces feature) especially, but until then, another tool needs to fill the need of backing up your Mac.

(Read the article)

The rise of the local web application

I do a lot of data backups on CDs and DVDs, and I recently realized that I had no organized way of figuring out which files were located on which discs. Rather than looking for a piece of management software, I decided it would be easier to just write an app myself in Ruby on Rails.

Nowadays, agile development tools such as Ruby on Rails allow you to develop applications so quickly that it's often easier to write code yourself, even if a similar tool already exists.

Imagine if I had tried to use an existing tool to manage my backups, rather than doing it myself. I would have needed to take the time to:

  • Research available tools and determine which one fits my needs
  • Set up and install the program
  • Customize it for my needs and environment
  • Learn to use it

During the time it takes to complete these tasks, it's just as easy (and frankly more fun) to code an app myself, which will be customized to my exact needs from the beginning, and requires no learning time.

As agile development becomes more widely used, I think this "do-it-yourself" practice will also become more common, certainly among system administrators, who already spend time writing customized scripts and GUI apps. This is the topic of a recent IBM developerWorks article, Develop Web applications for local use, which encourages web application development in place of GUI or command-line applications.

An added benefit of this practice is that the developer might discover that the application fills a need felt by others, and can turn their local webapp into an actual website or product. That's how RSP started: it began as a tool for internal use, before we discovered that it could be useful to the outside world.