Draconis Software Blog

Twitter: Firehose will not be made public, available only to “small group of trusted partners”

Ever since Twitter disabled access to their firehose last year, many users have been waiting with bated breath for its return.  The “firehose” refers to the stream of all public Twitter posts.  Currently, it’s only possible to get a small subset of all public posts, and many types of Twitter applications, such as real-time tracking and trend analysis, aren’t possible without access to the firehose.

For a time, Twitter planned to allow firehose access through a service called Gnip, but in October stated that they would instead work on providing access themselves.  Since then, details have been sparse on the timeline or the methods by which access will be given.

Last week, however, some more information was quietly released by means of an FAQ on the Twitter API website.  Here’s the question and answer in full (emphasis theirs):

When will the firehose be ready?

By late January, early February 2009. For at least Q1 2009, the “firehose” (the near-realtime stream of all public status updates on Twitter) will only be available to a small group of trusted partners. The firehose is a stream HTTP solution; a client connects to it and the stream begins, ceasing only when the client disconnects. Once we’re confident in the stability of the service, we’ll add partners on a case-by-case basis. We may allow a wider selection of clients to consume subsets of the public stream (that is, updates from a collection of user IDs or matching specific search terms). We do not intend to allow anonymous, unregulated public access to this stream for any number of legal, financial, and technical reasons.

There are a few pieces of new information here.  First, some kind of beta group will be given firehose access within a few weeks, using HTTP streams.  This sounds similar to the solution provided by Gnip to their users.
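To make the “stream HTTP” idea a bit more concrete, here is a minimal Python sketch of how a client might split a long-lived response into individual updates. It assumes the stream delivers one update per newline-terminated line, which the FAQ does not actually specify; the function name `iter_updates` and the sample payloads are hypothetical.

```python
from typing import Iterable, Iterator


def iter_updates(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield complete newline-delimited updates from a stream of byte chunks.

    Assumption: one update per line. The real firehose framing is not
    described in the FAQ, so this is illustrative only.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may contain zero, one, or several complete lines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if line:  # skip blank keep-alive lines
                yield line.decode("utf-8")
    # Whatever remains in the buffer is an incomplete update and is
    # dropped when the connection closes.


# Simulated network chunks; chunk boundaries do not line up with updates.
chunks = [b"first upd", b"ate\nsecond update\n\nthi", b"rd update\npartial"]
updates = list(iter_updates(chunks))
print(updates)
```

In a real client the chunks would come from an HTTP library’s streaming response rather than a list, and the loop would simply run until the client chose to disconnect.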

Perhaps more important though is the news that a full public stream of the type previously provided will not be returning.  While providing subsets of the public stream could be useful for things like groups, without the full firehose it’s very difficult (if not impossible) to provide a feature like real-time tracking, which has been eagerly awaited.

The three reasons given for not making the firehose public (“legal, financial, and technical”) each raise additional questions.  Legally, is there a difference between providing public tweets in a full stream and providing tweets publicly by user (which requires knowing the username ahead of time)?  Do the financial motivations refer to saving money on servers and bandwidth, or to making money by providing access for a fee?  And technically, are the existing solutions (such as HTTP streams or XMPP) insufficient for the task, and if so, how?

Highlights from 37signals Live

Earlier this afternoon David Heinemeier Hansson and Jason Fried from 37signals did a live webcast in which they answered questions and talked about some of their upcoming projects.  They spoke about their upcoming book, plans for integrating across all their products, and how they feel about the iPhone and Android.

Architectural Changes

37signals has been trying out Amazon’s EC2.  Rather than implementing it across all their products (a big decision to make, and time consuming to implement) they’ve started by trying it out on Tadalist, which is their smallest and simplest product.  This exemplifies one of the methodologies they use: avoiding huge decisions by first trying small implementations.

They also did this with regard to translations.  Translating all 37signals products into multiple languages is a scary idea.  So instead, DHH tried translating only Basecamp to Danish, the other language he speaks.

New Features and Designs

One of the larger developments 37signals has been working on is a “37 ID”.  This is a global namespace across all of their products, so that users won’t have to log in multiple times as they switch products.  It will also aid in product integration, which has been a common question among users.  In the past, they didn’t have a good way to determine which users were the same person across their products.  For example, they wouldn’t know if “dhh” on Basecamp was the same as “dhh” on Campfire.  It will also allow them to start selling their products as a suite, whereby customers could sign up for some or all of the products at once, perhaps with a bulk discount.  37 ID will be rolled out in phases, starting early next year.

37signals has also been working on redesigning their marketing sites, and has hired a new designer to work on their visual look and feel.  They’ve started with Highrise.  Below are a few screenshots I took of some of the designs they’ve been working with:

[Screenshots of the Highrise redesign]

The last picture is what they’re currently going with, and it should come out within the next week or two.

Upcoming Book

A few days ago 37signals announced an upcoming book, with the working title “Unconform”.  The announcement was sparse on details, but today they gave some more information on what to expect. Whereas their first book, the popular “Getting Real”, focused on software development and engineering, “Unconform” is more about business: team structure, hiring, competition, and getting the word out.

One of the main themes of the book will be “small isn’t a stepping stone”. Companies should consider stopping at a small size, and not all businesses need to be massive to be successful.  DHH also described the book as pushing against the “lifestyle-business” idea, that small businesses like 37signals aren’t in the “real world”, or don’t have a “real business”.

Mobile

Jason and David got a few questions regarding their thoughts on the mobile environment, specifically the iPhone and Google’s G1 phone.  They stated that they’ve had an internal debate on whether or not to develop “official” iPhone apps, with the consensus being that they should work on making improvements to their API, and allow third parties to develop applications. They referred to the Twitter model of providing the best API they can, and letting developers work on creating clients on various platforms.

When asked his opinion about the G1 phone, Jason Fried described it as more of a “me-too” device, lacking the kind of truly new features the iPhone introduced.

If you missed 37signals Live, you can watch the recording on Justin.tv.

Apple iPhones reduced in price

Yesterday, Apple dropped the price on the 8GB iPhone (that killer, must-have gadget that’s apparently been selling like crazy since being introduced), along with new product announcements in their iPod lineup. Unfortunately, it looks like this price break puts a lot of us early adopters in a tight spot: those of us who shelled out the full $600 for the 8GB models are now realizing the price of purchasing early: about $200.

In case you weren’t aware: if you bought your iPhone within 10 days of an announced price break, you’re entitled to receive the difference from Apple, provided you contact them within 14 business days of shipment. The policy reads:

Should Apple reduce its price on any shipped product within 10 calendar days of shipment, you may contact Apple Sales Support at 1-800-676-2775 to request a refund or credit of the difference between the price you were charged and the current selling price. To receive the refund or credit you must contact Apple within 14 business days of shipment.

Sadly, we at Draconis bought our iPhones 16 days before the announcement: 2 days later and we would have qualified for that rebate. But I’m not bitter over it: I love my iPhone, was willing to part with the full price without expecting any kind of rebate, and anyway, these things are out of our control. Anyone else in the same boat as us?

Linux 2.6.22 Released

The latest and greatest Linux kernel (2.6.22) has just been released. Of the numerous interesting new features, I’m especially excited about two things: a new way to measure approximately how much memory a process is using (via the process footprint measurement facility), and the ability to measure file timestamps using nanoseconds for greater precision.
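As a rough illustration of what nanosecond timestamps look like from user space, the Python sketch below reads a file’s modification time as an integer number of nanoseconds via `os.stat`. It shows how a program can read such timestamps, not the kernel change itself, and whether the sub-second digits are actually populated depends on the filesystem and kernel in use. The helper name `mtime_ns` is made up for this example.

```python
import os
import tempfile


def mtime_ns(path: str) -> int:
    """Return the file's modification time as integer nanoseconds since the epoch."""
    return os.stat(path).st_mtime_ns


with tempfile.NamedTemporaryFile() as tmp:
    ns = mtime_ns(tmp.name)
    # Split into whole seconds and the nanosecond remainder.
    seconds, nanos = divmod(ns, 1_000_000_000)
    print(f"{tmp.name}: {seconds}s + {nanos}ns")
```

On a filesystem that only stores whole seconds, the nanosecond part will simply come back as zero.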

In addition, there’s a new wireless stack, a new Firewire stack (very cool), and a slew of new drivers and other changes. Check out details about this release here, then download the kernel. Enjoy!

New MacBook Pros

There’s a great, in-depth review (via Engadget) of the new 15-inch MacBook Pro, which was released just a few days ago. The reviewer notes “I’m new to Mac computers, new to OS X, but I am one happy switcher”, which echoes a sentiment I had not too long ago.

One of the big new features is the introduction of the LED back-lit screen:

LED back-lighting is touted to provide a more evenly lit screen with sharper images and colors without sacrificing battery life. All these I find to be true, the screen is without a doubt the best i’ve ever seen on a laptop, and better than a lot of desktop monitors I use. With the brightness up to full, even in the most well lit rooms, solid whites are almost blinding, which allows you to turn down the brightness and use less battery.

I was originally a Mac guy, switched to Linux as my desktop du jour, and then switched back with the MacBook, and I’m very glad I did.

DomainKeys Identified Mail

I recently saw an article about the DomainKeys Identified Mail (DKIM) draft being accepted by the IETF as an official proposed standard (even though it happened back in February). I really hope the acceptance of this takes off, though the article seemed to show many large companies (who could probably benefit from it) non-committal.

DKIM is a simple means of verifying the origin of an email, in an attempt to better track (and fight) spam and phishing messages. The method is straightforward: the sender computes a hash of the message, signs that hash with its private key, and stores the signature in a message header (non-DKIM receivers, then, can safely ignore it and still deliver the message). A DKIM-enabled receiver looks up the originating domain’s DNS record and extracts the public key. From Wikipedia: “The receiver can then decrypt the hash value in the header field and at the same time recalculate the hash value for the mail body that was received, from the point immediately following the ‘DomainKey-Signature:’ header. If the two values match, this cryptographically proves that the mail did in fact originate at the purported domain, and has not been tampered with in transit.”
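The hash-then-sign idea can be sketched in a few lines of Python. To be clear, this is a toy and not DKIM itself: it uses textbook RSA with a small key built from two well-known Mersenne primes, with no padding, no header canonicalization, and no DNS lookup. The names `sign` and `verify` and the sample message are invented for the illustration.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes. Illustrative only:
# real DKIM uses properly generated keys, padding, and canonicalization.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def body_hash(body: bytes) -> int:
    """Hash the message body and reduce it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(body).digest(), "big") % N


def sign(body: bytes) -> int:
    """Sender side: sign the body hash with the private key."""
    return pow(body_hash(body), D, N)


def verify(body: bytes, signature: int) -> bool:
    """Receiver side: recover the hash with the public key and compare."""
    return pow(signature, E, N) == body_hash(body)


message = b"Hello from example.com"
signature = sign(message)
print(verify(message, signature))                   # untouched message
print(verify(message + b" (tampered)", signature))  # altered in transit
```

The first check succeeds and the second fails, which is the property the receiver relies on: only the holder of the private key could have produced a signature that matches the body as received.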

(Read the article)

Sysadmin Certifications

I’ve generally been a fan of certification programs for systems administrators as a means for providing at least a basic idea of the competency of a potential hire. After reading this article at Linux.com, I’m not particularly surprised to see the number of certifications expected to increase (especially for GNU/Linux programs), though I have to wonder about it: the best sysadmins I’ve known didn’t have a single certification and weren’t particularly interested in getting one.

I see the whole certification process as having two main flaws (at least on the part of companies certifying their own products): (1) little pressure on the part of the certifying company to make the tests difficult or otherwise accurately prove a taker’s skills, and (2) lots of pressure on the part of the company to test the applicant’s knowledge of vendor-specific aspects. It seems to me that it’s in the interest of the certifying company to have lots of certified engineers out there who know the ins-and-outs of that company’s products and little about any competing products.

So, as someone who needs to hire a competent sysadmin, how does this help me? A potential sysadmin who’s certified as an MCSE or RHCE or whatever shows they can take a test and know the basics of that particular company’s product, but what about the millions of other things that sysadmin would be responsible for? Really, it seems these certifications are good for large companies interested in a sysadmin to manage many identical boxes and little else. For the majority of employers, especially startups and growing companies, I’d think someone who is more well-rounded in the things they’d need to manage is much more useful (for instance, can a sysadmin fill in for a network admin should the need arise? Do they understand infrastructure needs well enough to make recommendations for expansion?). The best skill a sysadmin can have is the ability to learn as they go and adapt to changes. Having a certification in a particular OS doesn’t particularly help if it’s a heterogeneous network.

I think certification is a useful process but I’d like to see programs that are more comprehensive, easier to afford, and focus on general skills regardless of environment. What do you think? Have you gone through a certification process and, if so, how has it helped you? In the past, I’ve thought about getting certified myself, but never went through with it primarily for these reasons (as well as the cost).

Red Hat Enterprise Linux 5 Released

RHEL 5 has just been released today, and has a number of interesting new features (here’s the announcement). The main focus of the release has been around virtualization and security, as well as doing away with the ES, WS, and AS monikers, replacing them with more generic (and less confusing) terms such as RHEL Advanced, Desktop, etc. Coming 2 years off the last release, I think RHEL is due for an upgrade.

Check it out. Though we don’t use RHEL on our production machines, a number of large IT departments do. Be sure to post your reactions here.

The Great Free WiFi Debate

There’s been an interesting debate going on about free vs. pay-per-use WiFi that I’ve found intriguing: the idea of free WiFi is to draw customers into shops at times that normally wouldn’t see much action, but at issue is whether too many people are mooching off the free service and hurting the business. A number of people have likened it to the air conditioning incentive offered by movie theaters a number of years ago. Of course, I and most other customers would probably rather have it free, but I can certainly understand shop owners’ objections.

Every so often, I like to head out to a Starbucks (which costs a couple bucks for the T-Mobile service they provide, plus a Venti cup of whatever’s brewing) or a FreshCity (where it’s free), take my MacBook, and get some work done. I find it’s often good motivation when I pick up and move to some other place – different environs give me a nice motivational push. What’s your take? Do you think offering WiFi for free at Starbucks, Panera, FreshCity, McDonald’s, and other places would be harmful or helpful?

(Read the article)

Google’s DIY IT Infrastructure

There’s an interesting article in the NY Times about Google’s infrastructure strategies.  As many folks have been discussing for a while now, Google has spent a lot of time and brainpower creating their own infrastructure more or less from scratch: they build their own computers, they have their own file system, etc.  From the article:

In many ways, it still has the head of a graduate-school project grafted onto the body of a multinational corporation. The central tenet of its strategy is that its growing cadre of world-class computer scientists can design a network of machines that can store and process more information more efficiently than anyone else.

Mr. Reynolds estimated that Google’s computing costs are half those of other large Internet companies and a tenth those of traditional corporate technology users.

Interesting, but before you start thinking of building your own servers and file systems, remember that Google benefits greatly from economies of scale: the extra time spent designing its own hardware is spread across thousands of servers, which translates into large savings. Most organizations might save a few hundred or a few thousand bucks on hardware by going the DIY route, but researching and building the five or so systems they need would cost more than that, so it isn’t worthwhile.
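The economies-of-scale argument comes down to a break-even calculation, sketched below in Python. The dollar figures are entirely hypothetical and chosen only to show the shape of the trade-off; they are not from the article, and `break_even_servers` is a made-up helper.

```python
import math


def break_even_servers(fixed_cost: float, savings_per_server: float) -> int:
    """Smallest fleet size at which per-server savings cover the fixed cost."""
    if savings_per_server <= 0:
        raise ValueError("DIY must save something per server to ever break even")
    return math.ceil(fixed_cost / savings_per_server)


# Hypothetical numbers, for illustration only.
fixed_cost = 50_000.0        # engineering time to design and validate a custom build
savings_per_server = 500.0   # hardware savings per machine versus off-the-shelf

n = break_even_servers(fixed_cost, savings_per_server)
print(n)  # 100
```

Under these assumed numbers, a shop with five servers would save $2,500 on hardware after spending $50,000 to get there, while a fleet of thousands recoups the same fixed cost many times over.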

Anyway, it’s a good read, if for nothing else than the wow factor.